mpi: support for MPI testing on LC hardware #7
Just skimmed this and it looks like good work!
Some really high-level comments to start:
- The MPI hello test should probably be named `hello` or similar instead of the more generic `mpi_tests.c`. The reason is that we may want to add other simple MPI-based tests in the future (e.g. in flux-core we have `hello`, `abort`, and `version` tests).
- Eventually, we might want to move the MPI testing driver (currently the `script` in `mpi-test.gitlab-ci.yml`) to a standalone script so it is easier to update the set of MPI implementations and compilers that are tested on each cluster. (We may want to add a config file, for example, so the list is easily updated and all in one place; a sketch follows this list.)
- Minor: it might be slightly preferable to remove `.gitlab-ci` from the name of `mpi-test.gitlab-ci.yml`. Since the file is already in `.gitlab`, it is redundant.
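To make the config-file idea concrete, here is a minimal sketch of what a per-cluster config could look like; the file name, the quartz cluster entry, and the variable layout are hypothetical, not part of this PR:

```sh
# clusters.conf (hypothetical): one block per cluster, sourced by the
# driver script so the tested toolchains live in a single place
corona_COMPILERS="gcc clang intel"
corona_MPIS="mvapich2"

quartz_COMPILERS="gcc intel"
quartz_MPIS="mvapich2"
```

The driver script could then source this file and select the lists for the current cluster (one way to do that selection, using bash indirect expansion, is sketched later in this thread).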
My original reason for doing this was that I thought we could add the abort and version tests as additional functions to `mpi_tests.c`.

I like the idea of a config file that could compile and run tests and standardize this across multiple machines. The implementation of this is still a little nebulous in my mind. I'll see if I can hammer out an example... I think moving the driver out of `mpi-test.gitlab-ci.yml` into its own script is a good idea.
This is not a bad idea, but I think it will result in more complexity in the long term (plus if we have a test or benchmark from elsewhere, it will be more work to integrate it into the test program than it would be to just drop in the new test).
That sounds good. I think eventually we'll be submitting a suite of tests to the CI flux instance. The script can eventually handle this submission, monitoring of tests, and collection of results from all jobs.
What you're describing sounds to me like we'll be creating one Flux instance in CI (probably 2 full nodes) and then submitting many different MPI jobs utilizing different compilers to it, rather than creating many small instances (say, 2 nodes, 1 core on each) for each individual MPI job. Am I tracking correctly?
I think there's a small bit of design work that needs to be done here. I haven't thought about this in detail, so I apologize if my thoughts are not well-formed, but it seems like each MPI+compiler test comprises the following steps (this is just my first thought, so happy to discuss further):
1. acquire a resource allocation
2. load the target compiler and MPI implementation
3. compile the MPI tests
4. run the tests
5. collect and report the results
These steps seem to naturally compose what we'd think of as a batch job. The batch script would handle these steps, including compilation of the MPI tests with the defined compiler and MPI, then would submit the suite of jobs and collect and report results (implementation TBD). An outer script would submit a batch job for each MPI and compiler that we're targeting to the CI instance. That way, the more resources the CI Flux instance has, the faster we'll run through these tests. Does that make any sense?
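As a rough illustration of that division of labor, an outer script along these lines could submit one batch job per MPI+compiler pair; this is only a sketch, and the script name `inner_script.sh` and the toolchain lists are hypothetical:

```sh
#!/bin/bash
# Hypothetical outer script: one batch job per MPI+compiler combination,
# submitted to the enclosing CI Flux instance.
MPIS="mvapich2"
COMPILERS="gcc clang intel"

for mpi in $MPIS; do
    for compiler in $COMPILERS; do
        # inner_script.sh (a stand-in name) would handle steps 1-5 above:
        # load modules, compile the tests, run them, and report results
        flux batch --flags=waitable -N2 -n4 ./inner_script.sh "$compiler" "$mpi"
    done
done

flux job wait --all    # nonzero if any batch job failed
```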
I'll note one drawback to doing the compilation in the batch job: cores in the allocation will go idle during this stage, since no jobs can run until the compilation completes. An optimization might be to submit the compile step as one single-node job, and the tests as a batch job with a dependency on the compile job. However, this feels like a premature optimization at this point.

Hm, we could also submit all of the compile and MPI tests as jobs to the CI instance with appropriate dependencies (no nested batch jobs). This would allow more flexibility in the size of MPI test jobs and would perhaps be more efficient scheduling. It also may be easier to collect the results since all the jobs are submitted at one level 🤔
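A hedged sketch of that flat, dependency-based alternative, using flux-core's `--dependency=afterok:JOBID` scheme; the script and test names here are hypothetical:

```sh
# Submit the compile step as a single job...
compile_id=$(flux submit --flags=waitable -N1 ./compile_tests.sh gcc mvapich2)

# ...then submit each MPI test job with a dependency on it, so the
# scheduler is free to size and place the test jobs independently.
for test in hello version abort; do
    flux submit --flags=waitable -N2 -n4 \
        --dependency=afterok:$compile_id "./$test"
done

flux job wait --all    # everything is at one level, so results are easy to collect
```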
The outer script you described is a major piece this PR is missing. The bare bones of steps 1-5, which you described as comprising a batch job, are prototyped in
I think we're on the same page. If we're requesting a 2 node instance for testing interconnects, we could submit all of the compilation and run batch jobs to the enclosing instance (each requesting 2 nodes and n cores, where n >= 2) and let Flux sort out what runs when.
This is on my todo list not only for the MPI work but for aggregating results from the testsuite runs as well. One thing I have noticed when running MPI jobs is there are some things in

All excellent thoughts, thanks so much @grondo. I think we're making a lot of progress here, or at least I'm starting to grasp what this could look like. As a first step, I'll look into the "outer script" you described, and we can reason more from there.
@grondo Let me know if this is closer to the target. Note that, for debugging purposes, it currently outputs the

Here's how I've been running for testing:
@grondo This is ready for another review. Some notable changes:
Oh, and another GitLab logfile that might be helpful.
Just took a quick pass through with some comments.
mpi/outer_script.sh (outdated):

    version
    "

    IS_CORONA=$(echo $HOSTNAME | grep corona)
In LC we have an environment variable `LCSCHEDCLUSTER` which might be more useful, e.g. on corona `LCSCHEDCLUSTER=corona`. Then you can test with `if test "$LCSCHEDCLUSTER" = "corona"`.

As a side note, you could generalize this using bash indirect expansion. From the bash manual:

If the first character of parameter is an exclamation point (!), and parameter is not a nameref, it introduces a level of indirection. Bash uses the value formed by expanding the rest of parameter as the new parameter; this is then expanded and that value is used in the rest of the expansion, rather than the expansion of the original parameter. This is known as indirect expansion. The value is subject to tilde expansion, parameter expansion, command substitution, and arithmetic expansion. If parameter is a nameref, this expands to the name of the variable referenced by parameter instead of performing the complete indirect expansion. The exceptions to this are the expansions of ${!prefix*} and ${!name[@]} described below. The exclamation point must immediately follow the left brace in order to introduce indirection.
E.g.

    #!/bin/sh
    corona_COMPILERS="
    gcc
    clang
    intel
    "
    corona_MPIS="
    mvapich2
    "
    MPIS="${LCSCHEDCLUSTER}_MPIS"
    COMPILERS="${LCSCHEDCLUSTER}_COMPILERS"
    for mpi in ${!MPIS}; do
        for compiler in ${!COMPILERS}; do
            echo "$compiler $mpi"
        done
    done

    $ bash test.sh
    gcc mvapich2
    clang mvapich2
    intel mvapich2
Thanks @grondo for the tip! I think I've addressed all of your first-round comments. This is ready for a second review.
Looking good! Just a few comments to ensure we capture failure from the tests.
    flux job wait --all
    for id in $(flux jobs -f completed -no {id}); do
        printf "\033[31mjob $id completed:\033[0m\n"
        flux job attach $id
You might have to capture the exit code of `flux job attach`. If any job fails, then the script should exit with a nonzero exit status to indicate failure?
I think I've got the solution to this one, but it required a few more changes to the script than I was anticipating:

    RC=0
    for id in $(flux jobs -a -no {id}); do
        printf "\033[31mjob $id completed:\033[0m\n"
        flux job attach $id || RC=$?
    done
    exit $RC

`flux jobs -f completed -no {id}` only returns ids of jobs with non-zero exit codes, so the for loop had to be updated.
> `flux jobs -f completed -no {id}` only returns ids of jobs with non-zero exit codes, so the for loop had to be updated.

Oh yeah, I somehow missed that on the last review. (Slight correction: `-f completed` returns jobs with `0` exit codes, or jobs that were "successful".)
What you have seems good for now!
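For the record, a quick sketch of the filter semantics being discussed here (the `-n`/`-o {id}` options just strip the output down to bare job ids):

```sh
flux jobs -a -no {id}            # all of this user's jobs, any state or result
flux jobs -f completed -no {id}  # inactive jobs that exited 0 ("successful")
flux jobs -f failed -no {id}     # inactive jobs with nonzero exit status
```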
> only returns ids of jobs with non-zero exit codes

I don't know why I typed the literal opposite of what I meant to type 🤦‍♂️

> Oh yeah, I somehow missed that on the last review

I think you corrected me on this over slack or in a 1-1 a while back, sorry it took so long to fix!
> I don't know why I typed the literal opposite of what I meant to type

Ah, yeah I figured it might have just been a typo.
Problem: MPI testing needs to be easily extensible by different MPI implementations and compilers. Add a shell script that takes a (1) compiler and (2) MPI implementation and compiles test code with them. Remove the binaries when done.
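A minimal sketch of the kind of script this commit describes, assuming module-based toolchains on LC; the script name, test list, and job shape are hypothetical, not the PR's actual code:

```sh
#!/bin/bash
# Usage: ./compile_and_run.sh COMPILER MPI    (hypothetical name)
COMPILER=$1
MPI=$2
TESTS="hello"    # test sources, one .c file each

die() { echo "$@" >&2; exit 1; }

module load "$COMPILER" "$MPI" || die "Could not load $COMPILER/$MPI"
for t in $TESTS; do
    mpicc -o "$t" "$t.c" || die "Compilation failure in $t"
    flux run -N2 -n2 "./$t" || die "$t failed"
    rm -f "$t"    # remove the binaries when done
done
```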
Thanks @grondo for the feedback! I believe I've addressed all your comments.
LGTM! A couple comments/questions inline.
    cd $FTC_DIRECTORY/$NAME || die "Could not find $FTC_DIRECTORY/$NAME"
    echo "Running with $1 compiler and $2 MPI"
    flux bulksubmit -n1 --watch mpicc -o {} {}.c ::: $TESTS || die "Compilation failure in tests"
    flux bulksubmit --watch -N $BATCH_NNODES -n $BATCH_NCORES --output=kvs ./{} ::: $TESTS
One question: were we going to run all MPI jobs across all cores, or should we use `-n` < `$BATCH_NCORES` here? I can't remember where we landed on that.
I think we discussed that `flux batch -N2 -n4` in the outer script would allow for parallelization, because multiple MPI/compiler combos could run at the same time; the tests themselves would then run across all cores in the batch job. This way, if we change the outer script to consume more or fewer resources, no change is required to the inner script.
Ok, got it. Sorry, I lost track of some of the discussion!
No worries!
mpi/outer_script.sh (outdated):

    flux job wait --all
    RC=0
Actually I apologize, I wasn't reading the script carefully before. `flux job wait --all` will return the highest exit code of all jobs, so you could capture `RC=$?` here and wouldn't need the `|| RC=$?` below. Up to you if you want to make that change or not, since I think the current version will work.
I like this better (because no "magic number" on line 29). Fixed.
Problem: in order for MPI testing to be extensible by the number of tests being executed, an additional shell script organizing MPI implementations and output is helpful. Add one.
Problem: to test MPI implementations and compilers, we need some kind of basic test that utilizes MPI. Add the MPI tests from the flux-core testsuite, along with some helpers from libutil in core that they rely on.
Problem: currently, we don't test changes in flux-core against the MPI implementations and compilers in LC. Add MPI testing for the standard options on Corona by passing an MPI implementation and compiler to a shell script, which compiles and runs an MPI test code.
Problem: flux-core build scripts are reused in other tests, but the FLUX_HWLOC_XMLFILE should not be. Move the lstopo commands to the core testsuite run only.
Merging. Thank you @grondo, I know this one took a lot of work and iterations to review.
This PR is a stab at supporting MPI testing on LC resources.
We want MPI testing to be easily extensible in three major ways:
- By MPI implementation and compiler: new entries can be added to the lists in `.gitlab/mpi-test.gitlab-ci.yml`.
- By test: a new test can be added to `mpi/mpi_tests.c`. A call in the main function gathering the return code would also be required.
- By machine: a new machine would need a section in `.gitlab/mpi-test.gitlab-ci.yml` that covers the MPI implementations and compilers for that machine, and three things would need to be added to the main `.gitlab-ci.yml` file: the machine specifications, a reference wrapper building flux and executing the MPI tests, and a test for GitLab to run. See `.corona`, `.test-core-mpi-corona`, and `corona-mpi-test`, respectively, for examples of this.