mpi: support for MPI testing on LC hardware #7

wihobbs · 2023-12-04T23:27:28Z

This PR is a stab at supporting MPI testing on LC resources.

We want MPI testing to be extensible easily in three major ways:

MPI implementation and compiler being tested. These vary by machine.
- For this simple example, an MPI implementation and compiler can be added by adding a new command line call in .gitlab/mpi-test.gitlab-ci.yml.
Test code. This initial PR starts with just “hello, world.”
- Currently, a new test could be added via a new function in mpi/mpi_tests.c. A call in the main function gathering the return code would also be required.
Machines being tested. As El Cap and other machines become available, we want to add them (and replace Tioga/EAS systems.)
- A new machine is probably the most involved thing to add. A script would have to be added to .gitlab/mpi-test.gitlab-ci.yml that covered the MPI implementations and compilers for that machine, and three things would need to be added to the main .gitlab-ci.yml file: the machine specifications, a reference wrapper building flux and executing the MPI tests, and a test for gitlab to run. See .corona, .test-core-mpi-corona, and corona-mpi-test, respectively, examples of this.

grondo

Just skimmed this and looks like good work!

Some really high level comments to start:

The MPI hello test should probably be named hello or similar instead of more the more generic mpi_tests.c. The reason is that we may want to add other simple MPI based tessin the future (e.g. in flux-core we have hello and abort and version tests)
Eventually, we might want to move the MPI testing driver (currently the script in mpi-test.gitlab-ci.yml) to a script so it is easier to update the set of MPI implementations and compilers that are tested on each cluster. (We may want to add a config file for example so the list is easily updated and all in one place.)
minor: it might be slightly preferable to remove .gitlab-ci from the name of mpi-test.gitlab-ci.yml. Since the file is in .gitlab already it is redundant.

mpi/mpi_tests.c

wihobbs · 2023-12-05T22:17:52Z

The MPI hello test should probably be named hello or similar instead of more the more generic mpi_tests.c. The reason is that we may want to add other simple MPI based tessin the future (e.g. in flux-core we have hello and abort and version tests)

My original reason for doing this was that I thought we could add the abort and version tests as additional functions to mpi_tests.c, which would mean only having to compile and link one piece of code. I'm guessing you want these to be separately compiled and run?

Eventually, we might want to move the MPI testing driver (currently the script in mpi-test.gitlab-ci.yml) to a script so it is easier to update the set of MPI implementations and compilers that are tested on each cluster. (We may want to add a config file for example so the list is easily updated and all in one place.)

I like the idea of a config file that could compile and run tests and standardize this across multiple machines. The implementation of this is still a little nebulous in my mind. I'll see if I can hammer out an example...

I think moving the flux run ./src/cmd/flux call to the script would be a good start. We could probably trash the mpi-test.gitlab-ci.yml file if we did this (and just call the script instead).

grondo · 2023-12-05T23:25:35Z

My original reason for doing this was that I thought we could add the abort and version tests as additional functions to mpi_tests.c, which would mean only having to compile and link one piece of code. I'm guessing you want these to be separately compiled and run?

This is not a bad idea, but I think it will result in more complexity in the long term (plus if we have a test or benchmark from elsewhere, it will be more work to integrate it into the test program than it would be to just drop in the new test).

I think moving the flux run ./src/cmd/flux call to the script would be a good start. We could probably trash the mpi-test.gitlab-ci.yml file if we did this (and just call the script instead).

That sounds good. I think eventually we'll be submitting a suite of tests to the CI flux instance. The script can eventually handle this submission, monitoring of tests, and collection of results from all jobs.

wihobbs · 2023-12-05T23:33:54Z

I think eventually we'll be submitting a suite of tests to the CI flux instance.

What you're describing sounds to me like we'll be creating one Flux instance in CI (probably 2 full nodes) and then submitting many different MPI jobs utilizing different compilers to it, rather than creating many small instances (say, 2 nodes, 1 core on each) for each individual MPI job. Am I tracking correctly?

grondo · 2023-12-05T23:54:25Z

What you're describing sounds to me like we'll be creating one Flux instance in CI (probably 2 full nodes) and then submitting many different MPI jobs utilizing different compilers to it, rather than creating many small instances (say, 2 nodes, 1 core on each) for each individual MPI job. Am I tracking correctly?

I think there's a small bit of design work that needs to be done here. I haven't thought about this in detail so I apologize if my thoughts are not well-formed, but it seems like each MPI+compiler test is comprised of the following steps (this is just my first thought, so happy to discuss further)

Load compiler + mpi environment
Build test code in some kind of scratch directory
submit a defined suite of MPI tests as jobs
wait for all jobs
collect results
(where 4. and 5. could be done continuously perhaps)

These steps seem to naturally compose what we'd think of as a batch job. The batch script would handle these steps including compilation of the MPI tests with the defined compiler and MPI, then would submit the suite of jobs and collect and report resuls (implementation TBD). An outer script would submit a batch job for each test mpi and compiler that we're targeting to the CI instance. That way the more resources the CI Flux instance has, the faster we'll run through these tests.

Does that make any sense?

grondo · 2023-12-06T00:06:59Z

The batch script would handle these steps including compilation of the MPI tests with the defined compiler and MPI, then would submit the suite of jobs and collect and report resuls (implementation TBD)

I'll note one drawback to doing the compilation in the batch job is that cores in the allocation will go idle during this stage since no jobs can be run until the compilation completes. An optimization might be to submit the compile step as one single-node job, and the tests as a batch job with a dependency on the compile job. However, this feels like a premature optimization at this point.

Hm, we could also submit all of the compile and MPI tests as jobs to the CI instance with appropriate dependencies (no nested batch jobs). This would allow more flexibility in the size of MPI test jobs and would perhaps be more efficient scheduling. It also may be easier to collect the results since all the jobs are submitted at one level 🤔

wihobbs · 2023-12-06T01:25:55Z

An outer script would submit a batch job for each test mpi and compiler that we're targeting to the CI instance. That way the more resources the CI Flux instance has, the faster we'll run through these tests.

The outer script you described is a major piece this PR is missing. The bare bones of 1-5 you described comprising a batch job are prototyped in de1ce16. However, 3-5 need improvement.

Hm, we could also submit all of the compile and MPI tests as jobs to the CI instance with appropriate dependencies (no nested batch jobs).

I think we're on the same page. If we're requesting a 2 node instance for testing interconnects, we could submit all of the compilation and run batch jobs to the enclosing instance (each requesting 2 nodes and n cores where n >=2) and let Flux sort out what runs when.

then would submit the suite of jobs and collect and report resuls (implementation TBD).

This is on my todo list not only for the MPI work but for aggregating results from the testsuite runs as well. One thing I have noticed when running MPI jobs is there are some things in stderr we may want to collect that don't cause a nonzero return code but do say things we should look at.

All excellent thoughts, thanks so much @grondo. I think we're making a lot of progress here, or at least I'm starting to grasp what this could look like. As a first step, I'll look into the "outer script" you described, and we can reason more from there.

wihobbs · 2023-12-13T19:55:49Z

@grondo Let me know if this is closer to the target. Note that, for debugging purposes, it currently outputs the stdout of all completed jobs. I imagine that in the future we could have a debug=True or --d flag that did this, and the normal behavior would be to only output failed jobs as we discussed.

Here's how I've been running for testing:

flux alloc -N2
cd ~/flux-test-collective
MPI_TESTS_DIRECTORY=$(pwd)/mpi FTC_DIRECTORY=$(pwd) flux run -N2 ../flux-core/src/cmd/flux start ./mpi/outer_script.sh

wihobbs · 2023-12-13T19:58:03Z

I'll also add the logfile here on corona in case you wanted to see it.

wihobbs · 2024-01-04T17:54:40Z

@grondo This is ready for another review. Some notable changes:

removed my hello.c and replaced with 3 tests from flux-core
removed the Makefile, as looping to compile a pre-defined list of tests is easier in a shell script
testing with mvapich2 only for now (openmpi will depend on getting flux-pmix added to this testsuite)

wihobbs · 2024-01-04T17:55:14Z

Oh, and another GitLab logfile that might be helpful.

grondo

Just took a quick pass through with some comments.

mpi/inner_script.sh

.gitlab-ci.yml

mpi/outer_script.sh

grondo · 2024-01-04T19:06:11Z

mpi/outer_script.sh

+version
+"
+
+IS_CORONA=$(echo $HOSTNAME | grep corona)


In LC we have an environment variable LCSCHEDCLUSTER which might be more useful, e.g. on corona

LCSCHEDCLUSTER=corona

Then you can test if test "$LCSCHEDCLUSTER" = "corona".

As a side note, you could generalize this using bash indirect expansion

If the first character of parameter is an exclamation point (!), and parameter is not a nameref, it introduces a level of indirection. Bash uses the value formed by expanding the rest of parameter as the new parameter; this is then expanded and that value is used in the rest of the expansion, rather than the expansion of the original parameter. This is known as indirect expansion. The value is subject to tilde expansion, parameter expansion, command substitution, and arithmetic expansion. If parameter is a nameref, this expands to the name of the variable referenced by parameter instead of performing the complete indirect expansion. The exceptions to this are the expansions of ${!prefix*} and ${!name[@]} described below. The exclamation point must immediately follow the left brace in order to introduce indirection.

E.g.

#!/bin/sh corona_COMPILERS=" gcc clang intel " corona_MPIS=" mvapich2 " MPIS="${LCSCHEDCLUSTER}_MPIS" COMPILERS="${LCSCHEDCLUSTER}_COMPILERS" for mpi in ${!MPIS}; do for compiler in ${!COMPILERS}; do echo "$compiler $mpi" done done

$ bash test.sh gcc mvapich2 clang mvapich2 intel mvapich2

Thanks @grondo for the tip! I think I've addressed all of your first-round comments. This is ready for a second review.

grondo

Looking good! Just a few comments to ensure we capture failure from the tests.

mpi/inner_script.sh

grondo · 2024-01-10T14:58:53Z

mpi/outer_script.sh

+flux job wait --all
+for id in $(flux jobs -f completed -no {id}); do
+    printf "\033[31mjob $id completed:\033[0m\n"
+    flux job attach $id


You might have to capture the exit code of flux job attach. If any job fails then the script should exit with nonzero exit status to indicate failure?

I think I've got the solution to this one, but it required a few more changes to the script than I was anticipating:

RC=0 for id in $(flux jobs -a -no {id}); do printf "\033[31mjob $id completed:\033[0m\n" flux job attach $id || RC=$? done exit $RC

flux jobs -f completed -no {id} only returns ids of jobs with non-zero exit codes, so the for loop had to be updated.

flux jobs -f completed -no {id} only returns ids of jobs with non-zero exit codes, so the for loop had to be updated.

Oh yeah, I somehow missed that on the last review. (Slight correction, -f completed returns jobs with 0 exit codes, or jobs that were "successful")

What you have seems good for now!

only returns ids of jobs with non-zero exit codes

I don't know why I typed the literal opposite of what I meant to type 🤦‍♂️

Oh yeah, I somehow missed that on the last review

I think you corrected me on this over slack or in a 1-1 a while back, sorry it took so long to fix!

I don't know why I typed the literal opposite of what I meant to type

Ah, yeah I figured it might have just been a typo.

Problem: MPI testing needs to be easily extensible by different MPI implementations and compilers. Add a shell script that takes a (1) compiler and (2) MPI implementation and compiles test code with them. Remove the binaries when done.

wihobbs · 2024-01-11T00:56:37Z

Thanks @grondo for the feedback! I believe I've addressed all your comments.

grondo

LGTM! A couple comments/questions inline.

grondo · 2024-01-11T01:58:48Z

mpi/inner_script.sh

+cd $FTC_DIRECTORY/$NAME || die "Could not find $FTC_DIRECTORY/$NAME"
+echo "Running with $1 compiler and $2 MPI"
+flux bulksubmit -n1 --watch mpicc -o {} {}.c ::: $TESTS || die "Compilation failure in tests"
+flux bulksubmit --watch -N $BATCH_NNODES -n $BATCH_NCORES --output=kvs ./{} ::: $TESTS


One question - were we going to run all MPI jobs across all cores, or should we use -n < $BATCH_NCORES here? I can't remember where we landed on that.

I think we discussed that flux batch -N2 -n4 in the outer script would allow for parallelization, because multiple mpi/compiler combos could run at the same time, then the tests themselves would run across all cores in the batch job. This way, if we change the outer script to consume more/less resources, no change is required to the inner script.

Ok, got it. Sorry, I lost track of some of the discussion!

No worries!

grondo · 2024-01-11T02:00:27Z

mpi/outer_script.sh

+flux job wait --all
+RC=0


Actually I apologize, I wasn't reading the script carefully before. flux job wait --all will return the highest exit code of all jobs, so you could capture RC=$? here and wouldn't need the || RC=$? below. Up to you if you want to make that change or not since I think the current version will work.

I like this better (because no "magic number" on line 29). Fixed.

Problem: in order for MPI testing to be extensible by the number of tests being executed, an additional shell script organizing MPI implementations and output is helpful. Add one.

Problem: to test MPI implementations and compilers, we need some kind of basic test that utilizes MPI. Add the MPI tests from the flux-core testsuite, along with some commands from libutil in core for their functionality.

Problem: currently, we don't test changes in flux-core against the MPI implementations and compilers in LC. Add MPI testing for the standard options on Corona by passing an MPI implementation and compiler to a shell script, which compiles and runs an MPI test code.

Problem: flux-core build scripts are reused in other tests, but the FLUX_HWLOC_XMLFILE should not be. Move the lstopo commands to the core testsuite run only.

wihobbs · 2024-01-11T18:48:23Z

Merging. Thank you @grondo, I know this one took a lot of work and iterations to review.

wihobbs requested a review from grondo December 5, 2023 17:34

grondo reviewed Dec 5, 2023

View reviewed changes

mpi/mpi_tests.c Outdated Show resolved Hide resolved

wihobbs force-pushed the mpi-test branch from 237db10 to fd6c85a Compare December 13, 2023 19:50

wihobbs force-pushed the mpi-test branch from fd6c85a to 9f8e2bd Compare December 13, 2023 22:25

wihobbs force-pushed the mpi-test branch 2 times, most recently from 995ba89 to 0618c25 Compare January 4, 2024 17:51

wihobbs requested a review from grondo January 4, 2024 17:52

grondo reviewed Jan 4, 2024

View reviewed changes

wihobbs force-pushed the mpi-test branch 2 times, most recently from 0fb5bba to 55d3a01 Compare January 8, 2024 23:23

wihobbs requested a review from grondo January 8, 2024 23:24

grondo reviewed Jan 10, 2024

View reviewed changes

wihobbs force-pushed the mpi-test branch 2 times, most recently from 040c2fb to 925b0ee Compare January 11, 2024 00:38

mpi: add shell script to compile/run tests

e8c9557

Problem: MPI testing needs to be easily extensible by different MPI implementations and compilers. Add a shell script that takes a (1) compiler and (2) MPI implementation and compiles test code with them. Remove the binaries when done.

wihobbs force-pushed the mpi-test branch from 925b0ee to afa1020 Compare January 11, 2024 00:52

wihobbs requested a review from grondo January 11, 2024 00:56

grondo approved these changes Jan 11, 2024

View reviewed changes

wihobbs added 4 commits January 11, 2024 10:44

mpi: add prototype outer shell script

731ffe1

Problem: in order for MPI testing to be extensible by the number of tests being executed, an additional shell script organizing MPI implementations and output is helpful. Add one.

mpi: add basic MPI testing

ffe5f39

Problem: to test MPI implementations and compilers, we need some kind of basic test that utilizes MPI. Add the MPI tests from the flux-core testsuite, along with some commands from libutil in core for their functionality.

.gitlab: make lstopo specific to flux-core testsuite

984fe3b

Problem: flux-core build scripts are reused in other tests, but the FLUX_HWLOC_XMLFILE should not be. Move the lstopo commands to the core testsuite run only.

wihobbs force-pushed the mpi-test branch from afa1020 to 984fe3b Compare January 11, 2024 18:45

wihobbs merged commit 7bd74ef into flux-framework:main Jan 11, 2024
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mpi: support for MPI testing on LC hardware #7

mpi: support for MPI testing on LC hardware #7

wihobbs commented Dec 4, 2023

grondo left a comment

wihobbs commented Dec 5, 2023

grondo commented Dec 5, 2023

wihobbs commented Dec 5, 2023

grondo commented Dec 5, 2023

grondo commented Dec 6, 2023 •

edited

Loading

wihobbs commented Dec 6, 2023

wihobbs commented Dec 13, 2023

wihobbs commented Dec 13, 2023

wihobbs commented Jan 4, 2024

wihobbs commented Jan 4, 2024

grondo left a comment

grondo Jan 4, 2024

wihobbs Jan 8, 2024

grondo left a comment

grondo Jan 10, 2024

wihobbs Jan 11, 2024

grondo Jan 11, 2024

wihobbs Jan 11, 2024

grondo Jan 11, 2024

wihobbs commented Jan 11, 2024

grondo left a comment

grondo Jan 11, 2024

wihobbs Jan 11, 2024

grondo Jan 11, 2024

wihobbs Jan 11, 2024

grondo Jan 11, 2024

wihobbs Jan 11, 2024

wihobbs commented Jan 11, 2024

mpi: support for MPI testing on LC hardware #7

mpi: support for MPI testing on LC hardware #7

Conversation

wihobbs commented Dec 4, 2023

grondo left a comment

Choose a reason for hiding this comment

wihobbs commented Dec 5, 2023

grondo commented Dec 5, 2023

wihobbs commented Dec 5, 2023

grondo commented Dec 5, 2023

grondo commented Dec 6, 2023 • edited Loading

wihobbs commented Dec 6, 2023

wihobbs commented Dec 13, 2023

wihobbs commented Dec 13, 2023

wihobbs commented Jan 4, 2024

wihobbs commented Jan 4, 2024

grondo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

grondo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wihobbs commented Jan 11, 2024

grondo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wihobbs commented Jan 11, 2024

grondo commented Dec 6, 2023 •

edited

Loading