Commit b5a5475

Fixes and suggestions for ReFrame documentation (#151)
1 parent 521487d commit b5a5475

1 file changed: +52 -40 lines changed

Diff for: docs/pkg-reframe.md

@@ -1,12 +1,12 @@
11
# ReFrame Testing Tutorial
22

3-
When ReFrame tests are enabled for a uenv, they are automatically run:
3+
When [ReFrame] tests are enabled for a uenv, they are automatically run:
44

55
* in the CI/CD pipeline after the image has been built;
66
* in daily/weekly testing of individual vClusters;
77
* and when upgrading and updating vClusters.
88

9-
This page is a tutorial, that will guide you through the process of enabling testing in your uenv, and on writing portable tests that will run on any uenv-enabled system on Alps.
9+
This page is a tutorial that will guide you through the process of enabling testing for your uenv, and on writing portable tests that will run on any uenv-enabled system on [Alps].
1010

1111
!!! info
1212

@@ -18,7 +18,7 @@ This page is a tutorial, that will guide you through the process of enabling tes
1818

1919
## How uenv ReFrame testing works
2020

21-
CSCS maintains a set of ReFrame tests in the CSCS ReFrame tests repository [eth-cscs/cscs-reframe-tests].
21+
CSCS maintains a set of [ReFrame] tests in the CSCS ReFrame tests repository [eth-cscs/cscs-reframe-tests].
2222
These tests cover a very wide range of features, including application tests, login node health and Slurm, and can be run on any vCluster on Alps.
2323

2424
!!! info
@@ -27,15 +27,15 @@ These tests cover a very wide range of features, including application tests, lo
2727

2828
Setting up tests for a uenv requires making changes to two repositories:
2929

30-
* [eth-cscs/alps-uenv] **adding meta data to the uenv** to be used by ReFrame to:
30+
* [eth-cscs/alps-uenv] **adding metadata to the uenv** to be used by ReFrame to:
3131
* load the uenv and configure the environment so that it is ready to run tests;
3232
* and, choose which tests from the test suite are used to test the uenv.
3333
* [eth-cscs/cscs-reframe-tests] **updating and adding tests** that are relevant to the uenv.
3434
* might not be necessary if the tests already exist.
3535

3636
### uenv recipe meta data
3737

38-
To enable ReFrame tests in your uenv, and yaml file `extra/reframe.yaml` should be added to the recipe.
38+
To enable ReFrame tests in your uenv, a yaml file `extra/reframe.yaml` should be added to the recipe.
3939

4040
Below is an example `reframe.yaml` file:
4141

@@ -57,21 +57,21 @@ This configuration defines a single _environment_ named `develop`, which corresp
5757
* `features`: a list of ReFrame features that are provided by the environment.
5858
* used to decide which tests will be run against the uenv.
5959
* in this case the uenv provides:
60-
* `cuda`: expect tests that compile and test NVIDIA gpu aware problems.
60+
* `cuda`: expect tests that compile and test NVIDIA GPU aware problems.
6161
* `mpi`: expect basic MPI tests that compile and validate MPI to be run. When combined with `cuda` above, tests for GPU-aware MPI will be run.
6262
* `arbor-dev`: a specific feature that specifies that _the environment provides everything required to build and run arbor_.
6363
* `cc`, `cxx`, `ftn`: define the compiler aliases
6464
* see the [ReFrame environment]s documentation.
6565
* `views`: **(optional)** a list of views to load.
6666
* in this case `develop` view is to be loaded.
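
As a rough illustration of how the `features` list can decide which tests run against an environment, the Python sketch below matches feature specifications of the form `+arbor-dev+cuda` against the set of features an environment provides. The helper names `environ_matches` and `is_selected` are hypothetical, and the matching rules are a simplified assumption: each specification requires all of its features, and a test is selected if any one specification matches. ReFrame's own selection logic supports more syntax than this.

```python
def environ_matches(spec, features):
    """Return True if every '+feature' named in spec is provided by the environment."""
    # '+arbor-dev+cuda' splits into ['', 'arbor-dev', 'cuda']; drop the empty entry.
    required = {name for name in spec.split('+') if name}
    return required <= features


def is_selected(specs, features):
    """A test is selected if at least one of its specifications matches."""
    return any(environ_matches(spec, features) for spec in specs)


# Features of the example environment described in the list above.
develop = {'cuda', 'mpi', 'arbor-dev'}

print(is_selected(['+arbor-dev'], develop))
print(is_selected(['+arbor-dev+cuda', '+python'], develop))
print(is_selected(['+python'], develop))
```

With the features listed for the example environment, the first two calls select the test and the third does not, because `python` is not among the provided features.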
6767

68-
Uenv can provide multiple views, for different use cases.
68+
A uenv can provide multiple views, for different use cases.
6969
The most common example is a uenv that provides two views: one that provides an application, and another that provides the tools used to build the application. Another example is the `modules` and `spack` views, that expose a module interface or useful configuration for using Spack with the uenv.
7070

7171
Similarly, it is possible to create multiple environments to test.
7272
The example below defines two environments that provide the same features, i.e. the same tests will be run on each.
7373
The first example is the one above, and the second sets up an equivalent environment using modules.
74-
This would be useful for a uenv that has some users who insist on using modules to set up their build enviroment.
74+
This would be useful for a uenv that has some users who insist on using modules to set up their build environment.
7575

7676
```yaml title="extra/reframe.yaml for multiple environments to test"
7777
develop:
@@ -105,25 +105,25 @@ modules:
105105

106106
!!! question "What is an environment?"
107107

108-
What is the difference between using `module load`, activating a python venv, loading a spack environment, or a uenv view?
108+
What is the difference between using `module load`, activating a python venv, loading a Spack environment, or a uenv view?
109109

110110
Nothing!
111111

112112
They all do the same thing - set environment variables.
113113

114114
The main variables that change the behavior of the system are `PATH` and `LD_LIBRARY_PATH`, though there are many others like `PKG_CONFIG_PATH`, `CUDA_HOME`, `MODULEPATH` etc that will have more subtle effects on configuring and building software.
115115

116-
Configur an environment for running tests requires specifying the commands that will **modify and set environment variables** such that the tests can run. For example, a view or module might be loaded to make the executable of a scientific code be in `PATH`, or to add tools like `cmake`, `nvcc` and `gcc` to `PATH` so that we can run a test that builds an application.
116+
Configuring an environment for running tests requires specifying the commands that will **modify and set environment variables** such that the tests can run. For example, a view or module might be loaded to make the executable of a scientific code be in `PATH`, or to add tools like `cmake`, `nvcc` and `gcc` to `PATH` so that we can run a test that builds an application.
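
The idea that every environment mechanism comes down to setting variables can be sketched in a few lines of Python. The helper `prepend_path` and the directory `/opt/example/bin` are hypothetical stand-ins for what a view or module would add; this is an illustration of the concept, not how uenv or ReFrame implement it.

```python
import os
import subprocess
import sys


def prepend_path(env, var, directory):
    """Return a copy of env with directory prepended to the path-like variable var."""
    new_env = dict(env)
    current = new_env.get(var, '')
    new_env[var] = directory + os.pathsep + current if current else directory
    return new_env


# Hypothetical install location, standing in for what a view or module would add.
env = prepend_path(os.environ, 'PATH', '/opt/example/bin')

# A child process started with this environment sees the modified PATH.
child = subprocess.run(
    [sys.executable, '-c',
     'import os; print(os.environ["PATH"].split(os.pathsep)[0])'],
    env=env, capture_output=True, text=True, check=True,
)
print(child.stdout.strip())
```

The parent process is left untouched; only the child started with the modified copy sees the new first entry in `PATH`, which mirrors how a test job inherits the environment configured for it.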
117117

118118
## Creating uenv tests
119119

120120
The final objective for adding tests to a uenv is to have:
121121

122-
1. a uenv deployed with an `meta/extra/reframe.yaml` file;
122+
1. a uenv deployed with an `extra/reframe.yaml` file;
123123
2. and tests in the [eth-cscs/cscs-reframe-tests] repository
124124

125125
In this second half of the tutorial, a workflow for doing this that minimises the amount of time spent
126-
waiting in job and ci/cd queues is provided.
126+
waiting in job and CI/CD queues is provided.
127127
Before starting, you will need the following:
128128

129129
* a working uenv squashfs image with corresponding meta data path;
@@ -133,26 +133,26 @@ Before starting, you will need the following:
133133

134134
```bash
135135
# pull the image that you want to start developing tests for, e.g.:
136-
$ uenv image pull cp2k/24.7:v1
136+
$ uenv image pull cp2k/2024.2:v1
137137
138138
# get the meta data path
139139
$ meta=$(uenv image inspect cp2k/2024.2:v1 --format={meta})
140140
141141
# check the path - your location will be different
142142
$ echo ${meta}
143143
144-
# create the reframe meta data eile
144+
# create the reframe meta data file
145145
$ mkdir -p ${meta}/extra
146146
$ vim ${meta}/extra/reframe.yaml
147147
```
148148

149-
The `meta` path is the meta data for the uenv for the image.
149+
The `meta` path is the metadata for the uenv for the image.
150150

151151
??? note "why create reframe.yaml in this location?"
152152

153153
We inject the `reframe.yaml` file into the meta data in the uenv repo to create a "development environment", where it can be modified while developing the tests in an interactive shell.
154154

155-
The `reframe.yaml` file will be added to the recipe later, once it is time to start testing in a ci/cd pipeline.
155+
The `reframe.yaml` file will be added to the recipe later, once it is time to start testing in a CI/CD pipeline.
156156

157157

158158
### Step 2: set up the CSCS reframe tests
@@ -161,11 +161,11 @@ The next step is to check out and setup ReFrame and the CSCS ReFrame test suite.
161161

162162
It might be a good idea to create a path for this work, and clone the ReFrame and CSCS test suite repos as sub-directories.
163163

164-
### set up ReFrame
164+
### Set up ReFrame
165165

166166
The first step is to download and set up ReFrame:
167167

168-
* clone from GitHub;
168+
* clone from [ReFrame GitHub repository](https://github.com/reframe-hpc/reframe)
169169
* run the bootstrap process that installs ReFrame's dependencies;
170170
* then add reframe to `PATH`.
171171

@@ -174,14 +174,15 @@ The first step is to download and set up ReFrame:
174174
$ git clone git@github.com:reframe-hpc/reframe.git
175175
176176
# run bootstrap process (only needs to be done once)
177-
$ (cd reframe; ./bootstrap.sh)
177+
$ cd reframe
178+
$ ./bootstrap.sh
178179
179180
# add to PATH and verify that everything works
180-
$ export PATH=$PWD/reframe/bin:$PATH
181+
$ export PATH=$PWD/bin:$PATH
181182
$ reframe --version
182183
```
183184

184-
### set up the ReFrame tests
185+
### Set up the ReFrame tests
185186

186187
The next step is to clone the CSCS test suite, and create a new branch where we will make any changes required to test our uenv.
187188

@@ -199,7 +200,7 @@ git switch -c uenv-arbor
199200
Always create your working branch off of the `alps` branch.
200201
The `alps` branch is used for tests run on Alps vClusters. It will become the main branch, once Piz Daint is decommissioned.
201202

202-
## Adding/Updating tests to a uenv
203+
## Adding/updating tests to a uenv
203204

204205
Now everything is in place to implement the tests for your uenv, which will involve one or two of the following:
205206

@@ -225,7 +226,7 @@ In this tutorial we will write tests that:
225226
3. run a benchmark with a single MPI rank on one GH200 GPU
226227
3. run a benchmark with 4 ranks and 4 GPUs on a GH200 node.
227228

228-
[link to the tests](https://github.com/eth-cscs/cscs-reframe-tests/tree/alps/checks/apps/arbor).
229+
[Link to the tests](https://github.com/eth-cscs/cscs-reframe-tests/tree/alps/checks/apps/arbor).
229230

230231
!!! note
231232

@@ -243,18 +244,16 @@ import uenv
243244
```
244245

245246
It provides the `uenv.uarch()` function, that will be used to determine the uenv
246-
uarch (`gh200`, `a100`, `zen2`, etc) of the system where tests are running.
247+
uarch (`gh200`, `a100`, `zen2`, etc.) of the system where tests are running.
247248
We will see it in action below.
248249

249-
### test: building the software
250+
### Test: building the software
250251

251-
[Link](https://github.com/eth-cscs/cscs-reframe-tests/blob/alps/checks/apps/arbor/arbor-dev.py#L47-L93).
252-
253-
Building is handled by a test, in this case called `arbor_build`, that derives from `rfm.CompileOnlyRegressionTest`.
252+
Building is handled by a test, in this case called [`arbor_build`](https://github.com/eth-cscs/cscs-reframe-tests/blob/alps/checks/apps/arbor/arbor-dev.py#L47-L93), that derives from `rfm.CompileOnlyRegressionTest`.
254253

255254
!!! info
256255

257-
Points of interest are annotated in the code below with :heavy_plus_sign: symbols, click on them to expand.
256+
Points of interest are annotated in the code below with :material-plus-circle: symbols, click on them to expand.
258257

259258
``` { .python .annotate }
260259
class arbor_build(rfm.CompileOnlyRegressionTest):
@@ -305,7 +304,7 @@ class arbor_build(rfm.CompileOnlyRegressionTest):
305304
1. Restrict this test to only run in environments that provide the `arbor-dev` feature.
306305
This can be a list of environments, e.g. `['+arbor-dev+cuda', '+python']` would run the test in environments that provide both `arbor-dev` and `cuda`, or environments that provide `python`.
307306
2. `arbor_download` is a [ReFrame fixture](https://reframe-hpc.readthedocs.io/en/v4.5.0/tutorial_fixtures.html), that handles downloading the source for Arbor.
308-
3. The build stage is performed on a compute node using an sbatch job.
307+
3. The build stage is performed on a compute node using a sbatch job.
309308
Required so that the environment is configured properly, by adding the
310309
correct flags to the script:
311310
```
@@ -330,7 +329,7 @@ class arbor_build(rfm.CompileOnlyRegressionTest):
330329

331330
The new approach of parameterising over uarch means that the test can be configured for _any_ vCluster with gh200 nodes.
332331

333-
### test: run the unit tests
332+
### Test: run the unit tests
334333

335334
The C++ Arbor library provides GoogleTest unit tests that are bundled in a single executable `unit`.
336335
The tests are not MPI enabled, and take less than 30 seconds to run 1000 individual tests.
@@ -360,11 +359,11 @@ class arbor_unit(rfm.RunOnlyRegressionTest):
360359
1. This is the first time that we have added an annotation to a test.
361360
This is a "leaf" in our set of test dependencies, run after the download
362361
and build stages that are its dependencies have run.
363-
2. The unit tests run quickly - so set a short time limit for higher priority queueing
362+
2. The unit tests run quickly - so set a short time limit for higher priority queuing
364363
3. The `arbor_build` stage has to be run before this test, to build the unit tests.
365364
4. Just check that the tests passed - performance checks are implemented elsewhere
366365

367-
### test: single GPU benchmark
366+
### Test: single GPU benchmark
368367

369368
Use the `miniring` benchmark provided by Arbor to check both correctness and performance.
370369

@@ -397,8 +396,8 @@ arbor_references = {
397396
}
398397
```
399398

400-
1. Currently we only have Arbor performance targets for gh200, fields for `zen2` would be added for Eiger testing.
401-
2. These are labelled reference targes. A link will be added when it is found in ReFrame's "documentation".
399+
1. Currently, we only have Arbor performance targets for `gh200`, fields for `zen2` would be added for Eiger testing.
400+
2. These are labelled reference targets. A link will be added when it is found in ReFrame's "documentation".
402401

403402
The test itself:
404403

@@ -449,7 +448,7 @@ class arbor_busyring(rfm.RunOnlyRegressionTest):
449448
In other words - the performance target is set dynamically based on the architecture of the node,
450449
instead of being hard coded using if-else statements in the test itself.
451450

452-
* `self.uarch` is one of the alps arch: `"gh200"`, `"zen2"`, `"a100"`, ... etc, or `None`
451+
* `self.uarch` is one of the alps arch: `"gh200"`, `"zen2"`, `"a100"`, ... etc., or `None`
453452
* `self.current_partition.fullname` is the `vcluster:partition` string, for example `"daint:normal"` or `"todi:debug"`.
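
The dynamic lookup described above can be sketched as a plain dictionary lookup keyed by uarch. The table `references`, the helper `lookup_references`, and the numbers are hypothetical placeholders, not the values used in the CSCS test suite; the tuple layout `(value, lower threshold, upper threshold, unit)` follows the format ReFrame uses for reference values.

```python
# Hypothetical reference table keyed by uarch, then by benchmark name.
# The numbers are placeholders, not real performance targets.
references = {
    'gh200': {
        'serial': {'time_run': (1.0, -0.1, 0.1, 's')},
    },
}


def lookup_references(uarch, partition_fullname, bench):
    """Return {partition: targets} for a known uarch, or {} if none are defined."""
    targets = references.get(uarch, {}).get(bench)
    if targets is None:
        # No targets for this uarch (or uarch is None): nothing to compare against.
        return {}
    return {partition_fullname: targets}


print(lookup_references('gh200', 'daint:normal', 'serial'))
print(lookup_references('zen2', 'eiger:mc', 'serial'))
```

Keeping the targets in one table means that supporting a new architecture only requires adding an entry, with no change to the test logic.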
454453

455454
!!! note
@@ -458,7 +457,7 @@ class arbor_busyring(rfm.RunOnlyRegressionTest):
458457
does not provide values for the current uarch.
459458
However, in such cases, no comparison is made and the test will pass.
460459

461-
### test: MPI tests
460+
### Test: MPI tests
462461

463462
``` { .python .annotate }
464463
slurm_config = { #(1)
@@ -519,16 +518,23 @@ $ export UENV=arbor/v0.9:v1
519518
It will be a hard requirement that the meta data path will be in the same path
520519
as the `store.squashfs` file.
521520

521+
!!! warning
522+
If the `UENV` variable is not set properly, the `reframe.yaml` file can't be found and you will see an error:
523+
```
524+
ERROR: failed to load configuration: problem loading the metadata from 'extra/reframe.yaml'
525+
```
526+
522527
To run the tests use the following commands:
523528

524529
```bash
525530
# run the tests
526531
$ reframe -C cscs-reframe-tests/config/cscs.py \
527-
-c cscs-reframe-tests/checks/apps/arbor/ -r --keep-stage-files
532+
-c cscs-reframe-tests/checks/apps/arbor/ --keep-stage-files \
533+
-r
528534
529535
# perform a dry run of tests
530536
$ reframe -C cscs-reframe-tests/config/cscs.py \
531-
-c cscs-reframe-tests/checks/apps/arbor/ -r --keep-stage-files \
537+
-c cscs-reframe-tests/checks/apps/arbor/ --keep-stage-files \
532538
--dry-run
533539
```
534540

@@ -540,7 +546,7 @@ $ reframe -C cscs-reframe-tests/config/cscs.py \
540546
* `--keep-stage-files`: this will keep all of the intermediate scripts and configuration
541547
* stored in the `stage` sub-directory of the current path.
542548
* very useful for debugging problems with our tests.
543-
* `--dry-run`: generate all of the stage files and scripts without running the tests.
549+
* `--dry-run`: generate all the stage files and scripts without running the tests.
544550

545551
!!! tip
546552

@@ -552,6 +558,12 @@ $ reframe -C cscs-reframe-tests/config/cscs.py \
552558
Look in the `stage` path that is created in the path where you called reframe for all
553559
of the job scripts, build files, and results.
554560

561+
!!! tip
562+
If ReFrame tests fail because of `ReqNodesNotAvail` and you think it is a fluke, try setting
563+
`RFM_IGNORE_REQNODENOTAVAIL=y`.
564+
555565
[eth-cscs/alps-uenv]: https://github.com/eth-cscs/alps-uenv
556566
[eth-cscs/cscs-reframe-tests]: https://github.com/eth-cscs/cscs-reframe-tests
557567
[ReFrame environment]: https://reframe-hpc.readthedocs.io/en/stable/tutorial.html#environment-features-and-extras
568+
[ReFrame]: https://reframe-hpc.readthedocs.io/en/stable/
569+
[Alps]: https://www.cscs.ch/computers/alps
