Commit 1e334ab
Refactor Ep. 16, fix lesson build order
1 parent 6f573ae

9 files changed: +75 -65 lines

_config.yml (+2 -1)

@@ -63,7 +63,8 @@ sched:
   hist: "sacct -u yourUsername"

 episode_order:
-  - 11-hpc-intro
+  - 10-hpc-intro
+  - 11-connecting
   - 12-cluster
   - 13-scheduler
   - 14-modules

_episodes/13-scheduler.md (-1)

@@ -8,7 +8,6 @@ questions:
 - "How do I capture the output of a program that is run on a node in the
   cluster?"
 objectives:
-- "Run a simple Hello World style program on the cluster."
 - "Submit a simple Hello World style script to the cluster."
 - "Monitor the execution of jobs using command line tools."
 - "Inspect the output and error files of your jobs."

_episodes/16-parallel.md (+61 -51)

@@ -1,28 +1,36 @@
 ---
 title: "Running a parallel job"
 teaching: 30
-exercises: 30
+exercises: 60
 questions:
 - "How do we execute a task in parallel?"
+- "What benefits arise from parallel execution?"
+- "What are the limits of gains from execution in parallel?"
 objectives:
-- "Understand how to run a parallel job on a cluster."
+- "Construct a program that can execute in parallel."
+- "Prepare a job submission script for the parallel executable."
+- "Launch jobs with increasing degrees of parallel execution."
+- "Record and summarize the timing and accuracy of the results."
 keypoints:
-- "Parallelism is an important feature of HPC clusters."
-- "MPI parallelism is a common case."
+- "Parallel programming allows applications to take advantage of
+  parallel hardware; serial code will not 'just work.'"
+- "Distributed memory parallelism is a common case, using the Message
+  Passing Interface (MPI)."
 - "The queuing system facilitates executing parallel tasks."
+- "Performance improvements from parallel execution do not scale linearly."
 ---

 We now have the tools we need to run a multi-processor job. This is a very
-important aspect of HPC systems, as parallelism is one of the primary tools we
-have to improve the performance of computational tasks.
+important aspect of HPC systems, as parallelism is one of the primary tools
+we have to improve the performance of computational tasks.

 Our example implements a stochastic algorithm for estimating the value of
-π, the ratio of the circumference to the diameter of a circle.
-The program generates a large number of random points on a 1×1 square
-centered on (½,½), and checks how many of these points fall
+π, the ratio of the circumference to the diameter of a circle.
+The program generates a large number of random points on a 1×1 square
+centered on (½,½), and checks how many of these points fall
 inside the unit circle.
-On average, π/4 of the randomly-selected points should fall in the
-circle, so π can be estimated from 4*f*, where *f* is the observed
+On average, π/4 of the randomly-selected points should fall in the
+circle, so π can be estimated from 4*f*, where *f* is the observed
 fraction of points that fall in the circle.
 Because each sample is independent, this algorithm is easily implemented
 in parallel.
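The sampling scheme just described can be sketched in a few lines of NumPy (a minimal serial sketch, not the lesson's full `pi.py`, which adds command-line handling; it samples the equivalent quarter-circle at the origin, so the π/4 hit fraction is the same):

```python
import numpy as np

def inside_circle(total_count):
    # Sample total_count points with x and y uniform on [0, 1),
    # and count those within distance 1.0 of the origin (a quarter
    # of the unit circle, covering exactly pi/4 of the square).
    x = np.random.uniform(size=total_count)
    y = np.random.uniform(size=total_count)
    radii = np.sqrt(x**2 + y**2)
    return np.sum(radii < 1.0)

n_samples = 100_000
counts = inside_circle(n_samples)
# On average a fraction f = pi/4 of points land inside, so pi ~ 4 * f.
my_pi = 4.0 * counts / n_samples
print(my_pi)
```

Because each sample is independent, the `total_count` samples could be split across workers and the counts summed afterwards.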
@@ -34,14 +42,13 @@ in parallel.

 ## A Serial Solution to the Problem

 We start from a Python script using concepts taught in Software Carpentry's
-[Programming with Python](
-https://swcarpentry.github.io/python-novice-inflammation/) workshops.
+[Programming with Python][inflammation] workshops.
 We want to allow the user to specify how many random points should be used
-to calculate π through a command-line parameter.
+to calculate π through a command-line parameter.
 This script will only use a single CPU for its entire run, so it's classified
 as a serial process.

-Let's write a Python program, `pi.py`, to estimate π for us.
+Let's write a Python program, `pi.py`, to estimate π for us.
 Start by importing the `numpy` module for calculating the results,
 and the `sys` module to process command-line parameters:

@@ -52,9 +59,8 @@ import sys
 {: .language-python}

 We define a Python function `inside_circle` that accepts a single parameter
-for the number of random points used to calculate π.
-See [Programming with Python: Creating Functions](
-https://swcarpentry.github.io/python-novice-inflammation/08-func/index.html)
+for the number of random points used to calculate π.
+See [Programming with Python: Creating Functions][python-func]
 for a review of Python functions.
 It randomly samples points with both *x* and *y* on the half-open interval
 [0, 1).
@@ -74,9 +80,8 @@ def inside_circle(total_count):
 {: .language-python}

 Next, we create a main function to call the `inside_circle` function and
-calculate π from its returned result.
-See [Programming with Python: Command-Line Programs](
-https://swcarpentry.github.io/python-novice-inflammation/12-cmdline/index.html)
+calculate π from its returned result.
+See [Programming with Python: Command-Line Programs][cmd-line]
 for a review of `main` functions and parsing command-line parameters.

 ```
@@ -93,7 +98,7 @@ if __name__ == '__main__':

 If we run the Python script locally with a command-line parameter, as in
 `python pi-serial.py 1024`, we should see the script print its estimate of
-π:
+π:

 ```
 {{ site.local.prompt }} python pi-serial.py 1024
@@ -107,27 +112,28 @@ If we run the Python script locally with a command-line parameter, as in
 > built-in capabilities of NumPy. In general, random-number generation is
 > difficult to do well, it's easy to accidentally introduce correlations into
 > the generated sequence.
+>
 > * Discuss why generating high quality random numbers might be difficult.
-> * Is the quality of random numbers generated sufficient for estimating π
+> * Is the quality of random numbers generated sufficient for estimating π
 >   in this implementation?
->
+>
 > > ## Solution
 > >
 > > * Computers are deterministic and produce pseudo random numbers using
-> >   an algorithm. The choice of algorithm and its parameters determines
-> >   how random the generated numbers are. Pseudo random number generation
-> >   algorithms usually produce a sequence numbers taking the previous output
+> >   an algorithm. The choice of algorithm and its parameters determines
+> >   how random the generated numbers are. Pseudo random number generation
+> >   algorithms usually produce a sequence numbers taking the previous output
 > >   as an input for generating the next number. At some point the sequence of
-> >   pseudo random numbers will repeat, so care is required to make sure the
-> >   repetition period is long and that the generated numbers have statistical
+> >   pseudo random numbers will repeat, so care is required to make sure the
+> >   repetition period is long and that the generated numbers have statistical
 > >   properties similar to those of true random numbers.
 > > * Yes.
 > {: .solution }
 {: .discussion }

 ## Measuring Performance of the Serial Solution

-The stochastic method used to estimate π should converge on the true
+The stochastic method used to estimate π should converge on the true
 value as the number of random points increases.
 But as the number of points increases, creating the variables `x`, `y`, and
 `radii` requires more time and more memory.
@@ -144,9 +150,7 @@ Since the largest variables in the script are `x`, `y`, and `radii`, each
 containing `n_samples` points, we'll modify the script to report their
 total memory required.
 Each point in `x`, `y`, or `radii` is stored as a NumPy `float64`, we can
-use NumPy's [`dtype`](
-https://numpy.org/doc/stable/reference/generated/numpy.dtype.html)
-function to calculate the size of a `float64`.
+use NumPy's [`dtype`][np-dtype] function to calculate the size of a `float64`.

 Replace the `print(my_pi)` line with the following:

@@ -157,12 +161,12 @@ print("Pi: {}, memory: {} GiB".format(my_pi, memory_required))
 ```
 {: .language-python}

-The first line calculates the bytes of memory required for a single `float64`
-value using the `dtype`function.
+The first line calculates the bytes of memory required for a single
+64-bit floating point number using the `dtype` function.
 The second line estimates the total amount of memory required to store three
 variables containing `n_samples` `float64` values, converting the value into
 units of [gibibytes](https://en.wikipedia.org/wiki/Byte#Multiple-byte_units).
-The third line prints both the estimate of π and the estimated amount of
+The third line prints both the estimate of π and the estimated amount of
 memory used by the script.

 The updated Python script is:
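The memory estimate described in this hunk amounts to the following (a sketch; the `n_samples` value is illustrative, since the lesson's script reads it from the command line):

```python
import numpy as np

n_samples = 100_000_000  # illustrative; the lesson passes this via argv

# Bytes of memory required for a single 64-bit float
size_of_float = np.dtype(np.float64).itemsize  # 8 bytes

# Three arrays (x, y, radii) of n_samples float64 values, in GiB
memory_required = 3 * n_samples * size_of_float / (1024**3)
print("memory: {:.2f} GiB".format(memory_required))  # about 2.24 GiB
```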
@@ -214,15 +218,15 @@ on the total amount of memory required.

 ### Estimating Calculation Time

-Most of the calculations required to estimate π are in the
+Most of the calculations required to estimate π are in the
 `inside_circle` function:

 1. Generating `n_samples` random values for `x` and `y`.
 1. Calculating `n_samples` values of `radii` from `x` and `y`.
 1. Counting how many values in `radii` are under 1.0.

 There's also one multiplication operation and one division operation required
-to convert the `counts` value to the final estimate of π in the main
+to convert the `counts` value to the final estimate of π in the main
 function.

 A simple way to measure the calculation time is to use Python's `datetime`
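That `datetime`-based timing can be sketched as follows (reusing a minimal `inside_circle`; not the lesson's exact script):

```python
import datetime
import numpy as np

def inside_circle(total_count):
    # Monte Carlo sampling as described above.
    x = np.random.uniform(size=total_count)
    y = np.random.uniform(size=total_count)
    radii = np.sqrt(x**2 + y**2)
    return np.sum(radii < 1.0)

n_samples = 1_000_000
start_time = datetime.datetime.now()
counts = inside_circle(n_samples)
end_time = datetime.datetime.now()
# Wall-clock seconds spent inside the sampling function
elapsed_time = (end_time - start_time).total_seconds()

my_pi = 4.0 * counts / n_samples
print("Pi: {}, time: {} s".format(my_pi, elapsed_time))
```

Note this measures elapsed wall-clock time, so other processes running at the same time will inflate the number.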
@@ -312,14 +316,14 @@ running on the computer at the same time.
 But if the script is the most computationally-intensive process running at the
 time, its calculations are the largest influence on the elapsed time.

-Now that we've developed our initial script to estimate π, we can see
+Now that we've developed our initial script to estimate π, we can see
 that as we increase the number of samples:

-1. The estimate of π tends to become more accurate.
+1. The estimate of π tends to become more accurate.
 1. The amount of memory required scales approximately linearly.
 1. The amount of time to calculate scales approximately linearly.

-In general, achieving a better estimate of π requires a greater number of
+In general, achieving a better estimate of π requires a greater number of
 points.
 Take a closer look at `inside_circle`: should we expect to get high accuracy
 on a single machine?
@@ -358,15 +362,15 @@ rather than the command line.
 As before, use the status commands to check when your job runs.
 Use `ls` to locate the output file, and examine it. Is it what you expected?

-* How good is the value for π?
+* How good is the value for π?
 * How much memory did it need?
 * How long did the job take to run?

 Modify the job script to increase both the number of samples and the amount
 of memory requested (perhaps by a factor of 2, then by a factor of 10),
 and resubmit the job each time.

-* How good is the value for π?
+* How good is the value for π?
 * How much memory did it need?
 * How long did the job take to run?
@@ -416,7 +420,7 @@ included.
 > by examining the environment variables set when the job is launched.
 {: .callout}

-> ## What Changes Are Needed for an MPI Version of the π Calculator?
+> ## What Changes Are Needed for an MPI Version of the π Calculator?
 >
 > First, we need to import the `MPI` object from the Python module `mpi4py` by
 > adding an `from mpi4py import MPI` line immediately below the `import
@@ -578,7 +582,7 @@ As before, use the status commands to check when your job runs.
 Use `ls` to locate the output file, and examine it.
 Is it what you expected?

-* How good is the value for π?
+* How good is the value for π?
 * How much memory did it need?
 * How much faster was this run than the serial run with 100000000 points?
@@ -587,13 +591,13 @@ of memory requested (perhaps by a factor of 2, then by a factor of 10),
 and resubmit the job each time.
 You can also increase the number of CPUs.

-* How good is the value for π?
+* How good is the value for π?
 * How much memory did it need?
 * How long did the job take to run?

 ## How Much Does MPI Improve Performance?

-In theory, by dividing up the π calculations among *n* MPI processes,
+In theory, by dividing up the π calculations among *n* MPI processes,
 we should see run times reduce by a factor of *n*.
 In practice, some time is required to start the additional MPI processes,
 for the MPI processes to communicate and coordinate, and some types of
@@ -604,7 +608,7 @@ in the computer, or across multiple compute nodes, additional time is
 required for communication compared to all processes operating on a
 single CPU.

-[Amdahl's Law](https://en.wikipedia.org/wiki/Amdahl's_law) is one way of
+[Amdahl's Law][wiki-amdahl] is one way of
 predicting improvements in execution time for a **fixed** parallel workload.
 If a workload needs 20 hours to complete on a single core,
 and one hour of that time is spent on tasks that cannot be parallelized,
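Amdahl's Law can be checked numerically with a small helper (a hypothetical function, using the 20-hour example's serial fraction of 1/20 = 0.05):

```python
def amdahl_speedup(serial_fraction, n_processes):
    """Predicted speedup of a fixed workload where serial_fraction
    of the single-core run time cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processes)

# 1 hour of a 20-hour workload is serial, so serial_fraction = 0.05.
for n in (1, 2, 16, 1024):
    print(n, round(amdahl_speedup(0.05, n), 2))
# However many processes are used, the speedup can never exceed
# 1 / 0.05 = 20x.
```

This is why doubling the CPU count rarely halves the run time: the serial fraction caps the achievable gain.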
@@ -650,8 +654,14 @@ In practice, MPI speedup factors are influenced by:

 In an HPC environment, we try to reduce the execution time for all types of
 jobs, and MPI is an extremely common way to combine dozens, hundreds, or
-thousands of CPUs into solving a single problem. To learn more about
-parallelization, see the
-[parallel novice lesson](http://www.hpc-carpentry.org/hpc-parallel-novice/)
+thousands of CPUs into solving a single problem. To learn more about
+parallelization, see the [parallel novice lesson][parallel-novice] lesson.

 {% include links.md %}
+
+[cmd-line]: https://swcarpentry.github.io/python-novice-inflammation/12-cmdline/index.html
+[inflammation]: https://swcarpentry.github.io/python-novice-inflammation/
+[np-dtype]: https://numpy.org/doc/stable/reference/generated/numpy.dtype.html
+[parallel-novice]: http://www.hpc-carpentry.org/hpc-parallel-novice/
+[python-func]: https://swcarpentry.github.io/python-novice-inflammation/08-func/index.html
+[wiki-amdahl]: https://en.wikipedia.org/wiki/Amdahl's_law

_includes/snippets_library/ComputeCanada_Graham_slurm/_config_options.yml (+2 -2)

@@ -57,12 +57,12 @@ sched:
   hist: "sacct -u yourUsername"

 episode_order:
-  - 11-hpc-intro
+  - 10-hpc-intro
+  - 11-connecting
   - 12-cluster
   - 13-scheduler
   - 14-modules
   - 15-transferring-files
   - 16-parallel
   - 17-resources
   - 18-responsibility
-

_includes/snippets_library/EPCC_Cirrus_pbs/_config_options.yml (+2 -2)

@@ -58,12 +58,12 @@ sched:
   hist: "qstat -x"

 episode_order:
-  - 11-hpc-intro
+  - 10-hpc-intro
+  - 11-connecting
   - 12-cluster
   - 13-scheduler
   - 14-modules
   - 15-transferring-files
   - 16-parallel
   - 17-resources
   - 18-responsibility
-

_includes/snippets_library/Magic_Castle_EESSI_slurm/_config_options.yml (+2 -1)

@@ -72,7 +72,8 @@ sched:
   hist: "sacct -u yourUsername"

 episode_order:
-  - 11-hpc-intro
+  - 10-hpc-intro
+  - 11-connecting
   - 12-cluster
   - 13-scheduler
   - 14-modules

_includes/snippets_library/NIST_CTCMS_slurm/_config_options.yml (+2 -3)

@@ -57,12 +57,11 @@ sched:
   hist: "sacct -u yourUsername"

 episode_order:
-  - 11-hpc-intro
+  - 10-hpc-intro
+  - 11-connecting
   - 12-cluster
   - 13-scheduler
-  - 14-modules
   - 15-transferring-files
   - 16-parallel
   - 17-resources
   - 18-responsibility
-

_includes/snippets_library/Norway_SIGMA2_SAGA_slurm/_config_options.yml (+2 -2)

@@ -57,12 +57,12 @@ sched:
   hist: "sacct -u $USER"

 episode_order:
-  - 11-hpc-intro
+  - 10-hpc-intro
+  - 11-connecting
   - 12-cluster
   - 13-scheduler
   - 14-modules
   - 15-transferring-files
   - 16-parallel
   - 17-resources
   - 18-responsibility
-
68-

_includes/snippets_library/UCL_Myriad_sge/_config_options.yml (+2 -2)

@@ -36,12 +36,12 @@ sched:
   bash_shebang: "#!/bin/bash -l"

 episode_order:
-  - 11-hpc-intro
+  - 10-hpc-intro
+  - 11-connecting
   - 12-cluster
   - 13-scheduler
   - 14-modules
   - 15-transferring-files
   - 16-parallel
   - 17-resources
   - 18-responsibility
-
