 ---
 title: "Running a parallel job"
 teaching: 30
-exercises: 30
+exercises: 60
 questions:
 - "How do we execute a task in parallel?"
+- "What benefits arise from parallel execution?"
+- "What are the limits of gains from execution in parallel?"
 objectives:
-- "Understand how to run a parallel job on a cluster."
+- "Construct a program that can execute in parallel."
+- "Prepare a job submission script for the parallel executable."
+- "Launch jobs with increasing degrees of parallel execution."
+- "Record and summarize the timing and accuracy of the results."
 keypoints:
-- "Parallelism is an important feature of HPC clusters."
-- "MPI parallelism is a common case."
+- "Parallel programming allows applications to take advantage of
+  parallel hardware; serial code will not 'just work.'"
+- "Distributed memory parallelism is a common case, using the Message
+  Passing Interface (MPI)."
 - "The queuing system facilitates executing parallel tasks."
+- "Performance improvements from parallel execution do not scale linearly."
 ---
 
 We now have the tools we need to run a multi-processor job. This is a very
-important aspect of HPC systems, as parallelism is one of the primary tools we
-have to improve the performance of computational tasks.
+important aspect of HPC systems, as parallelism is one of the primary tools
+we have to improve the performance of computational tasks.
 
 Our example implements a stochastic algorithm for estimating the value of
-&#960;, the ratio of the circumference to the diameter of a circle.
-The program generates a large number of random points on a 1&times;1 square
-centered on (&frac12;, &frac12;), and checks how many of these points fall
+π, the ratio of the circumference to the diameter of a circle.
+The program generates a large number of random points on a 1×1 square
+centered on (½, ½), and checks how many of these points fall
 inside the unit circle.
-On average, &#960;/4 of the randomly-selected points should fall in the
-circle, so &#960; can be estimated from 4*f*, where *f* is the observed
+On average, π/4 of the randomly-selected points should fall in the
+circle, so π can be estimated from 4*f*, where *f* is the observed
 fraction of points that fall in the circle.
 Because each sample is independent, this algorithm is easily implemented
 in parallel.
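The sampling procedure just described can be sketched in a few lines of NumPy. This is only an illustration of the idea, not the lesson's `pi.py`: it uses the equivalent quarter-circle form, sampling points on [0, 1) and measuring their distance from the origin, and the seed is an assumption added for reproducibility.

```python
import numpy as np

def estimate_pi(total_count, seed=0):
    """Estimate pi by sampling `total_count` random points in the unit square
    and counting the fraction that fall inside the unit circle."""
    rng = np.random.default_rng(seed)   # seeded so the sketch is reproducible
    x = rng.uniform(size=total_count)   # x coordinates on [0, 1)
    y = rng.uniform(size=total_count)   # y coordinates on [0, 1)
    radii = np.sqrt(x * x + y * y)      # distance of each point from the origin
    count = np.count_nonzero(radii < 1.0)
    return 4.0 * count / total_count    # pi is about 4 * (fraction inside)

print(estimate_pi(100_000))
```

Because each point is independent of every other, the loop-free vectorized form above is also exactly the kind of work that can later be split across processes.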
@@ -34,14 +42,13 @@ in parallel.
 ## A Serial Solution to the Problem
 
 We start from a Python script using concepts taught in Software Carpentry's
-[Programming with Python](
-https://swcarpentry.github.io/python-novice-inflammation/) workshops.
+[Programming with Python][inflammation] workshops.
 We want to allow the user to specify how many random points should be used
-to calculate &#960; through a command-line parameter.
+to calculate π through a command-line parameter.
 This script will only use a single CPU for its entire run, so it's classified
 as a serial process.
 
-Let's write a Python program, `pi.py`, to estimate &#960; for us.
+Let's write a Python program, `pi.py`, to estimate π for us.
 Start by importing the `numpy` module for calculating the results,
 and the `sys` module to process command-line parameters:
 
@@ -52,9 +59,8 @@ import sys
 {: .language-python}
 
 We define a Python function `inside_circle` that accepts a single parameter
-for the number of random points used to calculate &#960;.
-See [Programming with Python: Creating Functions](
-https://swcarpentry.github.io/python-novice-inflammation/08-func/index.html)
+for the number of random points used to calculate π.
+See [Programming with Python: Creating Functions][python-func]
 for a review of Python functions.
 It randomly samples points with both *x* and *y* on the half-open interval
 [0, 1).
@@ -74,9 +80,8 @@ def inside_circle(total_count):
 {: .language-python}
 
 Next, we create a main function to call the `inside_circle` function and
-calculate &#960; from its returned result.
-See [Programming with Python: Command-Line Programs](
-https://swcarpentry.github.io/python-novice-inflammation/12-cmdline/index.html)
+calculate π from its returned result.
+See [Programming with Python: Command-Line Programs][cmd-line]
 for a review of `main` functions and parsing command-line parameters.
 
 ```
@@ -93,7 +98,7 @@ if __name__ == '__main__':
 
 If we run the Python script locally with a command-line parameter, as in
 `python pi-serial.py 1024`, we should see the script print its estimate of
-&#960;:
+π:
 
 ```
 {{ site.local.prompt }} python pi-serial.py 1024
@@ -107,27 +112,28 @@ If we run the Python script locally with a command-line parameter, as in
 > built-in capabilities of NumPy. In general, random-number generation is
 > difficult to do well; it's easy to accidentally introduce correlations into
 > the generated sequence.
+>
 > * Discuss why generating high quality random numbers might be difficult.
-> * Is the quality of random numbers generated sufficient for estimating &#960;
+> * Is the quality of random numbers generated sufficient for estimating π
 >   in this implementation?
->
+>
 > > ## Solution
 > >
 > > * Computers are deterministic and produce pseudo random numbers using
-> >   an algorithm. The choice of algorithm and its parameters determines
-> >   how random the generated numbers are. Pseudo random number generation
-> >   algorithms usually produce a sequence numbers taking the previous output
+> >   an algorithm. The choice of algorithm and its parameters determines
+> >   how random the generated numbers are. Pseudo random number generation
+> >   algorithms usually produce a sequence of numbers taking the previous output
 > >   as an input for generating the next number. At some point the sequence of
-> >   pseudo random numbers will repeat, so care is required to make sure the
-> >   repetition period is long and that the generated numbers have statistical
+> >   pseudo random numbers will repeat, so care is required to make sure the
+> >   repetition period is long and that the generated numbers have statistical
 > >   properties similar to those of true random numbers.
 > > * Yes.
 > {: .solution}
 {: .discussion}
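The determinism noted in the solution above is easy to demonstrate: seeding a pseudo random number generator with the same value reproduces the identical "random" sequence. A small illustration using Python's standard library `random` module (the lesson's script itself uses NumPy's generators):

```python
import random

# Two generators seeded with the same value produce identical sequences,
# because the output is computed by a deterministic algorithm.
gen_a = random.Random(12345)
gen_b = random.Random(12345)

seq_a = [gen_a.random() for _ in range(5)]
seq_b = [gen_b.random() for _ in range(5)]

print(seq_a == seq_b)  # → True
```

This reproducibility is useful for debugging, but it is also why care is needed in parallel programs: if every process seeds its generator identically, they all sample the same points.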
 
 ## Measuring Performance of the Serial Solution
 
-The stochastic method used to estimate &#960; should converge on the true
+The stochastic method used to estimate π should converge on the true
 value as the number of random points increases.
 But as the number of points increases, creating the variables `x`, `y`, and
 `radii` requires more time and more memory.
@@ -144,9 +150,7 @@ Since the largest variables in the script are `x`, `y`, and `radii`, each
 containing `n_samples` points, we'll modify the script to report their
 total memory required.
 Each point in `x`, `y`, or `radii` is stored as a NumPy `float64`; we can
-use NumPy's [`dtype`](
-https://numpy.org/doc/stable/reference/generated/numpy.dtype.html)
-function to calculate the size of a `float64`.
+use NumPy's [`dtype`][np-dtype] function to calculate the size of a `float64`.
 
 Replace the `print(my_pi)` line with the following:
 
@@ -157,12 +161,12 @@ print("Pi: {}, memory: {} GiB".format(my_pi, memory_required))
 ```
 {: .language-python}
 
-The first line calculates the bytes of memory required for a single `float64`
-value using the `dtype` function.
+The first line calculates the bytes of memory required for a single
+64-bit floating point number using the `dtype` function.
 The second line estimates the total amount of memory required to store three
 variables containing `n_samples` `float64` values, converting the value into
 units of [gibibytes](https://en.wikipedia.org/wiki/Byte#Multiple-byte_units).
-The third line prints both the estimate of &#960; and the estimated amount of
+The third line prints both the estimate of π and the estimated amount of
 memory used by the script.
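The memory estimate described above can be reproduced on its own. A minimal sketch, with an assumed example value for `n_samples` (normally it would come from the command line):

```python
import numpy as np

n_samples = 100_000_000  # assumed example value for this sketch

# bytes needed for one float64 value (8 bytes)
size_of_float = np.dtype(np.float64).itemsize

# three arrays (x, y, radii) of n_samples float64 values, in GiB
memory_required = 3 * n_samples * size_of_float / (1024 ** 3)

print("memory: {} GiB".format(memory_required))
```

With 100,000,000 samples, the three arrays alone need roughly 2.2 GiB, which shows why the sample count quickly becomes memory-limited on a single node.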
 
 The updated Python script is:
@@ -214,15 +218,15 @@ on the total amount of memory required.
 
 ### Estimating Calculation Time
 
-Most of the calculations required to estimate &#960; are in the
+Most of the calculations required to estimate π are in the
 `inside_circle` function:
 
 1. Generating `n_samples` random values for `x` and `y`.
 1. Calculating `n_samples` values of `radii` from `x` and `y`.
 1. Counting how many values in `radii` are under 1.0.
 
 There's also one multiplication operation and one division operation required
-to convert the `counts` value to the final estimate of &#960; in the main
+to convert the `counts` value to the final estimate of π in the main
 function.
 
 A simple way to measure the calculation time is to use Python's `datetime`
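A sketch of that `datetime`-based timing approach is below. The function being timed is a stand-in for the lesson's `inside_circle`, not its exact code:

```python
import datetime
import numpy as np

def inside_circle(total_count):
    # stand-in for the lesson's function: count points inside the unit circle
    rng = np.random.default_rng()
    x = rng.uniform(size=total_count)
    y = rng.uniform(size=total_count)
    return np.count_nonzero(np.sqrt(x * x + y * y) < 1.0)

n_samples = 1_000_000
start_time = datetime.datetime.now()          # timestamp before the work
counts = inside_circle(n_samples)
end_time = datetime.datetime.now()            # timestamp after the work
elapsed_time = (end_time - start_time).total_seconds()

print("Pi: {}, time: {} s".format(4.0 * counts / n_samples, elapsed_time))
```

Wall-clock timing like this includes everything else the machine was doing, which is why the surrounding text cautions that other processes influence the elapsed time.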
@@ -312,14 +316,14 @@ running on the computer at the same time.
 But if the script is the most computationally-intensive process running at the
 time, its calculations are the largest influence on the elapsed time.
 
-Now that we've developed our initial script to estimate &#960;, we can see
+Now that we've developed our initial script to estimate π, we can see
 that as we increase the number of samples:
 
-1. The estimate of &#960; tends to become more accurate.
+1. The estimate of π tends to become more accurate.
 1. The amount of memory required scales approximately linearly.
 1. The amount of time to calculate scales approximately linearly.
 
-In general, achieving a better estimate of &#960; requires a greater number of
+In general, achieving a better estimate of π requires a greater number of
 points.
 Take a closer look at `inside_circle`: should we expect to get high accuracy
 on a single machine?
@@ -358,15 +362,15 @@ rather than the command line.
 As before, use the status commands to check when your job runs.
 Use `ls` to locate the output file, and examine it. Is it what you expected?
 
-* How good is the value for &#960;?
+* How good is the value for π?
 * How much memory did it need?
 * How long did the job take to run?
 
 Modify the job script to increase both the number of samples and the amount
 of memory requested (perhaps by a factor of 2, then by a factor of 10),
 and resubmit the job each time.
 
-* How good is the value for &#960;?
+* How good is the value for π?
 * How much memory did it need?
 * How long did the job take to run?
 
@@ -416,7 +420,7 @@ included.
 > by examining the environment variables set when the job is launched.
 {: .callout}
 
-> ## What Changes Are Needed for an MPI Version of the &#960; Calculator?
+> ## What Changes Are Needed for an MPI Version of the π Calculator?
 >
 > First, we need to import the `MPI` object from the Python module `mpi4py` by
 > adding a `from mpi4py import MPI` line immediately below the `import
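Beyond the import, each MPI rank must work on only its own share of the samples. The sketch below shows one way that partitioning arithmetic could look; it is written in plain Python so it runs without an MPI installation, and `partition_count`, `size`, and `rank` are illustrative names (in a real mpi4py script, `size` and `rank` would come from `MPI.COMM_WORLD.Get_size()` and `MPI.COMM_WORLD.Get_rank()`):

```python
def partition_count(n_samples, size, rank):
    """Return how many samples a given rank should generate, splitting
    n_samples as evenly as possible across `size` MPI processes."""
    base, remainder = divmod(n_samples, size)
    # the first `remainder` ranks each take one extra sample
    return base + (1 if rank < remainder else 0)

# every sample should be assigned to exactly one rank
total = sum(partition_count(100, 8, r) for r in range(8))
print(total)  # → 100
```

Splitting the count (rather than the points themselves) works here because every sample is independent; each rank can generate and test its own points, and only the per-rank counts need to be gathered at the end.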
@@ -578,7 +582,7 @@ As before, use the status commands to check when your job runs.
 Use `ls` to locate the output file, and examine it.
 Is it what you expected?
 
-* How good is the value for &#960;?
+* How good is the value for π?
 * How much memory did it need?
 * How much faster was this run than the serial run with 100000000 points?
 
@@ -587,13 +591,13 @@ of memory requested (perhaps by a factor of 2, then by a factor of 10),
 and resubmit the job each time.
 You can also increase the number of CPUs.
 
-* How good is the value for &#960;?
+* How good is the value for π?
 * How much memory did it need?
 * How long did the job take to run?
 
 ## How Much Does MPI Improve Performance?
 
-In theory, by dividing up the &#960; calculations among *n* MPI processes,
+In theory, by dividing up the π calculations among *n* MPI processes,
 we should see run times reduce by a factor of *n*.
 In practice, some time is required to start the additional MPI processes,
 for the MPI processes to communicate and coordinate, and some types of
@@ -604,7 +608,7 @@ in the computer, or across multiple compute nodes, additional time is
 required for communication compared to all processes operating on a
 single CPU.
 
-[Amdahl's Law](https://en.wikipedia.org/wiki/Amdahl's_law) is one way of
+[Amdahl's Law][wiki-amdahl] is one way of
 predicting improvements in execution time for a **fixed** parallel workload.
 If a workload needs 20 hours to complete on a single core,
 and one hour of that time is spent on tasks that cannot be parallelized,
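Amdahl's Law for this 20-hour workload (one hour serial, nineteen hours parallelizable) can be checked numerically. A small sketch, using the standard form speedup = 1 / ((1 - p) + p/n) for parallelizable fraction p spread over n processes:

```python
def amdahl_speedup(parallel_fraction, n_processes):
    """Predicted speedup for a fixed workload where `parallel_fraction`
    of the single-core run time can be spread over `n_processes`."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processes)

p = 19.0 / 20.0  # 19 of the 20 hours can be parallelized
for n in (1, 2, 4, 16, 1024):
    hours = 20.0 / amdahl_speedup(p, n)
    print("{:5d} processes: {:6.2f} hours".format(n, hours))

# However many processes are used, the serial hour remains:
# the run time approaches 1 hour but never drops below it.
```

This is why the speedup cannot exceed 20x for this workload, no matter how many CPUs are added.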
@@ -650,8 +654,14 @@ In practice, MPI speedup factors are influenced by:
 
 In an HPC environment, we try to reduce the execution time for all types of
 jobs, and MPI is an extremely common way to combine dozens, hundreds, or
-thousands of CPUs into solving a single problem. To learn more about
-parallelization, see the
-[parallel novice lesson](http://www.hpc-carpentry.org/hpc-parallel-novice/)
+thousands of CPUs into solving a single problem. To learn more about
+parallelization, see the [parallel novice lesson][parallel-novice].
 
 {% include links.md %}
+
+[cmd-line]: https://swcarpentry.github.io/python-novice-inflammation/12-cmdline/index.html
+[inflammation]: https://swcarpentry.github.io/python-novice-inflammation/
+[np-dtype]: https://numpy.org/doc/stable/reference/generated/numpy.dtype.html
+[parallel-novice]: http://www.hpc-carpentry.org/hpc-parallel-novice/
+[python-func]: https://swcarpentry.github.io/python-novice-inflammation/08-func/index.html
+[wiki-amdahl]: https://en.wikipedia.org/wiki/Amdahl's_law