Commit 29a824e
Merge pull request #86 from qianglise/main
update lesson episode parallel-computing
2 parents abbb37b + 2444fca commit 29a824e

File tree

4 files changed (+240 -56 lines)

content/example/race.py

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
from multiprocessing import Value

# define a function to increment the shared value by 1
def inc(i):
    val.value += 1

# use a large number of increments to make the problem visible
n = 100000

# create a shared value and initialize it to 0
val = Value('i', 0)
with ThreadPoolExecutor(max_workers=4) as pool:
    pool.map(inc, range(n))

print(val.value)

# create a shared value and initialize it to 0
val = Value('i', 0)
with ProcessPoolExecutor(max_workers=4) as pool:
    pool.map(inc, range(n))

print(val.value)
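
# both totals will typically come out well below n (on Linux, where the
# default "fork" start method lets workers inherit the shared value):
# "val.value += 1" is a non-atomic read-modify-write, so concurrent
# increments can overwrite each other and updates get lost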

content/exercise/race_dup.py

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Array


# each worker increments its own slot of the shared array, chosen from the
# process identifier (this assumes the four worker pids are distinct modulo 4)
def inc(i):
    ind = mp.current_process().ident % 4
    arr[ind] += 1

# define a large number
n = 100000

# create a shared array and initialize it to 0
arr = Array('i', [0]*4)
with ProcessPoolExecutor(max_workers=4) as pool:
    pool.map(inc, range(n))

print(arr[:], sum(arr))
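
# each worker writes only to its own slot, so no updates are lost and
# sum(arr) should equal n (provided the worker pids really are distinct modulo 4)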

content/exercise/race_lock.py

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
from multiprocessing import Value, Lock

lock = Lock()

# protect the increment with a lock so only one worker modifies the value at a time
def inc(i):
    lock.acquire()
    val.value += 1
    lock.release()

# define a large number
n = 100000

# create a shared value and initialize it to 0
val = Value('i', 0)
with ThreadPoolExecutor(max_workers=4) as pool:
    pool.map(inc, range(n))

print(val.value)

# create a shared value and initialize it to 0
val = Value('i', 0)
with ProcessPoolExecutor(max_workers=4) as pool:
    pool.map(inc, range(n))

print(val.value)
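
# an equivalent, more idiomatic form uses the lock as a context manager,
# which also releases the lock if the increment raises an exception:
#
# def inc(i):
#     with lock:
#         val.value += 1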

content/parallel-computing.rst

Lines changed: 171 additions & 56 deletions
@@ -24,33 +24,50 @@ and most of the speed-up in modern CPUs is coming from using multiple
 CPU cores, i.e. parallel processing. Parallel processing is normally based
 either on multiple threads or multiple processes.

-There are three main models of parallel computing:
-
-- **"Embarrassingly" parallel:** the code does not need to synchronize/communicate
-  with other instances, and you can run
-  multiple instances of the code separately, and combine the results
-  later. If you can do this, great!
+There are two main models of parallel computing:

 - **Shared memory parallelism (multithreading):**

   - Parallel threads do separate work and communicate via the same memory and write to shared variables.
-  - Multiple threads in a single Python program cannot execute at the same time (see GIL below)
+  - Multiple threads in a single Python program cannot execute at the same time (see **global interpreter lock** below)
   - Running multiple threads in Python is *only effective for certain I/O-bound tasks*
   - External libraries in other languages (e.g. C) which are called from Python can still use multithreading

 - **Distributed memory parallelism (multiprocessing):** Different processes manage their own memory segments and
-  share data by communicating (passing messages) as needed.
+  share data by communicating (e.g. passing messages using the Message Passing Interface) as needed.

   - A process can contain one or more threads
   - Two processes can run on different CPU cores and different computers
   - Processes have more overhead than threads (creating and destroying processes takes more time)
-  - Running multiple processes is *only effective for CPU-bound tasks*
+  - Running multiple processes is *only effective for compute-bound tasks*
+
+.. note::
+
+   **"Embarrassingly" parallel**: If you can run multiple instances of a program without needing to
+   synchronize/communicate with other instances, i.e. the problem at hand can easily be decomposed
+   into independent tasks or datasets and there is no need to control access to shared resources,
+   it is known as an embarrassingly parallel program. A few examples:
+
+   - Monte Carlo analysis
+   - Ensemble calculations for numerical weather prediction
+   - Discrete Fourier transform
+   - Convolutional neural networks
+   - Applying the same model to multiple datasets
+
+   **GPU computing**: This framework takes advantage of the massively parallel compute units
+   available in modern GPUs. It is ideal when you need a large number of simple arithmetic operations.
+
+   **Distributed computing (Spark, Dask)**: Master-worker parallelism. The master builds a graph of
+   task dependencies and schedules tasks to execute in the appropriate order. In the next episode we
+   will look at `Dask <https://dask.org/>`__, an array model extension and task scheduler, which
+   combines multiprocessing with (embarrassingly) parallel workflows and "lazy" execution.

-In the next episode we will look at `Dask <https://dask.org/>`__, an array model extension and task scheduler,
-which combines multiprocessing with (embarrassingly) parallel workflows and "lazy" execution.

 In the Python world, it is common to see the word `concurrency` denoting any type of simultaneous
-processing, including *threads*, *tasks* and *processes*.
+processing, including *threads*, *tasks* and *processes*:
+
+- Concurrent tasks can be executed in any order but with the same final result
+- Concurrent tasks can be, but need not be, executed in parallel
+- The ``concurrent.futures`` module provides implementations of thread- and process-based executors for managing the resource pools on which concurrent tasks run
+- Concurrency is difficult to get right: race conditions and deadlocks may arise in concurrent programs

 .. warning::
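As a minimal sketch of the executor interface mentioned above (the function and
values here are only illustrative):

.. code-block:: python

   from concurrent.futures import ThreadPoolExecutor

   def work(x):
       return x * x

   # the tasks may run in any order, but map() returns results in input order
   with ThreadPoolExecutor(max_workers=4) as pool:
       print(list(pool.map(work, range(8))))
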
@@ -92,7 +109,7 @@ However, multithreading is still relevant in two situations:
 Multithreaded libraries
 ^^^^^^^^^^^^^^^^^^^^^^^

-NumPy and SciPy are built on external libraries such as LAPACK, FFTW append BLAS,
+NumPy and SciPy are built on external libraries such as LAPACK, FFTW, BLAS,
 which provide optimized routines for linear algebra, Fourier transforms etc.
 These libraries are written in C, C++ or Fortran and are thus not limited
 by the GIL, so they typically support actual multithreading during the execution.
@@ -101,7 +118,7 @@ like matrix operations or frequency analysis.

 Depending on configuration, NumPy will often use multiple threads by default,
 but we can use the environment variable ``OMP_NUM_THREADS`` to set the number
-of threads manually:
+of threads manually in a Unix-like environment:

 .. code-block:: console
110127
After setting this environment variable we continue as usual
111128
and multithreading will be turned on.
112129

113-
.. demo:: Demo: Multithreading NumPy
114-
115-
Here is an example which does a symmetrical matrix inversion of size 4000 by 4000.
116-
To run it, we can save it in a file named `omp_test.py` or download from :download:`here <example/omp_test.py>`.
117-
118-
.. literalinclude:: example/omp_test.py
119-
:language: python
120-
121-
Let us test it with 1 and 4 threads:
122-
123-
.. code-block:: console
124-
125-
$ export OMP_NUM_THREADS=1
126-
$ python omp_test.py
127-
128-
$ export OMP_NUM_THREADS=4
129-
$ python omp_test.py
130130

131131
Multithreaded I/O
132132
^^^^^^^^^^^^^^^^^
@@ -141,7 +141,7 @@ This is how an I/O-bound application might look:

 The `threading library <https://docs.python.org/dev/library/threading.html#>`__
 provides an API for creating and working with threads. The simplest approach to
-create and manage threads is to use the ``ThreadPoolExecutor`` class.
+create and manage threads is to use the ``ThreadPoolExecutor`` class from the ``concurrent.futures`` module.
 An example use case could be to download data from multiple websites using
 multiple threads:
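
A minimal sketch of this pattern (the URLs below are placeholders, not from the lesson):

.. code-block:: python

   from concurrent.futures import ThreadPoolExecutor
   from urllib.request import urlopen

   urls = ["https://example.org/data1", "https://example.org/data2"]  # placeholders

   def fetch(url):
       # each download mostly waits on the network, so threads overlap nicely
       with urlopen(url) as conn:
           return conn.read()

   with ThreadPoolExecutor(max_workers=4) as pool:
       pages = list(pool.map(fetch, urls))
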
@@ -164,39 +164,89 @@ The speedup gained from multithreading I/O bound problems can be understood from
 Further details on threading in Python can be found in the **See also** section below.


 Multiprocessing
 ---------------

 The ``multiprocessing`` module in Python supports spawning processes using an API
 similar to the ``threading`` module. It effectively side-steps the GIL by using
 *subprocesses* instead of threads, where each subprocess is an independent Python
-process.
-
-One of the simplest ways to use ``multiprocessing`` is via ``Pool`` objects and
+process. One of the simplest ways to use ``multiprocessing`` is via ``Pool`` objects and
 the parallel :meth:`Pool.map` function, similarly to what we saw for multithreading above.
-In the following code, we define a :meth:`square`
-function, call the :meth:`cpu_count` method to get the number of CPUs on the machine,
-and then initialize a Pool object in a context manager and inside of it call the
-:meth:`Pool.map` method to parallelize the computation.
-We can save the code in a file named `mp_map.py` or download from :download:`here <example/mp_map.py>`.

-.. literalinclude:: example/mp_map.py
-   :language: python
-   :emphasize-lines: 1, 11-12
+.. note::
+
+   ``concurrent.futures.ProcessPoolExecutor`` is actually a wrapper around
+   ``multiprocessing.Pool`` that unifies the threading and process interfaces.
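
For instance, the same parallel map can be written with either interface
(the :meth:`square` function here is only illustrative):

.. code-block:: python

   from multiprocessing import Pool
   from concurrent.futures import ProcessPoolExecutor

   def square(x):
       return x * x

   if __name__ == '__main__':
       with Pool(processes=4) as pool:
           print(pool.map(square, range(10)))

       with ProcessPoolExecutor(max_workers=4) as pool:
           print(list(pool.map(square, range(10))))
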
+Multiple arguments
+^^^^^^^^^^^^^^^^^^

 For functions that take multiple arguments one can instead use the :meth:`Pool.starmap`
-function (save as `mp_starmap.py` or download :download:`here <example/mp_starmap.py>`)
+function; there are other options as well, see below:
+
+.. tabs::
+
+   .. tab:: ``pool.starmap``
+
+      .. code-block:: python
+         :emphasize-lines: 6,8
+
+         import multiprocessing as mp
+
+         def power_n(x, n):
+             return x ** n
+
+         if __name__ == '__main__':
+             with mp.Pool(processes=4) as pool:
+                 res = pool.starmap(power_n, [(x, 2) for x in range(20)])
+             print(res)
+
+   .. tab:: function adapter
+
+      .. code-block:: python
+         :emphasize-lines: 7,8,14
+
+         import numpy as np
+         from concurrent.futures import ProcessPoolExecutor
+
+         def power_n(x, n):
+             return x ** n
+
+         def f_(args):
+             return power_n(*args)
+
+         xs = np.arange(10)
+         chunks = np.array_split(xs, xs.shape[0]//2)
+
+         with ProcessPoolExecutor(max_workers=4) as pool:
+             res = pool.map(f_, chunks)
+         print(list(res))
+
+   .. tab:: multiple argument iterables
+
+      .. code-block:: python
+         :emphasize-lines: 7
+
+         from concurrent.futures import ProcessPoolExecutor
+
+         def power_n(x, n):
+             return x ** n
+
+         with ProcessPoolExecutor(max_workers=4) as pool:
+             res = pool.map(power_n, range(0,10,2), range(1,11,2))
+         print(list(res))

-.. literalinclude:: example/mp_starmap.py
-   :language: python
-   :emphasize-lines: 1, 10-11

 .. callout:: Interactive environments

    Functionality within multiprocessing requires that the ``__main__`` module be
-   importable by children processes. This means that for example ``multiprocessing.Pool``
-   will not work in the interactive interpreter. A fork of multiprocessing, called
-   ``multiprocess``, can be used in interactive environments like Jupyter.
+   importable by child processes. This means that some functions may not work
+   in an interactive interpreter such as a Jupyter notebook.

 ``multiprocessing`` has a number of other methods which can be useful for certain
 use cases, including ``Process`` and ``Queue`` which make it possible to have direct
@@ -222,8 +272,7 @@ The idea behind MPI is that:
 - Tasks communicate and share data by sending messages.
 - Many higher-level functions exist to distribute information to other tasks
   and gather information from other tasks.
-- All tasks typically *run the entire code* and we have to be careful to avoid
-  that all tasks do the same thing.

 ``mpi4py`` provides Python bindings for the Message Passing Interface (MPI) standard.
 This is what a hello world MPI program looks like in Python:
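
A canonical version, for reference (the lesson's own example may differ slightly):

.. code-block:: python

   from mpi4py import MPI

   comm = MPI.COMM_WORLD
   rank = comm.Get_rank()   # index of this task
   size = comm.Get_size()   # total number of tasks
   print(f"Hello from rank {rank} of {size}")
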
@@ -422,11 +471,76 @@ Upper-case methods are faster and are strongly recommended for large numeric dat
 Exercises
 ---------

+.. exercise:: Multithreading NumPy
+
+   Here is a piece of code which does a symmetrical matrix inversion of size 4000 by 4000.
+   To run it, we can save it in a file named `omp_test.py` or download it from :download:`here <example/omp_test.py>`.
+
+   .. literalinclude:: example/omp_test.py
+      :language: python
+
+   Let us test it with 1 and 4 threads:
+
+   .. code-block:: console
+
+      $ export OMP_NUM_THREADS=1
+      $ python omp_test.py
+
+      $ export OMP_NUM_THREADS=4
+      $ python omp_test.py
+
+
+.. exercise:: I/O-bound vs CPU-bound
+
+   In this exercise, we will simulate an I/O-bound process using the :meth:`sleep` function.
+   Typical I/O-bound processes are disk accesses, network requests etc.
+
+   .. literalinclude:: example/io_bound.py
+      :language: python
+
+   When the problem is compute-intensive:
+
+   .. literalinclude:: example/cpu_bound.py
+      :language: python
+
+
+.. exercise:: Race condition
+
+   A race condition is a common issue in multithreading/multiprocessing applications. It
+   occurs when two or more threads (or processes) access shared data and
+   try to modify it at the same time. Try running the example with different values of ``n``
+   to see the difference. Think about how we can solve this problem.
+
+   .. literalinclude:: example/race.py
+      :language: python
+
+   .. solution::
+
+      - locking resources: explicitly using locks
+      - duplicating resources: giving each thread/process its own copy of the data so that nothing needs to be shared
+
+      .. tabs::
+
+         .. tab:: locking
+
+            .. literalinclude:: exercise/race_lock.py
+               :language: python
+               :emphasize-lines: 2,4,8,10
+
+         .. tab:: duplicating
+
+            .. literalinclude:: exercise/race_dup.py
+               :language: python
+
+
 .. exercise:: Compute numerical integrals

    The primary objective of this exercise is to compute the integral :math:`\int_0^1 x^{3/2} \, dx` numerically.
    One approach to integration is to establish a grid along the x-axis. Specifically, the integration range
    is divided into 'n' segments or bins. Below is a basic serial code.

    .. literalinclude:: exercise/1d_Integration_serial.py
@@ -652,5 +766,6 @@ See also

 .. keypoints::

-   - 1 Beaware of GIL and its impact on performance
-   - 2 Use threads for I/O-bound tasks
+   - Be aware of the GIL and its impact on performance
+   - Use threads for I/O-bound tasks and multiprocessing for compute-bound tasks
+   - Make it right before trying to make it fast