and most of the speed-up in modern CPUs is coming from using multiple
CPU cores, i.e. parallel processing. Parallel processing is normally based
either on multiple threads or multiple processes.

There are two main models of parallel computing:

- **Shared memory parallelism (multithreading):**

  - Parallel threads do separate work and communicate via the same memory and write to shared variables.
  - Multiple threads in a single Python program cannot execute at the same time (see **global interpreter lock** below)
  - Running multiple threads in Python is *only effective for certain I/O-bound tasks*
  - External libraries in other languages (e.g. C) which are called from Python can still use multithreading

- **Distributed memory parallelism (multiprocessing):** Different processes manage their own memory segments and
  share data by communicating (e.g. passing messages using the Message Passing Interface) as needed.

  - A process can contain one or more threads
  - Two processes can run on different CPU cores and different computers
  - Processes have more overhead than threads (creating and destroying processes takes more time)
  - Running multiple processes is *only effective for compute-bound tasks*

.. note::

   **"Embarrassingly" parallel**: If you can run multiple instances of a program and do not need to synchronize/communicate with other instances,
   i.e. the problem at hand can be easily decomposed into independent tasks or datasets and there is no need to control access to shared resources,
   it is known as an embarrassingly parallel program. A few examples are listed here:

   - Monte Carlo analysis
   - Ensemble calculations for numerical weather prediction
   - Discrete Fourier transform
   - Convolutional neural networks
   - Applying the same model to multiple datasets

   **GPU computing**: This framework takes advantage of the massively parallel compute units available in modern GPUs.
   It is ideal when you need a large number of simple arithmetic operations.

   **Distributed computing (Spark, Dask)**: Master-worker parallelism. The master builds a graph of task dependencies and schedules tasks to execute in the appropriate order.
   In the next episode we will look at `Dask <https://dask.org/>`__, an array model extension and task scheduler,
   which combines multiprocessing with (embarrassingly) parallel workflows and "lazy" execution.

In the Python world, it is common to see the word `concurrency` denoting any type of simultaneous
processing, including *threads*, *tasks* and *processes*.

- Concurrent tasks can be executed in any order, but with the same final result
- Concurrent tasks can be, but need not be, executed in parallel
- The ``concurrent.futures`` module provides implementations of thread- and process-based executors for managing resource pools to run concurrent tasks (see the sketch below)
- Concurrency is difficult: race conditions and deadlocks may arise in concurrent programs
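
A minimal sketch of the executor interface (the ``work`` function here is illustrative, standing in for any task):

.. code-block:: python

   from concurrent.futures import ThreadPoolExecutor

   def work(item):
       # placeholder for a real task
       return item ** 2

   # the executor manages a pool of worker threads for us
   with ThreadPoolExecutor(max_workers=4) as executor:
       results = list(executor.map(work, range(10)))
   print(results)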

However, multithreading is still relevant in two situations:

Multithreaded libraries
^^^^^^^^^^^^^^^^^^^^^^^

NumPy and SciPy are built on external libraries such as LAPACK, FFTW, BLAS,
which provide optimized routines for linear algebra, Fourier transforms etc.
These libraries are written in C, C++ or Fortran and are thus not limited
by the GIL, so they typically support actual multithreading during the
execution of heavy computations like matrix operations or frequency analysis.

Depending on configuration, NumPy will often use multiple threads by default,
but we can use the environment variable ``OMP_NUM_THREADS`` to set the number
of threads manually in a Unix-like environment:

.. code-block:: console

   $ export OMP_NUM_THREADS=2

After setting this environment variable we continue as usual
and multithreading will be turned on.
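
The thread count can also be limited from within Python. A minimal sketch, assuming the third-party ``threadpoolctl`` package is installed:

.. code-block:: python

   import numpy as np
   from threadpoolctl import threadpool_limits

   a = np.random.random((4000, 4000))

   # restrict the underlying BLAS/OpenMP libraries to 4 threads
   with threadpool_limits(limits=4):
       np.linalg.inv(a)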

Multithreaded I/O
^^^^^^^^^^^^^^^^^

The `threading library <https://docs.python.org/dev/library/threading.html#>`__
provides an API for creating and working with threads. The simplest approach to
create and manage threads is to use the ``ThreadPoolExecutor`` class from the
``concurrent.futures`` module. An example use case could be to download data
from multiple websites using multiple threads:
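
A minimal sketch of this pattern (the URL list and ``download`` helper are illustrative, not the course's actual example file):

.. code-block:: python

   import urllib.request
   from concurrent.futures import ThreadPoolExecutor

   urls = [
       "https://www.example.com/",
       "https://www.python.org/",
       "https://www.wikipedia.org/",
   ]

   def download(url):
       # while a thread waits for the network it releases the GIL,
       # so other threads can make progress
       with urllib.request.urlopen(url, timeout=10) as conn:
           return conn.read()

   with ThreadPoolExecutor(max_workers=3) as executor:
       pages = list(executor.map(download, urls))

   print([len(page) for page in pages])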

The speedup gained from multithreading I/O-bound problems can be understood from the example above.
Further details on threading in Python can be found in the **See also** section below.

Multiprocessing
---------------

The ``multiprocessing`` module in Python supports spawning processes using an API
similar to the ``threading`` module. It effectively side-steps the GIL by using
*subprocesses* instead of threads, where each subprocess is an independent Python
process. One of the simplest ways to use ``multiprocessing`` is via ``Pool`` objects and
the parallel :meth:`Pool.map` function, similarly to what we saw for multithreading above.
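
A minimal sketch of this pattern, using a simple (illustrative) :meth:`square` function and one worker process per CPU core:

.. code-block:: python

   import multiprocessing as mp

   def square(x):
       return x * x

   if __name__ == '__main__':
       # one worker process per CPU core
       with mp.Pool(processes=mp.cpu_count()) as pool:
           results = pool.map(square, range(20))
       print(results)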

.. note::

   ``concurrent.futures.ProcessPoolExecutor`` is actually a wrapper for
   ``multiprocessing.Pool`` to unify the threading and process interfaces.

Multiple arguments
^^^^^^^^^^^^^^^^^^

For functions that take multiple arguments one can instead use the :meth:`Pool.starmap`
function, and there are other options as well, see below:

.. tabs::

   .. tab:: ``pool.starmap``

      .. code-block:: python
         :emphasize-lines: 6,8

         import multiprocessing as mp

         def power_n(x, n):
             return x ** n

         if __name__ == '__main__':
             with mp.Pool(processes=4) as pool:
                 res = pool.starmap(power_n, [(x, 2) for x in range(20)])
             print(res)

   .. tab:: function adapter

      .. code-block:: python
         :emphasize-lines: 7,8,14

         from concurrent.futures import ProcessPoolExecutor
         import numpy as np

         def power_n(x, n):
             return x ** n

         def f_(args):
             return power_n(*args)

         xs = np.arange(10)
         chunks = np.array_split(xs, xs.shape[0] // 2)

         with ProcessPoolExecutor(max_workers=4) as pool:
             res = pool.map(f_, chunks)
             print(list(res))

   .. tab:: multiple argument iterables

      .. code-block:: python
         :emphasize-lines: 7

         from concurrent.futures import ProcessPoolExecutor

         def power_n(x, n):
             return x ** n

         with ProcessPoolExecutor(max_workers=4) as pool:
             res = pool.map(power_n, range(0, 10, 2), range(1, 11, 2))
             print(list(res))

.. callout:: Interactive environments

   Functionality within multiprocessing requires that the ``__main__`` module be
   importable by child processes. This means that some functions may not work
   in interactive interpreters such as Jupyter notebooks.

``multiprocessing`` has a number of other methods which can be useful for certain
use cases, including ``Process`` and ``Queue`` which make it possible to have direct
communication between processes.
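
A minimal sketch of direct communication through a queue (the ``worker`` function is illustrative):

.. code-block:: python

   import multiprocessing as mp

   def worker(queue):
       # the child process sends a message back to the parent
       queue.put("hello from a child process")

   if __name__ == '__main__':
       queue = mp.Queue()
       process = mp.Process(target=worker, args=(queue,))
       process.start()
       print(queue.get())  # blocks until the message arrives
       process.join()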

MPI
---

The idea behind MPI is that:

- Tasks communicate and share data by sending messages.
- Many higher-level functions exist to distribute information to other tasks
  and gather information from other tasks.

``mpi4py`` provides Python bindings for the Message Passing Interface (MPI) standard.
This is how a hello world MPI program looks in Python:
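
A standard mpi4py hello world (the course's exact example file is not shown in this excerpt) might look like:

.. code-block:: python

   from mpi4py import MPI

   comm = MPI.COMM_WORLD
   rank = comm.Get_rank()   # id of this task
   size = comm.Get_size()   # total number of tasks

   print(f"Hello from rank {rank} of {size}")

Run it with several MPI tasks, e.g. ``mpirun -n 4 python hello.py``.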

``mpi4py`` provides both lower-case methods (``send``, ``recv``) that communicate arbitrary
pickled Python objects and upper-case methods (``Send``, ``Recv``) that operate on buffer-like
objects such as NumPy arrays.
Upper-case methods are faster and are strongly recommended for large numeric data.
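
A minimal sketch of the buffer-based methods (array size and tag are illustrative):

.. code-block:: python

   from mpi4py import MPI
   import numpy as np

   comm = MPI.COMM_WORLD
   rank = comm.Get_rank()

   if rank == 0:
       data = np.arange(100, dtype=np.float64)
       comm.Send(data, dest=1, tag=13)      # fast, buffer-based send
   elif rank == 1:
       data = np.empty(100, dtype=np.float64)
       comm.Recv(data, source=0, tag=13)    # receive into a preallocated buffer
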
Exercises
---------

.. exercise:: Multithreading NumPy

   Here is a piece of code which does a symmetric matrix inversion of size 4000 by 4000.
   To run it, we can save it in a file named `omp_test.py` or download it from :download:`here <example/omp_test.py>`.

   .. literalinclude:: example/omp_test.py
      :language: python

   Let us test it with 1 and 4 threads:

   .. code-block:: console

      $ export OMP_NUM_THREADS=1
      $ python omp_test.py

      $ export OMP_NUM_THREADS=4
      $ python omp_test.py

.. exercise:: I/O-bound vs CPU-bound

   In this exercise, we will simulate an I/O-bound process using the :meth:`sleep` function.
   Typical I/O-bound processes are disk accesses, network requests etc.

   .. literalinclude:: example/io_bound.py
      :language: python

   When the problem is compute-intensive:

   .. literalinclude:: example/cpu_bound.py
      :language: python

.. exercise:: Race condition

   A race condition is a common issue in multithreading/multiprocessing applications.
   It occurs when two or more threads access shared data and try to modify it at the same time.
   Try running the example with different values of ``n`` to see the differences.
   Think about how we can solve this problem.

   .. literalinclude:: example/race.py
      :language: python

   .. solution::

      - Locking resources: explicitly using locks
      - Duplicating resources: making copies of data for each thread/process so that they do not need to share

      .. tabs::

         .. tab:: locking

            .. literalinclude:: exercise/race_lock.py
               :language: python
               :emphasize-lines: 2,4,8,10

         .. tab:: duplicating

            .. literalinclude:: exercise/race_dup.py
               :language: python

.. exercise:: Compute numerical integrals

   The primary objective of this exercise is to compute the integral :math:`\int_0^1 x^{3/2} \, dx` numerically.
   One approach to integration is by establishing a grid along the x-axis. Specifically, the integration range
   is divided into ``n`` segments or bins. Below is a basic serial code.
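
   The example file is not shown in this excerpt; a minimal serial sketch of the midpoint rule might look like:

   .. code-block:: python

      import numpy as np

      def integrate(n):
          # midpoint rule: n bins of width h over [0, 1]
          h = 1.0 / n
          x = (np.arange(n) + 0.5) * h
          return np.sum(x ** 1.5) * h

      # exact value is 2/5 = 0.4
      print(integrate(10_000_000))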