
Commit ec73f17

Merge branch 'ENCCS:main' into main
2 parents 21ed5ed + 2c24c3d, commit ec73f17

14 files changed: +172 -170 lines changed

content/GPU-computing.rst (+9 -19)
@@ -1,6 +1,7 @@
 .. _GPU-computing:
 
-GPU computing
+
+GPU Computing
 =============
 
 .. questions::
@@ -22,11 +23,10 @@ GPU computing
 - 40 min exercises
 
 
-GPU Intro
+Introduction to GPU programming
 ---------
 
 
-
 Moore's law
 ^^^^^^^^^^^
 
@@ -61,7 +61,6 @@ with the term *accelerator*. GPU provides much higher instruction throughput
 and memory bandwidth than CPU within a similar price and power envelope.
 
 
-
 How do GPUs differ from CPUs?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -78,7 +77,6 @@ and complex flow control to avoid long memory access latencies,
 both of which are expensive in terms of transistors.
 
 
-
 .. figure:: img/gpu_vs_cpu.png
    :align: center
 
@@ -160,6 +158,7 @@ This workshop will focus on Numba only.
 Numba for GPUs
 --------------
 
+
 Terminology
 ^^^^^^^^^^^
 
@@ -200,8 +199,6 @@ NumPy arrays are transferred between the CPU and the GPU automatically.
 this feature is called dynamic parallelism but Numba does not support it currently
 
 
-
-
 ufunc (gufunc) decorator
 ^^^^^^^^^^^^^^^^^^^^^^^^
 
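
The ufunc approach referred to in the hunks above and below can be illustrated with a minimal sketch (not taken from the lesson files; the function name and array sizes are made up, and a CUDA-capable GPU with Numba installed is assumed):

    import numpy as np
    from numba import vectorize

    # Element-wise addition compiled as a CUDA ufunc; for plain NumPy inputs
    # Numba handles the host<->device transfers automatically.
    @vectorize(["float32(float32, float32)"], target="cuda")
    def add_gpu(x, y):
        return x + y

    a = np.ones(10_000_000, dtype=np.float32)
    b = np.ones(10_000_000, dtype=np.float32)
    c = add_gpu(a, b)   # executed on the GPU, result returned as a NumPy array
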
@@ -349,7 +346,8 @@ Alough it is simple to use ufuncs(gfuncs) to run on GPU, the performance is the
 In addition, not all functions can be written as ufuncs in practice. To have much more flexibility,
 one needs to write a kernel on GPU or device function, which requires more understanding of the GPU programming.
 
-GPU Programming Model
+
+GPU programming model
 ^^^^^^^^^^^^^^^^^^^^^
 
 Accelerators are a separate main circuit board with the processor, memory, power management, etc.,
@@ -363,6 +361,7 @@ The device code is executed by doing calls to functions (kernels) written specif
 to take advantage of the GPU. The kernel calls are asynchronous, the control is returned
 to the host after a kernel calls. All kernels are executed sequentially.
 
+
 GPU Autopsy. Volta GPU
 ~~~~~~~~~~~~~~~~~~~~~~
 
@@ -470,7 +469,6 @@ For 1D, it is numba.cuda.threadIdx.x + numba.cuda.blockIdx.x * numba.cuda.blockD
 use the GPU computational resources efficiently.
 
 
-
 It is important to notice that the total number of threads in a grid is a multiple of the block size.
 This is not necessary the case for the problem that we are solving: the length of the vectors
 can be non-divisible by selected block size. So we either need to make sure that the threads
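
The indexing formula quoted in the hunk header above, together with the bounds problem described in the changed lines, might look roughly like this in a kernel (an illustrative sketch, not the lesson's own demo code):

    import numpy as np
    from numba import cuda

    @cuda.jit
    def vector_add(a, b, out):
        # Global 1D thread index: threadIdx.x + blockIdx.x * blockDim.x
        i = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x
        # Guard: the grid usually contains more threads than array elements.
        if i < out.size:
            out[i] = a[i] + b[i]

    n = 1_000_000
    a = np.arange(n, dtype=np.float32)
    b = np.arange(n, dtype=np.float32)
    out = np.zeros(n, dtype=np.float32)

    threads_per_block = 256
    # Round up so every element is covered even when n is not divisible by the block size.
    blocks_per_grid = (n + threads_per_block - 1) // threads_per_block
    vector_add[blocks_per_grid, threads_per_block](a, b, out)
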
@@ -509,13 +507,13 @@ values like 128, 256 or 512 are frequently used
 - it must be large than the number of available (single precision, double precision or integer operation) cores in a SM to fully occupy the SM
 
 
-
-Data and Memory management
+Data and memory management
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 With many cores trying to access the memory simultaneously and with little cache available,
 the accelerator can run out of memory very quickly. This makes the data and memory management an essential task on the GPU.
 
+
 Data transfer
 ~~~~~~~~~~~~~
 
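
The explicit data-transfer pattern this section heading introduces could look roughly as follows (a hedged sketch; array names are illustrative and Numba's device-array API is assumed to be available):

    import numpy as np
    from numba import cuda

    a = np.arange(1_000_000, dtype=np.float32)

    # Copy the input to the GPU once, instead of letting Numba re-transfer it
    # implicitly on every kernel or ufunc call.
    d_a = cuda.to_device(a)

    # Allocate the output directly on the device (no host->device copy needed).
    d_out = cuda.device_array_like(d_a)

    # ... launch kernels that read d_a and write d_out here ...

    # Copy the result back only when it is actually needed on the host.
    out = d_out.copy_to_host()
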
@@ -565,7 +563,6 @@ CUDA Kernel and device functions are created with the ``numba.cuda.jit`` decorat
 We will use Numba function ``numba.cuda.grid(ndim)`` to calculate the global thread positions.
 
 
-
 .. demo:: Demo: CUDA kernel
 
    .. tabs::
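
For the ``numba.cuda.grid`` helper and the asynchronous launches mentioned around this hunk, a brief sketch under the same assumptions as above (kernel and variable names are illustrative):

    import numpy as np
    from numba import cuda

    @cuda.jit
    def scale(x, factor):
        i = cuda.grid(1)   # shorthand for threadIdx.x + blockIdx.x * blockDim.x
        if i < x.size:
            x[i] *= factor

    x = np.ones(1_000_000, dtype=np.float32)
    threads_per_block = 256
    blocks_per_grid = (x.size + threads_per_block - 1) // threads_per_block

    scale[blocks_per_grid, threads_per_block](x, 2.0)
    cuda.synchronize()   # launches are asynchronous; wait before timing or checking results
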
@@ -668,9 +665,6 @@ We will use Numba function ``numba.cuda.grid(ndim)`` to calculate the global thr
 :language: ipython
 
 
-
-
-
 .. note::
 
    ``numba.cuda.synchronize()`` is used after the kernel launch to make sure the profiling is correct.
@@ -680,8 +674,6 @@ We will use Numba function ``numba.cuda.grid(ndim)`` to calculate the global thr
 e.g. matmul_numba_gpu.max_blocksize = 32
 
 
-
-
 Optimization
 ------------
 
@@ -836,8 +828,6 @@ Exercises
 $ sbatch job.sh sbatch_matmul_sm.py
 
 
-
-
 .. exercise:: Discrete Laplace Operator
 
    In this exercise, we will work with the discrete Laplace operator.

content/dask.rst (+12 -14)
@@ -1,6 +1,7 @@
 .. _dask:
 
-Dask for scalable analytics
+
+Dask for Scalable Analytics
 ===========================
 
 .. objectives::
@@ -14,6 +15,7 @@ Dask for scalable analytics
 - 40 min teaching/type-along
 - 40 min exercises
 
+
 Overview
 --------
 
@@ -31,6 +33,7 @@ tools to work with big data. In addition, Dask can also speeds up
 our analysis by using multiple CPU cores which makes our work run
 faster on laptop, HPC and cloud platforms.
 
+
 What is Dask?
 -------------
 
@@ -49,7 +52,8 @@ Dask is composed of two parts:
 by schedulers on a single machine or a cluster. From the
 `Dask documentation <https://docs.dask.org/en/stable/>`__.
 
-Dask Clusters
+
+Dask clusters
 -------------
 
 Dask needs computing resources in order to perform parallel computations.
@@ -145,8 +149,6 @@ http://localhost:8787/status and can be always queried from commond line by:
 # or
 client.dashboard_link
 
-
-
 When everything finishes, you can shut down the connected scheduler and workers
 by calling the :meth:`shutdown` method:
 
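
The client workflow quoted in this hunk (dashboard link, ``shutdown``) fits into a pattern roughly like the following sketch (worker counts are illustrative; ``dask.distributed`` is assumed to be installed):

    from dask.distributed import Client

    # Start a local cluster (scheduler + workers) and connect a client to it.
    client = Client(n_workers=4, threads_per_worker=1)

    print(client.dashboard_link)   # URL of the diagnostics dashboard

    # ... run Dask computations here ...

    client.shutdown()              # stop the workers and the scheduler
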
@@ -155,9 +157,7 @@ by calling the :meth:`shutdown` method:
 client.shutdown()
 
 
-
-
-Dask Collections
+Dask collections
 ----------------
 
 Dask provides dynamic parallel task scheduling and
@@ -168,7 +168,7 @@ three main high-level collections:
 - ``dask.bag``: Parallel Python Lists
 
 
-Dask Arrays
+Dask arrays
 ^^^^^^^^^^^
 
 A Dask array looks and feels a lot like a NumPy array.
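
The NumPy-like behaviour mentioned in the last context line might be illustrated by a short sketch (shapes and chunk sizes are made up):

    import dask.array as da

    # A large array split into 1000x1000 chunks; nothing is computed yet.
    x = da.random.random((20_000, 20_000), chunks=(1_000, 1_000))

    y = (x + x.T).mean(axis=0)   # lazily builds a task graph
    result = y.compute()         # triggers the actual parallel computation
    # y.visualize() would draw the task graph instead of executing it
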
@@ -254,7 +254,8 @@ We can visualize the symbolic operations by calling :meth:`visualize`:
 You can find additional details and examples here
 https://examples.dask.org/array.html.
 
-Dask Dataframe
+
+Dask dataframe
 ^^^^^^^^^^^^^^
 
 Dask dataframes split a dataframe into partitions along an index and can be used
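
The partitioned-dataframe idea in the last context line can be sketched as follows (the file pattern and column names are purely hypothetical):

    import dask.dataframe as dd

    # Every matching CSV file becomes one or more partitions of one logical
    # dataframe; reading and parsing happen lazily.
    df = dd.read_csv("data/measurements-*.csv")      # hypothetical files

    # Pandas-style operations build a task graph over the partitions ...
    monthly_mean = df.groupby("month")["temperature"].mean()   # hypothetical columns

    # ... and compute() returns an ordinary pandas object.
    print(monthly_mean.compute())
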
@@ -308,7 +309,7 @@ You can find additional details and examples here
 https://examples.dask.org/dataframe.html.
 
 
-Dask Bag
+Dask bag
 ^^^^^^^^
 
 A Dask bag enables processing data that can be represented as a sequence of arbitrary
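
The bag collection introduced here supports a map/filter/reduce style of processing, roughly as in this sketch (the input sequence is illustrative):

    import dask.bag as db

    # A bag is a partitioned sequence of arbitrary Python objects.
    lines = db.from_sequence(
        ["the quick brown fox", "jumps over the lazy dog"], npartitions=2)

    # Split lines into words, keep the longer ones, and count their frequencies.
    word_counts = (lines.map(str.split)
                        .flatten()
                        .filter(lambda w: len(w) > 3)
                        .frequencies())

    print(word_counts.compute())
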
@@ -375,8 +376,7 @@ specifically the step where we count words in a text.
 both parallelisation and the ability to utilize RAM on multiple machines.
 
 
-
-Dask Delayed
+Dask delayed
 ^^^^^^^^^^^^
 
 Sometimes problems don't fit into one of the collections like
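
The delayed interface that this section goes on to describe wraps ordinary functions into a lazy task graph, roughly like this sketch (the functions are stand-ins):

    from dask import delayed

    @delayed
    def inc(x):
        return x + 1

    @delayed
    def add(x, y):
        return x + y

    # Calling delayed functions only records tasks; nothing runs yet.
    a = inc(1)
    b = inc(2)
    total = add(a, b)

    print(total.compute())   # executes the task graph, in parallel where possible
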
@@ -452,7 +452,6 @@ to make them lazy and tasks into a graph which we will run later on parallel har
 x.compute()
 
 
-
 Comparison to Spark
 -------------------
 
@@ -798,7 +797,6 @@ Exercises
 plt.plot(tas_sto.year,tas_sto) # plotting trigers computation
 
 
-
 .. keypoints::
 
    - Dask uses lazy execution

content/guide.rst (+17 -37)
@@ -1,6 +1,7 @@
-Instructor's guide
+Instructor's Guide
 ------------------
 
+
 Why teach this lesson
 ^^^^^^^^^^^^^^^^^^^^^
 
@@ -16,6 +17,7 @@ and GPU-ported code in a high-level language like Python can save substantial de
 make HPC resources more accessible to a wider range of researchers, and lead to better
 overall utilisation of available HPC computing power.
 
+
 Intended learning outcomes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -43,65 +45,43 @@ Schedule for 3 half-day workshop
 +-------------------+------------------------------------+
 | Time              | Episode                            |
 +===================+====================================+
-| 09:00 - 09:20     | :doc:`motivation`                  |
+| 09:00 - 09:15     | Welcome and Ice-Breaking Session   |
 +-------------------+------------------------------------+
-| 09:20 - 10:00     | :doc:`scientific-data`             |
+| 09:15 - 09:30     | :doc:`motivation`                  |
 +-------------------+------------------------------------+
-| 10:00 - 10:20     | Break                              |
+| 09:30 - 10:20     | :doc:`scientific-data`             |
 +-------------------+------------------------------------+
-| 10:20 - 11:00     | :doc:`stack`                       |
+| 10:20 - 10:40     | Break                              |
 +-------------------+------------------------------------+
-| 11:00 - 11:20     | Break                              |
+| 10:40 - 11:55     | :doc:`stack`                       |
 +-------------------+------------------------------------+
-| 11:20 - 12:00     | :doc:`stack`                       |
+| 11:55 - 12:00     | Q/A and Reflections                |
 +-------------------+------------------------------------+
 
-
 **Day 2:**
 
 +-------------------+------------------------------------+
 | Time              | Episode                            |
 +===================+====================================+
-| 09:00 - 09:40     | :doc:`parallel-computing`          |
-+-------------------+------------------------------------+
-| 09:40 - 09:50     | Break                              |
-+-------------------+------------------------------------+
-| 09:50 - 10:20     | :doc:`parallel-computing`          |
+| 09:00 - 10:20     | :doc:`parallel-computing`          |
 +-------------------+------------------------------------+
 | 10:20 - 10:40     | Break                              |
 +-------------------+------------------------------------+
-| 10:40 - 11:20     | :doc:`optimization`                |
+| 10:40 - 11:55     | :doc:`optimization`                |
 +-------------------+------------------------------------+
-| 11:20 - 11:30     | Break                              |
+| 11:55 - 12:00     | Q/A and Reflections                |
 +-------------------+------------------------------------+
-| 11:30 - 12:00     | :doc:`optimization`                |
-+-------------------+------------------------------------+
-
 
 **Day 3:**
 
 +-------------------+------------------------------------+
 | Time              | Episode                            |
 +===================+====================================+
-| 09:00 - 09:40     | :doc:`performance-boosting`        |
-+-------------------+------------------------------------+
-| 09:40 - 09:50     | Break                              |
+| 09:00 - 10:10     | :doc:`performance-boosting`        |
 +-------------------+------------------------------------+
-| 09:50 - 10:20     | :doc:`performance-boosting`        |
-+-------------------+------------------------------------+
-| 10:20 - 10:40     | Break                              |
+| 10:10 - 10:30     | Break                              |
 +-------------------+------------------------------------+
-| 10:40 - 11:20     | :doc:`dask`                        |
+| 10:30 - 11:50     | :doc:`dask`                        |
 +-------------------+------------------------------------+
-| 11:20 - 11:30     | Break                              |
-+-------------------+------------------------------------+
-| 11:30 - 12:00     | :doc:`dask`                        |
-+-------------------+------------------------------------+
-
-
-
-
-
-
-
-
+| 11:50 - 12:00     | Q/A and Summary                    |
++-------------------+------------------------------------+
