
Commit 5def70e

regular lesson materials
1 parent 84efd93 commit 5def70e

13 files changed: +91 -126 lines changed

content/GPU-computing.rst (+9 -19)
@@ -1,6 +1,7 @@
 .. _GPU-computing:
 
-GPU computing
+
+GPU Computing
 =============
 
 .. questions::
@@ -22,11 +23,10 @@ GPU computing
 - 40 min exercises
 
 
-GPU Intro
+Introduction to GPU programming
 ---------
 
 
-
 Moore's law
 ^^^^^^^^^^^
 
@@ -61,7 +61,6 @@ with the term *accelerator*. GPU provides much higher instruction throughput
 and memory bandwidth than CPU within a similar price and power envelope.
 
 
-
 How do GPUs differ from CPUs?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -78,7 +77,6 @@ and complex flow control to avoid long memory access latencies,
 both of which are expensive in terms of transistors.
 
 
-
 .. figure:: img/gpu_vs_cpu.png
    :align: center
 
@@ -160,6 +158,7 @@ This workshop will focus on Numba only.
 Numba for GPUs
 --------------
 
+
 Terminology
 ^^^^^^^^^^^
 
@@ -200,8 +199,6 @@ NumPy arrays are transferred between the CPU and the GPU automatically.
 this feature is called dynamic parallelism but Numba does not support it currently
 
 
-
-
 ufunc (gufunc) decorator
 ^^^^^^^^^^^^^^^^^^^^^^^^
 
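For reference, the "ufunc (gufunc) decorator" section touched above corresponds to Numba's ``vectorize`` decorator. A minimal sketch, assuming a CUDA-capable GPU and an illustrative function name not taken from the lesson sources:

.. code-block:: python

   import math

   import numpy as np
   from numba import vectorize

   # target='cuda' compiles the scalar function into a GPU ufunc that
   # broadcasts over whole arrays; NumPy arrays are transferred between
   # the CPU and the GPU automatically.
   @vectorize(['float64(float64, float64)'], target='cuda')
   def gpu_hypot(x, y):
       return math.sqrt(x * x + y * y)

   a = np.random.rand(1_000_000)
   b = np.random.rand(1_000_000)
   c = gpu_hypot(a, b)  # runs element-wise on the GPU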
@@ -349,7 +346,8 @@ Although it is simple to use ufuncs (gufuncs) to run on GPU, the performance is the
 In addition, not all functions can be written as ufuncs in practice. To have much more flexibility,
 one needs to write a GPU kernel or a device function, which requires more understanding of GPU programming.
 
-GPU Programming Model
+
+GPU programming model
 ^^^^^^^^^^^^^^^^^^^^^
 
 Accelerators are a separate main circuit board with the processor, memory, power management, etc.,
@@ -363,6 +361,7 @@ The device code is executed by doing calls to functions (kernels) written specifically
 to take advantage of the GPU. The kernel calls are asynchronous: control is returned
 to the host after a kernel call. All kernels are executed sequentially.
 
+
 GPU Autopsy. Volta GPU
 ~~~~~~~~~~~~~~~~~~~~~~
 
@@ -470,7 +469,6 @@ For 1D, it is numba.cuda.threadIdx.x + numba.cuda.blockIdx.x * numba.cuda.blockDim.x
 use the GPU computational resources efficiently.
 
 
-
 It is important to notice that the total number of threads in a grid is a multiple of the block size.
 This is not necessarily the case for the problem that we are solving: the length of the vectors
 can be non-divisible by the selected block size. So we either need to make sure that the threads
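The global thread index and block-size arithmetic described in this hunk follow a standard pattern. A minimal sketch, assuming Numba and illustrative names:

.. code-block:: python

   from numba import cuda

   @cuda.jit
   def vector_add(a, b, out):
       i = cuda.grid(1)  # threadIdx.x + blockIdx.x * blockDim.x for 1D
       if i < out.size:  # guard threads that fall past the vector end
           out[i] = a[i] + b[i]

   n = 1_000_000
   threads_per_block = 128
   # Round up so the grid covers n even when n is not divisible by the
   # block size; the guard above handles the excess threads.
   blocks_per_grid = (n + threads_per_block - 1) // threads_per_block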
@@ -509,13 +507,13 @@ values like 128, 256 or 512 are frequently used
 - it must be larger than the number of available (single precision, double precision or integer operation) cores in an SM to fully occupy the SM
 
 
-
-Data and Memory management
+Data and memory management
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 With many cores trying to access the memory simultaneously and with little cache available,
 the accelerator can run out of memory very quickly. This makes data and memory management an essential task on the GPU.
 
+
 Data transfer
 ~~~~~~~~~~~~~
 
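The "Data transfer" subsection introduced here maps to Numba's explicit transfer API. A sketch, assuming a kernel like the ``vector_add`` example above:

.. code-block:: python

   import numpy as np
   from numba import cuda

   a = np.random.rand(1_000_000)

   d_a = cuda.to_device(a)              # explicit host-to-device copy
   d_out = cuda.device_array_like(d_a)  # uninitialized buffer on the device
   # ... launch kernels that read d_a and write d_out here ...
   out = d_out.copy_to_host()           # explicit device-to-host copy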
@@ -565,7 +563,6 @@ CUDA kernels and device functions are created with the ``numba.cuda.jit`` decorator.
 We will use the Numba function ``numba.cuda.grid(ndim)`` to calculate the global thread positions.
 
 
-
 .. demo:: Demo: CUDA kernel
 
    .. tabs::
@@ -668,9 +665,6 @@ We will use the Numba function ``numba.cuda.grid(ndim)`` to calculate the global thread positions.
       :language: ipython
 
 
-
-
-
 .. note::
 
    ``numba.cuda.synchronize()`` is used after the kernel launch to make sure the profiling is correct.
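Because kernel launches are asynchronous, the timing pattern this note refers to looks roughly as follows (a sketch reusing the illustrative kernel and launch configuration from above; Numba transfers the host arrays automatically):

.. code-block:: python

   from timeit import default_timer as timer

   from numba import cuda

   start = timer()
   vector_add[blocks_per_grid, threads_per_block](a, b, out)  # returns immediately
   cuda.synchronize()  # wait for the kernel so the measured time is meaningful
   print(f"kernel time: {timer() - start:.6f} s")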
@@ -680,8 +674,6 @@ We will use the Numba function ``numba.cuda.grid(ndim)`` to calculate the global thread positions.
    e.g. matmul_numba_gpu.max_blocksize = 32
 
 
-
-
 Optimization
 ------------
 
@@ -836,8 +828,6 @@ Exercises
    $ sbatch job.sh sbatch_matmul_sm.py
 
 
-
-
 .. exercise:: Discrete Laplace Operator
 
    In this exercise, we will work with the discrete Laplace operator.

content/dask.rst (+12 -14)
@@ -1,6 +1,7 @@
 .. _dask:
 
-Dask for scalable analytics
+
+Dask for Scalable Analytics
 ===========================
 
 .. objectives::
@@ -14,6 +15,7 @@ Dask for scalable analytics
 - 40 min teaching/type-along
 - 40 min exercises
 
+
 Overview
 --------
 
@@ -31,6 +33,7 @@ tools to work with big data. In addition, Dask can also speed up
 our analysis by using multiple CPU cores, which makes our work run
 faster on laptops, HPC and cloud platforms.
 
+
 What is Dask?
 -------------
 
@@ -49,7 +52,8 @@ Dask is composed of two parts:
   by schedulers on a single machine or a cluster. From the
   `Dask documentation <https://docs.dask.org/en/stable/>`__.
 
-Dask Clusters
+
+Dask clusters
 -------------
 
 Dask needs computing resources in order to perform parallel computations.
@@ -145,8 +149,6 @@ http://localhost:8787/status and can always be queried from the command line by:
    # or
    client.dashboard_link
 
-
-
 When everything finishes, you can shut down the connected scheduler and workers
 by calling the :meth:`shutdown` method:
 
@@ -155,9 +157,7 @@ by calling the :meth:`shutdown` method:
    client.shutdown()
 
 
-
-
-Dask Collections
+Dask collections
 ----------------
 
 Dask provides dynamic parallel task scheduling and
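A hedged sketch of the cluster lifecycle the two hunks above document — create a local cluster, read the dashboard address, shut it down (the worker count is an arbitrary choice):

.. code-block:: python

   from dask.distributed import Client

   client = Client(n_workers=4)  # local cluster with 4 worker processes
   print(client.dashboard_link)  # e.g. http://localhost:8787/status

   # ... run computations through the client here ...

   client.shutdown()  # stop the connected scheduler and workers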
@@ -168,7 +168,7 @@ three main high-level collections:
 - ``dask.bag``: Parallel Python Lists
 
 
-Dask Arrays
+Dask arrays
 ^^^^^^^^^^^
 
 A Dask array looks and feels a lot like a NumPy array.
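To make the NumPy-like behaviour concrete, a small Dask array sketch (the shape and chunk sizes are arbitrary choices, not from the lesson sources):

.. code-block:: python

   import dask.array as da

   x = da.random.random((20_000, 20_000), chunks=(1_000, 1_000))
   y = (x + x.T).mean(axis=0)  # only builds a task graph
   result = y.compute()        # triggers the actual parallel computation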
@@ -254,7 +254,8 @@ We can visualize the symbolic operations by calling :meth:`visualize`:
 You can find additional details and examples here
 https://examples.dask.org/array.html.
 
-Dask Dataframe
+
+Dask dataframe
 ^^^^^^^^^^^^^^
 
 Dask dataframes split a dataframe into partitions along an index and can be used
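A minimal sketch of the partitioned, pandas-like API this hunk introduces; the file pattern and column names are placeholders:

.. code-block:: python

   import dask.dataframe as dd

   # Each matching CSV file becomes one or more partitions.
   df = dd.read_csv("data/measurements-*.csv")
   means = df.groupby("station")["temperature"].mean()  # lazy, pandas-like
   print(means.compute())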
@@ -308,7 +309,7 @@ You can find additional details and examples here
 https://examples.dask.org/dataframe.html.
 
 
-Dask Bag
+Dask bag
 ^^^^^^^^
 
 A Dask bag enables processing data that can be represented as a sequence of arbitrary
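A hedged Dask bag sketch of the word-count style processing referred to later in this file's diff, using an inline example sequence:

.. code-block:: python

   import dask.bag as db

   lines = db.from_sequence([
       "the quick brown fox",
       "the lazy dog",
   ])
   counts = (
       lines.map(str.split)  # split each line into words
            .flatten()       # flatten into one sequence of words
            .frequencies()   # (word, count) pairs
   )
   print(counts.compute())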
@@ -375,8 +376,7 @@ specifically the step where we count words in a text.
 both parallelisation and the ability to utilize RAM on multiple machines.
 
 
-
-Dask Delayed
+Dask delayed
 ^^^^^^^^^^^^
 
 Sometimes problems don't fit into one of the collections like
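The decorator pattern behind ``dask.delayed``, as a minimal sketch with illustrative functions:

.. code-block:: python

   import dask

   @dask.delayed
   def inc(x):
       return x + 1

   @dask.delayed
   def add(x, y):
       return x + y

   # Calling delayed functions records tasks instead of executing them.
   total = add(inc(1), inc(2))
   print(total.compute())  # runs the graph in parallel; prints 5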
@@ -452,7 +452,6 @@ to make them lazy and tasks into a graph which we will run later on parallel hardware.
    x.compute()
 
 
-
 Comparison to Spark
 -------------------
 
@@ -798,7 +797,6 @@ Exercises
    plt.plot(tas_sto.year, tas_sto)  # plotting triggers computation
 
 
-
 .. keypoints::
 
    - Dask uses lazy execution

content/index.rst (+2 -10)
@@ -21,15 +21,13 @@ processing on single workstations the focus shifts to profiling and optimising,
 and distributed computing.
 
 
-
 .. prereq::
 
    - Basic experience with Python
    - Basic experience in working in a Linux-like terminal
    - Some prior experience in working with large or small datasets
 
 
-
 .. csv-table::
    :widths: auto
    :delim: ;
@@ -43,7 +41,6 @@ and distributed computing.
    90 min ; :doc:`dask`
 
 
-
 .. toctree::
    :maxdepth: 1
    :caption: Preparation
@@ -80,9 +77,9 @@ and distributed computing.
    guide
 
 
-
 .. _learner-personas:
 
+
 Who is the course for?
 ----------------------
 
@@ -91,8 +88,6 @@ datasets and who want to learn powerful tools and best practices for writing more
 performant, parallelised, robust and reproducible data analysis pipelines.
 
 
-
-
 About the course
 ----------------
 
@@ -107,16 +102,13 @@ Instructors who wish to teach this lesson can refer to the :doc:`guide` for
 practical advice.
 
 
-
-
 See also
 --------
 
 Each lesson episode has a "See also" section at the end which lists
 recommended further learning material.
 
 
-
 Credits
 -------
 
@@ -149,6 +141,7 @@ educational material, in particular:
 - `Elegant SciPy <https://github.com/elegant-scipy/notebooks/>`__
 - `A Comprehensive Guide to NumPy Data Types <https://axil.github.io/a-comprehensive-guide-to-numpy-data-types.html>`__
 
+
 Instructional Material
 ^^^^^^^^^^^^^^^^^^^^^^
 
@@ -185,7 +178,6 @@ With the understanding that:
   publicity, privacy, or moral rights may limit how you use the material.
 
 
-
 Software
 ^^^^^^^^
 
content/motivation.rst (+3 -2)
@@ -1,5 +1,6 @@
 .. _motivation:
 
+
 Motivation
 ==========
 
@@ -97,11 +98,11 @@ Specifically, the lesson covers:
 - How to measure performance and boost performance of time-consuming Python functions
 - Various methods to parallelise Python code
 
-The lesson does not cover the following:
+The lesson does not cover the following topics, although materials for them are provided:
 
 - Visualisation techniques
 - Machine learning
-- GPU related
+- GPU programming
 
 .. keypoints::
 