
Commit b0a7d3b

Merge pull request #117 from wmotion/Episode02
Episode02: fix variable names and shallow editing elsewhere
2 parents b28cc41 + 3dba7df commit b0a7d3b

File tree

1 file changed: +28 -27 lines

episodes/cupy.Rmd (+28 -27)
@@ -28,8 +28,8 @@ From now on we can also use the word *host* to refer to the CPU on the laptop, d
We start by generating an image using Python and NumPy code.
We want to compute a convolution on this input image once on the host and once on the device, and then compare both the execution times and the results.

- In an iPython shell or a Jupyter notebook, we can write and execute the following code on the host.
- The pixel values will be zero everywhere except for a regular grid of single pixels having value one, very much like a Dirac's delta function; hence the input image is named `deltas`.
+ We can write and execute the following code on the host in an IPython shell or a Jupyter notebook.
+ The pixel values will be zero everywhere except for a regular grid of single pixels having value one, very much like a Dirac delta function; hence the input image is named `diracs`.

~~~python
import numpy as np
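# the rest of this block, assembled from context lines of the next hunk:
# ones on a regular 16-pixel grid in a 2048x2048 image of zeros
diracs = np.zeros((2048, 2048))
diracs[8::16, 8::16] = 1
~~~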
@@ -39,7 +39,7 @@ diracs = np.zeros((2048, 2048))
diracs[8::16,8::16] = 1
~~~

- We can display the top-left corner of the input image to get a feeling of how it looks like, as follows:
+ We can display the top-left corner of the input image to get a feel for how it looks, as follows:

~~~python
import pylab as pyl
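# a sketch of the display step, which this hunk truncates; the 32x32 corner
# extent is an assumption for illustration
pyl.imshow(diracs[0:32, 0:32])
pyl.show()
~~~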
@@ -54,21 +54,21 @@ pyl.show()

and you should obtain the following image:

- ![Grid of delta functions](./fig/diracs.png){alt='Grid of delta functions.'}
+ ![Grid of Dirac delta functions](./fig/diracs.png){alt='Grid of Dirac delta functions.'}

## Gaussian convolutions

The illustration below shows an example of convolution (courtesy of Michael Plotke, CC BY-SA 3.0, via Wikimedia Commons).
- Looking at the terminology in the illustration, be forewarned that the word *kernel* happens to have different meanings that, inconveniently, apply to both mathematical convolution and coding on a GPU device.
- To know more about convolutions, we encourage you to check out [this GitHub repository](https://github.com/vdumoulin/conv_arithmetic) by Vincent Dumoulin and Francesco Visin with some great animations.
+ Looking at the terminology in the illustration, be forewarned that, inconveniently, the word *kernel* carries different meanings in mathematical convolution and in GPU programming.
+ To know more about convolutions, we encourage you to check out [this GitHub repository](https://github.com/vdumoulin/conv_arithmetic) by Vincent Dumoulin and Francesco Visin, which features some great animations.

![Example of animated convolution.](./fig/2D_Convolution_Animation.gif){alt="Example of animated convolution"}

In this course section, we will convolve our image with a 2D Gaussian function, having the general form:

$$G(x,y) = \frac{1}{2\pi \sigma^2} \exp\left(-\frac{x^2 + y^2}{2 \sigma^2}\right)$$

- where $x$ and $y$ are distances from the origin, and $\sigma$ controls the width of the Gaussian curve.
+ where $x$ and $y$ are distances from the origin, and $\sigma$ controls the width of the curve.
Since we can think of an image as a matrix of color values, the convolution of that image with a kernel generates a new matrix with different color values.
In particular, convolving images with a 2D Gaussian kernel changes the value of each pixel into a weighted average of the neighboring pixels, thereby smoothing out the features in the input image.

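For concreteness, a minimal sketch of building such a kernel from the formula above; the grid extent, the 15x15 size, and the value of $\sigma$ are assumptions for illustration, not taken from this diff:

~~~python
# evaluate G(x, y) on a small grid; size, extent, and sigma are assumed here
x, y = np.meshgrid(np.linspace(-2, 2, 15), np.linspace(-2, 2, 15))
sigma = 1.0
gauss = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
~~~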
@@ -105,7 +105,7 @@ pyl.imshow(gauss)
pyl.show()
~~~

- The code above produces this image of a symmetrical two-dimensional Gaussian:
+ The code above produces this image of a symmetrical two-dimensional Gaussian surface:

![Two-dimensional Gaussian.](./fig/gauss.png){alt="Two-dimensional Gaussian"}

@@ -129,9 +129,9 @@ We expect that to be in the region of a couple of seconds, as shown in the timin
2.4 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
~~~

- Displaying just a corner of the image shows that the Gaussian has so much blurred the original pattern of ones surrounded by zeros that we end up with a regular pattern of Gaussians.
+ Displaying just a corner of the image shows that the Gaussian filter has blurred the original pattern of ones surrounded by zeros so much that we end up with a regular pattern of Gaussians.

- ![Grid of Gaussians in the convoluted image.](./fig/convolved_image.png){alt="Grid of Gaussians in the convoluted image"}
+ ![Grid of Gaussian surfaces in the convoluted image.](./fig/convolved_image.png){alt="Grid of Gaussians in the convoluted image"}
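The timing quoted above could be obtained on the host along these lines; a sketch assuming SciPy's `convolve2d` under the alias `convolve2d_cpu` (the actual CPU code sits outside this hunk):

~~~python
# convolve on the host and time it; -n 1 -r 1 matches the quoted report
from scipy.signal import convolve2d as convolve2d_cpu

convolved_image_cpu = convolve2d_cpu(diracs, gauss)
%timeit -n 1 -r 1 convolved_image_cpu = convolve2d_cpu(diracs, gauss)
~~~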

## Convolution on the GPU Using CuPy

@@ -142,7 +142,7 @@ This picture depicts the different components of CPU and GPU and how they are co
![CPU and GPU are separate entities, each with its own memory.](./fig/CPU_and_GPU_separated.png){alt="CPU and GPU are separate entities, each with its own memory"}

This means that the array created with NumPy is physically stored in the host's memory and, therefore, is only available to the CPU.
- Since our input image and convolution filter are not yet present in the device memory, we need to copy both data to the GPU before executing any code on it.
+ Since our input image and convolution filter are not present in the device memory yet, we need to copy them to the GPU before executing any code on it.
In practice, we use CuPy to copy the arrays `diracs` and `gauss` from the host's Random Access Memory (RAM) to the GPU memory as follows:

~~~python
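import cupy as cp

# a sketch of the copy step this hunk introduces, completed from the next
# hunk's context line: allocate device copies of both host arrays
diracs_gpu = cp.asarray(diracs)
gauss_gpu = cp.asarray(gauss)
~~~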
@@ -154,10 +154,11 @@ gauss_gpu = cp.asarray(gauss)

Now it is time to compute the convolution on our GPU.
Inconveniently, SciPy does not offer methods running on GPUs.
- Hence, we import the convolution function from a CuPy package aliased as `cupyx`, whose sub-package [`cupyx.scipy`](https://docs.cupy.dev/en/stable/reference/scipy.html) performs a selection of the SciPy operations.
- We will soon verify that the GPU convolution function of `cupyx` works out the same calculations as the CPU convolution function of SciPy.
- In general, CuPy proper and NumPy are so similar as are the `cupyx` methods and SciPy; this is intended to invite programmers already familiar with NumPy and SciPy to use the GPU for computing.
- For now, let's again record the execution time on the device for the same convolution as the host, and can compare the respective performances.
+ Hence, we import the convolution function from the CuPy package `cupyx`.
+ Its sub-package [`cupyx.scipy`](https://docs.cupy.dev/en/stable/reference/scipy.html) performs a selection of the SciPy operations.
+ We will soon verify that the convolution function of `cupyx` works out the same calculations on the GPU as the convolution function of SciPy on the CPU.
+ In general, CuPy proper is as similar to NumPy as the `cupyx` methods are to SciPy; this is intended to invite programmers already familiar with NumPy and SciPy to use the GPU for computing.
+ For now, let's again record the execution time on the device for the same convolution as on the host, and compare the respective performances.

~~~python
from cupyx.scipy.signal import convolve2d as convolve2d_gpu
@@ -166,8 +167,8 @@ convolved_image_gpu = convolve2d_gpu(diracs_gpu, gauss_gpu)
%timeit -n 7 -r 1 convolved_image_gpu = convolve2d_gpu(diracs_gpu, gauss_gpu)
~~~

- Also the execution time of the GPU convolution will depend very much on the hardware used, as seen for the host.
- The timing using a NVIDIA Tesla T4 on Google Colab was:
+ The execution time of the GPU convolution will depend very much on the hardware used, just as seen for the host.
+ The timing using an NVIDIA Tesla T4 on Google Colab was:

~~~output
98.2 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 7 loops each)
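~~~

Taken at face value, the two timings quoted in this diff imply a dramatic speedup; a back-of-the-envelope check using only the numbers shown above:

~~~python
# ratio of the quoted host time (2.4 s) to the quoted device time (98.2 µs)
speedup = 2.4 / 98.2e-6
print(f"speedup of roughly {speedup:,.0f}x")   # about 24,440x
~~~

As the section below explains, this figure is misleading, because `timeit` does not account for asynchronous GPU execution.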
@@ -178,20 +179,20 @@ Impressive, but is that true?

## Measuring performance

- So far we used `timeit` to measure the performance of our Python code, no matter whether it was running on the CPU or was GPU-accelerated.
- However, the execution on the GPU is *asynchronous*: the Python interpreter takes back control of the program execution immediately, while the GPU is still executing the task.
+ So far we used `timeit` to measure the performance of our Python code, regardless of whether it was running on the CPU or was GPU-accelerated.
+ However, the execution on the GPU is *asynchronous*: the Python interpreter immediately takes back control of the program execution, while the GPU is still executing the task.
Therefore, the timing of `timeit` is not reliable.

Conveniently, `cupyx` provides the function `benchmark()` that measures the actual execution time on the GPU.
- The following code executes `convolve2d_gpu()` with the appropriate arguments ten times, and stores inside the `.gpu_times` attribute of the variable `execution_gpu` the execution time of each run in seconds.
+ The following code executes `convolve2d_gpu()` with the appropriate arguments ten times, and stores inside the `.gpu_times` attribute of the variable `benchmark_gpu` the execution time of each run in seconds.

~~~python
from cupyx.profiler import benchmark

benchmark_gpu = benchmark(convolve2d_gpu, (diracs_gpu, gauss_gpu), n_repeat=10)
~~~

- These measurements are also more stable and representative, because `benchmark()` disregards the compile time and the repetitions warm up the GPU.
+ These measurements are also more stable and representative, because `benchmark()` disregards the compile time and because the repetitions warm up the GPU.
We can then average the execution times, as follows:

~~~python
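# a sketch of the truncated averaging step, using the variable name printed
# in a later hunk (`gpu_execution_avg`)
gpu_execution_avg = np.average(benchmark_gpu.gpu_times)
print(f"{gpu_execution_avg:.6f} s")
~~~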
@@ -215,7 +216,7 @@ If this works, it will save us the time and effort of transferring the arrays `d

::::::::::::::::::::::::::::::::::::: solution

- We can call the GPU convolution function `convolve2d_gpu()` directly with `deltas` and `gauss` as argument:
+ We can call the GPU convolution function `convolve2d_gpu()` directly with `diracs` and `gauss` as arguments:

~~~python
convolve2d_gpu(diracs, gauss)
@@ -254,7 +255,7 @@ Hint: use the `cp.asnumpy()` method to copy a CuPy array back to the host.

:::::::::::::::::::::::::::::::::::::: solution

- A convenient strategy is to time the execution of a single Python function that groups the transfers to and from the GPU and the convolution, as follows:
+ An effective strategy is to time the execution of a single Python function that groups the transfers to and from the GPU and the convolution, as follows:

~~~python
def push_compute_pull():
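    # a sketch of the truncated function body: push the host arrays to the
    # device, convolve there, and pull the result back with cp.asnumpy()
    # (per the hint above); local variable names are assumptions
    diracs_gpu = cp.asarray(diracs)
    gauss_gpu = cp.asarray(gauss)
    convolved_image_gpu = convolve2d_gpu(diracs_gpu, gauss_gpu)
    return cp.asnumpy(convolved_image_gpu)
~~~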
@@ -331,7 +332,7 @@ print(f"{gpu_execution_avg:.6f} s")

You may be surprised that these commands do not throw any error.
Unlike SciPy routines, NumPy routines accept CuPy arrays as input, even though the latter exist only in GPU memory.
- Indeed, can you recall when we validated our codes using a NumPy and a CuPy array as input of `np.allclose()`?
+ Indeed, can you recall that we validated our codes using a NumPy and a CuPy array as input of `np.allclose()`?
That worked for the same reason.
[The CuPy documentation](https://docs.cupy.dev/en/stable/user_guide/interoperability.html#numpy) explains why NumPy routines can handle CuPy arrays.

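A minimal sketch of that validation pattern, mixing a host and a device array (the host-side result name `convolved_image_cpu` is an assumption):

~~~python
# NumPy dispatches to CuPy here, which is why mixing array types works
np.allclose(convolved_image_cpu, convolved_image_gpu)
~~~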
@@ -568,7 +569,7 @@ The number of sources in the image at the 5σ level is 185.
Fastest CPU CCL time = 2.546e+01 ms.
~~~

- Let's not just accept the answer, but also do a sanity check.
+ Let's not just accept the answer; let's also do a sanity check.
What are the values in the labeled image?

~~~python
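# a plausible sanity check, not shown in this hunk: list the distinct values
# (labels) present in the labeled image; np.unique is an assumed choice
print(np.unique(labeled_image))
~~~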
@@ -633,7 +634,7 @@ all_integrated_fluxes = sl_cpu(data, labeled_image,
                                range(1, number_of_sources_in_image+1))
~~~

- Which yields, on my machine:
+ which yields, on my machine:

~~~output
797 ms ± 9.32 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
@@ -649,7 +650,7 @@ fastest_source_measurements_CPU = timing_source_measurements_CPU.best
print(f"Fastest CPU set of source measurements = {1000 * fastest_source_measurements_CPU:.3e} ms.")
~~~

- Which yields
+ which yields

~~~output
Fastest CPU set of source measurements = 7.838e+02 ms.
