episodes/cupy.Rmd (+28 −27)
@@ -28,8 +28,8 @@ From now on we can also use the word *host* to refer to the CPU on the laptop, d
We start by generating an image using Python and NumPy code.
We want to compute a convolution on this input image once on the host and once on the device, and then compare both the execution times and the results.
-In an iPython shell or a Jupyter notebook, we can write and execute the following code on the host.
-The pixel values will be zero everywhere except for a regular grid of single pixels having value one, very much like a Dirac's delta function; hence the input image is named `deltas`.
+We can write and execute the following code on the host in an iPython shell or a Jupyter notebook.
+The pixel values will be zero everywhere except for a regular grid of single pixels having value one, very much like a Dirac delta function; hence the input image is named `diracs`.
~~~python
import numpy as np
@@ -39,7 +39,7 @@ diracs = np.zeros((2048, 2048))
diracs[8::16, 8::16] = 1
~~~
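Piecing the two hunks together, the complete cell presumably reads:

~~~python
import numpy as np

# 2048 x 2048 image of zeros, with a one at every 16th pixel in both directions
diracs = np.zeros((2048, 2048))
diracs[8::16, 8::16] = 1
~~~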
-We can display the top-left corner of the input image to get a feeling of how it looks like, as follows:
+We can display the top-left corner of the input image to get a feel for how it looks, as follows:
~~~python
import pylab as pyl
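# The rest of this cell is elided by the diff; presumably it displays the
# top-left corner of the image, e.g. (the 32x32 slice bounds are an assumption):
pyl.imshow(diracs[0:32, 0:32])
pyl.show()
~~~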
@@ -54,21 +54,21 @@ pyl.show()
and you should obtain the following image:
-{alt='Grid of delta functions.'}
+{alt='Grid of Dirac delta functions.'}
## Gaussian convolutions
The illustration below shows an example of convolution (courtesy of Michael Plotke, CC BY-SA 3.0, via Wikimedia Commons).
-Looking at the terminology in the illustration, be forewarned that the word *kernel* happens to have different meanings that, inconveniently, apply to both mathematical convolution and coding on a GPU device.
-To know more about convolutions, we encourage you to check out [this GitHub repository](https://github.com/vdumoulin/conv_arithmetic) by Vincent Dumoulin and Francesco Visin with some great animations.
+Looking at the terminology in the illustration, be forewarned that, inconveniently, the word *kernel* means different things in mathematical convolution and in GPU programming.
+To know more about convolutions, we encourage you to check out [this GitHub repository](https://github.com/vdumoulin/conv_arithmetic) by Vincent Dumoulin and Francesco Visin, which contains some great animations.
{alt="Example of animated convolution"}
In this course section, we will convolve our image with a 2D Gaussian function, having the general form:
$$G(x,y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}}$$

-where $x$ and $y$ are distances from the origin, and $\sigma$ controls the width of the Gaussian curve.
+where $x$ and $y$ are distances from the origin, and $\sigma$ controls the width of the curve.
Since we can think of an image as a matrix of color values, the convolution of that image with a kernel generates a new matrix with different color values.
In particular, convolving images with a 2D Gaussian kernel changes the value of each pixel into a weighted average of the neighboring pixels, thereby smoothing out the features in the input image.
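The code that builds this kernel lies outside the hunk below; a minimal sketch with NumPy (the 15x15 grid and `sigma = 1.0` are assumptions, not taken from this diff):

~~~python
import numpy as np

# Sample a 2D Gaussian on a 15x15 grid spanning [-2, 2] in both directions
x, y = np.meshgrid(np.linspace(-2, 2, 15), np.linspace(-2, 2, 15))
sigma = 1.0
gauss = np.exp(-(x**2 + y**2) / (2 * sigma**2))
~~~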
@@ -105,7 +105,7 @@ pyl.imshow(gauss)
pyl.show()
~~~
-The code above produces this image of a symmetrical two-dimensional Gaussian:
+The code above produces this image of a symmetrical two-dimensional Gaussian surface:
@@ -129,9 +129,9 @@ We expect that to be in the region of a couple of seconds, as shown in the timin
~~~output
2.4 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
~~~
-Displaying just a corner of the image shows that the Gaussian has so much blurred the original pattern of ones surrounded by zeros that we end up with a regular pattern of Gaussians.
+Displaying just a corner of the image shows that the Gaussian filter has blurred the original pattern of ones surrounded by zeros so much that we end up with a regular pattern of Gaussians.
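The display call itself is not shown in this hunk; a minimal sketch, assuming the convolved array is named `convolved_image_cpu` (a hypothetical name):

~~~python
pyl.imshow(convolved_image_cpu[0:32, 0:32])
pyl.show()
~~~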
-{alt="Grid of Gaussians in the convoluted image"}
+{alt="Grid of Gaussians in the convolved image"}
## Convolution on the GPU Using CuPy
@@ -142,7 +142,7 @@ This picture depicts the different components of CPU and GPU and how they are co
{alt="CPU and GPU are separate entities with an own memory"}
This means that the array created with NumPy is physically stored in the host's memory and, therefore, is only available to the CPU.
-Since our input image and convolution filter are not yet present in the device memory, we need to copy both data to the GPU before executing any code on it.
+Since our input image and convolution filter are not present in the device memory yet, we need to copy them to the GPU before executing any code on it.
In practice, we use CuPy to copy the arrays `diracs` and `gauss` from the host's Random Access Memory (RAM) to the GPU memory as follows:
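The copy itself falls between the hunks shown here; a minimal sketch using CuPy's `cp.asarray()` (the `_gpu` variable names are an assumption):

~~~python
import cupy as cp

# Copy the host (NumPy) arrays into the device (GPU) memory
diracs_gpu = cp.asarray(diracs)
gauss_gpu = cp.asarray(gauss)
~~~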
Now it is time to compute the convolution on our GPU.
Inconveniently, SciPy does not offer methods running on GPUs.
-Hence, we import the convolution function from a CuPy package aliased as `cupyx`, whose sub-package [`cupyx.scipy`](https://docs.cupy.dev/en/stable/reference/scipy.html) performs a selection of the SciPy operations.
-We will soon verify that the GPU convolution function of `cupyx` works out the same calculations as the CPU convolution function of SciPy.
-In general, CuPy proper and NumPy are so similar as are the `cupyx` methods and SciPy; this is intended to invite programmers already familiar with NumPy and SciPy to use the GPU for computing.
-For now, let's again record the execution time on the device for the same convolution as the host, and can compare the respective performances.
+Hence, we import the convolution function from a CuPy package aliased as `cupyx`.
+Its sub-package [`cupyx.scipy`](https://docs.cupy.dev/en/stable/reference/scipy.html) performs a selection of the SciPy operations.
+We will soon verify that the convolution function of `cupyx` works out the same calculations on the GPU as the convolution function of SciPy on the CPU.
+In general, CuPy proper is as similar to NumPy as the `cupyx` methods are to SciPy; this similarity is intended to invite programmers already familiar with NumPy and SciPy to use the GPU for computing.
+For now, let's again record the execution time on the device for the same convolution as on the host, and compare the respective performances.
~~~python
from cupyx.scipy.signal import convolve2d as convolve2d_gpu
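# The timed call is elided by the diff; presumably something like the following,
# with %timeit flags chosen to match the "1 run, 7 loops" output below:
convolved_image_gpu = convolve2d_gpu(diracs_gpu, gauss_gpu)
%timeit -n 7 -r 1 convolve2d_gpu(diracs_gpu, gauss_gpu)
~~~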
-Also the execution time of the GPU convolution will depend very much on the hardware used, as seen for the host.
-The timing using a NVIDIA Tesla T4 on Google Colab was:
+The execution time of the GPU convolution will depend very much on the hardware used, just as seen for the host.
+The timing using an NVIDIA Tesla T4 on Google Colab was:
~~~output
98.2 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 7 loops each)
~~~
@@ -178,20 +179,20 @@ Impressive, but is that true?
## Measuring performance
-So far we used `timeit` to measure the performance of our Python code, no matter whether it was running on the CPU or was GPU-accelerated.
-However, the execution on the GPU is *asynchronous*: the Python interpreter takes back control of the program execution immediately, while the GPU is still executing the task.
+So far we used `timeit` to measure the performance of our Python code, regardless of whether it was running on the CPU or was GPU-accelerated.
+However, the execution on the GPU is *asynchronous*: the Python interpreter immediately takes back control of the program execution, while the GPU is still executing the task.
Therefore, the timing of `timeit` is not reliable.
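As an illustration of this point (not code from the lesson), a host-side timer only becomes meaningful after an explicit synchronization; `cp.cuda.Device().synchronize()` blocks the host until the device is idle:

~~~python
result_gpu = convolve2d_gpu(diracs_gpu, gauss_gpu)  # returns almost immediately
cp.cuda.Device().synchronize()  # block the host until all queued GPU work is done
~~~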
Conveniently, `cupyx` provides the function `benchmark()` that measures the actual execution time on the GPU.
-The following code executes `convolve2d_gpu()` with the appropriate arguments ten times, and stores inside the `.gpu_times` attribute of the variable `execution_gpu` the execution time of each run in seconds.
+The following code executes `convolve2d_gpu()` with the appropriate arguments ten times, and stores inside the `.gpu_times` attribute of the variable `benchmark_gpu` the execution time of each run in seconds.
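The cell itself lies outside this hunk; a sketch consistent with the description above, assuming the `cupyx.profiler.benchmark` utility:

~~~python
from cupyx.profiler import benchmark

# Run the GPU convolution ten times; the per-run timings (in seconds)
# end up in benchmark_gpu.gpu_times
benchmark_gpu = benchmark(convolve2d_gpu, (diracs_gpu, gauss_gpu), n_repeat=10)
~~~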
-These measurements are also more stable and representative, because `benchmark()` disregards the compile time and the repetitions warm up the GPU.
+These measurements are also more stable and representative, because `benchmark()` disregards the compile time and because the repetitions warm up the GPU.
We can then average the execution times, as follows:
~~~python
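# Elided by the diff; presumably averaging the per-run GPU timings, e.g.:
gpu_avg_time = np.average(benchmark_gpu.gpu_times)
print(f"{gpu_avg_time:.6f} s")
~~~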
@@ -215,7 +216,7 @@ If this works, it will save us the time and effort of transferring the arrays `d
::::::::::::::::::::::::::::::::::::: solution
-We can call the GPU convolution function `convolve2d_gpu()` directly with `deltas` and `gauss` as argument:
+We can call the GPU convolution function `convolve2d_gpu()` directly with `diracs` and `gauss` as arguments:
~~~python
convolve2d_gpu(diracs, gauss)
~~~
@@ -254,7 +255,7 @@ Hint: use the `cp.asnumpy()` method to copy a CuPy array back to the host.
:::::::::::::::::::::::::::::::::::::: solution
-A convenient strategy is to time the execution of a single Python function that groups the transfers to and from the GPU and the convolution, as follows:
+An effective strategy is to time the execution of a single Python function that groups the transfers to and from the GPU and the convolution, as follows:
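The solution code lies outside this hunk; a minimal sketch of such a grouping function (the function name is hypothetical, and the `benchmark()` call assumes the sketch shown earlier):

~~~python
def transfer_compute_transferback():
    diracs_gpu = cp.asarray(diracs)                 # host -> device
    gauss_gpu = cp.asarray(gauss)                   # host -> device
    convolved_gpu = convolve2d_gpu(diracs_gpu, gauss_gpu)
    return cp.asnumpy(convolved_gpu)                # device -> host

benchmark_gpu = benchmark(transfer_compute_transferback, (), n_repeat=10)
~~~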