Skip to content

Commit 984c4ce

Browse files
committed
Merge branch 'development' of github.com:gjbex/Python-for-HPC into development
2 parents 3c16f0f + aec988b commit 984c4ce

13 files changed

+5894
-943
lines changed

python_for_hpc.pptx

131 KB
Binary file not shown.

source-code/README.md

+6
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,7 @@
33
This is source code that is either used in the presentation, or was developed
44
to create it. There is some material not covered in the presentation as well.
55

6+
67
## Requirements
78

89
* Python version: at least 3.6
@@ -18,6 +19,10 @@ to create it. There is some material not covered in the presentation as well.
1819
* jupyter
1920
* ipywidgets
2021

22+
* For the GPU code:
23+
* pycuda
24+
* scikit-cuda
25+
2126

2227
## What is it?
2328

@@ -39,3 +44,4 @@ to create it. There is some material not covered in the presentation as well.
3944
1. `pypy`: code to experiment with the Pypy interpreter.
4045
1. `file-formats`: influcence of file formats on performance.
4146
1. `gpu`: some examples of using GPUs.
47+
1. `performance`: general considerations about performance.

source-code/gpu/README.md

+14
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,14 @@
1+
# GPU
2+
3+
Sample code for performing computations on a GPU.
4+
5+
6+
## What is it?
7+
8+
1. `pycuda.ipynb`: jupyter notebook illustrating pyCUDA.
9+
1. `curand.ipynb`: jupyter notebook illustrating generating random
10+
numbers on a GPU.
11+
1. `scikit_cuda.ipynb`: jupyter notebook illustrating linear algebra
12+
on a GPU device.
13+
1. `numba.ipynb`: jupyter notebook illustrating using numba for
14+
GPU computing.

source-code/gpu/curand.ipynb

+199
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,199 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"id": "2a4181f6-f103-4025-a45c-5bd31bbc3d12",
6+
"metadata": {},
7+
"source": [
8+
"# Requirements"
9+
]
10+
},
11+
{
12+
"cell_type": "code",
13+
"execution_count": 1,
14+
"id": "eac06215-e0dd-430b-a157-a5aab4904b65",
15+
"metadata": {},
16+
"outputs": [],
17+
"source": [
18+
"import math\n",
19+
"import numpy as np\n",
20+
"import pycuda.autoinit\n",
21+
"from pycuda import gpuarray\n",
22+
"from pycuda.compiler import SourceModule"
23+
]
24+
},
25+
{
26+
"cell_type": "markdown",
27+
"id": "5832d66d-0446-499c-a17e-ffcac9bdbd0a",
28+
"metadata": {},
29+
"source": [
30+
"We determine $\\pi$ as the ratio between a circle of radius 1 and the square that circumscribes it. The area of the circle will be approximated by the number of randomly selected points that fall into it, compared to the (larger) number of points that fall into the square.\n",
31+
"\n",
32+
"If we choose $x$ and $y$ independently from a uniform distribution $[0, 1[$, then $(x, y)$ represents a point and lies in a circle with radius 1 if $x^2 + y^2 \\le 1$. Since this is only one quarter of the circle and circumscribing square, we get $\\pi$ by dividing the number of points in the circle by the total number of points, and multiplying by 4."
33+
]
34+
},
35+
{
36+
"cell_type": "markdown",
37+
"id": "32ca5892-664f-4317-a41d-b4f7a4d3f156",
38+
"metadata": {},
39+
"source": [
40+
"# Implementation"
41+
]
42+
},
43+
{
44+
"cell_type": "markdown",
45+
"id": "2bc4dbd6-5018-427d-9ee9-347bfe6cd9d9",
46+
"metadata": {},
47+
"source": [
48+
"To generate random numbers for each thread on the GPU, we use the curand library, which is a C++ library. Since we use plain C for our CUDA code, we have to make sure that the header file is read ouside of an extermal C block. The `SourceModule` add this automatically by default, so we make sure this isn't done by specifying the appropriate option. In the source code, we implement our kernel in an external C blcok.\n",
49+
"\n",
50+
"The random number generator is intialized using the `curand_init` function that takes a seed as its first argument. We ensure that it is unique for each thread by adding a thread-specific constant to the clock time.\n",
51+
"\n",
52+
"Random numbers are sampled from a uniform distribution using the `curand_uniform` function."
53+
]
54+
},
55+
{
56+
"cell_type": "code",
57+
"execution_count": 48,
58+
"id": "122a78d1-9d07-4484-8f30-a01ed58a9715",
59+
"metadata": {},
60+
"outputs": [],
61+
"source": [
62+
"source_code = '''\n",
63+
" #include <curand_kernel.h>\n",
64+
" \n",
65+
" typedef unsigned long long cu_long;\n",
66+
" \n",
67+
" extern \"C\" {\n",
68+
" __global__ void estimate_pi(cu_long nr_tries, cu_long *nr_hits) {\n",
69+
" curandState rand_state;\n",
70+
" int thread_id = blockIdx.x*blockDim.x + threadIdx.x;\n",
71+
" curand_init((cu_long) clock() + (cu_long) thread_id,\n",
72+
" (cu_long) 0, (cu_long) 0, &rand_state);\n",
73+
" float x, y;\n",
74+
" for (cu_long i = 0; i < nr_tries; ++i) {\n",
75+
" x = curand_uniform(&rand_state);\n",
76+
" y = curand_uniform(&rand_state);\n",
77+
" if (x*x + y*y < 1.0f) {\n",
78+
" nr_hits[thread_id]++;\n",
79+
" }\n",
80+
" }\n",
81+
" }\n",
82+
" }\n",
83+
"'''\n",
84+
"\n",
85+
"kernels = SourceModule(no_extern_c=True, source=source_code)\n",
86+
"pi_kernel = kernels.get_function('estimate_pi')"
87+
]
88+
},
89+
{
90+
"cell_type": "markdown",
91+
"id": "e54d0d19-9545-46b6-b79f-5a3a110e7a66",
92+
"metadata": {},
93+
"source": [
94+
"Set the number of threads per block and the number of blocks per grid, and create an array of the appropriate size to store the counts for each thread. Also specify the number of points to try"
95+
]
96+
},
97+
{
98+
"cell_type": "code",
99+
"execution_count": 44,
100+
"id": "71134a08-5195-4205-ae1c-51c7298704e7",
101+
"metadata": {},
102+
"outputs": [],
103+
"source": [
104+
"threads_per_block = 32\n",
105+
"blocks_per_grid = 512\n",
106+
"total_threads = threads_per_block*blocks_per_grid\n",
107+
"nr_hits = gpuarray.zeros((total_threads, ), dtype=np.uint64)\n",
108+
"nr_tries = np.uint64(2**24)"
109+
]
110+
},
111+
{
112+
"cell_type": "markdown",
113+
"id": "84c1a177-7a4f-4b27-9a11-4b318601ac4d",
114+
"metadata": {},
115+
"source": [
116+
"Now we can execute the kernel and compute the value of $\\pi$."
117+
]
118+
},
119+
{
120+
"cell_type": "code",
121+
"execution_count": 45,
122+
"id": "a872e28c-9256-4d50-b46c-f8381131eda1",
123+
"metadata": {},
124+
"outputs": [],
125+
"source": [
126+
"pi_kernel(nr_tries, nr_hits, grid=(blocks_per_grid, 1, 1), block=(threads_per_block, 1, 1))"
127+
]
128+
},
129+
{
130+
"cell_type": "code",
131+
"execution_count": 46,
132+
"id": "53ddbb43-0f22-427e-83a6-c6d0264e958e",
133+
"metadata": {},
134+
"outputs": [],
135+
"source": [
136+
"pi_computed = 4.0*np.sum(nr_hits.get())/(nr_tries*total_threads)"
137+
]
138+
},
139+
{
140+
"cell_type": "markdown",
141+
"id": "19bbb167-50b8-4bdc-8c92-1fb2f69d1e1b",
142+
"metadata": {},
143+
"source": [
144+
"Checking the accuracy as comared with $\\pi$'s true value shows that it is correct upto a millionth."
145+
]
146+
},
147+
{
148+
"cell_type": "code",
149+
"execution_count": 47,
150+
"id": "f86f0258-64c8-44a0-b8b3-5a6f5d24e362",
151+
"metadata": {},
152+
"outputs": [
153+
{
154+
"name": "stdout",
155+
"output_type": "stream",
156+
"text": [
157+
"1.0e-01 True\n",
158+
"1.0e-02 True\n",
159+
"1.0e-03 True\n",
160+
"1.0e-04 True\n",
161+
"1.0e-05 True\n",
162+
"1.0e-06 True\n",
163+
"1.0e-07 False\n",
164+
"1.0e-08 False\n",
165+
"1.0e-09 False\n",
166+
"1.0e-10 False\n",
167+
"1.0e-11 False\n",
168+
"1.0e-12 False\n"
169+
]
170+
}
171+
],
172+
"source": [
173+
"for tol in np.logspace(-1, -12, num=12):\n",
174+
" print(f'{tol:.1e} {math.isclose(pi_computed, math.pi, rel_tol=tol)}')"
175+
]
176+
}
177+
],
178+
"metadata": {
179+
"kernelspec": {
180+
"display_name": "Python 3 (ipykernel)",
181+
"language": "python",
182+
"name": "python3"
183+
},
184+
"language_info": {
185+
"codemirror_mode": {
186+
"name": "ipython",
187+
"version": 3
188+
},
189+
"file_extension": ".py",
190+
"mimetype": "text/x-python",
191+
"name": "python",
192+
"nbconvert_exporter": "python",
193+
"pygments_lexer": "ipython3",
194+
"version": "3.9.7"
195+
}
196+
},
197+
"nbformat": 4,
198+
"nbformat_minor": 5
199+
}

0 commit comments

Comments
 (0)