Commit e67dd76

single_circuits benchmarks with data and plots
1 parent b848381 commit e67dd76

11 files changed, +1094 -0 lines changed

nersc/single_circuits/README.md

+267
@@ -0,0 +1,267 @@
# Benchmarking quantum circuits

## Run with Python `venv`

### `lightning-kokkos` from PyPI wheels

Create a Python venv and install the PyPI wheels:
``` bash
cd /global/common/software/m4693/

module load python
mkdir -p venv
python -m venv venv/qml_LK
source venv/qml_LK/bin/activate

cd /global/cfs/cdirs/m4693/qml-benchmarks-devel
pip install -e .  # --user

pip install ray  # for other experiments

pip install pennylane-lightning
pip install pennylane-lightning[kokkos]

pip install pennylane-catalyst
```
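
A quick way to confirm that the wheels landed in the active venv is to instantiate each device once. The following is a minimal sketch, not part of the repository: `device_available` is a hypothetical helper name, and it relies only on `qml.device`, so a missing package or backend shows up as a one-line message instead of a traceback.

```python
def device_available(name: str, wires: int = 2) -> bool:
    """Return True if PennyLane can instantiate the named device."""
    try:
        # Imported lazily so a missing PennyLane install is reported, not raised.
        import pennylane as qml

        qml.device(name, wires=wires)
    except Exception as exc:  # missing package, missing plugin, or backend init failure
        print(f"{name}: unavailable ({type(exc).__name__}: {exc})")
        return False
    print(f"{name}: OK")
    return True


if __name__ == "__main__":
    for dev_name in ("lightning.qubit", "lightning.kokkos"):
        device_available(dev_name)
```

Run it with the venv activated; a GPU build of `lightning.kokkos` may need to be checked on a GPU node instead of a login node.
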

Start an interactive job on a CPU node for testing:
``` bash
salloc -q interactive -C cpu -t 0:30:00 -A m4693

# then execute in this interactive session:

source /global/common/software/m4693/venv/qml_LK/bin/activate
cd nersc/

# to restrict the number of threads:
export OMP_NUM_THREADS=32
python3 single_circuits/demo_variational.py -q lightning.qubit -n 25
```
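
To compare thread counts without re-exporting `OMP_NUM_THREADS` by hand, the run above can be wrapped in a small driver. This is a sketch under assumptions: `build_command`, `build_env` and `time_run` are hypothetical helpers, the script is assumed to accept `-q` and `-n` exactly as in the command above, and the measured wall time includes interpreter start-up and imports, so it is an upper bound on the circuit time itself.

```python
import os
import subprocess
import sys
import time

SCRIPT = "single_circuits/demo_variational.py"  # path as used in the command above


def build_command(device: str, n: int) -> list[str]:
    """Command line for one benchmark run (-q device, -n circuit size)."""
    return [sys.executable, SCRIPT, "-q", device, "-n", str(n)]


def build_env(threads: int) -> dict[str, str]:
    """Copy of the current environment with OMP_NUM_THREADS overridden."""
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(threads)
    return env


def time_run(device: str, n: int, threads: int) -> float:
    """Wall time in seconds of one full script invocation."""
    start = time.perf_counter()
    subprocess.run(build_command(device, n), env=build_env(threads), check=True)
    return time.perf_counter() - start


# Only run the sweep when executed from the nersc/ directory.
if __name__ == "__main__" and os.path.exists(SCRIPT):
    for threads in (1, 8, 32):
        elapsed = time_run("lightning.kokkos", 25, threads)
        print(f"OMP_NUM_THREADS={threads}: {elapsed:.1f} s")
```
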
42+
43+
Stats on interactive CPU node (nid004079)
44+
```
45+
> Weights as native numpy arrays
46+
lightning.qubit
47+
15 - 0.1 s
48+
20 - 3.3 s
49+
21 - 7 s
50+
22 - 16 s
51+
23 - 35 s
52+
lightning.kokkos
53+
23 - 1 s
54+
25 - 5 s (7 s with 32 threads)
55+
26 - 34 s
56+
57+
> Benchmarking numpy/qml.numpy, gradients with "adjoint"
58+
> no-grad: qml.np.array(requires_grad=True) but no jacobian requested
59+
lightning.qubit
60+
numpy qml.np qml.np qjit qjit qjit
61+
no-grad grad comp no-grad grad
62+
15 - 0.14 0.16 1.3 10.4 0.1 error
63+
16 - 0.24 0.25 2.0 11.6 0.2
64+
17 - 0.44 0.42 3.7 12.8 0.3
65+
20 - 3.75 3.74 32.6 19.8 3.4
66+
> NotImplementedError: Converting dtype('O') to a ctypes type
67+
lightning.kokkos (with 32 threads)
68+
numpy qml.np qml.np qjit qjit qjit
69+
no-grad grad comp no-grad grad
70+
15 - 0.1 0.1 0.7 10.3 0.0
71+
20 - 0.3 0.3 2.4 16.6 0.3
72+
23 - 1.4 1.4 15.1 21.5 1.3
73+
25 - 6.9 6.9 101.1 30.7 7.3
74+
75+
> Benchmarking numpy/qml.numpy, gradients with "finite-diff"
76+
lightning.qubit
77+
numpy qml.np qml.np qjit qjit qjit
78+
no-grad grad comp no-grad grad
79+
15 - 0.1 0.2 - 9.9 0.0 42.3
80+
16 - 0.1 0.3 - 11.1 0.1 87.7
81+
17 - 0.3 0.4 - 12.3 0.2
82+
20 - 2.6 3.0 - 18.2 2.5
83+
84+
lightning.kokkos (with 32 threads)
85+
numpy qml.np qml.np qjit qjit qjit
86+
no-grad grad comp no-grad grad
87+
15 - 0.1 0.1 - 10.1 0.0 27.3
88+
20 - 0.2 0.3 - 16.1 0.3 189.4
89+
23 - 1.3 1.5 - 21.3 1.4 -
90+
25 - 6.5 6.6 - 30.4 7.6 -
91+
```
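
Each added qubit doubles the number of state-vector amplitudes, so a runtime factor of roughly 2 per qubit is the baseline to compare these numbers against. The sketch below computes that factor from the wall times in the "Weights as native numpy arrays" list; the dictionaries copy the listed values, and `growth_per_qubit` is a hypothetical helper.

```python
# Wall times in seconds, copied from the "Weights as native numpy arrays" list.
LIGHTNING_QUBIT = {20: 3.3, 21: 7.0, 22: 16.0, 23: 35.0}
LIGHTNING_KOKKOS = {23: 1.0, 25: 5.0, 26: 34.0}


def growth_per_qubit(times: dict[int, float]) -> list[tuple[int, int, float]]:
    """Runtime factor per added qubit between consecutive measured sizes.

    For sizes that are more than one qubit apart, the geometric mean
    of the factor over the gap is returned.
    """
    sizes = sorted(times)
    factors = []
    for lo, hi in zip(sizes, sizes[1:]):
        factor = (times[hi] / times[lo]) ** (1.0 / (hi - lo))
        factors.append((lo, hi, factor))
    return factors


if __name__ == "__main__":
    for label, times in (("lightning.qubit", LIGHTNING_QUBIT),
                         ("lightning.kokkos", LIGHTNING_KOKKOS)):
        for lo, hi, factor in growth_per_qubit(times):
            print(f"{label}: {lo} -> {hi}: x{factor:.2f} per qubit")
```

For `lightning.qubit` the factors stay between about 2.1 and 2.3, while the `lightning.kokkos` step from 25 to 26 is about 6.8, well above plain doubling.
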

### `lightning-kokkos` from source with CUDA

References for building lightning-kokkos with GPU support:
- https://pypi.org/project/PennyLane-Lightning-Kokkos/
- https://docs.pennylane.ai/projects/lightning/en/stable/lightning_kokkos/installation.html
- https://github.com/PennyLaneAI/lightning-on-hpc/blob/main/DataCollection/distributed/LUMI_LKOKKOS_VQE/README.md-

``` bash
cd /global/common/software/m4693/

module load cudatoolkit

module load python
mkdir -p venv
python -m venv venv/qml_LK_GPU
source venv/qml_LK_GPU/bin/activate

python -m pip install pip==22.0

git clone https://github.com/PennyLaneAI/pennylane-lightning.git
cd pennylane-lightning

git checkout v0.36.0

pip install -r requirements.txt
pip install ray

# pip install pennylane-catalyst # [added later]

# install lightning-qubit as prerequisite
CXX=$(which CC) python -m pip install -e . --verbose

CXX=$(which CC) CMAKE_ARGS="-DKokkos_ENABLE_OPENMP=ON -DKokkos_ENABLE_CUDA=ON -DKokkos_ARCH_AMPERE80:BOOL=ON -DCMAKE_CXX_COMPILER=$(which CC)" PL_BACKEND="lightning_kokkos" python -m pip install . --verbose
```

Start an interactive job on a GPU node for testing:
``` bash
salloc -q interactive -C gpu -t 0:30:00 -A m4693

# then execute in this interactive session:

source /global/common/software/m4693/venv/qml_LK_GPU/bin/activate
cd nersc/

# to restrict the number of threads:
#export OMP_NUM_THREADS=1

python3 single_circuits/demo_variational.py -q lightning.kokkos -n 25
```

Stats on interactive GPU node (nid200381)
```
lightning.kokkos
23 - s
25 - 3 s
26 - 6 s
27 - 12 s
28 - 25 s

> Benchmarking numpy/qml.numpy, gradients
> no-grad: qml.np.array(requires_grad=True) but no jacobian requested
lightning.kokkos
numpy qml.np jacobian qjit qjit qjit
no-grad grad comp no-grad grad
22 - s 2 s 5 s 20 s 1 s
23 - s 4 s 9 s
25 - 3 s 18 s 37 s 50 s 24 s
26 - 6 s
27 - 12 s
28 - 25 s
> Kokkos::Cuda ERROR: Failed to call Kokkos::Cuda::finalize()

> Benchmarking numpy/qml.numpy, gradients with "finite-diff"
lightning.kokkos
numpy qml.np qml.np qjit qjit qjit
no-grad grad comp no-grad grad
15 - 10.3 0.1 52.7
20 - 16.5 0.4
22 - 0.3 0.5 - 20.4 1.1
23 - 0.6 0.8 - 22.9 2.8
25 - 2.6 2.9 - 49.1 24.0
26 - 5.6 5.8 -

```
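
The largest circuit that fits on a device is bounded by the memory of its state vector, which holds 2^n complex amplitudes for n qubits. As a rough estimate, assuming double-precision complex amplitudes of 16 bytes each (the precision is an assumption here, not something the runs above state), the footprint can be sketched as follows; `statevector_bytes` is a hypothetical helper.

```python
def statevector_bytes(n_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Memory of one dense state vector with 2**n_qubits amplitudes.

    bytes_per_amplitude=16 assumes complex128; use 8 for complex64.
    """
    return (2 ** n_qubits) * bytes_per_amplitude


if __name__ == "__main__":
    for n in (25, 26, 27, 28, 30):
        gib = statevector_bytes(n) / 2**30
        print(f"{n} qubits: {gib:.2f} GiB")
```

Under this assumption 28 qubits need 4 GiB and 30 qubits need 16 GiB for a single state vector; gradient methods and library workspace typically need additional memory on top of that.
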

Run a batch of circuits in parallel:
``` bash
# @ray.remote(num_gpus=0.5) has the same runtime as num_gpus=1
time python3 single_circuits/batch_variational.py -n 26 -s 4

# move the task to the background and monitor GPU usage
nvidia-smi
```

Stats on 1 interactive GPU node
```
ray_init in 7 to 15 s
> How long does 1 circuit run on its GPU?
25 features
samples  run_time  run_time/sample*gpu
-        3
16       32        8
26 features
samples  run_time  run_time/sample*gpu
-        6
4        10        10
8        23        11
16       39        10
32       77        10
> create dev 1.8 s
> create circuit < 1 ms
27 features
samples  run_time  run_time/sample*gpu
-        12
4        16        16
8        31        15
> Overhead of 4 s per circuit with Ray
> This includes creating dev + circuit

30 features
samples  run_time  run_time/sample*gpu
-        n.a.
4        120       120
> create dev 3.3 s
> create circuit < 1 ms

> Run r circuits sequentially within 1 Ray job:
batch_variational.py -n 26 -s 32 -r 8
total: 48.949 s
per_circuit: 6.119 s
> per-circuit runtime is equivalent to a run without Ray
```
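
The third column and the quoted Ray overhead can be reproduced with a few lines of arithmetic. The sketch below assumes 4 GPUs on the node; that count is not stated above but is inferred from the table, where `run_time / samples * 4` matches the third column to within about a second. `gpu_seconds_per_sample` and `ray_overhead` are hypothetical helpers, and the rows marked `-` are read as the single-circuit time without Ray.

```python
N_GPUS = 4  # assumption: inferred from the table, not stated in the notes


def gpu_seconds_per_sample(samples: int, run_time: float, n_gpus: int = N_GPUS) -> float:
    """GPU time spent per circuit when samples are spread over n_gpus."""
    return run_time / samples * n_gpus


def ray_overhead(samples: int, run_time: float, single: float, n_gpus: int = N_GPUS) -> float:
    """Extra seconds per circuit compared with the single-circuit time."""
    return gpu_seconds_per_sample(samples, run_time, n_gpus) - single


# features -> (single-circuit time in s, [(samples, run_time in s), ...])
MEASURED = {
    25: (3, [(16, 32)]),
    26: (6, [(4, 10), (8, 23), (16, 39), (32, 77)]),
    27: (12, [(4, 16), (8, 31)]),
}

if __name__ == "__main__":
    for features, (single, runs) in MEASURED.items():
        for samples, run_time in runs:
            per = gpu_seconds_per_sample(samples, run_time)
            over = ray_overhead(samples, run_time, single)
            print(f"{features} features, {samples} samples: "
                  f"{per:.1f} s per circuit, overhead {over:.1f} s")
    # Sequential mode (-r 8): total time divided by circuits per Ray job.
    print(f"sequential: {48.949 / 8:.3f} s per circuit")
```

With these inputs the overhead for 26 and 27 features comes out at roughly 4 s per circuit, and the sequential run gives about 6.12 s per circuit, close to the 6 s measured without Ray.
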

## Run in `podman` containers

Prerequisite: make sure the datasets are available in `single_circuits/linearly_separable`.

Start an interactive job on a CPU node for testing:
``` bash
salloc -q interactive -C cpu -t 0:30:00 -A m4693

# then execute in this interactive session:

IMG=tgermain/ubu22-pennylane-ray

# For preliminary testing whether the image is available on the node:
CFSH=/global/cfs/cdirs/m4693  # CFS home
REPO_DIR=$CFSH/qml-benchmarks-devel  # qml-benchmark repo
ROOT_DIR=$REPO_DIR/nersc/root  # to access local python packages
WORK_DIR=$REPO_DIR/nersc  # to store output files
# Mount /tmp to avoid the following error with Ray:
# ValueError: Can't find a `node_ip_address.json` file

podman-hpc run -it \
    --net host \
    --volume /tmp:/tmp \
    --volume $ROOT_DIR:/root \
    --volume $REPO_DIR:/qml-benchmarks \
    --volume $WORK_DIR:/work_dir \
    --workdir /work_dir \
    -e HDF5_USE_FILE_LOCKING='FALSE' \
    --shm-size=10.24gb \
    $IMG bash

# Then execute in the container, in `work_dir/`:

python3 single_circuits/circuit_variational.py --model IQPVariationalClassifier --numFeatures 21 --inputPath single_circuits/linearly_separable/

python3 single_circuits/demo_variational.py

# exit container

# Run container interactively with wrapper
./wrap_podman.sh $IMG "python3 single_circuits/demo_variational.py"
```
