# Benchmarking quantum circuits

## Run with Python `venv`

### `lightning-kokkos` from PyPI wheels

Create a Python venv and install the PyPI wheels:
``` bash
cd /global/common/software/m4693/

module load python
mkdir -p venv
python -m venv venv/qml_LK
source venv/qml_LK/bin/activate

cd /global/cfs/cdirs/m4693/qml-benchmarks-devel
pip install -e . # --user

pip install ray # for other experiments

pip install pennylane-lightning
pip install "pennylane-lightning[kokkos]"

pip install pennylane-catalyst
```
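
After activating the venv, a quick import check can confirm that the packages above were installed into it. This is a minimal sketch: `check_module` is a hypothetical helper, not part of the repository, and it only tests that each Python module can be imported by the `python3` on `PATH`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of qml-benchmarks): report whether each
# Python module can be imported by the python3 currently on PATH.
# Returns non-zero if at least one import fails.
check_module() {
  local mod status=0
  for mod in "$@"; do
    if python3 -c "import $mod" >/dev/null 2>&1; then
      echo "ok: $mod"
    else
      echo "missing: $mod"
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_module pennylane pennylane_lightning catalyst ray` inside the activated `qml_LK` venv.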

Start an interactive job on a CPU node for testing:
``` bash
salloc -q interactive -C cpu -t 0:30:00 -A m4693

# and execute in this interactive session:

source /global/common/software/m4693/venv/qml_LK/bin/activate
cd nersc/

# to restrict the number of threads:
export OMP_NUM_THREADS=32
python3 single_circuits/demo_variational.py -q lightning.qubit -n 25
```
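
The timing tables list one wall-clock time per value of `-n`. Such a sweep could be scripted as follows. This is a sketch assuming bash: `sweep_n` is a hypothetical helper, not something shipped in `nersc/`, and the measured time includes Python start-up and imports, so it is an upper bound on the circuit runtime itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): run a command once per
# value of -n and print "n - seconds" lines in the style of the stats tables.
sweep_n() {
  local n_list=$1          # space-separated values, e.g. "20 21 22 23"
  shift                    # the remaining arguments form the command
  local TIMEFORMAT='%R'    # make bash's `time` print only wall-clock seconds
  local n t
  for n in $n_list; do
    # the command's own output is discarded; only the timing is captured
    t=$( { time "$@" -n "$n" >/dev/null 2>&1; } 2>&1 )
    echo "$n - $t s"
  done
}
```

For example: `sweep_n "20 21 22 23" python3 single_circuits/demo_variational.py -q lightning.qubit`.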

Timings (in seconds) on an interactive CPU node (nid004079); the first column is the value passed to `-n`:
```
> Weights as native numpy arrays
lightning.qubit
 15 - 0.1 s
 20 - 3.3 s
 21 - 7 s
 22 - 16 s
 23 - 35 s
lightning.kokkos
 23 - 1 s
 25 - 5 s (7 s with 32 threads)
 26 - 34 s

> Benchmarking numpy/qml.numpy, gradients with "adjoint"
> no-grad: qml.np.array(requires_grad=True), but no Jacobian requested
lightning.qubit
 n    numpy  qml.np   qml.np  qjit   qjit     qjit
             no-grad  grad    comp   no-grad  grad
 15   0.14   0.16     1.3     10.4   0.1      error
 16   0.24   0.25     2.0     11.6   0.2
 17   0.44   0.42     3.7     12.8   0.3
 20   3.75   3.74     32.6    19.8   3.4
> NotImplementedError: Converting dtype('O') to a ctypes type
lightning.kokkos (with 32 threads)
 n    numpy  qml.np   qml.np  qjit   qjit     qjit
             no-grad  grad    comp   no-grad  grad
 15   0.1    0.1      0.7     10.3   0.0
 20   0.3    0.3      2.4     16.6   0.3
 23   1.4    1.4      15.1    21.5   1.3
 25   6.9    6.9      101.1   30.7   7.3

> Benchmarking numpy/qml.numpy, gradients with "finite-diff"
lightning.qubit
 n    numpy  qml.np   qml.np  qjit   qjit     qjit
             no-grad  grad    comp   no-grad  grad
 15   0.1    0.2      -       9.9    0.0      42.3
 16   0.1    0.3      -       11.1   0.1      87.7
 17   0.3    0.4      -       12.3   0.2
 20   2.6    3.0      -       18.2   2.5

lightning.kokkos (with 32 threads)
 n    numpy  qml.np   qml.np  qjit   qjit     qjit
             no-grad  grad    comp   no-grad  grad
 15   0.1    0.1      -       10.1   0.0      27.3
 20   0.2    0.3      -       16.1   0.3      189.4
 23   1.3    1.5      -       21.3   1.4      -
 25   6.5    6.6      -       30.4   7.6      -
```
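
The `lightning.qubit` timings grow by roughly a factor of two per added qubit, as expected for a state-vector simulator whose state holds 2^n amplitudes. A quick way to check this from the table, as a sketch: `growth_factors` is a hypothetical helper that reads `n seconds` pairs (sorted by n) and prints the average growth factor per qubit between consecutive rows, i.e. (t2/t1)^(1/(n2-n1)), so gaps such as 15 to 20 are handled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "n seconds" pairs (sorted by n) from stdin and
# print the average runtime growth factor per added qubit between rows.
growth_factors() {
  awk 'NR > 1 {
         printf "%d->%d: x%.2f per qubit\n", prev_n, $1, ($2 / prev_t) ^ (1 / ($1 - prev_n))
       }
       { prev_n = $1; prev_t = $2 }'
}

# lightning.qubit, weights as native numpy arrays (values from the table above)
printf '%s\n' "15 0.1" "20 3.3" "21 7" "22 16" "23 35" | growth_factors
```

This prints `15->20: x2.01 per qubit`, `20->21: x2.12 per qubit`, `21->22: x2.29 per qubit` and `22->23: x2.19 per qubit`.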

### `lightning-kokkos` from source with CUDA

References for building `lightning-kokkos` with GPU support:
- https://pypi.org/project/PennyLane-Lightning-Kokkos/
- https://docs.pennylane.ai/projects/lightning/en/stable/lightning_kokkos/installation.html
- https://github.com/PennyLaneAI/lightning-on-hpc/blob/main/DataCollection/distributed/LUMI_LKOKKOS_VQE/README.md

``` bash
cd /global/common/software/m4693/

module load cudatoolkit

module load python
mkdir -p venv
python -m venv venv/qml_LK_GPU
source venv/qml_LK_GPU/bin/activate

python -m pip install pip==22.0

git clone https://github.com/PennyLaneAI/pennylane-lightning.git
cd pennylane-lightning

git checkout v0.36.0

pip install -r requirements.txt
pip install ray

# pip install pennylane-catalyst # [added later]

# install lightning-qubit as a prerequisite
CXX=$(which CC) python -m pip install -e . --verbose

# build lightning-kokkos with the OpenMP and CUDA backends for Ampere (A100, sm_80) GPUs
CXX=$(which CC) \
CMAKE_ARGS="-DKokkos_ENABLE_OPENMP=ON -DKokkos_ENABLE_CUDA=ON -DKokkos_ARCH_AMPERE80:BOOL=ON -DCMAKE_CXX_COMPILER=$(which CC)" \
PL_BACKEND="lightning_kokkos" \
python -m pip install . --verbose
```
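
The source build relies on the Cray `CC` compiler wrapper and on the CUDA toolkit loaded with `module load cudatoolkit`. A minimal pre-flight check that could be run before the two `pip install` builds, as a sketch: `require_cmds` is a hypothetical helper, and the assumption that the `cudatoolkit` module puts `nvcc` on `PATH` is about the NERSC environment, not something stated above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify that every named command is on PATH.
# Prints one line per missing command to stderr; returns non-zero if any is missing.
require_cmds() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `require_cmds CC nvcc git && echo "ready to build"`.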

Start an interactive job on a GPU node for testing:
``` bash
salloc -q interactive -C gpu -t 0:30:00 -A m4693

# and execute in this interactive session:

source /global/common/software/m4693/venv/qml_LK_GPU/bin/activate
cd nersc/

# to restrict the number of threads:
#export OMP_NUM_THREADS=1

python3 single_circuits/demo_variational.py -q lightning.kokkos -n 25
```

Timings (in seconds) on an interactive GPU node (nid200381); the first column is the value passed to `-n`:
```
lightning.kokkos
 23 - s
 25 - 3 s
 26 - 6 s
 27 - 12 s
 28 - 25 s

> Benchmarking numpy/qml.numpy, gradients
> no-grad: qml.np.array(requires_grad=True), but no Jacobian requested
lightning.kokkos
 n    numpy  qml.np   jacobian  qjit   qjit     qjit
             no-grad  grad      comp   no-grad  grad
 22          2        5         20     1
 23          4        9
 25   3      18       37        50     24
 26   6
 27   12
 28   25
> Kokkos::Cuda ERROR: Failed to call Kokkos::Cuda::finalize()

> Benchmarking numpy/qml.numpy, gradients with "finite-diff"
lightning.kokkos
 n    numpy  qml.np   qml.np  qjit   qjit     qjit
             no-grad  grad    comp   no-grad  grad
 15                           10.3   0.1      52.7
 20                           16.5   0.4
 22   0.3    0.5      -       20.4   1.1
 23   0.6    0.8      -       22.9   2.8
 25   2.6    2.9      -       49.1   24.0
 26   5.6    5.8      -
```

Run a batch of circuits in parallel:
``` bash
# @ray.remote(num_gpus=0.5) has the same runtime as num_gpus=1
time python3 single_circuits/batch_variational.py -n 26 -s 4

# move the task to the background (Ctrl+Z, then `bg`) and monitor GPU usage:
nvidia-smi
```
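
Moving the batch run to the background and watching the GPUs can also be automated. As a sketch, `run_and_monitor` is a hypothetical helper that starts a command in the background, runs a monitoring command every few seconds until the job exits, and then returns the job's exit status. The `nvidia-smi --query-gpu=... --format=csv` query in the usage line is a standard `nvidia-smi` option; the field list is just an example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a job in the background and poll a monitor
# command every `interval` seconds until the job exits.
run_and_monitor() {
  local interval=$1 monitor=$2
  shift 2
  "$@" &                        # start the job in the background
  local pid=$!
  while kill -0 "$pid" 2>/dev/null; do
    bash -c "$monitor"          # e.g. an nvidia-smi query
    sleep "$interval"
  done
  wait "$pid"                   # propagate the job's exit status
}
```

For example: `run_and_monitor 10 "nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader" python3 single_circuits/batch_variational.py -n 26 -s 4`. Unlike `nvidia-smi -l`, the polling stops as soon as the batch job finishes.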

Stats on one interactive GPU node (4 GPUs):
```
ray_init: 7 to 15 s
> How long does 1 circuit run on its GPU?
> run_time in s; run_time/sample*gpu ≈ run_time / samples * 4 GPUs
> samples "-": single circuit without Ray, for reference
25 features
  samples  run_time  run_time/sample*gpu
  -        3
  16       32        8
26 features
  samples  run_time  run_time/sample*gpu
  -        6
  4        10        10
  8        23        11
  16       39        10
  32       77        10
> create device: 1.8 s
> create circuit: < 1 ms
27 features
  samples  run_time  run_time/sample*gpu
  -        12
  4        16        16
  8        31        15
> Overhead of about 4 s per circuit with Ray,
> including creating the device and the circuit

30 features
  samples  run_time  run_time/sample*gpu
  -        n.a.
  4        120       120
> create device: 3.3 s
> create circuit: < 1 ms

> Run r circuits sequentially within one Ray job:
batch_variational.py -n 26 -s 32 -r 8
  total:       48.949 s
  per_circuit:  6.119 s
> the per-circuit runtime matches the run without Ray (6 s for 26 features)
```
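
The `run_time/sample*gpu` column normalizes the batch runtime to the time one circuit keeps one GPU busy: run_time / samples × 4, since a Perlmutter GPU node has 4 A100 GPUs. For 26 features and 32 samples, 77 s / 32 × 4 ≈ 9.6 s, against 6 s without Ray, which is consistent with the roughly 4 s Ray overhead noted in the stats. A sketch that recomputes the column (the helper name `per_gpu_time` and its GPU-count argument are illustrative assumptions):

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "samples run_time" pairs from stdin and print
# run_time / samples * n_gpus, i.e. the time one circuit keeps one GPU busy
# when the samples are spread evenly over n_gpus GPUs.
per_gpu_time() {
  local n_gpus=${1:-4}   # Perlmutter GPU nodes have 4 GPUs
  awk -v g="$n_gpus" '{ printf "%d %.1f\n", $1, $2 / $1 * g }'
}

# 26 features, values from the table above
printf '%s\n' "4 10" "8 23" "16 39" "32 77" | per_gpu_time 4
```

The results (10.0, 11.5, about 9.8 and 9.6 s) are close to the table's 10, 11, 10 and 10.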

## Run in `podman` containers

Prerequisite: make sure the datasets are available in `single_circuits/linearly_separable`.
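
A quick check that the dataset directory exists and is not empty can save a failed container run. A sketch with a hypothetical `require_datasets` helper, meant to be run from `nersc/` (mounted as the container's `/work_dir`); it only counts non-hidden entries and does not validate file contents.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail early if a dataset directory is missing or empty.
require_datasets() {
  local dir=$1 f count=0
  if [ ! -d "$dir" ]; then
    echo "dataset directory not found: $dir" >&2
    return 1
  fi
  for f in "$dir"/*; do
    # an empty directory leaves the literal pattern, which does not exist
    if [ -e "$f" ]; then count=$((count + 1)); fi
  done
  if [ "$count" -eq 0 ]; then
    echo "dataset directory is empty: $dir" >&2
    return 1
  fi
  echo "found $count entries in $dir"
}
```

For example: `require_datasets single_circuits/linearly_separable`.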

Start an interactive job on a CPU node for testing:
``` bash
salloc -q interactive -C cpu -t 0:30:00 -A m4693

# and execute in this interactive session:

IMG=tgermain/ubu22-pennylane-ray

# For preliminary testing (e.g. whether the image is available on the node):
CFSH=/global/cfs/cdirs/m4693          # CFS home
REPO_DIR=$CFSH/qml-benchmarks-devel   # qml-benchmarks repo
ROOT_DIR=$REPO_DIR/nersc/root         # to access local Python packages
WORK_DIR=$REPO_DIR/nersc              # to store output files
# Mount /tmp to avoid the following error with Ray:
# ValueError: Can't find a `node_ip_address.json` file

podman-hpc run -it \
    --net host \
    --volume /tmp:/tmp \
    --volume "$ROOT_DIR":/root \
    --volume "$REPO_DIR":/qml-benchmarks \
    --volume "$WORK_DIR":/work_dir \
    --workdir /work_dir \
    -e HDF5_USE_FILE_LOCKING='FALSE' \
    --shm-size=10.24gb \
    "$IMG" bash

# Then execute inside the container, in `/work_dir`:

python3 single_circuits/circuit_variational.py --model IQPVariationalClassifier --numFeatures 21 --inputPath single_circuits/linearly_separable/

python3 single_circuits/demo_variational.py

# exit the container

# Alternatively, run a command in the container with the wrapper script:
./wrap_podman.sh "$IMG" "python3 single_circuits/demo_variational.py"
```
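
For non-interactive runs, the `podman-hpc run` options above can be collected in a bash array, which keeps the quoting of the command intact. This is a hypothetical sketch, not the repository's `wrap_podman.sh` (whose contents are not shown here): it reuses the mounts and options above without `-it`, assumes the same `nersc/root` layout, and with `DRY_RUN=1` only prints the assembled command.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the repository's wrap_podman.sh:
# run_in_container IMG REPO_DIR CMD runs CMD with bash inside IMG,
# using the mounts from the interactive example. DRY_RUN=1 only prints it.
run_in_container() {
  local img=$1 repo_dir=$2 cmd=$3
  # /tmp is mounted for Ray; nersc/root holds local Python packages;
  # nersc is the working directory for output files.
  local -a args=(
    podman-hpc run
    --net host
    --volume /tmp:/tmp
    --volume "$repo_dir/nersc/root:/root"
    --volume "$repo_dir:/qml-benchmarks"
    --volume "$repo_dir/nersc:/work_dir"
    --workdir /work_dir
    -e HDF5_USE_FILE_LOCKING=FALSE
    --shm-size=10.24gb
    "$img" bash -c "$cmd"
  )
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${args[*]}"   # space-joined preview; quoting is not shown
  else
    "${args[@]}"
  fi
}
```

For example: `run_in_container "$IMG" "$REPO_DIR" "python3 single_circuits/demo_variational.py"`.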