Commit 1fde1ab

AMReX-MPMD - scripts for Perlmutter
1 parent aa792b1 commit 1fde1ab

12 files changed: +216 -2 lines

Docs/source/MPMD_Tutorials.rst

Lines changed: 61 additions & 2 deletions
@@ -1,8 +1,10 @@
+.. _tutorials_mpmd:
+
 AMReX-MPMD
 ==========

 AMReX-MPMD utilizes the Multiple Program Multiple Data (MPMD) feature of MPI to provide cross-functionality for AMReX-based codes.
-The framework enables data transfer across two different applications through **MPMD::Copier** class, which takes **BoxArray** of its application as an argument.
+The framework enables data transfer across two different applications through the **MPMD::Copier** class, which typically takes the **BoxArray** of its application as an argument.
 **Copier** instances created in both the applications together identify the overlapping cells for which the data transfer must occur.
 **Copier::send** & **Copier::recv** functions, which take a **MultiFab** as an argument, are used to transfer the desired data of overlapping regions.
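
For reference, a minimal C++ sketch of the pattern described in this hunk, constructing an **MPMD::Copier** from the application's **BoxArray** and exchanging one **MultiFab** component, is shown below. It is an illustration only, not code from this commit; the ``MPMD::Initialize``/``Copier``/``send``/``recv`` calls follow the AMReX MPMD interface as used in the tutorial sources, but the exact signatures should be checked against ``AMReX_MPMD.H``.

.. code-block:: cpp

    // Minimal sketch of one side of an MPMD pair (mirrors main_1.cpp of Case-1).
    #include <AMReX.H>
    #include <AMReX_MPMD.H>
    #include <AMReX_MultiFab.H>

    int main (int argc, char* argv[])
    {
        // Split MPI_COMM_WORLD between the two programs launched together.
        MPI_Comm comm = amrex::MPMD::Initialize(argc, argv);
        amrex::Initialize(argc, argv, true, comm);
        {
            // Domain and BoxArray of *this* application.
            amrex::Box domain(amrex::IntVect(0), amrex::IntVect(31));
            amrex::BoxArray ba(domain);
            ba.maxSize(16);
            amrex::DistributionMapping dm(ba);

            // Two-component MultiFab: component 0 is sent, component 1 is received.
            amrex::MultiFab mf(ba, dm, 2, 0);
            mf.setVal(1.0, 0, 1);   // populate component 0

            // The Copier identifies the cells that overlap with the other application.
            amrex::MPMD::Copier copier(ba, dm);
            copier.send(mf, 0, 1);  // send component 0
            copier.recv(mf, 1, 1);  // receive into component 1
        }
        amrex::Finalize();
        amrex::MPMD::Finalize();
    }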

@@ -22,6 +24,8 @@ Overview

 The domain in ``main_1.cpp`` is set to ``lo = {0, 0, 0}`` and ``hi = {31, 31, 31}``, while the domain in ``main_2.cpp`` is set to ``lo = {16, 16, 16}`` and ``hi = {31, 31, 31}``.
 Hence, the data transfer will occur for the region ``lo = {16, 16, 16}`` and ``hi = {31, 31, 31}``.
+Furthermore, the domain in ``main_1.cpp`` is split into boxes using ``max_grid_size=16``, while the domain in ``main_2.cpp`` is split using ``max_grid_size=8``.
+Therefore, the **BoxArray**, and hence the number of boxes covering the overlapping region, differ across the two applications.
 The data transfer demonstration is performed using a two component *MultiFab*.
 The first component is populated in ``main_1.cpp`` before it is transferred to ``main_2.cpp``.
 The second component is populated in ``main_2.cpp`` based on the received first component.
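
To make the decomposition difference concrete, here is a small, hedged C++ illustration (not part of this commit) of the two domains and box counts described above; it only builds the **BoxArray** objects and prints how many boxes each application would use over the transferred region.

.. code-block:: cpp

    // Illustration of the Case-1 decompositions (a sketch, not tutorial code).
    #include <AMReX.H>
    #include <AMReX_BoxArray.H>
    #include <AMReX_Print.H>

    int main (int argc, char* argv[])
    {
        amrex::Initialize(argc, argv);
        {
            amrex::Box domain1(amrex::IntVect(0),  amrex::IntVect(31)); // main_1.cpp
            amrex::Box domain2(amrex::IntVect(16), amrex::IntVect(31)); // main_2.cpp

            amrex::BoxArray ba1(domain1); ba1.maxSize(16); // 2x2x2 = 8 boxes of 16^3
            amrex::BoxArray ba2(domain2); ba2.maxSize(8);  // 2x2x2 = 8 boxes of 8^3

            // The transferred region is the intersection of the two domains,
            // written out explicitly here: lo = {16,16,16}, hi = {31,31,31}.
            amrex::Box overlap(amrex::IntVect(16), amrex::IntVect(31));

            // ba1 covers the overlap with one 16^3 box, ba2 with eight 8^3 boxes,
            // so the box counts over the overlapping region differ between the apps.
            amrex::Print() << "overlap = " << overlap
                           << ", ba1 boxes = " << ba1.size()
                           << ", ba2 boxes = " << ba2.size() << "\n";
        }
        amrex::Finalize();
    }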
@@ -58,6 +62,19 @@ Furthermore, the run process here assumes that the current working directory is
     # Running the MPMD process with 12 ranks
     mpirun -np 8 Source_1/main3d.gnu.DEBUG.MPI.ex : -np 4 Source_2/main3d.gnu.DEBUG.MPI.ex

+Running on Perlmutter (NERSC)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This section presents sample scripts that can be used to run this case on `Perlmutter (NERSC) <https://docs.nersc.gov/systems/perlmutter/>`_.
+The scripts ``mpmd_cpu.sh`` and ``mpmd_cpu.conf`` can be used to run the CPU version.
+Similarly, ``mpmd_gpu.sh`` and ``mpmd_gpu.conf`` can be used to run the GPU version.
+Please note that ``perlmutter_gpu.profile`` must be sourced to compile the GPU version of the applications.
+
+The content presented here is based on the following references:
+
+* `NERSC documentation <https://docs.nersc.gov/jobs/examples/#mpmd-multiple-program-multiple-data-jobs>`_
+* `WarpX documentation <https://warpx.readthedocs.io/en/latest/install/hpc/perlmutter.html>`_
+
 Case-2
 ------

@@ -72,7 +89,7 @@ Contents
 Overview
 ^^^^^^^^

-In the previous case (Case-1) of MPMD each application has its own domain, and, therefore, different **BoxArray**.
+In the previous case (Case-1) of MPMD each application has its own domain and, therefore, a different **BoxArray**.
 However, there exist scenarios where both applications deal with the same **BoxArray**.
 The current case presents such a scenario where the **BoxArray** is defined only in the ``main.cpp`` application, but this information is relayed to the ``main.py`` application through the **MPMD::Copier**.
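
For Case-2, the C++ application can be sketched along the same lines; the only difference from the Case-1 sketch earlier is that the **Copier** is constructed so that it also relays the **BoxArray** to the partner application. The ``true`` flag below is an assumption about a ``send_ba``-style constructor argument of ``MPMD::Copier``; verify the exact signature in ``AMReX_MPMD.H`` and the tutorial's ``main.cpp``. This sketch is not code from this commit.

.. code-block:: cpp

    // Hedged sketch of the Case-2 C++ side (not from this commit).
    #include <AMReX.H>
    #include <AMReX_MPMD.H>
    #include <AMReX_MultiFab.H>

    int main (int argc, char* argv[])
    {
        MPI_Comm comm = amrex::MPMD::Initialize(argc, argv);
        amrex::Initialize(argc, argv, true, comm);
        {
            amrex::Box domain(amrex::IntVect(0), amrex::IntVect(31));
            amrex::BoxArray ba(domain);
            ba.maxSize(16);
            amrex::DistributionMapping dm(ba);

            // Assumed flag: also ship this BoxArray to the partner application
            // (main.py), so both sides work on the same decomposition.
            amrex::MPMD::Copier copier(ba, dm, true);

            amrex::MultiFab mf(ba, dm, 2, 0);
            mf.setVal(10.0, 0, 1);   // fill the component that will be sent
            copier.send(mf, 0, 1);   // component 0 goes to main.py
            copier.recv(mf, 1, 1);   // component 1 comes back from main.py
        }
        amrex::Finalize();
        amrex::MPMD::Finalize();
    }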

@@ -115,3 +132,45 @@ Furthermore, the run process here assumes that the current working directory is

     # Running the MPMD process with 12 ranks
     mpirun -np 8 ./main3d.gnu.DEBUG.MPI.ex : -np 4 python main.py
+
+Running on Perlmutter (NERSC)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Running this case on Perlmutter involves creating a Python virtual environment.
+pyAMReX must be compiled and installed into this virtual environment after it is created.
+As in the previous case, supporting scripts are provided to run on CPUs and GPUs.
+
+Creating a virtual environment
+""""""""""""""""""""""""""""""
+
+.. code-block:: bash
+
+    # Setup the required environment variables
+    source perlmutter_gpu.profile
+
+    # BEFORE PERFORMING THE FOLLOWING COMMANDS
+    # MOVE TO A DIRECTORY WHERE THE PYTHON VIRTUAL ENVIRONMENT MUST EXIST
+
+    python3 -m pip install --upgrade pip
+    python3 -m pip install --upgrade virtualenv
+    python3 -m pip cache purge
+    python3 -m venv pyamrex-gpu
+    source pyamrex-gpu/bin/activate
+    python3 -m pip install --upgrade pip
+    python3 -m pip install --upgrade build
+    python3 -m pip install --upgrade packaging
+    python3 -m pip install --upgrade wheel
+    python3 -m pip install --upgrade setuptools
+    python3 -m pip install --upgrade cython
+    python3 -m pip install --upgrade numpy
+    python3 -m pip install --upgrade pandas
+    python3 -m pip install --upgrade scipy
+    MPICC="cc -target-accel=nvidia80 -shared" python3 -m pip install --upgrade mpi4py --no-cache-dir --no-build-isolation --no-binary mpi4py
+    python3 -m pip install --upgrade openpmd-api
+    python3 -m pip install --upgrade matplotlib
+    python3 -m pip install --upgrade yt
+    python3 -m pip install --upgrade cupy-cuda12x  # CUDA 12 compatible wheel
+
+The content presented here is based on the following reference:
+
+* `WarpX documentation <https://warpx.readthedocs.io/en/latest/install/hpc/perlmutter.html>`_

Docs/source/index.rst

Lines changed: 4 additions & 0 deletions
@@ -51,6 +51,7 @@ sorted by the following categories:
 - :ref:`heFFTe<tutorials_heffte>` -- heFFTe distributed tutorials.
 - :ref:`Linear Solvers<tutorials_linearsolvers>` -- Examples of several linear solvers.
 - :ref:`ML/PYTORCH<tutorials_ml>` -- Use of pytorch models to replace point-wise computational kernels.
+- :ref:`MPMD<tutorials_mpmd>` -- Usage of the AMReX-MPMD (Multiple Program Multiple Data) framework.
 - :ref:`MUI<tutorials_mui>` -- Incorporates the MxUI/MUI (Multiscale Universal interface) frame into AMReX.
 - :ref:`Particles<tutorials_particles>` -- Basic usage of AMReX's particle data structures.
 - :ref:`Python<tutorials_python>` -- Using AMReX and interfacing with AMReX applications from Python - via `pyAMReX <https://github.com/AMReX-Codes/pyamrex/>`__
@@ -75,6 +76,7 @@ sorted by the following categories:
 heFFTe_Tutorial
 LinearSolvers_Tutorial
 ML_Tutorial
+MPMD_Tutorials
 MUI_Tutorial
 Particles_Tutorial
 Python_Tutorial
@@ -102,6 +104,8 @@ sorted by the following categories:

 .. _`Linear Solvers`: LinearSolvers_Tutorial.html

+.. _`MPMD`: MPMD_Tutorials.html
+
 .. _`MUI`: MUI_Tutorial.html

 .. _`Particles`: Particles_Tutorial.html

ExampleCodes/MPMD/Case-1/mpmd_cpu.conf

Lines changed: 2 additions & 0 deletions

@@ -0,0 +1,2 @@
+0-7 ./Source_1/main3d.gnu.x86-milan.DEBUG.MPI.ex
+8-11 ./Source_2/main3d.gnu.x86-milan.DEBUG.MPI.ex

ExampleCodes/MPMD/Case-1/mpmd_cpu.sh

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+#!/bin/bash
+#SBATCH -N 1 # Total number of nodes
+#SBATCH -n 12 # Total number of tasks
+#SBATCH -c 4 # number of processors per MPI task
+#SBATCH -C cpu
+#SBATCH -q debug
+#SBATCH -J mpmd_test
+#SBATCH -t 00:05:00
+#SBATCH -A mpxxx
+
+#OpenMP settings:
+export OMP_NUM_THREADS=1
+export OMP_PLACES=threads
+export OMP_PROC_BIND=spread
+
+#run the application:
+srun --multi-prog --cpu_bind=cores ./mpmd_cpu.conf

ExampleCodes/MPMD/Case-1/mpmd_gpu.conf

Lines changed: 2 additions & 0 deletions

@@ -0,0 +1,2 @@
+0-7 ./Source_1/main3d.gnu.DEBUG.MPI.CUDA.ex
+8-11 ./Source_2/main3d.gnu.DEBUG.MPI.CUDA.ex

ExampleCodes/MPMD/Case-1/mpmd_gpu.sh

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+#!/bin/bash
+#SBATCH -N 3 # Total number of nodes
+#SBATCH -n 12 # Total number of tasks
+#SBATCH -c 4 # number of processors per MPI task
+#SBATCH -C gpu
+#SBATCH -G 12 # Total number of GPUs
+#SBATCH -q debug
+#SBATCH -J mpmd_test
+#SBATCH -t 00:05:00
+#SBATCH -A mpxxx
+
+source ./perlmutter_gpu.profile
+
+#OpenMP settings:
+export OMP_NUM_THREADS=1
+export OMP_PLACES=threads
+export OMP_PROC_BIND=spread
+# Taken from WarpX
+export MPICH_OFI_NIC_POLICY=GPU
+GPU_AWARE_MPI="amrex.use_gpu_aware_mpi=1"
+
+#run the application:
+srun --multi-prog --cpu_bind=cores --gpu-bind=single:1 ./mpmd_gpu.conf

ExampleCodes/MPMD/Case-1/perlmutter_gpu.profile

Lines changed: 26 additions & 0 deletions

@@ -0,0 +1,26 @@
+# required dependencies
+module load gpu
+module load PrgEnv-gnu
+module load craype
+module load craype-x86-milan
+module load craype-accel-nvidia80
+module load cudatoolkit
+module load cmake/3.24.3
+
+# necessary to use CUDA-Aware MPI and run a job
+export CRAY_ACCEL_TARGET=nvidia80
+
+# optimize CUDA compilation for A100
+export AMREX_CUDA_ARCH=8.0
+
+# optimize CPU microarchitecture for AMD EPYC 3rd Gen (Milan/Zen3)
+# note: the cc/CC/ftn wrappers below add those
+export CXXFLAGS="-march=znver3"
+export CFLAGS="-march=znver3"
+
+# compiler environment hints
+export CC=cc
+export CXX=CC
+export FC=ftn
+export CUDACXX=$(which nvcc)
+export CUDAHOSTCXX=CC

ExampleCodes/MPMD/Case-2/mpmd_cpu.conf

Lines changed: 2 additions & 0 deletions

@@ -0,0 +1,2 @@
+0-7 main3d.gnu.x86-milan.DEBUG.MPI.ex
+8-11 python main.py

ExampleCodes/MPMD/Case-2/mpmd_cpu.sh

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+#!/bin/bash
+#SBATCH -N 1 # Total number of nodes
+#SBATCH -n 12 # Total number of tasks
+#SBATCH -c 4 # number of processors per MPI task
+#SBATCH -C cpu
+#SBATCH -q debug
+#SBATCH -J mpmd_test
+#SBATCH -t 00:05:00
+#SBATCH -A mpxxx
+
+# Activate the virtual environment
+source /path/to/pyamrex-gpu/bin/activate
+
+#OpenMP settings:
+export OMP_NUM_THREADS=1
+export OMP_PLACES=threads
+export OMP_PROC_BIND=spread
+
+#run the application:
+srun --multi-prog --cpu_bind=cores ./mpmd_cpu.conf
+
+deactivate

ExampleCodes/MPMD/Case-2/mpmd_gpu.conf

Lines changed: 2 additions & 0 deletions

@@ -0,0 +1,2 @@
+0-7 main3d.gnu.DEBUG.MPI.CUDA.ex
+8-11 python main.py

ExampleCodes/MPMD/Case-2/mpmd_gpu.sh

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+#!/bin/bash
+#SBATCH -N 3 # Total number of nodes
+#SBATCH -n 12 # Total number of tasks
+#SBATCH -c 4 # number of processors per MPI task
+#SBATCH -C gpu
+#SBATCH -G 12 # Total number of GPUs
+#SBATCH -q debug
+#SBATCH -J mpmd_test
+#SBATCH -t 00:05:00
+#SBATCH -A mpxxx
+
+source ./perlmutter_gpu.profile
+# Activate the virtual environment
+source /path/to/pyamrex-gpu/bin/activate
+
+#OpenMP settings:
+export OMP_NUM_THREADS=1
+export OMP_PLACES=threads
+export OMP_PROC_BIND=spread
+# Taken from WarpX
+export MPICH_OFI_NIC_POLICY=GPU
+GPU_AWARE_MPI="amrex.use_gpu_aware_mpi=1"
+
+#run the application:
+srun --multi-prog --cpu_bind=cores --gpu-bind=single:1 ./mpmd_gpu.conf
+
+deactivate

ExampleCodes/MPMD/Case-2/perlmutter_gpu.profile

Lines changed: 28 additions & 0 deletions

@@ -0,0 +1,28 @@
+# required dependencies
+module load gpu
+module load PrgEnv-gnu
+module load craype
+module load craype-x86-milan
+module load craype-accel-nvidia80
+module load cudatoolkit
+module load cmake/3.24.3
+# Required for pyAMReX
+module load cray-python/3.11.5
+
+# necessary to use CUDA-Aware MPI and run a job
+export CRAY_ACCEL_TARGET=nvidia80
+
+# optimize CUDA compilation for A100
+export AMREX_CUDA_ARCH=8.0
+
+# optimize CPU microarchitecture for AMD EPYC 3rd Gen (Milan/Zen3)
+# note: the cc/CC/ftn wrappers below add those
+export CXXFLAGS="-march=znver3"
+export CFLAGS="-march=znver3"
+
+# compiler environment hints
+export CC=cc
+export CXX=CC
+export FC=ftn
+export CUDACXX=$(which nvcc)
+export CUDAHOSTCXX=CC
