Commit 1fde1ab

AMReX-MPMD - scripts for Perlmutter
1 parent aa792b1 commit 1fde1ab

12 files changed: +216 -2 lines

Docs/source/MPMD_Tutorials.rst

Lines changed: 61 additions & 2 deletions
@@ -1,8 +1,10 @@
+.. _tutorials_mpmd:
+
 AMReX-MPMD
 ==========

 AMReX-MPMD utilizes the Multiple Program Multiple Data (MPMD) feature of MPI to provide cross-functionality for AMReX-based codes.
-The framework enables data transfer across two different applications through **MPMD::Copier** class, which takes **BoxArray** of its application as an argument.
+The framework enables data transfer across two different applications through the **MPMD::Copier** class, which typically takes the **BoxArray** of its application as an argument.
 **Copier** instances created in both the applications together identify the overlapping cells for which the data transfer must occur.
 **Copier::send** & **Copier::recv** functions, which take a **MultiFab** as an argument, are used to transfer the desired data of overlapping regions.
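
For reference, a minimal C++ sketch of the pattern described in this hunk, constructing an **MPMD::Copier** from the application's **BoxArray** and exchanging one **MultiFab** component, is shown below. It is an illustration only, not code from this commit; the ``MPMD::Initialize``/``Copier``/``send``/``recv`` calls follow the AMReX MPMD interface as used in the tutorial sources, but the exact signatures should be checked against ``AMReX_MPMD.H``.

.. code-block:: cpp

    // Minimal sketch of one side of an MPMD pair (mirrors main_1.cpp of Case-1).
    #include <AMReX.H>
    #include <AMReX_MPMD.H>
    #include <AMReX_MultiFab.H>

    int main (int argc, char* argv[])
    {
        // Split MPI_COMM_WORLD between the two programs launched together.
        MPI_Comm comm = amrex::MPMD::Initialize(argc, argv);
        amrex::Initialize(argc, argv, true, comm);
        {
            // Domain and BoxArray of *this* application.
            amrex::Box domain(amrex::IntVect(0), amrex::IntVect(31));
            amrex::BoxArray ba(domain);
            ba.maxSize(16);
            amrex::DistributionMapping dm(ba);

            // Two-component MultiFab: component 0 is sent, component 1 is received.
            amrex::MultiFab mf(ba, dm, 2, 0);
            mf.setVal(1.0, 0, 1);   // populate component 0

            // The Copier identifies the cells that overlap with the other application.
            amrex::MPMD::Copier copier(ba, dm);
            copier.send(mf, 0, 1);  // send component 0
            copier.recv(mf, 1, 1);  // receive into component 1
        }
        amrex::Finalize();
        amrex::MPMD::Finalize();
    }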

@@ -22,6 +24,8 @@ Overview

 The domain in ``main_1.cpp`` is set to ``lo = {0, 0, 0}`` and ``hi = {31, 31, 31}``, while the domain in ``main_2.cpp`` is set to ``lo = {16, 16, 16}`` and ``hi = {31, 31, 31}``.
 Hence, the data transfer will occur for the region ``lo = {16, 16, 16}`` and ``hi = {31, 31, 31}``.
+Furthermore, the domain in ``main_1.cpp`` is split into boxes using ``max_grid_size=16``, while the domain in ``main_2.cpp`` is split using ``max_grid_size=8``.
+Therefore, the **BoxArray**, and hence the number of boxes covering the overlapping region, differ across the two applications.
 The data transfer demonstration is performed using a two component *MultiFab*.
 The first component is populated in ``main_1.cpp`` before it is transferred to ``main_2.cpp``.
 The second component is populated in ``main_2.cpp`` based on the received first component.
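
To make the decomposition difference concrete, here is a small, hedged C++ illustration (not part of this commit) of the two domains and box counts described above; it only builds the **BoxArray** objects and prints how many boxes each application would use over the transferred region.

.. code-block:: cpp

    // Illustration of the Case-1 decompositions (a sketch, not tutorial code).
    #include <AMReX.H>
    #include <AMReX_BoxArray.H>
    #include <AMReX_Print.H>

    int main (int argc, char* argv[])
    {
        amrex::Initialize(argc, argv);
        {
            amrex::Box domain1(amrex::IntVect(0),  amrex::IntVect(31)); // main_1.cpp
            amrex::Box domain2(amrex::IntVect(16), amrex::IntVect(31)); // main_2.cpp

            amrex::BoxArray ba1(domain1); ba1.maxSize(16); // 2x2x2 = 8 boxes of 16^3
            amrex::BoxArray ba2(domain2); ba2.maxSize(8);  // 2x2x2 = 8 boxes of 8^3

            // The transferred region is the intersection of the two domains,
            // written out explicitly here: lo = {16,16,16}, hi = {31,31,31}.
            amrex::Box overlap(amrex::IntVect(16), amrex::IntVect(31));

            // ba1 covers the overlap with one 16^3 box, ba2 with eight 8^3 boxes,
            // so the box counts over the overlapping region differ between the apps.
            amrex::Print() << "overlap = " << overlap
                           << ", ba1 boxes = " << ba1.size()
                           << ", ba2 boxes = " << ba2.size() << "\n";
        }
        amrex::Finalize();
    }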
@@ -58,6 +62,19 @@ Furthermore, the run process here assumes that the current working directory is
     # Running the MPMD process with 12 ranks
     mpirun -np 8 Source_1/main3d.gnu.DEBUG.MPI.ex : -np 4 Source_2/main3d.gnu.DEBUG.MPI.ex

+Running on Perlmutter (NERSC)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This section presents sample scripts that can be used to run this case on `Perlmutter (NERSC) <https://docs.nersc.gov/systems/perlmutter/>`_.
+The scripts ``mpmd_cpu.sh`` and ``mpmd_cpu.conf`` can be used to run the CPU version.
+Similarly, ``mpmd_gpu.sh`` and ``mpmd_gpu.conf`` can be used to run the GPU version.
+Please note that ``perlmutter_gpu.profile`` must be sourced to compile the GPU version of the applications.
+
+The content presented here is based on the following references:
+
+* `NERSC documentation <https://docs.nersc.gov/jobs/examples/#mpmd-multiple-program-multiple-data-jobs>`_
+* `WarpX documentation <https://warpx.readthedocs.io/en/latest/install/hpc/perlmutter.html>`_
+
 Case-2
 ------

@@ -72,7 +89,7 @@ Contents
 Overview
 ^^^^^^^^

-In the previous case (Case-1) of MPMD each application has its own domain, and, therefore, different **BoxArray**.
+In the previous case (Case-1) of MPMD each application has its own domain and, therefore, a different **BoxArray**.
 However, there exist scenarios where both applications deal with the same **BoxArray**.
 The current case presents such a scenario where the **BoxArray** is defined only in the ``main.cpp`` application, but this information is relayed to the ``main.py`` application through the **MPMD::Copier**.
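
For Case-2, the C++ application can be sketched along the same lines; the only difference from the Case-1 sketch earlier is that the **Copier** is constructed so that it also relays the **BoxArray** to the partner application. The ``true`` flag below is an assumption about a ``send_ba``-style constructor argument of ``MPMD::Copier``; verify the exact signature in ``AMReX_MPMD.H`` and the tutorial's ``main.cpp``. This sketch is not code from this commit.

.. code-block:: cpp

    // Hedged sketch of the Case-2 C++ side (not from this commit).
    #include <AMReX.H>
    #include <AMReX_MPMD.H>
    #include <AMReX_MultiFab.H>

    int main (int argc, char* argv[])
    {
        MPI_Comm comm = amrex::MPMD::Initialize(argc, argv);
        amrex::Initialize(argc, argv, true, comm);
        {
            amrex::Box domain(amrex::IntVect(0), amrex::IntVect(31));
            amrex::BoxArray ba(domain);
            ba.maxSize(16);
            amrex::DistributionMapping dm(ba);

            // Assumed flag: also ship this BoxArray to the partner application
            // (main.py), so both sides work on the same decomposition.
            amrex::MPMD::Copier copier(ba, dm, true);

            amrex::MultiFab mf(ba, dm, 2, 0);
            mf.setVal(10.0, 0, 1);   // fill the component that will be sent
            copier.send(mf, 0, 1);   // component 0 goes to main.py
            copier.recv(mf, 1, 1);   // component 1 comes back from main.py
        }
        amrex::Finalize();
        amrex::MPMD::Finalize();
    }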

@@ -115,3 +132,45 @@ Furthermore, the run process here assumes that the current working directory is

     # Running the MPMD process with 12 ranks
     mpirun -np 8 ./main3d.gnu.DEBUG.MPI.ex : -np 4 python main.py
+
+Running on Perlmutter (NERSC)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Running this case on Perlmutter involves creating a Python virtual environment.
+pyAMReX must be compiled and installed into this virtual environment after it is created.
+As in the previous case, supporting scripts are provided to run on CPUs and GPUs.
+
+Creating a virtual environment
+""""""""""""""""""""""""""""""
+
+.. code-block:: bash
+
+    # Setup the required environment variables
+    source perlmutter_gpu.profile
+
+    # BEFORE PERFORMING THE FOLLOWING COMMANDS
+    # MOVE TO A DIRECTORY WHERE THE PYTHON VIRTUAL ENVIRONMENT MUST EXIST
+
+    python3 -m pip install --upgrade pip
+    python3 -m pip install --upgrade virtualenv
+    python3 -m pip cache purge
+    python3 -m venv pyamrex-gpu
+    source pyamrex-gpu/bin/activate
+    python3 -m pip install --upgrade pip
+    python3 -m pip install --upgrade build
+    python3 -m pip install --upgrade packaging
+    python3 -m pip install --upgrade wheel
+    python3 -m pip install --upgrade setuptools
+    python3 -m pip install --upgrade cython
+    python3 -m pip install --upgrade numpy
+    python3 -m pip install --upgrade pandas
+    python3 -m pip install --upgrade scipy
+    MPICC="cc -target-accel=nvidia80 -shared" python3 -m pip install --upgrade mpi4py --no-cache-dir --no-build-isolation --no-binary mpi4py
+    python3 -m pip install --upgrade openpmd-api
+    python3 -m pip install --upgrade matplotlib
+    python3 -m pip install --upgrade yt
+    python3 -m pip install --upgrade cupy-cuda12x  # CUDA 12 compatible wheel
+
+The content presented here is based on the following reference:
+
+* `WarpX documentation <https://warpx.readthedocs.io/en/latest/install/hpc/perlmutter.html>`_

Docs/source/index.rst

Lines changed: 4 additions & 0 deletions
@@ -51,6 +51,7 @@ sorted by the following categories:
 - :ref:`heFFTe<tutorials_heffte>` -- heFFTe distributed tutorials.
 - :ref:`Linear Solvers<tutorials_linearsolvers>` -- Examples of several linear solvers.
 - :ref:`ML/PYTORCH<tutorials_ml>` -- Use of pytorch models to replace point-wise computational kernels.
+- :ref:`MPMD<tutorials_mpmd>` -- Usage of the AMReX-MPMD (Multiple Program Multiple Data) framework.
 - :ref:`MUI<tutorials_mui>` -- Incorporates the MxUI/MUI (Multiscale Universal interface) frame into AMReX.
 - :ref:`Particles<tutorials_particles>` -- Basic usage of AMReX's particle data structures.
 - :ref:`Python<tutorials_python>` -- Using AMReX and interfacing with AMReX applications from Python - via `pyAMReX <https://github.com/AMReX-Codes/pyamrex/>`__
@@ -75,6 +76,7 @@ sorted by the following categories:
 heFFTe_Tutorial
 LinearSolvers_Tutorial
 ML_Tutorial
+MPMD_Tutorials
 MUI_Tutorial
 Particles_Tutorial
 Python_Tutorial
@@ -102,6 +104,8 @@ sorted by the following categories:

 .. _`Linear Solvers`: LinearSolvers_Tutorial.html

+.. _`MPMD`: MPMD_Tutorials.html
+
 .. _`MUI`: MUI_Tutorial.html

 .. _`Particles`: Particles_Tutorial.html

ExampleCodes/MPMD/Case-1/mpmd_cpu.conf

Lines changed: 2 additions & 0 deletions

@@ -0,0 +1,2 @@
+0-7 ./Source_1/main3d.gnu.x86-milan.DEBUG.MPI.ex
+8-11 ./Source_2/main3d.gnu.x86-milan.DEBUG.MPI.ex

ExampleCodes/MPMD/Case-1/mpmd_cpu.sh

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+#!/bin/bash
+#SBATCH -N 1 # Total number of nodes
+#SBATCH -n 12 # Total number of tasks
+#SBATCH -c 4 # number of processors per MPI task
+#SBATCH -C cpu
+#SBATCH -q debug
+#SBATCH -J mpmd_test
+#SBATCH -t 00:05:00
+#SBATCH -A mpxxx
+
+#OpenMP settings:
+export OMP_NUM_THREADS=1
+export OMP_PLACES=threads
+export OMP_PROC_BIND=spread
+
+#run the application:
+srun --multi-prog --cpu_bind=cores ./mpmd_cpu.conf

ExampleCodes/MPMD/Case-1/mpmd_gpu.conf

Lines changed: 2 additions & 0 deletions

@@ -0,0 +1,2 @@
+0-7 ./Source_1/main3d.gnu.DEBUG.MPI.CUDA.ex
+8-11 ./Source_2/main3d.gnu.DEBUG.MPI.CUDA.ex

ExampleCodes/MPMD/Case-1/mpmd_gpu.sh

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+#!/bin/bash
+#SBATCH -N 3 # Total number of nodes
+#SBATCH -n 12 # Total number of tasks
+#SBATCH -c 4 # number of processors per MPI task
+#SBATCH -C gpu
+#SBATCH -G 12 # Total number of GPUs
+#SBATCH -q debug
+#SBATCH -J mpmd_test
+#SBATCH -t 00:05:00
+#SBATCH -A mpxxx
+
+source ./perlmutter_gpu.profile
+
+#OpenMP settings:
+export OMP_NUM_THREADS=1
+export OMP_PLACES=threads
+export OMP_PROC_BIND=spread
+# Taken from WarpX
+export MPICH_OFI_NIC_POLICY=GPU
+GPU_AWARE_MPI="amrex.use_gpu_aware_mpi=1"
+
+#run the application:
+srun --multi-prog --cpu_bind=cores --gpu-bind=single:1 ./mpmd_gpu.conf

ExampleCodes/MPMD/Case-1/perlmutter_gpu.profile

Lines changed: 26 additions & 0 deletions

@@ -0,0 +1,26 @@
+# required dependencies
+module load gpu
+module load PrgEnv-gnu
+module load craype
+module load craype-x86-milan
+module load craype-accel-nvidia80
+module load cudatoolkit
+module load cmake/3.24.3
+
+# necessary to use CUDA-Aware MPI and run a job
+export CRAY_ACCEL_TARGET=nvidia80
+
+# optimize CUDA compilation for A100
+export AMREX_CUDA_ARCH=8.0
+
+# optimize CPU microarchitecture for AMD EPYC 3rd Gen (Milan/Zen3)
+# note: the cc/CC/ftn wrappers below add those
+export CXXFLAGS="-march=znver3"
+export CFLAGS="-march=znver3"
+
+# compiler environment hints
+export CC=cc
+export CXX=CC
+export FC=ftn
+export CUDACXX=$(which nvcc)
+export CUDAHOSTCXX=CC

ExampleCodes/MPMD/Case-2/mpmd_cpu.conf

Lines changed: 2 additions & 0 deletions

@@ -0,0 +1,2 @@
+0-7 main3d.gnu.x86-milan.DEBUG.MPI.ex
+8-11 python main.py

ExampleCodes/MPMD/Case-2/mpmd_cpu.sh

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+#!/bin/bash
+#SBATCH -N 1 # Total number of nodes
+#SBATCH -n 12 # Total number of tasks
+#SBATCH -c 4 # number of processors per MPI task
+#SBATCH -C cpu
+#SBATCH -q debug
+#SBATCH -J mpmd_test
+#SBATCH -t 00:05:00
+#SBATCH -A mpxxx
+
+# Activate the virtual environment
+source /path/to/pyamrex-gpu/bin/activate
+
+#OpenMP settings:
+export OMP_NUM_THREADS=1
+export OMP_PLACES=threads
+export OMP_PROC_BIND=spread
+
+#run the application:
+srun --multi-prog --cpu_bind=cores ./mpmd_cpu.conf
+
+deactivate

ExampleCodes/MPMD/Case-2/mpmd_gpu.conf

Lines changed: 2 additions & 0 deletions

@@ -0,0 +1,2 @@
+0-7 main3d.gnu.DEBUG.MPI.CUDA.ex
+8-11 python main.py

ExampleCodes/MPMD/Case-2/mpmd_gpu.sh

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+#!/bin/bash
+#SBATCH -N 3 # Total number of nodes
+#SBATCH -n 12 # Total number of tasks
+#SBATCH -c 4 # number of processors per MPI task
+#SBATCH -C gpu
+#SBATCH -G 12 # Total number of GPUs
+#SBATCH -q debug
+#SBATCH -J mpmd_test
+#SBATCH -t 00:05:00
+#SBATCH -A mpxxx
+
+source ./perlmutter_gpu.profile
+# Activate the virtual environment
+source /path/to/pyamrex-gpu/bin/activate
+
+#OpenMP settings:
+export OMP_NUM_THREADS=1
+export OMP_PLACES=threads
+export OMP_PROC_BIND=spread
+# Taken from WarpX
+export MPICH_OFI_NIC_POLICY=GPU
+GPU_AWARE_MPI="amrex.use_gpu_aware_mpi=1"
+
+#run the application:
+srun --multi-prog --cpu_bind=cores --gpu-bind=single:1 ./mpmd_gpu.conf
+
+deactivate

ExampleCodes/MPMD/Case-2/perlmutter_gpu.profile

Lines changed: 28 additions & 0 deletions

@@ -0,0 +1,28 @@
+# required dependencies
+module load gpu
+module load PrgEnv-gnu
+module load craype
+module load craype-x86-milan
+module load craype-accel-nvidia80
+module load cudatoolkit
+module load cmake/3.24.3
+# Required for pyAMReX
+module load cray-python/3.11.5
+
+# necessary to use CUDA-Aware MPI and run a job
+export CRAY_ACCEL_TARGET=nvidia80
+
+# optimize CUDA compilation for A100
+export AMREX_CUDA_ARCH=8.0
+
+# optimize CPU microarchitecture for AMD EPYC 3rd Gen (Milan/Zen3)
+# note: the cc/CC/ftn wrappers below add those
+export CXXFLAGS="-march=znver3"
+export CFLAGS="-march=znver3"
+
+# compiler environment hints
+export CC=cc
+export CXX=CC
+export FC=ftn
+export CUDACXX=$(which nvcc)
+export CUDAHOSTCXX=CC
