From 409382c4b4d28617bb739d84733ecde872a92343 Mon Sep 17 00:00:00 2001
From: sam
Date: Fri, 8 Dec 2023 09:09:27 +0100
Subject: [PATCH] move salloc to advanced page

---
 source/jobs/job_advanced.rst | 112 +++++++++++++++++++++++++++++++++
 source/jobs/job_types.rst    | 119 +----------------------------------
 2 files changed, 114 insertions(+), 117 deletions(-)

diff --git a/source/jobs/job_advanced.rst b/source/jobs/job_advanced.rst
index 147dfdcfd..3c10d0bb2 100644
--- a/source/jobs/job_advanced.rst
+++ b/source/jobs/job_advanced.rst
@@ -168,3 +168,115 @@ on that node group. On Vaughan the output is rather boring as all nodes are iden
 By specifying additional command line arguments it is possible to further customize
 the output format. See the `sinfo manual page `_.
 
+
+salloc
+------
+
+You can use ``salloc`` to create a resource allocation. ``salloc`` will wait until the
+resources are available and then return a shell prompt. Note, however, that this shell
+runs on the node where you executed ``salloc`` (likely a login node). Unlike with
+``srun``, the shell is **not** running on the allocated resources, but you can
+run commands on the allocated resources via ``srun``.
+
+**Note that, in particular on clusters with multiple CPU architectures, you need to
+understand Linux environments and the way they interact with Slurm very well, as you
+are now executing commands in two potentially incompatible sections of the cluster that
+require different environment settings. If you execute a command in the wrong
+environment, it may run inefficiently or simply fail.**
+
+This is not an issue on Vaughan though, as all CPUs on that cluster are of the same
+generation.
+
+Interactive jobs with salloc
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Non-X11 programs
+""""""""""""""""
+
+E.g., to run a shared memory program on 16 (virtual) cores, the following command
+can be used:
+
+.. code:: bash
+
+   login$ salloc --ntasks=1 --cpus-per-task=16 --time=1:00:00 --mem-per-cpu=3g
+
+Executed on a login node, it returns a shell prompt on that same login node. Before
+executing further commands, it is still best to rebuild the environment just as you would
+in a batch script, as some modules use different settings when they are loaded within a
+Slurm environment. So let's use this shell to start the demo program ``omp_hello`` on the
+allocated resources:
+
+.. code:: bash
+
+   login$ module --force purge
+   login$ module load calcua/2020a vsc-tutorial
+   login$ srun omp_hello
+   login$ exit
+
+It is essential in this example that ``omp_hello`` is started through ``srun``, as otherwise
+it would run on the login node rather than on the allocated resources. Also do not forget
+to leave the shell when you have finished your interactive work!
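+
+To convince yourself that a command launched through ``srun`` really runs on the allocated
+compute node and not on the login node, you can let ``srun`` print the host name. This is
+just a minimal sanity check; any simple command would do:
+
+.. code:: bash
+
+   login$ srun hostname
+
+The output should be the name of the allocated compute node rather than that of the login node.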
+
+For an MPI or hybrid MPI/OpenMP program you would proceed in exactly the same way, except that
+the resource request is different, as it has to allocate all MPI tasks. E.g., to run the demo
+program ``mpi_omp_hello`` in an interactive shell with 16 MPI processes and 8 threads per MPI
+rank, you'd allocate the resources through
+
+.. code:: bash
+
+   login$ salloc --ntasks=16 --cpus-per-task=8 --time=1:00:00 --mem-per-cpu=3g
+
+and then run the program with
+
+.. code:: bash
+
+   login$ module --force purge
+   login$ module load calcua/2020a vsc-tutorial
+   login$ srun mpi_omp_hello
+   login$ exit
+
+Note that since we are using all allocated resources, we don't need to specify the number of tasks
+or virtual CPUs to ``srun``. It will take care of properly distributing the job according to the
+options specified when calling ``salloc``.
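+
+If you only want to use part of the allocation for a particular job step, you can still pass
+explicit values to ``srun``. As a sketch (the numbers below are merely an illustration within
+the 16-task allocation requested above), the following would run the program on 4 MPI ranks
+with 8 threads each and leave the remaining tasks of the allocation unused:
+
+.. code:: bash
+
+   login$ srun --ntasks=4 --cpus-per-task=8 mpi_omp_hello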
+
+
+Running a shared memory X11 program with salloc
+"""""""""""""""""""""""""""""""""""""""""""""""
+
+You can also use ``salloc`` to create a job allocation and then use ``srun`` to
+attach an interactive shell to the job on the node that has been allocated. X11
+programs rarely use distributed memory parallelism, so in most cases you will be
+requesting just a single task.
+
+The first step is to ensure that X11 access from the login node to your local
+screen is properly set up.
+
+Next, starting a session that uses a full node on Vaughan can be done as
+
+.. code:: bash
+
+   login$ salloc -n 1 -c 64 -t 1:00:00 --x11
+   login$ srun --jobid=<jobid> --pty bash
+   r0c00cn0$ xclock
+   r0c00cn0$ exit
+   login$ exit
+
+What this does is:
+
+1. The first command, executed on the login node, creates a job allocation for
+   64 cores. It returns with a shell prompt on the login node as soon as the
+   allocation is ready and prints the job ID of the running job on the screen.
+   The ``--x11`` option is used to forward X11 traffic.
+
+2. Next we log on to the allocated compute node by attaching an interactive
+   shell (``--pty bash``) to the job ID with ``srun``.
+
+3. We can now execute X11 commands, launch graphical applications, or do anything
+   else that is supported on a compute node.
+
+4. The first ``exit`` command leaves the compute node and returns to the login
+   shell, **but still within the salloc command**.
+
+5. Hence we need a second ``exit`` command to leave the shell created by
+   ``salloc`` and free the resources for other users.
+
+**So do not forget that you need to exit two shells to free the resources!**
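+
+If you are not sure whether you have really left both shells and released the resources, you
+can check whether a job is still listed in your name (a quick sketch; any of the usual Slurm
+status commands will do):
+
+.. code:: bash
+
+   login$ squeue -u $USER
+
+Once both shells have been exited, the interactive job should no longer appear in this list.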
diff --git a/source/jobs/job_types.rst b/source/jobs/job_types.rst
index 63c73720a..25b4bfca8 100644
--- a/source/jobs/job_types.rst
+++ b/source/jobs/job_types.rst
@@ -263,9 +263,6 @@ The following lines automate the launch of the three jobs:
 
 Interactive job
 ---------------
 
-Method 1: with srun
-~~~~~~~~~~~~~~~~~~~
-
 Interactively running shared memory programs
 """"""""""""""""""""""""""""""""""""""""""""
@@ -337,117 +334,5 @@ just a single task. To add support for X11, use the ``--x11`` option before ``--
    r0c00cn0$ xclock
    r0c00cn0$ exit
 
-would allocate 64 (virtual) cores, and the second line starts a simple X11 program, ``x11``,
-which is only good to test if X11 programs work but should not be used for other purposes
-than this on the clusters.
-
-
-Method 2: With salloc
-~~~~~~~~~~~~~~~~~~~~~
-
-Non-X11 programs
-""""""""""""""""
-
-You can use ``salloc`` to create a resource allocation. ``salloc`` will wait until the
-resources are available and then return a shell prompt. Note however that that shell
-is running on the node where you ran ``salloc`` (likely a login node). Contrary to
-the method just before this one based on ``srun``, the shell is **not** running on the
-allocated resources. You can however run commands on the allocated resources via
-``srun``.
-
-**Note that in particular on clusters with multiple CPU architectures, you need to
-understand Linux environments and the way they interact with Slurm very well as you
-are now executing commands in two potentially incompatible sections of the cluster that
-require different settings in the environment. So if you execute a command in the wrong
-environment it may run inefficiently, or it may simply fail.**
-
-There is no problem with Vaughan though as on that cluster all CPUs are of the same
-generation.
-
-E.g., to run a shared memory program on 16 (virtual) cores, the following commands
-can be used:
-
-.. code:: bash
-
-   login$ salloc --ntasks=1 --cpus-per-task=16 --time=1:00:00 --mem-per-cpu=3g
-
-executed on a login node will return a shell on the login node. To then execute commands.
-it is still better to rebuild the environment just as for a batch script as some modules
-will use different settings when they load in a Slurm environment. So let's use this shell
-to start the demo program ``omp_hello`` on the allocated resources:
-
-.. code:: bash
-
-   login$ module --force purge
-   login$ module load calcua/2020a vsc-tutorial
-   login$ srun omp_hello
-   login$ exit
-
-It is essential in this example that ``omp_hello`` is started through ``srun`` as otherwise
-it would be running on the login node rather than on the allocated resources. Also do not forget
-to leave the shell when you have finished your interactive work!
-
-For an MPI or hybrid MPI/OpenMP program you would proceed in exactly the same way, except that the
-resource request is different to allocate all MPI tasks. E.g., to run the demo program
-``mpi_omp_hello`` in an interactive shell and using 16 MPI processes and 8 threads per MPI
-rank, you'd allocate the resources through
-
-.. code:: bash
-
-   login$ salloc --ntasks=16 --cpus-per-task=8 --time=1:00:00 --mem-per-cpu=3g
-
-and then run the program with
-
-.. code:: bash
-
-   login$ module --force purge
-   login$ module load calcua/2020a vsc-tutorial
-   login$ srun mpi_omp_hello
-   login$ exit
-
-Note that since we are using all allocated resources, we don't need to specify the number of tasks
-or virtual CPUs to ``srun``. It will take care of of properly distributing the job according to the
-options specified when calling ``salloc``.
-
-
-Running a shared memory X11 program with salloc
-"""""""""""""""""""""""""""""""""""""""""""""""
-
-You can also use ``salloc`` to create a job allocation and then use ``srun`` to
-attach an interactive shell to the job in the node that has been allocated. X11
-programs rarely use distributed memory parallelism, so in most case you will be
-requesting just a single task.
-
-The first step is to ensure that X11 access from the login node to your local
-screen is properly set up.
-
-Next, starting a session that uses a full node on Vaughan can be done as
-
-.. code:: bash
-
-   login$ salloc -n 1 -c 64 -t 1:00:00 --x11
-   login$ srun --jobid= --pty bash
-   r0c00cn0$ xclock
-   r0c00cn0$ exit
-   login$ exit
-
-What this does is:
-
-1. The first command, executed on the login node, creates a job allocation for
-   64 cores. It returns with a shell prompt on the login node as soon as the
-   allocation is ready and prints the job ID of the running job on the screen.
-   The ``--x11`` option is used to forward X11 traffic.
-
-2. Next we log on to the allocated compute node using attaching an interactive
-   shell (``--pty bash``) to the job ID with ``srun``.
-
-3. We can now execute X11 commands, launch graphical applications, or anything
-   else that we want to do and is supported on a compute node.
-
-4. The first ``exit`` command leaves the compute node and returns to the login
-   shell, **but still within the salloc command**.
-
-5. Hence we need a second ``exit`` command to leave the shell created by
-   ``salloc`` and free the resources for other users.
-
-**So do not forget that you need to exit two shells to free the resources!**
+would allocate 64 (virtual) cores, and the second line starts a simple X11 program, ``xclock``,
+to test if X11 programs work.