Commit 409382c: move salloc to advanced page
smoors committed Dec 8, 2023 (parent: 50736ee)

Showing 2 changed files with 114 additions and 117 deletions.
source/jobs/job_advanced.rst (112 additions, 0 deletions)
@@ -168,3 +168,115 @@ on that node group. On Vaughan the output is rather boring as all nodes are identical.

By specifying additional command line arguments it is possible to further customize the
output format. See the `sinfo manual page <https://slurm.schedmd.com/sinfo.html>`_.
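
For instance, a possible invocation that prints a compact overview (the format specifiers
below are standard ``sinfo`` ones, shown purely as an illustration; pick whichever fields
you need):

.. code:: bash

   # partition, availability, time limit, number of nodes, node state
   login$ sinfo --format="%P %a %l %D %t"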

salloc
------

You can use ``salloc`` to create a resource allocation. ``salloc`` will wait until the
resources are available and then return a shell prompt. Note, however, that this shell
runs on the node where you invoked ``salloc`` (most likely a login node). Unlike with
``srun``, the shell is **not** running on the allocated resources. You can,
however, run commands on the allocated resources via ``srun``.
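
As a quick illustration of the difference (just a sketch, using ``hostname`` as a
stand-in for a real command):

.. code:: bash

   login$ salloc --ntasks=1 --time=10:00
   login$ hostname       # runs in the salloc shell, i.e., on the login node
   login$ srun hostname  # runs on the allocated compute node
   login$ exit           # don't forget to release the allocation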

**Note that, in particular on clusters with multiple CPU architectures, you need to
understand Linux environments and the way they interact with Slurm very well: you are
now executing commands in two potentially incompatible sections of the cluster that
require different environment settings. A command executed in the wrong environment
may run inefficiently, or it may simply fail.**

This is not an issue on Vaughan, though, as all CPUs on that cluster are of the same
generation.

Interactive jobs with salloc
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Non-X11 programs
""""""""""""""""

E.g., to run a shared memory program on 16 (virtual) cores, the following commands
can be used:

.. code:: bash

   login$ salloc --ntasks=1 --cpus-per-task=16 --time=1:00:00 --mem-per-cpu=3g

executed on a login node will return a shell on the login node. Before executing commands,
it is still best to rebuild the environment just as you would for a batch script, as some
modules use different settings when they are loaded in a Slurm environment. So let's use
this shell to start the demo program ``omp_hello`` on the allocated resources:

.. code:: bash

   login$ module --force purge
   login$ module load calcua/2020a vsc-tutorial
   login$ srun omp_hello
   login$ exit

It is essential in this example that ``omp_hello`` is started through ``srun`` as otherwise
it would be running on the login node rather than on the allocated resources. Also do not forget
to leave the shell when you have finished your interactive work!

For an MPI or hybrid MPI/OpenMP program you would proceed in exactly the same way, except that
the resource request now has to cover all MPI tasks. E.g., to run the demo program
``mpi_omp_hello`` in an interactive shell with 16 MPI processes and 8 threads per MPI
rank, you'd allocate the resources through

.. code:: bash

   login$ salloc --ntasks=16 --cpus-per-task=8 --time=1:00:00 --mem-per-cpu=3g

and then run the program with

.. code:: bash

   login$ module --force purge
   login$ module load calcua/2020a vsc-tutorial
   login$ srun mpi_omp_hello
   login$ exit

Note that since we are using all allocated resources, we don't need to specify the number of tasks
or virtual CPUs to ``srun``. It will take care of properly distributing the job according to the
options specified when calling ``salloc``.
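
If you want to verify how ``srun`` spreads those tasks over the allocated nodes, a quick
check (just a sketch, using ``hostname`` as a dummy task) is:

.. code:: bash

   login$ srun hostname | sort | uniq -c   # shows how many of the 16 tasks run on each node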


Running a shared memory X11 program with salloc
"""""""""""""""""""""""""""""""""""""""""""""""

You can also use ``salloc`` to create a job allocation and then use ``srun`` to
attach an interactive shell to the job on the node that has been allocated. X11
programs rarely use distributed memory parallelism, so in most cases you will be
requesting just a single task.

The first step is to ensure that X11 access from the login node to your local
screen is properly set up.
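
How to do that depends on your local machine; assuming an OpenSSH client, logging in with
X11 forwarding enabled along these lines should work (the account name and login node
address below are placeholders):

.. code:: bash

   local$ ssh -X <your VSC account>@<login node>
   login$ xclock   # quick check that X11 forwarding works before requesting resources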

Next, starting a session that uses a full node on Vaughan can be done as

.. code:: bash

   login$ salloc -n 1 -c 64 -t 1:00:00 --x11
   login$ srun --jobid=<jobid> --pty bash
   r0c00cn0$ xclock
   r0c00cn0$ exit
   login$ exit

What this does is:

1. The first command, executed on the login node, creates a job allocation for
64 cores. It returns with a shell prompt on the login node as soon as the
allocation is ready and prints the job ID of the running job on the screen.
The ``--x11`` option is used to forward X11 traffic.

2. Next we log on to the allocated compute node by attaching an interactive
shell (``--pty bash``) to the job ID with ``srun``.

3. We can now execute X11 commands, launch graphical applications, or do anything
else we want that is supported on a compute node.

4. The first ``exit`` command leaves the compute node and returns to the login
shell, **but still within the salloc command**.

5. Hence we need a second ``exit`` command to leave the shell created by
``salloc`` and free the resources for other users.

**So do not forget that you need to exit two shells to free the resources!**
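
If you are not sure whether the allocation has really been released, you can always check
with ``squeue`` and, if needed, cancel the job explicitly (these are generic Slurm commands,
not specific to this example):

.. code:: bash

   login$ squeue -u $USER    # the salloc job should no longer be listed
   login$ scancel <jobid>    # if it is still there, cancel it explicitly
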
source/jobs/job_types.rst (2 additions, 117 deletions)

@@ -263,9 +263,6 @@ The following lines automate the launch of the three jobs:
Interactive job
---------------

Method 1: with srun
~~~~~~~~~~~~~~~~~~~

Interactively running shared memory programs
""""""""""""""""""""""""""""""""""""""""""""

@@ -337,117 +334,5 @@ just a single task. To add support for X11, use the ``--x11`` option before ``--
r0c00cn0$ xclock
r0c00cn0$ exit
would allocate 64 (virtual) cores, and the second line starts a simple X11 program, ``xclock``,
to test if X11 programs work.


