Commit 50736ee

dont ssh into compute nodes
1 parent 30cc5a5 commit 50736ee

File tree

1 file changed: +20 -24 lines

source/jobs/job_types.rst

Lines changed: 20 additions & 24 deletions
@@ -413,45 +413,41 @@ options specified when calling ``salloc``.
Running a shared memory X11 program with salloc
"""""""""""""""""""""""""""""""""""""""""""""""

-As you can log on to a compute node while you have resources allocated on that node, you can also
-use ``salloc`` to create an allocation and then use ``ssh`` to log on to the node that has been
-allocated. This only makes sense if you allocate one node or less, as when you log on via ``ssh``,
-you leave the Slurm environment and hence cannot rely on Slurm anymore to start tasks. Moreover,
-if you have multiple jobs running on the node, you may inadvertently influence those jobs as
-any work that you're doing in that ssh session is not fully under the control of Slurm anymore.
+You can also use ``salloc`` to create a job allocation and then use ``srun`` to
+attach an interactive shell to the job on the node that has been allocated. X11
+programs rarely use distributed memory parallelism, so in most cases you will be
+requesting just a single task.

-As when using ``srun``, the first step is to ensure that X11 access from the login node to your
-local screen is properly set up.
+The first step is to ensure that X11 access from the login node to your local
+screen is properly set up.

Next, starting a session that uses a full node on Vaughan can be done as

.. code:: bash

-   login$ salloc -n 1 -c 64 -t 1:00:00
-   login$ ssh -X $SLURM_JOB_NODELIST
+   login$ salloc -n 1 -c 64 -t 1:00:00 --x11
+   login$ srun --jobid=<jobid> --pty bash
   r0c00cn0$ xclock
   r0c00cn0$ exit
   login$ exit

What this does is:

-1. The first command, executed on the login node, creates an allocation for 64 cores. It returns with
-   a shell prompt on the login node as soon as the allocation is ready.
+1. The first command, executed on the login node, creates a job allocation for
+   64 cores. It returns with a shell prompt on the login node as soon as the
+   allocation is ready and prints the job ID of the running job on the screen.
+   The ``--x11`` option is used to forward X11 traffic.

-2. Next we log on to the allocated compute node using ``ssh``. The ``-X`` option is used to forwarded
-   X11 traffic. As we allocate only a single node, ``$SLURM_JOB_NODELIST`` is just a single node and can be
-   used as argument to ``ssh``.
+2. Next we log on to the allocated compute node by attaching an interactive
+   shell (``--pty bash``) to the job ID with ``srun``.

-3. It is important to note that now we are in our **home directory** on the compute node as ``ssh`` starts
-   with a clean login shell. We can now execute X11 commands or anything we want to do and is supported on
-   a compute node.
+3. We can now execute X11 commands, launch graphical applications, or anything
+   else that we want to do that is supported on a compute node.

-4. The first ``exit`` command leaves the compute node and returns to the login shell, **but still within the
-   salloc command**.
+4. The first ``exit`` command leaves the compute node and returns to the login
+   shell, **but still within the salloc command**.

-5. Hence we need a second ``exit`` command to leave the shell created by ``salloc`` and free the resources for
-   other users.
+5. Hence we need a second ``exit`` command to leave the shell created by
+   ``salloc`` and free the resources for other users.

**So do not forget that you need to exit two shells to free the resources!**
-
-

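As a quick sanity check before requesting the allocation, the X11 forwarding mentioned above can be verified on the login node. This is a minimal sketch; the user name and login host name are placeholders for your site's actual values.

.. code:: bash

   # On your own machine: connect to the cluster with X11 forwarding enabled
   # (the host name below is a placeholder for the actual login node).
   ssh -X yourusername@login.hpc.example.org

   # On the login node: DISPLAY should be set if forwarding is active ...
   login$ echo $DISPLAY

   # ... and a simple X11 program such as xclock should open a window
   # on your local screen.
   login$ xclock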
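For reference, a complete session following the new instructions could look roughly as below. The job ID 123456 and the node name r0c00cn0 are placeholders; ``salloc`` normally reports the granted job ID when the allocation is ready, and ``squeue -u $USER`` is one way to look it up again if it has scrolled off the screen.

.. code:: bash

   # Request 1 task with 64 cores for one hour, with X11 forwarding.
   login$ salloc -n 1 -c 64 -t 1:00:00 --x11
   salloc: Granted job allocation 123456

   # If needed, look up the job ID of the allocation.
   login$ squeue -u $USER

   # Attach an interactive shell to that job on its compute node.
   login$ srun --jobid=123456 --pty bash

   # Work on the compute node, then leave the interactive shell ...
   r0c00cn0$ xclock
   r0c00cn0$ exit

   # ... and exit a second time to release the allocation.
   login$ exit

Since ``salloc`` also exports ``SLURM_JOB_ID`` into the shell it spawns on the login node, ``srun --jobid=$SLURM_JOB_ID --pty bash`` (or simply ``srun --pty bash`` inside that shell) should attach to the same allocation.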