Commit 425a197

Merge branch 'slurm-24.05' into 24.05.ug-before-reduce-patches
2 parents b5ef918 + bd17c8d commit 425a197

37 files changed, +1045 -719 lines changed

META

+2 -2
Original file line number | Diff line number | Diff line change
@@ -7,8 +7,8 @@
77
Name: slurm
88
Major: 24
99
Minor: 05
10-
Micro: 4
11-
Version: 24.05.4
10+
Micro: 5
11+
Version: 24.05.5
1212
Release: 1
1313

1414
##
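
Review note on the META hunk: `Micro` and `Version` are bumped together by hand, so the two can drift apart. A minimal sketch of a consistency check, assuming META keeps the one-pair-per-line `Key: value` layout shown in the hunk. The function names `meta_field` and `meta_version_consistent` are hypothetical and not part of the Slurm build system.

```shell
#!/bin/bash
# Hypothetical helper: verify that Version in a META file equals
# Major.Minor.Micro. Assumes one "Key: value" pair per line.

meta_field() {
    # $1 = path to META, $2 = field name; prints the first matching value
    awk -v key="$2" '$1 == key ":" { print $2; exit }' "$1"
}

meta_version_consistent() {
    # $1 = path to META; succeeds only if Version == Major.Minor.Micro
    expected="$(meta_field "$1" Major).$(meta_field "$1" Minor).$(meta_field "$1" Micro)"
    [ "$(meta_field "$1" Version)" = "$expected" ]
}
```

Calling `meta_version_consistent META` from the top of the source tree would succeed for the post-merge values in the hunk (24, 05, 5 and 24.05.5).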

NEWS

+22
@@ -1,6 +1,9 @@
11
This file describes changes in recent versions of Slurm. It primarily
22
documents those changes that are of interest to users and administrators.
33

4+
* Changes in Slurm 24.05.6
5+
==========================
6+
47
* Changes in Slurm 24.05.5
58
==========================
69
-- Fix issue signaling cron jobs resulting in unintended requeues.
@@ -14,6 +17,25 @@ documents those changes that are of interest to users and administrators.
1417
removal of a dynamic node.
1518
-- gpu/nvml - Attempt loading libnvidia-ml.so.1 as a fallback for failure in
1619
loading libnvidia-ml.so.
20+
-- slurmrestd - Fix populating non-required object fields of objects as '{}' in
21+
JSON/YAML instead of 'null' causing compiled OpenAPI clients to reject
22+
the response to 'GET /slurm/v0.0.40/jobs' due to validation failure of
23+
'.jobs[].job_resources'.
24+
-- Fix sstat/sattach protocol errors for steps on higher version slurmd's
25+
(regressions since 20.11.0rc1 and 16.05.1rc1 respectively).
26+
-- slurmd - Avoid a crash when starting slurmd version 24.05 with
27+
SlurmdSpoolDir files that have been upgraded to a newer major version of
28+
Slurm. Log warnings instead.
29+
-- Fix race condition in stepmgr step completion handling.
30+
-- Fix slurmctld segfault with stepmgr and MpiParams when running a job array.
31+
-- Fix requeued jobs keeping their priority until the decay thread happens.
32+
-- slurmctld - Fix crash and possible split brain issue if the
33+
backup controller handles an scontrol reconfigure while in control
34+
before the primary resumes operation.
35+
-- Fix stepmgr not getting dynamic node addrs from the controller
36+
-- stepmgr - avoid "Unexpected missing socket" errors.
37+
-- Fix `scontrol show steps` with dynamic stepmgr
38+
-- Support IPv6 in configless mode.
1739

1840
* Changes in Slurm 24.05.4
1941
==========================

debian/changelog

+1 -1
@@ -1,4 +1,4 @@
1-
slurm-smd (24.05.4-1) UNRELEASED; urgency=medium
1+
slurm-smd (24.05.5-1) UNRELEASED; urgency=medium
22

33
* Initial release.
44

doc/html/containers.shtml

+82 -10
@@ -78,7 +78,11 @@ job or any given plugin).</li>
7878

7979
<h2 id="prereq">Prerequisites<a class="slurm_link" href="#prereq"></a></h2>
8080
<p>The host kernel must be configured to allow user land containers:</p>
81-
<pre>$ sudo sysctl -w kernel.unprivileged_userns_clone=1</pre>
81+
<pre>
82+
sudo sysctl -w kernel.unprivileged_userns_clone=1
83+
sudo sysctl -w kernel.apparmor_restrict_unprivileged_unconfined=0
84+
sudo sysctl -w kernel.apparmor_restrict_unprivileged_userns=0
85+
</pre>
8286

8387
<p>Docker also provides a tool to verify the kernel configuration:
8488
<pre>$ dockerd-rootless-setuptool.sh check --force
@@ -353,6 +357,62 @@ exit $rc
353357
</pre>
354358
</p>
355359

360+
<h3 id="multiple-runtimes">Handling multiple runtimes
361+
<a class="slurm_link" href="#multiple-runtimes"></a>
362+
</h3>
363+
364+
<p>If you wish to accommodate multiple runtimes in your environment,
365+
it is possible to do so with a bit of extra setup. This section outlines one
366+
possible way to do so:</p>
367+
368+
<ol>
369+
<li>Create a generic oci.conf that calls a wrapper script
370+
<pre>
371+
IgnoreFileConfigJson=true
372+
RunTimeRun="/opt/slurm-oci/run %b %m %u %U %n %j %s %t %@"
373+
RunTimeKill="kill -s SIGTERM %p"
374+
RunTimeDelete="kill -s SIGKILL %p"
375+
</pre>
376+
</li>
377+
<li>Create the wrapper script to check for user-specific run configuration
378+
(e.g., /opt/slurm-oci/run)
379+
<pre>
380+
#!/bin/bash
381+
if [[ -e ~/.slurm-oci-run ]]; then
382+
~/.slurm-oci-run "$@"
383+
else
384+
/opt/slurm-oci/slurm-oci-run-default "$@"
385+
fi
386+
</pre>
387+
</li>
388+
<li>Create a generic run configuration to use as the default
389+
(e.g., /opt/slurm-oci/slurm-oci-run-default)
390+
<pre>
391+
#!/bin/bash --login
392+
# Parse
393+
CONTAINER="$1"
394+
SPOOL_DIR="$2"
395+
USER_NAME="$3"
396+
USER_ID="$4"
397+
NODE_NAME="$5"
398+
JOB_ID="$6"
399+
STEP_ID="$7"
400+
TASK_ID="$8"
401+
shift 8 # subsequent arguments are the command to run in the container
402+
# Run
403+
apptainer run --bind /var/spool --containall "$CONTAINER" "$@"
404+
</pre>
405+
</li>
406+
<li>Add executable permissions to both scripts
407+
<pre>chmod +x /opt/slurm-oci/run /opt/slurm-oci/slurm-oci-run-default</pre>
408+
</li>
409+
</ol>
410+
411+
<p>Once this is done, users may create a script at '~/.slurm-oci-run' if
412+
they wish to customize the container run process, such as using a different
413+
container runtime. Users should model this file after the default
414+
'/opt/slurm-oci/slurm-oci-run-default'</p>
415+
356416
<h2 id="testing">Testing OCI runtime outside of Slurm
357417
<a class="slurm_link" href="#testing"></a>
358418
</h2>
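
Review note on the "Handling multiple runtimes" hunk: the new text tells users to model `~/.slurm-oci-run` on the default script but gives no example. The following is a sketch of what such a user script might look like, assuming the same eight positional arguments that the `RunTimeRun` line passes (`%b %m %u %U %n %j %s %t`, then `%@`). The `run_container` function and the `OCI_DRY_RUN` variable are hypothetical names introduced only for illustration. The `apptainer` invocation is copied from the default script in the hunk; a user would substitute their preferred runtime's command at that point.

```shell
#!/bin/bash
# Hypothetical ~/.slurm-oci-run, modeled on slurm-oci-run-default.
# Argument order follows the RunTimeRun line: %b %m %u %U %n %j %s %t %@

run_container() {
    # Fail early if the wrapper passed fewer arguments than expected.
    if [ "$#" -lt 8 ]; then
        echo "expected at least 8 arguments, got $#" >&2
        return 1
    fi
    CONTAINER="$1"; SPOOL_DIR="$2"; USER_NAME="$3"; USER_ID="$4"
    NODE_NAME="$5"; JOB_ID="$6"; STEP_ID="$7"; TASK_ID="$8"
    shift 8   # remaining arguments are the command to run in the container

    if [ -n "${OCI_DRY_RUN:-}" ]; then
        # Illustration-only switch: print the command instead of running it.
        echo "apptainer run --bind /var/spool --containall $CONTAINER $*"
        return 0
    fi
    # Same invocation as the default script; swap in another runtime here.
    apptainer run --bind /var/spool --containall "$CONTAINER" "$@"
}

# The wrapper always passes arguments; do nothing when there are none.
if [ "$#" -gt 0 ]; then
    run_container "$@"
fi
```

Keeping the argument handling identical to the default means the generic `/opt/slurm-oci/run` wrapper does not need to know which runtime a given user picked.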
@@ -458,11 +518,16 @@ scrun being isolated from the network and not being able to communicate with
458518
the Slurm controller. The container is run by Slurm on the compute nodes which
459519
makes having Docker setup a network isolation layer ineffective for the
460520
container.</li>
461-
<li><pre>docker exec</pre> command is not supported.</li>
462-
<li><pre>docker compose</pre> command is not supported.</li>
463-
<li><pre>docker pause</pre> command is not supported.</li>
464-
<li><pre>docker unpause</pre> command is not supported.</li>
465-
<li><pre>docker swarm</pre> command is not supported.</li>
521+
<li><code>docker exec</code> command is not supported.</li>
522+
<li><code>docker swarm</code> command is not supported.</li>
523+
<li><code>docker compose</code>/<code>docker-compose</code> command is not
524+
supported.</li>
525+
<li><code>docker pause</code> command is not supported.</li>
526+
<li><code>docker unpause</code> command is not supported.</li>
527+
<li><code>docker swarm</code> command is not supported.</li>
528+
<li>All <code>docker</code> commands are not supported inside of containers.</li>
529+
<li><a href="https://docs.docker.com/reference/api/engine/">Docker API</a> is
530+
not supported inside of containers.</li>
466531
</ol>
467532

468533
<h3>Setup procedure</h3>
@@ -580,9 +645,16 @@ configuration.</li>
580645
<li>All containers must use
581646
<a href="https://github.com/containers/podman/blob/main/docs/tutorials/basic_networking.md">
582647
host networking</a></li>
583-
<li><pre>podman exec</pre> command is not supported.</li>
584-
<li><pre>podman kube</pre> command is not supported.</li>
585-
<li><pre>podman pod</pre> command is not supported.</li>
648+
<li><code>podman exec</code> command is not supported.</li>
649+
<li><code>podman-compose</code> command is not supported, due to only being
650+
partially implemented. Some compositions may work but each container
651+
may be run on different nodes. The network for all containers must be
652+
the <code>network_mode: host</code> device.</li>
653+
<li><code>podman kube</code> command is not supported.</li>
654+
<li><code>podman pod</code> command is not supported.</li>
655+
<li><code>podman farm</code> command is not supported.</li>
656+
<li>All <code>podman</code> commands are not supported inside of containers.</li>
657+
<li>Podman REST API is not supported inside of containers.</li>
586658
</ol>
587659

588660
<h3>Setup procedure</h3>
@@ -875,6 +947,6 @@ Overview slides of Sarus are
875947

876948
<hr size=4 width="100%">
877949

878-
<p style="text-align:center;">Last modified 08 October 2024</p>
950+
<p style="text-align:center;">Last modified 27 November 2024</p>
879951

880952
<!--#include virtual="footer.txt"-->
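
Review note on the Prerequisites hunk: the three `sysctl -w` commands change kernel settings without first showing how to inspect them. A minimal read-only sketch, assuming the keys are exposed under `/proc/sys` in the usual way. The function name `check_userns_sysctls` is hypothetical, and treating a missing key as a skip rather than a failure is an assumption of this sketch, since some kernels may not provide all three keys.

```shell
#!/bin/bash
# Hypothetical pre-flight check for the sysctls set in the Prerequisites hunk.
# Reads values from a sysctl tree (default /proc/sys); changes nothing.

check_userns_sysctls() {
    root="${1:-/proc/sys}"
    rc=0
    # key=expected pairs, taken from the sysctl -w commands in the hunk
    for pair in \
        kernel/unprivileged_userns_clone=1 \
        kernel/apparmor_restrict_unprivileged_unconfined=0 \
        kernel/apparmor_restrict_unprivileged_userns=0
    do
        key="${pair%=*}"
        want="${pair#*=}"
        if [ ! -r "$root/$key" ]; then
            # Assumption: an absent key is reported, not treated as an error.
            echo "SKIP $key (not present)"
            continue
        fi
        have="$(cat "$root/$key")"
        if [ "$have" = "$want" ]; then
            echo "OK   $key=$have"
        else
            echo "FAIL $key=$have (want $want)"
            rc=1
        fi
    done
    return "$rc"
}
```

Running `check_userns_sysctls` with no argument inspects the live kernel; passing a directory lets the logic be exercised against a scratch tree.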

doc/html/faq.shtml

+4 -4
@@ -1231,9 +1231,9 @@ that node may be rendered unusable, but no other harm will result.</p>
12311231

12321232
<p><a id="clock"><b>Do I need to maintain synchronized
12331233
clocks on the cluster?</b></a><br>
1234-
In general, yes. Having inconsistent clocks may cause nodes to
1235-
be unusable. Slurm log files should contain references to
1236-
expired credentials. For example:</p>
1234+
In general, yes. Having inconsistent clocks may cause nodes to be unusable and
1235+
generate errors in Slurm log files regarding expired credentials. For example:
1236+
</p>
12371237
<pre>
12381238
error: Munge decode failed: Expired credential
12391239
ENCODED: Wed May 12 12:34:56 2008
@@ -2438,6 +2438,6 @@ dset TV::parallel_configs {
24382438
}
24392439
!-->
24402440

2441-
<p style="text-align:center;">Last modified 07 November 2024</p>
2441+
<p style="text-align:center;">Last modified 19 November 2024</p>
24422442

24432443
<!--#include virtual="footer.txt"-->

doc/html/quickstart_admin.shtml

+61 -57
@@ -11,9 +11,8 @@
1111
<ul>
1212
<li><a href="#prereqs">Installing Prerequisites</a></li>
1313
<li><a href="#rpmbuild">Building RPMs</a></li>
14-
<li><a href="#rpms">Installing RPMs</a></li>
1514
<li><a href="#debuild">Building Debian Packages</a></li>
16-
<li><a href="#debinstall">Installing Debian Packages</a></li>
15+
<li><a href="#pkg_install">Installing Packages</a></li>
1716
<li><a href="#manual_build">Building Manually</a></li>
1817
</ul>
1918
</li>
@@ -208,28 +207,6 @@ Some macro definitions that may be used in building Slurm include:
208207
%with_munge "--with-munge=/opt/munge"
209208
</pre>
210209

211-
<h3 id="rpms">RPMs Installed<a class="slurm_link" href="#rpms"></a></h3>
212-
213-
<p>The RPMs needed on the head node, compute nodes, and slurmdbd node can vary
214-
by configuration, but here is a suggested starting point:
215-
<ul>
216-
<li>Head Node (where the slurmctld daemon runs),<br>
217-
Compute and Login Nodes
218-
<ul>
219-
<li>slurm</li>
220-
<li>slurm-perlapi</li>
221-
<li>slurm-slurmctld (only on the head node)</li>
222-
<li>slurm-slurmd (only on the compute nodes)</li>
223-
</ul>
224-
</li>
225-
<li>SlurmDBD Node
226-
<ul>
227-
<li>slurm</li>
228-
<li>slurm-slurmdbd</li>
229-
</ul>
230-
</li>
231-
</ul>
232-
233210
<h3 id="debuild">Building Debian Packages
234211
<a class="slurm_link" href="#debuild"></a>
235212
</h3>
@@ -258,40 +235,67 @@ the packages:</p>
258235

259236
<p>The packages will be in the parent directory after debuild completes.</p>
260237

261-
<h3 id="debinstall">Installing Debian Packages
262-
<a class="slurm_link" href="#debinstall"></a>
238+
<h3 id="pkg_install">Installing Packages
239+
<a class="slurm_link" href="#pkg_install"></a>
263240
</h3>
264241

265-
<p>The packages needed on the head node, compute nodes, and slurmdbd node can
266-
vary site to site, but this is a good starting point:</p>
267-
<ul>
268-
<li>SlurmDBD Node
269-
<ul>
270-
<li>slurm-smd</li>
271-
<li>slurm-smd-slurmdbd</li>
272-
</ul>
273-
</li>
274-
<li>Head Node (slurmctld node)
275-
<ul>
276-
<li>slurm-smd</li>
277-
<li>slurm-smd-slurmctld</li>
278-
<li>slurm-smd-client</li>
279-
</ul>
280-
</li>
281-
<li>Compute Nodes (slurmd node)
282-
<ul>
283-
<li>slurm-smd</li>
284-
<li>slurm-smd-slurmd</li>
285-
<li>slurm-smd-client</li>
286-
</ul>
287-
</li>
288-
<li>Login Nodes
289-
<ul>
290-
<li>slurm-smd</li>
291-
<li>slurm-smd-client</li>
292-
</ul>
293-
</li>
294-
</ul>
242+
<p>The following packages are recommended to achieve basic functionality for the
243+
different <a href="#nodes">node types</a>. Other packages may be added to enable
244+
optional functionality:</p>
245+
246+
<table class="tlist">
247+
<tbody>
248+
<tr>
249+
<td id="rpms"><strong>RPM name</strong></td>
250+
<td id="debinstall"><strong>DEB name</strong></td>
251+
<td><a href="#login">Login</a></td>
252+
<td><a href="#ctld">Controller</a></td>
253+
<td><a href="#compute">Compute</a></td>
254+
<td><a href="#dbd">DBD</a></td>
255+
</tr>
256+
<tr>
257+
<td><code>slurm</code></td>
258+
<td><code>slurm-smd</code></td>
259+
<td><b>X</b></td>
260+
<td><b>X</b></td>
261+
<td><b>X</b></td>
262+
<td><b>X</b></td>
263+
</tr>
264+
<tr>
265+
<td><code>slurm-perlapi</code></td>
266+
<td><code>slurm-smd-client</code></td>
267+
<td><b>X</b></td>
268+
<td><b>X</b></td>
269+
<td><b>X</b></td>
270+
<td></td>
271+
</tr>
272+
<tr>
273+
<td><code>slurm-slurmctld</code></td>
274+
<td><code>slurm-smd-slurmctld</code></td>
275+
<td></td>
276+
<td><b>X</b></td>
277+
<td></td>
278+
<td></td>
279+
</tr>
280+
<tr>
281+
<td><code>slurm-slurmd</code></td>
282+
<td><code>slurm-smd-slurmd</code></td>
283+
<td></td>
284+
<td></td>
285+
<td><b>X</b></td>
286+
<td></td>
287+
</tr>
288+
<tr>
289+
<td><code>slurm-slurmdbd</code></td>
290+
<td><code>slurm-smd-slurmdbd</code></td>
291+
<td></td>
292+
<td></td>
293+
<td></td>
294+
<td><b>X</b></td>
295+
</tr>
296+
</tbody>
297+
</table>
298+
<br>
295299

296300
<h3 id="manual_build">Building Manually
297301
<a class="slurm_link" href="#manual_build"></a>
@@ -833,6 +837,6 @@ cd /usr/ports/sysutils/slurm-wlm && make install
833837
typical compute nodes. Installing from source allows the user to enable
834838
options such as mysql and gui tools via a configuration menu.</p>
835839

836-
<p style="text-align:center;">Last modified 31 October 2024</p>
840+
<p style="text-align:center;">Last modified 14 November 2024</p>
837841

838842
<!--#include virtual="footer.txt"-->
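
Review note on the "Installing Packages" hunk: the new table merges the RPM and DEB lists into one matrix. A sketch of the same mapping as a lookup, transcribed from the table rows in the hunk. The function name `slurm_packages` and the node-type keywords (`login`, `controller`, `compute`, `dbd`) are hypothetical; only the package names come from the diff.

```shell
#!/bin/bash
# Hypothetical helper: print the recommended packages for a node type,
# following the table in the hunk. $1 = rpm|deb, $2 = node type.

slurm_packages() {
    case "$1" in
        rpm) base="slurm";     client="slurm-perlapi";    prefix="slurm-" ;;
        deb) base="slurm-smd"; client="slurm-smd-client"; prefix="slurm-smd-" ;;
        *)   echo "unknown package format: $1" >&2; return 1 ;;
    esac
    case "$2" in
        login)      echo "$base $client" ;;
        controller) echo "$base $client ${prefix}slurmctld" ;;
        compute)    echo "$base $client ${prefix}slurmd" ;;
        dbd)        echo "$base ${prefix}slurmdbd" ;;
        *)          echo "unknown node type: $2" >&2; return 1 ;;
    esac
}
```

The output is a space-separated list that can be passed to whatever install command a site uses for its locally built packages.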

doc/html/related_software.shtml

+1 -1
@@ -173,7 +173,7 @@ time as performed with this tool:
173173
<a href="elasticsearch.html">jobcomp/elasticsearch</a>, and
174174
<a href="jobcomp_kafka.html">jobcomp/kafka</a>) parse and/or
175175
serialize JSON format data. These plugins and slurmrestd are designed to
176-
make use of the <b>JSON-C library (&gt;= v1.12.0)</b> for this purpose.
176+
make use of the <b>JSON-C library (&gt;= v0.15)</b> for this purpose.
177177
Instructions for the build are as follows:</p>
178178
<pre>
179179
git clone --depth 1 --single-branch -b json-c-0.15-20200726 https://github.com/json-c/json-c.git json-c
