Commit 7d57e58

Merge pull request #88 from hpc-carpentry/llnl_blog_post
LLNL blog post
2 parents b71898a + cd00c96 commit 7d57e58

1 file changed

+103
-0
lines changed

@@ -0,0 +1,103 @@
1+
---
2+
layout: page
3+
authors: ["Andrew Reid", "Trevor Keller", "Jane Herriman"]
4+
teaser: "We ran the full user workshop at LLNL!"
5+
title: "HPC Carpentry at LLNL"
6+
date: 2024-08-13
7+
time: "12:00:00"
8+
tags: ["HPC Carpentry", "Lesson Program Implementation"]
9+
---
10+
11+
## HPC Carpentry at LLNL
12+
13+
In the first week of June 2024, instructors from [HPC Carpentry][hpcc]
14+
taught our full workflow workshop for the first time. Over a four-day
15+
stint at Lawrence Livermore National Laboratory, we delivered this
16+
content not once, but twice!
17+
18+
It was immensely rewarding to see all this material come together in
19+
one place. Traveling to teach in person, while not without hiccups, was
20+
extremely worthwhile. We believe we served our learners pretty well, and
21+
we learned a few lessons relevant to future workshops.
22+
23+
### Workshop Structure
24+
25+
Each workshop ran over two days. On the first day, we did the [Unix Shell
26+
intro][shell] lesson from Software Carpentry in the morning, and our own
27+
[HPC Intro][intro] lesson in the afternoon. On the second day, we did a
28+
variant of the [workflow lesson][work], adapted for the Maestro workflow
29+
tool (rather than Snakemake), because it is developed and used at LLNL.
30+
31+
The instructor team consisted of Andrew Reid and Trevor Keller from
32+
the HPC Carpentry steering committee, and Jane Herriman from LLNL,
33+
along with helpers from the LLNL community.
34+
35+
While split-terminal tools exist, we used vanilla [tmux][tmux] with two
36+
terminals attached to the same session. This allowed the instructors to type on
37+
their own laptop while referencing the lesson webpage and selectively sharing
38+
the terminal. Learners followed along on the enhanced terminal displayed at the
39+
front of the room. Note: to "scroll up" in `tmux`, press
40+
<kbd>Ctrl</kbd>+<kbd>b</kbd>, then <kbd>[</kbd>, and use the arrow keys to move around; press <kbd>q</kbd> to exit.
41+
42+
#### Maestro
43+
44+
Maestro is a capable workflow engine, and one we would not have explored had
45+
Jane not ported the Snakemake lesson so expertly. Maestro favors
46+
reproducibility, running every step of the task from scratch at every
47+
invocation. This is a significant difference from Snakemake which, like Make,
48+
does not re-execute completed "targets." A significant benefit of Maestro is
49+
that the tool does not persist while jobs execute: it generates and submits
50+
native Slurm jobs, with tooling in place to check the status of running
51+
workflows. This is much more HPC-compatible for large-scale or time-consuming
52+
jobs.
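
The difference between the two execution models can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how Maestro or Snakemake are implemented: the names `is_up_to_date`, `run_step`, and `demo` are hypothetical, and the "skip" rule assumes a plain Make-style timestamp comparison.

```python
import tempfile
from pathlib import Path
from typing import Callable


def is_up_to_date(target: Path, sources: list[Path]) -> bool:
    """Make-style check: target exists and is no older than any source."""
    if not target.exists():
        return False
    target_mtime = target.stat().st_mtime
    return all(src.stat().st_mtime <= target_mtime for src in sources)


def run_step(
    target: Path,
    sources: list[Path],
    action: Callable[[Path, list[Path]], None],
    always_rerun: bool,
) -> bool:
    """Run one workflow step; return True if the action was executed."""
    if not always_rerun and is_up_to_date(target, sources):
        return False  # incremental model: skip a completed target
    action(target, sources)
    return True


def demo() -> dict[str, list[bool]]:
    """Invoke the same step twice under each model and record what ran."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input.txt"
        out = Path(tmp) / "output.txt"
        src.write_text("hello\n")

        def copy_upper(target: Path, sources: list[Path]) -> None:
            target.write_text(sources[0].read_text().upper())

        # Incremental (Make-like): second invocation finds the target done.
        incremental = [
            run_step(out, [src], copy_upper, always_rerun=False) for _ in range(2)
        ]
        # From scratch: every invocation executes the step again.
        from_scratch = [
            run_step(out, [src], copy_upper, always_rerun=True) for _ in range(2)
        ]
        return {"incremental": incremental, "from_scratch": from_scratch}


if __name__ == "__main__":
    print(demo())
```

In the incremental model the second invocation does no work; in the from-scratch model every invocation repeats the step, trading compute time for a result that never depends on leftover files.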
53+
54+
### Learners
55+
56+
Learners had a range of backgrounds, from undergraduate bioinformatics
57+
students to experienced Linux HPC users. The lessons generally went
58+
at a slightly faster pace than expected, without leaving anyone
59+
behind. This was in part because access to LLNL's system `Ruby` was by means
60+
of pre-authorized RSA tokens, removing a lot of the friction
61+
from the initial connection process that has been time-consuming in other
62+
versions of the workshop. The instructors live-coded plenty of mistakes, opening
63+
discussions on some interesting tangential topics. LLNL runs a pool of "login
64+
nodes" per HPC system, rather than a single machine, which made for an interesting
65+
early discussion of networked filesystems. The sheer number of nodes also made
66+
the output of `sinfo` tricky to comprehend at a glance, which is awesome.
67+
68+
### Lesson Feedback
69+
70+
One major take-away is that the workflow lesson in particular is
71+
vulnerable to learners losing the thread if they miss a step. This lesson,
72+
in either its Maestro or Snakemake version, builds up an increasingly
73+
sophisticated workflow specification file, incrementally demonstrating
74+
workflow concepts in the context of the tool. Consequently, a learner
75+
who misses a step and falls behind can find themselves unable to recover,
76+
since the remainder of the lesson builds on precisely the content that was
77+
missed. The workflow lesson differs in this respect from the Shell and
78+
HPC intro lessons, where later steps can better stand on their own.
79+
80+
The solution to this, which we already started to implement for the
81+
second workshop, was to have a shared online notepad with "checkpoint"
82+
versions of the file, to which learners can refer if they fall behind,
83+
with helpers bridging the content gap for them. Also, LLNL supports and
84+
uses the [`give`][give] tool, allowing users to easily pass files around:
85+
it's nifty!
86+
87+
The hands-on Carpentries approach proved itself once again, building
88+
muscle memory and vocabulary in learners, who could then move on to their
89+
LLNL summer research projects with greater confidence in their ability
90+
to productively use the shared high-performance computing resources.
91+
92+
For the project, it was confirmation that the HPC User workshop can
93+
work, including the valuable feedback about checkpoint files and a
94+
shared notepad. We look forward to teaching this workshop more, and
95+
getting it out of beta status and into our main curriculum.
96+
97+
<!-- links -->
98+
[give]: https://github.com/hpc/give
99+
[hpcc]: https://hpc-carpentry.org/
100+
[intro]: https://hpc-workshops.github.io/llnl-hpc-intro/
101+
[shell]: https://swcarpentry.github.io/shell-novice
102+
[tmux]: https://github.com/tmux/tmux/wiki
103+
[work]: https://xorjane.github.io/maestro-workflow-lesson/
