|
| 1 | +--- |
| 2 | +layout: page |
| 3 | +authors: ["Andrew Reid", "Trevor Keller", "Jane Herriman"] |
| 4 | +teaser: "We ran the full user workshop at LLNL!" |
| 5 | +title: "HPC Carpentry at LLNL" |
| 6 | +date: 2024-08-13 |
| 7 | +time: "12:00:00" |
| 8 | +tags: ["HPC Carpentry", "Lesson Program Implementation"] |
| 9 | +--- |
| 10 | + |
| 11 | +## HPC Carpentry at LLNL |
| 12 | + |
| 13 | +In the first week of June 2024, instructors from [HPC Carpentry][hpcc]
| 14 | +taught our full workflow workshop for the first time. Over a four-day |
| 15 | +stint at Lawrence Livermore National Laboratory, we delivered this |
| 16 | +content not once, but twice! |
| 17 | + |
| 18 | +It was immensely rewarding to see all this material come together in |
| 19 | +one place. Traveling to teach in person, while not without hiccups, was |
| 20 | +extremely worthwhile. We believe we served our learners pretty well, and |
| 21 | +we learned a few lessons relevant to future workshops. |
| 22 | + |
| 23 | +### Workshop Structure |
| 24 | + |
| 25 | +Each workshop ran over two days. On the first day, we did the [Unix Shell |
| 26 | +intro][shell] lesson from Software Carpentry in the morning, and our own |
| 27 | +[HPC Intro][intro] lesson in the afternoon. On the second day, we did a |
| 28 | +variant of the [workflow lesson][work], adapted for the Maestro workflow |
| 29 | +tool (rather than Snakemake), because it is developed and used at LLNL. |
| 30 | + |
| 31 | +The instructor team consisted of Andrew Reid and Trevor Keller from |
| 32 | +the HPC Carpentry steering committee, and Jane Herriman from LLNL, |
| 33 | +along with helpers from the LLNL community. |
| 34 | + |
| 35 | +While split-terminal tools exist, we used vanilla [tmux][tmux] with two |
| 36 | +terminals attached to the same session. This allowed the instructors to type on |
| 37 | +their own laptop while referencing the lesson webpage and selectively sharing |
| 38 | +the terminal. Learners followed along on the enhanced terminal displayed at the |
| 39 | +front of the room. Note: to "scroll up" in `tmux`, press |
| 40 | +<kbd>Ctrl</kbd>+<kbd>b</kbd>, <kbd>[</kbd>, then arrow-key around. |
| 41 | + |
| 42 | +#### Maestro |
| 43 | + |
| 44 | +Maestro is a capable workflow engine, and one we would not have explored had |
| 45 | +Jane not ported the Snakemake lesson so expertly. Maestro favors |
| 46 | +reproducibility, running every step of the task from scratch at every |
| 47 | +invocation. This is a significant difference from Snakemake which, like Make, |
| 48 | +does not re-execute completed "targets." A significant benefit of Maestro is |
| 49 | +that the tool does not persist while jobs execute: it generates and submits |
| 50 | +native Slurm jobs, with tooling in place to check the status of running |
| 51 | +workflows. This fits HPC systems much better, especially for large-scale or
| 52 | +time-consuming jobs.
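
To make the re-execution difference concrete, here is a minimal Python sketch of the two policies described above. It is not how Maestro or Snakemake is implemented: the `run_steps` helper, the step names, and the file names are all hypothetical, and the Make-style check is simplified to "does the output file exist" rather than comparing timestamps.

```python
import tempfile
from pathlib import Path


def run_steps(steps, workdir, skip_completed):
    """Run (name, output_file) steps in order and return the names that executed.

    skip_completed=True  -> Make-style: a step whose output file exists is skipped.
    skip_completed=False -> from-scratch: every step runs on every invocation.
    """
    executed = []
    for name, output in steps:
        target = Path(workdir) / output
        if skip_completed and target.exists():
            continue  # target already built, nothing to do
        target.write_text(f"output of {name}\n")  # stand-in for real work
        executed.append(name)
    return executed


# Hypothetical two-step workflow; step and file names are illustrative only.
steps = [("simulate", "sim.dat"), ("plot", "plot.png")]

with tempfile.TemporaryDirectory() as workdir:
    first_make = run_steps(steps, workdir, skip_completed=True)
    second_make = run_steps(steps, workdir, skip_completed=True)
    second_scratch = run_steps(steps, workdir, skip_completed=False)

print(first_make)      # ['simulate', 'plot']
print(second_make)     # []
print(second_scratch)  # ['simulate', 'plot']
```

The second Make-style invocation does nothing because both outputs already exist, while the from-scratch policy repeats every step regardless.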
| 53 | + |
| 54 | +### Learners |
| 55 | + |
| 56 | +Learners had a range of backgrounds, from undergraduate bioinformatics
| 57 | +students to experienced Linux HPC users. The lessons generally went |
| 58 | +at a slightly faster pace than expected, without leaving anyone |
| 59 | +behind. This was in part because access to LLNL's system `Ruby` was by means |
| 60 | +of pre-authorized RSA tokens, removing a lot of the friction |
| 61 | +from the initial connection process that has been time-consuming in other |
| 62 | +versions of the workshop. The instructors live-coded plenty of mistakes, opening |
| 63 | +discussions on some interesting tangential topics. LLNL runs a pool of "login |
| 64 | +nodes" per HPC system, rather than a single machine, which made for interesting, |
| 65 | +early discussion of networked filesystems. The sheer number of nodes also made |
| 66 | +the output of `sinfo` tricky to comprehend at a glance, which is awesome.
| 67 | + |
| 68 | +### Lesson Feedback |
| 69 | + |
| 70 | +One major take-away is that the workflow lesson in particular is |
| 71 | +vulnerable to learners losing the thread if they miss a step. This lesson, |
| 72 | +in either its Maestro or Snakemake version, builds up an increasingly |
| 73 | +sophisticated workflow specification file, incrementally demonstrating |
| 74 | +workflow concepts in the context of the tool. Consequently, a learner |
| 75 | +who misses a step and falls behind can find themselves unable to recover, |
| 76 | +since the remainder of the lesson builds on precisely the content that was |
| 77 | +missed. The workflow lesson differs in this respect from the Shell and
| 78 | +HPC Intro lessons, where later steps can better stand on their own.
| 79 | + |
| 80 | +The solution to this, which we already started to implement for the |
| 81 | +second workshop, was to have a shared online notepad with "checkpoint" |
| 82 | +versions of the file, to which learners can refer if they fall behind, |
| 83 | +with helpers bridging the content gap for them. Also, LLNL supports and |
| 84 | +uses the [`give`][give] tool, allowing users to easily pass files around: |
| 85 | +it's nifty! |
| 86 | + |
| 87 | +The hands-on Carpentries approach proved itself once again, building |
| 88 | +muscle memory and vocabulary in learners, who could then move on to their |
| 89 | +LLNL summer research projects with greater confidence in their ability |
| 90 | +to productively use the shared high-performance computing resources. |
| 91 | + |
| 92 | +For the project, it was confirmation that the HPC User workshop can |
| 93 | +work, including the valuable feedback about checkpoint files and a |
| 94 | +shared notepad. We look forward to teaching this workshop more, and |
| 95 | +getting it out of beta status and into our main curriculum. |
| 96 | + |
| 97 | +<!-- links --> |
| 98 | +[give]: https://github.com/hpc/give |
| 99 | +[hpcc]: https://hpc-carpentry.org/ |
| 100 | +[intro]: https://hpc-workshops.github.io/llnl-hpc-intro/ |
| 101 | +[shell]: https://swcarpentry.github.io/shell-novice |
| 102 | +[tmux]: https://github.com/tmux/tmux/wiki |
| 103 | +[work]: https://xorjane.github.io/maestro-workflow-lesson/ |