Commit b6c206b

update documentation with details surrounding sedutil/dracut
1 parent 0046210 commit b6c206b

1 file changed (+74, -1 lines changed)

switch.md

Lines changed: 74 additions & 1 deletion
@@ -43,7 +43,7 @@ Don't forget to grab an anti-static wrist strap for this build, they are cheaply
 - [AsRock Rack EPYCD8-2T](http://asrockrack.com/general/productdetail.asp?Model=EPYCD8-2T#Specifications)
 - [AsRock Rack TPM2-S Nuvoton NPCT650](https://www.asrockrack.com/general/productdetail.asp?Model=TPM2-S#Specifications)
 - ~4x32gb~ 8x32gb Crucial 2666Mhz ECC RDIMM CL19 (~`CT32G4RFD4266`~ <- I originally ordered four of this part number, and received `CT32G4RFD4266.36FB1`, but it turns out the [QVL](https://www.asrockrack.com/general/productdetail.asp?Model=EPYCD8-2T#Memory) for the EPYCD8-2T is even more specific and requires `CT32G4RFD4266.2G6H1.001`. I was sent `CT32G4RFD4266.36FD1` after ordering the QVL part, and this RAM worked.)
-- 2x[Western Digital SN570 500GB M.2](https://www.westerndigital.com/en-ca/products/internal-drives/wd-blue-sn570-nvme-ssd#WDS500G3B0C)
+- ~2x[Western Digital SN570 500GB M.2](https://www.westerndigital.com/en-ca/products/internal-drives/wd-blue-sn570-nvme-ssd#WDS500G3B0C)~ 2x[Crucial P5 Plus 1TB M.2](https://www.crucial.com/ssd/p5-plus/ct1000p5pssd8)
 - [Kingston A400 120GB 2.5"](https://www.kingston.com/en/ssd/a400-solid-state-drive)
 - 3.5" to 2.5" drive tray
 - [Dynatron A26](https://www.dynatron.co/product-page/a26)
@@ -294,6 +294,79 @@ so I should be able to omit one of the software layers. I'll update this after a
a failure. I also had not enabled `clevis-luks-askpass.path` so it's possible that now that I have,
the system can recover without intervention.

Edit: Well, this was quite a chore. Read the next (new) section for details. The system has now
been up for about 12 hours. It typically fails within a couple of days under light load (like the
current one) with many IO operations on the RAID array. I'll post again after a failure or a
substantial uptime.

```
❯ uptime
12:14:14 up 11:34, 1 user, load average: 2.14, 1.97, 1.98
```

### SED Passwordless Boot via `dracut`, `clevis` and `sedutil-cli`

It turns out that using a TPM to automatically unlock both SEDs and LUKS drives with secure boot
active is fairly tricky. Read on to see how I accomplished this.

I hammered my way through this in about 12 hours. It took so long because several times, to get
out of an unbootable situation, I needed to:

1. Turn off secure boot.
1. Reinstall a basic OS on the LUKS drive.
1. Use the basic OS to mess with the SED using `sedutil-cli`.
1. Adjust scripts.
1. Build a new UEFI image.
1. Turn on secure boot.
1. Power down.
1. Boot and, most likely, repeat.

I had this process down to about 10 minutes in the end. The reboots were the time killer: 2 minutes
every time.

Here is how the solution works:

- I used `dracut` modules to [install](./src/core-switch/scripts/security/sedutil/setup.sh) custom
  logic at boot.
- The [module](./src/core-switch/scripts/security/sedutil/module-setup.sh) I created includes
  `sedutil-cli`, `argon2`, `clevis-tpm2` and associated libraries. It also includes TPM2-encrypted
  HDD passphrases for SEDs.
- The [script](./src/core-switch/scripts/security/sedutil/unlock-sed.sh) that is invoked at boot
  follows this logic:
  - Check lock status.
    - Locked
      - Attempt to decrypt encrypted password
        - Success
          - Unlock SED
          - Done
        - Failure
          - Use `systemd-ask-password` to prompt the user for the user passphrase
          - Derive the HDD passphrase using `argon2`
          - Unlock SED
          - Done
    - Unlocked
      - Done

[This](./src/core-switch/scripts/store-hdd-passphrases.sh) is the script I use to regenerate the
TPM-encrypted LUKS and SED keys any time the PCR values in the TPM change (when you upgrade the
kernel or change a BIOS setting, for example).

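Re-sealing a passphrase against the current PCR values might look like the following. This is a sketch under assumptions, not the linked `store-hdd-passphrases.sh`: the function names and the output path are hypothetical, and the `sha256` bank is a guess. Only the `clevis encrypt tpm2` invocation and its `pcr_bank`/`pcr_ids` config keys come from clevis itself.

```shell
#!/bin/sh
# Sketch: seal a passphrase to the TPM against a chosen set of PCRs.
# Function names and paths are hypothetical.

# Builds the clevis tpm2 pin config for a comma-separated PCR list.
# $1 = PCR ids, $2 = PCR bank (assumed default: sha256).
clevis_tpm2_config() {
    printf '{"pcr_bank":"%s","pcr_ids":"%s"}' "${2:-sha256}" "$1"
}

# Reads the passphrase on stdin and writes a JWE bound to the current
# values of the given PCRs. $1 = PCR ids, $2 = output file.
seal_passphrase() {
    clevis encrypt tpm2 "$(clevis_tpm2_config "$1")" > "$2"
}

# Example (hypothetical path):
#   printf '%s' "$hdd_pass" | seal_passphrase 0,2,3,4,6,7,8 /etc/sedutil/sed.jwe
# For LUKS, `clevis luks bind -d <device> tpm2 '<config>'` accepts the same config.
```

Because the JWE is bound to the PCR values at sealing time, it has to be regenerated after any change that alters one of the listed registers.
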
It works! I had to adjust the TPM registers I was measuring to get it to boot consistently, I
believe because I'm changing the UEFI image itself. I changed from PCRs 0,1,2,3,4,5,6,7 to
0,2,3,4,6,7,8. I am having trouble finding a definitive answer on what these values represent,
but through trial and error I found a consistent set to measure. I am considering putting the
encrypted keys on the boot partition and mounting it during this process to see if I can include
another PCR value.

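As a rough guide to what the registers cover, the helper below prints the PCR assignments as I understand them from the TCG PC Client Platform Firmware Profile. Treat the descriptions as a paraphrase of that table and not as an authoritative reference; the function names are made up for this example. Under that reading, the two registers dropped above (1 and 5) hold firmware configuration and boot manager configuration, which would fit with them changing between boots, though that is an inference and not something confirmed here.

```shell
#!/bin/sh
# Sketch: describe PCR indices (paraphrased from the TCG PC Client
# Platform Firmware Profile; wording is approximate).

describe_pcr() {
    case $1 in
        0) echo "core firmware code" ;;
        1) echo "firmware configuration (BIOS/UEFI settings)" ;;
        2) echo "UEFI driver and option ROM code" ;;
        3) echo "UEFI driver and option ROM configuration" ;;
        4) echo "boot manager code and boot attempts" ;;
        5) echo "boot manager configuration and GPT partition table" ;;
        6) echo "platform manufacturer specific" ;;
        7) echo "Secure Boot policy" ;;
        [89]|1[0-5]) echo "defined by the OS or bootloader" ;;
        *) echo "not covered by this sketch" ;;
    esac
}

# Prints one "index: description" line per entry of a comma-separated list.
describe_pcrs() {
    printf '%s\n' "$1" | tr ',' '\n' | while read -r pcr; do
        printf '%s: %s\n' "$pcr" "$(describe_pcr "$pcr")"
    done
}

# Example: describe_pcrs 0,2,3,4,6,7,8
# The live values can be inspected with tpm2-tools: tpm2_pcrread sha256:0,2,3,4,6,7,8
```
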
About this solution: We use a null salt for the `argon2` key derivation, since we want to be able
to recover from the passphrase alone. The `argon2` parameters take about 10 seconds to run on my
system, which is a bit much, but I am okay with it since passwordless boot only needs to
TPM-decrypt the passphrase and unlock; no derivation is necessary. To make the dracut module a bit
nicer, one could add real checks in the `check()` function of `module-setup.sh`. In practice,
though, my system is unbootable if this module does not fire, so `check()` merely gives feedback at
image build time that everything expected is present; it doesn't guarantee you didn't forget to
add something you needed. Anyway, `check()` should be populated.

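A populated `check()` could look something like this. It is a sketch, not the author's `module-setup.sh`: the `SED_JWE` path is hypothetical, and the list of binaries is only inferred from the tools named earlier. `require_binaries` is a helper that dracut provides to module scripts, and returning 0 or 1 from `check()` tells dracut whether to include the module.

```shell
#!/bin/bash
# Sketch of a check() for module-setup.sh. dracut sources this file while
# building the image; it is not meant to be run directly.

# Hypothetical location of the TPM-encrypted SED passphrase.
SED_JWE="${SED_JWE:-/etc/sedutil/sed-passphrase.jwe}"

check() {
    # require_binaries (from dracut) fails if any listed binary is missing.
    require_binaries sedutil-cli argon2 clevis clevis-decrypt-tpm2 \
        systemd-ask-password || return 1

    # Refuse to build an image without a non-empty encrypted passphrase file.
    if [ ! -s "$SED_JWE" ]; then
        echo "sedutil module: missing or empty $SED_JWE" >&2
        return 1
    fi

    return 0
}
```

As the paragraph above notes, this only reports what is missing at build time. Since the system cannot boot without the module, a failing `check()` is best treated as a reason to stop and fix the image before rebooting.
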
### Networking Setup

I made some changes to the basic networking config in `/etc/default/networking`:
