|
2 | 2 | id: post-powercut |
3 | 3 | aliases: |
4 | 4 | - Post-powercut Todo List |
5 | | -tags: [] |
| 5 | +tags: |
| 6 | + - powercut |
| 7 | + - todo |
6 | 8 | created: 2023-12-05T01:36:11 |
7 | | -modified: 2024-01-31T08:23:37 |
8 | | -title: Post-Powercut Todo List |
| 9 | +modified: 2024-09-27T13:35:28 |
| 10 | +title: post-powercut |
9 | 11 | --- |
10 | 12 |
|
11 | 13 | # Post-powercut Todo List |
12 | 14 |
|
13 | 15 | A list of things that should be done/checked immediately after a power cut: |
14 | 16 |
|
15 | | -- Check KVM, hit ctrl+D on minerva to make sure it boots. |
16 | | -- Check KVM, hit F1 on sprout to make sure it boots |
17 | | -- Check KVM, sometimes you need to press F1 on carbon for it to boot |
18 | | -- Stop Exim on the mail server (Morpheus) until minerva (NFS) is online. |
19 | | -- If LDAP is down, you'll need to use the ALOM to do the next step. |
20 | | -- Check that ldapclient started (svcs -xv). If it didn't, run svcadm clear ldap/client to make it start. This usually happens because murphy comes back before morpheus does, and the LDAP client won't start due to lack of an LDAP server. |
21 | | -- Apache on [hardcase](../hardware/nix/hardcase.md) sometimes tries to start before networking is finished starting. To fix it, disable/re-enable it a few times. This usually makes it turn on. |
22 | | -- [paphos](../hardware/paphos.md) is old and sometimes its time will become out of sync. To make sure its time is accurate, run: |
| 17 | +- Ensure the [`aperture`](../hardware/aperture/index.md) servers have the correct IP addresses: |
| 18 | + - `eno1` should have an address in the internal subnet (`10.10.0.0/24`), reserved by DHCP on [`mordor`](../hardware/network/mordor.md) |
| 19 | + - `eno2` should have *no IP address* |
| 20 | + - `br0` should have an address in the external subnet (`136.206.16.0/24`), also reserved by DHCP on [`mordor`](../hardware/network/mordor.md) |
| 21 | +- If the [`bastion-vm`](../services/bastion-vm.md) fails to start, check: |
| 22 | + - `/storage` is mounted `rw` on each [`aperture`](docs/hardware/aperture/index.md) server |
| 23 | + - `br0` is present and configured on each [`aperture`](docs/hardware/aperture/index.md) server |
| 24 | + - `vm-resources.service.consul` is running and `http://vm-resources.service.consul:8000/bastion/bastion-vm-latest.qcow2` is accessible |
| 25 | + - If the `latest` symlink points to a corrupted image, use `ln -sf` to re-point it at an earlier one |
| 26 | +- All the [`nixos`](docs/procedures/nixos.md) boxes rely on [`DNS`](docs/services/bind.md) for [`ldap`](docs/services/ldap.md) and [`nfs`](docs/services/nfs.md): |
| 27 | + - Make sure bind is running on [`paphos`](docs/hardware/paphos.md) |
| 28 | + - mount `/storage` |
| 29 | + - `systemctl restart` `httpd`, `php-fpm-rbusers-*` and `ldap` |
| 30 | +- Apache on [`hardcase`](../hardware/nix/hardcase.md) sometimes tries to start before networking has finished coming up. If it fails to start, disable and re-enable it a few times; this usually gets it running. |
| 31 | +- [`paphos`](../hardware/paphos.md) is old and its clock sometimes drifts out of sync. To make sure its time is accurate, run: |
23 | 32 |
|
24 | 33 | ```bash |
25 | 34 | sudo service ntp restart |
|
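The `aperture` interface checks added in this change (`eno1` internal, `eno2` with no address, `br0` external) could be scripted roughly as follows. This is a minimal sketch, not part of the commit: the function names `check_iface` and `check_aperture` are hypothetical, and it assumes the one-line-per-address output of `ip -o -4 addr show` from iproute2. Matching on the first three octets is a simplification that only works because both subnets are `/24`.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers): verify the interface layout expected on an
# aperture server after a power cut.

check_iface() {
    # $1 = interface name
    # $2 = expected address prefix (empty string means "no IPv4 address")
    # $3 = captured output of `ip -o -4 addr show`
    local name=$1 want=$2 snapshot=$3 addrs
    # In `ip -o` output, field 2 is the interface and field 4 is ADDRESS/PREFIXLEN.
    addrs=$(awk -v ifc="$name" '$2 == ifc && $3 == "inet" { print $4 }' <<<"$snapshot")

    if [ -z "$want" ]; then
        if [ -z "$addrs" ]; then
            echo "ok: $name has no IPv4 address"
            return 0
        fi
        echo "FAIL: $name has unexpected address $addrs"
        return 1
    fi

    case "$addrs" in
        "$want"*) echo "ok: $name has $addrs" ;;
        *) echo "FAIL: $name has '${addrs:-no address}', expected ${want}x"
           return 1 ;;
    esac
}

check_aperture() {
    # $1 = captured output of `ip -o -4 addr show`
    local snapshot=$1 rc=0
    check_iface eno1 "10.10.0."    "$snapshot" || rc=1
    check_iface eno2 ""            "$snapshot" || rc=1
    check_iface br0  "136.206.16." "$snapshot" || rc=1
    return "$rc"
}
```

On a live host this would be called as `check_aperture "$(ip -o -4 addr show)"`; a non-zero exit status means at least one interface does not match the expected layout.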