|
2 | 2 | id: post-powercut
|
3 | 3 | aliases:
|
4 | 4 | - Post-powercut Todo List
|
5 |
| -tags: [] |
| 5 | +tags: |
| 6 | + - powercut |
| 7 | + - todo |
6 | 8 | created: 2023-12-05T01:36:11
|
7 |
| -modified: 2024-01-31T08:23:37 |
8 |
| -title: Post-Powercut Todo List |
| 9 | +modified: 2024-09-27T13:35:28 |
| 10 | +title: post-powercut |
9 | 11 | ---
|
10 | 12 |
|
11 | 13 | # Post-powercut Todo List
|
12 | 14 |
|
13 | 15 | A list of things that should be done/checked immediately after a power cut:
|
14 | 16 |
|
15 |
| -- Check KVM, hit ctrl+D on minerva to make sure it boots. |
16 |
| -- Check KVM, hit F1 on sprout to make sure it boots |
17 |
| -- Check KVM, sometimes you need to press F1 on carbon for it to boot |
18 |
| -- Stop Exim on the mail server (Morpheus) until minerva (NFS) is online. |
19 |
| -- If LDAP is down, you'll need to use the ALOM to do the next step. |
20 |
| -- Check that ldapclient started (svcs -xv). If it didn't, run svcadm clear ldap/client to make it start. This usually happens because murphy comes back before morpheus does, and the LDAP client won't start due to lack of an LDAP server. |
21 |
| -- Apache on [hardcase](../hardware/nix/hardcase.md) sometimes tries to start before networking is finished starting. To fix it, disable/re-enable it a few times. This usually makes it turn on. |
22 |
| -- [paphos](../hardware/paphos.md) is old and sometimes its time will become out of sync. To make sure its time is accurate, run: |
| 17 | +- Ensure the [`aperture`](../hardware/aperture/index.md) servers have the correct IP addresses: |
| 18 | + - `eno1` should have an internal IP address (in `10.10.0.0/24`), which should be reserved by DHCP on [`mordor`](../hardware/network/mordor.md) |
| 19 | + - `eno2` should have *no IP address* |
| 20 | + - `br0` should have an external IP address (in `136.206.16.0/24`), which should also be reserved by DHCP on [`mordor`](../hardware/network/mordor.md) |
| 21 | +- If the [`bastion-vm`](../services/bastion-vm.md) fails to start, check: |
| 22 | + - `/storage` is mounted `rw` on each [`aperture`](docs/hardware/aperture/index.md) server |
| 23 | + - `br0` is present and configured on each [`aperture`](docs/hardware/aperture/index.md) server |
| 24 | + - `vm-resources.service.consul` is running and `http://vm-resources.service.consul:8000/bastion/bastion-vm-latest.qcow2` is accessible |
| 25 | + - If the `latest` symlink points to a corrupted image, repoint it at an earlier image with `ln -sf` |
| 26 | +- All the [`nixos`](docs/procedures/nixos.md) boxes rely on [`DNS`](docs/services/bind.md) for [`ldap`](docs/services/ldap.md) and [`nfs`](docs/services/nfs.md): |
| 27 | + - Make sure bind is running on [`paphos`](docs/hardware/paphos.md) |
| 28 | + - Mount `/storage` |
| 29 | + - Restart `httpd`, `php-fpm-rbusers-*` and `ldap` with `systemctl restart` |
| 30 | +- Apache on [`hardcase`](../hardware/nix/hardcase.md) sometimes tries to start before networking has finished coming up. To fix it, disable and re-enable it a few times; this usually gets it running. |
| 31 | +- [`paphos`](../hardware/paphos.md) is old and its clock sometimes drifts out of sync. To make sure its time is accurate, run: |
23 | 32 |
|
24 | 33 | ```bash
|
25 | 34 | sudo service ntp restart
|
|