Commit b802f1c

[Documentation:System] Repair Services Cron Job Documentation (#679)
Updates the System Customization, Automated Grading, and Websockets / System & Debugging documentation pages to document how services are restarted automatically by the cron job introduced in [#11566](Submitty/Submitty#11566), how to disable this hourly script, and how to set up automatic notifications for service failure output. This makes progress on [#11622](Submitty/Submitty#11622).
1 parent ca387ac commit b802f1c
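
The check-and-restart behavior this commit documents can be illustrated with a minimal sketch. This is not the contents of `sbin/repair_services.sh`; the function name `repair_if_inactive` is hypothetical, and the unit name is supplied by the caller because the real list of monitored services is not shown here. It assumes only that `systemctl is-active --quiet` exits with status 0 when a unit is active.

```shell
#!/bin/sh
# Hypothetical sketch of a check-then-restart step; NOT the real repair_services.sh.

# Restart a systemd unit only if it is not currently active.
# $1: unit name (caller-supplied; no real Submitty unit names are assumed)
repair_if_inactive() {
    unit="$1"
    if systemctl is-active --quiet "$unit"; then
        echo "$unit: active"
    else
        echo "$unit: inactive, restarting"
        systemctl restart "$unit"
    fi
}
```

Keeping the check and the restart in one small function makes it straightforward to call once per service from an hourly cron entry.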

3 files changed, 27 additions and 4 deletions

_docs/developer/development_instructions/automated_grading.md

Lines changed: 3 additions & 1 deletion
@@ -118,12 +118,14 @@ number:

 ---

-## Debugging
+## Debugging

 To debug new features for autograding, it can be helpful to run
 `submitty_autograding_shipper.py` and `submitty_autograding_worker.py`
 interactively and inspect the output.

+_NOTE: A cron job runs hourly to detect autograding shipper outages on the primary machine. To avoid interference during debugging, this job should be disabled before proceeding. See [Capture Cron Error Messages](/sysadmin/installation/system_customization#capture-cron-error-messages) for instructions on disabling the script._
+
 To do this:

 1. Stop the daemons (on each server, as appropriate)
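
Before debugging interactively, it may help to confirm that the hourly repair job mentioned in the note above is actually commented out. The helper below is a minimal sketch under stated assumptions: `repair_job_enabled` is a hypothetical name, and the crontab path is passed in by the caller rather than assumed.

```shell
#!/bin/sh
# Hypothetical helper; not part of Submitty.

# Succeeds (exit status 0) if the given crontab file contains an
# uncommented line that mentions repair_services.sh.
# $1: path to a crontab file
repair_job_enabled() {
    grep -v '^[[:space:]]*#' "$1" | grep -q 'repair_services\.sh'
}
```

For example, `repair_job_enabled .setup/submitty_crontab && echo "repair job is still enabled"` prints a warning while the line remains active in the source crontab.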

_docs/sysadmin/installation/system_customization.md

Lines changed: 18 additions & 3 deletions
@@ -28,10 +28,25 @@ You may want to back up more of `/var/local/submitty` to save configurations and

 ## Capture cron error messages

-The `submitty_daemon` user runs the [sbin/send_email.py](https://github.com/Submitty/Submitty/blob/master/sbin/send_email.py)
-script. Console output from this script can be emailed to a sysadmin to help ensure that errors can be reported and addressed.
+To keep Submitty services such as the WebSocket server reliable, the [sbin/repair_services.sh](https://github.com/Submitty/Submitty/blob/master/sbin/repair_services.sh) script, run hourly by the `submitty_daemon` user, monitors their health and restarts any that have stopped. The script uses `systemctl` along with health-check utility scripts to verify that each service is active, and triggers a restart if it finds one inactive.

-The first line should be set as `MAILTO=` with a valid email address. For example:
+Services can fail for various reasons, including unhandled exceptions, memory leaks, port binding issues, or OS-level disruptions such as resource exhaustion. Each failure is logged with its timestamp, source, and last output in the `/var/local/submitty/logs/services` directory, in a per-day file named `YYYYMMDD.txt`.
+
+To disable this auto-repair mechanism, comment out the relevant line in the source `.setup/submitty_crontab` file within your repository. Because the crontab is auto-generated during installation, re-run `submitty_install` after any change so that it takes effect.
+
+```bash
+# In .setup/submitty_crontab, comment out the repair_services.sh line:
+# 0 * * * * submitty_daemon sudo /usr/local/submitty/sbin/repair_services.sh
+
+# Then re-apply the configuration:
+$ submitty_install
+```
+
+_Note: Disable this mechanism only with caution in production environments._
+
+The `submitty_daemon` user runs a variety of other scripts, such as [sbin/send_email.py](https://github.com/Submitty/Submitty/blob/master/sbin/send_email.py), which sends pending emails every minute. Console output from these scripts can be emailed to a sysadmin so that errors are reported and addressed.
+
+The first line of the crontab should be set as `MAILTO=` followed by a valid email address, as shown below.
 ```

 * * * * * python3 /usr/local/submitty/sbin/send_email.py

_docs/sysadmin/troubleshooting/system_debugging.md

Lines changed: 6 additions & 0 deletions
@@ -62,6 +62,12 @@ redirect_from:
 /var/log/nginx/error.log
 ```

+* Look for errors in the daily service outage log
+
+```
+/var/local/submitty/logs/services/YYYYMMDD.txt
+```
+
 * Check the SSL keys / certificates for apache & nginx.
 Look for ssl key & certificate files specified in the enabled
 `.conf` files for apache & nginx:
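
For the daily service outage log added in the hunk above, the `YYYYMMDD.txt` file name can be built from the current date. The following is a minimal sketch; `service_log_path` is a hypothetical helper, and the log directory is passed in by the caller rather than hard-coded.

```shell
#!/bin/sh
# Hypothetical helper; not part of Submitty.

# Print the path of the per-day service log file.
# $1: log directory
# $2: optional date in YYYYMMDD form (defaults to today's date)
service_log_path() {
    dir="$1"
    day="${2:-$(date +%Y%m%d)}"
    printf '%s/%s.txt\n' "$dir" "$day"
}
```

For example, `tail -n 20 "$(service_log_path /var/local/submitty/logs/services)"` would show the most recent entries in today's file, if one exists.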
