
Commit ad293d2

Merge pull request #160 from cdoern/ilab-processes
Introduce dev-doc for process management
2 parents c601d42 + 4e1479c commit ad293d2


2 files changed: +70, -0 lines


.spellcheck-en-custom.txt (+6)

@@ -29,6 +29,7 @@ Containerfile
 cpp
 cuBLAS
 CUDA
+ctrl
 customizations
 CVE
 CVEs
@@ -132,6 +133,7 @@ Params
 Pareja
 PEFT
 Pereira
+PID
 PlantUML
 PLOS
 pluggable
@@ -189,8 +191,10 @@ Standup
 subcommand
 subcommands
 subdirectory
+subprocess
 Sudalairaj
 supportability
+systemd
 Taj
 tatsu
 TBD
@@ -210,6 +214,8 @@ ui
 unquantized
 unstaged
 USM
+UUID
+UUIDs
 UX
 venv
 Vishnoi

docs/cli/ilab-processes.md (+64)

# Processes in InstructLab

The ability to detach from processes is crucial to the user experience of InstructLab. However, multi-processing, process management, and process monitoring are complex problems.

It is important to introduce this concept as simply as possible, and to expand on state reporting, logging, and other features as we go along.

## Phased approach to InstructLab Processes

This document describes phase 1 of implementing processes in InstructLab, referred to as the "ilab simple process management system". Phase 1 depends purely on Python packages, PID tracking, and log files to provide detachable processes. The key element is the UUID: every process gets a unique identifier, which a future REST API can use to keep track of InstructLab processes.

We can revisit this design in phase 2, when we discuss whether to use something like systemd or a more in-depth process-monitoring repository to track processes.

### Phase 1

Phase 1 would focus on adding the ability to detach from processes, re-attach to them, and manage the artifacts the processes produce.

In a first iteration, process management would only apply to `ilab data generate` and `ilab model train`. Later iterations would extend it to commands like `ilab model evaluate`, `ilab model serve`, and `ilab model download`. All of these commands start long-running processes that would benefit from detachment.

The workflow would allow for:

`ilab data generate -dt` (run a detached generation process)

`ilab model train -dt` (run a detached training process)

`ilab process list`

```console
+------------+-------+--------------------------------------+------------------------------------------------------------------------------------------------------------------+----------+
| Type       | PID   | UUID                                 | Log File                                                                                                         | Runtime  |
+------------+-------+--------------------------------------+------------------------------------------------------------------------------------------------------------------+----------+
| Generation | 39832 | 82d00a5b-5ed5-4cfd-9a75-a87e4f420b27 | /Users/charliedoern/.local/share/instructlab/logs/generation/generation-82d00a5b-5ed5-4cfd-9a75-a87e4f420b27.log | 69:26:28 |
| Generation | 40791 | 09f9d301-4fd9-4045-bfda-8a56f1d96016 | /Users/charliedoern/.local/share/instructlab/logs/generation/generation-09f9d301-4fd9-4045-bfda-8a56f1d96016.log | 68:45:40 |
| Generation | 47390 | 4ccabfa5-604f-49c6-b5c3-730ce328d62a | /Users/charliedoern/.local/share/instructlab/logs/generation/generation-4ccabfa5-604f-49c6-b5c3-730ce328d62a.log | 67:26:33 |
| Generation | 50872 | 093ac2e9-080c-45fe-89c5-43d508d6369c | /Users/charliedoern/.local/share/instructlab/logs/generation/generation-093ac2e9-080c-45fe-89c5-43d508d6369c.log | 05:24:56 |
+------------+-------+--------------------------------------+------------------------------------------------------------------------------------------------------------------+----------+
```

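The Runtime column in the sample output above keeps counting hours past 24 (for example `69:26:28`). A minimal sketch of how `ilab process list` could build such a table, assuming each registry entry is a dict with hypothetical `type`, `pid`, `uuid`, `log_file`, and `start_time` (Unix timestamp) keys; `format_runtime` and `render_table` are illustrative names, not existing InstructLab functions:

```python
import time
from typing import Dict, List, Optional

HEADERS = ["Type", "PID", "UUID", "Log File", "Runtime"]


def format_runtime(start_time: float, now: Optional[float] = None) -> str:
    """Return elapsed time as HH:MM:SS; hours keep counting past 24."""
    if now is None:
        now = time.time()
    elapsed = max(0, int(now - start_time))  # clamp clock skew to zero
    hours, remainder = divmod(elapsed, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def render_table(entries: List[Dict], now: Optional[float] = None) -> str:
    """Render registry entries (hypothetical schema) as an ASCII table."""
    rows = [
        [e["type"], str(e["pid"]), e["uuid"], e["log_file"], format_runtime(e["start_time"], now)]
        for e in entries
    ]
    # Each column is as wide as its longest cell, header included.
    widths = [max([len(h)] + [len(row[i]) for row in rows]) for i, h in enumerate(HEADERS)]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def line(cells: List[str]) -> str:
        return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

    return "\n".join([border, line(HEADERS), border, *(line(r) for r in rows), border])
```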
`ilab process attach <UUID>`

This command would re-attach to the given process, allowing the user to view its live logs. `attach` would tail the log file and listen for user input to kill the process.

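A minimal POSIX-only sketch of `attach`, assuming the registry has already resolved the UUID to a PID and a log file. It prints the log from the beginning, keeps following new lines while the process is alive, and forwards Ctrl+C to the detached process as `SIGINT`. The helper names `is_running`, `follow`, and `attach` are hypothetical. On Windows, `os.kill(pid, 0)` terminates the target instead of probing it, so this sketch must not be used there as written.

```python
import os
import signal
import time
from typing import Callable, Iterator


def is_running(pid: int) -> bool:
    """POSIX liveness check: signal 0 probes the PID without affecting it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the PID exists but belongs to another user
    return True


def follow(path: str, is_alive: Callable[[], bool], poll_interval: float = 0.5) -> Iterator[str]:
    """Yield the log from the start, then keep yielding new lines while the process lives."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        while True:
            line = f.readline()
            if line:
                yield line
            elif not is_alive():
                # Drain anything written just before the process exited, then stop.
                yield from f
                return
            else:
                time.sleep(poll_interval)  # nothing new yet; wait for the writer


def attach(pid: int, log_file: str, poll_interval: float = 0.5) -> None:
    """Stream a detached process's log; Ctrl+C is forwarded to the process."""
    try:
        for line in follow(log_file, lambda: is_running(pid), poll_interval):
            print(line, end="", flush=True)
    except KeyboardInterrupt:
        # SIGINT makes a Python child raise its own KeyboardInterrupt,
        # mirroring what Ctrl+C does to a foreground run.
        if is_running(pid):
            os.kill(pid, signal.SIGINT)
```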
At first, these commands will be implemented in a very simple way, using the following architecture:

1. A detached process is re-attachable by tailing its log file; the user can then stop the process as usual with Ctrl+C, which Python surfaces as a `KeyboardInterrupt`.
2. A process registry will track the UUID of each process (created with the Python `uuid` package), the PID of the actual process, the `log_file` the process writes its logs to (so that the user can re-attach), and the start time of the process. The log file directory will be tracked using our `DEFAULTS` package and will stay standard across releases.

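One way to back the process registry, sketched under assumptions: a single JSON file keyed by UUID, whose location the caller passes in (the real path would come from `DEFAULTS`). `ProcessEntry`, `load_registry`, `save_registry`, and `register_process` are hypothetical names. Writes go through a temporary file and `os.replace`, so the registry is never left half-written, but there is no file locking, so two `ilab` invocations registering at the same moment could still race.

```python
import json
import os
import tempfile
import time
import uuid
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Dict


@dataclass
class ProcessEntry:
    kind: str          # e.g. "Generation" or "Training"
    pid: int           # PID of the detached process
    log_file: str      # where the process writes its output
    start_time: float  # Unix timestamp taken at launch


def load_registry(path: Path) -> Dict[str, ProcessEntry]:
    """Read the registry; a missing file means no tracked processes."""
    if not path.exists():
        return {}
    raw = json.loads(path.read_text(encoding="utf-8"))
    return {key: ProcessEntry(**value) for key, value in raw.items()}


def save_registry(path: Path, registry: Dict[str, ProcessEntry]) -> None:
    """Write to a temp file in the same directory, then atomically replace."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({key: asdict(entry) for key, entry in registry.items()}, f, indent=2)
    os.replace(tmp_name, path)


def register_process(path: Path, kind: str, pid: int, log_file: str) -> str:
    """Add a new entry under a fresh UUID and return that UUID."""
    process_uuid = str(uuid.uuid4())
    registry = load_registry(path)
    registry[process_uuid] = ProcessEntry(kind, pid, log_file, time.time())
    save_registry(path, registry)
    return process_uuid
```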
The general flow would be:

1. A user runs `ilab data generate -dt`.
2. A UUID, PID, and log file are added to the process registry.
3. The foreground `ilab` command would exit and print the UUID of the SDG run.
4. The user could later attach to this process using `ilab process attach <UUID>`.
5. This command would look up the PID and/or UUID in the process registry, get the log file, tail it, and listen for a Ctrl+C keyboard interrupt.

This lets us detach from processes while they keep running in the background, and maintain their log files, without relying on anything beyond the Python `uuid` and `subprocess` modules.

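The detach step itself needs only `uuid` and `subprocess`. A minimal sketch, assuming the `<log_root>/<type>/<type>-<UUID>.log` naming seen in the `ilab process list` sample output; `launch_detached` is a hypothetical name, and recording the returned values in the process registry is left to the caller:

```python
import subprocess
import sys
import uuid
from pathlib import Path
from typing import List, Tuple


def launch_detached(command: List[str], kind: str, log_root: Path) -> Tuple[str, int, Path]:
    """Start `command` in the background with all output going to a per-run log.

    Returns (UUID, PID, log path) so the caller can register the run.
    """
    process_uuid = str(uuid.uuid4())
    log_dir = log_root / kind
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"{kind}-{process_uuid}.log"
    with open(log_path, "ab") as log:
        proc = subprocess.Popen(
            command,
            stdin=subprocess.DEVNULL,   # a detached run never reads the terminal
            stdout=log,
            stderr=subprocess.STDOUT,   # interleave stderr into the same log
            start_new_session=True,     # POSIX: own session, not tied to the launching terminal
        )
    # The child keeps its own copy of the file descriptor, so closing ours is safe.
    return process_uuid, proc.pid, log_path
```

On POSIX, `start_new_session=True` runs the child in its own session, so closing the terminal or pressing Ctrl+C there does not reach it; the argument only has an effect on POSIX platforms.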
#### Log file management

If the various libraries already write their own log files, those will be used in this scenario. If they do not, InstructLab will write the process logs to disk itself. Either way, InstructLab needs to co-locate the log files in a centralized directory.

If a log file exists, it will be copied and renamed into the following directory format:

`~/.local/share/instructlab/logs/<command_name>/<command_name>-<timestamp>.log`

If the log file does not exist, InstructLab will create one with this format. Libraries that already write logs are responsible for storing them in a standardized location, so that the core package can access them uniformly and copy them to the proper directory.
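A minimal sketch of this co-location rule, assuming `shutil.copy2` for the copy and a `%Y%m%dT%H%M%S` timestamp (the document does not fix a format). `DEFAULT_LOG_ROOT` hard-codes the directory named above, where the real code would read it from `DEFAULTS`; `central_log_path` and `collect_log` are hypothetical names:

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional

# Hypothetical constant: the directory named in this document.
DEFAULT_LOG_ROOT = Path.home() / ".local" / "share" / "instructlab" / "logs"


def central_log_path(command_name: str, log_root: Path = DEFAULT_LOG_ROOT,
                     when: Optional[datetime] = None) -> Path:
    """Build <log_root>/<command_name>/<command_name>-<timestamp>.log."""
    stamp = (when or datetime.now()).strftime("%Y%m%dT%H%M%S")  # assumed format
    return log_root / command_name / f"{command_name}-{stamp}.log"


def collect_log(command_name: str, library_log: Optional[Path],
                log_root: Path = DEFAULT_LOG_ROOT, when: Optional[datetime] = None) -> Path:
    """Copy a library's own log into the central directory, or create an empty one."""
    target = central_log_path(command_name, log_root, when)
    target.parent.mkdir(parents=True, exist_ok=True)
    if library_log is not None and library_log.is_file():
        shutil.copy2(library_log, target)  # copy and rename the library's log
    else:
        target.touch()  # no library log: InstructLab creates and owns the file
    return target
```

With this timestamp format, two runs of the same command within one second would map to the same file; a finer-grained timestamp or the process UUID would avoid that.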
