Commit 0038c99

add some "cluster management" doc
1 parent e49962a commit 0038c99

1 file changed

+102
-1
lines changed

README.md

@@ -109,6 +109,107 @@ nginx example (in addition to SSL setup):
}
```

# Cluster management

## Prerequisites

Install `autossh` using your package manager, and `dwq` using `pip3`.
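
Before continuing, it can help to confirm that the tools are actually on `PATH`. The following is a minimal sketch, not part of dwq: the helper name `check_tools` is hypothetical, and it assumes that the `dwq` package provides the `dwqc` and `dwqm` commands used later in this document.

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 if all are found, 1 if at least one is missing.
# (check_tools is a hypothetical helper name, not part of dwq.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_tools autossh dwqc dwqm` prints one line per missing tool and returns non-zero if any is absent.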

## Setup

1. Set up SSH authentication to ci.riot-os.org.

E.g., add this to `~/.ssh/config`:

```
Host murdock
    HostName ci.riot-os.org
    User murdock-slave
    Port 22
    IdentityFile ~/.ssh/id_rsa_murdock-slave
    IdentitiesOnly yes
    LocalForward 7711 127.0.0.1:7711
    LocalForward 6379 127.0.0.1:6379
    ServerAliveInterval 60
    ServerAliveCountMax 2
```

Make sure `~/.ssh/id_rsa_murdock-slave` can log in to `murdock-slave@ci.riot-os.org`.

2. Keep an SSH connection open that forwards ports 7711 and 6379.

E.g., use this alias with `autossh`:

    $ alias dwq_connect='autossh -M0 -N -C -f murdock'

Then start autossh with `dwq_connect` (automate this or repeat it for each session).
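
A missing `LocalForward` line is an easy mistake to make in step 1. As a rough sketch, the check below reads ssh client configuration on stdin and verifies that both ports are forwarded. The helper name `has_dwq_forwards` is hypothetical, and the check only inspects configuration text; it does not prove that the tunnel is currently up.

```shell
# Read ssh client configuration on stdin and check that it contains
# a LocalForward entry for each of the two ports (7711 and 6379).
# Returns 0 if both entries are present, 1 otherwise.
# (has_dwq_forwards is a hypothetical helper name.)
has_dwq_forwards() {
    config=$(cat)
    for port in 7711 6379; do
        printf '%s\n' "$config" |
            grep -Eiq "^[[:space:]]*localforward[[:space:]]+${port}[[:space:]]" ||
            return 1
    done
    return 0
}
```

Assuming your OpenSSH version supports `ssh -G` (which prints the effective configuration for a host), this could be used as `ssh -G murdock | has_dwq_forwards && echo "forwards configured"`.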

## dwqm (dwq management utility)

Try `dwqm --help`.

Useful things:

- list all queues in the disque instance:

    $ dwqm queue --list

This is a raw queue listing and includes queues used internally by dwq. Those
are named "control::*" and "status::*".

- list all connected workers:

    $ dwqm control --list

- set worker(s) to "paused"; they will not run any jobs until resumed or restarted:

    $ dwqm control --pause worker1 [worker2] ...

- resume worker(s):

    $ dwqm control --resume worker1 [worker2] ...

- shut down worker(s) (with our current murdock scripts, this will shut down the
worker, pull the newest build container, then __restart__ the worker):

    $ dwqm control --shutdown worker1 [worker2] ...
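
Since a mistyped action could affect several workers at once, a small wrapper can reject anything other than the three actions listed above. This is only a sketch: `worker_ctl` is a hypothetical helper name, and it uses nothing beyond the `dwqm control --pause`, `--resume` and `--shutdown` invocations shown in this section.

```shell
# Apply one control action to one or more workers.
# Accepts only the actions documented above: pause, resume, shutdown.
# (worker_ctl is a hypothetical wrapper, not part of dwq.)
worker_ctl() {
    if [ "$#" -lt 2 ]; then
        echo "usage: worker_ctl pause|resume|shutdown worker..." >&2
        return 2
    fi
    action=$1
    shift
    case "$action" in
        pause|resume|shutdown) ;;
        *)
            echo "unknown action: $action" >&2
            return 2
            ;;
    esac
    # Remaining arguments are worker names, passed through unchanged.
    dwqm control "--$action" "$@"
}
```

For example, `worker_ctl pause worker1 worker2` runs `dwqm control --pause worker1 worker2`.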

## dwqc (dwq client, runs jobs on queue)

In our setup, every build worker listens on the "default" queue. Those workers
run inside the build container.

Every test worker listens on a queue named after the board it is connected to,
e.g., "samr21-xpro", "nrf52dk" or "esp32-wroom-32".

__Every__ worker also listens on a queue named after its hostname.

For example, in our setup, "riotbuild" listens on the queues "default" and
"riotbuild", and "pi-36f90aef" listens on "pi-36f90aef" and "nrf52dk".
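
The queue naming rules above can be restated as a small function. This is purely illustrative: `worker_queues` is a hypothetical name, it does not query dwq or disque, and it assumes one board per test worker as in the example.

```shell
# Print the queues a worker listens on, following the rules above:
#   build worker: "default" plus its hostname
#   test worker:  the attached board's name plus its hostname
# (worker_queues is a hypothetical, illustration-only helper.)
worker_queues() {
    kind=$1
    host=$2
    board=${3:-}
    case "$kind" in
        build) echo "default $host" ;;
        test)  echo "$board $host" ;;
        *)
            echo "unknown worker kind: $kind" >&2
            return 1
            ;;
    esac
}
```

Because every worker has a hostname queue, a job can be sent to one specific machine by passing that hostname to `dwqc -q`.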

`dwqc` needs a git repository and commit, passed either as parameters or via the environment.
Either set `DWQ_REPO` and `DWQ_COMMIT` manually, or use an alias:

    $ alias dwqset='export DWQ_REPO=https://github.com/RIOT-OS/RIOT DWQ_COMMIT=$(git rev-parse HEAD)'
    $ cd src/riot
    $ dwqset  # following dwqc jobs will now be executed in the specified checkout

Run a single job on the queue named "default":

    $ dwqc "echo hello world!"

Run a single job on a specific queue:

    $ dwqc -q riotbuild "ccache -s"

Run multiple jobs on a single queue:

    $ for i in $(seq 10); do echo "echo $i"; done | dwqc -q queue_name
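
The loop above works because `dwqc` reads one job command per line from stdin. The same pattern can generate one job per board. The sketch below is an assumption-laden illustration: `board_jobs` is a hypothetical helper, and the `make -C ... BOARD=...` command line is only an example of a job, not something prescribed by dwq.

```shell
# Print one job command per line: one build of the given application
# directory for each board name passed as an argument.
# (board_jobs and the make command line are illustrative assumptions.)
board_jobs() {
    app=$1
    shift
    for board in "$@"; do
        printf 'make -C %s BOARD=%s\n' "$app" "$board"
    done
}
```

The output could then be piped to a queue, e.g. `board_jobs examples/hello-world samr21-xpro nrf52dk | dwqc -q default`.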

Create the command from stdin plus a base command:

    $ echo "first second third" | dwqc -s "echo \${1}" # will create job "echo first"
