Fixes rails#262: Automatic worker process recycling
This PR adds two new configuration parameters:

* `recycle_on_oom`, set per Worker (via `queue.yml`)
* `calc_memory_usage`, a global parameter (via `application.rb`, `config/environments/*.rb`, or an initializer)
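Concretely, the two knobs might be wired up like this. This is a sketch only: the key placement follows the description above, and the `ps`-based proc is just an illustration that reports RSS in MB.

```yml
# config/queue.yml -- per-worker threshold (illustrative values)
workers:
  - queues: "*"
    threads: 3
    recycle_on_oom: 300
```

```ruby
# config/initializers/solid_queue.rb -- global proc (illustrative)
# Yields the Worker's PID; must return usage in the same units as recycle_on_oom.
Rails.application.config.solid_queue.calc_memory_usage =
  ->(pid) { Integer(`ps -o rss= -p #{pid}`) / 1024 } # RSS in MB
```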
There are no specific unit requirements placed on either of these new parameters. What's important is that they use the same units and are directly comparable.
For example, if the `calc_memory_usage` proc reports 300 MB as `300` (i.e., in megabytes), then the `recycle_on_oom` value set on the worker should also be `300`.
Any worker without `recycle_on_oom` is not impacted in any way.
If `calc_memory_usage` is `nil` (the default), then OOM checking is turned off for all workers under the control of this Supervisor.
The check for OOM is made after a Job has run to completion and before the Solid Queue worker does any additional processing.
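A minimal sketch of that check (illustrative names only, not Solid Queue internals): after each job completes, the worker compares what the proc reports against its own threshold, and recycles itself when the threshold is exceeded.

```ruby
# Hypothetical condensed version of the post-job OOM check.
# calc_memory_usage: the global proc (nil disables the check entirely)
# recycle_on_oom:    the per-worker threshold, in the same units the proc returns
def oom?(calc_memory_usage, recycle_on_oom, pid)
  return false if calc_memory_usage.nil? || recycle_on_oom.nil?
  calc_memory_usage.call(pid) > recycle_on_oom
end

usage = ->(_pid) { 350 }      # pretend the proc reports 350 (MB)
oom?(usage, 300, Process.pid) # => true: this worker should recycle
oom?(nil, 300, Process.pid)   # => false: a nil proc disables the check
```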
The single biggest change to Solid Queue, and the one that probably requires the most review, is moving `job.unblock_next_blocked_job` out of `ClaimedExecution` and up one level into `Pool`. The rationale for this change is that the `ensure` block on the Job execution is not guaranteed to run if the system / thread is forcibly shut down while the job is in flight. However, an `ensure` attached at the Thread level *does* seem to get called reliably on forced shutdowns.
Given my almost assuredly incomplete understanding of the concurrency implementation, despite Rosa working very hard to help me grok it, there is some risk here that this change is wrong.
My logic for this change is as follows:

* A job that completes successfully would have released its lock -- no change.
* A job that completes by way of an unhandled exception would have released its lock -- no change.
* A job that was killed in flight because of a worker `recycle_on_oom` (or an ugly restart outside of the user's control -- again, looking at you, Heroku) needs to release its lock, because there is no guarantee that it will be the first job started on the worker restart. If it doesn't release its lock in this case, that worker could find itself waiting on the dispatcher (I think) to expire Semaphores before it is able to take on new work.
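The thread-level behavior the last point relies on can be demonstrated in plain Ruby (a generic illustration, not Solid Queue's pool code): an `ensure` clause inside a thread's block still runs when the thread is killed, which is why hanging the unblock logic at that level is more reliable than relying on the job's own execution path.

```ruby
# Demonstrates that `ensure` inside a thread's block runs on Thread#kill (MRI).
lock_released = false

worker = Thread.new do
  begin
    sleep 5 # simulates an in-flight job
  ensure
    lock_released = true # stand-in for job.unblock_next_blocked_job
  end
end

sleep 0.1   # let the thread start
worker.kill
worker.join
lock_released # => true, even though the "job" never finished
```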
README.md (+107 -3):
@@ -53,7 +53,6 @@ For small projects, you can run Solid Queue on the same machine as your webserver
 
 **Note**: future changes to the schema will come in the form of regular migrations.
 
-
 ### Single database configuration
 
 Running Solid Queue in a separate database is recommended, but it's also possible to use one single database for both the app and the queue. Just follow these steps:
@@ -99,7 +98,6 @@ By default, Solid Queue will try to find your configuration under `config/queue.yml`
 bin/jobs -c config/calendar.yml
 ```
 
-
 This is what this configuration looks like:
 
 ```yml
@@ -236,6 +234,7 @@ There are several settings that control how Solid Queue works that you can set a
 - `preserve_finished_jobs`: whether to keep finished jobs in the `solid_queue_jobs` table—defaults to `true`.
 - `clear_finished_jobs_after`: period to keep finished jobs around, in case `preserve_finished_jobs` is true—defaults to 1 day. **Note:** Right now, there's no automatic cleanup of finished jobs. You'd need to do this by periodically invoking `SolidQueue::Job.clear_finished_in_batches`, but this will happen automatically in the near future.
 - `default_concurrency_control_period`: the value to be used as the default for the `duration` parameter in [concurrency controls](#concurrency-controls). It defaults to 3 minutes.
+- `calc_memory_usage`: a proc that returns the memory consumption of the process(es) that you want to measure. It yields the Worker process PID and runs in the context of the Worker that is configured with `recycle_on_oom`. [Read more](#memory-consumption).
 
 ## Errors when enqueuing
@@ -428,7 +427,112 @@ my_periodic_resque_job:
   schedule: "*/5 * * * *"
 ```
 
-and the job will be enqueued via `perform_later` so it'll run in Resque. However, in this case we won't track any `solid_queue_recurring_execution` record for it and there won't be any guarantees that the job is enqueued only once each time.
+and the job will be enqueued via `perform_later` so it'll run in Resque. However, in this case we won't track any
+`solid_queue_recurring_execution` record for it and there won't be any guarantees that the job is enqueued only once
+each time.
+
+## Recycle On OOM
+
+This feature recycles / restarts a worker whenever it exceeds the specified memory threshold. This is particularly
+useful for jobs with high memory consumption or when deploying in a memory-constrained environment.
+
+If the result of the `calc_memory_usage` Proc is greater than the `recycle_on_oom` value configured on a specific
+worker, that worker will restart. It's important that the units returned by the `calc_memory_usage` Proc match the units
+of the `recycle_on_oom` value.
+For instance, if the `calc_memory_usage` Proc returns a value in MB (i.e., `300` vs. `300_000_000`), the `recycle_on_oom`
+value should also be specified in MB.
+
+Using the `get_process_memory` gem, and configuring it to return an integer value in MB, you can configure Solid Queue as
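A configuration along those lines might look like the sketch below. This assumes the widely used `get_process_mem` gem (class `GetProcessMem`, whose `#mb` method reports megabytes) is what the gem name above refers to, and that the global setting is assigned as described earlier; treat both as assumptions, not the PR's exact code.

```ruby
# config/initializers/solid_queue.rb -- illustrative sketch
# GetProcessMem#mb returns megabytes; #to_i makes it an integer comparable
# against a recycle_on_oom value that is also expressed in MB.
require "get_process_mem"

Rails.application.config.solid_queue.calc_memory_usage =
  ->(pid) { GetProcessMem.new(pid).mb.to_i }
```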