Add healthcheck #300

gerhard · 2019-01-14T18:29:38Z

RabbitMQ healthchecks are hard, as captured by @michaelklishin in
#174 (comment) and followed up in rabbitmq/rabbitmq-cli#292.

This is an attempt to simplify the inherent complexity by taking the
first step in what will likely become a new rabbitmqctl diagnostics
command. To follow-up on this idea, join rabbitmq/rabbitmq-cli#292.

Things that we might want to discuss:

* is 30s a reasonable amount of time to wait after starting a RabbitMQ node before checking its health?
* should we account for nodes that can take a long time to boot?
* should we retry longer before considering the node unhealthy?
* is 3s a reasonable command timeout?
* should we improve the active listeners check?
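To reason about the timing questions above (30s start period, 3s timeout, retries), here is a minimal Python model of how Docker's `HEALTHCHECK` options `--start-period`, `--timeout` and `--retries` interact, based on their documented semantics: failures during the start period are not counted unless a probe has already passed, and a probe that runs longer than the timeout counts as a failure. This is a sketch, not Docker's implementation; `Probe` and `health_status` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Probe:
    at: float        # seconds since the container started
    duration: float  # how long the check command ran, in seconds
    exit_ok: bool    # True if the command exited with status 0


def health_status(probes: Iterable[Probe], start_period: float = 30.0,
                  timeout: float = 3.0, retries: int = 3) -> str:
    """Approximate Docker's HEALTHCHECK state machine for a sequence of probes."""
    status = "starting"
    failures = 0      # consecutive counted failures
    started = False   # becomes True once any probe passes
    for p in probes:
        # A probe that exceeds the timeout fails even if it would have exited 0.
        passed = p.exit_ok and p.duration <= timeout
        if passed:
            status, failures, started = "healthy", 0, True
            continue
        if p.at < start_period and not started:
            # Failures inside the start period do not count toward retries,
            # unless a probe has already passed.
            continue
        failures += 1
        if failures >= retries:
            status = "unhealthy"
    return status
```

Under this model, a node that never passes a probe gets roughly `start_period + retries × interval` before it is reported unhealthy, so both the start period and the retry count bear on the "long boot" question.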

Supersedes #174
Related to docker-library/docs#1395

I've based this on PR #297, since the current Debian image fails to `docker build` due to OpenSSL failures, as captured by CI.

This is a first commit; the feature as a whole is WIP. The next steps are captured in a TODO at the tail of the Dockerfile. I am sharing this early so that we can discuss the direction it is going in. Various decisions made in the Dockerfile have been captured inline in comments, which should make for a good PR discussion.

The primary goal is to upgrade Erlang/OTP to the latest stable version, which is v21.2.2 at the time of this commit. RabbitMQ v3.7.x will stop supporting Erlang v20.x in September 2019 (~8 months from now). RabbitMQ v3.8.x will only support Erlang v21.x. The RabbitMQ Erlang/OTP release support policy was announced on the rabbitmq-users mailing list in October 2018: https://groups.google.com/forum/#!msg/rabbitmq-users/G4UJ9zbIYHs/tyt_kDoFBgAJ

The secondary goal is to only ship the required artefacts in the final image. For example, all Erlang/OTP applications & features which are not required by RabbitMQ are disabled. I suspect that the final Erlang/OTP release can be shrunk further (it currently stands at 130MB), but this is a minor concern right now.

Related to the secondary goal, we enable certain features in Erlang/OTP which are useful when debugging:

* extra microstate accounting is not known to negatively affect performance; this feature is exposed via the `rabbitmq-diagnostics runtime_thread_stats` command added in v3.7.10
* lock counting is only enabled if the Erlang VM is started in a specific mode; this feature doesn't impact the default beam.smp runtime

The final goal is to be explicit about the OpenSSL version that Erlang/OTP uses. Using a shared OpenSSL might be convenient, but it has the following drawbacks:

* depending on the base image for OpenSSL updates
* not knowing which OpenSSL version we compile against
* not knowing how OpenSSL was configured
* not being able to change OpenSSL configuration

Compiling OpenSSL adds an extra concern and definitely complicates this Dockerfile, but one possible mitigation would be to automate version bumps when a new OpenSSL version gets published. I am also expecting that images will be automatically built & published from this Dockerfile. Since OpenSSL is compiled with all defaults, I do not expect things to stop working and become a maintenance overhead: we are not using any advanced compilation flags.

I am including the full docker build log; I always find the information captured in build logs to be helpful.

Resources which I found helpful while putting this Dockerfile together:

* https://github.com/rabbitmq/rabbitmq-server-boshrelease/blob/816bb377a59975c461e1af72367f187edc39ad3d/packages/erlang-21.1/packaging
* https://github.com/erlang/docker-erlang-otp/blob/e2e804aeeb6e6bc5fd49f66481be1dff829428f5/21/Dockerfile
* https://github.com/erlang/docker-erlang-example#2-build-stage-1-create-a-minimal-docker-image
* https://bugs.erlang.org/browse/ERL-823
* http://erlang.org/pipermail/erlang-questions/2019-January/097012.html
* https://github.com/lrascao/erlang-ec2-build
* https://github.com/kerl/kerl/blob/master/kerl
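The commit message suggests automating OpenSSL version bumps as a mitigation. As a minimal sketch of the selection step only, assuming the list of release tarball names has already been fetched from wherever OpenSSL releases are published (fetching is out of scope), the hypothetical `latest_openssl` helper below picks the newest final release within one series such as 1.1.1, where patch releases carry letter suffixes (1.1.1, 1.1.1a, 1.1.1b).

```python
import re
from typing import Iterable, Optional

# Matches final releases such as "openssl-1.1.1a.tar.gz"; pre-releases such as
# "openssl-1.1.1-pre9.tar.gz" deliberately do not match.
TARBALL = re.compile(r"openssl-(\d+)\.(\d+)\.(\d+)([a-z]*)\.tar\.gz")


def latest_openssl(tarballs: Iterable[str], series: str) -> Optional[str]:
    """Return the newest release version in `series` (e.g. "1.1.1"), or None."""
    best = None  # letter suffix of the newest release seen so far
    for name in tarballs:
        m = TARBALL.fullmatch(name)
        if not m:
            continue
        major, minor, patch, letters = m.groups()
        if f"{major}.{minor}.{patch}" != series:
            continue
        # Letter suffixes compare correctly as plain strings:
        # "" < "a" < "b" < ... < "z" < "za".
        if best is None or letters > best:
            best = letters
    return None if best is None else f"{series}{best}"
```

Restricting the search to a single series keeps an automated bump from silently jumping to a new major/minor line, which would be a deliberate change rather than a routine update.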

@tianon I've left a few questions for you in the Dockerfile as TODOs. A few highlights:

* I've captured the way I build this image locally in the build script
* ha.pool.sks-keyservers.net is not as stable as pgpkeys.eu; there are many unstable PGP keyservers: https://sks-keyservers.net/status/
* GitHub SSL was failing in wget when downloading gosu; curl is more reliable
* docker-entrypoint.sh fails if rabbitmq-plugins is not invoked with the -q flag, a fix needed since 3.7.10: rabbitmq/rabbitmq-server-boshrelease@2da9884#commitcomment-31470432
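Given that many public PGP keyservers are unreliable, one common mitigation is to try several in order and stop at the first success. A minimal Python sketch, assuming `fetch` is a callable you supply (for example, one that wraps a `gpg --keyserver <server> --recv-keys <key>` call and raises `OSError` on network failure); `fetch_with_fallback` is a hypothetical name and the server list is up to the caller.

```python
from typing import Callable, Iterable, List, Tuple


def fetch_with_fallback(servers: Iterable[str],
                        fetch: Callable[[str], bytes]) -> Tuple[str, bytes]:
    """Try each keyserver in order; return (server, key) from the first success."""
    errors: List[str] = []
    for server in servers:
        try:
            return server, fetch(server)
        except OSError as exc:  # refused connections, timeouts, DNS failures
            errors.append(f"{server}: {exc}")
    # Every server failed: report all errors together so the build log shows why.
    raise RuntimeError("all keyservers failed: " + "; ".join(errors))
```

Ordering the list with the more stable server first keeps the common case fast, while the fallback only costs time when that server is down.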

gerhard · 2019-01-14T18:30:06Z

People who will be interested in this: @michaelklishin

People who might be interested in this: @lukebakken @acogoluegnes @MarcialRosales @mkuratczyk

yosifkit · 2019-01-15T00:18:26Z

Since there isn't a "one-size-fits-all" check, I'd still rather stick to documenting possible options instead of forcing one on everybody.

See also https://github.com/docker-library/faq#healthcheck.

gerhard · 2019-01-15T09:30:59Z

While some users might want a more comprehensive check, a coarse default which will be accurate in all except edge cases is better than no healthcheck.

Having read the Healthcheck FAQ, I understand the reasoning and will play within the constraints of the ecosystem.

@yosifkit is it worth contributing the rationale from this PR into https://github.com/docker-library/docs/tree/master/rabbitmq ?

michaelklishin · 2019-01-18T23:51:09Z

FWIW, RabbitMQ CLI tools are being extended so that a number of health checks become one-liners or a combination of a few one-liners. Docker and Kubernetes users would then be able to pick the "stage" they want and easily use it as their healthcheck/liveness probe.

michaelklishin · 2019-01-24T12:04:15Z

All but one of the new health check commands (see rabbitmq-diagnostics --help) will be available as of RabbitMQ 3.7.11.
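To illustrate the "pick a stage" idea, here is a minimal Python sketch that builds a shell command running every check up to a chosen stage, so the result could be used as a `HEALTHCHECK CMD` or a Kubernetes exec probe. The grouping into stages is a hypothetical example, and the command names (`ping`, `check_running`, `check_local_alarms`, `check_port_connectivity`, `check_virtual_hosts`) should be confirmed against `rabbitmq-diagnostics --help` for the RabbitMQ version in the image, since not all of them may be available in a given release. Requires Python 3.8+ for `shlex.join`.

```python
import shlex
from typing import List

# Hypothetical grouping of rabbitmq-diagnostics checks into increasingly strict
# stages; each later stage also runs every check from the earlier ones.
STAGES: List[List[str]] = [
    ["ping"],
    ["check_running", "check_local_alarms"],
    ["check_port_connectivity"],
    ["check_virtual_hosts"],
]


def healthcheck_command(stage: int) -> str:
    """Build a shell command running every check up to `stage` (1-based)."""
    if not 1 <= stage <= len(STAGES):
        raise ValueError(f"stage must be between 1 and {len(STAGES)}")
    checks = [check for group in STAGES[:stage] for check in group]
    # -q suppresses informational output; && stops at the first failing check.
    return " && ".join(
        shlex.join(["rabbitmq-diagnostics", "-q", check]) for check in checks
    )
```

Chaining with `&&` means the probe's exit status is non-zero as soon as any check fails, which is all Docker and Kubernetes look at.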

michaelklishin · 2019-02-05T15:59:03Z

Worth mentioning here: Kubernetes seems to be moving past the One True Health Check™ idea and towards a list of both generic and system-specific checks.
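In the spirit of a list of generic and system-specific checks rather than One True Health Check™, here is a minimal Python sketch that runs named checks and reports each result separately. `tcp_listening` is a generic check (can a TCP connection be opened within a timeout?) that could, for instance, be pointed at the AMQP listener on its default port 5672; both function names are hypothetical, and system-specific checks would be added as further entries.

```python
import socket
from typing import Callable, Dict


def tcp_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """Generic check: can a TCP connection to host:port be opened within timeout?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_checks(checks: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run every named check and report each result individually."""
    results: Dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:  # a check that crashes is reported as failed
            results[name] = False
    return results
```

Usage might look like `run_checks({"amqp_listener": lambda: tcp_listening("localhost", 5672)})`; reporting per-check results lets each deployment decide which failures should mark the node unhealthy.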

gerhard added 8 commits January 9, 2019 13:56

Keep the Docker community as maintainer for now

d83147b

Thanks @michaelklishin!

Start a new stage and only copy the build artefacts required by RabbitMQ

230e301

Multi-stage builds are a great feature: https://docs.docker.com/develop/develop-images/multistage-build/

Install RabbitMQ, slim down Erlang/OTP & OpenSSL artefacts

6c450a2

Capture RabbitMQ minor version in the build script

ea55853

Keep VOLUME & EXPOSE statements together

c13487f

From my perspective, they are both outside-of-this-container concerns

michaelklishin mentioned this pull request Jan 14, 2019

Add a healthcheck script #174

Closed

gerhard closed this Jan 15, 2019

gerhard mentioned this pull request Jan 15, 2019

Improve RabbitMQ HEALTHCHECK docker-library/healthcheck#17

Merged

gerhard deleted the healthcheck branch February 4, 2019 11:11
