Add healthcheck #300
This is a first commit; the feature as a whole is WIP. The next steps are captured in a TODO at the tail of the Dockerfile. I am sharing this early so that we can discuss the direction that this is going in. Various decisions made in the Dockerfile have been captured inline in comments, which should make for a good PR discussion.

The primary goal is to upgrade the Erlang/OTP version to the latest stable release, which is v21.2.2 at the time of this commit. RabbitMQ v3.7.x will stop supporting Erlang v20.x in September 2019 (~8 months from now). RabbitMQ v3.8.x will only support Erlang v21.x. The RabbitMQ Erlang/OTP release support policy was announced on the rabbitmq-users mailing list in October 2018: https://groups.google.com/forum/#!msg/rabbitmq-users/G4UJ9zbIYHs/tyt_kDoFBgAJ

The secondary goal is to ship only the required artefacts in the final image. For example, all Erlang/OTP applications & features which are not required by RabbitMQ are disabled. I suspect that the final Erlang/OTP release can be shrunk further (it currently stands at 130MB), but this is a minor concern right now. Related to the secondary goal, we enable certain features in Erlang/OTP which are useful when debugging:

* extra microstate accounting is not known to negatively affect performance; this feature is exposed via the `rabbitmq-diagnostics runtime_thread_stats` command added in v3.7.10
* lock counting is only enabled if the Erlang VM is started in a specific mode; this feature doesn't impact the default beam.smp runtime

The final goal is to be explicit about the OpenSSL version that Erlang/OTP uses. Using a shared OpenSSL might be convenient, but it has the following drawbacks:

* depending on the base image for OpenSSL updates
* not knowing which OpenSSL version we compile against
* not knowing how OpenSSL was configured
* not being able to change the OpenSSL configuration

Compiling OpenSSL adds an extra concern and definitely complicates this Dockerfile, but one possible mitigation would be to automate version bumps when a new OpenSSL version gets published. I am also expecting that images will be automatically built & published from this Dockerfile. Since OpenSSL is compiled with all defaults (we are not using any advanced compilation flags), I do not expect it to break things or become a maintenance overhead.

I am including the full docker build log; I always find the information captured in build logs to be helpful.

Resources which I found helpful while putting this Dockerfile together:

* https://github.com/rabbitmq/rabbitmq-server-boshrelease/blob/816bb377a59975c461e1af72367f187edc39ad3d/packages/erlang-21.1/packaging
* https://github.com/erlang/docker-erlang-otp/blob/e2e804aeeb6e6bc5fd49f66481be1dff829428f5/21/Dockerfile
* https://github.com/erlang/docker-erlang-example#2-build-stage-1-create-a-minimal-docker-image
* https://bugs.erlang.org/browse/ERL-823
* http://erlang.org/pipermail/erlang-questions/2019-January/097012.html
* https://github.com/lrascao/erlang-ec2-build
* https://github.com/kerl/kerl/blob/master/kerl
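To make the idea of automating OpenSSL version bumps a bit more concrete, here is a minimal Python sketch of the one non-obvious step: picking the newest release from a list of tarball names. It assumes release tarballs are named `openssl-X.Y.Z[letters].tar.gz` and that letter suffixes sort as "", "a", ..., "z", "za", and so on. The helper names (`version_key`, `latest_release`) and the sample listing are hypothetical; fetching the listing and verifying signatures are left out.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumed naming scheme for release tarballs, e.g. "openssl-1.1.1a.tar.gz".
# Pre-releases ("-pre9") and checksum files do not match and are ignored.
_RELEASE_RE = re.compile(r"^openssl-(\d+)\.(\d+)\.(\d+)([a-z]*)\.tar\.gz$")


def version_key(name: str) -> Tuple[int, int, int, int, str]:
    """Sort key for an OpenSSL release tarball name."""
    match = _RELEASE_RE.match(name)
    if match is None:
        raise ValueError(f"not an OpenSSL release tarball: {name!r}")
    major, minor, patch, letters = match.groups()
    # Letter suffixes grow as "", "a", ..., "z", "za", "zb", ..., so compare
    # by suffix length first and then alphabetically.
    return (int(major), int(minor), int(patch), len(letters), letters)


def latest_release(names: Iterable[str], series: Optional[str] = None) -> str:
    """Return the newest release tarball, optionally limited to a series like "1.1.1"."""
    candidates = [n for n in names if _RELEASE_RE.match(n)]
    if series is not None:
        wanted = tuple(int(part) for part in series.split("."))
        candidates = [n for n in candidates if version_key(n)[:3] == wanted]
    if not candidates:
        raise ValueError("no matching OpenSSL release found")
    return max(candidates, key=version_key)


# Hypothetical directory listing, for illustration only.
listing = [
    "openssl-1.0.2q.tar.gz",
    "openssl-1.1.0j.tar.gz",
    "openssl-1.1.1-pre9.tar.gz",
    "openssl-1.1.1.tar.gz",
    "openssl-1.1.1a.tar.gz",
    "openssl-1.1.1a.tar.gz.sha256",
]
print(latest_release(listing))           # openssl-1.1.1a.tar.gz
print(latest_release(listing, "1.1.0"))  # openssl-1.1.0j.tar.gz
```

A bump job could compare the result against the version pinned in the Dockerfile and open a PR when they differ; restricting to a `series` keeps such a job from jumping across OpenSSL release lines unattended.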
Multi-stage builds are a great feature: https://docs.docker.com/develop/develop-images/multistage-build/
@tianon I've left a few questions for you in the Dockerfile as TODOs. A few highlights:

* I've captured the way I build this image locally in the build script
* ha.pool.sks-keyservers.net is not as stable as pgpkeys.eu; there are many unstable PGP keyservers: https://sks-keyservers.net/status/
* GitHub SSL was failing in wget when grabbing gosu; curl is more reliable
* docker-entrypoint.sh fails if rabbitmq-plugins is not invoked with the -q flag, a fix since 3.7.10: rabbitmq/rabbitmq-server-boshrelease@2da9884#commitcomment-31470432
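Since individual keyservers are flaky, one option is to try several in turn and use the first one that answers. Below is a minimal Python sketch, assuming `gpg` is on the PATH. The keyserver order, the 60s per-attempt timeout and the `recv_key_with_fallback` helper are illustrative assumptions, not what the Dockerfile does; in the Dockerfile the same idea would be a shell loop around `gpg --recv-keys`.

```python
import subprocess
from typing import Callable, Sequence

# Keyservers mentioned in this thread; trying pgpkeys.eu first is an assumption.
KEYSERVERS = ("pgpkeys.eu", "ha.pool.sks-keyservers.net")


def gpg_recv_key(keyserver: str, key_id: str) -> bool:
    """Import key_id from one keyserver with gpg; return True on success."""
    try:
        result = subprocess.run(
            ["gpg", "--batch", "--keyserver", keyserver, "--recv-keys", key_id],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=60,  # hypothetical per-attempt limit
        )
    except (OSError, subprocess.TimeoutExpired):
        # gpg missing or keyserver hanging: treat it as a failed attempt.
        return False
    return result.returncode == 0


def recv_key_with_fallback(
    key_id: str,
    keyservers: Sequence[str] = KEYSERVERS,
    fetch: Callable[[str, str], bool] = gpg_recv_key,
) -> str:
    """Try each keyserver in order and return the first one that worked."""
    for keyserver in keyservers:
        if fetch(keyserver, key_id):
            return keyserver
    raise RuntimeError(f"could not fetch key {key_id} from any of {list(keyservers)}")
```

The `fetch` parameter exists only so the fallback order can be exercised without network access; real callers would pass just the key fingerprint.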
RabbitMQ healthchecks are hard, as captured by @michaelklishin in #174 (comment) and followed up in rabbitmq/rabbitmq-cli#292. This is an attempt to simplify the inherent complexity by taking the first step in what will likely become a new rabbitmqctl diagnostics command. To follow up on this idea, join rabbitmq/rabbitmq-cli#292.

Things that we might want to discuss:

* is 30s a reasonable amount of time to wait before checking a newly started RabbitMQ node for healthiness?
* should we account for nodes that can take a long time to boot?
* should we retry longer before considering the node unhealthy?
* is 3s a reasonable command timeout?
* should we improve the active listeners check?

Supersedes #174
Related to docker-library/docs#1395
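To reason about the 30s and 3s questions, it helps to model how Docker turns individual probe results into a container health status. The Python sketch below is a simplified model, not Docker's code, of the documented `HEALTHCHECK` semantics: the first probe runs one `--interval` after the container starts; a probe fails if it exits non-zero or runs longer than `--timeout`; failures during `--start-period` are not counted unless a probe has already succeeded; and `--retries` consecutive counted failures mark the container unhealthy. The probe timings in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HealthcheckConfig:
    interval: float = 30.0      # --interval (Docker default: 30s)
    timeout: float = 30.0       # --timeout (Docker default: 30s)
    start_period: float = 0.0   # --start-period (Docker default: 0s)
    retries: int = 3            # --retries (Docker default: 3)


def simulate(config: HealthcheckConfig, probes: List[Tuple[float, bool]]) -> List[str]:
    """Return the health status after each probe.

    Each probe is (duration_in_seconds, exited_zero). Simplified model: the
    first probe starts one interval after the container starts, and each
    following probe starts one interval after the previous one finished.
    """
    status = "starting"
    failing_streak = 0
    started_at = config.interval
    history = []
    for duration, exited_zero in probes:
        passed = exited_zero and duration <= config.timeout
        if passed:
            failing_streak = 0
            status = "healthy"
        elif not (status == "starting" and started_at < config.start_period):
            # Failures count once the start period is over, or as soon as
            # the container has passed at least one probe.
            failing_streak += 1
            if failing_streak >= config.retries:
                status = "unhealthy"
        history.append(status)
        # A probe that overruns its timeout is killed at the timeout.
        started_at += min(duration, config.timeout) + config.interval
    return history


cfg = HealthcheckConfig(interval=10, timeout=3, start_period=30, retries=3)
probes = [(1, False), (5, True), (1, True), (1, False), (1, False), (1, False)]
print(simulate(cfg, probes))
# ['starting', 'starting', 'healthy', 'healthy', 'healthy', 'unhealthy']
```

Note the second probe: the command exits 0 but takes 5s, so the 3s timeout turns it into a failure (ignored here only because it falls inside the start period). Under this model, with the default 30s interval the first probe starts no earlier than 30s after the container does, so a 30s start period would not shield any probe; it only helps slow-booting nodes if it is longer than the interval.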
From my perspective, they are both outside-of-this-container concerns
People who will be interested in this: @michaelklishin
People who might be interested in this: @lukebakken @acogoluegnes @MarcialRosales @mkuratczyk
Since there isn't a "one-size-fits-all" check, I'd still rather stick to documenting possible options instead of forcing one on everybody.
While some users might want a more comprehensive check, a coarse default which will be accurate in all except edge cases is better than no healthcheck. Having read the Healthcheck FAQ, I understand the reasoning and will play within the constraints of the ecosystem. @yosifkit is it worth contributing the rationale from this PR to https://github.com/docker-library/docs/tree/master/rabbitmq ?
FWIW, RabbitMQ CLI tools are being extended so that a number of health checks become one-liners or a combination of a few one-liners. Docker and Kubernetes users would then be able to pick the "stage" they want and easily use it as their healthcheck/liveness probe.
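The "pick the stage you want" idea could look roughly like this: an ordered list of checks, where a higher stage runs more of them and stops at the first failure. This is a hypothetical Python sketch, not the RabbitMQ CLI: the stage numbering, `run_stage` and the single TCP listener check are illustrative assumptions. The listener check only confirms that something accepts TCP connections on a port (5672 is the default AMQP port) within a 3s timeout, matching the command timeout discussed earlier in the PR.

```python
import socket
from typing import Callable, List, Tuple

Check = Tuple[str, Callable[[], bool]]


def port_accepts_connections(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port is accepted within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def run_stage(checks: List[Check], stage: int) -> Tuple[bool, List[str]]:
    """Run the first `stage` checks in order and stop at the first failure.

    Returns (healthy, names of the checks that were run).
    """
    ran: List[str] = []
    for name, check in checks[:stage]:
        ran.append(name)
        if not check():
            return False, ran
    return True, ran


# Hypothetical check list: stage 1 only confirms that something accepts
# connections on the default AMQP port.
CHECKS: List[Check] = [
    ("amqp_listener", lambda: port_accepts_connections("127.0.0.1", 5672)),
]
```

Used as a Docker `HEALTHCHECK`, a wrapper script would exit 0 when `run_stage` reports healthy and 1 otherwise, since Docker treats exit code 0 as healthy and 1 as unhealthy. Stopping at the first failure keeps a failing probe cheap, which matters when every probe has to finish within the timeout.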
All but one of the new health check commands (see
Worth mentioning here: Kubernetes seems to be moving past the One True Health Check™ idea and towards a list of both generic and system-specific checks.
I've based this on PR #297 since the current Debian image won't `docker build` due to OpenSSL failures, as captured by the CI.