fix: use rabbitmq length for RabbitMQNodeDown #1579

Draft
wants to merge 2 commits into
base: stackhpc/2024.1
from

Conversation

jackhodgkiss
Contributor

The `RabbitMQNodeDown` alert assumed that every deployment has three controllers. That is not always the case: we also support deployments with a single controller or with more than three.

With the hard-coded threshold, a single-controller deployment would raise false alerts, while a deployment with more than three controllers would conceal node failures until fewer than three RabbitMQ nodes remained.
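
To make the two failure modes concrete, here is a minimal sketch of the comparison the rule performs. The function name `node_down_fires` is hypothetical and only mirrors the `<` comparison in the expression; it assumes at least one node is still reporting `rabbitmq_build_info` and does not model the rule's `for:` duration.

```python
def node_down_fires(nodes_up: int, threshold: int) -> bool:
    """Mirror `sum(rabbitmq_build_info{instance!=""}) < threshold`.

    Assumes nodes_up >= 1; the case where no node reports at all
    is not modelled here.
    """
    return nodes_up < threshold


if __name__ == "__main__":
    # (cluster size, nodes currently up)
    scenarios = [(1, 1), (3, 2), (5, 4)]
    for cluster_size, nodes_up in scenarios:
        hard_coded = node_down_fires(nodes_up, 3)
        group_length = node_down_fires(nodes_up, cluster_size)
        print(
            f"size={cluster_size} up={nodes_up} "
            f"hard_coded={hard_coded} group_length={group_length}"
        )
```

A healthy single node trips the hard-coded threshold, and one failed node out of five does not, whereas a threshold equal to the group length behaves as intended in both cases.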

@jackhodgkiss jackhodgkiss self-assigned this Mar 17, 2025
@jackhodgkiss jackhodgkiss requested a review from a team as a code owner March 17, 2025 13:21
@product-auto-label product-auto-label bot added size: xs monitoring All things related to observability & telemetry labels Mar 17, 2025
@@ -6,7 +6,7 @@ groups:
 - name: rabbitmq.rules
   rules:
   - alert: RabbitMQNodeDown
-    expr: sum(rabbitmq_build_info{instance!=""}) < 3
+    expr: sum(rabbitmq_build_info{instance!=""}) < {% endraw %}{{ groups['controllers'] | length }}{% raw %}
Contributor

Is it possible to use the rabbitmq group from here instead? https://github.com/openstack/kayobe/blob/master/ansible/roles/kolla-ansible/templates/overcloud-components.j2#L62

Just thinking the controllers group wouldn't work if anyone has moved RabbitMQ to a different group.

@@ -6,7 +6,7 @@ groups:
 - name: rabbitmq.rules
   rules:
   - alert: RabbitMQNodeDown
-    expr: sum(rabbitmq_build_info{instance!=""}) < 3
+    expr: sum(rabbitmq_build_info{instance!=""}) < {% endraw %}{{ groups['controllers'] | length }}{% raw %}
Contributor

I think you could potentially use the rabbitmq group here, since the templating is within a raw tag and is thus templated by kolla-ansible.

The `RabbitMQNodeDown` alert assumed that every deployment has exactly
three RabbitMQ nodes. That is not always the case: we also support
deployments with a single node or with more than three.

Previously this caused false alerts in deployments with a single
RabbitMQ node and concealed alerts in deployments with more than three
nodes.
@jackhodgkiss jackhodgkiss force-pushed the fix-rabbitmq-node-down-rule branch from 61b564c to e183052 Compare March 23, 2025 12:39
@jackhodgkiss jackhodgkiss requested review from jovial and MoteHue March 23, 2025 12:40
@jackhodgkiss jackhodgkiss changed the title fix: use controller length for RabbitMQNodeDown fix: use rabbitmq length for RabbitMQNodeDown Mar 24, 2025
Contributor

@MoteHue MoteHue left a comment

LGTM now, thanks!

@jackhodgkiss jackhodgkiss marked this pull request as draft March 24, 2025 22:54
@jackhodgkiss
Contributor Author

This fails to template correctly:

  - alert: RabbitMQNodeDown
    expr: sum(rabbitmq_build_info{instance!=""}) < {{ groups['rabbitmq'] | length }}
    for: 30m
    labels:
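
A leftover Jinja2 expression like the one above could be caught before deployment with a small check. The following is only a sketch under stated assumptions: the helper `unrendered_exprs` is hypothetical and not part of Kayobe or kolla-ansible, and it only inspects single-line `expr:` fields. Those are the fields where `{{ ... }}` should never survive, because Prometheus templating applies to labels and annotations rather than to the expression itself.

```python
import re

# Matches an unrendered Jinja2 expression such as {{ groups['rabbitmq'] | length }}.
JINJA_EXPR = re.compile(r"\{\{.*?\}\}")


def unrendered_exprs(rules_text: str) -> list[str]:
    """Return the single-line `expr:` entries that still contain `{{ ... }}`."""
    offenders = []
    for line in rules_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("expr:") and JINJA_EXPR.search(stripped):
            offenders.append(stripped)
    return offenders


# Sample input copied from the output quoted above.
broken = """\
  - alert: RabbitMQNodeDown
    expr: sum(rabbitmq_build_info{instance!=""}) < {{ groups['rabbitmq'] | length }}
    for: 30m
"""

# What a correctly rendered rule might look like for a three-node group.
rendered = broken.replace("{{ groups['rabbitmq'] | length }}", "3")

if __name__ == "__main__":
    print(unrendered_exprs(broken))    # one offending line
    print(unrendered_exprs(rendered))  # empty list
```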

Labels
monitoring All things related to observability & telemetry size: xs
3 participants