Commit

Add alert ClusterCrossplaneResourcesNotReady for Crossplane resources that are critical for clusters (#1482)

Co-authored-by: Jose Armesto <[email protected]>
AndiDog and fiunchinho authored Feb 10, 2025
1 parent fdf2914 commit bc15e71
Showing 3 changed files with 62 additions and 0 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added

- Add alert `ClusterCrossplaneResourcesNotReady` for Crossplane resources that are critical for clusters

### Fixed

- fix capi-kubeadmconfig rule for hybrid providers
@@ -0,0 +1,30 @@
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    {{- include "labels.common" . | nindent 4 }}
  name: cluster-crossplane.rules
  namespace: {{ .Values.namespace }}
spec:
  groups:
  - name: cluster-crossplane
    rules:
    - alert: ClusterCrossplaneResourcesNotReady
      annotations:
        # Crossplane doesn't offer object names and the objects are stored on the MC, so right
        # now (2025-01), we can't make this alert WC-specific.
        description: '{{`Not all managed Crossplane resources of type "{{ $labels.gvk }}" on {{ $labels.cluster_id }} are ready. This could affect creation or health of workload clusters.`}}'
        opsrecipe: cluster-crossplane-resources
      # Match critical resources deployed by cluster-aws via aws-nth-crossplane-resources,
      # cilium-crossplane-resources, ...
      expr: crossplane_managed_resource_exists{gvk=~".*Kind=(Queue|QueuePolicy|Role|Rule|SecurityGroup|SecurityGroupEgressRule|SecurityGroupIngressRule|Target)"} != crossplane_managed_resource_ready{gvk=~".*Kind=(Queue|QueuePolicy|Role|Rule|SecurityGroup|SecurityGroupEgressRule|SecurityGroupIngressRule|Target)"}
      for: 15m
      labels:
        area: kaas
        cancel_if_cluster_status_creating: "true"
        cancel_if_cluster_status_deleting: "true"
        cancel_if_cluster_status_updating: "true"
        cancel_if_outside_working_hours: {{ include "workingHoursOnly" . }}
        severity: page
        team: phoenix
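
The `expr` above combines two PromQL behaviours that are easy to misread: regex matchers are fully anchored, and `!=` without `bool` acts as a filter over series whose label sets match one-to-one. The following is a minimal Python sketch of that logic, not Prometheus code. The `Queue` and `RolePolicyAttachment` series and all sample values are hypothetical and only illustrate the matching.

```python
import re

# Kinds copied from the gvk matcher in the rule's expr.
CRITICAL_KINDS = (
    "Queue|QueuePolicy|Role|Rule|SecurityGroup|"
    "SecurityGroupEgressRule|SecurityGroupIngressRule|Target"
)
GVK_PATTERN = re.compile(f".*Kind=({CRITICAL_KINDS})")


def labelset(**labels):
    """Hashable label set, standing in for a series identity without the metric name."""
    return frozenset(labels.items())


def select(samples):
    """Keep series whose gvk label matches the whole pattern (PromQL anchors regex matchers)."""
    return {
        labels: value
        for labels, value in samples.items()
        if GVK_PATTERN.fullmatch(dict(labels).get("gvk", ""))
    }


def not_equal(lhs, rhs):
    """Filter form of `lhs != rhs`: keep left-hand samples that have a partner with a different value."""
    return {
        labels: value
        for labels, value in lhs.items()
        if labels in rhs and value != rhs[labels]
    }


# Hypothetical sample data: one series from the unit test, two invented ones.
rule = labelset(gvk="cloudwatchevents.aws.upbound.io/v1beta1, Kind=Rule", cluster_id="mymc")
queue = labelset(gvk="sqs.aws.upbound.io/v1beta1, Kind=Queue", cluster_id="mymc")
attachment = labelset(gvk="iam.aws.upbound.io/v1beta1, Kind=RolePolicyAttachment", cluster_id="mymc")

exists = {rule: 6, queue: 2, attachment: 3}
ready = {rule: 5, queue: 2, attachment: 1}

# Only the Rule series remains: Queue is fully ready, and RolePolicyAttachment
# is not selected because "Role" must match up to the end of the label value.
active = not_equal(select(exists), select(ready))
```

Because the result keeps the left-hand labels, `gvk` and `cluster_id` are available to the `description` template, which is why the alert can name the resource type but not an individual object.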
@@ -0,0 +1,28 @@
rule_files:
  - cluster-crossplane.rules.yml

tests:
  - interval: 1m
    input_series:
      - series: 'crossplane_managed_resource_exists{gvk="cloudwatchevents.aws.upbound.io/v1beta1, Kind=Rule", cluster_id="mymc"}'
        values: "6x20"
      - series: 'crossplane_managed_resource_ready{gvk="cloudwatchevents.aws.upbound.io/v1beta1, Kind=Rule", cluster_id="mymc"}'
        values: "5x20"

    alert_rule_test:
      - alertname: ClusterCrossplaneResourcesNotReady
        eval_time: 20m
        exp_alerts:
          - exp_labels:
              area: kaas
              cancel_if_cluster_status_creating: "true"
              cancel_if_cluster_status_deleting: "true"
              cancel_if_cluster_status_updating: "true"
              cancel_if_outside_working_hours: "false"
              cluster_id: "mymc"
              gvk: "cloudwatchevents.aws.upbound.io/v1beta1, Kind=Rule"
              severity: page
              team: phoenix
            exp_annotations:
              description: 'Not all managed Crossplane resources of type "cloudwatchevents.aws.upbound.io/v1beta1, Kind=Rule" on mymc are ready. This could affect creation or health of workload clusters.'
              opsrecipe: cluster-crossplane-resources
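
The test relies on two details: promtool's `axn` shorthand, which repeats the value `a` for `n + 1` samples, and the rule's `for: 15m`, which keeps the alert pending until the condition has held for 15 minutes. The Python sketch below walks through why the alert is expected at `eval_time: 20m`. It is a simplified model and assumes one sample and one rule evaluation per minute, with no staleness handling. The helper names `expand` and `alert_state` are hypothetical.

```python
def expand(notation):
    """Expand promtool's 'axn' shorthand into n + 1 samples of value a."""
    value, repeats = notation.split("x")
    return [float(value)] * (int(repeats) + 1)


def alert_state(exists, ready, eval_minute, for_minutes=15):
    """Return 'inactive', 'pending' or 'firing' at eval_minute (simplified model)."""
    active_since = None
    for minute in range(eval_minute + 1):
        if exists[minute] != ready[minute]:
            # Condition holds: remember when it first became true.
            if active_since is None:
                active_since = minute
        else:
            # Condition cleared: the pending timer resets.
            active_since = None
    if active_since is None:
        return "inactive"
    return "firing" if eval_minute - active_since >= for_minutes else "pending"


exists = expand("6x20")  # six resources exist for the whole test
ready = expand("5x20")   # only five of them are ready

state_at_10m = alert_state(exists, ready, 10)
state_at_20m = alert_state(exists, ready, 20)
```

In this model the alert is still pending at 10 minutes and firing at 20 minutes, which matches the single entry under `exp_alerts`.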
