Updates main to the revert of release 2024-05-09 #2431

Merged
merged 10 commits into from
Jun 6, 2024
43 changes: 43 additions & 0 deletions alert-policies/network-flow-devices/Flow Destinations Baseline.yml
@@ -0,0 +1,43 @@
# Name of the alert
name: Flow Destinations Baseline

# Description and details
description: |+
  This alert is triggered when the unique count of 'Destination:Port' endpoints for a Flow Device deviates by more than 2 standard deviations above or below its baseline for at least 5 minutes.
  This measures the total number of destinations for your traffic and can serve as a companion metric to throughput signals from your applications.

# Type of alert
type: BASELINE

# NRQL query
nrql:
  # Baseline alerts can use an optional FACET
  query: "FROM KFlow SELECT uniqueCount(dst_addr, l4_dst_port) FACET entity.name, entity.guid"

# Direction in which baseline is set (Default: LOWER_ONLY)
baselineDirection: UPPER_AND_LOWER

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 2
    # Time in seconds; 120 - 3600, must be a multiple of 60 for Baseline conditions
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Loss of Signal Settings
expiration:
  # Close open violations if signal is lost (Default: false)
  closeViolationsOnExpiration: true
  # Open "Loss of Signal" violation if signal is lost (Default: false)
  openViolationOnExpiration: true
  # Time in seconds; Max value: 172800 (48hrs), null if closeViolationsOnExpiration and openViolationOnExpiration are both 'false'
  expirationDuration: 86400

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
43 changes: 43 additions & 0 deletions alert-policies/network-flow-devices/Flow Sources Baseline.yml
@@ -0,0 +1,43 @@
# Name of the alert
name: Flow Sources Baseline

# Description and details
description: |+
  This alert is triggered when the unique count of 'Source:Port' endpoints for a Flow Device deviates by more than 2 standard deviations above or below its baseline for at least 5 minutes.
  This measures the total number of sources for your traffic and can serve as a companion metric to throughput signals from your applications.

# Type of alert
type: BASELINE

# NRQL query
nrql:
  # Baseline alerts can use an optional FACET
  query: "FROM KFlow SELECT uniqueCount(src_addr, l4_src_port) FACET entity.name, entity.guid"

# Direction in which baseline is set (Default: LOWER_ONLY)
baselineDirection: UPPER_AND_LOWER

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 2
    # Time in seconds; 120 - 3600, must be a multiple of 60 for Baseline conditions
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Loss of Signal Settings
expiration:
  # Close open violations if signal is lost (Default: false)
  closeViolationsOnExpiration: true
  # Open "Loss of Signal" violation if signal is lost (Default: false)
  openViolationOnExpiration: true
  # Time in seconds; Max value: 172800 (48hrs), null if closeViolationsOnExpiration and openViolationOnExpiration are both 'false'
  expirationDuration: 86400

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
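
Review note: the two baseline conditions above share the same terms (`threshold: 2`, `thresholdDuration: 300`, `thresholdOccurrences: ALL`, `baselineDirection: UPPER_AND_LOWER`). The Python sketch below is one rough way to read those settings. It is not New Relic's baseline algorithm, which these files do not describe. The plain mean and standard-deviation band, the one-data-point-per-minute window, and the function names `out_of_band` and `violates_baseline` are all assumptions made for illustration.

```python
from statistics import mean, stdev


def out_of_band(value, mu, sigma, threshold, direction):
    """True if value lies outside mu +/- threshold * sigma for the given direction."""
    above = value > mu + threshold * sigma
    below = value < mu - threshold * sigma
    if direction == "UPPER_ONLY":
        return above
    if direction == "LOWER_ONLY":
        return below
    return above or below  # UPPER_AND_LOWER


def violates_baseline(history, window, threshold=2.0, direction="UPPER_AND_LOWER"):
    """Assumed reading of thresholdOccurrences: ALL.

    history: earlier uniqueCount values used as a stand-in baseline (assumption).
    window:  the values observed during thresholdDuration, e.g. 5 points for
             300 seconds if one point arrives per minute (assumption).
    """
    mu = mean(history)
    sigma = stdev(history)  # needs at least two history points
    # ALL means every point in the window must be out of band.
    return bool(window) and all(
        out_of_band(v, mu, sigma, threshold, direction) for v in window
    )
```

Under this reading, a single in-band data point inside the 5-minute window prevents a violation, and a sustained drop in unique endpoints triggers just as a sustained spike does.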
2 changes: 1 addition & 1 deletion alert-policies/prometheus-agent/DupMetrics.yml
@@ -4,7 +4,7 @@ description: |+
  This alert is triggered if two or more jobs are scraping the same instance in the same cluster.
type: STATIC
nrql:
-  query: "FROM Metric select uniqueCount(job) WHERE metricName LIKE 'prometheus%' facet instance, cluster_name"
+  query: "FROM Metric select uniqueCount(job) facet instance, cluster_name"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE
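
Review note: the hunk above shows the DupMetrics query with and without the `WHERE metricName LIKE 'prometheus%'` filter. The Python sketch below mimics what `uniqueCount(job) ... facet instance, cluster_name` computes over a handful of samples, to show how the filter changes which facets get flagged. The sample records, the `duplicate_scrapers` helper, and the `min_jobs=2` cutoff (taken from the description's "two or more jobs", since the condition's `terms` are not shown in this hunk) are hypothetical.

```python
from collections import defaultdict


def duplicate_scrapers(samples, metric_prefix=None, min_jobs=2):
    """Group samples by (instance, cluster_name) and count distinct jobs.

    metric_prefix emulates `WHERE metricName LIKE '<prefix>%'`; None means no filter.
    Returns the facets whose distinct job count is at least min_jobs.
    """
    jobs_by_facet = defaultdict(set)
    for s in samples:
        if metric_prefix is not None and not s["metricName"].startswith(metric_prefix):
            continue  # dropped by the emulated WHERE clause
        jobs_by_facet[(s["instance"], s["cluster_name"])].add(s["job"])
    return {
        facet: sorted(jobs)
        for facet, jobs in jobs_by_facet.items()
        if len(jobs) >= min_jobs
    }


# Hypothetical samples: two jobs scrape 10.0.0.1:9090, but only one of them
# reports a metric whose name starts with "prometheus".
samples = [
    {"metricName": "prometheus_target_interval", "job": "a",
     "instance": "10.0.0.1:9090", "cluster_name": "prod"},
    {"metricName": "http_requests_total", "job": "b",
     "instance": "10.0.0.1:9090", "cluster_name": "prod"},
    {"metricName": "http_requests_total", "job": "a",
     "instance": "10.0.0.2:9090", "cluster_name": "prod"},
]
```

With these samples, the unfiltered form flags `10.0.0.1:9090` in `prod`, while the filtered form sees only job `a` there and flags nothing.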