Adding MariaDB dashboard and alert conditions #2139

Merged
merged 24 commits into from
Jan 10, 2024
Commits
9053091
Adding MariaDB dashboard and alert conditions
harrykimpel Nov 3, 2023
274539e
Merge branch 'newrelic:main' into main
harrykimpel Nov 3, 2023
f382b6b
Adding MariaDB dashboard and alert conditions
harrykimpel Nov 6, 2023
d9800bb
Merge branch 'newrelic:main' into main
harrykimpel Nov 6, 2023
31cbbcd
Changing filenames to lowercase with dashes
harrykimpel Nov 10, 2023
8260bf1
Changing filenames to lowercase with dashes
harrykimpel Nov 10, 2023
a45ea62
Adjusted MariaDB logo url
harrykimpel Nov 10, 2023
f07d8a5
Adding k6 Prometheus remote write
harrykimpel Nov 10, 2023
34ff2b2
Adding k6 Prometheus remote write
harrykimpel Nov 10, 2023
e4236ca
Updating k6 Prometheus dashboard image and data source
harrykimpel Nov 11, 2023
456dd5e
Merge branch 'main' into main
sarahkitten Nov 14, 2023
bfb21fd
Removing the id from the new k6-prometheus quickstart
harrykimpel Nov 16, 2023
111d243
Merge branch 'main' of https://github.com/harrykimpel/newrelic-quicks…
harrykimpel Nov 16, 2023
606805e
Updating the k6 prometheus
harrykimpel Nov 16, 2023
80d32e2
Merge branch 'main' into main
sarahkitten Nov 16, 2023
3e4173d
Updating the k6 prometheus
harrykimpel Nov 16, 2023
7272510
Updating the k6 prometheus
harrykimpel Nov 16, 2023
501fdfd
Updating MariaDB dashboard images and k6 prometheus config
harrykimpel Nov 23, 2023
d0f329a
Updating MariaDB alert condition description
harrykimpel Nov 23, 2023
11b017a
Minor changes to descriptions
harrykimpel Dec 12, 2023
e174873
Changed k6 logo from png to svg
harrykimpel Dec 12, 2023
64c2f2c
Merge branch 'newrelic:main' into main
harrykimpel Dec 12, 2023
7c48450
Removing permissions field
harrykimpel Dec 18, 2023
f12ba9d
Merge branch 'main' into main
aswanson-nr Jan 10, 2024
4 changes: 3 additions & 1 deletion .gitignore
@@ -15,4 +15,6 @@ snapshots/

# yarn
yarn.lock
-.yarn-integrity
+.yarn-integrity
+yarn-error.log
+utils/yarn-error.log
27 changes: 27 additions & 0 deletions alert-policies/mariadb/innodb-pending-reads-and-writes.yml
@@ -0,0 +1,27 @@
name: InnoDB Pending Reads and Writes

description: |+
  This alert is triggered when the aggregate number of pending reads and writes in the MySQL buffer pool is greater than 2 for 5 minutes, which indicates the database engine is backlogged and waiting on resources.

type: STATIC
nrql:
  query: "FROM MysqlSample SELECT max(db.innodb.dataPendingReads) + max(db.innodb.dataPendingWrites) FACET displayName"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 2
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
29 changes: 29 additions & 0 deletions alert-policies/mariadb/max-connection-errors-per-second.yml
@@ -0,0 +1,29 @@
name: Max Connection Errors per Second

description: |+
  This alert is triggered when the rate of errors against the max_connections limit exceeds 1 per second at least once in a 5 minute window, which indicates you have requests to your MariaDB instance that are failing to connect.
  The max_connections default is 151, but it can vary based on the underlying resources available to your instance. You can review your current max_connections limit with this query:
  SHOW VARIABLES LIKE 'max_connections';

type: STATIC
nrql:
  query: "FROM MysqlSample SELECT max(net.connectionErrorsMaxConnectionsPerSecond) FACET displayName"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 1
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: AT_LEAST_ONCE

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
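
Reviewer note: the two static conditions above differ mainly in `thresholdOccurrences`. The sketch below shows one way to read `ALL` versus `AT_LEAST_ONCE`, assuming one aggregated data point per minute, so that `thresholdDuration: 300` covers five points. The function name `is_violating` and the sample values are hypothetical, and this is not New Relic's evaluation code.

```python
from typing import Sequence


def is_violating(points: Sequence[float], threshold: float, occurrences: str) -> bool:
    """Return True if a window of points breaches an ABOVE threshold.

    ALL requires every point in the window to exceed the threshold;
    AT_LEAST_ONCE requires only one point to exceed it.
    """
    if not points:
        return False
    breaches = [p > threshold for p in points]  # ABOVE is treated as a strict comparison
    if occurrences == "ALL":
        return all(breaches)
    if occurrences == "AT_LEAST_ONCE":
        return any(breaches)
    raise ValueError(f"unsupported thresholdOccurrences: {occurrences}")


# Five one-minute points, matching thresholdDuration: 300 (assumed 60 s windows).
pending_io = [3, 4, 3, 2, 5]              # one point only equals the threshold of 2
conn_errors = [0.0, 0.0, 1.5, 0.0, 0.0]   # a single spike above 1 error per second

print(is_violating(pending_io, 2, "ALL"))             # not every point is above 2
print(is_violating(conn_errors, 1, "AT_LEAST_ONCE"))  # one spike is enough
```

Under this reading, the pending reads/writes condition only fires on a sustained backlog, while the connection-error condition fires on a single bad minute.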
59 changes: 59 additions & 0 deletions alert-policies/mariadb/questions-per-second.yml
@@ -0,0 +1,59 @@
name: Questions per Second

description: |+
  This alert is triggered when the current rate of Questions is greater than 2 standard deviations above the baseline for 120s, which could be an early indicator of a saturation problem for your instance.
  It is important to note that this alert is disabled by default and you need to edit the configuration in New Relic One to add a targeted MySQL instance:
  "WHERE displayName = 'MySql Instance Name'"
  This allows the baseline to be calculated against a single instance instead of all running MySQL instances being monitored.

type: BASELINE
nrql:
  # Cannot use FACET in Baseline alerts
  query: "FROM MysqlSample SELECT average(query.questionsPerSecond)"

# Direction in which baseline is set (Default: LOWER_ONLY)
baselineDirection: UPPER_ONLY

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 2
    # Time in seconds; 120 - 3600, must be a multiple of 60 for Baseline conditions
    thresholdDuration: 120
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

  # Adding a Warning threshold is optional
  - priority: WARNING
    operator: ABOVE
    threshold: 1
    thresholdDuration: 300
    thresholdOccurrences: ALL

# Loss of Signal Settings
expiration:
  # Close open violations if signal is lost (Default: false)
  closeViolationsOnExpiration: false
  # Open "Loss of Signal" violation if signal is lost (Default: false)
  openViolationOnExpiration: false
  # Time in seconds; Max value: 172800 (48hrs), null if closeViolationsOnExpiration and openViolationOnExpiration are both 'false'
  expirationDuration:

# Advanced Signal Settings
signal:
  # Max Value for Baseline conditions = 20
  evaluationOffset: 3
  # Type of value that should be used to fill gaps
  fillOption: NONE
  # Integer; Used in conjunction with STATIC fillOption, otherwise null
  fillValue:

# OPTIONAL: URL of runbook to be sent with notification
runbookUrl:

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
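
Reviewer note: for a BASELINE condition the `threshold` is a number of standard deviations rather than a raw metric value. The sketch below illustrates that idea only. It assumes the predicted baseline and the standard deviation are already available as inputs; how New Relic actually computes them is not described in this PR. The function name `breaches_baseline` and all sample numbers are hypothetical.

```python
from typing import Sequence


def breaches_baseline(
    actual: Sequence[float],
    predicted: Sequence[float],
    std_dev: float,
    threshold: float,
    direction: str = "UPPER_ONLY",
) -> bool:
    """Return True if every point deviates from its predicted value by more
    than `threshold` standard deviations in the configured direction
    (this mirrors thresholdOccurrences: ALL)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")

    for value, expected in zip(actual, predicted):
        deviation = (value - expected) / std_dev  # signed, in standard deviations
        if direction == "UPPER_ONLY":
            hit = deviation > threshold
        elif direction == "LOWER_ONLY":
            hit = deviation < -threshold
        elif direction == "UPPER_AND_LOWER":
            hit = abs(deviation) > threshold
        else:
            raise ValueError(f"unsupported baselineDirection: {direction}")
        if not hit:
            return False
    return True


# Two one-minute points (thresholdDuration: 120), hypothetical questions-per-second values.
print(breaches_baseline([130.0, 125.0], [100.0, 100.0], std_dev=10.0, threshold=2))
print(breaches_baseline([130.0, 115.0], [100.0, 100.0], std_dev=10.0, threshold=2))
```

In the first call both points sit more than 2 standard deviations above the prediction; in the second call the later point is only 1.5 deviations above, so the window does not qualify.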
29 changes: 29 additions & 0 deletions alert-policies/mariadb/slow-queries-per-second.yml
@@ -0,0 +1,29 @@
name: Slow Queries per Second

description: |+
  This alert is triggered when the number of slow queries per second is greater than 5 for 5 minutes, which could indicate capacity issues or a query that has been changed and is experiencing performance issues.
  The Slow_queries counter increments based on your settings applied to MySQL's long_query_time parameter (default 10s), which you can review with this query:
  SHOW VARIABLES LIKE 'long_query_time';

type: STATIC
nrql:
  query: "FROM MysqlSample SELECT average(query.slowQueriesPerSecond) FACET displayName"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 5
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
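
Reviewer note: the inline comments in these files state numeric ranges (`thresholdDuration` 120 - 3600, a multiple of 60 for Baseline conditions; `violationTimeLimitSeconds` 300 - 2592000). A small check like the one below could catch out-of-range values before a condition is submitted. It is a sketch only: the function name `validate_condition` is hypothetical, the condition is assumed to be already parsed into a plain dict, and the limits are taken from the comments rather than from any validator in this repository.

```python
from typing import Any, Dict, List


def validate_condition(condition: Dict[str, Any]) -> List[str]:
    """Collect range violations using the limits quoted in the YAML comments."""
    problems: List[str] = []
    is_baseline = condition.get("type") == "BASELINE"

    for index, term in enumerate(condition.get("terms", [])):
        duration = term.get("thresholdDuration")
        if not isinstance(duration, int) or not 120 <= duration <= 3600:
            problems.append(
                f"terms[{index}].thresholdDuration must be 120 - 3600, got {duration!r}"
            )
        elif is_baseline and duration % 60 != 0:
            problems.append(
                f"terms[{index}].thresholdDuration must be a multiple of 60, got {duration}"
            )

    # 86400 is the default stated in the comments.
    limit = condition.get("violationTimeLimitSeconds", 86400)
    if not isinstance(limit, int) or not 300 <= limit <= 2592000:
        problems.append(
            f"violationTimeLimitSeconds must be 300 - 2592000, got {limit!r}"
        )
    return problems


# The slow-queries condition above, written out as the dict a YAML parser would produce.
slow_queries = {
    "type": "STATIC",
    "terms": [
        {
            "priority": "CRITICAL",
            "operator": "ABOVE",
            "threshold": 5,
            "thresholdDuration": 300,
            "thresholdOccurrences": "ALL",
        }
    ],
    "violationTimeLimitSeconds": 86400,
}

print(validate_condition(slow_queries))
```

For the slow-queries condition the returned list is empty, since both values fall inside the documented ranges.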
27 changes: 27 additions & 0 deletions alert-policies/redis/blocked-clients.yml
@@ -0,0 +1,27 @@
name: Blocked clients alert

description: |+
  This alert is triggered when at least one blocked client occurs.

type: STATIC
nrql:
  query: "SELECT sum(`net.blockedClients`) FROM RedisSample facet entityName"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 0
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
62 changes: 62 additions & 0 deletions alert-policies/redis/current-connections-anomaly.yml
@@ -0,0 +1,62 @@
name: Anomalies in current connections

# Description and details
description: |
  This alert is triggered when the number of current connections deviates from the norm either up or down.

# Type of alert: BASELINE | STATIC
type: BASELINE

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# NRQL query
nrql:
  query: "SELECT max(`net.connectedClients`) FROM RedisSample facet entityName"

# Direction in which baseline is set (Default: LOWER_ONLY)
baselineDirection: UPPER_AND_LOWER

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 30
    # Time in seconds; 120 - 3600, must be a multiple of 60 for Baseline conditions
    thresholdDuration: 3600
    # How many data points must be in violation for the duration
    thresholdOccurrences: AT_LEAST_ONCE

  # Adding a Warning threshold is optional
  - priority: WARNING
    operator: ABOVE
    threshold: 5
    thresholdDuration: 300
    thresholdOccurrences: AT_LEAST_ONCE

# Loss of Signal Settings
expiration:
  # Close open violations if signal is lost (Default: false)
  closeViolationsOnExpiration: false
  # Open "Loss of Signal" violation if signal is lost (Default: false)
  openViolationOnExpiration: false
  # Time in seconds; Max value: 172800 (48hrs), null if closeViolationsOnExpiration and openViolationOnExpiration are both 'false'
  expirationDuration:

# Advanced Signal Settings
signal:
  # Max Value for Baseline conditions = 20
  evaluationOffset: 3
  # Type of value that should be used to fill gaps
  fillOption: NONE
  # Integer; Used in conjunction with STATIC fillOption, otherwise null
  fillValue:

# OPTIONAL: URL of runbook to be sent with notification
runbookUrl:

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
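
Reviewer note: the `signal` block pairs `fillOption` with `fillValue`, and the comments say `fillValue` is only used with the STATIC option. The sketch below shows that relationship for the two options named in the file. Representing a missing data point as `None` and the function name `fill_gaps` are assumptions made for illustration; this does not reproduce how New Relic handles gaps internally.

```python
from typing import List, Optional


def fill_gaps(
    points: List[Optional[float]],
    fill_option: str,
    fill_value: Optional[float] = None,
) -> List[Optional[float]]:
    """Apply a gap-filling option to a series where None marks a missing point."""
    if fill_option == "NONE":
        # Gaps are left as they are; fill_value is ignored.
        return list(points)
    if fill_option == "STATIC":
        if fill_value is None:
            raise ValueError("fillValue is required when fillOption is STATIC")
        return [fill_value if p is None else p for p in points]
    raise ValueError(f"unsupported fillOption in this sketch: {fill_option}")


connected_clients = [12.0, None, 14.0, None]  # hypothetical series with two gaps

print(fill_gaps(connected_clients, "NONE"))
print(fill_gaps(connected_clients, "STATIC", fill_value=0))
```

With `fillOption: NONE`, as configured in both Baseline conditions in this PR, the gaps stay in the series and `fillValue` is left empty.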