-
Notifications
You must be signed in to change notification settings - Fork 4k
pkg/*: Refactor metric visibility handling across the codebase #158376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
369ecb0 to
236deb6
Compare
arjunmahishi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @aa-joshi, @asg0451, @dhartunian, and @kev-cao)
Metrics change detectedThis PR adds or updates one or more CRDB metrics. If you want these metrics to be exported by CRDB Cloud clusters to Internal CRL Datadog and/or included in the customer metric export integration (Essential metrics for standard deployment, and Essential metrics for advanced deployment), refer to this Installation and Usage guide of a CLI tool that syncs the metric mappings in managed-service. Run this CLI tool after your CRDB PR is merged.
Note: Your metric will appear in Internal CRL Datadog only after the managed-service PR merges and the new OTel configuration rolls out to at least one cluster running a CRDB build that includes this metric. Docs: cockroach-metric-sync Questions: reach out to @obs-india-prs |
aa-joshi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aa-joshi reviewed 25 of 30 files at r1.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @asg0451, @dhartunian, and @kev-cao)
-- commits line 4 at r1:
Lets add details about why are we introducing the field and impact of it on existing metrics which are tagged under essential.
pkg/util/metric/metric.proto line 101 at r1 (raw file):
// dashboards) and helps categorize metrics by their relevance to different users. enum MetricVisibility { // INTERNAL is used for metrics that are only useful to CockroachDB developers
nit: We can omit Cockroachdb developers here.
pkg/util/metric/metric.proto line 107 at r1 (raw file):
INTERNAL = 0; // SUPPORT is used for metrics needed to help Support Engineers diagnose user issues.
nit: We can omit Support Engineers.
"SUPPORT is used for metrics needed to diagnose production issues."
pkg/util/metric/metric.proto line 113 at r1 (raw file):
// ESSENTIAL is used for metrics that are both critical for end users and core // to observability.
ESSENTIAL is used for metrics which are critical to track the health of the cluster. When metrics is marked ESSENTIAL:
- metric is included in tsdump
- metrics in included in documentations, dashboards and monitoring pipelines.
We can update the SUPPORT in similar format.
pkg/util/metric/metric.proto line 120 at r1 (raw file):
// metrics" that can be found at // https://github.com/cockroachlabs/cockroachdb-runbook-template/blob/main/monitoring-alerts/monitoring-dashboard-custom.md ESSENTIAL = 2;
Lets make sure that we are not regressing anything due to this change.
ff2b56d to
9b7250d
Compare
Abhinav1299
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @aa-joshi, @asg0451, @dhartunian, and @kev-cao)
Previously, aa-joshi (Akshay Joshi) wrote…
Lets add details about why are we introducing the field and impact of it on existing metrics which are tagged under essential.
Updated
pkg/util/metric/metric.proto line 101 at r1 (raw file):
Previously, aa-joshi (Akshay Joshi) wrote…
nit: We can omit Cockroachdb developers here.
Updated
pkg/util/metric/metric.proto line 107 at r1 (raw file):
Previously, aa-joshi (Akshay Joshi) wrote…
nit: We can omit Support Engineers.
"SUPPORT is used for metrics needed to diagnose production issues."
Updated
pkg/util/metric/metric.proto line 113 at r1 (raw file):
Previously, aa-joshi (Akshay Joshi) wrote…
ESSENTIAL is used for metrics which are critical to track the health of the cluster. When metrics is marked ESSENTIAL:
- metric is included in tsdump
- metrics in included in documentations, dashboards and monitoring pipelines.
We can update the SUPPORT in similar format.
Updated
pkg/util/metric/metric.proto line 120 at r1 (raw file):
Previously, aa-joshi (Akshay Joshi) wrote…
Lets make sure that we are not regressing anything due to this change.
Yup, I've removed all the dependency of Essential and replaced it with the updated Visibility label.
Moreover, once this is merged. I'll let the docs team know about this change so that they can work on the documentation accordingly.
aa-joshi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aa-joshi reviewed 2 of 30 files at r1, 10 of 12 files at r2, 1 of 1 files at r3.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asg0451, @dhartunian, and @kev-cao)
-- commits line 18 at r3:
Lets remove numbers as it is subject to change in future.
pkg/util/metric/metric.proto line 101 at r3 (raw file):
// dashboards) and helps categorize metrics by their relevance to different users. enum MetricVisibility { // INTERNAL is used for metrics which are only useful for internal debugging purposes.
Will this be a default value if not passed? How do we treat the metrics if visibility is not passed?
pkg/kv/kvserver/metrics.go line 881 at r3 (raw file):
) // Update L0-level-size metric visibility to SUPPORT _ = func() struct{} {
Let's extend the storageLevelMetricMetadata to handle visibility instead of handling in custom way.
This PR replaces the binary essential boolean field on metrics with a three-level MetricVisibility enum, providing better granularity for metric categorization and enabling more targeted metric exports. This change addresses the need to differentiate between metrics used for internal debugging, support troubleshooting, and essential customer monitoring. Previously, CockroachDB metrics used a binary essential boolean field to categorize metrics as either user-facing or not. This was insufficient for several use cases to exclude internal debugging metrics while still including support-relevant ones in our toolings like tsdump. For the impact, All metrics previously marked as `Essential: true` are now `Visibility: metric.Metadata_ESSENTIAL` with identical semantics and no functional changes to existing essential metrics. Our downstream tooling of tsdump will collect all these essential and support metrics for its non-verbose run. Significantly reducing the size of overall tsdump. Part of: CRDB-57261 Epic: CRDB-55082 Release note: None
9b7250d to
5705b37
Compare
Abhinav1299
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aa-joshi, @asg0451, @dhartunian, and @kev-cao)
Previously, aa-joshi (Akshay Joshi) wrote…
Lets remove numbers as it is subject to change in future.
Done
pkg/kv/kvserver/metrics.go line 881 at r3 (raw file):
Previously, aa-joshi (Akshay Joshi) wrote…
Let's extend the storageLevelMetricMetadata to handle visibility instead of handling in custom way.
Done
Abhinav1299
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aa-joshi, @asg0451, @dhartunian, and @kev-cao)
pkg/util/metric/metric.proto line 101 at r3 (raw file):
Previously, aa-joshi (Akshay Joshi) wrote…
Will this be a default value if not passed? How do we treat the metrics if visibility is not passed?
Yes, The default will be INTERNAL.
When we create a struct without initializing a field, it gets the zero value for that type. For enums (which are int32 types), the zero value is 0. Since Metadata_INTERNAL = 0, any metric created without setting Visibility will have INTERNAL visibility.
This PR replaces the binary essential boolean field on metrics with a
three-level MetricVisibility enum of INTERNAL, SUPPORT, or ESSENTIAL,
providing better granularity for metric categorization and enabling more
targeted metric exports. This change addresses the need to differentiate
between metrics used for internal debugging, support troubleshooting,
and essential customer monitoring.
Previously, CockroachDB metrics used a binary essential boolean field
to categorize metrics as either user-facing or not. This was insufficient
for several use cases to exclude internal debugging metrics while still
including support-relevant ones in our toolings like tsdump.
For the impact, All metrics previously marked as
Essential: truearenow
Visibility: metric.Metadata_ESSENTIALwith identical semantics andno functional changes to existing essential metrics. Our downstream
tooling of tsdump will collect all these essential and support metrics
for its non-verbose run. Significantly reducing the size of overall tsdump.
Part of: CRDB-57261
Epic: CRDB-55082
Release note: None