Skip to content

Conversation

@Abhinav1299
Copy link
Contributor

@Abhinav1299 Abhinav1299 commented Nov 26, 2025

This PR replaces the binary essential boolean field on metrics with a
three-level MetricVisibility enum of INTERNAL, SUPPORT, or ESSENTIAL,
providing better granularity for metric categorization and enabling more
targeted metric exports. This change addresses the need to differentiate
between metrics used for internal debugging, support troubleshooting,
and essential customer monitoring.

Previously, CockroachDB metrics used a binary essential boolean field
to categorize metrics as either user-facing or not. This was insufficient
for several use cases to exclude internal debugging metrics while still
including support-relevant ones in our toolings like tsdump.

For the impact, All metrics previously marked as Essential: true are
now Visibility: metric.Metadata_ESSENTIAL with identical semantics and
no functional changes to existing essential metrics. Our downstream
tooling of tsdump will collect all these essential and support metrics
for its non-verbose run. Significantly reducing the size of overall tsdump.

Part of: CRDB-57261
Epic: CRDB-55082
Release note: None

@Abhinav1299 Abhinav1299 requested review from a team as code owners November 26, 2025 12:21
@Abhinav1299 Abhinav1299 requested review from aa-joshi, arjunmahishi and dhartunian and removed request for a team November 26, 2025 12:21
@blathers-crl
Copy link

blathers-crl bot commented Nov 26, 2025

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@Abhinav1299 Abhinav1299 requested review from asg0451, kev-cao and yuzefovich and removed request for a team November 26, 2025 12:22
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich removed their request for review November 26, 2025 17:54
@tbg tbg removed the request for review from a team November 27, 2025 15:44
Copy link
Contributor

@arjunmahishi arjunmahishi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aa-joshi, @asg0451, @dhartunian, and @kev-cao)

@blathers-crl
Copy link

blathers-crl bot commented Nov 28, 2025

Metrics change detected

This PR adds or updates one or more CRDB metrics. If you want these metrics to be exported by CRDB Cloud clusters to Internal CRL Datadog and/or included in the customer metric export integration (Essential metrics for standard deployment, and Essential metrics for advanced deployment), refer to this Installation and Usage guide of a CLI tool that syncs the metric mappings in managed-service. Run this CLI tool after your CRDB PR is merged.

  • The CLI opens a PR in managed-service with the required config changes.
  • Please track that PR and ensure it merges so your metrics become available to CRDB Cloud clusters.

Note: Your metric will appear in Internal CRL Datadog only after the managed-service PR merges and the new OTel configuration rolls out to at least one cluster running a CRDB build that includes this metric.

Docs: cockroach-metric-sync

Questions: reach out to @obs-india-prs

Copy link
Contributor

@aa-joshi aa-joshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aa-joshi reviewed 25 of 30 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asg0451, @dhartunian, and @kev-cao)


-- commits line 4 at r1:
Lets add details about why are we introducing the field and impact of it on existing metrics which are tagged under essential.


pkg/util/metric/metric.proto line 101 at r1 (raw file):

  // dashboards) and helps categorize metrics by their relevance to different users.
  enum MetricVisibility {
    // INTERNAL is used for metrics that are only useful to CockroachDB developers

nit: We can omit Cockroachdb developers here.


pkg/util/metric/metric.proto line 107 at r1 (raw file):

    INTERNAL = 0;

    // SUPPORT is used for metrics needed to help Support Engineers diagnose user issues.

nit: We can omit Support Engineers.
"SUPPORT is used for metrics needed to diagnose production issues."


pkg/util/metric/metric.proto line 113 at r1 (raw file):

    // ESSENTIAL is used for metrics that are both critical for end users and core
    // to observability.

ESSENTIAL is used for metrics which are critical to track the health of the cluster. When metrics is marked ESSENTIAL:

  • metric is included in tsdump
  • metrics in included in documentations, dashboards and monitoring pipelines.

We can update the SUPPORT in similar format.


pkg/util/metric/metric.proto line 120 at r1 (raw file):

    // metrics" that can be found at
    // https://github.com/cockroachlabs/cockroachdb-runbook-template/blob/main/monitoring-alerts/monitoring-dashboard-custom.md
    ESSENTIAL = 2;

Lets make sure that we are not regressing anything due to this change.

Copy link
Contributor Author

@Abhinav1299 Abhinav1299 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aa-joshi, @asg0451, @dhartunian, and @kev-cao)


-- commits line 4 at r1:

Previously, aa-joshi (Akshay Joshi) wrote…

Lets add details about why are we introducing the field and impact of it on existing metrics which are tagged under essential.

Updated


pkg/util/metric/metric.proto line 101 at r1 (raw file):

Previously, aa-joshi (Akshay Joshi) wrote…

nit: We can omit Cockroachdb developers here.

Updated


pkg/util/metric/metric.proto line 107 at r1 (raw file):

Previously, aa-joshi (Akshay Joshi) wrote…

nit: We can omit Support Engineers.
"SUPPORT is used for metrics needed to diagnose production issues."

Updated


pkg/util/metric/metric.proto line 113 at r1 (raw file):

Previously, aa-joshi (Akshay Joshi) wrote…

ESSENTIAL is used for metrics which are critical to track the health of the cluster. When metrics is marked ESSENTIAL:

  • metric is included in tsdump
  • metrics in included in documentations, dashboards and monitoring pipelines.

We can update the SUPPORT in similar format.

Updated


pkg/util/metric/metric.proto line 120 at r1 (raw file):

Previously, aa-joshi (Akshay Joshi) wrote…

Lets make sure that we are not regressing anything due to this change.

Yup, I've removed all the dependency of Essential and replaced it with the updated Visibility label.
Moreover, once this is merged. I'll let the docs team know about this change so that they can work on the documentation accordingly.

Copy link
Contributor

@aa-joshi aa-joshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aa-joshi reviewed 2 of 30 files at r1, 10 of 12 files at r2, 1 of 1 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asg0451, @dhartunian, and @kev-cao)


-- commits line 18 at r3:
Lets remove numbers as it is subject to change in future.


pkg/util/metric/metric.proto line 101 at r3 (raw file):

  // dashboards) and helps categorize metrics by their relevance to different users.
  enum MetricVisibility {
    // INTERNAL is used for metrics which are only useful for internal debugging purposes.

Will this be a default value if not passed? How do we treat the metrics if visibility is not passed?


pkg/kv/kvserver/metrics.go line 881 at r3 (raw file):

	)
	// Update L0-level-size metric visibility to SUPPORT
	_ = func() struct{} {

Let's extend the storageLevelMetricMetadata to handle visibility instead of handling in custom way.

This PR replaces the binary essential boolean field on metrics with a
three-level MetricVisibility enum, providing better granularity for metric
categorization and enabling more targeted metric exports. This change
addresses the need to differentiate between metrics used for internal
debugging, support troubleshooting, and essential customer monitoring.

Previously, CockroachDB metrics used a binary essential boolean field
to categorize metrics as either user-facing or not. This was insufficient
for several use cases to exclude internal debugging metrics while still
including support-relevant ones in our toolings like tsdump.

For the impact, All metrics previously marked as `Essential: true` are
now `Visibility: metric.Metadata_ESSENTIAL` with identical semantics and
no functional changes to existing essential metrics. Our downstream
tooling of tsdump will collect all these essential and support metrics
for its non-verbose run. Significantly reducing the size of overall tsdump.

Part of: CRDB-57261
Epic: CRDB-55082
Release note: None
Copy link
Contributor Author

@Abhinav1299 Abhinav1299 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aa-joshi, @asg0451, @dhartunian, and @kev-cao)


-- commits line 18 at r3:

Previously, aa-joshi (Akshay Joshi) wrote…

Lets remove numbers as it is subject to change in future.

Done


pkg/kv/kvserver/metrics.go line 881 at r3 (raw file):

Previously, aa-joshi (Akshay Joshi) wrote…

Let's extend the storageLevelMetricMetadata to handle visibility instead of handling in custom way.

Done

Copy link
Contributor Author

@Abhinav1299 Abhinav1299 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aa-joshi, @asg0451, @dhartunian, and @kev-cao)


pkg/util/metric/metric.proto line 101 at r3 (raw file):

Previously, aa-joshi (Akshay Joshi) wrote…

Will this be a default value if not passed? How do we treat the metrics if visibility is not passed?

Yes, The default will be INTERNAL.
When we create a struct without initializing a field, it gets the zero value for that type. For enums (which are int32 types), the zero value is 0. Since Metadata_INTERNAL = 0, any metric created without setting Visibility will have INTERNAL visibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants