schedulerlatency: Increase the go scheduler latency metric time coverage #158474

iskettaneh · 2025-11-28T18:33:27Z

Before this commit, the Go scheduler latency metric was published once every 10 seconds, but it was based on only 2.5 seconds' worth of data. That left a 75% blind spot in the metric. This matters most for short-lived overload, which could fall entirely inside the blind spot and go undetected.

This commit keeps the existing interval at which we measure scheduler latency (100ms), and adds each of these 100ms measurements to a histogram that gets published (and cleared) every 10s.
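The accumulate-then-publish idea described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `accumulator` type, `newAccumulator`, `add`, and `coverage` are hypothetical names, each sample is assumed to arrive as a slice of per-bucket counts with the same length as the accumulator, and the sketch counts samples instead of using a real 10s timer.

```go
package main

import "fmt"

// accumulator is a hypothetical stand-in for the metric's histogram state.
// It is not the type used in this PR.
type accumulator struct {
	buckets      []uint64 // per-bucket counts merged since the last publish
	samples      int      // number of samples merged since the last publish
	publishEvery int      // samples per publish, e.g. 10s / 100ms = 100
}

func newAccumulator(numBuckets, publishEvery int) *accumulator {
	return &accumulator{
		buckets:      make([]uint64, numBuckets),
		publishEvery: publishEvery,
	}
}

// add merges one sample's per-bucket counts into the running histogram.
// Assumption: len(sample) == len(a.buckets).
// Once publishEvery samples have been merged, it returns a copy of the
// accumulated histogram and true, then clears the state for the next window.
func (a *accumulator) add(sample []uint64) ([]uint64, bool) {
	for i, c := range sample {
		a.buckets[i] += c
	}
	a.samples++
	if a.samples < a.publishEvery {
		return nil, false
	}
	out := make([]uint64, len(a.buckets))
	copy(out, a.buckets)
	for i := range a.buckets {
		a.buckets[i] = 0
	}
	a.samples = 0
	return out, true
}

// coverage returns the fraction of a publish period that the data covers.
func coverage(observedSeconds, periodSeconds float64) float64 {
	return observedSeconds / periodSeconds
}

func main() {
	acc := newAccumulator(3, 100) // 100 samples of 100ms each = one 10s window
	for i := 0; i < 100; i++ {
		sample := []uint64{5, 0, 0} // mostly low-latency events
		if i == 42 {
			sample = []uint64{1, 2, 7} // one short burst of high latency
		}
		if snap, ok := acc.add(sample); ok {
			fmt.Println("published:", snap)
		}
	}
	fmt.Printf("old coverage: %.0f%%\n", 100*coverage(2.5, 10))
	fmt.Printf("new coverage: %.0f%%\n", 100*coverage(10, 10))
}
```

In this sketch, the single burst at sample 42 still shows up in the published histogram because every 100ms sample in the window is merged before publishing. The `coverage` helper just restates the arithmetic from the description: 2.5s of data per 10s period is 25% coverage, hence the 75% blind spot.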

The figure below compares the old (Before) and new (After) metric on 2 clusters while running the following command: `while true; do timeout 3.5 roachprod run $CLUSTER:4 -- './cockroach workload run kv --concurrency=256 --read-percent=95 --duration=120m {pgurl:1}'; sleep 57.5; done`

In the Before figure, many of these spikes are missed, while they are visible in the new metric.

Release note: None

Fixes: #158475


blathers-crl · 2025-11-28T18:33:33Z

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

cockroach-teamcity · 2025-11-28T18:33:39Z

This change is

iskettaneh requested a review from sumeerbhola November 28, 2025 18:33

iskettaneh marked this pull request as ready for review November 28, 2025 19:32

iskettaneh requested a review from a team as a code owner November 28, 2025 19:32
