@iskettaneh
Contributor

@iskettaneh iskettaneh commented Nov 28, 2025

Before this commit, the Go scheduler latency metric was published once every 10 seconds, but it was based on only 2.5 seconds' worth of data. That left a 75% blind spot in the metric. The gap matters most for short-lived overload: a brief spike could fall entirely outside the sampled window and go undetected.

This commit builds on the existing 100ms interval at which we measure scheduler latency, and adds each 100ms measurement to a histogram that is published (and cleared) every 10s.
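To make the coverage difference concrete, here is a minimal, deterministic sketch of the idea rather than the code in this PR. Everything in it is hypothetical: `windowAccumulator`, `simulateWindow`, the latency values, and the assumption that the old metric saw the last 2.5s of each 10s period. It also keeps raw samples instead of real histogram buckets, purely for brevity.

```go
package main

import (
	"fmt"
	"time"
)

const (
	sampleInterval  = 100 * time.Millisecond  // per-measurement interval from the PR description
	publishInterval = 10 * time.Second        // publish (and clear) interval from the PR description
	oldCoverage     = 2500 * time.Millisecond // data the old metric was based on

	// Made-up latency values, for illustration only.
	baseline = 50 * time.Microsecond
	spike    = 5 * time.Millisecond
)

// windowAccumulator is a hypothetical stand-in for the histogram: it keeps
// every 100ms measurement until the next publish.
type windowAccumulator struct {
	samples []time.Duration
}

// Record adds one 100ms measurement to the current window.
func (w *windowAccumulator) Record(d time.Duration) {
	w.samples = append(w.samples, d)
}

// PublishAndClear reports the sample count and peak latency for the window,
// then resets the accumulator so the next window starts empty.
func (w *windowAccumulator) PublishAndClear() (count int, peak time.Duration) {
	count, peak = len(w.samples), maxOf(w.samples)
	w.samples = w.samples[:0]
	return count, peak
}

// maxOf returns the largest sample, or 0 for an empty slice.
func maxOf(samples []time.Duration) time.Duration {
	var peak time.Duration
	for _, d := range samples {
		if d > peak {
			peak = d
		}
	}
	return peak
}

// simulateWindow builds one 10s window of 100ms measurements with a 3.5s
// overload spike placed (arbitrarily) at ticks 30 through 64.
func simulateWindow() []time.Duration {
	out := make([]time.Duration, int(publishInterval/sampleInterval))
	for i := range out {
		out[i] = baseline
		if i >= 30 && i < 65 {
			out[i] = spike
		}
	}
	return out
}

func main() {
	window := simulateWindow()

	// Old behavior (assumed): only the last 2.5s of the window is visible.
	oldTicks := int(oldCoverage / sampleInterval)
	oldPeak := maxOf(window[len(window)-oldTicks:])

	// New behavior: every 100ms measurement lands in the published window.
	acc := &windowAccumulator{}
	for _, d := range window {
		acc.Record(d)
	}
	count, newPeak := acc.PublishAndClear()

	fmt.Printf("old peak: %v (from %d samples)\n", oldPeak, oldTicks)
	fmt.Printf("new peak: %v (from %d samples)\n", newPeak, count)
}
```

In this simulation the spike ends before the assumed 2.5s window begins, so the old view reports only the baseline while the accumulated window reports the spike.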

The figure below compares the old (Before) and new (After) metric on 2 clusters while running the following command: while true; do timeout 3.5 roachprod run $CLUSTER:4 -- './cockroach workload run kv --concurrency=256 --read-percent=95 --duration=120m {pgurl:1}'; sleep 57.5; done

[Figure: schedLatencyBefAft]

In the Before figure, many of these spikes are missed, while they are visible in the new metric.

Release note: None

Fixes: #158475

@blathers-crl

blathers-crl bot commented Nov 28, 2025

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable

@iskettaneh iskettaneh marked this pull request as ready for review November 28, 2025 19:32
@iskettaneh iskettaneh requested a review from a team as a code owner November 28, 2025 19:32
Development

Successfully merging this pull request may close these issues.

schedulerlatency: Increase the go scheduler latency metric time coverage

2 participants