Revert "Ensure there's always a . or e in floats. (#162)"
This reverts commit 67670fe.
The format change wreaks havoc with histograms if some monitored
targets expose the old float format and others the new one.
`histogram_quantile` then returns strange values, which are difficult
to diagnose.
A developer team here at SoundCloud pulled in the current version of
prometheus/common via a different indirect dependency than
prometheus/client_golang. (client_golang alone would not pull it in
yet if you are using Go modules.) Canary instances then had the newer
prometheus/common, while the normal production instances did not. The
calculated quantiles for the complete service jumped up dramatically,
so the team rolled back and started to look for a performance
regression. (Just looking at the canary alone still worked, but since
nobody suspected this kind of monitoring failure case, the
investigation went totally the wrong way.)
Given the wide distribution of prometheus/client_golang and the way Go
modules work, we will see many of those innocuous upgrades that
suddenly change the `le` values on new deployments. Since this change
is buried behind dependencies, users will run into this problem
without suspecting it, even if we announce it loudly and clearly in
prometheus/common or even prometheus/client_golang.
For now, we have to revert the change and then think about a way to
mitigate it. I'm thinking of sanitizing `le` values in Prometheus
2.x. But let's think about it without haste.
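
As a rough idea of what sanitizing `le` values could look like, here
is a sketch in Go. The function name `canonicalLe` and the choice of
the shortest representation as the canonical form are assumptions for
illustration only, not a decided design and not existing Prometheus
code. It parses the label value as a float and formats it back in one
fixed way, so "1" and "1.0" end up as the same label value.

```go
package main

import (
	"fmt"
	"strconv"
)

// canonicalLe (hypothetical helper) parses an `le` label value and
// re-formats it in a single canonical form, so that equivalent
// spellings of the same bound map to the same string.
func canonicalLe(le string) (string, error) {
	f, err := strconv.ParseFloat(le, 64)
	if err != nil {
		return "", fmt.Errorf("invalid le value %q: %w", le, err)
	}
	// Assumption: use the shortest round-tripping representation.
	return strconv.FormatFloat(f, 'g', -1, 64), nil
}

func main() {
	for _, le := range []string{"1", "1.0", "0.5", "+Inf"} {
		c, err := canonicalLe(le)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%q -> %q\n", le, c)
	}
}
```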
Signed-off-by: beorn7 <[email protected]>