Append application metrics to /metrics so that prometheus can read #1616 (#2262)
dashpole merged 6 commits into google:master
Conversation
Hi @sanek9. Thanks for your PR. I'm waiting for a google or kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test

Hello @dashpole
metrics/prometheus.go
Outdated
	}
}

if c.includedMetrics.Has(container.AppMetrics) {
can we move this up with other metrics? That way we get consistent labels with other metrics.
I think not, because those metrics are initialized when cAdvisor starts, but application metrics can change at any time.
I see. You don't need to use the containerMetrics thingy above, but if you place it after for _, cm := range c.containerMetrics {, you can simplify your function a bit.
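For reference, a minimal sketch of what that placement could look like, reusing the container's already-built labels and values (variable names follow the diff fragments quoted in this thread; the CustomMetrics field and the stats variable are my reading of the cAdvisor stats API, so treat the details as assumptions rather than the merged code):

// Sketch only: emit custom application metrics right after the regular
// containerMetrics loop, so they share the container's base labels.
for metricLabel, vals := range stats.CustomMetrics { // assumed field on the container's stats sample
	for _, metric := range vals {
		clabels := make([]string, len(labels), len(labels)+len(metric.Labels))
		cvalues := make([]string, len(values), len(values)+len(metric.Labels))
		copy(clabels, labels)
		copy(cvalues, values)
		for label, value := range metric.Labels {
			clabels = append(clabels, sanitizeLabelName(label))
			cvalues = append(cvalues, value)
		}
		desc := prometheus.NewDesc(metricLabel, "Custom application metric.", clabels, nil)
		ch <- prometheus.MustNewConstMetric(desc, prometheus.GaugeValue, float64(metric.FloatValue), cvalues...)
	}
}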
Sorry to let this sit. Feel free to bump the PR if it doesn't get action in ~1 week.
metrics/prometheus.go
Outdated
for _, container := range containers {
	cstats := container.Stats
	if len(cstats) > 0 {
		last := cstats[len(cstats)-1]
I believe you want to use the first element in the list? At least, that is what we do above...
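For illustration, a sketch of that convention (not the final PR code):

cstats := container.Stats
if len(cstats) == 0 {
	continue
}
stats := cstats[0] // use the first element, matching the collectors above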
metrics/prometheus.go
Outdated
for _, metric := range v {
	values := make([]string, 0, len(metric.Labels)+2)
	labels := make([]string, 0, len(metric.Labels)+2)
	labels = append(labels, "container_name")
Please do not add your own labels here, and re-use the logic above.
metrics/prometheus.go
Outdated
if label == "__name__" {
	continue
}
labels = append(labels, sanitizeLabelName(label))
Keep in mind that all metrics returned in a single scrape must have the same labels. See https://github.com/google/cadvisor/pull/2262/files#diff-88eab3cc8cef9ad727beea9923056cbbR1148-R1172 above. I believe this would cause the metrics endpoint to return only some metrics on each scrape.
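To illustrate the constraint: if the same metric name is emitted with different label sets within one scrape, client_golang's registry flags the family as having inconsistent label dimensions, which is roughly the failure mode described above. The metric and label names below are made up for the example:

d1 := prometheus.NewDesc("container_custom_metric", "example", []string{"id", "image"}, nil)
d2 := prometheus.NewDesc("container_custom_metric", "example", []string{"id"}, nil)
ch <- prometheus.MustNewConstMetric(d1, prometheus.GaugeValue, 1, "/a", "img")
ch <- prometheus.MustNewConstMetric(d2, prometheus.GaugeValue, 2, "/b") // inconsistent label set for the same name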
@dashpole, thanks for the good code review, what about this?
/retest
metrics/prometheus.go
Outdated
copy(clabels, labels)
copy(cvalues, values)
for label, value := range metric.Labels {
	if label == "__name__" {
why are we special-casing __name__?
Because removing it makes the collector panic. With the check commented out:

1221 for label, value := range metric.Labels {
1222     // if label == "__name__" {
1223     //     continue
1224     // }
1225     clabels = append(clabels, sanitizeLabelName(label))
1226     cvalues = append(cvalues, value)
1227 }
1228 desc := prometheus.NewDesc(metricLabel, "Custom application metric.", clabels, nil)
1229 ch <- prometheus.MustNewConstMetric(desc, prometheus.GaugeValue, float64(metric.FloatValue), cvalues...)
resources are being tracked.
panic: "__name__" is not a valid label name
goroutine 568 [running]:
github.com/google/cadvisor/vendor/github.com/prometheus/client_golang/prometheus.MustNewConstMetric(...)
/go/src/github.com/google/cadvisor/vendor/github.com/prometheus/client_golang/prometheus/value.go:102
github.com/google/cadvisor/metrics.(*PrometheusCollector).collectContainersInfo(0xc000110140, 0xc0007b6d20)
/go/src/github.com/google/cadvisor/metrics/prometheus.go:1229 +0x2044
github.com/google/cadvisor/metrics.(*PrometheusCollector).Collect(0xc000110140, 0xc0007b6d20)
/go/src/github.com/google/cadvisor/metrics/prometheus.go:1078 +0x89
github.com/google/cadvisor/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1()
/go/src/github.com/google/cadvisor/vendor/github.com/prometheus/client_golang/prometheus/registry.go:434 +0x19d
created by github.com/google/cadvisor/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather
/go/src/github.com/google/cadvisor/vendor/github.com/prometheus/client_golang/prometheus/registry.go:526 +0xe12
yeah, I would prefer doing any filtering at collection time.
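A minimal sketch of that collection-time filtering, assuming we drop every reserved (double-underscore) label key rather than special-casing only __name__:

for label, value := range metric.Labels {
	// Prometheus reserves label names starting with "__"; "__name__" is the one
	// that triggered the panic above, so skip such keys when collecting.
	if strings.HasPrefix(label, "__") {
		continue
	}
	clabels = append(clabels, sanitizeLabelName(label))
	cvalues = append(cvalues, value)
}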
dashpole left a comment:
Looks reasonable. Can you add test cases for this? You will need to modify metrics/testdata/prometheus_metrics to add the expected output for your case.
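For illustration only, an added expectation in metrics/testdata/prometheus_metrics might look roughly like the lines below; the metric name and label values here are hypothetical, not the file's actual contents:

# HELP example_app_requests Custom application metric.
# TYPE example_app_requests gauge
example_app_requests{id="testcontainer",image="test",name="testcontaineralias"} 42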
seems ok
Looks good. Since we don't have e2e tests for these metrics (just unit tests), can you add a blurb to the initial comment describing the manual tests you ran. In particular, make sure you test with additional labels, since I want to make sure this doesn't resurface #1704.
@dashpole, what do you think about this: https://github.com/sanek9/cadvisor-test ? (see log and run.sh there)

Also, I found a mistake: cAdvisor crashes when the application exports a metric with a label like "name", "image", "id", or any label returned by ContainerLabelsFunc, and I don't know what to do with it. The second problem is when the number of application metrics is greater than application_metrics_count_limit ...
We could prefix all of the labels from the application with something, e.g. "app_", to make sure there are no collisions.

We probably shouldn't worry about that; it seems to be working as intended (WAI). It is important for the administrator who runs cAdvisor to limit metric cardinality.
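A sketch of the prefixing idea, assuming the same label-copy loop as in the snippets above ("app_" is just the example prefix mentioned here):

for label, value := range metric.Labels {
	// Namespace application-provided label keys so they cannot collide with
	// cAdvisor's own labels (name, image, id, or anything from ContainerLabelsFunc).
	clabels = append(clabels, sanitizeLabelName("app_"+label))
	cvalues = append(cvalues, value)
}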
Solves #1616