Append application metrics to /metrics so that prometheus can read #1616 #2262

Merged
dashpole merged 6 commits into google:master from sanek9:master on Mar 6, 2020

Conversation

@sanek9
Contributor

@sanek9 sanek9 commented Jul 2, 2019

Solves #1616

@k8s-ci-robot
Collaborator

Hi @sanek9. Thanks for your PR.

I'm waiting for a google or kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dashpole
Collaborator

dashpole commented Jul 2, 2019

/ok-to-test

@mr-

mr- commented Dec 12, 2019

Hello @dashpole
is there any way we can move that forward? :)
Best regards, Martin

}
}

if c.includedMetrics.Has(container.AppMetrics) {
Collaborator

can we move this up with other metrics? That way we get consistent labels with other metrics.

Contributor Author

I don't think so: those metrics are initialized once when cAdvisor starts, but application metrics can change at any time.

Collaborator

I see. You don't need to use the containerMetrics thingy above, but if you place it after for _, cm := range c.containerMetrics {, you can simplify your function a bit.

@dashpole
Collaborator

Sorry to let this sit. Feel free to bump the PR if it doesn't get any action within about a week.

for _, container := range containers {
cstats := container.Stats
if len(cstats) > 0 {
last := cstats[len(cstats)-1]
Collaborator

I believe you want to use the first element in the list? At least, that is what we do above...

for _, metric := range v {
values := make([]string, 0, len(metric.Labels)+2)
labels := make([]string, 0, len(metric.Labels)+2)
labels = append(labels, "container_name")
Collaborator

Please do not add your own labels here, and re-use the logic above.

if label == "__name__" {
continue
}
labels = append(labels, sanitizeLabelName(label))
Collaborator

Keep in mind that all metrics with the same name returned in a single scrape must have the same label names. See https://github.com/google/cadvisor/pull/2262/files#diff-88eab3cc8cef9ad727beea9923056cbbR1148-R1172 above. I believe this would cause the metrics endpoint to return only some metrics on each scrape.
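
To illustrate the constraint described above: one way to keep label names consistent is to first compute the union of label names across all samples of a metric, then emit every sample with a value for each name (an empty string where the sample lacks that label). The following is a minimal Go sketch under the assumption that labels arrive as plain maps. `unionLabelNames` and `alignValues` are hypothetical helpers written for this illustration; they are not cAdvisor functions.

```go
package main

import (
	"fmt"
	"sort"
)

// unionLabelNames (hypothetical helper) returns the sorted union of all
// label names that appear in any of the given samples.
func unionLabelNames(samples []map[string]string) []string {
	seen := map[string]struct{}{}
	for _, sample := range samples {
		for name := range sample {
			seen[name] = struct{}{}
		}
	}
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// alignValues (hypothetical helper) returns one value per label name, in the
// same order as names, using "" for labels the sample does not carry.
func alignValues(names []string, sample map[string]string) []string {
	values := make([]string, len(names))
	for i, name := range names {
		values[i] = sample[name]
	}
	return values
}

func main() {
	samples := []map[string]string{
		{"zone": "a"},
		{"zone": "b", "tier": "web"},
	}
	names := unionLabelNames(samples)
	for _, sample := range samples {
		// Every sample is emitted with the same label names.
		fmt.Println(names, alignValues(names, sample))
	}
}
```

With this shape, the label names passed when describing the metric and the values passed for each sample always have the same length and order.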

@sanek9
Contributor Author

sanek9 commented Dec 16, 2019

@dashpole, thanks for the good code review. What about this?

@sanek9
Contributor Author

sanek9 commented Dec 16, 2019

/retest

@dashpole
Collaborator

I1216 22:28:34.783] >> checking go formatting
I1216 22:28:35.894] The following files are not properly formatted:
I1216 22:28:35.894] metrics/prometheus.go

copy(clabels, labels)
copy(cvalues, values)
for label, value := range metric.Labels {
if label == "__name__" {
Collaborator

@dashpole dashpole Dec 16, 2019

why are we special-casing __name__?

Contributor Author

@sanek9 sanek9 Dec 17, 2019

  1221						for label, value := range metric.Labels {
  1222	//						if label == "__name__" {
  1223	//							continue
  1224	//						}
  1225							clabels = append(clabels, sanitizeLabelName(label))
  1226							cvalues = append(cvalues, value)
  1227						}
  1228						desc := prometheus.NewDesc(metricLabel, "Custom application metric.", clabels, nil)
  1229						ch <- prometheus.MustNewConstMetric(desc, prometheus.GaugeValue, float64(metric.FloatValue), cvalues...)
resources are being tracked.
panic: "__name__" is not a valid label name

goroutine 568 [running]:
github.com/google/cadvisor/vendor/github.com/prometheus/client_golang/prometheus.MustNewConstMetric(...)
	/go/src/github.com/google/cadvisor/vendor/github.com/prometheus/client_golang/prometheus/value.go:102
github.com/google/cadvisor/metrics.(*PrometheusCollector).collectContainersInfo(0xc000110140, 0xc0007b6d20)
	/go/src/github.com/google/cadvisor/metrics/prometheus.go:1229 +0x2044
github.com/google/cadvisor/metrics.(*PrometheusCollector).Collect(0xc000110140, 0xc0007b6d20)
	/go/src/github.com/google/cadvisor/metrics/prometheus.go:1078 +0x89
github.com/google/cadvisor/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1()
	/go/src/github.com/google/cadvisor/vendor/github.com/prometheus/client_golang/prometheus/registry.go:434 +0x19d
created by github.com/google/cadvisor/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather
	/go/src/github.com/google/cadvisor/vendor/github.com/prometheus/client_golang/prometheus/registry.go:526 +0xe12

Collaborator

yeah, I would prefer doing any filtering at collection time.
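
As a rough illustration of filtering at collection time: Prometheus reserves label names that begin with two underscores for internal use, which is why `__name__` is rejected in the panic above. The Go sketch below drops such labels before they reach the exporter. `dropReservedLabels` is a hypothetical helper for this example only, and the map-based input is an assumption; it does not reproduce cAdvisor's actual collector code.

```go
package main

import (
	"fmt"
	"strings"
)

// reservedPrefix marks label names that Prometheus reserves for internal use.
const reservedPrefix = "__"

// dropReservedLabels (hypothetical helper) returns a copy of labels without
// reserved names such as "__name__". The input map is left unchanged.
func dropReservedLabels(labels map[string]string) map[string]string {
	out := make(map[string]string, len(labels))
	for name, value := range labels {
		if strings.HasPrefix(name, reservedPrefix) {
			continue // skip internal labels at collection time
		}
		out[name] = value
	}
	return out
}

func main() {
	scraped := map[string]string{"__name__": "requests_total", "path": "/"}
	fmt.Println(dropReservedLabels(scraped)) // prints: map[path:/]
}
```

Filtering where the labels are collected means the export path never sees a reserved name and needs no special case.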

Collaborator

@dashpole dashpole left a comment

Looks reasonable. Can you add test cases for this? You will need to modify metrics/testdata/prometheus_metrics to add the expected output for your case.

@sanek9
Contributor Author

sanek9 commented Dec 17, 2019

seems ok

@dashpole
Collaborator

Looks good. Since we don't have e2e tests for these metrics (just unit tests), can you add a blurb to the initial comment describing the manual tests you ran? In particular, make sure you test with additional labels, since I want to make sure this doesn't resurface #1704.

@sanek9
Contributor Author

sanek9 commented Dec 21, 2019

@dashpole, what do you think about this: https://github.com/sanek9/cadvisor-test ?
It checks that cAdvisor always reports the same metrics that the application exports.

Log from run.sh:
+ export CIMAGE=cadvisor:57104c28
+ docker-compose build
cadvisor uses an image, skipping
Building example-app
Step 1/9 : FROM golang:1.13-alpine3.10
 ---> 69cf534c966a
Step 2/9 : RUN apk add git
 ---> Using cache
 ---> 9f9eaa40faf5
Step 3/9 : WORKDIR /go/src/app
 ---> Using cache
 ---> c60e31656a6a
Step 4/9 : ADD metrics.json /var/cadvisor/metrics.json
 ---> Using cache
 ---> c605fefd1c75
Step 5/9 : LABEL io.cadvisor.metric.prometheus="/var/cadvisor/metrics.json"
 ---> Using cache
 ---> 55cf5e68fab9
Step 6/9 : COPY . .
 ---> Using cache
 ---> 17cbad93997d
Step 7/9 : RUN go get -d -v ./...
 ---> Using cache
 ---> 070d758128a2
Step 8/9 : RUN go install -v ./...
 ---> Using cache
 ---> 69991f1c04ea
Step 9/9 : ENTRYPOINT ["app"]
 ---> Using cache
 ---> 0296828a9aba

Successfully built 0296828a9aba
Successfully tagged cadvisor-test_example-app:latest
Building test
Step 1/7 : FROM golang:1.13-alpine3.10
 ---> 69cf534c966a
Step 2/7 : RUN apk add git
 ---> Using cache
 ---> 9f9eaa40faf5
Step 3/7 : WORKDIR /go/src/test
 ---> Using cache
 ---> b42b5a6647e3
Step 4/7 : COPY . .
 ---> Using cache
 ---> 8bfffd4dbaab
Step 5/7 : RUN go get -t -v ./...
 ---> Using cache
 ---> b9ba9d889c8f
Step 6/7 : ENV CGO_ENABLED 0
 ---> Using cache
 ---> b160e2f0b932
Step 7/7 : CMD ["./run"]
 ---> Using cache
 ---> 409793ee76af

Successfully built 409793ee76af
Successfully tagged cadvisor-test_test:latest
+ RANDOM_METRIC_COUNT=80 CUSTOM_LABEL=custom_label docker-compose up --abort-on-container-exit
Recreating cadvisor-test_example-app_1 ... done
Starting cadvisor-test_cadvisor_1      ... done
Recreating cadvisor-test_test_1        ... done
Attaching to cadvisor-test_cadvisor_1, cadvisor-test_example-app_1, cadvisor-test_test_1
cadvisor_1     | W1221 21:59:28.260698       1 manager.go:256] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: operation not permitted
cadvisor_1     | W1221 21:59:28.320889       1 container.go:409] Failed to create summary reader for "/system.slice/system-tor.slice/tor@default.service": none of the resources are being tracked.
test_1         | starting test
test_1         | start test
test_1         | ..................................................
test_1         | ok
test_1         | PASS
test_1         | ok  	test	27.484s
cadvisor-test_test_1 exited with code 0
Aborting on container exit...
Stopping cadvisor-test_example-app_1   ... done
Stopping cadvisor-test_cadvisor_1      ... done
+ echo status 0
status 0
+ RANDOM_METRIC_COUNT=100 CUSTOM_LABEL=custom_label docker-compose up --abort-on-container-exit
Recreating cadvisor-test_example-app_1 ... done
Starting cadvisor-test_cadvisor_1      ... done
Recreating cadvisor-test_test_1        ... done
Attaching to cadvisor-test_cadvisor_1, cadvisor-test_example-app_1, cadvisor-test_test_1
cadvisor_1     | W1221 22:00:14.568227       1 manager.go:256] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: operation not permitted
cadvisor_1     | W1221 22:00:14.700657       1 container.go:409] Failed to create summary reader for "/system.slice/system-tor.slice/tor@default.service": none of the resources are being tracked.
test_1         | starting test
test_1         | start test
test_1         | ..................................................
test_1         | ok
test_1         | PASS
test_1         | ok  	test	24.513s
cadvisor-test_test_1 exited with code 0
Aborting on container exit...
Stopping cadvisor-test_example-app_1   ... done
Stopping cadvisor-test_cadvisor_1      ... done
+ echo status 0
status 0
+ RANDOM_METRIC_COUNT=101 CUSTOM_LABEL=custom_label docker-compose up --abort-on-container-exit
Recreating cadvisor-test_example-app_1 ... done
Starting cadvisor-test_cadvisor_1      ... done
Recreating cadvisor-test_test_1        ... done
Attaching to cadvisor-test_cadvisor_1, cadvisor-test_example-app_1, cadvisor-test_test_1
cadvisor_1     | W1221 22:00:58.687072       1 manager.go:256] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: operation not permitted
cadvisor_1     | W1221 22:00:58.799102       1 container.go:409] Failed to create summary reader for "/system.slice/system-tor.slice/tor@default.service": none of the resources are being tracked.
cadvisor_1     | W1221 22:00:58.804601       1 container.go:523] Failed to update stats for container "/docker/3e3758f8fbe4a5536d429a8d148a21525261a3ef57755eb2d39567e44a3f2ffc": Error 0: too many metrics to collect, continuing to push custom stats
test_1         | starting test
test_1         | start test
test_1         | ffffffffffffffffffffffffffffffffffffffffffffffffff--- FAIL: TestAppMetrics (24.10s)
test_1         |     app_test.go:122: Cadvisor and App metrics not equal in 50 of 50 cases
test_1         | FAIL
test_1         | exit status 1
test_1         | FAIL	test	24.109s
cadvisor-test_test_1 exited with code 1
Aborting on container exit...
Stopping cadvisor-test_example-app_1   ... done
Stopping cadvisor-test_cadvisor_1      ... done
+ echo status 1
status 1
+ RANDOM_METRIC_COUNT=80 CUSTOM_LABEL=name docker-compose up --abort-on-container-exit
Recreating cadvisor-test_example-app_1 ... done
Starting cadvisor-test_cadvisor_1      ... done
Recreating cadvisor-test_test_1        ... done
Attaching to cadvisor-test_cadvisor_1, cadvisor-test_example-app_1, cadvisor-test_test_1
cadvisor_1     | W1221 22:01:41.111251       1 manager.go:256] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: operation not permitted
cadvisor_1     | W1221 22:01:41.237071       1 container.go:409] Failed to create summary reader for "/system.slice/system-tor.slice/tor@default.service": none of the resources are being tracked.
test_1         | starting test
test_1         | start test
cadvisor_1     | panic: duplicate label names
cadvisor_1     | 
cadvisor_1     | goroutine 113 [running]:
cadvisor_1     | github.com/google/cadvisor/vendor/github.com/prometheus/client_golang/prometheus.MustNewConstMetric(...)
cadvisor_1     | 	/go/src/github.com/google/cadvisor/vendor/github.com/prometheus/client_golang/prometheus/value.go:102
cadvisor_1     | github.com/google/cadvisor/metrics.(*PrometheusCollector).collectContainersInfo(0xc0000b8e10, 0xc0006f12c0)
cadvisor_1     | 	/go/src/github.com/google/cadvisor/metrics/prometheus.go:1226 +0x2044
cadvisor_1     | github.com/google/cadvisor/metrics.(*PrometheusCollector).Collect(0xc0000b8e10, 0xc0006f12c0)
cadvisor_1     | 	/go/src/github.com/google/cadvisor/metrics/prometheus.go:1078 +0x89
cadvisor_1     | github.com/google/cadvisor/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1()
cadvisor_1     | 	/go/src/github.com/google/cadvisor/vendor/github.com/prometheus/client_golang/prometheus/registry.go:434 +0x19d
cadvisor_1     | created by github.com/google/cadvisor/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather
cadvisor_1     | 	/go/src/github.com/google/cadvisor/vendor/github.com/prometheus/client_golang/prometheus/registry.go:526 +0xe12
test_1         | Get http://cadvisor:8080/metrics: EOF
cadvisor-test_cadvisor_1 exited with code 2
Aborting on container exit...
Stopping cadvisor-test_test_1          ... done
Stopping cadvisor-test_example-app_1   ... done
+ echo status 2
status 2

I also found a bug: cAdvisor crashes when the application exports a metric with a label named "name", "image", "id", ... (any label name that ContainerLabelsFunc returns), and I don't know what to do about it.

The second problem occurs when the number of application metrics exceeds application_metrics_count_limit ...

@dashpole
Collaborator

I also found a bug: cAdvisor crashes when the application exports a metric with a label named "name", "image", "id", ... (any label name that ContainerLabelsFunc returns), and I don't know what to do about it.

We could prefix all of the labels from the application with something, e.g. "app_", to make sure there are no collisions.

The second problem occurs when the number of application metrics exceeds application_metrics_count_limit ...

We probably shouldn't worry about that. It seems to be working as intended (WAI). It is important for the administrator who runs cAdvisor to limit metric cardinality.
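
The prefixing idea suggested above could look roughly like the following Go sketch. It assumes container labels and application labels are both available as plain maps; `mergeWithPrefix` is a hypothetical helper, and the label values are made up for illustration. It is not the code that was merged in this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeWithPrefix (hypothetical helper) combines container labels with
// application labels, adding prefix to every application label name so that
// an application label such as "name" cannot collide with the container's.
func mergeWithPrefix(containerLabels, appLabels map[string]string, prefix string) map[string]string {
	merged := make(map[string]string, len(containerLabels)+len(appLabels))
	for name, value := range containerLabels {
		merged[name] = value
	}
	for name, value := range appLabels {
		merged[prefix+name] = value
	}
	return merged
}

func main() {
	containerLabels := map[string]string{"name": "web-1", "image": "nginx"}
	appLabels := map[string]string{"name": "checkout"}

	merged := mergeWithPrefix(containerLabels, appLabels, "app_")

	// Print in sorted order so the output is stable.
	names := make([]string, 0, len(merged))
	for name := range merged {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s=%q\n", name, merged[name])
	}
}
```

Note that a prefix only avoids collisions if no container label already starts with the same prefix; in this sketch such a label would be silently overwritten.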

Collaborator

@dashpole dashpole left a comment

lgtm

@dashpole dashpole merged commit c470c61 into google:master Mar 6, 2020