Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read all cgroup v2 metrics that can be read #1120

Merged
merged 12 commits into from
Dec 6, 2024

Conversation

blt
Copy link
Collaborator

@blt blt commented Dec 3, 2024

What does this PR do?

This commit removes any arbitration of the cgroup v2 heirarchy for a
given process. We instead read anything that can be read, looping over
all cgroup files present but not following the hierarchy down.

Metric names will change but they will now reflect underlying system reality.

@blt blt added the no-changelog label Dec 3, 2024 — with Graphite App
@blt blt mentioned this pull request Dec 3, 2024
@blt blt marked this pull request as ready for review December 3, 2024 03:46
@blt blt requested a review from a team as a code owner December 3, 2024 03:46
@blt blt force-pushed the blt/update_procfs_to_0.17 branch from 63f1ce0 to 3f145ec Compare December 3, 2024 17:35
@blt blt force-pushed the blt/read_all_cgroup_v2_metrics_that_can_be_read branch from b4b6486 to 2a32d9b Compare December 3, 2024 17:35
lading/src/observer/linux.rs Outdated Show resolved Hide resolved
mem_stat.stat.total_inactive_file
};
let usage = mem_stat.usage_in_bytes;
let working_set = if usage < inactive_file {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we preserve working_set calculation as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh hmm. So in the new cgroup sampler we never actually inspect any of the data being read, just immediately rip it out to metrics. I think we could calculate this easily enough in-platform or we could make a special read to preserve this.

Happy to move forward with either, but you're right. working_set won't be present in the new telemetry stream.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be interested in preserving it, calculating it in-platform is possible, but has zero-discoverability.

Since our intention is to mimic k8s's working_set, lets name it k8s-like.working_set or something like that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, I'll add that in a separate PR up-stack. Figure I'll make a k8s reader where we can put any synthetic measures like this.

Base automatically changed from blt/update_procfs_to_0.17 to main December 3, 2024 22:27
@blt blt force-pushed the blt/read_all_cgroup_v2_metrics_that_can_be_read branch from 8adc60d to 6b05a98 Compare December 3, 2024 22:57
lading/src/observer/linux/cgroup/v2.rs Outdated Show resolved Hide resolved
let metric_prefix = match file_name.to_str() {
Some(s) => format!("cgroup.v2.{s}"),
None => {
// Skip files with non-UTF-8 names
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth a warn here? I would be pretty shocked if there were any cgroup file names that were not valid utf-8

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 67078d8

Ok(content) => {
let content = content.trim();

// Cgroup files that have values are either single-valued or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went looking for the rules for these files and found this
https://docs.kernel.org/admin-guide/cgroup-v2.html#interface-files

It looks like multiple values aren't supported (whether new-line separated or space separated) and nested keyed aren't supported.

Worth refactoring this match to ... match the documented potential options, even if we intentionally don't support some of them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update the documentation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 67078d8

};
let file_path = entry.path();

match fs::read_to_string(&file_path).await {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth checking if the file is readable up-front, some entries in cgroupv2 are write-only. eg:

--w-------  1 root root 0 Jul 15 21:02 memory.reclaim
-r--r--r--  1 root root 0 Jul 15 21:02 memory.stat

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah that's a good thought.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah actually that metadata check is pretty much embedded in the read attempt, so we double up the syscalls by doing a check up front.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 67078d8

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eh, not going to hold up the PR, but we already fetch the file metadata to determine if its a file. So really you just need to check if metadata.is_file && metadata.permissions.is_readable above.

lading/src/observer/linux/procfs.rs Show resolved Hide resolved
@blt blt force-pushed the blt/read_all_cgroup_v2_metrics_that_can_be_read branch from 2d8f756 to 77b5305 Compare December 4, 2024 21:18
@blt blt changed the base branch from main to blt/attempt_generational_storage_of_capture_metrics December 4, 2024 21:18
@blt blt force-pushed the blt/attempt_generational_storage_of_capture_metrics branch from 36ced36 to 0c1c3c1 Compare December 4, 2024 21:20
@blt blt force-pushed the blt/read_all_cgroup_v2_metrics_that_can_be_read branch 3 times, most recently from b1048f9 to 2376bfa Compare December 5, 2024 00:17
@blt blt changed the base branch from blt/attempt_generational_storage_of_capture_metrics to graphite-base/1120 December 5, 2024 07:57
blt added 4 commits December 5, 2024 07:57
Signed-off-by: Brian L. Troutwine <[email protected]>
This commit removes any arbitration of the cgroup v2 heirarchy for a
given process. We instead read anything that can be read, looping over
all cgroup files present but not following the heirarchy down.

Signed-off-by: Brian L. Troutwine <[email protected]>
Signed-off-by: Brian L. Troutwine <[email protected]>
Signed-off-by: Brian L. Troutwine <[email protected]>
blt added 6 commits December 5, 2024 07:57
Signed-off-by: Brian L. Troutwine <[email protected]>
Signed-off-by: Brian L. Troutwine <[email protected]>
This commit splits the Sampler so that cgroup collection is in a separate
implementation from procfs, which has gotten sprawling. I have maintained the
existing Sampler interface and hidden the two new implementations inside of it,
although we might choose to expose them at some point in the future.

Signed-off-by: Brian L. Troutwine <[email protected]>
Signed-off-by: Brian L. Troutwine <[email protected]>
Signed-off-by: Brian L. Troutwine <[email protected]>
Signed-off-by: Brian L. Troutwine <[email protected]>
@blt blt force-pushed the blt/read_all_cgroup_v2_metrics_that_can_be_read branch from 2376bfa to 8e49816 Compare December 5, 2024 07:57
@blt blt force-pushed the graphite-base/1120 branch from a61114e to 1c73887 Compare December 5, 2024 07:57
@blt blt changed the base branch from graphite-base/1120 to main December 5, 2024 07:58
Signed-off-by: Brian L. Troutwine <[email protected]>
@blt blt force-pushed the blt/read_all_cgroup_v2_metrics_that_can_be_read branch from 8e49816 to 2f79087 Compare December 5, 2024 07:58
Signed-off-by: Brian L. Troutwine <[email protected]>
Copy link
Contributor

@scottopell scottopell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for the iterations. My main concern is around treating all cgroup-parsed data as counts, but we can iterate on this as we go.

};
let file_path = entry.path();

match fs::read_to_string(&file_path).await {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eh, not going to hold up the PR, but we already fetch the file metadata to determine if its a file. So really you just need to check if metadata.is_file && metadata.permissions.is_readable above.

s => s.parse()?,
};
let metric_name = format!("{metric_prefix}.{key}");
gauge!(metric_name, labels).set(value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocking concern, but are there any values that should be collected as counts rather than gauges? I think they'll reveal themselves shortly if they exist, but I could imagine some of these entries (I'm thinking memory.events has oom_kills for example) would be counts.

@blt blt merged commit d2a93c4 into main Dec 6, 2024
19 checks passed
Copy link
Collaborator Author

blt commented Dec 6, 2024

Merge activity

  • Dec 6, 10:39 AM EST: A user merged this pull request with Graphite.

@blt blt deleted the blt/read_all_cgroup_v2_metrics_that_can_be_read branch December 6, 2024 15:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants