Read all cgroup v2 metrics that can be read #1120
mem_stat.stat.total_inactive_file
};
let usage = mem_stat.usage_in_bytes;
let working_set = if usage < inactive_file {
Can we preserve working_set calculation as well?
Oh hmm. So in the new cgroup sampler we never actually inspect any of the data being read, just immediately rip it out to metrics. I think we could calculate this easily enough in-platform or we could make a special read to preserve this.
Happy to move forward with either, but you're right: `working_set` won't be present in the new telemetry stream.
I'd be interested in preserving it. Calculating it in-platform is possible, but has zero discoverability.
Since our intention is to mimic k8s's `working_set`, let's name it `k8s-like.working_set` or something like that.
Ah yeah, I'll add that in a separate PR up-stack. Figure I'll make a k8s reader where we can put any synthetic measures like this.
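For reference, a minimal sketch of the calculation under discussion. It assumes the truncated `if usage < inactive_file` branch in the diff above clamps the result to zero, which the snippet does not show; the function name is hypothetical.

```rust
/// Hypothetical helper: a k8s-like working set in bytes.
///
/// Assumption: when `inactive_file` exceeds `usage` the result is clamped
/// to zero rather than underflowing, mirroring the `usage < inactive_file`
/// guard in the diff.
fn working_set(usage: u64, inactive_file: u64) -> u64 {
    usage.saturating_sub(inactive_file)
}
```

Keeping this in a dedicated reader, as suggested, would keep synthetic values like this one separate from the raw cgroup metrics.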
let metric_prefix = match file_name.to_str() {
Some(s) => format!("cgroup.v2.{s}"),
None => {
// Skip files with non-UTF-8 names
Worth a `warn` here? I would be pretty shocked if there were any cgroup file names that were not valid UTF-8.
Addressed in 67078d8
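A rough sketch of what the suggested warning could look like. The function name is hypothetical, and `eprintln!` stands in for whatever logging macro the project actually uses; the contents of commit 67078d8 are not shown here, so this is not a reproduction of it.

```rust
use std::ffi::OsStr;

/// Hypothetical helper: build the metric prefix for a cgroup file, or
/// return `None` (after emitting a warning) when the name is not UTF-8.
fn metric_prefix(file_name: &OsStr) -> Option<String> {
    match file_name.to_str() {
        Some(s) => Some(format!("cgroup.v2.{s}")),
        None => {
            // Assumption: a real implementation would use the project's
            // logging facility instead of writing to stderr.
            eprintln!("skipping cgroup file with non-UTF-8 name: {file_name:?}");
            None
        }
    }
}
```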
Ok(content) => {
let content = content.trim();

// Cgroup files that have values are either single-valued or
I went looking for the rules for these files and found this:
https://docs.kernel.org/admin-guide/cgroup-v2.html#interface-files
It looks like multiple values (whether newline-separated or space-separated) and nested keyed files aren't supported here.
Worth refactoring this `match` to ... match the documented potential options, even if we intentionally don't support some of them.
Will update the documentation.
Addressed in 67078d8
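To make the suggestion concrete, here is a sketch of a parser that names the supported formats and explicitly rejects the rest. All type and function names are hypothetical, and mapping `max` to `f64::MAX` is an assumption about how an unbounded limit might be reported; none of this is taken from the PR's code.

```rust
/// Hypothetical classification of a cgroup v2 interface file's content.
#[derive(Debug, PartialEq)]
enum Parsed {
    /// A single value, e.g. `memory.current`.
    Single(f64),
    /// Flat keyed: one `KEY VALUE` pair per line, e.g. `memory.stat`.
    FlatKeyed(Vec<(String, f64)>),
    /// Newline-separated values, space-separated values, nested keyed
    /// files, and anything else this sketch does not parse.
    Unsupported,
}

/// Parse one value. Assumption: the literal `max` becomes `f64::MAX`.
fn parse_value(s: &str) -> Option<f64> {
    match s {
        "max" => Some(f64::MAX),
        s => s.parse().ok(),
    }
}

fn parse_interface_file(content: &str) -> Parsed {
    let lines: Vec<&str> = content.trim().lines().collect();
    if lines.is_empty() {
        return Parsed::Unsupported;
    }
    // Single-valued: exactly one line holding exactly one token.
    if let [line] = lines.as_slice() {
        let tokens: Vec<&str> = line.split_whitespace().collect();
        if let [value] = tokens.as_slice() {
            return parse_value(value).map_or(Parsed::Unsupported, Parsed::Single);
        }
    }
    // Flat keyed: every line must be exactly `KEY VALUE`.
    let mut pairs = Vec::new();
    for line in &lines {
        let tokens: Vec<&str> = line.split_whitespace().collect();
        match tokens.as_slice() {
            [key, value] => match parse_value(value) {
                Some(v) => pairs.push((key.to_string(), v)),
                None => return Parsed::Unsupported,
            },
            _ => return Parsed::Unsupported,
        }
    }
    Parsed::FlatKeyed(pairs)
}
```

One known ambiguity in this sketch: a space-separated file holding exactly two tokens, the second numeric, would be read as a flat keyed pair.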
};
let file_path = entry.path();

match fs::read_to_string(&file_path).await {
Worth checking if the file is readable up-front; some entries in cgroup v2 are write-only, e.g.:
--w------- 1 root root 0 Jul 15 21:02 memory.reclaim
-r--r--r-- 1 root root 0 Jul 15 21:02 memory.stat
Ah yeah that's a good thought.
Ah actually that metadata check is pretty much embedded in the read attempt, so we double up the syscalls by doing a check up front.
Addressed in 67078d8
Eh, not going to hold up the PR, but we already fetch the file metadata to determine if it's a file. So really you just need to check `metadata.is_file && metadata.permissions.is_readable` above.
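The expression in the comment above is shorthand rather than a real API. A sketch of how the check might be spelled with the standard library on Unix, with hypothetical function names:

```rust
use std::fs::Metadata;
use std::os::unix::fs::PermissionsExt;

/// True if any of the owner/group/other read bits is set in `mode`.
fn mode_is_readable(mode: u32) -> bool {
    (mode & 0o444) != 0
}

/// Hypothetical helper: reuse already-fetched metadata to skip entries
/// such as the write-only `memory.reclaim` without a failed read attempt.
fn is_readable_file(metadata: &Metadata) -> bool {
    metadata.is_file() && mode_is_readable(metadata.permissions().mode())
}
```

This only inspects the mode bits. It does not account for which user is running the process, so the read can still fail and its error path still needs handling.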
Signed-off-by: Brian L. Troutwine <[email protected]>
This commit removes any arbitration of the cgroup v2 hierarchy for a given process. We instead read anything that can be read, looping over all cgroup files present but not following the hierarchy down. Signed-off-by: Brian L. Troutwine <[email protected]>
This commit splits the Sampler so that cgroup collection is in a separate implementation from procfs, which has gotten sprawling. I have maintained the existing Sampler interface and hidden the two new implementations inside of it, although we might choose to expose them at some point in the future. Signed-off-by: Brian L. Troutwine <[email protected]>
Looks good, thanks for the iterations. My main concern is around treating all cgroup-parsed data as `counts`, but we can iterate on this as we go.
s => s.parse()?,
};
let metric_name = format!("{metric_prefix}.{key}");
gauge!(metric_name, labels).set(value);
Not a blocking concern, but are there any values that should be collected as `count`s rather than gauges? I think they'll reveal themselves shortly if they exist, but I could imagine some of these entries (I'm thinking `memory.events` has `oom_kill`, for example) would be counts.
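One possible shape for that distinction, sketched with hypothetical names. The list of counter-like files is an illustrative assumption limited to the example raised above and is not something the PR implements.

```rust
/// Hypothetical metric kinds a cgroup file's values could be emitted as.
#[derive(Debug, PartialEq)]
enum MetricKind {
    /// Cumulative, monotonically increasing event counts.
    Counter,
    /// Point-in-time values.
    Gauge,
}

/// Hypothetical lookup from cgroup file name to metric kind.
/// Assumption: only the `memory.events` family is treated as counters
/// here; everything else stays a gauge, as in the PR.
fn metric_kind(file_name: &str) -> MetricKind {
    match file_name {
        "memory.events" | "memory.events.local" => MetricKind::Counter,
        _ => MetricKind::Gauge,
    }
}
```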
What does this PR do?
This commit removes any arbitration of the cgroup v2 hierarchy for a
given process. We instead read anything that can be read, looping over
all cgroup files present but not following the hierarchy down.
Metric names will change but they will now reflect underlying system reality.
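A minimal sketch of the "read anything that can be read" loop described above. It uses blocking `std::fs` for brevity, whereas the diff snippets use an async `fs::read_to_string`; the function name and return type are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: read every regular file directly inside `dir`,
/// without descending into child cgroups. Returns `(file name, trimmed
/// content)` pairs sorted by name.
fn read_cgroup_dir(dir: &Path) -> io::Result<Vec<(String, String)>> {
    let mut out = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // Subdirectories are child cgroups; do not follow them down.
        if !entry.metadata()?.is_file() {
            continue;
        }
        // Skip files with non-UTF-8 names.
        let name = match entry.file_name().to_str() {
            Some(s) => s.to_owned(),
            None => continue,
        };
        // Unreadable entries (for example write-only files) are skipped.
        if let Ok(content) = fs::read_to_string(entry.path()) {
            out.push((name, content.trim().to_owned()));
        }
    }
    out.sort();
    Ok(out)
}
```

Because the file name feeds directly into the metric name, each emitted metric maps back to a file the kernel actually exposes.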