feat: add metric for non-evictable memory
The goal of this PR is to have a cAdvisor metric which (as accurately as possible)
describes the amount of container memory which is not evictable by the kernel.

This new metric can be used to accurately graph and alert on
container memory usage regardless of its evictable memory usage patterns (e.g. a large active page cache).

Today, working_set_bytes does not always align with non-evictable memory.
For example, when two containers in a pod share files in an emptyDir, total_active_file grows
as one container writes and the other reads over time, dramatically increasing working_set_bytes.
As the file-owning process demands more non-evictable memory, total_active_file shrinks, and working_set_bytes hovers around 90% of the cgroup memory limit.
This makes alerting difficult, because working_set_bytes does not show that the pod holds evictable active page cache that the kernel is slowly draining.

In other words, total_active_file memory can be evicted by the kernel, but is included in working_set_bytes.
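
To make the difference concrete, here is a minimal Go sketch of the arithmetic, assuming a hypothetical cgroup v2 snapshot. The helper name saturatingSub and all byte values are illustrative assumptions, not part of the commit; the commit's own helper is subtractStats in container/libcontainer/handler.go.

```go
package main

import "fmt"

// saturatingSub subtracts each listed stat from value, clamping at zero
// so an inconsistent snapshot can never underflow the uint64.
// Hypothetical helper for illustration only.
func saturatingSub(value uint64, stats map[string]uint64, keys ...string) uint64 {
	for _, key := range keys {
		v, ok := stats[key]
		if !ok {
			continue // a missing key leaves the value unchanged
		}
		if value < v {
			return 0
		}
		value -= v
	}
	return value
}

func main() {
	const mib = 1 << 20

	// Assumed snapshot: 900 MiB charged to the cgroup, most of it
	// active page cache built up by shared-file traffic.
	usage := uint64(900 * mib)
	stats := map[string]uint64{
		"inactive_file": 50 * mib,
		"active_file":   600 * mib,
	}

	workingSet := saturatingSub(usage, stats, "inactive_file")
	nonEvictable := saturatingSub(usage, stats, "inactive_file", "active_file")

	fmt.Printf("working set:   %d MiB\n", workingSet/mib)   // 850 MiB
	fmt.Printf("non-evictable: %d MiB\n", nonEvictable/mib) // 250 MiB
}
```

With these assumed numbers, the working set sits close to the 900 MiB usage figure, while the non-evictable estimate shows that most of that memory is page cache.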

As an alternative to a new metric, working_set_bytes could be updated to represent non-evictable memory by excluding total_active_file (along with any other evictable fields).
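
The field names above come from the cgroup memory.stat file, which is a flat list of "key value" lines. The following Go sketch shows one way such a file could be read and the file-cache keys selected per cgroup version. The function names parseMemoryStat and fileCacheKeys and the sample contents are assumptions for illustration; cAdvisor itself gets these values through the libcontainer cgroups package rather than parsing text like this.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseMemoryStat reads flat "key value" lines into a map, skipping
// any line that does not have exactly two fields or a numeric value.
// Hypothetical helper for illustration only.
func parseMemoryStat(content string) map[string]uint64 {
	stats := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			continue
		}
		stats[fields[0]] = v
	}
	return stats
}

// fileCacheKeys returns the inactive and active file-cache key names:
// plain names on cgroup v2, "total_"-prefixed hierarchical names on v1.
func fileCacheKeys(cgroupV2 bool) (inactive, active string) {
	if cgroupV2 {
		return "inactive_file", "active_file"
	}
	return "total_inactive_file", "total_active_file"
}

func main() {
	// Assumed cgroup v2 sample; real files contain many more keys.
	sample := "anon 262144000\nfile 681574400\ninactive_file 52428800\nactive_file 629145600\n"

	stats := parseMemoryStat(sample)
	inactive, active := fileCacheKeys(true)
	fmt.Println(stats[inactive]+stats[active], "bytes of file cache counted as evictable")
}
```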
jrcichra committed Jan 17, 2024
1 parent 04006e5 commit e579c7d
Showing 13 changed files with 74 additions and 15 deletions.
5 changes: 5 additions & 0 deletions cmd/internal/storage/bigquery/bigquery.go
@@ -50,6 +50,8 @@ const (
colMemoryUsage string = "memory_usage"
// Working set size
colMemoryWorkingSet string = "memory_working_set"
// Non-evictable set size
colMemoryNonEvictableSet string = "memory_non_evictable_set"
// Container page fault
colMemoryContainerPgfault string = "memory_container_pgfault"
// Container major page fault
@@ -226,6 +228,9 @@ func (s *bigqueryStorage) containerStatsToRows(
// Working set size
row[colMemoryWorkingSet] = stats.Memory.WorkingSet

// Non-evictable set size
row[colMemoryNonEvictableSet] = stats.Memory.NonEvictableSet

// container page fault
row[colMemoryContainerPgfault] = stats.Memory.ContainerData.Pgfault

4 changes: 4 additions & 0 deletions cmd/internal/storage/influxdb/influxdb.go
@@ -70,6 +70,8 @@ const (
serMemoryMappedFile string = "memory_mapped_file"
// Working set size
serMemoryWorkingSet string = "memory_working_set"
// Non-evictable set size
serMemoryNonEvictableSet string = "memory_non_evictable_set"
// Number of memory usage hits limits
serMemoryFailcnt string = "memory_failcnt"
// Cumulative count of memory allocation failures
@@ -256,6 +258,8 @@ func (s *influxdbStorage) memoryStatsToPoints(
points = append(points, makePoint(serMemoryMappedFile, stats.Memory.MappedFile))
// Working Set Size
points = append(points, makePoint(serMemoryWorkingSet, stats.Memory.WorkingSet))
// Non-evictable Set Size
points = append(points, makePoint(serMemoryNonEvictableSet, stats.Memory.NonEvictableSet))
// Number of memory usage hits limits
points = append(points, makePoint(serMemoryFailcnt, stats.Memory.Failcnt))

6 changes: 6 additions & 0 deletions cmd/internal/storage/influxdb/influxdb_test.go
@@ -75,6 +75,10 @@ func (self *influxDbTestStorageDriver) StatsEq(a, b *info.ContainerStats) bool {
return false
}

if a.Memory.NonEvictableSet != b.Memory.NonEvictableSet {
return false
}

if !reflect.DeepEqual(a.Network, b.Network) {
return false
}
@@ -253,6 +257,7 @@ func TestContainerStatsToPoints(t *testing.T) {
assertContainsPointWithValue(t, points, serMemoryMappedFile, stats.Memory.MappedFile)
assertContainsPointWithValue(t, points, serMemoryUsage, stats.Memory.Usage)
assertContainsPointWithValue(t, points, serMemoryWorkingSet, stats.Memory.WorkingSet)
assertContainsPointWithValue(t, points, serMemoryNonEvictableSet, stats.Memory.NonEvictableSet)
assertContainsPointWithValue(t, points, serMemoryFailcnt, stats.Memory.Failcnt)
assertContainsPointWithValue(t, points, serMemoryFailure, stats.Memory.ContainerData.Pgfault)
assertContainsPointWithValue(t, points, serMemoryFailure, stats.Memory.ContainerData.Pgmajfault)
@@ -353,6 +358,7 @@ func createTestStats() (*info.ContainerInfo, *info.ContainerStats) {
Swap: 1024,
MappedFile: 1025327104,
WorkingSet: 23630012416,
NonEvictableSet: 29459246253,
Failcnt: 1,
ContainerData: info.MemoryStatsMemoryData{Pgfault: 100328455, Pgmajfault: 97},
HierarchicalData: info.MemoryStatsMemoryData{Pgfault: 100328454, Pgmajfault: 96},
4 changes: 4 additions & 0 deletions cmd/internal/storage/statsd/statsd.go
@@ -57,6 +57,8 @@ const (
serMemoryMappedFile string = "memory_mapped_file"
// Working set size
serMemoryWorkingSet string = "memory_working_set"
// Non-evictable set size
serMemoryNonEvictableSet string = "memory_non_evictable_set"
// Number of memory usage hits limits
serMemoryFailcnt string = "memory_failcnt"
// Cumulative count of memory allocation failures
@@ -159,6 +161,8 @@ func (s *statsdStorage) memoryStatsToValues(series *map[string]uint64, stats *in
(*series)[serMemoryMappedFile] = stats.Memory.MappedFile
// Working Set Size
(*series)[serMemoryWorkingSet] = stats.Memory.WorkingSet
// Non-evictable Set Size
(*series)[serMemoryNonEvictableSet] = stats.Memory.NonEvictableSet
// Number of memory usage hits limits
(*series)[serMemoryFailcnt] = stats.Memory.Failcnt

4 changes: 4 additions & 0 deletions cmd/internal/storage/stdout/stdout.go
@@ -59,6 +59,8 @@ const (
serMemoryMappedFile string = "memory_mapped_file"
// Working set size
serMemoryWorkingSet string = "memory_working_set"
// Non-evictable set size
serMemoryNonEvictableSet string = "memory_non_evictable_set"
// Number of memory usage hits limits
serMemoryFailcnt string = "memory_failcnt"
// Cumulative count of memory allocation failures
@@ -164,6 +166,8 @@ func (driver *stdoutStorage) memoryStatsToValues(series *map[string]uint64, stat
(*series)[serMemoryMappedFile] = stats.Memory.MappedFile
// Working Set Size
(*series)[serMemoryWorkingSet] = stats.Memory.WorkingSet
// Non-evictable Set Size
(*series)[serMemoryNonEvictableSet] = stats.Memory.NonEvictableSet
// Number of memory usage hits limits
(*series)[serMemoryFailcnt] = stats.Memory.Failcnt

25 changes: 18 additions & 7 deletions container/libcontainer/handler.go
@@ -834,15 +834,26 @@ func setMemoryStats(s *cgroups.Stats, ret *info.ContainerStats) {
 		inactiveFileKeyName = "inactive_file"
 	}
 
-	workingSet := ret.Memory.Usage
-	if v, ok := s.MemoryStats.Stats[inactiveFileKeyName]; ok {
-		if workingSet < v {
-			workingSet = 0
-		} else {
-			workingSet -= v
+	activeFileKeyName := "total_active_file"
+	if cgroups.IsCgroup2UnifiedMode() {
+		activeFileKeyName = "active_file"
+	}
+
+	ret.Memory.WorkingSet = subtractStats(ret.Memory.Usage, s.MemoryStats.Stats, []string{inactiveFileKeyName})
+	ret.Memory.NonEvictableSet = subtractStats(ret.Memory.Usage, s.MemoryStats.Stats, []string{inactiveFileKeyName, activeFileKeyName})
+}
+
+func subtractStats(value uint64, stats map[string]uint64, keys []string) uint64 {
+	for _, key := range keys {
+		if v, ok := stats[key]; ok {
+			if value < v {
+				value = 0
+			} else {
+				value -= v
+			}
 		}
 	}
-	ret.Memory.WorkingSet = workingSet
+	return value
 }
 
 func setCPUSetStats(s *cgroups.Stats, ret *info.ContainerStats) {
5 changes: 5 additions & 0 deletions info/v1/container.go
@@ -393,6 +393,11 @@ type MemoryStats struct {
// Units: Bytes.
WorkingSet uint64 `json:"working_set"`

// The amount of non-evictable memory. This gives an approximate figure
// for determining when a container is near OOM-ing.
// Units: Bytes.
NonEvictableSet uint64 `json:"non_evictable_set"`

Failcnt uint64 `json:"failcnt"`

// Size of kernel memory allocated in bytes.
11 changes: 6 additions & 5 deletions info/v2/conversion_test.go
@@ -137,11 +137,12 @@ func TestContainerStatsFromV1(t *testing.T) {
 	v1Stats := v1.ContainerStats{
 		Timestamp: timestamp,
 		Memory: v1.MemoryStats{
-			Usage:      1,
-			Cache:      2,
-			RSS:        3,
-			WorkingSet: 4,
-			Failcnt:    5,
+			Usage:           1,
+			Cache:           2,
+			RSS:             3,
+			WorkingSet:      4,
+			Failcnt:         5,
+			NonEvictableSet: 6,
 			ContainerData: v1.MemoryStatsMemoryData{
 				Pgfault:    1,
 				Pgmajfault: 2,
4 changes: 4 additions & 0 deletions integration/tests/api/test_utils.go
@@ -69,8 +69,12 @@ func checkMemoryStats(t *testing.T, stat info.MemoryStats) {

// Compare against uint64(0): an untyped 0 is an int and never equals a uint64 field.
assert.NotEqual(uint64(0), stat.Usage, "Memory usage should not be zero")
assert.NotEqual(uint64(0), stat.WorkingSet, "Memory working set should not be zero")
assert.NotEqual(uint64(0), stat.NonEvictableSet, "Memory non-evictable set should not be zero")
if stat.WorkingSet > stat.Usage {
t.Errorf("Memory working set (%d) should be at most equal to memory usage (%d)", stat.WorkingSet, stat.Usage)
}
if stat.NonEvictableSet > stat.Usage {
t.Errorf("Memory non-evictable set (%d) should be at most equal to memory usage (%d)", stat.NonEvictableSet, stat.Usage)
}
// TODO(vmarmol): Add checks for ContainerData and HierarchicalData
}
8 changes: 8 additions & 0 deletions metrics/prometheus.go
@@ -431,6 +431,14 @@ func NewPrometheusCollector(i infoProvider, f ContainerLabelsFunc, includedMetri
return metricValues{{value: float64(s.Memory.WorkingSet), timestamp: s.Timestamp}}
},
},
{
name: "container_memory_non_evictable_set_bytes",
help: "Current non-evictable set in bytes.",
valueType: prometheus.GaugeValue,
getValues: func(s *info.ContainerStats) metricValues {
return metricValues{{value: float64(s.Memory.NonEvictableSet), timestamp: s.Timestamp}}
},
},
{
name: "container_memory_failures_total",
help: "Cumulative count of memory allocation failures.",
7 changes: 4 additions & 3 deletions metrics/prometheus_fake.go
@@ -329,9 +329,10 @@ func (p testSubcontainersInfoProvider) GetRequestedContainersInfo(string, v2.Req
 		LoadAverage: 2,
 	},
 	Memory: info.MemoryStats{
-		Usage:      8,
-		MaxUsage:   8,
-		WorkingSet: 9,
+		Usage:           8,
+		MaxUsage:        8,
+		WorkingSet:      9,
+		NonEvictableSet: 7,
 		ContainerData: info.MemoryStatsMemoryData{
 			Pgfault:    10,
 			Pgmajfault: 11,
3 changes: 3 additions & 0 deletions metrics/testdata/prometheus_metrics
@@ -186,6 +186,9 @@ container_memory_usage_bytes{container_env_foo_env="prod",container_label_foo_la
# HELP container_memory_working_set_bytes Current working set in bytes.
# TYPE container_memory_working_set_bytes gauge
container_memory_working_set_bytes{container_env_foo_env="prod",container_label_foo_label="bar",id="testcontainer",image="test",name="testcontaineralias",zone_name="hello"} 9 1395066363000
# HELP container_memory_non_evictable_set_bytes Current non-evictable set in bytes.
# TYPE container_memory_non_evictable_set_bytes gauge
container_memory_non_evictable_set_bytes{container_env_foo_env="prod",container_label_foo_label="bar",id="testcontainer",image="test",name="testcontaineralias",zone_name="hello"} 7 1395066363000
# HELP container_network_advance_tcp_stats_total advance tcp connections statistic for container
# TYPE container_network_advance_tcp_stats_total gauge
container_network_advance_tcp_stats_total{container_env_foo_env="prod",container_label_foo_label="bar",id="testcontainer",image="test",name="testcontaineralias",tcp_state="activeopens",zone_name="hello"} 1.1038621e+07 1395066363000
3 changes: 3 additions & 0 deletions metrics/testdata/prometheus_metrics_whitelist_filtered
@@ -186,6 +186,9 @@ container_memory_usage_bytes{container_env_foo_env="prod",id="testcontainer",ima
# HELP container_memory_working_set_bytes Current working set in bytes.
# TYPE container_memory_working_set_bytes gauge
container_memory_working_set_bytes{container_env_foo_env="prod",id="testcontainer",image="test",name="testcontaineralias",zone_name="hello"} 9 1395066363000
# HELP container_memory_non_evictable_set_bytes Current non-evictable set in bytes.
# TYPE container_memory_non_evictable_set_bytes gauge
container_memory_non_evictable_set_bytes{container_env_foo_env="prod",id="testcontainer",image="test",name="testcontaineralias",zone_name="hello"} 7 1395066363000
# HELP container_network_advance_tcp_stats_total advance tcp connections statistic for container
# TYPE container_network_advance_tcp_stats_total gauge
container_network_advance_tcp_stats_total{container_env_foo_env="prod",id="testcontainer",image="test",name="testcontaineralias",tcp_state="activeopens",zone_name="hello"} 1.1038621e+07 1395066363000