Consolidation does not happen even when a cheaper combination of instances is available #1962
Labels: kind/bug, needs-triage
Description
Observed Behavior:
For context, we wanted to leave the cost-effectiveness decision to Karpenter, so we provided a variety of instance families (c5a, c6a, m6a, m5a, c7a, r6a, r5a, r4) in large/xlarge/2xlarge sizes, thinking the different CPU:memory combinations would allow Karpenter to make the best usage-to-cost decisions on our behalf.
However, on the nodes Karpenter chose for us, memory utilization is pretty good at around 90%, while CPU usage is very low (around 50%).
For instance, we have many c5a.xlarge instances (in the same AZ) that use less than 50% CPU. Two of these could be consolidated into a cheaper m6a.xlarge, which has double the memory and the same CPU as a single c5a.xlarge. But the event on the node says:
Normal Unconsolidatable 4m36s (x47 over 15h) karpenter Can't replace with a cheaper node
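For reference (instance specs per AWS documentation; prices are approximate us-east-1 on-demand rates and vary by region): a c5a.xlarge provides 4 vCPU / 8 GiB at roughly $0.154/hr, while an m6a.xlarge provides 4 vCPU / 16 GiB at roughly $0.173/hr. So replacing two c5a.xlarge (~$0.308/hr combined) with one m6a.xlarge should save roughly 40% for the same total memory.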
Instance types with their CPU usage look like this: [screenshot of per-instance-type CPU usage omitted]
This ends up being more expensive than our original pre-provisioned node pool (which ran at around 60-65% utilization for both CPU and memory).
To alleviate this issue, we have removed certain instance types from the list in our NodePool configuration. However, we are curious to know whether this is the expected behavior, because if so, users still have to understand which specific subset of instance types fits the resource needs of their clusters before they can use Karpenter to minimise costs.
Expected Behavior:
We expect to see multi-node consolidation, i.e. deleting two or more nodes and replacing them with a single cheaper node.
For instance, we would expect two c5a.xlarge instances to be consolidated into one m6a.xlarge, as the CPU and memory would fit on that instance and it would cost less.
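(Our understanding is that Karpenter simulates scheduling using pod resource requests rather than observed utilization, so the combined requests of the pods on both nodes would need to fit on the replacement node for this to happen.)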
Reproduction Steps (Please include YAML):
NodePool config:
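(The original YAML attachment did not survive here; below is a minimal sketch of a NodePool matching the instance families and sizes described above, assuming the karpenter.sh/v1 API. The name, the EC2NodeClass reference, and the disruption settings are illustrative, not our actual configuration.)

```yaml
# Illustrative sketch only -- not the actual config from this issue.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default  # hypothetical name
spec:
  template:
    spec:
      requirements:
        # Instance families and sizes described in this issue
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["c5a", "c6a", "m6a", "m5a", "c7a", "r6a", "r5a", "r4"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["large", "xlarge", "2xlarge"]
      nodeClassRef:  # assumed EC2NodeClass name
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
```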
Versions:
Kubernetes Version (kubectl version): v1.30