Resource optimisation: Updating node Selection values #2388
Conversation
📝 Walkthrough

This pull request standardizes Kubernetes configuration files across multiple services, updating node selection and resource settings in the staging values files.
Thanks @BenjaminSsempala.
I am about to approve and will wait for a review from @Psalmz777 before I perform merge duties.
Thanks again.
Hi @BenjaminSsempala / @Psalmz777, this PR has merge conflicts that need resolving. How did this happen? Anyone else making changes to these K8S files? Is it part of some automated updates of these files?
@Baalmart It seems I'd created this branch from an outdated staging; I've fixed the merge conflict. Thanks.
Actionable comments posted: 0
🧹 Nitpick comments (6)
k8s/reports/values-stage.yaml (1)
33-39: Fix indentation in affinity configuration.

The affinity configuration is functionally correct but has indentation issues. The `matchExpressions` and `values` sections should be properly aligned.

```diff
  - preference:
      matchExpressions:
-       - key: role
-         operator: In
-         values:
-         - control-plane
+         - key: role
+           operator: In
+           values:
+             - control-plane
```

🧰 Tools
🪛 yamllint (1.35.1)
[warning] 35-35: wrong indentation: expected 12 but found 10
(indentation)
[warning] 38-38: wrong indentation: expected 14 but found 12
(indentation)
k8s/platform/values-stage.yaml (2)
38-44: Fix indentation and consider documenting the scheduling strategy.

The affinity configuration uses a soft preference (weight: 1) for control-plane nodes, which provides scheduling flexibility. However, there are indentation inconsistencies that should be fixed.

Apply this diff to fix the indentation:

```diff
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
-       - preference:
-           matchExpressions:
-           - key: role
-             operator: In
-             values:
-             - control-plane
-         weight: 1
+     - preference:
+         matchExpressions:
+           - key: role
+             operator: In
+             values:
+               - control-plane
+       weight: 1
```

🧰 Tools
🪛 yamllint (1.35.1)
[warning] 38-38: wrong indentation: expected 6 but found 8
(indentation)
[warning] 40-40: wrong indentation: expected 14 but found 12
(indentation)
[warning] 43-43: wrong indentation: expected 16 but found 14
(indentation)
Line range hint 1-44: Document the resource optimization strategy.

This configuration is part of a broader standardization effort across services. To ensure maintainability and knowledge sharing:
- Update architecture documentation with the resource optimization strategy
- Include details about:
  - Node selection patterns
  - Resource allocation guidelines
  - Staging vs Production differences
- Consider creating a template or shared library for these common configurations (a sketch follows below)
Would you like me to help create a template or documentation structure for these configurations?
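As a starting point, such a shared snippet might look roughly like the following. This is purely a sketch: the key names mirror the patterns seen in this review, and the concrete figures (5m/50m CPU, 20Mi/150Mi memory) are examples taken from the staging files discussed here, not a prescribed schema.

```yaml
# Illustrative staging defaults that each service's values-stage.yaml could mirror,
# or that a shared template/library chart could expose.
resources:
  requests:
    cpu: 5m
    memory: 20Mi
  limits:
    cpu: 50m
    memory: 150Mi

nodeSelector:
  role: control-plane   # staging-only cost optimization; production values differ

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
            - key: role
              operator: In
              values:
                - control-plane
```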
🧰 Tools
🪛 yamllint (1.35.1)
[error] 32-32: trailing spaces
(trailing-spaces)
[warning] 38-38: wrong indentation: expected 6 but found 8
(indentation)
[warning] 40-40: wrong indentation: expected 14 but found 12
(indentation)
[warning] 43-43: wrong indentation: expected 16 but found 14
(indentation)
k8s/netmanager/values-stage.yaml (2)
16-17: Consider implications of control-plane node selection

While targeting control-plane nodes might be acceptable in staging for cost optimization, this approach requires careful consideration:
- Ensure control-plane nodes have sufficient capacity
- Consider adding tolerations for control-plane taints (see the sketch below)
- Document this as a staging-only configuration
Also applies to: 32-39
🧰 Tools
🪛 yamllint (1.35.1)
[error] 16-16: trailing spaces
(trailing-spaces)
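On the tolerations point above: if the staging control-plane nodes carry the default kubeadm taint, the pods will also need a matching toleration next to the node selector. A minimal sketch, assuming that default taint key (it may not apply to this cluster):

```yaml
# Hypothetical toleration for the default kubeadm control-plane taint.
# Only needed if the staging control-plane nodes are actually tainted.
tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
```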
16-16: Fix YAML formatting issues

Minor formatting improvements needed:
- Remove trailing spaces
- Fix indentation for matchExpressions
```diff
-nodeSelector: 
+nodeSelector:
   - key: role
-    - control-plane
+      - control-plane
```

Also applies to: 35-35, 38-38
🧰 Tools
🪛 yamllint (1.35.1)
[error] 16-16: trailing spaces
(trailing-spaces)
k8s/calibrate/values-stage.yaml (1)
21-43: Consistent configuration pattern across services

The changes align with other services, showing a coordinated approach to resource optimization:
- CPU limits standardized to 50m
- Control-plane node targeting
- Consistent affinity configuration
Consider documenting these standardization decisions in the service deployment guidelines.
🧰 Tools
🪛 yamllint (1.35.1)
[error] 31-31: trailing spaces
(trailing-spaces)
[warning] 39-39: wrong indentation: expected 12 but found 10
(indentation)
[warning] 42-42: wrong indentation: expected 14 but found 12
(indentation)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- k8s/calibrate/values-stage.yaml (1 hunks)
- k8s/docs/values-stage.yaml (2 hunks)
- k8s/inventory/values-stage.yaml (2 hunks)
- k8s/netmanager/values-stage.yaml (1 hunks)
- k8s/platform/values-stage.yaml (1 hunks)
- k8s/reports/values-stage.yaml (1 hunks)
🧰 Additional context used
🪛 yamllint (1.35.1)
k8s/platform/values-stage.yaml
[error] 32-32: trailing spaces
(trailing-spaces)
[warning] 38-38: wrong indentation: expected 6 but found 8
(indentation)
[warning] 40-40: wrong indentation: expected 14 but found 12
(indentation)
[warning] 43-43: wrong indentation: expected 16 but found 14
(indentation)
k8s/netmanager/values-stage.yaml
[error] 16-16: trailing spaces
(trailing-spaces)
[warning] 35-35: wrong indentation: expected 12 but found 10
(indentation)
[warning] 38-38: wrong indentation: expected 14 but found 12
(indentation)
k8s/calibrate/values-stage.yaml
[error] 31-31: trailing spaces
(trailing-spaces)
[warning] 39-39: wrong indentation: expected 12 but found 10
(indentation)
[warning] 42-42: wrong indentation: expected 14 but found 12
(indentation)
k8s/reports/values-stage.yaml
[error] 28-28: trailing spaces
(trailing-spaces)
[warning] 35-35: wrong indentation: expected 12 but found 10
(indentation)
[warning] 38-38: wrong indentation: expected 14 but found 12
(indentation)
k8s/inventory/values-stage.yaml
[error] 16-16: trailing spaces
(trailing-spaces)
[warning] 35-35: wrong indentation: expected 12 but found 10
(indentation)
[warning] 38-38: wrong indentation: expected 14 but found 12
(indentation)
k8s/docs/values-stage.yaml
[error] 31-31: trailing spaces
(trailing-spaces)
[warning] 39-39: wrong indentation: expected 12 but found 10
(indentation)
[warning] 42-42: wrong indentation: expected 14 but found 12
(indentation)
🔇 Additional comments (7)
k8s/reports/values-stage.yaml (1)
28-29: Fix trailing whitespace and verify node selection impact.

The nodeSelector configuration looks good, but we should verify the impact of restricting pods to control-plane nodes.
✅ Verification successful
Node selector configuration is properly aligned with staging environment patterns
The `nodeSelector: role: control-plane` configuration is consistently implemented across all staging services, including platform, netmanager, docs, calibrate, and inventory. This appears to be an intentional deployment strategy specific to the staging environment, while production services use a different node selection approach.

🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Check current node labels to ensure control-plane nodes exist
kubectl get nodes --show-labels | grep -i "role=control-plane"

# Verify current pod distribution
kubectl get pods -n staging -l app=airqo-reports -o wide
```

Length of output: 317
Script:
```bash
#!/bin/bash
# Check for similar nodeSelector configurations
rg -U "nodeSelector:(\n|\s)*role:\s*control-plane" --type yaml

# Look for other deployment configurations
fd "values.*\.yaml" -t f --exec head -n 50 {} \;
```

Length of output: 17810
🧰 Tools
🪛 yamllint (1.35.1)
[error] 28-28: trailing spaces
(trailing-spaces)
k8s/inventory/values-stage.yaml (1)
21-22: Verify memory reduction impact.

The memory request has been significantly reduced from 100Mi to 20Mi. While this aligns with resource optimization goals, we should verify that this allocation is sufficient for the inventory service's requirements.
✅ Verification successful
Memory allocation is appropriate
The reduction to 20Mi for memory requests aligns with the resource patterns observed across similar services in the cluster. The 150Mi limit provides adequate headroom for memory spikes, maintaining a 7.5x buffer.
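For reference, the figures discussed here correspond to a resources block roughly like the one below. This is only a sketch; the actual key layout depends on the inventory chart's values schema.

```yaml
# Sketch of the inventory staging memory settings cited in this review.
resources:
  requests:
    memory: 20Mi    # reduced from 100Mi
  limits:
    memory: 150Mi   # ~7.5x headroom over the request
```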
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Check historical memory usage
kubectl get pods -n staging -l app=airqo-inventory -o json | jq '.items[].spec.containers[].resources'

# Get current memory usage metrics
kubectl top pods -n staging -l app=airqo-inventory
```

Length of output: 353
Script:
```bash
#!/bin/bash
# Check memory configurations across k8s files
fd . k8s -e yaml -x cat {} | grep -A 2 -B 2 "memory:"

# Look for documentation about resource requirements
fd "README|DEPLOYMENT" -e md -x cat {}

# Find related configuration files
fd . k8s -e yaml -x echo "=== {} ===" \; cat {}
```

Length of output: 49131
k8s/platform/values-stage.yaml (2)
32-33: Document the rationale for scheduling on control-plane nodes.

While scheduling workloads on control-plane nodes might be cost-effective for staging, it could impact cluster management operations. Please:
- Document this as a deliberate architectural decision
- Ensure it aligns with your cluster's capacity planning
Let's verify the cluster's node setup:
✅ Verification successful
✓ Control-plane scheduling is consistent but needs documentation
The `role: control-plane` node selector is consistently applied across all staging services but absent in production, suggesting this is an intentional cost-optimization strategy. Please add a comment in the values files or create a brief architecture note explaining this staging-specific design decision.

🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Check for similar patterns and documentation

# Look for similar node selector patterns
rg -g '*.yaml' 'role:\s*control-plane'

# Check for architecture decision records or documentation
fd -g '*{ADR,adr,architecture}*' -X grep -A 3 -i "control-plane|scheduling"
```

Length of output: 435
Script:
```bash
#!/bin/bash
# Check production values and look for inline documentation
echo "=== Checking production values ==="
rg -g '*prod*.yaml' 'role:\s*control-plane'

echo -e "\n=== Checking for comments around node selector ==="
rg -g '*.yaml' -B 2 'role:\s*control-plane'
```

Length of output: 1267
🧰 Tools
🪛 yamllint (1.35.1)
[error] 32-32: trailing spaces
(trailing-spaces)
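Following up on the documentation suggestion in this comment, a brief inline note in the staging values could capture the rationale. A sketch (comment wording and placement are illustrative only):

```yaml
# Staging-only cost optimization: stage workloads are scheduled onto the
# control-plane nodes to keep the node count down. Production values
# intentionally do not set this selector.
nodeSelector:
  role: control-plane
```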
Line range hint 18-25: Consider adjusting resource requests to better match typical usage patterns.

The current CPU request (20m) is quite low compared to its limit (200m), which might lead to resource underutilization or throttling. Consider monitoring actual usage patterns and adjusting the requests accordingly.
Let's verify the actual resource usage in staging:
✅ Verification successful
Resource configuration is appropriately set for the staging environment
The current CPU request (20m) to limit (200m) ratio is consistent with other services in the staging environment and aligns with the overall resource management strategy. The memory configuration (150Mi/350Mi) represents an appropriate scaling down from production (250Mi/700Mi) for a staging workload.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Check if there are any related resource metrics or monitoring data

# Look for any prometheus queries or grafana dashboards
rg -g '!{node_modules,dist,build}' -A 3 'platform.*cpu|platform.*memory'

# Check for any documented resource benchmarks
fd -g '*README*' -X grep -A 3 -i "resource|performance|metrics"
```

Length of output: 140
Script:
```bash
#!/bin/bash
# Search for HPA configurations and resource settings
ast-grep --pattern 'kind: HorizontalPodAutoscaler'

# Find all values files to compare resource configurations
fd values -e yaml -x cat {}

# Look for deployment templates
fd -e yaml -x grep -l -A 5 'kind:\s*Deployment' {}

# Search for resource-related comments
rg -g '*.yaml' -B 2 '(cpu|memory):'
```

Length of output: 25690
🧰 Tools
🪛 yamllint (1.35.1)
[error] 32-32: trailing spaces
(trailing-spaces)
[warning] 38-38: wrong indentation: expected 6 but found 8
(indentation)
[warning] 40-40: wrong indentation: expected 14 but found 12
(indentation)
[warning] 43-43: wrong indentation: expected 16 but found 14
(indentation)
k8s/netmanager/values-stage.yaml (1)
20-25: Verify resource allocation changes

The significant reductions in resource allocations need validation:
- CPU request reduced by 75% (20m → 5m)
- CPU limit reduced by 50% (100m → 50m)
- Memory limit reduced by 60% (200Mi → 80Mi)
Please confirm these changes won't impact service stability under load.
✅ Verification successful
Resource allocations align with environment requirements
The reduced resource allocations in staging are consistent with the overall environment strategy (summarized in the sketch after this list):
- Identical to dev environment configurations
- Production maintains higher limits (100m CPU/150Mi memory) as expected for load handling
- Request-to-limit ratios are maintained appropriately across environments
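Laid out side by side, the figures cited in this comment translate into roughly the following. This is a comparison sketch only; the real values files may structure or name these settings differently.

```yaml
# Staging / dev figures cited above
resources:
  requests:
    cpu: 5m          # reduced from 20m (-75%)
  limits:
    cpu: 50m         # reduced from 100m (-50%)
    memory: 80Mi     # reduced from 200Mi (-60%)
---
# Production (unchanged per this review): higher limits for load handling
resources:
  limits:
    cpu: 100m
    memory: 150Mi
```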
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Check historical resource usage in staging
kubectl top pod -n staging -l app=airqo-stage-platform-frontend --containers \
  | awk 'NR>1 {print $2, $3}'  # CPU and memory usage
```

Length of output: 204
Script:
```bash
#!/bin/bash
# Find all values files for netmanager
fd values.*\.yaml$ k8s/netmanager

# Show content of these files focusing on resource configurations
for f in $(fd values.*\.yaml$ k8s/netmanager); do
  echo "=== $f ==="
  cat "$f" | grep -A6 "resources:"
done
```

Length of output: 1039
k8s/docs/values-stage.yaml (2)
31-43: Configuration consistent with netmanager service

The node selection strategy matches the pattern seen in the netmanager service.
🧰 Tools
🪛 yamllint (1.35.1)
[error] 31-31: trailing spaces
(trailing-spaces)
[warning] 39-39: wrong indentation: expected 12 but found 10
(indentation)
[warning] 42-42: wrong indentation: expected 14 but found 12
(indentation)
Line range hint 21-26: Validate CPU limit increase

The CPU limit has been increased by 5x (10m → 50m) while memory remains unchanged. Please confirm:
- The reason for this significant CPU increase
- Whether memory constraints are still appropriate
- Historical CPU utilization patterns that justify this change
✅ Verification successful
Resource configuration aligns with cluster patterns
The CPU and memory configurations are consistent with other services in the cluster:
- The 50m CPU limit is a common pattern
- The request-to-limit ratio falls within observed ranges
- Memory configuration (20Mi:100Mi) matches other services
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Check historical CPU usage for docs service
kubectl top pod -n staging -l app=airqo-stage-docs --containers \
  | awk 'NR>1 {print $2}'  # CPU usage
```

Length of output: 187
Script:
```bash
#!/bin/bash
# Check resource configurations across different environments
echo "=== Resource configurations across environments ==="
fd -e yaml -e yml values . --exec grep -A6 -B2 "limits:" {}

echo -e "\n=== Documentation about resource requirements ==="
fd -e md . --exec grep -l -i "resource" {} | xargs cat

echo -e "\n=== Recent changes to resource configurations ==="
git log -p --grep="resource" --grep="cpu" --grep="memory" -- "k8s/docs/*.yaml" | head -n 50
```

Length of output: 27871
Summary of Changes (What does this PR do?)
Status of maturity (all need to be checked before merging):
How should this be manually tested?
What are the relevant tickets?
Screenshots (optional)
Summary by CodeRabbit

- Node selection updated to target the control-plane role across multiple services