AAW Dev: Review node selection logic #2003

Closed
Jose-Matsuda opened this issue Dec 11, 2024 · 4 comments

@Jose-Matsuda
Contributor

EPIC
Are there any pods on, say, the system nodepool that shouldn't be there? Are there any daemonsets deploying to nodes where they are not needed?

@Souheil-Yazji
Contributor

Souheil-Yazji commented Jan 8, 2025

@jacek-dudek please update the ticket with details on what you will be doing and update its status accordingly.

@jacek-dudek

jacek-dudek commented Feb 5, 2025

I'm starting with the list of daemonsets deployed on the cluster and gathering a description of what each one does.

k9s-daemonsets-output.ods
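
For anyone who wants to reproduce the inventory without k9s, here is a minimal sketch that summarizes the JSON printed by `kubectl get daemonsets --all-namespaces -o json`. The sample data and the `example-ds-*` names are made up for illustration and are not taken from the cluster; the function name `summarize_daemonsets` is likewise just a placeholder.

```python
import json


def summarize_daemonsets(ds_list: dict) -> list[dict]:
    """Return one row per DaemonSet: namespace, name, node selector, desired pods."""
    rows = []
    for item in ds_list.get("items", []):
        pod_spec = item["spec"]["template"]["spec"]
        rows.append({
            "namespace": item["metadata"]["namespace"],
            "name": item["metadata"]["name"],
            # No nodeSelector means no label-based restriction; affinity rules
            # and taints may still limit where the pods land.
            "node_selector": pod_spec.get("nodeSelector", {}),
            "desired": item.get("status", {}).get("desiredNumberScheduled", 0),
        })
    return rows


# Illustrative sample shaped like the kubectl output (values are made up).
SAMPLE = json.loads("""
{
  "items": [
    {
      "metadata": {"namespace": "kube-system", "name": "example-ds-a"},
      "spec": {"template": {"spec": {"nodeSelector": {"kubernetes.io/os": "linux"}}}},
      "status": {"desiredNumberScheduled": 12}
    },
    {
      "metadata": {"namespace": "kube-system", "name": "example-ds-b"},
      "spec": {"template": {"spec": {}}},
      "status": {"desiredNumberScheduled": 0}
    }
  ]
}
""")

if __name__ == "__main__":
    for row in summarize_daemonsets(SAMPLE):
        print(row["namespace"], row["name"], row["node_selector"], row["desired"])
```

A daemonset with an empty selector or a desired count of zero is a reasonable first candidate for review.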

@jacek-dudek

I found short descriptions of what each daemonset does to help us assess if they should be running on the cluster or a particular nodepool:

aad-pod-identity-nmi
NMI stands for Node Managed Identity.
Part of the solution that allows assigning Azure Active Directory identities to pods so they can access cloud resources and services.

csi-blob-node
Component of blob-csi-driver. This driver allows Kubernetes to access Azure Blob Storage.

fluentd-operator-fluentd-operator
Part of fluent operator logging tool, a universal solution for Kubernetes cluster logging.

azure-ip-masq-agent
The ip-masq-agent configures iptables rules to masquerade node and pod IP addresses when traffic is sent to destinations outside the cluster node's IP and the Cluster IP range. This essentially hides pod IP addresses behind the cluster node's IP address. In some environments, traffic to "external" addresses must come from a known machine address. For example, in Google Cloud, any traffic to the internet must come from a VM's IP; when containers are used, as in Google Kubernetes Engine, the Pod IP will be rejected for egress. To avoid this, the Pod IP is hidden behind the VM's own IP address, which is generally known as "masquerade".
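
To make the masquerade rule concrete, here is a conceptual sketch of the decision only. It is not the agent's code (the agent programs iptables rules rather than inspecting packets in Python), and the CIDR below is a hypothetical cluster range. The idea mirrors the agent's `nonMasqueradeCIDRs` setting: traffic to a destination inside one of those ranges keeps the pod IP, and everything else is rewritten to the node IP.

```python
import ipaddress


def is_masqueraded(destination: str, non_masquerade_cidrs: list[str]) -> bool:
    """Return True if traffic to `destination` would leave with the node's IP."""
    dest = ipaddress.ip_address(destination)
    # Destinations inside any non-masquerade range keep the pod IP as source.
    return not any(
        dest in ipaddress.ip_network(cidr) for cidr in non_masquerade_cidrs
    )


# Hypothetical cluster-internal range, for illustration only.
CLUSTER_RANGES = ["10.0.0.0/8"]

if __name__ == "__main__":
    print(is_masqueraded("10.1.2.3", CLUSTER_RANGES))      # in-cluster destination
    print(is_masqueraded("203.0.113.10", CLUSTER_RANGES))  # external destination
```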

azure-npm
Azure network policy manager. Implements Kubernetes network policies for communication between pods within a cluster and also between pods and the outside world.

cloud-node-manager
Part of Azure cloud controller manager, which is Microsoft Azure's implementation of the Kubernetes cloud provider interface.

cloud-node-manager-windows
Probably the same as cloud-node-manager but for nodes running Windows.

csi-azuredisk-node
Part of the Azure Disk CSI driver for Kubernetes. This driver allows Kubernetes to access Azure Disk volumes.

csi-azuredisk-node-win
Probably the same as csi-azuredisk-node but for nodes running Windows.

csi-azurefile-node
Part of the Azure File CSI driver for Kubernetes. This driver allows Kubernetes to access Azure file shares using the SMB and NFS protocols.

csi-azurefile-node-win
Probably the same as csi-azurefile-node but for nodes running Windows.

istio-cni-node
The Istio daemonset that implements the Kubernetes Container Network Interface (CNI). It is optional when the Istio mesh is implemented using sidecar containers but required when Istio is running in ambient mode.

kube-proxy
Part of core kubernetes.

nvidia-device-plugin
A daemonset that implements the Kubernetes device plugin framework.
It exposes the number of GPUs on each node of the cluster, keeps track of GPU health, and lets the cluster run GPU-enabled containers.

kube-prometheus-stack-prometheus-node-exporter
Prometheus Node Exporter provides hardware and OS-level system metrics exposed by the kernel.
It measures the following metrics:
  • Memory: RAM total, RAM used, RAM cache, RAM free
  • Disk: disk space, IOPS, mounts
  • CPU: CPU load, CPU Memory Disk
  • Network: network traffic, TCP flow, connections
Node Exporter exposes metrics on ‘/metrics’ sub-path on port 9100.
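
As a rough illustration of what that endpoint returns, here is a minimal sketch that parses a few lines in the Prometheus text exposition format. The sample values are made up, and the parser deliberately skips labelled series to stay short, so it is not a complete implementation of the format.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse unlabelled samples from Prometheus text format into a dict."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Labelled series (name{label="..."} value) are ignored for brevity.
        if "{" in line:
            continue
        parts = line.split()
        samples[parts[0]] = float(parts[1])
    return samples


# Made-up sample in the shape Node Exporter serves on /metrics.
SAMPLE_METRICS = """\
# HELP node_memory_MemTotal_bytes Memory information field MemTotal_bytes.
# TYPE node_memory_MemTotal_bytes gauge
node_memory_MemTotal_bytes 1.6e+10
# TYPE node_load1 gauge
node_load1 0.5
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6
"""

if __name__ == "__main__":
    for name, value in parse_metrics(SAMPLE_METRICS).items():
        print(name, value)
```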

sysctl
Not sure about this one.

@jacek-dudek

In a subsequent elab discussion, it was confirmed that these are the changes to make:

  • Eliminate the following daemonsets from the cluster, as we do not run any Windows-based machines: cloud-node-manager-windows, csi-azuredisk-node-win, csi-azurefile-node-win.
  • Eliminate csi-azurefile-node as well, as we don't use Azure file shares (csi-azurefile-node-win is already covered by the previous item).
  • Implement node selection logic for nvidia-device-plugin so that it runs only on nodepools whose machines have GPUs.
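
For the nvidia-device-plugin item, one possible approach is a `nodeSelector` on the daemonset's pod template. The sketch below builds such a merge patch and checks which nodes it would match. The label key `example.com/gpu` and the node label sets are hypothetical placeholders; the real label would be whatever our GPU nodepools actually carry.

```python
import json

# Hypothetical label assumed to exist only on GPU nodepools.
GPU_SELECTOR = {"example.com/gpu": "true"}


def build_node_selector_patch(selector: dict[str, str]) -> dict:
    """Build a merge patch that sets nodeSelector on a DaemonSet's pod template."""
    return {"spec": {"template": {"spec": {"nodeSelector": dict(selector)}}}}


def node_matches(node_labels: dict[str, str], selector: dict[str, str]) -> bool:
    """A node matches when it carries every key/value pair in the selector."""
    return all(node_labels.get(key) == value for key, value in selector.items())


if __name__ == "__main__":
    patch = build_node_selector_patch(GPU_SELECTOR)
    print(json.dumps(patch, indent=2))

    # Made-up label sets for two nodes, to show the effect of the selector.
    gpu_node = {"example.com/gpu": "true", "kubernetes.io/os": "linux"}
    system_node = {"kubernetes.io/os": "linux"}
    print(node_matches(gpu_node, GPU_SELECTOR))
    print(node_matches(system_node, GPU_SELECTOR))
```

The printed JSON could be passed to `kubectl patch daemonset nvidia-device-plugin --type merge -p '<json>'` in the daemonset's namespace, though in practice the change would more likely go through whatever manifest or chart deploys the plugin.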
