A k8s device plugin for managing and allocating vGPU devices.
Supports multi-container and multi-GPU virtualized allocation with rich scheduling strategies.
The project is forked from gpu-manager and has undergone multiple improvements.
- Efficient scheduling performance
- Secure container resource isolation
- Simplified gRPC within containers
- Support CUDA 12.x version drivers
- Support CGroupv1 and CGroupv2
- Dual scheduling strategy for nodes and devices
- Provide GPU monitoring indicators
- Dynamically balance idle computing power across devices
- GPU devices use virtual memory after exceeding the memory limit
- Compatible with device hot-swapping and capacity expansion
- Compatible with Volcano Batch Scheduler
Note: A checked item indicates the feature is complete; an unchecked item indicates the feature is incomplete or planned.
- Kubernetes v1.23+ (other versions not tested)
- Docker/Containerd (other versions not tested)
- NVIDIA Container Toolkit (configure the nvidia container runtime)
- Compile the binaries

```shell
make build
```
Note: After compilation completes, three binary files are generated in the /bin directory.
- Build the docker image and push it

```shell
make docker-build docker-push IMG=<tag>
```
Precondition: nvidia-container-toolkit must be installed and the default container runtime correctly configured, as sketched below.
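With recent versions of the toolkit, the default runtime can typically be configured with nvidia-ctk (a sketch for Docker; consult the NVIDIA Container Toolkit documentation for containerd):

```shell
# Register the NVIDIA runtime in /etc/docker/daemon.json and set it as the default
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
sudo systemctl restart docker
```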
- Deploy

```shell
kubectl apply -f deploy/vgpu-manager-scheduler.yaml
kubectl apply -f deploy/vgpu-manager-deviceplugin.yaml
```
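The exact namespace and pod names come from the manifests; assuming the component names contain vgpu-manager, a quick check that both components are running might look like:

```shell
kubectl get pods -A | grep vgpu-manager
```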
- Uninstall

```shell
kubectl delete -f deploy/vgpu-manager-scheduler.yaml
kubectl delete -f deploy/vgpu-manager-deviceplugin.yaml
```
- Label nodes with `vgpu-manager-enable=enable`

```shell
kubectl label node <nodename> vgpu-manager-enable=enable
```
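To confirm the label took effect, list the nodes that carry it:

```shell
kubectl get nodes -l vgpu-manager-enable=enable
```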
Note that the kube-scheduler image version needs to be adjusted to match the cluster version:

```yaml
containers:
  - image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.28.15
    imagePullPolicy: IfNotPresent
    name: scheduler
```
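The cluster's server version can be checked with kubectl:

```shell
kubectl version
```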
Submit a vGPU container requesting 10% of computing power and 1 GiB of memory.
Note: A vGPU pod must specify the scheduler name and the number of vGPU devices requested by the container.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
  namespace: default
spec:
  schedulerName: vgpu-scheduler # Specify scheduler (default: vgpu-manager)
  terminationGracePeriodSeconds: 0
  containers:
    - name: default
      image: nvidia/cuda:12.4.1-devel-ubuntu20.04
      command: ["sleep", "9999999"]
      resources:
        requests:
          cpu: 1
          memory: 2Gi
        limits:
          cpu: 2
          memory: 4Gi
          nvidia.com/vgpu-number: 1 # Allocate one GPU
          nvidia.com/vgpu-core: 10 # Allocate 10% of computing power
          nvidia.com/vgpu-memory: 1024 # Allocate device memory (default unit: MiB)
```
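To create the pod and open a shell inside it (assuming the manifest above is saved as gpu-pod.yaml):

```shell
kubectl apply -f gpu-pod.yaml
kubectl exec -it gpu-pod -- bash
```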
Check that the container meets expectations:

```
root@gpu-pod1:/# nvidia-smi
[vGPU INFO(34|loader.c|1043)]: loaded nvml libraries
[vGPU INFO(34|loader.c|1171)]: loaded cuda libraries
Sun Dec 22 19:20:47 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01 Driver Version: 535.183.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce GTX 1050 Ti Off | 00000000:01:00.0 Off | N/A |
| N/A 42C P8 N/A / ERR! | 0MiB / 1024MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
```
Supports scheduling strategies in both the node and device dimensions:

- `binpack`: Choose the busiest node or device, improving resource utilization and reducing fragmentation.
- `spread`: Choose the most idle node or device, spreading tasks and isolating faults.

Add the annotations `nvidia.com/node-scheduler-policy` or `nvidia.com/device-scheduler-policy` to the vGPU pod:
```yaml
metadata:
  annotations:
    nvidia.com/node-scheduler-policy: spread
    nvidia.com/device-scheduler-policy: binpack
```
Supports using annotations to select the device types and UUIDs the pod may be scheduled on.

Add annotations to the vGPU pod to select or exclude device types: `nvidia.com/include-gpu-type`, `nvidia.com/exclude-gpu-type`.
Example: use A10 and exclude A100

```yaml
metadata:
  annotations:
    nvidia.com/include-gpu-type: "A10"
    nvidia.com/exclude-gpu-type: "A100"
```
Note: Separate multiple device types with commas, as in the sketch below.
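For example, allowing two device types (A30 here is just an illustrative second type):

```yaml
metadata:
  annotations:
    nvidia.com/include-gpu-type: "A10,A30" # comma-separated list
```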
Add annotations to the vGPU pod to select or exclude device UUIDs: `nvidia.com/include-gpu-uuid`, `nvidia.com/exclude-gpu-uuid`.
Example: select a GPU UUID

```yaml
metadata:
  annotations:
    nvidia.com/include-gpu-uuid: GPU-49aa2e6a-33f3-99dd-e08b-ea4beb0e0d28
```
Example: exclude a GPU UUID

```yaml
metadata:
  annotations:
    nvidia.com/exclude-gpu-uuid: GPU-49aa2e6a-33f3-99dd-e08b-ea4beb0e0d28
```
Note: Separate multiple UUIDs with commas.
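The UUIDs of the GPUs on a node can be listed with nvidia-smi (output shape shown for the example device above):

```shell
$ nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 1050 Ti (UUID: GPU-49aa2e6a-33f3-99dd-e08b-ea4beb0e0d28)
```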
Supports using an annotation on nodes or pods to configure the computing policy: `nvidia.com/vgpu-compute-policy`.

Supported policy values:

- `fixed`: Fixed GPU core limit, ensuring a task's core utilization does not exceed the limit (default).
- `balance`: Allow a task to exceed its core limit while the GPU still has spare resources, improving the GPU's overall core utilization.
- `none`: No core restriction; tasks compete for computing power on their own.
Note: If policies are configured on both the Node and the Pod, the Pod's configuration takes priority; otherwise, the Node's policy is used.
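For example, a node-level default can be set with kubectl annotate, while a pod-level override goes in the pod's metadata (a sketch using the values listed above):

```shell
# Node-level default policy, used when the pod does not set one
kubectl annotate node <nodename> nvidia.com/vgpu-compute-policy=balance
```

```yaml
# Pod-level policy, takes priority over the node annotation
metadata:
  annotations:
    nvidia.com/vgpu-compute-policy: fixed
```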
When the GPU's physical memory reaches its limit, virtual memory can be allocated to achieve memory overcommitment.
Add the environment variable `CUDA_MEM_OVERSOLD` to the container configuration to enable this feature.
Example pod:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example
  namespace: default
spec:
  schedulerName: vgpu-scheduler
  containers:
    - name: default
      image: quay.io/jitesoft/ubuntu:24.04
      command: ["sleep", "9999999"]
      env:
        - name: CUDA_MEM_OVERSOLD # Add environment variables to the container
          value: "true"
      resources:
        limits:
          cpu: 2
          memory: 2Gi
          nvidia.com/vgpu-number: 1
```
Note: The environment variable is only effective when defined in the container's env, as in the example above.