Commit 4a52cf3

Merge pull request #1115 from eero-t/gpu-nfdhook-doc
Improve GPU nfdhook README
2 parents d981282 + 0b519ec commit 4a52cf3

1 file changed: +18 -8 lines changed

cmd/gpu_nfdhook/README.md

Lines changed: 18 additions & 8 deletions
@@ -1,10 +1,20 @@
 # Intel GPU NFD hook
 
+Table of Contents
+
+* [Introduction](#introduction)
+* [GPU memory](#gpu-memory)
+* [Default labels](#default-labels)
+* [PCI-groups (optional)](#pci-groups-optional)
+* [Capability labels (optional)](#capability-labels-optional)
+* [Limitations](#limitations)
+
+## Introduction
+
 This is the [Node Feature Discovery](https://github.com/kubernetes-sigs/node-feature-discovery)
-binary hook implementation for the Intel GPUs. The intel-gpu-initcontainer which
-is built among other images can be placed as part of the gpu-plugin deployment,
-so that it copies this hook to the host system only in those hosts, in which also
-gpu-plugin is deployed.
+binary hook implementation for the Intel GPUs. The intel-gpu-initcontainer (which
+is built with the other images) can be used as part of the gpu-plugin deployment
+to copy hook to the host systems on which gpu-plugin itself is deployed.
 
 When NFD worker runs this hook, it will add a number of labels to the nodes,
 which can be used for example to deploy services to nodes with specific GPU
@@ -15,7 +25,7 @@ In the NFD deployment, the hook requires `/host-sys` -folder to have the host `/
 
 ## GPU memory
 
-GPU memory amount is read from sysfs gt/gt* files and turned into a label.
+GPU memory amount is read from sysfs `gt/gt*` files and turned into a label.
 There are two supported environment variables named `GPU_MEMORY_OVERRIDE` and
 `GPU_MEMORY_RESERVED`. Both are supposed to hold numeric byte amounts. For systems with
 older kernel drivers or GPUs which do not support reading the GPU memory
@@ -65,14 +75,14 @@ If the value of the `pci-groups` label would not fit into the 63 character lengt
 Capability labels are created from information found inside debugfs, and therefore
 unfortunately require running the NFD worker as root. Due to coming from debugfs,
 which is not guaranteed to be stable, these are not guaranteed to be stable either.
-If you don't need these, simply do not run NFD worker as root, that is also more secure.
+If you do not need these, simply do not run NFD worker as root, that is also more secure.
 Depending on your kernel driver, running the NFD hook as root may introduce following labels:
 
 name | type | description|
 -----|------|------|
 |`gpu.intel.com/platform_gen`| string | GPU platform generation name, typically an integer. Deprecated.
-|`gpu.intel.com/media_version`| string | GPU platform Media pipeline generation name, typically a number.
-|`gpu.intel.com/graphics_version`| string | GPU platform graphics/compute pipeline generation name, typically a number.
+|`gpu.intel.com/media_version`| string | GPU platform Media pipeline generation name, typically a number. Deprecated.
+|`gpu.intel.com/graphics_version`| string | GPU platform graphics/compute pipeline generation name, typically a number. Deprecated.
 |`gpu.intel.com/platform_<PLATFORM_NAME>.count`| number | GPU count for the named platform.
 |`gpu.intel.com/platform_<PLATFORM_NAME>.tiles`| number | GPU tile count in the GPUs of the named platform.
 |`gpu.intel.com/platform_<PLATFORM_NAME>.present`| string | "true" for indicating the presense of the GPU platform.
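
Illustrative note (not part of the commit): the "GPU memory" hunk states that `GPU_MEMORY_OVERRIDE` and `GPU_MEMORY_RESERVED` are both supposed to hold numeric byte amounts. A minimal Python sketch of validating such values is shown below. The helper name `read_byte_amount` is hypothetical, and the hook itself is not implemented this way as far as this diff shows. How the hook combines the two values is outside the visible hunk and is not modelled here.

```python
import os


def read_byte_amount(name, environ=os.environ):
    """Return the variable's value as an int byte count.

    Returns None when the variable is unset, empty, or not a plain
    non-negative decimal number. (Hypothetical helper for illustration.)
    """
    raw = environ.get(name, "").strip()
    # Accept only ASCII decimal digits, so int() cannot raise here.
    if not (raw.isascii() and raw.isdigit()):
        return None
    return int(raw)


# Example with an explicit mapping instead of the real process environment.
env = {
    "GPU_MEMORY_OVERRIDE": "4000000000",
    "GPU_MEMORY_RESERVED": "not-a-number",
}
override = read_byte_amount("GPU_MEMORY_OVERRIDE", env)
reserved = read_byte_amount("GPU_MEMORY_RESERVED", env)
```

Passing the mapping explicitly keeps the check easy to test. Calling `read_byte_amount("GPU_MEMORY_OVERRIDE")` with no second argument reads the real environment instead.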
