| -shared-dev-num | int | 1 | Number of containers that can share the same GPU device |
| -allocation-policy | string | none | 3 possible values: balanced, packed, none. For shared-dev-num > 1: _balanced_ mode spreads workloads among GPU devices, _packed_ mode fills one GPU fully before moving to the next, and _none_ selects the first available device from kubelet. Default is _none_. Allocation policy has no effect when the resource manager is enabled. |
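
For example, to let five containers share each GPU and spread workloads evenly across devices, the plugin could be started with flags like these. This is a sketch of a DaemonSet excerpt; the container and image names are illustrative, only the flags come from the table above:

```yaml
# Excerpt of a GPU plugin DaemonSet spec (illustrative names):
containers:
  - name: intel-gpu-plugin            # illustrative container name
    image: intel/intel-gpu-plugin     # illustrative image reference
    args:
      - "-shared-dev-num=5"           # up to 5 containers per GPU
      - "-allocation-policy=balanced" # spread workloads across GPUs
```
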
The plugin also accepts a number of other arguments (common to all plugins) related to logging.

Please use the -h option to see the complete list of logging-related options.

## Operation modes for different workload types
The Intel GPU plugin supports a few different operation modes. Depending on the workloads the cluster is running, some modes make more sense than others. The table below explains the differences between the modes and suggests workload types for each. Mode selection applies to the whole GPU plugin deployment, so it is a cluster-wide decision.

| Mode | Sharing | Intended workloads | Suitable for time critical workloads |
|:---- |:-------- |:------- |:------- |
| shared-dev-num == 1 | No, 1 container per GPU | Workloads using all GPU capacity, e.g. AI training | Yes |
| shared-dev-num > 1 | Yes, >1 containers per GPU | (Batch) workloads using only part of GPU resources, e.g. inference, media transcode/analytics, or CPU bound GPU workloads | No |
| shared-dev-num > 1 && resource-management | Yes and no, >=1 containers per GPU | Any. For best results, all workloads should declare their expected GPU resource usage (memory, millicores). Requires [GAS](https://github.com/intel/platform-aware-scheduling/tree/master/gpu-aware-scheduling). See also [fractional use](#fractional-resources-details). | Yes. 1000 millicores = exclusive GPU usage. See note below. |

> **Note**: Exclusive GPU usage with >=1000 millicores requires that *all other GPU containers* also specify (non-zero) millicores resource usage.
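
As a sketch of the resource-management mode, a workload would declare its expected GPU usage via extended resources. The `gpu.intel.com/millicores` and `gpu.intel.com/memory.max` resource names below are assumptions based on the fractional-resources feature linked above; verify them against your deployment:

```yaml
# Illustrative pod requesting half a GPU's compute time and 1 GiB of its memory.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-inference                 # illustrative name
spec:
  containers:
    - name: inference
      image: my-inference:latest      # illustrative image
      resources:
        limits:
          gpu.intel.com/i915: 1
          gpu.intel.com/millicores: 500 # 1000 would request exclusive use
          gpu.intel.com/memory.max: 1G
```
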
## Installation
The following sections detail how to obtain, build, deploy and test the GPU device plugin.

Release tagged images of the components are also available on the Docker hub, tagged with their
release version numbers in the format `x.y.z`, corresponding to the branches and releases in this
repository. Thus the easiest way to deploy the plugin in your cluster is to run this command
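(a sketch, assuming the default `deployments/gpu_plugin` kustomization in this repository):

```bash
$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/gpu_plugin?ref=<RELEASE_VERSION>'
```
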
> **Note**: Replace `<RELEASE_VERSION>` with the desired [release tag](https://github.com/intel/intel-device-plugins-for-kubernetes/tags) or `main` to get `devel` images.

> **Note**: Add `--dry-run=client -o yaml` to the `kubectl` commands below to visualize the yaml content being applied.
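
For example, to preview the GPU plugin manifests without applying them (reusing the deployment command shown above):

```bash
$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/gpu_plugin?ref=<RELEASE_VERSION>' --dry-run=client -o yaml
```
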
See [the development guide](../../DEVEL.md) for details if you want to deploy a customized version of the plugin.

The default operator deployment depends on NFD and cert-manager. Those components have to be installed in the cluster before the operator can be deployed.

> **Note**: The operator can also be installed via Helm charts. See [INSTALL.md](../../INSTALL.md) for details.

### NFD
Install NFD (if it's not already installed) and node labelling rules (requires NFD v0.10+):
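
A sketch of that step, assuming the NFD kustomizations shipped in this repository (the `deployments/nfd` paths below are assumptions; check them against your checkout):

```bash
$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/nfd?ref=<RELEASE_VERSION>'
$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/nfd/overlays/node-feature-rules?ref=<RELEASE_VERSION>'
```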