Commit 6bbc40e

Merge pull request intel#798 from eero-t/media-wa
Provide workaround for the media issue and document it
2 parents 1518fb1 + 599fc18 commit 6bbc40e

2 files changed: 163 additions, 0 deletions

cmd/gpu_plugin/README.md

Lines changed: 64 additions & 0 deletions
@@ -16,6 +16,9 @@ Table of Contents
* [Run the plugin as administrator](#run-the-plugin-as-administrator)
* [Verify plugin registration](#verify-plugin-registration)
* [Testing the plugin](#testing-the-plugin)
* [Issues with media workloads on multi-GPU setups](#issues-with-media-workloads-on-multi-gpu-setups)
* [Workaround for QSV and VA-API](#workaround-for-qsv-and-va-api)


## Introduction

@@ -242,3 +245,64 @@ We can test the plugin is working by deploying an OpenCL image and running `clin
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/1 nodes are available: 1 Insufficient gpu.intel.com/i915.
```


## Issues with media workloads on multi-GPU setups

Unlike 3D and compute APIs, and the OneVPL media API, the QSV (MediaSDK) and
VA-API media APIs do not offer device discovery functionality for
applications. There is also nothing (e.g. an environment variable) with
which the default device could be overridden.

As a result, most (all?) media applications using VA-API or QSV fail to
locate the correct GPU device file unless it is the first one
("renderD128"), or the device file name is explicitly specified with an
application option.

Kubernetes device plugins expose only the requested number of device
files, and their names match the host device file names (for several
reasons unrelated to media). Therefore, on multi-GPU hosts, the only
GPU device file mapped to the media container may be some other one
than "renderD128", and media applications using VA-API or QSV need to
be told explicitly which one to use.
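
To check which render nodes a container actually received, one can list the device directory. The following is a minimal sketch: the function name `list_render_nodes` and its optional directory argument are illustrative assumptions, not part of the plugin.

```sh
#!/bin/sh
# List the render device nodes found under a DRM device directory.
# The directory argument (default: /dev/dri) is a hypothetical
# convenience so the function can be tried outside a GPU container.
list_render_nodes ()
{
    dir=${1:-/dev/dri}
    for node in "$dir"/renderD*; do
        # an unmatched glob stays literal, so check that the entry exists
        [ -e "$node" ] && echo "${node##*/}"
    done
    return 0
}
```

Calling `list_render_nodes` inside the media container prints one name per mapped device. If "renderD128" is not among them, applications that assume the first device will not find a GPU.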

These options differ from application to application. Relevant FFmpeg
options are documented here:
* VA-API: https://trac.ffmpeg.org/wiki/Hardware/VAAPI
* QSV: https://github.com/Intel-Media-SDK/MediaSDK/wiki/FFmpeg-QSV-Multi-GPU-Selection-on-Linux
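
As an illustration for FFmpeg with VA-API, the sketch below builds (and only prints) an encode command that names the device through the `-vaapi_device` option described on the VA-API wiki page above. The helper name `vaapi_encode_cmd`, the file names, and the choice of the `h264_vaapi` encoder are assumptions made for the example; check the linked pages for the options that match your FFmpeg build.

```sh
#!/bin/sh
# Print (not run) an FFmpeg VA-API encode command for a given render device.
# Arguments: <device file> <input file> <output file>
vaapi_encode_cmd ()
{
    dev=$1
    in=$2
    out=$3
    # format=nv12,hwupload uploads software-decoded frames to the GPU
    echo "ffmpeg -vaapi_device $dev -i $in -vf format=nv12,hwupload -c:v h264_vaapi $out"
}
```

For example, `vaapi_encode_cmd /dev/dri/renderD129 input.mp4 output.mp4` prints a command line that targets the second host GPU instead of the default "renderD128".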


### Workaround for QSV and VA-API

The [render device](render-device.sh) shell script locates and outputs the
correct device file name. It can be added to the container and used
to pass the device file name to the application.

Use it either from another script invoking the application, or
directly from the Pod YAML command line. In the latter case, it can be
used either to append the device file name to the end of the given command
line, like this:

```bash
command: ["render-device.sh", "vainfo", "--display", "drm", "--device"]

=> /usr/bin/vainfo --display drm --device /dev/dri/renderDXXX
```

Or inline, like this:

```bash
command: ["/bin/sh", "-c",
          "vainfo --device $(render-device.sh 1) --display drm"
         ]
```

If the device file name is needed for multiple commands, one can use a shell variable:

```bash
command: ["/bin/sh", "-c",
          "dev=$(render-device.sh 1) && vainfo --device $dev && <more commands>"
         ]
```

With argument N, the script outputs the name of the Nth suitable GPU device
file, which can be used when more than one GPU resource was requested.
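
When a Pod requests several GPUs, a wrapper script can collect all device names by calling the script once per index. The following is a sketch under assumptions: `list_devices` and the `LOCATOR` variable are hypothetical names, with `LOCATOR` only there so the loop can be tried with a stand-in for `render-device.sh`.

```sh
#!/bin/sh
# Command used to look up a device; defaults to the script from this change.
LOCATOR=${LOCATOR:-render-device.sh}

# Print one device file name per line for indexes 1..COUNT.
# Stops with a non-zero status at the first index that cannot be resolved.
list_devices ()
{
    count=$1
    n=1
    while [ "$n" -le "$count" ]; do
        # the script exits non-zero (after printing usage) on failure
        dev=$("$LOCATOR" "$n") || return 1
        echo "$dev"
        n=$((n+1))
    done
}
```

Checking the exit status matters here because, on failure, the script writes its usage text to standard output, which would otherwise end up in the variable in place of a device name.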

cmd/gpu_plugin/render-device.sh

Lines changed: 99 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,99 @@
#!/bin/sh
#
# Copyright 2021 Intel Corporation.
#
# SPDX-License-Identifier: Apache-2.0
#
#
# Some GPU workloads are unable to find the (Intel) GPU provisioned for
# them by Kubernetes. This script checks and tells which device to use.
#
# For example, (all?) media applications using the VA-API or QSV media
# APIs [1] fail when /dev/dri/renderD128 is not present, or happens to be
# of a type not supported by the media driver.
#
# Happily, (all?) media applications have an option to specify a suitable
# render device name, which can be used with this script.
#
# [1] Compute, 3D, and OneVPL APIs do not suffer from this issue.
#
#
# Running the script requires only a few tools, which should be present in
# all distro base images. The required tools, and the packages that
# provide them in Debian-based distros, are:
# - dash: 'sh' (minimal Bourne shell)
# - coreutils: 'seq', 'cat', 'echo'
# - sed: 'sed'
#
# They are also provided by the 'busybox' and 'toybox' tool sets.


usage ()
{
    name=${0##*/}
    echo "Provides (Intel GPU) render device name application can use, either"
    echo "on standard output, or added to given command line. If device index"
    echo "N is given, provides name of Nth available (Intel GPU) render device."
    echo
    echo "Usage:"
    echo " $name <device index>"
    echo " $name [device index] <media program> [other options] <GPU selection option>"
    echo
    echo "Examples:"
    echo " \$ vainfo --display drm --device \$($name 1)"
    echo " \$ $name vainfo --display drm --device"
    echo " Running: vainfo --display drm --device /dev/dri/renderD140"
    echo
    echo "ERROR: $1!"
    exit 1
}

if [ $# -eq 0 ]; then
    usage "no arguments given"
fi

# determine required GPU index
NaN=$(echo "$1" | sed 's/[0-9]\+//')
if [ "$NaN" = "" ] && [ "$1" != "" ]; then
    required=$1
    if [ "$required" -lt 1 ] || [ "$required" -gt 127 ]; then
        usage "GPU index $required not in range 1-127"
    fi
    shift
else
    required=1
fi
visible=0

vendor=""
intel="0x8086"
# find host index "i" for Nth visible Intel GPU device
for i in $(seq 128 255); do
    if [ -w "/dev/dri/renderD$i" ]; then
        vendor=$(cat "/sys/class/drm/renderD$i/device/vendor")
        if [ "$vendor" = "$intel" ]; then
            visible=$((visible+1))
            if [ $visible -eq $required ]; then
                break
            fi
        fi
    fi
done

if [ $visible -ne $required ]; then
    usage "$visible Intel GPU(s) found, not $required as requested"
fi
device="/dev/dri/renderD$i"

if [ $# -eq 0 ]; then
    echo "$device"
    exit 0
fi

if [ $# -lt 2 ]; then
    usage "media program and/or GPU selection option missing"
fi

# run given media workload with GPU device name appended to end
echo "Running: $* $device"
exec "$@" "$device"
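
The selection loop above can be tried without GPU hardware by pointing the same logic at a fake directory tree. The following is a sketch, not part of the commit: the function name `find_intel_render_node` and its two directory arguments are hypothetical, added only so that `/dev/dri` and `/sys/class/drm` can be replaced during testing.

```sh
#!/bin/sh
# find_intel_render_node N [DEV_ROOT] [SYS_ROOT]
# Print the path of the Nth writable render node whose PCI vendor ID is
# Intel (0x8086), scanning renderD128..renderD255 like the script does.
# Returns non-zero when fewer than N such nodes exist.
find_intel_render_node ()
{
    required=$1
    dev_root=${2:-/dev/dri}
    sys_root=${3:-/sys/class/drm}
    visible=0
    i=128
    while [ "$i" -le 255 ]; do
        if [ -w "$dev_root/renderD$i" ] &&
           [ "$(cat "$sys_root/renderD$i/device/vendor" 2>/dev/null)" = "0x8086" ]; then
            visible=$((visible+1))
            if [ "$visible" -eq "$required" ]; then
                echo "$dev_root/renderD$i"
                return 0
            fi
        fi
        i=$((i+1))
    done
    return 1
}
```

With a temporary tree that holds a non-Intel `renderD128` and Intel `renderD129` and `renderD130` entries, index 1 resolves to `renderD129` and index 2 to `renderD130`, which mirrors the multi-GPU case the README describes.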
