Commit f34457a

Merge pull request #18 from EntropyOrg/doc-gpu-setup

Add more GPU-specific documentation

2 parents: 6b1af9e + b1058f2

File tree: 5 files changed, +83 −2 lines

README.pod (+3)

(Generated file; diff not rendered by default.)

lib/AI/TensorFlow/Libtensorflow.pm (+3)

@@ -73,6 +73,9 @@ __END__
 The C<libtensorflow> library provides low-level C bindings
 for TensorFlow with a stable ABI.
 
+For more detailed information about this library including how to get started,
+see L<AI::TensorFlow::Libtensorflow::Manual>.
+
 =cut
 
 =begin :badges

lib/AI/TensorFlow/Libtensorflow/Manual.pod (+3)

@@ -8,6 +8,9 @@
 = L<AI::TensorFlow::Libtensorflow::Manual::Quickstart>
 Start here to get an overview of the library.
 
+= L<AI::TensorFlow::Libtensorflow::Manual::GPU>
+GPU-specific installation and usage information.
+
 = L<AI::TensorFlow::Libtensorflow::Manual::CAPI>
 Appendix of all C API functions with their signatures. These are linked from
 the documentation of individual methods.
lib/AI/TensorFlow/Libtensorflow/Manual/GPU.pod (new file, +48)

@@ -0,0 +1,48 @@
+# ABSTRACT: GPU-specific installation and usage information.
+# PODNAME: AI::TensorFlow::Libtensorflow::Manual::GPU
+=pod
+
+=head1 DESCRIPTION
+
+This guide provides information about using the GPU version of
+C<libtensorflow>. It is currently specific to NVIDIA GPUs, as
+they provide the CUDA API that C<libtensorflow> targets for GPU devices.
+
+=head1 INSTALLATION
+
+In order to use a GPU with C<libtensorflow>, you will need to check that the
+L<hardware requirements|https://www.tensorflow.org/install/pip#hardware_requirements> and
+L<software requirements|https://www.tensorflow.org/install/pip#software_requirements> are
+met. Please refer to the official documentation for the specific
+hardware capabilities and software versions.
+
+An alternative to installing all the software listed on the "bare metal" host
+machine is to use C<libtensorflow> via a Docker container and the
+NVIDIA Container Toolkit. See L<AI::TensorFlow::Libtensorflow::Manual::Quickstart/DOCKER IMAGES>
+for more information.
+
+=head1 RUNTIME
+
+When running C<libtensorflow>, your program will attempt to acquire quite a bit
+of GPU VRAM. You can check whether you have enough free VRAM by using the
+C<nvidia-smi> command, which displays resource information as well as which
+processes are currently using the GPU. If C<libtensorflow> is not able to
+allocate enough memory, it will crash with an out-of-memory (OOM) error. This
+is typical when running multiple programs that all use the GPU.
+
+If you have multiple GPUs, you can control which GPUs your program can access
+by using the
+L<C<CUDA_VISIBLE_DEVICES> environment variable|https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars>
+provided by the underlying CUDA library. This is typically
+done by setting the variable in a C<BEGIN> block before loading
+L<AI::TensorFlow::Libtensorflow>:
+
+    BEGIN {
+        # Set the specific GPU device that is available
+        # to this program to GPU index 0, which is the
+        # first GPU as listed in the output of `nvidia-smi`.
+        $ENV{CUDA_VISIBLE_DEVICES} = '0';
+        require AI::TensorFlow::Libtensorflow;
+    }
+
+=cut
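The VRAM check that the RUNTIME section describes can be sketched as a small shell snippet. The `--query-gpu`/`--format` flags are part of `nvidia-smi`'s standard query interface; the `command -v` guard is an addition here so the snippet degrades gracefully on hosts without an NVIDIA driver.

```shell
# Report per-GPU free and used VRAM before starting a libtensorflow program.
# Falls back to a message on hosts where nvidia-smi is not installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,name,memory.free,memory.used --format=csv
else
    echo "nvidia-smi not available on this host"
fi
```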
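As a complement to the C<BEGIN>-block approach, `CUDA_VISIBLE_DEVICES` can also be set in the shell that launches the program, since the CUDA runtime in the child process only sees the listed GPU indices. The Perl script name below is hypothetical; the last line merely demonstrates that a child process inherits the restricted value.

```shell
# Restrict a (hypothetical) program to the first GPU, or hide all GPUs:
#   CUDA_VISIBLE_DEVICES=0  perl my_gpu_program.pl   # GPU index 0 only
#   CUDA_VISIBLE_DEVICES="" perl my_gpu_program.pl   # no GPUs visible
# Show that the child process sees the restricted value:
CUDA_VISIBLE_DEVICES=0 sh -c 'echo "child sees CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"'
```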

lib/AI/TensorFlow/Libtensorflow/Manual/Quickstart.pod (+26 −2)

@@ -94,6 +94,8 @@ C<http://127.0.0.1:8888/> in order to connect to the Jupyter Notebook interface
 via the web browser. In the browser, click on the C<notebook> folder to access
 the notebooks.
 
+=head2 GPU Docker support
+
 If using the GPU Docker image for NVIDIA support, make sure that the
 L<TensorFlow Docker requirements|https://www.tensorflow.org/install/docker#tensorflow_docker_requirements>
 are met and that the correct flags are passed to C<docker run>, for example
@@ -102,8 +104,30 @@ C<<
   docker run --rm --gpus all [...]
 >>
 
-More information about NVIDIA Docker containers can be found in the user guide
-for the L<NVIDIA Container Toolkit|https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html>.
+More information about NVIDIA Docker containers can be found in the
+NVIDIA Container Toolkit
+L<Installation Guide|https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html>
+(specifically L<Setting up NVIDIA Container Toolkit|https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#setting-up-nvidia-container-toolkit>)
+and
+L<User Guide|https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html>.
+
+=head3 Diagnostics
+
+When using the Docker GPU image, you may come across the error
+
+C<<
+nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
+>>
+
+Make sure that you have installed the NVIDIA Container Toolkit correctly
+via the Installation Guide. Also make sure that you only have one Docker daemon
+installed. The recommended approach is to install via the official Docker
+releases at L<https://docs.docker.com/engine/install/>. Note that in some
+cases, you may have other unofficial Docker installations such as the
+C<docker.io> package or the Snap C<docker> package, which may conflict with
+the official vendor-provided NVIDIA Container Runtime.
+
+=head2 Docker Tags
 
 =begin :list
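The "only one Docker daemon" advice from the Diagnostics section can be checked with a short sketch. It assumes a Debian/Ubuntu-style host (`dpkg` for apt packages, `snap` for Snap packages) and is guarded so it exits cleanly on other systems; the package names checked are the ones the text mentions.

```shell
# List Docker installations that may conflict with the official Docker release
# and the NVIDIA Container Runtime. Guarded: harmless where dpkg/snap are absent.
for pkg in docker.io docker-ce; do
    if command -v dpkg >/dev/null 2>&1 && dpkg -s "$pkg" >/dev/null 2>&1; then
        echo "found package: $pkg"
    fi
done
if command -v snap >/dev/null 2>&1 && snap list docker >/dev/null 2>&1; then
    echo "found snap: docker"
fi
echo "check complete"
```

If more than one installation is reported, remove all but the official Docker release before reinstalling the NVIDIA Container Toolkit.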