This repository was archived by the owner on Mar 29, 2022. It is now read-only.

Commit 2636a4a

Merge pull request #10 from SD2E/master
fast forward devel
2 parents bc53780 + 6a14c49 commit 2636a4a

3 files changed: +121 -1 lines changed

docs/authorization.md

+1 -1
@@ -38,7 +38,7 @@ expires at: Thu Sep 21 16:40:51 CDT 2017
 When your token expires, there is no need to generate a new token. It can be
 refreshed by using the command:
 ```
-% auth-tokens-refresh
+% auth-tokens-refresh -S

 Token for sd2e:wallen successfully refreshed and cached for 14400 seconds
 d6f5da11v69ad337f9gaf1926487f9ec4

docs/singularity_gpu_01.md

+118
@@ -0,0 +1,118 @@
---
layout: page
title: Utilizing GPUs with Singularity
tagline:
---

Utilizing GPGPUs on the Maverick supercomputer through containerized environments.

## Choosing the Right System

You can register your app to ANY system at TACC, but Maverick may not be the best choice if you don't always need GPUs.

| System | Cores/Node | Pros | Limitations |
|:-----------|:-----------|:------------------------------------------|:-------------------------------------------|
| Stampede | 16 | Thousands of nodes, Xeon Phi accelerators | Retiring ~Dec 2017 |
| Stampede 2 Phase1 | 68 | Thousands of nodes, KNL processors | Slow for serial code |
| Stampede 2 Phase2 | 48 | Thousands of nodes, Skylake processors | Coming Soon, High Demand |
| Lonestar 5 | 24 | Compute, GPUs, Large-mem | UT only, slow external network |
| Wrangler | 24 | SSD Filesystem for fast I/O, Hosted Databases, Hadoop, HDFS | Low node-count |
| Jetstream | 24 | Long running instances, root access | Limited storage |
| Maverick | 20 | GPUs, high memory nodes | Deprecated software stack |
| Chameleon | Variable | GPUs, bare metal VM, software defined networking | Difficult to configure |
| Catapult | 16 | FPGAs | Windows-only |

You can learn about all choices at the [TACC Systems Overview](https://www.tacc.utexas.edu/systems/overview). Detailed specifications can be found in the *User Guide* of each system.

If you have an application already configured on a non-TACC system, you can register that system to the SD2E Agave tenant.

- [System Registration Guide](https://sd2e.github.io/api-user-guide/docs/create_systems.html)

After registration, you can not only run applications but also access data. Just remember that applications will run as YOUR user when you share them with others.

## Containers @ TACC

TACC supports containerized compute environments through [Singularity](http://singularity.lbl.gov/), which provides environment encapsulation without privilege escalation (root). Singularity provides the following functionality:

- Environment encapsulation
- Image based containers (single file)
- Devices and interconnects are passed into container
  - Infiniband
  - GPGPUs
- No abnormal privilege escalation allowed
- No root daemons
- Containers are read-only when not root
- Pass in filesystems and directories your user has access to

Since version 2.3, Singularity has supported the following two workflows:

### Local Container Development

Create a Singularity container from scratch.

1. Create image of specific size
2. (sudo) bootstrap image
   * (sudo) [add content through definition file](http://singularity.lbl.gov/archive/docs/v2-3/bootstrap-image)
   * (sudo) [manually install software](http://singularity.lbl.gov/archive/docs/v2-3/docs-changing-containers)
3. Done

<http://singularity.lbl.gov/archive/docs/v2-3/bootstrap-image>
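
As a rough sketch of the steps above, the following writes a small definition file and then prints the Singularity 2.3 commands that would consume it. The file names `ubuntu.def` and `ubuntu.img`, the 2048 MB size, the Ubuntu base image, and the installed package are all assumptions chosen for illustration. The commands are only echoed because bootstrapping needs root, which you typically have on your own machine rather than on a TACC system.

```shell
# Hypothetical names and size, chosen only for illustration.
DEF_FILE="ubuntu.def"
IMAGE="ubuntu.img"
SIZE_MB=2048

# Write a minimal definition file that starts from a Docker base image.
cat > "$DEF_FILE" <<'EOF'
BootStrap: docker
From: ubuntu:16.04

%post
    apt-get update && apt-get -y install python

%runscript
    exec python "$@"
EOF

# Print the command for each step instead of running it,
# since the bootstrap steps require root.
echo "singularity create --size $SIZE_MB $IMAGE"
echo "sudo singularity bootstrap $IMAGE $DEF_FILE"
echo "sudo singularity shell --writable $IMAGE"
```

The last printed command corresponds to the "manually install software" option: it opens a writable shell so that packages can be added by hand after the bootstrap.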

### Docker Import

Utilize your knowledge of Docker to create Singularity images.

1. Pull docker image
2. Run docker image

<http://singularity.lbl.gov/archive/docs/v2-3/docs-docker>
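
When scripting the pull-then-run steps, it helps to know which file the pull leaves behind. The helper below is a sketch that guesses the file name from the Docker URI. The naming rule (`<repository>-<tag>.img`) is inferred only from the two image names used later in this page (`caffe-latest.img` and `tensorflow-latest-gpu.img`), and `image_name_for` is a hypothetical function name, not part of Singularity. Untagged URIs are rejected rather than guessed at.

```shell
# Guess the image file that `singularity pull docker://...` produces.
# Assumption: the file is named <last path component>-<tag>.img.
image_name_for() {
    local uri="${1#docker://}"      # drop the docker:// scheme
    case "$uri" in
        *:*) ;;                     # a tag is present, carry on
        *)
            echo "image_name_for: expected an explicit tag" >&2
            return 1
            ;;
    esac
    local repo="${uri%:*}"          # everything before the last colon
    local tag="${uri##*:}"          # everything after the last colon
    printf '%s-%s.img\n' "${repo##*/}" "$tag"
}

# Typical use (requires Singularity on the PATH):
#   singularity pull docker://nvidia/caffe:latest
#   singularity exec "$(image_name_for docker://nvidia/caffe:latest)" cat /etc/os-release
```

Check the directory listing after a pull before relying on the guessed name, since the rule here is an inference and not documented behavior.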

### Running the container

These containers run without root, so you simply use one of the following subcommands:

- `run` - Run the default functionality of the container, which takes in arguments
- `exec` - Execute a specific command inside the container, and then exit
- `shell` - Enter the container and interactively run commands
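
The three subcommands can be sketched as thin wrappers. The function names, the `SINGULARITY` and `IMAGE` variables, and the default image name are assumptions for illustration (the default borrows the `caffe-latest.img` name used later in this page). Setting `SINGULARITY=echo` previews each command line without running anything.

```shell
# Set SINGULARITY=echo to preview the command lines without running them.
SINGULARITY="${SINGULARITY:-singularity}"
# Hypothetical default image; replace with the file you pulled or built.
IMAGE="${IMAGE:-caffe-latest.img}"

# run: invoke the container's default runscript, passing arguments through
container_run()   { $SINGULARITY run "$IMAGE" "$@"; }

# exec: run one command inside the container, then exit
container_exec()  { $SINGULARITY exec "$IMAGE" "$@"; }

# shell: open an interactive shell inside the container
container_shell() { $SINGULARITY shell "$IMAGE"; }

# Example:
#   container_exec cat /etc/os-release
```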

## GPU containers

Because Singularity supports Docker containers, it is fairly simple to utilize GPUs for machine learning code like [TensorFlow](https://www.tensorflow.org/). From Maverick, which is TACC's GPU system:

```
# Work from a compute node
idev -m 60
# Load the singularity module
module load tacc-singularity
# Pull your image
singularity pull docker://nvidia/caffe:latest
# Query GPU 0 from inside the container
singularity exec --nv caffe-latest.img caffe device_query -gpu 0
```

Please note that the `--nv` flag specifically passes the GPU drivers into the container. If you leave it out, as in the following command, the GPU will not be detected.

```
# Without --nv, the GPU is not visible to the container
singularity exec caffe-latest.img caffe device_query -gpu 0
```

For TensorFlow, you can directly pull their latest GPU image and utilize it as follows.

```
# Change to your $WORK directory
cd $WORK
# Get the software
git clone https://github.com/tensorflow/models.git ~/models
# Pull the image
singularity pull docker://tensorflow/tensorflow:latest-gpu
# Run the code
singularity exec --nv tensorflow-latest-gpu.img python $HOME/models/tutorials/image/mnist/convolutional.py
```

You probably noticed that we cloned the models repository into your `$HOME` directory rather than `$WORK`. This is because your `$HOME` and `$WORK` directories are only available inside the container if the root folders `/home` and `/work` exist inside the container. In the case of `tensorflow-latest-gpu.img`, the `/work` directory does _not_ exist, so any files there are inaccessible to the container.
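
One way to check this before launching a job is to test whether the top-level folder of a path exists inside the image. The sketch below assumes Singularity is on the `PATH`. `root_dir_of` and `visible_in_container` are hypothetical helper names, and `/work/01234/user` in the comment is a made-up path.

```shell
# Print the top-level directory of an absolute path,
# e.g. /work/01234/user (a made-up path) gives /work.
root_dir_of() {
    local path="${1#/}"             # drop the leading slash
    printf '/%s\n' "${path%%/*}"    # keep only the first component
}

# Succeed if the root folder of the given path exists inside the image.
# This only checks for the folder; it does not verify your files are there.
visible_in_container() {
    local image="$1" path="$2"
    singularity exec "$image" test -d "$(root_dir_of "$path")"
}

# Example:
#   visible_in_container tensorflow-latest-gpu.img "$WORK" \
#       || echo "\$WORK is not visible inside this image"
```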

You may be thinking, "What about overlayfs?" The Linux kernel on Maverick does not support overlayfs, so it had to be disabled in our Singularity installation.

## Build your APP

You can then use these methods in your [next Agave app](create_app.md).

index.md

+2
@@ -53,6 +53,8 @@ the SD2E platform. Documentation for getting started with the SD2E API is below.

 &nbsp;&nbsp;&nbsp;&nbsp;4.3 Actor Based Containers (*coming soon*)

+&nbsp;&nbsp;&nbsp;&nbsp;4.4 [Singularity GPGPU Containers](docs/singularity_gpu_01.md)
+