|
| 1 | +--- |
| 2 | +layout: page |
| 3 | +title: Utilizing GPUs with Singularity |
| 4 | +tagline: |
| 5 | +--- |
| 6 | + |
| 7 | +Utilizing GPGPUs on the Maverick supercomputer through containerized environments. |
| 8 | + |
| 9 | +## Choosing the Right System |
| 10 | + |
| 11 | +You can register your app to ANY system at TACC, but Maverick may not be the best choice if your app does not always need GPUs. |
| 12 | + |
| 13 | +| System | Cores/Node | Pros | Limitations | |
| 14 | +|:-----------|:-----------|:------------------------------------------|:-------------------------------------------| |
| 15 | +| Stampede | 16 | Thousands of nodes, Xeon Phi accelerators | Retiring around December 2017 | |
| 16 | +| Stampede 2 Phase1 | 68 | Thousands of nodes, KNL processors | Slow for serial code | |
| 17 | +| Stampede 2 Phase2 | 48 | Thousands of nodes, Skylake processors | Coming Soon, High Demand | |
| 18 | +| Lonestar 5 | 24 | Compute, GPUs, Large-mem | UT only, slow external network | |
| 19 | +| Wrangler | 24 | SSD Filesystem for fast I/O, Hosted Databases, Hadoop, HDFS | Low node-count | |
| 20 | +| Jetstream | 24 | Long running instances, root access | Limited storage | |
| 21 | +| Maverick | 20 | GPUs, high memory nodes | Deprecated software stack | |
| 22 | +| Chameleon | Variable | GPUs, bare metal VM, software defined networking | Difficult to configure | |
| 23 | +| Catapult | 16 | FPGAs | Windows-only | |
| 24 | + |
| 25 | +You can learn about all choices at the [TACC Systems Overview](https://www.tacc.utexas.edu/systems/overview). Detailed specifications can be found in the *User Guide* of each system. |
| 26 | + |
| 27 | +If you have an application already configured on a non-TACC system, you can register that system to the SD2E Agave tenant. |
| 28 | + |
| 29 | +- [System Registration Guide](https://sd2e.github.io/api-user-guide/docs/create_systems.html) |
| 30 | + |
| 31 | +After registration, you can not only run applications, but access data as well. Just remember that applications will run as YOUR user when you share them with others. |
| 32 | + |
| 33 | +## Containers @ TACC |
| 34 | + |
| 35 | +TACC supports containerized compute environments through [Singularity](http://singularity.lbl.gov/), which provides environment encapsulation without privilege escalation (root). Singularity provides the following functionality: |
| 36 | + |
| 37 | +- Environment encapsulation |
| 38 | +- Image based containers (single file) |
| 39 | +- Devices and interconnects are passed into container |
| 40 | + - Infiniband |
| 41 | + - GPGPUs |
| 42 | +- No abnormal privilege escalation allowed |
| 43 | +- No root daemons |
| 44 | +- Containers are read-only when not root |
| 45 | +- Pass in filesystems and directories your user has access to |
| 46 | + |
| 47 | +Since version 2.3, Singularity has supported the following two workflows: |
| 48 | + |
| 49 | +### Local Container Development |
| 50 | + |
| 51 | +Create a Singularity container from scratch. |
| 52 | + |
| 53 | +1. Create image of specific size |
| 54 | +2. (sudo) bootstrap image |
| 55 | + * (sudo) [add content through definition file](http://singularity.lbl.gov/archive/docs/v2-3/bootstrap-image) |
| 56 | + * (sudo) [manually install software](http://singularity.lbl.gov/archive/docs/v2-3/docs-changing-containers) |
| 57 | +3. Done |
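
On a machine where you have root, the steps above might look like the following. This is a minimal sketch assuming the Singularity 2.3 command line; the file names, the image size, the Ubuntu base image, and the `build_image` helper are illustrative choices and not part of the original guide.

```shell
# Hypothetical file names, chosen only for this sketch.
DEF_FILE=ubuntu-python.def
IMAGE=ubuntu-python.img

# Write a minimal definition file that starts from a Docker base image.
cat > "$DEF_FILE" <<'EOF'
Bootstrap: docker
From: ubuntu:16.04

%post
    apt-get update && apt-get install -y python

%runscript
    exec python "$@"
EOF

# Steps 1 and 2 from the list above. Bootstrapping needs sudo, so this
# function is only defined here and must be called explicitly.
build_image() {
    # 1. Create an empty image of a specific size (in MiB)
    singularity create --size 1024 "$IMAGE"
    # 2. Bootstrap the image from the definition file
    sudo singularity bootstrap "$IMAGE" "$DEF_FILE"
}
```

Because bootstrapping requires `sudo`, `build_image` is meant to be called on your own workstation; the finished image file can then be copied to the system where it will run.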
| 58 | + |
| 59 | +<http://singularity.lbl.gov/archive/docs/v2-3/bootstrap-image> |
| 60 | + |
| 61 | +### Docker Import |
| 62 | + |
| 63 | +Utilize your knowledge of Docker to create Singularity images. |
| 64 | + |
| 65 | +1. Pull the Docker image with Singularity |
| 66 | +2. Run the resulting Singularity image |
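
The two steps can be sketched as below. The `image_name` and `pull_and_run` helpers are hypothetical; `image_name` encodes the naming pattern seen in this guide's own examples (`docker://nvidia/caffe:latest` becomes `caffe-latest.img`), and the `latest` fallback for untagged references is an assumption. Other Singularity versions may name the pulled file differently.

```shell
# Guess the file name that `singularity pull` writes for a docker:// URI,
# following the pattern <repository basename>-<tag>.img used in this guide.
image_name() {
    ref="${1#docker://}"      # strip the URI scheme
    repo="${ref%%:*}"         # everything before the tag
    tag="${ref#*:}"           # everything after the tag
    if [ "$tag" = "$ref" ]; then
        tag=latest            # assumption: no tag given means "latest"
    fi
    echo "${repo##*/}-${tag}.img"
}

# 1. Pull the Docker image, 2. run a command inside the resulting image.
pull_and_run() {
    uri="$1"
    shift
    singularity pull "$uri"
    singularity exec "$(image_name "$uri")" "$@"
}
```

For example, `pull_and_run docker://nvidia/caffe:latest cat /etc/os-release` would pull the image and print the operating system release of the container.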
| 67 | + |
| 68 | +<http://singularity.lbl.gov/archive/docs/v2-3/docs-docker> |
| 69 | + |
| 70 | +### Running the container |
| 71 | + |
| 72 | +These containers run without root, so you simply use one of the following subcommands: |
| 73 | + |
| 74 | +- run - Run the default functionality of the container, which takes in arguments |
| 75 | +- exec - Execute a specific command inside the container, and then exit |
| 76 | +- shell - Enter the container and interactively run commands |
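
As a rough illustration, the three subcommands can be wrapped in small shell functions. The function names and the `SINGULARITY` and `IMAGE` variables are hypothetical conveniences; the default image name is borrowed from the GPU example in this guide.

```shell
# Both variables can be overridden from the environment.
SINGULARITY="${SINGULARITY:-singularity}"
IMAGE="${IMAGE:-caffe-latest.img}"

# run: invoke the container's default runscript, forwarding any arguments
run_default() { "$SINGULARITY" run "$IMAGE" "$@"; }

# exec: execute one specific command inside the container, then exit
run_command() { "$SINGULARITY" exec "$IMAGE" "$@"; }

# shell: enter the container and run commands interactively
open_shell() { "$SINGULARITY" shell "$IMAGE"; }
```

With these defined, `run_command cat /etc/os-release` prints the operating system release of the container, and `open_shell` drops you into an interactive prompt inside it.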
| 77 | + |
| 78 | +## GPU containers |
| 79 | + |
| 80 | +Since Singularity added support for Docker containers, it has been fairly simple to use GPUs with machine learning frameworks like [TensorFlow](https://www.tensorflow.org/). From Maverick, TACC's GPU system: |
| 81 | + |
| 82 | +``` |
| 83 | +# Work from a compute node |
| 84 | +idev -m 60 |
| 85 | +# Load the singularity module |
| 86 | +module load tacc-singularity |
| 87 | +# Pull your image |
| 88 | +singularity pull docker://nvidia/caffe:latest |
| 89 | +# Query GPU 0 from inside the container |
| 90 | +singularity exec --nv caffe-latest.img caffe device_query -gpu 0 |
| 91 | +``` |
| 92 | + |
| 93 | +Note that the `--nv` flag passes the host's GPU drivers into the container. If you leave it out, as in the following command, the GPU will not be detected. |
| 94 | + |
| 95 | +``` |
| 96 | +singularity exec caffe-latest.img caffe device_query -gpu 0 |
| 97 | +``` |
| 98 | + |
| 99 | +For TensorFlow, you can directly pull their latest GPU image and utilize it as follows. |
| 100 | + |
| 101 | +``` |
| 102 | +# Change to your $WORK directory |
| 103 | +cd $WORK |
| 104 | +# Get the software (cloned into $HOME, which is visible inside the container) |
| 105 | +git clone https://github.com/tensorflow/models.git $HOME/models |
| 106 | +# Pull the image |
| 107 | +singularity pull docker://tensorflow/tensorflow:latest-gpu |
| 108 | +# Run the code |
| 109 | +singularity exec --nv tensorflow-latest-gpu.img python $HOME/models/tutorials/image/mnist/convolutional.py |
| 110 | +``` |
| 111 | + |
| 112 | +You probably noticed that we cloned the models repository into your `$HOME` directory rather than `$WORK`. This is because your `$HOME` and `$WORK` directories are only available inside the container if the root folders `/home` and `/work` exist inside the container. In the case of `tensorflow-latest-gpu.img`, the `/work` directory does _not_ exist, so any files there are inaccessible to the container. |
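
One way to check this before running a job is sketched below. Both helpers are hypothetical and assume that the rule described above (the root folder must already exist in the image) is the only condition for a host path to be visible.

```shell
# Print the root folder of an absolute path, e.g. /work/01234/user -> /work
root_folder() {
    rest="${1#/}"             # drop the leading slash
    echo "/${rest%%/*}"       # keep only the first path component
}

# Succeed when the root folder of host path $2 exists inside image $1.
visible_in_container() {
    singularity exec "$1" test -d "$(root_folder "$2")"
}
```

For example, `visible_in_container tensorflow-latest-gpu.img "$WORK" || echo "WORK is not visible"` reports whether files under `$WORK` can be reached from that image.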
| 113 | + |
| 114 | +You may be wondering whether overlayfs could solve this. The Linux kernel on Maverick does not support overlayfs, so it had to be disabled in our Singularity installation. |
| 115 | + |
| 116 | +## Build your APP |
| 117 | + |
| 118 | +You can then use these methods in your [next Agave app](create_app.md). |