
Commit 2349e13

Merge pull request ilkarman#81 from ThomasDelteil/gluon_multigpu: Gluon multigpu notebook

Author: Ilia Karmanov
Parents: cc9d390 + bdcc866

File tree: 4 files changed, +992 −5 lines

.gitignore — 3 additions, 1 deletion

@@ -5,4 +5,6 @@
 cifar-10-batches-py/
 __pycache__
 .DS_Store
-notebooks/chestxray
+notebooks/chestxray
+notebooks/*-0000.params
+notebooks/*-symbol.json
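The two new ignore patterns match the checkpoint files that MXNet/Gluon writes when a hybridized model is exported: a `<prefix>-symbol.json` network graph plus a `<prefix>-0000.params` weights file. A minimal sketch of what the patterns catch, using Python's stdlib `fnmatch` (the filenames below are hypothetical; note gitignore's `*` does not cross `/`, while `fnmatch`'s does, which makes no difference for these single-level patterns):

```python
from fnmatch import fnmatch

# The two patterns added to .gitignore in this commit
patterns = ["notebooks/*-0000.params", "notebooks/*-symbol.json"]

# Hypothetical files: Gluon's HybridBlock.export("gluon_cnn") would write
# gluon_cnn-symbol.json (graph) and gluon_cnn-0000.params (epoch-0 weights).
files = [
    "notebooks/gluon_cnn-symbol.json",   # ignored
    "notebooks/gluon_cnn-0000.params",   # ignored
    "notebooks/gluon_cnn-0005.params",   # NOT ignored (epoch != 0)
    "notebooks/Gluon_MultiGPU.ipynb",    # NOT ignored
]

ignored = [f for f in files if any(fnmatch(f, p) for p in patterns)]
print(ignored)
```

Only the epoch-0 checkpoint is ignored; exports saved at other epochs would still show up in `git status`.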

README.md — 4 additions, 4 deletions

@@ -1,5 +1,5 @@
 # Deep Learning Framework Examples
-
+
 <p align="center">
 <img src="support/logo.png" alt="logo" width="50%"/>
 </p>

@@ -59,7 +59,7 @@ This is a work in progress
 | [Keras(TF)](notebooks/Keras_TF_MultiGPU.ipynb) | 51min | 22min |
 | [Tensorflow](notebooks/Tensorflow_MultiGPU.ipynb) | 50min | 25min |
 | [Chainer](notebooks/Chainer_MultiGPU.ipynb) | 65min | ? |
-| [MXNet(Gluon)]() | ? | ? |
+| [MXNet(Gluon)](notebooks/Gluon_MultiGPU.ipynb) | TBA | TBA |
 
 **Train w/ synthetic-data**
 

@@ -69,7 +69,7 @@ This is a work in progress
 | [Keras(TF)](notebooks/Keras_TF_MultiGPU.ipynb) | 18min25s |
 | [Tensorflow](notebooks/Tensorflow_MultiGPU.ipynb) | 17min6s |
 | [Chainer]() | ? |
-| [MXNet(Gluon)]() | ? |
+| [MXNet(Gluon)](notebooks/Gluon_MultiGPU.ipynb) | TBA |
 
 
 Input for this model is 112,120 PNGs of chest X-rays resized to (264, 264). **Note for the notebook to automatically download the data you must install [Azcopy](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-linux#download-and-install-azcopy) and increase the size of your OS-Disk in Azure Portal so that you have at-least 45GB of free-space (the Chest X-ray data is large!). The notebooks may take more than 10 minutes to first download the data.** These notebooks train DenseNet-121 and use native data-loaders to pre-process the data and perform the following data-augmentation:

@@ -188,4 +188,4 @@ The below offers some insights we gained after trying to match test-accuracy acr
 
 1. There are multiple RNN implementations/kernels available for most frameworks (for example [Tensorflow](http://returnn.readthedocs.io/en/latest/tf_lstm_benchmark.html)); once reduced down to the cudnnLSTM/GRU level the execution is the fastest, however this implementation is less flexible (e.g. maybe you want layer normalisation) and may become problematic if inference is run on the CPU at a later stage. At the cudDNN level most of the frameworks' runtimes are very similar. [This](https://devblogs.nvidia.com/parallelforall/optimizing-recurrent-neural-networks-cudnn-5/) Nvidia blog-post goes through several interesting cuDNN optimisations for recurrent neural nets e.g. fusing - "combining the computation of many small matrices into that of larger ones and streaming the computation whenever possible, the ratio of computation to memory I/O can be increased, which results in better performance on GPU".
 
-2. It seems that the fastest data-shape for RNNs is TNC - implementing this in [MXNet](notebooks/MXNet_RNN_TNC.ipynb) only gave an improvement of 0.5s so I have chosen to use the sligthly slower shape to remain consistent with other frameworks and to keep the code less complicated
+2. It seems that the fastest data-shape for RNNs is TNC - implementing this in [MXNet](notebooks/MXNet_RNN_TNC.ipynb) only gave an improvement of 0.5s so I have chosen to use the sligthly slower shape to remain consistent with other frameworks and to keep the code less complicated
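The multi-GPU notebooks wired into the README tables above all follow the same data-parallel pattern: split each batch across devices, compute gradients per shard, then sum the gradients before the weight update (in Gluon this is typically done with `gluon.utils.split_and_load` plus a trainer step). A framework-free sketch of the arithmetic, using a hypothetical toy loss L(w) = Σᵢ (w − xᵢ)²:

```python
def grad_shard(w, shard):
    # dL/dw contribution from one device's shard: sum of 2*(w - x)
    return sum(2.0 * (w - x) for x in shard)

def data_parallel_step(w, batch, n_devices, lr=0.1):
    # 1. Split the batch into one shard per device
    size = len(batch) // n_devices
    shards = [batch[i * size:(i + 1) * size] for i in range(n_devices)]
    # 2. Each "device" computes its gradient; 3. all-reduce by summing
    total_grad = sum(grad_shard(w, s) for s in shards)
    # 4. Single SGD update with the aggregated gradient
    return w - lr * total_grad

batch = [1.0, 2.0, 3.0, 4.0]
w1 = data_parallel_step(0.0, batch, n_devices=2)
w2 = data_parallel_step(0.0, batch, n_devices=1)
print(w1, w2)  # identical: splitting across devices doesn't change the update
```

Because gradients of a sum decompose over the shards, the two-device update matches the single-device one exactly; this is why data-parallel training converges the same way regardless of GPU count (for a fixed global batch size).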
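The TNC remark in point 2 above refers to time-major layout — (Time, batch N, Channels) — versus the more common batch-major NTC; cuDNN's RNN kernels run fastest on time-major data, and converting between the two is just a swap of the first two axes. A minimal pure-Python sketch with toy numbers (no framework assumed):

```python
# NTC layout: batch of 2 sequences, 3 time steps, 1 channel each
ntc = [[[1], [2], [3]],
       [[4], [5], [6]]]

def ntc_to_tnc(x):
    # Swap the batch (N) and time (T) axes: out[t][n] = x[n][t]
    n_batch, n_time = len(x), len(x[0])
    return [[x[n][t] for n in range(n_batch)] for t in range(n_time)]

tnc = ntc_to_tnc(ntc)
print(tnc)  # [[[1], [4]], [[2], [5]], [[3], [6]]]
```

In a real framework this is a single transpose (e.g. swapping axes 0 and 1 of the input tensor); as the note says, the measured gain was only ~0.5s here, so the notebooks keep NTC for consistency across frameworks.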

0 commit comments