README.md
Lines changed: 23 additions & 8 deletions
@@ -1,16 +1,18 @@
1
1
# Deep Learning Framework Examples
2
2
3
-

3
+
<p align="center">
4
+
<img src="support/logo.png" alt="logo" width="50%"/>
5
+
</p>
4
6
5
-
*Note: We are finalising our multi-GPU (single-node) examples (DenseNet-121 + Data-augmentation + Logging + Data-loaders), which you can follow [here](https://github.com/ilkarman/DeepLearningFrameworks/tree/multi_gpu#2-training-time-densenet-121-on-chestxray---image-recognition-multi-gpu)*
7
+
*Note: We have recently added multi-GPU (single-node) examples on fine-tuning DenseNet-121 on Chest X-rays aka [CheXnet](https://stanfordmlgroup.github.io/projects/chexnet/). This is still work-in-progress and contributions are highly welcome!*
6
8
7
9
## Goal
8
10
9
11
1. Create a Rosetta Stone of deep-learning frameworks to allow data-scientists to easily leverage their expertise from one framework to another
10
-
2. Optimised GPU code with minimal verbosity (simple examples)
12
+
2. Optimised GPU code using the most up-to-date highest-level APIs.
11
13
3. Common setup for comparisons across GPUs (potentially CUDA versions and precision)
12
14
4. Common setup for comparisons across languages (Python, Julia, R)
13
-
5. Possibility to verify own installation
15
+
5. Possibility to verify expected performance of own installation
14
16
6. Collaboration between different open-source communities
15
17
16
18
The notebooks are executed on an Azure [Deep Learning Virtual Machine](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/microsoft-ads.dsvm-deep-learning) using both the K80 and the newer P100.
@@ -37,11 +39,25 @@ The notebooks are executed on an Azure [Deep Learning Virtual Machine](https://a
*Note: It is recommended to use higher level APIs where possible; see these notebooks for examples with [Tensorflow](support/Tensorflow_CNN_highAPI.ipynb), [MXNet](support/MXNet_CNN_highAPI.ipynb) and [CNTK](support/CNTK_CNN_highAPI.ipynb). They are not linked in the table to keep the common-structure-for-all approach*
41
44
42
45
Input for this model is the standard [CIFAR-10 dataset](http://www.cs.toronto.edu/~kriz/cifar.html) containing 50k training images and 10k test images, uniformly split across 10 classes. Each 32 by 32 image is supplied as a tensor of shape (3, 32, 32) with pixel intensity re-scaled from 0-255 to 0-1.
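The re-scaling described above can be sketched in a few lines of NumPy. This is only an illustration, not the loader used in the notebooks: the function name `rescale_cifar_batch` is hypothetical, and random integers stand in for the real CIFAR-10 images.

```python
import numpy as np


def rescale_cifar_batch(raw):
    """Rescale uint8 pixels (0-255) to float32 in [0, 1], keeping shape (N, 3, 32, 32)."""
    if raw.shape[1:] != (3, 32, 32):
        raise ValueError(f"expected shape (N, 3, 32, 32), got {raw.shape}")
    return raw.astype(np.float32) / 255.0


# Stand-in data (assumption): four random "images" instead of the real download.
rng = np.random.default_rng(0)
fake_batch = rng.integers(0, 256, size=(4, 3, 32, 32), dtype=np.uint8)
scaled = rescale_cifar_batch(fake_batch)
```

Casting to `float32` before dividing keeps the result in the precision most GPU frameworks expect by default.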
Input for this model is 112,120 PNGs of chest X-rays. **Note: for the notebook to download the data automatically, you must install [Azcopy](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-linux#download-and-install-azcopy) and increase the size of your OS disk in the Azure Portal so that you have at least 45GB of free space (the Chest X-ray data is large!). The notebooks may take more than 10 minutes to download the data the first time.** These notebooks train DenseNet-121 and use native data-loaders to pre-process the data and perform data augmentation. We want to rewrite the data-loaders to use OpenCV instead of PIL to reduce the IO bottleneck.
@@ -57,10 +73,9 @@ Input for this model is the standard [CIFAR-10 dataset](http://www.cs.toronto.ed
57
73
|[R - MXNet](notebooks/.ipynb)| ??? | ??? |
58
74
59
75
60
-
61
76
A pre-trained ResNet50 model is loaded and chopped just after the avg_pooling at the end (7, 7), which outputs a 2048-dimensional vector. This can be plugged into a softmax layer or another classifier such as a boosted tree to perform transfer learning. After allowing for a warm start, this forward-only pass to the avg_pool layer is timed. *Note: batch-size remains constant; however, filling the RAM on a GPU would produce further performance boosts (greater for GPUs with more RAM).*
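The "warm start, then time the forward-only pass" procedure above might look roughly like the sketch below. It uses only the standard library. `time_forward_pass` and `fake_forward` are hypothetical names, and `fake_forward` is a placeholder for the truncated ResNet50, not a real model.

```python
import time


def time_forward_pass(forward, batches, warmup=1):
    """Return (seconds, outputs) for `forward` over `batches`, excluding warm-up calls."""
    batches = list(batches)
    for batch in batches[:warmup]:
        forward(batch)  # warm start: runs once untimed so one-off setup cost is excluded
    start = time.perf_counter()
    outputs = [forward(batch) for batch in batches]
    elapsed = time.perf_counter() - start
    return elapsed, outputs


def fake_forward(batch):
    # Placeholder "feature extractor" (assumption): averages each input into one value.
    return [sum(image) / len(image) for image in batch]


batches = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]] * 3
elapsed, features = time_forward_pass(fake_forward, batches)
```

With a real GPU framework, kernels are usually launched asynchronously, so the outputs would need to be synchronised or copied back to the host before the clock is stopped. How that is done depends on the framework.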
62
77
63
-
### 3. Training Time(s): RNN (GRU) on IMDB - Sentiment Analysis
78
+
### 4. Training Time(s): RNN (GRU) on IMDB - Sentiment Analysis
@@ -142,4 +157,4 @@ The below offers some insights I gained after trying to match test-accuracy acro
142
157
143
158
1. There are multiple RNN implementations/kernels available for most frameworks (for example [Tensorflow](http://returnn.readthedocs.io/en/latest/tf_lstm_benchmark.html)); once reduced down to the cudnnLSTM/GRU level the execution is the fastest, however this implementation is less flexible (e.g. maybe you want layer normalisation) and may become problematic if inference is run on the CPU at a later stage. At the cuDNN level most of the frameworks' runtimes are very similar. [This](https://devblogs.nvidia.com/parallelforall/optimizing-recurrent-neural-networks-cudnn-5/) Nvidia blog-post goes through several interesting cuDNN optimisations for recurrent neural nets e.g. fusing - "combining the computation of many small matrices into that of larger ones and streaming the computation whenever possible, the ratio of computation to memory I/O can be increased, which results in better performance on GPU".
144
159
145
-
2. It seems that the fastest data-shape for RNNs is TNC - implementing this in [MXNet](support/MXNet_RNN_TNC.ipynb) only gave an improvement of 0.5s so I have chosen to use the slightly slower shape to remain consistent with other frameworks and to keep the code less complicated
160
+
2. It seems that the fastest data-shape for RNNs is TNC - implementing this in [MXNet](support/MXNet_RNN_TNC.ipynb) only gave an improvement of 0.5s so I have chosen to use the slightly slower shape to remain consistent with other frameworks and to keep the code less complicated
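The two points above (fusing small matrix products into larger ones, and the time-major TNC layout) can be illustrated with a small NumPy sketch. It is not how cuDNN or any of the frameworks implement their kernels. The sizes and variable names are arbitrary assumptions, and only the input projections of a GRU's three gates are shown, since those do not depend on the hidden state and can therefore be computed for all time steps at once.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, C, H = 2, 5, 4, 3  # batch, time steps, input features, hidden units (arbitrary)

# Batch-major input (N, T, C) converted to time-major (T, N, C), i.e. "TNC".
x_ntc = rng.standard_normal((N, T, C))
x_tnc = np.ascontiguousarray(x_ntc.transpose(1, 0, 2))

# One input-projection matrix per GRU gate (reset, update, candidate).
w_r, w_z, w_n = (rng.standard_normal((C, H)) for _ in range(3))

# Unfused: three small matrix products per time step.
separate = [(x_t @ w_r, x_t @ w_z, x_t @ w_n) for x_t in x_tnc]

# Fused: stack the gate weights into (C, 3H) and all time steps into (T*N, C),
# so the same numbers come out of a single larger matrix product.
w_fused = np.concatenate([w_r, w_z, w_n], axis=1)
fused = (x_tnc.reshape(T * N, C) @ w_fused).reshape(T, N, 3 * H)

# The reset-gate slice of the fused result matches the unfused version.
matches = all(np.allclose(fused[t][:, :H], separate[t][0]) for t in range(T))
```

In the time-major layout each time step `x_tnc[t]` is a contiguous `(N, C)` block, which is one plausible reason why stepping through time is cheaper in that shape.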