Commit 280a1f9

Merge pull request ilkarman#67 from ilkarman/ikdev2
Multi-GPU in-RAM
2 parents 9b60b88 + f777e24 commit 280a1f9

File tree

5 files changed: +435 −135 lines


README.md (+12 −6)

@@ -50,14 +50,20 @@ Input for this model is the standard [CIFAR-10 dataset](http://www.cs.toronto.ed
 
 **This is a work in progress**
 
-| DL Library | 1xP100/CUDA 9/CuDNN 7 | 2xP100/CUDA 9/CuDNN 7 | 4xP100/CUDA 9/CuDNN 7 |
-| ----------------------------------------------- | :------------------: | :-------------------: | :------------------: |
-| [Pytorch](notebooks/PyTorch_MultiGPU.ipynb) | 41min46s | 28min50s | 23min31s |
-| [Keras(TF)](notebooks/Keras_TF_MultiGPU.ipynb) | 51min27s | 32min1s | 23min3s |
-| [Tensorflow](notebooks/Tensorflow_MultiGPU.ipynb) | 62min8s | 44min13s | 33min |
+**CUDA 9/CuDNN 7.0**
 
+| DL Library | 1xP100 | 2xP100 | 4xP100 | **4xP100 Synthetic Data** |
+| ----------------------------------------------- | :------------------: | :-------------------: | :------------------: | :------------------: |
+| [Pytorch](notebooks/PyTorch_MultiGPU.ipynb) | 41min46s | 28min50s | 23min7s | 11min48s |
+| [Keras(TF)](notebooks/Keras_TF_MultiGPU.ipynb) | 51min27s | 32min1s | 22min49s | 18min30s |
+| [Tensorflow](notebooks/Tensorflow_MultiGPU.ipynb) | 62min8s | 44min13s | 31min4s | 17min10s |
+| [Chainer]() | ? | ? | ? | ? |
+| [MXNet]() | ? | ? | ? | ? |
 
-Input for this model is 112,120 PNGs of chest X-rays. **Note for the notebook to automatically download the data you must install [Azcopy](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-linux#download-and-install-azcopy) and increase the size of your OS-Disk in Azure Portal so that you have at-least 45GB of free-space (the Chest X-ray data is large!). The notebooks may take more than 10 minutes to first download the data.** These notebooks train DenseNet-121 and use native data-loaders to pre-process the data and perform data-augmentation. We want to rewrite the data-loaders to use OpenCV instead of PIL to reduce IO-bottlenecking.
+
+Input for this model is 112,120 PNGs of chest X-rays. **Note for the notebook to automatically download the data you must install [Azcopy](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-linux#download-and-install-azcopy) and increase the size of your OS-Disk in Azure Portal so that you have at-least 45GB of free-space (the Chest X-ray data is large!). The notebooks may take more than 10 minutes to first download the data.** These notebooks train DenseNet-121 and use native data-loaders to pre-process the data and perform data-augmentation.
+
+Comparing synthetic data to actual PNG files we can estimate the IO lag for **PyTorch (~11min), Keras(TF) (~4min), Tensorflow (~13min)!** We need to investigate this to establish the most performant data-loading pipeline and any **help is appreciated**. The current plan is to write functions in OpenCV (or perhaps use ChainerCV) and share between all frameworks.
 
 ### 3. Avg Time(s) for 1000 images: ResNet-50 - Feature Extraction
 
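The synthetic-data column above is measured by training on random in-RAM arrays shaped like the real inputs, so the gap between the two runs approximates the IO-plus-decoding cost. A minimal NumPy sketch of that setup (`BATCHSIZE`, `CLASSES` and `N_SAMPLES` are illustrative stand-ins for the notebook's values, which come from the training config and `train_dataset.n`):

```python
import numpy as np

# Illustrative sizes -- the notebooks derive these from the real dataset.
BATCHSIZE = 64
CLASSES = 14        # the chest X-ray task has 14 pathology labels
N_SAMPLES = 1000    # stand-in for train_dataset.n

# Drop the remainder so every step processes a full batch.
batches_per_epoch = N_SAMPLES // BATCHSIZE
tot_num = batches_per_epoch * BATCHSIZE

# Random tensors with the real input shape/dtype (CHW, 224x224, float32):
# feeding these to the training loop removes disk IO and PNG decoding entirely.
fake_X = np.random.rand(tot_num, 3, 224, 224).astype(np.float32)
fake_y = np.random.rand(tot_num, CLASSES).astype(np.float32)
```

Subtracting the synthetic-data time from the real-data time gives the quoted IO-lag estimates, e.g. 22min49s − 18min30s ≈ 4min for Keras(TF).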

notebooks/Keras_TF_MultiGPU.ipynb (+87 −20)

@@ -164,8 +164,8 @@
     "Please make sure to download\n",
     "https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-linux#download-and-install-azcopy\n",
     "Data already exists\n",
-    "CPU times: user 708 ms, sys: 228 ms, total: 936 ms\n",
-    "Wall time: 936 ms\n"
+    "CPU times: user 711 ms, sys: 209 ms, total: 920 ms\n",
+    "Wall time: 919 ms\n"
    ]
   }
  ],
@@ -364,8 +364,8 @@
    "name": "stdout",
    "output_type": "stream",
    "text": [
-    "CPU times: user 1min 26s, sys: 6.1 s, total: 1min 33s\n",
-    "Wall time: 1min 30s\n"
+    "CPU times: user 1min 22s, sys: 5.25 s, total: 1min 27s\n",
+    "Wall time: 1min 25s\n"
    ]
   }
  ],
@@ -390,8 +390,8 @@
    "name": "stdout",
    "output_type": "stream",
    "text": [
-    "CPU times: user 35.7 ms, sys: 3.59 ms, total: 39.3 ms\n",
-    "Wall time: 37.2 ms\n"
+    "CPU times: user 32.6 ms, sys: 3.71 ms, total: 36.4 ms\n",
+    "Wall time: 35.1 ms\n"
    ]
   }
  ],
@@ -411,24 +411,23 @@
    "output_type": "stream",
    "text": [
     "Epoch 1/5\n",
-    "342/342 [==============================] - 342s 999ms/step - loss: 0.1807 - val_loss: 0.1685\n",
+    "342/342 [==============================] - 334s 977ms/step - loss: 0.1810 - val_loss: 0.1636\n",
     "Epoch 2/5\n",
-    "342/342 [==============================] - 254s 742ms/step - loss: 0.1522 - val_loss: 0.1488\n",
+    "342/342 [==============================] - 249s 729ms/step - loss: 0.1514 - val_loss: 0.1432\n",
     "Epoch 3/5\n",
-    "342/342 [==============================] - 251s 733ms/step - loss: 0.1485 - val_loss: 0.1463\n",
+    "342/342 [==============================] - 250s 731ms/step - loss: 0.1481 - val_loss: 0.1457\n",
     "Epoch 4/5\n",
-    "342/342 [==============================] - ETA: 0s - loss: 0.145 - 245s 717ms/step - loss: 0.1458 - val_loss: 0.1481\n",
+    "342/342 [==============================] - 251s 734ms/step - loss: 0.1458 - val_loss: 0.1438\n",
     "Epoch 5/5\n",
-    "341/342 [============================>.] - ETA: 0s - loss: 0.1446Epoch 5/5\n",
-    "342/342 [==============================] - 252s 738ms/step - loss: 0.1447 - val_loss: 0.1387\n",
-    "CPU times: user 1h 6min 48s, sys: 23min 26s, total: 1h 30min 14s\n",
-    "Wall time: 23min 3s\n"
+    "342/342 [==============================] - 247s 721ms/step - loss: 0.1440 - val_loss: 0.1418\n",
+    "CPU times: user 1h 7min 8s, sys: 23min 4s, total: 1h 30min 12s\n",
+    "Wall time: 22min 49s\n"
    ]
   },
   {
    "data": {
     "text/plain": [
-     "<keras.callbacks.History at 0x7f319d282860>"
+     "<keras.callbacks.History at 0x7fdf6e7143c8>"
     ]
    },
    "execution_count": 20,
@@ -440,7 +439,7 @@
   "%%time\n",
   "# 1 GPU - Main training loop: 51min 27s\n",
   "# 2 GPU - Main training loop: 32min 1s\n",
-  "# 4 GPU - Main training loop: 23min 3s\n",
+  "# 4 GPU - Main training loop: 22min 49s\n",
   "model.fit_generator(train_dataset,\n",
   " epochs=EPOCHS,\n",
   " verbose=1,\n",
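The timing comments in that cell make the multi-GPU scaling easy to quantify; a quick back-of-envelope check (times copied from the comments above):

```python
# Wall-clock Keras(TF) training times from the cell's comments, in seconds.
times = {1: 51 * 60 + 27,   # 51min 27s
         2: 32 * 60 + 1,    # 32min 1s
         4: 22 * 60 + 49}   # 22min 49s

for gpus, t in sorted(times.items()):
    speedup = times[1] / t          # relative to the single-GPU run
    efficiency = speedup / gpus     # 1.0 would be perfect linear scaling
    print(f"{gpus} GPU(s): {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

The sub-linear numbers (roughly 1.6x on 2 GPUs, 2.3x on 4) are consistent with the IO bottleneck discussed in the README, since the data pipeline does not speed up with extra GPUs.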
@@ -481,8 +480,8 @@
    "name": "stdout",
    "output_type": "stream",
    "text": [
-    "CPU times: user 5min 40s, sys: 1min 48s, total: 7min 29s\n",
-    "Wall time: 2min 16s\n"
+    "CPU times: user 5min 35s, sys: 1min 44s, total: 7min 20s\n",
+    "Wall time: 2min 14s\n"
    ]
   }
  ],
@@ -502,14 +501,82 @@
    "name": "stdout",
    "output_type": "stream",
    "text": [
-    "Full AUC [0.8166704403329452, 0.8701978640353484, 0.8036644715384587, 0.8991700123597787, 0.8900824513691525, 0.9197848609229234, 0.7292038166231667, 0.8975747639269652, 0.6324781069481422, 0.8465972198647057, 0.7451801565774874, 0.8049089113120023, 0.7560819980737239, 0.8914456631015975]\n",
-    "Validation AUC: 0.8216\n"
+    "Full AUC [0.810400224263596, 0.8642047989855159, 0.801330086449206, 0.9072074321344181, 0.8906798540400607, 0.9213575843667169, 0.7088805005859234, 0.9128299199053916, 0.6267736564423316, 0.8542487673046052, 0.7531549949370517, 0.803228785418665, 0.7709379338811964, 0.8884575500057307]\n",
+    "Validation AUC: 0.8224\n"
    ]
   }
  ],
  "source": [
   "print(\"Validation AUC: {0:.4f}\".format(compute_roc_auc(test_dataset.classes, y_guess, CLASSES)))"
  ]
+ },
+ {
+  "cell_type": "code",
+  "execution_count": 25,
+  "metadata": {},
+  "outputs": [],
+  "source": [
+   "#####################################################################################################\n",
+   "## Synthetic Data (Pure Training)"
+  ]
+ },
+ {
+  "cell_type": "code",
+  "execution_count": 26,
+  "metadata": {},
+  "outputs": [],
+  "source": [
+   "# Test on fake-data -> no IO lag\n",
+   "batch_in_epoch = train_dataset.n//BATCHSIZE\n",
+   "tot_num = batch_in_epoch * BATCHSIZE\n",
+   "fake_X = np.random.rand(tot_num, 3, 224, 224).astype(np.float32)\n",
+   "fake_y = np.random.rand(tot_num, CLASSES).astype(np.float32)"
+  ]
+ },
+ {
+  "cell_type": "code",
+  "execution_count": 29,
+  "metadata": {},
+  "outputs": [
+   {
+    "name": "stdout",
+    "output_type": "stream",
+    "text": [
+     "Epoch 1/5\n",
+     "87296/87296 [==============================] - 224s 3ms/step - loss: 0.6933\n",
+     "Epoch 2/5\n",
+     "87296/87296 [==============================] - 222s 3ms/step - loss: 0.6932\n",
+     "Epoch 3/5\n",
+     "87296/87296 [==============================] - 222s 3ms/step - loss: 0.6930\n",
+     "Epoch 4/5\n",
+     "87296/87296 [==============================] - 222s 3ms/step - loss: 0.6924\n",
+     "Epoch 5/5\n",
+     "87296/87296 [==============================] - 221s 3ms/step - loss: 0.6911\n",
+     "CPU times: user 1h 5min 19s, sys: 16min 44s, total: 1h 22min 3s\n",
+     "Wall time: 18min 30s\n"
+    ]
+   },
+   {
+    "data": {
+     "text/plain": [
+      "<keras.callbacks.History at 0x7fdda382a5c0>"
+     ]
+    },
+    "execution_count": 29,
+    "metadata": {},
+    "output_type": "execute_result"
+   }
+  ],
+  "source": [
+   "%%time\n",
+   "# 4 GPU - Main training loop: 22min 49s\n",
+   "# 4 GPU - Synthetic data: 18min 30s\n",
+   "model.fit(fake_X,\n",
+   " fake_y,\n",
+   " batch_size=BATCHSIZE,\n",
+   " epochs=EPOCHS,\n",
+   " verbose=1)"
+  ]
  }
 ],
 "metadata": {
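As a side note, the "Validation AUC: 0.8224" line in the diff is the unweighted mean of the 14 per-class AUCs printed just before it; presumably `compute_roc_auc` scores each pathology column separately (e.g. with scikit-learn's `roc_auc_score`) and averages. Reproducing the reduction from the printed values:

```python
# Per-class validation AUCs copied from the notebook output above
# (one per chest X-ray pathology label).
full_auc = [0.810400224263596, 0.8642047989855159, 0.801330086449206,
            0.9072074321344181, 0.8906798540400607, 0.9213575843667169,
            0.7088805005859234, 0.9128299199053916, 0.6267736564423316,
            0.8542487673046052, 0.7531549949370517, 0.803228785418665,
            0.7709379338811964, 0.8884575500057307]

mean_auc = sum(full_auc) / len(full_auc)
print("Validation AUC: {0:.4f}".format(mean_auc))  # matches the notebook output
```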
