This repo houses the initial scripts for building a deep learning app on BioData Catalyst powered by Seven Bridges. These scripts will be used to test training scalability, among other issues.
train.py
trains a model for image classification.
Arg | Description | Type | Default | Required |
---|---|---|---|---|
--data_dir | Path to directory containing images | string | YES | |
--data_csv | Path to CSV file pointing to images/labels | string | YES | |
--image_column | Column name for images | string | YES | |
--label_column | Column name for labels | string | YES | |
--arch | Model architecture | string | YES | |
--test_ratio | Percentage for testing data | float | 0.3 | |
--epochs | Number of training epochs | int | 15 | |
--classes | Number of classes. If not specified, classes will be inferred from labels | int | None | |
--batch_size | Training batch size | int | 8 | |
--output | Specify file name for output | string | 'model' | |
--auto_resize | Auto-resize to min height/width of image set | store_true | False | |
--auto_batch | Auto-detect max batch size. Selecting this will override any specified batch size | store_true | False | |
--index_first | Set images to depth as the first index (uncommon) | store_true | False |
Most built-in Keras applications are supported. For more information, see https://keras.io/api/applications/.
Arg | Model |
---|---|
densenet121 | DenseNet121 |
densenet169 | DenseNet169 |
densenet201 | DenseNet201 |
efficientnetb0 | EfficientNetB0 |
efficientnetb1 | EfficientNetB1 |
efficientnetb2 | EfficientNetB2 |
efficientnetb3 | EfficientNetB3 |
efficientnetb4 | EfficientNetB4 |
efficientnetb5 | EfficientNetB5 |
efficientnetb6 | EfficientNetB6 |
efficientnetb7 | EfficientNetB7 |
inceptionresnetv2 | InceptionResNetV2 |
inceptionv3 | InceptionV3 |
mobilenet | MobileNet |
mobilenetv2 | MobileNetV2 |
nasnetlarge | NASNetLarge |
nasnetmobile | NASNetMobile |
resnet101 | ResNet101 |
resnet101v2 | ResNet101V2 |
resnet152 | ResNet152 |
resnet152v2 | ResNet152V2 |
resnet50 | ResNet50 |
resnet50v2 | ResNet50V2 |
vgg16 | VGG16 |
vgg19 | VGG19 |
xception | Xception |
Using the auto_batch
feature will calculate the maximum batch size based on the memory allocated by TensorFlow when the model is loaded
if a GPU is detected. This will override any user defined --batch_size
. If no GPU is detected, it will revert to using --batch_size
,
which defaults to 8 if not defined.
get_sizes.py --data_dir /path/to/dir/ --data_csv /path/to/file.csv --image_column image_path_column_name
will create a CSV containing the image name, SimpleITK image shape, and Numpy array shape. It will also print this information to the console.