This repository contains code to train and evaluate models on the RSNA PE dataset and the LIDC-IDRI dataset for our paper *Video Pretraining Advances 3D Deep Learning on Chest CT Tasks*.
The data processing steps require only a standard computer with enough RAM to support the in-memory operations.
For training and testing models, a computer with sufficient GPU memory is recommended.
All models have been trained and tested on a Linux system (Ubuntu 16.04).
All dependencies can be found in `environment.yml`.
- Please install Anaconda in order to create a Python environment.
- Clone this repo from the command line:
  `git clone [email protected]:rajpurkarlab/2021-fall-chest-ct.git`
- Create the environment:
  `conda env create -f environment.yml`
- Activate the environment:
  `source activate pe_models`
- Install PyTorch 1.7.1 with the right CUDA version.
Installation should take less than 10 minutes with stable internet.
Download the dataset from: RSNA PE Dataset
Make sure to update `PROJECT_DATA_DIR` in `pe_models/constants.py`
with the path to the directory that contains the RSNA dataset.
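For orientation, a minimal sketch of what the relevant part of `pe_models/constants.py` might look like; the actual file may define additional constants, and both the path and the derived filename below are placeholders:

```python
# Illustrative excerpt of pe_models/constants.py; the default path below
# is a placeholder that you should replace with your own data directory.
import os

# Directory that contains the downloaded RSNA PE dataset.
PROJECT_DATA_DIR = "/path/to/rsna-pe-data"

# Other paths in the codebase can then be derived from it, e.g.:
RSNA_TRAIN_CSV = os.path.join(PROJECT_DATA_DIR, "train.csv")  # hypothetical filename
```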
Please download the pre-processed label file that contains the data split and DICOM header information using this link and place it in the RSNA data directory.
Alternatively, you can create the pre-processed file by running:
$ python pe_models/preprocess/rsna.py
To ensure that the dataset is correct and that data are loading in the correct format, run the following unittest:
$ python -W ignore -m unittest
Note that this might take a couple of minutes to complete.
You can also visually inspect example inputs in `data/test/` after the unittest is complete.
Download the dataset from TCIA Public Access into a `PROJECT_DATA_DIR/lidc` folder.
Install pylidc and set up your `~/.pylidcrc` file using the official installation instructions.
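For reference, a minimal `~/.pylidcrc` (following the pylidc documentation) points pylidc at the folder containing the DICOM data; the path below is a placeholder:

```ini
[dicom]
path = /path/to/PROJECT_DATA_DIR/lidc
warn = True
```

See the official pylidc installation instructions for the authoritative format.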
You can then create all the necessary pre-processed files by running:
$ python pe_models/preprocess/lidc.py
You can then set the `type` in an experiment YAML to `lidc-window` or `lidc-2d` to train on the LIDC dataset.
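As a sketch, the relevant part of an experiment config might look like the following; only the `data.type` key is taken from this README, and any surrounding structure is illustrative (see `./configs/` for complete examples):

```yaml
# Illustrative excerpt of an experiment YAML.
data:
  type: lidc-window   # or lidc-2d
```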
To train a model, run the following:
python run.py --config <path_to_config_file> --train
For more documentation, please run:
python run.py --help
To test a model, use the `--test` flag, making sure that either the `--checkpoint` flag is specified or that the config YAML contains a checkpoint entry:
python run.py --config <path_to_config_file> --checkpoint <path_to_ckpt> --test
To featurize all studies in a dataset (to run a 1d model, for example), use the `--test_split all` flag.
Example configs can be found in `./configs/`.
Example hyperparameter sweep configs for each model can also be found in `./configs/`.
To launch a hyperparameter sweep with Weights & Biases:
wandb sweep <path_to_sweep_config>
wandb agent <sweep-id>
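For orientation, a W&B sweep config generally has the following shape; the metric name and parameter values below are placeholders, not taken from this repository's sweep configs:

```yaml
# Illustrative W&B sweep config; see ./configs/ for the real ones.
program: run.py
method: grid
metric:
  name: val/mean_auroc   # placeholder metric name
  goal: maximize
parameters:
  lr:
    values: [0.001, 0.0001]
```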
To train/test a model on custom datasets:
- Please ensure that your data adhere to the same format as the RSNA/LIDC dataset. (See Example)
- Create a dataloader similar to RSNA/LIDC in `./datasets` and update `./datasets/__init__.py` to include the name of your custom dataloader.
- Make sure the `data.type` in your config file points to the name of your dataloader.
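The custom-dataloader step can be sketched as follows; the class name and record format are hypothetical, and a real implementation would subclass `torch.utils.data.Dataset` and load CT volumes rather than placeholder tuples:

```python
class CustomCTDataset:
    """Hypothetical map-style dataset; in the real codebase this would
    subclass torch.utils.data.Dataset like the RSNA/LIDC loaders."""

    def __init__(self, records):
        # records: list of (volume, label) pairs, e.g. built from a CSV split
        self.records = records

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        volume, label = self.records[idx]
        return volume, label


# Usage sketch with placeholder data:
ds = CustomCTDataset([("vol_000", 0), ("vol_001", 1)])
```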
To run the train/test script on a simulated demo dataset, use:
python run.py --config ./data/demo/resnet18_demo.yaml --checkpoint <path_to_ckpt> --test
You should expect the following results:
{'test/mean_auprc': 0.9107142686843872,
'test/mean_auroc': 0.9166666865348816,
'test/negative_exam_for_pe_auprc': 0.9107142686843872,
'test/negative_exam_for_pe_auroc': 0.9166666865348816,
'test_loss': 0.6920164227485657,
'test_loss_epoch': 0.6920164227485657}
With a GPU, this should take less than 10 minutes to run.
If our work was useful in your research, please consider citing:
@inproceedings{ke2023video,
  title={Video Pretraining Advances 3D Deep Learning on Chest CT Tasks},
  author={Alexander Ke and Shih-Cheng Huang and Chloe P O'Connell and Michal Klimont and Serena Yeung and Pranav Rajpurkar},
  booktitle={Medical Imaging with Deep Learning},
  year={2023},
  eprint={2304.00546},
  archivePrefix={arXiv},
  primaryClass={eess.IV}
}