diff --git a/.gitattributes b/.gitattributes index 12ba24348..030ae10a4 100644 --- a/.gitattributes +++ b/.gitattributes @@ -15,3 +15,10 @@ model_zoo/ECG2AF/km.jpg filter=lfs diff=lfs merge=lfs -text model_zoo/ECG2AF/study_design.jpg filter=lfs diff=lfs merge=lfs -text model_zoo/ECG2AF/architecture.png filter=lfs diff=lfs merge=lfs -text model_zoo/ECG2AF/salience.jpg filter=lfs diff=lfs merge=lfs -text +model_zoo/cardiac_mri_derived_left_ventricular_mass/Lseg.png filter=lfs diff=lfs merge=lfs -text +model_zoo/cardiac_mri_derived_left_ventricular_mass/Lreg.png filter=lfs diff=lfs merge=lfs -text +model_zoo/left_ventricular_mass_from_ecg_student_and_mri_teacher/TrainingAndTestSets.jpg filter=lfs diff=lfs merge=lfs -text +model_zoo/liver_fat_from_mri_ukb/liver_fat_from_echo_teacher_model.png filter=lfs diff=lfs merge=lfs -text +model_zoo/liver_fat_from_mri_ukb/liver_fat_from_ideal_student_model.png filter=lfs diff=lfs merge=lfs -text +model_zoo/ECG_PheWAS/ukb_phewas.png filter=lfs diff=lfs merge=lfs -text +model_zoo/dropfuse/overview.png filter=lfs diff=lfs merge=lfs -text diff --git a/ml4h/tensorize/README.md b/ml4h/tensorize/README.md index f2d46d786..c13131e53 100755 --- a/ml4h/tensorize/README.md +++ b/ml4h/tensorize/README.md @@ -36,23 +36,39 @@ requirements = (here / 'docker/vm_boot_images/config/tensorflow-requirements.txt ... install_requires=requirements, ``` +* Run the application to submit the pipeline to Dataflow for remote execution when the command-line argument `--beam_runner` is set to `DataflowRunner`. Set it to `DirectRunner` for local execution. For example: -* **Note** that Google requires the `id` consist of only the -characters `[-a-z0-9]`, i.e. starting with a letter and ending with a letter or number. - -* Run the application to submit the pipeline to Dataflow to be executed remotely provided the -command line argument `--beam_runner` is set to `DataflowRunner`. Set it to `DirectRunner` for local execution. 
-For example: ``` python ml4h/tensorize/tensorize_dataflow.py \ - --id categorical-v2023-01-16 \ + --id example-id \ --tensor_type categorical \ - --bigquery_dataset ukbb_dev \ + --bigquery_dataset example_dataset \ --beam_runner DataflowRunner \ - --repo_root /Users/sam/Dropbox/Code/ml4h \ - --gcs_output_path tensors/continuous_v2023_01_17 + --repo_root /Users/johndoe/Dropbox/Code/ml4h \ + --gcs_output_path /path/to/Example_Folder ``` +* Parameters of tensorize_dataflow.py: + * id: The user-defined identifier for this pipeline run. **Note** that Google requires that the `id` consist only of the characters `[-a-z0-9]`, start with a letter, and end with a letter or number. + + * tensor_type: The type of data to be tensorized. Options are 'categorical', 'continuous', 'icd', 'disease', 'death', or 'phecode_disease'. + + * bigquery_dataset: The BigQuery dataset from which the data will be drawn. Defaults to 'ukbb_dev'. + + * beam_runner: The Apache Beam runner that will execute the pipeline. DataflowRunner is for remote execution. DirectRunner is for local execution. + + * repo_root: The root directory of the cloned ml4h repository. + + * gcp_project: The name of the Google Cloud Platform project. Defaults to "broad-ml4cvd". + + * gcp_region: The Google Cloud Platform region. Defaults to "us-central1". + + * gcs_output_path: The gs:// folder path, excluding the bucket name, to which tensors will be written (e.g. specifying /path/to/folder will write to gs:///path/to/folder). + + * logging_level: The logging level the command should be run with. Options are "DEBUG", "INFO", "WARNING", "ERROR", or "CRITICAL". Defaults to "INFO". + + + * The pipeline can be run multiple times to tensorize different types of fields. This will populate the per-sample tensors in the specified GCS buckets. In order to unify them, they can be downloaded via `gsutil` as shown below and merged using the `merge_hd5s.py` script.
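The `id` character rule for `tensorize_dataflow.py` can be checked locally before a job is submitted. This is a minimal sketch that encodes only the rule quoted in this README (it does not check any length limit Google may also impose); `is_valid_job_id` is a hypothetical helper, not part of `tensorize_dataflow.py`.

```python
import re

# Hypothetical helper encoding only the rule stated in this README:
# characters from [-a-z0-9], first character a letter,
# last character a letter or number.
_JOB_ID_PATTERN = re.compile(r"[a-z]([-a-z0-9]*[a-z0-9])?")


def is_valid_job_id(job_id: str) -> bool:
    """Return True if job_id satisfies the --id character rule."""
    return _JOB_ID_PATTERN.fullmatch(job_id) is not None


if __name__ == "__main__":
    for candidate in ["categorical-v2023-01-16", "Categorical", "run_01", "run-"]:
        print(candidate, is_valid_job_id(candidate))
```

Using `fullmatch` rather than `match` ensures that a trailing hyphen or an underscore anywhere in the string causes the check to fail.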
diff --git a/model_zoo/ECG_PheWAS/ukb_phewas.png b/model_zoo/ECG_PheWAS/ukb_phewas.png index 4c5cc00b5..776875dea 100644 Binary files a/model_zoo/ECG_PheWAS/ukb_phewas.png and b/model_zoo/ECG_PheWAS/ukb_phewas.png differ diff --git a/model_zoo/cardiac_mri_derived_left_ventricular_mass/Lreg.png b/model_zoo/cardiac_mri_derived_left_ventricular_mass/Lreg.png new file mode 100644 index 000000000..12432c0a6 --- /dev/null +++ b/model_zoo/cardiac_mri_derived_left_ventricular_mass/Lreg.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:660c74be8d5128d33d305bc6dc7df89d8ccb1966f0e5118c26dbffaa65b950dc +size 31427 diff --git a/model_zoo/cardiac_mri_derived_left_ventricular_mass/Lseg.png b/model_zoo/cardiac_mri_derived_left_ventricular_mass/Lseg.png new file mode 100644 index 000000000..662541b2f --- /dev/null +++ b/model_zoo/cardiac_mri_derived_left_ventricular_mass/Lseg.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c0e989cfebab9c505a8c7f76a3a92153583d63420cdc2a83e4722a4fc16ead02 +size 28767 diff --git a/model_zoo/cardiac_mri_derived_left_ventricular_mass/README.md b/model_zoo/cardiac_mri_derived_left_ventricular_mass/README.md new file mode 100644 index 000000000..50aa81aef --- /dev/null +++ b/model_zoo/cardiac_mri_derived_left_ventricular_mass/README.md @@ -0,0 +1,26 @@ +# Deep learning to estimate cardiac magnetic resonance–derived left ventricular mass +This folder contains models and code supporting the work described in this paper published in the Cardiovascular Digital Health Journal. + +Within participants of the UK Biobank prospective cohort undergoing CMR, 2 convolutional neural networks were trained to estimate LV mass. The first (ML4Hreg) performed regression informed by manually labeled LV mass (available in 5065 individuals), while the second (ML4Hseg) performed LV segmentation informed by InlineVF (version D13A) contours. 
All models were optimized using the Adam variant of stochastic gradient descent with an initial learning rate of 1 × 10⁻³, exponential learning rate decay, and a batch size of 4 on K80 graphics processing units. +# ML4Hreg +The first model, ML4Hreg, is a 3D convolutional neural network regressor trained on the manually annotated LV mass estimates provided by Petersen and colleagues to optimize the log cosh loss function, which behaves like L2 loss for small values and L1 loss for larger values: + +![Loss of ML4Hreg](Lreg.png) + +Here the batch size, N, was 4 random samples from the training set of 3,178, which remained after excluding testing and validation samples from the 5,065 CMR images with LV mass values included in P. +# ML4Hseg +ML4Hseg is a 3D semantic +segmenter. To facilitate model development in the absence of hand-labeled segmentations, the models were trained with the InlineVF contours to minimize Lseg, the per-pixel cross-entropy between the label and the model’s prediction: + +![Loss of ML4Hseg](Lseg.png) + +Here the batch size, N, was 4 from the total set of 33,071. Height, H, and width, W, are 256 voxels, and there was a maximum of 13 Z slices along the short axis. There is a channel for each of the 3 labels; the labels are one-hot encoded in the InlineVF (IVF) training data, and the predictions are probabilistic values from the softmax layer of ML4Hseg. Segmentation architectures used U-Net-style long-range connections between early convolutional layers and deeper layers. Since not all CMR images used the same pixel dimensions, models were built to incorporate pixel size values with their fully connected layers before making predictions. +# Results +The accuracy of both deep learning approaches was compared with that of LV mass obtained using InlineVF within an independent holdout set, using manually labeled LV mass as the gold standard.
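The two training objectives described for ML4Hreg (log cosh) and ML4Hseg (per-pixel cross-entropy) can be illustrated with a minimal NumPy sketch. The exact formulas are in `Lreg.png` and `Lseg.png`; this sketch assumes a plain mean over the batch (and over voxels for segmentation) and a channel-last `(N, H, W, Z, C)` layout, neither of which is confirmed by this README. It is not the ml4h training code.

```python
import numpy as np


def log_cosh_loss(y_true, y_pred):
    """Mean log(cosh(error)) over a batch (ML4Hreg-style regression loss).

    Behaves like error**2 / 2 for small errors and like |error| - log(2)
    for large ones, matching the L2/L1 behaviour described above.
    """
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    a = np.abs(err)
    # log(cosh(x)) = |x| + log1p(exp(-2|x|)) - log(2); avoids cosh overflow.
    return float(np.mean(a + np.log1p(np.exp(-2.0 * a)) - np.log(2.0)))


def per_pixel_cross_entropy(labels, probs, eps=1e-7):
    """Mean categorical cross-entropy over every voxel (ML4Hseg-style loss).

    labels: one-hot array, assumed shape (N, H, W, Z, C), e.g. C = 3 labels.
    probs:  softmax outputs with the same shape.
    """
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    # Sum over the channel axis per voxel, then average over batch and voxels.
    return float(np.mean(-np.sum(labels * np.log(probs), axis=-1)))


if __name__ == "__main__":
    print(log_cosh_loss([120.0, 95.0], [118.0, 101.0]))
```

Clipping the probabilities keeps `np.log` finite when the softmax output for the true class is exactly zero.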
+![Overview of left ventricular (LV) mass algorithms.](https://ars.els-cdn.com/content/image/1-s2.0-S2666693621000232-gr1.jpg) + +Within 33,071 individuals who underwent CMR, models were trained to derive CMR-based LV mass using deep learning regression (ML4Hreg) and segmentation (ML4Hseg). +![Distributions of cardiac magnetic resonance (CMR)-derived left ventricular (LV) mass obtained using each estimation method.](https://ars.els-cdn.com/content/image/1-s2.0-S2666693621000232-gr2.jpg) + +In an independent holdout set of 891 individuals with manually labeled LV mass estimates available, ML4Hseg had favorable correlation with manually labeled LV mass (r = 0.864, 95% confidence interval 0.847–0.880; MAE 10.41 g, 95% CI 9.82–10.99) compared with ML4Hreg (r = 0.843, 95% confidence interval 0.823–0.861; MAE 10.51 g, 95% CI 9.86–11.15, P = .01) and centered InlineVF (r = 0.795, 95% confidence interval 0.770–0.818; MAE 14.30 g, 95% CI 13.46–11.01, P < .01). +![Correlation between manually labeled left ventricular (LV) mass and derived left ventricular mass estimated using each model.
](https://ars.els-cdn.com/content/image/1-s2.0-S2666693621000232-gr3.jpg) \ No newline at end of file diff --git a/model_zoo/cardiac_mri_derived_left_ventricular_mass/architecture_graph_sax_diastole_segment_no_flat.png b/model_zoo/cardiac_mri_derived_left_ventricular_mass/architecture_graph_sax_diastole_segment_no_flat.png deleted file mode 100644 index d70ef0ec3..000000000 --- a/model_zoo/cardiac_mri_derived_left_ventricular_mass/architecture_graph_sax_diastole_segment_no_flat.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:14cffdb43d6565ddb9931ef49382267fa28d7a8af8bcc44abc0ef8615e1bee51 -size 692208 diff --git a/model_zoo/dropfuse/overview.png b/model_zoo/dropfuse/overview.png index 592d3fcc7..cdfbc0ae0 100644 Binary files a/model_zoo/dropfuse/overview.png and b/model_zoo/dropfuse/overview.png differ diff --git a/model_zoo/left_ventricular_mass_from_ecg_student_and_mri_teacher/README.md b/model_zoo/left_ventricular_mass_from_ecg_student_and_mri_teacher/README.md index 2169a8a67..deb9c0c3c 100644 --- a/model_zoo/left_ventricular_mass_from_ecg_student_and_mri_teacher/README.md +++ b/model_zoo/left_ventricular_mass_from_ecg_student_and_mri_teacher/README.md @@ -1,4 +1,16 @@ # Deep Learning to Predict Cardiac Magnetic Resonance-Derived Left Ventricular Mass and Hypertrophy from 12-Lead Electrocardiograms -Three pre-trained models are included here. The model `ecg_rest_raw_age_sex_bmi_lvm_asymmetric_loss.h5` takes as input a 12 Lead resting ECG, as well as age, sex and BMI and has two outputs: one which regresses the left ventricular mass, and a second which gives a probability of left ventricular hypertrophy. -This model was trained with the asymmetric loss described in the paper. The model `ecg_rest_raw_lvm_asymmetric_loss.h5` takes only an ECG as input and regresses left ventricular mass, this model was also trained with the asymmetric loss. 
-The third model, `ecg_rest_raw_lvm_symmetric_loss.h5` takes only an ECG as input and regresses left ventricular mass, this model was trained with the symmetric logcosh loss. The raw voltage values from the ECG are normalized by dividing by 2000 prior to being input to the model. \ No newline at end of file + +This folder contains models and code supporting the work described in [this paper](https://www.ahajournals.org/doi/10.1161/CIRCIMAGING.120.012281?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed) published in the journal Circulation: Cardiovascular Imaging. + +# LVM-AI +Left Ventricular Mass-Artificial Intelligence (LVM-AI) is a one-dimensional convolutional neural network trained to predict CMR-derived LV mass using 12-lead ECGs. LVM-AI was trained in 32,239 individuals from the UK Biobank with paired CMR and 12-lead ECG. It was provided with the entire 10 seconds of the 12-lead ECG waveform as well as participant age, sex, and BMI. +LVM-AI was evaluated in a UK Biobank test set as well as an external health care–based Mass General Brigham (MGB) dataset. In both test sets, LVM-AI was compared with traditional ECG-based rules for diagnosing CMR-derived left ventricular hypertrophy. Associations between LVM-AI-predicted LV mass index and incident cardiovascular events were tested in the UK Biobank and a separate MGB-based ambulatory cohort (MGB outcomes). +![Overview of the training and test samples](TrainingAndTestSets.jpg) +When compared with any ECG rule, LVM-AI demonstrated similar LVH discrimination in the UK Biobank (LVM-AI c-statistic 0.653 [95% CI, 0.608–0.698] versus any ECG rule c-statistic 0.618 [95% CI, 0.574–0.663], P=0.11) and superior discrimination in MGB (0.621; 95% CI, 0.592–0.649 versus 0.588; 95% CI, 0.564–0.611, P=0.02).
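The Models section states that raw ECG voltages are divided by 2000 before being passed to the pre-trained models. This minimal sketch shows only that scaling step. The `(num_samples, 12)` channel-last layout is an assumption, not something this README specifies, and `prepare_ecg` is a hypothetical helper. Check the loaded model's input shape before relying on it.

```python
import numpy as np

# Divisor stated in this README for raw ECG voltage values.
ECG_NORMALIZATION_DIVISOR = 2000.0


def prepare_ecg(raw_voltages):
    """Scale a raw 12-lead resting ECG by the README's normalization divisor.

    raw_voltages: array assumed to have shape (num_samples, 12), holding
    the full 10-second recording (the sampling rate is up to the caller).
    """
    ecg = np.asarray(raw_voltages, dtype=np.float32)
    if ecg.ndim != 2 or ecg.shape[1] != 12:
        raise ValueError(f"expected shape (num_samples, 12), got {ecg.shape}")
    return ecg / np.float32(ECG_NORMALIZATION_DIVISOR)
```

A single recording can be given a leading batch axis with `prepare_ecg(raw)[np.newaxis]`. For the age/sex/BMI model, those covariates are separate inputs whose encoding is not described here.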
+ + +# Models +Three pre-trained models are included here: +The model `ecg_rest_raw_age_sex_bmi_lvm_asymmetric_loss.h5` takes as input a 12 Lead resting ECG, as well as age, sex and BMI and has two outputs: one which regresses the left ventricular mass, and a second which gives a probability of left ventricular hypertrophy. This model was trained with the asymmetric loss described in the paper. +The model `ecg_rest_raw_lvm_asymmetric_loss.h5` takes only an ECG as input and regresses left ventricular mass. This model was also trained with the asymmetric loss. +The third model, `ecg_rest_raw_lvm_symmetric_loss.h5` takes only an ECG as input and regresses left ventricular mass. This model was trained with the symmetric logcosh loss. The raw voltage values from the ECG are normalized by dividing by 2000 prior to being input to the model. \ No newline at end of file diff --git a/model_zoo/left_ventricular_mass_from_ecg_student_and_mri_teacher/TrainingAndTestSets.jpg b/model_zoo/left_ventricular_mass_from_ecg_student_and_mri_teacher/TrainingAndTestSets.jpg new file mode 100644 index 000000000..058cf3063 --- /dev/null +++ b/model_zoo/left_ventricular_mass_from_ecg_student_and_mri_teacher/TrainingAndTestSets.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6bea557c5150adbec08f1e50410d7b61c5ce228752c4b9785f7243f5e88bcd69 +size 175820 diff --git a/model_zoo/liver_fat_from_mri_ukb/README.md b/model_zoo/liver_fat_from_mri_ukb/README.md index 62fdb8e8a..38cc10c42 100644 --- a/model_zoo/liver_fat_from_mri_ukb/README.md +++ b/model_zoo/liver_fat_from_mri_ukb/README.md @@ -1,5 +1,7 @@ # Machine learning enables new insights into clinical significance of and genetic contributions to liver fat accumulation +This folder contains models and code supporting the work described in [this paper](https://www.sciencedirect.com/science/article/pii/S2666979X21000823) published in Cell Genomics + Here we host two models for estimating liver fat from abdominal MRI. 
The liver fat percentage training data is from the returned liver fat values in the [UK Biobank field ID 22402](https://biobank.ctsu.ox.ac.uk/crystal/field.cgi?id=22402). These values were only calculated for the echo protocol, so to infer liver fat from the ideal protocl we used a teacher/student modeling approach. @@ -9,7 +11,7 @@ This model takes input of shape 160 x 160 x 10 and emits a scalar representing e The input TensorMap is defined at `tensormap.ukb.mri.gre_mullti_echo_10_te_liver`. The output TensorMap associated with these values is defined at `tensormap.ukb.mri.liver_fat`. The keras model file is at [liver_fat_from_echo.h5](liver_fat_from_echo.h5) and the model architecture is shown below. The "?" in the input dimension represents the batch size of the input, which can be determined at runtime. When training the teacher model we used a batch size of 8. -![](liver_fat_from_echo_teacher_model.png) +![https://www.medrxiv.org/content/10.1101/2020.09.03.20187195v1](liver_fat_from_echo_teacher_model.png) ## Student Model diff --git a/model_zoo/liver_fat_from_mri_ukb/liver_fat_from_echo_teacher_model.png b/model_zoo/liver_fat_from_mri_ukb/liver_fat_from_echo_teacher_model.png index 9247e5af7..9f515806c 100644 Binary files a/model_zoo/liver_fat_from_mri_ukb/liver_fat_from_echo_teacher_model.png and b/model_zoo/liver_fat_from_mri_ukb/liver_fat_from_echo_teacher_model.png differ diff --git a/model_zoo/liver_fat_from_mri_ukb/liver_fat_from_ideal_student_model.png b/model_zoo/liver_fat_from_mri_ukb/liver_fat_from_ideal_student_model.png index e8c5fb164..3a141ec42 100644 Binary files a/model_zoo/liver_fat_from_mri_ukb/liver_fat_from_ideal_student_model.png and b/model_zoo/liver_fat_from_mri_ukb/liver_fat_from_ideal_student_model.png differ diff --git a/model_zoo/mi_feature_selection/README.md b/model_zoo/mi_feature_selection/README.md index 106496bb8..9248225e4 100644 --- a/model_zoo/mi_feature_selection/README.md +++ b/model_zoo/mi_feature_selection/README.md @@ 
-44,5 +44,5 @@ xxx = pickle.load(open('models/coxnet_survival_05_final.pickle', 'rb')) ### Citation -**Selection of 51 predictors from 13,782 candidate multimodal features using machine learning improves coronary artery disease prediction**, Saaket Agrawal, BS*, Marcus D. R. Klarqvist, PhD, MSc, MSc*, Connor Emdin, DPhil, MD, Aniruddh P. Patel, MD, Manish D. Paranjpe, BA, Patrick T. Ellinor, MD, PhD, Anthony Philippakis, MD, PhD, Kenney Ng, PhD, Puneet Batra, PhD, Amit V. Khera, MD, MSc +**[Selection of 51 predictors from 13,782 candidate multimodal features using machine learning improves coronary artery disease prediction](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8672148/)**, Saaket Agrawal, BS*, Marcus D. R. Klarqvist, PhD, MSc, MSc*, Connor Emdin, DPhil, MD, Aniruddh P. Patel, MD, Manish D. Paranjpe, BA, Patrick T. Ellinor, MD, PhD, Anthony Philippakis, MD, PhD, Kenney Ng, PhD, Puneet Batra, PhD, Amit V. Khera, MD, MSc diff --git a/model_zoo/silhouette_mri/README.md b/model_zoo/silhouette_mri/README.md index 1763e3c44..14fd3633f 100644 --- a/model_zoo/silhouette_mri/README.md +++ b/model_zoo/silhouette_mri/README.md @@ -17,4 +17,4 @@ Several files are provided: ### Citation -**Estimating body fat distribution - a driver of cardiometabolic health - from silhouette images**, Marcus D. R. Klarqvist, PhD*, Saaket Agrawal, BS*, Nathaniel Diamant, BS, Patrick T. Ellinor, MD, PhD, Anthony Philippakis, MD, PhD, Kenney Ng, PhD, Puneet Batra, PhD, Amit V. Khera, MD +**[Estimating body fat distribution - a driver of cardiometabolic health - from silhouette images](https://www.medrxiv.org/content/10.1101/2022.01.14.22269328v2)**, Marcus D. R. Klarqvist, PhD*, Saaket Agrawal, BS*, Nathaniel Diamant, BS, Patrick T. Ellinor, MD, PhD, Anthony Philippakis, MD, PhD, Kenney Ng, PhD, Puneet Batra, PhD, Amit V. Khera, MD