
iupui-soic/amr_causal

Causal Effects of Antibiotic Exposure on Antimicrobial Resistance

A multi-site double machine learning study of 1.2 million culture episodes

Aravind V. Kuruvikkattil, Shikhar Shukla, Leo A. Celi, Zanthia Wiley, Judy W. Gichoya, Saptarshi Purkayastha

This repository contains the analysis code for estimating the causal effect of prior antibiotic exposure on subsequent antimicrobial resistance. The analysis uses double machine learning (DML) with GPU-accelerated XGBoost nuisance models, applied to data from three U.S. health systems: Mass General Brigham (MGB), Stanford Health Care, and Beth Israel Deaconess Medical Center (BIDMC, via the MIMIC-IV dataset).
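For readers new to DML, the core idea fits in a few lines:

1. Fit nuisance models for the exposure, E[D | X], and for the outcome, E[Y | X], on held-out folds (cross-fitting).
2. Regress the outcome residuals on the exposure residuals.

The code below is a minimal, dependency-free sketch under stated assumptions, not the repository's implementation:

- It uses the partially linear (partialling-out) DML score.
- A hypothetical stratum-mean learner (`fit_stratum_mean`) stands in for XGBoost.
- The data are synthetic, produced by a hypothetical `simulate` helper.

The notebooks may use a different DML score (for example the interactive/AIPW model) and a much richer covariate set.

```python
import random
from collections import defaultdict
from statistics import fmean


def fit_stratum_mean(xs, vals):
    """Hypothetical nuisance learner standing in for XGBoost: per-stratum mean of vals.

    Unseen strata fall back to the overall training mean.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for x, v in zip(xs, vals):
        sums[x] += v
        counts[x] += 1
    means = {k: sums[k] / counts[k] for k in counts}
    overall = fmean(vals)
    return lambda x: means.get(x, overall)


def dml_plr(x, d, y, n_folds=5, seed=0):
    """Cross-fitted partialling-out DML estimate of theta in Y = theta*D + g(X) + error.

    x: hashable covariate strata, d: 0/1 exposure, y: 0/1 resistance outcome.
    Returns (theta, standard_error).
    """
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    d_res, y_res = [0.0] * n, [0.0] * n
    for test in folds:
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        m_hat = fit_stratum_mean([x[i] for i in train], [d[i] for i in train])  # E[D | X]
        l_hat = fit_stratum_mean([x[i] for i in train], [y[i] for i in train])  # E[Y | X]
        for i in test:  # residualize only on the held-out fold (cross-fitting)
            d_res[i] = d[i] - m_hat(x[i])
            y_res[i] = y[i] - l_hat(x[i])
    denom = sum(v * v for v in d_res)
    theta = sum(a * b for a, b in zip(d_res, y_res)) / denom
    # Variance from the Neyman-orthogonal score psi = (Y_res - theta * D_res) * D_res
    psi = [(yr - theta * dr) * dr for dr, yr in zip(d_res, y_res)]
    j = denom / n
    var = sum(p * p for p in psi) / n / (j * j)
    return theta, (var / n) ** 0.5


def simulate(n=20000, effect=0.10, seed=42):
    """Hypothetical synthetic data: a 4-level severity stratum confounds exposure and resistance."""
    rng = random.Random(seed)
    x, d, y = [], [], []
    for _ in range(n):
        s = rng.randrange(4)
        di = int(rng.random() < 0.2 + 0.15 * s)  # sicker strata receive antibiotics more often
        p = 0.05 + 0.05 * s + effect * di        # true ACE equals `effect`
        x.append(s)
        d.append(di)
        y.append(int(rng.random() < p))
    return x, d, y


if __name__ == "__main__":
    x, d, y = simulate()
    theta, se = dml_plr(x, d, y)
    print(f"DML ACE: {100 * theta:.1f} pp (SE {100 * se:.1f} pp); true effect 10.0 pp")
```

In this toy setup a naive exposed-versus-unexposed difference is biased upward, because sicker strata are both more exposed and more resistant. The cross-fitted residual regression removes that confounding.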

Data Requirements

  • MGB: ARMD-MGB v1.0.0 (Wei & Kanjilal, 2025). Requires PhysioNet credentialed access.
  • Stanford: ARMD-Stanford (Nateghi Haredasht et al., 2025; Oct 22, 2025 release). Available on Dryad under CC0.
  • BIDMC/MIMIC: MIMIC-IV v3.1 (Johnson et al., 2023). Requires CITI training and signed DUA.

Raw data files should be placed in:

  • /data0/armd/ (MGB)
  • /data0/armd-stanford/ (Stanford)
  • /data0/mimic-iv/ (BIDMC / MIMIC-IV)
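Before launching the pipeline, it can help to confirm that each site's raw data is where the notebooks expect it. The check below is a minimal pre-flight sketch that assumes only the default paths listed above. The function name `missing_data_dirs` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Default locations named in this README; adjust if your data lives elsewhere.
DATA_DIRS = {
    "MGB": Path("/data0/armd"),
    "Stanford": Path("/data0/armd-stanford"),
    "BIDMC/MIMIC-IV": Path("/data0/mimic-iv"),
}


def missing_data_dirs(dirs=DATA_DIRS):
    """Return the sites whose raw-data directory is absent or empty."""
    missing = []
    for site, path in dirs.items():
        # is_dir() short-circuits, so iterdir() only runs on existing directories
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(site)
    return missing


if __name__ == "__main__":
    gaps = missing_data_dirs()
    print("All data directories present." if not gaps else f"Missing or empty: {', '.join(gaps)}")
```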

Installation

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Requires Python 3.10+ and an NVIDIA GPU with CUDA support for XGBoost GPU acceleration.
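On a new machine, a small script can confirm the stated requirements. This is an illustrative sketch, not a repository utility, and `check_environment` is a hypothetical name. The GPU test only looks for the NVIDIA driver tool `nvidia-smi` on PATH, so it does not prove that a CUDA-enabled XGBoost build is installed.

```python
import shutil
import sys


def check_environment(min_python=(3, 10)):
    """Report whether this machine meets the README's stated requirements.

    GPU detection is a heuristic: it only checks that `nvidia-smi` is on PATH.
    """
    return {
        "python_ok": tuple(sys.version_info[:2]) >= min_python,
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'NO'}")
```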

Usage

Run the full analysis pipeline:

./run_analysis.sh          # All 5 notebooks sequentially
./run_analysis.sh part1    # Data pipeline + primary DML only
./run_analysis.sh part2    # Sensitivity analyses only
./run_analysis.sh part3    # Cross-site downstream analyses only
./run_analysis.sh part4    # Empiric failure analysis only
./run_analysis.sh part5    # MRSA outcome + organism distribution by specimen only

Notebooks are executed via jupyter nbconvert --execute with a 2-hour timeout per notebook. Total runtime: approximately 4-6 hours on a single GPU.
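The contents of run_analysis.sh are not reproduced here. The code below is only a Python approximation of the sequential execution it describes, under two assumptions: the partN options map one-to-one onto notebooks 01 to 05, and executed copies go to executed/.

One detail is worth knowing. nbconvert's `--ExecutePreprocessor.timeout` limits each cell, not the whole notebook. This sketch therefore also caps every notebook with `subprocess.run(..., timeout=...)`. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path

NOTEBOOKS = [
    "01_data_pipeline_dml.ipynb",
    "02_sensitivity_analyses.ipynb",
    "03_cross_site_downstream.ipynb",
    "04_empiric_failure_analysis.ipynb",
    "05_mrsa_and_specimen.ipynb",
]


def nbconvert_cmd(notebook, out_dir="executed", timeout_s=2 * 60 * 60):
    """Build a `jupyter nbconvert --execute` command (the timeout flag is per cell)."""
    return [
        "jupyter", "nbconvert", "--to", "notebook", "--execute",
        f"--ExecutePreprocessor.timeout={timeout_s}",
        "--output-dir", str(out_dir),
        str(Path("notebooks") / notebook),
    ]


def run_pipeline(selected=None, timeout_s=2 * 60 * 60, dry_run=False):
    """Execute notebooks in order, stopping at the first failure.

    selected: 1-based step numbers (e.g. [2] for the equivalent of `part2`);
    by default all five notebooks run. Returns the commands built.
    """
    steps = selected or range(1, len(NOTEBOOKS) + 1)
    cmds = [nbconvert_cmd(NOTEBOOKS[s - 1], timeout_s=timeout_s) for s in steps]
    if not dry_run:
        Path("executed").mkdir(exist_ok=True)
        for cmd in cmds:
            # check=True aborts on a failing notebook; timeout caps the whole notebook
            subprocess.run(cmd, check=True, timeout=timeout_s)
    return cmds


if __name__ == "__main__":
    for cmd in run_pipeline(dry_run=True):
        print(" ".join(cmd))
```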

The XGBoost nuisance models run on the GPU by default. Notebooks 02 and 03 honor a CV_DEVICE=cpu environment variable for GPU-free re-runs (results are equivalent; runtime is longer); notebooks 01 and 05 require a CUDA GPU.
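This README does not show how the notebooks read CV_DEVICE. One plausible pattern is to resolve the device once and pass it to every nuisance model. The sketch assumes XGBoost 2.0 or later, where the `device` parameter selects "cuda" or "cpu" and `tree_method="hist"` works on both. `xgb_params` is a hypothetical helper.

```python
import os


def xgb_params(env=None, base=None):
    """Return XGBoost parameters honoring a CV_DEVICE override (default: GPU).

    Assumes XGBoost >= 2.0, where `device` is "cuda" or "cpu".
    """
    env = os.environ if env is None else env
    device = env.get("CV_DEVICE", "cuda").strip().lower()
    if device not in {"cuda", "cpu"}:
        raise ValueError(f"CV_DEVICE must be 'cuda' or 'cpu', got {device!r}")
    params = dict(base or {})
    params.update(tree_method="hist", device=device)
    return params


if __name__ == "__main__":
    print(xgb_params())
```

With this pattern, `xgboost.XGBClassifier(**xgb_params(base={"max_depth": 6}))` would train on the CPU whenever CV_DEVICE=cpu is exported before the notebook starts.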

Pipeline overview

| Step | Notebook | Description | Key outputs |
|---|---|---|---|
| 1 | 01_data_pipeline_dml.ipynb | Data build + primary DML, cross-class adjustment, organism-stratified / clustered-bootstrap / propensity-score sensitivity | *_dml_primary.csv, cross_class_7x7_matrix.csv |
| 2 | 02_sensitivity_analyses.ipynb | IPTW robustness, E-values, window sensitivity, dose-response, calendar-period drift | iptw_results.csv, dml_vs_iptw.csv, evalue_sensitivity.csv |
| 3 | 03_cross_site_downstream.ipynb | Forest plot, heterogeneity, random-effects pooling, time decay, permutations, CEM | fig1_forest_plot.pdf, fig_random_effects_forest.pdf |
| 4 | 04_empiric_failure_analysis.ipynb | Empiric therapy failure rates, preventable failures, monotherapy vs combination | ef_regimen_mono_vs_combo.csv, failure figures |
| 5 | 05_mrsa_and_specimen.ipynb | MRSA as an outcome + organism distribution by specimen type | mrsa_outcome_dml.csv, organism_distribution_by_specimen.csv |
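Two of the methods named in the table are compact enough to sketch directly:

- DerSimonian-Laird random-effects pooling, with the I² heterogeneity statistic.
- The VanderWeele-Ding E-value for a risk ratio.

These are generic textbook versions. The notebooks may use different estimators, for example REML for the between-site variance. The README reports effects as risk differences in percentage points, so computing an E-value would first require converting each ACE to a risk ratio using a baseline risk.

```python
import math


def dersimonian_laird(estimates, std_errors):
    """DerSimonian-Laird random-effects pooling of site-level effects.

    Returns (pooled, pooled_se, tau2, i2): tau2 is the between-site variance,
    i2 the I^2 heterogeneity share (0 to 1).
    """
    w = [1 / se**2 for se in std_errors]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    df = len(estimates) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    w_re = [1 / (se**2 + tau2) for se in std_errors]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2, i2


def e_value(rr):
    """VanderWeele-Ding E-value for a risk-ratio point estimate."""
    rr = 1 / rr if rr < 1 else rr  # protective effects are inverted first
    return rr + math.sqrt(rr * (rr - 1))
```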

Notebooks must be run in order (each depends on outputs from previous steps).
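Each notebook consumes earlier outputs, so a guard that stops a step from running against a half-built pipeline can save a long failed run. The sketch below checks for the key outputs listed in the pipeline table. Its assumptions:

- CSVs are written to outputs/results/ and figures to outputs/figures/. This matches the repository structure but is not confirmed per file.
- It conservatively requires every earlier step's outputs, even where a step may need only some of them.

`KEY_OUTPUTS` and `missing_prerequisites` are hypothetical names, and the repository's own validate_pipeline.py may work differently.

```python
from pathlib import Path

# Key outputs per step, taken from the pipeline table; directory placement is assumed.
KEY_OUTPUTS = {
    1: ["results/*_dml_primary.csv", "results/cross_class_7x7_matrix.csv"],
    2: ["results/iptw_results.csv", "results/dml_vs_iptw.csv", "results/evalue_sensitivity.csv"],
    3: ["figures/fig1_forest_plot.pdf", "figures/fig_random_effects_forest.pdf"],
    4: ["results/ef_regimen_mono_vs_combo.csv"],
    5: ["results/mrsa_outcome_dml.csv", "results/organism_distribution_by_specimen.csv"],
}


def missing_prerequisites(step, outputs_dir="outputs"):
    """List key outputs from steps before `step` that are not yet on disk."""
    root = Path(outputs_dir)
    missing = []
    for prior in range(1, step):
        for pattern in KEY_OUTPUTS[prior]:
            if not any(root.glob(pattern)):  # glob also matches literal file names
                missing.append(pattern)
    return missing


if __name__ == "__main__":
    print(missing_prerequisites(3) or "Prerequisites for step 3 found.")
```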

Repository Structure

amr_causal/
├── README.md
├── LICENSE                            (MIT)
├── .gitignore
├── requirements.txt
├── run_analysis.sh                    (pipeline runner)
├── validate_pipeline.py               (output validation)
├── notebooks/
│   ├── 01_data_pipeline_dml.ipynb     (data build + primary DML)
│   ├── 02_sensitivity_analyses.ipynb  (IPTW, E-values, window, dose-response)
│   ├── 03_cross_site_downstream.ipynb (heterogeneity, permutations, CEM)
│   ├── 04_empiric_failure_analysis.ipynb (empiric therapy failure)
│   └── 05_mrsa_and_specimen.ipynb     (MRSA outcome + organism by specimen)
├── outputs/
│   ├── data/                          (intermediate CSVs, gitignored)
│   ├── results/                       (analysis result CSVs)
│   └── figures/                       (publication figures)
├── manuscript/
│   ├── manuscript.tex
│   ├── supplementary.tex
│   └── figures/                       (copied from outputs/figures/)
└── executed/                          (executed notebooks, gitignored)

Key Results

| Drug class | MGB ACE (pp) | Stanford ACE (pp) | MIMIC-IV ACE (pp) |
|---|---|---|---|
| Fluoroquinolones | 11.9 | 12.6 | 8.6 |
| 3rd-gen cephalosporins | 4.0 | 5.7 | 2.8 |
| Carbapenems | 4.0 | 4.7 | 2.6 |
| Glycopeptides | 3.5 | 5.6 | 1.8 |
| Sulfonamides | 12.8 | 4.8* | not testable |
| Ext-spec penicillins | 2.8 | 5.0 | 1.8 |
| Aminoglycosides | 3.3 | 6.0 | 3.2 |

All P < 0.001 except *Stanford sulfonamides (P = 0.059; 95% CI −0.2 to 9.7, which crosses the null). Sulfonamide effects could not be tested in MIMIC-IV. Estimates adjust for concurrent cross-class exposure and an expanded, reviewer-requested confounder set. This adjustment attenuates the average causal effects (ACE, in percentage points) relative to earlier single-class models.
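The Stanford sulfonamide footnote can be sanity-checked with a normal approximation. A symmetric 95% Wald interval implies SE ≈ (upper − lower) / (2 × 1.96), and the p-value follows from z = estimate / SE.

The sketch below applies this to the reported 4.8 pp estimate and 95% CI −0.2 to 9.7, using a hypothetical helper `p_from_ci`. It gives a p-value of roughly 0.06, in line with the reported P = 0.059. The small gap is expected from rounding of the published numbers. The study's intervals may also not be Wald intervals at all; they could come from sandwich or bootstrap standard errors.

```python
from statistics import NormalDist


def p_from_ci(estimate, lower, upper, level=0.95):
    """Return (se, two-sided p) implied by a symmetric Wald interval."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    return se, 2 * (1 - NormalDist().cdf(abs(z)))


# Stanford sulfonamides, from the results table: ACE 4.8 pp, 95% CI -0.2 to 9.7 pp
se, p = p_from_ci(4.8, -0.2, 9.7)
print(f"SE ~ {se:.2f} pp, p ~ {p:.3f}")
```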

Citation

@article{kuruvikkattil2026amr,
  title   = {Prior Antibiotic Exposure and the Causal Risk of Antimicrobial Resistance: A Multi-Site Study of 1.2 Million Culture Episodes},
  author  = {Kuruvikkattil, Aravind V. and Shukla, Shikhar and Celi, Leo A. and Wiley, Zanthia and Gichoya, Judy W. and Purkayastha, Saptarshi},
  year    = {2026},
  journal = {Submitted},
}

License

MIT License. See LICENSE.

About

Large-scale causal inference to identify antimicrobial resistance from MGB, Stanford, and BIDMC data
