Bayesian ITR Curves
With the rapid adoption of Electronic Health Records (EHR) in modern healthcare systems, large amounts of medical data have become available, and various statistical methods have been developed for analyzing the effects of drugs from such retrospective data.
The current R packages for treatment effect analysis mainly focus on the average treatment effect in the population, or on treatment effects at the group level, where groups are pre-specified by patients' covariates. Instead, this project focuses on estimating individual treatment effects, a crucial step toward precision medicine. Moreover, we address the problem of estimating treatment effects that change over time (which we call treatment response curves) and of modeling longitudinal outcome variables, whereas the existing packages only provide point-in-time estimates. Such estimates are not adequate for analyzing present-day EHR data, which usually consist of irregularly observed time series (e.g. MIMIC III [1]). Finally, this project uses MCMC as the inference method and gives a Bayesian estimator of the so-called individualized treatment response (ITR) curves.
In summary, we aim to add a state-of-the-art approach [2] to the set of R packages for analyzing continuous individual treatment effects based on EHR time series; in this project, we seek to speed up the MCMC inference by implementing the core part in C/C++.
As mentioned above, most of the existing packages estimate average treatment effects, which ignore the heterogeneity in treatment effects that usually exists in large observational data. Among these packages, nonrandom and MatchIt use propensity-score-based methods to first adjust for the imbalance between the treated group and the control group and then compute the difference in means of the outcome variable; package ATE gives a nonparametric point estimate of the average treatment effect. In contrast, package Findit estimates heterogeneous treatment effects (HTE) based on variable selection in randomized studies. However, randomized studies are typically expensive, and often unethical or impractical, compared with observational studies that use retrospective EHR data. Package beanz provides Bayesian analysis of HTE, but it limits the HTE estimation to the group level, while this project aims at estimation at the individual level. Moreover, all of the above-mentioned packages give point-in-time estimates of the treatment effects; this project estimates continuous effects based on longitudinal data of the kind commonly collected by modern EHR systems.
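To make the point about heterogeneity concrete, here is a minimal sketch of a difference-in-means estimate computed over a whole population and then within covariate-defined groups. The `Patient` struct, the function names, and the toy numbers are all assumptions made for illustration; they are not taken from any of the packages above, and no propensity-score adjustment is attempted.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical record: a covariate-defined group label, a treatment flag,
// and a single point-in-time outcome.
struct Patient {
    int group;
    bool treated;
    double outcome;
};

// Difference between the mean outcome of treated and control patients.
// Requires at least one treated and one control patient.
double diff_in_means(const std::vector<Patient>& patients) {
    double sum_t = 0.0, sum_c = 0.0;
    std::size_t n_t = 0, n_c = 0;
    for (const Patient& p : patients) {
        if (p.treated) { sum_t += p.outcome; ++n_t; }
        else           { sum_c += p.outcome; ++n_c; }
    }
    assert(n_t > 0 && n_c > 0);
    return sum_t / n_t - sum_c / n_c;
}

// The same estimator applied separately within each group.
std::map<int, double> group_diff_in_means(const std::vector<Patient>& patients) {
    std::map<int, std::vector<Patient>> by_group;
    for (const Patient& p : patients) by_group[p.group].push_back(p);
    std::map<int, double> effects;
    for (const auto& kv : by_group) effects[kv.first] = diff_in_means(kv.second);
    return effects;
}

// Toy data (made up): treatment helps group 0 and harms group 1.
std::vector<Patient> toy_data() {
    return {
        {0, true, 3.0},  {0, true, 3.0},  {0, false, 1.0}, {0, false, 1.0},
        {1, true, -1.0}, {1, true, -1.0}, {1, false, 1.0}, {1, false, 1.0},
    };
}
```

On this toy data the population-level estimate is zero even though each group shows a clear effect of opposite sign, which is the kind of structure an average treatment effect hides. Estimates at the individual level, and over time, go one step further than the group-level split shown here.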
One of the main goals of this project is to speed up the MCMC inference of the ITR approach, so the code of the core part (i.e. inference) should be implemented in C/C++. Extensive benchmark tests should be conducted to ensure correctness throughout the development. A user-friendly R wrapper is needed for the package, covering two parts: 1) a comprehensive list of R function arguments that lets users easily adjust the priors used in the Bayesian analysis and configure the inference level of the treatment effects (e.g. population level, group level or individual level); 2) integrated plot generation (such as plotting the posterior of the estimated ITR curves and forecasting the trajectories of new test patients) and synthetic data generation. Examples of the plots and synthetic data are described in [2].
Current packages are limited to point-in-time estimates of treatment effects at the population level or subgroup level, which fall short of what is needed for analyzing individualized treatment effects. The delivered package will provide a useful and efficient Bayesian analysis tool for estimating treatment response curves at the individual level based on observational time series data.
Students, please contact mentors below after completing at least one of the tests below.
- Yanxun Xu
Assistant Professor in Department of Applied Mathematics and Statistics at Johns Hopkins University.
Email: [email protected]
Website: http://www.ams.jhu.edu/~yxu70/
Co-author of [2]
- Xingguo Li
Fifth-year Ph.D. candidate in the Department of Electrical and Computer Engineering, University of Minnesota.
Email: [email protected]
Website: http://people.ece.umn.edu/~lixx1661/
GSOC experience: 2014-2016 as a student; 2017 as a mentor
Students, please do one or more of the following tests before contacting the mentors above.
- Medium: Write an R script that implements an MCMC inference algorithm for the Dirichlet Process Mixture (DPM) model, and include a successful test on simulated data. DPM is one of the fundamental components of the ITR implementation.
- Hard: Write an R wrapper and use Rcpp to code the MCMC inference for DPM. The inference in this project needs to be fully implemented in Rcpp.
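As a rough illustration of the kind of sampler these tests refer to, the sketch below runs a collapsed Gibbs sampler for a DPM under the Chinese restaurant process representation. It is written in plain C++ (no R or Rcpp glue) and rests on several assumptions that are not part of the project description: one-dimensional data, a Gaussian likelihood with known variance `sigma2`, a conjugate N(`m0`, `tau2`) prior on each cluster mean, and a fixed concentration parameter `alpha`. The function names and the simulated data are hypothetical, and this is not the model of [2].

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Density of N(mean, var) evaluated at x.
double normal_pdf(double x, double mean, double var) {
    const double kPi = 3.14159265358979323846;
    const double d = x - mean;
    return std::exp(-0.5 * d * d / var) / std::sqrt(2.0 * kPi * var);
}

// Collapsed Gibbs sampler for a DP mixture of 1-D Gaussians (assumed model:
// known variance sigma2, N(m0, tau2) prior on cluster means, concentration alpha).
// Returns the cluster label of every observation after `sweeps` full passes.
std::vector<int> dpm_gibbs(const std::vector<double>& y, double alpha,
                           double sigma2, double m0, double tau2,
                           int sweeps, unsigned seed) {
    assert(!y.empty() && alpha > 0 && sigma2 > 0 && tau2 > 0);
    std::mt19937 rng(seed);
    const std::size_t n = y.size();
    std::vector<int> z(n, 0);                        // start with one big cluster
    std::vector<int> count(1, static_cast<int>(n));  // cluster sizes
    std::vector<double> sum(1, 0.0);                 // per-cluster sums of y
    for (double v : y) sum[0] += v;

    for (int s = 0; s < sweeps; ++s) {
        for (std::size_t i = 0; i < n; ++i) {
            // Remove observation i from its current cluster.
            const int k = z[i];
            count[k] -= 1;
            sum[k] -= y[i];
            if (count[k] == 0) {
                // Drop the empty cluster by moving the last cluster into slot k.
                const int last = static_cast<int>(count.size()) - 1;
                if (k != last) {
                    count[k] = count[last];
                    sum[k] = sum[last];
                    for (std::size_t j = 0; j < n; ++j)
                        if (z[j] == last) z[j] = k;
                }
                count.pop_back();
                sum.pop_back();
            }
            // Weight of each existing cluster: size times the posterior
            // predictive density of y[i] given that cluster's members.
            const std::size_t K = count.size();
            std::vector<double> w(K + 1);
            for (std::size_t c = 0; c < K; ++c) {
                const double post_var = 1.0 / (1.0 / tau2 + count[c] / sigma2);
                const double post_mean = post_var * (m0 / tau2 + sum[c] / sigma2);
                w[c] = count[c] * normal_pdf(y[i], post_mean, post_var + sigma2);
            }
            // Weight of opening a new cluster: alpha times the prior predictive.
            w[K] = alpha * normal_pdf(y[i], m0, tau2 + sigma2);

            std::discrete_distribution<int> pick(w.begin(), w.end());
            const int choice = pick(rng);
            if (choice == static_cast<int>(K)) {
                count.push_back(0);
                sum.push_back(0.0);
            }
            z[i] = choice;
            count[choice] += 1;
            sum[choice] += y[i];
        }
    }
    return z;
}

// Simulated data: n/2 draws around -5 followed by n/2 draws around +5,
// each with unit variance.
std::vector<double> simulate_two_clusters(std::size_t n, unsigned seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> noise(0.0, 1.0);
    std::vector<double> y(n);
    for (std::size_t i = 0; i < n; ++i)
        y[i] = (i < n / 2 ? -5.0 : 5.0) + noise(rng);
    return y;
}
```

A simple check on simulated data is to call `dpm_gibbs(simulate_two_clusters(100, 1), 1.0, 1.0, 0.0, 25.0, 50, 2)` and confirm that observations from the two halves end up with different labels. A more careful version would compute the weights in log space to avoid underflow, sample `alpha` and the variance rather than fixing them, and expose the sampler to R through Rcpp.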
Students, please post a link to your test results here.
Yanbo Xu: code link
[1] Johnson, A.E.W., Pollard, T.J., Shen, L., Lehman, L., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L.A. and Mark, R.G., 2016. MIMIC-III, a freely accessible critical care database. Scientific Data. DOI: 10.1038/sdata.2016.35. Available at: http://www.nature.com/articles/sdata201635
[2] Xu, Y., Xu, Y. and Saria, S., 2016. A Bayesian Nonparametric Approach for Estimating Individualized Treatment-Response Curves. arXiv preprint arXiv:1608.05182. [tentatively accepted by Journal of Machine Learning Research]