|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "# Parametric g-formula: stochastic interventions\n", |
| 8 | + "In the previous tutorial we went over the basics of the parametric g-formula using `TimeFixedGFormula` for basic interventions. Additionally, we can use the g-formula to look at stochastic interventions. Stochastic interventions are treatment plans under which not everyone is necessarily treated; instead, a random percentage of the population is treated.\n", |
| 9 | + "\n", |
| 10 | + "To estimate the g-formula for stochastic treatments, the process is fairly similar. However, instead of treating everyone, a random percentage of individuals are treated, and then the $\\hat{Y}_i^a$ are predicted and averaged. This process is repeated some number of times, and the average of the averaged potential outcomes is returned.\n", |
| 11 | + "\n", |
| 12 | + "For our example, we will return to the previous data set on ART among HIV-infected individuals and all-cause mortality. First, we will load the data (again ignoring missing data)" |
| 13 | + ] |
| 14 | + }, |
| 15 | + { |
| 16 | + "cell_type": "code", |
| 17 | + "execution_count": 2, |
| 18 | + "metadata": {}, |
| 19 | + "outputs": [ |
| 20 | + { |
| 21 | + "name": "stdout", |
| 22 | + "output_type": "stream", |
| 23 | + "text": [ |
| 24 | + "<class 'pandas.core.frame.DataFrame'>\n", |
| 25 | + "Int64Index: 517 entries, 0 to 546\n", |
| 26 | + "Data columns (total 9 columns):\n", |
| 27 | + "id 517 non-null int64\n", |
| 28 | + "male 517 non-null int64\n", |
| 29 | + "age0 517 non-null int64\n", |
| 30 | + "cd40 517 non-null int64\n", |
| 31 | + "dvl0 517 non-null int64\n", |
| 32 | + "art 517 non-null int64\n", |
| 33 | + "dead 517 non-null float64\n", |
| 34 | + "t 517 non-null float64\n", |
| 35 | + "cd4_wk45 430 non-null float64\n", |
| 36 | + "dtypes: float64(3), int64(6)\n", |
| 37 | + "memory usage: 40.4 KB\n" |
| 38 | + ] |
| 39 | + } |
| 40 | + ], |
| 41 | + "source": [ |
| 42 | + "import numpy as np\n", |
| 43 | + "import pandas as pd\n", |
| 44 | + "\n", |
| 45 | + "from zepid import load_sample_data, spline\n", |
| 46 | + "from zepid.causal.gformula import TimeFixedGFormula\n", |
| 47 | + "\n", |
| 48 | + "df = load_sample_data(timevary=False)\n", |
| 49 | + "dfs = df.dropna(subset=['dead']).copy()\n", |
| 50 | + "dfs.info()\n", |
| 51 | + "\n", |
| 52 | + "dfs[['cd4_rs1', 'cd4_rs2']] = spline(dfs, 'cd40', n_knots=3, term=2, restricted=True)\n", |
| 53 | + "dfs[['age_rs1', 'age_rs2']] = spline(dfs, 'age0', n_knots=3, term=2, restricted=True)" |
| 54 | + ] |
| 55 | + }, |
| 56 | + { |
| 57 | + "cell_type": "markdown", |
| 58 | + "metadata": {}, |
| 59 | + "source": [ |
| 60 | + "Similar to the previous tutorial, we initialize the `TimeFixedGFormula` with the data set (`dfs`), our treatment variable (`art`), and binary outcome (`dead`). Then we fit a regression model predicting all-cause mortality as a function of ART and our set of confounding variables (age, CD4 T-cell count, detectable viral load, gender)" |
| 61 | + ] |
| 62 | + }, |
| 63 | + { |
| 64 | + "cell_type": "code", |
| 65 | + "execution_count": 3, |
| 66 | + "metadata": {}, |
| 67 | + "outputs": [ |
| 68 | + { |
| 69 | + "name": "stdout", |
| 70 | + "output_type": "stream", |
| 71 | + "text": [ |
| 72 | + " Generalized Linear Model Regression Results \n", |
| 73 | + "==============================================================================\n", |
| 74 | + "Dep. Variable: dead No. Observations: 517\n", |
| 75 | + "Model: GLM Df Residuals: 507\n", |
| 76 | + "Model Family: Binomial Df Model: 9\n", |
| 77 | + "Link Function: logit Scale: 1.0000\n", |
| 78 | + "Method: IRLS Log-Likelihood: -202.83\n", |
| 79 | + "Date: Mon, 11 Mar 2019 Deviance: 405.67\n", |
| 80 | + "Time: 07:08:33 Pearson chi2: 534.\n", |
| 81 | + "No. Iterations: 6 Covariance Type: nonrobust\n", |
| 82 | + "==============================================================================\n", |
| 83 | + " coef std err z P>|z| [0.025 0.975]\n", |
| 84 | + "------------------------------------------------------------------------------\n", |
| 85 | + "Intercept -3.9822 2.621 -1.520 0.129 -9.119 1.154\n", |
| 86 | + "art -0.7278 0.393 -1.854 0.064 -1.497 0.042\n", |
| 87 | + "male -0.0773 0.334 -0.231 0.817 -0.732 0.578\n", |
| 88 | + "age0 0.1548 0.092 1.689 0.091 -0.025 0.334\n", |
| 89 | + "age_rs1 -0.0059 0.004 -1.493 0.135 -0.014 0.002\n", |
| 90 | + "age_rs2 0.0129 0.006 2.035 0.042 0.000 0.025\n", |
| 91 | + "cd40 -0.0121 0.004 -3.028 0.002 -0.020 -0.004\n", |
| 92 | + "cd4_rs1 1.887e-05 1.19e-05 1.581 0.114 -4.52e-06 4.23e-05\n", |
| 93 | + "cd4_rs2 -3.866e-05 4.57e-05 -0.846 0.398 -0.000 5.09e-05\n", |
| 94 | + "dvl0 -0.1254 0.398 -0.315 0.753 -0.905 0.654\n", |
| 95 | + "==============================================================================\n" |
| 96 | + ] |
| 97 | + } |
| 98 | + ], |
| 99 | + "source": [ |
| 100 | + "g = TimeFixedGFormula(dfs, exposure='art', outcome='dead')\n", |
| 101 | + "g.outcome_model(model='art + male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0')" |
| 102 | + ] |
| 103 | + }, |
| 104 | + { |
| 105 | + "cell_type": "markdown", |
| 106 | + "metadata": {}, |
| 107 | + "source": [ |
| 108 | + "However, this time we do some background research and find that one potential intervention to increase ART prescriptions raises the probability of ART treatment to 80%. As a result, it is potentially misleading to compare the treat-all vs treat-none scenarios. Instead, we will compare the stochastic treatment where 80% of individuals are treated with ART to the scenario where no one is treated.\n", |
| 109 | + "\n", |
| 110 | + "## Stochastic Treatment Plans\n", |
| 111 | + "To do this using `TimeFixedGFormula`, we call the `fit_stochastic()` function instead of `fit()`. This function allows us to estimate the marginal outcome under a stochastic treatment. We specify `p=0.8` to have 80% of the population treated at random. By default, `fit_stochastic()` repeats this process 100 times and takes the average of these repeated random treatments. I will also use the `seed` argument to get replicable results. Let's look at the example" |
| 112 | + ] |
| 113 | + }, |
| 114 | + { |
| 115 | + "cell_type": "code", |
| 116 | + "execution_count": 7, |
| 117 | + "metadata": {}, |
| 118 | + "outputs": [ |
| 119 | + { |
| 120 | + "name": "stdout", |
| 121 | + "output_type": "stream", |
| 122 | + "text": [ |
| 123 | + "RD: -0.06041404870415\n" |
| 124 | + ] |
| 125 | + } |
| 126 | + ], |
| 127 | + "source": [ |
| 128 | + "g.fit_stochastic(p=0.8, seed=1000191)\n", |
| 129 | + "r_80 = g.marginal_outcome\n", |
| 130 | + "\n", |
| 131 | + "g.fit(treatment='none')\n", |
| 132 | + "r_none = g.marginal_outcome\n", |
| 133 | + "\n", |
| 134 | + "print('RD:', r_80 - r_none)" |
| 135 | + ] |
| 136 | + }, |
| 137 | + { |
| 138 | + "cell_type": "markdown", |
| 139 | + "metadata": {}, |
| 140 | + "source": [ |
| 141 | + "Under the treatment plan where 80% of people are randomly treated, the risk of all-cause mortality would have been 6.0 percentage points lower than if no one had been treated.\n", |
| 142 | + "\n", |
| 143 | + "After reading some more articles, we find an alternative treatment plan. Under this plan, 75% of men and 90% of women start using ART. For this plan, we are interested in a conditional stochastic treatment. Again, we want to compare this to the scenario where no one is treated.\n", |
| 144 | + "\n", |
| 145 | + "## Conditional Stochastic Treatment Plans\n", |
| 146 | + "For conditionally stochastic treatments, we instead pass `p` a list of probabilities. Additionally, we specify the `conditional` argument with the group restrictions, one per probability and in the same order. Again, we will need to use the magic-g functionality. Below is the example of the stochastic plan where 75% of men and 90% of women are treated" |
| 147 | + ] |
| 148 | + }, |
| 149 | + { |
| 150 | + "cell_type": "code", |
| 151 | + "execution_count": 9, |
| 152 | + "metadata": {}, |
| 153 | + "outputs": [ |
| 154 | + { |
| 155 | + "name": "stdout", |
| 156 | + "output_type": "stream", |
| 157 | + "text": [ |
| 158 | + "RD: -0.058656195525173926\n" |
| 159 | + ] |
| 160 | + } |
| 161 | + ], |
| 162 | + "source": [ |
| 163 | + "g.fit_stochastic(p=[0.75, 0.90], conditional=[\"g['male']==1\", \"g['male']==0\"], seed=518012)\n", |
| 164 | + "r_cs = g.marginal_outcome\n", |
| 165 | + "\n", |
| 166 | + "print('RD:', r_cs - r_none)" |
| 167 | + ] |
| 168 | + }, |
| 169 | + { |
| 170 | + "cell_type": "markdown", |
| 171 | + "metadata": {}, |
| 172 | + "source": [ |
| 173 | + "Under the treatment plan where 75% of men and 90% of women are randomly treated, the risk of all-cause mortality would have been 5.9 percentage points lower than if no one had been treated. This plan reduces the marginal mortality less than the previous stochastic plan because our HIV-infected population is predominantly men.\n", |
| 174 | + "\n", |
| 175 | + "# Conclusion\n", |
| 176 | + "In this tutorial, I detailed stochastic treatment plans using the g-formula. While presented for a binary outcome, the same procedure can also be used to estimate stochastic treatments for continuous outcomes. Please view the other tutorials for information on other functions in *zEpid*.\n", |
| 177 | + "\n", |
| 178 | + "## Further Readings\n", |
| 179 | + "Ahern et al. (2016). Predicting the population health impacts of community interventions: the case of alcohol outlets and binge drinking. *AJPH*, 106(11), 1938-1943.\n", |
| 180 | + "\n", |
| 181 | + "Snowden et al. (2011) \"Implementation of G-computation on a simulated data set: demonstration of a causal inference technique.\" *AJE* 173.7: 731-738.\n", |
| 182 | + "\n", |
| 183 | + "Robins. (1986) \"A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect.\" *Mathematical modelling* 7.9-12: 1393-1512" |
| 184 | + ] |
| 185 | + } |
| 186 | + ], |
| 187 | + "metadata": { |
| 188 | + "kernelspec": { |
| 189 | + "display_name": "Python 3", |
| 190 | + "language": "python", |
| 191 | + "name": "python3" |
| 192 | + }, |
| 193 | + "language_info": { |
| 194 | + "codemirror_mode": { |
| 195 | + "name": "ipython", |
| 196 | + "version": 3 |
| 197 | + }, |
| 198 | + "file_extension": ".py", |
| 199 | + "mimetype": "text/x-python", |
| 200 | + "name": "python", |
| 201 | + "nbconvert_exporter": "python", |
| 202 | + "pygments_lexer": "ipython3", |
| 203 | + "version": "3.6.3" |
| 204 | + } |
| 205 | + }, |
| 206 | + "nbformat": 4, |
| 207 | + "nbformat_minor": 2 |
| 208 | +} |
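
The stochastic procedure described in the notebook (randomly treat a share of individuals, predict each outcome, average, repeat, then average the averages) can be sketched outside zEpid as follows. This is a minimal illustration under stated assumptions, not zEpid's implementation: the data are simulated, the outcome model has only two covariates, and the helpers `fit_logistic`, `predict_risk`, and `stochastic_risk` are hypothetical names introduced here.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson; returns the coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def predict_risk(beta, a, male):
    """Predicted risk for each person under treatment vector `a`."""
    X = np.column_stack([np.ones(len(a)), a, male])
    return 1.0 / (1.0 + np.exp(-X @ beta))


def stochastic_risk(beta, male, p, samples=100, seed=None):
    """Average predicted risk when each person is treated with probability p.

    `p` may be a scalar (marginal plan) or a per-person array (conditional plan).
    """
    rng = np.random.default_rng(seed)
    means = []
    for _ in range(samples):
        a = rng.binomial(1, p, size=len(male))       # random treatment assignment
        means.append(predict_risk(beta, a, male).mean())
    return float(np.mean(means))                     # average of the averages


# Simulated data (an assumption for illustration; not the zEpid sample data)
rng = np.random.default_rng(0)
n = 5000
male = rng.binomial(1, 0.7, size=n)
art = rng.binomial(1, 0.4 + 0.1 * male)
true_logit = -1.0 - 0.7 * art + 0.3 * male
dead = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

# Outcome model: dead ~ art + male
X = np.column_stack([np.ones(n), art, male])
beta = fit_logistic(X, dead)

# Deterministic plans for reference
r_none = predict_risk(beta, np.zeros(n), male).mean()
r_all = predict_risk(beta, np.ones(n), male).mean()

# Marginal stochastic plan: everyone treated with probability 0.8
r_80 = stochastic_risk(beta, male, p=0.8, seed=1)

# Conditional stochastic plan: 75% of men, 90% of women
p_cond = np.where(male == 1, 0.75, 0.90)
r_cs = stochastic_risk(beta, male, p=p_cond, seed=2)

print('RD (80% vs none):', r_80 - r_none)
print('RD (conditional vs none):', r_cs - r_none)
```

Because treatment is assigned independently of the covariates in the marginal plan, the stochastic risk should land close to the weighted mix `0.8 * r_all + 0.2 * r_none`; the repeated sampling only averages away the randomness of the assignment.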