Add docs on justifying instruments in the IV approach #345

Merged
merged 27 commits into from
Jun 18, 2024
fcf2626
iv and multiple instruments
NathanielF Jun 3, 2024
b92b606
Merge branch 'main' into iv_weak_instruments
NathanielF Jun 7, 2024
b4722b6
Add jax as dependency re-configure IV class fit call
NathanielF Jun 7, 2024
a1d5e8a
add argument for doctest
NathanielF Jun 7, 2024
aecb24f
try and improve test coverage
NathanielF Jun 8, 2024
85dbd74
running ppc checks on iv test
NathanielF Jun 8, 2024
b81d878
updated write up
NathanielF Jun 9, 2024
cc0dfa9
tidying plots
NathanielF Jun 9, 2024
9b40b07
add single instrument f-test
NathanielF Jun 10, 2024
3bf51cf
tighter writing
NathanielF Jun 11, 2024
bbefa9d
add DAG
NathanielF Jun 11, 2024
8b95a0b
tidying
NathanielF Jun 11, 2024
70d4430
further tidying
NathanielF Jun 12, 2024
47b1a06
tidying text and graphs
NathanielF Jun 12, 2024
729f414
further tidying
NathanielF Jun 12, 2024
8bdf46b
fix test bug and rasterise
NathanielF Jun 13, 2024
9be8622
refine last paragraph
NathanielF Jun 13, 2024
1ad5774
trying to lighten notebook
NathanielF Jun 13, 2024
7f87402
add jax to pyproject.toml
NathanielF Jun 13, 2024
fe7107d
remove jax dependency and test on jax
NathanielF Jun 13, 2024
5323acb
trying to make it smaller again
NathanielF Jun 13, 2024
a22b4ef
addressing some of Ben's comments
NathanielF Jun 17, 2024
bff8260
tidy the headings
NathanielF Jun 17, 2024
f4ca088
Merge branch 'main' into iv_weak_instruments
NathanielF Jun 17, 2024
4a56262
run ruff-formatter
NathanielF Jun 17, 2024
5ed005b
addressing Ben's final comments.
NathanielF Jun 18, 2024
9eb41f1
minor change to phrasing
NathanielF Jun 18, 2024
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -22,6 +22,7 @@ repos:
exclude_types: [svg]
- id: check-yaml
- id: check-added-large-files
+exclude: &exclude_pattern 'iv_weak_instruments.ipynb'
Collaborator

Am I seeing right that the use of rasterize hasn't actually impacted the filesize of the notebook?

Contributor Author

I think it dropped it from 6.5 to 3.5, but that was not enough to avoid the large file check, so I added this...

Collaborator

oh, well I'll certainly take that drop in filesize 👍🏻

args: ["--maxkb=1500"]
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.4.8
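
For context on the size check discussed in the comments above, here is a rough sketch of what a large-file check with an exclusion pattern does. It is a hypothetical stand-in, not the pre-commit hook's implementation: `find_large_files`, its parameters, and the demo file sizes are all assumptions made for illustration.

```python
import os
import re
import tempfile


def find_large_files(paths, max_kb=1500, exclude_pattern=None):
    """Return [(path, size_kb)] for files larger than max_kb.

    Paths matching exclude_pattern (a regex) are skipped, loosely
    mirroring the `exclude` key in the pre-commit config.
    """
    flagged = []
    for path in paths:
        if exclude_pattern and re.search(exclude_pattern, path):
            continue
        size_kb = os.path.getsize(path) / 1024
        if size_kb > max_kb:
            flagged.append((path, round(size_kb, 1)))
    return flagged


# Hypothetical demo: throwaway files standing in for notebooks.
with tempfile.TemporaryDirectory() as tmp:
    small = os.path.join(tmp, "small.ipynb")
    large = os.path.join(tmp, "iv_weak_instruments.ipynb")
    with open(small, "wb") as f:
        f.write(b"0" * 10 * 1024)  # 10 KB, under the limit
    with open(large, "wb") as f:
        f.write(b"0" * 2000 * 1024)  # 2000 KB, over the limit

    without_exclude = find_large_files([small, large])
    with_exclude = find_large_files(
        [small, large], exclude_pattern=r"iv_weak_instruments\.ipynb"
    )

print(len(without_exclude), len(with_exclude))
```

With the exclusion in place the oversized notebook is no longer flagged, which is the effect the added `exclude` line is meant to have.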
1 change: 1 addition & 0 deletions causalpy/data/datasets.py
@@ -35,6 +35,7 @@
"geolift1": {"filename": "geolift1.csv"},
"risk": {"filename": "AJR2001.csv"},
"nhefs": {"filename": "nhefs.csv"},
+"schoolReturns": {"filename": "schoolingReturns.csv"},
}


3,011 changes: 3,011 additions & 0 deletions causalpy/data/schoolingReturns.csv

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion causalpy/pymc_experiments.py
@@ -1453,7 +1453,7 @@ def __init__(
"mus": [self.ols_beta_first_params, self.ols_beta_second_params],
"sigmas": [1, 1],
"eta": 2,
-"lkj_sd": 2,
+"lkj_sd": 1,
}
self.priors = priors
self.model.fit(
54 changes: 41 additions & 13 deletions causalpy/pymc_models.py
@@ -303,8 +303,8 @@
... "mus": [[-2,4], [0.5, 3]],
... "sigmas": [1, 1],
... "eta": 2,
-... "lkj_sd": 2,
-... })
+... "lkj_sd": 1,
+... }, None)
Inference data...
"""

@@ -340,7 +340,7 @@
sigma=priors["sigmas"][1],
dims="covariates",
)
-sd_dist = pm.HalfCauchy.dist(beta=priors["lkj_sd"], shape=2)
+sd_dist = pm.Exponential.dist(priors["lkj_sd"], shape=2)
chol, corr, sigmas = pm.LKJCholeskyCov(
name="chol_cov",
eta=priors["eta"],
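
To give a feel for what swapping the standard-deviation prior from a half-Cauchy to an exponential implies, the sketch below compares closed-form quantiles of the two distributions using only the standard library. It assumes the old prior is HalfCauchy with `beta=2` and the new one is Exponential with rate `lam=1` (reading the positional argument of `pm.Exponential.dist` as the rate); the helper names are hypothetical.

```python
import math


def half_cauchy_quantile(p, beta):
    """Quantile of HalfCauchy(beta): CDF is (2/pi) * atan(x / beta)."""
    return beta * math.tan(math.pi * p / 2)


def exponential_quantile(p, lam):
    """Quantile of Exponential with rate lam: CDF is 1 - exp(-lam * x)."""
    return -math.log(1 - p) / lam


# Compare the old (beta=2) and new (lam=1) priors at a few probabilities.
for p in (0.5, 0.9, 0.99):
    old = half_cauchy_quantile(p, beta=2)
    new = exponential_quantile(p, lam=1)
    print(f"p={p}: HalfCauchy(2) -> {old:.2f}, Exponential(1) -> {new:.2f}")
```

The half-Cauchy puts far more prior mass on very large standard deviations, so under these assumptions the exponential prior is the more strongly regularising choice.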
@@ -366,24 +366,52 @@
shape=(X.shape[0], 2),
)

-    def fit(self, X, Z, y, t, coords, priors):
-        """Draw samples from posterior, prior predictive, and posterior predictive
-        distributions.
+    def sample_predictive_distribution(self, ppc_sampler="jax"):
Collaborator

Checking you still want the default to be jax given that we now don't have this as a project dependency?

Contributor Author

I think so. It's not going to be invoked in the fit method. But in the notebook I want to suggest that JAX is the strongly recommended route to sampling the ppc on the instrumental variable class.

+        """Function to sample the Multivariate Normal posterior predictive
+        Likelihood term in the IV class. This can be slow without
+        using the JAX sampler compilation method. If using the
+        JAX sampler it will sample only the posterior predictive distribution.
+        If using the PYMC sampler it will sample both the prior
+        and posterior predictive distributions."""
+        random_seed = self.sample_kwargs.get("random_seed", None)
+
+        if ppc_sampler == "jax":
+            with self:
+                self.idata.extend(
+                    pm.sample_posterior_predictive(
+                        self.idata,
+                        random_seed=random_seed,
+                        compile_kwargs={"mode": "JAX"},
+                    )
+                )
+        elif ppc_sampler == "pymc":
+            with self:
+                self.idata.extend(pm.sample_prior_predictive(random_seed=random_seed))
+                self.idata.extend(
+                    pm.sample_posterior_predictive(
+                        self.idata,
+                        random_seed=random_seed,
+                    )
+                )

Codecov / codecov/patch: check warning on line 380 in causalpy/pymc_models.py. Added lines #L379 - L380 were not covered by tests.

+    def fit(self, X, Z, y, t, coords, priors, ppc_sampler=None):
+        """Draw samples from posterior distribution and potentially
+        from the prior and posterior predictive distributions. The
+        fit call can take values for the
+        ppc_sampler = ['jax', 'pymc', None]
+        We default to None, so the user can determine if they wish
+        to spend time sampling the posterior predictive distribution
+        independently.
         """

         # Ensure random_seed is used in sample_prior_predictive() and
         # sample_posterior_predictive() if provided in sample_kwargs.
         random_seed = self.sample_kwargs.get("random_seed", None)
+        # Use JAX for ppc sampling of multivariate likelihood

         self.build_model(X, Z, y, t, coords, priors)
         with self:
             self.idata = pm.sample(**self.sample_kwargs)
-            self.idata.extend(pm.sample_prior_predictive(random_seed=random_seed))
-            self.idata.extend(
-                pm.sample_posterior_predictive(
-                    self.idata, progressbar=False, random_seed=random_seed
-                )
-            )
+        self.sample_predictive_distribution(ppc_sampler=ppc_sampler)
         return self.idata
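
The control flow introduced by this hunk can be sketched without PyMC. The class below is a hypothetical stand-in (`ToyIVModel` is not a CausalPy class) that records step labels instead of sampling, to show which predictive draws each `ppc_sampler` value would trigger under the logic in the diff.

```python
class ToyIVModel:
    """Hypothetical stand-in that logs steps instead of calling PyMC."""

    def __init__(self):
        self.steps = []

    def sample_predictive_distribution(self, ppc_sampler="jax"):
        if ppc_sampler == "jax":
            # JAX route: posterior predictive only.
            self.steps.append("posterior_predictive[jax]")
        elif ppc_sampler == "pymc":
            # PyMC route: prior predictive, then posterior predictive.
            self.steps.append("prior_predictive")
            self.steps.append("posterior_predictive")
        # Any other value, including None, samples nothing.

    def fit(self, ppc_sampler=None):
        self.steps.append("posterior")
        self.sample_predictive_distribution(ppc_sampler=ppc_sampler)
        return self.steps


default_steps = ToyIVModel().fit()
pymc_steps = ToyIVModel().fit(ppc_sampler="pymc")

# Fit first, then opt in to predictive sampling later, as the updated test does.
deferred = ToyIVModel()
deferred.fit()
deferred.sample_predictive_distribution(ppc_sampler="jax")

print(default_steps, pymc_steps, deferred.steps)
```

Note that the method's own default is `"jax"` while `fit` passes `None`, so a plain `fit` call draws only the posterior and predictive sampling becomes an explicit follow-up step.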


1 change: 1 addition & 0 deletions causalpy/tests/test_integration_pymc_examples.py
@@ -504,6 +504,7 @@ def test_iv_reg():
sample_kwargs=sample_kwargs
),
)
+result.model.sample_predictive_distribution(ppc_sampler="pymc")
assert isinstance(df, pd.DataFrame)
assert isinstance(data, pd.DataFrame)
assert isinstance(instruments_data, pd.DataFrame)
6 changes: 3 additions & 3 deletions docs/source/_static/interrogate_badge.svg
1 change: 1 addition & 0 deletions docs/source/examples.rst
@@ -68,6 +68,7 @@ Instrumental Variables Regression
:titlesonly:

notebooks/iv_pymc.ipynb
+notebooks/iv_weak_instruments.ipynb

Inverse Propensity Score Weighting
=================================
9 changes: 7 additions & 2 deletions docs/source/glossary.rst
@@ -46,6 +46,10 @@ Glossary
Endogenous Variable
An endogenous variable is a variable in a regression equation such that the variable is correlated with the error term of the equation i.e. correlated with the outcome variable (in the system). This is a problem for OLS regression estimation techniques because endogeneity violates the assumptions of the Gauss-Markov theorem.

+Local Average Treatment Effect
+LATE
+Also known as the complier average causal effect (CACE), this is the effect of a treatment for subjects who comply with the experimental treatment assigned to their sample group. It is the quantity we estimate in IV designs.
+
Non-equivalent group designs
NEGD
A quasi-experimental design where units are assigned to conditions non-randomly, and not according to a running variable (see Regression discontinuity design). This can be problematic when assigning causal influence of the treatment - differences in outcomes between groups could be due to the treatment or due to differences in the group attributes themselves.
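
As a worked illustration of the LATE entry in this hunk, the sketch below simulates a binary instrument with compliers and never-takers and computes the Wald ratio. The data-generating process, the complier effect of 2.0, and every name in the code are assumptions made for illustration; none of it is taken from the PR or its notebook.

```python
import random

rng = random.Random(0)
TRUE_EFFECT = 2.0  # assumed treatment effect for compliers

rows = []
for _ in range(20_000):
    z = 1 if rng.random() < 0.5 else 0  # randomly assigned instrument
    complier = rng.random() < 0.6  # everyone else is a never-taker
    t = 1 if (z == 1 and complier) else 0  # treatment actually taken
    baseline = 1.0 if complier else 0.0  # compliers differ at baseline
    y = baseline + TRUE_EFFECT * t + rng.gauss(0, 1)
    rows.append((z, t, y))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Effect of the instrument on the outcome and on treatment take-up.
itt_y = mean(y for z, t, y in rows if z == 1) - mean(y for z, t, y in rows if z == 0)
itt_t = mean(t for z, t, y in rows if z == 1) - mean(t for z, t, y in rows if z == 0)

late = itt_y / itt_t  # Wald estimator of the LATE
naive = mean(y for z, t, y in rows if t == 1) - mean(y for z, t, y in rows if t == 0)

print(f"LATE (Wald): {late:.2f}, naive treated-vs-untreated: {naive:.2f}")
```

In this setup the naive comparison is biased upward because compliers have a higher baseline outcome, while the Wald ratio should land close to the assumed complier effect.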
@@ -62,6 +66,9 @@ Glossary
Pretest-posttest design
A quasi-experimental design where the treatment effect is estimated by comparing an outcome measure before and after treatment.

+Propensity scores
+An estimate of the probability of adopting a treatment status. Used in re-weighting schemes to balance observational data.
+
Quasi-experiment
An empirical comparison used to estimate the effects of a treatment where units are not assigned to conditions at random.

@@ -101,8 +108,6 @@ Glossary
2SLS
An estimation technique for estimating the parameters of an IV regression. It takes its name from the fact that it uses two OLS regressions - a first and second stage.

-Propensity scores
-An estimate of the probability of adopting a treatment status. Used in re-weighting schemes to balance observational data.
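
The 2SLS entry in this hunk describes two OLS stages; a minimal sketch of that idea with one instrument and one endogenous regressor follows. The simulated data, the coefficient values, and the `ols` helper are assumptions for illustration only, and the example uses plain Python rather than CausalPy.

```python
import random

rng = random.Random(1)
TRUE_BETA = 1.5  # assumed causal effect of t on y

z, t, y = [], [], []
for _ in range(20_000):
    u = rng.gauss(0, 1)  # unobserved confounder
    zi = rng.gauss(0, 1)  # instrument, independent of u
    ti = 0.8 * zi + u + rng.gauss(0, 1)  # treatment depends on z and u
    yi = TRUE_BETA * ti + 2.0 * u + rng.gauss(0, 1)
    z.append(zi)
    t.append(ti)
    y.append(yi)


def ols(x, target):
    """One-regressor OLS; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(target) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, target))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# First stage: regress the treatment on the instrument.
a1, b1 = ols(z, t)
t_hat = [a1 + b1 * zi for zi in z]

# Second stage: regress the outcome on the fitted treatment.
_, beta_2sls = ols(t_hat, y)

# Plain OLS of y on t, for comparison.
_, beta_ols = ols(t, y)

print(f"2SLS: {beta_2sls:.2f}, OLS: {beta_ols:.2f}")
```

Because the confounder `u` raises both `t` and `y`, plain OLS overstates the effect here, while the second-stage slope should sit near the assumed value.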


References
3,556 changes: 3,556 additions & 0 deletions docs/source/notebooks/iv_weak_instruments.ipynb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/source/quasi_dags.ipynb
@@ -349,7 +349,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"One nice feature of this set up is that we can evaluate the claim of __strong ignorability__ because it implies that $T \\perp\\!\\!\\!\\perp X | PS(X)$ and this ensures the covariate profiles are balanced across the treatment branches conditional on the propensity score. This is a testable implication of the postulated design! Balance plots and measures are ways in which to evaluate if the offset achieved by your propensity score has worked. It is crucial that PS serve as a balancing score, if the measure cannot serve as a balancing score the collision effect can add to the confounding bias rather than remove it. "
+"One nice feature of this set up is that we can evaluate the claim of __strong ignorability__ because it implies that $Z \\perp\\!\\!\\!\\perp X | PS(X)$ and this ensures the covariate profiles are balanced across the treatment branches conditional on the propensity score. This is a testable implication of the postulated design! Balance plots and measures are ways in which to evaluate if the offset achieved by your propensity score has worked. It is crucial that PS serve as a balancing score, if the measure cannot serve as a balancing score the collision effect can add to the confounding bias rather than remove it. "
]
},
{
9 changes: 9 additions & 0 deletions docs/source/references.bib
@@ -76,6 +76,15 @@ @article{acemoglu2001colonial
year={2001}
}

+@incollection{card1995returns,
+author={Card, David},
+title={Using Geographical Variation in College Proximity to Estimate the Return to Schooling},
+editor={Christofides, L.N. and Grant, E.K. and Swidinsky, R.},
+booktitle={Aspects of Labour Market Behaviour: Essays in Honour of John Vanderkamp},
+year={1995},
+publisher={University of Toronto Press}
+}
+
@incollection{forde2024nonparam,
author = {Forde, Nathaniel},
title = {Bayesian Non-parametric Causal Inference},
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -39,7 +39,7 @@ dependencies = [
"scipy",
"seaborn>=0.11.2",
"statsmodels",
-"xarray>=v2022.11.0",
+"xarray>=v2022.11.0"
]

# List additional groups of dependencies here (e.g. development dependencies). Users