Article

How to obtain valid tests and confidence intervals after propensity score variable selection?

Journal

Statistical Methods in Medical Research
Volume 29, Issue 3, Pages 677-694

Publisher

SAGE Publications Ltd
DOI: 10.1177/0962280219862005

Keywords

Causal inference; double robustness; variable selection; model uncertainty; high-dimensional statistics

Funding

  1. Research Foundation - Flanders (FWO) [1S05916N]
  2. FWO Research Project [G016116N]
  3. Special Research Fund (BOF) [BOF.244.2017.0004.01]


The problem of how best to select variables for confounding adjustment is one of the key challenges in evaluating exposure or treatment effects in observational studies. Routine practice is often based on stepwise selection procedures that use hypothesis testing, change-in-estimate assessments or the lasso, all of which have been criticised for, amongst other things, not giving sufficient priority to the selection of confounders. This has prompted vigorous recent activity in developing procedures that prioritise the selection of confounders, while preventing the selection of so-called instrumental variables that are associated with exposure, but not outcome (after adjustment for the exposure). A major drawback of all these procedures is that there is no finite sample size at which they are guaranteed to deliver treatment effect estimators and associated confidence intervals with adequate performance. This is the result of the estimator jumping back and forth between different selected models, and standard confidence intervals ignoring the resulting model selection uncertainty. In this paper, we develop insight into this by evaluating the finite-sample distribution of the exposure effect estimator in linear regression, under a number of the aforementioned confounder selection procedures. We show that by making clever use of propensity scores, a simple and generic solution is obtained in the context of generalized linear models, which overcomes this concern (under weaker conditions than competing proposals). Specifically, we propose to use separate regularized regressions for the outcome and propensity score models in order to construct a doubly robust 'g-estimator'; when these models are sufficiently sparse and correctly specified, standard confidence intervals for the g-estimator implicitly incorporate the uncertainty induced by the variable selection procedure.
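To make the proposal concrete, here is a minimal sketch of the idea in the abstract: one regularized regression for the outcome, a separate one for the exposure ("propensity") model, then a doubly robust g-estimator that solves sum_i (A_i - p_hat_i)(Y_i - psi*A_i - m_hat_i) = 0 with a plain sandwich standard error. It assumes a continuous exposure with linear outcome and exposure models, a fixed, untuned lasso penalty `lam`, and least-squares refits on the selected covariates. These choices and the helper names (`lasso_cd`, `g_estimate`) are hypothetical illustrations, not the authors' exact procedure.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=100, unpenalized=()):
    """Coordinate-descent lasso minimising (1/2n)||y - b0 - Xb||^2 + lam * sum_j |b_j|.
    Columns listed in `unpenalized` carry no penalty; the intercept b0 is handled
    by centring X and y."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    r = y - y.mean()                      # residual when all b_j = 0
    b = np.zeros(p)
    col_sq = (Xc ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r += Xc[:, j] * b[j]          # partial residual without coordinate j
            rho = Xc[:, j] @ r / n
            pen = 0.0 if j in unpenalized else lam
            b[j] = soft_threshold(rho, pen) / col_sq[j]
            r -= Xc[:, j] * b[j]
    return b


def with_intercept(X):
    return np.column_stack([np.ones(X.shape[0]), X])


def g_estimate(Y, A, L, lam=0.1):
    """Hypothetical sketch: doubly robust g-estimator after separate lasso
    selection for the outcome model and the exposure model."""
    # 1. Outcome model E[Y | A, L] = psi*A + beta'L, lasso with A unpenalised.
    b_out = lasso_cd(np.column_stack([A, L]), Y, lam, unpenalized=(0,))
    S_out = np.flatnonzero(np.abs(b_out[1:]) > 1e-8)
    # 2. Exposure model E[A | L] = alpha'L (continuous exposure assumed).
    b_ps = lasso_cd(L, A, lam)
    S_ps = np.flatnonzero(np.abs(b_ps) > 1e-8)
    # 3. Least-squares refits on the selected covariates.
    Xo = with_intercept(np.column_stack([A, L[:, S_out]]))
    coef_o, *_ = np.linalg.lstsq(Xo, Y, rcond=None)
    m_hat = Xo @ coef_o - coef_o[1] * A   # outcome fit with the A term removed
    Xp = with_intercept(L[:, S_ps])
    coef_p, *_ = np.linalg.lstsq(Xp, A, rcond=None)
    p_hat = Xp @ coef_p
    # 4. Solve sum_i (A_i - p_hat_i)(Y_i - psi*A_i - m_hat_i) = 0 for psi.
    resid_a = A - p_hat
    psi = resid_a @ (Y - m_hat) / (resid_a @ A)
    # 5. Ordinary sandwich standard error; no explicit correction for selection.
    U = resid_a * (Y - psi * A - m_hat)
    se = np.sqrt(U @ U) / abs(resid_a @ A)
    return psi, se, (psi - 1.96 * se, psi + 1.96 * se)


# Simulated example: L0 is a confounder, L1 affects only the exposure,
# L2 affects only the outcome; the true exposure effect is 1.0.
rng = np.random.default_rng(0)
n, p = 2000, 20
L = rng.normal(size=(n, p))
A = L[:, 0] + 0.5 * L[:, 1] + rng.normal(size=n)
Y = 1.0 * A + L[:, 0] + L[:, 2] + rng.normal(size=n)
psi, se, ci = g_estimate(Y, A, L)
print(f"psi = {psi:.3f}, se = {se:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The design choice worth noting is that the confidence interval in step 5 is the ordinary one for a fixed estimating equation. Per the abstract, this is expected to remain adequate only when both working models are sufficiently sparse and correctly specified. If the outcome lasso happens to drop the confounder L0, the exposure model can still pick it up, which is where double robustness helps.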
