The impact of covariate measurement error on risk prediction

Stat Med. 2015 Jul 10;34(15):2353-67. doi: 10.1002/sim.6498. Epub 2015 Apr 10.

Abstract

In the development of risk prediction models, predictors are often measured with error. In this paper, we investigate the impact of covariate measurement error on risk prediction. We compare the prediction performance of a model that uses a costly covariate measured without error, together with error-free covariates, to that of a model that replaces it with an inexpensive error-prone surrogate. We consider continuous error-prone covariates with homoscedastic and heteroscedastic errors, as well as a discrete misclassified covariate. Prediction performance is evaluated by the area under the receiver operating characteristic curve (AUC), the Brier score (BS), and the ratio of the observed to the expected number of events (calibration). In an extensive numerical study, we show that (i) the prediction model with the error-prone covariate is very well calibrated, even when it is mis-specified; (ii) using the error-prone covariate instead of the true covariate can reduce the AUC and increase the BS dramatically; (iii) adding an auxiliary variable that is correlated with the error-prone covariate but conditionally independent of the outcome given all covariates in the true model can improve the AUC and BS substantially. We conclude that reducing measurement error in covariates will improve the ensuing risk prediction, unless the association between the error-free and error-prone covariates is very high. Finally, we demonstrate how a validation study can be used to assess the effect of mismeasured covariates on risk prediction. These concepts are illustrated in a breast cancer risk prediction model developed in the Nurses' Health Study.
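The comparison described in the abstract can be sketched in a small simulation. This is a minimal illustration under assumed parameter values, not the authors' actual simulation design: a true covariate X, a surrogate W = X + U with classical homoscedastic error, an error-free covariate Z, and a logistic outcome model; each fitted model is scored by AUC, Brier score, and the observed-to-expected event ratio.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, brier_score_loss

rng = np.random.default_rng(0)
n = 20000

# Error-free covariate Z and true (costly) covariate X -- hypothetical setup
z = rng.normal(size=n)
x = rng.normal(size=n)
# Inexpensive surrogate W: classical homoscedastic measurement error
w = x + rng.normal(scale=1.0, size=n)

# True risk model: logistic in (X, Z); coefficients are illustrative
lin = -2.0 + 1.5 * x + 0.5 * z
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-lin)))

def evaluate(features):
    """Fit an (effectively unpenalized) logistic model; return AUC, BS, O/E ratio."""
    pred = LogisticRegression(C=1e6).fit(features, y).predict_proba(features)[:, 1]
    return (roc_auc_score(y, pred),        # discrimination
            brier_score_loss(y, pred),     # accuracy of predicted probabilities
            y.sum() / pred.sum())          # observed / expected events (calibration)

auc_x, bs_x, oe_x = evaluate(np.column_stack([x, z]))  # true covariate
auc_w, bs_w, oe_w = evaluate(np.column_stack([w, z]))  # error-prone surrogate
```

With this setup the surrogate-based model loses AUC and gains Brier score relative to the true-covariate model, yet both remain well calibrated (O/E near 1), mirroring findings (i) and (ii) of the paper.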

Keywords: Brier score; ROC-AUC; logistic regression; measurement error; probit regression; risk prediction.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Area Under Curve
  • Calibration
  • Computer Simulation
  • Models, Statistical*
  • Monte Carlo Method
  • Predictive Value of Tests
  • Risk Assessment*