External validation of new risk prediction models is infrequent and reveals worse prognostic discrimination

George C M Siontis¹, Ioanna Tzoulaki², Peter J Castaldi³, John P A Ioannidis⁴

doi:10.1016/j.jclinepi.2014.09.007

J Clin Epidemiol. 2015 Jan;68(1):25-34. doi: 10.1016/j.jclinepi.2014.09.007. Epub 2014 Oct 23.

Affiliations

¹ Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, University Campus, P.O. Box 1186, 45110 Ioannina, Greece.
² Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, University Campus, P.O. Box 1186, 45110 Ioannina, Greece; Department of Epidemiology and Biostatistics, Imperial College London, Norfolk Place W2 1PG, London, United Kingdom.
³ Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, 181 Longwood Avenue, Boston, MA 02115, USA.
⁴ Department of Medicine, Stanford Prevention Research Center, Stanford University School of Medicine, 1265 Welch Rd, MSOB X306, Stanford, CA 94305, USA; Department of Health Research and Policy, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, CA 94305, USA. Electronic address: jioannid@stanford.edu.

PMID: 25441703

Abstract

Objectives: To evaluate how often newly developed risk prediction models undergo external validation and how well they perform in such validations.

Study design and setting: We reviewed derivation studies of newly proposed risk models and their subsequent external validations. We extracted study characteristics, outcome(s), and the models' discriminatory performance (area under the curve [AUC]) in derivation and validation studies. We estimated the probability that a model had an external validation, and the change in discriminatory performance relative to the derivation estimates, for validation by overlapping authors and for the more stringent validation by entirely different authors.
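For a binary outcome, the AUC is the probability that a randomly chosen patient who has the outcome receives a higher predicted risk than a randomly chosen patient who does not, with ties counted as one half. The review extracted AUCs as reported in the published studies; the minimal Python sketch below only illustrates what that number measures and how a derivation-to-validation change is formed. The function name `auc` and both toy datasets are hypothetical and are not data from this review.

```python
from itertools import product


def auc(scores, labels):
    """Concordance-based AUC for a binary outcome (labels: 1 = event, 0 = non-event).

    Scores every (event, non-event) pair: 1 if the event has the higher
    predicted risk, 0.5 for a tie, 0 otherwise, then averages over pairs.
    """
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e, n in product(events, non_events):
        if e > n:
            score += 1.0
        elif e == n:
            score += 0.5
    return score / (len(events) * len(non_events))


# Hypothetical toy data: predicted risks and observed outcomes.
derivation_auc = auc([0.9, 0.8, 0.7, 0.3, 0.2, 0.1], [1, 1, 0, 1, 0, 0])
validation_auc = auc([0.9, 0.6, 0.5, 0.4, 0.3, 0.2], [1, 0, 1, 0, 1, 0])
change = validation_auc - derivation_auc  # negative = worse discrimination
print(f"{derivation_auc:.3f} {validation_auc:.3f} {change:+.3f}")  # 0.889 0.667 -0.222
```

The pairwise loop is quadratic in sample size and is written for readability; a rank-based Mann-Whitney U formula gives the same value more efficiently on large samples.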

Results: We evaluated 127 new prediction models. For 32 models (25%), at least one external validation study was identified; for 22 models (17%), the validation had been done by entirely different authors. The probability of having an external validation by different authors within 5 years was 16%. AUC estimates decreased significantly in external validation compared with the derivation study [median AUC change: -0.05 (P < 0.001) overall; -0.04 (P = 0.009) for validation by overlapping authors; -0.05 (P < 0.001) for validation by different authors]. On external validation, the AUC decreased by at least 0.03 in 19 models and never increased by at least 0.03 (P < 0.001).
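The abstract does not name the test behind the final P value. One assumption consistent with the reported P < 0.001 is an exact two-sided sign test on the models whose AUC changed by at least 0.03 in either direction: if large decreases and large increases were equally likely, observing 19 decreases and 0 increases has probability 2 × 0.5^19 ≈ 3.8 × 10⁻⁶. The hypothetical helper `sign_test_two_sided` below is a minimal sketch of that calculation, not the authors' analysis code.

```python
from math import comb


def sign_test_two_sided(n_down: int, n_up: int) -> float:
    """Exact two-sided sign test.

    Under the null hypothesis each informative model is equally likely to
    move down or up, so the number of decreases is Binomial(n, 0.5).
    """
    n = n_down + n_up
    k = min(n_down, n_up)
    # One-tail probability of a split at least as lopsided as observed.
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    # Double for two sides; cap at 1 when the split is balanced.
    return min(1.0, 2 * one_tail)


# Counts from the abstract: 19 models with AUC drops >= 0.03, none with rises >= 0.03.
p = sign_test_two_sided(19, 0)
print(f"{p:.2e}")  # 3.81e-06
```

Models whose AUC moved by less than 0.03 are treated as uninformative here and drop out of the count, which is the usual convention for a sign test with a threshold.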

Conclusion: External independent validation of predictive models in different studies is uncommon. Predictive performance may worsen substantially on external validation.

Keywords: Area under the receiver operating characteristic curve; Derivation study; Discrimination; External validation; Prognostic models; Risk prediction model.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Area Under Curve
Forecasting
Humans
Models, Statistical*
Probability
Prognosis
Reproducibility of Results
Risk*