External validation of new risk prediction models is infrequent and reveals worse prognostic discrimination

J Clin Epidemiol. 2015 Jan;68(1):25-34. doi: 10.1016/j.jclinepi.2014.09.007. Epub 2014 Oct 23.

Abstract

Objectives: To evaluate how often newly developed risk prediction models undergo external validation and how well they perform in such validations.

Study design and setting: We reviewed derivation studies of newly proposed risk models and their subsequent external validations. Study characteristics, outcome(s), and models' discriminatory performance [area under the curve (AUC)] in derivation and validation studies were extracted. We estimated the probability of a model having an external validation and the change in discriminatory performance, relative to the derivation estimates, under increasingly stringent external validation (by overlapping authors or by different authors).
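As an illustration of the quantity being compared, the following minimal sketch computes an AUC as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, and then the change between derivation and validation. The function names `auc` and `auc_change` and the toy data are hypothetical and are not taken from the study, which extracted published AUC values rather than recomputing them.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5).

    scores: predicted risks; labels: 1 for an event, 0 for no event.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_change(derivation_auc, validation_auc):
    """Negative values mean discrimination was worse on external validation."""
    return validation_auc - derivation_auc


# Toy example (made-up numbers, for illustration only)
toy_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
toy_change = auc_change(0.80, 0.75)
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination, so a negative change indicates a loss of discriminatory performance.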

Results: We evaluated 127 new prediction models. For 32 models (25%), at least one external validation study was identified; for 22 models (17%), the validation had been done by entirely different authors. The probability of having an external validation by different authors within 5 years was 16%. AUC estimates decreased significantly on external validation compared with the derivation study [median AUC change: -0.05 (P < 0.001) overall; -0.04 (P = 0.009) for validation by overlapping authors; -0.05 (P < 0.001) for validation by different authors]. On external validation, AUC decreased by at least 0.03 in 19 models and never increased by at least 0.03 (P < 0.001).
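The abstract does not name the test behind the last P value. As a rough plausibility check only, a two-sided exact sign test on the reported split (19 decreases of at least 0.03 versus 0 increases of at least 0.03) can be sketched as follows; treating it as a sign test is an assumption, and `sign_test_p` is a hypothetical helper, not the authors' method.

```python
from math import comb


def sign_test_p(n_decrease, n_increase):
    """Two-sided exact sign test p-value under H0: decrease and increase equally likely."""
    n = n_decrease + n_increase
    if n == 0:
        raise ValueError("no informative pairs")
    k = min(n_decrease, n_increase)
    # Probability of a split at least as lopsided in one direction
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Split reported in the abstract: 19 decreases, 0 increases
p_value = sign_test_p(19, 0)
```

Under this assumption the p-value is 2 × 0.5^19, roughly 4 × 10^-6, which is consistent with the reported P < 0.001.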

Conclusion: External independent validation of predictive models in different studies is uncommon. Predictive performance may worsen substantially on external validation.

Keywords: Area under the receiver operating characteristics curve; Derivation study; Discrimination; External validation; Prognostic models; Risk prediction model.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Area Under Curve
  • Forecasting
  • Humans
  • Models, Statistical*
  • Probability
  • Prognosis
  • Reproducibility of Results
  • Risk*