Development and validation of 15-month mortality prediction models: a retrospective observational comparison of machine-learning techniques in a national sample of Medicare recipients
  1. Gregory D Berg (1),
  2. Virginia F Gurley (2)
  1. (1) Analytics, AxisPoint Health, Westminster, Colorado, USA
  2. (2) Medical Affairs, AxisPoint Health, Westminster, Colorado, USA
  1. Correspondence to Dr Gregory D Berg; gregory.d.berg{at}


Objective The objective is to develop and validate a predictive model for 15-month mortality using a random sample of community-dwelling Medicare beneficiaries.

Data source The Centres for Medicare & Medicaid Services’ Limited Data Set files containing the five per cent samples for 2014 and 2015.

Participants The data analysed contains de-identified administrative claims information at the beneficiary level, including diagnoses, procedures and demographics for 2.7 million beneficiaries.

Setting US national sample of Medicare beneficiaries.

Study design Eleven models were used to predict 15-month mortality risk: seven logistic regression models (an age-gender baseline, Charlson scores, Charlson conditions, Elixhauser conditions, all variables, and variables selected by stepwise selection and by the least absolute shrinkage and selection operator (LASSO)), naïve Bayes, a decision tree with adaptive boosting, a neural network and support vector machines (SVMs). The models were validated by simple cross validation. Updated Charlson score weights were generated from the predictive model using only Charlson conditions.

Primary outcome measure C-statistic.
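
For a binary outcome such as death within 15 months, the c-statistic is the probability that a randomly chosen beneficiary who died was assigned a higher predicted risk than a randomly chosen beneficiary who survived, with ties counted as one half. The following is a minimal sketch of that definition, not the authors' code; the function name `c_statistic` and the toy data are assumptions made for illustration only.

```python
from typing import Sequence


def c_statistic(died: Sequence[int], risk: Sequence[float]) -> float:
    """Concordance (c-statistic) for a binary outcome.

    died: 1 if the person died in the follow-up window, else 0.
    risk: predicted mortality risk for the same person.
    Hypothetical helper for illustration; not taken from the study.
    """
    if len(died) != len(risk):
        raise ValueError("died and risk must have the same length")

    risk_died = [r for d, r in zip(died, risk) if d == 1]
    risk_survived = [r for d, r in zip(died, risk) if d == 0]
    if not risk_died or not risk_survived:
        raise ValueError("need at least one death and one survivor")

    # Compare every (died, survived) pair: a pair is concordant when the
    # person who died was given the higher predicted risk.
    concordant = 0.0
    for rd in risk_died:
        for rs in risk_survived:
            if rd > rs:
                concordant += 1.0
            elif rd == rs:
                concordant += 0.5  # ties count as half

    return concordant / (len(risk_died) * len(risk_survived))


# Made-up toy data: two survivors, then two deaths.
toy_died = [0, 0, 1, 1]
toy_risk = [0.10, 0.40, 0.35, 0.80]
print(c_statistic(toy_died, toy_risk))  # 0.75 (3 of 4 pairs concordant)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. The pairwise loop is quadratic in sample size, so a rank-based calculation would be the practical choice for a sample of millions of beneficiaries.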

Results The c-statistic was 0.696 for the naïve Bayes model and 0.762 for the decision tree model. For the models that used the Charlson score or the Charlson variables, the c-statistic was 0.713 and 0.726, respectively, similar to the 0.734 for the model using Elixhauser conditions. The c-statistic for the SVM model was 0.788. The four best-performing models were the logistic regression using all variables, the logistic regression after selection of variables by the LASSO method, the logistic regression using a stepwise selection of variables and the neural network, with c-statistics of 0.798, 0.798, 0.797 and 0.795, respectively.

Conclusions Improved means for identifying individuals in the last 15 months of life are needed to improve the patient experience of care and reduce the per capita cost of healthcare. This study developed and validated a predictive model for 15-month mortality with higher generalisability than previous administrative claims-based studies.

  • terminal care
  • hospice care
  • machine learning
  • classification
  • palliative care

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

  • Contributors Made substantial contributions to the conception and design of the work (GDB, VFG). Acquisition, analysis and interpretation of the data (GDB, VFG). Drafting the work and revising it critically for important intellectual content (GDB, VFG). Final approval of the version to be published (GDB, VFG). Agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved (GDB, VFG).

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Ethics approval The Centres for Medicare & Medicaid Services makes Limited Data Set files available to researchers as allowed by federal laws and regulations as well as CMS policy.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement Data is available for purchase from the Centres for Medicare & Medicaid Services (CMS).

  • Patient consent for publication Not required.
