PT - JOURNAL ARTICLE AU - Gregory D Berg AU - Virginia F Gurley TI - Development and validation of 15-month mortality prediction models: a retrospective observational comparison of machine-learning techniques in a national sample of Medicare recipients AID - 10.1136/bmjopen-2018-022935 DP - 2019 Jul 01 TA - BMJ Open PG - e022935 VI - 9 IP - 7 4099 - http://bmjopen.bmj.com/content/9/7/e022935.short 4100 - http://bmjopen.bmj.com/content/9/7/e022935.full SO - BMJ Open2019 Jul 01; 9 AB - Objective The objective is to develop and validate a predictive model for 15-month mortality using a random sample of community-dwelling Medicare beneficiaries.Data source The Centres for Medicare & Medicaid Services’ Limited Data Set files containing the five per cent samples for 2014 and 2015.Participants The data analysed contains de-identified administrative claims information at the beneficiary level, including diagnoses, procedures and demographics for 2.7 million beneficiaries.Setting US national sample of Medicare beneficiaries.Study design Eleven different models were used to predict 15-month mortality risk: logistic regression (using both stepwise and least absolute shrinkage and selection operator (LASSO) selection of variables as well as models using an age gender baseline, Charlson scores, Charlson conditions, Elixhauser conditions and all variables), naïve Bayes, decision tree with adaptive boosting, neural network and support vector machines (SVMs) validated by simple cross validation. Updated Charlson score weights were generated from the predictive model using only Charlson conditions.Primary outcome measure C-statistic.Results The c-statistics was 0.696 for the naïve Bayes model and 0.762 for the decision tree model. For models that used the Charlson score or the Charlson variables the c-statistic was 0.713 and 0.726, respectively, similar to the model using Elixhauser conditions of 0.734. The c-statistic for the SVM model was 0.788 while the four models that performed the best were the logistic regression using all variables, logistic regression after selection of variables by the LASSO method, the logistic regression using a stepwise selection of variables and the neural network with c-statistics of 0.798, 0.798, 0.797 and 0.795, respectively.Conclusions Improved means for identifying individuals in the last 15 months of life is needed to improve the patient experience of care and reducing the per capita cost of healthcare. This study developed and validated a predictive model for 15-month mortality with higher generalisability than previous administrative claims-based studies.