Machine learning with sparse nutrition data to improve cardiovascular mortality risk prediction in the USA using nationally randomly sampled data
Joseph Rigdon (1), Sanjay Basu (2)

  1. Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
  2. Center for Primary Care, Harvard Medical School, Boston, Massachusetts, USA

Correspondence to Dr Joseph Rigdon; jrigdon{at}wakehealth.edu

Abstract

Objectives We aimed to test whether (1) adding nutrition predictor variables and/or (2) using machine learning models improves cardiovascular death prediction compared with standard Cox models without nutrition predictor variables.

Design Retrospective study.

Setting Six waves of National Health and Nutrition Examination Survey (NHANES) data collected from 1999 to 2011 linked to the National Death Index (NDI).

Participants 29 390 participants were included in the training set for model derivation and 12 600 were included in the test set for model evaluation. Our study sample was approximately 20% black race and 25% Hispanic ethnicity.

Primary and secondary outcome measures Time from NHANES interview until cardiovascular death or censoring, whichever occurred first.
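As an illustration of this outcome definition, the following minimal sketch builds a (time, event) pair for one participant. The `Followup` record, its field names, and the use of months as the time unit are hypothetical choices made for this example and are not taken from the study's data files.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Followup:
    # Hypothetical fields: follow-up times measured in months from the interview.
    months_to_cvd_death: Optional[float]  # None if no cardiovascular death was observed
    months_to_censoring: float            # end of follow-up or death from another cause


def survival_outcome(f: Followup) -> Tuple[float, int]:
    """Return (time, event), where time is the minimum of death and censoring time."""
    died_first = (
        f.months_to_cvd_death is not None
        and f.months_to_cvd_death <= f.months_to_censoring
    )
    if died_first:
        return f.months_to_cvd_death, 1  # event indicator 1: cardiovascular death
    return f.months_to_censoring, 0      # event indicator 0: censored
```

For example, a participant with a cardiovascular death at 30 months and follow-up ending at 120 months would contribute (30.0, 1), while a participant with no observed cardiovascular death and 84 months of follow-up would contribute (84.0, 0).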

Results A standard risk model excluding nutrition data overestimated risk nearly two-fold (calibration slope of predicted vs true risk: 0.53 (95% CI: 0.50 to 0.55)) with moderate discrimination (C-statistic: 0.87 (0.86 to 0.89)). Nutrition data alone failed to improve performance, while machine learning alone improved the calibration slope to 1.18 (0.92 to 1.44) and the C-statistic to 0.91 (0.90 to 0.92). Both together substantially improved calibration (slope: 1.01 (0.76 to 1.27)) and discrimination (C-statistic: 0.93 (0.92 to 0.94)).
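To make the two metrics concrete, here is a minimal sketch of how a C-statistic and a calibration slope might be computed. It assumes Harrell's concordance index for right-censored data and an ordinary least-squares slope of observed risk on predicted risk; the abstract does not state which estimators the authors used, so both function names and both choices are assumptions made for illustration.

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[int],
              risks: Sequence[float]) -> float:
    """Harrell's C: share of comparable pairs in which the participant who
    died earlier was assigned the higher predicted risk (risk ties count 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is anchored on a participant with an observed event
        for j in range(n):
            if times[j] > times[i]:  # j was still under follow-up when i died
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


def calibration_slope(predicted: Sequence[float],
                      observed: Sequence[float]) -> float:
    """Least-squares slope of observed risk regressed on predicted risk,
    e.g. across risk groups. A slope of 1 indicates ideal calibration."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    sxy = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    return sxy / sxx
```

Under this reading, a slope near 0.5 means observed risk is about half of predicted risk, which corresponds to the roughly two-fold overestimation reported for the standard model, and a slope near 1 means predicted and observed risks agree.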

Conclusion Our results indicate that the inclusion of nutrition data with available machine learning algorithms can substantially improve cardiovascular risk prediction.

  • cardiovascular disease
  • machine learning
  • nutrition
  • risk prediction

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Footnotes

  • Contributors SB conceptualised the study and design and contributed to data preparation and analysis. JR contributed to data preparation and analysis. Both authors contributed to writing and critically reviewing the manuscript.

  • Funding This work was supported by the National Institute on Minority Health and Health Disparities of the National Institutes of Health under Award Number DP2MD010478.

  • Disclaimer The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data are available upon reasonable request.