Introduction Left ventricular ejection fraction (LVEF) ≤35%, currently the main indication for an implantable cardioverter-defibrillator (ICD) in the primary prevention of sudden cardiac death (SCD) in heart failure (HF) patients, is widely recognised to be inefficient. Better selection among patients with low LVEF (≤35%) is needed to optimise the deployment of ICDs. Most existing prediction models are not suited to identifying ICD candidates at high risk of SCD among HF patients with low LVEF. Compared with traditional statistical analysis, machine learning (ML) can employ computer algorithms to identify patterns in large datasets, analyse rules automatically and build both linear and non-linear models in order to make data-driven predictions. This study aims to develop and validate new ML models to improve the prediction of SCD in HF patients with low LVEF.
Methods and analysis We will conduct a retroprospective, multicentre, observational registry of Chinese HF patients with low LVEF. HF patients with LVEF ≤35% after at least 3 months of optimised medication will be enrolled in this study. The primary endpoints are all-cause death and SCD. The secondary endpoints are malignant arrhythmia, sudden cardiac arrest, cardiopulmonary resuscitation and rehospitalisation due to HF. Baseline demographic, clinical, biological, electrophysiological, social and psychological variables will be collected. Both ML and traditional multivariable Cox proportional hazards regression models will be developed and compared in the prediction of SCD. Moreover, the ML model will be validated in a prospective study.
Ethics and dissemination The study protocol has been approved by the Ethics Committee of the First Affiliated Hospital of Nanjing Medical University (2017-SR-06). All results of this study will be published in international peer-reviewed journals and presented at relevant conferences.
Trial registration number ChiCTR-POC-17011842; Pre-results.
- Heart Failure
- Sudden Cardiac Death
- Machine Learning
- Risk Model
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
This study is the first multicentre registry study in China to investigate the feasibility and accuracy of applying machine learning (ML) to predict sudden cardiac death (SCD) in heart failure (HF) patients with low left ventricular ejection fraction (LVEF).
A broad range of outcomes, including SCD, all-cause death, lethal arrhythmia, sudden cardiac arrest, cardiopulmonary resuscitation and rehospitalisation due to HF, will be evaluated in this study, and the corresponding prognostic models will be developed.
ML and the traditional multivariable Cox proportional hazards regression model will be derived from the same database and be compared.
By design, HF patients with LVEF >35% will not be included, which restricts the applicability of the results to HF with low LVEF.
For some patients it may be difficult to determine the endpoint, particularly SCD, lethal arrhythmia and sudden cardiac arrest occurring outside the hospital.
Heart failure (HF) has become a major public health problem with increased prevalence in both Asian and Western countries. The prevalence of HF in Asia is 1.2%–6.7% depending on the population studied.1 In China, there are 4.2 million HF patients, and 500 000 new cases are diagnosed each year.1 Although survival after HF diagnosis has improved owing to advances in medical therapy, the mortality of HF remains high. Around 50% of people diagnosed with HF will die within 5 years.2 The two most common causes of death in patients with HF are sudden cardiac death (SCD) and progressive pump failure. SCD in HF patients is usually caused by lethal arrhythmias such as ventricular tachycardia or ventricular fibrillation, and is reported to be responsible for ~50% of all cardiovascular deaths in HF patients.3 4
The most effective strategy for prevention of SCD in patients with HF is the implantable cardioverter-defibrillator (ICD), which is associated with a 54% relative risk reduction in primary prevention,5 and a 50% relative risk reduction in arrhythmia-related death in secondary prevention.6 The risk of SCD is higher in patients with left ventricular ejection fraction (LVEF) ≤35% than in those with LVEF >35%.7 At present, LVEF ≤35% is the major ICD indication for primary prevention of SCD.8 However, real-world data show that only 3%–5% of primary prevention ICD patients with LVEF ≤35% receive shock therapies on an annual basis,9 whereas some SCD victims have LVEF >35%.10 11 Identifying the patients most likely to benefit from a primary prevention ICD is urgently needed. Based on the latest literature, LVEF ≤35% is still an independent predictor of all-cause and cardiovascular mortality in chronic systolic HF, and displays a better combination of sensitivity and specificity than a 40% cut-off.12 Evaluating SCD risk within the population with low EF will therefore be more efficient and economically significant. Furthermore, a noticeable decline in the rate of SCD in HF patients with reduced LVEF has been observed, consistent with the cumulative benefit of optimised medication including ACE inhibitor (ACEI) or angiotensin receptor blocker (ARB), beta-blocker and mineralocorticoid receptor antagonist (MRA).13 Therefore, it is imperative to update the criterion for ICD implantation.
Over the last decade, many multivariate prognostic models for chronic HF patients have been proposed (table 1).14–25 However, these models are not suited to identifying ICD candidates at high risk of SCD among HF patients with low LVEF. Most of the above prognostic scores were developed from trial databases, and the subjects included various types of HF. There is no study specific to the prognosis of the low LVEF population. Additionally, although none of the scores is parsimonious, some critical factors are not incorporated into the prognostic models. For example, medications are contained only in Irbesartan in Heart Failure With Preserved Ejection Fraction Study (I-PRESERVE),17 Meta-Analysis Global Group in Chronic Heart Failure (MAGGIC)21 and Cardiac and Comorbid Conditions HF (3C-HF).23 Optimised medication was not required as an inclusion criterion in any of the 12 studies. Furthermore, most of the above prognostic models are not able to predict SCD risk. In recent years, advances in strain echocardiography,26 27 cardiac magnetic resonance26 27 and cardiac radionuclide imaging28 29 have provided essential insights into the mechanisms of ventricular arrhythmias, and these techniques have been recommended for predicting SCD in patients with HF. Although these new methods are effective and non-invasive, their widespread use to predict SCD in the large HF population is difficult because of high equipment and technical requirements. The resting 12-lead ECG and Holter, as long-established, broadly available, quickly deployed and inexpensive tests, can provide a measure of cumulative electrical risk, which may be combined with other factors to improve SCD risk prediction.30
For the above reasons, novel risk assessment tools should meet the following requirements: (1) The risk model should be developed from the population with low LVEF (≤35%) to accelerate its clinical application and improve the accuracy of ICD indications for primary prevention. (2) More cardiac and non-cardiac factors beyond LVEF should be included. (3) Electrical risk factors should be included as candidate predictors to evaluate the risk of sudden arrhythmic death. (4) Although it is sometimes not easy to determine the cause of death, SCD should be defined as the primary endpoint whenever possible.
Data processing is the crucial step in developing the prognostic models. This study involves non-linear prediction models, a large number of patients and numerous predictors with complicated correlations. Traditional hypothesis-driven statistical analysis struggles to overcome these challenges. Machine learning (ML) approaches have great potential to improve the solution. They employ computer algorithms to identify patterns in large datasets with a large number of variables, analyse rules automatically and build both linear and non-linear models in order to make data-driven predictions or decisions.31 Weng et al 32 found that ML significantly improved the accuracy of cardiovascular risk prediction, increased the number of patients who could benefit from preventive treatment and avoided unnecessary treatment. Recent studies have shown that ML techniques may have the potential to improve HF outcomes and management, including cost savings, by improving existing diagnostic and treatment support systems.33 ML algorithms have also been applied to predict SCD in some recent studies, and the results indicate significant advantages for predicting SCD.34 35 However, more studies based on large-scale cohorts are needed to evaluate ML for the prediction of SCD in HF patients. Therefore, the application of ML to the prediction of SCD in HF patients with low LVEF is technically innovative and clinically significant.
The purpose of our study is to develop and validate new models to improve the prediction of SCD in HF patients with low LVEF. New strategies for identifying the HF patients most likely to benefit from a primary prevention ICD will advance the evolution of ICD indications. The specific research objective is to develop prediction models to evaluate prognosis and SCD risk, respectively, by ML methods and traditional Cox proportional hazards regression in HF patients with low LVEF (≤35%).
Methods and analysis
This study is a retroprospective, multicentre, non-interventional, observational clinical registry. The primary sponsor is The First Affiliated Hospital of Nanjing Medical University. The study will be conducted across 14 cardiovascular departments in tertiary A hospitals throughout the People’s Republic of China (see online supplementary file 1).
Cases from January 2016 to December 2017 in the First Affiliated Hospital of Nanjing Medical University and Xiamen Cardiovascular Hospital Xiamen University will be collected retrospectively and followed up prospectively. According to preliminary estimation, about 500 retrospective cases meet the inclusion criteria. Prospective recruitment started in the above 14 hospitals in January 2018. The retrospective cases and the first 1000 prospective cases will be used to develop the prediction models, and the next 1000 prospective cases will be used for model validation. The flow diagram of the progress is illustrated in figure 1.
To participate in this study, patients must comply with all of the following.
Diagnosis of heart failure with reduced EF (HFrEF) according to the 2016 European Society of Cardiology (ESC) HF guideline.8
LVEF ≤35% (measured by Simpson’s method) after at least 3 months of optimised medication including ACEI or ARB, beta-blocker and MRA, if available and not contraindicated.
Signed informed consent.
Patients with any of the following will be excluded.
Rheumatic heart disease.
Congenital heart disease.
Pulmonary heart disease.
Pericardial diseases and myocarditis.
Acute myocardial infarction within the previous 3 months, including ST-segment elevation myocardial infarction (STEMI) and non-ST-segment elevation myocardial infarction (NSTEMI).
Severe haematological disease including leukaemia, lymphoma, aplastic anaemia.
Participation in other interventional clinical trials.
Non-drug therapies for improving heart function: cardiac resynchronisation therapy with or without implantable cardioverter-defibrillator (CRT-P/D), ICD, heart transplantation, surgical resection of ventricular aneurysm, interventional left ventricular restoration (with the Revivent/Parachute system), MitraClip therapy for recurrent mitral regurgitation.
Primary endpoints: all-cause death and SCD, including cardiac death and death from other causes.
Secondary endpoints: lethal arrhythmia, sudden cardiac arrest, cardiopulmonary resuscitation, rehospitalisation due to HF.
Recruitment and consent
Participants will be identified and recruited at each of the participating centres. The clinical status of potential participants will be assessed, and their medical records will also be reviewed to confirm the eligibility according to the inclusion and exclusion criteria.
The study details will be explained to all potentially eligible and interested subjects. Patients who agree to take part in this study will sign the informed consent form (ICF), indicating that they fully understand the study and their rights to confidentiality and to withdraw from the study without giving a reason.
Prognostic models of HF from the last 10 years have been reviewed, and the associated risk factors have been ranked according to their corresponding HR in the respective risk models (table 1, figure 2). Age, sex, New York Heart Association (NYHA) class, LVEF, prior HF hospitalisation, course of HF, severe valvular heart disease, atrial fibrillation, prior myocardial infarction/coronary artery bypass grafting (CABG), renal dysfunction, chronic obstructive pulmonary disease (COPD), diabetes mellitus (DM), ischaemic aetiology, decreased systolic pressure, low body mass index, anaemia, hyponatraemia, high N-terminal probrain natriuretic peptide (NT-proBNP), hyperuricaemia and current smoking were included. Variables that were not listed in previous models but appear relevant to a higher risk of SCD in HF patients, and therefore merit consideration, include syncope or presyncope, frequent premature ventricular beats, non-sustained ventricular tachycardia, complete left bundle branch block, long QT interval and increased QT dispersion. In addition, self-care ability, social support and psychological state, including depression and anxiety, are also predictors of subsequent poor prognosis in HF patients. The above risk factors have been assessed and confirmed by an expert panel of cardiologists and statisticians and will be specifically collected in this study.
The baseline data that will be collected in all eligible subjects are as follows.
Demographic characteristics: date of birth, gender, height and weight.
Lifestyle behaviour: smoking and drinking status.
Vital signs: blood pressure and heart rate.
Aetiology of HF: the ischaemic aetiology will be confirmed if any following point is met: (a) prior myocardial infarction or revascularisation history (CABG/percutaneous coronary intervention); (b) left main or proximal segment of the left anterior descending artery stenosis ≥75% showed by coronary angiogram (CAG); (c) at least two main coronary artery branches stenosis ≥75% showed by CAG. Otherwise, non-ischaemic HF should be identified.
Prior HF hospitalisation history: first HF hospitalisation or not, times of prior HF hospitalisation, the course of HF (since the HF symptoms appear; if unavailable, since the decreased EF was found).
Coronary heart disease history: myocardial infarction or angina history, CAG result, revascularisation history, recent angina.
Arrhythmia history: atrial fibrillation, atrial flutter, premature atrial contraction (PAC), premature ventricular contraction (PVC), non-sustained VT (NSVT), sustained VT, ventricular fibrillation and some bradyarrhythmias.
Syncope or presyncope history.
Cardiac arrest/cardiopulmonary resuscitation history.
Other histories: hypertension, DM, COPD.
Echocardiography: LV end-diastolic volume, LV end-systolic volume and LVEF measured by Simpson’s method; left atrial diameter, LV end-diastolic diameter and LV end-systolic diameter, pulmonary artery systolic pressure. The status of valve regurgitation will be evaluated (0-none; 1-mild; 2-mild to moderate; 3-moderate; 4-severe).
ECG: left/right bundle branch block will be recorded. QRS duration and QT interval will be measured, and QT dispersion will be calculated.
Holter: total heartbeats over 24 hours, minimum/maximum/average HR, occurrence of PVC, PAC, NSVT, VT, atrial fibrillation/flutter.
Laboratory test results: serum creatinine, blood urea nitrogen, serum sodium, haemoglobin, thyroid-stimulating hormone, free triiodothyronine, free thyroxine, NT-proBNP.
Medication: ACEI/ARB, beta-blocker, aldosterone antagonist, diuretic, digoxin, antiplatelet agent, anticoagulant, statin, calcium channel blocker, antiarrhythmics, Ivabradine and angiotensin receptor blocker-neprilysin inhibitor.
Evaluation of self-care behaviour and social support: 9-item European Heart Failure Self-care Behaviour Scale (9-EHFScBS)36 will be used to determine the self-care levels in HF patients. Social Support Rating Scale (SSRS)37 will be used to evaluate the social support condition in HF patients.
Assessment of psychological status: Hamilton Depression Scale (HAMD) and Hamilton Anxiety Scale (HAMA).
Socioeconomic and educational status: marital status, educational status, monthly income, sources of medical expenses, medical insurance.
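The aetiology rule in the baseline list above (ischaemic if any of criteria a to c is met, otherwise non-ischaemic) can be written as a small function. The following is only a sketch: the record layout and field names are hypothetical and do not come from the study's EDC system.

```python
from dataclasses import dataclass


@dataclass
class CoronaryFindings:
    # All field names are hypothetical; stenosis values are percentages from CAG.
    prior_mi_or_revascularisation: bool = False  # criterion (a): MI, CABG or PCI history
    left_main_stenosis: float = 0.0              # criterion (b)
    proximal_lad_stenosis: float = 0.0           # criterion (b)
    main_branches_with_stenosis_ge_75: int = 0   # criterion (c)


def is_ischaemic(f: CoronaryFindings) -> bool:
    """Return True if any of the three protocol criteria (a)-(c) is met."""
    criterion_a = f.prior_mi_or_revascularisation
    criterion_b = max(f.left_main_stenosis, f.proximal_lad_stenosis) >= 75.0
    criterion_c = f.main_branches_with_stenosis_ge_75 >= 2
    return criterion_a or criterion_b or criterion_c
```

A patient for whom `is_ischaemic` returns `False` would be recorded as non-ischaemic HF under this sketch.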
After enrolment, all subjects will be followed up every 3 months in the outpatient department or by telephone interview. Compliance with medications will be evaluated. The follow-up will focus on the primary endpoints, all-cause death and SCD, and the cause of death will be analysed in detail. SCD is defined by the WHO as unexpected death that occurs within 1 hour from the onset of new or worsening symptoms (witnessed arrest) or, if unwitnessed, within 24 hours from when the individual was last observed alive and asymptomatic.38 Lethal arrhythmia, including ventricular tachycardia/ventricular fibrillation (VT/VF), sudden cardiac arrest, cardiopulmonary resuscitation and rehospitalisation due to HF will be recorded carefully.
During follow-up, lethal arrhythmia will be recognised more precisely in patients who receive an ICD or cardiac resynchronisation therapy with implantable cardioverter-defibrillator (CRT-D), and will be recorded as an adverse event (AE). Patients who receive CRT-P/D, heart transplantation, surgical resection of a ventricular aneurysm, interventional left ventricular restoration with the Revivent/Parachute system, MitraClip therapy for recurrent mitral regurgitation, or another non-drug therapy to improve heart function will be followed up as usual.
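The timing part of the WHO definition quoted above can be expressed as a simple check. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and the check covers only the 1-hour and 24-hour windows. Whether a death was unexpected still has to be adjudicated clinically.

```python
from typing import Optional


def meets_scd_time_criterion(witnessed: bool,
                             hours_since_symptom_onset: Optional[float] = None,
                             hours_since_last_seen_well: Optional[float] = None) -> bool:
    """Check only the time windows of the WHO definition of SCD.

    Witnessed arrest: death within 1 hour of new or worsening symptoms.
    Unwitnessed: death within 24 hours of last being seen alive and asymptomatic.
    A missing time value is treated as not meeting the criterion.
    """
    if witnessed:
        return (hours_since_symptom_onset is not None
                and hours_since_symptom_onset <= 1.0)
    return (hours_since_last_seen_well is not None
            and hours_since_last_seen_well <= 24.0)
```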
In the prospective part, clinical data of subjects will be collected and entered into the electronic data capture (EDC) system at baseline and at each follow-up visit. In the retrospective part, the same baseline information, except for the 9-EHFScBS, SSRS, HAMD and HAMA questionnaires, will also be captured and entered into the EDC system. The subsequent prospective visits (every 3 months) will be conducted regularly and recorded in the EDC system. Investigators will record all information on AEs, study bias, withdrawal from the study or death in the EDC system. In this study, the participants will be identified by study codes, and their names will not appear in the EDC system. No personal information, including contact information, medical records and outcomes, will be revealed to anyone who has not been authorised by a principal investigator. Professional staff are responsible for database management, data maintenance and regular data backup. Data quality will be monitored regularly. The data collection checklist is shown in table 2.
All of the variables collected above, which might be predictors of the adverse outcomes described as endpoint events, will be classified as uncontrollable variables (eg, age, gender, history), controllable cardiac variables (eg, NYHA class, LVEF, increased heart rate) and controllable non-cardiac variables (eg, smoking, anaemia, DM). Appropriate dummy variables will be used for binary and categorical variables, and quantitative variables will be fitted as a single continuous measurement (eg, age, heart rate, NT-proBNP), unless there is clear evidence of non-linearity. In order to create a practical, simple risk score, some continuous variables will also be categorised into several groups according to both common clinical cut points and expert advice.
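As an illustration of the coding described above, the sketch below builds dummy variables for a categorical predictor and bins a continuous predictor at chosen cut points. The function names, the reference category and the example cut points are assumptions for illustration only and are not the cut points the study will adopt.

```python
from bisect import bisect_right


def dummy_encode(values, reference):
    """One 0/1 indicator per non-reference level (reference level is all zeros)."""
    levels = sorted(set(values) - {reference})
    return [{f"is_{lvl}": int(v == lvl) for lvl in levels} for v in values]


def categorise(value, cut_points):
    """Return the group index of a continuous value.

    Group 0 is below the first cut point; a value equal to a cut point
    falls into the higher group. cut_points must be sorted ascending.
    """
    return bisect_right(cut_points, value)


# Hypothetical example: NYHA class with class II as the reference category.
nyha_dummies = dummy_encode(["II", "III", "IV"], reference="II")
# Hypothetical age cut points at 60, 70 and 80 years.
age_group = categorise(72, [60, 70, 80])
```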
Variable selection is the process of selecting a subset of relevant variables for use in model construction, which can substantially reduce redundant information and decrease the number of variables that are input to the prediction model. In this study, a technique named ‘information gain ranking’ will be used to select appropriate variables. Information gain represents the effectiveness of a variable based on entropy, which characterises the unpredictability of a system. The information gain of a variable is evaluated as the difference in the entropy of the system when this variable is included and excluded. Variables whose information gain scores are less than a threshold are considered to be insignificant and will be excluded from the prediction.
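A minimal sketch of information gain ranking follows. It assumes categorical (or already discretised) predictors and a discrete outcome label; continuous variables would need to be binned first. The function names and the threshold argument are hypothetical, since the protocol does not state the threshold value.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the outcome minus its entropy conditional on the feature."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_variables(table, labels, threshold):
    """Rank variables by information gain and drop those below the threshold."""
    gains = {name: information_gain(col, labels) for name, col in table.items()}
    ranked = sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, gain) for name, gain in ranked if gain >= threshold]
```

For example, a binary predictor that separates events from non-events perfectly has a gain equal to the entropy of the outcome, while a predictor that is distributed identically in both outcome groups has a gain of zero.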
Prediction models for SCD in HF patients will be developed using each of the following classification algorithms: decision trees, logistic regression, support vector machine, random forest and artificial neural network.29 The performance and generalisation error of these ML models will be estimated by 10-fold cross-validation. The dataset will be randomly divided into 10 equal folds. Nine folds will be used as the training set, with the remaining fold as the validation set. The validation results from the 10 repeats will be combined to provide a measure of overall performance. The prediction models derived from the above classification algorithms will be evaluated based on accuracy, sensitivity, specificity and the area under the receiver-operating characteristic (ROC) curve. Finally, clinical experts and computer specialists will discuss and choose the best model for predicting SCD in HF patients, and this model will then undergo further validation with the prospective dataset.
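The cross-validation procedure described above can be sketched in plain Python as follows. The threshold classifier is a deliberately trivial, hypothetical stand-in for the algorithms listed in the protocol, and pooling the out-of-fold predictions before scoring is one of several reasonable ways to combine the 10 repeats.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and yield (train, validation) index lists per fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Pool the out-of-fold predictions from all k folds and score them once."""
    pooled = [None] * len(y)
    for train, val in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in val:
            pooled[i] = predict(model, X[i])
    return confusion_metrics(y, pooled)


# Hypothetical stand-in classifier: threshold halfway between the class means.
def fit_threshold(X, y):
    mean0 = sum(x for x, t in zip(X, y) if t == 0) / y.count(0)
    mean1 = sum(x for x, t in zip(X, y) if t == 1) / y.count(1)
    return (mean0 + mean1) / 2


def predict_threshold(threshold, x):
    return int(x > threshold)
```

In practice the `fit` and `predict` arguments would wrap whichever learning library the analysts choose; only the fold logic is the point of this sketch.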
Cox proportional hazards regression
Univariable Cox proportional hazards modelling will be used to identify strong baseline candidate predictors for the primary and secondary outcomes. We will use both forward and backward stepwise procedures to derive the multivariable Cox proportional hazards model, with p<0.05 as the inclusion criterion. Every variable in the model will be multiplied by its β-coefficient, and the products will be summed to calculate the risk score. The risk function will be used to estimate the level of risk. The formula is as follows.39
$$P = h(t_j; X_k) = h_0(t_j)\exp(\mathrm{SCORE})$$
$$\mathrm{SCORE} = X_k\beta_k = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$
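The risk score formula above amounts to a weighted sum followed by an exponential scaling of the baseline hazard. The sketch below shows the arithmetic only; the coefficient values in the example are invented for illustration and are not estimates from this study.

```python
import math


def risk_score(x, beta, beta0=0.0):
    """SCORE = beta0 + beta1*x1 + ... + betap*xp for one patient."""
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    return beta0 + sum(b * v for b, v in zip(beta, x))


def hazard(baseline_hazard, score):
    """h(t; X) = h0(t) * exp(SCORE) at a given time point."""
    return baseline_hazard * math.exp(score)


# Hypothetical coefficients and covariate values, for illustration only.
score = risk_score([1, 2], [0.5, 0.25])
h = hazard(0.1, score)
```

With these made-up numbers the score is 0.5×1 + 0.25×2 = 1.0, so the hazard is the baseline hazard multiplied by exp(1.0).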
The subsequently enrolled prospective cases will be used for external validation of the optimal ML and Cox proportional hazards models. Validation will be performed by using the models to calculate, for each individual in the validation sample, the probability of the outcome of interest, and comparing these probabilities with the events actually observed in this sample. The discrimination of each model will be estimated by the ROC curve. The calibration of the models will be assessed by the Hosmer-Lemeshow goodness-of-fit test. The ML prediction model will be compared with the Cox proportional hazards regression model.
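The two validation measures named above can be sketched as follows, assuming each model outputs a predicted probability of the event for every patient in the validation sample (for the Cox model this would mean a probability at a fixed time horizon, which is an assumption here). The area under the ROC curve is computed as the proportion of event/non-event pairs ranked correctly, and the Hosmer-Lemeshow statistic is computed over groups of sorted predicted risk. The grouping into 10 bins by default is a common convention, not something the protocol specifies.

```python
def roc_auc(y_true, y_prob):
    """AUC as the fraction of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Return the Hosmer-Lemeshow chi-square statistic and its degrees of freedom."""
    pairs = sorted(zip(y_prob, y_true), key=lambda pair: pair[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)   # sum of predicted probabilities
        observed = sum(t for _, t in chunk)   # number of observed events
        mean_p = expected / n_g
        denom = n_g * mean_p * (1 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat, groups - 2
```

The statistic is then compared with a chi-square distribution with the returned degrees of freedom; a large value indicates poor agreement between predicted and observed event counts.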
Patient and public involvement
During the design of this study, a survey of patient requirements, including communication needs, follow-up frequency and visit cost, was conducted in a population of potential HF participants. The survey provided important evidence for drawing up a study protocol that meets most of the patients’ needs, builds close contact with patients, enhances overall adherence and improves the accuracy of endpoint ascertainment. This study is not patient-led research, and patients are not involved in recruitment to the study. Participants will be informed of the study results by phone at the end of this study. Surviving patients will be evaluated with the new prediction model, and ICD intervention will be recommended to patients at high risk of SCD.
The retrospective data collection in the two subcentres started in March 2017, and prospective enrolment in all 14 subcentres started in January 2018. The follow-up period is scheduled to end in December 2019. The major part of the data analysis will be performed from January to June 2020. The study framework and process are summarised in figure 3.
Ethics and dissemination
All necessary information about this study will be disclosed to the patients. Every subject will be asked to sign the ICF, indicating that they fully understand the study and voluntarily participate in this study. All results of this study will be published in international peer-reviewed journals and presented at relevant conferences.
The evaluation of SCD risk in HF patients is a problem that urgently needs to be solved. The existing strategies for predicting SCD risk in HF patients lack practical clinical value for various reasons. The ICD indication for primary prevention of SCD could be optimised by identifying the patients at high risk of SCD among those with HF and low LVEF (≤35%). This is of great practical value and economic significance.
We reviewed predictive studies of HF from the past years and ranked the risk factors according to their corresponding HR; these factors have been included in our study as candidate risk factors. In addition, some other variables that appear relevant to the risk of SCD in HF patients are also collected. The efficiency and practicality of predictive model development should therefore be improved.
This study is the first multicentre registry study in China to investigate the feasibility and accuracy of applying ML to predict SCD in HF patients with low LVEF. A broad range of outcomes, including SCD, all-cause death, lethal arrhythmia, sudden cardiac arrest, cardiopulmonary resuscitation and rehospitalisation due to HF, will be evaluated in this study, and the corresponding prognostic models will be developed. ML and the traditional multivariable Cox proportional hazards regression model will be derived from the same database and will be compared.
The limitations of this study are as follows: (1) By design, HF patients with LVEF >35% will not be included, which restricts the applicability of the results to HF with low LVEF. (2) For some patients it may be difficult to determine the endpoint, particularly SCD, lethal arrhythmia and sudden cardiac arrest occurring outside the hospital.
The authors thank Xiamen Cardiovascular Hospital, Xiamen University (Xiamen, China), Wuhan Asia Heart Hospital (Wuhan, China), Jiangning Hospital Affiliated to Nanjing Medical University (Nanjing, China), The Second People’s Hospital of Lianyungang (Lianyungang, China), The Affiliated Hospital of Jiangsu University (Zhenjiang, China), Taixing People’s Hospital (Taixing, China), The First People’s Hospital of Huaian (Huaian, China), The First People’s Hospital of Yancheng (Yancheng, China), Rugao People’s Hospital (Rugao, China), The First People’s Hospital of Zhangjiagang (Zhangjiagang, China), The Third People’s Hospital of Suzhou (Suzhou, China), The Third People’s Hospital of Wuxi (Wuxi, China) and The Second Affiliated Hospital of Nanjing Medical University (Nanjing, China) for their collaboration, including recruitment and follow-up of HF patients. The authors also thank the HF patients who participated in the survey of patient requirements during the design of this study.
Contributors JGZ and FM conceived and designed the study. ZZ, XH, ZQ, YW, YC, YLW, YZ, ZC, XZ, JY, JLZ, JG, KL, LC, RZ and HJ participated in different phases of the protocol design. WZ provided expertise in data processing and machine learning. ST and YW provided their expertise for traditional statistical analysis. JGZ obtained funding. FM drafted the final manuscript. All authors have read the manuscript and provided feedback. JGZ approved the final version of the manuscript before submission. FM took responsibility for the submission process.
Funding This study was sponsored partly by the grant of clinical frontier technology from Jiangsu Science and Technology Agency (BE2016764).
Competing interests None declared.
Ethics approval The study protocol has been approved by the Ethics Committee of The First Affiliated Hospital of Nanjing Medical University (2017-SR-06).
Provenance and peer review Not commissioned; externally peer reviewed.
Patient consent for publication Obtained.