Objectives Different diagnostic algorithms for non-acute heart failure (HF) exist. Our aim was to compare the ability of these algorithms to identify HF in symptomatic patients aged 80 years and older and identify those patients at highest risk for mortality.
Design Diagnostic accuracy and validation study.
Setting General practice, Belgium.
Participants 365 patients with HF symptoms aged 80 years and older (BELFRAIL cohort). Participants underwent a full clinical assessment, including a detailed echocardiographic examination at home.
Outcome measures The diagnostic accuracy of 4 different algorithms was compared using an intention-to-diagnose analysis. The European Society of Cardiology (ESC) definition of HF was used as the reference standard for HF diagnosis. Kaplan-Meier curves for 5-year all-cause mortality were plotted and HRs and corresponding 95% CIs were calculated to compare the mortality risk predicting abilities of the different algorithms. Net reclassification improvement (NRI) was calculated.
Results The prevalence of HF was 20% (n=74). The 2012 ESC algorithm yielded the highest sensitivity (92%, 95% CI 83% to 97%) as well as the highest referral rate (71%, n=259), whereas the Oudejans algorithm yielded the highest specificity (73%, 95% CI 68% to 78%) and the lowest referral rate (36%, n=133). These differences could be ascribed to differences in N-terminal probrain natriuretic peptide cut-off values (125 vs 400 pg/mL). The Kelder and Oudejans algorithms exhibited NRIs of 12% (95% CI 0.7% to 22%, p=0.04) and 21% (95% CI 9% to 32%, p<0.001), respectively, compared with the ESC algorithm. All algorithms detected patients at high risk for mortality, with HRs ranging from 1.9 (95% CI 1.4 to 2.5; Kelder) to 2.3 (95% CI 1.7 to 3.1; Oudejans). No significant differences were observed among the algorithms with respect to mortality risk predicting abilities.
Conclusions Choosing a diagnostic algorithm for non-acute HF in elderly patients represents a trade-off between sensitivity and specificity, mainly depending on differences between cut-off values for natriuretic peptides.
Strengths and limitations of this study
Community studies involving elderly patients are scarce, making this study particularly valuable.
All variables needed to evaluate diagnostic performance of the different rules and algorithms were collected in the BELFRAIL study, including a detailed description of echocardiographic variables.
No distinction was made between heart failure with reduced ejection fraction and heart failure with preserved ejection fraction as both are important to detect.
All patients with heart failure symptoms were included, not just those suspected of heart failure by their general practitioner (GP), since suspicion is subjective and could lead to misclassification.
Clinical assessment was performed by the personal GP of each patient, which may have increased heterogeneity, although GPs were trained to limit this risk.
Early diagnosis of heart failure is important to initiate treatment in a timely fashion, as it may reduce mortality, hospitalisations and healthcare costs.1 However, when access to echocardiography is limited, such as in primary care, diagnosing heart failure in elderly individuals is challenging.1 ,2 Non-acute heart failure with a gradual onset of symptoms is particularly prone to being underdiagnosed or detected at a later stage.3 ,4 Primary care physicians are in need of a reliable diagnostic algorithm that helps them to decide which older symptomatic patients they must refer.
Different diagnostic rules and algorithms to diagnose non-acute heart failure do exist and all of them incorporate natriuretic peptides.5–8 However, since natriuretic peptide levels increase with age and are influenced by comorbidities, controversy remains regarding optimal cut-off values among elderly individuals.9 ,10 Additionally, Mant et al11 incorporated clinical characteristics in their algorithm.6 This algorithm has not proven more beneficial than algorithms using only natriuretic peptides.12 ,13 Two new diagnostic algorithms have been published7 ,8 including an algorithm specifically designed for elderly patients.8 However, the ability of these algorithms to identify non-acute heart failure in symptomatic patients aged 80 years and older has never been compared. Furthermore, it is unknown whether these diagnostic algorithms differ with respect to the identification of participants at high risk for mortality.
Therefore, an ‘intention-to-diagnose’ analysis14 was performed within the BELFRAIL cohort to compare the ability of the different diagnostic rules and algorithms, first, to confirm the diagnosis of heart failure in very old symptomatic patients and, second, to identify participants at high risk for mortality.
The BELFRAIL (BFC80+) study is a prospective, observational, population-based cohort study of participants aged 80 years and older living in three well-circumscribed areas of Belgium. The study's design, sampling methods and cohort characteristics have been described previously.15 In brief, a total of 29 general practice centres participated in the BELFRAIL study. The participating centres were asked to include patients aged 80 years and older in the cohort. Only three exclusion criteria were used: (1) severe dementia (known Mini-Mental State Examination (MMSE) <15/30), (2) undergoing palliative care or (3) a medical emergency like acute heart failure. Two sampling methods for the recruitment of patients were used. Two general practice centres were asked to include all eligible patients. The remaining 27 centres were asked to include a maximum of three consecutive patients during a 3-week interval. In these 3 weeks, the general practitioners (GPs) also planned their visit. This interval was repeated five times, so every participating centre included a maximum of 15 patients. Every interval of recruitment was started on a different day to avoid selection bias. Between 2 November 2008 and 15 September 2009, 567 participants were included in the BELFRAIL cohort. Three hundred participants were included using the first sampling method, and 267 using the second sampling method. Out of 567 patients, 365 (64%) presented with heart failure symptoms such as dyspnoea (Medical Research Council (MRC) ≥2), fatigue or ankle oedema. A detailed follow-up regarding cause-specific mortality was collected from the participants' GPs until 5.2±0.25 years after baseline. The causes of death were divided into cardiovascular and non-cardiovascular. The BELFRAIL cohort is representative in gender and age of the very elderly living in Belgium.15 STARD recommendations for reporting of diagnostic accuracy studies were followed.16
Each GP recorded a full medical history and performed a detailed, standardised clinical examination at baseline. The history included questions regarding dyspnoea (according to the MRC dyspnoea scale),17 fatigue, orthopnoea, oedema of the lower extremities, wheezing and loss of appetite. The clinical examination consisted of heart auscultation and measurement of heart rate and breathing rate. The apical beat was palpated and recorded when abnormal. Lung auscultation was performed to detect crepitus. Jugular venous pressure (JVP) was measured and noted if elevated, and the presence of hepatojugular reflux (HJR) and oedema of the lower extremities was checked. Body mass index (BMI) was calculated by a clinical research assistant based on a standardised measurement of height and weight. The patients' GPs reported a history of myocardial infarction and important cardiac interventions such as percutaneous transluminal coronary angioplasty, stenting or coronary surgery. The Anatomical Therapeutic Chemical (ATC) classification system was used to register medication use.18 Data regarding loop diuretics were used for the analyses.
The serum N-terminal probrain natriuretic peptide (NT-proBNP) levels were measured using a Dade-Dimension Xpand (Siemens, Deerfield, Illinois, USA). The coefficient of variation ranged from 3.9% to 4.3%.
A 12-lead ECG was recorded by a clinical research assistant at baseline using a QRS Universal ECG device (QRS diagnostics, Plymouth, Minnesota, USA, http://www.qrsdiagnostic.com). A single cardiologist blinded to all other study results analysed each of the ECGs according to the Minnesota Code Classification System for Electrocardiographic Findings and was asked to report whether each patient's ECG findings were completely normal (yes or no).
Four diagnostic algorithms were chosen because they applied to patients aged 80 years or older with non-acute heart failure in primary care. The 2012 European Society of Cardiology (ESC) diagnostic algorithm5 and the diagnostic rules proposed by Mant et al,6 Kelder et al7 and Oudejans et al8 were studied (see online supplementary appendix 1). The ESC algorithm (NT-proBNP≥125 pg/mL or positive ECG) was studied both with and without the inclusion of patients' ECG results.5 The diagnostic rule by Mant was evaluated by considering direct referral for echocardiography of patients with a history of myocardial infarction or basal crepitus or men with ankle oedema. In the remaining patients, referral for echocardiography was considered for women without ankle oedema and NT-proBNP levels above 620 pg/mL, women with ankle oedema and NT-proBNP levels above 190 pg/mL and men with NT-proBNP levels above 390 pg/mL.6 Using the diagnostic rule of Kelder, heart failure was ruled out if the summed score was <24 (<20% probability of heart failure).7 A score between 24 and 54 was considered uncertain (between 20% and 70% probability of heart failure). For patients with a score above 54, referral for echocardiography was considered (>70% probability of heart failure). The diagnostic rule of Oudejans considered heart failure unlikely if the summed score was 16 or less. Heart failure was considered likely with a score of 32 or more and uncertain with a score between 16 and 32.8
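The referral logic described above can be sketched in code. The following Python sketch is illustrative only: the class, function and field names are assumptions introduced here, and the item scoring of the Kelder and Oudejans rules (given in the online supplementary appendix) is not reproduced. Only the cut-offs quoted in the text are encoded.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical container for the variables named in the text."""
    male: bool
    nt_probnp: float              # serum NT-proBNP in pg/mL
    abnormal_ecg: bool = False
    prior_mi: bool = False        # history of myocardial infarction
    basal_crepitus: bool = False
    ankle_oedema: bool = False


def esc_refer(p: Patient, use_ecg: bool = True) -> bool:
    """2012 ESC algorithm as summarised in the text:
    NT-proBNP >= 125 pg/mL or (optionally) an abnormal ECG."""
    return p.nt_probnp >= 125 or (use_ecg and p.abnormal_ecg)


def mant_refer(p: Patient) -> bool:
    """Rule by Mant et al as summarised in the text."""
    # Direct referral, regardless of NT-proBNP
    if p.prior_mi or p.basal_crepitus or (p.male and p.ankle_oedema):
        return True
    # Remaining patients: sex- and oedema-specific NT-proBNP cut-offs
    if p.male:
        return p.nt_probnp > 390
    if p.ankle_oedema:
        return p.nt_probnp > 190
    return p.nt_probnp > 620


def kelder_category(score: float) -> str:
    """Categories for a summed Kelder score (item scoring not shown)."""
    if score < 24:
        return "ruled out"
    if score <= 54:
        return "uncertain"
    return "refer"


def oudejans_category(score: float) -> str:
    """Categories for a summed Oudejans score (item scoring not shown)."""
    if score <= 16:
        return "unlikely"
    if score < 32:
        return "uncertain"
    return "likely"
```

For example, a woman without ankle oedema and an NT-proBNP level of 300 pg/mL would be referred under the ESC cut-off but not under the rule by Mant.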
Echocardiograms were performed at baseline using a commercially available portable system (CX50, Philips, Andover, Massachusetts, USA) with M-mode, two-dimensional and pulsed, continuous-wave and colour-flow Doppler capabilities. The echocardiograms were performed at the participants' homes by a single cardiologist blinded to the patients' clinical characteristics and laboratory test results. A complete examination was performed in accordance with the recommendations of the American Society of Echocardiography and the European Association of Echocardiography (ASE-EAE).19 The methods, prevalences of the echocardiographic abnormalities and quality of the echocardiographic images were described previously.20 Briefly, left ventricular (LV) systolic function was calculated using Simpson's biplane method.19 Systolic dysfunction was defined as an LV ejection fraction (LVEF) ≤50%. The functions of the mitral valve and the aortic valve were evaluated using colour Doppler echocardiography after optimising the gain and Nyquist limit. Stenotic and regurgitant valve diseases were evaluated using semiquantitative and quantitative methods recommended by the ASE.21 ,22 Participants with prosthetic valves were evaluated separately.21 Clinically relevant valve disease was defined as mitral stenosis of any severity, severe aortic stenosis (aortic valve area <1 cm²), moderate or severe mitral regurgitation, and moderate or severe aortic regurgitation. Diastolic function was assessed using mitral flow velocities obtained via pulsed Doppler and pulsed tissue Doppler at the level of the mitral annulus.23 Additional apical and parasternal views were also recorded to assess tissue velocity (colour tissue Doppler). The ASE-EAE guidelines were used to assess diastolic dysfunction.23
Heart failure, defined according to the ESC guideline as a combination of heart failure symptoms and signs (table 1) and objective cardiac dysfunction, was used as the reference standard.5 Objective cardiac dysfunction included an LVEF ≤50%, clinically relevant valvular heart disease and severe diastolic dysfunction, as used in previous BELFRAIL publications.14 ,20
Continuous variables are presented either as means±SDs or as medians and IQRs. Categorical data are presented as frequencies and proportions. Baseline variables were compared using the χ2 test or the independent Student’s t-test to compare means, and the Mann-Whitney U test to compare medians. p Values below 0.05 were considered statistically significant. The diagnostic accuracy (the sum of true negatives and true positives divided by the total number of participants) of the different algorithms and rules for heart failure was calculated using an intention-to-diagnose analysis. This method, analogous to an ‘intention-to-treat’ analysis, starts from a diagnostic intent, at any given moment, in a population at risk. This means that all patients who are at risk of a specific target condition should be involved in the analyses, regardless of all previously known diagnoses.14 Kaplan-Meier curves for all-cause mortality after 5 years were plotted for the different diagnostic algorithms, with log-rank tests used for comparisons. HRs and corresponding 95% CIs were calculated using Cox proportional hazard models. The net reclassification improvement (NRI) of Oudejans', Kelder's and Mant's diagnostic rules relative to the ESC algorithm was calculated for the diagnosis of heart failure and for the prediction of 5-year all-cause and cardiovascular mortality, in order to measure and compare the ability of the rules to detect participants at high risk for mortality. All data analyses were performed using SPSS V.22.0 for Windows (SPSS, Chicago, Illinois, USA).
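The accuracy definition above, together with sensitivity, specificity and referral rate, can be written out as a short Python sketch. The function name and dictionary keys are assumptions made for illustration. The counts in the usage example are not tabulated in this text; they are derived from figures reported in the Results for the ESC algorithm without ECG results (365 participants, 74 with heart failure, 6 missed cases and 191 false positives).

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Basic measures from a 2x2 table of index test vs reference standard."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),       # detected cases / all cases
        "specificity": tn / (tn + fp),       # correct non-referrals / all non-cases
        "accuracy": (tp + tn) / total,       # definition used in the text
        "referral_rate": (tp + fp) / total,  # all test positives / all participants
    }


# Derived counts for the ESC algorithm without ECG (see lead-in):
# 74 - 6 = 68 true positives; 365 - 74 = 291 non-cases, of whom 191 were referred.
esc = diagnostic_metrics(tp=68, fp=191, fn=6, tn=100)
for name, value in esc.items():
    print(f"{name}: {value:.0%}")
```

With these counts the sketch reproduces the reported sensitivity (92%), specificity (34%) and referral rate (71%).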
In order to validate the existing rules, only participants with heart failure symptoms such as dyspnoea (MRC≥2), fatigue or ankle oedema were studied (n=365, 64%). The mean age of the study participants was 85±3.8 years, and 123 (34%) were men. A total of 40 patients were institutionalised (11%). Heart failure was present in 74 patients (20%).
Table 1 includes the distribution of the different variables used in the diagnostic algorithms and rules. Patients with heart failure were older and presented more often with orthopnoea, ankle oedema in men, an irregular pulse, systolic heart murmur, elevated JVP or HJR. Loss of appetite and a lower BMI, proposed as clinical markers in the elderly (see online supplementary appendix 1), were not statistically different across the two patient groups.
Prediction of heart failure
All rules and algorithms exhibited a strong ability to exclude heart failure. However, a high number of unnecessary referrals were noted.
The 2012 ESC diagnostic algorithm
The 2012 ESC diagnostic algorithm proposes to refer patients suspected of heart failure based on an abnormal ECG or elevated natriuretic peptide levels (NT-proBNP≥125 pg/mL; see online supplementary appendix 1).5 However, the proportion of patients with an abnormal ECG was high in this population of very elderly people (n=238, 65%). As a consequence, the ESC algorithm with ECG results resulted in a higher number of false positives (n=246, 77% of total referrals, specificity 15%) and a similar sensitivity compared with the ESC algorithm without ECG results. Therefore, the ESC algorithm without ECG results was chosen for subsequent analyses and considered an index test for the calculation of the NRI (see online supplementary appendix 1).
By applying the ESC algorithm without ECG results (NT-proBNP≥125 pg/mL) to our study population, only six patients with heart failure were missed, corresponding to a sensitivity of 92%. However, 191 patients (74% of total referrals) were referred despite the absence of heart failure (specificity 34%; tables 2 and 3).
The diagnostic rule of Oudejans et al
By applying the lower score of the diagnostic rule by Oudejans et al (summed score >16), the highest number of patients with HF was missed (n=19, a sensitivity of 74%), but the rule also corresponded to the lowest number of unnecessary referrals (n=78, 59% of total referrals) and consequently the highest specificity (73%; tables 2 and 3).
Using the higher score of Oudejans' rule (summed score >32) resulted in only 29 false-positive cases (a specificity of 90%), of which 17 (59%) had at least one other echocardiographic abnormality. However, sensitivity dropped to 42% (n=43 patients with missed heart failure). Therefore, for subsequent analyses, the lower score of Oudejans' rule was used.
The rule of Oudejans yielded a significant NRI of 21% (95% CI 9% to 32%, p<0.001) in predicting heart failure compared with the ESC diagnostic algorithm (table 4). In Oudejans' rule, an NT-proBNP cut-off of 400 pg/mL is used in addition to clinical characteristics (see online supplementary appendix 1). However, the comparison between NT-proBNP as a stand-alone test (cut-off 400 pg/mL) and the full diagnostic rule of Oudejans regarding both diagnostic and prognostic accuracy yielded no significant differences (data not shown). Hence, the differences between the ESC algorithm and the diagnostic rule of Oudejans could be ascribed to differences in NT-proBNP cut-off values (125 vs 400 pg/mL).
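The size of this NRI can be checked with a short calculation. The sketch below assumes the two-category form of the NRI, in which the index is the net proportion of correctly reclassified patients with heart failure plus the net proportion of correctly reclassified patients without heart failure; the text does not state which formula was used, so this is an assumption. The function name is hypothetical, and the counts are derived from the sensitivities and false-positive numbers reported above.

```python
def two_category_nri(new_tp: int, new_fp: int,
                     ref_tp: int, ref_fp: int,
                     n_events: int, n_nonevents: int) -> float:
    """Two-category NRI of a new binary test versus a reference binary test.

    Events gain: net fraction of cases additionally (or no longer) detected.
    Non-events gain: net fraction of non-cases no longer referred.
    """
    events_gain = (new_tp - ref_tp) / n_events
    nonevents_gain = (ref_fp - new_fp) / n_nonevents
    return events_gain + nonevents_gain


# Oudejans (summed score >16): 74 - 19 = 55 detected cases, 78 false positives.
# ESC without ECG: 74 - 6 = 68 detected cases, 191 false positives.
nri = two_category_nri(new_tp=55, new_fp=78, ref_tp=68, ref_fp=191,
                       n_events=74, n_nonevents=291)
print(f"NRI: {nri:.1%}")
```

Under this assumption, the loss of 13 detected cases out of 74 is outweighed by 113 fewer false positives out of 291, which gives an NRI of about 21%.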
The diagnostic rule of Kelder et al
By applying the higher score of Kelder's rule (summed score >54), 14 patients with heart failure were missed (sensitivity 81%). False positives were noted in 124 patients (67% of total referrals), corresponding to a specificity of 57%.
Using the lower score of Kelder's rule (summed score ≥24) led to zero false negatives (sensitivity 100%), but only 18 patients were not referred in that case (specificity 6%). Therefore, for subsequent analyses, the higher score of Kelder's rule was used.
The rule of Kelder also yielded a significant NRI of 12% (95% CI 0.7% to 22%, p=0.04) in predicting heart failure compared with the ESC diagnostic algorithm (table 4).
The diagnostic rule of Mant et al
Applying the diagnostic rule of Mant et al led to 15 patients with missed heart failure (sensitivity 80%) and 148 unnecessary referrals (71% of total referrals, specificity 49%). No significant NRI was noted compared with the ESC algorithm.
Five-year mortality predictions
During the 5-year follow-up period, 177 patients died (49%), including 78 (44%) who died of cardiovascular causes. There was no loss to follow-up for mortality. Of the patients with heart failure, 61% died (n=45). The proportion of cardiovascular deaths within this group was 60% (n=27). HR for all-cause mortality of heart failure was 1.6 (95% CI 1.2 to 2.3, p=0.004), and for cardiovascular mortality 2.5 (95% CI 1.6 to 4.0, p<0.001). The algorithms and diagnostic rules successfully identified patients at risk for mortality (figure 1). HRs for all-cause mortality ranged from 1.9 (95% CI 1.4 to 2.5; Kelder) to 2.3 (95% CI 1.7 to 3.1; Oudejans), and from 1.7 (95% CI 1.0 to 2.6; Mant) to 2.9 (95% CI 1.9 to 4.5; Oudejans) for cardiovascular mortality.
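For readers unfamiliar with the survival curves referred to here, the Kaplan-Meier (product-limit) estimate can be sketched in a few lines of Python. This is a generic illustration, not the SPSS procedure used in the study, and the follow-up times below are toy values invented for the example, not BELFRAIL data.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up time per participant (e.g. years)
    events : 1 if the participant died at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all participants sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored both leave the risk set
    return curve


# Toy data, invented for illustration only
toy_times = [1.0, 2.0, 2.0, 3.5, 5.0, 5.0]
toy_events = [1, 1, 0, 1, 0, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(f"t={t}: S(t)={s:.3f}")
```

Plotting one such curve per test result (referred vs not referred) and comparing them with a log-rank test corresponds to the approach described in the statistical methods.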
The improvements in mortality risk classification of the diagnostic rules of Kelder and Mant were only minor compared with the ESC algorithm. The diagnostic rule of Oudejans exhibited a larger degree of improvement, particularly with respect to cardiovascular mortality risk classification, although the said improvement was not statistically significant (NRI 10%, 95% CI −3% to 22%, p=0.18; table 4).
The present study demonstrated that all diagnostic algorithms have their limitations in symptomatic patients aged 80 years and older. Therefore, choosing a diagnostic algorithm for non-acute HF in elderly patients represents a trade-off between sensitivity and specificity, mainly depending on differences between the cut-off values used for natriuretic peptides. The ESC algorithm exhibited the highest sensitivity. However, implementing this algorithm in this age group also resulted in high referral rates. In contrast, the diagnostic rule of Oudejans exhibited the highest accuracy with reasonable referral rates (36%, n=133). The said rule was a superior predictor of heart failure compared with the ESC algorithm because of a strong gain in specificity, albeit at the cost of a decrease in sensitivity. Furthermore, all algorithms were able to identify patients at high risk for 5-year mortality.
The prevalence of heart failure in this population-based cohort study of symptomatic elderly individuals was 20%, as could be expected in this age group.24 The 2008 ESC algorithm with an NT-proBNP cut-off point of 400 pg/mL, the clinical decision rule of Mant and the Dutch guideline (either NT-proBNP≥125 pg/mL or a positive ECG) were validated by Oudejans et al12 in a cohort of geriatric outpatients (patients referred to a geriatrician on suspicion of HF). The prevalence of heart failure was 45% in this cohort, resulting in higher negative predictive values compared with this study (Mant et al: 95% vs 91%), as well as higher positive predictive values (Mant et al: 62% vs 29%) and higher referral rates (Mant et al: 70% vs 57%). They concluded that the 2008 ESC algorithm performed the best.12 However, the Newcastle 85+ study demonstrated, in line with our results, that both proposed NT-proBNP cut-off points (125 and 400 pg/mL) had their limitations in an elderly population. The reference standard used in that study was LV dysfunction.13 Nevertheless, both studies agreed that algorithms using natriuretic peptide levels alone performed better than the algorithms incorporating patients' clinical characteristics.12 ,13
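The predictive values quoted for the rule by Mant in this study can be recovered from sensitivity, specificity and prevalence using Bayes' theorem, which also makes explicit why predictive values are not directly comparable between cohorts with different prevalences. The Python sketch below is illustrative; the function name is an assumption, and the inputs are derived from counts reported in the Results (15 missed cases among 74 patients with heart failure, and 148 unnecessary referrals among 291 patients without heart failure).

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple:
    """Positive and negative predictive value via Bayes' theorem."""
    tp = sensitivity * prevalence              # true-positive fraction
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction
    fn = (1 - sensitivity) * prevalence        # false-negative fraction
    tn = specificity * (1 - prevalence)        # true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# Rule by Mant in this cohort, from derived counts (see lead-in)
sens = (74 - 15) / 74      # 59 of 74 cases detected
spec = (291 - 148) / 291   # 143 of 291 non-cases not referred
prev = 74 / 365

ppv, npv = predictive_values(sens, spec, prev)
print(f"PPV: {ppv:.0%}, NPV: {npv:.0%}")
```

These inputs give a positive predictive value of about 29% and a negative predictive value of about 91%, matching the values cited for this study.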
For clinicians, the decision regarding the referral of patients suspected of having a diagnosis of heart failure will always be a trade-off between the potential benefits afforded by additional investigations and the burdens the said investigations impose on affected patients.12 In elderly patients, this choice is more difficult, as the benefits may be smaller,5 ,25 ,26 and the burdens larger. Therefore, large numbers of false positives should be avoided, a scenario made possible with the use of the diagnostic rule of Oudejans. On the other hand, effective treatments exist for heart failure characterised by either a reduced ejection fraction or valvular heart disease,5 ,25 highlighting the importance of reducing false-negative cases, an area in which the ESC algorithm excels. However, no differences in the identification of patients at high risk for 5-year mortality were observed among the different rules and algorithms. Another difficulty with an elderly population is the high prevalence of echocardiographic abnormalities overall. Only 25–35% of the false-positive cases did not exhibit any echocardiographic abnormalities. Additional research is warranted regarding the clinical significance of these cardiac phenotypes among elderly patients.
NT-proBNP levels played a crucial role in all decision rules and algorithms. Oudejans selected 400 pg/mL as a cut-off point, whereas the ESC algorithm selected 125 pg/mL. Since the comparison between NT-proBNP as a stand-alone test (cut-off 400 pg/mL) and the full diagnostic rule of Oudejans regarding both diagnostic and prognostic accuracy yielded no significant differences, the distinction between the ESC algorithm and the diagnostic rule of Oudejans could be ascribed to differences in NT-proBNP cut-off values. The diagnostic value of natriuretic peptides in heart failure has been proved, even in the elderly.14 ,27 However, controversy remains about optimal NT-proBNP cut-off values among elderly patients.9 ,10 ,13 ,28–30 Previous studies have advocated the use of age-dependent values.10 ,29 ,30 However, these values were highly dependent on settings, populations and reference standards, and warrant further investigation. Although the benefits of extensive diagnostic rules with clinical variables were not proven, it is important to note that natriuretic peptide levels must never be seen as a stand-alone test. A thorough clinical assessment must always take precedence.
This was the first comprehensive study to simulate and compare the ability of the diagnostic rules of Oudejans and Kelder to detect heart failure among symptomatic elderly patients. Additionally, this was the first study to describe and compare the abilities of different algorithms and clinical decision rules in identifying patients at high risk for mortality. Moreover, community studies involving elderly patients are scarce, which made our data particularly valuable. However, a few limitations of this study should be noted. First, heart failure was chosen as the reference standard and defined according to the ESC definition of heart failure. No difference was made between heart failure with reduced ejection fraction and heart failure with preserved ejection fraction. Although these entities have different therapeutic implications, both are important to detect. Debate is possible about which echocardiographic abnormalities should be regarded as objectified cardiac dysfunction. Therefore, a nuanced presentation of all echocardiographic abnormalities present in the group of patients with and without heart failure was given in table 3. Second, all patients with heart failure symptoms were included, not just those suspected of heart failure by their GPs. This approach was chosen since suspicion of heart failure by GPs is subjective and patients could be misclassified based on suspicion. Third, the clinical assessment was performed by each patient's personal physician, which may have increased the heterogeneity of our data. However, all participating GPs were trained, and the assessments of their patients were standardised to limit the risk of heterogeneity as much as possible.15
In conclusion, in a cohort of symptomatic patients aged 80 years and older, all diagnostic algorithms had their limitations. Therefore, choosing a diagnostic algorithm for non-acute HF in elderly patients represents a trade-off between sensitivity (2012 ESC algorithm) and specificity (diagnostic rule of Oudejans et al), depending on differences between the cut-off values used for natriuretic peptides. The rules and algorithms were all able to detect patients at high risk for mortality, without significant differences between them. Further research is warranted to identify strategies that optimise both sensitivity and specificity in a heterogeneous population of elderly patients.
This study was made possible by the participating GPs.
Contributors All named authors have made substantial contributions to the conception and design of the BELFRAIL study and have seen and approved the final version of the manuscript. MS analysed and interpreted the data and drafted the work. BV and JD acquired and interpreted the data and critically revised the work. PW and J-LV acquired the data and critically revised the work. SJ, CM and BA critically revised the work.
Funding The BELFRAIL study (B40320084685) was supported by an unconditional grant from the Foundation Louvain. The Foundation Louvain is the support unit of the Université Catholique de Louvain and is charged with developing the educational and research projects of the university by collecting gifts from corporations, foundations and alumni.
Competing interests None declared.
Patient consent Obtained.
Ethics approval All participants provided informed consent, and the Biomedical Ethics Committee of the Medical School of the Université Catholique de Louvain (UCL) of Brussels approved the study. All clinical and laboratory tests were performed in accordance with the Declaration of Helsinki.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement All authors had full access to all of the data (including statistical reports and tables) in the study and can take responsibility for the integrity of the data and the accuracy of the data analysis.