An electronic health records cohort study on heart failure following myocardial infarction in England: incidence and predictors

Objectives To investigate the incidence and determinants of heart failure (HF) following a myocardial infarction (MI) in a contemporary cohort of patients with MI using routinely collected primary and hospital care electronic health records (EHRs). Methods Data were used from the CALIBER programme, linking EHRs in England from primary care, hospital admissions, an MI registry and mortality data. Subjects were eligible if they were 18 years or older, did not have a history of HF and survived a first MI. Factors associated with time to HF were examined using Cox proportional hazard models. Results Of the 24 479 patients with MI, 5775 (23.6%) developed HF during a median follow-up of 3.7 years (incidence rate per 1000 person-years: 63.8, 95% CI 62.2 to 65.5). Baseline characteristics significantly associated with developing HF were: atrial fibrillation (HR 1.62, 95% CI 1.51 to 1.75), age (per 10 years increase: 1.45, 1.41 to 1.49), diabetes (1.45, 1.35 to 1.56), peripheral arterial disease (1.38, 1.26 to 1.51), chronic obstructive pulmonary disease (1.28, 1.17 to 1.40), greater socioeconomic deprivation (5th vs 1st quintile: 1.27, 1.13 to 1.41), ST-segment elevation MI at presentation (1.19, 1.11 to 1.27) and hypertension (1.16, 1.09 to 1.23). Results were robust to various sensitivity analyses such as competing risk analysis and multiple imputation. Conclusion In England, one in four survivors of a first MI develop HF within 4 years. This contemporary study demonstrates that patients with MI are at considerable risk of HF. Baseline patient characteristics associated with time until HF were identified, which may be used to target preventive strategies.

INTRODUCTION
Research describing the incidence of heart failure (HF) following myocardial infarction (MI) is limited. It mainly originates from the thrombolytic era and often relies on small samples with contradictory findings, making it difficult to practise evidence-based medicine. For example, a previous UK study among almost 900 patients hospitalised with MI in 1998 found that one-fifth developed HF during their hospital stay and a further third following hospital discharge. 1 More recently, a Swedish study found a 5-year cumulative risk of HF after MI of 21.8% in the calendar period 2004-2013. 2 Further, a Danish nationwide cohort study reported an incidence of HF at 90 days following MI of 19.6% in 2009-2010. 3 Differences between these studies could be related to a number of factors, for example, changes in treatment, national policies or definitions of HF, all of which potentially limit the generalisability of results. We used a large contemporary and representative sample of patients with MI based on electronic health records (EHRs) to (1) describe the current incidence of HF after MI in England, using both primary and hospital care data, and (2) explore patient characteristics predictive of post-MI HF.

Strengths and limitations of this study
► This study, based on linked electronic health records from general practitioners and hospitals, describes the current burden of heart failure (HF) in a representative sample of patients with a first myocardial infarction.
► The linkage of data from three sources (disease registry, primary care and hospital records) improved diagnostic ascertainment and accuracy in timing of events.
► Misclassification of drug exposure was likely to be minimal, as prescriptions issued during consultation are automatically recorded.
► Risk factor adjustment might have been incomplete given that information regarding baseline body mass index, smoking and blood pressure was missing for 34% to 70% of patients. Due to the high degree of missing data on time to revascularisation (88.3%), we did not explore its relation with HF incidence.
► Stratified methods were used to account for potential calendar and centre effects, and competing risk models were used to adjust for potential competing effects on HF and mortality.

METHODS
Study design
Data were used from the CALIBER dataset, which included linked data from (1) 5 For continuous variables, the most recent measurement recorded in CPRD in the year before study entry was used as the baseline value. Data before study entry were used to determine the prognostic potential of age (years, using the date of birth), sex, ethnicity, social deprivation (Index of Multiple Deprivation, IMD score as recorded in ONS in quintiles), smoking, alcohol use, history of cardiovascular disease (CVD), previous coronary revascularisation, history of diabetes, history of thyroid disease, history of chronic obstructive pulmonary disease (COPD), history of depression and history of non-metastatic cancer (see the CALIBER portal (https://caliberresearch.org/portal) for details). Diabetes and hypertension diagnoses were based on Read codes from primary care or ICD-10 codes from HES, both of which were classified using primary care consultation records and hospital diagnosis records. Socioeconomic status was based on the IMD, which includes seven domains of deprivation (income deprivation; employment deprivation; health deprivation and disability; education, skills and training deprivation; barriers to housing and services; living environment deprivation; and crime). 7

The index MI event was characterised using ECG findings (eg, ST-segment elevation myocardial infarction (STEMI) or non-STEMI (NSTEMI)), site of infarction, mode and timing of reperfusion and peak cardiac biomarkers (troponin I, troponin T and creatine kinase). MI was defined by linking all of the CALIBER data sources (eg, HES and CPRD) with the type of MI recorded in MINAP and CPRD; for more detail, see https://www.caliberresearch.org and the appendix. Characteristics regarding the index MI (eg, type of revascularisation and delay from symptom onset to delivery of reperfusion therapy) were derived from MINAP. This nationwide registry is part of an annual audit which records over 20 key MI variables. 8 Herrett and colleagues 6 validated the current approach and showed the necessity of linking multiple sources to ascertain MI (see appendix for codes used).

The primary study outcome was the first HF event following MI, which, like MI, was recorded in multiple CALIBER data sources; echocardiographic findings and New York Heart Association class were unavailable. All phenotypes were created and validated using a robust methodology described elsewhere 9 and have been used in previously published studies. 10 11

Statistical methods
Incidence rates (cases per 1000 person-years) and Kaplan-Meier cumulative incidence rates of HF were estimated, with Kaplan-Meier curves stratified by age (<50, 50-65 and ≥65 years) and type of MI. Associations between baseline variables and the onset of HF following MI were explored using Cox proportional hazard models. The proportional hazards assumption was checked using Schoenfeld residuals and by non-parametric correlation coefficients between survival time and the parameter-specific residuals. All models were stratified on general practice and calendar year periods of enrolment (1998-2001, 2001-2004, 2004-2007 and 2007-2010). Stratified models were used instead of frailty models because the former do not make any distributional assumption. Models were sequentially adjusted for: (1) age and sex, (2) cardiovascular risk factors, (3) type of MI and (4) comorbidities and prescribed medication. Associations are presented as HRs with 95% CIs. Analyses were performed using R (V.3.3.2). 12

Sensitivity analyses
Due to the availability of EHR from multiple sources, clinical diagnoses and prescriptions were completely recorded. Biomarker and lifestyle measurements such as smoking, body mass index (BMI) and blood pressure were, however, incompletely recorded (see table 1). It is probable that these data were preferentially recorded in subjects perceived to be at a higher risk for early progression of CVD.
Pairwise analyses regressing a complete case indicator on observed variables indeed showed considerable dependency between recorded data and missingness (results not shown), violating the 'missing completely at random' assumption. This dependency was exploited to multiply impute missing data using the mice package, generating 20 imputed datasets whose estimates were pooled using Rubin's rules (online supplementary methods). 13 To account for the fact that patients may have died before the onset of HF (ie, competing risk by mortality), analyses were repeated using Fine and Gray models. 14 15 As a hypothetical example of competing risk, let us assume that out of 100 subjects with MI, 1 develops HF and 99 die. In a Cox regression analysis, all 99 dead subjects would be censored (implicitly, and incorrectly, assuming they may develop HF later in time), and the hazard of HF would equal 1/1. Instead, a Fine and Gray model recognises that the 99 dead subjects could never develop HF and hence calculates the hazard as 1/100.
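The hypothetical example above can be reproduced with a few lines of Python. This is a minimal sketch rather than the analysis code used in the study (which was run in R): the function name, the event codes and the assumption that all 99 deaths occur before the single HF event are illustrative choices, and the sketch ignores ordinary censoring, which a full Fine and Gray model handles through weighting.

```python
HF, DEATH = 1, 2  # illustrative event codes (hypothetical, not from the study)


def hazards_at_first_hf(times, events):
    """Return (cause-specific, subdistribution) hazard at the first HF event time.

    Assumes every subject has an observed event (no ordinary censoring).
    """
    t_hf = min(t for t, e in zip(times, events) if e == HF)
    n_hf = sum(1 for t, e in zip(times, events) if t == t_hf and e == HF)

    # Cox (cause-specific) risk set: subjects still under observation at t_hf.
    # Subjects who died earlier have been censored and are no longer counted.
    cause_specific_risk_set = sum(1 for t in times if t >= t_hf)

    # Fine and Gray (subdistribution) risk set: subjects who died before t_hf
    # stay in the risk set, because they can never develop HF.
    earlier_deaths = sum(
        1 for t, e in zip(times, events) if t < t_hf and e == DEATH
    )
    subdistribution_risk_set = cause_specific_risk_set + earlier_deaths

    return n_hf / cause_specific_risk_set, n_hf / subdistribution_risk_set


# 100 subjects with MI: 99 die at time 1 (assumed), 1 develops HF at time 2.
times = [1.0] * 99 + [2.0]
events = [DEATH] * 99 + [HF]

cox_hazard, fine_gray_hazard = hazards_at_first_hf(times, events)
print(cox_hazard, fine_gray_hazard)
```

Under these assumptions the two risk sets contain 1 and 100 subjects, respectively, which yields the hazards of 1/1 and 1/100 quoted in the text.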
In addition to these sensitivity analyses, we performed cancer-stratified and revascularisation-stratified analyses. Due to discrepancies in sample size between the subgroups, we decided against formal interaction testing, which may suffer from inflated false positive rates and lower power in such settings. 16

RESULTS
Subjects and baseline characteristics
In total, there were 52 770 patients with an index MI during the study period (figure 1). Excluding patients with a fatal index MI (n=15 104), a prior history of HF (n=3876), missing information on the type of MI (n=97) and subjects with less than 1 year of follow-up prior to the indexing MI event (n=9214) resulted in an analytical cohort of 24 479 patients. Patients were recruited from 226 general practitioner (GP) practices, with the median practice enrolling 101 patients (first quartile (Q1) 65 and third quartile (Q3) 150). Of the included subjects, 4657 patients were STEMI and 19 822 patients were NSTEMI (0% missing), 15 969 (65.2%) were men (0% missing), 4538 (42.1%) currently smoked (56% missing), 12 258 (50.1%) had hypertension (0% missing) and 3014 (12.3%) had a history of diabetes at baseline (0% missing). In total, 6129 (25.0%) were prescribed beta-blockers (0% missing) and 5039 (20.6%) ACE inhibitors (0% missing) prior to their indexing MI (table 1). The number of patients identified across the previously defined calendar periods was reasonably stable, with a minimum of 5219 subjects in the period 1998-2001 compared with a maximum of 6838 subjects identified in the period 2004-2007.

Incidence
Patients contributed 90 482 person-years of follow-up, during a median follow-up time of 3.7 years (IQR 1.5; 6.7), and 5775 patients (23.6%) developed HF. The crude incidence rate of HF following a first MI was 63.8 (95% CI 62.2 to 65.5) per 1000 person-years. Within the first 30 days post-MI follow-up, 2438 (10.0%) patients developed HF (figure 2). From day 30 onwards, 3337 (15.8%) of patients with MI (event free during the first 30 days) developed HF, with 6.8% experiencing an HF event within the first year (figure 2). The incidence of HF during the first 30 days of follow-up was 4.3% (102) in patients younger than 50 years, 6.0% (459) in 50-65 year olds and 12.9% (1877) in those 65 years and older, with HF incidence increasing proportionally as time progressed towards 10 years (figure 3 top row). The 30-day incidence of HF was 9.5% (1892 events) for subjects with NSTEMI and 11.7% (546 events) for subjects with STEMI. Excluding patients who experienced HF within the first 30 days showed that patients with STEMI had a lower incidence of HF than subjects with NSTEMI (figure 3 lower panels). At 57 days, the crude cumulative risk of HF in the subjects with NSTEMI surpassed that of the subjects with STEMI for the first time (0.0151 vs 0.0147), with the curves further diverging at 73 days of follow-up (since indexing MI).
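The crude incidence rate reported above follows directly from the event count and the person-years at risk. The sketch below recomputes it in Python; the confidence interval uses a normal approximation on the log rate, which is an assumption on our part (the text does not state which interval method was used), and the function name is hypothetical.

```python
import math


def incidence_rate_per_1000(events, person_years, z=1.96):
    """Crude incidence rate per 1000 person-years with an approximate 95% CI.

    The CI is a normal approximation on the log rate (assumed method),
    where the standard error of the log rate is 1 / sqrt(events).
    """
    rate = events / person_years * 1000
    factor = math.exp(z / math.sqrt(events))
    return rate, rate / factor, rate * factor


# Figures taken from the text: 5775 HF events over 90 482 person-years.
rate, lower, upper = incidence_rate_per_1000(5775, 90482)
print(f"{rate:.1f} per 1000 person-years (95% CI {lower:.1f} to {upper:.1f})")
```

With the reported inputs this approximation gives 63.8 (62.2 to 65.5), in line with the published estimate.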
Over the entire follow-up, 5921 subjects died, of whom 3538 were free of HF at the time. During the first 30 days of follow-up, only one patient died, limiting the potential for competing risk by all-cause mortality. In patients who did not have HF after 30 days, accounting for competing risk by all-cause mortality attenuated the cumulative risk to about 15% after 10 years of follow-up (online supplementary data). Furthermore, the cumulative risks of all-cause mortality and HF converged as follow-up time progressed towards 10 years.

After multiply imputing the data, models were extended to include smoking, BMI and systolic and diastolic blood pressure, which showed BMI, male sex and smoking to be conditionally independent prognostic factors (online supplementary data). The extended model 5 was implemented using imputed data and non-imputed (complete case) data, resulting in similar HR estimates (magnitude and direction). Stratifying the sample on cancer diagnosis or history of revascularisation showed broadly similar results between subgroups (see online supplementary data, focusing on the CIs); however, precision was low due to the limited number of patients with a history of revascularisation or cancer.

DISCUSSION
In this large population-based study using linked EHRs, 23.6% of patients who survived a first MI developed HF during a median follow-up of 3.7 years, resulting in an incidence rate of 64 cases per 1000 person-years. Incident HF was associated with increasing age, higher socioeconomic deprivation, a history of diabetes, atrial fibrillation, peripheral arterial disease, COPD, STEMI at presentation, BMI and smoking. A previous Canadian study, using data from the period 1994-2000, 17 found that 71% of elderly patients without HF at index admission developed HF within 5 years after an MI, whereas the mortality due to MI decreased in the same period. The Framingham Heart Study, 18 using data from 1990 to 1999, found a 5-year post-MI HF incidence of 31.9%, lower than that found in the Canadian study. Importantly, both studies showed a higher incidence of post-MI HF than our more contemporary English cohort. This lower HF incidence is likely due to continued improvements in MI treatment strategies, which are reflected in a decreasing HF incidence over calendar time. For example, a sample of 20 812 patients with MI hospitalised in Western Australia showed that the overall 1-year incidence of HF after MI decreased from 28.1% in 1996 to 16.5% in 2007. 19 This decline is confirmed further by a national Swedish hospital discharge and death registry study reporting a one-third decline in incidence between 1993 and 2004. 20 The same Swedish group recently reported that this trend persists for the period 2004-2013 2 and showed improved pharmacological treatment and early revascularisation in this period. During a median follow-up of 4 years, 19% of the patients were rehospitalised because of HF. A probable explanation for the higher percentage in our study is the inclusion of HF diagnoses made in the GP setting in addition to hospital records, which decreases misclassification of HF events.
Future efforts are needed to harmonise these national data sources to compare daily care and HF epidemiology across different countries.

Prognostic factors
Similar to the current study, Torabi et al 21 reported that HF after MI increased steeply with age. Socioeconomic deprivation has been shown to be an independent predictor of HF development and associated with an increased incidence of HF in MI-free subjects. 22 23 Socioeconomic deprivation has also been associated with more frequent hospital admission and higher mortality in patients with HF. 24-26 The current study extends these observations to HF incidence after an MI. Furthermore, we showed that a history of diabetes, atrial fibrillation, peripheral arterial disease, COPD, STEMI at presentation, BMI and smoking are all independently prognostic of HF after MI. Interestingly, despite the size of the collected sample, sex only became significantly associated with HF after accounting for differences in BMI and smoking. Furthermore, the impact of male sex (HR 1.07, 95% CI 1.01 to 1.13) was modest, indicating relative sex equality. It is of note that blood pressure was not significantly associated with HF. However, blood pressure measurements were frequently missing, which (even after multiple imputation) may be the cause of the observed lack of association; hence, this deserves independent exploration. In this light, it is important to note that the diagnosis of hypertension was associated with HF. Potentially, this discrepancy between the associations of blood pressure measurement and hypertension diagnosis can be explained by noting that a recorded diagnosis is indicative of long-term hypertension, which may differ from a single blood pressure measurement in terms of prognosis. Further, given that both variables were included in the same model, the observed difference in association may suggest that, conditional on hypertension, blood pressure itself is only modestly associated with HF, if at all.

Strengths and limitations
The linkage of multiple EHR sources from primary and hospital care allowed for the collection of a representative sample, 27 which enabled us to explore the prognostic value of routinely collected data in primary care records and to detect non-hospitalised HF cases. The population of CPRD practices has been shown to be broadly representative of the UK population. 4 28 In total, 226 GP practices consented to data linkage with HES, MINAP and ONS (containing 3.9% of the population of England in 2006). A potential limitation is that the ascertainment of cardiovascular outcomes was not based on clinical criteria (eg, validated questionnaires and properly conducted physical examinations), practices of medical coding will have changed over time and there could be subgroups of patients with left ventricular dysfunction without clinical symptoms. Calendar-dependent changes over time were accounted for by using time-period stratified Cox models. We further wish to highlight the infrequent use of percutaneous coronary intervention in patients with STEMI, which was shown to be representative of the slower uptake in England. 29 Similarly, 32% of the patients used antiplatelet drugs at baseline, which may be a reflection of non-MI CVD burden. The Kaplan-Meier plots for subtype of MI indicate a possible violation of the proportional hazard assumption for this variable. However, these plots represent the pairwise associations between MI subtype and time to HF and, as such, assume that MI subtype is independent of other prognostic factors, which is known to be false. The importance of conditioning on covariates is underlined by noting that the Kaplan-Meier plots indicate a protective effect of STEMI versus NSTEMI, which was significantly reversed after correcting for covariates (models 3-4). Similarly, instead of using Kaplan-Meier plots to assess the proportional hazard assumption of the crude associations, we used Schoenfeld residuals to explore this assumption for the conditional associations (multivariable model 4), which did not show any violations. Based on model 4, the absolute correlation between the Schoenfeld residuals and time was <0.10 (eg, for MI subtype, this was 0.06), indicating an absence of relevant interaction by time (ie, non-proportionality of hazard).
Residual confounding due to medication use (or other missing/mis-specified variables) might be another potential source of bias; however, our intention was not to perform a causal analysis between drug prescriptions and time to HF. As such, it is interesting to note that despite the large sample size, we did not observe a significant association between ACE inhibitors, angiotensin receptor blockers or beta-blockers prescribed prior to the indexing MI event and time to HF. Relatedly, we acknowledge that we did not assess all potential predictors of post-MI HF; for example, due to the high degree of missingness, we did not explore the prognostic potential of time to revascularisation. 30 We adhered to CPRD recommendations to obtain up-to-standard baseline data by excluding patients with less than 1 year at a CPRD practice prior to index MI. Previously, Lewis and colleagues showed that 3 months after registration with a new practice, most patient characteristics were updated correctly, which approximated 100% after 1 year of follow-up. 31 Assuming that duration of the CPRD follow-up is independent of the relations assessed here, excluding such patients should not hamper generalisability of results. Using the first MI recorded in the database without a prior history of HF might have introduced bias due to left truncation (eg, some subjects may already have experienced an MI before enrolment). 32 However, CALIBER holds longitudinal records from primary and hospital care, making it unlikely that a large proportion of the patients were misclassified as MI free, and the 1 year of follow-up prior to entry further decreases the likelihood of misclassifying patients. 31 While the current data are adequate to identify subjects with a first MI, the subclassification of patients with MI into STEMI and NSTEMI, despite recent improvements, is known to be error prone. 33 As such, results for MI type need replication using higher quality data, perhaps in purposely designed studies.
Due to our interest in HF occurrence after a first MI, selection bias (eg, index event bias) was introduced. 34 This index event bias does not impact the descriptive or prognostic value of the associations presented and is mostly relevant if one wants to develop an intervention based on the associations presented, which was not the aim of this study. Additionally, we note that we reduced the influence of selection bias by accounting for dependencies between predictors. An important caveat of electronic healthcare records is that these data are predominantly focused on recording diagnoses and prescriptions but not on their complement (ie, who is not diseased or who did not receive a drug). As such, we have assumed that subjects without a recorded drug prescription or diagnosis were unexposed or free of (that specific) disease. Provided that electronic registration is required for a patient to fill a prescription, we can be fairly confident that we did not miss many prescribed treatments. However, it is likely that some subjects were misclassified as free of disease while in fact they were not. We attempted to minimise this misclassification by linking data across multiple healthcare settings and data sources (MINAP, HES, CPRD and ONS).
A final limitation is that there is a possible delay between primary and hospital care records. Previous research has shown that MI events tend to be recorded in primary care after the HES or MINAP record. 6 The lower 30-day HF incidence in patients with unclassified MI primarily derived from primary care could be partly explained because of a delay in coding. Therefore, we showed cumulative incidence rates in patients (alive and HF event free within the first 30 days) from 30 days after index MI to account for a delay in recording of MI in primary care. We were unable to differentiate between HF with preserved ejection fraction and HF with reduced ejection fraction as we had no access to detailed (echocardiographic) parameters to assess diastolic dysfunction. It is likely, however, that the majority of our patients with HF had developed systolic dysfunction after MI.

CONCLUSION
In this large cohort study using linked EHRs in England from primary and hospital care, about one in four people developed HF within a median of 4 years after surviving a first MI. Increasing age, higher socioeconomic deprivation, a history of hypertension, diabetes, atrial fibrillation, peripheral arterial disease, COPD, smoking and STEMI at presentation were independent determinants of new onset HF following MI.