Objectives To describe incidence and prevalence of cardiovascular disease (CVD), its risk factors, medication prescribed to treat CVD and predictors of CVD within a nationally representative dataset.
Design Cross-sectional study of adults with and without CVD.
Setting The Royal College of General Practitioners (RCGP) Research and Surveillance Centre (RSC) is an English primary care sentinel network. RCGP RSC is over 50 years old and one of the oldest in Europe. Practices receive feedback about data quality. This database is primarily used to conduct surveillance and research into influenza, infections and vaccine effectiveness but is also a rich resource for the study of non-communicable disease (NCD). The RCGP RSC network comprised 164 practices at the time of study.
Results Data were extracted from the records of 1 275 174 adults. Approximately a fifth (21.3%; 95% CI 21.2% to 21.4%) had CVD (myocardial infarction (MI), angina, atrial fibrillation (AF), peripheral arterial disease, stroke/transient ischaemic attack (TIA), congestive cardiac failure) or hypertension. Smoking, unsafe alcohol consumption and obesity were more common among people with CVD. Angiotensin system modulating drugs, 3-hydroxy-3-methylglutaryl-coenzyme A (HMG-CoA) reductase inhibitors (statins) and calcium channel blockers were the most commonly prescribed CVD medications. Age-adjusted and gender-adjusted annual incidence for AF was 28.2/10 000 (95% CI 27.8 to 28.7); stroke/TIA 17.1/10 000 (95% CI 16.8 to 17.5) and MI 9.8/10 000 (95% CI 9.5 to 10.0). Logistic regression analyses confirmed established CVD risk factors were associated with CVD in the RCGP RSC network dataset.
Conclusions The RCGP RSC database provides comprehensive information on risk factors, medical diagnosis, physiological measurements and prescription history that could be used in CVD research or pharmacoepidemiology. With the exception of MI, the prevalence of CVDs was higher than in other national data, possibly reflecting data quality. RCGP RSC is an underused resource for research into NCDs and their management and welcomes collaborative opportunities.
- cardiac epidemiology
- primary care
- health informatics
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
High data quality for demographic, clinical and social variables.
Ability to report prevalence and incidence.
From our rich dataset, we are able to report diagnosis and pharmacological and physiological (eg, blood pressure) data.
Data collected in primary care represent only a brief abstraction from a consultation.
Prevalence of myocardial infarction was lower than reported in other UK primary care datasets linked to hospital data. This might indicate that acute conditions are less well coded in primary care records.
Cardiovascular disease (CVD) is a major cause of mortality and morbidity. Improved preventive strategies could reduce the burden of disease. There were over 17 million CVD-related deaths in 2012; CVD accounts for almost one-third of all deaths.1 Ischaemic heart disease has topped the list of causes of years of life lost for more than a decade,2 highlighting the shift in the global burden of disease from communicable to chronic disease.3 Risk factors for CVD, including raised blood pressure, hypercholesterolaemia and high body mass index (BMI), are among the most important contributors to disability-adjusted life years.4 Primary prevention of CVD is achievable through early identification and modification of ‘lifestyle risk factors’ and secondary prevention, through appropriate risk reduction, which slows disease progression. Effective prevention and treatment is reliant on the identification of people at risk of or with current CVD, and systems which facilitate monitoring of management.
Large datasets provide real-world insights into the epidemiology of CVD and better use of analytics may lead to improvements in care.5 Notwithstanding the widespread use of computerised medical record (CMR) systems, published research on the use of large datasets in CVD research remains limited.6 The majority of CVD management takes place in primary care, providing opportunities for research.7 8 In the UK, a pay-for-performance (P4P) scheme, the Quality and Outcomes Framework (QOF), was introduced to incentivise general practitioner (GP) practices to achieve indicator thresholds for the management of chronic diseases,9 which has enhanced data quality in primary care. The ubiquitous use of CMR systems in UK primary care allows ready analysis, enhanced by the UK’s registration-based systems, which provide an accurate denominator.10
The English Royal College of General Practitioners (RCGP) Research and Surveillance Centre (RSC) is one of the longest established primary care sentinel networks.11 It comprises a nationally representative sample12 and monitors infections and respiratory disease, particularly influenza, and assesses vaccine effectiveness.13 14 15 The RCGP RSC extended its remit into diabetes, with a focus on disparities and adherence/persistence, and has been shown to have good data quality.16 One recent analysis has used the cardiovascular outcome data from the RCGP RSC,17 but there has not previously been a systematic assessment of CVD data quality and completeness. We describe the incidence and prevalence of CVD in the RCGP RSC cohort, including across different patient characteristics, medications and risk factors for CVD.
To identify the proportion of adults currently registered within the RCGP RSC who have a diagnosis of CVD.
To describe the prevalence of risk factors for CVD and medications prescribed within the RCGP RSC network.
To compare the incidence and prevalence with other nationally reported datasets.
Study design and study population
This is a cross-sectional study using routinely collected data extracted from general practices registered within the RCGP RSC. All adults registered at the end of the study period, 31 December 2016, were included. Data were contributed by 164 practices across England.
The Read classification is currently used to code key clinical data.18 Diagnoses and key symptoms, examination findings, therapies, investigations and test results, and processes of care are increasingly coded. Interactions with secondary care are generally coded into a patient’s primary care record.
Data quality is very important to RCGP RSC.19 20 A sentinel system has to differentiate first and new cases from follow-up of existing cases. There is therefore considerable emphasis on ensuring that every consultation has a ‘problem’ coded, together with its so-called ‘episode type’ (ie, first, new or ongoing). Practices receive extensive feedback, practice visits, a monthly newsletter and, more recently, a dashboard refreshed weekly.21 We are also exploring automated classification of unclassified episodes.22
The individuals in the cohort have their identity pseudonymised. We pseudonymise National Health Service (NHS) number to protect patients’ privacy—but can apply the same pseudonymisation algorithm to other datasets (eg, hospital data or cancer registries), so we can privately link patient-level data. We can, in addition, use a combination of probabilistic and deterministic methods to link data where there is no NHS number.23
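The linkage idea described above can be illustrated with a minimal sketch. It assumes a keyed hash (HMAC-SHA256) as the pseudonymisation step; the source does not state which algorithm RCGP RSC uses, and the function name `pseudonymise`, the key and the records below are hypothetical.

```python
import hashlib
import hmac


def pseudonymise(nhs_number: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest of a normalised NHS number.

    Assumption: a keyed hash stands in for the (unspecified) network algorithm.
    """
    # Keep digits only, so "943 476 5919" and "9434765919" give the same token.
    normalised = "".join(ch for ch in nhs_number if ch.isdigit())
    return hmac.new(secret_key, normalised.encode("ascii"), hashlib.sha256).hexdigest()


# Hypothetical key and dummy records, for illustration only.
KEY = b"example-secret-key"
primary_care = {pseudonymise("943 476 5919", KEY): {"diagnosis": "AF"}}
hospital = {pseudonymise("9434765919", KEY): {"admission": "2016-03-01"}}

# Deterministic linkage: join the two datasets on the shared pseudonym.
linked = {
    pid: {**record, **hospital[pid]}
    for pid, record in primary_care.items()
    if pid in hospital
}
```

Because both datasets are transformed with the same key, records can be joined without either side holding the original identifier. Probabilistic linkage for records lacking an NHS number is not covered by this sketch.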
Data collection takes place twice per week with near real-time processing
We collect data from all patients twice per week, making this one of the most frequently updated data sources. We have the capability to characterise and monitor disease and use of therapies, and compare use according to guidance.24 The RCGP RSC has produced a ‘weekly return’ of infections and respiratory disease since 1967, though over time it has expanded in terms of size, scope, sample collection and its capability to link with other datasets. We can process large numbers of patients’ data rapidly: producing our weekly report of over 30 conditions involves processing the data of around 1.75 million patients in around 4 hours.
Demographic, social and clinical characteristics of the study population were identified using Read codes. Ethnicity was characterised using an established ontological approach: a combination of recorded ethnicity and information which infers ethnicity (such as language).25 Socioeconomic status (SES) was determined using the Index of Multiple Deprivation (IMD—the official UK SES measure) from each patient’s postal code,26 which we also have the capability to map to geocoordinate.
We used the latest code recorded for each patient to report smoking status. The categories we created were: never, current and ex-smoker. For alcohol use, we included alcohol-related disease or complications. We stratified alcohol use into: not recorded, within limits (<14 units per week) or alcohol consumption with no amount specified or excess (≥14 units per week or other codes consistent with heavy drinking). We categorised BMI using WHO categories: underweight (<18.5 kg/m2), normal (18.5–24.9), overweight (25.0–29.9), obese class 1 (30.0–34.9), obese class 2 (35.0–39.9) and obese class 3 (≥40.0).27 For BMI and blood pressure, we used the latest value recorded. We also report missing data.
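The BMI rule above (latest recorded value, then WHO category) can be sketched as follows. This is an illustration rather than the study code: the function names are hypothetical, and values falling between the published bands (eg, 24.95) are assumed to belong to the lower category.

```python
from datetime import date
from typing import List, Optional, Tuple


def latest_value(records: List[Tuple[date, float]]) -> Optional[float]:
    """Return the most recently recorded value, or None if nothing is recorded."""
    if not records:
        return None
    return max(records, key=lambda rec: rec[0])[1]


def bmi_category(bmi: Optional[float]) -> str:
    """Map a BMI in kg/m2 to the WHO categories listed in the text."""
    if bmi is None:
        return "missing"
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    if bmi < 35.0:
        return "obese class 1"
    if bmi < 40.0:
        return "obese class 2"
    return "obese class 3"


# Hypothetical patient with two BMI recordings; only the latest is categorised.
history = [(date(2015, 1, 10), 31.2), (date(2016, 6, 1), 29.4)]
category = bmi_category(latest_value(history))
```

Returning an explicit "missing" category keeps patients without a recording in the denominator, which matches the statement that missing data are reported.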
CVD cases include a diagnosis of one or more of the following conditions: coronary artery disease (including myocardial infarction and angina), atrial fibrillation (AF), peripheral arterial disease, stroke, transient ischaemic attack and congestive cardiac failure (online supplementary appendix 1, Read code list28). We included hypertension in our analysis of CVD.
Risk factors for CVD and disease prevalence
In addition to those described above, we measured prevalence of established CVD risk factors demonstrated to predict CVD.29 Our analysis included latest systolic blood pressure (SBP) in excess of 140 mm Hg, chronic kidney disease (CKD, including stages 3–5) and diabetes mellitus.
Prevalence and incidence of CVD
Prevalence of CVD was described by the number of different conditions in people by age groups (<50 years, 50–59 years, 60–69 years, 70–79 years and ≥80 years). Incidence of CVD was derived for the last 5 years (2012–2016). We reported age-adjusted and gender-adjusted annual incidence of CVD using the 2011 Census for England and Wales. We also compared prevalence of CVD conditions and risk factors/comorbidities in the RCGP RSC dataset with those reported for the QOF/P4P national financially incentivised chronic disease management scheme.
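Age and gender adjustment against a reference population is commonly done by direct standardisation, in which each stratum-specific rate is weighted by that stratum's share of the standard population. The sketch below assumes that method (the text does not name it) and uses invented counts, not study data or real Census figures.

```python
from typing import Dict, List


def directly_standardised_rate(strata: List[Dict[str, float]], per: int = 10_000) -> float:
    """Weight each stratum's rate by the standard population in that stratum."""
    total_std = sum(s["std_pop"] for s in strata)
    weighted = sum((s["cases"] / s["person_years"]) * s["std_pop"] for s in strata)
    return per * weighted / total_std


def crude_rate(strata: List[Dict[str, float]], per: int = 10_000) -> float:
    """Overall rate ignoring the age and gender structure."""
    return per * sum(s["cases"] for s in strata) / sum(s["person_years"] for s in strata)


# Hypothetical strata (eg, one age-gender band each); all numbers are invented.
example = [
    {"cases": 10, "person_years": 10_000, "std_pop": 600},
    {"cases": 50, "person_years": 5_000, "std_pop": 400},
]
adjusted = directly_standardised_rate(example)
crude = crude_rate(example)
```

In this invented example the adjusted rate is higher than the crude rate because the standard population has a larger share of the high-rate stratum than the study sample does.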
CVD prescription use
We report the following cardiovascular medication prescriptions: ACE inhibitors, alpha blockers, angiotensin II receptor blockers, antiplatelet therapy (including aspirin), glycosides, calcium channel blockers, centrally acting antihypertensives, statins, fibrates, other lipid-lowering drugs, loop diuretics, nicorandil, nitrates, potassium sparing diuretics, supraventricular antiarrhythmics, thiazide and thiazide-like diuretics, vasodilator antihypertensives, and beta-blockers. We defined medication use as the recording of one or more prescriptions in the GP record.
Predictors of CVD
Logistic regression was used to identify variables that predict CVD. Variable selection was based on established risk factors for CVD. Since CVD is a composite of conditions, which present at different stages, we also explored associations between variables and the most common CVD conditions identified in this study.
All adults, aged ≥18 years, who were registered at an RCGP RSC practice on 31 December 2016 were included in the analysis.
Any patients who have codes suggesting they declined any form of data sharing are not analysed by RCGP RSC (approximately 2.2% of the registered population).
We calculated descriptive statistics using the open source software RStudio. We reported counts and percentages of crude data. Proportions were compared using χ2 tests, while the independent samples t-test and Mann-Whitney U test were used to compare measures of central tendency (means and medians).
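As an illustration of comparing two proportions with a χ2 test, the sketch below computes the Pearson statistic for a 2×2 table using only the standard library. It assumes no continuity correction, and the counts are invented; it is not the analysis code used in the study.

```python
import math
from typing import Tuple


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared test (1 df, no continuity correction) for the table
    [[a, b], [c, d]]; returns the statistic and the p value."""
    n = a + b + c + d
    observed = [a, b, c, d]
    row_totals = [a + b, a + b, c + d, c + d]
    col_totals = [a + c, b + d, a + c, b + d]
    statistic = 0.0
    for obs, row, col in zip(observed, row_totals, col_totals):
        expected = row * col / n
        statistic += (obs - expected) ** 2 / expected
    # With 1 degree of freedom, the upper tail equals erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: smokers vs non-smokers among people with and without CVD.
statistic, p_value = chi_squared_2x2(200, 800, 100, 900)
```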
As mentioned above, we used logistic regression to identify variables that predict CVD as an outcome. Three models were run, which included the following outcome variables: (1) CVD as a composite of conditions (coronary artery disease, myocardial infarction, angina, AF, peripheral arterial disease, stroke, transient ischaemic attack and congestive cardiac failure); (2) coronary artery disease (including myocardial infarction and angina) and (3) AF.
Within each model, we adjusted for the following predictor variables: age, gender, ethnicity, SES (using IMD stratified into quintiles), smoking status, alcohol consumption, BMI category, presence of diabetes (no diabetes, type 1 diabetes, type 2 diabetes) and presence of other known risk factors (uncontrolled SBP: >140 mm Hg; hypertension; and CKD (stages 3–5)). All variables were retained in the regression analyses irrespective of associations in bivariate analyses.
We reported ORs with 95% CIs and p values for each model parameter. Results were deemed significant if they were associated with a significance level of p<0.05.
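For readers less familiar with ORs, the sketch below shows how an unadjusted (bivariate) OR and its 95% CI can be obtained from a 2×2 table using the standard log-OR (Woolf) interval. This is a simplification: the models described above are multivariable, and the counts here are invented.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted OR with a Woolf (log-scale) confidence interval.

    a: exposed with disease,   b: exposed without disease
    c: unexposed with disease, d: unexposed without disease
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
odds_ratio, lower, upper = odds_ratio_ci(200, 800, 100, 900)
significant = lower > 1 or upper < 1  # CI excluding 1 corresponds to p<0.05
```

An adjusted OR from logistic regression can differ markedly from this unadjusted value, and may even reverse direction, once other predictors are included in the model.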
The data used for the analysis were pseudonymised at the point of extraction and encrypted prior to uploading to the Clinical Informatics Research Group secure server. Personal data were not identifiable during the analysis. This study was approved by the RCGP RSC study approval committee and was classified as a study of ‘usual practice’.30 RCGP RSC data are available to researchers and applications should be made direct to RCGP.31
Patient and public involvement
No patients or public were involved in the design, recruitment or conduct of this study.
Demographics about people with and without CVD
A total of 1 275 174 adults were registered with RCGP RSC practices. SES was identified within almost the entire adult population and three-quarters had their ethnicity recorded. The least deprived quintiles (IMD quintiles 4 and 5) were over-represented in the cohort (table 1).
The prevalence of CVD, including hypertension, was 21.3% (n=271 684), and CVD was more common among males. Smoking, unsafe drinking and obesity were more prevalent among people with CVD than without, as was white ethnicity. There was a lower proportion of Asian, mixed or other ethnicity individuals among those diagnosed with CVD compared with those without CVD, and the proportion of people in the two least deprived quintiles was higher in those with CVD (table 1).
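The headline prevalence and its CI can be reproduced from the two counts reported in this article. The sketch assumes a normal-approximation (Wald) interval; the method actually used for the published interval is not stated.

```python
import math
from typing import Tuple


def proportion_ci(cases: int, total: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Proportion with a normal-approximation (Wald) confidence interval."""
    p = cases / total
    se = math.sqrt(p * (1 - p) / total)
    return p, p - z * se, p + z * se


# Counts taken from the text: 271 684 of 1 275 174 adults.
p, lower, upper = proportion_ci(271_684, 1_275_174)
print(f"{100 * p:.1f}% (95% CI {100 * lower:.1f}% to {100 * upper:.1f}%)")
# prints: 21.3% (95% CI 21.2% to 21.4%)
```

With a denominator above one million the interval is very narrow, so the choice between the Wald method and alternatives such as the Wilson interval makes no practical difference here.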
Risk factors for CVD and disease prevalence
The most prevalent risk factor was smoking, followed by obesity, uncontrolled SBP, type 2 diabetes and CKD (table 2). Crude prevalence of smoking (active and ex-smokers) was higher in those of white and mixed ethnicity, and obesity was most prevalent in people of white and of black ethnicity. Crude prevalence of type 2 diabetes was greater in Asian and black ethnicity, and CKD in white people. The highest overall crude prevalence for CVD was in white ethnicity, particularly for AF.
Prevalence of CVD
The prevalence and number of conditions increased with age (online supplementary table S1). CVD was present in less than 1% of people under the age of 50 years, increasing to a quarter of people with at least one condition between 70 and 79 years old, and over 40% in people over the age of 80 years.
We identified a higher prevalence of CVDs from RCGP RSC data than identified by the P4P (QOF) definitions (figure 1A). We also detected a higher proportion of other CVD risk factors and comorbidities (figure 1B).
Incidence of CVD
Hypertension showed the highest incidence (2012–2016) and age-adjusted and gender-adjusted annual incidence rate followed by AF and coronary artery disease (table 3).
CVD prescription use
ACE inhibitors, statins and calcium channel blockers were the most commonly prescribed, with over half of the CVD cohort prescribed at least one of these medications (online supplementary table S2).
Predictors of CVD
When considering predictors of all types of CVD, people were more likely to have the disease if they were older, male, current or ex-smokers, hazardous drinkers or categorised as an alcoholic, or had a well-established comorbidity/risk factor (chronic kidney disease, diabetes and hypertension) for CVD (table 4). In addition, likelihood of having CVD increased with each BMI category compared with people with a normal BMI, while people in the more deprived groups (IMD quintiles 1–4) were more likely to have CVD than those in the least deprived group (IMD quintile 5), and people of non-white ethnic groups were less likely to have CVD than people of white ethnicity. Interestingly, people with uncontrolled SBP were less likely to have CVD than those with controlled SBP. However, bivariate analysis showed that people were more likely to have CVD if their SBP was uncontrolled (OR 2.09, 95% CI 2.06 to 2.12), which suggests that adjusting for other variables (eg, age and hypertension diagnosis) affected this relationship.
Similar associations were found for coronary artery disease and AF (online supplementary tables S3 and S4); however, people of Asian ethnicity were more likely to have coronary artery disease than people of white ethnicity. Current and ex-smokers and people with type 1 diabetes were less likely to have AF than non-smokers and those without diabetes, respectively.
Our results demonstrate a high prevalence of CVD, including hypertension, in a nationally representative population sample. A CVD diagnosis was recorded in more than one in five adults in our population. Regarding specific cardiovascular conditions, hypertension was the most commonly diagnosed cardiovascular condition. Coronary artery disease was diagnosed in 3.5% of individuals. This compares to previous UK studies which have demonstrated a fairly consistent prevalence of coronary artery disease at 3% in England, and 4% in Scotland, Northern Ireland and Wales.32 Additionally, our reported prevalence of AF, 2.7%, is similar to modelling studies undertaken by Public Health England to estimate the true burden of AF in England,33 which in itself is significantly higher than the national prevalence of AF reported by QOF/P4P. This highlights the benefit of our approach to case definition and the use of a wider range of clinical codes than used within P4P business rules. Classical risk factors for CVD were well represented within the RCGP RSC population. For example 234 838 individuals had a diagnosis of hypertension.
The RCGP RSC database is suitable for CVD research. We have defined a cohort of people with CVD (n=271 684) and identified a higher prevalence than the P4P/QOF scheme, though we note that QOF uses a limited list of codes. Incident cases of CVD and CVD risk factors can be identified in the cohort. We also identified more obesity, and our results are more closely aligned with self-reported data.34 Our use of ontologies to improve case finding may make our approach more reproducible than studies that rely on an expert’s code list.35
Data quality was high in the CVD cohort for demographic, clinical and social variables. These are similar to those reported in other UK primary care databases, such as Clinical Practice Research Datalink (CPRD).36 We were able to identify uncontrolled hypertension, a risk factor for major cardiovascular events,37 38 and highlighted an area where management could be improved through a combination of patient education and self-management and pharmacological interventions as appropriate.39 We could identify appropriate medicines used to manage CVD and found statins prescribed at anticipated rates.40
The associations with ethnicity were generally those expected. We found that a higher prevalence of people of white ethnicity had CKD, while the highest prevalence for type 2 diabetes was seen in those of black or Asian ethnicity.41 42 Similarly a higher prevalence of hypertension was seen in people of black ethnicity.43 44
The age-adjusted and gender-adjusted annual incidence of CVD was highest in AF and similar to other findings that have reported rates in Europe,45 while the incidence of stroke was higher than previously reported in other UK primary care databases (CPRD).46 This may reflect an increase in disease in the years since these findings were reported, or enhanced case-finding strategies.
Logistic regression analyses demonstrated that established risk factors for CVD were associated with CVD in the RCGP RSC dataset. Unusually, people with uncontrolled SBP were shown to be less likely to have CVD. However, bivariate analysis confirmed that this finding arose only after adjusting for other variables, and that uncontrolled SBP is in fact associated with a higher risk of CVD. This variable was based on the latest recording only, for people with or without a diagnosis of hypertension, and therefore many people who had uncontrolled SBP were otherwise healthy.
Strengths and weaknesses
We have presented a comprehensive assessment of the burden of CVD in a representative population sample. Our use of coding ontologies improves the accuracy of case identification. RCGP RSC has promoted high-quality data recording for over half a century.
Although RCGP data have traditionally been used for monitoring infectious diseases, the network data have recently started to be used more for non-communicable disease (NCD) research. Examples include the epidemiology of type 3c diabetes,47 liver disease48 and risks associated with anticoagulant use in people with AF and CKD.17
The weakness of our data is that of all routinely collected data.18 Data are collected for routine patient care in the context of the 10 min GP consultation. Our coded data are therefore a brief abstraction of this process.49 We are still in the early stages of looking to ensure the robustness of real-world evidence studies.
The incidence of myocardial infarction was lower than that reported from linked UK primary care, hospital and national audit datasets (CPRD).50 This suggests there are limitations to using primary care data alone, though we have the capability to link to these same datasets.
We have reported the prevalence and incidence of common CVDs in the RCGP RSC dataset. Our reported prevalence of coronary artery disease and AF is comparable with other UK studies. Therefore, the RCGP RSC is ready to be more active in research and quality improvement studies including CVD. The rich clinical data available within the RCGP RSC have substantial potential utility for epidemiological research into a variety of communicable diseases and NCDs. This has recently been demonstrated using the RCGP RSC dataset to explore CVD outcomes in renal disease.17 The combination of clinical and prescription data can be used to carry out real-world evidence studies and explore effectiveness beyond the traditional randomised controlled trial setting.
We thank Rachel Byford and Julian Sherlock, SQL developers, for database management and data extraction, and the participating practices and patients for providing the data for this cohort. We also acknowledge collaboration with the GP computer system suppliers (EMIS, InPractice Systems and TPP), Apollo systems, Public Health England and our other funders and collaborators.
Contributors WH led the analysis and drafting of the paper. AM and AC supported the statistical analysis of the paper. SdL led the development of the cohort, conceived the idea and contributed to the drafting of the paper. RC, TSH, PS and FF also contributed to the preparation and reviewing of the manuscript.
Funding This study was funded internally by the Department of Clinical and Experimental Medicine at the University of Surrey.
Competing interests None declared.
Patient consent Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement The RCGP RSC data set can be accessed by researchers; approval is on a project-by-project basis (www.rcgp.org.uk/rsc). Ethical approval by an NHS Research Ethics Committee is needed before any data release/other appropriate approval. Researchers wishing to directly analyse the patient-level pseudonymised data will be required to complete information governance training and work on the data from the secure servers at the University of Surrey. Patient-level data cannot be taken out of the secure network. We encourage interested researchers to attend the short courses on how to analyse primary care data/RCGP RSC data offered twice a year.