Objectives To investigate the relationship between performance on the UK Quality and Outcomes Framework pay-for-performance scheme and choice of clinical computer system.
Design Retrospective longitudinal study.
Setting Data for 2007–2008 to 2010–2011, extracted from the clinical computer systems of general practices in England.
Participants All English practices participating in the pay-for-performance scheme: average 8257 each year, covering over 99% of the English population registered with a general practice.
Main outcome measures Levels of achievement on 62 quality-of-care indicators, measured as: reported achievement (levels of care after excluding inappropriate patients); population achievement (levels of care for all patients with the relevant condition) and percentage of available quality points attained. Multilevel mixed effects multiple linear regression models were used to identify population, practice and clinical computing system predictors of achievement.
Results Seven clinical computer systems were consistently active in the study period, collectively holding approximately 99% of the market share. Of all population and practice characteristics assessed, choice of clinical computing system was the strongest predictor of performance across all three outcome measures. Differences between systems were greatest for intermediate outcomes indicators (eg, control of cholesterol levels).
Conclusions Under the UK's pay-for-performance scheme, differences in practice performance were associated with the choice of clinical computing system. This raises the question of whether particular system characteristics facilitate higher quality of care, better data recording or both. Inconsistencies across systems need to be understood and addressed, and researchers need to be cautious when generalising findings from samples of providers using a single computing system.
Practice and patient-level characteristics are known predictors of quality of care, as measured by the Quality and Outcomes Framework (QOF) indicators.
Various general practitioner (GP) clinical computer systems are used in the UK but their distribution over time and location is unknown.
GP clinical computer systems differ in software architecture, user interface and clinical coding lists but their effect on quality of care has never been examined.
Seven systems were found to hold 99% of the market share, with clear geographical variation in their distribution.
Levels of performance on the QOF differed to a small extent across clinical computer systems, even after controlling for practice and patient characteristics. Quantified differences were small but not negligible since they translate to systematic variation in recorded care for hundreds of thousands of patients nationwide.
Researchers that utilise primary care databases, which collect data from a single clinical system, need to be cautious when generalising their findings to all English practices.
Strengths and limitations of this study
This is the first study that investigates the effect of GP clinical computer system choice on measured quality of care.
We used data for over 99% of all English practices and there is no risk of inductive fallacy.
There are more aspects to quality of care than what is recorded under the QOF. This is an observational study, so causality is difficult to establish, and it is possible that QOF-oriented practices have particular clinical system preferences.
Clinical computing systems have been promoted as a means to improve the quality of healthcare, holding advantages over paper-based systems of improved data recording, integration and accessibility. These advantages, combined with automated feedback and alerts, have the potential to drive improvements in efficiency, process performance, clinical decision-making and medication safety.1 ,2 The potential is greatest in primary care settings, where the activities of different providers for multiple conditions must be coordinated in order to optimise care for patients and minimise wastage of resources. In practice the results of implementing information technology systems have been mixed,3–9 and the full promise of computer-supported healthcare has yet to be realised. This is partly attributable to the variable quality of software systems developed by multiple providers, and partly to clinicians adapting at different rates to systems that frequently do not address their specific requirements and which may challenge their approach to existing practice.
In UK primary care, health information technology developed from multiple, often amateur, systems in the 1980s into a handful of systems using a standard coding thesaurus, known as Read codes, by the 2000s.10 From 1998, family practices were partially subsidised for the costs of installing clinical computing systems as part of a wider government programme to develop electronic patient records.11 Full subsidies were provided from 2003 in preparation for the implementation of a national pay-for-performance scheme—the Quality and Outcomes Framework (QOF)—the following year. The QOF provides large financial bonuses to practices based on their achievement on over 100 quality of care indicators, mostly relating to processes of care for common chronic conditions.12 Under the QOF, practices are awarded points for each quality indicator based on the proportion of patients for whom targets are achieved, between a lower achievement threshold of 40% and an upper threshold that varies by indicator from 50% to 90%. Each point scored earns the practice £126, adjusted for the relative prevalence of the disease and the size of the practice population. Practices can exclude (‘exception report’) inappropriate patients from achievement calculations for logistical reasons (eg, recent registration with the practice), clinical reasons (eg, a contraindication to treatment) or for patient-informed dissent. Performance data are drawn automatically from practices’ clinical computing systems—which must conform to standard interoperability requirements—and collated on a national database: the Quality Management and Analysis System (QMAS). This system provides feedback to practices and is also used to calculate quality payments.
The QOF has had a substantial impact on the use of clinical computing systems by practices, and this in turn impacts on relationships with patients both within and beyond consultations.13 Practices are required to keep disease registers, and because bonus payments increase with disease prevalence there is an incentive to case-find. The business rules for the QOF also specify criteria and permissible Read codes for identifying patients with particular conditions, resulting in greater uniformity of code usage and in some cases changing diagnostic behaviour (eg, making clinicians more reluctant to record depression14). Software providers have adapted their systems to facilitate better practice performance on the QOF (usually defined in terms of practice remuneration rather than high achievement rates per se) by incorporating pop-up alerts and management tools in the software and providing QOF-oriented training programmes for practice staff. There is no independent evidence to date, however, on whether practice performance on the QOF and recorded quality of care is associated with the practice's choice of clinical computing system. We used a unique dataset to assess these relationships in English family practices operating under the national pay-for-performance scheme.
We carried out a retrospective study of performance on the QOF by English family practices from 2007–2008 to 2010–2011, identifying practice predictors including choice of clinical computing system (only one in use within each practice) through multilevel multiple linear regression models. English practices are geographically organised in 151 Primary Care Trusts (PCTs: commissioning bodies that oversee practices operating in a locality) and those in turn into 10 Strategic Health Authorities (SHAs: responsible for fiscal policy at a regional level).
We used data from the QMAS, the national information system supporting the QOF, which holds data for almost all English practices (over 99% of registered patients in England attend participating practices). QMAS data on QOF achievement, exception reporting and prevalence rates are freely available on the Health and Social Care Information Centre (HSCIC) website15 but information on clinical computing systems is not publicly reported and we obtained the relevant dataset for this study from the HSCIC. Data on practice characteristics and the populations they serve were obtained from the General Medical Services (GMS) Statistics database, also provided by the HSCIC. Area deprivation, as measured by the Index of Multiple Deprivation (IMD),16 and urban/non-urban classification,17 at the lower super output area level (a set of geographical areas of roughly consistent size with a population of around 1500), were obtained from the Communities and Local Government18 and the Office of National Statistics19 websites, respectively. Lookup tables from the UKBORDERS website20 were used to assign the measures to practices at the postcode level. Data were complete for all practices.
Clinical computing systems
Over the study period, 15 clinical computing systems from eight suppliers were active within the QOF scheme. Practices were assigned to a group on the basis of the clinical computing system in use at each year end (March; see table 1). Computing systems with fewer than 100 users at any time point were excluded from the analyses. Seven systems from five software suppliers were thus included in the analyses, accounting for around 99% of the practices participating in the scheme.
We measured practice performance on the clinical quality indicators of the QOF, using non-register clinical indicators that were continually incentivised over the study period (see online supplementary table A6). Three performance measures were used: (1) reported achievement (RA), the proportion of eligible patients for whom the targets were achieved, not including exception reported patients; (2) the percentage of QOF points scored (PQ), the metric on which remuneration is based and (3) population achievement (PA), measured by the proportion of eligible patients for whom targets were actually achieved, including exception reported patients. We argue that RA and PA are proxies of quality of care and percentage of points scored is a measure of practice benefit from the scheme, although all three outcomes are very strongly correlated.
For practice $i$ and indicator $j$, RA can be defined as

$$RA_{ij} = \frac{N_{ij}}{D_{ij}} \qquad (1)$$

where $N_{ij}$ is the number of patients for whom the target was achieved and $D_{ij}$ the number of patients meeting the criteria for the indicator who have not been exception reported by the practice. The exception reporting provision is intended to protect patients from inappropriate care and it can be used to exclude patients for a variety of reasons, including logistical, clinical and patient dissent.21 The RA rate is the most commonly used measure of practice performance under the QOF scheme, as in theory it focuses on patients for whom the quality targets are appropriate.
Percentage of QOF points scored
For practice $i$ and indicator $j$, the number of points scored is based on the RA rate $RA_{ij}$:

$$P_{ij} = A_j \cdot \min\left\{1,\ \max\left\{0,\ \frac{RA_{ij} - L_j}{U_j - L_j}\right\}\right\} \qquad (2)$$

where, for each indicator $j$, $A_j$ is the number of available points (ranging from 1 to 57), $L_j$ the lower threshold (set at 25% for two indicators and 40% for all others) and $U_j$ the upper threshold (ranging from 50% to 90%). For a given indicator, a practice will secure 0 points if RA is less than or equal to the lower achievement threshold and will secure maximum points if RA equals or exceeds the upper achievement threshold. For RA rates between the thresholds, the number of points scored is calculated linearly using (2).22 The percentage of points scored (PQ) is obtained by dividing the overall points scored by the relevant number of available points.
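The scoring rule described above can be sketched in a few lines. This is an illustrative sketch only, not the QMAS implementation: the function name `qof_points` and the example indicator (10 available points, thresholds of 40% and 90%) are hypothetical.

```python
def qof_points(ra, available_points, lower, upper):
    """Points scored on one indicator for a given reported achievement rate.

    ra, lower and upper are proportions in [0, 1].
    """
    if ra <= lower:
        # At or below the lower threshold: no points.
        return 0.0
    if ra >= upper:
        # At or above the upper threshold: all available points.
        return float(available_points)
    # Between the thresholds: points rise linearly with achievement.
    return available_points * (ra - lower) / (upper - lower)


# Hypothetical indicator: 10 available points, thresholds of 40% and 90%.
for ra in (0.35, 0.65, 0.95):
    print(ra, qof_points(ra, available_points=10, lower=0.40, upper=0.90))
```

Because points are capped at the upper threshold, any achievement above it earns no additional points.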
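The existence of upper achievement thresholds and the provision for practices to exception report patients means that maximum points can be scored (and maximum remuneration secured) without achieving the targets for all patients. Practices with markedly different achievement rates can therefore appear similar when assessed using the percentage of points scored. We therefore calculated the PA rate23 for practice $i$ and indicator $j$ as:

$$PA_{ij} = \frac{N_{ij}}{D_{ij} + E_{ij}} \qquad (3)$$

where $N_{ij}$ is the number of patients for whom the target was achieved, $D_{ij}$ the number of eligible patients who were not exception reported and $E_{ij}$ the number of patients exception reported.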
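A small worked example may help to show why reported and population achievement can diverge. The sketch below uses two hypothetical practices with invented counts; the function names are illustrative and not taken from the study.

```python
def reported_achievement(achieved, eligible):
    """RA: achieved / eligible patients, exception reported patients excluded."""
    return achieved / eligible


def population_achievement(achieved, eligible, exceptions):
    """PA: achieved / all patients, exception reported patients included."""
    return achieved / (eligible + exceptions)


# Two hypothetical practices that meet the target for 90 patients each.
# Practice B additionally exception reports 20 patients.
practice_a = {"achieved": 90, "eligible": 100, "exceptions": 0}
practice_b = {"achieved": 90, "eligible": 100, "exceptions": 20}

for name, p in (("A", practice_a), ("B", practice_b)):
    ra = reported_achievement(p["achieved"], p["eligible"])
    pa = population_achievement(p["achieved"], p["eligible"], p["exceptions"])
    print(name, ra, pa)
```

Both practices have the same RA and would score the same points, but practice B has a lower PA because its exception reported patients count in the denominator.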
Across all three outcome measures, practice composite scores were calculated: (1) overall, across all 62 clinical indicators and (2) by three categories of activity: measurement activities, 35 indicators; treatment activities, 11 indicators and intermediate outcomes, 16 indicators (see online supplementary table A6). For RA and PA, scores were calculated by summing the numbers of patients for whom targets were achieved ($N_{ij}$) across all relevant indicators and dividing by the sum of eligible patients (patients not exception reported, $D_{ij}$, for RA; those patients plus exception reported patients, $D_{ij} + E_{ij}$, for PA). Patients eligible for multiple indicators are double counted in the composite rates, and these rates therefore represent the proportion of ‘opportunities’ for a practice to perform an incentivised activity that resulted in a success. For percentage of achieved QOF points, we used the sum of points scored ($P_{ij}$) across the relevant indicators, overall and by indicator group, divided by the respective number of available points (see online supplementary appendix for details of indicators in each category).
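The composite calculation can be illustrated with a short sketch. It assumes each indicator is represented by a dictionary with hypothetical keys (`achieved`, `eligible`, `exceptions`, `points`, `available`); the counts below are invented for illustration and are not study data.

```python
def composite_scores(indicators):
    """Composite RA, PA and percentage of points across a set of indicators.

    Numerators and denominators are summed across indicators before dividing,
    so each patient-indicator pair counts as one 'opportunity'.
    """
    achieved = sum(i["achieved"] for i in indicators)
    eligible = sum(i["eligible"] for i in indicators)
    exceptions = sum(i["exceptions"] for i in indicators)
    points = sum(i["points"] for i in indicators)
    available = sum(i["available"] for i in indicators)
    return {
        "RA": achieved / eligible,
        "PA": achieved / (eligible + exceptions),
        "PQ": 100 * points / available,
    }


# Hypothetical practice with two indicators.
example = [
    {"achieved": 80, "eligible": 100, "exceptions": 10, "points": 8, "available": 10},
    {"achieved": 45, "eligible": 50, "exceptions": 0, "points": 5, "available": 5},
]
print(composite_scores(example))
```

Summing before dividing weights each indicator by its number of eligible patients, which is not the same as averaging the per-indicator rates.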
We used multilevel mixed effects multiple linear regression models to identify population, practice and clinical computing system predictors of RA and PA rates and percentage of QOF points scored. For each outcome type, two regression models were executed, with a different summary measure used as outcome in each. In the simpler model (model 1), we used the overall summary measure across all 62 indicators, while the more complex model (model 2) included summary measures by indicator group (measurement, treatment and intermediate outcome), with their effect modelled as fixed. Using model 1 we were able to estimate the effect of each clinical system on overall achievement and percentage of points scored, and to assess whether there were significant differences across clinical systems by using omnibus postregression χ2 tests. In model 2, we included interaction terms (clinical system by indicator group) to estimate the effect of each clinical system in each indicator group and assessed whether there were significant differences across clinical systems, in each of the indicator groups, through postregression tests. All analyses were controlled for year, practice list size, local area deprivation, rurality, type of general practitioner (GP) contract (GMS, Personal Medical Services, Alternative Provider Medical Services or Primary Care Trust Medical Services), percentage of female patients, percentage of patients aged 65 or over, mean GP age in practice, percentage of female GPs, percentage of UK-qualified GPs and percentage of GP providers (partners, single-handed or shareholders). Practices that switched to a different system were included in both models but we excluded a few practices with fewer than 1000 patients since they were unrepresentative and would contribute data only partially (approximately 97 practices, with 0.001% of the patients).
Linear predictions, and their 95% CIs, were calculated for each clinical system from the regression models, overall (model 1) and by indicator group (model 2). These can be described as mean predicted outcome levels for each factor (clinical system and indicator group by clinical system), adjusted for all model covariates with those covariates set to their mean values, thus allowing a ‘fairer’ comparison of the systems (table 2). We also performed pairwise comparisons between clinical system performance predictions, for all outcomes, with CIs adjusted using Scheffé's method for multiple comparisons.24 The results from these comparisons are presented in the online appendix (see online supplementary tables A3–5).
The structure of the data was three-level, with practice outcomes nested within PCTs and PCTs nested within SHAs. To account for this structure and to model variability at each level, we used mixed effects models with the xtmixed command in Stata. Owing to computational limitations, we modelled both levels as random effects. A potential complicating factor was the distributions of the outcomes, which were extremely skew-normal in some cases (eg, percentage of points achieved). Although linear regression models are robust against deviations from normality25 we obtained bootstrap estimates of 1000 repetitions for the SE across all models, an approach that does not make any distributional assumptions about the observed data.26 Differences in the SEs between bootstrapped and standard regression models were small and we only report the former. All statistical comparisons were made at an α level of 5%. Stata V.12.1 software was used for all analyses.27
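The idea behind a bootstrap standard error can be sketched as follows. This is a deliberately simplified illustration, not the analysis reported here: it uses a single-predictor ordinary least squares slope rather than a multilevel model, resamples individual observations rather than clusters, and runs on synthetic data. All names and values are hypothetical.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Slope of a simple least squares regression of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_se(xs, ys, reps=1000, seed=0):
    """Nonparametric bootstrap SE of the slope: resample pairs with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # skip degenerate resamples with no variation in x
        slopes.append(ols_slope(bx, by))
    # The SD of the resampled slopes estimates the SE of the slope.
    return statistics.stdev(slopes)


# Synthetic, skewed data: exponential noise around a linear trend.
rng = random.Random(42)
xs = [float(i) for i in range(60)]
ys = [0.5 * x + rng.expovariate(1.0) for x in xs]
print(ols_slope(xs, ys), bootstrap_se(xs, ys))
```

The resampling step makes no assumption about the shape of the outcome distribution, which is the property that motivates its use with skewed outcomes.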
Overall average PCT performance on RA, PA and percentage of QOF points scored are mapped in figure 1. Performance at the PCT level by type of indicator is given in the online appendix (see online supplementary figures A1–3).
Over the study period (2007–2008 to 2010–2011) 15 clinical computing systems from eight providers were in use in English family practices, but seven systems collectively accounted for approximately 99% of the market (table 1). LV by EMIS was the most widely used system, although its use declined over time (from 46% in 2007–2008 to 39.9% in 2010–2011). Vision 3 by In Practice Systems was the second most popular choice, with a relatively stable proportion of the market (around 19%). The third most popular choice was PCS by EMIS with a stable market share of around 15%, although it has been superseded by EMISWeb and is no longer marketed. The number of practices using ProdSysOneX by TPP more than doubled in the study period, from 697 in 2007–2008 (8.4%) to 1466 in 2010–2011 (17.8%). Synergy and Premiere from iSoft had a combined market share of 9.5% in 2007–2008 but declined to 5.5% in 2010–2011. Practice Manager by Microtest was used by approximately 2% of English practices throughout the time period. EMISWeb and GV by EMIS, HealthyV5 and Crosscare by Healthy Software, Seetec GP Enterprise by Seetec, Ganymede and System 6000 by iSoft and Exeter GP System by Protechnic Exeter were used in the remaining 1% of practices.
Variation in practice characteristics by system
Practice characteristics and performance by system are presented in table 3. In 2010–2011 Practice Manager, Vision 3 and ProdSysOneX practices had the highest RA scores, with Vision 3 practices scoring higher for measurement indicators, ProdSysOneX and Practice Manager practices for treatment indicators and Vision 3 and Practice Manager practices for outcome indicators. For PA, overall Synergy practices were collectively the highest performers, with LV and Practice Manager practices achieving similar levels of performance on the treatment and outcome domains, respectively. Synergy practices also returned the highest percentage of points per practice, on average, for measurement and treatment indicators, closely followed in these domains by Premiere practices, which, along with Practice Manager practices, scored the highest for outcomes.
Practice and patient characteristics varied by system. For example, average list sizes were highest for Synergy practices and lowest for PCS practices, and average area deprivation was lowest for Premiere practices and highest for PCS practices. There was also clear geographical variation in system distribution, for example, the LV system was used by 24.8% of practices in London and 50.5% of practices in the South East.
Variation in practice performance
There was little change in RA, percentage of points scored or PA over time, with PA being higher by 0.42% (95% CI 0.32% to 0.52%) in 2008–2009 and 0.32% (95% CI 0.22% to 0.42%) in 2010–2011, compared with 2007–2008 (see online supplementary table A1). Most practice and patient characteristics had significant but small effects on performance, and these effects often varied by outcome (see online supplementary table A1). For example, for every additional 1000 patients on the practice list RA decreased by 0.11% (95% CI 0.10% to 0.12%), PA decreased by 0.14% (95% CI 0.13% to 0.15%), and percentage of points scored increased by 0.08 (95% CI 0.07 to 0.10). Practices located in more deprived areas performed worse across all three outcomes: the change per 1 point increase on the IMD scale was −0.01 (95% CI −0.012 to −0.007) for RA, −0.02 (95% CI −0.022 to −0.016) for PA and −0.006 (95% CI −0.01 to −0.002) for percentage of points scored. Over the range of area deprivation this is equivalent to 0.8% higher RA, 1.6% higher PA and 0.48% higher percentage of points scored in the most affluent compared with the most deprived areas. Rural practices scored lower on RA and points but not on PA. Practices with a higher proportion of female patients performed better on all three outcomes.
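The conversion from per-point deprivation coefficients to a gap between the most affluent and most deprived areas is simple multiplication. The sketch below assumes a deprivation range of 80 IMD points; that value is an assumption inferred from the reported figures (for example, 0.8/0.01 = 80) and is not stated in the text.

```python
# Assumed span of IMD scores between the most affluent and most deprived
# areas. Inferred from the reported figures, not stated in the text.
IMD_RANGE = 80

# Reported change in each outcome per 1 point increase on the IMD scale.
coef_per_imd_point = {"RA": -0.01, "PA": -0.02, "PQ": -0.006}


def affluent_vs_deprived_gap(coef, imd_range=IMD_RANGE):
    """Absolute difference in outcome between the two ends of the IMD range."""
    return abs(coef) * imd_range


gaps = {name: affluent_vs_deprived_gap(c) for name, c in coef_per_imd_point.items()}
print(gaps)
```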
In the regression models, overall performance (across all 62 clinical indicators) differed significantly by clinical system used, for all three outcomes (model 1: table 2 and see online supplementary table A1). The systems with the best performing practices were Vision 3 for RA and Synergy for PA and percentage of points scored. The system with the worst performing practices was PCS, across all three measures. The clinical system rankings for each outcome, based on predictions from the linear regression models, are presented in table 2 and pairwise comparisons in the online appendix (see online supplementary tables A3–5).
Relative monetary gains by clinical system, based on the predictions of points achieved, are displayed in figure 2. Compared with PCS practices (the worst performing on points achieved), practices using Synergy were predicted to gain, on average, an additional £602/year.
Performance by type of activity varied significantly across clinical systems, for all three outcomes (model 2: table 2 and see online supplementary table A2). In the measurement domain, as in overall performance, the systems with the best performing practices were Vision 3 for RA and Synergy for PA and percentage of points scored; the systems with the worst performing practices were PCS, for RA and PA, and Practice Manager for percentage of points scored. In the treatment domain, the systems with the best performing practices were Vision 3 for RA, LV for PA and Premiere for points scored; the systems with the worst performing practices were PCS for RA and Practice Manager for PA and percentage of points scored. Finally, in the outcome domain, the systems with the best performing practices were Practice Manager for RA, Synergy for PA and Vision 3 for points scored; the worst performing system was PCS, across all three measures. The clinical system rankings for each outcome and indicator group are presented in table 2. Pairwise comparisons between systems are presented in online supplementary tables A3–5.
The UK's QOF was developed to reward high-quality primary care for a range of chronic conditions, with the ultimate aim of improving patient outcomes. The framework relies heavily on an infrastructure of clinical computing systems to: register and classify eligible patients; remind practice staff of the quality indicators; monitor progress towards the targets and to notify the payer of practice achievement. Subject to meeting interoperability criteria, various commercial providers were permitted to provide this infrastructure, and multiple clinical computer systems were developed with different user interfaces, mechanisms and even variations in clinical coding lists.28 We found that practice performance on the quality indicators contained within the QOF varied significantly across clinical computer systems. This variation persisted when we controlled for a range of practice and patient characteristics, raising the possibility that differences in practice performance under the QOF may be partly attributable to architectural differences between the software systems they use.
Strengths and limitations of the study
The study uses data for all English practices participating in the QOF, covering over 99% of the population registered with Primary Care services and there is therefore no inductive fallacy risk.
However, there are several limitations to the study. First, the QOF was introduced in 2004–2005 but clinical systems information was only available from 2007–2008 onwards. Our findings might have been different in previous years, especially given the greater variation in practice performance in the early years of the incentivisation scheme.23 Second, we did not investigate the mechanisms behind the clinical system variation, for example by focusing on particular indicators for which variation is greater and the recording process might differ across systems. Third, we report quality of recorded care and there may be differences with care actually delivered. However, improvement in measurement is a necessary pre-requisite for improved quality of care. Fourth, reported quality of care, as measured by the QOF indicators, represents only a fraction of the care provided by a practice, and the effect of clinical computing systems on the quality of care for other (non-incentivised) activities might be different. Fifth, causality is difficult to establish in observational studies, and it is possible that more QOF-oriented practices have particular clinical system preferences. Sixth, even trivial effects can be found to be statistically significant when analysing very large populations and we have therefore focused our discussion on the effects and their interpretation rather than CIs and p values. Seventh, following the incentivisation scheme, we used indicators that were not independent since some would apply to the same patient. Eighth, timing of computing system introduction to the market might be an important predictor of performance, but we did not include this as a covariate to avoid ‘filtering out’ the system effect we aimed to measure (ie, we do not wish to control the analyses for system characteristics). Finally, the relationship between clinical computing system and recorded quality of care is confounded by location, since system distribution varied by region. However, our analyses are controlled for two levels of geographic classification.
Our study analysed years 4–7 of the scheme, by which time variation in performance between practices was much less than in the first year.29 Nevertheless, as in previous studies we found that several practice and population characteristics—such as list size, local area deprivation and rural location—were associated with performance on the QOF,23 ,30 ,31 although these effects were small.
Larger practices performed worse on RA and PA and better on point scoring, a finding that might indicate that larger practices have processes in place that enable them to maximise their QOF returns despite having slightly lower levels of achievement, compared with smaller practices. The better point scoring of larger practices has been identified in the past but attributed to better performance in non-clinical aspects of the QOF.32 The inequality gap between practices in deprived and affluent areas has been found to be diminishing over time, with the former catching up with the latter,23 although there is evidence that the gap had been exaggerated with the introduction of the financial scheme, since more affluent practices were quicker to adapt and maximise their QOF performance.33 In agreement with previous work, we found small practice location deprivation effects across the three outcome measures. The gap between affluent and deprived practices was wider for PA than for RA, which is not surprising: comorbidity levels are higher34 and education levels are lower in more deprived areas (education deprivation is one of the seven domains of the IMD score), potentially leading to higher exception rates due to contraindication or informed dissent. In the first year of the scheme, practices remote from urban centres were found to be more likely to score lower on points and RA but not PA,35 and our findings are in agreement. The slightly better performance of practices serving a larger percentage of female patients might indicate that males are less receptive or cooperative in consultations, although previous work identified slightly higher levels of diabetes care for male patients.33
Practice characteristics also varied by computing system, for example: practices using Synergy tended to be larger than average, and practices in more deprived areas favoured PCS. The distribution of computing system usage varied by region, which may be the result of market penetration, ‘critical mass’ effects (with clinicians becoming familiar with particular systems as trainees and continuing to use them throughout their careers), or the influence of PCTs.
The differences in performance between groups of practices using different clinical computing systems were also small in absolute terms: for example, the modelled difference between the best and worst performing systems for overall RA was 1.4%. However, the association between clinical computing system and QOF performance was stronger than for any other patient or practice characteristic, including list size, proportion of patients over the age of 65 and local area deprivation. The differences between computing systems were greatest for intermediate outcomes indicators (such as control of blood pressure), with a gap between the best and worst performing systems of 2.7% for RA.
These differences are not negligible, especially in light of the small overall variation in practice performance and further convergence in performance over time,23 ,36 and the diminishing variation in recorded care between population groups.33 In addition, differences become substantial at the population level. For example, a 1% difference in achievement of blood pressure control targets for hypertensive patients equates to 9 patients in the average practice and over 71 000 patients nationally.
On the other hand, remuneration differences across systems are very small, considering the average practice is awarded around £120 000/annum through the scheme. This discrepancy between remuneration equity and performance can be explained by the very high levels of performance which, for most practices, surpass the upper thresholds over which no further monetary gains are obtained.37 ,38
Overall, the best average performance across the three outcome measures was achieved by practices using Vision 3, Synergy or Premiere systems (with one exception on RA, where practices with Practice Manager ranked second). It is notable that two of these systems (Synergy and Premiere) were installed in a diminishing minority of practices (falling from 9.5% to 5.5% over the study period) and are now likely to be withdrawn from the market.39
There are several reasons why particular clinical computing systems might facilitate better performance on quality schemes such as the QOF: usability and intuitiveness, use of alerts and notifications, dismissability of reminders, support and training, and adaptability. In particular, QOF recording relies heavily on the use of data entry templates and it is plausible that the design and usability of these may have an impact. However, the interactions between the users and developers of software systems are likely to be complex and to impact on quality of care in unpredictable ways. For example, early adopters may be more likely to be oriented towards computerised medicine and to have developed familiarity with IT systems, but may also adhere to outdated systems and heuristics, either due to habituation or lack of resources for reinvestment. System differences are also likely to vary by activity, given the different workflows and data linkage processes involved (eg, referrals to specialists as compared with obtaining laboratory results). This is an under-researched area, and the mechanisms underlying the differences in outcomes for different systems need to be examined in greater detail. It is also necessary to examine whether adapting clinical computing systems to support a pay-for-performance scheme impacts on non-incentivised aspects of care, either through neglect of other elements of software development or by reinforcing particular behaviours in clinicians.
In the UK, performance on the QOF, the world's largest health-related pay-for-performance scheme, is partly dependent on the clinical computing system used by practices. This raises the question of whether particular characteristics of computing systems facilitate higher quality of care, better data recording or both. This question is of interest to clinicians and to policy makers, for whom this work highlights an inconsistency across clinical computer systems which needs to be understood and addressed. For health services researchers, our findings identify an important variable to include in future studies of clinical performance, and an additional factor to consider when generalising findings from samples of providers based on a single clinical computing system. These cautionary messages are also relevant for other international healthcare systems, particularly those with multiple software providers or without stringent interoperability standards.
We would like to thank the Health and Social Care Information Centre analysts for sharing the clinical system data with us and Dr Mark Ashworth and the two reviewers for their insightful observations that improved the manuscript.
Contributors EK and TD designed the study. EK extracted the data, and performed the statistical analyses. EK and TD wrote the manuscript. IB, DR and KC edited the manuscript. EK is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. All authors have read and approved the final manuscript.
Funding This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests EK was partly supported by an NIHR School for Primary Care Research fellowship in primary healthcare; TD was supported by an NIHR Career Development Fellowship. The views expressed are those of the authors and not necessarily those of the NHS, the National Institute for Health Research or the Department of Health. No other relationships or activities that could appear to have influenced the submitted work.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Most of the data we used are freely available and we have provided references and links in the manuscript. However, the clinical systems and practice characteristics information (GMS) cannot be shared due to licencing restrictions.