Objectives To test the performance of new variants of models to identify people at risk of an emergency hospital admission. We compared (1) the impact of using alternative data sources (hospital inpatient, A&E, outpatient and general practitioner (GP) electronic medical records) (2) the effects of local calibration on the performance of the models and (3) the choice of population denominators.
Design Multivariate logistic regressions using person-level data adding each data set sequentially to test value of additional variables and denominators.
Setting 5 Primary Care Trusts within England.
Participants 1 836 099 people aged 18–95 registered with GPs on 31 July 2009.
Main outcome measures Models to predict hospital admission and readmission were compared in terms of the positive predictive value and sensitivity for various risk strata and with the receiver operating characteristic curve C statistic.
Results The addition of each data set showed moderate improvement in the number of patients identified with little or no loss of positive predictive value. However, even with inclusion of GP electronic medical record information, the algorithms identified only a small number of patients with no emergency hospital admissions in the previous 2 years. The model pooled across all sites performed almost as well as the models calibrated to local data from just one site. Using population denominators from GP registers led to better case finding.
Conclusions These models provide a basis for wider application in the National Health Service. Each of the models examined produces reasonably robust performance and offers some predictive value. The addition of more complex data adds some value, but we were unable to conclude that pooled models performed less well than those in individual sites. Choices about models should be linked to the intervention design. Characteristics of patients identified by the algorithms provide useful information in the design/costing of intervention strategies to improve care coordination/outcomes for these patients.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/
Statistical models that predict the risk of hospital admission are increasingly used to prioritise patients for preventive care. Models exist in several different forms and use a variety of input data sets.
This paper compared the performance of a variety of models built using different data sets.
The addition of more detailed data sets led to moderate improvement in the number of patients identified with little or no loss of positive predictive value.
The use of general practitioner registry data for the denominator proved to be of significant importance. By including all patients in an area, not just those with prior hospital use, improved rates of case finding were observed.
Models calibrated to local data sets did not show consistent improvement over models built on pooled data.
Strengths and limitations of this study
The analysis is based on populations from only five areas in England; however, this is one of the largest UK populations (1.8 million people) used in the development of a publicly available risk tool.
The success of a predictive model depends on many factors beyond the statistical performance of the model.
There remains continuing interest in identifying patients at risk of future hospital admissions. Policies providing penalties1 or non-payment2 for hospital readmissions that put providers at risk for a share of total health expenditures have been developed in the USA and England. These create even stronger incentives to identify high-risk patients to target care coordination and management strategies that may potentially reduce future inpatient expenditures.
Most predictive modelling approaches have used administrative data from claims in the USA or hospital data from hospital episode statistics or secondary uses services (SUS) in England. These data provide the information on prior utilisation and diagnostic history to develop predictive models for patients at risk of future hospitalisation.3 Payor claims data in the USA provide rich information on care provided in hospitals, home care services and nursing home use, as well as detailed pharmacy prescription history.4 In England, the most commonly used models (such as in the now outdated patient at risk of readmission (PARR) algorithm)5 ,6 are based on hospital admissions data (including day case use and regular attendances) with some use of accident and emergency (A&E) and outpatient attendance data as well.7
While some predictive modelling efforts in the UK have included information from general practitioner (GP) electronic medical records (EMRs),8 ,9 using such data presents a number of challenges. These include obtaining permissions for access to EMRs for large populations, linking the records to hospital data and use of Read codes10 to develop GP variables. Data from EMRs include additional elements not available in the hospital data sets such as test results (eg, blood pressure, glycated haemoglobin (HbA1c) levels), diagnostic history for patients without recent inpatient admissions, prescription history, GP contact patterns (GP visits and telephone contacts) and other personal health markers (eg, body mass index, smoking status). These additional data elements have the potential to add power to predictive modelling efforts, especially for patients with no or lower levels of recent inpatient use.
Despite the challenges, a number of initiatives around the UK are demonstrating population-wide access to EMR data. A common application is in the use of models that assist local clinical commissioning groups in identifying high-risk patients.
Though the choice of data sets for a predictive model makes a big difference to the investment required to run these models, at least initially, no studies have looked at the marginal value of different data sets. In this analysis, we examine the added value of including data on A&E and outpatient visits (which are readily available) to predictive modelling efforts using hospital inpatient data alone. We also assess the marginal effect of adding GP EMR information to help identify patients at risk of future hospital admissions. Most of the existing models in use were developed using logistic regression techniques and we used this standard approach throughout this paper. We recognise that different modelling methods may yield different results, but in this analysis we were concerned with the impact of changes in the underlying data sets. Such models will always be limited by the scope and quality of data available, the ways data are grouped and classified and the ways that users can access up-to-date information. Despite these problems, these models have become commonly used tools. In addition to the depth of data used, there is also a question about how generalisable models are across different sites. In many settings, models are recalibrated on local data sets. Yet there is little systematic analysis of the value that this step adds and whether models built on data from one site outperform those built on a larger sample of pooled data. We therefore explore whether there is a need for development of individual site predictive models, or whether models developed from multiple sites can be applied effectively at a new individual site.
We conducted analyses separately for five Primary Care Trust (PCT) areas in England (Newham, Cornwall, Kent, Croydon, Redbridge; total adult population ranging from 209 661 to 693 089). Results are reported for the individual sites and as combined/pooled results (total population 1 836 099). Hospital data were extracted from the SUS11 system which contained records of all hospital events (inpatient admissions, outpatient appointments and A&E visits) for the PCTs’ registered populations between 1 August 2007 and 30 September 2010. The PCTs also extracted data from GP systems in two forms. First, as a register of the local adult population from 1 August 2007 to 31 July 2009, and second, in the form of data sets recording details of GP consultations over the same time period.
Personally identifiable information was stripped out before any data were passed to the research team. Individuals’ NHS numbers (the personal identifiers) were concatenated with a pass code chosen by each of the five PCT areas (and unknown to the research team), and these were pseudonymised at source using the secure hash algorithm SHA-256.12 This allowed for linkage between the hospital and the general practice data from each area, while preserving the individuals’ anonymity.
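The pseudonymisation step described above can be illustrated with a minimal Python sketch. It is not the code used by the PCTs: the order of concatenation, the text encoding, the function name and the example values are all assumptions made for illustration.

```python
import hashlib


def pseudonymise(nhs_number: str, pass_code: str) -> str:
    """Return the SHA-256 hex digest of an identifier joined to a site pass code."""
    # Assumption: identifier first, pass code second, UTF-8 encoded.
    combined = (nhs_number + pass_code).encode("utf-8")
    return hashlib.sha256(combined).hexdigest()


# Made-up identifier and pass codes, for illustration only.
hospital_key = pseudonymise("9999999999", "site-secret")
gp_key = pseudonymise("9999999999", "site-secret")
other_site_key = pseudonymise("9999999999", "another-secret")
```

Because the same inputs always give the same digest, hospital and GP records for one person within an area share a key and can be linked, while a different pass code gives an unrelated key.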
A series of variables were created from each data set that were believed to be potentially predictive of an unplanned (emergency) hospital admission in the last 12 months of the study period. These variables captured resource use, utilisation patterns, diagnostic history, test results and prescription history in the 2 years prior to the predictive period. They were created for all individuals aged 18+ years and registered with a GP in one of the five areas on 31 July 2009. To account for the expected time required to obtain and process the hospital and GP EMR data, we included a 2-month lag in our analyses, with data from 1 August 2007 to 31 July 2009 used to predict emergency admissions during the period 1 October 2009 to 30 September 2010.
Patient age and gender were obtained from the GP register. Patient area of residence was not available, and therefore GP practice attributed index of multiple deprivation (2007) was used as an area deprivation measure. The number of months the patient was registered with the PCT in the preperiod was calculated and included in the regression. Hospital inpatient data were used to capture utilisation in the 0–90, 91–180, 181–365 and 366–730 days prior to the lag period. The number of emergency and elective admissions for these periods was included and dichotomous variables for any day case or regular attendance use were created.
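The utilisation windows described above amount to a simple bucketing of admission dates. The following Python sketch is illustrative only and is not the authors' code: it assumes non-overlapping boundaries (0–90, 91–180, 181–365 and 366–730 days), takes the end of the preperiod as the reference date, and uses hypothetical names and dates.

```python
from datetime import date

# Assumed non-overlapping windows, in days before the reference date.
WINDOWS = [(0, 90), (91, 180), (181, 365), (366, 730)]


def count_admissions_by_window(admission_dates, reference_date):
    """Count admissions falling in each look-back window."""
    counts = {window: 0 for window in WINDOWS}
    for admitted_on in admission_dates:
        days_before = (reference_date - admitted_on).days
        for lower, upper in WINDOWS:
            if lower <= days_before <= upper:
                counts[(lower, upper)] += 1
                break  # windows do not overlap, so stop at the first match
    return counts


# Hypothetical patient; the last date falls after the reference date and is ignored.
reference = date(2009, 7, 31)
admissions = [date(2009, 7, 1), date(2009, 3, 1), date(2008, 1, 15), date(2010, 1, 1)]
counts = count_admissions_by_window(admissions, reference)
```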
A broad range of diagnostic variables were developed using primary and secondary diagnosis fields and a Charlson Comorbidity Index13 was calculated for each patient and included in the model.
A&E data were used to determine A&E visit rates for various intervals in the preperiod, both total visits and unplanned follow-up visits. A&E diagnostic information was not reliably reported across the five sites and was not included, although X-ray use was included. Outpatient data provided variables on outpatient visit rates for various intervals, as well as missed appointment rates and the number of different specialty types consulted. Diagnostic information in outpatient data was missing in more than 95% of cases and was not included.
GP EMR data were used to create proxy visit rates (these may include both actual GP visits, in addition to other events documented in a person's records) for various intervals and to capture any increase in visit rates at the end of the preperiod that may reflect increased morbidity in a patient. EMR Read codes (CTV3 version) were used to obtain test results (blood pressure, blood serum levels, HbA1c levels, etc), body mass index, smoking history, prescription history (number and type) and a range of diagnostic variables during the preperiod.
Variables from each data set (inpatient (including day case and regular attenders), A&E, outpatient and GP EMRs) were added and modelled sequentially using standard logistic regression in SPSS V.20. Emergency admission in the next 12 months was used as the dependent variable, producing a risk score ranging from 0 to 100. Separate models were developed for each PCT area, and analysis was limited to patients aged 18–95 who were on the GP register in the area. Over-fitting was tested using a split sample approach, with only minor differences observed in positive predictive values (PPV), sensitivity and specificity.
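As an illustration of how a logistic regression yields a score between 0 and 100, the sketch below converts a linear predictor into a probability and rescales it. This assumes the risk score is the predicted probability multiplied by 100, which the text does not state explicitly; the coefficients and variable names are invented and are not those in the supplementary appendices.

```python
import math


def risk_score(intercept, coefficients, features):
    """Return 100 x the logistic-model probability for one patient."""
    linear = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    probability = 1.0 / (1.0 + math.exp(-linear))
    return 100.0 * probability


# Invented coefficients and patient values, for illustration only.
example_coefficients = {"emergency_admissions_0_90": 0.8, "age_over_75": 0.5}
example_patient = {"emergency_admissions_0_90": 2, "age_over_75": 1}
score = risk_score(-3.0, example_coefficients, example_patient)
```

With these invented numbers the linear predictor is -0.9, giving a score of about 28.9.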
The findings provided here include individual site results and results combined across the five sites. We also created five additional predictive models (referred to below as the ‘four-site regression models’), each one combining data from four sites and applying coefficients to the fifth remaining site. With this, we could compare results with individual site predictive models to help assess the value of local model development.
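The structure of the five four-site regression models can be sketched as a leave-one-site-out loop. The Python fragment below only generates the training and held-out site combinations; model fitting is omitted, and the function name is hypothetical.

```python
SITES = ["Newham", "Cornwall", "Kent", "Croydon", "Redbridge"]


def four_site_splits(sites):
    """Yield (training_sites, held_out_site) pairs, one per site."""
    for held_out in sites:
        training = [site for site in sites if site != held_out]
        yield training, held_out


splits = list(four_site_splits(SITES))
```

Each pair would correspond to one model fitted on the four training sites, with its coefficients then applied to the held-out site.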
The full list of more than 300 potential variables was ultimately reduced to 88 by exclusion of variables with low volumes and low significance levels across the sites. The 88 variables ultimately included in the model (and regression coefficients) may be found in online supplementary appendices B and D, and a full listing of the variables considered for inclusion and detailed specification of each variable are available at http://www.nuffieldtrust.org.uk/.
Cost variables were examined, with secondary care activity costed according to the method used in development of the person-based formula for allocating commissioning funds to general practices in England.14 Ultimately, these were not included in the predictive models because of concerns about difficulties in constructing these variables by possible future users; however, costs are included in descriptive findings to help in the design of intervention strategies.
Predictive modelling performance is typically documented by reporting PPV and sensitivity at the risk score threshold of 50. However, because interventions may be targeted at patients with higher or lower risk scores, and intervention strategies may be calibrated differently depending on the risk level and characteristics of patients at various risk score levels, we report PPV and sensitivity at 20 risk score cut-off points (vigintiles) and provide detailed patient characteristics at risk score thresholds of 50 and 30 to facilitate intervention design.
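For readers less familiar with these measures, the following Python sketch shows how PPV and sensitivity might be computed at a given risk score threshold. The data are invented, and treating a score equal to the threshold as flagged is an assumption.

```python
def ppv_and_sensitivity(scores, admitted, threshold):
    """Return (PPV, sensitivity) when patients scoring >= threshold are flagged."""
    true_pos = false_pos = false_neg = 0
    for score, was_admitted in zip(scores, admitted):
        flagged = score >= threshold
        if flagged and was_admitted:
            true_pos += 1
        elif flagged and not was_admitted:
            false_pos += 1
        elif was_admitted:
            false_neg += 1
    flagged_total = true_pos + false_pos
    admitted_total = true_pos + false_neg
    ppv = true_pos / flagged_total if flagged_total else float("nan")
    sensitivity = true_pos / admitted_total if admitted_total else float("nan")
    return ppv, sensitivity


# Invented scores and outcomes for eight patients.
scores = [80, 62, 55, 41, 35, 28, 12, 5]
admitted = [True, True, False, True, False, False, True, False]
ppv50, sens50 = ppv_and_sensitivity(scores, admitted, 50)
ppv30, sens30 = ppv_and_sensitivity(scores, admitted, 30)
```

In this toy example, lowering the threshold from 50 to 30 raises sensitivity from 0.50 to 0.75 while PPV falls from 0.67 to 0.60, the same direction of trade-off discussed in the text.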
Pooled individual site results
There were 1 836 099 people aged 18 and over who were registered with a GP practice on 31 July 2009. Table 1 shows the combined results of individual site regressions including the number of patients correctly identified, PPV and sensitivity for four models:
IP based on hospital inpatient data only (including day cases and regular attendances);
IPAE using inpatient and A&E data;
IPAEOP using inpatient, A&E and outpatient data;
IPAEOPGP using inpatient, A&E, outpatient data and GP EMR.
At the traditional risk score threshold level of 50, all four models perform respectably in terms of PPV (ranging from 0.523 to 0.538), but sensitivity remains quite low across all models (0.049–0.060). Lowering the threshold to 30 increases sensitivity somewhat with a concomitant reduction in PPV (ranging from 0.417 to 0.422). The receiver operating characteristic area under the curve (C statistic) improved with the addition of each data set, increasing from 0.731 with the inpatient-only model to 0.780 with the full model.
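The C statistic can be read as the probability that a randomly chosen patient who was admitted received a higher risk score than a randomly chosen patient who was not. A minimal Python sketch of that pairwise definition follows; it is illustrative only, uses invented data and compares every pair, so it would be too slow for a population of the size analysed here.

```python
def c_statistic(scores, outcomes):
    """Share of (admitted, not admitted) pairs ranked correctly; ties count as half."""
    positives = [s for s, admitted in zip(scores, outcomes) if admitted]
    negatives = [s for s, admitted in zip(scores, outcomes) if not admitted]
    if not positives or not negatives:
        raise ValueError("need at least one admitted and one non-admitted patient")
    concordant = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                concordant += 1.0
            elif pos == neg:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Invented scores and outcomes for eight patients.
auc = c_statistic(
    [80, 62, 55, 41, 35, 28, 12, 5],
    [True, True, False, True, False, False, True, False],
)
```

A value of 0.5 corresponds to chance ranking and 1.0 to perfect discrimination.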
Of particular note is the finding that the addition of each data set added power, that is, correctly identified more patients with an admission in the next 12 months, with only a minor reduction in PPV. At a risk threshold of 50, the addition of A&E data resulted in an increase of 400 (8.6%) correctly flagged patients, with no loss in PPV. The inclusion of outpatient data added a more modest 2.9%, but with a slight loss in PPV (0.531 to 0.523). The addition of GP EMR data added an additional 9.6% of patients, while actually increasing the accuracy of the model (PPV increasing from 0.523 to 0.538). The added power of the A&E data set is less substantial at a risk score threshold of 30 (4.9%), but outpatient and GP EMR data sets had larger increases in correctly identified patients (4.3% and 19.9%).
There were also important differences between the models in terms of the characteristics of patients identified as high risk. For example, at a risk score cut-off of 50, patients identified using inpatient data alone had high prior emergency inpatient utilisation rates, with 2.62 admissions in the previous year compared to 2.43 when A&E data were added, 2.34 with the addition of outpatient data and 2.20 with the addition of GP EMR data (see table 2).
The inclusion of additional data sets also led to a reduction in the observed morbidity level of patients at the 50 threshold, with lower numbers of long-term conditions, fewer patients with multiple long-term conditions, lower Charlson Comorbidity Index scores, less history of alcohol abuse and mental illness and lower emergency inpatient costs in the years prior to the predictive period. Similar, but less substantial, differences were observed at the risk score threshold of 30. The addition of the A&E data set resulted in higher rates of A&E visits in the preperiod among patients identified at both risk score cut-off levels, and the addition of outpatient data resulted in higher outpatient visit and missed visit rates among identified patients.
These findings suggest that the inclusion of additional data sets added some predictive power and generally tended to find additional patients who were less severely ill (more severely ill patients tended to remain high risk). Thus, they potentially offer an opportunity for intervention at earlier stages in the progression of a patient's condition. However, the number of patients identified with no prior emergency inpatient utilisation in the previous 2 years was relatively small across all models. At a risk score threshold of 50, only 0.3% of patients correctly identified by the inpatient-only model had no prior emergency admissions in the previous 2 years, and this increased only modestly, to 3.2%, in the full model (table 3). At a risk threshold of 30, the rates were higher, but only reached 12.4% for the full model.
Individual site and ‘four-site regression’ model results
Overall, the performance of the models was similar at the individual site level. Only modest differences were found in PPV levels and sensitivity across the sites. For runs using non-GP data only (IPAEOP), at a risk score threshold of 50, PPVs ranged from 0.512 to 0.552 and sensitivity ranged from 0.047 to 0.071. For the model including GP EMRs, PPVs ranged from 0.521 to 0.566 and sensitivity from 0.053 to 0.073 (see online supplementary appendix A). There was some variation in the magnitude of regression coefficients between sites, but in general the coefficients were comparable for the models based on the non-GP data model (IPAEOP) (see online supplementary appendix B). For the model including variables from GP EMRs (IPAEOPGP), the level of variation in regression coefficients (size and direction) was somewhat greater for those variables derived from GP data. We observed substantial differences in frequency of reporting of Read codes across sites, which no doubt contributed to this variation. The level of significance of individual variables also varied across sites (see online supplementary appendix C), but most variables were consistently strongly significant across all sites, especially variables involving prior emergency inpatient admissions. Again, higher levels of variation in levels of significance were observed for the GP variables derived from Read codes.
We compared the results of these individual site models with those of a pooled model that combined data from four of the sites and applied the resulting coefficients to the remaining site. We generally found only small differences in predictive accuracy (PPV) between these two approaches (table 4); however, the individual site models identified a greater number of true positives. For example, in Cornwall at a risk score cut-off of 50, the individual site model using hospital data correctly identified 1041 patients while the pooled model identified only 754 patients. In Newham, however, the four-site model was more powerful, correctly identifying 858 patients compared to 734 patients for the individual site approach. In both cases (and in general across all sites), the model identifying larger numbers of true positives had a somewhat lower PPV, suggesting that improved case finding volume came at the expense of predictive accuracy.
Testing alternative population denominators
Models built using inpatient data only (IP) were also built for just the subset of patients who had some inpatient care in the previous 2 years (to reflect typical predictive modelling efforts that may have been conducted without access to GP registry information), as well as for the group who had had an emergency admission in the previous year (to replicate analyses conducted by PARR users).
Combining the results from the five sites at a risk score threshold of 50, models using the full GP register correctly identified 4627 patients compared with 3572 patients in runs restricted to patients with prior inpatient care and 3060 in runs limited to patients with an emergency admission in the previous year. This substantial increase in case finding was obtained with only moderate loss in PPV (0.529 GP list, 0.559 previous inpatient and 0.589 emergency admissions in the last year). Similar results were also found for all hospital data models (IPAEOP, though with any hospital use in the previous 2 years, rather than just any inpatient use).
Using the full GP registry population did not result in finding substantial numbers of patients with no emergency admissions in the previous 2 years, but the increased numbers of patients identified included more patients with less prior use and lower levels of morbidity. For a profile of patients identified using these alternative denominators, see http://www.nuffieldtrust.org.uk/.
This analysis has looked at the performance of new variants of predictive models for case finding. These models are intended to update and improve upon the established combined predictive model15 and PARR models5 widely used in the NHS.
Each of the models examined produced reasonably robust performance, by some measures better or at least comparable to similar prior models.9 At a risk threshold of 50, patients identified by the models had PPVs ranging from 0.523 to 0.538. While the percentage of all patients with future admissions identified was relatively low (sensitivity 0.049–0.060), lowering the risk threshold allows the identification of more patients with relatively small loss in PPV (eg, at a risk threshold of 30, the full model identified 14% of future admissions with a PPV of 0.417). Users of predictive modelling algorithms have obvious trade-offs between maximising the number of patients identified and predictive accuracy. Lower risk score thresholds will find more patients, but these patients are increasingly less likely to have future admissions.
The implications for intervention design are important. Patients at lower risk thresholds have less prior inpatient use and lower morbidity, so an intervention here might be calibrated to be less intensive. But because the models are less accurate at lower risk scores, the amount that can be spent on an intervention is also reduced if you wish to achieve financial break-even (ie, where the cost of intervention is offset by cost savings from reduction in future admissions). As documented in table 2, at a risk score threshold of 50, the rate of future admission for patients identified by the full model (IPAEOPGP data) was 1.31 admissions per year with an associated cost of £2270. If there were a 10% reduction in future admissions, £227 could be spent on an intervention to improve care coordination and still achieve break-even. However, at a lower risk threshold of 30, the lower rates of future admissions and costs mean that less can be spent on an intervention while still breaking even (£151 with a 10% reduction in future admissions). A detailed business case analysis with mean emergency inpatient costs in the next 12 months within each risk vigintile level is available via http://www.nuffieldtrust.org.uk/.
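The break-even arithmetic above can be written out as a short Python sketch. It assumes that savings are simply proportional to the reduction in future admission costs, with no allowance for intervention uptake or other costs; the function name is hypothetical, and the 5% scenario is an invented example rather than a figure from the study.

```python
def break_even_spend(expected_admission_cost, relative_reduction):
    """Maximum spend per patient that is fully offset by avoided admission costs."""
    return expected_admission_cost * relative_reduction


# Reported figure: GBP 2270 expected emergency admission cost at threshold 50.
spend_at_50 = break_even_spend(2270, 0.10)

# Invented scenario: the same patients with only a 5% reduction in admissions.
spend_at_50_if_5_percent = break_even_spend(2270, 0.05)
```

The first result reproduces the £227 quoted in the text; halving the assumed reduction halves the affordable spend to about £113.50.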
These data also provide other information that may be useful in the development of intervention strategies. As shown in table 2, patients identified by the models have extremely high rates of chronic disease (85–90% with long-term conditions at risk threshold of 50), often with multiple long-term conditions and high Charlson Comorbidity Index levels, indicating serious medical needs. However, these patients already have high use of outpatient care and very high GP visit rates. This suggests that simple access to ambulatory care is not the issue, but prevention needs to look at care coordination and management of complex problems and at the ability of patients and their families to manage chronic illness. High-risk patients identified by the models also have relatively high rates of mental illness (27–32% at risk threshold of 50) and moderate levels of alcohol abuse, factors that are likely to complicate any intervention strategy.
It is also important to note the limitations of these data in helping frame the design of any intervention strategy. Other studies have documented that high-risk patients often have important characteristics related to care needs and patient capacity not captured by administrative data and EMRs. For example, interviews with high-risk patients and their families have documented high levels of social isolation for many, as well as precarious housing status.16 These non-medical factors are likely to have significant impact on health status and utilisation patterns. Moreover, not much is known about how/whether care coordination and management has actually failed for these patients. Are these high-risk patients just very sick patients whose hospitalisations are largely not preventable/avoidable,17 or has the care delivery system failed in some important dimensions that can be corrected with improved care coordination and management? These data cannot answer this very critical question, and it is clear that the field would benefit from further study that examined the circumstances of patients identified as high risk by predictive modelling algorithms to sort out more clearly the factors contributing to high rates of emergency admission.
This study does document the value of incorporating data sets beyond inpatient records. The addition of A&E and outpatient records resulted in the identification of more high-risk patients with little or no loss of predictive accuracy. These data sets are readily available and have standardised reporting formats that facilitate analysis. While the absence of useful diagnostic information in these data sets is a limiting factor, the improvement in case finding and usefulness in descriptive profiling of high-risk patients to help in intervention design (eg, high A&E use rates, high rates of missed outpatient appointments) suggests that their inclusion is clearly merited.
The use of GP EMRs presents significant challenges. While the lack of access to these data is unlikely to remain a problem, the variation in completeness and quality of data is problematic. The use of the unwieldy Read codes system makes analysis difficult, and we observed significant differences across sites in reporting patterns. Some of these differences may be caused by the under-reporting of diagnostic variables, others by differences in coding approaches. However, the potential improvement in case finding, especially among patients with lower rates of utilisation in the preperiod, suggests that these barriers are worth confronting. Our development of new variables beyond those included in prior predictive modelling efforts8 contributed substantially to enhanced case finding, and further work on variable development is likely to lead to further improvements. Again, these data are also useful in providing descriptive information on high-risk patients to help in intervention design (eg, documenting potential targets of opportunity such as uncontrolled hypertension or diabetes).
This study does not provide definitive findings on the value of developing individual site models compared to simply applying coefficients from a multisite or national model to local data. Our four-site regression models generally had comparable PPVs to individual site models, but for the majority of sites the four-site regression approach correctly identified somewhat fewer patients with future admissions. Though it is tempting to speculate on whether differences in the health needs of the population or coding differences affect model performance, we did not observe any clear patterns between the areas. Our analysis is somewhat limited by the small number of sites involved, which might cause somewhat greater variability in regression coefficients (regression coefficients for each of the five four-site models are available at http://www.nuffieldtrust.org.uk/). Development of a national model using SUS data only is planned to further assess the need/value of locally developed models.
Finally, it is worth noting that use of the GP registry data for the denominator also proved to be of significant importance. Many prior predictive modelling efforts have been limited to patients with utilisation history in whatever data sets were included. By including all patients in an area, not just those with prior use, the impact on predictive modelling of prior use was apparently enhanced. As a result, patients with more moderate levels of prior use and morbidity were found to be at higher risk than patients with no prior use at all, and were often assigned higher risk scores than when the analysis included just patients who had prior use. Accordingly, the use of the GP registry as the denominator can improve rates of case finding and may permit identification of patients at earlier stages.
The authors wish to thank all individuals in the PCTs who contributed data used for this study during the Whole Systems Demonstrator, Virtual Wards and Social care end of life studies.
Contributors The preparation of data sets and input variables and costs was undertaken by TG and IB; JB carried out the central modelling. MB advised on the analysis and results and managed the work of the research team at the Nuffield Trust. All authors contributed to the writing of the paper. JB is the guarantor.
Funding The research received no specific grant but was funded by the Nuffield Trust. The resulting models will be used as part of the WSD trial funded by the Department of Health.
Competing interests None.
Ethics approval This study only involved the analysis of pseudonymous secondary data. Since there were no identifiable human subjects, ethics approval was not required for this research and informed consent was not sought.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Details of the derived model variables and definitions are available from the authors at the Nuffield Trust at firstname.lastname@example.org.