
Development of a predictive model to identify inpatients at risk of re-admission within 30 days of discharge (PARR-30)
  1. John Billings1,
  2. Ian Blunt2,
  3. Adam Steventon2,
  4. Theo Georghiou2,
  5. Geraint Lewis3,
  6. Martin Bardsley2
  1. Robert F. Wagner Graduate School of Public Service, New York University, New York, USA
  2. Nuffield Trust, London, UK
  3. Department of Clinical Outcomes & Analytics, Walgreen Co., Deerfield, Illinois, USA
  1. Correspondence to Martin Bardsley; martin.bardsley{at}


Objectives To develop an algorithm for identifying inpatients at high risk of re-admission to a National Health Service (NHS) hospital in England within 30 days of discharge, using information that can be obtained either from hospital information systems or from the patient and their notes.

Design Multivariate statistical analysis of routinely collected hospital episode statistics (HES) data, using logistic regression to build the predictive model. The model's performance was estimated using bootstrapping.

Setting HES data covering all NHS hospital admissions in England.

Participants NHS patients admitted to hospital between April 2008 and March 2009 (10% sample of all admissions, n=576 868).

Main outcome measures Area under the receiver operating characteristic curve for the algorithm, together with its positive predictive value and sensitivity for a range of risk score thresholds.

Results The algorithm produces a ‘risk score’ ranging from 0 to 1 for each admitted patient; the percentage of patients with a re-admission within 30 days and the mean re-admission costs of all patients are provided for 20 risk bands. At a risk score threshold of 0.5, the positive predictive value (ie, the percentage of inpatients identified as high risk who were subsequently re-admitted within 30 days) was 59.2% (95% CI 58.0% to 60.5%). These patients represented 5.4% (95% CI 5.2% to 5.6%) of all inpatients who would be re-admitted within 30 days (sensitivity). The area under the receiver operating characteristic curve was 0.70 (95% CI 0.69 to 0.70).

Conclusions We have developed a method of identifying inpatients at high risk of unplanned re-admission to NHS hospitals within 30 days of discharge. Although the model had low sensitivity, we show how to identify subgroups of patients that contain a high proportion of patients who will be re-admitted within 30 days. Further work is needed to validate the model in practice.

  • Health Economics
  • Health Services Administration & Management
  • Statistics & Research Methods

This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: and



Article focus

  • Preventing readmissions to hospital is important for patients, and recent policy in the English NHS means it may also affect hospital income.

  • We used logistic regression of existing person-level hospital records to develop a model that predicts the probability of readmission to hospital within 30 days.

Key messages

  • The model has been purposely designed to use only a few variables that might be entered from computerised information, or at the bedside.

  • The model has reasonable accuracy in terms of positive predictive value for the highest risk patients but low sensitivity.

Strengths and limitations of this study

  • Simple and easily implemented model.

  • The model has low sensitivity: patients flagged as high risk are rare and account for only a small share of all re-admissions.


Unplanned hospital admissions and re-admissions are regarded as markers of costly, suboptimal healthcare1 ,2 and their avoidance is currently a priority for policy-makers in many countries.3 For example, in England, Department of Health guidance for the National Health Service (NHS) proposes that commissioners should not pay provider hospitals for emergency re-admission within 30 days of an index elective (planned) admission.4 The rate of re-admissions will also play an important part in monitoring health system performance, as one of the new English public health ‘outcome indicators’.5

Between 1 April 2004 and 31 March 2010, 7% of patients discharged from a hospital in England were re-admitted to hospital within 30 days,6 with costs to the NHS estimated at £1.6 billion each year.7 While many different interventions have been introduced with the aim of reducing unplanned admission rates,8 the evidence for their efficacy and cost-effectiveness is limited.9

One reason why hospital-avoidance interventions may be unsuccessful is that they may be offered to patients who are not at sufficiently high risk of future unplanned hospital admission.10 A history of recent hospital admissions is not, by itself, an accurate predictor of future admissions,11 and clinicians often appear unable to make reliable predictions about which patients will be re-admitted.12 ,13 There is also some evidence that many re-admissions may not be avoidable.14 One recent analysis observed a strong relationship between rates of rehospitalisation and overall admission rates within specific areas.15

To improve the accuracy of the ‘case finding’ process, researchers have in recent years developed a number of predictive risk models for the NHS, with the specific aim of identifying people at highest risk of a future admission or re-admission.16–21 These models use relationships in routine data to identify patients at highest risk of unplanned admission or re-admission in the next 12 months. Most of them are not contingent on an index hospital admission; instead, they calculate risk scores across the population at a particular date and are designed to be run on a regular (eg, monthly or quarterly) basis.

One advantage of predicting which patients are at high risk of admission in the coming 12 months is that this prolonged period may allow time for clinicians and care managers/coordinators to contact and engage with high-risk patients. Furthermore, it allows time for behavioural and treatment changes to be instigated. On the other hand, the likelihood of an unplanned admission is highest in the immediate postdischarge period,22 so there may be advantages in predicting re-admissions that occur shortly after discharge. Moreover, there is evidence that some forms of preventive care may be more effective at reducing unplanned hospital admissions if initiated immediately after an acute illness.23

Outside the UK, a number of tools have been built for predicting re-admissions within 15 days24 or 30 days25–29 of discharge from hospital. Until recently, NHS funding arrangements gave hospitals in England few financial incentives to predict and prevent unplanned hospital admissions. However, the 2011–2012 operating framework proposed that NHS hospitals should not be reimbursed for re-admissions occurring within 30 days (in addition to receiving only a 30% marginal rate for emergency admissions above their 2008/2009 baseline).30 In practice, the degree to which this new 30-day rule is being enforced appears to vary across the country.31 Yet even without monetary incentives, knowledge of 30-day re-admission risk could still help clinicians focus their discharge planning and postdischarge support on high-risk patients.

Predictive tools built in one setting may not be accurate when used in other healthcare settings.32 In this paper, therefore, we describe how we used English hospital episode statistics (HES) data to develop a predictive model that can identify patients at high risk of re-admission to an NHS hospital in England within 30 days of discharge. The model, which we call ‘PARR-30’ (Patients at Risk of Re-admission within 30 days), can be used in practice in one of two ways: either automatically, drawing variables from Secondary Uses Service (SUS) data and from a hospital's Patient Administration System (PAS),33 or ‘manually’ by clinicians, who can obtain the requisite information from the patient and the patient's notes and then calculate the risk using a spreadsheet or a smartphone/tablet ‘app’. To facilitate this second approach, we sought to develop an algorithm that was easy to use and that relied on only a relatively small number of variables easily obtained from available records or from the patient. To justify changes in services, it is often helpful to understand how an intervention may improve care and lead to lower overall costs in the long run. We therefore present figures for the potential savings that might accrue through reduced hospital use according to the level of risk targeted, under assumptions about the effectiveness of interventions. We are making PARR-30 freely available for use across the NHS in England.


The model was developed using HES data obtained from the NHS Information Centre for health and social care for the period 1 April 2006–30 April 2009.34 This analysis was based on existing data that had been anonymised and therefore did not require additional ethical approval. Records were extracted for 10% of all NHS hospital admissions in England with a discharge date between 1 April 2008 and 31 March 2009. Episodes coded as births, deaths in hospital, self-discharges and transfers to other hospitals were excluded, leaving 576 868 admissions in the sample. Re-admissions within 30 days were restricted according to the provisions of the 2011–2012 NHS operating framework by excluding non-emergency admissions, admissions to which a national tariff did not apply, admissions for multiple trauma or transport accidents, and children aged under four. Cancer-related re-admissions were included because their exclusion in the operating framework is being reconsidered.35 Patients who died after discharge were included in the development data set, reproducing what would happen if the model were applied in practice. The data set allowed patients to have more than one re-admission episode, but each re-admission within 30 days was linked only with the most recent prior admission.
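The linkage rule above (each 30-day re-admission attached only to the most recent prior discharge, with a patient allowed several re-admissions) can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the `Spell` record and its `emergency` and `eligible` flags are hypothetical, and we assume the operating-framework exclusions have already been applied when setting `eligible`.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Spell:
    """One hospital stay. Field names are hypothetical, not HES field names."""
    patient_id: str
    admission_date: date
    discharge_date: date
    emergency: bool        # True if the admission was an emergency
    eligible: bool = True  # False if framework exclusions rule it out as a re-admission


def link_readmissions(spells, window_days=30):
    """Return (index_spell, readmission_spell) pairs.

    An eligible emergency admission counts as a re-admission if it starts within
    `window_days` of an earlier discharge, and it is linked only to the most
    recent such discharge. A re-admission can itself be the index for a later one.
    """
    by_patient = {}
    for spell in spells:
        by_patient.setdefault(spell.patient_id, []).append(spell)

    pairs = []
    for patient_spells in by_patient.values():
        for later in patient_spells:
            if not (later.emergency and later.eligible):
                continue
            # Other stays of this patient that ended on or before this admission began
            prior = [s for s in patient_spells
                     if s is not later and s.discharge_date <= later.admission_date]
            if not prior:
                continue
            index = max(prior, key=lambda s: s.discharge_date)
            if (later.admission_date - index.discharge_date).days <= window_days:
                pairs.append((index, later))
    return pairs
```

In this sketch an emergency admission 46 days after the previous discharge is not linked, while a second emergency admission 5 days after that one is linked to it rather than to any earlier stay.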

A series of logistic regressions was conducted to identify the variables that contributed most to predictions of a re-admission within 30 days of discharge, creating ‘risk scores’ of 0.01–1.00 that describe the estimated probability of re-admission within 30 days. The variables were restricted to those that could be formulated so that they could easily be obtained from the patient or the patient's notes in the absence of computerised administrative data. The variables tested were based on the broad range of measures used in the PARR algorithm, which predicts re-admission within the following year.14 These included: the number of admissions to hospital by type (emergency versus non-emergency) within several time intervals before the current admission (90, 180, 365, 730 and 1095 days); the number of episodes per spell in prior admissions (a proxy measure of complex health problems); the number of different types of specialist consulted in the last 12 months (based on services recorded in outpatient records); a range of diagnostic categories and hierarchical diagnostic groups;36 characteristics of the area of residence; and length of stay. A dummy variable was introduced to represent the hospital, using the largest hospital in the data as the reference category. The reduced set of variables ultimately included in the algorithm was selected on the basis of its impact on overall model performance and the ease of obtaining each variable from medical notes or from patient recall.
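A minimal sketch of the hospital dummy coding, assuming "largest" means the hospital with the most admissions in the sample (the text does not define the measure). The function name `hospital_dummies` and the hospital codes in the example are hypothetical.

```python
from collections import Counter


def hospital_dummies(hospital_codes):
    """One-hot encode hospital codes, omitting the largest hospital as the reference.

    Returns (reference, levels, rows): each row has one 0/1 column per
    non-reference hospital, so the reference hospital is the all-zero row.
    """
    counts = Counter(hospital_codes)
    # Largest hospital = most admissions in the data; ties broken by code order
    reference = max(sorted(counts), key=lambda h: counts[h])
    levels = sorted(h for h in counts if h != reference)
    rows = [[1 if code == level else 0 for level in levels] for code in hospital_codes]
    return reference, levels, rows
```

Choosing the largest hospital as the reference keeps the baseline estimate stable, because the intercept is then anchored to the hospital with the most data.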

We measured the accuracy of the predictive models in several ways. The positive predictive value (PPV) compares the number of people identified by the model as likely to experience a re-admission (based on a given risk threshold) with the number in this group who went on to be re-admitted; it is defined as the percentage of patients identified as at risk who experience a re-admission. The sensitivity is a related concept: the percentage of people who experienced a re-admission who were correctly identified by the model as being at risk. Conversely, the specificity is the proportion of people who did not experience a re-admission who were correctly identified as being at low risk. Sensitivity and specificity can be traded off against each other by varying the risk threshold. As well as these measures, we present estimates of the area under the receiver operating characteristic (ROC) curve, which shows the trade-off between true positives (sensitivity) and false positives (1 − specificity) at all possible thresholds. Further, we examined the proportion of patients who were re-admitted, and the costs of their re-admissions, by risk band (20 bands based on the level of the risk score).
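The accuracy measures defined above can be computed as follows. This is an illustrative sketch: treating a score exactly equal to the threshold as high risk is our assumption, and the AUC uses the pairwise (Mann-Whitney) definition, which equals the area under the empirical ROC curve. Function names are hypothetical.

```python
def classification_metrics(scores, readmitted, threshold=0.5):
    """PPV, sensitivity and specificity when scores >= threshold are flagged high risk."""
    tp = fp = fn = tn = 0
    for score, outcome in zip(scores, readmitted):
        flagged = score >= threshold  # treating ties as high risk is an assumption
        if flagged and outcome:
            tp += 1
        elif flagged:
            fp += 1
        elif outcome:
            fn += 1
        else:
            tn += 1
    return {
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def roc_auc(scores, readmitted):
    """Probability that a randomly chosen re-admitted patient scores higher than a
    randomly chosen patient who was not re-admitted (ties count one half)."""
    pos = [s for s, y in zip(scores, readmitted) if y]
    neg = [s for s, y in zip(scores, readmitted) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC loops over every positive-negative pair, so it suits small examples; a rank-based formula gives the same value in O(n log n) for a data set the size of HES.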

Predictive models are generally ‘trained’ on a data set consisting of dependent variables (in this case, hospital re-admissions) for many patients, together with a range of independent variables from an earlier time period. The apparent performance of a model on the training (or development) data set tends to be considerably better than its performance on another, independent data set, even one consisting of similar patients. To ensure that the model's predictions are generalisable, it is therefore important to evaluate its performance more realistically than by simply calculating its accuracy on the training sample.

To do this, we used a bootstrap evaluation method.37 This method estimates the degree of ‘optimism’ in the apparent performance of the model on the training data set; the optimism-corrected performance is obtained by subtracting the estimated optimism from the apparent performance. We estimated the optimism by drawing a large number of bootstrap samples from the training data set. Each contained the same number of patients as the original sample but was formed by selecting patients at random with replacement, so that individual patients could be selected more than once. We fitted a model to each bootstrap sample and calculated the difference between its performance on that bootstrap sample and its performance on the original sample. The optimism was estimated as the average of this difference over all bootstrap samples. One benefit of bootstrapping is that it allows all of the available patient data to be used for model development. It has been shown to estimate model performance more accurately than other approaches, such as setting aside data for a separate validation sample.38
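The optimism correction described above can be sketched as follows, using AUC as the performance measure. To keep the example self-contained, a toy model that predicts each patient's risk as the observed re-admission rate in their category stands in for the logistic regression; `fit_rate_model`, the other function names and the synthetic data are hypothetical.

```python
import random
from collections import defaultdict


def roc_auc(scores, outcomes):
    """Rank-based (Mann-Whitney) AUC with average ranks for tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tied block
        i = j + 1
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, outcomes) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def fit_rate_model(xs, ys):
    """Toy stand-in for the logistic model: predicted risk is the observed
    re-admission rate for the patient's category (overall rate if unseen)."""
    totals, events = defaultdict(int), defaultdict(int)
    for x, y in zip(xs, ys):
        totals[x] += 1
        events[x] += y
    overall = sum(ys) / len(ys)
    rates = {x: events[x] / totals[x] for x in totals}
    return lambda x: rates.get(x, overall)


def optimism_corrected_auc(xs, ys, n_boot=200, seed=0):
    """Return (apparent AUC, estimated optimism, optimism-corrected AUC)."""
    rng = random.Random(seed)
    n = len(xs)
    model = fit_rate_model(xs, ys)
    apparent = roc_auc([model(x) for x in xs], ys)
    optimisms = []
    while len(optimisms) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample patients with replacement
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        if not 0 < sum(by) < n:
            continue  # AUC needs both outcomes present
        boot_model = fit_rate_model(bx, by)
        perf_on_boot = roc_auc([boot_model(x) for x in bx], by)
        perf_on_original = roc_auc([boot_model(x) for x in xs], ys)
        optimisms.append(perf_on_boot - perf_on_original)
    optimism = sum(optimisms) / len(optimisms)
    return apparent, optimism, apparent - optimism


# Synthetic example: risk rises with a four-level category
rng = random.Random(1)
xs = [rng.randrange(4) for _ in range(400)]
ys = [int(rng.random() < 0.1 + 0.15 * x) for x in xs]
apparent, optimism, corrected = optimism_corrected_auc(xs, ys)
```

With only four categories and 400 patients the estimated optimism should be small, in line with the text's observation that optimism was negligible for a large sample; a flexible model fitted to few patients would show a larger gap.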

The estimated optimism was very small, as we would expect given the large number of patient records available. We therefore extended the bootstrap technique to add CIs for the proportion of patients who experienced a re-admission in each risk band, treating optimism as negligible. These CIs were formed by applying the final model to a large number (we chose 200) of bootstrap samples and estimating the range within which the proportions fell 95% of the time. CIs for the ROC curve were calculated using a Bayesian bootstrap method.39
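The two interval methods mentioned above can be sketched for a single risk band's re-admission proportion. The paper applied the Bayesian bootstrap to the ROC curve rather than to a proportion, so the second function only illustrates the weighting scheme; the function names and the example band (59 of 100 patients re-admitted) are hypothetical.

```python
import random


def proportion(values):
    return sum(values) / len(values)


def percentile_bootstrap_ci(values, stat, n_boot=200, alpha=0.05, seed=0):
    """Percentile CI: recompute `stat` on resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(values)
    draws = sorted(stat([values[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot))
    k = round(alpha / 2 * n_boot)  # eg, 5 of 200 draws trimmed from each tail
    return draws[k], draws[n_boot - 1 - k]


def bayesian_bootstrap_ci(outcomes, n_boot=200, alpha=0.05, seed=0):
    """Bayesian bootstrap CI for a proportion: Dirichlet(1, ..., 1) weights,
    generated as normalised Exp(1) draws, replace integer resampling counts."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        weights = [rng.expovariate(1.0) for _ in outcomes]
        draws.append(sum(w * y for w, y in zip(weights, outcomes)) / sum(weights))
    draws.sort()
    k = round(alpha / 2 * n_boot)
    return draws[k], draws[n_boot - 1 - k]


band = [1] * 59 + [0] * 41  # hypothetical band: 59 of 100 patients re-admitted
print(percentile_bootstrap_ci(band, proportion))
print(bayesian_bootstrap_ci(band))
```

Both methods give similar intervals for a simple proportion; the Bayesian version smooths the resampling because every patient receives a positive weight in every draw.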

Developing the business case analysis

A ‘business case’ analysis is presented to help guide providers and commissioners in designing interventions to prevent patient re-admissions. For this, we calculated the mean re-admission costs of all patients in each risk band and at various cut-off levels. This represents the cost to NHS hospitals in terms of potential non-payment. To estimate the maximum amount that could be spent on prevention, based on the estimated ‘savings’ from reduced admissions, we assumed that interventions reduce the number of re-admissions within 30 days by 10%, 15% or 20%.
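The breakeven arithmetic is simple: the maximum spend per targeted patient is the mean re-admission cost across all targeted patients (including those not re-admitted) multiplied by the assumed fractional reduction in re-admissions. A sketch, with hypothetical function names, using the figures reported in the results for risk bands 11 and above (£1088 mean cost, 6395 patients):

```python
def breakeven_spend(mean_readmission_cost, effectiveness):
    """Maximum spend per targeted patient at which the intervention breaks even.

    mean_readmission_cost: mean re-admission cost over ALL targeted patients;
    effectiveness: assumed fractional reduction in 30-day re-admissions (eg, 0.10).
    """
    return mean_readmission_cost * effectiveness


def breakeven_budget(mean_readmission_cost, effectiveness, n_patients):
    """Total breakeven budget for a targeted group of patients."""
    return breakeven_spend(mean_readmission_cost, effectiveness) * n_patients


# Figures quoted in the text for risk bands 11+
for effect in (0.10, 0.15, 0.20):
    print(f"{effect:.0%}: £{breakeven_spend(1088, effect):.0f} per patient")
```

This reproduces the £109 (10%) and £218 (20%) per-patient figures reported for bands 11 and above; the 15% value follows from the same arithmetic.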

The costs of secondary care utilisation were estimated from HES data using 2010/11 Payment by Results (PbR) tariffs.40 ,41 Activity not covered by the national tariff was costed using the national reference costs (NRC),42 adjusted to be directly comparable with 2010/11 tariffs. If neither a tariff nor an NRC was available, the activity was costed at the average tariff for the specialty under which it was delivered, using a method developed for a national study of resource allocation.43 Costs therefore represent income for providers rather than the actual cost of treating the re-admission.
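The fallback order for costing a unit of activity can be expressed as a small function. This is a sketch under stated assumptions: the text does not say how reference costs were adjusted to 2010/11 tariff levels, so a single multiplicative `nrc_adjustment` factor is assumed, and all names and values are hypothetical.

```python
from typing import Optional


def cost_activity(tariff: Optional[float],
                  reference_cost: Optional[float],
                  specialty_average_tariff: float,
                  nrc_adjustment: float = 1.0) -> float:
    """Cost one unit of activity: tariff, else adjusted NRC, else specialty average."""
    if tariff is not None:
        return tariff
    if reference_cost is not None:
        # Hypothetical scaling that makes reference costs comparable with 2010/11 tariffs
        return reference_cost * nrc_adjustment
    return specialty_average_tariff
```

Because the first two sources are prices rather than measured costs, the result is an income figure, consistent with the note that costs represent provider income.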

We established the costs of inpatient admissions by calculating the Healthcare Resource Group (HRG) for each patient's whole stay in hospital. We derived the full cost using the PbR rules44 to combine the HRG, admission method and other details of the hospital stay. This included the unit cost of the HRG and any payments due because of an unexpectedly long stay in hospital, or for any specialist care or additional treatments and tests (so-called unbundled payments). We also calculated outpatient and emergency department costs as recommended by the PbR rules.
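A spell-level cost under PbR-style rules might be sketched as below. We assume the long-stay payment takes the form of a per-day excess bed day rate beyond an HRG-specific trim point, and we treat unbundled payments as a simple list of add-ons; the function name and the tariff figures in the example are hypothetical, not 2010/11 values.

```python
def spell_cost(hrg_tariff, length_of_stay, trim_point, excess_bed_day_rate, unbundled=()):
    """HRG tariff + excess bed days beyond the trim point + unbundled payments."""
    excess_days = max(0, length_of_stay - trim_point)  # no payment below the trim point
    return hrg_tariff + excess_days * excess_bed_day_rate + sum(unbundled)


# Hypothetical spell: 12-day stay, trim point 10 days, two unbundled items
print(spell_cost(2000.0, 12, 10, 250.0, (150.0, 50.0)))
```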


The derived model uses the following small set of variable types:

  • Patient age, entered as its squared value.

  • Index of multiple deprivation (IMD)45 for the patient's place of residence (derived from a postcode and mapped to one of five bands based on the lower super output area).

  • Whether the current admission was an emergency admission (defined in HES as an admission category 21–28).

  • Whether there had been an emergency hospital discharge in the past 30 days.

  • The number of emergency hospital discharges in the last year (from any hospital).

  • History in the prior 2 years (from any HES primary or secondary diagnostic field) of 11 major health conditions drawn from the Charlson co-morbidity index.46

  • The hospital of the current admission, using a set of 150 dummy variables for the major acute hospitals in England.
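The predictors listed above could be assembled from a patient's record roughly as follows. This is an illustrative sketch: the argument names are hypothetical, the look-back windows are assumed to be measured back from the current admission date, only three of the 11 Charlson conditions (those named in Box 1) are included, and the postcode-to-IMD-band mapping is assumed to have been done beforehand with a look-up table.

```python
from datetime import date, timedelta

# HES admission method codes 21-28 denote emergency admissions (as stated above)
EMERGENCY_ADMISSION_METHODS = {str(code) for code in range(21, 29)}

# Hypothetical subset: only three of the 11 Charlson conditions are named in Box 1
CHARLSON_SUBSET = (
    "congestive heart failure",
    "peripheral vascular disease",
    "chronic pulmonary disease",
)


def build_features(age, admission_method, admission_date,
                   emergency_discharge_dates, imd_band, conditions, hospital):
    """Assemble PARR-30-style predictors for one admission (illustrative only)."""
    # Emergency discharges from any hospital in the year before this admission
    last_year = [d for d in emergency_discharge_dates
                 if admission_date - timedelta(days=365) <= d < admission_date]
    features = {
        "age_squared": age ** 2,
        "emergency_admission": int(admission_method in EMERGENCY_ADMISSION_METHODS),
        "emergency_discharge_last_30_days": int(
            any((admission_date - d).days <= 30 for d in last_year)),
        "emergency_discharges_last_year": len(last_year),
        "imd_band": imd_band,   # categorical, mapped from postcode via a look-up table
        "hospital": hospital,   # categorical, coded as dummies against a reference
    }
    for condition in CHARLSON_SUBSET:
        features[condition] = int(condition in conditions)
    return features
```

For the Box 1 patient (aged 83, emergency admission, two emergency discharges earlier in the year and none in the last month, with hypothetical dates), this yields `age_squared` 6889, `emergency_admission` 1 and `emergency_discharges_last_year` 2.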

Table 1 summarises the coefficients for these variables; details of the individual hospital coefficients are provided in appendix 1. Box 1 gives an example of how a risk score for an individual patient could be calculated. Full details of the model will also be made available on the Nuffield Trust website (

Box 1

A worked example of how a risk score can be calculated

An 83-year-old woman from a relatively deprived part of London is about to be discharged from a large teaching hospital in London. She was admitted as an emergency 7 days ago with a problem linked to her chronic obstructive pulmonary disease. Although she has not been in hospital within the last month, she was discharged twice following emergency admissions in the previous year. She also has a history of congestive heart failure and peripheral vascular disease.

The patient's risk of re-admission within the next 30 days is 25.1% (CI 24.4% to 25.6%).

Variable | Value | Coefficient | Contribution (value × coefficient)
Age squared | 6889 | 6E-05 | 0.417
Number of admissions last year | 2 | 0.121 | 0.243
Admission in last month | 0 | 0.526 | 0.000
Current admission is emergency/unplanned | 1 | 0.556 | 0.556
Deprivation: IMD score 25–40 | 1 | 0.066 | 0.066
Congestive heart failure | 1 | 0.095 | 0.095
Peripheral vascular disease | 1 | 0.104 | 0.104
Chronic pulmonary disease | 1 | 0.224 | 0.224
Hospital: Barts and The London National Health Service Trust | 1 | 0.117 | 0.117
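Under the logistic regression model, the risk is the logistic transform of the intercept plus the sum of the contributions in the Box 1 table. The model's intercept is not reproduced in this text, so the sketch below back-calculates the intercept implied by the stated 25.1% risk purely for illustration; that value (about -2.92) is our derivation, not a published coefficient. Function names are hypothetical.

```python
import math

# Contributions (value x coefficient) as printed in Box 1, rounded to 3 d.p.
BOX1_CONTRIBUTIONS = {
    "age squared": 0.417,
    "number of admissions last year": 0.243,
    "admission in last month": 0.000,
    "current admission is emergency/unplanned": 0.556,
    "deprivation: IMD score 25-40": 0.066,
    "congestive heart failure": 0.095,
    "peripheral vascular disease": 0.104,
    "chronic pulmonary disease": 0.224,
    "hospital: Barts and The London NHS Trust": 0.117,
}


def risk_score(intercept, contributions):
    """Logistic model: risk = 1 / (1 + exp(-(intercept + sum of contributions)))."""
    linear_predictor = intercept + sum(contributions)
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def implied_intercept(risk, contributions):
    """Intercept consistent with a stated risk: logit(risk) minus the contributions.
    For illustration only; this is not the published model constant."""
    return math.log(risk / (1.0 - risk)) - sum(contributions)


values = list(BOX1_CONTRIBUTIONS.values())
b0 = implied_intercept(0.251, values)
print(f"sum of contributions = {sum(values):.3f}, implied intercept = {b0:.3f}")
print(f"risk = {risk_score(b0, values):.3f}")
```

A spreadsheet or app would do the same thing: multiply each value by its coefficient, add the intercept from Table 1, and apply the logistic transform.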

The performance of the model is shown in table 2 in terms of the percentage of patients with a 30-day re-admission and the costs of those re-admissions, by risk band (20 bands). For the higher-risk patients (risk bands 11 and above), re-admission rates ranged from 47.7% to 88.7%, compared with an overall re-admission rate of 12.2%. However, these high-risk bands contained only a small share (1.1%) of all patients analysed. For risk bands 1–10, the rate of re-admission within 30 days fell steadily with decreasing risk score, while the number of patients in each band increased. The two lowest-risk bands contain 54.7% of patients, whose risk of re-admission within 30 days was 7.1% or lower.
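The 20 risk bands appear to be equal-width intervals of the risk score rather than equal-sized groups, since bands 11 and above correspond to scores above 0.50 yet hold only 1.1% of patients. Assuming 0.05-wide bands, the per-band summary in table 2 and the pooled cut-off figures in table 3 could be computed as follows; the function names and example data are hypothetical.

```python
from collections import defaultdict


def risk_band(score, n_bands=20):
    """Map a 0-1 risk score to bands 1..n_bands of equal width (assumed 0.05 wide)."""
    return min(n_bands, int(score * n_bands) + 1)


def summarise_by_band(scores, readmitted, readmission_costs):
    """Patients, re-admission rate and mean re-admission cost (over all patients) per band."""
    groups = defaultdict(list)
    for score, y, cost in zip(scores, readmitted, readmission_costs):
        groups[risk_band(score)].append((y, cost))
    return {
        band: {
            "patients": len(rows),
            "readmission_rate": sum(y for y, _ in rows) / len(rows),
            "mean_readmission_cost": sum(c for _, c in rows) / len(rows),
        }
        for band, rows in sorted(groups.items())
    }


def at_or_above(scores, readmitted, readmission_costs, cutoff_band):
    """Pool all patients in bands >= cutoff_band (eg, 11 for a 0.50 threshold)."""
    rows = [(y, c) for s, y, c in zip(scores, readmitted, readmission_costs)
            if risk_band(s) >= cutoff_band]
    n = len(rows)
    return {
        "patients": n,
        "readmission_rate": sum(y for y, _ in rows) / n if n else float("nan"),
        "mean_readmission_cost": sum(c for _, c in rows) / n if n else float("nan"),
    }
```

Averaging re-admission costs over every patient in a band (not only those re-admitted) gives the per-patient figure that feeds the breakeven calculation.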

Table 1

Summary of variables* included in model, and their coefficients, SE and significance

The mean re-admission costs tended to be lower in the lower-risk bands because a smaller percentage of patients were re-admitted. However, those in the lower bands who had a re-admission tended to have higher costs (eg, £1340 per admission for patients in band 20 compared with £2143 per admission for patients in band 11).

A business case analysis is provided in table 3, documenting the rate of re-admissions and the maximum level of expenditure for each risk band (and at various risk band cut-off levels). These values indicate the point at which the cost of the preventive intervention equals the net savings from reduced re-admissions, under various assumptions about the effectiveness of interventions (10%, 15% and 20%). With a risk band cut-off at band 11, mean re-admission costs were £1088 (CI £1046 to £1124, not shown) per patient. Assuming a 10% reduction in the rate of re-admission, £109 per patient (CI £105 to £112, not shown) could be spent on the 6395 patients in these bands, with the costs of the intervention equalling the costs of avoided emergency admissions (breakeven).

Table 2

Estimated 30-day re-admission rates and costs by risk band

The PPV of the model for all patients with a risk score above 0.50 (risk bands 11+) was 59.2% (CI 58.0% to 60.5%), with a specificity of 99.5% (CI 99.5% to 99.5%) and a sensitivity of 5.4% (CI 5.2% to 5.6%; see table 4). The ROC curve in figure 1 illustrates the trade-off between true positives (sensitivity) and false positives (1 − specificity) for the model. Overall, the area under the curve was 0.70 (CI 0.69 to 0.70).

Table 3

‘Business Case’ analysis

Table 4

Estimated model performance: bootstrapped central estimate and 95% CI

Figure 1

Receiver operating characteristic curve for the bootstrapped central estimate (red line) and 95% CI (shaded area).


We have built a predictive model using a limited set of variables generated from HES. The model estimates the risk and costs of re-admission to an NHS hospital in England within 30 days of discharge. We intentionally selected variables that we believe will translate easily to information available from patients’ notes or from the patients themselves. Look-up tables can be built to map variables such as a patient's postcode to a deprivation score. Scores can therefore be calculated with simple software tools such as a spreadsheet or ‘app’, as well as automatically from data in a hospital's patient administration system.

The performance of the model was respectable, with a PPV of 59.2% at a risk score threshold of 0.50 and an area under the ROC curve (‘c-statistic’) of 0.70. For comparison, a recent systematic review of predictive risk models for 30-day re-admissions documented c-statistics ranging from 0.50 to 0.72.47 The specificity of this model (99.5%) is high, although its sensitivity is low: only 5.4% of all patients who were re-admitted fell into bands 11+. The performance of the model could have been improved by including more variables, but this would have made the model less useful in practice.

Traditional measures of performance, such as sensitivity, mask the potential value of models in targeting preventive interventions. Knowing the percentage of patients in each risk band who will be re-admitted in the next 30 days can help in titrating resources, with more or different types of resource assigned to the patients most likely to be re-admitted. In the highest-risk band, patients had an 88.7% chance of re-admission within 30 days, and £178 could be spent per patient on interventions aimed at avoiding re-admission, assuming these interventions averted 15% of all re-admissions and that breakeven was required. The level and type of resources allocated to these patients should differ from those allocated to patients at lower risk, such as those in band 6, where the chance of re-admission was 28.0%. These data can also be used to set an overall cut-off level or threshold for the full range of intervention strategies. For example, with a cut-off at band 5, almost 30% of patients who will be re-admitted in the next 30 days would be included, and the chance of these patients being re-admitted is 31.8%. The levels and types of intervention for these patients should vary by risk band and patient characteristics, but clinicians and commissioners can use these data to select thresholds for any preventive intervention.

The model has its limitations. It was developed using HES data, but it is intended to be used by hospitals with either a combination of PAS and SUS data or information on prior hospital use and medical history reported by the patient or taken from the patient's notes. While PAS/SUS data differ from HES, the differences are minor, so we believe this shortcoming is unlikely to affect the accuracy of the predictive model substantially. However, inaccuracies in patients’ recall of their prior hospital use and medical history present bigger challenges to the validity of the model. Self-reported data on healthcare utilisation can differ from administrative data, especially for people with high levels of healthcare use, older people and people with poor health status.48 ,49 We are currently testing the model to determine the extent to which patient-reported information differs from that recorded in HES.

The ability to identify patients at high risk of re-admission constitutes the first step in any strategy to improve care and services for susceptible patients. The ultimate goal, however, is to couple this ‘case finding’ process with cost-effective interventions that mitigate the risk of re-admission and, ideally, to use the ensuing financial savings to help fund the intervention. Unfortunately, relatively little is known about what works, and for whom, in reducing re-admissions.

In a recent systematic review, Hansen et al50 identified a broad range of strategies that have been employed, including predischarge interventions (improved discharge planning, patient education, medication reconciliation, postdischarge follow-up appointment, etc), postdischarge interventions (patient hotlines, telephone appointment reminders, home visits, etc) and other interventions to bridge the transition from hospital to home, such as nurse coaching. Many of the studies reviewed were small and poorly designed. Five of 16 randomised controlled trials documented statistically significant reductions in the absolute risk of re-admission, but no single intervention or bundle of strategies was found to reduce risk consistently.

The cost data developed here also counsel caution. At a risk score cut-off of 0.50 (band 11+), even with an optimistic assumption of a 20% reduction in the rate of re-admissions, the amount available to spend on an intervention while still breaking even is relatively modest (£218 per patient). If the intervention is broadened to a cut-off at band 5, this amount drops to £143 (or £71 if a more realistic 10% reduction in re-admissions is assumed; see table 3). While improved discharge planning, postdischarge follow-up visits and telephone reminders may be relatively inexpensive, other interventions such as nurse coaching and home visits can become quite costly. These data would permit targeting of interventions, with more costly strategies limited to the patients at highest risk, but available resources will undoubtedly be stretched if breakeven is expected.

As hospitals in England begin responding to the new financial incentives in the 2011–2012 operating framework, it will be important to gather evidence about which interventions are effective, for which patients and at what cost. Future research could examine whether and how the effectiveness of interventions differs according to the underlying level of risk. For example, patients at lower or moderate risk of re-admission may have conditions or circumstances in which an intervention is more likely to succeed than in patients at high risk. Equally, there may be certain subgroups of patients within a particular risk band who are more or less amenable to preventive care. The use of predictive models as case-finding tools to target preventive interventions has gained considerable currency in community-based settings. We believe it is important to consider how such tools might be used in the much more immediate care environment of the hospital to improve the long-term management of patients.


We are grateful to the staff at Chelsea & Westminster Hospital and the Royal Berkshire Hospital for their support and guidance as we developed the PARR-30 model.


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.



  • Contributors The preparation of data sets, input variables and costs was undertaken by TG and IB; JB did the central modelling and reporting, while AS undertook work on bootstrapping and testing derived models. GL wrote the first draft of the paper and coordinated advice from local sites. MB advised on the analysis and results and managed the work of the research team at the Nuffield Trust. All authors contributed to the writing of the paper. GL was employed as a Senior Fellow at the Nuffield Trust at the time this work was undertaken. JB is the guarantor.

  • Funding and disclaimer This research was funded by the Nuffield Trust. The study sponsor was the Chairman of the Nuffield Trust. The sponsor had no role in the collection, analysis and interpretation of data, in the writing of the article or in the decision to submit it for publication.

  • Competing interests All authors have completed the Unified Competing Interest form at and declare that no authors have any relationships with any companies that might have an interest in the submitted work in the previous 3 years; none of their spouses, partners or children have any financial relationships that may be relevant to the submitted work; and no authors have any non-financial interests that may be relevant to the submitted work.

  • Ethical approval This study only involved the analysis of pseudonymous secondary data. Since there were no identifiable human subjects, ethics approval was not required for this research and informed consent was not sought.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement Details of the derived model's variables and definitions are available from the authors at the Nuffield Trust at