Objectives To identify ways of using routine hospital data to improve the efficiency of retrospective reviews of case records for identifying avoidable severe harm.
Design Development and testing of thresholds and criteria for two indirect indicators of healthcare-related harm (long length of stay (LOS) and emergency readmission) to determine the yield of specified harms coded in Hospital Episode Statistics (HES).
Setting Acute National Health Service hospitals in England.
Participants HES for acute myocardial infarction (AMI), bowel cancer surgery and hip replacement admissions from 2014 to 2015.
Interventions Case-mix-adjusted linear regression models were used to determine expected LOS. Different thresholds were examined to determine the association with harm. Screening criteria for readmission included time to readmission, length of readmission and diagnoses in initial admission and readmission. The association with harm was examined for each criterion.
Results The proportion of AMI cases with a harm code increased from 14% among all cases to 47% if a threshold of three times the expected LOS was used. For hip replacement the respective increase was from 10% to 51%. However, as the number of patients at these higher thresholds was small, the overall proportion of harm identified was relatively small (15%, 19%, 9% and 8% among AMI, urgent bowel surgery, elective bowel surgery and hip replacement cohorts, respectively). Selection of the time to readmission had an effect on the yield of harms but this varied with condition. At least 50% of surgical patients had a harm code if readmitted within 7 days compared with 21% of patients with AMI.
Conclusions Our approach would select a substantial number of patients for case record review. Many of these cases would contain no evidence of healthcare-related harm. In practice, Trusts may choose how many reviews it is feasible to do in advance and then select random samples of cases that satisfy the screening criteria.
- hospital administrative data
- case finding
- healthcare-related harm
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of the study
Routine hospital administrative data are inexpensive and easy to access.
Potential healthcare-related harm can be identified in these data by specific codes used for such harms (for example, complications and adverse reactions) or by using indirect indicators known to be linked to such harm, such as long length of stay (LOS) or readmission.
Comparing the performance of long LOS and readmission across four contrasting cohorts of patients (emergency: acute myocardial infarction; urgent: non-elective bowel cancer surgery; semiurgent: elective bowel cancer surgery; and elective: hip replacement) when thresholds for LOS and criteria for readmission are manipulated shows that sensitivity and positive predictive power to identify harm can be increased.
To confirm if any harm identified in the administrative records is healthcare-related, retrospective case record review is required.
The approach would identify potential healthcare-related harm in large numbers of cases. A selection process for those going forward to case record review would be required.
There are two main ways avoidable severe harm is identified in patients in acute hospitals in the National Health Service (NHS) in England: incident reporting systems and retrospective case record reviews (RCRR). However, both approaches have shortcomings. Incident reporting systems depend on staff compliance, while RCRR require considerable resources, which precludes universal application to all hospital admissions.
An alternative approach could be for hospitals to employ administrative data to screen records for case note review. These data include codes for healthcare-related harm (‘direct’ indicators of harm) such as ‘complications and adverse events’ or ‘pulmonary embolism after a surgical procedure’. In theory, all harm should be recorded, though the completeness of recording is doubtful. In addition, the data do not distinguish between levels of severity, so instances of severe harm cannot be distinguished from lesser forms. There has been no recent estimation of the completeness of reporting of harm in administrative data, but a historical (1999–2003) comparison with Australia suggested under-reporting: 2.2% of all NHS admissions compared with 4.75% in Australia.1 2
Administrative data also offer the possibility of using two ‘indirect’ indicators that reflect the potential consequences of healthcare-related harm: longer than expected length of stay (LOS) and unplanned readmission. Such indicators might be used to identify those patients in whom it is likely harm has occurred even if it has not been recorded. In this way the detection or yield of harm could be enhanced. Support for such an approach comes from RCRR studies that have shown adverse events are associated with longer LOS,3–5 though the direction of causality is unclear as a longer stay increases the risk of an error in care and subsequent harm.6 7 Similarly, a high rate of unplanned readmissions has been shown to be associated with harm having occurred.8–10
Our aim was to identify ways of using routine hospital data to improve the efficiency of retrospective reviews of case records for identifying those patients who suffered avoidable severe harm. In this paper we focus on exploring the potential use of two indirect measures (long lengths of stay and early unplanned readmissions) in patients with one of four tracer conditions (acute myocardial infarction (AMI), bowel cancer surgery (elective and emergency) and hip replacement). These were selected to represent elective, urgent and emergency admissions for medical and surgical reasons.
With lengths of stay we have evaluated different thresholds for defining a long stay; while with early readmissions we have assessed a range of criteria. To evaluate different thresholds we assessed how they are associated with the presence of direct indicators of harm coded within the electronic care records. We then used this to understand the resource implications for choosing different screening criteria. Our specific analyses, therefore, focused on:
The relationships between direct indicators of harm and:
Long lengths of stay.
The time between a patient's discharge from hospital and readmission as an emergency.
The length of a readmission spell.
How the presence of a direct indicator of harm affects the chances of a subsequent readmission.
Primary diagnoses on readmission that reflect potential harm.
For this analysis we used hospital administrative data as reported in Hospital Episode Statistics (HES) for the 2014–2015 financial year. Diagnosis and procedure codes are based on the International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10) and the Office of Population Censuses and Surveys (OPCS) Classification of Interventions and Procedures, version 4 (OPCS-4). There are 20 diagnostic fields per episode: a primary diagnosis and up to 19 secondary diagnoses.
Categorising direct indicators of harm
A set of ICD-10 codes was selected as direct indicators of harm. These included complications and adverse reactions (T80-88; Y40-84) plus others identified from the literature,11–13 or through consultation with clinicians and clinical coders and other sources of guidance.14 15 Direct indicators of harm were divided into eight groups: complications and adverse reactions; thromboembolism; pneumonia; pressure sores; urinary tract infections; falls; fractures; post-procedural complications. Further details on these definitions are shown in online supplementary appendix 1.
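As an illustration only, the first harm group (complications and adverse reactions, T80-88 and Y40-84) could be detected in an episode's diagnostic fields along the following lines. The function names and the example codes are hypothetical, and the other seven groups are defined in the supplementary appendix and are not reproduced here.

```python
import re

# Matches the three-character ICD-10 category, e.g. "T81" in "T81.4".
_ICD10_CATEGORY = re.compile(r"^([A-Z])(\d{2})")

def is_complication_code(code):
    """Return True if an ICD-10 code falls in T80-T88 or Y40-Y84."""
    match = _ICD10_CATEGORY.match(code.strip().upper())
    if not match:
        return False
    letter, number = match.group(1), int(match.group(2))
    return (letter == "T" and 80 <= number <= 88) or (
        letter == "Y" and 40 <= number <= 84
    )

def episode_has_complication(diagnoses):
    """Scan an episode's diagnostic fields (empty fields may be None)."""
    return any(is_complication_code(d) for d in diagnoses if d)

# Hypothetical episode: a primary diagnosis plus two secondary fields.
print(episode_has_complication(["I21.0", None, "Y83.1"]))
```

A real implementation would need one such rule per harm group and would have to follow the exact code lists in the appendix.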
Selection of patient cohorts
Given that the approach to using direct or indirect indicators might vary by the type of admission and condition, we selected four examples that represented both medical and surgical conditions and the urgency of the admission: (1) emergency: AMI; (2) urgent: non-elective bowel cancer surgery; (3) semiurgent: elective bowel cancer surgery; and (4) elective: hip replacement. A full list of ICD-10 and OPCS-4 codes for defining these cohorts is supplied in online supplementary appendix 2. Cohorts were restricted to adults (over 17 years) whose first admission in 2014–2015 to an acute NHS hospital in England was for the relevant condition or surgery and who were discharged alive. Day cases and regular day or night attenders were excluded.
Evaluating thresholds for unexpected long LOS
LOS was measured as the time between admission and discharge, ignoring transfers to other hospitals. The expected LOS of each patient was estimated using a linear regression model that controlled for age, sex, comorbidities, deprivation and emergency admissions in the previous 12 months. Comorbidities were measured using the Charlson Score and deprivation by quintiles of the Index of Multiple Deprivation (IMD). The expected LOS was estimated for each of the patient cohorts in 2013–2014 and the parameter estimates applied to the 2014–2015 population.
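The two-step procedure described above (fit on the earlier year, apply the parameter estimates to the later year) can be sketched with ordinary least squares. This is a minimal sketch under stated assumptions: the toy data are invented, only three of the covariates (age, sex, Charlson Score) are included, and the variable and function names are hypothetical rather than taken from the study.

```python
import numpy as np

def fit_expected_los(covariates, los):
    """Fit ordinary least squares; returns coefficients, intercept first."""
    design = np.column_stack([np.ones(len(covariates)), covariates])
    coef, *_ = np.linalg.lstsq(design, los, rcond=None)
    return coef

def predict_expected_los(coef, covariates):
    """Apply previously estimated coefficients to a new population."""
    design = np.column_stack([np.ones(len(covariates)), covariates])
    return design @ coef

# Invented 2013-2014 data: columns are age, sex (1 = male), Charlson Score.
X_2013 = np.array(
    [[60, 1, 0], [70, 0, 1], [80, 1, 2], [65, 0, 0], [75, 1, 3], [85, 0, 4]],
    dtype=float,
)
los_2013 = np.array([4.0, 6.0, 9.0, 4.5, 10.0, 14.0])

coef = fit_expected_los(X_2013, los_2013)

# Invented 2014-2015 patients: expected LOS comes from the earlier model.
X_2014 = np.array([[72, 1, 1], [68, 0, 0]], dtype=float)
expected_2014 = predict_expected_los(coef, X_2014)
print(expected_2014)
```

Estimating the parameters on an earlier year keeps the expected values independent of the harms observed in the year being screened.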
Thresholds for long lengths of stay were defined as multiples of the expected values, specifically two, three, four and five times. For each threshold, we investigated the association with the direct indicators of harm using a linear regression model adjusting for age, sex, Charlson Score, IMD and number of emergency admissions in the previous year. We then evaluated the impact of different thresholds on the number of patient records that each trust would have to review in order to find instances of harm (positive predictive value; PPV) and the proportion of patients with harms reported in the spell that would be selected (sensitivity).
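To make the two measures concrete, the sketch below flags patients whose observed LOS reaches a given multiple of the expected LOS and then computes the PPV and sensitivity against a harm-code flag. The data are invented, the function name is hypothetical, and treating "at least" the multiple as the cut-off is an assumption.

```python
def screen_by_los(observed, expected, harm, multiple):
    """Return (number flagged, PPV, sensitivity) for one LOS threshold.

    A patient is flagged when observed LOS >= multiple * expected LOS.
    """
    flagged = [o >= multiple * e for o, e in zip(observed, expected)]
    n_flagged = sum(flagged)
    n_harm = sum(harm)
    true_pos = sum(1 for f, h in zip(flagged, harm) if f and h)
    ppv = true_pos / n_flagged if n_flagged else float("nan")
    sensitivity = true_pos / n_harm if n_harm else float("nan")
    return n_flagged, ppv, sensitivity

# Invented cohort of eight patients (LOS in days, harm: 1 = harm code present).
observed = [2, 10, 30, 5, 12, 40, 3, 8]
expected = [4, 4, 6, 5, 5, 8, 3, 4]
harm = [0, 1, 1, 0, 0, 1, 0, 1]

for multiple in (2, 3, 4, 5):
    print(multiple, screen_by_los(observed, expected, harm, multiple))
```

In this toy cohort, raising the multiple from two to three reduces the number of records flagged from five to two, raises the PPV from 0.8 to 1.0 and halves the sensitivity, which is the trade-off examined in the analysis.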
Patients in each cohort were identified as having an unplanned readmission if their subsequent admission was an emergency and occurred either in 2014–2015 or 2015–2016. Screening criteria for readmissions were derived from combinations of:
Time to readmission.
Harm reported in the first episode of the readmission spell.
Harm reported in the initial admission (present either in second or subsequent episodes).
Primary diagnosis on readmission.
Length of the readmission spell.
The relevance of the primary diagnosis on readmission to harm having occurred in the previous admission was determined by expert clinical review. The reviewers judged whether the primary diagnosis codes used in the dataset could represent healthcare-related harm. This was done for each cohort to allow for differences in types of harm between the four cohorts (eg, conditions that are relevant for the hip replacement cohort may not be relevant for the AMI cohort).
As with lengths of stay, different options were evaluated in terms of proportions of case notes to be reviewed, and the sensitivity and positive predictive value associated with the occurrence of direct indicators of harm.
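One way to picture how such criteria combine is as a set of optional filters applied to a baseline of 28 day readmissions. The sketch below is illustrative only: the record fields, function names and toy cases are hypothetical, and the combinations shown are not the study's own scenarios.

```python
from dataclasses import dataclass

@dataclass
class Readmission:
    days_to_readmission: int
    readmission_los_days: int
    harm_code: bool         # direct indicator of harm present
    dx_harm_related: bool   # primary diagnosis judged potentially harm-related

def is_selected(r, max_days=28, require_dx=False, min_los_days=None):
    """Apply the chosen screening criteria to one readmission."""
    if r.days_to_readmission > max_days:
        return False
    if require_dx and not r.dx_harm_related:
        return False
    if min_los_days is not None and r.readmission_los_days <= min_los_days:
        return False
    return True

def evaluate(readmissions, **criteria):
    """Return (number selected, PPV, sensitivity) against harm codes."""
    chosen = [r for r in readmissions if is_selected(r, **criteria)]
    total_harm = sum(r.harm_code for r in readmissions)
    true_pos = sum(r.harm_code for r in chosen)
    ppv = true_pos / len(chosen) if chosen else float("nan")
    sensitivity = true_pos / total_harm if total_harm else float("nan")
    return len(chosen), ppv, sensitivity

# Invented baseline: six readmissions, all within 28 days.
cases = [
    Readmission(3, 5, True, True),
    Readmission(5, 1, False, True),
    Readmission(10, 4, True, False),
    Readmission(20, 0, False, False),
    Readmission(27, 7, True, True),
    Readmission(6, 6, True, True),
]

print(evaluate(cases))                                   # baseline
print(evaluate(cases, max_days=7))                       # early readmissions
print(evaluate(cases, require_dx=True, min_los_days=3))  # diagnosis and LOS
```

Each added restriction shrinks the number of records selected, so the proportion of notes to review and the sensitivity should be reported alongside one another.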
There were two patients on the steering group for this study. One was recruited through a local hospital patient reference group and the other was a patient advisor for a charity auditing the care of acutely ill patients, who was recruited through contact with the charity. Both had experience of family illness and in the case of one representative, a family member who had experienced a healthcare-related harm. The two patient representatives contributed, through discussion at meetings, to the design of the study and suggested a range of possible harms that the study could look at. They also provided helpful input as to how study results might be effectively communicated to wider audiences.
Mean age was similar across the patient cohorts (68–70 years) and each cohort had a similar distribution of socio-economic status (table 1). There were differences in sex: men made up 66% of the AMI cohort but only 40% of the elective hip replacements. There were also differences in comorbidity: 45% of emergency bowel cancer patients had Charlson scores of four or more compared with 5% of patients receiving elective hip replacement.
What is the relationship between long lengths of stay and direct indicators of harm?
The median LOS differed between cohorts (table 2) and the distribution was highly positively skewed. The prevalence of harm increased with LOS (figure 1). For example, 94% of emergency bowel surgery patients staying longer than 50 days had experienced harm compared with 16% in those who stayed 5–9 days. Linear regression analysis found that nearly all categories of harm were significantly positively associated with LOS: some exceptions being fractures in emergency bowel cancer patients and hospital-acquired infections in all bowel cancer patients (online supplementary appendix 3, table 3.1).
What are the resource implications for choosing different screening criteria derived from LOS?
The impacts of different LOS thresholds on the numbers of patient notes that would be selected and on the sensitivity of detecting harm are shown in table 3.
Of all the patients with a direct indicator of harm, the proportion included in these subgroups (the sensitivity) decreases as the threshold rises, from 56% to 15%. At the same time, the PPV (the proportion of cases selected at each threshold that actually have a harm code) increases. For example, for hip replacement the value rises from 10% among all patients to 51% for those staying three times longer than expected.
What is the relationship between the presence of a direct indicator of harm and time to emergency readmission?
Rates of readmission within 7 days for AMI and bowel surgery (6%–7%) are notably higher than for hip replacement (2.6%) (table 2). Approximately half of readmissions within 28 days occur within the first 7 days. More than half the surgical patients readmitted within 7 days had a direct indicator of harm compared with 21% in the AMI cohort (figure 2). With bowel surgery cases, these proportions decline as time to readmission increases but for hip replacement patients, the decline is only after 28 days and for AMI patients there is no association with time to readmission. This pattern among the surgical cohorts is specifically due to declines in ‘complications and adverse reactions’, which constitute approximately 65% of harm across these groups, in contrast to the AMI cohort where only 25% are ‘complications and adverse reactions’.
How does the presence of a direct indicator of harm affect the chances of a subsequent emergency readmission?
For individuals who have a direct indicator of harm reported in the initial spell, the 7 day readmission rates are higher than the overall 7 day rates for each cohort except those undergoing urgent bowel cancer surgery (table 4). After adjusting for age and likelihood of readmission (using Patients at Risk of Re-admission within 30 days Score16), the time to readmission was only related to the record of a direct indicator of harm in the initial spell for the AMI cohort (excluding cases where there was a direct indicator of harm reported in both the initial spell and the readmission).
How is the presence of a direct indicator of harm related to the length of a readmission spell?
The proportion being readmitted for more than 3 days varied by cohort: bowel surgery over 50%, AMI 44%, hip replacement 32%. The hip replacement cohort has more patients who stay for less than a day (32% compared with 13% to 17% for the other cohorts). Of these readmissions, 44% are for conditions reported as ‘other soft tissue disorders’ and they also include all patients with a primary diagnosis of ‘phlebitis and thrombophlebitis’ (including deep vein thrombosis). The latter represent cases where patients are discharged quickly after the readmission to manage the condition in the community. Direct indicators of harm are significantly more prevalent when the readmission LOS is longer than 3 days (table 5) (p<0.001 for each cohort).
In how many readmissions is a potential harm suggested by the primary diagnosis?
Just over half of readmissions within 7 days for the AMI cohort were admitted with a primary diagnosis that was judged by expert review to be potentially related to harm (table 6). This compares with much higher proportions among the other cohorts, with nearly 99% of the hip surgery cohort having a diagnosis that could be potentially related to harm (more details in online supplementary appendix 3, table 3.2). There were no significant differences between the proportions for readmissions within 7 days and those from 8 to 28 days among the surgical cohorts. However, there are significant reductions in proportions among elective bowel surgery readmissions that occur after 28 days (p<0.001). Among the AMI cohort, the proportion among the earlier readmissions (50.9%) is significantly higher than among the later readmissions (p<0.001).
What are the resource implications for choosing different screening criteria derived from emergency readmissions?
Choices of criteria against which to select case records for review will depend on a trade-off between numbers of cases selected and proportion of harm that is found. Table 7 shows the outcomes of different criteria using 28 day readmissions as a baseline against which to compare proportions of notes selected and sensitivities. With the hip replacement and elective bowel surgery cohorts, given the majority of the primary diagnoses on readmission are associated with harm having occurred (table 7), restricting selection to these primary diagnoses (scenario C) makes little difference. Further limiting selection to cases where readmission lengths of stay exceed 3 days will reduce the number of case records for review by 50% or more but will correspond to larger reductions in sensitivity (comparing scenario E with scenario C). Including any cases where direct indicators of harm are present, regardless of LOS and primary diagnosis, will increase the PPV at the expense of having a larger proportion of notes to review.
It is possible to derive criteria from hospital administrative data to select case records in order to find cases of severe hospital-related harm. Our findings suggest that adopting screening rules based on two indirect indicators (long lengths of stay and early readmission) has the potential to improve the targeting of case record reviews. The precise scale of any improvements is unclear until selection criteria have been tested against the outcomes of such reviews.
The selection of LOS thresholds for screening could have a significant impact on the yield of cases of harm. For example, over half those who stayed at least three times longer than expected had a direct indicator of harm. The PPV of the screen increases across the thresholds, such that the number of cases identified as having a direct indicator of harm as a proportion of all cases examined increases. By manipulating LOS threshold, choices can be made in relation to the trade-off between the number of cases that will actually have a harm code present at that threshold and the proportion of all the harm that will be found if only those cases are investigated.
Selection of the time to readmission has an effect on the yield of potential cases of harm but it varies by condition. At least 50% of the surgical patients had a direct indicator of harm if readmitted within 7 days, compared with 21% in the AMI cohort. With bowel surgery cases, these proportions decline as time to readmission increases but for hip replacement patients, the decline is only after 28 days and for patients with AMI there is no association with time to readmission. This suggests that the sampling window for the latter two conditions could be extended to 28 days without significant impact, with the added benefit of increasing the number of patients in the hip replacement cohort where there are relatively few readmissions. The lack of a relationship between a harm code in the initial admission and time to readmission suggests that the occurrence of harms in the initial episode may not be useful as a criterion for selecting case records, except, perhaps, for patients with AMI. However, because we were not able to identify individuals who had died outside hospital soon after the initial spell, this analysis may underestimate subsequent outcomes after discharge following a harm.
For primary diagnoses on readmission deemed to be potentially associated with harm, there were higher frequencies among the surgical cohorts with between 80% and 100% of readmission primary diagnoses identified by clinical reviewers being potentially associated with harm. For the AMI cohort, the corresponding proportion was around 50%. This suggests that the nature of the primary readmission diagnosis can be useful as a further criterion for selecting case records and this approach would have the greatest impact on the AMI cohort.
Our assessments of thresholds used PPVs and sensitivity as we were interested in the value of case note review in revealing a harm and an indication of how effective they are at detecting all harms that may have occurred. We could also have used specificity and negative predictive values, but considered them less useful in this context.
Previous RCRR studies estimated that the proportion of inpatients with an adverse event ranged from 3.8% to 16.6%.17 Across the four cohorts, we found higher proportions of harm codes. However, the conditions we studied were chosen to highlight different admission types and were not representative of all conditions. Similar rates of harm in bowel cancer patients (between 20% and 40%) have been found in previous studies.18 A recent Dutch study found a higher proportion of harm in patients admitted with AMI, between 13.3% and 29.9%.19 The harm in this study was found using an audit tool to screen electronic patient records which could account for a greater proportion of harm being uncovered. National clinical audits suggest that the rate of complications after percutaneous coronary interventions is around 9% in England,20 and after total hip replacement are about 1% for infection and venous thromboembolism and 3%–4% for dislocation.21 This is consistent with the lower incidence of harm that we found in this group.
Our study is the first to look at the relationships between different LOS thresholds and a variety of readmission characteristics and coded harm in hospital administrative records in the UK. The Dutch have used a threshold-based indicator, unexpectedly long LOS (UL-LOS), defined as the percentage of clinically admitted patients with an actual hospital stay that is more than 50% longer than expected, as a generic indicator of hospital safety for a number of years.22 Cihangir et al 23 found a significant positive correlation between UL-LOS and another indicator of potentially poor quality care, the hospital standardised mortality ratio (r=0.44, p<0.001) in hospital administrative data from two-thirds of Dutch hospitals. In a small, single site validation study the authors found that in 85 out of 191 patients with colorectal cancer with UL-LOS, 43 (51%) had one or more adverse events, compared with 9% (4 out of 44) in the non-unexpected long LOS group.24
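Taking the UL-LOS definition quoted above at face value (a stay more than 50% longer than expected), the indicator could be computed as in the sketch below. The function name and the toy values are hypothetical, and the Dutch indicator's own method of estimating expected LOS is not reproduced here.

```python
def ul_los_percentage(observed, expected):
    """Percentage of patients whose stay is more than 50% longer than expected."""
    if not observed:
        raise ValueError("at least one patient is required")
    flagged = sum(1 for o, e in zip(observed, expected) if o > 1.5 * e)
    return 100.0 * flagged / len(observed)

# Invented example: only the second patient exceeds 1.5 times the expected LOS.
print(ul_los_percentage([3, 10, 6, 9], [4, 4, 4, 6]))
```

This corresponds to a multiple of 1.5, which is lower than any of the thresholds (two to five times expected LOS) examined in our analysis.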
There are a number of limitations to be considered when interpreting our findings. First, our estimates of harm, based on our inclusive approach, are inflated by an element of double counting when the same harms are coded under more than one of the harm categories. This inflation is compounded by the fact that we inevitably include a number of conditions that were present on admission without the routine application of ’present on admission’ codes. Harm codes such as pneumonia or urinary tract infection, which occur more commonly in emergency medical patients, are also more difficult to attribute to healthcare-related processes. It is also known that hospitals vary in the way they use complication and adverse reaction codes which has the potential to introduce further bias in measurement.1
Second, one of the main limitations of developing screening approaches is the accuracy and completeness of the routine data.1 25 This limitation is particularly important to consider in this study which used harm codes for the internal validation of indirect indicators of harm derived from the same routine hospital administrative data source. As our approach to harm code definitions was inclusive in an attempt to increase sensitivity, it is likely that our estimates are inflated which may have biased assessment of the performance of indirect indicators as screening tools.
Third, to assess the feasibility of using indirect indicators to detect cases of harm, we have relied on direct indicators. However, we cannot know the relationship with other types of harm that are not so easily identifiable within routine data, and whether approaches that would work for the detection of harm codes we identified would work more generally. Our analytic approach can indicate that a patient may have experienced harm and the patterns of harm among groups of patients with differing conditions and across an organisation but it cannot confirm if that harm was healthcare-related, its severity or its avoidability without recourse to case record review.
Finally, not all long lengths of stay reflect a patient’s acute care needs as there may be several days when a patient is awaiting discharge. However, it has not been possible to distinguish these from the data.
Screening for harm using routine data allows large numbers of records to be rapidly processed with minimal resources required. The Global Trigger Tool has also been developed as a harm screening tool.26 However, to use this tool to identify triggers linked to harm, case records need to be individually screened, which is usually done manually, creating a more resource-intensive process. Our approach to screening using long lengths of stay or early readmissions would identify a substantial number of patients for case record review if extended across all patients. In 2014–2015, there were approximately 16 million admissions to all NHS acute trusts in England,27 yet the cohorts we included in this analysis comprised less than 1%. If a threshold of twice the expected LOS was used for screening, we estimate this would have resulted in 9974 case record reviews in 2014–2015 across the four cohorts, which equates to about 70 per trust. If we assume a similar LOS distribution across all hospital admissions, this scales up to around 7000 reviews per year in an average sized trust. Furthermore, having increased the sensitivity of the screening process at the expense of specificity, many of these cases would contain no evidence of healthcare-related harm.
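The scale-up can be followed step by step. The sketch below uses only the figures quoted above; treating the cohorts as exactly 1% of admissions is an assumption (the text says less than 1%), and the number of trusts is implied by the quoted figures rather than stated.

```python
reviews_four_cohorts = 9974   # reviews at twice the expected LOS, four cohorts
reviews_per_trust = 70        # approximate figure quoted per trust

# Number of trusts implied by the two figures above (not stated directly).
implied_trusts = round(reviews_four_cohorts / reviews_per_trust)

# Assumption: the four cohorts make up 1% of all admissions (an upper bound),
# and LOS is distributed similarly across all admissions.
cohort_share_percent = 1
scaled_reviews_per_trust = reviews_per_trust * 100 // cohort_share_percent

print(implied_trusts, scaled_reviews_per_trust)
```

Because the cohorts comprised less than 1% of admissions, the figure of around 7000 reviews per trust is best read as an order-of-magnitude estimate.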
In practice, therefore, this suggests that one approach might be for Trusts to decide how many reviews they are going to do in advance and then select random samples of cases that satisfy the screening criteria. Consideration of how to adapt or create algorithms applicable to wider patient populations would also be required.
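A fixed review budget combined with random sampling could be implemented as simply as the following sketch. The function name, the record identifiers and the use of a seed for reproducibility are illustrative assumptions rather than part of the study.

```python
import random

def sample_for_review(eligible_ids, budget, seed=None):
    """Randomly choose up to `budget` records from those passing the screen."""
    eligible = list(eligible_ids)
    if budget >= len(eligible):
        return eligible  # fewer eligible records than the budget: review all
    rng = random.Random(seed)
    return rng.sample(eligible, budget)

# Hypothetical trust: 500 records satisfy the screening criteria and the
# trust has decided in advance that it can review 40 of them.
eligible_ids = [f"case-{i:04d}" for i in range(500)]
to_review = sample_for_review(eligible_ids, budget=40, seed=2015)
print(len(to_review))
```

Sampling at random within the screened group keeps the workload fixed while still allowing the harm rate among screened cases to be estimated from the reviewed sample.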
Contributors HH, NB, CSJ and JVM designed the study. CSJ, NCO and KC carried out the analyses. NB, CSJ and HH wrote the first draft of the manuscript. All authors provided input and approved the final version for submission.
Funding This work was supported by Department of Health Policy Research Programme (PR-R9-0114-14001).
Disclaimer The views expressed in this publication are those of the authors and not necessarily those of the Department of Health. The funders of the study had no role in the study design; data collection, analysis, and interpretation; or composition of the report.
Competing interests None declared.
Ethics approval This study was approved by North West - Lancaster Research Ethics Committee (15/NW/0941). As data were pseudoanonymised, individual patient consent was not required. Hospital Episode Statistics data ©2017, re-used with the permission of the Health & Social Care Information Centre. All rights reserved.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Further details on statistical models and definitions are available from the Nuffield Trust at firstname.lastname@example.org.
Patient consent for publication Not required.