Objectives Some medical patients are at greater risk of adverse outcomes than others and may benefit from higher observation hospital units. We constructed and validated a model predicting adverse hospital outcome for patients. Study results may be used to admit patients into planned tiered care units. Adverse outcome comprised death or cardiac arrest during the first 30 days of hospitalisation, or transfer to intensive care within the first 48 h of admission.
Setting The study took place at two tertiary teaching hospitals and two community hospitals in Winnipeg, Manitoba, Canada.
Participants We analysed data from 4883 consecutive admissions at a tertiary teaching hospital to construct the Early Prediction of Adverse Hospital Outcome for Medical Patients (ALERT) model using logistic regression. Robustness of the model was assessed through validation performed across four hospitals over two time periods, including 65 640 consecutive admissions.
Outcome Receiver-operating characteristic curves (ROC) and sensitivity and specificity analyses were used to assess the usefulness of the model.
Results 9.3% of admitted patients experienced adverse outcomes. The final model included gender, age, Charlson Comorbidity Index, Activities of Daily Living Score, Glasgow Coma Score, systolic blood pressure, respiratory rate, heart rate and white cell count. The model was discriminative (ROC=0.83) in predicting adverse outcome. ALERT accurately predicted 75% of the adverse outcomes (sensitivity) and 75% of the non-adverse outcomes (specificity). Applying the same model to each validation hospital and time period produced similar accuracy and discrimination to that in the development hospital.
Conclusions Used during initial assessment of patients admitted to general medical wards, the ALERT scale may complement other assessment measures to better screen patients. Those considered as higher risk by the ALERT scale may then be provided more effective care from action such as planned tiered care units.
- adverse outcome
- hospital ward
- resource allocation
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
Multiple hospital sites with diverse populations and two several-year time periods were assessed to test the reliability and stability over time of the Adverse Hospital Outcome for Medical Patients (ALERT) scale predictor of hospital outcomes.
ALERT scale elements are already routinely collected or easily available on admission at all hospitals.
ALERT scale results may be used to assign patients to different levels of observation, depending on likelihood of an adverse event such as cardiac arrest, death or transfer to intensive care unit.
ALERT scale predictor cut-off may be adjusted by individual hospitals in order to match the level and distribution of local resources.
A limitation is the need for an equation to generate the probability of an adverse event; the equation is available in the manuscript, and a website is provided to ease ALERT scale use (http://www.alertassessmenttool.ca).
In recent years, a number of attempts have been made to reduce preventable morbidity and mortality in hospitalised patients. Unfortunately, initiatives such as the medical emergency team (MET) concept and the Modified Early Warning Score1 ,2 have not consistently led to decreases in hospital mortality, cardiac arrest and transfer to the intensive care unit (ICU).3–6 Litvak et al7 have postulated that necessity for MET intervention implies either inadequate care or inappropriate triage to an inadequate care environment, suggesting that another approach is needed.
In most hospitals, patients requiring admission are triaged either to the ICU or a general ward. Mortality predictors such as Acute Physiology and Chronic Health Evaluation (APACHE) II and Simplified Acute Physiology Score (SAPS) II, have been used to stratify patients following ICU admission.8 ,9 Risk stratification tools for non-surgical patients have been proposed based on laboratory data and some physiological measurements from retrospective administrative databases.10 ,11 The addition of the Charlson Comorbidity Index to the Rapid Emergency Medicine score was shown to influence long term but not 3-day or 7-day mortality.12 These scales may be limited because they do not include measures of functional status (ie, Activities of Daily Living (ADL)) which may influence morbidity and mortality.
Early stratification of patients admitted to non-ICU environments for risk of subsequent physiological collapse, cardiac arrest and death would help determine the optimal care environment to which the patient should initially be admitted.10 ,11 We propose an alternative approach, the Early Prediction of Adverse Hospital Outcome for Medical Patients (ALERT) scale, which uses demographic and routine physiological data together with easily obtainable measures of comorbidity and physical disability, all available at or near the time of the decision to admit. The primary advantage of our model is that the data it requires are likely to be already collected on admission in most hospitals, or can easily be added to the data collected on admission. We hypothesised that we would find a set of indicators that could guide the decision about which ward a newly admitted patient should be assigned to.
The model development population comprised 4883 patients consecutively admitted to a tertiary hospital (TH1) in Winnipeg, Canada from 1 October 2003 to 30 September 2005. Admissions originated in emergency departments (79%), ICU (8%) or from other sources including rural hospitals, nursing stations and other hospital services (13%). Validation was performed using data collected from 65 640 admissions in four sites and two time periods. The validation sites included two community hospitals (CH1 and CH2), a second tertiary hospital (TH2) and the same hospital in which the model was developed (TH1B). The validation time periods were October 2005 to December 2008 and January 2009 to December 2012. To assess the generalisability of results to populations with varied characteristics, we validated the model for each of the four hospitals and two time periods.
We developed a model predictive of adverse outcome defined as any one of the following:
Death during hospitalisation within 30 days of admission.
Cardiopulmonary resuscitation (CPR) during hospitalisation attempted within 30 days of admission.
Transfer to an ICU within 48 h of admission.
The ALERT scale is a predicted probability of adverse outcome, ranging between 0 and 1.
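The composite outcome defined above can be expressed as a simple flag. The following is a minimal sketch only: the `Admission` record, its field names and the inclusive treatment of the 30-day and 48-hour boundaries are assumptions made for illustration and are not taken from the study database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Admission:
    """Hypothetical record; all times are measured from hospital admission."""
    days_to_death: Optional[float] = None          # None: did not die in hospital
    days_to_cpr: Optional[float] = None            # None: CPR never attempted
    hours_to_icu_transfer: Optional[float] = None  # None: never transferred to ICU


def adverse_outcome(a: Admission) -> bool:
    """True if any one of the three component events occurred in its window."""
    died = a.days_to_death is not None and a.days_to_death <= 30
    cpr = a.days_to_cpr is not None and a.days_to_cpr <= 30
    icu = a.hours_to_icu_transfer is not None and a.hours_to_icu_transfer <= 48
    return died or cpr or icu
```

Because the components are combined with a logical "or", a patient who has CPR and then dies is counted once, not twice.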
In 2004, the University of Manitoba Department of Internal Medicine funded and implemented a permanent, comprehensive administrative and research database for its 13 general medical ward services in four Winnipeg hospitals. Data collection for this database is performed by ICU trained data collection nurses through concurrent chart review. Collector training, data management, and collection and verification processes had already been developed through the establishment of a long-standing regional ICU database. A primary objective of this initiative was to develop a model-based outcome stratification tool for medical patients. All data used in development and testing of the ALERT scale was already being routinely collected on hospital admission before the administrative and research database was developed, but previous to the database, the data was not all recorded electronically.
Data elements for the model were selected by consensus in a series of investigator meetings. The final data set included patient demographics and all the elements of APACHE II, SAPS II, Charlson Comorbidity Index (CCI) and the Activities of Daily Living (ADL) Score.8 ,9 ,13 ,14 The objective of ALERT is to discriminate patients for level of care using variables routinely available at the time of admission.
The ADL Scoring system was modified to increase discrimination by adding an intermediate category. We assigned zero points to independent performance of individual activities, three points for an activity performed by the patient but requiring physical assistance and six points when the patient was unable to assist with the activity (see online supplementary appendix 1).
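The three-level scoring described above might be sketched as follows. The point values (0, 3 and 6) come from the text; the activity names in the usage example are placeholders, and summing the points across activities is an assumption, since the full scoring sheet is in the online supplementary appendix.

```python
from typing import Mapping

# Points per activity, as described in the text.
ADL_POINTS = {
    "independent": 0,  # patient performs the activity independently
    "assisted": 3,     # patient performs it but needs physical assistance
    "unable": 6,       # patient is unable to assist with the activity
}


def adl_score(ratings: Mapping[str, str]) -> int:
    """Sum the points over all rated activities (summation is an assumption)."""
    return sum(ADL_POINTS[level] for level in ratings.values())


# Hypothetical activities, for illustration only.
example = {"bathing": "independent", "dressing": "assisted", "feeding": "unable"}
print(adl_score(example))
```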
Patients admitted for palliative care or those with an advance directive precluding CPR or ICU admission were excluded. Patients transferred to non-study hospitals or units within 30 days where outcome data could not be obtained were excluded.
To develop the model, logistic regression was performed with adverse outcome as the dependent variable. Generalised estimating equations (GEE) were used to account for more than one admission during the study period by some individuals. After assessing clinical relevance, the independent variables were selected via a stepwise procedure. Both forward and backward selection was used. To assess potential non-linear relationships between independent variables and the probability of an adverse outcome, quadratic terms of all continuous variables were considered.
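For readers less familiar with this model form, a logistic regression with quadratic terms can be written generically as below. The notation is ours, introduced for illustration: $p_i$ is the predicted probability of adverse outcome for admission $i$, $x_{ij}$ are the candidate predictors, and $Q$ is the subset of continuous predictors for which a quadratic term is retained. The fitted coefficients are those reported in table 2 and are not reproduced here.

```latex
\[
\ln\frac{p_i}{1-p_i}
  = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \sum_{j \in Q} \gamma_j x_{ij}^{2},
\qquad
p_i = \frac{1}{1 + \exp\!\Bigl(-\bigl(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}
      + \sum_{j \in Q} \gamma_j x_{ij}^{2}\bigr)\Bigr)}
\]
```

A non-zero $\gamma_j$ allows the log-odds of adverse outcome to curve with $x_j$ instead of rising or falling in a straight line.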
To classify patients in the higher or lower-risk of adverse outcome groups, an optimal cut-off point in terms of predicted probability of an adverse outcome was chosen. As an improvement in sensitivity necessarily means decline in specificity, and vice versa, we chose the cut-off which ensured equal sensitivity and specificity in the development population. The same cut-off point was then applied to the eight validation populations—four hospitals and two validation time periods. The impact of using different cut-offs from that which equated sensitivity to specificity was assessed descriptively.
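One way to locate a cut-off that equates sensitivity and specificity is to scan the observed predicted probabilities and keep the one with the smallest gap between the two measures. This is a sketch under stated assumptions, not the procedure actually used in the study: the function names are ours, and classifying a patient as high risk when the predicted probability is greater than or equal to the cut-off is an assumed convention.

```python
from typing import Sequence, Tuple


def sens_spec(probs: Sequence[float], outcomes: Sequence[int],
              cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when 'high risk' means prob >= cutoff."""
    tp = fn = tn = fp = 0
    for p, y in zip(probs, outcomes):
        high = p >= cutoff
        if y and high:
            tp += 1
        elif y:
            fn += 1
        elif high:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def balanced_cutoff(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Observed probability at which |sensitivity - specificity| is smallest."""
    def gap(c: float) -> float:
        sens, spec = sens_spec(probs, outcomes, c)
        return abs(sens - spec)
    return min(sorted(set(probs)), key=gap)
```

The cut-off found this way in the development data would then be applied unchanged to each validation population.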
Discriminatory power, the ability to distinguish between good and adverse outcomes, was determined by the area under the receiver-operating characteristic (ROC) curve15 and by sensitivity and specificity analyses. The ROC is a plot of true-positive rate (sensitivity) versus false-positive rate (1-specificity), as its discrimination threshold is varied. In our context, it is a plot of the proportion of those that had an adverse outcome that were predicted to have one, versus the proportion that did not have an adverse outcome who were misclassified and predicted to have had one. Discrimination threshold is the cut-off probability used for classifying patients.
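The area under the ROC curve equals the probability that a randomly chosen patient with an adverse outcome has a higher predicted probability than a randomly chosen patient without one, with ties counted as one half. The sketch below computes it directly from that pairwise definition; it is illustrative only (quadratic in the number of patients) and the function name is ours.

```python
from typing import Sequence


def roc_auc(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    pos = [p for p, y in zip(probs, outcomes) if y]
    neg = [p for p, y in zip(probs, outcomes) if not y]
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0   # case ranked above non-case
            elif pp == pn:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups.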
The calibration of the model was evaluated using admissions from each of the four hospital study sites and two time periods, using the Hosmer-Lemeshow goodness-of-fit test. The Hosmer-Lemeshow goodness-of-fit statistic (C^) approximates a χ2 distribution. Larger C^ values and smaller p values indicate significant differences between predicted and actual adverse outcomes, and thus a lack of fit of the model.
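A minimal sketch of the Hosmer-Lemeshow statistic follows. It assumes admissions are sorted by predicted probability and split into equal-sized risk groups (ten by default); the number of groups and the handling of ties used in the study are not stated, so these are assumptions, and the function name is ours.

```python
from typing import Sequence


def hosmer_lemeshow(probs: Sequence[float], outcomes: Sequence[int],
                    groups: int = 10) -> float:
    """Sum over risk groups of (observed - expected)^2 / (n * pbar * (1 - pbar))."""
    pairs = sorted(zip(probs, outcomes), key=lambda t: t[0])  # order by risk only
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of adverse outcomes
        observed = sum(y for _, y in chunk)   # observed number of adverse outcomes
        mean_p = expected / size
        denom = size * mean_p * (1.0 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat
```

When observed and expected counts agree in every group the statistic is zero; it grows as predicted and actual event counts diverge.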
Characteristics of the development and validation populations were compared using Analysis of Variance (ANOVA) and χ2 tests. The association between each of the top 30 primary admission diagnoses and subsequent adverse outcome was assessed using Pearson χ2.
During the study period 75 189 admissions occurred. Of these, 4666 (6.2%) were excluded for one or more of the reasons provided in our exclusion criteria. For the remaining 70 523 admissions all measurements were obtained in 99.7% of the entries. The median duration of admission was 7.4 days. The four validation populations and two validation time periods are compared to the development population in table 1, demonstrating differences in age, burden of chronic illness, physical disability and physiological variables.
Charlson Comorbidity Score, ADL Score, Glasgow Coma Score, age, heart rate, respiratory rate, systolic blood pressure and white cell count were predictive of adverse outcome. Gender was forced in the model because of its known association with survival. The final model is displayed in table 2. Two variables, age and systolic blood pressure, had significantly non-linear associations with adverse outcome.
Our model results in the development population are described in table 3A, indicating good calibration (C^=9.91) and good discrimination (ROC=0.831). Application of the model to the four validation populations in the 2005–2008 validation period produced ROC of 0.78 (TH2), 0.80 (CH3), 0.79 (CH4) and 0.81 (TH1B), indicating very similar discrimination to that obtained in the model development population. Similarly, in the 2009–2012 validation period, we found ROC of 0.77 (TH2), 0.75 (CH3), 0.80 (CH4) and 0.79 (TH1B). Figure 1 displays the ROC curves for the development population and each of the four validation sites and two validation periods.
Although the discrimination of the model was similar across the four validation sites and two validation periods, calibration of the model declined with time (data not shown in tables). During the 2005–2008 validation period, the Hosmer-Lemeshow goodness-of-fit statistics (C^) were: 81.99 (p<0.01) in TH2, 13.34 (p=0.22) in CH3, 17.62 (p=0.06) in CH4 and 20.66 (p=0.03) in TH1B. During the 2009–2012 validation period, the corresponding C^-statistics were: 92.31 (p≤0.01) in TH2, 55.93 (p≤0.01) in CH3, 67.34 (p≤0.01) in CH4 and 55.81 (p≤0.01) in TH1B. The lower the C^-statistic and the higher the p value, the better the model fits the data.
A cut-off point of 0.088, or a predicted probability of adverse outcome of 8.8%, was used to classify patients into two groups; high-risk and low-risk of adverse outcome. Using this cut-off for each validation population resulted in sensitivity in 2005–2008 ranging from 62.5% to 76%, and specificity ranging from 66.2% to 77.1% (table 4). Corresponding sensitivity and specificity in each validation population in 2009–2012 ranged from 58.3% to 81.2% (sensitivity), and from 62.6% to 82.9% (specificity). These are very near the sensitivity and specificity of 75.1% and 75.2% found in the model development population (table 4).
Cut-offs other than 8.8% were examined. Adverse outcome was experienced by 454 (9.3%) of the patients in the model development population (bottom of column 3, table 3B). Consequently, the majority had a predicted probability of adverse outcome of less than 10%. Using a cut-off of 8% would result in 32% of all patients being classified as high risk (column 8, table 3B) and would provide 77.8% sensitivity (column 5, table 3B) and 72.7% specificity (column 6, table 3B). In other words, 77.8% of all adverse outcomes would be correctly predicted. However, the hospital may be required to provide additional monitoring to many patients who go on to have no adverse outcome: with this cut-off, just 22.6% of those predicted to be at high risk had an adverse outcome (column 7, table 3B).
An alternative strategy may be to increase specificity at the expense of sensitivity. If an extreme cut-off such as 40% were used, then just 4.3% of all admissions would be classified as high risk (column 8, table 3B). Over half (52.9%) of those classified as high risk would have an adverse outcome (column 7, table 3B) and 97.8% of those without an adverse outcome would be correctly classified as low risk (specificity, column 6, table 3B). This could save beds in a more intensely monitored hospital unit, but at the cost that only 24.2% of those with an adverse outcome would be correctly classified (sensitivity, column 5, table 3B).
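The trade-off between the two cut-offs can be checked with simple arithmetic. Given the prevalence of adverse outcome (9.3% in the development population) and the sensitivity and specificity at a cut-off, the proportion classified as high risk and the proportion of those who actually have an adverse outcome follow directly. The sketch below, with a function name of our choosing, reproduces the figures quoted for the 8% cut-off and, to within rounding of the published inputs, those for the 40% cut-off.

```python
from typing import Tuple


def flagged_and_ppv(prevalence: float, sensitivity: float,
                    specificity: float) -> Tuple[float, float]:
    """Return (share of all patients flagged high risk, share of flagged with event)."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    flagged = true_pos + false_pos
    return flagged, true_pos / flagged


# 8% cut-off: sensitivity 77.8%, specificity 72.7%
print(flagged_and_ppv(0.093, 0.778, 0.727))
# 40% cut-off: sensitivity 24.2%, specificity 97.8%
print(flagged_and_ppv(0.093, 0.242, 0.978))
```

A hospital choosing its own cut-off could use the same calculation to estimate how many monitored beds a given threshold would require.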
Incidence of one or more adverse outcomes among the eight validation populations ranged from 6.3% to 10.6% and included death within 30 days (5.1–9.8%), CPR within 30 days (0.5–1.8%) and ICU admission within 48 h (1–1.9%). These events were not mutually exclusive. For example, most patients who suffered cardiopulmonary arrest died.
Among the 30 diagnoses most responsible for admission, four may be associated with an increased likelihood of adverse outcome (table 5). Among all admissions, 9% experienced one or more adverse outcomes. However, 12.6% (688 out of 5452) of those admitted with pneumonia, 13.9% with cerebrovascular accident, 14.9% with pleural effusions and 28.9% with primary lung malignancy experienced adverse outcomes. With the exception of primary lung malignancy, however, adverse outcomes never occurred more than twice as frequently among those admitted with a specific diagnosis as among all admitted patients.
Given the frequency of undesired and often catastrophic outcomes on hospital wards, it is reasonable to question whether the available range of monitoring and surveillance modalities is appropriate for all patients. Our study also demonstrated that a substantial proportion of patients are at low risk for such events. Initial identification of substantial proportions of the patient population at either extreme of the risk spectrum could lead to more effective and efficient allocation of nursing staff and monitoring modalities that would match resources to patient need in tiered care units.
A useful predictor of adverse hospital outcome should demonstrate good discrimination for heterogeneous patient populations in tertiary and community hospital settings. It should be independent of primary diagnosis due to the wide variety of possible diagnoses and often the difficulty at the time of admission to make an accurate diagnosis. As well, it should be simply derived from data routinely obtained at the time of initial evaluation. The ALERT scale model meets these requirements more clearly than those previously reported.10 ,11
We overcame several limitations of previous models by including admissions from multiple sources, thereby expanding the population to which the model may be generalised. Unlike Prytherch et al10 and Olsson et al,11 who used admissions from one emergency department, our validation procedure used admissions from four tertiary and community hospitals and two time periods, representing a broader spectrum of hospitalised medical patients. Our results demonstrate a better calibration than that reported for the Rapid Emergency Medical Score,11 and better discrimination than those previously described.10 ,11
Including degree of comorbidity (CCI) and chronic disability (ADL Score), in addition to acute physiological data, may account for the increased robustness of our model. A report by Olsson suggested that combining the CCI with the Rapid Emergency Medical Score added to its prognostic value12 for long-term outcomes. ALERT is the first to incorporate a functional measure, the ADL score, which is associated with mortality, into the predictive equation of short-term adverse hospital outcomes.
The ALERT scale is a better scoring system than those identified in previous studies because it includes easily collectable data at the time of admission, does not require diagnosis, which is often difficult to correctly identify at the time of admission and is the only one that includes a measure of function, the ADL scale, which is independently associated with outcome. It could therefore be used as a practical decision aid in determining admission disposition. Further, the ALERT scale proved to be a good predictor of outcome over time, an 8-year period, as well as in teaching and community hospital settings. To the best of our knowledge this has not been replicated by other proposed models. The ALERT validation patient population was also larger than that used in other studies. In addition, the ALERT scale allows the user to set their own cut-off point based on local conditions to determine what level of risk should allow admission to higher and lower monitoring inpatient setting. This will allow users to better utilise their resources. This flexibility is not something that developers of the other scales have suggested. Finally, our discrimination met or exceeded that found in other models. For example, while our c-statistic (ROC) was 0.83 in the model development site, and ranged from 0.75 to 0.81 in each of the eight validation periods/sites, the c-statistic in the three validation sites for the model described by Prytherch et al10 was 0.78, 0.76 and 0.75. The c-statistic in the model described by Olsson et al11 was 0.85, similar to ours, but their Hosmer-Lemeshow Goodness-of-Fit χ2 (calibration) was 62, much higher than the goodness-of-fit χ2 that we found in our development site and higher than the goodness-of-fit χ2 in most of our validation sites. High goodness-of-fit χ2 indicate a lack of fit of the model to data.
Arterial oxygen saturation is not routinely collected on non-ICU admissions and was only available on 5.3% of admissions. Its impact on the final model was not statistically significant (p=0.246) when we ran this model on the subset of admissions in which oxygen was recorded. For these reasons it was not included in the model development.
Although adverse outcome frequency is higher among those admitted for some primary diagnoses than others, our model provided good discrimination and calibration without using diagnosis as an input variable. This should not be surprising since the observation period was limited to 30 days, during which time factors that were included in model development such as physiological instability, comorbidity, physical disability and the level of consciousness may be more important determinants of short-term outcome. ALERT follows the precedent set by the SAPS II mortality predictor for ICU patients, which does not include admitting diagnosis.
In all nine study populations, the model development site and two validation periods times four validation sites, the 25% of patients included in the highest predicted risk bands accounted for more than two-thirds of actual adverse outcomes. Conversely, 33–47% of the patients were model-estimated to belong to the lowest risk band in each of the nine study populations. These observations are similar to those reported by Prytherch et al10 and raise interesting alternatives. First, initial identification of the quartile accounting for the majority of potentially avoidable deaths would be prerequisite to developing focused preventative strategies. Second, these observations raise the possibility that some of the resources expended in the routine surveillance of low risk patients could be redirected to those in the highest risk quartile. Our findings and those of Olsson et al11 challenge hospital planners and designers to consider reconfiguring the traditional ward environment in order to allow better matching of resources to individual patient requirements as determined by early risk assessment.
A limitation of our model is the need to use an equation to generate a probability from 0 to 1. Although a variety of probability cut-points can be used to stratify patients, in our institution, we have elected to use <0.04 as low, 0.04 to <0.088 as medium, 0.088 to <0.20 as high and 0.20 or higher as very high risk of adverse outcome. To facilitate calculation, both an Excel file for downloading and a website are available at http://www.alertassessmenttool.ca.
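The four risk bands used in our institution map onto a simple lookup. The sketch below takes an already computed ALERT probability as input; the function name is ours, and other hospitals would substitute their own cut-points.

```python
def risk_band(p: float) -> str:
    """Map an ALERT predicted probability (0 to 1) to the local risk band."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("predicted probability must lie between 0 and 1")
    if p < 0.04:
        return "low"
    if p < 0.088:
        return "medium"
    if p < 0.20:
        return "high"
    return "very high"


print(risk_band(0.05))
```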
Whether calibration or discrimination should be prioritised when evaluating the performance of a model is an ongoing debate.16–18 Calibration of our model declined with time. Indeed, as medical technology improves, one would expect that the risk of an adverse event would decline across all patients. Thus, one would expect that the calibration of a model, which classifies individuals into specific risk bands, would decline. Since the model discrimination remained stable over time, however, we feel that our model would be a valuable tool for many years. This is because, while the risk of adverse event among the entire patient population (hopefully) declines over time in most hospitals, we found that our model consistently (across time and site) distinguished the relative risk of adverse event across patients with different characteristics and medical histories.
Our investigation produced a practical early risk stratification tool for hospitalised medical patients. Whether or not early and focused attention to the high-risk quartile can reduce adverse events remains to be determined; however, early identification of these patients creates the possibility of resolving the question.
Contributors Primary analysis was conducted by JM and LAS. DR and WP designed the study. JM and LAS conducted statistical analyses. PO, MP, CM and AK contributed additional expertise, over and above that of DR and WP, in determining which comorbidity scores to include in model development, taking into account both ease of data collection and clinical relevance. DR drafted the first manuscript version, while DR, AK and LAS drafted the final manuscript. All coauthors contributed suggestions to each manuscript version.
Funding This study made use of already available data and no external funding was used in this study.
Competing interests None.
Ethics approval Approval was obtained for publication of aggregate data from the regional medicine database, from the University of Manitoba, Faculty of Medicine Ethics Committee.
Provenance and peer review Not commissioned; internally peer reviewed.
Data sharing statement No additional data are available.