Elsevier

Resuscitation

Volume 90, May 2015, Pages 1-6

Rapid Response Systems
Are observation selection methods important when comparing early warning score performance?

https://doi.org/10.1016/j.resuscitation.2015.01.033

Abstract

Introduction

Sicker patients generally have more vital sign assessments, particularly immediately before an adverse outcome, and especially if the vital sign monitoring schedule is driven by an early warning score (EWS) value. This lack of independence could influence the measured discriminatory performance of an EWS.

Methods

We used a population of 1,564,143 consecutive vital signs observation sets collected as a routine part of patients’ care. We compared 35 published EWSs for their discrimination of the risk of death within 24 h of an observation set using (1) all observations in our dataset, (2) one observation per patient care episode, chosen at random and (3) one observation per patient care episode, chosen as the closest to a randomly selected point in time in each episode. We compared the area under the ROC curve (AUROC) as a measure of discrimination for each of the 35 EWSs under each observation selection method and looked for changes in their rank order.

Results

There were no significant changes in rank order of the EWSs based on AUROC between the different observation selection methods, except for one EWS that included age among its components. Whichever method of observation selection was used, the National Early Warning Score (NEWS) showed the highest discrimination of risk of death within 24 h. AUROCs were higher when only one observation set was used per episode of care (significantly higher for many EWSs, including NEWS).

Conclusions

Vital sign measurements can be treated as if they are independent – multiple observations can be used from each episode of care – when comparing the performance and ranking of EWSs, provided no EWS includes age.

Introduction

Several prior publications by our group have assessed the performance of the early warning scores (EWS) used to identify patients’ severity of illness.1, 2, 3 EWS systems allocate points in a weighted manner, based on the derangement of a predetermined set of patient vital signs variables (e.g., blood pressure, heart rate, breathing rate, temperature) from an arbitrarily agreed “normal” range. The points for each variable are summed and the total is used to inform a change in the patient's vital sign monitoring schedule and/or trigger a call for expert help at the bedside.
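The aggregate scoring described above can be illustrated with a minimal Python sketch. The variable names and every threshold in `HYPOTHETICAL_BANDS` are assumptions made up for illustration; they are not taken from NEWS or from any other published EWS.

```python
from math import inf

# Hypothetical scoring bands: (lower bound inclusive, upper bound exclusive, points).
# Illustrative values only; NOT the bands of any published EWS.
HYPOTHETICAL_BANDS = {
    "heart_rate": [(0, 40, 3), (40, 50, 1), (50, 100, 0), (100, 120, 1), (120, inf, 3)],
    "breathing_rate": [(0, 9, 3), (9, 12, 1), (12, 21, 0), (21, 25, 2), (25, inf, 3)],
    "systolic_bp": [(0, 90, 3), (90, 100, 2), (100, 110, 1), (110, 220, 0), (220, inf, 3)],
    "temperature": [(0, 35.0, 3), (35.0, 36.0, 1), (36.0, 38.0, 0), (38.0, 39.0, 1), (39.0, inf, 2)],
}


def variable_points(name, value):
    """Return the points for one vital sign, based on how far it lies from the 'normal' band."""
    for low, high, points in HYPOTHETICAL_BANDS[name]:
        if low <= value < high:
            return points
    raise ValueError(f"{name}={value} is outside every band")


def ews_total(observation_set):
    """Sum the per-variable points for one vital signs observation set."""
    return sum(variable_points(name, value) for name, value in observation_set.items())


normal = {"heart_rate": 72, "breathing_rate": 16, "systolic_bp": 120, "temperature": 37.0}
deranged = {"heart_rate": 125, "breathing_rate": 26, "systolic_bp": 95, "temperature": 38.5}
print(ews_total(normal), ews_total(deranged))
```

In this sketch the total depends only on the current observation set, which mirrors how the summed score is used to set the monitoring schedule or trigger a call for help.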

Our performance evaluations of EWS systems have often used all the observation sets from a sample of patient episodes and, therefore, contain multiple vital sign observation sets from the same patient episode in the analysis.2, 3 Multiple observations may be within 24 h of death (or another adverse outcome). We have considered an EWS to be better than another if it has a significantly (p < 0.05) higher area under the ROC curve (AUROC,4 a measure of discrimination). Sicker patients generally have more vital sign assessments, particularly immediately before an adverse outcome, and especially if the vital sign monitoring schedule is driven by an EWS value. Previous reviews of our manuscripts have suggested that this lack of independence of the data points in the sample data sets may influence the measured discriminatory performance of an EWS. By extension, it is possible that an EWS that appears significantly better than another when all observations are used may appear significantly worse if only one observation is used from each episode.

EWS systems are implemented clinically as if vital sign measurements and derived EWS values are independent. EWS escalation decisions are generally binary. For example, an EWS value of 4 might result in no clinical intervention, whereas a value of 5 might require both a change in vital signs frequency and an assessment by a doctor (irrespective of whether the previous EWS was 0 or 4). Consequently, it is the extent of derangement of physiology at any given time, and not the degree of abnormality of any previous measurements, that determines actions taken based on the EWS value.

One study by our group5 has suggested that treating vital signs and derived EWS values as independent may be reasonable, as an alternative technique of using one randomly chosen observation set per episode did not significantly affect discrimination of the combined outcome of cardiac arrest, unanticipated ICU admission or death within 24 h. In this study,5 as with others,1, 2, 3 the ability of the EWSs to discriminate the risk of a range of adverse outcomes has been compared using the AUROC.4 The use of multiple observation sets per episode has the potential to bias the AUROC as episodes with more observations may disproportionately influence the AUROC compared to those with fewer observations.
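The potential for bias can be seen in a small Python sketch. It computes the AUROC as the proportion of (outcome, no outcome) observation pairs in which the observation followed by the outcome has the higher EWS value, with ties counted as half. The function name `auroc` and the toy scores are assumptions for illustration; they are not data from this study.

```python
def auroc(scores, outcomes):
    """AUROC by pairwise comparison of positive and negative observations (ties count 0.5)."""
    positives = [s for s, y in zip(scores, outcomes) if y]
    negatives = [s for s, y in zip(scores, outcomes) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one observation with and one without the outcome")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: one observation per episode. Outcome 1 = death within 24 h of the observation.
one_per_episode_scores = [7, 2, 3, 1]
one_per_episode_outcomes = [1, 1, 0, 0]

# Same episodes, but the first (high-scoring) episode now contributes five observations.
pooled_scores = [7, 7, 7, 7, 7, 2, 3, 1]
pooled_outcomes = [1, 1, 1, 1, 1, 1, 0, 0]

auc_single = auroc(one_per_episode_scores, one_per_episode_outcomes)
auc_pooled = auroc(pooled_scores, pooled_outcomes)
print(auc_single, auc_pooled)
```

With these made-up numbers the AUROC moves from 0.75 to 11/12 (about 0.92) purely because one episode is observed more often. The direction and size of any shift in real data depend on which episodes are over-represented.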

The aim of this study was to determine whether a lack of independence between data points when sampling patient observations might significantly change the ranking of EWS systems by their AUROC (i.e., lead to one EWS having significantly higher AUROC than another under one method of choosing observations, but significantly lower AUROC than the other under another method). We compared the performance of EWSs using three methods of observation selection: (1) all observations, (2) one randomly chosen observation set per episode, and (3) one observation set per episode based on choosing a random point in time within each episode.
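The three observation selection methods can be sketched in Python as follows. The record layout (`episode_id`, `time`, `score`), the function names, and the choice to draw the random time between the first and last observation of an episode are all assumptions made for illustration; the text above does not specify how the random point in time was drawn.

```python
import random
from collections import defaultdict


def group_by_episode(observations):
    """Group observation records by their (hypothetical) 'episode_id' field."""
    episodes = defaultdict(list)
    for obs in observations:
        episodes[obs["episode_id"]].append(obs)
    return episodes


def select_all(observations):
    """Method 1: use every observation set."""
    return list(observations)


def select_random_observation(observations, rng):
    """Method 2: one observation set per episode, chosen uniformly at random."""
    return [rng.choice(obs_list) for obs_list in group_by_episode(observations).values()]


def select_nearest_random_time(observations, rng):
    """Method 3: one observation set per episode, the closest to a random point in time.

    Assumption: the random time is uniform between the first and last observation.
    """
    selected = []
    for obs_list in group_by_episode(observations).values():
        start = min(o["time"] for o in obs_list)
        end = max(o["time"] for o in obs_list)
        t = rng.uniform(start, end)
        selected.append(min(obs_list, key=lambda o: abs(o["time"] - t)))
    return selected


# Toy records; 'time' is hours since admission.
observations = [
    {"episode_id": "A", "time": 0.0, "score": 1},
    {"episode_id": "A", "time": 12.0, "score": 4},
    {"episode_id": "A", "time": 13.0, "score": 6},
    {"episode_id": "B", "time": 0.0, "score": 0},
    {"episode_id": "B", "time": 24.0, "score": 1},
]
rng = random.Random(0)
print(len(select_all(observations)))
print(select_random_observation(observations, rng))
print(select_nearest_random_time(observations, rng))
```

Methods 2 and 3 differ when observations cluster in time: under method 2 each observation in an episode is equally likely to be chosen, whereas under method 3 an observation that sits alone in a long interval is more likely to be chosen than one from a closely spaced run.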


Method

This research falls within local research ethics committee approval (08/02/1394) from the Isle of Wight, Portsmouth and South East Hampshire Research Ethics Committee.

Results

In the study period, there were 64,285 episodes of care with admission on or after 25/05/2011 and discharge on or before 31/12/2012, where the patient was aged ≥16, the patient was not discharged alive on the day of admission and one or more observations were taken during the last 24 h of the stay. Associated with these episodes of care were 1,395,941 observation sets (mean 21.7 observation sets per episode). Of these episodes, 30,723 (48%) were for male patients and the mean age at admission was

Discussion

For the three observation selection methods studied, there were no significant changes in the rank of EWSs by their AUROCs except for EWSs that included age. Overall, the findings of this research suggest that vital signs and derived EWS values for EWSs that do not include age can be treated as if they were independent (even though the ICCs demonstrate that there is within-episode dependence). Therefore use of multiple observation sets from a single episode in assessing the performance of EWS

Conclusions

Using multiple observations from each episode of care does not significantly change the ranking of EWSs compared to using only one observation from each episode, as long as no EWS includes age. This is in spite of observed dependence between vital signs observations collected during the same episode of care.

The method of observation selection can affect the AUROCs recorded—higher AUROCs (significantly higher for many EWSs) are recorded when only one observation is used from each episode. For

Conflict of interest statement

VitalPAC is a collaborative development of The Learning Clinic Ltd (TLC) and Portsmouth Hospitals NHS Trust (PHT). At the time of the study, PHT had a royalty agreement with TLC to pay for the use of PHT intellectual property within the VitalPAC product. PM, DP, PF and PS are employed by PHT. GS was an employee of PHT until 31/03/2011. PS, PF, and the wives of GS and DP are minority shareholders in TLC. GS, DP, and PS are unpaid research advisors to TLC, and have received reimbursement of

Funding

None.

Acknowledgements

The authors would like to acknowledge the efforts of the medical, nursing and administrative staff at Portsmouth Hospitals NHS Trust who collected the data used in this study. Dr Stuart Jarvis had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.


A Spanish translated version of the summary of this article appears as Appendix in the final online version at http://dx.doi.org/10.1016/j.resuscitation.2015.01.033.
