Original research
Performance of seven different paediatric early warning scores to predict critical care admission in febrile children presenting to the emergency department: a retrospective cohort study
  1. Sam T Romaine1,
  2. Gerri Sefton2,
  3. Emma Lim3,
  4. Ruud G Nijman4,
  5. Jolanta Bernatoniene5,
  6. Simon Clark6,
  7. Luregn J Schlapbach7,8,
  8. Philip Pallmann9,
  9. Enitan D Carrol1,10
  1. 1Department of Clinical Infection, Microbiology and Immunology, Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
  2. 2Paediatric Intensive Care Unit, Alder Hey Children's NHS Foundation Trust, Liverpool, Merseyside, UK
  3. 3Paediatric Immunology, Infectious Diseases & Allergy, Great North Children's Hospital, Newcastle Upon Tyne, UK
  4. 4Section of Paediatrics, Division of Infectious Diseases, Department of Medicine, Imperial College London, London, UK
  5. 5Paediatric Infectious Disease Department, Bristol Royal Hospital for Children, Bristol, UK
  6. 6The Jessop Wing Neonatal Unit, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
  7. 7Child Health Research Centre, The University of Queensland, Saint Lucia, Queensland, Australia
  8. 8Children’s Research Center, Neonatal and Pediatric Intensive Care Unit, University Children's Hospital Zürich, Zurich, Switzerland
  9. 9College of Biomedical and Life Sciences, Centre for Trials Research, Cardiff University, Cardiff, South Glamorgan, UK
  10. 10Department of Infectious Diseases, Alder Hey Children's NHS Foundation Trust, Liverpool, Merseyside, UK
  1. Correspondence to Professor Enitan D Carrol; edcarrol{at}


Objective Paediatric Early Warning Scores (PEWS) are widely used in the UK, but the heterogeneity across tools and the limited data on their predictive performance represent obstacles to improving best practice. The standardisation of practice through the proposed National PEWS will rely on robust validation. Therefore, we compared the performance of the National PEWS with six other PEWS currently used in NHS hospitals, for their ability to predict critical care (CC) admission in febrile children attending the emergency department (ED).

Design Retrospective single-centre cohort study.

Setting Tertiary hospital paediatric ED.

Participants A total of 11 449 eligible febrile ED attendances were identified from the electronic patient record over a 2-year period. Seven PEWS scores were calculated (Alder Hey, Bedside, Bristol, National, Newcastle and Scotland PEWS, and the Paediatric Observation Priority Score) using the worst observations recorded during the ED stay.

Outcomes The primary outcome was CC admission within 48 hours; the secondary outcomes were hospital length of stay (LOS) >48 hours and sepsis-related mortality.

Results Of 11 449 febrile children, 134 (1.2%) were admitted to CC within 48 hours of ED presentation and 606 (5.3%) had a hospital LOS >48 hours. Ten (0.09%) children died; 5 (0.04%) of these deaths were sepsis-related. All seven PEWS demonstrated excellent discrimination for CC admission (range area under the receiver operating characteristic curve (AUC) 0.91–0.95) and sepsis-related mortality (range AUC 0.95–0.99); most demonstrated moderate discrimination for hospital LOS (range AUC 0.69–0.75). In CC admission threshold analyses, Bedside PEWS (AUC 0.90; 95% CI 0.86 to 0.93) and National PEWS (AUC 0.90; 95% CI 0.87 to 0.93) were the most discriminative, both at a threshold of ≥6.

Conclusions Our results support the use of the proposed National PEWS in the paediatric ED for the recognition of suspected sepsis to improve outcomes, but further validation is required in other settings and presentations.

  • infectious diseases
  • paediatrics
  • paediatric A&E and ambulatory care
  • paediatric infectious disease & immunisation

Data availability statement

Data are available upon reasonable request.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Strengths and limitations of this study

  • We compared the performance of six Paediatric Early Warning Scores (PEWS) in use across the UK with the proposed National PEWS, in a large cohort of febrile children attending the emergency department.

  • Our cohort represents a group of children in whom those at risk of poor outcomes can be difficult to identify, meaning effective detection tools are required.

  • We used retrospective data from a single centre, in which a proportion of eligible cases had to be excluded due to missing observation data.

  • Minor adaptations to five of the PEWS were necessary for inclusion in the study.

  • Only the detection aspect (afferent limb) of the overall PEWS system could be assessed in the study.


Early Warning Scores

The use of Early Warning Scores (EWS) has been recommended for optimising the recognition and management of serious illness to prevent avoidable deaths.1 2 Many different EWS have been used across the UK,3 initially for the purpose of identifying deterioration in adult inpatients.4 The Royal College of Physicians (RCP) standardised UK adult practice in 2012 with the National Early Warning Score (NEWS)3 and subsequent NEWS in 2017.5 NEWS demonstrated good discriminative ability in predicting intensive care unit (ICU) admission and death in adult Medical Admission Unit patients6 and has subsequently been successfully validated in both the Emergency Department (ED)7–9 and prehospital settings.10–12 In order to standardise systems of care, it was, therefore, suggested that NEWS should be used across the entire journey of an acutely unwell patient, including primary care, ambulance service, the ED and inpatient ward.5 13 14 This system-wide introduction of NEWS was associated with decreased mortality in adults with suspected sepsis.15

Paediatric Early Warning Scores

Paediatric EWS (PEWS) have become widely used across the UK, with many different systems in use.16 Despite substantial overlap, currently used tools often differ in their included variables, thresholds, weighting and escalation algorithms.17 Several PEWS have demonstrated good discrimination in predicting ICU admission and death,17 18 and implementing PEWS alongside a rapid response system (RRS) has been shown to reduce in-hospital mortality.19 However, significant variations in performance have been reported with different PEWS, and it is unclear to what extent this relates to tool performance or clinical context such as patient complexity.17 18 The large number of contrasting PEWS, coupled with a lack of internal and external validation,20 21 creates uncertainty over the optimal PEWS. In addition, the use of different PEWS may complicate the transfer of patients between settings, the experience of staff working in different settings and training requirements. Both Scotland and Northern Ireland use a single PEWS across their hospitals.22 23 In England, a standardised National PEWS is being developed with plans for system-wide introduction in April 2021, including primary care, ambulance services, the ED and inpatient wards.23 It is hoped that using a common language across these settings will improve outcomes in children, as has been achieved in adults with NEWS.15

Acute febrile illness is among the most common reasons for children to attend the paediatric ED,24 but only a minority have sepsis or serious illness, making their detection challenging. This difficulty is exacerbated by the tendency of a fever to alter other vital signs. Outcomes are poor for those with sepsis that is recognised late;25 therefore, it is essential that PEWS are able to effectively identify these children.


The aim of the present study was to compare the performance of the proposed National PEWS with six other PEWS currently in use across the UK for their ability to predict poor outcomes in a large cohort of febrile children presenting to the ED.


Study population and definitions

The study was conducted at a single large tertiary hospital, Alder Hey (AH) Children’s NHS Foundation Trust, Liverpool, UK, which manages around 60 000 ED attendances annually.26 We included children (<16 years) attending the ED with a fever over 38°C, or a history of fever within the previous 3 days, who presented between 1 September 2015 and 31 August 2017. Details of the cohort have been previously published.27 Patients were identified retrospectively from the electronic patient record (EPR) by reviewing eligible ED attendances and critical care (CC) admissions across this 2-year period. Cases were excluded if observation data were missing, if the patient was aged ≥16 years or had no history of fever, or if the patient had been transferred from another hospital. Cases with missing observation data were defined as those in which two or more components of any of the included PEWS were not recorded during the ED stay. If only one component was not recorded, it was deemed to be normal and the case was included. The primary outcome was CC admission (high-dependency unit (HDU) or ICU) within 48 hours of ED presentation. CC admission within 48 hours was used to capture both those children with sepsis recognised in the ED and the significant proportion who are admitted to a ward, later deteriorate and are transferred to CC.28 The secondary outcomes were hospital length of stay (LOS) >48 hours and sepsis-related mortality, defined as in-hospital death within 28 days that was determined to be sepsis related by a review of the medical notes. The seven PEWS evaluated were: AH PEWS,29 Bedside PEWS,30 Bristol PEWS (personal communication, Jolanta Bernatoniene, 2020), National PEWS (personal communication, Simon J Clark, 2020), Newcastle PEWS (personal communication, Emma Lim, 2020), Paediatric Observation Priority Score (POPS)31 and Scotland PEWS22 (online supplemental figures 1–7).
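The missing-data rule described in this section can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's code: the function name and component names are hypothetical, and treating a single unrecorded component as scoring 0 points is our reading of "deemed to be normal".

```python
from typing import Dict, Optional, Tuple


def apply_missing_rule(
    component_points: Dict[str, Optional[int]],
) -> Tuple[bool, Dict[str, int]]:
    """Return (included, points) for one ED attendance.

    A value of None marks a component that was not recorded.
    Two or more unrecorded components -> case excluded.
    Exactly one unrecorded component  -> assumed normal (0 points, an assumption).
    """
    missing = [name for name, pts in component_points.items() if pts is None]
    if len(missing) >= 2:
        return False, {}
    filled = {name: (0 if pts is None else pts) for name, pts in component_points.items()}
    return True, filled


# Hypothetical attendance with blood pressure not recorded.
included, points = apply_missing_rule(
    {"heart_rate": 2, "blood_pressure": None, "resp_rate": 1}
)
total_score = sum(points.values()) if included else None
```

Under this reading, a case with only blood pressure unrecorded is retained and scored on the remaining components.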

Score calculations

PEWS scores were calculated using the worst observations recorded during the ED stay. AH PEWS was in routine clinical use and was calculated electronically in real time, whereas the other six PEWS were calculated retrospectively for study purposes only.

Score adaptations

The routinely recorded EPR data are based on, and limited to, the information required to calculate a score for the PEWS in use at our hospital (AH PEWS). The full information required to calculate scores for five PEWS was not available from the EPR data. It was, therefore, necessary to adapt these five PEWS for the purposes of the study. Care was taken to ensure that any changes were minor and as close to the original PEWS as possible (table 1).

Table 1

Summary of parameters used by each PEWS including adaptations

Respiratory distress

The AH PEWS uses a different system for ‘respiratory distress’ than the five other PEWS that use this component. In the AH PEWS, severity is determined by totalling the number of ‘effort of breathing’ features present (grunting, head bobbing, marked subcostal recession, nasal flaring, stridor, tracheal tug; one feature=one point, two or more features=two points), whereas National PEWS, Newcastle PEWS and POPS grade the severity of each individual feature (eg, stridor=2 points) and the highest scoring feature determines the score (online supplemental figures 1, 4–6). Furthermore, the three ‘severe’ features in the National and Newcastle PEWS are not routinely recorded on the EPR so could not be scored. Conversely, ‘mild recession’ and ‘marked subcostal recession’ are routinely recorded on the EPR but are not explicitly categorised in the ‘respiratory distress’ severity grades described in the National and Newcastle PEWS (online supplemental figures 4,5). Therefore, the study authors took a pragmatic decision to include ‘mild recession’ in the mild category and ‘marked subcostal recession’ in the severe category (table 1). The latter decision was based partly on two previously published paediatric respiratory scoring systems that use ‘marked’ features in their severe categories only.32 33 Similarly, for POPS, ‘marked subcostal recession’ was deemed a surrogate for ‘severe recession’ as the latter is not routinely recorded on the EPR (online supplemental figure 6). In the Bristol and Bedside PEWS, overall ‘respiratory distress’ is simply categorised as ‘none’, ‘mild’, ‘moderate’, or ‘severe’ but only individual ‘respiratory effort’ features are recorded on the EPR rather than this overall summary (online supplemental figures 2, 3). For Bristol and Bedside PEWS, the ‘respiratory distress’ category was scored identically to the National and Newcastle PEWS described above. 
This approach represents a smaller change to the original scores than the alternatives (eg, counting the number of features present, as in AH PEWS).
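The two approaches to scoring ‘respiratory distress’ described above can be contrasted in a short Python sketch. The AH PEWS feature list and its counting rule are taken from the text; the per-feature grades in `ILLUSTRATIVE_GRADES` are placeholders chosen for illustration (using the 1/2/4 convention for mild/moderate/severe and the study's adaptation for recession) and are not a transcription of any PEWS chart.

```python
from typing import Dict, Iterable

# Effort-of-breathing features counted by AH PEWS (from the text).
AH_FEATURES = {
    "grunting", "head bobbing", "marked subcostal recession",
    "nasal flaring", "stridor", "tracheal tug",
}

# Placeholder grades, NOT the real charts: mild=1, moderate=2, severe=4.
ILLUSTRATIVE_GRADES: Dict[str, int] = {
    "mild recession": 1,              # adapted: placed in the mild category
    "stridor": 2,                     # example grade quoted in the text
    "marked subcostal recession": 4,  # adapted: placed in the severe category
}


def ah_style_points(features_present: Iterable[str]) -> int:
    """Count-based rule: one feature = 1 point, two or more = 2 points."""
    n = len(set(features_present) & AH_FEATURES)
    return min(n, 2)


def highest_feature_points(features_present: Iterable[str],
                           grades: Dict[str, int]) -> int:
    """Grade-based rule: the highest-scoring feature determines the score."""
    return max((grades.get(f, 0) for f in features_present), default=0)


observed = ["stridor", "marked subcostal recession"]
count_based = ah_style_points(observed)
grade_based = highest_feature_points(observed, ILLUSTRATIVE_GRADES)
```

The same pair of recorded features can therefore yield different component scores depending on which rule a given PEWS uses.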

Other adaptations

The ‘gut feeling’ category in POPS (eg, low-level concern=1 point, child looks unwell=2 points) was also modified as there is no grading of the nurse and/or parent concern on the EPR (ie, it is either ‘yes’ or ‘no’). To adapt the EPR information to POPS, concern recorded for both nurse and parent was scored as 2 points, while concern recorded for either nurse or parent alone was scored as 1 point. In addition, comorbidity information was unavailable, so this category in POPS could not be scored (table 1 and online supplemental figure 6). Finally, Newcastle PEWS is currently a ‘trigger-based’ system, whereas the other six systems are ‘score-based’. To allow a more direct comparison, Newcastle PEWS was converted to a score-based system, with mild features scored as 1, moderate as 2, and severe as 4, as is the convention across Bedside, Bristol and National PEWS (online supplemental figures 2–5).
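The two adaptations described in this paragraph can be expressed as a minimal Python sketch. Function names are hypothetical. Scoring no recorded concern as 0 points, scoring a ‘none’ grade as 0, and summing the converted Newcastle grades across parameters are assumptions made for illustration; the text specifies only the 1/2/4 weights and the 1-point and 2-point concern rules.

```python
from typing import Iterable

# Weights stated in the text for the converted Newcastle PEWS;
# "none" = 0 is an assumption.
NEWCASTLE_POINTS = {"none": 0, "mild": 1, "moderate": 2, "severe": 4}


def pops_concern_points(nurse_concern: bool, parent_concern: bool) -> int:
    """Adapted POPS 'gut feeling': both concerned = 2, either alone = 1."""
    return int(nurse_concern) + int(parent_concern)


def newcastle_score(parameter_grades: Iterable[str]) -> int:
    """Convert trigger grades to points and sum them (summation assumed)."""
    return sum(NEWCASTLE_POINTS[grade] for grade in parameter_grades)


both = pops_concern_points(nurse_concern=True, parent_concern=True)
either = pops_concern_points(nurse_concern=True, parent_concern=False)
example_total = newcastle_score(["mild", "severe", "none"])
```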

Statistical analysis

Areas under the receiver operating characteristic curve (AUC) were calculated for each PEWS. The three optimal cut-offs within each PEWS for predicting CC admission were identified using Youden’s J statistic.34 Sensitivity, specificity, positive and negative predictive value, positive and negative likelihood ratio, OR and accuracy were calculated for each PEWS at each threshold, alongside asymptotic 95% CIs. Statistical analyses were performed using SPSS V.25.
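For readers unfamiliar with these measures, the following Python sketch shows how an AUC and the cut-offs ranked by Youden’s J (sensitivity + specificity − 1) can be computed. It is not the study's analysis, which was performed in SPSS; the function names and the toy data are hypothetical, and a score at or above the cut-off is assumed to count as a positive test.

```python
from typing import List, Sequence, Tuple


def roc_auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """AUC as the probability that a random case with the outcome scores
    higher than a random case without it (ties count as one half)."""
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoffs(scores: Sequence[float], outcomes: Sequence[int],
                   top: int = 3) -> List[Tuple[float, float]]:
    """Return the `top` cut-offs ranked by Youden's J, as (cutoff, J) pairs."""
    n_pos = sum(1 for y in outcomes if y)
    n_neg = len(outcomes) - n_pos
    ranked = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, outcomes) if y and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, outcomes) if not y and s < cutoff)
        j = tp / n_pos + tn / n_neg - 1  # sensitivity + specificity - 1
        ranked.append((cutoff, j))
    # Highest J first; ties broken by the lower cut-off.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked[:top]


# Toy data only: eight attendances, four with the outcome.
toy_scores = [0, 1, 2, 3, 6, 7, 8, 9]
toy_outcomes = [0, 0, 0, 0, 1, 1, 1, 1]
toy_auc = roc_auc(toy_scores, toy_outcomes)
toy_best = youden_cutoffs(toy_scores, toy_outcomes)
```

The pairwise AUC calculation is quadratic in cohort size, which is acceptable for a sketch but would normally be replaced by a rank-based routine from a statistics package.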

Patient and public involvement

No patients or the public were involved in the present study.


Study population

Of 14 121 cases identified, 2672 were excluded: 2041 (14%) due to missing observation data, 454 (3.2%) due to no history of fever, 96 (0.7%) due to duplicate cases, 77 (0.5%) due to age ≥16 and 4 (0.03%) due to transfer from another hospital. A total of 11 449 cases remained in the final cohort (table 2 and online supplemental figure 8).
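The exclusion counts above can be checked with simple arithmetic. The sketch below assumes that the quoted percentages use the 14 121 identified cases as the denominator, which is consistent with the figures but is not stated explicitly; variable names are illustrative.

```python
identified = 14121

# Exclusion counts as reported in the text.
exclusions = {
    "missing observation data": 2041,
    "no history of fever": 454,
    "duplicate case": 96,
    "age >= 16 years": 77,
    "transferred from another hospital": 4,
}

total_excluded = sum(exclusions.values())
final_cohort = identified - total_excluded

# Percentage of identified cases excluded for each reason (assumed denominator).
percentages = {reason: 100 * n / identified for reason, n in exclusions.items()}
```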

Table 2

Summary of demographics, clinical outcomes and scores for each of the PEWS

Missing data

A total of 2041 (14%) cases were excluded due to missing observation data. Blood pressure (BP) was by far the most frequently missed parameter, with 7569 (66%) cases in the final cohort missing a BP measurement, including 26% of all those admitted to CC. In comparison, oxygen delivery was absent in 118 (1.0%) final cohort cases, and all other parameters were missing in fewer than 0.2% of final cohort cases.


A total of 134 (1.2%) children were admitted to CC within 48 hours of ED attendance. A total of 114 were admitted to HDU, 28 to ICU, and 8 to both. Of those admitted to CC, 117 (87%) met Sepsis-3 criteria before CC admission,35 while 99 (74%) met 2005 Goldstein criteria.36 Hospital LOS was >48 hours in 606 (5.3%) children. A total of 10 (0.09%) children died, and 5 (0.04%) of these deaths were sepsis-related.

PEWS comparison

All seven PEWS demonstrated excellent discrimination (range AUC 0.91–0.95) for predicting CC admission within 48 hours of ED attendance (table 3). The greatest discrimination was seen with the Bedside PEWS (AUC 0.95; 95% CI 0.93 to 0.97), Bristol PEWS (AUC 0.95; 95% CI 0.92 to 0.97) and National PEWS (AUC 0.95; 95% CI 0.92 to 0.97). In threshold analyses, Bedside PEWS (AUC 0.90; 95% CI 0.86 to 0.93) and National PEWS (AUC 0.90; 95% CI 0.87 to 0.93) demonstrated the best discrimination, both at a cut-off of ≥6. At this threshold, National PEWS achieved a sensitivity of 89.6% (95% CI 83.1 to 94.2) and a specificity of 90.6% (95% CI 90.1 to 91.2), whereas Bedside PEWS demonstrated a sensitivity of 87.3% (95% CI 80.5 to 92.4) and a specificity of 91.7% (95% CI 91.2 to 92.2). The remaining five PEWS each had at least one threshold at which both sensitivity and specificity were over 80% (table 4).
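The threshold-level AUCs quoted above are lower than the whole-score AUCs because a score dichotomised at one cut-off has a single-point ROC curve, whose area is the mean of sensitivity and specificity. The sketch below applies that general identity, together with the standard likelihood ratio formulas, to the reported sensitivities and specificities. The source does not state how the threshold AUCs were derived, so this is one reading that is consistent with the published figures rather than a description of the authors' method; function names are illustrative.

```python
from typing import Tuple


def dichotomised_auc(sensitivity: float, specificity: float) -> float:
    """Area under a single-point ROC curve (equals balanced accuracy)."""
    return (sensitivity + specificity) / 2


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Positive and negative likelihood ratios for a binary test."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Reported values at a cut-off of >=6 (as proportions).
national_auc = dichotomised_auc(0.896, 0.906)
bedside_auc = dichotomised_auc(0.873, 0.917)
national_lr_pos, national_lr_neg = likelihood_ratios(0.896, 0.906)
```

Both values fall within rounding of the reported threshold AUC of 0.90.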

Table 3

AUCs for primary and secondary outcomes for each PEWS

Table 4

Prognostic performance of each PEWS in predicting critical care admission within 48 hours

Most PEWS demonstrated moderate discrimination for hospital LOS >48 hours (range AUC 0.69–0.75), with Bristol PEWS (AUC 0.75; 95% CI 0.73 to 0.77) and National PEWS (AUC 0.75; 95% CI 0.72 to 0.77) the most discriminative (table 3). For sepsis-related mortality, all seven PEWS demonstrated excellent discrimination (range AUC 0.95–0.99), with Bedside PEWS and Newcastle PEWS (both AUC 0.99; 95% CI 0.98 to 1) performing best (table 3). For both secondary outcomes, the detailed performance of the seven PEWS as dichotomised scores is displayed in online supplemental tables 1 and 2.


Our results demonstrate excellent and relatively comparable performance across six different PEWS currently used in the UK, and the proposed National PEWS, in predicting CC admission in febrile children presenting to the ED. For predicting hospital LOS >48 hours, modest discrimination was seen. Given the proposed standardised system performs similarly to several systems currently in use, our findings provide evidence to support the use of the National PEWS in the ED in order to improve standardisation and reduce variability in escalation of care (as the same thresholds for escalation will be used nationally). Before applying the score to prehospital and inpatient settings, further validation in different cohorts across multiple presentations (eg, non-febrile and trauma) and settings will be required.

The development of National PEWS follows the development and introduction of an effective and standardised system across the entire adult patient journey in England.5 15 There are clear theoretical and practical advantages in using a consistent ‘language’ to detect deterioration in patients across their entire journey. These benefits include (1) standardisation of care when transferring patients between hospitals, (2) consistency in handovers between primary care, ambulance services, ED, inpatient wards and ICU, (3) facilitated benchmarking and quality control across services and institutions, and (4) improved standardised education of nursing and medical staff. Using a standardised system will allow for consistency of monitoring and make deterioration easier to recognise. Importantly, for staff members working in multiple departments or trusts (eg, bank/locum staff, trainee doctors on rotation, staff moving between trusts), using the same system will increase the opportunity for standardised training and familiarity. This will reduce the risk of confusion, inconsistency and miscommunication, consequently improving early recognition of deterioration.

A critically ill child may present to the ED, and it is usually clear to the clinician that the child needs resuscitation, stabilisation and transfer to CC. Other children presenting earlier in the sepsis trajectory may not appear critically unwell in the ED and may be admitted to a ward, then deteriorate within the next 48 hours. PEWS are an ideal tool for identifying and tracking physiological changes, and the introduction of a standardised score would potentially allow the deterioration to be tracked from ED to ward to CC.

Further validation of the National PEWS is required in other presentations and other contexts, such as primary care and ambulance services. Studies of PEWS in these settings are limited. However, the Scotland PEWS has demonstrated promising performance in the prehospital setting,22 and the system-wide introduction of NEWS has led to improved outcomes in adults with suspicion of sepsis.15 Importantly, in the all-comer cohort studied by Corfield et al,22 the performance of Scotland PEWS was inferior to that of all seven PEWS in the present study, including Scotland PEWS, suggesting that these PEWS may be less discriminatory in presentations other than suspected sepsis. Furthermore, 79% of eligible cases were excluded from Corfield et al22 due to missing data, highlighting a potential problem with the use of PEWS in the prehospital setting. A follow-up study provided a potential solution by demonstrating that mortality and ICU admission can be predicted as accurately using only four components as opposed to eight.37 Even in the ED setting, missing vital sign data pose a significant problem. In the present study, 14% of cases were excluded due to missing two or more components. This would have increased to 69% if those missing one component had also been excluded. In our cohort, BP was by far the most commonly missed parameter, and similarly low rates of BP measurement in the paediatric ED have been reported consistently in previous research.38–40 This affects the development of scoring systems that include BP, as they are developed using cohorts with significant amounts of missing data. Furthermore, the ‘real-world’ performance of a score may be greatly affected if one parameter is consistently not measured, reducing opportunities for the reliable detection of serious illness.41 One solution would be a quality improvement drive to ensure that a minimum set of observations, including BP, is recorded.
However, current barriers to regular BP measurement, including inappropriate cuff size, patient distress or lack of co-operation, and the concern over falsely elevated readings, may be difficult to overcome. An alternative solution would be to use simpler scoring systems, with fewer parameters, that do not include BP, and instead include parameters that are more consistently measured in the prehospital and ED settings. We have previously developed such a score, the Liverpool quick Sequential Organ Failure Assessment (LqSOFA), and demonstrated its ability to identify those children at risk of poorer outcomes within a febrile ED cohort.27 The LqSOFA was less discriminative than the AH PEWS (AUC 0.81 vs 0.93).27 However, the LqSOFA has the advantage of using fewer parameters (four vs eight), which are measured easily and consistently, while still achieving good discrimination. This means it may be ideally suited for settings where BP measurement is inconsistent or impossible, such as prehospital and resource-poor settings, but the LqSOFA requires further validation in these settings.

In addition to the missing data, the present study has some other important limitations. Our study evaluated only the ‘detection’ or afferent aspect of the PEWS system. The rest of the PEWS system, including the escalation plan and the efferent arm (eg, RRS), could not be assessed in this retrospective study, but are crucial components in preventing critical deterioration and improving patient outcomes.19 Furthermore, the escalation plan of the AH PEWS may have influenced the performance of the scores in predicting CC admission, as urgent medical review is required for those with scores of 6 or greater (online supplemental figure 1). Second, it was necessary to adapt some of the included PEWS to allow a comparison to be made, which may have affected the performance of individual scores. In particular, comorbidity information was unavailable, possibly reducing performance of POPS, as this is the only PEWS in our study that incorporates this variable into the overall score. Third, some PEWS include features that are not accounted for in the overall score, such as ‘staff or carer concern’ in the Scotland PEWS (table 1 and online supplemental figure 7). Similarly, other practical factors such as acceptability to staff, ease-of-use and compliance could not be considered but are likely to affect overall performance. Other limitations include the use of retrospective data from a single centre, the criteria for CC admission which may depend on local practice, and the low mortality rate.


Our results demonstrate highly comparable performance of seven different PEWS. The proposed National PEWS demonstrated comparable or slightly improved performance compared with the other six PEWS, supporting the use of National PEWS in the paediatric ED. However, further validation is required in other settings (primary care and prehospital) and other presentations, such as trauma, asthma and diabetic ketoacidosis, to allow the proposed National PEWS to be used as a common language of illness severity across the health system.

Ethics statements

Ethics approval

The NHS Health Research Authority granted ethical approval for the study, including a waiver of the informed consent requirement (reference: 16/LO/1684).



  • Twitter @CarrolEnitan

  • Contributors EDC conceptualised and designed the study, analysed data, reviewed and revised the manuscript, and oversaw all aspects of the study. STR drafted the initial manuscript, cleaned data, analysed data, and reviewed and revised the manuscript. GS iteratively developed the Alder Hey PEWS used in the study, is a member of the National PEWS Delivery Board, contributed to design of the study, provided advice on earlier drafts of the manuscript, and reviewed and revised the manuscript. EL is a member of the National PEWS Delivery Board, helped develop the Newcastle PEWS, contributed to design of the study, provided advice on earlier drafts of the manuscript, and reviewed and revised the manuscript. RGN contributed to design of the study, provided advice on earlier drafts of the manuscript, and reviewed and revised the manuscript. JB contributed to design of the study, provided advice on earlier drafts of the manuscript, and reviewed and revised the manuscript. SC co-chairs the National PEWS Delivery Board, contributed to design of the study, provided advice on earlier drafts of the manuscript, and reviewed and revised the manuscript. LJS contributed to study design, provided advice on earlier drafts of the manuscript, and critically reviewed and revised the manuscript. PP supervised the data analysis, and critically reviewed and revised the manuscript. All authors approved the final manuscript as submitted, and agree to be accountable for all aspects of the work.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
