Article Text

Original research
Development and validation of automated computer-aided risk scores to predict in-hospital mortality for emergency medical admissions with COVID-19: a retrospective cohort development and validation study
  1. Muhammad Faisal1,2,3,
  2. Mohammed Mohammed1,4,
  3. Donald Richardson5,
  4. Massimo Fiori6,
  5. Kevin Beatson6
  1. 1Faculty of Health Studies, University of Bradford, Bradford, UK
  2. 2Wolfson Centre for Applied Health Research, Bradford Royal Infirmary, Bradford, UK
  3. 3NIHR Yorkshire and Humber Patient Safety Translational Research Centre (YHPSTRC), Bradford, UK
  4. 4The Strategy Unit, NHS Midlands and Lancashire Commissioning Support Unit, West Bromwich, UK
  5. 5Department of Renal Medicine, York Teaching Hospital NHS Foundation Trust, York, UK
  6. 6Department of Information Technology, York Teaching Hospitals NHS Foundation Trust, York, UK
  1. Correspondence to Dr Mohammed Mohammed; m.a.mohammed5@bradford.ac.uk

Abstract

Objectives There are no established mortality risk equations specifically for unplanned emergency medical admissions that include patients with COVID-19 (the disease caused by SARS-CoV-2). We aim to develop and validate a computer-aided risk score (CARMc19) for predicting mortality risk by combining COVID-19 status, the first electronically recorded blood test results and the National Early Warning Score 2 (NEWS2).

Design Logistic regression model development and validation study.

Setting Two acute hospitals (York Hospital—model development data; Scarborough Hospital—external validation data).

Participants Adult (aged ≥16 years) medical admissions discharged over a 3-month period with electronic NEWS2 and blood test results recorded on admission. We used logistic regression modelling to predict the risk of in-hospital mortality using two models: (1) CARMc19_N: age+sex+NEWS2 including subcomponents+COVID-19 status; (2) CARMc19_NB: CARMc19_N plus seven blood test results and the acute kidney injury (AKI) score. Model performance was evaluated according to discrimination (c-statistic), calibration (graphically) and clinical usefulness at NEWS2 thresholds of 4+, 5+ and 6+.

Results The risk of in-hospital mortality following emergency medical admission was similar in the development and validation datasets (8.4% vs 8.2%). The c-statistic for predicting mortality was higher for CARMc19_NB than for CARMc19_N in the validation dataset (CARMc19_NB=0.88 (95% CI 0.86 to 0.90) vs CARMc19_N=0.86 (95% CI 0.83 to 0.88)). Both models had good calibration (CARMc19_NB: 1.01 (95% CI 0.88 to 1.14); CARMc19_N: 0.95 (95% CI 0.83 to 1.06)). At all NEWS2 thresholds (4+, 5+, 6+), model CARMc19_NB had better sensitivity and similar specificity.

Conclusions We have developed and validated CARMc19 scores with good performance characteristics for predicting the risk of in-hospital mortality. Since the CARMc19 scores place no additional data collection burden on clinicians, they may now be carefully introduced and evaluated in hospitals with sufficient informatics infrastructure.

  • COVID-19
  • infection control
  • health & safety
  • quality in health care

Data availability statement

Data may be obtained from a third party and are not publicly available. Our data sharing agreement is with York Hospital and does not permit us to share the data used in this paper.


This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.


Strengths and limitations of this study

  • This study provides a computer-aided risk of in-hospital mortality for unplanned admissions with COVID-19 using National Early Warning Score (NEWS2) and routine blood test results.

  • About 20%–30% of admissions do not have both NEWS2 and blood test results recorded, so we developed two scores (computer-aided risk score CARMc19_N and CARMc19_NB) reflecting those without/with blood test results.

  • Patients with COVID-19 were determined by COVID-19 swab test results (hospital or community) and clinical judgement and so our findings are constrained by the accuracy of these methods.

  • Our two hospitals are part of the same NHS Trust and this may undermine the generalisability of our findings, which merit further external validation.

  • CARMc19 scores place no additional data collection burden on clinicians and are readily automated.

Introduction

Infection with SARS-CoV-2 produces COVID-19 in symptomatic individuals and has challenged healthcare systems globally (Coronaviridae Study Group of the International Committee on Taxonomy of Viruses1). Patients with COVID-19 admitted to hospital during the early stages of the pandemic were at risk of developing severe disease with life-threatening respiratory and/or multiorgan failure2 3 and a high risk of mortality.

Early diagnosis and management of patients with COVID-19 was key in providing high-quality care, which included palliative care, isolation and escalation to critical care. Early Warning Scores (EWS) are commonly used in hospitals worldwide,4 and in National Health Service (NHS) hospitals in England, the patient’s National Early Warning Score (NEWS) is used to identify patients at risk of deterioration.5 We have developed two automated risk equations to predict the patient’s risk of in-hospital mortality following emergency medical admission to hospital: CARM_N, using NEWS only,6 and CARM_NB, using NEWS plus blood test results.7 We found CARM_NB performed similarly to consultant clinicians.8

NEWS2 was published in December 2017 as an update to NEWS4; it adds new confusion or delirium to the alertness component, allocating three points (the maximum for a single variable). NEWS2 also offers two scales for oxygen saturation (scale 1 and scale 2). Scale 2 is used for patients at risk of hypercapnic respiratory failure, who have a lower oxygen saturation target of 88%–92%.

While hospitals continued to use NEWS2 during the COVID-19 pandemic, little was known at the time about how NEWS2 and CARM scores perform in monitoring patients with COVID-19. In this study, we aimed to develop and validate an automated computer-aided risk score (CARMc19) using on-admission NEWS2 and blood test results for predicting mortality in a patient cohort that included a large number with a diagnosis of COVID-19. This approach is clinically useful because it places no additional data collection burden on staff monitoring patients with COVID-19. It must be stressed that this algorithm was developed at a time that predated widespread vaccination and the development of other evidence-based treatments for COVID-19. The Randomised Evaluation of COVID-19 Therapy (RECOVERY) study was ongoing in the Trust during the development of this algorithm.9

Methods

Setting and data

Our cohorts of emergency medical admissions are from two acute hospitals which are approximately 65 km apart in the Yorkshire and Humberside region of England—Scarborough Hospital (SH) (n~300 beds) and York Hospital (YH) (n~700 beds), managed by York Teaching Hospitals NHS Foundation Trust. We selected these hospitals because they had electronic NEWS2, collected as part of the patient’s process of care since April 2019, and were agreeable to the study.

We considered all consecutive adult (aged ≥18 years) non-elective or emergency medical admissions discharged over a 3-month period (11 March 2020 to 13 June 2020) with electronic NEWS2. For each emergency admission, we obtained a pseudonymised patient identifier, the patient’s age (years), sex (male/female), discharge status (alive/dead), admission and discharge date and time, diagnosis codes based on the 10th revision of the International Statistical Classification of Diseases (ICD-10), NEWS2 (including its subcomponents: respiratory rate, temperature, systolic blood pressure, pulse rate, oxygen saturation, oxygen supplementation, oxygen scales 1 and 2, and alertness including confusion), blood test results (albumin, creatinine, haemoglobin, potassium, sodium, urea and white cell count) and the Acute Kidney Injury (AKI) score.

Diastolic blood pressure was recorded at the same time as systolic blood pressure. Historically, diastolic blood pressure has been a routinely collected physiological variable on vital sign charts and is still collected where electronic observations are in place. NEWS2 produces integer values that range from 0 (indicating the lowest severity of illness) to 20 (the maximum NEWS2 value possible) (online supplemental appendix table S1). The index NEWS2 was defined as the first electronically recorded NEWS2 within ±24 hours of the admission time. We excluded records where the index NEWS2 was not recorded within ±24 hours of admission (±96 hours for blood test results) or was missing altogether (online supplemental appendix table S2). The ICD-10 code ‘U071’, searched in both primary and secondary positions, was used to identify records with COVID-19.
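
For readers who wish to reproduce this data preparation step, the following is a minimal sketch of one way the index NEWS2 could be derived; the data frame and column names (observations, admission_id, admission_datetime, news2_datetime) are hypothetical and are not taken from our extract.

```r
# Illustrative sketch (not our production code): derive the index NEWS2 as the
# first record within +/-24 hours of admission. All names are hypothetical.
library(dplyr)

index_news2 <- observations %>%
  mutate(hours_from_admission = as.numeric(
    difftime(news2_datetime, admission_datetime, units = "hours"))) %>%
  filter(abs(hours_from_admission) <= 24) %>%             # within +/-24 h of admission
  group_by(admission_id) %>%
  slice_min(news2_datetime, n = 1, with_ties = FALSE) %>%  # earliest qualifying record
  ungroup()

# The first blood test results would be selected analogously, with a +/-96 h window.
```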

Statistical modelling

We began with exploratory analyses, including box plots and line plots, to show the relationship between covariates and the risk of in-hospital mortality. We developed two logistic regression models, known as CARMc19_N and CARMc19_NB, to predict the risk of in-hospital mortality with the following covariates: (1) model CARMc19_N uses age+sex+COVID-19 (yes/no)+NEWS2 including subcomponents; (2) model CARMc19_NB extends model CARMc19_N with all seven blood test results and the AKI score. The primary rationale for using these variables is that they are routinely collected as part of the process of care, and their inclusion in our statistical models is on clinical grounds rather than the statistical significance of any given covariate.
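
As an illustration of the model structure only (not the exact fitted specification), the two models could be expressed in R roughly as follows; the development data frame dev and the variable names are hypothetical, and the covariate transformations described below are omitted here for brevity.

```r
# Illustrative sketch of the two logistic regression models, assuming a development
# data frame `dev` with one row per admission; variable names are hypothetical.
carmc19_n <- glm(died ~ age + sex + covid19 + news2 + resp_rate + temperature +
                   systolic_bp + diastolic_bp + pulse_rate + spo2 +
                   oxygen_supplementation + oxygen_scale + alertness,
                 family = binomial, data = dev)

# CARMc19_NB extends CARMc19_N with the seven blood test results and the AKI score.
carmc19_nb <- update(carmc19_n, . ~ . + albumin + creatinine + haemoglobin +
                       potassium + sodium + urea + white_cell_count + aki_score)
```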

We used the qladder function (Stata10), which displays the quantiles of a transformed variable against the quantiles of a normal distribution for each ladder-of-powers transformation (x³, x², x, √x, log x, 1/√x, 1/x, 1/x², 1/x³), for each continuous covariate and chose the following transformations: (creatinine)−1/2, loge(potassium), loge(white cell count), loge(urea), loge(respiratory rate), loge(pulse rate), loge(systolic blood pressure) and loge(diastolic blood pressure). We used an automated approach, based on the MASS library11 in R,12 to search all two-way interactions and incorporated those that were statistically significant (p<0.001).
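
One plausible way to carry out the transformation step and the interaction screen is sketched below; MASS::addterm is assumed here as the screening tool, and the variable names and workflow are illustrative rather than a record of our exact code.

```r
# Illustrative sketch only: apply the chosen transformations and screen all
# two-way interactions of the existing model terms with a chi-squared test.
library(MASS)

dev <- transform(dev,
  creatinine_t   = creatinine^(-0.5),
  potassium_t    = log(potassium),
  wcc_t          = log(white_cell_count),
  urea_t         = log(urea),
  resp_rate_t    = log(resp_rate),
  pulse_rate_t   = log(pulse_rate),
  systolic_bp_t  = log(systolic_bp),
  diastolic_bp_t = log(diastolic_bp))

# In practice the model would first be refitted with the transformed covariates;
# addterm then screens all candidate two-way interactions of its terms.
interaction_scan <- addterm(carmc19_nb, scope = ~ .^2, test = "Chisq")
interaction_scan   # inspect p-values; interactions with p < 0.001 were retained
```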

We developed both models using YH data (development dataset) and externally validated their performance on SH data (validation dataset). The hospitals are part of the same NHS Trust but are geographically separated by about 65 km (40 miles).

We report discrimination and calibration statistics as performance measures for these models.13

Discrimination relates to how well a model can separate—or discriminate between—those who died and those who did not and is given by the area under the receiver operating characteristics (ROC) curve (AUC) or c-statistic. The ROC curve is a plot of the sensitivity (true positive rate) versus 1−specificity (false positive rate) for consecutive predicted risks. A c-statistic of 0.5 is no better than tossing a coin, while a perfect model has a c-statistic of 1. In general, values <0.7 are considered to show poor discrimination, values of 0.7–0.8 can be described as reasonable and values >0.8 suggest good discrimination.11 The 95% CI for the c-statistic was derived using DeLong’s method as implemented in the pROC library12 in R.14
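
By way of illustration, and continuing the hypothetical objects from the earlier sketches, the external-validation c-statistic and its DeLong confidence interval could be obtained along the following lines, with val denoting the validation data frame.

```r
# Illustrative sketch: discrimination of CARMc19_NB on the validation data.
library(pROC)

val$pred_nb <- predict(carmc19_nb, newdata = val, type = "response")
roc_nb <- roc(val$died, val$pred_nb, quiet = TRUE)
auc(roc_nb)     # c-statistic (area under the ROC curve)
ci.auc(roc_nb)  # 95% CI, DeLong's method (the pROC default)
```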

Calibration measures a model’s ability to generate predictions that are, on average, close to the average observed outcome and can be readily seen on a scatter plot (y-axis=observed risk, x-axis=predicted risk). Perfect predictions should lie on the 45° line. We internally validated and assessed the calibration of all the models using the bootstrapping approach.15 16 Overall statistical performance was assessed using the scaled Brier score, which incorporates both discrimination and calibration.13 The Brier score is the mean squared difference between the actual outcome and the predicted risk of death; it is scaled by the maximum possible Brier score so that the scaled Brier score ranges from 0% to 100%. Higher values indicate superior models.
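
A minimal sketch of these two summaries on the validation data, under the same hypothetical names as above, might read:

```r
# Illustrative sketch: calibration slope and scaled Brier score for CARMc19_NB
# on the validation data (died assumed coded 0/1, pred_nb as above).
val$lp <- qlogis(val$pred_nb)                    # predicted log-odds (linear predictor)
calib_fit <- glm(died ~ lp, family = binomial, data = val)
coef(calib_fit)["lp"]                            # calibration slope (1 = perfect calibration)

brier        <- mean((val$pred_nb - val$died)^2)
brier_max    <- mean(val$died) * (1 - mean(val$died))  # Brier score of a non-informative model
scaled_brier <- 1 - brier / brier_max                   # multiply by 100 for a percentage
```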

The recommended threshold for detecting deteriorating patients and sepsis is NEWS2 ≥5.17 18 Therefore, we assessed the sensitivity, specificity, positive and negative predictive values and likelihood ratios for these models at NEWS2 thresholds of 4+, 5+ and 6+.19 We followed the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines for reporting of model development and validation.20 We used Stata10 for data cleaning and R14 for statistical analysis.
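
For completeness, the sketch below shows how these measures could be computed at the predicted-risk cut-offs corresponding to NEWS2 4+, 5+ and 6+ in the development data (0.09, 0.11 and 0.14; see figure 1); again, the data and variable names are hypothetical.

```r
# Illustrative sketch: classification measures at the predicted-risk cut-offs
# that correspond to NEWS2 thresholds of 4+, 5+ and 6+ (0.09, 0.11, 0.14).
classify_at <- function(cutoff, pred, outcome) {
  flagged <- pred >= cutoff
  tp <- sum(flagged & outcome == 1); fn <- sum(!flagged & outcome == 1)
  fp <- sum(flagged & outcome == 0); tn <- sum(!flagged & outcome == 0)
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    ppv         = tp / (tp + fp),
    npv         = tn / (tn + fn),
    lr_positive = (tp / (tp + fn)) / (fp / (fp + tn)),
    lr_negative = (fn / (tp + fn)) / (tn / (fp + tn)))
}

sapply(c(`NEWS2 4+` = 0.09, `NEWS2 5+` = 0.11, `NEWS2 6+` = 0.14),
       classify_at, pred = val$pred_nb, outcome = val$died)
```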

Results

Cohort characteristics

The number of non-elective discharges was 6444 over 3 months. For the development of CARMc19_N, we excluded 36 (0.6%) admissions because the index NEWS2 was not recorded within ±24 hours of the admission date/time, or these data were missing or not recorded at all (online supplemental appendix table S2). Likewise, for the development of CARMc19_NB, we further excluded 1189 (18.3%) admissions because the first blood test results were not recorded within ±96 hours of the admission date/time, or they were missing or not recorded at all (online supplemental appendix table S2).

The characteristics of the admissions included in our study are shown in table 1. Emergency admissions in the validation dataset were older than those in the development dataset (69.6 years vs 67.4 years), less likely to be male (49.5% vs 51.2%), had a higher index NEWS2 (3.2 vs 2.8) and a higher prevalence of COVID-19 (11.0% vs 8.7%), but similar in-hospital mortality (8.4% vs 8.2%). See the accompanying scatter plots and box plots in online supplemental appendix figures S1 to S4.

Table 1

Characteristics of emergency medical admissions in development and validation datasets

Figure 1

Receiver operating characteristic curve for computer-aided risk score (CARMc19)_N and CARMc19_NB in predicting the risk of mortality in the development dataset. Predicted probability at National Early Warning Score thresholds 4+ (0.09), 5+ (0.11), 6+ (0.14) (sensitivity, specificity).

We assessed the performance of CARMc19_N and CARMc19_NB models to predict the risk of in-hospital mortality in emergency medical admissions (see table 2 and figure 1 for validation results and online supplemental appendix table S3 and figure S7 for model development results).

Table 2

Performance of CARMc19_N and CARMc19_NB models for predicting the risk of mortality for patients with COVID-19 and patients without COVID-19 in validation dataset

The c-statistic for predicting mortality for CARMc19_NB was slightly higher than that for CARMc19_N in the development dataset (CARMc19_NB=0.87 (95% CI 0.85 to 0.89) vs CARMc19_N=0.86 (95% CI 0.84 to 0.87)) and in the validation dataset (CARMc19_NB=0.88 (95% CI 0.86 to 0.90) vs CARMc19_N=0.86 (95% CI 0.83 to 0.88)).

The c-statistics for predicting mortality were lower for patients with COVID-19 than for patients without COVID-19 (CARMc19_NB: 0.78 (95% CI 0.71 to 0.84) vs 0.87 (95% CI 0.84 to 0.90); CARMc19_N: 0.75 (95% CI 0.69 to 0.81) vs 0.83 (95% CI 0.79 to 0.86)).

Internal validation of both models is shown in online supplemental appendix figure S6. Both models had good internal and external calibration (CARMc19_NB: 1.01 (95% CI 0.88 to 1.14); CARMc19_N: 0.95 (95% CI 0.83 to 1.06)) (see table 2 and figure 2).

Figure 2

External validation of the computer-aided risk score (CARMc19)_N and CARMc19_NB models for predicting the risk of mortality. We limit the risk of mortality to 0.30 for visualisation purposes because beyond this point we have few patients.

Table 3 includes the sensitivity, specificity, positive and negative predictive values for the CARMc19_N and CARMc19_NB models for predicting mortality at NEWS2 thresholds of 4+, 5+ and 6+. At all NEWS2 thresholds (4+, 5+, 6+), model CARMc19_NB had better sensitivity (development dataset: 76% vs 72%; 71% vs 67%; 65% vs 61%; validation dataset: 79% vs 73%; 75% vs 68%; 69% vs 61%) and similar specificity (development dataset: 81% vs 82%; 86% vs 86%; 89% vs 90%; validation dataset: 80% vs 82%; 85% vs 86%; 88% vs 89%) (table 3 and online supplemental appendix table S4).

Table 3

Sensitivity, specificity and predictive values of the CARMc19_N and CARMc19_NB models in the validation dataset for predicting the risk of mortality at NEWS2 thresholds 4+ (0.09), 5+ (0.11) and 6+ (0.14), expressed as the corresponding predicted risk of mortality in the development dataset

Discussion

In this study, we developed and validated two models (CARMc19_N and CARMc19_NB) to predict the risk of in-hospital mortality with the following covariates: (1) CARMc19_N uses age+sex+COVID-19 (yes/no)+NEWS2 including subcomponents; (2) CARMc19_NB extends model CARMc19_N with all seven blood test results and the AKI score (online supplemental appendix figure S5). We found that the CARMc19 scores have good performance characteristics, and our findings tentatively suggest that a NEWS2 threshold of 5+ strikes a reasonable balance between sensitivity and specificity. CARMc19_NB was more sensitive than CARMc19_N, with similar specificity.

CARMc19 scores performed better than our previous CARM models6 7 because of additional NEWS2 variables (oxygen flow rate and oxygen scale 2) and COVID-19 status. A recent systematic review identified models to predict mortality from COVID-19 with c-statistics that ranged from 0.87 to 1.21 However, despite these high c-statistics, the review authors cautioned against the use of these models in clinical practice because of the high risk of bias and poor reporting of studies which are likely to have led to optimistic results.21 In contrast, our approach follows rigorous methodological standards for the development of risk scores.22–24

The main advantages of our models are that they are designed to incorporate data which are already available in the patient’s electronic health record, thus placing no additional data collection or computational burden on clinicians, and are readily automated. Nonetheless, we emphasise that our CARMc19 scores are not designed to replace clinical judgement. They are intended to support, not subvert, the clinical decision-making process and can always be overridden by clinical concern.5 25 The working hypothesis for our models is that they may enhance situational awareness of mortality risk by processing information already available, without impeding the workflow of clinical staff, especially as our approach offers a faster and less expensive assessment of in-hospital mortality risk than current laboratory tests and may be more practical to use for large numbers of people.

There are limitations to our study. We identified COVID-19 based on the ICD-10 code ‘U071’, which was determined by COVID-19 swab test results (hospital or community) and clinical judgement, so our findings are constrained by the accuracy of these methods.26 27 This does, however, allow the algorithm to take account of the entry of diagnostic information by the clinician, including radiology findings, as input variables if the swab result is negative. The systematically lower c-statistics for COVID-19 admissions require further study. There are several candidate hypotheses, which stem from the complex pathology of COVID-19: it can produce an inflammatory response (sepsis) and coagulopathy (leading to sudden pulmonary embolism or arterial thrombosis). It is known that NEWS(2) is inadequate in monitoring hospital patients at risk of neurological deterioration, and this may also apply, to some extent, to COVID-19. Also, COVID-19 status may have a longer ‘sell-by date’: a PCR test may remain positive for up to 90 days after the initial infection and may therefore overestimate risk if the patient is admitted while still testing positive when the COVID-19 episode is effectively over. Conversely, the physiological and pathological variables are unlikely to reflect future risk if mortality is secondary to a sudden event such as venous thromboembolism. COVID-19 diagnosis may also be determined by clinical diagnosis (as well as a positive PCR test), whereas the other variables in our models are measurements (also subject to error, but less so than a diagnostic category).

We used the index NEWS2 data in our models, but vital signs and blood test results are repeatedly updated for each patient according to hospital protocols. Although we developed the models using one hospital’s data and validated them on another hospital’s data, the extent to which changes in vital signs over time reflect changes in mortality risk, and how such changes should be incorporated into our models, requires further study. Our two hospitals are part of the same NHS Trust and this may undermine the generalisability of our findings, which merit further external validation.

Although we focused on in-hospital mortality (because we aimed to aid clinical decision making in the hospital), the impact of this selection bias needs to be assessed by capturing out-of-hospital mortality by linking death certification data and hospital data. CARMc19, like other risk scores, can only be an aid to the decision-making process of clinical teams11 28 and its usefulness in clinical practice remains to be seen.

The next phase of this work is to field test the CARMc19 scores by carefully engineering them into routine clinical practice to see if they do enhance the quality of care for acutely ill patients, while noting any unintended consequences.

Conclusion

We have developed and validated a risk predictor (the CARMc19 scores) with good performance characteristics for predicting the risk of in-hospital mortality following emergency medical admission during the pandemic, when a significant proportion of the patient cohort was presenting with COVID-19. Since presenting the CARMc19 scores to the clinicians caring for the patient places no additional data collection burden on them and is readily automated, the scores were carefully introduced into the electronic care record for clinicians caring for patients with COVID-19 in the hospital during the second phase of the pandemic.


Ethics statements

Patient consent for publication

Ethics approval

This study was deemed to be exempt from ethical approval because it was classified as an evaluation. Furthermore, this study used already de-identified data from an ongoing study involving NEWS, which received ethical approval from Health Research Authority (HRA) and Health and Care Research Wales (HCRW) (reference number 19/HRA/0548).

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Twitter @dr_m_faisal, @dzrichar

  • Contributors DR and MAM had the original idea for the work. KB, RH provided the data extracts. MF undertook the statistical analyses with support from MAM. MF, MAM, and DR wrote the first draft of the paper. DR provided clinical perspectives. All others contributed to the final paper and have approved the final version. DR & MF will act as study guarantors.

  • Funding This research was supported by the Health Foundation. The Health Foundation is an independent charity working to improve the quality of healthcare in the UK. This research was also supported by the National Institute for Health Research (NIHR) Yorkshire and Humber Patient Safety Translational Research Centre (NIHR Yorkshire and Humber PSTRC).

  • Disclaimer The views expressed in this article are those of the author(s) and not necessarily those of the NHS, the Health Foundation, the NIHR or the Department of Health.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.