Objectives There are no established mortality risk equations specifically for emergency medical patients who are admitted to a general hospital ward. Such risk equations may be useful in supporting the clinical decision-making process. We aim to develop and externally validate a computer-aided risk of mortality (CARM) score by combining the first electronically recorded vital signs and blood test results for emergency medical admissions.
Design Logistic regression model development and external validation study.
Setting Two acute hospitals (Northern Lincolnshire and Goole NHS Foundation Trust Hospital (NH)—model development data; York Hospital (YH)—external validation data).
Participants Adult (aged ≥16 years) medical admissions discharged over a 24-month period with electronic National Early Warning Score(s) and blood test results recorded on admission.
Results The risk of in-hospital mortality following emergency medical admission was 5.7% (NH: 1766/30 996) and 6.5% (YH: 1703/26 247). The C-statistic for the CARM score in NH was 0.87 (95% CI 0.86 to 0.88) and was similar in an external hospital setting YH (0.86, 95% CI 0.85 to 0.87) and the calibration slope included 1 (0.97, 95% CI 0.94 to 1.00).
Conclusions We have developed a novel, externally validated CARM score with good performance characteristics for estimating the risk of in-hospital mortality following an emergency medical admission using the patient’s first, electronically recorded, vital signs and blood test results. Since the CARM score places no additional data collection burden on clinicians and is readily automated, it may now be carefully introduced and evaluated in hospitals with sufficient informatics infrastructure.
- computer aided risk score
- hospital mortality
- vital signs and blood test
- national early warning score
- emergency admission
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
This study provides a novel computer-aided risk of mortality (CARM) score by combining the first electronically recorded vital signs and blood test results for emergency medical admissions.
CARM is externally validated and places no additional data collection burden on clinicians and is readily automated.
About 20%–30% of admissions do not have both National Early Warning Score(s) and blood test results and so CARM is not applicable to these admissions.
Unplanned or emergency medical admissions to hospital involve patients with a broad spectrum of disease and illness severity.1 The appropriate early assessment and management of such admissions can be a critical factor in ensuring high-quality care.2 A number of scoring systems have been developed which may support this clinical decision-making process, but few have been externally validated.1 We propose to develop a computer-aided risk of in-hospital mortality score following emergency medical admission that automatically combines two routinely collected, electronically recorded clinical datasets: vital signs and blood test results. There is some evidence to suggest that the results of routinely undertaken blood tests and/or vital signs data may be useful in predicting the risk of death.1
In the UK National Health Service (NHS), the patient’s vital signs are monitored and summarised into a National Early Warning Score(s) (NEWS) that is mandated by the Royal College of Physicians (London).3 NEWS is derived from seven physiological variables or vital signs: respiration rate, oxygen saturations, any supplemental oxygen, temperature, systolic blood pressure, heart rate and level of consciousness (alert, voice, pain, unresponsive). These are routinely collected by nursing staff as an integral part of the process of care, usually for all patients, and then repeated thereafter depending on local hospital protocols.3 The use of NEWS is relevant because ‘patients die not from their disease but from the disordered physiology caused by the disease’.4 NEWS points are allocated according to basic clinical observations and the higher the NEWS the more likely it is that the patient is developing a critical illness (see online supplementary material for further details of the NEWS). The clinical rationale for NEWS is that early recognition of deterioration in the vital signs of a patient can provide opportunities for earlier, more effective intervention. Furthermore, studies have shown that electronically collected NEWS are highly reliable and accurate when compared with paper-based methods.5–8
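To make the scoring concrete, the following is a minimal sketch of how the seven vital signs could be aggregated into a NEWS value. The band thresholds are our reading of the 2012 Royal College of Physicians NEWS chart and are not given in this paper, so they should be checked against the official chart before any use; the function and constant names are hypothetical.

```python
INF = float("inf")

# (upper bound, points) bands; thresholds are an assumption taken from the
# 2012 RCP NEWS chart, not from this paper.
RESP_RATE = [(8, 3), (11, 1), (20, 0), (24, 2), (INF, 3)]
SPO2 = [(91, 3), (93, 2), (95, 1), (INF, 0)]
TEMPERATURE = [(35.0, 3), (36.0, 1), (38.0, 0), (39.0, 1), (INF, 2)]
SYSTOLIC_BP = [(90, 3), (100, 2), (110, 1), (219, 0), (INF, 3)]
HEART_RATE = [(40, 3), (50, 1), (90, 0), (110, 1), (130, 2), (INF, 3)]


def band_score(value, bands):
    """Return the points of the first band whose upper bound is >= value."""
    for upper, points in bands:
        if value <= upper:
            return points
    raise ValueError("value outside all bands")


def news_score(resp_rate, spo2, on_oxygen, temperature,
               systolic_bp, heart_rate, avpu):
    """Sum the points for the seven NEWS components."""
    if avpu not in ("A", "V", "P", "U"):
        raise ValueError("avpu must be one of A, V, P, U")
    total = band_score(resp_rate, RESP_RATE)
    total += band_score(spo2, SPO2)
    total += 2 if on_oxygen else 0          # any supplemental oxygen
    total += band_score(temperature, TEMPERATURE)
    total += band_score(systolic_bp, SYSTOLIC_BP)
    total += band_score(heart_rate, HEART_RATE)
    total += 0 if avpu == "A" else 3        # any reduced consciousness
    return total
```

Under these assumed bands the maximum attainable score is 20, which agrees with the maximum NEWS value stated in the Methods.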
Blood tests are an integral part of clinical medicine, and are routinely undertaken during a patient’s stay in hospital. Typically, routine blood tests consist of a core list of seven biochemical and haematological tests (albumin, creatinine, potassium, sodium, urea, haemoglobin, white blood cell count) and, in the absence of contraindications and subject to patient consent, almost all patients admitted to hospital undergo these tests on admission. Furthermore, in the UK NHS creatinine blood test results are now used to identify patients at risk of acute kidney injury (AKI),9 which is an important cause of avoidable patient harm.10
In this paper, we investigate the extent to which the vital signs and blood test results of acutely ill patients can be used to predict the risk of in-hospital mortality following emergency admission to hospital. Our aim is to develop and validate an automated, computer-aided risk of mortality (CARM) model, using the patient’s first, electronically recorded, vital signs and blood test results, which are usually available within a few hours of emergency admission without requiring any additional data items or prompts from clinicians. CARM, therefore, is designed for use in hospitals with sufficient informatics infrastructure.
Setting and data
Our cohorts of emergency medical admissions are from three acute hospitals which are approximately 100 km apart in the Yorkshire and Humberside region of England—the Diana, Princess of Wales Hospital (n~400 beds) and Scunthorpe General Hospital (n~400 beds) managed by the Northern Lincolnshire and Goole NHS Foundation Trust (NLAG) and York Hospital (YH) (n~700 beds) (managed by York Teaching Hospitals NHS Foundation Trust). The data from the two acute hospitals from NLAG are combined because this reflects how the hospitals are managed and are referred to as NLAG Hospitals (NH), which essentially places our study in two acute hospitals. Our study hospitals (NH and YH) have been exclusively using electronic NEWS scoring since at least 2013 as part of their in-house electronic patient record systems. We chose these hospitals because they had electronic NEWS, which are collected as part of the patient’s process of care and were agreeable to the study. We did not approach any other hospital.
We considered all adult (aged ≥16 years) emergency medical admissions, discharged during a 24-month period (1 January 2014 to 31 December 2015), with blood test results and NEWS. For each admission, we obtained a pseudonymised patient identifier, the patient’s age (years), sex (male/female), discharge status (alive/dead), admission and discharge date and time and electronic NEWS. The NEWS ranged from 0 (indicating the lowest severity of illness) to 19 (the maximum NEWS value possible is 20). The admission/discharge date and electronically recorded NEWS are date and time stamped and the index NEWS was defined as the first electronically recorded score within ±24 hours of the admission time. The first blood test results were defined as the first full set of blood test results recorded within 4 days (96 hours) of admission (>90% of blood test results were within ±24 hours of admission—see online supplementary table S1).
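The selection of the index observations described above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the blood test window is taken as ±96 hours around admission, which is how the Results section states it.

```python
from datetime import datetime, timedelta


def first_within(admitted_at, timestamps, before, after):
    """Earliest timestamp in [admitted_at - before, admitted_at + after], or None."""
    lower, upper = admitted_at - before, admitted_at + after
    eligible = [t for t in timestamps if lower <= t <= upper]
    return min(eligible) if eligible else None


def index_news_time(admitted_at, news_times):
    """First electronically recorded NEWS within +/-24 hours of admission."""
    window = timedelta(hours=24)
    return first_within(admitted_at, news_times, window, window)


def first_blood_time(admitted_at, blood_times):
    """First full set of blood results within +/-96 hours of admission."""
    window = timedelta(hours=96)
    return first_within(admitted_at, blood_times, window, window)
```

An admission for which either function returns None would fall into the excluded group with missing or delayed data.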
For model development purposes, we were unable to consider emergency admissions without complete blood test results and NEWS recorded; this constituted 16.5% (6104/37 100) of records in NH and 28.6% (10 504/36 751) of records in YH. We excluded records for the following reasons: (1) records where the first NEWS was after 24 hours of admission and/or (2) where the first blood test was after 4 days of admission because these ‘delayed’ data were considered less likely to reflect the sickness profile of patients on admission. Moreover, the first blood test results were usually recorded several hours before the formal time of admission because blood tests can be ordered in the emergency department before formal admission (see online supplementary figure S1).
Development of a CARM score
We began with exploratory analyses including line plots and box plots that showed the relationship between covariates and risk of in-hospital death in our hospitals. We developed a logistic regression model, known as CARM, to predict the risk of in-hospital death with the following covariates: age (years), sex (male/female), NEWS (including its components, plus diastolic blood pressure, as separate covariates), blood test results (albumin, creatinine, haemoglobin, potassium, sodium, urea and white cell count) and AKI score. The primary rationale for using these variables is that they are routinely collected as part of the process of care and their inclusion in our statistical models is on clinical grounds as opposed to the statistical significance of any given covariate. The widespread use of these variables in routine clinical care means that our model is more likely to be generalisable to other settings.
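As a reminder of how a logistic regression model of this kind turns covariates into a predicted risk, the sketch below applies the inverse logit to a linear predictor. The function name, the covariate names and any coefficient values passed to it are hypothetical; the fitted CARM coefficients are reported only in the online supplementary material and are not reproduced here.

```python
import math


def predicted_risk(intercept, coefficients, covariates):
    """Inverse-logit of intercept + sum(coefficient * covariate value)."""
    logit = intercept + sum(
        coefficients[name] * covariates[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-logit))
```

A linear predictor of 0 corresponds to a predicted risk of 50%, and each unit increase in the linear predictor multiplies the odds of death by e.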
We used the qladder function (Stata, StataCorp, 2014), which displays the quantiles of a transformed variable against the quantiles of a normal distribution according to the ladder of powers, for each continuous covariate and chose the following transformations: (creatinine)^(−1/2), log_e(potassium), log_e(white cell count), log_e(urea), log_e(respiratory rate), log_e(pulse rate), log_e(systolic blood pressure) and log_e(diastolic blood pressure). We used an automated approach, implemented in the MASS library11 in R,12 to search for all two-way interactions and incorporated those interactions which were statistically significant (p<0.001).
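The chosen transformations can be written out as a short sketch. The dictionary keys are hypothetical names for the raw covariates, and the sketch assumes all inputs are strictly positive, as the logarithm and inverse square root require.

```python
import math

# Covariates that are log_e-transformed in the text above (key names assumed).
LOG_COVARIATES = (
    "potassium", "white_cell_count", "urea", "respiratory_rate",
    "pulse_rate", "systolic_bp", "diastolic_bp",
)


def transform_covariates(raw):
    """Apply creatinine^(-1/2) and natural-log transformations."""
    transformed = {"creatinine_inv_sqrt": raw["creatinine"] ** -0.5}
    for name in LOG_COVARIATES:
        transformed["log_" + name] = math.log(raw[name])
    return transformed
```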
We developed the CARM model to predict the risk of in-hospital mortality following emergency medical admission using data from NH (the development dataset) and we externally validated this model, reporting discrimination and calibration characteristics,13 using data from another hospital (YH) (the external validation dataset). The data from YH are not used for model development but as an external validation dataset only. We internally validated the CARM using a bootstrapping method that is implemented in the rms library14 in R to estimate statistical optimism.13 14
Discrimination relates to how well a model can separate (or discriminate between) those who died and those who did not. Calibration measures a model’s ability to generate predictions that are on average close to the average observed outcome. Overall statistical performance was assessed using the scaled Brier score, which incorporates both discrimination and calibration.13 The Brier score is the mean squared difference between actual outcomes and predicted risk of death, scaled by the maximum Brier score such that the scaled Brier score ranges from 0% to 100%. Interpretation of the scaled Brier score is similar to R². Higher values indicate superior models. Calibration is the relationship between the observed and predicted risk of death and can be readily seen on a scatter plot (y-axis observed risk, x-axis predicted risk). Perfect predictions should be on the 45° line. The intercept (a) and slope (b) of this line give an assessment of ‘calibration-in-the-large’.15 At model development, a=0 and b=1, but at validation, calibration-in-the-large problems are indicated if a is not 0 and if b is more/less than 1 as this reflects problems of under/over prediction.16
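A minimal sketch of the scaled Brier score follows. It assumes the common definition in which the maximum Brier score is that of a non-informative model predicting the observed prevalence for every patient, so that scaled Brier = 1 − Brier/Brier_max; the paper does not spell out the formula, and the function names are hypothetical.

```python
def brier_score(outcomes, risks):
    """Mean squared difference between outcomes (0/1) and predicted risks."""
    return sum((p - y) ** 2 for y, p in zip(outcomes, risks)) / len(outcomes)


def scaled_brier(outcomes, risks):
    """1 - Brier / Brier_max, with Brier_max = prevalence * (1 - prevalence)."""
    prevalence = sum(outcomes) / len(outcomes)
    brier_max = prevalence * (1 - prevalence)
    if brier_max == 0:
        raise ValueError("outcomes must include both deaths and survivors")
    return 1 - brier_score(outcomes, risks) / brier_max
```

Under this definition a model that is no better than predicting the prevalence scores 0 and a perfect model scores 1 (100%); a model worse than the prevalence-only model would produce a negative value.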
The concordance statistic (C-statistic) is a commonly used measure of discrimination. For a binary outcome, the C-statistic is the area under the receiver operating characteristics (ROC) curve. The ROC curve is a plot of the sensitivity (true positive rate) versus 1−specificity (false positive rate), for consecutive predicted risks.13 The area under the ROC curve is interpreted as the probability that a randomly chosen deceased patient has a higher predicted risk of death than a randomly chosen non-deceased patient. A C-statistic of 0.5 is no better than tossing a coin, while a perfect model has a C-statistic of 1. The higher the C-statistic, the better the model. In general, values <0.7 are considered to show poor discrimination, values of 0.7–0.8 can be described as reasonable and values >0.8 suggest good discrimination.17 The 95% CI for the C-statistic was derived using DeLong’s method as implemented in the pROC library18 in R.12 Box plots showing the risk of death for those discharged alive and dead are a simple way to visualise the discrimination of each model. The difference in the mean predicted risk of death for those who were discharged alive and dead is a measure of the discrimination slope. The higher the slope, the better the discrimination.13 We followed the TRIPOD guidelines for model development and validation.19 All analyses were carried out using R12 and Stata.
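The probabilistic interpretation of the C-statistic and the definition of the discrimination slope can be illustrated directly. The sketch below compares every deceased/non-deceased pair, counting ties as half; it is a brute-force illustration with hypothetical function names, not the pROC implementation used in the analysis, and it does not compute the DeLong confidence interval.

```python
def c_statistic(outcomes, risks):
    """Proportion of (dead, alive) pairs in which the dead patient has the
    higher predicted risk; tied pairs count as one half."""
    dead = [p for y, p in zip(outcomes, risks) if y == 1]
    alive = [p for y, p in zip(outcomes, risks) if y == 0]
    if not dead or not alive:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for d in dead:
        for a in alive:
            if d > a:
                concordant += 1.0
            elif d == a:
                concordant += 0.5
    return concordant / (len(dead) * len(alive))


def discrimination_slope(outcomes, risks):
    """Mean predicted risk in the dead minus mean predicted risk in the alive."""
    dead = [p for y, p in zip(outcomes, risks) if y == 1]
    alive = [p for y, p in zip(outcomes, risks) if y == 0]
    return sum(dead) / len(dead) - sum(alive) / len(alive)
```

The pairwise loop is quadratic in the number of admissions, so for cohorts of the size studied here a rank-based calculation would be preferable in practice.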
Patient and public involvement
A workshop with a patient and service user group, linked to the University of Bradford, was involved at the start of this project to co-design the agenda for the patient and staff focus groups, which were subsequently held at each hospital site. Patients were invited to attend the patient focus group through existing patient and public involvement groups. The criterion used for recruitment to these focus groups was any member of the public who had been a patient or carer in the last 5 years. The patient and public voice continued to be included throughout the project with three patient representatives invited to sit on the project steering group. Participants will be informed of the results of this study through the patient and public involvement leads at each hospital site and the project team have met with the Bradford patient and service user group to discuss the results.
We considered emergency medical admissions in each hospital (NH: n=37 100, YH: n=36 751) over the 24-month period. Of these, 16.5% (6104/37 100) in NH and 28.6% (10 504/36 751) in YH were not eligible for our study because they did not have NEWS recorded within ±24 hours of admission and/or a full complement of blood test results within ±96 hours of admission (see table 1, online supplementary table S1 and figure S1). At YH, 24.2% of records were excluded because no or incomplete blood test results were recorded compared with only 10% in NH. Exclusions due to lack of NEWS data were less marked between YH and NH (see online supplementary table S2 for characteristics of emergency admissions with incomplete data).
The in-hospital mortality was 5.7% (1766/30 996) in NH and 6.5% (1703/26 247) in YH. The age, sex, NEWS and blood test results profile is shown in table 2. Admissions in YH were older, with higher NEWS, higher AKI scores (AKI stage 3 is more common than stage 2 in YH) but higher albumin blood test results than NH. YH has a renal unit whereas NH does not.
Online supplementary figures S2 to S5 show box plots and line plots for each continuous (untransformed) covariate that was included in the CARM model for NH and YH, respectively. The box plots (see online supplementary figures S2 and S3) show a similar pattern in each hospital. Compared with patients discharged alive, the deceased patients were older, with lower albumin, haemoglobin and sodium values and higher creatinine, potassium, white cell count and urea values. NEWS was higher in deceased patients compared with patients discharged alive, as respiratory rate and pulse rate were higher in deceased patients. However, the temperature, blood pressure and oxygen saturation were lower in deceased patients. The line plots in online supplementary figures S4 and S5 show that the relationship between a given continuous covariate and the risk of death is similar in each hospital.
Statistical modelling of CARM
We assessed the performance of the CARM model to predict the risk of in-hospital mortality. The model coefficients in logit scale with examples are shown in online supplementary table S3. Table 3 shows the performance of the model in the development and validation dataset. Figure 1 shows the ROC plots of CARM in the development and validation datasets (see online supplementary figure S6 for ROC plots comparing CARM vs NEWS). The C-statistic was high in both the development dataset (0.87, 95% CI 0.86 to 0.88) and the external validation dataset (0.86, 95% CI 0.85 to 0.87). Likewise, the scaled Brier score and discrimination were similar in the development and external validation datasets. The calibration slope is 0.97 (95% CI 0.94 to 1.00), which is good (see online supplementary figure S7). The final CARM model, which is not intended for paper-based use, is shown in the online supplementary figure S7.
We excluded 10.0% (NH) and 24.2% (YH) of emergency admissions from the development and validation dataset, respectively, because they had no or an incomplete set of blood test results reported. We examined the performance of the CARM model in these excluded records by first imputing age and sex-specific median blood test results, and then applying the CARM model to these admissions only. The last column in table 3 shows the subsequent C-statistics in these imputed records only. The C-statistics for these imputed records were not markedly different in the development and validation dataset (see online supplementary figure S8 for corresponding ROC plots).
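The imputation step can be sketched as below. The paper does not state how age was grouped, so the 10-year age bands here are an assumption, as are the function names and the record layout (a dictionary per admission with "age", "sex" and one key per blood test, with None marking a missing result).

```python
from collections import defaultdict
from statistics import median


def age_band(age, width=10):
    """Hypothetical age grouping: 10-year bands (not specified in the paper)."""
    return int(age) // width


def impute_medians(complete, incomplete, tests):
    """Fill missing blood results with the age-band and sex-specific median
    taken from admissions that have complete results."""
    pools = defaultdict(list)
    for rec in complete:
        group = (age_band(rec["age"]), rec["sex"])
        for test in tests:
            pools[(group, test)].append(rec[test])

    imputed = []
    for rec in incomplete:
        group = (age_band(rec["age"]), rec["sex"])
        filled = dict(rec)  # leave the input record unchanged
        for test in tests:
            if filled.get(test) is None:
                # statistics.median raises StatisticsError if the group is empty
                filled[test] = median(pools[(group, test)])
        imputed.append(filled)
    return imputed
```

Observed values are kept as recorded; only the missing items are replaced, which mirrors the simple, pragmatic intent described in the text.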
Table 4 shows the sensitivity, specificity and positive and negative predictive values along with likelihood ratios (LR+/LR−) for a selected range of cut-off values for the risk of dying, which tentatively suggests that a threshold risk of 8% provides a reasonable balance between sensitivity (around 70%) and specificity (>80% in development and validation datasets; see table 4 and online supplementary figure S9). Furthermore, the CARM model performance is good in each hospital in various subgroups such as by sex, age, seasons, longer versus shorter length of stay admissions, day of the week and 16 Charlson Comorbidity Index (CCI) disease groups (see online supplementary table S4).
We have shown that it is feasible to use the first electronically recorded vital signs and blood test results of an emergency medical patient to predict the risk of in-hospital mortality following emergency medical admission. We developed our CARM model in one hospital and externally validated it using data from another hospital. We found that CARM has good performance and our findings tentatively suggest that a cut-off of 8% predicted risk of in-hospital mortality appears to strike a reasonable balance between sensitivity and specificity.
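The quantities reported for each cut-off can be derived from a 2×2 table of predicted versus observed outcome, as in the sketch below. Treating a predicted risk at or above the cut-off as a positive prediction is an assumption (the paper does not say whether the boundary is inclusive), and the function name is hypothetical.

```python
def threshold_metrics(outcomes, risks, cutoff):
    """Sensitivity, specificity, PPV, NPV, LR+ and LR- at a risk cut-off.
    A predicted risk >= cutoff is treated as a positive prediction."""
    tp = fp = tn = fn = 0
    for died, risk in zip(outcomes, risks):
        flagged = risk >= cutoff
        if flagged and died:
            tp += 1
        elif flagged and not died:
            fp += 1
        elif not flagged and died:
            fn += 1
        else:
            tn += 1
    # Note: a zero cell count in any denominator raises ZeroDivisionError.
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }
```

For example, `threshold_metrics(outcomes, risks, 0.08)` would evaluate the 8% threshold discussed above on any pair of observed outcomes and predicted risks.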
While several previous studies1 have used blood test results20–27 or patient physiology28 29 to predict the risk of in-hospital mortality, few studies have combined these two data sources2 30–32 and even fewer reported external validation.1 Our study is based on data from two different hospitals with material differences in recording of blood test results but still yielding similar performance of CARM. This suggests that our approach, which merits further study, may be generalisable to other UK NHS hospitals with electronically recorded blood test results and NEWS, especially as the use of NEWS in the UK NHS is mandated and that our approach does not rely on reference ranges from blood tests which can vary between hospitals. Indeed, a recent paper with sepsis as the outcome variable also showed promising results by combining the first blood test results and NEWS.33
There are a number of limitations in our study. There appears to be a systematic difference in the prevalence of oxygen supplementation in the development and validation datasets, which may warrant further investigation. However, the prevalence ratios (dead/alive) are similar in both groups (2.77 and 3.29 for NH and YH, respectively) and therefore this should have no significant detrimental effect on the validity of our model. Although we focused on in-hospital mortality (because we aimed to aid clinical decision making in the hospital), the impact of this selection bias needs to be assessed by capturing out-of-hospital mortality by linking death certification data and hospital data. CARM, like other risk scores, can only be an aid to the decision-making process of clinical teams1 17 and its usefulness in clinical practice remains to be seen. We found that up to about a quarter of emergency medical admissions had no (or an incomplete set of) recorded blood test results, for whom we tested a simple median imputation strategy without knowing why such data were missing. We found that the performance of CARM did not materially deteriorate in these admissions. We do not suggest that our imputation method is an optimal imputation strategy. Rather, we offer it as a simple, pragmatic, preliminary imputation strategy, which is akin to the AKI detection algorithm which also imputes the median creatinine value where required.34 Further work on how to optimally address the issue of missing data is required.
We did not undertake an imputation exercise for patients with no recorded NEWS because they constituted a much smaller proportion of missing data (<5%), and NEWS is not recommended in patients requiring immediate resuscitation, direct admission to intensive care, and patients with end-stage renal failure or with acute intracranial conditions.35 We have used the first set of electronically recorded vital signs and blood test results to develop CARM, but updating CARM scores in real-time when new data become available is likely to be important to clinical teams and so warrants further study. Finally, our external validation was undertaken by the same research team in a similar context of the NHS. Further external validation by different research teams in different settings would be useful.
We have designed CARM to be used in hospitals with sufficient informatics infrastructure (eg, electronic health records).36 37 CARM is not targeting specific emergency medical patients only. Rather, we are seeking to raise situational awareness of the risk of death in-hospital as early as possible, without requiring any additional data items or prompts from clinicians. While we have demonstrated that CARM has potential, we have yet to test its use in routine clinical practice. This is important because we need to demonstrate that CARM does more ‘good’ than ‘harm’ in practice.36 37 For example, while routine blood tests are not indicated in a considerable number of emergency medical admissions, it is nevertheless possible that for a given patient, some clinicians (eg, less experienced) may be tempted to order routine blood tests so that they can obtain a CARM score to support their clinical decision-making process. So, the next phase of this work is to field test CARM by carefully engineering it into routine clinical practice to see if it does enhance the quality of care for acutely ill patients, while noting any unintended consequences.
We have developed a novel, externally validated CARM model, with good performance for estimating the risk of in-hospital mortality following emergency medical admission using the patient’s first, electronically recorded, vital signs and blood test results. Since CARM places no additional data collection burden on clinicians and is readily automated, it may now be carefully introduced and evaluated in hospitals with electronic health records.
Contributors MAM and DR had the original idea for this work. NJ was overall study coordinator with JeD as local NLAG coordinator. MF, AJS and MAM undertook the statistical analyses. JuD, CM and NJ are leads for qualitative studies. RH and KB extracted the necessary data frames. DR, MM and KS gave a clinical perspective. MAM and MF wrote the first draft of this paper and all authors subsequently assisted in redrafting and have approved the final version. MAM will act as guarantor.
Funding This research was supported by the Health Foundation. The Health Foundation is an independent charity working to improve the quality of healthcare in the UK. This research was supported by the National Institute for Health Research (NIHR) Yorkshire and Humberside Patient Safety Translational Research Centre (NIHR YHPSTRC).
Disclaimer The views expressed in this article are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care.
Competing interests None declared.
Patient consent Not required.
Ethics approval This study received ethical approval from The Yorkshire & Humberside Leeds West Research Ethics Committee on 17 September 2015 (ref. 173753), with NHS management permissions received January 2016.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Our data sharing agreement with the two hospitals (York hospital & NLAG hospital) does not permit us to share this data with other parties. Nonetheless if anyone is interested in the data, then they should contact the R&D offices at each hospital in the first instance.