Objectives To compare the performance of a validated automatic computer-aided risk of mortality (CARM) score versus medical judgement in predicting the risk of in-hospital mortality for patients following emergency medical admission.
Design A prospective study.
Setting Consecutive emergency medical admissions in York hospital.
Participants Elderly medical admissions in one ward were assigned a risk of death at the first post-take ward round by consultant staff over a 2-week period. The consultant medical staff used the same variables to assign a risk of death to the patient as the CARM (age, sex, National Early Warning Score and blood test results) but also had access to the clinical history, examination findings and any immediately available investigations such as ECGs. The performance of the CARM versus consultant medical judgement was compared using the c-statistic and the positive predictive value (PPV).
Results The in-hospital mortality was 31.8% (130/409). For patients with complete blood test results, the c-statistic for CARM was 0.75 (95% CI: 0.69 to 0.81) versus 0.72 (95% CI: 0.66 to 0.78) for medical judgements (p=0.28). For patients with at least one missing blood test result, the c-statistics were similar (medical judgements 0.70 (95% CI: 0.60 to 0.81) vs CARM 0.70 (95% CI: 0.59 to 0.80)). At a 10% mortality risk, the PPV for CARM was higher than medical judgements in patients with complete blood test results, 62.0% (95% CI: 51.2 to 71.9) versus 49.2% (95% CI: 39.8 to 58.5) but not when blood test results were missing, 50.0% (95% CI: 24.7 to 75.3) versus 53.3% (95% CI: 34.3 to 71.7).
Conclusions CARM is comparable with medical judgements in discriminating in-hospital mortality following emergency admission to an elderly care ward. CARM may have a promising role in supporting medical judgements in determining the patient’s risk of death in hospital. Further evaluation of CARM in routine practice is required.
- computer-aided risk score
- medical judgement
- emergency medical admission
This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.
Strengths and limitations of this study
This study compares a novel computer-aided risk of mortality (CARM) score versus medical judgement in predicting the risk of in-hospital mortality.
Consecutive emergency admissions to an elderly care ward in one hospital were assigned a risk of death at the first post-take ward round by consultant staff.
The consultant medical staff used the same variables to assign a risk of death to the patient as the CARM (age, sex, National Early Warning Score and blood test results) but also had access to the clinical history, examination findings and any immediately available investigations such as ECGs.
For the one-fourth of admissions with one or more blood test results missing, CARM with imputed blood test results performed similarly to medical judgement.
Over the past few decades, numerous scoring systems have been developed to estimate the risk of mortality in hospital settings including intensive care medicine, emergency medicine1 and, to a lesser extent, general medical wards.2 Despite the preponderance of scoring systems, systematic reviews2 have highlighted a lack of robust evaluation of risk scoring systems and only a few studies3–5 have assessed their accuracy versus medical judgements in routine clinical settings. This is important because if the risk score is found not to perform well when compared with medical judgements, this would call into question the benefit of using the score in routine clinical practice. In a review of 12 studies in intensive care, Sinuff et al 6 found that physicians were better able to discriminate between survivors and non-survivors than scoring systems in the first 24 hours of admission. However, one of their included studies4 found that clinicians outperformed scoring systems for patients at the extremes of risk of deterioration, whereas scoring systems were better than clinical judgement for the 'in-between' group of patients.
We recently developed a computer-aided risk of mortality (CARM) score, which combines age, sex, vital signs (based on National Early Warning Score (NEWS)7) and seven blood test results for emergency medical admissions.8 A key design feature of CARM is that it uses data which is already collected as part of the process of care and so places no additional data collection burden on clinicians. Furthermore, CARM is intended for computerised implementation and is not suited to pencil and paper methods because the underlying equation is not simple8 as it involves 22 covariates with and without transformations and interaction effects. Nonetheless, it is important to note that CARM is intended to support, not displace, clinical judgement but the extent to which it can support the clinical decision-making process in practice remains unknown. So, as part of the ongoing evaluation of CARM, we set out to compare the performance of CARM versus medical judgements in estimating the risk of in-hospital mortality in consecutive emergency admissions to elderly care wards in one hospital over a 2-week period.
Setting and data
Our cohort of elderly medical admissions is from York Hospital (managed by York Teaching Hospitals National Health Service (NHS) Foundation Trust) which has approximately 700 beds. It has been exclusively using electronic NEWS scoring since 2013 as part of its in-house electronic patient record systems. Consecutive admissions to an elderly care medical admissions ward in this hospital were assigned a risk of death at the first post-take ward round by consultant medical staff over a 2-week period (05 February 2017 to 20 February 2017). The consultant medical staff used the same variables to assign a risk of death to the patient as the CARM (age, sex, NEWS and blood test results)8 but also had access to the clinical history, examination findings and any immediately available investigations such as ECGs. Both CARM and medical judgements had access to the same physiological and pathological variables. The medical staff did not have access to the CARM score during the data collection exercise. For each admission, we obtained the patient’s age, sex (male/female), admission and discharge date and time, acute kidney injury (AKI) score, electronic NEWS (including its subcomponent vital signs data) and seven blood test results (albumin, creatinine, haemoglobin, potassium, sodium, urea and white cell count), although not all patients had all seven blood tests. To derive a CARM score for patients with missing blood test results, we imputed population-based age–sex median values. Blood test results were missing because the tests were not ordered by the medical staff.
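The median imputation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 10-year age bands, the field names and the reference values are all hypothetical, since the text states only that population-based age–sex medians were used.

```python
from collections import defaultdict
from statistics import median


def age_band(age, width=10):
    """Map an age to the lower bound of its band (band width is an assumption)."""
    return (age // width) * width


def build_medians(reference, test):
    """Median of one blood test per (age band, sex) in a reference population."""
    groups = defaultdict(list)
    for row in reference:
        if row.get(test) is not None:
            groups[(age_band(row["age"]), row["sex"])].append(row[test])
    return {key: median(values) for key, values in groups.items()}


def impute(patient, test, medians):
    """Return the observed value if present, else the age-sex median."""
    if patient.get(test) is not None:
        return patient[test]
    return medians[(age_band(patient["age"]), patient["sex"])]


# Hypothetical reference population (values invented for illustration only).
reference = [
    {"age": 82, "sex": "F", "albumin": 35.0},
    {"age": 85, "sex": "F", "albumin": 37.0},
    {"age": 88, "sex": "F", "albumin": 39.0},
    {"age": 81, "sex": "M", "albumin": 34.0},
]
albumin_medians = build_medians(reference, "albumin")

missing = {"age": 84, "sex": "F", "albumin": None}
observed = {"age": 84, "sex": "F", "albumin": 30.0}
imputed_value = impute(missing, "albumin", albumin_medians)
observed_value = impute(observed, "albumin", albumin_medians)
```

Because the imputed value depends only on age and sex, every patient in the same group receives the same value regardless of outcome, which is why this strategy tends to pull non-survivors towards typical population values.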
The performance of CARM versus medical judgement was assessed by comparing risk estimates using boxplots. The discrimination of CARM and medical judgements was quantified by the area under the receiver-operating characteristic (ROC) curve or c-statistic.9 In general, values less than 0.7 are considered to show poor discrimination, values of 0.7–0.8 can be described as reasonable, and values above 0.8 suggest good discrimination.10 We compared the c-statistic for CARM and medical judgement using DeLong’s test.11
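As a rough illustration of what the c-statistic measures, the sketch below computes it directly as the proportion of (deceased, survivor) pairs in which the deceased patient was given the higher risk, counting ties as one half. The data are invented for illustration, and this is not the pROC routine used in the study; DeLong's test for comparing two c-statistics is not implemented here.

```python
def c_statistic(risks, died):
    """Area under the ROC curve via pairwise comparison of predicted risks.

    risks: predicted probabilities of death
    died:  matching outcomes (truthy = died in hospital)
    """
    deceased = [r for r, d in zip(risks, died) if d]
    survivors = [r for r, d in zip(risks, died) if not d]
    if not deceased or not survivors:
        raise ValueError("need at least one patient in each outcome group")

    concordant = 0.0
    for risk_dead in deceased:
        for risk_alive in survivors:
            if risk_dead > risk_alive:
                concordant += 1.0   # correctly ranked pair
            elif risk_dead == risk_alive:
                concordant += 0.5   # tie counts as half
    return concordant / (len(deceased) * len(survivors))


# Hypothetical toy data: four patients, two of whom died.
toy_risks = [0.10, 0.40, 0.35, 0.80]
toy_died = [0, 0, 1, 1]
toy_auc = c_statistic(toy_risks, toy_died)
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation of survivors and non-survivors.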
We determined the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV), and positive and negative likelihood ratios for CARM and compared this with medical judgement scores using probability thresholds from a NEWS only model for NEWS scores from 1 to 5. The cut-off of NEWS at 5 is the recommended threshold for escalation of care.12 13 We have also reported the geometric mean of sensitivity and specificity.14
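The measures listed above all follow from a 2×2 table of predicted risk (at or above the threshold versus below it) against outcome. The sketch below shows the standard definitions. The cell counts are hypothetical: they were chosen only to be consistent with the percentages reported for CARM in patients with complete blood test results, and the paper does not report them in this form.

```python
from math import sqrt


def diagnostic_metrics(tp, fn, fp, tn):
    """Standard accuracy measures from the four cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        # Geometric mean of sensitivity and specificity.
        "g_mean": sqrt(sensitivity * specificity),
    }


# Hypothetical counts (n=300), back-calculated to match the reported
# percentages for CARM at the 10% risk threshold; not taken from the paper.
carm = diagnostic_metrics(tp=57, fn=41, fp=35, tn=167)
```

The geometric mean summarises sensitivity and specificity in a single number that is low whenever either component is low, so it penalises thresholds that trade one off heavily against the other.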
All analyses were undertaken in STATA15 and R16 using rms 17 and pROC 18 packages.
This study received ethical approval from The Yorkshire & Humberside Leeds West Research Ethics Committee on 17 September 2015 (ref. 173753) with NHS management permissions received January 2016.
Patient and public involvement
A workshop with a patient and service user group, linked to the University of Bradford, was involved at the start of this project to co-design the agenda for the patient and staff focus groups which were subsequently held at each hospital site. Patients were invited to attend the patient focus group through existing patient and public involvement groups. The criterion used for recruitment to these focus groups was any member of the public who had been a patient or carer in the last 5 years. The patient and public voice continued to be included throughout the project with three patient representatives invited to sit on the project steering group. Participants will be informed of the results of this study through the patient and public involvement leads at each hospital site and the project team have met with the Bradford Patient and Service User Group to discuss the results.
The study involved 409 emergency medical admissions to the elderly care wards in York Hospital. Of these, 300 (73.3%) had a full set of blood test results and 109 (26.7%) had at least one blood test result missing (table 1). The most frequently missing blood test was albumin (n=96).
The in-hospital mortality was 31.8% (130/409). The age, sex, NEWS and blood test result profiles are shown in table 2. Compared with patients discharged alive, deceased patients were older, with lower albumin, haemoglobin and sodium values, and higher creatinine, potassium, white cell count and urea values. NEWS was higher in deceased patients compared with patients discharged alive, as were respiratory rate and pulse rate values. The temperature, blood pressure and oxygen saturation values were lower in deceased patients. Where blood test results were missing, we imputed the age–sex population median value, which appeared to give more reasonable values for patients discharged alive than for those who died (see table 2, which compares imputed values with observed values). For example, the observed mean for albumin (n=313) was 36.7 for survivors versus 33.6 for non-survivors. However, the imputed means for albumin (n=96) were 36.8 for survivors and 36.7 for non-survivors.
Comparison of CARM versus medical judgement
Figure 1 shows the estimated risk of in-hospital mortality using CARM versus medical judgements for patients who were discharged alive and for those who died. The mean estimated risk of in-hospital mortality for patients discharged alive was lower with CARM (0.07, SD=0.07) than with medical judgements (0.12, SD=0.14). Likewise, for deceased patients, the risk estimates from CARM (0.16, SD=0.16) were lower than estimates from medical judgements (0.26, SD=0.25) (see table 2).
Figure 2 shows the ROC curve. The area under the ROC curve (c-statistic) was higher for CARM, 0.75 (95% CI: 0.69 to 0.81), than for medical judgement, 0.72 (95% CI: 0.66 to 0.78), but the difference was not statistically significant (p=0.28). The areas under the ROC curve for CARM and medical judgement were similar to each other for admissions with at least one blood test result missing (see table 3).
Table 4 shows the sensitivity, specificity, PPV and NPV for a selected range of NEWS values. For patients with complete blood test results (n=300), at a NEWS of 5 (the recommended escalation threshold), which is equivalent to a 10% risk of in-hospital mortality, medical judgement had a higher sensitivity, 59.2% (95% CI: 48.8 to 69.0) versus 58.2% (95% CI: 47.8 to 68.1); a lower specificity, 70.3% (95% CI: 63.5 to 76.5) versus 82.7% (95% CI: 76.7 to 87.6); a lower PPV, 49.2% (95% CI: 39.8 to 58.5) versus 62.0% (95% CI: 51.2 to 71.9); and a lower positive likelihood ratio (2 vs 3.4) than the CARM score.
For patients with at least one imputed blood test result (n=109), at a NEWS of 5 medical judgement had a higher sensitivity, 50.0% (95% CI: 31.9 to 68.1) versus 25.0% (95% CI: 11.5 to 43.4); a lower specificity, 81.8% (95% CI: 71.4 to 89.7) versus 89.6% (95% CI: 80.6 to 95.4); a higher PPV, 53.3% (95% CI: 34.3 to 71.7) versus 50.0% (95% CI: 24.7 to 75.3); and a higher positive likelihood ratio (2.8 vs 2.4) than the CARM score.
In this study, we assessed the accuracy of CARM versus medical judgements in consecutive emergency admissions to the elderly care ward over a 2-week period. We found that, for patients with complete blood test results, the c-statistic for CARM was 0.75 versus 0.72 for medical judgements (p=0.28). For patients with at least one missing blood test result, the c-statistics were lower but still similar (medical judgements 0.70 vs CARM 0.70). At a 10% mortality risk, the PPV for CARM was higher than that for medical judgements in patients with complete blood test results (62.0% vs 49.2%) but not when blood test results were missing (50.0% vs 53.3%).
Overall, when comparing CARM with medical judgements, no significant differences in area under the curve were found. These findings are remarkable because, unlike medical judgements, CARM relies exclusively on routinely collected data based primarily on the patients’ age, vital signs and blood test results without having any disease labels or clinical history. Furthermore, where blood tests are being imputed, both CARM and medical judgements are less able to discriminate mortality. While this is to be expected for CARM because we use a population median imputation strategy, which is biased towards survivors, the reasons for lower c-statistics for medical judgements are less clear. It would suggest that these patients (with one or more missing blood test results) are more challenging for the medical staff to assess, although the underlying reasons are not clear.
Our findings are in line with other studies, which also found no significant differences between ROC curves for the Acute Physiology and Chronic Health Evaluation II (APACHE II) score and clinical staff.19 However, one study3 reported that clinical assessment had an overall accuracy of 95.2% versus 90.9% for APACHE II. Other studies have also failed to show an advantage for the APACHE II model when compared with medical judgements by clinicians.4 5 20 Another study found that physicians were significantly better at predicting outcome in a medical intensive care unit than APACHE.21 One study concluded that physicians' clinical judgement could differ from scoring systems enough to account for large differences in expected outcomes.20
It is important to note that we have designed CARM to support the medical decision-making process, not replace it, without placing any additional data collection burden on staff. The CARM risk prediction can also be made available as soon as the physiological observations and blood test results are available and prior to the consultant review, which may be of assistance to more junior staff. CARM was developed using all adult non-elective medical and elderly care admissions to one hospital and externally validated in another hospital.8
The overall mortality was 5% in the study population in which the CARM risk predictor was developed. The overall mortality in this patient cohort is high (31.8%) and it is worth noting that patients had already been streamed (selected) as requiring in-patient admission, either as a direct admission from a GP or via the emergency department. Thus, the pre-test probability of mortality is different from that of the original study population; yet, the CARM risk predictor still performs reasonably well in this population.
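The effect of pre-test probability on a positive result can be illustrated with the odds form of Bayes' theorem (post-test odds = pre-test odds × likelihood ratio). The sketch below applies the reported positive likelihood ratio for CARM (3.4) to the two mortality rates mentioned above. It assumes, purely for illustration, that the likelihood ratio is the same in both populations, which the study does not establish.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Assumption: LR+ of 3.4 is treated as transferable between populations.
risk_development_cohort = post_test_probability(0.05, 3.4)   # 5% mortality
risk_this_cohort = post_test_probability(0.318, 3.4)         # 31.8% mortality
```

Under this assumption, the same positive result implies a much higher probability of death in the present cohort than it would in the development population, which is why PPVs from one setting should not be carried over to another without checking the underlying mortality rate.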
Our study has several limitations. This study provides a snapshot of the use of CARM in a hospital over a short period, and the extent to which our findings generalise to patients over a longer time period and to other wards and hospitals requires further study. Although CARM is designed to be automated, we note that for 26.7% of patients we were unable to derive the CARM score from observed data because of no or incomplete blood test results, and the most frequently missing blood test result was albumin. Although we adopted a median imputation strategy, the extent to which this is acceptable in routine clinical practice remains unknown, especially as this imputation strategy is biased towards survivors and so will underestimate the true risk of dying for those who are likely to die. Further study is therefore required to understand the issue of missing blood test results and how to address it in routine clinical practice. One possibility is that there may be an unintended increase in the use of blood tests in patients for whom they would not ordinarily be undertaken, simply to provide a CARM score. Crucially, how the medical decision-making process is modified by the availability of CARM, and the extent to which it enhances situational awareness and subsequently the quality of care without adverse unintended consequences, remains to be seen.
CARM is comparable with medical judgements in predicting in-hospital mortality following emergency admission to an elderly care ward. CARM may have a promising role in supporting medical judgements in determining the patient’s risk of death in hospital. Further evaluation of CARM in routine practice is required.
Contributors MM and DR had the original idea for this work. MF undertook the statistical analyses with guidance from AS and MAM. DR gave a clinical perspective. MF and BK wrote the first draft of this paper. SI, RD, DH, AC, JA, RH, SK, GM, KG and MH contributed to data collection and all authors subsequently assisted in redrafting and have approved the final version. MM will act as guarantor.
Funding This research was supported by the Health Foundation. The Health Foundation is an independent charity working to improve the quality of health care in the UK. This research was supported by the National Institute for Health Research (NIHR) Yorkshire and Humberside Patient Safety Translational Research Centre (NIHR YHPSTRC).
Disclaimer The views expressed in this article are those of the author(s) and not necessarily those of the NHS, the NIHR, or the Department of Health and Social Care.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Our data sharing agreement with the York hospital does not permit us to share this data with other parties. Nonetheless, if anyone is interested in the data, then they should contact the R&D offices at York hospital in the first instance.
Patient consent for publication Not required.