Objectives Predictive algorithms to inform risk management decisions are needed for patients with COVID-19, although the traditional risk scores have not been adequately assessed in Asian patients. We aimed to evaluate the performance of a COVID-19-specific prediction model, the 4C (Coronavirus Clinical Characterisation Consortium) Mortality Score, along with other conventional critical care risk models in Japanese nationwide registry data.
Design Retrospective cohort study.
Setting and participants Hospitalised patients with COVID-19 and cardiovascular disease or coronary risk factors from January to May 2020 in 49 hospitals in Japan.
Main outcome measures Two different types of outcomes, in-hospital mortality and a composite outcome, defined as the need for invasive mechanical ventilation and mortality.
Results The risk scores for 693 patients were tested by predicting in-hospital mortality for all patients and composite endpoint among those not intubated at baseline (n=659). The number of events was 108 (15.6%) for mortality and 178 (27.0%) for composite endpoints. After missing values were multiply imputed, the performance of the 4C Mortality Score was assessed and compared with three prediction models that have shown good discriminatory ability (RISE UP score, A-DROP score and the Rapid Emergency Medicine Score (REMS)). The area under the receiver operating characteristic curve (AUC) for the 4C Mortality Score was 0.84 (95% CI 0.80 to 0.88) for in-hospital mortality and 0.78 (95% CI 0.74 to 0.81) for the composite endpoint. It showed greater discriminatory ability compared with other scores, except for the RISE UP score, for predicting in-hospital mortality (AUC: 0.82, 95% CI 0.78 to 0.86). Similarly, the 4C Mortality Score showed a positive net reclassification improvement index over the A-DROP and REMS for mortality and over all three scores for the composite endpoint. The 4C Mortality Score model showed good calibration, regardless of outcome.
Conclusions The 4C Mortality Score performed well in an independent external COVID-19 cohort and may enable appropriate disposition of patients and allocation of medical resources.
Trial registration number UMIN000040598.
Data availability statement
Data are available upon reasonable request. The data used in the present study are available from the corresponding author and SK upon reasonable request.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
To complement the original research, in which the 4C (Coronavirus Clinical Characterisation Consortium) Mortality Score was derived and tested only in cohorts from the same nation, a Japanese nationwide COVID-19 cohort was used for external validation of the score.
The scores could be calculated with high precision because the registry data form in this study was built with reference to ISARIC, where the 4C Mortality Score was originally developed.
The study focused not only on in-hospital mortality but also on a composite outcome of the need for invasive mechanical ventilation and mortality, which has not been evaluated in previous studies.
The applicability of the 4C Mortality Score to various clinical settings needs to be further tested, especially outside the UK and Japan.
The novel coronavirus identified at the end of 2019 in Wuhan, China, spread worldwide and was declared a pandemic in March 2020. The number of patients diagnosed with COVID-19 exceeds 150 000 000 according to the WHO and continues to grow.1
To use our limited resources efficiently, it is important for healthcare practitioners to identify patients at high risk and to allocate medical resources for them appropriately. This highlights the need for a reliable and practical prognostic prediction tool; however, a universally accepted tool has yet to be established. Over 50 prognostic models have been proposed for predicting either mortality or deterioration of disease, but most studies have suffered from poorly designed validation processes, inappropriate statistical methodologies and high bias in patient recruitment.2 More recently, the 4C (Coronavirus Clinical Characterisation Consortium) Mortality Score, a scoring system for prediction of in-hospital mortality, was developed based on data from a large population that contains more than 30 000 subjects with COVID-19 in the UK3 and showed good discrimination and calibration; however, its generalisability remains to be investigated. Specifically, (1) it remains unclear how well the 4C Mortality Score works in the population outside the UK because healthcare systems vary considerably among countries, which might affect the accuracy of prediction. Although differences in ethnicity were reportedly associated with worse outcomes in patients with COVID-19,4 5 Asians accounted for only 0.7% of all subjects in the derivation cohort. (2) It was unclear whether the 4C Mortality Score also predicts other clinically relevant outcomes, such as the need for invasive mechanical ventilation (IMV). Approximately one-third of hospitalised patients develop acute respiratory distress syndrome, which requires support with a ventilator.6 This puts a heavy burden on healthcare systems, resulting in shortages of medical resources during the pandemic.7
Therefore, the aim of this study was to evaluate the external validity of the 4C Mortality Score in an independent cohort, that is, one with a different ethnicity and healthcare systems, in predicting poor outcomes, defined as mortality and IMV.
This study was a retrospective post-hoc analysis of the Clinical Outcomes of COVID-19 Infection in Hospitalized Patients with Cardiovascular Diseases and/or Risk Factors (CLAVIS-COVID), a Japanese nationwide registry that was endorsed by the Japanese Circulation Society. Briefly, this registry was designed to investigate the clinical features and outcomes of patients with COVID-19 with pre-existing or developing cardiovascular disease or coronary risk factors (CVDRF) from January 2020 to May 2020 in 49 acute care hospitals in Japan. The design and primary outcome have been described elsewhere.8 Even though there are numerous hospitals in Japan, this study focused on major acute care hospitals that accommodated patients with COVID-19 during that time period, resulting in an enrolment of approximately 9.0% (1518 of 16 851) of all Japanese cases of COVID-19. Diagnosis was confirmed with PCR test using oropharyngeal swab specimens in all subjects.
Before the first patient was enrolled, information on the registry was published as an abstract in the University Hospital Medical Information Network Clinical Trial Registry, in accordance with the International Committee of Medical Journal Editors. Written informed consent from each patient was waived under Japanese law as this was a retrospective observational study.
The prevalence of pre-existing CVDRF is high in patients with COVID-19; these factors are associated with poor outcomes,9 and management of such patients is likely to be problematic. Therefore, we focused on hospitalised patients with COVID-19 and pre-existing CVDRF in this analysis. Cardiovascular disease was defined as heart failure, coronary artery disease, myocardial infarction, valvular heart disease, arrhythmia, stroke/transient ischaemic attack, deep venous thrombosis, pulmonary embolism, peripheral arterial disease, aortic aneurysm, aortic dissection, cardiopulmonary arrest, heart transplantation, left ventricular assist device, cardiac implantable electronic device, pericarditis, myocarditis, congenital heart disease and pulmonary hypertension. Risk factors were defined as diabetes mellitus, hypertension and dyslipidaemia.
Patient and public involvement
Patients and/or the public were not involved in the design, conduct, reporting or dissemination plans of this research.
We evaluated the predictive ability of the 4C Mortality Score for two different types of outcomes: in-hospital mortality and a composite outcome defined as the need for IMV and mortality. Therefore, patients intubated prior to admission were excluded from the analysis of the composite endpoint.
Calculation of the 4C mortality risk score
As the data form in CLAVIS-COVID was built with reference to International Severe Acute Respiratory and Emerging Infection Consortium (ISARIC),10 we calculated each individual 4C Mortality Score from the data set in which variables basically followed the data collection form in ISARIC-UK. This score is the sum of the points assigned to eight independent parameters, including age, sex, number of comorbidities, respiratory rate, peripheral oxygen saturation level, Glasgow Coma Scale score, blood urea nitrogen level and C reactive protein level, as described in online supplemental table 1. The number of comorbidities was determined by counting the presence of each of the following diseases or conditions: chronic cardiac disease, chronic respiratory disease, chronic renal disease, liver disease, dementia, chronic neurological conditions, autoimmune disease, diabetes mellitus, malignancy and clinician-defined obesity. Because dementia and HIV infection were not collected in the registry data, they were assumed to be absent when counting comorbidities. In the original 4C Mortality Score, 2 points were given if peripheral oxygen saturation at room air was below 92%, but not all patients had saturation data at room air. Thus, we considered patients with <92% saturation, regardless of oxygen therapy, to have a score of 2 points in this study.
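The scoring step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the point values and cut-offs are not given in this text (they are in online supplemental table 1), so the ones below are assumptions based on the published 4C Mortality Score and should be verified against that table. The `Patient` class and its field names are hypothetical, and urea is assumed to be supplied in mmol/L. In line with the approach described in this study, the saturation rule is applied regardless of oxygen therapy, and conditions that were not collected (dementia, HIV infection) are simply not counted.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical container for the eight 4C inputs."""
    age: int               # years
    male: bool
    n_comorbidities: int   # number of listed conditions present
    resp_rate: float       # breaths per minute
    spo2: float            # peripheral oxygen saturation, %
    gcs: int               # Glasgow Coma Scale score (3 to 15)
    urea_mmol_l: float     # urea, mmol/L (assumed unit)
    crp_mg_l: float        # C reactive protein, mg/L (assumed unit)


def four_c_score(p: Patient) -> int:
    """Sum the points of the eight components (assumed cut-offs)."""
    score = 0

    # Age bands
    if p.age >= 80:
        score += 7
    elif p.age >= 70:
        score += 6
    elif p.age >= 60:
        score += 4
    elif p.age >= 50:
        score += 2

    # Sex
    score += 1 if p.male else 0

    # Number of comorbidities: 0, 1, or 2 and more
    score += min(p.n_comorbidities, 2)

    # Respiratory rate
    if p.resp_rate >= 30:
        score += 2
    elif p.resp_rate >= 20:
        score += 1

    # Oxygen saturation below 92%, regardless of oxygen therapy
    if p.spo2 < 92:
        score += 2

    # Any reduction in Glasgow Coma Scale score
    if p.gcs < 15:
        score += 2

    # Urea
    if p.urea_mmol_l > 14:
        score += 3
    elif p.urea_mmol_l >= 7:
        score += 1

    # C reactive protein
    if p.crp_mg_l >= 100:
        score += 2
    elif p.crp_mg_l >= 50:
        score += 1

    return score
```

Under these assumed cut-offs the total ranges from 0 to 21 points. If the registry records blood urea nitrogen in mg/dL, a unit conversion would be needed before applying the urea thresholds.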
Selection of the scoring systems
We reviewed the studies comparing multiple prediction models in patients with COVID-19 and selected three scores that have previously shown good discriminatory ability in comparison with the 4C score: the Risk Stratification in the Emergency Department in Acutely Ill Older Patients (RISE UP) score,11 the Japan Respiratory Society community-acquired pneumonia severity index (A-DROP scoring system)12 and the Rapid Emergency Medicine Score (REMS).13 These three scores are not COVID-19-specific risk scores but were reported to have the highest discriminatory ability in predicting short-term mortality in patients with COVID-19, with an area under the receiver operating characteristic curve (AUC) of 0.83 (RISE UP score),14 0.87 (A-DROP)15 and 0.76 (REMS).16 We calculated the RISE UP score based on age, vital signs, serum albumin, blood urea nitrogen, lactate dehydrogenase and bilirubin, as proposed in the original score. Likewise, the A-DROP was calculated based on age, dehydration, respiratory failure, orientation disturbance and low systolic blood pressure, and the REMS accounted for six variables, namely heart rate, blood pressure, respiratory rate, consciousness, age and level of oxygen saturation.
Continuous variables were expressed as mean and SD if normally distributed, or as median and IQR if not normally distributed. To describe the heterogeneity between the two studies, CLAVIS-COVID and ISARIC-UK, the derivation cohort also served as a comparison (online supplemental table 2). In patients who had missing elements required for calculation of the risk scores, missing values were multiply imputed with the assumption of missing at random. We generated 20 data sets with imputation using the variables presented in table 1. As we used two different inclusion criteria corresponding to two types of outcomes, multiple imputations were performed for in-hospital death and for the composite endpoint separately. After the scores were calculated, all statistical analyses were applied to the generated data sets. We calculated the AUC, the continuous net reclassification index (NRI) and the integrated discrimination improvement (IDI) index, and subsequently averaged across 20 pooled data sets to retrieve the corrected value and its range using Rubin’s rule.17
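The pooling step can be illustrated with a short sketch of Rubin's rules. It assumes that each imputed data set yields a point estimate (for example, an AUC) together with its variance; how the per-data-set variances were obtained is not stated in this text, and the function name `pool_rubin` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates with Rubin's rules.

    estimates: point estimate from each imputed data set
    variances: squared standard error of each estimate
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")

    pooled = mean(estimates)        # average of the m estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)
```

With 20 imputed data sets, as in this analysis, the inflation factor applied to the between-imputation variance is 1 + 1/20 = 1.05. A confidence interval would then be built around the pooled estimate, strictly using a t distribution with Rubin's degrees of freedom.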
To assess the calibration of the models, we constructed logistic regression models of each risk score for predicting outcomes and created a calibration plot for the predicted versus observed risk. The Hosmer-Lemeshow goodness-of-fit test was also performed. Statistical analyses were performed using R (V.4.0.3; R Foundation for Statistical Computing, Vienna, Austria; http://www.R-project.org).
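As a rough illustration of the calibration assessment, the sketch below groups patients by predicted risk, tabulates mean predicted versus observed risk per group (the points of a calibration plot), and computes the Hosmer-Lemeshow statistic. It assumes that predicted probabilities from a fitted logistic regression model are already available and lie strictly between 0 and 1; the function name and the equal-size grouping are illustrative choices, not necessarily those used in the study.

```python
def hosmer_lemeshow(probs, outcomes, groups=10):
    """Return (statistic, degrees of freedom, calibration table).

    probs:    predicted probabilities, each strictly between 0 and 1
    outcomes: observed outcomes coded 0/1, same order as probs
    The table holds (mean predicted risk, observed risk) per group.
    """
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    stat = 0.0
    table = []
    for g in range(groups):
        # Equal-size groups of patients ordered by predicted risk
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        mean_p = expected / n_g
        table.append((mean_p, observed / n_g))
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    return stat, groups - 2, table
```

The statistic is conventionally compared with a chi-square distribution with (number of groups − 2) degrees of freedom; a small p value, such as the 0.001 reported for one model in this study, suggests a lack of fit.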
Among 1518 registry patients, 693 with pre-existing or known CVDRF were included in the analysis for in-hospital mortality. The mean age of the patients was 68±15 years (64.8% male), and almost all patients were Japanese (96.1%). Four hundred and twenty-two patients (60.8%) had at least one comorbidity, and diabetes was the most prevalent among all designated comorbidities. During hospitalisation, 108 patients died (15.6%).
After the exclusion of 34 patients who were already intubated at the time of the index admission, the composite endpoint was evaluated in 659 patients. In this cohort, IMV was initiated for 119 patients during hospitalisation and 41 (34.4%) patients died after a median IMV duration of 9 (IQR: 6–16) days. Overall, 178 patients met the composite endpoint of death or IMV (27.0%). There were no missing data on prognosis during the index admission.
We compared patient characteristics between cases with or without missing 4C Mortality Score in online supplemental table 2. At least one element for calculating the 4C Mortality Score was missing in 217 cases (31.3%), namely respiratory rate in 153 cases, peripheral oxygen saturation in 4 cases, Glasgow Coma Scale score in 68 cases, blood urea nitrogen level in 14 cases and C reactive protein level in 26 cases. There was no significant difference in the event rate between patients with or without missing 4C Mortality Score (in-hospital mortality: 16.1% vs 15.3%, respectively, p=0.88; composite endpoint of mortality and IMV: 26.3% vs 32.4%, respectively, p=0.13). The number of cases in which any elements of the A-DROP and the REMS were missing was the same as that for the 4C Mortality Score, but the RISE UP score was missing in 290 cases (41.8%).
Table 1 summarises the differences in characteristics among tertiles based on the 4C Mortality Score after imputation. Tertile 1 comprised patients with the lowest 4C Mortality Score. Tertile 3, the group with the highest 4C Mortality Score, had a significantly higher risk of in-hospital mortality and composite outcomes. Most components of the 4C Mortality Score, except for sex, systolic blood pressure, body temperature, comorbidities of liver disease, hypertension and obesity, showed significant differences among the groups. Lymphocyte count was significantly lower while neutrophil count was significantly higher in tertile 3.
When compared with the ISARIC-UK cohort, patients in the CLAVIS-COVID cohort had fewer comorbidities and lower levels of blood urea nitrogen, creatinine and C reactive protein, which were suggestive of lower in-hospital mortality (online supplemental table 2).
Figure 1A shows the distribution of the 4C Mortality Score. The median 4C Mortality Score was 9 points (IQR: 6–11). No in-hospital deaths or composite events occurred in patients with a 4C Mortality Score of <7 points or <4 points, respectively. The distribution of other scores was skewed to the left in the A-DROP score and REMS and to the right in the RISE UP score (figure 1B–D).
Discrimination of the 4C Mortality Score, RISE UP, A-DROP and REMS
A high 4C Mortality Score was associated with poor outcomes (OR per 1 point: 1.54, 95% CI 1.42 to 1.68 for in-hospital death; OR: 1.36, 95% CI 1.28 to 1.45 for the composite endpoint). As shown in table 2, the models’ discrimination was verified for each risk score. As seen in figure 2A, the 4C Mortality Score predicted in-hospital death better than the other scores (0.84 in 4C Mortality Score vs 0.82, 0.78 and 0.74 in RISE UP, A-DROP and REMS, respectively), but the difference in AUC with the RISE UP score was modest (delta AUC: 0.024) (table 2). The 4C Mortality Score improved the net risk classification for in-hospital mortality compared with the RISE UP score (NRI=16.2%), A-DROP score (NRI=54.6%) and REMS (NRI=83.1%), respectively (table 2).
Similar results were observed for the prediction of the composite endpoint. The 4C Mortality Score yielded the best discrimination among the four risk scores (figure 2B), with an AUC of 0.78 vs 0.72, 0.70 and 0.69 for the RISE UP score, A-DROP score and REMS, respectively. In terms of NRI, we also found a 44.2% improvement over the RISE UP score, a 60.8% improvement over A-DROP and a 59.6% improvement over REMS in predicting the composite endpoint (table 2).
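For readers less familiar with the AUC, it can be read as the probability that a randomly chosen patient with the event has a higher score than a randomly chosen patient without it, with ties counted as one half. The sketch below computes it directly from that definition; it is an illustration only, is quadratic in the number of patients and is not the routine used in the study.

```python
def auc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison.

    scores:   risk score per patient (higher means higher predicted risk)
    outcomes: observed outcome per patient, coded 0/1
    """
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("both outcome classes are required")

    # Count concordant pairs; a tie contributes half a pair
    concordant = sum(
        (e > n) + 0.5 * (e == n)
        for e in events
        for n in non_events
    )
    return concordant / (len(events) * len(non_events))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of patients with and without the event.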
The IDI for the 4C Mortality Score was consistently positive compared with the other three scores for in-hospital mortality (vs RISE UP score, A-DROP score and REMS: 0.068, 0.081 and 0.132, respectively) and composite outcome (vs RISE UP score, A-DROP score and REMS: 0.074, 0.088 and 0.101, respectively) (table 2).
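The two reclassification measures can be sketched from their usual definitions. The sketch assumes that each patient has a predicted probability from a reference model (`p_old`) and from the model under evaluation (`p_new`), both hypothetical argument names; it is an illustration and not the authors' implementation.

```python
from statistics import mean


def continuous_nri(p_old, p_new, outcomes):
    """Category-free net reclassification index (range -2 to 2)."""
    def net_upward(indices):
        up = sum(p_new[i] > p_old[i] for i in indices)
        down = sum(p_new[i] < p_old[i] for i in indices)
        return (up - down) / len(indices)

    events = [i for i, y in enumerate(outcomes) if y == 1]
    non_events = [i for i, y in enumerate(outcomes) if y == 0]
    # Upward moves are desirable for events, downward moves for non-events
    return net_upward(events) - net_upward(non_events)


def idi(p_old, p_new, outcomes):
    """Integrated discrimination improvement."""
    def mean_change(flag):
        return mean(
            new - old
            for old, new, y in zip(p_old, p_new, outcomes)
            if y == flag
        )

    # Gain in mean predicted risk among events minus gain among non-events
    return mean_change(1) - mean_change(0)
```

On this scale, an NRI reported as 54.6% corresponds to 0.546, and a positive IDI means that the new model separates the mean predicted risks of patients with and without the event more widely than the reference model.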
Calibration of the model
We constructed calibration plots for the assessment of the models. For the 4C Mortality Score model, the calibration curves for both mortality and the composite outcome almost coincided with the ideal line (figure 3A,B). In contrast, the points deviated from the ideal line in the calibration plots of the REMS model for both outcomes, indicating a lack of calibration, whereas the RISE UP and A-DROP models were well calibrated for both outcomes (online supplemental figure 1). The Hosmer-Lemeshow goodness-of-fit test had a p value of 0.001 for the RISE UP score model for in-hospital mortality, while it had a p value greater than 0.05 for the other models for both outcomes.
In this study, we validated the 4C Mortality Score for two outcomes (all-cause in-hospital mortality and the combined endpoint of all-cause death and introduction of IMV) in patients with COVID-19 complicated with CVDRF. The 4C Mortality Score performed well, as shown in previous studies. Of note, we demonstrated that this model is applicable to predicting not only in-hospital death but also the composite endpoint of the need for IMV and death. Furthermore, the 4C Mortality Score was superior to the pre-existing prediction models that have also been validated in patients with COVID-19 for predicting mortality. Our study results showed that (1) the 4C Mortality Score was generalisable to other clinically relevant events (composite outcome of death and need for IMV) and was useful in different ethnicities and healthcare settings; and (2) the 4C Mortality Score was superior to pre-existing risk scores originally developed for patients with pneumonia and patients visiting the emergency department, except for the RISE UP score for mortality.
Importance of external validation of the risk models
One of the clinical challenges of scoring systems is the applicability of the model to the population in which it is actually used in a clinical setting. In general, there is a certain risk of overfitting to the derivation cohort when developing risk models, and the predictive ability is prone to overestimation. Therefore, it is important to externally validate the developed risk model in an independent cohort, although this is rarely performed, despite the fact that numerous risk prediction models are proposed in the medical field. Indeed, Siontis et al18 studied newly proposed risk models in any medical field and reported that only 16% of them were subsequently validated after the first publication. Amidst the pandemic, many prediction models have been proposed; however, their external validity has rarely been examined. In addition, a recent study reported that most prediction models developed for patients with COVID-19 failed to reproduce a similar ability to discriminate high-risk patients.16 This implies that it is difficult to expect the same ability for a prediction model outside the cohort in which the model was derived. In fact, the AUCs of both A-DROP and REMS in the present study fell short of those previously reported (0.87 and 0.84, respectively).15 19
As for the 4C Mortality Score, the primary study tested the external validity within the ISARIC-UK cohort, yet both the derivation and validation cohorts were developed in the same way and in the same country. More recently, the performance of the 4C Mortality Score was tested in a single medical centre in the Netherlands; however, there was lack of generalisability and missing information on ethnicity.14 At this point, it is noteworthy that the 4C Mortality Score performed well in the cohort of patients from the Japanese nationwide registry which has different features, including ethnicity, comorbidities and healthcare system. Despite these differences between the two cohorts, the 4C Mortality Score showed similar discriminatory ability to that in the previous study (AUC: 0.786).3 A previous study reported that the RISE UP score was comparable with the 4C Mortality Score in predicting mortality.14 Our results were in line with the previous study; however, the 4C mortality model fitted better than the RISE UP score based on the results of the goodness-of-fit test. These findings imply that the 4C Mortality Score is generalisable to patients with COVID-19 regardless of ethnicity and healthcare system.
Identification of patients at high risk of requiring mechanical ventilation
Our study also evaluated a composite endpoint that included the need for IMV. We found that the 4C Mortality Score could predict this composite endpoint well and was better than the other scores.
Even though mortality is one of the most clinically relevant outcomes, it does not directly reflect the burden on healthcare systems. Patients who survive but require intensive care are not accounted for if only death is used as the outcome. Considering that the main purpose of risk stratification tools is to allow clinicians to determine the disposition of patients and the allocation of limited medical resources appropriately, it is important to identify not only those likely to die but also those likely to require intensive care. In this context, our findings on the predictability of the 4C Mortality Score for the composite endpoint incorporating IMV may expand the clinical utility of the 4C risk score. Although several studies have already developed a model for predicting those at high risk for progression to severe COVID-19,16 20 21 the definitions of severe COVID-19 used in those studies were heterogeneous and there is no universally accepted definition of severe COVID-19, which makes interpretation of the study results difficult. On the other hand, a recent study adopted a composite endpoint of mortality and admission to the medium/intensive care unit instead of IMV.14 In that study, the 4C Mortality Score was not superior to the RISE UP score in predicting the composite endpoint. Although admission to the medium/intensive care unit generally reflects deterioration of general condition, including respiratory status, the difference in outcome definitions may explain the inconsistency of those results with ours. Given that patients who require IMV therapy are likely to stay on a ventilator for about 2 weeks and nearly one-third of them die,22–24 a combined endpoint of IMV and in-hospital mortality appears to be a reasonable outcome of interest in terms of healthcare burden.
Moreover, a recent study has shown that the COVID-19 intensive care unit case load was associated with increased mortality for critical patients25; appropriate and early disposition according to the risk model capable of identifying those at high risk for intensive care should therefore be considered.
This retrospective post-hoc analysis has several limitations that must be addressed. First, the patients in our study cohort were almost all Japanese and it is not clear whether this model is applicable to different cohorts; however, our study results were consistent with those of the previous study in terms of model discrimination and calibration, and the score was superior to other risk models in performance. Second, the practicality of the 4C Mortality Score was not evaluated in our study; however, it can be readily calculated, either manually or online, even if its calculation is more complicated than the A-DROP. Third, there were several missing values in our data set; although these were multiply imputed under the assumption of missing at random, the missing data may have led to potential bias. Fourth, one of the most important limitations of our study is that we did not have two of the variables included in the original 4C Mortality Score, namely HIV infection and dementia. However, we assumed that a limited number of patients with HIV were included, given that the prevalence of HIV is less than 0.2% in Japan.26 No clear definition of dementia was proposed in the ISARIC data set and it is not clinically feasible to diagnose dementia in the acute setting of COVID-19, as the disease itself can possibly affect the status of consciousness.
The 4C Mortality Score performed well in a cohort with CVDRF, independent of the original cohort, and showed consistent results with the former study in terms of discrimination and calibration. The score also predicts the composite outcome of IMV and in-hospital mortality, and its predictive ability is superior to those of other prediction models previously proposed for patients with COVID-19. Our study results showed the generalisability of risk stratification of patients with COVID-19 based on the 4C Mortality Score, and it may aid in the disposition of patients and the allocation of limited medical resources during the pandemic.
Patient consent for publication
The investigation conformed to the principles outlined in the Declaration of Helsinki. The study protocol, including the use of and opt-out consent method, was approved by the ethics committee of Toho University Omori Medical Center (no. M20253) and the local ethics committees of all participating institutions.
The authors would like to acknowledge all investigators who participated in the CLAVIS-COVID and the Japanese Circulation Society.
Twitter @kuroda_shunsuke, @yuya_matsue
Contributors TS, TKit, TY, SKo, ST, TKis, IK, KH and KN oversaw all data collection efforts and contributed to the revision of the manuscript. SM and YM oversaw the project, conceptualised the study and contributed to the revision of the manuscript. SKu conceptualised the study, analysed the data, and contributed to the writing and revision of the manuscript. All authors read and approved the final version.
Funding This study was supported by the Japanese Circulation Society.
Competing interests TY belongs to endowed departments of Abbott Vascular Japan, Boston Scientific Japan, Japan Lifeline, WIN International and Takeyama KK. SKo received unrestricted research grants from the Department of Cardiology, Keio University School of Medicine provided by Daiichi Sankyo and Bristol Myers Squibb, and lecture fees from AstraZeneca and Bristol Myers Squibb. ST received unrestricted research grants from Japan Medical Device Technology, Boston Scientific Japan and Asahi Intecc, and received an honorarium from Boston Scientific Japan, Abbott Vascular Japan and Medtronic. IK received unrestricted research grants from Daiichi Sankyo, Sumitomo Dainippon Pharma, Takeda Pharmaceutical Company, Mitsubishi Tanabe Pharma Corporation, Teijin Pharma, Idorsia Pharmaceuticals, Otsuka Pharmaceutical, Bayer Yakuhin, Ono Pharmaceutical and Toa Eiyo, and lecture fees from AstraZeneca, Daiichi Sankyo Company, Takeda Pharmaceutical Company, Bayer Yakuhin, Pfizer Japan and Ono Pharmaceutical. KN received lecture fees from Astellas, AstraZeneca, Bayer Yakuhin, Boehringer Ingelheim Japan, Daiichi Sankyo, Eli Lilly Japan Kowa, Mitsubishi Tanabe Pharma, MSD, Novartis Pharma, Ono Pharmaceutical, Otsuka and Takeda Pharmaceutical, research funding from Asahi Kasei, Astellas, Boehringer Ingelheim Japan, Mitsubishi Tanabe Pharma, Teijin Pharma and Terumo Corporation, and scholarship funds from Bayer Yakuhin, Daiichi Sankyo, Medtronic, Takeda Pharmaceutical and Teijin Pharma. YM is affiliated with a department endowed by Philips Respironics, ResMed, Teijin Home Healthcare and Fukuda Denshi, received an honorarium from Otsuka Pharmaceutical and Novartis Japan, received consultant fee from Otsuka Pharmaceutical, and joint research funds from Otsuka Pharmaceutical and Pfizer. IK, KH and KN are members of Circulation journal’s editorial team.
Provenance and peer review Not commissioned; externally peer reviewed.