Original research
Prediction of COVID-19 severity using laboratory findings on admission: informative values, thresholds, ML model performance
  1. Yauhen Statsenko1,
  2. Fatmah Al Zahmi2,3,
  3. Tetiana Habuza4,
  4. Klaus Neidl-Van Gorkom1,
  5. Nazar Zaki4
  1. 1Radiology, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain, UAE
  2. 2Neurology, Mediclinic Middle East Parkview Hospital, Dubai, UAE
  3. 3Clinical Science, Mohammed Bin Rashid University Of Medicine and Health Sciences, Dubai, UAE
  4. 4Computer Science, College of Information Technology, United Arab Emirates University, Al Ain, UAE
  1. Correspondence to Dr Yauhen Statsenko; e.a.statsenko{at}uaeu.ac.ae

Abstract

Background Despite the necessity, there is no reliable biomarker to predict disease severity and prognosis of patients with COVID-19. The currently published prediction models are not fully applicable to clinical use.

Objectives To identify predictive biomarkers of COVID-19 severity and to justify their threshold values for the stratification of the risk of deterioration that would require transferring to the intensive care unit (ICU).

Methods The study cohort (560 subjects) included all consecutive patients admitted to Dubai Mediclinic Parkview Hospital from February to May 2020 with COVID-19 confirmed by PCR. The challenge in finding the cut-off thresholds was the unbalanced dataset (ie, 72 patients admitted to ICU vs 488 non-severe cases). Therefore, we customised the supervised machine learning (ML) algorithm in terms of the threshold value used to predict worsening.

Results With the default thresholds returned by the ML estimator, the performance of the models was low. It was improved by setting the cut-off level to the 25th percentile for lymphocyte count and the 75th percentile for other features. The study justified the following threshold values of the laboratory tests done on admission: lymphocyte count <2.59×10⁹/L, and the upper levels for total bilirubin 11.9 μmol/L, alanine aminotransferase 43 U/L, aspartate aminotransferase 32 U/L, D-dimer 0.7 mg/L, activated partial thromboplastin time (aPTT) 39.9 s, creatine kinase 247 U/L, C reactive protein (CRP) 14.3 mg/L, lactate dehydrogenase 246 U/L, troponin 0.037 ng/mL, ferritin 498 ng/mL and fibrinogen 446 mg/dL.

Conclusion The performance of the neural network trained with the top valuable tests (aPTT, CRP and fibrinogen) is acceptable (area under the curve (AUC) 0.86; 95% CI 0.486 to 0.884; p<0.001) and comparable with that of the model trained with all the tests (AUC 0.90; 95% CI 0.812 to 0.902; p<0.001). A free online tool at https://med-predict.com illustrates the study results.

  • COVID-19
  • biotechnology & bioinformatics
  • infectious diseases
  • respiratory infections
  • information technology
  • biochemistry

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Strengths and limitations of the study

  • The research is based on a unique study cohort that is representative of the entire population because of the national standard that required all patients with confirmed COVID-19 to be admitted to acute care hospitals regardless of their symptoms or illness severity.

  • To distinguish the patients with the confirmed COVID-19 who may worsen while treated, we justified the threshold values of the laboratory tests done on admission.

  • The prediction of the future deterioration by the neural network is reliable even with the top three valuable laboratory tests (activated partial thromboplastin time, C reactive protein and fibrinogen) used for training (area under the curve 0.86; 95% CI 0.486 to 0.884; p<0.001).

  • The limitation of the study was the unbalanced dataset (eg, the disproportion in the number of patients admitted to the intensive care unit vs non-severe cases).

Introduction

Despite the necessity, there is no reliable prognostic biomarker to predict disease severity and prognosis of patients with COVID-19.1 Studies on COVID-19 have produced several types of prediction models: models designed to indicate the disease risk in the general population, diagnostic models based on medical imaging and prognostic models. Unfortunately, these models have had limitations that have precluded their use in clinical practice.2

Models using laboratory findings as the inputs

Researchers tried to establish the role of laboratory findings in the diagnosis of COVID-19.3 They showed that the severe cases of COVID-19 were associated with D-dimer level over 0.28 µg/L, interleukin (IL)-6 level over 24.3 pg/mL3 and lactate dehydrogenase (LDH) activity with an upper limit cut-off in the range of 240–255 U/L.4 However, the use of these laboratory parameters with the above-mentioned cut-off values was limited for the following reasons. First, these studies were conducted on severe forms of the disease. Limited research was done on patients who were asymptomatic or had mild disease.3 5 Second, the whole spectrum of the regularly used clinical laboratory data is unavailable for non-severe patients. Thus, the published papers add justification on the diagnostic utility of separate laboratory findings, instead of working out reliable diagnostic criteria for a set of them.

Gong et al6 have generated a tool for the early prediction of severe COVID-19 pneumonia from the following data: age, serum LDH activity, C reactive protein (CRP), the coefficient of variation of red blood cell distribution width, blood urea nitrogen, direct bilirubin and lower albumin. The resulting performance was not high (sensitivity 77.5%, specificity 78.4%).6 Supposedly, this is because the dataset used as the input consists exclusively of age and laboratory findings.

In another model, the inputs included basic information, symptoms and the results of laboratory tests. After the feature selection, the number of key features was set to just three laboratory results: LDH, lymphocytes and high-sensitivity CRP. The model was trained with the follow-up studies of the general, severe and critical patients.1 By feeding a machine learning (ML) algorithm with the results obtained at the time of admission and in follow-up studies, the authors worked out a decision rule to predict patients at the highest risk. However, physicians are interested in the early prediction of the disease outcomes, and it is doubtful whether the model would retain its predictive potential if applied exclusively to the data obtained on admission.

We believe that a more accurate model can be built based on the simultaneous interpretation of laboratory results, clinical data and physical examination findings (eg, body mass index, body temperature, respiratory rate) at the time of presentation. The analysis using an ML algorithm could provide an accurate prediction of the disease severity.

Data used by clinicians for stratifying risks

Clinicians routinely use physical examination findings and laboratory parameters for risk stratification and hospital resources management. Commonly, each laboratory test kit has a single cut-off value to separate the normal status from pathology. We believe that threshold values should be re-adjusted for each disease rather than used as a common cut-off value for all pathologies.

As a standard of care, baseline blood tests and inflammatory markers are obtained on admission to the hospital. The proper approach for the risk assessment should allow physicians to forecast the patient’s future worsening out of the initial findings on admission. This is what we intend to do by applying an ML approach to the predictors routinely used in clinical practice. There are some promising data for the following set of prognostic biomarkers of COVID-19 severity.

Inflammatory markers

There is evidence that IL-6 and tumour necrosis factor (TNF)-α do not indicate the level of COVID-19 progression.7 Some markers of inflammation are elevated in the serum of patients with COVID-19 compared with the healthy people, that is, the serum SARS-CoV-2 viral load (RNAaemia) is closely correlated with drastically elevated IL-6 levels in critically ill patients with COVID-19.8 However, there is no significant difference between severe and mild groups.7 In contrast to this, the indicators are reflective in the progression of the diseases caused by other coronaviruses (eg, Middle East respiratory syndrome (MERS), SARS).9 This may be explained by the huge amino acid differences in viral proteins of distinct coronaviruses. Even with different MERS-CoV strains, common cytokine signalling by TNF and IL-1α results in the differential expression of innate immune genes.10

Ferritin

Ferritin is a marker of iron storage. However, it is also an acute-phase reactant, the level of which elevates in processes of acute inflammation, whether infectious or non-infectious. Marked elevations have been reported in cases of COVID-19 infection.11

D-dimer

A common finding in most patients with COVID-19 is high D-dimer levels (>0.28 mg/L), which are associated with a worse prognosis.3 12 The exceptional interest of physicians in this biomarker comes from the fact that the vast majority of patients who died of COVID-19 fulfilled the criteria for diagnosing disseminated intravascular coagulation. This is why the incidence of pulmonary embolism in COVID-19 is high. In this condition, the D-dimer concentration rises because it is a degradation product of a blood clot formed from fibrin.13 Thromboembolic complications explain the association of low levels of platelets, increased levels of D-dimer and increasing levels of prothrombin in COVID-19.14 Alternatively, the D-dimer level may go up as a direct consequence of SARS-CoV-2 itself.15

Reasonably, laboratory haemostasis may provide an essential contribution to the COVID-19 prognosis and therapeutic decisions.16 Researchers tried to forecast the severity of COVID-19 with D-dimer as a single predictor. They showed that D-dimer level >0.5 mg/L had a 58% sensitivity, 69% specificity in the forecast of the disease severity.17 In another study, D-dimer level of >2.14 mg/L predicted in-hospital mortality with a sensitivity of 88.2% and specificity of 71.3%.18 Another study highlighted that a D-dimer threshold of >2.66 mg/L detected all patients with a pulmonary embolus on the chest CT.15 So, the high levels of D-dimer are a reliable prognostic biomarker of in-hospital mortality.

Fibrinogen

In patients with COVID-19 admitted to ICU for acute respiratory failure, the level of fibrinogen is significantly higher than in healthy controls (517±148 vs 297±78 mg/dL).12 The small vessel thrombi revealed on autopsy in lungs and other organs suggest that disseminated intravascular coagulation in COVID-19 results from severe endothelial dysfunction, driven by the cytokine storm and associated hypoxaemia. As standard-dose deep vein thrombosis prophylaxis cannot prevent the consumptive coagulopathy, monitoring of D-dimer and fibrinogen levels is required. This will promote the early diagnostics of hypercoagulability and its treatment with direct factor Xa inhibitors.14 19

Activated partial thromboplastin time

In a study conducted in February 2020, the levels of activated partial thromboplastin time (aPTT) as well as white blood cells (WBC), lymphocytes, aspartate aminotransferase (AST), alanine aminotransferase (ALT) and creatinine differed negligibly between severe and mild patients.3 At the same time, other researchers showed an inconsequential distinction in aPTT between survivors and non-survivors.20 According to the results of another study published in March 2020, no significant difference in aPTT values was found between the cohort of severe cases and the non-severe one.6 The results obtained in another study in April in Italy were the same.12 The common limitation of these early studies was a small sample size. Finally, a meta-analysis justified that the elevation of D-dimer, rather than prothrombin time and aPTT, reflects the progression of COVID-19 towards an unfavourable outcome.21

LDH and creatine kinase

Increased levels of the enzymes may reflect the level of the organ damage in a systemic disease.4 22 Reasonably, they may serve as biomarkers for COVID-19 progression.

C reactive protein

In the early stage of COVID-19, CRP levels are positively correlated with the diameter of lung lesions and severe presentation.23

Liver enzymes and total bilirubin

COVID-19 leads to elevated liver biochemistries (eg, the level of AST, ALT, gamma-glutamyl transferase, total bilirubin) in over 50% of patients on admission. AST-dominant aminotransferase elevation reflects the disease severity and true hepatic injury.24 25

Objectives

We decided to identify predictive biomarkers of COVID-19 severity and to justify their threshold values. Hypothetically, the absolute values of the biomarkers on admission to the clinics could provide physicians with an accurate prognosis on the future worsening of the patient that would require transferring the individual to the intensive care unit (ICU). Getting a reliable tool for such a prognosis will support decision making and logistical planning in clinics.

To address the objective, we designed a set of the following tasks:

  • To study the linear separability of the laboratory findings values in patients with confirmed COVID-19 who were transferred to ICU versus non-severe cases of the disease, and to make the comparative analysis of the ICU department cases (both the deceased and survived cohorts) with other patients with COVID-19.

  • To identify the risk factors by selecting the most valuable features for predicting the deterioration that would require transferring the patient to ICU.

  • To work out the threshold criteria for the major clinical data for the early identification of the patients with a high risk of being transferred to ICU.

  • To identify the accuracy of the prediction of the patient’s deterioration by the ML algorithm and by a set of the newly created threshold values of the laboratory and clinical findings.

Materials and methods

Study design and sample

We did a retrospective analysis of the clinical data obtained as a standard of primary and secondary care. The study sample included all the consecutive patients admitted to Dubai Mediclinic from 24 February to 1 July 2020, who fit the criteria of eligibility (total 560 cases). This sample served the intention of the study, that is, to allow for early prognostic stratification.

The inclusion criteria were as follows: age 18 years or older; inpatient admission; SARS-CoV-2-positive real-time reverse transcription PCR from nasopharyngeal swabs only, at our site. Those patients who met the inclusion criteria for our studies were included in the study sample. All the patients were discharged at the time of writing the paper.

The remarkable feature of our study is that at the beginning of the pandemic, all the patients with COVID-19 verified by PCR were hospitalised in Dubai Mediclinic even if they did not present any symptoms. We observed many mild and asymptomatic forms of the disease, with the full required spectrum of analyses being conducted. All patients who were hospitalised stayed in Dubai Mediclinic until they were afebrile for >72 hours and had an SpO2 value of at least 94%.

We assessed the duration of viral shedding as the number of days from the disease onset when the diagnosis was confirmed (ie, the first positive PCR test) to the first negative PCR test.26 All the patients hospitalised in Dubai Mediclinic were subject to the regular collection of nasopharyngeal swabs by a standard technique. Furthermore, after the patient stopped presenting disease symptoms, the specimen collection continued on a daily basis until two consecutive negative PCR tests for COVID-19 >24 hours apart were obtained. In the case of a mild disease course, patients might be transported to isolation facilities before being discharged home (see the flow chart diagram in figure 1). If the facilities were run by Mediclinic, we had their follow-up PCR results. For those patients who went to other isolation facilities not connected to Mediclinic, we could not study the duration of viral shedding (the data are missing for 27 out of 560 patients).

Figure 1

The flow of patients with COVID-19 in Dubai Mediclinic. ICU, intensive care unit.

The treatment was administered in full accordance with ‘National Guidelines for Clinical Management and Treatment of COVID-19’. The indications for the supportive oxygen therapy were (a) the oxygen saturation level below 94%, (b) the respiratory rate (RR) above 30 breaths per minute, (c) both of them. In case of suspicion of superimposed bacterial pneumonia, physicians ordered empirical broad-spectrum antibiotics. The administration of the antiviral and antimalarial drugs followed the national guidelines.27

Patient and public involvement

No patient involved. The data were collected retrospectively from the medical record system.

Methods used

To address the first task, we studied the separability of laboratory findings values on admission to Dubai Mediclinic concerning the future transfer of the patient to the ICU department. To carry out the comparative analysis of features with regard to transferring to ICU, we used a set of non-parametric tests. The relationships involving two variables were assessed with the Mann-Whitney U test or Kruskal-Wallis test for the continuous features, and with Fisher’s exact test or χ² test for the categorical ones. The data were expressed as IQR, median±SD or number of cases and their percentage. The missing data for the comparative analysis were treated with the complete-case analysis method.

To address the second task, we used a set of different methods. First, we trained the neural network (NN) ML model on each variable separately. To come up with laboratory data cut-off levels which may be considered as biomarkers of severe course of the disease we assessed their statistical significance against chance performance. We calculated 95% CI for receiver operating characteristic (ROC) and ROC AUC scores with the bootstrap technique and p values with permutation tests.
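The bootstrap and permutation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the labels and scores are made-up toy values, the function names are hypothetical, the CI is a simple percentile bootstrap and the p value is one sided.

```python
import random

def roc_auc(labels, scores):
    """AUC as the chance that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample cases with replacement and recompute the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    return aucs[int(alpha / 2 * n_boot)], aucs[int((1 - alpha / 2) * n_boot) - 1]

def permutation_p(labels, scores, n_perm=500, seed=0):
    """Share of label shuffles whose AUC reaches the observed AUC (chance performance)."""
    rng = random.Random(seed)
    observed = roc_auc(labels, scores)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if roc_auc(shuffled, scores) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy data (assumption): 12 negative and 6 positive cases with model scores.
labels = [0] * 12 + [1] * 6
scores = [0.1, 0.2, 0.15, 0.3, 0.25, 0.35, 0.4, 0.2, 0.1, 0.45, 0.3, 0.5,
          0.6, 0.4, 0.7, 0.8, 0.55, 0.9]
auc = roc_auc(labels, scores)
ci = bootstrap_ci(labels, scores)
p = permutation_p(labels, scores)
print(round(auc, 3), ci, round(p, 4))
```

The smallest p value this permutation scheme can return is 1/(n_perm + 1), so a reported p<0.001 requires at least 1000 shuffles.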

Second, we used ML tree-based methods (AdaBoost, Gradient Boosting, Random Forest and Extra Trees) to check if there were unique patterns within the data that could unambiguously identify the event of transferring the patient to ICU from the data obtained on admission. For the list of features used as predictors, see online supplemental appendix 1. To assess the importance of the variables, we ranked all features concerning their impurity-based predictive potential. For ranking, we used a set of classifiers and then averaged all the received scores. Missing data in all ML models were replaced by the mean or median values with regard to the continuous or quantitative feature, respectively, using a single imputation method.
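Single imputation as described above can be sketched in a few lines. The helper name `impute` and the values are illustrative assumptions; the choice between mean and median per feature type follows the sentence above and is left to the caller.

```python
from statistics import mean, median

def impute(values, strategy="mean"):
    """Replace missing entries (None) with the mean or median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]

# Made-up values for one laboratory feature with two missing measurements.
feature = [410.0, None, 520.0, 380.0, None, 450.0]
filled_mean = impute(feature, "mean")
filled_median = impute(feature, "median")
print(filled_mean)
print(filled_median)
```

Because every missing case receives the same constant, a feature with many gaps loses variance after imputation, which matters when interpreting its rank in the tree-based models.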

To tackle the third task, we used two approaches: a threshold moving technique (Youden’s index)28 and a heuristically chosen percentile-based cut-off level. The problem of predicting the transfer to ICU had a severe class imbalance. Therefore, we needed to focus on the performance of the classifier on the minority class (patients admitted to ICU). The sensitivity and specificity of the supervised ML classification model (NN) were used to evaluate the quality of the chosen optimal threshold for each important laboratory finding.
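Threshold moving with Youden's index (J = sensitivity + specificity − 1) can be illustrated with the sketch below. It scans candidate cut-offs directly on a single feature, whereas the study applies the technique to the output of the NN model, so this is a simplification. The data are made up and the function names are hypothetical.

```python
def sens_spec(labels, values, cutoff, higher_is_worse=True):
    """Sensitivity and specificity when cases at or beyond the cut-off are flagged."""
    if higher_is_worse:
        flags = [v >= cutoff for v in values]
    else:
        flags = [v <= cutoff for v in values]
    tp = sum(1 for y, f in zip(labels, flags) if y == 1 and f)
    fn = sum(1 for y, f in zip(labels, flags) if y == 1 and not f)
    tn = sum(1 for y, f in zip(labels, flags) if y == 0 and not f)
    fp = sum(1 for y, f in zip(labels, flags) if y == 0 and f)
    return tp / (tp + fn), tn / (tn + fp)

def youden_cutoff(labels, values, higher_is_worse=True):
    """Return (cutoff, J, sensitivity, specificity) for the cut-off with the largest J."""
    best = None
    for cutoff in sorted(set(values)):
        se, sp = sens_spec(labels, values, cutoff, higher_is_worse)
        j = se + sp - 1
        if best is None or j > best[1]:
            best = (cutoff, j, se, sp)
    return best

# Toy data (assumption): 8 non-ICU and 4 ICU cases for one laboratory value.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
values = [2, 5, 8, 11, 14, 20, 25, 9, 12, 30, 45, 28]
best = youden_cutoff(labels, values)
print(best)
```

For a marker such as lymphocyte count, where low values indicate risk, the same scan would be run with `higher_is_worse=False`.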

To evaluate the classifier output quality, we trained several ML classification models using a stratified 10-fold cross-validation technique to estimate the true error rate of the models. For each fold, we used 90% of the data to train the model and then tested it with the remaining 10%. The decision matrices built on the test dataset for all folds were combined and used to calculate the performance metrics.
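The evaluation scheme above (stratified folds, training on nine parts, testing on the tenth, pooling the per-fold matrices) can be sketched as follows. Only the class sizes (488 vs 72) come from the study; the feature values are synthetic, and the "model" is a deliberately trivial stand-in (midpoint between class means), not the NN used by the authors.

```python
import random
from statistics import mean

def stratified_folds(labels, k=10, seed=0):
    """Deal the shuffled indices of each class round-robin so folds keep the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    return folds

def fit_cutoff(values, labels):
    """Stand-in model (assumption): midpoint between the two class means."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return (mean(pos) + mean(neg)) / 2

def pooled_confusion(values, labels, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and sum the confusion counts."""
    tp = fp = tn = fn = 0
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        cutoff = fit_cutoff([values[i] for i in train_idx],
                            [labels[i] for i in train_idx])
        for i in test_idx:
            pred = 1 if values[i] >= cutoff else 0
            if labels[i] == 1:
                tp, fn = tp + pred, fn + (1 - pred)
            else:
                fp, tn = fp + pred, tn + (1 - pred)
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}

# Synthetic feature values; only the 488 vs 72 class sizes follow the study.
rng = random.Random(1)
labels = [0] * 488 + [1] * 72
values = [rng.gauss(10, 5) for _ in range(488)] + [rng.gauss(25, 10) for _ in range(72)]
cm = pooled_confusion(values, labels)
sensitivity = cm["tp"] / (cm["tp"] + cm["fn"])
specificity = cm["tn"] / (cm["tn"] + cm["fp"])
print(cm, round(sensitivity, 3), round(specificity, 3))
```

Since each case is held out exactly once, the pooled matrix contains all 560 cases, and the metrics are computed once on the pooled counts rather than averaged over folds.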

Results

Comparison of the ICU versus non-ICU patients

The problem of predicting admission to ICU has a severe class imbalance (488 vs 72). Therefore, we need to focus on the performance of the classifier on the minority class (the patients admitted to ICU).

We look at the linear separability of the groups of numerical data composed of the laboratory findings values with regard to their quartiles. In figure 2, box plots for the laboratory findings are presented with a red dashed line that marks the 75th percentile for the subjects who were not transferred to ICU. The exception is the diagram for the lymphocyte count, where the line stands for the 25th percentile. The assumption is that the third quartile (Q3) value can be used as the threshold if there is separability between the ICU and non-ICU groups.

Figure 2

Variation of laboratory findings values in the intensive care unit (ICU) cohort (orange box plot) versus the non-ICU cohort of patients (blue box plot). ALT, alanine aminotransferase; aPTT, activated partial thromboplastin time; AST, aspartate aminotransferase; LDH, lactate dehydrogenase; WBC, white blood cell.

The results of the comparative analysis of features with regard to transferring to ICU and the final outcomes of the disease are presented in table 1. We excluded from further analysis the laboratory findings that did not significantly differ in the distribution of two groups. Therefore, we considered the list of 13 variables: WBC, lymphocyte count, total bilirubin, ALT, AST, D-dimer, aPTT, creatine kinase (CK), CRP, LDH, troponin, ferritin and fibrinogen on admission.

Table 1

Comparison of the patients hospitalised to ICU concerning the COVID-19 outcomes: comorbidities, the result of physical examination on admission, laboratory findings on admission and deterioration (eg, peak or minimal values), ethnicity and disease course features

Feature ranking with regard to ML model performance

The features of the dataset listed in online supplemental appendix 1 were ranked with four tree-based ML classifiers (ie, Random Forest, AdaBoost, Gradient Boosting and ExtraTrees). Tree-based models provide measures of feature importance. These measures are based on the mean decrease in impurity, where the impurity is quantified by the splitting criterion of the decision trees. Averaged values of impurity-based attribute ranks were calculated as the mean of rank values for the algorithms (see online supplemental figure 1). The classification performance is seen in online supplemental figure 2.
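The averaging of ranks across classifiers can be sketched as below. The importance scores are invented placeholders for three anonymous features (with scikit-learn they would typically be read from each fitted model's `feature_importances_` attribute), and the convention that rank 1 is the most important feature is an assumption.

```python
def ranks(importances):
    """Rank features within one model: 1 = highest impurity-based importance."""
    ordered = sorted(importances, key=importances.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}

def mean_ranks(per_model):
    """Average each feature's rank over all models."""
    table = {model: ranks(imp) for model, imp in per_model.items()}
    features = next(iter(per_model.values())).keys()
    return {f: sum(table[m][f] for m in table) / len(table) for f in features}

# Placeholder importances (assumption), one dictionary per classifier.
per_model = {
    "RandomForest":     {"feature_a": 0.40, "feature_b": 0.35, "feature_c": 0.25},
    "AdaBoost":         {"feature_a": 0.30, "feature_b": 0.45, "feature_c": 0.25},
    "GradientBoosting": {"feature_a": 0.50, "feature_b": 0.30, "feature_c": 0.20},
    "ExtraTrees":       {"feature_a": 0.38, "feature_b": 0.36, "feature_c": 0.26},
}
averaged = mean_ranks(per_model)
print(sorted(averaged.items(), key=lambda item: item[1]))
```

Averaging ranks rather than raw importances keeps one classifier with unusually concentrated scores from dominating the combined ordering.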

The cut-off levels of the laboratory findings

To come up with laboratory data cut-off levels, which may be considered as biomarkers of the severe course of the disease, we trained the NN ML model on each variable separately and assessed their statistical significance against chance performance. We calculated 95% CI for ROC and AUC scores with the bootstrap technique and p values with permutation tests (see table 2).

Table 2

Statistical significance of ROC AUC for predicting transfer to ICU out of the laboratory findings on admission

Table 2 shows that there is a notable difference between the performance of the model in terms of ROC AUC and the performance at chance level. High-performance measures were obtained for aPTT, CRP and fibrinogen values (sensitivity and specificity of 0.9877 and 0.4028, respectively). The values changed to 0.9754 and 0.75, respectively, for all 13 significant tests. Therefore, we assessed the performance of the classification model based on the combinations of these 3 and 13 features.

First, we trained the ML classification model based on the data taken from only one lab feature using a stratified 10-fold cross-validation technique. Then, we built ROC for the test data of all 10 folds (see diagrams in online supplemental figure 3).

To improve the model’s efficiency and to choose the cut-off value set for some laboratory findings data, we used a threshold moving technique along with a supervised ML classification model (NN).

The ML estimator uses a threshold value to convert predicted probabilities into class labels. The default threshold applied by the estimator is 0.5. However, when the dataset is unbalanced, tuning this hyperparameter can improve the model’s efficiency by finding the optimal threshold. This is crucial when the importance of predicting the positive class (admitted to ICU) outweighs that of true negative predictions. Performance metrics calculated for all laboratory features with regard to the optimal threshold value are presented in table 3. The table displays the sensitivity, specificity and AUC values obtained after applying the threshold moving technique. We marked in bold the AUC values which are higher than the ones displayed in online supplemental figure 3A. The optimal cut-off value returned by the technique is shown in the appropriate column.

Table 3

Justification of the cut-off levels for the admission values of laboratory findings to predict transferring to ICU

As per the box plots regarding the laboratory findings values in the ICU versus the non-ICU cohort of patients in figure 2, we decided to check whether the performance of the model is good if we apply thresholds in the following manner. For lymphocyte count, we set the cut-off level to the 25th percentile (values lower than or equal to the chosen level were set to 1, and to 0 otherwise). For the other features, we set the thresholds to the 75th percentile (values higher than or equal to the cut-off limit were set to 1, and to 0 otherwise). The performance of the models with regard to the aforementioned cut-off levels is presented in table 3.
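The percentile-based binarisation can be sketched as follows. Two assumptions are made here: the quartiles are taken from a reference group (figure 2 draws them for the non-ICU cohort), and they are computed with the "inclusive" method of Python's `statistics.quantiles`; other quantile definitions give slightly different cut-offs. The function name and the numbers are illustrative.

```python
from statistics import quantiles

def percentile_flags(values, reference, low_is_risky=False):
    """Binarise values against the reference group's Q1 (low risky) or Q3 (high risky)."""
    q1, _, q3 = quantiles(reference, n=4, method="inclusive")
    if low_is_risky:
        # Lymphocyte-style marker: at or below the 25th percentile -> 1.
        return q1, [1 if v <= q1 else 0 for v in values]
    # Other markers: at or above the 75th percentile -> 1.
    return q3, [1 if v >= q3 else 0 for v in values]

# Made-up reference measurements and new values to binarise.
reference = [1, 2, 3, 4, 5]
high_cutoff, high_flags = percentile_flags([1, 4, 6], reference)
low_cutoff, low_flags = percentile_flags([1, 2, 3], reference, low_is_risky=True)
print(high_cutoff, high_flags)
print(low_cutoff, low_flags)
```

The resulting 0/1 columns are the kind of binary inputs on which a classifier such as logistic regression can then be evaluated.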

Online supplemental figure 4A shows the performance of the logistic regression model built on the binary data by applying the cut-off level for the threshold moving technique. Online supplemental figure 4 illustrates the same information for the percentile’s cut-off levels.

The performance of the classification models

The applied ML algorithms were trained with stratified 10-fold cross-validation technique. The predictors used are listed in online supplemental table 1. The performance of the classification models such as Gradient Boosting, AdaBoost, ExtraTrees, Random Forest, NN, logistic regression with and without L1 regularisation is presented in online supplemental figure 2, online supplemental table 2. It displays all 560 test points concatenated from test (actual and predicted) label values for each fold. Online supplemental tables 3 and 4 show the performance metrics obtained by the NN model with the highest output quality. Online supplemental figure 3 displays ROC curves and AUC for the NN model with different variables, observed on admission, as predictors. Online supplemental figure 4 illustrates the quality of the performance for the binary data obtained by using the threshold moving or percentile-based heuristic approach.

Discussion

Severity of the disease course in SARS-CoV-2 infection

There are different risk factors for COVID-19 severity. Finding and justifying them are the issues of the ongoing studies because of the persistence of the viral infection. In research on the severe respiratory illness for COVID-19, the authors justified the age above 65 years as a predictor of clinical outcomes of interest.29 The data we received support this fact. In the same study, the authors showed inconsistent results regarding the race of the patient. In the univariate model, the race was a non-significant predictor of the disease severity; however, it turned out to be significant in the multivariate prediction. We did not find ethnic differences between ICU and non-ICU cohorts, but observed a notable difference in the outcome of the disease within these groups (eg, discharged vs deceased patients). According to other studies, age is the largest contributor to risk of death for SARS-CoV-2, whereas the impact of the race or ethnicity on the disease course remains not fully understood. The researchers have difficulty adjusting the samples for comorbidities as physicians did not examine all the patients thoroughly before the disease.30 31 Presumably, the same limitations account for disparities between the studies in which the authors try to consider comorbidities (eg, asthma, diabetes, hypertension and chronic kidney disease) as risk factors. To overcome the limitation, we decided to base the prediction on the laboratory findings on admission. They are standardised and unambiguously interpretable.

Biomarkers of the deterioration of the patients

It is common sense that people with unmanaged chronic conditions are more vulnerable to severe outcomes. Highly sensitive laboratory findings are a reliable tool for assessing pathologies of these kinds. Reasonably, these findings may serve as predictors of the disease progression.

As follows from the feature selection, LDH activity is the laboratory finding that has maximal informative value for the prediction of worsening of the patient (see online supplemental table 1). This is in line with the results of a pooled analysis that show an association of elevated LDH values with a sixfold increase in odds of developing severe disease. Notably, the LDH cut-off in the included studies ranged from 240 to 253.2 U/L. The threshold value for the LDH activity in our study is 246 U/L, which is close to the median of the range.4 LDH is also known to be a predictor of worse outcomes in inpatients.32 In our study, LDH is the top-ranked predictor of disease severity, whereas CK levels have medium informative value. Both of them are non-specific biomarkers of energy deficiency and hypoxia. The levels of CRP have an expectedly high predictive value as they reflect the activity of an inflammatory process.

The concentration of D-dimer seems to be a more promising biomarker of COVID-19 severity because of the endothelial dysfunction mechanism which is specific for this viral infection (see ‘Data used by clinicians for stratifying risks’ subsection). For the same reason, aPTT is an interesting predictor for SARS-CoV-2-infected patients. Therefore, recent studies justified the coagulation indicators on admission (eg, D-dimer, aPTT, prothrombin time and fibrinogen) as significant indicators of severe course of COVID-19.33

Online supplemental table 1 shows that fibrinogen values are not predictive of disease severity. The explanation for this discrepancy is the large number of missing values for this indicator in our database. As seen in table 1, a total of 153 cases (27%) were missing. We had to replace them with the mean values to perform the multivariate prediction with the tree-based model. The replacement decreased the real prognostic value, which was expected to be high. In contrast to this, the univariate model based on fibrinogen levels had the best classifying metrics compared with other predictors. Its ROC AUC value is 0.7704 (see table 2).

Threshold criteria for the major clinical data

With the ML approach, we justify the cut-off thresholds for the major laboratory tests regularly done on admission.

The disproportion in the number of patients admitted to ICU versus non-severe cases was challenging. Therefore, we customised the ML algorithms in terms of threshold values used to predict worsening. For each laboratory findings feature, we (1) fit the model to the training dataset using 10-fold cross-validation technique, (2) predicted the probabilities on the test dataset, (3) found the optimal threshold value which maximises the ROC AUC measure.

The optimised threshold values (marked in bold in table 3) can be used to predict the supposed deterioration of the patient from the initial findings at presentation. Some of the thresholds are close to, but do not coincide with, the normal reference values. For instance, the cut-off for CRP is three times higher than the upper reference value. The cut-offs that we found for WBC and total bilirubin are within the range of normal values for these laboratory findings, which makes them challenging to interpret.
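As an illustration of how such cut-offs might be applied at presentation, the following Python sketch flags admission values that cross the thresholds stated in the abstract and conclusion of this article. The dictionary keys, the function name `flag_labs`, the strict inequalities and the example patient are assumptions made for the sketch. It is not a validated clinical tool.

```python
# Cut-offs as stated in this article for tests done on admission.
# "low": values below the cut-off are flagged; "high": values above it.
THRESHOLDS = {
    "lymphocytes_10e9_per_L": (2.59, "low"),
    "total_bilirubin_umol_per_L": (11.9, "high"),
    "alt_U_per_L": (43, "high"),
    "ast_U_per_L": (32, "high"),
    "d_dimer_mg_per_L": (0.7, "high"),
    "aptt_s": (39.9, "high"),
    "ck_U_per_L": (247, "high"),
    "crp_mg_per_L": (14.3, "high"),
    "ldh_U_per_L": (246, "high"),
    "troponin_ng_per_mL": (0.037, "high"),
    "ferritin_ng_per_mL": (498, "high"),
    "fibrinogen_mg_per_dL": (446, "high"),
}


def flag_labs(labs):
    """Return the sorted names of the tests whose values cross their cut-off."""
    flagged = []
    for name, value in labs.items():
        if name not in THRESHOLDS:
            continue  # tests without a published cut-off are ignored
        cutoff, direction = THRESHOLDS[name]
        if direction == "high" and value > cutoff:
            flagged.append(name)
        elif direction == "low" and value < cutoff:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical admission panel.
patient = {
    "ldh_U_per_L": 310,
    "crp_mg_per_L": 9.8,
    "lymphocytes_10e9_per_L": 1.1,
    "aptt_s": 41.2,
}
flags = flag_labs(patient)
```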

The prediction based on CRP, with ROC AUC equal to 0.8403, proved to be the most accurate. A meta-analysis by other authors showed that mortality in COVID-19 can be predicted from CRP with the same level of accuracy (ROC AUC 0.84).17 Unfortunately, they did not clearly state the time point at which the samples were collected.

In our study, the performance of the disease severity prediction based on the coagulation indicators was lower (eg, D-dimer 0.7228; fibrinogen 0.6774). However, it almost equals the results of ROC analyses for mortality risk by other authors, who obtained AUC values of 0.742 for D-dimer on admission and 0.643 for aPTT on admission.33 Other authors reached even better performance for the prediction of in-hospital mortality based on D-dimer on admission (AUC 0.85).
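The AUC values quoted above have a simple rank-based interpretation: the probability that a randomly chosen severe case has a higher biomarker value than a randomly chosen non-severe case. The Python sketch below computes this quantity directly from raw values. It is a generic illustration with hypothetical CRP values and a hypothetical function name, not the estimator used in this study or in the cited ones.

```python
def rank_auc(values_severe, values_non_severe):
    """ROC AUC of a single biomarker as the share of (severe, non-severe)
    pairs in which the severe case has the higher value; ties count as 0.5."""
    if not values_severe or not values_non_severe:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for s in values_severe:
        for n in values_non_severe:
            if s > n:
                wins += 1.0
            elif s == n:
                wins += 0.5
    return wins / (len(values_severe) * len(values_non_severe))


# Hypothetical CRP values in mg/L on admission.
crp_severe = [35.0, 60.0, 12.0]
crp_non_severe = [5.0, 8.0, 14.0, 20.0]
auc = rank_auc(crp_severe, crp_non_severe)
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups.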

Despite the similarities in performance metrics, the studies cannot be compared directly, as they are based on different inclusion criteria, study cohorts and threshold values. In general, our findings support the proposal by other researchers to use laboratory findings on admission for risk stratification. Moreover, they encourage further studies to incorporate new biomarkers into prognostic models along with the proven ones.17

The multivariable prediction of the severity of COVID-19

For better prediction, it is recommended that several biomarkers be analysed concomitantly. A combination of the 3 or the 13 most valuable ones, if fed to the deployed ML algorithm, provides a reliable prognosis. Online supplemental figure 2 shows that there is a separability pattern within all variables used to build the predictive model. When we rank the features in accordance with their importance, most laboratory variables are listed at the top (see online supplemental table 1). This ranking also helps to justify the threshold values presented in this study.

Limitations

There are several limitations in the current study. First, the dataset is unbalanced. Therefore, we customised the supervised ML algorithm in terms of the threshold value used to predict worsening. Second, the severity and mortality of the included patients might not be representative of the community because of the latent course of mild and asymptomatic cases. Third, the population of Dubai is specific in terms of unequal age distribution and ethnic heterogeneity. However, one may consider the latter feature a strength because it supports generalising the results to the world population. Fourth, although other clinical examinations (eg, diagnostic imaging) could provide additional information, we limited the predictors of disease deterioration to laboratory findings. Nonetheless, this was enough to build an ML algorithm with good performance. The concomitant analysis of the three most valuable biomarkers on admission provided a reliable prognosis without radiological predictors. Another advantage of this choice is the high applicability of the study results to practice. The justified cut-off thresholds for the laboratory tests are easy to use on admission to the hospital.

Conclusion

  • By comparing the data for the patients who were transferred to the ICU with those who did not worsen throughout the hospitalisation, we selected a set of laboratory findings that differed significantly on admission. The variables were used as the predictors to build the classification model. With the default thresholds returned by the ML estimator, the performance of the models was low; we improved it by setting the cut-off level to the 25th percentile for lymphocyte count and the 75th percentile for other features.

  • To distinguish the patients with confirmed COVID-19 who may worsen while treated, we justified the following threshold values of the laboratory tests done on admission: lymphocyte count <2.59×10⁹/L, and the upper levels for total bilirubin 11.9 μmol/L, ALT 43 U/L, AST 32 U/L, D-dimer 0.7 mg/L, aPTT 39.9 s, CK 247 U/L, CRP 14.3 mg/L, LDH 246 U/L, troponin 0.037 ng/mL, ferritin 498 ng/mL and fibrinogen 446 mg/dL.

  • The performance of the neural network in predicting future deterioration from the three most valuable tests (aPTT, CRP and fibrinogen) is acceptable (AUC 0.86; 95% CI 0.486 to 0.884; p<0.001). It is comparable with that of the model trained with all the tests (AUC 0.90; 95% CI 0.812 to 0.902; p<0.001).

Acknowledgments

The authors would like to thank the UAE University (Al Ain, UAE) and Mediclinic Parkview Hospital (Dubai, UAE) for providing support and for allowing the use of their facilities to conduct this research. The authors would also like to thank the healthcare staff and the patients for their dedication and commitment to this research.

Footnotes

  • YS, FAZ and TH are joint first authors.

  • Twitter @StatsenkoE

  • Contributors All authors contributed to the creation of the article as follows: all of them contributed equally to the conceptual idea of the paper; FAZ and YS formulated the objectives; FAZ collected the dataset; YS wrote the manuscript; TH proposed the methodology of the study, performed the statistical analysis and prepared the figures and tables for data presentation and illustration; FAZ, TH, KN-VG and NZ contributed to the literature review and data analysis. The data were analysed and interpreted by the authors, who also reviewed the manuscript and vouch for the accuracy and completeness of the data and for the adherence of the study to the protocol.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Ethics approval The study underwent ethical review by the Dubai Scientific Research Ethics Committee (DSREC), Dubai Health Authority (protocol no. DSREC-05/2020_25) and was approved for the retrospective analysis of the data obtained as a standard of care. No potentially identifiable personal information is presented in the study.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data are available on reasonable request. The datasets generated for this study are available on request at Data Analytics Group website: https://bi-dac.com. To assess the risk of having complications in a patient with COVID-19, one may use the ML-based free online tool at https://med-predict.com, which illustrates the results of the current study.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
