Development and external validation of prognostic models for COVID-19 to support risk stratification in secondary care

Objectives Existing UK prognostic models for patients admitted to the hospital with COVID-19 are limited by reliance on comorbidities, which are under-recorded in secondary care, and lack of imaging data among the candidate predictors. Our aims were to develop and externally validate novel prognostic models for adverse outcomes (death and intensive therapy unit (ITU) admission) in UK secondary care and externally validate the existing 4C score. Design Candidate predictors included demographic variables, symptoms, physiological measures, imaging and laboratory tests. Final models used logistic regression with stepwise selection. Setting Model development was performed in data from University Hospitals Birmingham (UHB). External validation was performed in the CovidCollab dataset. Participants Patients with COVID-19 admitted to UHB January–August 2020 were included. Main outcome measures Death and ITU admission within 28 days of admission. Results 1040 patients with COVID-19 were included in the derivation cohort; 288 (28%) died and 183 (18%) were admitted to ITU within 28 days of admission. Area under the receiver operating characteristic curve (AUROC) for mortality was 0.791 (95% CI 0.761 to 0.822) in UHB and 0.767 (95% CI 0.754 to 0.780) in CovidCollab; AUROC for ITU admission was 0.906 (95% CI 0.883 to 0.929) in UHB and 0.811 (95% CI 0.795 to 0.828) in CovidCollab. Models showed good calibration. Addition of comorbidities to candidate predictors did not improve model performance. AUROC for the International Severe Acute Respiratory and Emerging Infection Consortium 4C score in the UHB dataset was 0.753 (95% CI 0.720 to 0.785). Conclusions The novel prognostic models showed good discrimination and calibration in derivation and external validation datasets, and performed at least as well as the existing 4C score using only routinely collected patient information. 
The models can be integrated into electronic medical records systems to calculate each individual patient’s probability of death or ITU admission at the time of hospital admission. Implementation of the models and clinical utility should be evaluated.


Strengths and limitations of this study
► The University Hospitals Birmingham (UHB) development dataset represents one of the largest and most ethnically diverse patient cohorts within the UK.
► As part of the UHB COVID-19 response, all admitted patients underwent a wide range of investigations to support international research efforts examining prognostic markers, allowing assessment of a wide range of possible predictors (demographic variables, symptoms, physiological measures, imaging and laboratory test results) with low levels of missing data.
► A limitation of the study was that the overall sample size was relatively small compared with that of the International Severe Acute Respiratory and Emerging Infection Consortium study and was limited to one UK geographical location.
► In the external validation cohort, we were unable to examine all of the predictors included in the original full UHB model due to only a reduced set of candidate predictors being available in CovidCollab.
► It was not possible to carry out stratified analysis by ethnicity as the UHB dataset contained too few patients in many of the strata, and no ethnicity data were available in the CovidCollab dataset.

BACKGROUND
The COVID-19 pandemic has placed exceptional strain on healthcare systems globally. Health systems, and especially critical care services, can be overwhelmed, given the number of patients and the duration and severity of their illness. A proportion of patients with COVID-19 can deteriorate rapidly. Clinicians need to differentiate between those with COVID-19 who are at high risk of the most severe symptoms (requiring intensive care treatment/ventilation) or death, and those who can be considered at low risk and potentially managed in the community. Early identification of patients at highest risk of severe outcomes may provide opportunity to prioritise, intervene and improve outcomes.
Objective prognostic tools for patients with COVID-19 that draw on patients' initial characteristics, symptoms, biomarkers and imaging, that can be used at or just after hospital admission, and that accurately discriminate between patients who will progress to more severe symptoms or death and those who will not, can help clinicians to triage and manage patients. This could potentially reduce time to appropriate interventions and improve patient outcomes.
A rapid systematic review has identified a number of prediction models developed for COVID-19, including prognostic models. 1 However, while these existing studies provided useful information on candidate predictors for further exploration, the review found substantial limitations: many models were developed exclusively in a Chinese population; many were at high risk of bias, particularly in terms of inclusion of non-representative control participants, inappropriate exclusion criteria and small sample sizes, leading to high risk of overfitting; and external validation was limited. 1 Other studies have evaluated existing early warning scores such as the National Early Warning Score, but with conflicting findings regarding their utility in predicting COVID-19 outcomes. 2 3 More recent models have since been developed, 4 5 some of which overcome a number of these limitations, including the International Severe Acute Respiratory and Emerging Infection Consortium (ISARIC) model and corresponding (simplified) 4C score, which was developed in a UK secondary care population representing 260 hospitals in England, Scotland and Wales (the ISARIC dataset). 5 While the 4C score showed reasonable discrimination for mortality, it has some limitations, including a reliance on clinicians counting specific comorbidities, which may not be recorded at admission and which are known to be under-recorded in secondary care, 6 and an absence of imaging data among the candidate predictors.

Aims and rationale
To date, there have been few prognostic models for patients admitted to the hospital with COVID-19 developed in a UK dataset. Furthermore, evaluation of the extent to which the inclusion of comorbidities, imaging and additional biomarkers improves model performance is required. It also remains to be determined whether updating the clinical parameters with evolving biomarkers improves prediction of the clinical course of patients as the disease evolves.
The overarching aim of this study was to develop prognostic models for patients admitted to the hospital with COVID-19 using routinely collected data at the point of admission, which can be used in a secondary care setting to support clinical decision-making. Specific objectives were (1) to develop novel prognostic models for calculating predicted probability of adverse outcomes (death and intensive therapy unit (ITU) admission) at an individual patient level in a UK secondary care setting; (2) to externally validate these models in an international dataset (including data from UK hospitals); (3) to externally validate the existing UK ISARIC 4C score 5; and (4) to compare performance of the newly developed models with the UK ISARIC 4C score. In addition, we developed daily models using time series data from the first 8 days from admission to explore changes in predictors over time.

Data source
Data from University Hospitals Birmingham (UHB) NHS Foundation Trust were sourced via the PIONEER Health Data Research Hub for acute care and were used for model development and for external validation of the ISARIC 4C score. Data from patients with COVID-19 admitted to Queen Elizabeth Hospital, Birmingham (part of UHB), between 1 January 2020 and 16 August 2020 were included. Data included symptoms recorded at admission, comorbidities (from International Classification of Diseases, 10th revision (ICD-10) discharge codes), vital signs (eg, blood pressure and oxygen saturation), laboratory results (biochemistry, haematology, microbiology and pathology), imaging and outcomes (ITU admission and death).
External validation of the newly developed models was performed in the CovidCollab dataset. CovidCollab is an international project using routinely collected healthcare data to develop a better understanding of how best to treat and care for adults with COVID-19. 7 8 The dataset includes symptoms, comorbidities, vital signs, laboratory results, imaging findings and outcomes.

Study population
Patients of all ages diagnosed with COVID-19 and hospitalised were included. Diagnosis was defined as a positive test result for SARS-CoV-2 from one or more reverse transcription PCR or transcription-mediated amplification tests. In the CovidCollab dataset, COVID-19 diagnosis was by either PCR or antibody test. Anonymised data for all patients with COVID-19 admitted to UHB during the study period were included. For CovidCollab, data collection was dependent on the specific processes within individual participating hospitals and the capacity of the data collector. 7

Study design
The study used retrospective cohort analyses; the index date (start of follow-up) was the hospital admission date. The study period was from 1 January 2020 to 12 September 2020 (the last admission date was 16 August to ensure a minimum of 28 days of follow-up).

Outcomes
The primary outcome was death within 28 days of admission (in-hospital or post-discharge). The secondary outcome was ITU admission within 28 days of admission.

Study follow-up
Participants were followed up from index (admission) date until the earliest of outcome date or study end (latest available data, 12 September 2020). Participants were censored 28 days after the index date. Participants admitted after 16 August 2020 (less than 28 days prior to the study end date) were excluded.

Candidate predictor variables
Candidate predictors were selected a priori following a review of existing literature and discussion with clinical experts (specialists in acute care, critical care and geriatric medicine), and based on availability of variables routinely collected in secondary care/UHB. These included demographic variables, symptoms, comorbidities, physiological measures, imaging findings and laboratory test results. Comorbidities are not reliably and completely collected at admission, with the most complete hospital record of comorbidities usually being the discharge ICD-10 codes. We therefore compared the development and performance of models with and without comorbidity predictors, in order to explore the potential for developing models that would require no data collection at the point of admission beyond routinely collected data.

Model development
Models were trained using UHB data (patients admitted up to and including 16 August 2020). We used a multistage model building process that assessed the impact of a range of feature representation and modelling choices to select important candidate predictors. All analyses were performed in R.
Three sets of models were fitted, incorporating continuous variables in three different ways, to explore the impact of treating these variables as continuous or categorical and the impact of different methods of handling missing data:
► As continuous numeric values, with missing values imputed ('continuous').
► As categorical values derived from the imputed continuous values ('categorical-imputed').
► In secondary analysis, as categorical values, using clinically meaningful categories and reference ranges, with missing indicators as a separate category ('categorical').
For each of these three ways of handling numerical features and missing data, we fitted the outcomes of death within 28 days and ITU admission within 28 days to candidate predictors using a range of models, which allowed both linear relationships and complex interactions between variables:
► Logistic regression with (1) all baseline parameters (demographic variables, symptoms, vital signs/physiological measures and laboratory test results); (2) demographic variables only; and (3) all baseline parameters with the addition of recorded comorbidities (recorded up to the point of discharge).
► Logistic regression with stepwise Akaike information criterion (AIC) minimisation, both forward and backward. 9
► Least absolute shrinkage and selection operator (LASSO, l1 penalised) logistic regression using all baseline parameters.
► Gradient boosted model (GBM) using all baseline parameters with default hyperparameter values of 150 trees, maximum interaction depth of 3, minimum of 10 observations in nodes and shrinkage of 0.1. 10
Further information on handling of continuous variables is presented in online supplemental appendix 1.
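As a minimal sketch of the 'categorical' representation, the Python snippet below bins a continuous measurement into categories and assigns missing values to a category of their own. The function name, labels and boundary handling are illustrative assumptions; the 37.8°C temperature threshold is the only value taken from this paper, and the study's own categories were defined and applied in R.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def categorise(value: Optional[float],
               cut_points: Sequence[float],
               labels: Sequence[str],
               missing_label: str = "missing") -> str:
    """Map a continuous value to a category; None becomes its own category."""
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    if value is None:
        return missing_label
    # bisect_right counts the cut points that are <= value, so a value equal
    # to a cut point falls into the upper category (an assumption here).
    return labels[bisect_right(cut_points, value)]


# Hypothetical two-level temperature variable split at 37.8 degrees C.
TEMP_CUTS = [37.8]
TEMP_LABELS = ["<37.8", ">=37.8"]

temps = [36.9, 38.4, None, 37.8]
temp_categories = [categorise(t, TEMP_CUTS, TEMP_LABELS) for t in temps]
```

For the 'categorical-imputed' representation, the same binning would be applied after imputation, so no value would reach the missing category.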
For each of these four variable selection models, in order to reduce overfitting and selection bias, we internally validated using fivefold cross-validation (80/20 train/test split) to derive the candidate variable list. To avoid sensitivity to imputation, this cross-validation was repeated for each of the five multiple imputations.
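The fivefold split described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the study's R code: the function name, the seed and the unstratified random assignment are all hypothetical choices, and only the cohort size (1040), the number of folds (5) and the number of imputations (5) come from the paper.

```python
import random
from typing import List, Tuple


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs covering range(n) exactly once as test."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for m, fold in enumerate(folds) if m != i for j in fold)
        splits.append((train, test))
    return splits


# 1040 patients split five ways gives 832 training and 208 test records per fold.
splits = kfold_indices(1040, k=5, seed=42)

# Repeating the cross-validation within each of the five imputed datasets
# yields 5 x 5 = 25 train/test fits per model.
N_IMPUTATIONS = 5
n_fits = N_IMPUTATIONS * len(splits)
```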
Due to the relatively small number of outcome events (<300), we did not attempt to systematically look for interactions between multiple variables.

Model performance
Model performance (discrimination) was assessed by calculating the area under the receiver operating characteristic curve (AUROC or C-statistic). 11 Calibration was assessed by plotting the observed probability of the outcome against predicted probability and by calculating the calibration slope and intercept. We also calculated sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) for the final models. For each feature set and each model, the final results for cross-validated (optimism-adjusted) AUROC and all other metrics (including calibration plots) were combined from all the multiple imputations of the dataset using Rubin's rules for the mean and CI (derived from the SD). 12

Missing data
Information on candidate predictors was collected at the point of admission; however, where information on physiological or laboratory measures was not available on the day of admission, measures recorded up to 72 hours after admission were used. Candidate predictors for which >40% of patients had missing data were excluded from the analysis. Remaining missing values for continuous variables (vital signs and laboratory tests) and symptoms were imputed by multiple imputation using chained equations (using the R 'mice' multiple imputation package). We performed five imputations and a maximum of 50 iterations. 13 Continuous variables were imputed with predictive mean matching, and categorical variables with logistic regression (logreg) or polytomous regression (polyreg). Input variables for the multiple imputation included all [...] We also explored use of a missing category for missing test results. Absence of a record of a comorbidity was taken to indicate absence of the condition.
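The rule of excluding candidate predictors with more than 40% missing data can be sketched in Python as below. The function name and the toy values are made up for illustration, and missing values are assumed to be encoded as None; the study applied this step in R before multiple imputation.

```python
from typing import Dict, List, Optional


def predictors_to_keep(data: Dict[str, List[Optional[float]]],
                       max_missing_fraction: float = 0.40) -> List[str]:
    """Keep predictors whose fraction of missing (None) values is at most the limit."""
    kept = []
    for name, values in data.items():
        # Assumes every predictor has at least one row.
        missing_fraction = sum(v is None for v in values) / len(values)
        if missing_fraction <= max_missing_fraction:
            kept.append(name)
    return kept


# Toy data for five patients (values are invented, not from the study).
toy = {
    "urea": [5.1, None, 7.3, 6.0, 4.8],        # 20% missing: kept
    "glucose": [6.2, None, None, 5.5, 7.0],    # exactly 40% missing: kept
    "lactate": [None, None, 1.9, None, 2.2],   # 60% missing: excluded
}
kept = predictors_to_keep(toy)
```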

External validation
To investigate the transferability of models, we performed external validation of logistic regression models derived from the UHB dataset in the CovidCollab dataset for predicting outcomes of 28-day mortality and ITU admission.
Not all candidate predictors were common to both datasets; therefore, new logistic regression models for death within 28 days and for ITU admission were refitted on the UHB data using only those variables also present in the CovidCollab data. We then performed an external validation of these UHB models in the CovidCollab dataset and ascertained the AUROC in both the UHB and CovidCollab datasets. Based on model performance observed in the initial model derivation and in the interest of clinical utility, we used only categorical rather than continuous numerical variables, with imputed missing values (imputed prior to categorisation). To verify that predictors behaved similarly, we compared logistic coefficients from UHB with those from the same models fitted on the CovidCollab dataset. To account for sensitivity to missing values, we performed training and testing five times, once on each of five multiply imputed datasets, for both UHB and CovidCollab.
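Combining a metric such as the AUROC across the five imputed datasets with Rubin's rules can be sketched as follows. This is a simplified illustration: the function name is hypothetical, the input numbers are invented, and the interval uses a normal approximation (z = 1.96) rather than the t reference distribution of the full rules.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float],
               std_errors: Sequence[float],
               z: float = 1.96) -> Tuple[float, float, float]:
    """Pool per-imputation estimates; return (pooled, lower, upper)."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two estimates, one SE per estimate")
    pooled = mean(estimates)
    within = mean(se ** 2 for se in std_errors)   # average within-imputation variance
    between = variance(estimates)                  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between         # Rubin's total variance
    half_width = z * sqrt(total)
    return pooled, pooled - half_width, pooled + half_width


# Invented AUROCs and standard errors from three imputed datasets.
pooled, lower, upper = pool_rubin([0.76, 0.78, 0.80], [0.02, 0.02, 0.02])
```

The between-imputation term widens the interval when results differ across imputations, which is how the pooled CI reflects uncertainty due to missing data.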

External validation of ISARIC 4C score
A logistic regression using the 4C score was performed in the UHB dataset (following the same modelling methods used in the original ISARIC study). Model performance was assessed by calculating the AUROC and plotting calibration curves.
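Because the AUROC depends only on how patients are ranked, it can be computed directly from an integer score such as the 4C score or from predicted probabilities. The Python sketch below uses the pairwise (Mann-Whitney) formulation; the function name and the toy data are illustrative assumptions, not study data.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random event case scores higher than a random non-event case."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four event/non-event pairs are ranked correctly.
example_auroc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

This quadratic-time version is adequate for a cohort of about a thousand patients; a rank-based implementation would be preferable for much larger datasets.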

Sensitivity analyses
Most patient records had some missing variables; we therefore performed a complete case analysis in which we refitted the best forward stepwise selection model, derived using the full set of UHB variables, to complete case data. We then refitted it to data with ≤1, 2, 5 and 10 missing values, imputing missing values in the same way as previously mentioned, and examined AUROCs and logistic coefficients for stability.
In addition, we performed sensitivity analyses (1) within male and female strata by assessing performance (AUROC) of the final models in male and female patients separately; and (2) within age strata by assessing model performance in patients aged ≤60 and >60 years separately.

Time series analysis
The UHB regression models used baseline measurement data collected on admission; where not available at admission, we accepted values up to 72 hours after admission. To investigate fine-grained temporal effects of data acquisition, we produced a series of separate logistic regression models using data collected at different time windows from within 24 hours of admission up to within 7 days of admission, in 1-day increments, for the mortality outcome. Each dataset included only those patients eligible at the end of the window (not dead or discharged). This created eight different sets of predictors, including baseline variables of age, gender, symptoms and the time-sensitive variables of the latest physiological and laboratory measurements available.
For missing data, values were carried forward from the first observation (last observation carried forward (LOCF)), and fivefold multiple imputation was then performed for data still missing after LOCF, within each separate time-window dataset. Each model was trained and tested in fivefold cross-validation, within each imputation, and AUROCs were averaged using Rubin's rules. We compared the AUROCs for each of the eight models for predicting 28-day mortality from the time of admission and compared the logistic coefficients for the models. For additional insight into possible effects of changing measurements, we fitted an additional logistic model for 28-day mortality to time-sensitive data collected within 4 days of admission, augmented with predictors indicating an increase or decrease in the category of each time-sensitive predictor relative to the reference category from 0 to 4 days, for example, whether temperature had crossed from below to above 37.8°C in that period.
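Last observation carried forward for one patient's daily measurements can be sketched in Python as below. The function name and the temperature values are illustrative; days before the first recorded value stay missing, on the assumption that these are the gaps later handled by multiple imputation.

```python
from typing import List, Optional


def locf(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each missing value with the most recent earlier observation."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)  # remains None until the first observation is seen
    return filled


# Invented daily temperatures for one patient; None marks a day with no measurement.
daily_temperature = [None, 37.2, None, None, 38.1, None]
filled_temperature = locf(daily_temperature)
```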

Patient and public involvement
We engaged with members of the PIONEER patient and public involvement group during development of the study protocol. We will further engage with this group, as well as other local and national patient and public involvement groups, in order to discuss dissemination of the findings and the best way to communicate these to patients and the public. We also consulted with several secondary care clinicians before and during the study to ensure that the tools developed meet the needs of clinicians. We have engaged with local NHS trusts to ensure that the algorithms developed are implemented/tested in a hospital setting.

Derivation cohort characteristics
A total of 1040 participants with COVID-19 admitted to UHB were included in the derivation cohort. A total of 288 (28%) died within 28 days of admission and 183 (18%) were admitted to ITU. Baseline characteristics are presented in table 1 (stratified by mortality outcome) and online supplemental table 1 (stratified by ITU admission). The mean (SD) age of participants was 68.2 (17.7) years; 57% (589) were male; and almost 90% had at least one comorbidity.

Mortality outcome (28 days): UHB model and predictive performance
Area under the ROC curve values for each of the logistic, LASSO and GBM models, treating continuous variables in one of three ways (as continuous variables with imputed missing values; as clinically meaningful categorical variables with imputed missing values; and as categorical variables with missing categories), are presented in online supplemental table 2.
The final model selected was a logistic regression using stepwise selection of variables with categorisation of continuous variables (with imputed missing values). The final 18 categorical predictors included in the model were: age, breathlessness, sputum, systolic blood pressure, temperature, respiratory rate, oxygen saturation, FiO2, alkaline phosphatase, C-reactive protein, corrected calcium, eosinophils, glucose, pH, urea, WBC count, platelets and frailty score.
AUROC for the UHB cross-validated model was 0.779 (95% CI 0.744 to 0.813) (table 2). At a 20% predicted probability of mortality, sensitivity was 83% (95% CI 81% to 85%); specificity was 58% (95% CI 55% to 61%); positive predictive value was 43% (95% CI 41% to 46%); and negative predictive value was 90% (95% CI 88% to 91%) (table 3A). Calibration was very good at low to medium predicted probabilities but was poorer at very high predicted probabilities; a calibration plot is shown in figure 1A; the calibration slope was 0.79 (95% CI 0.64 to 0.94) (table 2). Model coefficients (and model equation) are presented in online supplemental table 3. Addition of comorbidities to the candidate predictors included in the model did not improve performance of the model (online supplemental table 2). Since comorbidities are known to be under-reported during acute presentations, 6 and they offered no improvement on model performance, models without comorbidities were preferred.
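The threshold-based measures reported at a 20% predicted probability of mortality can be computed as sketched below. The function names and toy data are illustrative, and classifying a patient as high risk when the predicted probability is at or above the threshold is an assumption, since the paper does not state how values exactly at the threshold were handled.

```python
from typing import Dict, Sequence


def _ratio(numerator: int, denominator: int) -> float:
    """Safe division that returns NaN when a denominator is empty."""
    return numerator / denominator if denominator else float("nan")


def threshold_metrics(y_true: Sequence[int],
                      y_prob: Sequence[float],
                      threshold: float = 0.20) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV when 'high risk' means y_prob >= threshold."""
    tp = fp = tn = fn = 0
    for outcome, prob in zip(y_true, y_prob):
        high_risk = prob >= threshold
        if high_risk and outcome == 1:
            tp += 1
        elif high_risk and outcome == 0:
            fp += 1
        elif outcome == 1:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Invented outcomes (1 = died) and predicted probabilities for eight patients.
outcomes = [1, 1, 1, 0, 0, 0, 0, 0]
probabilities = [0.9, 0.3, 0.1, 0.25, 0.15, 0.05, 0.5, 0.1]
metrics = threshold_metrics(outcomes, probabilities, threshold=0.20)
```

A low threshold such as 20% trades specificity for sensitivity, which matches the pattern of high sensitivity and NPV with modest specificity and PPV seen in the reported results.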

ITU admission: UHB model and predictive performance
Area under the ROC curve values for each of the models performed are presented in online supplemental table 4.

The final model selected was a logistic regression using stepwise selection of variables with categorisation of continuous variables (with imputed missing values). The final 16 categorical predictors included in the model were: age, gender, fever, new onset diarrhoea or vomiting, heart rate, respiratory rate, FiO2, temperature, albumin, C-reactive protein, eGFR, pH, monocytes, WBC, frailty score, and Glasgow Coma Scale score.
Addition of comorbidities to the predictors included in the model did not improve performance.

Reduced UHB model and external validation in the CovidCollab dataset
A total of 6099 patients admitted with COVID-19 were included in the CovidCollab external validation dataset; 1668 (27%) died and 722 (12%) were admitted to ITU (table 1 and online supplemental table 1). Not all variables included in the UHB model derived previously were available in the CovidCollab dataset. Therefore, revised and reduced models were developed in UHB data using the subset of candidate predictors common to both the UHB and CovidCollab datasets (reduced UHB dataset, UHB-R), using logistic regression with stepwise selection, and these were then externally validated in the CovidCollab dataset.
The reduced set of 27 candidate predictors included demographic characteristics: age and gender; symptoms: cough, fever and delirium; physiological measures and vital signs: BMI, systolic blood pressure, diastolic blood pressure, heart rate, temperature, respiratory rate, oxygen saturation, FiO2 and chest X-ray; frailty score; Glasgow Coma Scale score; laboratory test results: eGFR, pH, base excess, lymphocytes, neutrophil:lymphocyte ratio, haemoglobin, bicarbonate, C-reactive protein, alanine aminotransferase, urea and lactate.

Mortality (28 days)
For the 28-day mortality outcome, following stepwise selection, the final 10 categorical predictors (common to both datasets) included in the reduced logistic regression model were age, oxygen saturation, FiO2, respiratory rate, temperature, systolic blood pressure, C-reactive protein, pH, urea and frailty score.

ITU admission
For the ITU admission outcome, the final 11 categorical predictors (common to both datasets) included in the reduced model were age, gender, fever, respiratory rate, FiO2, C-reactive protein, eGFR, pH, neutrophil:lymphocyte ratio, frailty score and Glasgow Coma Scale score.
Calibration was good for both derivation and external validation datasets; calibration plots are shown in figure 1E,F; calibration slopes were 0.94 (95% CI 0.84 to 1.04) and 0.95 (95% CI 0.82 to 1.08) for the UHB-R and CovidCollab datasets, respectively (table 2). Model coefficients are presented in online supplemental table 7.
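A calibration slope of the kind reported here is commonly obtained by regressing the observed outcome on the logit of the predicted probability in a logistic model; a slope of 1 with an intercept of 0 indicates ideal calibration, and a slope below 1 suggests predictions that are too extreme. The Python sketch below fits that two-parameter model by Newton-Raphson. It is one common formulation, assumed here because the paper does not give its exact procedure; the function names and toy data are hypothetical.

```python
from math import exp, log
from typing import Sequence, Tuple


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + exp(-z))


def calibration_intercept_slope(y_true: Sequence[int],
                                y_prob: Sequence[float],
                                iterations: int = 25) -> Tuple[float, float]:
    """Fit logit(P(y=1)) = a + b * logit(y_prob) by Newton-Raphson; return (a, b)."""
    # Predicted probabilities must lie strictly between 0 and 1.
    x = [log(p / (1.0 - p)) for p in y_prob]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = _sigmoid(a + b * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += (yi - mu) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            raise ValueError("information matrix is singular")
        # Newton step: solve the 2x2 system (information matrix) * delta = score.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


# Toy cohort: 10 patients at each predicted risk, with 2, 5 and 8 events observed.
y = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2
well_calibrated = [0.2] * 10 + [0.5] * 10 + [0.8] * 10
too_extreme = [0.1] * 10 + [0.5] * 10 + [0.9] * 10

intercept_ok, slope_ok = calibration_intercept_slope(y, well_calibrated)
intercept_ext, slope_ext = calibration_intercept_slope(y, too_extreme)
```

In the toy data, predictions that match the observed event rates give a slope of about 1, while predictions pushed towards 0 and 1 give a slope of about 0.63.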

External validation of the ISARIC 4C score in the UHB dataset
The AUROC for the recently published ISARIC 4C score in the UHB dataset was 0.753 (95% CI 0.720 to 0.785). The calibration slope was 0.99 (95% CI 0.85 to 1.12) (table 2 and online supplemental figure 1).
It was not possible to externally validate the ISARIC 4C score in the CovidCollab dataset, as information on many of the comorbidities required to calculate the ISARIC comorbidity score was not available in the dataset.

Sensitivity analyses
Analyses exploring different ways of handling missing data are reported in online supplemental appendix 2 and online supplemental figures 2 and 3.

Complete case analysis
Few patients in the dataset had complete data (n=224/1040, 22%); model performance in this patient subset was slightly poorer for the mortality outcome: AUROC 0.696 (95% CI 0.597 to 0.795) for mortality and 0.892 (95% CI 0.844 to 0.940) for ITU admission. Including patients with missing variables, with missing values imputed, improved model performance for predicting mortality; allowing even a single missing/imputed variable improved AUROC for mortality to 0.760 (95% CI 0.708 to 0.812) (online supplemental table 8).

Stratification by gender and age
When patients were stratified by gender, the reduced models predicting mortality and ITU still performed well: AUROCs for mortality were 0.775 (95% CI 0.726 to 0.823) for males and 0.755 (95% CI 0.706 to 0.804) for females, and those for ITU 0.897 (95% CI 0.856 to 0.937) for males and 0.873 (95% CI 0.833 to 0.913) for females (online supplemental table 9).

Time series analysis
Online supplemental figure 4 shows variation in logistic regression coefficients for the candidate predictors from the day of admission up to 7 days later. The majority of coefficients remained relatively constant over time. However, several (not necessarily statistically significant) trends in the modification of effects over the week of admission on mortality were visible, such as a decrease over the week of the effect of obesity on mortality, elevated effect of eosinophils, and an increase over the week of the effect of elevated haemoglobin, elevated potassium and elevated oxygen saturation. Some of these might be depletion effects related to relatively high patient mortality in the first few days, for example, the apparent protective effect of obesity and high eosinophils.

DISCUSSION
Using routinely collected data for more than a thousand patients admitted with COVID-19 at a large UK hospital trust, we have developed and externally validated prognostic models for mortality and ITU admission. The models showed good discrimination and calibration. The candidate predictors explored included a clinically informed, wider range of demographics, clinical observations, symptoms, comorbidities, biomarkers and radiological investigations than those included in the derivation of existing prognostic scores or models.
If integrated into hospital electronic medical records systems, the model algorithms will provide a predicted probability of mortality or ITU admission within 28 days of hospital admission for each patient based on their individual data at, or close to, the time of admission, which will support clinicians' decision making with regard to appropriate patient care pathways and triage. This information might also assist clinicians in explaining complex prognostic assessments and decisions to patients and their relatives, particularly at times when relatives are unable to see the patient and understand how unwell they are.
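How a logistic model with categorical predictors turns one patient's admission data into a predicted probability can be sketched as follows. The intercept, coefficients and category labels below are invented for illustration and are not the published model values, which are given in the online supplemental tables; the function name is also hypothetical.

```python
from math import exp
from typing import Dict, Tuple

# Hypothetical values only; reference categories carry an implicit coefficient of 0.
HYPOTHETICAL_INTERCEPT = -2.0
HYPOTHETICAL_COEFFICIENTS: Dict[Tuple[str, str], float] = {
    ("age", "70-79"): 1.0,
    ("age", ">=80"): 1.5,
    ("respiratory_rate", ">24"): 0.5,
}


def predicted_probability(patient: Dict[str, str],
                          intercept: float,
                          coefficients: Dict[Tuple[str, str], float]) -> float:
    """Inverse-logit of the intercept plus the coefficients of the patient's categories."""
    # Categories without a coefficient are treated as the reference level.
    linear_predictor = intercept + sum(
        coefficients.get((name, category), 0.0) for name, category in patient.items()
    )
    return 1.0 / (1.0 + exp(-linear_predictor))


high_risk_patient = {"age": ">=80", "respiratory_rate": ">24"}
reference_patient = {"age": "<50", "respiratory_rate": "<=24"}

p_high = predicted_probability(high_risk_patient, HYPOTHETICAL_INTERCEPT, HYPOTHETICAL_COEFFICIENTS)
p_ref = predicted_probability(reference_patient, HYPOTHETICAL_INTERCEPT, HYPOTHETICAL_COEFFICIENTS)
```

In a real deployment, an unrecognised category should probably raise an error rather than silently default to the reference level, which this sketch does for brevity.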

Summary of results
The models developed using all 63 available candidate predictors from UHB performed well with an optimism-adjusted AUROC of 0.779 (95% CI 0.744 to 0.813) for mortality within 28 days of admission and 0.893 (95% CI 0.864 to 0.922) for ITU admission. Not all variables included in the UHB dataset are routinely collected at admission in other hospitals; therefore, reduced models using only variables common to both UHB and the CovidCollab external validation dataset were explored. Discrimination remained similar, with an AUROC of 0.791 (95% CI 0.761 to 0.822) for mortality and 0.906 (95% CI 0.883 to 0.929) for ITU admission in the UHB derivation dataset. These reduced models also performed well in the CovidCollab external validation dataset, with AUROCs of 0.767 (95% CI 0.754 to 0.780) and 0.811 (95% CI 0.795 to 0.828) for mortality and ITU admission, respectively. The models also performed well in gender-stratified and age-stratified patient subgroups.
Calibration of all models showed good agreement between observed and predicted probabilities, particularly at lower predicted probabilities in the range where the models would be of most clinical utility.
We found that addition of comorbidities to the model predictors did not improve overall model performance. This may be due to a correlation between presence of comorbidities and related physiological measurements and/or biomarkers which are already captured by the model.

Comparison with existing literature
Two systematic reviews summarised the existing secondary care COVID-19 prognostic models or scores published until 31 May 2020. 1 17 The majority of the reported models, along with several more recent ones, 18 were derived in Chinese cohorts. Many of the models included in the reviews demonstrated high discriminatory performance; however, all pre-existing models when assessed using the PROBAST score were at high risk of bias. Furthermore, few models were externally validated in suitable cohorts. By deriving our model from routinely collected data, we were able to reduce the risk of bias in patient selection as well as predictor and outcome measurements. Additionally, in this study, we were able to externally validate models in a large global heterogeneous cohort.
More recently, the most notable secondary care prediction model advised for uptake in UK hospitals was derived from the ISARIC-WHO collaborating cohort and has been externally validated. 5 19 Both the full and reduced UHB-derived models for mortality had slightly better discrimination than the ISARIC 4C score in the UHB data (AUROC 0.753, 95% CI 0.720 to 0.785 for 4C). This compares with an AUROC of 0.767 (95% CI 0.760 to 0.773) for the 4C score reported in the original ISARIC validation cohort. 5 However, better performance may be expected for models evaluated in their development dataset compared with external datasets. The newly developed UHB model offers an advantage over the ISARIC 4C model in that it uses only routinely collected patient data recorded at admission and does not require additional assessment and recording of specific comorbidities (which are often not routinely fully recorded at the point of admission).
In our time series analysis, we did not find strong evidence for trends in predictor coefficients over the first 8 days of admission, particularly for variables included in the final models, suggesting that time-dependent effects due to effect modification or selection bias in the first week are small. Another recent model derived from patients with COVID-19 in a Hong Kong hospital adopted the use of time-dependent routinely collected predictors; the model in the Hong Kong study demonstrated high discrimination, with an AUROC of 0.91 when predicting severe COVID-19 outcomes. 20 However, this model is yet to be peer-reviewed and externally validated.

Strengths and limitations
The UHB dataset represents one of the largest and most ethnically diverse patient cohorts within the UK. Additionally, as part of the early UHB response to the COVID-19 pandemic, the hospital trust ensured that, on admission, all patients underwent a wide range of investigations to support international research efforts examining prognostic markers. This allowed us to examine a wide range of possible predictors (63 candidate predictors after exclusions). Lastly, a strength of this study was the good performance, in terms of both discrimination and calibration, of the simplified, reduced model in an external validation cohort (CovidCollab), indicating its suitability for wider use, including potentially in low-income and middle-income countries (LMICs).
Despite the strengths, the findings must be considered in light of the study's limitations. Although we were able to use a derivation dataset from UHB with low levels of missing data, the overall sample size was relatively small compared with that of the ISARIC study and was limited to one UK geographical location. However, we were able to externally validate the model in a larger external cohort. A second limitation was that in the external validation cohort, we were unable to examine all of the predictors included in the original full UHB model because only a reduced set of candidate predictors was available in CovidCollab. Nevertheless, the model performed well and the results suggest it may be applicable in a wide range of datasets where only a reduced set of predictor variables is available. It was not possible to carry out stratified analysis by ethnicity as, in the UHB dataset, too few patients were included in most of the strata; ethnicity data were not available in the CovidCollab dataset. Our definition of 28-day COVID-19 mortality aligns with the current technical guidance from Public Health England and the definition used by the UK government in reporting COVID-19 mortality statistics 21 22 ; however, we acknowledge that this may not capture all COVID-19-related deaths, and some other studies have used a longer period of follow-up. 23

CONCLUSION
In this paper, we have described the development and external validation of novel prognostic models which predict mortality and ITU admission within 28 days of admission for patients admitted to hospital with COVID-19. The simple, reduced models used only routinely collected data gathered at admission, showed good discrimination and calibration, performed at least as well as the existing ISARIC 4C score and performed well in an external validation cohort. The models can be integrated into existing electronic medical records systems to calculate each individual patient's probability of death or ITU admission at the time of hospital admission. The models should be further validated to determine their applicability in other populations. In addition, implementation of the models and clinical utility should be evaluated.

Figure 1
Calibration plots (observed probability (y-axis) against predicted probability (x-axis)): (A) UHB derivation dataset for mortality outcome, (B) UHB derivation dataset for ITU admission outcome, (C) UHB-R derivation/train dataset reduced model for mortality outcome, (D) UHB-R derivation/train dataset reduced model for ITU admission outcome, (E) CovidCollab external validation dataset reduced model for mortality outcome, (F) CovidCollab external validation dataset reduced model for ITU admission outcome. ITU, intensive therapy unit; UHB, University Hospitals Birmingham; UHB-R, University Hospitals Birmingham reduced.
Adderley NJ, et al. BMJ Open doi:10.1136/bmjopen-2021-049506

Table 2
AUROCs, calibration slopes and calibration intercepts for models developed in UHB data (full (UHB) and reduced (UHB-R) datasets) and externally validated in CovidCollab data, and for external validation of the ISARIC 4C score
*Models derived using logistic regression with stepwise selection of candidate predictors and categorisation of continuous variables into clinically meaningful categories (after imputing missing data).
†Not all variables included in the full UHB model were available in the CovidCollab dataset. Therefore, revised (reduced) models were developed in UHB data using a subset of the candidate predictors common to both the UHB and CovidCollab datasets (UHB-R), and these were then externally validated in the CovidCollab dataset.
AUROC, area under the receiver operating characteristic curve; ISARIC, International Severe Acute Respiratory and Emerging Infection Consortium; UHB, University Hospitals Birmingham; UHB-R, University Hospitals Birmingham reduced model.
BMJ Open: first published as 10.1136/bmjopen-2021-049506 on 17 January 2022.
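As an illustration of the calibration measures named in this table, one common approach is to regress the observed outcome on the logit of the predicted probability: the fitted coefficient is the calibration slope and the fitted constant is the intercept. The sketch below assumes that joint-fit approach, which may differ from the estimation method used in the study (for example, an intercept estimated with the slope fixed at 1). The function names and toy data are hypothetical.

```python
import math


def logit(p):
    # Requires 0 < p < 1; predictions of exactly 0 or 1 are not handled.
    return math.log(p / (1.0 - p))


def calibration_slope_intercept(y_true, p_pred, n_iter=25, tol=1e-10):
    """Fit logit(P(y=1)) = a + b * logit(p_pred) by Newton-Raphson.

    Returns (a, b): a is the intercept, b the calibration slope.
    Assumes outcomes are not perfectly separated by the predictions.
    """
    x = [logit(p) for p in p_pred]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += (yi - mu) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        # Newton step: solve the 2x2 system (X'WX) * delta = score.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Toy example (invented values, not study data).
y = [0, 0, 1, 0, 1, 1, 0, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
a, b = calibration_slope_intercept(y, p)
print(a, b)
```

A slope below 1 suggests predictions that are too extreme, and a slope above 1 suggests predictions that are too conservative.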

Table 3A
Sensitivity, specificity, PPV and NPV for mortality at 28 days after admission (University Hospitals Birmingham derivation dataset)
FN, false negative; FP, false positive; NPV, negative predictive value; PPV, positive predictive value; TN, true negative; TP, true positive.

Table 3B
Sensitivity, specificity, PPV and NPV for intensive therapy unit admission within 28 days after admission (University Hospitals Birmingham derivation dataset)
FN, false negatives; FP, false positives; NPV, negative predictive value; PPV, positive predictive value; TN, true negatives; TP, true positives.

Table 4A
Sensitivity, specificity, PPV and NPV for mortality at 28 days after admission for the reduced model (UHB derivation dataset and CovidCollab external validation dataset, using predictors common to both datasets)

Table 4B
Sensitivity, specificity, PPV and NPV for intensive therapy unit admission within 28 days after admission in the reduced model (University Hospitals Birmingham derivation dataset and CovidCollab external validation dataset, using predictors common to both datasets)
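The four measures reported in tables 3 and 4 follow directly from the confusion-matrix counts (TP, FP, TN, FN) at a chosen probability threshold. A minimal sketch of the arithmetic is given below; the function name `diagnostic_metrics` and the counts in the example are hypothetical and are not taken from the study tables.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, PPV and NPV from confusion-matrix counts."""

    def ratio(num, den):
        # Undefined when the denominator is zero (e.g. no predicted positives).
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # events correctly flagged
        "specificity": ratio(tn, tn + fp),  # non-events correctly cleared
        "ppv": ratio(tp, tp + fp),          # flagged patients who had the event
        "npv": ratio(tn, tn + fn),          # cleared patients who did not
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, PPV and NPV depend on outcome prevalence, so values from a derivation cohort may not transfer to a cohort with a different event rate.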