Abstract
Objectives Develop an individualised prognostic risk prediction tool for predicting the probability of adverse COVID-19 outcomes in patients with inflammatory bowel disease (IBD).
Design and setting This study developed and validated prognostic penalised logistic regression models using reports to the international Surveillance Epidemiology of Coronavirus Under Research Exclusion for Inflammatory Bowel Disease voluntary registry from March to October 2020. Model development was done using a training data set (85% of cases reported 13 March–15 September 2020), and model validation was conducted using a test data set (the remaining 15% of cases plus all cases reported 16 September–20 October 2020).
Participants We included 2709 cases from 59 countries (mean age 41.2 years (SD 18), 50.2% male). All submitted cases after removing duplicates were included.
Primary and secondary outcome measures COVID-19 related: (1) Hospitalisation+: composite outcome of hospitalisation, ICU admission, mechanical ventilation or death; (2) Intensive Care Unit+ (ICU+): composite outcome of ICU admission, mechanical ventilation or death; (3) Death. We assessed the resulting models’ discrimination using the area under the curve of the receiver operator characteristic curves and reported the corresponding 95% CIs.
Results Of the submitted cases, a total of 633 (24%) were hospitalised, 137 (5%) were admitted to the ICU or intubated and 69 (3%) died. 2009 patients comprised the training set and 700 the test set. The models demonstrated excellent discrimination, with a test set area under the curve (95% CI) of 0.79 (0.75 to 0.83) for Hospitalisation+, 0.88 (0.82 to 0.95) for ICU+ and 0.94 (0.89 to 0.99) for Death. Age, comorbidities, corticosteroid use and male gender were associated with a higher risk of death, while the use of biological therapies was associated with a lower risk.
Conclusions Prognostic models can effectively predict who is at higher risk for COVID-19-related adverse outcomes in a population of patients with IBD. A free online risk calculator (https://covidibd.org/covid-19-risk-calculator/) is available for healthcare providers to facilitate discussion of risks due to COVID-19 with patients with IBD.
- COVID-19
- inflammatory bowel disease
- statistics & research methods
Data availability statement
Data are available upon reasonable request. We are committed to sharing our data with the international research community. Data requests are reviewed by our SECURE-IBD team including the International Advisory Committee to ensure data will be used in a scientifically and ethically sound way. The data request form and additional information can be found online at https://covidibd.org/sharing-secure-ibd-data/ . Data collection is ongoing, and data beyond the current study’s data may be available at the time of the request.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
Our study includes data from an international cohort with a wide range of ages including paediatric patients.
The use of regularised regression methods for prediction allowed us to consider a wide range of potential predictors in a statistically sound way.
The data for this study comes from a voluntary registry, and the differences between the registry population and the general population of patients with inflammatory bowel disease (IBD) are unknown.
The models were validated using a test data set from the same registry and have not yet been validated in an external cohort of individuals with IBD.
Our methods are associational, not causal—when using the online risk calculator, healthcare providers should not use it to answer ‘what-if’ questions (eg, how an individual’s risk would change if they altered the medications they were taking) which are inherently causal questions.
Introduction
Since the onset of the COVID-19 pandemic, almost 50 million cases have been reported globally. Many countries, including the USA, are reporting record numbers of new cases as of November 2020.1 While the majority of cases are mild, patients with at least one comorbidity are at higher risk of adverse outcomes, including hospitalisation, respiratory failure or death.2 3 Risk calculators can facilitate shared decision making between patients and healthcare providers,4 and such tools have been created to predict death due to COVID-19 in US patients 65 years and older,5 to determine hospitalisation risk6 and to guide early vaccine allocation.
Patients with inflammatory bowel disease (IBD) are prescribed immunosuppressive medications such as corticosteroids, immunomodulators, biological therapies and Janus-kinase inhibitors, which are linked with a higher risk of viral infection.7 8 Demographics, comorbidities, medication use, geographic region and other factors may increase the risk for COVID-19-related complications among patients with IBD.9 10 To help healthcare providers and patients navigate these myriad potential risk factors, we developed and validated penalised multivariable logistic regression models for predicting the probability of hospitalisation, intensive care unit (ICU) admission and death due to COVID-19 in patients with IBD. We used an international registry of 2709 patients with IBD with COVID-19 from 59 countries. We also developed a free, publicly available personalised risk calculator using the final models that is available online (https://covidibd.org/covid-19-risk-calculator/). Reporting follows Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis guidelines.11
Methods
Source of data
The Surveillance Epidemiology of Coronavirus Under Research Exclusion for Inflammatory Bowel Disease (SECURE-IBD) database (www.covidibd.org) is an international registry to study outcomes of COVID-19 in paediatric and adult patients with IBD.12 SECURE-IBD is a voluntary registry with ongoing data collection where healthcare providers can report cases of COVID-19 in patients with IBD, confirmed by PCR or antibody testing. Healthcare providers are instructed to report cases of severe outcomes after a minimum of 7 days from the onset of symptoms and after a sufficient time has passed to observe the disease course through the resolution of acute illness or death. In the event that a patient’s status changed after submission, reporters are instructed to re-report and contact the research team. Reporters were not explicitly informed of what data could be used as predictors or outcomes, but being a voluntary registry, reporters were not blinded. A fuller account of the data collection is given in Brenner et al.13
Patient and public involvement
Patient and professional organisations representing many countries were engaged in planning the registry and the data collection, promoting the registry and disseminating results from studies using the SECURE-IBD database. A list of the organisations involved is available in online supplemental table 1.
Participants
We included all patients reported to the registry from 13 March 2020, the data collection start date, through 20 October 2020. For model development, a training sample consisting of 85% of the entire surveillance data set available as of 15 September 2020 was used. The random split was done using stratified random sampling based on an ordinal version of the outcome. The test data set consisted of the remaining 15% of the data available on 15 September, plus all of the additional cases reported to the registry between 16 September 2020 and 20 October 2020. We added the entirety of the last month of data to the test data set in order to provide a more honest assessment of our model’s performance in an environment that is changing over time.
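The 85%/15% stratified split described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the study code (which was written in R). The ordinal coding of the outcome, the function names and the toy data are assumptions made for the example.

```python
import random
from collections import defaultdict


def ordinal_outcome(hospitalised, icu_or_ventilated, died):
    """Collapse the nested outcomes into one ordinal level (hypothetical coding)."""
    if died:
        return 3
    if icu_or_ventilated:
        return 2
    if hospitalised:
        return 1
    return 0


def stratified_split(levels, train_fraction=0.85, seed=0):
    """Return (train_idx, test_idx), sampling train_fraction within each outcome level."""
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for idx, level in enumerate(levels):
        by_level[level].append(idx)
    train, test = [], []
    for level in sorted(by_level):
        members = by_level[level][:]
        rng.shuffle(members)
        cut = round(train_fraction * len(members))
        train.extend(members[:cut])
        test.extend(members[cut:])
    return sorted(train), sorted(test)


# Toy data: 100 cases with made-up outcome levels (not registry data).
levels = [0] * 60 + [1] * 20 + [2] * 12 + [3] * 8
train_idx, test_idx = stratified_split(levels)
```

Under this sketch, cases reported after the split date would simply be appended to the test indices, so they never influence model fitting.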
We reported means and SD for continuous variables, counts for categorical variables and proportions for binary variables. We reported the missing data for all variables. We did not include p values in our descriptive tables following the Strengthening the Reporting of Observational Studies in Epidemiology guidelines.14
Outcomes
We examined three primary outcomes: (1) hospitalisation or death (Hospitalisation+), (2) ICU admission, mechanical ventilation or death (ICU+) and (3) death due to COVID-19 related causes (Death). Patients may experience multiple outcomes. All outcomes were reported by the patient’s healthcare provider at the time of the case report.
Predictors
As our aim was to create models and a risk stratification tool intended to allow physicians to inform patients of their risk before presenting with COVID-19, we restricted our attention to predictors that would be available during a routine consultation. COVID-19 presenting symptoms and information about the COVID-19 treatment received were therefore not included in this analysis. All predictors were reported by the patients’ healthcare provider.
A full description of the predictors is available in online supplemental table 2. Demographic predictors included age, country of residence, state of residence (for US cases), gender, race and ethnicity. Racial indicators included white, black and Asian. American Indian and Pacific Islander indicators were excluded due to low prevalence. Multi-racial patients belong to multiple categories. As only one patient had reported a gender other than male or female, only two genders were considered in the analysis. Due to the nature of reporting, ethnicity, gender and race should be interpreted as provider-perceived ethnicity, gender and race. Assessing race and ethnicity is important for identifying potential health inequities in COVID-19 related outcomes. For cases from US states with very low prevalence in the registry, a more general geographic predictor (census region or census division) was used in place of the state itself. Clinical predictors included height, weight, body mass index (BMI, study derived), IBD diagnosis (Crohn’s disease, ulcerative colitis or IBD unspecified) and IBD disease activity as defined by physician global assessment. We included indicators for the following a priori defined medication classes: biologicals (including antitumour necrosis factor (anti-TNF), anti-interleukin 12/23 (anti-IL-12/23) and anti-integrin agents), 5-aminosalicylates/sulfasalazine, immunomodulators (6-mercaptopurine, azathioprine, methotrexate), corticosteroids (prednisone, budesonide and other oral/parenteral steroids) and Janus kinase inhibitors (tofacitinib). We also included indicators for subclasses of biologicals (eg, anti-TNF) at the time of COVID-19 diagnosis. Additionally, dosage information was included for prednisone, 6-mercaptopurine and azathioprine.
For categorical (including binary) predictors without a meaningful reference level, all levels were included in the model. Quadratic terms were considered for all continuous covariates. Interactions were considered based on a combination of subject matter expert advice and a minimum threshold of thirty observations for every cell for interactions involving two binary predictors.
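The cell-count rule for interactions between two binary predictors can be illustrated with a short check. This is a sketch only: the function name and the toy vectors are hypothetical, and the subject matter expert review that also informed the choice of interactions is not represented.

```python
from collections import Counter


def interaction_eligible(x1, x2, min_cell=30):
    """True if every cell of the 2x2 cross-tabulation of two binary
    predictors contains at least min_cell observations."""
    counts = Counter(zip(x1, x2))
    return all(counts[(a, b)] >= min_cell for a in (0, 1) for b in (0, 1))


# Toy example (made-up data): the (1, 1) cell has only 5 rows, so the
# interaction would not be considered under a threshold of 30.
x1 = [0] * 100 + [1] * 45
x2 = [0] * 50 + [1] * 50 + [0] * 40 + [1] * 5
sparse_ok = interaction_eligible(x1, x2)
dense_ok = interaction_eligible([0, 0, 1, 1] * 30, [0, 1, 0, 1] * 30)
```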
Missing data
Multiple imputation of the covariates and outcomes was performed using multivariate imputation by chained equations to address missing data.15 16 A total of 30 imputed data sets were created. Imputation was performed separately on the training and test data to prevent inducing dependence between the training and the test data through the imputation models. For transformed variables that are derived from other covariates (eg, BMI from height and weight), we imputed the missing root variables and then created the transformed variable to ensure that the relationship between the transformed variable and its inputs was preserved.
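The order of operations for derived variables (impute the root variables first, then recompute the derived variable) can be sketched as below. The mean fill is a deliberately simple stand-in for the chained-equations imputation used in the study, and the units (kilograms and metres) and the toy values are assumptions made for the example.

```python
def mean_fill(values):
    """Placeholder imputation: replace None with the observed mean.
    (A stand-in for chained-equations imputation, used only for illustration.)"""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def bmi(weight_kg, height_m):
    """Body mass index from weight in kilograms and height in metres."""
    return weight_kg / height_m ** 2


# Toy data (not registry data): one missing height and one missing weight.
heights = [1.70, None, 1.80]
weights = [65.0, 80.0, None]

# Step 1: impute the root variables. Step 2: derive BMI from the imputed values.
heights_imp = mean_fill(heights)
weights_imp = mean_fill(weights)
bmis = [bmi(w, h) for w, h in zip(weights_imp, heights_imp)]
```

Because BMI is recomputed from the completed height and weight, every row satisfies the defining relationship exactly, which would not be guaranteed if BMI were imputed as a variable in its own right.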
Table 1 includes the level of missing data in each of the covariates included in the analysis. Medication variables, clinical descriptions of disease and severity, location, age, and gender all had very low levels of missingness, ranging from 0% to under 5%. There were three covariates with a moderate amount of missing data—height, weight and ethnicity were missing in approximately 20% of patients.
Statistical analysis
The 10-fold cross-validation deviance averaged across each of the imputed data sets was used to decide between the least absolute shrinkage and selection operator (LASSO), ridge or elastic net penalties and to choose the value of the regularisation parameter.17 18 Separate logistic regression models were fit for each of the outcomes. Smoothing splines for continuous covariates,19 a multinomial model, Group LASSO20 and Sparse Group LASSO,21 were all investigated as potential methods, but the performance improvement in terms of the cross-validation deviance was not sufficient to justify the additional complexity and computational time. The non-parametric resampling bootstrap was used to generate 1000 samples, and for each bootstrap sample, the same sampled participants were used across the 30 imputed data sets.22 A total of 30 000 (30×1000) fitted models were created. Predicted probabilities were created by averaging the predictions from all models. We used the sample mean of the bootstrap distribution of the predicted probabilities for the final predicted probability and the percentiles of the bootstrap distribution to find the 90% CI for the risk estimate. Risk groups were not created.
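The pooling of predictions into a point estimate and a percentile interval can be sketched as follows. The order of pooling (averaging over imputed data sets within each bootstrap sample, then summarising across bootstrap samples) is an assumption of this sketch, as are the function names and the linear-interpolation percentile rule; the toy predictions are made up.

```python
import statistics


def percentile(sorted_values, q):
    """Linear-interpolation percentile for q in [0, 1] on an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def summarise_risk(pred_by_bootstrap, level=0.90):
    """pred_by_bootstrap[b][m] is the predicted probability for one patient
    from the model fitted to bootstrap sample b of imputed data set m."""
    # Average over imputed data sets within each bootstrap sample.
    per_bootstrap = [statistics.fmean(row) for row in pred_by_bootstrap]
    # Point estimate: mean of the bootstrap distribution.
    point = statistics.fmean(per_bootstrap)
    # Interval: percentiles of the bootstrap distribution.
    ordered = sorted(per_bootstrap)
    alpha = (1 - level) / 2
    return point, percentile(ordered, alpha), percentile(ordered, 1 - alpha)


# Toy predictions: 101 bootstrap samples x 3 imputed data sets (made-up values).
toy = [[0.10 + 0.001 * b + 0.01 * m for m in range(3)] for b in range(101)]
point, lower, upper = summarise_risk(toy)
```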
To assess the performance of the resulting predictions, we created receiver operator characteristic (ROC) curves and calculated the corresponding area under the curve (AUC) using the held-out test data set for the imputed data sets.22 We provide two graphical summaries of the resulting models: (1) a summary of the sign distribution showing, across bootstrap replications and imputed data sets, the proportion of estimated associations which were negative (better outcomes), zero or positive and (2) box plots of the estimated effect on the log-odds scale. We opted to show box plots instead of CIs to highlight the exploratory nature of the results for individual predictors. LASSO estimates are biased, and the lack of a priori hypotheses makes statistical significance testing inappropriate. We first defined a set of contrasts in order to make meaningful comparisons while accounting for the second-order terms in the model rather than report results for every parameter. The contrast matrices are available on a public repository (https://github.com/KosorokLab/CovidIBDRiskCalc) in a CSV format.
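For readers less familiar with the AUC, it equals the probability that a randomly chosen patient who experienced the outcome receives a higher predicted risk than a randomly chosen patient who did not. The sketch below computes it directly from that definition on toy data. It is an illustration only: the study used the pROC package in R, and the CIs reported in this paper are not reproduced here.

```python
def auc(scores, labels):
    """Area under the ROC curve as the proportion of (positive, negative)
    pairs in which the positive case has the higher score; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy predictions and outcomes (made-up values): 5 of the 6
# positive-negative pairs are ordered correctly.
toy_scores = [0.9, 0.8, 0.35, 0.3, 0.1]
toy_labels = [1, 1, 0, 1, 0]
toy_auc = auc(toy_scores, toy_labels)
```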
Predictions are averaged over bootstrap replications and imputed data sets, and so there is no single set of model coefficients to report. Because the logistic link function is non-linear, the predicted probability from averaging over predictions from each fitted model does not equal the predicted probability from averaging over coefficients across the models. Additionally, averaging the model coefficients would result in none of the coefficients being equal to zero unless that coefficient is equal to zero in every fitted model. Instead of reporting a misleading model summary, we opted to make all the model coefficients available online (https://github.com/KosorokLab/CovidIBDRiskCalc).
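A small numerical illustration of both points is given below, using two hypothetical fitted models with a single predictor. The coefficients are invented for the example and are unrelated to the study models.

```python
import math


def expit(z):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Two hypothetical fitted models as (intercept, slope); the first has the
# slope shrunk exactly to zero, as a LASSO penalty can do.
models = [(-3.0, 0.0), (-1.0, 2.0)]
x = 1.0

# Approach used for the predictions: average the predicted probabilities.
avg_of_predictions = sum(expit(b0 + b1 * x) for b0, b1 in models) / len(models)

# Alternative: average the coefficients first, then predict.
avg_b0 = sum(b0 for b0, _ in models) / len(models)
avg_b1 = sum(b1 for _, b1 in models) / len(models)
prediction_from_avg_coefficients = expit(avg_b0 + avg_b1 * x)
```

In this toy case the two approaches give roughly 0.39 and 0.27, respectively, and the averaged slope is 1.0 even though one of the two models excluded the predictor entirely.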
Software
The analysis was conducted using R V.4.0.2 and the tidyverse, glmnet, glmnetUtils, mice, magrittr, future and pROC packages.16 23–29 The online calculator was created using shiny.30 The most recent draw for the Carolina Pick 4 lottery (https://nclottery.com/Pick4) at the time of analysis was used for the random number generation seed. The code used to conduct the analysis is available on GitHub (https://github.com/KosorokLab/CovidIBDRiskCalc). This does not include the study data, but the estimated model coefficients are available.
Results
Participants
A total of 2709 patients were reported to the registry, split into a training set of 2009 patients and a test data set of 700 patients. The test data set comprised 366 patients from the 15% split and 334 patients added to the registry after model fitting and before manuscript submission. Table 1 provides demographic, clinical, medication and outcome descriptive summaries for the training set, test set and the whole sample. A total of 633 (24%) patients were hospitalised, 137 (5%) were admitted to the ICU or intubated and 69 (3%) patients died. The cohort has 1076 (40%) patients from the USA, with the rest coming from a variety of other countries summarised in table 1.
Model performance
The models have excellent discrimination, with an AUC and associated 95% CI estimated on the test data set averaged over the imputations of 0.79 (0.75 to 0.83) for Hospitalisation+, 0.88 (0.82 to 0.95) for ICU+ and 0.94 (0.89 to 0.99) for Death. The receiver operator characteristic curves are shown in figure 1.
Predictors of hospitalisation, intensive care and death
Figures 2 and 3 show the estimated coefficient sign distribution and the effect on the log-odds scale for the ten contrasts most strongly associated with each outcome, respectively. Consistent with other studies on risk factors for hospitalisation and death, we find older age, male gender and comorbidities to be associated with worse outcomes due to COVID-19.3 31 White race is associated with a lower risk of Hospitalisation+ in 89.2% of our replications but is not consistently selected in the models for ICU+ (30%) or Death (10%). These plots, not restricted to the top ten effects, are available in the supplement for all demographic, clinical and medication predictors (online supplemental figures 1 and 2), for countries (online supplemental figures 3 and 4) and for US regions (online supplemental figures 5 and 6).
Corticosteroids are associated with a higher risk of Hospitalisation+, ICU+ and Death. Oral corticosteroid use is the most important predictor, in terms of the magnitude of the absolute value of the coefficient, for Hospitalisation+, ICU+ and Death (figure 3). Biological medicines are associated with a lower risk of Hospitalisation+, ICU+ and Death, with integrin antagonists having directionally smaller effects than TNF antagonists or IL-12/23 inhibitors.
Online risk tool
The online risk calculator where physicians can enter their patient’s information and receive predictions from our models is freely available online (http://shiny.bios.unc.edu/secure-ibd-risk-calc/). The SECURE-IBD COVID-19 Risk Calculator was designed for physicians to use during consultations with their patients and includes detailed clinical characteristics, including demographics, disease diagnosis information, comorbidities and current medications. Daily dosage may optionally be entered for certain medications. The output of the risk calculator numerically and visually summarises the patient’s probabilities of adverse outcomes and associated prediction intervals among the three nested outcomes discussed earlier. Figures 4 and 5 display the results for two example patients and their associated probabilities (and 90% CIs) of adverse outcomes if they were to contract COVID-19. The interactive application could provide a reliable basis for distinguishing between high-risk and low-risk patients to aid in personalising clinical guidance on decisions about precautions, returning to normal activities and vaccination.
Discussion
We developed and validated risk prediction models for hospitalisation, intensive care stay and death resulting from COVID-19 in patients with IBD using data from 2709 cases from 59 countries reported through an international voluntary registry.12 We made a free online risk calculator using these models (https://covidibd.org/covid-19-risk-calculator/) for healthcare providers to facilitate discussion of risks due to COVID-19 with their patients with IBD.4 The interactive application could provide a reliable basis for distinguishing between high-risk and low-risk patients to aid in personalising clinical guidance on decisions about precautions, returning to normal activities and vaccination.
Other COVID-19-related risk prediction tools have focused on predicting hospital course based on clinical data captured at the time of admission,6 and predicting mortality among US patients aged 65 years and older.5 Our risk tool is unique in at least three ways. First, we focus on a specialised population of patients with IBD, a chronic, immune-mediated condition frequently treated with immune suppressive medications and often affected by other comorbidities. Second, our model focuses on predictors that are known before a patient contracts COVID-19 and thus can be used to inform lifestyle or treatment decisions to prevent infection or downstream complications. Finally, we examine a broader range of outcomes than tools focused solely on mortality. Our work can serve as a model for other disease areas, and our code is publicly available and could be adapted for similar online risk tools in other settings or populations.
Oral corticosteroids, older age, comorbidities, male gender and non-white physician-reported race (for Hospitalisation+) were strongly associated with worse COVID-19 outcomes. Caution must be used when interpreting penalised regression results because the coefficients are biased, but the results for oral corticosteroids were particularly dramatic. Compared with not taking an oral corticosteroid, taking a daily dose equivalent of 40 mg of prednisone was associated with 10 times greater adjusted odds of death. Biological therapies were associated with a lower risk of adverse COVID-19 outcomes, with small differences between the subcategories of biological therapies. Compared with not taking a biological therapy, TNF inhibitors were associated with an adjusted OR of 0.62 for death. In contrast to earlier studies using this database,13 32 we did not find a consistent association between 5-aminosalicylates and a higher risk of adverse outcomes; depending on the imputation and bootstrap replication, the sign would often change from positive to negative.
The worldwide collaboration that enabled this study and the detailed clinical data reported by physicians or trained medical staff are important strengths of this study. The machine learning approach allowed us to consider a wide variety of potential associations with adverse outcomes, and we examined multiple COVID-19-related adverse outcomes, enabling preliminary comparisons between risks. Certain comorbidities, including chronic obstructive pulmonary disease (COPD), cardiovascular disease (CVD), and cancer, were not as strongly associated with Hospitalisation+ as they were with Death. In contrast, severe IBD disease activity was an important predictor of Hospitalisation+ but was not consistently associated with a higher risk of ICU+ or Death.
Limitations
The data for this study comes from a voluntary registry, and the registry population may differ in unknown ways from the general population of patients with IBD. Reported cases may under-represent both low-risk asymptomatic cases and severely ill patients who may be hospitalised at an outside hospital or die without their healthcare provider’s knowledge. Model development and validation were conducted using data from the same registry, and validation in an independent cohort of patients with IBD will be an important future direction. Our results are associational, not causal—when using the online risk calculator, healthcare providers should not use it to answer ‘what-if’ questions (eg, how an individual’s risk would change if they altered the medications they were taking) which are inherently causal questions.33 While the registry has a wealth of clinical data, it does not collect granular data on many social determinants of health. Additionally, insurance status is not collected, which, for patients in the USA, likely factors into the decision making of whether to visit a hospital. Finally, we cannot compare the risk of adverse COVID-19 outcomes in patients with IBD to that in the general population.
Conclusions
This prognostic model can effectively predict which patients with IBD may be at higher risk for COVID-19-related morbidity. The free and publicly available (https://covidibd.org/covid-19-risk-calculator/) risk calculator should facilitate patient–provider discussions regarding the individualised risk of COVID-19 based on patient and treatment-related factors. As COVID-19 cases continue to rise in the USA and the rest of the world, this tool will be important in assisting physicians in identifying high-risk patients with downstream clinical implications. This tool can inform public health efforts to promote rational vaccine allocation and could help providers target their outreach to higher-risk patients. We believe this approach can also serve as a model for risk stratification in other chronic diseases.
Ethics statements
Patient consent for publication
Ethics approval
The UNC-Chapel Hill Office for Human Research Ethics has determined that the storage and analysis of deidentified data for this project does not constitute human subjects research as defined under federal regulations (45 CFR 46.102 and 21 CFR 56.102) and does not require IRB approval.
Footnotes
Twitter @kushalshah96, @Minxin Lu
Contributors MK conceptualised and acquired funding for the study. XZ curated the data. The investigation including data collection and analysis was conducted by JS, KSS, ML, XZ, RU, EJB, MA, J-FC, MK and MRK. The formal analysis, software programming, validation, visualisations and preparation of the original draft were done by JS, KSS and ML. The modelling methods were determined by JS, KSS, ML and MRK. The execution of the project was supervised by MK and MRK. The draft was reviewed and edited by JS, KSS, ML, XZ, RU, EJB, MA, J-FC, MK and MRK.
Funding This work was funded by the Helmsley Charitable Trust (2003-04445), National Center for Advancing Translational Sciences (UL1TR002489), a T32DK007634 (EJB) and a K23KD111995-01A1 (RCU). Additional funding was provided by Pfizer, Takeda, Janssen, Abbvie, Eli Lilly, Genentech, Boehringer Ingelheim, Bristol Myers Squibb, Celltrion and Arenapharm.
Competing interests JS, KSS, ML, XZ, EJB, MA and MRK report no conflicts of interest. RU has served as a consultant and/or advisory board member for Bristol Myers Squibb, Eli Lilly, Janssen, Pfizer and Takeda. He has received research support from AbbVie, Boehringer Ingelheim and Pfizer. He is supported by a Career Development Award from the National Institutes of Health (K23KD111995‐01A1). J-FC reports receiving research grants from AbbVie, Janssen Pharmaceuticals and Takeda; receiving payment for lectures from AbbVie, Amgen, Allergan, Inc., Ferring Pharmaceuticals, Shire and Takeda; receiving consulting fees from AbbVie, Amgen, Arena Pharmaceuticals, Boehringer Ingelheim, Celgene Corporation, Celltrion, Eli Lilly, Enterome, Ferring Pharmaceuticals, Genentech, Janssen Pharmaceuticals, Landos, Ipsen, Medimmune, Merck, Novartis, Pfizer, Shire, Takeda, Tigenix, Viela bio; and holds stock options in Intestinal Biotech Development and Genfit. MK has consulted for Abbvie, Janssen, Pfizer and Takeda, is a shareholder in Johnson & Johnson, and has received research support from Pfizer, Takeda, Janssen, Abbvie, Lilly, Genentech, Boehringer Ingelheim, Bristol Myers Squibb, Celltrion and Arenapharm.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.