Original research
Biomarker identification using dynamic time warping analysis: a longitudinal cohort study of patients with COVID-19 in a UK tertiary hospital
  1. Hannah Burke1,
  2. Anna Freeman1,
  3. Paul O’Regan2,
  4. Oskar Wysocki2,
  5. Andre Freitas2,
  6. Ahilanandan Dushianthan1,
  7. Michael Celinski1,
  8. James Batchelor1,
  9. Hang Phan3,4,
  10. Florina Borca5,
  11. Natasha Sheard6,
  12. Sarah Williams6,
  13. Alastair Watson1,
  14. Paul Fitzpatrick7,
  15. Dónal Landers8,
  16. Tom Wilkinson1
  17. On behalf of the REACT COVID group
  1. 1Faculty of Medicine, University of Southampton, Southampton, UK
  2. 2Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, Cancer Research UK Manchester Institute, The University of Manchester, Manchester, UK
  3. 3Clinical Informatics Research Unit, University of Southampton Faculty of Medicine, Southampton, UK
  4. 4University of Southampton, Southampton, UK
  5. 5Institute for Life Sciences, University of Southampton, Southampton, UK
  6. 6University Hospital Southampton NHS Foundation Trust, Southampton, UK
  7. 7University of Manchester, Cancer Biomarker Centre, Cancer Research UK Manchester Institute, Manchester, UK
  8. 8Digital Experimental Cancer Medicine Team, University of Manchester, Cancer Biomarker Centre, Cancer Research UK Manchester Institute, Alderley Edge, Cheshire, UK
  1. Correspondence to Dr Anna Freeman; a.freeman{at}soton.ac.uk

Abstract

Objectives COVID-19 is a heterogeneous disease, and many reports have described variations in demographic, biochemical and clinical features at presentation influencing overall hospital mortality. However, there is little information regarding longitudinal changes in laboratory prognostic variables in relation to disease progression in hospitalised patients with COVID-19.

Design and setting This retrospective observational report describes disease progression from symptom onset, to admission to hospital, clinical response and discharge/death among patients with COVID-19 at a tertiary centre in South East England.

Participants Six hundred and fifty-one patients treated for SARS-CoV-2 between March and September 2020 were included in this analysis. Ethical approval was obtained from the HRA Specific Review Board (REC 20/HRA/2986) for waiver of informed consent.

Results The majority of patients presented within 1 week of symptom onset. The lowest risk patients had low mortality (1/45, 2%), and most were discharged within 1 week after admission (30/45, 67%). The highest risk patients, as determined by the 4C mortality score predictor, had high mortality (27/29, 93%), with most dying within 1 week after admission (22/29, 76%). Consistent with previous reports, most patients presented with high levels of C reactive protein (CRP) (67% of patients >50 mg/L), D-dimer (98%>upper limit of normal (ULN)), ferritin (65%>ULN), lactate dehydrogenase (90%>ULN) and low lymphocyte counts (81%<lower limit of normal (LLN)). Increases in platelet counts and decreases in CRP, neutrophil:lymphocyte ratio (p<0.001), lactate dehydrogenase, neutrophil counts, urea and white cell counts (all p<0.01) were each associated with discharge.

Conclusions Serial measurement of routine blood tests may be a useful prognostic tool for monitoring treatment response in hospitalised patients with COVID-19. Changes in other biochemical parameters often included in a ‘COVID-19 bundle’ did not show significant association with outcome, suggesting there may be limited clinical benefit of serial sampling. This may have direct clinical utility in the context of escalating healthcare costs of the pandemic.

  • COVID-19
  • respiratory medicine (see thoracic medicine)
  • respiratory infections

Data availability statement

A REACT COVID Data Access Management Committee has been established to prioritise and ensure appropriate governance of requests to access linked anonymised clinical data, which may be shared with other research centres. Further details have been published elsewhere (please refer to ref. 15).

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Strengths and limitations of this study

  • Close alignment of research and clinical practice in a near real-time manner.

  • Longitudinal data collection and sampling opportunities.

  • A single-centre study with data collection reflective of clinical need rather than a strict protocolised time frame.

  • Use of novel artificial intelligence techniques for data analysis.

  • Analysis of a ‘first wave’ cohort prior to approval for use of COVID-19 specific treatments, including dexamethasone.

Introduction

COVID-19 is a heterogeneous disease with variable clinical outcomes, ranging from asymptomatic carriage to severe pneumonia and multi-organ failure.1 2 Understanding this heterogeneity and its implications for prognosis and therapeutic response is key to improving outcomes in COVID-19. Despite the rapid development of effective vaccinations,3 4 the clinical repercussions of SARS-CoV-2 infection are likely to continue to impact on health services, and an increased understanding of the disease is still necessary.

Large observational studies have characterised the clinical features of hospitalised patients with COVID-192 and highlighted specific risk factors associated with mortality such as older age, male sex and chronic comorbidity.5 6 Furthermore, robust models that predict prognosis of COVID-19 have been developed for use both in the general population7 and on admission to hospital.8 These models have incorporated known phenotypic risk factors but also included simple clinical variables such as blood urea level and C-reactive protein (CRP). Much has been documented about the common laboratory parameters in COVID-19 and whether these can be used as diagnostic aids or to predict outcome.9 However, the focus has been solely on admission blood parameters, and little is known about how these parameters change during admission and whether these changes could help prognosticate.

In addition, little is known about the relationship between timing of symptom onset and presentation to hospital and subsequent disease trajectory. The WHO International Severe Acute Respiratory and emerging Infection Consortium (ISARIC) cohort reports the median duration of symptoms before admission as 4 days; however, there is a wide range of 1–8 days.5 The timing of presentation to hospital may have important consequences as it may relate to the underlying pathology of the disease.10 For example, initial evidence suggested that acute respiratory distress syndrome (ARDS) associated with severe COVID-19 presents in the second week of the illness.11 The emerging literature suggests that there may be two distinct but overlapping pathological subsets: the first triggered by the virus itself and the second by the host response.11–13 Therefore, the point at which patients deteriorate and require hospitalisation may be key to understanding the dominant pathological mechanism at play and may have important implications for treatment.

We hypothesise that timing of presentation to hospital with COVID-19 may have a bearing on the outcome of disease and may relate to an individual’s COVID-19 risk profile, as defined by a weighted risk score based on the clinical and biochemical features described previously. Furthermore, identifying specific changes in common laboratory blood markers may provide additional prognostic information and aid clinicians working in the field.

Here we report the clinical characteristics, laboratory measures and outcomes of all patients presenting with SARS-CoV-2 positive swabs to a tertiary academic medical centre in Southampton (UK) from 7 March to 4 September 2020. We examine timing of presentation to hospital against individual patients' COVID-19 risk, and determine temporal changes of laboratory blood variables that predict outcome during hospital admission. The granularity of our data and longitudinal nature of the analysis add novelty and depth to the more cross-sectional analyses in the literature to date. While dynamic time warping has been used in modelling and forecasting the number of COVID-19 cases,14 15 we are the first, to our knowledge, to use dynamic time warping at the level of individual patients, aiming to discover patterns in biomarker trajectories.

Objectives

  • Compare disease progression from symptom onset to final outcome between risk groups.

  • Identify features at admission to hospital that are associated with outcome.

  • Identify changes in biochemical parameters from serial sampling that are associated with outcome.

Methods

Study design and setting

As part of the Research Evaluation Alongside Clinical Treatment in COVID-19 (REACT) observational and biobanking study of COVID-19,16 17 data were collected for COVID-19 positive patients who were admitted to University Hospital Southampton between 7 March and 4 September 2020.

Participants

Patients were included in the study if they were admitted to the hospital and were confirmed positive for SARS-CoV-2 on real-time reverse transcription PCR from a nasopharyngeal swab or bronchoalveolar lavage. Patients without a definitive outcome (eg, death or discharge), either due to ongoing treatment or missing outcome data at the time of analysis, were excluded. Possible second COVID-19 infections or readmissions with no subsequent evidence of death or discharge were also excluded.

Variables

Patients’ characteristics included demographics (age, sex and body mass index) and comorbidities (including asthma, chronic obstructive pulmonary disease (COPD), cardiac disease and others). The following data were collected at admission and throughout hospitalisation as part of routine clinical care. Laboratory tests included full blood count, renal profile, liver profile, CRP, ferritin, D-dimer and lactate dehydrogenase (LDH), and vital signs included blood pressure, heart rate, respiration rate and peripheral oxygen saturations. Timing, dose and duration of all treatments including corticosteroids, anticoagulants, antibiotics, antivirals and antifungals were collected.

Outcomes

The primary outcome was in-hospital mortality. Analysis of associations between biochemical parameters at admission and outcome was restricted to patients who were hospitalised for 2 or more days and had a final outcome within 28 days of admission. For analysis of changes in parameters, additional restrictions were that patients’ last specimens must have been taken at least 2 days after admission and no more than 4 days prior to final outcome.
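The inclusion rules above can be written as two predicates. This is a minimal sketch under stated assumptions, not the study's own code: the `Stay` record and its field names are hypothetical, durations are counted in whole calendar days, and "within 28 days" is read as inclusive.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Stay:
    # Hypothetical record layout; the study's actual schema is not given.
    admitted: date
    outcome: date                         # date of death or discharge
    last_specimen: Optional[date] = None  # date of last laboratory specimen


def eligible_for_admission_analysis(stay: Stay) -> bool:
    """Hospitalised for 2 or more days with a final outcome within 28 days."""
    days_in_hospital = (stay.outcome - stay.admitted).days
    return 2 <= days_in_hospital <= 28  # inclusive upper bound is an assumption


def eligible_for_change_analysis(stay: Stay) -> bool:
    """Additionally requires the last specimen to be taken at least 2 days
    after admission and no more than 4 days before the final outcome."""
    if stay.last_specimen is None or not eligible_for_admission_analysis(stay):
        return False
    days_after_admission = (stay.last_specimen - stay.admitted).days
    days_before_outcome = (stay.outcome - stay.last_specimen).days
    return days_after_admission >= 2 and 0 <= days_before_outcome <= 4
```

A patient admitted on 1 April, last sampled on 8 April and discharged on 10 April would pass both checks; the same patient with a last specimen on 2 April would pass only the first.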

Data sources/measurement

Clinical data were captured longitudinally, with change over time treated as explicit. All data collected from the cohort in the study are kept in a highly secure contemporary encrypted data platform BC|Insight (within the Clinical Informatics Research Unit, University of Southampton) that was set up in a Microsoft Data Centre in South UK. A detailed outline of study protocol and methodology is published elsewhere.17

In order to adjust analysis of mortality based on known risk factors for COVID-19, weighted risk scores were calculated for patients at admission (first available value up to and including the day after admission) using available variables and equivalent weightings, as described previously for the 4C mortality score.8 Briefly, the following weightings were applied: age (50–60 years score +2, 60–70 years score +4, 70–80 years score +6, >80 years score +7); gender (male score +1); number of relevant comorbidities (1 score +1, >1 score +2); respiration rate (20–30 score +1, >30 score +2); peripheral oxygen saturation (<92% score +2); urea (7–14 mmol/L score +1, >14 mmol/L score +3); CRP (50–100 mg/L score +1, >100 mg/L score +2). Glasgow Coma Scale values were not included in risk score calculation, as approximately 90% of patients did not have values available. Participants were classified based on these scores using the same thresholds described previously: low risk (weighted risk score 0–3), intermediate (score 4–8), high (score 9–14) or very high (score >14) risk.8
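The weightings listed above can be sketched as a small scoring function. This is an illustration only: the function and parameter names are hypothetical, and the handling of values that fall exactly on a band edge (for example age 60, respiration rate 30 or CRP 100 mg/L) is an assumption, since the text gives overlapping ranges. Each band is treated here as including its lower bound.

```python
def weighted_risk_score(age: float, male: bool, n_comorbidities: int,
                        resp_rate: float, spo2: float,
                        urea: float, crp: float) -> int:
    """Weighted admission risk score without the Glasgow Coma Scale term.

    Units: age in years, resp_rate in breaths/min, spo2 in %,
    urea in mmol/L, crp in mg/L. Band-edge handling is an assumption.
    """
    score = 0
    # Age bands
    if age >= 80:
        score += 7
    elif age >= 70:
        score += 6
    elif age >= 60:
        score += 4
    elif age >= 50:
        score += 2
    # Gender
    if male:
        score += 1
    # Number of relevant comorbidities
    if n_comorbidities > 1:
        score += 2
    elif n_comorbidities == 1:
        score += 1
    # Respiration rate
    if resp_rate >= 30:
        score += 2
    elif resp_rate >= 20:
        score += 1
    # Peripheral oxygen saturation
    if spo2 < 92:
        score += 2
    # Urea
    if urea > 14:
        score += 3
    elif urea >= 7:
        score += 1
    # CRP
    if crp >= 100:
        score += 2
    elif crp >= 50:
        score += 1
    return score


def risk_group(score: int) -> str:
    """Map a weighted risk score to the risk groups used in the text."""
    if score <= 3:
        return "low"
    if score <= 8:
        return "intermediate"
    if score <= 14:
        return "high"
    return "very high"
```

With every component at its maximum the score is 7+1+2+2+2+3+2 = 19, which is the ceiling once the Glasgow Coma Scale term is dropped.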

Values for individual laboratory tests were classed as low/normal/high based on the thresholds defined in online supplemental table 1.

Statistical methods

Continuous data are summarised as median (IQR), and categorical data are summarised as frequency (percentage). Differences between cohorts were tested using the Kruskal-Wallis rank sum test for quantitative variables and using Pearson’s χ2 test for count data. Biochemical parameters measured at admission were compared with the last available records using Wilcoxon signed-rank test within groups stratified by outcome. Associations between parameters and outcome were investigated using logistic regression adjusted for age, number of comorbidities and gender. Changes in biochemical parameters were also tested using logistic regression adjusted for age, number of comorbidities and gender and including both parameter value as well as the difference between first and last available values. P values were adjusted for multiple testing using the Holm-Bonferroni method. Trajectories of biochemical parameters were clustered using the k-means clustering algorithm together with dynamic time warping18 19 used as a distance metric, setting the number of clusters to k=4, which was found to be the optimal setting for the majority of biochemical parameters on analysis using the silhouette and elbow method.20 21
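For readers unfamiliar with the Holm-Bonferroni adjustment named above, the step-down procedure can be sketched as follows. This is a generic illustration of the standard method, not the authors' code; the function name `holm_adjust` is hypothetical.

```python
from typing import List, Sequence


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Return Holm-Bonferroni adjusted p values in the original order.

    The smallest p value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone, and each value
    is capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = (m - rank) * p_values[idx]
        running_max = max(running_max, candidate)
        adjusted[idx] = min(1.0, running_max)
    return adjusted
```

For three raw p values of 0.01, 0.04 and 0.03, the adjusted values are approximately 0.03, 0.06 and 0.06: the second-smallest value (0.03) is doubled, and the largest inherits that 0.06 through the running maximum.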

Missing data

Given the real-world nature of the study, there were a number of missing data points, and as this paper is mainly descriptive, we have not performed any imputation for these missing data but describe the data as they stand. For each model the number of patients may vary due to missing values.

Bias

The analysis population includes only patients hospitalised with COVID-19, and as such, is likely to be biased towards more severe COVID-19 infection and/or people at higher risk of death due to the presence of known risk factors. Treatment and intervention pathways may have evolved over the course of the study period. The analysis does not adjust for any differences in interventions between patients over the course of the study period. The study period largely predates the finding that dexamethasone treatment improves survival; while no adjustment is made for differences in interventions, dexamethasone use in the population was low and is not considered likely to affect the results or their interpretation.

Patient and public involvement (PPI)

Patient and public involvement was sought in the design and oversight of the broader Southampton Research Biorepository, within which the sampling arm of the REACT COVID study is nested, and patient representatives were involved in the design and management of the WATCH study,22 on which the REACT COVID Database is based. For further detail on PPI in the REACT COVID study, please see ref 17.

Results

Participants

Six hundred and fifty-one patients who had a confirmed case of COVID-19 infection were included in this analysis. Five hundred patients had a final outcome of either died or discharged. Of these, date of symptom onset was recorded for 455 patients: 96 had date of symptom onset after admission (classed as nosocomial cases) and were excluded from further analysis. The analysis population consists of 359 patients admitted after symptom onset, who had a final outcome (died or discharged) recorded.

In order to support comparisons between cohorts, weighted risk scores at admission were calculated based on weightings described previously for 4C mortality score predictor.8 As seen in figure 1, while weighted risk scores predict outcome with reasonable accuracy, mortality was underestimated in most cases.

Figure 1

Performance of weighted risk score at admission as predictor of mortality. Red bars: predicted mortality based on a univariate logistic regression model of mortality according to weighted risk score at admission. Blue bars: actual observed mortality rate for a given risk score at admission.

Descriptive data

Patient characteristics

Patient characteristics are described in table 1. The median age of the analysis population was 71 years (IQR: 53–83). The majority of patients (60% overall) were male. The most common comorbidities (>25% overall) were cardiac disease, renal disease, obesity and diabetes. The majority of patients (208) were classified as high or very high risk at admission using the modified 4C risk score.8 Thirty-one patients were missing one or more of the parameters required to calculate 4C weighted risk scores.

Table 1

Known risk factors at admission and disease progression, categorised into risk groups based on the modified 4C risk score8

Outcome data

Disease progression

Across all risk groups, the majority of patients were hospitalised within 1 week of symptom onset (table 1). Patients in the high and very high risk groups were admitted sooner after symptom onset compared with the low and intermediate risk groups. Time from admission to final outcome showed a bimodal relationship to risk (figure 2). Most patients in the low risk group (30/45, 67%) were discharged within 7 days of admission; most patients in the very high risk group (22/29, 76%) died within 7 days of admission. Most patients in the intermediate and high risk categories (48/75, 64% and 113/179, 63%, respectively) took longer than 7 days to reach final outcome. The longest time from admission to final outcome was 143 days.

Figure 2

Time from admission to outcome according to risk group at admission. Patients were grouped based on weighted risk scores at admission: low (0–3), intermediate (4–8), high (9–14) and very high (>14). Risk scores at admission could not be calculated for some patients due to one or more missing features (‘data missing’ group).

Biochemical characteristics at admission

Based on first available values up to and including the day after admission, high CRP, D-dimer, ferritin and LDH levels and low lymphocyte counts were common among all patients regardless of final outcome (online supplemental figure 1). The following abnormal results were more common among those who died compared with those who were discharged: high urea, high creatinine and low haemoglobin.

Abnormal values (below LLN or above ULN) for the following laboratory tests at admission were more common among higher risk groups: creatinine (eg, 40% of low-risk patients vs 86% of very high risk patients), CRP (38% vs 100%), ferritin (54% vs 95%), glucose (12% vs 68%), haemoglobin (13% vs 55%), neutrophils (18% vs 62%), urea (7% vs 97%) and white cell count (18% vs 45%). Within risk groups, differences between patients who were discharged versus died were broadly consistent (online supplemental figure 2).

Prognostic biochemical features at admission

Associations between laboratory values at admission (first available value, up to and including the day after admission) and outcome were evaluated for patients who were admitted for two or more days and had a final outcome within 28 days of admission (n=308). Potential associations with final outcome were evaluated separately for each parameter using multivariable logistic regression, adjusted for age, gender and number of comorbidities. P values were adjusted for multiple testing using the Holm-Bonferroni method. After correction for multiple testing, only CRP:lymphocyte (p=0.011) and neutrophil:lymphocyte (p=0.0189) ratios at admission were significantly associated with outcome, with higher values associated with increased mortality (online supplemental table 2). Associations between the temporal change of laboratory values from admission (first available value, up to and including the day after admission) to the nearest available result to outcome were evaluated for patients who were admitted for 2 or more days, had a final outcome within 28 days of admission and had their last result taken no more than 4 days prior to final outcome (n=209). After correction for multiple testing, increases in CRP, LDH, neutrophils, neutrophil:lymphocyte ratio, urea and white cell count were each significantly associated with increased mortality, whereas increases in platelets were associated with reduced mortality. For example, a 1 mg/L increase in CRP was associated with an approximately 2%–4% increase in the odds of death, and vice versa for a 1 mg/L decrease (table 2).
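To make the per-unit figure for CRP easier to interpret, the sketch below scales a per-unit odds ratio to a larger change. It relies only on the standard property of logistic regression that log-odds are linear in the predictor, so odds ratios multiply across units. The function name is hypothetical, and reading the quoted 2%–4% as per-unit odds ratios of 1.02 and 1.04 is an assumption, because table 2 is not reproduced here.

```python
def scaled_odds_ratio(or_per_unit: float, delta: float) -> float:
    """Odds ratio for a change of `delta` units given the per-unit odds ratio.

    Under a logistic model, OR(delta) = OR(1) ** delta; a negative delta
    gives the odds ratio for a decrease.
    """
    if or_per_unit <= 0:
        raise ValueError("an odds ratio must be positive")
    return or_per_unit ** delta


# Illustrative values only (assumed from the quoted 2%-4% range).
rise_low = scaled_odds_ratio(1.02, 10)    # about 1.22
rise_high = scaled_odds_ratio(1.04, 10)   # about 1.48
fall_low = scaled_odds_ratio(1.02, -10)   # about 0.82
```

On these assumptions, a 10 mg/L rise in CRP would correspond to roughly 22%–48% higher odds of death rather than 20%–40%, because the per-unit effects compound.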

Table 2

Prognostic biochemical changes between admission and outcome

Unsurprisingly, among patients who were discharged, CRP levels at discharge were significantly lower than at admission (p<0.001), whereas among those who died, CRP levels at death were significantly higher than at admission (p<0.001). Similar patterns were observed for CRP:lymphocyte ratio and neutrophil counts (figure 3). Platelet counts at discharge were higher than at admission (p<0.001) but were unchanged among patients who died. In addition, urea, white cell counts and neutrophil:lymphocyte ratios were lower at discharge than at admission in patients who were discharged, whereas no such decrease was seen in those who died.

Figure 3

Changes in biochemical parameters between admission and last available specimen. Green line: increase; red line: decrease of parameter value for each individual patient. Biochemical parameters measured at admission were compared with the last available records using Wilcoxon signed-rank test within groups stratified by outcome.

Timing of prognostic changes in biochemical parameters

Clusters of patients with similar trajectories for a given biochemical parameter over the first week after admission were identified using dynamic time warping. Low, stable CRP (below 100 mg/L) was associated with reduced mortality (cluster 3, 31% mortality) compared with high and/or rising CRP (eg, cluster 2, 69% mortality, figure 4A). Low and stable urea levels were associated with reduced mortality (cluster 2, 18% mortality), whereas high and/or rising urea was associated with increased mortality (eg, cluster 4, 92% mortality, figure 4B).

Figure 4

Clustered trajectories of parameter values over the first 7 days after admission: trajectories of biochemical parameters were clustered using the k-means clustering algorithm together with dynamic time warping18 19 used as a distance metric, pragmatically setting the number of clusters to k=4: (A) CRP; (B) urea; (C) platelet counts; (D) LDH; (E) neutrophil counts; (F) white cell counts; and (G) lymphocyte counts. Each line represents an individual patient: blue: discharged; red: died; the black dashed line is the centre of the cluster and the green area is the normal range. Note: clustering was performed for each parameter separately, that is, CRP cluster #1 does not contain the same patients as urea cluster #1. CRP, C reactive protein; LDH, lactate dehydrogenase.

In contrast, low and stable platelet counts were associated with increased mortality (eg, cluster 2, 59% mortality) compared with high and/or rising platelet counts (eg, cluster 1: 23% mortality; cluster 3: 30% mortality) (figure 4C). Increasing LDH values were associated with higher mortality (cluster 3, 67%) compared with low and stable LDH (clusters 1 and 4, 35% and 17%) or declining LDH (cluster 2, 33%, figure 4D). Neutrophil counts that persisted above normal range were associated with higher mortality (figure 4E). No obvious relationships between patterns of white blood cell counts and lymphocyte counts were observed (figure 4F,G).
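The clusters described in this section rest on a dynamic time warping (DTW) distance between patient trajectories. The sketch below shows the classic dynamic-programming form of that distance and the assignment step of k-means. It is an illustration rather than the authors' implementation (see refs 18 19): the function names are hypothetical, the absolute difference is assumed as the local cost, no warping window is applied, and the centre-update step of k-means is omitted.

```python
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance between two series of any lengths."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")
    inf = float("inf")
    # cost[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])  # local cost (an assumption)
            cost[i][j] = step + min(cost[i - 1][j],      # a[i-1] repeats b[j-1]
                                    cost[i][j - 1],      # b[j-1] repeats a[i-1]
                                    cost[i - 1][j - 1])  # both series advance
    return cost[n][m]


def nearest_centre(series: Sequence[float],
                   centres: Sequence[Sequence[float]]) -> int:
    """Index of the cluster centre closest to `series` under DTW."""
    return min(range(len(centres)),
               key=lambda k: dtw_distance(series, centres[k]))
```

Because DTW aligns series of unequal length, two patients whose values follow the same course on different sampling days can still be close: `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0, whereas a pointwise comparison would not even be defined for these two series.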

Discussion

Key results

We apply a dynamic time warping approach to deeply understand clinical characteristics and biomarker trajectory patterns of patients with COVID-19. We demonstrate that increasing comorbidity and age are associated with and predictive of adverse outcome with SARS-CoV-2 infection, in line with the 4C risk score,8 23 24 providing reassurance of the translatability of our findings from a small, single centre cohort, despite the omission of Glasgow Coma Scale. Through our analysis, we highlight that serial measurement of some routine blood tests may be a useful prognostic tool for monitoring treatment response in hospitalised patients with COVID-19. However, others may provide limited clinical benefit, which may have important implications for resources and healthcare costs within the pandemic.

Symptom onset has a role in increasing understanding of the natural history of SARS-CoV-2 infection, yet many studies focus on day of hospital admission rather than symptom onset for calculation of time to outcome.25 We demonstrate that the majority of patients requiring hospitalisation for SARS-CoV-2 infection present to hospital within 7 days of symptom onset. This is in line with findings from a Belgian study, demonstrating a mean time from symptom onset to hospitalisation of 5.74 days.26 A potential explanation for this could be found in the description of disease phases in SARS-CoV-2 infection, with rapid progression through the early infection and pulmonary phases to the hyperinflammation stage driving presentation relatively early in the disease course for those patients who require hospitalisation.27 This may be driven by a higher viral load in those patients who develop severe disease more rapidly, with support for this hypothesis coming from work demonstrating higher and more persistent viral shedding in patients with severe disease when compared with those with mild disease.28 In terms of time to outcome in those patients requiring hospitalisation, those with a low-risk score or a very high-risk score generally reach their outcome within 7 days of admission. This is reflective of the literature, in terms of meta-analysis findings of a median length of stay of 5 days outside of China.25 This is likely due to those in the low-risk group having sufficient physiological reserve to overcome the infection rapidly and those in the very high risk group having minimal reserve and therefore succumbing quickly, due to a combination of changes in the immune cell repertoire, epigenome and inflammasome response to infection.29 Those in the medium-risk and high-risk groups demonstrate a more variable and prolonged time to outcome, highlighting the groups in which the risk scores may be less accurate in predicting outcome. This is of clinical relevance as healthcare services become more stretched as the current wave progresses.

Consistent with previous reports, most patients presented with high levels of CRP, D-dimer, ferritin and LDH and low lymphocyte counts.30 31 However, when adjusted for repeated measures, age, comorbidity and how unwell patients were at admission, only the neutrophil:lymphocyte ratio was predictive of outcome. The literature is suggestive of a greater range of tests being predictive of outcome, and this may be a result of our more complex statistics correcting for confounders more accurately. It may also be that within a relatively small sample size, these additional predictive tests did not reach statistical significance. The neutrophil:lymphocyte ratio as predictive of outcome raises interesting questions as to the pathobiology driving morbidity and mortality in SARS-CoV-2 infection. It may be that the rise in neutrophils is associated with subclinical bacterial coinfection, which is supported by a higher CRP:lymphocyte ratio at admission and rising through the course of admission predicting worse outcome. However, a number of reviews have demonstrated detectable bacterial infection to be relatively low.32–34 The increased neutrophil:lymphocyte ratio may also be a direct response to the combination of viral infection and increased inflammation that are seen with worse outcomes with SARS-CoV-2 infection13 and have been demonstrated in other human coronavirus infections.35 The viral response is thought to directly drive increasing neutrophil numbers, and the reduction in lymphocytes is thought to be a subsequent response to the resulting oxidative stress and inflammation.36

The granularity of our data allowed an in-depth investigation of change over time that adds novelty to this work. During the course of hospitalisation, changes in CRP, LDH, neutrophil counts, neutrophil:lymphocyte ratio, urea, white cell count and platelet counts were all significantly associated with outcome. This adds novelty and depth to existing larger but cross-sectional work in the literature. The platelet change is likely reflective of the coagulopathy and microthrombosis that has been described in SARS-CoV-2 infection37 and reflective of the literature,38 although the role of anticoagulation in SARS-CoV-2 infection remains under investigation39 40 and exact pathological mechanisms are unclear.41 The prognostic capability of this is of relevance to clinical practice and adds novelty to existing data, with many hospitals repeating a broad panel of ‘COVID bloods’ on a regular basis with little to guide interpretation of results. This is of relevance in the context of the current national shortage of blood tubes and also has cost per admission implications. We demonstrate that simple and inexpensive full blood count (FBC), renal profile and CRP are able to provide good prognostic information, and this is useful in the context of escalating healthcare costs of the pandemic.42

Generalisability

The study period largely predates the finding that dexamethasone treatment improves survival: only 2% of the study population received dexamethasone. Furthermore, anecdotal evidence suggests that treatment pathways for COVID-19 infection have evolved and become more standardised since the start of the pandemic. It is not yet clear whether these findings are applicable in a postdexamethasone era and in relation to new variants and vaccination statuses.

Limitations and future potential

There are limitations to our study. Omission of Glasgow Coma Scale (GCS) values systematically underestimates risk compared with the ISARIC study validation data for the 4C mortality score, the maximum risk score in this analysis was 19, compared with 21 for 4C mortality score. However, we demonstrate good correlation between our adapted risk score and the 4C risk score, suggesting reasonable validity in our results. This analysis does not adjust for differences in treatments, in particular for changes in treatment pathways as understanding of the disease and treatment effectiveness evolved over the course of the study period. With regards to symptom onset, there may be recall bias. The single-centre, relatively small data set is a limitation to this work; however, the reflection of published findings in our data set provide reassurance that additional novel findings would be extrapolatable to a larger population. The pragmatic nature of data collection alongside clinical treatment is both a strength and a weakness; it may result in missing data points, but adjustments have been made for these inconsistencies in the analysis. Furthermore, there may be systematic biases in the collection of data; for example, sicker patients may have more laboratory results, especially close to their date of final outcome. However, the fact that data are collected alongside routine clinical management allows for more directly translatable findings. Future work would involve investigation of the role of viral load in the context of these findings in order to better understand the pathological mechanisms that are key in driving a more severe disease phenotype and with serial cytokine sampling of airways and blood to investigate the role of hyperinflammation in this. 
Mechanistic modelling of the pathobiology of SARS-CoV-2 infection would allow greater understanding of the pathological processes driving these prognostic responses and may also facilitate more targeted novel treatment trials for this disease.43–49 Importantly, we have demonstrated the utility of a new approach, based on dynamic time warping analysis, for rapidly characterising emerging COVID-19 clinical cohorts and their trajectories. Further characterisation and validation of this approach are now required to understand whether it has potential to be used to promptly delineate the impact of vaccines, treatments and emerging SARS-CoV-2 strains on clinical characteristics of COVID-19 disease and outcomes in the future.
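
For readers unfamiliar with dynamic time warping, the following is a minimal sketch of the textbook algorithm in Python. It is not the implementation used in this study, which is not described in this section. The function name `dtw_distance` and the three trajectories are hypothetical and chosen for illustration only; the values are not study data. The sketch assumes equally weighted steps and an absolute-difference local cost.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences.

    Assumption: local cost is the absolute difference between samples,
    with no windowing constraint and no normalisation.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats its sample
                cost[i][j - 1],      # b advances, a repeats its sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical laboratory trajectories (illustrative values, not study data)
patient_a = [50, 120, 200, 150, 80]
patient_b = [50, 50, 120, 200, 150, 80]  # same shape as patient_a, one sample later
patient_c = [50, 60, 70, 80, 90]         # a different shape

print(dtw_distance(patient_a, patient_b))
print(dtw_distance(patient_a, patient_c))
```

In this sketch the first two trajectories align at zero cost despite differing in length and timing, whereas the third does not. This tolerance to shifts in timing is the property that makes the technique suited to comparing irregularly sampled clinical time series.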

Conclusions

We implement a new dynamic time warping approach to gain a deeper understanding of patient clinical trajectories and outcomes in a COVID-19 hospital cohort. We demonstrate that serial monitoring of specific biochemical parameters provides information additional to single testing at admission that is immediately clinically translatable, and that such monitoring could be limited to less expensive, readily available tests, adding novelty to the existing literature. The demonstration of factors predictive of outcome raises questions as to the mechanism of severe illness in SARS-CoV-2 infection and demonstrates the need for further mechanistic modelling, which is now underway.

Data availability statement

A REACT COVID Data Access Management Committee has been established to prioritise and ensure appropriate governance of requests to access linked anonymised clinical data, which may be shared with other research centres. Further details of this have been published elsewhere.15

Ethics statements

Patient consent for publication

Ethics approval

Data were collected for COVID-19 positive patients admitted to University Hospital Southampton between 7 March and 4 September 2020. Ethical approval for the study, including waiver of informed consent, was obtained from the HRA Specific Review Board (REC 20/HRA/2986).

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • HB, AF, PO’R and OW are joint first authors.

  • Twitter @AlastairSWatson, @DrAnnaFreeman

  • Contributors HB and AFree designed the protocol and, with OW and PO’R, drafted the manuscript; AD and MC were involved in protocol design; HP, FB and PF were involved in the realisation of data extraction, integration, transformation and upload processes. DL, PF, HP and JB were involved in the design and adaptation of the Real-time Analytics for Clinical Trials platform. PO’R, OW and AFrei designed and PO’R and OW performed the data analysis. AFrei and DL contributed to project management. NS and SW were involved in manual data collection processes, and AW was involved in manuscript drafting and formatting. TW was involved in study conception and protocol design. All authors contributed to editing and reviewed the final manuscript. AFree guarantees and is responsible for the overall content of the finished work, has full access to the data and controlled the decision to publish.

  • Funding The REACT platform has been supported by the digital Experimental Cancer Medicine Team free of charge (grant number not applicable). The biobanking sub-cohort is supported by the NIHR Southampton CRF and NIHR Southampton Biomedical Research Centre at University Hospital Southampton NHS Foundation Trust and, as part of a broader effort (Enabling New Treatment Approaches for COVID-19 Treatment), by the University of Southampton (UoS) charity (Office of Development and Alumni Relations) (grant number not applicable). In addition, the Clinical Informatics Research Unit, UoS has supported infrastructure costs (grant number not applicable). The support described previously was not provided from a specific award or grant. The digital Experimental Cancer Medicine Team are supported by the AstraZeneca iDECIDE Programme (grant number: 119106), awarded to Manchester Cancer Research Centre and by Cancer Research UK via an Accelerator Award (award number: A29374) through the CRUK Manchester Institute (award number: C147/A25254).

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.