
Original research
Trends and symptoms of SARS-CoV-2 infection: a longitudinal study on an Alpine population representative sample
  1. Giulia Barbieri1,2,
  2. Massimo Pizzato3,
  3. Martin Gögele1,
  4. Daniele Giardiello1,
  5. Christian X Weichenberger1,
  6. Luisa Foco1,
  7. Daniele Bottigliengo1,
  8. Cinzia Bertelli3,
  9. Laura Barin1,
  10. Rebecca Lundin1,
  11. Peter P Pramstaller1,
  12. Cristian Pattaro1,
  13. Roberto Melotti1
  1. Institute for Biomedicine (affiliated to the University of Lübeck), Eurac Research, Bolzano, Italy
  2. Department of Neurosciences, Biomedicine and Movement Sciences, University of Verona, Verona, Italy
  3. Department of Cellular, Computational and Integrative Biology, University of Trento, Trento, Italy
  1. Correspondence to Ms Giulia Barbieri; giulia.barbieri{at}eurac.edu; Dr Cristian Pattaro; cristian.pattaro{at}eurac.edu

Abstract

Objectives The continuous monitoring of SARS-CoV-2 infection waves and the emergence of novel pathogens challenge public health surveillance strategies based on diagnostics. Longitudinal population-representative studies on incident events and symptoms of SARS-CoV-2 infection are scarce. We aimed to describe the evolution of the COVID-19 pandemic during 2020 and 2021 through regular monitoring of self-reported symptoms in an Alpine community sample.

Design To this end, we designed a longitudinal population-representative study, the Cooperative Health Research in South Tyrol COVID-19 study.

Participants and outcome measures By August 2020, a sample of 845 participants had been retrospectively investigated for active and past infection with swab and blood tests, allowing estimation of the adjusted cumulative incidence. Of them, 700 participants without previous infection or vaccination were followed up monthly until July 2021 for first-time infection and symptom self-reporting: COVID-19 anamnesis, social contacts, lifestyle and sociodemographic data were assessed remotely through digital questionnaires. Temporal symptom trajectories and infection rates were modelled through longitudinal clustering and dynamic correlation analysis. Zero-inflated negative binomial regression modelled the number of symptoms over time, and random forest analysis assessed the relative importance of symptoms.

Results At baseline, the cumulative incidence of SARS-CoV-2 infection was 1.10% (95% CI 0.51%, 2.10%). Symptom trajectories mimicked both self-reported and confirmed cases of incident infections. Cluster analysis identified two groups of high-frequency and low-frequency symptoms. Symptoms like fever and loss of smell fell in the low-frequency cluster. Symptoms most discriminative of test positivity (loss of smell, fatigue and joint-muscle aches) confirmed prior evidence.

Conclusions Regular symptom tracking in population-representative samples is an effective screening tool, complementary to laboratory diagnostics, for novel pathogens at critical times, as shown in this study of COVID-19 patterns. Integrated surveillance systems might benefit from more direct involvement of citizens through active symptom tracking.

  • COVID-19
  • epidemiology
  • health policy
  • SARS-CoV-2

Data availability statement

Data are available upon reasonable request. CHRIS study data can be requested for research purposes by submitting a dedicated request to the CHRIS Access Committee. Please contact access.request.biomedicine@eurac.edu for more information on the process.


This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.


STRENGTHS AND LIMITATIONS OF THIS STUDY

  • Longitudinal study of SARS-CoV-2 symptoms and incident cases in the absence of competing infections and vaccine prophylaxis.

  • Population-based representative sample susceptible to first-time SARS-CoV-2 infection.

  • Self-reported SARS-CoV-2 infections and symptoms may limit generalisability to the most severe cases.

Introduction

After an unprecedented pathogen-led death toll in modern times,1 the COVID-19 pandemic is shifting to an endemic phase in most territories.2 3 However, the resurgence of reinfections driven by newly emerging SARS-CoV-2 variants and high transmission rates remain a public health concern, particularly for the most vulnerable individuals. The social and economic difficulty of maintaining most non-pharmaceutical containment measures in the long term, together with limited awareness of effective self-prevention measures and vaccine hesitancy, warrants continued monitoring of the spread of SARS-CoV-2 and other infectious agents.4 5

Continuous monitoring and surveillance have been a challenge for health authorities throughout the COVID-19 pandemic. Routine indicators are based on the number of incident cases over the number of people tested, and on the rates of hospital admissions and deaths linked to COVID-19.6 These indicators rely mainly on the system's capacity to track infections and perform laboratory analyses, and on effective information systems. Indicators based on confirmed positive cases may also be inaccurate or delayed in emergent or fast-spreading situations. Lastly, diverse cultural backgrounds, adaptive healthcare policies, ever-increasing healthcare costs and the ongoing development of testing methods may influence case number estimates.7

Symptom screening and monitoring may provide complementary and timely surveillance data if conducted systematically and regularly in a reference population sample, under certain conditions.8–12 For syndromic infections, the symptoms surveyed must be comprehensible to respondents and suited to capturing the specific features of a novel pathogen. While presymptomatic and asymptomatic infected individuals contribute to the spread of infections like SARS-CoV-2,13 viral shedding peaks around the time of symptom onset.14 Early reporting of symptoms can therefore allow prompt identification of critical areas for further prevention or containment actions. Large-scale repeated cross-sectional studies based on anonymised data from digital social platforms were able to capture trends in symptoms and cases at both global and regional levels.15 16 However, smaller-scale ad hoc representative studies may allow participation tracking, relatively better population coverage, and control of information and selection biases through detailed or linked auxiliary data. Because it is independent of healthcare testing, diagnosis and referral capacity, and requires substantially fewer resources, early estimation of infection spread through symptom monitoring promises to be a better tool for managing infectious disease emergencies.

Population-based longitudinal studies are particularly suited to investigating the temporal patterns of incident events and symptoms in a pandemic like COVID-19.9 11 17–19 While planning, recruitment and accrual of evidence are time consuming, these studies have a unique and active role in shaping surveillance strategies.4 By the end of August 2020, we had recruited a representative random sample of adult residents in the Alpine rural district of Vinschgau/Val Venosta (South Tyrol, Italy).20 21 At that time, the healthcare system had identified as few as 4.4 cases per 10 000 district inhabitants (16 confirmed cases overall) and no active cases, in contrast to nearby and other European regions, where infection prevalence estimates ranged from 23.1% to 42.4% in the same period.22–24 At the national level, individual containment actions (eg, face masks, hand hygiene, physical distancing) were the standard recommendation until 24 October 2020, when additional public restrictions, especially school and business closures, were introduced.25 A governmental decree26 introduced three risk-level zones for each regional and autonomous provincial authority (including South Tyrol) in Italy, starting 6 November 2020 (online supplemental figure S1).27 The zones were colour-coded yellow, orange and red, corresponding to increasing levels of non-pharmaceutical interventions, and the zone assigned to each authority could change from one period to the next. From 26 April 2020, containment actions were progressively, albeit cautiously, relaxed, although still within an emergency governance framework.28

We characterised participants through ad hoc questionnaires, in-person molecular testing and blood sample collection at baseline, and through monthly digital follow-up questionnaires from September 2020 to July 2021 (figure 1A). In this report, we describe (1) the distribution of symptomatic episodes and symptom patterns over the whole follow-up period, and (2) the dynamic relationship of specific symptoms and their aggregate patterns with incident SARS-CoV-2 infections, as captured by self-reports of a first-ever positive swab test (figure 1B). We discuss the potential utility of these data to complement diagnostic-based surveillance in similar emergent infectious events.

Figure 1

Graphical representation of the study design and methods. (A) Study design. (B) Methods: each aim (either pale blue or pale pink foreground) was addressed with both aggregate (pale yellow background) and individual-level (light grey background) outcome and methods. ¹Performed at the study centre. ²Self-reported contacts with positive or symptomatic individuals OR any positive swab test. SAb, serum antibody.

Methods

Study design

We invited an age- and sex-stratified random sample, targeting 1450 participants, of the Cooperative Health Research in South Tyrol (CHRIS) study,29 representative of all adult residents in the district, to take part in the CHRIS COVID-19 prospective investigation between 13 July and 28 August 2020.20 21 An online screening questionnaire covered SARS-CoV-2-related anamnesis from 1 February 2020 until the baseline clinical visit for blood drawing and testing. Afterwards, follow-up online questionnaires were sent to each baseline participant every 4 weeks for 1 year to monitor SARS-CoV-2-related events.

Laboratory assessments

Laboratory assessments included a Roche Elecsys Anti-SARS-CoV-2 serum antibody (SAb) test, measuring SARS-CoV-2-specific antibodies as a marker of past infection, and a PCR swab test to identify active SARS-CoV-2 infections.20 Serum samples were also tested for their ability to inhibit transduction by a lentiviral vector pseudotyped with the SARS-CoV-2 spike protein (neutralisation test). Pseudotyped vectors carrying a gene encoding a fluorescent protein were incubated with serial dilutions of serum, and the mixture was then inoculated onto Huh-7 cells. The percentage of transduced (fluorescent) cells was quantified with a Molecular Devices ImageXpress Micro Confocal high-content imaging system, using Hoechst 33342 nuclear counterstaining, and the serum dilution associated with 50% inhibition of transduction (ID50 value) was estimated from each fitted sigmoidal curve.
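The ID50 estimation step can be illustrated with a minimal sketch. The text does not state which sigmoid or fitting software was used; this example assumes a two-parameter log-logistic curve with plateaus fixed at 0% and 100% inhibition, fitted by linear least squares on the logit scale rather than by non-linear regression. The function name `estimate_id50` and the dilution series are hypothetical.

```python
import math
from typing import List, Tuple


def estimate_id50(dilutions: List[float], inhibition_pct: List[float]) -> Tuple[float, float]:
    """Estimate the serum dilution giving 50% inhibition (ID50) and the Hill slope.

    Assumed model (plateaus fixed at 0% and 100%):
        I(D) = 100 / (1 + (D / ID50) ** h)
    which is linear on the logit scale:
        log(I / (100 - I)) = h * ln(ID50) - h * ln(D)
    """
    xs, ys = [], []
    for d, i in zip(dilutions, inhibition_pct):
        if 0 < i < 100:  # the logit is undefined at the plateaus
            xs.append(math.log(d))
            ys.append(math.log(i / (100 - i)))
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct dilutions strictly between 0% and 100%")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx            # estimates -h
    intercept = my - slope * mx  # estimates h * ln(ID50)
    hill = -slope
    if hill <= 0:
        raise ValueError("inhibition does not decrease with dilution")
    return math.exp(intercept / hill), hill


# Hypothetical 3-fold serial dilutions starting at 1:20
dilutions = [20 * 3 ** k for k in range(8)]
inhibition = [100 / (1 + (d / 320) ** 1.5) for d in dilutions]
id50, hill = estimate_id50(dilutions, inhibition)
```

With noisy neutralisation data, a four-parameter curve fitted by non-linear least squares is a common alternative when the observed plateaus deviate from 0% and 100%.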

Baseline and follow-up questionnaires

The baseline questionnaire asked about participants’ sociodemographic and household information, lifestyle determinants, regular therapies and chronic conditions. A detailed section was dedicated to SARS-CoV-2 anamnesis comprising previous diagnosis, occurrence of symptoms and close contacts with infected or symptomatic individuals.20

Follow-up questionnaires focused on SARS-CoV-2 anamnesis and additionally included ‘vaccination status’ from the beginning of the vaccination campaign (27 December 2020). At each follow-up, participants replied to a main filter question on SARS-CoV-2-related events since previous participation: on a positive reply, the participant would fill in the rest of the questionnaire, otherwise all subsequent responses were considered negative, or else missing in case of no response.20 21
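The filter-question rule above can be expressed as a small coding function. This is a hypothetical sketch: `code_item` and the use of `None` for a missing response are illustrative assumptions, not the study's data-management code.

```python
from typing import Optional


def code_item(filter_answer: Optional[bool], item_answer: Optional[int]) -> Optional[int]:
    """Code one follow-up item (1 = event reported, 0 = not reported, None = missing).

    filter_answer: reply to the main filter question on SARS-CoV-2-related
    events since the previous questionnaire (None if the questionnaire was
    not returned).
    """
    if filter_answer is None:
        return None          # no response: the item is missing
    if not filter_answer:
        return 0             # negative filter reply: the item is considered negative
    return item_answer       # positive filter reply: use the detailed answer
```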

Longitudinal analysis framework

All individuals susceptible to a first SARS-CoV-2 infection before entering follow-up were included in the longitudinal analysis. Participants' records were prospectively included in the analysis until the end of the study or until first infection or first vaccine dose, both to avoid confusing vaccine-induced symptoms with those of a first SARS-CoV-2 infection30–32 and to account for changes in individual susceptibility (figure 1A).33 34
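One way to apply this censoring rule to each participant's monthly records is sketched below. The record layout (`MonthlyRecord`) and the choice to keep the month of first positivity but drop the month of first vaccination are assumptions made for illustration; the text does not specify how same-month events were handled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MonthlyRecord:
    month: int        # index month, 0 = August 2020, ..., 11 = July 2021
    positive: bool    # self-reported positive swab test in that month
    vaccinated: bool  # self-reported vaccine dose in that month


def analysable_records(records: List[MonthlyRecord]) -> List[MonthlyRecord]:
    """Return the records kept until first infection or first vaccine dose."""
    kept = []
    for rec in sorted(records, key=lambda r: r.month):
        if rec.vaccinated:
            break            # assumption: the vaccination month itself is not analysed
        kept.append(rec)
        if rec.positive:
            break            # first infection is the last analysable month
    return kept
```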

Variable definitions

At baseline, a positive result to either the PCR or the SAb test defined a SARS-CoV-2-positive case. Follow-up questionnaire completion dates were specific to each participant at every 4-week wave. For any self-reported symptomatic episode, the date of symptom onset was asked; for any self-reported swab test, the date of testing was not. To match symptom-onset dates with symptom-free dates from other respondents, we defined the 'index month of reporting' as either the month of the first symptomatic episode or, if no symptoms had been reported, the month obtained by shifting the actual questionnaire completion date 15 days backwards (ie, to the midpoint of each questionnaire reference period). When the same symptom-onset date was reported in successive questionnaires, the report with the most symptoms was retained in the analysis. For the longitudinal analyses, we defined 12 time points (t) corresponding to the months from August 2020 to July 2021. At each time point, a participant could report any of 26 symptoms. We defined a dichotomous variable 'occurrence of symptom j', $s_{tij}$, equal to 1 if the ith individual reported symptom j at time t, and 0 otherwise. We calculated the total number of symptoms reported by each participant at each time point, $S_{ti} = \sum_j s_{tij}$, referred to as the 'count of symptoms'. SARS-CoV-2 infection status $T_{ti}$ was defined at each time point as either 'positive' (positivity to any swab test, henceforth $T^{+}$) or 'negative' ($T^{-}$), based on self-reported questionnaire information. At each index month, the incidence of each symptom j was defined as the proportion of respondents reporting that symptom, and the incidence of $T^{+}$ as the proportion of positive cases, both estimated among all retained questionnaire respondents for that month.
$T_{ti}$, $s_{tij}$ and $S_{ti}$ were used as outcome measures in the individual-level analyses, whereas monthly incidences of symptoms and positive cases were used as outcome measures in the aggregate analyses.
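The index-month rule, the handling of duplicate onset dates and the monthly incidence can be sketched as follows. The function names and data layout are hypothetical; only the 15-day backward shift, the 'most symptoms wins' rule for duplicated onset dates and the definition of incidence as a proportion of that month's retained respondents come from the text.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, List, Set, Tuple

Month = Tuple[int, int]  # (year, month)


def index_month(completion_date: date, onset_dates: List[date]) -> Month:
    """Place one questionnaire on the monthly time axis."""
    if onset_dates:
        first = min(onset_dates)  # month of the first symptomatic episode
        return (first.year, first.month)
    midpoint = completion_date - timedelta(days=15)  # midpoint of the 4-week period
    return (midpoint.year, midpoint.month)


def deduplicate_episodes(episodes: List[Tuple[date, Set[str]]]) -> Dict[date, Set[str]]:
    """For onset dates repeated across questionnaires, keep the report listing the most symptoms."""
    best: Dict[date, Set[str]] = {}
    for onset, symptoms in episodes:
        if onset not in best or len(symptoms) > len(best[onset]):
            best[onset] = symptoms
    return best


def monthly_incidence(reports: List[Tuple[Month, Set[str]]], symptom: str) -> Dict[Month, float]:
    """Proportion of retained respondents in each index month reporting `symptom`.

    For a single report, len(symptoms) is that participant's count of symptoms S_ti.
    """
    respondents: Dict[Month, int] = defaultdict(int)
    cases: Dict[Month, int] = defaultdict(int)
    for month, symptoms in reports:
        respondents[month] += 1
        cases[month] += symptom in symptoms  # True counts as 1
    return {m: cases[m] / respondents[m] for m in respondents}
```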

Statistical analyses

Population-calibrated cumulative incidence of SARS-CoV-2 at baseline was estimated using the Clopper-Pearson method for extreme proportions, including sampling weights to account for non-response.20 Pairwise association testing between infection status at baseline (only nine positive participants were detected) and each of 64 personal characteristics (described elsewhere20) or ID50 values used non-parametric statistics, as described in the online supplemental material.
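The Clopper-Pearson ('exact') interval inverts two one-sided binomial tests: the lower bound is the proportion p at which P(X ≥ x) = α/2, and the upper bound is the proportion at which P(X ≤ x) = α/2. The sketch below solves both equations by bisection using only the standard library. It is unweighted, so applied to the nine positives among 845 participants it will not reproduce the weighted, population-calibrated estimate reported in the Results; incorporating sampling weights requires a further design-based adjustment that is not shown here.

```python
from math import comb
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f: Callable[[float], float], lo: float = 0.0, hi: float = 1.0, tol: float = 1e-12) -> float:
    """Root of a monotone function f on [lo, hi] that changes sign on the interval."""
    f_lo = f(lo)
    for _ in range(200):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
        if hi - lo < tol:
            break
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, conf: float = 0.95) -> Tuple[float, float]:
    """Exact two-sided confidence interval for a binomial proportion x/n."""
    alpha = 1 - conf
    # Lower bound: P(X >= x | p) = alpha/2, ie, 1 - P(X <= x - 1 | p) = alpha/2
    lower = 0.0 if x == 0 else _bisect(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound: P(X <= x | p) = alpha/2
    upper = 1.0 if x == n else _bisect(lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper
```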

The number of symptoms reported by respondents at each time point ($S_{ti}$) showed an excess of zeros and overdispersion. To model the temporal evolution of this variable, we fitted a weighted zero-inflated negative binomial mixed model with a participant-level random intercept, using the index month as a categorical fixed-effect predictor and adjusting for sex and decade of age. For these analyses we used the R package 'glmmTMB' v1.1.4. To assess which individual symptoms were most strongly associated with self-reported T+, we conducted a random forest analysis, generating 200 classification trees with Gini impurity node splitting. SEs were estimated from 100 bootstrap samples, each with a forest of 20 trees. Each tree randomly included 8 of the 25 available predictors (about one-third).35 We set the minimum node size at five observations, which determined the tree depth. The discriminatory ability of each symptom was defined as the average decline in prediction accuracy on the out-of-bag samples when that symptom was excluded from the model.35 The resulting relative increase in the misclassification rate (MR) was reported as the relative change (RC).36 For these analyses we used the hrf function from the R package 'htree' v2.0.0.37
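For reference, a standard zero-inflated negative binomial (NB2) specification consistent with this description is written out below. The exact linear predictors are an assumption: the index month m(t) enters both the count part and the zero-inflation part (as suggested by the two panels of figure 2), sex and decade of age a(i) enter the count part only, $u_i$ is the participant random intercept, and the sampling weights are omitted. Here $\pi_{ti}$ is the probability of a structural zero and the count part has mean $\mu_{ti}$ and variance $\mu_{ti} + \mu_{ti}^2/\theta$.

$$
\begin{aligned}
\Pr(S_{ti} = 0) &= \pi_{ti} + (1 - \pi_{ti})\left(\frac{\theta}{\theta + \mu_{ti}}\right)^{\theta},\\
\Pr(S_{ti} = k) &= (1 - \pi_{ti})\,\frac{\Gamma(k + \theta)}{\Gamma(\theta)\,k!}\left(\frac{\theta}{\theta + \mu_{ti}}\right)^{\theta}\left(\frac{\mu_{ti}}{\theta + \mu_{ti}}\right)^{k}, \qquad k = 1, 2, \ldots\\
\log \mu_{ti} &= \beta_0 + \beta_{m(t)} + \beta_{\mathrm{sex}}\,\mathrm{sex}_i + \beta_{a(i)} + u_i, \qquad u_i \sim \mathcal{N}(0, \sigma_u^2),\\
\operatorname{logit} \pi_{ti} &= \gamma_0 + \gamma_{m(t)}.
\end{aligned}
$$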

The individual symptom time trajectories were modelled on incidence data via longitudinal cluster analysis based on the k-means method.38 The optimal partition was obtained through an iterative process, according to the Calinski-Harabasz criterion,39 using the R package ‘kml’ v2.4.1.40
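A minimal sketch of the clustering step: Lloyd's k-means on the monthly incidence trajectories (one vector of monthly values per symptom) with Euclidean distance, and the Calinski-Harabasz index, CH = [B/(k − 1)] / [W/(n − k)], where B and W are the between- and within-cluster sums of squares and n is the number of trajectories, used to choose k. This is not the kml implementation, which, among other things, uses several random starts and handles missing values; the deterministic farthest-point initialisation is a simplification for reproducibility, and the example trajectories are invented.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _sqdist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(vectors: List[Vector]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(trajs: List[Vector], k: int, n_iter: int = 100) -> List[int]:
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centres = [list(trajs[0])]
    while len(centres) < k:
        centres.append(list(max(trajs, key=lambda t: min(_sqdist(t, c) for c in centres))))
    labels: List[int] = []
    for _ in range(n_iter):
        new = [min(range(k), key=lambda c: _sqdist(t, centres[c])) for t in trajs]
        if new == labels:
            break  # assignments are stable: converged
        labels = new
        for c in range(k):
            members = [t for t, lab in zip(trajs, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = _mean(members)
    return labels


def calinski_harabasz(trajs: List[Vector], labels: List[int], k: int) -> float:
    """(B / (k - 1)) / (W / (n - k)); undefined if all clusters have zero spread."""
    n = len(trajs)
    overall = _mean(trajs)
    between = within = 0.0
    for c in range(k):
        members = [t for t, lab in zip(trajs, labels) if lab == c]
        if not members:
            continue
        centre = _mean(members)
        between += len(members) * _sqdist(centre, overall)
        within += sum(_sqdist(t, centre) for t in members)
    return (between / (k - 1)) / (within / (n - k))


def best_partition(trajs: List[Vector], k_max: int = 5) -> Tuple[int, List[int]]:
    """Choose the number of clusters maximising the Calinski-Harabasz index."""
    best = None
    for k in range(2, min(k_max, len(trajs) - 1) + 1):
        labels = kmeans(trajs, k)
        score = calinski_harabasz(trajs, labels, k)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best[1], best[2]


# Invented incidence trajectories: three high-frequency and three low-frequency symptoms
high = [[0.10, 0.12, 0.11, 0.13], [0.11, 0.13, 0.10, 0.12], [0.12, 0.11, 0.12, 0.11]]
low = [[0.01, 0.02, 0.01, 0.02], [0.02, 0.01, 0.02, 0.01], [0.015, 0.015, 0.01, 0.02]]
trajs = high + low
```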

Finally, we conducted a dynamic correlation analysis41 42 to assess the co-variation among symptom trajectories and the association between symptom occurrence and the incidence of self-reported positive swab tests (T+). Since the number of symptoms was large relative to the number of monthly time points, regularisation was applied to obtain a shrinkage estimate of the correlation matrix. This analysis was conducted with the dyn.cor function of the R package 'longitudinal' v1.1.13.43
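The regularisation step can be sketched with the Schäfer-Strimmer shrinkage estimator, which shrinks the empirical correlations towards zero by an intensity λ estimated analytically from the data, λ = Σ Var(r) / Σ r² over all pairs, clipped to [0, 1]. Our understanding is that dyn.cor applies this kind of shrinkage with time-based weights; with one aggregate series per variable observed at equally spaced monthly points, we assume those weights are equal, so the unweighted version is shown. The series names and values are illustrative only.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def _standardise(values: List[float]) -> List[float]:
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]  # constant series are not supported


def shrinkage_correlation(series: Dict[str, List[float]]) -> Tuple[float, Dict[Pair, float], Dict[Pair, float]]:
    """Return (lambda, raw, shrunk); raw and shrunk map name pairs to correlations."""
    z = {name: _standardise(vals) for name, vals in series.items()}
    n = len(next(iter(z.values())))
    raw: Dict[Pair, float] = {}
    var_r: Dict[Pair, float] = {}
    for a, b in combinations(series, 2):
        w = [x * y for x, y in zip(z[a], z[b])]
        w_bar = sum(w) / n
        raw[a, b] = n / (n - 1) * w_bar  # Pearson correlation
        var_r[a, b] = n / (n - 1) ** 3 * sum((wk - w_bar) ** 2 for wk in w)
    denom = sum(r ** 2 for r in raw.values())
    lam = 1.0 if denom == 0 else min(1.0, max(0.0, sum(var_r.values()) / denom))
    shrunk = {pair: (1 - lam) * r for pair, r in raw.items()}
    return lam, raw, shrunk


# Illustrative monthly incidences (12 time points)
t_pos = [0.0, 0.5, 1.0, 3.0, 2.0, 1.0, 2.5, 1.5, 0.5, 0.3, 0.2, 0.1]
smell = [2 * v + 1 for v in t_pos]
otitis = [0.3, 0.1, 0.4, 0.2, 0.5, 0.1, 0.3, 0.2, 0.4, 0.1, 0.2, 0.3]
lam, raw, shrunk = shrinkage_correlation({"T+": t_pos, "loss_of_smell": smell, "otitis": otitis})
```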

All statistical analyses were run in the R environment v4.1.1.

Patient and public involvement

Participants were not involved in the design, conduct, reporting or dissemination plans of our research.

Results

Characteristics of the study sample

We enrolled 845 participants at baseline (participation: 58%; females: 51.6%; age, years: median 50, range 20–94). By 28 August 2020, nine participants had tested positive for SARS-CoV-2 infection (eight by SAb test and one by PCR test), corresponding to an adjusted cumulative incidence of 1.10% (95% CI 0.51%, 2.10%). Baseline positivity was associated with self-report of a previous positive SAb test, with having been isolated because of suspected or confirmed SARS-CoV-2 infection, and with specific symptoms such as loss of taste and loss of smell (online supplemental table S1). Participants with any positive test at baseline had higher antibody neutralisation levels (online supplemental table S1). In those with both tests negative, neutralisation levels were not associated with any characteristic except municipality of residence (online supplemental table S2).

After excluding the nine participants positive at baseline, 134 participants who did not complete any follow-up questionnaire and two who had been vaccinated before entering follow-up, 700 participants were available for longitudinal analyses. Their characteristics were similar to those of the baseline sample (online supplemental table S3). Participants completed a median of nine follow-up questionnaires each (range: 1–13; IQR: 5–11). Based on self-report, throughout the study period, 200 (28.6%) individuals never underwent a swab test, 194 (27.7%) underwent one test only and 306 (43.7%) had ≥2 tests. The maximum observed number of tests per person was 8. Overall, 472 participants (67.4%) never reported any symptoms, 158 (22.6%) reported one symptomatic episode and 70 (10.0%) reported more than one (table 1).

Table 1

Characteristics of the 700 study participants during the follow-up period by index month and according to symptoms and swab test reporting.

Symptom patterns and test positivity

The months from October 2020 to February 2021 were characterised by a higher probability of reporting symptoms than August. October, November and January were the months with the highest number of symptoms per participant, among those reporting symptoms, compared with August (figure 2, online supplemental table S4). Random forest analysis (MR=0.013) identified loss of smell (RC=15.3%) as the most predictive symptom of a positive swab test, followed by fatigue or tiredness (RC=14.5%), joint or muscle pain (RC=12.8%), headache (RC=8.4%), fever (RC=5.2%) and loss of taste (RC=4.8%; figure 3).

Figure 2

Results of the zero-inflated negative binomial model. (A) Modelling the probability of a symptomatic episode with any number of symptoms (reversed log odds from the original model). (B) Modelling the expected number of symptoms conditional on a symptomatic episode.

Figure 3

Discriminatory capacity of symptoms derived from random forest analysis. The relative change represents the loss in the model's discriminatory ability, measured as the relative increase in misclassification rate, when the symptom is not included.
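The relative change can be illustrated with a permutation-based sketch: the misclassification rate of a fitted classifier is recomputed after randomly permuting one predictor, and RC is the relative increase over the baseline rate, (MR_permuted − MR) / MR. This is a simplification under stated assumptions: in a random forest the calculation is made on each tree's out-of-bag samples, and we assume that 'excluding' a symptom corresponds to permuting it, which may differ from htree's exact procedure. The toy predictor and data are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = List[int]


def misclassification_rate(predict: Callable[[Row], int], X: Sequence[Row], y: Sequence[int]) -> float:
    return sum(predict(row) != label for row, label in zip(X, y)) / len(y)


def permutation_relative_change(
    predict: Callable[[Row], int],
    X: List[Row],
    y: List[int],
    n_repeats: int = 20,
    seed: int = 1,
) -> Tuple[float, List[float]]:
    """Relative increase in misclassification rate when each predictor is permuted.

    X: rows of 0/1 symptom indicators; y: 0/1 test results.
    Returns (baseline MR, [RC for each predictor]).
    """
    rng = random.Random(seed)
    base = misclassification_rate(predict, X, y)
    if base == 0:
        raise ValueError("baseline misclassification rate is zero; RC is undefined")
    rcs = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between predictor j and the outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            total += misclassification_rate(predict, X_perm, y)
        rcs.append((total / n_repeats - base) / base)
    return base, rcs


# Toy data: predictor 0 is informative, predictor 1 is ignored by the classifier
X = [[i % 2, (i // 2) % 2] for i in range(200)]
y = [row[0] if i % 20 else 1 - row[0] for i, row in enumerate(X)]  # 10 label flips
base, rc = permutation_relative_change(lambda row: row[0], X, y)
```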

Longitudinal cluster analysis of symptom patterns

Figure 4A shows the distribution of each symptom incidence over time and the overlapping pattern of self-reported positive swab tests. Two clusters best explained the aggregate trajectories of symptoms, which were characterised by high-frequency and low-frequency symptoms, respectively (figure 4B). The time series of the positive swab tests was best reflected by symptoms included in the high-frequency cluster: cold, joint or muscle pain, fatigue or tiredness, sore throat or hoarseness and headache. Alternative analytical solutions that forced symptom aggregations in more than two clusters produced similar mean trajectories, with no major changes in the symptom aggregation (online supplemental figure S2).

Figure 4

Results of the cluster analysis on the incidence of symptoms over time. (A) Trajectories of symptoms (coloured plain lines) and self-reported T+ (black dotted line). The dynamic correlation between the trajectory of each symptom and self-reported T+ is included within brackets in the legend keys. (A, B) Two clusters of symptoms are distinguished by warm (high frequency) and cold (low frequency) colours, respectively. (B) The heatmap in grey scale represents incidence of each symptom across time and reflects y-axis incidence in panel A.

Dynamic correlation analysis

We observed generally large, positive dynamic correlations among the symptom trajectories (figure 4; online supplemental figure S3), especially among symptoms in the high-frequency cluster (dynamic correlation coefficient, r, between 0.72 and 0.85). Joint or muscle pain was most strongly correlated with headache (r=0.85) and fatigue or tiredness (r=0.83). Almost all observed symptoms were highly correlated with self-reported T+ (r≥0.60), except for abdominal pain (r=0.40) and otitis (r=0.45). The symptoms most correlated with T+ were loss of smell, joint or muscle pain and headache (r=0.85 each).

Discussion

This study provides a moving picture of COVID-19 pandemic dynamics in the Val Venosta/Vinschgau district (South Tyrol, Italy), from the inception of the pandemic through its hardest-hitting phases to date. By summer 2020, the resident population was nearly naïve to SARS-CoV-2 infection according to official figures, likely thanks to the absence of super-spreader events and the rapid application of strict nationwide containment measures.27 44 While still modest in absolute terms, our data indicate a cumulative infection rate in the Val Venosta/Vinschgau district of 11 per 1000 inhabitants, roughly 25 times the figure derived from officially confirmed cases. This confirms the large number of unidentified positive individuals that characterised the beginning of the pandemic and the hidden, ongoing viral shedding.45 Nevertheless, the observed incidence was very low compared with other areas of central Europe and Northern Italy that, at the same time, had already been more severely affected by the novel coronavirus.22–24 45 This situation provided the opportunity to observe incident cases of infection prospectively from autumn 2020 in a nearly naïve susceptible population.
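The 25-fold figure follows directly from comparing the estimated cumulative incidence (11 per 1000, ie, 1.10%) with the 4.4 officially confirmed cases per 10 000 inhabitants reported in the Introduction:

$$
\frac{11/1\,000}{4.4/10\,000} = \frac{110/10\,000}{4.4/10\,000} = \frac{110}{4.4} = 25
$$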

Among participants susceptible to first infection, we observed two peaks of self-reported incident positive cases, in November 2020 and February 2021, that is, before widespread vaccine availability. These peaks coincided with the peaks of officially recorded confirmed cases, intensive care unit admissions and deaths for the whole South Tyrol region (online supplemental figure S1).46 This suggests that, during the study period, the contagion pattern in the Val Venosta/Vinschgau district was similar to that of other areas under the same risk-zone mitigation strategies.27 47 Also, the pattern of symptomatic episodes and the number of reported symptoms closely resembled the pandemic pattern. While symptomatic episodes peaked in October 2020 and February 2021, the load of symptoms per episode peaked in October/November 2020 and January 2021. All symptoms followed similar time trajectories, mimicking self-reported positive swab tests as well as the official figures of the pandemic dynamic (compare figure 4A and online supplemental figure S1). This pattern was likely the result of a lack of competing infectious diseases in the same time frame,48 favoured by the adoption of strict isolation rules and containment actions, which varied modestly in response to the pandemic spread over the course of the study period (online supplemental figure S1).27

The five most discriminatory symptoms of infection were, in order of relevance, loss of smell, fatigue or tiredness, joint or muscle pain, headache and fever. This confirms previous reports that identified a similar set of symptoms as predictors of extant infection.10 12 49 In our analysis, loss of taste had little discriminatory capacity once loss of smell was accounted for, consistent with evidence of cellular mechanisms of infection acting at the level of smell receptors.50

At the aggregate level, our best-fitting longitudinal cluster analysis split the symptoms into two separate patterns of high-frequency versus low-frequency symptoms. The two clusters followed similar trajectories with no substantial crossover, suggesting a limited role of emerging variants in the frequency of specific symptoms. This interpretation is also supported by estimates of the dominant variants circulating during the study period.51 Until the end of 2020, the most prevalent were mixed, emerging SARS-CoV-2 strains. Subsequently, the Alpha variant rose steeply and was in turn superseded by the Delta variant by July 2021. To our knowledge, there is no strong scientific evidence that any of these variants produced distinct symptom patterns among SARS-CoV-2-infected individuals.52

Fever and loss of smell fell in the low-frequency cluster. In the high-frequency cluster, the five most frequent symptoms were fatigue or tiredness, joint or muscle pain, headache, cold and sore throat or hoarseness. Fatigue or tiredness, joint or muscle pain, headache and loss of smell showed very high dynamic correlation with swab test positivity. These results need careful interpretation because of possible effects of seasonality, reporting bias and mediating pathways. For example, the link between headache, the most reported symptom, and the peaks of infection could be partially mediated by psychosocial distress (eg, confusion, worry) or by isolation and indoor confinement,53–55 as suggested by the limited discriminatory capacity of headache in our results. While fever and loss of smell are familiar and recognisable symptoms, their lower frequency and good discriminatory capacity support their high specificity for COVID-19 in a relatively low-prevalence, protected context. Nevertheless, infection does not necessarily produce these symptoms.22 56 Finally, joint or muscle pain and fatigue or tiredness are generic symptoms that often occur with ordinary influenza-like illness and in combination with other symptoms. In our analyses, these symptoms showed both high frequency and high discriminatory capacity, as observed elsewhere.9 12

Our study is one of few population-based prospective observational studies attempting to map incident diagnoses and to use symptom manifestation for COVID-19 screening over a long period from the start of the pandemic. For example, a similar cohort study conducted in Lübeck (Germany) investigated patterns of infection through PCR testing and antibody response from March 2020 until February 2021, mirroring the official figures of infection.17 In that study, numerous questionnaires screened for symptoms and self-reported positive tests over the low-incidence period of late spring and summer 2020, and two additional examinations covered the pandemic expansion phase in November 2020 and February 2021. Our manageable sample size and confined catchment area allowed regular, frequent and comprehensive assessments of symptoms and incident cases, extending through the whole evolution of the pandemic in Italy until its next temporal dampening by July 2021. Over this whole period, a nuanced description and grouping of symptom trajectories reflected the trend of incident cases.

The patterns of symptoms and cases of infection described in our study may complement those of ad hoc monitoring systems based on confirmed positive cases to inform effective surveillance strategies. Routine surveillance systems for acute respiratory infections rely on a network of sentinel general practices and testing laboratories subject to voluntary participation, which is often adequate to monitor nationwide seasonal patterns of ordinary respiratory infections. However, the uncontrollable spread of SARS-CoV-2 revealed that surveillance frameworks were suboptimal for detecting novel pathogens of concern and for the timely identification of outbreaks in restricted areas.57 Population-based reports like ours show that remote digital technology applications may be used to screen and trace symptomatic human infections in a reference or largely compliant population sample during a health emergency.10 17 49 Real-time electronic data sharing between citizens seeking assistance and general practices or emergency departments, within an overarching global surveillance network, could be an efficient and effective model for epidemiological surveillance. Existing, adaptable privacy-preserving regulations and secure technologies could now make this data flow a critical and routine source for both individual care and preventive public health actions.

To our knowledge, this is the first population-based longitudinal study conducted in Europe able to trace the pattern of incident cases both retrospectively and prospectively over an 18-month course of the COVID-19 pandemic from its inception, and to match it with data-driven trajectories of symptom clusters. An additional strength of this study was the calibration of the study sample to be representative of the adult population of a wide rural area, which was free of confirmed active cases at the time of initial recruitment and susceptible to primary SARS-CoV-2 infection. Our carefully designed 4-week follow-up achieved high compliance among study participants, as shown by the large number of follow-up questionnaires completed, which may have both limited recall bias and increased the precision of the temporal allocation of symptoms and other events.

Several limitations should also be considered. The sample size was limited. However, this was calibrated to estimate a cumulative incidence between 0.01% and 1.1% with a confidence level of 99.0%.20 Given our final estimate corresponded to the upper bound of such an interval, the collected sample size provided sufficient power for reliable cumulative incidence estimates. Symptoms like fever, cough and shortness of breath may have been partly obscured as possible indicators of more severe infections, as suggested by the low frequency reported.9 11 Accordingly, we cannot exclude selection bias concerning the most severely affected and distressed participants. Next, we acknowledge the possibility of missing marginalised or less digitally literate individuals, especially in relation to using an online screening questionnaire. However, participants who filled at least one questionnaire beyond baseline were approximately 83%.20 Moreover, according to a recent local survey, 81% of South Tyrolean families comprising members in the age range of 16–74 reported having access to the internet from home.20 58 59 A peculiarity of our study was that we followed up participants who were susceptible of first infection. When participants reported a positive test or having received a dose of vaccine, they were systematically excluded from further online screening. This might have prevented the observation of the evolution of symptoms over the course of infection on some respondents who might happen to be positive close to the time of infection and questionnaire response. 

Furthermore, the observation period was unique in several respects: no major competing illness events, evolving containment measures, developing testing capacities and techniques, no prior widespread vaccination and, ultimately, a mass testing campaign conducted in late November 2020 across the whole of South Tyrol, which allowed many hidden positive cases to be identified in a short time window.60 While these aspects may limit the generalisability of our findings, the symptom patterns and reported cases matched the pandemic dynamics observed in official data from a larger area, suggesting that widespread control measures could balance out areas at different levels of risk. Lastly, symptoms and incident infections were self-reported. However, self-reported symptoms have shown greater breadth than symptoms extracted from electronic medical records.49 Moreover, another report on the same cohort21 showed that participants who experienced symptoms would generally be tested for COVID-19, limiting the possibility of testing bias in our results.

In conclusion, regular remote symptom tracking is feasible in the context of an emergent pandemic situation, such as that of COVID-19, and can closely characterise the temporal pattern of infection. Such an approach would be logistically and economically advantageous, and more sustainable than long-term, large-scale testing and tracing alternatives. Surveillance systems should broaden and integrate their infrastructure to involve multiple participatory units and include symptom reporting in their templates through citizens’ direct involvement via digital technology or other means.

Data availability statement

Data are available upon reasonable request. CHRIS study data can be requested for research purposes by submitting a dedicated request to the CHRIS Access Committee. Please contact access.request.biomedicine@eurac.edu for more information on the process.

Ethics statements

Patient consent for publication

Ethics approval

The CHRIS COVID-19 study was authorised by the Ethics Committee of the Healthcare System of the Autonomous Province of Bolzano/Bozen with deliberation number 53-2020 on 27 May and 22 July 2020.

Acknowledgments

We are grateful to all CHRIS COVID-19 study participants. We thank the collaborators of both Eurac Research and the Healthcare System of the Autonomous Province of Bolzano for field and laboratory operations, the representatives and staff of the local administrative authorities for their support, and all ancillary volunteers who made the study possible.

The authors thank the Department of Innovation, Research and University of the Autonomous Province of Bozen/Bolzano for covering the Open Access publication costs.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • PPP, CP and RM contributed equally.

  • Contributors CP, PPP, RM and MG conceived the CHRIS COVID-19 study. CP, RM and GB conceived the research question addressed in the present work. GB, MG, RM, DG and CXW performed all statistical analyses. MP and CB performed the neutralising antibody analyses. GB, MP, MG, DG, CXW, DB, CP and RM interpreted the results. GB, RM and CP drafted the manuscript. MP, MG, DG, CXW, LF, DB, CB, LB, RL and PPP contributed to the critical revision and editing of the manuscript. GB acts as guarantor for this study. All authors approved the manuscript.

  • Funding The CHRIS COVID-19 study was supported by Eurac Research, the South Tyrolean Health Authority, and the Department of Innovation, Research and University of the Autonomous Province of Bolzano. The present research was conducted within the project ‘PACE: Partnership to Accelerate COVID-19 rEsearch in South Tyrol’, funded by the Department of Innovation, Research and University of the Autonomous Province of Bolzano within the 2019–2021 Research Program (unique project code: D52F20000770003). This work was supported by a grant from Fondazione VRT/CARITRO to MP.

  • Disclaimer Funding sources had no role in the conduct of the research, the writing of the manuscript or the decision to submit it for publication.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.