
Protocol for a prospective, longitudinal cohort of people with COVID-19 and their household members to study factors associated with disease severity: the Predi-COVID study
  1. Guy Fagherazzi1,
  2. Aurélie Fischer1,
  3. Fay Betsou2,
  4. Michel Vaillant1,
  5. Isabelle Ernens1,
  6. Silvana Masi3,
  7. Joel Mossong4,
  8. Therese Staub5,
  9. Dominique Brault1,
  10. Christelle Bahlawane1,
  11. Mohammed Ally Rashid1,6,
  12. Markus Ollert7,8,
  13. Manon Gantenbein1,
  14. Laetitia Huiart1
  1. 1 Department of Population Health, Luxembourg Institute of Health, Strassen, Luxembourg
  2. 2 IBBL, Luxembourg, Luxembourg
  3. 3 Direction de la Santé, Luxembourg Ministère de la Santé, Luxembourg, Luxembourg
  4. 4 Laboratoire National de Santé, Luxembourg, Luxembourg
  5. 5 Centre Hospitalier de Luxembourg, Luxembourg, Luxembourg
  6. 6 Ifakara Health Institute, Bagamoyo, Tanzania
  7. 7 Department of Infection and Immunity, Luxembourg Institute of Health, Strassen, Luxembourg
  8. 8 Department of Dermatology and Allergy Center, Odense Research Center for Anaphylaxis, Odense University Hospital, University of Southern Denmark, Odense, Denmark
  1. Correspondence to Professor Laetitia Huiart; laetitia.huiart{at}


Introduction A few major clinical factors such as sex, obesity or comorbidities have already been associated with COVID-19 severity, but there is a need to identify new epidemiological, clinical, digital and biological characteristics associated with severity and perform deep phenotyping of patients according to severity. The objectives of the Predi-COVID study are (1) to identify new determinants of COVID-19 severity and (2) to conduct deep phenotyping of patients by stratifying them according to risk of complications, as well as risk factors for infection among household members of Predi-COVID participants (the Predi-COVID-H ancillary study).

Methods and analysis Predi-COVID is a prospective, hybrid cohort study composed of laboratory-confirmed COVID-19 cases in Luxembourg who will be followed up remotely for 1 year to monitor their health status and symptoms. Predi-COVID-H is an ancillary cohort study on household members of index cases included in Predi-COVID to monitor symptoms and household clusters in this high-risk population. A subcohort of up to 200 Predi-COVID and 300 Predi-COVID-H participants with biological samples will be included. Severity of infection will be evaluated by occurrence and duration of hospitalisation, admission and duration of stay in intensive care units or equivalent structures, provision of and duration of supplemental oxygen and ventilation therapy, transfer to another hospital, as well as the impact of infection on daily activities following hospital discharge.

Ethics and dissemination The study was approved by the National Research Ethics Committee of Luxembourg (study number 202003/07) in April 2020. All study participants sign an informed consent form. Scientific articles will be submitted to international peer-reviewed journals, and press releases for lay audiences will accompany major results.

Trial registration number NCT04380987.

  • public health
  • epidemiology
  • molecular diagnostics

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Strengths and limitations of this study

  • The Predi-COVID study is a prospective cohort where participants will be followed up for 1 year and will help to identify new epidemiological, clinical, digital and biological characteristics associated with COVID-19 severity and long-term health complications.

  • Predi-COVID participants will be deep-phenotyped using various sources of information with the objective of finding clinically relevant severity markers of COVID-19 infection.

  • Risk factors for infection in the high-risk population composed of household members will be investigated in the Predi-COVID-H subcohort study.

  • Voice recordings are performed to identify vocal biomarkers associated with COVID-19 characteristics (severity, symptoms, psychological factors).

  • The dynamics of the COVID-19 pandemic in Luxembourg will directly impact the recruitment capacity for Predi-COVID, both for patients at hospital and at home.



In December 2019, the first case of pneumonia caused by a novel betacoronavirus, the 2019 novel coronavirus, was reported from the city of Wuhan, Hubei Province, in mainland China.1 The disease quickly spread to other parts of China and across the globe. The WHO declared COVID-19 a pandemic on 11 March 2020, and more than 200 000 people had been infected worldwide by 19 March 2020. Pandemics are large-scale outbreaks of infectious disease that can greatly increase morbidity and mortality over a wide geographical area and cause significant economic, social and political disruption. A major pandemic can overwhelm the capacity of outpatient facilities, emergency departments, hospitals and intensive care units, leading to critical shortages of staff, space and supplies, with serious implications for patient outcomes.2 The pathogen for the new outbreak has been identified as a novel enveloped RNA betacoronavirus 2 and has been named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which has a phylogenetic similarity to SARS-CoV.1

Studies on the epidemiological and clinical characteristics of patients with COVID-19 show that men are infected more often than women and that the median age is 49.0 years.2 3 Common symptoms at onset of illness are fever, cough, dyspnoea and myalgia or fatigue; less common symptoms are sputum production, headache, loss of taste and smell, haemoptysis, and diarrhoea. The risk of serious disease and death in COVID-19 cases increases with age. In the USA, case fatality percentages increased with age, from no deaths reported among persons aged ≤19 years to the highest percentages (10%–27%) among adults aged ≥85 years. These findings are similar to data from China, which indicated that >80% of deaths occurred among persons aged ≥60 years. The most common complications in patients with COVID-19 were reported to be acute respiratory distress syndrome, followed by anaemia, acute cardiac injury and secondary infection.3 To date, no vaccine has been successfully developed for COVID-19. While treatments are mainly symptomatic and supportive, recent results suggest that the antiviral remdesivir shortens the time to recovery.4

Due to the rapid spread of COVID-19,5 6 we determined that identifying profiles of patients predictive of more severe prognosis would be key to clinical management and would support more efficient public health measures. Although there have been some studies on epidemiological and clinical characteristics of patients with COVID-19, there is a lack of research investigating the prognostic factors for the severity of the disease. Accurate prognostic evaluation of COVID-19 can provide a guiding basis for active and effective management of the outbreak and would allow for personalised isolation policies, thus contributing to preventing the need for general lockdown.


The main purpose of this project is to identify the epidemiological, clinical, digital and sociodemographic characteristics, as well as pathogen and/or host predictive biomarkers, associated with the severity of COVID-19. We aim to better understand the heterogeneity observed in disease severity by stratifying the cohort in Luxembourg. In the high-risk population composed of household members of index positive cases, the objective is to monitor symptoms and identify factors associated with infection.

The secondary objectives include the following: (1) to study the long-term (up to 1 year) health consequences of COVID-19, (2) to describe the trajectories of symptoms after diagnosis of COVID-19, and (3) to identify vocal biomarkers associated with respiratory syndromes, fatigue, anxiety or emotions related to COVID-19, which could then further be used for easy remote monitoring of patients with COVID-19.

Materials and methods

Study design

Predi-COVID is a prospective, hybrid cohort study composed of people who tested positive for COVID-19 in Luxembourg. After explicit consent, virtually all positive patients could be included in the Predi-COVID cohort. A subsample of a minimum of 200 participants who agree to provide biological samples will be included in the deep phenotyping substudy. This number will be adapted to quickly evolving knowledge on COVID-19 and to the resources available to include and sample patients. For hospitalised patients, the protocol encompasses a face-to-face inclusion combined with daily evaluations up to death or discharge, using the modified case report form (CRF) of the International Severe Acute Respiratory and Emerging Infection Consortium (ISARIC) (CORE+DAILY modules).7 After discharge, participants will enter a ‘Home’ protocol based on digital follow-up using web and mobile applications. The CoLive LIH smartphone app will be used to collect epidemiological, clinical and voice data to monitor health status and symptoms over time.

Predi-COVID participants will be actively followed for 14 days after confirmation of diagnosis, whether they are at the hospital or at home in isolation or quarantine. Short evaluations will also be performed at weeks 3 and 4, and then monthly for a period up to 12 months to assess potential long-term consequences of COVID-19. A biological sampling will be performed at baseline and week 3 for Predi-COVID participants.
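The follow-up calendar described above can be sketched in a few lines of Python. This is only an illustration, not the study's scheduling software: the function name, the labels, the convention that day 1 is the day after diagnosis and the approximation of a month as 30 days are all assumptions. Monthly evaluations are taken to start at month 2, since week 4 already covers the first month.

```python
from datetime import date, timedelta


def questionnaire_schedule(diagnosis: date) -> list[tuple[str, date]]:
    """Return (label, due date) pairs for one participant (hypothetical helper)."""
    schedule = []
    # Active follow-up: one evaluation per day for 14 days after diagnosis.
    # Assumption: day 1 is the day after confirmation of diagnosis.
    for day in range(1, 15):
        schedule.append((f"day {day}", diagnosis + timedelta(days=day)))
    # Short evaluations at weeks 3 and 4.
    for week in (3, 4):
        schedule.append((f"week {week}", diagnosis + timedelta(weeks=week)))
    # Monthly evaluations up to 12 months.
    # Assumption: a month is approximated as 30 days.
    for month in range(2, 13):
        schedule.append((f"month {month}", diagnosis + timedelta(days=30 * month)))
    return schedule
```

Under these assumptions each participant receives 27 evaluations: 14 daily, 2 weekly and 11 monthly.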

The Predi-COVID-H substudy is a prospective, longitudinal, hybrid cohort study which will include the household members of the Predi-COVID participants (figure 1). A subsample of a minimum of 300 participants will be composed of those who agree to provide biological samples. We further describe people participating in the main study as ‘Predi-COVID participants’ and those in the ancillary substudy of household members as ‘Predi-COVID-H participants’.

Figure 1

Recruitment strategy in the Predi-COVID and Predi-COVID-H study.

Strategies for recruitment and retention

Patients will be contacted by phone by collaborators from the Health Inspection and asked if they consent to having their contact details communicated to the LIH for potential participation in a research project on COVID-19 (figure 2). The cohort will be composed of consenting confirmed COVID-19 cases detected in Luxembourg and will include a subsample of 200 patients who will be visited at home for additional biosampling. Notified persons who agree will be contacted by phone by an experienced nurse or clinical research associate (CRA) from the Clinical and Epidemiological Investigation Centre (CIEC-Luxembourg Institute of Health), who will explain the study and organise the visits at home or at hospital. The CIEC team of LIH will be in charge of obtaining signed consent forms, follow-up of participants and organisation of visits (please see online supplemental material).

Figure 2

Predi-COVID and Predi-COVID-H study design. CRA, clinical research associate; CRF, Case Report Form; GPS, Global Positioning System; ISARIC, International Severe Acute Respiratory and Emerging Infection Consortium; LNS, Laboratoire National de Santé.

Patient and public involvement

The Predi-COVID initiative was an emergency response from national research institutions grouped under ‘Research Luxembourg’ to fight the COVID-19 pandemic in Luxembourg and contribute to the general effort in the crisis. Therefore, for timing and safety reasons, patients with COVID-19 were not directly included to participate in the study design. However, the first participants included in Predi-COVID provided feedback on general workflow, data collection, questionnaires and sampling, which was taken into account in an amendment to the protocol.

Inclusion criteria

For the Predi-COVID cohort, in order to be eligible to participate in the study, an individual must meet all the following criteria: (1) signed informed consent form; (2) patients ≥18 years old with confirmed SARS-CoV-2 infection as determined by PCR, performed by one of the certified laboratories in Luxembourg; and (3) hospitalised or at home. For the Predi-COVID-H substudy, the inclusion criteria include being an adult household member of a Predi-COVID participant.

Exclusion criteria

An individual who meets any of the following criteria will be excluded from participation in the study: (1) patients not able to understand French or German and (2) patients already included in an interventional study on COVID-19. For the Predi-COVID-H substudy, the exclusion criteria include participants not able to understand French or German.
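The inclusion and exclusion criteria for the main Predi-COVID cohort can be expressed as a simple screening function. The sketch below is illustrative only: the class, field and function names are hypothetical and do not come from the study's systems. Inclusion criterion (3), hospitalised or at home, places no restriction on setting and is therefore not checked.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical screening record for one potential participant."""
    signed_consent: bool
    age_years: int
    pcr_confirmed: bool  # SARS-CoV-2 PCR from a certified laboratory
    understands_french_or_german: bool
    in_interventional_covid_study: bool


def ineligibility_reasons(c: Candidate) -> list[str]:
    """Return the reasons a candidate cannot join Predi-COVID (empty if eligible)."""
    reasons = []
    # Inclusion criteria: all must be met.
    if not c.signed_consent:
        reasons.append("no signed informed consent form")
    if c.age_years < 18:
        reasons.append("younger than 18 years")
    if not c.pcr_confirmed:
        reasons.append("SARS-CoV-2 infection not confirmed by PCR")
    # Exclusion criteria: any one excludes.
    if not c.understands_french_or_german:
        reasons.append("not able to understand French or German")
    if c.in_interventional_covid_study:
        reasons.append("already included in an interventional COVID-19 study")
    return reasons
```

Returning the list of reasons rather than a single yes/no makes it easier to document why a contacted person was not enrolled.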

Participant discontinuation/withdrawal from the study

Patients can leave the study at any time for any reason if they wish to do so without any consequences. In this case, the data and samples already used for the study cannot be destroyed. Individuals enrolled in the Predi-COVID-H cohort (living with confirmed COVID-19 cases) will end the study at 2 weeks following inclusion in the study. In case of a positive PCR result, they will be invited to participate in Predi-COVID.

Lost to follow-up

Participants will receive questionnaires throughout the 12 months of the study unless they explicitly ask not to be contacted any more. Participants lost to follow-up are those who do not answer any questionnaire after baseline.

Study outcomes

Primary outcomes

The main outcomes of Predi-COVID are as follows: (1) hospitalisation: hospitalisation and duration of hospitalisation (days); (2) intensive care: intensive care or resuscitation and duration of intensive care/resuscitation; (3) ventilation: ventilation and duration of ventilation; (4) transfer to a hospital outside Luxembourg; (5) clinical severity; and (6) death.

Secondary outcomes

Other secondary outcomes, both quantitative and qualitative, have been defined to answer the secondary objectives. Quantitative outcomes include the Charlson index, whole blood count parameters, plasma cytokine levels, peripheral cell immune phenotyping, T cell receptor (TCR) repertoire of CD4 and CD8 T cells, IgG and IgM titres, and hormones/glucocorticoid concentration. Qualitative outcomes include syndromic respiratory disease panel results, SARS-CoV-2 mutations and human leukocyte antigen (HLA) genotypes.

Data and sample collection

The samples and data collected during the course of the study are described in table 1. The participants will be able to participate either in data collection only or in both data and sample collection. For sample collection they can choose to do all or part of the sampling.

Table 1

Predi-COVID study schedule

The samples and data collected during the course of the Predi-COVID-H substudy are described in table 2. The Predi-COVID-H participants will be able to participate either in data collection only or in both data and sample collection. For sample collection they can choose to do all or part of the sampling. Since the national recommendations for the follow-up of contact persons of positive cases have recently changed, with the prescription of a PCR test 5 days after the test of the case person, we adapted our sampling strategy with a follow-up visit conditioned by the PCR test result (see table 2).

Table 2

Predi-COVID-H study schedule

Data will be collected in three different ways:

  • Questionnaires using the electronic Patient Reported Outcomes (ePRO) module of Ennov Clinical: health status monitoring data with daily questionnaires during the first 14 days following diagnosis, then weekly questionnaires at weeks 3 and 4, and monthly questionnaires from month 2 to month 12. A unique link to the questionnaires will be sent by email to participants, who will fill in the questionnaire online. Participants can choose whether or not to answer the questionnaires.

  • Adapted ISARIC CRF: at hospital and at inclusion for patients included from home. For patients at home, data collection will be performed by the CIEC team by phone.

  • CoLive LIH smartphone application (LIH inhouse solution): innovative data (voice recordings, geolocation and mini-questionnaires).

Biological samples

The sampling strategy for Predi-COVID is defined in table 3 and for Predi-COVID-H in table 4.

Table 3

Sampling strategy for Predi-COVID

Table 4

Sampling strategy for Predi-COVID-H

Data management

Data in the ISARIC-adapted CRF will be completed by a trained nurse or CRA from CIEC. Access to source data during hospitalisation will be provided by the different participating hospitals. Data collection is the responsibility of the clinical trial staff at the site under the supervision of the site investigator. The investigator is responsible for ensuring the accuracy, completeness, legibility and timeliness of the data reported. In the e-questionnaires and the CoLive LIH application, the patient will be responsible for data entry. Each data collection tool (e-questionnaires, ISARIC, CoLive LIH) implements its own data collection rules and controls to ensure that data are entered correctly at the earliest stage of the data handling process. Each variable will be checked as soon as it is entered by the authorised user: allowable values, minimum and maximum values where relevant, consistency between related variables, and filtering of allowable values according to answers already given. Data handling will include specific edit checks to test the consistency of the database. Quality controls will also be performed regularly according to clinical and epidemiological good practice guidelines.
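The kinds of entry-time controls listed above (allowable values, minimum and maximum values, consistency between variables and context-dependent values) can be illustrated with a short Python sketch. The field names, the allowed categories and the temperature range are hypothetical and are not taken from the Predi-COVID CRF or from any of the study tools.

```python
from datetime import date


def check_record(record: dict) -> list[str]:
    """Return a list of edit-check failures for one record (illustrative rules only)."""
    errors = []

    # Allowable-values check (hypothetical categories).
    if record.get("sex") not in {"female", "male", "other"}:
        errors.append("sex: value not in allowed list")

    # Minimum/maximum check (hypothetical plausible range, in degrees Celsius).
    temp = record.get("temperature_c")
    if temp is not None and not 30.0 <= temp <= 45.0:
        errors.append("temperature_c: outside 30-45")

    # Consistency check between two related variables.
    admitted = record.get("admission_date")
    discharged = record.get("discharge_date")
    if admitted and discharged and discharged < admitted:
        errors.append("discharge_date: earlier than admission_date")

    # Context-dependent check: ICU duration only makes sense after ICU admission.
    if not record.get("icu_admitted") and record.get("icu_days"):
        errors.append("icu_days: given although icu_admitted is false")

    return errors
```

Running such checks at entry, rather than only during later database cleaning, means that most errors can be corrected while the source information is still at hand.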

A clinical data management plan will be provided to authorised people manipulating the data (which includes detailed description of source documentation, CRFs, instructions for completing the forms, value coding dictionaries and reconciliation processes, data handling procedure and procedures for data monitoring). Depending on the data collection path of the different tools provided to patients and the CIEC data collection team (nurses, CRA), each data set will be centralised at LIH by the research team in order to link patient records and analyse them. Since each collection tool uses its own pseudonym for patients, LIH’s research team manages a unique and secure pseudonymisation matrix, enabling them to link all patient records coming from different sources. Following good practices, when data are provided to researchers for further research projects, LIH’s data management team will generate an extra pseudonym—added to the pseudonym matrix table—for the extracted patient records. The central codebook will be shared with the new research team (figure 3).
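A pseudonymisation matrix of the kind described above can be pictured as a table with one row per participant and one column per data source. The Python sketch below is a minimal illustration under assumed names (the class, its methods and the identifier formats are hypothetical); it does not describe LIH's actual implementation or its security controls.

```python
import secrets


class PseudonymMatrix:
    """Toy pseudonym matrix: one row per participant, one column per source."""

    def __init__(self) -> None:
        self._rows: dict[str, dict[str, str]] = {}       # central id -> {source: pseudonym}
        self._index: dict[tuple[str, str], str] = {}     # (source, pseudonym) -> central id

    def register(self, central_id: str, source: str, pseudonym: str) -> None:
        """Record the pseudonym that a given collection tool uses for a participant."""
        self._rows.setdefault(central_id, {})[source] = pseudonym
        self._index[(source, pseudonym)] = central_id

    def resolve(self, source: str, pseudonym: str) -> str:
        """Map a tool-specific pseudonym back to the central identifier."""
        return self._index[(source, pseudonym)]

    def export_pseudonym(self, central_id: str, project: str) -> str:
        """Create (once) an extra random pseudonym for a downstream research project."""
        source = f"export:{project}"
        row = self._rows.setdefault(central_id, {})
        if source not in row:
            self.register(central_id, source, secrets.token_hex(8))
        return row[source]
```

Because each export receives its own random pseudonym, a research team that receives an extract cannot link it to the pseudonyms used by the collection tools or by other extracts without access to the matrix.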

Figure 3

Predi-COVID and Predi-COVID-H data architecture and workflow. eCRF, Electronic Case Report Form; GPS, Global Positioning System; IBBL, Integrated BioBank of Luxembourg; ISARIC, International Severe Acute Respiratory and Emerging Infection Consortium; LIH, Luxembourg Institute of Health.

Statistical analysis

This is a very early study that aims to understand the severity associated with a new, poorly understood pathogen. Therefore, the sample size is not formally determined as it would be in a clinical trial. Recruitment of participants will depend on the emergence and spread of the virus and on the resources available to the recruitment centres. The sample size should be as large as feasible, preferably without limit, in order to capture as much clinical data as possible early in the outbreak. Nevertheless, in the subsample study, a minimum of 200 COVID-19-positive persons in the Predi-COVID study would allow detection of a risk ratio of severe disease above 2 for a selected risk factor with a power of 80% when the prevalence of the disease is above 7%.
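To show how power depends on sample size, baseline risk and risk ratio, the sketch below uses the usual normal approximation for comparing two proportions in groups of equal size. The protocol does not state the test, the significance level, the split between exposed and unexposed participants or exactly which prevalence is meant, so this is not a reconstruction of the authors' calculation; the function name, the equal group sizes and the two-sided alpha of 0.05 are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_unexposed: float, risk_ratio: float,
                          n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power to detect a risk ratio with two equal-sized groups.

    Normal approximation for a two-sided test of two proportions;
    the contribution of the opposite tail is ignored.
    """
    p0 = p_unexposed
    p1 = p0 * risk_ratio
    if not (0 < p0 < 1 and 0 < p1 < 1):
        raise ValueError("risks must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p0 + p1) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)        # SE under H0
    se_alt = sqrt((p0 * (1 - p0) + p1 * (1 - p1)) / n_per_group)  # SE under H1
    z = (abs(p1 - p0) - z_alpha * se_null) / se_alt
    return NormalDist().cdf(z)
```

For a fixed risk ratio, the returned power rises with both the number of participants per group and the risk in the unexposed group, which is why a statement about power has to be tied to a minimum prevalence.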

Descriptive statistics will be obtained for the main endpoints, namely hospitalisation, intensive care/resuscitation admission, ventilation, duration, death and severity of the disease. Secondary endpoints, together with other variables including potential prognostic factors of the disease, will also be described with mean (±SD) or frequencies (%) as appropriate. A logistic regression model will be used to study whether the risks of hospitalisation, intensive care admission or intubation, as well as death, are associated with a specific risk factor. Models will be adjusted for age, gender and other potential confounding factors. Factors associated with the severity of the disease will be assessed with a Mantel-Haenszel χ2 test to evaluate whether different levels of patient characteristics are associated with a more serious form of the disease.
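The protocol does not specify which variant of the Mantel-Haenszel χ2 is intended. One common reading for ordered levels of a characteristic is the linear-by-linear association statistic Q = (N − 1)r², where r is the Pearson correlation between row and column scores and Q is compared with a χ2 distribution with 1 degree of freedom. The sketch below implements that reading with the standard library only; the function name and the default integer scores are assumptions.

```python
from math import erfc, sqrt


def mantel_haenszel_trend(table, row_scores=None, col_scores=None):
    """Linear-by-linear association test for an ordered contingency table.

    table[i][j] is the number of participants at level i of the
    characteristic and level j of severity. Returns (Q, p_value).
    """
    n_rows, n_cols = len(table), len(table[0])
    xs = row_scores or list(range(n_rows))   # assumed scores 0, 1, 2, ...
    ys = col_scores or list(range(n_cols))
    cells = [(xs[i], ys[j], table[i][j])
             for i in range(n_rows) for j in range(n_cols)]
    n = sum(count for _, _, count in cells)
    mean_x = sum(x * count for x, _, count in cells) / n
    mean_y = sum(y * count for _, y, count in cells) / n
    sxx = sum(count * (x - mean_x) ** 2 for x, _, count in cells)
    syy = sum(count * (y - mean_y) ** 2 for _, y, count in cells)
    sxy = sum(count * (x - mean_x) * (y - mean_y) for x, y, count in cells)
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in rows or columns")
    r = sxy / sqrt(sxx * syy)
    q = (n - 1) * r * r
    # Survival function of chi-square with 1 df: P(X > q) = erfc(sqrt(q / 2)).
    return q, erfc(sqrt(q / 2))
```

A table in which severity does not change across levels gives Q = 0, while a steady shift towards more severe disease at higher levels gives a large Q and a small p value.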

Further stratification of the patient cohort will be performed through machine learning techniques to include biological and other omics measurements. Deep phenotyping related to the symptoms of the disease, as well as biosampling allowing for laboratory-based and computational analytics, will be performed. The stratification approach will include data-driven tools and allow definition of disease trajectories that will be translated into clinical and/or biomarker-defined subgroups. To do so, we will use multi-omics approaches combined with clustering techniques. Missing data on endpoints will not be imputed. For the other variables and risk factors, a method such as multiple imputation by chained equations will be considered. The limitations of the study related to the sample size, the magnitude of the effects, the selection of participants and other potential limitations will be evaluated and discussed.

Ethics and dissemination

The study has been approved by the National Research Ethics Committee of Luxembourg (study number 202003/07) in April 2020.

Informed consent and enrolment

The investigator, or another member of the investigating team (experienced nurse or CRA from CIEC), will discuss the study with the subject, who will be given the opportunity to understand the objectives, risks and inconveniences of the study and the conditions under which it is to be conducted. The language used to inform the subject, both oral and written, should be concise, in lay terms and understandable to the subject.

In the present context of the COVID-19 pandemic, a fully paper-based consent process is not possible for safety reasons. Moreover, it is important that nurses spend as little time as possible in the homes of persons who have tested positive. The study will therefore be explained to potential participants by phone, and an electronic version of the subject information sheet (SIS) and informed consent form (ICF) will be sent by email if the person is interested in participating. Participants may take part in data collection only or in both data and sample collection. Subjects who agree to data collection only will be asked to fill in the consent form electronically (e-consent). Subjects who also agree to sample collection will be asked to fill in the e-consent, and the nurse will bring a paper version for sample collection to the baseline visit for the subject to fill in; the nurse will obtain a written signature and answer additional questions if necessary. The paper document will be given to the participant and a photo of the document will be stored at LIH. The subject will have sufficient reflection time between the first contact (description of the study by phone) and the sample collection visit. In addition, the subject will have the opportunity to ask questions during the phone call or by email before signing the electronic ICF.

Informed consent will be collected from all persons willing to participate in the study. Two different SIS and ICF will be used, according to the cohort to which the subject belongs (Predi-COVID for subjects with a positive COVID-19 test and Predi-COVID-H for people living in the same household). The subject will also be informed that he/she can withdraw from the study at any time, without giving any reason and without any consequences for his/her medical care. The ICFs have been approved by the competent authorities, and participants will be asked to read and sign the consent form before any procedure done specifically for this study begins. In exceptional cases, when the subject presents with an acute condition and is unable to receive and understand the information about the study, informed consent can be requested first from the next of kin and then from the subject as soon as this becomes possible. The written ICF must be dated and personally signed by the principal investigator or delegates and by the subject giving consent (or the next of kin in emergency situations). The electronic copy (photo) of the signed ICF will be retained in the study file at LIH and will be made available for monitoring, audit or inspection. No compensation will be offered to subjects for their participation.

Good clinical practice

The principal investigator and the sponsor ensure that the study is conducted in full conformity with the principles of the 1964 ‘Declaration of Helsinki’, as revised from time to time, and with the laws and regulations of Luxembourg, whichever afford the greater protection to the individual. The study fully adheres to the principles outlined in the ‘Guideline for Good Clinical Practice’ ICH-E6 Tripartite Guideline (January 1997) and to national laws. The principal investigator ensures compliance with national laws, with ICH Good Clinical Practice (GCP), with the European Union General Data Protection Regulation (GDPR) with regard to the processing of personal data, and with the law of 1 August 2018 on the organisation of the national commission for data protection and the general regime on data protection.

Transfer to third party

Some recipients of the data may be located outside the country and outside the European Economic Area (eg, UK). These may be countries whose level of data protection has not been confirmed by the European Commission as adequate. In this case, security measures equivalent to the security measures required by Luxembourgish and European regulations will be taken in order to protect subjects’ rights in terms of data confidentiality, by entering into specific contractual agreements.

Storage of encoded data

Encoded data will be kept for at least 15 years after the end of the study. Subsequently, they may be kept for an additional period of time, for the above-mentioned scientific purposes or for any legal reason (eg, change of obligations with regard to storage).

Publication plan

Several publications will be submitted to international peer-reviewed journals on findings related to the Predi-COVID and Predi-COVID-H studies. Press releases and articles for lay audience will also be prepared for major results.


The Predi-COVID study group is thankful to the participants, the national ethics committee and the funders for their support in the set-up of this initiative.



  • Twitter @gfaghe, @FischerAurelie1, @joel_mossong

  • Contributors GF, AF, FB, MV, IE, DB, CB, MO, MG and LH contributed to the planning, conception and design of the study. GF, AF, FB, MV, IE, DB, CB, MO, MG and LH contributed to the conduct of the study. GF, AF, FB, MV, IE, DB, CB, MO, MG, LH and MAR contributed to the reporting of the work in the article. GF, AF, FB, MV, IE, SM, JM, TS, DB, CB, MAR, MO, MG and LH contributed to data acquisition for the Predi-COVID study and they all drafted the first version of the manuscript and approved the final version of the manuscript.

  • Funding The Predi-COVID study is supported by the Luxembourg National Research Fund (FNR) (Predi-COVID, grant number 14716273) and the André Losch Fondation.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.