Human Early Life Exposome (HELIX) study: a European population-based exposome cohort
  1. Léa Maitre1,2,3,
  2. Jeroen de Bont1,2,3,
  3. Maribel Casas1,2,3,
  4. Oliver Robinson1,2,3,4,
  5. Gunn Marit Aasvang5,
  6. Lydiane Agier6,
  7. Sandra Andrušaitytė7,
  8. Ferran Ballester3,8,9,
  9. Xavier Basagaña1,2,3,
  10. Eva Borràs2,10,
  11. Céline Brochot11,
  12. Mariona Bustamante1,2,3,10,
  13. Angel Carracedo12,13,
  14. Montserrat de Castro1,2,3,
  15. Audrius Dedele7,
  16. David Donaire-Gonzalez1,2,3,
  17. Xavier Estivill14,15,
  18. Jorunn Evandt5,
  19. Serena Fossati1,2,3,
  20. Lise Giorgis-Allemand6,
  21. Juan R Gonzalez1,2,3,
  22. Berit Granum5,
  23. Regina Grazuleviciene7,
  24. Kristine Bjerve Gützkow5,
  25. Line Småstuen Haug5,
  26. Carles Hernandez-Ferrer1,2,3,
  27. Barbara Heude16,
  28. Jesus Ibarluzea3,17,18,19,
  29. Jordi Julvez1,2,3,4,
  30. Marianna Karachaliou20,
  31. Hector C Keun21,
  32. Norun Hjertager Krog5,
  33. Chung-Ho E Lau21,22,
  34. Vasiliki Leventakou20,
  35. Sarah Lyon-Caen6,
  36. Cyntia Manzano1,2,3,
  37. Dan Mason23,
  38. Rosemary McEachan23,
  39. Helle Margrete Meltzer5,
  40. Inga Petraviciene7,
  41. Joane Quentin6,
  42. Theano Roumeliotaki20,
  43. Eduard Sabido2,
  44. Pierre-Jean Saulnier24,
  45. Alexandros P Siskos21,
  46. Valérie Siroux6,
  47. Jordi Sunyer1,2,3,4,
  48. Ibon Tamayo1,3,25,
  49. Jose Urquiza1,2,3,
  50. Marina Vafeiadi20,
  51. Diana van Gent1,2,3,
  52. Marta Vives-Usano1,2,3,10,
  53. Dagmar Waiblinger23,
  54. Charline Warembourg1,2,3,
  55. Leda Chatzi26,27,
  56. Muireann Coen22,
  57. Peter van den Hazel28,
  58. Mark J Nieuwenhuijsen1,2,3,
  59. Rémy Slama6,
  60. Cathrine Thomsen5,
  61. John Wright23,
  62. Martine Vrijheid1,2,3
  1. 1 ISGlobal, Institute for Global Health, Barcelona, Spain
  2. 2 Universitat Pompeu Fabra (UPF), Barcelona, Spain
  3. 3 CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona, Spain
  4. 4 Municipal Institute of Medical Research (IMIM-Hospital del Mar), Barcelona, Spain
  5. 5 Norwegian Institute of Public Health, Oslo, Norway
  6. 6 Team of Environmental Epidemiology, IAB, Institute for Advanced Biosciences, Inserm, CNRS, CHU-Grenoble-Alpes, University Grenoble-Alpes, CNRS, Grenoble, France
  7. 7 Department of Environmental Sciences, Vytautas Magnus University, Kaunas, Lithuania
  8. 8 Nursing School, Universitat de València, Valencia, Spain
  9. 9 FISABIO–Universitat Jaume I–Universitat de València Joint Research Unit of Epidemiology and Environmental Health, Valencia, Spain
  10. 10 Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  11. 11 Unité Modèles pour l’Ecotoxicologie et la Toxicologie (METO), Institut National de l’Environnement Industriel et des Risques (INERIS), Verneuil en Halatte, France
  12. 12 Fundación Pública Galega de Medicina Xenómica (SERGAS), Santiago, Spain
  13. 13 Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Universidad de Santiago de Compostela, Santiago, Spain
  14. 14 Research Department, Sidra Medicine, Doha, Qatar
  15. 15 Genomics Unit, Dexeus Woman’s Health, Barcelona, Spain
  16. 16 Inserm UMR 1153—Centre de Recherche Epidémiologie et Biostatistique Sorbonne Paris Cité (CRESS), Equipe de recherche sur les origines précoces de la santé et du développement de l’enfant (ORCHAD), Villejuif, France
  17. 17 School of Psychology, University of the Basque Country UPV/EHU, San Sebastian, Spain
  18. 18 Biodonostia Health Research Institute, San Sebastian, Spain
  19. 19 Department of Health, Public Health of Gipuzkoa, Government of the Basque Country, San Sebastian, Spain
  20. 20 Department of Social Medicine, Faculty of Medicine, University of Crete, Heraklion, Greece
  21. 21 Division of Cancer, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, UK
  22. 22 Integrative Systems Medicine and Digestive Disease, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, UK
  23. 23 Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
  24. 24 Centre d’Investigation Clinique CIC1402, Inserm, Université de Poitiers, CHU Poitiers, Poitiers, France
  25. 25 Department of Statistics, Faculty of Arts and Sciences, Harvard University, Cambridge, Massachusetts, USA
  26. 26 Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California, USA
  27. 27 Department of Genetics and Cell Biology, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
  28. 28 Veiligheids- en Gezondheidsregio Gelderland Midden (VGGM), Arnhem, The Netherlands
  1. Correspondence to Professor Martine Vrijheid; martine.vrijheid{at}


Purpose Essential to exposome research is the collection of data on many environmental exposures from different domains in the same subjects. The aim of the Human Early Life Exposome (HELIX) study was to measure and describe multiple environmental exposures during early life (pregnancy and childhood) in a prospective cohort and associate these exposures with molecular omics signatures and child health outcomes. Here, we describe recruitment, measurements available and baseline data of the HELIX study populations.

Participants The HELIX study represents a collaborative project across six established and ongoing longitudinal population-based birth cohort studies in six European countries (France, Greece, Lithuania, Norway, Spain and the UK). HELIX used a multilevel study design with the entire study population totalling 31 472 mother-child pairs, recruited during pregnancy, in the six existing cohorts (first level); a subcohort of 1301 mother-child pairs where biomarkers, omics signatures and child health outcomes were measured at age 6–11 years (second level) and repeat-sampling panel studies with around 150 children and 150 pregnant women aimed at collecting personal exposure data (third level).

Findings to date Cohort data include urban environment, hazardous substances and lifestyle-related exposures for women during pregnancy and their offspring from birth until 6–11 years. Common, standardised protocols were used to collect biological samples, measure exposure biomarkers and omics signatures and assess child health across the six cohorts. Baseline data of the cohort show substantial variation in health outcomes and determinants between the six countries, for example, in family affluence levels, tobacco smoking, physical activity, dietary habits and prevalence of childhood obesity, asthma, allergies and attention deficit hyperactivity disorder.

Future plans HELIX study results will inform on the early life exposome and its association with molecular omics signatures and child health outcomes. Cohort data are accessible for future research involving researchers external to the project.

  • birth cohort
  • exposome
  • epidemiology
  • omics
  • public health
  • community child health

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:


The ‘exposome’ concept encompasses the totality of non-genetic exposures from conception throughout the life course, complementing the genome.1 The exposome concept carries the expectation that the use of holistic and data-driven approaches, similar to those pioneered in the genomics fields, can result in advances in our understanding of the complex environmental component of disease aetiology. The exposome has been delineated to include three overlapping and complementary domains2: (1) a general external domain including macrolevel factors such as climate, urban environment and societal factors; (2) an individual external domain including agents such as environmental pollutants, tobacco smoke, diet and physical activity and (3) a specific internal domain including gene expression, inflammation and metabolism, often assessed through high-throughput molecular omics methodologies such as transcriptomics, proteomics and metabolomics.

The HELIX project aims to measure and describe multiple environmental exposures from the different exposome domains during early life (pregnancy and childhood) and associate these with omics markers and child health outcomes. The background, rationale and detailed objectives of the HELIX project were described at the start of the project.3 HELIX takes early life as a key starting point for defining the exposome because, as recognised in Developmental Origins of Health and Disease research, the periods of organ development during prenatal life and infancy are especially vulnerable to the effects of environmental risk factors, which may manifest themselves throughout the lifetime in adult diseases.4 Essential to exposome research is the collection of data on many environmental exposures from the different exposome domains in the same subjects. Here, we describe recruitment, study population, measurements available and baseline data of the HELIX nested study populations, with the aim of providing a detailed description of the cohort and of the data available for future collaborative research.

Cohorts participating in HELIX

The HELIX study represents a collaborative project across six established and ongoing longitudinal population-based birth cohort studies in Europe: the Born in Bradford (BiB) study in the UK,5 the Étude des Déterminants pré et postnatals du développement et de la santé de l’Enfant (EDEN) study in France,6 the INfancia y Medio Ambiente (INMA) cohort in Spain,7 the Kaunas cohort (KANC) in Lithuania,8 the Norwegian Mother and Child Cohort Study (MoBa)9 and the RHEA Mother Child Cohort study in Crete, Greece10 (table 1). These cohorts were selected for participation in the HELIX project because: (a) they could provide substantial existing longitudinal data from early pregnancy through childhood, (b) they could follow up children at similar ages, (c) they could integrate questionnaires, biosampling and clinical examinations using common HELIX protocols and (d) they offered heterogeneity in terms of exposure and population characteristics.

Table 1

Characteristics of the cohorts contributing to the HELIX cohort

Pregnant women in the original cohorts were recruited between 1999 and 2010. Three cohorts (INMA, KANC, RHEA) recruited during the first trimester of pregnancy, two through the first and second trimesters (EDEN, MoBa), while in the BiB cohort women were recruited between weeks 26 and 28 of gestation (second/third trimesters). Inclusion and exclusion criteria varied between cohorts, as described in table 1. All cohorts included at least one follow-up point during pregnancy, one at birth and several after delivery.

Based on these six existing cohorts, HELIX used a multilevel study design, drawing on nested study populations for data collection of different intensities (figure 1): (1) the entire cohort in which factors arising primarily from outdoor exposures were assessed through geospatial models and linked to existing health outcome data; (2) a subcohort in which one new follow-up examination of the children between ages 6 and 11 years was carried out in order to assess child health outcomes and to fully characterise different areas of the exposome through questionnaires, biological sample collection and biomarker and omics measurements and (3) two panel studies in children and pregnant women to characterise in depth the variability in exposure biomarkers and omics biomarkers, individual exposure-related behaviours and personal exposures.

Figure 1

Flow chart describing design and available data. GIS, geographic information system; HELIX, Human Early Life Exposome; miRNA, microRNA; mtDNA, mitochondrial DNA. *Omics data available after quality control

HELIX entire cohort

The study population for the entire HELIX cohort includes 31 472 women who had singleton deliveries between 1999 and 2010, and for whom exposure to ambient air pollution during pregnancy had been estimated as part of the European Study of Cohorts for Air Pollution Effects (ESCAPE) project.11 The entire cohort includes nine regions from the six cohorts; we included only regions where geographic data were available to calculate air pollution levels and built environment indicators (table 1). This meant, for example, that the city of Oslo and not the whole of the national MoBa cohort was included, and that only the Gipuzkoa, Sabadell and Valencia regions of the INMA study were included. In the other cohorts, women residing outside the main urban areas were excluded for the same reason.

In this study population, data on many variables had been collected in the individual cohorts during previous data collection points (during pregnancy and between birth and 5 years of age). Existing data included information on certain exposures (eg, maternal tobacco smoking during pregnancy, environmental tobacco smoke), key covariates (eg, pregnancy complications, maternal and child diet, maternal and child physical activity, child sleep, breast feeding, other health-related behaviours, indicators of socioeconomic status) and health and development outcomes. As part of HELIX, relevant datasets from all 31 472 mother-child pairs were transferred from the six cohorts to the central HELIX data warehouse located at the Barcelona Institute for Global Health (ISGlobal) (see below). Through data harmonisation, these cohort-specific variables were converted to harmonised variables. This process involved summarising, checking and matching each variable cohort by cohort and deciding on a common coding system appropriate to each variable. Specific expert working groups throughout the HELIX consortium advised on the harmonisation rules for each variable. The child health and developmental outcomes harmonised as part of HELIX include birth outcomes, growth-related and obesity-related outcomes, blood pressure, neurodevelopment and respiratory health between birth and 5 years of age (table 2).
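As a rough illustration of the harmonisation step described above, the sketch below maps cohort-specific codings of a single variable onto one common coding. The variable (maternal smoking during pregnancy), the cohort labels, the raw codes and the function name are all hypothetical; they are not taken from the HELIX data warehouse or its harmonisation rules.

```python
# Hypothetical common coding agreed for one harmonised variable.
COMMON_CODES = {"no": 0, "yes": 1}

# Hypothetical cohort-specific codings; real cohorts will differ.
COHORT_MAPPINGS = {
    "cohort_a": {"never": "no", "quit before pregnancy": "no", "smoker": "yes"},
    "cohort_b": {0: "no", 1: "yes", 2: "yes"},
}


def harmonise(cohort, raw_value):
    """Return the common code for a cohort-specific value.

    Unmapped or missing raw values are returned as None so that they can be
    checked by hand rather than silently recoded.
    """
    label = COHORT_MAPPINGS[cohort].get(raw_value)
    return COMMON_CODES.get(label)


records = [("cohort_a", "smoker"), ("cohort_b", 0), ("cohort_b", 9)]
harmonised = [harmonise(cohort, value) for cohort, value in records]
```

Keeping one explicit mapping per cohort makes each recoding decision easy to review, which fits the variable-by-variable checking described in the text.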

Table 2

Health outcomes harmonised across the entire cohort between birth and 5 years of age

HELIX subcohort

From the entire cohort, a subcohort of mother-child pairs was selected to be fully characterised for a broad suite of environmental exposures and ‘omics’ data, to be clinically examined and to have biological samples collected. A new follow-up visit was organised for these mother-child pairs between December 2013 and February 2016. Subcohort subjects were recruited from within the entire cohort such that there were approximately 200 mother-child pairs from each of the six cohorts. Subcohort recruitment in the EDEN cohort was restricted to the Poitiers area and in the INMA cohort to the city of Sabadell.

Eligibility criteria for inclusion in the subcohort were: (a) age 6–11 years at the time of the visit, with a preference for ages 7–9 years if possible; (b) sufficient stored pregnancy blood and urine samples available for analysis of prenatal exposure biomarkers; (c) complete address history available from first to last follow-up point; (d) no serious health problems that may affect the performance of the clinical testing or impact the volunteer’s safety (eg, acute respiratory infection). In addition, the selection considered whether data on important covariates (diet, socioeconomic factors) were available. Each cohort selected participants at random from the eligible pool in the entire cohort and invited them to participate in this subcohort until the required number of participants was reached. In total, 1301 mother-child pairs with complete questionnaire and clinical examination data, and urine and blood samples, were included in the HELIX subcohort (figure 1).
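The selection procedure above can be sketched as a filter on the four eligibility criteria followed by a random ordering of the eligible pool. This is only an illustration: the field names, the use of completed years for age and the fixed random seed are assumptions, and the sketch does not model the stated preference for ages 7–9 years or the availability of covariate data.

```python
import random


def is_eligible(child):
    """Apply the four stated eligibility criteria (hypothetical field names)."""
    return (
        6 <= child["age_years"] <= 11               # (a) age at visit, completed years
        and child["has_stored_pregnancy_samples"]   # (b) stored blood and urine
        and child["has_complete_address_history"]   # (c) address history
        and not child["has_serious_health_problem"]  # (d) health status
    )


def draw_invitation_order(children, seed=0):
    """Return eligible children in random order.

    Invitations would proceed down this list until the required number of
    participants (about 200 per cohort) is reached.
    """
    pool = [child for child in children if is_eligible(child)]
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    rng.shuffle(pool)
    return pool
```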

Several cohorts then invited and examined further subjects (n=322) following the same protocols for clinical examination and sample collection, and the same questionnaires, but these were not included in the measurement of exposure biomarkers for the HELIX study (figure 1). Among the 322 extra individuals, 266 came from INMA-Sabadell, 26 from BiB, 7 from EDEN, 3 from KANC, 19 from MoBa and 1 from RHEA. For some of these individuals omics data were collected. These individuals may be included in studies with a focus other than exposure biomarkers, which is why they are shown in figure 1.

The new follow-up visits for the subcohort took place in the six study centres at a local hospital, a primary care centre or at the Norwegian Institute of Public Health (NIPH) in Oslo. During the follow-up examination, trained nurses interviewed the mothers, carried out health examinations of the children and collected biological samples using standardised operating procedures.

Questionnaire information

Interviews with the mothers during the visit used a computer-aided version of a common standardised questionnaire developed for HELIX. Questionnaires were translated and back-translated into each of the country languages. If the mother could not attend (although her attendance was strongly encouraged at recruitment), the father or legal guardian completed the questionnaire and the mother checked it at home at a later date. This happened for 4% of the children. The full questionnaire can be accessed online (

The questionnaire collected information on the child’s diet (using an internationally agreed food frequency questionnaire with portion size examples appropriate to each cohort, and the Mediterranean diet quality index questionnaire to assess adherence to the Mediterranean diet12), physical activity of the child, sleeping patterns of the child, socioeconomic status (family affluence scale (FAS II),13 ie, subjective wealth), social capital of the family,14 stress of the mother,15 exposure to environmental tobacco smoke, water consumption habits, cooking and heating methods at the home, cleaning products, bedroom location, noise perception, child’s use of mobile phones and other electronic devices, use of green spaces, commuting behaviour, holidays and sun exposure and puberty development of the child.

Questions on exposures during the day and week before the visit were asked separately in a short questionnaire and repeated during the second period of the nested child panel study (see below). Additionally, information on addresses, places visited and travel was collected using a custom-made Google maps-aided commuting questionnaire based on the free geographic information system (GIS) software qGIS,16 which allowed mothers to trace their child’s commuting routes directly on the computer. This information was used to enhance the accuracy of location data collected through the main questionnaires and integrated with the outdoor exposure estimates to provide exposure estimates at different locations and for different time-activity patterns (eg, home, school, commuting exposures). A total of 98.3% of mothers in the subcohort completed the qGIS commuting questionnaire.

Anthropometry and body composition

During the subcohort follow-up examination, anthropometric data were collected using regularly calibrated instruments: height was measured with a stadiometer and weight with a digital weight scale, both without shoes and with light clothing. Height and weight measurements were converted to body mass index (BMI in kg/m²) for age-and-sex z-scores using the international WHO reference curves in order to allow comparison with other studies.17 Overweight and obese children were defined as those above the age-and-sex-specific 85th and 95th percentiles, respectively, as recommended by WHO. Circumferences (arm, waist, head) were measured with a metric tape and recorded in duplicate.
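The conversion from height and weight to a BMI z-score can be sketched as follows, assuming the reference curves are supplied as LMS parameters (L, M and S) for the child's age and sex. The L, M and S values below are placeholders chosen for illustration, not WHO reference values, and the sketch omits any special handling of extreme z-scores that a reference implementation may apply.

```python
import math


def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2 from weight in kg and height in cm."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def lms_zscore(value, l, m, s):
    """z-score from LMS parameters: ((value / M)**L - 1) / (L * S).

    When L is 0 the limiting form ln(value / M) / S is used.
    """
    if l == 0:
        return math.log(value / m) / s
    return ((value / m) ** l - 1.0) / (l * s)


# Placeholder LMS values for one hypothetical age-and-sex stratum.
L_PARAM, M_PARAM, S_PARAM = -1.5, 16.0, 0.12

child_bmi = bmi(weight_kg=30.0, height_cm=130.0)
child_z = lms_zscore(child_bmi, L_PARAM, M_PARAM, S_PARAM)
```

A child whose BMI equals the reference median M gets a z-score of 0, and percentile cut-offs such as the 85th and 95th correspond to fixed z-score thresholds on this scale.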

Four skinfolds were measured (triceps, subscapular, suprailiac, thigh), following the protocols described in the report from National Health and Nutrition Examination Survey III Body Measurements (anthropometry).18 Three complete sets of each skinfold measurement were taken consecutively, and the mean was used as the representative value for each site. A skinfold is the thickness of a double fold of skin and subcutaneous fat, excluding the underlying muscle. Skinfolds are highly correlated with total body fat and are a way to assess the distribution of fat tissue. Specific training workshops (one before and one during field work) were organised to standardise skinfold and other anthropometric measurements between the cohorts. All field workers participated in these workshops and were trained to obtain measurements comparable to those of an expert anthropometrist.

Bioelectric impedance analysis readings were taken with the Bodystat 1500 (Bodystat, Douglas, Isle of Man) equipment after 5 min of lying down. Bioelectric impedance provides an objective measure of body composition when standard protocols are followed and population-specific equations are available and used. Fat free mass and fat mass (in grams and as a proportion of total body mass) were calculated from impedance values, using published age-specific and race-specific equations validated for use in children.19 The multiracial equations recently developed for children, based on impedance values obtained with a single frequency tetra-polar Bodystat device,19 fit the measures obtained from children in HELIX with the same device well.

Blood pressure

Blood pressure was taken in a sitting position after 5 min of rest using the OMRON 705-CPII automated oscillometric device. The mean of three consecutive measurements taken at 1 min intervals was used.20 Blood pressure was measured towards the end of the visit to ensure that children had not consumed anything that may affect the results (chocolate, cola drinks) in the previous hour. Systolic and diastolic blood pressures and pulse rate from each measurement were recorded.

Respiratory health

Lung function measures were obtained in children through a forced spirometry test, performed with the EasyOne spirometer by trained field workers using a standardised protocol. During the measurements, the child sat up straight, wore a nose clip and was asked to breathe in as deeply as possible until his/her lungs were totally full, and then to quickly position the mouthpiece and blast out the air as hard and as fast as possible. The child was asked to perform at least six of these manoeuvres to achieve the three acceptable and reproducible manoeuvres needed for a valid test. While acceptability and reproducibility criteria for spirometry have been well defined for adults, the criteria to be used for children lack clarity and consistency.21–23 Based on international standards,21 24 a manoeuvre was defined as acceptable if there was no hesitation or false start (ie, if the back-extrapolated volume (BEV) was <5% of the forced vital capacity (FVC), or <100 mL if FVC <1000 mL) and if the forced expiratory time (FET) was in an acceptable range (1.5 s<FET<10 s). For reproducibility, the highest value of forced expiratory volume in one second (FEV1) taken from acceptable forced expiratory manoeuvres should not differ by more than 150 mL or 5% (or by more than 100 mL if FVC <1000 mL) from the second highest FEV1. The per cent predicted FEV1 values were then computed using the Global Lung Function Initiative equations, and any best FEV1% predicted value in the (60%; 140%) range was retained. Following these criteria, 79.4% of the HELIX children performed a valid test.
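The acceptability and reproducibility rules above can be written down as a small sketch. Several points are assumptions rather than statements from the protocol: the function and field names are hypothetical, the "150 mL or 5%" rule is read as "either condition is sufficient", the 5% is taken relative to the highest FEV1, and the FVC used for the small-volume rule is that of the manoeuvre with the highest FEV1. The per cent predicted value is taken as an input because the reference equations are not reproduced here.

```python
def manoeuvre_acceptable(bev_ml, fvc_ml, fet_s):
    """Check start-of-test and expiratory-time criteria for one manoeuvre."""
    if fvc_ml < 1000:
        good_start = bev_ml < 100            # absolute limit for small FVC
    else:
        good_start = bev_ml < 0.05 * fvc_ml  # BEV below 5% of FVC
    return good_start and 1.5 < fet_s < 10


def fev1_reproducible(acceptable_manoeuvres):
    """Compare the two highest FEV1 values among acceptable manoeuvres.

    Each manoeuvre is a dict with keys "fev1_ml" and "fvc_ml" (assumed names).
    """
    if len(acceptable_manoeuvres) < 2:
        return False
    ranked = sorted(acceptable_manoeuvres, key=lambda m: m["fev1_ml"], reverse=True)
    best, second = ranked[0], ranked[1]
    difference = best["fev1_ml"] - second["fev1_ml"]
    if best["fvc_ml"] < 1000:
        return difference <= 100
    # Assumption: meeting either the 150 mL or the 5% limit is sufficient.
    return difference <= 150 or difference <= 0.05 * best["fev1_ml"]


def in_retained_range(fev1_percent_predicted):
    """Keep best FEV1 % predicted values inside the open (60, 140) range."""
    return 60 < fev1_percent_predicted < 140
```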

Information on the occurrence of wheeze, asthma, eczema, allergic rhinitis and food allergy in the children was obtained through questions adapted from the International Study of Asthma and Allergies in Childhood by the Mechanisms of the Development of Allergy project (MeDALL).25


Neurodevelopmental outcomes were assessed through a battery of internationally standardised, non-linguistic and culturally blind computer tests. The tests included the N-back26 and the attention network test27 to assess working memory and sustained attention, the trail making test28 to assess speed of processing and executive functions, the finger tapping test to assess motor speed and lateralisation29 and Raven’s coloured progressive matrices30 to assess general non-verbal intelligence. The tests were administered on standardised study-provided laptops and lasted a maximum of 1 hour. Testing rooms were kept quiet and the tests were done with minimal interference. Field workers were trained to instruct children in a standardised way.

A proxy of maternal IQ or cognitive functioning is an important cofactor to assess in any study where neurodevelopment is the main outcome. The mothers (or the father if the mother was not available) completed a short version of the N-back (no more than 6 min) adapted to adults.

Parents completed the Conners rating scale and the child behaviour checklist (CBCL) before the visit to assess child behavioural problems. The 27-item Conners rating scale provides information on child behaviour, particularly in relation to inattention and hyperactivity.31 The CBCL is one of the most widely recognised and widely used tools for assessing a child’s behavioural functioning and contains several subscales, including aggressive behaviour, anxiety/depression, attention problems, internalising problems and externalising problems.32

Biological sample collection

During the subcohort follow-up examination, new biological samples suitable for all planned exposure biomarker and omics analyses were collected using the same standardised protocols across all six cohorts, as shown in figure 1. Two spot urine samples (one before bedtime and one first morning void) were collected in high-quality polypropylene tubes (Sarstedt: 75.9922.744). The two urine samples were brought by the participants to the centre in cool packs and stored at −4°C until processing. After aliquoting, the urine samples were frozen at −80°C under optimised and standardised procedures. If the families did not bring urine samples with them, a new sample was collected on arrival at the centre. This occurred in 6.6% of the subcohort children.

A total of 18 mL of blood was collected during the follow-up visit at the end of the clinical examination of the child to ensure an approximate 3 hour (median 3.5 hours, SD 1.1 hours) fasting time since the last meal. Blood samples were collected using a ‘butterfly’ vacuum clip and local anaesthetic and processed into a variety of sample matrices. The collection tubes included EDTA Vacutainers designed for trace element testing, used for plasma proteomics, microRNA (miRNA) and perfluorinated alkylated substance analyses, blood smears, whole blood heavy metal and DNA isolation (BD: 368381, K2EDTA coated); Tempus tubes for RNA isolation (Life Technologies Cat. No.: 4342792); plastic silica Vacutainers for serum metabolomics and clinical parameter analyses (BD: 368813, silica coated, clot activator); and glass silica Vacutainers for serum polychlorinated biphenyl, dichlorodiphenyldichloroethylene, hexachlorobenzene and polybrominated diphenyl ether analyses (BD: 367614, silica coated, no activator). After processing, these samples were frozen at −80°C under optimised and standardised procedures. After the relevant assays were performed, the remaining blood and urine samples, hair samples, RNA and DNA samples were kept in storage for the subcohort children.

Exposure assessment

To construct the exposome, HELIX has estimated exposure to a wide range of environmental contaminants and indicators of the built environment. In the entire cohort and subcohort, a GIS environment for the nine study areas was constructed, and, based on residential address histories, exposure estimates were assigned for ambient air pollutants, road traffic noise levels, surrounding natural spaces (green and blue spaces), built environment, ultraviolet (UV) radiation and meteorological variables during pregnancy and childhood (table 3). These estimates build on existing land-use regression air pollution models (ESCAPE project33 34), city noise maps, land use maps (‘Urban Atlas’ by the European Environment Agency), raster maps of the normalised difference vegetation index (NDVI),35 36 raster maps of land surface temperature, building density, population density, connectivity, walkability and public bus transport map information for the built environment and meteorological data, as described in more detail elsewhere (Robinson and colleagues37). Data from existing regulatory monitors were used to back extrapolate ambient air pollution exposure models. The estimates for these outdoor exposures were calculated for the prenatal period and several postnatal periods up to the HELIX subcohort follow-up time point (table 3).

Table 3

Exposure estimates available in the HELIX entire cohort and subcohort

Furthermore, in the subcohort, biomarkers of contaminant exposure were measured in appropriate biological samples collected from the children at age 6–11 years and in samples previously collected from mothers during pregnancy or from the neonates during delivery (cord blood) and stored in cohort biobanks (table 3). Chemical assays were conducted in the laboratory at the Department of Environmental Exposure and Epidemiology at the NIPH, apart from analyses of metals/elements and cotinine, creatinine and blood lipids, which were subcontracted to ALS Laboratory Group Norway AS and Dr Fürst Medisinsk Laboratorium AS, respectively. Biomarkers include: organochlorine compounds and brominated compounds, perfluoroalkyl substances and metals in blood, and non-persistent chemicals (phthalate metabolites, phenols, organophosphate pesticide metabolites and cotinine) in urine samples (table 3). Concentrations of organochlorine compounds (OCs) and polybrominated diphenyl ethers (PBDEs) were adjusted for total lipid percentage and expressed in ng/g of lipids. Urinary concentrations were adjusted for creatinine and expressed in μg/g of creatinine. Urine samples of the night before the visit and the first morning void on the day of the visit were combined to provide a slightly longer-term exposure assessment than can be achieved with one spot urine sample (Haug et al, s).
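The two adjustments described above amount to dividing a measured concentration by the lipid fraction or by the creatinine concentration. The sketch below shows the arithmetic under assumed input units (ng per g of serum with lipid content in per cent; μg/L of urine with creatinine in g/L); the function names and example values are hypothetical and the laboratories' actual procedures may differ.

```python
def lipid_adjusted_ng_per_g(conc_ng_per_g_serum, total_lipid_percent):
    """Express a serum concentration per gram of lipid.

    Assumes the concentration is in ng per g of serum and the total lipid
    content is given as a percentage of serum weight.
    """
    return conc_ng_per_g_serum / (total_lipid_percent / 100.0)


def creatinine_adjusted_ug_per_g(conc_ug_per_l, creatinine_g_per_l):
    """Express a urinary concentration per gram of creatinine.

    Assumes the analyte is in micrograms per litre of urine and creatinine
    is in grams per litre of urine.
    """
    return conc_ug_per_l / creatinine_g_per_l


# Illustrative values only, not HELIX measurements.
example_lipid = lipid_adjusted_ng_per_g(0.05, 0.5)
example_urine = creatinine_adjusted_ug_per_g(12.0, 1.2)
```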

Concentrations of drinking water disinfection by-products (DBPs) during pregnancy were estimated from water company concentration and distribution data as part of the water contaminants and stillbirth, congenital anomalies, birth weight, preterm delivery (HiWate) project in four of the cohorts (BiB, KANC, INMA, RHEA).38 For EDEN and MoBa, we followed the same methodology to obtain estimates during pregnancy. Data were not sufficiently complete to estimate child exposure to DBPs. Indoor air concentrations of nitrogen dioxide (NO2), particulate matter <2.5 µm (PM2.5), particulate matter absorbance, benzene, toluene, ethylbenzene and xylene were estimated by combining measurements in the homes of a subgroup of children during the two periods of the nested panel studies (see below) with questionnaire data from the subcohort.

Measurement of molecular signatures

In the subcohort, we also obtained the following measurements of molecular omics signatures at the age of 6–11 years: blood leucocyte DNA methylation (450K, Illumina), whole blood transcription (HTA V.2.0, Affymetrix and SurePrint Human miRNA rel 21, Agilent), serum metabolites (AbsoluteIDQ p180 kit, Biocrates), urine metabolites (proton nuclear magnetic resonance (1H NMR) spectroscopy) and plasma proteins (Luminex, cytokines 30-plex, apolipoprotein 5-plex and adipokine 15-plex). Among the samples available for omics analyses, some were excluded because of the absence of genetic analysis consent (n=1), because the blood DNA/RNA extraction failed (n=386, 22.8%, for RNA; n=22, 1.5%, for DNA), because omics data were of low quality (n=10 for transcriptomics), because of technical outliers (serum haemolysed, n=1), or because of failed sample identity checks for the methylome (n=4 based on sex mismatch, and n=6 based on genotype mismatch between longitudinal HELIX samples from the same child or from existing genome-wide genetic data of the child). Telomere length and mitochondrial DNA content were measured by quantitative PCR as part of a separate project. Genome-wide genotyping will be completed as part of a separate project using the Infinium Global Screening Array from Illumina.

The number of omics markers varies greatly across the omics platforms: from 36 for proteomics to 480 071 for the methylome (table 4). The platforms and data processing procedures selected for the proteins and serum metabolome were targeted assays (<200 features), chosen to obtain the best quality data for a large number of samples with fully annotated proteins and metabolites.39 Further data filtering was applied to decrease the apparent complexity in the omics data. For example, in the urine metabolome, generated from untargeted NMR spectroscopic analysis from 128K spectral data points, 44 metabolite integrals were calculated only for resonances with high abundance and limited overlap with other metabolite signals. Urine metabolites were normalised using the median fold change.40 Proteins were filtered out if 30% of samples were outside of the linear range of quantification. After an initial quality control, the numbers of CpGs, transcript clusters and miRNAs were 480 071, 67 528 and 2549, respectively. Additional filtering (ie, probes in sex chromosomes, cross-hybridisation probes, etc) might be applied during data analysis.
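The two processing rules mentioned above can be illustrated with a minimal sketch. It assumes that median fold change normalisation means scaling each sample by the median of its feature-wise ratios to a reference profile (here the feature-wise median across samples), and that the 30% rule is inclusive; both are assumptions made for illustration, and the function names are hypothetical rather than taken from the HELIX pipeline.

```python
from statistics import median


def median_fold_change_normalise(samples):
    """Scale each sample by the median of its fold changes against a reference.

    `samples` is a list of equal-length lists of positive intensities
    (one inner list per urine sample, one position per metabolite).
    """
    n_features = len(samples[0])
    # Reference profile: feature-wise median across all samples (assumption).
    reference = [median(s[j] for s in samples) for j in range(n_features)]
    normalised = []
    for s in samples:
        # Fold change of every feature relative to the reference profile.
        folds = [s[j] / reference[j] for j in range(n_features) if reference[j] > 0]
        factor = median(folds)  # estimated dilution factor for this sample
        normalised.append([x / factor for x in s])
    return normalised


def keep_protein(in_range_flags, max_fraction_outside=0.30):
    """Return True when fewer than 30% of samples fall outside the linear range.

    `in_range_flags` holds one boolean per sample; treating exactly 30% as
    a failure is an assumption about how the rule was applied.
    """
    outside = in_range_flags.count(False) / len(in_range_flags)
    return outside < max_fraction_outside
```

In this sketch, three samples that differ only by a dilution factor are mapped to the same profile, which is the behaviour the normalisation is meant to achieve.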

Table 4

Omics features available in the HELIX subcohort and repeat child panel study

Within the HELIX subcohort of 1301 mother-child pairs, we obtained the following final numbers of children with omics data: n=941 for miRNA, 1010 for transcripts, 1170 for proteins, 1173 for methylation and 1198 for urine and serum metabolites; a total of 874 children (67% of the subcohort) had complete exposure and omics data (table 4, figure 1). Among these children, between 123 and 154, depending on the omics platform, also had a sample analysed for the second visit approximately 6 months later (table 4).

Panel studies

Intensive repeat panel studies collected data on short-term temporal variability in exposure biomarkers and omics biomarkers, individual behaviours (physical activity, mobility) and personal and indoor exposures (table 5). The child panel study included children from the HELIX subcohort (n=157, from all cohorts except MoBa) who lived in a first floor apartment or private house and were selected using a maximum variation sampling strategy with respect to traffic-density exposure at the home address. The pregnancy panel study included pregnant women from outside the cohorts in three cities, Barcelona, Grenoble and Oslo (n=154). The inclusion criteria for these pregnant women were to be 18 years or older at the start of pregnancy, to have a singleton pregnancy, to be living in the study area until delivery and to have the first visit before the end of gestational week 20. Participants in the child panel study were followed for 1 week in two seasons, whereas in the pregnancy panel study the participants were followed for 1 week in two trimesters. In the child panel, the last day of the first week coincided with the subcohort examination, detailed above.

Table 5

Measurements performed in the child and pregnancy panel studies

Participants carried smartphones for measurement of physical activity and to collect geolocalisation data through the ExpoApp, a smartphone-based application41 specifically developed for the project (table 5). Indoor air pollution exposure to NO2 and to the volatile organic compounds benzene, toluene, ethylbenzene, meta-xylene, para-xylene and ortho-xylene was measured through passive samplers installed in the homes. For the last 24 hours of the panel study periods, participants carried backpacks containing Active PM2.5 Cyclone pumps and black carbon microAethalometer monitors (Model AE51, AethLabs, California, USA). Electronic wrist bands measured UV exposure (Scienterra, New Zealand).

Urine samples were collected twice daily (first morning void and bedtime sample) in the child panel study and three times per day (morning, afternoon, bedtime) in the pregnancy panel. Urine samples were used to measure repeat biomarkers for non-persistent exposures (phthalates, phenols, organophosphate pesticides and cotinine) and to assess the variance in NMR metabolomics measured in the first morning void, bedtime and pooled urine.42

At the end of each monitoring week, blood samples were collected following the same procedures as for the subcohort; indeed, the collection in week 1 was part of the subcohort examination. Blood samples in the child panel study were also used to measure repeat omics signals (table 4). Lung function, blood pressure and anthropometric data were measured at the end of the panel study week following the same protocol as the subcohort clinical examination.

Patient and public involvement

There was no patient involvement in this study. The six cohorts participating in the HELIX project recruited healthy pregnant mothers and followed their children up to 6–11 years. The cohort studies kept the families involved throughout these years through regular clinical and questionnaire follow-ups and disseminated study results to them through newsletters, family meetings and open days. The results are also regularly disseminated to local, national and international stakeholders.

Findings to date

Baseline characteristics of the entire cohort

The main characteristics of the entire cohort are shown in table 6. Fifty-one per cent of the children in the entire cohort are boys; the average birth weight was 3372 g and the average gestational age 39.7 weeks; maternal age at delivery was 29.6 years on average; the majority of participants were from the highest educational level (51.6%); a high percentage of mothers had overweight (25.6%) or obesity (15.8%) at the beginning of pregnancy, and 12.1% of mothers smoked during the entire pregnancy.

Table 6

Comparison of basic characteristics between the HELIX entire cohort (n=31 472), the subcohort (n=1301) and the child panel study (n=157)

Baseline characteristics of the subcohort

Basic characteristics of the subcohort were somewhat different to those of the entire cohort, probably reflecting selective participation of families in the intensive subcohort follow-up visit and data completeness requirements (table 6). Compared with the entire cohort, the subcohort contained a greater percentage of boys, fewer children whose parents were born abroad (in particular in INMA and RHEA), a lower percentage of mothers with low education (in particular in BiB), a lower percentage of primiparous mothers (mainly in MoBa) and older mothers. The higher percentage of active smoking observed in the subcohort compared with the entire cohort was due to the fact that there were fewer missing values for smoking in the subcohort.

The age of the children in the subcohort at the time of the examination was 8.1 years on average, and this varied substantially between cohorts with the youngest ages being observed in KANC, RHEA and BiB (median age 6.4, 6.5 and 6.6 years, respectively), followed by MoBa and INMA (8.4 and 8.8 years, respectively), and the oldest ages in EDEN (11.0 years) (table 7). On average, 45.4% of the subcohort participants were girls, ranging from 42.9% in EDEN to 47.8% in MoBa. Most of the subcohort children were of white European origin, although the subcohort within BiB comprised 43% white British families and 44.9% families of South Asian origin, with 12.1% of other ethnicities. The family’s economic situation, as measured by the family affluence scale, showed marked differences between the cohorts, with the majority of families in EDEN (78%), MoBa (72%) and INMA (54%) scoring high affluence, while lower affluence scores were observed in BiB, KANC and RHEA with 29.3%, 32.7% and 33.7% in the highest affluence category in those cohorts, respectively. The percentage of children classified with low family affluence was highest in BiB with 27.8%, while only around 1% of children in EDEN and MoBa were from low family affluence.

Table 7

HELIX subcohort characteristics

Maternal smoking during pregnancy was most prevalent in INMA (24.7%), EDEN (23.7%) and RHEA (21.2%) (table 7). Mothers’ replies to questions on environmental tobacco smoke exposure of the child showed that 34% of the children were exposed to environmental tobacco smoke in at least one place (outdoor or indoor), ranging from 19% in MoBa to 69% in RHEA. Consumption of fruits was highest in BiB and MoBa, and of vegetables in EDEN and MoBa. Visits to fast food restaurants/takeaways were most frequent in BiB. Physical activity levels were constructed based on a self-reported questionnaire where we asked about the frequency, intensity and duration of performing physical activities at school, out of school, during weekends and during summer. Over-reporting and abnormal data were corrected based on predictive models built from the panel population accelerometer (Actigraph) data. Estimates in minutes per day of moderate to vigorous activity, that is, activities with intensity above three metabolic equivalents, were low in EDEN (17 min) and INMA (26 min) and high in KANC (42 min), BiB (42 min) and RHEA (49 min).

Food allergy questionnaires showed that overall 21% of children were reported to have at least one food allergy (ever experienced), ranging from 15.6% in RHEA to 35% in INMA (table 7, figure 2). The percentage of children who had ever had asthma was low in INMA (3.6%) and high in BiB (18.5%) and EDEN (20.2%). Overall, 18.8% of children were overweight and 9.9% were obese (total 27.7%). The percentage of overweight and obese children (using the age-and-sex-standardised z-scores) was highest in RHEA (37.2%) and INMA (42.3%) and lowest in MoBa (15.8%) (table 7, figure 2).

Figure 2

Prevalence of children with food allergy, asthma, overweight/obesity and ADHD symptoms in the HELIX subcohort at 6–11 years. BiB, Born in Bradford; EDEN, Étude des Déterminants pré et postnatals du développement et de la santé de l’Enfant; HELIX, Human Early Life Exposome; INMA, INfancia y Medio Ambiente; KANC, Kaunas cohort; MoBa, Norwegian Mother and Child Cohort Study; zBMI, age-standardised z-score for body mass index.

ADHD symptoms assessed through the Conners’ rating scale were classified using the cut-off score of the 80th percentile.42 Using this classification, 10.1% of children in the subcohort were classified as having ADHD symptoms, ranging from 4.4% in MoBa to 15.2% in KANC (table 7, figure 2). The total problems score of the CBCL, which consists of the sum of ratings on all 120 behavioural and emotional items of the CBCL, also showed that mothers in MoBa reported the lowest total score (median score 9) and mothers in KANC the highest (median score 27).

Baseline characteristics of the child panel study

Participants in the child panel study (n=157, 28 from BiB, 28 from EDEN, 42 from INMA, 29 from KANC and 30 from RHEA) were similar to the HELIX non-panel subcohort children of the same cohorts in terms of sociodemographic characteristics (table 6). The mothers of children in the panel study were similar in age, weight status and education to the mothers of children not included in the panel. Birth weights and gestational ages were also similar between panel and non-panel children.

Through the child panel study, we showed that the pooled urine sample (before bedtime and first morning void) provided more coverage of the stable metabolome than would be achieved with either morning or bedtime urine sample alone.43 Through the repeated analysis of non-persistent exposures, we provided variability indicators for each chemical that can be used to correct dose-response relations and optimise sampling designs in future biomonitoring and exposome studies, and thus limit exposure misclassification (Casas, manuscript under revision).
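The text does not name the variability indicators used. One common choice for repeated biomarker measurements is the intraclass correlation coefficient (ICC), the share of total variance that lies between children rather than within a child. The sketch below computes a one-way random-effects ICC for a balanced design; the choice of indicator and the function name are assumptions made for illustration, not the method reported by the study.

```python
from statistics import mean


def icc_oneway(measurements):
    """One-way random-effects ICC for a balanced design.

    `measurements` maps each child to a list of k repeated biomarker values
    (often log-transformed in practice); every child needs the same k >= 2.
    """
    groups = list(measurements.values())
    n = len(groups)
    k = len(groups[0])
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("balanced design with at least two repeats required")
    grand = mean(x for g in groups for x in g)
    # Between-child and within-child mean squares from a one-way ANOVA.
    msb = k * sum((mean(g) - grand) ** 2 for g in groups) / (n - 1)
    msw = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

An ICC close to 1 suggests that a single sample characterises a child well, whereas a low ICC suggests that several samples per child, or pooled samples, would be needed to limit misclassification.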

Strengths and limitations

The HELIX project has constructed a unique large exposome cohort, which included the prospective collection of objective data from different sources (biomonitoring data, geospatial data, sensor data, child health outcomes and omics signatures). These data can facilitate cross-validation of repeated information across different sources (eg, tobacco exposure estimated from questionnaire and cotinine biomarker) and the use of standardised tools and objective measures can allow international comparisons with other studies. The pluridisciplinary aspect of the HELIX study means that a wide range of environmental factors were measured, including detailed information on socioeconomic factors, which will help to unravel the influences of pregnancy risk factors, the chemical and physical environment, early family life and school-age exposures on child development. Weaknesses include the loss to follow-up over time, a typical issue in most prospective longitudinal studies, and lack of statistical power to study rare outcomes. Our sample size does not allow the investigation of rare diseases or extreme values for continuous traits unless data are pooled with those of other cohorts. In addition, those living outside urban areas were not included in the study due to the lack of outdoor environment data.

Ethics and data protection

Prior to the start of HELIX, all six cohorts on which HELIX is based had been in existence for some years, had undergone the required evaluation by national ethics committees and had obtained all the required permissions for their cohort recruitment and follow-up visits. Each cohort also confirmed that relevant informed consent and approval were in place for secondary use of pre-existing data. The work in HELIX was covered by new ethics approvals in each country, and at enrolment in the HELIX subcohort and panel studies participants were asked to sign an informed consent form for the specific HELIX work including clinical examination and biospecimen collection and analysis. An Ethics Task Force was established to support the HELIX project on ethical issues, to advise on the project’s ethical compliance and to identify and flag changes in legislation where applicable.

Specific procedures are in place within HELIX to safeguard the privacy of study subjects and confidentiality of data. First, any reported study results pertain to analyses of aggregate data; no variables or combination of variables that can identify an individual will be associated with any published or unpublished report of this study. Primary databases with personal information (such as geocodes, dates, questionnaires or health outcomes) have been stored on separate computers with personal identifiers removed. Subjects are identified by a unique study number, linking all basic data required for the study. The master key file linking the study numbers with personal identifiers is maintained in each cohort. For the dataset analysis, all information that enables identification of an individual (dates, geocodes, etc) is removed before distribution of datasets to the researchers. All data exchanges will adhere to the most up-to-date EU and national data protection regulations.
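The separation between the master key file and the distributed analysis dataset can be sketched as follows. The field names and the study-number format are hypothetical, chosen only to make the example concrete; the sketch shows the principle rather than the procedure implemented in each cohort.

```python
def pseudonymise(records, identifier_fields, start=1):
    """Split raw records into a master key file and an analysis dataset.

    `records` is a list of dicts; `identifier_fields` names the columns that
    could identify a person (hypothetical field names in the usage below).
    """
    key_file, analysis = [], []
    for offset, record in enumerate(records):
        study_id = f"H{start + offset:05d}"  # hypothetical study-number format
        # Master key: study number plus identifiers, kept by the cohort only.
        key_file.append(
            {"study_id": study_id, **{f: record[f] for f in identifier_fields}}
        )
        # Analysis row: study number plus everything that is not an identifier.
        analysis.append(
            {
                "study_id": study_id,
                **{f: v for f, v in record.items() if f not in identifier_fields},
            }
        )
    return key_file, analysis


raw = [{"name": "A", "geocode": "x1", "birth_weight_g": 3400}]
key_file, analysis = pseudonymise(raw, ["name", "geocode"])
# `analysis` now holds only the study number and the non-identifying variable.
```

Only the analysis rows would be distributed to researchers; re-identification requires the key file, which stays with the cohort.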

Data warehouse

Relevant datasets from all 31 472 mother-child pairs were transferred from the six cohorts to the central HELIX data warehouse located at ISGlobal. The HELIX data warehouse consists of several schemas, which are linked by means of common identifiers in a relational database created in MySQL.44 New data, collected through the common protocols during the subcohort and panel study fieldwork, were entered directly into an electronic database and then uploaded into the data warehouse. Questionnaires were computer-based with a direct entry to the database. All data were locally and centrally checked by examination of the ranges, distributions, means, SD, outliers and logical checks. Data outliers and missing values were checked with the local cohort field workers and, where possible and relevant, replaced by correct values. All new measurements of exposure biomarkers and omics from the labs, and all exposure variables estimated through geospatial models and other methods, were added to the data warehouse as they became available.
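The kind of range and outlier checks described above can be sketched in a few lines. The variable name, the plausible range and the z-score limit are illustrative assumptions, not the thresholds applied in the HELIX data warehouse.

```python
from statistics import mean, stdev


def flag_values(rows, variable, low, high, z_limit=3.0):
    """Return (study_id, reason) pairs for values that need review.

    `rows` is a list of dicts with a "study_id" key; `low` and `high` bound
    the plausible range; `z_limit` is an assumed outlier threshold.
    """
    values = [r[variable] for r in rows if r[variable] is not None]
    centre, spread = mean(values), stdev(values)
    flags = []
    for r in rows:
        v = r[variable]
        if v is None:
            flags.append((r["study_id"], "missing"))
        elif not low <= v <= high:
            flags.append((r["study_id"], "out of range"))
        elif spread > 0 and abs(v - centre) / spread > z_limit:
            flags.append((r["study_id"], "outlier"))
    return flags
```

Flagged records would then be returned to the local field workers for verification, in line with the procedure described above, rather than being corrected automatically.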


The authors would like to thank all the participating children, parents, practitioners and researchers in the six countries who took part in this study. The authors would like to thank Sonia Brishoual, Angelique Serre and Michele Grosdenier (Poitiers Biobank, CRB BB-0033-00068, Poitiers, France) for biological sample management and Professor Frederic Millot (Principal Investigator), Elodie Migault, Manuela Boue and Sandy Bertin (Clinical Investigation Center, Inserm CIC1402, CHU de Poitiers, Poitiers, France) for planning and investigational actions. The authors would like to thank Veronique Ferrand-Rigalleau, Céline Leger and Noella Gorry (CHU de Poitiers, Poitiers, France) for administrative assistance (EDEN). The authors would like to thank Silvia Fochs, Nuria Pey, Cecilia Persavente and Susana Gross for field work, sample management and overall management in INMA. The authors would like to thank Georgia Chalkiadaki and Danai Feida for biological sample management, Eirini Michalaki, Mariza Kampouri, Anny Kyriklaki and Minas Iakovidis for field study performance and Maria Fasoulaki for administrative assistance (RHEA). The authors would also like to thank Ingvild Essen for thorough field work, Heidi Marie Nordheim for biological sample management and the MoBa administrative unit (MoBa).




  • Contributors LM coordinated the collection and harmonisation of the data as the HELIX project scientific coordinator (2016–2018) and drafted the first draft of the manuscript. JdB performed the panel study fieldwork, data harmonisation and data description and assisted in drafting the manuscript. MCasas coordinated field work in the INMA Sabadell cohort, designed the biomarker database, coordinated sample collection and assisted in drafting the manuscript. OR prepared fieldwork protocols and questionnaires and supervised the fieldwork across all cohorts as the HELIX project coordinator (2013–2016). The following authors contributed to the collection of data on chemical contaminants: CT led the workpackage and oversaw all aspects of the work on the biomarker measurements of chemical contaminants; LSH performed the biological sample management and biomarker analysis; CB conducted the pharmacokinetics models data collection and protocols preparation. The following authors contributed to the collection of data on outdoor exposures: MJN led the workpackage and oversaw all aspects of the work on outdoor exposures; MdC conducted the outdoor exposure calculations; DDG conducted the exposure monitoring of panel children and physical activity; IT conducted the outdoor exposome data harmonisation and modelled indoor air pollution and water contamination exposures. 
The following authors contributed to the omics data collection, analysis and interpretation: MCoen led the omics workpackage, designed the study and oversaw the metabolomics data collection; HCK designed the study and oversaw the metabolomics data collection; CEL performed the NMR metabolite quantification; APS performed the MS metabolite quantification; EB conducted the proteomics analysis; ES designed the proteomics study and oversaw the proteomics data collection; MB designed the study and conducted the analysis for the DNA methylation and transcriptomics (gene expression and miRNAs); MVives conducted the gene expression and miRNA analysis; AC facilitated the analysis and oversaw research for DNA methylation data; XE coordinated the analysis and oversaw research for transcriptomics (gene expression and miRNAs) data collection; JRG designed the omics and exposome bioinformatics and statistical analyses; CHF programmed the R package and contributed to the design of the omics and exposome bioinformatics and statistical analyses. The following authors contributed to data analysis and interpretation: RS led the workpackage and oversaw the preparation of statistical analysis protocols; XB led the statistical analysis working group and prepared statistical analysis protocols; LC led a workpackage, prepared clinical examination protocols and contributed to the clinical data harmonisation and interpretation; SF prepared clinical examination protocols and contributed to the clinical data harmonisation and interpretation; JJ prepared the neurodevelopment protocols and coordinated the neurodevelopment data preparation and interpretation. VS and BG led the allergy and respiratory health data collection, harmonisation and interpretation. VS also assisted in the preparation of statistical analysis protocols. LA conducted the spirometry data harmonisation and contributed to the statistical protocol preparation. 
CW checked pooled data for accuracy of information and revised the manuscript critically. The following authors contributed to the cohort data collection. MoBa cohort: HMM designed the study and oversaw all aspects of subcohort and panel study data collection. KBG coordinated the subcohort fieldwork; BG coordinated the pregnancy panel fieldwork; GMA constructed and harmonised the MoBa existing database; JE was responsible for the neurological testing in the subcohort; NHK collected GIS input data and prepared routine monitoring data. KANC cohort: RG (PI of the KANC cohort) designed the study and oversaw all aspects of KANC data collection. SA coordinated the fieldwork for subcohort and panel study and checked pooled data for accuracy of information; AD conducted fieldwork and GIS work; IP revised KANC data and revised the manuscript critically. INMA cohort: MVrijheid designed the study and oversaw all aspects of INMA subcohort and panel study data collection. FB (PI of the INMA-Valencia cohort) oversaw data collection in Valencia; JI (PI of the INMA-Gipuzkoa cohort) oversaw data collection in Gipuzkoa; JS (PI of the INMA-Sabadell cohort and of the entire INMA study) oversaw all previous INMA data collections; CM coordinated the Barcelona pregnant woman panel fieldwork and data preparation. EDEN cohort: RS designed the study and oversaw all aspects of EDEN subcohort and panel study data collection and critically reviewed the manuscript. BH (PI of the EDEN cohort) oversaw previous follow-ups of EDEN population; SLC coordinated the pregnant women panel fieldwork; JQ co-coordinated the children subcohort fieldwork and database integration; PJS was responsible for the subcohort fieldwork in Poitiers; LGA co-coordinated the children panel follow-up, checked pooled data for accuracy of information, conducted the ESCAPE data harmonisation and prepared GIS data.
RHEA cohort: LC (PI of the RHEA cohort) designed the study and oversaw all aspects of RHEA subcohort and panel study data collection. JMK carried out the field work and helped design the clinical examination protocols; VL coordinated and carried out the fieldwork; TR checked pooled data for accuracy of information, prepared GIS data and conducted the clinical data harmonisation; MVafeiadi coordinated fieldwork and sample management. BiB cohort: JW designed and oversaw all aspects of BiB subcohort and panel study data collection; DM constructed the database; RMc designed and oversaw all aspects of BiB subcohort and panel study data collection; DW coordinated the fieldwork. PvdH was responsible for dissemination aspects of the HELIX project. JU constructed and managed the HELIX database and performed data harmonisation, cleaning and validation. DvG is the HELIX project coordinator; she drafted the ethical and data protection and sharing proposal. Finally, the following authors designed the HELIX study and supervised all aspects of the work as members of the HELIX Project Executive Committee: LC, MCoen, PvdH, MJN, RS, CT and JW. MVrijheid coordinated the HELIX project, supervised all data collection, supervised all work related to the manuscript and drafted the manuscript. All authors read and approved the final manuscript. ISGlobal is a member of the Agency for the Research Centres of Catalonia (CERCA) Programme, Generalitat de Catalunya. MC is a member of the MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, UK.

  • Funding The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement no 308333 (the HELIX project). Dr Maribel Casas and Dr Jordi Julvez received funding from Instituto de Salud Carlos III (Ministry of Economy and Competitiveness) (MS16/00128, MS14/00108). INMA data collections were supported by grants from the Instituto de Salud Carlos III, CIBERESP, the Conselleria de Sanitat, Generalitat Valenciana, Department of Health of the Basque Government; the Provincial Government of Gipuzkoa, and the Generalitat de Catalunya-CIRIT. KANC was funded by the grant of the Lithuanian Agency for Science Innovation and Technology (6-04-2014_31V-66). The Norwegian Mother and Child Cohort Study (MoBa) is supported by the Norwegian Ministry of Health and the Ministry of Education and Research, NIH/NIEHS (contract no. N01-ES-75558), and NIH/NINDS (grant no. 1 UO1 NS 047537-01 and grant no. 2 UO1 NS 047537-06A1). The Rhea project was financially supported by European projects, and the Greek Ministry of Health (Program of Prevention of Obesity and Neurodevelopmental Disorders in Preschool Children, in Heraklion district, Crete, Greece: 2011–2014; ‘Rhea Plus’: Primary Prevention Program of Environmental Risk Factors for Reproductive Health, and Child Health: 2012–2015). The work was also supported by MICINN (MTM2015-68140-R) and Centro Nacional de Genotipado-CEGEN-PRB2-ISCIII. CW received funding from the Fondation de France.

  • Competing interests None declared.

  • Patient consent Parental/guardian consent obtained.

  • Ethics approval Comité Ético de investigación Clínica Parc de Salut MAR.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement The data warehouse has been established in a format that allows future use beyond the project lifespan (2013–2017) as an accessible resource for collaborative research involving researchers external to the project. Access to HELIX data is based on approval by the HELIX Project Executive Committee and by the individual cohorts, who will evaluate potential overlap with ongoing work, adequacy of data protection plans, logistic and financial consequences and adequacy of authorship and acknowledgement plans. Further details on the content of the data warehouse (data catalogue) and procedures for external access are described on the project website. The authors encourage interested researchers to contact them to set up collaborations.