Gut microbiome, enteric infections and child growth across a rural–urban gradient: protocol for the ECoMiD prospective cohort study
  1. Gwenyth O Lee1,
  2. Joseph N S Eisenberg1,
  3. Jessica Uruchima1,
  4. Gabriela Vasco2,3,
  5. Shanon M Smith4,
  6. Amanda Van Engen1,
  7. Courtney Victor4,
  8. Elise Reynolds1,
  9. Rebecca MacKay4,
  10. Kelsey J Jesser5,
  11. Nancy Castro6,
  12. Manuel Calvopiña7,
  13. Konstantinos T Konstantinidis8,
  14. William Cevallos9,
  15. Gabriel Trueba2,
  16. Karen Levy5
  1. 1Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, Michigan, USA
  2. 2Instituto de Microbiología, Universidad San Francisco de Quito, Quito, Pichincha, Ecuador
  3. 3Facultad de Ciencias Médicas, Universidad Central del Ecuador, Quito, Ecuador
  4. 4Department of Environmental Health, Emory University Rollins School of Public Health, Atlanta, Georgia, USA
  5. 5Department of Environmental and Occupational Health Sciences, University of Washington School of Public Health, Seattle, Washington, USA
  6. 6Carrera de Nutrición y Dietética, Universidad San Francisco de Quito, Quito, Pichincha, Ecuador
  7. 7Carrera de Medicina, Universidad de Las Americas Facultad de Ciencias de la Salud, Quito, Pichincha, Ecuador
  8. 8School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
  9. 9Instituto de Biomedicina, Facultad de Ciencias Médicas, Universidad Central del Ecuador, Quito, Ecuador
  1. Correspondence to Dr Gwenyth O Lee; golee{at}


Introduction The functional consequences of the bacterial gut microbiome for child health are not well understood. Characteristics of the early child gut microbiome may influence the course of enteric infections, and enteric infections may change the composition of the gut microbiome, all of which may have long-term implications for child growth and development.

Methods and analysis We are conducting a community-based birth cohort study to examine interactions between gut microbiome conditions and enteric infections, and how environmental conditions affect the development of the gut microbiome. We will follow 360 newborns from 3 sites along a rural–urban gradient in northern coastal Ecuador, characterising enteric infections and gut microbial communities in the children every 3 to 6 months over their first 2 years of life. We will use longitudinal regression models to assess the correlation between environmental conditions and gut microbiome diversity and presence of specific taxa, controlling for factors that are known to be associated with the gut microbiome, such as diet. From 6 to 12 months of age, we will collect weekly stool samples to compare microbiome conditions in diarrhoea stools versus stools from healthy children prior to, during and after acute enteric infections, using principal-coordinate analysis and other multivariate statistical methods.

Ethics and dissemination Ethics approvals have been obtained from Emory University and the Universidad San Francisco de Quito institutional review boards. The findings will be disseminated through conference presentations and peer-reviewed journals.

  • molecular biology
  • tropical medicine
  • community child health
  • public health
  • epidemiology

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Strengths and limitations of this study

  • A birth cohort design allows us to examine prospectively how environmental factors impact the gut microbiome over the critical first 2 years of life.

  • A longitudinal birth cohort design allows us to examine how early microbiome conditions shape responses to a range of enteric infections.

  • A rural–urban gradient allows us to assess how social, environmental and dietary factors impact the relationship between enteric pathogen transmission and the gut microbiome in a population that otherwise is culturally and genetically similar.

  • Differences in microbiome diversity may be explained by non-environmental factors that we fail to fully characterise and adjust for (eg, genetics).

  • We may be underpowered to examine less common pathogens.


Introduction

Enteric pathogens are a significant cause of morbidity and mortality among young children in low-income and middle-income countries (LMICs). However, many infections are either asymptomatic or result in only mild-to-moderate disease.1 2 In other instances, infection with the same pathogen results in severe diarrhoea, requiring hospitalisation and sometimes leading to death. Cumulative enteric infections early in life, even when asymptomatic, are associated with chronic conditions such as environmental enteric dysfunction (EED),3 growth faltering4 5 and cognitive deficits.6 As a result, understanding the factors that underlie relative susceptibility or resilience has become a critical question in the field of enteric disease research.

In a healthy microbiome, resident microorganisms may reduce the risk of pathogen invasion, while in an unhealthy microbiome, potential pathogens may outcompete commensal microbes.7 Additionally, the gut microbiome can be adversely impacted by the cycle of infection and malnutrition often experienced by children whose growth is faltering.8 9 Children with acute diarrhoea have reduced taxonomic microbiome diversity10 11 and exhibit predictable patterns of structural and functional gut microbial succession as the microbiome recovers.12 Severely malnourished children also have a ‘less mature’ microbiome compared with healthy infants,13 and mice implanted with microbiota from undernourished child donors have less weight gain than mice transplanted with microbiota from healthy child donors.14 15 A better understanding of how the gut microbiome both protects against and responds to enteric infections may help to mitigate the negative health consequences of these infections.16

Across geographic locations, major differences exist in the taxonomic diversity and composition of the gut microbiome.17–22 Consistently, higher faecal bacterial alpha diversity (within individuals) and lower beta diversity (between individuals) have been reported from remote traditional societies, compared with more ‘westernised’ societies.17–22 These patterns in bacterial community diversity may be driven by the maternal diet and microbiome,23 infant dietary habits and nutrition,24 25 exposure to animals26 and chemicals27 or other lifestyle factors.28–30 Prenatal maternal weight, diet and antibiotic usage have all been shown to impact the infant gut microbiota.31 Drinking water quality is also likely a contributor to the composition and development of the gut microbiome32 as well as a source of exposure to enteric pathogens.33 While these patterns are intriguing, the causal linkages between these population-level determinants, gut microbiome conditions and, ultimately, child health are not well understood. Information about the environmental conditions that shape the early child gut microbiome may help guide the development of appropriate community and household-level interventions that will lead to optimal microbiome development.

Here, we present the protocol for the ‘gut microbiome, enteric infections and child growth across a rural–urban gradient’ or ‘Enteropatógenos, Crecimiento, Microbioma, y Diarrea’ (ECoMiD) study. The primary hypothesis of this study is that the gut microbiome mediates the effect of enteric infections on diarrhoea, EED and growth in the first 2 years of life. Our specific study objectives are to examine: (1) how environmental conditions affect the developing child gut microbiome and enteric pathogen burden, (2) whether the gut microbiome modifies the short-term and long-term health outcomes associated with enteric pathogen infections (ie, diarrhoea, EED and child growth) and (3) how the gut microbiome responds to and recovers from enteric pathogen infection (figure 1).

Figure 1

Overview diagram of the relationships the study will explore. The numbers on the diagram refer to the specific aims that will address the relationship. Solid lines indicate the main effects that will be tested in models, and dashed lines indicate effect modification. EED, environmental enteric dysfunction.

Methods and analysis

Longitudinal data are needed to characterise conditions prior to infection and to compare those data to the commensal gut microbiome response following infection, enabling us to better disaggregate cause and effect for interactions between infection and the microbiome. Similarly, longitudinal data are useful in identifying whether alterations to the microbiome precede, accompany or follow from EED and growth faltering. We have, therefore, developed a community-based birth cohort study of 360 mother–child dyads.

Study sites

Our study sites span a rural–urban gradient in Ecuador and include the ‘urban’ site of the city of Esmeraldas (population ~150 000),34 the ‘intermediate’ town of Borbón (population ~4500) and several ‘rural’ communities located between 1 and 3 hours travel from Borbón (community populations ~400 to 920) (figure 2). This rural–urban gradient study design provides a population in a high enteric pathogen transmission setting that has variability in microbiome diversity, yet relative similarity in other cultural, social and genetic factors (eg, child feeding practices and race) that might determine host response to infections, thereby allowing us to isolate the effects of the gut microbiome on the health impacts of enteric infections. Similar types of enteric infections are present across the gradient; however, the relative and overall prevalence of these pathogens varies. For example, pathogenic Escherichia coli, rotavirus and Giardia are more common in less remote areas.35 We have also observed differences in microbiome composition by community membership as well as by diarrhoea case–control status.36 37 Previous studies have placed the prevalence of stunting in our study region at around 15%38–40 and the 2-week prevalence of diarrhoea in children under 5 at around 10%.41

Figure 2

Study sites. The urban–rural gradient of the ECoMiD study: the city of Esmeraldas, the town of Borbón and smaller rural communities in the Canton of Eloy Alfaro, including communities with and without road access. ECoMiD, Enteropatógenos, Crecimiento, Microbioma, y Diarrea.

Study design

Women are recruited and enrolled in late pregnancy, from 37 weeks onward, where gestational age is based on the mother’s report. The study began in May 2019, and we expect to complete data collection in early 2024.

Once their children are born, we characterise enteric infections and gut microbial communities in the children at multiple time points over their first 2 years of life. We also comprehensively evaluate environmental conditions that are associated with enteric pathogen exposure and assess other factors that are known to be associated with the infant gut microbiome, including the maternal microbiome in late pregnancy, caesarean versus vaginal delivery,42 infant diet24 25 and infant nutritional status.43 Prior to the cohort, we comprehensively piloted study instruments and sample collection procedures among 75 women with 0 to 2-year-old children at the urban site (city of Esmeraldas).

Eligibility criteria

Our enrolment strategy is intended to capture healthy pregnancies and minimise the impact of potential confounders related to the development of the gut microbiota. Therefore, study exclusion criteria (table 1) include planned caesarean section and high-risk pregnancy according to Ministry of Health guidelines, because high-risk pregnancy is a risk factor for unplanned caesarean section. Women who plan to leave their current community within 6 months or who are under 18 years of age are also not eligible. If an enrolled participant experiences an unplanned caesarean section, or any pregnancy complication, they continue to participate in the study. We will perform subsequent analyses to determine how data from these participants should be used.

Table 1

Study inclusion and exclusion criteria


We have several recruitment strategies, including recruitment of women attending prenatal services through the local health centre, snowball sampling (participating mothers may nominate their acquaintances) and radio announcements that are broadcast throughout the study area. After potential participants are identified, field staff contact the pregnant woman to confirm criteria before inviting her to participate in the study.

The consent process occurs in-person with the mother, who consents on behalf of herself and her child. This procedure takes place in a private location within the local health centre or in the mother’s home. Study team members, who are community members trained in the ethical conduct of research, are responsible for obtaining consent. To ensure comprehension, the field staff verbally summarise the study using the consent form as a guide and provide a one-page pictorial description of study activities. The staff makes clear that study participation is voluntary, unrelated to access to prenatal or postnatal care and that the mother and her child may withdraw from the study at any time. Written informed consent is sought, unless the mother is illiterate. In this case, the consent form is read to her in the presence of an impartial witness, and a digital impression (thumbprint) is obtained in place of a signature. A copy of the consent form is provided to the mother, and the signed copy is maintained in a secure, locked cabinet in the study offices. Throughout the study, the field team periodically reminds mothers of planned study procedures. Mothers are also offered the opportunity to decline to participate in individual activities.

Benefits of study participation include child growth monitoring, the timely identification of parasitic infections and anaemia, for which appropriate treatment is coordinated through the Ministry of Health, and information about household water quality from environmental testing. We also provide mothers with general health advice. Study incentives include items such as soap, baby oil and small toys, worth approximately US$3–5, distributed every 3–6 months.


If an enrolled mother–child dyad decides to withdraw or travels permanently outside the study area, the study team documents their exit from the study. Mothers who travel outside the study area and then return may resume study participation, and mothers who travel between study sites remain enrolled. If the mother makes an active decision to withdraw from the study, a supervisor communicates with her to ensure that her reasons for withdrawal are clear to the team.

Data collection

Following enrolment, questionnaires are administered to collect information about prenatal maternal diet, environmental contamination and demographic and socioeconomic factors such as maternal age, parity and household size. A stool sample is collected for determination of the maternal microbiome and a serum sample for determination of maternal anaemia and inflammation. Within 7 days of birth, delivery mode (vaginal vs caesarean), gestational age, birth weight and delivery location are recorded from the child’s vaccine card; breastfeeding initiation is captured by maternal report; a child stool sample is collected; and child anthropometry is measured. Further stool and serum samples, nutrition data and environmental assessments are then collected at the household visits shown in figure 3 and described in greater detail below. Depending on the analysis in question, these variables may be included in models as exposures of interest, outcomes of interest or covariates.

Figure 3

Overall participant timeline. Primary study activities are programmed according to the age of the child. Squares indicate a single sampling point, except for intensive samples, which are collected weekly from 6 to 12 months of age (a total of 24 samples), and diarrhoeal symptoms, which are collected weekly for the entire period of the study. EED, environmental enteric dysfunction; SES, socioeconomic status.

Environmental exposures

To understand how environmental conditions affect the developing child gut microbiome and enteric pathogen burden, we comprehensively evaluate environmental conditions using multiple approaches, across different scales (figure 4). At the broadest scale, households are selected across a rural–urban gradient to capture variation associated with urbanisation. We characterise community sanitation coverage using data from prior and ongoing studies in the region.35 44 At the household level, we use a combination of surveys, environmental sampling, structured observations and in-depth interviews. These activities are conducted every 6 months (figure 3). Surveys include questions about household water sources, animal ownership and child-related hygiene, and observations about water storage, sanitation and washing and cooking areas. Structured observations are conducted on a subset of children to characterise opportunities for contact between the child and their environment. Environmental sampling consists of rinses of the mother’s and children’s hands in 100 mL of sterile water and a 100 mL sample of the mother’s (at the prenatal visit) or children’s drinking water. We assay samples for faecal indicator bacteria using both PetriFilms (3M, St. Paul, Minnesota) and the Colilert presence-absence test (IDEXX laboratories, Westbrook, Maine). This approach was chosen to optimise the information obtained while remaining feasible in a remote field setting, limiting sample processing time and using study workers without prior formal environmental microbiology training. The tests are described in table 2. Because of the high variability in water quality results over time,33 samples are repeated 3 times from the same household within 1 week.

Table 2

Quantitative and categorical results from water and hand-rinse sampling

Figure 4

Environmental context of the ECoMiD study. Study data are organised according to a socio-ecological framework. ECoMiD, Enteropatógenos, Crecimiento, Microbioma, y Diarrea.

Stool collection

Maternal stool is collected once, at enrolment. Child stool samples are collected at 1 week, and at 3, 6, 12 and 18 months of age (figure 3). Mothers are given a small container to store stool and asked to store this in a cooler provided by the study until the study team member arrives to collect the sample. In addition, from 6 to 12 months of age, we collect weekly (‘intensive’) stool samples. After samples are retrieved by the study team member, multiple aliquots of stool are immediately stored at −196°C in portable liquid nitrogen tanks, which are transported to the Universidad San Francisco de Quito monthly for long-term storage. We collect additional aliquots in Zymo DNA/RNA Shield (Zymo Research, Irving, California). The latter is intended as a backup in case of cold chain failure. The quality of DNA extracted from both frozen samples and samples preserved in Zymo DNA/RNA Shield was previously confirmed during piloting of the laboratory methods (data not shown).

After all intensive stool samples have been collected, we will conduct a subanalysis using the banked samples. We will select stool samples from 200 children who experienced an episode of diarrhoea at the time of sample collection, where an episode is defined as an onset of diarrhoea preceded by 7 days of no reported diarrhoea. These samples will be randomly selected within geographic strata to ensure equal representation of children across the rural–urban gradient of the study. For each, a matched stool sample will be selected from a child without diarrhoea who has no reported symptoms at the time of stool collection and for at least 7 days prior, matched by age (within ±1 month of the symptomatic child) and by infection with the same pathogen. We will test these samples for enteric infections using a TaqMan array card (TAC; ThermoFisher). Pathogens will be linked to the diarrhoeal disease episode based on relative cycle threshold values from the TaqMan results.45 Children may contribute both diarrhoeal and asymptomatic stool samples to the study at different ages. This repeated sample design will be accounted for in the analysis.
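
The episode definition above can be illustrated with a minimal sketch. It assumes a hypothetical per-child list of daily diarrhoea indicators; the study collects symptoms through weekly surveys, so the daily resolution, the function name `episode_onsets` and the parameter `washout_days` are all illustrative assumptions rather than part of the study's software.

```python
from typing import List


def episode_onsets(daily_diarrhoea: List[bool], washout_days: int = 7) -> List[int]:
    """Return the day indices that start a new diarrhoea episode.

    A day is an onset when diarrhoea is reported on that day and none was
    reported on any of the preceding `washout_days` days. Days without a
    full washout window of history are skipped, because their preceding
    status is unknown.
    """
    onsets = []
    for day, has_diarrhoea in enumerate(daily_diarrhoea):
        if not has_diarrhoea or day < washout_days:
            continue
        window = daily_diarrhoea[day - washout_days:day]
        if not any(window):
            onsets.append(day)
    return onsets


# Hypothetical record: 8 clear days, a 2-day illness, 3 clear days, a
# relapse (too soon to count as a new episode), 7 clear days, a new illness.
record = [False] * 8 + [True, True] + [False] * 3 + [True] + [False] * 7 + [True]
print(episode_onsets(record))  # [8, 21]
```

Skipping days without a complete washout window is a conservative choice made for this sketch; a real analysis would need an explicit rule for the start of follow-up.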

For each selected symptomatic and paired asymptomatic stool sample, we will also analyse 3 further stool samples by 16S ribosomal RNA (rRNA) gene amplicon sequencing and test them for enteric infections using the TAC, as described below: one collected the week before the diarrhoeal episode, one during the week of the episode and one 2 or more weeks following the episode.

Beyond analysing the stool to match symptomatic and asymptomatic samples with respect to infection status, downstream analyses planned for the stool samples include:

(1) Bacterial microbiome assessment: after DNA is extracted from frozen or preserved stool samples, the 16S rRNA gene will be polymerase chain reaction (PCR) amplified and sequenced. In a subset of samples, we will also carry out shotgun metagenome sequencing.

(2) Pathogen assessment: extracted DNA and RNA will be analysed for multiple viral, bacterial and parasitic enteric pathogens using the TAC, a card-based array of singleplex assays that amplifies nucleic acid from multiple microbial targets in parallel.45–47 The specific pathogens we will test for are enteroaggregative E. coli, diarrheagenic E. coli, Shiga toxin-producing E. coli, enteropathogenic E. coli, enterotoxigenic E. coli, Shigella, enteroinvasive E. coli, Campylobacter jejuni, Campylobacter coli, Salmonella typhi, Adenovirus, Astrovirus, Enterovirus, Norovirus GI, Norovirus GII, Rotavirus, Sapovirus, SARS-CoV-2, Cryptosporidium hominis, Cryptosporidium parvum, Cyclospora cayetanensis, Giardia duodenalis, Entamoeba histolytica, Ascaris lumbricoides and Trichuris trichiura. TAC assays will be tested for linearity and matrix inhibition using positive control plasmids48 as well as limits of detection, repeatability, reproducibility and analytical accuracy using reference strains or genomic DNA/RNA.45 Phocine herpesvirus and MS2 bacteriophage will be used as extrinsic controls.45

(3) EED assessment: EED will be assessed by analysing the stool for 4 faecal biomarkers of intestinal inflammation and permeability (myeloperoxidase, alpha-1-antitrypsin, neopterin and calprotectin) using commercially available enzyme-linked immunosorbent assay (ELISA) kits.49 50

Serum collection

We collect venous blood samples from mothers at one time point (prenatal) and from children at 2 time points (12 and 24 months). Anaemia is assessed in real time with a haemoglobin test using the HemoCue Hb 201+ System. Inflammation is measured by C-reactive protein (CRP) and α-1-acid glycoprotein (AGP), both using a sandwich ELISA. We will define inflammation using thresholds of CRP >5 mg/L and AGP >1 g/L.51–53

Diarrhoea incidence

Variables associated with the child’s diarrhoea are collected through weekly surveys, administered either through household visits or telephone calls. These include (1) caregiver-reported symptomatic diarrhoeal disease (defined as 3 or more loose stools in a 24-hour period54), (2) caregiver’s perception of diarrhoea using locally relevant terms,55 (3) features of diarrhoeal severity,56 57 such as caregiver-reported dysentery, vomiting and fever and (4) antibiotic usage.


Anthropometry

Field staff measure weight, length and head circumference within 1 week of birth, and quarterly thereafter, using infant scales and recumbent length measuring boards. Measurements are made within 2 weeks (14 days) of the target day, or up to 6 weeks after the target day if the family is travelling during the desired window. Length, height and weight will be converted to Z-scores based on World Health Organization reference standards.58 For quality control, measurements are repeated if length-for-age or head-circumference-for-age Z-scores are less than −6 or greater than 6, or if weight-for-age or weight-for-length Z-scores are less than −5 or greater than 5.58 59 Stunting is defined as a length-for-age or height-for-age Z-score more than 2 SDs below the median of the reference population.58
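
The quality-control rule and the stunting definition can be written down compactly. The sketch below assumes Z-scores have already been computed from the WHO standards (that calculation is not shown); the dictionary keys and function names are hypothetical, and treating the stunting cut-off as a strict inequality is an assumption.

```python
from typing import Dict, List, Optional

# Plausibility limits quoted in the text, as (lower, upper) per index.
# The index names are illustrative labels, not study variable names.
LIMITS = {
    "length_for_age": (-6.0, 6.0),
    "head_circumference_for_age": (-6.0, 6.0),
    "weight_for_age": (-5.0, 5.0),
    "weight_for_length": (-5.0, 5.0),
}


def indices_to_remeasure(z_scores: Dict[str, Optional[float]]) -> List[str]:
    """Return the indices whose Z-score lies outside its plausibility limits.

    Missing indices (absent or None) are ignored rather than flagged.
    """
    flagged = []
    for name, (low, high) in LIMITS.items():
        z = z_scores.get(name)
        if z is not None and (z < low or z > high):
            flagged.append(name)
    return flagged


def is_stunted(length_for_age_z: float) -> bool:
    """True when length/height-for-age is more than 2 SDs below the median."""
    return length_for_age_z < -2.0


visit = {"length_for_age": -6.4, "weight_for_age": 1.2}
print(indices_to_remeasure(visit))  # ['length_for_age']
print(is_stunted(-2.3))             # True
```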

Dietary assessments

Dietary data are collected using 3 questionnaires. First, from 0 to 24 months of age, weekly surveillance visits include 3 questions to characterise the presence or absence of breastfeeding, the intake of non-breast milk liquids and the intake of semisolid and solid foods in the past week. Second, on a quarterly basis, a modified dietary diversity questionnaire is administered to characterise (1) dietary diversity, (2) indicators of complementary feeding such as feeding frequency and minimum acceptable diet and (3) the child’s usage of common micronutrient supplements (vitamin A, iron or zinc).60 This questionnaire asks about the presence or absence of food groups consumed in the past 24 hours, with some additional questions that capture common regional complementary foods, such as baby porridges frequently made with and without milk. Third, 24-hour dietary recalls are conducted with the child’s primary caregiver at 6, 12 and 18 months of age to assess the child’s intake of non-breastmilk macronutrients and micronutrients and to allow for the calculation of nutrient adequacy ratios and summary measures of overall dietary quality.61 62 At each time point, this consists of 3 recalls conducted on non-consecutive days over a 1-week period. We will calculate macronutrient and micronutrient intakes from complementary foods using a standard reference food composition database for Ecuador.63 This database has been expanded to include local dishes and recipes specific to the study population. Macronutrient and micronutrient intake data will be used to create covariates for the analyses described below. We will use data reduction techniques to summarise intake data. These techniques may include a priori methods to assess specific characteristics of the diet, such as its inflammatory potential,64 as well as a posteriori data-driven approaches such as principal component analysis and reduced rank regression.65
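
As a rough illustration of the nutrient adequacy ratios mentioned above, the sketch below uses the common definition of a ratio of observed intake to a reference requirement, summarised as a mean adequacy ratio with each nutrient capped at 1. The requirement values are placeholders chosen for easy arithmetic, not actual recommendations, and the function names are hypothetical.

```python
from typing import Dict


def nutrient_adequacy_ratios(
    intake: Dict[str, float], requirement: Dict[str, float]
) -> Dict[str, float]:
    """Ratio of intake to requirement for each nutrient in `requirement`.

    A nutrient with no recorded intake is treated as an intake of zero.
    """
    return {
        nutrient: intake.get(nutrient, 0.0) / required
        for nutrient, required in requirement.items()
    }


def mean_adequacy_ratio(ratios: Dict[str, float]) -> float:
    """Average of the ratios, each capped at 1 so that a surplus of one
    nutrient cannot offset a shortfall in another."""
    capped = [min(ratio, 1.0) for ratio in ratios.values()]
    return sum(capped) / len(capped)


# Placeholder values for illustration only (not real reference intakes).
requirement = {"iron_mg": 11.0, "zinc_mg": 3.0, "vitamin_a_ug": 400.0}
intake = {"iron_mg": 5.5, "zinc_mg": 3.0, "vitamin_a_ug": 600.0}

ratios = nutrient_adequacy_ratios(intake, requirement)
print(ratios)                       # iron 0.5, zinc 1.0, vitamin A 1.5
print(mean_adequacy_ratio(ratios))  # (0.5 + 1.0 + 1.0) / 3
```

In practice the intake for each time point would first be averaged over the 3 recalls; that step is omitted here.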

Household socioeconomic status

Socioeconomic status (SES) is measured using indices based on household building materials and assets, which have been standardised across prior studies by our group.39

Creation of a sample bank

With the written informed consent of the participant, any stool or plasma samples left over following planned study assays are stored in anticipation of future methodological advancement and standardisation as well as potential follow-on studies. All samples are labelled and stored separately from household or participant identifiers (eg, name, household location) and linkable only by a protected database.

Data management

Data are captured electronically on Android tablets using Open Data Kit open-source software. Tablets are password protected and securely stored when not in use. Identifiable data linking participant and sample ID codes to names and addresses are collected at enrolment and stored in a secure folder, accessible only to the study team. Otherwise, all data are collected and stored without personal identifiers.

Survey forms include built-in checks to reduce missing responses and flag logical inconsistencies. Study data are downloaded monthly, and quality control checks are run to compare recently completed study activities against scheduled activities and to check for missing or implausible values, duplicated records or inconsistencies. Discrepancies are reconciled through communication between the data management team and field supervisors.
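
The monthly comparison of completed against scheduled activities could look something like the sketch below. It is an illustration only: the record layout (a participant ID paired with an activity name), the IDs and the function name `reconcile` are hypothetical and do not describe the study's actual data system.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# (participant ID, activity name); both fields are hypothetical examples.
Visit = Tuple[str, str]


def reconcile(scheduled: Set[Visit], completed: List[Visit]) -> Dict[str, List[Visit]]:
    """Compare completed activities with the schedule.

    Returns activities that were scheduled but not recorded, recorded but
    not scheduled, and recorded more than once.
    """
    counts = Counter(completed)
    recorded = set(counts)
    return {
        "missing": sorted(scheduled - recorded),
        "unscheduled": sorted(recorded - scheduled),
        "duplicated": sorted(visit for visit, n in counts.items() if n > 1),
    }


scheduled = {("P001", "stool_6m"), ("P002", "stool_6m")}
completed = [("P001", "stool_6m"), ("P001", "stool_6m"), ("P003", "stool_6m")]
report = reconcile(scheduled, completed)
print(report["missing"])      # [('P002', 'stool_6m')]
print(report["unscheduled"])  # [('P003', 'stool_6m')]
print(report["duplicated"])   # [('P001', 'stool_6m')]
```

Each list would then be passed to the field supervisors for reconciliation, as described above.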

Study monitoring

This observational study does not test an intervention. Therefore, no external data monitoring committee exists.

COVID-19 pandemic

In March 2020, study enrolment was paused due to the coronavirus pandemic, and activities with already enrolled subjects were modified. All questionnaires that could be conducted by telephone were continued, but biological and environmental sample collection activities, structured observations and anthropometry were paused. As a result, some data continued to be available during this period, but other data are missing. Study activities resumed in phases, beginning in early 2021, as it was considered safe to contact study participants. This started with limited stool sample collection that minimised contact between fieldworkers and study participants. All activities have followed protocols set forth by the Ecuadorian Ministry of Health during the health emergency, declared by ministerial agreement 00126-2020.

Outcome measures and statistical analysis

Specific aim 1

Examine the association between household environmental conditions and (a) commensal gut microbiome composition and development and (b) enteric pathogen burden.

We hypothesise that household environmental conditions (water, sanitation and hygiene conditions) are drivers of (a) the composition and development of the child gut microbiome and of (b) the total burden of enteric infections within individual subjects as well as the pathogen profile among those with shared environmental characteristics.

To address SA1a, we will look at the effects of household environmental conditions on gut microbiome community structure at 1 week and 3, 6, 12 and 18 months of age. The outcomes of interest will be measures of alpha (within-sample) diversity of the gut microbiome obtained using 16S rRNA gene amplicon sequencing data. In addition, we will use 16S data to evaluate the influence of environmental conditions using beta (between-sample) diversity measures and will use non-parametric ordinations and multivariate statistical models to examine differences explained by key environmental variables and covariates. Statistical analyses that include effect size corrections will be used to identify specific taxa that differ between individuals with varying environmental exposures. These analyses will be completed using the QIIME 2 software platform and other bioinformatics tools.66
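
To make the distinction between alpha and beta diversity concrete, here is a minimal sketch using two widely used measures, the Shannon index (within-sample) and Bray–Curtis dissimilarity (between-sample). The protocol does not name the specific metrics for this aim, so these are illustrative choices, and the taxon counts are invented; in practice the calculations would be run within QIIME 2 or a similar tool.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[float]) -> float:
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two samples over the same taxa.

    0 means identical abundance profiles; 1 means no shared taxa.
    """
    if len(a) != len(b):
        raise ValueError("samples must list the same taxa in the same order")
    denominator = sum(a) + sum(b)
    if denominator <= 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Invented counts for 4 taxa in two samples.
even_sample = [10, 10, 10, 10]
skewed_sample = [37, 1, 1, 1]
print(shannon_index(even_sample))    # ln(4), the maximum for 4 taxa
print(shannon_index(skewed_sample))  # lower: one taxon dominates
print(bray_curtis(even_sample, skewed_sample))
```

Bray–Curtis is sensitive to sequencing depth, so counts are usually rarefied or converted to relative abundances first; that step is left out of the sketch.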

The outcome of SA1b is the presence or absence of pathogens in children 6–18 months of age. We will examine the effect of household environmental conditions on the burden of enteric infections by developing mixed-effects logistic regression models, where the outcome will be total pathogen burden and/or burden of specific pathogens. In addition, we will compare the community similarity of enteric pathogens shed by subjects using similarity matrices and multivariate models. Here, instead of the community composition of the gut microbiome used in SA1a, the outcome will be the set of enteropathogens detected in stool samples, and we will partition the variance in dissimilarity between the main effects (environmental conditions) and covariates of interest such as child age, sex, delivery mode and nutrition.
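
One simple way to build a dissimilarity matrix from sets of detected pathogens is the Jaccard measure, sketched below. The protocol does not state which similarity measure will be used, so Jaccard is an assumption made for illustration, and the sample contents are invented.

```python
from typing import List, Set


def jaccard_dissimilarity(a: Set[str], b: Set[str]) -> float:
    """1 minus the share of pathogens common to both samples.

    Two samples with no pathogens detected are treated as identical (0.0).
    """
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dissimilarity_matrix(samples: List[Set[str]]) -> List[List[float]]:
    """Symmetric matrix of pairwise Jaccard dissimilarities."""
    return [[jaccard_dissimilarity(a, b) for b in samples] for a in samples]


# Invented detections for three stool samples.
samples = [
    {"Giardia duodenalis", "Rotavirus"},
    {"Giardia duodenalis"},
    {"Norovirus GII"},
]
for row in dissimilarity_matrix(samples):
    print(row)
```

A matrix of this kind can then be passed to a permutation-based multivariate model to partition variance between environmental conditions and covariates.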

Specific aim 2

Examine whether gut microbiome composition and diversity modify the acute and chronic effects of enteric infections in children.

We hypothesise that diversity and composition of the child gut microbiome modify the impact of specific enteric infections on child health outcomes, including: (a) diarrhoea incidence, (b) EED and (c) growth faltering.

Using participants from all 3 sites, we will evaluate the effect of cumulative burden of enteric infection (pathogens detected at 6, 12 and 18 months of age) on 3 primary outcome measures at 24 months. The primary goal of this analysis is to evaluate whether gut microbiome alpha diversity measures (described in SA1) modify the enteric infection-health relationship of interest. We will examine whether overall diversity and specific bacterial taxa of interest are effect modifiers. We will also use stool samples collected at 1 week of age to assess whether the initial colonisation of the microbiome influences assembly and subsequent composition at later ages (founder effects). We will develop regression models with random effects for site and individual. We will adjust for other covariates related to each outcome, such as breastfeeding, reported antibiotic usage across 24 months and SES.
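One possible way to write the effect-modification model described above is sketched below. It is an illustrative form, not the study's prespecified model, and all symbols are assumptions: $Y_{ij}$ is a 24-month outcome for child $i$ in site $j$, $E_{ij}$ the cumulative enteric infection burden, $D_{ij}$ an alpha diversity measure, $\mathbf{Z}_{ij}$ the adjustment covariates (eg, breastfeeding, antibiotic use, SES) and $u_j$ a site-level random intercept.

```latex
Y_{ij} = \beta_0 + \beta_1 E_{ij} + \beta_2 D_{ij}
       + \beta_3 \left( E_{ij} \times D_{ij} \right)
       + \boldsymbol{\gamma}^{\top} \mathbf{Z}_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim N(0, \sigma_u^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)
```

In this form, $\beta_3$ is the term of interest: a non-zero value indicates that the association between infection burden and the outcome differs by microbiome diversity. For outcomes measured repeatedly in the same child, a child-level random intercept would be added alongside $u_j$, and a binary outcome would use a logit link in place of the identity link.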

Specific aim 3

Examine how the gut microbiome (a) responds to and (b) recovers from enteric infections in children.

We hypothesise that (a) enteric infections elicit specific changes in the child gut microbiome and that these signature responses will be distinct for symptomatic versus asymptomatic infections and (b) children with a more diverse gut microbiome prior to infection will recover more quickly from the short-term disturbances of enteric infections than those with a less diverse microbiome.

We will use data collected from children with acute diarrhoea during our intensive stool collection activity, selecting samples that test positive for active enteric infections. The outcomes of interest for SA3a are (1) alpha diversity, (2) beta diversity and (3) differentially abundant microbiome taxa. Within a given child, we will compare 16S bacterial microbiome communities from the week prior to infection with those from the week during infection. We will additionally compare 16S microbiome communities from the week during infection with those from 2 weeks or more post infection to characterise recovery. We will examine the impact of community, controlling for other covariates, on microbiome community similarity using ADONIS permutation models based on UniFrac distances, and will visualise differences using non-metric multidimensional scaling (NMDS) plots. For specific enteric infections, we will evaluate whether any observed changes to gut microbiome composition or abundance are modified by diarrhoea case status. We will also examine similarity measures of the gut microbial communities in the week prior to infection, compared with the week during infection.
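The permutation approach behind ADONIS-type models can be sketched as a one-way PERMANOVA on a distance matrix. The example below is a simplified illustration under stated assumptions: it uses Euclidean distances on toy one-dimensional data rather than UniFrac (which requires a phylogeny), it handles a single grouping factor with no covariates, and it permutes labels freely, whereas a within-child before/during comparison would call for permutations restricted within each child. The function names and data are hypothetical.

```python
import numpy as np

def pseudo_f(dist, groups):
    """PERMANOVA pseudo-F statistic for one grouping factor."""
    dist = np.asarray(dist, dtype=float)
    groups = np.asarray(groups)
    n = len(groups)
    labels = np.unique(groups)
    sq = dist ** 2
    # Total sum of squares from all pairwise squared distances.
    ss_total = sq[np.triu_indices(n, k=1)].sum() / n
    # Within-group sum of squares, accumulated group by group.
    ss_within = 0.0
    for label in labels:
        idx = np.flatnonzero(groups == label)
        block = sq[np.ix_(idx, idx)]
        ss_within += block[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_between = ss_total - ss_within
    return float((ss_between / (len(labels) - 1)) / (ss_within / (n - len(labels))))

def permanova(dist, groups, n_perm=999, seed=0):
    """Return the observed pseudo-F and a permutation p-value."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    observed = pseudo_f(dist, groups)
    exceed = 0
    for _ in range(n_perm):
        if pseudo_f(dist, rng.permutation(groups)) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Toy data: three "before infection" and three "during infection" samples.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
dist = np.abs(x[:, None] - x[None, :])
groups = np.array(["before"] * 3 + ["during"] * 3)

f_stat, p_value = permanova(dist, groups)
print("pseudo-F:", round(f_stat, 1), "p-value:", p_value)
```

With only six samples there are few distinct label arrangements, so the smallest attainable p-value is large; this is a property of the toy data, not of the method.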

To address SA3b, we will carry out similar analyses to those described for SA3a, in this case comparing samples from the week prior to infection to the sample collected 2 weeks or more after infection as an indicator of community recovery from the disturbance caused by the enteric infection.

Shotgun metagenomics

To further address SA3, we will carry out shotgun metagenomic sequencing on a subset of at least 225 samples from intensive surveillance to characterise the predominant microbial species and strains present during infection and changes in the presence or absence and relative abundance of functional genes present within the microbial community before, during and following enteropathogen infections. We will compare symptomatic versus asymptomatic individuals with the same type of infection.

We will select samples for metagenome sequencing based on the enteropathogen, or set of enteropathogens, for which we observe the greatest difference in taxonomic composition using 16S sequencing data. Relative to 16S sequencing, shotgun metagenomics will provide improved resolution to study pathogens and pathogen genotypes, measure relative in situ abundance of pathogen populations,67 and investigate changes in functional and virulence gene abundances in the gut microbiome in response to pathogen presence.68 The presence or absence and relative abundance of functional genes contributed by microorganisms in the gut before, during and after an enteropathogen infection will be used to test whether the baseline gut microbiome profile is an effect modifier on the resulting outcome of diarrhoea.

Shotgun metagenomes will be sequenced using Illumina sequencing chemistry as previously described.69 Raw FASTQ reads from the sequencing runs will be quality checked, trimmed, assembled and annotated using tools implemented in the MiGA (Microbial Genomes Atlas) pipeline,70 which was developed for efficient processing and management of microbial metagenomes. We will use read-based mapping to quantify annotated metagenome features and will normalise abundances based on the sequencing depth of an external spike-in control.71 Additional open-source or internal software tools will be used to remove human read contamination, annotate gene functions and taxonomy and bin and quality check metagenome-assembled genomes, as we have done with other analyses.69

Sample size

We plan to enrol up to 480 pregnant women (160 per site), with the goal of retaining 360 total mother–child dyads (120 per site) through 24 months of age. Our current enrolment rate is ~18 dyads per month. Sample size calculations were completed to test the hypotheses proposed in aims 1–3 with a power (1−β) of 80%, an α of 5% and an intracluster correlation coefficient of 0.1. Table 3 details the assumptions used to estimate the required sample size for each aim. We used ranges of exposure prevalence in the calculations based on regional variability of the various exposures. For gut microbiome power calculations, we used class-level Chao1 species richness ranges from our previous work in the region.36 Other microbiome power analyses using more advanced methods (eg, taxonomic-based methods72 or pairwise distances using PERMANOVA73) require additional details beyond the scope of our current data, for example, distances between subjects within each group.74 However, our sample size is on par with or greater than that of other studies reporting significant differences across gut microbiome samples in LMIC settings.10–12 17 21 75 For example, household water access and the microbiome will each be measured at least 5 times per child, resulting in an estimated 1800 paired measurements. Assuming an intraclass correlation coefficient of 0.1 to account for within-child correlation between measurements, we estimate that 1268–1800 measurements would be needed to detect differences in Chao1 richness related to water access if the overall prevalence of access to improved water in the study communities is between 25% and 50%.
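To show how within-child correlation affects the information carried by repeated measurements, the sketch below applies the standard design-effect formula, 1 + (m − 1) × ICC, to the figures quoted above (360 children, 5 measurements each, correlation coefficient of 0.1). This is an assumption about how such an adjustment is typically made; the protocol does not state the exact formula used, and the calculation does not attempt to reproduce the 1268–1800 range in Table 3.

```python
def design_effect(measurements_per_child, icc):
    """Standard design effect for equally sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (measurements_per_child - 1) * icc

def effective_sample_size(n_children, measurements_per_child, icc):
    """Number of independent observations the clustered data are 'worth'."""
    total = n_children * measurements_per_child
    return total / design_effect(measurements_per_child, icc)

total_measurements = 360 * 5                   # 1800 paired measurements
deff = design_effect(5, 0.1)                   # 1 + 4 * 0.1 = 1.4
n_eff = effective_sample_size(360, 5, 0.1)     # about 1286

print(total_measurements, round(deff, 2), round(n_eff))
```

Under these assumptions, the 1800 repeated measurements carry roughly the information of 1286 independent ones, which illustrates why the required number of measurements rises once within-child correlation is taken into account.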

Table 3

Sample size and power calculation estimates for study aims

Patient and public involvement

Patients and/or the public were not involved in the design, conduct, reporting or dissemination plans of this research.

Ethics and dissemination

The study protocol has been approved by the institutional review boards of Emory University (IRB00101202) and the Universidad San Francisco de Quito (2018–022M). The University of Michigan and University of Washington formally deferred oversight of the study to Emory University. The study protocol was also reviewed and approved by the Ministry of Health of Ecuador (MSPCURI000253-4).

The results of the study will be published in peer-reviewed journals and presented at international conferences as well as to health officials in Ecuador.


Population-based studies of gut microbiome composition have been largely descriptive, identifying differences between communities living in different regions,17–22 and most studies of host-pathogen dynamics have been carried out in the laboratory or in animal models.76 77 To understand the functional role that the commensal microbiome plays in response to enteric infection, a combination of TAC, 16S rRNA gene amplicon sequencing and novel metagenomic bioinformatics tools will be used to characterise differential compositional and functional responses to enteric pathogen infections. This will allow the impacts of pathogen exposure to be better differentiated from factors related to SES and diet, thereby providing new information to understand and reduce the negative consequences of enteric infections for child health.

Ethics statements

Patient consent for publication



  • Contributors GOL contributed to the development of the protocols for fieldwork, data management and analysis and drafted the manuscript. JNSE contributed to the conception and design of the work and critically revised the manuscript. JU contributed to the development of the protocols for fieldwork and data management and critically revised the manuscript. AVE contributed to the development of the protocols for fieldwork and data management and critically revised the manuscript. ER contributed to the development of the protocols for nutritional data collection and dietary data management and critically revised the manuscript. MC contributed to the development of the protocols related to parasitology and critically revised the manuscript. NC contributed to the development of the protocols for nutritional data collection and dietary data management and critically revised the manuscript. GV contributed to the development of the protocols for fieldwork and laboratory analyses and critically revised the manuscript. WC contributed to the development of the protocols for fieldwork and critically revised the manuscript. SMS contributed to the development of the protocols for laboratory analyses and sample size calculations and critically revised the manuscript. CV, GT, RM, KJJ and KTK contributed to the development of the protocols for laboratory analyses and critically revised the manuscript. KL contributed to the conception and design of the work, drafted portions of the manuscript text and critically revised the manuscript.

  • Funding This work is supported by the National Institutes of Health (R01AI137679). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

  • Map disclaimer The inclusion of any map (including the depiction of any boundaries therein), or of any geographic or locational reference, does not imply the expression of any opinion whatsoever on the part of BMJ concerning the legal status of any country, territory, jurisdiction or area or of its authorities. Any such expression remains solely that of the relevant source and is not endorsed by BMJ. Maps are provided without any warranty of any kind, either express or implied.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.