Establishing integrated rural–urban cohorts to assess air pollution-related health effects in pregnant women, children and adults in Southern India: an overview of objectives, design and methods in the Tamil Nadu Air Pollution and Health Effects (TAPHE) study
  1. Kalpana Balakrishnan1,
  2. Sankar Sambandam1,
  3. Padmavathi Ramaswamy2,
  4. Santu Ghosh1,
  5. Vettriselvi Venkatesan3,
  6. Gurusamy Thangavel1,
  7. Krishnendu Mukhopadhyay1,
  8. Priscilla Johnson2,
  9. Solomon Paul3,
  10. Naveen Puttaswamy1,
  11. Rupinder S Dhaliwal4,
  12. D K Shukla4,
  13. SRU-CAR Team1
  1. 1Department of Environmental Health Engineering, ICMR Center for Advanced Research on Environmental Health: Air Pollution, Sri Ramachandra University, Chennai, Tamil Nadu, India
  2. 2Department of Physiology, Sri Ramachandra University, Chennai, Tamil Nadu, India
  3. 3Department of Human Genetics, Sri Ramachandra University, Chennai, Tamil Nadu, India
  4. 4Division of Non-Communicable Diseases, Indian Council for Medical Research, New Delhi, Delhi, India
  1. Correspondence to Professor Kalpana Balakrishnan; kalpanasrmc{at}


Introduction In rapidly developing countries such as India, the ubiquity of air pollution sources in urban and rural communities often results in ambient and household exposures significantly in excess of health-based air quality guidelines. Few efforts, however, have been directed at establishing quantitative exposure–response relationships in such settings. We describe study protocols for The Tamil Nadu Air Pollution and Health Effects (TAPHE) study, which aims to examine the association between fine particulate matter (PM2.5) exposures and select maternal, child and adult health outcomes in integrated rural–urban cohorts.

Methods and analyses The TAPHE study is organised into five component studies with participants drawn from a pregnant mother–child cohort and an adult cohort (n=1200 participants in each cohort). Exposures are assessed through serial measurements of 24–48 h PM2.5 area concentrations in household microenvironments together with ambient measurements and time-activity recalls, allowing exposure reconstructions. Generalised additive models will be developed to examine the association between PM2.5 exposures, maternal (birth weight), child (acute respiratory infections) and adult (chronic respiratory symptoms and lung function) health outcomes while adjusting for multiple covariates. In addition, exposure models are being developed to predict PM2.5 exposures in relation to household and community level variables as well as to explore inter-relationships between household concentrations of PM2.5 and air toxics. Finally, a bio-repository of peripheral and cord blood samples is being created to explore the role of gene–environment interactions in follow-up studies.

Ethics and dissemination The study protocols have been approved by the Institutional Ethics Committee of Sri Ramachandra University, the host institution for the investigators in this study. Study results will be widely disseminated through peer-reviewed publications and scientific presentations. In addition, policy-relevant recommendations are also being planned to inform ongoing national air quality action plans concerning ambient and household air pollution.

Strengths and limitations of the study

  • Using an integrated rural–urban cohort design to monitor air pollution exposures on a longitudinal basis in relation to specific health outcomes.

  • Characterising the rural–urban exposure continuum and minimising exposure misclassification through objective measurements.

  • Generating quantitative exposure–response relationships for birth weight, acute respiratory infections in children, chronic respiratory symptoms and lung function to fill critical gaps in the available evidence base for exposure settings commonly prevalent in India and other rapidly developing countries.

  • Using healthcare facility-provided birth weight data, as field logistics precluded making consistent primary measurements of birth weight.

  • Intermittent as opposed to continuous surveillance for acute respiratory infections in children necessitated by field logistics.

  • Reliance on area measurements and exposure reconstruction methods as opposed to personal exposure measurements on account of limitations in available equipment.


Air pollution ranks among the leading risk factors contributing to the global and regional burden of disease in South Asia.1 ,2 In India, household air pollution (HAP) and ambient air pollution (AAP) are estimated to account for 6% and 3% of the total national burden of disease, respectively, with approximately 1.04 million premature deaths and 31.4 million disability adjusted life years (DALYs) attributable to HAP, and 627 000 deaths and 17.8 million DALYs attributable to AAP.3

Several recent systematic reviews provide an elegant consolidation of the evidence for both acute and chronic health effects of air pollution. Health effects examined in relation to AAP have included cerebrovascular and ischaemic heart disease,4–7 chronic obstructive pulmonary disease (COPD),8 ,9 asthma,10 acute respiratory infections,11 birth weight12 and carcinogenicity;13 while for HAP, they have included child pneumonia,14 COPD,15 ,16 lung cancer,17 ,18 cataract19 and birth weight.20 A new class of evidence using integrated exposure–response functions for comparing health risk estimates across major combustion sources (including sources in the ambient and household environment as well as passive/active smoking) has also recently become available.21 The collective evidence from these studies has served the development of the WHO air quality guideline (WHO-AQG) values for individual pollutants22 as well as the recent guidelines for HAP in relation to solid cooking-fuel use.23 ,24

In rapidly developing countries such as India, the ubiquity of sources in urban and rural settings often results in ambient and household exposures significantly in excess of the WHO-AQG values.25–27 The breadth of epidemiological studies linking air quality and health in India has, however, been rather modest. Most studies report observations from cross-sectional studies, with limited efforts directed at establishing exposure–response relationships.28 ,29 Ambient and household exposures are also likely to be seamless in India, with overlapping implications for a range of acute and chronic health conditions,30 but few studies have attempted to study this continuum on a longitudinal basis.

With a view to strengthening this evidence base in India and to contributing new information from settings experiencing dual burdens from ambient and household sources, the Indian Council for Medical Research (ICMR), Government of India, prioritised the need to launch cohort studies that would cohesively address ambient and HAP exposures, within rural and urban populations, while simultaneously examining adult and children's health outcomes.

Given the relatively nascent experience of establishing and running environmental health cohort studies in India, a single site study was initially proposed that would rely on a combination of primary field measurements and data available through governmental health and environmental surveillance systems. Establishing the feasibility and utility of such a modality of data collection could potentially allow scaling up across multiple sites/states in the future. The state of Tamil Nadu in Southern India provided fertile grounds for launching such a cohort, with an elaborate network of well-functioning governmental health and air quality monitoring systems involved in routine data collection. Based on this rationale, the Tamil Nadu Air Pollution and Health Effects (TAPHE) study was launched by the ICMR Center for Advanced Research (ICMR–CAR) on Environmental Health (Air Pollution) at Sri Ramachandra University (SRU) in Chennai, India, in 2010. The choice of health outcomes for examination in the cohort was informed by a gaps assessment in reported India studies28 ,29 as well as by field feasibilities of completion of phase I of the cohort within a period of 5 years.

In this paper, we present the study design, objectives and methods being used in the TAPHE study. We also share detailed standard operating procedures (SOPs) and questionnaires being used in the study in online supplementary files accompanying the manuscript to serve as a resource for other researchers contemplating similar efforts as well as to allow feedback on planned analyses before completion of the project.

TAPHE study objectives

The TAPHE study is organised into five interlinked component studies with data collection on air pollution exposures and health outcomes spread across two rural–urban cohorts in Tamil Nadu. These comprise a pregnant mother–child (M-C) cohort and an adult cohort (AC), with planned sample sizes of ∼1200 in each cohort (with roughly equal proportions in the rural and urban arms). The principal objectives of the TAPHE study involving one or more of the component studies are to:

  1. Estimate exposure–response relationships for household particulate matter (PM2.5) concentrations, birth weight (BW) and acute respiratory illness (ARI) in pregnant women and children (<2 years of age), respectively (TAPHE-BW-ARI Study).

  2. Estimate exposure–response relationships for household PM2.5 concentrations, chronic respiratory symptoms and lung function in women and men (TAPHE-Adult Respiratory Health Study).

  3. Estimate relationships between household concentrations, ambient concentrations and personal exposures for PM2.5 and characterise household concentration profiles of select air toxics for rural and urban households (TAPHE-Exposure Study).

  4. Explore development of exposure models for use in future epidemiological studies requiring long-term and/or large-scale exposure reconstructions for PM2.5 in rural and urban households (TAPHE-Exposure Modelling Study).

  5. Create a biorepository of peripheral and cord blood samples for future explorations of gene–environment interactions for air pollution-related health effects (TAPHE-Bio-repository Study).

The overall organisation of the component studies is illustrated in figure 1.

Figure 1

Organisation of the TAPHE study illustrating the linkages across component studies with study participants drawn from a rural-urban Mother-Child (M-C) and Adult cohort in Tamil Nadu. BW-ARI, birth weight-acute respiratory illness; PM2.5, particulate matter; TAPHE, The Tamil Nadu Air Pollution and Health Effects.


Study methods and sample sizes for the TAPHE studies were developed through multiple consultative protocol development meetings with national experts, who serve as members of a project technical advisory committee constituted by ICMR. Study protocols were approved by the SRU institutional ethics committee (see online supplementary file 1) with permissions to include new modifications during periodic reviews. The study area and participant recruitment strategy for the cohorts are described below.

Selection of study area and recruitment strategy

We restricted our study area within the state of Tamil Nadu to one urban (Chennai) and two rural (Tiruvallur and Kancheepuram) districts, located within ∼50 km of our laboratory at SRU. This was necessary to accommodate the field logistics of making repeat visits (12–36) to multiple field locations as well as the collection, transportation and processing of large volumes of field samples.

The state of Tamil Nadu has a well-established public healthcare system in operation, with an elaborate network of primary healthcare centres (PHCs) and urban health posts (UHPs) involved in providing antenatal care.31 The state also reports a high proportion (∼97%) of institutional deliveries when compared to other states in India,32 and has recently launched an electronic surveillance system for routine monitoring and evaluation of maternal and child health programmes.33 We therefore decided to select our recruitment sites for the M-C cohort from the network of PHCs and UHPs within the chosen districts. Permissions were secured from the Secretary of Health, Government of Tamil Nadu, to enrol pregnant women at these sites as well as obtain antenatal and birth outcome data available in their records.

We randomly selected 11 PHCs (of a total of 109 PHCs) and 17 UHPs (of a total of 103 UHPs) within the study area, to recruit pregnant women for the rural and urban arms of the M-C cohort, respectively. The number of recruitment sites was decided based on the required rate of recruitment/site to complete the recruitment over a 20-month period, while restricting the number of sites to optimise the effort needed for networking with the PHC/UHP staff and social workers on the ground. We estimated a random sample of 10% of available recruitment sites to be sufficient for completion of planned recruitment, based on an average attendance of ∼30 participants on an assigned antenatal day each week, and expected eligibility and response rates of ∼80% each. We selected the sites using standard random number generating procedures. Informed consent was then sought for enrolment of the mother and child to provide the study sample for the TAPHE-BW-ARI study. This involved (1) securing oral informed consent from the pregnant mother by project staff at the PHC/UHP to screen for eligibility as well as to visit her household to seek written consent and (2) administration of an informed consent form at the household in the presence of another household member or project staff as a witness, and securing written informed consent. No financial incentives were offered at any time to the participants. Project staff were trained to assist with referrals to the PHC/UHP or physicians within the community, in the event of sickness of the participating mother or child.
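
As an illustration of the site-selection step, a simple random draw without replacement could look like the sketch below. The site identifiers, the seed values and the function name are hypothetical; the text states only that standard random number generating procedures were used, so this is one plausible reading rather than the procedure actually followed.

```python
import random


def select_sites(site_ids, n_select, seed):
    """Draw a simple random sample of recruitment sites without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return sorted(rng.sample(site_ids, n_select))


# Hypothetical identifiers; only the counts (109 PHCs, 103 UHPs) come from the text.
phc_ids = [f"PHC-{i:03d}" for i in range(1, 110)]
uhp_ids = [f"UHP-{i:03d}" for i in range(1, 104)]

rural_sites = select_sites(phc_ids, 11, seed=2010)  # rural arm
urban_sites = select_sites(uhp_ids, 17, seed=2011)  # urban arm
```

Recording the seed alongside the sampling frame would allow the draw to be audited later.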

Participants for the TAPHE–Adult Respiratory Health study (ie, AC) were recruited from the same households or villages/municipal zones providing participants for the M-C cohort. This allowed optimisation of resources for household and/or community level air pollution measurements across the two cohorts as well as facilitated integration of data for exposure reconstruction and exposure modelling efforts. The M-C and/or ACs together served as the base to subsequently draw participants for the TAPHE-Exposure Assessment, the TAPHE-Exposure Modelling and the TAPHE-Bio-repository studies. The distribution of participants enrolled thus far in the two cohorts, across villages/municipal zones, is described in online supplementary file 2 and table T2.1.

We describe the organisation of data elements (figure 2) and details of methods used in individual component studies in the next sections. The complete set of SOPs and study instruments may be found in the online supplementary file 3.

Figure 2

Organisation of the TAPHE study illustrating constituent data elements included within component studies. Colours are indicative of data elements that were grouped together in SOPs or study protocols. ANC, antenatal care; ARI, acute respiratory illness; BW, birth weight; ETS, environmental tobacco smoke; M-C, mother–child; PHC, Primary Health Care Centre; PM2.5, particulate matter; TAPHE, The Tamil Nadu Air Pollution and Health Effects; UHP, Urban health post.

The TAPHE-Birth Weight-ARI Study

Sample size

We compared sample size requirements based on effect size estimates and the outcome and exposure variability reported in studies concerning birth weight, ARI, HAP and AAP. Studies using both categorical and continuous outcome/exposure metrics were considered (Prajapati et al 2001; Goel et al 2012; Ramasamy et al 2011).20 ,29 ,34–37 We arrived at a sample size of 943 pregnant mothers and 772 children to develop logistic and/or linear regression models as described in online supplementary file 4: figure S4.1. Assuming a loss to follow-up rate of ∼10–15% over a ∼4-year period, a final sample size of 1200 mothers was chosen for the TAPHE-BW-ARI study. The main data collection elements in the TAPHE-BW-ARI Study are described below.
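
The allowance for loss to follow-up can be checked with a short calculation. The sketch below assumes the common inflation formula n / (1 − loss rate); the text does not state which formula was used, and the function name is hypothetical.

```python
import math


def inflate_for_attrition(n_required, loss_rate):
    """Enrolment needed so that n_required participants remain after attrition."""
    if not 0 <= loss_rate < 1:
        raise ValueError("loss_rate must be in [0, 1)")
    return math.ceil(n_required / (1 - loss_rate))


# 943 mothers is the analytic sample size reported in the text.
for loss in (0.10, 0.15):
    print(loss, inflate_for_attrition(943, loss))
```

Under this assumed formula, the enrolment needed at either end of the 10–15% range falls below the 1200 mothers chosen.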

Participant recruitment in the TAPHE-BW-ARI Study

Pregnant women presenting themselves at PHCs/UHPs in the study area (selected as described previously) are approached to secure an initial oral consent to respond to a screening questionnaire for assessing eligibility. Inclusion criteria are based on (1) no use of assisted reproductive technologies; (2) no involvement in dusty occupations; and (3) no known plans to be away from their current residence for longer than 6 months during the study period (including delivery of the child). Eligible women are enrolled on provision of a written informed consent (that includes consents for follow-up of the child after delivery, until 2 years of age) provided at the time of the first household visit.

Collection of general household, maternal health, birth weight and birth outcome data

On the provision of informed consent, a general household questionnaire is administered to pregnant women to elicit information on housing, household and participant characteristics. Following this, antenatal care (ANC) data available with either the participant or the PHC/UHP providing ANC services, are manually abstracted into the study forms during periodical house visits, which are scheduled to cover every trimester. In the event the health records are unavailable at either location, permission has been secured to access the Pregnancy and Infant Cohort Monitoring and Evaluation (PICME) Database (an electronic database being maintained by the Directorate of Public Health, Government of Tamil Nadu) to supplement information collected by field staff. Information on birth outcomes is retrieved from mothers using a birth outcome questionnaire administered between 3 and 8 weeks after the delivery of the child. Birth weight data on live births are collected either from the individual healthcare facilities (accessed by the participants for delivery) or from the PHC/UHP record.

ARI assessments

Periodic ARI assessments are performed on children born to mothers enrolled in the M-C cohort, by field assistants trained in WHO Integrated Management of Neonatal and Childhood Illness (IMNCI) methods38 using child health calendars. Field assistants also receive training to make direct observations on sick children (including measurement of breathing rate and observations on chest wall in-drawing/wheezing).

Routine ARI surveillance is initiated with a household visit ∼4–8 weeks after delivery, if reporting a live birth and once the participant is able/willing to accommodate regular visits by the field team. Following this, field assistants visit each household once a month to inquire from the mother about signs or symptoms of respiratory illness over the preceding 2-week period (until the child's second birthday). Both the day and duration of illness are recorded in the child health calendar. Children are referred to the nearest available community clinic if experiencing one major (reluctance to feed or drink, difficult breathing, cyanosis or chest in-drawing) or two minor (fever, cough or irritability) signs of respiratory illness on the day of the visit.39

Field assistants also perform anthropometric measurements during each visit (using SECA mats or stadiometers to record infant length/height, a digital weighing scale to record infant weight and a Gullick tape to record mid-upper arm circumference). Anthropometric measurements are not performed if the child is unwell, uncooperative or sleeping during the time of the household visit. Anthropometric measurements are converted to age- and sex-specific Z-scores using an algorithm that references the 2006 WHO Growth Standards and calculates the difference in Z-score means to classify children as stunted, underweight, or malnourished.40
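
The WHO Growth Standards express each indicator through age- and sex-specific L, M and S parameters, from which a Z-score follows by the LMS formula. The sketch below shows that formula with a conventional −2 cut-off; the parameter values are placeholders rather than entries from the WHO tables, the cut-off is an assumption not stated in the text, and any further adjustments applied by the WHO software are not reproduced.

```python
import math


def lms_zscore(x, L, M, S):
    """Z-score of measurement x given LMS parameters for the child's age and sex."""
    if L == 0:
        return math.log(x / M) / S
    return ((x / M) ** L - 1) / (L * S)


def is_below_cutoff(z, cutoff=-2.0):
    """Flag a Z-score below the cut-off (assumed here to be -2)."""
    return z < cutoff


# Placeholder LMS values for illustration only (not taken from the WHO tables).
z_length_for_age = lms_zscore(68.0, L=1.0, M=75.0, S=0.04)
stunted = is_below_cutoff(z_length_for_age)
```

In practice the L, M and S values would be looked up from the reference tables for the child's exact age and sex before the formula is applied.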

The following outcome definitions are used for extracting information from the IMNCI questionnaires:

  1. ARI: defined as runny nose or cough either with or without fever, lasting at least 72 h.

  2. Undifferentiated fever: defined as fever not associated with other symptoms and lasting for at least 48 h.

Further, based on recent reports involving field assessments of ARI in relation to air pollution,41 ,42 ill children are classified according to the IMNCI algorithm as WHO pneumonia, defined as cough or difficulty breathing plus age-specific raised respiratory rate, or WHO severe pneumonia, defined as cough or difficulty breathing and the presence of lower chest wall in-drawing or inability to breastfeed or drink. Finally, information on breastfeeding status, immunisations, medical advice sought for child illness and household characteristics (including access to sanitation and piped drinking water) is collected through a short questionnaire appended to the main IMNCI questionnaire.
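
The two pneumonia definitions above can be expressed as a small decision rule. The sketch below uses the standard WHO fast-breathing cut-offs (60, 50 and 40 breaths/min for children under 2 months, 2–11 months and 12 months or older); that the study applies exactly these cut-offs is an assumption, and the function and label names are hypothetical.

```python
def fast_breathing_threshold(age_months):
    """Breaths/min at or above which breathing counts as fast for the age group."""
    if age_months < 2:
        return 60
    if age_months < 12:
        return 50
    return 40  # 12 months and older


def classify_child(age_months, resp_rate, cough_or_difficult_breathing,
                   chest_indrawing, unable_to_feed):
    """Apply the two definitions quoted in the text, most severe first."""
    if not cough_or_difficult_breathing:
        return "not classified"
    if chest_indrawing or unable_to_feed:
        return "WHO severe pneumonia"
    if resp_rate >= fast_breathing_threshold(age_months):
        return "WHO pneumonia"
    return "no pneumonia"


example = classify_child(6, 52, True, False, False)
```

Checking the severe criteria first means a child with chest in-drawing is never labelled with the milder category.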

Field logistics and/or participant availability during scheduled field days are likely to preclude continuous surveillance. ARI will therefore be quantified as longitudinal prevalence (number of days with illness divided by the total number of days of observation) to address the limitations of intermittent surveillance. Estimating longitudinal prevalence has also been shown to be more efficient in surveillance of episodic illnesses such as diarrhoea and ARI by balancing the need to reduce recall bias in intermittent surveillance with lower field costs, when compared to continuous surveillance.43 ,44
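
Longitudinal prevalence is a simple ratio, as the following sketch shows. The visit records are invented for illustration, and treating each monthly visit as contributing a 14-day observation window is an assumption based on the 2-week recall described earlier.

```python
def longitudinal_prevalence(days_ill, days_observed):
    """Days with illness divided by total days of observation."""
    if days_observed <= 0:
        raise ValueError("days_observed must be positive")
    if not 0 <= days_ill <= days_observed:
        raise ValueError("days_ill must lie between 0 and days_observed")
    return days_ill / days_observed


# Hypothetical child: (days ill, days covered by recall) for four monthly visits.
visits = [(3, 14), (0, 14), (5, 14), (0, 14)]
total_ill = sum(ill for ill, _ in visits)
total_observed = sum(obs for _, obs in visits)
lp = longitudinal_prevalence(total_ill, total_observed)  # 8 ill days over 56 observed
```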

Major data fields recorded in the questionnaires for collecting general household, maternal health, birth outcome and ARI-related information in TAPHE-BW-ARI Study are furnished in online supplementary file 5: table T5.1.

A record of loss to follow-up is maintained for all participants in the TAPHE-BW and TAPHE-ARI studies who become unavailable for the study before data collection on birth outcomes and/or ARI surveillance is completed. Details of exposure metrics used in the TAPHE-BW-ARI Study are provided in sections describing the TAPHE-Exposure study.

The TAPHE-Adult Respiratory Health Study

Sample size

We compared sample size requirements based on effect size estimates and exposure variability reported in studies concerning chronic respiratory symptoms and lung function in relation to HAP and AAP. Studies using both categorical and continuous outcome/exposure metrics were considered (Ackermann et al 1997, Schikowski 2005, Forbes 2008). We arrived at a sample size of ∼1080 adults to develop logistic and/or linear regression models as described in online supplementary file 4: figure S4.2. Assuming a loss to follow-up rate of 10%, we chose a final sample size of 1200 adults for the TAPHE-Adult Respiratory Health Study. The main data collection elements in the TAPHE-Adult Respiratory Health Study are described below.

Participant recruitment

Participants for the AC are recruited from the households or communities (villages/municipal zones) providing participants for the M-C cohort to optimise field resources for household/community level air pollution measurements across the M-C and ACs. Participants in the age group between 18 and 65 years are included based on (1) residence in the same area (village/zone) for at least 10 years; (2) no known history of thoracic or abdominal surgery, heart attack, eye surgery, retinal detachment, hospitalisation; (3) no known history of tuberculosis; (4) not being currently pregnant; (5) no history of smoking; and (6) no history of involvement in dusty occupations. The residence criterion was imposed to allow using the short-term household exposure measurements as a reasonable surrogate for long-term exposures, given the focus on chronic health end points. The criteria for other health conditions were imposed to allow administration of a lung function test on the same individuals. Finally, criteria on smoking and dusty occupations were imposed to minimise exposure misclassification. Eligible participants are enrolled following written informed consents for administration of the respiratory symptom questionnaire and conduct of lung function tests (including the administration of short-acting bronchodilators).

Chronic respiratory symptom assessment

Respiratory symptoms are assessed using the INSEARCH (Indian Study on Epidemiology of Asthma, Respiratory Symptoms and Chronic Bronchitis) questionnaire, developed and validated for field worker assessments in both rural and urban populations in India.45 The study questionnaire captures the respiratory symptoms of the 12-month period preceding the interview while also eliciting information on demographic and environmental exposure factors likely to influence the prevalence of such symptoms.

Lung function assessment

Spirometry is performed at the household by four field staff members certified through a 2-day training programme, following the protocols recommended by the American Thoracic Society,46 with minor modifications, as suggested in recent multicentric studies concerning prevalence of COPD.47 Briefly, measurements are made using a portable spirometer (MIR Spirobank), which is calibrated daily using a 3 L syringe, according to manufacturer's instructions. Spirometric measurements include peak expiratory flow rate, forced vital capacity (FVC), forced expiratory volume in 1 s (FEV1), FEV1/FVC (expressed as a percentage) and forced expiratory flow over the middle half of the FVC (forced expiratory flow (FEF) 25–75%). Eligible participants perform a maximum of eight forced expiratory manoeuvres to obtain three American Thoracic Society (ATS)-acceptable measurements, with FVC and FEV1 reproducible to 150 mL. Tests are performed after administration of a β agonist bronchodilator (salbutamol 200 µg) by inhalation through a 500 mL spacer. All spirometric examinations are carried out with the subject seated, wearing a nose clip and a disposable mouthpiece. A certified chest physician reviews flow-volume and volume-time curves weekly, to check quality and provide feedback to the field staff. Test results stored in the spirometer memory are downloaded to a computer on a daily basis.

As per ATS criteria, manoeuvres are accepted only if the back-extrapolated volume is low (<5% of the FVC and <0.15 L), the FVC and FEV1 are within 0.20 L of the best effort FVC and FEV1, and there is a low volume accumulated at the end of the effort. The values of the largest FVC and the largest FEV1 are taken from all of the three reproducible and usable curves (acceptable start of test and free from artifact). Values thus obtained are compared against individual predictive values based on age, sex, body weight, standing height and ethnic group, to describe the distribution of lung function parameters within the population,48 although diagnosis for obstructive or restrictive conditions is not required for the development of the exposure–response models.
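
The selection of the largest FVC and FEV1 from a test session can be sketched as follows. This is a simplified illustration that covers only the repeatability check and the choice of best values; the manoeuvre values are invented, the function name is hypothetical, and the tolerance is left as a parameter because the text quotes both 150 mL and 0.20 L.

```python
def summarise_session(manoeuvres, tolerance_l=0.20):
    """Summarise a session from (FVC, FEV1) pairs in litres for usable curves."""
    best_fvc = max(fvc for fvc, _ in manoeuvres)
    best_fev1 = max(fev1 for _, fev1 in manoeuvres)
    # Count curves whose FVC and FEV1 both lie within tolerance of the best values.
    n_reproducible = sum(
        1 for fvc, fev1 in manoeuvres
        if best_fvc - fvc <= tolerance_l and best_fev1 - fev1 <= tolerance_l
    )
    return {
        "best_fvc": best_fvc,
        "best_fev1": best_fev1,
        "fev1_fvc_percent": 100 * best_fev1 / best_fvc,
        "n_reproducible": n_reproducible,
        "session_valid": n_reproducible >= 3,  # three reproducible curves required
    }


# Hypothetical session: the fourth manoeuvre falls outside the tolerance.
session = summarise_session([(3.10, 2.50), (3.05, 2.55), (2.98, 2.47), (2.60, 2.00)])
```

Note that the largest FVC and the largest FEV1 may come from different manoeuvres, as in this example.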

A summary of main fields of information recorded in the INSEARCH questionnaire and through the lung function assessment is provided in online supplementary file 5: table T5.2.

Details of exposure metrics used in the TAPHE-Adult Respiratory Health Study are provided in sections describing the TAPHE-Exposure study. Although the TAPHE-Adult Respiratory Health Study is designed as a cohort, the current scope of this study is limited to point of time measurements intended to serve as a baseline for planned prospective follow-ups. Loss to follow-up data is therefore not currently recorded in the AC.

The TAPHE-Exposure Study

The TAPHE-Exposure Study draws its sample from households/participants enrolled in the M-C and the ACs. We describe the framework for exposure measurements in the two cohorts, which focuses primarily on measuring household concentrations of PM2.5, with additional measurements to facilitate personal exposure reconstruction as well as to generate exposure profiles for select air toxics in rural and urban households.

Overall sampling strategy

Long-term exposure reconstructions for air pollutants in epidemiological studies often have to address contributions from multiple time-varying and time-invariant components. Large sample sizes together with limitations imposed by equipment availability and/or field logistics for serial measurements in the two cohorts required that we adopt a hierarchical approach for exposure assessment. Based on a few studies reporting such an approach,49 ,50 we created tiered exposure sampling arms that are organised on the basis of increasing complexity with some basic measurements planned across all study households (extensive arms) and more complex measurements planned in a subset of households/participants (intensive arms). Accordingly, we created five exposure sampling arms as follows:

  1. Extensive arm: Daily average PM2.5 concentration measurements in the kitchen (using integrated, filter based, gravimetric methods) together with administration of time-activity questionnaires to participants for exposure reconstruction using time budgets in household microenvironments.

  2. Intensive arm I: Additional measurements of PM2.5 in the primary living area, a near-household (outdoor) location and a rooftop (ambient) location (within the same community).

  3. Intensive arm II: Two to six repeat measurements of PM2.5 area concentrations with repeat monitoring days scheduled to cover every trimester (for pregnant women), every year from birth to 2 years of age (for babies) and alternate seasons (for other adults).

  4. Intensive arm III: Personal sampling on non-pregnant women and men to validate exposure reconstructions performed using time-activity recalls.

  5. Intensive arm IV: Concentrations of air toxics in the kitchen, usually during periods of cooking.

Collectively, the measurements provide the ability to combine single versus multiple measures to address spatiotemporal variability in household concentrations and personal exposures for PM2.5 as well as to generate profiles of household concentrations of select air toxics (carbon monoxide (CO), volatile organic compounds (VOCs) and polycyclic aromatic hydrocarbons (PAHs)). Figures 3 and 4 illustrate the overall exposure assessment strategy and the organisation of intensive and extensive sampling arms.

Figure 3

Overall sampling strategy for air pollution measurements and exposure assessment in the TAPHE-Exposure Study. TAPHE, The Tamil Nadu Air Pollution and Health Effects.

Figure 4

Organisation of exposure-sampling arms in the TAPHE-Exposure Study. PM2.5, particulate matter; TAPHE, The Tamil Nadu Air Pollution and Health Effects.

Protocols for PM and air toxics measurements and exposure reconstructions

Protocols for PM2.5 measurements were based on previous studies reported from our laboratory.25 ,37 ,51 Protocols for air toxics, however, required the development of new SOPs that addressed specific exposure configurations of households in the cohort. Details of the measurement protocols for PM2.5 and air toxics are provided in online supplementary file 3.

A time-activity questionnaire is administered to all adult participants of the M-C and ACs to elicit information on time spent in major microenvironments within and outside the household (based on a 24 h recall following the conduct of the household PM2.5 monitoring). Time spent in six primary locations including kitchen, living room/bedroom, courtyard/verandah, vehicle, work and outdoor microenvironments is recorded. This information is then averaged over multiple administrations (2–3) to create a table for each individual, summarising the amount of time at each predetermined location for subsequent use in exposure reconstructions. Two models for exposure reconstruction are planned (as described in the SOPs, see online supplementary files) using integrated 24 h or shorter (4 h) concentrations with time-activity records in time-weighted average exposure calculations.
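
A time-weighted average exposure combines microenvironment concentrations with the time budget from the recall. The sketch below illustrates the general calculation only; the concentrations, hours and function name are hypothetical and do not represent either of the two models specified in the SOPs.

```python
def time_weighted_exposure(concentrations, hours):
    """Time-weighted average PM2.5 exposure (µg/m3) across microenvironments."""
    missing = set(hours) - set(concentrations)
    if missing:
        raise KeyError(f"no concentration for: {sorted(missing)}")
    total_hours = sum(hours.values())
    if total_hours <= 0:
        raise ValueError("total time must be positive")
    weighted_sum = sum(concentrations[m] * h for m, h in hours.items())
    return weighted_sum / total_hours


# Hypothetical 24 h concentrations (µg/m3) and recalled time budget (hours).
concentrations = {"kitchen": 200.0, "living": 80.0, "outdoor": 40.0}
hours = {"kitchen": 3.0, "living": 15.0, "outdoor": 6.0}
twa = time_weighted_exposure(concentrations, hours)  # (600 + 1200 + 240) / 24
```

The same function can take hours averaged over the 2–3 questionnaire administrations as its second argument.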

A summary of main fields of exposure-variable information recorded in the general household and time-activity questionnaires is provided in online supplemental file 5: table T5.3.

The TAPHE-Exposure Modelling Study

This component study aims to (1) develop and validate land-use regression (LUR) models for characterising the spatial contrasts in urban AAP concentrations of PM2.5; (2) develop mixed models for estimating long-term exposures on the basis of short-term measures and household level determinants in rural households; and (3) explore the application of alternative models for exposure reconstruction in urban and rural households. The exposure measurements performed in the TAPHE-Exposure Study serve as primary inputs for the models. We plan to develop separate models for urban and rural households to address the differential influence of ambient versus household sources on spatial and/or temporal variability in household concentrations. While the detailed model specifications remain to be finalised, we provide a brief description of the model framework below.

Development of LUR models

Routinely collected ambient air quality monitoring data may not reliably characterise exposure gradients for urban populations because the spatial density of monitoring sites is inadequate, resulting in exposure misclassification in epidemiological studies. LUR models that use predictor variables derived from geographical information systems have been useful in characterising intraurban gradients in ambient concentrations of criteria air pollutants (including PM2.5 and NO2) for use in many air pollution health effects studies.52–54 While LURs have been widely applied in health effects studies, relatively few studies describe model development,55–59 and these have focused largely on cities in North America and Europe.

To develop an LUR model relevant for the Chennai metropolitan area, we prepared an inventory of commonly reported predictor variables used in these studies that included those pertaining to road type, traffic counts, land cover, land use classification, population/housing density and elevation. We subsequently arrived at an initial model input list based on local data availability. Ambient measurements of PM2.5 in each of the 10 zones across Chennai city collected in the TAPHE-Exposure Study, supplemented by campaign measurements across ∼50 locations within Chennai city (during the pre-monsoon and monsoon seasons) will serve as the primary dependent variable in these LUR models. Additional details of the LUR model algorithm are provided in online supplemental file 3.
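As a rough illustration of the general LUR idea (not the algorithm in supplemental file 3, which is not reproduced here), the sketch below regresses site-level PM2.5 on two predictors using ordinary least squares. The predictor names, site values and coefficients are synthetic assumptions chosen only to show the mechanics.

```python
def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X holds one row of predictor values per monitoring site (no intercept
    column); the result is [intercept, b1, b2, ...].
    """
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; model cannot be fitted")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(coefs, x):
    """Predicted concentration at a location with predictor values x."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))


# Synthetic sites: (major road length within a buffer, km; population
# density, thousands/km2). Values are invented for illustration.
sites = [(0.5, 10.0), (1.0, 20.0), (2.0, 15.0),
         (3.0, 30.0), (1.5, 5.0), (2.5, 25.0)]
pm25 = [52.5, 75.0, 70.0, 105.0, 47.5, 92.5]  # synthetic, noise-free

coefs = fit_ols(sites, pm25)
estimate = predict(coefs, (2.0, 20.0))
print([round(c, 3) for c in coefs], round(estimate, 1))
```

In practice a statistical package would be used, with candidate predictors screened and the model checked by cross-validation against held-out sites.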

Development of mixed models for exposure reconstruction in rural households

HAP exposures in rural communities relying on solid cooking-fuel use are known to be complex and heterogeneous with considerable spatiotemporal variation within and across households.49 ,60 ,61 While epidemiological studies concerning HAP62 have often relied on group characteristics (such as fuel use) to serve as surrogate indicators of long-term exposures, they do not provide the required precision for exposure estimates in exposure–response studies. Individual measures (such as HAP measurements), on the other hand, are usually only feasible over the short term, resulting in biases in the estimates of long-term exposures. Mixed models that use both group and individual level measures have been attempted for child CO exposures in Guatemala.63 We plan to use similar methods that will utilise the results from serial measurements of household PM2.5 concentrations (performed over a ∼2-year period in a subset of the M-C and AC households) together with household/group characteristics to develop mixed effect models and compare alternative measures of PM2.5 exposures.
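To show why repeated measurements help with long-term estimates, the sketch below implements a deliberately simplified random-intercept model for balanced data: variance components are estimated by the ANOVA method of moments and each household mean is shrunk towards the overall mean. This is an assumption-laden illustration, not the planned TAPHE model; it omits the fixed effects for household/group characteristics that a full mixed model would include, and all values are synthetic.

```python
from statistics import mean


def shrunken_household_means(measurements):
    """Random-intercept estimates of long-term household means.

    measurements maps a household ID to a list of repeated concentration
    values; every household must have the same number (>= 2) of values.
    Returns (estimates, weight), where weight is the shrinkage factor
    applied to each household's deviation from the grand mean.
    """
    k = len(measurements)
    sizes = {len(v) for v in measurements.values()}
    if k < 2 or len(sizes) != 1:
        raise ValueError("need >= 2 households with equal numbers of repeats")
    n = sizes.pop()
    if n < 2:
        raise ValueError("need >= 2 repeated measurements per household")

    hh_means = {h: mean(v) for h, v in measurements.items()}
    grand = mean(hh_means.values())

    # Within- and between-household mean squares (one-way ANOVA)
    msw = sum(
        (x - hh_means[h]) ** 2 for h, v in measurements.items() for x in v
    ) / (k * (n - 1))
    msb = n * sum((m - grand) ** 2 for m in hh_means.values()) / (k - 1)

    var_between = max(0.0, (msb - msw) / n)
    denom = var_between + msw / n
    weight = var_between / denom if denom > 0 else 0.0

    estimates = {h: grand + weight * (m - grand) for h, m in hh_means.items()}
    return estimates, weight


# Synthetic repeated 24 h kitchen PM2.5 values (ug/m3) for three households
data = {"HH-A": [100.0, 120.0], "HH-B": [200.0, 220.0], "HH-C": [300.0, 320.0]}
estimates, weight = shrunken_household_means(data)
print(round(weight, 2), {h: round(v, 1) for h, v in estimates.items()})
```

The shrinkage weight approaches 1 when between-household variation dominates and falls towards 0 when day-to-day variation within households is large relative to differences between them.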

The TAPHE-Bio-repository Study

This component study aims primarily to create a bio-repository of cord and peripheral blood samples from participants in the M-C cohort and AC and to standardise available protocols for DNA extraction, single nucleotide polymorphism analyses for select candidate genes and DNA methylation within the SRU laboratory. The study is exploratory in its current form and specific plans to use the bio-repository samples are expected to be finalised at a later date. Details of sample archival and other laboratory protocols are furnished in online supplementary file 3.

Organisation of field data collection and management

Following enrolment, the data collection elements (as shown in figure 2) in the two cohorts involve planning for ∼2–4 visits/household in the AC and 8–24 visits/household in the M-C cohort, to collect health/exposure data. Further, while household visits to the two cohorts cover most of the data elements needed for the TAPHE component studies, additional field days are required for (1) recruitment and/or ANC data collection at PHCs/UHPs; (2) conduct of AAP measurements within community locations; and (3) retrieval of secondary health/air quality/meteorological survey data from relevant governmental offices.

The high frequency and volume of visits together with logistic/cost requirements for transport and deployment of field equipment necessitate substantial planning to draw up weekly and monthly field schedules. This entails making prior appointments by phone, lining up additional households within the same village/zone to accommodate unavailability and distributing the workload evenly across field teams. A typical month has provided approximately 20 field days with four 2-person teams involved in recruitment and exposure/health data collection (resulting in recruitment of approximately 60–70 participants and health/exposure measurements in 80–100 households every month). Two additional 2-person teams have been involved in data entry and/or laboratory procedures on a rotational basis during the month. A typical year has involved ∼9 months of field data collection with three intervening months devoted to data cleaning and processing, in preparation for semiannual and annual review committee visits. A flow chart describing the organisation of field teams and activities across the TAPHE studies is provided in figure 5.

Figure 5

Organisation of Field Teams for Data Collection in the TAPHE study. ANC, antenatal care; AC, adult cohort; CO, carbon monoxide; ETS, environmental tobacco smoke; M-C, mother–child; PAHs, polycyclic aromatic hydrocarbons; PFT, pulmonary function test; PHCs, primary healthcare centres; PM2.5, particulate matter; TAPHE, The Tamil Nadu Air Pollution and Health Effects; UHPs, urban health posts; VOCs, volatile organic compounds.

Following data collection, two levels of data quality check are routinely performed. At the first level, the collected information is verified by a senior supervisor on a weekly basis, and if any data collection errors are suspected, these fields are appropriately tagged. Information is then communicated to the field team so that missing or erroneous data can be collected during subsequent visits to the same household/recruitment location. Once the field forms are checked for completeness, they are stored in an access-restricted data storage room after removing identifier information. Supervisors subsequently hand over questionnaires/field forms to the data entry operators, while maintaining a log of what was handed over to the data entry team. Data are entered into an MS Access database (Microsoft Corp, Inc) designed to accommodate the data analysis requirements for the TAPHE study. The relational database architecture allows various tables in the database to be connected through the unique ID. The second level data quality check is carried out at this stage on a half-yearly basis, where simple frequencies are tabulated to find artifacts. These artifacts are then checked against the data forms for data entry errors.
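The second-level check based on simple frequencies could look something like the sketch below. The record layout, field name and list of permitted codes are hypothetical; the study database itself is in MS Access and is not shown here.

```python
from collections import Counter


def frequency_table(records, field):
    """Count how often each value of a field occurs across records."""
    return Counter(record.get(field) for record in records)


def flag_unexpected(records, field, allowed):
    """Return IDs of records whose field value is not a permitted code."""
    return [r["id"] for r in records if r.get(field) not in allowed]


# Synthetic records with a hypothetical 'primary_fuel' field
records = [
    {"id": "H001", "primary_fuel": "wood"},
    {"id": "H002", "primary_fuel": "LPG"},
    {"id": "H003", "primary_fuel": "wod"},  # likely data entry error
    {"id": "H004", "primary_fuel": "wood"},
]
allowed_codes = {"wood", "LPG", "kerosene"}

freq = frequency_table(records, "primary_fuel")
flagged = flag_unexpected(records, "primary_fuel", allowed_codes)
print(dict(freq), flagged)
```

Flagged IDs would then be looked up in the paper forms, as described above, before any value is corrected.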

Data analysis will primarily rely on bivariate and multivariate regression analysis using generalised linear regression models adapted to specific characteristics of each outcome/exposure. Analysis will begin with exploratory univariate analysis of the data using graphs and summary statistics, stratified by sociodemographic, spatial or fuel variables. These analyses will help describe the study characteristics, identify possibly important confounders and factors to be included in multivariate modelling, and identify outliers. Outliers will be checked for errors by comparison with original data collection forms, whenever possible. If no error is found, we will generally take the conservative approach of including all data in the analysis. If appropriate, secondary analyses may be reported, which exclude extreme values. Univariate exploratory analysis will also help to decide on transformations of some variables. Bivariate analysis will be used to examine relationships between each potential predictor and the outcome(s) of interest. Graphical displays such as side-by-side boxplots will be used to explore relationships and to identify possible nonlinearities. Multivariate modelling will be conducted using biological considerations regarding which variables should be included in the model, as well as statistical methods of variable selection, such as stepwise regression, deviance comparisons of various candidate models, and goodness-of-fit assessments. To allow for the nonlinear associations with exposure, alternative functional forms (polynomial or logarithmic) as well as smoothing techniques such as penalised linear spline regression models and local regression models will be used. We will also examine the presence of linear association by dividing exposure into multiple discrete categories (eg, by quintiles of exposure level or of exposed participants). Residual analysis of final models will be used to confirm model fit and identify influential data points.
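The categorical analysis by exposure quintile mentioned above can be sketched as follows, using only the Python standard library. The exposure and outcome values are synthetic, and the helper names are illustrative rather than part of any study protocol.

```python
from bisect import bisect_right
from statistics import mean, quantiles


def assign_quintiles(exposures):
    """Return a quintile index (0 = lowest, 4 = highest) per exposure."""
    cuts = quantiles(exposures, n=5)  # four cut points
    return [bisect_right(cuts, x) for x in exposures]


def mean_outcome_by_category(categories, outcomes):
    """Mean outcome within each exposure category."""
    grouped = {}
    for cat, value in zip(categories, outcomes):
        grouped.setdefault(cat, []).append(value)
    return {cat: mean(values) for cat, values in sorted(grouped.items())}


# Synthetic data: PM2.5 exposure (ug/m3) and an outcome such as
# birth weight (kg); invented purely for illustration.
exposure = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
outcome = [3.2, 3.1, 3.1, 3.0, 3.0, 2.9, 2.9, 2.8, 2.8, 2.7]

categories = assign_quintiles(exposure)
by_quintile = mean_outcome_by_category(categories, outcome)
print(categories)
print({q: round(m, 2) for q, m in by_quintile.items()})
```

Plotting or tabulating the category means against the quintile midpoints gives a quick visual check on the shape of the exposure–response relationship before a functional form is chosen.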

Ethics and dissemination

All participants are enrolled after providing written informed consent. Field staff are routinely instructed to observe strict rules of professional conduct and extend due courtesy to household members during the conduct of field investigations. Project data are maintained in strictest confidence by stripping all circulating study forms of identifying information and assigning alphanumeric IDs for households and participants. The Project principal investigator and senior coinvestigators inquire about issues of potential ethical concern (should they arise during field visits) at weekly group meetings. All such issues are resolved through telephonic or personal interactions. A summary of the field reports is also sent to the SRU Ethics Committee and the ICMR Technical Review Committee on an annual basis for review.

Two capacity-building workshops and a national dissemination workshop have been planned for the last year of the project. The copies of SOPs furnished as online supplementary files with this manuscript together with copies of the final report will be made freely available through the SRU project web-site as well as the ICMR web-site. Five conference presentations have been made together with three journal (review) publications thus far.27 ,61 ,64 The key results from each of the projects will be published in peer-reviewed media once approved by the technical review committee. An executive summary with specific policy implications is planned for circulation among key personnel in the Ministry of Health and Environment. Materials for awareness generation are also planned for circulation among the cohort communities.


In-depth exploration of exposure–response relationships in cohort studies has been responsible for important air quality actions in many parts of the world. Studies involving the MESA,65 Nurses Health66 and ACS,67 in the USA, and more recent studies involving the SAPALDIA68 and ESCAPE69 ,70 cohorts in Europe, have played a vital role in catalysing national standards policies concerning urban air quality while informing global efforts to estimate the burden of disease attributable to air pollution. The TAPHE study marks an important beginning of such efforts in India, where understanding the complexity of the rural–urban continuums in exposure and the differential impacts on maternal/child versus adult outcomes will be valuable in prioritising control strategies for air pollution. Estimates from exposure–response functions can also facilitate health impact assessments and inform intervention efforts. The study protocols described in the manuscript represent an effort to inform the scientific community of potential outputs from a major national project, while adding transparency and accessibility to the methods that have been conceptualised to fulfil each project objective.



  • Contributors KB, SS, PR, SG, VV, TG, KM, PJ and SP wrote the original grant proposal on which the study design and protocols are based. KB drafted this paper for which all authors and SRU-CAR team members provided critical inputs. NP and TG drafted many of the SOPs included with the manuscript. RSD and DKS provided inputs for protocol refinements during the review committee meetings. All authors approve of the manuscript contents.

  • Funding The TAPHE study is supported by a “Center for Advanced Research in Environmental Health: Air Pollution” grant awarded to Professor Kalpana Balakrishnan at Sri Ramachandra University, by the Indian Council of Medical Research, Government of India.

  • Competing interests None declared.

  • Ethics approval Institutional Ethics Committee, Sri Ramachandra University.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement The original data sets will be made available to interested researchers and other professional groups on specific request. This is expected to be feasible after the completion of the initial round of analyses and the first sets of publications. A data repository is being planned by the Indian Council of Medical Research, and the data sets from the TAPHE study are expected to be made available through this facility as well.
