

Cohort profile: the Baependi Heart Study—a family-based, highly admixed cohort study in a rural Brazilian town
  1. Kieren J Egan1,
  2. Malcolm von Schantz1,2,3,
  3. André B Negrão2,
  4. Hadassa C Santos2,
  5. Andréa R V R Horimoto2,
  6. Nubia E Duarte2,
  7. Guilherme C Gonçalves2,
  8. Júlia M P Soler4,
  9. Mariza de Andrade5,
  10. Geraldo Lorenzi-Filho6,
  11. Homero Vallada3,
  12. Tâmara P Taporoski3,
  13. Mario Pedrazzoli7,
  14. Ana P Azambuja8,
  15. Camila M de Oliveira2,9,
  16. Rafael O Alvim9,
  17. José E Krieger2,
  18. Alexandre C Pereira2
  1. 1Faculty of Health and Medical Sciences, University of Surrey, Guildford, UK
  2. 2Laboratory of Genetics and Molecular Cardiology, Heart Institute (Incor), University of São Paulo Medical School, São Paulo, Brazil
  3. 3Institute of Psychiatry, University of São Paulo Medical School, São Paulo, Brazil
  4. 4Department of Statistics, University of São Paulo, São Paulo, Brazil
  5. 5Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
  6. 6Sleep Laboratory, Pulmonary Division, Heart Institute (Incor), University of São Paulo Medical School, São Paulo, Brazil
  7. 7School of Arts, Science, and Humanities, University of São Paulo, São Paulo, Brazil
  8. 8Natura Innovation and Product Technology Ltd., Cajamar, SP, Brazil
  9. 9Department of Physiology, Federal University of Juiz de Fora, Juiz de Fora, Brazil
  1. Correspondence to Dr Malcolm von Schantz; m.von.schantz{at}


Purpose Cardiovascular disease (CVD) is a major challenge to global health. The same epidemiological transition scenario is replayed as countries develop, but with variations based on environment, culture and ethnic mixture. The Baependi Heart Study was set up in 2005 to develop a longitudinal family-based cohort study that reflects on some of the genetic and lifestyle-related peculiarities of the Brazilian populations, in order to evaluate genetic and environmental influences on CVD risk factor traits.

Participants Probands were recruited in Baependi, a small rural town in the state of Minas Gerais, Brazil, followed by first-degree and then increasingly more distant relatives. The first follow-up wave took place in 2010, and the second in 2016. At baseline, the study evaluated 1691 individuals across 95 families. Cross-sectional data have been collected for 2239 participants.

Findings to date Environmental and lifestyle factors and measures relevant to cardiovascular health have been reported. Having expanded beyond cardiovascular health outcomes, the phenotype datasets now include genetics, biochemistry, anthropometry, mental health, sleep and circadian rhythms. Many of these have yielded heritability estimates, and a shared genetic background of anxiety and depression has recently been published. In spite of universal access to electricity, the population has been found to be strongly shifted towards morningness compared with metropolitan areas.

Future plans A new follow-up, marking 10 years of the study, is ongoing in 2016, in which data are collected as in 2010 (with the exception of the neuropsychiatric protocol). In addition to this, a novel questionnaire package collecting information about intelligence, personality and spirituality is being planned. The data set on circadian rhythms and sleep will be extended through additional questionnaires, actimetry, home sleep EEG recording and dim light melatonin onset (DLMO) analysis. Finally, the anthropometric measures will be expanded by adding three-dimensional facial photography, voice recording and anatomical brain MRI.


Strengths and limitations of this study

  • High degree of admixture and traditional lifestyle, both uniquely typical of rural Brazil.

  • Multiple phenotypes of different categories coupled with genomic and metabolomic data.

  • Family-based study design adds heritability information to all variables collected.

  • The legal preclusion of payment to participants is offset by high compliance within the community, but still limits the possibility of complex and/or intrusive study protocols.

  • Significant levels of illiteracy complicate the collection of some datasets, and the use of scribes may cause reporting bias.


Owing to multiple distinct waves of immigration, the Brazilian population comprises a wide variety of ethnic backgrounds. Over successive generations, a considerable amount of admixture has arisen within the population, primarily involving the first groups to arrive (native Americans, Europeans, and Africans).1–3 In the 2010 census, 43.1% of the population defined themselves as mixed race, 47.7% as white, 7.6% as black, 1.1% as ‘yellow’ (of Asian descent), and 0.4% as indigenous.4

Similar to many developed countries, cardiovascular disease (CVD) is a major non-communicable disease in Brazil.5 CVD is the leading cause of disability and death, and current statistics suggest that it accounts for 31.2% of deaths worldwide6 and 31.5% in Brazil.7 Despite a number of healthcare breakthroughs within the last decade, including more efficient diagnostic and therapeutic procedures, CVD poses a considerable challenge. A number of modifiable environmental risk factors have been identified (eg, smoking, diet and physical exercise); however, improved awareness does not necessarily equate to behavioural change.8 Therefore, an improved understanding of the genetics behind CVD, and its interaction with environmental factors, would both improve our understanding of CVD and help prioritise public health resources.

The aetiology of CVD is complex, and involves metabolic, neuroendocrine and genetic interactions.9–12 Despite a number of lines of evidence suggesting genetic influences, there has been an inconsistency of findings across studies and it is still unknown to what extent the proposed genetic effects are occurring through major loci or multiple distinct loci acting in concert.

Brazil lends itself well to the study of environmental contributions to CVD risk profiles. The rapid population growth and industrialisation of Brazil in recent years have established some of the most populated urbanised areas in the world (84% of the population live in urban environments),4 creating an environmental and lifestyle gap between the metropolitan populations and the still sizable rural ones. Similar to other developed countries, studies have suggested a relatively high prevalence of poor diet,7 sedentary lifestyle and insufficient exercise.13 Accordingly, 52.5% of Brazilians are overweight, and a subset of these (17.9% of the general population) are classified as obese.14

A substantial element of admixture is both an emblematic feature of the Brazilian population and a facilitating factor in the search for genetic associations. Admixed populations have long been recognised as offering a shortcut towards identifying genomic regions associated with disease or other inheritable traits,15 an approach known as ‘mapping by admixture disequilibrium’.16 The original aim of this project was to develop a longitudinal family-based cohort study that reflects on some of the genetic and lifestyle-related peculiarities of the Brazilian population in order to help evaluate genetic and environmental influences on cardiovascular risk factor traits. The overall goal was to quantify and characterise the interindividual variation in common cardiovascular risk factors, and disentangle its genetic and environmental components. By collecting and analysing a uniquely wide range of phenotypic and phenomic information, the study aims to serve as a generator of hypotheses for future investigation.

Cohort description

The study focuses on the population of Baependi, a town in a rural area (752 km², 18 307 inhabitants at the 2010 census) located in the state of Minas Gerais, Brazil (21.95°S, 44.88°W). It is a traditional community, with a cohesive culture, a high degree of admixture and very limited migration (table 1). Median monthly income at the census date was BRL 510, and 28.1% of inhabitants aged 60 or above and 7.7% of those aged between 24 and 59 years old were illiterate.4 No individual living in the town was born in another country, and 99.0% of the population was classified as being born in the South-East region of Brazil.17

Table 1

Basic description of the study population and comparison with the 2010 Census data

The initial data collection phase of the Baependi Heart Study took place between December 2005 and January 2006. Prior to recruitment, the project was advertised through provincial, religious and municipal authorities, in local television, newspaper and radio messages, through physicians, and by phone calls. For physical examination, a clinic was established in the centre of Baependi. Currently, a general practitioner funded through the project operates a clinic in these premises, where health concerns identified through the study are followed up.

Probands were identified from the community in 11 census districts (from a possible 12). Residential addresses within each district were then randomly selected (by randomly selecting first a street, and second a household on that street). Once a proband in this household, 18 years or older, was enrolled, all his/her first-degree (eg, parents, siblings and offspring), second-degree (eg, half-siblings, grandparents/grandchildren, aunts/uncles, nieces/nephews and double cousins), and third-degree (eg, first cousins, great-uncles/great-aunts, and great-nephews/great-nieces) relatives and his/her respective spouse's relatives, who were at least 18 years old, were invited to participate. After the proband's first contact, first-degree relatives were invited to participate by phone; these included all living relatives in the municipality of Baependi (the urban zone, where 72% of the population lives,4 and the geographically larger rural zone) and beyond. Ninety-five families were selected, with 1691 individuals participating in some or all of the measurements of the study; they comprise more than 10% of the local population (see figure 1 for overview of study population).
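The two-stage address selection described above (first a street, then a household on that street) can be sketched as follows. This is a minimal illustration only: the district layout, street names, household identifiers and seed are invented, and the source does not state which tool or randomisation scheme the investigators actually used.

```python
import random


def sample_household(streets, rng):
    """Two-stage draw: a street uniformly at random, then a household on it."""
    street = rng.choice(sorted(streets))
    household = rng.choice(streets[street])
    return street, household


def inclusion_probability(streets, street):
    """Chance that one given household on `street` is picked in a single draw."""
    return 1 / (len(streets) * len(streets[street]))


# Hypothetical census district; all names and IDs are invented for illustration.
district = {
    "Street A": ["A-1", "A-2", "A-3"],
    "Street B": ["B-1"],
    "Street C": ["C-1", "C-2"],
}

rng = random.Random(2005)  # fixed seed so the draw is reproducible
street, household = sample_household(district, rng)
print(street, household)
```

Under this scheme a household on a short street is more likely to be drawn than one on a long street, which is one reason a sample selected this way need not be strictly representative of all households.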

Figure 1

Flow diagram of study population at baseline and follow-up.

The mean pedigree size was 24.2±31.8, and most of them (63%) encompassed three or four generations. There were 640 sibships with a mean size of 2.4±1.9 and the following numbers of main pairs of relatives: 3138 parent/offspring, 2253 sibling, 2590 grandparent/grandchild, 4418 avuncular, 40 half-siblings and 3743 cousins.
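As a rough illustration of the information these relative pairs carry, the reported pair counts can be grouped by the expected coefficient of relatedness for each relationship type. The counts are taken from the text; the coefficients are the standard textbook values, and treating "cousins" as first cousins is an assumption made here, not something the source states.

```python
from fractions import Fraction

# Pair counts as reported in the text.
pair_counts = {
    "parent/offspring": 3138,
    "sibling": 2253,
    "grandparent/grandchild": 2590,
    "avuncular": 4418,
    "half-sibling": 40,
    "cousin": 3743,
}

# Expected coefficient of relatedness per pair type (standard values).
# Assumption: "cousin" is read as first cousin.
relatedness = {
    "parent/offspring": Fraction(1, 2),
    "sibling": Fraction(1, 2),
    "grandparent/grandchild": Fraction(1, 4),
    "avuncular": Fraction(1, 4),
    "half-sibling": Fraction(1, 4),
    "cousin": Fraction(1, 8),
}


def pairs_by_relatedness(counts, coefficients):
    """Sum pair counts over relationship types sharing the same coefficient."""
    grouped = {}
    for kind, n in counts.items():
        r = coefficients[kind]
        grouped[r] = grouped.get(r, 0) + n
    return grouped


by_r = pairs_by_relatedness(pair_counts, relatedness)
total_pairs = sum(pair_counts.values())

for r in sorted(by_r, reverse=True):
    print(f"r = {r}: {by_r[r]} pairs")
print("total:", total_pairs)
```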

Medical history was collected from all participants at baseline. Described in the following are outcomes (incidence information derived from longitudinal data collection) and exposures (cross-sectional data collected at baseline or at subsequent study visits). The first 5-year follow-up took place in 2010, and alongside this, specific data collections have been added. The 2010 follow-up involved both renewed DNA sampling and measures of major cardiovascular risk factors. Cardiovascular events (end points such as myocardial infarction, heart failure, coronary insufficiency) and procedures (such as hospitalisations, surgery and the need for percutaneous coronary intervention) are recorded regularly; individuals are followed up annually by telephone contact and every 5 years during office visits. The attrition rate (based on deaths and follow-up refusals) was 23%. In the 2010 follow-up, 548 individuals were added (figure 2).

Figure 2

Admixture proportions for the Baependi cohort inferred by the ADMIXTURE software. The X-axis represents individuals from Baependi sorted according to their ancestries. Each individual is represented by a vertical column of colour-coded admixture proportions of the ancestral populations (from HapMap), CEU (Utah residents (CEPH)) and Toscani in Italia (TSI) (European, blue), Human Genome Diversity Project (HGDP), Pima and Maya (Native American, red) and Yoruba in Ibadan, Nigeria (YRI), Luhya in Webuye, Kenya (LWK) and Americans of African ancestry in SW USA (ASW) (African, green).

A new follow-up, marking 10 years of the study, is ongoing, in which data are collected as in 2010 (with the exception of the neuropsychiatric protocol). Taken together, baseline measurements represent 2239 individuals. For follow-up percentages of individual outcomes, see table 2.
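The figure of 2239 individuals appears to be the sum of the two recruitment counts reported earlier. The short check below makes that arithmetic explicit; reading 2239 as this sum is an interpretation of the reported numbers, and the variable names are illustrative.

```python
baseline_2005 = 1691  # individuals evaluated at baseline (95 families)
added_2010 = 548      # individuals added in the 2010 follow-up

total_with_baseline_data = baseline_2005 + added_2010
print(total_with_baseline_data)  # 2239
```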

Table 2

Follow-up rate across core variables

A general questionnaire was initially administered to each participant by a trained technician to cover basic attributes such as family relationships and demographic characteristics. For participants who were illiterate or visually impaired, questionnaires were read in their entirety by the researcher acting as a scribe. Where an individual's capacity to complete a specific question was limited, assistance was provided as needed by either the researcher or a caregiver.

The investigators also used the WHO MONICA epidemiological instrument18 which had previously been successful within other epidemiological projects. While the core data about cardiovascular risk factor prevalence and heritability were collected at baseline as well as at follow-up, and have been described in detail previously,19 additional parameters have subsequently been recorded pertaining to biochemistry, sleep and mental health. For a full list of study variables measured see table 2. The following types of information were collected (for details of variables and distribution estimates, see table 3): general lifestyle and health, cardiovascular health, biochemistry, anthropometry, mental health and sleep and circadian rhythms.

Table 3

Descriptive statistics of studied phenotypes, distributed by gender (cross-sectional type data, parentheses show SD)

In addition, leucocyte DNA samples have been collected and used to probe Affymetrix 6.0 arrays, and data imputed using the 1000 Genomes Cosmopolitan Panel. Analysis of genomic ancestry was conducted using the Admixture software.20

Each participant provided informed written consent before participation.

Findings to date

The study has begun to make significant contributions to the understanding of a variety of conditions including the heritability of risk factors for cardiovascular events. Many of these have yielded heritability estimates, summarised in table 4. The initial publication19 determined the extent to which unmeasured genetic factors and measured environmental and lifestyle factors contributed to variation in a large panel of cardiovascular-related traits. Subsequent studies identified that age-at-onset is a useful trait for gene mapping of common complex diseases21 and reported that heterogeneity in trait variances needs to be accounted for in the design and analyses of trait orientated gene finding studies.22 Lifestyle factors, including physical activity23 and smoking24 have also been a focus of the study.
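To give a feel for what a heritability estimate measures, the sketch below uses a classical textbook technique: the slope of the regression of offspring values on midparent values, which estimates narrow-sense heritability under standard assumptions. This is not the method of the cited publications, which the text here does not describe; the data are invented and noise-free purely for illustration.

```python
def regression_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x has zero variance")
    return sxy / sxx


# Invented toy trait values: offspring = 60 + 0.5 * midparent, with no noise.
midparent = [110.0, 120.0, 130.0, 140.0]
offspring = [60.0 + 0.5 * m for m in midparent]

# The slope of offspring on midparent serves as the heritability estimate.
h2 = regression_slope(midparent, offspring)
print(h2)
```

In extended pedigrees such as those in this cohort, estimates are more commonly obtained from models that use all relative pairs at once rather than parent-offspring pairs alone; the regression above is only the simplest instance of the idea.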

Table 4

Estimates from heritability studies and comparisons with previously published estimates

The study has been extended to also include aspects of mental health, another major non-communicable disease complex forming a large burden to society. The most recent report emanating from the study showed that anxiety and depression, measured as continuous variables within the general population rather than as clinical diagnoses, shared a significant genetic background.25 Data on skin properties of the study participants have also been collected, and a report of significant associations between stratum corneum moisture and sex, age, high sun exposure, and use of sunscreen has been published.26 Sleep and circadian rhythms, having important links with cardiovascular health,27,28 are a focus of the project, including a general screen for sleep apnoea, which was diagnosed in 18.6% of the population and had heritability estimates similar to those reported in studies of urban populations (see table 4).29 A study of diurnal preference (chronotype) indicated that, in spite of universal access to electricity, the Baependi population was strongly shifted towards morningness, particularly in the rural zone.17

In parallel with these primary interests have been developments in statistical methodologies due to the family-based structure of the study.30,31 For example, the use of genome-wide association study (GWAS) techniques in family-based studies is susceptible to confounders. Therefore, researchers used simulations from two datasets including the Baependi study to derive coefficients which incorporate the relatedness of individuals within family-based designs.31 These estimates suggest that family structure is important for the estimation of global individual ancestry for extended pedigrees, but not for siblings. A second key statistical output from the cohort concerns the complexity of multiple CVD phenotypes, which makes disease diagnosis and genetic dissection difficult. Using the cohort data, the groups were able to propose empirically guided statistical methodologies which account for all categorical phenotypes, allowing informative comparison among individuals.30

Strengths and limitations

Over the last decade, the Baependi cohort study has begun to fulfil its potential of making significant contributions to our understanding of the heritability across a variety of different health conditions. A number of study attributes make this study population unique, including the substantial admixture, the conservative and cohesive lifestyle patterns, and the increasingly rich tapestry of both cardiovascular and other variables extracted at baseline and follow-up.

The researchers did not set out with a specific target population number and, in contrast to some longitudinal studies, there was no specific intention to provide a representative sample of the local population. However, the study sample was obtained using randomised procedures, whereby the researchers attempted to achieve equal representation of all socioeconomic groups. Equally, the selection was not tied to any specific phenotype of interest, and therefore our sample is relatively free from disease-specific bias. Further, the multigenerational nature of the study means that it extends beyond the conventional nuclear family unit.

The legal preclusion in Brazil of offering payments to study volunteers is a general limitation, especially for more complex and intrusive study protocols. Nonetheless, thanks to the engagement and support of the community, and the provision of a general practitioner in the field station, the retention rate of the study remains very high.32

Although the wide variety of outcomes recorded is a key strength of the study, the extensive use of scribes and face-to-face interviews may have exacerbated reporting bias (eg, for diurnal preference, smoking, alcohol). To control for outcome variability, wherever possible, multiple measurements were collected (eg, systolic and diastolic blood pressure19). However, some analyses were based on single samples (eg, triglycerides19), whereas repeated measurements would probably have given more reliable estimates of individual range.

The family-based study design is noteworthy because, unlike cohorts of independent participants, it adds information (family structure) to any variable that is statistically estimated; in practice, heritability estimates can be generated from any desired variable. Although further follow-up of individuals is ongoing (and further follow-ups are anticipated), the 5-year follow-up is insufficient to characterise the long-term trajectories of individuals. Another consideration is that the local population has a relatively low level of education and income. These covariates have been considered carefully, particularly where lower socioeconomic status is known to raise the risk for specific health conditions such as anxiety and depression.33,34


The database of the project is under continuous development as new datasets are being added. Data will be made available in public repositories as and when required and appropriate, as publications based on datasets are being accepted. The project already involves multiple collaborations, and the investigators welcome new angles of analyses on existing data sets and proposals for new ones. Proposals can be forwarded to ACP (email: for discussion at the steering committee meeting.


The authors wish to thank the Municipal Council of Baependi for logistical support and assistance with field work, the dedicated staff at the field station and the participants of the study. They are particularly grateful to Rerisson Faria Lima for helping in the preparation of this manuscript.




  • Twitter Follow Malcolm von Schantz at @mvonschantz

  • Contributors CMdO, JEK, and ACP all contributed to the conception and design of the study. MvS, ABN, HCS, ARVRH, JMPS, MdA, GL-F, HV, TPT, MP, AA, CMdO, ROA and ACP contributed to the development of methods and data collection. KJE, MvS, ABN, HCS, ARVRH, JMPS, MdA, GL-F, HV, TPT, MP, AA, CMdO, ROA, JEK and ACP were involved in data analysis and interpretation. KJE and MvS drafted the work. All the authors revised this article and approved the final version to be published.

  • Funding This study was supported by awards from FAPESP to ACP, JEK, ARVRH and MP (grants 2007/58150-7, 2010/51010-8, 2011/05804-5, 2013/17368-0), from CNPq to ACP, JEK, HV, ARVRH and MvS (150653/2008-5, 481304/2012-6, and 400791/2015-5), Fundação Zerbini and Hospital Samaritano, and by the Global Innovation Initiative to MvS (jointly funded by the British Council and the UK Department of Business and Skills).

  • Competing interests None declared.

  • Ethics approval The study protocol conformed to the tenets of the Declaration of Helsinki, and was approved by the Ethics Committee of the Hospital das Clinicas, University of São Paulo, Brazil (approval number 0494/10).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement Researchers can apply for data and biomaterial by submitting a proposal to the principal investigator, ACP (
