
Model-based recursive partitioning to identify risk clusters for metabolic syndrome and its components: findings from the International Mobility in Aging Study
  1. Catherine M Pirkle (1),
  2. Yan Yan Wu (1),
  3. Maria-Victoria Zunzunegui (2),
  4. José Fernando Gómez (3)
  1. Office of Public Health Studies, University of Hawaiʻi at Mānoa, Honolulu, Hawaii, USA
  2. Institut de recherche en santé publique, Université de Montréal, Montréal, Canada
  3. Facultad de Ciencias para la Salud, Universidad de Caldas, Manizales, Colombia
  Correspondence to Dr Catherine M Pirkle; cmpirkle{at}


Objective Conceptual models underpinning much epidemiological research on ageing acknowledge that environmental, social and biological systems interact to influence health outcomes. Recursive partitioning is a data-driven approach that allows for concurrent exploration of distinct mixtures, or clusters, of individuals that have a particular outcome. Our aim is to use recursive partitioning to examine risk clusters for metabolic syndrome (MetS) and its components, in order to identify vulnerable populations.

Study design Cross-sectional analysis of baseline data from a prospective longitudinal cohort called the International Mobility in Aging Study (IMIAS).

Setting IMIAS includes sites from three middle-income countries—Tirana (Albania), Natal (Brazil) and Manizales (Colombia)—and two from Canada—Kingston (Ontario) and Saint-Hyacinthe (Quebec).

Participants Community-dwelling male and female adults, aged 64–75 years (n=2002).

Primary and secondary outcome measures We apply recursive partitioning to investigate social and behavioural risk factors for MetS and its components. Model-based recursive partitioning (MOB) was used to cluster participants into age-adjusted risk groups based on study site, sex, education, living arrangements, childhood adversities, adult occupation, current employment status, income, perceived income sufficiency, smoking status and weekly minutes of physical activity.

Results 43% of participants had MetS. Using MOB, the primary partitioning variable was participant sex. Among women from middle-income sites, the predicted proportion with MetS ranged from 58% to 68%. Canadian women with limited physical activity had elevated predicted proportions of MetS (49%, 95% CI 39% to 58%). Among men, the predicted proportion with MetS ranged from 26% to 41%, depending on childhood social adversity and education. Clustering for MetS components differed from that for the syndrome and across components. Study site was a primary partitioning variable for all components except HDL cholesterol. Sex was important for most components.

Conclusion MOB is a promising technique for identifying disease risk clusters (eg, vulnerable populations) in modestly sized samples.

  • recursive partitioning
  • metabolic syndrome
  • older adults
  • global health

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:


Strengths and limitations of this study

  • Explores social and behavioural risk clustering for metabolic syndrome among community-dwelling older adults from five diverse global settings.

  • Applies model-based recursive partitioning (MOB), which is more intuitive and computationally efficient than classification and regression trees (CART), to identify risk clusters.

  • Provides an example of how MOB can be used in a modestly sized sample for hypothesis generation about complex admixtures of risk factors.

  • Lacks data on participant diet, which likely clusters with many of the social and behavioural factors examined.

  • Strong contextual influences may have masked variance attributable to individual behaviours.


With ageing, life’s hazards and rewards amass and become embodied in ways that diminish or protect health. Differences in health trajectories are the product of cumulative risk and protective factors that are programmed into biobehavioural regulatory systems.1 The cardiometabolic pathologies commonly observed in older adults (partially) reflect the collective burden exacted on their bodies as they adapt to life’s challenges.2 The types and magnitude of challenges that bodies are exposed to vary across societies and time, as these reflect underlying social orders with regard to the distribution of economic, political and social resources.2 Research that purposefully compares populations of older adults across heterogeneous societies may inform our understanding of modifiable social and behavioural factors that influence the dysregulation of biological systems. Of importance, social norms and patterning are capable of creating toxic or protective clusters that manifest among identifiable subgroups.3 Such information is useful for directing public health interventions and for considering how contextual conditions render groups particularly vulnerable.

Metabolic syndrome (MetS) is a highly prevalent health condition among older adults; it confers an approximately twofold increased risk of cardiovascular disease and a fivefold increased risk of diabetes.4 It entails a constellation of components including obesity, impaired glucose metabolism, hypertension and atherogenic dyslipidaemia.4 In older adults, MetS prevalence varies considerably across populations. Among older adults in the USA and Europe, MetS prevalence is estimated at 30%4 5; in urban China researchers estimate a prevalence of 60%.6 In Canada, MetS prevalence increases with chronological age; approximately 40% of adults aged 60–79 years were estimated to have the syndrome according to the 2007–2009 Canadian Health Measures Survey.7 Studies of older adults frequently document greater prevalence in women than in men,6–10 but report inconsistent associations with income and education. Some document linear associations between education and MetS prevalence, with the least educated at highest risk.10 Others document non-linear associations, in which the highest risk groups are lower-middle income earners and high school graduates, while those in the lowest income group and without a high school education are slightly protected.7 Yet others observe strong associations with education but not income,10 or associations with education and income in one sex but not the other.10 11 Heterogeneity in MetS prevalence likely reflects complex, contextually specific risk admixtures.

Epidemiological research on ageing explicitly acknowledges that environmental, social, psychological and biological systems interact to influence health outcomes.1 3 The well-known ecological model posits that patterns of health are affected by a dynamic interplay among these factors across the life course.2 12 A challenge for epidemiologists, especially with modest sample sizes, is to operationalise models that assume the joint effects of multiple risk factors on health conditions, such as MetS.13

Recursive partitioning is a technique that allows for exploration of distinct mixtures, or clusters, of individuals that have a particular outcome. Based on a set of candidate independent variables, it can produce classification trees with a series of binary splits highlighting subgroups with relatively similar risk profiles for a given outcome.14 15 The classification trees depict the joint effects of multiple risk factors.15 It is a data-driven approach with the potential to identify complex interactions worthy of future investigation.15 Researchers have applied partitioning techniques to identify high-risk subgroups for cardiovascular disease, diabetes and falls in population-based studies.16–18 Most of this work examines clinical or genetic risk factors, but the same technique can be expanded to examine social and behavioural risk factors.
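The basic operation that recursive partitioning repeats is a single binary split. For a continuous candidate variable (weekly walking minutes, say), one way to sketch this is an exhaustive search over cut-offs that maximises a likelihood-ratio statistic comparing one common outcome proportion with separate proportions on either side of the cut-off. This is a minimal illustration assuming a 0/1 outcome and no covariate adjustment; it is not the implementation in the R package used by the authors, and the function names and toy data are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def binomial_loglik(k: int, n: int) -> float:
    """Binomial log-likelihood evaluated at the maximum-likelihood proportion k/n."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(k / n)
    if k < n:
        ll += (n - k) * math.log((n - k) / n)
    return ll


def best_cutoff(x: List[float], y: List[int],
                min_size: int = 1) -> Optional[Tuple[float, float]]:
    """Find the split x <= c versus x > c that best separates a 0/1 outcome y.

    Returns (c, G), where G is the likelihood-ratio statistic comparing one
    common proportion with separate proportions on each side of c, or None
    if no admissible cut-off exists.
    """
    pairs = sorted(zip(x, y))
    n, total = len(pairs), sum(y)
    best: Optional[Tuple[float, float]] = None
    k_left = 0
    for i in range(n - 1):
        k_left += pairs[i][1]      # outcomes among the i + 1 smallest x values
        n_left = i + 1
        if pairs[i][0] == pairs[i + 1][0]:
            continue               # cannot cut between tied x values
        if n_left < min_size or n - n_left < min_size:
            continue               # both subgroups must be large enough
        g = 2 * (binomial_loglik(k_left, n_left)
                 + binomial_loglik(total - k_left, n - n_left)
                 - binomial_loglik(total, n))
        if best is None or g > best[1]:
            best = (pairs[i][0], g)
    return best


# Hypothetical toy data: weekly walking minutes and a 0/1 outcome.
print(best_cutoff([0, 5, 10, 20, 30, 60], [1, 1, 1, 0, 0, 0]))
```

Applying the same search again within each resulting subgroup is what makes the procedure recursive.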

In an international, multisite cohort of community-dwelling older adults, we apply recursive partitioning to investigate social and behavioural risk factors for MetS and its components. Our objectives are to assess whether there are social/behavioural risk clusters for those with MetS or its components, and whether these risk clusters vary across societies. To date, we know of no other studies that use recursive-partitioning techniques, informed by a social epidemiological perspective, to investigate predictors of MetS.


Data source and study populations

This is an analysis of 2012 baseline data from the International Mobility in Aging Study (IMIAS). IMIAS is composed of community-dwelling older adults, 65–74 years of age. This study comprises three sites in middle-income countries—Tirana (Albania), Natal (Brazil) and Manizales (Colombia)—and two from a high-income country—Kingston (Ontario, Canada) and Saint-Hyacinthe (Quebec, Canada). These cities represent diverse ways of living in distinct societies, providing a wide range of risk factors and outcomes. For example, Tirana is the capital of an ex-communist country in rapid transition to capitalism, while Manizales, in the Andean coffee-growing region, is relatively affluent and of Catholic tradition. Approximately 200 men and 200 women were recruited at each site, for a total sample of 2002. A detailed description of the study sites and cohort is available elsewhere.19


In Tirana, Manizales and Natal, we recruited participants through their neighbourhood primary care centres by selecting a random sample of the older adults registered at each.19 The response rate was over 90%.19 Ethics committees in Canada prohibited researchers from directly contacting potential participants, so invitations to participate in the project were sent indirectly via family physicians.19 Thirty per cent of people who received a letter of invitation from their doctor in Kingston and Saint-Hyacinthe contacted the IMIAS research team, and 95% of these agreed to participate.19 Comparison with 2006 Canadian census data suggests that participants in Kingston were more educated than the general population of that city, while participants from Saint-Hyacinthe had educational levels similar to those of the city’s inhabitants. Otherwise, the characteristics of those recruited were very similar to those of the sampling frame.19 At all sites, over 80% of older adults were registered at a health centre or had a primary care physician19; it is therefore unlikely that our recruitment strategy systematically excluded a large segment of the older adult population.

Exclusion criterion

Participants with four or more errors on the orientation scale of the Leganes Cognitive Test20 were excluded; such low scores indicate an inability to complete study procedures.


Study procedures were carried out at the participant’s home, in the local language, by a trained interviewer. Detailed descriptions of data collection procedures are provided elsewhere.19

Metabolic syndrome

Except for the measure of insulin resistance, we defined MetS according to the Adult Treatment Panel III criteria.21 Because IMIAS did not collect fasting glucose, the corresponding glycosylated haemoglobin (HbA1c) value was used instead.22 Thus, MetS was defined as the presence of three or more of the following: abdominal obesity measured by waist circumference (women >88 cm; men >102 cm); elevated triglycerides (≥150 mg/dL); low high-density lipoprotein cholesterol (HDL-C men <40 mg/dL; HDL-C women <50 mg/dL); elevated HbA1c (≥5.7%); and high blood pressure (≥135 mm Hg systolic and/or ≥85 mm Hg diastolic).
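Because the definition reduces to counting components, it can be sketched as a small classification function. This minimal Python sketch uses only the thresholds listed in the paragraph above; the `Measures` container, field names and function names are hypothetical, not part of IMIAS.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Measures:
    """Hypothetical container for one participant's MetS-related measurements."""
    sex: str                    # "F" or "M"
    waist_cm: float
    triglycerides_mg_dl: float
    hdl_mg_dl: float
    hba1c_pct: float
    sbp_mm_hg: float
    dbp_mm_hg: float


def mets_components(m: Measures) -> Dict[str, bool]:
    """Flag each MetS component using the thresholds stated in the text."""
    female = m.sex == "F"
    return {
        "abdominal_obesity": m.waist_cm > (88 if female else 102),
        "high_triglycerides": m.triglycerides_mg_dl >= 150,
        "low_hdl": m.hdl_mg_dl < (50 if female else 40),
        "high_hba1c": m.hba1c_pct >= 5.7,    # used in place of fasting glucose
        "high_bp": m.sbp_mm_hg >= 135 or m.dbp_mm_hg >= 85,
    }


def has_mets(m: Measures) -> bool:
    """MetS is present when three or more of the five components are present."""
    return sum(mets_components(m).values()) >= 3


# Hypothetical example: a woman with abdominal obesity, high triglycerides and low HDL-C.
print(has_mets(Measures("F", 90, 160, 45, 5.5, 120, 80)))
```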

Socioeconomic and demographic characteristics

We categorised education as: less than secondary school and/or illiterate, some secondary school to completed secondary school and postsecondary education. A participant’s living arrangement was determined with the following questions: Do you live alone (yes/no)? (If no) Who do you live with? Responses were then categorised as: alone, spouse only and multiple people.

We determined exposure to childhood adversity with two scales, each ranging from 0 to 3: childhood social adversity (parental alcohol or drug abuse, witnessing family physical violence and having been physically abused) and childhood economic adversity (poor economic status, hunger and parental unemployment).23 Occupation was grouped into five categories according to self-reported longest-held occupation (based on International Labour Organization categories): non-manual, service, agriculture, manual and housewife. We enquired about current annual income. Based on the annual minimum salary at each site, individuals were categorised as poor, middle, upper middle or high income. For example, in Canada the minimum salary is $C19 680 per year; we therefore categorised Canadian participants as poor if they earned less than $C20 000 per year. Those who earned more than the minimum salary but less than twice it ($C20 000–39 999) were classified as middle income, those who earned at least twice but less than three times the minimum salary ($C40 000–59 999) as upper middle income, and those who earned three times the minimum salary or more (≥$C60 000) as high income. The same procedure was applied at each site using the site-specific minimum salary. Income sufficiency was assessed with the question: to what extent does your income allow you to meet your needs? Responses were categorised as very sufficient, sufficient and not (at all) sufficient. We asked participants about their work in the past 2 weeks and categorised them as: worked with remuneration; worked without remuneration; had a job, but did not work; retired or pensioned; and did not work. We also asked participants about their smoking status; responses were categorised as regular, occasional, former and never smoker. Finally, we assessed minutes of physical activity per week with a validated computer-animated assessment tool.24
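The site-specific income categories and the adversity scales can be expressed as two short helper functions. This is an illustrative sketch, assuming the per-site threshold equals the rounded annual minimum salary (as in the Canadian example, $C20 000) and that each adversity scale is the count of its three yes/no items; the function names are hypothetical.

```python
from typing import Sequence


def income_category(annual_income: float, threshold: float) -> str:
    """Classify income by multiples of a site-specific threshold (minimum salary)."""
    if annual_income < threshold:
        return "poor"
    if annual_income < 2 * threshold:
        return "middle"
    if annual_income < 3 * threshold:
        return "upper middle"
    return "high"


def adversity_score(items: Sequence[bool]) -> int:
    """Score one childhood adversity scale (0-3) as the number of items reported."""
    if len(items) != 3:
        raise ValueError("each adversity scale has exactly three items")
    return sum(bool(item) for item in items)


# Canadian example from the text, with the threshold rounded to $C20 000.
for income in (15_000, 25_000, 45_000, 75_000):
    print(income, income_category(income, 20_000))
```

Expressing the cut-offs as multiples of one threshold keeps the same rule applicable at every site; only the threshold changes.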

Statistical analysis

Descriptive statistics summarise overall sample characteristics. Because the distributions of some measures were positively skewed, we report the median and the first and third quartiles for all continuous variables. We performed χ2 tests to investigate the associations between MetS and categorical independent variables, and two-sample t-tests to examine mean differences in biomarkers. Missing physical activity data (n=59) were imputed with a random forest method.25
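For readers unfamiliar with the χ2 test of association, the Pearson statistic compares observed counts in a contingency table with the counts expected if MetS were independent of the categorical variable. A minimal sketch follows; the P value helper is valid only for 2 x 2 tables (1 degree of freedom; larger r x c tables have (r-1)(c-1) degrees of freedom), the counts in the usage line are made up, and, unlike R's `chisq.test`, which by default applies a continuity correction to 2 x 2 tables, this sketch applies none.

```python
import math
from typing import List


def pearson_chi2(table: List[List[float]]) -> float:
    """Pearson chi-square statistic for an r x c table of observed counts.

    Assumes every row and column total is positive, so no expected count is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # count under independence
            stat += (observed - expected) ** 2 / expected
    return stat


def chi2_pvalue_1df(stat: float) -> float:
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical 2 x 2 table (made-up counts): rows = women/men, columns = MetS yes/no.
stat = pearson_chi2([[30, 20], [15, 35]])
print(stat, chi2_pvalue_1df(stat))
```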

The model-based recursive partitioning method (MOB) was applied to cluster individuals into subgroups with similar response values.26–28 MOB is reminiscent of classification and regression tree (CART) algorithms, which split the data set into subsets according to the independent (partitioning) variable for which the distributions of the response values differ most.13 Whereas CART fits a constant in each terminal node, MOB fits a parametric model, with one or more predictor variables controlled for, at each step of the partitioning and in the terminal nodes. For instance, in the MOB analysis of MetS, age is controlled for using logistic regression models, and the algorithm cycles iteratively through the following steps: (1) fit the logistic regression with MetS as the response variable and age as the control variable; (2) test for parameter instability over a set of partitioning variables (socioeconomic and demographic characteristics) while controlling for age; (3) if there is overall parameter instability, split the data set with respect to the variable associated with the highest instability (ie, the smallest P value); and (4) repeat the procedure in each of the resulting subsamples with different risks of MetS. The process is termed recursive because each subpopulation may be split a number of times until a stopping criterion is reached. Our stopping criteria were a 5% level of significance and a minimum sample size of 100 in terminal nodes. For continuous partitioning variables, MOB tests for and selects an optimal cut-off point and splits participants into two subgroups.26–28
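The four-step cycle can be illustrated with a deliberately simplified, pure-Python sketch. It is not MOB as implemented in the R package party: it assumes a 0/1 outcome and 0/1 partitioning variables, fits an intercept-only binomial model instead of the age-adjusted logistic regression, substitutes a one-degree-of-freedom likelihood-ratio test for the score-based parameter instability tests, and applies no multiplicity adjustment across candidate variables. Only the stopping rules (5% significance, at least 100 participants per terminal node) come from the text; all names and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Row = Dict[str, int]  # hypothetical record: a 0/1 outcome plus 0/1 partitioning variables


def binom_loglik(rows: List[Row], outcome: str) -> float:
    """Step (1) stand-in: log-likelihood of an intercept-only binomial model."""
    n = len(rows)
    k = sum(r[outcome] for r in rows)
    ll = 0.0
    if k > 0:
        ll += k * math.log(k / n)
    if k < n:
        ll += (n - k) * math.log((n - k) / n)
    return ll


def split_test(rows: List[Row], outcome: str, var: str) -> Tuple[float, List[Row], List[Row]]:
    """Step (2) stand-in: 1-df likelihood-ratio test of one versus two proportions."""
    left = [r for r in rows if r[var] == 0]
    right = [r for r in rows if r[var] == 1]
    if not left or not right:
        return 1.0, left, right
    g = 2 * (binom_loglik(left, outcome) + binom_loglik(right, outcome)
             - binom_loglik(rows, outcome))
    p = math.erfc(math.sqrt(max(g, 0.0) / 2))  # upper tail of chi-square(1)
    return p, left, right


def grow(rows: List[Row], outcome: str, variables: List[str],
         alpha: float = 0.05, min_size: int = 100) -> dict:
    """Recursively split on the most significant variable until a stopping rule applies."""
    best = None
    for var in variables:
        p, left, right = split_test(rows, outcome, var)
        if min(len(left), len(right)) < min_size:
            continue  # stopping rule: each terminal node keeps at least min_size people
        if best is None or p < best[0]:
            best = (p, var, left, right)
    if best is None or best[0] >= alpha:  # no significant difference: terminal node
        return {"n": len(rows), "proportion": sum(r[outcome] for r in rows) / len(rows)}
    p, var, left, right = best  # step (3): split on the variable with the smallest P value
    return {"split": var, "p": p,  # step (4): repeat within each subsample
            var + "=0": grow(left, outcome, variables, alpha, min_size),
            var + "=1": grow(right, outcome, variables, alpha, min_size)}


# Hypothetical toy data: 200 women (60% with the outcome), 200 men (30%),
# and a noise variable x unrelated to the outcome.
toy = [{"mets": int(i < 120 or 200 <= i < 260), "female": int(i < 200), "x": i % 2}
       for i in range(400)]
tree = grow(toy, "mets", ["female", "x"])
print(tree["split"], tree["female=1"]["proportion"], tree["female=0"]["proportion"])
```

On the toy data the root splits on `female`, and neither child splits further because `x` carries no information about the outcome.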

We performed MOB for MetS, systolic blood pressure (SBP), diastolic blood pressure (DBP), waist circumference, log-transformed triglycerides, HDL-C and HbA1c, controlling for age. The partitioning variables included the social and behavioural risk characteristics described previously. Analyses were performed in R (V.3.2.2) with the package ‘party’.


Complete data on all variables were available for 1628 (81%) participants. Table 1 presents the frequency of MetS according to participant characteristics. MetS was observed in 43% of participants: 50% of women and 35% of men. For most variables, the proportion of participants with MetS differed substantially across categories. MetS was concentrated among those of lower socioeconomic status: those with lower education, lower incomes, manual workers and housewives. More MetS was observed among those reporting childhood adversities. Those with MetS had higher mean blood pressure, waist circumference, HbA1c and triglycerides, lower HDL-C, and walked less on average than those without it.

Table 1

Descriptive characteristics of the participants and frequency of MetS

Table 2 presents participant characteristics by study site and shows a notably higher frequency of MetS among participants from the middle-income sites (46%–53%) compared with those from the Canadian sites (≈30%).

Table 2

Descriptive characteristics of the participants by study site

Figure 1 depicts the MOB tree for MetS, adjusting for participant age. The highest estimates of MetS prevalence were observed in clusters of women from the middle-income study sites (Tirana, Manizales and Natal). In these clusters, the predicted proportion with MetS varied from 58% to 68%, depending on education; better-educated women from these sites had a higher predicted proportion of MetS. Among women from the Canadian sites, less walking time per week distinguished the higher-probability from the lower-probability cluster. The lowest predicted proportion of MetS (26%) was observed in men with postsecondary education reporting no childhood social adversities, and in women from the Canadian sites who had more walking time per week. The graphs under each node in figure 1 depict the estimated prevalence according to age and show that for certain nodes (eg, node 7) there is a strong association between increasing age and higher estimated MetS prevalence.

Figure 1

Model-based recursive partitioning for MetS controlling for age. The horizontal axis of the terminal plots is age (64–75 years), and the vertical axis shows the predicted mean proportions of MetS obtained from logistic regression models by age. The predicted mean proportion of MetS and 95% CI for each terminal node are listed under the plots. MetS, metabolic syndrome.
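The age curves in the terminal plots come from logistic regressions of MetS on age within each node. As a hypothetical sketch (not the party implementation the authors used), a one-covariate logistic regression can be fitted by Newton-Raphson and then evaluated across ages 64–75 to trace a predicted-proportion curve. The function names and toy data are assumptions for illustration, and the sketch assumes the data are not perfectly separable so that the maximum-likelihood estimate exists.

```python
import math
from typing import List, Tuple


def fit_logistic(x: List[float], y: List[int],
                 max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float]:
    """Fit logit P(y = 1) = b0 + b1 * x by maximum likelihood (Newton-Raphson)."""
    xbar = sum(x) / len(x)
    xc = [xi - xbar for xi in x]     # centre the covariate for numerical stability
    a, b = 0.0, 0.0                  # intercept and slope on the centred scale
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(xc, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p             # score for the intercept
            g1 += (yi - p) * xi      # score for the slope
            h00 += w                 # entries of the information matrix X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det   # Newton step: inverse(X'WX) times score
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a - b * xbar, b           # back-transform the intercept to the age scale


def predicted_proportion(b0: float, b1: float, age: float) -> float:
    """Predicted probability of the outcome at a given age."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * age)))


# Hypothetical toy data (not IMIAS data): ages and 0/1 MetS indicators for one node.
ages = [65, 65, 67, 67, 69, 69, 71, 71, 73, 73]
mets = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]
b0, b1 = fit_logistic(ages, mets)
curve = [(age, predicted_proportion(b0, b1, age)) for age in range(64, 76)]
print(curve)
```

Centring age before fitting avoids a poorly conditioned information matrix when the covariate's values are all far from zero.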

The online supplementary files contain the MOB trees for MetS components. Each tree depicts unique clusters that do not correspond with those observed for MetS as a whole. For example, the highest estimated mean SBP (154 mm Hg) was observed among participants from Tirana and Natal who reported insufficient income and were regular or former smokers. For this outcome, in contrast to MetS as a whole, sex was not a partitioning variable. Overall, in all models except HDL-C, study site was the primary partitioning variable and, for triglycerides, the only one. Typically, participants from Natal and Tirana had unfavourable estimates for MetS components; however, participants from Manizales had the highest estimated triglyceride concentrations (145 mg/dL). Participant sex was a key partitioning variable for DBP, waist circumference and HDL-C concentration. Other partitioning variables for one or two of the MetS components were weekly walking time, current employment status, perceived income sufficiency, living arrangements, smoking, adult occupation and current income.

Supplemental material


The MOB technique identified distinct clusters of individuals with differential probabilities of MetS and its components, according to multiple social and behavioural risk factors. For the syndrome as a whole, the predicted proportion with MetS was high in clusters of women from middle-income sites (58% or 68%, depending on the cluster). In clusters of men, the predicted proportion with MetS was lower (26%, 38% or 41%, depending on the cluster) and was highest among men reporting childhood social adversities (41%). MetS in women from the Canadian sites varied considerably with average walking time per week. Women from Kingston and Saint-Hyacinthe who walked more than a minimal amount (>11 min/week) had a predicted probability of MetS identical to that of men with postsecondary education and no childhood social adversities (26%). This work demonstrates the potential of MOB to identify joint effects in a moderately sized sample. It raises questions for future investigation, especially related to the concentration of risk(s) in certain subgroups.

This study corroborates previous findings that the prevalence of MetS varies according to age, sex and socioeconomic status.4 10 29 Consistent with other studies, we observed an overall concentration of MetS in participants of lower socioeconomic status10 30 31; however, among women from the middle-income sites, MetS was more prevalent among those with postsecondary education. Better-educated women from the middle-income sites likely had, and may still have, more money to afford obesogenic, westernised foods, and over their lifetimes may have engaged in less exercise, as educational attainment may have allowed them to ‘escape’ physically strenuous jobs. Consistent with other research,32 early life adversity was associated with a higher estimated prevalence of MetS; however, this association was observed only in men, whereas elsewhere it has been observed in both men and women.32 The strong context specificity of our findings highlights the utility of MOB for identifying unique admixtures that might have been overlooked with traditional statistical techniques and/or would have been impossible to identify without a very large sample size.

When applying MOB, we observed distinct risk clusters according to study site and participant sex, education and childhood adversity. These findings support a dynamic interplay between contextual and social risk factors and the concentration of risks in certain subgroups, which is consistent with the notion of vulnerable populations proposed by Frohlich and Potvin.3 In their framework, vulnerable populations are defined by shared social characteristics that put them at ‘higher risk of risks’ (Frohlich and Potvin, p218).3 These risks and their accumulation across the life course relate to fundamental causes linked to one’s social position within the predominant social structure.3 This may explain why study site and sex were key partitioning variables. Study site proxies societal opportunities for education, occupation and income, and expectations surrounding behaviours and diet. The clustering by site supports research underscoring the importance of context in patterning the risk exposures of individuals.3 33 Sex/gender likely underpins access to resources such as money, knowledge and power, affecting health outcomes through multiple risk factors.34

In studies with large sample sizes (>10 000 participants), complex joint effects have been observed with traditional regression techniques. For example, using data from representative samples of US adults aged 25 and older, Loucks et al reported that, overall, the prevalence of MetS was similar in women and men, that both low education and poverty were associated with MetS, and that the social gradient in MetS prevalence was more pronounced in women than in men.11 These US findings differ from ours: in IMIAS, sex was the first partitioning variable, with an overall higher frequency of MetS in women than in men. However, for both men and women in our study, education was an important predictor of MetS at most sites, although the direction of the association was not consistent. Among women from the middle-income sites, greater educational attainment was associated with a higher predicted prevalence of MetS. Among men with no childhood social adversity, low educational attainment was associated with a higher predicted prevalence of MetS. In the study by Loucks et al, education was generally not associated with MetS in men, although, in contrast to our work, their study did not consider childhood adversity.10 11 Interestingly, in the Loucks et al study, low education among men was associated with the MetS components of abdominal obesity, hypertension and hyperglycaemia.11

When MOB was applied to the syndrome components, only study site and participant sex were consistent partitioning variables for most components. In general, when measures of socioeconomic status were partitioning variables, lower status was associated with poorer outcomes. For example, income insufficiency predicted higher mean DBP among Tirana participants. Measures of socioeconomic status appeared as partitioning variables more frequently than risk behaviours. This is consistent with recent work analysing individual-level data from >1.7 million people in which low socioeconomic status was associated with premature mortality across multiple disease categories and ranked third in population attributable fraction for mortality among a large list of risk factors (physical inactivity ranked second).35 In our study, higher mean weekly physical activity predicted lower waist circumferences and higher HDL concentrations, consistent with the literature.36–38

This study has strengths. First is the use of the MOB technique. Traditional CART methods are vulnerable to overfitting and variable selection bias, and lack a concept of statistical significance; pruning and cross-validation are therefore used to avoid overfitting. MOB is implemented via hypothesis tests, which yields regression models whose predictive performance is equivalent to that of optimally pruned trees; it therefore offers an intuitive and computationally efficient solution to the overfitting problem, and the resulting models are easier to communicate to practitioners.27 28 39 Second, this is one of very few studies to apply MOB to examine social and behavioural risk clusters for disease. Most research applying recursive partitioning focuses on identifying patient subgroups within a clinical setting26 and/or on better defining the components that constitute syndromes such as MetS.39

This study has limitations. Study site and sex are structuring variables that may mask more proximal risk factors for disease. Individual risk behaviours, such as smoking, may be ubiquitous within certain subgroups,33 rendering it difficult to detect the influence of these behaviours on MetS using the MOB technique. Another limitation is that we used a single waist circumference cut-off value for all populations. Arguments exist for country and/or ethnicity-specific waist circumference cut-offs, but more work is required to optimally determine these.40 Finally, we did not collect individual dietary data or data on the early nutritional environment.32


We applied a recursive partitioning technique to investigate risk clustering for MetS in an international, multisite study of community-dwelling older adults and observed unique risk clusters defined mostly by contextual and socioeconomic characteristics. The main partitioning variables in our results were study site and sex, which for most people are not easily modifiable. However, the policies and opportunities afforded to residents of different communities, and to men versus women, can be modified and vary dramatically on a global scale; this likely helps to explain the large variations in MetS prevalence across communities, and even the inconsistencies in the literature, in which women sometimes, but not always, have higher MetS prevalence. By identifying risk clusters with techniques such as MOB, we can generate novel hypotheses about both contributing and protective factors that might be missed with traditional regression techniques, as relatively few studies have the resources to recruit samples large enough to examine higher-order joint effects. MOB may also prove particularly informative in studies with much larger samples, such as the Health and Retirement Study, where it can be used to generate new hypotheses about risk clustering; more traditional confirmatory techniques can then be applied to the same sample to corroborate or contradict these hypotheses. Finally, with regard to both clinical practice and health promotion activities, identifying risk clusters is important for targeting purposes, as the intensity and type of programmes may differ according to subgroups.15


The authors would like to thank all of the IMIAS participants.


  1. 1.
  2. 2.
  3. 3.
  4. 4.
  5. 5.
  6. 6.
  7. 7.
  8. 8.
  9. 9.
  10. 10.
  11. 11.
  12. 12.
  13. 13.
  14. 14.
  15. 15.
  16. 16.
  17. 17.
  18. 18.
  19. 19.
  20. 20.
  21. 21.
  22. 22.
  23. 23.
  24. 24.
  25. 25.
  26. 26.
  27. 27.
  28. 28.
  29. 29.
  30. 30.
  31. 31.
  32. 32.
  33. 33.
  34. 34.
  35. 35.
  36. 36.
  37. 37.
  38. 38.
  39. 39.
  40. 40.


  • Contributors CMP and M-VZ conceived the study. CMP and YYW analysed and interpreted the data. CMP, YYW, JFG and M-VZ contributed to the writing and editing of this manuscript.

  • Funding This study was supported by the Canadian Institutes of Health Research (CIHR).

  • Competing interests None declared.

  • Patient consent Obtained.

  • Ethics approval Institutional review for this project was obtained from the relevant organisations at each site: the Institute of Public Health in Albania, the Federal University of Rio Grande do Norte in Brazil, the University of Caldas in Colombia, the University of Montreal Hospital Research Centre (CR-CHUM) and Queen’s University in Canada. Written informed consent was obtained from all participants.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement Extra data is available through registration on the IMIAS website ( Registered users can request IMIAS data through a data request form.