
Original research
Identifying co-occurrence and clustering of chronic diseases using latent class analysis: cross-sectional findings from SAGE South Africa Wave 2
  1. Glory Chidumwa (1)
  2. Innocent Maposa (1)
  3. Barbara Corso (2)
  4. Nadia Minicuci (2)
  5. Paul Kowal (3)
  6. Lisa K Micklesfield (4)
  7. Lisa Jayne Ware (4, 5)

Affiliations
  1. Division of Epidemiology and Biostatistics, School of Public Health, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, Gauteng, South Africa
  2. Neuroscience Institute, National Research Council, Padova, Italy
  3. Research Institute for Health Sciences, Chiang Mai University Faculty of Science, Chiang Mai, Thailand
  4. SAMRC/Wits Developmental Pathways for Health Research Unit, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg-Braamfontein, Gauteng, South Africa
  5. DSI-NRF Centre of Excellence in Human Development, University of the Witwatersrand, Johannesburg, Gauteng, South Africa

Correspondence to Glory Chidumwa; glory.chidumwa{at}


Objectives To classify South African adults with chronic health conditions for multimorbidity (MM) risk, and to determine sociodemographic, anthropometric and behavioural factors associated with identified patterns of MM, using data from the WHO’s Study on global AGEing and adult health South Africa Wave 2.

Design Nationally representative (for ≥50-year-old adults) cross-sectional study.

Setting Adults in South Africa between 2014 and 2015.

Participants 1967 individuals (men: 623 and women: 1344) aged ≥45 years for whom data on all seven health conditions and socioeconomic, demographic, behavioural and anthropometric information were available.

Measures MM latent classes.

Results The prevalence of MM (coexistence of two or more non-communicable diseases (NCDs)) was 21%. The latent class analysis identified three groups: minimal MM risk (83%), concordant (hypertension and diabetes) MM (11%) and discordant (angina, asthma, chronic lung disease, arthritis and depression) MM (6%). Using the minimal MM risk group as the reference, female (relative risk ratio (RRR)=4.57; 95% CI (1.64 to 12.75); p=0.004) and older (RRR=1.08; 95% CI (1.04 to 1.12); p<0.001) participants were more likely to belong to the concordant MM group, while tobacco users (RRR=8.41; 95% CI (1.93 to 36.69); p=0.005) and older (RRR=1.09; 95% CI (1.03 to 1.15); p=0.002) participants were more likely to belong to the discordant MM group.

Conclusion NCDs with similar pathophysiological risk profiles tend to cluster together in older people. Risk factors for MM in South African adults include sex, age and tobacco use.

  • hypertension
  • public health
  • statistics & research methods

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:


Strengths and limitations of this study

  • This is the first comprehensive study on factors associated with the multimorbidity latent classes in low-income and middle-income countries.

  • A key strength of the Study on global AGEing and adult health is that it consists of nationally representative samples, with high response rates.

  • One weakness of this study is that data on most of the chronic diseases, and many behavioural variables (including tobacco use), were based on self-report, and can thus be affected by possible recall bias and social desirability bias.

  • The cross-sectional design precludes causal inferences.


Non-communicable diseases (NCDs) are the leading cause of mortality across the globe,1 and accounted for 73% of deaths in 2017.2 3 In developed countries, it is estimated that approximately one in every four adults experiences multimorbidity (MM), with half of older adults having three or more chronic conditions.4 5 The prevalence of NCDs continues to increase in low/middle-income countries (LMICs) including South Africa.1 NCDs are responsible for 43% of deaths per year in South Africa, with most being premature deaths (deaths occurring before the age of 65 years).6–8 NCD-related deaths are predicted to increase substantially over the next few decades if measures are not taken to combat the upward trend in prevalence.1 9

Within an individual, the coexistence of two or more chronic non-communicable, mental health or infectious diseases, of long duration (>3 months), is referred to as MM.10 11 Data from a 2015 South African primary healthcare (PHC) survey across all age groups reported the prevalence of NCD MM, which included hypertension, diabetes, ischaemic heart disease, asthma, epilepsy, chronic obstructive pulmonary disease, osteoarthritis and respiratory infection, as 14.4%.1 A study by Garin et al aimed at identifying and describing MM patterns among adults older than 50 years in low-income, middle-income and high-income countries, using data from the Collaborative Research on Ageing in Europe project and the WHO’s Study on global AGEing and adult health (WHO-SAGE) Wave 1, found that South Africa had a higher prevalence (68%) of MM (having at least two NCDs) than Ghana (48%), India (58%) and China (45%).3 In addition, in a study by Afshar et al that compared the prevalence of MM across 28 LMICs using the World Health Survey (2003), the prevalence of MM (two chronic conditions or more) in South Africa was 21.6% among the 50–64 year age group and 30.1% among those aged 65 years and older.12 A study by Ayeni et al profiled MM among 2281 South African women aged 18 years and older, newly diagnosed with breast cancer, across two South African provinces.13 They reported that 43.9% of the women met the definition of MM, which included conditions such as hypertension, HIV infection and tuberculosis.

Evidence suggests that the factors associated with the rising prevalence of NCDs in South Africa include age, area of residence (urban or rural), tobacco use, insufficient physical activity and unhealthy diets.9 A study by Weimann et al investigated the association between socioeconomic disadvantage and MM in South Africa at two time points, 2008 and 2012, using the National Income Dynamics Study. They showed that the risk for MM was doubled in urban residents relative to their rural counterparts, and respondents who were socioeconomically deprived had a twofold increased risk of having MM compared with the less deprived in both urban and rural areas.14

Previous research on MM in South Africa has primarily used simple counts of chronic conditions. However, different combinations of diseases may affect a person’s health and healthcare differently.15 To account for these differences, disease combinations can be categorised according to their likelihood of clustering together, their pathophysiological pathways or their management plans; for example, hypertension and diabetes frequently occur together and may share common pathophysiological mechanisms.15 16 The prevalence and patterns of MM have important implications for targeted healthcare services for prevention, diagnosis, treatment and control.

The aim of this study was to classify South African adults aged 45 years and older according to MM risk, using self-reported diagnosed NCD health condition variables in a latent class analysis (LCA) using data from the WHO-SAGE South Africa Wave 2. Additionally, the analyses looked at sociodemographic, anthropometric and behavioural factors associated with identified patterns of MM. The findings of the current study will contribute to the evidence base on the epidemiology of MM in a large South African adult population.


Study design and participants

The current study used data from WHO-SAGE South Africa, which is part of an ongoing multicountry longitudinal study that also includes China, Ghana, India, Mexico and the Russian Federation. The study examines the health and well-being of nationally representative adult populations aged 18+ years in over 42 000 participants, with an emphasis on populations aged 50+ years.17 Further details are available on the WHO-SAGE website. The current study is a cross-sectional analysis of the SAGE South Africa Wave 2 data collected in 2014/2015, restricted to participants (n=1967) who had valid (non-zero) post-stratification weights, were at least 45 years of age and had full data on the seven target NCDs.


Data on seven chronic conditions were collected via measurement and/or self-report. Noting hypertension is a common NCD risk factor, for the purposes of this analysis we categorised it as one of the seven conditions. As previously described, blood pressure was measured by trained nurses using wrist-worn blood pressure devices with positioning sensor (R6, Omron, Japan).18 Hypertension status was determined as a measured average systolic blood pressure reading of ≥140 mm Hg; and/or an average diastolic blood pressure reading of ≥90 mm Hg; and/or current use (within the last 2 weeks) of antihypertensive medication.19 Participants reported whether they had ever received a medical diagnosis for angina, arthritis, asthma, chronic lung disease (emphysema or bronchitis, chronic obstructive pulmonary disease), depression and diabetes. These six self-reported NCDs were assessed through a question about ever being diagnosed with the disease by a physician/health professional. The specific question was, ‘Have you ever been told by a health professional/doctor that you have (disease name)?’
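The hypertension definition above can be written as a small rule. The following is a minimal sketch, not the study's own code: the function and argument names are hypothetical, and it assumes each participant has at least one systolic and one diastolic reading and a known medication status.

```python
from statistics import mean

SBP_THRESHOLD = 140  # mm Hg, systolic cut-off stated in the text
DBP_THRESHOLD = 90   # mm Hg, diastolic cut-off stated in the text


def is_hypertensive(systolic_readings, diastolic_readings, on_medication):
    """Apply the and/or rule: raised average SBP, raised average DBP,
    or current (last 2 weeks) antihypertensive medication use."""
    avg_sbp = mean(systolic_readings)
    avg_dbp = mean(diastolic_readings)
    return (
        avg_sbp >= SBP_THRESHOLD
        or avg_dbp >= DBP_THRESHOLD
        or bool(on_medication)
    )
```

Handling of missing readings is deliberately left out; in this analysis participants without full data on the seven conditions were excluded.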

Demographic variables included age, sex, years of schooling completed and area of residence (urban or rural). Behavioural variables included ever used alcohol, ever used tobacco (smoked and smokeless), adding salt at the table (yes/no), participation in self-reported vigorous intensity activity (yes/no, based on two questions: ‘Does your work involve vigorous intensity activity that causes large increases in breathing or heart rate, (like heavy lifting, digging or chopping wood) for at least 10 min continuously?’ and ‘Do you do any vigorous intensity sports, fitness or recreational (leisure) activities that cause large increases in breathing or heart rate (like running or football) for at least 10 min continuously?’), and self-rated sleep quality (very good/good, moderate or poor/very poor) as reported previously.17 Anthropometric measures included weight, height and waist circumference, and were measured in accordance with WHO standardised techniques with all fieldwork teams trained by WHO staff. Details about the WHO standardised interview and direct measurement techniques are described elsewhere.17 Body mass index (weight (kg)/height (m)²) and waist-to-height ratio (waist (cm)/height (cm)) were calculated. Principal component analysis (PCA) was used to derive a socioeconomic status (SES) index for each household. Household ownership of a set of 19 assets, household density and household service access (sanitation and electricity) were coded as categorical or interval variables. The variables were then processed in order to obtain weights and principal components. The scores on the first principal component (explaining the most variability) were used to develop an index. The SES indices were then grouped into tertiles, reflecting different SES levels in the wealth continuum, as previously applied.20–22
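The PCA-based SES index can be illustrated with a small sketch. This is not the authors' procedure in detail: the toy data, the function names, the use of z-scored variables, power iteration to find the first component, and rank-based tertile cuts are all assumptions made for illustration. The sign of a principal component is arbitrary; here it is fixed so that the loadings sum to a positive value.

```python
from statistics import mean, pstdev


def standardise(rows):
    """Z-score each column (population SD); assumes no column is constant."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def first_component_loadings(z_rows, iterations=500):
    """Leading eigenvector of the correlation matrix, via power iteration."""
    n, p = len(z_rows), len(z_rows[0])
    corr = [[sum(r[i] * r[j] for r in z_rows) / n for j in range(p)]
            for i in range(p)]
    v = [1.0] * p  # starting from all ones also fixes the sign of the result
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def ses_index(rows):
    """Score each household on the first principal component."""
    z_rows = standardise(rows)
    loadings = first_component_loadings(z_rows)
    return [sum(l * z for l, z in zip(loadings, row)) for row in z_rows]


def tertiles(scores):
    """Assign 1 (lowest) to 3 (highest) by rank; ties are split by position."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, i in enumerate(order):
        groups[i] = 1 + (3 * rank) // len(scores)
    return groups


# Hypothetical households: [owns TV, owns fridge, has electricity]
households = [[0, 0, 0], [1, 0, 0], [1, 0, 1], [1, 1, 1], [0, 0, 1], [1, 1, 1]]
scores = ses_index(households)
groups = tertiles(scores)
```

With real data, a variable such as household density would need to be coded so that higher values indicate higher SES before the index is interpreted as a wealth score.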

Statistical analysis

Data were captured using a computer-assisted personal interviewing (CAPI) electronic data capture system. STATA Statistical Software: V.16.0 (Stata Corp, 2017; College Station, USA) was used for statistical analyses. The LCA was performed using the PROC LCA add-on for SAS (version 9.4) to determine patterns of coexisting chronic health conditions in the 1967 participants. LCA modelling is preferred over traditional clustering techniques as variation on observed indicators is modelled as a function of membership in unobserved classes called latent classes.23 24 In addition, LCA allows for statistical testing of model fit and assigns class membership in a probabilistic way, with membership probabilities computed from the estimated model parameters.25 Furthermore, LCA has been demonstrated to be more objective and rigorous than K-means and hierarchical clustering for both exploratory work and theory testing.26 This is because LCA is model based, that is, the data are assumed to arise from a statistical model for the population from which they were gathered.25 In the current study, seven chronic health conditions (angina, arthritis, asthma, chronic lung disease, depression, diabetes and hypertension) were used as observed indicators. The optimal number of latent classes was determined using the adjusted Bayesian Information Criterion (aBIC), which has been shown to provide robust indicators of class enumeration with categorical outcomes.27 The aBIC was used to compare several plausible class models, where the lowest value indicates the best fitting model. After selecting the best model, each participant was assigned to one class according to his or her highest computed probability of membership. Details for the LCA fit statistics are given in online supplemental table 1. Pearson’s χ² test was used to test for differences between latent classes on categorical variables. Due to non-normality of continuous data, as shown by the Shapiro-Wilk test, differences between classes on continuous outcomes were tested using the Kruskal-Wallis test. Multinomial logistic regression was used to determine which sociodemographic, anthropometric and behavioural factors were associated with latent class membership. We follow STATA terminology for multinomial logistic regression and report relative risk ratios (RRRs), 95% CIs and p values for each explanatory variable.
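To make the class enumeration step concrete, the sketch below compares candidate models by aBIC. It assumes the commonly used sample-size adjusted form, aBIC = -2 log L + p ln((n + 2)/24), and the usual parameter count for an LCA with binary indicators. The log-likelihood values are invented for illustration and are not the study's fit statistics (those are in online supplemental table 1).

```python
import math


def n_parameters(n_classes, n_items):
    """Free parameters in an LCA with binary indicators:
    (classes - 1) class prevalences plus one item probability
    per item per class."""
    return (n_classes - 1) + n_classes * n_items


def adjusted_bic(log_likelihood, n_params, n_obs):
    """Sample-size adjusted BIC; lower values indicate better fit."""
    return -2.0 * log_likelihood + n_params * math.log((n_obs + 2) / 24)


N_OBS = 1967   # participants in this analysis
N_ITEMS = 7    # chronic conditions used as indicators

# Hypothetical log-likelihoods for 1- to 4-class models (illustration only)
hypothetical_loglik = {1: -5200.0, 2: -5050.0, 3: -5020.0, 4: -5015.0}

abic = {
    k: adjusted_bic(ll, n_parameters(k, N_ITEMS), N_OBS)
    for k, ll in hypothetical_loglik.items()
}
best_k = min(abic, key=abic.get)
```

When two models have near-identical aBIC values, as reported for the two-class and three-class models here, interpretability of the classes is typically used to break the tie.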

Ethics statement

This study used the WHO-SAGE Wave 2 data, which are available in the public domain for use by researchers. The WHO-SAGE survey participants in all selected countries were informed about the survey, its design and purpose, and how it would benefit society at large. The survey was conducted under the supervision of the respective national governments.

Patient and public involvement

Patients and/or the public were not involved in this study.


A total of 1967 participants were included in this analysis. Figure 1 below shows the study flow diagram.

Figure 1

Study flow diagram. NCDs, non-communicable diseases.

The median age for the sample was 62 years (IQR: 54–70). Fifty-seven per cent (n=1113) of our sample were women. The majority of the sample self-identified as black (n=1540, 78%), 6% (n=120) as white, and 16% (n=308) as coloured or Indian.

Prevalence of chronic NCDs and MM

Twenty-one per cent of the sample (n=415) had two or more of the seven chronic diseases, that is, MM while 39% (n=761) had none of the seven NCDs. The most common chronic disease was hypertension (52%) followed by arthritis (16%). Figure 2 below shows the prevalence of chronic NCDs by sex.
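The count-based definition of MM used for the prevalence figures can be sketched as follows. The record layout and field names are hypothetical; only the list of seven conditions and the threshold of two or more come from the text.

```python
CONDITIONS = (
    "angina", "arthritis", "asthma", "chronic_lung_disease",
    "depression", "diabetes", "hypertension",
)


def condition_count(record):
    """Number of the seven conditions present for one participant."""
    return sum(1 for name in CONDITIONS if record.get(name))


def has_multimorbidity(record):
    """MM is defined as two or more of the seven conditions."""
    return condition_count(record) >= 2


def prevalence(records, predicate):
    """Unweighted proportion of records satisfying the predicate."""
    return sum(1 for r in records if predicate(r)) / len(records)


# Toy records for illustration only
sample = [
    {"hypertension": True, "diabetes": True},
    {"hypertension": True},
    {},
    {"arthritis": True, "asthma": True, "depression": True},
]
mm_prevalence = prevalence(sample, has_multimorbidity)
```

This sketch is unweighted; any survey weighting applied in the analysis is not reproduced here.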

Figure 2

Prevalence of chronic NCDs by sex. NCDs, non-communicable diseases.

The prevalence of arthritis, depression, diabetes, lung disease and MM was higher in women, while the prevalence of angina was higher in men.

Latent classes for chronic disease clusters

The optimal number of latent classes was determined using the aBIC. There were negligible differences between the two-class and three-class models and, considering interpretability, the three-class model was chosen.27 28 The three classes were: ‘minimal MM risk’, which included individuals with low probabilities of having each of the seven NCDs; ‘concordant MM’, which included individuals with high probabilities of having hypertension and diabetes; and ‘discordant MM’, which included individuals with higher probabilities of having chronic conditions other than hypertension and diabetes. Concordant MM has been described by Piette and Kerr as chronic conditions that represent a similar pathophysiological risk profile and are more likely to be the focus of the same disease management plan, and discordant MM as chronic conditions that are not directly related in pathogenesis or management.15 The majority of the sample (n=1625, 83%) were classified as being in the ‘minimal MM risk’ class. This class had the lowest prevalence of all seven NCDs. The ‘concordant MM’ class constituted 11% (n=207) of the sample. In this class, the probability of being hypertensive was 95% and the probability of having diabetes was 74.1%. Lastly, the ‘discordant MM’ class comprised 6% (n=135) of the sample, with prevalences of 62.0% for arthritis, 33.0% for angina, 11.7% for asthma, 15.3% for depression and 34.1% for lung disease. The prevalence of each of the seven diseases by latent class is presented in online supplemental figure 1.
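Assignment of each participant to the class with the highest membership probability follows from Bayes' rule once the class prevalences and item probabilities are estimated. The sketch below shows the calculation under the usual LCA assumption of conditional independence of indicators within a class. The parameter values are illustrative only: they are loosely shaped like the three classes described above and are not the fitted estimates.

```python
# Indicator order: angina, arthritis, asthma, chronic lung disease,
# depression, diabetes, hypertension
CLASS_PREVALENCE = {"minimal": 0.83, "concordant": 0.11, "discordant": 0.06}

# Illustrative (not fitted) probabilities of reporting each condition by class
ITEM_PROB = {
    "minimal":    [0.02, 0.10, 0.02, 0.01, 0.01, 0.03, 0.45],
    "concordant": [0.05, 0.20, 0.03, 0.02, 0.03, 0.74, 0.95],
    "discordant": [0.33, 0.62, 0.12, 0.34, 0.15, 0.10, 0.60],
}


def posterior(pattern):
    """P(class | response pattern), assuming indicators are independent
    within each class. `pattern` is a sequence of seven 0/1 values."""
    joint = {}
    for name, prior in CLASS_PREVALENCE.items():
        likelihood = prior
        for present, prob in zip(pattern, ITEM_PROB[name]):
            likelihood *= prob if present else 1.0 - prob
        joint[name] = likelihood
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}


def modal_class(pattern):
    """Class with the highest posterior membership probability."""
    post = posterior(pattern)
    return max(post, key=post.get)


hypertension_and_diabetes = [0, 0, 0, 0, 0, 1, 1]
arthritis_and_lung_disease = [0, 1, 0, 1, 0, 0, 0]
no_conditions = [0, 0, 0, 0, 0, 0, 0]
```

Modal assignment discards the uncertainty in the posterior probabilities, which is one reason class sizes based on assignment can differ slightly from the estimated class prevalences.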

The demographic, anthropometric and behavioural characteristics of the three latent classes are presented in table 1. The latent classes were significantly different with respect to all characteristics, with the exception of self-reported vigorous intensity activity. Details of the pairwise comparisons between the groups are shown in table 1 below.

Table 1

Characteristics of participants by latent class category (n=1967)

Multinomial logistic regression results showing associations between the demographic, anthropometric and behavioural characteristics, and latent class membership, are presented in table 2 below.

Table 2

Results from multinomial logistic regression for factors associated with latent class membership

In this multinomial logit model, we used the minimal MM risk group as the reference. Being female was associated with a 4.4-fold greater likelihood of being in the concordant group, and a 1-year increase in age was associated with an 8% increased likelihood of being in the concordant group.

Tobacco users were 8.9 times more likely to belong to the discordant MM class relative to the minimal MM risk group. Every year increase in age was significantly associated with a 9% increased likelihood of belonging to the discordant MM class. None of the other factors was significant in this logistic regression.
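For readers less familiar with relative risk ratios, the sketch below shows how a multinomial logit coefficient maps to an RRR with a Wald 95% CI, and how a per-year RRR translates into a percentage change and compounds over several years. The function names are hypothetical; the only value taken from this study is the reported age RRR of 1.08 for the concordant class.

```python
import math


def rrr_with_ci(coef, se, z=1.96):
    """RRR and Wald 95% CI from a multinomial logit coefficient and its SE."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


def percent_change(rrr):
    """Percentage change in relative risk per one-unit increase."""
    return (rrr - 1.0) * 100.0


def rrr_over(rrr_per_unit, units):
    """RRRs multiply, so the effect over several units is a power."""
    return rrr_per_unit ** units


age_rrr_concordant = 1.08  # per year of age, as reported for the concordant class
per_year_pct = percent_change(age_rrr_concordant)
per_decade_rrr = rrr_over(age_rrr_concordant, 10)
```

An RRR describes the relative risk of belonging to a given class rather than the reference class; it is not a ratio of absolute probabilities.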


In this study, we have shown that the prevalence of MM (coexistence of two or more NCDs) was 21%. The LCA grouped our sample of men and women over the age of 45 years into three groups namely: minimal MM risk (83%), concordant MM (11%) and discordant MM (6%). When compared with the minimal MM risk group, being female and older were associated with belonging to the concordant MM group, while tobacco use and an increase in age were associated with belonging to the discordant MM group.

Several recent studies have explored MM in South Africa.13 14 29 30 However, this study used data from SAGE, which represents the South African population aged 50+ years, to identify patterns of chronic disease coexistence. In addition, to our knowledge, this is the first study in South Africa to use LCA for this purpose; LCA can identify unique combinations of diseases using probabilities.25

Our study identified three latent classes of MM based on the presence or absence of seven chronic conditions. Previous studies that have used the LCA method to describe patterns of chronic disease coexistence in older populations have yielded mixed results as regards the number of clusters identified. A cross-sectional study of 4574 Australian senior citizens (aged 50 years and over) using 11 chronic conditions reported an MM prevalence of 52% and identified four classes.31 Their sample presented (1) a relatively healthier group, (2) a sick group with dominant presence of arthritis, asthma and depression, (3) a sick group with dominant presence of hypertension and diabetes, and (4) the sickest group with dominant presence of cancer, heart disease and stroke.31 Similarly, a retrospective cohort study on 13 self-reported conditions from 14 502 Americans (65 years old and older) identified six classes using the LCA approach, and reported an MM prevalence of 67.3%.32 The classes included: minimal disease class (prevalence of all conditions is below cohort average), non-vascular class (excess prevalence in cancer, osteoporosis, arthritis, arrhythmia, chronic obstructive pulmonary disease, psychiatric disorders), vascular class (excess prevalence in hypertension, diabetes mellitus, stroke), cardio–stroke–cancer class (excess prevalence in congestive heart failure, coronary heart disease, arrhythmia, stroke, and to a lesser extent hypertension, diabetes mellitus, cancer), major neurological disease class (excess prevalence in Alzheimer’s disease, Parkinson’s disease, psychiatric disorders), and very sick class (above average prevalence of all 13 conditions).32 Comparison with these studies is difficult since the results might be influenced by the number and type of diseases included in the analysis, the characteristics of the sample or how data on diseases were collected.

In our study we identified a class representing ‘minimal MM risk’ (participants with low observed probabilities for the NCDs reported), which has previously been reported in other studies that also conducted LCA.28 31 32 However, the proportion classified as ‘minimal MM risk’ in our study (83%) is larger than that described in these studies. This difference could be explained by the age of the participants included. For example, in a study by Olaya et al, in which 63.8% of the sample were classified in the minimal disease category, the mean age was 66 years, whereas the median age in our study was 62 years.28 This is further supported by our finding that the probability of MM increases with age.

In addition, we identified two more classes, namely concordant MM and discordant MM. This is similar to the study conducted by Chang et al in rural South Africa, in which concordant conditions were defined as cardiometabolic conditions (hypertension, diabetes and angina), and discordant conditions as mental health illness, alcohol dependence and HIV infection.29 Differences in the conditions in the discordant class could be attributed to the fact that, apart from depression, the two studies did not consider the same conditions.

To provide better care for individuals with comorbid conditions, South Africa implemented the integrated chronic disease management (ICDM) plan in 2014 for primary healthcare.33 However, evidence suggests that implementation has faced challenges, with many programmes remaining disease focused and with vertical implementation that fails to consider comorbid conditions.34 35 Our findings have the potential to guide policy in refining the implementation of ICDM strategies, for example, by targeting hypertension and diabetes together.

In addition, in keeping with previous literature, we found tobacco users to have a higher probability of discordant MM, which included lung disease, asthma, arthritis and angina, compared with non-tobacco users.36–38 For example, in a study by Fonda et al that examined the clustering of post-traumatic stress disorder, depressive disorders and clinically significant pain among 433 deployed veterans in Boston (USA) aged 18–65 years, tobacco smokers had a 3.5-fold increased likelihood of MM.39

The findings from this study should be viewed in light of some limitations. First, since the current study design is cross-sectional in nature, we could not determine the direction of the association or causality. Second, data on most of the chronic diseases, and many behavioural variables (including tobacco use), were based on self-report, and can thus be affected by possible recall bias and social desirability bias. In addition, the definitions of alcohol use and tobacco use in our study were broad and do not capture the quantities and frequency of consumption, potentially explaining the lack of association found. Furthermore, the LCA combined participants without NCDs with those with mostly one NCD in the minimal MM risk group, thereby limiting the use of participants with no MM as the reference group. In addition, the LCA procedure was explorative in nature. Explorative LCA makes no a priori assumptions about the number of latent classes and is estimated starting with a two-class model, with the number of latent classes increased in a stepwise fashion. As such, when different criteria are used to determine the classes, researchers may argue in favour of different numbers of classes. Finally, the number of diseases included in this analysis was limited to those included in the SAGE Study. This may miss other conditions present in this population, such as dementia or cancers, and may therefore have resulted in an underestimation of MM prevalence. However, our prevalence data for MM are similar overall to recent SAGE data, and a number of studies have also analysed MM using a smaller number of diseases, usually fewer than 10, due to data collection limitations in LMICs such as lack of electronic health/medical records.30

In conclusion, this study identified three latent classes, namely minimal MM risk, concordant MM and discordant MM. Review of the South African literature highlights that the PHC system under the ICDM model remains focused on single diseases in the treatment of patients. To improve PHC in South Africa, efforts should be made to manage multiple conditions concurrently at PHC centres, in particular diabetes and hypertension. In addition, in our sample, risk factors for MM latent classes included age, sex and tobacco use. Future efforts should focus on the inclusion of all frequently occurring conditions, including infectious diseases, to evaluate clustering patterns, and should inform policymakers so that the older population, women and tobacco users are prioritised in prevention programmes.


GC has had support from the Developing Excellence in Leadership, Training and Science (DELTAS) Africa Initiative. The DELTAS Africa Initiative is an independent funding scheme of the African Academy of Sciences’ (AAS) Alliance for Accelerating Excellence in Science in Africa (AESA) and supported by the New Partnership for Africa’s Development Planning and Coordinating Agency (NEPAD Agency) with funding from the Wellcome Trust (grant 107754/Z/15/Z- DELTAS Africa Sub-Saharan Africa Consortium for Advanced Biostatistics (SSACAB) programme) and the UK government. The authors would also like to thank Dr Stephen Rule, Dr Robin Richards and Mr Godfrey Dlulane of Outsourced Insight who were subcontracted to conduct the surveys and coordinate data collection for WHO-SAGE within South Africa. DPHRU acknowledge the support of the South African Medical Research Council. LJW is supported by the South African DSI-NRF Centre of Excellence in Human Development.




  • Contributors PK designed the research. GC and BC performed analyses. GC, BC, LJW, IM, LKM, NM and PK wrote the paper. GC had primary responsibility for final content. All authors read and approved the final manuscript.

  • Funding The WHO-SAGE multicountry study is supported by the WHO and the Division of Behavioral and Social Research (BSR) at the National Institute on Aging (NIA), US National Institutes of Health, through Interagency Agreements (OGHA 04034785; YA1323-08-CN-0020; Y1-AG-1005-01) with WHO, a Research Project Grant R01AG034479, and in-kind support from the South Africa Department of Health.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Ethics approval SAGE received approval from the WHO’s Ethical Review Committee and the respective committees in each participating country. Written informed consent was obtained from all study participants. For this secondary data analysis, ethical clearance was obtained from the University of the Witwatersrand.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data are available in a public, open access repository. Data are available upon reasonable request. The WHO Study on global AGEing and adult health (SAGE) data are available upon request at:

