

Impact of socioeconomic and risk factors on cardiovascular disease and type II diabetes in Australia: comparison of results from longitudinal and cross-sectional designs
  Jinjing Li (1), Yohannes Kinfu (2,3)
  1. National Economic and Social Modelling Centre, Institute of Governance and Policy Analysis, University of Canberra, Canberra, Australian Capital Territory, Australia
  2. Centre for Research and Action in Public Health, Health Research Institute, University of Canberra, Canberra, Australian Capital Territory, Australia
  3. School of Demography, The Australian National University, Canberra, Australian Capital Territory, Australia
  Correspondence to Dr Yohannes Kinfu; yohannes.kinfu{at}


Objective Existing large-scale studies do not take into account comorbidity or control for selection and endogeneity biases. This study addresses these shortcomings.

Participants We use information on individuals aged between 35 and 70 years from a nationally representative longitudinal survey conducted in Australia between 2001 and 2013. Participants were approached annually, and updates on their characteristics, including health status, were ascertained through self-reporting.

Method We develop three different analytical designs. The first model is a cross-sectional analysis against which our improved models are compared. In the second model, we follow the same approach but control for prior health conditions. The last, preferred model additionally adjusts for the characteristics and risk profile of respondents prior to the onset of conditions. It also allows for comorbidity and controls for selection bias.

Results Once comorbidity and changes over time in the participant's characteristics are controlled for, body mass index (BMI), alcohol consumption and physical activity exhibit a stronger impact than in the models without these controls. A unit increase in BMI increases the risk of developing a cardiovascular disease (CVD) condition within 2 years by 1.3 percentage points (β=0.11, 95% CI 0.05 to 0.16) and regular alcohol intake increases the risk of CVD by 3.0 percentage points (β=0.24, 95% CI 0.09 to 0.39). Both factors lose significance without proper control for endogenous behavioural change. We also note that frequent physical activity reduces the risk of developing diabetes by 0.9 percentage points (β=−0.40, 95% CI −0.72 to −0.07).

Conclusions Our results show a greater importance of certain lifestyle and risk factors than was previously suggested.


Strengths and limitations of this study

  • The study reports results for diabetes and cardiovascular conditions, which are major causes of illness and premature mortality in Australia and across the world.

  • It employs quasi-longitudinal observational data to examine issues of reverse causation and selection biases that were often overlooked in previous studies investigating the effects of socioeconomic status and risk factors on non-communicable conditions.

  • Unlike most other previous analyses that focus on each health condition separately, we take into account comorbidity in our models.

  • The outcome variables on health status are self-reported data and not those based on biometric indicators or a doctor's examination reports.


Cardiovascular disease (CVD) and type II diabetes are currently the leading causes of disability and premature mortality in the world and are expected to remain major public health challenges in the years ahead.1 ,2 Global estimates suggest that approximately 17 million people died of CVD worldwide in 2013, a figure which is about 41% larger than the estimate for 1990 and represents about a fifth of total global deaths estimated for 2013.3 Although Australia has made substantial progress in cardiovascular health in recent years, the disease continues to impose a heavy burden on the country, with one in six Australians currently suffering from at least one cardiovascular condition.4 Each year, CVD also claims more than 50 000 lives, which is higher than for any other disease category.4 A similar trend is observed with respect to type II diabetes, whose prevalence in the country has quadrupled since the 1980s.5 ,6

The huge burden of CVD and type II diabetes and the need to address their impact on people's health have generated worldwide research interest in identifying the factors associated with the disorders.7 ,8 Many of these studies draw on prospective data, often but not always, collected in clinical or controlled settings.9–11 Follow-up studies monitor disease progression, risk profile and characteristics of patients on an ongoing basis, which provide unique opportunities for understanding the relationship between one's health status and these time-varying indicators. However, such studies are expensive to carry out, as they require substantial resources and a complex research infrastructure that are in short supply. The fact that they are often localised and based on a relatively small sample size means that generalisation beyond the immediate reference population has always been a concern.12 ,13 Few among these studies also track progression of multiple disorders, which means that comorbidity can sometimes be ignored despite its importance.14 ,15

Cross-sectional population surveys are therefore sometimes used as the primary data source as they have generally a larger sample size, cover more health disorders and are relatively less expensive to conduct.16–18 However, a number of limitations are associated with cross-sectional study designs.19 First, the population that is identified in surveys as having experienced the condition of interest—and used for estimating prevalence rates—represents only the survivors of those who experienced the event before the survey, resulting in what is generally known as selection bias. Second, since some respondents are likely to experience the event after the arbitrary date of the survey, the same population also represents only a subset of those who will eventually succumb to the disease in their lifetime. Third, the absence of information on disease progression and characteristics of respondents before and after diagnosis means that reported health conditions can only be linked with risk profiles and characteristics reported at the time of the survey rather than those that prevailed at the time of experiencing the event itself. This, in turn, requires a strong assumption on how well contemporary variables can approximate the characteristics of the individual at the time of the occurrence of the event of interest. For example, some characteristics of respondents—such as alcohol consumption, smoking, physical activity and labour force status—may even be shaped by a respondent's health condition rather than the other way round, further creating what is known as endogeneity bias in such analyses.

Ideally, a study on the effects of respondents’ characteristics on health conditions would require a large representative sample of healthy respondents clinically checked at regular intervals for a range of health conditions and a record of their characteristics and risk profile during each of these encounters. In Australia, the AusDiab survey20 and the 45 and Up Study21 provide data that are close to the ideal situation. However, the AusDiab survey is conducted at intervals of 5 years, potentially missing important lifestyle events and changes in characteristics that happen between the follow-up surveys. The 45 and Up Study, on the other hand, is confined to a single jurisdiction, New South Wales, of the eight in the country, which makes it unrepresentative of the nationwide situation. As a result, most other studies in the country use data from national health surveys, but such data, while statistically representative, remain cross-sectional in nature and hence subject to the biases discussed earlier.

Given the limitations of existing studies, this paper revisits the link between risk factors and the development of CVD and type II diabetes by making use of nationally representative longitudinal data from the Household, Income and Labour Dynamics in Australia (HILDA) survey. Most importantly, the paper applies a range of statistical designs to these data to address some of the key shortcomings associated with studies based on large-scale cross-sectional surveys or follow-up data collected in controlled settings.


Data source

This study uses the HILDA, which is a nationally representative longitudinal survey that has been carried out in Australia since the early 2000s. The unique nature of the HILDA survey emanates from the fact that it combines the advantages of both a follow-up study and population-level surveys and contains information on respondents’ characteristics, as well as a range of health topics and associated risk profiles collected on an annual basis. For example, just as with population-level surveys, the data are nationally representative, cover a relatively large sample and have information on multiple health conditions. Similarly, just like in clinically based follow-up studies, patients are observed at regular intervals, providing an opportunity to examine a range of time-varying characteristics of respondents, including their health conditions.

Although HILDA occupies a unique position in the Australian data collection landscape, its design resembles other large-scale household panel surveys elsewhere in the developed world, such as the German Socio-Economic Panel (GSOEP) Survey and the British Household Panel Survey (BHPS).22 ,23 Traditionally, the survey has been primarily used for economic research, but it is now increasingly also used for demographic and health-related studies.24–27 The initial sample of households in the HILDA survey was drawn in 2001, using a multistage sampling technique.23 It included all members of households where at least one person provided an interview in the initial wave. These individuals were then followed annually. In addition, the sample was gradually extended to newly arrived migrants as well as new household members resulting from changes in the composition of the original households.

With respect to content, HILDA covers extensive information such as age and sex of respondents; living arrangement, family background and relationships; educational attainment and occupation; income and expenditure patterns as well as health status and health service utilisation. Relevant health information obtained by way of self-reporting was included in the HILDA survey in a systematic fashion for the first time in wave 7, and repeated in wave 9. The data from these waves allow us to identify respondents with type II diabetes and three types of cardiovascular and circulatory-related conditions, namely high blood pressure, coronary heart disease and other cardiovascular and circulatory conditions. In the analysis, we combine all the responses associated with cardiovascular and heart conditions into a single category referred to as CVD. We also restrict the analysis to the population aged 35–70 years, given that risks associated with these conditions begin to increase rapidly among the Australian population from around age 35 years.28 We cap the age at 70 years both because of differences in general health conditions in old age and the limited number of observations. The final data set for replicating the cross-sectional study design comprises all respondents aged 35–70 years, while only those who were free from CVD and type II diabetes in the earlier wave were used in the other models.

Analytical approach

The availability of extensive information on respondents’ health status and other individual characteristics at different time points in HILDA makes it possible to compare results from longitudinal and cross-sectional designs and control for comorbidity and time-varying characteristics of respondents. Suppose $h_{ijt}$ represents the presence of health condition $j$ in individual $i$ at time $t$; a basic cross-sectional model for the prevalence of disease $j$ can then be expressed as follows:

$$h_{ijt} = \mathbf{1}\left[X_{it}\beta_j + \varepsilon_{ijt} > 0\right], \qquad \Pr(h_{ijt} = 1 \mid X_{it}) = \Phi(X_{it}\beta_j) \tag{1}$$

where $X_{it}$ is a vector that represents respondents’ socioeconomic status, lifestyle conditions and other key risk factors and $\beta_j$ is the coefficient vector for disease $j$. $\varepsilon_{ijt}$ represents the error term that is assumed to follow a standard normal distribution for all $j$. $h_{ijt}$ is a binary outcome variable and $\Phi$ denotes the standard normal cumulative distribution function.
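As a rough illustration of the cross-sectional probit in equation 1, the sketch below evaluates the probit probability and the corresponding log-likelihood in plain Python. The covariates, coefficient values and function names are hypothetical and chosen only for illustration; they are not estimates or code from the study.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def probit_probability(x, beta):
    """P(condition present | x) = Phi(x . beta) under a probit model."""
    index = sum(xk * bk for xk, bk in zip(x, beta))
    return STD_NORMAL.cdf(index)


def probit_log_likelihood(rows, outcomes, beta):
    """Sum of Bernoulli log-likelihood contributions over respondents."""
    total = 0.0
    for x, h in zip(rows, outcomes):
        p = probit_probability(x, beta)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += math.log(p) if h == 1 else math.log(1.0 - p)
    return total


# Hypothetical covariates: [intercept, BMI centred at 27, regular drinker]
rows = [[1.0, -2.0, 0.0], [1.0, 4.0, 1.0], [1.0, 0.0, 0.0]]
outcomes = [0, 1, 0]
beta = [-1.5, 0.1, 0.2]  # illustrative values, not the paper's estimates

log_lik = probit_log_likelihood(rows, outcomes, beta)
```

In practice the coefficient vector would be chosen to maximise this log-likelihood; a statistical package would normally do that step.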

The model above, however, does not take into account the timing of events (ie, the time a respondent developed the condition) or their characteristics and risk profile before or at the time of being afflicted with the disease, which makes interpretation of observed relationships less straightforward. Given the availability of data at multiple time points in the HILDA survey, we can correct for the bias by restricting the analysis only to individuals who reported being healthy in the earlier wave of the survey. This design thus allows us to analyse the association between disease progression and risk factors, which is not usually possible in conventional cross-sectional population-level surveys. Algebraically, such a model that corrects for prior health status can be described as

$$\Pr(h_{ijt} = 1 \mid X_{it},\ h_{ik,t-1} = 0 \text{ for all } k) = \Phi(X_{it}\beta_j) \tag{2}$$

While the restriction on prior health condition represents an improvement over the conventional cross-sectional design and allows us to have a clearer interpretation of the role of observed lifestyle and risk profile of respondents on the diseases of interest, such a model is not without problems in that it still suffers from endogeneity bias. One example is when there are respondents who quit smoking, begin regular physical exercise or change their employment status following a diagnosis; this can lead to a finding that less smoking and more exercise are associated with having a condition, which is contrary to both expectations and the actual sequence of events. To address such causation and directional issues, we design a ‘lag’ model that links prediagnoses risk profile and characteristics of respondents with outcomes at a later period rather than use those characteristics and profiles observed at the end point. The approach is consistent with disease progression29 and avoids the strict assumptions imposed by some of the alternative approaches such as the one that uses instrumental variables.30

One of the key requirements of the proposed lagged design is the availability of a longitudinal data set for estimation. However, longitudinal data sets or follow-up studies are characterised by relatively high levels of non-response and attrition, which at times can be linked with key characteristics of respondents including their health status. Therefore, to allow for such possibilities, our final preferred model explicitly controls for selection bias resulting from missing observations owing to attrition. This is carried out by specifying a probit selection model,31 ,32 where the selection equation and the health status equations are estimated jointly and the correlations between the error terms are directly factored in. If we use yi to denote whether the individual remains in the sample in the follow-up wave, our approach can be summarised as follows:

$$h_{ijt} = \mathbf{1}\left[X_{i,t-1}\beta_j + \varepsilon_{ijt} > 0\right] \ \text{(observed only if } y_i = 1\text{)}, \qquad y_i = \mathbf{1}\left[z_i\gamma + v_i > 0\right] \tag{3}$$

where $\varepsilon_{ijt} \sim N(0,1)$, $v_i \sim N(0,1)$. Both $h_{ijt}$ and $y_i$ are binary variables and can be estimated using a probit link function. Selection bias can be detected if the correlation between $\varepsilon_{ijt}$ and $v_i$ is statistically different from zero. $z_i$ represents variables that are used in the selection equation. Such selection models require the use of exclusion restrictions to obtain reliable estimates. In our case, place of residence and the number of dependent children in each age group were used for this purpose as they are expected to influence attrition but not changes in health status within the 2-year interval used in our analysis. $\gamma$ is the coefficient vector for $z_i$. The presence of data on multiple health conditions also means that we can model and study comorbidities in the study population. To do so, we use a flexible variance-covariance structure in the error terms of the two disease equations ($\operatorname{cov}(\varepsilon_{i1t}, \varepsilon_{i2t}) \neq 0$).
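To see why a non-zero correlation between the disease and selection error terms matters, the following simulation sketch may help. It is not the joint estimator used in the paper: it only draws correlated standard normal errors and compares the incidence among respondents who stay in the panel with the incidence in the full starting sample. The index values (−1.5 and 0.5), the sample size and the function name are hypothetical.

```python
import random


def attrition_gap(rho, n=20000, seed=1):
    """Incidence among panel stayers minus incidence in the full sample.

    rho is the assumed correlation between the disease error (eps)
    and the selection error (v); both are standard normal.
    """
    rng = random.Random(seed)
    cases_all = cases_stayers = stayers = 0
    for _ in range(n):
        v = rng.gauss(0.0, 1.0)
        # Build eps so that corr(eps, v) = rho and var(eps) = 1.
        eps = rho * v + (1.0 - rho * rho) ** 0.5 * rng.gauss(0.0, 1.0)
        develops = 1 if -1.5 + eps > 0 else 0  # hypothetical disease index
        stays = 0.5 + v > 0                    # hypothetical selection index
        cases_all += develops
        if stays:
            stayers += 1
            cases_stayers += develops
    return cases_stayers / stayers - cases_all / n


gap_uncorrelated = attrition_gap(rho=0.0)
gap_correlated = attrition_gap(rho=0.8)
```

With uncorrelated errors the stayers look like the full sample, so ignoring attrition is harmless; with correlated errors the stayers' incidence drifts away from the population value, which is the situation the selection equation is meant to detect and correct.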

For comparison purposes, we report the results for our proposed model (equation 3) along with the traditional prevalence-based analyses from the two cross-sectional designs. The first of these (equation 1) is a standard probit model and relates current health status with current personal characteristics of respondents. Equation 2 is also a cross-sectional data-based design but controls for prior health status.

Results and discussion

Table 1 provides basic descriptive statistics for the study population in the final model. In total, the analysis covered 3632 individuals aged 35–70 years in 2007 with valid responses in waves 7 and 9. According to the self-reported data, all of these individuals were free from cardiovascular and type II diabetes diseases in wave 7 (2007), the starting point for analysis. Of these individuals, 255 respondents reported having developed a CVD in wave 9 (2009), that is, within 2 years of observation and 34 others had developed type II diabetes during the same period (including 10 individuals who developed both conditions). On average, individuals who did not report a condition appeared to be younger, smoked less, had a lower body mass index (BMI) and tended to engage more in physical activity than their counterparts. They were also relatively better educated, more likely to hold managerial and technical positions, earned a higher income and had a partner or were in some form of marital relationship. Similarly, individuals who reported having no CVD or type II diabetes condition tended to report better health status during childhood.

Table 1

Data description (age 35–70 years in HILDA wave 7, mean value)

Results shown in table 2 reveal significant changes in the lifestyle and risk profile of respondents over the interview period. For example, after 2 years of follow-up, only 84% of respondents in the original sample maintained their BMI or were within a margin of 2.5 units (half of a typical BMI-derived weight category band) and this fell to below 73% a year later. Similar trends are evident for smoking, alcohol consumption and physical activity. The proportion of respondents who remained in the same relationship and occupational categories also declined progressively with time, which suggests that characteristics and risk profiles reported at one point in time may not approximate the characteristics of these individuals in another period, even when the intersurvey interval is reasonably short. Notwithstanding the potential differences in responses to questions over time, resulting from changes in levels of understanding and interpretation of survey questions, the findings from table 2 clearly highlight the problems associated with cross-sectional studies. Lack of information on timing of events and reliance on characteristics reported at the time of survey to predict events that occurred at an unknown period before the survey date can be misleading.

Table 2

Percentage of respondents aged 35–70 years who reported identical patterns in the follow-up rounds
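The persistence shares described for table 2 can be computed along the following lines. This is a minimal sketch on made-up BMI values, assuming a respondent counts as unchanged when the follow-up value lies within the 2.5-unit margin mentioned in the text; the function name and the handling of missing follow-up values are assumptions.

```python
def share_within_margin(baseline, follow_up, margin=2.5):
    """Percentage of respondents whose follow-up value is within `margin`
    of their baseline value. Pairs with a missing value are skipped."""
    pairs = [
        (b, f)
        for b, f in zip(baseline, follow_up)
        if b is not None and f is not None
    ]
    if not pairs:
        raise ValueError("no respondents observed in both waves")
    stable = sum(1 for b, f in pairs if abs(f - b) <= margin)
    return 100.0 * stable / len(pairs)


# Hypothetical BMI values at baseline and 2 years later (None = not observed)
bmi_baseline = [24.0, 31.5, 27.2, 22.8]
bmi_follow_up = [25.1, 35.0, 27.0, None]

stable_share = share_within_margin(bmi_baseline, bmi_follow_up)
```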

Figure 1 shows age-specific prevalence and incidence rates for CVD and type II diabetes with their associated CIs using local polynomial smoothing. The prevalence rates for both conditions resemble published results based on the Australian Health Survey (AHS).33 As expected for both conditions, prevalence rates increased with age of respondents and were higher for CVD than for type II diabetes. By age 50 years, over a third of the population reported a CVD condition and about 1 in 20 reported type II diabetes. Incidence rates also rose with age for CVD, but for type II diabetes they did so only up to the late 60s and declined thereafter.

Figure 1

Smoothed prevalence and incidence rate of CVD and type II diabetes in HILDA (wave 9). The rates are smoothed using local polynomial regression estimators with a bandwidth of 2.5. 95% CI is shown in shaded colours. CVD, cardiovascular disease; HILDA, Australian Household, Income and Labour Dynamics survey.
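The smoothing described in the caption can be sketched as a kernel-weighted local regression. The caption gives only the bandwidth (2.5); the Epanechnikov kernel, the local linear (degree 1) fit and the toy age-specific rates below are assumptions made for illustration.

```python
def epanechnikov(u):
    """Epanechnikov kernel weight; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def local_linear(x0, xs, ys, bandwidth=2.5):
    """Local linear estimate of E[y | x = x0] by weighted least squares."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        d = x - x0
        w = epanechnikov(d / bandwidth)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    if s0 == 0.0:
        raise ValueError("no observations within the bandwidth")
    det = s0 * s2 - s1 * s1
    if abs(det) < 1e-12:
        return t0 / s0  # fall back to a kernel-weighted mean
    # Intercept of the weighted regression of y on (x - x0)
    return (s2 * t0 - s1 * t1) / det


# Toy age-specific rates rising linearly with age (not the HILDA values)
ages = list(range(35, 71))
rates = [0.02 + 0.01 * (age - 35) for age in ages]

smoothed = [local_linear(age, ages, rates) for age in ages]
```

A local linear fit reproduces a straight line exactly, which makes the toy data a convenient check; real age profiles would be smoothed rather than reproduced.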

Table 3 shows the probabilities of transition into the four different health states (healthy; with CVD condition only; with type II diabetes condition only; with both conditions) from a previously healthy state. As can be seen, about 7–8% of the respondents, who reported health status in both waves and were healthy at the beginning of the observation period, became diabetic or developed at least one CVD condition 2 years later. The transition probabilities for both genders followed a similar pattern. For both men and women, the probability of developing type II diabetes was lower than the probability of developing CVD conditions, as expected.

Table 3

Raw transition probabilities within 2 years for a healthy individual in HILDA between 2007 and 2009
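Raw transition probabilities of the kind summarised in table 3 can be tabulated directly from two-wave records. The sketch below uses a small invented data set and hypothetical function names; it keeps only respondents who were free of both conditions at the start and reports the share ending up in each of the four states.

```python
from collections import Counter

STATES = ("healthy", "CVD only", "diabetes only", "both")


def health_state(has_cvd, has_diabetes):
    """Map two condition flags to one of the four health states."""
    if has_cvd and has_diabetes:
        return "both"
    if has_cvd:
        return "CVD only"
    if has_diabetes:
        return "diabetes only"
    return "healthy"


def transition_probabilities(records):
    """records: tuples (cvd_start, diabetes_start, cvd_end, diabetes_end).

    Returns the share in each end state among those healthy at the start.
    """
    counts = Counter(
        health_state(cvd_end, dm_end)
        for cvd_start, dm_start, cvd_end, dm_end in records
        if health_state(cvd_start, dm_start) == "healthy"
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no respondents were healthy at the start")
    return {state: counts[state] / total for state in STATES}


# Invented records: 8 healthy at the start, 2 already with a condition
records = (
    [(False, False, False, False)] * 5
    + [(False, False, True, False)] * 2
    + [(False, False, True, True)]
    + [(True, False, True, False), (False, True, False, True)]
)

probabilities = transition_probabilities(records)
```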

Tables 4 and 5 present results of our regression models for CVD and type II diabetes, respectively. These tables contain three statistical designs, namely results from a cross-sectional approach, a model based on cross-sectional data but estimated on the condition of no prior health condition, and a final model that controls for prior health status and corrects for endogeneity biases. The cross-sectional models were based on wave 9, while for the final model both waves 7 and 9 were used. As stated earlier, in the final model, in addition to estimating the risks of developing CVD and type II diabetes jointly, we also tested the results for the effects of attrition and selection biases.

Table 4

CVD estimations

Table 5

Type II diabetes estimation

The analyses suggest notable differences in patterns and the strength of relationships between risk factors and CVD, depending on the statistical design used for analysis. BMI and a higher education level were consistently and significantly associated with CVD conditions in all three models. However, the strength and significance of the relationships observed for these variables varied among the models. For example, while BMI was significant across all three models for CVD, its impact was strongest in the model that controlled for endogeneity biases and weakest in the models with a cross-sectional design. Specifically, the point estimates for the final preferred model were more than two SEs higher than those for the cross-sectional estimates. Results for the conditional cross-sectional model were somewhere in between, but closer to the estimates from the conventional cross-sectional model, which suggests that the effects of endogeneity biases are not negligible.
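Probit coefficients are not themselves changes in probability, which is why the abstract reports both a coefficient (β=0.11 for BMI) and a percentage-point effect (1.3). The sketch below shows one common way the two are linked, assuming the effect is a marginal effect evaluated at a single representative index value, so that effect = β × φ(index). The paper does not state how its percentage-point figures were computed (they may be averages over respondents), so this is an illustrative back-of-envelope check and the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def marginal_effect(beta, index):
    """Probability change per unit of x at linear index x'beta."""
    return beta * STD_NORMAL.pdf(index)


def implied_baseline_risk(beta, effect):
    """Baseline risk (below 50%) at which `beta` yields `effect`.

    Solves beta * phi(index) = effect for the negative index and
    returns Phi(index).
    """
    density = effect / beta
    peak = 1.0 / math.sqrt(2.0 * math.pi)  # phi(0), the largest density
    if not 0.0 < density <= peak:
        raise ValueError("effect / beta must lie in (0, phi(0)]")
    index = -math.sqrt(-2.0 * math.log(density / peak))
    return STD_NORMAL.cdf(index)


# BMI figures quoted in the abstract: beta = 0.11, effect = 1.3 points
bmi_baseline_risk = implied_baseline_risk(beta=0.11, effect=0.013)
```

Under these assumptions the BMI figures imply a baseline 2-year risk of roughly 6%, which is of the same order as the 255 new CVD cases among the 3632 respondents in the study sample.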

Similar patterns were apparent for alcohol consumption, whose effect was not significant in the standard cross-sectional model. The stronger effect of these factors in the endogeneity-corrected model may suggest that patients diagnosed with CVD conditions were more likely to control their alcohol consumption and body weight through exercise and diet, a finding that is consistent with a number of earlier clinical trial studies.34

On the other hand, once the effects of comorbidity and endogeneity biases were corrected for, occupation and smoking no longer predicted the chance of developing CVD as shown in model 3. Income, education and migration status all showed a stronger impact in the preferred model than the standard cross-sectional (model 1) or the cross-sectional model estimated on condition of no prior health condition (model 2). Additionally, in two of the three models, respondents who reported childhood health problems were more likely to suffer from CVD. In all the three models for CVD, females tended to have a higher risk of developing the condition than their male counterparts, but the difference between them was statistically significant only in the final model, which corrects for endogeneity and selectivity biases.

Interestingly, physical activity did not seem to be an important predictor of the risk of developing a CVD condition, although the coefficient for the variable had the expected sign. However, once the comorbidity and endogeneity biases were corrected for, physical inactivity was significantly associated with the development of type II diabetes. This suggests that the effect of physical activity, as is the case with BMI and alcohol consumption for CVD, may have been underestimated in cross-sectional studies owing to the potential dose–response behaviour. This is consistent with previous medical research on the mechanism and pathways of influence of increased physical activity on risk of diabetes.35 ,36

Certain socioeconomic factors such as employment status and occupation that are usually found to be associated with CVD or type II diabetes in cross-sectional studies showed a similar impact in our cross-sectional designs. However, the finding that these same variables did not have any statistically significant coefficient in the preferred final model seems to suggest that the observed correlation might well be a result of endogeneity-related issues,37 ,38 rather than of any true effect of employment on CVD. This is in line with some of the clinical studies,39 ,40 where education, instead of occupation, was found to be a more significant factor. Additionally, the results suggest that immigrants with a non-English speaking background have a higher risk of developing both CVD and type II diabetes than Australian-born respondents, a finding consistent with some of the earlier research on Australian immigrants.41 ,42

Moreover, the analysis suggests a high and significant correlation between CVD status and the incidence of type II diabetes in the population, even after controlling for socioeconomic status and lifestyle factors. The correlation coefficient between the error terms of the two disease equations is positive and statistically significant (coefficient 0.27, p value 0.01), which means that those who developed either of the conditions are more likely to develop the other condition as well. In the preferred model, we also tested for selection-associated or attrition-associated biases but found no supporting evidence. The p values for the correlations between the error terms in the selection equation and the disease equations (see equation 3) were well above the 0.10 threshold.


CVD and type II diabetes are two of the major public health challenges of our time, and both conditions are expected to assume even greater significance in future, given the ongoing demographic transformation in Australia and around the world. In this population-level study, we followed a panel of nationally representative individuals aged 35–70 years with no prior type II diabetes or CVD condition in the initial wave under consideration and re-examined the socioeconomic and risk factors associated with developing these conditions. Specifically, we controlled for selection-related and attrition-related biases and related the outcome variables to prediagnosis behaviour and characteristics rather than to those that prevailed at the time of enquiry. We then compared the results with standard cross-sectional models to gauge the level of endogeneity-related biases resulting from adopting the latter approach.

Our findings reaffirm some of the findings of earlier studies while at the same time revealing some differences with results from cross-sectional studies. Our estimations suggest that BMI and alcohol consumption have a much stronger association with risk of developing a CVD condition than cross-sectional estimates would suggest. Physical activity was found to have a significant effect on developing type II diabetes once we controlled for endogeneity biases.

Our analyses suggest that while physical activity and lifestyle factors appear more important in predicting the probability of developing a CVD in the endogeneity-corrected model, the impact observed with regard to socioeconomic characteristics for this model was rather lower as compared with what is generally reported in cross-sectional studies. For example, employment status and occupation types that are usually found to be associated with CVD or type II diabetes in cross-sectional studies appear to have no impact once the model is corrected for endogeneity biases.

A number of results relevant to policy emerge from our analysis. Given the importance of BMI for CVD prevention and the high levels of population-wide BMI in Australia, it is high time that health intervention programmes pay greater attention to the obesity epidemic in the country. The total annual direct cost of overweight and obesity in Australia in 2005 was $21 billion,43 and is likely to be even greater for recent years, given the current trend of obesity in the country. The stronger link between BMI and CVD in this paper therefore suggests that surgical options in the case of excessive obesity44 appear to be more warranted than was previously thought. Along with medical interventions, efforts to promote healthy lifestyles (such as those related to diet, smoking and physical exercise) also need strengthening in the country. The findings on the importance of physical activity for type II diabetes and smoking for CVD are particularly important and need to be considered in designing future intervention strategies. Further, a targeted screening process for type II diabetes, especially among the country's foreign-born population, could help practitioners identify early-stage patients, given the positive and significant link between migration status and newly developed type II diabetes.

Finally, a few caveats around the research are in order. One of the major limitations of our analysis relates to the fact that the health status variables used in the study were not based on biomarkers but were obtained by way of self-reporting. Although it is possible that part of this limitation could be offset by the potential correlations between biomarkers and lifestyle factors, of which our data have an extensive collection, self-reported data may contain measurement errors that warrant further investigation as and when such data become available. In view of this and the increasing burden of non-communicable conditions, there appears to be an opportunity for stakeholders of the HILDA survey to introduce biomarkers for selected health conditions, possibly to a limited subsample of the survey. However, given that HILDA is not meant to be a health survey and collects only a limited range of information, future health research in Australia should also look into ways of introducing large-scale nationally representative panel surveys focusing on major health issues in the country. This can be achieved either by extending the 45 and Up Study to other jurisdictions (beyond New South Wales) or by widening the scope of the AusDiab survey to include other major non-communicable conditions as well as by shortening the interval between rounds for the AusDiab survey from 5 to 3 years. Meanwhile, when analysing existing longitudinal clinical data sets, further research into the socioeconomic and risk factor effects on CVD and diabetes in Australia and elsewhere may consider applying our preferred model to correct for endogeneity and selectivity biases.


This paper uses unit record data from the Household, Income and Labour Dynamics in Australia (HILDA) Survey. The HILDA Project was initiated and is funded by the Australian Government Department of Social Services (DSS) and is managed by the Melbourne Institute of Applied Economic and Social Research (Melbourne Institute). The findings and views reported in this paper, however, are those of the authors and should not be attributed to either DSS or the Melbourne Institute. The authors are grateful for the valuable comments and feedback from Tom Cochrane.




  • Contributors YK conceptualised the study, the design and approach. JL prepared the data and undertook the analysis. Both authors were equally responsible for drafting the manuscript.

  • Funding This work was supported by The National Heart Foundation (NHF) Special Grant to the Centre for Research and Action in Public Health, University of Canberra. However, the findings and views reported in this paper remain those of the authors and do not necessarily reflect the views of either the NHF or its staff.

  • Competing interests None declared.

  • Ethics approval This work reports the secondary analysis of a non-identifiable data set. Ethical approval for the collection and analysis of HILDA has been provided by the Human Ethics Advisory Committee at the University of Melbourne, Australia.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data are available.
