When can group level clustering be ignored? Multilevel models versus single-level models with sparse data

P Clarke

doi:10.1136/jech.2007.060798

Theory and methods

Dr P Clarke, Institute for Social Research, University of Michigan, 426 Thompson Street, Ann Arbor, MI 48106-1248, USA; pjclarke{at}umich.edu

Abstract

Objective: The use of multilevel modelling with data from population-based surveys is often limited by the small number of cases per level-2 unit, prompting many researchers to use single-level techniques such as ordinary least squares regression.

Design: Monte Carlo simulations are used to investigate the effects of data sparseness on the validity of parameter estimates in two-level versus single-level models.

Setting: Both linear and non-linear hierarchical models are simulated in order to examine potential differences in the effects of small group size across continuous and discrete outcomes. Results are then compared with those obtained using disaggregated techniques (ordinary least squares and logistic regression).

Main results: At the extremes of data sparseness (two observations per group), the group level variance components are overestimated in the two-level models. But with an average of only five observations per group, valid and reliable estimates of all parameters can be obtained when using a two-level model with either a continuous or a discrete outcome. In contrast, researchers run the risk of Type I error (standard errors biased downwards) when using single-level models even when there are as few as two observations per group on average. Bias is magnified when modelling discrete outcomes.

Conclusions: Multilevel models can be reliably estimated with an average of only five observations per group. Disaggregated techniques carry an increased risk of Type I error, even in situations where there is only limited clustering in the data.

Population health outcomes are shaped by complex interactions between individuals and the diverse social and environmental contexts (eg neighbourhoods, schools and regions) in which they are situated over the life course.1–5 The recent increase in the use of multilevel models to examine associations between group level characteristics and a wide range of individual health indicators6–9 attests to their value as a statistical method for analysing grouped or clustered data. But the use of multilevel modelling with data from population-based surveys is often limited by data sparseness: a small number of cases per level-2 unit.

Although large-scale surveys make it relatively easy to achieve a large number of groups, there are often very few individuals per group. For example, the National Longitudinal Study of Adolescent Health (Add Health) is a school-based survey of health behaviours in American adolescents.10 However, because schools (not neighbourhoods) were the sampling frame, there is considerable sparseness in the data for those interested in examining “neighbourhood” effects on health. In the first wave of the survey, 16 683 students are nested in 2276 census tracts (a typical geographic area used to approximate neighbourhoods), yielding an average of 7.33 subjects per tract. Owing to subject attrition and residential mobility, sparseness only increases over surveys with longitudinal follow-up.

When there is a high level of clustering within groups (ie large group size), it is well known that disaggregation of the data by using ordinary least squares (OLS) regression analysis leads to an elevated risk of Type I error (concluding that there are significant effects when in fact these effects may have occurred by chance).11 By pretending that the observations are independent, the standard errors of the regression coefficients are biased downwards, generating artificially narrow CIs. Multilevel models appropriately partition within-group and between-group effects so that a high level of clustering within groups is statistically accounted for. But little is known about the lower threshold at which data sparseness renders multilevel models unreliable or even unnecessary. There are various rules of thumb stated in the literature, often in the range of 15 to 30 per group.12–14 But the wide range of numbers given in these recommendations reflects the fact that very little research has been done to explicitly test the minimum level of clustering necessary for valid and reliable estimates in multilevel models.

Simulations designed to assess the data sparseness problem are beginning to appear in the literature, and results suggest that the number of groups is more important for unbiased and efficient estimates than the number of observations per group.11 15–18 This is reassuring since a shortage of groups is rarely a problem in population-based data. Yet researchers continue to be concerned with small group sizes when examining contextual effects on health, and have adopted various strategies to deal with data sparseness. A common strategy is to ignore the hierarchical structure of the data altogether by using OLS regression techniques.19–21 A related strategy involves the use of generalised estimating equations (GEEs),7 22 which essentially bypass the sparseness problem by treating the group level variance as a “nuisance factor” that is adjusted for in the analysis but not explicitly investigated.

However, without quantifying the random components at level 2, both GEE and OLS approaches essentially decontextualize the importance of the fixed effect parameters.9 14 23 24 It is well recognised that it is possible to find large significant fixed effects (at level 2) in conjunction with trivial between-group variation.9 23 25 By ignoring the group level random effects the pitfalls of using OLS or GEEs are considerable, particularly from a health policy standpoint. For example, a study may show a significant mortality risk for those living in socioeconomically disadvantaged neighbourhoods. But an intervention targeted towards disadvantaged neighbourhoods may be misplaced if trivial between-group variance in mortality indicates that they are not dissimilar from other apparently advantaged neighbourhoods.

The neighbourhood literature has also witnessed a recent trend where cluster analysis techniques are used to reduce data sparseness.8 26–29 Respondents are grouped together into larger “synthetic” neighbourhoods, yielding a larger number of cases per level-2 unit. Yet the effects of such clustering techniques on the accuracy of model parameters have not been widely considered. In recent work30 we showed that clustering strategies introduce artificial within-group heterogeneity by grouping together individuals who may differ on a host of unobserved characteristics. As a consequence, the balance of within- to between-group variance is upset, and clustering has the ironic consequence of reducing the variance between neighbourhoods.29 30

Although concerns about data sparseness abound, researchers appear to be unconcerned with the consequences of using alternative statistical techniques that ignore the hierarchical structure of the data, disregard the variance components or artificially manipulate the contextual grouping in the data. In earlier work30 we examined the effect of various group sizes (gs = 2, 5, 10, 20), number of groups (ng = 50, 100, 200) and strength of group-level dependency (intraclass correlation coefficient (ICC) = 0.1, 0.2, 0.3) on bias and efficiency in two-level linear models. We found no evidence of bias in the estimates of the fixed effects across conditions, but sampling variability increased as group size decreased. We did, however, find evidence of upwards bias in the group level variance components in conditions with small group size (gs⩽2) and a small number of groups (ng = 50). This bias disappeared when the number of groups reached 200, even with only two observations per group.

The purpose of this article is to expand on this work by using Monte Carlo simulations to compare the validity and efficiency of estimates from two-level and single-level linear and non-linear models when there is minimal clustering in groups. The results show that more sparseness can be tolerated in multilevel models than is generally assumed. Disaggregated techniques carry an increased risk of Type I error, even in situations when there is only limited clustering in the data.

THE HIERARCHICAL GENERALISED LINEAR MODEL

In general, the two-level model can be conceptualised as a hierarchical system of regression equations within J contextual groups, with N_j individuals in each level-2 group.31 At the individual level (level 1) there are separate regression equations for each group. For linear models the identity link function regresses the dependent variable Y_ij on a linear predictor set of one (or more) independent variables X_ij, with normally distributed residuals (e_ij) having a mean of 0 and variance σ²:

Y_ij = β_0j+β_1jX_ij+e_ij. (1.1)

For non-linear models, various link functions linearise an underlying non-linear predictor component. For the case of a binary outcome with a binomial error distribution, the logit link function is used to regress the log odds of the response probability, or proportion π_ij, on a linear predictor set of independent variables:

logit(π_ij) = β_0j+β_1jX_ij. (1.2)

For both the linear and non-linear models these level-1 coefficients can then be modelled by explanatory variables at the contextual level-2 (eg neighbourhood poverty):

β_0j = γ₀₀+γ₀₁Z_j+u_0j (2)

β_1j = γ₁₀+γ₁₁Z_j+u_1j (3)

By substituting equations (2) and (3) into equations (1.1) and (1.2) and rearranging terms, we get the full two-level linear model:

Y_ij = γ₀₀+γ₁₀X_ij+γ₀₁Z_j+γ₁₁Z_jX_ij+u_0j+u_1jX_ij+e_ij (4.1)

and the full two-level logistic model:

logit(π_ij) = γ₀₀+γ₁₀X_ij+γ₀₁Z_j+γ₁₁Z_jX_ij+u_0j+u_1jX_ij. (4.2)

In both models, u_0j represents group level variability around the intercept, which is assumed to be normally distributed with a mean of 0 and variance τ₀₀, and u_1j represents group level variability around the regression slope, which is assumed to be normally distributed with a mean of 0 and variance τ₁₁. The covariance between the group level random effects u_0j and u_1j is τ₀₁, which is generally allowed to be non-zero (either positive or negative). All residual errors at the group level are assumed to be independent of the individual level within-group residuals (e_ij). For the binomial error distribution (equation 4.2), the level-1 error variance is a function of the population proportion (σ² = π_ij(1–π_ij)) and is not estimated separately (by using a scale factor of 1).32

If there are no explanatory variables at levels 1 or 2, equations (4.1) and (4.2) reduce to:

Y_ij = γ₀₀+u_0j+e_ij (5.1)

logit(π_ij) = γ₀₀+u_0j, (5.2)

which are the fully unconditional, or one-way ANOVA, models for the linear and logistic case, respectively. Partitioning the variance components yields a useful statistic, the intraclass correlation coefficient (ICC), which measures the proportion of variance in the outcome that is accounted for by the group level.31 For the linear model the ICC is defined as:

ICC = τ₀₀/(τ₀₀+σ²). (6.1)

Since the binomial distribution for the logistic link function with a scale factor of 1 implies a level-1 variance of π²/3 (approximately 3.29, where π here denotes the mathematical constant rather than the response probability),32 the ICC for the non-linear model is defined as:

ICC = τ₀₀/(τ₀₀+π²/3). (6.2)

For full models (with covariates), a conditional ICC can be calculated based on an adjusted value of τ₀₀, representing the degree of dependence among observations within groups at a given value on the covariates.31

METHODS

Simulation procedure

A Monte Carlo simulation33 is conducted with a two-level model. For parsimony, the number of groups is held constant at 200 groups, and group size is varied at 2, 5, 10 and 20 observations per group. The number of groups is chosen to represent the larger number of groups typically found in population-based survey data and the group sizes capture the extremes of data sparseness as well as the larger group sizes typically tested in simulations.16 18 Following the simulation procedures used in the existing literature,16 1000 simulated data sets are generated for each of the four conditions, for the linear and nonlinear models.

The parameters for the simulation are set according to the full two-level models (equations 4.1 and 4.2). The intercept (γ₀₀) is set to 1.00, and the fixed effect coefficients (γ₁₀, γ₀₁, γ₁₁) to 0.3, representing a medium effect size.34 A set of X and Z values is randomly generated from a multivariate normal distribution. Following Maas and Hox,16 the residual variance at level 1 (σ²) is fixed to 0.5 in the linear model. (The level-1 error variance in the non-linear model is not estimated separately when using a scale factor of 1.)

The population values of the level-2 variance components are derived according to the formula for a conditional ICC value of 0.1 (capturing the lower values of group level clustering typically seen with population-based survey data adjusting for covariates35 36). Thus, the population value of the level-2 intercept variance (τ₀₀) is set to 0.056 for the linear model and 0.366 for the non-linear model, based on equations 6.1 and 6.2, respectively. For simplicity, τ₀₀ and τ₁₁ are constrained to be equal (following Maas and Hox16). The level-2 covariance (τ₀₁) is initially set to zero in the simulations, and then a negative covariance (τ₀₁ = −0.02) is introduced to examine the effect on the parameter estimates. The value of the covariance was chosen to capture the degree of covariance between random intercept and slope parameters often found in contextual analyses with large-scale survey data.31 (No difference was found in the results between a negative and a positive covariance; so I report the results with the negative covariance only.)
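
The derivation of these population values can be checked by solving equations 6.1 and 6.2 for τ₀₀ at a target ICC of 0.1. The following is a minimal sketch in Python rather than the software used in the study; the function name `tau00_for_icc` is a hypothetical helper introduced here for illustration.

```python
import math


def tau00_for_icc(icc: float, level1_variance: float) -> float:
    """Solve ICC = tau00 / (tau00 + level1_variance) for tau00."""
    return icc * level1_variance / (1.0 - icc)


SIGMA2_LINEAR = 0.5                 # level-1 residual variance, linear model
SIGMA2_LOGISTIC = math.pi ** 2 / 3  # implied level-1 variance, logit link

tau00_linear = tau00_for_icc(0.1, SIGMA2_LINEAR)
tau00_logistic = tau00_for_icc(0.1, SIGMA2_LOGISTIC)

# Rounded to three decimals these match the values quoted in the text.
print(round(tau00_linear, 3), round(tau00_logistic, 3))
```

Because τ₀₀ scales with the level-1 variance, the same conditional ICC implies a much larger intercept variance on the logit scale than on the linear scale.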

Based on these parameters, values of Y_ij are generated for each of the simulated data sets (Y is continuous for the linear model, whereas for the non-linear model Y can take values of 0 or 1), and the effects of the four different conditions (group size = 2, 5, 10, 20) on the estimated parameter values are examined for the linear and non-linear models, and also in the face of a negative covariance. Reproducible streams of random numbers were generated in the simulations in order to maintain comparability across models. The effects of unbalanced data are also investigated by randomly sampling 60% of the observations from the simulated data to create inconsistent group sizes. For the unbalanced data the average group sizes are 1.4 (range 1–2 per group), 2.9 (range 1–5 per group), 5.9 (range 2–9 per group) and 12.0 (range 6–18 per group).
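
The data-generating step described above can be sketched as follows. This is an illustrative Python reconstruction under stated assumptions, not the Mplus setup used in the study: X and Z are assumed to be independent standard normal variables (the text says only that they are multivariate normal), and the function names `simulate_two_level` and `make_unbalanced` are hypothetical.

```python
import numpy as np


def simulate_two_level(n_groups=200, group_size=5, tau00=0.056, tau11=0.056,
                       tau01=0.0, sigma2=0.5, binary=False, seed=0):
    """Generate one data set from equation 4.1 (or 4.2 when binary=True)."""
    rng = np.random.default_rng(seed)  # seeded for a reproducible stream
    g00, g10, g01, g11 = 1.0, 0.3, 0.3, 0.3  # fixed effects from the text

    group = np.repeat(np.arange(n_groups), group_size)
    z = rng.standard_normal(n_groups)[group]  # level-2 covariate Z_j (assumed N(0,1))
    x = rng.standard_normal(group.size)       # level-1 covariate X_ij (assumed N(0,1))

    # Random intercepts and slopes with covariance matrix [[tau00, tau01], [tau01, tau11]].
    cov = np.array([[tau00, tau01], [tau01, tau11]])
    u = rng.multivariate_normal([0.0, 0.0], cov, size=n_groups)
    u0, u1 = u[group, 0], u[group, 1]

    eta = g00 + g10 * x + g01 * z + g11 * z * x + u0 + u1 * x
    if binary:
        # Logit link: Y is 0 or 1 with probability given by the inverse logit of eta.
        y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    else:
        y = eta + rng.normal(0.0, np.sqrt(sigma2), size=group.size)
    return {"group": group, "x": x, "z": z, "y": y}


def make_unbalanced(data, keep=0.6, seed=1):
    """Randomly retain a fraction of observations to create unequal group sizes."""
    rng = np.random.default_rng(seed)
    n = data["y"].size
    idx = np.sort(rng.choice(n, size=int(round(keep * n)), replace=False))
    return {name: values[idx] for name, values in data.items()}


linear = simulate_two_level(group_size=2, tau01=-0.02)
logistic = simulate_two_level(group_size=2, tau00=0.366, tau11=0.366, binary=True)
unbalanced = make_unbalanced(linear)
print(linear["y"].size, unbalanced["y"].size)
```

Sampling observations rather than whole groups means that some groups can lose all but one member, or disappear entirely, when the starting group size is two.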

All simulations are conducted using Mplus Version 4.21.37 Models are estimated using the pseudo-maximum likelihood estimation for general multilevel modelling and the standard errors are computed using a robust sandwich estimator. Non-linear models are estimated using a numerical integration algorithm. Standard OLS regression was used to estimate disaggregated linear models, and logistic regression was used to estimate disaggregated binary models.

Statistical analysis

The effects of small group size are examined in terms of bias for all parameter estimates and their standard errors. Bias is assessed by examining whether the mean of the sampling distribution of estimates under each condition centres on the true value. If θ̂ is the sample estimate of the population parameter θ, then bias = (E(θ̂)–θ)/θ. Bias exceeding 10% for any parameter is generally considered to be meaningful.38 The precision or efficiency of the estimates is ascertained by examining the sampling distribution of standard errors for each parameter under each simulated condition. The standard deviation of the parameter estimates in the simulations is an indicator of the population standard error when the number of replications is large.38 This is compared with the average of the estimated standard errors for each parameter estimate in the simulations, and large standard errors (wide CIs) indicate decreased efficiency.
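
The two summary measures described above can be written compactly. The sketch below is illustrative Python with hypothetical function names; the input arrays stand for the estimates and estimated standard errors collected across replications, and the numbers in the usage lines are made up for demonstration only.

```python
import numpy as np


def relative_bias(estimates, true_value):
    """(mean of the estimates - true value) / true value."""
    return (np.mean(estimates) - true_value) / true_value


def se_relative_bias(estimates, standard_errors):
    """Compare the average estimated standard error with the empirical
    standard deviation of the estimates across replications."""
    empirical_se = np.std(estimates, ddof=1)
    return (np.mean(standard_errors) - empirical_se) / empirical_se


# Made-up values for illustration, not results from the study.
toy_estimates = np.array([0.31, 0.28, 0.33, 0.30, 0.27])
toy_ses = np.array([0.020, 0.021, 0.019, 0.020, 0.022])
print(relative_bias(toy_estimates, 0.3), se_relative_bias(toy_estimates, toy_ses))
```

A negative value of `se_relative_bias` indicates standard errors that are too small on average, which is the pattern associated with an inflated Type I error rate.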

RESULTS

All two-level models for each of the 1000 simulated data sets under each of the four conditions converged successfully when there were at least five observations per group and balanced data. However, model performance was compromised in the face of very small group size and unequal group size (unbalanced data). For the linear two-level model with group sizes of two (balanced data) only 86.7% of the models successfully converged. With unbalanced data and very small group sizes (N⩽2) only 69.3% converged. Model performance was not problematic with the non-linear two-level models in the face of sparse data.

Table 1 presents the fixed effects and associated standard errors for the linear model estimated both with maximum likelihood (linear two-level model), as well as with OLS in a disaggregated single-level regression. Results are presented for both balanced and unbalanced data. The true parameter values are indicated in parentheses in table 1. For each of the four simulation conditions, the fixed effects and the standard errors for the two-level linear model are estimated without bias. Even at the extremes of data sparseness (group size ⩽2), bias in the fixed effect parameter estimates is trivial (less than 1%). Although the fixed-effect coefficients are estimated with decreased precision in the face of unbalanced data (wider confidence intervals), there is no evidence of bias in the standard errors.

Table 1 Simulation results for linear models: fixed effects and standard errors*

Parameter estimates from OLS models are also unbiased (table 1). However, the standard errors of these parameter estimates are consistently underestimated with OLS. Even when clustering is marginal (⩽2 observations per group), standard errors of the regression parameters are underestimated by 10–15%. As group size increases, bias only gets worse. With five observations per group, OLS standard errors are biased downwards by 15% on average, and standard errors are underestimated by as much as 40% with 20 observations per group.
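
The downward bias in OLS standard errors can be illustrated with a small Monte Carlo experiment. The sketch below is a simplified Python stand-in for the study's procedure, under several assumptions: it uses 200 replications rather than 1000, independent standard normal X and Z, no intercept-slope covariance, and the conventional (non-robust) OLS standard error formula. The function names are hypothetical, and the exact ratio obtained will differ from the figures in table 1.

```python
import numpy as np


def ols_with_naive_se(X, y):
    """OLS coefficients with standard errors that assume independent errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se


def ols_se_ratio(group_size, n_groups=200, n_reps=200, seed=2024):
    """Mean naive SE of the level-2 coefficient divided by the empirical SD
    of that coefficient across replications (1.0 means no bias)."""
    rng = np.random.default_rng(seed)
    group = np.repeat(np.arange(n_groups), group_size)
    betas, ses = [], []
    for _ in range(n_reps):
        z = rng.standard_normal(n_groups)[group]
        x = rng.standard_normal(group.size)
        u0 = rng.normal(0.0, np.sqrt(0.056), n_groups)[group]  # random intercepts
        u1 = rng.normal(0.0, np.sqrt(0.056), n_groups)[group]  # random slopes
        e = rng.normal(0.0, np.sqrt(0.5), group.size)          # level-1 residuals
        y = 1.0 + 0.3 * x + 0.3 * z + 0.3 * z * x + u0 + u1 * x + e
        X = np.column_stack([np.ones_like(x), x, z, z * x])
        beta, se = ols_with_naive_se(X, y)
        betas.append(beta[2])  # coefficient on Z_j
        ses.append(se[2])
    return np.mean(ses) / np.std(betas, ddof=1)


ratio = ols_se_ratio(group_size=20)
print(f"naive SE / empirical SD with 20 per group: {ratio:.2f}")
```

A ratio below 1 means that the naive standard error understates the true sampling variability of the group level coefficient. Calling `ols_se_ratio` with smaller group sizes shows how the gap narrows as clustering decreases.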

Table 2 presents the fixed effects and standard errors for the binary outcome estimated with a two-level non-linear model and a single-level logistic regression. For the non-linear two-level model the fixed effects and standard errors are generally estimated without bias across all conditions. The only exception is in the case of unbalanced data with a very small group size (⩽2) where the fixed effect coefficients are biased up by as much as 16%. When disaggregating the model to a single level logistic regression, downward bias in the standard errors is present across all simulation conditions. Even with only very limited clustering in the data (⩽2 observations per group), the standard errors of the logistic regression coefficients are biased downwards by 25% on average. There is also evidence that the fixed effect for the group level parameter (γ₀₁) is consistently underestimated by about 20% across all group sizes when using single level logistic regression. The level-1 parameter estimate (γ₁₀) is consistently overestimated in the presence of unbalanced data.

Table 2 Simulation results for non-linear models: fixed effects and standard errors*

Table 3 presents the variance components and standard errors for both the linear and logistic two-level models across the four simulation conditions. With very small group size (group size ⩽2), the group level variance components are overestimated by more than 30% in the non-linear two-level model, with upwards bias most pronounced in the face of unbalanced data. There is no evidence of bias in the two-level linear model when the group sizes are equal, but with unbalanced data and very small group size (⩽2) the group level variance components are also overestimated by as much as 35%. There is also decreased precision (larger standard errors) in the estimates of both linear and logistic random effect coefficients with marginal group sizes. When group size equals two with balanced data, the standard error for the group level intercept variance is biased upwards by over 100% in the linear model, and the standard error for the group level slope variance is overestimated by 32% in the non-linear two-level model. As a result, the power to detect significant between-group variance in these effects falls below 0.3. Upwards bias in the standard errors only increases with unbalanced data and very small group size. But when average group size increases to five or more, there is no evidence of bias in the random effects or their standard errors for either the linear or logistic two-level models.

Table 3 Simulation results for linear and non-linear two-level models: random effects and standard errors*

When a negative covariance was introduced between the random intercept and slope parameters in the two-level models (with balanced data), there was no evidence of bias in the fixed effects or their standard errors (not shown). However, the group level variance components are overestimated for both the continuous and binary outcomes at very small group sizes (table 4). This upwards bias is corrected with group sizes of five or more. There was also decreased precision (larger standard errors) when estimating random coefficients in the presence of a covariance, particularly for the non-linear two-level model. Again, however, efficiency increases once there are at least five observations per group.

Table 4 Simulation results for linear and non-linear two-level models with negative covariance: random effects and standard errors*

DISCUSSION

Using Monte Carlo simulations, this paper aimed to empirically define the limits of data sparseness in two-level models so that informed analytic decisions can be made when working with sparsely clustered data. Consistent with existing research in this area,15–18 two-level models generate unbiased estimates of the fixed effects and their standard errors, even at the extremes of data sparseness. This holds for both continuous and discrete outcomes. The only exception is for the non-linear model with unbalanced data and very small group size (⩽2), where the fixed effect coefficients are biased upwards.

Although earlier work30 with two-level linear models found evidence of upwards bias in the group level variance components estimated with sparse data and a small number of groups (N = 50 groups), we found no evidence of bias with a large number of groups (N = 200 groups) and balanced data. However, the group level random effects are overestimated when using two-level models with unbalanced data at the extremes of data sparseness, and the upwards bias is exacerbated when modelling binary outcomes. Moreover, the standard errors of these random effects are overestimated with small group sizes, for both continuous and discrete outcomes. As a result, the decreased precision in these estimates reduces the power to detect significant between group variance when group sizes are small. When the level of clustering increases to at least five observations per group, there was no evidence of bias in the two-level models for either the fixed or random effects or their standard errors.

If one were to ignore the clustering in the data and conduct OLS or logistic regression analysis, unbiased estimates of the fixed effects would generally be obtained. (The exception is for non-linear models where the contextual effects are consistently underestimated by about 20% for all group sizes.) However, the standard errors of all fixed effect coefficients are biased downwards when estimated with OLS or logistic regression. This is the case even when the level of clustering is marginal (group size = 2). As a result, researchers using disaggregated models with clustered data increase the risk of Type I statistical error. This underestimation is more problematic for non-linear models than for linear models.

These results highlight potential problems that are likely to occur when working with sparsely clustered data. If one were to disregard the importance of the random effects, the disaggregated approach results in artificially narrow CIs and raises the risk of Type I statistical error. On the other hand, a two-level analysis with sparsely clustered data generates excess sampling variability, decreasing the power to detect between-group variance. However, the multilevel approach still generates valid fixed parameter estimates and standard errors with two or more observations per group. But perhaps most importantly, these simulation results have demonstrated that more sparseness can be tolerated than is generally assumed. Even with an average of only five observations per group, valid and reliable estimates of all parameters can be obtained when using a two-level model with either a continuous or a discrete outcome. In contrast, researchers run the risk of Type I error when using disaggregated techniques even when there are as few as two observations per group on average.

In summary, alternative analytic strategies that ignore the level of clustering in the data are misdirected and not empirically based. Too many analytic decisions are incorrectly based on the assumption that multilevel models are only appropriate for densely clustered data.19 21 The results of these simulations have proven otherwise. Multilevel models are a robust analytic tool, generating valid and reliable estimates of the fixed effect parameters, even with very limited clustering in the data. Researchers should be cognizant of the pros and cons of using disaggregated techniques when working with clustered data.

What is already known on this subject

The use of multilevel modelling with data from population-based surveys is often limited by data sparseness: a small number of cases per level-2 unit.
Very little research has been done to test the general threshold at which data sparseness becomes problematic for unbiased and efficient estimates in multilevel models.

What this study adds

Results show that with an average of only five observations per group, valid and reliable estimates of all parameters can be obtained when using two-level models with either a continuous or a discrete outcome.
Researchers run the risk of Type I error (standard errors biased downwards) when using single level models even when there are as few as 2 observations per group on average.

Acknowledgments

This research was supported by the Canadian Institutes of Health Research Strategic Initiative: Population and Public Health Research Methods and Tools. Additional support for the analyses came from the University of Michigan Center for Social Inequality Mind and Body (NIH/NICHHD grant R24HD04786, Life Course Core). I thank Barb Strane for administrative support on this project.

REFERENCES

↵
1. Yen IH,
2. Syme SL
. The social environment and health: a discussion of the epidemiologic literature. Annu Rev Public Health 1999;20:287–308.
OpenUrl CrossRef PubMed Web of Science
↵
1. O’Campo P,
2. Xue X,
3. Wang M-C,
4. et al
. Neighborhood risk factors for low birthweight in Baltimore: a multilevel analysis. Am J Public Health 1997;87:1113–18.
OpenUrl CrossRef PubMed Web of Science
1. Yen IH,
2. Kaplan GA
. Neighborhood social environment and risk of death: multilevel evidence from the Alameda County Study. Am J Epidemiol 1999;149:898–907.
OpenUrl Abstract/FREE Full Text
1. Barr RG,
2. Diez Roux AV,
3. Knirsch CA,
4. et al
. Neighborhood poverty and the resurgence of tuberculosis in New York City, 1984–1992. Am J Public Health 2001;19:1487–93.
OpenUrl
↵
1. Clarke P,
2. George LK
. The role of the built environment in the disablement process. Am J Public Health 2005;95:1933–9.
OpenUrl CrossRef PubMed Web of Science
↵
1. Pickett KE,
2. Pearl M
. Multilevel analyses of neighbourhood socio-economic context and health outcomes: a critical review. J Epidemiol Community Health 2001;55:111–22.
OpenUrl Abstract/FREE Full Text
↵
1. Ahern J,
2. Pickett KE,
3. Selvin S,
4. et al
. Preterm birth among African American and white women: a multilevel analysis of socioeconomic characteristics and cigarette smoking. J Epidemiol Community Health 2003;57:606–11.
OpenUrl Abstract/FREE Full Text
↵
1. Buka SL,
2. Brennan RT,
3. Rich-Edwards JW,
4. et al
. neighborhood support and the birth weight of urban infants. Am J Epidemiol 2003;157:1–8.
OpenUrl Abstract/FREE Full Text
↵
1. Merlo J,
2. Lynch JW,
3. Yang M,
4. et al
. Effect of neighborhood social participation on individual use of hormone replacement therapy and antihypertensive medication: a multilevel analysis. Am J Epidemiol 2003;157:774–83.
OpenUrl Abstract/FREE Full Text
↵
1. Bearman PS,
2. Jones J,
3. Udry JR
. The National Longitudinal Study of Adolescent Health: Research Design. University of North Carolina Population Centre. http//www.cpc.unc.edu/addhealth (accessed 10 May 2006).
↵
1. Kreft IG,
2. Yoon B
. Are multilevel techniques necessary? An attempt at demystification. In: Annual Meeting of the American Educational Research Association New Orleans (LA), (ERIC Document Reproduction Service No. TM 021737).
↵
1. Bryk AS,
2. Raudenbush SW
. Hierarchical linear models: applications and data analysis methods. Newbury Park: Sage, 1992.
1. Kreft IG,
2. de Leeuw J
. Introducing multilevel modelling. Thousand Oaks: Sage Publications, 1998.
↵
1. Raudenbush SW,
2. Sampson RJ
. Ecometrics: toward a science of assessing ecological settings, with application to the systematic social observation of neighborhoods. Sociol Method 1999;29:1–41.
OpenUrl CrossRef
↵
1. Blasius J,
2. Hox J,
3. deLeeuw E,
4. Schmidt P
1. Maas CJM,
2. Hox JJ
. Sample sizes for multilevel modeling. In: Blasius J, Hox J, deLeeuw E, Schmidt P, eds. Social science methodology in the new millenium. Proceedings of the Fifth International Conference on Logic and Methodology. 2nd expanded edn. Opladen, RG: Leske & Budrich Verlag, 2002;1–19.
↵
1. Maas CJM,
2. Hox JJ
. The influence of violence of assumptions on multilevel parameter estimates and their standard errors. Comput Stat Data Anal 2004;46:427–40.
OpenUrl CrossRef Web of Science
1. Afshartous D
. Determination of sample size for multilevel model design. Paper presented at the Annual Meeting of the American Educational Research Association, San Francisco, CA. 1995. (Available at http://moya.bus.miami.edu/~dafshartous/asa_sample_sizes.pdf)
↵
1. Mok M
. Sample size requirements for 2-level designs in educational research. Multilevel Modelling Newsletter 1995;7:11–15.
OpenUrl
↵
1. Robert SA
. Community-level socioeconomic status effects on adult health. J Health Soc Behav 1998;39:18–37.
OpenUrl CrossRef PubMed Web of Science
1. South SJ,
2. Baumer EP
. Deciphering community and race effects on adolescent premarital childbearing. Social Forces 2000;78:1379–408.
OpenUrl Abstract/FREE Full Text
↵
1. Schieman S,
2. Pearlin LI,
3. Meersman SC
. Neighborhood disadvantage and anger among older adults: social comparisons as effect modifiers. J Health Soc Behav 2006;47:156–72.
OpenUrl Abstract/FREE Full Text
↵
1. Diez Roux A
. Multilevel analysis in public health research. Annu Rev Public Health 2000;21:171–192.
OpenUrl CrossRef PubMed Web of Science
↵
1. Merlo J
. Multilevel Analytical approaches in social epidemiology: measures of health variation compared with traditional measures of association. J Epidemiol Community Health 2003;57:550–2.
OpenUrl FREE Full Text
Merlo J, Ostergren PO, Hagberg O, et al. Diastolic blood pressure and area of residence: multilevel versus ecological analysis of social inequality. J Epidemiol Community Health 2001;55:791–8.
Duncan G, Raudenbush SW. Getting context right in quantitative studies of child development. In: Thornton A, ed. The well-being of children and families. Ann Arbor: The University of Michigan Press, 2001;356–83.
Beland F, Birch S, Stoddart G. Unemployment and health: contextual-level influences on the production of health in populations. Soc Sci Med 2002;55:2033–52.
Cutrona CE, Russell DW, Hessling RM, et al. Direct and moderating effects of community context on the psychological well-being of African American women. J Pers Soc Psychol 2000;79:1088–101.
Hou F, Chen J. Neighborhood low income, income inequality and health in Toronto. Health Reports 2003;14:21–34.
Wheaton B, Clarke P. Space meets time: integrating temporal and contextual influences on mental health in early adulthood. Am Sociol Rev 2003;68:680–706.
Clarke P, Wheaton B. Addressing data sparseness in contextual population research: using cluster analysis to create synthetic neighborhoods. Sociol Methods Res 2007;35:311–51.
Raudenbush SW, Bryk AS. Hierarchical linear models: applications and data analysis methods, 2nd edn. Thousand Oaks: Sage Publications, 2002.
Hox J. Multilevel analysis: techniques and applications. Mahwah: Lawrence Erlbaum Associates, Publishers, 2002.
Mooney CZ. Monte Carlo simulation (Sage University Paper series on Quantitative Applications in the Social Sciences, series no. 07-116). Thousand Oaks: Sage, 1997.
Cohen J. Statistical power analysis. Hillsdale: Erlbaum, 1988.
Gulliford MC, Ukoumunne OC, Chinn S. Components of variance and intraclass correlations for the design of community-based surveys and intervention studies. Am J Epidemiol 1999;149:876–83.
Boyle MH, Willms JD. Place effects for areas defined by administrative boundaries. Am J Epidemiol 1999;149:577–85.
Muthen BO, Muthen LK. Mplus user’s guide. Los Angeles: Muthen and Muthen, 1998.
Muthen LK, Muthen BO. How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling 2002;4:599–620.

Footnotes

Competing interests: None.

Linked Articles

Barreto ML. In this issue. Journal of Epidemiology & Community Health 2008;62:665. Published Online First: 11 Jul 2008.
