Elsevier

Social Science & Medicine

Volume 70, Issue 7, April 2010, Pages 1100-1108

What is a cohort effect? Comparison of three statistical methods for modeling cohort effects in obesity prevalence in the United States, 1971–2006

https://doi.org/10.1016/j.socscimed.2009.12.018

Abstract

Analysts often use different conceptual definitions of a cohort effect, and therefore different statistical methods, which lead to differing empirical results. A definition often used in sociology assumes that cohorts have unique characteristics confounded by age and period effects, whereas epidemiologists often conceive that period and age effects interact to produce cohort effects. The present study aims to illustrate these differences by estimating age, period, and cohort (APC) effects on obesity prevalence in the U.S. from 1971 to 2006 using both conceptual approaches. Data were drawn from seven cross-sectional waves of the National Health and Nutrition Examination Survey. Obesity was defined as BMI ≥ 30 for adults and ≥95th percentile for children under the age of 20. APC effects were estimated using the classic constraint-based method (first-order effects estimated and interpreted), the Holford method (first-order effects estimated but second-order effects interpreted), and the median polish method (second-order effects estimated and interpreted). Results indicated that all methods report significant age and period effects, with lower obesity prevalence in early life as well as increasing prevalence in successive surveys. Positive cohort effects for more recently born cohorts emerged based on the constraint-based model; when cohort effects were considered second-order estimates, no significant effects emerged. First-order estimates of age–period–cohort effects are often criticized because of their reliance on arbitrary constraints, but may be conceptually meaningful for sociological research questions. Second-order estimates are statistically estimable and produce conceptually meaningful results for epidemiological research questions. Age–period–cohort analysts should explicitly state the definition of a cohort effect under consideration. Our analyses suggest that the prevalence of obesity in the U.S. in the latter part of the 20th century rose across all birth cohorts, in the manner expected based on estimated age and period effects. As such, the absence or presence of cohort effects depends on the conceptual definition and therefore statistical method used.

Introduction

Both medical sociology and epidemiology seek to understand the distribution and etiology of health outcomes. However, the two disciplines often approach health problems from different theoretical starting points. For example, epidemiologists typically use information on health distributions to identify causes of disease, whereas medical sociologists generally seek to understand the impact of the social environment on populations regardless of the particular disease that arises from adverse social conditions (Aneshensel et al., 1991, Syme and Yen, 2000). Thus, epidemiologists often begin with a specific outcome and seek to identify salient exposures; sociologists often begin with an exposure and seek to identify salient health outcomes. These different conceptualizations inform the way in which research questions are asked and the analyses used to answer the questions. The study of cohort effects is one such area in which rich research traditions in both epidemiology and medical sociology emerge. Cohort effects (sometimes referred to as “generation effects” [Last, 2001]) are generally conceptualized as variation in the risk of a health outcome according to the year of birth, often coinciding with shifts in the population exposure to risk factors over time. Cohort analysis is used to identify particularly at-risk birth cohorts, providing vital information for both public health surveillance and for the identification of etiologic factors. This paper will illustrate conceptual differences in defining the cohort effect, the way in which these differences translate into separate statistical modeling strategies, and the alternative conclusions that can arise based on the differences.

We highlight these differences by estimating cohort effects in obesity prevalence in the United States during the last 40 years. Obesity prevalence has risen dramatically (Flegal et al., 2002, Ogden et al., 2006), and the factors most important in this increase remain unclear (Drewnowski, 2007, James, 2008, Prentice and Jebb, 2003). While most explanations focus on individual eating behavior and physical activity, other hypotheses suggest that rising obesity may be attributed to lack of sleep, decreasing smoking prevalence, or, provocatively, even one's in utero environment. Specifically, the fetal over-nutrition hypothesis posits that increasing in utero exposure to maternal obesity may lead to inter-generational increases in offspring obesity (Cole et al., 2008, Gillman, 2004, Keith et al., 2006, Lawlor et al., 2008). The contribution of this phenomenon may manifest as cohort effects in obesity prevalence, as each successively younger cohort is at higher risk for obesity. Evaluation of cohort effects in obesity prevalence can shed light on the plausibility of the over-nutrition hypothesis as well as other novel hypotheses that attempt to explain secular increases in obesity in the U.S.

Cohort analysis began in the early 20th century as a descriptive tool to better understand mortality (Kuh & Davey Smith, 1993), mostly for the purpose of forecasting and calculating life expectancy (Tutt, 1953). Since these earliest studies, the definition, identification, and interpretation of cohort effects have been a subject of controversy (Derrick, 1928, Kermack et al., 1934). To define a cohort effect, it is necessary to first define the related effects associated with “age” and “period.” Age effects describe the common developmental processes that are associated with particular ages or stages in the life course. In other words, age effects represent accumulated exposure and/or the physiological changes associated with the process of aging. Period effects are the result of widespread environmental changes, the ubiquitous, population-wide exposures that occur at a circumscribed point in time. Two alternative accounts of the cohort effect exist, with one definition being relatively more common in medical sociology and the other relatively more common in epidemiology. Of course, the fields of medical sociology and epidemiology are not mutually exclusive; neither are the two conceptualizations of a cohort effect exclusive to any field. Further, researchers from different fields may utilize similar constructs yet still conceptualize different research questions. However, for simplicity, we will refer to the two cohort conceptualizations as the “epidemiologic definition” and the “sociologic definition,” and discuss how these definitional differences give rise to alternative research questions, analyses, and interpretations.

The epidemiologic definition of a cohort effect suggests that a cohort effect occurs when different distributions of disease arise from a changing or new environmental cause affecting age groups differently. A cohort effect, therefore, is conceptualized as a period effect that is differentially experienced through age-specific exposure or susceptibility to that event or cause (i.e., interaction or effect modification). These effects can be short-lived or have long-term consequences on the health outcomes of the individuals within the affected cohort. In both public health and social science, we are often most interested in the identification of cohort effects that result in long-term health risks, but short-term fluctuations in health that result from age by period interactions are also important to document. As an obesity-related example, we might imagine that the availability and accessibility of sugar-laden soft drinks increases in the population (period effect), but the effect of that increase was more pronounced among the youngest cohorts because of higher consumption among children relative to adults (period by age interaction). In other words, a cohort effect could arise when a population-level environmental cause is unequally distributed in the population. Alternatively, a cohort effect could arise because a population-level exposure differentially affects age groups who are in the midst of a critical developmental period, during which exposure has long-lasting effects on lifetime disease risk (Gluckman and Hanson, 2004, Lawlor et al., 2008). This definition of the cohort effect applies more often to epidemiologic research questions, in which the primary objective is to better explain a particular pattern or emergence of a population health outcome in the most parsimonious yet comprehensive manner possible.

An alternative definition of a cohort effect has arisen, primarily although not exclusively out of sociological theory. This more sociologically-oriented view grows out of the conceptual starting place that the cohort itself represents an exposure that is rich with explanatory power. Thus, the conceptual orientation is on cohorts, and on determining the ways in which cohort membership affects the lives of persons across the life course. This sociological view of cohort was popularized by demographer Norman Ryder in his seminal 1965 publication ‘The Cohort as a Concept in the Study of Social Change’ (Ryder, 1965). Ryder posited that a cohort can be conceived as a structural category, whereby the unique circumstances and conditions through which cohorts emerge, come of age, and die provide a record of social and structural change. As a result, the conditions, barriers, and resources that each cohort is born into and in which they live their collective lives may uniquely shape the patterns and experiences of health and mortality for that cohort. Investigations adopting this conceptualization of the cohort effect seek to quantify the unique risks that are associated with cohort membership, defined broadly and inclusively with all exogenous factors that may impact the health of each cohort. Under the sociological definition, the long-term health risks of being born in a certain cohort are of primary interest, whereas short-term fluctuations in health among members of certain birth cohorts do not reflect the broad structural forces that shape health across the life course. In the obesity example, we might posit that the obesity epidemic is shaped by coming of age in a media-saturated environment where sedentary lifestyles are socially acceptable and where many families are priced out of healthy, nourishing food. The variation of a specific environmental cause across age is not of primary interest in this particular example; instead, the totalities of the societal structures that create reservoirs of risk and resilience across different cohorts become the exposures of interest for the sociological inquiry.

In contrast to the epidemiological definition, which defines a cohort effect as the interaction of period and age effects, the sociological definition conceives of age and period as confounders of the cohort effect (Mason, Mason, Winsborough, & Poole, 1973). As described, sociologists often conceptualize cohort effects as representing the totality of environmental influences for a particular birth group that are unique to the cohort itself. The effects of period and age obscure the ability to quantify a cohort effect because all three variables are linked with time. Thus, when examining population health outcomes, we do not know whether the prevalence is changing because of the experience of cohorts; changes in the age structure of the population; or the introduction or removal of widespread environmental influences. Teasing apart the independent effects of historical influences (cohort effects), contemporaneous influences (period effects), and exposure accumulation (age effects) becomes necessary to obtain a unique estimate of the cohort effect under the assumptions of the sociological definition.

Age–period–cohort modeling strategies can be defined as statistical attempts to partition variance into the unique components attributable to age, period, and cohort effects. Regardless of conceptual definition, the majority of APC modeling strategies developed over the past thirty years assume that cohort effects can exist independently of age and period effects (the sociological definition). Therefore, age, period, and cohort are often modeled as having a linear relationship with the outcome of interest, and each linear slope is estimated controlling for the additive effect of the other two. These linear relationships are termed “first-order effects.” However, no statistical model can simultaneously estimate age, period, and cohort effects because of the collinearity among the three variables (Cohort = Period − Age). This collinearity results in a statistically non-identifiable design matrix, making simultaneous mathematical modeling of the linear functions of three effects impossible without additional restrictions in the model.
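The identification problem can be made concrete with a small numerical sketch. The snippet below is an illustration only: the effect values, the table size (4 age groups by 3 periods), and the helper names `cohort_index`, `fitted`, and `shift_linear_trend` are all hypothetical and are not taken from the NHANES analysis. It shows that adding a linear trend to the age and cohort effects and subtracting it from the period effects leaves every age-by-period cell unchanged, so the data cannot distinguish between the two parameter sets.

```python
# Hypothetical effects on the log scale; none of these numbers are estimates from the paper.
m, n = 4, 3                              # assumed number of age groups and survey periods
mu = -2.0                                # constant term
alpha = [0.0, 0.4, 0.7, 0.8]             # age effects
beta = [0.0, 0.3, 0.6]                   # period effects
gamma = [0.0, 0.1, 0.1, 0.2, 0.3, 0.5]   # cohort effects, one per diagonal (m + n - 1)


def cohort_index(i, j, n_ages):
    """Cohort = Period - Age, shifted so that the oldest cohort has index 0."""
    return j - i + (n_ages - 1)


def fitted(mu, alpha, beta, gamma):
    """Linear predictor mu + alpha_i + beta_j + gamma_k for every age-by-period cell."""
    n_ages = len(alpha)
    return [[mu + alpha[i] + beta[j] + gamma[cohort_index(i, j, n_ages)]
             for j in range(len(beta))]
            for i in range(n_ages)]


def shift_linear_trend(mu, alpha, beta, gamma, delta):
    """Move a linear trend of slope delta between the three sets of effects."""
    n_ages = len(alpha)
    return (mu - delta * (n_ages - 1),
            [a + delta * i for i, a in enumerate(alpha)],
            [b - delta * j for j, b in enumerate(beta)],
            [g + delta * k for k, g in enumerate(gamma)])


mu2, alpha2, beta2, gamma2 = shift_linear_trend(mu, alpha, beta, gamma, delta=0.25)
base = fitted(mu, alpha, beta, gamma)
alt = fitted(mu2, alpha2, beta2, gamma2)

# Largest difference between the two sets of fitted values (zero up to rounding error).
max_gap = max(abs(x - y) for row_b, row_a in zip(base, alt) for x, y in zip(row_b, row_a))
print("cohort effects differ:", gamma2 != gamma)
print("largest difference in fitted values:", max_gap)
```

Because the cohort slope in `gamma2` is steeper than in `gamma` while the fit is identical, any reported first-order cohort slope reflects the constraint chosen as much as the data.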

Research aimed at solving or mitigating this identifiability problem has generated a considerable body of literature and fostered the development of a variety of methodological approaches (e.g., Clayton and Schifflers, 1987, Glenn, 2005, Mason et al., 1973, O'Brien, 2000, Robertson and Boyle, 1986, Rodgers, 1982, Yang et al., 2008). The first and most common approach to mitigating the identification problem is the constraint-based regression (Mason & Fienberg, 1985), in which at least one category of age, period, and cohort is constrained in some manner. While this type of modeling strategy produces simultaneous estimates of age, period, and cohort effects, it has been criticized in the statistical literature because the results are sensitive to the constraint chosen and there is no empirical way to confirm the validity of the chosen constraints (Glenn, 2005, Holford, 1991, Kupper et al., 1985).

An alternative approach was developed by Theodore Holford (Holford, 1983, Holford, 1991, Holford, 1992). Acknowledging that constraint-based approaches were limited, Holford (and others [Clayton & Schifflers, 1987]) advocated for a focus on those aspects of the APC model that are immune to the constraints chosen for model identification: second-order effects. Second-order effects are those which have a non-linear relationship with the outcome of interest. While there are many types of second-order effects that can be estimated in a model, the Holford approach focuses on linear contrasts, a measure which can be interpreted as reflecting a change in the direction or steepness of an underlying linear slope. Linear contrasts are calculated using first-order estimates derived from the constraint-based regression model. However, the magnitude of the underlying slope – the first-order estimate – remains uninterpreted. Thus, a perfectly linear slope as measured by a first-order estimate would evidence no significant linear contrast (second-order effect). Using the obesity example, suppose that the underlying unobservable truth is that the obesity rate is increasing linearly across birth cohorts, but the speed of this increase begins decelerating in a certain birth cohort. The deceleration would be detected in the estimates derived from the Holford approach, but not the underlying magnitude of the linear slope. The Holford method is commonly used in cancer epidemiology as a way to estimate cohort effects (e.g., Zheng et al., 1996). The Holford approach can be conceptualized as a hybrid of the sociological and epidemiologic definitions; while conceptually the Holford approach acknowledges the interpretive utility of linear effects for age, period, and cohort (i.e., the sociologically-oriented approach), it accepts the reality that these linear effects are not validly simultaneously estimable and thus focuses on the estimation and interpretation of the non-linear effects (i.e., the epidemiologically-oriented approach).
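The sense in which second-order effects are immune to the identifying constraint can be sketched numerically. The example below uses simple second differences as a stand-in for curvature; this is an assumption made for illustration and is not Holford's exact contrast. The cohort effects are invented: a linear rise that slows partway through, echoing the deceleration example above. The helper name `second_differences` is hypothetical.

```python
def second_differences(effects):
    """Change in slope at each interior point: e[k+1] - 2*e[k] + e[k-1]."""
    return [effects[k + 1] - 2 * effects[k] + effects[k - 1]
            for k in range(1, len(effects) - 1)]


# Hypothetical cohort effects: slope 0.2 per cohort, dropping to 0.1 after the fourth cohort.
gamma = [0.0, 0.2, 0.4, 0.6, 0.7, 0.8]

# A different identifying constraint amounts to adding an arbitrary linear trend.
delta = 0.25
gamma_alt = [g + delta * k for k, g in enumerate(gamma)]

curv = second_differences(gamma)
curv_alt = second_differences(gamma_alt)

print("first-order slopes differ:", gamma_alt[1] - gamma_alt[0], "vs", gamma[1] - gamma[0])
print("second differences:", [round(c, 6) for c in curv])
print("after shifting the trend:", [round(c, 6) for c in curv_alt])
```

The slopes change with `delta`, but the second differences do not: the only non-zero entry marks the cohort where the increase decelerates, and it is the same under both parameterizations.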

A third approach to age–period–cohort analysis is to reject first-order effects entirely and focus only on the second-order effects produced by the interaction of age and period effects (Greenberg et al., 1950, Keyes and Li, 2008, Selvin, 1996). The median polish technique (Keyes and Li, 2008, Selvin, 1996, Shahpar and Li, 1999, Tukey, 1977) is an example of an age–period–cohort method that explicitly defines cohort effects as age by period interactions and does not depend on the estimation of first-order effects. This method unambiguously applies the conceptual definition of cohort effects that is common in epidemiology. It captures non-linearities in the age and period effects and partitions this non-linear variance into a systematic component (cohort effect) and an unsystematic component (random error). In statistical models, interaction effects are, by definition, second-order effects because they represent deviations from linearity. Like the Holford method, the second-order effects produced by the median polish method model non-linearities; the difference between these two methods is in how the second-order effects are calculated. In Holford-based models, the second-order effects represent changes in slope, which are derived from the first-order linear slopes of fitted age, period, and cohort effects. The median polish neither estimates first-order effects nor recognizes them as valid. First-order cohort effects that control for the simultaneous linear effects of age and period effects are not of interest; instead, only the second-order joint effect of age and period is estimated and interpreted in the median polish approach.
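A minimal sketch of the median polish idea follows, assuming a small invented prevalence table (rows are age groups, columns are survey periods) in which one birth cohort carries an extra 3 percentage points. The function `median_polish` is a plain implementation of Tukey's row and column median sweeps. Summarizing the residuals by the median along each cohort diagonal is a simplification chosen here for illustration, not necessarily the estimator used in the cited work.

```python
from statistics import median


def median_polish(table, n_iter=10):
    """Decompose table[i][j] into overall + row[i] + col[j] + resid[i][j] using medians."""
    resid = [list(row) for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall, row_eff, col_eff = 0.0, [0.0] * n_rows, [0.0] * n_cols
    for _ in range(n_iter):
        for i in range(n_rows):                      # sweep out row (age) medians
            r = median(resid[i])
            row_eff[i] += r
            resid[i] = [x - r for x in resid[i]]
        shift = median(col_eff)                      # re-centre the column effects
        overall += shift
        col_eff = [c - shift for c in col_eff]
        for j in range(n_cols):                      # sweep out column (period) medians
            c = median([resid[i][j] for i in range(n_rows)])
            col_eff[j] += c
            for i in range(n_rows):
                resid[i][j] -= c
        shift = median(row_eff)                      # re-centre the row effects
        overall += shift
        row_eff = [r - shift for r in row_eff]
    return overall, row_eff, col_eff, resid


# Hypothetical prevalence (%) built from additive age and period parts; not NHANES data.
age_part = [0.0, 2.0, 4.0, 5.0, 5.0]
period_part = [0.0, 1.0, 2.0, 3.0, 4.0]
m, n = len(age_part), len(period_part)
table = [[10.0 + age_part[i] + period_part[j] for j in range(n)] for i in range(m)]

bumped_cohort = 7                                    # cohort index = period - age + (m - 1)
for i in range(m):
    for j in range(n):
        if j - i + (m - 1) == bumped_cohort:
            table[i][j] += 3.0                       # age-by-period interaction on one diagonal

overall, age_eff, period_eff, resid = median_polish(table)

# Group residuals by cohort diagonal and summarize each cohort with its median residual.
by_cohort = {}
for i in range(m):
    for j in range(n):
        by_cohort.setdefault(j - i + (m - 1), []).append(resid[i][j])
cohort_summary = {k: median(v) for k, v in sorted(by_cohort.items())}
print(cohort_summary)
```

In this constructed table the additive age and period structure is absorbed by the row and column effects, and the only residual signal sits on the diagonal of the bumped cohort. With real data the residuals would also contain noise, which is the unsystematic component the text refers to.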

The present analysis will highlight the implications of different conceptual definitions of a cohort effect by comparing three statistical methods to identify cohort effects on the prevalence of obesity in the United States from 1971 to 2006. The first method is the traditional constraint-based regression technique (Mason et al., 1973), which attempts to quantify cohort effects in an additive model with age and period effects as confounders. The second is the Holford model (Holford, 1983, Holford, 1992), which estimates the cohort effect as a second-order function in a model in which first- and second-order age and period effects are considered confounders of the first- and second-order cohort effects. The third is the median polish technique (Keyes & Li, 2008), which estimates the cohort effect as a partial interaction (second-order effect) of age and period effects.

These three methods were chosen to highlight an evolution of statistical methods: the constraint-based approach explicitly focuses on first-order effects but is limited by the identification problem; the Holford method is built on first-order effects but presents results of constraint-invariant second-order effects; and the median polish does not model first-order effects and interprets only second-order effects. The purpose of these comparisons is to explicitly describe the way in which the models make different assumptions about cohort effects and how these assumptions translate into results with varying interpretations and public health implications. Using obesity prevalence data for the past 40 years to compare the assumptions and interpretations of three APC modeling techniques provides substantively rich and insightful results regarding the role of age, period, and cohort effects in the U.S. obesity epidemic.


Sources of data

Data were drawn from seven cross-sectional waves of the National Health and Nutrition Examination Survey (NHANES). The first wave was conducted in 1971–1975, and the most recent in 2005–2006. Each wave provides nationally representative data for the US civilian non-institutionalized population. NHANES utilized a complex, stratified, multi-stage probability cluster sampling design (National Center for Health Statistics, 1978, National Center for Health Statistics, 1994, National Center for Health

Method 1: constraint-based approach

A general three-factor regression model (the three factors are age, period, and cohort) for the risk of the dichotomous outcome ($Y_{ijk}$) is estimated as a function of the scalar $\alpha_i$, the $i$th of $m - 1$ age effects; the scalar $\beta_j$, the $j$th of $n - 1$ period effects; and the scalar $\gamma_k$, the $k$th of $m + n - 2$ cohort effects. The natural log of $Y_{ijk}$ is proportional to a constant term ($\mu$) plus $\alpha_i$, $\beta_j$, $\gamma_k$, and an error term ($\varepsilon_{ij}$):

$$\ln(Y_{ijk}) = \mu + \alpha_i + \beta_j + \gamma_k + \varepsilon_{ij}$$

The above model is not identifiable because of the collinearity

Graphical analysis

Obesity prevalence from 1971 to 2006 was plotted in two different ways in Figs. 1 and 2. Fig. 1 shows the prevalence of obesity stratified by age and period. The age distributions of obesity exhibited curvilinear shapes, with prevalence increasing throughout the life course until approximately age 30, when prevalence begins to stabilize or decrease. While the age-specific slope of obesity prevalence was constant across period, the absolute magnitude of obesity increased for each age group in

Discussion

We explored three statistical methods to estimate age, period, and cohort effects on obesity prevalence in the United States from 1971 to 2006 using nationally representative data. All models agreed that there was an age effect, whereby risk increased in childhood and early adulthood then stabilized by mid-adulthood. All models also found a strong, positive period effect whereby obesity prevalence increased across all ages beginning in the 1980s. Models diverged, however, on the estimation of

References (50)

  • A. Drewnowski. The real contribution of added sugars and fats to obesity. Epidemiologic Reviews (2007)
  • K.M. Flegal et al. Prevalence and trends in obesity among US adults, 1999–2000. JAMA (2002)
  • M.W. Gillman. A life course approach to obesity
  • N.D. Glenn. Cohort analysts' futile quest: statistical attempts to separate age, period, and cohort effects. American Sociological Review (1976)
  • N.D. Glenn. Cohort analysis (2005)
  • B.G. Greenberg et al. A technique for analyzing some factors affecting the incidence of syphilis. American Statistical Association Journal (1950)
  • J. Hobcraft et al. Age, period, and cohort effects in demography: a review. Population Index (1982)
  • T.R. Holford. The estimation of age, period and cohort effects for vital rates. Biometrics (1983)
  • T.R. Holford. Understanding the effects of age, period, and cohort on incidence and mortality rates. Annual Reviews in Public Health (1991)
  • T.R. Holford. Analysing the temporal effects of age, period and cohort. Statistical Methods in Medical Research (1992)
  • W.P. James. The fundamental drivers of the obesity epidemic. Obesity Reviews (2008)
  • S.W. Keith et al. Putative contributors to the secular increase in obesity: exploring the roads less traveled. International Journal of Obesity (2006)
  • K.M. Keyes et al. A comprehensive approach to age-period-cohort analysis. American Journal of Epidemiology (2008)
  • R.J. Kuczmarski et al. CDC growth charts: United States. Advance Data (2000)
  • D. Kuh et al. When is mortality risk determined? Historical insights into a current debate. Social History of Medicine (1993)

    This research was supported in part by a fellowship from the National Institute of Mental Health (T32MH013043-36, Keyes) and grants from the National Institute on Aging (R01AG13642, Li) and the National Institute on Alcohol Abuse and Alcoholism (R01AA09963, Li). Dr. Robinson is a Robert Wood Johnson Foundation Health & Society Scholar at the University of Michigan in the Center for Social Epidemiology and Population Health. The authors thank the Robert Wood Johnson Foundation Health & Society Scholars program for its financial support.
