

Framework to construct and interpret latent class trajectory modelling
  1. Hannah Lennon1,2,
  2. Scott Kelly3,
  3. Matthew Sperrin2,
  4. Iain Buchan2,
  5. Amanda J Cross4,
  6. Michael Leitzmann5,
  7. Michael B Cook3,
  8. Andrew G Renehan1,2,6
  1. Division of Cancer Sciences, School of Medical Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
  2. MRC Health eResearch Centre (HeRC), Division of Informatics, Imaging and Data Sciences, University of Manchester, Manchester, UK
  3. Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
  4. Department of Epidemiology and Biostatistics, Imperial College, London, UK
  5. Department of Epidemiology and Preventive Medicine, University of Regensburg, Regensburg, Germany
  6. Manchester Cancer Research Centre, NIHR Manchester Biomedical Research Centre, University of Manchester, Manchester, UK
  Correspondence to Dr Hannah Lennon; lennonh{at} and Prof. Andrew G Renehan; andrew.renehan{at}


Objectives Latent class trajectory modelling (LCTM) is a relatively new methodology in epidemiology to describe life-course exposures, which simplifies heterogeneous populations into homogeneous patterns or classes. However, for a given dataset, it is possible to derive scores of different models based on the number of classes, model structure and trajectory property. Here, we rationalise a systematic framework to derive a ‘core’ favoured model.

Methods We developed an eight-step framework: step 1: a scoping model; step 2: refining the number of classes; step 3: refining model structure (from fixed-effects through to a flexible random-effect specification); step 4: model adequacy assessment; step 5: graphical presentations; step 6: use of additional discrimination tools (‘degree of separation’; Elsensohn’s envelope of residual plots); step 7: clinical characterisation and plausibility; and step 8: sensitivity analysis. We illustrated these steps using data from the NIH-AARP cohort of repeated determinations of body mass index (BMI) at baseline (mean age: 62.5 years), and BMI derived by weight recall at ages 18, 35 and 50 years.

Results From 288 993 participants, we derived a five-class model for each gender (men: 177 455; women: 111 538). From seven model structures, the favoured model was a proportional random quadratic structure (model F). Favourable properties were also noted for the unrestricted random quadratic structure (model G). However, class proportions varied considerably by model structure: concordance between models F and G was moderate (Cohen κ: men, 0.57; women, 0.65) but poor with other models. Model adequacy assessments, evaluations using discrimination tools, clinical plausibility and sensitivity analyses supported our model selection.

Conclusion We propose a framework to construct and select a ‘core’ LCTM, which will facilitate generalisability of results in future studies.

  • latent class models
  • growth curves
  • growth mixture models
  • lifetime obesity
  • trajectories

This is an open access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See:


Strengths and limitations of this study

  • We developed a systematic approach, with rationale for each of eight steps, to derive a latent class trajectory model of favoured number of classes and ‘core’ model structure specification.

  • The results presented here are based on modelling data from a large well-characterised US cohort, allowing the derivation of numerically meaningful subpopulations (ie, classes) with distinct phenotypes.

  • Compared with ‘one-off’ body mass index categorisation, latent class trajectory modelling offers additional phenotypic information and opportunities to identify and intervene early in subpopulations with adverse trajectories.

  • While we described multiple diagnostic tests, ultimately model selection was based on case-study-appropriate model interpretation (eg, model adequacy, discrimination, clinical plausibility and sensitivity analyses) by a multidisciplinary research team.


In many epidemiological studies, a risk factor is measured at a single point in time and related to the subsequent development of disease, under the assumption that a single ‘one-off’ measure approximates that exposure over a long period. For example, baseline body mass index (BMI) is associated with the subsequent development of common diseases such as cardiovascular disease,1 diabetes2 and several cancers,3 as well as with all-cause mortality.4 This approach is crude, and many investigators seek alternative methods, termed life-course analysis, that might better capture long-term risk factor exposure. There are widely used examples that capture cumulative exposure, such as pack-years for smoking and lung cancer, but the assumption that incidence rate is proportional to total lifetime dose is questionable.5 Many other life-course models simply extract features, for example, weight change over time, for use in standard regression approaches. A more sophisticated approach, which takes account of within-individual correlation, is mixed-effects modelling, but this is difficult to interpret for public health implementation. An extension of this approach is the use of latent classes, also termed growth mixture models.

Latent class trajectory modelling (LCTM) simplifies heterogeneous populations into more homogeneous clusters or classes. Random effects can then be included to allow for individual variation within these classes. These models have a long history in the criminology6 and psychology7 literatures and are now increasingly reported in the human epidemiology literature (eg, disentangling the heterogeneity of childhood asthma8). Of relevance to this paper, LCTM has been used in association studies of repeated BMI measures with the following endpoints: all-cause mortality,9 cancer incidence (multiple cancer types,10 gastro-oesophageal,11 prostate12) and cancer mortality.12 LCTM has three general advantages over ‘one-off’ exposure determinations: first, it better informs aetiological associations by deeply phenotyping certain ‘at risk’ subpopulations; second, it offers a public health strategy to identify early divergent adverse trajectories as potential intervention targets; and third, it allows a better understanding of the causes of between-individual variation in certain features (eg, weight variation over age), by analysing the trajectory as an outcome rather than an exposure. Some researchers additionally argue that LCTM is well equipped for forecasting and for generalising to new patients in prediction models, as it handles data following a different predictable pattern from that learnt by the model.13

However, LCTM is a complex form of modelling and requires several different structural assumptions.14 Although these are firmly acknowledged in the GRoLTS-Checklist: Guidelines for Reporting on Latent Trajectory Studies,15 structure-related assumptions have not been systematically evaluated. For many exposures of interest, typically two to seven classes might be described and, as detailed later, at least seven model structures might be fitted, with and without linear curve properties, such that more than 80 different models can be derived. Thus, reported differences between studies using latent class modelling might reflect different modelling assumptions rather than true differences between populations. To facilitate the generalisability of results in future studies, we propose a framework to construct and select a ‘core’ LCTM, using an example of repeatedly determined BMI across adulthood in the National Institutes of Health (NIH)-AARP Diet and Health Study cohort. For exposure-disease association analyses, current approaches generally use two stages: first, LCTM, followed by standard association modelling. The framework described here is limited to the first stage.
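The count below gives a rough illustration of how quickly the candidate space grows. It crosses every option named in the paragraph above, assuming (our reading) that each of two to seven classes can be combined with each of the seven structures A–G and with each of two curve-property options. The option labels are illustrative only.

```python
from itertools import product

# Options as counted in the text; the curve-property labels are an illustrative reading.
class_counts = range(2, 8)              # two to seven classes
structures = "ABCDEFG"                  # seven model structures, A-G
curve_options = ("linear", "non-linear")

candidate_models = list(product(class_counts, structures, curve_options))
print(len(candidate_models))  # 6 * 7 * 2 = 84
```

Even before any sensitivity analyses, this already exceeds 80 distinct fits.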



The NIH-AARP Diet and Health Study is a US cohort recruited from 1995.16 A baseline medical and lifestyle questionnaire, including self-reported weight and height, was returned by 566 398 participants (aged 50–71 years; mean age: 62.5 years). An additional risk factor questionnaire was mailed in 1996 and completed by a subcohort of 327 860, of whom 288 993 (177 455 men and 111 538 women) provided recalled weight at ages 18, 35 and 50 years and therefore had weight data for all four time points (these three ages plus baseline). BMIs derived at baseline and at these ages (assuming constant height) form the data in the present analysis. We excluded participants with extreme BMI values (<15 or >70 kg/m2) recorded at any time point. Means and SDs of the derived recalled BMI distributions are representative of BMI distributions for historical period-equivalent US populations.17
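The sketch below shows the BMI derivation and the exclusion rule. It assumes weights are already in kilograms and baseline height in metres, since the questionnaire units are not stated here. The function names `bmi` and `derive_bmis` are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def derive_bmis(weights_kg, height_m, lower=15.0, upper=70.0):
    """Derive BMI at each time point from (recalled or baseline) weights,
    holding baseline height constant as assumed in the text.
    Returns None if any BMI is <lower or >upper, i.e. the participant is excluded."""
    values = [bmi(w, height_m) for w in weights_kg]
    if any(v < lower or v > upper for v in values):
        return None
    return values


# Recalled weights at ages 18, 35 and 50, then baseline (illustrative values only)
print(derive_bmis([65, 72, 80, 84], 1.75))
```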

Latent class trajectory modelling

We developed an eight-step framework (table 1), modelling BMI as a function of age. Latent classes were used to identify subgroups of participants with distinct trajectories (detailed mathematical equations in online supplementary material p2).18 We used maximum likelihood approaches to fit the models with the ‘hlme’ function from the ‘lcmm’ package19 in the R software environment (V.3.2.1) and cross-checked results using PROC TRAJ, the SAS ‘traj’ procedure (SAS Institute, Cary, North Carolina, USA)20 (online supplementary table S1).

Supplementary file 1

Table 1

Framework of eight steps to construct a latent class trajectory model

Step 1

We initially constructed a scoping model, provisionally selecting the plausible number of classes based on the available literature; in the context of BMI trajectories, we used K=5 classes, as reported elsewhere.10 12 We built models separately for men and women, as lifetime patterns of BMI change differ by gender.21 To determine the initial working model structure of random effects, we followed the rationale of Verbeke and Molenberghs22 and examined the shape of standardised residual plots for each of the five classes in a model with no random effects. If the residual profile could be approximated by a flat line, a sloped straight line or a curve, then a random intercept, slope or quadratic term, respectively, was considered. Preliminary plots suggested a preference for a quadratic random effects model (supplementary figure S1).
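The code below is a hypothetical numerical stand-in for the visual inspection described in step 1. The authors inspected plots rather than applying a numeric rule. For a single standardised residual profile from a model with no random effects, the function returns the lowest-order polynomial (flat, straight line or curve) that tracks the profile to within a tolerance. The tolerance `tol` and all helper names are assumptions for illustration.

```python
import math


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def poly_rms(t, y, degree):
    """Root-mean-square deviation of y from its least-squares polynomial fit in t."""
    tbar = sum(t) / len(t)
    X = [[(ti - tbar) ** p for p in range(degree + 1)] for ti in t]  # centred for stability
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    rss = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    return math.sqrt(rss / len(y))


def suggest_random_effect(ages, resid_profile, tol=0.05):
    """Hypothetical rule: flat profile -> random intercept, straight line -> random
    slope, otherwise random quadratic term."""
    labels = ("random intercept", "random slope", "random quadratic")
    for degree, label in enumerate(labels):
        if poly_rms(ages, resid_profile, degree) <= tol:
            return label
    return labels[-1]
```

In practice, this could be applied to individual or smoothed residual profiles at the four measurement ages. It would summarise what the plots suggest, not replace them.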

Step 2

We refined the preliminary working model from step 1 to determine the optimal number of classes, testing K=1–7. The number of classes was chosen based on the lowest Bayesian information criterion (BIC).
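The code below sketches the selection rule in step 2. It assumes each fitted model reports its maximised log-likelihood and number of parameters. Here n is taken as the number of participants. This is an assumption about the convention, because implementations differ on whether n counts participants or observations. Non-converged fits are skipped. The numbers in the usage line are made up.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: -2 logL + p log(n). Lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_k(fits, n_obs):
    """fits maps K -> (log-likelihood, number of parameters), or None if the
    model failed to converge. Returns (best K, dict of BIC by K)."""
    scores = {}
    for k, fit in fits.items():
        if fit is None:  # skip models that did not converge
            continue
        ll, p = fit
        scores[k] = bic(ll, p, n_obs)
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fits, for illustration only
best_k, scores = select_k({1: (-1000.0, 5), 2: (-950.0, 9), 3: (-948.0, 13), 4: None}, 500)
print(best_k, scores)
```

As step 4 notes, the lowest BIC is a starting point rather than the final word.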

Step 3

We further refined the model using the favoured K derived in step 2, testing for the optimal model structure. We tested seven models (detailed in online supplementary table S2), ranging from a simple fixed effects model (model A) through a rudimentary method that allows the residual variances to vary between classes (model B) to a suite of five random effects models with different variance structures (models C–G).

Step 4

We then performed a number of model adequacy assessments. First, for each participant, we calculated the posterior probability of being assigned to each trajectory class and assigned the individual to the class with the highest probability. An average of these maximum posterior probabilities of assignment (APPA) above 70% in all classes is regarded as acceptable.6 We further assessed model adequacy using the odds of correct classification, mismatch scores and entropy, Ek (detailed in online supplementary table S3). These diagnostic tools assist in model selection.6 23 In some examples, BIC values may continue to decrease as more groups and parameters are added, reflecting model overfit. Therefore, the BIC might not always provide the optimum selection criterion, and model selection must balance meaningful trajectories, model parsimony and model adequacy. For example, if the model adequacy measures are strongly violated, one might go back to steps 2 and 3 and consider a different model with a higher BIC value. We selected an optimal model structure using the lowest BIC value and satisfactory values from the model adequacy assessments, and referred to the outcome of steps 1–4 as the favoured model. To assess the interpretability of the resulting classes, we investigated lifestyle characteristics, such as smoking, alcohol consumption and physical activity, across the classes of the favoured model.
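The adequacy measures in step 4 can be sketched from a matrix of posterior class-membership probabilities. This sketch assumes common Nagin-style definitions:

- APPA is the mean maximum posterior probability among those assigned to a class.
- Odds of correct classification (OCC) is the odds of APPA relative to the odds of the class proportion.
- Mismatch is the difference between the estimated class proportion and the proportion assigned.
- Overall relative entropy is scaled to [0, 1].

Class proportions default to the mean posterior probabilities, which is an approximation. The class-specific entropy Ek defined in the supplement is not reproduced here.

```python
import math


def adequacy(post, class_props=None):
    """post[i][k]: posterior probability that participant i belongs to class k.
    class_props: estimated class proportions (each strictly between 0 and 1);
    if omitted, mean posterior probabilities are used as an approximation."""
    n, n_classes = len(post), len(post[0])
    # Assign each participant to the class with the highest posterior probability
    assigned = [max(range(n_classes), key=lambda k: row[k]) for row in post]
    if class_props is None:
        class_props = [sum(row[k] for row in post) / n for k in range(n_classes)]
    appa, occ, mismatch = [], [], []
    for k in range(n_classes):
        members = [post[i][k] for i in range(n) if assigned[i] == k]
        pi = class_props[k]
        if members:
            a = sum(members) / len(members)
            o = math.inf if a >= 1 else (a / (1 - a)) / (pi / (1 - pi))
        else:
            a = o = math.nan
        appa.append(a)
        occ.append(o)
        mismatch.append(pi - len(members) / n)  # estimated minus assigned proportion
    # Relative entropy: 1 = perfectly separated classes, 0 = no separation
    entropy = -sum(p * math.log(p) for row in post for p in row if p > 0)
    rel_entropy = 1 - entropy / (n * math.log(n_classes))
    return {
        "appa": appa,
        "occ": occ,
        "mismatch": mismatch,
        "relative_entropy": rel_entropy,
        "appa_acceptable": all(a > 0.7 for a in appa),  # 70% rule in every class
    }


# Toy posterior matrix for four participants and two classes (illustrative only)
res = adequacy([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]])
print(res["appa"], res["relative_entropy"])
```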

Step 5

We used three graphical presentation approaches. The conventional approach is to plot the mean trajectory of each class over time. Alternatives are to plot mean trajectories with 95% predictive intervals for each class, which display the predicted random variation within each class, or to plot individual-level ‘spaghetti plots’ over time (eg, for a random sample of participants), which allow the reader to observe patterns of change within classes.
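For the second presentation option, a 95% predictive interval at age t can be sketched for one class under a quadratic random-effects model. The sketch makes three assumptions: the random effects are normally distributed; their covariance is omega² × D, which is our reading of a proportional structure in which a class-specific factor scales a common matrix; and the residual error is independent with variance sigma2. All names and numbers are hypothetical.

```python
import math


def predictive_interval(t, beta, D, sigma2, omega=1.0, z_crit=1.96):
    """Approximate predictive interval for one class at time t.
    beta: fixed-effect coefficients (intercept, linear, quadratic) of the class mean.
    D: 3x3 random-effect covariance matrix, scaled by omega**2 for this class.
    sigma2: residual (measurement error) variance."""
    z = (1.0, t, t * t)  # design vector for intercept, slope and quadratic terms
    mean = sum(b * zi for b, zi in zip(beta, z))
    re_var = sum(z[i] * D[i][j] * z[j] for i in range(3) for j in range(3))
    sd = math.sqrt(omega ** 2 * re_var + sigma2)
    return mean - z_crit * sd, mean + z_crit * sd


# Illustrative parameters only: a random intercept with variance 1, no residual error
print(predictive_interval(1.0, (22.0, 1.0, 0.0), [[1, 0, 0], [0, 0, 0], [0, 0, 0]], 0.0))
```

Under this structure, a larger omega for a class widens its band. This mirrors the wider within-class variation one would expect to see for more heterogeneous classes.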

Step 6

We assessed model discrimination using the degree of separation, DoSK,24 25 and Elsensohn’s envelope of residuals.25 To describe the separation of latent trajectory curves, a multivariate Mahalanobis distance was used. Peugh and Fan26 argue that it is reasonable to speculate that identification of heterogeneous latent trajectories is facilitated by a large statistical separation distance among subpopulations. Thus, larger values of DoSK indicate that the mean trajectories are well separated, while DoSK equal to zero is the special case in which all mean trajectories are identical. If the DoSK value is small, one might consider a model with fewer classes.
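The code below is a minimal sketch of the Mahalanobis distance between two class mean trajectories. Each trajectory is evaluated at the same time points, and a common within-class covariance matrix of the repeated measures is used. The exact scaling and aggregation that define DoSK follow references 24 and 25 and are not reproduced here. The sketch simply returns all pairwise distances, so identical mean trajectories give zero. Function names are hypothetical.

```python
import math
from itertools import combinations


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mahalanobis(mu_a, mu_b, cov):
    """sqrt(d' cov^-1 d), with d the difference between two mean trajectories."""
    diff = [x - y for x, y in zip(mu_a, mu_b)]
    weighted = _solve(cov, diff)  # cov^-1 d, without forming the inverse
    return math.sqrt(max(0.0, sum(d * w for d, w in zip(diff, weighted))))


def pairwise_separation(class_means, cov):
    """Mahalanobis distance for every pair of classes (j, k) with j < k."""
    return {
        (j, k): mahalanobis(class_means[j], class_means[k], cov)
        for j, k in combinations(range(len(class_means)), 2)
    }
```

Positive covariance between repeated measures shrinks the distance between parallel trajectories. Under this metric, non-parallel trajectories therefore tend to look more separated.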

To check structural assumptions in fixed effects latent class models, Elsensohn et al25 plotted the local SD of the residuals against time. We extended this method to random effects models by, first, computing the observed residuals for each participant and, second, computing the class-specific and time-specific weighted local variance of the residuals, with weights being the posterior probabilities of each individual belonging to a class. We plotted the upper and lower boundary values of the local SD of the residuals around the mean values for each class. The resulting shape indicates the appropriateness of the model assumptions: non-parallel boundaries indicate heteroscedasticity of residuals, suggesting poor model fit, and differing interval widths suggest that across-class variability may not be fully accounted for.
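The code below computes the posterior-weighted local SD at one time point and the resulting envelope for one class. It is a sketch that assumes residuals and posterior weights have already been grouped by time point. The function names and the dictionary layout are hypothetical.

```python
import math


def weighted_local_sd(residuals, weights):
    """Posterior-weighted SD of residuals at one time point for one class."""
    total = sum(weights)
    mean = sum(w * r for w, r in zip(weights, residuals)) / total
    var = sum(w * (r - mean) ** 2 for w, r in zip(weights, residuals)) / total
    return math.sqrt(var)


def residual_envelope(times, class_mean, residuals_by_time, weights_by_time):
    """For each time point, return (lower, upper) = class mean -/+ weighted local SD.
    residuals_by_time[t] and weights_by_time[t] list values for participants seen at t."""
    envelope = []
    for t, mu in zip(times, class_mean):
        sd = weighted_local_sd(residuals_by_time[t], weights_by_time[t])
        envelope.append((mu - sd, mu + sd))
    return envelope
```

Plotting the lower and upper values against time gives the envelope. Roughly parallel boundaries are the pattern the text describes as consistent with the model assumptions.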

Step 7

We assessed clinical characterisation and plausibility using four approaches: (1) assessing the clinical meaningfulness of the trajectory patterns, aiming to include classes capturing at least 1% of the population; (2) assessing the clinical plausibility of the trajectory classes; (3) tabulating characteristics by latent class versus conventional categorisation; and (4) assessing concordance of class membership with conventional BMI category membership using the kappa statistic (as LCTM is an unsupervised learning approach, class labels are arbitrary, so we computed κ for all possible combinations of class-to-category matching and selected the optimal κ).
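Because latent class labels are arbitrary, kappa depends on how classes are matched to BMI categories. The sketch below computes Cohen's unweighted kappa for every one-to-one relabelling and keeps the largest. It uses hypothetical function names. With five classes and five categories, this amounts to only 5! = 120 relabellings.

```python
from collections import Counter
from itertools import permutations


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa between two label sequences of equal length.
    (Undefined when expected agreement equals 1, e.g. both raters constant.)"""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca) / n ** 2
    return (observed - expected) / (1 - expected)


def best_kappa(classes, categories, labels):
    """Try every one-to-one mapping of latent classes onto category labels and
    return (highest kappa, mapping that achieves it)."""
    class_ids = sorted(set(classes))
    best = None
    for perm in permutations(labels, len(class_ids)):
        mapping = dict(zip(class_ids, perm))
        k = cohen_kappa([mapping[c] for c in classes], categories)
        if best is None or k > best[0]:
            best = (k, mapping)
    return best
```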

Step 8

We conducted sensitivity analyses, in this example restricting to individuals with at least two and at least three BMI values, as LCTMs are flexible enough to handle differing observation times between participants.

Patient and public involvement

No patients or members of the public were involved with this manuscript.

Statistical algorithms

All R and SAS code used to implement these tools is available from the authors and can be downloaded from


Number of classes

From the preliminary working model of a quadratic random effects model, model F (proportional covariance structure), we derived BICs for up to seven classes: three of the class models failed to converge in men and women. Table 2 reports that the lowest BIC was obtained with five classes in men and women, confirming our initial working model. The proportions by class in men were 68.1%, 25.0%, 3.8%, 2.7% and 0.4%, and in women, proportions by class were 32.6%, 41.1%, 21.1%, 3.5% and 1.7%. For model G (our second favoured model), the lowest BICs were noted for five classes in men and women (online supplementary table S2).

Table 2

Number of classes (K=1–7) using random effects quadratic structure model F (proportional covariance structure) by gender

Assessment for model structures

With the number of classes now selected as five, we tested the seven model structures, A–G. Table 3 reports that the lowest BIC was for model F in men and women, justifying the selection of model F in the preliminary working phase. The class sizes varied between models, with class I ranging from 41% to 68% in men and from 32% to 95% in women. The APPA for each class in model F was 0.81, 0.74, 0.87, 0.83 and 0.74 in men and 0.74, 0.79, 0.80, 0.83 and 0.84 in women, all above the 0.70 threshold and indicating good discrimination between trajectories. The classes were well differentiated, with relative entropy, Ek, values ranging from 0.59 to 0.81 in men and from 0.66 to 0.83 in women.

Table 3

Model adequacy assessments of latent trajectory class models based on different assumptions for K=5 classes, by gender in the NIH-AARP cohort

There was moderately good concordance (unweighted and weighted) between model G (unstructured variance) and model F in men (κ: 0.57) and women (κ: 0.65) (supplementary tables S4 and S5), but poorer concordance between the favoured models and the fixed-effects models in men.

Graphical presentation

We plotted the mean trajectories for models A, B, C, D, F and G in men and women (figure 1), illustrating the increasing complexity from model A to model G. As alternatives, we plotted mean trajectories with 95% predictive intervals separately for each class in model F (online supplementary figure S2), which display the predicted random variation within each class over time; variation was greater in the more ‘complex’ classes (classes IV and V compared with classes I, II and III). Spaghetti plots of individual-level data illustrated that the timing and size of BMI changes characterise the classes; for example, sharp increases in BMI occurred in early adulthood in class III but later in adulthood in class IV (online supplementary figure S3).

Figure 1

BMI mean trajectories for men (left) and women (right) for models A–G. Colours are used to discriminate classes within each plot but should not be used for direct comparisons across plots. BMI, body mass index.

Additional tools of suitability of fit

The DoSK values ranged from 0.10 to 0.36 in men and from 0 to 0.34 in women (table 3). The covariances were high and positive, and therefore models with non-parallel mean trajectories led to higher separation.

We plotted the local SD of the residuals over time and found these to be broadly homogeneous, that is, there were few non-parallel boundaries (figure 2). The local residuals for the rapid early obesity groups in both genders are the exceptions, which might reflect comorbidities and smaller numbers in these groups.

Figure 2

Illustration of the local (Elsensohn) residual envelope plots (shown here for model B).

Clinical assessment

Having established the favoured model, model F with five classes in both genders, we assigned descriptive labels to each class as follows (table 4): stable normal weight; normal weight to overweight; normal weight to obese; overweight to obese; and rapid early obesity. We noted that the proportion in the rapid early obesity class (class V) was less than 1% in men; however, for men and women combined, the proportion in class V was nearly 1%. We retained this class because we judged it to be clinically meaningful: in both genders, there were rapid increases in BMI to obesity from early to middle adulthood, followed by apparently severe weight reductions. We rationalised that this was clinically plausible, as it could be explained by either intentional (eg, bariatric surgery) or non-intentional weight loss (eg, reverse causality from development of disease).
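The ‘nearly 1%’ figure can be checked from the reported class V proportions and sample sizes. Because the published percentages are rounded, the result is approximate.

```python
# Class V proportions (model F) and sample sizes as reported in the text.
men_n, women_n = 177_455, 111_538
men_prop, women_prop = 0.004, 0.017   # 0.4% and 1.7%, rounded as reported

# Pooled proportion = total class V participants / total participants
combined = (men_prop * men_n + women_prop * women_n) / (men_n + women_n)
print(f"{combined:.2%}")  # 0.90%
```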

Table 4

Latent class characteristics of 177 453* men and 111 503* women in the NIH-AARP cohort

We then tabulated the baseline characteristics according to the five classes of model F in men and women and noted patterns across the classes (table 4, fully expanded in online supplementary tables S5 and S6). For example, for current smoking status, there was little difference by class in men (10%, 8%, 9%, 10% and 15%) or in women (16%, 12%, 12%, 13% and 11%). This contrasts with the decreasing pattern across BMI categories in men (12%, 8%, 8%, 7% and 7%) and women (16%, 12%, 10%, 8% and 7%) (online supplementary tables S7 and S8).

Finally, we noted poor concordance between the favoured model and conventional BMI categorisation in men (κ: 0.18) and only moderate concordance in women (κ: 0.52) (table 5).

Table 5

Concordance between BMI categories and classes in model F from the NIH-AARP cohort

Sensitivity analyses

We tested the favoured model using a larger sample of individuals with at least three measures and found no material differences between these models and the main model, in men and women (online supplementary figure S4).


Main findings

We propose an eight-step framework for the construction and selection of models derived from LCTM. We evaluated a range of model structures, from fixed effects models to a set of random effects models, favouring the latter in this case study, as they include different variance structures and are more likely to reflect the natural history of changes in BMI distributions over time in different subpopulations. We showed that different model structures resulted in different classes with contrasting clinical phenotypes. We propose prespecified criteria for model selection and suggest that reporting a ‘core’ model will facilitate generalisability of results in future studies.

Context of other literature

To the best of our knowledge, this is the first study to systematically address structure-related assumptions in LCTMs and their potential impact on clinically relevant endpoints, in this example, BMI trajectories. Anecdotally, there is justifiable criticism regarding the use of LCTMs and uncertainty about how class memberships are derived: a ‘black box’ effect. The framework proposed here encourages the opposite, a transparent stepwise approach to class and model structure selection. To enhance this process, we have, for example, ‘borrowed’ tools developed to quantify uncertainty, such as the entropy measures E and Ek, and applied them to assist assessment of model adequacy. A further modification of discrimination measurement with variance estimation has been described by Shah and colleagues27 and might be important for class assignment where ‘yes/no’ treatment decisions are required.

Variations of model A (fixed effects) have been reported in the clinical literature,9–12 which assume no within-class variability when deriving latent classes. Interpretation in this setting is that variation from the mean trajectory is random; that is, the correlation between measurements for the same individual is explained by latent class membership. In the context of any repeated measures in the general population, this assumption might not be valid.14 Saunders28 argued in support of full random effects models (ie, models F and G), calling on Moffitt’s theory from criminology, which recognises that ‘there are distinct developmental clusters of trajectories of anti-social behaviour that are the result of divergent aetiologies’; in other words, it is unlikely that latent classes start from a similar baseline.

The publication of the 16-item GRoLTS Checklist in 201715 heralded an important advance for the application of LCTM. Here, we add a framework for construction and interpretation.

Strengths and weaknesses

The study has strengths. First, the considered and strategic workflow to optimise identification and application of latent classes provides for a more robust and transparent application of these models in epidemiology. Second, the results presented are based on modelling data from a large well-characterised US cohort, therefore allowing the derivation of numerically meaningful subpopulations (ie, classes) with distinct phenotypes. We uniquely used averaged kappa values to demonstrate that the LCTM-derived subpopulations are markedly different from those derived from a ‘one-off’ BMI determination. In turn, BMI trajectories are more likely to reflect the normal clinical practice of considering a ‘weight history’. Third, we extensively explored different model selection and adequacy tools, and described extensions to other tools, to supplement model interpretation. Fourth, to further supplement model interpretation, we embedded this project within a multidisciplinary research team including data scientists, statisticians, clinicians and epidemiologists, an approach echoed elsewhere.29 Finally, we have made the statistical algorithms freely available.

There are several study weaknesses. First, LCTMs currently consider trajectories of only one risk factor at a time. Second, there were only four time points in the NIH-AARP cohort, such that it was not possible to assess weight cycling. Third, while we described multiple diagnostic tests, ultimately model selection was based on case-study-appropriate model interpretation (eg, model adequacy, discrimination, clinical plausibility and sensitivity analyses) as well as likelihood-based model fit criteria.18 Some discussion of statistical power and efficiency is warranted. The objective of model selection is a trade-off between efficiency and validation, with the aim of summarising distinct features of the data as parsimoniously as possible rather than simply maximising model fit.6 For example, in a hypothetical scenario, putting too much emphasis on the validity of a model in which 10 classes provide the best model fit is questionable if 3 of the 10 classes each include less than 0.5% of the population and do not show markedly different characteristics.

Clinical implications and future research

We showed that different model structures resulted in different classes with contrasting clinical phenotypes. For example, it is well recognised that the proportion of current smokers decreases with increasing BMI category. However, this ‘trend’ is not observed across the latent classes derived in our favoured model F, suggesting that the clinical characteristics derived from LCTM differ from those derived from conventional categorisation approaches. Thus, compared with ‘one-off’ BMI categorisation, LCTM offers additional phenotypic information.

For future research, improving the construction, interpretation and reporting of LCTM (as advocated here) is important, as the LCTM approach offers opportunities to identify and intervene early in subpopulations with adverse trajectories. This approach is analogous to the well-established public health strategy of using childhood growth charts to identify, and intervene in, young children failing to thrive. Thus, in the example of BMI, remembering that 80% of obese adults were not obese in childhood,30 future LCTM studies might identify (new) individuals in their 20s or early 30s on adverse trajectories towards obesity in later adulthood. This strategy is a new methodological paradigm, as the repeated measurement of a risk factor (here, BMI) becomes a clinically relevant endpoint rather than just an exposure.


We acknowledge the generous funding from Cancer Research UK National Awareness and Early Detection Initiative.


  1.
  2.
  3.
  4.
  5.
  6.
  7.
  8.
  9.
  10.
  11.
  12.
  13.
  14.
  15.
  16.
  17.
  18.
  19.
  20.
  21.
  22.
  23.
  24.
  25.
  26.
  27.
  28.
  29.
  30.


  • Contributors HL, MS, MBC and AGR conceptualised the paper. HL, MS and SK designed the statistical approaches; HL performed the modelling. AJC and ML facilitated data access and interpretation of the AARP data. All authors contributed to data interpretation; IB and AGR put modelling into clinical context.

  • Funding This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests AGR has received lecture honoraria from Merck Serono and Janssen-Cilag and independent research funding from Novo Nordisk. All other authors have no conflicts of interest to declare.

  • Patient consent Not required.

  • Ethics approval NCI SSIRB (Special Studies IRB).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data available.
