
Original research
Identifying heterogeneity in the risk factors of dental caries status in Chinese adolescents using Poisson mixture regression
  1. Chao Yuan1,
  2. Jie He2,
  3. Xiangyu Sun1,
  4. Jian Kang2,
  5. Shuguo Zheng1
  1. 1Department of Preventive Dentistry, Peking University School and Hospital of Stomatology, National Engineering Laboratory for Digital and Material Technology of Stomatology, Beijing Key Laboratory of Digital Stomatology, Beijing, People's Republic of China
  2. 2Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
  1. Correspondence to Dr Shuguo Zheng; kqzsg86{at}bjmu.edu.cn; Dr Jian Kang; jiankang{at}umich.edu

Abstract

Objective The purpose of this study was to cluster individuals into groups with different dental health characteristics and make statistical inferences on the effect differences among different groups simultaneously to identify the heterogeneity of risk factors in Chinese adolescents by analysing the data from the 4th Chinese National Oral Health Survey.

Methods For decayed, missing and filled permanent teeth (DMFT), mean values were statistically analysed for their relationships with the categories of all involved variables. As DMFT scores only take discrete values, Poisson mixture regression was adopted to model the heterogeneity and complex patterns in the association and to detect subgroups. The Bayesian information criterion (BIC) was used to determine the optimal number of subgroups. A series of Wald tests were used to explore the relationship between the risk factors, including their interaction effects, and the number of DMFT.

Results A total of 100 986 individuals aged 12–15 years were analysed. The model clustered individuals into three subgroups and simultaneously built three submodels for detailed statistical inference. The numbers of individuals in the three subgroups were 52 576 (52.1%), 41 969 (41.5%) and 6441 (6.4%), respectively. The mean (SD) DMFT values of the three subgroups were 0.50 (1.05), 0.99 (1.21) and 5.59 (2.50), respectively. The model fitting results indicated that the effects of all risk factors on DMFT appeared to differ among the three subgroups. After controlling for confounding effects, our analysis implied that regional inequality was the main contributing factor to dental caries among adolescents in mainland China.

Conclusions The risk factors of dental caries exhibited heterogeneity in groups with different characteristics. The Poisson mixture regression model could cluster individuals into groups and identify the heterogeneous effects of risk factors among different groups. The findings support the need for different targeted interventions and prevention measures in groups with different dental health characteristics.

  • epidemiology
  • public health
  • statistics & research methods

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.


Strengths and limitations of this study

  • Data from the 4th Chinese National Oral Health Survey were analysed to obtain the prevalence and associated factors of permanent dental caries in Chinese adolescents aged 12–15 years old.

  • It is advantageous for epidemiological surveys to cluster individuals into groups with different dental health characteristics and therefore make targeted statistical inferences to identify the heterogeneity of risk factors.

  • Poisson mixture regression is a sound statistical model for the subgroup analyses. It has the flexibility in modelling the zero-inflated count measure of dental decay and characterising the heterogeneous risk factors of dental caries.

  • The modified Expectation Maximization (EM) algorithm is an efficient parameter estimation method for Poisson mixture regression, while the Bayesian information criterion can be used to determine the number of subgroups.

  • Because this research on dental caries used cross-sectional data, it could not establish causal relationships between dental caries and the related factors. It was therefore unable to indicate how caries changes over time in the general population or in a given sample, or which variables are associated with the overall trend and with the differences within and among persons.

Introduction

The adolescent period is an important stage of children’s growth and development.1 After the replacement of primary teeth by their successors, new dentition—namely, permanent dentition—commences and functions during the following decades of life. Thus, the WHO regards 12 years old to 15 years old as a reference group for population oral health surveys.2 Dental caries of permanent teeth is the most prevalent oral disease and, according to the Global Burden of Disease Study 2016, is the disease with the second highest incidence.3

During the last two decades, the prevalence of permanent caries in 12-year-old Chinese children fluctuated, with rates of 45.8%, 28.9% and 38.5% in 1995, 2005 and 2015, respectively.4–6 Meanwhile, the characteristics of the Chinese population are steadily changing, with the rural share of the population decreasing from 57.0% in 2005 to 43.9% in 2015. Because of unequal economic development, the lifestyles of adolescents’ families differ considerably, and adolescents’ oral disease patterns are therefore expected to be heterogeneous. In particular, the risk factors for dental caries can differ between groups defined by individual characteristics such as demographics and socioeconomic status. For example, the geographic heterogeneity analysis of dental caries demonstrated inequalities in decayed, missing and filled permanent teeth (DMFT) among different regions of the country.7 The oral behaviours of adolescents, including toothbrushing habits, consumption of sugary foods and dental attendance, also vary among provinces and between urban and rural areas.8 9 This motivates clustering individuals into groups in light of the heterogeneous associations between individual characteristics and the risk of dental caries. It is advantageous for epidemiological surveys to cluster individuals into groups with different dental health characteristics and therefore make targeted statistical inferences to identify the heterogeneity of risk factors.

In this work, we propose to use the Poisson mixture regression model to analyse the dental dataset and to perform the subgroup detection to identify the heterogeneity of risk factors. The finite mixture regression model was first proposed by Wolfe,10 and has been successfully applied to solve many problems in various research areas.11 12 The Poisson mixture regression model is a systematic statistical approach that explicitly models the mixture distribution of the regression coefficients, leading to clear interpretation of fitting results for the subgroup analyses. There is a formal statistical procedure to determine the number of mixture components in the model which corresponds to the number of subgroups in the population. A suitable probability distribution model can also be added into the model according to the characteristics of the outcome variables of interest, resulting in a more flexible and adaptive data analysis procedure.13 14

As dental caries are posing great disease burdens to the adolescent population,15 caries control has been put onto the agenda of the government by carrying out national public health programmes with broad application of appropriate preventive techniques for children. Based on all of the above, the aim of this study is to use the Poisson mixture regression model to cluster individuals into groups and to identify the heterogeneity of risk factors for dental caries in Chinese adolescents. Analysis of the most recent prevalence data of caries among adolescents and its related factors will yield better, more reasonable suggestions on the reformation and innovation of policies related to dental public health.

Materials and methods

Description of data

This cross-sectional study used data from the 4th National Oral Health Survey of China (2015–2016). All 31 provinces of mainland China participated in the survey. A detailed description of the survey is supplied in the online supplemental appendix. Data from the 12–15 years age group were used for the present study.

Poststratification weights were used to adjust for differences in the age‐by‐sex‐by‐location‐by‐province distribution between the sample and the general population in the 31 provinces involved in the study, consistent with the 6th National Demographic Census in 2010.16 17
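As a minimal sketch of the weighting idea described above, the following Python snippet computes a poststratification weight for each sampled individual as the ratio of the stratum's population share to its sample share. The two-cell strata, the function name `poststratification_weights` and the shares are hypothetical and chosen only for illustration; the survey itself used age-by-sex-by-location-by-province cells with census counts.

```python
from collections import Counter


def poststratification_weights(sample_strata, population_share):
    """Weight for each sampled person = population share / sample share of that person's stratum."""
    n = len(sample_strata)
    sample_count = Counter(sample_strata)
    return [population_share[s] / (sample_count[s] / n) for s in sample_strata]


# Hypothetical two-cell example (not survey data).
sample = ["urban"] * 6 + ["rural"] * 4
census_share = {"urban": 0.5, "rural": 0.5}
weights = poststratification_weights(sample, census_share)
```

In this toy case, the over-sampled urban cell is weighted down and the under-sampled rural cell is weighted up, while the weights still sum to the sample size.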

The number of DMFT was calculated to determine the overall caries status and experience. The risk factors involved in the present study included six categories: social demographic factors, oral hygiene behaviours, sugar consumption habits, utilisation of dental service, oral health knowledge and pit-and-fissure sealants. Detailed information on the grading standard of each variable is presented in online supplemental appendix table 1. The variables extracted from the examination tables and questionnaires were representative oral health-related factors, which were further statistically analysed with regard to their relationships with dental caries experience.

Patient and public involvement

All participants were selected using multistage stratified cluster sampling method and provided written informed consent to take part in the study.

Model and estimation

Let y denote the response variable, that is, the number of DMFT, taking the count values from 0 to 28. Let x represent a p dimensional covariate vector including all the risk factors and their interactions as well as other confounding factors. We consider a Poisson mixture regression model, where the conditional distribution of y given x is specified as follows:

$$f(y \mid x; \pi, \beta) = \sum_{k=1}^{K} \pi_k f_k(y \mid x; \beta_k),$$

$$f_k(y \mid x; \beta_k) = \frac{\lambda_k(x)^{y} \exp\{-\lambda_k(x)\}}{y!}, \qquad \lambda_k(x) = \exp(x^\top \beta_k), \tag{1}$$

where K is the number of subgroups, $\pi = (\pi_1, \ldots, \pi_K)^\top$ and $\beta = (\beta_1, \ldots, \beta_K)$. For the kth subgroup, $f_k(y \mid x; \beta_k)$ is a Poisson probability mass function with mean intensity $\lambda_k(x) = \exp(x^\top \beta_k)$, and $\beta_k = (\beta_{k1}, \ldots, \beta_{kp})^\top$ represents the effects of x. $\pi_1, \ldots, \pi_K$ are probability weights that are associated with component distributions $f_1, \ldots, f_K$, which satisfy

$$\pi_k \ge 0 \ (k = 1, \ldots, K), \qquad \sum_{k=1}^{K} \pi_k = 1.$$

In model (1), $\beta_{kj}$ represents the effect of covariate $x_j$ in subgroup k. When the values of other covariates $x_l$ ($l \ne j$) are fixed, from model (1), we have

$$\log E_k(y \mid x_j = a) - \log E_k(y \mid x_j = b) = \beta_{kj}(a - b),$$

where $E_k$ denotes the mean of y in subgroup k and $a$, $b$ represent two different values of $x_j$. When $a - b = 1$,

$$\log E_k(y \mid x_j = a) - \log E_k(y \mid x_j = b) = \beta_{kj},$$

which is the change of the log mean of y in subgroup k per unit increase in $x_j$, defined as the effect size of $x_j$. When $\beta_{kj} > 0$, $x_j$ has positive effects on the average of y. Similarly, when $\beta_{kj} < 0$, the effect of $x_j$ on y goes in the opposite direction. In addition, by comparing $\beta_{kj}$ ($k = 1, \ldots, K$) with fixed j, we also could infer the effects of factor $x_j$ on the response among different subgroups. We summarise the estimation procedure of β in online supplemental appendix.
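The structure of the model can be illustrated with a short Python sketch. It evaluates the mixture probability of a count as a weighted sum of Poisson probabilities, each with log-linear mean, and shows that a one-unit change in a covariate shifts the log mean in a subgroup by that subgroup's coefficient. The coefficient values, the weights and the function names are hypothetical and are not estimates from this study.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability of count y with mean lam (computed on the log scale)."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def log_mean(x, beta_k):
    """Log of the subgroup mean: the linear predictor x'beta_k."""
    return sum(b * v for b, v in zip(beta_k, x))


def mixture_pmf(y, x, betas, weights):
    """Weighted sum of subgroup-specific Poisson probabilities."""
    return sum(
        pi_k * poisson_pmf(y, math.exp(log_mean(x, beta_k)))
        for beta_k, pi_k in zip(betas, weights)
    )


# Hypothetical values: three subgroups, an intercept and one covariate.
betas = [[-0.7, 0.2], [0.0, -0.3], [1.7, 0.1]]
weights = [0.5, 0.4, 0.1]

# Effect size in subgroup 1: change in log mean per unit increase of the covariate.
effect = log_mean([1.0, 1.0], betas[0]) - log_mean([1.0, 0.0], betas[0])
total_probability = sum(mixture_pmf(y, [1.0, 1.0], betas, weights) for y in range(100))
```

Here `effect` recovers the covariate coefficient of subgroup 1, and `total_probability` confirms that the mixture is a proper distribution over counts.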

All of the above analyses were performed with the R package ‘flexmix’.18–20 As DMFT scores only take discrete values, we used the Poisson mixture regression model for subgroup association detection. The optimal number of subgroups was selected from a set of pre-specified candidate numbers based on the Bayesian information criterion (BIC),21 22 with all computation completed by the modified EM algorithm. In each iteration of the M-step, the algorithm removes subgroups whose probability weight falls below a given threshold, set at 5%,20 so that every estimated subgroup has a sample size larger than 5% of the total and can be interpreted meaningfully. The estimates represent the effects of the risk factors on DMFT in each subgroup and were compared in detail among the different subgroups. A series of Wald tests were also conducted to explore the relationship between the number of DMFT and risk factors with an interaction effect. P<0.05 (two-sided) was regarded as statistically significant.
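The estimation idea can be sketched in Python under a strong simplification: the components below have no covariates (intercept only), so the M-step reduces to weighted means, whereas the analysis in this study fitted a weighted Poisson regression in each subgroup with the R package. The function names, the starting values and the synthetic data are assumptions made for illustration. The sketch keeps the two features described above: components whose weight falls below 5% are removed during the iterations, and the BIC is returned so that fits started from different numbers of subgroups can be compared.

```python
import math
from collections import Counter


def poisson_logpmf(y, lam):
    return -lam + y * math.log(lam) - math.lgamma(y + 1)


def fit_poisson_mixture(ys, k_init, min_weight=0.05, n_iter=300):
    """EM for an intercept-only Poisson mixture with removal of small components."""
    freq = sorted(Counter(ys).items())  # (count value, frequency) pairs
    n = len(ys)
    top = max(ys) + 1.0
    lams = [top * (j + 1) / (k_init + 1) for j in range(k_init)]  # spread-out starting means
    pis = [1.0 / k_init] * k_init
    for _ in range(n_iter):
        # E-step: posterior probability that each count value belongs to each component.
        resp = []
        for y, _count in freq:
            logs = [math.log(p) + poisson_logpmf(y, l) for p, l in zip(pis, lams)]
            m = max(logs)
            w = [math.exp(v - m) for v in logs]
            s = sum(w)
            resp.append([v / s for v in w])
        # M-step: weighted sizes and weighted sums for each component.
        k = len(pis)
        nk = [sum(c * r[j] for (_, c), r in zip(freq, resp)) for j in range(k)]
        sums = [sum(c * y * r[j] for (y, c), r in zip(freq, resp)) for j in range(k)]
        # Remove components below the weight threshold, then renormalise the rest.
        keep = [j for j in range(k) if nk[j] / n >= min_weight]
        if not keep:
            keep = [nk.index(max(nk))]
        total = sum(nk[j] for j in keep)
        pis = [nk[j] / total for j in keep]
        lams = [max(sums[j] / nk[j], 1e-8) for j in keep]
    loglik = sum(
        c * math.log(sum(p * math.exp(poisson_logpmf(y, l)) for p, l in zip(pis, lams)))
        for y, c in freq
    )
    # Free parameters: one mean per component plus (number of components - 1) weights.
    bic = -2.0 * loglik + (2 * len(pis) - 1) * math.log(n)
    return pis, lams, bic


def synthetic_counts(n=2000):
    """Deterministic toy data: expected frequencies of a two-component mixture (not survey data)."""
    ys = []
    for y in range(25):
        p = 0.6 * math.exp(poisson_logpmf(y, 0.5)) + 0.4 * math.exp(poisson_logpmf(y, 6.0))
        ys += [y] * round(n * p)
    return ys


ys = synthetic_counts()
fits = {k: fit_poisson_mixture(ys, k) for k in (1, 2, 4)}
```

Each entry of `fits` holds the surviving weights, the component means and the BIC for one starting number of subgroups; the preferred fit is the one with the smallest BIC.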

Results

The data set consisted of 118 601 individuals aged 12–15 years old. We focused on 100 986 individuals with complete observations. The model fitting results with different pre-specified numbers of subgroups (Embedded Image) are summarised in online supplemental appendix table 2. The estimated group number was denoted as k (Embedded Image). In our analysis, the k values were always equal to 3 under circumstances of Embedded Image. By comparing the BIC values, we concluded that three subgroups could sufficiently represent the heterogeneity of risk factors of dental caries among the population aged 12–15 years.

In the three subgroups (subgroups 1–3), the numbers of individuals were 52 576, 41 969 and 6441. The estimated proportions of the three subgroups were Embedded Image, Embedded Image and Embedded Image, respectively. The mean DMFT values (SD) of the three subgroups were 0.50 (1.05), 0.99 (1.21) and 5.59 (2.50), which differed significantly among subgroups (p<0.001). In subgroup 1, 50.8% were boys, while 49.2% were girls. In subgroup 2, there were 48.6% boys and 51.4% girls. In subgroup 3, 50.4% were boys and 49.6% were girls. The split between urban and rural areas was also similar across subgroups: urban residents accounted for 37.1% in subgroup 1, 42.5% in subgroup 2 and 39.7% in subgroup 3, and rural residents for 62.9%, 57.5% and 60.3%, respectively. In the three subgroups (subgroups 1–3), 86.4%, 87.6% and 86.1% of children were reported to brush their teeth more than once a day. The summary information of the subgrouping results is shown in table 1. The marginal distributions of observed and fitted DMFT within each subgroup are summarised in online supplemental appendix table 3 and figure 1. The fitted R-squared value for this model is 0.785.

Table 1

The summary information of the three subgroups’ results

Table 2 shows the parameter estimates and the 95% CIs of the three subgroups in the model. The results of whether a significant difference existed among the effects of the same variable in the three subgroups are summarised in online supplemental appendix tables 4 and 5.
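One common way to compare the effect of the same variable between two subgroups is a Wald test on the difference of the two coefficients. The Python sketch below shows that calculation under the simplifying assumption that the two estimates are independent; the exact form of the tests in this study is not given in the text and may use the full covariance matrix of the estimates. The function name and all numbers are hypothetical and are not results from table 2.

```python
import math


def wald_difference_test(b1, se1, b2, se2):
    """Two-sided Wald test of H0: b1 == b2, treating the two estimates as independent."""
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p_value


# Hypothetical coefficient estimates and standard errors for two subgroups.
z, p_value = wald_difference_test(0.30, 0.04, 0.18, 0.05)
```

With these made-up inputs, the difference would not reach the two-sided 0.05 threshold used in this study.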

Table 2

Estimation and 95% CI of three different subgroups in model (1)

Social demographic factors

The effects of regional factors in the eastern area were strongest in subgroups 1 and 2, but the influence sharply decreased with the increasing severity of DMFT in subgroup 3 (p<0.001). Adolescents in the eastern region had the largest DMFT in subgroup 1 and the smallest DMFT in subgroup 2. However, the differences among the different subgroups in the western and central regions were not significant (p>0.05). Older adolescents were more likely to have more DMFT, and the influence of age was strongest in subgroup 3 (p<0.05). The average DMFT of males was less than that of females in all subgroups. Adolescents in urban areas had fewer DMFT in all subgroups, but the influence decreased with the increasing severity of DMFT. The estimates for the only-child and educational-level factors were less than 0.1 in all subgroups.

Oral hygiene behaviour

Adolescents who brushed their teeth every day had fewer DMFT in all subgroups, but the influence decreased with the increasing severity of DMFT. The estimates were –1.606, –1.231 and −0.849 in subgroups 1, 2 and 3, respectively.
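Because the model is log-linear, each estimate can be read as a rate ratio by exponentiating it. The short Python sketch below does this for the three reported brushing estimates. It is an interpretation aid only, and it ignores the interaction terms in the model, so the ratios apply at the reference levels of the factors that interact with toothbrushing.

```python
import math

# Reported toothbrushing estimates (log scale) for subgroups 1-3.
brushing_estimates = {"subgroup 1": -1.606, "subgroup 2": -1.231, "subgroup 3": -0.849}

# exp(estimate) is the multiplicative change in expected DMFT, other covariates held fixed.
rate_ratios = {group: math.exp(b) for group, b in brushing_estimates.items()}

for group, ratio in rate_ratios.items():
    print(f"{group}: expected DMFT multiplied by {ratio:.2f}")
```

The ratios move toward 1 from subgroup 1 to subgroup 3 (roughly 0.20, 0.29 and 0.43), which is the weakening of the association described above.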

Sugar consumption habits

Adolescents with a higher frequency of sugar consumption had more DMFT in all subgroups, but the influence decreased as the severity of DMFT increased.

Pit-and-fissure sealant history

Adolescents who had received pit-and-fissure sealant had fewer DMFT in all subgroups, but the influence decreased with the increasing severity of DMFT in subgroup 3 (p<0.001). The estimates were –0.178, –0.129 and −0.095 in subgroups 1, 2 and 3, respectively.

Oral health knowledge

The higher the adolescents’ oral health knowledge, the fewer DMFT in all subgroups. However, all estimates were less than −0.01.

Interaction effects

Tables 3 and 4 summarise the estimated effects for region factor Embedded Image and Embedded Image, under the different levels of the sugar intake factor Embedded Image. With increasing sugar consumption, the effects of the eastern regional factor gradually weakened, while the effects of the central and western regions became significant and gradually increased. Table 5 summarises the estimated effects of Embedded Image under different values of Embedded Image and Embedded Image. In subgroup 1, the effect of sugar consumption was greatest in the western region (0.130), second in the central region (0.100) and lowest in the eastern region (0.019). In subgroup 2, the influence of sugar consumption was highest in the eastern region (0.122), second highest in the western region (0.051) and lowest in the central region (0.028). In subgroup 3, the influence of sugar consumption was greatest in the western region (0.030), second in the central region (0.019) and lowest in the eastern region (0.007). With an increase in DMFT, the influence of the sugar consumption factor in the eastern and central regions gradually weakened. Table 6 shows the estimated effects of Embedded Image the frequency of tooth brushing, under different levels of utilisation of the dental service factors Embedded Image and Embedded Image. Adolescents who brushed their teeth every day had fewer DMFT when their self-assessment of their teeth was ‘very good’, ‘good’ or ‘fair’. However, as the self-assessment decreased, the influence gradually weakened until the effect reversed. The same phenomenon occurred for the associated factor of dental experience. The influence of toothbrushing frequency on reduced DMFT was weakened when adolescents had dental experiences.

Table 3

Estimation and 95% CI for Wald test to the effect of regional factors Embedded Image under the case of controlling sugar intake factor Embedded Image at different levels in different subgroups

Table 4

Estimation and 95% CI for Wald test to the effect of regional factors Embedded Image under the case of controlling sugar intake factor Embedded Image at different levels in different subgroups

Table 5

Estimation and 95% CI for Wald test to the effect of sugar intake factor Embedded Image under the case of controlling regional factors Embedded Image and Embedded Image at different levels in different subgroups

Table 6

Estimation and 95% CI for Wald test to the effect of the tooth brushing factor Embedded Image under the case of the utilisation of dental service factors Embedded Image and Embedded Image at different levels in different subgroups

Discussion

This study used data sourced from the 4th National Oral Health Survey of China, which was conducted in 2015–2016 and covered all 31 provinces, municipalities and autonomous regions in mainland China. In our previous study,6 we obtained the prevalence and associated factors of permanent dental caries in Chinese adolescents aged 12–15 years old and presented a descriptive analysis of the current condition of dental caries.

The mixture regression model is a novel approach to perform subgroup analysis and is applicable to many different types of data, both discrete and continuous. In contrast to ordinary regression, the mixture regression model partitions all individuals into data-driven subgroups and builds models with different coefficients for different subgroups. Rather than discovering the relationships among factors of interest and oral disease patterns by assuming that all individuals share common effects, as in general regression analysis, we can make further comparisons of the effects of factors on the status of dental caries among different subgroups. This method provides more detailed and accurate results because the influence of the same factor may be completely different in different populations. In doing so, we can easily find commonalities as well as differences among different groups. Thus, the results of the analysis are more informative, and we can provide more detailed and helpful suggestions to populations according to their characteristics. In reality, all effects can be clustered into three different types: constant, varying and nested varying effects. In the first step, all effects were treated as varying and were then classified by Wald tests based on the model fitting results. In the second step, we were able to refit the model with these classifications of the effects and obtain more accurate estimation results.

According to model (1), the conditional distribution of DMFT for an individual i in subgroup k given the covariate vector $x_i$ was specified as a Poisson distribution with mean $\exp(x_i^\top \beta_k)$, a function of covariate $x_i$. Thus, the conditional distribution was individual-specific and the marginal distribution of DMFT across individuals did not simply follow a Poisson distribution. The fitted R-squared value of Poisson mixture regression was 0.785, and the marginal distribution of fitted DMFT was very close to that of the observed DMFT within each subgroup (see online supplemental appendix table 3 and figure 1). We also evaluated the model fitting of Poisson regression and zero-inflated Poisson (ZIP) regression. The fitted results are summarised in online supplemental appendix figure 2, tables 3 and 6. The results show that for both the ZIP and Poisson regression models, the fitted number of zeros is much smaller than the observed number of zeros, while the fitted number of ones is much larger than the observed number of ones. This implies that neither model fits the data very well, although the ZIP model may slightly improve the fitted number of zeros for DMFT compared with a regular Poisson regression model. We believe the lack of goodness-of-fit is due to the heterogeneity in the effects of risk factors on DMFT among the population. Thus, our proposed Poisson mixture regression model with subgroup detection can yield better model-fitting results in the present study.
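The direction of this misfit can be reproduced with a small Python calculation. As a deliberately simplified assumption, each subgroup is treated as an intercept-only Poisson component whose mean and weight are the reported subgroup mean DMFT and share; the fitted model instead has covariate-dependent means, so the numbers are illustrative only. A single Poisson distribution with the same overall mean is then compared with the mixture.

```python
import math


def poisson_pmf(y, lam):
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


# Reported subgroup shares and mean DMFT values, used as a crude intercept-only mixture.
weights = [0.521, 0.415, 0.064]
means = [0.50, 0.99, 5.59]

overall_mean = sum(w * m for w, m in zip(weights, means))


def mixture_pmf(y):
    return sum(w * poisson_pmf(y, m) for w, m in zip(weights, means))


# Probability of 0 and of 1 under a single Poisson versus under the mixture.
single_zero, mix_zero = poisson_pmf(0, overall_mean), mixture_pmf(0)
single_one, mix_one = poisson_pmf(1, overall_mean), mixture_pmf(1)
print(f"P(0): single Poisson {single_zero:.3f}, mixture {mix_zero:.3f}")
print(f"P(1): single Poisson {single_one:.3f}, mixture {mix_one:.3f}")
```

Under these assumptions the single Poisson distribution predicts too few zeros and too many ones relative to the mixture, the same pattern reported for the Poisson and ZIP fits.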

According to our analysis, there was strong statistical evidence that the population can be divided into three groups according to BIC. By comparing the mean DMFT values, we found that the mean DMFT value of subgroup 3 was much higher than that of other two subgroups. With the Poisson mixture regression model which clustered individuals into groups and helped us to find the individualised group, the subgroup 3 was referred to as adolescents with high risk of dental caries. According to the parameter estimates of the three subgroups in the model, we also found that the heterogeneity of risk factors can be explained by social demographic characteristics.

Dental caries is a disease caused by an ecological shift in the composition and activity of the bacterial biofilm when exposed over time to fermentable carbohydrates, leading to a break in the balance between demineralisation and remineralisation.23 The prevalence of dental caries has been increasing with China’s rapid economic development in the past 10 years.6 However, due to the distinctions in economic status, culture, education and diet among different regions in China, the prevalence of dental caries differs widely from east to west. This study found that adolescents in the eastern region have fewer DMFT in the average level of the caries population and that adolescents in the central region have fewer DMFT in the high caries population. Adolescents in the western region still had the highest prevalence of dental caries. By the end of 2015,24 there were great regional differences in gross domestic product (GDP). The per capita GDP of the eastern, central and western regions was $11,400, $6635 and $5967, respectively. According to the Report of the National Investigation of Resources for Oral Health in China (2015), the ratio of the number of dentists to the population of the east, centre and west regions was 1:6265, 1:8253 and 1:9968, respectively.25 In addition, the economic development level and the distribution of human resources for oral health in China were extremely uneven. For instance, in some provinces and municipalities at a higher level of economic development, such as Beijing and Tianjin, the ratio of dentists to the population had already reached the WHO criteria and in some areas (eg, the urban area) had even exceeded the standard of developed countries. However, in underdeveloped areas, this ratio remained much lower than the WHO criteria, and the dental workforce was seriously insufficient. 
This unequal distribution of economic development levels and human resources for oral health may be a major reason for the regional differences in dental caries.

The rapid growth in regions with high-level economic development has changed lifestyles; more cariogenic foods and drinks are consumed as a result, making sugar the greatest dietary risk factor for the development of dental caries.26 There is a change in behaviour from early childhood to adolescence as adolescents become more independent in selecting their food and drinks. This could increase the risk of caries development.27 As shown in our study, adolescents with a higher frequency of sugar consumption had more DMFT, but the influence decreased with the increasing severity of DMFT. In the case of controlling sugar intake and increasing sugar consumption, the influence of the eastern regional factor gradually weakened, while the difference between the central and western regions became significant and gradually increased. We previously noted that the eastern region had the most highly economically developed areas and greater accessibility of oral health services. This may be why the influence of sugar consumption weakened with increasing sugar consumption in the eastern region. We hypothesised that sugar consumption could have a more powerful influence in the western or central regions due to regional inequality.

Older adolescents are more likely to have more DMFT. Clinical study data show that newly erupted teeth, especially the first permanent molars, are most vulnerable to dental caries in the first 2–4 years.28 Our study showed that the influence of age was strongest in the high caries subgroup. Therefore, the importance of focusing on preventive measures in the first critical years after tooth eruption in the high caries risk population based on the caries risk assessment system should be emphasised.29

As found in previous studies,30 31 pit-and-fissure sealant could prevent caries. Approximately 60.9% of dental caries in the permanent teeth of 12-year-old Chinese children occurs in the pits and fissures of posterior teeth.2 Despite strong evidence of effectiveness, sealants were underused, especially among adolescents at higher risk for dental caries.32 Less than 7% of 12-year-old Chinese adolescents have received sealants,2 which is far below the level of developed countries.33 US national data indicate that 38% of this population received sealants, compared with 47% of higher-income children. Increasing the prevalence of sealants among children is a national health objective in China.34 Moreover, our study found that the power of sealants for preventing DMFT decreased with increasing severity. This is an indication that sealant is not sufficient for the prevention of caries in high-risk adolescents. These adolescents must be managed aggressively to eliminate or reduce the possibility of new or recurrent caries lesions. For example, bacterial testing, antimicrobial treatments, 1.1% NaF toothpaste, 5% NaF fluoride varnish and xylitol are standard regimens for all high-risk patients.35

This study has some limitations. Because this research on dental caries used cross-sectional data, it could not establish causal relationships between dental caries and the related factors. It was therefore unable to indicate how caries changes over time in the general population or in a given sample, or which variables are associated with the overall trend and with the differences within and among persons. In addition, larger-scale dental caries data and longitudinal comparisons would be beneficial in future studies to validate the results of the Poisson mixture regression model. Nevertheless, the latest data used in the present study could provide some helpful suggestions for future oral health promotion measures in China. Owing to the heterogeneity of risk factors in different populations, group-individualised strategies based on risk assessment levels are needed for adolescents with distinct dental health characteristics and will affect decisions on the use of antibiotics, fluoride and sealants, the frequency of radiographs and periodic oral examinations, and the management of other risk factors. Public health policy makers should also keep in mind that group individualisation deserves careful consideration when planning future population-based oral health promotion programmes.

In summary, the risk factors of dental caries exhibited heterogeneity in groups with different characteristics. The Poisson mixture regression model could cluster individuals into groups and identify the heterogeneous effects of risk factors among different groups. The findings support the need for different targeted interventions and prevention measures in groups with different dental health characteristics.

Acknowledgments

We are very grateful to all the participants, examiners and interviewers in the Fourth National Oral Health Survey.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • CY and JH contributed equally.

  • Contributors CY, JH, JK, SZ contributed to conception, design, data acquisition, analysis, and interpretation, drafted and critically revised the manuscript; XS contributed to conception, design, and interpretation, drafted and critically revised the manuscript; all authors gave final approval and agree to be accountable for all aspects of the work.

  • Funding This study was supported by ‘Scientific Research Fund of National Health Commission of the People’s Republic of China (201502002)’.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Ethics approval Ethical clearance was approved by the Stomatological Ethics Committee of the Chinese Stomatological Association (approval number 2014-003).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data are available upon reasonable request. E-mail address: kqzsg86@bjmu.edu.cn.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
