Introduction WHO has set a goal to reduce the prevalence of stunted child growth by 40% by the year 2025. To reach this goal, it is imperative to establish the relative importance of risk factors for stunting to deliver appropriate interventions. Currently, most interventions take place in late infancy and early childhood. This study aimed to identify the most critical prenatal and postnatal determinants of linear growth 0–24 months and the risk factors for stunting at 2 years, and to identify subgroups with different growth trajectories and levels of stunting at 2 years.
Methods Conditional inference tree-based methods were applied to the extensive Maternal and Infant Nutrition Interventions in Matlab trial database with 309 variables of 2723 children, their parents and living conditions, including socioeconomic, nutritional and other biological characteristics of the parents; maternal exposure to violence; household food security; breast and complementary feeding; and measurements of morbidity of the mothers during pregnancy and repeatedly of their children up to 24 months of age. Child anthropometry was measured monthly from birth to 12 months, thereafter quarterly to 24 months.
Results Birth length and weight were the most critical factors for linear growth 0–24 months and stunting at 2 years, followed by maternal anthropometry and parental education. Conditions after birth, such as feeding practices and morbidity, were less strongly associated with linear growth trajectories and stunting at 2 years.
Conclusion The results of this study emphasise the benefit of interventions before conception and during pregnancy to reach a substantial reduction in stunting.
- public health
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
Includes high-quality longitudinal data with low rates of missing data on child growth and a wide range of prenatal and postnatal household, family and environmental factors, child characteristics at birth, infant feeding and morbidity.
Employs decision tree-based methods that permit the inclusion of a high number of predictor variables of different types, and that automatically discover complex interactions between predictor variables and include them in the model.
Some potentially important determinants of linear growth were not present in the database.
The study does not include stratified analyses for girls and boys.
Linear growth is considered to be the best overall indicator of children’s present and future health1 2 and the reduction of growth failure is one of the targets within the sustainable development agenda. Stunted growth is associated with short-term morbidity and mortality, impaired cognitive development, lower future productivity and increased risk of adult chronic diseases.3 In 2012, WHO adopted a resolution on maternal and child undernutrition, targeting a reduction of stunting by 40% by 2025.4 Linear growth is most susceptible to environmentally modifiable factors from conception up to 2 years of age, that is, the first 1000 days when most of the growth faltering takes place.5 6 To develop and deliver appropriate interventions, it is imperative to establish the relative importance of stunting risk factors. In addition, the sustainable development health goal has emphasised the personalised perspective under the universal coverage of healthcare. Precision public health interventions by identifying and targeting high-risk subgroups can be one of the strategies to reach this goal.7
Previous studies employing classical statistical methods have identified a wide range of prenatal and postnatal factors associated with impaired growth.8–13 Low birth weight, maternal height, maternal education, poverty and inadequate complementary feeding practices have been recognised as important risk factors.14–16 Some analyses emphasise the importance of fetal growth restriction for later stunted growth, but rarely is the relative importance of prenatal and postnatal factors assessed.17 Despite these findings, policy documents and recommendations emphasise interventions especially after birth, and prenatal recommendations are usually limited to routine micronutrient supplementation for pregnant women.18–20
Despite a wealth of literature relating to the determinants of stunting, studies with a holistic approach, which concurrently account for household, environmental, nutritional, biological and socioeconomic influences, are few. Moreover, individuals and groups may be stunted for various reasons and thus respond differently to interventions. Studies that identify risk groups with different probabilities of stunting are, to the best of our knowledge, not yet available. The available studies with a multifactorial approach have frequently had a cross-sectional design and have applied traditional statistical methods. As visualised in WHO’s conceptual framework on childhood stunting,21 the causes of stunted linear growth are complex. The number of risk factors and the complexity of the associations of these risk factors with linear growth restriction make traditional statistical models ineffective from a predictive perspective. Moreover, classical statistical methods do not have the capacity to identify groups with different risks based on the combinations of predictors. Decision trees are popular data mining (DM) methods, which allow for the inclusion of a high number of predictor variables, handle variables of different types, and automatically discover complex interactions between predictor variables and include them in the model.22 Decision tree-based algorithms can be used to rank a high number of predictors according to their relative importance for the outcome and to identify subgroups with different risk patterns.
The Maternal and Infant Nutrition Interventions in Matlab (MINIMat) was a randomised prenatal food and multiple micronutrient trial carried out in rural Bangladesh. The frequent follow-up of mothers and children participating in this trial resulted in an extensive database, including frequent prenatal and postnatal anthropometric assessments, socioeconomic and biological characteristics of the mother and father, information on maternal exposure to violence, household food security, breastfeeding and infant-feeding practices, and measurement of morbidity of the mothers during pregnancy and repeatedly of children up to 24 months of age. The aim of this study is to, within this Bangladeshi cohort, assess the relative importance of determinants of linear growth from 0 to 24 months and risk factors for stunting at 2 years, and to identify risk groups with negative growth trajectories and high prevalence of stunting at 2 years.
Study setting, participants and study design
The MINIMat trial (isrctn.org identifier: ISRCTN16581394) was carried out in Matlab, Bangladesh, a rural delta region located 57 km southeast of the capital Dhaka. In this area, a health and demographic surveillance system enables early pregnancy identification and longitudinal follow-up. Pregnant women were enrolled in the MINIMat trial and the follow-up included their offspring. MINIMat was a factorial randomised trial primarily evaluating the effect of an early invitation to prenatal food supplementation (vs usual timing) combined with multiple micronutrient supplementation (vs usual programme iron-folate) to pregnant women on maternal haemoglobin, birth weight, gestational age at birth and infant mortality.23 Further, the participating women were randomly assigned to either counselling for exclusive breast feeding or a different health education message of equivalent intensity.24 The MINIMat trial recruited pregnant women from November 2001 to October 2003. When a woman reported to a community health worker that her menstruation was delayed by more than 14 days, she was offered a pregnancy test and her date for the last menstrual period (LMP) was recorded. If LMP date was missing, the gestational age assessment was based on ultrasound examination. In total, 4436 pregnant women participated, giving birth to 3625 live-born infants from April 2002 to June 2004. The pregnant women were enrolled at around gestational week 8. In this analysis, the mothers and children were followed through pregnancy, birth and up to 2 years of age.
Written and oral informed consent was obtained from all participating women and from the parents of the participating children. The Ethical Review Committee at the International Centre for Diarrhoeal Disease Research, Bangladesh, approved the study (approval registration numbers 2000-025; 2002-031; 2005-004).
Predictor and outcome variables are presented in figure 1, grouped according to WHO conceptual framework of stunting.21 Data were collected using questionnaires, physical examinations and laboratory analyses. At enrolment, well-trained field workers collected information on women’s age, parity, marital status, educational level, occupation, maternal morbidity, socioeconomic characteristics and household food security. Socioeconomic status was assessed based on a range of household assets, and a continuous household asset score, with a mean value of zero, was constructed based on a principal component analysis.25 A validated household food security scale was created from 11 items with data on frequency of food purchased, cooked, borrowed or lent (food and money), and whether there was ready access to adequate meals and snacks.26 The participating women were also asked whether they had suffered any of 30 morbidity symptoms from 12 different categories, including airway, urinary tract, fever, circulation, bowel or pain symptoms during the last month. A sum score ranging from 0 to 12 was created based on the absence of symptoms or those not recorded for each category.
Home visits were followed by clinic visits at local health subcentres. Maternal height and weight were measured at around 8 weeks of gestation using a stadiometer to the nearest 0.1 cm and an electronic scale (Uniscale, SECA, Hamburg, Germany) with a precision of 0.10 kg. In the third trimester, paramedics interviewed the participating women in privacy regarding their experiences of domestic violence. A modified version of WHO collaborative study questionnaire was used,27 28 based on the Conflict Tactic Scale covering physical, sexual and emotional violence and controlling behaviour.29 Household drinking water was analysed for arsenic concentration.30
A birth notification system allowed birth anthropometry to be measured within 72 hours. In the few cases where the newborns were reached after 72 hours, the measurements were adjusted to the time of birth using an SD score transformation, assuming that the infants remained in the same relative position in the anthropometric distribution during this period.31 At birth, data on sex, birth weight, length and breastfeeding practices were collected. During the subsequent 2-year study period, the mother-and-child pairs were visited monthly in their homes during the first year, and every 3 months during the second year. On these occasions, data on infant feeding practices, child morbidity and anthropometry were collected. The mothers were interviewed about breast feeding and complementary feeding practices. Breastfeeding practices were categorised into exclusive, predominant, partial or any breast feeding for each month from 1 to 12 months. The total time for exclusive, predominant and any breast feeding was calculated. WHO recommendations guided the breastfeeding assessment32 and results were validated with a stable-isotope technique. The classification of exclusive breast feeding was found to suffer from limited misclassification in both directions and to be accurate at the group level.33 The food given to the infant was categorised into semisolids and solids each month from 1 to 12 months. The data collection did not include full dietary assessments or classification of dietary diversity and meal frequency.
The mothers were also asked whether the child had any of the following symptoms during the last week: fever, cough, difficult breathing, chest in-drawing, rapid breathing, diarrhoea or bloody diarrhoea, and about the duration of these symptoms.34 Categories were created based on whether the child had suffered from fever, respiratory symptoms, suspected pneumonia or diarrhoea, and the sum of days with each symptom and total morbidity were calculated from birth to 24 months. To reduce the risk of recall bias, the mothers were visited monthly with an interview recall period of 7 days for child morbidity. One week has been found to be optimal for this kind of morbidity recall assessment.35
Children’s weight was measured by SECA beam and electronic scales (UNICEF Uniscale, SECA GmbH & Co) with a precision of 0.01 kg. The length at birth and up to 1.5 years was measured with a collapsible, locally manufactured length board with a precision of 0.1 cm. From 1.5 to 2 years, height was measured to the nearest 0.1 cm, using a freestanding stadiometer. Head and chest circumferences were measured with a measuring tape. Two measurements were recorded on each occasion and the mean was calculated. The equipment was calibrated daily and refresher training on data collection methods, including the standardisation of anthropometric measurements, was conducted periodically.
Height-for-age z-scores (HAZ) were calculated from the measured length and height data using the programme WHOAnthro, based on WHO growth reference for children.36 Children with a HAZ below minus two SD-scores were classified as stunted. Two outcomes were analysed: stunting at 24 months and the change in HAZ from birth to 24 months, referred to as Δ HAZ and calculated by subtracting HAZ at birth from HAZ at 24 months, that is, ΔHAZ=HAZ at 24 months − HAZ at birth.
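The two outcome definitions can be illustrated with a minimal Python sketch. It assumes HAZ values have already been computed (in the study, by WHOAnthro); the function names and the three example children are hypothetical and are not taken from the MINIMat data.

```python
# Minimal sketch of the two outcomes; HAZ values are assumed to be precomputed.
STUNTING_CUTOFF = -2.0  # HAZ below -2 SD scores is classified as stunted


def delta_haz(haz_birth, haz_24m):
    """Change in HAZ: HAZ at 24 months minus HAZ at birth."""
    return haz_24m - haz_birth


def is_stunted(haz):
    """True when HAZ lies strictly below the -2 SD cut-off."""
    return haz < STUNTING_CUTOFF


# Hypothetical children: (identifier, HAZ at birth, HAZ at 24 months)
children = [
    ("child_a", -0.9, -2.3),
    ("child_b", -2.5, -2.4),
    ("child_c", 0.3, -1.6),
]

for child_id, haz_birth, haz_24m in children:
    change = delta_haz(haz_birth, haz_24m)
    print(f"{child_id}: delta HAZ = {change:+.2f}, "
          f"stunted at 24 months = {is_stunted(haz_24m)}")
```

A negative Δ HAZ means the child fell further behind the reference population between birth and 24 months, regardless of whether the stunting cut-off was crossed.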
A database was created with 309 variables characterising mothers and children in the MINIMat cohort from enrolment in early pregnancy up to the time when the children were 24 months of age. The subset of records that had height measurements at birth and 24 months was selected (n=2723). The average per cent of missing values among all the predictors was 4%. The highest per cent missing was among maternal morbidity data during pregnancy (22%) and categorical monthly child morbidity data (ill or not), ranging from 0% to 35% with the highest number of missing observations in the first months. The continuous child morbidity data (sum of days with different types of illness), however, had no missing values. The most important variables identified by the random forest analyses and the variables included by the conditional inference trees (CIT) had less than 1% missing values. The missing values of the predictor variables were imputed. To find the best method to impute the missing data, we conducted a simulation study of the performance of the following imputation methods: imputation by variable mean, K-nearest neighbour imputation37 and random forest imputation.38 The design of the study followed a procedure similar to the strategy described in Jönsson and Wohlin37; see online supplementary appendix. Accordingly, we imputed the data using random forest imputation, as the simulation study revealed that this method provided the most accurate imputations.
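The general logic of such an imputation comparison can be sketched as follows: hide values that are actually known, impute them with each candidate method, and compare the imputations with the hidden truth. This Python sketch is an illustration under stated assumptions, not the procedure in the supplementary appendix. The data are simulated, the error measure (root mean squared error) is an assumption, and only mean and K-nearest neighbour imputation are shown; random forest imputation is omitted for brevity.

```python
import math
import random


def rmse(truth, estimate):
    """Root mean squared error between hidden true values and imputations."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth))


def impute_mean(rows, missing):
    """Fill every hidden y with the mean of the observed y values."""
    hidden = set(missing)
    observed = [y for i, (_, y) in enumerate(rows) if i not in hidden]
    mean = sum(observed) / len(observed)
    return [mean for _ in missing]


def impute_knn(rows, missing, k=3):
    """Fill each hidden y with the mean y of the k observed rows nearest in x."""
    hidden = set(missing)
    donors = [(x, y) for i, (x, y) in enumerate(rows) if i not in hidden]
    filled = []
    for i in missing:
        x_i = rows[i][0]
        nearest = sorted(donors, key=lambda d: abs(d[0] - x_i))[:k]
        filled.append(sum(y for _, y in nearest) / k)
    return filled


# Simulated complete data: y depends strongly on x (hypothetical, not MINIMat).
rng = random.Random(0)
rows = [(i / 10, 2 * (i / 10) + rng.gauss(0, 0.1)) for i in range(100)]

# Hide 20% of the y values, then score each method against the hidden truth.
missing = sorted(rng.sample(range(100), 20))
truth = [rows[i][1] for i in missing]

mean_rmse = rmse(truth, impute_mean(rows, missing))
knn_rmse = rmse(truth, impute_knn(rows, missing))
print(f"mean imputation RMSE: {mean_rmse:.3f}")
print(f"KNN imputation RMSE:  {knn_rmse:.3f}")
```

In this simulated setting the neighbour-based method benefits from the strong relation between the two variables; which method wins on real data depends on the correlation structure and the missingness pattern, which is why a simulation on the data at hand is informative.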
Decision trees22 are DM methods that allow for specifying an arbitrarily high number of predictor variables, handle variables of different types, automatically discover complex interactions between predictor variables and include them in the model. Traditional decision trees, such as classification and regression trees, have been shown to be biased.39 This motivated us to select the CIT framework, a method that embeds a statistical hypothesis-testing framework into a recursive partitioning algorithm used for model building.39 CIT were used to identify subgroups characterised by combinations of levels of certain predictors with distinct values of Δ HAZ or prevalence of stunting at 24 months. Cross validation, a well-established model selection method that selects a tree with an optimal predictive performance for new unseen data, was applied. Cross validation splits the dataset into different train and test sets repeatedly, estimates the model in one set and validates the prediction on another set, followed by an aggregation of the predictions.40 To ensure public health relevance, the minimum number of observations in each terminal node (subgroup) was set to 250.
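The cross-validation step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the analysis code: the study used conditional inference trees from the R ‘party’ package, whereas the sketch compares two simple hypothetical candidates (a tree with no split and a tree with a single split chosen by training error), uses five folds, and runs on simulated data.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices and deal them into k roughly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Candidate 1: a tree with no split (predicts the training mean)."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_stump(xs, ys):
    """Candidate 2: a tree with one split, chosen to minimise training error."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean


def cross_validated_mse(xs, ys, fit, k=5, seed=0):
    """Fit on k-1 folds, predict the held-out fold, aggregate squared errors."""
    squared_errors = []
    for test in k_fold_indices(len(ys), k, seed):
        held_out = set(test)
        train = [i for i in range(len(ys)) if i not in held_out]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        squared_errors.extend((ys[i] - predict(xs[i])) ** 2 for i in test)
    return sum(squared_errors) / len(squared_errors)


# Simulated data (hypothetical): the outcome shifts at x = -1.
rng = random.Random(1)
xs = [rng.uniform(-3.0, 1.0) for _ in range(200)]
ys = [(-0.5 if x < -1.0 else -1.5) + rng.gauss(0.0, 0.3) for x in xs]

cv_mean = cross_validated_mse(xs, ys, fit_mean)
cv_stump = cross_validated_mse(xs, ys, fit_stump)
print(f"no split:  CV MSE = {cv_mean:.3f}")
print(f"one split: CV MSE = {cv_stump:.3f}")
```

The candidate with the lower cross-validated error is the one expected to predict unseen data better, which is the criterion the text describes for choosing among trees.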
Conditional random forest (CRF) analyses were performed to assess and rank the importance of predictors with regard to their ability to explain the variation of the continuous outcome of the change in HAZ from birth to 24 months and the presence of stunting at 24 months of age. In conditional random forest analysis, an ensemble of CIT is created by means of drawing subsamples from the original data and fitting a unique randomised CIT to each sample. Possible predictors at each split are selected randomly from the complete set of predictors, which leads to a better predictive performance of the tree ensemble.40 The importance of a variable is computed by comparing the predictive mean squared error (MSE) from the original data with that from a dataset in which the values of the corresponding variable are randomly permuted, which makes the variable irrelevant for the prediction. If the variable does not contribute to the prediction, the increase in MSE is expected to be small when the values of the variable are permuted. An aggregated difference between the MSE values over the given ensemble of trees makes up the relative importance of a variable. The random forest analyses were based on 3000 trees, and the 30 variables with the highest importance measure are presented. The programming language R (V.3.2.4)41 and the ‘party’ package42 were used for all analyses.
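The permutation idea behind the importance measure can be sketched in Python. This is a simplified illustration, not the ‘party’ implementation: it uses one fixed, hypothetical prediction function instead of an ensemble of trees, evaluates on the same simulated data rather than on held-out samples per tree, and all names are assumptions.

```python
import random


def mse(model, rows, ys):
    """Mean squared error of a prediction function over rows of predictors."""
    return sum((y - model(r)) ** 2 for r, y in zip(rows, ys)) / len(ys)


def permutation_importance(model, rows, ys, column, rng):
    """Increase in MSE after randomly permuting one predictor column."""
    baseline = mse(model, rows, ys)
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted_rows = [r[:column] + (v,) + r[column + 1:]
                     for r, v in zip(rows, shuffled)]
    return mse(model, permuted_rows, ys) - baseline


# Simulated data (hypothetical): the outcome depends on predictor 0 only.
rng = random.Random(42)
rows = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
ys = [2 * x0 + rng.gauss(0, 0.5) for x0, _ in rows]


def model(row):
    """Stand-in for a fitted model; it ignores predictor 1 entirely."""
    return 2 * row[0]


importance = [permutation_importance(model, rows, ys, c, rng) for c in (0, 1)]
for column, value in enumerate(importance):
    print(f"predictor {column}: increase in MSE = {value:.3f}")
```

Permuting the informative predictor breaks its link with the outcome and inflates the error, whereas permuting a predictor the model never uses leaves the error unchanged. In a forest, these differences are aggregated over all trees to give the ranking.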
Patient and public involvement
No participants were involved in developing the hypothesis, the specific aims or the research questions, nor were they involved in developing plans for design or implementation of the study. No participants were involved in the interpretation of study results or write up of the manuscript. There are no plans to disseminate the results of the research to study participants.
There were 4436 women enrolled into the MINIMat trial, of whom 845 were lost to follow-up before delivery, mainly due to fetal death, outmigration, or because they withdrew their consent. Of the 3625 live-born children, including twins and triplets, 155 died between birth and 2 years and 682 were excluded because of missing anthropometry, at birth or at 2 years, resulting in 2723 children available for analysis (figure 2). In the non-analysed group, there were slightly higher percentages of mothers with more than 5 years of education, mothers younger than 20 years, mothers belonging to the lowest socioeconomic tertile, and preterm births.
The characteristics of the households, mothers, fathers at 8 weeks of gestation and children at birth are given in table 1. The participating mothers had an average age of 26 years (SD 5·6), a mean height of 150 cm (SD 5·3) and a mean weight of 45 kg (SD 6·8) at recruitment. One-third of the women were underweight, with a body mass index below 18·5 at pregnancy week 8. The average number of years of education was similar for mothers and fathers (5 years). The sample of children comprised an equal proportion of girls and boys; the average birth length was 47·8 cm (SD 2·2) and the average birth weight was 2676 g (SD 410·5). At birth, HAZ was low (mean = −0·94), and it declined further up to 2 years of age, with a mean change of −1 HAZ, resulting in a mean HAZ at 2 years of −2·0 (figure 3) and 50% being stunted (girls 51·1%, boys 48·5%).
Relative importance of predictors for stunting at 24 months and change in height scores from birth to 24 months
The relative importance of predictors with respect to their ability to explain the probability of stunting at 24 months and the change in HAZ from birth to 24 months is presented in figures 4 and 5. HAZ and weight-for-age z-scores (WAZ) at birth were the most important predictors of stunting at 24 months, followed by maternal height, small for gestational age (SGA), maternal weight at 8 weeks of gestation, household asset score and parental education. The most important factors for Δ HAZ were HAZ and WAZ at birth, pregnancy duration, head and chest circumference at birth and maternal education.
Subgroups with different levels of stunting at 24 months and levels of change in height scores from birth to 24 months
The CIT presented in figures 6 and 7 display subgroups with different probability of stunting at 24 months and levels of Δ HAZ 0–24 months due to distinctive combinations of levels of certain predictors. The CIT for stunting and Δ HAZ were composed of subgroups defined by the same predictors, specifically, HAZ at birth, maternal height, father’s educational level and the number of saris owned by the mother. The probability of stunting ranged from 14% to 84%. Children with a HAZ at birth below −1·19, born to mothers who were shorter than 151·4 cm and owned fewer than five saris, had the highest probability of stunting at 24 months, at 84%. Children of a father with more than 7 years of education, who had HAZ at birth above −0·2, had the lowest probability of stunting at 24 months, at 14% (figure 6). The difference in Δ HAZ between the identified subgroups of children with the most negative change and the subgroup with the most positive change was 2·22 HAZ. Children who already had a low HAZ at birth (≤−2·33) had the most positive change in HAZ from birth up to 24 months (+0·18 HAZ), while children who were born with a HAZ above 0·19 had the most negative Δ HAZ (−2·04 HAZ) (figure 7).
In our analysis of 309 predictors characterising household, environmental, biological and socioeconomic factors, we found birth size, maternal anthropometry and parental education to be the most influential for linear growth up to and stunting at 24 months. Conditions after birth, such as feeding practices and morbidity, were less important for linear growth trajectories and stunting at 2 years. The difference between the identified subgroups of children with the highest and lowest probabilities of stunting was high.
The most important predictors of stunting at 24 months were different indicators of size at birth, maternal height, asset score and maternal education. These findings are in line with a multicountry longitudinal study that found birth or enrolment weight of the infant and maternal height to have the highest cumulative ORs for linear growth deficit up to 2 years of age.11 These results add to the growing evidence that a large part of linear growth faltering already originates in fetal life.11 17 43 In a pooled analysis of 19 birth cohorts with longitudinal follow-up, 20% of stunting was attributable to SGA weight at birth.17 That study did not include any postnatal factors in the analysis. In a study in Indonesia, neonatal length and weight were the strongest predictors of nutritional status and increases in weight and length during infancy.43 Our study included both prenatal and postnatal factors and, in contrast to most other studies, assessed the relative importance of different potential predictors and the public health importance of each element.
In a study with pooled data from five demographic and health surveys in South Asia, maternal height and underweight, household wealth, maternal education and minimum dietary diversity were found to be the most important factors among children aged 6–23 months.16 Similar results were reported from a study in India.44 These studies were, however, cross sectional, without access to birth characteristics.
Maternal height is a strong determinant of fetal growth45 that indirectly reflects epigenetic heredity. Maternal height is directly associated with the uterine volume,46 cephalopelvic disproportion and subsequent infant and childhood stunting, and child mortality.47 48 In a previous analysis of the MINIMat cohort, a short maternal height was strongly associated with stunting all the way up to 10 years of age.48 Thus, factors that well precede pregnancy generate a vicious intergenerational cycle, where small mothers give birth to small children of whom a high proportion become and remain stunted. In the CIT for stunting at 24 months, children who were born with a higher HAZ but who had shorter mothers were as likely to be stunted as children with lower HAZ at birth but with a taller mother. This finding suggests that intergenerational improvements in height are achievable and that interventions with a particular focus on adolescents and women of reproductive age are needed to break the vicious intergenerational cycle.
A strong relationship between stunting and poverty has been reported from many low–middle income settings.49 Asset score and other socioeconomic markers, such as the number of shoes and saris the mother owned, were highly ranked in the random forest analysis and categorised subgroups with a higher probability of stunting and undesirable linear growth trajectories. Poverty is associated with unfavourable food and sanitation practices that can lead to poor nutrition and an increased occurrence of infections during pregnancy, infancy and childhood. Poverty increases the risk of maternal stress, depression50 and weak mother-to-child interaction and stimulation.
The number of shoes and saris the mother owns might also be markers of the woman’s status in the household. During the last few decades, the importance of women’s position in household and society for child nutrition has been emphasised.51 Maternal status is associated with food allocation to mother and child, and a higher level of maternal autonomy has been associated with better child weight and lower levels of stunting.52 The subordinate position of women in South Asia has been suggested to be a contributor to the high prevalence of child undernutrition in the region, compared with other areas with equivalent levels of economic growth and food security.51
An acknowledged way of improving women’s position is through improved education. The remarkable health achievements in Bangladesh over the past two decades can partly be attributed to the progress in access to education, especially at primary level and for girls.53 However, there is a considerable risk of not completing primary school for both girls and boys.54 In 2013, the continuation to the last grade of primary school (5 years) was 75%,55 and in our study it was less than 50%. In the conditional inference tree models for stunting and change in HAZ, the cut-off values for paternal and maternal education in the groups with a lower prevalence of stunting and a more positive change in HAZ from birth to 24 months ranged from 6 to 8 years, underscoring the importance of girls and boys enrolling in and continuing at school.
It may seem contradictory that children who were born with a very short length had the smallest change in HAZ. This finding most likely reflects a situation where linear growth had already been severely restricted in fetal life.
A multicountry pooled analysis of cohort studies showed that a higher cumulative burden of diarrhoea increased the risk of stunting.56 In situations where measles still occurred, its impact on growth and mortality risks was repeatedly documented.57 One explanation for the discrepancy between our results and previous findings could be Bangladesh’s remarkable success in achieving the globally highest coverage of oral rehydration therapy in diarrhoea,58 which may have reduced the impact on linear growth. Another factor is the almost universal immunisation coverage53 59 that has reduced or partly eliminated immunisation-preventable morbidity and the subsequent effect on growth. Our previous publications showed that the effects of the MINIMat prenatal nutrition interventions on child growth and mortality were not mediated through morbidity,23 60 further supporting the modest impact of child morbidity on linear growth in our sample.34 In other settings with lower coverage of diarrhoea treatment and immunisation, the relative importance of these factors may be greater.
Suboptimal infant and early childhood feeding practices have, in earlier studies, been reported as significant risk factors for stunting.61 A systematic review and meta-analysis of 17 trials showed an average effect of 0.5 cm in height when children 6–24 months had been randomised to appropriate complementary foods.62 The infant feeding variables included in our analysis ranked low in the random forest analysis and did not show up in any of the CIT. In spite of the relatively few documented effects of complementary feeding programmes on stunting, these interventions are often the priority in efforts to combat stunting.
The nutrition interventions from preconception to 2 years of age currently recommended by WHO include efforts to ensure exclusive breast feeding, adequate complementary feeding, appropriate nutritional care of sick and malnourished children and proper intake of vitamin A, iron and iodine for women and children.19 All of these, except micronutrient supplementation to pregnant women, are focused on the postnatal period from birth up to 2 years. Our results strengthen the evidence that the process of becoming stunted already begins in utero, as well as the importance of intergenerational effects. Although worthwhile, the present focus on postnatal interventions results in missed opportunities to intervene before or during the first 9 months when the process of stunting is established.
So, what possibilities do we have to improve the postnatal linear growth trajectories prenatally? Attained height is mainly dependent on one’s genetic potential for linear growth, in turn determined by DNA sequence polymorphism63 64 and epigenetic heredity,65 and to some extent the environment. The modulation of non-DNA sequence epigenetic heredity has been proposed to be one of the leading factors explaining variations in height and height changes over generations,65 especially in more deprived populations.66 Postnatal interventions can influence factors in the environment that constrain the ability to increase linear growth, while prenatal interventions also have the potential to modulate the actual growth potential through an epigenetic modification that results from changes to gene expression in response to the fetal environment.
Established prenatal nutritional interventions include balanced energy-protein supplementation, multiple micronutrient supplements and nutritional counselling and education. Unfortunately, most studies evaluating these interventions report only birth weight, not length, which is why evidence to directly assess the effect on fetal linear growth is limited. Meta-analyses and randomised trials evaluating these interventions report their positive impact on birth weight and a reduced risk of low birth weight.67–74 Effect sizes vary from increases in birth weight of 20–200 g, with the smallest effects seen in studies of multiple micronutrients and larger effects seen with balanced energy-protein and lipid-based nutrient supplements. Considerable heterogeneity in growth response is common, and is related to the mother’s nutritional status when entering pregnancy and possibly also to the genetic potential to benefit. In the MINIMat food and micronutrient interventions, all women received food supplementation, but they were randomised to an early invitation to supplementation (week 9) or the usual programme start of supplementation (week 20). Children of mothers who participated in food supplementation from early pregnancy (vs the usual start) had a 13% reduction in stunting up to 5 years.60
There is increasing evidence that preconception interventions may be even more appropriate.75 A few trials examining the effect of interventions initiated before pregnancy are underway, but few results have so far been published.76 Preconception interventions have the potential to bring about epigenetic modulation and improved growth in present and future generations. Thus, the launch and evaluation of interventions targeting adolescents and women of reproductive age that focus on adequate health, education and nutrition before and during pregnancy are needed, especially in South Asia with its high burden of maternal undernutrition and young age at first pregnancy.77 Targeting high-risk subgroups, which in this setting are short, poor women with low education, can be another strategy to address the intractable problem of stunting.
Strengths and limitations
The extensive database that was available for our analysis covered a wide range of household, family and environmental factors, child characteristics at birth, feeding and morbidity. Infant and young child growth was carefully assessed from birth up to 2 years. The MINIMat cohort was implemented in an excellent research infrastructure that fulfils the prerequisites for obtaining high-quality longitudinal data. Experienced field workers and study nurses collected data on the 309 variables during pregnancy and the following 2 years. They received repeated training, including standardisation exercises, and were supervised by senior medical doctors.
Some potential determinants were not present in the database. Household water, sanitation and hygiene characteristics were limited to information on arsenic contamination of the drinking water, but diarrhoea and other morbidity information were included in our analyses. Further, the cohort did not include the collection of stools for the study of enteropathogens in the child, which may be associated with the risk of stunting.11 Paternal height, which may be related to fetal growth, was not available.78 The mothers’ smoking habits were not represented in the data, as smoking was extremely rare among women in the study area.
There were slight differences in the basic characteristics of the analysed and non-analysed groups. These differences most likely had no influence on the primary outcomes of this study. There were few or no missing values in the critical variables that ranked high in the random forest and defined the subgroups in the CIT. A substudy was carried out to identify the most accurate method for imputing missing data. Thus, it is also highly unlikely that missing data influenced the main findings.
Some of the included variables like ‘household asset score’ are composite variables, which depend on individual variables like TV ownership, number of cows, and so on. Presence of both composite and individual variables creates computational problems for traditional models like linear regression and for some machine learning models due to a possible high correlation between the individual and the composite variables. However, CIT methods perform automatic variable selection by choosing the most relevant variable (with the strongest association to the response) at each decision tree split step.39 Accordingly, these methods automatically choose either a composite variable or an individual variable at each split step based on the relevance of this variable to the response.
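The selection step described above can be illustrated with a minimal sketch. It is a simplification and not the procedure used in this study: conditional inference trees rank candidate predictors with permutation-test p values, whereas the sketch below uses the absolute Pearson correlation as a stand-in measure of association. The variable names and all values are hypothetical toy data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable carries no association
    return sxy / sqrt(sxx * syy)


def select_split_variable(predictors, response):
    """Return the predictor with the strongest association to the response.

    Assumption: absolute correlation replaces the permutation-test
    p value that a conditional inference tree would use.
    """
    scores = {name: abs(pearson(values, response))
              for name, values in predictors.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy data: two individual asset variables and a composite
# score built from them, plus a height-for-age z-score as the response.
tv_ownership = [0, 0, 1, 0, 1, 1, 0, 1]
number_of_cows = [0, 1, 0, 2, 1, 3, 0, 2]
asset_score = [tv + cows for tv, cows in zip(tv_ownership, number_of_cows)]
height_z = [-2.6, -2.1, -1.9, -1.6, -1.4, -0.7, -2.4, -1.0]

best, scores = select_split_variable(
    {"tv_ownership": tv_ownership,
     "number_of_cows": number_of_cows,
     "asset_score": asset_score},
    height_z,
)
print(best, scores)
```

In this toy example the composite score has the strongest association and is chosen for the split, so the two individual variables never need to enter the model alongside it. With other data an individual variable could equally be chosen.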
Traditional methods like linear regression often have lower predictive power than DM methods. In some cases, the traditional methods cannot even be computed because of a high number of predictor variables and complex interactions. The method used in this work, CIT, belongs to the class of interpretable machine learning models and displays precise information on the priority, size and direction of the association of the predictors with the outcome. In addition, the risk group identification, including the prioritisation and relevant cut-offs of risk factors, can be of high public health relevance for the design and targeting of appropriate interventions with the greatest benefit. Thus, we believe that the CIT framework has a large potential in public health and medical applications.
It can be noted that the CRF and the CIT models are not fully comparable. This can be explained by two factors. First, many predictors that were important in the CRF model are relatively highly correlated and thus have a similar relationship to the response. Once one of these variables is selected by the decision tree in a split, there is a high chance that the remaining correlated variables (although also important according to the CRF) will not be picked up as the next splitting variable. Second, the CRF models and the CIT models cannot be matched directly. The CRF is a combination of many trees and is thus a more flexible model than a CIT. However, CRFs are nearly black-box models: the only interpretable information that these models deliver is the variable importance measure. In contrast, CITs are ‘transparent’ and interpretable models but have lower predictive power. This is another reason why these models are not generally capable of efficiently embedding all the variables that are important in the CRFs.
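The first factor can be illustrated with a small sketch, under loudly stated assumptions: the data are hypothetical toy values constructed for this illustration, the split threshold is arbitrary, and Pearson correlation stands in for the association test of a conditional inference tree. Two correlated predictors are both strongly related to the response overall, but once the sample is split on one of them, the other carries little remaining association within each subgroup.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: birth length (cm) and birth weight (kg) are
# correlated with each other and with the response (a z-score).
birth_length = [46, 47, 47, 48, 50, 51, 51, 52]
birth_weight = [2.3, 2.5, 2.4, 2.6, 3.0, 3.2, 3.1, 3.3]
height_z = [-2.5, -2.6, -2.4, -2.4, -1.0, -1.1, -0.9, -0.9]

# Association of birth weight with the response in the full sample.
overall = pearson(birth_weight, height_z)

# Split the sample on birth length at an arbitrary illustrative threshold.
threshold = 49
low = [i for i, v in enumerate(birth_length) if v < threshold]
high = [i for i, v in enumerate(birth_length) if v >= threshold]

# Association of birth weight with the response inside each subgroup.
within_low = pearson([birth_weight[i] for i in low],
                     [height_z[i] for i in low])
within_high = pearson([birth_weight[i] for i in high],
                      [height_z[i] for i in high])

print(round(overall, 2), round(within_low, 2), round(within_high, 2))
```

Here birth weight would rank high in an importance measure computed over the whole sample, yet it has little left to explain after the split on birth length, which is why a single tree may never select it.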
Another potential limitation is that decision trees do not deliver p values or CIs. The cross-validation method, however, ensures that the selected tree is optimal. This validation method was chosen in preference to other model validation methods, for example, the training-test approach, as it uses the potential of the data to a greater extent, at the cost of a greater computational burden.
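The difference between the two validation approaches can be sketched as follows. This is a generic illustration of k-fold cross-validation, not the exact procedure of this study: the number of folds, the sample size and the function name are assumptions made for the example, and the observations are not shuffled.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train, validation) index lists, one pair per fold.

    Folds are contiguous blocks; the first n_samples % n_folds folds
    receive one extra observation so that all samples are covered.
    """
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# Hypothetical example: 10 observations and 5 folds.
n_samples, n_folds = 10, 5
folds = list(k_fold_indices(n_samples, n_folds))

# Count how often each observation is used for validation.
validation_counts = [0] * n_samples
for train, validation in folds:
    for i in validation:
        validation_counts[i] += 1

# A single training-test split with the same validation share would
# validate on only one of these blocks.
single_split_validated = len(folds[0][1])

print(validation_counts, single_split_validated)
```

Every observation is used for validation once and for model fitting in the remaining folds, whereas a single training-test split validates on only one block. The price is that the model has to be fitted once per fold.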
The study setting was a low socioeconomic area in rural Bangladesh, where maternal and child undernutrition in early life is still widespread. The growth trajectories of our cohort were consistent with established growth trajectories in South Asia, where children are born below the WHO growth reference and falter dramatically up to 24 months of age.5 The subcontinents of South Asia and sub-Saharan Africa share similar proportions of stunted children and faltering patterns. The sub-Saharan African children are, however, on average born slightly bigger than children in South Asia,5 which makes our results mainly relevant for the South Asian context.
This cohort study of determinants of young child stunting in a rural Bangladeshi setting included a wide range of high-quality prenatal and postnatal data, household and family information, environmental factors, child characteristics at birth, infant feeding and morbidity. Prenatal factors including birth size, the mother’s anthropometry and parental education were the most critical factors for stunting at 24 months. These results should be seen in contrast to present practice and recommendations that are mainly limited to child interventions. The findings emphasise the benefit of interventions before conception and during pregnancy to reach a substantial reduction in stunting.
The authors thank the participants and their families in Matlab for their continuing involvement in the MINIMat trial, and the field-team members and data management staff for their excellent work.
Contributors PS contributed to study design, data analysis and interpretation of the results and had the main responsibility for writing the paper. LAP and SEA were principal investigators of the MINIMat project. E-CE, LAP and KS contributed to the study design. E-CE, RTN, AR and AIK took part in and supervised data collection. PS, OS and KS analysed the data. All authors contributed to the preparation of the database and the interpretation of the results, and reviewed and approved the final version of the manuscript.
Funding The MINIMat research study was funded by the icddr,b, United Nations Children’s Emergency Fund, Swedish International Development Cooperation Agency, UK Medical Research Council, Swedish Research Council, Department for International Development, Japan Society for the Promotion of Science, Child Health and Nutrition Research Initiative, Uppsala University and US Agency for International Development.
Disclaimer The funding agencies had no role in the design and conduct of the study, in the collection, analysis and interpretation of the data, or in the preparation, review or approval of the manuscript.
Competing interests None declared.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement Data are available upon reasonable request.