Abstract
Objective The major objective of this project is to find the most suitable model for district-wise infant mortality rate (IMR) data of Bangladesh over the period 2014–2020, one that captures both the regional variability and the over-time variability of the data.
Design, setting and participants Data from seven consecutive cross-sectional surveys that were conducted in Bangladesh between 2014 and 2020 as a part of the Sample Vital Registration System (SVRS) were used in this study. The study included a total of 13 173 (with 390 infant deaths), 17 675 (with 512 infant deaths), 17 965 (with 501 infant deaths), 23 205 (with 556 infant deaths), 23 094 (with 498 infant deaths), 23 090 (with 497 infant deaths) and 23 297 (with 495 infant deaths) complete cases from SVRS datasets for each respective year.
Method A linear mixed effects model (LMM) with a quadratic trend over time in the fixed effects part, and a nested random intercept as well as a nested random slope for a linear trend over time in the random effects part, was fitted to the data. This model was selected based on two popular selection criteria: the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC).
Results The LMM analysis demonstrated statistically significant variations in IMR across different districts and over time. Examining the district-specific area under the logarithm of the IMR curves yielded valuable insights into the disparities in IMR among different districts and regions. Furthermore, a significant inverse relationship was observed between IMR and life expectancy at birth, underscoring the importance of reducing IMR as a means to enhance population health outcomes.
Conclusion This study accentuates district-wise and temporal variability when modelling IMR data and highlights regional heterogeneity in infant mortality rates in Bangladesh. Area-based programmes should be created for mothers residing in locations with a higher risk of IMR. Further research can examine socioeconomic elements generating these discrepancies.
- public health
- statistics & research methods
- health & safety
Data availability statement
Data may be obtained from a third party and are not publicly available. The annual microdata of SVRS from 2014 to 2020 were used in this study. On request, one can get the annual SVRS microdata from the Bangladesh Bureau of Statistics (BBS). Address: Bangladesh Bureau of Statistics (BBS), Director General, Parishankhyan Bhaban, E-27/A, Agargaon, Dhaka-1207. Phone number: +880-2-9112589; fax number: +880-2-9111064. Email address: dg@bbs.gov.bd.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
STRENGTHS AND LIMITATIONS OF THIS STUDY
In contrast to the existing literature, this study uses a linear mixed effects model (LMM) to analyse repeated measurements of the infant mortality rate (IMR) for each district rather than single observations.
This research uses a robust LMM that incorporates a quadratic trend over time and random effects, allowing it to capture intricate variations and to account for within-district and between-district heterogeneity.
This is the first study to model regional IMR data of Bangladesh while considering both district-wise and over-time variation.
This study elucidates how the socioeconomic disparity in infant mortality varies across the place of residence.
The cross-sectional nature of the data in this study precludes establishing any causal relationship.
Introduction
Infant mortality rate (IMR) is one of the crucial markers of a country’s level of development. The IMR is defined as the number of deaths of infants under one year of age per 1000 live births during the same year.1 The IMR in Bangladesh was 21 per 1000 live births in 2020,1 which means that, on average, about 1 in every 48 children dies before reaching their first birthday. In Bangladesh, the IMR has decreased appreciably in recent decades.2 3 It is crucial to reduce the mortality rate to as low as 5 per 1000 live births to achieve the Sustainable Development Goals (SDGs) established by the United Nations by the year 2030.4 However, the degree of decline in infant mortality varies from district to district5 and even between urban and rural areas of the same district. Presently, the policy-makers of the Bangladesh government prioritise regional development schemes.6 For effective implementation of these district-wise development plans, it is necessary to know which districts have low or high IMR. One of the goals of this study is to identify those districts with lower or higher IMR.
Annual data on IMR from 2014 to 2020 are available for each of the 64 districts of Bangladesh. Some fundamental exploratory data analyses (eg, online supplemental figure 1) reveal over-time variation, district-wise variation and urban–rural variation in the data. Hence, the statistical model should account for these variations: if the existing regional and over-time variability is not considered in the model, the SE estimates will be inaccurate7 and the statistical inferences will be misleading. The primary purpose of this study is therefore to model Bangladesh’s regional IMR data in a way that captures district-wise variation, over-time variation and urban–rural variation. This study also focuses on how the IMR is associated with life expectancy at birth, as life expectancy at birth is one of the key indicators for assessing population health.8
Data and variables
The annual microdata of the Sample Vital Registration System (SVRS) from 2014 to 2020 were used in this study. On request, one can get the annual SVRS microdata from the Bangladesh Bureau of Statistics.9 There are 11 tafsil in each annual SVRS microdata file1; this analysis uses tafsil-3 and tafsil-4. For each year, the number of live births and the number of infant deaths in the different residential areas (urban/rural) of each district are obtained from tafsil-3 and tafsil-4, respectively. The IMR is then obtained by dividing the number of infant deaths by the number of live births and multiplying by 1000. These computations were done using Stata V.14 (Stata SE V.14, StataCorp). The final dataset contains three independent variables and one dependent variable.
IMR: This is the dataset’s sole dependent variable and represents the infant mortality rate.
Year: Calendar year between 2014 and 2020. We have recoded the year starting from 1 to 7 while performing the analyses.
District: District codes from 1 to 64 that we recoded for the districts in Bangladesh (see online supplemental table 1).
Region: Dichotomous variable taking the value 1 if the area is urban and 0 if it is rural.
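The construction of the analysis variables described above can be sketched in a few lines. This is only an illustration: the counts below are hypothetical (not SVRS values), the field names are assumptions, and the source does not say how cells with zero infant deaths were handled before taking logarithms, so the sketch simply leaves log(IMR) undefined in that case.

```python
import math

def infant_mortality_rate(infant_deaths: int, live_births: int) -> float:
    """Infant deaths per 1000 live births."""
    if live_births <= 0:
        raise ValueError("live_births must be positive")
    return 1000.0 * infant_deaths / live_births

# Hypothetical district-by-area counts, for illustration only (not SVRS data).
records = [
    {"district": 1, "urban": 0, "year": 2014, "live_births": 250, "infant_deaths": 8},
    {"district": 1, "urban": 1, "year": 2014, "live_births": 200, "infant_deaths": 5},
    {"district": 1, "urban": 0, "year": 2020, "live_births": 400, "infant_deaths": 9},
]

for rec in records:
    # Recode calendar year 2014..2020 to 1..7, as described in the text.
    rec["year_code"] = rec["year"] - 2013
    rec["imr"] = infant_mortality_rate(rec["infant_deaths"], rec["live_births"])
    # log(IMR) is undefined for a zero rate; this sketch leaves it missing.
    rec["log_imr"] = math.log(rec["imr"]) if rec["imr"] > 0 else None

for rec in records:
    print(rec["district"], rec["urban"], rec["year_code"], rec["imr"], rec["log_imr"])
```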
Selecting appropriate statistical model
The IMR is a rate, and the support of any rate is (0, ∞), which is restricted. A better choice is therefore to model the log of IMR, as the support of log(IMR) is (−∞, ∞), which is unrestricted. Also, the sampling distribution of log(rate) converges more rapidly to normal than the sampling distribution of the rate.10 For this reason, we consider log(IMR) as the dependent variable in this study.
Online supplemental figure 1 shows urban–rural variation, within-district variation (non-linear over time) and between-district variation in the data. Since the IMR of each district of Bangladesh is not linear over time, we have to consider a non-linear time trend in the model. Figure 1 compares the linear, quadratic and cubic time trends with a non-parametric regression and indicates that both the quadratic and the cubic trends approximate the non-parametric regression reasonably well. However, the Ramsey RESET test11 indicates that the linear trend is significant (p<0.001) and the quadratic trend is significant (p=0.001), but the cubic trend is not significant (p=0.633).
We design a quadratic polynomial regression model with random effects to take both the within-district and between-district variations into account. Whenever a population exhibits natural heterogeneity, linear mixed effects models (LMMs) are appropriate. LMMs can account for the natural heterogeneity in the population by ensuring that each subject has a mean response that changes over time, which lets a subset of regression parameters vary randomly from one district to another.12 13
In our data structure, ‘Region’ is nested within ‘District’: the rural area of district 1 is assumed to have no connection with the rural area of any other district, the urban area of district 1 is assumed to have no connection with the urban area of any other district, and so forth. This indicates a nested random effect. Also, online supplemental figure 1 suggests that a random intercept and random slope model is appropriate for the data, as intercepts and slopes seem to differ from one district to another. To select our final model, we additionally compute two well-known model selection criteria, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), using the maximum likelihood approach.
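As a reminder of how the two criteria trade fit against complexity, the following minimal sketch computes AIC = 2k − 2 log L and BIC = k ln(n) − 2 log L for two candidate models. The log-likelihoods and parameter counts are hypothetical placeholders rather than values from online supplemental table 2, and taking n as the number of district-region-year observations (64 × 2 × 7, assuming a fully balanced panel) is also an assumption.

```python
import math

def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 log L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Assumed balanced panel: 64 districts x 2 regions x 7 years.
n_obs = 64 * 2 * 7

# Hypothetical candidates: name -> (maximised log-likelihood, parameter count).
candidates = {
    "random intercept only": (-520.0, 6),
    "nested intercept + nested slope": (-495.0, 9),
}

scores = {
    name: (aic(ll, k), bic(ll, k, n_obs))
    for name, (ll, k) in candidates.items()
}

# The preferred model is the one with the smallest criterion value.
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
print(best_by_aic, best_by_bic)
```

Because BIC penalises each extra parameter by ln(n) rather than 2, it favours simpler models than AIC once n exceeds about 7; agreement between the two, as reported in the text, makes the choice less sensitive to the criterion used.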
The AIC and BIC values are lowest for the model with random intercepts for districts adjusted for regions, random slopes of linear trends over time adjusted for regions and districts, and random slopes for regions (see online supplemental table 2). Therefore, to model the district-specific IMR data of Bangladesh for 2014–2020, we select an LMM with a quadratic trend over time in the fixed effects part and a nested random intercept as well as a nested random slope for a linear trend over time in the random effects part. Our final model for the district-specific IMR data over time is

$$
\log(\mathrm{IMR}_{ijt}) = \beta_0 + \beta_1 T^{*}_{ijt} + \beta_2 T^{*2}_{ijt} + \beta_3 X_{ij} + b_{1i} + b_{2j(i)} + \left(b_{3j(i)} + b_{4i}\right) T^{*}_{ijt} + \epsilon_{ijt},
$$

where $\log(\mathrm{IMR}_{ijt})$ represents the logarithm of the IMR of the ith district in the jth region in the tth year. $T^{*}_{ijt}$ is the tth time for the ith district in the jth region, centred on its mean value, that is, $T^{*}_{ijt} = T_{ijt} - \bar{T}$, where $\bar{T}$ is the mean value of time $T_{ijt} \in \{1, 2, \ldots, 7\}$. The reason behind centring time is to avoid potential problems of multicollinearity in the quadratic trend model.12 $X_{ij}$ is the region indicator that takes the value 1 if the region is urban or 0 if it is rural. In this LMM, $\beta_0$, $\beta_1$, $\beta_2$ and $\beta_3$ are the fixed effects parameters, $b_{1i}$ and $b_{2j(i)}$ are the random intercepts, and $b_{3j(i)}$ and $b_{4i}$ are the random slopes. We assume $b_{1i} \sim N(0, \sigma^2_{b_1})$, $b_{2j(i)} \sim N(0, \sigma^2_{b_{2j}})$, $b_{3j(i)} \sim N(0, \sigma^2_{b_{3j}})$ and $b_{4i} \sim N(0, \sigma^2_{b_4})$, with $b_{1i}$, $b_{2j(i)}$, $b_{3j(i)}$ and $b_{4i}$ pairwise independent. Finally, the random error terms $\epsilon_{ijt} \sim N(0, \sigma^2)$ are mutually independent and are also independent of $b_{1i}$, $b_{2j(i)}$, $b_{3j(i)}$ and $b_{4i}$.
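The motivation for centring time can be checked numerically. The sketch below is an illustration rather than part of the original analysis: it compares the correlation between the linear and quadratic time terms before and after subtracting the mean of the year codes 1 to 7.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

years = list(range(1, 8))             # recoded years 1..7
t_bar = mean(years)                   # mean time, equal to 4
centred = [t - t_bar for t in years]  # -3, -2, ..., 3

# Correlation between the linear and quadratic terms, raw versus centred.
raw_corr = pearson(years, [t ** 2 for t in years])
centred_corr = pearson(centred, [t ** 2 for t in centred])

print(round(raw_corr, 3), round(centred_corr, 3))
```

With raw year codes the linear and quadratic terms are almost collinear (correlation of roughly 0.98), whereas after centring on a symmetric grid the two terms are uncorrelated, which stabilises the estimates of the linear and quadratic coefficients.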
The model has a fixed effects component comprising the quadratic trend over time and the region indicator variable (X). The fixed effects account for the general temporal pattern and for the difference between urban and rural regions. The random effects address the heterogeneity observed among districts: they allow district-specific and region-specific deviations from the overall temporal pattern represented by the fixed effects. Combining fixed and random effects in this way lets the model provide district-specific estimates while accounting for the heterogeneity across districts and regions, and so gives a reliable framework for the analysis and interpretation of the findings.
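To make the split between fixed and random effects concrete, the following sketch simulates log(IMR) trajectories under one reading of the model described in the text: a quadratic fixed trend in centred time plus an urban indicator, independent random intercepts for districts and for regions within districts, and independent random slopes on centred time at the same two levels. Every numeric value is hypothetical and chosen only for illustration; none is an estimate from table 1, and the function and key names are assumptions.

```python
import math
import random

random.seed(0)  # reproducible illustration

# Hypothetical fixed effects (log scale) and random-effect standard deviations.
BETA = {"intercept": 3.4, "time": -0.05, "time2": 0.01, "urban": -0.4}
SD = {"district_int": 0.20, "region_int": 0.15,
      "district_slope": 0.03, "region_slope": 0.03, "error": 0.30}

YEARS = range(1, 8)               # recoded years 1..7
T_BAR = sum(YEARS) / len(YEARS)   # mean time used for centring

def simulate(n_districts: int = 64):
    """Simulate one log(IMR) value per district, region and year."""
    rows = []
    for district in range(1, n_districts + 1):
        # District-level random intercept and random slope.
        b_int_d = random.gauss(0, SD["district_int"])
        b_slope_d = random.gauss(0, SD["district_slope"])
        for urban in (0, 1):
            # Region-within-district random intercept and random slope.
            b_int_r = random.gauss(0, SD["region_int"])
            b_slope_r = random.gauss(0, SD["region_slope"])
            for year in YEARS:
                t = year - T_BAR
                fixed = (BETA["intercept"] + BETA["time"] * t
                         + BETA["time2"] * t * t + BETA["urban"] * urban)
                rand = b_int_d + b_int_r + (b_slope_d + b_slope_r) * t
                log_imr = fixed + rand + random.gauss(0, SD["error"])
                rows.append({"district": district, "urban": urban,
                             "year": year, "log_imr": log_imr,
                             "imr": math.exp(log_imr)})
    return rows

rows = simulate()
print(len(rows), rows[0])
```

In such a simulation, the two regions of a district share the district-level effects but differ in their region-level effects, which is what nesting of region within district means here.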
The results of the LMM for log-transformed IMR are presented in table 1. After accounting for the other variables, living in an urban area is associated with a significant decrease of 34.8% in IMR compared with living in a rural area. However, the estimated coefficients alone do not offer a direct means of identifying districts with high or low IMR. To evaluate the characteristics of the IMR curve of each district, we used the area under the curve (AUC), which quantifies the cumulative logarithm of the IMR over the study period and so summarises the overall level of infant mortality in a district. A lower AUC value indicates a lower IMR and, consequently, a more favourable health status within the district; a higher AUC value indicates a higher IMR and poorer health conditions. The AUC was calculated using both a non-parametric method based on the observed data and a parametric method based on the fitted values from the model. Comparing AUC values across districts identifies the districts with higher or lower IMR within the study period.
By investigating the residuals, we assess the adequacy of the fitted model. Online supplemental figure 2a shows the scatter plot of the residuals against the fitted values, which reveals no evident structural deficiencies; it also indicates that the residuals are pairwise independent, that their mean is zero and that their variance is constant. Online supplemental figure 2b shows the normal quantile-quantile (Q-Q) plot of the residuals. Even though some points deviate at both tails of the normal Q-Q plot, a slight divergence from normality does not indicate a serious breach of the distributional assumption.14 Consequently, plots (a) and (b) in online supplemental figure 2 suggest that the final LMM for Bangladesh’s last 7 years’ IMR data is satisfactory.
Assessing the district-specific characteristic of the log(IMR) curves
For each district under a specific region, we have seven data points. Summary statistics reduce each district’s data to a single value; with a parametric approach, the summary is instead computed from the parameters of a fitted model.15 The AUC is widely used as a summary measure. It can be calculated using either the parametric (based on the fitted values) or the non-parametric (based on the observed data) method. Using the non-parametric approach, the AUC for the ith district under the jth region is given by15

$$
\mathrm{AUC}_{ij} = \frac{1}{2} \sum_{k=1}^{6} \left(t_{k+1} - t_k\right)\left(y_{ijk} + y_{ij,k+1}\right),
$$

where $y_{ijk}$ is the observed value of the response, log(IMR), at time $t_k$. Using the parametric approach, the AUC for the ith district under the jth region is given by15

$$
\widehat{\mathrm{AUC}}_{ij} = \frac{1}{2} \sum_{k=1}^{6} \left(t_{k+1} - t_k\right)\left(\hat{y}_{ijk} + \hat{y}_{ij,k+1}\right),
$$

where $\hat{y}_{ijk}$ is the fitted value of the response at time k (k=1, …, 7) for the ith district (i=1, 2, …, 64) under region j (j=0, 1).
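A minimal sketch of the two AUC summaries follows, assuming the trapezoidal rule over the seven annual values, which is a common choice for this kind of summary measure but may differ in detail from the formula the authors used. The observed and fitted log(IMR) series are hypothetical, and `trapezoid_auc` is an illustrative helper name.

```python
def trapezoid_auc(values, times=None):
    """Area under a piecewise-linear curve through (times, values)."""
    if times is None:
        times = list(range(1, len(values) + 1))  # year codes 1..n
    if len(values) != len(times) or len(values) < 2:
        raise ValueError("need at least two matching time points")
    return sum(
        (times[k + 1] - times[k]) * (values[k] + values[k + 1]) / 2.0
        for k in range(len(values) - 1)
    )

# Hypothetical series for one district and region (not values from the study).
observed_log_imr = [3.6, 3.4, 3.5, 3.2, 3.3, 3.1, 3.0]
fitted_log_imr = [3.55, 3.45, 3.36, 3.28, 3.21, 3.15, 3.10]

auc_nonparametric = trapezoid_auc(observed_log_imr)  # from observed data
auc_parametric = trapezoid_auc(fitted_log_imr)       # from model fitted values
print(auc_nonparametric, auc_parametric)
```

Ranking districts by either value gives a single-number comparison of their log(IMR) curves; the parametric version inherits the smoothing of the fitted model and is therefore less affected by a single unusual year.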
Patient and public involvement
None.
Results
Results on the district-specific area under the log(IMR) curves of Bangladesh
We obtain non-parametric and parametric AUC estimates for both regions to determine which districts of Bangladesh had higher IMRs and which had lower IMRs over the study period. Figure 2 shows that the non-parametric and parametric AUC estimates are close, but the parametric estimates have less variability. A lower AUC value indicates a lower IMR and a higher AUC value indicates a higher IMR in that district. Figure 2 shows that, among the urban areas of Bangladesh, those of the Gopalganj (19) district have the highest IMR, as they have the highest AUC value, and those of the Jessor (23) district have the lowest IMR, as they have the lowest AUC value. Among the rural areas of Bangladesh, those of the Bandarban (2) and Rangamati (55) districts have the highest IMR, as they have the highest AUC values, and those of the Feni (16) district have the lowest IMR, as they have the lowest AUC value. Online supplemental figure 1 shows that, among the urban areas of Bangladesh, Thakurgaon (64) had a high infant mortality peak in 2014 and Joypurhat (21) in 2016, but these peaks occurred in only two particular years of the period 2014–2020. It is therefore not surprising that these two districts are not among the districts with high IMR compared with other districts in Bangladesh. Observed data collected in a single year from a sample survey may give misleading conclusions since, because of sampling error, they may not capture the neighbourhood and contextual influences.16
Results on the association between IMR and life expectancy at birth
IMR and life expectancy at birth are both used as indicators of the health condition of a country.17 A low IMR in a district indicates that the health condition of that district is good, as does a high life expectancy at birth. Figure 3 indicates a significant negative correlation (r=−0.318, p=2.648×10^−12) between life expectancy at birth and IMR for people living in rural areas. It also indicates a significant negative correlation (r=−0.288, p=2.757×10^−10) between life expectancy at birth and IMR for people living in urban areas. That is, if the IMR of the people of any district of any region can be reduced, the average life expectancy of the people of that district is expected to increase.
However, this exploratory approach does not consider key aspects of the data, such as the over-time variation and the intradistrict and interdistrict variation of the IMR. We used an LMM to take these variations into account:

$$
\log(\mathrm{IMR}_{ijt}) = \alpha_0 + \alpha_1 \mathrm{LE}_{ijt} + \alpha_2 T^{*}_{ijt} + \alpha_3 X_{ij} + u_i + \epsilon_{ijt},
$$

where $\log(\mathrm{IMR}_{ijt})$ represents the logarithm of the IMR of the ith district in the jth region in the tth year. $T^{*}_{ijt}$ is the tth time for the ith district in the jth region, centred on its mean value, that is, $T^{*}_{ijt} = T_{ijt} - \bar{T}$, where $\bar{T}$ is the mean value of time $T_{ijt} \in \{1, 2, \ldots, 7\}$. $\mathrm{LE}_{ijt}$ denotes the life expectancy of the people living in the ith district in the jth region in the tth year. $X_{ij}$ is the region indicator that takes the value 1 if the region is urban or 0 if it is rural. The parameters $\alpha_0$, $\alpha_1$, $\alpha_2$ and $\alpha_3$ represent the fixed effects in this LMM. The random intercept $u_i \sim N(0, \sigma^2_u)$ and the random error term $\epsilon_{ijt} \sim N(0, \sigma^2)$ are independent.
This study examined the association between IMR and life expectancy at birth across various districts and regions of Bangladesh, with a particular focus on temporal variations. Incorporating the variable ‘Region’ into the statistical model facilitated the examination of potential disparities in this association across urban and rural settings. The objective of this study was to ascertain the association between regions characterised by lower IMR and higher life expectancy, as well as the inverse relationship. This investigation offers valuable insights into identifying specific regions that would benefit from targeted interventions to enhance infant health and overall population well-being. Furthermore, including the variable ‘year’ in the model incorporates the temporal dimension. This facilitated the investigation into the progression of the relationship between IMR and life expectancy during the span of 7 years, specifically from 2014 to 2020. The study sought to examine temporal trends in order to ascertain any alterations or regularities in the relationship and evaluate whether it was exhibiting an increasing or decreasing strength over time.
Table 2 displays the results of the LMM analysis, which investigates the relationship between log-transformed IMR and life expectancy, region and time. The model includes a random intercept for the district. The estimated coefficient of α1 suggests that, on average, a 1-year increase in life expectancy is associated with a 3% decrease in IMR while controlling for the other predictors. According to the estimated coefficient of α2, when controlling for the other predictors, an increase of 1 year in time is linked to an 8% reduction in IMR. This suggests a negative association between longer follow-up periods and IMR, even after controlling for factors such as life expectancy and geographic region. The estimated coefficient of α3 suggests that, while controlling for the other predictors, residing in an urban region is linked to a 36.2% reduction in IMR compared with residing in a rural region. This finding indicates that urban regions generally exhibit lower IMR than rural regions, even when accounting for factors such as life expectancy and follow-up duration.
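Because the response is log(IMR), a coefficient is usually read as a percentage change through 100 × (exp(coefficient) − 1). The sketch below assumes that this is how the percentages quoted above were obtained, which the text does not state explicitly. The coefficients it prints are back-calculated from the quoted percentages and are not copied from table 2.

```python
import math

def percent_change(coef: float) -> float:
    """Percentage change in IMR per unit increase in a predictor,
    for a model fitted to log(IMR)."""
    return 100.0 * (math.exp(coef) - 1.0)

def coef_from_percent(pct: float) -> float:
    """Log-scale coefficient implied by a given percentage change."""
    return math.log(1.0 + pct / 100.0)

# Percentage changes quoted in the text (negative values are reductions).
quoted = {"life expectancy (+1 year)": -3.0,
          "time (+1 year)": -8.0,
          "urban vs rural": -36.2}

for name, pct in quoted.items():
    coef = coef_from_percent(pct)
    # Round trip: converting the implied coefficient back recovers the percentage.
    print(f"{name}: implied coefficient {coef:.4f}, "
          f"percentage change {percent_change(coef):.1f}%")
```

For small coefficients the percentage change is close to 100 times the coefficient itself, but for larger effects such as the urban-rural difference the exponential conversion matters.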
Additionally, the random effect variance component captures the extent of variation in the logarithm of IMR across districts that cannot be accounted for by the fixed effects included in the model. The statistical significance of the value Sd(u)=0.232 suggests a substantial degree of variation in the logarithm of the IMR across different districts. This implies that the specific district in which an observation is taken significantly influences the IMR, even after accounting for fixed effects and other relevant factors. Furthermore, the intraclass correlation coefficient (ICC) offers valuable information regarding the proportion of the overall variation in log(IMR) that can be ascribed to dissimilarities between districts. The obtained ICC coefficient of 0.133 indicates that approximately 13.3% of the overall variability in the logarithm of IMR can be ascribed to variations between districts. This finding suggests that the random effect at the district level accounts for a significant proportion of the variation in the logarithm of IMR across the entire population. All of these statistical analyses were performed using the software R.18
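For a random-intercept model, the ICC is the between-district variance divided by the total variance. The sketch below applies that standard definition to the reported Sd(u)=0.232 and ICC=0.133 to back out the residual SD they jointly imply. The residual SD is not quoted in the text, so the implied value is only a consistency check under the assumption that the ICC was computed in this way.

```python
import math

def icc(var_between: float, var_residual: float) -> float:
    """Intraclass correlation for a random-intercept model."""
    return var_between / (var_between + var_residual)

def implied_residual_variance(var_between: float, icc_value: float) -> float:
    """Residual variance consistent with a given between-group variance and ICC."""
    return var_between * (1.0 - icc_value) / icc_value

sd_u = 0.232          # reported SD of the district random intercept
reported_icc = 0.133  # reported ICC

var_u = sd_u ** 2
var_e = implied_residual_variance(var_u, reported_icc)

print(f"between-district variance: {var_u:.4f}")
print(f"implied residual SD: {math.sqrt(var_e):.3f}")
print(f"ICC check: {icc(var_u, var_e):.3f}")
```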
Discussion
The primary objective of this project was to find an appropriate model for the district-wise IMR data of Bangladesh over the 7 years, one that captured the regional variability and the over-time variability of the data, because when regional variability exists but is not considered, the statistical inference drawn from the data is not valid. Our results suggested that the most suitable model for district-specific IMR data of Bangladesh for the years 2014–2020 was an LMM with a quadratic trend over time in the fixed effects part and a nested random intercept as well as a nested random slope for a linear trend over time in the random effects part. However, while the study considered characteristics like district and region, other confounding factors could affect IMR.
From the parametric and non-parametric estimates of the AUC, we found that the rural areas of the Bandarban (2) and Rangamati (55) districts had the highest IMR among all the rural areas of Bangladesh during 2014–2020. Both Bandarban and Rangamati are hilly border areas,19 where road transport and medical facilities are less developed than in other districts20; this could be a potential reason for the higher IMR. However, it is crucial to note that this study did not concentrate on explaining differences in IMR by socioeconomic characteristics, which might be a future research problem. Because this was the first study to model regional IMR data of Bangladesh while taking district-wise and over-time variation into consideration, more research is required to support our findings.
Although our primary interest was in modelling the regional IMR data of Bangladesh, we also looked at how IMR was associated with life expectancy at birth. Our findings suggest that if we can reduce the IMR of people in any region of a district, the average life expectancy of people in that district will increase. Other researchers reported similar findings.21 22 Our findings also suggest that living in an urban area is associated with a significant reduction in IMR compared with living in a rural area. This might be because people living in urban areas have better access to healthcare. As infant mortality and life expectancy are inversely related, the finding by other researchers that life expectancy is higher in urban areas than in rural areas in Bangladesh23 supports our findings. We believe our results will eventually help policy-makers in planning to achieve the SDGs.
Analysing district-specific IMR data over time enables policy-makers to identify regions or districts that require special attention and interventions. This information enhances the distribution of resources and execution of interventions in regions characterised by higher IMR, thus effectively mitigating regional disparities in infant mortality. This study investigates the relationship between IMR and various factors, including life expectancy at birth and geographical region, by using district-specific IMR data. The findings derived from this study have the potential to aid policy-makers in allocating resources and formulating development strategies aimed at reducing IMR in regions with high infant mortality. By directing resources towards districts characterised by higher IMR, policy-makers could effectively implement strategies that target enhancing healthcare infrastructure, facilitating access to high-quality healthcare services, and promoting community health programmes within these regions. The current focus of policy-makers of the Bangladesh government is on regional development schemes. The findings of this research could provide valuable insights for policymakers regarding resource allocation and planning decisions. Consequently, these findings could contribute to enhanced infant health outcomes.
Limitations
The study uses an LMM that incorporates certain distributional and random effects assumptions. However, it is important to acknowledge that these assumptions may introduce limitations or uncertainties in the estimated parameters. The statistical model employed in the study could establish associations among variables; however, it did not imply causality due to the lack of covariate balance across different levels of exposure. Several important biological and contextual factors such as socioeconomic status, maternal education, healthcare infrastructure, and prenatal and postnatal care availability could not be incorporated due to the unavailability of district-wise data in the SVRS.
Conclusion
This study emphasises the district-wise and temporal heterogeneity of IMR in Bangladesh, highlighting the need for targeted programmes and interventions to address regional disparities. The findings highlight the importance of implementing community-based intervention in districts with higher IMR, focusing on increasing healthcare access, socioeconomic status and cultural considerations. Policy-makers should prioritise allocating resources and developing strategies to improve population health outcomes and lessen IMR disparities across districts. Moreover, to lower IMR and enhance general well-being, initiatives should also be taken to strengthen the healthcare system and boost maternal and child health services. These recommendations can assist in ensuring a healthier future for the nation and achieving the SDGs.
Footnotes
Contributors NJA conceptualised the research problem. NJA and TI developed the methodology. TI conducted the formal analysis and wrote the original draft. TI and NJA reviewed and edited the manuscript. NJA supervised the project and was responsible for the overall content. All authors read and approved the final version.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Provenance and peer review Not commissioned; externally peer reviewed.