Original research
Efficacy of deep learning methods for predicting under-five mortality in 34 low-income and middle-income countries
  1. Adeyinka Emmanuel Adegbosin (1)
  2. Bela Stantic (2)
  3. Jing Sun (1)
  (1) School of Medicine, Griffith University, Gold Coast, Queensland, Australia
  (2) School of Information and Communication Technology, Griffith University, Nathan, Queensland, Australia
  Correspondence to Dr Jing Sun; j.sun{at}griffith.edu.au

Abstract

Objectives To explore the efficacy of machine learning (ML) techniques in predicting under-five mortality (U5M) in low-income and middle-income countries (LMICs) and to identify significant predictors of U5M.

Design This is a cross-sectional, proof-of-concept study.

Settings and participants We analysed data from the Demographic and Health Survey. The data were drawn from 34 LMICs, comprising a total of n=1 520 018 children drawn from 956 995 unique households.

Primary and secondary outcome measures The primary outcome measure was U5M; the secondary outcome was the efficacy of three deep learning algorithms (deep neural network (DNN), convolution neural network (CNN) and hybrid CNN-DNN) compared with logistic regression (LR) for the prediction of child survival.

Results We found that duration of breast feeding, number of antenatal visits, household wealth index, postnatal care and the level of maternal education are some of the most important predictors of U5M. We found that deep learning techniques are superior to LR for the classification of child survival: LR sensitivity=0.47, specificity=0.53; DNN sensitivity=0.69, specificity=0.83; CNN sensitivity=0.68, specificity=0.83; CNN-DNN sensitivity=0.71, specificity=0.83.

Conclusion Our findings provide an understanding of the determinants of U5M in LMICs. They also demonstrate that deep learning models are more efficacious than a traditional analytical approach.

  • machine learning
  • deep learning
  • random forest
  • under-five mortality
  • community child health
  • maternal medicine

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Strengths and limitations of this study

  • The models were tested using a very large data sample, drawn from over 1 million households.

  • The survey used a cluster sampling approach and is representative of each country included.

  • Socioeconomic, political and cultural differences between the included countries may limit generalisability of the results.

  • The cross-sectional design of the study means we can only infer association and not causality.

  • Our study does not reflect subnational trends and patterns.

Introduction

Recent global estimates showed that 5.3 million under-five deaths occurred in 2018; this is equivalent to 15 000 deaths every day and 39 deaths per 1000 live births.1 A majority of the children who die before their fifth birthday live in sub-Saharan Africa and Southeast Asia; most of these deaths result from preventable and treatable causes.1 2 Although these estimates represent a significant improvement in under-five mortality (U5M) levels when compared with the levels in the early 1990s, ‘preventable death of one child is still too many’.1 2

High levels of U5M in low-income and middle-income countries (LMICs) are usually a syndromic feature of a weak health system.3 The under-five mortality rate (U5MR) is a key barometer of the state of a nation’s health system and an important impact measure that is reliant on health system inputs such as health financing, health workforce and infrastructure.3 4 These inputs in turn determine health service access, readiness, quality and safety, and consequently influence coverage of interventions such as antenatal care, postnatal care, demand for family planning satisfied, skilled birth attendance, care for childhood illnesses and nutritional supplementation.4 5

Studies have shown that improving child survival requires engaging intricately with a host of child health determinants, including biological, environmental and socioeconomic factors such as level of maternal education, household income, environmental sanitation and hygiene.5–7 The framework of distal and proximate social, environmental and biological determinants was first described by Mosley and Chen.5 Unfortunately, many LMICs are constrained by limited finances and limited health budgets, and are unable to intervene on all of the determinants of child health at the same time.3 It is therefore increasingly important to identify the most important determinants to be prioritised and to determine the most pressing socioeconomic issues that can serve as a starting point for government and policy makers to focus on intervention strategy.

Furthermore, intervention measures need to be equity-oriented in order to be effective.8 Hence, disaggregated household-level monitoring of coverage and impact indicators is crucial for informing policies and programmatic interventions in the sustainable development goal (SDG) era.9 It is important to understand the status of every child, rather than simply exploring global trends, in order to ‘leave no one behind’ and to ‘reach the furthest behind first’.10 In light of the SDG pledge, monitoring changes at household or community level may require new methodological approaches to engaging with the ‘big data’ that continue to be generated through ongoing household surveys such as the Demographic Health Survey (DHS) and Multiple Indicator Cluster Survey.11 12 An expansion of traditional analytical approaches may be key to effectively monitoring health intervention coverage and impact. Machine learning (ML) techniques may represent a novel analytical approach to unravel previously unseen trends; these techniques expand on existing statistical approaches and use methods that are not based on a priori assumptions about the distribution of the data.13

Artificial intelligence (AI) described ‘as a scientific discipline rooted in mathematics, philosophy and computer science attempts to develop systems with properties of intelligence’.13 ML is a subdiscipline of AI, ‘where computer programs learn to solve new problems for which they weren’t explicitly programmed, by learning associations and patterns from example data’.13 ML deploys a broader set of statistical models than those traditionally used in medicine or public health; deep learning models are one example.13 AI and ML techniques broaden existing statistical models and offer additional tool sets to achieve public health milestones that may not have been previously feasible. For example, ML has been used for real-time surveillance of disease outbreaks through social media data mining,14 and AI has been used in large-scale evidence synthesis to guide health promotion and health policy.15 It is important to state that although AI offers new possibilities for targeted and personalised public health practice, its application must still be guided by social and structural determinants of health; this has also been highlighted by other AI researchers.16

In a report recently released by the United States Agency for International Development (USAID) centre for innovation and impact, on the use of AI in global health, AI-enabled population health was identified as one of AI use cases that could have the greatest impact on improving health quality, cost and access in LMICs.17

AI-enabled population health encompasses public health surveillance and prediction, population risk management, population health intervention selection and targeting.17 In this current study, we explored the efficacy of deep learning as a technique for population health surveillance and intervention targeting. Deep learning ‘discovers intricate structure in large data sets by using backpropagation algorithm to indicate how a machine should change its internal parameter used to compute representation in each layer from the representation in the previous layer’.18 Deep learning algorithms have shown excellent performance in genomics, proteomics, drug discovery, speech recognition, visual recognition, object detection and several other domains.18

There have been numerous empirical studies on the application of ML in hospital settings for prognostication,19 20 triage21 and prediction of mortality.22 However, the application of ML is yet to be demonstrated in population health studies, where it may represent a potentially transformative tool.13 The objective of our study is to fill this gap in the application of ML to population health studies, along with other previously highlighted gaps. One of these concerns the need to identify the most important determinants of U5MR. To explore these determinants, we employed a data-driven approach, using the random forest algorithm for feature selection rather than the traditional hierarchical approach to multivariate analysis, which tends to be highly user-driven and usually involves developing conceptual frameworks that prejudge the relevance of a limited set of determinants (independent variables).23 Random forest is an efficient classification and regression algorithm that combines several randomised decision trees and aggregates their predictions. It is especially useful when the number of variables is larger than the number of observations.24

The random forest approach allows an unlimited number of variables or determinants to be incorporated into the model. The algorithm automatically tests several hypotheses and selects the features that best predict the outcome, based on the information gained from each variable.20

Another gap is the need for new ways to gain insights and to unravel previously unseen trends in the prediction of U5M from disaggregated household level data. To fill this gap, we also compared the efficacy of deep learning algorithms: deep neural network (DNN); convolution neural network (CNN); hybrid CNN-DNN with logistic regression (LR) for classifying child survival, and for predicting age of death. We hypothesise that deep learning methods will outperform traditional methods such as LR in the prediction of U5M.

Finally, in this work, we make recommendations on ML implementation, and the new regulatory and ethical considerations for the use of novel ML techniques in public health.25

Methods

Data source and analytical tools

We conducted an analysis of DHS data from 34 LMICs. The DHS is a nationally representative household survey developed by USAID in the 1980s.26 The survey provides data on fertility, family planning, maternal and child health, gender, HIV/AIDS, malaria and nutrition.27 In total, over 350 surveys have been carried out in over 90 countries.26 The survey uses a two-stage cluster sampling design; further details about the survey and its design are published elsewhere.27 Combined multicountry data for this study were obtained from the IPUMS-DHS portal.28 Combined DHS data were available for a total of 34 LMICs on the IPUMS-DHS database. We used all available data in these countries from 1987 to 2017 (see online supplementary table 1). Permission to use data for all included countries was granted by the DHS programme. Analysis was conducted using Python software V.3.7. The programming code used for the various analyses is accessible on GitHub at the following link: https://github.com/drulna/u5mr_predict

Patient and public involvement

There are no patients involved in this study.

Data preprocessing

Any real-world dataset needs preprocessing to convert it into a representation that can be used to train a model, and this step can heavily affect the model’s performance. This dataset had several irrelevant features, such as IPUMS identifiers created to merge multicountry data. We excluded 14 such features and included 41 features in the final model. Like many census data, the DHS data often contain variables with missing observations. All variables except place of residence (rural/urban) had some level of missingness, ranging from 5% to 60% of observations in certain cases. We removed all anthropometric variables because of significant missingness. Multiple strategies can be deployed to handle missing values.29 We tested several of them and evaluated the models accordingly; only the forward-fill approach was found to provide reproducible and plausible outcomes, so we used it to replace missing data. The ‘forward fill’ strategy replaces each missing value in a column with the most recent observed value in that column, which is carried forward. These cleaned and preprocessed data were used for the rest of the analysis.
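
The forward-fill step can be illustrated with a minimal sketch. It assumes the conventional meaning of forward fill (as in the pandas `ffill` method), where the last observed value in a column is carried forward into subsequent gaps. The function name `forward_fill` and the example column are hypothetical and are not taken from the study's code.

```python
def forward_fill(column):
    """Replace each None with the most recent preceding observed value.

    Leading missing values (with no earlier observation) stay None.
    """
    filled = []
    last_seen = None
    for value in column:
        if value is not None:
            last_seen = value  # remember the latest real observation
        filled.append(last_seen)
    return filled


# Hypothetical column (number of antenatal visits); None marks a missing value.
antenatal_visits = [None, 4, None, None, 2, None]
print(forward_fill(antenatal_visits))
```

Note that a value at the very start of a column cannot be filled this way, and that the result depends on row order, so the ordering of records matters when this strategy is used.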

Variables

Outcome variable

The outcome variable is the risk of death before the age of 5 years, measured as the duration of survival in months from birth.

Independent variables (model features)

The determinants included in the model can be broadly classified into maternal-level determinants, household socioeconomic characteristics and child-level determinants.

Maternal factors

These encompass maternal behavioural determinants and determinants within the reproductive care continuum, including duration of breast feeding, number of antenatal visits when the child was in utero, highest level of maternal education, administration of tetanus injection during pregnancy, provision of prenatal care by a skilled provider, delivery care provider, postpartum health check, unmet need for family planning, prenatal care and whether the pregnancy was wanted.

Household socioeconomic factors

The household factors included are the household wealth index, the geographical location of the household (urban or rural) and who has final say of the woman’s health within the household.

Child-level factors

These include child’s postnatal check, sex of the child, oral polio vaccination, measles vaccination, diphtheria, pertussis and tetanus vaccination, BCG vaccination, age of the child and care for childhood illnesses such as diarrhoea and suspected symptoms of pneumonia. Survey-specific definitions of all included determinants are published elsewhere.28

Feature selection

We used random forest to rank features by their predictive power. Figure 2 shows the importance of each feature (red bar) and its variability across the trees in the random forest (black vertical line). ‘Duration of breast feeding’ is the most important feature for predicting a child’s death, whereas some features are of limited importance. We performed feature selection based on this information: we dropped all features whose importance was <0.001, because we found that the accuracy of the classifier did not improve beyond this level and the additional attributes only created unnecessary computational overhead. In total, 29 features met our cut-off for feature importance and were included in the final model. To assess the utility of feature selection, we performed two experiments: one without feature selection (on all 41 original features) and one with feature selection (on the 29 selected features).
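
As a minimal sketch of the cut-off rule described above, the snippet below keeps every feature whose importance is at least 0.001. The importance scores and feature names are invented for illustration and are not values from the study; in practice such scores might come from a fitted random forest (for instance the `feature_importances_` attribute of a scikit-learn forest, which is an assumption about tooling rather than something stated in the text).

```python
IMPORTANCE_CUTOFF = 0.001  # threshold stated in the text


def select_features(importances, cutoff=IMPORTANCE_CUTOFF):
    """Return feature names with importance >= cutoff, most important first."""
    kept = [(name, score) for name, score in importances.items() if score >= cutoff]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in kept]


# Hypothetical importance scores, NOT values from the study.
example_importances = {
    "breastfeeding_duration": 0.31,
    "antenatal_visits": 0.12,
    "wealth_index": 0.08,
    "ipums_sample_flag": 0.0004,
}
selected = select_features(example_importances)
print(selected)
```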

Figure 1

Architecture of the deep neural network (DNN)-convolution neural network (CNN) ensemble model. FC, fully connected.

Model selection

We selected multivariate LR as an example of a traditional model.20 Three deep learning techniques (DNN, CNN and hybrid CNN-DNN) were selected as modern ML approaches. For all four models, we posed the task as a multiclass problem: each value of the label is assigned an integer, and the output is then binarised (ie, one-hot encoded). All categorical attributes are also converted to numerical form, that is, dummy variables, by mapping each unique value to a number. After careful consideration, we used 75% of the data for training and reserved the remaining 25% for testing. This choice is in line with the literature and close to 80/20, a commonly used training/testing ratio that is often associated with the Pareto principle.
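
The label encoding and the 75/25 split can be sketched as follows. This is an illustrative stdlib-only version under stated assumptions: the class names, the helper names `one_hot` and `train_test_split`, and the fixed random seed are hypothetical, and the study's actual label definitions are not reproduced here.

```python
import random


def one_hot(labels):
    """Map each distinct label to an integer index, then to a one-hot vector."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    vectors = [[1 if index[label] == i else 0 for i in range(len(classes))]
               for label in labels]
    return classes, vectors


def train_test_split(rows, train_fraction=0.75, seed=0):
    """Shuffle rows reproducibly and split them into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical labels, chosen only to show the encoding.
labels = ["survived", "neonatal", "survived", "infant"]
classes, encoded = one_hot(labels)
train, test = train_test_split(range(100))
print(classes, encoded[0], len(train), len(test))
```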

Model architecture

Deep neural networks

DNNs are a special kind of neural network with multiple hidden layers and usually hundreds of units in each hidden layer. Each neuron of one layer is connected to every neuron of the subsequent layer; such layers are called fully connected (FC) layers. For each layer in a DNN, a weight matrix is learnt. DNNs act as a black box and can learn the data representation automatically through backpropagation of the error at the final layer. A softmax layer is usually used to obtain the final prediction for the class label.
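
To make the forward pass concrete, here is a minimal sketch of one FC hidden layer followed by a softmax output. The weights are hand-picked toy values rather than learnt ones, and the ReLU non-linearity is borrowed from the hybrid model description in this article; the helper names are hypothetical.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: every input feeds every output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy weights chosen by hand for illustration, not learnt values.
x = [1.0, 2.0]
hidden = relu(dense(x, [[0.5, -1.0], [1.0, 1.0]], [0.0, 0.0]))
probs = softmax(dense(hidden, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
predicted_class = probs.index(max(probs))  # label with the highest probability
print(hidden, predicted_class)
```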

Convolutional neural networks

CNN is a specific deep learning architecture that learns a filter instead of a weight matrix. This filter is convolved with the input data to produce a feature map. This feature map can then be forwarded to a final softmax layer for prediction. A key advantage of using CNN over DNN is that it requires fewer parameters and fewer iterations to converge, as only the last layer is FC.

In our presentation, we give results for DNN, CNN and a hybrid of DNN and CNN. We show that the latter gives the best results, leveraging the benefits of both approaches.

Hybrid DNN-CNN ensemble model

In this model, the input is forwarded to two streams, where one represents DNN while the other CNN. As our input is one dimensional (1D), we use 1D CNN. With regard to DNN stream, the input is forwarded to an FC layer with 100 units. For non-linearity, the ReLU activation function is used, which is defined as $\mathrm{ReLU}(x) = \max(0, x)$. This is followed by a batch normalisation (BN) and dropout (DO) layer to avoid feature co-adaptation. Then a second FC layer with 50 units is used to squash the information, which is again followed by BN and DO layers. The output of this layer is forwarded for concatenation with the output of CNN stream (figure 1).

Regarding the CNN stream, the input is forwarded to a 1D CNN layer with 128 filters and kernel/filter size of 2, with ReLU non-linear activations. The output is followed by a maxpooling to drop low information activations. This is followed by a BN and DO layer. The information is squashed into an FC layer with 50 units, which is again followed by BN and DO. Finally, the output is forwarded for concatenation with the output of DNN stream.
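
The first steps of the CNN stream (convolution with a size-2 kernel, ReLU, then max pooling) can be sketched for a single filter. The stride of 1, the absence of padding and the pooling window of 2 are assumptions made for this example, since the text does not state them, and the filter weights are hypothetical.

```python
def conv1d(signal, kernel, bias=0.0):
    """Valid 1D convolution (cross-correlation form), stride 1, no padding."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k)) + bias
            for i in range(len(signal) - k + 1)]


def relu(values):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2):
    """Keep the largest activation in each non-overlapping window."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# One hypothetical filter of size 2; the model described above learns 128 of them.
features = [1.0, 3.0, 2.0, 5.0, 4.0]
feature_map = relu(conv1d(features, kernel=[1.0, -1.0]))
pooled = max_pool1d(feature_map)
print(feature_map, pooled)
```

With a kernel of size 2, each output position depends only on a pair of adjacent input features, so the order in which features are arranged in the input vector affects what the filters can learn.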

The combined features of both streams are then forwarded to a single FC layer with softmax activation, which results in class probabilities. The class label is assigned based on maximum probability. The detailed diagram of the architecture is shown as figure 1.

To optimise hyperparameters such as the optimizer, DO rate and learning rate, we used grid search. The available choices for the hyperparameters and the selected values are given in table 1. To stop the training, we employed an early stopping strategy in which training was stopped if the validation accuracy did not improve for 20 epochs. A checkpoint was created at each epoch where the validation accuracy improved on the previous checkpoint. The number of layers, the number of neurons in each layer and the number of filters in the CNN were chosen empirically.
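
The early stopping rule can be sketched as a small simulation over a sequence of validation accuracies. This is an illustrative reading of the rule (stop once 20 epochs pass without improvement, keeping the best epoch as the checkpoint); the function name and the accuracy trace are hypothetical, and the example shortens the patience to 3 to stay small.

```python
def train_with_early_stopping(val_accuracies, patience=20):
    """Simulate early stopping over per-epoch validation accuracies.

    Returns (best_epoch, best_accuracy, stopped_epoch); epochs are 0-indexed.
    """
    best_acc = float("-inf")
    best_epoch = -1
    for epoch, acc in enumerate(val_accuracies):
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch  # a checkpoint would be saved here
        elif epoch - best_epoch >= patience:
            return best_epoch, best_acc, epoch  # no improvement for `patience` epochs
    return best_epoch, best_acc, len(val_accuracies) - 1


# Hypothetical accuracy trace; patience shortened to 3 to keep the example small.
trace = [0.60, 0.70, 0.72, 0.71, 0.72, 0.70, 0.69]
print(train_with_early_stopping(trace, patience=3))
```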

Table 1

Hyperparameters choice and selected values through grid search

Model evaluation

We evaluated the performance of each model using a receiver operating characteristic (ROC) plot. We also derived the weighted precision, sensitivity (also known as recall), specificity, F1-score and area under the curve (AUC) for each model. The formulae for calculating the performance metrics are as follows:

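Precision = $\frac{TP}{TP + FP}$; F1-score = $\frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$; Specificity = $\frac{TN}{TN + FP}$, where $TP$, $FP$ and $TN$ denote the numbers of true positives, false positives and true negatives, respectively.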

The models were evaluated before and after feature selection. Analysis was initially conducted using all preselected variables. We thereafter optimised the various models based on empirical results from the random forest analysis. As this is a multiclass problem, the ROC plots and performance metrics are all based on micro-averages.
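
As a worked illustration of micro-averaging, the sketch below pools one-vs-rest confusion counts across classes before applying the standard definitions of precision, sensitivity, specificity and F1-score. The counts are hypothetical (derived from an invented 30-sample, three-class confusion matrix) and the helper names are not taken from the study's code.

```python
def binary_metrics(tp, fp, tn, fn):
    """Precision, sensitivity (recall), specificity and F1 from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"precision": precision, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


def micro_average(per_class_counts):
    """Pool (tp, fp, tn, fn) over classes (one-vs-rest) before computing metrics."""
    totals = [sum(c[i] for c in per_class_counts) for i in range(4)]
    return binary_metrics(*totals)


# Hypothetical one-vs-rest counts (tp, fp, tn, fn) for three classes.
counts = [(8, 2, 18, 2), (7, 3, 17, 3), (7, 3, 17, 3)]
metrics = micro_average(counts)
print(metrics)
```

In single-label multiclass problems every misclassification counts once as a false positive and once as a false negative, so micro-averaged precision and sensitivity coincide, as they do in this example.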

Results

Characteristics of the study population

A total of n=1 520 018 children drawn from 956 995 unique households across 34 LMICs were included in the study. The sample size drawn from each of the included countries is presented in online supplementary table 1. The mean age of the children was 1.89 (±1.40). The majority (n=1 100 211; 72.7%) resided in rural areas. Just under half (n=636 882; 45.2%) were in the lowest two wealth quintiles (Q1 and Q2). The majority of mothers (n=1 100 262; 73.2%) were uneducated or had only primary education. The majority received some form of postnatal check, delivery care and tetanus injection before birth, and approximately two-thirds breast fed their children for >6 months (table 2). A total of n=111 907 (7.3%) under-five deaths were recorded survey-wide across all 34 countries. Nearly half of these deaths (n=54 825; 48.9%) were neonatal deaths.

Table 2

Descriptive analysis of the study population

Feature importance

Overall, key determinants of U5MR include maternal factors such as duration of breast feeding, number of antenatal visits when the child was in utero, provision of maternal postnatal care by a skilled provider, highest level of maternal education, administration of tetanus injection during pregnancy, prenatal care provision by a skilled provider. Significant household socioeconomic factors include household wealth index and geographical location of the household. Time to child’s postnatal check was found to be the most significant child level determinant (figure 2).

Figure 2

Feature importance using random forest.

Model comparisons (before feature selection)

Comparison of the performance of the models before feature selection reveals that hybrid of CNN-DNN performs the best in terms of all metrics (sensitivity=0.68, specificity=0.83), while LR performs the worst (sensitivity=0.47, specificity=0.53) (table 3).

Table 3

Performance comparison (without feature selection)

Figure 3 shows the ROC curves for all the classifiers. It shows that hybrid CNN-DNN model outperforms all other models.

Figure 3

Micro-average receiver operating characteristic (ROC) curve before feature selection. CNN, convolution neural network; DNN, deep neural network; LR, logistic regression.

Model comparisons (after feature selection)

We found that feature selection does not improve the performance of LR. However, for all deep learning-based models, feature selection results in a performance gain. The largest gain is shown by CNN-DNN (sensitivity=0.71, specificity=0.83). The CNN-DNN model performs the best of all classifiers in both settings, that is, before and after feature selection (table 4).

Table 4

Metrics comparison after feature selection

In figure 4, we present ROC curves for all the classifiers. It shows that hybrid CNN-DNN model remains the top performer of all the models.

Figure 4

Micro-average receiver operating characteristic (ROC) curve after feature selection. CNN, convolution neural network; DNN, deep neural network; LR, logistic regression.

Discussion

A number of maternal-level, child-level and socioeconomic indicators were found to influence U5M. Duration of breast feeding was found to be a significant maternal-level determinant. Previous studies corroborate our findings: children breast fed for a longer duration have lower infectious disease morbidity and mortality, and a better chance of survival, than those who are breast fed for shorter periods or not breast fed at all.30 Multiple studies have also shown that early initiation of breast feeding and exclusive breast feeding reduce both neonatal and early infant mortality.30 31 In addition to breast feeding, several other factors within the continuum of essential obstetric care, such as antenatal care visits, postnatal care, delivery care and maternal tetanus immunisation, were found to be significant predictors of U5M. This may partly be explained by our finding that nearly half of the mortality occurred during early neonatal life, which is in line with previous studies.32 33 Several previous studies have shown that provision of essential obstetric care is vital for survival during the neonatal period.34 35 In addition, we found that the household wealth index was a slightly more important determinant than maternal level of education. This finding, however, contradicts the work of Fuchs et al, who argued that mother’s education is the fundamental determinant of child mortality and is relatively more important than income level. They argued that education affects the child’s health through better maternal health, increased health-specific knowledge, avoidance of traditional, harmful behaviours, greater economic resources as a consequence of education and general female empowerment.36 They noted, however, that other social scientists have often considered education and income to be highly correlated and have tended to regard them as interchangeable indicators of socioeconomic status.36

The timing of the child’s postnatal check and the gender of the child were also found to be predictive of the child’s survival. Postnatal checks within 24 hours of birth have been shown to be crucial in identifying, managing or referring complications and ultimately in preventing child mortality.35

Our findings regarding the superiority of ML over traditional approaches such as LR in predictive analysis are also in line with findings elsewhere.20 37

This study, however, has some limitations. First, this is a proof-of-concept cross-sectional study; hence, we can only draw inferences on associations, and not on causality. Second, we did not measure change over time. Future studies should consider incorporating temporal data points to draw inferences on changes over time, and possibly causality. Finally, we did not explore individual country, regional and subgroup level variations and cannot conclude that the degree of association is the same across different countries and subgroups, due to differences in socioeconomic, geographical, cultural and political realities. Hence, future studies should consider disaggregating with stratifiers such as income, education and place of residence to explore subgroup differences.

Recommendations for ML implementation, governance and ethics

Our recommendations regarding the implementation and regulation of ML are fourfold. First, there is a burgeoning risk that the adoption and benefits of ML may be imbalanced.38 High-income countries are increasingly adopting and benefiting from some of these novel technologies; there is therefore a risk of widening the disparity between LMICs and high-income countries even further. To achieve equity in the implementation of this technology, there is a need for capacity building across the board and collaborative use of technological resources between LMICs.

Second, regarding AI research governance and ethics (regulation), the capabilities of AI application in public health are not yet fully understood, and its application is still evolving. This implies that any regulatory attempt will effectively require understanding the capabilities of AI as a tool in public health and medicine. Like other medical research endeavours, the regulatory framework and ethical guidelines will have to evolve, as our understanding of the application of AI evolves. As such, we posit that there is a concordance between regulation, governance, research and development of AI technology. In the light of this, we suggest collaboration between research institutions, academic stakeholders, policy makers and regulatory authorities. There is a need to engage with all stakeholders across the spectrum of AI research, development and ethics.

Third, we believe that existing medical research ethical guidelines are highly applicable and cover several aspects of ML research. However, there is a need to strengthen regulatory aspects pertaining to data security and protection. The growth in the adoption of ML analytical techniques will usher in an increase in the level of data transactions, and with this comes the potential risk of breaches of health data privacy. There are existing capabilities to re-identify anonymised data using a few parameters within the data. Hence, regulatory efforts need to focus on data security, especially on reducing the risks of data re-identification.

Fourth, as knowledge and application of AI continue to grow in leaps and bounds while regulatory efforts are still rudimentary and trying to catch up, we envisage a vacuum in governance, which will have to be filled. As such, there may be a need for the development and ratification of a regulatory framework, which may be possible through the collaboration of multiple stakeholders.

Conclusions

This study demonstrates the superiority of ML as a tool for uncovering previously unseen insights in large global health data. We have shown that ML algorithms such as random forest may be more insightful than the user-dependent traditional hierarchical approach of testing a limited set of determinants for outcome prediction in multivariate analysis. Using random forest, we found that duration of breast feeding, household wealth index and level of maternal education are among the most important determinants of U5MR. In addition, we show that deep learning algorithms are more sensitive and specific for the prediction of U5MR, and this finding may be applicable to other multivariate models for data-rich population studies.

Going forward, the most important implication of this study is that if deep learning algorithms such as the one we describe in this study, are deployed in production in combination with spatial data, it is possible to identify and flag children who are most at risk and not likely to survive until the age of 5, such that necessary interventions can be targeted to communities where those children live. To the best of our knowledge, there are no existing studies that have investigated U5M, using a similar analytical approach.

Acknowledgments

The authors would like to acknowledge the contributions of Professor Hong Zhou and the team at UNICEF Office.

References

Footnotes

  • Contributors AEA conceptualised the study, conducted the data extraction, analysed the data and wrote the first draft of the manuscript. JS contributed to the conceptualisation of the study, critically edited and proofread the document. BS proofread the document.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Ethics approval Permission to use the data from all included countries was granted by Measure DHS. Ethics approval exemption was granted for use of this secondary data by the Griffith University Human Research Ethics Committee.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data may be obtained from a third party and are not publicly available. The datasets generated and analysed during the current study are available subject to permission from the DHS programme, in the (IPUMS-DHS) repository (https://www.idhsdata.org/idhs/index.shtml).