Mapping tuberculosis prevalence in Ethiopia: protocol for a geospatial meta-analysis
Kefyalew Addis Alene (1, 2), Zeleke Alebachew Wagaw (3), Archie C A Clements (1, 2)

  1. Wesfarmers Centre of Vaccines and Infectious Diseases, Telethon Kids Institute, Nedlands, Western Australia, Australia
  2. Faculty of Health Sciences, Curtin University, Perth, Western Australia, Australia
  3. National Tuberculosis Control Programme, Ghana Health Service, Accra, Greater Accra, Ghana

Correspondence to Dr Kefyalew Addis Alene; kefyalew.alene{at}


Introduction Tuberculosis (TB), a major public health concern in Ethiopia, is distributed heterogeneously across the country. Mapping TB prevalence at national and subnational levels can provide information for designing and implementing control strategies. Data for spatial analysis can be obtained through systematic review of the literature, and spatial prediction can be done by meta-analysis of published data (geospatial meta-analysis). Geospatial meta-analysis can increase the power of spatial analytic models by making use of all available data. It can also provide a means for spatial prediction where new survey data in a given area are sparse or not available. In this report, we present a protocol for a geospatial meta-analysis to investigate the spatial patterns of TB prevalence in Ethiopia.

Methods and analysis To conduct this study, a national TB prevalence survey, supplemented with data from a systematic review of published reports, will be used as the source of TB prevalence data. Systematic searching will be conducted in PubMed, Scopus and Web of Science for studies published up to 15 April 2020 to identify all potential publications reporting TB prevalence in Ethiopia. Data for covariates for multivariable analysis will be obtained from different, readily available sources. Extracted TB survey and covariate data will be georeferenced to specific locations or the centroids of small administrative areas. A binomial logistic regression model will be fitted to TB prevalence data using both fixed covariate effects and random geostatistical effects based on the approach of model-based geostatistics. Markov Chain Monte Carlo simulation will be conducted to obtain posterior parameter estimates, including spatially predicted prevalence.

Ethics and dissemination Ethical approval will not be required for this study as it will be based on deidentified, aggregate published data. The final report of this review will be disseminated through publication in a peer-reviewed scientific journal and will also be presented at relevant conferences.

  • tuberculosis
  • public health
  • epidemiology

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Strengths and limitations of this study

  • This will be the first published study that combines survey and published data to spatially predict tuberculosis (TB) prevalence across Ethiopia.

  • The systematic review will be conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement.

  • The study will be limited to TB prevalence surveys conducted in Ethiopia, but the technique employed in this protocol can be adopted for other countries or diseases.


Although preventable, and usually treatable and curable, tuberculosis (TB) remains the leading cause of death from an infectious disease worldwide.1 In 2018, there were an estimated 10 million cases of TB and 1.6 million deaths due to TB globally.2 The burden of TB is distributed unevenly in the world, with the highest incidence rate of TB being reported in Africa.2 The United Nations’ (UN) Sustainable Development Goals and WHO’s End TB Strategy have a common target of ending the global TB epidemic by 2030.3 4 Achieving this ambitious target requires narrowing of the geographical inequalities and increasing access to preventive, diagnostic and treatment services for those at the highest risk of TB. In resource-limited and high-TB burden countries such as Ethiopia, identifying hotspot areas for targeted interventions is particularly important to optimise use of available resources and to increase efficiencies in the delivery of TB services.5

The prevalence of TB in small geographical areas such as districts may significantly vary from the national average. Previous studies conducted in Ethiopia have shown a substantial geographical variation of TB incidence at district level.6–9 Such small-area variations of TB may have important implications for local and national health policy in that policy-makers, clinicians and public health professionals might benefit from evidence to target interventions to those areas and communities at the highest risk. However, previous spatial analyses of TB have been generally conducted using routinely collected TB notification data.10 There are a number of limitations to using such data, including underdetection and under-reporting. Notably, WHO estimated that almost 32% of TB cases have not been detected in Ethiopia due to low TB diagnostic and treatment coverage.2

Population-based surveys, undertaken using probability sampling designs, would be the best source of data to estimate TB prevalence in the general population. However, resource scarcity, logistic difficulties and a shortage of experts may limit the feasibility of conducting large-scale surveys. While there are many small-scale TB prevalence surveys in Ethiopia conducted at different times,11–16 there has been only one national TB prevalence survey, which was conducted in 2011.17 In the absence of large-scale population-based surveys, alternative methods such as geospatial meta-analysis are required to estimate the spatial distribution of TB prevalence across national and subnational areas.

The term geospatial refers to data that have a geographical component, and meta-analysis refers to methods that are applied to combine data from multiple studies to provide a pooled estimate of an effect. Geospatial meta-analysis combines the principles of geospatial analysis and meta-analysis to estimate disease prevalence across geographical areas. Combining geospatial techniques and meta-analysis provides robust information by pooling data from several studies (that might be small scale), and by incorporating spatial components to the model.

Geospatial meta-analysis has been used as a pragmatic and cost-effective method to investigate the spatial distribution of infectious diseases such as HIV,18 malaria,19–21 dengue,22 cholera23 and soil-transmitted helminth infections.24 25 However, to our knowledge, there have been no reported studies investigating the spatial distribution of TB prevalence using geospatial meta-analysis. In this report, we present the protocol for a systematic review and geospatial meta-analysis to predict the spatial patterns of TB point prevalence in Ethiopia.


The aim of this systematic review and geospatial meta-analysis is to map the spatial distribution of TB prevalence in Ethiopia.


Study settings

Ethiopia is the second most populous country in Africa after Nigeria, with an estimated population of 112 million in 2019.26 It has a surface area of approximately 1.1 million km² and a population density of about 100 people/km².26 The country is administratively divided into nine regional states and two administrative cities (first level), which are further divided into zones (second level), districts (third level) and villages (fourth level). The districts are the decentralised administrative level where health resource allocation decisions are made. Villages have an average population size of 5000 in rural areas and 25 000 in urban areas. Ethiopia is listed by WHO as being among 30 high TB and multidrug-resistant tuberculosis (MDR-TB) burden countries.27 Finding all people with TB disease and providing them with appropriate treatment are components of the country's TB prevention strategy.

Data sources

For our study, a national TB prevalence survey, supplemented with data from a systematic review of published studies, will be used as the source of TB prevalence data. The national TB prevalence survey is described in detail elsewhere.17 Briefly, it was a nationwide survey conducted between 2010 and 2011 to estimate the prevalence of TB in Ethiopia. A total of 85 villages were included in the survey, including 14 villages in urban areas, 63 villages in rural areas and 8 villages in pastoralist areas. Symptom screening, chest X-ray, sputum smear microscopy and culture were reported among 46 697 adults and adolescents aged 15 years and above.17

Search strategies

The comprehensive systematic review will be conducted for studies published up to 15 April 2020 in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement (online supplementary table S1).28 We will search PubMed, Scopus and Web of Science to identify all potential publications reporting TB prevalence in Ethiopia. The search will be conducted using a combination of key words such as “tuberculosis”, (“prevalence” OR “survey”) and “Ethiopia” without language and time restriction. Complete details of the search strategy are available in online supplementary table S2. The reference lists of the retrieved articles will be checked manually for additional studies. Authors of the papers will be contacted when there is a need for additional information.

Study selection criteria

Studies will be selected based on the following inclusion and exclusion criteria.

Inclusion criteria

We will include all TB prevalence surveys conducted in Ethiopia on the general population using probability sampling methods, and where geographical information on the location of the survey is provided at a higher resolution than region level. TB cases should be confirmed bacteriologically by smear microscopy, GeneXpert or culture.

Exclusion criteria

We will exclude studies that: did not report prevalence data; were conducted outside Ethiopia; did not describe the geographical location of the survey; used non-probability sampling techniques or conducted the survey in specific groups that did not represent the general population such as health professionals, prisoners, people living with HIV, homeless people, migrants and military personnel. We will also exclude case reports, case series, review articles and studies conducted in animals. When multiple studies used the same data, we will include the study with the most detailed geographical and clinical data.

Data extraction

KAA and ZAW will screen the titles and abstracts of the studies and will review the full text based on the inclusion and exclusion criteria. When there is uncertainty about eligibility of the study, reviewers will arrive at a decision by consensus. Data will be extracted from the selected articles using a Microsoft Excel 2016 spreadsheet (Microsoft). The following information will be extracted from each included article: first author; year of publication; geographical location of the survey including the name and administrative unit (region (level 1), zone (level 2), district (level 3) and where available village (level 4)) or coordinates in decimal degrees format (with conversion done where required); start and end dates of survey (month and year); study design; type of living environment in which the survey was conducted (urban, rural or pastoralist); age ranges (lowest and highest) and the proportion of participants that were men. We will also extract information on diagnostic methods; type of TB (drug-susceptible TB (DS-TB) and drug-resistant TB (DR-TB)); TB case definition used; total size of the study population (the sample represented); number of people examined for TB; number of people found positive for TB and prevalence of TB (overall and stratified by sex and age groups). When available, the following additional information will be also extracted from the articles: the number and prevalence of sputum smear positive cases; the number and prevalence of culture confirmed cases; the proportion of smear-positive and bacteriologically confirmed cases that did not report TB symptoms; the proportion with HIV coinfection; history of TB and extrapulmonary TB. The extraction sheet is available in online supplementary table S3.
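The conversion to decimal degrees mentioned above can be sketched as follows. This is a minimal illustration, assuming that some source articles report coordinates in degrees, minutes and seconds with a hemisphere letter; the function name `dms_to_decimal` and the example coordinates are hypothetical and not part of the extraction sheet.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees.

    Southern and western hemispheres are returned as negative values.
    """
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must lie in [0, 60)")
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere in ("S", "W") else value


# Illustrative coordinates only (not taken from any included study).
latitude = dms_to_decimal(9, 1, 48, "N")
longitude = dms_to_decimal(38, 44, 24, "E")
```

Storing every location in the same signed decimal format keeps the later georeferencing and distance calculations consistent across studies.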

Quality assessment

Quality of studies and risk of bias will be assessed using tools prepared for prevalence studies.29 30 Funnel plots will be used to detect potential publication bias and small-study effects; the Egger method will be used to assess asymmetry. A p value <0.05 will be considered as indicative of statistically significant publication bias.
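As an illustration of the asymmetry assessment, the Egger method can be sketched as an ordinary least squares regression of the standardised effect (effect divided by its standard error) on precision (the inverse of the standard error), with the intercept as the quantity of interest. The sketch below assumes that each study contributes an effect estimate on a suitable scale (for example, logit prevalence) and its standard error; the function name and return layout are hypothetical. The p value is not computed here; it would come from a t distribution with n − 2 degrees of freedom in a statistical package.

```python
import math


def egger_regression(effects, std_errors):
    """Egger-type regression of effect/SE on 1/SE.

    Returns (intercept, slope, intercept_se, t_statistic). An intercept far
    from zero suggests funnel-plot asymmetry.
    """
    n = len(effects)
    if n != len(std_errors) or n < 3:
        raise ValueError("need at least three studies with matching inputs")
    x = [1.0 / se for se in std_errors]                    # precision
    y = [e / se for e, se in zip(effects, std_errors)]     # standardised effect
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be identical")
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = residual_ss / (n - 2)                         # residual variance
    intercept_se = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))
    if intercept_se == 0:
        t_statistic = 0.0 if intercept == 0 else math.copysign(float("inf"), intercept)
    else:
        t_statistic = intercept / intercept_se
    return intercept, slope, intercept_se, t_statistic
```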

Data processing

The data will be checked for duplication. If there are duplicate data at a given location, the survey with the greater amount of information will be used for analysis.
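The duplicate-handling rule can be sketched as below. This is only one possible reading: it assumes that "amount of information" is measured by the number of non-missing fields in an extracted record, and the field names (`location`, `n_examined` and so on) are hypothetical.

```python
def deduplicate_surveys(records, info_fields):
    """Keep one record per location: the record with the most non-missing fields.

    records: list of dicts, each with a "location" key.
    info_fields: names of the fields counted as "information" (an assumption).
    Ties are resolved in favour of the record seen first.
    """
    best = {}
    for record in records:
        key = record["location"]
        score = sum(1 for field in info_fields if record.get(field) is not None)
        if key not in best or score > best[key][0]:
            best[key] = (score, record)
    return [record for _, record in best.values()]


# Illustrative records only.
surveys = [
    {"location": "District A", "n_examined": 1200, "n_positive": None},
    {"location": "District A", "n_examined": 1200, "n_positive": 3},
    {"location": "District B", "n_examined": 800, "n_positive": 1},
]
unique = deduplicate_surveys(surveys, ["n_examined", "n_positive"])
```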

Covariate data sources

Data for covariates for multivariable analysis will be obtained from different sources. Data on climatic variables, including annual mean temperature and annual mean precipitation, will be obtained from the Global Climate Data website. Proxies of socioeconomic status such as the percentage of people with low wealth index and the percentage of adults who had attended school or who can read and write, as well as healthcare access data such as difficulty of getting healthcare services due to lack of money or distance to a health facility, and behavioural factors such as the percentage of people chewing chat and the percentage of people drinking alcohol will be obtained from the Ethiopia Demographic and Health Surveys (EDHS 2016). The highest resolution for the EDHS data is the enumeration area, a geographical area covering on average 181 households, created for the 2007 Ethiopia Population and Housing Census. The percentage of people with poor knowledge and attitudes towards TB will be obtained from EDHS 2011. If the data to classify geographical locations as rural, urban or pastoralist are not available from each individual study, they will be obtained from the Center for International Earth Science Information Network. Gridded population density data will be obtained from WorldPop. The definition of covariates and data sources used in this study will be presented in a table as online supplementary information. Shape files containing the administrative boundaries of Ethiopia will be obtained from openAfrica.


Extracted TB survey data will be geolocated to a specific coordinate of latitude and longitude (in decimal degrees format) where possible or to the smallest polygon available otherwise (village or district). When the TB prevalence survey data are reported at village level, coordinates at the centre of each village will be used to obtain a georeference. Village locations will be identified using Google Maps. In instances when the TB prevalence survey has been reported at a district level (ie, a polygon), a centroid that is spatially weighted according to population density will be used. The survey locations for each study will be stored in a geographical information system, ArcGIS (ESRI, Redlands, California, USA). Data on TB prevalence and covariates will be linked according to location using ArcGIS, to produce a spatially referenced dataset for analysis.
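A population-weighted centroid of the kind described for district-level surveys can be sketched as follows. The sketch assumes that the gridded population surface has already been clipped to the district and reduced to a list of cell centres with population counts. Averaging latitude and longitude directly is a planar approximation that should be adequate for areas the size of a district. In the protocol this step is carried out in ArcGIS; the function below is a hypothetical stand-in.

```python
def population_weighted_centroid(cells):
    """Return the (latitude, longitude) centroid weighted by population.

    cells: list of (latitude, longitude, population) tuples for the grid
    cells that fall inside one district.
    """
    total = sum(population for _, _, population in cells)
    if total <= 0:
        raise ValueError("total population must be positive")
    latitude = sum(lat * population for lat, _, population in cells) / total
    longitude = sum(lon * population for _, lon, population in cells) / total
    return latitude, longitude


# Illustrative cells only: the centroid is pulled towards the denser cell.
centroid = population_weighted_centroid([(9.0, 38.0, 100), (10.0, 39.0, 300)])
```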

Geospatial analysis

A geospatial model will be fitted for TB prevalence survey data using both fixed covariate effects and random geostatistical effects based on the approach of model-based geostatistics.31 Covariates for the spatial model will be selected using a fixed effects logistic regression model (with an exclusion criterion of Wald p>0.2) in Stata/SE V.15.0 (StataCorp, College Station, Texas, USA). Geospatial models will be constructed using R statistical software.

Model specifications

The proportion of TB cases at each surveyed location j as the outcome variable will be considered to follow a binomial distribution, and a binomial logistic regression model will be constructed as follows:

$$Y_{i,j} \sim \mathrm{Binomial}(n_{i,j},\, p_{i,j})$$

where $Y_{i,j}$ is the observed number of individuals testing positive for TB at location $j$ and age–gender group $i$; $n_{i,j}$ is the number of individuals screened for TB and $p_{i,j}$ is the true prevalence of TB. The observed prevalence $\hat{p}_{i,j}$ and true prevalence $p_{i,j}$ of TB can be linked by the rules of probability as follows:

$$\hat{p}_{i,j} = p_{i,j} \times \text{sensitivity} + (1 - p_{i,j}) \times (1 - \text{specificity})$$

The sensitivity and specificity of TB diagnostic tests such as smear microscopy, GeneXpert and culture will be included in the model using beta distributions, with parameters of α and β. The prior values of α and β will be obtained from the literature.
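The relationship between observed and true prevalence can be illustrated numerically. The sketch below implements the stated probability rule and its algebraic inverse (often called the Rogan-Gladen estimator, which is named here for orientation and is not the Bayesian approach of the protocol). The sensitivity and specificity values in the example are purely illustrative assumptions, and `beta_mean` simply shows how a beta prior with parameters α and β translates into an expected value.

```python
def apparent_prevalence(true_prevalence, sensitivity, specificity):
    """Observed prevalence = true positives detected + false positives."""
    return true_prevalence * sensitivity + (1.0 - true_prevalence) * (1.0 - specificity)


def true_prevalence_from_apparent(observed_prevalence, sensitivity, specificity):
    """Algebraic inverse of apparent_prevalence, clipped to the range [0, 1]."""
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("test must perform better than chance (Se + Sp > 1)")
    estimate = (observed_prevalence + specificity - 1.0) / denominator
    return min(1.0, max(0.0, estimate))


def beta_mean(alpha, beta):
    """Expected value of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Illustrative values only: 1% true prevalence, 60% sensitivity, 99% specificity.
observed = apparent_prevalence(0.01, 0.60, 0.99)
recovered = true_prevalence_from_apparent(observed, 0.60, 0.99)
```

With these illustrative values the observed prevalence exceeds the true prevalence, because false positives among the large disease-free group outweigh the missed cases.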

The mean observed TB prevalence at location j, age–gender group i, will be modelled by:

$$\operatorname{logit}(p_{i,j}) = \alpha + \boldsymbol{\beta}\,\boldsymbol{\chi}_{j} + \boldsymbol{\gamma}\,\mathbf{Z}_{i,j} + \zeta_{j}$$

where $\alpha$ is the intercept; $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ are matrices of coefficients; $\boldsymbol{\chi}$ is a matrix of spatially variant covariates; $\mathbf{Z}$ is a matrix of spatially invariant covariates (ie, confounders such as age, sex and diagnostic method); and $\zeta_{j}$ is a geostatistical random effect defined by an isotropic powered exponential spatial correlation function:

$$f(d_{ab};\, \phi) = \exp[-(\phi\, d_{ab})]$$

where $d_{ab}$ are the distances between pairs of points $a$ and $b$ and $\phi$ is the rate of decline of spatial correlation per unit of distance. Non-informative priors will be used for the intercept (uniform prior with bounds $-\infty$ and $\infty$) and the coefficients (normal prior with mean 0 and precision, the inverse of variance, of $1 \times 10^{-4}$). The prior distribution of $\phi$ will also be uniform, with upper and lower bounds that will be set according to the scale of the study area. The precision of $\zeta_{j}$ will be given a non-informative gamma prior distribution. Parameter estimation will be done using Markov Chain Monte Carlo simulation. Convergence of the model will be checked by visual inspection of history and density plots. Sufficient values from each simulation run for the variables of interest will be stored to ensure full characterisation of the posterior distributions.
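To make the spatial correlation structure concrete, the sketch below builds a correlation matrix for a few survey locations from a powered exponential function of distance. It assumes great-circle (haversine) distances in kilometres and exposes the power as a parameter `kappa`, which is an assumption of this sketch; with `kappa = 1` the function reduces to a simple exponential decay. The coordinates and the value of `phi` are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(point_a, point_b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat_a, lon_a = map(math.radians, point_a)
    lat_b, lon_b = map(math.radians, point_b)
    h = (math.sin((lat_b - lat_a) / 2.0) ** 2
         + math.cos(lat_a) * math.cos(lat_b) * math.sin((lon_b - lon_a) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def powered_exponential_correlation(distance, phi, kappa=1.0):
    """Correlation between two points a given distance apart; phi is the decay rate."""
    return math.exp(-((phi * distance) ** kappa))


def correlation_matrix(points, phi, kappa=1.0):
    """Pairwise correlation matrix for a list of (latitude, longitude) points."""
    return [[powered_exponential_correlation(haversine_km(a, b), phi, kappa)
             for b in points] for a in points]


# Illustrative locations and decay rate (per km).
locations = [(9.0, 38.7), (9.0, 39.7), (11.6, 37.4)]
matrix = correlation_matrix(locations, phi=0.01)
```

Larger values of `phi` make correlation fall away faster, so nearby surveys contribute less information to predictions at distant locations.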

Spatial prediction

Predictions of the prevalence of TB at unsampled locations will be made at the nodes of a 0.1×0.1 decimal degree grid (approximately 11 km × 11 km) by interpolating the geostatistical random effects and adding them to the sum of the products of the coefficients for the spatially variant fixed effects and the values of the fixed effects at each prediction location.32 This technique has been previously used to estimate the prevalence of schistosomiasis and soil-transmitted helminths in Africa.33 34 The overall sum will be back-transformed from the logit scale to the prevalence scale, providing prediction surfaces that show the prevalence of TB in each age and sex group for all prediction locations.
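The prediction step can be sketched in two parts: generating the grid nodes, and back-transforming the linear predictor at each node. The sketch assumes that the geostatistical random effect has already been interpolated to the node (the interpolation itself is not shown), and all function names, the bounding box and the coefficient values are hypothetical.

```python
import math


def grid_nodes(lat_min, lat_max, lon_min, lon_max, step=0.1):
    """Nodes of a regular latitude/longitude grid covering a bounding box."""
    n_lat = int(round((lat_max - lat_min) / step)) + 1
    n_lon = int(round((lon_max - lon_min) / step)) + 1
    return [(round(lat_min + i * step, 6), round(lon_min + j * step, 6))
            for i in range(n_lat) for j in range(n_lon)]


def inverse_logit(x):
    """Numerically stable back-transformation from the logit scale."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_prevalence(intercept, coefficients, covariates, spatial_effect):
    """Sum the fixed effects and the interpolated random effect, then back-transform."""
    linear_predictor = (intercept
                        + sum(b * x for b, x in zip(coefficients, covariates))
                        + spatial_effect)
    return inverse_logit(linear_predictor)


# Illustrative bounding box and parameter values only.
nodes = grid_nodes(3.0, 3.2, 33.0, 33.1)
prevalence = predict_prevalence(-6.0, [0.5, -0.2], [1.0, 2.0], 0.3)
```

In a full Bayesian analysis this calculation would be repeated for every posterior draw, so that each node receives a posterior distribution of prevalence and not a single value.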

Model validation

Validation of the discriminatory ability of the models will be undertaken by partitioning the entire dataset into four random subsets, running the model using three of four subsets and validating the model with the remaining subset. The ability of the models to predict known mean prevalence of TB will be assessed by mean prediction error which will provide a measure of overall bias. The observed values will be compared with the mean of the posterior distribution of each predicted value of prevalence. The area under the curve of a receiver operating characteristic will also be calculated to provide an estimate of discriminatory performance relative to different observed prevalence thresholds.
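The validation procedure can be sketched as below, using only the Python standard library. The sketch assumes that the sign of the prediction error is predicted minus observed, and that locations are labelled positive when observed prevalence exceeds a chosen threshold; the area under the curve is computed through its rank interpretation (the probability that a randomly chosen positive location receives a higher predicted value than a randomly chosen negative one). All names are hypothetical.

```python
import random


def four_fold_indices(n_locations, seed=0):
    """Randomly partition location indices into four roughly equal subsets."""
    indices = list(range(n_locations))
    random.Random(seed).shuffle(indices)
    return [indices[k::4] for k in range(4)]


def mean_prediction_error(observed, predicted):
    """Average of (predicted - observed); values away from zero indicate overall bias."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


def area_under_curve(labels, scores):
    """AUC via pairwise comparison of positives and negatives; ties count as half."""
    positives = [s for label, s in zip(labels, scores) if label]
    negatives = [s for label, s in zip(labels, scores) if not label]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative location")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))


# Illustrative values: hold out one subset, fit on the other three, then compare.
folds = four_fold_indices(10)
observed = [0.001, 0.004, 0.002, 0.006]
predicted = [0.002, 0.005, 0.001, 0.007]
labels = [value > 0.003 for value in observed]   # threshold is an assumption
bias = mean_prediction_error(observed, predicted)
auc = area_under_curve(labels, predicted)
```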

Estimating the number of people with TB

The estimated number of TB cases (stratified by age group and sex) will be calculated for each location using the product of the mean predicted prevalence of TB and size of the population at risk in each location. This number will be compared with the number of TB cases notified by the national TB surveillance system to estimate the number of misreported or undetected cases in each location.
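The calculation described above amounts to a product and a difference, sketched below for a single location. The stratum labels, prevalence values, population sizes and notification count are illustrative assumptions, and the function names are hypothetical.

```python
def estimated_cases(predicted_prevalence, population_at_risk):
    """Expected number of people with TB in one stratum of one location."""
    if not 0.0 <= predicted_prevalence <= 1.0:
        raise ValueError("prevalence must be a proportion between 0 and 1")
    return predicted_prevalence * population_at_risk


def compare_with_notifications(strata, notified_cases):
    """Sum estimated cases over strata and compare with notified cases.

    strata: dict mapping a stratum label to (predicted prevalence, population at risk).
    """
    estimated = sum(estimated_cases(prevalence, population)
                    for prevalence, population in strata.values())
    return {"estimated": estimated,
            "notified": notified_cases,
            "gap": estimated - notified_cases}


# Illustrative strata for a single location.
summary = compare_with_notifications(
    {"men_15_34": (0.002, 50000), "women_15_34": (0.001, 60000)},
    notified_cases=110,
)
```

A positive gap suggests cases that were not captured by the surveillance system at that location, subject to the uncertainty of the predicted prevalence.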

Patient and public involvement

No patient or member of the public was involved in this study.

Discussion and conclusion

TB prevalence surveys are important to directly measure the burden and trend of TB in the general population, particularly in countries where there is a high burden of TB, a low case detection rate and weak reporting systems. The information obtained from TB prevalence surveys can be used by policy-makers to improve TB diagnosis and treatment services as well as to improve reporting systems. Early diagnosis and treatment are crucial to stop TB transmission and reduce the burden of the disease. However, TB prevalence surveys are expensive and logistically difficult to implement regularly.

Effective control and prevention of TB partly depend on information on where and to what extent the disease is present. Spatial analysis can provide such information and identify areas where the diagnostic and treatment services of TB should be implemented. Geospatial meta-analysis combines several individual studies and their spatial components in a single study and provides several advantages (over meta-analysis and geospatial analysis alone). It can be used to synthesise data obtained from several individual studies (meta-analysis) and to assign them location information (a spatial component), increasing the chances of making accurate estimates across geographical areas.

Like any other systematic review and meta-analysis, a clear protocol is needed before conducting geospatial meta-analysis. Although our protocol has been developed in the context of TB prevalence surveys in Ethiopia, it can be adopted for other infectious diseases or countries.

Ethics and dissemination

Ethical approval will not be required for this study as it will be based on deidentified, aggregate published data. The final report of this review will be disseminated through publication in a peer-reviewed scientific journal and will also be presented at relevant conferences.



  • Contributors KAA and ACAC conceived the study. KAA developed the search strategy and drafted the protocol, and ACAC revised the drafted protocol. All authors critically revised the manuscript for methodological and intellectual content and read and approved the final manuscript.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.