
Original research
Methodological considerations for linking household and healthcare provider data for estimating effective coverage: a systematic review
  1. Emily D Carter1,
  2. Hannah H Leslie2,
  3. Tanya Marchant3,
  4. Agbessi Amouzou1,
  5. Melinda K Munos1
  1. International Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, USA
  2. Global Health and Population, Harvard TH Chan School of Public Health, Boston, Massachusetts, USA
  3. Disease Control, London School of Hygiene and Tropical Medicine, London, UK
  1. Correspondence to Dr Emily D Carter; ecarter{at}


Objective To assess existing knowledge related to methodological considerations for linking population-based surveys and health facility data to generate effective coverage estimates. Effective coverage estimates the proportion of individuals in need of an intervention who receive it with sufficient quality to achieve health benefit.

Design Systematic review of available literature.

Data sources Medline, Carolina Population Center and Demographic and Health Survey publications and handsearch of related or referenced works of all articles included in full text review. The search included publications from 1 January 2000 to 29 March 2021.

Eligibility criteria Publications explicitly evaluating (1) the suitability of data, (2) the implications of the design of existing data sources and (3) the impact of choice of method for combining datasets to obtain linked coverage estimates.

Results Of 3805 papers reviewed, 70 publications addressed relevant issues. Limited data suggest household surveys can be used to identify sources of care, but their validity in estimating intervention need was variable. Methods for collecting provider data and constructing quality indices were diverse and presented limitations. There was little empirical data supporting an association between structural, process and outcome quality. Few studies addressed the influence of the design of common data sources on linking analyses, including imprecise household geographical information system data, provider sampling design and estimate stability. The most consistent evidence suggested that, under certain conditions, combining data based on geographical proximity or administrative catchment (ecological linking) produced similar estimates to linking based on the specific provider utilised (exact match linking).

Conclusions Linking household and healthcare provider data can leverage existing data sources to generate more informative estimates of intervention coverage and care. However, existing evidence on methods for linking data for effective coverage estimation is variable, and numerous methodological questions remain. There is a need for additional research to develop evidence-based, standardised best practices for these analyses.

  • public health
  • quality in healthcare
  • statistics & research methods

Data availability statement

Data sharing not applicable as no datasets were generated and/or analysed for this study. As a review article, this article reports data from previously published studies.

This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See:


Strengths and limitations of this study

  • We systematically reviewed a wide range of methodological issues pertaining to linking population-based and health provider data for effective coverage estimation.

  • The review was limited by the diversity of terminology and fields related to the linking methodology.

  • Multiple search strategies were used to minimise the likelihood of overlooking relevant publications.

  • Results of the review are summarised and related to actionable items and needs for future research.


There is growing demand for tracking progress towards the sustainable development goals through effective coverage estimates.1 2 Effective coverage measures assess not only the proportion of individuals in need of an intervention who receive it, but also the content and quality of services received, with the aim of estimating the proportion of individuals who receive the health benefit of an intervention.2 Numerous publications have estimated effective coverage3 using a range of methods and measures to define intervention need, receipt and quality.

Linking household and health provider data is a promising means of generating effective coverage estimates that provide population-based estimates and incorporate data on service quality from health facilities. Data from household surveys can provide a population-based estimate of intervention need and care-seeking for services, such as the proportion of women with a recent live birth who delivered in a health facility. However, a number of maternal, newborn and child health interventions4 5 cannot be accurately measured through household surveys due to reporting errors and biases by respondents (eg, the proportion of women who received a uterotonic during delivery). Health provider assessments yield information on provider quality, including available infrastructure, commodities, equipment, human resources and potentially provision of care. Provider data do not capture need for care in the population, care-seeking behaviour or the experience of individuals who do not access the formal health system. Linking these two data sources can provide a more complete picture of population access to and coverage of high-quality health services, for example, the proportion of women who delivered at a health facility with sufficient structural resources and competence to provide appropriate labour and delivery care.

There are many approaches for combining household and provider datasets.6 The results depend on the choice of data and of methods for combining datasets. However, very little guidance exists to inform these decisions. We conducted a systematic review to understand the current evidence base for effective coverage linking methods and to identify needs for further research.


We searched for papers addressing methods or assumptions regarding: (1) the suitability of household and provider (defining health providers as healthcare outlets such as health facilities, pharmacies, and community-based health workers) data used in linking analyses, (2) the implications of the design of existing household (Demographic and Health Survey (DHS) and Multiple Indicator Cluster Survey (MICS)) and provider (Service Provision Assessment (SPA) and Service Availability and Readiness Assessment (SARA)) data sources commonly used in linking analyses and (3) the impact of choice of method for combining datasets to obtain linked coverage estimates.

Our primary search was conducted in Medline. The search was limited to papers published between 1 January 2000 and 29 March 2021 that included terms related to (1) effective coverage, benchmarking, system dynamics or universal health coverage (UHC) metrics, (2) structural, process and/or health outcome quality, (3) linking analyses, using terms adapted from Do et al,6 (4) validity of self-report health indicators and (5) spatial methods for measuring utilisation or distance to care. A full list of Medline search terms is presented in online supplemental file 1. The search was conducted using English-language terms; however, publications in English, Spanish and French were reviewed if captured in the search. Additionally, we conducted searches using these criteria in Population Health Metrics (which was not fully indexed in Medline at the time of our search), the Carolina Population Center and DHS publications. In a second step, we handsearched the references of a systematic review by Do et al on linking household and facility data to estimate coverage of reproductive, maternal, newborn and child health (RMNCH) services,6 and a review by Amouzou et al of effective coverage analyses.3 Both the Do and Amouzou reviews summarised publications that linked data or estimated effective coverage; however, they did not systematically address methodological concerns or relevant results for guiding application of these methods. We also handsearched the references, citing works and journal-generated or database interface-generated related publications of all articles that passed the title and abstract review.

Publications were reviewed for relevant analyses or commentary related to linking methodologies. Articles were included if they explicitly evaluated or compared assumptions used in linking approaches for at least one of the areas defined above. The review focused on low-income and middle-income countries (LMICs) and data sources common in these settings; however, publications from high-income settings were retained if the relevant evidence could translate to LMICs (eg, use of centroid global positioning system (GPS) location in estimates of distance, validity of provider quality measures). No formal quality assessment was conducted due to the diversity of study designs and research objectives of the papers relevant to the review. Title and abstract review were conducted simultaneously by the first author (EC). Data extraction included the title, author, year of publication, country or countries included in analysis, data source and specific analyses or findings relevant to linking, loosely categorised by topic area. Topical area groupings emerged from the review and were used to structure the findings.

Patient and public involvement

As a systematic review, neither patients nor the public were involved in the design, conduct, reporting, or dissemination plans of our research.


The Medline search produced 3669 publications, along with 79 from the Carolina Population Center, 4 from Population Health Metrics, 12 DHS publications, 35 papers included in the review by Amouzou et al and 49 papers included in the review by Do et al meeting the publication date restrictions. After removing duplicates, 3805 publications were included in the title and abstract review and 236 were included in the full text review. Of those papers included in the full text review, 56 publications addressed a methodological concern related to linking household and provider data and were included in the final review. Fourteen additional publications were identified through the snowball review of references and related works (figure 1). In total, 70 publications addressed one or more methodological concerns, including the suitability of household (n=13) and provider data (n=39) for use in linking analyses, concerns related to the design of existing household (n=6) and provider (n=4) data sources and methods for combining household and facility data (n=14). A list of publications included in the review and a summary of their contributions to the review are provided in table 1.
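The counts in the search flow can be cross-checked with simple arithmetic. The sketch below uses only the numbers reported above; the number of duplicates removed and the number of full-text exclusions are derived by subtraction rather than reported separately, and the topic labels are shorthand introduced here for illustration.

```python
# Counts as reported in the text; duplicates and full-text exclusions are derived.
identified = {
    "Medline": 3669,
    "Carolina Population Center": 79,
    "Population Health Metrics": 4,
    "DHS publications": 12,
    "Amouzou et al review": 35,
    "Do et al review": 49,
}

total_identified = sum(identified.values())
screened = 3805                      # title and abstract review, after de-duplication
duplicates_removed = total_identified - screened

full_text = 236
included_from_search = 56
excluded_at_full_text = full_text - included_from_search
snowball = 14
final_included = included_from_search + snowball

# Topic counts overlap because one publication can address several concerns.
topic_counts = {"household data": 13, "provider data": 39,
                "household design": 6, "provider design": 4, "linking methods": 14}
topic_total = sum(topic_counts.values())

print(total_identified, duplicates_removed, excluded_at_full_text, final_included, topic_total)
# prints: 3848 43 180 70 76
```

The topic total (76) exceeds the 70 included publications, which is consistent with some publications contributing to more than one topic area.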

Figure 1

PRISMA flow diagram. DHS, Demographic and Health Survey; PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.

Table 1

Summary of publications included in the review and contribution to the literature

Suitability of household and provider data for linking analyses

Suitability of household data needed for linked estimates

In effective coverage linking analyses, household surveys can be used to estimate the proportion of the population in need of healthcare, as well as care-seeking behaviour. Household surveys must produce valid estimates of these parameters and provide care-seeking data that can be linked to provider assessments. This review identified papers discussing issues in defining intervention need (n=8) and care-seeking (n=5) that should guide selection of indicators for linking.

Intervention need

Estimation of intervention need may require solely population demographics such as age (eg, for prevention and health promotion interventions) or may require defining specific illnesses or conditions. The latter is more subject to reporting bias.7 Multiple studies have shown poor agreement or bias between maternally reported symptoms and clinical pneumonia,8 9 malaria10 and diarrhoea11 in children under 5. Three studies showed that the validity of maternal report of both maternal and newborn birth complications is variable.12–14 A simulation by Shengelia et al demonstrated the effect of the divergence of true from perceived intervention need on effective coverage estimates. The authors propose estimating the posterior probability of disease from responses to symptom questions using a Bayesian model, so that disease presence is measured on a probabilistic scale.7 However, we identified no work on how to integrate these adjusted estimates into effective coverage estimates.
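To illustrate what measuring disease presence on a probabilistic scale can look like, the sketch below computes a posterior probability of a condition from symptom reports using Bayes' rule. It is not Shengelia et al's model: it assumes a naive Bayes structure (symptom reports conditionally independent given true status), and the prior prevalence, symptom names, sensitivities and specificities are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SymptomQuestion:
    """A survey symptom question treated as a diagnostic test (hypothetical values)."""
    name: str
    sensitivity: float  # P(symptom reported | condition present)
    specificity: float  # P(symptom not reported | condition absent)


def posterior_probability(prior, questions, responses):
    """P(condition | responses) by Bayes' rule, assuming responses are
    conditionally independent given true status (naive Bayes)."""
    like_present, like_absent = 1.0, 1.0
    for q in questions:
        if responses[q.name]:
            like_present *= q.sensitivity
            like_absent *= 1.0 - q.specificity
        else:
            like_present *= 1.0 - q.sensitivity
            like_absent *= q.specificity
    joint_present = prior * like_present
    return joint_present / (joint_present + (1.0 - prior) * like_absent)


questions = [SymptomQuestion("cough", 0.9, 0.5),
             SymptomQuestion("fast_breathing", 0.6, 0.8)]
children = [{"cough": True, "fast_breathing": True},
            {"cough": True, "fast_breathing": False},
            {"cough": False, "fast_breathing": False}]

posteriors = [posterior_probability(0.1, questions, c) for c in children]
# Expected number in need is the sum of probabilities, not a 0/1 count of
# children whose caregivers reported any symptom.
expected_in_need = sum(posteriors)
print([round(p, 3) for p in posteriors], round(expected_in_need, 3))
```

Summing posterior probabilities gives an expected size of the population in need; how such a probabilistic denominator should enter an effective coverage estimate remains, as noted above, an open question.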

Care-seeking behaviour

Four studies addressed the accuracy of respondent report of seeking care. Mothers in Zambia and Mozambique accurately reported the type of health provider where they sought sick child15 and delivery care,16 respectively. However, studies in two countries suggested that women could not accurately report the type of health worker who attended them during labour and delivery and immediate postnatal care.17 18 Wang et al note that provider categories are not standardised between population surveys and health system assessments: population surveys often include vague or overly broad categories that do not directly match SPA/SARA categories and require harmonisation.19
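Harmonisation typically takes the form of a crosswalk from survey response options to assessment categories. A minimal sketch follows; the category labels are hypothetical and are not the actual DHS, MICS, SPA or SARA codes. Responses that are too broad or too vague to map are flagged for review rather than forced into a category.

```python
# Hypothetical map from household survey source-of-care responses to provider
# assessment facility types. Real labels vary by survey round and country.
SURVEY_TO_ASSESSMENT = {
    "government hospital": "hospital",
    "government health centre": "health centre",
    "government health post": "health post",
    "community health worker": "CHW",
    "pharmacy": "pharmacy",
    "private hospital/clinic": None,  # spans several assessment categories
    "other public sector": None,      # too vague to match
}


def harmonise(responses):
    """Split responses into linkable assessment categories and responses that
    need manual review (unmapped or deliberately set to None)."""
    linked, needs_review = [], []
    for response in responses:
        category = SURVEY_TO_ASSESSMENT.get(response.strip().lower())
        if category is None:
            needs_review.append(response)
        else:
            linked.append(category)
    return linked, needs_review


linked, needs_review = harmonise(["Government hospital", "Community health worker",
                                  "Other public sector", "Traditional healer"])
print(linked, needs_review)
```

Keeping unmatched responses visible, rather than silently dropping them, makes it possible to report how much care-seeking could not be linked.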

Suitability of healthcare quality data needed for linked estimates

Provider assessments present data on service content and quality for effective coverage linking analyses. However, the measurement, construction and interpretation of provider quality measures are highly variable and may significantly alter effective coverage estimates. This paper does not present an exhaustive review of healthcare quality measures or the association between levels of quality. A comprehensive summary of quality of care concepts and measurement approaches, along with their relative strengths and limitations, was presented by Hanefeld et al.20 Publications of particular relevance to linking analyses are noted here, with an emphasis on national provider survey data as the most common source of provider data for linking analyses.

Methods used in assessing provider quality

A review by Nickerson et al found significant variability in the data collected and methods used in health facility assessment tools in LMICs.21 While SPA and SARA data are the most widely used sources of data on health service delivery in LMICs, one paper noted that these surveys focused primarily on structural quality, with less data on provision and experience of care.22 The lack of process quality data is in part related to the reliance on direct observation of clinical care, a time-intensive and resource-intensive method, to collect these data. None of the studies included in the review used Health Management Information Systems (HMIS) data to generate linked coverage estimates. A desk review by the Maternal and Child Survival Program (MCSP) found that data collected through HMIS were variable across countries, data recorded within registers were often not transmitted through the system, and only a limited number of the indicators collected were related to the provision of health services.23

Nine publications assessed alternatives to direct observation of clinical care for collecting process quality data. Two studies found no association between process quality and maternal perceptions of the quality of care received24 25 while one study found perceived quality was associated with the number of services received but not structural quality.26 Agreement between observed care and health records or provider report was also variable.27–29 A review by Hrisos et al found few studies to support use of patient report, provider self-report or record review as proxy measures of clinical care quality.30 In the USA, vignettes performed better than chart abstraction for estimating quality.31 Another review found providers were unable to accurately assess their own performance, with the worst accuracy among the least skilled providers.32 Five other publications used alternative methods for measuring process quality, including use of vignettes,26 33 register review,26 34 most recent delivery interview35 and an mHealth tool,36 but did not assess their performance against other measurement methods.
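Agreement between a proxy source and direct observation is commonly summarised with sensitivity, specificity and Cohen's kappa. The sketch below computes these for one binary care action using ten made-up consultations; treating observation as the reference standard is itself an assumption of the comparison.

```python
def agreement_with_observation(observed, proxy):
    """Compare a binary proxy (e.g., register review or provider report) with
    direct observation, treated here as the reference standard."""
    if len(observed) != len(proxy) or not observed:
        raise ValueError("need paired, non-empty records")
    pairs = list(zip(map(bool, observed), map(bool, proxy)))
    tp = sum(o and p for o, p in pairs)
    tn = sum(not o and not p for o, p in pairs)
    fp = sum(not o and p for o, p in pairs)
    fn = sum(o and not p for o, p in pairs)
    n = len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    # Cohen's kappa: observed agreement corrected for chance agreement
    p_obs = (tp + tn) / n
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (p_obs - p_chance) / (1 - p_chance) if p_chance < 1 else float("nan")
    return {"sensitivity": sensitivity, "specificity": specificity, "kappa": kappa}


# Hypothetical data: 1 = action performed (observed) or recorded (proxy)
observed = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
record = [1, 1, 1, 0, 0, 1, 0, 0, 1, 1]
result = agreement_with_observation(observed, record)
print({k: round(v, 2) for k, v in result.items()})
```

Kappa is reported alongside sensitivity and specificity because raw percentage agreement can look high simply when an action is very common or very rare.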

Content of provider quality indices

Most linking papers estimating effective coverage included in this review (n=15) characterised provider quality using structural measures of quality, with or without measures of process quality. Various approaches were used to select items for inclusion in these measures. Measures of structural and process quality were derived from either national or international guidance on minimum service availability and required commodities, equipment, infrastructure, training or actions. Measures used by effective coverage analyses included SPA or SARA structural indicators33 37–39 and/or clinical observations,38 40–43 emergency obstetric and newborn care functions,28 34 44 45 provider recall of actions during their last delivery,35 44 and measured health outcomes.46 47

Construction of provider quality indices

In addition to the range of variables used in provider measures, there was no consensus on how to select and combine variables to generate quality indices. The reviewed publications constructed indices using a variety of approaches, including weighted indices,41 simple averages across all indicators or domains,33 39 40 42–44 and categorisation using set thresholds or relative categories.26 37 45 A review of quality measurement using SPA data found that studies frequently did not apply a theoretical framework when selecting indicators for quality measures, and that there was high variability in the indicators included in quality scores.48 In our review, seven publications presented data on the performance of different measurement modes and summary approaches. Two studies found the method of selecting and combining quality indicators had little effect on overall effective coverage estimates.49 50 However, two other studies found inconsistency in the rankings of health facilities when using different index methods.51 52 Two studies using principal component analysis (PCA) to create SPA health service indices found the reduced indices explained only a limited amount of the variance across indicators.52 53 An analysis of SPA data in ten countries found that indices empirically derived through machine learning captured a large proportion of the service readiness data in the full SPA index; however, the selected set of indicators varied across countries, and an index generated through expert review captured very little of the data from the full index.54 Two studies found that few facilities could meet all requirements when a threshold was applied, limiting the utility of the approach.45 51
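The choice among these construction methods can change which facilities appear best. The sketch below scores three hypothetical facilities with a simple item average, an average of domain means, a weighted index (weights invented for illustration) and an all-items threshold. The items, domains and weights are not drawn from SPA or SARA.

```python
# Hypothetical readiness items (1 = available), grouped into domains
facilities = {
    "A": {"equipment": [1, 1, 1, 1], "medicines": [1, 0], "staff": [1]},
    "B": {"equipment": [1, 0, 1, 0], "medicines": [1, 1], "staff": [1]},
    "C": {"equipment": [1, 1, 1, 1], "medicines": [1, 1], "staff": [0]},
}
# Hypothetical domain weights for the weighted index
WEIGHTS = {"equipment": 1, "medicines": 2, "staff": 1}


def item_average(domains):
    """Simple average across all items (domains with more items count more)."""
    items = [x for values in domains.values() for x in values]
    return sum(items) / len(items)


def domain_average(domains):
    """Average of domain means (each domain counts equally)."""
    return sum(sum(v) / len(v) for v in domains.values()) / len(domains)


def weighted_index(domains, weights=WEIGHTS):
    """Weighted average of domain means."""
    total = sum(weights[d] for d in domains)
    return sum(weights[d] * sum(v) / len(v) for d, v in domains.items()) / total


def meets_all_items(domains):
    """Threshold approach: ready only if every item is available."""
    return all(x == 1 for values in domains.values() for x in values)


for method in (item_average, domain_average, weighted_index):
    scores = {name: method(d) for name, d in facilities.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    print(method.__name__, {k: round(v, 2) for k, v in scores.items()}, ranking)
print("meets all items:", [n for n, d in facilities.items() if meets_all_items(d)])
```

In this toy example each averaging method produces a different ordering, and no facility passes the all-items threshold, mirroring the ranking inconsistency and threshold limitations reported above.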

Performance of provider measures

Despite the common use of SPA-derived and SARA-derived structural and process quality measures, the review found limited data explicitly assessing the association of these measures with each other and with health outcomes (n=7). Three studies, two incorporating data from multiple countries, found little association between structural quality and process quality.41 55 56 However, an analysis of SPA data from three countries found a small but significant association between antenatal care (ANC) facility structural and process quality, and suggested that structural quality can limit provider performance when basic infrastructure and commodities are unavailable.50 Akachi and Kruk emphasised the limited number of studies showing process quality associated with health outcomes.57 Two studies in Malawi found a small association between an obstetric quality index and decreased neonatal mortality58 and an association between quality-adjusted ANC nutrition intervention coverage and decreased low birth weight prevalence.42 Another found a national UHC ‘health service coverage’ index correlated strongly with infant mortality rate and life expectancy.59

Implications of design of existing household and provider data sources commonly used in linking analyses

Issues related to household and cluster location data

The way in which common household surveys, particularly the DHS and MICS, collect and process location data may also affect the validity of some linked estimates. In many household datasets used for linked analyses, the precise locations of individual households are unknown. The DHS collects central point locations for clusters, rather than household locations, and displaces these points in publicly released datasets.60 MICS often does not collect or make available geographical information system (GIS) data.61 Imprecision around household location may influence the accuracy of estimates generated by linking household and provider data based on geographical proximity.

Data on household location

The effect of using cluster central point locations rather than individual household locations in linking analyses was not addressed by any publication identified in this review. However, four studies looked at the effect of using centroids of varying areal units versus household locations in distance analyses. Two studies found using US census tract62 and zip-code63 centroid locations produced little difference in measures of facility access compared with household location. A third study showed use of areal unit centroids resulted in misclassification of household access to health-related facilities, especially in less densely populated rural areas.64 However, in rural Ghana, measures calculated from village centroids identified the same closest facility as measures from compound locations for over 85% of births.65
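Whether a centroid changes the result can be checked directly by comparing the nearest facility for each household with the nearest facility for its centroid. A minimal sketch follows, using straight-line (great-circle) distance and made-up coordinates; the studies above also considered other access measures, and the function names are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_facility(point, facilities):
    return min(facilities, key=lambda name: haversine_km(point, facilities[name]))


def centroid_agreement(households, centroid, facilities):
    """Share of households whose nearest facility is the same as the
    nearest facility to the cluster or areal-unit centroid."""
    target = nearest_facility(centroid, facilities)
    matches = sum(nearest_facility(h, facilities) == target for h in households)
    return matches / len(households)


# Hypothetical coordinates near the equator
facilities = {"F1": (0.0, 0.0), "F2": (0.0, 0.1)}
households = [(0.0, 0.01), (0.0, 0.03), (0.01, 0.02), (0.0, 0.07)]
centroid = (0.0, 0.04)
print(centroid_agreement(households, centroid, facilities))  # 0.75
```

A measure like this corresponds to the Ghana comparison above (share of births with the same closest facility under centroid and compound locations).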

Cluster displacement

Displacement of cluster central points might induce additional error in analyses based on geographical proximity. A DHS analytical report found that ignoring DHS displacement in analyses that used distance to a resource as a covariate resulted in increased bias and mean squared error. However, displacement should not affect linking by administrative unit, because since 2009 DHS has restricted displacement to within the administrative unit at which the sample is representative.60 A simulation analysis in Rwanda reported that DHS cluster displacement produced less misclassification in level of access and relative service quality than healthcare provider sampling did.66
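The effect of displacement on proximity-based linking can be explored by simulation. The sketch below moves a cluster point in a random direction by a random distance and counts how often its nearest facility changes. The displacement caps (up to 2 km for urban clusters, up to 5 km for rural clusters, with about 1% of rural clusters up to 10 km) are assumed here to follow the DHS approach and should be checked against DHS documentation; the planar km coordinates are hypothetical, and the restriction to the administrative unit is not modelled.

```python
import math
import random


def max_offset_km(urban, rng):
    """Displacement cap in km (assumed DHS-style rule; verify against DHS docs)."""
    if urban:
        return 2.0
    return 10.0 if rng.random() < 0.01 else 5.0


def displace(point, urban, rng):
    """Move a planar (x, y) point in km by a random angle and random distance."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    distance = rng.uniform(0.0, max_offset_km(urban, rng))
    return (point[0] + distance * math.cos(angle),
            point[1] + distance * math.sin(angle))


def nearest(point, facilities):
    return min(facilities, key=lambda name: math.dist(point, facilities[name]))


def reassignment_rate(point, urban, facilities, draws=10_000, seed=1):
    """Share of simulated displacements that change the nearest facility."""
    rng = random.Random(seed)
    true_nearest = nearest(point, facilities)
    changed = sum(nearest(displace(point, urban, rng), facilities) != true_nearest
                  for _ in range(draws))
    return changed / draws


# Hypothetical facilities 6 km apart; the cluster is 1 km from F1
facilities = {"F1": (0.0, 0.0), "F2": (6.0, 0.0)}
cluster = (1.0, 0.0)
print("urban:", reassignment_rate(cluster, True, facilities))
print("rural:", reassignment_rate(cluster, False, facilities))
```

Because the urban cap cannot carry this cluster past the midpoint between the two facilities, only the rural setting produces reassignments, illustrating why displacement error is likely to matter more in sparsely served rural areas.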

Issues related to provider sampling

Typical sampling designs for healthcare provider data also present issues for linking analyses. Both SPAs and SARAs are sampled independently of household surveys; thus, there may be no sampled facilities near household survey clusters.67 SPA and SARA surveys typically collect data on a sample, rather than a census, of public, private and non-governmental organisation (NGO) health facilities and exclude non-facility providers, such as pharmacies or community health workers (CHWs). In most settings, facilities are sampled and analysed to be representative of all facilities within a managing authority, level and/or geographical area, and the results of the provider assessment are not intended to represent the population using health services.67 For provider assessments that conduct direct observations of clinical care, the number and type of interactions observed within each health facility depend on patient volume and chance.

Provider sampling frame

Two papers assessed the impact of excluding non-facility providers on linked effective coverage estimates. In Zambia and Côte d’Ivoire, CHWs offered a level of care for sick children similar to that of first-level public facilities. Excluding these providers reduced estimates of effective coverage in Zambia, where CHWs were a significant source of skilled care in rural areas,33 but had little effect in Côte d’Ivoire, where they were an insignificant source of care.40 In both studies, excluding pharmacies did not alter effective coverage estimates because they were an uncommon source of care, though they offered moderate structural quality.33 40

Provider sampling design

Two publications addressed the impact of facility survey sampling designs. At the facility level, Skiles et al’s analysis demonstrated that sampling facilities, rather than using a census, led to an underestimation of the adequacy of the health service environment and substantial misclassification error in relative service environment for individual clusters.66 No studies addressed the suitability of SPA or SARA facility sampling strategies for generating stable quality estimates for use in linking analyses below the administrative level used for sampling.

A MEASURE Evaluation manual emphasised that data on provision of services (collected through observation of client–staff interactions), experience of care (collected through client exit interviews) and staff characteristics (collected through health worker interviews) are sampled independently and collected among the health workers and care interactions available on the day of the survey. These data are a subsample of the overall survey and are representative at the level at which the survey is designed to be representative, not at the facility level.67 The manual proposed multiple linked sampling approaches to capture geographically concordant household and provider data for linked analyses. While multiple studies included in this review used a census or sample of providers derived from a household sample, none implemented this approach at a national scale.

Issues related to timing of surveys used in linked coverage estimates

Both care-seeking behaviour and provider quality are likely to vary over time, and both household and provider surveys are conducted infrequently in LMICs (typically every 3–5 years). Linked coverage estimates for RMNCH may cover a long time frame, as the reference period for care-seeking in household surveys varies from 2 weeks (sick child care) to 2–5 years (peripartum care). Population movement and quality improvement efforts at facilities further complicate associations as time lags increase. The implications of linking household and provider indicators from different time periods are unclear.

Stability of provider indicators

No paper in this review specifically addressed the effect of provider indicator stability on linked effective coverage estimates. However, three linking papers presented data on the stability of some health facility indicators over time. Expanded Quality Management Using Information Power (EQUIP) studies in Uganda and Tanzania found moderate variability in the availability of some maternal and newborn health commodities and services over a period of 2–3 years.44 68 69

Stability of household indicators

Care-seeking behaviour, including overall rates of care and utilisation of different sources of care, may also change over time. Analysis of care-seeking for child illness70 and maternal healthcare71 in multiple LMICs over time showed high inconsistency in trends across countries. However, no identified studies addressed the consequences of this temporal variability within the context of linking analyses.

Impact of choice of method for combining household and provider data

The approach for combining household and provider data can potentially have a significant impact on linked coverage estimates. Methods used to link data, including exact match and various types of ecological linking, are defined in table 2. Exact match linking assigns provider information to individuals in the target population based on their specific source of care. This approach, while potentially subject to the reporting biases described previously, is considered the most precise approach for combining the two datasets in the absence of individual patient health records.6 Without data on the specific source of care, ecological linking approaches approximate care-seeking behaviour or model healthcare access by linking the target population to sources of care based on geographical proximity or administrative catchment area, making assumptions about service access and use.
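To make the distinction concrete, the sketch below treats effective coverage as the mean, over everyone in need, of the quality score of the provider used (0 for those who sought no care). This product-style formulation, the district names and all scores are simplifying assumptions for illustration, not the method of any cited study.

```python
from statistics import mean

# Hypothetical providers with a 0-1 quality (readiness) score
facilities = {
    "H1": {"unit": "north", "quality": 0.9},
    "C1": {"unit": "north", "quality": 0.5},
    "C2": {"unit": "south", "quality": 0.4},
}

# Hypothetical people in need: district of residence and reported source
# of care (None = did not seek care)
people = [
    {"unit": "north", "source": "H1"},
    {"unit": "north", "source": "H1"},
    {"unit": "north", "source": None},
    {"unit": "south", "source": "C2"},
]


def exact_match_coverage(people, facilities):
    """Each care-seeker gets the quality score of the provider they reported."""
    return mean(facilities[p["source"]]["quality"] if p["source"] else 0.0
                for p in people)


def admin_unit_coverage(people, facilities):
    """Ecological linking: each care-seeker gets the mean quality of all
    providers in their administrative unit."""
    by_unit = {}
    for f in facilities.values():
        by_unit.setdefault(f["unit"], []).append(f["quality"])
    unit_mean = {unit: mean(scores) for unit, scores in by_unit.items()}
    return mean(unit_mean[p["unit"]] if p["source"] else 0.0 for p in people)


print(round(exact_match_coverage(people, facilities), 3),
      round(admin_unit_coverage(people, facilities), 3))
```

The two estimates diverge here because care-seekers in the north use the higher-quality hospital rather than an average provider in their district; where care-seekers use providers roughly in proportion to their presence, the two approaches converge.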

Table 2

Table of linking approaches

Comparison of exact match and ecological linking methods for estimating effective coverage

Three publications explicitly compared effective coverage estimates generated using exact match and ecological linking methods (table 3).33 40 44 Estimates generated using the exact match linking approach were considered the gold-standard measure of effective coverage. All three publications found exact match linked effective coverage estimates were similar to straight-line,33 40 travel time,33 40 5 km buffer,33 10 km buffer40 and administrative unit33 40 44 geolinked estimates for antenatal,40 labour and delivery,40 44 postnatal40 and sick child33 40 care when linking was restricted by the reported provider category (eg, hospital, health centre, CHW). Distance-restricted linking approaches, such as linking to providers within a 5 km radius, produced inaccurate results if unlinked events were treated as no care.33 Restricting geographical linking to providers within the reported category of care and/or weighting by providers’ relative patient volume improved agreement between the exact match and ecological linking estimates.40 44 All three studies also used provider data obtained from a census of health facilities; therefore, the findings may not apply when household data are linked to a sample survey of health facilities.
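The linking options discussed above (restricting to the reported provider category, weighting by patient volume, a distance buffer, and the handling of unlinked events) can be combined in a single function. This is a minimal sketch with hypothetical providers in planar km coordinates; it is not the code used in the cited studies, and dropping unlinked events from the denominator is shown only as one possible alternative to treating them as no care.

```python
import math

# Hypothetical providers: planar location (km), category, quality score, patient volume
PROVIDERS = [
    {"name": "H1", "xy": (0, 0), "category": "hospital", "quality": 0.9, "volume": 300},
    {"name": "C1", "xy": (2, 0), "category": "health centre", "quality": 0.5, "volume": 100},
    {"name": "C2", "xy": (4, 0), "category": "health centre", "quality": 0.3, "volume": 50},
    {"name": "C3", "xy": (20, 0), "category": "health centre", "quality": 0.8, "volume": 80},
]


def linked_quality(cluster_xy, category, providers, buffer_km=5.0, by_volume=True):
    """Mean quality of providers of the reported category within the buffer,
    optionally weighted by patient volume; None if no provider qualifies."""
    candidates = [p for p in providers
                  if p["category"] == category
                  and math.dist(cluster_xy, p["xy"]) <= buffer_km]
    if not candidates:
        return None
    weights = [p["volume"] if by_volume else 1 for p in candidates]
    return sum(w * p["quality"] for w, p in zip(weights, candidates)) / sum(weights)


def effective_coverage(events, providers, unlinked_as_no_care, **link_options):
    """events: (cluster_xy, reported category or None) for each person in need."""
    scores = []
    for xy, category in events:
        if category is None:
            scores.append(0.0)  # sought no care
            continue
        quality = linked_quality(xy, category, providers, **link_options)
        if quality is not None:
            scores.append(quality)
        elif unlinked_as_no_care:
            scores.append(0.0)  # care was sought but no provider in range
        # otherwise the unlinked event is left out of the denominator
    return sum(scores) / len(scores) if scores else float("nan")


events = [((1, 0), "health centre"), ((1, 0), "hospital"),
          ((18, 0), "hospital"), ((1, 0), None)]
print(effective_coverage(events, PROVIDERS, unlinked_as_no_care=True))
print(effective_coverage(events, PROVIDERS, unlinked_as_no_care=False))
```

The third event sought hospital care but has no hospital within 5 km; counting it as no care lowers the estimate, which is the direction of error reported for distance-restricted linking.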

Table 3

Exact versus ecological linking estimates for select indicators across studies

Performance of measures of geographical proximity for ecological linking

Eight studies assessed the performance of geographical measures for assigning households or individuals to their reported source of healthcare. Four studies in sub-Saharan Africa compared the predicted source of care based on geographical proximity against the true source of care. They found that straight-line and road distance performed similarly,72 that the shortest travel time method performed well73 and that straight-line distance performed better than road distance.33 40 In the USA, more sophisticated approaches (two-stage and three-stage floating catchment area methods) performed better than alternative methods in assigning households to their source of care.74 Three studies in sub-Saharan Africa evaluated the use of Thiessen polygons, which define catchment boundaries so that each location falls in the catchment of its nearest known provider, for assigning households to the catchments of the facilities they used. The studies found high performance in some settings,75 but poorer performance related to the use of higher-order facilities76 and the influence of public transportation routes.77
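Thiessen (Voronoi) polygons place each location in the catchment of its nearest provider, so their performance can be checked by comparing nearest-provider assignments with reported sources of care. A minimal sketch with hypothetical coordinates follows; it uses straight-line distance only, and the third household illustrates how bypassing to a higher-order facility lowers agreement.

```python
import math


def nearest_provider(household_xy, providers):
    """With planar straight-line distance, assigning each household to its nearest
    provider is equivalent to locating it in that provider's Thiessen polygon."""
    return min(providers, key=lambda name: math.dist(household_xy, providers[name]))


def assignment_accuracy(households, providers):
    """Share of households whose predicted catchment matches the reported provider."""
    hits = sum(nearest_provider(h["xy"], providers) == h["reported"] for h in households)
    return hits / len(households)


# Hypothetical providers and households (planar km coordinates)
providers = {"HC1": (0, 0), "HC2": (10, 0), "HOSP": (5, 8)}
households = [
    {"xy": (1, 1), "reported": "HC1"},
    {"xy": (9, 0), "reported": "HC2"},
    {"xy": (4, 0), "reported": "HOSP"},  # bypasses the nearer health centre
    {"xy": (5, 6), "reported": "HOSP"},
]
print(assignment_accuracy(households, providers))  # 0.75
```

Replacing `math.dist` with road distance or travel time would give the alternative proximity measures compared above, at the cost of needing network data.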

Statistical challenges

Most linking analyses that have generated effective coverage estimates by assigning individuals the quality score of the reported or linked source of care have derived estimates of uncertainty based on household sampling error and ignored any sampling error around provider data. However, two analyses used the delta method78 for estimating the variance of effective coverage estimates generated by multiplying service use and readiness.19 44 A simulation study compared three variance estimation methods for linked effective coverage measures (household sampling error alone, parametric bootstrapping and the delta method) and found that all three performed similarly for large samples. However, the delta method produced more valid confidence bounds with smaller samples or when the effective coverage estimate approached either 0% or 100%.79
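For an estimate formed as a product, EC = U x R, of a household-survey proportion U and a provider-survey readiness score R, the first-order delta method gives Var(EC) ≈ R^2 Var(U) + U^2 Var(R) when the two surveys are sampled independently. The sketch below compares this with ignoring provider error and with a simple parametric bootstrap. The inputs are hypothetical, and the normal draws with clipping to [0, 1] are an assumption, not the specification used in the cited simulation study.

```python
import math
import random
import statistics


def delta_method_se(u, se_u, r, se_r):
    """First-order delta-method SE of u * r, assuming the household estimate u
    and the provider estimate r come from independent samples."""
    return math.sqrt((r * se_u) ** 2 + (u * se_r) ** 2)


def household_only_se(u, se_u, r):
    """SE that ignores provider sampling error by treating r as fixed."""
    return r * se_u


def parametric_bootstrap_se(u, se_u, r, se_r, draws=20_000, seed=7):
    """SE from normal draws of each input, clipped to [0, 1] (an assumption)."""
    rng = random.Random(seed)

    def clip(x):
        return min(1.0, max(0.0, x))

    products = [clip(rng.gauss(u, se_u)) * clip(rng.gauss(r, se_r))
                for _ in range(draws)]
    return statistics.stdev(products)


# Hypothetical inputs: 80% used services, mean readiness score 0.6
u, se_u, r, se_r = 0.8, 0.02, 0.6, 0.05
ec = u * r
for label, se in [("household only", household_only_se(u, se_u, r)),
                  ("delta method", delta_method_se(u, se_u, r, se_r)),
                  ("bootstrap", parametric_bootstrap_se(u, se_u, r, se_r))]:
    print(f"{label}: EC = {ec:.3f}, 95% CI {ec - 1.96 * se:.3f} to {ec + 1.96 * se:.3f}")
```

With these inputs most of the uncertainty comes from the provider estimate, so ignoring it produces an interval several times too narrow; the bootstrap and delta method agree closely, consistent with their similar performance in large samples.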

Discussion

The number of publications addressing each of the diverse methodological issues related to linking household and provider datasets varied considerably. A summary of key findings and needs for further research is presented in table 4 and discussed below.

Table 4

Summary of evidence related to methodological issues for linking analyses and related needs for future research

Suitability of household and provider data for linking analyses

We identified several papers that critically assessed the household and provider data needed for linking analyses. The limited existing data on respondent-reported care-seeking suggest that respondents can identify their source of care, though not necessarily the cadre of the individual healthcare worker. Additional validation in other settings and service areas, such as postnatal care, would be informative. Further, the categorisation of sources of care in household surveys must align with the categories used in provider assessments for the datasets to be linked. The validity of household survey data for estimating populations in need was more variable. Some populations in need can be clearly defined. Others have shown potential for bias, particularly those identified through symptom-based diagnoses derived from respondent report. Additional work is needed to explore alternative methods for identifying populations in need of interventions within population-based data sources.
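A crosswalk between the source-of-care categories in a household questionnaire and the strata of a provider assessment can be checked mechanically before linking. The sketch below uses hypothetical category labels and stratum names. It maps each respondent's reported source and flags any category with no counterpart in the provider data, since those respondents cannot be linked.

```python
from typing import Dict, Set, Tuple

# Hypothetical crosswalk from household-survey response labels (lower-cased)
# to the strata used in a provider assessment.
SURVEY_TO_STRATUM: Dict[str, str] = {
    "government hospital": "public_hospital",
    "government health centre": "public_health_centre",
    "government health post": "public_health_post",
    "private hospital/clinic": "private_facility",
}


def harmonise_sources(reported: Dict[str, str],
                      crosswalk: Dict[str, str]) -> Tuple[Dict[str, str], Set[str]]:
    """Map each respondent's reported source to a provider stratum.

    Returns the mapped respondents and the set of reported categories that have
    no counterpart in the provider data (these respondents cannot be linked).
    """
    mapped: Dict[str, str] = {}
    unmapped: Set[str] = set()
    for respondent_id, source in reported.items():
        stratum = crosswalk.get(source.strip().lower())
        if stratum is None:
            unmapped.add(source)
        else:
            mapped[respondent_id] = stratum
    return mapped, unmapped


# Hypothetical responses; "Shop" has no provider-assessment stratum.
REPORTED = {"r1": "Government health centre", "r2": "Shop", "r3": "Private hospital/clinic"}

if __name__ == "__main__":
    print(harmonise_sources(REPORTED, SURVEY_TO_STRATUM))
```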

The content and construction of provider quality indices were highly variable across publications. The indices were largely derived from facility surveys and informed by international guidelines or recommendations. Methods for collecting provider quality data have a number of limitations, and no single method captures all aspects of care.80 The review found a lack of agreement between quality measures derived through different collection methods. Overall, there was little empirical evidence supporting an association between structural quality and process quality. Evidence was similarly scarce for an association between measures of quality and appropriate care or good health outcomes, although very few of the reviewed studies examined these relationships. However, as articulated by Nguhiu et al, there is a need to consider quality indicators’ ‘intrinsic value as levers for management action’ and their application to policy decision making, in addition to their ability to capture or predict associated health gain.38 Many important indicators of healthcare quality, particularly around patient-centred care, are not currently measured by existing tools, and there is a need to better capture these indicators.81 82 In the short term, additional research is needed to develop and evaluate new quality indices using existing data sources (eg, facility surveys, HMIS and medical records). The aim should be to identify a standardised approach for selecting, combining and interpreting indicators that reflect the aspects of provider quality necessary for delivering appropriate, respectful and effective care. In the longer term, substantial effort is needed to strengthen or adapt existing mechanisms, or to develop alternative methods, for collecting provider quality indicators that can produce timely and informative estimates for tracking effective coverage of key interventions.
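To illustrate why the construction of quality indices matters, the sketch below scores one hypothetical facility in two ways. The first is the unweighted mean of domain scores, where each domain score is the share of tracer items present. This approach is similar in spirit to the WHO Service Availability and Readiness Assessment readiness index. The second is the share of all items present, pooled across domains. The domains and items are invented for illustration.

```python
from statistics import mean
from typing import Dict

Inventory = Dict[str, Dict[str, bool]]  # domain -> {tracer item: present?}


def domain_mean_index(inventory: Inventory) -> float:
    """Unweighted mean of domain scores; each domain score is the share of its items present.

    Empty domains are skipped; an inventory with no items raises StatisticsError.
    """
    domain_scores = [mean(1.0 if present else 0.0 for present in items.values())
                     for items in inventory.values() if items]
    return mean(domain_scores)


def pooled_item_share(inventory: Inventory) -> float:
    """Share of all tracer items present, ignoring domain structure (assumes at least one item)."""
    flags = [present for items in inventory.values() for present in items.values()]
    return sum(flags) / len(flags)


# Hypothetical facility inventory (domains and items are illustrative only).
FACILITY: Inventory = {
    "basic amenities": {"power": True, "water": True, "toilet": False, "communication": True},
    "basic equipment": {"scale": True, "thermometer": True, "blood pressure apparatus": False},
    "essential medicines": {"amoxicillin": True, "oral rehydration salts": False},
}

if __name__ == "__main__":
    print(round(domain_mean_index(FACILITY), 3), round(pooled_item_share(FACILITY), 3))
```

With these inputs, the domain-mean index is about 0.64 and the pooled share is about 0.67. The same inventory therefore yields different scores, so comparisons across studies depend on the aggregation rule as well as on the items selected.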

Implications of the design of existing household and provider data sources commonly used in linking analyses

Few studies addressed the influence of the design of common data sources on linking analyses, including imprecise household GIS data, provider sampling frame and sampling design, and estimate stability. Those that did provided little concrete evidence on how these factors affect linked effective coverage estimates. Explicitly evaluating the impact of imprecise household location, sampling design and temporal gaps between measures within the context of effective coverage estimation would be informative. Results on the inclusion of non-facility providers in provider assessments for effective coverage estimation were mixed. This emphasises the need to empirically assess the utilisation and service quality of non-facility providers in a given setting before conducting a linking analysis, as the quality and use of these providers vary by health area and setting.70 71 83 Data on the impact on effective coverage estimation were limited. Nevertheless, small samples of client-staff observations, sampling of health workers and facilities, and temporal gaps between household and provider data have the potential to bias estimates. The available data suggest that developing and testing alternative approaches to sampling health providers could improve the validity of linked estimates of effective coverage. Candidates include the joint sampling approaches proposed by MEASURE Evaluation67 or used by other data collection mechanisms, such as Performance Monitoring for Action (PMA) and the India District Level Household and Facility Survey.

Impact of choice of method for combining household and provider data

The most consistent evidence found through the review concerned methods for combining datasets. Three papers compared ecological and exact match linking. They reported that ecological linking produced similar estimates to exact match linking when it accounted for the frequency of provider utilisation by type. This agreement across all three publications is promising. Exact match linking is considered the most precise method for generating linked estimates. However, ecological linking is often more feasible because it does not require information on the exact source of care or data on all providers. The papers further point to the need to capture data on the type of provider from which care was sought, or on the relative volume of patients seen by providers, in order to generate valid estimates of effective coverage. All three studies were conducted in rural sub-Saharan Africa in settings with high utilisation of public sector health facilities. Additional studies evaluating the performance of these methods in settings with a more diverse healthcare landscape would be informative. Other papers evaluated ecological linking approaches and found similar estimates of access to care or effective coverage using different approaches for assessing geographical proximity. However, the ability of these methods to capture the true source of care was more variable. Outside this review, additional data suggest that individuals do not always use the closest source of care and may bypass providers in favour of those offering better care.37 84 85 Together with the analyses comparing exact match and ecological linking approaches, these findings emphasise the need to select methods for ecological linking carefully. They also highlight the need to reflect true care-seeking behaviour as closely as possible, either by accounting for the type of provider from which care was sought or by weighting by utilisation in linking analyses.
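The sketch below contrasts exact match and ecological linking on toy data for a single hypothetical catchment area. Exact match linking assigns each care-seeker the quality score of the specific provider they used. The ecological version instead assigns the mean score of assessed providers of the type they reported, optionally weighted by provider client volume as a stand-in for utilisation. In both, individuals in need who did not use care contribute zero. The field names, scores and volumes are hypothetical, and real analyses would typically compute area-level means across many catchments.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional


def exact_match_ec(people: List[dict], quality: Dict[str, float]) -> float:
    """Mean over people in need of the quality score of the specific provider used (0 if no care)."""
    in_need = [p for p in people if p["in_need"]]
    return mean(quality[p["provider_id"]] if p["used_care"] else 0.0 for p in in_need)


def type_mean_quality(quality: Dict[str, float], provider_type: Dict[str, str],
                      volume: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Mean quality by provider type, optionally weighted by client volume."""
    totals: Dict[str, float] = defaultdict(float)
    weights: Dict[str, float] = defaultdict(float)
    for pid, score in quality.items():
        w = 1.0 if volume is None else volume[pid]
        totals[provider_type[pid]] += w * score
        weights[provider_type[pid]] += w
    return {t: totals[t] / weights[t] for t in totals}


def ecological_ec(people: List[dict], quality: Dict[str, float],
                  provider_type: Dict[str, str],
                  volume: Optional[Dict[str, float]] = None) -> float:
    """Like exact_match_ec, but uses only the reported provider type, not the provider ID."""
    by_type = type_mean_quality(quality, provider_type, volume)
    in_need = [p for p in people if p["in_need"]]
    return mean(by_type[p["provider_type"]] if p["used_care"] else 0.0 for p in in_need)


# Hypothetical assessed providers in one catchment area.
QUALITY = {"HC1": 0.8, "HC2": 0.6, "HOSP1": 0.9}
TYPE = {"HC1": "health_centre", "HC2": "health_centre", "HOSP1": "hospital"}
VOLUME = {"HC1": 300, "HC2": 100, "HOSP1": 200}

# Hypothetical survey respondents; provider_id is only needed for exact matching.
PEOPLE = [
    {"in_need": True, "used_care": True, "provider_id": "HC1", "provider_type": "health_centre"},
    {"in_need": True, "used_care": True, "provider_id": "HC1", "provider_type": "health_centre"},
    {"in_need": True, "used_care": True, "provider_id": "HC2", "provider_type": "health_centre"},
    {"in_need": True, "used_care": True, "provider_id": "HOSP1", "provider_type": "hospital"},
    {"in_need": True, "used_care": False, "provider_id": None, "provider_type": None},
    {"in_need": False, "used_care": True, "provider_id": "HC2", "provider_type": "health_centre"},
]

if __name__ == "__main__":
    print("exact match:", round(exact_match_ec(PEOPLE, QUALITY), 3))
    print("ecological:", round(ecological_ec(PEOPLE, QUALITY, TYPE), 3))
    print("ecological, volume-weighted:", round(ecological_ec(PEOPLE, QUALITY, TYPE, VOLUME), 3))
```

With these inputs, exact matching gives 0.62, unweighted ecological linking gives 0.60 and volume-weighted ecological linking gives 0.63. Weighting moves the ecological estimate towards the higher-volume provider that most respondents actually used.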
There is also a need to further develop approaches and tools for estimating uncertainty around linked effective coverage estimates.

Evidence across the review demonstrates the need for careful choice of methods, data sources and indicators when conducting studies or analyses to link household and provider data for effective coverage estimation. An exploration of the precise effect of setting characteristics, such as variation in provider quality, on effective coverage estimates is needed to guide decision making in the selection of linking methods. Once more of these issues have been evaluated, additional tools and guidance to facilitate use of these methods will be needed.

The review was limited by the diversity of terminology and fields related to the linking methodology. However, the use of multiple search strategies minimised the likelihood of overlooking relevant publications. No formal grading of publication quality was included in the assessment, but the choice to conduct the search through Medline was intended to ensure a basic level of quality across the diverse study designs included in the review. Additionally, the diversity of fields, approaches and questions made it difficult to summarise the findings neatly, emphasising the need for communication between researchers, more standard terminology, and, ideally, a cohesive research strategy going forward. Recent efforts have aimed to align definitions of effective coverage.2 We attempt in table 4 to translate the review results into actionable items and needs for future research.

Conclusions

Linking household and healthcare provider data is a promising approach that leverages existing data sources to generate more informative estimates of intervention coverage and care. These methods can potentially address limitations of both household and provider surveys to generate population-based estimates that reflect not only use of services, but also the content and quality of care received and the potential for health benefit. However, there is a need for additional research to develop evidence-based, standardised best practices for these analyses. The most pressing priorities identified in this review are: (1) for those collecting data from health systems to explore methods to strengthen existing provider data collection mechanisms and promote temporal and geographical alignment with population-based measures, (2) for those collecting population-based data to address the validity of self-reported intervention need and ensure indicators of access and utilisation of care are measured to facilitate linking analyses and (3) for those conducting linked analyses to standardise approaches for generating and interpreting effective coverage indicators, including sources of uncertainty. Meeting these priorities will help ensure we are producing evidence that is harmonised, informative and actionable for governments and valid for monitoring population health globally.

Data availability statement

Data sharing not applicable as no datasets generated and/or analysed for this study. As a review article, this article reports data from previously published studies.

Ethics statements

Patient consent for publication


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • Contributors ICMJE criteria for authorship read and met: EC, HL, TM, AA and MKM. Conceived of the study design: EC, MKM. Conducted review: EC. Drafted paper: EC. Agree with manuscript and conclusions: EC, HL, TM, AA and MKM. All authors read, edited and approved the manuscript.

  • Funding This work was supported, in whole, by the Improving Measurement and Program Design (IMPROVE) grant (OPP1172551) from the Bill & Melinda Gates Foundation. The funders did not have any role in the design of the study and collection, analysis, and interpretation of data or in writing the manuscript.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.