Development of summary indices of antenatal care service quality in Haiti, Malawi and Tanzania



Abstract
Introduction Measuring quality of care in low-income and middle-income countries is complicated by the lack of a standard, universally accepted definition for 'quality' for any particular service, as well as limited guidance on which indicators to include in measures of quality of care, and how to incorporate those indicators into summary indices. The aim of this paper is to develop, characterise and compare a set of antenatal care (ANC) indices for facility readiness and provision of care. Methods We created nine indices for facility readiness using three methods for selecting items and three methods for combining items. In addition, we created three indices for provision of care using one method for selecting items and three methods for combining items. For each index, we calculated descriptive statistics, categorised the continuous index scores using tercile cut points to assess comparability of facility classification, and examined the variability and distribution of scores. Results Our results showed that, within a country, the indices were quite similar in terms of mean index score, facility classification, coefficient of variation, floor and ceiling effects, and the inclusion of items in an index with a range of variability. Notably, the indices created using principal components analysis to combine the items were the most different from the other indices. In addition, the index created by taking a weighted average of a core set of items had lower agreement with the other indices when looking at facility classification.
Conclusions As improving quality of care becomes integral to global efforts to produce better health outcomes, demand for guidance on creating standardised measures of service quality will grow. This study provides health systems researchers with a comparison of methodologies commonly used to create summary indices of ANC service quality and it highlights the similarities and differences between methods.

Introduction
Reducing maternal morbidity and mortality has been a global priority over the last several decades. While progress has been made in both improving coverage and reducing mortality, the number of deaths of women globally remains unacceptably high. 1 In many countries, considerable increases in the coverage of health services have not translated into sufficient reductions in mortality. 2 This finding suggests that quality of care may play a critical role in producing better health outcomes. Reducing maternal mortality will require increasing the coverage of health interventions and ensuring that those interventions are delivered with a high level of quality. 3 4 These interventions include intrapartum and postpartum care, and antenatal care (ANC), which is important for maintaining a healthy pregnancy, [5][6][7] promotes safe delivery and postnatal attendance, and is positively associated with facility-based deliveries. [8][9][10] Low-income and middle-income countries (LMICs) are increasingly shifting focus towards improving the quality of health services delivered, which requires measuring quality of care and monitoring progress. However, for LMICs to systematically and comprehensively measure quality of care, they require a definition of quality of care and clear, specific indicators to operationalise that definition.
Many quality of care frameworks exist, most of which build from Donabedian's 1988 framework characterising quality at three levels: structure, commonly called facility readiness (the setting in which care occurs, including material resources, human resources and organisational structure); process, commonly called provision of care (the quality of medical advice delivered by providers to clients, as well as interpersonal relationships between the provider and the client); and outcome (the effects of care on the health status and behaviours of patients, as well as improvements in patient knowledge and the degree of satisfaction with care). 11 12 While there may be growing consensus on the core components of a framework for describing quality of care, measuring quality of care is complicated by the lack of a standard, universally accepted definition of 'quality' for any one particular service. [13][14][15] In addition, there is limited guidance on which indicators to include in measures of quality of care and how to incorporate those indicators into summary indices.
Studies on quality of care in LMICs largely rely on data from health facility surveys that collect data on facility readiness and provision of care. These studies often use summary indices or composite scores to provide an overall description of service quality. [16][17][18][19][20][21][22][23][24][25][26] Summary scores are useful to LMIC governments in that they simplify complex data and enable comparison of performance within facilities, across administrative units and over time. However, there is little consistency between studies in terms of the items included in summary metrics of quality of care, or in how the metrics are created. A scoping review published in 2018 identified numerous studies that used health facility survey data to assess maternal and newborn quality of care in LMICs and found that studies used various approaches to create quality of care metrics. 27 Online supplementary table 1 illustrates this with an excerpt of studies that used data from the Service Provision Assessment (SPA) to create quality of care metrics for ANC specifically, each using a different approach. Item selection was guided by different sources, including the Donabedian quality of care framework, 17 28 clinical guidelines 18 19 29 30 and various definitions of quality of care. 20 21 The number of items included in an index varied from a minimum of 7 to a maximum of 40, and various methods were used to combine items into indices, including simple and weighted additive approaches 20 22-26 as well as principal components analysis (PCA). 28 31-33 While many of these studies overlapped somewhat in the items they selected, the resulting quality of care indices varied greatly both in the items included and in the methodology used to combine them into a summary measure.
As quality of care indices are increasingly used by countries, global agencies and researchers to quantify the quality of health services and estimate effective coverage, there is a need to standardise the methodology for creating these measures. To our knowledge, no studies have compared the methodologies used to create indices of ANC service quality. The objective of this paper is to develop, characterise and compare a set of facility readiness and provision of care indices for ANC.

Methods

Overall approach
Using data from an expert survey, we created nine facility readiness indices using three methods for selecting items (a 'core set' of items, 'expert survey' set of items and 'maximum' set of items) and three methods for combining items (simple additive, weighted additive and PCA) (figure 1). We created three indices for provision of care using one method for selecting items ('expert survey' set of items) and three methods for combining items (simple additive, weighted additive and PCA) (figure 1). Data for these indices come from the SPA. We then compared the indices and examined the variability and distribution of the index scores.

Data source
The SPA is a health facility assessment used in LMICs to generate nationally representative data on health service delivery. 34 While there are a number of widely used health facility assessment tools, we chose to use SPA data for this analysis as the SPA has been conducted in a number of different countries, includes observation of clinical care, and the data are publicly available through the Demographic and Health Survey (DHS) programme. The SPA includes a standard set of survey instruments: a facility inventory questionnaire, health worker interviews, observation of ANC consultations and exit interviews with ANC clients.
We examined all SPA surveys for inclusion in the analysis (total of 16). We included all SPA surveys which used the DHS-VI or DHS-VII questionnaire (four surveys excluded), included observations of ANC consultations (three surveys excluded), were conducted in the last 5 years (between 2013 and 2018) (two surveys excluded), and for which recode data files were available (four surveys excluded). The included surveys are from Haiti (2013), Malawi (2013/2014) and Tanzania (2014/2015). The SPA final country reports contain comprehensive information on the survey methodology and questionnaires. [35][36][37]

Briefly described, the survey in Tanzania was a nationally representative sample of health facilities. Health facilities were selected using stratified systematic probability sampling with stratification by region and facility type (with oversampling of some facility types such as hospitals). In Haiti and Malawi, the survey was a census of all health facilities in the country. In all three countries, all surveyed facilities completed the facility inventory module. In addition, within each facility, up to eight health workers were interviewed, including all health workers whose consultations were observed and those who provided information for any section of the facility questionnaire. ANC clients were selected for observation using systematic sampling, based on the number of clients present at each service site on the day of the visit. At facilities where the number of ANC clients expected on the day of the survey could not be predetermined, the sample was opportunistic since clients were selected as they arrived. Observation was completed for a minimum of five clients per service provider, with a maximum of 15 observations in any given facility for each service. Client exit interviews were conducted with every client whose visit was observed.

Item selection
To identify elements of facility readiness and the provision of ANC considered by experts to be the most important to ANC service quality, we reviewed the Service Availability and Readiness Assessment (SARA) indicators, World Health Organization (WHO) Focused Antenatal Care (FANC) guidelines and WHO recommendations on ANC for a positive pregnancy experience, as well as the SPA questionnaire. This process identified a total of 121 items organised according to the dimensions of quality of care proposed by the WHO Quality of Care Framework for maternal and newborn health: essential physical resources (41 items), competent and motivated human resources (four items), provision of care (65 items) and experience of care (11 items). 38 We then conducted a survey of 50 maternal health experts who had experience working in LMICs (questionnaire available as an online supplementary file). Respondents were asked to rate each item based on overall importance. Importance ratings ranged from one (item was unimportant) to four (item was essential). Experts also provided a list of items they felt were important for delivering high-quality ANC services, but were missing from the survey instrument. Fifteen maternal health experts completed the survey, with respondents representing academic institutions, donor agencies, United Nations agencies, global health implementing organisations and Ministries of Health in LMICs.
Items that were not collected across SPA surveys in all three countries and items that are not required for a first ANC visit were excluded from all indices. For each readiness item, a binary score was created based on whether the criteria for availability were met on the day of the survey (1=available, 0=not available). For human resources items, each item is a proportion ranging from 0 to 1. For provision of care items, a binary score was created based on whether or not the activity was conducted during the ANC visit (1=conducted, 0=not conducted) or whether or not the client had a problem with the item during the ANC visit (1=no problem, 0=major or minor problem). The list of items included in each readiness and provision of care index can be found in online supplementary tables 2 and 3.
Three methods were used to select items to include in the facility readiness indices. The first method identified the core set of readiness items required to deliver ANC services. This set of items was identified by reviewing the provision of care items required for an ANC visit based on WHO FANC guidelines and WHO recommendations on ANC for a positive pregnancy experience, and by determining the human resources, equipment and supplies, medicines, and diagnostics required to deliver each specific item. In creating the core set of items for facility readiness, we mapped each provision of care item to the facility readiness items required to deliver the specific service component. We found that of the 49 provision of care items, 36 items required only human resources and 13 items required human resources plus equipment, diagnostics, medicines or basic amenities. The core index did not include standard precautions for infection prevention items because these are not explicitly required for any one provision of care item. A total of 21 items were selected for the core set readiness index.
The second method used results from the expert survey to identify the set of readiness items maternal health experts identified as essential to deliver ANC services. Mean ratings were calculated for each readiness item (see online supplementary table 4 for results). Items rated by the expert group with a mean importance of >3.4 out of 4 were selected for the expert survey index. The threshold for inclusion was determined by examining the distribution of scores and identifying a natural break point which separated the top-rated items from the rest. In addition, items were selected so that at least one item per domain (human resources, equipment and supplies, diagnostics, medicines, basic amenities) was included in the index. A total of 19 items were selected, representing 42% of the total items for the expert survey readiness index.
The third method identified the maximum set of readiness items used to deliver ANC services. This index included all items identified in the SPA related to ANC readiness across the following domains: human resources, equipment and supplies, medicines, diagnostics and basic amenities. Of the 45 facility readiness items identified for inclusion in the expert survey, a total of 38 items were selected for the maximum set readiness index. Seven items from the expert survey were not included in the maximum set of readiness items as data on these items were not collected in the SPA.

We used a single method to select items to include in the provision of care index. This method used results from the expert survey to select the set of provision of care items maternal health experts identified as essential to deliver ANC services. We chose this method because maternal health experts were the best source for determining which processes are essential to high-quality ANC consultations. In addition, the experts selected most items as very important or essential, and therefore it was not appropriate to define a core and maximum set of items. Mean ratings were calculated for each provision of care item (see online supplementary table 4 for results). Items rated by the expert group with a mean importance of ≥3.0 out of 4 were selected for the provision of care index. The threshold for inclusion was decided by examining the distribution of scores, which was bimodal, and selecting all items in the first mode. The expert survey respondents mentioned a number of provision of care items, largely related to experience of care, that they felt were essential for high-quality care but were missing from our survey and from SPA datasets. A total of 49 items, representing 64% of the total items, were selected for the provision of care index.

Item combination
Three methods were used to combine items to create the facility readiness and provision of care indices-simple additive, weighted additive and PCA.

Simple additive
The simple additive index score was calculated as the sum of the items available in the facility. We transformed the index into a score out of 100 by dividing by the total number of items in the index and multiplying by 100. The simple additive index weighted all items in the index equally.
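The calculation above can be sketched as follows. This is an illustrative Python sketch (the paper's analysis was carried out in R), and the facility's item values are invented for the example.

```python
# Each readiness or provision of care item is scored 1 (available/conducted)
# or 0 (not available/not conducted), as described under 'Item selection'.
def simple_additive_score(items):
    """Sum of available items, rescaled to 0-100; all items weighted equally."""
    return 100.0 * sum(items) / len(items)

facility = [1, 0, 1, 1, 0, 1, 1, 1]      # 8 binary item scores for one facility
score = simple_additive_score(facility)  # 6 of 8 items available -> 75.0
```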

Weighted additive
The weighted additive index was also calculated as a sum of items, but instead of assuming equal weights for all items, the weighted additive index accounted for the number of items within each domain. Readiness items were first grouped into five domains: human resources, equipment and supplies, medicines, diagnostics, and basic amenities. For the provision of care index, items were initially grouped into five domains: history and counselling, examination, diagnostics, preventative treatment and client experience. We then computed a domain score by adding the items within each domain and dividing by the total number of items in the domain. Finally, we transformed the index into a score out of 100 by averaging the domain scores and multiplying by 100.
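The domain-weighted calculation can be sketched as follows (illustrative Python, not the authors' R code; the domain names follow the paper, but the item counts per domain are hypothetical).

```python
def weighted_additive_score(domains):
    """Average of per-domain proportions, rescaled to 0-100.

    `domains` maps a domain name to a facility's list of binary item scores.
    Each domain contributes equally, regardless of how many items it holds.
    """
    domain_scores = [sum(items) / len(items) for items in domains.values()]
    return 100.0 * sum(domain_scores) / len(domain_scores)

facility = {
    "human resources": [1],            # 1 item  -> domain score 1.0
    "equipment and supplies": [1, 0],  # 2 items -> 0.5
    "medicines": [0, 1, 1, 0],         # 4 items -> 0.5
    "diagnostics": [1, 1],             #         -> 1.0
    "basic amenities": [0, 1],         #         -> 0.5
}
score = weighted_additive_score(facility)  # (1.0+0.5+0.5+1.0+0.5)/5 * 100 = 70.0
```

Note how the single human resources item carries as much weight as the four medicines items combined, which is the sense in which this method weights items unequally.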

Principal components analysis
To create the PCA index score, we conducted an unrotated, unweighted PCA using a correlation matrix and used the factor loadings from the first principal component to create the index score. We rescaled the score obtained from the first component of the PCA to a range of 0-100 for comparability with other indices.
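A minimal sketch of this procedure is below (Python with NumPy for illustration; the paper's analysis used R, and the facility data here are simulated).

```python
import numpy as np

def pca_index(X):
    """Score facilities on the first principal component of the item
    correlation matrix, min-max rescaled to 0-100."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise items
    corr = np.corrcoef(Z, rowvar=False)       # item correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending eigenvalues
    loadings = eigvecs[:, -1]                 # first (largest-variance) component
    if loadings.sum() < 0:                    # fix the arbitrary sign of the eigenvector
        loadings = -loadings
    raw = Z @ loadings                        # facility scores on the first component
    return 100.0 * (raw - raw.min()) / (raw.max() - raw.min())

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(40, 10)).astype(float)  # 40 facilities, 10 binary items
scores = pca_index(X)                                # each facility scored 0-100
```

Because the loadings are derived from the correlations in each dataset, the same item can receive a different (even negative) weight in different countries, a property discussed further in the Results and Discussion.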

Analysis
We limited the readiness analysis to facilities offering ANC services and with at least one first visit ANC client observation. In order to standardise expected clinical actions, we limited the provision of care analysis to women attending the health facility for their first ANC visit. Finally, we excluded cases with incomplete data for any variables of interest.
We calculated descriptive statistics on each of the indices, including mean, median, minimum, maximum and range. The elements of the complex survey design (weights and clustering of observations) were not incorporated into the analysis, as the goal of this analysis was not to make inferences about the entire population from which the sample of health facilities was drawn. In addition, for the indices created using PCA to combine items, we calculated factor loadings, the eigenvalue and the per cent of total variance explained by the first component. Since the PCA method was used to create weights for a composite variable rather than to estimate a latent variable, we did not report the number of components with an eigenvalue greater than one.

Next, to compare scores from the nine facility readiness indices, we categorised the continuous index scores into low, medium and high readiness categories using tercile cut points. We then assessed the comparability of the facility classification across the indices by calculating, for each combination of indices, the per cent agreement and Cohen's kappa, which accounts for the possibility of agreement occurring by chance. Cohen's kappa ranges from −1 to 1 and can be interpreted as follows: <0 as no agreement, 0-0.20 as slight, 0.21-0.40 as fair, 0.41-0.60 as moderate, 0.61-0.80 as good and 0.81-1 as almost perfect agreement. 39

Next, we examined the variability and distribution of the index scores. As countries are interested in being able to compare facilities with each other to identify better and worse performers (even if they are all within the low-quality or high-quality band), we are interested in capturing that variation in an index score. The examination of variability and distribution of the index scores aims to understand the level of variability being captured, but does not indicate that an index is necessarily 'better' because it captures more variability.
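The classification and agreement steps can be sketched as follows (illustrative Python rather than the R used for the analysis; the two index scores are simulated so that the second is a noisy version of the first).

```python
import numpy as np

def terciles(scores):
    """Classify continuous index scores as low (0), medium (1) or high (2)
    using tercile cut points."""
    low, high = np.quantile(scores, [1 / 3, 2 / 3])
    return np.where(scores <= low, 0, np.where(scores <= high, 1, 2))

def cohens_kappa(a, b):
    """Agreement between two categorical classifications, corrected for chance."""
    a, b = np.asarray(a), np.asarray(b)
    cats = np.union1d(a, b)
    po = np.mean(a == b)                                       # observed agreement
    pe = sum(np.mean(a == c) * np.mean(b == c) for c in cats)  # chance agreement
    return (po - pe) / (1 - pe)

rng = np.random.default_rng(1)
idx1 = rng.uniform(0, 100, 90)            # one index over 90 facilities
idx2 = idx1 + rng.normal(0, 5, 90)        # a second, closely related index
kappa = cohens_kappa(terciles(idx1), terciles(idx2))
```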
The coefficient of variation (CoV) was used to measure the variation captured by each index, with a higher CoV indicative of more variation captured in the index. Distribution of the scores was examined by assessment of floor and ceiling effects. Floor and ceiling effects were considered to be present if more than 15% of facilities achieved the lowest or highest possible index score (0% or 100%), respectively. 40 While in some settings it may be possible that greater than 15% of facilities are at 0% or 100%, the presence of floor and ceiling effects may indicate that indices have limited ability to differentiate facilities at very low and very high readiness levels. Finally, each index was assessed in terms of inclusion of items with a range of variability. For each index, we tallied the number of items that were available in <30% of facilities, <40% of facilities and >90% of facilities.
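These two checks can be sketched as follows (illustrative Python; the index scores are invented, and the 15% threshold follows the criterion above).

```python
import numpy as np

def coefficient_of_variation(scores):
    """CoV as a percentage: higher values indicate more variation captured."""
    scores = np.asarray(scores, dtype=float)
    return 100.0 * scores.std() / scores.mean()

def floor_ceiling_effects(scores, threshold=0.15):
    """Flag floor/ceiling effects: more than 15% of facilities at the minimum (0)
    or maximum (100) possible index score."""
    scores = np.asarray(scores, dtype=float)
    return {
        "floor": np.mean(scores == 0.0) > threshold,
        "ceiling": np.mean(scores == 100.0) > threshold,
    }

index_scores = [5.0, 55.0, 60.0, 65.0, 100.0, 100.0]
cov = coefficient_of_variation(index_scores)
effects = floor_ceiling_effects(index_scores)  # 2/6 of facilities at 100 -> ceiling effect
```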
All statistical analyses were carried out using R V.3.5.1. 41

Patient and public involvement
It was not appropriate or possible to involve patients or the public in this work.

Results

Item selection
The results of the expert survey are available in online supplementary table 4. There was some alignment between the core set of readiness items and the expert set of readiness items. The core readiness index included 21 items, the expert readiness index included 19 items and the overlap between the two indices was 14 items. Of the 45 facility readiness items identified for inclusion in the expert survey, only four were related to human resources and of those, only one was highly rated by the expert group as essential for high-quality service delivery. In addition, the expert survey respondents mentioned a number of items that they felt were essential for high-quality care but were missing from our survey and from SPA datasets. These items included: ensuring that clients are treated with respect and without discrimination, having the option for clients to invite their partner to participate in the consultation, discussing identification of a birth companion of choice with the client, including family members in counselling sessions, providing counselling on newborn care practices and ensuring clients understand the counselling information they receive.

Descriptive statistics
The mean readiness score varied by index from 56.6 to 64.1 in Haiti, 52.3 to 69.6 in Malawi and 61.2 to 75.9 in Tanzania (table 1, figure 2). Across all countries, there were no indices for which any facilities received the minimum score of 0, and all countries had at least three indices for which some facilities received the maximum score of 100. In addition, in each country, the three indices that contained the same items, but used different methodologies for combining the items, generally had a similar median. For example, the core simple, core weighted and core PCA median scores in Tanzania were 73.5, 72.1 and 73.9, respectively. Finally, in general, the indices created using PCA resulted in a larger IQR, particularly for the core and expert item indices, across all three countries.

The mean provision of care score varied by index from 38.9 to 85.9 in Haiti, 35.8 to 53.1 in Malawi and 40.4 to 58.0 in Tanzania (table 1, figure 3). The expert PCA index in Haiti was the only index in which any clients received the minimum score of 0, and no countries had an index for which any clients received the maximum score of 100. The expert simple and expert weighted indices had similar median provision of care scores; however, the expert weighted index scores were higher than the simple index scores in all three countries. The expert PCA index had median provision of care scores that were very different from both the expert simple and expert weighted scores. For example, in Malawi, the expert simple median provision of care score was 46.9, the expert weighted median was 53.7 and the expert PCA median was 33.9.
The results from the PCA are presented in table 2. Across all countries and all indices for facility readiness, the items with the highest loadings (absolute value greater than 0.4) differed across the item sets. For the core item indices, the items that loaded the highest were related to diagnostics and human resources. For the expert item indices, the items that loaded the highest were related to diagnostics. For the maximum item indices, the items that loaded the highest were related to diagnostics and basic amenities. For all facility readiness indices in all countries, the per cent of variance explained by the first principal component was low, ranging from 17.21% for the core and maximum indices in Haiti to 19.49% and 20.38% for the expert set of items in Tanzania and Haiti, respectively. In addition, we found that some items, such as tetanus toxoid vaccine, deworming medications and syphilis testing, had negative loadings.

Across all countries for provision of care, the items with the highest loadings (absolute value greater than 0.4) differed by country. For Haiti, the items that loaded the highest were related to client experience. For Malawi and Tanzania, the items that loaded the highest were related to history taking and counselling, although for each country, the highest loadings were largely on different items. For provision of care indices in all countries, the per cent of variance explained by the first principal component was low, ranging from 9.89% in Malawi to 10.49% in Haiti. In addition, we found that items such as history taking, client education and counselling on danger signs in pregnancy, and haemoglobin testing had negative loadings.

Table 3 presents the results of the per cent agreement and Cohen's kappa coefficient among the nine facility readiness indices. Across all countries, all readiness indices had fair or better agreement.
In Haiti, there were three index combinations that had fair agreement (core weighted/core PCA; core weighted/expert PCA; core weighted/maximum PCA), while the remainder of the indices had moderate or better agreement. In Malawi, there was one index combination that had fair agreement (core weighted/maximum simple), while the remainder of the indices had moderate or better agreement. In Tanzania, there were two index combinations that had fair agreement (core weighted/expert PCA; core weighted/maximum PCA), while the remainder of the indices had moderate or better agreement.

Variability and distribution of the indices
The variability and distribution of the nine facility readiness indices are presented in table 4. In Haiti, the CoV ranged from 21.12 (maximum simple readiness index) to 35.72 (expert PCA). In Malawi, the CoV ranged from 18.96 (expert PCA) to 32.09 (core PCA). In Tanzania, the CoV ranged from 19.11 (core simple) to 23.62 (expert PCA). Across all countries and all indices, there were no floor effects. Ceiling effects were limited and far below the 15% threshold: the highest percentage of facilities at the ceiling was 3.33%, found in Tanzania in three indices (expert simple, expert weighted and expert PCA).

Table 5 presents the percentage of facilities in which each index item was available, in order to assess the inclusion of items across a range of frequency. Across all countries, the maximum item index contained the greatest number of items available in less than 40% of facilities (10 items in Haiti, 12 in Malawi and 9 in Tanzania). In Tanzania, the expert index did not include any items available in less than 40% of facilities. Within countries, the core, expert and maximum indices had a similar number of items that were available in over 90% of facilities (four to five in Haiti, six to eight in Malawi and seven to nine in Tanzania).

Discussion
To our knowledge, this is the first study that uses multiple methodologies to develop and compare indices for ANC facility readiness and provision of care. Our results showed that, within a country, indices containing different items combined using different approaches were quite similar in terms of median index score, facility classification, CoV, floor and ceiling effects and inclusion of items with a range of variability. Although similar overall, we found that the indices created using PCA were the most different from the other indices. In addition, the core weighted index had lower agreement with the other indices when looking at facility classification. This may be the result of some domains in the core weighted index having few items, making each of those items more influential in the overall index score.
Our analysis highlighted the importance of competent, motivated human resources to all aspects of delivering high-quality ANC services. Our approach to defining the core set of items highlighted the lack of metrics for human resources captured by facility surveys, as well as the inconsistency between what inputs are required for provision of care and how facility readiness is often defined. This is especially important considering there is a global health workforce crisis, which is particularly pronounced in LMICs. [42][43][44] Many countries are suffering from an absolute shortage of healthcare workers. The health workers they do have are often poorly distributed within a country, and many workers are deficient in skill mix and core competencies. 45 46

This study's findings also suggest that, to provide more comprehensive measures of ANC quality, some items could be added to SPAs and other facility surveys, particularly items related to the experience of care. While the global community is moving towards more person-centred care, data collected through health facility assessments are generally deficient in experience of care measures. 47 48 This may be due to the lack of validated instruments available for measuring the experience of maternal health care. 49 In addition, measures of patient experience from health facility assessments are limited and may be subject to courtesy bias, which has been found to be particularly problematic for subjective questions regarding items such as treatment by staff and consultation quality, items of interest for measuring experience of care. 50 We, therefore, cannot be certain whether these measures are actually representative of people's experiences or whether other types of questions would better capture experience of care.
However, a number of recent studies have generated validated tools for measuring respectful maternity care; these may well provide a starting point for incorporating improved experience-of-care metrics into health facility assessments such as the SPA. [51, 52] We also found that the indices differed in terms of their ease of construction and interpretation. The simple additive and weighted additive approaches were straightforward to construct because they required only taking averages of items, while the PCA necessitated more complex analysis. In addition, the simple additive approach was the easiest to interpret, since facilities with more items available received a higher quality score. The weighted additive approach was also relatively easy to interpret; however, its item weighting, whereby each domain receives an equal weight, was based on an implicit conceptual framework for quality which has not been formally validated. The PCA approach was perhaps the most difficult to interpret, as the PCA score represents the linear combination of variables that explains the most variance in the data. Because the loadings of each variable on the first component were used as the weights, and some items were negatively correlated and consequently had negative loadings, having all items present in a facility did not always result in the highest PCA score. In addition, the weights assigned by PCA reflected the variation in the data, which differed across countries; as a result, PCA did not produce the same weights across different contexts. As we were working with a common definition of what an ANC visit should include across country contexts, this variability in loadings raised concerns regarding the face validity and construct validity of the PCA indices.
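The three combination methods discussed above can be sketched as follows. The item names, domain groupings and PCA loadings here are hypothetical (in practice the loadings would be estimated from survey data); the deliberately negative loading on one item illustrates why a facility with every item present need not receive the highest PCA score.

```python
# Each facility is a dict mapping binary item availability, grouped into domains.
domains = {
    "staff":     ["trained_provider", "guidelines"],
    "equipment": ["bp_apparatus", "scale", "fetoscope"],
    "supplies":  ["iron_folate", "tetanus_vaccine"],
}
items = [i for group in domains.values() for i in group]

def simple_additive(facility):
    """Unweighted mean of all items: each item counts equally."""
    return sum(facility[i] for i in items) / len(items)

def weighted_additive(facility):
    """Mean of domain means: each domain counts equally, so items in
    small domains implicitly carry more weight than items in large ones."""
    domain_means = [
        sum(facility[i] for i in group) / len(group)
        for group in domains.values()
    ]
    return sum(domain_means) / len(domain_means)

def pca_score(facility, loadings):
    """Linear combination using first-component loadings as weights.
    A negative loading means having that item *lowers* the score."""
    return sum(loadings[i] * facility[i] for i in items)

# Hypothetical loadings, for illustration only; note the negative 'fetoscope'.
loadings = {"trained_provider": 0.45, "guidelines": 0.40,
            "bp_apparatus": 0.38, "scale": 0.35, "fetoscope": -0.20,
            "iron_folate": 0.42, "tetanus_vaccine": 0.41}

full = {i: 1 for i in items}       # all items present
partial = dict(full, fetoscope=0)  # one item missing
print(simple_additive(full), simple_additive(partial))
print(pca_score(full, loadings) < pca_score(partial, loadings))  # True: the
# facility missing the negatively loaded item outscores the fully stocked one
```

The `weighted_additive` function also makes the domain-weighting caveat concrete: an item in a two-item domain contributes 1/6 of the total score, while an item in a three-item domain contributes only 1/9.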
Overall, the low per cent of total variance explained by the first principal component, the divergence between countries in which items had high loadings, the negatively correlated items, and the complexity of creating and interpreting the measure raise concerns about the use of PCA for creating quality of care indices. [53] Another study comparing summary measures of quality of care for family planning found similar results in terms of ease of construction and interpretation, suggesting this finding may be relevant across service areas. [54] Because of this variation in ease of construction and interpretation, different methodologies may be better suited to different purposes. For example, the simple additive approach using the maximum number of items may be the easiest for Ministries of Health in LMICs to understand and implement. However, if there is particular interest in using facility readiness as a proxy for provision of care, it might be conceptually important to ensure that items align well between facility readiness and provision of care, making the core set of items a good choice. In addition, while the weighted additive method aims to give equal weight to each domain of facility readiness and provision of care, the contents of each domain are determined by what data are available rather than by an ideal set of items; the approach therefore implicitly weights certain items more heavily because data on other items are unavailable, and may not be well suited to every purpose. While different methodologies may be better suited to different purposes, there remains a need for standardised, meaningful, valid measures of quality of care that take into account the variation in ease of construction and interpretation and can be used both by countries and at the global level. We note several limitations of the analysis.
First, this analysis was conducted using data from three countries, two in Africa and one in Latin America and the Caribbean, which may limit the generalisability of findings to low-income countries globally. However, the findings were relatively consistent across the three countries, and the two African countries, both in the Southern and Eastern Africa region, represent two different types and sizes of health systems within that region. Second, the SPA survey, while quite comprehensive, does not capture every aspect of quality of care that may be important to include in quality of care metrics; it does, however, represent the most comprehensive information on quality of care in LMICs currently available. [34] Third, our approaches to item selection and combination were based on the most commonly used approaches in the literature. Other methods that may be helpful in determining which items to include in a quality of care index, such as latent class analysis, were not implemented; the methods selected for this analysis were those most likely to be useful to LMICs. Finally, this analysis does not provide a formal validation of any one ANC quality of care index. However, it does characterise various summary indices of ANC service quality and provides valuable information on the similarities and differences between indices.

Conclusion
While the goal of this study was not to identify the best index for measuring quality of care, we did endeavour to characterise the various indices so that more information is available to assist health systems researchers in choosing a methodology for creating quality of care indices. Overall, we found the indices to be quite similar within a country, and we found that different methodologies may be better suited to different purposes. Future research on the association between facility readiness and provision of care would help to further characterise the quality of care indices and inform the selection of an index. As quality of care becomes integral to global efforts to improve health outcomes, demand for guidance on creating standardised measures of service quality will grow. This study provides health systems researchers with a comparison of methodologies commonly used to create summary indices of ANC service quality and highlights the similarities and differences between methods. Further research will be required at the global and country level to develop standardised, meaningful, valid measures of quality of care that take into account multiple services and various country contexts.