Health & Place

Volume 17, Issue 5, September 2011, Pages 1122-1131

Field validation of secondary commercial data sources on the retail food outlet environment in the U.S.

https://doi.org/10.1016/j.healthplace.2011.05.010

Abstract

This study used direct field observations with interior assessments of outlets to validate food store and restaurant data from two commercial business lists conditional on classification of outlet type, including supermarkets, grocery stores, convenience stores, full-service restaurants and fast food restaurants. The study used a stratified random sample that included 274 urban census tracts across 9 counties from the Chicago Metropolitan Statistical Area (MSA) and 46 suburban and 61 rural census tracts across 13 counties from a 50-mile buffer surrounding the MSA. Results showed that agreement between the field observations and the commercial business lists for the food store and restaurant outlets was generally moderate (ranging from fair to good). However, when the listed data were validated based on an exact classification match, agreement was only fair (ranging from poor to moderate) and, in particular, poor for fast food restaurants. The study also found that agreement levels for some outlet types differed by tract characteristics. Commercial databases must be used with caution as substitutes for on-the-ground data collection.

Highlights

► This study used field observations to validate food store and restaurant commercial data.
► Interior outlet characteristics were used to classify food store and restaurant types.
► The stratified random sample included 274 urban, 46 suburban and 61 rural census tracts.
► Results showed that agreement was generally moderate (ranging from fair to good).
► Validation agreement based on an exact classification match was only fair.

Introduction

Patterns of poor dietary intake and obesity prevalence in the United States have generated much interest in the role of the neighborhood retail food environment on diet and weight outcomes (Holsten, 2009, Morland and Evenson, 2009, Lovasi et al., 2009, Fleischhacker et al., 2010). Broadly, the retail food environment includes different types of stores, restaurants, and vending machines and covers several domains such as neighborhoods or communities, worksites, schools and media/information (Glanz et al., 2005, MacKinnon, 2008, Townshend and Lake, 2009). In this study, we focused on the neighborhood food store and restaurant environment and assessed the validity of commercial outlet data sources.

The presence of food deserts has been well-documented in the U.S. (Institute of Medicine and National Research Council, 2009, Beaulac et al., 2009). Many studies found disparate availability of certain types of food outlets by neighborhood racial or ethnic composition and found that, compared to middle and high-income neighborhoods, low-income neighborhoods had fewer supermarkets and relatively more fast food restaurants than other restaurants (Larson et al., 2009, Beaulac et al., 2009, Fleischhacker et al., 2010). Several studies found that supermarket availability was associated with increased fruit and vegetable intake, more healthful diets and lower body weight (Morland et al., 2002, Laraia et al., 2004, Powell et al., 2007a, Moore et al., 2008, Larson et al., 2009), although results were more mixed in other countries (Holsten, 2009). Study findings were inconsistent on the association between fast food outlet availability and dietary intakes or weight outcomes (White, 2007, Powell et al., 2007b, Powell et al., 2009, Holsten, 2009, Fleischhacker et al., 2010). The validity of the data sources used in such studies is critical to obtaining reliable, unbiased results.

Researchers draw upon a variety of sources to document the retail food environment, including secondary data sources such as government listings (Department of Health or local authority council lists), the U.S. Census, telephone directories and commercial business lists (Wang et al., 2006). Direct methods to identify food or restaurant outlets include field observations undertaken by trained observers, often referred to as “ground truthing” (Sharkey, 2009). Depending on the geographic scope and time period of the study, as well as the resources available for data collection, there are trade-offs in using these different sources. Direct field observations obtained by trained observers can be thought of as the “gold standard” but are time intensive, costly, and cannot be easily linked to historical individual-level data. Thus, studies using retrospective designs or stacked cross-sectional time series and studies conducted on a large scale (e.g., national) often rely on secondary data sources.

A limited number of U.S. studies used direct field observations to validate secondary sources of food outlets. Much of this prior research focused solely on food stores, with less attention to restaurants. Comparing food store listings from two secondary sources (government listing and telephone business directory), Wang et al. (2006) found that the correlation between food store counts per census tract was moderate (Spearman's correlation coefficient=0.5), suggesting that secondary sources must be considered with caution although they did not draw comparisons based on field work. Another recent study also examined agreement between secondary databases and found that agreement between two commercial sources was only 38% for food stores and 51% for restaurants (Hoehner and Schootman, 2010).

Only a few studies validated secondary commercial sources with field observations in the U.S. Bader et al. (2010) compared direct field observations with a commercial business list on the presence but not the number of food and restaurant establishments in Chicago, in 2002, and found moderate levels of agreement (kappa ranges of 0.32–0.70). The study did not find evidence of systematic disagreement on the presence of outlets between the ground and list source by socioeconomic or demographic characteristics (Bader et al., 2010). Most recently, Liese et al. (2010) validated two commercial data sources based on field work spanning 8 counties in South Carolina and found that food outlets on the ground were moderately undercounted in the lists for both food stores (61–63% were in the lists) and restaurants (50–67% were in the lists). The percentage of outlets in the commercial lists found on the ground was good (78–82% for food stores and 79–90% for restaurants). Assessing differences by tract characteristics, they found no differences for any food outlets by income or race, nor by urbanicity for food stores. However, for restaurants, they found that agreement was higher in urban versus rural tracts (Liese et al., 2010).
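The kappa statistic mentioned above measures agreement beyond what chance alone would produce. The following is a minimal sketch of Cohen's kappa for presence/absence data, assuming each census tract is coded 1 if at least one outlet of a given type is present and 0 otherwise, once from field observation and once from the list. The function name and the example flags are hypothetical and are not taken from any of the cited studies.

```python
def cohens_kappa(field, listed):
    """Cohen's kappa for two equal-length sequences of 0/1 presence flags.

    field  : presence flags from field observation (hypothetical data)
    listed : presence flags from a commercial list (hypothetical data)
    """
    if len(field) != len(listed) or len(field) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(field)
    # Observed agreement: share of tracts where both sources give the same flag.
    observed = sum(f == l for f, l in zip(field, listed)) / n
    # Chance agreement, from each source's marginal share of "present" tracts.
    p_field = sum(field) / n
    p_list = sum(listed) / n
    expected = p_field * p_list + (1 - p_field) * (1 - p_list)
    if expected == 1.0:
        # Both sources are constant and identical; kappa is undefined here,
        # and returning 1.0 is only a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Ten illustrative tracts (made-up flags).
field_flags = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
list_flags = [1, 0, 0, 0, 1, 1, 1, 0, 1, 1]
kappa = cohens_kappa(field_flags, list_flags)
```

In this made-up example the two sources agree on 8 of 10 tracts, while chance agreement is 0.52, so kappa is lower than the raw 80% agreement would suggest.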

Internationally, Paquet et al. (2008) used ground truthing in Montreal, Canada to validate measures of food store and physical activity establishments from a commercial database and Internet searches. They found the commercial list had a higher percent agreement (73%) than the Internet-based list (60%). The level of agreement did not differ significantly by socioeconomic or ethnic (language) characteristics of the census tracts, suggesting that the error was not systematic (Paquet et al., 2008). In the United Kingdom, Lake et al. (2010) used field observations to validate secondary food store and restaurant data from online and paper versions of telephone Yellow Page directories and data from the local authority. They found the proportion of stores in the field that were listed by the local authority was 84%, while the corresponding figure was only 51% and 53% for the two Yellow Page sources. Combining all three sources, the proportion in the field that was listed increased to 93%. Examining the proportion of outlets on the list that were found in the field, the proportion was 92% for the local authority council data and 82% and 79% for the two Yellow Pages sources (Lake et al., 2010). Cummins and Macintyre (2009) assessed the quality of city council lists in Glasgow, Scotland and found that 87% and 88% of the food stores listed in 1997 and 2007, respectively, were present on the ground and that the percentage agreement did not significantly differ by neighborhood social deprivation. They did not assess the proportion of outlets found on the ground that were in the lists nor did they examine restaurants (Cummins and Macintyre, 2009).
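The two proportions reported throughout this literature (the share of outlets found in the field that also appear in a list, and the share of listed outlets that are found in the field) can be illustrated with a short sketch. The labels "sensitivity" and "positive predictive value" are conventional names assumed here rather than terms used in the text above, and matching by a shared outlet identifier is a simplifying assumption: matching in practice typically relies on names and addresses. All identifiers below are hypothetical.

```python
def validation_stats(field_ids, list_ids):
    """Return (sensitivity, ppv) for two collections of outlet identifiers.

    sensitivity : share of field-observed outlets that appear in the list
    ppv         : share of listed outlets that were found in the field
    """
    field_ids, list_ids = set(field_ids), set(list_ids)
    matched = field_ids & list_ids
    sensitivity = len(matched) / len(field_ids) if field_ids else float("nan")
    ppv = len(matched) / len(list_ids) if list_ids else float("nan")
    return sensitivity, ppv


# Made-up identifiers: four outlets seen in the field, five in the list.
observed_in_field = {"store_a", "store_b", "store_c", "store_d"}
in_commercial_list = {"store_b", "store_c", "store_d", "store_e", "store_f"}
sensitivity, ppv = validation_stats(observed_in_field, in_commercial_list)
```

Here three outlets match, so 3 of 4 field outlets are listed and 3 of 5 listed outlets exist on the ground. Reporting both figures matters because a list can undercount real outlets while still containing few false entries, or the reverse.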

This is the first study to validate food store and restaurant data from the two most widely available and used commercial databases in the U.S., Dun and Bradstreet (D&B) and InfoUSA, providing detailed validation statistics conditional on classification of outlet type, including supermarkets, grocery stores, convenience stores, full-service restaurants and fast food restaurants. This study tested the validity of the secondary sources by comparison to direct field observations undertaken by trained observers. Field observers identified the presence of outlets on the ground and collected detailed information about store and restaurant characteristics from inside each of these outlets for use in determining classification. No previous validation study collected detailed characteristics from the interior of the outlets. Validity was first assessed for food stores and restaurants unconditional on outlet type and then conditional on an exact classification match between the field observations and list by type of store and restaurant. This study also tested whether validity differed by neighborhood characteristics, including by income, race, ethnicity and urbanicity.

Sampling

Secondary commercial business list data on food stores and restaurants drawn from D&B and InfoUSA using a stratified random sample of 425 census tracts were validated via direct field observations. Tracts were sampled from the Chicago–Joliet–Naperville, IL–IN–WI Metropolitan Statistical Area (MSA) (referred to hereafter as the Chicago MSA) and a 50-mile buffer surrounding the Chicago MSA. Six originally sampled census tracts were replaced because they could not be publicly ground-truthed by the

Results

As shown in Tables 2 and 3, in our main Chicago MSA sample of tracts, observers identified 1130 food stores and 3111 restaurants on the ground. In the commercial sources, D&B had 1241 food stores and 2596 restaurants listed. The InfoUSA database had 1181 food stores and 2596 restaurants listed. Not shown in the tables, in our suburban and rural tracts, respectively, observers identified 261 and 190 food stores and 678 and 324 restaurants. D&B and InfoUSA, respectively, listed 262 and 261 food

Discussion

With agreement qualified as poor (0–0.2), fair (0.21–0.4), moderate (0.41–0.6), good (0.61–0.8) and very good (0.81–1.0) (Altman, 1991), the results of this study showed that the agreement statistics between the field observations and the commercial business lists for the food store and restaurant outlets were generally moderate (ranging from fair to good) overall but were generally fair (ranging from poor to moderate) when agreement was assessed based on an exact

Acknowledgements

We gratefully acknowledge research support from the Robert Wood Johnson Foundation through the Bridging the Gap program for the ImpacTeen project.

References (48)

  • L.M. Powell

    Fast food costs and adolescent body mass index: evidence from panel data

    Journal of Health Economics

    (2009)
  • L.M. Powell et al.

    Associations between access to food stores and adolescent Body Mass Index

    American Journal of Preventive Medicine

    (2007)
  • L.M. Powell et al.

    Food prices and fruit and vegetable consumption among young American adults

    Health and Place

    (2009)
  • J.R. Sharkey

    Measuring potential access to food stores and food-service places in rural areas in the U.S.

    American Journal of Preventive Medicine

    (2009)
  • T. Townshend et al.

    Obesogenic urban form: theory, policy and practice

    Health and Place

    (2009)
  • D.G. Altman

    Practical Statistics for Medical Research

    (1991)
  • M.C. Auld et al.

    Economics of food energy density and adolescent body weight

    Economica

    (2009)
  • M.D.M. Bader et al.

    Measurement of the local food environment: a comparison of existing data sources

    American Journal of Epidemiology

    (2010)
  • E.A. Baker et al.

    The role of race and poverty in access to foods that enable individuals to adhere to dietary guidelines

    Preventing Chronic Disease: Public Health Research, Practice, and Policy

    (2006)
  • J. Beaulac et al.

    A systematic review of food deserts, 1966-2007

    Preventing Chronic Disease

    (2009)
  • Dun and Bradstreet, 2005. The DUNSright Quality Process: The Power Behind Quality Information. Dun and Bradstreet,...
  • ESRI. ArcGIS Online Geocoding. 〈http://www.esri.com/software/arcgis/arcgisonline/task-services.html〉. 2009....
  • T.A. Farley et al.

    Measuring the food environment: shelf space of fruits, vegetables, and snack foods in stores

    Journal of Urban Health

    (2009)
  • T.A. Farley et al.

    The ubiquity of energy-dense snack foods: a national multicity study

    American Journal of Public Health

    (2010)