Concordance of commercial data sources for neighborhood-effects studies

J Urban Health. 2010 Jul;87(4):713-25. doi: 10.1007/s11524-010-9458-0.

Abstract

Growing evidence supports a relationship between neighborhood-level characteristics and important health outcomes. One source of neighborhood data is commercial databases, which are integrated with geographic information systems to measure the availability of certain types of businesses or destinations that may have either favorable or adverse effects on health outcomes; however, the quality of these data sources is generally unknown. This study assessed the concordance of two commercial databases for ascertaining the presence, locations, and characteristics of businesses. Businesses in the St. Louis, Missouri area were selected based on their four-digit Standard Industrial Classification (SIC) codes and classified into 14 business categories. Business listings in the two commercial databases were matched by standardized business name within specified distances. Concordance and coverage measures were calculated using capture-recapture methods for all businesses and by business type, with further stratification by census-tract-level population density, percent below poverty, and racial composition. For matched listings, the distance between listings and agreement in four-digit SIC code, sales volume, and employee size were calculated. Overall, percent agreement between the databases was 32%. Concordance and coverage estimates were lowest for health-care facilities and leisure/entertainment businesses; highest for popular walking destinations, eating places, and alcohol/tobacco establishments; and varied somewhat by population density. The mean (SD) distance between matched listings was 108.2 (179.0) m, with varying levels of agreement in four-digit SIC code (percent agreement = 84.6%), employee size (weighted kappa = 0.63), and sales volume (weighted kappa = 0.04). Researchers should interpret findings cautiously when using these commercial databases to derive measures of the neighborhood environment.
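
To illustrate the matching and two-source capture-recapture steps described above, the following is a minimal Python sketch. It is not the authors' implementation: the name-standardization rule, the 200 m threshold, the greedy one-to-one matching, the Lincoln-Petersen estimator, the union-based definition of percent agreement, all function names, and the sample listings are assumptions made for illustration only.

```python
import math
import re


def standardize_name(name):
    """Lowercase, drop punctuation, collapse whitespace (assumed rule)."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def match_listings(db_a, db_b, max_dist_m):
    """Greedy one-to-one match: same standardized name, within max_dist_m."""
    matches = []
    used_b = set()
    for i, (name_a, lat_a, lon_a) in enumerate(db_a):
        key = standardize_name(name_a)
        for j, (name_b, lat_b, lon_b) in enumerate(db_b):
            if j in used_b or standardize_name(name_b) != key:
                continue
            if haversine_m(lat_a, lon_a, lat_b, lon_b) <= max_dist_m:
                matches.append((i, j))
                used_b.add(j)
                break
    return matches


def concordance(n_a, n_b, n_matched):
    """Two-source summary measures (assumed definitions).

    percent_agreement: matched listings / listings found in either source.
    estimated_total:   Lincoln-Petersen estimate n_a * n_b / n_matched.
    coverage_a/b:      share of the estimated total captured by each source.
    """
    if n_matched == 0:
        raise ValueError("Lincoln-Petersen estimate is undefined with no matches")
    union = n_a + n_b - n_matched
    estimated_total = n_a * n_b / n_matched
    return {
        "percent_agreement": 100 * n_matched / union,
        "estimated_total": estimated_total,
        "coverage_a": 100 * n_a / estimated_total,
        "coverage_b": 100 * n_b / estimated_total,
    }


# Made-up listings (name, latitude, longitude); not data from the study.
db_a = [
    ("Joe's Diner", 38.6270, -90.1994),
    ("Main St. Pharmacy", 38.6300, -90.2000),
    ("City Gym", 38.6400, -90.2100),
]
db_b = [
    ("JOES DINER", 38.6272, -90.1990),
    ("Main St Pharmacy", 38.6500, -90.2500),  # same name, but too far away
    ("Corner Liquor", 38.6350, -90.2050),
]

matches = match_listings(db_a, db_b, max_dist_m=200)
summary = concordance(len(db_a), len(db_b), len(matches))
print(matches)
print(summary)
```

Under these assumptions, coverage of one source reduces to the share of the other source's listings that were matched, which is why a low match rate pulls both concordance and coverage down together.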

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Commerce / statistics & numerical data*
  • Data Collection / methods*
  • Data Collection / statistics & numerical data*
  • Databases, Factual / statistics & numerical data
  • Environment*
  • Geographic Information Systems / statistics & numerical data*
  • Health Status
  • Humans
  • Missouri
  • Residence Characteristics / statistics & numerical data*
  • Socioeconomic Factors