Table 4

The limitations of common sources of data used to estimate diabetes prevalence

| Source of data | Limitations |
| --- | --- |
| Self-report survey | Selection/sample bias; patient recall bias; limited sample size |
| Survey with one laboratory test | Selection bias; cross-sectional measure; poor repeatability of glucose tests; undiagnosed diabetes is estimated from patient recall or medical records, so it is not necessarily unknown to the entire health system |
| Primary care records | Inconsistent primary care coding; subject to migration bias; may miss diagnoses made in secondary care or by other healthcare providers; limited sensitivity in general |
| Hospitals | Only identifies those with diabetes who attended hospital; recent changes in ICD coding standards may affect consistency; major undercount |
| Pharmaceutical dispensing data | Diet-controlled diabetes is not captured; adherence in the community is imperfect; medications may have other indications (e.g. metformin for polycystic ovary syndrome) or may be used to ‘prevent’ diabetes |
| Combination of datasets | Depends on the quality of the datasets combined; needs a unique patient identifier for linkage to avoid double counting; the definition of diagnoses may not be consistent across datasets |
| Capture–recapture | Estimates the number of people with diabetes not captured by the system (note: not undiagnosed diabetes). Assumes that the lists are independent and that all individuals have the same probability of being captured by each dataset. Estimates can be influenced by factors unrelated to diabetes prevalence, such as changes in ICD coding standards, admission thresholds, and treatment trends. Individuals cannot be identified |

ICD, International Classification of Diseases.
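To make the last two rows concrete, the following is a minimal sketch of a two-source capture–recapture calculation. It assumes Chapman's bias-corrected form of the Lincoln–Petersen estimator, which is one common choice and is not necessarily the method used in the studies this table summarises. The function name, list names and patient identifiers are hypothetical. The sketch relies on the same assumptions the table lists: a unique identifier for linkage, independent lists, and equal capture probability.

```python
def chapman_estimate(list_a, list_b):
    """Two-source capture-recapture using Chapman's estimator.

    Assumes both lists share a unique patient identifier, that the
    lists are independent, and that every person has the same
    probability of appearing on each list.
    """
    a = set(list_a)            # de-duplicate within each source
    b = set(list_b)
    n1, n2 = len(a), len(b)    # people found by each source
    m = len(a & b)             # people found by both sources
    observed = len(a | b)      # linked total, no double counting
    estimated_total = (n1 + 1) * (n2 + 1) / (m + 1) - 1
    return observed, estimated_total


# Hypothetical identifiers, for illustration only.
primary_care = ["p01", "p02", "p03", "p04", "p05", "p06"]
dispensing = ["p04", "p05", "p06", "p07", "p08"]

observed, estimated = chapman_estimate(primary_care, dispensing)
missed = estimated - observed  # a count only; individuals are not identified
print(observed, estimated, missed)
```

With these made-up lists, 8 people are observed after linkage and the estimator suggests about 9.5 in total, so roughly 1.5 are missed by both sources. The result is a count, not a list of people, and it shifts whenever the overlap between sources changes for reasons unrelated to prevalence.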