Table 4

The limitations of common sources of data used to estimate diabetes prevalence

Sources of data	Limitations
Self-report survey	Selection/sample bias; patient recall bias; limited sample size
Survey with one laboratory test	Selection bias; cross-sectional measure; poor repeatability of glucose tests; classifies diabetes as undiagnosed on the basis of patient recall or medical records, so these cases are not necessarily unknown to the entire health system
Primary care records	Inconsistency in primary care coding; subject to migration bias; may miss diagnoses made in secondary care or by other healthcare providers; limited sensitivity in general
Hospitals	Identifies only those with diabetes who attended hospital, resulting in a major undercount; recent changes in ICD coding standards may affect consistency
Pharmaceutical dispensing data	Diet-controlled diabetes would not be captured; adherence is not perfect in the community. Medications may have other indications, such as metformin for polycystic ovary syndrome, or may be used to ‘prevent’ diabetes
Combination of datasets	Depends on quality of the datasets combined. Needs a unique patient identifier for linkage to avoid double counting. The definition of diagnoses may not be consistent across the datasets
Capture–recapture	Identifies people with diabetes not captured by the system (note: not undiagnosed diabetes). Assumes list independence and that all individuals have the same probability of being captured by each dataset. The estimates can be influenced by factors that are completely unrelated to diabetes prevalence, such as changes in ICD coding standards, admission thresholds and treatment trends. One cannot identify the individuals.

ICD, International Classification of Diseases.
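
To make the capture–recapture row concrete, the following is a minimal sketch of a two-source estimate. It assumes the Chapman form of the Lincoln-Petersen estimator, which the table does not name, and the two lists of patient identifiers are invented purely for illustration. It relies on the same assumptions the table lists (independent lists, equal capture probability), so it is a rough illustration rather than a recommended analysis.

```python
def chapman_estimate(n1: int, n2: int, m: int) -> float:
    """Two-source capture-recapture estimate of total cases (Chapman form).

    n1, n2: number of people on each list; m: number found on both lists.
    Assumes the lists are independent and that every person has the same
    probability of appearing on each list.
    """
    if not 0 <= m <= min(n1, n2):
        raise ValueError("overlap must be between 0 and the smaller list size")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical linked identifiers; real data would need a unique patient
# identifier so that the overlap between lists can be counted reliably.
primary_care = {f"id{i:02d}" for i in range(1, 10)}   # id01..id09
dispensing = {f"id{i:02d}" for i in range(7, 14)}     # id07..id13

overlap = len(primary_care & dispensing)    # people on both lists
observed = len(primary_care | dispensing)   # people on at least one list
estimated_total = chapman_estimate(len(primary_care), len(dispensing), overlap)

# People with diagnosed diabetes estimated to be missing from both lists.
# This is a count only; the method cannot say who they are.
estimated_missed = estimated_total - observed

print(f"observed={observed}, estimated total={estimated_total:.1f}, "
      f"estimated missed={estimated_missed:.1f}")
```

The difference between the estimated total and the observed count is only a number of people missed by both lists. As the table notes, it says nothing about undiagnosed diabetes, and it shifts whenever coding or admission practices change the overlap between lists.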