Systematic review of validated case definitions for diabetes in ICD-9-coded and ICD-10-coded data in adult populations

Bushra Khokhar; Nathalie Jette; Amy Metcalfe; Ceara Tess Cunningham; Hude Quan; Gilaad G Kaplan; Sonia Butalia; Doreen Rabi

doi:10.1136/bmjopen-2015-009952

Bushra Khokhar¹,²; Nathalie Jette¹,²,³; Amy Metcalfe²,⁴,⁵; Ceara Tess Cunningham¹; Hude Quan¹,² (ORCID: http://orcid.org/0000-0002-7848-7256); Gilaad G Kaplan¹,²; Sonia Butalia²,⁶; Doreen Rabi¹,²,⁶

¹Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada
²O'Brien Institute for Public Health, University of Calgary, Calgary, Alberta, Canada
³Department of Clinical Neurosciences, Hotchkiss Brain Institute, University of Calgary, Calgary, Alberta, Canada
⁴Department of Obstetrics and Gynecology, University of Calgary, Calgary, Alberta, Canada
⁵Alberta Children's Hospital Research Institute, Calgary, Alberta, Canada
⁶Division of Endocrinology, Department of Medicine, University of Calgary, Calgary, Alberta, Canada

Correspondence to Bushra Khokhar; bushra.khokhar@ucalgary.ca

Abstract

Objectives With steady increases in ‘big data’ and data analytics over the past two decades, administrative health databases have become more accessible and are now used regularly for diabetes surveillance. The objective of this study is to systematically review validated International Classification of Diseases (ICD)-based case definitions for diabetes in the adult population.

Setting, participants and outcome measures Electronic databases, MEDLINE and Embase, were searched for validation studies where an administrative case definition (using ICD codes) for diabetes in adults was validated against a reference and statistical measures of the performance reported.

Results The search yielded 2895 abstracts, and of the 193 potentially relevant studies, 16 met criteria. Diabetes definition for adults varied by data source, including physician claims (sensitivity ranged from 26.9% to 97%, specificity ranged from 94.3% to 99.4%, positive predictive value (PPV) ranged from 71.4% to 96.2%, negative predictive value (NPV) ranged from 95% to 99.6% and κ ranged from 0.8 to 0.9), hospital discharge data (sensitivity ranged from 59.1% to 92.6%, specificity ranged from 95.5% to 99%, PPV ranged from 62.5% to 96%, NPV ranged from 90.8% to 99% and κ ranged from 0.6 to 0.9) and a combination of both (sensitivity ranged from 57% to 95.6%, specificity ranged from 88% to 98.5%, PPV ranged from 54% to 80%, NPV ranged from 98% to 99.6% and κ ranged from 0.7 to 0.8).

Conclusions Overall, administrative health databases are useful for undertaking diabetes surveillance, but awareness of how performance varies with the case definition is essential. The performance characteristics of these case definitions depend on variations in the definition of primary diagnosis in ICD-coded discharge data and/or the methodology adopted by the healthcare facility to extract information from patient records.

Keywords: diabetes; validation studies; case definition; administrative data

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/

Strengths and limitations of this study

Our systematic review was comprehensive as it had a broad search strategy that bore no language or time restriction.
All included studies captured patient information at the population level with clear case definitions encompassing a broad spectrum of patients.
There is the potential for a language bias as studies where full texts were not available in English were not considered.
There are potential limitations for all reference standards used to validate administrative definitions for diabetes.

Background

Diabetes is a chronic disease whose prevalence has increased substantially during the past 20 years.1 At present, diabetes is the leading cause of blindness,2 renal failure3 and non-traumatic lower limb amputations4 and is a major risk factor for cardiovascular disease.5 Owing to its chronic nature, the severity of its complications and the means required to control it, diabetes is a costly disease. The healthcare costs associated with this condition are substantial and can account for up to 15% of national healthcare budgets.6

Understanding the distribution of diabetes and its complications in a population is important to understand disease burden and to plan for effective disease management. Diabetes surveillance systems using administrative data can efficiently and readily analyse routinely collected health-related information from healthcare systems, provide reports on risk factors, care practices, morbidity and mortality and estimate incidence and prevalence at a population level.7 With steady increases in ‘big data’ and data analytics over the past two decades, administrative health databases have become more accessible to health services researchers and are now used regularly to study the processes and outcomes of healthcare. However, administrative health data are not collected primarily for research or surveillance. There is a need for health administrative data users to examine the validity of case ascertainment in their data sources before use.8

By definition, surveillance depends on a valid case definition that is applied consistently over time. A case definition is a set of uniform criteria used to define a disease for surveillance.9 However, a variety of diabetes case definitions exist, resulting in variation in reported diabetes prevalence estimates. A systematic review and meta-analysis of validation studies on diabetes case definitions from administrative records has been performed.10 That review aimed to determine the sensitivity and specificity of a commonly used diabetes case definition, “two physician claims or one hospital discharge abstract record within a two-year period”, and its potential effect on diabetes prevalence estimation. Our study extends this body of work by systematically reviewing validated International Classification of Diseases (ICD), 9th edition (ICD-9)-based and ICD-10-based case definitions for diabetes and comparing the validity of different case definitions across studies and countries.

Methods

Search strategy

This systematic review was performed using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines11 (see online supplementary appendix A). Two citation databases, MEDLINE and Embase, were searched using an OVID platform from 1980 until September 2015. The search strategy consisted of the following set of terms (see online supplementary appendix B): (1) (health services research or administrative data or hospital discharge data or ICD-9 or ICD-10 or medical record or health information or surveillance or physician claims or claims or hospital discharge or coding or codes) AND (2) (validity or validation or case definition or algorithm or agreement or accuracy or sensitivity or specificity or positive predictive value or negative predictive value) AND (3) medical subject heading terms for diabetes. Searches were limited to human studies published in English. The broad nature of the search strategy allowed for the detection of modifications of ICD codes, such as international clinical modification (eg, ICD-9-CM).

Study selection

Studies were evaluated in duplicate for eligibility in a two-stage procedure. In stage 1, all identified titles and abstracts were reviewed and in stage 2, a full text review was performed on all studies that met the predefined eligibility criteria. If either reviewer defined a study as eligible in stage 1, it was included in the full text review in stage 2. Disagreements were resolved by discussion or consultation with a third reviewer.

Inclusion/exclusion criteria

A study was included in the systematic review if it met the following criteria: (1) the study population included those ≥18 years of age with type 1 diabetes mellitus or type 2 diabetes mellitus; (2) statistical estimates (sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) or κ) were reported or could be calculated; (3) an ICD-9 or ICD-10 case definition for diabetes was reported and validated; (4) a satisfactory reference standard (eg, self-report from population-based surveys or patient medical chart reviews) was used and (5) original data were reported. Studies validating diabetes in specialised populations (eg, patients with cardiovascular disease) were excluded to ensure that the diabetes case definitions would be generalisable. Studies whose diabetes case definitions did not rely solely on medical encounter data (eg, those that included pharmacy or laboratory data) were also excluded, as the independent validity of such definitions could not be calculated. Bibliographies of included studies were manually searched for additional studies, which were then screened and reviewed using the same methods described above.

Data extraction and quality assessment

Primary outcomes were sensitivity, specificity, PPV, NPV and κ reported for each of the ICD-coded diabetes case definitions. Other extracted data included sample size and ICD codes used. If statistical estimates were not reported in the original paper, estimates were calculated from the data available.
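The five outcome measures can all be derived from a 2×2 cross-classification of a case definition against a reference standard. The following is a minimal sketch of that arithmetic, not the calculation code used in this review; the class name `TwoByTwo`, the function name `validity_measures` and the example counts are hypothetical, and the sketch assumes every margin of the table is non-zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from cross-classifying a case definition against a reference standard."""
    tp: int  # definition positive, reference positive
    fp: int  # definition positive, reference negative
    fn: int  # definition negative, reference positive
    tn: int  # definition negative, reference negative


def validity_measures(t: TwoByTwo) -> dict:
    """Return sensitivity, specificity, PPV, NPV and Cohen's kappa as proportions."""
    n = t.tp + t.fp + t.fn + t.tn
    # Observed agreement: proportion of people on whom both methods agree.
    observed = (t.tp + t.tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((t.tp + t.fp) * (t.tp + t.fn) + (t.fn + t.tn) * (t.fp + t.tn)) / n**2
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "kappa": (observed - expected) / (1 - expected),
    }


# Hypothetical counts for illustration only; they are not taken from any included study.
example = validity_measures(TwoByTwo(tp=80, fp=20, fn=20, tn=880))
```

With these illustrative counts, sensitivity and PPV are both 0.80, specificity and NPV are both about 0.98, and κ is about 0.78.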

Calculating a pooled estimate of surveillance performance measures using meta-analytic techniques was deemed inappropriate given the heterogeneity of diabetes case definitions and reference standards used across studies. Data were tabulated by the type of administrative health data used. Study quality was evaluated using the Quality Assessment Tool for Diagnostic Accuracy Studies (QUADAS) criteria.12

Results

Identification and description of studies

A total of 2895 abstracts were identified with 193 studies reviewed in full text, of which 16 studies met all eligibility criteria (figure 1). Eight of these studies were conducted in the USA,13–20 seven in Canada21–27 and one in Australia.28 Thirteen studies used ICD-9 codes,13–19,21–23,26–28 and the remaining three studies used ICD-9 and ICD-10 codes.23–25 None of the studies differentiated or commented as to whether a particular code of interest was in the primary or in one of the secondary diagnostic positions. Of the 16 studies reviewed, 8 used medical records13,14,21,23–26,28 and 8 used either self-reported surveys or telephone surveys to validate the diabetes diagnosis.15–20,22–27 Eight studies used physician claims data,13–16,18–20,23 four studies used hospital discharge data22,24,26,28 and four studies used a combination of both.17,21,25,27 Two studies used electronic medical records (EMRs) as their health data source,29,30 but these were removed from the review since EMRs were not a part of our search strategy.

Figure 1. Study flow chart. ICD, International Classification of Diseases.

The QUADAS scores (table 1) ranged from 9 to 13 out of a maximum of 14. Five questions were selected from QUADAS to constitute the ‘bias assessment’. Regardless of quality assessment scores, all 16 studies are discussed in this systematic review.

Table 1. Study quality characteristics using QUADAS tool

The sample size varied from 93 to ∼3 million people. Sensitivity and specificity values were available from all 18 studies, PPV in 16 studies, NPV in 12 studies and κ in 6 studies. All 16 studies were categorised by the type of administrative health data source being used.

Physician claims data

Table 2 lists the eight studies13–16,18–20,23 using physician claims data. In these studies, the sensitivity ranged from 26.9% to 97%, specificity ranged from 94.3% to 99.4%, PPV ranged from 71.4% to 96.2%, NPV ranged from 95% to 99.6% and κ ranged from 0.8 to 0.9. Four of the eight studies using physician claims data had at least one diabetes case definition for which sensitivity and specificity both exceeded 80%.

Table 2. Study characteristics and test measures of studies for physician claims data

Studies comparing physician claims-based case definitions over multiple years13,15,16 consistently show increases in sensitivity and a slight decrease in specificity and PPV over time. This relationship is consistent with the study18 that examined how the statistical estimates change as the number of diagnostic code appearances required by the case definition increases: sensitivity was highest when any diagnostic code (inpatient or outpatient) was used, whereas specificity and PPV were highest when the greatest number of outpatient diagnostic codes was required.

Hospital discharge data

Table 3 lists the four studies22,24,26,28 using only hospital discharge data. In these studies, the sensitivity ranged from 59.1% to 92.6%, specificity ranged from 95.5% to 99%, PPV ranged from 62.5% to 96%, NPV ranged from 90.8% to 99% and κ ranged from 0.6 to 0.9. Two of the four studies using hospital discharge data had at least one diabetes case definition for which sensitivity and specificity both exceeded 80%. As with the physician claims-based case definitions, sensitivity seemed to improve when a longer duration was used in the case definition, whereas specificity and PPV moved in the opposite direction.

Table 3. Study characteristics and test measures of studies for hospital discharge data

Combination of physician claims and hospital discharge data

Table 4 lists the four studies17,21,25,27 using a combination of physician claims and hospital discharge data. In these studies, the sensitivity ranged from 57% to 95.6%, specificity ranged from 88% to 98.5%, PPV ranged from 54% to 80%, NPV ranged from 98% to 99.6% and κ ranged from 0.7 to 0.8. Using a combination of two or more data sources increases the minimum value of the range for sensitivity compared to using either physician claims or hospital discharge data-based definitions individually. All four of the studies using a combination of physician claims and hospital discharge data had at least one case definition for which sensitivity and specificity both exceeded 80%.

Table 4. Study characteristics and test measures of studies for physician claims data and hospital discharge data

Another factor affecting the statistical estimates is the number of claims being used in the definition. Rector et al's study17 shows consistent results: sensitivity is higher when at least one claim is required by the definition, but specificity is higher when at least two are required. Finally, Young et al's study27 demonstrates the highest sensitivity when two physician claims and two hospital discharge records are used in the definition and the highest specificity when one physician claim and two hospital claims are used in the definition.
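The direction of this trade-off follows from the arithmetic of thresholds: requiring more claims can only shrink the set of people flagged, so sensitivity cannot rise and specificity cannot fall. The sketch below illustrates this on an entirely synthetic toy cohort; the function name `sens_spec_at_threshold` and all of the data are hypothetical and are not drawn from Rector et al or any other included study.

```python
from typing import Sequence, Tuple


def sens_spec_at_threshold(claim_counts: Sequence[int],
                           has_diabetes: Sequence[bool],
                           min_claims: int) -> Tuple[float, float]:
    """Sensitivity and specificity when a person is flagged at >= min_claims claims."""
    flagged = [count >= min_claims for count in claim_counts]
    true_pos = sum(f and d for f, d in zip(flagged, has_diabetes))
    true_neg = sum((not f) and (not d) for f, d in zip(flagged, has_diabetes))
    positives = sum(has_diabetes)
    negatives = len(has_diabetes) - positives
    return true_pos / positives, true_neg / negatives


# Synthetic cohort: number of diabetes-coded claims per person, and reference-standard status.
claim_counts = [0, 1, 2, 3, 1, 0, 0, 2, 0, 1]
has_diabetes = [False, True, True, True, False, False, False, True, True, False]

one_claim = sens_spec_at_threshold(claim_counts, has_diabetes, min_claims=1)
two_claims = sens_spec_at_threshold(claim_counts, has_diabetes, min_claims=2)
```

In this toy cohort, moving from one required claim to two lowers sensitivity from 0.8 to 0.6 and raises specificity from 0.6 to 1.0.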

A secondary tabulation of data was performed by the type of ICD coding system used. Eight studies using ICD-9 coding systems are from the USA and four studies from Canada. Four studies use ICD-9 and ICD-10 coding systems; three of these are from Canada and one is from Western Australia. In studies using ICD-9 codes, sensitivity ranged from 26.9% to 100%, specificity ranged from 88% to 100%, PPV ranged from 21% to 100%, NPV ranged from 74% to 99.6% and κ ranged from 0.6 to 0.9, whereas in the studies using ICD-10 codes, the ranges for sensitivity (59.1% to 89.6%) and specificity (95.5% to 99%) narrowed considerably, and PPV ranged from 63.1% to 96%, NPV ranged from 90.8% to 98.9% and κ ranged from 0.6 to 0.9.

Discussion

In this systematic review, case definitions appear to perform better when more data sources are used over a longer observation period. The outcomes with respect to sensitivity, specificity and PPV for each of these studies seem to differ due to variations in the definition of primary diagnosis in ICD-coded health data, the use of hospital discharge versus physician billing claims and by the geographical location.

The validity of diabetes case definitions varies considerably across studies, but we identified definition features that were associated with better performance. Combining more than one data source (physician claims and/or hospital discharge encounters) with an observation period of more than 1 year consistently yielded higher sensitivity with only a modest decline in specificity. These definition characteristics are present in the definition used by the National Diabetes Surveillance System to identify Canadians with diabetes mellitus.31 The performance of this particular definition has been widely studied, and a meta-analysis pooling the results of these studies demonstrates a pooled sensitivity of 82.3% (95% CI 75.8% to 87.4%) and a specificity of 97.9% (95% CI 96.5% to 98.8%).10
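The standard relationship between true and apparent prevalence shows what such estimates imply for surveillance. The following is a worked sketch rather than a calculation from this review: Se and Sp denote the pooled sensitivity and specificity quoted above, whereas the true prevalence p = 8% is a hypothetical value chosen purely for illustration.

```latex
\begin{align*}
P_{\text{apparent}} &= Se \cdot p + (1 - Sp)(1 - p) \\
                    &= 0.823 \times 0.08 + 0.021 \times 0.92 \approx 0.085 \\[4pt]
PPV &= \frac{Se \cdot p}{Se \cdot p + (1 - Sp)(1 - p)}
     = \frac{0.0658}{0.0852} \approx 0.77 \\[4pt]
p &= \frac{P_{\text{apparent}} + Sp - 1}{Se + Sp - 1}
\end{align*}
```

Under these assumptions, false positives roughly offset missed cases, so the apparent prevalence (about 8.5%) sits close to the assumed true prevalence, yet roughly one in four people flagged by the definition would not have diabetes. The last line inverts the first, recovering true prevalence from apparent prevalence when Se and Sp are known.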

This systematic review provides new knowledge on factors that are associated with enhanced definition performance and outlines the trade-offs one encounters with respect to sensitivity and specificity (and secondarily PPV and NPV) related to data source and years of follow-up. The development of an administrative case definition of diabetes is often related to pragmatic considerations (type of data on hand); however, this systematic review provides health services researchers with important information on how case definitions may perform given definition characteristics.

There was considerable ‘within-data definition’ variation in measures of validity. This variation likely reflects that neither physician claims nor hospital discharge data are primarily collected for surveillance; hence, the accuracy of diagnoses coded in these data sources remains suspect. Physician claims, while potentially rich in clinical information, are not recorded in a standardised manner. Billing practices do vary by practitioner, which may in turn be influenced by the nature of physician reimbursement (salary vs fee for service).23,32,33 Furthermore, patients with diabetes commonly carry multiple comorbidities, so while patients may have diabetes and be seen by a physician, providers will file billing claims for conditions other than diabetes.34,35 In contrast, hospital discharge data are limited to clinical information that is relevant to an individual hospitalisation, capturing diagnostic and treatment information usually for a brief window of time. The advantage of hospital discharge data for surveillance is that discharge diagnostic and medical procedure information are recorded by medical coders with standardised training with a detailed review of medical charts. However, the standard method of discharge coding does vary regionally, and thus variation around validity estimates based on these differences in coding practices will be observed.

Ideal performance parameters will vary based on the clinical condition of interest, the nature of surveillance and the type of data being used for surveillance. When studying diabetes trends and incidence rate, a case definition that has high but balanced measures of sensitivity and PPV is preferred. This will ensure maximal capture of potential patients and that patients captured likely have diabetes. This systematic review suggests that the commonly used two physician outpatient billings and/or one hospitalisation within a certain period of time is appropriate. It is also important to recognise that the data source used may also affect the type of patient identified with administrative data definitions. Hospital discharge data (when used in isolation) will potentially identify patients with more advanced disease or more complications and therefore may not be fully representative of the entire diabetes population. Similarly, physician claims data may identify a comparatively well, ambulatory population that has access to physician care in the community.
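A definition of the form "two physician claims and/or one hospitalisation within a certain period" can be expressed as a short rule over a person's diabetes-coded encounters. The sketch below is one possible reading, not the algorithm of any included study: the `Encounter` type, the source labels, and the default 730-day window applied to physician claims are all assumptions, and encounters are taken to be already restricted to diabetes-coded records inside the observation period.

```python
from datetime import date, timedelta
from typing import Iterable, NamedTuple


class Encounter(NamedTuple):
    when: date
    source: str  # "physician" or "hospital" (hypothetical labels)


def meets_case_definition(encounters: Iterable[Encounter],
                          min_physician_claims: int = 2,
                          window: timedelta = timedelta(days=730)) -> bool:
    """True if any hospital record exists, or enough physician claims fall within the window."""
    records = list(encounters)
    if any(e.source == "hospital" for e in records):
        return True
    dates = sorted(e.when for e in records if e.source == "physician")
    k = min_physician_claims
    # Slide over the sorted claim dates: k consecutive claims must span no more than the window.
    return any(dates[i + k - 1] - dates[i] <= window
               for i in range(len(dates) - k + 1))


# Hypothetical patients for illustration only.
close_claims = [Encounter(date(2010, 1, 1), "physician"),
                Encounter(date(2011, 6, 1), "physician")]
distant_claims = [Encounter(date(2010, 1, 1), "physician"),
                  Encounter(date(2013, 1, 1), "physician")]
```

Here `meets_case_definition(close_claims)` is true and `meets_case_definition(distant_claims)` is false; lowering `min_physician_claims` to 1 or lengthening `window` loosens the rule, which is the kind of change the review associates with higher sensitivity and lower specificity.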

The greatest strength of this systematic review is its inclusiveness: the search strategy was not restricted by region, time or any particular case definition of diabetes. However, most of the included studies (15 of the 16) were conducted in North America. Across studies, sensitivity and specificity estimates were high both when cases identified through administrative data were compared with medical records and when they were compared with population-based surveys, suggesting that public administrative data are a viable substitute for these sources in diabetes surveillance. Finally, study quality was generally high across all included studies as measured by the QUADAS scale.

There is the potential for a language bias as studies whose full texts were not available in English were not considered. There are potential limitations for all reference standards used to validate administrative case definitions for diabetes. The accuracy of chart reviews depends principally on physician documentation, availability of records and the accuracy of coding.36 Self-reported surveys and telephone surveys are prone to recall bias, social desirability bias, poor understanding of survey questions and respondents' incomplete knowledge of their diagnosis. Self-reported surveys can also suffer from participation biases, as patients with low diabetes risk may be less willing to participate whereas certain patients with advanced diabetes may be too unwell to participate. Age, sex and a patient's level of education can have an effect on the reporting of diabetes.37–39 Those with poorly controlled diabetes have been found to underreport their disease status.40 The ideal reference standard would be a clinical measure (such as glucose or HbA1c); however, a clinical reference standard is not often used.

In addition to the limitations of the reference standards used for validation, it should also be noted that even clinical measures are imperfect as a reference standard, because glucose and HbA1c are surrogates of the underlying disease process. It should also be noted that glucose and HbA1c thresholds for diagnosis have changed (albeit modestly) over the past 20 years. Changes in the clinical definition over time have significant implications for diabetes surveillance. Understanding changing diagnostic thresholds is critical to interpreting surveillance data. However, the validity of an administrative data case definition is conceptually related to, but somewhat separate from, the clinical definition. If we are to understand the clinical definition as a biological or physiologic definition that denotes the presence or absence of disease, the administrative data definitions are a surrogate of disease and denote the presence or the absence of disease based on care for the disease. The administrative definitions identify patients with a diagnosis of diabetes based on an interaction with the healthcare system in which they received care for diabetes. Therefore, the application of this definition follows the application of the clinical definition. There is a presumption that the clinical definition, whatever it may be at the time of the application, was valid.

Finally, the distinction between type 1 diabetes mellitus and type 2 diabetes mellitus is not clear in studies using administrative databases. In this systematic review, we included only the adult population (≥18 years of age), which is primarily the type 2 diabetes population.

Generalisability

Fifteen of the 16 included studies were conducted in North America, and therefore it is not surprising that the validation studies report comparable results. However, even though these studies are nested in the general population, the selected diabetes cohorts used in the validation studies may not always be truly representative of the general population.

Conclusions

Most studies included in this review use similar case definitions that require one or more diagnoses of diabetes. The performance characteristics of these case definitions depend on variations in the definition of primary diagnosis in ICD-coded discharge data and/or the methodology adopted by the healthcare facility to extract information from patient records. The purpose of surveillance and the type of data being used should determine the performance parameters required of an administrative case definition. Approaches used in developing case definitions for diabetes can be simple and practical and result in high sensitivity, specificity and PPV. Overall, administrative health databases are useful for undertaking diabetes surveillance,21,25 but awareness of how performance varies with the case definition is essential.

References

↵
1. Danaei G,
2. Finucane MM,
3. Lu Y, et al
. National, regional, and global trends in fasting plasma glucose and diabetes prevalence since 1980: systematic analysis of health examination surveys and epidemiological studies with 370 country-years and 2·7 million participants. Lancet 2011;378:31–40. doi:10.1016/S0140-6736(11)60679-X
OpenUrl CrossRef PubMed Web of Science
↵
1. Karumanchi DK,
2. Gaillard ER,
3. Dillon J
. Early diagnosis of diabetes through the eye. Photochem Photobiol 2015;91:1497–504. doi:10.1111/php.12524
OpenUrl
↵
1. Kiefer MM,
2. Ryan MJ
. Primary care of the patient with chronic kidney disease. Med Clin North Am Online 2015;99:935–52. doi:10.1016/j.mcna.2015.05.003
OpenUrl
↵
1. Leone S,
2. Pascale R,
3. Vitale M, et al
. Epidemiology of diabetic foot. Infez Med 2012;20(Suppl 1):8–13.
OpenUrl
↵
1. Grundy SM,
2. Benjamin IJ,
3. Burke GL, et al
. Diabetes and cardiovascular disease: a statement for healthcare professionals from the American Heart Association. Circulation 1999;100:1134–46. doi:10.1161/01.CIR.100.10.1134
OpenUrl FREE Full Text
↵
World Health Organization. Diabetes: the cost of diabetes. http://www.who.int/mediacentre/factsheets/fs236/en/ (accessed 27 Aug 2014).
↵
1. Jutte DP,
2. Roos LL,
3. Brownell MD
. Administrative record linkage as a tool for public health research. Annu Rev Public Health 2011;32:91–108. doi:10.1146/annurev-publhealth-031210-100700
OpenUrl CrossRef PubMed Web of Science
↵
1. Molodecky NA,
2. Panaccione R,
3. Ghosh S, et al
. Challenges associated with identifying the environmental determinants of the inflammatory bowel diseases. Inflamm Bowel Dis 2011;17:1792–9. doi:10.1002/ibd.21511
OpenUrl CrossRef PubMed
↵
Centers for Disease Control and Prevention: National Notifiable Diseases Surveillance System (NNDSS). http://wwwn.cdc.gov/nndss/case-definitions.html. (accessed 1 Sep 2015).
↵
1. Leong A,
2. Dasgupta K,
3. Bernatsky S, et al
. Systematic review and meta-analysis of validation studies on a diabetes case definition from health administrative records. PLoS One 2013;8:e75256. doi:10.1371/journal.pone.0075256
OpenUrl CrossRef PubMed
↵
1. Liberati A,
2. Altman DG,
3. Tetzlaff J
. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. Ann Intern Med 2009;151:W65–94.
OpenUrl CrossRef PubMed Web of Science
↵
1. Whiting P,
2. Rutjes AW,
3. Reitsma JB, et al
. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3:25.
OpenUrl CrossRef PubMed
↵
1. Borzecki AM,
2. Wong AT,
3. Hickey EC, et al
. Identifying hypertension-related comorbidities from administrative data: what's the optimal approach? Am J Med Qual 2004;19:201–6. doi:10.1177/106286060401900504
OpenUrl Abstract/FREE Full Text
↵
1. Crane HM,
2. Kadane JB,
3. Crane PK, et al
. Diabetes case identification methods applied to electronic medical record systems: their use in HIV-infected patients. Curr HIV Res 2006;4:97–106. doi:10.2174/157016206775197637
OpenUrl CrossRef PubMed Web of Science
↵
1. Hebert PL,
2. Geiss LS,
3. Tierney EF, et al
. Identifying persons with diabetes using Medicare claims data. Am J Med Qual 1999;14:270–7. doi:10.1177/106286069901400607
OpenUrl Abstract/FREE Full Text
↵
1. Ngo DL,
2. Marshall LM,
3. Howard RN, et al
. Agreement between self-reported information and medical claims data on diagnosed diabetes in Oregon's Medicaid population. J Public Health Manag Pract 2003;9:542–4. doi:10.1097/00124784-200311000-00016
OpenUrl CrossRef PubMed
↵
1. Rector TS,
2. Wickstrom SL,
3. Shah M, et al
. Specificity and sensitivity of claims-based algorithms for identifying members of Medicare+choice health plans that have chronic medical conditions. Health Serv Res 2004;39:1839–57. doi:10.1111/j.1475-6773.2004.00321.x
OpenUrl CrossRef PubMed Web of Science
↵
1. Miller DR,
2. Safford MM,
3. Pogach LM
. Who has diabetes? Best estimates of diabetes prevalence in the department of veterans affairs based on computerized patient data. Diabetes Care 2004;27(Suppl 2):B10–21.
OpenUrl Abstract/FREE Full Text
↵
1. Singh JA
. Accuracy of veterans affairs databases for diagnoses of chronic diseases. Prev Chronic Dis 2009;6:A126.
OpenUrl PubMed
↵
1. O'Connor PJ,
2. Rush WA,
3. Pronk NP, et al
. identifying diabetes mellitus or heart disease among health maintenance organization members: sensitivity, specificity, predictive value, and cost of survey and database methods. Am J Manag Care 1998;4:335–42.
OpenUrl PubMed Web of Science
↵
1. Hux JE,
2. Ivis F,
3. Flintoft V, et al
. diabetes in Ontario: determination of prevalence and incidence using a validated administrative data algorithm. Diabetes Care 2002;25:512–16. doi:10.2337/diacare.25.3.512
OpenUrl Abstract/FREE Full Text
↵
1. Robinson JR,
2. Young TK,
3. Roos LL, et al
. Estimating the burden of disease. Comparing administrative data and self-reports. Med Care 1997;35:932–47.
OpenUrl CrossRef PubMed Web of Science
↵
1. Wilchesky M,
2. Tamblyn RM,
3. Huang A
. Validation of diagnostic codes within medical services claims. J Clin Epidemiol 2004;57:131–41. doi:10.1016/S0895-4356(03)00246-4
OpenUrl CrossRef PubMed Web of Science
↵
1. So L,
2. Evans D,
3. Quan H
. ICD-10 coding algorithms for defining comorbidities of acute myocardial infarction. BMC Health Serv Res 2006;6:161. doi:10.1186/1472-6963-6-161
OpenUrl CrossRef PubMed
↵
1. Chen G,
2. Khan N,
3. Walker R, et al
. Validating ICD coding algorithms for diabetes mellitus from administrative data. Diabetes Res Clin Pract 2010;89:189–95. doi:10.1016/j.diabres.2010.03.007
OpenUrl CrossRef PubMed Web of Science
↵
1. Quan H,
2. Li B,
3. Saunders LD, et al
. Assessing validity of ICD-9-CM and ICD-10 administrative data in recording clinical conditions in a unique dually coded database. Health Serv Res 2008;43:1424–41. doi:10.1111/j.1475-6773.2007.00822.x
OpenUrl CrossRef PubMed Web of Science
↵
1. Young TK,
2. Roos NP,
3. Hammerstrand KM
. Estimated burden of diabetes mellitus in Manitoba according to health insurance claims: a pilot study. CMAJ 1991;144:318–24.
OpenUrl Abstract
↵
1. Nedkoff L,
2. Knuiman M,
3. Hung J, et al
. Concordance between administrative health data and medical records for diabetes status in coronary heart disease patients: a retrospective linked data study. BMC Med Res Methodol 2013;13:121. doi:10.1186/1471-2288-13-121
OpenUrl CrossRef PubMed
Zgibor JC, Orchard TJ, Saul M, et al. Developing and validating a diabetes database in a large health system. Diabetes Res Clin Pract 2007;75:313–19. doi:10.1016/j.diabres.2006.07.007
Kandula S, Zeng-Treitler Q, Chen L, et al. A bootstrapping algorithm to improve cohort identification using structured data. J Biomed Inform 2011;44(Suppl 1):S63–68. doi:10.1016/j.jbi.2011.10.013
Public Health Agency of Canada. National Diabetes Surveillance System, Public Health Agency of Canada. http://www.phac-aspc.gc.ca/ccdpc-cpcmc/ndss-snsd/english/index-eng.php (accessed 20 Aug 2014).
Roos LL, Roos NP, Cageorge SM, et al. How good are the data? Reliability of one health care data bank. Med Care 1982;20:266–76.
Klabunde CN, Potosky AL, Legler JM, et al. Development of a comorbidity index using physician claims data. J Clin Epidemiol 2000;53:1258–67. doi:10.1016/S0895-4356(00)00256-0
Carral F, Olveira G, Aguilar M, et al. Hospital discharge records under-report the prevalence of diabetes in inpatients. Diabetes Res Clin Pract 2003;59:145–51. doi:10.1016/S0168-8227(02)00200-0
Horner RD, Paris JA, Purvis JR, et al. Accuracy of patient encounter and billing information in ambulatory care. J Fam Pract 1991;33:593–8.
O'Malley KJ, Cook KF, Price MD, et al. Measuring diagnoses: ICD code accuracy. Health Serv Res 2005;40:1620–39. doi:10.1111/j.1475-6773.2005.00444.x
Goldman N, Lin IF, Weinstein M, et al. Evaluating the quality of self-reports of hypertension and diabetes. J Clin Epidemiol 2003;56:148–54. doi:10.1016/S0895-4356(02)00580-2
Kriegsman DM, Penninx BW, van Eijk JT, et al. Self-reports and general practitioner information on the presence of chronic diseases in community dwelling elderly. A study on the accuracy of patients’ self-reports and on determinants of inaccuracy. J Clin Epidemiol 1996;49:1407–17. doi:10.1016/S0895-4356(96)00274-0
Mackenbach JP, Looman CW, van der Meer JB. Differences in the misreporting of chronic conditions, by level of education: the effect on inequalities in prevalence rates. Am J Public Health 1996;86:706–11. doi:10.2105/AJPH.86.5.706
Garay-Sevilla ME, Malacara JM, Gutiérrez-Roa A, et al. Denial of disease in type 2 diabetes mellitus: its influence on metabolic control and associated factors. Diabet Med 1999;16:238–44. doi:10.1046/j.1464-5491.1999.00033.x

Footnotes

Contributors NJ wrote the protocol. BK, AM and CTC carried out the systematic review. BK wrote the manuscript. NJ, HQ, GGK, SB and DR provided final approval of the version to be published. All authors read and approved the final manuscript.
Funding BK was supported by the Alliance for Canadian Health Outcomes Research in Diabetes (ACHORD) and The Western Regional Training Centre for Health Services Research (WRTC). NJ holds a Canada Research Chair in Neurological Health Services Research and an Alberta Innovates Health Solutions (AI-HS) Population Health Investigator Award and operating funds (not related to this work) from the Canadian Institutes of Health Research, AI-HS, the University of Calgary and the Hotchkiss Brain Institute and Cumming School of Medicine. CTC is funded by a Canadian Institute of Health Research doctoral research scholarship. GGK is a Population Health Investigator supported by Alberta Innovates—Health Solutions. DR is a Population Health Investigator supported by Alberta Innovates—Health Solutions.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Any additional data, such as the study protocol and data extraction forms, are available by emailing the first author at bushra.khokhar@ucalgary.ca.

