Systematic review of validated case definitions for diabetes in ICD-9-coded and ICD-10-coded data in adult populations
  1. Bushra Khokhar1,2,
  2. Nathalie Jette1,2,3,
  3. Amy Metcalfe2,4,5,
  4. Ceara Tess Cunningham1,
  5. Hude Quan1,2,
  6. Gilaad G Kaplan1,2,
  7. Sonia Butalia2,6,
  8. Doreen Rabi1,2,6
  1. Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada
  2. O'Brien Institute for Public Health, University of Calgary, Calgary, Alberta, Canada
  3. Department of Clinical Neurosciences, Hotchkiss Brain Institute, University of Calgary, Calgary, Alberta, Canada
  4. Department of Obstetrics and Gynecology, University of Calgary, Calgary, Alberta, Canada
  5. Alberta Children's Hospital Research Institute, Calgary, Alberta, Canada
  6. Division of Endocrinology, Department of Medicine, University of Calgary, Calgary, Alberta, Canada
  Correspondence to Bushra Khokhar; bushra.khokhar@ucalgary.ca

Abstract

Objectives With steady increases in ‘big data’ and data analytics over the past two decades, administrative health databases have become more accessible and are now used regularly for diabetes surveillance. The objective of this study is to systematically review validated International Classification of Diseases (ICD)-based case definitions for diabetes in the adult population.

Setting, participants and outcome measures Electronic databases, MEDLINE and Embase, were searched for validation studies where an administrative case definition (using ICD codes) for diabetes in adults was validated against a reference and statistical measures of the performance reported.

Results The search yielded 2895 abstracts, and of the 193 potentially relevant studies, 16 met criteria. Diabetes definition for adults varied by data source, including physician claims (sensitivity ranged from 26.9% to 97%, specificity ranged from 94.3% to 99.4%, positive predictive value (PPV) ranged from 71.4% to 96.2%, negative predictive value (NPV) ranged from 95% to 99.6% and κ ranged from 0.8 to 0.9), hospital discharge data (sensitivity ranged from 59.1% to 92.6%, specificity ranged from 95.5% to 99%, PPV ranged from 62.5% to 96%, NPV ranged from 90.8% to 99% and κ ranged from 0.6 to 0.9) and a combination of both (sensitivity ranged from 57% to 95.6%, specificity ranged from 88% to 98.5%, PPV ranged from 54% to 80%, NPV ranged from 98% to 99.6% and κ ranged from 0.7 to 0.8).

Conclusions Overall, administrative health databases are useful for undertaking diabetes surveillance, but awareness of how performance varies with the case definition is essential. The performance characteristics of these case definitions depend on the variations in the definition of primary diagnosis in ICD-coded discharge data and/or the methodology adopted by the healthcare facility to extract information from patient records.

  • diabetes
  • validation studies
  • case definition
  • administrative data

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/

Strengths and limitations of this study

  • Our systematic review was comprehensive as it had a broad search strategy that bore no language or time restriction.

  • All included studies captured patient information at the population level with clear case definitions encompassing a broad spectrum of patients.

  • There is the potential for a language bias as studies where full texts were not available in English were not considered.

  • There are potential limitations for all reference standards used to validate administrative definitions for diabetes.

Background

Diabetes is a chronic disease that has increased substantially during the past 20 years.1 At present, diabetes is the leading cause of blindness,2 renal failure3 and non-traumatic lower limb amputations4 and is a major risk factor for cardiovascular disease.5 Owing to its chronic nature, the severity of its complications and the means required to control it, diabetes is a costly disease. The healthcare costs associated with this condition are substantial and can account for up to 15% of national healthcare budgets.6

Understanding the distribution of diabetes and its complications in a population is important to understand disease burden and to plan for effective disease management. Diabetes surveillance systems using administrative data can efficiently and readily analyse routinely collected health-related information from healthcare systems, provide reports on risk factors, care practices, morbidity and mortality and estimate incidence and prevalence at a population level.7 With steady increases in ‘big data’ and data analytics over the past two decades, administrative health databases have become more accessible to health services researchers and are now used regularly to study the processes and outcomes of healthcare. However, administrative health data are not collected primarily for research or surveillance. There is a need for health administrative data users to examine the validity of case ascertainment in their data sources before use.8

By definition, surveillance depends on a valid case definition that is applied consistently over time. A case definition is a set of uniform criteria used to define a disease for surveillance.9 However, a variety of diabetes case definitions exist, resulting in variation in reported diabetes prevalence estimates. A systematic review and meta-analysis of validation studies on diabetes case definitions from administrative records has been performed.10 That review aimed to determine the sensitivity and specificity of a commonly used diabetes case definition, “two physician claims or one hospital discharge abstract record within a two-year period”, and its potential effect on diabetes prevalence estimation. Our study extends this body of work by systematically reviewing validated International Classification of Diseases (ICD), 9th edition (ICD-9)-based and ICD-10-based case definitions for diabetes and comparing the validity of different case definitions across studies and countries.
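To make the quoted case definition concrete, the following is a minimal sketch of how it might be applied to one patient's records. It is an illustration under stated assumptions, not the algorithm of any study in this review: the function name, the 730-day window and the choice to treat any single diabetes-coded hospital discharge as sufficient are all assumptions, and the input dates are assumed to be already filtered to encounters carrying a diabetes ICD code.

```python
from datetime import date


def meets_case_definition(physician_claim_dates, hospital_discharge_dates,
                          min_claims=2, window_days=730):
    """Hypothetical check of a 'two claims or one discharge' style definition.

    Both arguments are lists of datetime.date objects for encounters that
    are assumed to carry a diabetes ICD code already.
    """
    # Assumption: one diabetes-coded hospital discharge is enough on its own.
    if hospital_discharge_dates:
        return True

    # Otherwise require min_claims physician claims inside one window.
    dates = sorted(physician_claim_dates)
    for i in range(len(dates) - min_claims + 1):
        first = dates[i]
        last = dates[i + min_claims - 1]
        if (last - first).days <= window_days:
            return True
    return False


# Two claims about 17 months apart satisfy the sketch's two-year window.
patient_a = meets_case_definition([date(2010, 1, 1), date(2011, 6, 1)], [])
# Two claims more than three years apart do not.
patient_b = meets_case_definition([date(2010, 1, 1), date(2013, 6, 1)], [])
```

Raising `min_claims` or shortening `window_days` in such a sketch makes the definition stricter, which is the kind of variation the validation studies reviewed here compare.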

Methods

Search strategy

This systematic review was performed using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines11 (see online supplementary appendix A). Two citation databases, MEDLINE and Embase, were searched using an OVID platform from 1980 until September 2015. The search strategy consisted of the following set of terms (see online supplementary appendix B): (1) (health services research or administrative data or hospital discharge data or ICD-9 or ICD-10 or medical record or health information or surveillance or physician claims or claims or hospital discharge or coding or codes) AND (2) (validity or validation or case definition or algorithm or agreement or accuracy or sensitivity or specificity or positive predictive value or negative predictive value) AND (3) medical subject heading terms for diabetes. Searches were limited to human studies published in English. The broad nature of the search strategy allowed for the detection of modifications of ICD codes, such as international clinical modification (eg, ICD-9-CM).

Study selection

Studies were evaluated in duplicate for eligibility in a two-stage procedure. In stage 1, all identified titles and abstracts were reviewed and in stage 2, a full text review was performed on all studies that met the predefined eligibility criteria. If either reviewer defined a study as eligible in stage 1, it was included in the full text review in stage 2. Disagreements were resolved by discussion or consultation with a third reviewer.

Inclusion/exclusion criteria

A study was included in the systematic review if it met the following criteria: (1) the study population included those ≥18 years of age with type 1 diabetes mellitus or type 2 diabetes mellitus; (2) statistical estimates (sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) or κ) were reported or could be calculated; (3) an ICD-9 or ICD-10 case definition for diabetes was reported and validated; (4) a satisfactory reference standard (eg, self-report from population-based surveys or patient medical chart reviews) was used and (5) it reported on original data. Studies validating diabetes in specialised populations (eg, cardiovascular disease) were excluded to ensure that the diabetes case definitions would be generalisable. Studies whose diabetes case definition was not based solely on medical encounter data (eg, those that included pharmacy or laboratory data) were also excluded, as the independent validity of such definitions could not be calculated. Bibliographies of included studies were manually searched for additional studies, which were then screened and reviewed using the same methods described above.

Data extraction and quality assessment

Primary outcomes were sensitivity, specificity, PPV, NPV and κ reported for each of the ICD-coded diabetes case definitions. Other extracted data included sample size and ICD codes used. If statistical estimates were not reported in the original paper, they were calculated from the data available.
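When a paper reports only the 2×2 cross-tabulation of the administrative definition against the reference standard, the five measures can be derived from the four cell counts. The sketch below uses the standard formulas, with Cohen's κ as the agreement statistic; the function name and the counts in the example are hypothetical and are not taken from any included study. It assumes that no margin of the table is zero.

```python
def validity_measures(tp, fp, fn, tn):
    """Compute validity measures from a 2x2 table.

    tp: definition positive, reference positive
    fp: definition positive, reference negative
    fn: definition negative, reference positive
    tn: definition negative, reference negative
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "kappa": kappa,
    }


# Hypothetical counts, for illustration only.
example = validity_measures(tp=80, fp=20, fn=20, tn=880)
```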

Calculating a pooled estimate of surveillance performance measures using meta-analytic techniques was deemed inappropriate given the heterogeneity of diabetes case definitions and reference standards used across studies. Data were tabulated by the type of administrative health data used. Study quality was evaluated using the Quality Assessment Tool for Diagnostic Accuracy Studies (QUADAS) criteria.12
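Because results were tabulated rather than pooled, each measure is summarised as a range within a data source. A minimal sketch of that tabulation follows. The record layout, the function name and every value in the example are hypothetical placeholders and are not the extracted study data.

```python
from collections import defaultdict


def ranges_by_source(records, measure):
    """Return {data source: (min, max)} for one validity measure.

    Records that do not report the measure (missing key or None) are skipped.
    """
    grouped = defaultdict(list)
    for record in records:
        value = record.get(measure)
        if value is not None:
            grouped[record["source"]].append(value)
    return {source: (min(values), max(values))
            for source, values in grouped.items()}


# Hypothetical records, for illustration only.
records = [
    {"source": "physician claims", "sensitivity": 0.60, "kappa": 0.80},
    {"source": "physician claims", "sensitivity": 0.90, "kappa": None},
    {"source": "hospital discharge", "sensitivity": 0.70},
]
sensitivity_ranges = ranges_by_source(records, "sensitivity")
kappa_ranges = ranges_by_source(records, "kappa")
```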

Results

Identification and description of studies

A total of 2895 abstracts were identified with 193 studies reviewed in full text, of which 16 studies met all eligibility criteria (figure 1). Eight of these studies were conducted in the USA,13–20 seven in Canada21–27 and one in Australia.28 Thirteen studies used ICD-9 codes,13–19 ,21–23 ,26–28 and the remaining three studies used ICD-9 and ICD-10 codes.23–25 None of the studies differentiated or commented as to whether a particular code of interest was in the primary or in one of the secondary diagnostic positions. Of the 16 studies reviewed, 8 used medical records13 ,14 ,21 ,23–26 ,28 and 8 used either self-reported surveys or telephone surveys to validate the diabetes diagnosis.15–20 ,22–27 Eight studies used physician claims data,13–16 ,18–20 ,23 four studies used hospital discharge data22 ,24 ,26 ,28 and four studies used a combination of both.17 ,21 ,25 ,27 Two studies used electronic medical records (EMRs) as their health data source,29 ,30 but these were removed from the review since EMRs were not a part of our search strategy.

Figure 1

Study flow chart. ICD, International Classification of Diseases.

The QUADAS Scores (table 1) ranged from 9 to 13 of a maximum of 14. Five questions were selected from QUADAS to constitute the ‘bias assessment’. Regardless of quality assessment scores, all 16 studies are discussed in this systematic review.

Table 1

Study quality characteristics using QUADAS tool

The sample size varied from 93 to ∼3 million people. Sensitivity and specificity values were available from all 18 studies, PPV in 16 studies, NPV in 12 studies and κ in 6 studies. All 16 studies were categorised by the type of administrative health data source being used.

Physician claims data

Table 2 lists the eight studies13–16 ,18–20 ,23 using physician claims data. In these studies, the sensitivity ranged from 26.9% to 97%, specificity ranged from 94.3% to 99.4%, PPV ranged from 71.4% to 96.2%, NPV ranged from 95% to 99.6% and κ ranged from 0.8 to 0.9. Four of the eight studies using physician claims data had at least one diabetes case definition where sensitivity and specificity exceeded 80%.

Table 2

Study characteristics and test measures of studies for physician claims data

Studies comparing physician claims-based case definitions over multiple years13 ,15 ,16 consistently show increases in sensitivity and a slight decrease in specificity and PPV over time. This relationship is consistent with the study18 that examined how the statistical estimates change as the number of diagnostic codes required by the case definition increases: sensitivity was highest when any diagnostic code (inpatient or outpatient) was used, whereas specificity and PPV were highest when the greatest number of outpatient diagnostic codes was required.

Hospital discharge data

Table 3 lists the four studies22 ,24 ,26 ,28 using only hospital discharge data. In these studies, the sensitivity ranged from 59.1% to 92.6%, specificity ranged from 95.5% to 99%, PPV ranged from 62.5% to 96%, NPV ranged from 90.8% to 99% and κ ranged from 0.6 to 0.9. Two of the four studies using hospital discharge data had at least one diabetes case definition where sensitivity and specificity exceeded 80%. In contrast to the physician claims-based case definitions, the sensitivity seemed to improve when a longer duration was used in the case definition; however, the specificity and the PPV behaved inversely.

Table 3

Study characteristics and test measures of studies for hospital discharge data

Combination of physician claims and hospital discharge data

Table 4 lists the four studies17 ,21 ,25 ,27 using a combination of physician claims and hospital discharge data. In these studies, the sensitivity ranged from 57% to 95.6%, specificity ranged from 88% to 98.5%, PPV ranged from 54% to 80%, NPV ranged from 98% to 99.6% and κ ranged from 0.7 to 0.8. Using a combination of two or more data sources increases the minimum value of the range for sensitivity compared with using either physician claims or hospital discharge data-based definitions individually. All four of the studies using a combination of physician claims and hospital discharge data had at least one case definition where sensitivity and specificity exceeded 80%.

Table 4

Study characteristics and test measures of studies for physician claims data and hospital discharge data

Another factor affecting the statistical estimates is the number of claims used in the definition. Rector et al's study17 shows consistent results: sensitivity is higher when at least one claim is required by the definition, but specificity is higher when at least two are required. Finally, Young et al's study27 demonstrates the highest sensitivity when two physician claims and two hospital discharge records are used in the definition and the highest specificity when one physician claim and two hospital claims are used in the definition.

A secondary tabulation of data was performed by the type of ICD coding system used. Eight of the studies using ICD-9 coding systems are from the USA and four are from Canada. Four studies use ICD-9 and ICD-10 coding systems; three of these are from Canada and one is from Western Australia. In studies using ICD-9 codes, sensitivity ranged from 26.9% to 100%, specificity ranged from 88% to 100%, PPV ranged from 21% to 100%, NPV ranged from 74% to 99.6% and κ ranged from 0.6 to 0.9, whereas in the studies using ICD-10 codes, the ranges for sensitivity (59.1% to 89.6%) and specificity (95.5% to 99%) narrowed considerably, and PPV ranged from 63.1% to 96%, NPV ranged from 90.8% to 98.9% and κ ranged from 0.6 to 0.9.

Discussion

In this systematic review, case definitions appear to perform better when more data sources are used over a longer observation period. The outcomes with respect to sensitivity, specificity and PPV for each of these studies seem to differ due to variations in the definition of primary diagnosis in ICD-coded health data, the use of hospital discharge versus physician billing claims and by the geographical location.

The validity of diabetes case definitions varies significantly across studies, but we identified definition features that were associated with better performance. The combinations of more than one data source, physician claim and/or hospital discharge encounter along with an observation period of more than 1 year consistently demonstrated higher sensitivity with only a modest decline in specificity. These definition characteristics are present in the definition used by the National Diabetes Surveillance System to identify Canadians with diabetes mellitus.31 The performance of this particular definition has been widely studied, and a meta-analysis pooling the results of these studies demonstrates a pooled sensitivity of 82.3% (95% CI 75.8% to 87.4%) and a specificity of 97.9% (95% CI 96.5% to 98.8%).10
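The effect of imperfect sensitivity and specificity on a surveillance estimate can be illustrated with a short calculation. The sketch below takes the pooled sensitivity (82.3%) and specificity (97.9%) quoted above and applies them to an assumed true prevalence of 8%. That prevalence is a hypothetical value chosen only for illustration, and the function names are likewise assumptions.

```python
def apparent_prevalence(true_prevalence, sensitivity, specificity):
    """Share of the population flagged by the case definition."""
    true_positives = sensitivity * true_prevalence
    false_positives = (1 - specificity) * (1 - true_prevalence)
    return true_positives + false_positives


def positive_predictive_value(true_prevalence, sensitivity, specificity):
    """Probability that a flagged person truly has the disease."""
    true_positives = sensitivity * true_prevalence
    false_positives = (1 - specificity) * (1 - true_prevalence)
    return true_positives / (true_positives + false_positives)


# Pooled estimates from the text; the 8% true prevalence is hypothetical.
SENSITIVITY = 0.823
SPECIFICITY = 0.979
ASSUMED_TRUE_PREVALENCE = 0.08

observed = apparent_prevalence(ASSUMED_TRUE_PREVALENCE, SENSITIVITY, SPECIFICITY)
ppv = positive_predictive_value(ASSUMED_TRUE_PREVALENCE, SENSITIVITY, SPECIFICITY)
```

Under these assumptions, missed cases and false positives partly offset each other, so the apparent prevalence can sit close to the true value even though a meaningful share of flagged individuals do not have diabetes. The PPV falls as the assumed true prevalence falls, which is one reason the same definition can perform differently across populations.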

This systematic review provides new knowledge on factors that are associated with enhanced definition performance and outlines the trade-offs one encounters with respect to sensitivity and specificity (and secondarily PPV and NPV) related to data source and years of follow-up. The development of an administrative case definition of diabetes is often related to pragmatic considerations (type of data on hand); however, this systematic review provides health services researchers with important information on how case definitions may perform given definition characteristics.

There was considerable ‘within-data definition’ variation in measures of validity. This variation likely reflects that neither physician claims nor hospital discharge data are primarily collected for surveillance; hence, the accuracy of diagnoses coded in these data sources remains suspect. Physician claims, while potentially rich in clinical information, are not recorded in a standardised manner. Billing practices do vary by practitioner, which may in turn be influenced by the nature of physician reimbursement (salary vs fee for service).23 ,32 ,33 Furthermore, patients with diabetes commonly carry multiple comorbidities, so while patients may have diabetes and be seen by a physician, providers will file billing claims for conditions other than diabetes.34 ,35 In contrast, hospital discharge data are limited to clinical information that is relevant to an individual hospitalisation, capturing diagnostic and treatment information usually for a brief window of time. The advantage of hospital discharge data for surveillance is that discharge diagnostic and medical procedure information are recorded by medical coders with standardised training with a detailed review of medical charts. However, the standard method of discharge coding does vary regionally, and thus variation around validity estimates based on these differences in coding practices will be observed.

Ideal performance parameters will vary based on the clinical condition of interest, the nature of surveillance and the type of data being used for surveillance. When studying diabetes trends and incidence rate, a case definition that has high but balanced measures of sensitivity and PPV is preferred. This will ensure maximal capture of potential patients and that patients captured likely have diabetes. This systematic review suggests that the commonly used two physician outpatient billings and/or one hospitalisation within a certain period of time is appropriate. It is also important to recognise that the data source used may also affect the type of patient identified with administrative data definitions. Hospital discharge data (when used in isolation) will potentially identify patients with more advanced disease or more complications and therefore may not be fully representative of the entire diabetes population. Similarly, physician claims data may identify a comparatively well, ambulatory population that has access to physician care in the community.

The greatest strength of this systematic review is its inclusiveness: the search strategy was not restricted by region, time or any particular case definition of diabetes. However, 15 of the 16 studies included in the qualitative analysis were conducted in North America. Across studies, sensitivity and specificity estimates were high both when cases identified through administrative data were compared with medical records and when they were compared with population-based surveys, suggesting that public administrative data are a viable substitute for these sources in diabetes surveillance. Finally, study quality across all included studies was generally high as measured by the QUADAS scale.

There is the potential for a language bias as studies whose full texts were not available in English were not considered. There are potential limitations for all reference standards used to validate administrative case definitions for diabetes. The accuracy of chart reviews depends principally on physician documentation, availability of records and the accuracy of coding.36 Self-reported surveys and telephone surveys are prone to recall bias, social desirability bias, poor understanding of survey questions or respondents' incomplete knowledge of their diagnosis. Self-reported surveys can also suffer from participation biases, as patients with low diabetes risk may be less willing to participate whereas certain patients with advanced diabetes may be too unwell to participate. Age, sex and a patient's level of education can have an effect on the reporting of diabetes.37–39 Those with poorly controlled diabetes have been found to underreport their disease status.40 The ideal reference standard would be a clinical measure (such as glucose or HbA1c); however, a clinical reference standard is not often used.

In addition to the limitations of the reference standards used for validation, it should also be noted that even clinical measures are imperfect as a reference standard, since glucose and HbA1c are surrogates of the underlying disease process. It should also be noted that glucose and HbA1c thresholds for diagnosis have changed (albeit modestly) over the past 20 years. Changes in the clinical definition over time have significant implications for diabetes surveillance. Understanding changing diagnostic thresholds is critical to interpreting surveillance data. However, the validity of an administrative data case definition is conceptually related to but somewhat separate from the clinical definition. If we are to understand the clinical definition as a biological or physiologic definition that denotes the presence or absence of disease, the administrative data definitions are a surrogate of disease and denote the presence or the absence of disease based on care for the disease. The administrative definitions identify patients with a diagnosis of diabetes based on an interaction with the healthcare system in which they received care for diabetes. Therefore, the application of this definition follows the application of the clinical definition. There is a presumption that the clinical definition, whatever it may be at the time of the application, was valid.

Finally, the distinction between type 1 diabetes mellitus and type 2 diabetes mellitus is not clear in studies using administrative databases. In this systematic review, we included only the adult population (≥18 years of age), which is primarily the type 2 diabetes population.

Generalisability

Fifteen of the 16 included studies were conducted in North America, and therefore it is not surprising that the validation studies report comparable results. However, even though these studies are nested in the general population, the selected diabetes cohorts used in the validation studies may not always be truly representative of the general population.

Conclusions

Most studies included in this review use similar case definitions that require one or more diagnoses of diabetes. The performance characteristics of these case definitions depend on the variations in the definition of primary diagnosis in ICD-coded discharge data and/or the methodology adopted by the healthcare facility to extract information from patient records. The purpose of surveillance and the type of data being used should determine the performance parameters required of an administrative case definition. Approaches used in developing case definitions for diabetes can be simple and practical and result in high sensitivity, specificity and PPV. Overall, administrative health databases are useful for undertaking diabetes surveillance,21 ,25 but awareness of how performance varies with the case definition is essential.

Footnotes

  • Contributors NJ wrote the protocol. BK, AM and CTC carried out the systematic review. BK wrote the manuscript. NJ, HQ, GGK, SB and DR provided final approval of the version to be published. All authors read and approved the final manuscript.

  • Funding BK was supported by the Alliance for Canadian Health Outcomes Research in Diabetes (ACHORD) and The Western Regional Training Centre for Health Services Research (WRTC). NJ holds a Canada Research Chair in Neurological Health Services Research and an Alberta Innovates Health Solutions (AI-HS) Population Health Investigator Award and operating funds (not related to this work) from the Canadian Institutes of Health Research, AI-HS, the University of Calgary and the Hotchkiss Brain Institute and Cumming School of Medicine. CTC is funded by a Canadian Institute of Health Research doctoral research scholarship. GGK is a Population Health Investigator supported by Alberta Innovates—Health Solutions. DR is a Population Health Investigator supported by Alberta Innovates—Health Solutions.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement Any additional data such as study protocol and data extraction forms are available by emailing the first author at bushra.khokhar@ucalgary.ca
