Article Text

Accuracy of administrative data for surveillance of healthcare-associated infections: a systematic review
  1. Maaike S M van Mourik1,
  2. Pleun Joppe van Duijn2,
  3. Karel G M Moons2,
  4. Marc J M Bonten1,2,
  5. Grace M Lee3,4
  1. Department of Medical Microbiology, University Medical Center Utrecht, Utrecht, The Netherlands
  2. Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
  3. Department of Population Medicine, Harvard Pilgrim Health Care Institute, Harvard Medical School, Boston, Massachusetts, USA
  4. Division of Infectious Diseases, Boston Children's Hospital, Boston, Massachusetts, USA
  Correspondence to Dr Maaike S M van Mourik; M.S.M.vanMourik-2@umcutrecht.nl

Abstract

Objective Measuring the incidence of healthcare-associated infections (HAI) is of increasing importance in current healthcare delivery systems. Administrative data algorithms, including (combinations of) diagnosis codes, are commonly used to determine the occurrence of HAI, either to support within-hospital surveillance programmes or as free-standing quality indicators. We conducted a systematic review evaluating the diagnostic accuracy of administrative data for the detection of HAI.

Methods Systematic search of Medline, Embase, CINAHL and Cochrane for relevant studies (1995–2013). Methodological quality assessment was performed using QUADAS-2 criteria; diagnostic accuracy estimates were stratified by HAI type and key study characteristics.

Results 57 studies were included, the majority aiming to detect surgical site or bloodstream infections. Study designs were very diverse regarding the specification of their administrative data algorithm (code selections, follow-up) and their definitions of HAI presence. One-third of studies had important methodological limitations, including differential or incomplete HAI ascertainment or a lack of blinding of assessors. The observed sensitivity and positive predictive values of administrative data algorithms for HAI detection were very heterogeneous and generally modest at best, both for within-hospital algorithms and for formal quality indicators; accuracy was particularly poor for the identification of device-associated HAI such as central line-associated bloodstream infections. The large heterogeneity in study designs precluded formal calculation of summary diagnostic accuracy estimates in most instances.

Conclusions Administrative data had limited and highly variable accuracy for the detection of HAI, and their judicious use for internal surveillance efforts and external quality assessment is recommended. If hospitals and policymakers choose to rely on administrative data for HAI surveillance, continued improvements to existing algorithms and their robust validation are imperative.

  • EPIDEMIOLOGY

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/


Strengths and limitations of this study

  • Administrative data algorithms, based on discharge and procedure codes, are increasingly used to facilitate surveillance efforts and derive quality indicators.

  • This comprehensive systematic review explicitly distinguished between administrative data algorithms developed for in-hospital surveillance and those for (external) quality assessment.

  • All included primary studies were subjected to a thorough methodological quality assessment; this revealed frequent risk of bias in primary studies.

  • The diverse nature of primary studies regarding study methods and algorithms precluded the pooling of results in most instances.

Introduction

Assessment of quality of care and monitoring of patient complications are key elements of current healthcare delivery systems.1 Administrative data, and discharge codes in particular, have been used as a valuable source of information to define patient populations, assess severity of disease, determine patient outcomes and detect adverse events, including healthcare-associated infections (HAI).2–4 In certain instances, administrative data are employed to measure quality of care and to govern payment incentives. Examples include the patient safety indicators (PSIs) developed by the US Agency for Healthcare Research and Quality, reduced payment for hospital-acquired conditions (HACs) considered preventable and the expansion of value-based purchasing (VBP) initiatives, the latter two implemented by US federal payers.5–8 HAI rates reported to national surveillance networks such as the US National Healthcare Safety Network (NHSN) are often determined from clinical patient information through chart review. Although these clinically ascertained rates are increasingly adopted by quality programmes, administrative data remain a key component of HAI detection for payers and some quality measurement programmes.4, 6

Nonetheless, many cautionary notes have been raised regarding the accuracy of administrative data for the purpose of HAI surveillance.1, 9–11 Their universal use, ease of access and relative standardisation across settings and over time make them attractive for large-scale surveillance and research efforts. Conversely, because administrative data exist to organise the billing and reimbursement of healthcare, they were not designed for the surveillance of HAI. Hence, when primary and secondary discharge diagnosis codes are assigned, other interests may take priority, for example, maximising reimbursement for the care delivered. In addition, the reliability of diagnosis code assignment depends heavily on adequate clinician documentation and on the number of diagnoses relative to the number of coding fields available.3, 12

For the purpose of HAI surveillance, the intended application of an administrative data algorithm determines which measures of concordance matter most. First, an algorithm may be used as a case-finding tool to support within-hospital surveillance efforts, either in isolation or combined with other indicators of HAI such as microbiology culture results or antibiotic dispensing. In this setting, high sensitivity may be preferred over a high positive predictive value (PPV), as patients flagged by the algorithm undergo manual confirmation of HAI. Alternatively, discharge codes may be used in external quality indicator algorithms that directly determine the occurrence of HAI and thereby gauge hospital performance.3, 9, 13 Here, a high PPV of observed signals may matter more than detecting every case of HAI. The primary objective of this systematic review was to assess the overall accuracy of published administrative data algorithms for the surveillance (ie, detection) of a broad range of HAI. We also determined whether the accuracy of algorithms developed for within-hospital surveillance differs from that of algorithms intended for external quality evaluation. In addition, we rigorously evaluated the methodological quality of the included studies using the QUADAS-2 tool developed for systematic reviews of diagnostic accuracy studies, and assessed the likely impact of risk of bias on the reported estimates.
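For orientation, the trade-off described above follows directly from the standard definitions of these concordance measures. In the notation below (standard diagnostic accuracy notation, not taken from the original article), TP, FP and FN denote the cells of the two-by-two table cross-classifying the algorithm result against the reference standard:

    \[
      \text{sensitivity} = \frac{TP}{TP + FN},
      \qquad
      \text{PPV} = \frac{TP}{TP + FP}.
    \]

Broadening a code selection moves true cases from FN to TP, raising sensitivity, but typically also adds false positives, lowering PPV; this is why case-finding and external quality indication favour different algorithm specifications.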

Methods

This systematic review includes studies assessing the diagnostic accuracy of administrative data algorithms that use discharge and/or procedure codes to detect HAI. Studies assessing infection or colonisation with specific pathogens (eg, methicillin-resistant Staphylococcus aureus or Clostridium difficile) were excluded, as laboratory-based surveillance may be considered more appropriate for these organisms. The results of this analysis are reported in accordance with PRISMA guidelines.14 The review protocol was not registered.

Search

Medline, EMBASE, the Cochrane database and CINAHL were searched for studies published from 1995 onwards, using a query combining representations of administrative data and (healthcare-associated) infections (see online supplementary data S1), with limits set to articles published in English, French or Dutch. The search was first performed on 8 March 2012 and closed on 1 March 2013.

Study selection

The following eligibility criteria were applied: (1) the study assessed concordance between administrative data and HAI occurrence; (2) the data were from 1995 or later, as earlier data may generalise poorly to current practice; (3) the algorithm did not rely on natural language processing; and (4) the study presented original research rather than a review or duplicated results. Selection of studies was performed by a single reviewer (MSMvM), with cross-referencing used to detect potentially missed studies. Inclusion was not restricted to specific geographical locations or patient populations, nor was complete data availability required.

Definitions

Administrative data algorithms were considered the index test (ie, the test under investigation). These algorithms consist of selections of diagnosis and/or procedure codes recorded for billing or other administrative purposes. The code selection within each algorithm was either specific to the study or, in some cases, a predefined metric used for payment or quality assessment. The latter group includes the PSIs, HACs and the code selection defined by the Pennsylvania Health Care Cost Containment Council (PHC4); most were developed and used in the USA, but the PSIs have also been applied in other countries.6, 15 The reference standard was the presence or absence of HAI as determined by review of clinical patient records, according to national infection surveillance methods (eg, NHSN), definitions from surgical quality monitoring programmes such as the US Surgical Quality Improvement Program (SQIP), or other definitions.

Quality assessment and data extraction

After study selection, quality assessment and data extraction were performed independently by two reviewers (MSMvM, PJvD) using modified QUADAS-2 criteria for the quality assessment of diagnostic accuracy studies (see online supplementary table S2 for data extraction forms, details and assumptions).16, 17

In brief, these criteria evaluate the risk of bias and the applicability to the review question with respect to patient selection, the index test and the reference standard. In addition, the criteria provide a framework for evaluating the risk of bias introduced by incomplete HAI ascertainment, the so-called ‘patient flow’. Points of particular attention during the quality assessment were whether HAI ascertainment was blinded to the result of the administrative data algorithm and whether partial or differential verification had occurred. Partial verification occurs when not all patients are assessed for HAI presence (ie, receive the reference standard) and the pattern of verification depends on the result of the index test. Differential verification occurs when not all patients evaluated with the index test receive the same reference standard. Depending on the pattern of partial and/or differential verification, the observed accuracy estimates of the algorithm under study may be biased.18 Several studies contained multiple verification patterns, methods of HAI ascertainment or specifications of the administrative data algorithm; in these cases, quality assessment and data extraction were applied separately to each so-called comparison. Disagreements between reviewers on methodological quality were resolved by discussion.
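To make the potential distortion concrete, consider a hypothetical differential verification pattern (invented round numbers, not data from any included study): patients flagged by the algorithm receive full chart review, which confirms 40 HAI, whereas unflagged patients receive a less thorough records check that identifies only 10 of the 40 HAI actually present among them. The estimates then diverge:

    \[
      \text{observed sensitivity} = \frac{40}{40 + 10} = 80\%,
      \qquad
      \text{true sensitivity} = \frac{40}{40 + 40} = 50\%.
    \]

Because the weaker reference standard is applied preferentially to algorithm-negative patients, false negatives are undercounted and the algorithm appears more sensitive than it is.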

Analyses

Included studies were stratified by HAI type and by the intended application of the administrative data within the process of HAI surveillance. A distinction was made between algorithms aimed at supporting within-hospital surveillance—either in isolation or in combination with other indicators—and those developed as a means of external quality of care evaluation. In addition, studies were classified by risk of bias based on QUADAS-2 criteria. Forest plots were created depicting the reported sensitivity, specificity, positive and negative predictive values of the administrative data algorithms for HAI detection.

If sufficiently large groups of comparable studies with complete two-by-two tables were available, estimates of sensitivity and specificity were pooled using the bivariate method recommended in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy.19, 20 This analysis jointly models the distribution of sensitivity and specificity, accounting for the correlation between these two outcome measures. Publication bias was not formally assessed. All analyses were performed using R V.3.0.1 (http://www.r-project.org) and SPSS Statistics 20 (IBM, Armonk, New York, USA).
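As an illustration, a bivariate pooling of this kind can be performed in R with the mada package; the sketch below uses invented two-by-two counts, and the article does not state which implementation of the bivariate model was used.

    # Bivariate random-effects pooling of sensitivity and specificity
    # (Reitsma model) using the CRAN package 'mada'.
    # install.packages("mada")
    library(mada)

    # Each row is one study's two-by-two table for an administrative data
    # algorithm against chart review; these counts are invented.
    studies <- data.frame(
      TP = c(12,   8,   20),
      FN = c(30,  25,   40),
      FP = c( 5,  10,    8),
      TN = c(950, 900, 1200)
    )

    fit <- reitsma(studies)  # jointly models logit-sensitivity and logit-FPR
    summary(fit)             # pooled sensitivity and false-positive rate,
                             # plus their between-study correlation

The joint model matters because studies trading sensitivity against specificity (eg, broader vs narrower code selections) induce a correlation between the two measures that univariate pooling would ignore.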

Results

Study selection

After removal of duplicates, 8478 unique titles were screened for relevance and exclusion criteria were applied to 675 remaining abstracts. Cross-referencing identified four additional articles; in addition, 10 articles were published between the search date and search closure (figure 1). Fifty-seven studies, containing 71 comparisons, were available for the qualitative synthesis and underwent methodological quality assessment.21–77

Figure 1

Flow chart of study selection and inclusion. HAI, healthcare-associated infections.

Study characteristics

Study design, selection of the study population, the methodology used as the reference standard and the specification of the administrative data varied greatly. This large variability in study characteristics precluded the generation of summary estimates of sensitivity and specificity for most types of HAI. As the reference standard, 35 studies applied NHSN methodology to determine HAI presence, six defined HAI as registered in SQIP and the remaining studies used clinical or other methods (table 1). Case definitions were applied by infection preventionists in 24 studies and by trained nurses, physicians or other abstractors in the remainder. Eighteen studies assessed algorithms for within-hospital surveillance, and a further 15 combined administrative data with other indicators of infection (eg, microbiology culture results or antibiotic use) to detect HAI. Twenty-four studies assessed administrative data algorithms explicitly designed for external quality assessment, such as PSIs or HACs. Only seven studies provided data collected after 2008.31, 34, 36, 45, 53, 66, 69

Table 1

Main characteristics of included studies, stratified by targeted type of HAI

Methodological quality

Figure 2 summarises the risk of bias and applicability concerns for each QUADAS-2 domain (see online supplementary data S3 for details by study and S4 for figures by HAI type). A high risk of bias in the flow component was observed in a considerable fraction of the included studies. Ascertainment of HAI status was complete in 37 of 57 studies; in other words, only 65% of studies applied the same reference standard to all included patients or to a random sample of them. The alternative verification patterns were: evaluation of only those patients flagged by administrative data (nine studies), assessment of patients flagged by either administrative data or another test (eg, microbiological testing) (eight studies) and reclassification of discrepant cases after a second review. A high risk of bias in the flow component often co-occurred with the inability to extract complete data on diagnostic accuracy, mainly as a result of partial verification. In studies that assessed only the PPV, HAI ascertainment was limited to patients flagged by administrative data; this partial verification is not problematic in itself, but a lack of blinding of assessors may still have introduced bias.

Figure 2

Summary of risk of bias and applicability for all studies (n=57), assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) methods. Some studies contain multiple comparisons; in these cases, the lowest risk of bias per study is included. Shading denotes studies for which extraction of complete two-by-two tables was not possible, including studies assessing only positive predictive values.

Surgical site infection

Thirty-four studies assessed surgical site infection (SSI); most identified the population at risk (ie, the denominator) by selecting specific procedure codes from claims data, although a few included all patients admitted to surgical wards. Details of the administrative data algorithms are provided in online supplementary table S6. Algorithms in studies applying NHSN methods as the reference standard generally also incorporated diagnosis codes assigned during readmissions, to complete the required follow-up duration, and several included follow-up procedures to detect SSI.

Accuracy estimates were highly variable (figure 3A, see online supplementary S5A), even within groups of studies with the same target procedures and intended application (range for sensitivity 10–100%, PPV 11–95%). Several studies assessed multiple specifications of the administrative data algorithm; as expected, a broader selection of discharge codes detected more cases of SSI at the cost of a lower PPV.26, 47, 54 Between studies, there was no apparent relation between the specificity of the included codes and the observed accuracy (ICD-9 codes 998.5 and 996.6 (or equivalents) vs a broader selection; data not shown). Inspection of the forest plots suggests that, in general, studies with a high risk of bias reported more favourable diagnostic accuracy than those with more robust methodology, with the possible exception of studies of cardiac procedures.

Figure 3

Forest plots for sensitivity and positive predictive value, stratified by HAI type and relevant study characteristics. Studies are grouped by the intended application of administrative data: Int (S)—used in isolation to support within-hospital surveillance efforts, Int (C)—used to support within-hospital surveillance, combined with other indicators of infection, Ext—used for external quality assessment, including public reporting and pay-for-performance. BSI, bloodstream infection; CABG, coronary artery bypass graft; DRM, drain-related meningitis; HAI, healthcare-associated infections; Ortho, orthopedic procedure; PSI, patient safety indicator; Sep, sepsis; SSI, surgical site infection; UTI, urinary tract infection. In studies including multiple specifications of the administrative data algorithm, these are numbered sequentially. 95% CIs are derived using the exact binomial method. If multiple study designs were performed within a single study, they are mentioned separately. #Reference standard from Surgical Quality Improvement Project (NSQIP or VASQIP). *Code selection based on specification from the Pennsylvania Health Care Cost Containment Council. **HAC specification.
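The exact binomial method mentioned in the caption is the Clopper-Pearson interval, which in R is available directly from binom.test; a minimal sketch with invented counts:

    # Exact (Clopper-Pearson) 95% CI for a proportion such as sensitivity
    # or PPV. Suppose an algorithm detected 17 of 50 chart-review-confirmed
    # HAI (invented numbers for illustration):
    binom.test(x = 17, n = 50)$conf.int
    # returns roughly 0.21 to 0.49, ie, a point estimate of 34% with a wide CI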

Bloodstream infections

Of the 24 studies evaluating bloodstream infections (BSI), half focused on central line-associated BSI (CLABSI) and 19 assessed algorithms for external quality assessment. Methods of identifying patients with a central line were very diverse: studies evaluating PSI 7 (‘central venous catheter-related BSI’) or the HAC applied specific discharge codes, whereas other studies included only patients with positive blood cultures67 or relied on manual surveillance to determine central line presence (see online supplementary table S6).69 The sensitivity of CLABSI detection was no higher than 40% in all but one study; notably, only studies that did not rely on administrative data to determine central line presence achieved a sensitivity over 20% (figure 3B, see online supplementary S5B). The sensitivity of administrative data algorithms for detecting BSI in general was slightly higher. The pooled sensitivity of PSI 13 (‘post-operative sepsis’) in studies using SQIP methods as the reference standard was 17.0% (95% CI 6.8% to 36.4%), with a specificity of 99.6% (99.3% to 99.7%). Among the algorithms intended for external quality assessment, PPVs varied widely and were often below 50%, suggesting that these quality indicators flag many events that are not (CLA)BSI. Again, study designs with a higher risk of bias tended to show higher accuracy.

Urinary tract infection

Fifteen studies investigated urinary tract infection (UTI), seven of which focused specifically on catheter-associated UTI (CAUTI). In algorithms relying on administrative data to identify patients with a urinary catheter, the sensitivity of CAUTI detection was strikingly low (figure 3C, see online supplementary S5C, S6).76, 78 Sensitivity was higher for UTI, but PPVs were universally below 25%, except in the study by Heisler et al, which additionally scrutinised flagged records for the presence of UTI.34

Pneumonia

Fourteen studies evaluated pneumonia, nine of which specifically targeted ventilator-associated pneumonia (VAP). The presence of mechanical ventilation was determined either within the administrative data algorithm34, 43 or by manual methods.67 For VAP, sensitivity ranged from 35% to 72% and PPV from 12% to 57%. For pneumonia in general, sensitivity and PPV hovered around 40%, although the studies used very diverse methodologies (figure 3D, see online supplementary S5D).

Other HAI and aggregated estimates

One study assessed the value of administrative data for the detection of postpartum endometritis (data extraction was not possible) and one assessed the occurrence of drain-related meningitis. In addition, six studies presented data aggregated across multiple types of HAI (figure 3E, see online supplementary S5E). In these studies too, sensitivity did not exceed 60%, with similar or lower PPVs.

Algorithms combining administrative data with clinical data

Fifteen studies in this review evaluated the accuracy of administrative data within an algorithm that also included other (automated) indicators of HAI for within-hospital surveillance. Eight allowed extraction of accuracy estimates for administrative data alone (labelled ‘Int (C)’ in figure 3), and only a few provided the data necessary to fairly assess the incremental benefit of administrative data over clinical data such as antimicrobial dispensing or microbiology results. In these studies, the gain in sensitivity obtained by adding administrative data was at most 10 percentage points (data not shown).23, 49, 50, 59, 74, 75
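To illustrate how such incremental benefit can be quantified (a sketch with hypothetical per-patient flags, not data from the included studies): a combined algorithm typically flags a patient when any component indicator fires, and the gain is the difference in sensitivity with and without the administrative data component.

    # Incremental sensitivity of adding administrative data to clinical
    # indicators; all vectors are hypothetical per-patient 0/1 flags.
    hai   <- c(1, 1, 1, 1, 0, 0, 0, 0, 1, 0)  # reference standard (chart review)
    micro <- c(1, 0, 1, 0, 0, 1, 0, 0, 1, 0)  # positive culture indicator
    abx   <- c(0, 1, 1, 0, 0, 0, 1, 0, 0, 0)  # antibiotic dispensing indicator
    admin <- c(0, 0, 1, 1, 1, 0, 0, 0, 0, 0)  # administrative data algorithm

    clinical <- micro | abx        # flag if any clinical indicator fires
    combined <- clinical | admin   # additionally use administrative data

    sens <- function(flag) sum(flag & hai == 1) / sum(hai == 1)
    sens(clinical)                   # 0.8 without administrative data
    sens(combined)                   # 1.0 with administrative data
    sens(combined) - sens(clinical)  # incremental gain (0.2 here)

The same layout also exposes the cost of the extra flags: patients flagged by the administrative component alone but without HAI (patient 5 above) reduce the combined algorithm's PPV.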

Discussion

In light of the increasing attention to evaluating, improving and rewarding quality of care, efficient and reliable measures to detect HAI are vital. However, as this comprehensive systematic review demonstrates, administrative data have limited and highly variable accuracy for the detection of HAI, and algorithms to identify infections related to invasive devices such as central lines and urinary catheters are particularly problematic. The included studies were very heterogeneous in the specification of both the administrative data algorithms and the reference standard. Thorough methodological quality assessment revealed incomplete ascertainment of HAI status and/or a lack of blinding of assessors in one-third of studies, introducing a risk of bias and complicating a balanced interpretation of the accuracy estimates. Studies employing designs associated with a higher risk of bias appeared to paint a more optimistic picture than those employing more robust methodology.

The drawbacks of administrative data for the purpose of HAI surveillance have been emphasised previously, especially from the perspective of external interfacility comparisons.3, 9, 11, 79 Compared with a recent systematic review of the accuracy of administrative data for HAI surveillance,9 we identified a larger number of primary studies (partly owing to broader inclusion criteria) and distinguished between administrative data algorithms developed for different intended applications. That review suggested that, despite their moderate sensitivity, administrative data may be useful within broader algorithmic (automated) routine surveillance; notably, the studies in our systematic review demonstrated only modest gains in efficiency over other automated methods.23, 25, 26, 32, 63, 67, 74 Surprisingly, there was no clear difference in sensitivity or PPV between administrative data algorithms developed to support within-hospital surveillance and those intended for external quality assessment. Sensitivity was highly variable and PPVs were modest at best, even in algorithms targeting very specific events (CAUTI, CLABSI) for external benchmarking or payment rules. Administrative data may, however, be advantageous for tracking HAI that require postdischarge surveillance across multiple healthcare facilities or levels of care, such as SSI.41, 80 Importantly, a considerable number of the included studies were performed in the USA, with its specific billing and quality evaluation system; some quality metrics and coding systems may therefore not be applicable to other countries.

A number of previously published studies have explored why administrative data fail to detect HAI. For specific quality measures, differences in HAI definitions between the quality metrics and NHSN methods may account for a portion of the discordant cases.81 Other explanations include the erroneous detection of infections present on admission (PoA) or infections not related to the targeted device, incorrect coding, insufficient clinician documentation, challenges in identifying invasive devices and the limited number of coding fields available.44, 51, 53, 69, 76, 82, 83 The precarious balance between the accuracy of administrative data and their use in quality measurement and pay-for-performance programmes has been discussed previously, especially as these initiatives may encourage coding practices that further undermine the accuracy of administrative data.11 Recent studies have provided mixed evidence on whether coding practice changed in response to the introduction of financial disincentives or public reporting programmes.84–86

Several refinements of coding systems currently in progress may affect the future performance of administrative data. First, the transition to the 10th revision of the International Classification of Diseases (ICD-10) may increase specificity owing to the greater granularity of the available codes.87 Only seven studies in this review used ICD-10, often in settings not directly comparable to those using ICD-9 (mainly the USA), and some studies purposely mapped ICD-10 codes to mimic ICD-9. Second, the number of coding fields available in (standardised) billing records has increased in recent years, allowing more secondary diagnoses to be recorded; however, it is unclear whether expansion beyond 15 fields will improve the registration of HAI and other complications.60, 88 Third, the adoption and accuracy of PoA indicators in the process of code assignment remain to be validated; they were incorporated in only a few of the included studies.78, 89 Finally, this systematic review could not provide sufficient data to evaluate changes in coding accuracy since the 2008 US introduction of financial disincentives for certain HACs not present on admission. Ongoing studies are needed to assess the impact of these changes in coding systems on their accuracy for HAI surveillance.

The frequent use of partial or differential verification patterns may be explained by the well-known limitations in the quality of traditional surveillance as a reference standard, combined with the workload of applying manual surveillance to large numbers of patients.23, 25, 26, 32, 63, 67, 74 Although reclassifying missed cases after a second review results in more accurate detection of HAI, the differential application of this second review may bias performance estimates upwards,18 unless it is applied to all cases (or a random sample thereof), including concordant HAI-negative and HAI-positive cases.23, 67, 90

Despite efforts to identify all available studies, we cannot exclude the possibility of having missed some, nor did we assess publication bias. In addition, because the search was closed in March 2013, a number of primary studies within the domain of this systematic review have been published since; their findings were in line with our observations.80, 82, 83, 90–99 Furthermore, as a result of our broad inclusion criteria, the included studies were very diverse, complicating the interpretation of the results. In contrast to a previous systematic review,9 we refrained from generating pooled summary estimates in most cases because of the small number of comparable studies. Future evaluations of the accuracy of administrative data should apply the same reference standard to all patients or, if unfeasible, to a random sample within each subgroup of the two-by-two table, and should ensure blinding of assessors. To facilitate a balanced interpretation of the results, estimates of diagnostic accuracy calculated before and after reclassification should be reported separately.100
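Where verifying every patient is unfeasible, the random-sample design recommended above also permits a simple correction: if algorithm-negative patients are verified at a known sampling fraction, the sampled false negatives can be scaled back up before computing sensitivity (a Begg and Greenes style correction). A sketch with invented counts:

    # Recovering sensitivity when only a random sample of algorithm-negative
    # patients receives the reference standard; all counts are invented.
    n_neg       <- 900  # algorithm-negative patients in total
    neg_sampled <- 90   # random 10% sample receiving chart review
    fn_sampled  <- 4    # HAI found within that sample
    tp          <- 40   # HAI among the (fully verified) algorithm-positives

    # Scale sampled false negatives by the inverse sampling fraction.
    fn_est <- fn_sampled * (n_neg / neg_sampled)  # 4 * 10 = 40
    tp / (tp + fn_est)                            # corrected sensitivity: 0.5

Reporting the sampling fractions alongside such corrected estimates would let readers reconstruct the full two-by-two table.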

Conclusion

Administrative data such as diagnosis and procedure codes have limited, and highly variable, accuracy for the surveillance of HAI. The sensitivity of HAI detection was insufficient in most studies, and administrative data algorithms targeting specific HAI for external quality reporting also had generally poor PPVs, with the identification of device-associated infections proving most challenging. The relative paucity of methodologically robust studies, the diverse nature of the available studies and continuing refinements of coding systems preclude reliable forecasting of the accuracy of administrative data in future applications. If administrative data continue to be used for HAI surveillance, benchmarking or payment, improvements to existing algorithms and their robust validation are imperative.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Twitter Follow Maaike van Mourik at @vanmourikmaaike

  • Contributors MSMvM designed the study, performed the search, critically appraised studies, performed the analysis and drafted the manuscript; PJvD critically appraised studies and helped write the manuscript; MJMB and KGMM assisted in the study design, critical appraisal, data analysis and writing of the manuscript; GML assisted in the study design, data interpretation and writing of the manuscript.

  • Funding MJMB and KGMM received various grants from the Netherlands Organisation for Scientific Research and several EU projects, in addition to unrestricted research grants to KGMM from GSK, Bayer and Boehringer for research conducted at his institution. GML received a grant from the Agency for Healthcare Research and Quality (R01 HS018414) as well as funding from the NIH, CDC and FDA.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data are available.