Original research
Validity of ICD-10 codes for COVID-19 patients with hospital admissions or ED visits in Canada: a retrospective cohort study
  1. Guosong Wu1,
  2. Adam G D'Souza1,2,
  3. Hude Quan1,
  4. Danielle A Southern1,
  5. Erik Youngson2,
  6. Tyler Williamson1,
  7. Cathy Eastwood1,
  8. Yuan Xu1,3
  1. 1Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
  2. 2Alberta Health Services, Calgary, Alberta, Canada
  3. 3Department of Oncology and Surgery, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
  1. Correspondence to Dr Guosong Wu; guosong.wu{at}ucalgary.ca

Abstract

Objective To evaluate the validity of COVID-19 International Classification of Diseases, 10th Revision (ICD-10) codes and their combinations.

Design Retrospective cohort study.

Setting Acute care hospitals and emergency departments (EDs) in Alberta, Canada.

Participants Patients who were admitted to hospital or presented to an ED in Alberta, as captured by local administrative databases between 1 March 2020 and 28 February 2021, who had a positive COVID-19 test and/or a COVID-19-related ICD-10 code.

Main outcome measures The sensitivity, positive predictive value (PPV) and 95% CIs for ICD-10 codes were computed. Stratified analyses by age group, sex, symptomatic status, mechanical ventilation, hospital type, patient intensive care unit (ICU) admission, discharge status and season of pandemic were conducted.

Results Two overlapping subsets of the study population were considered: those who had a positive COVID-19 test (cohort A, for estimating sensitivity) and those who had a COVID-19-related ICD-10 code (cohort B, for estimating PPV). Cohort A included 17 979 ED patients and 6477 inpatients while cohort B included 33 675 ED patients and 18 746 inpatients. Of inpatients, 9.5% in cohort A and 8.1% in cohort B received mechanical ventilation. Over 13% of inpatients were admitted to ICU. The length of hospital stay was 6 days (IQR: 3–14) for cohort A and 8 days (IQR: 3–19) for cohort B. In-hospital mortality was 15.9% and 38.8% for cohort A and B, respectively. The sensitivity for ICD-10 code U07.1 (COVID-19, virus identified) was 82.5% (81.8%–83.2%) with a PPV of 93.1% (92.6%–93.6%). The combination of U07.1 and U07.3 (multisystem inflammatory syndrome associated with COVID-19) had a sensitivity of 82.5% (81.9%–83.2%) and PPV of 92.9% (92.4%–93.4%).

Conclusions In Alberta, ICD-10 COVID-19 codes (U07.1 and U07.3) were coded well with high validity. This indicates administrative data can be used for COVID-19 research and pandemic management purposes.

  • COVID-19
  • ICD-10
  • validation
  • sensitivity
  • positive predictive value

Data availability statement

Data may be obtained from a third party and are not publicly available. Due to data sharing policies of the data custodians, the dataset is not able to be made publicly available. It may be able to be shared only to researchers in Alberta with approval from the data custodians.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Strengths and limitations of this study

  • This is the first endeavour to explore the validity of COVID-19-related International Classification of Diseases, 10th Revision (ICD-10) codes using both outpatients and hospitalised patients.

  • In this large population-based retrospective cohort study, the epidemiology, susceptibility and outcomes of COVID-19 were summarised, and the sensitivity and positive predictive value of ICD-10 codes were computed with data collected over an entire pandemic year.

  • Validity of ICD-10 codes for COVID-19 and their combinations was computed and stratified analysis presented by patient demographic and clinical characteristics.

  • While the study presents the sensitivity and positive predictive value, the specificity and negative predictive value could not be determined because the data cannot be used to reliably estimate the true negatives.

  • The extent to which the research findings can be generalised to other countries or healthcare settings is unknown.

Introduction

Since the declaration of a pandemic by the WHO, SARS-CoV-2 has caused 195.9 million infections and over 4.2 million deaths globally.1 An enormous number of research projects have been conducted to better understand the disease and its impact.2 For example, there are real-world evidence studies pertaining to the long-term effect on health of survivors,3 large-scale epidemiological studies to explore the natural history of disease outcomes,4 and population-based health services research and policy studies to explore the optimal coping strategies for future outbreaks.5 6 However, case identification of COVID-19 is the first critical step for all these initiatives.

In a quick response to the pandemic, WHO activated two emergency International Classification of Diseases, 10th Revision (ICD-10) codes for COVID-19 in February 2020, U07.1 for confirmed cases and U07.2 for suspected or probable cases (clinical or epidemiological diagnosis).7 A set of additional codes was defined later on to capture COVID-19-related information.8 To date, there is limited information on the performance of ICD-10 codes in identifying COVID-19 patients who were admitted to hospitals or visited emergency departments (EDs). Estimates of the validity of U07.1 among hospitalised patients have varied (range 49%–98%) across countries and over time.9–11 As the pandemic continues to evolve, it is important to assess the validity of ICD-10 codes using large population-based data from the past pandemic year and provide accurate algorithms to identify COVID-19 cases.

This study sought to evaluate the validity of ICD-10 codes in identifying individuals who experienced COVID-19 through population-based administrative databases with laboratory test results as reference standard.

Methods

We conducted a diagnostic coding accuracy study on a consecutive cohort of COVID-19 patients in Alberta, Canada.

Study cohort

This retrospective cohort study included all patients who were diagnosed with COVID-19 and had an ED visit or were admitted to a hospital in Alberta, Canada between 1 March 2020 and 28 February 2021. Only first records were analysed if a patient had multiple encounters in hospitalisation or ED visits.

A patient was defined as a COVID-19 case if they had an ED visit or hospitalisation that occurred between 1 day prior to, and up to 7 days after, a positive SARS-CoV-2 PCR test recorded in a laboratory database. Different cut-offs for earliest and latest dates of encounters relative to the positive test date were tested, with no significant impact on any of the reported sensitivity or positive predictive value (PPV) results.
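The case-definition window can be illustrated with a minimal Python sketch. The function and constant names are hypothetical, and two details are assumptions rather than statements from the study: that both endpoints of the window are inclusive, and that a single encounter date (for example, the admission or visit date) is compared with the test date.

```python
from datetime import date, timedelta

# Window bounds from the case definition; the constant names are illustrative.
DAYS_BEFORE_TEST = 1
DAYS_AFTER_TEST = 7


def encounter_in_window(encounter_date: date, positive_test_date: date) -> bool:
    """Return True if the encounter falls within the case-definition window.

    Assumption: both endpoints are treated as inclusive.
    """
    earliest = positive_test_date - timedelta(days=DAYS_BEFORE_TEST)
    latest = positive_test_date + timedelta(days=DAYS_AFTER_TEST)
    return earliest <= encounter_date <= latest
```

Keeping the two bounds as named constants makes it straightforward to rerun an analysis with alternative cut-offs, in the spirit of the sensitivity check described above.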

The validity of the ICD-10 codes and their combinations was calculated from the following two cohorts. Cohort A contained all positive COVID-19 cases, linked to administrative databases to calculate sensitivity. Cohort B included all patients who were assigned one of the COVID-19-related ICD codes in administrative databases, linked back to the laboratory database to determine if a positive PCR test existed, to calculate PPV.
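As a rough illustration of how the two cohorts yield the two measures, the sketch below treats each cohort as a set of patient identifiers. This is a simplification under stated assumptions: the identifiers and function name are hypothetical, the toy data are invented for the example, and the actual linkage in the study was performed on administrative and laboratory records rather than on in-memory sets.

```python
from typing import Set, Tuple


def sensitivity_and_ppv(lab_positive_ids: Set[str],
                        coded_ids: Set[str]) -> Tuple[float, float]:
    """Compute sensitivity and PPV from two sets of patient identifiers.

    lab_positive_ids: patients with a positive PCR test (cohort A, the denominator for sensitivity)
    coded_ids: patients assigned the ICD-10 code of interest (cohort B, the denominator for PPV)
    """
    if not lab_positive_ids or not coded_ids:
        raise ValueError("both cohorts must be non-empty")
    # Patients who are both test positive and coded count as true positives.
    true_positives = len(lab_positive_ids & coded_ids)
    sensitivity = true_positives / len(lab_positive_ids)
    ppv = true_positives / len(coded_ids)
    return sensitivity, ppv


# Toy data, invented purely for illustration.
cohort_a = {"p1", "p2", "p3", "p4", "p5"}
cohort_b = {"p2", "p3", "p4", "p5", "p6", "p7", "p8", "p9"}
sens, ppv = sensitivity_and_ppv(cohort_a, cohort_b)
```

In this toy example four of the five test-positive patients carry the code, and four of the eight coded patients are test positive.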

Data sources

The data were derived from three Alberta provincial databases that cover the Alberta population: (1) Discharge Abstract Database (DAD), which contains demographic, administrative and clinical data for hospitalised patients including up to 25 ICD-10 diagnosis codes per record; (2) National Ambulatory Care Reporting System (NACRS), which captures data of hospital-based ambulatory care outpatient clinics, day surgery and ED visits and (3) Public Health Laboratory (ProvLab) database, which captures SARS-CoV-2 laboratory PCR test results and dates and was used as the reference standard. The patient personal health number, sex, and date of birth were used to conduct the data linkage. Deidentified data were received from Alberta Health Services and analysed within a secure computing environment at the University of Calgary.

ICD-10 codes of COVID-19

The Canadian Institute for Health Information (CIHI) updates COVID-19 coding directions when new codes are released by WHO. All codes8 that were used by CIHI during the pandemic were included in this study. This includes U07.1 (COVID-19, virus identified), U07.3 (Multisystem inflammatory syndrome associated with COVID-19), O98.5 (COVID-19 in pregnancy), Z03.8 (Observation for other suspected diseases), Z11.5 (Encounter for screening for other viral diseases), Z51.5 (COVID-19 in palliative care) and Z71.1 (Person with feared complaint in whom no diagnosis is made). We also assessed the validity of two combinations of codes, set 1: U07.1 and U07.3, and set 2: U07.1, U07.3, O98.5, Z03.8, Z11.5, Z51.5 and Z71.1. ICD-10 code U07.2 (virus not identified) is assigned when the patient is diagnosed, clinically or epidemiologically, with an acute infection with the COVID-19 virus, but the COVID-19 PCR lab test results are inconclusive or not available, or no test was performed.8 Because the PCR lab test was used as the gold standard, this reference standard was not suitable for assessing the validity of U07.2. Therefore, code U07.2 was excluded from this study.
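The two code combinations can be expressed as a short Python sketch. The names `SET_1`, `SET_2` and `has_any_code` are hypothetical, and the matching rule is an assumption: codes are compared as exact dotted strings after trimming and upper-casing, whereas real administrative extracts may store codes without dots or with additional characters.

```python
from typing import Iterable

# Set 1 and set 2 as described in the text; U07.2 is deliberately excluded.
SET_1 = frozenset({"U07.1", "U07.3"})
SET_2 = SET_1 | {"O98.5", "Z03.8", "Z11.5", "Z51.5", "Z71.1"}


def has_any_code(diagnosis_codes: Iterable[str], code_set: frozenset) -> bool:
    """Return True if any diagnosis code on a record belongs to code_set.

    Assumption: codes match exactly once whitespace is stripped and
    letters are upper-cased.
    """
    normalised = {code.strip().upper() for code in diagnosis_codes}
    return not normalised.isdisjoint(code_set)
```

For example, a record carrying only Z11.5 would be flagged under set 2 but not under set 1, which is the kind of broadening that can raise sensitivity while lowering PPV.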

Statistical analysis

Descriptive statistics were used to report characteristics of the study cohorts. Charlson Comorbidity Index was derived from DAD and NACRS.12 Sensitivity and PPV were calculated through comparing ICD-10 codes in administrative data against the reference standard from ProvLab, and 95% binomial proportion CIs were computed using the Wilson method. We estimated the overall performance of ICD-10 codes, and subgroup performance stratified by patient characteristics (eg, age group, sex, mechanical ventilation), hospital type, outcome variables (intensive care unit (ICU) admission, discharge status) and seasons of pandemic for both study cohorts. All statistical analyses were performed using Python V.3 and STATA 17 software (StataCorp. 2021. Stata Statistical Software: Release 17: StataCorp.).
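For readers unfamiliar with the Wilson method, a minimal sketch of the score interval for a binomial proportion follows. The function name is hypothetical, the version shown omits a continuity correction (whether the study applied one is not stated), and the sketch is not intended to reproduce the reported intervals, since the underlying counts are not given in the text.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, total: int,
                    z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z defaults to the standard normal quantile for a two-sided 95% interval.
    No continuity correction is applied (an assumption of this sketch).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must lie between 0 and total")
    p_hat = successes / total
    z2 = z * z
    denominator = 1 + z2 / total
    centre = (p_hat + z2 / (2 * total)) / denominator
    half_width = (z / denominator) * math.sqrt(
        p_hat * (1 - p_hat) / total + z2 / (4 * total * total)
    )
    return centre - half_width, centre + half_width
```

Unlike the simple normal approximation, this interval stays within 0 and 1 and remains informative when the observed proportion is close to either bound, which matters for small strata in a stratified analysis.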

Patient and public involvement

Study participants and other members of the public were not involved in the design, or conduct, or reporting, or dissemination plans of the research.

Results

A total of 17 979 ED patients and 6477 inpatients were included in cohort A, and 33 675 ED patients and 18 746 inpatients were included in cohort B (table 1). Overall, compared with the hospitalised patients, ED patients were younger (cohort A: median age 47 vs 64, cohort B: median age 43 vs 73) and more likely to be female (cohort A: 50.1% vs 46.1%, cohort B: 51.7% vs 48.6%). Hospitalised patients in cohort B were more likely to have cancers (36.97% vs 12.18%), particularly metastatic carcinoma (23.73% vs 3.47%), compared with cohort A. Of hospitalised patients, 9.5% in cohort A and 8.1% in cohort B received mechanical ventilation. There were about half as many flagged asymptomatic cases at the time of testing as flagged symptomatic cases in both cohorts. Of the hospitalised patients, 15.0% and 13.1% were admitted to ICU in cohorts A and B, respectively. The length of hospital stay was 6 days (IQR: 3–14) for cohort A and 8 days (IQR: 3–19) for cohort B. In-hospital mortality was 15.9% and 38.8% for cohorts A and B, respectively. The week-by-week COVID-19-related ICD-10 code counts among inpatients and ED visits ranged from 72 counts in March to 767 in December 2020 (figure 1).

Table 1

Baseline patient characteristics

Figure 1

COVID-19-related ICD-10 code counts among inpatients and ED visits (left line chart) and new cases reported (right bar chart) between 1 March 2020 and 28 February 2021. ED, emergency department; ICD-10, International Classification of Diseases, 10th Revision.

For code U07.1, the sensitivity was 82.5% (95% CI 81.8% to 83.2%) and PPV was 93.1% (95% CI 92.6% to 93.6%) (table 2). Compared with ED visits, inpatients had higher sensitivity (94.2% vs 81.3%) and similar PPV (94.5% vs 93.3%). The combination of codes U07.1 and U07.3 for the entire cohort had a sensitivity of 82.5% (95% CI 81.9% to 83.2%) and PPV of 92.9% (95% CI 92.4% to 93.4%). The combination of all related codes had a sensitivity of 84.4% (95% CI 83.8% to 85.1%) but a PPV of only 23.1% (95% CI 22.7% to 23.5%).

Table 2

Performance characteristics of ICD-10 among inpatients and ED visits

Stratified analysis of code U07.1 over the entire cohort shows higher sensitivity and PPV for patients aged 80 and above, patients who were admitted to ICU, ventilated patients, and inpatient survivors relative to encounters (table 3). The validity of U07.1 varied by season, with higher PPV in summer (95.8%, 95% CI 94.2% to 96.9%), and higher sensitivity in spring (86.2%, 95% CI 83.8% to 88.3%). The sensitivity and PPV were similar between symptomatic and asymptomatic patients.

Table 3

Performance of ICD-10 (U07.1) among patient subgroups tested for confirmed COVID-19

Discussion

Our study demonstrated that the ICD-10 code U07.1 for SARS-CoV-2 disease had high sensitivity and PPV. Adding other COVID-19-related codes increased the sensitivity but decreased the PPV. The sensitivity and PPV varied between outpatient and inpatient cohorts, as well as by patient characteristics.

Our findings indicated that ICD-10 code U07.1 accurately identified COVID-19 cases within the administrative database in Alberta, Canada. Recent studies from other care settings or countries evaluated the validity of code U07.1 with 3–5 months of observation.9 10 13 Our study retrospectively analysed the code validity for the past pandemic year and found that validity of administrative data in recording COVID-19 varied by seasons, as well as by patient characteristics such as age, admission to ICU, and discharge status (alive or dead).

The study cohorts A and B are similarly distributed in most aspects (proportions of inpatients, ages of ED vs inpatients and many of the comorbidities), but stark differences were observed in the frequencies of certain severe health conditions (eg, cohort B were more likely to have cerebrovascular disease and cancers). This may be because using ICD codes to identify COVID-19 patients in cohort B might be more likely to capture patients with mixed primary diagnoses, whereas using positive COVID-19 PCR test results to define subsequent in-hospital COVID-19 patients in cohort A was more likely to capture COVID-19 patients who were hospitalised primarily due to their COVID-19 diagnosis.

To the best of our knowledge and based on our review of the literature,9–11 this is the first endeavour to explore the validity of COVID-19-related ICD-10 codes using both outpatients and hospitalised patients. We analysed a large population-based database and provided robust evidence for the validity of the ICD-10 codes. Combining different sets of COVID-19-related ICD codes could slightly improve the sensitivity, but doing so would compromise the PPV. The observed sensitivity and PPV were higher in the hospitalised patient cohort compared with the ED visitors. Depending on their investigative purpose, researchers need to choose the best method for COVID-19 case identification with administrative databases.

The sensitivity and PPV of U07.1 were observed to be higher in patients aged 80 and above as well as in patients with severe health conditions or even death. A similar pattern was reported by Bhatt et al and might reflect that administrative data coding accuracy was impacted by the likelihood of greater detail in clinical documentation when severe disease is present, as well as coder experience and expertise.9 Although it remains unclear why code validity varied throughout the pandemic, it seems reasonable that continuous monitoring of coding validity is needed.14 15

The following limitations must be considered when interpreting the research findings. First, while the study presented the sensitivity and PPV, other measures of validity such as specificity and negative predictive value could not be determined because the data could not reliably be used to estimate the true negatives. Thus, evidence on how well the ICD codes perform in excluding COVID-19 cases was not studied in this work. Second, the symptomatic flag in ProvLab is self-reported data voluntarily collected shortly after testing positive, is frequently not available, and is not updated to reflect disease progress, so the results of the corresponding stratified analysis should be interpreted with caution. Third, the PCR test for SARS-CoV-2 may not be a perfect test to constitute a gold standard; however, we chose to use it as it is widely accepted internationally, and is the most practical choice for a large-scale study. Lastly, due to the variability of coding practice and healthcare systems, the generalisability of our findings to other countries or territories or healthcare settings is unknown.

Conclusions

ICD-10 codes U07.1 and U07.3 demonstrated high sensitivity and PPV in both ED visitors and hospitalised patients. This indicates administrative data in Alberta, Canada, can be used for COVID-19 research and pandemic management purposes.

Footnotes

  • Twitter @icd

  • Contributors HQ conceived this study. YX, GW and AGD’S contributed to the study design. AGD’S and EY retrieved and deidentified the data. AGD’S and GW completed the data analysis. TW, CE and DAS contributed to the data interpretation. GW drafted the manuscript and all authors contributed to the revision. All authors agreed on the final version of submission and account for all aspects of this work. GW is the guarantor and takes responsibility for this work, had access to the data and controlled the decision to publish.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.