
Validation of administrative hospital data for identifying incident pancreatic and periampullary cancer cases: a population-based study using linked cancer registry and administrative hospital data in New South Wales, Australia
  1. Nicola Creighton (1)
  2. Richard Walton (1)
  3. David Roder (1, 2)
  4. Sanchia Aranda (3)
  5. David Currow (1)
  1. Cancer Institute NSW, Sydney, New South Wales, Australia
  2. Centre for Population Health Research, University of South Australia, Adelaide, South Australia, Australia
  3. Cancer Council Australia, Sydney, New South Wales, Australia
Correspondence to Nicola Creighton; Nicola.Creighton{at}cancerinstitute.org.au

Abstract

Objectives Informing cancer service delivery with timely and accurate data is essential to cancer control activities and health system monitoring. This study aimed to assess the validity of ascertaining incident cases and resection use for pancreatic and periampullary cancers from linked administrative hospital data, compared with data from a cancer registry (the ‘gold standard’).

Design, setting and participants Analysis of linked statutory population-based cancer registry data and administrative hospital data for adults (aged ≥18 years) with a pancreatic or periampullary cancer case diagnosed during 2005–2009 or a hospital admission for these cancers between 2005 and 2013 in New South Wales, Australia.

Methods The sensitivity and positive predictive value (PPV) of pancreatic and periampullary cancer case ascertainment from hospital admission data were calculated for the 2005–2009 period through comparison with registry data. We examined the effect of the look-back period to distinguish incident cancer cases from prevalent cancer cases from hospital admission data using 2009 and 2013 as index years.

Results Sensitivity of case ascertainment from the hospital data was 87.5% (4322/4939), with higher sensitivity when the cancer was resected (97.9%, 715/730) and for pancreatic cancers (88.6%, 3733/4211). Sensitivity was lower in regional (83.3%) and remote (85.7%) areas, particularly in areas with interstate outflow of patients for treatment, and for cases notified to the registry by death certificate only (9.6%). The PPV for the identification of incident cases was 82.0% (4322/5272). A 2-year look-back period distinguished the majority (98%) of incident cases from prevalent cases in linked hospital data.

Conclusions Pancreatic and periampullary cancer cases and resection use can be ascertained from linked hospital admission data with sufficient validity for informing aspects of health service delivery and system-level monitoring. Limited tumour clinical information and variation in case ascertainment across population subgroups are limitations of hospital-derived cancer incidence data when compared with population cancer registries.

  • Cancer incidence
  • Registries
  • Hospital admission data
  • Administrative data
  • Sensitivity

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/


Strengths and limitations of this study

  • This study uses statutory population-based cancer registry data as a ‘gold standard’ to assess the ascertainment of cancer cases from administrative hospital data.

  • Sensitivity was examined by patient demographic and tumour characteristics to identify potential biases in case ascertainment from hospital data.

  • A limitation is that false positives in the hospital data could only be identified among people with a (non-pancreatic) cancer case recorded on the cancer registry; false positives without any invasive cancer case recorded on the registry could not be identified, so the number of false positives may be underestimated.

  • Look-back periods of up to 5 years were examined to distinguish incident cases from prevalent cases in the hospital data.

Introduction

System-level monitoring of appropriateness and quality of cancer care is an essential part of cancer control.1 Population-based cancer registries have a key role to play in system performance reporting since they generally have a high level of completeness and accuracy of cancer case information that is obtained from multiple sources, including hospital admission and outpatient data, pathology reports and death certificates.2 Increasingly, population-based cancer registries are expanding their collection of clinical and treatment information, or are linking to other clinical or treatment databases, to gain more comprehensive data to produce performance indicators for informing health service delivery.3 A critical limitation is that the processes of receiving notifications and compiling data in a population-based cancer registry can be time-consuming, with lag times typically of 18 months or more before complete incidence data are available.4 This can reduce the utility of cancer registry data for timely feedback on health system performance.

Population-level hospital admission data can be a more timely data source for obtaining incident cancer case data and evaluating treatment outcomes. However, hospital data are usually collected for general administrative purposes and may lack the accuracy and completeness required for measuring patterns of cancer care and outcomes. Validation of the data as being ‘fit for purpose’ is therefore necessary.5,6 Another challenge is that the diagnosis, treatment and management of cancer can require multiple hospital admissions. As a result, there is a need to distinguish incident cases from prevalent cases. One approach is to only extract hospital admissions with both a cancer diagnosis code and procedure or treatment code for the initial diagnosis or treatment of the cancer.7 This approach may be needed where an individual cannot be identified across multiple hospital admission records. When multiple hospital admission records for an individual can be identified, the first admission with the cancer diagnosis code recorded can be used to indicate an incident case.

The ascertainment of incident cancers from hospital admission data has good validity for some cancers, such as breast cancer, with sensitivity ranging from 77% to 86% and positive predictive values (PPVs) from 86% to 93%.8–10 Lower validity has been found for other cancers, such as prostate and colorectal cancer, where treatment is more likely to be delivered in an ambulatory setting, or where readmissions for prevalent cancers have not been distinguished from newly diagnosed cancer cases.8,11,12 The accuracy of case ascertainment from hospital data depends on the quality of coding and completeness of the hospital data, as well as on the patterns of hospitalisation for the particular cancer.

Pancreatic cancer has poor survival and is among the five most common causes of cancer death in Australia, the USA and Europe.13–15 Pancreatic cancer surgery has been the subject of system performance improvement programmes in multiple jurisdictions with the aim of improving outcomes for this cancer.16–19 Timely data were required for the development of a service delivery programme for pancreatic surgery for cancer in New South Wales (NSW), Australia. Hospital admission data have the potential to inform aspects of this programme, but there are few studies examining the accuracy of identifying pancreatic cancer cases from hospital admission data. One NSW study found good accuracy for the recording of pancreatic cancer diagnoses in hospital admission data,20 but the study was performed in only one administrative health district and it is not known whether these results are generalisable. The aim of the present study was to determine the validity of administrative hospital admission data for the ascertainment of incident cases and resection use for pancreatic and periampullary cancers by measuring the sensitivity and PPV using NSW cancer registry data as the ‘gold standard’ and to determine the look-back period required to distinguish incident cancer cases from prevalent cancer cases in linked hospital data.

Methods

Study population, design and data sources

The study population comprised NSW residents (aged ≥18 years) diagnosed with pancreatic or periampullary cancer (International Statistical Classification of Diseases and Related Health Problems, 10th Revision, Australian Modification (ICD-10-AM) C17.0, C24–C25). Periampullary cancers were included because their clinical presentation and surgical management are similar to those of pancreatic cancer.21 NSW has 7.5 million residents with the majority (4.8 million) living in the greater Sydney metropolitan area. We used linked de-identified data from the NSW Central Cancer Registry (CCR), the Admitted Patient Data Collection (APDC) and the NSW Admitted Patient, Emergency Department Attendance and Deaths Register (APEDDR).

The CCR is a statutory population-based cancer registry of all incident primary invasive cancer cases (excluding non-melanoma skin cancer) and in situ breast and melanoma neoplasms diagnosed in NSW residents (referred to here as ‘registry data’).22 Pathology laboratories, radiotherapy and medical oncology departments, hospitals, residential aged-care facilities and day procedure centres are required to notify the registry with clinical information about new cases of cancer and demographic details about the person. The CCR is extending data collection to include additional clinical and treatment information and is increasing automation to improve efficiency and, ultimately, timeliness. During this transformation process, however, routine data processes were delayed, which affected the availability of cancer incidence data, with 2009 the latest year available at the time of extraction.

The APDC (referred to here as ‘hospital data’) is a compilation of patient demographic, admission and discharge information, and diagnosis and procedure codes for admissions to all NSW public and private hospitals. Diseases, injuries, procedures and treatments in admitted episodes of care are coded using ICD-10-AM and the Australian Classification of Health Interventions (ACHI). The coding of diagnoses and procedures is carried out according to the Australian Coding Standards for ICD-10-AM and ACHI. This national coding framework has been in place since the late 1990s.23

The APEDDR is a statutory public health and disease register maintained by the NSW Ministry of Health. It contains linked hospital (APDC) data and is a potential source of timelier cancer incidence data in NSW, with data available up to December 2013 at the time of extraction.

Two separate probabilistic linkages (the registry to hospital data; and the hospital data in the APEDDR) were performed by the Centre for Health Record Linkage (http://www.CHeReL.org.au) with an estimated false-positive rate of 3 per 1000 and using a best practice privacy-preserving protocol that separates personal identifiers from the analysis data sets.24 De-identified data sets were provided to the researchers by the data custodians.

Hospital coding of pancreatic and periampullary cancer—comparison to cancer registry data

We extracted admissions with a pancreatic or periampullary cancer diagnosis recorded in a primary or secondary diagnosis field between 2005 and 2009 in the linked hospital admission data of all people with a cancer case on the cancer registry between 1994 and 2009 (figure 1). We obtained from the registry the month and year of cancer diagnosis, sex, age at diagnosis, remoteness of residence,25 tumour histology type, primary site, extent of disease (the furthest extent from notifications within 4 months of diagnosis26) and best basis of diagnosis (the highest level of verification from notifications within 4 months of diagnosis) for pancreatic and periampullary cancer cases diagnosed between 2005 and 2009. People who underwent pancreatectomy (ACHI block 978) were identified from the linked hospital data. Resections in the calendar month before the month of diagnosis (to allow for minor inaccuracies in the month of diagnosis) or at any time after diagnosis were included.
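The resection window described above can be made concrete with a short sketch. This is a minimal Python illustration, not the study's code: it assumes each hospital procedure record carries an ACHI block number and a procedure date, and that the registry supplies only the year and month of diagnosis. The function and parameter names are hypothetical.

```python
from datetime import date

PANCREATECTOMY_BLOCK = 978  # ACHI block for pancreatectomy, as stated in the text


def month_index(year: int, month: int) -> int:
    """Running month count, so 'one calendar month earlier' is simply index - 1."""
    return year * 12 + (month - 1)


def resection_in_window(dx_year: int, dx_month: int, procedure_date: date, achi_block: int) -> bool:
    """Hypothetical helper: True if the procedure is a pancreatectomy performed in the
    calendar month before the registry diagnosis month, or at any time after it."""
    if achi_block != PANCREATECTOMY_BLOCK:
        return False
    procedure_month = month_index(procedure_date.year, procedure_date.month)
    return procedure_month >= month_index(dx_year, dx_month) - 1
```

Working in whole calendar months, rather than days, mirrors the fact that the registry extract records diagnosis only to the month; the one-month allowance also handles the December-to-January boundary without special cases.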

Figure 1

Flow chart of case ascertainment from the linked registry and hospital data.

For the hospital data, we allocated a primary cancer site using diagnosis codes recorded in admissions for pancreatic and periampullary cancer. Inconsistencies in the recording of primary site codes across different admissions for the same person required the development of an algorithm to allocate site. For pancreas primary sites (C25), the most specific site within the pancreas was allocated. For example, if the site was ‘C25.9 pancreas, not otherwise specified’ in one admission and ‘C25.0 head of pancreas’ in another, then the latter more specific site was allocated. When a periampullary cancer (C17.0, C24) was recorded in one admission and a pancreatic primary site in another, the primary site was allocated as periampullary.
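A minimal Python sketch of this site-allocation rule follows, under stated assumptions. It takes all diagnosis codes recorded for one person across admissions as ICD-10-AM strings (for example 'C25.0'). Only C25.9 is treated as non-specific, following the example in the text. When several specific codes compete, it picks the most frequently recorded one, with ties going to the code recorded first. That tie-break is our own hypothetical choice, since the text does not state one.

```python
from collections import Counter
from typing import Iterable, Optional


def is_periampullary(code: str) -> bool:
    # Study definition: C17.0 (duodenum) or any C24 code (other/unspecified biliary tract)
    return code == "C17.0" or code.startswith("C24")


def is_pancreas(code: str) -> bool:
    return code.startswith("C25")


def most_frequent(codes: list) -> str:
    # Counter.most_common orders equal counts by first occurrence (assumed tie-break)
    return Counter(codes).most_common(1)[0][0]


def allocate_site(codes: Iterable[str]) -> Optional[str]:
    """Allocate one primary site per person from codes recorded across all admissions."""
    codes = list(codes)
    periampullary = [c for c in codes if is_periampullary(c)]
    if periampullary:
        # Any periampullary code takes precedence over pancreatic codes
        return most_frequent(periampullary)
    pancreas = [c for c in codes if is_pancreas(c)]
    if not pancreas:
        return None
    specific = [c for c in pancreas if c != "C25.9"]  # prefer a named subsite over 'NOS'
    return most_frequent(specific or pancreas)
```

The unconditional precedence given to periampullary codes is what produces the asymmetric misclassification reported in the Results: a single periampullary code in any admission is enough to label the person periampullary.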

Ascertaining incident cases from linked hospital data

All admissions from 1 July 2000 (the earliest available data) to 31 December 2013 with a pancreatic or periampullary cancer diagnosis recorded in a primary or secondary diagnosis field were extracted from the linked hospital data in the APEDDR. Non-NSW residents and people aged <18 years at their first cancer admission were excluded.
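The extraction and exclusion steps above could be sketched as follows. This is an illustrative Python sketch, not the study's code. The flat `Admission` record is hypothetical. The sketch also assumes that residence and age are taken from each person's first qualifying cancer admission; the text states this explicitly only for age.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List

STUDY_START, STUDY_END = date(2000, 7, 1), date(2013, 12, 31)


@dataclass
class Admission:  # hypothetical record layout
    person_id: str
    admit_date: date
    diagnoses: List[str]  # primary diagnosis first, then secondary diagnoses
    nsw_resident: bool
    age: int


def is_target_code(code: str) -> bool:
    # Pancreatic (C25) or periampullary (C17.0, C24) cancer codes
    return code == "C17.0" or code.startswith(("C24", "C25"))


def extract_cohort(admissions: List[Admission]) -> Dict[str, List[Admission]]:
    """Group qualifying admissions by person, then exclude people who were not NSW
    residents or were aged <18 years at their first qualifying admission."""
    by_person: Dict[str, List[Admission]] = {}
    for a in admissions:
        in_period = STUDY_START <= a.admit_date <= STUDY_END
        if in_period and any(is_target_code(c) for c in a.diagnoses):
            by_person.setdefault(a.person_id, []).append(a)
    cohort: Dict[str, List[Admission]] = {}
    for pid, adms in by_person.items():
        adms.sort(key=lambda a: a.admit_date)
        first = adms[0]
        if first.nsw_resident and first.age >= 18:
            cohort[pid] = adms
    return cohort
```

Scanning every diagnosis field, and not only the primary one, matches the text's inclusion of codes recorded in "a primary or secondary diagnosis field".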

Analysis

Linked cancer registry and hospital data

Sensitivity of case ascertainment from the hospital data was calculated as the proportion of cases on the registry with an admission with a pancreatic or periampullary cancer diagnosis recorded in the hospital data (true positives) for the 2005–2009 period. Sensitivity was calculated by patient demographics (age at diagnosis (grouped by 10-year age groups) and remoteness of residence (grouped as major city, inner regional, outer regional, remote and very remote)) and tumour characteristics (primary site, histology type, extent of disease, best basis of diagnosis, resection status) to assess if sensitivity varied by these characteristics. A PPV was estimated as the proportion of people with a pancreatic or periampullary cancer diagnosis recorded in the hospital data who had an incident case recorded on the registry for the 2005–2009 period.
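The two measures can be written as simple set operations on linked person identifiers. A minimal Python sketch follows. It assumes `registry_cases` holds people with an incident registry case in the period. It also assumes `hospital_cases` holds people with a qualifying hospital admission in the period, restricted, as in this study, to people linked to the registry. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def sensitivity_ppv(registry_cases: Set[str], hospital_cases: Set[str]) -> Tuple[float, float]:
    """Sensitivity = TP / registry cases; PPV = TP / hospital-ascertained cases."""
    true_positives = len(registry_cases & hospital_cases)
    return true_positives / len(registry_cases), true_positives / len(hospital_cases)


def sensitivity_by_group(registry_groups: Dict[str, str], hospital_cases: Set[str]) -> Dict[str, Tuple[int, int]]:
    """registry_groups maps person_id -> subgroup label (e.g. a remoteness category).
    Returns subgroup -> (true positives, registry cases) so each proportion can be reported."""
    counts: Dict[str, list] = defaultdict(lambda: [0, 0])
    for pid, group in registry_groups.items():
        counts[group][1] += 1
        if pid in hospital_cases:
            counts[group][0] += 1
    return {g: (tp, n) for g, (tp, n) in counts.items()}
```

Returning the raw numerator and denominator for each subgroup, rather than only the proportion, keeps the counts available for the "n/N" reporting used in the Results and for computing confidence intervals.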

We examined the diagnoses recorded for false-negative and false-positive cases in the hospital and registry data, respectively, to evaluate misclassification of cancer cases. For false negatives, we examined diagnoses in hospital admissions from the month prior to the registry diagnosis date or any time after in the 2005–2009 period. For false positives, we ascertained cases on the registry diagnosed between 2005 and 2009, except for pancreatic and periampullary cancers for which we ascertained cases on the registry diagnosed between 1994 and 2004. Exact binomial CIs were calculated for sensitivity and PPV (SAS/STAT V.12.1, SAS Institute, Cary, North Carolina, USA).
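The study computed exact binomial CIs in SAS/STAT. For readers working outside SAS, the following is a pure-Python sketch of the Clopper–Pearson interval, on the assumption that "exact binomial" refers to this interval, which is the usual meaning. Each bound is found by bisection on the binomial cumulative distribution, with terms evaluated in log space to avoid overflow for denominators in the thousands. It is not the authors' code.

```python
import math
from typing import List, Tuple


def _log_binom_coeffs(n: int) -> List[float]:
    # log C(n, i) for i = 0..n, via lgamma to avoid huge integers
    return [math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1) for i in range(n + 1)]


def _binom_cdf(k: int, n: int, p: float, logc: List[float]) -> float:
    """P(X <= k) for X ~ Binomial(n, p); each term is built in log space."""
    if k < 0:
        return 0.0
    lp, lq = math.log(p), math.log1p(-p)
    return min(1.0, sum(math.exp(logc[i] + i * lp + (n - i) * lq) for i in range(k + 1)))


def clopper_pearson(x: int, n: int, alpha: float = 0.05, iters: int = 60) -> Tuple[float, float]:
    """Exact (Clopper-Pearson) two-sided CI for the proportion x/n."""
    logc = _log_binom_coeffs(n)
    lower, upper = 0.0, 1.0
    if x > 0:
        # Lower bound: p at which P(X >= x) = alpha/2 (P(X >= x) increases with p)
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2
            if 1.0 - _binom_cdf(x - 1, n, mid, logc) < alpha / 2:
                lo = mid
            else:
                hi = mid
        lower = (lo + hi) / 2
    if x < n:
        # Upper bound: p at which P(X <= x) = alpha/2 (P(X <= x) decreases with p)
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2
            if _binom_cdf(x, n, mid, logc) > alpha / 2:
                lo = mid
            else:
                hi = mid
        upper = (lo + hi) / 2
    return lower, upper


if __name__ == "__main__":
    for label, x, n in [("Sensitivity", 4322, 4939), ("PPV", 4322, 5272)]:
        lo, hi = clopper_pearson(x, n)
        print(f"{label}: {x / n:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

Applied to the PPV counts reported in the Results (4322/5272), the interval should be close to the reported 80.9% to 83.0%. Small differences from SAS output in later decimal places would not be surprising.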

Linked hospital data

We examined the effect of a look-back period to distinguish incident cases from prevalent cases ascertained from the hospital data in the APEDDR by calculating the number of people with a ‘first’ admission for pancreatic or periampullary cancer with look-back periods increasing by increments of 6 months up to a period of 5 years using 2009 and 2013 as index years.
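A minimal Python sketch of the look-back analysis follows, under assumptions the text does not spell out. The input maps each hypothetical person identifier to the dates of all of their linked cancer admissions since 1 July 2000. The look-back is measured backwards from the person's first cancer admission in the index year. A person counts as an incident case only if no earlier cancer admission falls inside that window.

```python
import calendar
from datetime import date
from typing import Dict, List


def shift_months(d: date, months: int) -> date:
    """Move a date back by whole calendar months, clamping the day to the target month's length."""
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)


def incident_count(admission_dates: Dict[str, List[date]], index_year: int, lookback_months: int) -> int:
    """Count people whose first cancer admission in index_year is not preceded by another
    cancer admission within lookback_months (hypothetical helper)."""
    count = 0
    for dates in admission_dates.values():
        in_year = [d for d in dates if d.year == index_year]
        if not in_year:
            continue
        first = min(in_year)
        window_start = shift_months(first, lookback_months)
        if not any(window_start <= d < first for d in dates):
            count += 1
    return count


if __name__ == "__main__":
    # Toy data only; each list would hold all linked cancer admission dates for a person.
    toy = {"a": [date(2013, 3, 1)], "b": [date(2011, 6, 1), date(2013, 2, 1)]}
    for months in range(0, 61, 6):  # 6-month increments up to 5 years, as in the text
        print(months, incident_count(toy, 2013, months))
```

As the look-back grows, the count can only fall or stay the same, because a longer window can only reveal more earlier admissions. The point at which the curve flattens indicates how much history is needed to separate incident from prevalent cases.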

Results

Accuracy of hospital diagnoses

Overall, the sensitivity of ascertaining registry-recorded pancreatic and periampullary cancer cases from the hospital data for the 2005–2009 period was 87.5% (4322/4939) and was highest (97.9%, 715/730) for people who underwent resection (table 1). The lowest sensitivity (9.6%, 18/187) was for cases notified to the registry by death certificate only. Most cases (71.1%, 133/187) with death certificate-only notifications were among people 80 years or older, and this age group had lower sensitivity (83.2%, 1271/1528) for hospital ascertainment than younger age groups. Lower sensitivity was observed for people residing outside of major cities, with inner and outer regional areas having the lowest sensitivity (83.3%). Further exploration of the geographic variation in case ascertainment found that sensitivity ranged from 55.0% to 92.6% across regional and remote administrative health districts (not shown).

Table 1

Sensitivity of the ascertainment of incident cancer cases on the registry from hospital data by demographic and tumour characteristics, 2005–2009

Sensitivity was highest for localised (95.9%, 805/839) cancers, followed by cancers with regional (93.3%, 846/907) and distant (89.5%, 1863/2082) extent of disease and was lowest for cases where the extent was recorded as unknown on the registry (72.7%, 808/1111). Across primary tumour sites, sensitivity was higher for pancreatic (88.6%, 3733/4211) and duodenal cancers (87.1%, 162/186) compared with extrahepatic bile duct and ampullary cancers (78.8%, 427/542), particularly those of the cholangiocarcinoma histology type (67.9%, 131/193).

Of the cases on the registry without an admission with pancreatic or periampullary cancer recorded (false negatives, n=617), 42.9% (n=265) of people did not have a hospital admission in the month prior to diagnosis or any time after in the 2005–2009 period (table 2). The number of false negatives was inflated by including only admissions up to the end of 2009: the sensitivity of case ascertainment for cases diagnosed in 2009 (85.6%, 860/1005) was lower than in the preceding years, since people diagnosed in 2009 whose first admission for cancer was in 2010 would be classified as false negatives. Of false-negative cases with an admission (n=352), diagnoses recorded in hospital admissions included cancer of ill-defined or unspecified site (22.7%, 80/352), intrahepatic bile duct carcinoma (13.9%, 49/352) and non-cancer diagnoses (table 2).

Table 2

Diagnoses (International Statistical Classification of Diseases and Related Health Problems, 10th Revision, Australian Modification, ICD-10-AM) recorded for false negatives and false positives, 2005–2009

A total of 5272 people had an admission with pancreatic or periampullary cancer recorded in the linked hospital to registry data in the 2005–2009 period, giving a PPV of 82.0% (4322/5272; 95% CI 80.9% to 83.0%). Of the false positives (n=950), 50.5% (n=480) had a pancreatic or periampullary cancer case recorded on the registry diagnosed prior to 2005 (table 2). For these cases, the pancreatic or periampullary cancer diagnosis recorded in the hospital admission was correct; however, they were prevalent rather than incident cases in the 2005–2009 period. Of the other false positives, one-fifth (22.8%, 107/470) of people had an intrahepatic bile duct carcinoma and another fifth (21.1%, 99/470) had an ill-defined or unspecified primary site cancer case recorded on the registry. Nine false positives (1.9%, 9/470) had a pancreatectomy recorded. These estimates of false positives in the hospital data only include people with a linked invasive cancer case in the registry data; therefore, the number of false positives may be underestimated.

We compared the classification of the primary site allocated from the hospital data with the site recorded on the registry (table 3). One per cent (41/3945) of cases allocated as a pancreatic cancer using hospital diagnoses were true periampullary cancer cases, whereas 12.5% (106/847) of cases allocated as periampullary using hospital diagnoses were true pancreatic cancer cases. The misclassification of periampullary cancers was greater since any person with an admission with a periampullary cancer diagnosis recorded was allocated as periampullary regardless of whether they also had a pancreatic cancer diagnosis recorded in another admission.

Table 3

Primary site (International Statistical Classification of Diseases and Related Health Problems, 10th Revision, Australian Modification, ICD-10-AM) of registry and hospital cases, 2005–2009

Ascertaining incident cases from hospital data

Without any look-back period, around one-quarter of cases ascertained from the linked hospital data (APEDDR) were misclassified as incident cases (figure 2). A 2-year look-back period was sufficient to distinguish the majority (98%) of incident cases from prevalent cases, and this was similar for pancreatic and periampullary cancers combined and for pancreatic cancers alone. For example, 1.8% (21/1193) of people classified as having an ‘incident’ pancreatic or periampullary cancer case in 2013, using a 2-year look-back period, were identified as prevalent cases when a 5-year look-back period was used. The number of cases ascertained from the APEDDR for the 2005–2009 period, using all available data back to July 2000 as a look-back period, was 4970, which is within 1% of the number of cases recorded on the registry for this period (n=4939).

Figure 2

Effect of a look-back period on the estimates of incident cases from hospital admission data, 2009 and 2013 index years.

Discussion

We found sensitivity for the ascertainment of pancreatic and periampullary cancer cases from the hospital data of 87.5% for the 2005–2009 period. The accuracy of hospital coding varied by tumour primary site and histology, with higher sensitivity of case ascertainment for pancreatic (88.6%) and duodenal cancers (87.1%) compared with extrahepatic bile duct and ampullary cancers (78.8%) and with lower sensitivity for extrahepatic cholangiocarcinomas (67.9%). Misclassification of pancreatic and periampullary cancers in the hospital data was often to closely related sites, for example, intrahepatic bile duct carcinoma, or to less specific sites such as cancers of ill-defined or unspecified primary sites. Whereas hospital coders might only have information from a particular admission available, coders at a cancer registry often have multiple sources of information from diagnostic procedures and treatment which can enable more accurate coding of tumour characteristics.

Several aspects of the coverage of the hospital data affected the ascertainment of cancer cases. In particular, only admissions in NSW hospitals were available. Variation in sensitivity across regional and remote administrative health areas most likely reflects patterns of interstate patient outflows since cases will not be ascertained from NSW hospital data for people who have all of their inpatient treatment outside of NSW. The population-level coverage of the hospital data therefore needs to be assessed, and geographic areas with insufficient coverage are unsuitable for cancer case ascertainment from hospital data. Conversely, the NSW cancer registry achieves complete population coverage since NSW residents treated in other jurisdictions are notified to the NSW cancer registry by the statutory population-based cancer registry in that jurisdiction, with Australia having full population coverage by statutory cancer registries. Another factor affecting case ascertainment is the inclusion in the registry of cases notified by death certificate only, where the sensitivity of ascertainment from hospital data was low (9.6%). The accuracy of diagnostic information may be lower for cases when a death certificate is the only source of information, but nevertheless they are recorded by population-based cancer registries to capture comprehensive incidence data for a population.26

Pancreatic and periampullary cancers are good candidates for ascertainment from hospital admission data since diagnosis, treatment or symptom management is likely to require hospitalisation during the course of the disease. Sensitivity was highest (97.9%) for the minority of people (15%) who underwent curative resection. Hospitalisation to relieve biliary or gastric obstruction by stenting or bypass surgery or to manage pain and nutrition is commonly required in the management of these cancers.27,28 Chemotherapy, which is indicated for people with unresectable disease,28 is mostly delivered in an outpatient setting in NSW and therefore is largely not captured in hospital admission data. This may lead to underascertainment of cases from hospital admission data when chemotherapy is the only therapy required for management of the cancer. Multiple hospitalisations for the management of these cancers mean that prevalent cases need to be distinguished from incident cases in the hospital data. The PPV estimated in this study (82.0%) was lowered by this pattern of multiple hospitalisations. On examination, half of the false positives in the hospital data were true pancreatic or periampullary cancer cases but were people with a prevalent rather than an incident case for the 2005–2009 period. The PPV could be improved by the use of a minimum 2-year look-back period to identify a person's first admission for cancer, which we found was sufficient to distinguish the majority of incident cases from prevalent cases.

We were able to calculate a PPV; however, it may be an overestimate since we could only identify false positives among people with a hospital admission for pancreatic or periampullary cancer who had a cancer case recorded on the NSW cancer registry between 1994 and 2009. Therefore, false positives in the hospital data without any cancer registry-recorded case were not ascertained, which is a limitation of our study. We do not expect the underestimation of false positives to be substantial, however, since the number of incident cases ascertained from the APEDDR was similar to the number of cases recorded on the registry for the 2005–2009 period.

Few studies have examined the validity of ascertaining pancreatic cancer cases from hospital data, with none examining periampullary cancers. Our study compares favourably with a study in the USA which reported sensitivity for pancreas cancer of up to 86% using inpatient Medicare claims data.29 A NSW study reported sensitivity for pancreas cancer of 94.7% and a PPV of 80.9% for admission data from public hospitals in one administrative health district.20 A strength of our study was that it used population-based registry data as the ‘gold standard’ and included both public and private hospital data. Without coverage of public and private hospitals, we expect there would have been substantial underascertainment of cases, as was found by another study.11 Our study is relevant to jurisdictions with administrative hospital data with population coverage and standardised coding in place. Incident breast, colorectal and lung cancer case data obtained from administrative hospital records have been found to be of sufficient quality for informing health services research in other jurisdictions.8,9,12

Internationally, pancreatic cancer has been the subject of health system performance improvement programmes16–19 since studies have identified underuse of curative resection30,31 and variation across hospitals in morbidity, mortality and survival outcomes following surgery, particularly in relation to hospital volume.32 Programmes have generally established recommended minimum hospital volumes with the aim of increasing access to expert multidisciplinary care and improving patient outcomes. Measuring whether these minimum volumes are met, changes in the percentage of people receiving curative surgery and changes in outcomes are key components of monitoring the implementation of these programmes.17–19 Our study demonstrates that NSW hospital data are of sufficient quality to inform aspects of the development and monitoring of a service improvement programme for pancreatic surgery for cancer. For example, the data are adequate for measurement of hospital volume of pancreatic cancer surgery, overall postsurgical outcomes (when linked to death registry data) and resection rates for population groups with good coverage in the hospital data. Hospital admission data, however, lack a date of cancer diagnosis and detailed clinical information, such as tumour size and vascular involvement, which are required to measure postdiagnosis survival and perform risk adjustment of outcomes with minimum residual confounding for case complexity.

Conclusion

Pancreatic and periampullary cancer cases can be ascertained from hospital admission data where coding standards are applied and there is population coverage. The pattern of hospitalisation for these cancers means that linked hospital data, in which multiple admissions for the same person can be identified, and a sufficient look-back period are required to distinguish incident cases from prevalent cases. Our study indicates that hospital-derived case and resection data have sufficient validity to inform aspects of health system performance planning and monitoring for pancreatic cancer. However, case ascertainment differs across population subgroups and the clinical variables are limited. Cancer registry data with population-level coverage and clinical information are required for some health system performance measures.

Acknowledgments

The authors would like to acknowledge the New South Wales (NSW) Ministry of Health for access to the population health data and the Centre for Health Record Linkage for linking the data sets. The Admitted Patient, Emergency Department Attendance and Deaths Register was accessed via Secure Analytics for Population Health Research and Intelligence.

References


Footnotes

  • Contributors DC and SA originated the idea for the study. NC developed the study design, conducted the data analysis and drafted the manuscript. RW and DR critically reviewed the study design. All authors commented critically on the analysis and drafts of the manuscript.

  • Funding This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Ethics approval The linkage of the NSW Central Cancer Registry and NSW Admitted Patient Data Collection was performed with ethical approval from the NSW Population and Health Services Research Ethics Committee (HREC/12/CIPHS/58).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data are available.
