Objective To define the accuracy of administrative datasets to identify primary diagnoses of breast cancer based on the International Classification of Diseases (ICD) 9th or 10th revision codes.
Design Systematic review.
Data sources MEDLINE, EMBASE, Web of Science and the Cochrane Library (April 2017).
Eligibility criteria The inclusion criteria were: (a) the presence of a reference standard; (b) the presence of at least one accuracy test measure (eg, sensitivity) and (c) the use of an administrative database.
Data extraction Eligible studies were selected and data extracted independently by two reviewers; quality was assessed using the Standards for Reporting of Diagnostic accuracy criteria.
Data analysis Extracted data were synthesised using a narrative approach.
Results From 2929 records screened, 21 studies were included (data collection period between 1977 and 2011). Eighteen studies evaluated ICD-9 codes (11 of which assessed both invasive breast cancer (code 174.x) and carcinoma in situ (ICD-9 233.0)); three studies evaluated invasive breast cancer-related ICD-10 codes. All studies except one considered incident cases.
The initial algorithm results were: sensitivity ≥80% in 11 of 17 studies (range 57%–99%); positive predictive value ≥83% in 14 of 19 studies (range 15%–98%) and specificity ≥98% in 8 studies. The combination of the breast cancer diagnosis with surgical procedures, chemoradiation or radiation therapy, outpatient data or physician claim may enhance the accuracy of the algorithms in some but not all circumstances. Accuracy was low for algorithms based only on outpatient or physician’s data and for breast cancer diagnoses in secondary position.
Conclusion Based on the retrieved evidence, administrative databases can be employed to identify primary breast cancer. The best algorithm suggested is ICD-9 or ICD-10 codes located in primary position.
Trial registration number CRD42015026881.
- breast cancer
- administrative database
- sensitivity and specificity
- systematic review
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
Based on a prepublished protocol, this is the first review that systematically addressed the accuracy of administrative databases in identifying subjects with breast cancer.
We performed a comprehensive search of electronic databases, complemented by a reference check of relevant articles, and we evaluated the quality of reporting of included studies using the Standards for Reporting of Diagnostic accuracy (STARD) checklist.
We considered only papers written in English and this might have introduced a language bias.
The knowledge and experience of the International Classification of Diseases (ICD)-9/ICD-10 coders could have influenced the quality of breast cancer case definition in each study, and consequently the results presented in our review could be biased by this factor.
Generalisability of validated administrative databases is limited to the context in which they are generated.
As breast cancer is the most common cancer and the leading cause of cancer death in women,3 knowledge of its epidemiology and the ability to monitor related outcomes over time is important for health planning services. Administrative healthcare databases are increasingly being used in oncology for epidemiological evaluation,4 population outcome research,5 drug utilisation reviews,6–8 evaluation of health service delivery and quality9 10 as well as health policy development.11–13 Generally, these databases gather longitudinal information concerning health resource utilisation regarding hospitalisations, outpatient care and, often, drug prescriptions and vital statistics.14 In other words, these databases provide a readily available source of ‘real-world’ data on a large population of unselected patients allowing the performance of less expensive and more representative assessment of disease surveillance and outcome research compared with randomised trials.15 16
By definition, administrative healthcare databases contain data that are routinely and passively collected without an a priori research question, as they are usually established for billing or, in general, for administrative purposes, and not for research uses. Hence, the diagnostic codes used to identify, for example, cancers, must be validated according to an accepted ‘reference standard’ diagnosis.17 In validation studies of administrative databases, the reference standard usually used is the clinical chart or cancer registry.18
The current International Classification of Diseases, 9th revision, (ICD-9) codes are 233.0 for breast carcinoma in situ and 174.0–174.9 for invasive breast cancer, whereas the ICD-10 codes are D05.0-D05.9 and C50.0-C50.9, respectively. These codes help to identify subjects that have breast cancer within an administrative healthcare database. Since the clinical diagnosis of breast cancer is based on a combination of clinical and/or instrumental examinations and a pathological assessment,19 these codes are limited in confirming whether a specific subject within the databases truly has the disease of interest. As a result, researchers have proposed a number of different claim-based algorithms for case identification of breast cancers, such as a combination of healthcare claims data,20 the use of chemotherapy21 and the number of medical claims on separate dates.11 In addition, since patients with metastatic cancer have different prognoses and typically different treatment patterns to those with earlier-stage malignancies, researchers suggest using algorithms to identify patients with metastatic cancer.11 22
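A minimal sketch of how a claim-based case definition of this kind might be expressed, assuming a simplified discharge record. The code ranges are those stated above; the `Discharge` record, the `has_breast_surgery` flag and the function names are hypothetical and are not taken from any of the cited algorithms.

```python
import re
from dataclasses import dataclass
from typing import List

# Code ranges as stated in the text; the dot is treated as optional because
# some administrative datasets store codes without it (an assumption).
ICD9_INVASIVE = re.compile(r"174\.?\d")   # 174.0-174.9
ICD9_IN_SITU = re.compile(r"233\.?0")     # 233.0
ICD10_INVASIVE = re.compile(r"C50\.?\d")  # C50.0-C50.9
ICD10_IN_SITU = re.compile(r"D05\.?\d")   # D05.0-D05.9


@dataclass
class Discharge:
    """Hypothetical discharge record: diagnoses[0] is the primary position."""
    diagnoses: List[str]
    has_breast_surgery: bool = False  # hypothetical flag derived from procedure codes


def is_breast_cancer_code(code: str, include_in_situ: bool = True) -> bool:
    """True if the code falls in one of the breast cancer ranges above."""
    code = code.strip().upper()
    patterns = [ICD9_INVASIVE, ICD10_INVASIVE]
    if include_in_situ:
        patterns += [ICD9_IN_SITU, ICD10_IN_SITU]
    return any(p.fullmatch(code) for p in patterns)


def flag_case(record: Discharge, primary_only: bool = True,
              require_surgery: bool = False) -> bool:
    """Flag a record as a breast cancer case under the chosen definition."""
    codes = record.diagnoses[:1] if primary_only else record.diagnoses
    has_diagnosis = any(is_breast_cancer_code(c) for c in codes)
    return has_diagnosis and (record.has_breast_surgery or not require_surgery)
```

Tightening the definition (for example, `require_surgery=True`) would be expected to trade sensitivity for PPV, which is the kind of trade-off the validation studies quantify.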
To our knowledge, data on the validity of breast cancer diagnosis codes have not been synthesised in the medical literature. Our objective was to determine the best algorithms with which to identify breast cancer cases using administrative databases based on a comprehensive systematic search of primary studies that validated ICD-9 or ICD-10 codes related to breast cancer. The present work has been conceived within a project of validating three large administrative healthcare databases in Italy concerning ICD-9-CM codes for breast, colorectal and lung cancers.23–25 For our purposes, it was important to identify all available case definitions or algorithms that best identify subjects with the cancer diseases of interest as outlined in our protocol.26
This study is part of different projects supported by national and local funding with the objectives of assessing case definitions of diseases as well as validating ICD-9 codes for cancer23 26 and other diseases.27–29
As outlined in the protocol,26 the target population consisted of patients with primary diagnosis of breast cancer, the index test was represented by administrative data algorithms related to breast cancer, the reference standard was represented by medical charts, validated electronic health records or cancer registries.
Comprehensive searches of MEDLINE, EMBASE, Web of Science and the Cochrane Library from their inception to April 2017 were performed to identify published peer-reviewed literature. We developed a search strategy based on the combination of: (a) keywords and Medical Subject Heading (MeSH) terms to identify records concerning breast cancer; (b) terms to identify studies likely to contain validity or accuracy measures and (c) a search strategy designed to capture studies that used healthcare administrative databases based on the combination of terms used by Benchimol et al 30 and the Mini-Sentinel’s program.31 32 The developed search strategy is reported in the online supplementary file 1. To retrieve additional articles, the authors searched relevant reference lists of key articles. Titles and abstracts were screened for eligibility by two independent reviewers. Discrepancies were resolved by discussion.
This systematic review was prepared according to the Preferred Reporting Items for Systematic reviews and Meta-Analysis Protocols (PRISMA-P) 2015 Statement33 and the results were presented following the PRISMA flow diagram (figure 1).34 A protocol of this review has been published in BMJ Open,26 and an outline has been registered in the PROSPERO International Prospective Register of systematic reviews with registration number CRD42015026881 (http://www.crd.york.ac.uk/PROSPERO).
Full texts of eligible peer-reviewed articles without publication date restriction, published in English that used administrative data for the ICD-9 or ICD-10 codes related to breast cancer diagnoses were obtained. For each study, the following inclusion criteria were applied: (a) the presence of a reference standard (clinical chart, cancer registry or electronic health records), together with the presence of any case definition or algorithm for breast cancer; (b) the presence of at least one test measure (eg, sensitivity, positive predictive value (PPV), etc); (c) the data source was from an administrative database (ie, a database in which data are routinely and passively collected without an a priori research question) and (d) the study database was from a representative sample of the general population.
We aimed to focus on primary diagnosis of breast cancers, hence studies that considered algorithms to identify cancer history, cancer progression or recurrence were not evaluated.
In addition, studies that considered index test databases that were not truly administrative (eg, cancer registries, epidemiology surveillance systems, etc) were excluded. However, studies that used electronic health records to validate breast cancer were also included.35 36
After screening titles and abstracts, we obtained full texts of eligible articles to determine whether they met the inclusion and exclusion criteria. We conducted data abstraction using standardised data collection forms that were tested on a sample of three eligible articles. Two review authors working independently and in duplicate were involved in titles and abstracts screening, full-texts screening and data abstraction (FC, MO, AG, VS). Discrepancies were resolved by consensus, and where necessary with the involvement of a third review author (IA). Calibration exercises were performed at each level of the process.
Data extraction included the following information: the details of the included study (title, year and journal of publication, country of origin and sources of funding); the type of disease (invasive, in situ or both); the target population from which the administrative data were collected; the type of administrative database used (eg, hospitalisation discharge data or outpatient records such as physician billing claims); the ICD-9 or ICD-10 codes used or the administrative data algorithms tested (including Current Procedural Terminology codes, prescription fills, etc); the position of the ICD codes in the discharge abstract database (ICD codes in primary position indicate the principal diagnosis, that is, the condition identified at the end of the admission, which is the main cause of the need for treatment or diagnostic investigations; ICD codes in secondary positions refer to secondary diagnoses, that is, conditions that coexist at the time of the admission or which develop after that time and which influence the treatment received and/or the length of hospital stay); the modality of development of the algorithm (eg, using Classification and Regression Trees, logistic regression, expert opinion, etc); external validation; use of training and testing cohorts; the reference standard used to determine the validity of the diagnostic codes (eg, medical chart review, patient self-reports, cancer registry, etc); the characteristic of the test used to determine the validity of the diagnostic code or algorithm (eg, sensitivity, specificity, PPVs and negative predictive values (NPVs), area under the receiver operating characteristic curve, likelihood ratios and kappa statistics).
The design and methods of the included primary studies were assessed using a checklist developed by Benchimol et al,30 based on the criteria published by the Standards for Reporting of Diagnostic accuracy (STARD) initiative for the accurate reporting of diagnostic accuracy studies.37 The checklist is provided in online supplementary file 2. The presence of potential biases within the studies was reported descriptively.
For each algorithm, we abstracted the performance statistics provided in the included studies, including sensitivity, specificity, PPV and NPV. Where necessary, we calculated validation statistics together with their 95% CIs, provided that raw numbers for cases and controls were reported.
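The calculation of validation statistics from raw counts can be illustrated with a short sketch. It assumes a standard 2×2 table (true positives, false positives, false negatives, true negatives) and uses the Wilson score interval for the 95% CI; the interval method is an assumption made for illustration, since the method used in the review is not specified here, and the function names and counts are hypothetical.

```python
from math import sqrt


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a proportion (about 95% when z=1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def accuracy_measures(tp, fp, fn, tn):
    """Return each measure as (estimate, (lower, upper))."""
    cells = {
        "sensitivity": (tp, tp + fn),  # cases correctly flagged
        "specificity": (tn, tn + fp),  # non-cases correctly not flagged
        "ppv": (tp, tp + fp),          # flagged records that are true cases
        "npv": (tn, tn + fn),          # unflagged records that are true non-cases
    }
    return {name: (k / n, wilson_ci(k, n)) for name, (k, n) in cells.items()}


# Hypothetical counts, for illustration only (not from any included study)
results = accuracy_measures(tp=90, fp=10, fn=20, tn=880)
for name, (estimate, (low, high)) in results.items():
    print(f"{name}: {estimate:.3f} (95% CI {low:.3f} to {high:.3f})")
```

When a study reports only registry-confirmed cases among flagged records, only the PPV cell can be filled, which is consistent with PPV being the most frequently reported measure.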
Patient and public involvement
Patients and the public were not directly involved. This was a retrospective study based on the consultation of electronic medical literature.
After removing duplicate records identified through MEDLINE, EMBASE, Web of Science and The Cochrane Library, 2929 citations were screened by title and abstract. Overall, we assessed 41 full-text articles for eligibility, of which 17 (references 10 and 38–53) were included in the final evaluation. In addition, a reference check of pertinent articles identified five potentially relevant studies, of which four were included in the final analysis54–57 (figure 1). The list of excluded studies, together with the reasons for their exclusion, is reported in the online supplementary file 3.
Supplementary file 3
The included studies were published between 1992 and 2015 and collected data between 1977 and 2011. Fifteen studies were performed in the USA,39 41 42 44–47 51–53 55–59 two were conducted in Italy,10 38 two in France,40 54 one in Japan49 and one in Australia.43 Seventeen studies used Cancer Registry data as the reference standard,38–40 42 43 45–47 49–57 60 four studies used medical chart review.45 50 51 57
Eighteen studies10 38 39 41 42 44–48 50–57 evaluated ICD-9 codes, and three studies40 43 49 evaluated ICD-10 codes. Of the studies that evaluated ICD-9 codes, 11 reported the evaluation of the ICD-9 codes related to invasive breast cancer (code 174.x) and carcinoma in situ (ICD-9 233.0)10 38 42 44–47 50–53; three evaluated only invasive cancer (ICD-9 174.x)39 41 55; three studies did not specify the number of the ICD-9 codes evaluated.54 56 57 Three studies evaluated ICD-10 codes for invasive breast cancer without evaluating carcinoma in situ codes.40 43 49
In terms of representativeness or generalisability, the studies varied greatly. Eight studies considered all women beneficiaries of the Medicare programme (USA), aged 65 years or above and residing in specific areas;39 42 46 51–53 55 56 nine studies considered all women of any age38 49 or aged ≥15 years54 or ≥20 years10 38 40 45 50 57 in specific areas; two studies considered all women aged ≥40 years44 or ≥45 years43 residing in specific areas; and three studies randomly sampled residents at a national level41 or at a regional level.47 59
Basic characteristics of these studies are displayed in table 1.
Validity of breast cancer data
Accuracy results by initial algorithms
All the studies considered new (incident) breast cancer cases except Fisher et al.41 Nineteen studies presented the initial accuracy results based on breast cancer diagnosis only; in 18 studies the diagnosis was in primary position,10 38–50 52–54 56 in 1 study in any position55 and in 2 studies the position was unclear,51 57 whereas 2 studies evaluated breast cancer diagnosis with surgical procedures.10 38
Sensitivity was reported by 17 studies, and was at least 80% in 65% (n=11) of them10 41 43 44 46 47 49 53–55 57 (range 57%–99%). PPV, obtained from 19 studies, was ≥83% in the majority (n=14) of them (range 15%–98%). Specificity was higher than 98% in all eight studies that provided sufficient data to permit calculation.10 40 43 47 49 52 54 56 Similarly, the NPV was ≥99% in the five studies for which it could be calculated.40 43 52 54 56 Table 2 displays the results of the algorithm with which the studies presented their initial data, stratified by ICD codes.
Accuracy results by combinations of diagnosis and surgical procedures
Twelve studies reported validation results using algorithms with different combinations.10 38 40 43–45 49–52 54 56 All algorithms, except in two studies,10 38 started from basic breast cancer codes and progressively added surgical procedures, secondary diagnosis, chemotherapy and/or radiotherapy. The effect of adding one or more of these items, relative to the accuracy obtained with the diagnosis code alone, varied between studies. The addition of excision to the incident diagnosis of invasive cancer codes did not add any value to the PPV in the studies by Solin et al 50 (88% vs 89%), Leung et al 45 (83% vs 84%) and Kemp et al.43 Conversely, in the study by Solin et al,51 the first algorithm showed no improvement when excision was added to the new diagnosis (83% vs 84%), whereas with the best algorithm set the PPV rose from 84% to 92% when excision was added to the basic breast diagnosis code. In the study by Koroukian et al,44 the addition of mastectomy or mastectomy/lumpectomy significantly raised the PPV from 15% to 84% and 87%, respectively. In the study by Kemp et al,43 the sensitivity and PPV values of breast cancer diagnosis remained substantially unchanged with the addition of mastectomy, lumpectomy or both (PPV 86% vs 89%). Setoguchi et al 56 proposed four algorithms: the algorithm based on one or more diagnoses of breast cancer generated a sensitivity of 87% and a PPV of 50%; the addition of any surgical procedure lowered the sensitivity to 46% but enhanced the PPV to 82%. In the study by Sato et al,49 the addition of any code related to breast cancer, marker tests, surgical procedures, chemotherapy treatment or radiation therapy did not affect the sensitivity (which was high, at 98%) but raised the PPV from 66% to 83%. Similarly, in the study by Ganry et al,54 the addition of any breast or lymph nodal surgical procedures enhanced the PPV from 91% to 98%.
Table 3 shows the sensitivities and PPVs with the respective CIs of the studies that in addition to accuracy measures of breast cancer diagnosis also reported accuracy data of surgical procedures.
Accuracy results by combinations of diagnosis and surgical procedures followed by chemoradiation or radiation therapy
Six studies added chemotherapy or radiation therapy procedures to their algorithm.44 45 49–51 54 Compared with the initial algorithm with only the diagnosis of breast cancer, the PPV value increased in all instances to values higher than 94% in four studies.45 50 51 54 However, in all studies except one the algorithms contained surgical procedures. Table 4 displays the sensitivity and PPV values for the studies that combined chemoradiation or radiation therapy procedures with diagnosis of breast cancer.
Accuracy results based on the position of the diagnosis
Three studies provided results based on the position of the diagnosis.41 52 54 Fisher et al 41 provided sensitivity and PPV for breast cancer diagnosis in any position, and these were similar to the values for the primary position (sensitivity 97% and PPV 84% in any position; sensitivity 96% and PPV 88% in the primary position). Accuracy results for breast cancer diagnosis in secondary position were provided by two studies, Ganry et al 54 and Warren et al,52 and the estimates were lower than those for diagnosis in primary position. PPVs were 26% and 65% in secondary position against 91% and 91% in primary position, respectively (see online supplementary file 4, eTable 1).
Accuracy results based on outpatient or physician’s data
Only two studies assessed the accuracy of breast cancer diagnosis codes based on outpatient or physician’s records42 52; the other 19 studies considered inpatient data alone or in combination with other types of data (short procedure unit stays, professional services, prescription medicines claims, etc). For the physician’s mammography and laboratory data, sensitivity was 87% in both cases but the corresponding PPV values were very low (0% and 15%, respectively).42 The remaining cases concerning biopsy, surgical procedures, nodal dissection in the physician records or outpatient records showed very low PPVs (see online supplementary file 4, eTable 2).
Stratified analysis by administrative data source, type of ICD code, country of origin and publication year
Accuracy data stratified by setting of diagnosis showed that outpatient accuracy data were much lower than diagnosis in primary position, although the outpatient accuracy data were reported by only one study.42 In terms of codes, both ICD-9 and ICD-10 showed significant variation in both sensitivity and PPV. In terms of country of origin, most of the studies were conducted in the USA, where the accuracy results varied widely. The studies conducted in Italy10 38 and France40 54 showed similar ranges of sensitivities, whereas the studies conducted in Italy performed better in terms of PPVs than those from any other country. Accuracy results of the initial algorithm did not change over time and the range of sensitivities remained similar between the studies published before 2001 and those published after 2000. PPV estimates also remained similar, provided that one outlier, the study by Koroukian et al,44 is excluded. Table 5 shows ranges of sensitivities and PPVs stratified by administrative data source, type of ICD code, country of origin and publication year.
Quality of the studies
All the studies explicitly reported their intention to evaluate the accuracy of the administrative database and described validation cohort, age, disease and location of participants. All the studies reported inclusion criteria and only seven45 49–51 55 57 59 (33%) did not report exclusion criteria, and three52 53 56 (14%) did not report any description regarding the patient sampling method. In terms of the methodology used, all the studies described the methods used to calculate diagnostic accuracy, none of the studies described number, training and expertise of persons reading reference standards; none of the studies reported the consistency and the number of persons involved in reading reference standards and, of the studies that used the medical chart review as the reference standard only one41 reported the blinding of the interpreters. In terms of statistical methods, all the studies except one55 described adequately the statistics used to obtain accuracy. None of the studies reported at least four estimates of diagnostic accuracy. The most common statistics used to estimate diagnostic accuracy were sensitivity in 17 studies10 38–41 43 44 46 47 49 52–57 59 (81%), PPV in 19 studies10 38 40–47 49–56 59 (90%) and specificity in 9 studies10 40 43 47 49 52–54 56 (43%); 10 studies10 39 44 46 47 49 50 52 53 57 (48%) reported accuracy results for subgroups; and only 6 studies10 38 41 46 47 49 (29%) reported CIs (see online supplementary file 2).
Summary of findings
To our knowledge, this is the first review that systematically addressed the validity of algorithms related to breast cancer diseases in administrative databases. Using several medical literature databases, we have identified a significant number of validation studies related to breast cancer disease. Because of the heterogeneity of the results due to the different settings of each included study, we decided to present them in a descriptive manner rather than aggregate them by means of a meta-analysis. Findings from this review suggest that algorithms based on ICD-9 or ICD-10 codes related to breast cancer are accurate in identifying subjects with invasive breast cancer when the diagnosis is in primary position and the algorithm is based on incident cases. Sixty-seven per cent of the studies reported sensitivities or PPVs higher than 80% for inpatient breast cancer code at the initial presentation. The addition of other fields such as surgical procedures, chemotherapy or radiation therapy, outpatient data and physician claims may improve the accuracy results, but the effect depends on the accuracy measure used. Breast cancer codes in secondary position yielded lower accuracy values.
Quality of primary studies and heterogeneity
The overall quality of the studies included in our review was judged to be quite good. The only concerns relate to the items of the modified STARD checklist on the description of data collection (who identified the patients, who collected data and whether the authors used an a priori data collection form), which we were not able to find in the text of the primary studies. However, we do not think that this could significantly affect the results of our review, as it was common to all the studies and it could be related in general to the peculiarity of the studies validating administrative databases, which are substantially different from typical diagnostic accuracy studies.
Regarding reference standard, most of the studies used cancer registries and the confirmation of the cancer disease was based on the presence of the corresponding code within the registry. When medical charts were used as a reference standard, in three studies the diagnosis of carcinoma of the breast was confirmed when there was evidence of a histological documentation of ‘invasive carcinoma of the breast’, ‘intraductal carcinoma (ductal carcinoma in situ) of the breast’ or ‘Paget’s disease of the breast’,45 50 51 whereas Fisher et al 41 reported that ‘accredited records technicians, blinded to the coding in the original records, reviewed the medical records, selected the supportable diagnoses and procedures and translated them into ICD-9-CM’.
The included studies differed in the geographical area, temporal period, healthcare system, reference standard considered and other factors, and this heterogeneity could explain the variability of the diagnostic accuracy measures.
Prioritising accuracy measures
In assessing the validity of healthcare databases, researchers will need to weigh the relative importance of epidemiological measures and prioritise the accuracy measure that is most important to a particular study. As pointed out by Chubak et al,35 for example, contrary to PPV estimates, sensitivity and specificity do not depend on the disease prevalence but can vary across populations. Prioritising sensitivity over specificity is relevant in a scenario where it is important to identify all cases with the characteristic of interest rather than only those with severe disease characteristics. PPV is generally preferred when one wants to ensure that only the subjects who truly have the condition of interest are included in the study. In our assessment, 19 studies provided PPV measures, 15 of which also measured sensitivity.10 38 40–44 46–49 53–55 Most of these performed an accuracy assessment with the intent to define the incidence of breast or other cancer diseases. Several authors included different variables in their algorithm, including surgical, chemoradiation or radiation treatment and outpatient care such as physician’s claims, in order to obtain algorithms with a balanced value between PPV and sensitivity. While three studies obtained similar values between sensitivity and PPV,40 43 55 in six studies10 38 42 46 48 54 the initial PPVs were higher than sensitivity and most of these studies had the priority of estimating the incidence of breast cancer disease. Sato et al 49 attributed the gain of optimal sensitivity (90%) and PPV (99%) to the use of both inpatient and outpatient claims data. The authors argue that they might have obtained high sensitivity but low PPV if they had used the outpatient database alone. In two Italian validation studies10 38 of regional administrative databases, the combination of hospital diagnosis together with surgical procedures accurately identified the majority of cases in the cancer registry (PPV 90% and 91%, respectively).
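The dependence of PPV on disease prevalence noted above follows from Bayes' theorem and can be made concrete with a small sketch. The sensitivity, specificity and prevalence values below are hypothetical and chosen only for illustration; they are not estimates from the included studies.

```python
def ppv(sensitivity, specificity, prevalence):
    """PPV from sensitivity, specificity and prevalence (Bayes' theorem)."""
    true_pos = sensitivity * prevalence              # P(flagged and diseased)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(flagged and not diseased)
    return true_pos / (true_pos + false_pos)


# Same hypothetical algorithm (sensitivity 90%, specificity 99%)
# applied to populations with different prevalence of breast cancer
for prevalence in (0.01, 0.05, 0.10):
    print(f"prevalence {prevalence:.0%}: PPV {ppv(0.90, 0.99, prevalence):.1%}")
```

With these inputs the PPV is roughly 48% at 1% prevalence and roughly 91% at 10% prevalence, even though sensitivity and specificity are held fixed. This is one reason a PPV validated in one population may not carry over to another.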
In other circumstances, the aims of the studies were substantially methodological. Freeman et al,42 who aimed at obtaining the optimal combination of predictors, used a logistic regression model using 1992 data from the linked SEER registries. The authors were able to obtain a high sensitivity (90%) with the use of three kinds of claims data (inpatients, outpatients and physician services), but with a loss of PPV (70%) that was probably due to limitations in distinguishing recurrent and secondary cancers. Cooper et al 39 only investigated the sensitivity of diagnostic and procedural coding for case ascertainment of breast and five other cancer diseases. The authors used two sets of analyses: the first set was the inpatient Medicare claims, which include diagnoses and procedures ICD-9 codes; the second set was the part B claims, which include physician and outpatient data. The first set considered the sensitivity of inpatient first position (68.3%) and the increase in sensitivity provided by including additional fields (other diagnosis (76.1%), surgical (79.1%), part B first position (91%), part B other diagnosis (93.1%), part B surgical code (93.6%)). In the second set of analyses, they considered the sensitivity of part B first position (66%) and included the following additional fields: part B other diagnosis (76.9%), part B surgical (80.9%), inpatient first position (90.9%), inpatient other diagnosis (93%) and inpatient surgical (93.6%).
Conversely, the aim of Nattinger et al 47 was to maintain a high specificity and they proposed a four-step algorithm to identify women with surgically treated incident breast cancer that was applied in both a validation set and a training set. For their objective, they considered cases treated in the ambulatory surgical setting as well as prevalent cases. The authors were able to obtain high specificity (99.9%) with a decrease in sensitivity from 85% to 80% but with a good PPV performance (range 89%–93%). As recognised by the authors, the algorithm may have little usefulness in determining the incidence of breast cancer, but it may be much more relevant for outcome classification.35
Strengths and limitations
Our strengths include the use of comprehensive electronic databases with reference checks of relevant reviews of articles, the use of STARD criteria to assess the quality of reporting of primary studies, transparency based on prepublication of a protocol (online supplementary file 5), the use of detailed and explicit eligibility criteria and the use of duplicate and independent processes for study selection, data abstraction and data interpretation.
We must acknowledge that our assessment was focused on primary breast cancer and we did not take into account diagnosis of metastases due to breast cancer. A recent study that evaluated the accuracy of ICD-9 codes in identifying metastatic breast and other cancer diseases found that the performance of the metastases codes from Medicare claims data compared with the gold standard of SEER stage was poor and never exceeded 80% for any of the accuracy measures for any stage for any cancer metastatic disease.60 Other studies reported similar low values of accuracy and this may misclassify a significant number of patients and lead to a biased assessment of survival.22 61 62
Second, breast cancer recurrences and second primary breast cancers are of interest in the epidemiological and outcome research of breast cancer. In our assessment, we did not consider studies that used algorithms to identify recurrences and second breast cancer events. A recent study assessed several algorithms to identify second breast cancer events following early stage invasive breast cancer and found high accuracy measures.63 Third, we were not able to consider articles that were not written in English, and this may have introduced language bias. In addition, despite the comprehensive nature of our search, a few pertinent articles may have been missed, given that some identified articles did not use the term ‘administrative database’ as a subject heading and the term is not recognised as a MeSH term by MEDLINE. Indeed, we were able to identify four primary studies54–57 only by using the ‘Cited-By’ tools in PubMed and Google Scholar or by checking the references of included studies. Fourth, the knowledge and experience of the ICD-9/ICD-10 coders could have influenced the quality of breast cancer case definition in each study, and consequently the results presented in our review could be biased by this factor. Finally, we emphasise that the applicability of validation studies depends greatly on the methods used to identify subjects with the condition of interest in order to validate the algorithm, because these methods may influence the disease prevalence and the generalisability of the subjects' characteristics as well as the diagnostic accuracy measures.
Hence, the generalisability of a database is limited to the setting in which the validation has been performed.64 For example, while Medicare covers the elderly41 42 47 55 59 61 and Medicaid covers indigent and other particular groups of patients,44 57 US Healthcare is an independent practice association model that may represent patient populations of a relatively higher socioeconomic class.50 51 Hence, inferences from these validated databases cannot be extended to those who reside in the same area as the people registered in the above systems but do not benefit from them, or to subjects aged 64 years or less, as is mostly the case for the Medicare system. Conversely, for the databases in Italy10 38 and France,40 54 where healthcare is universally provided to residents, the applicability of the results from the validated databases is adequate, although it cannot be extended to the national level. Finally, we found a study that validated data from a single institution in Japan and, as acknowledged by the authors, it is unclear whether the accuracy results are directly applicable to other hospitals.49
In summary, we conclude that, based on the retrieved evidence, administrative databases can be employed to identify primary breast cancer. The best algorithm suggested is ICD-9 or ICD-10 codes located in primary position. Caution should be used when surgical procedures, chemotherapy, radiation therapy or outpatient data and physician claims are added to the algorithm. We believe that our findings will help researchers who would like to validate breast cancer ICD-9 or ICD-10 codes in administrative databases using either cancer registries or medical charts as the reference standard.
The authors would like to thank Kathy Mahan for editing the manuscript.
Contributors IA, AM, GG, DG and MF conceived and designed the study; EB, VS, AG, FC, MO, IA and AM were involved in the data acquisition; IA, AM, DS, MF and GG analysed and interpreted the data; and IA, EB, MO, GG, DS, VS, AG, FC, MF and AM contributed to drafting and revising the study and have approved submission of the final version of the article. AM is the guarantor of the review.
Funding This systematic review protocol was developed within the D.I.V.O. project (Realizzazione di un Database Interregionale Validato per l’Oncologia quale strumento di valutazione di impatto e di appropriatezza delle attività di prevenzione primaria e secondaria in ambito oncologico) supported by funding from the National Centre for Disease Prevention and Control (CCM 2014), Ministry of Health, Italy.
Competing interests None declared.
Patient consent Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement The authors will give full access to the database containing the data extracted from the individual studies included in this review. Requests should be sent by email to firstname.lastname@example.org.