Validity of peptic ulcer disease and upper gastrointestinal bleeding diagnoses in administrative databases: a systematic review protocol
  1. Alessandro Montedori1,
  2. Iosief Abraha1,
  3. Carlos Chiatti2,
  4. Francesco Cozzolino1,
  5. Massimiliano Orso1,
  6. Maria Laura Luchetta3,
  7. Joseph M Rimland4,
  8. Giuseppe Ambrosio5
  1. Health Planning Service, Regional Health Authority of Umbria, Perugia, Italy
  2. Scientific Directorate, Italian National Research Center on Aging, Ancona, Italy
  3. Department of General Medicine, Azienda USL Umbria 1, Perugia, Italy
  4. Department of Geriatrics and Geriatric Emergency Care, Italian National Research Center on Aging, Ancona, Italy
  5. Department of Cardiology, University of Perugia School of Medicine, Perugia, Italy
  Correspondence to Dr Iosief Abraha; iosief_a{at}


Introduction Administrative healthcare databases are useful for investigating the epidemiology, health outcomes, quality indicators and healthcare utilisation of peptic ulcers and gastrointestinal bleeding, but the diagnostic codes they contain need to be validated if the databases are to be a reliable source for research. The aim of this protocol is to perform the first systematic review of studies reporting the validation of International Classification of Diseases, 9th Revision and 10th Revision (ICD-9 and ICD-10) codes for peptic ulcer and upper gastrointestinal bleeding diagnoses.

Methods and analysis MEDLINE, EMBASE, Web of Science and the Cochrane Library databases will be searched using appropriate search strategies. We will include validation studies that used administrative data to identify peptic ulcer disease and upper gastrointestinal bleeding diagnoses, or studies that evaluated the validity of peptic ulcer and upper gastrointestinal bleeding codes in administrative data. The following inclusion criteria will be used: (a) the presence of a reference standard case definition for the diseases of interest; (b) the presence of at least one test measure (eg, sensitivity); and (c) the use of an administrative database as a source of data. Pairs of reviewers will independently abstract data using standardised forms and will evaluate quality using a checklist based on the Standards for Reporting of Diagnostic Accuracy (STARD) criteria. This systematic review protocol has been produced in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analysis Protocols (PRISMA-P) 2015 statement.

Ethics and dissemination Ethics approval is not required given that this is a protocol for a systematic review. We will submit the results of this study to a peer-reviewed journal for publication. The results will guide researchers validating administrative healthcare databases in determining appropriate case definitions for peptic ulcer disease and upper gastrointestinal bleeding, and will support outcome research on these conditions that uses administrative healthcare databases.

Trial registration number CRD42015029216.

  • peptic ulcer
  • gastrointestinal haemorrhage
  • administrative database
  • sensitivity
  • accuracy
  • validity

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:


Strengths and limitations of this study

  • Validation of International Classification of Diseases, 9th Revision and 10th Revision (ICD-9 and ICD-10) diagnosis codes for peptic ulcer disease and upper gastrointestinal bleeding in administrative healthcare databases can contribute to health outcome research.

  • This review will be the first to systematically identify and evaluate primary studies that validated the accuracy of ICD-9 and ICD-10 codes for peptic ulcer disease and upper gastrointestinal bleeding in administrative healthcare databases.

  • The results from this systematic review will serve as a guide to determine appropriate case definitions for peptic ulcer and upper gastrointestinal bleeding.

  • The main limitation is that validated diagnosis codes or algorithms are context-specific, and may not be generalisable to other settings.

Introduction

Non-variceal upper gastrointestinal bleeding (UGIB) is associated with significant morbidity and mortality. Its incidence ranges from 48 to 160 cases per 100 000 people per year, with higher rates in men and older people.1,2 Although the incidence of UGIB and peptic ulcer bleeding is declining in the general population, hospitalisation rates for ulcer complications are increasing in older populations.3 The most frequent risk factors for non-variceal UGIB are Helicobacter pylori infection and the use of NSAIDs or aspirin and of other antiplatelet and anticoagulant medications. Peptic ulcer disease (PUD) accounts for up to 67% of UGIB cases.1 Both H. pylori infection and NSAID use are independent risk factors for PUD and UGIB.4

Health authorities generate and maintain large administrative healthcare databases that typically contain data on health resource utilisation (eg, hospitalisations, outpatient care and drug prescriptions) and vital statistics.5 For research, one advantage of administrative databases is that they passively collect population-level data with longitudinal follow-up, which makes their results more readily generalisable. They are also considered cost-effective compared with primary data collection.6,7 Their main disadvantage is that they are generated for administrative purposes, such as billing and the storage of hospital patient records, rather than for research. The diagnostic codes for specific disorders must therefore be validated against an accepted ‘gold standard’ reference diagnosis.8–14

In the gastrointestinal field, administrative healthcare databases have been used to estimate the epidemiology of PUD15 and UGIB,16 to assess drug-related gastrointestinal outcomes,17–19 to conduct active drug surveillance20 and to evaluate the quality of health services.21,22

Current administrative databases use International Classification of Diseases, 9th Revision (ICD-9) or 10th Revision (ICD-10) codes for PUD and UGIB. Validation of diagnostic codes is of particular interest to national healthcare authorities for the surveillance of medical products and for epidemiological studies of diseases. For example, the US Food and Drug Administration has sponsored a pilot project, Mini-Sentinel, to perform active surveillance and refine the safety signals that emerge for newly released medical products. To implement this work, the programme needed to identify the algorithms used to detect a number of health outcomes of interest in administrative data sources, together with the performance characteristics of these algorithms.23 The Mini-Sentinel programme produced a series of systematic reviews of validated methods and case definitions for identifying diseases or health outcomes in administrative data, including cardiocerebrovascular diseases24–28 and other conditions.29–33 To establish best practices in the use of administrative data for health research and surveillance, the Canadian Rheumatology Administrative Data Network conducted a systematic review of studies reporting on the validity of diagnostic codes for identifying cardiovascular diseases.34–36 Likewise, the Regional Health Authority of Umbria is interested in the validity of administrative data diagnoses and in identifying the case definitions and algorithms developed for different diseases, including cancer (breast, lung and colorectal),9,11 chronic obstructive pulmonary disease13 and non-variceal UGIB, which is the focus of this article.

To date, the validity and performance of algorithms that use diagnostic codes for PUD and UGIB have not been systematically reviewed in the medical literature. With this protocol, we plan to systematically evaluate studies validating the diagnostic codes for these gastrointestinal conditions in administrative databases.

Methods and analysis

Literature search

Published peer-reviewed articles will be identified through comprehensive searches of MEDLINE, EMBASE, Web of Science and the Cochrane Library from their inception. Our search strategy combines: (a) keywords and Medical Subject Headings (MeSH) terms to identify records on PUD and UGIB; (b) terms to identify studies likely to report validity or accuracy measures; and (c) terms designed to identify studies that use healthcare administrative databases, based on a combination of the terms used by Benchimol et al37 and the Mini-Sentinel programme.38,39 The search strategy is available as online supplementary appendix 1. The reference lists of key articles will be hand searched to retrieve additional articles, and articles citing the studies identified by the search will be sought through the ‘Cited-By’ tools in PubMed and Google Scholar. Two independent reviewers will screen titles and abstracts for eligibility, and discrepancies will be resolved through discussion.
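
The three-part structure of the search can be sketched as a small query builder: synonyms within each concept are combined with OR, and concepts (a), (b) and (c) are combined with AND, so a record must match all three. This is a minimal Python illustration assuming PubMed-style field tags; the terms below are hypothetical examples, not the authors' strategy, which is given in supplementary appendix 1.

```python
from typing import List

# Hypothetical example terms for each concept block (not the published strategy).
CONDITION_TERMS = ['"Peptic Ulcer"[MeSH]', '"Gastrointestinal Hemorrhage"[MeSH]',
                   '"peptic ulcer*"[tiab]', '"upper gastrointestinal bleed*"[tiab]']
VALIDITY_TERMS = ['"Sensitivity and Specificity"[MeSH]',
                  '"Predictive Value of Tests"[MeSH]', 'validat*[tiab]', 'accura*[tiab]']
DATABASE_TERMS = ['"administrative data*"[tiab]', '"claims data"[tiab]',
                  '"International Classification of Diseases"[MeSH]']


def or_block(terms: List[str]) -> str:
    """Join the synonyms for one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*blocks: List[str]) -> str:
    """Combine non-empty concept blocks with AND."""
    return " AND ".join(or_block(block) for block in blocks if block)


query = build_query(CONDITION_TERMS, VALIDITY_TERMS, DATABASE_TERMS)
print(query)
```

Keeping each concept in its own list makes it easy to translate the same logic to the syntax of EMBASE or Web of Science, where field tags differ.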

This review protocol has been prepared according to the Preferred Reporting Items for Systematic Reviews and Meta-Analysis Protocols (PRISMA-P) 2015 statement,40 and the results will be presented following the PRISMA flow diagram (figure 1).41 This protocol has also been registered in the PROSPERO International Prospective Register of Systematic Reviews with registration number CRD42015029216.

Figure 1

Study screening process.

Inclusion criteria

Type of studies

We will include diagnostic studies of any type (cross-sectional, or retrospective or prospective cohort studies) published in English, with no restriction on publication date.

Participants

The target population will include patients of any age and sex with peptic ulcer or gastrointestinal haemorrhage. Because in-hospital and outpatient upper gastrointestinal bleeders differ substantially in both clinical risk profile and treatment patterns,42 we will consider two types of cohorts with bleeding: (a) patients admitted to hospital for non-variceal UGIB caused by peptic ulcer and (b) outpatients seen for peptic ulcer or gastrointestinal bleeding.

Index test

Studies that validated ICD-9 or ICD-10 diagnostic codes, or algorithms based on them, for PUD or UGIB will be considered. The ICD-9 codes for PUD and UGIB are 531.0–531.7 and 531.9 for gastric ulcer, 532.0–532.7 and 532.9 for duodenal ulcer, 533.0–533.7 and 533.9 for peptic ulcer of unspecified site, 534.0–534.7 and 534.9 for gastrojejunal ulcer, and 578.0, 578.1 and 578.9 for gastrointestinal haemorrhage. The corresponding ICD-10 codes are K25.0–K25.7 and K25.9 for gastric ulcer, K26.0–K26.7 and K26.9 for duodenal ulcer, K27.0–K27.7 and K27.9 for peptic ulcer of unspecified site, K28.0–K28.7 and K28.9 for gastrojejunal ulcer, and K92.0, K92.1 and K92.2 for gastrointestinal haemorrhage. Within each ulcer category, the subcodes cover ulcers both with and without haemorrhage. Detailed descriptions of each ICD code are reported in online supplementary appendix 2.
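
To make the ICD-9 and ICD-10 code lists concrete, the following Python sketch classifies a single diagnosis code against them. It assumes that codes may be stored with or without the decimal point and that the ICD version of each record is known; the function and constant names are hypothetical, and a real study would also need to decide how to treat local clinical modifications of ICD.

```python
from typing import Optional

# Code lists as given in the protocol; for each ulcer category the protocol
# lists fourth characters .0 to .7 and .9, so only those are accepted.
ICD9_ULCER = {"531": "gastric", "532": "duodenal",
              "533": "peptic, site unspecified", "534": "gastrojejunal"}
ICD9_BLEEDING = {"578.0", "578.1", "578.9"}
ICD10_ULCER = {"K25": "gastric", "K26": "duodenal",
               "K27": "peptic, site unspecified", "K28": "gastrojejunal"}
ICD10_BLEEDING = {"K92.0", "K92.1", "K92.2"}
ULCER_SUBCODES = set("012345679")


def normalise(code: str) -> str:
    """Upper-case, trim and insert the decimal point after the category if missing."""
    code = code.strip().upper()
    if "." not in code and len(code) > 3:
        code = code[:3] + "." + code[3:]
    return code


def classify(code: str, version: int) -> Optional[str]:
    """Return 'ulcer:<site>', 'bleeding' or None for one ICD-9 or ICD-10 code."""
    if version not in (9, 10):
        raise ValueError("version must be 9 or 10")
    code = normalise(code)
    if version == 9:
        ulcer, bleeding = ICD9_ULCER, ICD9_BLEEDING
    else:
        ulcer, bleeding = ICD10_ULCER, ICD10_BLEEDING
    if code[:5] in bleeding:
        return "bleeding"
    category, _, subcode = code.partition(".")
    # Only the first subcode digit is checked; any further digits are ignored.
    if category in ulcer and subcode[:1] in ULCER_SUBCODES:
        return "ulcer:" + ulcer[category]
    return None
```

For example, `classify("5324", 9)` returns `"ulcer:duodenal"`, and `classify("K92.3", 10)` returns `None` because that code is not in the protocol's list.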

Reference standard

We will consider studies in which the diagnoses of the target diseases were confirmed through review of medical charts, medical notes or electronic health records. Confirmed peptic ulcers will include active gastric or duodenal ulcers, or gastroduodenal perforation, confirmed by surgery, endoscopy, X-ray or autopsy. Confirmed UGIB will include haemorrhage from gastric or duodenal ulcers, haemorrhagic gastritis, duodenitis or gastroduodenal perforation, confirmed by surgery, endoscopy, X-ray or autopsy.

Outcome measures

Studies that reported the accuracy of administrative data codes in identifying cases of PUD or UGIB, expressed at least as sensitivity or positive predictive value (PPV), will be eligible for inclusion.

Selection process

In the initial stage, titles and abstracts will be screened to identify potentially eligible studies. The full texts of these articles will then be obtained and evaluated against the inclusion and exclusion criteria. Data will be abstracted with standardised data collection forms, which will first be piloted on a sample of eligible articles. Title and abstract screening, full-text screening and data abstraction will be carried out independently and in duplicate by two review authors. Any discrepancies will be resolved by consensus and, where necessary, by involving a third review author. Calibration exercises will be performed at each step of the process.
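
The duplicate screening step can be sketched as a reconciliation of two reviewers' independent decisions. This Python example is illustrative only: it assumes each reviewer's decisions are stored as a mapping from a record identifier to "include" or "exclude", and `arbiter` is a hypothetical callback standing in for consensus discussion or the third review author.

```python
from typing import Callable, Dict, List, Tuple

Decision = str  # "include" or "exclude"


def reconcile(
    reviewer_a: Dict[str, Decision],
    reviewer_b: Dict[str, Decision],
    arbiter: Callable[[str], Decision],
) -> Tuple[Dict[str, Decision], List[str]]:
    """Merge two independent screening decisions.

    Agreements are accepted as they stand. A disagreement, or a record
    screened by only one reviewer, is listed as disputed and resolved by
    calling `arbiter` with the record identifier.
    """
    final: Dict[str, Decision] = {}
    disputed: List[str] = []
    for record_id in sorted(reviewer_a.keys() | reviewer_b.keys()):
        a, b = reviewer_a.get(record_id), reviewer_b.get(record_id)
        if a is not None and a == b:
            final[record_id] = a
        else:
            disputed.append(record_id)
            final[record_id] = arbiter(record_id)
    return final, disputed
```

Returning the list of disputed records alongside the final decisions also gives the raw material for reporting agreement at each calibration step.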

Data extraction

Data extraction will include the following information:

  1. The details of the included study (including title, year and journal of publication, country of origin, and sources of funding; the first author will be used as the study ID);

  2. The disease of interest (peptic ulcer or UGIB);

  3. The target population from which the administrative data were collected;

  4. The type of administrative database used (eg, hospital discharge data, outpatient records such as physician billing claims, etc);

  5. The ICD-9 or ICD-10 code used;

  6. Whether external validation was performed;

  7. Whether training and testing cohorts were used;

  8. The reference standard used to determine the validity of the diagnostic code (eg, medical chart review, patient self-reports, disease registry, etc);

  9. The test characteristics used to determine the validity of the diagnostic code or algorithm (eg, sensitivity, specificity, PPV, negative predictive value (NPV), area under the receiver operating characteristic (ROC) curve, likelihood ratios (LRs) and κ statistics);

  10. Any conflict of interest.
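
Most of the statistics listed in item 9 derive from a single 2×2 table that cross-classifies code status against the reference standard. The Python sketch below computes them from the four cell counts; the class name and example counts are hypothetical, and it assumes no cell is zero, since some of the ratios are otherwise undefined.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Code status (present/absent) against the reference standard diagnosis."""
    tp: int  # code present, disease confirmed
    fp: int  # code present, disease not confirmed
    fn: int  # code absent, disease confirmed
    tn: int  # code absent, disease not confirmed

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def lr_pos(self) -> float:
        return self.sensitivity() / (1 - self.specificity())

    def lr_neg(self) -> float:
        return (1 - self.sensitivity()) / self.specificity()

    def diagnostic_or(self) -> float:
        return (self.tp * self.tn) / (self.fp * self.fn)

    def kappa(self) -> float:
        """Cohen's kappa for agreement between the code and the reference standard."""
        n = self.tp + self.fp + self.fn + self.tn
        observed = (self.tp + self.tn) / n
        # Agreement expected by chance, from the row and column totals.
        expected = ((self.tp + self.fp) * (self.tp + self.fn)
                    + (self.fn + self.tn) * (self.fp + self.tn)) / n ** 2
        return (observed - expected) / (1 - expected)


example = TwoByTwo(tp=90, fp=10, fn=30, tn=870)  # hypothetical counts
print(f"Sensitivity {example.sensitivity():.2f}, PPV {example.ppv():.2f}")
```

Many validation studies review the charts of code-positive records only; in that design only the PPV can be estimated directly, because false negatives and true negatives are never observed.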

Quality assessment

The design and methods of the included primary studies will be assessed using a checklist developed by Benchimol et al,37 based on the criteria published by the Standards for Reporting of Diagnostic Accuracy (STARD) initiative for the accurate reporting of diagnostic accuracy studies.43 The checklist is provided in online supplementary appendix 3. Potential biases within the studies will be reported descriptively.

No subgroup analysis or publication bias assessment is anticipated.

Data synthesis

For each algorithm, we will abstract the validation statistics provided in the included studies, which may include sensitivity, specificity, PPV and NPV. We will calculate 95% CIs when they are not reported in the articles. Where sufficient and homogeneous data are available, we will derive summary estimates of sensitivity and specificity, with their 95% CIs, using a bivariate model.44 Data will be meta-analysed using a random-effects model, so that sensitivity and specificity are allowed to vary across studies. Separate meta-analyses will be performed by administrative data source (outpatient vs inpatient data), type of ICD code (ICD-9 or ICD-10) and type of disease (ulcer or haemorrhage). We will perform subgroup analyses by timing of publication and ICD code assessed, to examine whether accuracy has changed over time.
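
The protocol does not state which method will be used to calculate missing 95% CIs. One common choice for a proportion such as sensitivity or PPV is the Wilson score interval, sketched below in Python as an assumption; the exact Clopper–Pearson interval is a frequently used alternative.

```python
import math
from typing import Tuple


def wilson_ci(successes: int, total: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval (95% by default) for a binomial proportion."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical example: 90 of 120 chart-confirmed cases carried the code.
low, high = wilson_ci(90, 120)
print(f"Sensitivity 0.75 (95% CI {low:.3f} to {high:.3f})")
```

Unlike the simple Wald interval, the Wilson interval stays within 0 to 1 and behaves sensibly when sensitivity or PPV is close to 100%, which is common for well-performing codes.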

In addition, summary ROC curves will be constructed and pooled estimates of LR+, LR− and diagnostic OR will be calculated. Heterogeneity will be assessed by visual inspection of forest plots and ROC plots, as well as regression analysis suggested by Reitsma et al.44 Where there is important heterogeneity, we will not pool the data.
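
The protocol specifies a bivariate random-effects model, which jointly models sensitivity and specificity and is usually fitted with dedicated mixed-model software. As a much simpler illustration of random-effects pooling, the Python sketch below applies the univariate DerSimonian–Laird method to the log diagnostic odds ratio. It is not a substitute for the bivariate model, and the 0.5 continuity correction, the function names and the example counts are assumptions.

```python
import math
from typing import List, Tuple


def log_dor(tp: float, fp: float, fn: float, tn: float) -> Tuple[float, float]:
    """Log diagnostic odds ratio and its variance.

    Adds 0.5 to every cell when any cell is zero (continuity correction).
    """
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    estimate = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return estimate, variance


def dersimonian_laird(estimates: List[float], variances: List[float]) -> Tuple[float, float, float]:
    """Random-effects pooled estimate, its standard error and tau^2."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, tau2


# Hypothetical 2x2 counts (tp, fp, fn, tn) from three studies.
studies = [(90, 10, 30, 870), (45, 5, 15, 435), (120, 30, 20, 830)]
pairs = [log_dor(*s) for s in studies]
pooled, se, tau2 = dersimonian_laird([p[0] for p in pairs], [p[1] for p in pairs])
print(f"Pooled DOR {math.exp(pooled):.1f} "
      f"(95% CI {math.exp(pooled - 1.96 * se):.1f} to {math.exp(pooled + 1.96 * se):.1f}), "
      f"tau^2 = {tau2:.3f}")
```

A large tau^2 relative to the within-study variances is one numerical signal of the kind of heterogeneity that, under this protocol, would argue against pooling.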

Publication bias will not be evaluated, as the commonly available tests (Begg, Egger and Deeks tests) give different results and are thus not interchangeable.45

Ethics and dissemination

Approval from an ethics committee is not required, since this review will use publicly available data without directly involving human participants. An outline of the protocol was registered in the PROSPERO International Prospective Register of Systematic Reviews in 2015 (registration number CRD42015029216). The results of the review will summarise the studies validating diagnostic codes that identify PUD and UGIB in administrative data. They will also guide researchers validating administrative healthcare databases in identifying appropriate case definitions and algorithms for PUD and UGIB, and will support outcome research on these conditions that uses administrative healthcare databases. Findings of the review will be presented at relevant scientific conferences and disseminated through publication in a peer-reviewed journal.



  • Contributors IA, JMR, FC, MO and AM conceived the study. JMR, IA, MLL, FC, MO, CC, GA and AM were responsible for designing the protocol. IA, GA, AM, MO, JMR and FC drafted the protocol manuscript. JMR, IA, FC and MO developed the search strategy. JMR, IA, MLL, FC, MO, CC, GA and AM critically revised the successive versions of the manuscript and approved the final version.

  • Funding This review protocol was funded by the Regional Health Authority of Umbria.

  • Disclaimer The study funder was not involved in the study design or the writing of the protocol.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement All the results from the final version of the systematic review will be published.
