Original research
Identification of delayed diagnosis of paediatric appendicitis in administrative data: a multicentre retrospective validation study
  1. Kenneth A Michelson1,
  2. Richard G Bachur1,
  3. Arianna H Dart1,
  4. Pradip P Chaudhari2,
  5. Andrea T Cruz3,
  6. Joseph A Grubenhoff4,5,
  7. Scott D Reeves6,
  8. Michael C Monuteaux1,
  9. Jonathan A Finkelstein7
  1. 1Division of Emergency Medicine, Boston Children's Hospital, Boston, MA, USA
  2. 2Division of Emergency and Transport Medicine, Children's Hospital Los Angeles, Los Angeles, CA, USA
  3. 3Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
  4. 4Section of Pediatric Emergency Medicine, University of Colorado School of Medicine, Aurora, CO, USA
  5. 5Children's Hospital Colorado, Aurora, CO, USA
  6. 6Division of Pediatric Emergency Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
  7. 7Kaiser Permanente Bernard J. Tyson School of Medicine, Pasadena, CA, USA
  1. Correspondence to Dr Kenneth A Michelson; kenneth.michelson{at}childrens.harvard.edu

Abstract

Objective To derive and validate a tool that retrospectively identifies delayed diagnosis of appendicitis in administrative data with high accuracy.

Design Cross-sectional study.

Setting Five paediatric emergency departments (EDs).

Participants 669 patients under 21 years old with possible delayed diagnosis of appendicitis, defined as two ED encounters within 7 days, the second with appendicitis.

Outcome Delayed diagnosis was defined as appendicitis being present but not diagnosed at the first ED encounter based on standardised record review. The cohort was split into derivation (2/3) and validation (1/3) groups. We derived a prediction rule using logistic regression, with covariates including variables obtainable only from administrative data. The resulting trigger tool was applied to the validation group to determine area under the curve (AUC). Test characteristics were determined at two predicted probability thresholds.

Results Delayed diagnosis occurred in 471 (70.4%) patients. The tool had an AUC of 0.892 (95% CI 0.858 to 0.925) in the derivation group and 0.859 (95% CI 0.806 to 0.912) in the validation group. The positive predictive value (PPV) for delay at a maximal accuracy threshold was 84.7% (95% CI 78.2% to 89.8%) and identified 87.3% of delayed cases. The PPV at a stricter threshold was 94.9% (95% CI 87.4% to 98.6%) and identified 46.8% of delayed cases.

Conclusions This tool accurately identified delayed diagnosis of appendicitis. It may be used to screen for potential missed diagnoses or to specifically identify a cohort of children with delayed diagnosis.

  • Health informatics
  • PAEDIATRIC SURGERY
  • ACCIDENT & EMERGENCY MEDICINE

Data availability statement

Data were collected by the investigators and are available on reasonable request.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

STRENGTHS AND LIMITATIONS OF THIS STUDY

  • The study establishes a method for specifically flagging cases with delayed diagnosis, allowing study in large datasets where medical records are not available.

  • The tool was derived and validated in separate cohorts.

  • Expert medical record review was conducted based on a previously defined objective rubric.

  • All patients were evaluated in paediatric emergency departments, which may differ from non-paediatric emergency departments, affecting rule performance outside of paediatric settings.

Introduction

Appendicitis is the most common serious surgical emergency in children.1 Appendicitis may be more difficult to diagnose in younger children, who often exhibit fewer classic features of the disease, and its symptoms and signs can overlap with those of other, more common illnesses, such as gastroenteritis.2 3 Delays are associated with complications including perforated appendicitis, abdominal abscess formation, sepsis and rarely a need for bowel resection.4 Timely diagnosis can prevent these complications. The emergency department (ED) environment accentuates factors that predispose patients to delayed diagnosis because of high cognitive load on clinicians, frequent high-stakes decisions and patients who are typically not previously known to clinicians.5 6

Systematic identification of diagnostic error is the first step in preventing clinical delays in diagnosis, but reporting of diagnostic errors is unreliable, challenging and typically relies on expert case review.7–11 However, case review is laborious, expensive and difficult to scale. Automated approaches promise to screen and identify potential causes of error, but nevertheless require manual case review after screening.12 13 Case review depends on access to records and resources to perform the review, biasing samples towards hospitals willing to participate.14

Despite these obstacles, tools to assess diagnostic accuracy across hospitals of all types are needed to reduce delays in diagnosis of serious emergency conditions for children. Most childhood ED encounters occur in community EDs not staffed by clinicians who primarily treat children, and one-third of EDs evaluate fewer than five children per day on average.15 16 This may magnify the challenge of diagnosis in children, who are more likely to be developmentally unable to provide accurate historical information, and in whom early symptoms of disease are often non-specific.17 Administrative data are the only current widespread means of assessing care at all types of hospitals and thus are the only currently realistic approach for understanding a broad cross-section of care.14

Approaches identifying delayed diagnosis in administrative data, if shown to be accurate, would have several advantages. First, they could be used in administrative data to illuminate hospital-level factors and rates of delayed diagnosis that would inform improvement efforts. Second, they could be used to identify high-performing hospitals or hospital systems that could serve as models or benchmarks to other institutions seeking to improve diagnosis. Third, they could save substantial effort in identifying cases for local review and feedback. Finally, they could be used to assess improvement efforts focused on diagnostic accuracy.

To address the challenges of efficiently identifying potential diagnostic error, we previously piloted a method for accurately identifying delayed diagnosis for conditions using the information contained within administrative data.18 Here, we report on a multicentre investigation to validate that methodology for the identification of a delayed diagnosis of appendicitis in children and young adults aged less than 21 years old.

Methods

Design, setting and participants

We performed a retrospective cross-sectional study to develop and test a decision rule using variables only available in administrative data to predict delayed diagnosis of appendicitis, as determined by expert case review. The study was designed in accordance with the Standards for Reporting of Diagnostic Accuracy Studies (STARD) guidelines.19 Participants were children and young adults aged <21 years who visited one of five paediatric EDs across the USA from 2010 to 2019, had a first-time diagnosis of appendicitis, and had an ED visit in the preceding 7 days. The ED encounter associated with the appendicitis diagnosis was designated as the ‘diagnosis encounter,’ and the preceding encounter was designated as the ‘initial encounter’. Cases were identified for inclusion using diagnosis codes (International Classification of Diseases, 9th Edition, Clinical Modification (ICD-9-CM) 540.x, 541, 542 and ICD-10-CM K35.x–K37.x). Patients were excluded if insufficient medical records existed to determine whether a delayed diagnosis occurred, if no record of a prior encounter existed, if the patient left the ED without being seen, or if the patient was transferred at the conclusion of the initial ED visit (which made determination of a delayed diagnosis impossible).
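As an illustration of the inclusion logic described above, the following minimal sketch flags appendicitis diagnosis codes and pairs each diagnosis encounter with the most recent ED visit in the preceding 7 days. The record layout (`patient_id`, `date`, `codes`), the function names, the assumption that ICD-9-CM codes are stored with a decimal point, and the exclusion of same-day visits are all choices made for this example; they are not the PHIS schema or the study's own code.

```python
from datetime import date

def is_appendicitis_code(code: str) -> bool:
    """True for ICD-9-CM 540.x, 541, 542 or ICD-10-CM K35.x-K37.x.

    Assumes codes are stored with a decimal point (e.g. '540.0', 'K35.80').
    """
    c = code.strip().upper()
    if c.startswith(("K35", "K36", "K37")):
        return True
    return c.split(".")[0] in {"540", "541", "542"}

def find_candidate_pairs(encounters, max_gap_days=7):
    """Return (initial_encounter, diagnosis_encounter) pairs.

    encounters: list of dicts with hypothetical keys 'patient_id',
    'date' (datetime.date) and 'codes' (list of str).
    """
    by_patient = {}
    for enc in encounters:
        by_patient.setdefault(enc["patient_id"], []).append(enc)

    pairs = []
    for visits in by_patient.values():
        visits.sort(key=lambda e: e["date"])
        # First-time diagnosis: earliest encounter carrying an appendicitis code.
        dx = next(
            (v for v in visits if any(is_appendicitis_code(c) for c in v["codes"])),
            None,
        )
        if dx is None:
            continue
        # Earlier visits that fall inside the revisit window (same day excluded).
        prior = [v for v in visits
                 if 0 < (dx["date"] - v["date"]).days <= max_gap_days]
        if prior:
            pairs.append((prior[-1], dx))  # most recent preceding encounter
    return pairs
```

The exclusion criteria (insufficient records, left without being seen, transfer) would still need to be applied to the returned pairs.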

Data sources

The source of administrative data was the Pediatric Health Information System (PHIS). The PHIS database contains clinical and billing data from 44 not-for-profit, tertiary care children’s hospitals. The data collection, validation and safeguarding procedures are assured through a joint effort between the Children’s Hospital Association (Lenexa, Kansas, USA) and participating hospitals, and have previously been described.20–22 Data are deidentified at the time of data submission, and data are subjected to a number of reliability and validity checks before being included in the database. For this study, data from five hospitals were included. Cases from PHIS were reidentified locally and linked to the electronic health record (EHR) at each participating site for manual review.

Outcome

The reference standard primary outcome was delayed diagnosis as determined by manual expert case review of the EHR. Delayed diagnosis was defined as appendicitis being present but not diagnosed at the initial encounter. Reviewers rated the likelihood that appendicitis was present as ‘near-definitely not’, ‘probably not’, ‘possibly’, ‘probably’ or ‘near-definitely’ (definitions provided in online supplemental table 1).

Case reviewers were all board-certified paediatric emergency medicine faculty. Reviewers were trained on the assessment of delay using study reading material, and then were tested and retested grading 40 standard appendicitis cases. Real-time feedback was given after each response. The correct answers and feedback were determined by a multispecialty expert consensus panel.23 The reviewer assessment of delayed diagnosis was dichotomised as delayed diagnosis (probably or near-definitely delay) or not delayed diagnosis (possibly, probably not or near-definitely not delay). This approach to case review was previously shown to have high inter-rater reliability in a very similar cohort.18 After training, reviewers evaluated study cases. Reviewers were blinded to the decision rule assessment of delayed diagnosis.

Development of the decision rule

The decision rule evaluated the likelihood of delayed diagnosis of appendicitis using variables contained in administrative data and based on investigators’ clinical expertise. These included age (<3 years, 3–10 years or ≥11 years), sex, history of a complex chronic condition,24 revisit interval (days between initial and diagnosis encounters), diagnosis code for perforated appendicitis (ICD-9-CM 540.0–1, ICD-10-CM K35.2x, K35.32–33), length of stay of the diagnosis encounter (0–1, 2–3, 4–7 or >7 days), and individual presence or absence of specific diagnoses at the initial encounter including abdominal pain, constipation, dehydration, fever, gastroenteritis, genitourinary condition, head/ear/eye/nose/throat condition, leucocytosis, urinary tract infection, viral infection or none of the above (diagnosis codes in online supplemental table 2).
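The categorical covariates listed above could be encoded along the following lines. This is a sketch only: the function names and returned labels are hypothetical, age is assumed to be in years, length of stay in whole days, and codes are assumed to be stored with a decimal point.

```python
def age_group(age_years: float) -> str:
    """Age bands used as a candidate covariate: <3, 3-10 or >=11 years."""
    if age_years < 3:
        return "<3"
    if age_years < 11:
        return "3-10"
    return ">=11"

def los_group(los_days: int) -> str:
    """Length-of-stay bands for the diagnosis encounter: 0-1, 2-3, 4-7, >7 days."""
    if los_days <= 1:
        return "0-1"
    if los_days <= 3:
        return "2-3"
    if los_days <= 7:
        return "4-7"
    return ">7"

def is_perforated_code(code: str) -> bool:
    """ICD-9-CM 540.0-540.1 or ICD-10-CM K35.2x, K35.32-K35.33."""
    c = code.strip().upper()
    return c in {"540.0", "540.1"} or c.startswith(("K35.2", "K35.32", "K35.33"))
```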

The full cohort was randomly divided into derivation (2/3) and validation (1/3) sets, stratified on the outcome. The decision rule was trained using only the derivation set. Variables were selected for inclusion in the decision rule using univariable logistic regressions. All variables associated with the outcome with p<0.20 were included in the decision rule. The final model underlying the decision rule was created using multivariable logistic regression within the derivation set using delayed diagnosis (determined by expert case review) as the outcome and all screened-in variables as predictors. The decision rule classified cases as delayed or not delayed using two thresholds: (1) a maximal accuracy threshold, based on the model predicted probability being greater than or equal to the value that maximises the proportion of correct classifications25 and (2) a near-definite delay threshold if the predicted probability of delay was ≥90%.
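Two of the steps above, the outcome-stratified 2/3 to 1/3 split and the choice of a maximal accuracy threshold, can be sketched as follows. The function names, the fixed random seed and the tie-breaking rule (lowest qualifying cut-off) are assumptions of this example rather than the study's procedure, and only observed predicted probabilities are scanned as candidate cut-offs.

```python
import random

def stratified_split(outcomes, derivation_fraction=2 / 3, seed=0):
    """Randomly split case indices into derivation and validation sets,
    keeping the proportion of delayed cases similar in both."""
    rng = random.Random(seed)
    derivation, validation = [], []
    for label in sorted(set(outcomes)):
        idx = [i for i, y in enumerate(outcomes) if y == label]
        rng.shuffle(idx)
        cut = round(len(idx) * derivation_fraction)
        derivation.extend(idx[:cut])
        validation.extend(idx[cut:])
    return sorted(derivation), sorted(validation)

def max_accuracy_threshold(probs, outcomes):
    """Return (threshold, accuracy) maximising correct classifications.

    A case is classified as delayed when its predicted probability is
    greater than or equal to the threshold.
    """
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    best_t, best_acc = None, -1.0
    for t in sorted(set(probs)):
        correct = sum((p >= t) == bool(y) for p, y in zip(probs, outcomes))
        acc = correct / len(probs)
        if acc > best_acc:  # strict '>' keeps the lowest cut-off on ties
            best_t, best_acc = t, acc
    return best_t, best_acc
```

In practice the threshold would be chosen on derivation-set predictions only and then applied unchanged to the validation set.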

Analysis

The prevalence of delayed diagnosis was determined in the whole cohort and then separately by site. We constructed receiver operating characteristic (ROC) curves in the derivation and validation sets to illustrate the trade-off of sensitivity vs specificity of the decision rule in correctly classifying delayed diagnosis. Areas under the ROC curve (AUC) were computed. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and accuracy were determined for the rule in both derivation and validation sets at the two thresholds (maximal accuracy and near-definite delay). We determined binomial exact 95% CIs for each test characteristic.
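For readers who want to reproduce this kind of summary, the sketch below computes the test characteristics and binomial exact (Clopper-Pearson) 95% CIs from a 2x2 table using only the standard library. The function names are hypothetical, the bisection solver is a convenience chosen for this example (statistical packages typically use the beta distribution instead), and the direct summation is intended only for modest sample sizes.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson (binomial exact) confidence interval for k successes in n."""
    def solve(f, target):
        # f decreases as p increases on [0, 1]; bisect for f(p) == target.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: the p at which P(X >= k) equals alpha/2.
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha/2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper

def rule_characteristics(tp, fp, fn, tn):
    """Point estimate and exact 95% CI for each test characteristic.

    Assumes every denominator is non-zero.
    """
    cells = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
        "accuracy": (tp + tn, tp + fp + fn + tn),
    }
    return {name: (k / n, *exact_ci(k, n)) for name, (k, n) in cells.items()}
```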

Calibration of the rule was determined separately in the derivation and validation sets. We first used the predicted probability of delayed diagnosis to categorise patients as 0% to <20%, 20% to <40%, 40% to <60%, 60% to <80% or 80% to 100% likely to have delayed diagnosis. We then computed the actual proportion of patients who had a delayed diagnosis and its 95% binomial exact CI within each of these subgroups. One-sample binomial proportion tests were computed for the validation set comparing expected frequencies of delay (10%, 30%, 50%, 70% and 90%, respectively) with actual proportions.
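The banding step of the calibration analysis can be sketched as below: cases are grouped by predicted probability, and the observed proportion of delayed diagnosis in each band is set beside the expected frequency used for comparison. The function name and the returned dictionary layout are assumptions of this example; the CIs and one-sample proportion tests are not shown.

```python
def calibration_table(probs, outcomes):
    """Observed proportion of delayed diagnosis within five predicted-probability bands."""
    # (lower bound, upper bound, expected frequency used for comparison)
    bands = [
        (0.0, 0.2, 0.1),
        (0.2, 0.4, 0.3),
        (0.4, 0.6, 0.5),
        (0.6, 0.8, 0.7),
        (0.8, 1.0, 0.9),
    ]
    rows = []
    for lo, hi, expected in bands:
        # Bands are half-open [lo, hi), except the last, which includes 100%.
        ys = [y for p, y in zip(probs, outcomes)
              if lo <= p < hi or (hi == 1.0 and p == 1.0)]
        n = len(ys)
        rows.append({
            "band": (lo, hi),
            "expected": expected,
            "n": n,
            "observed": sum(ys) / n if n else None,
        })
    return rows
```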

Sensitivity analyses were performed, recreating the rule derivation to predict the outcome of (1) possible, probable or near-definite delay (a permissive rule) or (2) near-definite delay only (a strict rule).
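The three outcome definitions differ only in where the five-point reviewer rating is cut. A minimal sketch, with hypothetical names for the definitions and the helper function:

```python
# Reviewer ratings in increasing order of likelihood that appendicitis was present.
RATINGS = [
    "near-definitely not",
    "probably not",
    "possibly",
    "probably",
    "near-definitely",
]

# Lowest rating counted as delayed diagnosis under each outcome definition.
CUTOFFS = {
    "permissive": "possibly",    # possible, probable or near-definite delay
    "main": "probably",          # probable or near-definite delay
    "strict": "near-definitely", # near-definite delay only
}

def is_delayed(rating: str, definition: str = "main") -> bool:
    """Dichotomise a reviewer rating under the chosen outcome definition."""
    return RATINGS.index(rating) >= RATINGS.index(CUTOFFS[definition])
```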

Statistical significance was defined as p<0.05. A prestudy power analysis suggested that we would need 193 patients in the validation cohort to have 80% power to estimate the rule PPV within 10 percentage points based on a binomial exact CI around 0.9 (the expected PPV based on pilot work).18

Patient and public involvement

Patients and the public were not involved in the development of this research, as the topic of the research was focused on informatics.

Results

Among 801 patients identified because of a revisit within 7 days leading to appendicitis diagnosis, we excluded 14 (1.7%) for having insufficient records, 5 (0.6%) for no record of an initial encounter, 32 (4.0%) for leaving without being seen and 81 (10.1%) for being transferred at the initial encounter. We analysed the remaining 669 (83.5%) patients. Demographics of the cohort are shown in table 1, and there were no significant differences between characteristics of children in the derivation and validation sets. Delayed diagnosis of appendicitis occurred in 471 (70.4%) patients.

Table 1

Demographics and outcomes of the derivation and validation study cohorts

Derivation of the decision rule

Among all possible variables screened for inclusion, all except age were associated with the outcome with p<0.20. The final logistic regression model used for the decision rule is shown in table 2. A risk calculator is available as online supplemental file 2. The variables most associated with delayed diagnosis were perforated appendicitis and the interval between the initial and diagnosis encounters. The maximal accuracy threshold of predicted probability of delayed diagnosis was 0.568. Therefore, based on the decision rule determined from the derivation set, predictions of delayed diagnosis were most accurate when cases with a predicted probability of delay ≥56.8% were classified as delayed.
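Applying a fitted logistic model of this kind amounts to computing a predicted probability and comparing it with the threshold. The sketch below shows the mechanics only: the function names and the dictionary-based interface are assumptions, and the coefficients must be supplied by the user (for example, from table 2); none are reproduced here.

```python
from math import exp

def predicted_probability(intercept, coefficients, features):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i)))).

    coefficients: dict mapping feature name to its coefficient.
    features: dict mapping feature name to its value (e.g. 0/1 indicators).
    """
    linear = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + exp(-linear))

def classify_delay(p, threshold=0.568):
    """Classify as delayed when p meets the maximal accuracy threshold."""
    return p >= threshold
```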

Table 2

Final model predicting delayed diagnosis of appendicitis based on administrative data in the derivation cohort only (model pseudo-R2=0.42)

Validation of the decision rule

ROC curves depicting the trade-off of sensitivity and specificity at differing thresholds of predicted probability of delay are shown in figure 1. The AUC for the derivation set was 0.892 (95% CI 0.858 to 0.925) and for the validation set was 0.859 (95% CI 0.806 to 0.912). We applied the decision rule using the maximal accuracy threshold of 56.8% and, separately, the near-definite delay threshold of ≥90% in both derivation and validation sets. At the maximal accuracy threshold, the validation set PPV of the prediction rule was 84.7% (95% CI 78.2% to 89.8%) and NPV was 67.7% (95% CI 54.7% to 79.1%). Using a stricter cut-off predicted probability of ≥90% yielded a PPV of 94.9% (95% CI 87.4% to 98.6%) and NPV of 42.9% (95% CI 34.7% to 51.3%). Test characteristics of the decision rule are shown in table 3. The calibration of the model was excellent. In the validation cohort, predictions of the probability of delayed diagnosis were not significantly different from actual probabilities of delayed diagnosis, except for children with a predicted probability of 20% to <40%, in whom delayed diagnosis was underestimated (table 4).

Table 3

Test characteristics of the delayed diagnosis prediction model, applied to the derivation and validation cohorts

Table 4

Calibration of the predicted probability of delayed diagnosis at different predicted probabilities

Figure 1

Receiver operating characteristic curves depict the trade-off between sensitivity and false positive rate (1-specificity) in predicting delayed diagnosis. The AUC for the derivation set was 0.892 (95% CI 0.858 to 0.925) and for the validation set was 0.859 (95% CI 0.806 to 0.912). AUC, area under the curve.

Sensitivity analyses

Permissive and strict decision rules had similar performance to the main decision rule (details of the rules and test characteristics are shown in online supplemental tables 3 and 4). The validation AUCs were 0.865 (95% CI 0.800 to 0.930) for the permissive rule and 0.803 (95% CI 0.747 to 0.859) for the strict rule. PPV was 92.6% (95% CI 87.7% to 96.0%) for the permissive rule and 63.1% (95% CI 53.9% to 71.7%) for the strict rule.

Discussion

We successfully derived and validated an accurate decision rule for retrospectively identifying cases of delayed diagnosis of appendicitis in administrative data, with a PPV of 84.7%. Importantly, the model underlying the decision rule is well calibrated, provides accurate estimates of delay likelihood, and can identify a subcohort of patients who almost certainly experienced a delayed diagnosis of appendicitis: a stricter model threshold had a PPV of 94.9%. The rule relies only on information contained within administrative databases, including patient demographics, encounter length of stay and diagnosis codes. The model is therefore amenable to assessment of care, research and improvement efforts both locally and at the state and national levels.

We believe the decision rule will be useful for several different applications by varying the threshold for detection of delay. Since the rule is well calibrated, using a lower threshold of predicted probability to detect delay (eg, 0.2) will provide sensitive detection but would require further review to confirm delay, and using a higher threshold of detection (eg, 0.8) will provide specific detection but will miss cases with delay. At higher thresholds, the decision rule is specific enough to estimate rates of delayed diagnosis of appendicitis in populations drawn from large administrative databases, without the need for subsequent case review. These features of the rule are crucial, because they allow for a direct assessment of diagnostic performance in hospitals without considerable quality measurement infrastructure or investments in research. Using a sensitive threshold, hospitals could track their diagnostic performance and screen for potential cases of delay, aiding quality assurance efforts by balancing good case capture with the feasibility of many case reviews. A tiered approach would be to screen only cases above the sensitive threshold but assume that those above the higher threshold constitute delays.
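The tiered approach described above could be expressed as a simple triage function. The tier labels and the function name are hypothetical, and the default cut-offs (0.2 for screening, 0.9 for assuming delay) are example values taken from the thresholds discussed in the text, not recommended settings.

```python
def triage(p, screen_threshold=0.2, assume_threshold=0.9):
    """Assign a case to a review tier from its predicted probability of delay."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if screen_threshold > assume_threshold:
        raise ValueError("screen_threshold must not exceed assume_threshold")
    if p >= assume_threshold:
        return "assume delay"    # specific enough to count without review
    if p >= screen_threshold:
        return "manual review"   # flagged by the sensitive screen
    return "no review"
```

A hospital with limited review capacity might raise `screen_threshold`, accepting lower case capture in exchange for fewer reviews.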

The final model mirrors the clinical factors known to predispose to delayed diagnosis of appendicitis. A shorter period between initial and diagnosis encounters increases the likelihood that the initial one was related, mirroring evidence suggesting that the relatedness of two ED visits decreases with time.26 Perforation at diagnosis is associated with the likelihood that a delayed diagnosis occurred, probably because it increases the likely duration of disease that existed before diagnosis. Conditions commonly misdiagnosed before appendicitis were associated with a higher likelihood of delay and included gastroenteritis and urinary tract infection. In contrast, an absence of an apparently related diagnosis made delay less likely.

The approach used to generate this rule is generalisable to other emergency conditions. First, we convened a panel of experts to define the standards for grading a delayed diagnosis. Second, expert reviewers were trained to evaluate case records. Finally, reviewers analysed hundreds of records to generate enough data to develop a reliable decision rule. The variables could be repurposed for other conditions, but the rule itself is unique to paediatric appendicitis. We believe duplicating this approach for other conditions would be useful, because once developed, a rule is applicable for ongoing quality monitoring and research. Once expanded to multiple conditions, it would provide a realistic view of a hospital’s overall diagnostic performance, which has proved elusive to date.

A major reason that we developed this decision rule is that identifying and thus preventing diagnostic errors is challenging in general, as self-report is unreliable and laborious case review is needed.7 27 It is specifically challenging in children because most paediatric care happens outside of paediatric hospitals, where research is most commonly conducted and EHRs may not be available.14 28 Although trigger tools exist to identify diagnostic errors in abdominal cases, they are too non-specific to forgo the review step, which requires access to records.29 A key advantage of our approach is that, with a high predicted probability threshold of 90%, delay can be specifically identified.

This study has several strengths, including reliance on a multidisciplinary consensus definition of delayed diagnosis, the validation of the model on a cohort distinct from that used to train it, the large sample size, use of data from multiple centres and face validity of the factors predicting delay. Limitations include the use of data from only paediatric hospitals (suggesting the value of a future independent validation in general hospitals) and the complex nature of the decision rule model. Additionally, we did not perform tests of inter-rater reliability, though we previously showed in pilot work that inter-rater reliability for this approach is excellent.18 Finally, the development of the decision rule using a random split cohort can result in optimistic predictions; thus, we intend to further validate this rule in external populations in the future.

In conclusion, we developed and validated a model that can accurately identify delayed diagnoses of paediatric appendicitis in administrative data, without the need for manual record review for confirmation. This model may be applied to hospital data sources in which policymakers and researchers do not have access to patients’ records, allowing for accurate study of diagnostic error in most hospitals. The model may also be used by hospital systems to identify errors and improve care.

Ethics statements

Patient consent for publication

Ethics approval

This study involves human participants but was exempted by the Boston Children’s Hospital Institutional Review Board. Informed consent was not required as this was entirely a retrospective medical record review.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Contributors KM contributed to study planning, data collection, data analysis, drafted the manuscript, and acted as guarantor for the study. RB, MCM and JAF contributed to study design, and substantially revised the manuscript. AD contributed to study planning and procedures. PPC, ATC, JAG and SDR substantially contributed to study design and data collection.

  • Funding KM received funding through award K08HS026503 from the Agency for Healthcare Research and Quality, and from the Boston Children’s Hospital Office of Faculty Development.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.