Elsevier

Annals of Emergency Medicine

Volume 64, Issue 3, September 2014, Pages 292-298

Research methods/special contribution
Looking Through the Retrospectoscope: Reducing Bias in Emergency Medicine Chart Review Studies

https://doi.org/10.1016/j.annemergmed.2014.03.025

Introduction

Chart review studies that use prerecorded data as the primary information source to answer a research question account for approximately 25% of all scientific studies published in peer-reviewed emergency medicine journals1 and 53% of those published in emergency medical services journals.2 The popularity of the chart review study design may be partly ascribed to the fact that the data are already collected, thereby eliminating the onerous task of prospective data collection. Moreover, a chart review study permits the investigation of questions that are difficult or nearly impossible to evaluate in prospective trials, such as the effects of rare or harmful exposures to which subjects cannot be randomized for ethical reasons.

Acknowledging that a summary risk ratio from a systematic review may be many steps removed from the true risk ratio in the target population, Maclure and Schneeweiss3 characterized the bias or distortions in the lenses or filters of an epidemiologist’s telescope. As an analogy, recall the childhood game of “telephone” or “whisper down the lane,” in which a message is relayed from one person to the next along a chain of players. Because of the introduction of “noise” (ambiguity) during each communication, and the subsequent misinterpretation of listeners seeking meaning, the final version of the message often differs radically from the original. Similarly, the reported effect estimate from a chart review study is subject to several layers of potential bias because abstracted data are several steps removed from the patient (Figure 1). First, the patient must divulge the information to the medical professional, who must then accurately interpret and transcribe it into the medical record. Using a data collection instrument, the chart abstractor must then cull and interpret the records and find each variable of interest, understand its meaning, and properly record it. Not all events surrounding an illness or injury, however, are relayed to the physician. Furthermore, even if the information is reported by the patient, the physician or nurse may fail to document it in the medical chart because of perceived relative unimportance, sheer oversight, or diagnostic mindset. Misinterpretation of chart entries or miscoding of data during data abstraction can compound the errors and omissions in the medical record. Thus, the potential for systematic error is far greater than the risk of random error, and the process of conducting a chart review study has the potential to produce a conclusion that is a distortion of the true effect estimate,4 underscoring the importance of identifying steps to minimize bias in chart review studies.
Although various criteria have been proposed as indicators of chart review study quality,1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17 none have been validated (Figure 2). Furthermore, none of these criteria have been updated or aggregated since the evolution of the use of large public databases and the electronic medical record.

A “chart” herein is defined as a document (whether paper or electronic) containing prerecorded patient-focused medical information, such as physician and nursing notes, out-of-hospital reports, and diagnostic test results from both the laboratory and radiology departments.4, 5 A chart review is therefore any type of study in which information is abstracted from the medical record.

The purpose of this article is 3-fold: (1) to provide a model that identifies the numerous processes in chart review studies that can introduce bias; (2) to outline the steps an investigator may take when planning a chart review study to mitigate distortion and bias; and (3) to describe reporting techniques that optimize transparency so readers can anticipate the biases and the limitations of the study. The “retrospectoscope” in Figure 1, which we modeled after the epidemiologic telescope of Maclure and Schneeweiss,3 depicts the following 10 potential layers of bias, detailed below.

Section snippets

Layer 1: Chart Review Appropriate for the Research Question

The chart review must be an appropriate method of data collection to answer the proposed research question. This means that the available charts must be representative of the patient population of interest, and they must include documentation of the pertinent study items. Charts are documented for multiple reasons: billing, administrative record keeping, legal protection, and as a record of the medical care actually delivered. This last assumption is key to chart review

Layer 2: Transparency of Investigator Bias

It is also important to understand and identify any investigator biases and potential conflicts of interest, be they financial or philosophical. The investigator may have unwittingly formulated a research question or a data collection instrument that is inherently skewed to favor proving his or her predicted hypothesis.

Solution: Before embarking on a study, the investigators should declare any conflicts of interest, seek institutional review board approval, and develop (and ideally pilot test)

Layer 3: Study and Target Population

A common pitfall of chart review studies is that the base population may not constitute a sample representative of the patient population of interest. Studies that fail to use all or a random sample of available charts may lack internal validity. Furthermore, even if sampled charts are representative of all charts, external validity will be compromised if charts were taken from a setting with an atypical population or practice style.

Solution: Make sure that the study settings are typical of the
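The random-sampling remedy above can be sketched in a few lines of Python. The function name, seed, and chart identifiers here are hypothetical illustrations, not part of the article; fixing the seed makes the sampling step reproducible, in keeping with the article's emphasis on transparent, replicable methods.

```python
import random

def sample_charts(chart_ids, fraction, seed=42):
    """Draw a reproducible simple random sample of eligible charts.

    A fixed seed lets the sampling step be replicated exactly,
    so readers can verify how the study sample was drawn.
    """
    rng = random.Random(seed)
    k = round(len(chart_ids) * fraction)
    return sorted(rng.sample(chart_ids, k))

# e.g., abstract a 10% simple random sample of 500 eligible charts
selected = sample_charts(list(range(1, 501)), 0.10)
```

Sampling every chart is preferable when feasible; when it is not, a documented random draw like this one avoids the convenience samples that compromise internal validity.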

Layer 4: Variables to Be Collected

The next potential area of bias occurs in the data collection phase. There may be multiple conflicting entries. The triage nurse and resident physician may document the presence of a soft abdomen, whereas the attending physician or consultant who may have access to imaging results obtained later in the emergency department course may document the presence of tenderness or a mass. There may also be inconsistent coding of data into categories. Note that there are many opportunities for

Layer 5: Systematic Data Collection

Data collection that is not systematic may lead to misclassification bias.

Solution: One means to improve objective data collection is to use a standardized data collection instrument that has been pilot tested and organized and ordered in a manner similar to that in which the information may be found in the actual chart.1, 2, 11, 12, 17 Because each manipulation of data provides an additional opportunity for errors, when possible, data should be recorded directly into a computer program that
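The idea of recording data directly into a program that checks each entry can be illustrated with a minimal sketch. The field names, ranges, and coding categories below are invented examples, not from any published instrument; the point is that out-of-range or miscoded values are rejected at the moment of entry rather than discovered later.

```python
# Hypothetical per-field validation rules enforced at data entry,
# so out-of-range or miscoded values never reach the dataset.
RULES = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "systolic_bp": lambda v: isinstance(v, int) and 40 <= v <= 300,
    "disposition": lambda v: v in {"admit", "discharge", "transfer", "died"},
}

def validate_entry(field, value):
    """Return True if the value passes the field's range/coding check."""
    if field not in RULES:
        raise KeyError(f"no validation rule for field: {field}")
    return RULES[field](value)
```

Each rejected entry forces the abstractor to re-read the chart immediately, which is far cheaper than reconciling implausible values during analysis.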

Layer 6: Missing and Conflicting Data

Large amounts of missing or conflicting data may result in incorrect conclusions. Missing data can be thought of as a form of selection bias, and the degree to which selection bias can compromise validity varies from variable to variable and is determined by the context and subject matter. For this reason, there is no way to define an acceptable proportion of missing data.5, 10

Solution: When determining which variables to study, the investigators should determine the proportion of data that are
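Determining the proportion of missing data per candidate variable before committing to it is straightforward. A sketch, assuming abstracted records are held as dictionaries; the variable names and values are hypothetical:

```python
def missing_proportion(records, variable):
    """Fraction of abstracted records where a variable is absent or blank."""
    missing = sum(1 for r in records if r.get(variable) in (None, ""))
    return missing / len(records)

# Four abstracted charts; "gcs" was documented in only two of them.
charts = [
    {"age": 54, "gcs": 15},
    {"age": 61, "gcs": None},   # field present but left blank
    {"age": 47},                # variable never documented
    {"age": 70, "gcs": 14},
]
```

Here `missing_proportion(charts, "gcs")` returns 0.5, while `"age"` is complete. Whether 50% missingness is disqualifying depends, as the article notes, on the variable's role and the subject matter, but the proportion itself should be measured and reported.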

Layer 7: Abstractor Bias

As would be expected, one of the greatest potential areas for bias involves the data abstractor. If abstractors are not blinded to the study objectives and hypothesis, they may be biased when assigning values for variables. Although this may not be an issue for some variables (eg, patient’s sex), for other variables there may be multiple contradictory entries in the chart (eg, one physician writes that there is rebound tenderness, a second writes “no rebound,” and a third writes nothing). If

Layer 8: Abstractor Training

Interpreting chart entries, as well as entering and coding data, requires training.1, 2, 4, 5, 6 Although most chart reviewers are medically trained professionals (eg, physicians, nurses, medical students), many National Hospital Ambulatory Medical Care Survey (NHAMCS) data collectors have only a high school diploma and no medical background. Non–medically trained abstractors may fail to recognize medical jargon or misinterpret test results, which can result in erroneous entries.19 They may

Layer 9: Abstractor Monitoring

For studies involving a prolonged data collection phase, data collection forms should periodically be compared with the actual medical record charts because, over time, there may be a decrement in the accuracy of recording or a change in coding practices.

Solution: Meetings with the abstractors may be useful to resolve disputes or review coding rules1, 7, 8, 11 and should therefore be planned. Whereas one study advocated using 3 points during the chart audit phase for quality monitoring,7 there

Layer 10: Abstractor Interrater Reliability

Ideally, 2 abstractors would independently analyze each chart so that differences could be identified and resolved. This, however, is time consuming and costly, and hence is seldom done. The alternative is to establish the interrater reliability of abstractions so that the results of a single abstraction of each chart can be trusted.

Interrater reliability assessments are particularly important when there are groups of different abstractors, such as in a multicenter study. Without a comparison

Layer 10a: Agreement or Reliability

Both raw agreement and chance-corrected agreement (reliability) should be reported. Unfortunately, chart review publications often fail to report interrater reliability.1 Raw agreement can be misleading when reported alone because it does not indicate how much of that agreement could have occurred by chance.7 In contrast, Cohen’s κ, which is a measure of chance-corrected interobserver agreement for categorical data, is reported as a value from –1 (perfect disagreement) through 0 (chance agreement) to +1 (perfect agreement)
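The gap between raw agreement and Cohen's κ can be made concrete with a short pure-Python sketch, κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each rater's marginal frequencies. The two abstractors' ratings below are invented for illustration:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two abstractors (categorical data)."""
    n = len(rater_a)
    # Observed raw agreement: proportion of charts where both abstractors agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement expected from each rater's marginal category frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (p_o - p_e) / (1 - p_e)

# Two abstractors coding "rebound tenderness" on 10 charts:
a = ["yes", "no", "no", "yes", "no", "no", "no", "yes", "no", "no"]
b = ["yes", "no", "no", "no", "no", "no", "no", "yes", "no", "no"]
```

Here raw agreement is 90%, yet κ is roughly 0.74: because "no" is the common answer, much of the raw agreement could have arisen by chance, which is exactly why both figures should be reported.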

Layer 10b: What Is Good Enough Agreement or Reliability?

Landis and Koch21 provided oft-quoted criteria for interpreting κ; however, the acceptable level for κ varies with the circumstances and the variable in question. Therefore, the use of their criteria is discouraged.20 For example, if the outcome variable of interest is death, perhaps no less than a κ of 1.0 should be achieved. Authors should discuss why a certain level of agreement or interrater reliability is presumed acceptable, according to the nature of the variable in question.

Solution:

Layer 10c: What Items Should Be Checked for Reliability?

It is not enough to simply state that an interrater reliability assessment was performed. If the investigators used a 30-item data collection instrument, the κ for the assessment of reported age may be 1.0, but it would be more important to understand how often 2 raters agreed on the most important explanatory variables, the most important confounders, and the most important outcome measures.12, 17

Solution: Ideally, the interrater reliability of each item should be reported, using an online

Layer 10d: What Proportion of the Data Should Be Checked for Reliability?

There is no evidence-based standard proportion of abstracted data that should be evaluated for reliability. Many studies will sample 10% of charts; however, this may or may not be an adequate sample, depending on the circumstances. For example, for a variable that is usually answered one way (eg, 95% of the time the answer is no), a 10% sample may be insufficient because there will be few or no yes answers in the reliability sample, and readers will have no idea whether abstractors are able to
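The concern above, that a 10% reliability sample may contain no positive cases of a rarely positive variable, can be quantified under a simple binomial assumption of independent charts. The prevalence and sample sizes below are illustrative, not from the article:

```python
def prob_no_positives(prevalence, sample_size):
    """Probability that a reliability sample contains zero positive cases,
    assuming charts are independent (a binomial sketch)."""
    return (1 - prevalence) ** sample_size

# A variable answered "yes" 5% of the time; a 10% reliability
# sample of a 200-chart study re-abstracts 20 charts.
p = prob_no_positives(0.05, 20)
```

Here p is about 0.36: more than a third of the time, the reliability check would contain no "yes" charts at all, leaving abstractor agreement on the positive cases entirely untested. Oversampling known positives for the reliability check is one way around this.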

Limitations

This article provides suggestions on the conduct and reporting of studies in which chart review is used as a data collection method. Our major recommendation on reporting is to be transparent and report exactly what was done and what was found for each issue discussed in articles. This recommendation comes directly from principles associated with the scientific method, which emphasizes the importance of describing a study in sufficient detail to permit replication.23 Our recommendations on the



Supervising editor: Michael L. Callaham, MD

Funding and support: By Annals policy, all authors are required to disclose any and all commercial, financial, and other relationships in any way related to the subject of this article as per ICMJE conflict of interest guidelines (see www.icmje.org). The authors have stated that no such relationships exist.

A podcast for this article is available at www.annemergmed.com.
