Original research
Outcome reporting bias in Cochrane systematic reviews: a cross-sectional analysis
Kieran Shah¹, Gregory Egan², Lawrence (Nichoe) Huan², Jamie Kirkham³, Emma Reid⁴, Aaron M Tejani⁵
  1. Pharmacy, University of British Columbia, Vancouver, British Columbia, Canada
  2. Lower Mainland Pharmacy Services, Vancouver General Hospital, Vancouver, British Columbia, Canada
  3. Department of Biostatistics, University of Liverpool, Liverpool, UK
  4. Pharmacy, Queen Elizabeth II Health Sciences Centre, Halifax, Nova Scotia, Canada
  5. Therapeutics Initiative, University of British Columbia, Vancouver, British Columbia, Canada
Correspondence to Dr Kieran Shah; Kieran.Shah{at}fraserhealth.ca

Abstract

Background Discrepancies in outcome reporting (DOR) between protocols and published studies include adding new outcomes, omitting prespecified outcomes, upgrading secondary outcomes to primary, downgrading primary outcomes to secondary, and changing the definitions of prespecified outcomes. DOR can result in outcome reporting bias (ORB) when outcomes are changed after knowledge of the results. This has the potential to overestimate treatment effects and underestimate harms. ORB can also occur at the level of systematic reviews when outcomes are changed after the results of included studies are known. The prevalence of DOR and ORB in systematic reviews published after 2007 is unknown.

Objective To estimate the prevalence of DOR and risk of ORB in Cochrane reviews from all review groups published between 2007 and 2014.

Methods A stratified random sampling approach was applied to collect a representative sample of Cochrane systematic reviews from each Cochrane review group. DOR was assessed by matching the outcomes in each systematic review with those in its protocol. When DOR occurred, reviews were further assessed for risk of ORB (unclear, low or high). We classified DOR as high risk of ORB if the discrepancy occurred after knowledge of the results of the studies included in the systematic review.

Results 150 of 350 (43%) review and protocol pairings contained DOR. On further scrutiny, 23% (35 of 150) of reviews with DOR were at high risk of ORB, with changes made after knowledge of the results of individual trials.

Conclusions In our study, just under half of Cochrane reviews contained at least one DOR. Of these, more than one in five (23%) were at high risk of ORB. The presence of DOR and ORB in Cochrane reviews is of great concern; however, the solution is relatively simple. Authors are encouraged to be transparent when outcomes change and to describe the legitimacy of the change, in order to avoid suspicion of bias.

  • general medicine (see internal medicine)
  • statistics & research methods
  • clinical trials

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Strengths and limitations of this study

  • This is the first study that assesses a representative sample of Cochrane reviews from all Cochrane review groups.

  • This study assesses both discrepant outcome reporting and the risk of outcome reporting bias within Cochrane systematic reviews.

  • Cochrane review authors were contacted in cases where there was missing or incomplete information.

  • Our study did not assess the link between outcome reporting bias and statistical significance of findings within each Cochrane review.

Introduction

Cochrane systematic reviews of interventions attempt to collate empirical evidence in order to provide key decision makers with up-to-date information on the benefits and harms of healthcare interventions.1 These reviews have been internationally regarded as one of the leading resources for reliable information on healthcare interventions.2

Systematic review authors are encouraged to develop a protocol that documents hypotheses, methodology and outcomes a priori to minimise the risk of bias. While every effort should be made to report the prespecified outcomes, this may not always be possible. Discrepancies in outcome reporting (DOR) between the protocol and the published review can arise in response to unanticipated circumstances. DOR includes inclusion of a new outcome, omission of a prespecified outcome, upgrade of a secondary outcome to primary, downgrade of a primary outcome to secondary, or a change in outcome definition. When DOR occurs on the basis of the results of included studies, the review is highly susceptible to bias.1 This type of bias is known as outcome reporting bias (ORB), defined as the selection or change of outcomes for publication from the original set of prespecified outcomes after knowledge of the results.3

Evidence of ORB first appeared in randomised controlled trials (RCTs), where it was suggested that up to half of RCTs changed, introduced or omitted at least one primary outcome.4 Statistically significant outcomes also had higher odds of being published (OR 2.2–4.7).4 Further, empirical studies suggest that failing to account for this type of bias in RCTs can skew the analyses of systematic reviews, with the potential to overestimate treatment effects.5

It is equally important to assess the presence of DOR and risk of ORB at the level of systematic reviews. Kirkham et al3 found that approximately one-fifth of pairings of Cochrane protocols and systematic reviews contained DOR, and a third of these were suspected of having a high risk of ORB because outcomes changed after knowledge of the results of included studies. Outcomes were also more likely to be promoted from secondary to primary when their results were statistically significant (risk ratio 1.66; 95% CI 1.10 to 2.49; p<0.02).3 The authors concluded that ORB at the level of systematic reviews is an under-recognised problem despite its prevalence.

Guidelines provide explicit instructions for authors aimed at minimising the risk of ORB. The recommendation to document DOR and the reasons for it appears in the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses) guidelines (2009) and in versions of the Cochrane Handbook since 2006.1 6 7 In a recent survey of Cochrane review group editorial teams, 86% of review groups reported that they verify that the outcomes originally defined in the systematic review protocol are always analysed in the final review.8

We are unaware of any empirical studies that have investigated the prevalence of DOR and risk of ORB in Cochrane systematic reviews published after 2007, in the context of the established PRISMA guidelines (2009) and the Cochrane Handbook for Systematic Reviews of Interventions (since 2006).6 7 To our knowledge, the most recent such study, by Kirkham et al,3 assessed a limited cohort of Cochrane reviews from 2006–2007 only. That study and earlier ones did not adequately represent all Cochrane review groups (51 groups in total), which limits the generalisability of their results.3 9–11

Our study investigated the prevalence of DOR and risk of ORB in an expanded sample of reviews, representative of all Cochrane review groups, published between 1 May 2007 and 1 August 2014, to provide insight into whether this type of bias has decreased since the introduction of these guidelines.6 7 Knowing the prevalence of DOR and ORB, and the reasons for them, in more recent Cochrane systematic reviews may shed light on the effectiveness of the measures currently in place and how they may be improved.

The primary objective of this study was to estimate the prevalence of DOR and risk of ORB in Cochrane reviews from all Cochrane review groups published after 2007. A secondary objective was to categorise the types of DOR in Cochrane reviews and to describe any patterns we identified.

Methods

Selection and description of reviews

Our study involved a cohort of Cochrane systematic reviews. We did not register our protocol in a repository; however, a version of it is available in online supplementary file 1. We restricted the study to Cochrane reviews because we assumed that protocols for systematic reviews published in other journals would be more difficult to access, as those journals may not require protocols to be submitted. Reviews and their protocols were included if they assessed the benefits and/or harms of interventions used in healthcare and health policy (ie, intervention reviews). Reviews were excluded if they did not have a protocol available, or if they were methodological, diagnostic or overview reviews, or came from Cochrane review groups with no clinical interventions (eg, Methodology Review Group).

At the time of our original search, approximately 4200 Cochrane systematic reviews had been published between 1 May 2007 and 1 August 2014. When we attempted to replicate the search during revision of the manuscript, we found 3197 published reviews for the same period. Some reviews had been updated after our original search and now carried a publication date after August 2014, so the replicated search did not capture them; this may account for the difference in yield. Nonetheless, a record of all citations available at the time of our original search is provided in online supplementary file 2, so readers can replicate our methodology.

Reviews were identified using the ‘advanced search’ function of the Wiley Online Library of Cochrane Reviews, restricting results to reviews published between 1 May 2007 and 1 August 2014. Protocols of the corresponding reviews were accessible through the ‘other versions’ tab. Based on this population and the assumption, drawn from the background literature, that approximately 40% of reviews would show some evidence of DOR,3 9 10 we calculated a sample size of 350 reviews with a margin of error of ±5%.
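The article does not state how the sample size of 350 was derived. As an illustration only, the sketch below assumes the usual normal-approximation formula for estimating a single proportion at 95% confidence, n = z²p(1 − p)/e², with p = 0.40 and e = 0.05 taken from the text, plus an optional finite population correction for a sampling frame of N ≈ 4200 reviews. The function names are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile

def sample_size_for_proportion(p, margin, population=None, z=Z_95):
    """Normal-approximation sample size for estimating a proportion p to
    within +/- margin. If a population size is given, a finite population
    correction is applied."""
    n0 = z**2 * p * (1 - p) / margin**2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

def margin_of_error(p, n, population=None, z=Z_95):
    """Half-width of the normal-approximation 95% CI for a proportion."""
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return z * se

print(sample_size_for_proportion(0.40, 0.05))                   # 369, no correction
print(sample_size_for_proportion(0.40, 0.05, population=4200))  # 340, with correction
print(round(margin_of_error(0.40, 350), 3))                     # 0.051
```

Under these assumptions the required sample lies between about 340 (with the correction) and 369 (without it), and 350 reviews give a margin of error of roughly ±5 percentage points. This is a plausibility check, not a reconstruction of the authors' own calculation.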

To include reviews from all review groups, we used stratified random sampling with proportional allocation. Each Cochrane review group with a clinical focus (ie, publishing non-methodological reviews) formed a stratum, and a random sample was drawn from each group in proportion to its share of all published Cochrane systematic reviews, producing a sample representative of all Cochrane review groups. For example, the acute respiratory group accounts for approximately 3% of all Cochrane systematic reviews, so 3% of our sample was drawn from that group. This method was applied across all Cochrane review groups that met our inclusion criteria. Our study is the first to collect a representative sample across all Cochrane review groups, whereas previous studies examined specific publication issues (eg, issue 4, 2006) or specific review groups.3 9 10 Further details on how we generated random samples from each Cochrane review group are given in our protocol and in online supplementary table 1.
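A minimal sketch of proportional stratified sampling, assuming each review group is a stratum and that fractional allocations are rounded with the largest-remainder method. The article does not say how rounding was handled (its procedure is in the protocol and online supplementary table 1), and the group names, group sizes and random seed below are made up.

```python
import math
import random

def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to each
    stratum's share of the population. Largest-remainder rounding makes
    the allocations sum exactly to n."""
    total = sum(stratum_sizes.values())
    quotas = {g: n * size / total for g, size in stratum_sizes.items()}
    alloc = {g: math.floor(q) for g, q in quotas.items()}
    shortfall = n - sum(alloc.values())
    # Hand the leftover units to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda g: quotas[g] - alloc[g], reverse=True)
    for g in by_remainder[:shortfall]:
        alloc[g] += 1
    return alloc

def stratified_sample(frames, n, seed=2014):
    """frames maps each review group to its list of eligible review IDs.
    Draws a simple random sample without replacement inside each group.
    The seed is arbitrary and only makes the example reproducible."""
    rng = random.Random(seed)
    alloc = proportional_allocation({g: len(ids) for g, ids in frames.items()}, n)
    return {g: rng.sample(frames[g], alloc[g]) for g in frames}

# Hypothetical sampling frame: three review groups with made-up sizes (4200 total).
frames = {
    "Group A": [f"A{i}" for i in range(130)],
    "Group B": [f"B{i}" for i in range(1500)],
    "Group C": [f"C{i}" for i in range(2570)],
}
sample = stratified_sample(frames, 350)
print({g: len(ids) for g, ids in sample.items()})
# {'Group A': 11, 'Group B': 125, 'Group C': 214}
```

In this toy frame, ‘Group A’ holds about 3% of the reviews (130 of 4200) and receives about 3% of the sample (11 of 350), mirroring the acute respiratory example above.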

Outcomes and data collection

The primary outcomes were (1) the proportion of Cochrane systematic reviews with at least one DOR between the protocol and the published review; and (2) the risk of ORB (rated low, high or unclear) in Cochrane systematic reviews with DOR. The secondary outcomes were the types of discrepancy identified: inclusion of an outcome (eg, addition of a post-hoc outcome), omission of a prespecified outcome, upgrade of a secondary outcome to primary, downgrade of a primary outcome to secondary, or change in the definition of an outcome. Because of limited time and human resources, we assessed only the risk of ORB, not its presence (ie, whether DOR led to a change in results or conclusions).

Two authors assessed DOR in each Cochrane review by matching the outcomes in the published review to the prespecified outcomes in the protocol. Data were collected on the type of DOR (inclusion of an outcome, omission of a prespecified outcome, upgrade of a secondary outcome to primary, downgrade of a primary outcome to secondary, or change in the definition of an outcome). Data were tabulated in Microsoft Excel to calculate the overall prevalence of DOR and of each type of discrepancy in this cohort of systematic reviews (online supplementary table 2).
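The matching step can be sketched as a comparison of two outcome lists, each mapping an outcome name to its role and definition. This is a simplification assumed for illustration: outcomes are matched by exact name and definitions are compared verbatim, whereas in the study two authors judged matches and definition changes. The `Outcome` type, `classify_dor` function and example outcomes are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Outcome:
    role: str        # "primary" or "secondary"
    definition: str  # compared verbatim in this sketch

def classify_dor(protocol, review):
    """Return the set of DOR types found when a review's outcomes are
    matched, by name, to the outcomes prespecified in its protocol."""
    found = set()
    for name, reported in review.items():
        planned = protocol.get(name)
        if planned is None:
            found.add("inclusion")          # outcome absent from the protocol
            continue
        if planned.role == "secondary" and reported.role == "primary":
            found.add("upgrade")
        elif planned.role == "primary" and reported.role == "secondary":
            found.add("downgrade")
        if planned.definition != reported.definition:
            found.add("definition change")
    if set(protocol) - set(review):
        found.add("omission")               # prespecified outcome not reported
    return found

protocol = {
    "all-cause mortality": Outcome("primary", "death at 12 months"),
    "hospitalisation": Outcome("secondary", "any admission"),
    "adverse events": Outcome("secondary", "any adverse event"),
}
review = {
    "all-cause mortality": Outcome("primary", "death at 6 months"),
    "hospitalisation": Outcome("primary", "any admission"),
    "quality of life": Outcome("secondary", "SF-36 score"),
}
print(sorted(classify_dor(protocol, review)))
# ['definition change', 'inclusion', 'omission', 'upgrade']
```

A review counts towards the primary outcome if `classify_dor` returns a non-empty set; a single review can contribute to several types.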

Two authors sought reasons for DOR in each review (eg, within the ‘differences between protocol and review’ section). If no reason or justification was provided, we contacted the review authors. We developed a process to determine the risk of ORB in reviews with DOR as consistently as possible when reasons for outcome changes were available. Reviews with DOR were rated ‘high’ risk of ORB if the DOR occurred after knowledge of results, ‘low’ risk if it occurred independently of the results (online supplementary table 3), or ‘unclear’ risk if no reason for the DOR was available. When ambiguity made it difficult to assign a level of risk, the first author (KS) and two coauthors (from AMT, ER, LH, GE and JK) assessed the risk of ORB independently, and the majority (two of three authors) decided whether the review was at ‘high’ or ‘low’ risk of ORB (online supplementary figure 1). The overall risk of ORB for each review was tabulated in Microsoft Excel to determine the proportion of reviews at high, low and unclear risk of ORB in this cohort.
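The two-of-three majority rule for ambiguous cases can be stated compactly. The sketch below assumes each of the three raters records either ‘high’ or ‘low’; with two levels and three raters a majority always exists. The function name is hypothetical.

```python
from collections import Counter

def majority_risk(ratings):
    """Return the risk level ('high' or 'low') chosen by at least two of
    three independent raters. With two levels and three raters, a strict
    majority always exists."""
    if len(ratings) != 3 or any(r not in ("high", "low") for r in ratings):
        raise ValueError("expected exactly three ratings of 'high' or 'low'")
    level, _votes = Counter(ratings).most_common(1)[0]
    return level

print(majority_risk(["high", "low", "high"]))  # high
print(majority_risk(["low", "low", "high"]))   # low
```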

Patient and public involvement

No patients/public were involved in the study.

Results

Prevalence of discrepant outcome reporting and risk of ORB

A total of 350 reviews were collected; 23% (79 of 350) of them did not have a protocol listed alongside the review under the ‘Protocol and previous versions’ section of the Cochrane Library. These reviews were replaced, using the same randomisation method within each review group, with reviews whose protocols were available, to minimise selection bias and any impact on our sample size calculation.

Of the reviews, 90% (315 of 350) were published in 2009 or later and 77% (271 of 350) in 2010 or later (online supplementary figure 2). For our primary outcome, 43% of all reviews (150 of 350; margin of error ±5%) contained DOR when compared with their respective protocols. On further scrutiny, 23% (35 of 150) of reviews with DOR were at high risk of ORB, 41% (61 of 150) were at low risk, and 36% (54 of 150) were at unclear risk (no reason for DOR available).
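These percentages can be recomputed from the reported counts. The margin of error below assumes a normal-approximation 95% interval without a finite population correction; the article reports ±5% but does not state the method used, and the helper name is hypothetical.

```python
import math

def proportion_with_moe(count, total, z=1.96):
    """Return a proportion and the half-width of its normal-approximation
    95% confidence interval."""
    p = count / total
    return p, z * math.sqrt(p * (1 - p) / total)

p, moe = proportion_with_moe(150, 350)
print(f"Reviews with DOR: {p:.0%} (margin of error {moe:.1%})")
# Reviews with DOR: 43% (margin of error 5.2%)

# Risk of ORB among the 150 reviews with DOR (counts from the text).
for level, count in {"high": 35, "low": 61, "unclear": 54}.items():
    print(f"{level}: {count / 150:.0%}")
# high: 23%, low: 41%, unclear: 36% (one per line)
```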

Types of discrepant outcome reporting and patterns

Figure 1 shows the number of reviews containing each type of DOR relative to their protocols. Of the reviews with discrepancies, 50% (75 of 150) included a new outcome, 43% (64 of 150) redefined an outcome, 30% (45 of 150) omitted an outcome, 15% (23 of 150) upgraded an outcome and 9% (14 of 150) downgraded an outcome. These percentages sum to more than 100% because a review could contain more than one type of discrepancy.
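Because each review is counted once under every type of discrepancy it contains, the tabulation behind Figure 1 is a multi-label count. A minimal sketch, using a hypothetical toy cohort rather than the study data:

```python
from collections import Counter

def share_by_type(reviews_with_dor):
    """Share of reviews with DOR that contain each DOR type. Each review is
    a set of types, so the shares can sum to more than 1."""
    n = len(reviews_with_dor)
    counts = Counter(t for types in reviews_with_dor for t in types)
    return {t: c / n for t, c in counts.items()}

# Hypothetical toy cohort of four reviews with DOR.
toy = [{"inclusion", "omission"}, {"inclusion"}, {"definition change"},
       {"inclusion", "upgrade"}]
shares = share_by_type(toy)
print(shares["inclusion"], round(sum(shares.values()), 2))  # 0.75 1.5
```

Applied to the reported counts, the 150 reviews with DOR carry 221 type assignments in total (75 + 64 + 45 + 23 + 14), or roughly 1.5 types per review.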

Figure 1

Type of discrepancy by reviews with discrepancies in outcome reporting.

Discussion

We found that DOR and high risk of ORB are prevalent in more recent Cochrane reviews. The most common types of DOR were inclusion of a new outcome, redefinition of an outcome and omission of an outcome; upgrades and downgrades of outcomes were less frequent. The majority of included reviews were published after the PRISMA and Cochrane Handbook guidelines. It is difficult to ascertain whether this problem remains under-recognised despite guideline recommendations or whether more time is needed to see a shift towards less DOR and lower risk of ORB.1 6 7 Nonetheless, our study highlights a concern, as at least one in five reviews with DOR had a high risk of ORB. Furthermore, the true proportion of reviews at high risk of ORB is likely higher, as we were unable to find a reason for DOR in over a third of reviews (classified as ‘unclear’ risk of ORB).

When we investigated the reasons for DOR, general themes emerged from author responses and from reasons given within the reviews themselves (online supplementary table 4).

Our predefined ‘high’ and ‘low’ risk of ORB categories applied both to reasons provided within the systematic review itself and to reasons obtained by contacting authors, with one exception: ‘Omitting outcomes because of no reporting or partial reporting in studies’. These discrepancies were classified as ‘low’ risk if the explanation was provided within the review, but ‘high’ risk if authors needed to be contacted. This decision was based on the recent ORBIT II study, which found that 86% (79 of 92) of Cochrane reviews did not include full data on the primary harm outcome because of ‘no or partial reporting’ of that outcome in individual studies.12 When the individual studies were further scrutinised, approximately two-thirds of them were at high risk of ORB for the primary harm outcome. The authors concluded that ORB may be unintentionally introduced at the level of systematic reviews if review authors omit these outcomes on the basis of ‘no reporting’ or ‘partial reporting’ in individual studies.12
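The classification rules, including the exception for omissions attributed to no or partial reporting, can be written as a small decision function. This is a hypothetical encoding for illustration: in the study, the inputs (whether a change was made after knowledge of results, and whether the review itself explained an omission) were reviewer judgements, not automated checks, and the parameter names are assumptions.

```python
from typing import Optional

def orb_risk(reason: Optional[str], changed_after_results: bool,
             omitted_for_no_or_partial_reporting: bool = False,
             explained_in_review: bool = True) -> str:
    """Hypothetical encoding of the risk-of-ORB rules described in the text.

    reason: the stated reason for the DOR, or None if none was found in the
        review or obtained from the authors.
    changed_after_results: reviewers' judgement that the change was made
        after the results of included studies were known.
    """
    if reason is None:
        return "unclear"
    if omitted_for_no_or_partial_reporting:
        # Exception: low risk only if the review itself explains the omission;
        # high risk if the explanation had to be obtained from the authors.
        return "low" if explained_in_review else "high"
    return "high" if changed_after_results else "low"
```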

Comparison with other studies

To our knowledge, our study is the first to include a cohort from all Cochrane review groups after 2007 to assess the prevalence of DOR and risk of ORB. In a recent meta-analysis, Page et al11 found that DOR occurred in 38% of reviews. However, the studies included in that meta-analysis investigated cohorts of Cochrane reviews published before 2007.3 10 11 Our study includes reviews published after the Cochrane Handbook for Systematic Reviews of Interventions1 7 (as early as 2006) and the PRISMA guidelines (2009),6 both of which provide recommendations to minimise DOR and ORB in systematic reviews. Despite these guidelines, the prevalence of DOR we found is similar to that reported by Page et al,11 suggesting that DOR has not improved.

Ours is the first study to include an adequate representation of reviews across all Cochrane review groups. Silagy et al10 and Kirkham et al3 analysed cohorts of reviews published in specific issues, from 2000 (issue 3) and 2006–2007 (issues 4 and 2), respectively. Neither study reports whether its cohort adequately represents all Cochrane review groups. Dwan et al9 analysed a cohort of systematic reviews specifically from the Cochrane Cystic Fibrosis and Genetic Disorders group.

Like our study, Kirkham et al3 further scrutinised reviews with DOR (published before 2007) to assess the risk of ORB. They found that 29% of reviews with DOR in their cohort were suspected of ORB. The risk of ORB was assessed after contacting authors for the reasons for discrepancies, and ORB was suspected if outcomes were changed after knowledge of results. Applying a similar definition of ‘high’ risk of ORB, we found that 23% of reviews with DOR in our cohort were suspected of ORB. Our estimate may be lower than that of previous studies because over a third of reviews with DOR gave no reason for the discrepancy and were therefore rated as unclear risk of ORB. If, as a conservative assumption, one-fifth of reviews with unexplained discrepancies were at risk of ORB, our estimate would rise to approximately 30% of reviews with DOR in our cohort.
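The sensitivity estimate uses the counts reported in the Results: 35 reviews at high risk, plus one-fifth of the 54 reviews at unclear risk, out of 150 reviews with DOR.

$$
\frac{35 + \tfrac{1}{5} \times 54}{150} = \frac{35 + 10.8}{150} = \frac{45.8}{150} \approx 30.5\%
$$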

Limitations

Our study did not use a standardised classification system to assess risk of ORB, which introduces a risk of subjectivity and inter-rater variability when assessing risk of bias at the level of systematic reviews. While our approach for rating discrepant outcomes as ‘high’ or ‘low’ risk of ORB was relatively simple, we realised, after contacting review authors, that not all post-hoc changes were bias-related (online supplementary table 4). In these instances, we performed a duplicate risk assessment, in which risk of bias was assessed first independently and then together as a group to determine the overall risk of bias while minimising inter-rater variability. All responses were initially categorised into themes, and duplicate assessment was used to rate each theme as high or low risk. We achieved consensus on 100% of the themes that emerged (high, low or unclear risk of ORB). These labels were then applied to the reviews that fitted under their respective themes. To be clear, we did not analyse whether ORB was present by examining the results or conclusions of reviews with or without DOR; we assessed only the risk of ORB in reviews where DOR was present.

We found that approximately a quarter of the reviews initially sampled (79 of 350) did not have a protocol available. We did not contact authors to obtain these protocols or to ask why they were not listed alongside the review. Kirkham et al3 did so in their study of reviews published between 2006 and 2007, and found that 8% (24 of 297) of reviews did not have a protocol. Protocols were missing for the following reasons: (1) the review was split into a number of separate reviews and only one protocol was registered (n=9); (2) the draft protocol was accepted by the Cochrane review group but not registered in the library as it was never formally published (n=4); (3) protocols were withdrawn from the library on the advice of the collaboration because they were seen to be out of date (n=2); (4) reviewers published a review without a protocol (n=5); and (5) reasons not provided (n=2).3 We may have encountered similar reasons for protocols not being available. We encourage the Cochrane Collaboration to ensure that all protocols are made available, or that the reason a protocol is unavailable for any particular published review is stated clearly.

When conducting a review, it is not always obvious in advance what data will be available, and therefore which outcomes to designate a priori as primary or secondary. The review team may then decide, based on the data available, which outcome to designate as primary, thereby basing the choice on the results for those outcomes. We designated this as a high risk of bias; others may designate it as a low risk of bias. However, we feel strongly that every systematic review team should carefully prespecify the outcomes they consider most important to patients, caregivers and policy makers. Designating the most important outcome as ‘primary’ is critical, as it tells readers what the review team considered essential for understanding the impact of the interventions being studied. If the included studies do not report the prespecified outcomes and there are no data to analyse, this is still useful information for readers. Reviewers may also unintentionally introduce bias, as there may be a risk of ORB at the level of RCTs for those specific outcomes. This was evident in the ORBIT II study, particularly for omission of safety outcomes.12

In a few instances (n=3/150; 2%), DOR arose because the authors who wrote the final review differed from those who wrote the protocol. We classified these as low risk of ORB (online supplementary table 4). However, the authors of the final review may have consulted the original protocol and changed outcomes nonetheless; if this was based on knowledge of results, the assessment would change to high risk of bias. Misclassification of risk of bias may therefore have occurred in these cases. Overall, this occurred infrequently, and we do not believe it affects the overall results of our study.

Our study did not investigate whether the upgrade or inclusion of outcomes was associated with statistical significance, and therefore did not assess the presence of ORB. As stated previously, we assessed only the risk of ORB. Such an analysis would have indicated whether these types of DOR led to actual ORB, by showing whether statistically significant outcomes favouring the intervention were more likely to be upgraded or included between the protocol and the publication. Kirkham et al3 performed this analysis and found an association between upgrading or including a new outcome and statistical significance in Cochrane reviews published before 2007.

Unfortunately, we did not collect data on which subcategories of DOR were disclosed within each Cochrane review.

Future directions

Our study provides evidence that discrepant outcome reporting is widely prevalent yet under-recognised at the systematic review level, and that a substantial proportion of these reviews are at high risk of ORB, with outcomes changed after knowledge of results. While reviewers must continue to assess the impact of ORB in the clinical trials included in a review, they should also be aware that ORB can arise, intentionally or unintentionally, in the systematic review process itself. One way to address this would be for systematic review and meta-analysis software (eg, RevMan) to alert authors to outcome discrepancies between the protocol and the review before publication. Such alerts could prompt review authors either to correct the discrepancy or to explain it if it was intentional. These reasons could then be placed in the ‘differences between protocol and review’ section of the published review. Peer reviewers and editorial groups could also check that these reasons are justified, or provide further feedback to reduce suspicion of ORB before publication.
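A minimal sketch of the pre-publication alert proposed above, assuming outcomes are available as name-to-role mappings and that reasons from the ‘differences between protocol and review’ section are keyed by outcome name. This is not an existing RevMan feature or API; the function, data layout and example outcomes are hypothetical, and the sketch checks only added, omitted and re-ranked outcomes, not changed definitions.

```python
def outcome_alerts(protocol, review, reasons):
    """protocol, review: dict of outcome name -> role ("primary"/"secondary").
    reasons: dict of outcome name -> documented reason for a discrepancy.
    Returns one alert per discrepant outcome that has no documented reason."""
    alerts = []
    for name in sorted(set(protocol) | set(review)):
        planned, reported = protocol.get(name), review.get(name)
        if planned == reported or name in reasons:
            continue  # unchanged, or already explained
        if planned is None:
            change = "added"
        elif reported is None:
            change = "omitted"
        else:
            change = f"changed from {planned} to {reported}"
        alerts.append(f"{name}: {change}; give a reason or correct it")
    return alerts

protocol = {"mortality": "primary", "adverse events": "secondary",
            "hospitalisation": "secondary"}
review = {"mortality": "primary", "hospitalisation": "primary",
          "quality of life": "secondary"}
for alert in outcome_alerts(protocol, review, reasons={}):
    print(alert)
```

Once a reason is recorded for an outcome, its alert disappears, leaving editors to judge whether the stated reason is justified.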

The Methodological Expectations of Cochrane Intervention Reviews (MECIR) are newly developed standards to which all Cochrane protocols, reviews and updates are expected to adhere. It is now mandatory to report the results for all prespecified outcomes, irrespective of the strength or direction of the result. It is also mandatory for authors to indicate when data are not available for outcomes of interest and whether adverse events were identified.13 These standards could potentially address ORB in Cochrane systematic reviews. Future research should focus on the impact of MECIR on ORB in Cochrane reviews of interventions. Follow-up studies should also assess trends in discrepant outcome reporting and in the risk or presence of ORB over time, extending to reviews published in 2014 and later, to determine whether guidelines to mitigate ORB have a delayed impact.

Conclusions

DOR and ORB in clinical trials can affect the magnitude of effect estimates and their statistical significance in systematic reviews, and DOR and ORB can also arise within systematic reviews themselves. In our cohort of systematic reviews from all Cochrane review groups, published in the context of the PRISMA and Cochrane Handbook guidelines, we found that DOR is widely prevalent and that a substantial proportion of systematic reviews with DOR are at high risk of ORB.

Acknowledgments

The contributions of the following are acknowledged: Julie Higgins, Matthew Page, Ciprian Jauca and the Therapeutics Initiative.

References

Footnotes

  • Twitter @amtejani

  • Contributors KS, GE, LH, JK, ER and AMT were involved in the conception and design, and therefore development of the protocol for this project. KS and JH (under the Acknowledgements section) were involved in the acquisition of data, and KS, GE, LH, JK, ER and AMT were involved in the analysis and interpretation of data. All authors were involved in drafting and revising the final manuscript. All authors have given final approval for this version to be published and take public responsibility for appropriate portions of the data and agree to be accountable for all aspects of the work.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data are available upon reasonable request. All data relevant to the study are included in the article or uploaded as supplementary information.