Background Outcome reporting bias (ORB) in randomised trials has been identified as a threat to the validity of systematic reviews. Previous work highlighting this problem is limited to considering a single primary review outcome. The aim of this study was to assess ORB across all efficacy outcomes in the Cochrane systematic reviews of cystic fibrosis.
Methods Systematic reviews of interventions for cystic fibrosis published on the Cochrane Library by the Cochrane Cystic Fibrosis and Genetic Disorders Group before 2010 were assessed for discrepancies in outcomes between review protocol and full review. ORB in eligible trials was also assessed for all efficacy review outcomes. Two authors independently classified each outcome using a nine-point classification system developed by the Outcome Reporting Bias In Trials study. These classifications were used to inform the assessment of the risk of bias for selective outcome reporting for each trial.
Results 46 Cochrane cystic fibrosis systematic reviews were included. The median numbers of primary outcomes per review, trials per review and participants per trial were 3 (IQR 2, 3), 4 (IQR 2, 8) and 21 (IQR 14, 41), respectively. 18 reviews (39%, 18/46) had a discrepancy in outcomes between protocol and full review. 37 reviews were eligible to be included in the ORB assessment. ORB was suspected in at least one trial in 86% of reviews when considering review primary outcomes, and in all 12 reviews for which all review outcomes could be assessed.
Conclusions Assessment of ORB within a systematic review of a single primary outcome underestimates the risk of ORB in comparison to the assessment of multiple primary and secondary outcomes. ORB in trials is highly prevalent within systematic reviews of cystic fibrosis when assessed across all outcomes. This could be reduced by the development of a core outcome set for trials and systematic reviews in cystic fibrosis.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.
Assessment of discrepancies in outcome selection between systematic review protocols and full reviews.
Assessment of outcome reporting bias (ORB) at the outcome level across all efficacy systematic review outcomes.
Assessment of the risk of bias of a trial from selective outcome reporting within a systematic review.
Assessment of ORB within a systematic review of a single primary outcome underestimates the risk of ORB in comparison to the assessment of multiple primary and secondary outcomes. Clearer guidance is needed on how to assess the risk of bias as a result of selective outcome reporting for each included trial within a systematic review, when considering multiple outcomes.
The development of a core outcome set in cystic fibrosis (CF) would help reduce the problem of ORB.
Strengths and limitations of this study
This is the first study to consider the assessment of ORB in all efficacy review outcomes. However, this is limited to reviews of CF.
The value of systematic reviews in establishing an evidence base is widely acknowledged, with well-conducted systematic reviews of randomised controlled trials (RCTs) placed at the top of the hierarchy of evidence.1 When conducting systematic reviews, it is essential to consider the potential for bias and its impact on the review conclusions. Bias may be introduced through the decisions and actions of the authors of the included clinical trials or of the systematic review authors.
Bias in a systematic review is frequently considered in relation to limitations of the search strategy. However, bias may also occur when, for example, outcomes are added, omitted or changed after a systematic review protocol is published and the decision to deviate from the protocol is based on the significance of the results. A study of an unselected cohort of Cochrane reviews revealed that over a fifth (64/288) of the protocol/review pairings showed some discrepancy in at least one outcome measure, with just 6% (4/64) describing the reason for the change in the review.2 Results also indicated that outcomes promoted from secondary to primary between the protocol and the review were more likely to report statistically significant meta-analysis results than outcomes in reviews with no discrepancy in outcome specification from the review protocol (relative risk 1.66, 95% CI 1.10 to 2.49, p=0.02).
Systematic reviews are only as valid as the trials they contain,3 and consequently much effort is given to assessing the risk of bias of the included trials through their methodological quality. However, it is also important to consider the content of trial reports in an assessment of bias. Outcome reporting bias (ORB) within an RCT is defined as the result-based selection of a subset of the original outcomes for publication.4–6 In a systematic empirical assessment of Cochrane reviews in which a single review primary outcome could be identified,7 ORB was suspected in at least one RCT in about a third (34%) of the systematic reviews examined. This study may have underestimated the problem, because review primary outcomes are chosen for their clinical importance and are therefore more likely to have been measured and reported in trials. There is therefore concern about the prevalence and impact of ORB in reviews where multiple primary outcomes are specified, and for secondary outcomes.
Systematic reviews in cystic fibrosis (CF) are characterised by the inclusion of small randomised trials that specify multiple primary outcomes. Reporting standards for CF trials have also been shown to be low when trial reports are compared with the Consolidated Standards of Reporting Trials (CONSORT) statement.8 The aims of this study were to:
Examine the potential for bias created by review authors by identifying inconsistencies between outcomes published in review protocols and in the associated published reviews.
Determine the prevalence of ORB in trials in systematic reviews of CF, extending previous work by considering all review efficacy outcomes (multiple primary and secondary).
Assess the risk of bias of trials from selective outcome reporting when considering review primary outcomes only in comparison to all review outcomes.
A cohort of systematic reviews published by the Cochrane Cystic Fibrosis and Genetic Disorders (CFGD) group on the Cochrane Library before 2010 was identified.9 Reviews were eligible for inclusion if they compared interventions for CF and identified one or more eligible RCTs. RCTs that had been excluded (in the ‘characteristics of excluded studies’ section) were also checked for any suggestion of ORB. For example, if a review had excluded trials because of ‘no relevant outcome data’ (NROD), these trials were also scrutinised for ORB and included in the assessment.
Changes in outcomes between systematic review protocol and full review—review level
The numbers of primary and secondary outcomes per review were compared with the recommendations for the number of outcomes (no more than three primary outcomes and a limited number of secondary outcomes) to include in a review in the Cochrane Handbook.10 If a review did not distinguish between primary and secondary outcomes, the first three outcomes listed were taken to be the primary outcomes and the rest were considered as secondary outcomes. Protocols of the systematic reviews were accessed and outcomes stated in the protocol were compared with those stated in the full review. Changes in outcomes were identified and categorised by one author (KD) as: primary outcome downgraded to secondary (downgrade); secondary outcome upgraded to primary (upgrade); a new outcome not stated in the protocol was added to the full review (addition) or an outcome stated in the protocol was omitted from the full review (omission). If there had been a change in outcomes, the section ‘changes between protocol and review’ was examined for a declaration and explanation of the changes.
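The categorisation above is essentially a set comparison between two outcome lists. The Python sketch below illustrates it under stated assumptions: outcome names are assumed to have already been harmonised between protocol and review (in practice the judgement-heavy step), and the names `OutcomeSpec`, `classify_changes` and `exceeds_primary_limit` are hypothetical, not part of any Cochrane tool.

```python
from __future__ import annotations  # allows list[str] hints on older Python 3 versions

from dataclasses import dataclass, field

# Cochrane Handbook guidance cited in the text: no more than three primary outcomes.
MAX_PRIMARY = 3


@dataclass
class OutcomeSpec:
    """Hypothetical container for the outcomes listed in a protocol or full review."""

    primary: list[str]
    secondary: list[str] = field(default_factory=list)

    @classmethod
    def from_undifferentiated(cls, outcomes: list[str]) -> OutcomeSpec:
        # Rule used in the study: if primary and secondary outcomes are not
        # distinguished, the first three listed are treated as primary.
        return cls(primary=outcomes[:MAX_PRIMARY], secondary=outcomes[MAX_PRIMARY:])


def classify_changes(protocol: OutcomeSpec, review: OutcomeSpec) -> dict[str, list[str]]:
    """Group outcome changes into the four categories used in the study."""
    p_prim, p_sec = set(protocol.primary), set(protocol.secondary)
    r_prim, r_sec = set(review.primary), set(review.secondary)
    p_all, r_all = p_prim | p_sec, r_prim | r_sec
    return {
        "downgrade": sorted(p_prim & r_sec),  # primary in protocol, secondary in review
        "upgrade": sorted(p_sec & r_prim),    # secondary in protocol, primary in review
        "addition": sorted(r_all - p_all),    # not stated in the protocol
        "omission": sorted(p_all - r_all),    # stated in the protocol, absent from review
    }


def exceeds_primary_limit(spec: OutcomeSpec) -> bool:
    """Flag specifications listing more primary outcomes than the Handbook recommends."""
    return len(spec.primary) > MAX_PRIMARY


if __name__ == "__main__":
    protocol = OutcomeSpec(primary=["FEV1", "exacerbations"], secondary=["weight", "QoL"])
    review = OutcomeSpec(primary=["FEV1", "QoL"], secondary=["exacerbations", "mortality"])
    print(classify_changes(protocol, review))
```

With the invented example, exacerbations is a downgrade, QoL an upgrade, mortality an addition and weight an omission.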
Assessing trial reports for full ORB—outcome level
For each eligible systematic review, all reports relating to included studies and studies excluded due to no relevant outcome data were obtained. Reviews were checked to see whether review authors had contacted trialists for further information or data for outcomes. Where this was not clear in the review, the review authors were asked to clarify.
A nine-point classification system (table 1) developed for missing or incomplete outcome reporting in randomised trials was used to assess the risk of bias.7 Table 1 also gives examples of outcomes that were not assessed because they were poorly defined. An outcome matrix (table 2) was created for each review using the Outcome Reporting Bias In Trials (ORBIT) matrix generator (http://ctrc.liv.ac.uk/orbit/), with studies listed in the rows, review primary and secondary outcomes listed in the columns, and an ORBIT classification (table 1) given for each review outcome that was not fully reported (eg, not reported, or only partially reported as p>0.05).
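As a rough illustration of the matrix layout (not the ORBIT matrix generator itself, whose interface is not described here), the sketch below stores one ORBIT letter per trial/outcome cell that was not fully reported and treats every other cell as fully reported. The `full` placeholder and the function names are hypothetical.

```python
FULLY_REPORTED = "full"  # hypothetical placeholder for a fully reported outcome


def build_matrix(trials, outcomes, classifications):
    """Build a trial x outcome matrix.

    classifications maps (trial, outcome) -> ORBIT letter for outcomes that were
    not fully reported; every pair absent from the mapping is treated as fully reported.
    """
    unknown = {o for _, o in classifications} - set(outcomes)
    if unknown:
        raise ValueError(f"not a review outcome: {sorted(unknown)}")
    return {
        trial: {o: classifications.get((trial, o), FULLY_REPORTED) for o in outcomes}
        for trial in trials
    }


def render(matrix, outcomes, col=10):
    """Render the matrix as a plain-text table with trials in rows."""
    width = max(len(t) for t in matrix) + 2
    lines = ["".ljust(width) + "".join(o.ljust(col) for o in outcomes)]
    for trial, cells in matrix.items():
        lines.append(trial.ljust(width) + "".join(cells[o].ljust(col) for o in outcomes))
    return "\n".join(line.rstrip() for line in lines)


if __name__ == "__main__":
    outcomes = ["FEV1", "FVC", "mortality"]
    letters = {("Trial A", "FVC"): "F", ("Trial B", "FEV1"): "E"}
    print(render(build_matrix(["Trial A", "Trial B"], outcomes, letters), outcomes))
```

Rejecting letters attached to outcomes that are not review outcomes is a small safeguard against transcription slips when the matrix is filled in by hand.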
The outcomes listed or detailed in the Methods section and the outcomes reported in the Results section were compared for all trial publications to determine whether each outcome of the systematic review was measured and analysed. In some instances, it may be obvious that an outcome was measured given the other outcomes reported. For example, if cause-specific mortality is reported, then overall mortality must have been measured, even if not reported. In other situations, a battery of tests or measurements may usually be undertaken together, for example, forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and forced expiratory flow (FEF25–75; average expired flow over the middle half of the FVC manoeuvre). FEV1 is the outcome most often considered for lung function because of its validity and repeatability, and it is the lung function outcome best understood by clinicians. However, the device used to measure FEV1 also measures most other lung function outcomes. Therefore, if FEV1 was reported in a trial, it was assumed that other lung function outcomes were also measured but not necessarily analysed (classification F), unless they were specifically stated as an outcome in the trial report. Conversely, if FEV1 was not reported but other lung function outcomes were, FEV1 was given an E classification, as this would raise suspicion that FEV1 had been selectively not reported. These rules were agreed after discussion with clinical experts.
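The lung function rule just described can be written as a small decision function. This is a hypothetical sketch: it covers only three spirometry outcomes, and it returns `None` wherever the rule is silent (for example, an outcome named in the trial report but not reported, or FEV1 missing with no other lung function reported), leaving those cases to clinical judgement as in the study.

```python
# Illustrative subset of spirometry outcomes measured on the same device.
LUNG_FUNCTION = {"FEV1", "FVC", "FEF25-75"}


def classify_lung_function(review_outcomes, reported, stated_as_outcome):
    """Return ORBIT letters for a trial's unreported lung function review outcomes.

    review_outcomes: outcomes specified by the review.
    reported: outcomes with results in the trial report.
    stated_as_outcome: outcomes the trial report names as trial outcomes.
    A value of None means the rule is silent and clinical judgement is needed.
    """
    lf_reported = reported & LUNG_FUNCTION
    result = {}
    for outcome in sorted(review_outcomes & LUNG_FUNCTION):
        if outcome in reported:
            continue  # reported outcomes are judged on completeness, not by this rule
        if outcome == "FEV1":
            # Other spirometry reported without FEV1 raises suspicion of selective non-reporting.
            result[outcome] = "E" if lf_reported else None
        elif "FEV1" in lf_reported and outcome not in stated_as_outcome:
            # Same device as FEV1: measured, but not necessarily analysed.
            result[outcome] = "F"
        else:
            result[outcome] = None
    return result


if __name__ == "__main__":
    print(classify_lung_function({"FEV1", "FVC", "FEF25-75"}, {"FEV1"}, set()))
    print(classify_lung_function({"FEV1", "FVC"}, {"FVC"}, set()))
```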
However, it is often difficult to assess whether an outcome was measured, and clinical judgement is required. The clinical lead for each review was contacted by email and asked for their input into the assessment of selective outcome reporting within the trials included in their review. Using the classification system, the clinical lead for the review and KD independently assessed whether each review outcome had been measured and reported in each trial; any disagreements were resolved through discussion and then checked with a third person (JJK or PRW).
Assessment of risk of bias for selective outcome reporting—trial level
If one or more of the outcomes for a trial was given a high-risk classification according to table 1, the trial was deemed to be at high risk of bias from selective reporting.
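The trial-level rule reduces to asking whether any outcome carries a high-risk letter. Table 1 is not reproduced in this text, so the set of high-risk letters below (A, D, E and G, following the original ORBIT scheme) is an assumption to be checked against table 1. The sketch also separates trials flagged only by G classifications, as in the Results, and assumes that the ‘G, no events’ code (written here with the hypothetical label `G0`) does not by itself make a trial high risk.

```python
# Assumption: high-risk letters as in the original ORBIT scheme; check against table 1.
HIGH_RISK = {"A", "D", "E", "G"}


def trial_risk(letters):
    """Classify one trial from the ORBIT letters of its not-fully-reported outcomes.

    Returns "low", "high (G only)" or "high". The hypothetical code "G0"
    (the 'G, no events' classification) is not in HIGH_RISK, so it does not
    by itself make a trial high risk.
    """
    high = {letter for letter in letters if letter in HIGH_RISK}
    if not high:
        return "low"
    return "high (G only)" if high == {"G"} else "high"


if __name__ == "__main__":
    for letters in ([], ["F", "H", "G0"], ["G", "F"], ["G", "E"]):
        print(letters, "->", trial_risk(letters))
```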
Descriptive results are presented. The median and IQR for the number of review primary and secondary outcomes were calculated.
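Quartiles, and therefore IQRs, depend on the interpolation convention, which the text does not state. A minimal sketch using the Python standard library's `statistics.quantiles`: the `inclusive` method is the common linear-interpolation definition, while the library default, `exclusive`, can give different quartiles for the same data. The input values are hypothetical.

```python
import statistics


def median_iqr(values, method="inclusive"):
    """Return (median, Q1, Q3) for a list of counts.

    method="inclusive" is linear interpolation between order statistics;
    the standard library default, "exclusive", can give different quartiles.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method=method)
    return statistics.median(values), q1, q3


def describe(values, method="inclusive"):
    """Format as in the text, e.g. '4 (IQR 2, 8)'."""
    med, q1, q3 = median_iqr(values, method)
    return f"{med:g} (IQR {q1:g}, {q3:g})"


if __name__ == "__main__":
    trials_per_review = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data
    print(describe(trials_per_review))                      # inclusive quartiles
    print(describe(trials_per_review, method="exclusive"))  # may differ
```

For these nine values the inclusive lower quartile is 3 and the exclusive one is 2.5, so stating the convention alongside reported IQRs avoids ambiguity.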
Data are tabulated and excerpts found in the trial reports relating to review outcomes are used to support decisions made regarding ORBIT classifications and the assessment of risk of bias.
The CFGD group had published 46 CF systematic reviews by the beginning of 2010.
Changes in outcomes between systematic review protocol and full review—review level
Protocols were available for all 46 systematic reviews. Nine protocols (20%) did not distinguish between primary and secondary outcomes. Table 3 shows the median number of primary and secondary outcomes for the 46 reviews and the changes in outcomes between protocol and full review.
Eighteen reviews (39%, 18/46) had a discrepancy in outcomes between the protocol and the full review. Of these 18 reviews, five (28%) listed all changes, two (11%) listed some changes and 11 (61%) did not mention any change in outcomes. Of the seven reviews that described the changes, three gave no reason for them, two stated that outcomes were downgraded because the Cochrane Handbook had changed its recommendation to a maximum of three primary outcomes, and two stated that they had added clinically relevant outcomes identified during the review process.
Assessing trial reports for full ORB—outcome level
Of the 46 published reviews, 38 were eligible to be assessed for ORB (figure 1).
One review was excluded at this stage because its outcomes could not be assessed for ORB: as defined, they could be measured and reported in many different ways. The primary outcomes were psychosocial outcomes, which included any objective measure with adequate psychometric properties and demonstrable reliability and validity quantifying psychological or social outcomes or both, including individual psychological adjustment, relational, social functioning and adaptation to life with CF.
Therefore, 37 reviews were assessed for ORB, including 280 RCTs (278 included trials and 2 trials excluded for no relevant outcome data, which the review authors confirmed would otherwise have been included). The median number of trials per review was 4 (IQR 2, 8), and the median sample size per trial was 21 (IQR 14, 41).
Review authors contacted trialists for missing outcome data in 33 reviews (89%); one review stated that ‘trialists were not contacted but would be in updates of the review’ and 3 reviews did not state whether trialists were contacted for further data.
The lead authors of 12 reviews assessed the included trials and gave classifications for each outcome. For 13 reviews, authors gave input on which outcomes they expected to be measured for trials in their review and which outcomes they expected to be measured in routine clinical practice but did not classify each outcome due to time restrictions. The authors of 12 reviews did not respond to our request.
For the 12 reviews where the authors assigned classifications, discussion was needed on all outcomes to reach an agreed classification. For the other 25 reviews, it was difficult to classify every outcome, as some outcomes required considerable clinical input to understand both the outcome and the language used to describe it in the trial reports. Owing to the number and complexity of outcomes and the lack of reviewer input for most reviews, it was decided that the assessment of all well-defined primary outcomes listed in the full review should take priority. Many outcomes were also split into suboutcomes or left ill-defined in order to maximise the ability of trials to contribute data to the review.
The ORBIT classifications for the review primary outcomes of the 280 RCTs are shown in table 1. For the 12 reviews where reviewer input was obtained, classifications of review secondary outcomes for 64 included trials are also shown in table 1. Eligible trials within the reviews fully reported 383 (33.7%) review primary outcomes and 125 (18.7%) review secondary outcomes. In addition to the classifications in table 1, a ‘G, no events’ classification was given to eligible trials within the reviews for 109 (9.5%) review primary outcomes and 22 (3.3%) review secondary outcomes. This classification applies, for example, to mortality when clinical judgement says that it is likely to have been measured and would have been reported had any deaths occurred; it is therefore assumed that no deaths occurred during the trial. Owing to limited reviewer input or the lack of a standard definition, we were unable to assess some outcomes (including adverse events, symptoms, complications, biochemical measures of glycaemic control, symptoms of sleep-disordered breathing and measures of specific indices of strength, mass, effort and general fatigue) for 102 trials for review primary outcomes and 59 trials for review secondary outcomes.
Assessment of risk of bias from selective outcome reporting—trial level
Eighteen reviews (49%) had not yet assessed the risk of bias from selective outcome reporting: although the Cochrane risk of bias guidance was introduced in 2008 and the cut-off for this study was the beginning of 2010, these reviews had not yet been updated. Seventeen reviews (46%) had assessed the risk of bias for all included trials and two reviews (5%) had done so for some of their included trials.
As we were unable to assess secondary outcomes for ORB in all reviews, risk of bias assessments were based on the classifications of primary outcomes, to be consistent across reviews. Only five (14%) of the 37 reviews had no trials at high risk of bias based on the review primary outcomes only. Table 4 shows, for the 280 trials assessed for ORB, the risk of bias from selective outcome reporting as defined in this study (based on review primary outcomes only) alongside the risk of bias as assessed within the published reviews. Overall, 69% of trials had either not been assessed for selective reporting or had been assessed as at unclear risk. Table 5 shows the risk of bias from selective outcome reporting based on review primary outcomes only and on all review outcomes, for the 12 reviews (64 trials) where reviewers also provided classifications. This was to see whether decisions about risk of bias would change if all outcomes were considered. Only four (6%) of the 64 RCTs were at low risk of bias when all outcomes were considered.
Discrepancies in the risk of bias when considering all outcomes arose in 34 (53%) trials. Of these, 31 were at low risk when only review primary outcomes were considered but at high risk when all outcomes were considered (13 with at least one high-risk classification other than G, and 18 with G classifications only); the remaining 3 were at high risk on the basis of G classifications only when review primary outcomes alone were considered, but at high risk on the basis of other classifications when all outcomes were considered. This often occurred in reviews with only one or two primary outcomes and a large number of secondary outcomes.
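The comparison above amounts to cross-tabulating each trial's risk judged on primary outcomes alone against its risk judged on all outcomes. A hypothetical Python sketch, assuming the same high-risk letters (A, D, E and G, as in the original ORBIT scheme; to be checked against table 1) and the same three-level risk labels; the example trials are invented for illustration.

```python
from collections import Counter

# Assumption: high-risk letters as in the original ORBIT scheme; check against table 1.
HIGH_RISK = {"A", "D", "E", "G"}


def risk(letters):
    """Trial-level risk from ORBIT letters: 'low', 'high (G only)' or 'high'."""
    high = {letter for letter in letters if letter in HIGH_RISK}
    if not high:
        return "low"
    return "high (G only)" if high == {"G"} else "high"


def compare(trials):
    """Cross-tabulate risk on primary outcomes only against risk on all outcomes.

    trials maps a trial name to (primary_letters, secondary_letters).
    Returns a Counter keyed by (primary-only risk, all-outcomes risk).
    """
    return Counter(
        (risk(primary), risk([*primary, *secondary]))
        for primary, secondary in trials.values()
    )


if __name__ == "__main__":
    example = {  # invented trials, for illustration only
        "T1": (["F"], ["G"]),
        "T2": (["F"], ["E"]),
        "T3": (["G"], ["A"]),
        "T4": (["H"], []),
    }
    table = compare(example)
    changed = sum(n for (before, after), n in table.items() if before != after)
    print(table)
    print("trials whose risk changed:", changed)
```

Each invented trial maps onto one of the transitions described in the text: low to high (G only), low to high, high (G only) to high, and unchanged low.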
Based on all review outcomes, none of the 12 reviews had all included trials at a low risk of bias (table 6).
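At the review level, the summaries reported here (reviews with at least one high-risk trial, and reviews with all trials at low risk) follow directly from the trial-level labels. A minimal sketch with hypothetical review names; as a check on the arithmetic, 5 of 37 reviews having no high-risk trial corresponds to the 86% (32/37) reported for review primary outcomes.

```python
def review_level(trial_risks):
    """Summarise trial-level risks per review.

    trial_risks maps a review name to a list of trial labels ('low' or 'high...').
    Returns the reviews with at least one high-risk trial and the percentage they represent.
    """
    if not trial_risks:
        raise ValueError("no reviews supplied")
    flagged = sorted(
        review for review, risks in trial_risks.items()
        if any(label.startswith("high") for label in risks)
    )
    return flagged, 100 * len(flagged) / len(trial_risks)


if __name__ == "__main__":
    example = {  # hypothetical reviews
        "Review 1": ["low", "high"],
        "Review 2": ["low", "low"],
        "Review 3": ["high (G only)"],
    }
    flagged, percent = review_level(example)
    print(flagged, f"{percent:.0f}%")
    # Arithmetic check against the text: 37 reviews, 5 with no high-risk trial.
    print(f"{100 * (37 - 5) / 37:.0f}%")
```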
This is the first study to consider all review efficacy outcomes in an ORB assessment, which has allowed us to make practical recommendations on assessing the risk of bias from selective reporting for systematic reviews at both the review and trial levels. Over a third of the Cochrane CF reviews examined (39%) had a discrepancy in outcomes between the review protocol and the full review. This compares with 22% of reviews (64/288) with a discrepancy in at least one outcome measure in the main ORBIT study, which looked at reviews from all 50 Cochrane review groups.2 However, this comparison is confounded by the different publication date ranges of the reviews (assessed as up to date between 2006 and 2009). Furthermore, for the CF reviews, ORB was suspected in at least one RCT in 86% of reviews when considering all review primary outcomes. When only a single primary outcome was considered in ORBIT, the prevalence of reviews containing at least one trial with a high suspicion of ORB was substantially lower, at 34% (96/283).7 While this study is limited to CF trials, it is clear that the problem of ORB is much larger when considering more than the single primary review outcome that was used in the ORBIT study.
A study by von Mosch and Dwan8 that compared reports of CF trials with the CONSORT statement found that, out of a maximum of 57 points, the median score rose from 17.5 (IQR 15.5–24.5) in 1994 to 32 (IQR 22.8–41.5) in 2008. Together with the current study, this indicates that there is still room for improvement in the reporting of outcomes.
Use of the ORBIT classification system offered a robust methodology for assessing the risk of bias of trials included within a systematic review. For the 64 trials in the 12 reviews where both primary and secondary outcomes could be assessed, 45% were at high risk of bias when the assessment was based on review primary outcomes, and 94% when it was based on all outcomes. Using the selective reporting item of the current Cochrane risk of bias tool, reviewers assessed 69% of trials included in CF reviews as being at ‘unclear’ risk of bias or did not assess them at all, indicating the need for more informed guidance on assigning risk of bias for all outcomes within a review.
The ORBIT classification system was validated as part of the original project. The sensitivity for predicting that an outcome had been measured (G classification) was 92% (23/25, 95% CI 81% to 100%), while the specificity for predicting that an outcome had not been measured (H classification) was 77% (23/30, 95% CI 62% to 92%). Because this project required all outcomes to be assessed, more outcomes were not mentioned in the trial reports, and clinical judgement was therefore needed on whether the outcome of interest was likely to have been measured in a particular trial. Many review authors did not provide classifications (68%, 25/37); for those who did not respond at all, we obtained clinical input for the primary outcome from within the CFGD group. Although we cannot exclude the possibility of response bias, it is likely that the decision to respond was influenced by time commitments rather than by review characteristics. These assessments will be provided to the review authors when their reviews are due to be updated.
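The text does not say how these confidence intervals were computed. A normal-approximation (Wald) interval, truncated to the range 0 to 1, reproduces the rounded figures above, as the sketch below shows; it is a reconstruction, not necessarily the method used in the original validation. With counts this small, a Wilson or exact interval would usually be preferred and would give somewhat different bounds.

```python
import math


def wald_ci(successes, total, z=1.96):
    """Proportion with a normal-approximation (Wald) 95% CI, truncated to [0, 1]."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    for label, k, n in (("sensitivity (G)", 23, 25), ("specificity (H)", 23, 30)):
        p, lo, hi = wald_ci(k, n)
        print(f"{label}: {p:.0%} (95% CI {lo:.0%} to {hi:.0%})")
```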
Reviewers should ensure that changes between protocol and review are listed and justified, to enhance the validity of these decisions. Eligible trials should not be excluded on the basis of ‘no relevant outcome data’, because an outcome that was not reported may still have been measured; contacting the trial authors is advised. Reviewers should be encouraged to consider trials that have not reported an outcome of interest and to assess whether selective reporting has occurred for all review outcomes. They should consider the amount of missing data in each meta-analysis (ie, the total sample size of the studies included in the meta-analysis as a percentage of the total sample size of all eligible studies, including those that reported no outcome data), and this information should be presented alongside the pooled effect estimate. If appropriate, a sensitivity analysis should be applied to assess the robustness of the review conclusions, such as an imputation approach,11 the Copas bound for maximum bias12–14 or a model-based correction.15
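The missing-data summary recommended above is simple arithmetic over sample sizes. A sketch, assuming the denominator is the total sample size of all eligible trials (those contributing data plus those that reported no usable data for the outcome); the function name and example sample sizes are hypothetical.

```python
def missing_information(included_sizes, unreported_sizes):
    """Share of eligible participants contributing to one meta-analysis.

    included_sizes: sample sizes of trials contributing outcome data.
    unreported_sizes: sample sizes of eligible trials with no usable outcome data.
    """
    contributing = sum(included_sizes)
    eligible = contributing + sum(unreported_sizes)
    if eligible == 0:
        raise ValueError("no eligible participants")
    return {
        "contributing": contributing,
        "eligible": eligible,
        "percent_contributing": 100 * contributing / eligible,
        "percent_missing": 100 * (eligible - contributing) / eligible,
    }


if __name__ == "__main__":
    # Hypothetical: two trials report the outcome, one eligible trial does not.
    summary = missing_information([25, 35], [40])
    print(f"{summary['contributing']}/{summary['eligible']} participants "
          f"({summary['percent_contributing']:.0f}%) contribute; "
          f"{summary['percent_missing']:.0f}% missing")
```

Reporting this percentage next to the pooled estimate lets readers see at a glance how much of the eligible evidence the estimate actually rests on.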
Individuals conducting systematic reviews need to address the issue of missing outcome data explicitly for their review to be considered a reliable source of evidence. Extra care is required during data extraction: reviewers should identify when a trial reports that an outcome was measured but reports no results or observed events, and contact with trialists should be encouraged. Contacting trialists is encouraged by Cochrane Review Groups (CRGs) and is standard practice within CFGD reviews, as reflected in our results: 89% of reviews stated that they had contacted authors for extra information on outcomes.
It is recommended that review authors limit the number of outcomes in the review and define them clearly, as this makes selective reporting easier to assess; this assessment can be done during data extraction from the included trials, provided a knowledgeable clinician is involved. Lung function was specified as the first primary outcome in 19 reviews (50%), as the second or third primary outcome in 11 reviews (29%) and as a secondary outcome in six reviews (16%), and it was not included as an outcome in only one review (5%). However, as discussed earlier, lung function can be measured in different ways (FEV1, FVC, mid-FEF, peak expiratory flow rate, residual volume, total lung capacity, lung clearance index and maximum expiratory flow). These measures can then be analysed and reported in different ways, such as percentage predicted or absolute units (litres, litres/second), and as post-treatment values, absolute change from baseline, relative change from baseline or annual rate of change. There is therefore considerable scope for selective reporting. One solution is the development of a core outcome set for CF.16–18
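To give a rough sense of the scope for selective reporting, the sketch below enumerates combinations of the eight lung function measures listed, two scales (percentage predicted or absolute units) and four summaries. It is a deliberate simplification: not every combination is clinically meaningful, and multiple follow-up time points would multiply the count further. The abbreviations PEFR, RV, TLC, LCI and MEF are shorthand introduced here.

```python
from itertools import product

# The eight measures listed in the text (abbreviations introduced for brevity).
MEASURES = ["FEV1", "FVC", "mid-FEF", "PEFR", "RV", "TLC", "LCI", "MEF"]
# Simplifying assumption: every measure can be expressed on either scale.
SCALES = ["percentage predicted", "absolute units"]
SUMMARIES = ["post-treatment", "absolute change", "relative change", "annual rate of change"]


def reporting_options(measures=MEASURES, scales=SCALES, summaries=SUMMARIES):
    """Every (measure, scale, summary) combination a trial could choose to report."""
    return list(product(measures, scales, summaries))


if __name__ == "__main__":
    options = reporting_options()
    print(len(options), "possible ways to report 'lung function' at a single time point")
```

Even under these simplifications there are 8 × 2 × 4 = 64 candidate results, of which a trial report might present only one or two.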
Unanswered questions and future research
Work is needed to determine the best method for assessing the impact of ORB on the results of a meta-analysis when there are multiple outcomes. Multivariate meta-analysis has been suggested by Kirkham et al,19 and a model-based correction by Copas et al.15
Systematic reviews need to clearly state the primary and secondary outcomes that they will consider and be consistent between review protocol and full review.
ORB is a major problem for systematic reviews, and more guidance needs to be included in the Cochrane Handbook to allow assessment of this important item within the risk of bias tool. We recommend that an outcome matrix be completed during the production of a review to allow an ORB assessment for all review outcomes, which can then inform the risk of bias assessment.
A core set of outcomes should be agreed upon for CF, which in turn will have a positive impact on systematic reviews. As future trials are conducted, they should specifically set out to measure and report these outcomes, thereby reducing the prevalence of selective reporting.
The authors would like to thank the Cochrane CFGD group reviewers for their assistance with clinical input for the ORB assessments of the included trials in their reviews.
Contributors KD, PRW, CG and JJK conceived the idea of the study and were responsible for the design of the study. KD was responsible for undertaking the data analysis and produced the tables and graphs. CG, PRW and JJK provided input for the data analysis. The initial draft of the manuscript was prepared by KD and then circulated repeatedly among all authors for critical revision. KD was responsible for the acquisition of the data and KD, PRW, CG and JJK contributed to the interpretation of the results. All authors read and approved the final manuscript.
Funding This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests PRW's time was funded by The MRC North West Hub for Trial Methodology Research. KD's time was funded by NIHR.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement The outcome matrix for each systematic review included is available from the contact author on request.