Abstract
Background Several scales, checklists and domain-based tools for assessing risk of reporting biases exist, but it is unclear how much they vary in content and guidance. We conducted a systematic review of the content and measurement properties of such tools.
Methods We searched for potentially relevant articles in Ovid MEDLINE, Ovid Embase, Ovid PsycINFO and Google Scholar from inception to February 2017. One author screened all titles, abstracts and full text articles, and collected data on tool characteristics.
Results We identified 18 tools that include an assessment of the risk of reporting bias. Tools varied in regard to the type of reporting bias assessed (eg, bias due to selective publication, bias due to selective non-reporting), and the level of assessment (eg, for the study as a whole, a particular result within a study or a particular synthesis of studies). Various criteria are used across tools to designate a synthesis as being at ‘high’ risk of bias due to selective publication (eg, evidence of funnel plot asymmetry, use of non-comprehensive searches). However, the relative weight assigned to each criterion in the overall judgement is unclear for most of these tools. Tools for assessing risk of bias due to selective non-reporting guide users to assess a study, or an outcome within a study, as ‘high’ risk of bias if no results are reported for an outcome. However, assessing the corresponding risk of bias in a synthesis that is missing the non-reported outcomes is outside the scope of most of these tools. Inter-rater agreement estimates were available for five tools.
Conclusion There are several limitations of existing tools for assessing risk of reporting biases, in terms of their scope, guidance for reaching risk of bias judgements and measurement properties. Development and evaluation of a new, comprehensive tool could help overcome present limitations.
- publication bias
- bias (epidemiology)
- review literature as topic
- checklist
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
- Tools for assessing risk of reporting biases, and studies evaluating their measurement properties, were identified by searching several relevant databases using a search string developed in conjunction with an information specialist.
- Detailed information on the content and measurement properties of existing tools was collected, providing readers with pertinent information to help decide which tools to use in evidence syntheses.
- Screening of articles and data collection were performed by one author only, so it is possible that some relevant articles were missed, or that errors in data collection were made.
- The search of grey literature was not comprehensive, so it is possible that there are other tools for assessing risk of reporting biases, and unpublished studies evaluating measurement properties, that were omitted from this review.
Background
The credibility of evidence syntheses can be compromised by reporting biases, which arise when dissemination of research findings is influenced by the nature of the results.1 For example, there may be bias due to selective publication, where a study is only published if the findings are considered interesting (also known as publication bias).2 In addition, bias due to selective non-reporting may occur, where findings (eg, estimates of intervention efficacy or an association between exposure and outcome) that are statistically non-significant are not reported or are partially reported in a paper (eg, stating only that ‘P>0.05’).3 Alternatively, there may be bias in selection of the reported result, where authors perform multiple analyses for a particular outcome/association, yet only report the result which yielded the most favourable effect estimate.4 Evidence from cohorts of clinical trials followed from inception suggests that biased dissemination is common. Specifically, on average, half of all trials are not published,1 5 trials with statistically significant results are twice as likely to be published5 and a third of trials have outcomes that are omitted, added or modified between protocol and publication.6
Audits of systematic review conduct suggest that most systematic reviewers do not assess risk of reporting biases.7–10 For example, in a cross-sectional study of 300 systematic reviews indexed in MEDLINE in February 2014,7 the risk of bias due to selective publication was not considered in 56% of reviews. A common reason for not doing so was that the small number of included studies, or inability to perform a meta-analysis, precluded the use of funnel plots. Only 19% of reviews included a search of a trial registry to identify completed but unpublished trials or prespecified but non-reported outcomes, and only 7% included a search of another source of data disseminated outside of journal articles. The risk of bias due to selective non-reporting in the included studies was assessed in only 24% of reviews.7 Another study showed that authors of Cochrane reviews routinely record whether any outcomes that were measured were not reported in the included trials, yet rarely consider if such non-reporting could have biased the results of a synthesis.11
Previous researchers have summarised the characteristics of tools designed to assess various sources of bias in randomised trials,12–14 non-randomised studies of interventions (NRSI),14 15 diagnostic test accuracy studies16 and systematic reviews.14 17 Others have summarised the performance of statistical methods developed to detect or adjust for reporting biases.18–20 However, no prior review has focused specifically on tools (ie, structured instruments such as scales, checklists or domain-based tools) for assessing the risk of reporting biases. A particular challenge when assessing risk of reporting biases is that existing tools vary in their level of assessment. For example, tools for assessing risk of bias due to selective publication direct assessments at the level of the synthesis, whereas tools for assessing risk of bias due to selective non-reporting within studies can direct assessments at the level of the individual study, at the level of the synthesis or at both levels. It is unclear how many tools are available to assess different types of reporting bias, and what level they direct assessments at. It is also unclear whether criteria for reaching risk of bias judgements are consistent across existing tools. Therefore, the aim of this research was to conduct a systematic review of the content and measurement properties of such tools.
Methods
Protocol
Methods for this systematic review were prespecified in a protocol which was uploaded to the Open Science Framework in February 2017 (https://osf.io/9ea22/).
Eligibility criteria
Papers were included if the authors described a tool that was designed for use by individuals performing evidence syntheses to assess risk of reporting biases in the included studies or in their synthesis of studies. Tools could assess any type of reporting bias, including bias due to selective publication, bias due to selective non-reporting or bias in selection of the reported result. Tools could assess the risk of reporting biases in any type of study (eg, randomised trial of intervention, diagnostic test accuracy study, observational study estimating prevalence of an exposure) and in any type of result (eg, estimate of intervention efficacy or harm, estimate of diagnostic accuracy, association between exposure and outcome). Eligible tools could take any form, including scales, checklists and domain-based tools. To be considered a scale, each item had to have a numeric score attached to it, so that an overall summary score could be calculated.12 To be considered a checklist, the tool had to include multiple questions, but the developers’ intention was not to attach a numerical score to each response, or to calculate an overall score.13 Domain-based tools were those that required users to judge risk of bias or quality within specific domains, and to record the information on which each judgement was based.21
Tools with a broad scope, for example, to assess multiple sources of bias or the overall quality of the body of evidence, were eligible if one of the items covered risk of reporting bias. Multidimensional tools with a statistical component were also eligible (eg, those that require users to respond to a set of questions about the comprehensiveness of the search, as well as to perform statistical tests for funnel plot asymmetry). In addition, any studies that evaluated the measurement properties of existing tools (eg, construct validity, inter-rater agreement, time taken to complete assessments) were eligible for inclusion. Papers were eligible regardless of the date or format of publication, but were limited to those written in English.
The following were ineligible:
- articles or book chapters providing guidance on how to address reporting biases, but which do not include a structured tool that can be applied by users (eg, the 2011 Cochrane Handbook chapter on reporting biases22);
- tools developed or modified for use in one particular systematic review;
- tools designed to appraise published systematic reviews, such as the Risk Of Bias In Systematic reviews (ROBIS) tool23 or A MeaSurement Tool to Assess systematic Reviews (AMSTAR)24;
- articles that focus on the development or evaluation of statistical methods to detect or adjust for reporting biases, as these have been reviewed elsewhere.18–20
Search methods
On 9 February 2017, one author (MJP) searched for potentially relevant records in Ovid MEDLINE (January 1946 to February 2017), Ovid Embase (January 1980 to February 2017) and Ovid PsycINFO (January 1806 to February 2017). The search strategies included terms relating to reporting bias which were combined with a search string used previously by Whiting et al to identify risk of bias/quality assessment tools17 (see full Boolean search strategies in online supplementary table S1).
To capture any tools not published by formal academic publishers, we searched Google Scholar using the phrase ‘reporting bias tool OR risk of bias’. One author (MJP) screened the titles of the first 300 records, as recommended by Haddaway et al.25 To capture any papers that may have been missed by all searches, one author (MJP) screened the references of included articles. In April 2017, the same author emailed the list of included tools to 15 individuals with expertise in reporting biases and risk of bias assessment, and asked if they were aware of any other tools we had not identified.
Study selection and data collection
One author (MJP) screened all titles and abstracts retrieved by the searches. The same author screened any full-text articles retrieved. One author (MJP) collected data from included papers using a standardised data-collection form. The following data on included tools were collected:
- type of tool (scale, checklist or domain-based tool);
- types of reporting bias addressed by the tool;
- level of assessment (ie, whether users direct assessments at the synthesis or at the individual studies included in the synthesis);
- whether the tool is designed for general use (generic) or targets specific study designs or topic areas (specific);
- items included in the tool;
- how items within the tool are rated;
- methods used to develop the tool (eg, Delphi study, expert consensus meeting);
- availability of guidance to assist with completion of the tool (eg, guidance manual).
The following data from studies evaluating measurement properties of an included tool were collected:
- tool evaluated;
- measurement properties evaluated (eg, inter-rater agreement);
- number of syntheses/studies evaluated;
- publication year of syntheses/studies evaluated;
- areas of healthcare addressed by syntheses/studies evaluated;
- number of assessors;
- estimate (and precision) of psychometric statistics (eg, weighted kappa; κ).
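The data items listed above could be captured in a structured record along the following lines. This is only an illustrative sketch: the class and field names (`ToolRecord`, `EvaluationRecord`, `level_of_assessment` and so on) are hypothetical and are not taken from the authors' data-collection form, which is available via the Open Science Framework.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ToolType(Enum):
    """The three forms of tool eligible for the review."""
    SCALE = "scale"
    CHECKLIST = "checklist"
    DOMAIN_BASED = "domain-based"


class BiasType(Enum):
    """The three types of reporting bias considered in the review."""
    SELECTIVE_PUBLICATION = "selective publication"
    SELECTIVE_NON_REPORTING = "selective non-reporting"
    SELECTION_OF_REPORTED_RESULT = "selection of the reported result"


@dataclass
class ToolRecord:
    """One row of data about an included tool (hypothetical field names)."""
    name: str
    tool_type: ToolType
    bias_types: List[BiasType]
    level_of_assessment: str          # eg "study", "outcome/result", "synthesis"
    generic: bool                     # True = general use, False = specific
    items: List[str] = field(default_factory=list)
    rating_method: str = ""
    development_methods: List[str] = field(default_factory=list)
    guidance: str = ""


@dataclass
class EvaluationRecord:
    """One row of data about a study evaluating a tool (hypothetical field names)."""
    tool_name: str
    properties_evaluated: List[str]
    n_assessed: int                   # number of syntheses/studies evaluated
    n_assessors: int
    statistic_name: str               # eg "kappa"
    estimate: Optional[float] = None
    precision: Optional[str] = None   # eg a confidence interval, if reported
    publication_years: Optional[str] = None
    healthcare_areas: List[str] = field(default_factory=list)


# Example built only from values reported in the Results of this article.
syrcle_eval = EvaluationRecord(
    tool_name="SYRCLE's RoB tool",
    properties_evaluated=["inter-rater agreement"],
    n_assessed=32,
    n_assessors=2,
    statistic_name="kappa",
    estimate=0.62,
)
```

Fields that the article does not report for this example (precision, publication years, areas of healthcare) are left at their empty defaults rather than filled in.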
Data analysis
We summarised the characteristics of included tools in tables. We calculated the median (IQR) number of items across all tools, and tabulated the frequency of different criteria used in tools to denote a judgement of ‘high’ risk of reporting bias. We summarised estimates of psychometric statistics, such as weighted κ to estimate inter-rater agreement,26 by reporting the range of values across studies. For studies reporting weighted κ, we categorised agreement according to the system proposed by Landis and Koch,27 as poor (0.00), slight (0.01–0.20), fair (0.21–0.40), moderate (0.41–0.60), substantial (0.61–0.80) or almost perfect (0.81–1.00).
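For readers unfamiliar with the statistic, the following is a minimal sketch of how a weighted κ can be computed from two assessors' ordinal judgements. It is not the code used in any of the included studies. The function name `weighted_kappa`, the integer coding of categories (eg, 0=low, 1=unclear, 2=high) and the choice between linear and quadratic disagreement weights are assumptions made for illustration.

```python
from typing import Sequence


def weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int],
                   n_categories: int, scheme: str = "linear") -> float:
    """Weighted kappa for two raters using ordinal codes 0..n_categories-1.

    scheme: "linear" or "quadratic" disagreement weights (an assumption;
    evaluation studies may use other weighting schemes).
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("at least two categories are required")
    if scheme not in ("linear", "quadratic"):
        raise ValueError("scheme must be 'linear' or 'quadratic'")

    k = n_categories
    n = len(rater_a)

    # Observed joint proportions: observed[i][j] is the share of subjects
    # rated i by rater A and j by rater B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    power = 1 if scheme == "linear" else 2

    def disagreement(i: int, j: int) -> float:
        # 0 on the diagonal, 1 for the most distant pair of categories.
        return (abs(i - j) / (k - 1)) ** power

    obs = sum(disagreement(i, j) * observed[i][j]
              for i in range(k) for j in range(k))
    exp = sum(disagreement(i, j) * row[i] * col[j]
              for i in range(k) for j in range(k))
    if exp == 0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return 1.0 - obs / exp


# Two assessors rating four studies on a low/unclear/high scale.
identical = weighted_kappa([0, 1, 2, 1], [0, 1, 2, 1], n_categories=3)
chance_level = weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1], n_categories=2)
```

Because linear and quadratic weights can give different κ values for the same ratings, an estimate is hard to interpret or compare across studies unless the weighting scheme is reported.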
Results
In total, 5554 records were identified from the searches, of which we retrieved 165 for full-text screening (figure 1). The inclusion criteria were met by 42 reports summarising 18 tools (table 1) and 17 studies evaluating the measurement properties of tools.3 4 21 28–66 A list of excluded papers is presented in online supplementary table S2. No additional tools were identified by the 15 experts contacted.
General characteristics of included tools
Nearly all of the included tools (16/18; 89%) were domain-based, where users judge risk of bias or quality within specific domains (table 2; individual characteristics of each tool are presented in online supplementary table S3). All tools were designed for generic rather than specific use. Five tools focused solely on the risk of reporting biases3 28 29 47 48; the remainder addressed reporting biases and other sources of bias/methodological quality (eg, problems with randomisation, lack of blinding). Half of the tools (9/18; 50%) addressed only one type of reporting bias (eg, bias due to selective non-reporting only). Tools varied in regard to the study design that they assessed (ie, randomised trial, non-randomised study of an intervention, laboratory animal experiment). The publication year of the tools ranged from 1998 to 2016 (the earliest was the Downs-Black tool,31 a 27-item tool assessing multiple sources of bias, one of which focuses on risk of bias in the selection of the reported result).
Assessments for half of the tools (9/18; 50%) are directed at an individual study (eg, tool is used to assess whether any outcomes in a study were not reported). In 5/18 (28%) tools, assessments are directed at a specific outcome or result within a study (eg, tool is used to assess whether a particular outcome in a study, such as pain, was not reported). In a few tools (4/18; 22%), assessments are directed at a specific synthesis (eg, tool is used to assess whether a particular synthesis, such as a meta-analysis of studies examining pain as an outcome, is missing unpublished studies).
The content of the included tools was informed by various sources of data. The most common included a literature review of items used in existing tools or a literature review of empirical evidence of bias (9/18; 50%), ideas generated at an expert consensus meeting (8/18; 44%) and pilot feedback on a preliminary version of the tool (7/18; 39%). The most common type of guidance available for the tools was a brief annotation per item/response option (9/18; 50%). A detailed guidance manual is available for four (22%) tools.
Tool content
Four tools include items for assessing risk of bias due to both selective publication and selective non-reporting.29 33 45 49 One of these tools (the AHRQ tool for evaluating the risk of reporting bias29) directs users to assess a particular synthesis, where a single risk of bias judgement is made based on information about unpublished studies and under-reported outcomes. In the other three tools (the GRADE framework, and two others which are based on GRADE),33 45 49 the different sources of reporting bias are assessed in separate domains (bias due to selective non-reporting is considered in a ‘study limitations (risk of bias)’ domain, while bias due to selective publication is considered in a ‘publication bias’ domain).
Five tools21 28 43 44 47 guide users to assess risk of bias due to both selective non-reporting and selection of the reported result (ie, problems with outcomes/results that are not reported and those that are reported, respectively). Four of these tools, which include the Cochrane risk of bias tool for randomised trials21 and three others which are based on the Cochrane tool,43 44 47 direct assessments at the study level. That is, a whole study is rated at ‘high’ risk of reporting bias if any outcome/result in the study has been omitted, or fully reported, on the basis of the findings.
Some of the tools designed to assess the risk of bias due to selective non-reporting ask users to assess, for particular outcomes of interest, whether the outcome was not reported or only partially reported in the study on the basis of its results (eg, Outcome Reporting Bias In Trials (ORBIT) tools,3 48 the AHRQ outcome reporting bias framework28 and GRADE34). This allows users to perform multiple outcome-level assessments of the risk of reporting bias (rather than one assessment for the study as a whole). In total, 15 tools include a mechanism for assessing risk of bias due to selective non-reporting in studies, but assessing the corresponding risk of bias in a synthesis that is missing the non-reported outcomes is not within the scope of 11 of these tools.3 21 28 30 38 43 44 47 48 51 52
A variety of criteria are used in existing tools to inform a judgement of ‘high’ risk of bias due to selective publication (table 3), selective non-reporting (table 4), and selection of the reported result (table 5; more detail is provided in online supplementary table S4). In the four tools with an assessment of risk of bias due to selective publication, ‘high’ risk criteria include evidence of funnel plot asymmetry, discrepancies between published and unpublished studies, use of non-comprehensive searches and presence of small, ‘positive’ studies with for-profit interest (table 3). However, not all of these criteria appear in all tools (only evidence of funnel plot asymmetry does), and the relative weight assigned to each criterion in the overall risk of reporting bias judgement is clear for only one tool (the Semi-Automated Quality Assessment Tool; SAQAT).45 46
All 15 tools with an assessment of the risk of bias due to selective non-reporting suggest that the risk of bias is ‘high’ when it is clear that an outcome was measured but no results were reported (table 4). Fewer of these tools (n=8; 53%) also recommend a ‘high’ risk judgement when results for an outcome are partially reported (eg, it is stated that the result was non-significant, but no effect estimate or summary statistics are presented).
The eight tools that include an assessment of the risk of bias in selection of the reported result recommend various criteria for a ‘high’ risk judgement (table 5). These include when some outcomes that were not prespecified are added post hoc (in 4 (50%) tools), or when it is likely that the reported result for a particular outcome has been selected, on the basis of the findings, from among multiple outcome measurements or analyses within the outcome domain (in 2 (25%) tools).
General characteristics of studies evaluating measurement properties of included tools
Although we identified 17 studies that evaluated measurement properties of an included tool, psychometric statistics for the risk of reporting bias component were available from only 12 studies43 44 54–60 62 64 66 (the other five studies include only data on properties of the multidimensional tool as a whole31 53 61 63 65; online supplementary table S5). Nearly all of the 12 studies (11; 92%) evaluated inter-rater agreement between two assessors; eight of these studies reported weighted κ values, but only two described the weighting scheme.55 62 Eleven studies43 44 54–60 64 66 evaluated the measurement properties of tools for assessing risk of bias in a study due to selective non-reporting or risk of bias in selection of the reported result; in these 11 studies, a median of 40 (IQR 32–109) studies were assessed. One study62 evaluated a tool for assessing risk of bias in a synthesis due to selective publication, in which 44 syntheses were assessed.
Results of evaluation studies
Five studies54 56–58 60 included data on the inter-rater agreement of assessments of risk of bias due to selective non-reporting using the Cochrane risk of bias tool for randomised trials21 (table 6). Weighted κ values in four studies54 56–58 ranged from 0.13 to 0.50 (sample size ranged from 87 to 163 studies), suggesting slight to moderate agreement.27 In the other study,60 the per cent agreement in selective non-reporting assessments in trials that were included in two different Cochrane reviews was low (43% of judgements were in agreement). Two other studies found that inter-rater agreement of selective non-reporting assessments was substantial for SYRCLE’s RoB tool (κ=0.62, n=32),43 but poor for the RoBANS tool (κ=0, n=39).44 There was substantial agreement between raters in the assessment of risk of bias due to selective publication using the SAQAT (κ=0.63, n=29).62 The inter-rater agreement of assessments of risk of bias in selection of the reported result using the ROBINS-I tool4 was moderate for NRSI included in a review of the effect of cyclooxygenase-2 inhibitors on cardiovascular events (κ=0.45, n=21), and substantial for NRSI included in a review of the effect of thiazolidinediones on cardiovascular events (κ=0.78, n=16).55
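The agreement labels used in this section follow the Landis and Koch bands given in the Methods. As a rough check, the mapping can be sketched as below. The function name `landis_koch_category` is hypothetical. Two choices are assumptions, because the bands as stated here do not cover them: values at or below zero are treated as ‘poor’, and values falling between two bands are assigned to the lower band's upper limit inclusively.

```python
def landis_koch_category(kappa: float) -> str:
    """Map a kappa estimate to the agreement bands described in the Methods."""
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1")
    if kappa <= 0.0:
        return "poor"            # assumption: negative values are also 'poor'
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Kappa values reported in this section, keyed by the tool they refer to.
reported = {
    "Cochrane RoB tool (lowest)": 0.13,
    "Cochrane RoB tool (highest)": 0.50,
    "SYRCLE's RoB tool": 0.62,
    "RoBANS": 0.0,
    "SAQAT": 0.63,
    "ROBINS-I (cyclooxygenase-2 inhibitors review)": 0.45,
    "ROBINS-I (thiazolidinediones review)": 0.78,
}

labels = {tool: landis_koch_category(k) for tool, k in reported.items()}
for tool, label in labels.items():
    print(f"{tool}: {label}")
```

Applied to the reported values, this mapping gives the same labels as the text: slight to moderate for the Cochrane tool, substantial for SYRCLE’s RoB tool and the SAQAT, poor for RoBANS, and moderate and substantial for the two ROBINS-I evaluations.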
Discussion
From a systematic search of the literature, we identified 18 tools designed for use by individuals performing evidence syntheses to assess risk of reporting biases in the included studies or in their synthesis of studies. The tools varied with regard to the type of reporting bias assessed (eg, bias due to selective publication, bias due to selective non-reporting), and the level of assessment (eg, for the study as a whole, a particular outcome within a study or a particular synthesis of studies). Various criteria are used across tools to designate a synthesis as being at ‘high’ risk of bias due to selective publication (eg, evidence of funnel plot asymmetry, use of non-comprehensive searches). However, the relative weight assigned to each criterion in the overall judgement is not clear for most of these tools. Tools for assessing risk of bias due to selective non-reporting guide users to assess a study, or an outcome within a study, as ‘high’ risk of bias if no results are reported for an outcome. However, assessing the corresponding risk of bias in a synthesis that is missing the non-reported outcomes is outside the scope of most of these tools. Inter-rater agreement estimates were available for five tools,4 21 43 44 62 and ranged from poor to substantial; however, the sample sizes of most evaluations were small, and few described the weighting scheme used to calculate κ.
Strengths and limitations
There are several strengths of this research. Methods were conducted in accordance with a systematic review protocol (https://osf.io/9ea22/). Published articles were identified by searching several relevant databases using a search string developed in conjunction with an information specialist,17 and by contacting experts to identify tools missed by the search. Detailed information on the content and measurement properties of existing tools was collected, providing readers with pertinent information to help decide which tools to use in future reviews. However, the findings need to be considered in light of some limitations. Screening of articles and data collection were performed by one author only. It is therefore possible that some relevant articles were missed, or that errors in data collection were made. The search for unpublished tools was not comprehensive (only Google Scholar was searched), so it is possible that other tools for assessing risk of reporting biases exist. Further, restricting the search to articles in English was done to expedite the review process, but may have resulted in loss of information about tools written in other languages, and additional evidence on measurement properties of tools.
Comparison with other studies
Other systematic reviews of risk of bias tools12–17 have restricted inclusion to tools developed for particular study designs (eg, randomised trials, diagnostic test accuracy studies), where the authors recorded all the sources of bias addressed. A different approach was taken in the current review, where all tools (regardless of study design) that address a particular source of bias were examined. By focusing on one source of bias only, the analysis of included items and criteria for risk of bias judgements was more detailed than that recorded previously. Some of the existing reviews of tools15 considered tools that were developed or modified in the context of a specific systematic review. However, such tools were excluded from the current review as they are unlikely to have been developed systematically,15 67 and are difficult to find (all systematic reviews conducted during a particular period would need to have been examined for the search to be considered exhaustive).
Explanations and implications
Of the 18 tools identified, only four (22%) included a mechanism for assessing risk of bias due to selective publication, which is the type of reporting bias that has been investigated by methodologists most often.2 This is perhaps unsurprising given that hundreds of statistical methods to ‘detect’ or ‘adjust’ for bias due to selective publication have been developed.18 These statistical methods may be considered by methodologists and systematic reviewers as the tools of choice for assessing this type of bias. However, application of these statistical methods without considering other factors (eg, existence of registered but unpublished studies, conflicts of interest that may influence investigators to not disseminate studies with unfavourable results) is not sufficiently comprehensive, and could lead to incorrect conclusions about the risk of bias due to selective publication. Further, there are many limitations of these statistical approaches, in terms of their underlying assumptions, statistical power, which is often low because most meta-analyses include few studies,7 and the need for specialist statistical software to apply them.19 68 These factors may have limited their use in practice and potentially explain why a large number of systematic reviewers currently ignore the risk of bias due to selective publication.7–9 69
Our analysis suggests that the factors that need to be considered to assess risk of reporting biases adequately (eg, comprehensiveness of the search, amount of data missing from the synthesis due to unpublished studies and under-reported outcomes) are fragmented. A similar problem was occurring a decade ago with the assessment of risk of bias in randomised trials. Some authors assessed only problems with randomisation, while others focused on whether trials were not ‘double blinded’ or had any missing participant data.70 It was not until all the important bias domains were brought together into a structured, domain-based tool to assess the risk of bias in randomised trials,21 that systematic reviewers started to consider risk of bias in trials comprehensively. A similar initiative to link all the components needed to judge the risk of reporting biases into a comprehensive new tool may improve the credibility of evidence syntheses.
In particular, there is an emergent need for a new tool to assess the risk that a synthesis is affected by reporting biases. This tool could guide users to consider risk of bias in a synthesis due to both selective publication and selective non-reporting, given that both practices lead to the same consequence: evidence missing from the synthesis.11 Such a tool would complement recently developed tools for assessing risk of bias within studies (the RoB 2.0 tool41 and the ROBINS-I tool,4 which include a domain for assessing the risk of bias in selection of the reported result, but no mechanism to assess risk of bias due to selective non-reporting). Careful thought would need to be given as to how to weigh up various pieces of information underpinning the risk of bias judgement. For example, users will need guidance on how evidence of known, unpublished studies (as identified from trial registries, protocols or regulatory documents) should be considered alongside evidence that is more speculative (eg, funnel plots suggesting that studies may be missing). Further, guidance for the tool will need to emphasise the value of seeking documents other than published journal articles (eg, protocols) to inform risk of bias judgements. Preparation of a detailed guidance manual may enhance the usability of the tool, minimise misinterpretation and increase reliability in assessments. Once developed, evaluations of the measurement properties of the tool, such as inter-rater agreement and construct validity, should be conducted to explore whether modifications to the tool are necessary.
Conclusions
There are several limitations of existing tools for assessing risk of reporting biases in studies or syntheses of studies, in terms of their scope, guidance for reaching risk of bias judgements and measurement properties. Development and evaluation of a new, comprehensive tool could help overcome present limitations.
References
Footnotes
Contributors MJP conceived and designed the study, collected data, analysed the data and wrote the first draft of the article. JEM and JPTH provided input on the study design and contributed to revisions of the article. All authors approved the final version of the submitted article.
Funding MJP is supported by an Australian National Health and Medical Research Council (NHMRC) Early Career Fellowship (1088535). JEM is supported by an NHMRC Australian Public Health Fellowship (1072366). JPTH is funded in part by Cancer Research UK Programme Grant C18281/A19169; is a member of the MRC Integrative Epidemiology Unit at the University of Bristol, which is supported by the UK Medical Research Council and the University of Bristol (grant MC_UU_12013/9); and is a member of the MRC ConDuCT-II Hub (Collaboration and innovation for Difficult and Complex randomised controlled Trials In Invasive procedures; grant MR/K025643/1).
Competing interests JPTH led or participated in the development of four of the included tools (the current Cochrane risk of bias tool for randomised trials, the RoB 2.0 tool for assessing risk of bias in randomised trials, the ROBINS-I tool for assessing risk of bias in non-randomised studies of interventions and the framework for assessing quality of evidence from a network meta-analysis). MJP participated in the development of one of the included tools (the RoB 2.0 tool for assessing risk of bias in randomised trials). All authors are participating in the development of a new tool for assessing risk of reporting biases in systematic reviews.
Patient consent Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement The study protocol, data collection form, and the raw data and statistical analysis code for this study are available on the Open Science Framework: https://osf.io/3jdaa/