
How does under-reporting of negative and inconclusive results affect the false-positive rate in meta-analysis? A simulation study
Michal Kicinski
Department of Science, Hasselt University, Diepenbeek, Belgium
Correspondence to M Kicinski; michal.kicinski{at}uhasselt.be

Abstract

Objective To investigate the impact of a higher publishing probability for statistically significant positive outcomes on the false-positive rate in meta-analysis.

Design Meta-analyses of different sizes (N=10, N=20, N=50 and N=100), levels of heterogeneity and levels of publication bias were simulated.

Primary and secondary outcome measures The type I error rate for the test of the mean effect size (ie, the rate at which the meta-analyses showed that the mean effect differed from 0 when it in fact equalled 0) was estimated. Additionally, the power and type I error rate of publication bias detection methods based on the funnel plot were estimated.

Results In the presence of a publication bias characterised by a higher probability of including statistically significant positive results, the meta-analyses frequently concluded that the mean effect size differed from zero when it actually equalled zero. The magnitude of the effect of publication bias increased with an increasing number of studies and between-study variability. A higher probability of including statistically significant positive outcomes introduced little asymmetry to the funnel plot. A publication bias of a sufficient magnitude to frequently overturn the meta-analytic conclusions was difficult to detect by publication bias tests based on the funnel plot. When statistically significant positive results were four times more likely to be included than other outcomes and a large between-study variability was present, more than 90% of the meta-analyses of 50 and 100 studies wrongly showed that the mean effect size differed from zero. In the same scenario, publication bias tests based on the funnel plot detected the bias at rates not exceeding 15%.

Conclusions This study adds to the evidence that publication bias is a major threat to the validity of medical research and supports the usefulness of efforts to limit publication bias.


Strengths and limitations of this study

  • This is the first study that evaluated both the impact of publication bias on the conclusions from meta-analysis and the ability of publication bias methods to detect publication bias in the same meta-analysis samples.

  • The model for publication bias was realistic since it was based on empirical research on publication bias in the medical literature.

  • Selection models were not considered in this study because their relatively large computational burden made it impossible to incorporate them in the simulations, which involved analysing hundreds of thousands of samples.

Introduction

The tendency to decide whether to publish a study based on its results is commonly referred to as publication bias. Clearly, when some study outcomes are more likely to be reported than others, the available literature may be misleading. Research under-reporting has long been recognised as a potential source of bias.1–3

Meta-analysis, a tool that allows researchers to summarise the findings from multiple studies in a single estimate, plays an important role in the era of evidence-based decision-making. A key assumption of the standard meta-analysis model is that the sample of retrieved studies is representative of all conducted studies.4–6 Publication bias distorts the sample of studies that is available for a meta-analysis, thereby violating that assumption.7 Indeed, a growing body of evidence suggests that publication bias is present in many meta-analyses.8–11

Deciding whether to publish a study based on the statistical significance and the direction of the effect is the best-documented form of publication bias in the medical literature.12 ,13 Investigators who followed research projects from the submission of the study protocols to ethics committees and medical agencies through to the publication of the results demonstrated that statistically significant positive results are often several times more likely to be published than other results.13–15 Consistent with this evidence, a recent study observed that statistically significant findings favouring treatment were often several times more likely to enter meta-analyses of clinical trials than other findings.16

The effect of publication bias on the validity of meta-analytic conclusions remains largely unexplored. Hedges17 showed that censoring all non-significant results induces a strong bias when conclusions are drawn from multiple studies. Simulation studies have demonstrated that the standard meta-analysis model produces biased estimates of the mean effect size when publication bias is present.18–20 The conclusions from meta-analyses are sometimes inconsistent with the results of large studies, and publication bias is a likely cause of this inconsistency.21–24

The validity of any statistical procedure requires a low rate of false-positive findings. In meta-analysis, a low type I error rate (ie, the rate at which a meta-analysis leads to the conclusion that the mean effect differs from 0 when it in fact equals 0) is particularly important because a meta-analytic conclusion is assumed to summarise the existing evidence. In a meta-analysis of clinical trials, a false-positive result may lead to the conclusion that a treatment is beneficial when it is in fact less effective than the available alternatives.25 In general, a false-positive finding from a meta-analysis misinforms doctors, scientists and policymakers, potentially causing waste or even harm.

The aim of this study was to investigate the impact of a higher publishing probability for statistically significant positive outcomes on the type I error rate in meta-analysis. A simulation approach was used because the effect of publication bias on the conclusions from meta-analysis can only be evaluated when the exact nature of the selection process is known.

Methods

Data from individual studies

Meta-analyses of clinical trials with two arms and a binary outcome were simulated. The results are nevertheless applicable to other study designs because the distribution of the log-OR is approximately normal, as are the distributions of other commonly used effect size measures. Similar to another simulation study,26 the sample size was modelled as the exponential of a normally distributed variable. This approach gives a right-skewed distribution, which is a realistic model. Based on the characteristics of the meta-analyses from the Cochrane Database of Systematic Reviews,27 a mean of 4.51 and a variance of 1.47 were chosen for this normal distribution. With these values, the median sample size equalled 91 and the IQR was 166. Following other simulation studies,19 ,20 ,26 ,28 ,29 equal sizes were used for the treatment group and control group.
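The sample-size model can be illustrated with a minimal Python sketch. It is not the author's R code (available on figshare); it assumes that the simulated value is the total trial size and that it is rounded to an integer (with a floor of 2 so that each arm has at least one participant). The analytic median and IQR of exp(X), X ~ N(4.51, 1.47), reproduce the values quoted above.

```python
import math
import random
from statistics import NormalDist

# Mean and variance of the normal distribution on the log scale (from the text).
MU, VAR = 4.51, 1.47
SIGMA = math.sqrt(VAR)


def sample_total_size(rng: random.Random) -> int:
    """Draw a total trial size as exp(N(MU, VAR)).

    Rounding to an integer and the floor of 2 are assumptions for this sketch.
    """
    return max(2, round(math.exp(rng.gauss(MU, SIGMA))))


def lognormal_median_iqr(mu: float, sigma: float) -> tuple:
    """Exact median and IQR of exp(X) with X ~ N(mu, sigma^2)."""
    z75 = NormalDist().inv_cdf(0.75)  # upper quartile of N(0, 1), about 0.6745
    median = math.exp(mu)
    iqr = math.exp(mu + z75 * sigma) - math.exp(mu - z75 * sigma)
    return median, iqr


median, iqr = lognormal_median_iqr(MU, SIGMA)
print(f"median ~ {median:.0f}, IQR ~ {iqr:.0f}")
```

Because exp is monotone, the quantiles of the sample size are simply the exponentiated quantiles of the normal distribution, which is why the median is exp(4.51) ≈ 91.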

As in other simulation studies,19 ,20 ,26 the probability of the event in the control group (pC) was sampled from a uniform distribution U(0.3, 0.7). The probability of the event in the treatment group (pT) was calculated from the equation logit(pT) = logit(pC) + δ + θ, where δ was the effect of study-specific characteristics on the log-OR, and θ was the mean effect size. The mean effect size equalled 0 because the effect of publication bias on the type I error rate for the test of the mean effect size was investigated. I sampled δ from a normal distribution N(0, τ²). For the between-study variability, τ², the values 0.02, 0.12 and 0.9 were considered. These values are the 10th, 50th and 90th centiles of the predictive distribution of the between-study variability in the meta-analyses of clinical trials from the Cochrane database.30 The size of the between-study variability is often expressed in terms of I², defined as the proportion of the total variability due to heterogeneity.31 The considered values of τ² correspond to I²=17%, I²=56% and I²=90%. The number of events in each of the treatment and control groups was sampled from a binomial distribution.
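One simulated trial can be sketched as follows. The Woolf variance of the log-OR and the 0.5 continuity correction for zero cells are assumptions made for illustration, since the text does not say how zero cells were handled; the function names are hypothetical.

```python
import math
import random


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def binomial(rng: random.Random, n: int, p: float) -> int:
    """Binomial draw by summing Bernoulli trials (simple rather than fast)."""
    return sum(rng.random() < p for _ in range(n))


def simulate_trial(rng: random.Random, n_per_arm: int, tau2: float, theta: float = 0.0):
    """Return (log-OR estimate, its variance) for one simulated two-arm trial."""
    p_c = rng.uniform(0.3, 0.7)               # control-group event probability
    delta = rng.gauss(0.0, math.sqrt(tau2))   # study-specific deviation, N(0, tau^2)
    p_t = expit(logit(p_c) + delta + theta)   # logit(pT) = logit(pC) + delta + theta
    e_t = binomial(rng, n_per_arm, p_t)       # events in the treatment arm
    e_c = binomial(rng, n_per_arm, p_c)       # events in the control arm
    cells = [e_t, n_per_arm - e_t, e_c, n_per_arm - e_c]
    if 0 in cells:                            # assumed 0.5 continuity correction
        cells = [x + 0.5 for x in cells]
    a, b, c, d = cells
    log_or = math.log((a * d) / (b * c))
    var = 1 / a + 1 / b + 1 / c + 1 / d       # Woolf variance of the log-OR
    return log_or, var
```

Because the logit difference between the arms is exactly δ + θ, the true study-specific log-OR is δ + θ, and with θ = 0 the simulated estimates are centred on zero.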

Selection process

The relative risk (RR) was defined as the ratio of the probability of including statistically significant positive results to the probability of including other results. Although only a preference for statistically significant positive results was simulated, the conclusions of the study apply equally to a higher publishing probability for statistically significant negative outcomes. A conventional two-sided significance level of 0.05 was assumed. Three values of RR were considered: 1, 4 and 10. For RR=1, no publication bias was present. A value of 4 was chosen because multiple studies on publication bias estimated the ratio of the probability of publishing studies showing statistically significant positive results to the probability of publishing other results as close to four.13–15 A value of 10 represents a strong publication bias that is still plausible in the light of the empirical research on publication bias in the medical literature.13 ,16 ,32
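The selection process can be sketched with rejection sampling: studies are drawn one at a time, a statistically significant positive result is always kept, and any other result is kept with probability 1/RR. Only the ratio RR is given in the text, so setting the absolute inclusion probability of significant positive results to 1 is an assumption; when studies are drawn until N are included, only the ratio matters. `draw_study` is a hypothetical callable returning (log-OR, variance).

```python
import random
from statistics import NormalDist

# Two-sided alpha = 0.05, so "significant positive" means z > 1.96.
Z_CRIT = NormalDist().inv_cdf(0.975)


def is_significant_positive(log_or: float, var: float) -> bool:
    return log_or / var ** 0.5 > Z_CRIT


def select_studies(rng: random.Random, draw_study, n_included: int, rr: float):
    """Draw studies until n_included are 'published'.

    Significant positive results are always kept; all other results are kept
    with probability 1/rr, so they are rr times less likely to be included.
    """
    included = []
    while len(included) < n_included:
        log_or, var = draw_study(rng)
        if is_significant_positive(log_or, var) or rng.random() < 1.0 / rr:
            included.append((log_or, var))
    return included
```

With RR=1 every drawn study is kept, which reproduces the no-bias scenario.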

Publication bias detection

A meta-analysis is often accompanied by an investigation of the presence of publication bias. Therefore, publication bias tests were incorporated in the simulations. The funnel plot is a scatter plot of effect estimates against some measure of precision. In the absence of bias, the effect estimates from smaller studies scatter widely at the bottom of the funnel plot, with the spread narrowing among larger studies, so that the plot resembles a symmetrical inverted funnel.33 If bias is present, funnel plots are often asymmetrical.33 ,34 Since funnel plot asymmetry is commonly used to investigate the presence of publication bias,35 the funnel plots were inspected visually and using the following formal tests:

  • Egger's test, ‘Egger’;34

  • the rank correlation test, ‘Rank’;36

  • a modified Egger's test based on the efficient score, ‘Harbord’;28

  • a regression test based on sample size, ‘Peters’;26

  • a rank correlation test for binary data, ‘Schwarzer’;37

  • Egger's test based on the arcsine transformation, ‘Arc-Egger’;38

  • a rank correlation test based on the arcsine transformation, ‘Arc-rank’;38

  • the trim and fill method, ‘Trim’.39
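To illustrate how the regression-based tests work, Egger's test can be sketched as an ordinary least-squares regression of the standardised effect (log-OR divided by its standard error) on precision (the reciprocal of the standard error), followed by a two-sided t-test of the intercept on k − 2 degrees of freedom. To stay within the Python standard library, the t distribution is evaluated through the regularised incomplete beta function using a continued-fraction method; an established statistical library would be preferable in practice. All function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularised incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """P(|T| >= |t|) for Student's t with df degrees of freedom."""
    return betai(df / 2.0, 0.5, df / (df + t * t))


def egger_test(log_ors, variances):
    """Regress y/se on 1/se and test whether the intercept differs from 0."""
    k = len(log_ors)
    se = [v ** 0.5 for v in variances]
    x = [1.0 / s for s in se]                    # precision
    z = [y / s for y, s in zip(log_ors, se)]     # standardised effect
    xbar, zbar = sum(x) / k, sum(z) / k
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxz = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = zbar - slope * xbar
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    se_int = math.sqrt(rss / (k - 2) * (1.0 / k + xbar ** 2 / sxx))
    return intercept, t_two_sided_p(intercept / se_int, k - 2)
```

A symmetric funnel gives an intercept near 0; when small studies report systematically larger effects, the intercept moves away from 0.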

For all tests, a significance level of 0.05 was used. For ‘Egger’, ‘Rank’, ‘Harbord’, ‘Peters’, ‘Schwarzer’, ‘Arc-Egger’ and ‘Arc-rank’ two-sided tests were used. For the trim and fill method, the presence of publication bias was indicated when the number of missing studies estimated by the R estimator in the first step of the algorithm was greater than 3.39
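The trim and fill decision rule can be sketched as follows. The deviations of the study estimates from a pooled mean are ranked by absolute size; γ is the length of the run of the largest ranks that all lie on the positive side, and the first-step estimate of the number of missing studies is R0 = γ − 1 (truncated at 0). Centring on the fixed-effect mean and assuming that missing studies lie on the negative side (as they would under a preference for significant positive results) are assumptions of this sketch; the function names are hypothetical.

```python
def fixed_effect_mean(log_ors, variances):
    """Inverse-variance weighted (fixed-effect) mean."""
    w = [1.0 / v for v in variances]
    return sum(wi * yi for wi, yi in zip(w, log_ors)) / sum(w)


def r0_missing_left(log_ors, variances):
    """First-step R0 estimate of the number of studies missing on the left."""
    center = fixed_effect_mean(log_ors, variances)
    # Deviations ordered from the largest to the smallest absolute value.
    devs = sorted((y - center for y in log_ors), key=abs, reverse=True)
    gamma = 0
    for dev in devs:
        if dev > 0:
            gamma += 1          # still inside the run of largest positive deviations
        else:
            break
    return max(0, gamma - 1)


def trim_fill_flags_bias(log_ors, variances, threshold=3):
    """Publication bias is indicated when R0 exceeds the threshold (3 in the text)."""
    return r0_missing_left(log_ors, variances) > threshold
```

Large precise studies near zero hold the centre in place, so a run of imprecise studies with large positive effects produces a large R0.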

Meta-analysis

The mean log-OR was estimated using the random effects model proposed by DerSimonian and Laird, which is a widely used approach to conduct a meta-analysis.40 Four sizes of meta-analyses were considered: N=10, N=20, N=50 and N=100. Meta-analyses including fewer than 10 studies were not considered because publication bias tests are not recommended in this case owing to their low power.33
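The DerSimonian-Laird estimator can be sketched as follows: a moment estimate of τ² is obtained from Cochran's Q, the studies are re-weighted by 1/(vᵢ + τ²), and the random-effects mean is tested against 0 with a two-sided z-test. This is a textbook sketch for illustration, not the author's R implementation.

```python
import math
from statistics import NormalDist


def dersimonian_laird(log_ors, variances):
    """Return (mean, its SE, tau^2, two-sided p-value for mean = 0)."""
    k = len(log_ors)
    w = [1.0 / v for v in variances]                    # fixed-effect weights
    sw = sum(w)
    y_fe = sum(wi * yi for wi, yi in zip(w, log_ors)) / sw
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, log_ors))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                  # moment estimator, truncated at 0
    w_re = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(mu / se)))
    return mu, se, tau2, p
```

When Q does not exceed its degrees of freedom, τ² is truncated at 0 and the estimate reduces to the fixed-effect mean.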

Simulations

Four meta-analysis sizes (N=10, N=20, N=50 and N=100), three sizes of the between-study variability (τ²=0.02, τ²=0.12 and τ²=0.9) and three levels of publication bias (RR=1, RR=4 and RR=10) were considered, resulting in 36 simulation scenarios. For each scenario, the estimates of the mean effect size were evaluated in terms of the bias and the mean squared error. The effect of publication bias on the type I error rate for the test of the mean effect size was estimated for a grid of values within the considered ranges of the level of publication bias and the size of between-study variability. A two-sided significance level of 0.05 was assumed.

For each scenario, the power and the type I error rate of the publication bias tests were also investigated. Additionally, I estimated the type I error rate for the test of the mean effect size using only those samples in which no publication bias was found. The purpose of this analysis was to investigate the effect of a one-sided selection process based on the statistical significance on the false-positive rate in meta-analysis in situations where publication bias detection methods cannot identify the bias. All reported estimates are based on 10 000 simulations. The analysis was conducted in R (V.2.15.0). The R code used to perform the simulations is available online (see data sharing statement).
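The pieces of a single scenario can be combined into a compact end-to-end sketch that estimates the type I error rate, bias and mean squared error of the DerSimonian-Laird mean when θ = 0. It is a simplified, hypothetical re-implementation rather than the author's R code; splitting the total size evenly between the arms, the 0.5 continuity correction and the inclusion probability of 1 for significant positive results are assumptions.

```python
import math
import random
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)


def _draw_study(rng, tau2):
    """One simulated trial under theta = 0: returns (log-OR, variance)."""
    total = math.exp(rng.gauss(4.51, math.sqrt(1.47)))
    n_arm = max(1, round(total / 2))                 # assumed: equal arms
    p_c = rng.uniform(0.3, 0.7)
    lo_t = math.log(p_c / (1 - p_c)) + rng.gauss(0.0, math.sqrt(tau2))
    p_t = 1.0 / (1.0 + math.exp(-lo_t))
    e_t = sum(rng.random() < p_t for _ in range(n_arm))
    e_c = sum(rng.random() < p_c for _ in range(n_arm))
    a, b, c, d = e_t, n_arm - e_t, e_c, n_arm - e_c
    if 0 in (a, b, c, d):                            # assumed continuity correction
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def _dl_mean_and_p(ys, vs):
    """DerSimonian-Laird mean and two-sided p-value for mean = 0."""
    w = [1 / v for v in vs]
    sw = sum(w)
    y_fe = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, ys))
    tau2 = max(0.0, (q - (len(ys) - 1)) / (sw - sum(wi * wi for wi in w) / sw))
    w_re = [1 / (v + tau2) for v in vs]
    mu = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    z = mu * math.sqrt(sum(w_re))
    return mu, 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_scenario(n_studies, tau2, rr, n_sim, seed=0):
    """Return (type I error rate, bias, MSE) of the DL mean when theta = 0."""
    rng = random.Random(seed)
    rejections, estimates = 0, []
    for _ in range(n_sim):
        ys, vs = [], []
        while len(ys) < n_studies:
            y, v = _draw_study(rng, tau2)
            # Keep significant positive results always, others with probability 1/rr.
            if y / math.sqrt(v) > Z_CRIT or rng.random() < 1 / rr:
                ys.append(y)
                vs.append(v)
        mu, p = _dl_mean_and_p(ys, vs)
        rejections += p < 0.05
        estimates.append(mu)
    bias = sum(estimates) / n_sim                    # the true mean effect is 0
    mse = sum(m * m for m in estimates) / n_sim
    return rejections / n_sim, bias, mse
```

This pure-Python sketch is slow, so running it with 10 000 simulations per scenario, as in the study, would take considerable time. It is meant to show the logic of a scenario rather than to reproduce figure 1 or table 1.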

Results

Validity of the mean effect size estimates

Figure 1 shows the type I error rates for the test of the mean effect size for the range of the level of publication bias and the amount of between-study variability considered in the study. In the presence of a selection process characterised by a higher probability of including statistically significant positive results, the meta-analyses frequently concluded that the mean effect size differed from zero when it in fact equalled zero. The magnitude of the effect of publication bias increased with an increasing number of studies and the amount of between-study variability. When statistically significant positive results were four times more likely to be included than other results, the type I error rate was between 11% and 100%. When statistically significant positive results were 10 times more likely to be included, between 25% and 100% of the meta-analyses concluded that the mean effect size differed from zero when it in fact equalled 0 (figure 1).

Figure 1

The effect of a higher probability of inclusion for statistically significant positive outcomes on the type I error rate for the test of the mean effect size in a meta-analysis of (A) 10 studies, (B) 20 studies, (C) 50 studies, (D) 100 studies. RR: the ratio of the probability of including statistically significant positive outcomes to the probability of including negative and/or not statistically significant outcomes.

A higher probability of including statistically significant positive outcomes led to a drastic increase in the bias and the mean squared error, especially when a large between-study variability was present (table 1). When statistically significant positive results were four times more likely to be included than other results and 90% of the variability was due to between-study differences, the random-effects meta-analysis overestimated the mean log-OR by approximately 0.5 on average. When statistically significant positive results were 10 times more likely to be included and the same amount of between-study variability was present, the random-effects meta-analysis overestimated the mean log-OR by 0.83 on average. The mean squared error was especially large when the between-study variability was large (table 1).

Table 1

Validity of estimates of the mean effect size

Publication bias detection

Next, I investigated whether a one-sided selection process based on the statistical significance (which caused a drastic increase in the false-positive rate of the meta-analyses, as described in the previous section) was detectable by different publication bias methods.

Figure 2 shows data from simulations without publication bias (A and B) and simulations in which statistically significant positive results were 10 times more likely to be included than other results (C and D). A visual examination of the funnel plots indicated that a one-sided selection process based on the statistical significance introduced little asymmetry to the funnel plot both when the between-study variability was small (compare figure 2A, C) and large (compare figure 2B, D). In other words, the funnel plot provided no evidence of publication bias when positive statistically significant results were 10 times more likely to be included than other results.

Figure 2

A funnel plot of simulated data when: (A) the probability of inclusion was the same for all outcomes and a small between-study variability was present (τ²=0.02), (B) the probability of inclusion was the same for all outcomes and a large between-study variability was present (τ²=0.9), (C) statistically significant positive outcomes were 10 times more likely to be included than other outcomes and a small between-study variability was present (τ²=0.02), (D) statistically significant positive outcomes were 10 times more likely to be included than other outcomes and a large between-study variability was present (τ²=0.9).

Table 2 gives the proportions of the meta-analyses in which the presence of publication bias was indicated by formal tests. The scenarios with publication bias (RR=4 and RR=10) provide estimates of the power of different tests to detect a one-sided selection process based on the statistical significance. The scenarios without publication bias provide estimates of the type I error rate (the rate at which publication bias was indicated when no publication bias was present). When statistically significant positive results were four times more likely to be included than other results, no method indicated the presence of publication bias in more than 15% of the meta-analyses for any simulation setting (table 2). When statistically significant positive results were 10 times more likely to be included, the power of publication bias detection methods did not exceed 30% for any simulation setting. The type I error rates for the ‘Egger’, ‘Harbord’ and ‘Arc-Egger’ tests substantially exceeded 0.05 for some simulation settings, especially when a large between-study variability was present.

Table 2

Power and type I error rate of publication bias detection methods

False-positive rate in meta-analyses in which no publication bias was found

For the completeness of the study, I repeated the investigation of the effect of a selection process based on the statistical significance on the type I error rate for the test of the mean effect size using only those samples in which a certain publication bias test did not show evidence of publication bias. The aim of this analysis was to study whether a one-sided selection process based on the statistical significance threatened the validity of those meta-analyses where no evidence of publication bias was apparent. For example, meta-analyses were simulated until 10 000 samples were identified in which the ‘Egger’ test did not show any evidence of publication bias. Next, those samples were used to estimate the rate at which the meta-analysis led to the conclusion that the mean effect size differed from 0 when it actually did not, under a selection process based on the statistical significance that could not be detected by the ‘Egger’ test. Table 3 compares the proportion of meta-analyses incorrectly showing that the mean effect size differed from zero among all samples (column ‘All’) and among samples where no publication bias was found. There was little difference in the type I error rate for the test of the mean effect size between the meta-analyses without evidence of publication bias and all meta-analyses.

Table 3

Type I error rate for the test for the mean effect size when no evidence of bias was present
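The conditional procedure described above can be sketched generically: meta-analyses are simulated until a target number of them pass a given publication bias test, and the rejection rate of the test of the mean effect size is computed among those samples only. The three callables are hypothetical placeholders for a scenario simulator (such as a rejection-sampling selection step), a bias test (such as ‘Egger’) and the test of the mean effect size.

```python
import random
from typing import Callable, List, Tuple

Study = Tuple[float, float]  # (log-OR, variance)


def conditional_type_i_error(
    simulate_meta: Callable[[random.Random], List[Study]],
    bias_test_rejects: Callable[[List[Study]], bool],
    mean_test_rejects: Callable[[List[Study]], bool],
    target: int,
    seed: int = 0,
) -> float:
    """Share of false positives among samples showing no evidence of bias."""
    rng = random.Random(seed)
    kept = rejected = 0
    while kept < target:
        studies = simulate_meta(rng)
        if bias_test_rejects(studies):
            continue                  # bias detected: discard this sample
        kept += 1
        rejected += mean_test_rejects(studies)
    return rejected / kept
```

Discarding samples in which the bias is detected, rather than fixing the total number of simulations, keeps the number of retained samples equal to the target (10 000 in the study) for every test.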

Discussion

The results of these realistic simulations demonstrate that when a one-sided selection process based on the statistical significance is present, the false-positive rate in meta-analysis dramatically increases. The magnitude of the problem increases with an increasing number of studies used and the amount of heterogeneity. When statistically significant positive results were four times more likely to be included in the meta-analyses than other results, the false-positive rate was between 11% and 100%. When statistically significant positive results were 10 times more likely to be included, between 25% and 100% of the meta-analyses wrongly concluded that the mean effect size differed from zero.

Publication bias tests based on the funnel plot were unlikely to detect a publication bias of a sufficient magnitude to frequently overturn the meta-analytic conclusions. For example, when statistically significant positive results were four times more likely to be included and a large between-study variability was present, more than 90% of the meta-analyses of 50 and 100 studies wrongly concluded that the mean effect size differed from zero. In the same scenario, all publication bias tests based on the funnel plot detected the bias at rates not exceeding 15%. The power of the tests did not exceed 30% for any simulation setting. In general, Egger's test,34 the modified Egger's test based on the efficient score28 and Egger's test based on the arcsine transformation38 showed the highest power. However, the type I error rate of these tests substantially exceeded 0.05, especially when a large between-study variability was present.

Many selection processes are known to introduce a considerable amount of asymmetry into the funnel plot. For example, when the studies with the most extreme negative effect estimates fail to enter a meta-analysis, a test based on the R estimator from the trim and fill method provides a powerful tool to detect this bias.39 In addition to the type of selection process, the mean effect size also determines the performance of publication bias detection methods. Several studies considering different selection processes have observed that tests based on the funnel plot have low power when the mean effect size equals zero.26 ,41 The current study shows that this is also the case for a one-sided publication bias based on the statistical significance.

A higher probability of including statistically significant positive results caused a large increase in the type I error rate for the test of the mean effect size even in those meta-analyses where publication bias tests did not detect the bias. This result demonstrates that under-reporting of negative and non-significant results also threatens the validity of meta-analyses in which publication bias cannot be found by the methods based on the funnel plot.

The most common approaches to address publication bias in a meta-analysis include ignoring the issue and applying methods based on the funnel plot.35 The current study demonstrates that when a one-sided publication bias based on the statistical significance is possibly present, the issue should never be ignored because this bias causes a severe increase in the false-positive rate in meta-analysis. Moreover, the study shows that the methods based on the funnel plot are not appropriate to address the problem because a selection process based on the statistical significance introduces little asymmetry into the funnel plot when the mean effect size equals zero. Parametric 16 ,42 ,43 and non-parametric 44 ,45 selection models may be an attractive alternative to the methods based on the funnel plot. In a recent study with settings based on characteristics of large meta-analyses from major medical journals, a Bayesian hierarchical selection model outperformed methods based on the funnel plot.16 Future research should compare the performance of different selection models and methods based on the funnel plot in a wider range of scenarios. Selection models were not considered in this study because their relatively large computational burden made it impossible to incorporate them in the simulations, which involved analysing hundreds of thousands of samples.

Many recent developments enhance complete and unbiased reporting of clinical trials. The International Committee of Medical Journal Editors began to require trial registration as a condition for publication in 2005. In 2008, the 59th World Medical Association (WMA) General Assembly stated that clinical trials must be registered prospectively and called a public disclosure of positive, negative and inconclusive results an author's duty. The results of this study add to the evidence that publication bias is a major threat to the validity of conclusions from medical research and strongly support the usefulness of the efforts to limit publication bias.

Conclusions

Under-reporting of negative and inconclusive results, which was demonstrated by studies on publication bias, represents a major threat to the validity of meta-analysis. A higher probability of including statistically significant positive outcomes causes a severe increase of the false-positive rate in meta-analysis. Moreover, a one-sided selection process based on the statistical significance of a sufficient magnitude to dramatically bias meta-analysis conclusions is poorly detectable by publication bias methods based on the funnel plot when the mean effect size equals 0. Future research is needed to compare the performance of these methods with selection models. The study supports the usefulness of initiatives aiming to reduce publication bias in the medical literature.

References


Footnotes

  • Funding This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. MK is a PhD fellow at the Research Foundation-Flanders (FWO).

  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement The R code that was used to perform the simulations is available on figshare at: http://www.dx.doi.org/10.6084/m9.figshare.1119702.
