Abstract
Objective To investigate the impact of a higher publishing probability for statistically significant positive outcomes on the false-positive rate in meta-analysis.
Design Meta-analyses of different sizes (N=10, N=20, N=50 and N=100), levels of heterogeneity and levels of publication bias were simulated.
Primary and secondary outcome measures The type I error rate for the test of the mean effect size (ie, the rate at which the meta-analyses showed that the mean effect differed from 0 when it in fact equalled 0) was estimated. Additionally, the power and type I error rate of publication bias detection methods based on the funnel plot were estimated.
Results In the presence of a publication bias characterised by a higher probability of including statistically significant positive results, the meta-analyses frequently concluded that the mean effect size differed from zero when it actually equalled zero. The magnitude of the effect of publication bias increased with an increasing number of studies and between-study variability. A higher probability of including statistically significant positive outcomes introduced little asymmetry to the funnel plot. A publication bias of a sufficient magnitude to frequently overturn the meta-analytic conclusions was difficult to detect by publication bias tests based on the funnel plot. When statistically significant positive results were four times more likely to be included than other outcomes and a large between-study variability was present, more than 90% of the meta-analyses of 50 and 100 studies wrongly showed that the mean effect size differed from zero. In the same scenario, publication bias tests based on the funnel plot detected the bias at rates not exceeding 15%.
Conclusions This study adds to the evidence that publication bias is a major threat to the validity of medical research and supports the usefulness of efforts to limit publication bias.
Strengths and limitations of this study

This is the first study that evaluated both the impact of publication bias on the conclusions from meta-analysis and the ability of publication bias methods to detect publication bias in the same meta-analysis samples.

The model for publication bias was realistic since it was based on empirical research on publication bias in the medical literature.

Selection models were not considered in this study because their relatively large computational burden made it impossible to incorporate them in the simulations, which involved analysing hundreds of thousands of samples.
Introduction
The tendency to decide whether to publish a study based on its results is commonly referred to as publication bias. Clearly, when some study outcomes are more likely to be reported than others, the available literature may be misleading. The phenomenon of research underreporting has long been recognised as a potential source of bias.1–3
Meta-analysis, a tool that allows researchers to summarise the findings from multiple studies in a single estimate, plays an important role in the era of evidence-based decision-making. A key assumption of the standard meta-analysis model is that the sample of retrieved studies is representative of all conducted studies.4–6 One consequence of publication bias is that it affects the sample of studies that is available for a meta-analysis, thereby violating that assumption.7 Indeed, a growing body of evidence suggests that publication bias is present in many meta-analyses.8–11
Deciding whether to publish a study based on the statistical significance and the direction of the effect is the best-documented form of publication bias in the medical literature.12, 13 Investigators who followed research projects from the submission of the study protocols to ethics committees and medical agencies through to the publication of the results demonstrated that statistically significant positive results are often several times more likely to be published than other results.13–15 Consistent with this evidence, a recent study observed that statistically significant findings favouring treatment were often several times more likely to enter meta-analyses of clinical trials than other findings.16
The effect of publication bias on the validity of meta-analytic conclusions remains largely unexplored. Hedges17 showed that censoring all non-significant results induces a strong bias when conclusions are drawn from multiple studies. Simulation studies have demonstrated that the standard meta-analysis model produces biased estimates of the mean effect size when publication bias is present.18–20 The conclusions from meta-analyses are sometimes inconsistent with the results of large studies, and publication bias is a likely cause of this inconsistency.21–24
The validity of any statistical procedure requires a low rate of false-positive findings. In the case of meta-analysis, a low type I error rate (ie, the rate at which a meta-analysis leads to the conclusion that the mean effect differs from 0 when it in fact equals 0) is particularly important because a meta-analytic conclusion is assumed to summarise the existing evidence. In the context of a meta-analysis of clinical trials, a false-positive result may lead to the conclusion of a beneficial effect from a treatment that is in fact less effective than the available alternatives.25 In general, a false-positive finding from a meta-analysis misinforms doctors, scientists and policymakers, potentially causing wastefulness or even harm.
The aim of this study was to investigate the impact of a higher publishing probability for statistically significant positive outcomes on the type I error rate in meta-analysis. A simulation approach was used because the effect of publication bias on the conclusions from meta-analysis can only be evaluated when the exact nature of the selection process is known.
Methods
Data from individual studies
Meta-analyses of clinical trials with two arms and a binary outcome were simulated. However, the results of the simulations are applicable to other study designs as well because the distribution of the log OR is approximately normal, similarly to the distribution of other commonly used effect size measures. As in another simulation study,26 the sample size was modelled using the exponential of a normal distribution. This approach gives a right-skewed distribution, which is a realistic model. Based on the characteristics of the meta-analyses from the Cochrane Database of Systematic Reviews,27 a mean of 4.51 and a variance of 1.47 were chosen. With these values, the median sample size equalled 91 and the IQR was 166. Following other simulation studies,19, 20, 26, 28, 29 equal sizes were used for the treatment and control groups.
As in other simulation studies,19, 20, 26 the probability of the event in the control group (p^{C}) was sampled from a uniform distribution U(0.3, 0.7). The probability of the event in the treatment group (p^{T}) was calculated from the equation logit(p^{T})=logit(p^{C})+δ+θ, where δ was the effect of study-specific characteristics on the log OR and θ was the mean effect size. The mean effect size equalled 0 because the effect of publication bias on the type I error rate for the test of the mean effect size was investigated. I sampled δ from a normal distribution N(0, τ^{2}). For the between-study variability, τ^{2}, the values 0.02, 0.12 and 0.9 were considered. These values are the 10th, 50th and 90th centiles of the predictive distribution of the between-study variability in the meta-analyses of clinical trials from the Cochrane database.30 The size of the between-study variability is often expressed in terms of I^{2}, defined as the proportion of the total variability due to heterogeneity.31 The considered values of τ^{2} correspond to I^{2}=17%, I^{2}=56% and I^{2}=90%. The number of events in the treatment and control groups was sampled from a binomial distribution.
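The data-generating model above can be sketched as follows. The original simulations were written in R; this is an illustrative Python translation, and the function and variable names are hypothetical, not taken from the study's code.

```python
import math
import random

def simulate_study(tau2=0.12, theta=0.0, rng=random):
    """One simulated two-arm trial under the model described above.

    Total sample size is exp(N(4.51, 1.47)) with equal arms; the control
    event probability is U(0.3, 0.7); the treatment probability follows
    logit(pT) = logit(pC) + delta + theta with delta ~ N(0, tau2).
    """
    n = max(2, round(math.exp(rng.gauss(4.51, math.sqrt(1.47)))))
    n_arm = n // 2                                # equal group sizes
    p_c = rng.uniform(0.3, 0.7)                   # control event probability
    delta = rng.gauss(0.0, math.sqrt(tau2))       # between-study effect
    logit_pt = math.log(p_c / (1 - p_c)) + delta + theta
    p_t = 1.0 / (1.0 + math.exp(-logit_pt))
    # Binomial draws for the number of events in each arm
    events_t = sum(rng.random() < p_t for _ in range(n_arm))
    events_c = sum(rng.random() < p_c for _ in range(n_arm))
    return events_t, events_c, n_arm
```

With theta=0, repeated calls generate studies whose true mean log OR is zero, matching the null scenario investigated in the study.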
Selection process
The relative risk (RR) was defined as the ratio of the probability of including statistically significant positive results to the probability of including other results. However, the conclusions of the study are equally applicable to the case of a higher publishing probability for statistically significant negative outcomes. A conventional two-sided significance level of 0.05 was assumed. Three values of RR were considered: 1, 4 and 10. For RR=1, no publication bias was present. A value of 4 was chosen because multiple studies on publication bias estimated the ratio of the probability of publishing studies showing statistically significant positive results to the probability of publishing other results as close to 4.13–15 A value of 10 represents a strong publication bias and is still relevant in the light of the empirical research on publication bias in the medical literature.13, 16, 32
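One way to realise such a selection process is to always include a statistically significant positive result and to include any other result with probability 1/RR, giving an inclusion-probability ratio of RR. The Python sketch below is illustrative (the names are hypothetical, and the study's R code may implement the selection differently); it assumes significance is judged by the normal approximation for the log OR.

```python
import random

Z_CRIT = 1.959964  # two-sided 0.05 critical value of the standard normal

def keep_study(log_or, se, rr=4.0, rng=random):
    """Decide whether a simulated result enters the meta-analysis.

    A statistically significant positive result (z > 1.96) is always
    kept; any other result is kept with probability 1/rr, so the ratio
    of inclusion probabilities equals rr.
    """
    sig_positive = log_or > 0 and log_or / se > Z_CRIT
    return True if sig_positive else rng.random() < 1.0 / rr
```

Setting rr=1 keeps every result, reproducing the no-bias scenario.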
Publication bias detection
A meta-analysis is often accompanied by an investigation of the presence of publication bias. Therefore, publication bias tests were incorporated in the simulations. The funnel plot is a scatter plot of effect estimates against some measure of precision. In the absence of a bias, the effect estimates from smaller studies scatter widely at the bottom of the funnel plot, with the spread narrowing among larger studies, so that the plot resembles a symmetrical inverted funnel.33 If there is a bias, funnel plots are often asymmetrical.33, 34 Since a funnel plot asymmetry is commonly used to investigate the presence of publication bias,35 the funnel plots were inspected visually and using the following formal tests:

Egger's test, ‘Egger’;34

the rank correlation test, ‘Rank’;36

a modified Egger's test based on the efficient score, ‘Harbord’;28

a regression test based on sample size, ‘Peters’;26

a rank correlation test for binary data, ‘Schwarzer’;37

Egger's test based on the arcsine transformation, ‘ArcEgger’;38

a rank correlation test based on the arcsine transformation, ‘Arcrank’;38

the trim and fill method, ‘Trim’.39
For all tests, a significance level of 0.05 was used. For ‘Egger’, ‘Rank’, ‘Harbord’, ‘Peters’, ‘Schwarzer’, ‘ArcEgger’ and ‘Arcrank’, two-sided tests were used. For the trim and fill method, the presence of publication bias was indicated when the number of missing studies estimated by the R estimator in the first step of the algorithm was greater than 3.39
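As an illustration of the best known of these tests, Egger's regression test checks whether the intercept of a regression of the standardised effect on precision differs from zero. The following is a minimal Python sketch with hypothetical names, not the implementation used in the study:

```python
import math

def egger_t(effects, ses):
    """t statistic for the intercept of Egger's regression test.

    Regresses the standardised effect z_i = y_i / se_i on the precision
    x_i = 1 / se_i by ordinary least squares; a large |t| (judged against
    a t distribution with k-2 df) suggests funnel plot asymmetry.
    """
    k = len(effects)
    z = [y / s for y, s in zip(effects, ses)]
    x = [1.0 / s for s in ses]
    xbar, zbar = sum(x) / k, sum(z) / k
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z)) / sxx
    intercept = zbar - slope * xbar
    resid = [zi - (intercept + slope * xi) for zi, xi in zip(z, x)]
    s2 = sum(r * r for r in resid) / (k - 2)      # residual variance
    se_int = math.sqrt(s2 * (1.0 / k + xbar ** 2 / sxx))
    return intercept / se_int
```

A perfectly symmetrical funnel yields an intercept, and hence a t statistic, of zero.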
Metaanalysis
The mean log OR was estimated using the random-effects model proposed by DerSimonian and Laird, which is a widely used approach to conducting a meta-analysis.40 Four sizes of meta-analyses were considered: N=10, N=20, N=50 and N=100. Meta-analyses including fewer than 10 studies were not considered because publication bias tests are not recommended in this case due to their low power.33
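The DerSimonian and Laird estimator combines a method-of-moments estimate of τ^{2}, derived from Cochran's Q, with an inverse-variance weighted mean. A minimal Python sketch follows (illustrative names; the study's analyses were run in R):

```python
def dersimonian_laird(effects, variances):
    """Random-effects estimate of the mean effect (DerSimonian-Laird).

    tau^2 is the method-of-moments estimate max(0, (Q - (k-1)) / c)
    based on Cochran's Q; the pooled mean then uses weights
    1 / (v_i + tau^2). Returns (pooled mean, its standard error, tau^2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                  # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_re = [1.0 / (v + tau2) for v in variances]      # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return mu, (1.0 / sum(w_re)) ** 0.5, tau2
```

The test of the mean effect size then compares mu / se(mu) against the standard normal distribution.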
Simulations
Four sample sizes (N=10, N=20, N=50 and N=100), three sizes of the between-study variability (τ^{2}=0.02, τ^{2}=0.12 and τ^{2}=0.9) and three levels of publication bias (RR=1, RR=4 and RR=10) were considered, resulting in 36 simulation scenarios. For each scenario, the estimates of the mean effect size were evaluated in terms of the bias and the mean squared error. The effect of publication bias on the type I error rate for the test of the mean effect size was estimated for a grid of values within the considered ranges of the level of publication bias and the size of the between-study variability. A two-sided significance level of 0.05 was assumed.
For each scenario, the power and the type I error rate of the publication bias tests were also investigated. Additionally, I estimated the type I error rate for the test of the mean effect size using only those samples where no publication bias was found. The purpose of this analysis was to investigate the effect of a one-sided selection process based on the statistical significance on the false-positive rate in meta-analysis in situations where publication bias detection methods cannot identify the bias. All reported estimates are based on 10 000 simulations. The analysis was conducted in R (V.2.15.0). The R code used to perform the simulations is available online (see data sharing statement).
Results
Validity of the mean effect size estimates
Figure 1 shows the type I error rates for the test of the mean effect size for the range of the level of publication bias and the amount of between-study variability considered in the study. In the presence of a selection process characterised by a higher probability of including statistically significant positive results, the meta-analyses frequently concluded that the mean effect size differed from zero when it in fact equalled zero. The magnitude of the effect of publication bias increased with an increasing number of studies and the amount of between-study variability. When statistically significant positive results were four times more likely to be included than other results, the type I error rate was between 11% and 100%. When statistically significant positive results were 10 times more likely to be included, between 25% and 100% of the meta-analyses concluded that the mean effect size differed from zero when it in fact equalled zero (figure 1).
A higher probability of including statistically significant positive outcomes led to a drastic increase in the bias and the mean squared error, especially when a large between-study variability was present (table 1). When statistically significant positive results were four times more likely to be included than other results and 90% of the variability was due to between-study differences, the random-effects meta-analysis overestimated the mean log OR by approximately 0.5 on average. When statistically significant positive results were 10 times more likely to be included and the same amount of between-study variability was present, the random-effects meta-analysis overestimated the mean log OR by 0.83 on average. The mean squared error was especially large when the between-study variability was large (table 1).
Publication bias detection
Next, I investigated whether a one-sided selection process based on the statistical significance (which caused a drastic increase in the false-positive rate of the meta-analyses, as described in the previous section) was detectable by different publication bias methods.
Figure 2 shows data from simulations without publication bias (A and B) and simulations in which statistically significant positive results were 10 times more likely to be included than other results (C and D). A visual examination of the funnel plots indicated that a one-sided selection process based on the statistical significance introduced little asymmetry to the funnel plot both when the between-study variability was small (compare figure 2A, C) and large (compare figure 2B, D). In other words, the funnel plot provided no evidence of publication bias when positive statistically significant results were 10 times more likely to be included than other results.
Table 2 gives the proportions of the meta-analyses in which the presence of publication bias was indicated by formal tests. The scenarios with publication bias (RR=4 and RR=10) provide estimates of the power of different tests to detect a one-sided selection process based on the statistical significance. The scenarios without publication bias provide estimates of the type I error rate (the rate at which publication bias was indicated when no publication bias was present). When statistically significant positive results were four times more likely to be included than other results, all methods indicated the presence of publication bias in not more than 15% of the meta-analyses for all simulation settings (table 2). When statistically significant positive results were 10 times more likely to be included, the power of publication bias detection methods did not exceed 30% for any simulation setting. The type I error rates for the ‘Egger’, ‘Harbord’ and ‘ArcEgger’ tests substantially exceeded 0.05 for some simulation settings, especially when a large between-study variability was present.
False-positive rate in meta-analyses in which no publication bias was found
For completeness, I repeated the investigation of the effect of a selection process based on the statistical significance on the type I error rate for the test of the mean effect size using only those samples in which a given publication bias test did not show evidence of publication bias. The aim of this analysis was to study whether a one-sided selection process based on the statistical significance threatened the validity of those meta-analyses where no evidence of publication bias was apparent. For example, meta-analyses were simulated until 10 000 samples were identified in which the ‘Egger’ test did not show any evidence of publication bias. Next, those samples were used to estimate the rate at which the meta-analysis led to the conclusion that the mean effect size differed from 0 when it actually did not, under a selection process based on the statistical significance that could not be detected by the ‘Egger’ test. Table 3 compares the proportion of meta-analyses incorrectly showing that the mean effect size differed from zero among all samples (column ‘All’) and among samples where no publication bias was found. There was little difference in the type I error rate for the test of the mean effect size between the meta-analyses without evidence of publication bias and all meta-analyses.
Discussion
The results of these realistic simulations demonstrate that when a one-sided selection process based on the statistical significance is present, the false-positive rate in meta-analysis dramatically increases. The magnitude of the problem increases with an increasing number of studies used and the amount of heterogeneity. When statistically significant positive results were four times more likely to be included in the meta-analyses than other results, the false-positive rate was between 11% and 100%. When statistically significant positive results were 10 times more likely to be included, between 25% and 100% of the meta-analyses wrongly concluded that the mean effect size differed from zero.
Publication bias tests based on the funnel plot were unlikely to detect a publication bias of a sufficient magnitude to frequently overturn the meta-analytic conclusions. For example, when statistically significant positive results were four times more likely to be included and a large between-study variability was present, more than 90% of the meta-analyses of 50 and 100 studies wrongly concluded that the mean effect size differed from zero. In the same scenario, all publication bias tests based on the funnel plot detected the bias at rates not exceeding 15%. The power of the tests did not exceed 30% for any simulation setting. In general, Egger's test,34 the modified Egger's test based on the efficient score28 and Egger's test based on the arcsine transformation38 showed the highest power. However, the type I error rate of these tests substantially exceeded 0.05, especially when a large between-study variability was present.
Many selection processes are known to introduce a considerable amount of asymmetry to the funnel plot. For example, when studies with the most extreme negative effect estimates fail to enter a meta-analysis, a test based on the R estimator from the trim and fill method provides a powerful tool to detect this bias.39 In addition to the type of selection process, the mean effect size also determines the performance of publication bias detection methods. Several studies considering different selection processes have observed that tests based on the funnel plot are characterised by a low power when the mean effect size equals zero.26, 41 The current study shows that this is also the case for a one-sided publication bias based on the statistical significance.
A higher probability of including statistically significant positive results caused a large increase in the type I error rate for the test of the mean effect size even in those meta-analyses where publication bias tests did not detect the bias. This result demonstrates that the underreporting of negative and non-significant results also threatens the validity of those meta-analyses where publication bias cannot be found by the methods based on the funnel plot.
The most common approaches to addressing publication bias in a meta-analysis include ignoring the issue and applying methods based on the funnel plot.35 The current study demonstrates that when a one-sided publication bias based on the statistical significance is possibly present, the issue should never be ignored because this bias causes a severe increase in the false-positive rate in meta-analysis. Moreover, the study shows that the methods based on the funnel plot are not appropriate to address the problem because a selection process based on the statistical significance introduces little asymmetry to the funnel plot when the mean effect size equals zero. Parametric16, 42, 43 and non-parametric44, 45 selection models may be an attractive alternative to the methods based on the funnel plot. In a recent study with settings based on the characteristics of large meta-analyses from major medical journals, a Bayesian hierarchical selection model outperformed methods based on the funnel plot.16 Future research should compare the performance of different selection models and methods based on the funnel plot in a wider range of scenarios. Selection models were not considered in this study because their relatively large computational burden made it impossible to incorporate them in the simulations, which involved analysing hundreds of thousands of samples.
Many recent developments promote the complete and unbiased reporting of clinical trials. The International Committee of Medical Journal Editors began to require trial registration as a condition for publication in 2005. In 2008, the 59th World Medical Association (WMA) General Assembly stated that clinical trials must be registered prospectively and declared the public disclosure of positive, negative and inconclusive results an author's duty. The results of this study add to the evidence that publication bias is a major threat to the validity of conclusions from medical research and strongly support the usefulness of the efforts to limit publication bias.
Conclusions
Underreporting of negative and inconclusive results, which has been demonstrated by studies on publication bias, represents a major threat to the validity of meta-analysis. A higher probability of including statistically significant positive outcomes causes a severe increase in the false-positive rate in meta-analysis. Moreover, a one-sided selection process based on the statistical significance that is of a sufficient magnitude to dramatically bias meta-analytic conclusions is poorly detectable by publication bias methods based on the funnel plot when the mean effect size equals 0. Future research is needed to compare the performance of these methods with selection models. The study supports the usefulness of initiatives aiming to reduce publication bias in the medical literature.
References
Footnotes

Funding This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. MK is a PhD fellow at the Research Foundation-Flanders (FWO).

Competing interests None.

Provenance and peer review Not commissioned; externally peer reviewed.

Data sharing statement The R code that was used to perform the simulations is available on figshare at: http://www.dx.doi.org/10.6084/m9.figshare.1119702.