Letters

Bias in meta-analysis detected by a simple, graphical test

BMJ 1998; 316 doi: https://doi.org/10.1136/bmj.316.7129.469 (Published 07 February 1998) Cite this as: BMJ 1998;316:469

Asymmetry detected in funnel plot was probably due to true heterogeneity

  1. Andreas E Stuck, Chief (a),
  2. Laurence Z Rubenstein, Professor of geriatric medicine (b),
  3. Darryl Wieland, Professor (c)
  1. a Department of Geriatrics and Rehabilitation, Zieglerspital, Berne, Switzerland
  2. b Education and Clinical Center, UCLA-Sepulveda VA Medical Center, Los Angeles, CA, USA
  3. c Department of Medicine, Division of Geriatrics, University of South Carolina School of Medicine, Columbia, SC, USA
  4. d Department of Clinical Epidemiology, Leiden University Hospital, 2300 RC Leiden, Netherlands
  5. e Department of Public Health and Community Medicine, A27, University of Sydney, NSW 2006, Australia
  6. f Department of Social and Preventive Medicine, University of Queensland, Medical School, Herston, QLD 4006, Australia
  7. g Unit of Healthcare Epidemiology, Institute of Health Sciences, Oxford University, Oxford OX3 7LF
  8. h Diabetes Research Laboratories, Radcliffe Infirmary, Oxford OX2 6HE
  9. i Department of Social Medicine, University of Bristol, Bristol BS8 2PR
  10. j Department of Social and Preventive Medicine, University of Berne, Switzerland
  11. k Academic Section of Geriatric Medicine, Royal Infirmary, Glasgow G4 0SF
  12. l NHS Centre for Reviews and Dissemination, University of York, York YO1 5DD

    Editor—Egger et al report that they “found bias in 38% of meta-analyses published in four leading journals.”1 This is misleading, at least insofar as our meta-analysis of inpatient geriatric consultations is concerned.2

    Firstly, the bias observed in our meta-analysis was not a retrospective detection of bias, as one might infer from Egger et al's statements. We knew that there was evidence of heterogeneity for the pooled effect estimates of geriatric consultation programmes and reported this finding.2 Secondly, the asymmetry detected in the funnel plot of the meta-analysis of inpatient consultation programmes was probably due not to bias (distortion of true effect) but to true heterogeneity (true difference of effects between trials). We took the presence of heterogeneity as an opportunity to examine whether we could identify the programme elements that might have resulted in the observed effect differences between geriatric consultation programmes. Using a multivariate logistic regression approach, we found that both geriatric assessment programmes in which the consultant controlled the implementation of the recommendations and those that included long term follow up resulted in better outcomes than did programmes in which this was not the case.

    Thus, the meta-analytical methods of testing heterogeneity or drawing funnel plots should not be considered absolute criteria for separating good from bad meta-analyses. Meta-analyses reporting effect estimates that may contain bias should continue to be published in leading medical journals, as long as the possibility of heterogeneity is stated and potential underlying reasons for heterogeneity are addressed. This is especially true for meta-analyses of complex interventions. Although they are methodologically difficult to deal with, variations in effect estimates give us the opportunity to disentangle the black box of complex interventions, such as of geriatric assessment, and identify what the necessary ingredients of these programmes are.3

    A third issue concerns the “mega-trial” to which our meta-analysis was being compared.4 This trial was different from any of the trials considered in our meta-analysis. Among other things, it was based in a health maintenance organisation system that had incorporated considerable geriatric expertise into its usual care for older people. Another important factor was that it involved four hospital sites, each with different characteristics, populations, and survival rates. If Egger et al had taken the same pains as we did in recovering unpublished data from primary trials, they would have found that the mega-trial they used in questioning our meta-analysis was a multicentre trial with unreported variability in intervention components and outcomes across study sites. Analysts must consider rigorously any methodological issues unique to each trial, particularly when considering complex interventions.

    References

    1. 1.
    2. 2.
    3. 3.
    4. 4.

    Experts' views are still needed

    1. Jan P Vandenbroucke, Professor (d)

      Editor—Egger et al's regression analysis of funnel plot asymmetry is an interesting exercise in descriptive statistics: most fascinating is their distribution of biasedness in meta-analyses.1 The funnel plot test that they derive, however, rests on the assumption that it is the smaller trials that are the culprits. What if the larger trials are those that were stopped judiciously at the right moment or underwent some data-analytic “massage”? As noted in the accompanying editorial, the predictive power of the test was validated retrospectively on eight specific instances and became positive only when its test boundaries were changed to a 10% value.2 More experience with the test seems necessary.

      If we accept the test, or any similar test of heterogeneity on meta-analyses, what should we conclude from it? The main message from it is that there might be a problem because the funnel plot is asymmetrical—which we also see on the plot. The real questions to which we would like an answer are: what is the cause of the asymmetry and, more importantly, which trials should we believe? The cause of the asymmetry can be anything, from publication bias, “willingness to please” during data collection, data massage in the analysis, unclear rules for stopping the trial, or downright fraud (as indicated by Egger et al); it can also be a mix of all these things. Alternatively, the source of heterogeneity might be a true difference in underlying populations. Most difficult to live with is the overall conclusion of the test that the literature is biased. If the test is positive, should we dismiss all randomised trials on the subject? This means that we discard one trial by one group of investigators because of the results of another trial by a completely unrelated group. We might try to use quality criteria, but a recent meta-analysis on homoeopathy teaches us that this will not suffice.3

      In the end there is no escape from a return to “the expert,” who tells us which trial to believe, not only on the basis of methodology but also on the basis of insights in pathophysiology, pharmacology, and perhaps type of publication (supplements, special interest or “throw away” journals, etc). All that we can ask from the expert is a careful explanation of what arguments he or she used in accepting or dismissing the evidence from certain trials.

      References

      1. 1.
      2. 2.
      3. 3.

      Graphical test is itself biased

      1. Les Irwig, Professor of epidemiology (e),
      2. Petra Macaskill, Statistical research officer (e),
      3. Geoffrey Berry, Professor in epidemiology and biostatistics (e),
      4. Paul Glasziou, Associate professor (f)

        Editor—Although the concept is useful, the method proposed by Egger et al to detect bias in meta-analyses is itself biased1: it overestimates the occurrence and extent of publication bias. This is easily shown by simulating data for a meta-analysis of a hypothetical intervention that is effective (and therefore has a negative regression coefficient by Egger et al's method) and is free of publication bias (and hence should have an intercept of zero in the regression analysis).

        In our simulations, each study was of a treated group and a control group, both of equal size. For each simulated meta-analysis, studies ranging from 100 per group to 1000 per group, in increments of 100, were generated. The observed number of events in each group was generated from a binomial distribution.

        Here is one example in which the true event rate is 40% in the control group and 10% in the treatment group. When the true population values (which would not be known in practice) are used to estimate precision, the regression coefficient is −1.7942 (an estimated log odds ratio equivalent to the expected value of 0.1667) and the intercept (0.0380, P=0.1) is close to the expected value of zero, reflecting the lack of publication bias. However, the regression coefficient estimated when the precision is based on the observed values, as would occur using Egger et al's method, is −1.7169. More importantly, the intercept is −0.4492 and significant (P<0.0001), incorrectly suggesting that there has been publication bias. In general, our other simulations suggest that the bias in the estimated intercept is greater the more effective the intervention actually is and the smaller the sample size of the studies.
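The simulation described above can be sketched roughly as follows. This is a minimal illustration rather than the correspondents' actual program: it assumes the usual large-sample standard error of the log odds ratio, an unweighted least-squares regression of the standard normal deviate on precision, 200 repeated meta-analyses, and a 0.5 correction for empty cells, none of which is specified in the letter. All function names are hypothetical. It compares the average intercept when precision is taken from the true event rates with the average intercept when precision is estimated from the observed counts.

```python
import math
import random


def draw_events(n, p, rng):
    """Number of events among n subjects, each with event probability p."""
    return sum(rng.random() < p for _ in range(n))


def log_or_and_se(events_t, events_c, n):
    """Observed log odds ratio and its large-sample SE (assumed formula).

    If any cell of the 2x2 table is empty, 0.5 is added to every cell.
    """
    cells = [events_t, n - events_t, events_c, n - events_c]
    if min(cells) == 0:
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    return math.log(a * d / (b * c)), math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def true_se(n, p_t, p_c):
    """SE of the log odds ratio computed from the population event rates."""
    return math.sqrt(
        1 / (n * p_t) + 1 / (n * (1 - p_t)) + 1 / (n * p_c) + 1 / (n * (1 - p_c))
    )


def egger_intercept_slope(log_ors, ses):
    """Least-squares fit of log OR / SE (standard normal deviate) on 1 / SE."""
    x = [1 / se for se in ses]
    y = [lo / se for lo, se in zip(log_ors, ses)]
    k = len(x)
    mean_x, mean_y = sum(x) / k, sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


P_CONTROL, P_TREATED = 0.40, 0.10     # event rates quoted in the letter
GROUP_SIZES = range(100, 1001, 100)   # 100 to 1000 per group, in steps of 100
N_REPEATS = 200                       # assumption: the letter gives no count

rng = random.Random(0)                # arbitrary seed, for repeatability only
sum_true = sum_observed = 0.0
for _ in range(N_REPEATS):
    log_ors, observed_ses = [], []
    for n in GROUP_SIZES:
        lo, se = log_or_and_se(
            draw_events(n, P_TREATED, rng), draw_events(n, P_CONTROL, rng), n
        )
        log_ors.append(lo)
        observed_ses.append(se)
    true_ses = [true_se(n, P_TREATED, P_CONTROL) for n in GROUP_SIZES]
    sum_true += egger_intercept_slope(log_ors, true_ses)[0]
    sum_observed += egger_intercept_slope(log_ors, observed_ses)[0]

mean_intercept_true = sum_true / N_REPEATS
mean_intercept_observed = sum_observed / N_REPEATS
print(f"mean intercept, true precision:     {mean_intercept_true:.3f}")
print(f"mean intercept, observed precision: {mean_intercept_observed:.3f}")
```

Averaging over repeated meta-analyses matters here: with only ten studies the intercept from a single simulated meta-analysis is very noisy, so any systematic shift shows up only in the mean. The printed values depend on the seed and the number of repeats and are not expected to reproduce the figures quoted in the letter.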

        This problem has several causes. Firstly, the estimates of precision are subject to random error due to sampling variability. This regression-dilution bias causes the regression slope to “tilt” around the mean of the predictor and response variables so that its coefficient is closer to zero; this in turn leads to the intercept becoming negative.2 Secondly, the estimated standardised log odds ratio is correlated with the estimated precision. Thirdly, the precision estimated by the method that we assume Egger et al used3 is a biased estimate of the true precision, with the degree of bias increasing as sample size decreases.4

        Clearly, until the causes of the problems we have outlined are better elucidated and solutions developed, one cannot rely on the method proposed by Egger et al to detect publication bias.

        References

        1. 1.
        2. 2.
        3. 3.
        4. 4.

        Test had 10% false positive rate

        1. Valerie Seagroatt, University research lecturer (g),
        2. Irene Stratton, University research lecturer (h)

          Editor—With examples of results from meta-analyses conflicting with those from subsequent large trials there is increasing need to distinguish the good from the not so good meta-analyses. To this end, Egger et al have developed a test for detecting bias in meta-analyses based on funnel plot asymmetry.1 This test predicted discordance in meta-analyses. But, as with any significance test, there is also the possibility of falsely identifying bias when none existed. Since significance was defined by P<0.1, the false positive rate of this test would be 10%. For instance, the quoted 13% (5/38) of the systematic reviews in the Cochrane Database showing bias may be attributed to chance alone.
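As a rough check on the arithmetic in this paragraph, the sketch below computes how many false positives a test with a 10% false positive rate would be expected to produce among 38 reviews, and the probability of seeing five or more by chance. It assumes, purely for illustration, that none of the 38 reviews is truly biased and that the 38 tests are independent; the function name `binomial_tail` is hypothetical.

```python
from math import comb


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_REVIEWS = 38      # Cochrane reviews examined, as quoted in the letter
N_POSITIVE = 5      # reviews with a significant test, as quoted in the letter
ALPHA = 0.10        # significance threshold, hence the false positive rate

expected_false_positives = N_REVIEWS * ALPHA
prob_at_least_observed = binomial_tail(N_REVIEWS, N_POSITIVE, ALPHA)

print(f"expected false positives among {N_REVIEWS} reviews: "
      f"{expected_false_positives:.1f}")
print(f"P(at least {N_POSITIVE} positives by chance): "
      f"{prob_at_least_observed:.2f}")
```

If the probability printed is not small, five positive results out of 38 are unremarkable under the no-bias assumption, which is the point the paragraph makes.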

          Defining significance to be P<0.1 enabled the test to predict discordant meta-analyses—the conventional P<0.05 produced significant bias in only one of the four discordant meta-analyses—but resulted in a 10% false positive rate. Some may consider this rate of false positive results to be unacceptably high. Be that as it may, these findings showed the continuing need for care in the interpretation of results of significance tests. These comments, however, should not detract from the importance of looking for bias in meta-analyses and the potential benefits this test may bring to screening for such bias.

          References

          1. 1.

          Authors' reply

          1. Matthias Egger, Reader in social medicine and epidemiology (i),
          2. George Davey Smith, Professor of clinical epidemiology (i),
          3. Christoph Minder, Head, medical statistics unit (j)

            Editor—Bias in meta-analysis is often reflected in asymmetrical funnel plots. As we discussed in our paper, both bias and true heterogeneity in underlying effects can lead to asymmetry. Complex interventions such as geriatric consultation services may be implemented less thoroughly in larger studies, and this would explain the more positive results in smaller trials. Results of meta-analysis will then depend on how many, or how few, small or large studies are included. A thorough attempt should always be made to identify heterogeneity, and the analysis by Stuck et al is a good example of this.1 We maintain that in these situations the combined estimate is likely to be biased and should not feature prominently in published reports. Stuck et al suggest that we should have considered differences in outcomes across centres in the health maintenance organisation trial. Post hoc analyses of effects by study centres, however, are likely to mislead, as recently shown for the β blocker heart attack trial.2

            Figure 1

            Asymmetrical funnel plot of clinical trials of homoeopathy4 (upper panel) indicating presence of bias. The linear regression of the standard normal deviate against precision (defined as the inverse of the standard error) shows a significant (P<0.001) deviation of the intercept from zero (arrow). In the absence of bias, trials would scatter about a line running through the origin at standard normal deviate zero.

            Vandenbroucke could have benefited from a formal analysis of funnel plot asymmetry on at least two occasions. After visual assessment of a funnel plot he suggested that publication bias may explain the association found between passive smoking and lung cancer.3 However, we found no evidence of asymmetry (P=0.80). Conversely, when he discussed a recent meta-analysis of homoeopathy,4 significant funnel plot asymmetry (P<0.001) would have lent support to his assertion that bias had produced a body of false positive evidence (fig).5

            Irwig et al claim that our method will overestimate the occurrence of bias. They simulated hypothetical trials of a treatment that reduced event rates from 40% to 10% (relative risk 0.25) with sample sizes ranging from 200 to 2000. Their example is not typical of the small effects usually examined in meta-analyses. More importantly, when performing 10 000 simulations based on the same assumptions we found that on average 4.99% of tests were significant at the 5% level and 9.63% were significant at the 10% level. Therefore, contrary to Irwig et al's contention, regression dilution bias did not produce false positive results above what was expected by chance, and the P value they quote for the intercept (P<0.0001), presumably based on a large number of simulations, is misleading.
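The kind of check described in this paragraph, counting how often the intercept test is significant when no publication bias has been simulated, might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' program: it uses an unweighted regression, the standard large-sample standard error of the log odds ratio, a 0.5 correction for empty cells, hard-coded two-sided critical values of Student's t for 8 degrees of freedom (ten studies), and 300 repeats instead of 10 000 to keep the run short. Function names are hypothetical.

```python
import math
import random

T_CRIT_5PCT = 2.306    # two-sided 5% critical value of Student's t, 8 df
T_CRIT_10PCT = 1.860   # two-sided 10% critical value of Student's t, 8 df


def draw_events(n, p, rng):
    """Number of events among n subjects, each with event probability p."""
    return sum(rng.random() < p for _ in range(n))


def simulate_log_or_and_se(n, p_t, p_c, rng):
    """Simulate one two-arm study; return observed log odds ratio and its SE."""
    et, ec = draw_events(n, p_t, rng), draw_events(n, p_c, rng)
    cells = [et, n - et, ec, n - ec]
    if min(cells) == 0:                      # assumed continuity correction
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    return math.log(a * d / (b * c)), math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def intercept_t_statistic(log_ors, ses):
    """t statistic for a zero intercept, regressing log OR / SE on 1 / SE."""
    x = [1 / se for se in ses]
    y = [lo / se for lo, se in zip(log_ors, ses)]
    k = len(x)
    mean_x, mean_y = sum(x) / k, sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(rss / (k - 2) * (1 / k + mean_x**2 / sxx))
    return intercept / se_intercept


def rejection_rates(n_repeats, rng, p_t=0.10, p_c=0.40):
    """Share of simulated meta-analyses significant at the 5% and 10% levels."""
    sizes = range(100, 1001, 100)            # ten studies, so 8 df
    hits_5 = hits_10 = 0
    for _ in range(n_repeats):
        studies = [simulate_log_or_and_se(n, p_t, p_c, rng) for n in sizes]
        t = abs(intercept_t_statistic([s[0] for s in studies],
                                      [s[1] for s in studies]))
        hits_5 += t > T_CRIT_5PCT
        hits_10 += t > T_CRIT_10PCT
    return hits_5 / n_repeats, hits_10 / n_repeats


rate_5, rate_10 = rejection_rates(300, random.Random(0))
print(f"significant at 5%: {rate_5:.3f}, at 10%: {rate_10:.3f}")
```

With only a few hundred repeats the estimated rates carry a sampling error of a couple of percentage points, so a run of this size cannot settle the disagreement between the two sets of correspondents; it only shows the shape of the calculation.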

            Seagroatt and Stratton are concerned about the specificity of our test. Considering the many possible biases, we think that the low sensitivity is of greater concern. When meta-analyses are based on a few small trials no test will be able to detect or exclude bias reliably. No statistical solution exists in this situation, and the results should be treated with great caution.

            References

            1. 1.
            2. 2.
            3. 3.
            4. 4.
            5. 5.

            Prospectively identified trials could be used for comparison with meta-analyses

            1. Peter Langhorne, Senior lecturer (k), on behalf of the Stroke Unit Trialists' Collaboration

              Editor—Egger et al's paper about bias in meta-analysis outlines the value of comparing the results of a meta-analysis of small randomised trials with those of a subsequent large definitive trial.1 Unfortunately, in many areas of clinical practice such as stroke rehabilitation, large trials are difficult to carry out and unlikely to be available.2

              One possible solution in this circumstance is to compare the results of meta-analysis with those of prospectively identified trials that could not have been subject to publication bias. This was possible with the recent publication of a systematic review by the Stroke Unit Trialists' Collaboration.3 The funnel plot for several small trials can be compared with the summary result of either six trials which were identified before they were fully published or two trials (in Perth and Nottingham) which were recruited to the systematic review project before data analysis had started. The figure shows the funnel plot results for individual trials and the summary results for the two groups of prospectively identified trials.

              Figure 2

              Funnel plot results: odds ratio for combined adverse outcomes of death and needing institutional care versus precision of trial or group of trials. RCT=randomised controlled trial.

              In this case the results of meta-analysis seem to be compatible with those of the prospectively identified trials. With the increasing move towards prospective registration of trials, this approach may allow some assessment of bias in meta-analyses where no large definitive trial is available.

              References

              1. 1.
              2. 2.
              3. 3.

              Increase in studies of publication bias coincided with increasing use of meta-analysis

              1. Fujian Song, Senior research fellow (l)*,
              2. Simon Gilbody, MRC fellow in health services research (l)*

                Editor—Egger et al suggest a method for testing the possible existence of publication bias, based on the assumption that larger trials are more likely to be published, irrespective of their results.1 Stern and Simes, however, suggest that large sample size is not sufficient, because of the delay in the publication of larger studies with negative results.2 A recent letter showed that trials published at an early stage were more likely to be positive.3

                To test the association between the year of publication and treatment effect we identified 38 meta-analyses published in BMJ or JAMA during 1992–6 which provided summary data from individual studies. For each meta-analysis we tested the association between the year of publication and the treatment effect of the individual studies, using rank correlation analysis. We also tested the correlation between the sample size and the treatment effect. We ignored the sign of the correlation coefficient because it is often difficult to decide which group was the control when competing interventions were compared. Using 0.10 as a level of significance, we found that four meta-analyses showed a significant correlation between the year of publication and the treatment effect while 10 showed a significant correlation between the sample size and the treatment effect. In 25 meta-analyses the correlation coefficient between the sample size and the treatment effect was greater than that between the year of publication and the treatment effect. Therefore, both the delay to publication and the small sample size may be associated with the negative results but small sample size seems to be more important as a risk of publication bias.
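The comparison made for each meta-analysis can be sketched as below. The letter says only that rank correlation analysis was used; this example assumes Spearman's coefficient (computed as the Pearson correlation of average ranks, which handles ties), and the six trials are invented purely for illustration. As in the letter, the sign of each coefficient is ignored when the two are compared.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation (assumed here; the letter does not name one)."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical trials from one meta-analysis: year, total size, log odds ratio.
years = [1988, 1985, 1990, 1987, 1994, 1992]
sizes = [40, 60, 120, 300, 800, 2000]
effects = [-0.9, -0.7, -0.5, -0.3, -0.2, -0.1]

rho_year = spearman(years, effects)
rho_size = spearman(sizes, effects)
print(f"|rho| year vs effect: {abs(rho_year):.2f}")
print(f"|rho| size vs effect: {abs(rho_size):.2f}")
print("sample size is the stronger correlate:", abs(rho_size) > abs(rho_year))
```

Testing each coefficient against the 0.10 significance level, as the letter describes, would additionally need critical values or a P value for the rank correlation, which this sketch leaves out.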

                Publication bias jeopardises the validity of meta-analysis as well as any other attempts to use published literature. A systematic approach is crucial to identify all published studies, particularly in low circulation or non-English journals and in the grey literature, and to exclude duplicate publications of positive results.4 We agree with Naylor that “meta-analysis is an important contribution to research and practice but it's not a panacea.”5 In fact, it was meta-analysis and systematic review that highlighted the problem of publication bias. By searching Medline, we found that the number of published studies (empirical, methodological, or editorial) of publication bias was 71 during 1993 to June 1997, 41 during 1987-92, three during 1981-6, and zero during 1966-80. The increase in the number of articles coincides with an increasing use of meta-analysis. It is naive to believe that publication bias did not exist or was less important a decade ago, when medical literature review was dominated by conventional non-systematic methods.

                Footnotes

                • *The authors are undertaking a review of publication bias in systematic reviews funded by the NHS Health Technology Assessment programme.

                References

                1. 1.
                2. 2.
                3. 3.
                4. 4.
                5. 5.