

Methods used to conduct and report Bayesian mixed treatment comparisons published in the medical literature: a systematic review
  1. Diana M Sobieraj¹,
  2. Joseph C Cappelleri²,
  3. William L Baker¹,
  4. Olivia J Phung¹,
  5. C Michael White¹,
  6. Craig I Coleman¹
  1. University of Connecticut/Hartford Hospital Evidence-based Practice Center, Hartford, Connecticut, USA
  2. Department of Biostatistics, Pfizer, Groton, Connecticut, USA
  Correspondence to Dr Craig I Coleman; ccolema{at}


Objectives To identify published closed-loop Bayesian mixed treatment comparisons (MTCs) and to summarise characteristics regarding their conduct and reporting.

Design Systematic review.

Methods We searched multiple bibliographic databases (January 2006–31 July 2011) for full-text, English language publications of Bayesian MTCs comparing the effectiveness or safety of ≥3 interventions based on randomised controlled trials and having at least one closed loop. Methodological and reporting characteristics of MTCs were extracted in duplicate and summarised descriptively.

Results We identified 34 Bayesian MTCs spanning 13 clinical areas. Publication of MTCs increased over the 5-year period, with 76.5% published during or after 2009. MTCs included a mean (±SD) of 35.9±30.1 trials (n=33 459±71 233 participants) and 8.5±4.3 interventions (85.7% pharmacological). Non-informative and informative prior distributions were reported in 44.1% and 8.8% of MTCs, respectively, with the remainder failing to specify the prior used. A random-effects model was used to analyse the networks of trials in 58.5% of MTCs. All MTCs used WinBUGS; however, code was infrequently provided (20.6%). More than three-quarters of MTCs (76.5%) also conducted a traditional meta-analysis. Methods used to evaluate convergence, heterogeneity and inconsistency were infrequently reported, but among those providing detail, methods appeared varied. MTCs most often used binary outcomes (85.3%), and ranking of interventions based on probability was common (61.8%), although rarely displayed in a figure (8.8% of MTCs). MTCs were published in 24 different journals with a mean impact factor of 9.20±8.71. While 70.8% of journals imposed limits on word counts and 45.8% limits on the number of tables/figures, online supplements/appendices were allowed in 79.2% of journals.

Conclusions Publication of closed-loop Bayesian MTCs is increasing in frequency, but details regarding their methodology are often poorly described. Efforts in clarifying the appropriate methods and reporting of Bayesian MTCs should be a priority.


Article summary

Article focus

  • To identify published closed-loop Bayesian mixed treatment comparisons (MTCs) and to summarise characteristics regarding their conduct and reporting.

Key messages

  • We identified 34 closed-loop Bayesian MTCs spanning 13 clinical areas, published in 24 different journals.

  • Closed-loop Bayesian MTCs are increasing in frequency, but details regarding their methodology are often poorly described. Efforts in clarifying the appropriate methods and reporting of Bayesian MTCs should be a priority.

Strengths and limitations of this study

  • Our systematic review adds to the existing literature by updating results and adding new information, as prior reviews only included literature through 2007/2008. Unlike prior publications, our systematic review focused only on Bayesian MTCs of networks with at least one closed loop.

  • Unlike prior reviews, we evaluated reporting of additional model characteristics in depth including testing for model fit, evaluation of convergence, adjustment for covariates or multiarm trials, the specific priors used and availability of the code and aggregated study-level data.

  • An important limitation of our review is that we cannot say with certainty that a lack of reporting means a given method or analysis was not undertaken (ie, the testing for convergence or inconsistency need not be described in a paper for it to have been performed by the investigators) or that the reporting of a piece of data or statistical code was not considered.


Clinicians and decision-makers often need to select from multiple available interventions when determining the optimal treatment for a disease. Ideally, high-quality randomised controlled trials (RCTs) that estimate the effectiveness of all possible interventions directly against one another would be available to guide decision-making.1–4 However, interventions are commonly compared with placebo or non-active control in RCTs rather than another active intervention. When direct comparative trials are completed, they typically include only two interventions from a larger group of possible treatments. As such, decision-makers are faced with a lack of adequate direct comparative data with which to make their judgements.

In the absence of head-to-head trials, indirect comparisons may provide valuable information. For example, if two different interventions have been evaluated against a common comparator, the comparative effects of the two interventions versus each other can be estimated indirectly.1,2 Even in the presence of head-to-head data, indirect comparisons may add value by improving the precision of treatment effect estimates.
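To make the common-comparator idea concrete, the sketch below implements the anchored (adjusted) indirect comparison often attributed to Bucher and colleagues. It assumes that each pairwise effect is a log odds ratio with a normal approximation and that the A vs C and B vs C estimates come from independent sets of trials. The function name `bucher_indirect` and the input values are hypothetical and are not taken from any MTC in this review.

```python
import math

def bucher_indirect(log_or_ac, se_ac, log_or_bc, se_bc):
    """Anchored indirect comparison of A vs B through common comparator C.

    Inputs are log odds ratios (A vs C, B vs C) and their standard errors,
    each assumed to come from a separate, independent pairwise meta-analysis.
    """
    # Indirect log OR for A vs B: difference of the two anchored effects
    log_or_ab = log_or_ac - log_or_bc
    # Independent sources, so the variances add
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)
    # 95% confidence interval on the log scale, back-transformed to ORs
    lower = math.exp(log_or_ab - 1.96 * se_ab)
    upper = math.exp(log_or_ab + 1.96 * se_ab)
    return math.exp(log_or_ab), se_ab, (lower, upper)

# Hypothetical inputs: OR(A vs C) = 0.70 (SE 0.10), OR(B vs C) = 0.85 (SE 0.12)
or_ab, se_ab, ci = bucher_indirect(math.log(0.70), 0.10, math.log(0.85), 0.12)
print(f"OR(A vs B) = {or_ab:.3f}, 95% CI {ci[0]:.3f} to {ci[1]:.3f}")
```

The indirect estimate is always less precise than either of its inputs, because the two variances are summed. This is one reason why combining it with any direct A vs B evidence, as an MTC does, can be attractive.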

Several methods exist for comparing interventions indirectly, as do approaches for implementing them.1,5–8 In the simplest form, interventions that are evaluated against a common comparator in separate trials can be compared using an anchored indirect treatment comparison approach.5 As a generalisation of indirect comparisons, when more than two treatments are being compared indirectly and at least one pair of treatments is being compared both directly and indirectly (a closed loop is present), direct and indirect data can be combined to estimate effects in a mixed treatment comparison (MTC) meta-analysis using a Bayesian or frequentist framework.1–8 Prior research has attempted to categorise the use of indirect comparisons in the medical literature, but either did not include Bayesian MTCs or collected limited data on this approach.9,10 The Agency for Healthcare Research and Quality commissioned us to evaluate how MTCs in published systematic reviews are conducted and reported.11 We present the findings of our systematic review identifying closed-loop MTCs using a Bayesian framework and descriptively summarise their methodological and reporting characteristics.


A systematic literature search was conducted in MEDLINE, the Centre for Reviews and Dissemination Databases (including the Database of Abstracts and Reviews of Effects, Health Technology Assessment and the National Institute for Health Research Economic Evaluation Database), The Cochrane Library and the American College of Physicians Journal Club from 1 January 2006 through 31 July 2011. The search strategy in online supplementary appendix S1 was used. Manual additions were permitted based on the citations identified by the literature search.

Two independent investigators assessed citations for inclusion in parallel, based on a priori defined criteria. Specifically, we included meta-analyses that compared the clinical effectiveness or safety of interventions (any pharmacological (including placebo and different doses), behavioural or procedural interventions) based on RCTs, used a Bayesian approach to conduct an MTC, had at least one closed loop (see online supplementary appendix S2) and were published in full text and in the English language. There has been inconsistency in what constitutes an MTC in the medical literature12; therefore, for the purposes of this systematic review, an MTC was defined as the comparison of three or more interventions in which both direct and indirect evidence was used. Methodological publications that presented MTCs for illustrative purposes, cost-effectiveness analyses and individual patient data meta-analyses were not considered in this systematic review.
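The closed-loop inclusion criterion can be checked mechanically by treating interventions as nodes and direct comparisons as edges. Under that representation, a closed loop exists exactly when the undirected graph contains a cycle. The following is a minimal sketch under that assumption, using a union-find structure. The function name `has_closed_loop` is hypothetical. The sketch treats a three-arm trial recorded as three pairwise comparisons as forming a loop; that is a definitional choice, not something the review specifies.

```python
def has_closed_loop(comparisons):
    """Return True if the network of direct comparisons contains a closed loop.

    `comparisons` is an iterable of (treatment, treatment) pairs, one per
    direct head-to-head comparison available from RCTs. Duplicate pairs
    (several trials of the same comparison) count as a single edge.
    """
    # Collapse repeated comparisons into a set of undirected edges
    edges = {frozenset(pair) for pair in comparisons if pair[0] != pair[1]}
    parent = {}

    def find(node):
        # Union-find root lookup with path halving
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for edge in edges:
        a, b = tuple(edge)
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            # a and b are already connected, so this edge closes a loop
            return True
        parent[root_a] = root_b  # merge the two components
    return False

star = [("A", "P"), ("B", "P"), ("C", "P")]  # every treatment vs placebo only
loop = star + [("A", "B")]                    # one head-to-head A vs B trial
print(has_closed_loop(star), has_closed_loop(loop))  # False True
```

Here `star` mirrors a network in which every intervention was compared only with placebo, which has no closed loop. A single A vs B trial closes the loop A-P-B.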

Two reviewers independently extracted data, with disagreements resolved through discussion. For each included closed-loop Bayesian MTC, all published material (the manuscript, supplements, appendices or external websites to which the reader was referred for additional information) was used during data extraction. Therefore, data extraction was predicated on the authors' reporting of the information within these sources. When extracting data, we recorded what the authors reported without judging whether the methods were appropriate. If data were insufficient across all available sources, we recorded ‘not reported’ for that criterion.

General characteristics of each MTC were extracted, including author and funding information, whether a methodologist was an author, the number and type of intervention comparisons made, the number of printed pages and use of a supplement or appendix, the number of trials and patients in the analyses, the clinical area (eg, cardiology and endocrinology) and the network pattern. For the purpose of this project, we defined a methodologist as an individual having an affiliation with a department of statistics, biostatistics, epidemiology, clinical epidemiology or public health services, as determined by author information and affiliations listed in the publication.13 The country in which a review was conducted was determined by the corresponding author's affiliation.

The network pattern3,4,11,14 was determined from figures presented within the identified publication. If a figure was not available, we determined the pattern from text descriptions of included trials.

We also extracted information regarding the methodology used to conduct the closed-loop Bayesian MTC including the models applied (eg, fixed vs random effects), description of model parameters (eg, choices of prior distributions), methods for assessment of model fit, potential bias, inconsistency and heterogeneity, use of covariate adjustment in models, whether the model accommodated multiarm trials, software utilised and availability of code.

Finally, we extracted data concerning the reporting of results including the type of endpoint (eg, binary vs continuous), effect size and measure of variance, use of other methods to report results (eg, probability of treatment being best, claims of equivalence or non-inferiority) and the format/presentation of results (eg, text, tables and figures). Characteristics of the journals in which included MTCs were published were collected, including journal name, impact factor, allowance of supplements or appendices, and limitations on word, table and figure counts.

The characteristics of the closed-loop Bayesian MTCs and journals were summarised descriptively. Categorical data are presented using frequencies and continuous data as means±SDs.


A total of 626 citations were identified through the database searches with an additional five MTCs identified through manual review (figure 1). After full text review, 35 articles representing 34 unique closed-loop Bayesian MTCs were included.15–49 The publication by Orme et al25 analysed two distinct networks of RCTs.

Figure 1

Flow diagram of citation inclusion and exclusion.

The rate of publication of closed-loop Bayesian MTCs increased over the 5-year search period, with 26 (76.5%) of the MTCs published between 2009 and 2011 compared with only 8 (23.5%) published prior to 2009. On average, 6.1±4.8 authors were listed per publication and less than half of publications (47.1%) included a methodologist as an author (table 1). The most common country from which authors published MTCs was the UK (35.3%), followed by the USA (11.8%) and Greece (11.8%).

Table 1

General characteristics of Bayesian mixed treatment comparisons

Funding sources for the MTCs included governmental/foundation (29.4%), industry (26.5%) and unfunded (17.6%), with 23.6% not making a statement regarding funding source(s). Only two publications identified an organisational affiliation, one each with the Health Technology Assessment Program and The Cochrane Collaboration. The mean number of printed pages per publication was 16.6±36.3 (range 4–221), and over half published a supplement or appendix. Of those that did not publish a supplement or appendix, one did not have the option to do so because of journal (or report) specifications.

There were 13 different categories of disease states evaluated among included MTCs. The mean number of interventions included within the analyses was 8.5±4.3, of which most were pharmacological (85.7%) in nature. The mean number of trials included in the MTCs was 35.9±30.1 and the mean number of patients included was 33 459±71 233 (range 594–324 168).

The most common model used in closed-loop Bayesian MTCs was a random-effects model (58.5%; table 2). Few analyses (25.6%) reported whether there was adjustment for covariates. Of the 28 MTCs that included trials with three or more arms, 10 (35.7%) reported use of an adjustment for multiarm trials. Fewer than half of all analyses reported testing model fit. Of the 15 analyses that reported testing model fit in some manner, the most common method was residual deviance (40%). More than three-quarters of the MTCs (76.5%) also included a traditional meta-analysis.
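Residual deviance, the most commonly reported model-fit measure, compares observed and fitted data point by point. A minimal sketch for arm-level binomial data follows. It evaluates the deviance for one set of fitted probabilities, assumed to lie strictly between 0 and 1. A Bayesian analysis would instead typically compute it at every MCMC iteration and report the posterior mean. That posterior mean is then compared with the number of data points, and a value of roughly that size suggests adequate fit. The function name and data are hypothetical.

```python
import math

def binomial_residual_deviance(events, totals, fitted_p):
    """Total residual deviance for arm-level binomial data.

    events[k], totals[k]: observed events and patients in data point (arm) k
    fitted_p[k]: model-fitted event probability for that arm, 0 < p < 1
    """
    total = 0.0
    for r, n, p in zip(events, totals, fitted_p):
        r_hat = n * p  # fitted number of events
        # Convention 0 * log(0) = 0 for arms with 0 or n events
        term_events = r * math.log(r / r_hat) if r > 0 else 0.0
        term_non = (n - r) * math.log((n - r) / (n - r_hat)) if r < n else 0.0
        total += 2.0 * (term_events + term_non)
    return total

# Hypothetical data: three arms (so three data points)
events, totals = [12, 20, 5], [100, 100, 50]
perfect = binomial_residual_deviance(events, totals, [0.12, 0.20, 0.10])
worse = binomial_residual_deviance(events, totals, [0.15, 0.18, 0.12])
print(perfect)  # close to 0: fitted values reproduce the data
print(worse)    # positive: fitted values depart from the observed data
```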

Table 2

Methods characteristics in Bayesian MTCs

All closed-loop Bayesian MTCs used WinBUGS software, and two also specified the use of additional software, including the BUGS XLA Wrapper and S-Plus. The WinBUGS code was made available to the reader in only 20.6% of cases, most often in an online supplement/appendix (71.4%). Aggregated study-level data used in the MTC were made available to the reader in 21 analyses (61.8%), most commonly within the manuscript itself (85.7%). Evaluation of convergence was reported in 35.3% of analyses, most commonly using the Gelman-Rubin statistic (58.3%).
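The Gelman-Rubin statistic compares between-chain and within-chain variability across several MCMC chains started from dispersed initial values. Values close to 1 are usually taken to suggest convergence, and a threshold of about 1.1 is a common rule of thumb. The sketch below computes the basic potential scale reduction factor for a single parameter using only the standard library. It is not the interval-based Brooks-Gelman-Rubin diagnostic that WinBUGS provides. The function name and the simulated chains are hypothetical.

```python
import random
from statistics import mean, variance

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one parameter.

    `chains` is a list of m >= 2 equal-length lists of post-burn-in draws.
    """
    m = len(chains)
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    grand_mean = mean(chain_means)
    # Between-chain variance B (scaled by chain length n)
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    # Mean of the within-chain sample variances W
    w = mean(variance(c) for c in chains)
    # Pooled estimate of the posterior variance
    v_hat = (n - 1) / n * w + b / n
    return (v_hat / w) ** 0.5

rng = random.Random(1)
# Three chains sampling the same distribution (well mixed)
mixed = [[rng.gauss(0, 1) for _ in range(2000)] for _ in range(3)]
# One chain stuck in a different region (not converged)
stuck = [[rng.gauss(mu, 1) for _ in range(2000)] for mu in (0, 0, 3)]
print(round(gelman_rubin(mixed), 3))  # close to 1
print(round(gelman_rubin(stuck), 3))  # well above 1.1
```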

Priors were reported as either non-informative (vague or flat) or informative in 44.1% and 8.8% of analyses, respectively. The remaining analyses (47.1%) did not specify the nature of the prior distributions used. It was also uncommon for the actual prior distribution to be reported for the population treatment effect (d) and the between-study SD of population treatment differences across studies (σ), with only 32.4% and 29.4% of MTCs, respectively, reporting this information. Sensitivity analyses based on priors were conducted in 11.8% of MTCs.
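The sketch below writes out a commonly used random-effects MTC model for binary data, to show where d and σ enter the model. It uses a binomial likelihood, a logit link and trial-specific baselines: trial i, with baseline treatment b, contributes arms k. The vague priors in the last line are illustrative assumptions of the kind often seen in published WinBUGS code. They are not the priors used by the MTCs reviewed here. For trials with three or more arms, the trial-specific δ terms would additionally need to be correlated.

$$
\begin{aligned}
r_{ik} &\sim \mathrm{Binomial}(p_{ik},\, n_{ik}) \\
\mathrm{logit}(p_{ik}) &= \mu_i + \delta_{i,bk}\,\mathbb{1}\{k \neq b\} \\
\delta_{i,bk} &\sim \mathcal{N}\!\left(d_{bk},\, \sigma^2\right), \qquad d_{bk} = d_{1k} - d_{1b}, \qquad d_{11} = 0 \\
\mu_i,\; d_{1k} &\sim \mathcal{N}\!\left(0,\, 100^2\right), \qquad \sigma \sim \mathrm{Uniform}(0,\, 2)
\end{aligned}
$$

The symbols are as follows:

  • r_{ik} and n_{ik} are the events and patients in arm k of trial i.
  • μ_i is the trial-specific baseline log odds.
  • d_{1k} is the population log odds ratio of treatment k versus reference treatment 1.

Setting σ = 0 gives the corresponding fixed-effect model.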

Accompanying traditional meta-analyses were common (61.5%). The most common method used to assess heterogeneity was the I² statistic (81.3%), followed by Cochran's Q statistic (43.8%). Evaluation of heterogeneity within the MTC was less common, reported in only 32.4% of publications. Of these 11 analyses, τ² (the between-study variance of true effects) was used in 54.5%, followed by the between-study SD (45.5%) and several other less frequent methods (some MTCs reported multiple means of assessing heterogeneity and are therefore counted more than once in the numerator).
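For a pairwise (traditional) meta-analysis, Cochran's Q, I² and a moment estimate of τ² are all derived from the same inverse-variance weights. The sketch below uses the DerSimonian-Laird estimator for τ², which is one common choice among several. The function name is hypothetical, and the inputs are hypothetical log odds ratios with their within-study variances.

```python
def heterogeneity(effects, variances):
    """Cochran's Q, I^2 (%) and DerSimonian-Laird tau^2 for k studies.

    effects: study effect estimates (eg, log odds ratios)
    variances: their within-study variances
    """
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation beyond chance, truncated at 0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # DerSimonian-Laird moment estimator of the between-study variance
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)
    return q, i2, tau2

q, i2, tau2 = heterogeneity([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
print(round(q, 2), round(i2, 1), round(tau2, 3))  # 8.0 75.0 0.03
```

τ² is on the same scale as the squared effect measure, so its square root is directly comparable to the between-study SD reported by some MTCs.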

Inconsistency between indirect and direct estimates was evaluated in 24 (70.6%) MTCs. One review reported being unable to evaluate inconsistency because of a lack of direct data, while the remaining MTCs did not comment on inconsistency. The most common method used to evaluate inconsistency was comparing the results of the MTC with those of a traditional meta-analysis, either conducted simultaneously by the authors or previously published.
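A simple way to compare direct and indirect evidence for a single contrast is to test whether their difference on the log scale is compatible with zero. This is in the spirit of the comparisons described above. The sketch assumes both estimates are log odds ratios from independent sets of trials, with normal approximations. It is one basic check and not a substitute for model-based approaches such as node-splitting. The function name and numbers are hypothetical.

```python
import math

def inconsistency_z_test(direct, se_direct, indirect, se_indirect):
    """Compare direct and indirect estimates of the same contrast.

    Estimates are on an additive scale (eg, log odds ratios) and are
    assumed to come from independent sets of trials.
    """
    diff = direct - indirect  # log ratio of odds ratios (inconsistency factor)
    se_diff = math.sqrt(se_direct ** 2 + se_indirect ** 2)
    z = diff / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return diff, se_diff, z, p_value

diff, se, z, p = inconsistency_z_test(math.log(0.80), 0.15, math.log(0.60), 0.20)
print(f"log ROR = {diff:.3f} (SE {se:.3f}), z = {z:.2f}, p = {p:.3f}")
```

Such tests usually have low power because they rely on few trials per loop. A non-significant p-value is therefore weak evidence of consistency.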

Most analyses (85.3%) reported binary outcomes (table 3). Of these 29 analyses, ORs were the most commonly reported effect measure (62.1%), followed by relative risks (17.2%) and HRs (13.8%), among other less frequent measures. Of the 10 (29.4%) analyses that reported continuous outcomes, the weighted mean difference was the most common effect measure (80%). All analyses reported uncertainty using 95% credible intervals, and one also reported SEs. Most analyses (85.3%) did not report whether the point estimate was the mean or the median of the posterior distribution. Presentation of results varied, although most analyses used multiple media, including tables, figures and text.
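Posterior distributions of ratio measures are often skewed, so the posterior mean and median can differ noticeably, which is why it matters to state which one is reported. The sketch below summarises hypothetical MCMC draws of an odds ratio with its mean, median and an equal-tailed 95% credible interval. It uses only Python's standard library (`statistics.quantiles`, Python 3.8 or later). The function name and the simulated draws are hypothetical.

```python
import math
import random
from statistics import mean, median, quantiles

def summarise_posterior(draws):
    """Posterior mean, median and equal-tailed 95% credible interval."""
    # quantiles(..., n=40) returns the 2.5th, 5th, ..., 97.5th percentiles
    cuts = quantiles(draws, n=40, method="inclusive")
    return mean(draws), median(draws), (cuts[0], cuts[-1])

rng = random.Random(2012)
# Hypothetical posterior draws: log OR ~ Normal(log 0.8, 0.4), then exponentiated
log_or = [rng.gauss(math.log(0.8), 0.4) for _ in range(20000)]
odds_ratios = [math.exp(x) for x in log_or]
m, md, (lo, hi) = summarise_posterior(odds_ratios)
print(f"mean {m:.3f}, median {md:.3f}, 95% CrI {lo:.3f} to {hi:.3f}")
```

Quantile-based summaries (the median and credible limits) transform essentially consistently between the log and natural scales. The mean OR, by contrast, is not the exponential of the mean log OR. This is one reason the two point estimates diverge here.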

Table 3

Outcomes and results reporting in Bayesian mixed treatment comparisons

Few analyses (8.8%) presented graphical representations of the posterior distributions of outcomes. Rank-ordering of interventions based on probability statements (including rankograms with the probability of a treatment being best, second best and so on) for a given outcome was reported in 21 (61.8%) of the MTCs. Only one MTC made claims of equivalence and two made claims of non-inferiority, of which two defined the minimally important difference required to make these statements.
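Rank probabilities of the kind plotted in rankograms can be estimated directly from MCMC output. At each iteration the treatments are ordered by their sampled effects, and the proportion of iterations in which a treatment occupies each rank is tallied. The sketch below assumes the effects are log odds ratios versus a common reference for an undesirable outcome, so lower is better. The function name, treatment names and draws are hypothetical.

```python
import random

def rank_probabilities(draws_by_treatment, lower_is_better=True):
    """Probability of each treatment occupying each rank.

    draws_by_treatment: dict mapping treatment name -> list of posterior
    draws of its effect versus a common reference (same iterations and
    same length for every treatment).
    Returns dict: treatment -> [P(rank 1), P(rank 2), ...].
    """
    names = list(draws_by_treatment)
    n_iter = len(draws_by_treatment[names[0]])
    counts = {name: [0] * len(names) for name in names}
    for it in range(n_iter):
        # Order treatments at this iteration; position 0 is rank 1 (best)
        ordered = sorted(names, key=lambda t: draws_by_treatment[t][it],
                         reverse=not lower_is_better)
        for rank, name in enumerate(ordered):
            counts[name][rank] += 1
    return {name: [c / n_iter for c in counts[name]] for name in names}

rng = random.Random(7)
n = 5000
draws = {
    "Placebo": [0.0] * n,  # reference treatment: log OR fixed at 0
    "A": [rng.gauss(-0.5, 0.2) for _ in range(n)],
    "B": [rng.gauss(-0.3, 0.2) for _ in range(n)],
}
probs = rank_probabilities(draws)
for name, p in probs.items():
    print(name, [round(x, 2) for x in p])  # P(best), P(second), P(third)
```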

Complete details of each journal in which at least one MTC was published can be found in tables 4 and 5. The 34 MTCs were published in 24 different journals, with a mean impact factor of 9.20±8.71. BMJ published the most MTCs (6 of the 34, 17.6%) followed by Current Medical Research and Opinion (4 of the 34, 11.8%). The majority of journals (70.8%) imposed word count limits and 45.8% imposed table/figure limitations; however, 79.2% of journals allowed online supplements or appendices.

Table 4

Aggregate journal characteristics

Table 5

Individual journal characteristics


Meta-analysis has been regarded as the most highly cited study design in health science.50 However, a drawback of traditional meta-analysis is that it compares only two interventions at a time and cannot simultaneously evaluate other comparators. This is inconsistent with clinical practice, where a variety of interventions often exist and one must decide which is best. The use of statistical methods (including simple approaches as well as MTC meta-analysis) to compare more than two interventions simultaneously is on the rise within the peer-reviewed literature. As recently as 2005, a search of the medical literature yielded four publications that used such methods, while in 2011 the number increased to 57.12 The results of our systematic review also suggest that indirect comparisons, specifically closed-loop Bayesian MTCs, have become more prevalent. A recent study found that a median of three studies (IQR 2–6) were included per meta-analysis, with close to 75% of meta-analyses including five or fewer trials.51 Our results suggest that, compared with traditional meta-analyses, closed-loop Bayesian MTCs are larger and more comprehensive. Moreover, the identified MTCs were published in a wide variety of journals covering a range of disease states. Given their collective mean impact factor, they are thus likely to reach a large readership. However, we found a variety of reporting strategies, or a lack of reporting, for characteristics that are important to the conduct of closed-loop Bayesian MTC. This may be related to the limited guidance on how to conduct and report an MTC, a topic that has been extensively reviewed and summarised elsewhere.11

Prior research by Donegan et al9 has attempted to categorise published indirect comparisons and evaluate their quality, although advanced methods including Bayesian (and frequentist) MTCs were not included. Of the 43 included comparisons, 23 used an anchored indirect approach, while others used hypothesis testing, CI overlap and meta-regression methods to draw indirect comparisons. The authors concluded that the quality of published indirect comparisons, in particular the assessment of model assumptions and the methods used to do so, was suboptimal. The authors proposed a set of quality criteria for use in future indirect comparisons. These criteria evaluate whether the method of indirect comparison applied was appropriate, and whether methods to assess similarity, homogeneity and consistency were stated and appropriate. They also cover details of the overall interpretation and reporting of results.

Song et al10 also systematically reviewed previously published indirect comparisons and, of the 88 identified, found only 18 using ‘network or Bayesian approaches’. Their findings are similar to those of Donegan and colleagues, suggesting that the main methodological problems included unclear understanding of assumptions, incomplete inclusion of relevant studies, flawed or inappropriate methods, lack of similarity assessment and inappropriate combination of direct and indirect evidence.

Our systematic review adds to this existing literature by updating results and adding new information. First, the aforementioned prior reviews only included literature through 2007/2008, making ours the most up-to-date review available. Second, unlike prior publications, our systematic review focused only on Bayesian MTCs of networks with at least one closed loop, perhaps the most common method used of late to analyse complex networks of RCTs. While prior publications focused on the evaluation and reporting of assumptions made within the models, we evaluated additional model characteristics in depth. These included testing for model fit, evaluation of convergence, adjustment for covariates or multiarm trials, the specific priors used and availability of the code and aggregated study-level data. Despite these differences, our findings are consistent with prior research and with the opinion of experts regarding the challenges and concerns around implementing and reporting these more complex statistical methods.10,12,52 Clearer guidance on how to conduct and report these types of meta-analyses may lead to a better and more consistent approach.

While we characterised only the methods and reporting of closed-loop Bayesian MTCs in this report, our search strategy was designed to capture MTCs regardless of methodological approach (including frequentist MTC). Of note, only a handful (n=9) of frequentist MTCs were identified in our search. Three of these specifically referenced the methods for MTC proposed by Lumley and colleagues, while the others referenced mixed-model approaches more generically.49,53–60 This suggests that meta-analysts currently seem to favour a Bayesian approach to MTC. All of the MTCs identified in our search analysed networks with at least one closed loop, so investigators could have chosen either a Bayesian or a frequentist method for any of them. Given the relative paucity of frequentist models, we do not describe the characteristics of their methods and reporting in this paper, but they can be found elsewhere.11

An important limitation of our review is that we cannot say with certainty that a lack of reporting means a given method or analysis was not undertaken (ie, the testing for convergence or inconsistency need not be described in a paper for it to have been performed by the investigators) or that the reporting of a piece of data or statistical code was not considered. However, we evaluated word, table and figure limits imposed by journals in which these MTCs were published and our findings do not suggest journal space should be an obstacle to complete reporting. Another limitation is the definition used to describe a methodologist. While this definition has been used by previous researchers in a similar topic area,13 to our knowledge it has not been validated and therefore may not accurately depict the true involvement of an individual who considered themselves a methodologist.

With the growing publication of Bayesian MTCs in the peer-reviewed literature and the recognised challenges of such methods, their appropriate use and interpretation become imperative. Efforts in clarifying the appropriate use and reporting of Bayesian MTC should be a priority.


  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.



  • Contributors DMS, JCC, CIC, WLB, OJP and CMW were responsible for study design. DMS, WLB and OJP were responsible for data collection. DMS, CIC and JCC were responsible for data analysis and interpretation. All authors contributed to drafting and revising the manuscript and approved the final manuscript. CIC is responsible for the overall content as the corresponding author.

  • Funding This work was supported by the Agency for Healthcare Research and Quality contract number HHSA 290 2007 10067 I.

  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement Individual study data that have been extracted can be found by accessing the full report on the AHRQ EHC website.
