
This article has a correction. Please see:


Quality improvement needed in quality improvement randomised trials: systematic review of interventions to improve care in diabetes
  1. Noah M Ivers (1)
  2. Andrea C Tricco (2)
  3. Monica Taljaard (3)
  4. Ilana Halperin (4)
  5. Lucy Turner (5)
  6. David Moher (5)
  7. Jeremy M Grimshaw (5)

  1. Department of Family and Community Medicine, Women's College Hospital-University of Toronto, Toronto, Ontario, Canada
  2. Li Ka Shing Knowledge Institute of St Michael's Hospital, Toronto, Ontario, Canada
  3. Department of Epidemiology and Community Medicine, University of Ottawa, Ottawa, Ontario, Canada
  4. Division of Endocrinology and Metabolism, Department of Medicine, University of Toronto, Toronto, Ontario, Canada
  5. Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada

Correspondence to Dr Noah Ivers; noah.ivers{at}


Objective Despite the increasing numbers of published trials of quality improvement (QI) interventions in diabetes, little is known about the risk of bias in this literature.

Design Secondary analysis of a systematic review.

Data sources Medline, the Cochrane Effective Practice and Organisation of Care (EPOC) database (from inception to July 2010) and references of included studies.

Eligibility criteria Randomised trials assessing 11 predefined QI strategies or financial incentives targeting health systems, healthcare professionals or patients to improve the management of adult outpatients with diabetes.

Analysis Risk of bias (low, unclear or high) was assessed for the 142 trials in the review across nine domains using the EPOC version of the Cochrane Risk of Bias Tool. We used Cochran-Armitage tests for trends to evaluate the improvement over time.

Results There was no significant improvement over time in any of the risk of bias domains. Attrition bias (loss to follow-up) was the most common source of bias, with 24 trials (17%) having high risk of bias due to incomplete outcome data. Overall, 69 trials (49%) had at least one domain with high risk of bias. Inadequate reporting frequently hampered the risk of bias assessment: allocation sequence generation was unclear in 82 trials (58%) and allocation concealment was unclear in 78 trials (55%). Over time, there was neither a significant reduction in the proportion of studies at high risk of bias nor a significant improvement in the adequacy of reporting of risk of bias domains.

Conclusions Nearly half of the included QI trials in this review were judged to have high risk of bias. Such trials have serious limitations that put the findings in question and therefore inhibit evidence-based QI. There is a need to limit the potential for bias when conducting QI trials and improve the quality of reporting of QI trials so that stakeholders have adequate evidence for implementation.


Article summary

Article focus

  • Reliable quality improvement research is needed to make decisions about initiating or scaling up quality improvement strategies.

  • The number of published quality improvement trials has increased rapidly over time.

  • The quality of trials published in other areas of health seems to be improving over time, but the risk of bias in the quality improvement literature is uncertain.

Key messages

  • Nearly half of quality improvement trials for diabetes are at high risk of bias.

  • The quality of quality improvement trials does not seem to be improving over time.

  • Policy-makers, administrators, clinicians and research funders must carefully scrutinise the methods used in quality improvement trials to ensure evidence-based quality improvement.

Strengths and limitations of this study

  • This is the largest systematic review of risk of bias in the quality improvement literature and the only one to assess for trends over time.

  • The risk of bias tool does not capture all sources of methodological bias and poor reporting interferes with the assessment of many domains.

  • The merits of any given trial report depend to some extent on the needs of the reader, such that some trials with high risk of bias may be of value for certain purposes.

Introduction

There is significant interest in quality improvement (QI) in healthcare, as evidenced by the rapidly increasing number of randomised clinical trials (RCTs) of QI interventions, especially in the diabetes literature.1 RCTs can provide a foundation for making statements regarding causation, but the validity of trials varies widely; trials with adequate allocation concealment and blinding generally produce smaller effect sizes.2 Since internal validity in QI trials is a necessary precursor for application to other settings,3 the ‘risk of bias’ of the findings should be assessed to ascertain the utility of the trial results. When an RCT is deemed to have high risk of bias, the study's findings become questionable.4

Evaluations to assess trends in methodological quality of RCTs have been conducted in many fields of healthcare,5 but no previous reviews have assessed risk of bias in QI RCTs or whether risk of bias in QI RCTs has changed over time. Recently, we conducted a systematic review and metaregression that included 142 RCTs evaluating QI strategies to improve care for patients with diabetes.1 In this secondary analysis of those data, we aimed to examine the risk of bias of included studies using the Cochrane Risk of Bias tool developed by the Cochrane Effective Practice and Organisation of Care (EPOC) group6 and determine whether the proportion with high risk of bias decreased over time. We also evaluated the trial and publication characteristics that might be associated with high risk of bias. Finally, we assessed whether the adequacy of reporting of risk of bias domains improved over time.

Methods

A detailed description of the methods used for searching, screening and abstracting the relevant data has been published1 and is briefly summarised here.

Search strategy

Studies were identified by searching MEDLINE and the Cochrane EPOC database (up to July 2010), and screening references of included RCTs. The search strategy has been previously published1 and is available on request.

Study selection

RCTs examining one of the 11 predefined QI strategies, and/or financial incentives, targeting health systems, healthcare professionals or patients for the management of adult outpatients with diabetes were included. For inclusion, RCTs had to report at least one of the chosen process of care measures (proportion of patients taking acetylsalicylic acid, statins or antihypertensive medication, screened for retinopathy, screened for foot abnormalities and monitored for renal function) or intermediate outcomes (glycosylated haemoglobin levels, low-density lipoprotein cholesterol levels, diastolic and systolic blood pressure, proportion of patients with controlled hypertension and proportion of patients who quit smoking).

Data abstraction

A draft data abstraction form was developed and modified after a training exercise among reviewers. Two reviewers abstracted relevant data for each RCT independently. Discrepancies were resolved by discussion or the involvement of a third reviewer. Authors of the included RCTs were contacted to obtain further information for data items requiring clarification. Journal impact factors were obtained from Journal Citation Reports (ISI Web of Science, 2009). When a journal's impact factor was unavailable, we used the impact ranking of the open access SCImago journal and country rank database if available.7 This ranking is calculated using a similar formula and is strongly correlated with the journal citation impact factor.8

Assessing risk of bias

As the included trials tested QI interventions, the Cochrane EPOC Risk of Bias Tool6 was used to assess the risk of bias in each study. The standard Cochrane Risk of Bias Tool includes an assessment of seven domains: sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data, selective reporting and others. The Cochrane Handbook9 provides instructions for making judgements about the specific domains as high, unclear or low risk. When formulating summary assessments for each trial, classification of a study as ‘high risk’ indicates that bias could have affected the results, while unclear risk of bias indicates that some doubt exists about the results, and low risk of bias indicates that bias is unlikely to affect the results. It has been shown empirically that studies classified as high risk using this tool are more likely to have larger effect sizes.10

The EPOC tool was adapted to account for the unique features of QI trials (the guidelines for applying it are summarised in table 1). For example, in many QI trials, it is not possible to blind participants. In addition, QI trials may require cluster randomisation to avoid contamination, but in cluster-randomised trials balance at baseline is a particular concern.11 Therefore, the EPOC tool uses the same approach as the general Cochrane Risk of Bias Tool, but requires an assessment of bias in nine domains: sequence generation, allocation concealment, similarity of baseline measurements, similarity of baseline characteristics, incomplete data, blinding of outcome assessment, contamination, selective outcome reporting and others. A domain was deemed ‘unclear’ when it was too poorly reported to determine whether it met the high risk or low risk criteria. Risk of bias assessment was conducted independently by a clinician-researcher (NMI) and a systematic review methodologist (ACT), and conflicts were resolved by discussion with an expert QI trialist (JMG).

Table 1

Cochrane Effective Practice and Organisation of Care risk of bias assessment tool*


For each risk of bias domain, the proportions of RCTs meeting the criteria for high, low or unclear risk of bias were determined. To assess for trends over time in the bias classifications, year of publication was categorised into three groups demarcated by the publication of the 2001 CONSORT statement12 and the publication of the earlier version of the systematic review of diabetes QI interventions in 2006,13 as we believed that these may have spurred investigators to improve the quality of their trials. Therefore, we categorised the year of publication as before 2002; 2002–2006; and 2007–2010. We examined each of the risk of bias domains for change over time descriptively and conducted either exact or asymptotic Cochran-Armitage tests for trend for each item.
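As an illustration of the trend test named above, the following is a minimal sketch of the asymptotic Cochran-Armitage test in Python, not the SAS code used for the analysis. It assumes equally spaced scores (0, 1, 2) for the three publication periods. The group sizes are those reported in the Results; the numbers of trials at high risk of bias in at least one domain are an assumption, back-calculated from the reported 46%, 44% and 54%. The exact version of the test is not shown.

```python
import math

def cochran_armitage_trend(events, totals, scores=None):
    """Asymptotic Cochran-Armitage test for trend in proportions.

    events[i] / totals[i] is the proportion in ordered group i.
    Returns (z, two_sided_p).
    """
    if scores is None:
        scores = list(range(len(totals)))  # equally spaced scores: 0, 1, 2, ...
    n_total = sum(totals)
    p_bar = sum(events) / n_total  # pooled proportion under the null

    # Score-weighted sum of observed minus expected events.
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, events, totals))

    # Variance of t_stat under the null hypothesis of no trend.
    s1 = sum(n * s for s, n in zip(scores, totals))
    s2 = sum(n * s * s for s, n in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (s2 - s1 ** 2 / n_total)

    z = t_stat / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_two_sided

# Trials published before 2002, in 2002-2006 and in 2007-2010.
totals = [37, 46, 59]
# ASSUMPTION: counts back-calculated from the reported 46%, 44% and 54%.
high_risk = [17, 20, 32]

z, p = cochran_armitage_trend(high_risk, totals)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

A p value well above 0.05 from this sketch would be in keeping with the reported lack of a significant trend, although the figures it produces are not taken from the article's tables.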

We estimated the proportion of QI RCTs at high risk of bias overall, together with an asymptotic 95% CI. For this analysis, we created a dichotomous indicator for each RCT based on whether or not the study was classified as high risk of bias in at least one domain. We tested for trend over time in the proportion at high risk of bias overall, hypothesising that the proportion would decline over time. We used the same year of publication categories and conducted Cochran-Armitage tests for trend of the dichotomous indicator. To assess for trends in reporting over time, we dichotomised domains as ‘reported’ (low or high risk of bias) and ‘unreported’ (unclear risk of bias).

We also conducted a post hoc sensitivity analysis that applied an empirically based rule for assigning high risk of bias overall. Since previous meta-analyses have found that high risk of bias in four specific domains (allocation sequence generation, allocation concealment, blinding and selective outcome reporting) is associated with greater effect size,14–16 we repeated the analyses considering only studies with high risk of bias in at least one of these domains as high risk of bias overall.
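The classification rules described in this section can be sketched as follows. This is an illustrative Python sketch rather than the authors' code: the domain keys are hypothetical names for the nine EPOC domains, the example trial is invented, and mapping "blinding" in the sensitivity rule to "blinding of outcome assessment" is an assumption.

```python
HIGH, UNCLEAR, LOW = "high", "unclear", "low"

# Domains used in the sensitivity analysis. ASSUMPTION: "blinding" is mapped
# to the EPOC domain "blinding of outcome assessment".
SENSITIVITY_DOMAINS = (
    "sequence_generation",
    "allocation_concealment",
    "blinding_of_outcome_assessment",
    "selective_outcome_reporting",
)

def high_risk_overall(judgements, domains=None):
    """True if any considered domain is judged high risk."""
    if domains is None:
        considered = judgements  # primary rule: all assessed domains
    else:
        considered = {d: judgements[d] for d in domains}
    return any(value == HIGH for value in considered.values())

def is_reported(judgement):
    """'Reported' is a definite judgement (low or high); unclear is 'unreported'."""
    return judgement in (LOW, HIGH)

# Hypothetical trial, invented for illustration only.
trial = {
    "sequence_generation": UNCLEAR,
    "allocation_concealment": UNCLEAR,
    "baseline_outcome_measurements": LOW,
    "baseline_characteristics": HIGH,
    "incomplete_outcome_data": LOW,
    "blinding_of_outcome_assessment": LOW,
    "contamination": LOW,
    "selective_outcome_reporting": UNCLEAR,
    "other": LOW,
}

primary = high_risk_overall(trial)                           # any of nine domains
sensitivity = high_risk_overall(trial, SENSITIVITY_DOMAINS)  # four domains only
reported = {domain: is_reported(j) for domain, j in trial.items()}
print(primary, sensitivity, reported["sequence_generation"])
```

In this invented example the trial counts as high risk overall under the primary rule (baseline characteristics) but not under the sensitivity rule, which shows how the two definitions can diverge.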

Finally, we tested for associations between high risk of bias in at least one domain and study characteristics chosen a priori: type of diabetes (type 1, type 2, both or unclear), type of allocation (cluster randomised or patient randomised), country (USA or Canada, UK or Western Europe, or others), type of intervention (single or multifaceted), journal impact factor, effective sample size and year of publication. We used χ2 tests (or Fisher's exact tests, as appropriate) for categorical measures and Wilcoxon rank-sum tests for continuous measures. We hypothesised that each of these characteristics may be associated with studies at high risk of bias overall.

All analyses were conducted in SAS V.9.2.17

Results

See figure 1 for a study flow diagram.

We analysed 142 studies, with 37 (26%) published before 2002, 46 (32%) between 2002 and 2006, and 59 (42%) between 2007 and 2010. These studies evaluated the effects of QI interventions on 123 529 patients with diabetes. Trial and patient characteristics are described in table 2. The proportions of studies judged to be at low, unclear or high risk of bias for each domain are illustrated in figure 2. The domains most commonly at high risk of bias were incomplete outcome data (17%) and lack of similarity in characteristics at baseline (16%). A lack of similarity in outcome measures at baseline (10%) and a lack of adequate blinding (8%) were also relatively common sources of high risk of bias. Studies were rarely at high risk of bias due to the allocation sequence generation (4%) or allocation concealment (3%), but these domains were often unclearly reported (58% and 55% unclear, respectively). Selective outcome reporting was deemed unclear approximately 84% of the time because published protocols were rarely available and it was often plausible that many more outcomes than those reported were measured. Table 3 indicates a lack of significant trend over time in the proportion of trials at high risk of bias for any given domain. Examination of table 3 also reveals no trends over time in quality of reporting for any of the risk of bias domains.

Table 2

Study and patient characteristics

Table 3

Trends over time in proportions of trials classified high, unclear or low for each risk of bias domain

Figure 2

Percentage of studies judged to be at low, unclear or high risk of bias in each risk of bias domain.

Overall, 48.6% (69/142) of the RCTs had a high risk of bias in at least one domain (95% CI 40.4 to 56.8). Figure 3 illustrates the rapid increase in number of QI RCTs published over time and the cumulative proportion of trials having at least one domain with high risk of bias up to a given year. In general, the line representing the proportion at high risk of bias runs parallel to the number of trials published, consistently accounting for almost half of the studies. Table 4 indicates a lack of significant trend over time in the proportion of trials with at least one domain with high risk of bias: these proportions were 46%, 44% and 54% before 2002, between 2002 and 2006, and after 2006, respectively. Table 4 also demonstrates a lack of significant association between any of the study characteristics considered and the presence of high risk of bias in at least one domain.
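The interval quoted above can be checked with a short calculation. The sketch below assumes that the 'asymptotic' interval is the standard Wald (normal approximation) interval, which is an assumption about the method rather than something the article states; under that assumption it reproduces the reported bounds.

```python
import math

def wald_ci(events, total, z=1.96):
    """Proportion with a Wald (normal approximation) confidence interval."""
    p = events / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width

# 69 of 142 trials had high risk of bias in at least one domain.
p, lower, upper = wald_ci(69, 142)
print(f"{100 * p:.1f}% (95% CI {100 * lower:.1f} to {100 * upper:.1f})")
# prints: 48.6% (95% CI 40.4 to 56.8)
```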

Table 4

Association between study characteristics and risk of bias

Figure 3

Cumulative number of diabetes quality improvement trial publications at high risk of bias in any domain, 1990–2010.

The sensitivity analysis, restricting studies defined as high risk of bias overall to those with high risk of bias in one of four domains (allocation sequence generation, allocation concealment, blinding or selective outcome reporting) also revealed no trends over time—the proportions were 19%, 20% and 20% before 2002, between 2002 and 2006 and after 2006, respectively (p=0.86).

Discussion

Main findings

Using the Cochrane EPOC Risk of Bias Tool,6 we found that nearly half of QI RCTs focusing on diabetes had at least one domain at high risk of bias. The trials were most often at high risk of bias due to inadequate follow-up of participants, a lack of similarity at baseline across outcome measures or covariates, or inadequate blinding. We also noted that the majority of RCT reports failed to include an adequate description of the allocation process (ie, sequence generation and allocation concealment were ‘unclear’). To be interpreted appropriately, RCTs must be completely and transparently reported.18,19 Our findings indicate that greater efforts are needed to ensure both adequate reporting and methodological conduct of diabetes QI trials.

We found that poor follow-up, baseline imbalances and inadequate blinding were the most common sources of high risk of bias. Although these domains may be difficult to fully control in QI trials, methodological approaches are available to mitigate and/or explore such causes of risk of bias. For example, sensitivity analyses may be used to explore the risk of bias related to loss to follow-up, and the risk of baseline imbalances in QI trials may be reduced through restricted randomisation techniques, especially when trials are cluster-randomised with relatively few clusters. In addition, selective outcome reporting might be limited if more QI trial protocols were registered. Finally, although blinding may be particularly difficult to accomplish in QI trials, whether it was done should be clearly reported; if outcome assessment is not blinded, risk of bias could still be limited by using objective outcomes.
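To make the idea of restricted randomisation concrete, the sketch below shows one simple variant, covariate-constrained randomisation for a two-arm cluster trial with few clusters. It is an illustration only and is not drawn from any of the reviewed trials: the clinic names, baseline HbA1c values, balance threshold and seed are all hypothetical, and other restricted designs (such as stratification or minimisation) are not shown.

```python
import itertools
import random

def constrained_allocations(baseline, max_diff):
    """All equal-sized two-arm splits whose arm means differ by at most max_diff."""
    clusters = sorted(baseline)
    half = len(clusters) // 2
    acceptable = []
    for arm_a in itertools.combinations(clusters, half):
        arm_b = tuple(c for c in clusters if c not in arm_a)
        mean_a = sum(baseline[c] for c in arm_a) / len(arm_a)
        mean_b = sum(baseline[c] for c in arm_b) / len(arm_b)
        if abs(mean_a - mean_b) <= max_diff:
            acceptable.append((arm_a, arm_b))
    return acceptable

# Hypothetical baseline mean HbA1c (%) for eight clinics; invented for illustration.
baseline_hba1c = {
    "clinic_1": 7.2, "clinic_2": 8.9, "clinic_3": 7.8, "clinic_4": 8.1,
    "clinic_5": 7.5, "clinic_6": 8.6, "clinic_7": 7.9, "clinic_8": 8.3,
}

# Keep only allocations that are balanced on the baseline outcome,
# then choose one of them at random.
candidates = constrained_allocations(baseline_hba1c, max_diff=0.2)
rng = random.Random(2012)  # fixed seed so the sketch is reproducible
intervention, control = rng.choice(candidates)
print(len(candidates), "acceptable allocations")
print("intervention:", intervention)
print("control:", control)
```

Because the final allocation is still drawn at random from the acceptable set, the design remains randomised while ruling out the badly imbalanced splits that simple randomisation of a handful of clusters can produce.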

Comparison to literature

A systematic review focusing on cluster randomised trials found minimal improvement over time in either reporting or methodological conduct.20 We found no evidence for a difference in the proportion of cluster-randomised trials at high risk of bias compared with trials in which individuals were allocated. However, imbalance at baseline was a common source of potential bias in diabetes QI trials, possibly owing to the inadequate use of restricted randomisation in cluster trials.21 Another systematic review included 35 studies covering a range of health-related fields assessing trends over time in quality criteria for RCTs.5 Of these, 26 found improvement over time for at least one aspect of methodological quality. The domain most commonly noted to have improvement was allocation concealment, but the authors noted that this domain remained either poorly reported or inadequately performed in over half of the examined trials. We found a similarly low proportion of studies clearly reporting adequate allocation concealment, and no evidence of improvement over time.

Previous authors have noted that QI reports may not contain enough information to inform generalisation and allow for replication in different clinical settings.22 Standards for Quality Improvement Reporting (SQUIRE) guidelines suggest that investigators conducting trials use both SQUIRE and CONSORT to inform their manuscripts.19 Journal editors should enforce the requirements of both SQUIRE and CONSORT for QI RCTs, possibly by permitting detailed information to be posted as online appendices. Although it might seem onerous to force investigators to address all items in SQUIRE and CONSORT, the risks of poor reporting are substantial. Inadequate description of context could omit essential preconditions or important effect modifiers for a successful QI programme, while incomplete description of the programme itself might lead to failure due to partial implementation.

Strengths and limitations

To our knowledge, this is the largest analysis of risk of bias ever reported for healthcare QI RCTs and the only one to assess for trends over time. The findings are strengthened by the rigorous methods used to prepare the data for the systematic review.

QI evaluations have been criticised based on numerous criteria beyond the risk of bias domains, including short duration of intervention, lack of justification for intervention design and poor generalisability.23,24 Some important components of methodological quality do not relate to bias (eg, reporting of a sample size calculation). Thus, it is possible that studies at low risk of bias have important flaws with respect to methodology and/or reporting (and vice versa), and it is possible that using other scales to assess study quality could have led to different results.14 While the overall risk of bias assessment using the Cochrane Risk of Bias Tool has been shown to differentiate effect sizes (ie, higher risk of bias studies usually have larger effect sizes),10 studies at high risk of bias may still offer valuable knowledge for QI implementers. The merit of any given report will depend on the needs of the reader, while the current analysis provides an assessment of the progress in the literature as a whole.

Furthermore, we acknowledge that classifying a trial as high risk of bias overall on the basis of a single high-risk domain is debatable. Nevertheless, our sensitivity analysis led to the same conclusion: there has been no improvement over time in the proportion of trials at high risk of bias in this literature and no particular study characteristics were associated with high risk of bias.

Another potential limitation stems from our analytical approach regarding change over time; collapsing publication year into three timeframes (pre-2002, 2002–2006 and 2007–2010) and testing for trends may have limited our power. These timeframes were chosen a priori based on the publication of important documents that we thought might affect the conduct and reporting of these trials. We felt the assumption of linear change over time underlying the Cochran-Armitage test for trend was appropriate and in keeping with our hypotheses (eg, high and unclear risk of bias would decrease gradually over time, while low risk of bias would increase). Risk of type 2 error is tempered by the number of tests performed; the lack of a significant p value for trend for any level of risk of bias in any domain supports our main conclusion. Finally, this review considered only RCTs from the diabetes literature. It would have been preferable to evaluate a random sample of all QI trials, but adequate QI electronic literature searches are yet to be developed.25

Conclusions

Published trials testing QI in diabetes are frequently at high risk of bias, producing results that may not be replicable. Clinicians must scrutinise the internal validity of the results as a first step in the process of considering the application of clinical findings for particular patients. Our findings emphasise the need for policy-makers, managers and/or clinical administrators seeking to implement QI interventions to apply the same process.3 QI investigators publishing RCTs are likely to want their work to have a broad impact. To help them accomplish this, research funders and journal editors can play an important role by ensuring that QI trials are reported thoroughly and transparently and are designed in a manner that limits the potential for bias.


We would like to thank Jennifer D'Souza for her help in formatting the manuscript.




  • Contributors NMI and ACT designed and coordinated the study, participated in data collection, data analysis, data interpretation and drafted the manuscript. MT conducted the analysis and participated in data interpretation and drafting the manuscript. IH, LT, DM and JMG helped to design the study and write the manuscript. All authors read and approved the final manuscript.

  • Funding Ontario Ministry of Health and Long-Term Care and the Alberta Heritage Fund supported the initial systematic review.

  • Competing interests This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. The authors declare that (1) NMI, ACT, MT, IH, LT, DM and JMG received support from Ontario Ministry of Health and Long-term Care and the Alberta Heritage Foundation for the original systematic review, but the funding agencies had no role in the study design, collection, analysis or interpretation of data, writing of the manuscript or in the decision to submit this manuscript for publication; (2) NMI, ACT, MT, IH, LT, DM and JMG have no relationships with any companies that might have an interest in the submitted work in the previous 3 years; (3) their spouses, partners or children have no financial relationships that may be relevant to the submitted work and (4) NMI, ACT, MT, IH, LT, DM and JMG have no non-financial interests that may be relevant to the submitted work. NMI holds fellowship awards from the Canadian Institutes of Health Research (CIHR) and from the Department of Family and Community Medicine, University of Toronto. ACT holds a CIHR/Drug Safety and Effectiveness Network New Investigator Award in Knowledge Synthesis. DM holds a University Research Chair at the University of Ottawa and JMG holds a Canada Research Chair in Health Knowledge Transfer and Uptake.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement Data detailing the risk of bias for each of the 142 trials in the review are available upon request.


Linked Articles

  • Correction
    British Medical Journal Publishing Group