
Editorials

Why do the results of randomised and observational studies differ?

BMJ 2011; 343 doi: https://doi.org/10.1136/bmj.d7020 (Published 07 November 2011) Cite this as: BMJ 2011;343:d7020
Jan P Vandenbroucke, professor of clinical epidemiology
Department of Clinical Epidemiology, Leiden University Medical Centre, 2300 RC Leiden, Netherlands
j.p.vandenbroucke{at}lumc.nl

Statistical theory conflicts with empirical findings in several areas of research

In the linked study (doi:10.1136/bmj.d6829), Tzoulaki and colleagues found that cardiovascular risk markers show less predictive power in secondary analyses of data from randomised controlled trials (RCTs) than in observational studies that were set up to investigate these markers.1 Why would this be?

For decades it has been debated which are better: randomised trials or observational studies. We now have not only theory but also evidence in three different areas: effects of treatment, adverse effects, and biomarkers. Theory predicts that randomised trials are superior when investigating the hoped-for effects of treatments. In daily practice, treatment depends on the perceived prognosis of the patient, so any effect of treatment becomes inextricably intermingled with prognosis. Data from daily medical practice therefore cannot be used to investigate the intended effects of treatments; trials with concealed randomisation are needed to obtain the right answers. However, empirical proof that observational studies of treatment are widely off the mark has been surprisingly elusive.2

Four meta-analyses contrasting RCTs and observational studies of treatment (Benson 2000; Concato 2000; MacLehose 2000; Ioannidis 2001) found no large systematic differences.2 The first two found no difference, although RCTs showed larger variation in the second; the third found no differences for higher quality studies; and the fourth found a high correlation coefficient, with a slight tendency towards larger estimates and somewhat more heterogeneity in observational studies. A systematic difference was found only in an older study of historical controls,3 and a semi-simulation showed similarity on average but larger variation for observational studies.4 Thus, the notion that RCTs are superior and observational studies untrustworthy, except when looking at dramatic effects,5 rests on theory and on singular events: the discrepancies in the effects of vitamins6 and of hormone replacement therapy. For hormone replacement therapy, however, the discrepancies between RCTs and observational studies were shown to have little to do with the assumed advantages of randomisation; they resulted from different time axes in the analysis.7
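The intermingling of treatment and prognosis described above can be illustrated with a small simulation (a sketch with invented numbers, not drawn from any of the cited studies): a treatment that truly lowers the risk of a bad outcome can look harmful in observational data when sicker patients are more likely to receive it, whereas random assignment recovers the true effect.

```python
import random

random.seed(0)

def observed_risk_difference(randomised, n=100_000, true_effect=-0.05):
    """Risk difference (treated minus untreated) for a treatment that
    truly lowers event risk by 5 percentage points."""
    events = {True: 0, False: 0}
    counts = {True: 0, False: 0}
    for _ in range(n):
        baseline_risk = random.uniform(0.05, 0.45)  # prognosis varies
        if randomised:
            treated = random.random() < 0.5          # coin-flip allocation
        else:
            # confounding by indication: sicker patients are treated more often
            treated = random.random() < 2 * baseline_risk
        p_event = max(0.0, baseline_risk + (true_effect if treated else 0.0))
        event = random.random() < p_event
        events[treated] += event
        counts[treated] += 1
    return events[True] / counts[True] - events[False] / counts[False]

rct_rd = observed_risk_difference(randomised=True)   # close to the true -0.05
obs_rd = observed_risk_difference(randomised=False)  # positive: looks harmful
```

In the simulated observational data the treated group carries a worse prognosis, so the crude comparison reverses the sign of the true effect; no analysis of the outcome alone can distinguish this from genuine harm.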

For adverse effects, the same theory predicts that observational studies based on records from daily practice can give the right answers. Adverse effects are diseases that differ from those being treated; they have different risk factors, and they are always unintended and often unpredictable (or analyses can be restricted to patients in whom the adverse effect is unpredictable). Hence, there is no confounding by indication.8 This idea has been supported by one small and one larger meta-analysis,9 10 both of which showed that estimates of adverse effects from randomised trials and observational studies are similar. The similarity was most clearly shown in a funnel plot.10 Thus, on the basis of comparisons where both types of study were available, results from observational studies are reliable. This is fortunate because randomised trial data do not exist for many adverse effects, especially those that are rare or occur late. The greatest benefit of being able to use data from daily practice for research into adverse effects is that the frequency of adverse effects can be much higher in daily practice than in the highly selected population of trials.

Theory is silent, however, about a potential difference between secondary analyses of randomised trials versus observational studies for the predictive power of biomarkers. Indeed, almost all advantages of RCTs disappear when trial data are used to assess a biomarker. Biomarkers are not randomised (and thus there is no protection against confounding), and any alleged advantage of advance protocol specification of end points and analyses does not apply because biomarkers are often analysed as an “afterthought” to publish something extra from an RCT. Researchers doing biomarker analyses might be expected to “data dredge” in the same way in a dataset that came originally from an RCT as they would for observational data.

Tzoulaki and colleagues discuss three potential explanations for their finding that the predictive power of biomarkers is lower in RCTs than in observational studies: data dredging, individual patient data analyses, and spectrum bias. The authors concede that whether secondary analyses of data from randomised trials invite less or more data dredging than observational studies can be argued either way. Meta-analyses of individual patient data and meta-analyses based on aggregate results did differ, but the authors dismissed this as a possible explanation because the markers studied in individual patient meta-analyses were probably different. Lastly, patients in RCTs may have a more limited range of risk profiles, but the authors did not think that this could explain their findings. However, RCTs have a restricted range of patients because inclusion criteria aim to reduce any risk to participants or sponsors, and the enrolment of patients in trials is even more selective than can be gleaned from the stated inclusion and exclusion criteria.11 12 Sensitivity and specificity will differ between studies that include a different range of patients, a phenomenon known as "spectrum bias." Spectrum bias is a good candidate to explain the differences in results for prognostic markers in trials versus observational studies. If spectrum bias is the explanation, credibility should be given to observational studies that include a wider range of patients from daily practice.
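Spectrum bias can likewise be sketched in a toy simulation (illustrative parameters only): the same noisy biomarker shows a far weaker association with the outcome when enrolment is restricted to a narrow band of underlying risk, as in a trial, than across the full spectrum seen in daily practice.

```python
import random

random.seed(1)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

def marker_outcome_correlation(lo, hi, n=20_000):
    """Correlation between a noisy biomarker and the outcome when only
    patients whose underlying risk lies in [lo, hi] are enrolled."""
    markers, outcomes = [], []
    while len(markers) < n:
        risk = random.random()                 # underlying risk, 0 to 1
        if not lo <= risk <= hi:
            continue                           # outside enrolment criteria
        markers.append(risk + random.gauss(0.0, 0.2))  # marker tracks risk
        outcomes.append(1 if random.random() < risk else 0)
    return pearson(markers, outcomes)

full_spectrum = marker_outcome_correlation(0.0, 1.0)    # observational-style
narrow_spectrum = marker_outcome_correlation(0.4, 0.6)  # trial-style
```

The marker itself is identical in both settings; only the range of enrolled patients differs, yet its apparent predictive power collapses in the restricted, trial-like population.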

As a solution, Tzoulaki and colleagues propose to register all study populations with acceptable data quality and to reanalyse all of them for each novel emerging biomarker of interest. However, this might not be feasible: the biomaterial from older studies may no longer exist, it may not have been stored appropriately, or reanalysis may not be financially or logistically possible. Before we search for solutions, we ought to know what the problem is. Can Tzoulaki and colleagues' findings be replicated and, most importantly, if they can, what is the underlying mechanism? Understanding the mechanism will determine which studies to trust.


Footnotes

  • Research, doi:10.1136/bmj.d6829
  • Competing interests: The author has completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declares: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

  • Provenance and peer review: Commissioned; not externally peer reviewed.

References
