Objectives To analyse the impact of placebo effects on outcome in trials of selected minimally invasive procedures and to assess reported adverse events in both trial arms.
Design A systematic review and meta-analysis.
Data sources and study selection We searched MEDLINE and Cochrane library to identify systematic reviews of musculoskeletal, neurological and cardiac conditions published between January 2009 and January 2014 comparing selected minimally invasive with placebo (sham) procedures. We searched MEDLINE for additional randomised controlled trials published between January 2000 and January 2014.
Data synthesis Effect sizes (ES) were calculated for the trials’ primary and pooled secondary end points in both the active and placebo arms. Linear regression was used to analyse the association between end points in the active and sham groups. Reported adverse events in both trial arms were registered.
Results We included 21 trials involving 2519 adult participants. For primary end points, there was a large clinical effect (ES≥0.8) after active treatment in 12 trials and after sham procedures in 11 trials. For secondary end points, a large clinical effect was seen after active treatment in 7 trials and after sham in 5 trials. Three trials showed a moderate difference in ES between active treatment and sham on primary end points (ES ≥0.5), but no trials reported a large difference. No trials showed large or moderate differences in ES on pooled secondary end points. Regression analysis of end points in the active treatment and sham arms estimated an R² of 0.78 for primary and 0.84 for secondary end points. Adverse events after sham were in most cases minor and of short duration.
Conclusions The generally small differences in ES between active treatment and sham suggest that non-specific mechanisms, including placebo, are major predictors of the observed effects. Adverse events related to sham procedures were mainly minor and short-lived. Ethical arguments frequently raised against sham-controlled trials were generally not substantiated.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
Selection of trials with low risk of bias.
Calculation of effect sizes on primary and pooled secondary end points in active treatment as well as sham arms.
Heterogeneous interventions, outcome measures and timing of assessment.
It is normally assumed that medical practices are based on firm clinical evidence and that new practices or techniques are introduced when superiority, or at least non-inferiority, has been demonstrated compared with established treatments. However, medical history reveals numerous examples contradicting this assumption. Forty-two per cent of 146 medical practices were found to be reversed in a recent review analysing 10 years of publications in a high-impact medical journal.1 Large effects reported for an intervention in initial studies are often spurious, and the vast majority may represent substantial overestimations.2
Although surgical and other invasive techniques have generally reached a high degree of sophistication over the last decades, not all invasive procedures have lived up to expectations. Promising results in initial observational studies have in some cases led to widespread clinical implementation, despite a lack of documented effectiveness.3 The reluctance to abandon contradicted medical practices is commonly ascribed both to culturally embedded medical practices and to various forms of vested interests.4 ,5 The continued use of unnecessary and potentially harmful interventions imposes major costs on both patients and society.
The randomised placebo-controlled trial is considered the gold standard for evaluating the effects of pharmacological treatments. However, there are relatively few controlled studies in peer-reviewed surgical journals and even fewer placebo (sham)-controlled studies.6–8 Ethical concerns raised by the potential for harm to participants are usually cited as the main obstacle to sham-controlled studies.9 Problems of a practical nature relate to patient blinding, differing technical expertise, the heterogeneity of the interventional techniques and variable outcome specifications, making standardisation difficult to achieve.10
A meaningful treatment effect in clinical trials may result from a large effect in the active treatment group, a small effect in the placebo group or a combination of both. Although a placebo effect has been documented in a range of clinical conditions, few studies have assessed the magnitude of the placebo effect in surgical procedures. In the present study, we analysed placebo-controlled trials of minimally invasive interventions in musculoskeletal, neurological and cardiac conditions. The aims were threefold: (1) to assess the magnitude of change in outcome from baseline to trial end point in the active treatment as well as placebo (sham) arms; (2) to explore the contribution of non-specific factors, including placebo, to the outcome of active treatment and (3) to assess the level of reported adverse events in both trial arms.
Search strategy and selection criteria
We conducted electronic searches for randomised placebo-controlled trials of minimally invasive interventions for cardiac, neurological and musculoskeletal conditions. We defined minimally invasive procedures as interventions involving the introduction of a medical device, substance or other foreign material into the body through a cannula, catheter or arthroscope, thereby minimising damage to biological tissues at the point of entrance. We first used MEDLINE and Cochrane library to identify systematic reviews published between January 2009 and January 2014. The following key words were used in our search strategies: “randomi* controlled trial”, “placebo OR sham” in combination with “low back pain”, “neck OR cervical pain”, “radiofrequency denervation”, “facet joint AND “nerve block” OR injection”, “intradiscal OR annular AND thermal”, “epidural AND corticosteroid* AND sciatica OR radic*”, “hyaluron* OR viscosuppl* AND knee AND osteoarthritis”, “vertebroplast*”, “arthroscop*”, “debridement AND lavage AND knee AND osteoarthr*”, “meniscectomy AND knee”, “myocardial laser revascularization”, “transplantation OR gene OR stem cell OR deep brain stimulation AND Parkinson* OR dystonia”, “spinal cord stimulation”, and “foramen ovale AND migraine”. We used the “core clinical journals” filter in PubMed, which is an index of journals particularly relevant to practicing physicians.
From the most recently published systematic review of each procedure, we selected randomised placebo-controlled trials published after January 2000. We excluded trials published before January 2000 because our primary aim was to assess interventions that are currently, or until recently have been, in common use. We selected trials that, according to the review, fulfilled at least four of the following methodological criteria: random allocation, allocation concealment, blinding of participants, blinding of assessors and intention-to-treat analysis. We chose these criteria because they were the most commonly used in the selected reviews, and because the use of scales for assessing quality or risk of bias is explicitly discouraged in Cochrane reviews.11 Two of the authors (RH and JIB) independently assessed the five methodological criteria in the RCTs included from the systematic reviews.
We next searched MEDLINE for additional randomised placebo-controlled trials published between January 2000 and January 2014. Two of the reviewers (OT and JIB) independently assessed the same five criteria in the additional RCTs identified from this search.
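The inclusion rule above (at least four of five methodological criteria, assessed independently by two authors) can be expressed as a small check. This is a minimal Python sketch under stated assumptions, not the authors' tool: the record type `TrialAssessment` and the criterion keys are hypothetical labels, and because the text does not say how disagreements between assessors were resolved, the sketch only flags them.

```python
from dataclasses import dataclass

# The five methodological criteria named in the text (keys are hypothetical labels).
CRITERIA = frozenset({
    "random_allocation",
    "allocation_concealment",
    "participant_blinding",
    "assessor_blinding",
    "intention_to_treat",
})


@dataclass(frozen=True)
class TrialAssessment:
    """Hypothetical record of one assessor's judgement of one trial."""
    trial_id: str
    fulfilled: frozenset  # criteria the assessor judged as fulfilled


def meets_threshold(assessment: TrialAssessment, minimum: int = 4) -> bool:
    """True if at least `minimum` of the five criteria are fulfilled."""
    unknown = assessment.fulfilled - CRITERIA
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    return len(assessment.fulfilled) >= minimum


def assessors_agree(a: TrialAssessment, b: TrialAssessment) -> bool:
    """Flag whether two independent assessments lead to the same decision.

    The resolution of disagreements is not described, so this only detects them.
    """
    if a.trial_id != b.trial_id:
        raise ValueError("assessments refer to different trials")
    return meets_threshold(a) == meets_threshold(b)
```

For example, a trial judged to fulfil every criterion except allocation concealment would still pass the four-of-five threshold.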
Only English language journals were included. We excluded crossover trials, trials that did not report results as means, SD, SE or CIs in active and sham groups, as well as trials with only graphic representation of data. This review is reported in accordance with the PRISMA statement.12
We registered all continuous primary end points. In trials without a continuous primary end point, with multiple end points or with no defined primary end point, we selected an end point related to pain or a condition-specific end point. The heterogeneity of the trials did not allow pain to be used as the primary outcome across trials. We used each RCT’s defined primary outcome to avoid the bias that could be introduced by choosing our own end point. We also registered secondary end points to avoid potential bias from selective reporting in the included trials. End points describing medication, radiographic or physiological variables, or social or psychological functions were not included. For the Parkinson's trials, only end points in the off-medication state were registered. Results from the last follow-up up to 12 months were extracted. The trials’ protocol registration, funding source, description of the sham intervention, sample size, disease duration, length of follow-up and reported adverse events (AEs) in both trial arms were also registered.
To assess clinically important change, we calculated effect size (ES, Cohen's d) from the means and SDs. We calculated ES for both the active and sham interventions to obtain information about the pre-to-post treatment change in each arm. Without first calculating the ES of change in each trial arm, we would not have been able to discern the relative contribution of placebo, which was one of the objectives of the study. ES was calculated by subtracting the mean score after treatment from the mean score before treatment and dividing the result by the average of the SDs before and after treatment. An ES of 0.8 or more is assumed to be large, while an ES of 0.5–0.8 is considered moderate.13 In trials with multiple secondary end points, we calculated the unweighted pooled mean ES. Owing to the small sample sizes in most of the included trials, we calculated an adjusted ES in accordance with a recommended procedure.14 Unadjusted linear regression analyses were used to explore the association between outcomes in the active and sham groups, for both primary and pooled secondary end points. For this analysis, we used MedCalc Statistical Software V.184.108.40.206.15
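The rule of extracting results from the last follow-up up to 12 months can be sketched as follows. This is an illustration only: the function name `last_follow_up_within` is hypothetical, and expressing time points as months after the procedure, with baseline (time 0) excluded, is an assumption rather than something the text specifies.

```python
from typing import Optional, Sequence


def last_follow_up_within(timepoints_months: Sequence[float],
                          limit_months: float = 12) -> Optional[float]:
    """Return the latest post-baseline follow-up at or before `limit_months`.

    Returns None when a trial reports no follow-up inside the window.
    Baseline (time 0) is excluded because it is the reference point.
    """
    eligible = [t for t in timepoints_months if 0 < t <= limit_months]
    return max(eligible) if eligible else None


# Illustrative schedules, not taken from the included trials.
print(last_follow_up_within([0, 1, 3, 6, 12, 24]))  # → 12
print(last_follow_up_within([0, 1.5, 6, 18]))       # → 6
```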
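The within-arm ES calculation described above can be sketched in Python. The sign convention follows the text (pre minus post), so positive values indicate improvement on scales where lower scores are better; scales where higher is better would need the sign reversed, which the text does not address. The small-sample adjustment of reference 14 is not specified here, so the sketch assumes a Hedges-type correction factor J = 1 − 3/(4·df − 1) with df = n − 1 per arm; the authors' actual procedure may differ. All function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def change_es(mean_pre: float, mean_post: float,
              sd_pre: float, sd_post: float) -> float:
    """Within-arm ES: (pre mean - post mean) / average of pre and post SDs."""
    avg_sd = (sd_pre + sd_post) / 2
    if avg_sd <= 0:
        raise ValueError("SDs must be positive")
    return (mean_pre - mean_post) / avg_sd


def adjust_small_sample(es: float, n: int) -> float:
    """ASSUMED Hedges-type correction J = 1 - 3 / (4*df - 1), df = n - 1.

    The exact procedure of the cited reference is not given in the text.
    """
    df = n - 1
    if df < 1:
        raise ValueError("need at least two participants")
    return es * (1 - 3 / (4 * df - 1))


def pooled_secondary_es(es_values: Sequence[float]) -> float:
    """Unweighted mean ES across a trial's secondary end points."""
    return mean(es_values)


def classify_es(es: float) -> str:
    """Thresholds from the text: >=0.8 large, 0.5-0.8 moderate."""
    if es >= 0.8:
        return "large"
    if es >= 0.5:
        return "moderate"
    return "below moderate"


# Illustrative numbers only: a pain score falling from 6.0 to 4.0 with SD 2.0.
es = change_es(6.0, 4.0, 2.0, 2.0)  # 1.0
print(classify_es(adjust_small_sample(es, n=20)))  # prints "large" (J = 0.96)
```

Under this assumed correction, the adjustment shrinks ES only slightly for arms of 20 or more participants, which is consistent with its purpose for the small trials described here.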
Selection of interventions and trials
The searches provided sham-controlled trials for the following interventions: percutaneous laser revascularisation of the myocardium for angina pectoris (n=2), closure of patent foramen ovale for migraine (n=1), arthroscopic meniscectomy for meniscal tears (n=1), debridement (n=1) and injection of hyaluronic acid (n=3) for symptomatic osteoarthritis of the knee, injection or transplantation of biologically active material for Parkinson's disease (human retinal pigment epithelial cells (n=1), fetal nigral cells (n=1) and neurturin (n=2)), epidural injections of corticosteroids for sciatica via caudal (n=1), interlaminar (n=2) and transforaminal (n=1) routes, percutaneous heating of the intervertebral disc for chronic low back pain (intradiscal radiofrequency thermocoagulation (n=1) and intradiscal electrothermal therapy (n=2)) and vertebroplasty for vertebral body fractures (n=2). Web appendix table 1 gives a short description of each procedure's introduction, therapeutic rationale and history.
The searches provided no sham-controlled trials of cervical, thoracic or lumbar facet joint nerve blocks or joint injections, spinal cord stimulation for low back pain, cervical epidural injections, transmyocardial laser revascularisation for angina pectoris, deep brain stimulation for Parkinson’s disease or dystonia, or arthroscopic procedures for conditions other than those of the knee. We found six placebo-controlled trials of radiofrequency denervation for low back pain, but all were excluded: SD not provided (n=1),16 compound primary end point (n=1)17 and risk of a false-positive response because only one diagnostic block was used (n=4).18–21
The study selection process is summarised in figure 1. The search provided five systematic reviews, all identified through searches in MEDLINE; none were commercially funded.22–26 The searches identified a total of 71 clinical trials, 12 of which were not identified from the systematic reviews. Forty-four trials were excluded for methodological reasons, principally risk of bias. Six additional trials were excluded because ES could not be calculated.27–32 Web appendix table 2 shows the excluded trials and the reasons for exclusion. Finally, 21 clinical trials with a total of 2519 participants were included in the present review (table 1), which also shows the interventions in the active treatment and sham arms.
Fourteen trials from the systematic reviews fulfilled at least four of the five methodological criteria.33–46 Seven trials identified through searches in MEDLINE fulfilled the same criteria.47–53 The included and excluded secondary end points are shown in web appendix table 3. All trials reported approval of the study protocol prior to patient enrolment (table 1). Seven trials were commercially funded.36 ,37 ,45 ,50–53 Most of the trials were small, with sample sizes ranging from 20 to 346 participants (median 80).
Clinical outcomes after active treatment and sham
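A large ES (≥0.8) on primary end points was found in 12 of the active treatment groups and in 11 of the sham groups, while ES was moderate in three of the active treatment groups and in two of the sham groups.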
On pooled secondary end points, a large ES was estimated in seven trials after active treatment and in five trials after sham, while a moderate ES was reported in four and three trials, respectively (table 2).
In none of the trials did the actively treated group show a deterioration in the primary end point during treatment, whereas this was the case for two of the sham groups (the deterioration was not reported to be related to the procedure). On secondary end points, deterioration occurred in two active treatment and two sham groups (table 2).
Differences in outcome between active treatment and sham
Active treatment gave better results on primary end points than sham in 14 of the 21 trials, but the differences were small. Three trials (one epidural study,41 one discogenic pain study44 and one Parkinson's study52) showed a moderate difference in ES, but none showed a large difference (figure 3 and table 2). Seven trials reported a better primary end point outcome after sham than after active treatment.
Nineteen trials reported secondary end points; 11 of these reported a better outcome after active treatment than after sham, but in no case did the difference reach a moderate ES (figure 3 and table 2). In 12 trials, the outcome was better on primary than on pooled secondary end points. This bore no relation to funding source.
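The between-arm comparison reported above applies the ES thresholds from the Methods to the difference between the active and sham within-arm ESs. The following Python sketch, with hypothetical function names and illustrative inputs rather than trial data, shows how trials could be tallied into these categories.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_difference(es_active: float, es_sham: float) -> str:
    """Classify the active-minus-sham difference in within-arm ES, using the
    same thresholds as for within-arm ES (>=0.8 large, 0.5-0.8 moderate)."""
    diff = es_active - es_sham
    if diff >= 0.8:
        return "large"
    if diff >= 0.5:
        return "moderate"
    if diff > 0:
        return "small, favours active"
    return "none or favours sham"


def tally(pairs: Iterable[Tuple[float, float]]) -> Counter:
    """Count trials per category from (active ES, sham ES) pairs."""
    return Counter(classify_difference(a, s) for a, s in pairs)


# Hypothetical (active, sham) ES pairs, not data from the included trials.
print(tally([(1.2, 0.6), (0.9, 0.8), (0.5, 0.7)]))
```

Note that two arms can both show a large within-arm ES while their difference falls below the moderate threshold, which is the pattern described in the Results.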
Eighteen studies provided information about AEs (table 1). Three of these trials reported no procedural AEs in any of the groups.33 ,39 ,47 Major AEs were reported after active treatment in four trials,34 ,50 ,51 ,53 including one death in one of the Parkinson's studies.51 In the sham groups, one trial53 listed three major AEs possibly or probably related to the procedure, all presumed to be caused by antiplatelet medication, none of them life-threatening. Apart from this trial, there were no major AEs in the sham groups. The reported minor AEs were all of limited duration.
Analysis of 21 sham-controlled trials of minimally invasive procedures showed that the ESs in the active treatment arms were predicted by the ESs in the sham arms. There was a large ES on primary end points in about half of both the active and sham interventions, but none of the trials showed a large difference in ES between active treatment and sham groups, either on primary or secondary end points.
The magnitude of the effect in each trial arm varied considerably, both between different procedures and between trials using the same procedure. For instance, ES for primary end points ranged from around 0 to almost 2 after active treatment and from about −0.4 to 1.5 after sham. Disparate outcomes were reported even between trials with similar technical parameters: ES in the sham groups varied by a factor of 3 across the three hyaluronic acid trials and by a factor of 2 across the epidural trials. This variability is probably related to differences in study design, duration of disability before inclusion and contextual factors, including the doctor–patient relationship, among other factors. The close association between end points in the active treatment and sham groups in the regression analyses suggests that a large part of the outcome reported in the active treatment groups is due to placebo effects, statistical regression to the mean or the natural course of the condition.
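The unadjusted linear regression of active-arm ES on sham-arm ES, and the R² it yields, can be sketched with ordinary least squares. The analysis in the study was run in MedCalc; this plain-Python version is only an illustration, and the function name `ols_fit` is hypothetical. For a simple regression with one predictor and an intercept, R² equals the squared Pearson correlation, so it does not depend on which arm is treated as the predictor.

```python
from statistics import mean
from typing import Sequence, Tuple


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Unadjusted simple linear regression y = a + b*x.

    Returns (intercept, slope, R^2). Here x would be sham-arm ES and
    y active-arm ES, one pair per trial.
    """
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need equal-length inputs with at least 3 trials")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("both variables need non-zero variance")
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares around the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - ss_res / ss_tot
    return intercept, slope, r2
```

For example, `ols_fit(sham_es, active_es)[2]` would return the R² for two lists of per-trial ES values, where `sham_es` and `active_es` are hypothetical variable names.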
Strengths and limitations of the study
It is our opinion that calculating ESs in the active treatment as well as the placebo arms is a strength of the present study. This made it possible to assess the magnitude of change in each arm as well as the contribution of non-specific factors to change in the active treatment arms. ES calculations provide an alternative to probability estimates. Another strength of the study is the supplementary analysis of pooled secondary end points, which enables a more comprehensive evaluation than primary end points alone. Reports of tactically motivated selection of primary and secondary end points before publication, in order to improve study results, strengthen the argument for registering all relevant secondary end points.54 Our finding that a majority of trials reported better results on primary than on secondary end points might be consistent with such selective reporting, although all trials, according to the authors, had sought and gained approval of the protocol from an ethics committee and/or review board (table 1).
The present review is limited to selected minimally invasive procedures for cardiac, neurological and musculoskeletal conditions. While some procedures are or have been in wide clinical use, others are still in the clinical trial phase. Other sources of heterogeneity are variable duration of disease before inclusion, selection of outcome measures and time to follow-up. The results cannot be generalised to minimally invasive procedures in all medical disciplines, but a similar methodology could be applied to more systematic analyses of the role of non-specific effects in other minimally invasive procedures.
We applied principles from guidelines for conducting systematic reviews and meta-analyses, and included an independent assessment of methodological trial quality by two of the authors. We cannot rule out that we have missed relevant trials because we limited our search to the Cochrane Library and MEDLINE, but the most relevant trials are likely to have been identified by our searches. By preferentially selecting core journals and trials that had previously been methodologically evaluated in systematic reviews, it was our intention to reduce the risk of bias by excluding studies of low quality. We realise that this selection process and the fact that we relied on previous methodological evaluations may have contributed to unrecognised selection bias.
The use of ES as a measure of clinical effect assumes normally distributed data. This assumption does not necessarily hold in the included trials, because most of them are small. Including trials reporting non-parametric data would, however, have required other methods of statistical analysis. Small studies increase the likelihood of type II errors, although this is more relevant to probability estimates than to analysis of ES.
Adequate blinding and lack of physiological effects
We cannot rule out that treatment-specific effects in the actively treated groups may have jeopardised blinding, leading to overestimation of treatment effects through positive expectations. However, all the included trials gave a detailed description of the sham procedure, and both participant and assessor blinding seem to have been adequate.
On a more general level, it has been argued that sham procedures are not inert and may have specific physiological effects, which would lead to underestimation of the treatment effect.55 Recently, Bicket et al hypothesised that epidural injection of small volumes of saline might have physiological effects.56 Notably, however, in the four selected epidural trials in the present study, improvements in the sham group were greater in the two trials using non-epidural saline than in the two using epidural saline, making a physiological effect less likely. In our opinion, physiological effects of the sham interventions are also unlikely in the remaining procedures.
Surgery and other invasive procedures are commonly believed to be associated with enhanced placebo effects, a phenomenon termed mega-placebo.57 In spite of their heterogeneous nature, the 21 selected trials share a medicotechnological context in which an enhanced placebo response could be expected a priori. If an ES ≥0.8 is taken to indicate a mega-placebo, about half of the included sham interventions reached this level. Factors such as the level of enthusiasm and conviction conveyed by the therapist, the impression of advanced procedures and the extent to which these factors succeed in activating a placebo response are probably crucial in explaining the improvements after sham interventions and the correlation of end points in the active treatment and sham groups. Participants’ perception of whether they received active treatment or sham has been shown to contribute more to clinical improvement than the biological effects per se.32 ,58
The role of non-specific factors, primarily spontaneous remission or statistical regression to the mean, in placebo-controlled studies is controversial.59 A recent meta-analysis of 202 trials with an untreated group, spanning 60 different clinical conditions, found rather small differences between placebo and no treatment, with ESs in the range of 0.2–0.3.60 Apart from acupuncture trials (mean ES 0.68), the authors did not include trials reporting the effectiveness of invasive procedures. Another meta-analysis studied the placebo effect of a range of treatments (pharmacological, non-pharmacological and surgical) for osteoarthritis of the hand, hip and knee.61 Of 198 included trials, 14 had a no-treatment arm. The mean ES in the placebo groups was about 0.5, while it was only slightly above 0 in the no-treatment groups. The difference between the placebo and no-treatment groups was larger than the difference between the placebo and active treatment groups. Trials using injections, acupuncture and surgery had the largest placebo effects, and the effects were larger for subjective than for objective end points. The authors concluded that there is a significant placebo effect on pain, stiffness and function in symptomatic osteoarthritis.
Because the trials in the present study did not include a no-treatment arm (ie, waiting list), we cannot rule out that the changes appearing during the trial period also reflect non-specific factors, that is, spontaneous improvement or regression to the mean. Such mechanisms would be expected to be most prominent in trials with brief illness duration before inclusion and with longer time to follow-up, while improvements in chronic, unremitting conditions such as Parkinson's disease would more likely be attributable to placebo. Interestingly, in three of the four included Parkinson's trials, there were moderate to large improvements in the sham groups even at 1-year follow-up.49 ,50 ,51 Other authors have also found improvements several years after sham surgery, indistinguishable from those after conventional surgery.32 ,62 This is in agreement with recent insights into the neurobiological effects of placebo and their relation to underlying psychological mechanisms, principally expectation and conditioning.63
Are ethical objections to sham justified?
The use of sham in controlled surgical trials is a divisive issue, with scepticism, even frank opposition, voiced by ethics committees, involved surgeons and anaesthetists, and potential patients.10 Ethical arguments include the inherent risks of sham procedures combined with the lack of obvious benefits to the participants. Barriers related primarily to feasibility include problems with patient and assessor blinding, differing technical expertise, the heterogeneity of the interventional techniques and variable outcome specifications, making standardisation difficult to achieve. Existing ethical guidelines accept placebo-controlled trials when certain conditions are met.64 There must be genuine equipoise, that is, conflicting or weak evidence of the effectiveness of a procedure. Blinding of both participants and assessors must be assured, and participants must freely consent to remain unaware of whether they are receiving sham or conventional treatment. The health risks and consequences of placebo or delayed treatment must be minimal and outweighed by the societal importance of establishing the clinical utility of the intervention in question.65 ,66
The selected trials gave a detailed description of AEs in both the active and sham-treated groups (table 1). The safety concerns frequently raised as an argument against the use of sham were generally not supported. Major AEs related to the sham procedure were reported in only one of the trials,53 and they were short-lived and not life-threatening. Minor AEs were more frequent, but of limited duration. Positive placebo-induced effects generally outweighed AEs, weakening ethical arguments against the use of sham interventions. In our opinion, the consequences of the continued use of unproven invasive procedures are of a different magnitude. In the light of studies supporting the beneficial effects of sham procedures, at least for pain and Parkinson's symptoms, research ethics committees should consider such factors in their risk–benefit assessments of planned sham-controlled trials.67 ,68
The present results are pertinent to the ongoing discussion about wasteful and unproven medical practices, and underscore the need for continual assessment of existing or novel unproven procedures. Minimally invasive techniques have lowered the threshold for interventions and led to their application to a wider clinical spectrum (indication creep) without ongoing evaluation of effectiveness or safety.4 The last two decades have seen dramatic increases in the use of several of the described procedures, as well as of interventions we have not investigated, such as acromioplasty, percutaneous coronary intervention and, more recently, robotic surgery.69–74 In light of the results of the present study, placebo effects might well explain a large part of the purported effects of such procedures. When clinicians and regulators are faced with claims of large treatment effects for insufficiently tested procedures, their default mode should be watchful scepticism. The standards of the evaluation process before approval and reimbursement of devices and procedures need to be strengthened, and economic or regulatory incentives that perpetuate the use of undocumented or harmful procedures should be abolished.
Sham-controlled trials are unique in their ability to discriminate between true treatment effects and non-specific effects. The results of the present study suggest that placebo and other non-specific effects explain a large part of the purported benefits of the minimally invasive procedures studied. Further, the results indicate that the risks of AEs in sham-controlled trials are overrated and could be considered acceptable in view of the potential personal harm and societal costs associated with unproven minimally invasive interventions.
Contributors RH initiated and planned the project, searched the databases, performed article screening, data extraction and statistical analysis, and wrote the draft. He is the guarantor. JIB and OT assisted in developing the search strategies and in checking the quality of data extraction; they also reviewed the draft and contributed to manuscript revisions.
Funding This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None.
Provenance and peer review Not commissioned; internally peer reviewed.
Data sharing statement Extra data can be accessed via the Dryad data repository at http://datadryad.org/ with the doi:10.5061/dryad.105cb.