Objectives To investigate whether the treatment effect of antidepressants in patients with depression varies substantially between patients (patient-by-treatment interaction or treatment heterogeneity), a necessary but largely unexplored prerequisite of personalised antidepressant treatment.
Design Meta-analytic variance comparison of treatment outcome between drug arms and placebo arms of clinical trials, based on the assumption that patient-by-treatment interaction should lead to larger variances in drug arms than in placebo arms. To put the results into context, we ran simple simulations, assuming different definitions and rates of those who respond especially well to antidepressants.
Data sources 169 randomised, placebo-controlled trials (51 396 patients) with complete results for pre–post differences, selected from a recently published systematic review.
Analysis Variance ratios (VRs) and coefficient of variation ratios (CVRs) of individual trials were meta-analytically combined. The analysis was repeated for classes of antidepressants and specific antidepressants.
Results VRs (VR=1.01, 95% CI 0.99 to 1.02) and CVRs (CVR=0.82, 95% CI 0.80 to 0.84) of the antidepressant-treatment arms were comparable to or smaller than those of the placebo arms. Similar results were observed for classes of antidepressants and for specific antidepressants. Our simulation analysis confirmed that equal VRs can only be obtained if there are no more than a few patients who respond slightly above average.
Conclusions The lack of increased treatment-outcome variance in the antidepressants versus placebo groups in randomised controlled trials indicates that no or only very small subgroups of patients respond particularly well to antidepressants. Thus, the scope for personalised treatment with antidepressants seems to be limited.
- depression and mood disorders
- statistics and research methods
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
For the first time, the amount of patient-by-treatment interaction (treatment heterogeneity), a necessary prerequisite for personalised medicine, is estimated for the pharmacological treatment of depression with antidepressants.
The database is from a systematic review of published and unpublished studies and is one of the largest so far, resulting in precise estimations of the main outcomes.
The study results are important to inform further attempts in personalised (precision) medicine in psychiatry.
As with all clinical trials, it remains an open question if our results can be replicated in real-world settings, for example among psychiatric inpatients with very severe depression.
Personalised or precision medicine, that is, applying medical interventions only to those patients known to benefit especially from the intervention (henceforth termed ‘benefiters’), is important to increase benefits from treatment and to decrease harms. For example, if a drug with severe side effects is very effective in some patients with a specific genotype, it is crucial to know about these benefiters, because for all other patients, the risk–benefit ratio would be unfavourable. Similarly, if a drug is found to have only modest efficacy across all patients but notable side effects, then it would be important to know if there are patients (eg, those defined via a specific biomarker) who are benefiters. The latter example corresponds to the pharmacological treatment of major depression, because the average efficacy of antidepressants (ADs) is modest, amounting to only about 2 points difference on the Hamilton Depression Rating Scale (HDRS) between AD groups and placebo in short-term randomised controlled trials (RCTs).1–5 Put differently, according to our most recent estimate, there is about 88% overlap in the distribution of depression scores between ADs and placebo at the end of acute treatment.2
Despite substantial research efforts, no predictors of treatment success with ADs have been found that are robust and reliable enough for use in clinical practice.6–9 Thus, much of the variance of the treatment outcome remains unexplained so far. Sources of outcome variation include variation between treatment arms (indicating that group means differ due to efficacy of treatments, eg, ADs vs placebo), variation between patients (indicating that the outcome differs from patient to patient, independent of the treatment received), variation within patients (indicating that the outcome for the same patient differs over time due to random symptom fluctuations) and patient-by-treatment interactions (indicating that treatment effects vary from patient to patient).10 The quest for precision psychiatry, in this case personalised AD treatment, assumes that specific patients benefit more from ADs than others, that is, it assumes that there is a patient-by-treatment interaction. Ideally, this can also be explained by a plausible causal mechanism, for example, inter-individual differences in monoamine function. Investing research efforts in personalised medicine only makes sense if there truly is a patient-by-drug interaction that explains some variance in the treatment outcome. Although the field is mostly enthusiastic about personalised medicine or precision psychiatry,11 experts from various fields have now started to dampen expectations and caution that personalised/precision medicine may fall short of expectations.12–14
Thus, we must remain mindful that there might be no notable subgroup of true AD benefiters and that the modest average treatment effect is the best we can hope for.15 We further need to acknowledge that RCTs are inherently limited in their ability to demonstrate patient-by-treatment interactions.10 To identify patient-by-treatment interactions, repeated period cross-over trials are necessary, but these are hardly feasible with common ADs due to delayed onset of therapeutic effect and relatively high rates of spontaneous remission. The most common trial design is the simple parallel-group trial, where patients are randomised to either ADs or placebo. However, these trials can only identify mean differences between treatment arms (ie, efficacy), whereas variation between patients, within patients, as well as patient-by-treatment interactions are part of the error term. Nevertheless, if patient-by-treatment interaction effects are present, then the variance in the treatment outcome should be increased in the drug group relative to the placebo group, because no comparable drug-by-patient interaction is present in the placebo group.10 16 17 Thus, results from RCTs can indicate indirectly whether there might be subgroups of benefiters.
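The variance argument can be made explicit with a simple additive model. This is only a sketch: the notation is introduced here for illustration and is not taken from the cited sources. It assumes that patient i has an outcome Y_i(0) under placebo and an individual treatment effect τ_i, so that the outcome under drug is Y_i(1).

```latex
\begin{align}
Y_i(1) &= Y_i(0) + \tau_i, \\
\operatorname{Var}\bigl(Y(1)\bigr) &= \operatorname{Var}\bigl(Y(0)\bigr)
  + \operatorname{Var}(\tau)
  + 2\operatorname{Cov}\bigl(Y(0), \tau\bigr).
\end{align}
```

If τ_i is the same for every patient, both Var(τ) and the covariance are zero and the two arms share the same variance. If τ_i varies independently of the placebo outcome, the drug-arm variance exceeds the placebo-arm variance by Var(τ). Equal variances despite a varying τ_i would require a negative covariance that exactly offsets Var(τ).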
The goal of this meta-analysis was to examine whether the outcome variances between ADs and placebo differ, in order to gauge the potential of personalised/precision psychiatry for treatment with ADs.
Our analysis was based on short-term RCTs of ADs for patients with unipolar major depression, reported in the most recent systematic review.18 The authors of this comprehensive study made the data available in a public repository (https://data.mendeley.com/datasets/83rthbp8ys/2). This included 522 trials (with 21 different ADs), of which 254 trials were suitable for further analysis, that is, contained information about the outcome and also included a placebo arm. Where trials had multiple treatment arms with different dosages of ADs, these arms were aggregated. Where trials compared different ADs, the data of these arms were aggregated to obtain only one value for the drugs in these trials, similar to a previous publication.16 Additionally, we recorded the class of each AD [serotonin-norepinephrine reuptake inhibitor (SNRI), selective serotonin reuptake inhibitor (SSRI), atypical ADs, and tricyclic ADs]. For 169 (67%) of the studies, the pre–post mean reduction of depression scores (M) and the related SDs were available, and only the analysis for these studies is reported here. The analysis for the 85 studies (33%) where only the mean value and SD of the post-treatment depression scores were available is reported in online supplementary file 1 (https://osf.io/98kex/files/). Several additional variables were created for sensitivity analysis (see statistical analysis).
Patient and public involvement statement
This research was done without patient involvement. Patients were not invited to comment on the study design and were not consulted to develop patient relevant outcomes or interpret the results. Patients were not invited to contribute to the writing or editing of this document for readability or accuracy.
We calculated the variance ratio (VR) for each RCT and aggregated them by means of a random-effects meta-analysis according to the procedure suggested by Winkelbeiner et al 16 (see https://osf.io/qarvs/files/), using the metafor package in R. Because the pre–post differences were significantly associated with their SD, we repeated the analysis using the coefficient of variation ratio (CVR). This removes the effect of expected changes in the SD due to changes in the mean.19 A VR of 1.00 suggests equal variance of AD and placebo. If the VR exceeds 1, then the variance of the AD group is larger than in the placebo group. If the CVR exceeds 1, then the increase of variance with increasing pre–post differences is stronger in the AD than in the placebo arms. Sensitivity analysis included meta-regression models with the assessment instruments, year of publication, type of publication (published vs unpublished), sample size and drop-out rates.
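As an illustration of the quantities described above, the following Python sketch computes a per-trial log VR and log CVR and pools them. It is not the authors' R code. It assumes the bias-corrected log-scale definitions of Nakagawa et al. that metafor documents for `measure="VR"` and `measure="CVR"`, where the VR is a ratio of SDs. It also swaps in the closed-form DerSimonian-Laird estimator for the between-trial variance, whereas the estimator actually used is not stated in the text. All function names and the example trial values are hypothetical.

```python
import math

def ln_vr(sd_t, n_t, sd_c, n_c):
    """Bias-corrected log ratio of SDs (treatment vs control) and its sampling variance."""
    yi = math.log(sd_t / sd_c) + 1 / (2 * (n_t - 1)) - 1 / (2 * (n_c - 1))
    vi = 1 / (2 * (n_t - 1)) + 1 / (2 * (n_c - 1))
    return yi, vi

def ln_cvr(m_t, sd_t, n_t, m_c, sd_c, n_c):
    """Bias-corrected log ratio of coefficients of variation and its sampling variance.

    Assumes positive means (here: mean pre-post reductions) and ignores
    any correlation between mean and SD across trials.
    """
    yi = (math.log((sd_t / m_t) / (sd_c / m_c))
          + 1 / (2 * (n_t - 1)) - 1 / (2 * (n_c - 1)))
    vi = (sd_t ** 2 / (n_t * m_t ** 2) + 1 / (2 * (n_t - 1))
          + sd_c ** 2 / (n_c * m_c ** 2) + 1 / (2 * (n_c - 1)))
    return yi, vi

def pool_random_effects(effects):
    """Pool (yi, vi) pairs with a DerSimonian-Laird random-effects model.

    Returns the pooled ratio and a 95% CI, back-transformed from the log scale.
    """
    if len(effects) < 2:
        raise ValueError("need at least two trials to estimate heterogeneity")
    yi = [e[0] for e in effects]
    vi = [e[1] for e in effects]
    w = [1 / v for v in vi]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, yi)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, yi))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(yi) - 1)) / c)  # between-trial variance
    w_re = [1 / (v + tau2) for v in vi]
    est = sum(wi * y for wi, y in zip(w_re, yi)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "ratio": math.exp(est),
        "ci_low": math.exp(est - 1.96 * se),
        "ci_high": math.exp(est + 1.96 * se),
        "tau2": tau2,
    }

# Hypothetical trials: (mean change, SD, n) for the AD arm, then for the placebo arm.
trials = [
    (11.0, 8.0, 150, 9.0, 8.0, 150),
    (10.5, 7.8, 120, 8.8, 8.1, 118),
    (11.4, 8.3, 200, 9.3, 8.0, 195),
]
vr_pooled = pool_random_effects([ln_vr(t[1], t[2], t[4], t[5]) for t in trials])
cvr_pooled = pool_random_effects([ln_cvr(*t) for t in trials])
print(vr_pooled)
print(cvr_pooled)
```

In the first hypothetical trial both arms have the same SD, so its VR is exactly 1, while its CVR falls below 1 solely because the mean change is larger in the AD arm.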
To put our results into context, we also ran simulation analyses with different definitions and probabilities for benefiters. We based these simulations on an efficacy of 2 points mean difference between AD and placebo groups, as reported in a meta-analysis on the same dataset for trials using the HDRS-17 instrument.5 We assumed SD=8 and a mean difference between pre-depression and post-depression scores of 11 points in the AD group (based on rounded means of these values observed in our dataset). We used different cut-offs to define patients as benefiters, ranging from 5 to 10 points superior treatment outcome with the HDRS-17, and a proportion of 5%–50% in the AD group versus 0% in the placebo group. To simulate placebo groups, we sampled from a normal distribution with the above parameters for the placebo group (M=9, SD=8 and 5 000 000 samples). We used a similar sampling procedure for the AD group, but created benefiters by adding the necessary HDRS responder points to an assumed fraction of the sample, and adding as many points to the rest of the sample as needed to arrive at the overall efficacy of 2 HDRS points.
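The simulation procedure described above can be sketched in Python as follows. This is one possible reading of the description rather than the authors' R code. It assumes that a benefiter improves by a fixed number of HDRS points more than under placebo, that the remaining patients share one common shift chosen so that the average drug effect is 2 points, and that the VR is the ratio of SDs. The sample size is reduced to 200 000 per arm for speed, and all function names are hypothetical.

```python
import math
import random

def non_benefiter_shift(p, benefit, effect=2.0):
    """Shift for non-benefiters so that the mean drug effect equals `effect`."""
    return (effect - p * benefit) / (1 - p)

def expected_vr(p, benefit, effect=2.0, sd=8.0):
    """Analytic SD ratio (drug / placebo) for a two-point mixture of shifts."""
    rest = non_benefiter_shift(p, benefit, effect)
    extra_var = p * (1 - p) * (benefit - rest) ** 2  # variance of the shift
    return math.sqrt((sd ** 2 + extra_var) / sd ** 2)

def sample_sd(xs):
    """Sample standard deviation using numerically stable sums."""
    mean = math.fsum(xs) / len(xs)
    return math.sqrt(math.fsum((x - mean) ** 2 for x in xs) / (len(xs) - 1))

def simulate_vr(p, benefit, n=200_000, placebo_mean=9.0, sd=8.0,
                effect=2.0, seed=1):
    """Monte Carlo estimate of the SD ratio for a fraction `p` of benefiters."""
    rng = random.Random(seed)
    rest = non_benefiter_shift(p, benefit, effect)
    placebo = [rng.gauss(placebo_mean, sd) for _ in range(n)]
    drug = [
        rng.gauss(placebo_mean, sd) + (benefit if rng.random() < p else rest)
        for _ in range(n)
    ]
    return sample_sd(drug) / sample_sd(placebo)

# Scan a few benefiter definitions (points) and proportions.
for benefit in (5.0, 6.0, 10.0):
    for p in (0.05, 0.10, 0.50):
        print(benefit, p, round(expected_vr(p, benefit), 3))
```

Under these assumptions, large benefiter fractions combined with large benefits force the shift for the remaining patients below zero, meaning they would do worse on the drug than on placebo.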
The R-code and data of this publication are available online (https://osf.io/98kex/files/).
Across all ADs, the VR was almost perfectly 1.00 with a narrow confidence interval (VR=1.01, 95% CI=0.99 to 1.02). This means that the variances in the AD and placebo groups are nearly identical (table 1). Similar results were found for all classes of ADs (table 1) and for each individual drug (online supplementary table 2, https://osf.io/98kex/files/). There was no significant sign of heterogeneity in the meta-analyses, except for SSRIs. A closer inspection revealed that this resulted from a single outlier (see footnote in table 1).
For the CVR, results indicated that the increase of variance with increasing pre–post differences is less strong in AD than in placebo arms (CVR=0.82, 95% CI=0.80 to 0.84). Comparable results were found for all classes of ADs and individual drugs (table 1). The heterogeneity was statistically significant in nearly all meta-analyses of the CVR.
In the sensitivity analysis, the meta-regression models did not detect statistically significant effects for year of publication, type of publication, measurement instruments, drop-out rates or sample size (see https://osf.io/98kex/files/).
As shown in figure 1, most benefiter assumptions lead to VRs that differ markedly from those we observed in our study.
However, for liberal definitions of benefiters and low rates of these benefiters, the VRs can indeed be small. For example, if there are 10% benefiters, as defined with 6 HDRS points difference to average placebo response, the VR is within the CI of our main finding (0.99–1.02).
We found nearly identical treatment outcome variances for AD arms compared with placebo in RCTs for the acute treatment of major depression in a large database, as indicated by VRs of almost exactly 1. The simplest explanation for the finding of similar variances is that there are constant treatment effects and no treatment heterogeneity, that is, no patient-by-treatment interaction effects and no specific subgroups of patients who respond particularly well to the treatment.20 Alternatively, such a subgroup of benefiters would be very small (≤10%) and the threshold to classify someone as benefiter would be low (≤6 HDRS points difference to average placebo response). To put this in context, according to anchor-based linkage studies, at least 6 points on the HDRS are necessary for a global impression of ‘minimally improved’.21 22 Consequently, the search for meaningful predictors of relative treatment response (compared with placebo) will probably fail or will at least be very difficult due to the small subgroup of weak benefiters. Therefore, the mean effect size estimate from parallel-group RCTs remains the best guess for predicting treatment outcome for an individual patient. Furthermore, the results for the CVR indicated that the increase of variance associated with larger pre–post differences was stronger in the placebo than in the AD groups. There is no immediately plausible explanation for this finding, given that baseline severity does not predict differential treatment effects.23–25
Our findings are in line with Senn,12 who argued that exploratory post-hoc delineations of putative benefiters, such as the ‘true benefiters’ suggested by Thase et al,26 are simply statistical artefacts due to random symptom fluctuations and measurement error (see also Hengartner15). Our findings also replicate the findings for antipsychotics in the acute treatment of schizophrenia,16 and for several treatments in a review of various medical interventions.17 Together, these studies indeed suggest that the promises of precision medicine may remain elusive and that the scope for personalised medicine might be smaller than previously hoped for.12 13 Given the high expectations placed on biomarker-based precision medicine, such findings will probably cause disbelief and reluctance in many advocates of this enthusiastic movement. In anticipation of such critique, we would like to address two objections that are likely to be raised in response to this paper.
First, as recently stressed by biostatistics professor Dr Frank Harrell, to assume that there is treatment heterogeneity (ie, significant patient-by-treatment interaction) when the average treatment effect is close to zero (which is the case with ADs), would imply that there must be a large subgroup of patients where the treatment causes significant harm.27 Although it has been suggested that ADs may worsen the long-term outcome of depression in some patients,28–31 there is no evidence that they may do harm in a large subgroup of patients in short-term trials. In the absence of consistent biologically-informed patient-specific treatment effects, our best treatment estimate for ADs thus remains the average drug effect relative to placebo.27
Second, and closely related to the above argument, even after decades of massive research efforts there is no evidence of robust neurobiological and genetic predictors of differential treatment response in depression.6–9 32 Biostatistics professor Dr Stephen Senn once stated: ‘Unless patient by treatment interaction exists, it is pointless looking for gene by treatment interactions’.33 Thus, calling for more genetic and neurobiological research into differential treatment effects clearly conflicts with the current literature and will most likely fail to yield the hoped-for results.
We acknowledge the following major limitation: as Cortés et al 17 describe in their paper, equal variances are not definitive proof of a lack of patient-by-treatment interactions. They describe a hypothetical situation in which variances are equal in the treatment and the control condition even though patient-by-treatment interactions are present, but this situation is highly unlikely. Furthermore, VRs of 1 are, theoretically, also possible with a small fraction of ‘super-responders’ and a specific response for all others, as highlighted in a lively Twitter discussion of our paper (https://twitter.com/Martin_Ploederl/status/1188006207497363457). It can indeed be debated which assumption is more plausible: a constant treatment effect, a hypothetical fraction of super-responders or other highly specific premises. However, a small fraction of super-responders would obviously lead to non-normal distributions with notable peaks at very low levels of depression scores. To our knowledge, this has not been observed so far,26 but it could be further investigated with patient-level data. Moreover, VRs would increase for a wide range of scenarios with varying fractions of benefiters and varying definitions of ‘benefiters’.
Another potential problem may be, as one reviewer pointed out, that the VRs did not vary much across trials, as indicated by the Q statistics and by the low I² statistics. However, the main results remained the same for different estimators of heterogeneity, for unweighted results and with the Knapp and Hartung adjustment (see online supplementary table 3). Furthermore, when the value of the heterogeneity was manually increased, results remained comparable. Presumably, the low between-trial heterogeneity was caused by the high standardisation of trials, or by narrow and selective inclusion criteria for trial participants.34
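The heterogeneity statistics mentioned above, and the check of manually increasing the heterogeneity, can be illustrated with a minimal Python sketch. It uses the standard definitions of Cochran's Q and I², and it pools with a between-trial variance that is supplied by hand instead of estimated. The function names and the log-ratio inputs are hypothetical and are not the study data. The Knapp and Hartung adjustment is not implemented here.

```python
import math

def q_and_i2(yi, vi):
    """Cochran's Q and the I² statistic (in percent) for effects yi with variances vi."""
    w = [1 / v for v in vi]
    pooled = sum(wi * y for wi, y in zip(w, yi)) / sum(w)
    q = sum(wi * (y - pooled) ** 2 for wi, y in zip(w, yi))
    df = len(yi) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

def pool_with_fixed_tau2(yi, vi, tau2):
    """Inverse-variance pooling with a user-supplied between-trial variance tau2.

    Returns the back-transformed pooled ratio and its 95% CI.
    """
    w = [1 / (v + tau2) for v in vi]
    est = sum(wi * y for wi, y in zip(w, yi)) / sum(w)
    se = math.sqrt(1 / sum(w))
    return math.exp(est), math.exp(est - 1.96 * se), math.exp(est + 1.96 * se)

# Hypothetical log VRs and sampling variances for five trials.
log_vr = [0.02, -0.01, 0.00, 0.03, -0.02]
var_log_vr = [0.004, 0.005, 0.003, 0.006, 0.004]

q, i2 = q_and_i2(log_vr, var_log_vr)
print("Q =", round(q, 3), "I2 =", round(i2, 1))

# Compare the pooled estimate with no heterogeneity and with an inflated tau2.
for tau2 in (0.0, 0.01):
    print(tau2, pool_with_fixed_tau2(log_vr, var_log_vr, tau2))
```

In this toy example, inflating the between-trial variance widens the confidence interval but leaves the pooled ratio close to 1, which is the kind of comparison the robustness check describes.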
In conclusion, the results of our meta-analysis suggest that there is no or at best a very small patient-by-treatment interaction. The lack of increased outcome variance in the AD versus placebo groups in parallel-group RCTs indicates that no specific subgroup of patients may respond particularly well to ADs. Thus, with the ADs currently available, the scope for personalised AD treatments is probably limited and it is unlikely that precision psychiatry will succeed in finding clinical or biological predictors of differential treatment response that would account for a therapeutic effect that goes beyond a minimal clinical improvement.
Contributors MP and MPH were responsible for the conception and design of the study, interpretation of data, and drafting and revising the manuscript. MP performed the meta-analysis and simulation analysis. Both the authors had full access to all the data and take responsibility for the integrity of the data and the accuracy of the data analysis.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Patient consent for publication Not required.
Ethics approval No ethical approval is required since our study is a secondary analysis.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement Data are available in a public, open access repository.