Short communication
Clinical trials and the response rate illusion

https://doi.org/10.1016/j.cct.2006.10.012

Abstract

Clinical trial outcome data can be presented and analyzed as mean change scores or as response rates. These two methods of presenting the data can lead to divergent conclusions. This article explores the reasons for the apparently divergent outcomes produced by these methods and considers their implications for the analysis and reporting of clinical trial data. It is shown that relatively small differences in improvement scores can produce relatively large differences in expected response rates. This is because differences in response rates do not indicate differences in the number of people who have improved; they indicate differences in the number of people whose degree of improvement has pushed them over a specified criterion. Patients classified as non-responders may therefore have shown substantial and clinically significant improvement, and these are the patients most likely to become responders when given medication. Response rates based on continuous data add no information, and they can create an illusion of clinical effectiveness.

Introduction

There are various ways of comparing the performance of drug and placebo in clinical trials and in systematic reviews of those trials. Outcome data for conditions that are assessed on continuous scales are typically presented as mean change scores (or sometimes as post-treatment scores adjusted for baseline scores). Categorical results can also be calculated from continuous data, in the form of response or improvement rates (i.e., the percentage of patients deemed to have responded or improved) and the statistics derived from them, including odds ratios, relative risks, and the number needed to treat. These categorical outcomes consist of the proportion of people who meet a (usually) predefined level of improvement or fall below a predefined threshold score on a continuous measure. They do not reflect natural categories, but are simply a way of dividing up a continuous distribution.
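
As a minimal sketch of how these derived statistics follow from response rates, the following Python fragment computes the odds ratio, relative risk, and number needed to treat from hypothetical response counts in a two-arm trial; the counts are illustrative and are not data from any trial discussed here.

# Hypothetical counts: 60/100 responders on drug, 40/100 on placebo.
drug_responders, drug_n = 60, 100
placebo_responders, placebo_n = 40, 100

p_drug = drug_responders / drug_n           # drug response rate
p_placebo = placebo_responders / placebo_n  # placebo response rate

# All three statistics are simple transformations of the two rates.
odds_ratio = (p_drug / (1 - p_drug)) / (p_placebo / (1 - p_placebo))
relative_risk = p_drug / p_placebo
nnt = 1 / (p_drug - p_placebo)  # number needed to treat

print(f"OR = {odds_ratio:.2f}, RR = {relative_risk:.2f}, NNT = {nnt:.1f}")
# -> OR = 2.25, RR = 1.50, NNT = 5.0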

Ideally, these different methods of assessing outcome should produce similar conclusions, since they are derived from the same data. In practice, they can be divergent. This is particularly striking in reviews of antidepressant medication, where mean improvement scores indicate very small drug-placebo differences [1], [2], whereas response rate data suggest that these differences are more substantial [1], [3]. The purpose of this article is to explore the reasons for the apparently divergent outcomes produced by comparisons of mean improvement and response rates, and to consider their implications for the analysis and reporting of clinical trial data. Although our concern extends to all clinical trials in which data derived from continuous scales are reported as response rates, we illustrate the issue with data from the antidepressant literature.

Response rates and continuous distributions

Response rates depend on the criterion used to define a response, as well as on the magnitude of drug and placebo effects. When a response rate of 50% has been found, as has been reported in the published antidepressant trial data [3], it means that the criterion chosen for defining a response has coincidentally turned out to be the median of the distribution of improvement scores. This is true regardless of the shape of that distribution. In normal distributions, it is also the mean.
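
To make this concrete, the following Python sketch assumes normally distributed improvement scores with hypothetical means, a common standard deviation, and a hypothetical response criterion; all of the numbers are illustrative. It shows how a modest difference in mean improvement can appear as a much larger difference in response rates.

from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    # P(X <= x) for X ~ Normal(mu, sigma)
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Hypothetical improvement distributions (points on a symptom scale):
# a 2-point drug-placebo difference in mean improvement, equal spreads.
mu_placebo, mu_drug, sigma = 8.0, 10.0, 8.0
criterion = 9.0  # hypothetical cut-off defining a "response"

rate_placebo = 1 - normal_cdf(criterion, mu_placebo, sigma)
rate_drug = 1 - normal_cdf(criterion, mu_drug, sigma)

print(f"placebo response rate: {rate_placebo:.1%}")  # ~45.0%
print(f"drug response rate:    {rate_drug:.1%}")     # ~55.0%

In this sketch, a 2-point (0.25 standard deviation) difference in mean improvement produces a 10-percentage-point difference in response rates, because the criterion sits near the centre of both distributions, where small shifts in the mean move many patients across the cut-off.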

How the response rate illusion is produced

The response rate illusion is not due to response rate statistics themselves, but rather to our interpretation of them. We think of a responder as a person who improves and a non-responder as a person who does not improve, but this is not necessarily accurate when response to treatment has been derived from continuous scores, rather than defined by natural categories (e.g., survival). A patient who is classified as a non-responder on the basis of a criterion of improvement on a continuous scale may have shown substantial and clinically significant improvement, falling just short of the criterion.
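
A toy Python illustration of this point, using entirely hypothetical scores: two patients whose improvement differs by only two points on a continuous scale fall on opposite sides of a 50%-reduction response criterion.

baseline = 30     # hypothetical baseline severity score
criterion = 0.50  # "response" = at least a 50% reduction from baseline

patients = {"A": 16, "B": 14}  # hypothetical points of improvement

for name, improvement in patients.items():
    fraction = improvement / baseline
    label = "responder" if fraction >= criterion else "non-responder"
    print(f"patient {name}: improved {improvement} points ({fraction:.0%}) -> {label}")

# -> patient A: improved 16 points (53%) -> responder
# -> patient B: improved 14 points (47%) -> non-responder

Both patients have improved substantially; the dichotomy nonetheless records them as categorically different.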

Conclusions

Some outcome data (e.g., death and pregnancy) can only be expressed in terms of response rates. Other outcomes do not fall into natural categories and can be assessed meaningfully with continuous scores. Imposing categories on such data is hazardous. It creates the impression of discrete patterns of response where the data do not suggest any, it obscures the arbitrary nature of the criteria used to form the categories, and, as we have shown, it can spuriously inflate the differences between groups.

References (5)

  • NICE; National Institute for Clinical Excellence, 2004; Vol....
  • I. Kirsch et al., Prev Treat (2002)
