Biological Psychiatry

Volume 59, Issue 11, 1 June 2006, Pages 990-996

Review
Size of Treatment Effects and Their Importance to Clinical Research and Practice

https://doi.org/10.1016/j.biopsych.2005.09.014

In randomized clinical trials (RCTs), effect sizes seen in earlier studies guide both the choice of the threshold effect size that defines clinical significance and the rationale for believing that the true effect size lies above that threshold and is worth pursuing in an RCT. That threshold is used to determine the necessary sample size for the proposed RCT. Once the RCT is done, the data generated are used to estimate the true effect size and its confidence interval, and clinical significance is assessed by comparing the estimated effect size with the threshold effect size. In subsequent meta-analyses, this effect size is combined with others, ultimately to determine whether treatment (T) is clinically significantly better than control (C). Thus, effect sizes play an important role both in designing RCTs and in interpreting their results; but which effect size, specifically? We review the principles of statistical significance, power, and meta-analysis, as well as the commonly used effect sizes, which are limited in their ability to convey clinical significance. We recommend three equivalent effect sizes chosen specifically to convey clinical significance: the number needed to treat (NNT), the area under the receiver operating characteristic curve comparing T and C responses, and the success rate difference (SRD).

Section snippets

Statistical and Clinical Significance, Power, and Meta-Analysis

As statistical hypothesis testing is typically performed, a “statistically significant” result with p < .05 means that the data indicate that something nonrandom is going on. When p < .01, the evidence is more convincing, and p = 10⁻⁶ is very convincing indeed. However, the p value is a comment on how convincing the data are against the null hypothesis of randomness; the conclusion is always “something nonrandom is going on.” Such a conclusion gives no clue as to the size or importance of the
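The gap between statistical and clinical significance can be illustrated with a quick simulation (a hedged sketch with made-up parameters, not an example from the article): with a very large sample, even a clinically negligible standardized difference produces a p value far below .05.

```python
import math
import random

random.seed(0)

def z_test_and_d(x, y):
    """Two-sided large-sample z-test p value and Cohen's d (pooled SD)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    pooled_sd = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    d = (mx - my) / pooled_sd
    z = (mx - my) / math.sqrt(vx / nx + vy / ny)
    # Normal-approximation two-sided p value via the error function
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p, d

# Hypothetical trial: true standardized difference of only 0.05,
# but 100,000 patients per arm.
n = 100_000
treatment = [random.gauss(0.05, 1.0) for _ in range(n)]
control = [random.gauss(0.00, 1.0) for _ in range(n)]
p, d = z_test_and_d(treatment, control)
# p falls far below .05 while d remains clinically trivial
```

The p value rewards sample size, not effect size, which is precisely why it cannot by itself speak to clinical importance.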

Cohen’s d

When an RCT outcome measure is scaled, the most common effect size is Cohen’s d (Cooper and Hedges 1994, Hedges and Olkin 1985), the difference between the T and C group means, divided by the within-group standard deviation. This effect size was designed for the situation in which the responses in T and C have normal distributions with equal standard deviations.

The population parameter estimated by Cohen’s d ranges across the real line, with zero indicating no difference between T and C,
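As a concrete illustration of the definition above (a minimal sketch with invented toy data, not data from the article), Cohen's d is the mean difference divided by the pooled within-group standard deviation:

```python
import math

def cohens_d(t, c):
    """Cohen's d: difference in group means over pooled within-group SD."""
    nt, nc = len(t), len(c)
    mt, mc = sum(t) / nt, sum(c) / nc
    vt = sum((x - mt) ** 2 for x in t) / (nt - 1)
    vc = sum((x - mc) ** 2 for x in c) / (nc - 1)
    pooled_sd = math.sqrt(((nt - 1) * vt + (nc - 1) * vc) / (nt + nc - 2))
    return (mt - mc) / pooled_sd

# Toy data: treatment mean 12, control mean 10, common SD ~1.58
t = [10, 12, 14, 11, 13]
c = [8, 10, 12, 9, 11]
print(round(cohens_d(t, c), 2))  # → 1.26
```

A positive d favors T, a negative d favors C, and zero indicates no mean difference.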

Number Needed to Treat

The proposed effect size that seems best to reflect clinical significance is one introduced in the context of evidence-based medicine for binary (success/failure) outcomes: NNT (Altman and Andersen 1999; Cook and Sackett 1995). Number needed to treat is defined as the number of patients one would expect to treat with T to obtain one more success (or one fewer failure) than if the same number were treated with C. For a binary outcome (success/failure), the success rate difference (SRD) is defined as
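For the binary case, these two effect sizes are simple to compute from the observed success proportions; the sketch below uses hypothetical trial counts (not data from the article):

```python
def srd_and_nnt(successes_t, n_t, successes_c, n_c):
    """Success rate difference (SRD) and number needed to treat (NNT).

    SRD is the difference in success proportions between T and C;
    NNT = 1/SRD is the expected number of patients treated with T
    per one extra success over C."""
    srd = successes_t / n_t - successes_c / n_c
    nnt = 1 / srd if srd else float("inf")
    return srd, nnt

# Hypothetical RCT: 60/100 respond on T, 40/100 respond on C.
srd, nnt = srd_and_nnt(60, 100, 40, 100)
# SRD = 0.20, so NNT = 5: treat five patients with T to expect
# one more success than if they had been treated with C.
```

Note that as SRD shrinks toward zero, NNT grows without bound, which gives NNT its direct clinical reading: a small NNT means a treatment effect large enough to notice in everyday practice.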

Confidence Intervals and Effect Sizes

In every report of an RCT, we recommend that each p value be accompanied by NNT (for interpretability) and SRD with its standard error and confidence interval (for computations). The difficulty is that the correct computation of the confidence interval and the standard error of SRD depends on the distribution of the data underlying that effect size.

In those circumstances in which Cohen’s d is appropriate (normal distributions, equal variances), the exact distribution of Cohen’s d is known (
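For the binary case, the standard error and a Wald confidence interval for SRD follow from treating the two arms as independent binomial proportions. This is a sketch under that assumption only; as the passage above notes, other underlying distributions require different computations.

```python
import math

def srd_with_ci(s_t, n_t, s_c, n_c, z=1.96):
    """SRD, its standard error, and a Wald 95% confidence interval,
    assuming two independent binomial proportions (binary outcomes)."""
    p_t, p_c = s_t / n_t, s_c / n_c
    srd = p_t - p_c
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return srd, se, (srd - z * se, srd + z * se)

# Same hypothetical counts as before: 60/100 on T, 40/100 on C.
srd, se, (lo, hi) = srd_with_ci(60, 100, 40, 100)
# SRD = 0.20 with SE ≈ 0.069; 95% CI ≈ (0.064, 0.336)
```

Reporting the interval alongside the point estimate lets readers judge whether the plausible range of SRD clears whatever threshold of clinical significance is at stake.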

Discussion: The Threshold of Clinical Significance

To summarize, we propose that for any RCT, along with reporting the p value comparing T with C, researchers report NNT and SRD, as well as the standard error and a confidence interval for SRD. If effect sizes were so reported, they could then be used to facilitate consideration of what the threshold of clinical significance might be for design of subsequent related studies.

Here we have attempted to take the first major step, recommending an effect size that is clinically interpretable and

References (45)

  • R.J. Cook et al.

The number needed to treat: A clinically useful measure of treatment effect

    Br Med J

    (1995)
  • H. Cooper et al.

    The Handbook of Research Synthesis

    (1994)
  • H.M. Cooper et al.

    Statistical versus traditional procedures for summarizing research

    Psychol Bull

    (1980)
  • J. Cornfield

    A method of estimating comparative rates from clinical data. Applications to cancer of the lung, breast and cervix

    J Natl Cancer Inst

    (1951)
  • J. Cornfield

    A statistical problem arising from retrospective studies

  • R. Dar et al.

    Misuse of statistical tests in three decades of psychotherapy research

J Consult Clin Psychol

    (1994)
  • B. Efron et al.

    A leisurely look at the bootstrap, the jackknife, and cross-validation

    Am Statistician

    (1983)
  • B. Efron et al.

    Computer-Intensive Statistical Methods (Technical Report 174)

    (1995)
  • J.L. Fleiss

    On the asserted invariance of the odds ratio

    Br J Prev Soc Med

    (1970)
  • R.J. Grissom et al.

    Effect Sizes for Research

    (2005)
  • L.V. Hedges et al.

    Statistical Methods for Meta-Analysis

    (1985)
  • L.M. Hsu

    Biases of success rate differences shown in binomial effect size displays

    Psychol Bull

    (2004)