
Endgames Statistical Question

What is a non-inferiority trial?

BMJ 2013;347:f6853. doi: https://doi.org/10.1136/bmj.f6853 (Published 15 November 2013)

Philip Sedgwick, reader in medical statistics and medical education
Centre for Medical and Healthcare Education, St George’s, University of London, London, UK
Correspondence: p.sedgwick{at}sgul.ac.uk

Researchers investigated the effectiveness of cognitive behavioural therapy delivered by telephone compared with the same therapy given face to face in the treatment of obsessive compulsive disorder. A randomised controlled non-inferiority trial study design was used. The intervention was 10 weekly sessions of exposure therapy and response prevention delivered by telephone or face to face. The aim of the study was to investigate whether the delivery of therapy by telephone was as effective as face to face sessions, the standard mode of delivery.1

The primary outcome measures included the self report version of the Yale Brown obsessive compulsive disorder checklist measured six months after the completion of treatment. The total score for the checklist ranges from 0 to 40, with higher scores indicating greater obsessive compulsiveness. A prespecified non-inferiority margin of 5 units on the Yale Brown checklist was proposed.

Participants were 72 patients with obsessive compulsive disorder recruited from two psychology outpatient departments. In total, 36 participants were randomised to each treatment group. Using an intention to treat analysis, the difference between treatments (face to face sessions minus telephone) in the Yale Brown checklist score at six months’ follow-up was −0.55 (95% confidence interval −4.26 to 3.15).

Which of the following statements, if any, are true?

  a) The null hypothesis states that in the population the mean Yale Brown checklist score at six months’ follow-up for standard delivery was higher than that for telephone delivery by 5 units or more

  b) Analysis is based on the 95% confidence interval for the mean difference between treatment groups in the primary outcome

  c) It can be concluded that telephone delivery is not inferior in effectiveness to the standard delivery of face to face sessions

Answers

Statements a, b, and c are all true.

Weekly face to face sessions were the standard mode of delivery for cognitive behavioural therapy in the treatment of obsessive compulsive disorder. However, because they are resource intensive and have long waiting lists, it is likely that some patients will not be able to access treatment. Provision of therapy over the telephone has been proposed as an alternative mode of delivery. Such an approach would reduce contact with therapists and costs, while increasing access for patients who cannot attend clinic appointments for geographical, social, medical, or economic reasons. If delivery of therapy by telephone was shown to be as effective as face to face sessions in clinical outcomes, it might become the standard mode of delivery.

To establish whether provision of cognitive behavioural therapy over the telephone was as effective as face to face sessions in the treatment of obsessive compulsive disorder, the researchers used a non-inferiority trial study design. The aim was to establish whether telephone delivery was “no worse than”—that is, not inferior to—face to face sessions in effectiveness. This required the non-inferiority margin to be specified, which in the trial above was 5 units on the Yale Brown obsessive compulsive disorder checklist at six months’ follow-up. The non-inferiority margin was determined on the basis of clinical judgment. For it to be shown that telephone delivery was not inferior to face to face sessions in effectiveness, the mean Yale Brown checklist score at six months’ follow-up for standard delivery must not be higher than that for telephone delivery by a margin of 5 units or more.

The classic randomised controlled trial seeks to establish superiority of a new treatment compared with the standard treatment or placebo. Such trials are called superiority trials and have been described in a previous question.2 In a superiority trial, statistical hypothesis testing starts at the position of equipoise, with the null hypothesis stating that there is no difference in effectiveness between treatments. The aim is to establish whether there is evidence to reject the null hypothesis in favour of the two sided alternative, which states that treatments are not equivalent in effectiveness, with one being superior to the other.

Non-inferiority trials incorporate the process of statistical hypothesis testing, although the approach is an adaptation of the traditional one used in superiority trials. In the non-inferiority trial above, the null hypothesis started at the position of inferiority, with telephone delivery being inferior to the standard delivery of face to face sessions. Specifically, it stated that the mean Yale Brown checklist score at six months’ follow-up for standard delivery was higher than that for telephone delivery by 5 units or more (a is true). The aim was to establish whether the results of the trial supported the null hypothesis or provided evidence for the alternative, which stated that non-inferiority exists. Specifically, the alternative hypothesis stated that telephone delivery was not inferior to face to face sessions, and the mean Yale Brown checklist score at six months’ follow-up for standard delivery was less than 5 units higher than that for telephone delivery. It is possible under the alternative hypothesis that the mean score on the Yale Brown checklist for telephone delivery was greater than that for face to face sessions.
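The two sets of hypotheses can be written side by side. The notation here is an assumption introduced for illustration and does not appear in the trial report: \mu_F and \mu_T denote the population mean Yale Brown checklist scores at six months’ follow-up for face to face and telephone delivery respectively, and \Delta denotes the non-inferiority margin of 5 units. The direction of the difference (face to face minus telephone) follows the convention used in this article.

```latex
% Assumed notation (not in the original): \mu_F, \mu_T, \Delta
\[
\begin{aligned}
\text{Superiority trial:} \quad & H_0 : \mu_F - \mu_T = 0
  & \text{versus} \quad & H_1 : \mu_F - \mu_T \neq 0 \\
\text{Non-inferiority trial:} \quad & H_0 : \mu_F - \mu_T \geq \Delta
  & \text{versus} \quad & H_1 : \mu_F - \mu_T < \Delta, \qquad \Delta = 5
\end{aligned}
\]
```

Written this way, the alternative hypothesis for the non-inferiority trial places no lower bound on the difference, which is why it remains compatible with the mean score for telephone delivery being greater than that for face to face sessions.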

It would not have been appropriate to use a superiority trial to investigate whether delivery of therapy by telephone was equivalent in effectiveness to the standard mode of delivery by face to face sessions. It was not anticipated that telephone delivery would be superior to face to face sessions in effectiveness. Therefore, if the null hypothesis had not been rejected in favour of the alternative in a superiority trial, it could not have been concluded that there was no difference between treatments in effectiveness, only that there was no evidence of a difference. The concept that absence of evidence is not evidence of absence in a superiority trial has been discussed in a previous question.3

Analysis of a superiority trial is based on a P value plus a two sided confidence interval, typically 95%, for the test of the difference (whether an absolute or relative measure) between treatment groups in the primary outcome. The association between the P value and 95% confidence interval with respect to the 5% (0.05) level of significance has been described in a previous question.4 Analysis of a non-inferiority trial is based only on a confidence interval, typically 95% and two sided, for the difference between treatment groups in the primary outcome in relation to the non-inferiority margin (b is true). In the above trial, if the 95% confidence interval for the difference straddled the non-inferiority margin of 5 units—that is, one limit was smaller than 5 units and the other limit greater—there would have been no evidence to reject the null hypothesis in favour of the alternative. This would also hold true if one of the limits was exactly equal to 5 units. It would have been inconclusive as to whether the mean effect of telephone delivery was within 5 units on the primary outcome compared with standard delivery. If both limits of the 95% confidence interval for the mean difference between treatments (standard minus telephone delivery) on the Yale Brown checklist at six months’ follow-up were less than 5 units then non-inferiority would have been demonstrated. Specifically, telephone delivery would have been shown to be not inferior to the standard mode of delivery by more than the non-inferiority margin of 5 units.

The difference between treatments (face to face sessions minus telephone) in the Yale Brown obsessive compulsive disorder checklist score at six months was −0.55 (95% confidence interval −4.26 to 3.15). Both limits of the confidence interval were less than the non-inferiority margin of 5 units. Therefore, it can be concluded that telephone delivery was not inferior to the standard delivery of face to face sessions by more than the non-inferiority margin of 5 units (c is true).
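The decision rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis used in the trial: the function name `non_inferiority_shown` and its parameters are assumptions introduced here, and the interval is taken as standard minus new treatment, following the sign convention used in this article.

```python
def non_inferiority_shown(ci_lower: float, ci_upper: float, margin: float) -> bool:
    """Return True when the whole confidence interval lies below the margin.

    Hypothetical helper for illustration. The interval is for the difference
    (standard minus new treatment). An interval that straddles the margin,
    or has a limit exactly equal to it, does not demonstrate non-inferiority.
    """
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    # Both limits are below the margin exactly when the upper limit is.
    return ci_upper < margin


# Values reported for the trial: 95% CI -4.26 to 3.15, margin 5 units.
print(non_inferiority_shown(-4.26, 3.15, margin=5.0))  # True

# Hypothetical intervals, not from the trial:
print(non_inferiority_shown(-1.0, 6.2, margin=5.0))  # straddles the margin: False
print(non_inferiority_shown(-1.0, 5.0, margin=5.0))  # limit equals the margin: False
```

Only the upper limit needs to be compared with the margin, because the lower limit can never exceed it; the check on the ordering of the limits is there simply to catch swapped arguments.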

Non-inferiority trials are undertaken for many reasons. Perhaps the most obvious is the development of a new mode of delivery for a treatment, as described in the example above. The new mode of delivery could replace the standard one if it can be shown to be not inferior in effectiveness while conferring benefits, such as increased patient access and reduced costs.

A non-inferiority trial may also be undertaken during the development of a new drug if the drug is not anticipated to be less effective than the standard one and no other differences are expected. Although the new drug may not improve therapeutic care compared with the standard drug, the development of the drug would lead to an additional treatment that can be used in patient care. This would be invaluable because patients benefit from different drugs. Some patients experience substantial benefit from one drug but little benefit, if any, from another.

A slightly more complex, and possibly more realistic, situation that might occur in the development of a new drug is the trade-off between effectiveness and other factors such as side effects, patient acceptability, and costs. It might be shown that the new drug is not inferior to the standard one, although there may be a small reduction in effectiveness. Despite the reduction in therapeutic benefits the new treatment might be acceptable as an alternative if, for example, it had fewer side effects, was more convenient to the patient, or was cheaper.


Footnotes

  • Competing interests: None declared.
