Background In non-inferiority trials of radiotherapy in patients with early stage breast cancer, it is inevitable that some patients will cross over from the experimental arm to the standard arm prior to initiation of any treatment due to complexities in treatment planning or subject preference. Although the intention-to-treat (ITT) analysis is the preferred approach for superiority trials, its role in non-inferiority trials is still under debate. This has led to the use of alternative approaches such as the per-protocol (PP) analysis or the as-treated (AT) analysis, despite the inherent biases of such approaches.
Methods Using simulations, we investigate the effect of 2%, 5% and 10% random and non-random crossovers prior to radiotherapy initiation on the ITT, PP, AT and the combination of ITT and PP analyses with respect to type I error in trials with time-to-event outcomes. We also evaluate bias and SE of the estimates from the ITT, PP and AT approaches.
Results The AT approach had the best performance in terms of type I error, but was anticonservative as non-random crossover increased. The ITT and PP approaches were anticonservative under all percentages of random and non-random crossover. Similarly, lowest bias was seen with the AT approach; however, bias increased as the percentage of non-random crossover increased. The ITT and PP had poor performance in terms of bias as crossovers increased.
Conclusions If minimal crossovers were to occur, we have shown that the AT approach has the lowest type I error rates and smallest opportunity for bias. Results of trials with a high number of crossovers should be interpreted with caution, especially when crossover is non-random. Attempts to prevent crossovers should be maximised.
- STATISTICS & RESEARCH METHODS
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Statistics from Altmetric.com
Strengths and limitations of this study
Using simulations, we investigated the effect of crossovers on various analysis populations in non-inferiority trials with time-to-event outcomes.
We considered random and non-random crossovers.
Simulations were limited to clinically plausible non-inferiority trials of radiotherapy in women with early stage breast cancer.
Adjustment of baseline covariates was not considered in the analysis.
The non-inferiority randomised trial design is frequently used to compare novel experimental breast radiation regimens with standard breast irradiation for the prevention of local recurrence in patients with breast cancer who have undergone breast conserving surgery. For example, hypofractionated radiotherapy that delivers a high dose of radiation per fraction and therefore requires a shorter duration of treatment resulting in greater convenience for the patient is often compared with standard radiotherapy using a non-inferiority design.1–4 The challenges in the design, conduct and analysis of such trials have been discussed by several authors.5–9 These include the determination of the non-inferiority margin and issues related to assay sensitivity, biocreep and the choice of the analysis population.5 ,7 ,10–14
Typically, in breast cancer radiotherapy trials, prior to beginning treatment, the patient undergoes a planning process to establish the treatment fields to target the tumour and avoid radiating normal tissue. Such planning generally occurs after randomisation. Sometimes, planning may reveal that it is not possible to deliver the experimental regimen, and therefore the patient is treated with standard therapy. In some cases, after being randomised to experimental radiotherapy, the patient decides to be treated with standard radiotherapy instead. In a trial of 1234 women comparing hypofractionated radiotherapy with standard radiotherapy for the prevention of local recurrence, the crossover percentage was 1.2%.3 Generally, in trials evaluating new experimental radiotherapy techniques (that are not currently available as part of usual care), patients are not permitted to cross over from standard therapy to experimental therapy.15
In randomised superiority trials, it is well established that the analysis should be performed based on the intention-to-treat (ITT) principle—which states that patients are analysed according to the group they were randomised to regardless of the treatment they received. An ITT analysis tends to produce diluted treatment effect estimates and therefore is considered a conservative approach in analysis of superiority trials, but is anticonservative in demonstrating non-inferiority.16 This has led to the use of a per-protocol (PP) analysis where the analysis set comprises only patients who fully comply with their assigned treatment,11 or an as-treated (AT) analysis which groups patients according to the treatment they actually received,12 despite the inherent bias of such analyses.17 ,18
Several researchers have investigated the effect of non-compliance on the ITT and PP analyses in non-inferiority trials with binary or continuous outcomes.11 ,19–22 This research has focused mainly on issues such as dropouts, missing data and treatment discontinuations. Literature on the effect of crossover bias in non-inferiority trials is limited. Sheng and Kim23 showed that both the ITT and PP can be biased in trials with binary outcomes. Matsuyama16 studied the effect of crossovers to the other treatment after initiating the assigned treatment in the time-to-event situation, and suggested that the PP analysis should not be used. Similarly, others have studied the effect of switching treatments mid-trial (ie, after receiving at least some of their original allocated treatment) in the context of superiority trials.24–29 However, the effect of crossovers prior to treatment initiation such as in radiotherapy trials is unknown.
Some authors have suggested performing the analysis using the ITT and PP populations, and that non-inferiority should only be concluded if the null hypothesis is rejected using both analyses.5 ,22 ,30 The Committee of Proprietary Medicinal Products Points-to-Consider states that ‘… similar conclusions from both the ITT and PP are required in a non-inferiority trial’.31 However, this has not been investigated comprehensively for trials with time-to-event outcomes with crossovers.
In this paper, we focus on non-inferiority trials of radiotherapy with a time-to-event outcome in patients with early stage breast cancer. Using simulation, we investigate the effect of patient crossover from the experimental to standard therapy (prior to initiation assigned therapy) on the analysis using the ITT, PP and AT and the combination of ITT and PP analysis sets with respect to type I error, bias and SEs.
The ITT approach uses all patients who were randomised to the study and analyses them according to their assigned treatment group regardless of whether they actually received or complied with the treatment. This is advantageous because it preserves the integrity of randomisation and therefore ensures that, on average, the treatment groups are comparable. In addition, since it includes all patients, it helps prevent bias which may occur when excluding patients.12 Furthermore, the ITT approach will produce results that are likely to be observed in the clinic.10 The International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use also states: “however, in an equivalence or non-inferiority trial use of the full analysis set (ITT) is generally not conservative and its role should be considered very carefully”.10
The PP approach excludes patients who have not completed their assigned treatment based on the study protocol. This approach aims to measure the ‘pure’ treatment effect by including only patients who comply with the protocol and excludes those who have crossed over.32 The use of the PP analysis in non-inferiority trials has increased because of the apparent anticonservative nature of the ITT in this setting.11
This approach analyses patients according to the treatment they actually received, and not the treatment assigned. Therefore, crossover patients are included in the analysis and are grouped with the treatment arm to which they crossed over.
Combination of ITT and PP analyses
This combined approach requires that both the ITT and PP analyses are performed and that the null hypothesis is rejected (ie, declaring non-inferiority) only if both analyses reject the null hypothesis.
Hypotheses and assessing non-inferiority
Following D'Agostino et al,5 let and represent the constant hazard rates for the experimental and standard therapy, respectively, and be the HR. Let M be the non-inferiority margin, that is, the maximum tolerable amount by which λE can be worse than λS (M>1). Then the null and alternative hypotheses are:
To test the hypothesis of non-inferiority, we compute the CI for . If the upper bound of the CI is less than M, then we can conclude that the experimental therapy is no worse than the standard therapy by a maximum of M, and hence is non-inferior to the standard therapy at a significance level of α.
The 5-year local recurrence rate following radiotherapy in women with early stage breast cancer who have undergone breast conserving surgery was approximately 5%, or .3 In recent trials of radiotherapy in women with breast cancer, non-inferiority margins of 1.5 and 1.7 have been used.2 ,3 Also, it is recommended that a one-sided α=0.025 be used for non-inferiority studies.10 ,31 We considered two non-inferiority trials with a 5-year local recurrence rate of 5%, a one-sided α=0.025, 90% power, 4 years of accrual and an additional 3 years of follow-up. On the basis of these parameters, we calculated total sample sizes of 5134 and 3004 for trials with non-inferiority margins of 1.5 and 1.7, respectively, assuming a 1:1 allocation ratio.33
Patients undergoing breast conserving therapy have a varying risk of recurrence, and therefore we considered two risk subgroups (high, low) and assumed the HR of high versus low risk to be 1.4, similar to that of a grade III versus grade I/II tumours.34 In addition, we assumed that 20% of the patients were high risk and 80% were low risk.
To evaluate the type I error rates, data were simulated under the null hypothesis where E is inferior to S with a true HR of θ being equal to the non-inferiority margin (θ=M). For each trial, we simulated data for two randomly generated treatment groups of equal size. For each patient, the baseline covariate of risk (high vs low) was generated using the binomial distribution with probability 0.2. Local recurrence-free survival times were generated using the formula35:where U is a random variable following a uniform distribution on the interval from 0 to 1, is the baseline hazard function, β is the vector of the regression coefficients and x is the vector of covariates. The regression coefficient for treatment (E vs S) was log (M) and log (1.4) for the risk (high vs low) covariate. Enrolment times were generated using the uniform distribution from 0 to 4 corresponding to 4 years of accrual. Patients were censored at the end of the trial if they remained event-free at that time.
We evaluated scenarios where the percentage of patients who crossed over from experimental to standard therapy was 2%, 5% and 10%. For each of these, we considered the following situations: (1) the crossover of patients was random, and (b) the high-risk patients were more likely to cross over (ie, non-random). This was simulated assuming that 50% of the crossover patients were high-risk patients.
For each approach, computation of the 100(1–2α) CI of the estimated HR, , was performed using the Cox proportional hazards (Cox-PH) model with α=0.025. We carried out 10 000 replications for each trial giving an SE of the estimate of type I error of 0.15%. The one-sided type I error was calculated as the proportion of trials that had the null hypothesis of inferiority rejected, that is, the proportion of trials in which the upper CI was less than M. Bias for each of the ITT, PP and AT analyses was calculated as the percentage difference between and , and averaged over the number of simulations. The SE was also averaged over the total number of simulations. All analyses were performed in R 3.0 (http://www.r-project.org).
Impact on type I error
The results of the type I errors for the four approaches are shown in table 1, and graphically in figure 1. The results showed that the AT approach had the best performance with type I errors closer to nominal for 2% and 5% crossovers, 0.028 and 0.027, respectively (figure 1A). We observed that the combined ITT+PP approach performed better than the separate ITT and PP analyses, and that the ITT and PP approaches had comparable overall type I errors. However, these approaches had type I errors that were greater than the nominal value, regardless of the crossover percentage. In general, overall type I errors increased as the crossover percentage increased for all approaches.
For scenarios with a random crossover (figure 1B), the AT approach had nominal or close to nominal type I errors for all crossover percentages. The ITT+PP approach had close to nominal type I error when the random crossover was 2%, but performed poorly as the random crossover percentage increased. The individual ITT and PP approaches had greater than nominal type I errors under all scenarios of random crossover.
Under non-random crossover scenarios (figure 1C), all approaches performed poorly irrespective of the crossover percentage with the exception of the AT analyses when the true HR was 1.5 and the crossover was 2% (table 1). In general, the PP approach had the worst performance under scenarios of non-random crossover.
Impact of bias
The AT approach also had the best performance in terms of overall bias of the HR estimates, whereas the ITT and PP approaches perform similarly (figure 2A). As the percentage of crossover patients increased, the overall per cent bias also increased for all approaches. When the crossovers were random (figure 2B), the AT approach had comparable bias across all levels of crossover percentages, whereas the ITT and PP approaches had greater bias as the crossover percentage increased. The ITT and PP approaches behaved similarly under the random and non-random scenarios, but their bias was larger under the non-random crossover scenario with the PP approach having a larger bias compared with the ITT (figure 2C). Similar to the ITT and PP approaches, the AT approach also showed increased bias as non-random crossover increased, albeit with smaller bias. For each approach, bias was greater when the true HR was 1.7 compared with 1.5.
Impact on SE
The three approaches had comparable overall SEs across all scenarios (table 1). However, a slight trend was observed where the AT approach had the smallest SEs, followed by the ITT, and then the PP approach. Furthermore, we observed that for each approach, the SEs were comparable under scenarios of random and non-random crossovers. SEs were greater for the trials where the true HR was 1.7 compared with 1.5.
In randomised non-inferiority trials of radiotherapy regimens in women with early stage breast cancer, it is inevitable that some patients will cross over from the experimental arm to the standard arm prior to treatment initiation due to complications in experimental radiotherapy planning or patient or physician preference. In such situations, the ideal population for the final analysis is unclear. We evaluated the performance of the ITT, PP, AT and combined ITT+PP approach under various crossover scenarios.
The AT approach had the best performance under all scenarios in terms of type I error rate. However, it can only be recommended for situations where the crossover is random. Patients who crossed over had their hazard of outcome determined by that of the standard group, and were analysed accordingly by the AT approach. Considering this and the fact that the crossover was random, it is not surprising that the AT approach had near nominal type I error rates under these situations.
Moreover, the combined ITT+PP approach performed better than the ITT and PP approaches separately. This is due to the fact that the ITT+PP approach requires both analyses to reject the null hypothesis prior to non-inferiority being concluded, hence adding an extra level of testing compared to individual ITT and PP approaches, and therefore making it harder to conclude non-inferiority. Interestingly, neither the ITT nor the PP approach can be recommended under simulated scenarios, adding to the literature that both approaches could provide increased erroneous results.16 ,23
We also observed that the AT approach had the lowest bias of the HR estimate across all crossover percentages. Moreover, the biases of the ITT and PP approaches were comparable across all scenarios. For all three approaches, the bias is in the negative direction, and generally increases as the crossover percentage increases, except for the AT approach under the random crossover scenarios where it is not affected by the percentage crossover. Reasons for this observation are similar to that of its performance in terms of type I error under the same scenarios.
The biases for all methods are larger in scenarios where the true HR is larger because this reflects a greater hazard of events in the experimental arm. Therefore, the crossover patients have a greater impact on the estimated HR, driving it closer to the null than in situations where the true HR is smaller.
Since the assessment of non-inferiority is based on the CI approach, a combination of greater bias in the negative direction and smaller SEs would yield a lower upper limit of the 95% CI, which is more likely to fall within the non-inferiority margin. Therefore, it is no coincidence that in general the scenarios with the greater bias and smaller SEs corresponded to the scenarios with larger type I error rates. We observed that within each approach, the SEs were comparable for random and non-random crossovers, but the bias was larger for non-random crossover, suggesting that the bias had a greater influence on the type I error rate when comparing non-random versus random crossover within each method.
Our study had some limitations. The generalisability of our findings may be limited since we studied trials with event rates that are pertinent to radiotherapy trials in patients with early stage breast cancer. However, our methodology and results can be applied to other clinical settings where crossovers occur prior to initiation of treatment. In diseases where the event rates differ from the ones evaluated in this research, further simulations would be required to evaluate these approaches. For simplicity, we assumed that non-random crossover was based on a single covariate. However, non-random crossover can occur for several reasons and, depending on the reason the for crossover, the hazards may also differ. We did not consider adjusting for baseline covariates in the analysis, which may improve the estimation of the treatment effect. However, this is less likely in large RCTs. We did not evaluate the causal proportional hazards estimator36 because it is not readily available in standard statistical software. Other complex methods based on the counterfactual framework, such as the randomised-based analysis which employs g-estimation methods, were also not evaluated.15 ,16
The choice of analysis population for non-inferiority trials is a difficult issue. We have shown that the AT approach preserves type I error under scenarios of random crossover. However, it is difficult to prove that crossover is random, and therefore assuming a random crossover may not be appropriate leading to concerns about the validity of the inference test. Moreover, the PP approach, which excludes patients, is likely to disturb the prognostic balance achieved by randomisation, which can also cause erroneous trial results. The advantage of the ITT approach is that it preserves the advantages of randomisation and mirrors what will happen in practice, and therefore is pragmatic. On the other hand, it can be anticonservative in situations where crossover is high. In our experience, the crossover percentage in radiotherapy trials in patients with early stage breast cancer is less than 2%, and we have shown that the AT and combined ITT+PP approaches are better at handling crossovers than the ITT and PP approaches.
The design, conduct and analysis of non-inferiority trials should be performed with extra rigour and to the highest standards. Attempts to prevent crossovers and other protocol deviations such as dropouts and losses to follow-up should be maximised. If a minimal percentage of crossovers were to occur, we have shown that the AT approach had the lowest type I error rates and smallest bias. A sensitivity analysis using the combined ITT+PP approach may also be warranted. In addition, both the ITT and PP results should be reported with details of the patients who crossed over.
Contributors SP, JAJ, CG, LT, TJW and MNL conceived the study. SP conducted the literature review, designed and implemented the simulation and wrote the initial draft of the manuscript. All authors reviewed and revised the draft version of the manuscript, and they also read and approved the final version of the manuscript.
Funding This research was funded in part by funds from the CANNeCTIN Program.
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No additional data are available.
If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.