Estimating treatment effects in randomised controlled trials with non-compliance: a simulation study
  1. Chenglin Ye1,2,
  2. Joseph Beyene1,
  3. Gina Browne1,3,
  4. Lehana Thabane1,2
  1. 1Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada
  2. 2Biostatistics Unit, St Joseph's Healthcare Hamilton, Hamilton, Ontario, Canada
  3. 3School of Nursing, McMaster University, Hamilton, Ontario, Canada
  1. Correspondence to Dr Lehana Thabane; thabanl{at}mcmaster.ca

Abstract

Objective Randomised controlled trials (RCTs) are often considered as the gold standard for assessing new health interventions. Patients are randomly assigned to receive an intervention or control. The effect of the intervention can be estimated by comparing outcomes between groups, whose prognostic factors are expected to balance by randomisation. However, patients’ non-compliance with their assigned treatment will undermine randomisation and potentially bias the estimate of treatment effect. Through simulation, we aim to compare common approaches in analysing non-compliant data under different non-compliant scenarios.

Settings Based on a real study, we simulated hypothetical trials by varying three non-compliant factors: the type, randomness and degree of non-compliance. We compared the intention-to-treat (ITT), as-treated (AT), per-protocol (PP), instrumental variable (IV) and complier average causal effect (CACE) analyses to estimate large (50% improvement over the control), moderate (25% improvement) and null (same as the control) treatment effects. Different approaches were compared by the bias of estimate, mean square error (MSE) and 95% coverage of the true value.

Results For a large or moderate treatment effect, the ITT estimate was considerably biased in all scenarios. The AT, PP, IV and CACE estimates were unbiased when non-compliant behaviours were random. The IV estimate was unbiased when non-compliant behaviours were symmetrically dependent on patients’ conditions. The PP estimate was mostly unbiased when patients in the control group did not have access to the intervention. When the intervention was not different from the control, the ITT was less biased than the other approaches. Similar results were found when comparing the MSE and 95% coverage.

Conclusions The standard ITT analysis under non-compliance is biased when the intervention has a moderate or large effect. Alternative analyses can provide unbiased or less biased estimates. Based on the results, we make some suggestions on choosing optimal approaches for analysing specific non-compliant scenarios.

  • STATISTICS & RESEARCH METHODS
  • RANDOMIZED CONTROLLED TRIAL
  • NON-COMPLIANCE

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/

Strengths and limitations of this study

  • We compared different methods to analyse non-compliant data by simulating hypothetical randomised controlled trials.

  • Different non-compliant scenarios were generated by three factors: the type, randomness and degree of non-compliance.

  • The simulation framework and parameters were built on a real study.

  • Patients’ prognostic factors and missing data due to withdrawal were not considered in the simulation.

Background

Randomised controlled trials (RCTs) are often considered the gold standard for assessing new health interventions, where patients are randomly assigned to receive an intervention or control (eg, placebo). Since patients’ prognostic factors are expected to balance by randomisation, patients’ outcomes can be directly compared between groups to infer the effect of a treatment. In many cases, patients may not fully comply with their assigned treatment according to the protocol. Such protocol violation compromises the ‘fair’ comparison, which is protected by randomisation, and will potentially bias the estimate of treatment effect. Analysing RCTs subject to non-compliance can be challenging. While different analyses have been proposed to deal with non-compliance, the bias of the treatment effect estimate is rarely compared among different approaches. Result interpretations also vary depending on the nature of non-compliance and the objective of a trial. Some RCTs, known as pragmatic trials,1–3 are primarily designed to guide clinical practice. Their goal is often to assess whether an intervention will work in routine practice. In contrast, non-pragmatic trials usually focus on the biological efficacy of an intervention. Regardless of the objective, an analysis that provides an unbiased or less biased estimate of treatment effect is always desirable. In this study, we compare common approaches to analyse non-compliant data in RCTs. The results will provide useful knowledge in choosing optimal methods for different non-compliant scenarios.

This study was motivated by an RCT that compared the integrated care organised through the Children's Treatment Network (CTN) with the usual care directed by parents for managing children with special healthcare needs.4 The CTN coordinated community resources to deliver comprehensive health services for the target children and their families. The hypothesis was that the target children's health outcomes would improve by receiving integrated, proactive and necessary services tailored for them. While the use of RCTs in assessing CTN-like interventions has been promising, non-compliant rates are generally high in these trials. This is largely due to the complexity of implementing multidisciplinary interventions in real-life settings.

Intention to treat (ITT), as treated (AT) and per protocol (PP) are common approaches to analyse non-compliant data in RCTs. The ITT analysis is considered as the gold standard5 but can be problematic for some scenarios.6 Multiple methods are often recommended for analysing RCTs with substantial non-compliance.7–9 A literature review10 randomly selected 100 RCTs published in high impact journals in 2008. Of 98 RCTs which reported non-compliance, 46 employed variations of PP analyses in addition to an ITT analysis. Another class of methods to deal with non-compliance includes instrumental variable (IV) and complier average causal effect (CACE) approaches.11–15 A conceptual difference among all these methods is that the ITT, IV and CACE approaches estimate treatment effects by preserving randomisation or accounting for potential confounding, but the AT and PP approaches do not. There are also other proposed approaches to correct for non-compliance in RCTs,16 for example, G-estimation and inverse probability weighted estimators. However, these methods have not been widely adopted. Therefore, we only included the ITT, AT, PP, IV and CACE methods in our comparison.

Very few studies have compared these five methods on the bias of estimating treatment effects. Bang and Davis17 compared ITT, AT, PP and IV methods. They showed that ITT and IV analyses were biased in certain non-compliant cases. However, the authors did not include CACE analysis in their comparison and did not consider the situation where there was no crossover between treatment groups. This scenario is common when a new intervention is only accessible to patients who are offered it. In another study, McNamee18 compared ITT, AT, PP and IV analyses and concluded that an ITT analysis was not always biased towards the null while AT and PP analyses were generally biased. Sheng and Kim19 investigated the effect of non-compliance on ITT analysis of equivalence trials and showed that non-compliance did not always favour the null hypothesis, that is, no difference between treatment groups. Hertogh et al20 concluded that the IV method could give insight into confounding by non-compliance in RCTs.

Most of the previous studies did not consider different associations of non-compliant behaviours with patients’ conditions. For example, patients with certain characteristics may always reject a new intervention. Also, there are partial non-compliant cases where patients receive parts of the intervention even if they did not fully comply with the protocol. In our simulation study, we considered additional non-compliant scenarios that were not considered by previous studies and compared the five common methods by the bias of estimate, the mean square error (MSE) and 95% coverage of the true value. Our objectives were to compare the performance of different approaches in analysing non-compliant RCT data and make recommendations on optimal approaches under specific scenarios.

Methods

Simulation framework

In the CTN trial, over 50% of the children randomised in the CTN group did not fully comply with the intervention for various reasons. Primary and sensitivity analyses showed that the effect of the CTN was not significant but the estimates varied in direction, magnitude and precision.4 This observation prompted us to further investigate the impact of non-compliance on estimating treatment effects.

On the basis of the CTN setting, we simulated hypothetical RCTs where patients were randomly assigned to the intervention or usual care by a 1:1 allocation ratio. The parameters for generating hypothetical patients were estimated from the CTN trial. We simulated different non-compliant scenarios by varying three factors: (1) the type of non-compliers, (2) the randomness of non-compliance and (3) the degree of non-compliance. Our simulation framework is shown in figure 1. The design, conduct and reporting of this study followed the guideline for designing and reporting simulation studies.21

Figure 1

The simulation framework.

Type of non-compliers

We considered two types of non-compliers which were defined as: never-takers and always-takers.22 Never-takers are patients who will always reject a new intervention if they are offered it. Always-takers will always receive a new intervention even if they are not offered it. Two scenarios were considered. In one scenario, we assumed that non-compliers were either never-takers or always-takers, which mimicked the situation where patients were able to get the intervention elsewhere even if they were not offered it. In the other scenario, we assumed that non-compliers were only never-takers, which mimicked the situation where the intervention was only accessible to patients who were offered it. In addition, we assumed that the intervention and usual care were the only treatment options.

Randomness of non-compliance

Non-compliant behaviours could be random or dependent on patients’ conditions. In particular, we considered six scenarios of dependent non-compliant behaviours that were studied by McNamee18:

  A. Patients with good conditions would always get the intervention while patients with poor conditions would always reject it;

  B. Patients with good conditions would always get the intervention;

  C. Patients with poor conditions would always reject the intervention;

  D. Patients with good conditions would always reject the intervention while patients with poor conditions would always get it;

  E. Patients with good conditions would always reject the intervention;

  F. Patients with poor conditions would always get the intervention.

Patients’ conditions were considered to be positively associated with their outcomes under usual care. We assumed that good condition represented an outcome score of at least 0.5 SDs above the group mean under usual care (assuming that a high score was a better outcome). Poor condition represented an outcome score of at least 0.5 SDs below the group mean under usual care. When there were no always-takers, only scenarios C and E were considered.
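As a minimal sketch, the condition definitions and scenarios A to F could be encoded as follows. This is written in Python rather than the R used for the study; the names `condition`, `SCENARIOS` and `forced_behaviour` are hypothetical, and the defaults (mean 59, SD 10) are the usual-care values reported in the Simulation procedures section.

```python
# Hypothetical helpers illustrating the condition definitions and scenarios A-F.
# Names and the dictionary layout are assumptions made for illustration only.

def condition(y0, mean=59.0, sd=10.0):
    """Classify a patient by the usual-care outcome y0."""
    if y0 >= mean + 0.5 * sd:
        return "good"
    if y0 <= mean - 0.5 * sd:
        return "poor"
    return "neither"

# For each scenario: the behaviour forced on patients in a given condition.
# "take" = always gets the intervention, "reject" = always rejects it.
SCENARIOS = {
    "A": {"good": "take", "poor": "reject"},
    "B": {"good": "take"},
    "C": {"poor": "reject"},
    "D": {"good": "reject", "poor": "take"},
    "E": {"good": "reject"},
    "F": {"poor": "take"},
}

def forced_behaviour(scenario, y0):
    """Return 'take', 'reject', or None when the scenario imposes no behaviour."""
    return SCENARIOS[scenario].get(condition(y0))
```

Scenarios C and E contain only "reject" rules, which is consistent with their being the only dependent scenarios usable when there are no always-takers.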

Degree of non-compliance

Degree of non-compliance referred to the proportion of interventional components that a patient did not receive according to the protocol. The simplest case was all-or-none where compliers received 100% of the intervention and non-compliers received none of it. For multifaceted interventions, patients were likely to receive some components of the intervention even if they did not fully comply with the protocol. In addition, patients might only receive parts of the intervention because of the intervention fidelity. For example, a systematic review showed that many interventions of integrated care did not actually deliver all services as planned due to complexity of implementation.23 Conversely, we also use the term ‘degree of compliance’ for the proportion of interventional components that a patient received according to the protocol. Non-compliance and compliance are used throughout this paper.

All-or-none and partial non-compliance were considered in our simulation. For the all-or-none case, we considered two compliance levels: all components (d=1) or none (d=0) of the intervention. A study reported that the non-compliance rate could be as high as 30–40% for a treated population.24 Therefore, we randomly selected 30% of patients to receive the treatment opposite to the one they were assigned. For partial non-compliance, we considered four compliance levels: none (d=0), one-third (d=1/3), two-thirds (d=2/3) or all (d=1) components of the intervention. These four levels have been studied in a previous simulation study.17

Simulation procedures

We employed a modified simulation model from the previous study.17 Let Y1 and Y0 be a pair of counterfactual outcomes for a patient if he or she were in the intervention and the usual care groups, respectively. In practice, only one of the two counterfactual outcomes can be observed for any patient. Thus, causal inference is often made at population level instead of patient level. By adopting a marginal view, we define the causal effect for the treatment of interest (δ) as

$$\delta = \mu_1 - \mu_0$$

where µ1 and µ0 were the means of Y1 and Y0, respectively.

We chose µ0=59 to be the effect of usual care. The effect of usual care was estimated from the CTN trial with an SD of 10. For the effect of treatment (µ1), we chose three different cases: µ1=89 for a 50% improvement over the usual care; µ1=74 for a 25% improvement over the usual care and µ1=59 for no difference from the usual care. Each case was simulated separately. We then generated individual patient’s counterfactual outcomes through a normal distribution: [equation not recovered]

Thus, good condition was defined for a patient with Y0>64 (half an SD above the group mean under usual care) and poor condition was defined for a patient with Y0<54 (half an SD below the group mean under usual care). A group indicator Z (1=intervention and 0=usual care) was generated for each patient from a Bernoulli distribution with equal probability of 0.5 of being assigned to either group. The observed outcome for a patient was calculated by

$$y_i = d_i Y_{1i} + (1 - d_i) Y_{0i}$$

where di was the degree of treatment compliance with the protocol for patient i. For the all-or-none case, d was either 1 or 0. For partial compliance, d took a value of 0, 1/3, 2/3 or 1.
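The data-generating steps described above can be sketched for the simplest case: all-or-none compliance with purely random non-compliance. This is an illustration in Python, not the authors' R code. Several details are assumptions and are flagged in the comments: Y1 and Y0 are drawn independently with the same SD of 10, a non-complier receives the opposite treatment, and the observed outcome is the d-weighted mix of the two counterfactuals (in line with the proportionality assumption stated in the Discussion). The function name `simulate_trial` and the dictionary keys are hypothetical.

```python
import random

def simulate_trial(n=500, mu0=59.0, mu1=89.0, sd=10.0, p_noncomply=0.3, seed=None):
    """Simulate one all-or-none trial with purely random non-compliance.

    Assumptions (not confirmed by the source): Y1 and Y0 are independent
    normal draws sharing the same SD, and a non-complier receives the
    treatment opposite to the one assigned.
    """
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        y0 = rng.gauss(mu0, sd)               # counterfactual outcome, usual care
        y1 = rng.gauss(mu1, sd)               # counterfactual outcome, intervention
        z = 1 if rng.random() < 0.5 else 0    # randomised group indicator
        switch = rng.random() < p_noncomply   # random non-compliance
        d = 1 - z if switch else z            # treatment actually received (0 or 1)
        y = d * y1 + (1 - d) * y0             # observed outcome, linear in d (assumed)
        patients.append({"z": z, "d": d, "y": y})
    return patients

trial = simulate_trial(seed=2014)
```

Dependent non-compliance (scenarios A to F) would replace the random `switch` draw with a rule based on `y0`; that step is omitted here to keep the sketch short.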

In the CTN trial, 450 patients were needed to detect a minimum clinically important difference (MCID) of 15 with 80% statistical power and a 5% α. Using the same MCID, we chose to simulate 500 participants in each hypothetical trial. We estimated the SE of the treatment effect to be 1.53 from the CTN trial. Based on this estimate, at least 816 simulations were needed to produce an effect estimate within 1% accuracy of the MCID by the standard formula.21 To have sufficient power, we chose to generate 1000 simulations per scenario. The steps of simulation are shown in figure 2.
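The text cites only "the standard formula" for the number of simulations. As a hedged illustration, one power-based form, ((z_α/2 + z_β) × SE / accuracy)², reproduces the reported figure when the rounded normal quantiles 1.96 and 0.84 are used. Whether this is exactly the formula the authors applied is an assumption; the function name `n_simulations` is hypothetical.

```python
import math

def n_simulations(se, accuracy, z_alpha=1.96, z_beta=0.84):
    """Iterations needed under an assumed power-based formula.

    se       -- standard error of the estimate of interest
    accuracy -- permissible difference from the true value
    z_alpha  -- rounded normal quantile for a two-sided 5% alpha
    z_beta   -- rounded normal quantile for 80% power
    """
    return math.ceil(((z_alpha + z_beta) * se / accuracy) ** 2)

mcid = 15
print(n_simulations(se=1.53, accuracy=0.01 * mcid))  # prints 816
```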

Figure 2

Summary of the simulation steps. Y1=the counterfactual outcome for a patient in the intervention group. Y0=the counterfactual outcome for a patient in the usual care group. Z=randomisation indicator. d=the degree of compliance with the intervention. y=the simulated outcome for a patient. Scenario A: patients with good conditions will always get the intervention and those with poor conditions will always reject it. Scenario B: patients with good conditions will always get the intervention. Scenario C: patients with poor conditions will always reject the intervention. Scenario D: patients with good conditions will always reject the intervention and those with poor conditions will always get it. Scenario E: patients with good conditions will always reject the intervention. Scenario F: patients with poor conditions will always get the intervention.

Statistical analysis

This section describes the different methods that we compared. The estimated treatment effect was expressed as the difference of the mean score between groups. The methods were compared by the bias, MSE and 95% coverage.21 Bias is defined as

$$\text{Bias} = \bar{\hat{\delta}} - \delta$$

where $\bar{\hat{\delta}}$ is the average estimate of interest over all iterations and δ is the true value. MSE is calculated by

$$\text{MSE} = (\bar{\hat{\delta}} - \delta)^2 + \text{SD}(\hat{\delta})^2$$

where $\text{SD}(\hat{\delta})$ is the SD of the empirical distribution of the estimates from all iterations. The 95% coverage is the number of iterations, out of all iterations, in which the 95% CI includes δ. The CI is calculated by normal approximation.
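A minimal sketch of the three performance measures, assuming each iteration yields a point estimate and its SE. The function name `performance` is hypothetical, and coverage is returned here as a proportion of iterations rather than a raw count.

```python
import statistics

def performance(estimates, std_errors, delta):
    """Bias, MSE and 95% coverage for estimates of the true value delta."""
    mean_est = statistics.mean(estimates)
    bias = mean_est - delta
    sd_est = statistics.stdev(estimates)   # SD of the empirical distribution
    mse = bias ** 2 + sd_est ** 2
    # Count iterations whose normal-approximation 95% CI contains delta.
    covered = sum(
        1 for est, se in zip(estimates, std_errors)
        if est - 1.96 * se <= delta <= est + 1.96 * se
    )
    return {"bias": bias, "mse": mse, "coverage": covered / len(estimates)}
```

For example, `performance([29.0, 31.0], [1.0, 0.1], 30.0)` has zero bias, and only the first interval (27.04 to 30.96) contains the true value, so the coverage is 0.5.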

Intention to treat

In the ITT approach, patients are analysed by how they were randomised regardless of their actual compliance with treatment. The treatment effect was estimated by

$$\hat{\delta}_{ITT} = \bar{y}_{Z=1} - \bar{y}_{Z=0}$$

where $\bar{y}_{Z=1}$ and $\bar{y}_{Z=0}$ were the mean outcome scores of the intervention and usual care groups, respectively.

As treated

The AT approach compares patients by the treatment they actually received. The treatment effect was estimated by [equation not recovered]

For patients with partial compliance, the treatment effect was estimated by regressing the outcome on the degree of compliance (d) in a linear regression model. Also, a different notation, [symbol not recovered], was used to differentiate this estimator from the estimators used for making causal inferences.

Per protocol

The PP approach excludes patients who did not fully comply with the treatment protocol. The treatment effect was estimated by [equation not recovered]
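The verbal definitions of the ITT, AT and PP approaches can be illustrated for the all-or-none case as simple differences in means. This is a sketch based on those definitions only; the record layout (keys `z` for randomised group, `d` for treatment received, `y` for observed outcome) and the function names are hypothetical, and the toy data are invented for illustration.

```python
from statistics import mean

def itt(patients):
    """Compare groups as randomised (z), ignoring the treatment received."""
    return (mean(p["y"] for p in patients if p["z"] == 1)
            - mean(p["y"] for p in patients if p["z"] == 0))

def as_treated(patients):
    """Compare patients by the treatment actually received (d), ignoring z."""
    return (mean(p["y"] for p in patients if p["d"] == 1)
            - mean(p["y"] for p in patients if p["d"] == 0))

def per_protocol(patients):
    """Exclude patients whose received treatment differs from z, then compare."""
    return itt([p for p in patients if p["d"] == p["z"]])

# Invented toy data: one complier and one non-complier in each arm.
toy = [
    {"z": 1, "d": 1, "y": 90.0},
    {"z": 1, "d": 0, "y": 60.0},
    {"z": 0, "d": 0, "y": 58.0},
    {"z": 0, "d": 1, "y": 88.0},
]
print(itt(toy), as_treated(toy), per_protocol(toy))  # prints 2.0 30.0 32.0
```

In the toy data the crossovers pull the ITT difference towards zero, while the AT and PP differences stay close to the contrast between treated and untreated outcomes.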

Instrumental variable

The IV approach employs the randomisation indicator (Z) as an IV to adjust for the proportion of non-compliant patients. The theory and assumptions of IV analysis have been thoroughly discussed in the literature.12,13 We used the standard IV estimator for linear models: [equation not recovered]

Fieller's theorem was used to calculate the SE of the estimate.17
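As an illustration of an IV estimate with the randomisation indicator as the instrument, the sketch below uses the textbook ratio (Wald) form: the ITT difference in outcomes divided by the ITT difference in treatment received. That this matches the authors' exact estimator is an assumption, since the original formula is not available here. The SE calculation by Fieller's theorem is not shown, and the function name and record keys are hypothetical.

```python
from statistics import mean

def iv_estimate(patients):
    """Ratio-form IV estimate using the randomised group z as the instrument."""
    y1 = mean(p["y"] for p in patients if p["z"] == 1)
    y0 = mean(p["y"] for p in patients if p["z"] == 0)
    d1 = mean(p["d"] for p in patients if p["z"] == 1)
    d0 = mean(p["d"] for p in patients if p["z"] == 0)
    if d1 == d0:
        # Randomisation did not change the treatment received: undefined.
        raise ZeroDivisionError("denominator of the IV estimate is zero")
    return (y1 - y0) / (d1 - d0)

# Invented toy data: 25% non-compliance in each arm, true effect of 30.
toy = (
    [{"z": 1, "d": 1, "y": 90.0}] * 3 + [{"z": 1, "d": 0, "y": 60.0}]
    + [{"z": 0, "d": 0, "y": 60.0}] * 3 + [{"z": 0, "d": 1, "y": 90.0}]
)
print(iv_estimate(toy))  # prints 30.0
```

The denominator shrinks as the two arms become more alike in the treatment actually received, which makes the ratio unstable and eventually undefined.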

Complier average causal effect

The CACE method estimates the treatment effect among compliers. The assumptions and the causal framework of CACE have been discussed elsewhere.14,15,25 The treatment effect was estimated by [equation not recovered]

There are two general approaches to the CACE inference25: the maximum likelihood approach by expectation-maximisation (EM) algorithm and the Bayesian approach. The CACE analysis in this paper was conducted in Mplus V.7 (Muthén & Muthén, Los Angeles, CA; run on Mac OS X 10.6.8), which employed the EM algorithm. The rest of the analyses and simulations were performed in R V.2.15.2.

Cut-off points for non-compliance

In practice, investigators often dichotomise patients to be either compliers or non-compliers. A cut-off of 80% is commonly used.9,10 Patients are considered to be compliers if they have complied with at least 80% of the intervention according to protocol. A cut-off of 100% has also been used such that patients are considered to be compliers only if they have complied with the entire treatment protocol. Compliers are expected to receive the full effect of an intervention. We conducted a sensitivity analysis to investigate the impact of these two cut-offs on dichotomising compliers. A new compliance indicator for patient i was defined as: ti=I(di≥0.8) for a cut-off of 80% and ti=I(di=1) for a cut-off of 100%. The indicator function I returned 1 if the condition was satisfied and 0 otherwise. We then performed the same analysis by replacing di with ti for patient i.

Results

The estimates by different analyses under the simulated scenarios, and their bias, MSE and 95% coverage are summarised in tables 1 and 2. For a large treatment effect (a mean difference of 30, representing 50% improvement over the usual care), the ITT estimate was considerably biased. The other estimates were unbiased when non-compliant behaviours were random. When there were always-takers and never-takers, the PP and CACE estimates were the least biased if never-taking behaviours were dependent on patients’ conditions (scenarios C and E). The IV estimate was the least biased if non-compliant behaviours were symmetrically dependent on patients’ conditions (scenarios A and D). When there were only never-takers, the PP estimates were mostly unbiased. Similar results were found for a moderate treatment effect (details not shown). When an intervention was not different from the usual care (details not shown), all estimates were unbiased when non-compliant behaviours were random. The ITT estimate was also unbiased if non-compliant behaviours were symmetrically dependent on patients’ conditions. For the remaining scenarios, the ITT estimate was the least biased and the other estimates were biased to varying degrees.

Table 1

Summary of the results when never-takers and always-takers were allowed (treatment effect=30)

Table 2

Summary of the results when only never-takers were allowed (treatment effect=30)

The IV estimate generally had a larger MSE than the other estimates. That was because the standard IV estimator was sensitive to non-compliant rates. For example, when non-compliant rate was equal between groups, the denominator was zero and the estimate became undefined. When never-takers and always-takers were considered, the ITT and IV approaches generally had a better 95% coverage than the other approaches. When there were only never-takers, the PP approach had the best 95% coverage. For a large treatment effect, the ITT approach had zero 95% coverage. Overall, the results from comparing the MSE and 95% coverage were consistent with those from comparing the bias of estimates.

In the sensitivity analysis, we compared the impact of using a cut-off of 80% or 100% to dichotomise compliant patients. The results showed that dichotomising patients by a cut-off of 80% resulted in less biased estimates than dichotomising patients by a cut-off of 100%. For a null treatment effect, the treatment estimates obtained by applying a cut-off of 80% were less biased than those obtained by directly analysing patients on the observed degree of compliance.

Discussion

Through simulation, we compared different methods of analysing non-compliant RCT data. Our results showed that the ITT approach was optimal when estimating a null effect since it provided an unbiased or the least biased estimate in different scenarios. This result was consistent with the general opinion that the ITT estimate is conservative towards the null. However, for the case of a large or moderate treatment effect, the ITT approach was much more biased than the other approaches. When patients’ non-compliant behaviours were purely random, the AT, PP, IV and CACE approaches all provided unbiased estimates. For other non-compliant scenarios that we considered, the choice of optimal method varied. Figure 3 summarises the choices of methods under different scenarios to produce an unbiased or less biased estimate.

Figure 3

Choosing optimal analyses for different non-compliant scenarios. ITT, intention to treat; AT, as treated; PP, per protocol; IV, instrumental variable; CACE, complier average causal effect. Scenario A: patients with good conditions will always get the intervention and those with poor conditions will always reject it. Scenario C: patients with poor conditions will always reject the intervention. Scenario D: patients with good conditions will always reject the intervention and those with poor conditions will always get it. Scenario E: patients with good conditions will always reject the intervention. A good condition was defined to have an outcome score at least 0.5 SDs above the group mean under usual care. A poor condition was defined to have an outcome score at least 0.5 SDs below the group mean under usual care. In addition, it was assumed that the intervention and usual care were the only treatment options.

Although the ITT method is the most commonly reported analysis, other analyses of non-compliant data may provide a better estimate. Thus, understanding the extent of bias for different analyses is important when choosing an optimal approach and interpreting the results. Our results are limited by a number of factors. First, we did not consider specific prognostic factors in the simulation. Adjusting for prognostic factors may improve the estimation of treatment effect. However, we did consider different associations between patients’ outcomes and non-compliant behaviours. Second, we assumed that the clinical effect of an intervention was proportional to the degree of compliance. This linear association might not represent all real-life situations. Third, we did not consider missing data in the simulation and assumed that non-compliers’ outcomes were still collectable. Alternatively, imputation techniques can be applied to handle missing data.26 Finally, we only simulated a subset of general non-compliant scenarios. Thus, our findings may not be generalisable to other scenarios.

Despite the limitations, our study has several strengths. The simulation framework was built on three key factors of non-compliance: the type of non-compliers, the randomness of non-compliance and the degree of non-compliance. These three factors were not considered simultaneously in previous studies. We generated a total of 60 scenarios by varying non-compliant factors and the magnitude of treatment effect (ie, null, moderate or large). The findings will help investigators choose the optimal approaches when dealing with similar non-compliant problems. Our results also confirm some previous findings. For example, the ITT analysis was unbiased if the treatment effect was zero.9 All estimates were unbiased if non-compliance was independent of patients’ outcomes and the IV estimate was also unbiased when non-compliance was symmetrically dependent on patients’ outcomes.17 In addition, we found that the PP estimate was unbiased when there were only never-takers. While the real impact of non-compliance on estimating treatment effect is difficult to generalise, we have compared the performance of common analyses under specific non-compliant scenarios. The results highlight the value of employing multiple approaches to analyse non-compliant data. Our work has considered additional non-compliant scenarios that were not considered by previous studies. It also contributes to the quality assessment of research evidence generated from RCTs subject to non-compliance and provides a basis for a more complex evaluation.

Conclusion

Our simulation shows that the ITT analysis under non-compliance is considerably biased when an intervention has a large effect over the control. Alternative analyses can provide unbiased or less biased estimates. For RCTs subject to non-compliance, we make some suggestions for the choice of analyses under specific scenarios to minimise the bias of estimated treatment effect. Our study also informs the design of further investigations on the issue of non-compliance in RCTs.

References

Footnotes

  • Contributors CY conceived the study, designed and performed the simulations, conducted the statistical analyses, interpreted the results and drafted the manuscript; LT advised on the design of the study and revised the manuscript; JB and GB contributed to the interpretation of the results and revision of the manuscript; all authors have read and approved the final manuscript.

  • Funding This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. CY is supported in part by funding from the Father Sean O'Sullivan Research Center (FSORC) Studentship award, the Canadian Institute of Health Research (CIHR) Training Award in Bridging Scientific Domains for Drug Safety and Effectiveness, and the Canadian Network and Centre for Trials Internationally (CANNeCTIN) programme.

  • Competing interests The funding organisations have no influence on the submitted work. All authors declare no competing interests.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data are available.