Abstract
Objective Randomised controlled trials (RCTs) are often considered as the gold standard for assessing new health interventions. Patients are randomly assigned to receive an intervention or control. The effect of the intervention can be estimated by comparing outcomes between groups, whose prognostic factors are expected to balance by randomisation. However, patients’ noncompliance with their assigned treatment will undermine randomisation and potentially bias the estimate of treatment effect. Through simulation, we aim to compare common approaches in analysing noncompliant data under different noncompliant scenarios.
Settings Based on a real study, we simulated hypothetical trials by varying three noncompliant factors: the type, randomness and degree of noncompliance. We compared the intention-to-treat (ITT), as-treated (AT), per-protocol (PP), instrumental variable (IV) and complier average causal effect (CACE) analyses to estimate large (50% improvement over the control), moderate (25% improvement) and null (same as the control) treatment effects. Different approaches were compared by the bias of estimate, mean square error (MSE) and 95% coverage of the true value.
Results For a large or moderate treatment effect, the ITT estimate was considerably biased in all scenarios. The AT, PP, IV and CACE estimates were unbiased when noncompliant behaviours were random. The IV estimate was unbiased when noncompliant behaviours were symmetrically dependent on patients’ conditions. The PP estimate was mostly unbiased when patients in the control group did not have access to the intervention. When the intervention was not different from the control, the ITT was less biased than the other approaches. Similar results were found when comparing the MSE and 95% coverage.
Conclusions The standard ITT analysis under noncompliance is biased when the intervention has a moderate or large effect. Alternative analyses can provide unbiased or less biased estimates. Based on the results, we make some suggestions on choosing optimal approaches for analysing specific noncompliant scenarios.
Strengths and limitations of this study

We compared different methods to analyse noncompliant data by simulating hypothetical randomised controlled trials.

Different noncompliant scenarios were generated by three factors: the type, randomness and degree of noncompliance.

The simulation framework and parameters were built on a real study.

Patients’ prognostic factors and missing data due to withdrawal were not considered in the simulation.
Background
Randomised controlled trials (RCTs) are often considered the gold standard for assessing new health interventions, where patients are randomly assigned to receive an intervention or control (eg, placebo). Since patients’ prognostic factors are expected to balance by randomisation, patients’ outcomes can be directly compared between groups to infer the effect of a treatment. In many cases, patients may not fully comply with their assigned treatment according to the protocol. Such protocol violation compromises the ‘fair’ comparison, which is protected by randomisation, and will potentially bias the estimate of treatment effect. Analysing RCTs subject to noncompliance can be challenging. While different analyses have been proposed to deal with noncompliance, the bias of treatment effect estimate is rarely compared among different approaches. Result interpretations also vary depending on the nature of noncompliance and the objective of a trial. Some RCTs, known as pragmatic trials,1–3 are primarily designed to guide clinical practice. Their goal is often to assess whether an intervention will work in routine practice. In contrast, non-pragmatic trials usually focus on the biological efficacy of an intervention. Regardless of the objective, an analysis that provides an unbiased or less biased estimate of treatment effect is always desirable. In this study, we compare common approaches to analyse noncompliant data in RCTs. The results will provide useful knowledge in choosing optimal methods for different noncompliant scenarios.
This study was motivated by an RCT that compared the integrated care organised through the Children's Treatment Network (CTN) with the usual care directed by parents for managing children with special healthcare needs.4 The CTN coordinated community resources to deliver comprehensive health services for the target children and their families. The hypothesis was that the target children's health outcomes would improve by receiving integrated, proactive and necessary services tailored for them. While the use of RCTs in assessing CTN-like interventions has been promising, noncompliant rates are generally high in these trials. This is largely due to the complexity of implementing multidisciplinary interventions in real-life settings.
Intention to treat (ITT), as treated (AT) and per protocol (PP) are common approaches to analyse noncompliant data in RCTs. The ITT analysis is considered the gold standard5 but can be problematic for some scenarios.6 Multiple methods are often recommended for analysing RCTs with substantial noncompliance.7–9 A literature review10 randomly selected 100 RCTs published in high-impact journals in 2008. Of 98 RCTs which reported noncompliance, 46 employed variations of PP analyses in addition to an ITT analysis. Another class of methods to deal with noncompliance includes instrumental variable (IV) and complier average causal effect (CACE) approaches.11–15 A conceptual difference among all these methods is that the ITT, IV and CACE approaches estimate treatment effects by preserving randomisation or accounting for potential confounding, but the AT and PP approaches do not. There are also other proposed approaches to correct for noncompliance in RCTs,16 for example, G-estimation and inverse probability weighted estimators. However, these methods have not been widely adopted. Therefore, we only included the ITT, AT, PP, IV and CACE methods in our comparison.
Very few studies have compared these five methods on the bias of estimating treatment effects. Bang and Davis17 had compared ITT, AT, PP and IV methods. They showed that ITT and IV analyses were biased in certain noncompliant cases. However, the authors did not include CACE analysis in their comparison and did not consider the situation where there was no crossover between treatment groups. This scenario is common when a new intervention is only accessible to patients who are offered it. In another study, McNamee18 compared ITT, AT, PP and IV analyses and concluded that an ITT analysis was not always biased towards the null while AT and PP analyses were generally biased. Sheng and Kim19 investigated the effect of noncompliance on ITT analysis of equivalence trials and showed that noncompliance did not always favour the null hypothesis, that is, no difference between treatment groups. Hertogh et al20 concluded that the IV method could give insight into confounding by noncompliance in RCTs.
Most of the previous studies did not consider different associations of noncompliant behaviours with patients’ conditions. For example, patients with certain characteristics may always reject a new intervention. Also, there are partial noncompliant cases where patients receive parts of the intervention even if they did not fully comply with the protocol. In our simulation study, we considered additional noncompliant scenarios that were not considered by previous studies and compared the five common methods by the bias of estimate, the mean square error (MSE) and 95% coverage of the true value. Our objectives were to compare the performance of different approaches in analysing noncompliant RCT data and make recommendations on optimal approaches under specific scenarios.
Methods
Simulation framework
In the CTN trial, over 50% of the children randomised in the CTN group did not fully comply with the intervention for various reasons. Primary and sensitivity analyses showed that the effect of the CTN was not significant but the estimates varied in direction, magnitude and precision.4 This observation prompted us to further investigate the impact of noncompliance on estimating treatment effects.
On the basis of the CTN setting, we simulated hypothetical RCTs where patients were randomly assigned to the intervention or usual care by a 1:1 allocation ratio. The parameters for generating hypothetical patients were estimated from the CTN trial. We simulated different noncompliant scenarios by varying three factors: (1) the type of noncompliers, (2) the randomness of noncompliance and (3) the degree of noncompliance. Our simulation framework is shown in figure 1. The design, conduct and reporting of this study has followed the guideline of designing and reporting simulation studies.21
Type of noncompliers
We considered two types of noncompliers: never-takers and always-takers.22 Never-takers are patients who will always reject a new intervention if they are offered it. Always-takers will always receive a new intervention even if they are not offered it. Two scenarios were considered. In one scenario, we assumed that noncompliers were either never-takers or always-takers, which mimicked the situation where patients were able to get the intervention elsewhere even if they were not offered it. In the other scenario, we assumed that noncompliers were only never-takers, which mimicked the situation where the intervention was only accessible to patients who were offered it. In addition, we assumed that the intervention and usual care were the only treatment options.
Randomness of noncompliance
Noncompliant behaviours could be random or dependent on patients’ conditions. In particular, we considered six scenarios of dependent noncompliant behaviours that were studied by McNamee18:

A. Patients with good conditions would always get the intervention while patients with poor conditions would always reject it;

B. Patients with good conditions would always get the intervention;

C. Patients with poor conditions would always reject the intervention;

D. Patients with good conditions would always reject the intervention while patients with poor conditions would always get it;

E. Patients with good conditions would always reject the intervention;

F. Patients with poor conditions would always get the intervention.
Patients’ conditions were considered to be positively associated with their outcomes under usual care. We assumed that good condition represented an outcome score of at least 0.5 SD above the group mean under usual care (assuming that a high score was a better outcome). Poor condition represented an outcome score of at least 0.5 SD below the group mean under usual care. When there were no always-takers, only scenarios C and E were considered.
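The six dependent scenarios can be sketched as a lookup of forced behaviours. This is a minimal illustration, assuming the scenarios are labelled A to F in the order listed, that patients not covered by a rule simply comply with their assignment, and that noncompliance is all-or-none; the names `RULES` and `received_treatment` are hypothetical.

```python
import numpy as np

# (behaviour forced on good-condition patients, behaviour forced on poor-condition patients)
# 1 = always gets the intervention, 0 = always rejects it, None = complies with assignment
RULES = {
    "A": (1, 0),
    "B": (1, None),
    "C": (None, 0),
    "D": (0, 1),
    "E": (0, None),
    "F": (None, 1),
}

def received_treatment(z, good, poor, scenario):
    """Return the treatment actually received (1 = intervention, 0 = usual care)."""
    forced_good, forced_poor = RULES[scenario]
    d = np.asarray(z).copy()          # default: everyone complies with assignment
    if forced_good is not None:
        d[np.asarray(good)] = forced_good
    if forced_poor is not None:
        d[np.asarray(poor)] = forced_poor
    return d

z = np.array([1, 0, 1, 0, 1, 0])
good = np.array([True, True, False, False, False, False])
poor = np.array([False, False, True, True, False, False])
print(received_treatment(z, good, poor, "A"))
```

Scenarios C and E only ever force a rejection, which is why they remain meaningful when the intervention is unavailable outside the trial.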
Degree of noncompliance
Degree of noncompliance referred to the proportion of interventional components that a patient did not receive according to the protocol. The simplest case was all-or-none, where compliers received 100% of the intervention and noncompliers received none of it. For multifaceted interventions, patients were likely to receive some components of the intervention even if they did not fully comply with the protocol. In addition, patients might only receive parts of the intervention because of the intervention fidelity. For example, a systematic review showed that many interventions of integrated care did not actually deliver all services as planned due to complexity of implementation.23 Conversely, we also use the term ‘degree of compliance’ for the proportion of interventional components that a patient received according to the protocol. Both terms, noncompliance and compliance, are used throughout this paper.
All-or-none and partial noncompliance were considered in our simulation. For the all-or-none case, we considered two compliance levels: all components (d=1) or none (d=0) of the intervention. A study reported that the noncompliance rate could be as high as 30–40% for a treated population.24 Therefore, we randomly selected 30% of patients to receive the treatment opposite to the one they were assigned. For partial noncompliance, we considered four compliance levels: none (d=0), one-third (d=1/3), two-thirds (d=2/3) or all (d=1) components of the intervention. These four levels have been studied in a previous simulation study.17
Simulation procedures
We employed a modified simulation model from the previous study.17 Let Y_{1} and Y_{0} be a pair of counterfactual outcomes for a patient if he or she were in the intervention and the usual care groups, respectively. In practice, we can only observe one of the counterfactual outcomes because we can never observe both outcomes for any patient at the same time. Thus, causal inference is often made at population level instead of patient level. By adopting a marginal view, we define the causal effect for the treatment of interest (δ) as
δ = μ_{1} − μ_{0},
where μ_{1} and μ_{0} were the means of Y_{1} and Y_{0}, respectively.
We chose µ_{0}=59 to be the effect of usual care. The effect of usual care was estimated from the CTN trial with an SD of 10. For the effect of treatment (µ_{1}), we chose three different cases: µ_{1}=89 for a 50% improvement over the usual care; µ_{1}=74 for a 25% improvement over the usual care and µ_{1}=59 for no difference from the usual care. Each case was simulated separately. We then generated individual patient’s counterfactual outcomes through a normal distribution:
Thus, good condition was defined for a patient with Y_{0}>64 (half an SD above the group mean under usual care) and poor condition was defined for a patient with Y_{0}<54 (half an SD below the group mean under usual care). A group indicator Z (1=intervention and 0=usual care) was generated for each patient from a Bernoulli distribution with equal probability of 0.5 of being assigned to either group. The observed outcome for patient i was calculated by
Y_{i} = d_{i}Y_{1i} + (1 − d_{i})Y_{0i},
where d_{i} was the degree of treatment compliance with the protocol for patient i. For the all-or-none case, d_{i} was either 1 or 0. For partial compliance, d_{i} took a value of 0, 1/3, 2/3 or 1.
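The data-generating steps above can be sketched for the random all-or-none case as follows. This is only an illustration under stated assumptions: Y_{1} is taken to be Y_{0} shifted by the true effect (the exact distribution of Y_{1} is not recoverable from the text), the observed outcome is taken as d·Y_{1} + (1 − d)·Y_{0} in line with the assumption that the effect is proportional to the degree of compliance, and the function name `simulate_trial` and the seed are hypothetical.

```python
import numpy as np

def simulate_trial(n=500, mu0=59.0, mu1=89.0, sd=10.0, p_noncomply=0.3, seed=0):
    """Simulate one hypothetical trial with random all-or-none noncompliance."""
    rng = np.random.default_rng(seed)
    y0 = rng.normal(mu0, sd, size=n)      # counterfactual outcome under usual care
    y1 = y0 + (mu1 - mu0)                 # ASSUMPTION: constant shift by the true effect
    z = rng.binomial(1, 0.5, size=n)      # 1:1 randomisation indicator
    d = z.astype(float)                   # start from full compliance
    n_switch = int(round(p_noncomply * n))
    switch = rng.choice(n, size=n_switch, replace=False)
    d[switch] = 1.0 - d[switch]           # selected patients get the opposite treatment
    y = d * y1 + (1.0 - d) * y0           # observed outcome
    good = y0 > mu0 + 0.5 * sd            # good condition: Y0 > 64
    poor = y0 < mu0 - 0.5 * sd            # poor condition: Y0 < 54
    return {"z": z, "d": d, "y": y, "y0": y0, "y1": y1, "good": good, "poor": poor}

trial = simulate_trial()
```

Swapping the random selection for a rule based on `good` and `poor` would give the dependent scenarios instead.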
In the CTN trial, 450 patients were needed to detect a minimum clinically important difference (MCID) of 15 with 80% statistical power and 5% ɑ. Using the same MCID, we chose to simulate 500 participants in each hypothetical trial. We estimated the SE of the treatment effect to be 1.53 from the CTN trial. Based on this estimate, at least 816 simulations were needed to produce an effect estimate within 1% accuracy of the MCID by the standard formula.21 To have sufficient power, we chose to generate 1000 simulations per scenario. The steps of simulation are shown in figure 2.
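As a worked check of the number of simulations, the sketch below reproduces the reported figure of 816 under one possible reading of the formula: the squared ratio of (z_{α/2} + z_{β})·SE to the required accuracy, with the quantiles rounded to 1.96 and 0.84. The inclusion of the power quantile is an assumption made here because it matches the reported number; the text does not state the formula explicitly, and `simulations_needed` is a hypothetical name.

```python
import math

def simulations_needed(se, accuracy, z_alpha=1.96, z_beta=0.84):
    """Number of simulations for a given SE and required accuracy (assumed formula)."""
    return math.ceil(((z_alpha + z_beta) * se / accuracy) ** 2)

mcid = 15
n_sim = simulations_needed(se=1.53, accuracy=0.01 * mcid)  # accuracy = 1% of the MCID
print(n_sim)  # 816
```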
Statistical analysis
This section describes the different methods that we compared. The estimated treatment effect was expressed as the difference of the mean score between groups. The methods were compared by the bias, MSE and 95% coverage.21 Bias is defined as
Bias = \bar{\hat{δ}} − δ,
where \bar{\hat{δ}} is the average estimate of interest over all iterations and δ is the true value. MSE is calculated by
MSE = (\bar{\hat{δ}} − δ)^{2} + [SD(\hat{δ})]^{2},
where SD(\hat{δ}) is the SD of the empirical distribution of the estimates from all iterations. The 95% coverage is the proportion of iterations in which the 95% CI includes δ. CIs were calculated by normal approximation.
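The three performance measures can be computed from the per-iteration estimates and their SEs. The following is a minimal sketch, assuming normal-approximation CIs with a critical value of 1.96 and the sample SD (n − 1 denominator) for the empirical distribution; the function name `performance` is hypothetical.

```python
import numpy as np

def performance(estimates, std_errors, true_delta, z_crit=1.96):
    """Return (bias, MSE, coverage) over all simulation iterations."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    bias = est.mean() - true_delta                 # average estimate minus true value
    mse = bias ** 2 + est.std(ddof=1) ** 2         # squared bias plus empirical variance
    lower = est - z_crit * se                      # normal-approximation 95% CI
    upper = est + z_crit * se
    coverage = float(np.mean((lower <= true_delta) & (true_delta <= upper)))
    return bias, mse, coverage

bias, mse, coverage = performance([29.0, 31.0], [1.0, 1.0], true_delta=30.0)
```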
Intention to treat
In the ITT approach, patients are analysed by how they were randomised regardless of their actual compliance with treatment. The treatment effect was estimated by
\hat{δ}_{ITT} = \bar{Y}_{Z=1} − \bar{Y}_{Z=0},
where \bar{Y}_{Z=1} and \bar{Y}_{Z=0} were the mean outcome scores of the intervention and usual care groups, respectively.
As treated
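The AT approach compares patients by the treatment they actually received. The treatment effect was estimated by
\hat{β}_{AT} = \bar{Y}_{d=1} − \bar{Y}_{d=0},
where \bar{Y}_{d=1} and \bar{Y}_{d=0} were the mean outcome scores of patients who actually received the intervention and usual care, respectively.
For patients with partial compliance, the treatment effect was estimated by regressing the outcome on the degree of compliance (d) in a linear regression model. Also, a different notation, \hat{β}, was used to differentiate from the estimators of making causal inferences.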
Per protocol
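The PP approach excludes patients who did not fully comply with treatment protocol. The treatment effect was estimated by
\hat{β}_{PP} = \bar{Y}_{Z=1,d=1} − \bar{Y}_{Z=0,d=0},
where the two means were taken over patients in the intervention and usual care groups who fully complied with their assigned treatment.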
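For the all-or-none case, the ITT, AT and PP comparisons differ only in which patients enter each group mean. A minimal sketch, assuming `y`, `z` and `d` are arrays of observed outcomes, randomised group and treatment actually received (function names are hypothetical):

```python
import numpy as np

def itt(y, z, d):
    """Compare groups as randomised, ignoring the treatment received."""
    return y[z == 1].mean() - y[z == 0].mean()

def as_treated(y, z, d):
    """Compare groups by the treatment actually received."""
    return y[d == 1].mean() - y[d == 0].mean()

def per_protocol(y, z, d):
    """Compare only patients who fully complied with their assignment."""
    return y[(z == 1) & (d == 1)].mean() - y[(z == 0) & (d == 0)].mean()

y = np.array([10.0, 20.0, 30.0, 40.0])
z = np.array([1, 1, 0, 0])
d = np.array([1, 0, 0, 1])   # second patient is a never-taker, fourth an always-taker
print(itt(y, z, d), as_treated(y, z, d), per_protocol(y, z, d))
```

Only `itt` keeps the groups exactly as randomised; the other two select on post-randomisation behaviour.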
Instrumental variable
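The IV approach employs the randomisation indicator (Z) as an IV to adjust for the proportion of noncompliant patients. The theory and assumptions of IV analysis have been thoroughly discussed in the literature.12 ,13 We used the standard IV estimator for linear models:
\hat{δ}_{IV} = (\bar{Y}_{Z=1} − \bar{Y}_{Z=0}) / (\bar{d}_{Z=1} − \bar{d}_{Z=0}),
where \bar{Y} and \bar{d} were the mean outcome score and the mean degree of compliance in each randomised group.
Fieller's theorem was used to calculate the SE of the estimate.17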
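The standard IV estimator for a binary instrument is a ratio: the ITT difference in outcomes divided by the between-group difference in treatment received. The sketch below is illustrative only; it returns NaN when the denominator is zero and does not implement the Fieller SE. The function name `iv_wald` is hypothetical.

```python
import numpy as np

def iv_wald(y, z, d):
    """Ratio (Wald-type) IV estimate using randomisation z as the instrument."""
    num = y[z == 1].mean() - y[z == 0].mean()   # ITT effect on the outcome
    den = d[z == 1].mean() - d[z == 0].mean()   # ITT effect on treatment received
    if np.isclose(den, 0.0):
        return float("nan")                      # estimate is undefined
    return num / den

y = np.array([10.0, 20.0, 30.0, 40.0])
z = np.array([1, 1, 0, 0])
print(iv_wald(y, z, np.array([1.0, 1.0, 0.0, 1.0])))   # denominator 0.5
print(iv_wald(y, z, np.array([1.0, 0.0, 0.0, 1.0])))   # denominator 0, undefined
```

A denominator close to zero inflates the estimate and its variance, which is one reason a ratio estimator of this kind can show a large MSE.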
Complier average causal effect
The CACE method estimates the treatment effect among compliers. The assumptions and the causal framework of CACE have been discussed elsewhere.14 ,15 ,25 The treatment effect was estimated by
There are two general approaches to the CACE inference25: the maximum likelihood approach by expectation-maximisation (EM) algorithm and the Bayesian approach. The CACE analysis in this paper was conducted in Mplus V.7 (Muthén & Muthén, Los Angeles, CA; Mac OS X 10.6.8), which employed the EM algorithm. The rest of the analyses and simulations were performed in R V.2.15.2.
Cutoff points for noncompliance
In practice, investigators often dichotomise patients to be either compliers or noncompliers. A cutoff of 80% is commonly used.9 ,10 Patients are considered to be compliers if they have complied with at least 80% of the intervention according to protocol. A cutoff of 100% has also been used such that patients are considered to be compliers only if they have complied with the entire treatment protocol. Compliers are expected to receive the full effect of an intervention. We conducted a sensitivity analysis to investigate the impact of these two cutoffs on dichotomising compliers. A new compliance indicator for patient i was defined as: t_{i}=I(d_{i}≥0.8) for a cutoff of 80% and t_{i}=I(d_{i}=1) for a cutoff of 100%. The indicator function I returned 1 if the condition was satisfied and 0 otherwise. We then performed the same analysis by replacing d_{i} with t_{i} for patient i.
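The compliance indicator amounts to a threshold on the degree of compliance. A small sketch, using made-up compliance values chosen only to show where the two cutoffs disagree (the function name `dichotomise` is hypothetical):

```python
import numpy as np

def dichotomise(d, cutoff):
    """Return t_i = I(d_i >= cutoff) as an integer array."""
    d = np.asarray(d, dtype=float)
    return (d >= cutoff).astype(int)

d = np.array([0.5, 0.85, 1.0])   # hypothetical degrees of compliance
t80 = dichotomise(d, 0.8)        # cutoff of 80%
t100 = dichotomise(d, 1.0)       # cutoff of 100%, equivalent to I(d_i = 1) since d_i <= 1
print(t80, t100)
```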
Results
The estimates by different analyses under the simulated scenarios, and their bias, MSE and 95% coverage are summarised in tables 1 and 2. For a large treatment effect (a mean difference of 30, representing 50% improvement over the usual care), the ITT estimate was considerably biased. The other estimates were unbiased when noncompliant behaviours were random. When there were always-takers and never-takers, the PP and CACE estimates were the least biased if never-taking behaviours were dependent on patients’ conditions (scenarios C and E). The IV estimate was the least biased if noncompliant behaviours were symmetrically dependent on patients’ conditions (scenarios A and D). When there were only never-takers, the PP estimates were mostly unbiased. Similar results were found for a moderate treatment effect (details not shown). When an intervention was not different from the usual care (details not shown), all estimates were unbiased when noncompliant behaviours were random. The ITT estimate was also unbiased if noncompliant behaviours were symmetrically dependent on patients’ conditions. For the remaining scenarios, the ITT estimate was the least biased and the other estimates were biased to varying degrees.
The IV estimate generally had a larger MSE than the other estimates. That was because the standard IV estimator was sensitive to noncompliant rates. For example, when the noncompliant rate was equal between groups, the denominator was zero and the estimate became undefined. When never-takers and always-takers were considered, the ITT and IV approaches generally had a better 95% coverage than the other approaches. When there were only never-takers, the PP approach had the best 95% coverage. For a large treatment effect, the ITT approach had zero 95% coverage. Overall, the results from comparing the MSE and 95% coverage were consistent with those from comparing the bias of estimates.
In the sensitivity analysis, we compared the impact of using a cutoff of 80% or 100% to dichotomise compliant patients. The results showed that dichotomising patients by a cutoff of 80% resulted in less biased estimates than dichotomising patients by a cutoff of 100%. For a null treatment effect, the treatment estimates obtained by applying a cutoff of 80% were less biased than those obtained by directly analysing patients on the observed degree of compliance.
Discussion
Through simulation, we compared different methods of analysing noncompliant RCT data. Our results showed that the ITT approach was optimal when estimating a null effect since it provided an unbiased or the least biased estimate in different scenarios. This result was consistent with the general opinion that the ITT estimate is conservative towards the null. However, for the case of a large or moderate treatment effect, the ITT approach was much more biased than the other approaches. When patients’ noncompliant behaviours were purely random, the AT, PP, IV and CACE approaches all provided unbiased estimates. For other noncompliant scenarios that we considered, the choice of optimal method varied. Figure 3 summarises the choices of methods under different scenarios to produce an unbiased or less biased estimate.
Although the ITT method is the most commonly reported analysis, other analyses of noncompliant data may provide a better estimate. Thus, understanding the extent of bias for different analyses is important when choosing an optimal approach and interpreting the results. Our results are limited by a number of factors. First, we did not consider specific prognostic factors in the simulation. Adjusting for prognostic factors may improve the estimation of treatment effect. However, we did consider different associations between patients’ outcomes and noncompliant behaviours. Second, we assumed that the clinical effect of an intervention was proportional to the degree of compliance. This linear association might not represent all reallife situations. Third, we did not consider missing data in the simulation and assumed that noncompliers’ outcomes were still collectable. Alternatively, imputation techniques can be applied to handle missing data.26 Finally, we only simulated a subset of general noncompliant scenarios. Thus, our findings may not be generalisable to other scenarios.
Despite the limitations, our study has several strengths. The simulation framework was built on three key factors of noncompliance: the type of noncompliers, the randomness of noncompliance and the degree of noncompliance. These three factors were not considered simultaneously in previous studies. We generated a total of 60 scenarios by varying noncompliant factors and the magnitude of treatment effect (ie, null, moderate or large). The findings will help investigators choose the optimal approaches when dealing with similar noncompliant problems. Our results also confirm some previous findings. For example, the ITT analysis was unbiased if the treatment effect was zero.9 All estimates were unbiased if noncompliance was independent of patients’ outcomes and the IV estimate was also unbiased when noncompliance was symmetrically dependent on patients’ outcomes.17 In addition, we found that the PP estimate was mostly unbiased when there were only never-takers. While the real impact of noncompliance on estimating treatment effect is difficult to generalise, we have compared the performance of common analyses under specific noncompliant scenarios. The results highlight the value of employing multiple approaches to analyse noncompliant data. Our work has considered additional noncompliant scenarios that were not considered by previous studies. It also contributes to the quality assessment of research evidence generated from RCTs subject to noncompliance and provides a basis for a more complex evaluation.
Conclusion
Our simulation shows that the ITT analysis under noncompliance is considerably biased when an intervention has a large effect over the control. Alternative analyses can provide unbiased or less biased estimates. For RCTs subject to noncompliance, we make some suggestions for the choice of analyses under specific scenarios to minimise the bias of estimated treatment effect. Our study also informs the design of further investigations on the issue of noncompliance in RCTs.
References
Footnotes

Contributors CY conceived the study, designed and performed the simulations, conducted the statistical analyses, interpreted the results and drafted the manuscript; LT advised on the design of the study and revised the manuscript; JB and GB contributed to the interpretation of the results and revision of the manuscript; all authors have read and approved the final manuscript.

Funding This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. CY is supported in part by funding from the Father Sean O'Sullivan Research Center (FSORC) Studentship award, the Canadian Institute of Health Research (CIHR) Training Award in Bridging Scientific Domains for Drug Safety and Effectiveness, and the Canadian Network and Centre for Trials Internationally (CANNeCTIN) programme.

Competing interests The funding organisations have no influence on the submitted work. All authors declare no competing interests.

Provenance and peer review Not commissioned; externally peer reviewed.

Data sharing statement No additional data are available.