## Abstract

**Objective** This article reviews and compares four commonly used approaches to assess patient responsiveness to a treatment or therapy (return to normal (RTN), minimal important difference (MID), minimal clinically important improvement (MCII), OMERACT-OARSI [Outcome Measures in Rheumatology-Osteoarthritis Research Society International] (OO)) and demonstrates how each of the methods can be formulated in a multilevel modelling (MLM) framework.

**Design** Cohort study.

**Setting** A cohort of patients undergoing total hip and knee replacement were recruited from a single UK National Health Service hospital.

**Population** 400 patients from the Arthroplasty Pain Experience cohort study undergoing total hip (n=210) and knee (n=190) replacement who completed the Intermittent and Constant Osteoarthritis Pain questionnaire prior to surgery and then at 3, 6 and 12 months after surgery.

**Primary outcomes** The primary outcome was defined as a response to treatment following total hip or knee replacement. We compared baseline scores, change scores and proportion of individuals defined as ‘responders’ using traditional and MLM approaches to patient responsiveness.

**Results** Using existing approaches, baseline and change scores are underestimated, and the variance of baseline and change scores overestimated in comparison with MLM approaches. MLM increases the proportion of individuals defined as responding in RTN, MID and OO criteria compared with existing approaches. Using MLM with the MCII criteria reduces the number of individuals identified as responders.

**Conclusion** MLM improves the estimation of the SD of baseline and change scores by explicitly incorporating measurement error into the model and avoiding regression to the mean when making individual predictions. Using refined definitions of responsiveness may lead to a reduction in misclassification when attempting to predict who does and does not respond to an intervention and clarifies the similarities between existing methods.

- Patient Responsiveness
- Multi-level Modelling
- Return To Normal
- Minimal Important Difference
- Patient-reported outcomes
- Minimal clinically important improvement

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/

### Strengths and limitations of this study

Four different approaches to patient responsiveness can be unified into a multilevel model.

A multilevel model framework of patient responsiveness highlights the similarities and differences between existing methods.

Multilevel models provide a simple framework which incorporates measurement error and non-linear change in trajectories of patient recovery.

Multilevel models are technically more demanding than existing formulations of patient responsiveness, and convergence is not guaranteed.

Multilevel models do not improve the arbitrary placement of the thresholds that define responsiveness in comparison with existing methods.

## Introduction

Joint replacement is an increasingly common elective procedure worldwide1–3 and improving patient-reported outcomes after joint replacement is a key research priority due to the high prevalence of poor outcomes after joint arthroplasty.4 Poor outcomes include continuing pain, functional limitations5 and increased healthcare utilisation.6 However, there is some debate on how the efficacy of interventions can be judged due to the variety of different outcomes used in orthopaedic research.7–18 Traditionally, objective primary outcomes such as prosthetic survivorship and mortality rates were used.19 However, more recently there has been a shift in focus which ensures that patients’ perspective is central to the assessment of intervention success.20 Many studies now use patient-reported outcome measures (PROMs) as endpoints, and these tools can assess a variety of health outcomes, including pain,7 21 physical functioning,7 mental well-being22 and health-related quality of life.23

Although PROMs are widely used,4 there is still debate about how the results should be interpreted and how to define a clinically meaningful change.24–35 From a measurement perspective, the ability to estimate if a change has occurred depends on the application of an appropriate statistical model. From a clinical perspective, some authors suggest that the average statistical change is insufficient to ‘tell you anything about an individual’s chances of improving’.36 Therefore, the utility of simple statistical analyses is limited when attempting to help patients weigh up the risks and benefits of undergoing surgery.

To supplement simple statistical analysis, many researchers attempt to dichotomise the population into those who have or have not responded to an intervention, creating a two-stage process of defining an outcome. There are a number of different methods (definitions) that can be used to dichotomise the population, and these secondary analyses are collectively referred to as responsiveness analyses.36 Four substantively different methods of estimating the proportion of individuals who respond to an intervention have been previously identified in orthopaedic research36: (1) return to normal (RTN), (2) distribution-based minimally important difference (MID), (3) anchor-based minimal clinically important improvement (MCII) and (4) the OMERACT-OARSI (OO) responder criteria. The first three approaches are generic and used in many fields of health research, whereas the fourth approach is specific to orthopaedic research, but in principle could be used in many fields of health research.

Each of these approaches is often thought to be methodologically distinct. However, all of the methods can be shown to be special cases of a multilevel model (MLM). MLMs have been used in a wide variety of contexts ranging from growth modelling to modelling educational data. One of the principal reasons to use an MLM is to take advantage of the direct estimation of different variance components37 and to provide efficient and unbiased estimates of fixed and random effects.38

Although there are a number of extensive reviews of patient responsiveness,31 33 39 40 we describe these four approaches to calculating responsiveness and highlight the substantively different decisions each method makes. We then describe how each approach can be translated into an MLM framework, emphasising the benefits of the translation, and contrast the approaches using an example from the APEX (Arthroplasty Pain Experience) cohort study.41

## Methods

We outline the four existing approaches to patient responsiveness previously used in orthopaedic research36 and describe their potential limitations and how they can be formulated in an MLM framework.

### Review of existing approaches to responsiveness

Return to normal (RTN)26 suggests that an individual has returned to ‘normal’ if their score on a postintervention outcome is greater than 2 SDs from the mean baseline response.

The use of 2 SD appears to be justified on theoretical grounds; however, it is quite arbitrary. Assuming scores are normally distributed and measured without error, 2 SDs corresponds to a 95.5% prediction interval for the mean, which is similar to the equally arbitrary and much-criticised significance threshold p=0.05 (type I error=0.05) criterion used throughout medical research.42 43 However, there is no reason why a 1.6 or 2.6 SD cut-off, corresponding approximately to 90% and 99% prediction intervals, should not be used in preference.
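The link between an SD cut-off and the coverage of the corresponding interval can be checked numerically. The sketch below assumes normally distributed, error-free scores, as in the paragraph above; the function name `normal_coverage` is illustrative and not part of any cited method.

```python
import math


def normal_coverage(k: float) -> float:
    """Probability that a standard normal value lies within k SDs of the mean."""
    return math.erf(k / math.sqrt(2.0))


# Coverage implied by the cut-offs discussed in the text.
for k in (1.6, 2.0, 2.6):
    print(f"{k} SD cut-off -> {100 * normal_coverage(k):.1f}% interval")
```

Under these assumptions the 1.6, 2 and 2.6 SD cut-offs give intervals of roughly 89%, 95.5% and 99%, which shows that the choice of cut-off is simply a choice of coverage.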

The method also assumes the observed change is unlikely to be due to chance alone and does not account for any uncertainty. To alleviate this problem, the Reliable Change Index (RCI) was proposed for use in conjunction with the RTN classification.24 27 The RCI constructs a test of the individual’s score at follow-up compared with their baseline, where the SE of the difference is estimated indirectly using the SD of the baseline score and an assumed reliability coefficient, taken either from empirical research or from a range of reliability values in the spirit of a sensitivity analysis.
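As a rough illustration, the RCI can be sketched as follows. This assumes the commonly used form in which the SE of measurement is SD × √(1 − reliability) and the SE of the difference is √2 times that value; the scores, SD and reliability values are made up for the example.

```python
import math


def rci(baseline: float, follow_up: float, sd_baseline: float, reliability: float) -> float:
    """RCI: observed change divided by the SE of the difference."""
    se_measurement = sd_baseline * math.sqrt(1.0 - reliability)
    se_difference = math.sqrt(2.0) * se_measurement
    return (follow_up - baseline) / se_difference


# Sensitivity analysis over a range of assumed reliability coefficients.
for r in (0.7, 0.8, 0.9):
    print(r, round(rci(baseline=60, follow_up=30, sd_baseline=20, reliability=r), 2))
```

An absolute value above 1.96 would conventionally be read as a change beyond measurement error at the 5% level; lower assumed reliability makes the same observed change harder to distinguish from noise.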

A commonly described distribution-based minimally important difference (MID) method classifies individuals as responders if their observed change is greater than a fixed proportion of the SD of the presurgery score.30 There has been much debate about the exact size, or proportion, of the SD change score to use; however, 0.5 SDs have been reported widely and suggested to be a difference that is minimally perceptible to patients.30 Any individual with a change score greater than 0.5 SD of the baseline score is defined as responding to the treatment. Similar to the RTN criteria, the decision to use 0.5 is arbitrary and there is no reason why more or less stringent criteria of 0.25, 1 or 2 SDs could not be used. Additionally, there is no reason why a test such as the RCI should not be conducted to check that change is beyond the bounds of measurement error.
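The conventional MID classification described above amounts to a few lines of arithmetic. The sketch below assumes that lower scores are better, so improvement is baseline minus follow-up; the data and the function name `mid_responders` are illustrative only.

```python
from statistics import stdev


def mid_responders(baseline, follow_up, proportion=0.5):
    """Flag responders whose improvement exceeds proportion x baseline SD."""
    threshold = proportion * stdev(baseline)  # sample SD of presurgery scores
    # Improvement is baseline minus follow-up (assumes lower scores are better).
    improvement = [b - f for b, f in zip(baseline, follow_up)]
    return threshold, [change > threshold for change in improvement]


baseline = [50, 60, 70, 40, 80]   # made-up presurgery scores
follow_up = [45, 30, 65, 38, 20]  # made-up postsurgery scores
threshold, flags = mid_responders(baseline, follow_up)
print(round(threshold, 2), flags)
```

Changing `proportion` to 0.25, 1 or 2 reproduces the more or less stringent criteria mentioned in the text.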

Anchor-based minimal clinically important improvement (MCII) is similar to the MID approach, in that it defines an individual as a responder based on their individual change score. However, the cut point is determined in individuals who report themselves as having an outcome which is either good/satisfactory or perceived as improved from baseline using an external anchoring question. The authors proposed using a cut point at the 75th centile of the change score in those who are satisfied.34 Therefore any individual, whether satisfied or not, who has a change score greater than the 75th centile is defined as a responder. A closely related anchor-based metric is the patient acceptable symptom state (PASS).35 Its construction is similar to that of the MCII with the exception that it is based on the final score of patients as opposed to change. Conceptually, the PASS is more closely related to the RTN definition of responsiveness, and much of the criticism levied against MCII and RTN can therefore be applied to the PASS.
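A minimal sketch of the empirical MCII cut point follows. It takes the 75th centile of the change score directly from the satisfied patients, using one of several possible centile definitions (the "inclusive" method of Python's `statistics.quantiles`); the data and function names are hypothetical.

```python
from statistics import quantiles


def mcii_threshold(improvement, satisfied):
    """75th centile of the change score among satisfied patients."""
    satisfied_changes = [c for c, s in zip(improvement, satisfied) if s]
    # quantiles(..., n=4) returns the three quartile cut points.
    return quantiles(satisfied_changes, n=4, method="inclusive")[2]


def mcii_responders(improvement, satisfied):
    """Classify everyone, satisfied or not, against the MCII cut point."""
    cut = mcii_threshold(improvement, satisfied)
    return cut, [c > cut for c in improvement]


improvement = [5, 30, 5, 2, 60, 20, 10, 40]  # made-up change scores
satisfied = [False, True, False, False, True, True, True, True]
cut, flags = mcii_responders(improvement, satisfied)
print(cut, flags)
```

Because the cut point is the 75th centile of the satisfied group, roughly three-quarters of satisfied patients are by construction classified as non-responders under a strict "greater than" rule.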

The OMERACT-OARSI (OO) criteria32 recognise that a response to an intervention may occur in one or more different measured outcomes, that is, a multivariate response mechanism. In keeping with much of the orthopaedic literature, they assume the proposed score has been rescaled between 0 and 100,32 and that a responder is defined as any individual with (1) a ≥50% relative change or a ≥20-point absolute change on one or more response scales or (2) a ≥20% relative change or a ≥10-point absolute change in two or more response scales. Relative change is defined as the ratio of the change to the individual baseline score multiplied by 100. Unlike the RTN, MID or MCII, it is very clear that the thresholds for relative and absolute changes are based on a panel of expert opinions and are fixed.
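The rule as worded in the paragraph above can be written as a short function. This sketch assumes scores on a 0 to 100 scale where lower is better, and treats a zero baseline as having no relative change; both are assumptions made for illustration, as is the function name.

```python
def oo_responder(baselines, follow_ups):
    """Apply the OO rule as worded in the text to one patient's response scales."""
    strong = 0    # scales with >=50% relative or >=20-point absolute change
    moderate = 0  # scales with >=20% relative or >=10-point absolute change
    for b, f in zip(baselines, follow_ups):
        absolute = b - f  # improvement, assuming lower scores are better
        relative = 100.0 * absolute / b if b > 0 else 0.0
        if relative >= 50 or absolute >= 20:
            strong += 1
        if relative >= 20 or absolute >= 10:
            moderate += 1
    return strong >= 1 or moderate >= 2


print(oo_responder([60, 50], [35, 48]))  # large change on one scale
print(oo_responder([40, 40], [31, 33]))  # moderate change on one scale only
print(oo_responder([40, 40], [31, 31]))  # moderate change on both scales
```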

Despite the variety of existing approaches used to identify responders, there are a number of problems common to all methods. Common assumptions include: (1) each observed outcome is measured without error and reflects the true underlying patient’s response, although test–retest reliability studies indicate that this is not a realistic assumption44; (2) regression to the mean does not occur and therefore the variance of the change score will not be overestimated; (3) floor and ceiling effects do not bias estimates of the variance of the change score.45

Furthermore, in RTN, specific combinations of means and variances may result in a threshold beyond the range of the measurement tool, in which case no individuals would be defined as responding to a therapy. The MCII approach assumes the additional anchoring variable is measured without error and the response trajectory is distinct from those who are unsatisfied.46 The method also assumes a two-parameter logistic function is an appropriate model for the cumulative proportional rank of patients and change in outcome, and that there is no uncertainty in the calculation of the threshold.47 Finally, the OO approach considers a response in two or more outcomes. However, it does not explicitly describe how the correlation between the two outcomes is accounted for and fails to recognise that this correlation, if not modelled appropriately, may introduce bias.48–50

The four methods identified have a number of other limitations,25 but the methods are difficult to compare when presented as distinct approaches.

Embedding them in a unified statistical framework makes their underlying assumptions explicit, while highlighting their similarities and differences. In addition, it provides a framework to incorporate non-linear change, measurement error and variability in the timing of measurement occasions, all of which are to be expected in real-world data collections and are critical when attempting to assess a patient’s change at a specified point in time.

### MLM approach to responsiveness

We now present a general MLM for patient responsiveness and show how the four approaches described above can be specified as special cases.

Under the assumption of linear change, the measured response ($y_{ij}$) at the *i*th occasion for the *j*th individual is modelled as a linear function of time:

$$y_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\,t_{ij} + e_{ij} \tag{1}$$

where $t_{ij}$ is the time at which measurement $i$ was taken on individual $j$, coded as zero at baseline. $\beta_0$ is the baseline population average response and $u_{0j}$ represents the *j*th individual difference from the baseline response. The sum $\beta_0 + u_{0j}$ is the estimated individual baseline response. $\beta_1$ represents the population average change per unit increase in time and $u_{1j}$ represents the *j*th individual difference from the population average change per unit increase in time. The sum $\beta_1 + u_{1j}$ is the estimated individual average change per unit increase in time. Measurement error in the linear trajectory is represented by $e_{ij}$.

The variances of the individual deviations from the population average response at baseline ($u_{0j}$) and from the population average rate of change ($u_{1j}$) are $\sigma^2_{u0}$ and $\sigma^2_{u1}$, respectively. Furthermore, baseline measurements and rate of change can be assumed to be independent or correlated by constraining their covariance $\sigma_{u01}$ to be zero or allowing it to be freely estimated. The variances of the shrunken residuals $\hat{u}_{0j}$ and $\hat{u}_{1j}$, also known as empirical Bayes estimates, are typically less than the estimated population variances $\sigma^2_{u0}$ and $\sigma^2_{u1}$ as they shrink towards the population averages of $\beta_0$ and $\beta_1$. The extent of the shrinkage depends on the number of measurement occasions and the within-individual variability, with greater shrinkage as the number of measurement occasions decreases and as the within-individual variance increases. A more detailed discussion of MLM can be found in most advanced statistics textbooks.48 51 52
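The dependence of shrinkage on the number of occasions and the within-individual variance is easiest to see in the simpler random-intercept-only model, used here as an illustrative assumption (the model above also has random slopes). In that case the empirical Bayes estimate of an individual's deviation is the raw mean residual multiplied by the factor computed below; the variance values are made up.

```python
def shrinkage_factor(var_between: float, var_within: float, n_occasions: int) -> float:
    """Multiplier applied to an individual's raw mean residual (random-intercept model)."""
    return var_between / (var_between + var_within / n_occasions)


# Fewer occasions or more within-individual noise -> smaller factor -> more shrinkage.
print(shrinkage_factor(100.0, 100.0, 4))
print(shrinkage_factor(100.0, 100.0, 2))
print(shrinkage_factor(100.0, 400.0, 4))
```

A factor close to 1 means the individual estimate is left almost untouched, whereas a factor close to 0 pulls it almost entirely towards the population average.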

We now describe how the four traditional approaches to measuring patient responsiveness can be unified into an MLM framework. General benefits of the MLM over existing approaches include: (1) with more than three measurement occasions, an MLM directly allows for measurement error, with variance $\sigma^2_e$; (2) the use of the shrunken residuals $\hat{u}_{0j}$ and $\hat{u}_{1j}$ allows for regression to the mean when predicting an individual’s score53; (3) MLM can be extended to include multivariate response models which appropriately model the correlation between two or more outcomes and (4) MLM allows for variability in the timing of measurement occasions. Fundamentally, the MLM approach recognises that observed patient responses are subject to error, and therefore the true patient’s response following an intervention must be estimated.

#### MLM: return to normal

To apply the RTN criteria using an MLM approach, we first estimate the baseline population SD in individuals considered to be abnormal using the model described in equation 1. Assuming the baseline response is normally distributed with a population mean $\beta_0$ and variance $\sigma^2_{u0}$, a $100(1-\alpha)\%$ prediction interval for the baseline measurement can be constructed, that is, $\beta_0 \pm z_{1-\alpha/2}\,\sigma_{u0}$, where α is the type I error rate and z is the critical value from a standard normal distribution. Importantly, the baseline response is not assumed to be measured without error, and therefore estimates of $\sigma_{u0}$ are less likely to be biased than those from simple methods. However, it is important to note that the choice of α is entirely that of the researcher, and while α=0.05 (leading to $z_{1-\alpha/2}=1.96$) is common, more or less stringent criteria could be applied.
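The baseline prediction interval described above (population mean plus or minus a normal critical value times the baseline SD) can be sketched as follows. The mean and SD passed in are made-up numbers standing in for MLM estimates, and the function name is illustrative.

```python
from statistics import NormalDist


def baseline_prediction_interval(mean: float, sd: float, alpha: float = 0.05):
    """Two-sided (1 - alpha) prediction interval for a normally distributed baseline."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value, about 1.96 for alpha=0.05
    return mean - z * sd, mean + z * sd


lower, upper = baseline_prediction_interval(mean=60.0, sd=15.0)
print(round(lower, 1), round(upper, 1))
```

Passing a different `alpha` gives the more or less stringent criteria mentioned in the text. If either bound falls outside the 0 to 100 range of the score, no patient can cross it, which is the range problem noted earlier for RTN.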

The second step is to estimate the score of the *j*th individual at occasion *i* following surgery and determine if it is within the baseline prediction interval. This prediction is simply calculated by substituting estimates of $\beta_0$, $u_{0j}$, $\beta_1$ and $u_{1j}$ into equation 1, to give the empirical best linear unbiased prediction for the *j*th individual at the *i*th occasion.54
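The second step can be sketched as a plug-in calculation: the fixed effects and the individual's shrunken residuals are substituted into the linear model, and the predicted score is compared with the lower bound of the baseline interval. All numbers are hypothetical stand-ins for fitted values, and the comparison assumes lower scores are better.

```python
def predict_score(beta0: float, u0j: float, beta1: float, u1j: float, t: float) -> float:
    """Predicted score for one individual at time t under the linear model."""
    return (beta0 + u0j) + (beta1 + u1j) * t


def returned_to_normal(predicted: float, lower_bound: float) -> bool:
    """True if the prediction lies below the baseline interval (lower = better)."""
    return predicted < lower_bound


# Hypothetical estimates: population intercept/slope plus one patient's residuals.
predicted = predict_score(beta0=60.0, u0j=5.0, beta1=-10.0, u1j=-2.0, t=3.0)
print(predicted, returned_to_normal(predicted, lower_bound=30.6))
```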

Finally, to determine whether or not the response of the individual following surgery is greater than one would attribute to chance alone, that is, to test the null hypothesis that the *j*th individual's slope is equal to zero, a test statistic similar to the RCI should be constructed, for example $z_j = (\hat{\beta}_1 + \hat{u}_{1j}) / \mathrm{SE}(\hat{\beta}_1 + \hat{u}_{1j})$.

#### MLM: minimally important difference

The threshold of minimally important difference can also be estimated using an MLM. Similar to RTN, a linear model of change is applied, as in equation 1. Then the population SD of the baseline response is estimated by $\hat{\sigma}_{u0}$. By comparing the estimated change for the *j*th individual, $\hat{\beta}_1 + \hat{u}_{1j}$, with the baseline SD, that is, $\hat{\beta}_1 + \hat{u}_{1j} > 0.5\,\hat{\sigma}_{u0}$, the individual can be classed as a responder or not. The MID approach does not specifically state whether an individual’s change score should be tested against the MID threshold, but a test statistic is simply constructed as $z_j = \{(\hat{\beta}_1 + \hat{u}_{1j}) - 0.5\,\hat{\sigma}_{u0}\} / \mathrm{SE}(\hat{\beta}_1 + \hat{u}_{1j})$.

#### MLM: minimal clinically important improvement

The MLM MCII requires a simple extension of the univariate model presented previously (equation 1). The outcome of interest is stratified using an external criterion. The stratification is achieved by creating dummy variables for those who are unsatisfied/satisfied with some aspect of their treatment, for example, $z^{(S)}_j$ takes the values 0 and 1 representing unsatisfied and satisfied individuals, respectively, and $z^{(U)}_j = 1 - z^{(S)}_j$. These dummy variables are then included as additional explanatory variables, with no overall model intercept, and interacted with $t_{ij}$:

$$y_{ij} = z^{(S)}_j\left[(\beta^{(S)}_0 + u^{(S)}_{0j}) + (\beta^{(S)}_1 + u^{(S)}_{1j})\,t_{ij} + e^{(S)}_{ij}\right] + z^{(U)}_j\left[(\beta^{(U)}_0 + u^{(U)}_{0j}) + (\beta^{(U)}_1 + u^{(U)}_{1j})\,t_{ij} + e^{(U)}_{ij}\right] \tag{2}$$

Therefore, $\beta^{(S)}_0$ and $\beta^{(U)}_0$ are the mean population outcome scores at baseline for those who are satisfied and unsatisfied, respectively, and $\beta^{(S)}_1$ and $\beta^{(U)}_1$ are the corresponding mean population changes per unit of time. Variances and covariances are similarly interpreted for those who are satisfied and unsatisfied, respectively. However, satisfaction on the external anchoring question is assumed to be known without error, and individual effects and errors for the satisfied group are uncorrelated with those for the unsatisfied group because the satisfied and unsatisfied categories are mutually exclusive. Whether or not it is desirable to fit a model to both satisfied and unsatisfied individuals simultaneously is debatable, as only those who are satisfied contribute to the definition of MCII. However, we present a simultaneous modelling approach to satisfied and unsatisfied individuals as it makes the underlying modelling assumptions explicit. Furthermore, if the stratification on satisfaction status leads to small samples, alternative estimators and degrees of freedom can be used in an MLM framework to account for this, that is, restricted maximum likelihood, restricted generalised least squares or adjustments to the denominator df.55

Following the prediction of each individual’s trajectory, including those unsatisfied with treatment, the second stage in the MCII method requires a threshold for determining responsiveness. Using a similar suggestion to Tubach *et al*,35 the 75th centile of those who are satisfied could be used to classify all individuals as responding or not. Similar to the MID, there is no suggestion of whether a test against the null value of the 75th centile should be constructed, but this is easily done within the MLM framework.

#### MLM: OO criteria

The OO criteria can be similarly extended into a multivariate MLM framework by the inclusion of dummy variables and reshaping into a ‘double’ long format with both responses stored in a single vector. Figure 1 illustrates the data structure for a bivariate model.

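Dummy variables, also known as response indicators, are used to denote the response options: $d^{(1)}_{ij}$ is coded 1 for the first measurement outcome (pain) and 0 for the second outcome (function), and $d^{(2)}_{ij} = 1 - d^{(1)}_{ij}$. The response indicators and their interactions with $t_{ij}$ are included as explanatory variables to obtain the following bivariate response model:

$$y_{ij} = d^{(1)}_{ij}\left[(\beta^{(1)}_0 + u^{(1)}_{0j}) + (\beta^{(1)}_1 + u^{(1)}_{1j})\,t_{ij} + e^{(1)}_{ij}\right] + d^{(2)}_{ij}\left[(\beta^{(2)}_0 + u^{(2)}_{0j}) + (\beta^{(2)}_1 + u^{(2)}_{1j})\,t_{ij} + e^{(2)}_{ij}\right] \tag{3}$$

With a similar functional form to the univariate MLM, there are separate population and individual intercepts for the first and second outcome ($\beta^{(1)}_0 + u^{(1)}_{0j}$ and $\beta^{(2)}_0 + u^{(2)}_{0j}$, respectively), and separate population and individual slopes are estimated for each outcome ($\beta^{(1)}_1 + u^{(1)}_{1j}$ and $\beta^{(2)}_1 + u^{(2)}_{1j}$). Using an MLM approach, the outcomes are modelled jointly, which allows for non-zero covariances between the intercepts and slopes of the two responses (for example, $\operatorname{cov}(u^{(1)}_{0j}, u^{(2)}_{0j})$ and $\operatorname{cov}(u^{(1)}_{1j}, u^{(2)}_{1j})$). The measurement errors for the two responses are not assumed to be independent, with their covariance directly estimated ($\operatorname{cov}(e^{(1)}_{ij}, e^{(2)}_{ij})$).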
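The "double" long format can be sketched as a simple reshape: each measurement occasion contributes one row per outcome, with response indicators and their interactions with time stored alongside the stacked response. The field names (`id`, `t`, `pain`, `function`, `d1`, `d2`) are hypothetical and the layout is only one plausible reading of the data structure described above.

```python
def to_double_long(records):
    """Stack two outcomes per occasion into one response column with indicators."""
    rows = []
    for r in records:
        for outcome, d1 in (("pain", 1), ("function", 0)):
            d2 = 1 - d1
            rows.append({
                "id": r["id"],
                "t": r["t"],
                "y": r[outcome],     # stacked response
                "d1": d1,            # indicator for the first outcome (pain)
                "d2": d2,            # indicator for the second outcome (function)
                "d1_t": d1 * r["t"],  # interaction of indicator with time
                "d2_t": d2 * r["t"],
            })
    return rows


# Made-up data: one patient measured at baseline and 3 months.
wide = [
    {"id": 1, "t": 0, "pain": 62, "function": 55},
    {"id": 1, "t": 3, "pain": 20, "function": 25},
]
long_rows = to_double_long(wide)
for row in long_rows:
    print(row)
```

Fitting the model without an overall intercept on `d1`, `d2`, `d1_t` and `d2_t` then gives one intercept and one slope per outcome.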

Finally, the threshold of response must be decided and individual trajectories estimated and classified. Similar to the other methods, it is relatively simple to construct a test statistic for testing whether individual slopes are significantly different from the chosen threshold.

#### Limitations of the MLM approach

The MLM approach described by equations 1, 2 and 3 assumes that change in the outcome is linearly associated with time. The linearity assumption is imposed for simplicity. Non-linear changes are easily incorporated by including higher order polynomials or using linear or non-linear splines.56

The standard MLM approach also fails to directly address the issue of floor and ceiling effects. Mixed-response multilevel Tobit models allow for such effects and provide some adjustment.45 57 Furthermore, while the MLM described in equation 2 allows for heterogeneity in known groups, it fails to allow for heterogeneity in trajectories when the groups are unknown. The use of group-based trajectory models or growth mixture models in these circumstances may reveal latent (unobserved) classes of individuals with distinct patterns of recovery.58

### Example: the APEX cohort study

Using a mixed cohort of patients undergoing total hip replacement (THR) and total knee replacement (TKR),41 we investigated the performance of the existing and MLM approaches using four definitions of responsiveness. A simulated data set and code to fit each of these models are included in the online supplementary material.

Patients in the APEX cohort completed the Intermittent and Constant Osteoarthritis Pain (ICOAP) questionnaire before and after surgery at approximately 0, 3, 6 and 12 months. The date at which the postsurgical questionnaire was completed is recorded in days postsurgery. As the name suggests, the ICOAP questionnaire attempts to measure intermittent and constant pain.21 The developers of the tool suggest three ways of summarising the scale, generating intermittent, constant and total pain scores (the total being the sum of the intermittent and constant pain subscales). The tool is scored between 0 and 100 and a full description of the ICOAP scale is provided in the original validation paper.21 Satisfaction with pain relief following surgery was recorded by asking patients to ‘Rate the relief of pain provided by (hip/knee) replacement’ using a single-item 5-point scale (none, poor, fair, good, excellent). We categorised good and excellent as a satisfactory outcome following surgery.

Using the three methods of aggregation, we present estimates of pain at baseline and for change at approximately 3 months postsurgery using existing methods (summary statistics) and MLM estimates.

To facilitate comparisons between existing and MLM approaches, we assume that all individuals are measured at exactly 0, 3, 6 and 12 months. While the existing approaches only use the 0 and 3 month measurements, the MLM approach uses a random intercept and random slopes across four measurement occasions, using two linear splines with a knot point at 3 months to estimate the response at 3 months. The inclusion of the second spline and the additional two measurement occasions allows adjustment for measurement error in the MLM approach. Tables 1 and 2 present results for patients undergoing THR and TKR, respectively. The placement of the knot at 3 months was determined by visually inspecting the data, similar to the methods by Lenguerrand *et al*.59 With more complex patterns of response an iterative model fitting approach is likely to be required to determine the optimal knot placement. Modelling assumptions were checked using ladder plots and normal plots of residuals.
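Two linear splines with a knot at 3 months can be coded as two time variables, one that increases up to the knot and one that increases after it. The sketch below uses one common parameterisation, which may differ from the coding used in the supplementary material; the coefficients in the example are made up.

```python
def linear_spline_basis(t: float, knot: float = 3.0):
    """Return (time before the knot, time after the knot) for a two-piece linear spline."""
    return min(t, knot), max(t - knot, 0.0)


def population_mean(t: float, b0: float, b1: float, b2: float, knot: float = 3.0) -> float:
    """Population average trajectory: slope b1 up to the knot, slope b2 afterwards."""
    s1, s2 = linear_spline_basis(t, knot)
    return b0 + b1 * s1 + b2 * s2


for month in (0, 3, 6, 12):
    print(month, linear_spline_basis(month), population_mean(month, b0=60.0, b1=-12.0, b2=-0.5))
```

With this coding, the change from baseline to 3 months is simply three times the first slope, so the 3-month response is estimated from all four occasions rather than from two observed scores.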

To describe how the responsiveness classification in patients changed at 3 months, we used an Exact McNemar test to compare the number of discordant classifications generated by existing and MLM approaches.
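An exact McNemar test depends only on the two discordant counts and reduces to a two-sided binomial test with probability 0.5. The sketch below illustrates this; the counts are made up and `b` and `c` are hypothetical names for the numbers of patients reclassified in each direction.

```python
from math import comb


def exact_mcnemar_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p value from the discordant counts b and c."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) probability of a count as small as the smaller cell.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# b: responders under MLM only; c: responders under the existing approach only.
print(round(exact_mcnemar_p(10, 2), 4))
```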

The APEX study was approved by Southampton and South West Hampshire Research Ethics Committee (09/H0504/94).

## Results

In all subdivisions of the ICOAP questionnaire, for THR/TKR patients, the estimates of the baseline mean and change scores from existing approaches are approximately equal to those from the MLM approaches. In addition, estimates of the SD of baseline and change score are overestimated using existing approaches in THR/TKR patients. The SDs of baseline measurements of pain were approximately 3.3 and 3.75 points greater in existing methods compared with MLM methods in THR/TKR patients, respectively, while the corresponding SDs of change scores are approximately 6.3 and 7 points greater in existing methods (see tables 1 and 2, respectively). An example of model diagnostics is included in figure 2, which presents the observed ICOAP total scores at 0, 3, 6 and 12 months and the population average response in ICOAP across time. In addition, baseline and change residuals are also presented using quantile–quantile plots.

### Return to normal

Because the MLM yields similar baseline score estimates to the conventional RTN approach but different SDs, the threshold of response is reduced by approximately five points in THR/TKR patients. The change in threshold is due to smaller estimates of baseline and change SDs. When considering the total ICOAP score, the MLM approach classifies approximately 10% more individuals as responders than existing approaches. It is also interesting to note that the threshold of response using the existing approach when considering total ICOAP score in THR patients is beyond the range of the score.

### Minimally important difference

Using similar change score estimates and different SDs results in an approximately 2-point reduction in the MID threshold in THR/TKR patients. The reduced threshold results in more individuals being classified as responders using the MLM approach.

### Minimal clinically important improvement

Using the MLM approach in satisfied and unsatisfied individuals results in a small increase in the threshold of response in comparison with existing approaches. The increase in threshold is due to the shrunken residuals and the resulting reduction in the variability of predicted change scores. The increase in threshold results in a reduced number of individuals (3% of THR patients and 6% of TKR patients) being identified as responders.

### OMERACT-OARSI

The OO approach uses fixed definitions of responsiveness. Individual estimates of change from the bivariate MLM for constant and intermittent pain are very similar to those from the univariate MLM. However, the SD of the change score is reduced by approximately 0.5 and 1 points in constant and intermittent pain comparing the univariate and bivariate MLM, respectively, whereas the SD of the baseline score is approximately the same. Despite the larger absolute threshold of 20 and 10 points for changes in one or two items, respectively, that is, larger than MID, there is an increase in the proportion of individuals identified as responding. The increase is partly due to the use of the relative change threshold and the reduced variability in change in comparison with the univariate MLM using the MID definition of responsiveness.

### Responsiveness classification

The effect of using an MLM approach to defining patient responsiveness compared with existing approaches is presented in tables 3 and 4 for THR and TKR patients, respectively. While the use of MLM provides refined thresholds of responsiveness, it fundamentally changes the way individuals are classified due to adjustment for measurement error, regression to the mean and the ability to conduct refined tests. Some patients previously defined as non-responders using existing methods are now responders (positive change) in MLM approaches, and similarly, some patients defined as responders using existing methods are classified as non-responders (negative change) in MLM (see figure 3 for graphical illustration). MLM MID and OO methods appear to be most consistent in the reclassification of patients, reclassifying patients defined as non-responders using existing methods as responders in MLM approaches, whereas MLM RTN and MCII provide a more fundamental change in the classification of patient responsiveness.

## Discussion

The primary purpose of a responsiveness analysis is to convey the variability of an individual’s chances of perceiving an improvement following a treatment. Existing approaches appear to be distinct from one another, and the precise relationship between existing methods was unclear.

We have shown how four commonly used approaches to estimating patient responsiveness can be incorporated into the unified statistical framework of MLM. Their translation into a unified framework makes clear many of the assumptions (linearity of response, heterogeneity in the timing of measures, multiple measurements) underpinning existing approaches. The application of patient responsiveness models in a cohort of orthopaedic patients illustrates how SDs of baseline and change scores in existing approaches are overestimated in comparison with the MLM approach. Thresholds for defining responders from MLM are lower when based on SD, and therefore existing approaches to RTN and MID may appear to provide a worst-case scenario with regard to the efficacy of a treatment or therapy. Similarly, thresholds for responsiveness approaches based on the distribution of predicted change scores (MCII) are higher in MLM, and therefore existing thresholds could be described as a best-case scenario in comparison with MLM approaches. However, the reclassification of patients using the MLM is more fundamental than increasing or reducing the threshold to determine responsiveness: the implicit adjustments for measurement error and regression to the mean change which patients are defined as responding or not.

MLM are not the panacea of patient responsiveness methods; however, they do highlight implicit assumptions in existing approaches and provide sensible adjustments for measurement error, regression to the mean and heterogeneity in the timing of measurements in clinical studies.

From a clinical perspective, it is very clear there are differences in the outcomes at 3 months following THR and TKR. While patients’ baseline level of pain is similar between THR and TKR, the response to surgery is less and consistently less (lower variability) for all pain domains. Similarly, we have previously observed different patterns of pain, in relation to pain at rest and pain on movement.60 The mechanisms underpinning these effects are unclear and require more research, but this emphasises the necessity to treat hip and knee osteoarthritis as separate disease states.

### Strengths and limitations

One of the key benefits of adopting an MLM approach when defining clinically meaningful change is the improved estimation of individual change that the greater flexibility of the MLM framework allows. Specifically, MLMs do not assume that the response is measured without error; they adjust for regression to the mean; the trajectory of recovery is not constrained to be linear; and data from multiple measurements, together with variability in the timing of those measurement occasions, can be incorporated into the model. Furthermore, assuming the underlying MLM adequately represents the true causal mechanism, parameter estimates, SDs and SEs will be unbiased, in contrast to those from existing approaches.
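The adjustment for regression to the mean can be sketched with the textbook shrinkage formula for a random-intercept model, in which a patient's observed mean score is pulled towards the group mean in proportion to its reliability. This is an illustrative assumption rather than the model fitted in this article: the function name, the variance components and the scores below are hypothetical, and the variance components are treated as known rather than estimated.

```python
def shrunken_score(observed_mean, group_mean, var_between, var_error, n_obs=1):
    """Shrink a patient's observed mean score towards the group mean.

    reliability = var_between / (var_between + var_error / n_obs)

    var_between is the variance of true scores between patients, var_error
    is the measurement error variance and n_obs is the number of
    measurements averaged into observed_mean.
    """
    if n_obs < 1:
        raise ValueError("n_obs must be at least 1")
    reliability = var_between / (var_between + var_error / n_obs)
    return group_mean + reliability * (observed_mean - group_mean)


# Hypothetical patient with an extreme observed score of 80 in a group
# whose mean is 50, with assumed variances of 100 (between) and 36 (error).
one_measure = shrunken_score(80.0, 50.0, 100.0, 36.0, n_obs=1)
four_measures = shrunken_score(80.0, 50.0, 100.0, 36.0, n_obs=4)

print(f"Prediction from 1 measurement:  {one_measure:.2f}")
print(f"Prediction from 4 measurements: {four_measures:.2f}")
```

An extreme observed score is partly attributed to measurement error and so is pulled back towards the group mean. The pull weakens as more measurements are available, which is one way in which repeated measures improve individual predictions.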

Furthermore, the unification of existing approaches into an MLM framework clearly shows the relationship between the four different approaches. For example, RTN and MID share the same underlying model. MCII is also the same as RTN/MID if the baseline and change scores are assumed to be the same across strata of unsatisfied/satisfied patients. Similarly, the model underlying the OO approach is the same as the RTN/MID approach if independence is assumed between the measured outcomes of the two trajectories and the error term.

Despite the numerous benefits of adopting an MLM approach, it is not without limitations. MLMs are technically more demanding than existing formulations of patient responsiveness, and while there are no theoretical limits on how large or small samples have to be, model convergence is not guaranteed. The need to use appropriate estimation methods38 or denominator degrees of freedom55 when calculating standard errors also requires consideration. Furthermore, it is important to perform model diagnostics to check that the model fits the data. MLM does not improve on the arbitrary placement of the thresholds that define responsiveness in existing methods, and despite the improved trajectory modelling, it is currently unclear whether the refined definitions correlate more strongly with patient expectations, functional data, long-term self-reported outcomes or hard endpoints such as mortality and revision. Further research externally validating the classification using patient groups, expert opinion61 or functional data may demonstrate improved classification of those responding to treatment in comparison with existing methods. In addition, the need for multiple measurements in MLM primarily restricts the method to a research setting.

It is clear that MLMs provide considerable advantages over existing approaches to identifying patients who respond to a treatment. As a result, the proportion of individuals not responding to treatment may be smaller than previously thought. Using the refined definitions may reduce the number of individuals misclassified as non-responders and improve the prediction of those individuals who are likely to respond to treatment.

## Acknowledgments

We thank Professor Fiona Steele for her extensive comments and help preparing this manuscript. The research team acknowledges the support of the National Institute for Health Research (NIHR) through the Comprehensive Clinical Research Network.

## Footnotes

Contributors AS: study conception, wrote first draft and revisions and final approval of manuscript; VW: APEX study design, acquisition of data, drafting and review of manuscript and final approval of manuscript; EL: APEX study acquisition of data, drafting and review of manuscript and final approval of manuscript; RG-H: APEX study design, acquisition of data, drafting and review of manuscript and final approval of manuscript; JD: ACHE study design, drafting and review of manuscript and final approval of manuscript; DB: ACHE study design, drafting and review of manuscript and final approval of manuscript; AP: ACHE study design, drafting and review of manuscript and final approval of manuscript; AWB: APEX/ACHE study design, acquisition of data, drafting and review of manuscript and final approval of manuscript.

Funding This work was supported by AS and is funded by an MRC Fellowship MR/L01226X/1 and HTA Project: 11/63/01—‘ACHE’. This article presents independent research funded by the NIHR under its Programme Grants for Applied Research programme (RP-PG-0407-10070).

Disclaimer The views expressed in this article are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.

Competing interests None declared.

Provenance and peer review Not commissioned; externally peer reviewed.

Data sharing statement Data are unavailable to share.
