

A unified multi-level model approach to assessing patient responsiveness including: return to normal, minimally important differences and minimal clinically important improvement for patient reported outcome measures
  1. Adrian Sayers1,2,
  2. Vikki Wylde1,
  3. Erik Lenguerrand1,
  4. Rachael Gooberman-Hill1,
  5. Jill Dawson3,
  6. David Beard4,
  7. Andrew Price4,
  8. Ashley W Blom1
  1. Musculoskeletal Research Unit, School of Clinical Sciences, University of Bristol, Southmead Hospital, Bristol, UK
  2. School of Social and Community Medicine, University of Bristol, Bristol, UK
  3. Nuffield Department of Population Health, University of Oxford, Oxford, UK
  4. Biomedical Research Unit, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Science, Nuffield Orthopaedic Centre, Oxford, UK
  1. Correspondence to Dr. Adrian Sayers; adrian.sayers{at}


Objective This article reviews and compares four commonly used approaches to assessing patient responsiveness to a treatment or therapy (return to normal (RTN), minimal important difference (MID), minimal clinically important improvement (MCII) and OMERACT-OARSI [Outcome Measures in Rheumatology–Osteoarthritis Research Society International] (OO)) and demonstrates how each method can be formulated in a multilevel modelling (MLM) framework.

Design Cohort study.

Setting A cohort of patients undergoing total hip and knee replacement was recruited from a single UK National Health Service hospital.

Population 400 patients from the Arthroplasty Pain Experience cohort study undergoing total hip (n=210) and knee (n=190) replacement who completed the Intermittent and Constant Osteoarthritis Pain questionnaire prior to surgery and then at 3, 6 and 12 months after surgery.

Primary outcomes The primary outcome was defined as a response to treatment following total hip or knee replacement. We compared baseline scores, change scores and the proportion of individuals defined as ‘responders’ using traditional and MLM approaches to patient responsiveness.

Results Using existing approaches, baseline and change scores are underestimated, and the variance of baseline and change scores overestimated in comparison with MLM approaches. MLM increases the proportion of individuals defined as responding in RTN, MID and OO criteria compared with existing approaches. Using MLM with the MCII criteria reduces the number of individuals identified as responders.

Conclusion MLM improves the estimation of the SD of baseline and change scores by explicitly incorporating measurement error into the model and avoiding regression to the mean when making individual predictions. Using refined definitions of responsiveness may lead to a reduction in misclassification when attempting to predict who does and does not respond to an intervention and clarifies the similarities between existing methods.

  • Patient Responsiveness
  • Multi-level Modelling
  • Return To Normal
  • Minimal Important Difference
  • Patient-reported outcomes
  • Minimal clinically important improvement

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:


Strengths and limitations of this study

  • Four different approaches to patient responsiveness can be unified into a multilevel model.

  • A multilevel model framework of patient responsiveness highlights the similarities and differences between existing methods.

  • Multilevel models provide a simple framework which incorporates measurement error and non-linear change in trajectories of patient recovery.

  • Multilevel models are technically more demanding than existing formulations of patient responsiveness, and convergence is not guaranteed.

  • Multilevel models do not improve the arbitrary placement of the thresholds that define responsiveness in comparison with existing methods.

Introduction

Joint replacement is an increasingly common elective procedure worldwide1–3 and improving patient-reported outcomes after joint replacement is a key research priority due to the high prevalence of poor outcomes after joint arthroplasty.4 Poor outcomes include continuing pain, functional limitations5 and increased healthcare utilisation.6 However, there is some debate about how the efficacy of interventions should be judged, given the variety of outcomes used in orthopaedic research.7–18 Traditionally, objective primary outcomes such as prosthetic survivorship and mortality rates were used.19 More recently, however, the focus has shifted towards ensuring that the patient’s perspective is central to the assessment of intervention success.20 Many studies now use patient-reported outcome measures (PROMs) as endpoints, and these tools can assess a variety of health outcomes, including pain,7 21 physical functioning,7 mental well-being22 and health-related quality of life.23

Although PROMs are widely used,4 there is still debate about how the results should be interpreted and how to define a clinically meaningful change.24–35 From a measurement perspective, the ability to estimate whether a change has occurred depends on the application of an appropriate statistical model. From a clinical perspective, some authors suggest that the average statistical change is insufficient to ‘tell you anything about an individual’s chances of improving’.36 Therefore, the utility of simple statistical analyses is limited when attempting to help patients weigh up the risks and benefits of undergoing surgery.

To supplement simple statistical analysis, many researchers attempt to dichotomise the population into those who have or have not responded to an intervention, creating a two-stage process of defining an outcome. There are a number of different methods (definitions) that can be used to dichotomise the population, and these secondary analyses are collectively referred to as responsiveness analyses.36 Four substantively different methods of estimating the proportion of individuals who respond to an intervention have been previously identified in orthopaedic research36: (1) return to normal (RTN), (2) distribution-based minimally important difference (MID), (3) anchor-based minimal clinically important improvement (MCII) and (4) the OMERACT-OARSI (OO) responder criteria. The first three approaches are generic and used in many fields of health research, whereas the fourth is specific to orthopaedic research, although in principle it could be used in many other fields of health research.

Each of these approaches is often thought to be methodologically distinct. However, all of the methods can be shown to be special cases of a multilevel model (MLM). MLM have been used in a wide variety of contexts ranging from growth modelling to modelling educational data. One of the principal reasons to use MLM is to take advantage of the direct estimation of different variance components37 and provide efficient and unbiased estimates of fixed and random effects.38

Although there have been a number of extensive reviews of patient responsiveness,31 33 39 40 we describe these four approaches to calculating responsiveness and highlight the substantively different decisions each method makes. We then describe how each approach can be translated into an MLM framework, emphasise the benefits of the translation and contrast the approaches using an example from the APEX (Arthroplasty Pain Experience) cohort study.41

Methods

We outline the four existing approaches to patient responsiveness previously used in orthopaedic research36 and describe their potential limitations and how they can be formulated in an MLM framework.

Review of existing approaches to responsiveness

Return to normal (RTN)26 suggests that an individual has returned to ‘normal’ if their score on a postintervention outcome is greater than 2 SDs from the mean baseline response.

The use of 2 SDs appears to be justified on theoretical grounds; however, it is quite arbitrary. Assuming scores are normally distributed and measured without error, 2 SDs either side of the mean correspond to a 95.5% prediction interval, which is similar to the equally arbitrary and much-criticised significance threshold p=0.05 (type I error=0.05) used throughout medical research.42 43 However, there is no reason why cut-offs of 1.6 or 2.6 SDs, which correspond approximately to 90% and 99% prediction intervals, should not be used in preference.
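The correspondence between an SD cut-off and the coverage of a normal distribution can be checked directly. The following is a minimal sketch that uses Python's standard-library `statistics.NormalDist`; this tool is our choice and was not used in the original analysis. It converts a cut-off in SDs into two-sided coverage and back again, under the same assumption as the text that scores are normally distributed and measured without error.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def coverage_for_cutoff(k_sd: float) -> float:
    """Proportion of a normal distribution lying within mean +/- k_sd SDs."""
    return STD_NORMAL.cdf(k_sd) - STD_NORMAL.cdf(-k_sd)


def cutoff_for_coverage(coverage: float) -> float:
    """Number of SDs either side of the mean that gives the requested coverage."""
    return STD_NORMAL.inv_cdf(0.5 + coverage / 2)


if __name__ == "__main__":
    for k in (1.6, 2.0, 2.6):
        print(f"{k} SD cut-off covers {coverage_for_cutoff(k):.1%}")
    for cov in (0.90, 0.95, 0.99):
        print(f"{cov:.0%} coverage needs {cutoff_for_coverage(cov):.3f} SD")
```

The rounded 1.6 and 2.6 SD cut-offs are approximations of roughly 1.645 and 2.576 SDs, which give exactly 90% and 99% coverage.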

The method also assumes that the observed change is unlikely to be due to chance alone and does not account for any uncertainty. To alleviate this problem, the Reliable Change Index (RCI) was proposed for use in conjunction with the RTN classification.24 27 The RCI tests the individual’s score at follow-up against their baseline score. The SE of the difference is estimated indirectly from the SD of the baseline score together with either an assumed reliability coefficient taken from empirical research or a range of reliability values, in the spirit of a sensitivity analysis.
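As an illustration of the indirect SE calculation, the sketch below follows the form of the RCI commonly attributed to Jacobson and Truax. Here the SE of measurement is the baseline SD multiplied by the square root of one minus the reliability, and the SE of the difference is √2 times that value. The example scores and reliability values are hypothetical, and the |RCI| > 1.96 criterion is the conventional choice rather than one taken from this study.

```python
import math


def reliable_change_index(baseline: float, follow_up: float,
                          sd_baseline: float, reliability: float) -> float:
    """RCI = (follow-up - baseline) / SE of the difference.

    SE of measurement = SD_baseline * sqrt(1 - reliability);
    SE of the difference = sqrt(2) * SE of measurement.
    """
    if not 0 <= reliability < 1:
        raise ValueError("reliability must lie in [0, 1)")
    se_measurement = sd_baseline * math.sqrt(1 - reliability)
    se_difference = math.sqrt(2) * se_measurement
    return (follow_up - baseline) / se_difference


if __name__ == "__main__":
    # Sensitivity analysis over a range of assumed reliability coefficients
    for r in (0.7, 0.8, 0.9):
        rci = reliable_change_index(baseline=50, follow_up=30, sd_baseline=20, reliability=r)
        print(f"reliability={r}: RCI={rci:.2f}, |RCI| > 1.96: {abs(rci) > 1.96}")
```

Running the loop over several reliability values shows how strongly the conclusion for an individual can depend on the assumed coefficient.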

A commonly described distribution-based minimally important difference (MID) method classifies individuals as responders if their observed change is greater than a fixed proportion of the SD of the presurgery score.30 There has been much debate about the exact size, or proportion, of the SD to use; however, 0.5 SDs has been reported widely and suggested to be a difference that is minimally perceptible to patients.30 Any individual with a change score greater than 0.5 SD of the baseline score is defined as responding to the treatment. As with the RTN criteria, the decision to use 0.5 is arbitrary, and there is no reason why more or less stringent criteria of 0.25, 1 or 2 SDs could not be used. Additionally, there is no reason why a test such as the RCI should not be conducted to check that the change is beyond the bounds of measurement error.
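The distribution-based MID rule reduces to a few lines of code. The minimal sketch below uses the sample SD of the observed baseline scores, as in the existing approach. It assumes a pain scale on which a fall in score is an improvement (`lower_is_better=True`), and the data in the test are hypothetical.

```python
from statistics import stdev


def mid_responders(baseline, follow_up, fraction=0.5, lower_is_better=True):
    """Classify responders by the distribution-based MID rule.

    A responder is anyone whose improvement exceeds `fraction` times the
    sample SD of the observed baseline scores.
    """
    threshold = fraction * stdev(baseline)
    flags = []
    for before, after in zip(baseline, follow_up):
        # Orient the change so that a positive value is always an improvement
        improvement = before - after if lower_is_better else after - before
        flags.append(improvement > threshold)
    return threshold, flags
```

Changing `fraction` to 0.25, 1 or 2 reproduces the more or less stringent variants mentioned above.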

Anchor-based minimal clinically important improvement (MCII) is similar to the MID approach in that it defines an individual as a responder based on their individual change score. However, the cut point is determined in individuals who report, on an external anchoring question, that their outcome is either good/satisfactory or improved from baseline. The originators of the method proposed using a cut point at the 75th centile of the change score in those who are satisfied.34 Therefore, any individual, whether satisfied or not, who has a change score greater than the 75th centile is defined as a responder. A closely related anchor-based metric is the patient acceptable symptom state (PASS).35 Its construction is similar to that of the MCII, except that it is based on patients’ final scores as opposed to their change. Conceptually, the PASS is more closely related to the RTN definition of responsiveness, and much of the criticism levelled against MCII and RTN can therefore be applied to the PASS.
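The two-step MCII rule can be sketched as follows: compute a centile of the change score among satisfied patients, then apply that threshold to everyone. In this sketch change is oriented so that a positive value is an improvement. The linear-interpolation centile (`statistics.quantiles` with `method="inclusive"`) is one of several possible definitions and is our assumption, because the text does not specify one.

```python
from statistics import quantiles


def mcii_threshold(changes, satisfied, centile=75):
    """Centile of the change score among satisfied patients (linear interpolation)."""
    satisfied_changes = [c for c, s in zip(changes, satisfied) if s]
    if len(satisfied_changes) < 2:
        raise ValueError("need at least two satisfied patients")
    cut_points = quantiles(satisfied_changes, n=100, method="inclusive")
    return cut_points[centile - 1]  # cut_points[74] is the 75th centile


def mcii_responders(changes, satisfied, centile=75):
    """Apply the satisfied-patient threshold to everyone, satisfied or not."""
    threshold = mcii_threshold(changes, satisfied, centile)
    return threshold, [c > threshold for c in changes]
```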

The OMERACT-OARSI (OO) criteria32 recognise that a response to an intervention may occur in one or more different measured outcomes, that is, a multivariate response mechanism. In keeping with much of the orthopaedic literature, the criteria assume that each score has been rescaled between 0 and 100.32 A responder is defined as any individual with (1) a ≥50% relative change or a ≥20-point absolute change on one or more response scales or (2) a ≥20% relative change or a ≥10-point absolute change on two or more response scales. Relative change is defined as the ratio of the change to the individual’s baseline score multiplied by 100. Unlike the thresholds used in RTN, MID or MCII, the OO thresholds for relative and absolute change are fixed and explicitly based on the opinion of an expert panel.
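The OO rule, as worded in the paragraph above, can be written as a small classification function for one individual. This is a sketch of that wording only and not an authoritative implementation of the published criteria. It assumes 0–100 scales on which a fall in score is an improvement. It also assumes that the relative criterion is not met when the baseline score is 0, because relative change is then undefined.

```python
def oo_responder(baseline_scores, follow_up_scores, lower_is_better=True):
    """Apply the OO rule, as described in the text, to one individual's 0-100 scales."""
    major = 0     # scales with >=50% relative or >=20-point absolute improvement
    moderate = 0  # scales with >=20% relative or >=10-point absolute improvement
    for before, after in zip(baseline_scores, follow_up_scores):
        improvement = before - after if lower_is_better else after - before
        # Relative change = 100 * change / baseline; undefined at baseline 0
        # (assumption: the relative criterion is then treated as not met)
        relative = 100 * improvement / before if before != 0 else float("-inf")
        if relative >= 50 or improvement >= 20:
            major += 1
        if relative >= 20 or improvement >= 10:
            moderate += 1
    return major >= 1 or moderate >= 2
```

A scale that meets the larger thresholds also meets the smaller ones. The two counters therefore overlap by design, which matches the "one or more" and "two or more" wording.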

Despite the variety of existing approaches used to identify responders, there are a number of problems common to all methods. Common assumptions include: (1) each observed outcome is measured without error and reflects the patient’s true underlying response, although test–retest reliability studies indicate that this is not a realistic assumption44; (2) regression to the mean does not occur, and therefore the variance of the change score is not overestimated; and (3) floor and ceiling effects do not bias estimates of the variance of the change score.45

Furthermore, in RTN, specific combinations of means and variances may result in a threshold beyond the range of the measurement tool, in which case no individual could be defined as responding to a therapy. The MCII approach assumes that the additional anchoring variable is measured without error and that the response trajectory of satisfied patients is distinct from that of unsatisfied patients.46 The method also assumes that a two-parameter logistic function is an appropriate model for the cumulative proportional rank of patients and change in outcome, and that there is no uncertainty in the calculation of the threshold.47 Finally, the OO approach considers a response in two or more outcomes. However, it does not explicitly describe how the correlation between the outcomes is accounted for, and it fails to recognise that this correlation, if not modelled appropriately, may introduce bias.48–50

The four methods identified have a number of other limitations,25 but they are difficult to compare when presented as distinct approaches.

Embedding them in a unified statistical framework makes their underlying assumptions explicit, while highlighting their similarities and differences. In addition, it provides a framework to incorporate non-linear change, measurement error and variability in the timing of measurement occasions. All of these are to be expected in real-world data collection and are critical when attempting to assess a patient’s change at a specified point in time.

MLM approach to responsiveness

We now present a general MLM for patient responsiveness and show how the four approaches described above can be specified as special cases.

Under the assumption of linear change, the measured response (y) at the ith occasion for the jth individual is modelled as a linear function of time.

$$y_{ij} = \beta_0 + u_{0j} + (\beta_1 + u_{1j})\,t_{ij} + e_{ij} \qquad (1)$$

where $t_{ij}$ is the time at which measurement $i$ was taken on individual $j$, coded as zero at baseline. $\beta_0$ is the baseline population average response and $u_{0j}$ represents the $j$th individual’s difference from the baseline response. The sum $\beta_0 + u_{0j}$ is the estimated individual baseline response. $\beta_1$ represents the population average change per unit increase in time and $u_{1j}$ represents the $j$th individual’s difference from the population average change per unit increase in time. The sum $\beta_1 + u_{1j}$ is the estimated individual average change per unit increase in time. Measurement error in the linear trajectory is represented by $e_{ij}$, with within-individual variance $\sigma^2_e$.

The variances of the individual deviations from the population average response at baseline and from the average rate of change are $\sigma^2_{u0}$ and $\sigma^2_{u1}$, respectively. The baseline and rate-of-change random effects can be assumed to be independent, by constraining their covariance $\sigma_{u01}$ to zero, or correlated, by allowing $\sigma_{u01}$ to be freely estimated. The variances of the shrunken residuals $\hat{u}_{0j}$ and $\hat{u}_{1j}$, also known as empirical Bayes estimates, are typically less than the estimated population variances $\sigma^2_{u0}$ and $\sigma^2_{u1}$, because the individual predictions shrink towards the population averages $\beta_0$ and $\beta_1$. The extent of the shrinkage depends on the number of measurement occasions and the within-individual variability. Shrinkage increases as the number of measurement occasions decreases and as the within-individual variance $\sigma^2_e$ increases. A more detailed discussion of MLM can be found in most advanced statistics textbooks.48 51 52
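The direction of the shrinkage can be illustrated in the simplest case. The sketch below uses a random-intercept-only model with balanced occasions and variance components treated as known, which is a simplification of equation 1. The empirical Bayes residual is then the raw deviation of the individual mean from the population mean multiplied by the weight σ²u0 / (σ²u0 + σ²e / n). The random-slope model in equation 1 shrinks in an analogous, matrix-valued way. All numbers in the sketch are hypothetical.

```python
def shrinkage_factor(var_between: float, var_within: float, n_occasions: int) -> float:
    """Weight applied to an individual's raw deviation in a random-intercept model."""
    return var_between / (var_between + var_within / n_occasions)


def eb_intercept_residual(individual_mean, population_mean,
                          var_between, var_within, n_occasions):
    """Empirical Bayes (shrunken) intercept residual for one individual."""
    weight = shrinkage_factor(var_between, var_within, n_occasions)
    return weight * (individual_mean - population_mean)


if __name__ == "__main__":
    # More occasions -> weight nearer 1 (less shrinkage);
    # larger within-individual variance -> weight nearer 0 (more shrinkage)
    for n in (1, 2, 4, 8):
        print(n, round(shrinkage_factor(100.0, 100.0, n), 3))
```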

We now describe how the four traditional approaches to measuring patient responsiveness can be unified into an MLM framework. General benefits of the MLM over existing approaches include: (1) with more than three measurement occasions, an MLM directly allows for measurement error, $e_{ij}$; (2) the use of shrunken residuals $\hat{u}_{0j}$ and $\hat{u}_{1j}$ allows for regression to the mean when predicting an individual’s score53; (3) MLM can be extended to include multivariate response models, which appropriately model the correlation between two or more outcomes; and (4) MLM allows for variability in the timing of measurement occasions. Fundamentally, the MLM approach recognises that observed patient responses are subject to error, and therefore the patient’s true response following an intervention must be estimated.

MLM: return to normal 

To apply the RTN criteria using an MLM approach, we first estimate the baseline population SD in individuals considered to be abnormal, using the model described in equation 1. Assuming the individual baseline response $\beta_0 + u_{0j}$ is normally distributed with population mean $\beta_0$ and variance $\sigma^2_{u0}$, a prediction interval for the baseline measurement can be constructed, that is, $\beta_0 \pm z_{1-\alpha/2}\,\sigma_{u0}$, where α is the type I error rate and $z_{1-\alpha/2}$ is the corresponding critical value from a standard normal distribution. Importantly, $y_{ij}$ is not assumed to be measured without error, and therefore estimates of $\sigma^2_{u0}$ are less likely to be biased than those obtained using simple methods. However, the choice of α is entirely that of the researcher, and while α=0.05 (leading to $z_{1-\alpha/2} \approx 1.96$) is common, more or less stringent criteria could be applied.
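Given estimates of β0 and σ²u0 from a fitted model (fitting software is not specified here, and the input values below are hypothetical), the MLM RTN interval follows directly. The sketch also checks whether each limit is attainable on a bounded 0–100 scale. This matters because, as noted above for the existing approach, a threshold can fall outside the range of the instrument.

```python
import math
from statistics import NormalDist


def rtn_interval(beta0: float, var_u0: float, alpha: float = 0.05):
    """Baseline prediction interval beta0 +/- z_{1-alpha/2} * sigma_u0."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    sigma_u0 = math.sqrt(var_u0)  # the model reports a variance; take its square root
    return beta0 - z * sigma_u0, beta0 + z * sigma_u0


def limits_within_scale(interval, scale_min=0.0, scale_max=100.0):
    """Flag whether each limit of the interval is attainable on a bounded scale."""
    lower, upper = interval
    return scale_min <= lower <= scale_max, scale_min <= upper <= scale_max
```

For example, `rtn_interval(50, 100)` gives approximately (30.4, 69.6). Passing `alpha=0.10` or `alpha=0.01` gives the less or more stringent alternatives.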

The second step is to estimate the score of the individual at the ith occasion following surgery and determine whether it lies within the baseline prediction interval. This prediction is calculated by substituting the estimates $\hat\beta_0$, $\hat\beta_1$, $\hat{u}_{0j}$ and $\hat{u}_{1j}$ into equation 1, which gives the empirical best linear unbiased prediction for the jth individual at the ith occasion, $\hat{y}_{ij} = \hat\beta_0 + \hat{u}_{0j} + (\hat\beta_1 + \hat{u}_{1j})\,t_{ij}$.54
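The prediction and classification step can be sketched as follows. We assume that "returned to normal" means the prediction lies beyond the baseline interval in the direction of improvement. For a pain scale where higher scores mean more pain, this is below the lower limit. The function names and inputs are hypothetical.

```python
def predicted_score(beta0, beta1, u0j, u1j, t):
    """Empirical BLUP from equation 1: beta0 + u0j + (beta1 + u1j) * t."""
    return beta0 + u0j + (beta1 + u1j) * t


def returned_to_normal(prediction, interval, lower_is_better=True):
    """True if the prediction lies beyond the baseline interval in the improving direction."""
    lower, upper = interval
    return prediction < lower if lower_is_better else prediction > upper
```

For instance, fixed effects (60, -10) and shrunken residuals (5, -2) at t = 3 give a prediction of 29. This falls below a lower limit of 30.4, so the individual would be classified as having returned to normal.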

Finally, to determine whether the response of the individual following surgery is greater than one would attribute to chance alone, that is, to test the null hypothesis that the jth individual’s slope is equal to zero, a test statistic similar to the RCI should be constructed:

$$z_j = \frac{\hat\beta_1 + \hat{u}_{1j}}{\operatorname{SE}(\hat\beta_1 + \hat{u}_{1j})}$$
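The Wald-type test above takes only a few lines once an estimate and its SE are available. In this sketch we assume that the SE of β̂1 + û1j is supplied by the model-fitting software. How that SE combines fixed-effect and random-effect uncertainty is software-specific and is not reconstructed here. The `null_value` argument is our addition and allows testing against a non-zero threshold.

```python
from statistics import NormalDist


def individual_slope_test(slope_estimate, slope_se, null_value=0.0):
    """Wald-type z statistic and two-sided p-value for an individual's slope."""
    if slope_se <= 0:
        raise ValueError("standard error must be positive")
    z = (slope_estimate - null_value) / slope_se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value
```

With a hypothetical slope of -6 and SE of 2, z = -3 and the two-sided p-value is about 0.0027.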

MLM: minimally important difference 

The threshold of minimally important difference can also be estimated using an MLM. As for RTN, a linear model of change is applied, as in equation 1, and the population SD of the baseline response is estimated by $\hat\sigma_{u0}$. The estimated change for the jth individual, $(\hat\beta_1 + \hat{u}_{1j})\,t_{ij}$, is then compared with a proportion of the baseline SD, that is, $(\hat\beta_1 + \hat{u}_{1j})\,t_{ij} > 0.5\,\hat\sigma_{u0}$, and the individual is classed as a responder or not. The MID approach does not specifically state whether an individual’s change score should be tested against the MID threshold. However, such a test statistic is simply constructed as $z_j = \{(\hat\beta_1 + \hat{u}_{1j})\,t_{ij} - 0.5\,\hat\sigma_{u0}\} / \operatorname{SE}\{(\hat\beta_1 + \hat{u}_{1j})\,t_{ij}\}$.
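A minimal sketch of the MLM MID classification is shown below. It takes individual slopes β1 + u1j from a fitted model, converts them into predicted change at time t, and compares that change with 0.5σu0, where σu0 is obtained from the estimated variance. We assume that a fall in score is an improvement (`improvement_sign=-1`). All inputs are hypothetical.

```python
import math


def mlm_mid_classify(slopes, t, var_u0, fraction=0.5, improvement_sign=-1.0):
    """Compare each individual's predicted change at time t with fraction * sigma_u0.

    slopes: individual slopes beta1 + u1j (per unit time) from a fitted MLM.
    improvement_sign=-1 treats a fall in score (e.g. less pain) as improvement.
    """
    threshold = fraction * math.sqrt(var_u0)
    changes = [improvement_sign * s * t for s in slopes]
    return threshold, [c > threshold for c in changes]
```

Because the slopes are shrunken estimates and σu0 excludes measurement error, both sides of the comparison differ from their existing-approach counterparts.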

MLM minimal clinically important improvement 

The MLM MCII requires a simple extension of the univariate model presented previously (equation 1). The outcome of interest is stratified using an external criterion. The stratification is achieved by creating dummy variables for those who are unsatisfied or satisfied with some aspect of their treatment. For example, $S_j$ takes the values 0 and 1, representing unsatisfied and satisfied individuals, respectively, and $U_j = 1 - S_j$. These dummy variables are then included as additional explanatory variables, with no overall model intercept, and interacted with $t_{ij}$:

$$y_{ij} = S_j\{\beta_{0S} + u_{0Sj} + (\beta_{1S} + u_{1Sj})\,t_{ij} + e_{Sij}\} + U_j\{\beta_{0U} + u_{0Uj} + (\beta_{1U} + u_{1Uj})\,t_{ij} + e_{Uij}\} \qquad (2)$$

Therefore, $\beta_{0S}$ and $\beta_{0U}$ are the mean population outcome scores at baseline for those who are satisfied and unsatisfied, respectively, and $\beta_{1S}$ and $\beta_{1U}$ are the corresponding mean population changes per unit of time. Variances and covariances are interpreted similarly for those who are satisfied and unsatisfied, respectively. Note, however, that satisfaction on the external anchoring question is assumed to be known without error. The individual effects and errors for $S_j$ are also uncorrelated with those for $U_j$, because the satisfied and unsatisfied categories are mutually exclusive. Whether it is desirable to fit a model to both satisfied and unsatisfied individuals simultaneously is debatable, as only those who are satisfied contribute to the definition of MCII. However, we present a simultaneous modelling approach for satisfied and unsatisfied individuals because it makes the underlying modelling assumptions explicit. Furthermore, if stratification on satisfaction status leads to small samples, alternative estimators and degrees of freedom can be used in an MLM framework to account for this, that is, restricted maximum likelihood, restricted generalised least squares or adjustments to the denominator df.55
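Constructing the fixed-part explanatory variables of equation 2 from long-format data is mechanical. The sketch below assumes hypothetical record keys (`id`, `t`, `y`, `satisfied`) and adds the two mutually exclusive dummies and their interactions with time, with no intercept column. Giving each dummy its own random effects and level-1 variance is done in the model-fitting software and is not shown.

```python
def mcii_design(records):
    """Add the equation-2 explanatory variables to long-format records.

    records: dicts with hypothetical keys 'id', 't', 'y' and 'satisfied' (bool).
    Returns new dicts with S, U and their interactions with t (no intercept).
    """
    design = []
    for rec in records:
        s = 1 if rec["satisfied"] else 0
        u = 1 - s  # the two dummies are mutually exclusive and exhaustive
        design.append({**rec, "S": s, "U": u, "S_t": s * rec["t"], "U_t": u * rec["t"]})
    return design
```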

Following the prediction of each individual’s trajectory, including those unsatisfied with treatment, the second stage in the MCII method requires a threshold for determining responsiveness. Using a similar suggestion to Tubach et al,35 the 75th centile of those who are satisfied could be used to classify all individuals as responding or not. Similar to the MID, there is no suggestion of whether a test against the null value of the 75th centile should be constructed, but this is easily done within the MLM framework.

MLM: OO criteria 

The OO criteria can be similarly extended into a multivariate MLM framework by the inclusion of dummy variables and reshaping into a ‘double’ long format with both responses stored in a single vector. Figure 1 illustrates the data structure for a bivariate model.

Figure 1

Illustration of a ‘double’ long data set-up for fitting a bivariate multilevel model.

Dummy variables, also known as response indicators, are used to denote the response options: $r_{1ij}$ is coded 1 for the first measurement outcome (pain) and 0 for the second outcome (function), and $r_{2ij} = 1 - r_{1ij}$. The response indicators and their interactions with $t_{ij}$ are included as explanatory variables to obtain the following bivariate response model:

$$y_{ij} = r_{1ij}\{\beta_{01} + u_{01j} + (\beta_{11} + u_{11j})\,t_{ij} + e_{1ij}\} + r_{2ij}\{\beta_{02} + u_{02j} + (\beta_{12} + u_{12j})\,t_{ij} + e_{2ij}\} \qquad (3)$$

With a similar functional form to the univariate MLM, there are separate population and individual intercepts for the first and second outcomes ($\beta_{01} + u_{01j}$ and $\beta_{02} + u_{02j}$, respectively), and separate population and individual slopes ($\beta_{11} + u_{11j}$ and $\beta_{12} + u_{12j}$, respectively). Because the outcomes are modelled jointly, the MLM allows non-zero covariances between the intercepts and slopes of the two responses, for example $\operatorname{cov}(u_{01j}, u_{02j})$ and $\operatorname{cov}(u_{11j}, u_{12j})$. The measurement errors for the two responses are not assumed to be independent, and their covariance, $\operatorname{cov}(e_{1ij}, e_{2ij})$, is estimated directly.
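The ‘double’ long set-up can be sketched as a reshape from one row per occasion to one row per occasion and response. Each new row carries the response indicators r1 and r2 and their interactions with time. The column names `pain` and `function` are hypothetical. How the software is then told to estimate separate variances and the cross-response covariances differs between packages and is not shown.

```python
def to_double_long(wide_rows):
    """Reshape one-row-per-occasion data into one row per occasion and response.

    wide_rows: dicts with hypothetical keys 'id', 't', 'pain' and 'function'.
    """
    long_rows = []
    for row in wide_rows:
        for response, r1 in (("pain", 1), ("function", 0)):
            r2 = 1 - r1  # response indicators are mutually exclusive
            long_rows.append({
                "id": row["id"],
                "t": row["t"],
                "response": response,
                "y": row[response],  # both outcomes stored in a single vector
                "r1": r1,
                "r2": r2,
                "r1_t": r1 * row["t"],
                "r2_t": r2 * row["t"],
            })
    return long_rows
```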

Finally, the threshold of response must be decided and individual trajectories estimated and classified. Similar to the other methods, it is relatively simple to construct a test statistic for testing whether individual slopes are significantly different from the chosen threshold.

Limitations of the MLM approach 

The MLM approach described by equations 1–3 assumes that change in the outcome is linearly associated with time. The linearity assumption is imposed for simplicity; non-linear change is easily incorporated by including higher order polynomials or using linear or non-linear splines.56

The standard MLM approach also fails to directly address the issue of floor and ceiling effects. Mixed-response multilevel Tobit models allow for such effects and provide some adjustment.45 57 Furthermore, while the MLM described in equation 2 allows for heterogeneity in known groups, it does not allow for heterogeneity in trajectories when the groups are unknown. In these circumstances, group-based trajectory models or growth mixture models may reveal latent (unobserved) classes of individuals with distinct patterns of recovery.58

Example: the APEX cohort study

Using a mixed cohort of patients undergoing total hip replacement (THR) and total knee replacement (TKR),41 we investigated the performance of the existing and MLM approaches using four definitions of responsiveness. A simulated data set and code to fit each of these models are included in the online supplementary material.

Patients in the APEX cohort completed the Intermittent and Constant Osteoarthritis Pain (ICOAP) questionnaire before and after surgery at approximately 0, 3, 6 and 12 months. The date on which each postsurgical questionnaire was completed was recorded in days postsurgery. As the name suggests, the ICOAP questionnaire attempts to measure intermittent and constant pain.21 The developers of the tool suggest three ways of summarising the scale, generating intermittent, constant and total pain scores (the total being the sum of the intermittent and constant pain subscales). The tool is scored between 0 and 100, and a full description of the ICOAP scale is provided in the original validation paper.21 Satisfaction with pain relief following surgery was recorded by asking patients to ‘Rate the relief of pain provided by (hip/knee) replacement’ using a single-item 5-point scale (none, poor, fair, good, excellent). We categorised good and excellent as a satisfactory outcome following surgery.

Using the three methods of aggregation, we present estimates of pain at baseline and for change at approximately 3 months postsurgery using existing methods (summary statistics) and MLM estimates.

To facilitate comparisons between existing and MLM approaches, we assume that all individuals are measured at exactly 0, 3, 6 and 12 months. While the existing approaches use only the 0 and 3 month measurements, the MLM approach uses a random intercept and random slopes across four measurement occasions, with two linear splines and a knot at 3 months, to estimate the response at 3 months. The inclusion of the second spline and the two additional measurement occasions allows adjustment for measurement error in the MLM approach. Tables 1 and 2 present results for patients undergoing THR and TKR, respectively. The placement of the knot at 3 months was determined by visually inspecting the data, similar to the methods of Lenguerrand et al.59 With more complex patterns of response, an iterative model-fitting approach is likely to be required to determine the optimal knot placement. Modelling assumptions were checked using ladder plots and normal plots of residuals.
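One common parameterisation of two linear splines with a single knot at 3 months is shown below. We assume this parameterisation because the text does not state which one was used. The first term counts time up to the knot and the second counts additional time after it. Under this coding, the second term is zero at 3 months, so the predicted change at 3 months depends only on the first spline's fixed and random slope.

```python
def linear_spline_basis(t_months: float, knot: float = 3.0):
    """Two linear spline terms with a single knot."""
    first = min(t_months, knot)          # time accrued up to the knot
    second = max(t_months - knot, 0.0)   # additional time after the knot
    return first, second


def predicted_change_at_knot(slope1_fixed, slope1_random, knot=3.0):
    """Individual change from baseline to the knot: (beta_s1 + u_s1j) * knot."""
    return (slope1_fixed + slope1_random) * knot
```

At the measurement times 0, 3, 6 and 12 months this coding gives (0, 0), (3, 0), (3, 3) and (3, 9).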

Table 1

Mean and SD of baseline and change scores estimated using current and multilevel model approaches to responsiveness in patients undergoing total hip replacement in the APEX cohort study

Table 2

Mean and SD of baseline and change scores estimated using current and MLM approaches to responsiveness in patients undergoing total knee replacement in the APEX cohort study

To describe how the responsiveness classification of patients changed at 3 months, we used an exact McNemar test to compare the number of discordant classifications generated by existing and MLM approaches.
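The exact McNemar test uses only the two discordant cells of the cross-classification, that is, the patients reclassified in each direction. Under the null hypothesis, the smaller count follows a binomial distribution with probability 0.5. The sketch below uses the common convention of doubling the one-sided tail. Other two-sided conventions exist, and the text does not state which convention or software was used.

```python
from math import comb


def exact_mcnemar_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant cell counts."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, with 1 patient reclassified in one direction and 9 in the other, the p-value is 22/1024, or about 0.021.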

The APEX study was approved by Southampton and South West Hampshire Research Ethics Committee (09/H0504/94).

Results

In all subdivisions of the ICOAP questionnaire, for both THR and TKR patients, the estimates of the baseline mean and change scores from existing approaches are approximately equal to those from the MLM approaches. However, the SDs of the baseline and change scores are overestimated by existing approaches in THR/TKR patients. The SD of baseline measurements of pain was approximately 3.3 and 3.75 points greater with existing methods than with MLM methods in THR and TKR patients, respectively, while the corresponding SDs of change scores were approximately 6.3 and 7 points greater with existing methods (see tables 1 and 2, respectively). An example of model diagnostics is included in figure 2, which presents the observed ICOAP total scores at 0, 3, 6 and 12 months and the population average response in ICOAP across time. In addition, baseline and change residuals are presented using quantile–quantile plots.

Figure 2

Modelling diagnostic plots. Upper left, ladder plot of observed ICOAP total scores at 0, 3, 6 and 12 months following THR and population average trajectory estimated from a MLM, used in RTN and MID analysis, with two linear splines with a knot at 3 months. Upper right, lower left and right plots are quantile–quantile plots of the residual distribution of random effects estimated from an MLM with two linear splines with a knot at 3 months. ICOAP, intermittent and constant osteoarthritis pain; MID, minimally important difference; MLM, multilevel model; RTN, return to normal; THR, total hip replacement.

Return to normal

Baseline score estimates similar to those of the conventional RTN approach, combined with smaller SDs, reduce the threshold of response by approximately five points in THR/TKR patients. The change in threshold is due to smaller estimates of baseline and change SDs. When considering the total ICOAP score, the MLM approach classifies approximately 10% more individuals as responders than existing approaches. Notably, for the total ICOAP score in THR patients, the threshold of response from the existing approach lies beyond the range of the score.

Minimally important difference

Change score estimates similar to those of existing approaches, combined with smaller SDs, result in an approximately 2-point reduction in the MID threshold in THR/TKR patients. The reduced threshold results in more individuals being classified as responders using the MLM approach.

Minimal clinically important improvement

Using the MLM approach in satisfied and unsatisfied individuals results in a small increase in the threshold of response in comparison with existing approaches. The increase in threshold is due to the use of shrunken residuals and the consequently reduced variability of predicted change scores. The increase in threshold results in a reduced number of individuals (3% of THR patients and 6% of TKR patients) being identified as responders.

OMERACT-OARSI criteria

The OO approach uses fixed definitions of responsiveness. Individual estimates of change from the bivariate MLM for constant and intermittent pain are very similar to those from the univariate MLM. However, comparing the univariate and bivariate MLM, the SD of the change score is reduced by approximately 0.5 and 1 points for constant and intermittent pain, respectively, whereas the SD of the baseline score is approximately the same. The absolute thresholds of 20 and 10 points for changes on one or two scales, respectively, are larger than the MID threshold, yet the proportion of individuals identified as responding increases. The increase is partly due to the use of the relative change threshold and to the reduced variability in change in comparison with the univariate MLM using the MID definition of responsiveness.

Responsiveness classification

The effect of using an MLM approach to defining patient responsiveness, compared with existing approaches, is presented in tables 3 and 4 for THR and TKR patients, respectively. The MLM provides refined thresholds of responsiveness, but it also fundamentally changes the way individuals are classified, owing to adjustment for measurement error and regression to the mean and to the ability to conduct refined tests. Some patients defined as non-responders using existing methods are responders (positive change) under MLM approaches. Similarly, some patients defined as responders using existing methods are classified as non-responders (negative change) under MLM (see figure 3 for a graphical illustration). The MLM MID and OO methods appear to be the most consistent in their reclassification of patients, mainly reclassifying patients defined as non-responders by existing methods as responders. In contrast, MLM RTN and MCII change the classification of patient responsiveness more fundamentally.

Table 3

Cross-classification of responsiveness status in THR patients using existing and MLM approaches to responsiveness: RTN, MID, MCII and OO criteria

Table 4

Cross-classification of responsiveness status in TKR patients using existing and MLM approaches to responsiveness: RTN, MID, MCII and OO criteria

Figure 3

Change in responder classification using an RTN definition, comparing existing approaches with the MLM approach, using the ICOAP total score in patients following THR. The upper left panel illustrates observed trajectories for patients whose responsiveness classification changes under the MLM approach. The lower left panel illustrates the observed and predicted trajectories of the ICOAP total score in patients positively reclassified as responders compared with existing approaches. The lower right panel illustrates the observed and predicted trajectories of the ICOAP total score in patients negatively reclassified as non-responders compared with existing approaches. ICOAP, Intermittent and Constant Osteoarthritis Pain; MLM, multilevel model; RTN, return to normal; THR, total hip replacement.


The primary purpose of a responsiveness analysis is to convey the variability in individuals' chances of perceiving an improvement following a treatment. Existing approaches appear distinct from one another, and the precise relationship between them has been unclear.

We have shown how four commonly used approaches to estimating patient responsiveness can be incorporated into the unified statistical framework of MLM. Translating them into a unified framework makes explicit many of the assumptions underpinning existing approaches (linearity of response, heterogeneity in the timing of measures, multiple measurements). The application of patient responsiveness models in a cohort of orthopaedic patients illustrates how the SDs of baseline and change scores from existing approaches are overestimated in comparison with the MLM approach. Thresholds for defining responders from MLM are lower when based on the SD. Existing RTN and MID approaches may therefore represent a worst-case scenario with regard to the efficacy of a treatment or therapy. Conversely, thresholds based on the distribution of predicted change scores (MCII) are higher in MLM, so existing MCII thresholds could be described as a best-case scenario in comparison with the MLM approach. However, the reclassification of patients using MLM is more fundamental than simply raising or lowering the threshold for responsiveness: the implicit adjustments for measurement error and regression to the mean change which patients are defined as responding.
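A short calculation shows why single-measurement SDs overstate the spread. Assume classical measurement error: each score equals a true value plus independent error. The observed baseline variance is then the true variance plus the error variance. The observed change variance is the true change variance plus twice the error variance, because a change score carries error from both occasions. The variance components below are hypothetical. The half-SD rule is used only as an example of an SD-based threshold and is not necessarily the definition used for RTN or MID in this article.

```python
from math import sqrt

def observed_sds(var_true_baseline, var_true_change, var_error):
    """SDs of baseline and change scores computed from single measurements.

    Assumes classical, independent measurement error with variance var_error
    at each occasion, uncorrelated with the true scores.
    """
    sd_baseline = sqrt(var_true_baseline + var_error)
    # change = (true follow-up + e2) - (true baseline + e1): both errors contribute
    sd_change = sqrt(var_true_change + 2 * var_error)
    return sd_baseline, sd_change

def half_sd_threshold(sd):
    """Example SD-based threshold (half an SD); an assumption, not the article's rule."""
    return 0.5 * sd

# Hypothetical variance components: true SDs of 15 and 16, error SD of 8
obs_baseline_sd, obs_change_sd = observed_sds(225, 256, 64)
print(obs_baseline_sd, obs_change_sd)  # 17.0 and ~19.6, versus true 15 and 16
print(half_sd_threshold(obs_baseline_sd), half_sd_threshold(15))  # 8.5 versus 7.5
```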

MLMs are not a panacea for assessing patient responsiveness; however, they do highlight implicit assumptions in existing approaches and provide sensible adjustments for measurement error, regression to the mean and heterogeneity in the timing of measurements in clinical studies.

From a clinical perspective, there are clear differences in outcomes at 3 months following THR and TKR. While patients' baseline levels of pain are similar for THR and TKR, the response to surgery after TKR is smaller and less variable for all pain domains. Similarly, we have previously observed different patterns of pain in relation to pain at rest and pain on movement.60 The mechanisms underpinning these effects are unclear and require further research, but these findings emphasise the need to treat hip and knee osteoarthritis as separate disease states.

Strengths and limitations

One of the key benefits of adopting an MLM approach when defining clinically meaningful change is the improved estimation of individual change afforded by the greater flexibility of the MLM framework. Specifically, MLMs do not assume that the response is measured without error, and they adjust for regression to the mean. The trajectory of recovery is not constrained to be linear, and data from multiple measurements, with variability in the timing of those measurement occasions, can be incorporated into the model. Furthermore, assuming the underlying MLM adequately represents the true causal mechanism, its parameter estimates, SDs and SEs will be unbiased, whereas those from existing approaches may be biased.
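The adjustment for measurement error and regression to the mean can be seen in the simplest MLM, a random-intercept model. Each score is the overall mean plus a person effect (between-person variance) plus measurement error (within-person variance). Treat the variance components and overall mean as known; this is a simplification, since they are estimated in practice. The predicted person mean is then the observed person mean pulled toward the overall mean by the factor between / (between + within / n), where n is that person's number of measurements. The sketch below uses hypothetical values and hypothetical function names. The growth-curve models in this article apply the same idea to whole trajectories rather than single means.

```python
def shrinkage_factor(var_between, var_within, n_obs):
    """Weight on a person's own mean in a random-intercept model (known variances)."""
    return var_between / (var_between + var_within / n_obs)

def predicted_person_mean(scores, grand_mean, var_between, var_within):
    """Empirical-Bayes (BLUP-style) prediction of one person's true mean score."""
    n = len(scores)
    person_mean = sum(scores) / n
    lam = shrinkage_factor(var_between, var_within, n)
    # Extreme observed means are pulled back toward the grand mean,
    # more strongly when a person contributes fewer measurements
    return grand_mean + lam * (person_mean - grand_mean)

# Hypothetical: grand mean 50, equal between- and within-person variance
print(predicted_person_mean([80], 50, 100, 100))              # 65.0 (one measurement)
print(predicted_person_mean([80, 80, 80, 80], 50, 100, 100))  # 74.0 (four measurements)
```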

Furthermore, unifying the existing approaches in an MLM framework clearly shows the relationship between them. For example, RTN and MID share the same underlying model. MCII is also the same as RTN/MID if the baseline and change scores are assumed to be the same across strata of unsatisfied and satisfied patients. Similarly, the model underlying the OO approach is the same as the RTN/MID model if the two outcome trajectories and their error terms are assumed to be independent.

Despite the numerous benefits of adopting an MLM approach, it is not without limitations. MLMs are technically more demanding than existing formulations of patient responsiveness, and while there are no theoretical limits on how large or small samples must be, model convergence is not guaranteed. The need to use appropriate estimation methods38 or denominator degrees of freedom55 when calculating standard errors also requires consideration. Furthermore, it is important to perform model diagnostics to check that the model fits the data. MLM does not resolve the arbitrary placement of the thresholds that define responsiveness in existing methods. Despite the improved trajectory modelling, it is also currently unclear whether the refined definitions correlate more strongly with patient expectations, functional data, long-term self-reported outcomes or hard endpoints such as mortality and revision. Further research externally validating the classification using patient groups, expert opinion61 or functional data may demonstrate improved classification of those responding to treatment in comparison with existing methods. In addition, the need for multiple measurements largely restricts the method to research settings.

It is clear that MLMs provide considerable advantages over existing approaches to identifying patients who respond to a treatment. Consequently, the proportion of individuals not responding to treatment may be smaller than previously thought. Using the refined definitions may reduce the number of individuals misclassified as non-responders and improve the prediction of which individuals are likely to respond to treatment.

Supplementary Material

Supplementary material 1


We thank Professor Fiona Steele for her extensive comments and help preparing this manuscript. The research team acknowledges the support of the National Institute for Health Research (NIHR) through the Comprehensive Clinical Research Network.


  1. 1.
  2. 2.
  3. 3.
  4. 4.
  5. 5.
  6. 6.
  7. 7.
  8. 8.
  9. 9.
  10. 10.
  11. 11.
  12. 12.
  13. 13.
  14. 14.
  15. 15.
  16. 16.
  17. 17.
  18. 18.
  19. 19.
  20. 20.
  21. 21.
  22. 22.
  23. 23.
  24. 24.
  25. 25.
  26. 26.
  27. 27.
  28. 28.
  29. 29.
  30. 30.
  31. 31.
  32. 32.
  33. 33.
  34. 34.
  35. 35.
  36. 36.
  37. 37.
  38. 38.
  39. 39.
  40. 40.
  41. 41.
  42. 42.
  43. 43.
  44. 44.
  45. 45.
  46. 46.
  47. 47.
  48. 48.
  49. 49.
  50. 50.
  51. 51.
  52. 52.
  53. 53.
  54. 54.
  55. 55.
  56. 56.
  57. 57.
  58. 58.
  59. 59.
  60. 60.
  61. 61.


  • Contributors AS: study conception, wrote the first draft and revisions, and final approval of manuscript; VW: APEX study design, acquisition of data, drafting and review of manuscript and final approval of manuscript; EL: APEX study acquisition of data, drafting and review of manuscript and final approval of manuscript; RG-H: APEX study design, acquisition of data, drafting and review of manuscript and final approval of manuscript; JD: ACHE study design, drafting and review of manuscript and final approval of manuscript; DB: ACHE study design, drafting and review of manuscript and final approval of manuscript; AP: ACHE study design, drafting and review of manuscript and final approval of manuscript; and AWB: APEX/ACHE study design, acquisition of data, drafting and review of manuscript and final approval of manuscript.

  • Funding AS is funded by an MRC Fellowship (MR/L01226X/1) and HTA Project 11/63/01 ('ACHE'). This article presents independent research funded by the NIHR under its Programme Grants for Applied Research programme (RP-PG-0407-10070).

  • Disclaimer The views expressed in this article are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement Data are unavailable to share.
