This study has three objectives: (1) to investigate the association between body mass index (BMI) and the efficacy of primary hip replacement using a patient-reported outcome measure (PROM) with a measurement floor and ceiling; (2) to explore the performance of different methods for estimating change in PROM score following surgery, using a simulation study and real-world data with measurement floors and ceilings; and (3) to develop guidance for practising researchers on the analysis of PROMs in the presence of floor and ceiling effects.

Simulation study and prospective national medical device register.

National Register of Joint Replacement and Medical Devices.

Using a Monte Carlo simulation study and data from a national joint replacement register (162 513 patients with pre- and post-surgery PROMs), we investigate simple approaches for the analysis of outcomes with floor and ceiling effects that are measured on two occasions: single-level linear and Tobit regression (baseline-adjusted analysis of covariance, change-score analysis, post-score analysis) in addition to linear and Tobit multilevel models.

The primary outcome of interest is change in PROMs from pre-surgery to 6 months post-surgery.

Analysis of data with floor and ceiling effects using models that fail to account for these features induces substantial bias. Single-level Tobit models only correct for floor or ceiling effects when the exposure of interest is not associated with the baseline score. In observational data scenarios, only multilevel Tobit models are capable of providing unbiased inferences.

Inferences from pre-post studies that fail to account for floor and ceiling effects may induce spurious associations with substantial risk of bias. Multilevel Tobit models indicate that the efficacy of total hip replacement is independent of BMI. Restricting access to total hip replacement based on a patient's BMI cannot be supported by the data.

We use a comprehensive simulation study and a large prospective dataset to investigate the impact of floor and ceiling effects on the analysis of change in patient-reported outcome measures from pre-surgery to post-surgery.

We demonstrate the use and performance of multilevel Tobit models to estimate change in patient-reported outcome measures with floor and ceiling effects and compare them to simple analytical approaches.

We compare a range of estimators in simulation under several data generating mechanisms and compare the results to real-world data.

This is the largest and most comprehensive analysis of the effect of body mass index on the efficacy of total hip replacement and provides data which will influence the provision of hip replacement.

In many non-randomised experiments, researchers are interested in assessing how change in health status is associated with a covariate of interest. While there is much guidance available on assessing change in randomised experiments, and extensive discussion with respect to efficiency and bias,

Multilevel Tobit models (MLTMs) are now incorporated in mainstream statistical software packages, such as Stata version 15. Given their accessibility, they could arguably be used more frequently than they are. This is relevant because measurement instruments with floor and ceiling effects are omnipresent in health-related research. Examples include outcomes in health-related quality of life (eg, EQ-5D, SF-36 and SF-12), psychological well-being (eg, Hospital Anxiety and Depression Scale, Edinburgh Postnatal Depression Scale) and disease-specific measures of well-being (eg, Western Ontario and McMaster Universities Osteoarthritis Index and Oxford Hip Score (OHS) as used in patients with osteoarthritis (OA)). Despite this, there is very little guidance available on the consequences of using measurement instruments with floor or ceiling effects when attempting to make inferences about the effect of an exposure on the change (between two time points) in an outcome of interest.

In this paper, we use a Monte Carlo (MC) simulation study to compare the performance of multilevel linear and Tobit models, ordinary least squares (OLS) regression and single-level Tobit regression, with and without adjustment for baseline scores, in the analysis of change in three different non-randomised experiments and a randomised experiment. We also demonstrate the use of these models using real-world data from a large national joint replacement register.

We motivate the simulation and exemplar data analysis using an example from joint replacement research describing the association between body mass index (BMI) and the change in a disease specific patient-reported outcome measure (PROM), the OHS. The issue is contentious in the UK

We investigated the performance of four different methods of analysis, when estimating the effect of an exposure (BMI) on change in response (PROM) before and after THR with floor and ceiling effects using the Aims, Data Generating Process (DGP), Methods, Estimand, Performance approach recommended by Morris

We simulated longitudinal data of ‘well-being’ before and after surgery. We assume that ‘well-being’ is a latent, truly continuous and stable construct which is measured imperfectly by the OHS. Measurement error and floor/ceiling effects are then added to the latent construct to illustrate their consequences.
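The three layers described above (latent, measured, and measured with floor and ceiling) can be sketched as follows. This is a minimal illustration and not the authors' simulation code: the function name `observe_score`, the error SD of 3 and the example latent values are assumptions made for the example, and the limits 0 and 48 are taken from the range of the measurement instrument.

```python
import random

FLOOR, CEILING = 0.0, 48.0  # limits of the measurement instrument


def observe_score(latent, error_sd, rng, floor=FLOOR, ceiling=CEILING):
    """Return (measured, censored) versions of a latent well-being value."""
    measured = latent + rng.gauss(0.0, error_sd)   # add measurement error
    censored = min(max(measured, floor), ceiling)  # apply floor and ceiling
    return measured, censored


rng = random.Random(2024)  # fixed seed so the sketch is reproducible
for latent in (-5.0, 20.0, 47.0, 55.0):  # hypothetical latent values
    measured, censored = observe_score(latent, error_sd=3.0, rng=rng)
    print(f"latent={latent:6.1f}  measured={measured:6.2f}  censored={censored:6.2f}")
```

Latent values outside the instrument range are recorded at the nearest limit, so information about how far a patient lies beyond the floor or ceiling is lost.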

We assume the response, well-being, is a latent construct observed for each patient at each measurement occasion. The exposure, BMI, is coded as a five-level score centred on the overweight category: −2 = BMI≤18.5 kg/m^{2} (underweight), −1 = 18.5<BMI≤25 kg/m^{2} (normal), 0 = 25<BMI≤30 kg/m^{2} (overweight), 1 = 30<BMI≤35 kg/m^{2} (obese) and 2 = BMI>35 kg/m^{2} (morbidly obese).
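As a small illustration, the BMI coding can be written as a lookup function. The function name `bmi_score` is hypothetical, and the lower boundary of 18.5 kg/m^{2} for the underweight category is inferred from the adjacent category rather than stated explicitly.

```python
def bmi_score(bmi):
    """Map BMI (kg/m^2) to the five-level exposure score centred on 'overweight'."""
    if bmi <= 18.5:
        return -2  # underweight (boundary inferred from the next category)
    if bmi <= 25:
        return -1  # normal
    if bmi <= 30:
        return 0   # overweight (reference level)
    if bmi <= 35:
        return 1   # obese
    return 2       # morbidly obese


print([bmi_score(b) for b in (17.0, 22.0, 28.0, 33.0, 41.0)])  # [-2, -1, 0, 1, 2]
```

Centring the score on the overweight category means that intercept-type parameters in a model using this score refer to overweight patients.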

Under the assumption of linear change, data were simulated from a multilevel model (MLM) with a random intercept and slope, see

Graphical illustration of a multilevel random intercept and slope model used to generate data for an individual with average BMI. BMI, body mass index.
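A random intercept and slope model of this kind can be sketched in a few lines. Everything numeric below is an assumption for illustration (the fixed effects `b0`, `g0`, `b1`, `g1`, the random-effect SDs and the time coding 0, 1, 2), and the two random effects are drawn independently for simplicity, which the original data generating process may not do.

```python
import random


def simulate_patient(x, times, rng, b0=20.0, g0=-3.0, b1=10.0, g1=0.0,
                     sd_intercept=6.0, sd_slope=2.0):
    """Latent well-being trajectory for one patient with exposure score x.

    Intercept: b0 + g0*x + u0 (baseline level, shifted by the exposure).
    Slope:     b1 + g1*x + u1 (change per unit time; g1 is the interaction).
    All numeric defaults are illustrative, not the paper's values.
    """
    u0 = rng.gauss(0.0, sd_intercept)  # patient-specific intercept deviation
    u1 = rng.gauss(0.0, sd_slope)      # patient-specific slope deviation
    intercept = b0 + g0 * x + u0
    slope = b1 + g1 * x + u1
    return [intercept + slope * t for t in times]


rng = random.Random(7)
times = [0, 1, 2]  # three occasions, as in a balanced design
trajectory = simulate_patient(x=0, times=times, rng=rng)
pre_post = [trajectory[0], trajectory[-1]]  # drop the middle occasion
print(trajectory, pre_post)
```

In this parameterisation, `g0` controls baseline differences by exposure and `g1` controls whether the exposure modifies change, so setting either to zero switches the corresponding feature off.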

The observed response without floor and ceiling effects (

A response with floor and ceiling effects

See

Graphical illustration of the data generating process of the latent, measured and measured response with floor and ceiling effects. The latent response is

We compared four DGPs to illustrate a range of scenarios by manipulating

DGP 2 replicates a simple randomised trial where there is no difference between levels of the exposure at baseline (

Graphical illustration of the four DGP used to investigate the effect of floor and ceiling effects on analysis of pre-surgery and post-surgery change with BMI as an exposure. Horizontal red lines at 0 and 48 indicate floor and ceilings of the measurement instrument. BMI, body mass index; DGP, data generating process.

We conducted an MC simulation with 1000 replicated datasets, each with 10 000 patients. A balanced dataset with three data points for each individual was simulated to ensure that the linear and Tobit MLMs were identified: two data points allow estimation of baseline and change parameters but not of measurement error. The middle data point was then dropped to replicate a pre-post study design.

For data sets with three measurement occasions, a linear MLM and an MLTM that reflects the DGP were fitted to the data, see

In datasets with two measurement occasions, that is, a pre- post- study design, single-level OLS and Tobit models were fitted to the data. Tobit models were only used when floor and ceiling effects had been simulated. Three different models were explored:

1. A simple model for post-surgery well-being.

2. A simple analysis of change scores (SACS).

3. A model for change adjusted for baseline, that is, baseline-adjusted analysis of covariance (ANCOVA). This model is equivalent to an ANCOVA model for the post-score adjusted for baseline: the exposure coefficient is identical, and only the coefficient on the baseline score differs (it is reduced by one).
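The equivalence between the two ANCOVA parameterisations in item 3 follows from simple algebra and can be checked numerically. The sketch below uses made-up data and a small hand-written least-squares solver (`ols` is a hypothetical helper, not a library function). It fits the post-score and the change score on the same design matrix and compares the coefficients.

```python
import random


def ols(rows, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        a[col] = [v / a[col][col] for v in a[col]]
        for r in range(k):
            if r != col:
                a[r] = [vr - a[r][col] * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


rng = random.Random(11)
x = [i % 5 - 2 for i in range(2000)]                 # exposure score (illustrative)
pre = [20 - 3 * xi + rng.gauss(0, 6) for xi in x]    # hypothetical baseline scores
post = [10 + 0.8 * p - 1.5 * xi + rng.gauss(0, 4) for p, xi in zip(pre, x)]
design = [[1.0, p, xi] for p, xi in zip(pre, x)]     # intercept, baseline, exposure

beta_post = ols(design, post)                                   # post ~ baseline + x
beta_change = ols(design, [q - p for p, q in zip(pre, post)])   # change ~ baseline + x
print("exposure coefficients:", beta_post[2], beta_change[2])
print("baseline coefficients:", beta_post[1], beta_change[1])
```

Because change equals post minus pre, subtracting the baseline from the outcome only shifts the baseline coefficient by one and leaves the exposure coefficient untouched.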

In addition, an underidentified MLTM, equivalent to

The estimand of interest is the population average effect of the interaction between the exposure and change in slope, that is,

The performance of each method was explored in terms of bias, coverage, empirical SE, model-based SE, mean square error, relative error and relative precision.
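Several of these performance measures, together with their Monte Carlo SEs, can be computed directly from the replicate estimates. The sketch below uses the standard definitions for simulation studies. The function name `performance` and the toy inputs are assumptions, and relative error and relative precision are omitted for brevity.

```python
import math
import statistics


def performance(estimates, std_errors, truth, z=1.959964):
    """Bias, empirical SE, model SE, MSE and coverage, with Monte Carlo SEs
    for bias and coverage."""
    n = len(estimates)
    emp_se = statistics.stdev(estimates)  # SD of the estimates across replicates
    covered = [abs(e - truth) <= z * s for e, s in zip(estimates, std_errors)]
    coverage = sum(covered) / n
    return {
        "bias": statistics.fmean(estimates) - truth,
        "bias_mcse": emp_se / math.sqrt(n),
        "empirical_se": emp_se,
        "model_se": math.sqrt(statistics.fmean([s * s for s in std_errors])),
        "mse": statistics.fmean([(e - truth) ** 2 for e in estimates]),
        "coverage": coverage,
        "coverage_mcse": math.sqrt(coverage * (1 - coverage) / n),
    }


# Toy replicate results (hypothetical numbers, for illustration only).
result = performance([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], truth=2.0)
print(result)
```

With 1000 replicates and coverage close to 95%, the coverage formula gives a Monte Carlo SE of roughly 0.7 percentage points.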

Using data from the National Joint Registry (NJR), we investigated the association between BMI and a PROM, the OHS, in patients undergoing elective THR between 1 April 2003 and 22 February 2017.

The NJR commenced data collection in April 2003; at inception it was mandatory for all THRs conducted in the private sector to be entered into the NJR, and from 2011 all THR procedures in the public and private sector were required to be entered into the NJR. A recent national audit of data entered into the NJR between 2014 and 2015 estimated data capture of 95% for primary THR and 91% for revision THR.

All consenting patients undergoing THR were eligible to be included in the analysis. Patients were included if their patient history was unique and consistent, that is, it contained no duplicates or revisions recorded prior to the primary procedure and was not currently held in query by the submitting unit. Due to the requirement for reliable date information, patients who were indicated to have died prior to undergoing a procedure, were more than 110 years of age, had undergone a procedure prior to their date of birth, or received a procedure prior to 2003 were excluded from the analysis. Only primary THRs where the primary indication for operation was OA, with unique prosthesis combinations, were included in the analysis. All THRs with metal-on-metal bearing combinations were excluded from the analysis due to the exceptionally high failure rate in this group.

See

The primary exposure of interest in this study is BMI. BMI was introduced into the second ‘Minimal Data Set’ in 2004. Patients with a BMI between 10 and 60 kg/m^{2} were included in the analysis. BMI measures were excluded as implausible if height and weight were less than 130 cm and 30 kg, respectively. See

The primary outcome of interest in this study is change in OHS after surgery. Linked National PROMs were first available in 2009, see

Preoperative confounding factors were thematically organised into groups: (1) Patient factors included sex, American Society of Anesthesiologists grade and operation funder; (2) Operation factors included fixation, approach, patient position during surgery, anaesthetic type, thromboprophylaxis regimen, bearing and year of primary THR; (3) The setting of the treatment episode (ie, private or National Health Service hospital); (4) Consultant-based factors included the training status of the primary surgeon performing the operation and (5) Deprivation factors were based on the English indices of multiple deprivation (an area-based index of deprivation).

Means, SD and IQR points were used to describe continuous variables. Frequencies and percentages were used to describe categorical variables.

The association between BMI and change in PROMs score was investigated using the same single-level methods and the ML Tobit model with constrained error variances described in the simulation study as an exemplar. In addition, we conducted more comprehensive analyses using restricted cubic splines (RCS) to model the BMI association in the ML Tobit model with constrained error variance and in the single-level linear and Tobit SACS, ANCOVA and post-score models. In the ML Tobit model, BMI was modelled with RCS at baseline and in its interaction with time. Correspondingly, we adjusted OHS for patient and deprivation confounding factors at baseline, and for operation, setting and consultant factors through an interaction with time, that is, operative factors and settings influence the change in outcome but not the baseline response. In single-level models, the effect of BMI was modelled using RCS and adjusted for confounding factors using standard regression approaches.
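For readers unfamiliar with RCS, the sketch below builds a restricted cubic spline basis using the commonly used truncated power parameterisation, which is constrained to be linear beyond the outer knots. The number and location of knots used in the analysis are not stated here, so the four knots below are purely hypothetical, as is the function name `rcs_basis`.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline terms for x: the linear term plus k-2 nonlinear terms."""
    def cube_plus(v):
        return v ** 3 if v > 0 else 0.0

    t = sorted(knots)
    k = len(t)
    scale = (t[-1] - t[0]) ** 2  # keeps the nonlinear terms on the scale of x
    gap = t[-1] - t[-2]
    terms = [float(x)]
    for j in range(k - 2):
        term = (cube_plus(x - t[j])
                - cube_plus(x - t[-2]) * (t[-1] - t[j]) / gap
                + cube_plus(x - t[-1]) * (t[-2] - t[j]) / gap)
        terms.append(term / scale)
    return terms


knots = [20.0, 25.0, 30.0, 40.0]  # hypothetical knot locations in kg/m^2
for bmi in (18.0, 28.0, 45.0):
    print(bmi, [round(v, 4) for v in rcs_basis(bmi, knots)])
```

The resulting columns would enter a regression model in place of a single linear BMI term. In a multilevel model, the same columns can also be interacted with time to allow the shape of the association to differ between baseline and change.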

Due to the method of data collection in the national PROMS programme, item non-response is masked. De facto mean imputation of up to two missing items in the OHS occurred automatically. In addition, despite valid values appearing with individual OHS items, if the questionnaire was marked as ‘not complete’, implausible overall scores were obtained. For simplicity, only patients with complete pre-operative and post-operative PROMS were used in the analysis. BMI is missing in a substantial proportion of the cohort. Patients prior to 2004 did not have BMI recorded, and the proportion of patients with missing BMI in 2004 is large. In 2009, ~40% of patients did not have BMI recorded; this reduced year on year and in 2016 was ~18% of eligible patients.

For pedagogical simplicity, we use complete-case analyses throughout.

Patient representatives sit on the committee structure of the NJR. The research priorities of the NJR are identified by this committee structure and approved by the patient representatives. Patients were not involved in the setting of the research question or the outcome measures, nor were they involved in designing or implementing this work or interpretation of the results. We are unable to disseminate results of this study directly to study participants due to the anonymous nature of the data. We plan to disseminate our findings to the NJR, via their communications team, to relevant individuals with regard to the provision of joint replacement and to the general population through the local and national press.

Plot of 1000 estimates by each DGP, for each method of analysis. Within each method, the vertical axis is the repetition number of each simulated dataset. The white pipe symbol is the average of the estimates. ANCOVA, analysis of covariance; DGP, data generating process; MLM, multilevel model; OLS, ordinary least squares; SACS, Simple Analysis of Change Scores.

Simulation estimates of performance characteristics (mean, with Monte Carlo SE in parentheses) of different models under each DGP.

Model | DGP 1 | DGP 2 | DGP 3 | DGP 4

Estimate
MLM | 1.1 (0.0024) | −1.36 (0.0023) | −0.26 (0.0024) | −0.23 (0.0025)
ML Tobit | −0.0056 (0.0038) | −3.03 (0.0037) | −3.04 (0.0037) | −3.01 (0.0037)
ML Tobit | −0.13 (0.0044) | −3.01 (0.0042) | −3.13 (0.0046) | −2.57 (0.0038)
ML Tobit | −0.093 (0.0044) | −3.09 (0.0042) | −3.14 (0.0045) | −3.05 (0.0041)
ML Tobit | −0.057 (0.0044) | −3.09 (0.0042) | −3.12 (0.0045) | −3.11 (0.0043)
ML Tobit | −0.04 (0.0044) | −3.08 (0.0042) | −3.11 (0.0045) | −3.12 (0.0044)
ML Tobit | −0.031 (0.0044) | −3.07 (0.0042) | −3.1 (0.0045) | −3.11 (0.0044)
ML Tobit | −0.026 (0.0044) | −3.07 (0.0042) | −3.09 (0.0045) | −3.1 (0.0045)
OLS SACS | 1.1 (0.0024) | −1.36 (0.0023) | −0.26 (0.0024) | −0.23 (0.0025)
OLS ANCOVA | −0.31 (0.0022) | −1.36 (0.002) | −1.69 (0.0021) | −2.43 (0.0019)
OLS post | −1.36 (0.0023) | −1.36 (0.0022) | −2.72 (0.0023) | −2.69 (0.0018)
Tobit SACS | −0.72 (0.0044) | −3.04 (0.0041) | −3.78 (0.0044) | −4.83 (0.0048)
Tobit ANCOVA | −0.5 (0.0046) | −3.09 (0.0042) | −3.61 (0.0045) | −5.5 (0.0042)
Tobit post | −3.07 (0.0049) | −3.06 (0.0047) | −6.13 (0.005) | −6.14 (0.004)

Coverage (%)
MLM | 0 (0) | 0 (0) | 0 (0) | 0 (0)
ML Tobit | 94.8 (0.7) | 95.3 (0.67) | 93.9 (0.76) | 95.4 (0.66)
ML Tobit | 67.5 (1.48) | 86.7 (1.07) | 71.6 (1.43) | 4.6 (0.66)
ML Tobit | 83.5 (1.17) | 85 (1.13) | 77.4 (1.32) | 92.2 (0.85)
ML Tobit | 91.1 (0.9) | 87.7 (1.04) | 85.7 (1.11) | 86.8 (1.07)
ML Tobit | 92.7 (0.82) | 90 (0.95) | 88.2 (1.02) | 87.1 (1.06)
ML Tobit | 93.4 (0.79) | 91.6 (0.88) | 89.5 (0.97) | 88.2 (1.02)
ML Tobit | 93.6 (0.77) | 91.9 (0.86) | 91.1 (0.9) | 88.9 (0.99)
OLS SACS | 0 (0) | 0 (0) | 0 (0) | 0 (0)
OLS ANCOVA | 0.8 (0.28) | 0 (0) | 0 (0) | 0 (0)
OLS post | 0 (0) | 0 (0) | 2.4 (0.48) | 0 (0)
Tobit SACS | 0.1 (0.1) | 94.4 (0.73) | 0 (0) | 0 (0)
Tobit ANCOVA | 7.2 (0.82) | 89.7 (0.96) | 1.2 (0.34) | 0 (0)
Tobit post | 0 (0) | 93.3 (0.79) | 0 (0) | 0 (0)

Model SE
MLM | 0.074 (2E-05) | 0.074 (2E-05) | 0.077 (2E-05) | 0.078 (2E-05)
ML Tobit | 0.12 (3E-05) | 0.12 (3E-05) | 0.12 (3E-05) | 0.12 (3E-05)
ML Tobit | 0.1 (3E-05) | 0.1 (3E-05) | 0.11 (3E-05) | 0.11 (4E-05)
ML Tobit | 0.12 (3E-05) | 0.12 (3E-05) | 0.13 (4E-05) | 0.12 (3E-05)
ML Tobit | 0.13 (4E-05) | 0.13 (4E-05) | 0.14 (4E-05) | 0.13 (3E-05)
ML Tobit | 0.13 (5E-05) | 0.13 (5E-05) | 0.14 (5E-05) | 0.14 (4E-05)
ML Tobit | 0.14 (5E-05) | 0.13 (5E-05) | 0.14 (6E-05) | 0.14 (5E-05)
ML Tobit | 0.14 (5E-05) | 0.13 (5E-05) | 0.15 (6E-05) | 0.14 (5E-05)
OLS SACS | 0.074 (2E-05) | 0.074 (2E-05) | 0.077 (2E-05) | 0.078 (2E-05)
OLS ANCOVA | 0.07 (2E-05) | 0.065 (2E-05) | 0.072 (2E-05) | 0.059 (2E-05)
OLS post | 0.07 (2E-05) | 0.07 (2E-05) | 0.072 (2E-05) | 0.055 (2E-05)
Tobit SACS | 0.13 (5E-05) | 0.13 (5E-05) | 0.14 (6E-05) | 0.15 (6E-05)
Tobit ANCOVA | 0.14 (6E-05) | 0.14 (6E-05) | 0.15 (6E-05) | 0.13 (6E-05)
Tobit post | 0.15 (7E-05) | 0.15 (6E-05) | 0.16 (7E-05) | 0.13 (6E-05)

ANCOVA, analysis of covariance; DGP, data generating process; MLM, multilevel model; OLS, ordinary least squares; SACS, Simple Analysis of Change Scores.

Plot of 1000 estimated SEs by each DGP, for each method of analysis. Within each method, the vertical axis is the repetition number of each simulated dataset. The white pipe symbol is the average of the SEs. ANCOVA, analysis of covariance; DGP, data generating process; MLM, multilevel model; OLS, ordinary least squares; SACS, Simple Analysis of Change Scores.

Following application of inclusion and exclusion criteria, there were 162 513 patients with pre-operative and post-operative OHS available for analysis.

Estimates and 95% CIs from constrained ML Tobit models and single-level OLS and Tobit models (ANCOVA, SACS and post-score). ANCOVA, analysis of covariance; ML, multilevel; NJR, National Joint Registry; OLS, ordinary least squares; SACS, Simple Analysis of Change Scores.

Estimates and 95% CIs of baseline and change in Oxford Hip Score (OHS) pre- post-surgery and its association with body mass index (BMI) adjusted for confounding.

Estimates and 95% CIs of single-level approaches to the analysis of change in Oxford Hip Score pre- post-surgery and its association with body mass index (BMI) adjusted for confounding. ANCOVA, analysis of covariance; OLS, ordinary least squares; SACS, Simple Analysis of Change Scores.

A single-level OLS SACS approach suggests a positive association between BMI and change in OHS, that is, patients with greater BMIs have greater gains in well-being, whereas OLS ANCOVA and OLS post-score models suggest a negative association. The single-level OLS post-score estimate is approximately 50% greater than the OLS ANCOVA estimate. All single-level Tobit models suggest a negative association between BMI and OHS. The Tobit SACS estimate is the smallest, with both the Tobit ANCOVA and post-score models estimating substantially larger effects. The constrained ML Tobit models all provide equivalent (to two decimal places) results, suggesting there is no effect of BMI on the change in OHS between pre-surgery and post-surgery, see

Crude analyses, which model BMI using RCS, illustrate a complex association between BMI and pre-operative OHS. A ~4.5 point reduction in OHS is observed as BMI increases from 20 to 50 kg/m^{2}. However, the change in OHS between pre-surgery and post-surgery is very weakly associated with pre-operative BMI, with individuals with BMIs <25 kg/m^{2} and >45 kg/m^{2} receiving modestly greater gains than patients with an average BMI of 28 kg/m^{2}. With less than half a unit of variation across the range of BMI observed in the cohort, the difference falls well below anything that could be considered clinically meaningful, see

The results of the simulation study clearly illustrate that, in the presence of floor and ceiling effects, neither baseline adjustment nor SACS will yield unbiased estimates of the effect of an exposure on the outcome of interest. Single-level modifications to account for floor and ceiling effects, such as the Tobit model, only work in the context of a randomised trial, that is, when there is no difference between baseline values by BMI. Importantly, single-level methods, both OLS and Tobit models, induce significant bias, with negligible coverage, when

The simulation study is consistent with lay intuition about the consequences of floor and ceiling effects. Assuming we accept that either the MLM or the OLS change analysis is appropriate in the absence of floor and ceiling effects, DGP 1 illustrates that when there is no effect of obesity on the efficacy of surgery, the addition of an artificial ceiling compresses the gains of individuals towards the top of the distribution. Due to the baseline association between obesity and well-being, underweight individuals tend to have gains that are more compressed than those of obese individuals. This inevitably induces bias, and provides evidence of a difference in pre-surgery to post-surgery change in well-being by BMI where none actually exists. Similarly, in DGP 2 (no baseline differences), where there is truly an interaction effect, ceiling effects also lead to biased estimates. The DGP used in the simulation assumes underweight individuals benefit more from surgery than heavier individuals, which results in a fanning out of the trajectories. Underweight individuals have truly greater gains than obese individuals, but these gains are underestimated due to the ceiling effect, resulting in bias towards the null. In DGPs 3 and 4 (baseline differences by BMI, and an interaction between BMI and change), we see a more extreme pattern of results compared with DGP 2, but overall consistency with the expected compression of gains among individuals with higher starting values.
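The intuition for DGP 1 can be reproduced with a toy simulation. The sketch below is not the paper's data generating process: the baseline mean of 20, the baseline slope of −3 per exposure level, the constant true gain of 20 points and the SDs are all assumed values chosen so that the ceiling at 48 binds mainly for low exposure scores. It compares the slope of change on the exposure score before and after applying the floor and ceiling.

```python
import random


def slope(x, y):
    """Simple least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def clip(v, lo=0.0, hi=48.0):
    """Apply the floor and ceiling of the instrument."""
    return min(max(v, lo), hi)


rng = random.Random(42)
n = 20000
x = [i % 5 - 2 for i in range(n)]                      # exposure score, -2 .. 2
latent = [20 - 3 * xi + rng.gauss(0, 6) for xi in x]   # baseline falls with the score
pre = [u + rng.gauss(0, 3) for u in latent]            # measured pre-surgery score
post = [u + 20 + rng.gauss(0, 3) for u in latent]      # same true gain for everyone

true_change = [b - a for a, b in zip(pre, post)]
seen_change = [clip(b) - clip(a) for a, b in zip(pre, post)]
slope_true = slope(x, true_change)   # close to zero: no true interaction
slope_seen = slope(x, seen_change)   # positive: spurious effect from the ceiling
print(round(slope_true, 3), round(slope_seen, 3))
```

Although every simulated patient has the same expected gain, the change-score slope computed from the censored scores is positive, mirroring the direction of the OLS SACS bias under DGP 1.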

In the exemplar analysis of NJR data, the pattern of results is very similar to that of DGP 1 of the simulation, suggesting that results of the simulation are likely to be replicated in real-world datasets. The more comprehensive analysis of the NJR data, using RCS to reflect the continuous nature of BMI, illustrates where the effects estimated by misspecified single-level models arise from. The ML Tobit model shows a strong negative association between BMI and pre-operative OHS; failing to account for these baseline differences appropriately when attempting to estimate change leads to variation at baseline being incorporated in the estimate of change. Furthermore, the ability to adjust both baseline and post-surgical OHS for their pronounced floor and ceiling effects, respectively, leads to unbiased estimates of the effect of interest. Unfortunately, due to the constraints on the level 1 variance, interpretation of the random effects is difficult, as they depend on the magnitude of the variance applied in the constraint, see

Floors and ceilings in PROM instruments have somewhat predictable effects on estimated coefficients from standard OLS models that do not adjust for floor or ceiling effects, assuming the true underlying association is known. As this is rarely the case, it is important to consider a variety of different DGPs to explore the likely impact on an analysis. It is also important to consider the validity of the assumptions underpinning the Tobit model, that is, that the latent response is truly continuous and that there is a true ceiling just beyond the range of the measurement being used.
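To make these assumptions concrete, the sketch below writes out the log-likelihood of a single-level, two-limit Tobit model with a normally distributed latent response. It is a minimal illustration, not the estimation routine used in the analyses: the function names are hypothetical, the limits 0 and 48 match the instrument range, and observations exactly at a limit are treated as censored.

```python
import math


def norm_logpdf(z):
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)


def norm_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def tobit_loglik(y, mu, sigma, floor=0.0, ceiling=48.0):
    """Log-likelihood of observed scores y given latent means mu and SD sigma."""
    total = 0.0
    for yi, mi in zip(y, mu):
        if yi <= floor:      # at the floor: P(latent <= floor)
            total += math.log(norm_cdf((floor - mi) / sigma))
        elif yi >= ceiling:  # at the ceiling: P(latent >= ceiling)
            total += math.log(norm_cdf((mi - ceiling) / sigma))
        else:                # interior: ordinary normal density
            total += norm_logpdf((yi - mi) / sigma) - math.log(sigma)
    return total


# A score at the ceiling is better explained by a latent mean above the ceiling.
print(tobit_loglik([48.0], [50.0], 5.0), tobit_loglik([48.0], [40.0], 5.0))
```

Censored observations contribute a tail probability rather than a density, which is how the model allows the latent mean to lie beyond the observable range. A multilevel Tobit model additionally integrates this likelihood over patient-level random effects.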

Single-level Tobit models do not ameliorate floor and ceiling effects in SACS. However, ML Tobit models appear to recover the effects of interest under specific assumptions. The analyses of pre- post-designs require further constraints to ensure models are fully identified. The difference between analytical approaches can profoundly alter the interpretation of the model parameters, and this may have serious consequences if used to generate policy inappropriately. For example, inappropriate analyses that fail to consider DGP appropriately may lead to the restriction of joint replacement for overweight or obese patients.

When designing a study to investigate the effect of an exposure on change in health status, it would be preferable to use a measurement instrument that does not have floor or ceiling effects, as inference is less complicated and design trumps analysis in most scenarios. If the use of a measurement instrument with floor and ceiling effects is unavoidable, it is preferable to collect data at three time points, which ensures models are fully identified and alleviates the need to constrain the level 1 variance; again, design trumps analysis. If retrospective analysis of pre-post datasets is required, it appears that using an ML Tobit model with constrained level 1 error variance would be preferable to single-level approaches.

Broadly speaking the analyses of this simulation are in agreement with the work of Glymour

We thank the patients and staff of all the hospitals who have contributed data to the National Joint Registry. We are grateful to the Healthcare Quality Improvement Partnership, the National Joint Registry Steering Committee, and staff at the National Joint Registry for facilitating this work. The views expressed represent those of the authors and do not necessarily reflect those of the National Joint Registry Steering Committee or Healthcare Quality Improvement Partnership, who do not vouch for how the information is presented.

AS, MRW, AJ, AJM, AWB and YB-S were responsible for the study design. AS conducted the data analysis. AS, MRW, AJ, AJM, AWB and YB-S were responsible for interpreting the data, and prepared, edited and approved the final manuscript.

AS is funded by an MRC Strategic Skills Fellowship MR/L01226X/1. This study was supported by the NIHR Biomedical Research Centre at University Hospitals Bristol NHS Foundation Trust and the University of Bristol.


Patients and/or the public were involved in the design, or conduct, or reporting, or dissemination plans of this research. Refer to the Methods section for further details.

Not required.

Analysis of pseudo-anonymised NJR data is considered secondary use of clinical registry data; under HRA guidance, this does not require formal ethical approval. However, all research projects are internally approved by the NJR. The full NJR privacy notice can be found online (

Not commissioned; externally peer reviewed.

Data may be obtained from a third party and are not publicly available. Access to the data can be made via research requests to the National Joint Registry of England, Wales, Northern Ireland and the Isle of Man. Full details can be found at