Association between surgical volume and failure of primary total hip replacement in England and Wales: findings from a prospective national joint replacement register

Objective To investigate the association between the volume of total hip arthroplasty (THA) in the previous year, both between consultants and within the same consultant, and the hazard of revision, using multilevel survival models.
Design Prospective cohort study using data from a national joint replacement register.
Setting Elective THA across all private and public centres in England and Wales between April 2003 and February 2017.
Participants Patients aged 50 years or more undergoing THA for osteoarthritis.
Intervention The volume of THA conducted in the 365 days preceding the index procedure.
Main outcome and measure Revision surgery (excision, addition or replacement) of a primary THA.
Results Of the 579 858 patients undergoing primary THA (mean baseline age 69.8 years (SD 10.2)), 61.1% were women. Multilevel survival models gave differing results for between-consultant and within-consultant effects. There was a strong volume–revision association between consultants, with a near-linear 43.3% (95% CI 29.1% to 57.4%) reduction in the risk of revision across annual consultant volumes from 1 to 200 procedures. Changes in an individual surgeon's (within-consultant) case volume showed no evidence of an association with revision.
Conclusion Separation of between-consultant and within-consultant effects of surgical volume reveals how volume contributes to the risk of revision after THA. The lack of association within consultants suggests that individual changes to consultant volume alone will have little effect on outcomes following THA. These novel findings provide strong evidence supporting the practice of specialisation of hip arthroplasty. They do not support the practice of low-volume consultants increasing their personal volume, as it is unlikely their results would improve if this is the only change. Limiting the exposure of patients to consultants with low volumes of THA, and greater utilisation of centres with higher-volume surgeons with better outcomes, may be beneficial to patients.


Strengths and limitations of this study
• This is the largest study in the world to explore the association between surgical volume and outcomes in total hip arthroplasty.
• We uniquely calculate a time-varying exposure of surgical volume.
• We differentiate between- and within-consultant effects using a multi-level Weibull survival model.
• The effect of volume is modelled continuously using restricted cubic splines.
Differentiating between- and within-consultant effects is crucial to interpreting the data. A between-consultant effect is essentially a cross-sectional analysis that compares the performance of one consultant against another and is highly likely to be confounded by centre-level effects. [16] A within-consultant effect is based on a time series of individual data and compares changes of volume across time within the same consultant. Correspondingly, within-effects can be interpreted more strongly, as the effect of changing a consultant's personal volume, assuming centre-level factors remain relatively constant over the short-term analysis period.
The concept of between- and within-effects is well known in epidemiology, and the ecological fallacy is one example. For example, in a standard (single-level) regression analysis, we may observe a positive association between red meat consumption and life expectancy across several countries with differing levels of development. A between- and within-decomposition would reveal a positive between-country correlation, which is explained by the level of development, and a negative within-country association, i.e. individuals within a country who eat more red meat have a lower life expectancy. The decomposition of the volume effect into a between- and within-effect is similar, i.e. the between-consultant effect is explained by factors, other than volume, which are intrinsic to those consultants and the hospital where they are based ("centre effects"), whereas the within-consultant effect represents the consequence of consultants individually changing their personal volume, assuming that other factors remain constant.
In order to facilitate a between- and within-effects analysis, consultant volume needs to be assessed continuously across time. Allowing volume to vary over time is computationally intensive, but responds to variation in demand and capacity to deliver arthroplasty, and to specialisation or diversification of professional practice, and is in contrast to previous approaches. [1,4] Additionally, as consultant volume changes, dichotomising the data using arbitrary thresholds, e.g. 12 operations per year, [3] is difficult to justify, as a consultant's status will vary depending upon when they are observed. Furthermore, by avoiding dichotomisation, the interpretation of the results is less likely to be distorted by the arbitrary placement of a threshold. [17,18] The aim of this research is to investigate the between- and within-consultant (surgeon) effect of the volume of primary total hip arthroplasty (THA) on the risk of subsequent revision.

Data source
The NJR commenced data collection in April 2003; at inception, it was mandatory for all THA conducted in the private sector to be entered into the NJR, and from 2011 all THA procedures in the public and private sector were required to be entered into the NJR. A recent national audit of data entered into the NJR between 2014 and 2015 estimated data capture of 95% for primary THA and 91% for revision THA.

Patient and public involvement
Patient representatives sit on the committee structure of the National Joint Registry. The research priorities of the National Joint Registry are identified by this committee structure and approved by the patient representatives. Patients were not involved in the setting of the research question or the outcome measures, nor were they involved in designing or implementing this work or interpretation of the results. We are unable to disseminate results of this study directly to study participants due to the anonymous nature of the data. We plan to disseminate our findings to the National Joint Registry and, via their communications team, to those relevant to the provision of joint replacement and to the general population through the local and national press.

Confounding factors
Confounding factors were thematically organised into 5 groups: 1) Patient factors: Age, sex, American Society of Anesthesiologists (ASA) grade, and operation funder.

Statistical analyses
Means, standard deviations and interquartile points were used to describe continuous variables. Frequencies and percentages were used to describe categorical variables. The associations between consultant volume in the 365 days preceding the index procedure and all-cause revision were explored using multi-level parametric (Weibull) survival models.
Between- and within-effects are decomposed using a process known as group mean centring. [21,22] Group mean centring is the process of creating two new variables from the primary exposure. The first variable is the consultant-specific mean volume, and the second variable is the deviation of a consultant's volume for a given procedure from their personal average (i.e. consultant-mean centred volume). The between-effect is estimated using the coefficient of a consultant's mean volume, whereas the within-effect is the coefficient based on the consultant's deviation from their average. Continuous variables, including the volume effect, were modelled using orthogonalized Restricted Cubic Splines (RCS). We iteratively varied the number of knot points and placement strategies in unadjusted models to optimize fit. The most parsimonious specification of RCS was selected using the Akaike Information Criterion (AIC). [23] Confounding adjustment was conducted incrementally, introducing patient, operation, centre, surgeon and, finally, deprivation confounding variable groups. The effect of confounding adjustment on the primary exposure of interest was explored at each stage of the model building process. All modelling was conducted in Stata 15.1. [24]

Missing data
Given the large data set and small fraction of incomplete cases among the observed (FICO), any improvement in efficiency from multiple imputation is likely to be negligible, and a complete-cases analysis would provide unbiased results. [25,26] Therefore, we have assumed the reason for missingness is independent of both the outcome and primary exposure of interest.

Summary statistics of consultant volume by confounding factors at the index procedure are listed in Supplementary Table 4. Higher volume consultants were observed to treat patients who were younger, with lower ASA grade, who privately funded their THA, and who received uncemented or hybrid prostheses more frequently. They were also more likely to use a posterior approach, place patients in the lateral position, work in the private sector, predominantly perform hip arthroplasty, and treat patients from more deprived areas.

Discussion
We provide novel insights into the volume-outcome relationship in 579,858 elective THA patients, using a between- and within-decomposition to analyse the effects of consultant volume on revision. Between different consultants, the volume of arthroplasty in the previous year is associated with a near-linear 49% and 43% reduction in the hazard of revision THA between 1 and 200 procedures, in crude and fully adjusted models respectively. Within the same consultant, we demonstrate that there is no evidence of an association between volume of THA in the previous year and risk of revision.
Uniquely, we use a time-varying volume specification that facilitates the decomposition of between- and within-consultant effects. We suggest the within-consultant effect is much closer to the causal interpretation desired by many policymakers, and failure of research to recognise the difference between between- and within-effects may lead to erroneous policy decisions and unintended consequences.
We demonstrate that optimal between-consultant results are reached at a consultant volume of approximately 200 procedures in the previous year. We suggest these better outcomes are not causally related to volume, but are rather due to unmeasured factors such as centre effects.
There is no evidence to suggest that consultants should change their personal volume in the hope of improving their outcomes, or that there is an arbitrary threshold above which outcomes become good.
Whilst the results appear to contradict previous research, the differences can be explained by the method of analysis (single-level vs multi-level models) and the interpretation of results (separation of between- and within-consultant volume effects).
Single-level analyses assume that all procedures are independent of one another, so that a low-volume consultant would achieve the results of a high-volume consultant if they could instantly increase their personal volume. The interpretation of a single-level model is similar to that of the between-consultant effect. Whilst this may be an attractive interpretation for policymakers, it fails to recognise the complexity within the data and processes observed, and that there are many factors, intrinsic to each consultant and the centre or centres in which they work, that predispose them to be either low- or high-volume surgeons, e.g. fellowship training, threshold for revision, or unit organisation.
This study has a number of strengths and limitations. Strengths include: 1) Our unique decomposition of between- and within-consultant effects, which we believe provides a more actionable interpretation for policy makers. 2) Time-varying consultant volume, which is independent of the index procedure and allows for stronger inferences. 3) OA was the only indication for arthroplasty, which we believe represents a "best case scenario", and the volume effect will only be attenuated by the inclusion of other diagnoses. 4) We demonstrate the use of RCS to model volume effects, which ensures flexibility and a smooth continuous function, emphasising the lack of any threshold in the volume effect. 5) The study is 10 times larger than any other published study on this topic, [1-6] with a follow-up period of more than 13 years. 6) We have conducted extensive case-mix adjustment and illustrate that both between- and within-effects are insensitive to our measured confounding factors.
Despite the many strengths, this study has a number of limitations. 1) The analysis and decomposition of between- and within-consultant effects is more complex than traditional analyses and requires careful interpretation. 2) Despite the independent nature of the volume of arthroplasty calculation prior to the index procedure, we still see anticipated associations, i.e. younger patients, patients with lower ASA scores, patients receiving uncemented implants and patients operated on in the private sector all tend to be treated by higher-volume consultants. Consultants specialising in THA and working principally within confounders. [27] 3) The use of a single indication for arthroplasty, namely OA, may limit the generalisability of results, particularly in regard to Black or Asian ethnicities, where OA is not as dominant an indication for THA. [28] 4) The use of RCS requires extensive sensitivity analyses to ensure knot points are placed optimally and that results are not sensitive to knot placement. 5) Calculation of the time-varying volume specification of THA in the previous year is computationally intensive and requires significant parallelisation before analyses can be started. 6) Our covariates are unlikely to capture important centre differences in staffing, organization and policy in the management of THAs.
We suggest the within-consultant effect from the multi-level regression is much closer to the causal interpretation required by consultants, patients and policymakers, i.e. what is the effect of changes in personal volume on the hazard of revision THA? This is not to say the between-effect is not of interest to policy makers, but to say that the between-effect suggests that there are intrinsic differences between high- and low-volume consultants, where higher-volume consultants tend to have better outcomes, but these differences cannot be attributed to volume per se.
Understanding variability between units independent of patient case mix has been explored in other registries such as the UK Renal Registry. [16]

Conclusion
In summary, using data from the largest arthroplasty register in the world, we have demonstrated that there is no within-consultant association between surgical volume in the previous year and the risk of revision in patients undergoing primary THA for OA. By contrast, there is strong evidence to suggest that higher-volume consultants tend to have better outcomes, for reasons that are unlikely to be due to the volume of arthroplasty in the previous year per se. The results from this study have profound implications for quality improvement within healthcare. Encouraging consultants to undertake a minimum number of procedures under the guise of raising standards could be counterproductive and may only serve to expose patients to increased risk of revision by low-volume or previously low-volume consultants. Centralisation and specialisation of THA in consultants who, for reasons other than volume, can undertake a greater number of procedures is likely to benefit patients and reduce the revision burden overall.

Design
Prospective cohort study using data from a national joint replacement register.

Setting
Elective total hip arthroplasty (THA) across all private and public centres in England and Wales between April 2003 and February 2017.

Participants
Patients aged 50 years or more undergoing THA for osteoarthritis.

Intervention
The volume of THA conducted in the 365 days preceding the index procedure.

Main Outcome and measure
Revision surgery (excision, addition or replacement) of a primary THA.

Conclusion
Separation of between- and within-consultant effects of surgical volume reveals how volume contributes to the risk of revision after THA. The lack of association within consultants suggests that individual changes to consultant volume alone will have little effect on outcomes following THA.
These novel findings provide strong evidence supporting the practice of specialisation of hip arthroplasty. They do not support the practice of low-volume consultants increasing their personal volume, as it is unlikely their results would improve if this is the only change.
Limiting the exposure of patients to consultants with low volumes of THA, and greater utilization of centres with higher-volume surgeons with better outcomes, may be beneficial to patients.

Strengths and limitations of this study
• This is the largest study in the world to explore the association between surgical volume and outcomes in total hip arthroplasty.
• We uniquely calculate a time-varying exposure of surgical volume.
• We differentiate between- and within-consultant effects using a multi-level Weibull survival model.
• The effect of volume is modelled continuously using restricted cubic splines.
• We are unable to affirm causality due to the observational nature of the data.

Funding Statement
AS is funded by an MRC Strategic Skills Fellowship MR/L01226X/1

Competing interests statement
We declare no competing interests

Keywords
Total Hip Arthroplasty, Volume, Revision, Within surgeon effects, Between surgeon effects.

Introduction
Centralisation and specialisation in medical care are advocated to optimise a theorised volume-outcome relationship. In arthroplasty, the volume-outcome relationship has been investigated with respect to outcomes including surgical revision, [1-6] mortality, [1,6-9] patient reported outcome measures (PROMs), [10] and complications, [6,11-14] where volume is measured either by surgeon [1-6] or hospital annual volume. [15] Given the technical requirements of arthroplasty, a strong argument for specialisation based on surgical volume and the risk of revision arthroplasty would exist if volume were causally related to outcome. The evidence to support this assertion is surprisingly sparse and has methodological limitations. [1-6] The principal limitation is the failure to distinguish between- and within-consultant effects.
Differentiating between- and within-consultant effects is crucial to interpreting the data. A between-consultant effect is essentially a cross-sectional analysis that compares the performance of one consultant against another and is highly likely to be confounded by centre-level effects. [16] A within-consultant effect is based on individual time series data and compares changes of volume across time within the same consultant. Correspondingly, within-effects can be interpreted more strongly, as the effect of changing a consultant's personal volume, assuming centre-level factors remain relatively constant over the short-term analysis period.
The concept of between- and within-effects is well known in epidemiology, and the ecological fallacy is one example. For example, in a standard (single-level) regression analysis, we may observe a positive association between red meat consumption and life expectancy across several countries with differing levels of development. A between- and within-decomposition would reveal a positive between-country correlation, which is explained by the level of development, and a negative within-country association, i.e. individuals within a country who eat more red meat have a lower life expectancy. The decomposition of the volume effect into a between- and within-effect is similar, i.e. the between-consultant effect is explained by factors, other than volume, which are intrinsic to those consultants and the hospital where they are based ("centre effects"), whereas the within-consultant effect represents the consequence of consultants individually changing their personal volume, assuming that other factors remain constant.
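The between- and within-decomposition described above can be illustrated with a small simulation. This is a minimal sketch and not the analysis performed in this study: the group structure, the effect sizes and the names `group_level`, `x` and `y` are assumptions chosen for illustration, and ordinary least squares stands in for the multilevel survival model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_groups, n_per_group = 50, 200

# Hypothetical group-level factor (e.g. level of development of a country).
group_level = rng.normal(0.0, 1.0, size=(n_groups, 1))

# Exposure rises with the group-level factor, plus individual variation.
x = 2.0 * group_level + rng.normal(0.0, 1.0, size=(n_groups, n_per_group))

# Outcome rises with the group-level factor but falls with individual exposure.
y = 3.0 * group_level - 0.5 * x + rng.normal(0.0, 1.0, size=(n_groups, n_per_group))

# Group mean centring: split the exposure into a group mean and a deviation.
x_between = np.broadcast_to(x.mean(axis=1, keepdims=True), x.shape)
x_within = x - x_between


def ols_slopes(columns, outcome):
    """Least-squares slopes (intercept dropped) of outcome on the given columns."""
    design = np.column_stack([np.ones(outcome.size)] + [c.ravel() for c in columns])
    coefficients, *_ = np.linalg.lstsq(design, outcome.ravel(), rcond=None)
    return coefficients[1:]


# Single-level model: one slope for the raw exposure.
(pooled_slope,) = ols_slopes([x], y)
# Decomposed model: separate slopes for the group mean and the deviation.
between_slope, within_slope = ols_slopes([x_between, x_within], y)

print(f"pooled (single-level) slope: {pooled_slope:+.2f}")
print(f"between-group slope:         {between_slope:+.2f}")
print(f"within-group slope:          {within_slope:+.2f}")
```

Under these assumed settings the pooled and between-group slopes come out positive while the within-group slope is negative, which is the pattern the red meat example describes: a single-level analysis reports the sign of the between-group association and hides the within-group one.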
In order to facilitate a between- and within-effects analysis, consultant volume needs to be assessed continuously across time. Allowing volume to vary over time is computationally intensive, but responds to variation in demand and capacity to deliver arthroplasty, and to specialisation or diversification of professional practice, and is in contrast to previous approaches. [1,4] Additionally, as consultant volume changes, dichotomising the data using arbitrary thresholds, e.g. 12 operations per year, [3] is difficult to justify, as a consultant's volume will vary over the time they are observed. Furthermore, by avoiding dichotomisation, the interpretation of the results is less likely to be distorted by the arbitrary placement of a threshold. [17,18] The aim of this research is to investigate the between- and within-consultant (surgeon) effect of the volume of primary total hip arthroplasty (THA) on the risk of subsequent revision.

Data source
The NJR commenced data collection in April 2003; at inception, it was mandatory for all THA conducted in the private sector to be entered into the NJR, and from 2011 all THA procedures in the public and private sector were required to be entered into the NJR. A recent national audit of data entered into the NJR between 2014 and 2015 estimated data capture of 95% for primary THA and 91% for revision THA.

Ethical Approval
Analysis of pseudonymised NJR data is considered secondary use of clinical registry data; under HRA guidance this does not require formal ethical approval. The full NJR privacy notice can be found online (http://www.njrcentre.org.uk/njrcentre/About-the-NJR/Privacy-Notice-GDPR).

Inclusion/exclusion criteria
All consenting patients undergoing THA were eligible for inclusion in the analysis. Patients were included if their patient history was unique and consistent, i.e. it contained no duplicates and no ipsilateral revision prior to the primary, and was not currently held in query by the submitting unit. Due to the requirement for reliable date information, patients who were indicated to have died prior to undergoing a procedure, were more than 110 years of age, had undergone a procedure prior to their date of birth, or received a procedure prior to 2003 were excluded. Only primary THA, where the sole indication for operation was osteoarthritis (OA) with unique prosthesis combinations, were included. All metal-on-metal bearing combinations were excluded from the analysis due to the known exceptionally high failure rate in this group. [19,20] Consultants with less than 365 days of data were excluded, as were patients who were less than 50 years of age at the date of the index THA, because these cases are highly likely to be due to secondary OA. See Figure 1 and Figure 2 for a detailed breakdown of inclusion criteria.

Primary outcome
The primary outcome of interest is all-cause revision after a primary THA. Revision arthroplasty was identified by the inclusion of a revision-specific data upload after a primary ipsilateral THA. We note that it is not always the primary surgeon who performs the revision.

Censoring
Patients were censored following death. Death status was established by linking patients to the NHS Personal Demographic tracing service.

Primary exposure
The primary exposure of interest in this study was the consultant's surgical volume of any THA recorded in the NJR in the 365 days preceding the index procedure. Volume was calculated in consenting patients, prior to the application of inclusion and exclusion criteria (see Figure 1). We chose a 365-day period as this represents one calendar year, which effectively integrates out seasonal variation from the volume definition.
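The time-varying exposure described above can be sketched as follows. This is an illustrative sketch only, not the NJR processing pipeline: the function name `prior_year_volume`, the input layout and the example records are hypothetical, and the window is assumed to run from 365 days before the index date up to, but not including, the index date itself.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import date, timedelta


def prior_year_volume(procedures, window_days=365):
    """Count each consultant's procedures in the window before every index procedure.

    procedures: iterable of (consultant_id, operation_date) pairs.
    Returns a list of counts in the same order as the input.
    Assumption: the window is [index date - window_days, index date), so
    procedures performed on the index date itself are not counted.
    """
    procedures = list(procedures)

    # Collect and sort operation dates separately for each consultant.
    dates_by_consultant = defaultdict(list)
    for consultant_id, operation_date in procedures:
        dates_by_consultant[consultant_id].append(operation_date)
    for dates in dates_by_consultant.values():
        dates.sort()

    volumes = []
    for consultant_id, operation_date in procedures:
        dates = dates_by_consultant[consultant_id]
        window_start = operation_date - timedelta(days=window_days)
        first_in_window = bisect_left(dates, window_start)
        first_on_index_date = bisect_left(dates, operation_date)
        volumes.append(first_on_index_date - first_in_window)
    return volumes


# Hypothetical records for two consultants, "A" and "B".
example = [
    ("A", date(2010, 1, 1)),
    ("A", date(2010, 6, 1)),
    ("A", date(2011, 1, 1)),
    ("A", date(2011, 6, 2)),
    ("B", date(2010, 6, 1)),
]
volumes = prior_year_volume(example)
print(volumes)  # [0, 1, 2, 1, 0]
```

Because the count is recomputed at every index procedure, the same consultant can carry a different volume from one operation to the next, which is what allows a within-consultant comparison.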

Confounding factors
Confounding factors were thematically organised into 5 groups (patient, operation, centre, surgeon and deprivation factors); see Table 1.

Statistical analyses
Means, standard deviations and interquartile points were used to describe continuous variables. Frequencies and percentages were used to describe categorical variables. The association between confounding factors and consultant volume was explored by comparing summary statistics between levels of each factor.
Graphical methods including frequency distributions, and empirical cumulative distributions were used to describe the relative frequency and centiles of the volume distributions. The empirical cumulative distribution allows centiles of the distribution to be quickly identified.
The associations between consultant volume in the 365 days preceding the index procedure and all-cause revision were explored using multi-level parametric (Weibull) survival models.
Between- and within-effects are decomposed using a process known as group mean centring. [21,22] Group mean centring is the process of creating two new variables from the primary exposure. The first variable is the consultant-specific mean volume, and the second variable is the deviation of a consultant's volume for a given procedure from their personal average (i.e. consultant-mean centred volume). The between-effect is estimated as the coefficient of a consultant's mean volume, whereas the within-effect is the coefficient of the consultant's deviation from their average. Continuous variables, including the volume effect, were modelled using orthogonalized Restricted Cubic Splines (RCS). We iteratively varied the number of knot points and placement strategies in unadjusted models to optimize fit. The most parsimonious specification of RCS was selected using the Akaike Information Criterion (AIC). [23] Confounding adjustment was conducted incrementally, introducing patient, operation, centre, surgeon and, finally, deprivation confounding variable groups. The effect of confounding adjustment on the primary exposure of interest was explored and presented at each stage of the model building process, allowing the effect of adjustment to be clearly illustrated. All modelling was conducted using the mestreg command in Stata 15.1. [24] The specification of the model is described in more detail in the supplementary material.
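The group mean centring step described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code used for the analysis (which was run in Stata): the function name `group_mean_centre` and the example volumes are hypothetical.

```python
from collections import defaultdict


def group_mean_centre(consultant_ids, volumes):
    """Split each procedure's volume into a between and a within component.

    Returns two lists aligned with the input:
      between: the consultant-specific mean volume
      within:  the deviation of this procedure's volume from that mean
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for consultant_id, volume in zip(consultant_ids, volumes):
        totals[consultant_id] += volume
        counts[consultant_id] += 1

    means = {cid: totals[cid] / counts[cid] for cid in totals}

    between = [means[cid] for cid in consultant_ids]
    within = [volume - means[cid] for cid, volume in zip(consultant_ids, volumes)]
    return between, within


# Hypothetical prior-year volumes at five index procedures by two consultants.
ids = ["A", "A", "A", "B", "B"]
vols = [10, 20, 30, 100, 120]
between, within = group_mean_centre(ids, vols)
print(between)  # [20.0, 20.0, 20.0, 110.0, 110.0]
print(within)   # [-10.0, 0.0, 10.0, -10.0, 10.0]
```

The two resulting variables would then enter the survival model as separate terms, so that one coefficient describes differences between consultants and the other describes changes within a consultant over time.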

Missing data
Given the large data set and small fraction of incomplete cases among the observed (FICO) (89% of all eligible records were included in the analysis), any improvement in efficiency from multiple imputation is likely to be negligible, and a complete-cases analysis would provide unbiased results. [25,26] Therefore, we have assumed the reason for missingness is independent of both the outcome and primary exposure of interest.
Table 2 and Supplementary Table 3 (Figure 4). We also note that centre volume, judged by its IQR, is much less variable than consultant volume, but is more negatively skewed (Figure 4).
Summary statistics of consultant volume by confounding factors at the index procedure are listed in Supplementary Table 4. Higher volume consultants were observed to treat patients who were younger, with lower ASA grade, who privately funded their THA, and who received uncemented or hybrid prostheses more frequently. They were also more likely to use a posterior approach, place patients in the lateral position, work in the private sector, predominantly perform hip arthroplasty, and treat patients from more deprived areas.

Discussion
We provide novel insights into the volume-outcome relationship in 579,858 elective THA patients, using a between- and within-decomposition to analyse the association of consultant volume with revision. Between different consultants, the volume of arthroplasty in the previous year is associated with a near-linear 49% and 43% reduction in the hazard of revision THA between 1 and 200 procedures, in crude and fully adjusted models respectively. Within the same consultant, we demonstrate that there is no evidence of an association between volume of THA in the previous year and risk of revision.
Uniquely, we use a time-varying volume specification that facilitates the decomposition of between- and within-consultant effects. We suggest the within-consultant effect is much closer to the causal interpretation desired by many policymakers, and failure of research to recognise the difference between between- and within-effects may lead to erroneous policy decisions and unintended consequences.
We demonstrate that optimal between-consultant results are reached at a consultant volume of approximately 200 procedures in the previous year. We suggest these better outcomes are not causally related to volume, but are rather due to unmeasured surgeon, patient and/or centre factors. There is no evidence to suggest that consultants should change their personal volume in the hope of improving their outcomes, or that there is an arbitrary threshold above which outcomes become good.
Whilst the results appear to contradict previous research, i.e. no threshold volume effect, the differences may be explained by the method of analysis (single-level vs multi-level models) and the interpretation of results (separation of between- and within-consultant volume effects). Previous analyses are single-level analyses, which assume that all procedures are independent of one another, so that a low-volume consultant would achieve the results of a high-volume consultant if they could instantly increase their personal volume. The interpretation of a single-level model is similar to that of the between-consultant effect. Whilst this may be an attractive interpretation for policymakers, it fails to recognise the complexity of the data and processes observed, and that there are many factors, intrinsic to each consultant and the centre or centres in which they work, that predispose them to be either low- or high-volume surgeons, e.g. fellowship training, threshold for revision, or unit organisation.
This study has a number of strengths and limitations. Strengths include: 1) Our unique decomposition of between- and within-consultant effects, which we believe provides a more actionable interpretation for policy makers. 2) Time-varying consultant volume, which is independent of the index procedure and allows for stronger inferences. 3) OA was the only indication for arthroplasty, which we believe represents a "best case scenario", and the volume effect will only be attenuated by the inclusion of other diagnoses. 4) We demonstrate the use of RCS to model volume effects, which ensures flexibility and a smooth continuous function, emphasising the lack of any threshold in the volume effect. 5) The study is 10 times larger than any other published study on this topic, [1-6] with a maximal follow-up period of more than 13 years. 6) We have conducted extensive case-mix adjustment and illustrate that both between- and within-effects are insensitive to our measured confounding factors.
Despite the many strengths, this study has a number of limitations. 1) The analysis and decomposition of between- and within-consultant effects is more complex than traditional analyses and requires careful interpretation. 2) Despite the independent nature of the volume of arthroplasty calculation prior to the index procedure, we still see anticipated associations, i.e. younger patients, patients with lower ASA scores, patients receiving uncemented implants and patients operated on in the private sector all tend to be treated by higher-volume consultants.
We suggest the within-consultant effect from the multi-level regression is much closer to the causal interpretation required by consultants, patients and policymakers, i.e. what is the effect of changes in personal volume on the hazard of revision THA? This is not to say the between-effect is not of interest to policy makers, but to say that the between-effect suggests that there are intrinsic differences between high- and low-volume consultants, i.e. expertise, where higher-volume consultants tend to have better outcomes, but these differences cannot be attributed to volume per se. We suggest our analyses illustrate "State vs Trait" behaviour, where the between-consultant association illustrates the "traits" of surgeons and the within-consultant associations illustrate their "state". That is to say, the traits of experienced high-volume surgeons with good outcomes are unaffected by changes to their personal volume. Conversely, low-volume, inexperienced arthroplasty surgeons who transiently increase their personal volume do not improve their outcomes.

Conclusion
In summary, using data from the largest arthroplasty register in the world, 29 we have demonstrated that there is no within-consultant association between surgical volume in the previous year and the risk of revision in patients undergoing primary THA for OA. In contrast, there is strong evidence that higher volume consultants tend to have better outcomes, for reasons that are unlikely to be due to the volume of arthroplasty in the previous year per se.

Design
Prospective cohort study using data from a national joint replacement register.

Setting
Elective total hip arthroplasty (THA) across all private and public centres in England and Wales between April 2003 and February 2017.

Participants
Patients aged 50 years or more undergoing THA for osteoarthritis.

Intervention
The volume of THA conducted in the 365 days preceding the index procedure.

Conclusion
Separation of between- and within-consultant effects of surgical volume reveals how volume contributes to the risk of revision after THA. The lack of association within consultants suggests that individual changes to consultant volume alone will have little effect on outcomes following THA.
These novel findings provide strong evidence supporting the practice of specialisation of hip arthroplasty. They do not support the practice of low volume consultants increasing their personal volume, as it is unlikely their results would improve if this is the only change.
Limiting the exposure of patients to consultants with low volumes of THA and greater utilisation of centres with higher volume surgeons with better outcomes may be beneficial to patients.

Strengths and limitations of this study
• This is the largest study in the world to explore the association between surgical volume and outcomes in total hip arthroplasty.
• We uniquely calculate a time-varying exposure of surgical volume.

Competing interests statement
We declare no competing interests.
Differentiating between- and within-consultant effects is crucial to interpreting the data. A between-consultant effect is essentially a cross-sectional analysis that compares the performance of one consultant against another and is highly likely to be confounded by centre-level effects. 16 A within-consultant effect is based on individual time series data and compares changes of volume across time within the same consultant. Correspondingly, within-effects can be interpreted more strongly, as the effect of changing a consultant's personal volume, assuming centre-level factors remain relatively constant over the short-term analysis period.
The concept of between- and within-effects is well known in epidemiology, and the ecological fallacy is one example. For example, in a standard (single-level) regression analysis, we may observe a positive association between red meat consumption and life expectancy across several countries with differing levels of development. A between- and within-decomposition would reveal a positive between-country correlation, which is explained by the level of development, and a negative within-country association, i.e. individuals within a country who eat more red meat have a lower life expectancy. The decomposition of the volume effect into a between- and within-effect is similar, i.e. the between-consultant effect is explained by factors, other than volume, which are intrinsic to those consultants and the hospital where they are based ("centre effects"), whereas the within-consultant effect represents the consequence of consultants individually changing their personal volume assuming that other factors remain constant.
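The sign reversal described above can be reproduced with a toy calculation. The sketch below is only an illustration: the data are synthetic and invented for this example (they are not from the NJR or any real study), and the helper `slope` and the variable names are hypothetical. It compares a naive pooled slope, the slope across group means (between-effect) and the slope of deviations from each group's own mean (within-effect).

```python
from statistics import mean

def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic rows, invented for illustration: (group, exposure, outcome).
# Groups with a higher mean exposure have a higher mean outcome, but
# inside every group more exposure goes with a lower outcome.
rows = [
    ("A", 1.0, 61.0), ("A", 2.0, 60.0), ("A", 3.0, 59.0),
    ("B", 4.0, 71.0), ("B", 5.0, 70.0), ("B", 6.0, 69.0),
    ("C", 7.0, 81.0), ("C", 8.0, 80.0), ("C", 9.0, 79.0),
]

groups = sorted({g for g, _, _ in rows})
x_bar = {g: mean(x for gg, x, _ in rows if gg == g) for g in groups}
y_bar = {g: mean(y for gg, _, y in rows if gg == g) for g in groups}

# Single-level (pooled) slope: ignores the grouping entirely.
pooled = slope([x for _, x, _ in rows], [y for _, _, y in rows])

# Between-effect: slope across the group means.
between = slope([x_bar[g] for g in groups], [y_bar[g] for g in groups])

# Within-effect: slope of deviations from each group's own mean.
within = slope([x - x_bar[g] for g, x, _ in rows],
               [y - y_bar[g] for g, _, y in rows])

print(f"pooled={pooled:.2f} between={between:.2f} within={within:.2f}")
```

In this constructed example the pooled and between slopes are positive while the within slope is negative, so a single-level analysis would point in the opposite direction to the within-group association.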
In order to facilitate a between- and within-effects analysis, consultant volume needs to be assessed continuously across time. Allowing volume to vary over time is computationally intensive, but responds to variation in demand and capacity to deliver arthroplasty, and to specialisation or diversification of professional practice, and is in contrast to previous approaches. 1 4 Additionally, as consultant volume changes, dichotomising the data using arbitrary thresholds, e.g. 12 operations per year, 3 is difficult to justify as a consultant's volume will vary over the time they are observed. Furthermore, when volume is modelled continuously, the interpretation of the results is less likely to be distorted by the arbitrary placement of a threshold. 17 18
The aim of this research is to investigate the between- and within-consultant (surgeon) effects of the volume of primary total hip arthroplasty (THA) on the risk of subsequent revision.

Data source
The NJR commenced data collection in April 2003; at inception, it was mandatory for all THA conducted in the private sector to be entered into the NJR, and from 2011 all THA procedures in the public and private sector were required to be entered into the NJR. A recent national audit of data entered into the NJR between 2014 and 2015 estimated data capture of 95% for primary THA and 91% for revision THA.

Ethical Approval
Pseudo-anonymised analysis of NJR data is considered secondary use of clinical registry data; under HRA guidance this does not require formal ethical approval. The full NJR privacy notice can be found online (http://www.njrcentre.org.uk/njrcentre/About-the-NJR/Privacy-Notice-GDPR).

Patient and public involvement
Patient representatives sit on the committee structure of the National Joint Registry. The research priorities of the National Joint Registry are identified by this committee structure and approved by the patient representatives. Patients were not involved in the setting of the research question or the outcome measures, nor were they involved in designing or implementing this work or interpretation of the results. We are unable to disseminate results of this study directly to study participants due to the anonymous nature of the data. We plan to disseminate our findings to the National Joint Registry, via their communications team, to

Inclusion/exclusion criteria
All consenting patients undergoing THA were eligible for inclusion in the analysis. Patients were included if their patient history was unique and consistent, i.e. contained no duplicates or ipsilateral revision prior to the primary, and was not currently held in query by the submitting unit. Due to the requirement of reliable date information, patients who were indicated to have died prior to undergoing a procedure, were more than 110 years of age, had undergone a procedure prior to their date of birth, or received a procedure prior to 2003 were excluded. Only primary THA with unique prosthesis combinations, where the sole indication for operation was osteoarthritis (OA), were included. All metal-on-metal bearing combinations were excluded from the analysis due to the known exceptionally high failure rate in this group. 19 20 Consultants with fewer than 365 days of data were excluded, as were patients who were less than 50 years of age at the date of the index THA, because these cases are highly likely to be due to secondary OA. See Figure 1 and Figure 2 for a detailed breakdown of inclusion criteria.

Primary outcome
The primary outcome of interest is all-cause revision after a primary THA. Revision arthroplasty was identified by the inclusion of a revision-specific data upload after a primary ipsilateral THA. We note that it is not always the primary surgeon who performs the revision.

Censoring
Patients were censored following death. Death status was established by linking patients to the NHS Personal Demographic tracing service.

Primary exposure
The primary exposure of interest in this study was the consultant surgical volume of any THA recorded in the NJR in the 365 days preceding the index procedure in consenting patients, calculated prior to the application of inclusion and exclusion criteria (see Figure 1). We chose a 365-day period as this represents one calendar year and effectively integrates out seasonal variation from the volume definition.
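To make the exposure definition concrete, the following is a minimal sketch of how a trailing 365-day volume could be counted for each index procedure. It is not the authors' code: the function name `prior_year_volume`, the consultant identifiers and the dates are hypothetical, and the choice to exclude procedures performed on the same day as the index procedure is an assumption made here for illustration.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import date, timedelta

def prior_year_volume(procedures, window_days=365):
    """For each (consultant_id, operation_date) pair, count that consultant's
    procedures in the `window_days` days before the operation date.
    The index procedure (and any same-day procedure) is not counted."""
    by_consultant = defaultdict(list)
    for consultant_id, op_date in procedures:
        by_consultant[consultant_id].append(op_date)
    for dates in by_consultant.values():
        dates.sort()

    volumes = []
    for consultant_id, op_date in procedures:
        dates = by_consultant[consultant_id]
        # First procedure on or after the start of the window.
        lo = bisect_left(dates, op_date - timedelta(days=window_days))
        # First procedure on the index date itself (exclusive upper bound).
        hi = bisect_left(dates, op_date)
        volumes.append(hi - lo)
    return volumes

# Hypothetical records, invented for illustration.
procedures = [
    ("S1", date(2010, 1, 1)),
    ("S1", date(2010, 6, 1)),
    ("S1", date(2010, 12, 31)),
    ("S1", date(2011, 2, 1)),
    ("S2", date(2010, 3, 1)),
]
volumes = prior_year_volume(procedures)
print(volumes)
```

Because the count only looks backwards from each operation date, the exposure is fixed before the index procedure takes place and changes from one procedure to the next as the window slides forward.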

Confounding factors
Confounding factors were thematically organised into 5 groups (patient, operation, centre, surgeon, and deprivation); see Table 1.

Statistical analyses
Means, standard-deviations and interquartile points were used to describe continuous variables. Frequencies and percentages were used to describe categorical variables. The association between confounding factors and consultant volume was explored by comparing summary statistics between levels of each factor.
Graphical methods including frequency distributions, and empirical cumulative distributions were used to describe the relative frequency and centiles of the volume distributions. The empirical cumulative distribution allows centiles of the distribution to be quickly identified.
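As a small illustration of how centiles can be read from an empirical cumulative distribution, the sketch below computes the cumulative proportion at each distinct value and looks up a centile. The function names `ecdf` and `centile` and the example volumes are hypothetical and invented for illustration; they are not taken from the study data.

```python
def ecdf(values):
    """Return the sorted distinct values and, for each, the proportion of
    observations less than or equal to it."""
    ordered = sorted(values)
    n = len(ordered)
    xs, ps = [], []
    for i, v in enumerate(ordered, start=1):
        if xs and xs[-1] == v:
            ps[-1] = i / n  # tied value: update its cumulative proportion
        else:
            xs.append(v)
            ps.append(i / n)
    return xs, ps

def centile(values, q):
    """Smallest observed value whose cumulative proportion is at least q,
    for 0 < q <= 1."""
    xs, ps = ecdf(values)
    for x, p in zip(xs, ps):
        if p >= q:
            return x
    return xs[-1]

# Hypothetical annual volumes, invented for illustration.
example_volumes = [5, 12, 12, 30, 45, 60, 80, 120, 150, 200]
median_volume = centile(example_volumes, 0.5)
lower_quartile = centile(example_volumes, 0.25)
```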
The associations between consultant volume in the 365 days preceding the index procedure and all-cause revision were explored using multi-level parametric (Weibull) survival models.
Between- and within-effects are decomposed using a process known as group mean centring. 21 22 Group mean centring is the process of creating two new variables from the primary exposure. The first variable is the consultant-specific mean volume, and the second variable is the deviation of a consultant's volume for a given procedure from their personal average (i.e. consultant-mean centred volume). The between-effect is estimated as the coefficient of a consultant's mean volume, whereas the within-effect is the coefficient of the consultant's deviation from their average. Continuous variables, including the volume effect, were modelled using orthogonalised Restricted Cubic Splines (RCS). Each RCS is centred at a meaningful value; these values are listed in the supplementary material. Mean between-consultant volume is centred at 32 procedures annually, whereas within-consultant volume is naturally centred at zero. We iteratively varied the number of knot points and placement strategies, in unadjusted models to […]
Confounding adjustment was conducted incrementally, introducing patient, operation, centre, surgeon, and finally, deprivation confounding variable groups. The effect of confounding adjustment on the primary exposure of interest was explored and presented at each stage of the model building process, allowing the effect of adjustment to be clearly illustrated. All modelling was conducted using the mestreg command in Stata 15.1. 24 The specification of the model is described in more detail in the supplementary material.
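For orientation, the group mean centring and the resulting model can be written out as follows. This is only a sketch under stated assumptions: a proportional-hazards Weibull parameterisation with a normally distributed consultant-level random intercept is assumed, and every symbol is notation introduced here for illustration. The authors' exact specification is given in their supplementary material and may differ.

```latex
\begin{align}
\bar{x}_{j} &= \frac{1}{n_j}\sum_{i=1}^{n_j} x_{ij},
\qquad
x_{ij} = \bar{x}_{j} + \left(x_{ij} - \bar{x}_{j}\right), \\
h_{ij}(t) &= p\, t^{\,p-1}
\exp\!\left\{ \beta_0
  + f_B\!\left(\bar{x}_{j}\right)
  + f_W\!\left(x_{ij} - \bar{x}_{j}\right)
  + \boldsymbol{\gamma}^{\top}\mathbf{z}_{ij}
  + u_j \right\},
\qquad u_j \sim N\!\left(0, \sigma_u^{2}\right).
\end{align}
```

Here $x_{ij}$ denotes the prior-year volume of consultant $j$ at their $i$-th procedure, $\bar{x}_{j}$ the consultant-specific mean volume, $f_B$ and $f_W$ the restricted cubic spline functions for the between- and within-consultant effects, $\mathbf{z}_{ij}$ the confounding factors, $p$ the Weibull shape parameter and $u_j$ the consultant-level random effect.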

Discussion
We provide novel insights into the volume-outcome relationship of 579,858 elective THA patients using a between- and within-decomposition to analyse the association of consultant volumes with revision. Between different consultants, the volume of arthroplasty in the previous year is associated with a near-linear reduction in the hazard of revision THA of 49% in crude and 43% in fully adjusted models when comparing volumes of 1 and 200 procedures. Within the same consultant, we demonstrate that there is no evidence of an association between volume of THA in the previous year and risk of revision.
Uniquely, we use a time-varying volume specification that facilitates the decomposition of between- and within-consultant effects. We suggest the within-consultant effect is much closer to the causal interpretation desired by many policymakers, and failure of research to distinguish between- and within-effects may lead to erroneous policy decisions and unintended consequences.
We demonstrate that optimal between-consultant results are reached when the consultant volumes in the previous year are approximately 200 procedures. We suggest these differences are not caused by volume, but are rather due to unmeasured surgeon, patient and/or centre factors. There is no evidence to suggest that consultants should change their personal volume in the hope of improving their outcomes or that there is an arbitrary threshold above which outcomes become good. It is important to add that these volume changes are for experienced consultants who have already passed the early improvements that one might observe at the early trainee level due to practice. Furthermore, their low volume for hip replacement may reflect high volume experience for other procedures, thereby ensuring manual dexterity albeit for a different operation. Whilst the results appear to contradict previous research, i.e. no threshold volume effect, the differences may be explained by the method of analysis, i.e. single-level vs. multi-level models, and the interpretation of results, i.e. separation of between- and within-consultant volume effects. Previous analyses are single-level analyses that assume that all procedures are independent of one another, so that a low-volume consultant would achieve the results of a high-volume consultant if they could instantly increase their personal volume.
The interpretation of a single-level model is similar to that of the between-consultant interpretation. Whilst this may be an attractive interpretation for policymakers, it fails to recognise the complexity of the data and processes observed, and that there are many factors, intrinsic to each consultant and the centre or centres in which they work, that predispose them to be either low- or high-volume surgeons, e.g. fellowship training, threshold for revision, or unit organisation.
This study has a number of strengths and limitations. Strengths include: 1) Our unique decomposition of between- and within-consultant effects, which we believe provides a more actionable interpretation for policy makers. 2) Time-varying consultant volume, which is independent of the index procedure and allows for stronger inferences. 3) OA was the only indication for arthroplasty, which we believe represents a "best case scenario", and the volume effect will only be attenuated by the inclusion of other diagnoses. 4) We demonstrate the use of RCS to model volume effects, which ensures flexibility and a smooth continuous function, emphasising the lack of any threshold in the volume effect. 5) The study is 10 times larger than any other published study on this topic, 1-6 with a maximal follow-up period of more than 13 years. 6) We have conducted extensive case-mix adjustment and illustrate that both between- and within-effects are insensitive to our measured confounding factors.
We suggest the within-consultant effect from the multi-level regression is much closer to the causal interpretation required by consultants, patients, and policymakers, i.e. what is the effect of changes in personal volume on the hazard of revision THA? This is not to say the between-effect is not of interest to policy makers, but to say that the between-effect suggests that there are intrinsic differences between high and low volume consultants, i.e. expertise, where higher volume consultants tend to have better outcomes, but these differences cannot be attributed to volume per se. We suggest our analyses illustrate "State vs Trait" behaviour, where between-consultant associations illustrate the "traits" of surgeons, and within-consultant associations illustrate their "state". This is to say traits of experienced high volume surgeons with good outcomes are unaffected by changes to their personal volume.
Conversely, experienced low volume arthroplasty surgeons who transiently increase their personal volume do not improve their outcomes.

Conclusion
In summary, using data from the largest arthroplasty register in the world, 29 we have demonstrated that there is no within-consultant association between surgical volume in the previous year and the risk of revision in patients undergoing primary THA for OA. In contrast, there is strong evidence that higher volume consultants tend to have better outcomes, for reasons that are unlikely to be due to the volume of arthroplasty in the previous year per se.