Original research
Patient-reported outcome measures following revision knee replacement: a review of PROM instrument utilisation and measurement properties using the COSMIN checklist
  1. Shiraz A Sabah,
  2. Elizabeth A Hedge,
  3. Simon G F Abram,
  4. Abtin Alvand,
  5. Andrew J Price,
  6. Sally Hopewell
  1. Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
  1. Correspondence to Shiraz A Sabah; shiraz.sabah{at}ndorms.ox.ac.uk

Abstract

Objectives To identify: (1) patient-reported outcome measures (PROMs) used to evaluate symptoms, health status or quality of life following discretionary revision (or re-revision) knee joint replacement, and (2) validated joint-specific PROMs, their measurement properties and quality of evidence.

Design (1) Scoping review; (2) systematic review following the COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) checklist.

Data sources MEDLINE, Embase, AMED and PsycINFO were searched from inception to 1 July 2020 using the Oxford PROM filter unlimited by publication date or language.

Eligibility criteria for selecting studies Studies reporting on the development, validation or outcome of a joint-specific PROM for revision knee joint replacement were included.

Results 51 studies reported PROM outcomes using eight joint-specific PROMs. 27 out of 51 studies (52.9%) were published within the last 5 years. PROM development was rated ‘inadequate’ for each of the eight PROMs studied. Validation studies were available for only three joint-specific PROMs: Knee Injury and Osteoarthritis Outcome Score (KOOS), Lower Extremity Activity Scale (LEAS) and Western Ontario and McMaster Universities Arthritis Index (WOMAC). 25 out of 27 (92.6%) measurement properties were rated insufficient, indeterminate or not assessed. The quality of supporting evidence was mostly low or very low. Each of the validated PROMs was rated ‘B’ (potential for recommendation but requires further evaluation).

Conclusion Joint-specific PROMs are increasingly used to report outcomes following revision knee joint replacement, but these instruments have insufficient evidence for their validity. Future research should be directed toward understanding the measurement properties of these instruments in order to inform clinical trials and observational studies evaluating the outcomes from joint-specific PROMs.

  • knee
  • musculoskeletal disorders
  • adult orthopaedics

Data availability statement

All data relevant to the study are included in the article or uploaded as supplementary information.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Strengths and limitations of this study

  • This is the first study to apply the COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) checklist to report the quality of patient-reported outcome measure (PROM) development and validation studies for discretionary revision knee joint replacement.

  • Our search strategy was based on the Oxford PROM filter, which has been shown to be a sensitive tool for identifying relevant studies.

  • PROM instruments that were not patient-completed were excluded, which maintained a patient focus but limited the number of eligible instruments for evaluation.

  • While our study has critically summarised PROM measurement properties, qualitative studies may be needed in the future to provide deeper insights into the outcomes from revision knee replacement that are most important to patients.

Introduction

Primary knee replacement is a successful procedure that improves quality of life for the majority of patients by reducing pain and improving joint function.1 However, not all patients achieve a good outcome. For example, approximately 13% of patients are dissatisfied with their outcome following knee replacement,2 with higher rates in younger patients3 and those with partial thickness cartilage loss.4 Many of these patients are managed with supportive treatment.5 However, at 10 years following primary knee replacement, 3.5% of patients will have undergone revision surgery. In total, 6500 revision knee replacement procedures are performed each year in the UK.6 The majority of these procedures (~85%) are for discretionary indications, where the goal of surgery is to improve joint function and quality of life.6 These contrast with non-discretionary procedures (such as for infection or fracture), which are necessary to prevent catastrophic joint failure or new comorbidity. An important part of judging the success of discretionary revision knee replacement is the ability to measure pain and joint function from the perspective of the patient.

Patient-reported outcome measures (PROMs) are widely used for this purpose in lower limb surgery. Many PROMs aim to report quality of life and functional outcomes, while others assess sporting performance, activities of daily living or psychological health. However, not all have optimal measurement properties.7 8 For primary knee replacement, many PROMs have good quality evidence for their validity.9 10 This has facilitated utilisation of PROMs to support patient choice and manage healthcare providers,2 11 12 with many schemes also including revision procedures. A prominent example is the NHS PROMs programme,2 which has collected data from more than 10 000 patients who have undergone revision knee replacement.13 However, interpretation of this data has been critically limited by a lack of PROM validation.

Revision knee replacement is one of the most expensive procedures in modern healthcare14 and high-quality PROM data is important to evaluate cost-effectiveness.15 While generic PROMs can be used to compare patients with different conditions, they may miss important items in specific populations.16 The COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) initiative provides tools to aid systematic reviews and selection of measurement instruments.17 The ideal PROM is developed or subsequently validated in the population of interest, has good measurement properties (GMP) and is supported by high-quality evidence. PROM instruments meeting these criteria can be selected for a core outcome set in order to standardise outcome measurement. If there are no suitable PROMs, then further validation studies or the development of a new PROM may be required. For discretionary revision knee replacement, no systematic review has evaluated PROMs in current use, their measurement properties or the quality of this evidence. This limits meta-analysis of previous research and design of future trials.

The aims of this review were: (1) to scope the literature to identify PROMs in current use for evaluation of symptoms, health status or quality of life following discretionary revision (or re-revision) knee replacement, and (2) to identify validated joint-specific PROMs, their measurement properties and quality of evidence.

Methods

This section is structured to follow the COSMIN Handbook,17 and a figure illustrating our methods is provided in online supplemental appendix 1.

Patient and Public Involvement

Patients and the public were involved in the design, or conduct, or reporting, or dissemination plans of our research. This article was motivated by the James Lind Alliance Priority Setting Partnership for revision knee replacement,18 particularly the question: ‘How should we measure the outcomes following revision knee surgery in a way that is meaningful to patients?’

Part A: aim and literature search

Step 1: aims

Described above.

Step 2: study eligibility criteria

Randomised and non-randomised studies were eligible for inclusion. Revision knee replacement was defined as any procedure where an arthroplasty component was removed, modified or added. This included isolated liner exchange, secondary patellar resurfacing and re-revision procedures. Studies where the majority of procedures were performed for non-discretionary indications (such as infection or malignancy) were excluded, as well as amputations and arthrodesis procedures. Since 85% of revisions are for discretionary indications, studies where the indication was not specified were deemed eligible for inclusion. PROMs were required to address one of the following domains:

  • Pain (eg, Western Ontario and McMaster Universities Arthritis Index (WOMAC) pain subscale19),

  • Function (eg, WOMAC functional limitation subscale),

  • Combined pain and function (eg, Oxford Knee Score20),

  • Joint-related health status (eg, Knee Injury and Osteoarthritis Outcome Score (KOOS) quality of life (QOL)21), or

  • Patient activity (eg, Lower Extremity Activity Scale (LEAS)22).

Collectively, we have termed these ‘joint-specific’ PROMs. The focus of this study was not to examine generic health-related quality of life instruments (eg, EQ-5D23). However, we did report the use of these instruments in conjunction with a joint-specific PROM. Outcome scores not considered to be patient-centred were excluded; for example, surgeon-completed scores such as the Bristol Knee Score (BKS) and the Knee Society Score (KSS). Studies with fewer than 50 patients were excluded as their sample size would be considered inadequate when applying COSMIN rules for rating of measurement properties and evidence quality.10

Step 3: search strategy

This is provided in online supplemental appendix 2. MEDLINE, Embase, AMED and PsycINFO were searched on 1 July 2020 using the Oxford PROM filter.24 Searches were translated for each database. There were no limitations on language or publication date. The citations of included studies were searched to identify additional articles.

Step 4: study selection

Two authors (SAS and EAH) independently reviewed title and abstract for all records returned by the search against eligibility criteria. Disagreement was resolved through discussion of the full text publication. Data were extracted using a calibrated form on name and type of PROM, geography, journal, year of publication and number of patients. Data were summarised using counts with percentage frequency for each of the data items collected.

Part B: evaluation of measurement properties of the included PROMs

Steps 5, 6 and 7: content validity, internal structure, reliability and responsiveness

Descriptions of terminology for measurement properties are provided in online supplemental appendix 3. Each measurement property was evaluated in three separate sub-steps:

Substep 1: evaluation of methodological quality

Two authors (SAS and SGFA) independently evaluated the measurement properties in each article against the COSMIN Risk of Bias checklist. A priori hypotheses for construct validity and responsiveness were set (online supplemental appendix 4, table 1). Study quality was assessed separately for each measurement property using a four-point rating system (very good, adequate, doubtful or inadequate). The ‘worst score counts’ principle was used, where the overall rating for each measurement property is given by the lowest rating of any standard in the box.25
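The ‘worst score counts’ principle described above can be illustrated with a minimal Python sketch. The standard names and ratings in the example are hypothetical placeholders, not values from the included studies; only the four-point scale and the take-the-lowest rule come from the text.

```python
# Four-point COSMIN Risk of Bias scale, ordered from worst to best.
RATING_ORDER = ["inadequate", "doubtful", "adequate", "very good"]


def worst_score_counts(standard_ratings):
    """Return the overall rating for one measurement property.

    standard_ratings maps each standard in a checklist box to its rating;
    the overall rating is the lowest rating given to any standard.
    """
    if not standard_ratings:
        raise ValueError("at least one rated standard is required")
    # list.index gives each rating's position on the worst-to-best scale.
    return min(standard_ratings.values(), key=RATING_ORDER.index)


# Hypothetical ratings for the standards in one checklist box.
example_box = {
    "time interval appropriate": "very good",
    "patients stable between administrations": "doubtful",
    "test conditions similar": "adequate",
}
print(worst_score_counts(example_box))  # doubtful
```

A single ‘doubtful’ or ‘inadequate’ standard therefore determines the overall rating, however well the remaining standards were met.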

Substep 2: application of criteria for GMP

Two authors (SAS and SGFA) independently extracted data on: PROM characteristics (intended construct for measurement, measurement properties, method of administration), study sample (number of patients, patient demographics, diagnosis) and study details (setting, country, language). The few disagreements were resolved through discussion. The results from each study on a measurement property were assigned a quality rating as: sufficient (+), insufficient (−) or indeterminate (?).
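As an illustration of how a single study result might be mapped to a sufficient (+), insufficient (−) or indeterminate (?) rating, the sketch below rates a reliability result. The function name and the 0.70 cut-off for the intraclass correlation coefficient (ICC) or weighted kappa are assumptions based on commonly cited COSMIN criteria; they are not stated in this article.

```python
def rate_reliability(icc=None, threshold=0.70):
    """Rate a reliability result as '+', '-' or '?'.

    icc is the reported ICC (or weighted kappa); None means the statistic
    was not reported. The 0.70 threshold is an assumed criterion, not a
    value taken from this article.
    """
    if icc is None:
        return "?"  # indeterminate: statistic not reported
    return "+" if icc >= threshold else "-"


# Hypothetical study results.
print(rate_reliability(0.85))  # +
print(rate_reliability(0.55))  # -
print(rate_reliability())      # ?
```

Other measurement properties would need their own criteria; this sketch covers reliability only.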

Substep 3: summary and grading of quality of evidence

This section refers to rating the quality of the PROM as a whole. PROMs were qualitatively summarised and assigned a four-point quality rating. A modified Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach (omitting publication bias) was used to assign evidence quality as high, moderate, low or very low.26
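The downgrading logic of a GRADE-style approach can be sketched as follows. This is a simplified illustration, not the procedure used by the review team: the factor names (risk of bias, inconsistency, imprecision, indirectness) and the number of levels subtracted for each are assumptions supplied by the caller, and publication bias is left out to mirror the modification described above.

```python
GRADE_LEVELS = ["high", "moderate", "low", "very low"]


def grade_evidence(downgrades):
    """Return the evidence quality after applying downgrades.

    downgrades maps each factor to the number of levels to subtract.
    Evidence starts at 'high' and cannot fall below 'very low'.
    """
    if any(levels < 0 for levels in downgrades.values()):
        raise ValueError("downgrades must not be negative")
    total = sum(downgrades.values())
    return GRADE_LEVELS[min(total, len(GRADE_LEVELS) - 1)]


# Hypothetical judgement: one level each for risk of bias and imprecision.
print(grade_evidence({"risk of bias": 1, "imprecision": 1}))  # low
```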

Part C: selecting a PROM

Step 8: description of interpretability and feasibility

Interpretability and feasibility were analysed descriptively as per COSMIN guidance.17

Step 9: formulation of recommendations

PROMs were categorised into three categories: (A) Sufficient content validity and at least low-quality evidence for internal consistency; (B) Between ‘A’ and ‘C’; and (C) High-quality evidence for an insufficient measurement property. PROMs rated ‘A’ can be recommended for use. PROMs rated ‘B’ have potential for recommendation but require further evaluation. PROMs rated ‘C’ should not be recommended.
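The three categories above can be expressed as a small decision function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and checking the ‘C’ condition before the ‘A’ condition is an assumption about precedence that the text does not spell out.

```python
# Evidence qualities that count as "at least low-quality".
AT_LEAST_LOW = {"low", "moderate", "high"}


def recommendation_category(content_validity_sufficient,
                            internal_consistency_quality,
                            high_quality_insufficient_property):
    """Return 'A', 'B' or 'C' for one PROM.

    internal_consistency_quality is the quality of evidence for sufficient
    internal consistency ('high', 'moderate', 'low', 'very low') or None
    when no such evidence exists.
    """
    if high_quality_insufficient_property:
        return "C"  # should not be recommended
    if content_validity_sufficient and internal_consistency_quality in AT_LEAST_LOW:
        return "A"  # can be recommended for use
    return "B"  # potential for recommendation, needs further evaluation


# Hypothetical PROM lacking sufficient content validity.
print(recommendation_category(False, None, False))  # B
```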

Step 10: reporting of the systematic review

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram is provided in figure 1.

Figure 1

PRISMA flow diagram. The full search strategy is provided in online supplemental appendix 2. PROM, patient-reported outcome measure.

Results

Part A

Study selection

One thousand two hundred and five unique articles were identified for screening. Sixty-six full text articles were assessed for eligibility. Fifty-one studies were included in the scoping review, reporting on eight joint-specific PROMs. Four studies met inclusion criteria for PROM validation, describing measurement properties for three PROMs (figure 1).

Characteristics of studies reporting PROM outcomes for revision knee replacement

Fifty-one studies reported on PROM outcomes (tables 1 and 2) recruiting a median of 104 (range 51–1391) patients. Study designs included 1 (2.0%) randomised controlled trial, 14 (27.5%) prospective cohort studies, 29 (56.9%) retrospective cohort studies, 3 (5.9%) reports from national joint registries, 3 (5.9%) cross-sectional surveys and 1 (2.0%) data analysis of routinely collected secondary care data. Twenty-five studies (49.0%) were from Europe, 19 (37.3%) from North America, 6 (11.8%) from Asia and 1 (2.0%) from Australasia. The joint-specific PROMs reported were the WOMAC Index (25 studies, 49.0%), Oxford Knee Score (OKS) (19 studies, 37.3%), KOOS (8 studies, 15.7%), LEAS (4 studies, 7.8%), University of California at Los Angeles Activity Score (UCLA, 4 studies, 7.8%), Kujala score (2 studies, 3.9%), Lower Extremity Functional Scale (LEFS, 2 studies, 3.9%) and the Lysholm score (1 study, 2.0%). The majority of studies were published within the past 5 years (27/51 (52.9%) studies) (online supplemental appendix 4, figure 1).
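The percentage frequencies in the paragraph above can be reproduced from the reported counts. The short Python sketch below uses only the counts given in the text; the variable and function names are illustrative.

```python
TOTAL_STUDIES = 51

# Counts copied from the paragraph above.
study_designs = {
    "randomised controlled trial": 1,
    "prospective cohort": 14,
    "retrospective cohort": 29,
    "national joint registry report": 3,
    "cross-sectional survey": 3,
    "routinely collected secondary care data": 1,
}
proms = {
    "WOMAC": 25, "OKS": 19, "KOOS": 8, "LEAS": 4,
    "UCLA": 4, "Kujala": 2, "LEFS": 2, "Lysholm": 1,
}


def percentage(count, total=TOTAL_STUDIES):
    """Percentage frequency rounded to one decimal place."""
    return round(100 * count / total, 1)


# Each study has exactly one design, so these counts sum to the total.
assert sum(study_designs.values()) == TOTAL_STUDIES

for name, count in proms.items():
    print(f"{name}: {count} studies ({percentage(count)}%)")
```

The PROM counts sum to 65 rather than 51, which would be consistent with some studies reporting more than one joint-specific instrument.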

Table 1

Characteristics of studies reporting PROMs for revision knee replacement

Table 2

Summary characteristics for studies reporting PROMs following revision knee replacement

Part B

Quality of PROM development studies

The quality of PROM development for the eight joint-specific PROMs identified in Part A is summarised in table 3. The construct to be measured was clear in two studies (25.0%), with the remainder rated ‘inadequate’. One example of a study rated ‘inadequate’ was the Kujala study.27 This rating was made because, while the score was designed to measure anterior knee symptoms, the specific aspects of these symptoms to be measured were not described (such as pain intensity or pain interference). The Lysholm score28 was rated ‘very good’ due to a specific description (defining ‘the lowest activity level needed during walking, running or jumping to produce giving way or pain and swelling’). The origin of the construct to be measured was clear in only two studies (25.0%). One example of a study rated ‘very good’ for this property was the LEFS study,29 which referenced the WHO’s International Classification of Functioning, Disability and Health (ICF) conceptual framework.30 The context of use was rated ‘very good’ for three studies (37.5%). These studies provided at least one clear description of the intended application of the instrument. For example, the OKS was designed to evaluate patients before and after knee replacement surgery.20 All studies were rated as ‘very good’ for their description of a clear target population. While many studies provided a very broad description (eg, the LEFS described patients ‘with lower extremity orthopaedic conditions’31), the COSMIN guidance is permissive for rating this property. However, the PROM development sample was rated ‘inadequate’ for all studies either because the patient sample was not correspondingly broad or, taking a view on the patient sample of interest in this review, did not recruit a sample representative of discretionary revision knee replacement. While the LEAS study did recruit patients with revision knee replacements for some aspects of PROM development, a surgeon panel was used in lieu of patients for content validity, justifying an ‘inadequate’ rating.29 In summary, the total PROM development was rated ‘inadequate’ for all studies based on the ‘worst score counts’ principle recommended by COSMIN. However, this does not reflect positive ratings for some aspects of PROM development as described above.

Table 3

Quality of PROM development

Characteristics of PROM validation studies

Four studies22 29 32 33 from the scoping review validated three joint-specific PROMs (KOOS, LEAS, WOMAC) (table 4). The mean age of patients in the included studies ranged from 67 to 77 years. Female patients accounted for 50% to 78% of the study populations. The primary objective of the included articles varied from validation of a PROM, validation of another instrument with the PROM as a comparator, development of a new instrument and reporting of clinical outcome after revision knee replacement. The characteristics of the PROMs included in the validation studies are described in table 5.

Table 4

Characteristics of PROM validation studies

Table 5

Characteristics of the joint-specific PROMs evaluated in validation studies

Quality of studies on measurement properties

In total, 20 measurement properties for the KOOS, LEAS and WOMAC were evaluated (table 6). There were 40 additional opportunities to evaluate measurement properties that were not attempted. Two (10.0%) measurement properties were rated ‘very good’, 5 (25.0%) ‘adequate’, 3 (15.0%) ‘doubtful’ and 10 (50.0%) ‘inadequate’. For structural validity, de Groot’s evaluation for the KOOS was rated ‘inadequate’ due to an insufficient sample size for factor analysis (fewer than five participants per item). Three out of four (75.0%) studies that reported on responsiveness were rated ‘inadequate’ due to their construct approach. For example, Saleh et al29 used an ‘inadequate’ comparator instrument for development of the LEAS: the measurement properties of the WOMAC are not well enough known for revision. Ghomrawi et al22 did not set hypotheses for construct validity, and their statistical methodology did not allow these to be evaluated at review. Two studies reported on reliability. These were rated ‘adequate’ because, while they chose an appropriate interval, they did not also ensure that patients were stable.

Table 6

Quality of studies on measurement properties

Quality of the evidence for measurement properties of the PROMs

The quality of the evidence for measurement properties of the included PROMs is provided in table 7. Twenty-five out of 27 (92.6%) measurement properties were rated insufficient, indeterminate or not assessed. The only measurement property to receive a ‘sufficient’ rating was reliability for both the KOOS and the LEAS, supported by ‘low’ and ‘moderate’ quality evidence, respectively.

Table 7

Quality of the evidence for measurement properties of the PROMs

Part C

Data on the interpretability of the studies are summarised in table 8. The mode of PROM administration was unclear for all studies except de Groot et al.32 Missing responses ranged from 25% to 60%. No study reported on missing items within a PROM instrument. Floor and ceiling effects were not reported, except by Saleh et al.29 No PROM met criteria either to be recommended or not recommended for use. Each of the validated PROMs (ie, KOOS, LEAS and WOMAC) was therefore assigned recommendation ‘B’, indicating that further evidence is needed.

Table 8

Interpretability including missing items, response rate and floor/ceiling effects
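For readers unfamiliar with floor and ceiling effects, the sketch below shows one way they might be computed from a set of total scores. The 15% threshold is a commonly cited convention and an assumption here, not a value taken from this article; the function name, the 0 to 48 score range and the example scores are hypothetical.

```python
def floor_ceiling_effects(scores, min_score, max_score, threshold=0.15):
    """Return the proportion of respondents at each extreme of the scale.

    A floor (or ceiling) effect is flagged when the proportion scoring the
    lowest (or highest) possible value exceeds the assumed threshold.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_prop = sum(s == min_score for s in scores) / n
    ceiling_prop = sum(s == max_score for s in scores) / n
    return {
        "floor_proportion": floor_prop,
        "ceiling_proportion": ceiling_prop,
        "floor_effect": floor_prop > threshold,
        "ceiling_effect": ceiling_prop > threshold,
    }


# Hypothetical total scores on a scale running from 0 to 48.
scores = [0, 0, 12, 20, 25, 31, 33, 40, 44, 48]
result = floor_ceiling_effects(scores, min_score=0, max_score=48)
print(result["floor_effect"], result["ceiling_effect"])  # True False
```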

Discussion

This review has demonstrated the increasing use of PROMs to evaluate symptoms and functional outcomes following discretionary revision knee replacement. The majority of studies were retrospective and observational, with only one randomised controlled trial. Eight different joint-specific PROMs were identified, with the WOMAC index (25 studies, 49.0%) and the OKS (19 studies, 37.3%) the most frequent. Only three joint-specific PROMs were supported by a validation study: KOOS, LEAS and WOMAC. The supporting evidence was mostly of ‘low’ or ‘very low’ quality, and the majority of measurement properties were either not evaluated or rated ‘insufficient’ or ‘indeterminate’. As such, each of these PROMs requires more evidence in order to be recommended for use.

Secondary findings and relation to other studies

Musculoskeletal disorders account for one-third of all reviews on the COSMIN database.34 At least three reviews have evaluated the measurement properties of PROMs following primary knee replacement.9 10 35 These studies found that many PROM instruments had limited evidence to support their measurement properties, justifying the need for further research. We are not aware of previous reviews that have examined the measurement properties of PROMs following discretionary revision knee replacement. While many of the goals from discretionary revision knee replacement are shared with primary knee replacement, there are important differences in the patient populations and disease processes being treated and the surgical interventions themselves. For example, while primary knee replacement treats predominantly osteoarthritis, discretionary revision knee replacement treats many varied disease processes.36 The revision patient population is also more comorbid and may have different expectations from surgery.37 As such, the evidence for PROMs developed in primary knee replacement cannot necessarily be assumed to be transferable.

Strengths and weaknesses

This study has a number of important strengths, including the use of a broad search strategy based on the Oxford PROM filter24 and the application of the latest COSMIN guidelines. The use of a priori hypotheses by our review team to evaluate construct validity and responsiveness is novel and meant these properties could be considered even when not a focus of the original article. This study was motivated by the James Lind Alliance Priority Setting Partnership for revision knee replacement, which generated the question: ‘How should we measure the outcomes following revision knee surgery in a way that is meaningful to patients?’.38 As such, outcome scores that were not patient-completed were excluded. We acknowledge that this has restricted the number of eligible studies from North America, where use of the KSS is prevalent. In the future, qualitative studies to explore patients’ reasons for choosing surgery and to identify the outcomes that are most important to patients may be needed.

Implications for practice

We have not put forward a PROM for recommendation because the quality of the available evidence was low, and data were lacking for many of the measurement properties. However, we can make recommendations to direct future research and to move towards developing a core outcome set for discretionary revision knee replacement. First, we wish to highlight that standards for reporting of psychometric studies have changed considerably over the past 20 years.9 COSMIN tools are not limited to systematic reviews and may be used to guide the scope and detail required to develop a new instrument or to evaluate an existing one. Second, this study has highlighted a number of common methodological flaws that result in high risk of bias. For example, when evaluating structural validity, none of the validation studies performed confirmatory factor analysis to understand whether the PROM scores reflected the dimensionality of the construct. For reliability, test conditions were not recorded in sufficient detail to ensure both that the repeat interval was appropriate and that the patient remained stable. For interpretability, none of the studies calculated a minimal important change or comprehensively assessed floor and ceiling effects. Third, we recommend that future studies planning to use an existing joint-specific PROM to evaluate outcomes after revision surgery do so in conjunction with a validated generic health-related quality of life instrument (such as the Short Form-36 (SF36)39 or EQ-5D23). While neither the EQ-5D nor the SF36 was developed in patients undergoing revision knee replacement, their measurement properties have been studied extensively and allow generalisability between different conditions. This approach will provide valuable information on construct validity and responsiveness in the future.

Conclusion

In conclusion, joint-specific PROMs are increasingly used to report outcomes following revision knee replacement, but these instruments have insufficient evidence for validity. Future research is needed to target the deficiencies highlighted by this review in order to inform clinical trials and observational studies evaluating these outcomes.

Footnotes

  • Twitter @simonabram

  • Contributors SAS: concept, study selection and scoping review, assessment of methodological quality, analysis, writing and editing paper, guarantor. EAH: study selection and scoping review, critically revising paper. SGFA: assessment of methodological quality, critically revising paper. AA: critically revising paper. AJP: concept, methodology, writing and editing paper. SH: concept, methodology, writing and editing paper.

  • Funding SAS has received funding from the Royal College of Surgeons One-Year Fellowship, Rosetrees Trust and National Institute for Health Research (NIHR).

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.