Patient-Reported Outcome (PRO) questionnaires for young to middle-aged adults with hip and groin disability: a systematic review of the clinimetric evidence
  1. K Thorborg1,
  2. M Tijssen2,
  3. B Habets2,
  4. E M Bartels3,
  5. E M Roos4,
  6. J Kemp5,
  7. K M Crossley6,
  8. P Hölmich1
  1. Sports Orthopedic Research Centre—Copenhagen, Arthroscopic Centre Amager, Copenhagen University Hospital, Copenhagen, Denmark
  2. Sports Medical Centre Papendal, Arnhem, The Netherlands
  3. The Parker Institute, Copenhagen University Hospital, Bispebjerg and Frederiksberg, Copenhagen, Denmark
  4. Institute of Sports Science and Clinical Biomechanics, University of Southern Denmark, Odense, Denmark
  5. Australian Centre for Research into Injury in Sport and its Prevention (ACRISP), Federation University Australia, Ballarat, Australia
  6. School of Health and Rehabilitation Science, The University of Queensland, Brisbane, Australia

Correspondence to Dr K Thorborg, Department of Orthopaedic Surgery, Amager Hospital, Faculty of Health Sciences, University of Copenhagen, Denmark, Italiensvej 1, Copenhagen S 2300, Denmark; kristianthorborg{at}hotmail.com

Abstract

Background/aim To recommend Patient-Reported Outcome (PRO) questionnaires to measure hip and groin disability in young-aged to middle-aged adults.

Methods A systematic review was performed in June 2014. The methodological quality of the studies included was determined using the COnsensus-based Standards for the selection of health Measurement INstruments list (COSMIN) together with standardised evaluations of measurement properties of each PRO.

Results Twenty studies were included. Nine different questionnaires for patients with hip disability, and one for hip and groin disability, were identified. Hip And Groin Outcome Score (HAGOS), Hip Outcome Score (HOS), International Hip Outcome Tool-12 (IHOT-12) and IHOT-33 were the most thoroughly investigated PROs, and studies including these PROs reported key aspects of the COSMIN checklist. HAGOS and IHOT-12 were based on studies with the fewest ratings of poor study methodology (23% and 31%, respectively), whereas IHOT-33 and HOS had a somewhat larger proportion (46%). These PROs all contain adequate measurement qualities for content validity (except HOS), test–retest reliability, construct validity, responsiveness and interpretability. Missing information or poor quality ratings on methodological aspects made it impossible to fully evaluate the remaining PROs at present.

Conclusions HAGOS, HOS, IHOT-12 and IHOT-33 can be recommended for assessment of young-aged to middle-aged adults with pain related to the hip joint, undergoing non-surgical treatment or hip arthroscopy. At present, HAGOS is the only PRO also aimed for young-aged to middle-aged adults presenting with groin pain and is recommended for use in this population.

Trial registration number CRD42014009995.

  • Hip
  • Groin
  • Evidence based review
  • Sports medicine
  • Orthopaedics

Introduction

Treatment interventions, such as hip arthroscopy, endoscopic groin hernia repair and specific exercise regimens, are advancing rapidly to manage hip and groin disability in young-aged to middle-aged adults.1–3 This area of sports medicine is a ‘hot topic’ that needs to be advanced with rigorous research.1

Patient-Reported Outcomes (PROs) are considered the gold standard when measuring the efficacy of interventions from the patient's perspective.4 Prior to recommending or discarding specific PRO questionnaires, a systematic investigation of their clinimetric properties is required.5 A systematic review from 2010 of PROs for patients with hip and/or groin disability showed that most PROs were developed for people aged over 50 with hip osteoarthritis and/or in need of hip replacement.6 The year 2011 saw two new systematic reviews on PROs evaluating patients undergoing hip arthroscopy.7 ,8 Combined, these three systematic reviews agreed that the Hip Outcome Score (HOS) was the best available PRO for patients undergoing hip arthroscopy.6–8 However, these conclusions were based on only six studies and the consensus was that more research was needed in this area.6–8

In the past 3 years, several publications concerning the development and evaluation of PROs for young-aged to middle-aged adults, including patients undergoing surgical as well as those undergoing non-surgical treatment, have emerged and been debated.9–16 These recommend instruments other than the HOS as the most appropriate in this setting involving younger patients.9–16 Therefore, we systematically evaluated the clinimetric evidence pertaining to PROs for young-aged and middle-aged patients with hip or groin problems.

Methods

We performed a systematic review of the literature concerning assessment of hip and/or groin disability: (1) to identify PROs to assess young-aged to middle-aged adult patients with hip and/or groin disability in clinical practice, or in studies or clinical databases concerning outcome of various types of surgical, medical or exercise treatment and (2) to evaluate PRO study quality, and the clinimetric properties of available PROs in this population. The study protocol was pre-registered in PROSPERO (CRD42014009995), in May 2014.

The groin is anatomically located in the anteromedial part of the hip region, and the hip and groin region share vascular and neural supply.17 The pathologies of the hip joint and the groin often present simultaneously, and the symptoms can be overlapping.18–21 We therefore searched for PROs concerning both regions.

Definitions

Clinimetric properties

Clinimetrics, derived from psychometrics, is the discipline concerned with measurement of variables in tests and questionnaires.22 The term ‘clinimetric properties’ in this study was defined as measurement properties of questionnaires concerning validity, reliability and responsiveness.5

Psychometric theory

Clinimetric properties can be assessed using Classical Test Theory (CTT) and Item Response Theory (IRT). CTT predicts outcomes of testing such as the difficulty of items or the ability of the person being tested. CTT assumes that an observed score can be decomposed into a ‘true’ score and an ‘error’ score, and the reliability coefficient can be formulated as the ratio of true variance to (true+error) variance. The term ‘classical’ contrasts with more recent psychometric theories such as IRT. IRT assumes that the score is unidimensional and creates an interval-scaled measure.22
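The CTT reliability coefficient described above can be illustrated with a minimal Python sketch. The function name and the example variances are hypothetical and chosen only for illustration; they are not taken from any of the included studies.

```python
def reliability(true_variance: float, error_variance: float) -> float:
    """CTT reliability: true variance divided by (true + error) variance."""
    total_variance = true_variance + error_variance
    if total_variance <= 0:
        raise ValueError("total variance must be positive")
    return true_variance / total_variance


# Hypothetical example: 80 units of true-score variance, 20 units of error variance.
example = reliability(80.0, 20.0)
```

With these illustrative numbers the coefficient is 0.8, meaning that 80% of the observed score variance would be attributed to true differences between patients.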

Patient-reported outcome

A PRO is any report coming directly from a patient concerning a health condition and its treatment.4 ,23 PRO questionnaires include items, instructions and guidelines for scoring and interpretation, and are used to measure outcomes from the perspective of the patient.4

Disability

Disability in this study refers to the health dimensions within the methodological framework of The International Classification of Functioning, Disability and Health (ICF) as categorised at one of three levels: impairment (body structure and function), disabilities (activities) and participation problems (participation).24

Literature search strategy

A comprehensive, systematic literature search was conducted in the following bibliographic databases: MEDLINE, EMBASE, Cochrane Central Register of Controlled Trials, PsycINFO, SportsDiscus and Web of Science, all from January 2009 to June 2014. Relevant studies on the same topic from a previous systematic review from our group6 were also included. The present study used search strategies and bibliographic databases similar to those of the previous study,6 in which the databases were searched up to January 2009.

Our search strategy was:

Hip OR groin OR inguinal hernia

AND

outcome assessment* OR self assessment* OR questionnaire*

AND

reliability OR validity

The terms were searched as key words (in MEDLINE named MESH terms) where possible, and also as ‘free-text’ words. From the retrieved and selected references, reference lists were checked for further relevant studies. Finally, specific searches for identified questionnaires were carried out, and experts in the field were contacted for possible additional references.
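The Boolean structure of the search strategy above (synonyms joined with OR inside each concept, concepts joined with AND) can be sketched in Python as follows. The helper name and the use of parentheses are assumptions made for illustration; the exact syntax differs between bibliographic databases and is not specified in this review.

```python
def build_query(concept_groups):
    """Join terms with OR inside each group, then join the groups with AND."""
    return " AND ".join(
        "(" + " OR ".join(terms) + ")" for terms in concept_groups
    )


# The three concept groups listed in the search strategy.
SEARCH_STRATEGY = [
    ["hip", "groin", "inguinal hernia"],
    ["outcome assessment*", "self assessment*", "questionnaire*"],
    ["reliability", "validity"],
]

query = build_query(SEARCH_STRATEGY)
```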

Study selection

Two reviewers (KT and EMB) independently carried out the selection among the retrieved references of possible studies for inclusion, based on titles and abstracts. All eligible studies were obtained in full text and evaluated according to the inclusion criteria. Excluded studies were identified and presented with the reasons for exclusion, following the PRISMA guidelines (figure 1).25

Inclusion criteria

Inclusion criteria for this study were as follows:

  1. The retrieved study was published in English as a full report.

  2. All patients were aged ≥18 and with a mean age ≤50. If only median age was reported then median age was ≤50 years.

  3. Clinimetric properties in the study were evaluated with CTT or IRT.

  4. The main purpose of the study was to evaluate one or more clinimetric properties of a PRO applied in a patient population with hip and/or groin disability.

  5. The study included a PRO specifically concerning hip and/or groin disability, containing items related to impairment (body structure and function), disabilities (activities) or participation problems (participation), according to the ICF.

  6. Data on hip and/or groin disability could be separated from disabilities of other anatomical regions.
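The six inclusion criteria can be expressed as a simple screening check. The following Python sketch is only an illustration: the record fields and function names are hypothetical, and treating a study that reports neither mean nor median age as ineligible is an assumption not stated in the criteria.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StudyRecord:
    # All field names are hypothetical labels for the six criteria.
    english_full_report: bool            # criterion 1
    youngest_patient_age: float          # criterion 2
    mean_age: Optional[float]            # criterion 2
    median_age: Optional[float]          # criterion 2 (used only if no mean)
    uses_ctt_or_irt: bool                # criterion 3
    main_purpose_is_clinimetrics: bool   # criterion 4
    hip_groin_specific_pro: bool         # criterion 5
    hip_groin_data_separable: bool       # criterion 6


def meets_age_criterion(study: StudyRecord) -> bool:
    """All patients aged >= 18, and mean (or, failing that, median) age <= 50."""
    if study.youngest_patient_age < 18:
        return False
    if study.mean_age is not None:
        return study.mean_age <= 50
    if study.median_age is not None:
        return study.median_age <= 50
    return False  # assumption: no age summary reported means not eligible


def is_eligible(study: StudyRecord) -> bool:
    """A study is eligible only if every criterion is met."""
    return all([
        study.english_full_report,
        meets_age_criterion(study),
        study.uses_ctt_or_irt,
        study.main_purpose_is_clinimetrics,
        study.hip_groin_specific_pro,
        study.hip_groin_data_separable,
    ])
```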

Characteristics of studies and instruments

Information on the evaluation of clinimetric properties of the PRO(s), time of administration, target population (diagnosis/clinical features), study population and mode of administration were included whenever possible. Extracted information from the identified questionnaires included full name of the questionnaire, abbreviation of the name of the questionnaire, assessment dimensions and the number of rating scales.

Methodological quality of included studies

The methodological quality of the included studies was determined by the COnsensus-based Standards for the selection of health Measurement INstruments list (COSMIN).26 The COSMIN checklist is based on an international Delphi study in which 57 experts participated, and it has demonstrated high inter-rater agreement.26 ,27 It contains three steps and has 11 areas (boxes) with several questions and criteria. Nine boxes (A to I) can be used to assess whether a study meets the standard for good methodological quality. Only the boxes corresponding to the properties assessed in a given study were evaluated. Each item is rated as excellent, good, fair or poor in accordance with the criteria described by Terwee et al.28 The methodological quality score per box is determined by the item with the lowest rating (‘worst score counts’).28 Two reviewers (MT and BH) conducted the review process independently, and a third reviewer (Robert van Cingel, RvC) was consulted for consensus in cases of disagreement.
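The rule that the lowest-rated item determines the score of a COSMIN box can be sketched in Python as follows. The function name and the example ratings are hypothetical; only the four rating levels and the lowest-rating rule come from the text above.

```python
# Rating levels from lowest to highest methodological quality.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def box_score(item_ratings):
    """Return the lowest item rating in a box; that rating is the box score."""
    if not item_ratings:
        raise ValueError("a box needs at least one rated item")
    return min(item_ratings, key=RATING_ORDER.index)


# Hypothetical box with four rated items.
example_box = ["excellent", "good", "poor", "excellent"]
example_score = box_score(example_box)
```

In this illustrative box a single poor item yields a poor box score, which shows how strict the rule is and why the proportion of poor ratings per questionnaire can be substantial.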

Data extraction and evaluation of clinimetric properties

Based on the guidelines for systematic reviews,25 ,29 we used a criteria list for evaluative purposes and described the operationalisation of it explicitly. The criteria list in question was published by Terwee et al,5 and was designed to evaluate PROs and their clinimetric properties, where group comparisons are needed. This criteria list has recently been applied in other systematic reviews on PROs including young-aged to middle-aged patients with hip and groin disability,6–8 and was considered the best available instrument for our purpose. In our previous systematic review,6 the methodological issues of the criteria list were discussed and refined in the study group, which is in accordance with recommendations in the original article.5 This refined version was also used for the present review.

The criteria list described the clinimetric properties: content validity, internal consistency, construct validity, floor and ceiling effects, test–retest reliability, intertester reliability, agreement, responsiveness and interpretability. Intertester reliability is only relevant for PRO questionnaires if observer-administration is introduced. The clinimetric properties were rated as positive (+), indeterminate (±), negative (−) or no information available ( ) (see online supplementary appendix 1). In order to avoid systematic errors in the study design or execution, two reviewers (MT and BH) independently rated the clinimetric properties of each questionnaire according to the criteria list. Uncertainty or disagreement was resolved by discussion with a third reviewer (RvC). Where further information on a study was needed, its authors were contacted for clarification. The PRO ratings in the individual studies are described in online supplementary appendix 1, all in accordance with the recommendations by Terwee et al.5

Statistical analysis of the reliability of the ratings

In the present study, unweighted κ statistics were used to calculate the intertester reliability of the initial ratings by the two reviewers concerning methodological study quality and clinimetrics, since the ratings are considered nominal.30
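For readers unfamiliar with the statistic, an unweighted κ for two raters assigning nominal categories can be computed as in the following Python sketch. The function name and the example ratings are hypothetical and do not reproduce the reviewers' actual data.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters and nominal categories."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same, non-empty set of items")
    n = len(ratings_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of six items by two reviewers.
reviewer_1 = ["poor", "poor", "fair", "good", "good", "excellent"]
reviewer_2 = ["poor", "fair", "fair", "good", "good", "excellent"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

In this illustrative example the reviewers agree on five of six items (observed agreement 0.83), chance agreement is 0.25, and κ is therefore about 0.78.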

Results

The new search (2009–2014) identified 661 publications in total. Six publications31–36 identified from our previous systematic review and similar search (1980–2009)6 were also included, since they fulfilled the inclusion criteria. Following the screening of titles and abstracts, 583 publications were excluded. Of the remaining 84 publications, which were read in full, 64 were excluded as they did not fulfil our inclusion criteria (figure 1). Twenty studies, involving 4996 patients, were finally included in the systematic review (table 1).11 ,12 ,14 ,31–47 A total of 10 PROs were identified in the included studies (table 2). Nine PROs considered the hip region, and one questionnaire considered the hip as well as the groin region.
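The study flow reported above can be checked with simple arithmetic. The Python sketch below uses only the counts given in the text; the variable names are hypothetical.

```python
# Counts reported in the Results section.
new_search = 661                   # publications from the 2009-2014 search
from_previous_review = 6           # publications carried over from the earlier review
excluded_on_title_abstract = 583
excluded_on_full_text = 64

# Publications remaining after title and abstract screening.
read_in_full = new_search + from_previous_review - excluded_on_title_abstract
# Studies remaining after full-text assessment.
included = read_in_full - excluded_on_full_text
```

The two results, 84 publications read in full and 20 studies included, match the numbers reported in the text.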

Table 1 Description of included studies in the systematic review

Table 2 Included PRO questionnaires for patients with hip and/or groin disability

The intertester reliability of the independent ratings based on the COSMIN ratings was good (κ=0.74, 95% CI 0.66 to 0.82). Disagreement here was mainly caused by differences in interpretation of the exact COSMIN criteria. In a few cases disagreement was caused by reading errors where one of the reviewers had overlooked specific information. In all cases consensus was reached by discussion between the two reviewers.

The intertester reliability of the independent ratings of clinimetric properties was very good (κ=0.90, 95% CI 0.86 to 0.95). Disagreement here was minimal and mainly caused by reading errors where one of the reviewers had overlooked specific information on a specific clinimetric property. In all cases consensus was reached by discussion between the two reviewers.

Methodological quality of the included studies

The methodological quality of the included studies evaluated by the COSMIN checklist can be seen in table 3. The most commonly evaluated PROs were: HOS in eight studies, Copenhagen Hip And Groin Outcome Score (HAGOS) in four studies, International Hip Outcome Tool-33 (IHOT-33) in four studies and IHOT-12 in three studies. The studies concerning the HOS, HAGOS, IHOT-33 and IHOT-1211 ,12 ,14 ,31–47 covered all important methodological quality aspects, except criterion validity, which is usually not relevant for PROs of this kind (table 2). The studies appraising the HOS, HAGOS, IHOT-33 and IHOT-12 exhibited the following distribution of ratings for poor methodology (number of poor ratings/number of total ratings): HOS (16/35=46%), HAGOS (5/22=23%), IHOT-33 (6/13=46%) and IHOT-12 (4/13=31%).

Table 3 Scores of articles rated by COSMIN checklist

Both IRT-based and CTT-based analyses were performed only for HOS. IRT-based analysis was not performed in the studies concerning HAGOS, IHOT-12 and IHOT-33, as these questionnaires were developed using only CTT-based analyses.11 ,12 ,14 ,38 ,40 ,41 ,45 ,47 Unidimensionality and structural validity can, however, also be evaluated by CTT, and IRT is therefore not a pre-requisite for evaluating these methodological aspects. The studies assessing these four questionnaires (HOS, HAGOS, IHOT-12 and IHOT-33)11 ,12 ,14 ,32 ,33 ,34 ,38 ,40–47 adequately address all important measurement aspects.

The studies of the remaining questionnaires, namely modified Harris Hip Score (mHHS) in three studies,11 ,35 ,40 Hip disability and Osteoarthritis Outcome Score (HOOS) in two studies,11 ,40 Non-Arthritic Hip Score (NAHS) in three studies,31 ,37 ,40 Hip Sports Activity Scale (HSAS) in one study,44 Super Simple Hip Score (SUSHI) in one study39 and Western Ontario and McMaster Universities Osteoarthritis Index—12 (WOMAC-12) in one study,36 had either no evaluation of their methodological quality concerning content validity specifically for young-aged and middle-aged patients, or a poor rating on this aspect. Furthermore, structural validity was either not assessed, or displayed poor methodology in all these studies, except in the study using WOMAC-12.36 The studies concerning mHHS, HOOS, NAHS, HSAS, SUSHI and WOMAC-12 showed the following distribution of ratings for poor methodology (number of poor ratings/number of total ratings): mHHS (4/9=44%), HOOS (3/8=38%), NAHS (6/12=50%), HSAS (3/5=60%), SUSHI (1/3=33%) and WOMAC-12 (0/3=0%). Of these questionnaires, WOMAC-12 was the only one to be assessed by IRT; as a result, important aspects of its reliability and validity could not be assessed.36 As content and structural validity are vital aspects in relation to a study's ability to measure a PRO's internal and external validity, missing information or poor methodology on these aspects makes it impossible to fully evaluate these PROs (HOOS, HSAS, mHHS, NAHS, SUSHI and WOMAC-12) at present, based on the available studies.
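The proportions of poor ratings reported in this section can be reproduced from the stated counts. The Python sketch below uses only the (poor ratings, total ratings) pairs given in the text; the function and variable names are hypothetical, and rounding half up to whole percentages is an assumption about how the figures were rounded.

```python
def percent_poor(poor: int, total: int) -> int:
    """Share of poor ratings as a whole percentage, rounding half up."""
    if total <= 0:
        raise ValueError("total number of ratings must be positive")
    return int(100 * poor / total + 0.5)


# (number of poor ratings, number of total ratings) as reported in the text.
POOR_RATINGS = {
    "HOS": (16, 35),
    "HAGOS": (5, 22),
    "IHOT-33": (6, 13),
    "IHOT-12": (4, 13),
    "mHHS": (4, 9),
    "HOOS": (3, 8),
    "NAHS": (6, 12),
    "HSAS": (3, 5),
    "SUSHI": (1, 3),
    "WOMAC-12": (0, 3),
}

percentages = {name: percent_poor(*counts) for name, counts in POOR_RATINGS.items()}
```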

Overall quality of PROs

The ratings of the individual PRO reports can be found in online supplementary appendix 2. The ratings of the clinimetric properties of the included PROs are synthesised and presented in table 4.

Table 4 Quality of the questionnaires based on psychometric properties

Overall, HAGOS, IHOT-33 and IHOT-12 received the best ratings concerning their clinimetric properties (six positive scores out of eight relevant scores). HOS followed with five positive scores out of eight relevant scores. Then HOOS, mHHS and NAHS followed with four positive scores out of eight relevant scores, and last came HSAS and SUSHI with two positive scores out of eight relevant scores. WOMAC-12 was mainly developed with IRT and could only be evaluated for internal consistency.

Content validity

Content validity was defined as the extent to which the domain of interest is comprehensively sampled by the items in the questionnaire.5 The HAGOS, IHOT-33, IHOT-12 and NAHS showed good content validity in the use of target population and investigators or experts in the item selection.12 ,14 ,31 ,38 During the development of the HOS, the target population was not used in the item generation process.32 For HSAS, a target population was not used in the development of the questionnaire.44 The study investigating SUSHI had a doubtful design for the item generation process, as no investigators or experts were involved.39 For the remaining PROs, no information was found on content validity in young-aged to middle-aged adults.

Internal consistency

Internal consistency is the extent to which items in a (sub)scale are intercorrelated and is a measure of homogeneity of a (sub)scale.5 Appropriate factor analysis was performed for HAGOS with high Cronbach's α leading to positive ratings for internal consistency.14 ,47 HOOS, HOS, IHOT-12, IHOT-33, NAHS and WOMAC-12 all scored indeterminate ratings for internal consistency. This was due to a lack of appropriate factor analysis and/or missing Cronbach's α for each subscale in most of the studies investigating these clinimetric properties for these PROs.11 ,12 ,31 ,32 ,37 ,38 ,42 ,43 ,45 ,46

Construct validity

Construct validity is the extent to which scores on PROs relate to other measures, in a manner that is consistent with theoretically derived hypotheses concerning the domains that are measured.5 All PROs except HSAS, NAHS and WOMAC-12 scored a positive rating for construct validity. Indeterminate ratings for HSAS and NAHS were given based on a lack of information concerning a priori hypotheses, and WOMAC-12 was not assessed for construct validity.31 ,36 ,37 ,44

Floor and ceiling effects

Floor and ceiling effects are present if the questionnaire fails to demonstrate a worse score in patients who have clinically deteriorated and an improved score in patients who have clinically improved.5 Three questionnaires showed floor and ceiling effects, namely HAGOS, HOOS and mHHS.11 ,14 ,47 While mHHS yields a single score, HAGOS and HOOS contain six and five separately scored subscales, respectively. For HAGOS, floor effects were found for the subscale Participation in Physical Activity (PA) before intervention (surgical and non-surgical) in two studies,14 ,47 and ceiling effects were found after intervention (surgical and non-surgical) for the subscales Activities of Daily Living (ADL)11 ,47 and PA,11 ,14 each in two studies. For HOOS, ceiling effects were found for the subscales ADL and sport/recreation after surgical intervention (hip arthroscopy).11

Test–retest reliability

Test–retest reliability is the extent to which the same results are obtained on repeated administrations of the same PRO when no change in clinical status has occurred.5 Information on test–retest reliability was found for eight questionnaires of which all had a positive rating.12 ,14 ,34 ,37 ,38 ,40–44 ,47 No test–retest reliability results were available for SUSHI and WOMAC-12.36 ,39

Agreement

Agreement is the ability to produce exactly the same scores with repeated measurements.5 Information on agreement was found for seven questionnaires. Martin et al34 demonstrated a minimal important change (MIC) for use in individual patients, which was larger than the smallest detectable change (SDC) for HOS, but this was contradicted by Kemp et al.11 HAGOS, HOOS, IHOT-33, IHOT-12 and mHHS all received a negative rating concerning agreement, as their SDCindividual values were generally larger than the MIC.11 ,41 ,47 NAHS received an indeterminate rating as no MIC was presented.
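The rating logic for agreement described above can be sketched as a small Python function. The function name and the example values are hypothetical, and treating an SDC equal to the MIC as negative is an assumption, since the text does not address that case.

```python
from typing import Optional


def rate_agreement(sdc_individual: Optional[float], mic: Optional[float]) -> str:
    """Rate agreement by comparing the SDC with the MIC.

    positive:      MIC is larger than the SDC
    negative:      SDC is larger than (or, by assumption, equal to) the MIC
    indeterminate: SDC or MIC was not presented
    """
    if sdc_individual is None or mic is None:
        return "indeterminate"
    return "positive" if mic > sdc_individual else "negative"


# Hypothetical values in questionnaire points.
example_negative = rate_agreement(sdc_individual=15.0, mic=9.0)
example_indeterminate = rate_agreement(sdc_individual=12.0, mic=None)
```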

Responsiveness

Responsiveness was defined as the ability to detect important change over time in the concept being measured.5 Responsiveness was investigated for nine questionnaires and found to be good for seven of them, including HAGOS, HOS, HOOS, IHOT-12, IHOT-33, mHHS and NAHS.11 ,14 ,34 ,40 ,41 ,46 ,47 The HSAS and SUSHI scored a negative rating since these studies only reported standardised response means as measures of responsiveness.39 ,44

Interpretability

Interpretability is the degree to which one can assign qualitative meaning to quantitative scores.5

Only HAGOS, HOOS, HOS, IHOT-33, IHOT-12 and mHHS received a positive rating concerning this property as these were the only PROs for which mean and SD scores of at least two subgroups or MIC were presented.11 ,14 ,34 ,41–43 ,47

Discussion

We identified 20 studies, including nine PROs applied in the assessment of young-aged to middle-aged adults with hip disability, and one PRO for assessing hip and groin disability, also in young-aged to middle-aged adults.

In our previous systematic review,6 only the HOS, mHHS and NAHS had been evaluated in young-aged to middle-aged adults with hip and/or groin disability. We identified that the HOS had adequate clinimetric properties to assess young-aged to middle-aged patients undergoing hip arthroscopy,6 which was also confirmed in this updated version.6 However, the HOS is no longer the only PRO that can be considered as a relevant and valid measure of hip disability in young-aged to middle-aged adults.

HAGOS, IHOT-33 and IHOT-12 have also been thoroughly investigated, and these PROs contain adequate clinimetric qualities for assessment of young-aged to middle-aged patients with hip disability. Furthermore, HAGOS measures pain and difficulties not only related to the hip but also to the groin region.14 This is important since disability related to the groin region is a common problem in young and physically active people.19 ,21

The present study showed that studies on HAGOS and IHOT-12 had the fewest ratings of poor methodology (23–31%), whereas studies investigating the IHOT-33 and HOS had a somewhat larger proportion of poor ratings (46%). The studies assessing these four PROs include sufficient coverage of all important measurement aspects and have, overall, sufficient quality to make it possible to conclude on the quality of their clinimetric properties.26 ,28 These PROs all showed adequate measurement qualities for content validity (except HOS), construct validity, test–retest reliability, responsiveness and interpretability.5 Ceiling effects were seen for some of the subscales in HAGOS and HOOS, and for mHHS.11 ,14 ,47 According to the criteria described by Terwee et al,5 floor and ceiling effects are present if >15% of patients display the highest (100 point) or lowest (0 point) possible score.5 ,6 Floor and ceiling effects should, however, always be considered in the relevant context.
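The 15% criterion for floor and ceiling effects can be illustrated with a minimal Python sketch. The function name, its default arguments and the example scores are hypothetical; only the 0 to 100 score range and the >15% threshold come from the text.

```python
def floor_ceiling_effects(scores, min_score=0, max_score=100, threshold_percent=15):
    """Flag a floor or ceiling effect when more than threshold_percent of
    patients have the lowest or highest possible score, respectively."""
    n = len(scores)
    if n == 0:
        raise ValueError("at least one score is required")
    at_floor = sum(1 for s in scores if s == min_score)
    at_ceiling = sum(1 for s in scores if s == max_score)
    # Integer comparison avoids floating-point issues exactly at the threshold.
    return {
        "floor": at_floor * 100 > threshold_percent * n,
        "ceiling": at_ceiling * 100 > threshold_percent * n,
    }


# Hypothetical subscale scores: 4 of 20 patients (20%) report the maximum of 100.
example_scores = [100] * 4 + [50] * 16
example_result = floor_ceiling_effects(example_scores)
```

In this illustrative sample a ceiling effect is flagged and a floor effect is not. As noted above, such a flag still needs to be interpreted in context, for example whether the scores were collected before or after treatment.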

In the HAGOS PA subscale, a minimal score of 0 or a maximal score of 100 means that patients cannot deteriorate or improve any further, respectively, as they report that they are ‘never’ or ‘always’ able to participate in their preferred physical activities for as long as they want, and that they are ‘never’ or ‘always’ able to perform their preferred physical activities at their normal performance level. In this group of individuals, further deterioration or improvement seems of no clinical relevance, as such answers strongly indicate that these individuals are already functioning at the lowest or highest possible physical level.16 A postintervention ceiling effect may, therefore, be an effect of successful treatment and not necessarily a sign of a PRO's poor clinimetric quality.

Concerning agreement, no PRO received a positive rating, including HAGOS, HOS, IHOT-12 and IHOT-33. Lack of precision at the individual level is due to considerable measurement variation (SDCindividual), indicating that only quite large differences can be reliably detected for an individual in the clinic.40 In the two studies that included a direct head-to-head comparison of HAGOS, HOS, IHOT-12 and IHOT-33, SDCindividual ranged from 10 to 20 points for patients 12–24 months after undergoing hip arthroscopy,11 and from 20 to 30 points for patients with a primary complaint of hip and groin pain seeking treatment from either a physiotherapist or an orthopaedic surgeon.40 However, at the group level, the HAGOS, HOS, IHOT-12 and IHOT-33 have very low measurement variation: SDC at the group level (SDCgroup) ranged from 1 to 3 points for patients 12–24 months after undergoing hip arthroscopy,11 and from 2 to 6 points for patients with a primary complaint of hip and groin pain seeking treatment from either a physiotherapist or an orthopaedic surgeon.40 This makes these PROs highly capable of detecting small differences at group level when considering a group of 23–50 patients.11 ,40
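The difference between individual-level and group-level SDC follows from dividing by the square root of the group size. The Python sketch below assumes the commonly used formulas SDCindividual = 1.96 × √2 × SEM and SDCgroup = SDCindividual / √n; these formulas are not stated in the present text, and the function names and example values are hypothetical.

```python
import math


def sdc_individual_from_sem(sem: float) -> float:
    """Assumed formula: SDC for an individual = 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2) * sem


def sdc_group(sdc_individual: float, n: int) -> float:
    """Assumed formula: SDC for a group of n patients = SDC_individual / sqrt(n)."""
    if n < 1:
        raise ValueError("group size must be at least 1")
    return sdc_individual / math.sqrt(n)


# Illustrative group of 50 patients with individual-level SDCs of 10 and 20 points.
low = sdc_group(10.0, 50)
high = sdc_group(20.0, 50)
```

Under these assumptions, individual-level SDCs of 10 and 20 points shrink to roughly 1.4 and 2.8 points for a group of 50 patients, which illustrates why the same questionnaire can be imprecise for a single patient yet sensitive to small differences between groups.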

Methodological limitations

A limitation of our study is that no gold standard exists to evaluate clinimetric properties of PRO questionnaires, and our chosen criteria list may therefore be disputed. There are other criteria lists available,48 ,49 but in the absence of a gold standard we utilised the most comprehensive criteria list available to evaluate the PROs’ clinimetric properties.5

The COSMIN checklist26 ,28 and Terwee's criteria list5 used in our study were developed to evaluate study quality and clinimetric properties of PRO questionnaires, respectively, primarily based on CTT,5 ,26 ,28 and this is a limitation of our study. IRT is a relatively new method to evaluate PROs in healthcare and has some potential advantages over CTT.22 ,50 The Rasch model, which is a mathematical model applied in IRT, has been used to develop and internally validate measures, and it uses a logistic function that creates an interval-scaled measure.22 ,50 In the future, criteria for evaluating the methods and results of studies using IRT models must be further developed, since this method has gained acceptance,5 ,26 ,28 and studies concerning the development and/or evaluation of questionnaires based on IRT are, as also shown by this review, now more frequent.

A recent study by Schellingerhout et al51 synthesised the different studies by taking the methodological quality of the studies and the consistency of their results into account. The possible overall rating for a measurement property was ‘positive’, ‘indeterminate’ or ‘negative’, accompanied by levels of evidence, similar to that proposed by the Cochrane Back Review Group.52 ,53

In the present study we did not follow this approach, since the COSMIN checklist is not able to rate the overall methodology of the study, but instead rates nine individual items concerning PRO study quality. Currently, there is no clear method to handle a study with poor methodology for one or more items, but good, fair or excellent for other items; a situation common among the included studies (see table 2). Therefore, we decided on a more pragmatic approach, where we identified the total distribution of ratings for poor methodology in studies on each questionnaire. In our opinion, this provides a better overall view of PROs based on studies with a large proportion of poor methodology. HAGOS and IHOT-12 had the smallest proportion of items with a poor methodology score, but they still showed poor methodology score in 23% and 31% of items, respectively, suggesting that the study quality of studies developing and evaluating PROs can still be improved considerably.

Several systematic reviews evaluate the efficacy of different treatment modalities for young-aged to middle-aged patients with hip and/or groin disability.2 ,54 ,55 None of these considered the quality of the outcome measures applied in the included studies. Earlier reviews were mainly concerned with obvious methodological qualities such as randomisation procedures, control groups, blinding, compliance, drop-out, intention to treat, etc.56 Measurement properties have rarely been evaluated in the same methodologically stringent manner.56 A risk of bias may have been introduced with the possibility of unqualified instruments being selected when investigating and reporting the efficacy of different treatment modalities.2 ,54 ,55 The present study provides valuable information regarding clinimetric properties of PROs for young-aged to middle-aged adults with hip and/or groin disability.

Conclusion

HAGOS, HOS, IHOT-12 and IHOT-33 can be recommended in the assessment of young-aged to middle-aged adults with pain and dysfunction related to the hip joint. There is insufficient evidence to recommend the other identified instruments, namely, HOOS, HSAS, mHHS, NAHS, SUSHI and WOMAC-12, at present. The HAGOS is the only PRO aimed for young-aged to middle-aged adults addressing pain and dysfunction not only in the hip but also the groin area. HAGOS can be recommended for assessment in this population. The methodological quality of the existing reports varies greatly and can be considerably improved.

What are the new findings?

  • Ten different Patient-Reported Outcomes (PROs) were assessed for their clinimetric evidence in relation to young-aged to middle-aged adults with pain related to the hip joint, undergoing non-surgical treatment or hip arthroscopy.

  • Hip And Groin Outcome Score (HAGOS), Hip Outcome Score (HOS), International Hip Outcome Tool-12 (IHOT-12) and IHOT-33 can be recommended for the assessment of young-aged to middle-aged adults with pain related to the hip joint, undergoing non-surgical treatment or hip arthroscopy.

  • HAGOS is the only PRO also aimed toward young-aged to middle-aged adults presenting with groin pain and can be recommended for use in this population.

How might it impact on clinical practice in the near future?

  • Assessing patient-reported hip and groin disability in young-aged to middle-aged adults can now be narrowed down to using HAGOS, HOS, IHOT-12 or IHOT-33, depending on which of these PROs clinicians or researchers find to be the most relevant in relation to the specific context and target group.

  • We recommend that assessment of young-aged to middle-aged adults presenting with groin pain routinely include the HAGOS.

  • The use of valid, reliable and responsive PROs in young-aged to middle-aged adults presenting with hip and groin pain will hopefully be implemented in studies assessing treatment interventions, such as hip arthroscopy, endoscopic groin hernia repair and specific exercise regimens in the future, as these interventions are currently advancing rapidly in the management of hip and groin disability.

Acknowledgments

The authors would like to thank Dr Robert van Cingel PT, PhD, from Sports Medical Centre Papendal, for his assistance as a third reviewer consulted for consensus in cases of disagreement in COSMIN ratings and ratings of measurement properties for each PRO.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Correction notice This paper has been amended since it was first published Online First. The title of the paper has been edited for better English.

  • Contributors Analysis of the data was performed by MT and BH, and all authors revised and consented to the discussion and conclusion sections in the paper concerning the data. All authors contributed to the original idea of this review and the design; they all participated in the writing of the protocol before upload on PROSPERO and also commented and contributed with important intellectual content and final approval of the version to be published.

  • Competing interests KT is one of the developers of Copenhagen Hip And Groin Outcome Score (HAGOS), which is included for evaluation in the present review. EMR is one of the developers of Copenhagen HAGOS and Hip disability and Osteoarthritis Outcome Score (HOOS), which are included for evaluation in the present review. PH is one of the developers of Copenhagen HAGOS, which is included for evaluation in the present review. KT, EMR and PH were not involved in the rating of the study quality and the clinimetric properties, as MT and BH conducted this.

  • Provenance and peer review Not commissioned; externally peer reviewed.