Objectives To identify credible anchor-based minimal important differences (MIDs) for patient-reported outcome measures (PROMs) relevant to a BMJ Rapid Recommendations addressing subacromial decompression surgery for shoulder pain.
Design Systematic review.
Outcome measures Estimates of anchor-based MIDs, and their credibility, for PROMs judged by the parallel BMJ Rapid Recommendations panel as important for informing their recommendation (pain, function and health-related quality of life (HRQoL)).
Data sources MEDLINE, EMBASE and PsycINFO up to August 2018.
Study selection and review methods We included original studies of any intervention for shoulder conditions reporting estimates of anchor-based MIDs for relevant PROMs. Two reviewers independently evaluated potentially eligible studies according to predefined selection criteria. Six reviewers, working in pairs, independently extracted data from eligible studies using a predesigned, standardised, pilot-tested extraction form and independently assessed the credibility of included studies using an MID credibility tool.
Results We identified 22 studies involving 5562 patients that reported 74 empirically estimated anchor-based MIDs for 10 candidate instruments to assess shoulder pain, function and HRQoL. We identified MIDs of high credibility for pain and function outcomes and of low credibility for HRQoL. We offered median estimates for the systematic review team who applied these MIDs in Grading of Recommendations Assessment, Development and Evaluation (GRADE) evidence summaries and in their interpretations of results in the linked systematic review addressing the effectiveness of surgery for shoulder pain.
Conclusions Our review provides anchor-based MID estimates, as well as a rating of their credibility, for PROMs for patients with shoulder conditions. The MID estimates inform the interpretation for a linked systematic review and guideline addressing subacromial decompression surgery for shoulder pain, and could also prove useful for authors addressing other interventions for shoulder problems.
PROSPERO registration number CRD42018106531.
- minimal important differences
- shoulder condition
- patient-reported outcome measures
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
Our review includes a comprehensive search for anchor-based minimal important differences (MIDs) for instruments commonly used in Randomized Controlled Trials (RCTs) of shoulder conditions conducted without restrictions of study design or language of publication.
We undertook judgements of MID credibility using a formal instrument with demonstrated reliability and most studies provided highly credible estimates.
The range of reported MIDs was wide for some of the patient-reported outcome measures.
Although participants’ disease/conditions, sample size, anchors and analytical methods varied among included studies, we cannot convincingly relate these characteristics to variability in estimates.
For some instruments used in RCTs of surgery for shoulder, we did not find any study estimating MIDs in our target patient population.
The shoulder is the body’s most mobile joint, allowing movement in many directions. Shoulder conditions, including arthritis, adhesive capsulitis, rotator cuff conditions, dislocations, fractures, shoulder instability, and shoulder separation, are common problems that cause pain and disability.1 Up to 26% of adults have recently experienced shoulder pain.2 In the USA, the evaluation and management of one shoulder condition—rotator cuff tears—costs US$3 billion each year.3 4
The relationship between shoulder pain in an individual and the physical cause is often not clear: anatomical abnormalities are frequently not the cause of an individual patient’s shoulder pain. Subacromial pain syndrome, also known as shoulder impingement syndrome or rotator cuff disease, is a broad diagnosis that includes several specific conditions and is one of the most common diagnoses for patients with shoulder or upper extremity pain or disability.5 6 Subacromial pain syndrome encompasses all non-traumatic shoulder conditions including partial tear of the rotator cuff, tendon cuff degeneration, bursitis, tendinosis, supraspinatus tendinopathy or biceps tendinitis.6 It is most often unilateral.
Investigating interventions to address shoulder conditions such as shoulder pain requires measurement of patients’ pain and function, best undertaken using patient-reported outcome measures (PROMs). PROMs are reported directly by the patient and address aspects of the patient’s experience and perspective without interpretation by the clinician or caregiver.7 Investigators of interventions for shoulder conditions often include PROMs addressing shoulder pain, function and health-related quality of life (HRQoL) as their primary outcomes.1 8–14 Interpreting PROMs can, however, be challenging. In particular, interpretation requires knowing if an apparent treatment effect is trivial in magnitude, small but important, moderate or large. Statistical significance provides no insight into this issue.15
To aid interpretation of PROM findings, researchers developed the concept of the minimal important difference (MID): the smallest change—either positive or negative—that patients perceive as important.16 17 The MID can help clinicians, patients, and clinical practice guideline developers interpret the magnitude of effects of interventions on PROMs.15 18 19
There are two common approaches for determining the MID: anchor-based and distribution-based methods.20 Distribution-based methods rely solely on the statistical characteristics of PROMs (eg, mean and SD of PROM scores). These statistical characteristics do not reflect the patient’s perspective, severely limiting the distribution-based approach in aiding interpretation of results.18 21
Investigators using the anchor-based approach choose an independent interpretable measure as an external criterion or anchor and then examine the relation between the target PROM instrument and that anchor.18 Although there is no ‘gold standard’ anchor-based methodology, our group has used the existing literature and expert input to develop an instrument that measures the credibility of anchor-based MIDs. Among desirable criteria to establish a trustworthy MID is a requirement for at least a moderate correlation between change in the target PROM instrument and the change on the anchor.20 22
Although systematic reviews addressing MIDs in shoulder PROMs are available,23–27 they are dated and have not applied an assessment of credibility. Therefore, we set out to identify the most credible anchor-based MID estimates to inform a systematic review addressing the effectiveness of subacromial decompression surgery for shoulder pain. Our review informed an associated BMJ Rapid Recommendations and facilitated interpretation of critical outcomes of interest, including shoulder pain, function and HRQoL. The BMJ Rapid Recommendations project is a collaboration between the MAGIC foundation (www.magicproject.org) and the BMJ, with the goal of providing timely, trustworthy practice guidelines.28
A variety of study designs could inform MIDs for PROMs chosen by investigators for the randomised controlled trials (RCTs). Therefore, in this systematic review, we (1) summarised MID estimates, which came largely from observational studies, for the PROMs chosen by the triallists in RCTs that investigated the effect of surgery on shoulder pain and (2) assessed the credibility of these MID estimates.
Guideline panel and patient involvement
The BMJ Rapid Recommendations guideline panel provided critical oversight to this systematic review. The panel included academic and community-based practitioners (orthopaedic surgeons, general internists, physiotherapists, a rheumatologist, a general practitioner and a geriatrician), methodologists and patients with lived experience of shoulder pain. The panel members also provided input into the methodology of our review. Patients helped, in particular, to identify the outcomes of interest for which we identified MID estimates.28 This study builds on methods used in a similar BMJ Rapid Recommendation on arthroscopic knee surgery.29 30
Instruments under consideration
The BMJ Rapid Recommendations panel, informed by the Outcome Measures in Rheumatology shoulder core outcomes set,31 nominated shoulder pain, function and HRQoL as critical patient-important outcomes of interest in the management of shoulder conditions. Following guidance from the panel, the systematic review team addressing the effectiveness of surgery for subacromial pain syndrome sought evidence for each of these outcomes in the eligible RCTs. We worked closely with that review team and addressed each of the PROMs corresponding to these constructs that were included as outcomes in the RCTs eligible for the systematic review addressing the impact of subacromial decompression surgery (table 1).
Literature search and study identification
This project used a database that includes all articles reporting anchor-based MIDs from 1989 to April 2015 (the MID concept was first described in the medical literature in 1989).16 32 We obtained full access to the database of these MIDs; the leaders of that project (ACL, TD and GG) are participants in the current review.
We conducted a comprehensive search for relevant studies addressing MIDs from February 2015 to August 2018 using the MEDLINE, EMBASE and PsycINFO databases. For outcomes that did not fully meet the definition of patient-reported outcomes (such as Constant score33 34) or were not identified in the systematic review informing the database of MIDs, we conducted a comprehensive search for relevant studies from January 1989 to August 2018. We used the MID search strategy filter from the previous MID database development project including a shoulder filter for the relevant PROMs. We also hand searched references from related reviews. There were no language restrictions. Online supplementary appendix 1 presents the full search strategy.
We included studies with any intervention, including expectant management. We included original reports of all studies that estimated MID(s) using anchor-based methods for any candidate PROM (table 1). If, for a particular PROM, MID(s) were available for a shoulder condition, we restricted ourselves to those MIDs. If no study estimated MIDs in patients with shoulder conditions, we used the results from studies focusing on upper extremity musculoskeletal conditions. We did not consider studies that estimated MIDs in patients with lower extremity or other conditions. Because RCTs evaluated the effects of an intervention on pain, function and HRQoL that would require MIDs for improvement, we did not include MIDs for deterioration.
Eligible studies used any design, including retrospective and prospective observational studies or clinical trials, that compared the results of a target PROM instrument to an anchor, regardless of the credibility of the design, conduct or results of the study. Two reviewers independently performed title and abstract screening and, subsequently, full-text screening of studies included by either reviewer. At full-text screening, reviewers resolved disagreements by discussion or, if needed, by consultation with a third reviewer.
Six reviewers, working in three pairs, independently extracted the following data from eligible studies using a predesigned, standardised, pilot-tested extraction table: first author name; publication year; country(ies); demographic characteristics of participants (eg, sample size, age, sex, condition or disease); intervention; characteristics of the PROM (eg, construct(s), domain(s) and range); anchor details (eg, construct(s), threshold, range of options, categories or values); details of MID determination methods (eg, number of participants used to estimate the MID, duration of follow-up from baseline, analysis methods and correlation between the anchor and PROM). Reviewers resolved disagreements by discussion.
The MID database project included the development of an instrument to assess the credibility of anchor-based MID estimates and tested its reliability (it proved reliable; manuscript in preparation, data available on request). We defined the credibility of studies estimating the MIDs as the extent to which the methodology and performance of studies are likely to have protected against misleading estimates.32 We used an abridged version of this MID credibility tool, which assesses five aspects of each MID estimate (table 2). Six reviewers, working in three pairs, independently assessed the credibility of included studies. Reviewers resolved disagreements by discussion. We deemed that the MID estimate had high credibility if three or more of the five criteria were met (either ‘definitely yes’ or ‘to a great extent’ for each item); otherwise, we deemed that the MID had low credibility. We regarded credibility as a dichotomous variable (high or low) and did not quantify it further.
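The dichotomous credibility rule described above can be illustrated with a minimal sketch. The function name and the input format are hypothetical; only the "three or more of five criteria" threshold and the two response labels that count as met come from the text.

```python
# Response labels that count as a criterion being "met" (taken from the text).
MET_RESPONSES = {"definitely yes", "to a great extent"}


def rate_credibility(responses, threshold=3):
    """Return 'high' if at least `threshold` of the five criteria are met, else 'low'.

    `responses` is a hypothetical list of five response labels, one per criterion.
    """
    if len(responses) != 5:
        raise ValueError("expected responses to exactly five criteria")
    met = sum(1 for r in responses if r.strip().lower() in MET_RESPONSES)
    return "high" if met >= threshold else "low"


# Illustrative (made-up) assessment of one MID estimate: three criteria are met.
example = [
    "definitely yes",
    "to a great extent",
    "not so much",
    "definitely no",
    "to a great extent",
]
print(rate_credibility(example))
```

The sketch treats every criterion as equally weighted, which mirrors the simple count used in the review; it does not attempt to reproduce the full credibility tool.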
Synthesis of results
We described the characteristics of eligible studies including MID estimates, demographic characteristics of participants, intervention and characteristics of the instrument and anchor. We identified the median, minimum and maximum values across the high credibility MID estimates generated from the eligible studies for the PROMs of interest. If all MID estimates for a PROM were of low credibility, we presented these estimates.
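As a rough illustration of this synthesis step, the sketch below summarises the estimates for one PROM, preferring high credibility estimates and falling back to low credibility ones only when no high credibility estimate exists. The function name, the tuple format and the example values are assumptions made for illustration, not data from the review.

```python
from statistics import median


def summarise_mids(estimates):
    """Summarise MID estimates for one PROM.

    `estimates` is a hypothetical list of (mid_value, credibility) tuples,
    where credibility is 'high' or 'low'.
    """
    if not estimates:
        raise ValueError("no MID estimates supplied")
    high = [value for value, cred in estimates if cred == "high"]
    if high:
        pool, label = high, "high"
    else:
        # Fall back to low credibility estimates only if nothing better exists.
        pool, label = [value for value, _ in estimates], "low"
    return {
        "credibility": label,
        "n": len(pool),
        "median": median(pool),
        "min": min(pool),
        "max": max(pool),
    }


# Made-up estimates for a 0-10 pain scale.
illustrative = [(1.4, "high"), (1.6, "high"), (3.0, "low")]
print(summarise_mids(illustrative))
```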
For each PROM with multiple MID estimates, we considered variables that may influence the MID. These included the intervention type (surgical or non-surgical) and, for transition anchors, the period from first to second instrument administration (<3 months vs 3 months or more). We tested for subgroup effects by examining the interaction between each variable and the MID (p<0.05 was deemed statistically significant).
We found six eligible studies from the existing database of anchor-based MIDs and one study from the references in related reviews. We identified 2643 records through our search of electronic databases, of which 534 were duplicates, leaving 2109 records for the title and abstract screening. We excluded 1962 records based on our title and abstract screening and assessed 147 full-text articles, of which 15 were eligible. Therefore, 22 studies were eligible for this review. Figure 1 summarises the study identification process.
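The study flow reported above can be checked with simple arithmetic. The variable names are ours; every number comes from the paragraph.

```python
# Counts reported in the text.
records_identified = 2643
duplicates = 534
excluded_at_title_abstract = 1962
eligible_from_search = 15
eligible_from_existing_database = 6
eligible_from_reference_lists = 1

# Derived counts.
screened = records_identified - duplicates
full_text_assessed = screened - excluded_at_title_abstract
total_eligible = (
    eligible_from_search
    + eligible_from_existing_database
    + eligible_from_reference_lists
)

print(screened, full_text_assessed, total_eligible)
```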
Table 3 presents the characteristics of the 22 eligible studies.24 35–55 Sample sizes ranged from 20 (reference 49) to 1856 (reference 46), with a total of 5562 participants providing MID estimates for two relevant instruments assessing shoulder pain, one assessing function, five assessing shoulder symptoms and function and two assessing HRQoL (table 3). The 22 studies reported 74 anchor-based MID estimates. Twenty-one of the 22 studies employed a variety of transition ratings as the anchor to determine the MIDs, of which five had a follow-up period of less than 3 months.38 43 44 48 49 One study used the Penn Shoulder Score (cut-off point: 8.6) as the anchor to determine the MIDs for pain measurement (Pain Numeric Rating Scale, PNRS).42 Of the 22 studies, 19 reported absolute estimates for the MIDs and 3, addressing the Constant score, quick Disabilities of the Arm, Shoulder and Hand (DASH) and Oxford Shoulder Score (OSS), reported relative estimates.35 39 43 Patients underwent surgical interventions in 4 studies36 40 46 47; 4 studies used both surgical and non-surgical interventions41 51 54 55; 13 used non-surgical interventions24 35 37–39 42–45 48 49 52 53 and 1 did not report the type of intervention.50
The analysis methods for estimating the MID included mean change in patients who had experienced a small but minimally important difference over time35–40 48 49 52 54 55; mean difference in groups perceived to have changed versus not changed24 40 46 47 53 and Receiver Operating Characteristic (ROC) curves.35 38–45 50 51 Fourteen studies provided highly credible estimates and eight studies provided low credibility estimates.37 39 42 43 47 48 54 55 Studies with high credibility reported MID estimates for Constant score, Simple Shoulder Test (SST), Pain Visual Analogue Scale (VAS), DASH, OSS and Short Form Health Survey 12 (SF-12) (table 1). Studies provided low credibility MID estimates for the Pain Numeric Rating Scale, Quick DASH, Neer score and EuroQol 5 dimensions 3 levels (EQ-5D-3L) (table 1). No study estimated MIDs for the following instruments in shoulder or upper extremity conditions: PainDETECT Numerical Rating Scale (0–10), Shoulder Disability Questionnaire (SDQ), Project on Research and Intervention in Monotonous work score, Watson-Sonnabend score, 15D, SF-36 and Hospital Anxiety and Depression Score (HADS).
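The three families of analysis methods named above can be sketched as follows. This is an illustration under stated assumptions, not the method of any included study: the function names, the anchor labels and the data are hypothetical, and the ROC cut-off is chosen by maximising Youden's J, one common criterion that the text does not specify.

```python
from statistics import mean


def mid_mean_change(changes, anchor, minimal_label="slightly better"):
    """Mean PROM change among patients whose anchor rating is the minimal-improvement category."""
    selected = [c for c, a in zip(changes, anchor) if a == minimal_label]
    return mean(selected)


def mid_mean_difference(changes, improved):
    """Mean PROM change in patients rated as changed minus that in patients rated as not changed."""
    changed = [c for c, flag in zip(changes, improved) if flag]
    unchanged = [c for c, flag in zip(changes, improved) if not flag]
    return mean(changed) - mean(unchanged)


def mid_roc(changes, improved):
    """PROM change cut-off maximising Youden's J (sensitivity + specificity - 1).

    Assumption: the choice of Youden's J is ours, made for illustration.
    """
    n_pos = sum(1 for flag in improved if flag)
    n_neg = len(improved) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both improved and non-improved patients")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(changes)):
        true_pos = sum(1 for c, f in zip(changes, improved) if f and c >= cut)
        true_neg = sum(1 for c, f in zip(changes, improved) if not f and c < cut)
        j = true_pos / n_pos + true_neg / n_neg - 1
        if j > best_j:  # strict comparison keeps the smallest cut-off among ties
            best_cut, best_j = cut, j
    return best_cut


# Made-up change scores on a 0-100 instrument with a transition-rating anchor.
changes = [0, 1, 2, 8, 9, 10, 12, 15]
anchor = [
    "unchanged", "unchanged", "unchanged",
    "slightly better", "slightly better", "slightly better",
    "much better", "much better",
]
improved = [a != "unchanged" for a in anchor]

print(mid_mean_change(changes, anchor))
print(mid_mean_difference(changes, improved))
print(mid_roc(changes, improved))
```

The three functions can give different values for the same data, which is one reason MID estimates for a single instrument may vary across studies.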
Table 4 presents median, maximum and minimum estimates of MIDs according to credibility, with the best estimates suggested to the systematic review team shaded. For the MID estimates with high credibility, MIDs for the SST (1.5–2.1) and overall pain VAS (1.4–1.6) were consistent across the two available estimates. The MIDs for the Constant score (3–16.6), DASH (4.4–25.4) and OSS (4.0–14.7) were, however, inconsistent among 6–10 estimates provided.
Available evidence permitted subgroup analyses exploring potential sources of heterogeneity only for surgical versus non-surgical interventions for the Constant score and SST and follow-up time (less than 3 months or ≥3 months) for the OSS. In no case did these differences explain the variation in the MID. Online supplementary appendix 2 provides details of the MID estimates and the results of subgroup analysis.
We identified 22 studies involving 5562 patients that reported 74 empirically estimated anchor-based MIDs for 10 candidate instruments to assess shoulder pain, function and HRQoL. The majority of studies used a global rating of change (transition rating) as the anchor and had a follow-up period of over 3 months. We identified MIDs of high credibility for pain and function outcomes and of low credibility for HRQoL. MID estimates often varied widely; we offered median estimates for the systematic review team and guideline panel. We also provided the systematic review team with the median, minimum and maximum values across the high credibility MID estimates generated from the eligible studies for the PROMs of interest. The only instance in which the variability in scores was sufficiently great that choice of one of the extremes rather than the median could substantially influence conclusions was for the Constant score.
Authors of the linked review used these MIDs (pain VAS, 0–10 scale, 1.5 units; Constant score, 0–100 scale, 8.3 units; EQ-5D, 0.07 units) to gauge the importance to patients of possible differences in Grading of Recommendations Assessment, Development and Evaluation (GRADE) evidence summaries and to dichotomise the improvements (proportions of patients achieving the MID or more); the BMJ Rapid Recommendations guideline panel used them to inform their judgements of magnitude of effect in formulating their recommendations. The systematic review informed the BMJ Rapid Recommendations panel in their development of the guideline.
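To illustrate what dichotomising improvements against an MID means, the sketch below computes the proportion of patients whose improvement reaches the MID. It assumes individual patient improvement scores are available, which may not match how the linked review derived its proportions; the function name and data are hypothetical, and only the 1.5 unit pain VAS MID comes from the text.

```python
def proportion_achieving_mid(improvements, mid):
    """Proportion of patients whose improvement is at least the MID.

    `improvements` must be expressed so that larger values mean more benefit
    (for pain, baseline score minus follow-up score).
    """
    if not improvements:
        raise ValueError("no patients supplied")
    responders = sum(1 for value in improvements if value >= mid)
    return responders / len(improvements)


# Made-up pain reductions on a 0-10 VAS for four patients.
pain_reduction = [0.5, 1.5, 2.0, 3.0]
PAIN_VAS_MID = 1.5  # units on the 0-10 scale, as used by the linked review

print(proportion_achieving_mid(pain_reduction, PAIN_VAS_MID))
```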
Strengths of our review include a comprehensive search for anchor-based MIDs for instruments commonly used in RCTs of shoulder conditions conducted without restrictions of study design or language of publication. We undertook judgements of MID credibility using a formal instrument with demonstrated reliability. Most studies (n=14) provided highly credible estimates. These MIDs not only can help clinicians, patients and clinical practice guideline developers interpret the magnitude of effects of interventions on PROMs, they also can be used in power calculations in future trials on shoulder conditions.
For the credibility assessment, we found that the anchor instrument directly addressed the patient’s perspective, and judged the understandability of the anchor instrument for patients as ‘definitely yes’ or ‘to a great extent’, for all the MID estimates. Approximately half of the estimates did not report the correlation between the anchor and the PROM. We judged the precision of the MID estimation and the threshold or difference between groups on the anchor used to estimate the MID as ‘definitely no’ or ‘not so much’ for most MID estimates.
The results of our systematic review have limitations. The range of reported MIDs was wide for some of the PROMs (eg, 0.3–30 for Constant score; 4.4–25.41 for DASH). Baseline characteristics (participants’ disease/conditions, sample size, PROMs or instruments), anchors and analytic methods varied among included studies; though others have detected associations between methodological approaches and MIDs,56 our attempts to establish a clear relation between these variables and the MIDs were not successful. For some instruments used in RCTs of surgery for shoulder pain—the SDQ, SF-36 and 15D—we did not find any study estimating MIDs in our target patient population. For others, MIDs for shoulder conditions closely related to subacromial syndrome, or for shoulder conditions at all, were not available, and we, therefore, relied on estimates from any upper extremity problem population. With respect to the assessment of credibility, a formal assessment of the validity of the instrument has not been undertaken. Moreover, one might challenge our judgement in inferring high credibility if three or more criteria were met. Finally, investigators used different methods to relate the anchor to a transition rating; the optimal approach remains uncertain.56 57
Our results are consistent with previous studies.23–25 A previous review of MIDs of upper extremity instruments that appeared in selected orthopaedic journals from 2014 to 2016 found a wide range of MIDs for the Constant score (8–36) and reported a pain VAS MID of 1.4 on a 10-point scale.26 Reviews of pain VAS MIDs in shoulder injuries found a range of 0.5–3.0.24 36 46 47 A review of pain ratings in a wide variety of conditions reported VAS MIDs of 0.1–8.2 and noted that absolute MIDs are higher in patients with more pain at baseline.27 Only one study included in our review reported MID estimates separately according to the baseline severity,48 but these estimates had low credibility due to problems in the anchor selected and failure to report the correlation between the anchor and the instrument. Two other reviews of shoulder instrument MIDs, primarily from rotator cuff injuries, reported MID values of 10.2–20 for DASH and 4.0–13.4 for OSS.23 25 Participants’ disease/conditions, baseline scale score and inappropriate analytic methods can cause serious bias in determining MIDs56 58; researchers should pay more attention to these factors in MID estimation studies.
Our review provides anchor-based MID estimates, as well as a rating of their credibility, for PROMs addressing patients with shoulder conditions. The review identified methodological limitations of the primary studies; future studies should strive for high precision of MID estimation, seek to identify differences between groups and the reasons for those differences, and report correlations between the anchor and the PROM.56 58
The MID estimates inform the interpretation for a linked systematic review and guideline on arthroscopy for shoulder pain. Researchers addressing a wide variety of shoulder conditions can in future make use of our summary MIDs to inform sample size and aid in interpretation of results.
The authors thank Rachel Couban, librarian at McMaster University, for testing the search strategy and members of the Rapid Recommendations panel (especially Clare Ardern, physiotherapist; Teemu Karjalainen, orthopaedic surgeon; Lyubov Lytvyn, patient partnership liaison; and Rudolf Poolman, orthopaedic surgeon) for critical feedback on the inclusion of relevant PROMs and for their review of this manuscript. The abridged version of the MID credibility tool used in this study is derived from a MID credibility tool that was made available to the authors under license from McMaster University.
Patient consent for publication Not required.
Contributors GG and RACS conceived the study idea; QH designed the search strategy; QH, YW and DZ screened studies for eligibility; QH, TD, YW, DZ, RACS and AQ extracted data and assessed the credibility; QH wrote the first draft of the manuscript; GG, TD, POV, TL, TA, ACL and RACS interpreted the data analysis and critically revised the manuscript. QH is the guarantor.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No additional data are available.