Introduction

Patient-reported outcomes (PROs) are often the outcomes of greatest importance to patients. The minimally important difference (MID) provides a measure of the smallest change in the PRO that patients perceive as important. An anchor-based approach is the most appropriate method for MID determination. No study or database currently exists that provides all anchor-based MIDs associated with PRO instruments; nor are there any accepted standards for appraising the credibility of MID estimates. Our objectives are to complete a systematic survey of the literature to collect and characterise published anchor-based MIDs associated with PRO instruments used in evaluating the effects of interventions on chronic medical and psychiatric conditions and to assess their credibility.
Methods and analysis

We will search MEDLINE, EMBASE and PsycINFO (1989 to present) to identify studies addressing methods to estimate anchor-based MIDs of target PRO instruments or reporting empirical ascertainment of anchor-based MIDs. Teams of two reviewers will screen titles and abstracts, review full texts of citations, and extract relevant data. On the basis of findings from studies addressing methods to estimate anchor-based MIDs, we will summarise the available methods and develop an instrument addressing the credibility of empirically ascertained MIDs. We will evaluate the credibility of all studies reporting on the empirical ascertainment of anchor-based MIDs using the credibility instrument, and assess the instrument's inter-rater reliability. We will separately present reports for adult and paediatric populations.
Ethics and dissemination

No research ethics approval was required as we will be using aggregate data from published studies. Our work will summarise anchor-based methods available to establish MIDs, provide an instrument to assess the credibility of available MIDs, determine the reliability of that instrument, and provide a comprehensive compendium of published anchor-based MIDs associated with PRO instruments which will help improve the interpretability of outcome effects in systematic reviews and practice guidelines.
- Minimally Important Difference
- Patient Reported Outcome
- Systematic Survey
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
For decades, longevity and major morbid events (eg, stroke, myocardial infarction) have been a primary focus in health research. Increasingly, investigators and clinicians have acknowledged the critical role of disease- and treatment-related symptoms, function and perceptions of well-being in informed clinical decision-making. Typically measured by direct patient inquiry, these outcomes, previously referred to as ‘quality of life’ or ‘health-related quality of life’ measures, are now most commonly called patient-reported outcomes (PROs). PROs provide patients’ perspectives on treatment benefits and harms, and are often the outcomes of greatest importance to patients.
The PRO literature has grown exponentially over the past three decades (figure 1), with several instruments (eg, Short-Form-36,1 Beck Depression Inventory2) in routine use in research and clinical practice. The number of clinical trials evaluating the impact of interventions on PROs has also steadily increased (figure 2), and PROs are increasingly considered in practice guidelines (figure 3).
Although PROs are often measured as primary outcomes in clinical trials, challenges remain in their application. In particular, although evidence supporting reliability, validity and responsiveness exists for many PRO instruments, interpreting their results remains difficult.
Interpretability refers to understanding which changes in instrument scores constitute trivial, small but important, moderate or large differences in effect. For instance, if a treatment improves a PRO score by three points relative to control, what are we to conclude? Is the treatment effect large, warranting widespread dissemination in clinical practice, or is it trivial, suggesting that the new treatment should be abandoned? Recognition of this potentially serious limitation has led to increasing interest in the interpretation of treatment effects on PROs.3,4
The minimally important difference (MID) provides a measure of the smallest change in the PRO of interest that patients perceive as important, either beneficial or harmful, and that would lead the patient or clinician to consider a change in management.5 Knowledge of the MID allows decision-makers to better interpret the magnitude of treatment effect and assess the trade-off between beneficial and harmful outcomes. Patients, clinicians and clinical practice guideline developers require knowledge of the MID to guide their decisions. For example, a guideline developer using the GRADE approach6 might consider the MID as a decision threshold for determining whether the quality of evidence for a given intervention should be rated down for imprecision (if the CI surrounding a pooled effect estimate includes the MID7) or is sufficiently precise (lying well above the MID threshold). The MID also provides a metric for clinical trialists planning sample sizes for their studies: trialists first calculate the proportion of patients achieving a change equal to or greater than the MID, and then determine the between-group difference in the proportion of responders that would constitute a clinically important difference. The widespread recognition of the usefulness of the MID is reflected in the exponential growth in the number of citations reporting MIDs since the concept was first introduced into the medical literature in 1989 (figure 4).4
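The sample-size use described above can be sketched as follows. This is an illustrative sketch only: the standard two-proportion sample-size formula and all numbers below are our own assumptions, not taken from the protocol.

```python
# Hedged sketch: sizing a trial from responder proportions, where a
# "responder" is a patient whose PRO change meets or exceeds the MID.
# The formula is the conventional two-proportion sample-size calculation;
# the proportions below are hypothetical.
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Sample size per arm to detect a difference in responder proportions."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided alpha
    z_b = NormalDist().inv_cdf(power)           # power = 1 - beta
    p_bar = (p1 + p2) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

# e.g. 40% of control vs 55% of treated patients improve by >= the MID
print(n_per_group(0.40, 0.55))
```

As expected, a larger hypothesised difference in responder proportions yields a smaller required sample per arm.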
There are two primary approaches for estimating an MID: distribution-based and anchor-based methods. Distribution-based methods rely on the distribution around the mean scores of the measure of interest (eg, SD).8 In the anchor-based approach, investigators examine the relation between the target PRO instrument and an independent measure that is itself interpretable—the anchor. An appropriate anchor will be relevant to patients (eg, measures of symptoms, disease severity or response to treatment).9 Investigators often use global ratings of change (patients classifying themselves as unchanged, or experiencing small, moderate and large improvement or deterioration) as an anchor. It is generally agreed that the patient-reported anchor-based approach is the optimal way to determine the MID because it directly captures the patients’ preferences and values,8,10 although it can still be problematic if the credibility of the anchor is in question (eg, is the anchor itself interpretable and are responses on the anchor independent of responses on the PRO?).
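As a hypothetical illustration of the anchor-based approach using a global rating of change: one common variant (assumed here, not prescribed by the protocol) takes the MID as the mean PRO change among patients who rate themselves minimally improved on the anchor. All data below are invented.

```python
# Illustrative sketch of an anchor-based MID estimate.
# Each tuple is (change in PRO score, patient's global rating of change).
from statistics import mean

patients = [
    (0.5, "unchanged"), (2.0, "a little better"), (3.0, "a little better"),
    (1.5, "a little better"), (6.0, "much better"), (0.0, "unchanged"),
]

def anchor_based_mid(data, target="a little better"):
    """Mean PRO change among patients in the 'minimally improved' anchor category."""
    changes = [chg for chg, rating in data if rating == target]
    return mean(changes)

print(anchor_based_mid(patients))  # mean change among the minimally improved
```

Real MID studies use much larger samples and, as the paragraph above notes, must also examine whether the anchor itself is interpretable and independent of the PRO.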
Currently, no study or database systematically documents all available anchor-based MIDs associated with PRO instruments. In addition, there are no accepted standards for appraising the credibility of an MID determination; incorrect methods, assumptions or interpretations of MIDs could undermine otherwise well-designed clinical trials, systematic reviews and guidelines. For example, systematic review authors and guideline developers might interpret trials using PROs incorrectly, and provide misleading guidance to patients and clinicians.11,12 Moreover, erroneous MIDs may lead to inappropriate sample size calculations.
Our objectives are therefore to:
Summarise the anchor-based methods that investigators have used to estimate MIDs and the criteria thus far suggested to conduct studies optimally (henceforth referred to as a ‘systematic survey addressing the anchor-based methods used to estimate MIDs’).
Develop an instrument for evaluating the credibility of MIDs that emerge from identified PRO instruments.
Document published anchor-based MIDs associated with PRO instruments used in evaluating the effects of interventions on chronic medical and psychiatric conditions in adult and paediatric populations (henceforth referred to as a ‘systematic survey of the inventory of published anchor-based MIDs’).
Apply the credibility criteria (developed in objective 2) to each of the MID estimates that emerge from our synthesis in objective 3.
Determine the reliability of our credibility instrument (developed in objective 2).
To ensure the feasibility and a manageable scope of the project and given that the primary use of PROs is in the management of chronic medical and psychiatric conditions,13 we have restricted our focus to these clinical areas.
We define an anchor-based approach as any independent assessment to which the PRO instrument is compared, irrespective of the interpretability or the quality of the anchor. We will include two types of publications: (1) methods articles addressing MID estimation using an anchor-based approach (theoretical descriptions, summaries, commentaries, critiques); and (2) original reports of studies that document the empirical development of an anchor-based MID for a particular instrument in a wide range of chronic medical and psychiatric conditions; that is, studies comparing the results of a PRO instrument to an independent standard (the ‘anchor’), irrespective of the interpretability or the quality of the anchor. From both types of publications, we will include only studies that dedicate a minimum of two paragraphs to discussing methodological issues in MID estimation. We will include adult (≥18 years of age) and paediatric (<18 years of age) populations. PROs of interest will include self-reported patient-important outcomes of health-related quality of life, functional ability, symptom severity, and measures of psychological distress and well-being.
Under our definition of a PRO, a study remains eligible if the patient is incapable of responding to the instrument and a proxy responds instead, or if the clinician completes only the anchor. For anchor-based MIDs identified in paediatric populations of children under the age of 13, we will include both patient-reported and caregiver-reported instruments.
We will exclude studies when only the clinician completes the PRO instrument (ineligible proxy), as well as studies reporting only distribution-based MIDs without an accompanying anchor-based MID.
Information sources and search
We will search MEDLINE, EMBASE and PsycINFO for studies published from 1989 to the present (the MID concept was first introduced into the medical literature in 1989).3 Search terms will include database subject headings and text words for the concepts ‘minimal important difference’, ‘minimal clinical important difference’, ‘clinically important difference’ and ‘minimal important change’, alone and in combination, and adapted for each of the chosen databases. Table 1 presents the MEDLINE search strategy. One of the co-authors (DLP) works closely with MAPI Trust, which maintains PROQOLID, a large repository of commonly used, well-validated PRO instruments.13 To supplement our search, DLP has granted us access to the PROQOLID internal library, which houses approximately 180 citations related to MID estimates. We will, in addition, search the citation lists of included studies and search narrative and systematic reviews for original studies that report an MID for a given instrument.
Teams of two reviewers will independently screen titles and abstracts to identify potentially eligible citations. To determine eligibility, the same reviewers will review the full texts of citations flagged as potentially eligible.
Data collection, items and extraction
Teams of data extractors will, independently and in duplicate, extract data using two pilot-tested data collection forms: one for the systematic survey addressing the anchor-based methods used to estimate MIDs, and the second for the systematic survey of the inventory of published anchor-based MIDs. On study initiation, we will conduct calibration exercises until sufficient agreement is achieved. Our data collection forms will include the following items: study design, description of population, interventions, outcomes, characteristics of candidate instruments (eg, generic or disease specific), characteristics of independent anchor measures, critiques/commentary on methods to estimate MID(s), and credibility criteria for method(s) to estimate MIDs.
For extraction from the studies addressing anchor-based methods used to estimate MIDs, a team of methodologists familiar with MID methods will use standard thematic analysis techniques14 to abstract concepts related to the methodological quality of MID determinations until reaching saturation. We will review coding and revise the taxonomy of methodological factors iteratively until informational redundancy is achieved. Appendix A presents the data extraction items for the systematic survey of the inventory of published anchor-based MIDs. Reviewers will resolve disagreements by discussion and, if needed, a third team member will serve as adjudicator.
Subsequently, we will synthesise and complete each objective as follows:
Objective 1: Summary of anchor-based methods used to estimate MIDs
We will classify the anchor-based approaches used to determine MIDs into separate categories, describe their methods, and summarise their advantages, disadvantages and important factors that constitute a high-quality anchor.
Objective 2: Development of a credibility instrument for studies determining MIDs
We define credibility as the extent to which the design and conduct of studies measuring MIDs are likely to have protected against misleading estimates. Similar definitions have been used for instruments measuring the credibility of other types of study designs.15,16
The systematic survey addressing methods to estimate MIDs (objective 1) will identify all the available methodologies and concepts along with their strengths and limitations, and will thus inform the item generation stage of instrument development. On the basis of the survey of the methods literature and our group's experience with methods of ascertaining MIDs,3,4,17–22 we will develop initial criteria for evaluating the credibility of anchor-based MID determinations. Our group has used these methods successfully for developing methodological quality appraisal standards across a wide range of topics.16,23–26
Using a sample of eligible studies, we will pilot the draft instrument with four target users, specifically researchers interested in the credibility of MID estimates, who will be identified within our international network of knowledge users (please see ‘Knowledge Translation’ section below). The data collected at this stage will inform item modification and reduction. This iterative process will be conducted until we achieve consensus for the final version of the instrument.
Objective 3: Systematic survey of studies generating an anchor-based MID of target PRO instruments
We will summarise MID estimates separately for paediatric and adult populations, along with study design, intervention, population characteristics, characteristics of the PRO, characteristics of the anchor and credibility ratings. If multiple MID estimates are captured for the same PRO instrument across similar clinical conditions, we will summarise all the estimates.
Objective 4: Measuring credibility of compiled MIDs
Using the instrument (the development of which we have described in objective 2), teams of two reviewers will undertake the credibility assessment for all eligible studies identified in the review. The appraisal will be performed in duplicate, using prepiloted forms. Disagreements will be resolved by discussion between the reviewers and, if needed, with a third team member. Knowledge users (eg, future systematic review authors or guideline developers) could then use their judgement to consider the credibility of the MID on a continuum from ‘highly credible’ to ‘very unlikely to be credible’.
Objective 5: Reliability study of the credibility instrument
We will conduct a reliability study of our instrument to measure the credibility of MIDs by calculating the inter-rater reliability and associated 95% CI, as measured by weighted κ with quadratic weights. We will complete reliability analyses using classical test theory, and will consider a reliability coefficient of at least 0.7 to represent ‘good’ inter-rater reliability.27–29 According to Walter et al,30 with three replicates per study (three raters), a minimally acceptable reliability of 0.6, an expected reliability of 0.7, an α of 0.05 and a β of 0.2, we would require each rater to assess a minimum of 133 studies. We will use all the studies identified in the systematic survey of estimated MIDs to calculate the reliability estimate; on the basis of initial pilot screening, we estimate that approximately 400 studies will be eligible.
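The planned reliability metric can be sketched as follows. This is a pure-Python illustration of the standard quadratic-weighted kappa formula for two raters on an ordinal scale (the protocol plans three raters and formal CI estimation; the ratings below are hypothetical).

```python
# Sketch: Cohen's kappa with quadratic weights w_ij = (i - j)^2 / (k - 1)^2,
# so disagreements are penalised in proportion to their squared distance
# on the ordinal credibility scale. Ratings are invented for illustration.
from itertools import product

def quadratic_weighted_kappa(r1, r2, n_cats):
    """Quadratic-weighted kappa for two raters; ratings coded 0..n_cats-1."""
    n = len(r1)
    # observed joint distribution of the two raters' ratings
    obs = [[0.0] * n_cats for _ in range(n_cats)]
    for a, b in zip(r1, r2):
        obs[a][b] += 1 / n
    # marginal distributions (chance-expected joint = outer product)
    p1 = [r1.count(c) / n for c in range(n_cats)]
    p2 = [r2.count(c) / n for c in range(n_cats)]
    num = den = 0.0
    for i, j in product(range(n_cats), repeat=2):
        w = (i - j) ** 2 / (n_cats - 1) ** 2
        num += w * obs[i][j]        # observed weighted disagreement
        den += w * p1[i] * p2[j]    # chance-expected weighted disagreement
    return 1 - num / den

# hypothetical credibility ratings (0 = low, 1 = moderate, 2 = high)
r1 = [0, 1, 2, 2, 1, 0, 2, 1]
r2 = [0, 1, 2, 1, 1, 0, 2, 2]
print(round(quadratic_weighted_kappa(r1, r2, 3), 3))
```

Quadratic weights are a common choice for ordinal scales because a two-category disagreement counts four times as heavily as a one-category disagreement.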
Our multidisciplinary study team is composed of knowledge users and researchers with a broad range of content and methodological expertise. We represent and provide direct links to international networks of key knowledge users including the GRADE Working Group, the Cochrane Collaboration, the WHO and the Canadian Agency for Drugs and Technologies in Health. Collectively, these organisations provide opportunities for disseminating our findings to an international network of clinical trialists, systematic review authors, guideline developers and researchers involved in the development of PROs.
Our dissemination strategies include the incorporation of our knowledge products into Cochrane reviews via the GRADEprofiler (http://www.guidelinedevelopment.org—a globally adopted platform that is widely used by our target audience for conducting systematic reviews and developing practice guidelines) and MAGICapp (a guideline and evidence summary creation tool that uses the GRADE framework). We will offer our work for relevant portions of the printed and online versions of the Cochrane Handbook, and we will develop interactive education sessions (workshops) and research briefs to inform knowledge users from various health disciplines about our findings and their implications for clinical trials, systematic reviews and guideline development.
Our systematic survey will represent the first overview of methods to develop anchor-based MIDs, and the first comprehensive compendium of published anchor-based MIDs associated with PRO instruments used in evaluating the effects of interventions on chronic medical and psychiatric conditions. Our systematic survey on methods to estimate MIDs will draw attention to the methodological issues and challenges involved in MID determinations. In doing so, we will deepen the understanding and improve the quality of reporting and use of MIDs by our target knowledge users. The methods review will inform the development of an instrument to determine the credibility of anchor-based MIDs, which will allow us to appraise existing MIDs and can subsequently be used to evaluate new studies offering anchor-based MIDs. This work will also help knowledge users identify anchor-based MIDs that may be less credible, misleading or inappropriate with respect to the average magnitude of change that is important to patients.
We recognise that some variability in MIDs will be attributable to context and patient characteristics. For example, anchor-based MIDs are established using average magnitudes of change that are considered important to patients. Without individual patient data, we will not be able to explore, for example, subgroups of patients with mild or severe disease, gender differences and the relative contributions of these factors. We will alert our knowledge users to this potential limitation.
Collectively, these efforts will promote better-informed decision-making by clinical trialists, systematic review authors, guideline developers and clinicians interpreting treatment effects on PROs.
The authors would like to thank Tamsin Adams-Webber at the Hospital for Sick Children and Mr. Paul Alexander for their assistance with developing the initial literature search.
- Data supplement 1 - Online supplement
Contributors BCJ, SE, GHG and GN conceived the study design, and AC-L, TAF, DLP, BRH and HJS contributed to the conception of the design. BCJ and SE drafted the manuscript, and all authors reviewed several drafts of the manuscript. All authors approved the final manuscript to be published.
Funding This project is funded by the Canadian Institutes of Health Research, Knowledge Synthesis grant number DC0190SR. SE is supported by an MITACS Elevate and SickKids Restracomp Postdoctoral Fellowship Awards.
Competing interests TAF has no COI with respect to the present study but has received lecture fees from Eli Lilly, Meiji, Mochida, MSD, Otsuka, Pfizer and Tanabe-Mitsubishi, and consultancy fees from Sekisui Chemicals and Takeda Science Foundation. He has received royalties from Igaku-Shoin, Seiwa-Shoten and Nihon Bunka Kagaku-sha publishers. He has received grant or research support from the Japanese Ministry of Education, Science, and Technology, the Japanese Ministry of Health, Labour and Welfare, the Japan Society for the Promotion of Science, the Japan Foundation for Neuroscience and Mental Health, Mochida and Tanabe-Mitsubishi. He is a diplomate of the Academy of Cognitive Therapy. All other authors declare no conflicts of interest.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No additional data are available.