Introduction As patient assessment of health-related quality of life (HRQOL) in cancer clinical trials has increased over the years, so has the need to attach meaningful interpretations to differences in HRQOL scores between groups and changes within groups. Determining what represents a minimally important difference (MID) in HRQOL scores is useful to clinicians, patients and researchers, and can be used as a benchmark for assessing the success of a healthcare intervention. Our objective is to provide an evidence-based protocol to determine MIDs for the European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire Core 30 (EORTC QLQ-C30). We will mainly focus on MID estimation for group-level comparisons. Responder thresholds for individual-level change will also be estimated.
Methods and analysis Data will be derived from published phase II and III EORTC trials that used the QLQ-C30 instrument, covering several cancer sites. We will use individual patient data to estimate MIDs for different cancer sites separately. The focus is on anchor-based methods. Anchors will be selected per disease site from available data. A disease-oriented and methodological panel will provide independent guidance on anchor selection. We aim to construct multiple clinical anchors per QLQ-C30 scale and to compare several anchor-based methods. The effects of covariates (for example, gender, age and disease stage) will also be investigated. We will examine how our estimated MIDs compare with previously published guidelines, hence further contributing to robust MID guidelines for the EORTC QLQ-C30.
Ethics and dissemination All patient data originate from completed clinical trials with mandatory written informed consent, approved by local ethical committees. Our findings will be presented at scientific conferences, disseminated via peer-reviewed publications and also compiled in a MID ‘blue book’ which will be made available online on the EORTC Quality of Life Group website as a free guideline document.
- eortc-qlq C30
- anchor-based methods
- minimal important difference (mid)
- cancer clinical trials
- responder threshold
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
Several anchor-based methods will be applied and compared.
Multiple clinical anchors will be constructed per QLQ-C30 scale.
A library of minimally important differences (MIDs) will be established for the European Organisation for Research and Treatment of Cancer QLQ-C30 across various patient populations, according to cancer site.
This study will supplement previously published MID guidelines, hence establishing more robust MID guidelines.
Anchor-based MIDs can only be estimated for QLQ-C30 scales for which a suitable anchor is available in our database.
Patient assessment of health-related quality of life (HRQOL) in cancer clinical trials has increased over the years.1 Consequently, there is greater need to attach meaningful interpretations to aggregated HRQOL scores, whether differences in HRQOL scores between groups or within-patient changes in HRQOL over time. Determining what represents a minimally important difference (MID) in HRQOL scores is useful to clinicians, patients and researchers, and can be used as a benchmark for assessing the success of a healthcare intervention (eg, a new treatment) or the design of future clinical trials (eg, determining sample sizes).
MID has been defined as: ‘the smallest difference in score in the outcome of interest that informed patients or informed proxies perceive as important, either beneficial or harmful, and which would lead the patient or clinician to consider a change in the management’ (Schünemann and Guyatt, p594).2 It is important to note that there is a wide range of terminology for ‘clinical meaningful change’ in the literature. Notable distinctions in terminology have been made when referring to either group-level difference/change or individual-level change. A valuable and comprehensive critique on this topic and relevant references is given by King.3 In this manuscript, we shall use the term ‘MID’ to refer to group-level thresholds and ‘responder threshold (RT)’ to refer to individual-level change. This project will mainly focus on MID estimation and will make distinctions between (1) group-level difference: cross-sectional differences in HRQOL scores between clinically-defined groups at a given time point, (2) group-level change: change in HRQOL scores within a group over time and (3) individual-level change: within-patient change in HRQOL scores over time. MIDs that are based on (1) and (2) are useful for interpreting group-based trial results, while RTs for individual-level change can be useful in trials as thresholds to define ‘responders’, that is, patients who improved (or conversely patients who deteriorated) by a certain amount.
There are two broad methods for estimating MIDs/RTs: the anchor-based and the distribution-based methods. The anchor-based approach has received much attention in the literature.4–10 This approach expresses differences or change in HRQOL scores by linking particular HRQOL domains either to known variables, which have clinical relevance, or to patient-derived/physician-derived ratings of change in the particular domain.2 4 10 11 In this approach, it is crucial to evaluate the appropriateness of anchors. The usefulness of the estimated threshold will depend on the anchor selected, how adjacent groups are defined within that anchor, and the strength of the relationship (conceptually and empirically) between the anchor and the target HRQOL domain.3 It is worth noting that the estimated thresholds will depend on a range of factors, including the instrument, patient population, selected anchors and the methods used. Hence a global rule for MIDs/RTs applicable to all situations is highly unlikely.11 12 It is recommended that thresholds be estimated by applying several anchor-based methods and using several types of anchors, and then triangulating on a single value or small range of values.11 Also, the literature does not clearly distinguish between the methods for estimating group-level versus individual-level thresholds. This study offers an opportunity to compare results across several anchor-based methods.
Distribution-based methods, on the other hand, rely solely on the statistical distribution of HRQOL scores (they do not consider the patients’ or clinicians’ perspective),13 14 and have been recommended for use as supportive evidence for anchor-based estimates.11
In this project, we focus on the anchor-based approach, particularly in a setting where both the anchors and HRQOL scores are collected longitudinally. The data will be derived from published phase II and III European Organisation for Research and Treatment of Cancer (EORTC) trials which assessed HRQOL using the EORTC core HRQOL questionnaire, QLQ-C30. The aim of the project is to provide an evidence-based approach to determine MIDs for HRQOL scores of the EORTC QLQ-C30. Specifically, the appropriateness of particular clinical anchors in determining MIDs will be empirically evaluated. In addition, a library of MIDs will be established for the EORTC QLQ-C30 across various patient populations, according to cancer site (melanoma, lung, brain, etc) as well as stage of disease.
Osoba et al 4 provided recommendations for small (5–10 points), moderate (10–20 points) and large changes (>20) for interpreting HRQOL scores of the EORTC QLQ-C30. This was based on individual data from patients with breast and small-cell lung cancers and included four of the EORTC QLQ-C30 scales (physical, emotional, social and global health). A global patient rating of change was the anchor. Similar findings were reported by King8 based on comparing group differences, from multiple cancer sites, using published study results. More recent guidelines by Cocks et al 15 16 using anchor-based methods highlighted that previous guidelines may be too simplistic in that they do not differentiate between the QLQ-C30 scales as well as between direction of change (improvement vs deterioration). These evidence-based guidelines further recommended using the lower bound as a minimal relevant threshold, arguing that large effect sizes (ES) were not always realistically achievable in all settings.
In contrast to Osoba et al, 4 this project will use multiple clinical anchors using clinical variables tailored to the specific cancer disease sites that are available in our database. The guidelines of King8 and Cocks et al 15 16 were based on meta-analyses of published studies, pooling across cancer sites, whereas we will use individual patient data to estimate MIDs for different cancer sites separately. Therefore, this project presents an opportunity to add to previously published MID guidelines for the EORTC QLQ-C30 scales, for example,4 6–8 15 16 and compare these to estimated MIDs from our study.
Methods and analysis
Datasets and definition
Databases used for the analysis
The data will mainly be extracted from published phase II and III EORTC clinical trials. We will include only studies that collected HRQOL data at baseline and follow-up using the EORTC QLQ-C30 and supplementary EORTC questionnaire modules. Cancer types include melanoma, lung, colorectal, brain, head and neck, prostate, breast, testis, ovarian, pancreas and oesophageal cancer. Data from more recent EORTC studies, completed during this project, will also be included as well as data from non-EORTC clinical trials when available.
Data will be pooled within each cancer site separately using study time (defined as days since randomisation) as the common temporal scale per patient. MIDs will be established per cancer site, with attention to robustness across the different subpopulations.
The EORTC QLQ-C30
The focus of the analysis is on the EORTC QLQ-C30, a self-administered questionnaire designed for use in cancer clinical trials. The EORTC QLQ-C30 comprises 30 items, 24 of which are aggregated into nine multi-item scales, that is, five functioning scales (physical, role, cognitive, emotional and social), three symptom scales (fatigue, pain and nausea/vomiting) and one global health status scale. The remaining six items form single-item scales: five assess symptoms (dyspnoea, appetite loss, sleep disturbance, constipation and diarrhoea) and one assesses financial impact. The financial impact scale will be omitted from the analysis because suitable anchors are unlikely to exist.
Scoring of the EORTC QLQ-C30 scales will follow the standard procedures (see EORTC QLQ-C30 Scoring Manual17). For consistency in signs of the change scores across the various EORTC QLQ-C30 scales, the symptom scores will be reversed to follow the functioning scales interpretation, that is, all scales will be scored such that 0 represents the worst possible score and 100 the best possible score. All versions of the EORTC QLQ-C30 will be used.17
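The scoring and reversal step can be sketched as follows. This is a minimal illustration, assuming the usual linear transformation of the scoring manual (raw score equal to the mean of the item responses, with an item range of 3 for four-point items); the function names are hypothetical and the manual's rules for missing items are deliberately left out.

```python
def raw_score(items):
    """Mean of the item responses that make up one scale."""
    return sum(items) / len(items)

def functioning_score(items, item_range=3):
    """0-100 score in which a higher value means better functioning."""
    return (1 - (raw_score(items) - 1) / item_range) * 100

def symptom_score(items, item_range=3):
    """0-100 score in the standard direction: higher means more symptoms."""
    return (raw_score(items) - 1) / item_range * 100

def reversed_symptom_score(items, item_range=3):
    """Symptom score flipped so that 0 is the worst and 100 the best score."""
    return 100 - symptom_score(items, item_range)
```

With this convention a positive change score means improvement on every scale, which is what allows the same sign interpretation for functioning and symptom scales.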
We hope to identify at least one suitable clinical anchor for each EORTC QLQ-C30 scale from among potential clinical factors (eg, laboratory measures, physiological measures and clinician ratings) that are available in the databases. No patient ratings of change (eg, subjective significance questionnaires) are available in the database. HRQOL scores will only be considered as anchors if valid MIDs are known. Since the QLQ-C30 yields 15 scales measuring a wide range of symptoms and functioning, the suitability of an anchor must be considered relative to specific HRQOL domains. A suitable anchor for any particular QLQ-C30 scale should fulfil several criteria. Most notably the anchor should be relevant for the disease indication, should have clear medical interpretation and clinicians should be familiar with it. Also, there should be a conceptual and empirical relationship between the anchor and its patient-reported counterpart.11
Anchors will be selected per cancer site. This exercise will be guided by a panel of five to six clinical experts (per disease site) who are familiar with the specific trials, as well as with the structure of the EORTC QLQ-C30. These experts will primarily be recruited from the EORTC QOL group and from the panel of investigators involved in the included studies. The clinical experts will be briefed on the purpose of the project and the importance of selecting anchors that are clinically related to the corresponding HRQOL scales.
Clinical anchors will be preselected based on availability (ie, the total that can be successfully matched to existing QLQ-C30 assessments), strength of correlation with the corresponding EORTC QLQ-C30 scale and finally clinical plausibility. A clinical anchor will be matched to an EORTC QLQ-C30 form if their respective assessment dates are within a predefined window. This time window will be determined on a per trial basis to ensure that the underlying true associations in the data are preserved. First, a candidate list of relevant clinical variables will be assembled based on the availability within each disease site. The acceptable compliance rate (ie, availability of complete information on both the anchor and the HRQOL scale) will depend on both relative and absolute available numbers. We aim for compliance rates ≥50% and an effective sample size of at least 200 patients with repeated observations after pooling data for each cancer site separately. Thereafter, we will evaluate how well the anchors correlate with the corresponding QLQ-C30 scale at various time points of interest. Either a Spearman’s rank, polyserial or polychoric correlation will be used, depending on the distribution of the pair of variables. The correlation between their change scores will also be checked. Revicki et al 11 suggested a correlation of ≥|0.30| as a measure of an acceptable association. Where achievable, however, anchors with much stronger correlations will be prioritised as suggested by recent simulation studies.18 The list of retained anchors will be independently scrutinised for clinical relevance by the clinical experts, who will help to define clinically relevant cut-off points in the anchor. Multiple anchors will be constructed for each QLQ-C30 scale where possible. If no suitable anchors can be identified for a given scale, no anchor-based MID will be estimated and reported for that scale.
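The correlation screen can be illustrated with a small sketch. It assumes Spearman's rank correlation (one of the three options named above) computed as the Pearson correlation of average ranks, and applies the 0.30 threshold to the absolute value; the function names and the example data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

def anchor_is_acceptable(anchor, hrqol, threshold=0.30):
    """True when the absolute rank correlation reaches the threshold."""
    return abs(spearman(anchor, hrqol)) >= threshold

# Hypothetical matched pairs: an ordinal clinician rating and a scale score.
rating = [0, 1, 2, 3, 4]
scale_score = [100, 80, 85, 40, 20]
print(anchor_is_acceptable(rating, scale_score))
```

In practice a statistical package would normally be used for this step; the hand-written version only makes the ranking and thresholding explicit.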
Availability of anchors
When an anchor is only available for a subset of trials, only that subset will be used. A table will be constructed to summarise the availability of each anchor in the set of trials, and the QLQ-C30 scales to which each anchor is related (conceptually, clinically and empirically). For each anchor, we will present how important change will be defined (as prescribed by our panel of clinical experts), along with the estimated correlation with the corresponding QLQ-C30 scale.
Descriptive tabulation of the distribution of anchors and the EORTC QLQ-C30 scales will be made by trial, and pooled across trials. If insufficient variation is present or missing data are substantial in any anchor or scale, its inclusion in further analyses will be re-evaluated.
As a first step to establish the validity of an anchor, correlations between the anchors and their corresponding QLQ-C30 scales will be calculated using all matched anchor/HRQOL scale pairs, regardless of time point. Scatter plots of the correlations will be inspected to gain greater understanding of bivariate distributions. The correlations will be calculated taking potential confounding factors into account (eg, treatment, gender, age, disease stage, country, trial, etc), to investigate the robustness of the associations in the overall population. Anchor/HRQOL scale pairs that fail to correlate at least 0.30 in at least one subgroup will be excluded from further consideration. Subgroups with associations <0.30 may be excluded from further analysis, after discussion with the clinical experts. Similarly, we will investigate the correlation between change scores of the anchor and HRQOL scale over time. Priority will be given to anchor/HRQOL scale pairs with correlations of at least 0.30 when MIDs for change scores are to be calculated.
The HRQOL score will be presented descriptively (eg, mean, median, range and SD) at every time point of interest, within various subgroups (eg, treatment, gender, age group, disease stage, country, trial, etc), as well as in the overall population.
Handling of missing data
Missing HRQOL data
We will cross-check compliance with the protocol schedule and verify the reasons for missing data. A cross-tabulation of the clinical anchors with HRQOL compliance will be made. We will evaluate the proportion of missing HRQOL forms per category of the anchor and also check if subjects with missing HRQOL forms differ systematically from those with complete HRQOL data. If systematic differences are found, a panel of methodological experts will be consulted to suggest appropriate sensitivity analyses (eg, imputation techniques) to check the robustness of the estimated thresholds.
Missing clinical anchor data
Clinical anchors will be selected in such a way that missing data is minimised. For each EORTC QLQ-C30 scale, the subset of anchors with the least amount of missing data will be prioritised. Similar to the handling of missing HRQOL data, we will also explore the anchor data to identify patterns as well as reasons for missingness.
Cross-sectional analysis of HRQOL scores
Cross-sectional differences (ie, at the same time point) of HRQOL scores will be calculated between distinct subgroups of patients, where the grouping has been done on the clinical anchor. The categorisation based on the clinical anchor is expected to yield groups that are distinct in health state, as this property is part of the clinical anchor building and evaluation process. For each HRQOL scale, the difference in mean HRQOL between each pair of adjacent group categories will be calculated at specific time points of interest, for example, at baseline, at the end of treatment and at the end of follow-up. In addition, we will calculate the ES for each pair of adjacent groups by dividing the difference in mean HRQOL scores between the two groups by the between-patient SD of those groups.8
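A minimal sketch of the cross-sectional calculation for one pair of adjacent anchor groups is given below. The text does not specify how the SD of the two groups is combined; the sketch assumes the conventional pooled SD, and the function names are hypothetical.

```python
from statistics import mean, variance

def pooled_sd(group_a, group_b):
    """Pooled between-patient SD of two groups (assumed choice of SD)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return pooled_var ** 0.5

def cross_sectional_difference(group_a, group_b):
    """Return (difference in mean HRQOL, effect size) for two adjacent groups."""
    diff = mean(group_a) - mean(group_b)
    return diff, diff / pooled_sd(group_a, group_b)
```

The same function would be applied at each time point of interest and for each pair of adjacent anchor categories.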
Anchor-based method for change scores
The focus will be on examining both group-level and individual-level change over time. We will compute all possible pairwise time point differences in HRQOL scores and combine the data. This means that a subject can contribute multiple change scores that are calculated across different pairs of time points, and the resulting dependency within the data will be accounted for whenever a regression model is applied. We will consider specific time intervals, namely changes in HRQOL scales in the periods between start and end of treatment, and between end of treatment and end of follow-up as these are often well defined across several studies. Furthermore, depending on the study design and setting, we will consider additional shorter time intervals prior to the end of treatment where feasible. Subjects will be assigned to distinct subgroups reflecting various levels of change (eg, no change, small positive changes, large positive changes, small negative changes or large negative changes) based on the clinical anchor(s). These groups will be referred to as clinical change groups (CCG) and they are mutually exclusive. For each pair of time points and for a given anchor, a patient can thus belong to only one CCG category.
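The construction of all pairwise change scores for one patient can be sketched as follows. The data layout (a mapping from study day to scale score) and the function name are assumptions made for illustration.

```python
from itertools import combinations

def pairwise_changes(scores_by_day):
    """Return (earlier day, later day, later score minus earlier score)
    for every pair of assessments of one patient."""
    days = sorted(scores_by_day)
    return [
        (d1, d2, scores_by_day[d2] - scores_by_day[d1])
        for d1, d2 in combinations(days, 2)
    ]

# Hypothetical patient assessed at baseline, day 90 and day 180.
print(pairwise_changes({0: 50, 90: 60, 180: 45}))
```

A patient with three assessments contributes three change scores, which is the source of the within-patient dependency that any regression model has to account for.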
Change in HRQOL score between two time points is commonly expressed as a simple difference. We will explore other ways to express this change, for example, using relative differences that correct for the scores at baseline or another previous time point. Table 1 presents a list of alternative summary scores for expressing change in HRQOL scores that will be explored. For each CCG, the summary scores will be presented descriptively (eg, mean, median, range and SD). Differences in HRQOL summary scores between adjacent CCGs will be evaluated using primarily non-parametric techniques.
Mean change method: For a given HRQOL scale and its corresponding anchor, the MID for improvement is equal to the mean summary score of the ‘small positive change’ CCG and the MID for deterioration is equal to the mean summary score of the ‘small negative change’ CCG. The mean summary scores of the ‘small change’ CCGs and that of the ‘no change’ CCG will be compared. If the mean summary score for the ‘no change’ CCG is similar to that of either of the two ‘small change’ CCGs, the estimated MID is doubtful.19
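The mean change method amounts to taking group means of the summary score within each CCG, as in this sketch. The dictionary keys and the function name are hypothetical labels chosen for the example.

```python
from statistics import mean

def mean_change_mids(changes_by_ccg):
    """Mean summary score in the two 'small change' CCGs, plus the
    'no change' mean used to judge whether the MIDs are credible."""
    return {
        "improvement": mean(changes_by_ccg["small positive change"]),
        "deterioration": mean(changes_by_ccg["small negative change"]),
        "no_change": mean(changes_by_ccg["no change"]),
    }
```

If the value reported under "no_change" is close to either MID, the corresponding estimate would be treated as doubtful.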
Linear regression: The estimate of the numerical change in HRQOL summary scores (see table 1) that is associated with the transition between adjacent CCG categories will be determined using a linear regression. Separate models will be fitted for improving and deteriorating scores based on the anchor. The outcome variable is the summary score, and the covariate is a binary anchor variable; coded as ‘no change’=0 and ‘small positive change’=1 for model on improvement, and ‘no change’=0 and ‘small negative change’=1 for model on deterioration. The resulting β’s (ie, slope parameters) correspond to the MIDs for improvement and deterioration respectively. This approach can be extended to correct for other covariates that could possibly affect the MID estimates.20
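With a single binary anchor covariate, the regression slope reduces to the difference in mean summary score between the two CCGs, which the following sketch makes explicit. It is a simplified ordinary least squares fit with hypothetical names; it does not handle additional covariates or the dependency between repeated change scores from the same patient, both of which a full analysis would need to address.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical improvement model: 0 = 'no change', 1 = 'small positive change'.
anchor = [0, 0, 0, 1, 1, 1]
change = [-1, 0, 1, 8, 10, 12]
slope, intercept = ols_slope_intercept(anchor, change)
print(slope, intercept)
```

Here the slope plays the role of the MID for improvement and the intercept is the mean change in the ‘no change’ CCG.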
Receiver operating characteristic (ROC) curves: For each summary score, the ROC analysis will be used to estimate RTs based on an anchor. Changes in different directions will be examined separately. For example, for defining improvement, we will create an ‘at least minimally important change’ group using all CCGs for improvements, that is, small positive and large positive CCGs, and a ‘no minimally important change’ group using the no change CCG and any level of worsening (ie, small negative and large negative CCGs). Different approaches will be used to calculate threshold values, for example, by: (1) minimising the gap between sensitivity and specificity, (2) minimising the sum of 1-sensitivity and 1-specificity and (3) minimising the sum of squares of 1-sensitivity and 1-specificity.21 The various estimates will be compared and triangulation considered in order to establish robust guidelines. The assurance with which an estimated threshold can be used will depend on its corresponding sensitivity and specificity values. It is commonly not recommended to apply thresholds to individual patients when sensitivity and specificity are less than 75%.22
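The three threshold criteria can be compared on a toy example, as sketched below for the improvement direction. The classification rule (a change at or above the cut-off counts as improved), the function names and the data are assumptions for illustration; ties between cut-offs are resolved in favour of the smallest one.

```python
def roc_points(changes, improved):
    """For each observed change value c, return (c, sensitivity, specificity)
    when 'change >= c' is used to classify a patient as improved."""
    n_pos = sum(improved)
    n_neg = len(improved) - n_pos
    points = []
    for c in sorted(set(changes)):
        tp = sum(1 for x, y in zip(changes, improved) if y and x >= c)
        tn = sum(1 for x, y in zip(changes, improved) if not y and x < c)
        points.append((c, tp / n_pos, tn / n_neg))
    return points

def candidate_thresholds(points):
    """Cut-off chosen by each of the three criteria."""
    return {
        "min_gap": min(points, key=lambda p: abs(p[1] - p[2]))[0],
        "min_sum": min(points, key=lambda p: (1 - p[1]) + (1 - p[2]))[0],
        "min_sum_sq": min(points, key=lambda p: (1 - p[1]) ** 2 + (1 - p[2]) ** 2)[0],
    }

# Hypothetical change scores; 1 = 'at least minimally important change' per anchor.
changes = [-10, -5, 0, 5, 10, 5, 10, 15, 20]
improved = [0, 0, 0, 0, 0, 1, 1, 1, 1]
print(candidate_thresholds(roc_points(changes, improved)))
```

Even in this small example the criteria do not all select the same cut-off, which is why the estimates are compared and triangulated rather than taken from a single rule.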
Empirical cumulative distribution function: For each possible value of a given summary score (see table 1) expressing change in the EORTC QLQ-C30 domains over time, the percentage of patients achieving at least that amount of change will be plotted separately for each CCG, and also separately for improving and deteriorating scores. The benefit of this approach is that the separation between CCGs may be visually compared across all values of the summary scores, thus offering a range of possible RTs for clinical relevance that can be considered simultaneously.23 24
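The quantity plotted for each CCG can be computed as in the following sketch, shown for improving scores. The function name is hypothetical; for deteriorating scores the comparison would be reversed.

```python
def percent_achieving(changes, thresholds):
    """Percentage of patients in one CCG whose change score is at least
    each candidate threshold."""
    n = len(changes)
    return {t: 100 * sum(1 for c in changes if c >= t) / n for t in thresholds}
```

Plotting these percentages against the threshold, with one curve per CCG, gives the cumulative distribution display described above.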
The estimated thresholds across these methods will be compared, and the percentage of patients with improved or deteriorated HRQOL scores will be reported. Recommendation for using estimated RTs for classifying individual patients will be based on whether the probability of misclassification is low22 or whether the RT values exceed the measurement error level by comparing the thresholds to the minimum detectable change (MDC).3 13 22 25 The MDC 25 represents the smallest change that can be considered to be above the measurement error. Usually, if the MDC is greater than the RT then the measure is insufficiently precise to monitor individual patients. Furthermore, when setting RTs, especially on domains that are computed based on a single item, we will check that the RTs align with the underlying change levels of the scale scores.26
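The comparison of an RT with the MDC can be sketched as follows. The text does not give a formula for the MDC; the sketch assumes the commonly used 95% version, 1.96 × √2 × SEM, and takes the SEM as an input. The function names are hypothetical.

```python
import math

def mdc95(sem):
    """Minimum detectable change at the 95% level (assumed definition)."""
    return 1.96 * math.sqrt(2) * sem

def rt_exceeds_measurement_error(rt, sem):
    """True when the size of the responder threshold is at least the MDC."""
    return abs(rt) >= mdc95(sem)
```

An RT that fails this check would suggest that the scale is not precise enough to classify individual patients at that threshold.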
We will examine the distribution-based approaches based on the SD (standard deviation) criteria, for example, 0.2 SD, 0.3 SD, 0.5 SD and the SEM (standard error of measurement).13 The SDs and SEM will be calculated on the summary scores (see table 1) yielding MIDs corresponding to the rules above. Since this approach requires that the data are normally distributed, those summary scores that violate this assumption (based on standard testing techniques) will not be considered.
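The distribution-based criteria can be computed as in this sketch. It assumes the standard formula SEM = SD × √(1 − reliability), with the reliability coefficient supplied by the analyst; the function name and the parameter are hypothetical.

```python
import math
from statistics import stdev

def distribution_based_mids(summary_scores, reliability):
    """Candidate MIDs from SD fractions and the SEM of a summary score."""
    sd = stdev(summary_scores)
    return {
        "0.2 SD": 0.2 * sd,
        "0.3 SD": 0.3 * sd,
        "0.5 SD": 0.5 * sd,
        "SEM": sd * math.sqrt(1 - reliability),
    }
```

With a reliability of 0.75 the SEM coincides with 0.5 SD, which illustrates how closely the two criteria are tied to each other.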
ES14 will be calculated by dividing the summary scores in table 1 by the pooled SD of subjects at baseline (ie, before treatment). This will be done for any two adjacent time points at which the level of compliance is acceptable. As a variation, we will also calculate the ES between adjacent time points by using the SD of subjects at the previous time point.27
Validation and sensitivity analysis
Stability of the estimated MIDs
Characteristics, such as age, gender, disease stage, country, etc, typically influence the absolute score outcomes of many HRQOL scales.28 The stability of the estimated MIDs will therefore be investigated by including these factors (one at a time) and an interaction term with the anchor in a regression model. We will include as many sociodemographic and clinically relevant covariates as are available from the study database and that can be evaluated by the available sample size.
To perform external validation for each cancer site, we will examine external (ie, non-EORTC) studies with comparable data, subject to the availability of such data.
Handling the boundaries (floor and ceiling) effects
We will check for the proportion of patients with boundary (extreme) scores. For those patients where the later time point was a boundary score, the change over time may be incorrectly estimated by simple subtraction. The change in clinical anchor for these patients at the boundaries will be used to estimate the magnitude of the problem. The proportion of patients with a change in clinical anchor that is not reflected in the HRQOL change due to the boundary constraints would be an indication of a limiting boundary problem. As a sensitivity check, we will investigate how much the MIDs are affected if we include or exclude these patients.
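The first check, the proportion of patients at the boundaries of a 0 to 100 scale, can be sketched as follows. The function name is hypothetical.

```python
def boundary_proportions(scores, floor=0, ceiling=100):
    """Proportion of scores at the floor and at the ceiling of the scale."""
    n = len(scores)
    return {
        "floor": sum(1 for s in scores if s == floor) / n,
        "ceiling": sum(1 for s in scores if s == ceiling) / n,
    }
```

Applying it to the scores at the later time point of each change score identifies the patients to include or exclude in the sensitivity check.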
In this project, we will determine MIDs for HRQOL scores of the EORTC QLQ-C30, using empirical individual patient data. The main focus is on the anchor-based approach. We aim to construct multiple anchors per QLQ-C30 multiple-item or single-item scale and to apply and compare results from several anchor-based methods as recommended in the literature.11 19 Figure 1 presents a flow diagram summarising the key data component, the clinical anchor construction procedure and the main statistical methods which will be applied in this project. Ideally, the resulting MID estimates will triangulate to one value or a small range of values.
It is important to highlight that there are diverse opinions in the literature on whether or not it is plausible to use the same methods for interpreting individual-level change versus group-level differences/change. For instance, the mean change method and the ROC curve method have been described as appropriate for group-level and individual-level change, respectively.20 29 On the other hand, both methods have been recommended for estimating MIDs that are useful for interpreting either group-level or individual-level change as long as the anchor is available at the individual level.22 30 31 We will compare and contrast MID estimates from the different methods to provide empirical evidence, and assess whether it is possible to apply a simplified guideline to between-group differences/change and individual-level change.
A strength of our research is its integral combination of both clinical and methodological expertise. The findings will ultimately improve the interpretation of the QLQ-C30 scale scores in clinical trials by providing empirical guidelines for relevant improvements and deteriorations.
Each year, there are over 5000 newly registered downloads of the EORTC quality-of-life measures. The information from our research will be of added value to all its users (eg, pharma and academic) since a frequent issue raised by regulators and trial sponsors is an understanding of MID.
The main limitation of this project is that anchor-based MIDs can only be estimated for QLQ-C30 scales for which a suitable anchor is available in the database. Also, the available anchors rely exclusively on clinical observations or interpretations. Unfortunately, patient ratings of change (eg, subjective significance questionnaires) are not available in the study database. We will consider using other HRQOL scores as a way to include the patient’s perspective if valid MIDs are known for the given HRQOL scores.
Overall, this project will supplement previously published research by using individual patient data to estimate MIDs for different cancer sites separately, hence contributing further evidence towards robust and practical MID guidelines for the EORTC QLQ-C30.
We thank the members of the various EORTC disease groups and their clinical investigators, and all the patients who participated in the trials that we shall be analysing.
Contributors AB, CC, DEE, MTK, MG, MAGS, GV and H-HF: contributed to the conception and design of the study. MG, MAGS, YB, GV and H-HF: revised the proposed methodology for clinical plausibility. ZJM, J-FH, CC, KC, DEE, JM and MTK: provided critical input on the proposed statistical analysis. ZJM and CC: drafted the protocol. All the authors read and corrected the drafts and approved the final version.
Funding This study was funded by an unrestricted academic grant from the EORTC Quality of Life Group.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.