Thresholds for clinically important deterioration versus improvement in COPD health status: results from a randomised controlled trial in pulmonary rehabilitation and an observational study during routine clinical practice

Objectives Chronic Obstructive Pulmonary Disease (COPD) is a progressive disease. Preventing deterioration of health status is therefore an important therapy goal. (Minimal) Clinically Important Differences ((M)CIDs) are used to interpret observed changes. It remains unclear whether (M)CIDs are similar for deterioration and improvement in health status. This study investigates and compares these clinical thresholds for three widely used questionnaires.
Design and setting Data were retrospectively analysed from an in-house 3-week pulmonary rehabilitation (PR) randomised controlled trial at the German Klinik Bad Reichenhall (study 1), and from observational research in Dutch primary and secondary routine clinical practice (RCP) (study 2).
Participants Patients with COPD aged ≥18 years (study 1) and aged ≥40 years (study 2) without respiratory comorbidities were included in the analysis.
Primary outcomes The COPD Assessment Test (CAT), Clinical COPD Questionnaire (CCQ) and St George’s Respiratory Questionnaire (SGRQ) were completed at baseline and at 3, 6 and 12 months. A Global Rating of Change scale was added at follow-up. Anchor-based and distribution-based methods were used to determine clinically relevant thresholds.
Results In total, 451 patients were included from PR and 207 from RCP. MCIDs for deterioration ranged from 1.30 to 4.21 (CAT), from 0.19 to 0.66 (CCQ), and from 2.75 to 7.53 (SGRQ). MCIDs for improvement ranged from −3.78 to −1.53 (CAT), from −0.50 to −0.19 (CCQ), and from −9.20 to −2.76 (SGRQ). Thresholds for moderate improvement versus deterioration ranged from −5.02 to −3.29 vs 3.89 to 8.14 (CAT), from −0.90 to −0.72 vs 0.42 to 1.23 (CCQ), and from −15.85 to −13.63 vs 7.46 to 9.30 (SGRQ).
Conclusions MCID ranges for improvement and deterioration on the CAT, CCQ and SGRQ were somewhat similar. However, estimates for moderate and large change varied and were inconsistent. Thresholds differed between study settings.
Trial registration number Routine Inspiratory Muscle Training within COPD Rehabilitation trial: #DRKS00004609; MCID study: #UMCG201500447.


Open access

Strengths and limitations of this study
► Our study is the first dedicated investigation of (minimal) clinically important differences ((M)CIDs) for deterioration on chronic obstructive pulmonary disease (COPD) health status tools in comparison with thresholds for improvement.
► Our study used a combination of anchor-based and distribution-based methods to determine clinically relevant thresholds for both deterioration and improvement.
► Our study investigated clinically relevant thresholds in two different study settings, pulmonary rehabilitation (PR) and routine clinical practice (RCP), using data from various follow-up periods to minimise the possible impact of the recall period.
► Our study included a limited number of patients with deterioration after PR and during RCP, and a limited number of patients indicating moderate and large change in health status.
► Our study resulted in broad ranges and wide CIs for (M)CIDs of COPD health status tools, possibly requiring larger sample sizes for greater accuracy.

INTRODUCTION
The use of health status questionnaires is recommended by the Global initiative for Chronic Obstructive Lung Disease (GOLD) for the assessment, evaluation and management of patients with chronic obstructive pulmonary disease (COPD). 1 The COPD Assessment Test (CAT), 2 the Clinical COPD Questionnaire (CCQ) 3 and the St George's Respiratory Questionnaire (SGRQ) 4 are frequently used patient-reported health status tools that are important for clinical practice and scientific research, 5 especially since the burden of COPD is high worldwide. 6 7 Various studies have examined clinically relevant thresholds for change on the CAT, CCQ and SGRQ in order to evaluate and interpret treatment effects. [8][9][10][11][12][13][14][15][16][17][18] The minimal clinically important difference (MCID) is a parameter that quantifies this threshold. It has been defined as 'the smallest difference in score, which patients perceive as beneficial and which would mandate a change in the patient's management'. 19 MCIDs are particularly relevant for health status questionnaires, for which a change in score is not intuitively meaningful. Change exceeding the MCID can be considered clinically relevant, thus justifying therapy and helping to develop guidelines. It is therefore pivotal that clinically relevant thresholds for change on a health status tool are rigorously studied and carefully analysed.
Most clinical studies that determine the MCID of patient-reported outcomes (PROs) are conducted in the context of an intervention such as pharmacotherapy or pulmonary rehabilitation (PR), which usually results in an improvement in patients' health-related quality of life (HRQoL). MCIDs for improvement have therefore been investigated, but evidence on MCIDs for deterioration is lacking. 20 It remains unclear, and debated, to what extent clinically relevant thresholds for improvement should be similar to those for deterioration. [21][22][23][24] Certain studies outside the field of COPD have analysed the MCIDs of PROs and found evidence that values for improvement differed from those for deterioration. [25][26][27][28][29] On the other hand, there is also evidence that thresholds might be similar. 30 Interpreting worsening of HRQoL is of major importance, since one needs to differentiate between real worsening of a patient's status and random variation. Furthermore, therapy may also halt further deterioration, especially in a progressive chronic disease like COPD. The absence of relevant worsening, or a reduction in clinically relevant deterioration over time, might therefore also be considered a therapeutic success, including in clinical trials. 31 In COPD health status, the estimated MCID for the CAT score is 2.00-3.00 units, 11-15 20 for the CCQ score 0.40-0.50 units 8-13 20 and for the SGRQ score 4.00-8.00 units. 12 16-18 20 These estimates apply to improvement only, as there were too few patients with deterioration to investigate. There are currently no studies that specifically investigate clinically relevant thresholds for deterioration on these PROs. It is, however, worrying that, to date, multiple studies have used the MCIDs for improvement on these COPD health status instruments to interpret deterioration in clinical trials. [32][33][34] This study therefore aims to determine and compare clinically relevant thresholds for deterioration and improvement on the COPD health status questionnaires CAT, CCQ and SGRQ in both a PR and a routine clinical practice (RCP) setting.

PATIENTS AND METHODS
Study subjects
This study was a retrospective analysis of data obtained from two prospective clinical trials. Study 1 was a secondary analysis of a subsample from the Routine Inspiratory Muscle Training within COPD Rehabilitation (RIMTCORE) real-life randomised controlled trial in the Klinik Bad Reichenhall, Center for Rehabilitation, Pulmonology and Orthopedics in Germany. 12 35 Patients were recruited on arrival in the clinic between February 2013 and July 2014. Participants were included if they had COPD GOLD II-IV, were aged ≥18 years and gave informed consent. 12 35 Exclusion criteria were the presence of other respiratory comorbidities (eg, bronchiectasis, asthma, history of bronchial carcinoma, sarcoidosis, tuberculosis) or alpha-1-antitrypsin deficiency.
Study 2 (MCID study) was an observational trial of patients with COPD GOLD I-IV aged ≥40 years without other respiratory comorbidities or alpha-1-antitrypsin deficiency. Patients were recruited from Dutch primary and secondary RCP between September 2015 and September 2016. Patients were approached via multiple general practices, hospitals and the Dutch patient lung federation.

Patient and public involvement
In both studies, patients and the public were not actively involved in the design of the study or in the assessment of study burden. Summary results are disseminated to participating patients after completion.

Study design and data collection
Patients in study 1 participated in an intensive, 3-week, full-day inpatient PR programme tailored to their individual needs. Details have been presented previously. 12 35 Patient descriptives and postbronchodilator spirometry were collected in the clinic at baseline and at discharge. Patients in study 2 received routine care from their physician according to national treatment guidelines. Evaluation of health status over a 12-month period was the primary outcome. Patient descriptives and spirometry data were obtained at baseline; spirometry results were obtained from the physician who included the patient, with the participant's approval.
The primary outcomes selected from both prospective studies for this retrospective analysis were the CAT (no recall period), CCQ (weekly version) and SGRQ (monthly version). In study 1, these questionnaires were collected at baseline, at PR discharge and during follow-up at 3, 6, 9 and 12 months. Baseline and discharge measurements were taken in the clinic, where patients were blinded to their baseline scores. Follow-up questionnaires were sent by mail. In study 2, all questionnaires were sent by mail and scored at home at baseline and at 3, 6 and 12 months. For this retrospective analysis, baseline and follow-up scores at 3, 6 and 12 months were included, to allow for sufficient time for deterioration in HRQoL, to include various time periods of measurement and to allow for comparison between both study settings.
The CAT is an eight-item, one-dimensional scale with item scores ranging from 0 to 5 (0: no impairment; 5: maximum impairment) and a maximum total score of 40. 2 The CCQ consists of 10 items scored 0-6 (0: no impairment; 6: maximum impairment). 3 The items cover the domains symptoms (four items), functional status (four items) and mental status (two items). Total and domain scores on the CCQ are calculated by adding up the relevant item scores and dividing the sum by the number of items. The SGRQ has 50 items classified into the domains symptoms (8 items), activities (16 items) and impact (26 items). 4 Domain and total SGRQ scores range from 0 to 100 (0: no impairment; 100: maximum impairment). A 15-point Likert scale anchor question (Global Rating of Change, GRC) was scored retrospectively by the patient at each follow-up visit in both data sets. The GRC required patients to rate their COPD health status compared with baseline. Answers were marked on a scale from −7 to +7, ranging from very much worse to very much better, with 0 indicating no change. 36 37

Study methods
All change scores for the total scores of the CAT, CCQ and SGRQ were calculated as the difference between the respective follow-up visit (3, 6 and 12 months) and baseline (follow-up minus baseline). Negative change on all questionnaires therefore represented improvement, and positive change deterioration. First, in the anchor-based approach, changes on the health status instruments were classified using the corresponding score on the GRC. Scores of 0 and ±1 on the GRC indicated no change; scores of ±2 and ±3 represented a minimal improvement/deterioration; scores of ±4 and ±5 were summarised as a moderate improvement/deterioration; and scores of ±6 and ±7 indicated a large improvement/deterioration. 36 37 MCID estimates for both improvement and deterioration on the CAT, CCQ and SGRQ were calculated as the mean change scores, including 95% CI, of those patients indicating a minimal improvement/deterioration (±2 and ±3) on the GRC at each follow-up visit, after verifying normality of distribution. Mean estimates including 95% CI were determined in a similar way for patients indicating no change (GRC 0 and ±1), moderate change (GRC ±4 and ±5) and large change (GRC ±6 and ±7). Second, using the distribution-based method, half the SD (0.5 SD) of the change score was calculated for patients with improved and deteriorating health status at the respective follow-up visits. 38

Figure 1 Forest plot of clinically relevant thresholds for improvement and deterioration on the COPD Assessment Test. Data are presented as mean estimates (squares) including 95% CI (horizontal lines). Estimates from the half SD analysis are represented as single squares. Weighted mean estimates are presented as larger diamonds. Data are separated as minor, moderate and large improvement thresholds (left half), versus minor and moderate deterioration thresholds (right half). COPD, chronic obstructive pulmonary disease; PR, pulmonary rehabilitation; RCP, routine clinical practice. T0, baseline measurement; T3, 3-month follow-up; T6, 6-month follow-up; T12, 12-month follow-up.

Data analysis
Data analysis was performed using SPSS V.24.0. Baseline descriptives were reported as frequencies with percentages (%), mean with SD or median with range, depending on the variable characteristics and/or normality of distribution. Health status data on the CCQ, CAT and SGRQ were evaluated at baseline (T0), 3 months (T3), 6 months (T6) and 12 months (T12). Normality of distribution was verified using skewness and kurtosis; values between −1 and +1 were considered indicative of normality. Data were checked for floor and ceiling effects, defined as more than 15% of patients scoring in the lowest (floor) or highest (ceiling) 10% of the maximum scale range. 39 Mean and SD (or median and range) were calculated at each measurement moment for all patients, as well as specifically for patients with improved and deteriorated health status scores. Baseline scores were compared between improving and deteriorating patients using independent t-tests after verifying normality of distribution. Baseline scores were compared between both data sets (PR vs RCP) using independent t-tests, Mann-Whitney U tests or χ² tests, depending on the variable characteristics and/or normality of distribution. Health status change scores were all calculated relative to baseline. Follow-up scores were compared with baseline to test for significance of change using paired t-tests after verifying normality of distribution.
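The two data checks described above (skewness and kurtosis within ±1; floor and ceiling effects) can be sketched in plain Python. This is a minimal illustration, not the authors' SPSS procedure: the function names are hypothetical, and the skewness and kurtosis formulas are the bias-adjusted estimators G1 and G2 (excess kurtosis, 0 for a normal distribution), which we assume correspond to what SPSS reports.

```python
from statistics import fmean


def _central_moment(xs, k):
    """k-th central moment (population form) of a list of scores."""
    m = fmean(xs)
    return fmean([(x - m) ** k for x in xs])


def skewness(xs):
    """Bias-adjusted sample skewness G1 (requires n >= 3 and non-constant data)."""
    n = len(xs)
    g1 = _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5
    return (n * (n - 1)) ** 0.5 / (n - 2) * g1


def excess_kurtosis(xs):
    """Bias-adjusted sample excess kurtosis G2 (0 for a normal distribution; n >= 4)."""
    n = len(xs)
    g2 = _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


def roughly_normal(xs, bound=1.0):
    """Rule used in the text: skewness and kurtosis both between -1 and +1."""
    return abs(skewness(xs)) <= bound and abs(excess_kurtosis(xs)) <= bound


def floor_ceiling(scores, scale_min, scale_max, band=0.10, limit=0.15):
    """Floor/ceiling effect: more than 15% of patients in the lowest/highest 10% of the range."""
    width = band * (scale_max - scale_min)
    n = len(scores)
    floor_share = sum(s <= scale_min + width for s in scores) / n
    ceiling_share = sum(s >= scale_max - width for s in scores) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > limit,
        "ceiling_effect": ceiling_share > limit,
    }
```

With the scale ranges given in the text, the floor/ceiling check would be called as `floor_ceiling(scores, 0, 40)` for the CAT total, `(0, 6)` for the CCQ and `(0, 100)` for the SGRQ.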

In order to determine clinically relevant thresholds for change, correlations between the GRC and the CCQ, CAT and SGRQ were first assessed using Pearson or Spearman correlation coefficients, depending on normality of distribution. Correlations needed to be ≥0.30 (preferably ≥0.50) for the GRC to be eligible as an anchor. 22 Correlations were assessed between the GRC and questionnaire change scores, and between the GRC and the baseline and follow-up questionnaire scores, to check for a possible response shift. Next, participants were categorised according to their GRC score at each follow-up. Mean changes (95% CI) for each category were determined to define thresholds for clinically relevant change. For each GRC category at each follow-up visit, change from baseline was tested for significance using paired t-tests after verifying normality of the data. Finally, the 0.5 SD of the change score was determined separately for patients with improved and with deteriorating health status change scores at each follow-up. Thresholds were compared between both study settings (PR vs RCP).
Finally, an absolute overall weighted mean MCID estimate was calculated for both improvement and deterioration by multiplying the number of observations (n) at each follow-up visit by the MCID estimate for that period, summing these products and dividing the sum by the total number of observations. Anchor-based and distribution-based approaches had similar weights. Estimates for improvement and deterioration were compared visually in a plot.
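The weighting described above amounts to an observation-weighted mean, MCID_w = Σ nᵢ·MCIDᵢ / Σ nᵢ, taken over the follow-up visits and methods for one direction of change. A minimal Python sketch follows; the function name and the example inputs are hypothetical and are not values from this study.

```python
def weighted_mcid(estimates):
    """Observation-weighted mean: sum(n_i * MCID_i) / sum(n_i).

    `estimates` is a list of (mcid, n) pairs, one per follow-up visit and method
    (anchor-based or 0.5 SD), all for the same direction of change.
    """
    total_n = sum(n for _, n in estimates)
    if total_n == 0:
        raise ValueError("no observations")
    return sum(mcid * n for mcid, n in estimates) / total_n


# Hypothetical inputs for illustration only (not values from this study).
example = [(-3.0, 40), (-2.0, 60)]
print(weighted_mcid(example))  # -2.4
```

Improvement and deterioration estimates would be pooled in separate calls, so that negative and positive thresholds do not cancel out.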

Patient characteristics
Study 1 included 451 patients with completed baseline data (table 1). 12 There were no significant baseline differences between completers and non-completers of the 12-month follow-up in either study, except that significantly more women (28.4%) than men (10.0%) did not complete the follow-up during RCP. Significant differences in age, forced expiratory volume in 1 s percentage predicted (FEV1%pred) and health status were observed between the two studies (table 1).

Health status scores for improvement and deterioration
In both study 1 and study 2, CAT, CCQ and SGRQ total scores were normally distributed at baseline and follow-up. Completed pairs of change scores (follow-up vs baseline) were included (pairwise deletion). Floor and ceiling effects were negligible. Mean health status baseline scores differed significantly between PR and RCP (table 1). Overall, 58%-59% of patients had improved health status scores (negative change) at T12 after PR, compared with 44%-47% during RCP (table 2). After PR, mean changes at T12 on the CAT were −5.45±4.66 for improvers and 5.47±4.22 for patients who deteriorated; on the CCQ −0.87±0.72 for improvement and 0.83±0.62 for deterioration; and on the SGRQ −13.83±10.43 […] figure 3). These ranges were, respectively, from −4.76 to −2.76 […]

Summary of main findings
Using both anchor-based and distribution-based methods, the weighted MCIDs for improvement and deterioration on the CAT were, respectively, −2.51 vs 2.76 during PR, and −2.49 vs 1.65 during RCP. These thresholds for improvement and deterioration on the CCQ were, respectively, −0.40 vs 0.43 during PR and −0.33 vs 0.30 during RCP. MCIDs for the SGRQ were, respectively, −6.74 vs 5.31 during PR and −4.06 vs 4.78 during RCP for improvement and deterioration. Estimates for minimal clinically important improvement and deterioration were overall somewhat similar; however, absolute MCIDs differed between PR and RCP. Thresholds for moderate and large improvement and deterioration differed from each other, as well as between study settings.

Interpretation of findings
Little evidence exists on whether MCIDs for improvement are similar to those for deterioration. 21 23 40 Jaeschke et al 19 were the first to determine the MCID of a health status tool, using a 15-point GRC and combining both improved and deteriorated patients with COPD into one group of minimally changed participants. Juniper et al 37 elaborated on this by separating minimally improved from minimally deteriorated patients with asthma, but only a limited number of patients indicated deterioration and no conclusions on the MCID for deterioration were drawn. Outside the field of COPD, Crosby et al and de Vet et al 21 40 reported that some studies found a smaller MCID for improvement than for deterioration. The current study does not confirm this, although MCIDs seemed smaller for RCP patients than for PR patients. Patients experienced more change (hence larger absolute MCIDs) during the intervention, possibly as a result of treatment. In RCP, smaller changes may be noticed and regarded as relevant by the patient. It remains unclear whether the differences between PR and RCP reported here are specific to rehabilitation or a general effect of intervention. Overall, the absolute values of the MCIDs for improvement and deterioration did not differ much, with the exception of the SGRQ during PR.
The ranges found in this study for the MCID of the CAT (improvement −3.78 to −1.53; deterioration 1.30 to 4.21) matched estimates found in other studies. 11-15 20 Two studies used a patient-assessed GRC to estimate the MCID of the CAT. 14 15 However, no results were reported for worsened patients, or the numbers of such patients were too small. Other anchor-based methods suggested that a change of one point on the CAT might represent the MCID for deterioration. 14 In the current study, the weighted thresholds for minimal clinically relevant improvement (−2.51 in PR and −2.49 in RCP) seemed somewhat comparable with those for deterioration (2.76 in PR and 1.65 in RCP), except for deterioration during RCP. As the CAT allows only integer scores, 2 a change of three points seems a valid threshold for improvement and deterioration, although the MCID for deterioration in RCP could be closer to two points. Thresholds for moderate improvement (−4.23 in PR) and deterioration (7.06 in PR and 3.89 in RCP) were less similar. The number of patients with moderate deterioration was low, and differences were observed between the two study settings. Moderate change might correspond to a change in CAT score of four to seven points. Two previous studies identified a cut-off of four points for acute HRQoL deterioration in clinical practice, 41 42 which would match our estimates for moderate change. The number of patients with a large change was too low, and the CIs too wide, to enable valid conclusions.

Table 3 Correlations between health status (change) scores and the GRC
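Because the CAT total is an integer, a continuous MCID estimate has to be translated into a whole number of points. One simple rule, assumed here rather than stated in the article, is to round the absolute estimate up to the next integer. Applied to the weighted CAT estimates reported in this paragraph, it gives three points in every case except deterioration during RCP (two points), in line with the reasoning above. The names in this sketch are hypothetical.

```python
import math

# Weighted minimal CAT thresholds as reported in this article.
WEIGHTED_CAT_MCID = {
    ("improvement", "PR"): -2.51,
    ("improvement", "RCP"): -2.49,
    ("deterioration", "PR"): 2.76,
    ("deterioration", "RCP"): 1.65,
}


def whole_point_threshold(mcid):
    """Smallest whole-point CAT change at least as large as |mcid| (assumed rounding rule)."""
    return math.ceil(abs(mcid))


thresholds = {key: whole_point_threshold(value) for key, value in WEIGHTED_CAT_MCID.items()}
```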

Regarding the CCQ, the MCID ranges found for improvement (−0.50 to −0.19) and deterioration (0.19 to 0.66) overlapped in absolute terms, indicating that estimates for improvement and deterioration may be similar. However, differences were noted between PR (±0.40) and RCP (±0.30) for both minimal improvement and deterioration. These MCID estimates matched earlier evidence. [8][9][10][11][12][13] One other study used a GRC to determine the MCID of the CCQ. 8 Unfortunately, no data were available on worsening patients. Thresholds for moderate change on the CCQ were broad (±0.62 to ±1.23). Few patients experienced large changes, but estimates for both large improvement and large deterioration in both study settings were approximately one point.
Minimal thresholds for improvement (−9.20 to −2.76) and deterioration (2.75 to 7.53) on the SGRQ overlapped, although there was more variation. A change of approximately four to seven points in either direction seemed to be the minimal clinically important threshold in the current study. The MCID for improvement during PR (−6.74) was larger than that for deterioration (5.31); however, CIs for deterioration were wide. Thresholds during RCP (4-5 points) were smaller than during PR (5-7 points). Moreover, the distribution-based estimates were smaller than the anchor-based estimates, lowering the absolute weighted MCIDs. Thresholds for moderate improvement and deterioration in the current study were not very similar, ranging in absolute terms from 7.46 to 16.06 points. Estimates for clinically relevant large HRQoL improvement on the SGRQ ranged from −20 to −18 points for PR and RCP, but too few patients were included to draw valid conclusions.
The SGRQ MCID matched previous results to some extent. 12 16-18 20 Jones et al 16 18 published a threshold of four points, which is generally accepted and applied in […]

Data reported as clinically relevant threshold or n. Negative change represents improvement for all health status instruments. Paired t-tests were applied with significance level at p<0.05. Nonsignificant results were excluded, except for the 'No change' group.

Table 4 Continued
Table 5 Estimates for clinically relevant thresholds for improvement and deterioration on the CCQ

Strengths and limitations of the current study
This retrospective analysis of two prospective studies is the first to investigate clinically relevant thresholds for minimal, moderate and large changes in COPD health status, comparing improvement and deterioration using a triangulation of anchor-based and distribution-based methods. Correlations between the GRC and the respective health status questionnaires met the required minimum, 22 although they were only weak to moderate. Correlations were stronger with the follow-up score than with the baseline and/or change score, possibly due to a response shift. Another strength is that multiple follow-up visits were included to limit the possible influence of the measurement period and of recall bias on the MCID. 21 24 Moreover, this study investigated clinically relevant thresholds in both PR and RCP, improving its clinical applicability and external validity.
Although this is the first study to investigate thresholds for clinically relevant deterioration, only a limited number of patients indicated deterioration in HRQoL after PR and during RCP. This is a major limitation that lowers the statistical power of the analysis, especially since sample size calculations were not based on the separate GRC categories. Second, the thresholds found show broad ranges with wide CIs, which limits their accuracy; a larger sample size than in our current studies would be required. Third, anchor-based and distribution-based approaches each have their own relevance, being based on retrospective clinical assessments and on statistical parameters, respectively. Combining both methods to determine an instrument's MCID is recommended, 22 but the estimates differed considerably between the two methods.

Implications for future research and clinical practice
Patients with COPD tend to have worsening HRQoL over time; hence, MCIDs for deterioration have important implications for clinical practice. 44 45 Clinicians and researchers should be able to judge whether groups of patients are truly worsening over time or whether the observed change reflects random fluctuation. Preventing clinically relevant deterioration in HRQoL by means of therapy is thus an important goal too. More research is needed to validate our thresholds for clinically relevant deterioration on the CAT, CCQ and SGRQ, for instance in studies of interventions other than PR. Thresholds for improvement cannot be directly transferred to deterioration, and evidence outside the field of COPD has found differences between them. In the current study, however, the estimates turned out rather similar, while the MCIDs differed between the two study settings. Setting could thus affect the MCID, implying that the results of the current study are not necessarily valid in other settings.
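As an illustration of how the minimal-change cut-points recommended in the Conclusions of this article might be applied to a change score, here is a small Python sketch. The dictionary and function names are hypothetical; "intervention" and "RCP" are the two setting labels used in the Conclusions; change is follow-up minus baseline, so negative values mean improvement on all three instruments. MCIDs are usually interpreted at group level, so applying them to single patients is shown for illustration only. A small tolerance is used because CCQ scores are decimals and floating-point subtraction is inexact.

```python
# Minimal-change cut-points recommended in the Conclusions of this article,
# applied symmetrically to improvement and deterioration.
CUT_POINTS = {
    ("CAT", "intervention"): 3,
    ("CAT", "RCP"): 2,
    ("CCQ", "intervention"): 0.40,
    ("CCQ", "RCP"): 0.30,
    ("SGRQ", "intervention"): 6,
    ("SGRQ", "RCP"): 5,
}
EPS = 1e-9  # guards against float round-off (e.g. 1.9 - 2.3 is slightly above -0.40)


def classify_change(instrument, setting, baseline, follow_up):
    """Classify a follow-up-minus-baseline change; lower scores mean better health status."""
    change = follow_up - baseline
    cut = CUT_POINTS[(instrument, setting)]
    if change <= -cut + EPS:
        return "clinically relevant improvement"
    if change >= cut - EPS:
        return "clinically relevant deterioration"
    return "no clinically relevant change"
```

For example, a CAT rise from 14 to 16 points would count as relevant deterioration under the RCP cut-point but not under the intervention cut-point.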

CONCLUSIONS
Determining deterioration in HRQoL is important, since one needs to differentiate between real worsening of a patient's status and random variation. In this study, estimates for clinically relevant thresholds for improvement and deterioration were somewhat similar but differed between PR and RCP. We would recommend using cut-points of CAT ≥3 (intervention), CAT ≥2 (RCP), CCQ ≥0.40 (intervention), CCQ ≥0.30 (RCP), SGRQ ≥6 (intervention) and SGRQ ≥5 (RCP) for both minimal improvement and deterioration. Thresholds for moderate and large change should be explored further, but could lie at approximately 4-5 and 5-6 points, respectively, for the CAT, 0.80 and 1.00 for the CCQ, and 10-15 and 15-20 points for the SGRQ.

[…] GSK. After this study was terminated, he became an employee of GSK. None of these stated conflicts of interest are linked to the current manuscript. TvdM developed the Clinical COPD Questionnaire (CCQ) and holds the copyright.
Patient consent for publication Not required.
Ethics approval All patients in both studies signed informed consent upon participation. The RIMTCORE trial was approved by the Ethik-Kommission der Bayerischen Landesärztekammer (#12107) and registered in the German Clinical Trials Register (#DRKS00004609). The MCID study was registered at the University Medical Center Groningen (UMCG) Research Register and evaluated by its Medical Ethical Committee (#201500447).
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement The data that support the findings of this study are not publicly available. Participants in the RIMTCORE trial agreed to the availability of their data only to the Klinik Bad Reichenhall, its scientific partners in the data analysis and the Committee of the Bavarian State Chamber of Labor in Munich. Participants in the MCID study agreed to the availability of their data only to the University Medical Center Groningen (UMCG) and its scientific partners in the data analysis.
Author note This study is a secondary retrospective analysis of a subsample from the Routine Inspiratory Muscle Training within COPD Rehabilitation (RIMTCORE) real-life randomised controlled trial (#DRKS00004609) in the Klinik Bad Reichenhall, Center for Rehabilitation, Pulmonology and Orthopedics in Germany; and a primary analysis of all patients participating in the Dutch observational trial (MCID study) on COPD health status in routine clinical practice (UMCG trial #201500447).
Open access This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.