The measurement of long-term health-related quality of life after injury: comparison of EQ-5D and the health utilities index
  1. Suzanne Polinder1,
  2. Juanita A Haagsma1,
  3. Gouke Bonsel1,
  4. Marie-Louise Essink-Bot2,
  5. Hidde Toet3,
  6. Ed F van Beeck1
  1. Department of Public Health, ErasmusMC/University Medical Center, Rotterdam, The Netherlands
  2. Department of Social Medicine, Academic Medical Centre, University of Amsterdam, Amsterdam, The Netherlands
  3. Consumer Safety Institute, Amsterdam, The Netherlands

Correspondence to Dr Suzanne Polinder, Department of Public Health, Erasmus MC University Medical Center Rotterdam, PO Box 2040, 3000 CA Rotterdam, The Netherlands; s.polinder@erasmusmc.nl

Abstract

Objective Empirical head-to-head comparison of the health utilities index (HUI) mark 2 and 3 and the EuroQol-5D (EQ-5D) in injury patients of all severity levels to obtain more insight into the strengths and limitations of these multi-attribute utility instruments (MAUI) for estimating utility losses in injury populations.

Design A self-assessment survey that included the EQ-5D, HUI2 and HUI3 to measure generic health-related quality of life.

Patients Injury patients in The Netherlands 2 years after they attended the emergency department.

Main Outcome Measures Shannon's index and Shannon's evenness index were used to assess absolute and relative informativity, both for the summary scores and by dimension. The study also analysed convergent and construct validity of the MAUI.

Results Mean summary scores significantly differed between the instruments, with highest summary scores for HUI2 (0.88), followed by HUI3 (0.80) and EQ-5D (0.78). Absolute and relative informativity by dimension was highest for the HUI3 descriptive system. The HUI3 was most sensitive for ageing and comorbidity. The largest differences between the MAUI were found for pain/discomfort and anxiety/depression/emotion. The largest differences in discriminative power between EQ-5D and HUI (mark 2 and 3) were seen for skull–brain injury, internal organ injury and upper extremity fractures.

Conclusions Different MAUI resulted in significantly different summary scores. The instruments and their dimensions performed differently for injury severity levels, ageing, comorbidity and injury groups. A combination of the HUI and EQ-5D should be used in studies on injury-related disability, because the combination covers all relevant health dimensions, is applicable in all kinds of injury populations and in widely different age ranges.

  • disability
  • EQ-5D
  • health utility index
  • long-term health-related quality of life
  • methods

Health-related quality of life (HRQoL) has become an important consideration in the allocation of healthcare resources.1 In the field of injury prevention, policy decisions could be supported by metrics on HRQoL and disability.2 However, this is hampered by a lack of consensus on the preferred methods to arrive at these metrics, as is evident from the wide variety of approaches used by injury researchers.2 3 Therefore, in a first attempt to reduce the heterogeneity of applied methods, a European group published guidelines for the conduct of follow-up studies measuring injury-related disability.3 These guidelines advise using a combination of the EuroQol-5D (EQ-5D) and the health utilities index (HUI) mark 3 as a common core in all studies on injury-related disability. The HUI (mark 2 and 3) and the EQ-5D are frequently used generic HRQoL measures4–7 and aim to cover the full spectrum of disease and disability.

Both the HUI and the EQ-5D are multi-attribute utility instruments (MAUI), which are standardised health state classifications that can be used to obtain a single summary score (utility score) or so-called preference weight for different health states, based on preferences of the general public.1 8 At the core of any MAUI is a classification system consisting of multiple attributes (dimensions) with ordered levels for each dimension.9 In this way, the health status of patients with problems on several dimensions can be quantified in a single metric on a scale anchored at 0 (dead) and 1 (best possible health status); some health states are allocated index scores below 0, that is, worse than dead. This provides opportunities to compare the health status of patient groups with different diseases (eg, heart disease vs cancer) and injury types (eg, skull–brain injuries vs hip fractures).

The HUI and EQ-5D have been found to be acceptable, feasible, valid and reliable in several population and patient studies.10–13 Clear differences in the HUI and EQ-5D exist in definitions of health, inclusion of health dimensions and construction of the formula. The literature has shown variation in summary scores between the EQ-5D and the HUI for similar health states.10 12 14 15 These differences have the undesirable effect that the distinct instruments yield different utilities for similar health states. Furthermore, it is not yet clear which method provides the most valid summary scores within comprehensive samples of injury patients.

It is generally believed that the HUI is a more responsive utility measure than the EQ-5D because of the crude level structure of the EQ-5D compared with the HUI. However, the performance characteristics of an instrument may be population specific,16 and the question arises as to whether this also holds for the heterogeneous group of injury patients. To be able to interpret the ability to discriminate between different injury and patient groups and different severity levels, a head-to-head comparison between the HUI2, HUI3 and EQ-5D is needed. As far as we know, no head-to-head comparison between these three MAUI among injury patients has yet been published.3 A comparison is needed to obtain more insight into the strengths and limitations of these instruments for estimating utility losses in injury populations. This could support further consensus development on preferred methodologies within the injury research field.

This paper describes a large follow-up study in which functional outcome was assessed with the EQ-5D and the HUI 2 years after injury in a comprehensive population of hospitalised and non-hospitalised injury patients. We will address the following questions:

  1. Are there differences in summary scores between the EQ-5D and the HUI (mark 2 and 3) in an injury population sample?

  2. What is the discriminative power of the three systems in terms of their ability to distinguish between different levels of HRQoL among injury patients of all external causes and severity levels?

  3. What is the discriminatory power of the three systems based on head-to-head comparisons assessed with Shannon's indices of informativity?

Methods

Study population

We conducted a patient follow-up study among injury patients aged 15 years and older, who had visited one of the emergency departments of the Dutch Injury Surveillance System (LIS).17 18 All unintentional and intentional injuries are recorded. LIS has been implemented in 17 hospitals in The Netherlands (15% coverage), which are considered to be representative for the total population. We used the data of the 1781 respondents on the 24-month questionnaire. The questionnaire was designed to collect information on functional outcome, sociodemographic and injury-related characteristics and healthcare use. A non-response analysis was performed by multivariate logistic regression. The study sample was stratified by type of injury (39 injury groups19) and admission so that severe, less common injuries were overrepresented. Data were corrected for non-response and sample stratification.17 18 Only patients with full information on all three HRQoL measures were included (n=1285).

Instruments

The questionnaire included two generic quality of life measures: the HUI (mark 2 and 3)5 and the EQ-5D. Table 1 gives an overview of the instrument properties.

Table 1

Overview of HRQoL instrument properties

The EQ-5D self-classifier comprises five dimensions of HRQoL (mobility, self-care, usual activities, pain/discomfort and anxiety/depression), each with three levels (for instance, no problems, some/moderate problems and extreme/unable to).7 Accordingly, the EQ-5D classification system distinguishes 243 different health states. The EQ-5D was analysed using the so-called York A1 tariff, which is based on time trade-off preferences from the general public of the UK.6 Reference scores for the EQ-5D index were obtained from the York study.20 The EQ-5D is well able to describe a heterogeneous injury population and to discriminate among specific injuries.17 Moreover, the EQ-5D has been recommended for (economic) evaluation of trauma care at a consensus conference.21 Because the EQ-5D classification does not include memory patterns and/or ability to concentrate, an item was added on cognitive ability.22 The EQ-5D supplemented by the cognitive dimension is referred to as the EQ-6D.

The HUI is a self-administered health status questionnaire consisting of 15 questions, classifying respondents into either the HUI2 or HUI3 health states.

Responses to the questionnaire are converted by an algorithm23 to levels in the complementary HUI2 and HUI3 health status classification systems4 to form seven and eight-element health-state vectors, respectively. From these vectors, single-attribute and overall health state summary scores are calculated using the respective HUI2 and HUI3 utility functions,5 24 25 which are based on the standard gamble method, using preferences from the general Canadian population.24

Analysis

Descriptive statistics were used to compare the mean summary scores, standard deviations and the number of missing cases per item. Missing values were defined as those cases in which no answer was provided. Floor or ceiling effects are considered to be present if more than 15% of respondents achieved the lowest or highest possible score, respectively.26 Questionnaires should exhibit minimal floor and ceiling effects to be optimally able to detect difference and change.
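The 15% criterion above can be illustrated with a minimal sketch. The function name, the example scores and the lower bound passed as `lowest` are hypothetical and are not taken from the study data; the true lowest possible score depends on the tariff or utility function applied.

```python
def floor_ceiling_effects(scores, lowest, highest, threshold=0.15):
    """Share of respondents at each scale bound, flagged against the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_share = sum(1 for s in scores if s == lowest) / n
    ceiling_share = sum(1 for s in scores if s == highest) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        # An effect is considered present when more than 15% sit at the bound.
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# Hypothetical summary scores (not study data): 4 of 10 respondents in full health.
example = [1.0, 1.0, 1.0, 1.0, 0.85, 0.80, 0.76, 0.69, 0.52, 0.26]
# lowest=-0.5 is a placeholder for the instrument's minimum possible score.
result = floor_ceiling_effects(example, lowest=-0.5, highest=1.0)
```

In this hypothetical sample the ceiling share is 0.4, so a ceiling effect is flagged, while no respondent is at the floor.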

To assess convergent validity, the extent to which the three instruments measure the same concepts, paired comparisons of the mean scores across measures were tested with the paired Student's t test.
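As an illustration of the paired comparison, the sketch below computes a paired Student's t statistic from per-respondent differences between two instruments. It uses only the Python standard library; the function name and the example scores are hypothetical, and the sketch is not the authors' analysis code.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees of freedom) for paired observations."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    if len(scores_a) < 2:
        raise ValueError("at least two pairs are required")
    # Work on the within-respondent differences.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical summary scores for four respondents on two instruments.
instrument_a = [0.80, 0.70, 0.90, 0.60]
instrument_b = [0.90, 0.80, 0.95, 0.80]
t_value, dof = paired_t_statistic(instrument_a, instrument_b)
```

A negative t indicates that the first instrument yields lower scores on average than the second for the same respondents.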

In the absence of a gold standard to measure health state utility, there is no clear technique to determine the construct validity of utility measures. A way to examine the construct validity is to examine whether summary scores are different for distinctive groups following a priori hypotheses of the expected patterns (sizes and directions) of the differences (known groups validity).27 Comparisons were made between EQ-5D, HUI2 and HUI3 by age, sex, injury group and different severity levels (multiple injury, comorbidity). Differences regarding the mean summary scores for EQ-5D, HUI2 and HUI3 between groups were tested using a one-way analysis of variance.

The Shannon index and the Shannon evenness index of informativity were used to assess the discriminatory power of each classification system.28 The methodology of Shannon indices originates from the field of information theory, but can in principle be applied to any classification, including health classification systems or MAUI such as the EQ-5D, HUI and the short form health survey (SF-6D).9 Shannon indices can be calculated by dimension separately or by MAUI as a whole. The basic characteristic of Shannon's indices is explained as follows. In an item with two response categories in which one response category has a very high (or low) endorsement, for example, more than 0.95 (or less than 0.05), the response category transmits very little information because one can predict with more than 95% certainty in what response category the answer will be.29 Conversely, in the case of an even distribution, the health dimension is being most efficiently used, which means that the discriminant ability of the level descriptors is maximal. This characteristic of an even distribution underlies Shannon's indices. The Shannon index (H′) combines the number of non-empty categories defined by a system, and measures to what extent the information is (empirically) evenly spread over the non-empty categories. The higher the index H′ is, the more information is captured by the system. The Shannon evenness index (J′) exclusively reflects the evenness (rectangularity) of a distribution, regardless of the number of levels. Five dimensions allowed head-to-head comparison of informativity: mobility/ambulation; anxiety/depression/emotion; pain/discomfort (EQ-5D; HUI2; HUI3); self-care (EQ-5D; HUI2) and cognition (EQ-6D; HUI2; HUI3). To calculate Shannon's indices by instrument as a whole and by injury group, permutations are treated as unique categories (eg, 243 categories for EQ-5D). 
As the number of observations in our study (n=1285) is lower than the number of theoretically possible permutations in HUI2 (24 000) and HUI3 (972 000), maximum informativity cannot be reached a priori. The basic characteristics of Shannon's index are described elsewhere.9 For the Shannon indices, the observed number of health states in a population is used, not the theoretically possible number of health states.
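The two indices can be sketched as follows. This is a minimal illustration that assumes the standard information-theoretic definitions, H′ = −Σ p_i log2 p_i over the non-empty categories and J′ = H′ / H′max with H′max = log2 of the number of categories. The logarithm base and the function name are assumptions; the text above does not state them.

```python
import math
from collections import Counter


def shannon_indices(responses, n_categories=None):
    """Return (H', J') for a sequence of category labels.

    H' is the Shannon index in bits. J' divides H' by log2(C), where C is
    n_categories if given, otherwise the number of observed categories.
    """
    counts = Counter(responses)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("responses must not be empty")
    # p * log2(1/p) equals -p * log2(p) and avoids returning -0.0.
    h = sum((c / total) * math.log2(total / c) for c in counts.values())
    c_max = n_categories if n_categories is not None else len(counts)
    j = h / math.log2(c_max) if c_max > 1 else 0.0
    return h, j


# Evenly used three-level item: maximal informativity for three levels.
h_even, j_even = shannon_indices([1, 2, 3] * 10)
# Two-level item with 95% endorsement of one level: very little information.
h_skew, j_skew = shannon_indices([1] * 95 + [2] * 5)
```

With an even spread over three levels, H′ reaches log2(3) and J′ equals 1. With a 95%/5% split over two levels, H′ falls to roughly 0.29 bits, which matches the intuition given above that a heavily endorsed response category transmits little information.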

Results

Descriptive comparison of questionnaires

Of the 1781 injury patients who responded to the 2-year questionnaire, 1561, 1541 and 1454 persons fully completed the EQ-5D, HUI2 and HUI3, respectively (table 2). Mean summary scores for the injury population were highest for the HUI2. Generally, the questionnaires demonstrated no floor effects. All three questionnaires showed ceiling effects; 40% of the population reported no problems on the EQ-5D, against 25% on the HUI2 and 24% on the HUI3. The HUI2 and HUI3 had similar interquartile ranges (0.15 and 0.18, respectively), whereas the interquartile range of the EQ-5D was wider (0.23).

Table 2

Descriptive statistics of the EQ-5D and HUI

Comparison of summary scores

Following the results from the paired t-test on the unweighted data, the mean utilities derived with the three MAUI were significantly different from each other (EQ-5D vs HUI2 t=−82.9; EQ-5D vs HUI3 t=18.7; HUI3 vs HUI2 t=−122.0).

In table 3 the differences in summary scores for specific determinants were analysed by MAUI. For several determinants, the differences in summary scores between the instruments were not in the same direction as the total summary scores (HUI2 > HUI3 > EQ-5D). The HUI3 scores were relatively low for elderly patients (age over 65 years), with a more than 0.20 utility difference compared with the HUI2 and EQ-5D for the age group over 85 years. Furthermore, the HUI3 was most sensitive for the presence of comorbidity. Patients with two or more comorbid conditions had a lower summary score (0.19 lower compared with the mean score) for the HUI3, compared with a reduction of 0.10 and 0.08 for the HUI2 and EQ-5D, respectively.

Table 3

HUI2, HUI3 and EQ-5D summary scores by determinants (15 years and older, hospitalised) and the relationship between determinants and the MAUI

All three instruments showed significant differences in summary score by length of stay in hospital. The HUI2 scores were higher than HUI3 and EQ-5D for all types of injury. The HUI3 showed lower summary scores than the EQ-5D for skull–brain injury, facial fractures and hip fracture.

Informativity

Figure 1 shows absolute informativity (Shannon's H′) and relative informativity (Shannon's evenness J′) of the common dimensions among the three instruments. Absolute informativity (H′), or diversity (the degree to which health states were distributed equally among the injury patients), was highest for HUI3 in most dimensions, with the largest differences between HUI3 and EQ-5D in the dimensions pain/discomfort (1.70 vs 1.02) and anxiety/depression/emotion (1.34 vs 0.64). Furthermore, for cognition large differences in absolute informativity were found between the HUI3 and EQ-6D (1.17 vs 0.63). HUI3 showed the highest relative informativity (J′), or evenness (the degree to which the instrument reflected the maximal diversity that was possible given the number of health states observed) for pain, emotion and cognition, with the largest differences with the other two instruments in the dimensions anxiety/depression/emotion (0.12 difference compared with EQ-5D; 0.05 difference compared with HUI2) and cognition (0.08 difference compared with HUI2; 0.05 difference compared with EQ-6D). For the dimension mobility/ambulation both the absolute and relative informativity were highest for EQ-5D in comparison with HUI2 and HUI3.

Figure 1

The Shannon Index and the Shannon Evenness Index for the common dimensions between EQ-5D, HUI2 and HUI3: comparison by dimension. (a) mobility dimension. (b) self-care dimension. (c) pain dimension. (d) anxiety/emotion dimension. (e) cognition dimension. NA, not available.

Table 4 shows Shannon's indices by classification system as a whole. Absolute informativity (H′) was highest for HUI3 (6.08), followed by HUI2 (4.75) and lowest for EQ-5D (2.71). This means that most information was captured by the HUI3 classification system.

Table 4

Shannon's index (H′) and Shannon's evenness index (J′) for EQ-5D, HUI2 and HUI3 by type of injury: comparison by instrument

Relative informativity (J′) was highest for HUI3 (0.72) (0.19 higher than EQ-5D; 0.06 higher than HUI2). The EQ-5D, HUI2 and HUI3 descriptive systems distinguished 36, 150 and 347 observed different unique health states, accounting for 14.8%, 0.6% and 0.04% of all possible permutations, respectively.
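The figures reported above can be cross-checked with a short calculation. The sketch assumes that H′ is expressed in bits (base-2 logarithm) and that J′ is obtained by dividing H′ by log2 of the number of observed health states. Both are assumptions made for this consistency check and not a statement of the authors' procedure; the variable and function names are illustrative.

```python
import math

# Values reported in the text: (observed states, possible states, H')
reported = {
    "EQ-5D": (36, 243, 2.71),
    "HUI2": (150, 24_000, 4.75),
    "HUI3": (347, 972_000, 6.08),
}


def coverage_and_evenness(observed, possible, h):
    """Percentage of possible states observed, and J' = H' / log2(observed)."""
    coverage_pct = 100 * observed / possible
    evenness = h / math.log2(observed)
    return coverage_pct, evenness


summary = {name: coverage_and_evenness(*vals) for name, vals in reported.items()}
for name, (coverage_pct, evenness) in summary.items():
    print(f"{name}: coverage {coverage_pct:.2f}%, J' approx. {evenness:.2f}")
```

Under these assumptions the coverage percentages agree with the 14.8%, 0.6% and 0.04% given in the text, and the J′ values for HUI3 and HUI2 come out close to the reported 0.72 and to 0.06 below it, respectively. The EQ-5D value lands within rounding distance of the figure implied by the text.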

The biggest differences in absolute informativity (Shannon's H′) between the EQ-5D and the HUI2 and HUI3 were seen for skull–brain injury (2.83 vs 4.58 and 5.46, respectively) and internal organ injury (2.31 vs 4.37 and 4.65, respectively). Furthermore, a large difference in discriminative power exists between the EQ-5D and HUI3 for upper extremity fractures (2.33 vs 5.14).

For all MAUI the discriminative power was higher for persons with comorbidity. Both the absolute and relative informativity were higher for the HUI2 and HUI3 compared with the EQ-5D. Also, for all MAUI the absolute informativity (Shannon's H′) was highest for long-term admitted patients. This implies that the observed health states among injury patients who were long-term admitted and/or with a comorbid disease provide the best reflection of the maximal diversity given the possible health states.

Discussion

This paper focused on the ability of the HUI2, HUI3 and EQ-5D to discriminate between different levels of HRQoL among injury patients of all severity levels. Mean summary scores for the injury population were significantly different between the instruments, with highest summary scores for the HUI2 and lowest for EQ-5D. The HUI3 is most sensitive to HRQoL reductions associated with old age (over 65 years) and comorbidity. All three instruments demonstrated sensitivity for differences in injury type and hospitalisation; in addition, they showed similar rankings between injury patient groups. Absolute and relative informativity by dimension was highest for the HUI3 descriptive system. The largest differences between the MAUI were found for pain/discomfort and anxiety/depression/emotion. EQ-5D appears to underperform in these two dimensions. In addition, EQ-6D appeared to have low discriminative power compared with HUI on the added cognitive dimension. The biggest differences in discriminative power between the EQ-5D and the HUI2 and HUI3 were seen for skull–brain injury, internal organ injury (HUI2) and upper extremity fractures (HUI3).

To the best of our knowledge, only a few studies have been published comparing the HUI and EQ-5D extensively.9–11 15 30 Generally, the results of most of these studies agreed with ours in that the MAUI generated significantly different utility values. As far as we know, no head-to-head comparison between the HUI and the EQ-5D among injury patients has yet been published.3

Typical for injuries are their heterogeneous functional sequelae and recovery patterns. Therefore it is important that the MAUI used has good discriminative power for the severity of injury. All three instruments showed significant differences of summary scores by injury group, with relatively low scores for injuries of the spine/vertebrae and hip fractures and high scores for superficial injury and upper extremity injury, close to the health state of the Dutch general population.31

The large differences found between the HUI2, HUI3 and EQ-5D may be confounded by the different valuation and/or scoring methods of the MAUI. First of all, there are important differences in the applied health state valuation technique. The HUI systems used the standard gamble as a valuation technique13 and the EQ-5D used the time trade-off technique.10 The time trade-off technique has been shown to yield lower results compared with the standard gamble technique,8 10 which indicates that lower EQ-5D scores could be expected. Second, the scoring functions for the EQ-5D utility values were derived from samples of the UK population, which may differ from preferences given by those in Canada.6

Furthermore, an advantage of the HUI3 is its potentially greater discriminatory power in a wide range of diseases because it defines 972 000 unique health states, whereas the EQ-5D defines only 243 health states.10 We found that the EQ-5D showed the highest ceiling effects: 40% of the population reported no problems on the EQ-5D, against 25% on the HUI2 and 24% on the HUI3. These high ‘ceiling effects’ are not surprising, because earlier research showed that a large proportion of patients recover from an injury within 2 years.17 However, in 16% of the cases the EQ-5D indicated no disability whereas the HUI indicated functional problems, and 32% of respondents indicated that they were not fully recovered 2 years after the injury.

The informativity of the EQ-5D, HUI2 and HUI3 on the common dimensions varies by dimension. Absolute and relative informativity by dimension was highest for the HUI3 descriptive system. Shannon's indices ‘translated’ this difference adequately. Apparently, the EQ-5D would benefit from more levels on the pain/discomfort dimension. Regarding the cognition dimension, the difference in absolute informativity between HUI2, EQ-6D and HUI3 might be explained by the extra levels in HUI3, but the higher J′ value in HUI3 suggests an additional contributing factor. The very low informativity in self-care among all instruments might be explained by adaptation of skills to the new situation, even when persons still experience health problems 2 years after the injury.

All three measures have the advantage that they are available in formats designed for self-completion, and include a comprehensive health status classification system. The HUI with 15 questions is somewhat more elaborate than the EQ-5D, which consists of five questions. However, fewer questions may yield a higher response rate and fewer missing scores in a mail survey. In our research we also found that more persons fully completed the EQ-5D compared with the HUI2 and HUI3, but the differences were modest (88% vs 87% and 82%, respectively).

Most information is captured by the HUI3 classification system. However, it should be considered that this also results in the fact that the HUI3 is very sensitive for HRQoL reduction due to comorbidity and ageing. For patients in our study above the age of 75 years and with one or more comorbidities (n=50), the summary score of the HUI3 is 0.55, which is much lower than for the HUI2 and EQ-5D (0.75 and 0.72, respectively). This could mainly reflect the influence of other health problems instead of injury consequences, and researchers should be cautious with the HUI3 as a stand-alone measure in injured elderly populations.

This study showed that different MAUI resulted in significantly different summary scores. Furthermore, the HUI2, HUI3 and EQ-5D and their dimensions perform differently for injury type, hospitalisation and length of stay in hospital, comorbidity and ageing. These differences have the undesirable effect that the distinct instruments yield different utilities for similar health states. To demonstrate the functional outcome of injury patients, clinicians and researchers should be aware of these differences between the MAUI.

Decisions about which HRQoL measure to use will be influenced by a range of factors. We showed that the HUI classification system is more informative than the EQ-5D, in particular for patients with skull–brain injury, internal organ injury and upper extremity fractures. Most information is captured by the HUI3 classification system, but this does not seem to have enormous advantages in practice. Nevertheless, the EQ-5D seems to be the utility measure more often applied in injury research (E. Belt, S. Polinder, RA Lyons, et al., 2010, in preparation) in all injury populations and it was recommended for economic evaluations. Despite its limitations for measuring injury outcome, such as the absence of a cognitive dimension, the EQ-5D is freely available, simple, yields high response rates and exists in many language versions, and therefore seems suitable for inclusion in studies in the injury population.

Well-founded choices are essential with regard to the type of measure to be used for analysing HRQoL in injury patients. We advise using a combination of the HUI and EQ-5D in studies on injury-related disability, in line with earlier published guidelines.3 This combination covers all relevant health dimensions, is applicable in all kinds of injury populations and in widely different age ranges.

What is already known on the subject

  • Injuries are very heterogeneous, resulting in a wide array of individual patterns of functional outcome.

  • The HUI (mark 2 and 3) and the EQ-5D have been recommended as utility measures for injury patients.

  • The HUI and EQ-5D have been found to be acceptable, feasible, valid and reliable utility measures in several patient groups, but a head-to-head comparison among injury patients has not yet been conducted.

What this study adds

  • Significant differences exist in utilities obtained from the EQ-5D, HUI2 and HUI3 for similar health states.

  • The HUI classification system is more informative than the EQ-5D, in particular for skull–brain injury, internal organ injury and upper extremity fractures, but is also more sensitive to ageing and comorbidity among injury patients after 2 years of follow-up.

  • The EQ-5D is freely available in many language versions, and its simplicity results in higher response rates, which makes it suitable for inclusion in injury studies.

  • We advise using a combination of the HUI and EQ-5D in studies on injury-related disability, because the combination covers all relevant health dimensions, is applicable in all kinds of injury populations and in widely different age ranges.

Footnotes

  • Linked articles 25833.

  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.
