Original Article
The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes

https://doi.org/10.1016/j.jclinepi.2010.02.006

Abstract

Objective

Lack of consensus on taxonomy, terminology, and definitions has led to confusion about which measurement properties are relevant and which concepts they represent. The aim was to clarify and standardize terminology and definitions of measurement properties by reaching consensus among a group of experts and to develop a taxonomy of measurement properties relevant for evaluating health instruments.

Study Design and Setting

An international Delphi study with four written rounds was performed. Participating experts had backgrounds in epidemiology, statistics, psychology, and clinical medicine. The panel was asked to rate their (dis)agreement with proposals on a five-point scale. Consensus was considered to be reached when at least 67% of the panel agreed.
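
The consensus rule described above can be sketched in a few lines of Python. This is an illustration only, not the Steering Committee's analysis code: the coding of the five-point scale (1 = strongly disagree to 5 = strongly agree) and the choice to count ratings of 4 or 5 as "agreed" are assumptions, and the function names are hypothetical.

```python
from typing import Sequence

# From the text: consensus when at least 67% of the panel agreed.
CONSENSUS_THRESHOLD = 0.67


def agreement_share(ratings: Sequence[int], agree_levels=(4, 5)) -> float:
    """Return the fraction of ratings that count as agreement.

    Assumption: ratings are coded 1 (strongly disagree) to 5 (strongly
    agree), and 4 or 5 counts as "agreed". The source does not state this.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(r in agree_levels for r in ratings) / len(ratings)


def consensus_reached(ratings: Sequence[int],
                      threshold: float = CONSENSUS_THRESHOLD) -> bool:
    """True when the share of agreeing panel members meets the threshold."""
    return agreement_share(ratings) >= threshold


if __name__ == "__main__":
    # Made-up ratings from a hypothetical 10-member panel.
    example = [5, 4, 4, 3, 2, 5, 4, 1, 4, 5]
    print(agreement_share(example), consensus_reached(example))
```

Under these assumptions, a proposal on which 7 of 10 panel members agree meets the threshold, whereas one on which 2 of 6 agree does not.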

Results

Of 91 invited experts, 57 agreed to participate and 43 actually participated. Consensus was reached on positions of measurement properties in the taxonomy (68–84%), terminology (74–88%, except for structural validity [56%]), and definitions of measurement properties (68–88%). The panel extensively discussed the positions of internal consistency and responsiveness in the taxonomy, the terms “reliability” and “structural validity,” and the definitions of internal consistency and reliability.

Conclusions

Consensus on taxonomy, terminology, and definitions of measurement properties was reached. Hopefully, this will lead to a more uniform use of terms and definitions in the literature on measurement properties.

Introduction

What is new?

  • International consensus on terminology and definitions of measurement properties.

  • Development of a taxonomy of the relationships of measurement properties.

  • Multidisciplinary international collaboration leading to consensus.

There is no consensus about terminology (what do we call it?) and definitions (what does it mean?) of measurement properties (such as reliability and validity) across the different fields that contribute to health measurement. For example, many different terms for the measurement property “reliability” are used interchangeably in the literature, such as reproducibility, reliability, repeatability, agreement, precision, variability, consistency, and stability [1]. At the same time, the term “agreement” is also used to indicate another measurement property, namely “measurement error.” Different uses of terminology can lead to confusion about which measurement property is assessed. Differences in definitions may lead to confusion about which concept the measurement property represents and how it should be assessed. For example, responsiveness may be defined as “the ability to detect clinically important change” or as “the ability to detect change in the construct to be measured.” These definitions reflect different constructs, and the choice of definition leads to different ways of assessing a measurement property. Terwee et al. [2], for instance, calculated several responsiveness parameters on the same data set, including an effect size (ES) (as a parameter to detect clinically important change) and a receiver operating characteristic (ROC) curve (as a parameter to detect change in the construct being measured). They found an ES of 0.39, which can be considered moderate, and an area under the ROC curve of 0.47, which is poor [2]. Thus, the result and conclusion of a study on measurement properties depend on the parameter used: one definition and its corresponding parameter may lead to different results and conclusions than another definition and parameter.
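
To make the contrast between the two parameters concrete, the following Python sketch computes both on a small made-up data set. It is not the analysis of Terwee et al. [2]: the ES formula used here (mean change divided by the standard deviation of the baseline scores), the use of an external "improved" flag as the anchor for the ROC curve, and all function names and numbers are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of baseline scores.

    Assumption: this is one common ES definition; the source does not
    specify which formula was used.
    """
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def roc_auc(change_scores, improved):
    """Area under the ROC curve for change scores against an anchor.

    Computed as the probability that a randomly chosen improved patient
    has a larger change score than a randomly chosen unimproved patient,
    with ties counted as one half.
    """
    pos = [c for c, flag in zip(change_scores, improved) if flag]
    neg = [c for c, flag in zip(change_scores, improved) if not flag]
    if not pos or not neg:
        raise ValueError("both improved and unimproved patients are needed")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical scores for five patients (not data from the study).
    baseline = [10, 12, 14, 16, 18]
    follow_up = [12, 13, 15, 19, 20]
    improved = [True, False, False, True, True]  # hypothetical anchor
    changes = [f - b for b, f in zip(baseline, follow_up)]
    print(effect_size(baseline, follow_up), roc_auc(changes, improved))
```

The ES depends only on the size of the average change relative to baseline spread, whereas the area under the ROC curve depends on how well change scores separate patients classified by the anchor. The same data can therefore look adequate on one parameter and poor on the other.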

A taxonomy of the relationships among measurement properties provides a complete picture of the relevant measurement properties when assessing the quality of health-related patient-reported outcomes (HR-PROs). A taxonomy is a classification containing domains and subcategories. In our taxonomy, the measurement properties and aspects of measurement properties are the subcategories.

To improve the field of assessing measurement properties, it is of utmost importance to reach consensus about terminology, definitions, and a taxonomy of the relationships of measurement properties of HR-PROs. A Delphi procedure is an appropriate design to reach consensus among experts [3], [4].

The aim of this article is to clarify and standardize terminology and definitions of measurement properties by reaching consensus among a group of experts. A further aim is to develop a taxonomy of the relationships of relevant measurement properties for evaluating HR-PRO instruments.

This study is part of the COSMIN initiative (COnsensus-based Standards for the selection of health Measurement INstruments), which aims to improve the selection of health measurement instruments. Within the COSMIN initiative, we performed a Delphi study in which we aimed to reach consensus on (1) which measurement properties are relevant for evaluating HR-PROs; (2) terminology and definitions of these measurement properties; and (3) the design requirements and preferred statistical methods. These issues should all be in line with each other to avoid confusion. Consensus on these issues is needed to bring more uniformity to the use of terminology, definitions, and, subsequently, design requirements and statistical methods. Results of studies on measurement properties then become more comparable. Moreover, consensus on these issues can improve understanding of which measurement properties of measurement instruments are important and how they should be investigated. In a related article, we describe the results of the Delphi study in which the panel reached consensus on which measurement properties are relevant for evaluating HR-PROs [5]. These are internal consistency, reliability, measurement error, content validity (including face validity), construct validity (subdivided into structural validity, hypotheses testing, and cross-cultural validity), criterion validity, and responsiveness. Interpretability was also considered to be relevant, although it was not considered to be a measurement property. In the related article, we also describe the consensus reached on design requirements and preferred statistical methods.

Steering Committee

A Steering Committee was formed to initiate and guide the study. Members of the Steering Committee are the authors of this article: five epidemiologists (L.B.M., C.B.T., D.L.P., L.M.B., and H.C.W.d.V.), a clinician (J.A.), a physiotherapist (P.W.S.), and a psychometrician (D.L.K.). The Steering Committee was responsible for the selection of the panel members, the design of the questionnaires, the analysis of the responses, and the formulation of the feedback reports. The members of the Steering

Panel members

We invited 91 experts to participate: 57 (63%) agreed to participate and 15 (16%) were unwilling or unable to participate. The main reason for nonparticipation was lack of time. Nineteen experts (21%) did not respond. The mean number (minimum–maximum) of years of experience that the panel members had in research on measuring health or comparable fields (e.g., in educational or psychological measurements) was 20 (6–40) years. Most of the panel members came from North America (n = 25) and Europe

Discussion

In an international Delphi study, consensus was reached on the positions of all relevant measurement properties of HR-PRO instruments in a taxonomy. In addition, we reached consensus on terminology (percentage consensus ranging from 74% to 88% except for one term [only 56% agreed on structural validity]) and on definitions (percentage consensus ranging from 68% to 88%) for each included measurement property. Our aim of reaching consensus was to provide clarification and standardization on these

Conclusion

Consensus was reached on the position of measurement properties of HR-PRO instruments in a taxonomy, on terminology, and on definitions of measurement properties. Lack of consensus in the literature has led to confusion about which measurement properties are relevant, which concepts they represent, and how to assess these measurement properties in terms of design requirements and preferred statistical methods. Hopefully, this will lead to a more uniform use of terms and definitions

Acknowledgments

The authors are grateful to all the panel members who have participated in the COSMIN study: Neil Aaronson, Linda Abetz, Elena Andresen, Dorcas Beaton, Martijn Berger, Giorgio Bertolotti, Monika Bullinger, David Cella, Joost Dekker, Dominique Dubois, Anne Evers, Diane Fairclough, David Feeny, Raymond Fitzpatrick, Andrew Garratt, Francis Guillemin, Dennis Hart, Graeme Hawthorne, Ron Hays, Elizabeth Juniper, Robert Kane, Donna Lamping, Marissa Lassere, Matthew Liang, Kathleen Lohr, Patrick
