Overview of classical test theory and item response theory for the quantitative assessment of items in developing patient-reported outcomes measures

Joseph C Cappelleri; J Jason Lundy; Ron D Hays

doi:10.1016/j.clinthera.2014.04.006

Clin Ther. 2014 May;36(5):648-62. doi: 10.1016/j.clinthera.2014.04.006. Epub 2014 May 5.

Authors

Joseph C Cappelleri¹, J Jason Lundy², Ron D Hays³

Affiliations

¹ Pfizer Inc, Groton, Connecticut. Electronic address: joseph.c.cappelleri@pfizer.com.
² Critical Path Institute, Tucson, Arizona.
³ Division of General Internal Medicine & Health Services Research, University of California at Los Angeles, Los Angeles, California.

Abstract

Background: The US Food and Drug Administration's guidance for industry document on patient-reported outcomes (PRO) defines content validity as "the extent to which the instrument measures the concept of interest" (FDA, 2009, p. 12). According to Strauss and Smith (2009), construct validity "is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity" (p. 7). Hence, both qualitative and quantitative information are essential in evaluating the validity of measures.

Methods: We review classical test theory and item response theory (IRT) approaches to evaluating PRO measures, including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized "difficulty" (severity) order of items is represented by observed responses.
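As an illustration of the descriptive checks listed above, the following minimal sketch computes response-category frequencies for one item, floor and ceiling proportions for the total score, and a corrected item-total correlation. The data, the 0-4 response scale, and all function names are hypothetical and are not taken from the article; this is one possible way to set up such checks, not the authors' procedure.

```python
from collections import Counter
from math import sqrt

# Hypothetical data: 6 respondents x 4 items, each item scored 0-4
# (higher = more severe). Not taken from the article.
responses = [
    [0, 0, 0, 0],
    [1, 0, 2, 1],
    [2, 1, 3, 2],
    [3, 2, 3, 3],
    [4, 3, 4, 4],
    [4, 4, 4, 4],
]
MIN_SCORE, MAX_SCORE = 0, 4  # assumed response-option range


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def category_frequencies(data, item):
    """Count how often each response category was chosen for one item."""
    return Counter(row[item] for row in data)


def floor_ceiling(data):
    """Proportion of respondents at the lowest and highest possible total score."""
    n_items = len(data[0])
    totals = [sum(row) for row in data]
    floor = sum(t == MIN_SCORE * n_items for t in totals) / len(totals)
    ceiling = sum(t == MAX_SCORE * n_items for t in totals) / len(totals)
    return floor, ceiling


def corrected_item_total(data, item):
    """Correlate an item with the sum of the remaining items."""
    item_scores = [row[item] for row in data]
    rest_scores = [sum(row) - row[item] for row in data]
    return pearson(item_scores, rest_scores)


def item_means(data):
    """Mean score per item, for comparison with a hypothesized severity order."""
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]
```

With real data, a large floor or ceiling proportion, sparsely used response categories, or a low corrected item-total correlation would flag items for closer review alongside the qualitative evidence.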

Results: If a researcher has few qualitative data and wants to get preliminary information about the content validity of the instrument, then descriptive assessments using classical test theory should be the first step. As the sample size grows during subsequent stages of instrument development, confidence in the numerical estimates from Rasch and other IRT models (as well as those of classical test theory) would also grow.
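For reference, the dichotomous Rasch model mentioned above is conventionally written as follows. The symbols are the standard textbook notation and are assumptions here, since the abstract itself does not define them: \theta_i is the level of person i on the measured concept, and b_j is the difficulty (severity) of item j.

```latex
P(X_{ij} = 1 \mid \theta_i, b_j) = \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)}
```

Under this form, a person whose level equals the item's difficulty (\theta_i = b_j) has a probability of 0.5 of endorsing the item, which is what makes the estimated b_j values comparable with a hypothesized severity ordering of items.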

Conclusion: Classical test theory and IRT can be useful in providing a quantitative assessment of items and scales during the content-validity phase of PRO-measure development. Depending on the type of measure and the specific circumstances, classical test theory, IRT, or both should be considered to help maximize the content validity of PRO measures.

Keywords: classical test theory; content validity; item response theory; patient-reported outcomes; scale development.

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.
Research Support, U.S. Gov't, P.H.S.
Review

MeSH terms

Humans
Models, Theoretical
Outcome Assessment, Health Care / organization & administration*
Patient Outcome Assessment*
United States
United States Food and Drug Administration
