Original Article
Overview of the SF-36 Health Survey and the International Quality of Life Assessment (IQOLA) Project

https://doi.org/10.1016/S0895-4356(98)00081-X

Abstract

This article presents information about the development and evaluation of the SF-36 Health Survey, a 36-item generic measure of health status. It summarizes studies of reliability and validity and provides administrative and interpretation guidelines for the SF-36. A brief history of the International Quality of Life Assessment (IQOLA) Project is also included.

Introduction

The SF-36 Health Survey is a multi-purpose, short-form health survey which contains 36 questions. It yields an eight-scale profile of scores as well as summary physical and mental measures. The SF-36 is a generic measure of health status as opposed to one that targets a specific age, disease, or treatment group. Accordingly, the SF-36 has proven useful in comparing general and specific populations, estimating the relative burden of different diseases, differentiating the health benefits produced by a wide range of different treatments, and screening individual patients [1]. The International Quality of Life Assessment (IQOLA) Project was established in 1991 to translate the SF-36 Health Survey and to validate, norm, and document the translations as required for their use internationally. This overview summarizes the development and evaluation of the SF-36, including studies of reliability and validity, and provides administrative and interpretation guidelines. It also presents a brief history of the IQOLA Project.

It should be noted that most of the references provided in this overview are for the U.S.-English version of the SF-36. IQOLA Project researchers replicated methods used in the United States and utilized new methods to test scaling assumptions and the reliability and validity of the SF-36 translations. Thus, much of the information provided in this article can be seen as a benchmark against which the psychometric properties of the translations can be compared.

Section snippets

Construction of the SF-36

Much remains to be discovered about population health in terms of functional health and well-being, the relative burden of disease, and the relative benefits of alternative treatments. One reason for this has been the lack of practical measurement tools appropriate for widespread use across diverse populations. The SF-36 was constructed to provide a basis for such comparisons.

The SF-36 was constructed to satisfy minimum psychometric standards necessary for group comparisons. The eight health

SF-36 measurement model

Figure 1 illustrates the taxonomy of items and concepts underlying the construction of the SF-36 scales and summary measures. The taxonomy has three levels: (1) items; (2) eight scales that aggregate 2–10 items each; and (3) two summary measures that aggregate scales. All but one of the 36 items (self-reported health transition) are used to score the eight SF-36 scales. Each item is used in scoring only one scale.
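The item-to-scale level of this taxonomy can be sketched as a small data structure. This is a minimal illustration, not the authors' scoring code: the per-scale item counts and the scale abbreviations other than PF and MH are taken from standard SF-36 documentation rather than from this excerpt, and the names `SCALE_ITEM_COUNTS` and `check_taxonomy` are hypothetical.

```python
# Sketch of the items -> scales level of the SF-36 taxonomy.
# ASSUMPTION: item counts per scale come from standard SF-36 documentation,
# not from this excerpt, which only states the 2-10 range.
SCALE_ITEM_COUNTS = {
    "PF": 10,  # Physical Functioning
    "RP": 4,   # Role-Physical
    "BP": 2,   # Bodily Pain
    "GH": 5,   # General Health
    "VT": 4,   # Vitality
    "SF": 2,   # Social Functioning
    "RE": 3,   # Role-Emotional
    "MH": 5,   # Mental Health
}

UNSCORED_ITEMS = 1  # self-reported health transition, used in no scale
TOTAL_ITEMS = 36


def check_taxonomy(counts, unscored=UNSCORED_ITEMS, total=TOTAL_ITEMS):
    """Return True if the counts satisfy the constraints stated in the text."""
    has_eight_scales = len(counts) == 8
    sizes_in_range = all(2 <= n <= 10 for n in counts.values())
    # Each item scores exactly one scale, so the scale sizes plus the
    # unscored item must add up to the full questionnaire.
    covers_all_items = sum(counts.values()) + unscored == total
    return has_eight_scales and sizes_in_range and covers_all_items


if __name__ == "__main__":
    print(check_taxonomy(SCALE_ITEM_COUNTS))
```

Because each item belongs to only one scale, a plain mapping from scale to item count is enough to verify that 35 scored items plus the health transition item account for all 36 questions.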

The eight scales are hypothesized to form two distinct higher-ordered clusters

Tests of scaling and scoring assumptions

A major objective in constructing the SF-36 was achievement of high psychometric standards. Guidelines for testing were derived from those recommended for use in validating psychological and educational measures by the American Psychological Association, the American Educational Research Association, and the National Council on Measurement in Education [21]. Extensive psychometric testing initially was conducted on the SF-36 in the United States [15, 22, 23] and later in numerous other countries.

Reliability and confidence intervals

The reliability of the eight scales and two summary measures has been estimated using both internal consistency and test-retest methods. With rare exceptions, published reliability statistics have exceeded the minimum standard of 0.70 recommended for measures used in group comparisons [1]; in a summary of 15 studies, most exceeded 0.80 [3]. Reliability estimates for physical and mental summary scores usually exceed 0.90 [14]. In addition, a reliability of 0.93 has been reported for the Mental
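The link between a reliability coefficient and a confidence interval around an individual score can be sketched with the classical test theory formula for the standard error of measurement, SEM = SD × sqrt(1 − reliability). This is a generic illustration, not necessarily the procedure used in the article: the function names are hypothetical, and the standard deviation of 10 and reliability of 0.90 in the usage example are assumed values chosen only to show the arithmetic.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if sd < 0:
        raise ValueError("sd must be non-negative")
    return sd * math.sqrt(1.0 - reliability)


def confidence_interval(score, sd, reliability, z=1.96):
    """Return (lower, upper) bounds around an observed score.

    z=1.96 gives an approximate 95% interval under a normal error model.
    """
    sem = standard_error_of_measurement(sd, reliability)
    return score - z * sem, score + z * sem


if __name__ == "__main__":
    # ASSUMPTION: illustrative inputs, not values reported in the article.
    lower, upper = confidence_interval(score=50.0, sd=10.0, reliability=0.90)
    print(f"95% CI: {lower:.1f} to {upper:.1f}")
```

With these assumed inputs the interval is roughly 43.8 to 56.2, which shows why a reliability adequate for group comparisons can still leave wide uncertainty around a single person's score.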

Validity and interpretation

Studies of validity are about the meaning of scores and whether or not they have their intended interpretations. Because of the widespread use of the SF-36 across a variety of applications, evidence of all types of validity is relevant. Studies to date have addressed content, concurrent, construct, criterion, and predictive validity.

The brevity of the SF-36 was achieved by focusing on only eight of 40 health concepts studied in the MOS and by measuring each concept with a short-form scale. The

Administrative guidelines

The SF-36 is suitable for self-administration, computerized administration, or administration by a trained interviewer in person or by telephone, to persons age 14 and older. The SF-36 has been administered successfully in general population surveys in the United States and other countries [43] as well as to young and old adult patients with specific diseases [3, 23]. It can be administered in 5–10 minutes with a high degree of acceptability and data quality [3]. Indicators of data quality that

Interpretation guidelines

Table 2 summarizes information about the eight SF-36 scales and two summary measures that is important in their use and interpretation. The eight scales are ordered in terms of their factor content (i.e., construct validity) as they are in the SF-36 profile to facilitate interpretation. The first scale is Physical Functioning (PF), which has been shown to be the best all-around measure of physical health; the last scale, Mental Health (MH), is the most valid measure of mental health in studies

SF-36 literature

The experience through 1996 with the SF-36 has been documented in more than 400 publications, which have been summarized in annotated bibliographies [1, 46]; an additional 300 publications were published in 1997 and will be documented in the 1997 update to the bibliography. The most complete information about the history and development of the SF-36, its psychometric evaluation, studies of reliability and validity, and normative data is available in the first of three user’s manuals [3]. A second

History of the IQOLA Project

The International Quality of Life Assessment (IQOLA) Project began in 1991, with the goal of developing validated translations of a health status questionnaire for use in multinational clinical trials and other international studies of health [48, 49]. Although the SF-36 Health Survey has become an increasingly popular measure since 1991, at that time the SF-36 was only beginning to be widely used. Thus, much consideration was given to the questionnaire that was to be translated in the IQOLA

References (61)

  • A.L Stewart et al.

    Advances in the measurement of functional status: Construction of aggregate indexes

    Med Care

    (1981)
  • J.E Ware

    Scales for measuring general health perceptions

    Health Serv Res

    (1976)
  • R.H Brook et al.

    Overview of adult health status measures fielded in RAND’s Health Insurance Study

    Med Care

    (1979)
  • J.E Ware

    How to Score the Revised MOS Short-Form Health Scale (SF-36)

    (1988)
  • J.E Ware et al.

    The MOS 36-Item Short-Form Health Survey (SF-36): I. Conceptual framework and item selection

    Med Care

    (1992)
  • J.E Ware et al.

    SF-36 Physical and Mental Health Summary Scales: A User’s Manual

    (1994)
  • C.A McHorney et al.

    The MOS 36-Item Short-Form Health Survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs

    Med Care

    (1993)
  • C Jenkinson et al.

    Development and testing of the Medical Outcomes Study 36-Item Short Form Health Survey summary scale scores in the United Kingdom

    Med Care

    (1997)
  • J.E Ware et al.

    The factor structure of the SF-36 Health Survey in ten countries: Results from the IQOLA Project

    J Clin Epidemiol

    (1998)
  • J.E Ware et al.

    Comparison of methods for the scoring and statistical analysis of SF-36 health profiles and summary measures: Summary of results from the Medical Outcomes Study

    Med Care

    (1995)
  • A.R Davies et al.

    Measuring Health Perceptions in the Health Insurance Experiment

    (1981)
  • American Psychological Association

    Standards for Educational and Psychological Tests

    (1985)
  • C.A McHorney et al.

    The validity and relative precision of MOS short- and long-form health status scales and Dartmouth COOP charts: Results from the Medical Outcomes Study

    Med Care

    (1992)
  • C.A McHorney et al.

    The MOS 36-Item Short-Form Health Survey (SF-36): III. Tests of data quality, scaling assumptions and reliability across diverse patient groups

    Med Care

    (1994)
  • Medical Outcomes Trust

    How to Score the SF-36 Health Survey

    (1991)
  • C.A McHorney et al.

    Construction and validation of an alternate form general mental health scale for the Medical Outcomes Study Short-Form 36-Item Health Survey

    Med Care

    (1995)
  • J.E Brazier et al.

    Validating the SF-36 Health Survey Questionnaire: New outcome measure for primary care

    Br Med J

    (1992)
  • J.E Ware

    Tech Notes: Confidence intervals for individual scores

    Medical Outcomes Trust Bulletin

    (1994)
  • J.E Ware et al.

    Evaluating translations of health status questionnaires: Methods from the IQOLA Project

    Int J Technol Assess Health Care

    (1995)
  • J.N Katz et al.

    Comparative measurement sensitivity of short and longer health status instruments

    Med Care

    (1992)