Original Article
Tests of Data Quality, Scaling Assumptions, and Reliability of the Danish SF-36

https://doi.org/10.1016/S0895-4356(98)00092-4

Abstract

We used general population data (n = 4084) to examine data completeness, response consistency, scaling assumptions, and reliability of the Danish SF-36 Health Survey. We compared traditional multitrait scaling analyses with analyses using polychoric and Spearman correlations. The frequency of missing values was low, except among elderly people and people with less education. Response consistency was high and compared well with results for the U.S. SF-36. For respondents with computable scores on all eight scales, scaling assumptions (item internal consistency, item discriminant validity, equal item–own scale correlations, and equal variances) were met in the total sample and in all subgroups. The SF-36 discriminated between levels of health in all subgroups, but skewness, kurtosis, and ceiling effects were present in many subgroups (elderly people and people with chronic diseases excepted). Regarding correlation methods, we found differences indicating that methods that do not assume normally distributed responses are a useful addition to traditional methods.
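Multitrait scaling, as summarized above, correlates each item with its own scale score (corrected for overlap by removing the item from that score) and with every other scale score. The Python sketch below illustrates the idea with Pearson correlations on simple summated scales. The function names, the summated scoring, and the 0.40 cut-off for item internal consistency (a common convention in SF-36 work) are assumptions for illustration. Discriminant validity is simplified here to a plain comparison, whereas published analyses usually test whether the difference is statistically significant.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def multitrait_scaling(items, scales, min_own=0.40):
    """Item-level multitrait scaling checks (hypothetical helper).

    items:  dict mapping item name -> list of responses, one per respondent
    scales: dict mapping scale name -> list of item names in that scale
    Returns dict item -> (r_own, r_other_max, internal_ok, discriminant_ok).
    """
    n = len(next(iter(items.values())))
    # Summated raw score for every scale and respondent
    totals = {s: [sum(items[i][r] for i in its) for r in range(n)]
              for s, its in scales.items()}
    results = {}
    for s, its in scales.items():
        for i in its:
            # Correct for overlap: subtract the item from its own scale total
            rest = [t - v for t, v in zip(totals[s], items[i])]
            r_own = pearson(items[i], rest)
            # Highest correlation with any other (uncorrected) scale total
            r_other = max(pearson(items[i], totals[o]) for o in scales if o != s)
            # Assumed criteria: r_own >= min_own, and r_own above every other-scale r
            results[i] = (r_own, r_other, r_own >= min_own, r_own > r_other)
    return results
```

Each scale needs at least two items; otherwise the overlap-corrected correlation is undefined. An item that has been assigned to the wrong scale shows up as a low own-scale correlation combined with a higher correlation to another scale.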

Introduction

A critical step in the development of health status measures is to evaluate their measurement properties. In evaluating such questionnaires, the health status measurement field has developed some criteria that are unique to, or especially important in, the health field (e.g., response burden and responsiveness), but most criteria and techniques have been adopted from psychological testing. In particular, statistical techniques from classical psychometrics [1] have been adopted to examine scaling assumptions, reliability, and validity. In principle, these techniques assume continuous, normally distributed data, but they often also work well with ordinal (rank-scaled) categorical data [1, 2], the kind of data used in health status measurement. Their use in evaluating health status measures may nevertheless be debatable, because health status data are often skewed.
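Spearman's rank correlation is Pearson's correlation computed on ranks, with tied responses receiving the average of the ranks they span. It therefore depends only on the ordering of the response categories, not on their spacing or on normality. The Python sketch below is minimal, and its helper names are illustrative rather than taken from the article.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotone but non-linear relation: Spearman is 1, Pearson is not.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(round(spearman(x, y), 3), round(pearson(x, y), 3))
```

In this toy case the Spearman coefficient is 1 while the Pearson coefficient is about 0.94, because Pearson also penalizes the unequal spacing of the values.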

The psychometric properties of health status measures are examined as part of the development process, but it is also recommended that the psychometric analyses be repeated when a questionnaire is translated into another language [3]. Further, because the validity and reliability of a questionnaire are specific to the setting and the population [1], checking its psychometric properties may be appropriate when the questionnaire is to be used in new settings or with groups in which it has not previously been tested.
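Internal-consistency reliability in SF-36 studies is commonly summarized with Cronbach's alpha: alpha = k / (k - 1) * (1 - sum of item variances / variance of the scale total), where k is the number of items. Because alpha depends on the spread of scores in the sample at hand, it has to be re-estimated in each new population or subgroup. The sketch below is a minimal Python version that assumes complete responses. The function name is hypothetical, and the article excerpt does not state which reliability coefficient the authors computed.

```python
from statistics import pvariance


def cronbach_alpha(item_columns):
    """Cronbach's alpha for one scale.

    item_columns: list of k sequences, each holding one item's responses
    (same respondents, same order, no missing values).
    """
    k = len(item_columns)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(resp) for resp in zip(*item_columns)]  # scale score per respondent
    item_var = sum(pvariance(col) for col in item_columns)
    return k / (k - 1) * (1 - item_var / pvariance(totals))


# Two perfectly consistent items give alpha = 1; two unrelated items give alpha = 0.
print(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]]))
print(cronbach_alpha([[1, 2, 1, 2], [1, 1, 2, 2]]))
```

Running the function separately within each subgroup (for example, by age, education, or disease status) yields the kind of subgroup comparison described in the text.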

This article concerns the psychometric properties of the Danish translation of the MOS SF-36 Health Survey. The translation and adaptation of the Danish SF-36 followed the procedures of the International Quality of Life Assessment (IQOLA) project [3, 4, 5], an international collaboration to translate and implement the SF-36 across languages and cultures. Initial studies have shown the adequacy of the Danish translation, as judged from backward translations and from independent evaluation of the conceptual equivalence, use of common language, and clarity of the Danish translation [5]. Factor analytic studies have established that its factor structure is very similar to that of the U.S. version [6, 7]. Rasch analyses of the Physical Functioning scale and contingency table analyses of differential item functioning (DIF) in all scales have found that some items show DIF, but that this DIF has only a small impact on the total score for each scale in comparisons of general population data [8, 9].

We present data here on the psychometric properties of the Danish SF-36, focusing on data completeness, response consistency, tests of scaling assumptions, and reliability. As in studies of the U.S. SF-36 [10], these properties are analyzed for the total population and for subgroups defined by age, gender, education, and disease status. In addition, we investigate the more general question of how the choice of statistical method affects tests of scaling assumptions. Because the Danish general population sample used in this paper includes many healthy people, our data are more skewed than the data originally used to validate the SF-36 in the United States [10, 11]. For this reason, we examined whether our results changed when we used methods that do not assume that the observed variables are normally distributed.
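One such method is the polychoric correlation. It treats each ordinal item as a categorized version of a latent normal variable and estimates the correlation between the latent variables, so only the latent pair, not the observed responses, is assumed to be bivariate normal. The Python sketch below is a minimal two-step estimator with several simplifications:

- Thresholds are taken from the marginal proportions.
- The bivariate normal CDF is obtained by Simpson integration rather than a specialized algorithm.
- rho is found by golden-section search on the log-likelihood.

All names are hypothetical, and this is not necessarily the estimation procedure the authors used.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)


def bvn_cdf(h, k, rho, steps=400, lower=-8.0):
    """P(X <= h, Y <= k) for standard bivariate normal X, Y with correlation rho, |rho| < 1.

    Integrates phi(x) * Phi((k - rho * x) / sqrt(1 - rho**2)) over x up to h
    with Simpson's rule on [lower, h]; the tail below `lower` is negligible.
    """
    if h == -math.inf or k == -math.inf:
        return 0.0
    if h == math.inf:
        return STD.cdf(k)
    if k == math.inf:
        return STD.cdf(h)
    if h <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)

    def f(x):
        return STD.pdf(x) * STD.cdf((k - rho * x) / s)

    width = (h - lower) / steps  # steps must be even for Simpson's rule
    total = f(lower) + f(h)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lower + i * width)
    return total * width / 3.0


def thresholds(codes):
    """Latent cut-points from the cumulative marginal proportions of codes 0..K-1."""
    n, K = len(codes), max(codes) + 1
    cuts, cum = [-math.inf], 0
    for c in range(K - 1):
        cum += codes.count(c)
        cuts.append(STD.inv_cdf(cum / n))
    return cuts + [math.inf]


def to_codes(values):
    """Map observed ordinal values to consecutive integer codes 0..K-1."""
    index = {v: i for i, v in enumerate(sorted(set(values)))}
    return [index[v] for v in values]


def polychoric(x, y, tol=1e-4):
    """Two-step polychoric correlation of two ordinal variables."""
    cx, cy = to_codes(x), to_codes(y)
    a, b = thresholds(cx), thresholds(cy)
    counts = {}
    for cell in zip(cx, cy):
        counts[cell] = counts.get(cell, 0) + 1

    def loglik(rho):
        F = {(p, q): bvn_cdf(a[p], b[q], rho)
             for p in range(len(a)) for q in range(len(b))}
        ll = 0.0
        for (i, j), n_ij in counts.items():
            # Probability of cell (i, j) under the latent bivariate normal
            p_ij = F[i + 1, j + 1] - F[i, j + 1] - F[i + 1, j] + F[i, j]
            ll += n_ij * math.log(max(p_ij, 1e-300))
        return ll

    # Golden-section search for the maximizing rho in (-0.99, 0.99)
    lo, hi = -0.99, 0.99
    g = (math.sqrt(5) - 1) / 2
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = loglik(c), loglik(d)
    while hi - lo > tol:
        if fc > fd:  # maximum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = loglik(c)
        else:  # maximum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = loglik(d)
    return (lo + hi) / 2
```

With coarse, skewed ordinal items, the Pearson correlation of the raw category codes is usually smaller than the polychoric estimate. This is one way in which the choice of correlation method can change the results of scaling tests.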

Data Collection

Data for the Danish general population were collected from February to August 1994 as part of a population health survey. A representative sample of 5983 noninstitutionalized Danish citizens more than 15 years of age was drawn from the Civil Registration System, which registers addresses and other data for all Danes. The survey included a home visit with a 30-minute structured personal interview regarding social and demographic data, health behavior, health status, and diseases. After the

Background Data

The respondents’ ages ranged from 16 to 94 years. The mean age was approximately the same for men and women (44 years), and the mean number of years of education did not differ between genders. Chronic diseases were slightly more frequent among women. The younger age groups had more years of education and fewer chronic diseases than the older age groups. In the age groups 25 to 66 years, chronic diseases were more frequent among people with fewer years of schooling, but no

Discussion

Danes tend to rate their own health somewhat more positively than Americans do (see Ware et al. [23]), as indicated by greater skewness and higher ceiling effects in the Danish data. Even in this healthy Danish population sample, the SF-36 is able to distinguish levels of health. For some items, such as PF7, PF8, and MH4, skewness is especially pronounced compared with the U.S. general population data [24]. It is possible that the translation has led to a slight shift in meaning of these
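The skewness, kurtosis, and ceiling effects compared here are simple descriptive statistics. The Python sketch below is minimal and makes two assumptions: scale scores are already on the 0 to 100 metric conventionally used for SF-36 scales, and moments are computed with the population formulas without small-sample corrections. The function name is hypothetical. A ceiling effect is reported as the percentage of respondents at the maximum score. Because higher SF-36 scores indicate better health, negative skewness means that scores are bunched toward the healthy end.

```python
from statistics import mean, pstdev


def distribution_summary(scores, floor=0.0, ceiling=100.0):
    """Floor/ceiling percentages, skewness and excess kurtosis of scale scores."""
    n = len(scores)
    m, sd = mean(scores), pstdev(scores)
    z = [(s - m) / sd for s in scores]  # standardized scores
    return {
        "pct_floor": 100.0 * sum(s == floor for s in scores) / n,
        "pct_ceiling": 100.0 * sum(s == ceiling for s in scores) / n,
        "skewness": sum(v ** 3 for v in z) / n,
        "excess_kurtosis": sum(v ** 4 for v in z) / n - 3.0,  # 0 for a normal distribution
    }


# Scores piled up at the healthy end: 60 % at the ceiling, negative skewness.
print(distribution_summary([100, 100, 100, 80, 40]))
```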

Conclusion

Although the results on missing responses warrant further work on the questionnaire layout for some subgroups, the rest of our psychometric results are satisfactory. The scaling properties, reliability, and discriminatory power of the Danish SF-36 seemed to be best in the group in which the questionnaire can be expected to be used most: people with chronic disease. Thus, the Danish SF-36 may be said to have found the balance between fulfilling requirements of shortness and low

Acknowledgements

The International Quality of Life Assessment (IQOLA) Project is sponsored by Glaxo Wellcome, Inc., Research Triangle Park, North Carolina, and Schering-Plough Corporation, Kenilworth, New Jersey. This study has been supported by grants from Glaxo Research Institute, from the Danish Medical Research Council, and from the Danish Health Insurance Fund. We thank Barbara Gandek, John E. Ware, Jr., and two anonymous reviewers for comments on a previous version of this article.

References (32)

  • Ware JE, Keller SD, Gandek B, Brazier JE, Sullivan M, The IQOLA Project Group. Evaluating Translations of Health Status...
  • C.A. McHorney et al. The MOS 36-Item Short-Form Health Survey (SF-36): III. Test of data quality, scaling assumptions, and reliability across diverse patient groups. Med Care (1994).
  • C.A. McHorney et al. The MOS 36-Item Short-Form Health Survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care (1993).
  • [Classification of Diseases - Systematic Part]. Copenhagen: Danish National Board of Health;...
  • J.E. Ware et al. SF-36 Health Survey: Manual and Interpretation Guide (1993).
  • J.B. Bjorner et al. Danish Manual for the SF-36 (in Danish) (1997).