The Lancet Neurology

Volume 6, Issue 12, December 2007, Pages 1094-1105
Review
Rating scales as outcome measures for clinical trials in neurology: problems, solutions, and recommendations

https://doi.org/10.1016/S1474-4422(07)70290-9

Summary

Have state-of-the-art clinical trials failed to deliver treatments for neurodegenerative diseases because of shortcomings in the rating scales used? This Review assesses two methodological limitations of rating scales that might help to answer this question. First, the numbers generated by most rating scales do not satisfy the criteria for rigorous measurements. Second, we do not really know which variables most rating scales measure. We use clinical examples to highlight concerns about the limitations of rating scales, examine their underlying rationales, clarify their implications, explore potential solutions, and make some recommendations for future research. We show that improvements in the scientific rigour of rating scales can improve the chances of reaching the correct conclusions about the effectiveness of treatments.

Introduction

A recent review of UK health research funding1 emphasised the importance of translational research and highlighted an internationally recognised problem: success in basic science rarely leads to effective treatments. Why have state-of-the-art clinical trials failed to deliver treatments? Are all candidate molecules that work in controlled laboratory settings worthless when studied in human beings? Conversely, do some of the methods used to test the efficacy of treatments hinder advances in basic science?

In this Review we focus on the latter point and, in particular, the rating scales used to measure the health outcomes of trials for the treatment of neurological diseases, which are increasingly selected as primary or secondary outcome measures in clinical trials.2, 3, 4, 5, 6 Rating scales are, therefore, the main dependent variables on which decisions are made that influence patient care and guide future research; the adequacy of these decisions depends directly on the scientific quality of the rating scales.

Two developments indicate an appreciation of this fact: the increased application of the science of rating scales (psychometrics) to the measurement of health outcomes in clinical neurology; and the impending scientific requirements of the US Food and Drug Administration (FDA) for patient-reported rating scales in clinical trials.7, 8 The FDA requirements are likely to be emulated by the European Medicines Agency (EMEA)9 and will be pertinent to all rating scales, not just those that are patient-reported.

Our opening remarks might suggest that we think that published data from clinical trials are littered with type-2 errors due to poor rating scales. We do not know whether this is the case; nor do we know the frequency of type-1 errors that arise from problems with rating scales. We do know, however, that the reliability, validity, and responsiveness of different scales will influence their ability to estimate the effect of a disease accurately and to detect clinical change, and will have implications for calculations of sample size.10 As such, the differences among rating scales have the potential to influence the outcome of clinical trials (panel 1).
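
The link between scale reliability and sample size can be illustrated with a short calculation. The sketch below is not taken from the Review. It assumes a two-arm comparison of means, the normal-approximation sample size formula, and the classical test theory result that an unreliable scale shrinks the observed standardised effect by the square root of its reliability. The function name `n_per_group` and the example effect size of 0.5 are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(true_effect, reliability, alpha=0.05, power=0.80):
    """Approximate patients per arm for a two-arm comparison of means.

    Assumption (classical test theory): measurement error inflates the
    observed variance, so the observed standardised effect equals
    true_effect * sqrt(reliability).
    """
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    if true_effect == 0:
        raise ValueError("true_effect must be non-zero")
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = normal.inv_cdf(power)
    observed_effect = true_effect * sqrt(reliability)
    # Normal-approximation formula: n = 2 * (z_alpha + z_beta)^2 / d^2
    return ceil(2 * (z_alpha + z_beta) ** 2 / observed_effect ** 2)


# Hypothetical true standardised effect of 0.5 at several reliabilities.
for reliability in (1.0, 0.9, 0.7, 0.5):
    print(reliability, n_per_group(0.5, reliability))
```

Under these assumptions the required sample size grows in inverse proportion to reliability, so a trial powered as if the scale were error-free would be underpowered when a less reliable scale is used.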

Therefore, clinicians need to ensure that rating scales are fit for purpose, and maximising the scientific rigour of rating scales improves the chances of coming to the correct conclusion about the efficacy of a treatment. On this basis, a fundamental requirement of rigorous clinical trials is that the numbers generated by rating scales satisfy established scientific criteria as measurements of explicit, clinically meaningful variables.

A review of the subject of rating scales as outcome measures is, therefore, timely. We introduce the basic principles of the mechanics of rating scales and the limitations of the data derived from them. We discuss the benefits of moving to new psychometric methods and make recommendations to bring rating scales into line with what they measure. We highlight two methodological limitations that require attention to ensure that state-of-the-art clinical trials are underpinned by state-of-the-art measurements: the first limitation is that the numbers generated by most rating scales do not satisfy criteria as rigorous measurements; the second limitation is that we do not really know what variables most rating scales are measuring. These facts have great potential to undermine clinical trials, patient care, and research. The extent to which the limitations of rating scales are to blame for the failure of clinical trials to deliver treatments is unknown. However, our review highlights the potential contribution of rating scales and the way their data are analysed.

Section snippets

Basis of rating scales as outcome measures

Some variables (eg, height and weight) can be measured directly. Other variables (eg, disability, cognitive function, and quality of life) are measured indirectly by how they manifest; therefore, we need a method to transform the manifestations of these “latent” variables into numbers that can be taken as measurements.13

Rating scales are a means to measure latent variables, and two types of rating scale are commonly used in neurology: single item scales (eg, Ashworth scale [figure 1],14

The requirement for rating scales to generate rigorous measurements

Phase III clinical trials need rating scales that generate rigorous measurements. Unfortunately, this is rarely achieved because most rating scales generate ordered scores that are only suitable for group comparison studies, rather than precise measurements of an individual.
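
A small numerical sketch may help to show why ordered scores are not equal-interval measurements. It is not the authors' analysis. It assumes a hypothetical scale of 20 yes/no items of equal difficulty, for which the Rasch model (one of the methods cited in the reference list) estimates a person's location as the log-odds of the raw score. The function name `raw_score_to_logit` and the scale length are illustrative only.

```python
from math import log


def raw_score_to_logit(raw_score, max_score):
    """Log-odds of a raw score on a scale of equally difficult yes/no items.

    Assumption: with equal item difficulties, the Rasch maximum likelihood
    estimate of a person's location reduces to this log-odds transformation.
    """
    if not 0 < raw_score < max_score:
        raise ValueError("extreme scores have no finite logit estimate")
    return log(raw_score / (max_score - raw_score))


MAX_SCORE = 20  # hypothetical 20-item scale

# Compare the size of a one-point raw score change at different locations.
for low, high in ((10, 11), (15, 16), (18, 19)):
    step = raw_score_to_logit(high, MAX_SCORE) - raw_score_to_logit(low, MAX_SCORE)
    print(f"raw {low} -> {high}: {step:.2f} logits")
```

In this simplified setting a one-point change near the ceiling of the scale corresponds to a larger change in logits than a one-point change near the middle, which is why equal differences in raw scores cannot be assumed to represent equal differences in the underlying variable.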

The requirement to know precisely what variables are measured

Clinical trials require rating scales that actually measure the health constructs that they claim to (ie, the scales are valid) and health constructs that are clinically meaningful and can be interpreted. Unfortunately, current methods to establish the validity of a rating scale rarely meet these goals.

Recommendations

The FDA draft recommendations for patient-reported rating scales in clinical trials highlight the importance of “conceptually sound, reliable, and valid measures”.8 Such an acknowledgment is a vital, albeit first, step. Surprisingly, the document barely mentions new psychometric methods, despite their clear advantages and increased use;104, 105, 106, 107, 108, 109 furthermore, despite the emphasis on the improvement of methods to establish validity, they do not provide detailed guidance on how

Conclusions

In this Review we posed a question: why have state-of-the-art clinical trials in neurology failed to deliver treatments? Our aim was to highlight the potential contribution to this failure of the currently available rating scales and the way their data are analysed. However, rating scales are not always to blame. Indeed, the extent to which rating scales undermine inferences from clinical trials is difficult to determine. Our message is simple: when rating scales are used, they must be fit for

Search strategy and selection criteria

Our Review is a focused critique of the literature on the basis of articles, reports, and book chapters that span more than a century of research in three areas: psychometrics, health measurement, and neurological clinical trials. These were collected as part of the general strategy in our unit (Neurological Outcome Measures Unit) during the past 15 years to develop a clear and detailed understanding of the science behind rating scales. Our search strategy included searches of electronic

References (118)

  • D Cooksey

    A review of health research funding

    (2006)
  • P Aisen et al.

    Effects of rofecoxib or naproxen vs placebo on Alzheimer's disease progression: a randomized controlled trial

    JAMA

    (2003)
  • J Fairbank et al.

    Randomised controlled trial to compare surgical stabilisation of the lumbar spine with an intensive rehabilitation programme for patients with chronic low back pain: the MRC spine stabilisation trial

    BMJ

    (2005)
  • K Lees et al.

    NXY-059 for acute ischemic stroke

    N Engl J Med

    (2006)
  • Patient reported outcome measures: use in medical product development to support labelling claims

    (2006)
  • Reflection paper on the regulatory guidance for the use of the health-related quality of life (HRQL) measures in the evaluation of medicinal products

    (2006)
  • JC Hobart et al.

    How responsive is the MSIS-29? A comparison with other self report scales

    J Neurol Neurosurg Psychiatr

    (2005)
  • DATATOP: a multicenter controlled clinical trial in early Parkinson's disease

    Arch Neurol

    (1989)
  • F Stocchi et al.

    Neuroprotection in Parkinson's disease: clinical trials

    Ann Neurol

    (2003)
  • BD Wright et al.

    Rating scale analysis: Rasch measurement

    (1982)
  • B Ashworth

    Preliminary trial of carisoprodol in multiple sclerosis

    Practitioner

    (1964)
  • JF Kurtzke

    Rating neurological impairment in multiple sclerosis: an expanded disability status scale (EDSS)

    Neurology

    (1983)
  • J Rankin

    Cerebral vascular accidents in patients over the age of 60: II. Prognosis

    Scott Med J

    (1957)
  • S Hauser et al.

    Intensive immunosuppression in progressive multiple sclerosis: a randomised three-arm study of high dose intravenous cyclophosphamide, plasma exchange, and ACTH

    N Engl J Med

    (1983)
  • MM Hoehn et al.

    Parkinsonism: onset, progression, and mortality

    Neurology

    (1967)
  • FM Collen et al.

    The Rivermead Mobility Index: a further development of the Rivermead Motor Assessment

    Int Disabil Stud

    (1991)
  • FI Mahoney et al.

    Functional evaluation: the Barthel Index

    Maryland State Med J

    (1965)
  • CV Granger et al.

    Advances in functional assessment for medical rehabilitation

    Topics Geriatr Rehab

    (1986)
  • JC Nunnally

    Psychometric theory

    (1967)
  • W Manning et al.

    The status of health in demand estimation: or beyond excellent, good, fair, and poor

  • EuroQoL: a new facility for the measurement of health-related quality of life

    Health Policy

    (1990)
  • B Haas et al.

    The inter rater reliability of the original and of the modified Ashworth scale for the assessment of spasticity in patients with spinal cord injury

    Spinal Cord

    (1996)
  • JC Hobart et al.

    Kurtzke scales revisited: the application of psychometric methods to clinical intuition

    Brain

    (2000)
  • M Blackburn et al.

    Reliability of measurements obtained with the modified Ashworth scale in the lower extremities of people with stroke

    Phys Ther

    (2002)
  • N Clopton et al.

    Interrater and intrarater reliability of the Modified Ashworth Scale in children with hypertonia

    Pediatr Phys Ther

    (2005)
  • J Wilson et al.

    Reliability of the modified Rankin Scale across multiple raters: benefits of a structured interview

    Stroke

    (2005)
  • W Yam et al.

    Interrater reliability of Modified Ashworth Scale and Modified Tardieu Scale in children with spastic cerebral palsy

    J Child Neurol

    (2006)
  • P New et al.

    Critical appraisal and review of the Rankin scale and its derivatives

    Neuroepidemiology

    (2006)
  • CA McHorney et al.

    The validity and relative precision of MOS short- and long-form health status scales and Dartmouth COOP charts

    Med Care

    (1992)
  • JC Hobart

    Rating scales for neurologists

    J Neurol Neurosurg Psychiatr

    (2003)
  • C Vaney et al.

    Efficacy, safety and tolerability of an orally administered cannabis extract in the treatment of spasticity in patients with multiple sclerosis: a randomized, double-blind, placebo-controlled, crossover study

    Mult Scler

    (2004)
  • M Uyttenboogaart et al.

    Measuring disability in stroke: relationship between the modified Rankin scale and the Barthel index

    J Neurol

    (2007)
  • JC Hobart et al.

    Measuring the impact of MS on walking ability: the 12-item MS Walking Scale (MSWS-12)

    Neurology

    (2003)
  • EL Thorndike

    An introduction to the theory of mental and social measurements

    (1904)
  • LL Thurstone

    Theory of attitude measurement

    Psychol Rev

    (1929)
  • C Merbitz et al.

    Ordinal scales and foundations of misinference

    Arch Phys Med Rehabil

    (1989)
  • BD Wright et al.

    Observations are always ordinal: measurements, however, must be interval

    Arch Phys Med Rehabil

    (1989)
  • R Massof

    The measurement of vision disability

    Optom Vis Sci

    (2002)
  • J Michell

    Measurement: a beginner's guide

    J Appl Meas

    (2003)
  • J Michell

    An introduction to the logic of psychological measurement

    (1990)