Inter-rater and test–retest reliability of quality assessments by novice student raters using the Jadad and Newcastle–Ottawa Scales
Mark Oremus (1,2), Carolina Oremus (3,4), Geoffrey B C Hall (3,4), Margaret C McKinnon (3,4), ECT & Cognition Systematic Review Team (3,4)*

  1. McMaster Evidence-based Practice Centre, McMaster University, Hamilton, Ontario, Canada
  2. Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada
  3. McMaster Integrative Neuroscience Discovery and Study (MINDS) Program, Hamilton, Ontario, Canada
  4. Department of Psychiatry and Behavioural Neuroscience, Hamilton, Ontario, Canada

Correspondence to Dr Mark Oremus; oremusm{at}mcmaster.ca

Abstract

Introduction Quality assessment of included studies is an important component of systematic reviews.

Objective The authors investigated inter-rater and test–retest reliability for quality assessments conducted by inexperienced student raters.

Design Student raters received a training session on quality assessment using the Jadad Scale for randomised controlled trials and the Newcastle–Ottawa Scale (NOS) for observational studies. Raters were randomly assigned to five pairs, and each rater independently assessed the quality of 13–20 articles. These articles were drawn from a pool of 78 papers examining cognitive impairment following electroconvulsive therapy to treat major depressive disorder, and they were randomly distributed to the raters. Two months later, each rater re-assessed the quality of half of their assigned articles.
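
As a rough illustration of the allocation described in the Design, the following sketch randomly pairs raters, deals a shuffled article pool across the pairs, and draws a random half of each rater's articles for re-assessment. The function name `assign_design`, the round-robin dealing, the fixed seed and the rounding down of odd halves are all assumptions made for this example; the abstract does not state how the randomisation was implemented, and an even deal gives 15 or 16 articles per pair rather than the 13–20 reported.

```python
import random


def assign_design(rater_ids, article_ids, seed=0):
    """Hypothetical allocation sketch: pairs, articles per pair, retest subsets."""
    rng = random.Random(seed)

    # Randomly assign raters into pairs (assumes an even number of raters).
    raters = list(rater_ids)
    rng.shuffle(raters)
    pairs = [tuple(raters[i:i + 2]) for i in range(0, len(raters), 2)]

    # Shuffle the article pool and deal it round-robin across the pairs,
    # so both members of a pair rate the same articles independently.
    articles = list(article_ids)
    rng.shuffle(articles)
    allocation = {pair: articles[i::len(pairs)] for i, pair in enumerate(pairs)}

    # Each rater re-assesses a random half of their assigned articles
    # (rounded down when the count is odd; this rounding is an assumption).
    retest = {}
    for pair, assigned in allocation.items():
        for rater in pair:
            retest[rater] = rng.sample(assigned, len(assigned) // 2)

    return pairs, allocation, retest


pairs, allocation, retest = assign_design(range(10), range(78))
for pair, assigned in allocation.items():
    print(pair, len(assigned), "articles")
```

Dealing by pair rather than by individual rater keeps every article rated by exactly two people, which is what a pairwise inter-rater comparison needs.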

Setting McMaster Integrative Neuroscience Discovery and Study Program.

Participants 10 students taking McMaster Integrative Neuroscience Discovery and Study Program courses.

Main outcome measures The authors measured inter-rater reliability using κ and the intraclass correlation coefficient type 2,1, written ICC(2,1). The authors measured test–retest reliability using ICC(2,1).
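
For readers unfamiliar with the two statistics, the sketch below shows one common way to compute them: Cohen's κ for two raters on a categorical item, and ICC(2,1) in the Shrout and Fleiss two-way random-effects, single-rater, absolute-agreement form. It is a minimal illustration, not the authors' analysis code; the function names and the example ratings are hypothetical, and confidence intervals are not computed here.

```python
import numpy as np


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters scoring the same items categorically."""
    a = np.asarray(ratings_a)
    b = np.asarray(ratings_b)
    observed = np.mean(a == b)
    # Chance agreement: product of the raters' marginal proportions per category.
    expected = sum(np.mean(a == c) * np.mean(b == c) for c in np.union1d(a, b))
    if expected == 1.0:
        return float("nan")  # undefined when both raters use a single category
    return (observed - expected) / (1.0 - expected)


def icc_2_1(scores):
    """ICC(2,1) from an (n subjects x k raters) matrix of scores."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_total = ((x - grand) ** 2).sum()
    ss_subjects = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_raters = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_error = ss_total - ss_subjects - ss_raters
    bms = ss_subjects / (n - 1)            # between-subjects mean square
    jms = ss_raters / (k - 1)              # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square
    # The k * (jms - ems) / n term penalises systematic rater differences.
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Hypothetical data: one yes/no scale question and overall scores for 6 articles.
item_rater_1 = [1, 1, 0, 1, 0, 0]
item_rater_2 = [1, 0, 0, 1, 0, 1]
total_scores = [[3, 4], [5, 5], [2, 4], [6, 5], [1, 2], [4, 4]]

print("kappa:", cohen_kappa(item_rater_1, item_rater_2))
print("ICC(2,1):", icc_2_1(total_scores))
```

Because ICC(2,1) includes the between-raters mean square in its denominator, a rater who scores consistently higher or lower than a partner pulls the coefficient down. This is the sense in which the reported values account for systematic differences between raters.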

Results Inter-rater reliability varied by scale question. For the six-item Jadad Scale, question-specific κs ranged from 0.13 (95% CI −0.11 to 0.37) to 0.56 (95% CI 0.29 to 0.83). The ranges were −0.14 (95% CI −0.28 to 0.00) to 0.39 (95% CI −0.02 to 0.81) for the NOS cohort and −0.20 (95% CI −0.49 to 0.09) to 1.00 (95% CI 1.00 to 1.00) for the NOS case–control. For overall scores on the six-item Jadad Scale, ICC(2,1)s for inter-rater and test–retest reliability (accounting for systematic differences between raters) were 0.32 (95% CI 0.08 to 0.52) and 0.55 (95% CI 0.41 to 0.67), respectively. Corresponding ICC(2,1)s for the NOS cohort were −0.19 (95% CI −0.67 to 0.35) and 0.62 (95% CI 0.25 to 0.83), and for the NOS case–control, the ICC(2,1)s were 0.46 (95% CI −0.13 to 0.92) and 0.83 (95% CI 0.48 to 0.95).

Conclusions Inter-rater reliability was generally poor to fair and test–retest reliability was fair to excellent. A pilot rating phase following rater training may be one way to improve agreement.

This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.

Footnotes

  • * The ECT & Cognition Systematic Review Team includes Allyson Graham, Caitlin Gregory, Gagan Fervaha, Lindsay Hanford, Anthony Nazarov, Melissa Parlar, Maria Restivo, Erica Tatham and Wanda Truong.

  • To cite: Oremus M, Oremus C, Hall GBC, et al. Inter-rater and test–retest reliability of quality assessments by novice student raters using the Jadad and Newcastle–Ottawa Scales. BMJ Open 2012;2:e001368. doi:10.1136/bmjopen-2012-001368

  • Contributors MO and CO conceived and designed the study. MO analysed the data. MO, CO, MCM, GBCH and the ECT & Cognition Systematic Review Team interpreted the data. MO drafted the manuscript. CO, MCM, GBCH and the ECT & Cognition Systematic Review Team critically revised the manuscript for important intellectual content. All authors approved the final version of the manuscript.

  • Funding This study did not receive funds from any sponsor. No person or organisation beyond the authors had any input into the study design; the collection, analysis and interpretation of data; the writing of the article; or the decision to submit it for publication.

  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data available.