Reliability and validity of three quality rating instruments for systematic reviews of observational studies

Res Synth Methods. 2011 Jun;2(2):110-8. doi: 10.1002/jrsm.41. Epub 2011 Sep 15.

Abstract

This study assessed the inter-rater reliability, criterion validity, and inter-instrument agreement of three quality rating instruments for observational studies: the Downs and Black (D&B), Newcastle-Ottawa (NOS), and Scottish Intercollegiate Guidelines Network (SIGN) scales. All three were applied to a sample of 23 observational studies of musculoskeletal health outcomes. Inter-rater reliability was moderate to good for the D&B (intraclass correlation coefficient [ICC] = 0.73; CI = 0.47 to 0.88) and the NOS (ICC = 0.52; CI = 0.14 to 0.76) but poor for the SIGN (κ = 0.09; CI = -0.22 to 0.40). The NOS was not statistically valid (p = 0.35), whereas the SIGN was statistically valid (p < 0.05) with medium to large effect sizes (f² = 0.29 to 0.47). Inter-instrument agreement estimates were κ = 0.34 (CI = 0.05 to 0.62) for D&B versus SIGN, κ = 0.26 (CI = 0.00 to 0.52) for SIGN versus NOS, and κ = 0.43 (CI = 0.09 to 0.78) for D&B versus NOS. Reliability and validity vary considerably across the quality rating scales used to assess observational studies in systematic reviews. Copyright © 2011 John Wiley & Sons, Ltd.
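As an illustration of the κ statistic reported above, the following is a minimal sketch of unweighted Cohen's kappa for two raters who assign categorical quality ratings to the same studies. The function name `cohen_kappa`, the rating labels, and the example ratings are hypothetical and are not taken from the study; the abstract does not state whether a weighted or unweighted κ was used, so the unweighted form is an assumption.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters rating the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion
    of agreement and p_e is the agreement expected by chance from each
    rater's marginal category frequencies.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must rate the same non-empty set of items")

    n = len(rater_a)

    # Observed agreement: share of items given the same category by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal proportions, summed
    # over categories (Counter returns 0 for categories a rater never used).
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if p_e == 1.0:
        raise ValueError("Kappa is undefined when chance agreement equals 1")

    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical ratings of four studies by two raters (not data from the paper).
ratings_a = ["high", "high", "low", "low"]
ratings_b = ["high", "low", "low", "low"]
kappa = cohen_kappa(ratings_a, ratings_b)
print(round(kappa, 2))  # 0.5
```

In this toy example the raters agree on 3 of 4 studies (p_o = 0.75) while chance agreement is 0.5, giving κ = 0.5. A value near zero, such as the 0.09 reported for the SIGN, indicates agreement little better than chance.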

Keywords: instrument psychometrics; meta‐analysis; research methods; systematic review.