PT  - JOURNAL ARTICLE
AU  - Hannah R Maybrier
AU  - Angela M Mickle
AU  - Krisztina E Escallier
AU  - Nan Lin
AU  - Eva M Schmitt
AU  - Ravi T Upadhyayula
AU  - Troy S Wildes
AU  - George A Mashour
AU  - Kerry Palihnich
AU  - Sharon K Inouye
AU  - Michael Simon Avidan
TI  - Reliability and accuracy of delirium assessments among investigators at multiple international centres
AID  - 10.1136/bmjopen-2018-023137
DP  - 2018 Nov 01
TA  - BMJ Open
PG  - e023137
VI  - 8
IP  - 11
4099  - http://bmjopen.bmj.com/content/8/11/e023137.short
4100  - http://bmjopen.bmj.com/content/8/11/e023137.full
SO  - BMJ Open 2018 Nov 01; 8
AB  - Introduction: Delirium is a common, serious postoperative complication. For clinical studies to generate valid findings, delirium assessments must be standardised and administered accurately by independent researchers. The Confusion Assessment Method (CAM) is a widely used delirium assessment tool. The objective was to determine whether implementing a standardised CAM training protocol for researchers at multiple international sites yields reliable inter-rater assessment and accurate delirium diagnosis. Methods: Patients consented to video recordings of CAM delirium assessments for research purposes. Raters underwent structured training in CAM administration. Training entailed didactic education, role-playing with intensive feedback, apprenticeship with experienced researchers and group discussions of complex cases. Raters independently viewed and scored nine video-recorded CAM interviews. Inter-rater reliability was determined using Fleiss kappa. Accuracy was judged by comparing raters’ scores with those of an expert delirium researcher. Results: Twenty-seven raters from eight international research centres completed the study and achieved almost perfect agreement for overall delirium diagnosis, kappa=0.88 (95% CI 0.85 to 0.92). Agreement of the four core CAM features ranged from fair to substantial. The sensitivity and specificity for identifying delirium were 72% (95% CI 60% to 81%) and 99% (95% CI 96% to 100%), considering an expert rater’s scores as the reference standard (delirious, n=3; non-delirious, n=6). Delirium severity ratings were tightly clustered, with most scores within 5% of the median. Conclusion: Our results demonstrate that, with appropriate training and ongoing scoring discussions, researchers at multiple sites can reliably detect delirium in postsurgical patients. These results support the premise that methodologically rigorous multi-centre studies can yield standardised and accurate determinations of delirium.