PT - JOURNAL ARTICLE
AU - Hannah R Maybrier
AU - Angela M Mickle
AU - Krisztina E Escallier
AU - Nan Lin
AU - Eva M Schmitt
AU - Ravi T Upadhyayula
AU - Troy S Wildes
AU - George A Mashour
AU - Kerry Palihnich
AU - Sharon K Inouye
AU - Michael Simon Avidan
ED - ,
TI - Reliability and accuracy of delirium assessments among investigators at multiple international centres
AID - 10.1136/bmjopen-2018-023137
DP - 2018 Nov 01
TA - BMJ Open
PG - e023137
VI - 8
IP - 11
4099 - http://bmjopen.bmj.com/content/8/11/e023137.short
4100 - http://bmjopen.bmj.com/content/8/11/e023137.full
SO - BMJ Open 2018 Nov 01; 8
AB - Introduction Delirium is a common, serious postoperative complication. For clinical studies to generate valid findings, delirium assessments must be standardised and administered accurately by independent researchers. The Confusion Assessment Method (CAM) is a widely used delirium assessment tool. The objective was to determine whether implementing a standardised CAM training protocol for researchers at multiple international sites yields reliable inter-rater assessment and accurate delirium diagnosis. Methods Patients consented to video recordings of CAM delirium assessments for research purposes. Raters underwent structured training in CAM administration. Training entailed didactic education, role-playing with intensive feedback, apprenticeship with experienced researchers and group discussions of complex cases. Raters independently viewed and scored nine video-recorded CAM interviews. Inter-rater reliability was determined using Fleiss kappa. Accuracy was judged by comparing raters’ scores with those of an expert delirium researcher. Results Twenty-seven raters from eight international research centres completed the study and achieved almost perfect agreement for overall delirium diagnosis, kappa=0.88 (95% CI 0.85 to 0.92). Agreement of the four core CAM features ranged from fair to substantial.
The sensitivity and specificity for identifying delirium were 72% (95% CI 60% to 81%) and 99% (95% CI 96% to 100%), considering an expert rater’s scores as the reference standard (delirious, n=3; non-delirious, n=6). Delirium severity ratings were tightly clustered, with most scores within 5% of the median. Conclusion Our results demonstrate that, with appropriate training and ongoing scoring discussions, researchers at multiple sites can reliably detect delirium in postsurgical patients. These results support the premise that methodologically rigorous multi-centre studies can yield standardised and accurate determinations of delirium.
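
The abstract names two computations: Fleiss kappa for inter-rater reliability, and sensitivity and specificity against an expert reference. The Python sketch below shows the standard textbook formulas on made-up toy data. The function names, the toy numbers, and the choice to pool ratings across all raters are illustrative assumptions, not the study's code, data, or confirmed analysis method.

```python
from typing import Sequence, Tuple


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a subjects x categories table of rating counts.

    counts[i][j] is the number of raters who put subject i in category j.
    Every subject must be rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Mean observed agreement: share of agreeing rater pairs per subject.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects

    # Chance agreement from the overall category proportions.
    total = n_subjects * n_raters
    p_e = sum(
        (sum(row[j] for row in counts) / total) ** 2
        for j in range(n_categories)
    )
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1.0 - p_e)


def sensitivity_specificity(
    reference: Sequence[bool], ratings: Sequence[Sequence[bool]]
) -> Tuple[float, float]:
    """Pool every rater's call per subject against the reference label.

    reference[i] is the expert's call for subject i (True = delirious);
    ratings[i] holds each rater's call for that subject.
    Assumes at least one positive and one negative reference subject.
    """
    tp = fn = tn = fp = 0
    for truth, calls in zip(reference, ratings):
        for call in calls:
            if truth and call:
                tp += 1
            elif truth:
                fn += 1
            elif call:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 4 subjects, 5 raters, categories [delirious, not].
toy_counts = [[5, 0], [4, 1], [0, 5], [1, 4]]
toy_reference = [True, True, False, False]
toy_ratings = [
    [True] * 5,
    [True] * 4 + [False],
    [False] * 5,
    [True] + [False] * 4,
]

print("kappa:", round(fleiss_kappa(toy_counts), 3))
print("sens, spec:", sensitivity_specificity(toy_reference, toy_ratings))
```

Pooling ratings treats each rater-by-video call as one observation, which ignores clustering within raters and within videos; confidence intervals such as those reported in the abstract would need a method that accounts for this.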