Article Text
Abstract
Objectives UK statistics suggest only two-thirds of patients with dementia get a diagnosis recorded in primary care. General practitioners (GPs) report barriers to formally diagnosing dementia, so some patients may be known by GPs to have dementia but may be missing a diagnosis in their patient record. We aimed to produce a method to identify these ‘known but unlabelled’ patients with dementia using data from primary care patient records.
Design Retrospective case–control study using routinely collected primary care patient records from Clinical Practice Research Datalink.
Setting UK general practice.
Participants English patients aged >65 years, with a coded diagnosis of dementia recorded in 2000–2012 (cases), matched 1:1 with patients with no diagnosis code for dementia (controls).
Interventions Eight coded and nine keyword concepts indicating symptoms, screening tests, referrals and care for dementia recorded in the 5 years before diagnosis. We trialled machine learning classifiers to discriminate between cases and controls (logistic regression, naïve Bayes, random forest).
Primary and secondary outcomes The outcome variable was dementia diagnosis code; the accuracy of classifiers was assessed using area under the receiver operating characteristic curve (AUC); the order of features contributing to discrimination was examined.
Results 93 426 patients were included; the median age was 83 years (64.8% women). Three classifiers achieved high discrimination and performed very similarly. AUCs were 0.87–0.90 with coded variables, rising to 0.90–0.94 with keywords added. Feature prioritisation was different for each classifier; commonly prioritised features were Alzheimer’s prescription, dementia annual review, memory loss and dementia keywords.
Conclusions It is possible to detect patients with dementia who are known to GPs but unlabelled with a diagnostic code, with a high degree of accuracy in electronic primary care record data. Using keywords from clinic notes and letters improves accuracy compared with coded data alone. This approach could improve identification of dementia cases for record-keeping, service planning and delivery of good quality care.
- dementia
- health informatics
- geriatric medicine
- primary care
- delirium & cognitive disorders
- old age psychiatry
This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.
Statistics from Altmetric.com
Supplementary materials
Supplementary Data
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
Footnotes
Twitter @drelizabethford
Contributors EF, SO and JAC conceived and directed the study. EF drew up code lists and PR checked these. PR managed the data. JS conducted the analyses. PR and SO gave data analysis advice. SB and JAC gave clinical advice. EF and JS wrote the manuscript. All authors provided critical feedback on the manuscript and approved the final version.
Funding This project was funded by a grant from the Wellcome Trust ref. 202133/Z/16/Z.
Competing interests None declared.
Patient consent for publication Not required.
Ethics approval This study was approved by the Independent Scientific Advisory Committee at the Medicines and Healthcare Products Regulatory Authority, UK, protocol number 15_111_RA. Following approval, administrative permissions to access and use the electronic patient records were granted by Clinical Practice Research Datalink (CPRD.com).
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement Data may be obtained from a third party and are not publicly available. The data that support the findings of this study are available from Clinical Practice Research Datalink (CPRD; www.cprd.com), but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. For reusing these data, an application must be made directly to CPRD.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.