Objectives Much research with electronic health records (EHRs) uses coded or structured data only; important information captured in the free text remains unused. One dimension of EHR data quality assessment is ‘currency’ or timeliness, that is, data are representative of the patient state at the time of measurement. We explored the use of free text in UK general practice patient records to evaluate delays in recording of rheumatoid arthritis (RA) diagnosis. We also aimed to locate and quantify disease and diagnostic information recorded only in text.
Setting UK general practice patient records from the Clinical Practice Research Datalink.
Participants 294 individuals with incident diagnosis of RA between 2005 and 2008; 204 women and 85 men, median age 63 years.
Primary and secondary outcome measures Assessment of (1) quantity and timing of text entries for disease-modifying antirheumatic drugs (DMARDs) as a proxy for the RA disease code, and (2) quantity, location and timing of free text information relating to RA onset and diagnosis.
Results Inflammatory markers, pain and DMARDs were the most common categories of disease information in text prior to RA diagnostic code; 10–37% of patients had such information only in text. Read codes associated with RA-related text included correspondence, general consultation and arthritis codes. 64 patients (22%) had DMARD text entries >14 days prior to RA code; these patients had more and earlier referrals to rheumatology, tests, swelling, pain and DMARD prescriptions, suggestive of an earlier implicit diagnosis than was recorded by the diagnostic code.
Conclusions RA-related symptoms, tests, referrals and prescriptions were recorded in free text with 22% of patients showing strong evidence of delay in coding of diagnosis. Researchers using EHRs may need to mitigate for delayed codes by incorporating text into their case-ascertainment strategies. Natural language processing techniques have the capability to do this at scale.
- Rheumatoid arthritis
- electronic health records
- data quality
- general practice
- free text
This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See: http://creativecommons.org/licenses/by/4.0/
Statistics from Altmetric.com
If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.