What evidence is there for a delay in diagnostic coding of RA in UK general practice records? An observational study of free text
  1. Elizabeth Ford¹,
  2. John Carroll²,
  3. Helen Smith¹,
  4. Kevin Davies³,
  5. Rob Koeling²,
  6. Irene Petersen⁴,⁵,
  7. Greta Rait⁴,
  8. Jackie Cassell¹
  1. Division of Primary Care and Public Health, Brighton and Sussex Medical School, Falmer, Brighton, UK
  2. Department of Informatics, University of Sussex, Falmer, Brighton, UK
  3. Division of Medicine, Brighton and Sussex Medical School, Falmer, Brighton, UK
  4. Research Department of Primary Care and Population Health, UCL, London, UK
  5. Department of Clinical Epidemiology, Aarhus University, Denmark
Correspondence to Dr Elizabeth Ford


Objectives Much research with electronic health records (EHRs) uses coded or structured data only; important information captured in the free text remains unused. One dimension of EHR data quality assessment is ‘currency’ or timeliness, that is, whether data are representative of the patient state at the time of measurement. We explored the use of free text in UK general practice patient records to evaluate delays in the recording of rheumatoid arthritis (RA) diagnosis. We also aimed to locate and quantify disease and diagnostic information recorded only in text.

Setting UK general practice patient records from the Clinical Practice Research Datalink.

Participants 294 individuals with an incident diagnosis of RA between 2005 and 2008; 204 women and 85 men, median age 63 years.

Primary and secondary outcome measures Assessment of (1) quantity and timing of text entries for disease-modifying antirheumatic drugs (DMARDs) as a proxy for the RA disease code, and (2) quantity, location and timing of free text information relating to RA onset and diagnosis.

Results Inflammatory markers, pain and DMARDs were the most common categories of disease information in text prior to the RA diagnostic code; 10–37% of patients had such information only in text. Read codes associated with RA-related text included correspondence, general consultation and arthritis codes. 64 patients (22%) had DMARD text entries >14 days prior to the RA code; these patients had more and earlier records of rheumatology referrals, tests, swelling, pain and DMARD prescriptions, suggestive of an earlier implicit diagnosis than was recorded by the diagnostic code.

Conclusions RA-related symptoms, tests, referrals and prescriptions were recorded in free text, and 22% of patients showed strong evidence of a delay in coding of the diagnosis. Researchers using EHRs may need to compensate for delayed codes by incorporating text into their case-ascertainment strategies. Natural language processing techniques can do this at scale.
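
The delay criterion reported in the Results (a DMARD text entry more than 14 days before the RA diagnostic code) can be illustrated with a minimal Python sketch. This is not the study's code: the function name `has_coding_delay`, the record layout and the three example patients are hypothetical and were invented for illustration. Only the 14-day threshold comes from the abstract.

```python
from datetime import date

# Threshold from the abstract: DMARD text entry >14 days before the RA code.
DELAY_THRESHOLD_DAYS = 14


def has_coding_delay(ra_code_date, dmard_text_dates,
                     threshold_days=DELAY_THRESHOLD_DAYS):
    """Return True if any DMARD free-text entry precedes the RA
    diagnostic code by more than `threshold_days` days."""
    return any((ra_code_date - d).days > threshold_days
               for d in dmard_text_dates)


# Hypothetical records, invented for illustration (not study data).
patients = {
    "A": {"ra_code": date(2006, 5, 1), "dmard_text": [date(2006, 3, 1)]},
    "B": {"ra_code": date(2007, 2, 10), "dmard_text": [date(2007, 2, 1)]},
    "C": {"ra_code": date(2005, 9, 15), "dmard_text": []},
}

# Patients whose DMARD text suggests the diagnosis predates its code.
delayed = sorted(
    pid for pid, rec in patients.items()
    if has_coding_delay(rec["ra_code"], rec["dmard_text"])
)
proportion_delayed = len(delayed) / len(patients)
```

In this toy data only patient A is flagged: the DMARD text entry for patient B falls 9 days before the code, and patient C has no DMARD text at all. Identifying which free-text entries actually mention a DMARD is the harder step and is not shown here.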

  • Rheumatoid arthritis
  • electronic health records
  • data quality
  • general practice
  • free text

This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See:
