Background Primary care databases provide a unique resource for healthcare research, but most researchers currently use only the Read codes for their studies, ignoring information in the free text, which is much harder to access.
Objectives To investigate how much information on ovarian cancer diagnosis is ‘hidden’ in the free text and the time lag between a diagnosis being described in the text or in a hospital letter and the patient being given a Read code for that diagnosis.
Design Anonymised free text records from the General Practice Research Database of 344 women with a Read code indicating ovarian cancer between 1 June 2002 and 31 May 2007 were used to compare the date at which the diagnosis was first coded with the date at which the diagnosis was recorded in the free text. Free text relating to a diagnosis was identified (a) from the date of coded diagnosis and (b) by searching for words relating to the ovary.
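The two retrieval routes and the date comparison described above can be sketched in code. This is a minimal, hypothetical illustration. It assumes each free text entry is a (date, text) pair, and it uses an illustrative list of ovary-related word stems, because the study does not report its actual search terms. The helper names (`ovary_mentions`, `classify_lag`, `TextEntry`) are invented for this sketch. In the study itself the retrieved text was read and classified by hand, so the code shows only how candidate entries might be pre-screened and how a lag relative to the coded date could be grouped into the bands reported in the Results.

```python
import re
from dataclasses import dataclass
from datetime import date

# HYPOTHETICAL search pattern: the study's actual word list is not given.
# "ovar\w*" covers ovary, ovaries, ovarian; the other stems are illustrative.
OVARY_PATTERN = re.compile(r"\b(ovar\w*|oophor\w*|adnex\w*)\b", re.IGNORECASE)


@dataclass
class TextEntry:
    """One free text entry (e.g. a consultation note or hospital letter)."""
    entry_date: date
    text: str


def ovary_mentions(entries):
    """Return the entries whose free text contains an ovary-related word."""
    return [e for e in entries if OVARY_PATTERN.search(e.text)]


def lag_days(coded_date, text_date):
    """Days by which the text precedes the code (positive = text earlier)."""
    return (coded_date - text_date).days


def classify_lag(coded_date, text_date, threshold_days=28):
    """Group a text-confirmed diagnosis date relative to the coded date.

    threshold_days=28 is an assumption standing in for "4 weeks".
    """
    lag = lag_days(coded_date, text_date)
    if lag > threshold_days:
        return "more than 4 weeks before code"
    if lag > 0:
        return "before code"
    return "on or after code"


# Usage with invented example records (not study data).
entries = [
    TextEntry(date(2005, 3, 1), "Letter: CA125 raised, ?ovarian mass"),
    TextEntry(date(2005, 4, 10), "BP check"),
]
hits = ovary_mentions(entries)
coded = date(2005, 4, 20)
for e in hits:
    print(e.entry_date, classify_lag(coded, e.entry_date))
```

Keeping the keyword screen separate from the date classification mirrors the two-step design: text is first located (by date of coding or by ovary-related words), and only entries that a reader confirms as describing a diagnosis would then be compared with the coded date.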
Results 90% of cases had information relating to their ovary in the free text, and 45% had text indicating a definite diagnosis of ovarian cancer. In 22% the text confirmed a diagnosis before the coded date, and in 10% more than 4 weeks before it. Four patients did not have ovarian cancer, and 10% had only ambiguous or suspected diagnoses associated with the ovarian cancer code.
Conclusions There was a vast amount of extra information relating to diagnoses in the free text. Although in most cases the text confirmed the coded diagnosis, it also showed that in some cases GPs do not code a definite diagnosis on the date that it is confirmed. For diseases that rely on hospital consultants for diagnosis, free text (particularly letters) is invaluable for accurately dating diagnoses and referrals and for identifying misclassified cases.
- Electronic patient records
- survey data
- non-response bias in surveys
- multivariate statistics
- misclassification bias
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non-commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.
To cite: Tate AR, Martin AGR, Ali A, et al. Using free text information to explore how and when GPs code a diagnosis of ovarian cancer: an observational study using primary care records of patients with ovarian cancer. BMJ Open 2011;1:e000025. doi:10.1136/bmjopen-2010-000025
Funding This work was supported by the Wellcome Trust (086105/Z/08/Z). Access to the General Practice Research Database was funded through the Medical Research Council's licence agreement with Medicines and Healthcare Products Regulatory Agency (MHRA). The authors were independent from the funder and sponsor, who had no role in the conduct, analysis or the decision to publish. This study is based in part on data from the Full Feature General Practice Research Database obtained under licence from the UK MHRA. However, the interpretation and conclusions contained in this study are those of the authors alone.
Competing interests None.
Ethics approval Access to the dataset was approved by the Independent Scientific Advisory Committee (protocol 07 069).
Contributors ART conceived and wrote the paper, read and classified the free text, and carried out the subsequent analysis. AGRM devised the classification scheme, read through and classified the free text, wrote part of the paper, and provided expert advice. JAC was involved in the conception and writing of the paper. AA participated in writing the paper (including the literature review) and assisted with data management. All authors had full access to all of the data (including statistical reports and tables) in the study and can take responsibility for the integrity of the data and the accuracy of the data analysis. ART is the guarantor.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Codelists and statistical code available from the corresponding author.