Quality of recording of diabetes in the UK: how does the GP's method of coding clinical data affect incidence estimates? Cross-sectional study using the CPRD database

A Rosemary Tate; Sheena Dungey; Simon Glew; Natalia Beloff; Rachael Williams; Tim Williams

doi:10.1136/bmjopen-2016-012905

Health informatics

Research

Selecting code lists to define a disease
Daniel Ng, Ravi Patel, Jonathan Bates-Powell, Duncan Edwards and Jenny Lund
Published on: 6 February 2017

- Daniel Ng, Medical Student, Primary Care Unit, University of Cambridge
- Other Contributors:
  Ravi Patel, Medical Student
  Jonathan Bates-Powell, Medical Student
  Duncan Edwards, GP
  Jenny Lund, GP

As Tate et al. (2017) have shown, incidence estimates vary significantly depending on which code list is used, so a systematic approach to creating a code list is necessary. Our group has been working on a systematic approach to code list selection for diabetes, by looking at the effect of additional codes on prevalence estimates.

We looked at how adding codes to a code list affects the number of patients identified with diabetes in CPRD at a single point in time. We used a randomised sample of 25,000 patients, downloaded from CPRD on 7th June 2016. A comprehensive list of 378 diagnostic codes for diabetes was determined by visual inspection of all codes containing the keywords “diabetes” or “diabetic”. Using this comprehensive code list, we identified 2334 diabetic patients in our sample; these were defined as the complete cohort.

All codes in the code list were then ranked, using the following algorithm:

1. The diabetes code that identified the largest number of patients was ranked highest.

2. The next ranked code was the one that identified the largest number of new patients, i.e. patients not already identified by a higher-ranked code.

3. Step 2 was repeated until all patients in the cohort had been identified.

Thus, we created a list where codes were ranked according to how useful they were in identifying additional diabetic patients.
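
The ranking steps above amount to a greedy selection over sets of patients. The following is a minimal sketch of that idea, not the code used for our analysis: the function name `rank_codes`, the input format (a mapping from each code to the set of patient identifiers it matches), the alphabetical tie-break and the toy data are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def rank_codes(code_to_patients: Dict[str, Set[int]]) -> List[Tuple[str, int]]:
    """Greedily rank codes by the number of new patients each one adds.

    Returns (code, new_patients) pairs in rank order. Ranking stops once
    every patient in the cohort has been identified, so codes that add
    nobody new are left unranked.
    """
    remaining = {code: set(ids) for code, ids in code_to_patients.items()}
    cohort: Set[int] = set().union(*remaining.values())
    covered: Set[int] = set()
    ranking: List[Tuple[str, int]] = []
    while covered != cohort:
        # Assumption: ties are broken alphabetically by code name.
        best = max(sorted(remaining), key=lambda c: len(remaining[c] - covered))
        gain = remaining.pop(best) - covered
        covered |= gain
        ranking.append((best, len(gain)))
    return ranking


# Hypothetical toy data: four codes, six patients.
toy = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {5, 6}, "D": {1, 2}}
print(rank_codes(toy))
```

In the toy data, code "B" matches three patients but adds only one who is not already found by "A", so "C" is ranked ahead of it. This is the same effect that lets a descriptive code rank above a more general one.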

To illustrate, our highest ranked code, ‘Type 2 diabetes mellitus’, identified 1504 patients, which corresponded to 64.4% of the complete cohort. The 2nd ranked code, ‘Diabetes mellitus’, identified an additional 98 patients, such that the top 2 codes together identified 68.6% of the complete cohort. The 3rd ranked code, ‘O/E - Right diabetic foot at low risk’, identified an additional 96 patients, such that the top 3 codes together identified 72.8% of the complete cohort.
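
The cumulative percentages quoted above follow directly from the reported counts. As a brief check, the sketch below recomputes them; the helper name `cumulative_coverage` is hypothetical, while the figures (1504, 98, 96 and a cohort of 2334) are those given in the text.

```python
from typing import List


def cumulative_coverage(new_patients: List[int], cohort_size: int) -> List[float]:
    """Return the cumulative percentage of the cohort identified after each code."""
    percentages = []
    running_total = 0
    for gain in new_patients:
        running_total += gain
        percentages.append(round(100 * running_total / cohort_size, 1))
    return percentages


# New patients added by the top three ranked codes, as reported above.
print(cumulative_coverage([1504, 98, 96], cohort_size=2334))
```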

Continuing the described ranking process, the 27 highest ranking codes out of 378 were able to identify 95% of the complete diabetic cohort, and 78 codes were able to identify 100% of it. Thus, the number of codes needed in a code list for picking up diabetic patients with high sensitivity is not necessarily large. Furthermore, a code such as ‘O/E - Right diabetic foot at low risk’ ranked highly, despite being a more descriptive Read code.
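
Once codes are ranked, the number of codes needed to reach a given sensitivity can be read off the running total. The sketch below illustrates this under stated assumptions: the function name `codes_needed` is hypothetical, and only the first three counts are from our data, since the per-code counts for lower ranks are not listed here.

```python
from typing import List, Optional


def codes_needed(new_patients: List[int], cohort_size: int, target: float) -> Optional[int]:
    """Return how many top-ranked codes are needed to identify at least
    `target` (a fraction between 0 and 1) of the cohort, or None if the
    listed codes never reach it."""
    running_total = 0
    for rank, gain in enumerate(new_patients, start=1):
        running_total += gain
        if running_total / cohort_size >= target:
            return rank
    return None


# Only the top three counts are reported in the text, so a 95% target
# cannot be reached from this partial list.
top_three = [1504, 98, 96]
print(codes_needed(top_three, cohort_size=2334, target=0.70))
print(codes_needed(top_three, cohort_size=2334, target=0.95))
```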

Estimating incidence is a more substantial undertaking than the prevalence estimates used in our simple analysis. However, our analysis still serves to highlight that systematic approaches to code list selection could help avoid inadvertent misestimates in CPRD studies.

Conflict of Interest:
None declared.