Quality of recording of diabetes in the UK: how does the GP's method of coding clinical data affect incidence estimates? Cross-sectional study using the CPRD database

Other responses

  • Published on:
    Selecting code lists to define a disease
    • Daniel Ng, Medical Student, Primary Care Unit, University of Cambridge
    • Other Contributors:
      • Ravi Patel, Medical Student
      • Jonathan Bates-Powell, Medical Student
      • Duncan Edwards, GP
      • Jenny Lund, GP

    As Tate et al. (2017) have shown, incidence estimates vary significantly depending on which code list is used, so a systematic approach to creating a code list is necessary. Our group has been working on a systematic approach to code list selection for diabetes, by looking at the effect of additional codes on prevalence estimates.

    We examined how adding codes to a code list changes the number of patients identified with diabetes in CPRD at a single point in time. We used a randomised sample of 25,000 patients, downloaded from CPRD on 7th June 2016. A comprehensive list of 378 diagnostic codes for diabetes was compiled by visual inspection of all codes containing the keywords “diabetes” or “diabetic”. Using this comprehensive code list, 2334 diabetic patients were identified in our sample; these patients were defined as the complete cohort.

    All codes in the code list were then ranked, using the following algorithm:

    1. The diabetes code that identified the largest number of patients was ranked highest.

    2. The next ranked code was the one that identified the largest number of new patients, i.e. patients not already identified by a higher-ranked code.

    3. Repeat (2) until all patients in the cohort are identified.

    Thus, we created a list where codes were ranked according to how useful they were in identifying additional diabetic patients.
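
    The ranking steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `rank_codes`, the input layout (a mapping from each code to the set of patient IDs it identifies), the alphabetical tie-break and the toy data are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple


def rank_codes(patients_by_code: Dict[str, Set[int]]) -> List[Tuple[str, int]]:
    """Greedily rank codes by how many not-yet-identified patients each adds.

    Returns (code, number_of_new_patients) pairs in rank order. Codes that
    add no new patients once the cohort is fully identified are left unranked.
    """
    remaining = {code: set(ids) for code, ids in patients_by_code.items()}
    identified: Set[int] = set()
    ranking: List[Tuple[str, int]] = []

    while remaining:
        # Steps 1 and 2: pick the code identifying the most new patients.
        # Ties go to the alphabetically first code (an assumption; the
        # source does not say how ties were handled).
        best = max(sorted(remaining), key=lambda c: len(remaining[c] - identified))
        new_patients = remaining.pop(best) - identified
        if not new_patients:
            break  # Step 3: every patient in the cohort is already identified.
        identified |= new_patients
        ranking.append((best, len(new_patients)))

    return ranking


# Hypothetical toy data: code labels and patient IDs are invented.
toy = {
    "code_A": {1, 2, 3, 4},
    "code_B": {3, 4, 5},
    "code_C": {5, 6},
    "code_D": {1, 2},
}
for code, added in rank_codes(toy):
    print(code, added)
```

    In this toy example "code_B" identifies more patients overall than "code_C", but "code_C" ranks above it because it adds two patients not yet found, whereas "code_B" adds only one.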

    To illustrate, our highest ranked code, ‘Type 2 diabetes mellitus’, identified 1504...

    Conflict of Interest:
    None declared.