As Tate et al. (2017) have shown, a systematic approach to creating a code list is necessary, given the significant variation in incidence estimates when different code lists are used. Our group has been working on a systematic approach to code list selection for diabetes, by examining the effect of additional codes on prevalence estimates.
We examined how adding codes to a code list affects the number of patients identified with diabetes in CPRD at a single point in time. We used a randomised sample of 25,000 patients from CPRD, downloaded on 7th June 2016. A comprehensive list of 378 diagnostic codes for diabetes was determined by visual inspection of all codes containing the keywords “diabetes” or “diabetic”. Using this comprehensive code list, 2334 diabetic patients were identified in our sample. This group was defined as the complete cohort.
All codes in the code list were then ranked, using the following algorithm:
1. The diabetes code that identified the largest number of patients was ranked highest.
2. The next ranked code was the one that identified the largest number of new patients.
3. Repeat (2) until all patients in the cohort are identified.
Thus, we created a list where codes were ranked according to how useful they were in identifying additional diabetic patients.
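The ranking steps above can be sketched in Python as a greedy selection. This is a minimal illustration rather than the code used in our analysis: the function name `rank_codes`, the input layout (a mapping from each code to the set of patient identifiers it matches), the alphabetical tie-breaking rule and the toy data are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple


def rank_codes(code_to_patients: Dict[str, Set[int]]) -> List[Tuple[str, int]]:
    """Greedily rank codes by the number of not-yet-identified patients each adds.

    Returns (code, new_patients) pairs in rank order. Codes that add no new
    patients once the whole cohort is identified are left unranked.
    """
    remaining = {code: set(pats) for code, pats in code_to_patients.items()}
    identified: Set[int] = set()
    ranking: List[Tuple[str, int]] = []

    while remaining:
        # Choose the code that adds the most new patients.
        # Assumption: ties are broken alphabetically so the result is deterministic.
        best = max(sorted(remaining), key=lambda c: len(remaining[c] - identified))
        gain = len(remaining[best] - identified)
        if gain == 0:
            break  # every patient in the cohort has been identified
        ranking.append((best, gain))
        identified |= remaining.pop(best)

    return ranking


# Hypothetical toy data: four codes and six patients.
toy = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5},
    "C": {5, 6},
    "D": {1, 2},
}
print(rank_codes(toy))
```

In the toy data, code B matches more patients than code C overall, but C is ranked second because it contributes two new patients after A has been chosen, whereas B contributes only one.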
To illustrate, our highest ranked code, ‘Type 2 diabetes mellitus’, identified 1504 patients, which corresponded to 64.4% of the complete cohort. The 2nd ranked code, ‘Diabetes mellitus’, identified an additional 98 patients, such that the top 2 codes together identified 68.6% of the complete cohort. The 3rd ranked code, ‘O/E - Right diabetic foot at low risk’, identified an additional 96 patients, such that the top 3 codes together identified 72.8% of the complete cohort.
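The cumulative percentages quoted above follow from the counts of newly identified patients and the cohort size of 2334. The short Python sketch below reproduces that arithmetic; the function name `cumulative_coverage` is hypothetical and chosen only for this illustration.

```python
from itertools import accumulate
from typing import List


def cumulative_coverage(new_patients: List[int], cohort_size: int) -> List[float]:
    """Return the cumulative percentage of the cohort identified after each ranked code."""
    return [round(100 * n / cohort_size, 1) for n in accumulate(new_patients)]


# Newly identified patients for the three highest ranked codes, as reported in the text.
top_three = [1504, 98, 96]
print(cumulative_coverage(top_three, cohort_size=2334))  # [64.4, 68.6, 72.8]
```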
When the ranking process was continued, the 27 highest ranking codes out of 378 identified 95% of the complete diabetic cohort, and 78 codes identified 100% of it. Thus, the number of codes needed in a code list for picking up diabetic patients with high sensitivity is not necessarily large. Furthermore, codes such as ‘O/E - Right diabetic foot at low risk’ ranked highly, despite being more descriptive Read codes.
The estimation of incidence presents a more substantial undertaking than the prevalence estimates used in our simple analysis. However, our analysis still serves to highlight that identifying approaches to systematising code list selection could help avoid inadvertent misestimates in CPRD studies.