Antibiotic prescribing records in two UK primary care electronic health record systems. Comparison of the CPRD GOLD and CPRD Aurum databases

Word count: Abstract 324 words Text 2,596 words Tables 2 Figures 1 . CC-BY 4.0 International license It is made available under a perpetuity. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 10, 2020. ; https://doi.org/10.1101/2020.03.07.20028290 doi: medRxiv preprint


INTRODUCTION
Primary care electronic health records (EHRs) provide an important source of longitudinal population-based data for public health research and population health surveillance. (1) In the UK, several EHR databases collect de-identified data from general practices, which are responsible for providing a broad range of general medical services in the primary care setting. The CPRD GOLD database (1) and The Health Improvement Network (THIN) database (2) collect data from general practices that use the Vision practice system; data from EMIS practice systems have been historically collected by the QResearch database. (3) CPRD has now established a new database, CPRD Aurum (4) that also collects data from general practices using the EMIS practice system.
While the research community has nearly 30 years of experience of using Vision data from CPRD GOLD, with numerous studies reporting on data quality, (5,6) less is known about the similarities and differences with data collected in the CPRD Aurum database. EHR systems may offer end-users a variety of options, and differing incentives, to code clinical data and record test results. It is therefore possible that analysis of Vision-derived data from CPRD GOLD and EMIS-derived data from CPRD Aurum may yield different findings when substantive research questions are addressed. However, the magnitude of any possible differences between these data sources are largely unexplored. This study aimed to compare results obtained from an analysis of CPRD GOLD and CPRD Aurum data for one exemplar, the recording of antibiotic prescribing.
Antibiotic prescribing in primary care has been the subject of increasing interest in recent years because of the growing awareness that unnecessary prescription of antibiotics in primary care is contributing to the problem of antimicrobial resistance. (7,8) Research using CPRD GOLD (9,10) and THIN, (11) showed that the antibiotic prescribing rate is between . CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 10, 2020. ; https://doi.org/10.1101/2020.03.07.20028290 doi: medRxiv preprint 4 500 and 600 antibiotic prescriptions per 1,000 patient years, with higher rates at the extremes of age. Nearly half of antibiotic prescriptions may be issued without a clear indication being recorded. (9,11) The present analysis aimed to determine whether analysis of data from CPRD Aurum and CPRD GOLD provides similar estimates with respect to antibiotic prescription and recording.

Data and participants
The CPRD GOLD database collects data from the four countries of the UK, with 30% of contributing practices located in England at the time of this study. The CPRD GOLD database has been well described (1) and the high quality of the data collected has been documented in many studies. (12) The October 2019 database release from which the study cohort was sampled for this analysis included data on 17.6 million patients, of whom 2.6 million were currently active The CPRD Aurum database is more recently established, and at the time of this study (June 2019 release) drew on data collected from general practices in England only, using the EMIS practice system.(4) The CPRD Aurum database from which patients were sampled included data on 883 general practices with 23.1 million patients, including 2.5 million currently active patients. The study required analysis of anonymised data, the protocol for the study received scientific and ethical approval from the CPRD Independent Scientific Advisory Committee (ISAC Protocol 19_110R).
A sample of 158,305 patients in CPRD Aurum was taken by randomly selecting 'n' patients from each stratum of general practice, gender and age-group. The value of n=9 was selected to provide an appropriate total sample size of just over 150,000. This sampling . CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 10, 2020. ; https://doi.org/10.1101/2020.03.07.20028290 doi: medRxiv preprint approach ensured that each general practice was equally represented in the analysis and that age-specific rates would be estimated with equal precision. Age was calculated as the

Measures
We evaluated antibiotic prescribing for the year 2017 because this was the most recent complete year that we included in a larger study of antibiotic prescribing that we report  is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 10, 2020. ; https://doi.org/10.1101/2020.03.07.20028290 doi: medRxiv preprint For CPRD GOLD, we employed a list of 2,627 antibiotic drugs that were identified from searches of the CPRD GOLD product dictionary browser made by all authors. Searches were made on the drug substance name, product name, BNF chapter and BNF codes. To identify the corresponding products in CPRD Aurum, dm+d codes (the prescribing codes from the National Health Service dictionary of medicines and devices) associated with individual product codes in the CPRD GOLD dictionary browser were mapped to the corresponding dm+d codes in the CPRD Aurum product dictionary browser. A more complete search of the CPRD Aurum product dictionary browser was additionally undertaken on term, product name and drug substance. We also conducted searches using approximate string matching (`fuzzy matching`) to match the CPRD Aurum product name to the CPRD GOLD product name or drug substance name from the CPRD GOLD antibiotic code list. The 'agrep' command was used in the R program, (15)   is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 10, 2020. ; https://doi.org/10.1101/2020.03.07.20028290 doi: medRxiv preprint

Analysis
Age-specific rates were estimated with 95% confidence intervals from the Poisson distribution. Age-and sex-standardised rates, and associated 95% confidence intervals, were calculated per 1,000 person-years using the 2013 European Standard Population as reference. Differences between databases were evaluated using Bland-Altman plots.(17)

Comparison of antibiotic prescribing rates
In the CPRD Aurum sample, there were 158,305 participants from 883 general practices with 101,360 antibiotic prescriptions during 2017. In the CPRD GOLD sample there were 160,394 patients from 290 general practices with 112,931 antibiotic prescriptions during 2017. The age-and sex-standardised antibiotic prescribing rate was 512.6 (95% confidence interval 510.4 to 514.9) per 1,000 PYs in CPRD Aurum and 584.3 (582.1 to 586.5) per 1,000 PYs in CPRD GOLD. The rate for CPRD GOLD practices in England was 505.2 (501.6 to 508.9) per 1,000 PYs, which is similar to the rate observed in CPRD Aurum.  is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 10, 2020. ; https://doi.org/10.1101/2020.03.07.20028290 doi: medRxiv preprint provides a Bland-Altman plot that presents the difference (95% confidence interval) between CPRD GOLD and CPRD Aurum for all CPRD GOLD practices (blue) and CPRD GOLD practices in England (red). In men and women for all age groups, CPRD GOLD general practices generally had slightly higher antibiotic prescribing rates than CPRD Aurum, while CPRD GOLD general practices in England had similar antibiotic prescribing rates to CPRD Aurum. Table 1 presents data for the 25 most frequently prescribed antibiotic products. In CPRD Aurum, amoxicillin 500mg capsules, doxycycline 100mg capsules, flucloxacillin 500mg capsules, trimethoprim 200mg tablets and nitrofurantoin 100mg modified-release capsules represented the five most frequently prescribed products, accounting for 45% of all antibiotic prescriptions. In CPRD GOLD, there were more prescriptions for trimethoprim and fewer prescriptions for nitrofurantoin, consequently clarithromycin 500 mg tablets and not nitrofurantoin appeared as the fifth ranked product. The same pattern was observed for CPRD GOLD practices in England, although trimethoprim comprised a smaller proportion of all prescriptions than in CPRD GOLD as a whole. Twenty-three of the 25 most frequently prescribed drugs in CPRD Aurum were also in the top 25 ranked prescriptions in CPRD GOLD general practices.  is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint

Recording of medical terms associated with prescriptions
The copyright holder for this this version posted March 10, 2020.

Main findings
This analysis shows that antibiotic prescribing estimates from EMIS-derived data in CPRD Aurum are similar to those obtained through analysis of Vision-derived data in CPRD GOLD.
This similarity includes the rates of antibiotic prescriptions for sub-groups of age and gender, the drug name and strength of antibiotic products prescribed, and the recording of medical diagnoses on the same day as the antibiotic prescription. We noted that antibiotic prescribing rates are generally higher in CPRD GOLD than in CPRD Aurum, but this was not the case when the CPRD GOLD sample was restricted to general practices in England. This implies that antibiotic prescribing may be higher in Wales, Scotland and Northern Ireland where there are no charges to patients for prescriptions. As well as slight differences in overall rates, we noted that drug choice might vary between databases. Trimethoprim prescribing was higher in CPRD GOLD than in CPRD Aurum. Nitrofurantoin has been recommended by Public Health England (18) as the drug of first choice for urinary tract infections in adults because of increasing antimicrobial resistance to trimethoprim. CPRD GOLD general practices in England were more similar to CPRD Aurum with respect to prescribing of trimethoprim and nitrofurantoin. It is likely that differences in clinical practice between England and the devolved administrations (Scotland, Wales and Northern Ireland) may be greater than the differences between EMIS and Vision practices within England.
Previous primary care database studies have revealed that antibiotic prescriptions are often issued without specific reasons being coded into electronic health records. A high proportion . CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 10, 2020. ; https://doi.org/10.1101/2020.03.07.20028290 doi: medRxiv preprint of antibiotic prescriptions in the THIN (11) and CPRD GOLD (9) databases are associated with either no code being recorded or non-specific codes being recorded. Electronic health records systems may offer users discretion over the recording of data items. We found that EMIS data included similar proportions of antibiotic prescriptions being associated with no codes, non-specific codes and codes for infection episodes.

Strength and Limitations
One major strength of our study is that we used real world data from primary care to estimate rates of antibiotic prescribing. Using data from primary care is likely to provide a reliable picture of prescribing patterns given that about 80% of all antibiotic prescribing in the UK's National Health Service take place in primary care.(18) We estimated the difference between CPRD Aurum and CPRD GOLD, as well as the difference between CPRD Aurum and CPRD GOLD in England. There were generally only small differences between CPRD Aurum and CPRD GOLD in England. We acknowledge that we could have obtained greater precision with larger samples, but the present approach was pragmatic and provided sufficiently precise estimates for age-specific rates. The community of CPRD researchers collectively have wide experience of compiling code lists for research in the CPRD GOLD database. Less experience is available for the CPRD Aurum database. We noted that CPRD Aurum product codes may be up to 17 characters in length and use of special programming features, such as the 'bit64' package in R, (16) are required in order to maintain data integrity. We completed extensive searches for product codes in order to identify antibiotic products. We identified a greater number of potential products from the CPRD GOLD data dictionary, but a generally similar number of antibiotic product codes were actually recorded in the two datasets during 2017. Searches in the CPRD Aurum product dictionary should be based on term, drug substance and product names as the BNF classification is less widely available in the CPRD Aurum product dictionary than in CPRD GOLD. It may also be possible to compare 'dm+d' codes from the dictionary of medicines and devices, which are . CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 10, 2020. ; https://doi.org/10.1101/2020.03.07.20028290 doi: medRxiv preprint now employed in both CPRD GOLD and CPRD Aurum product dictionaries. We mapped medical code sets between CPRD GOLD and CPRD Aurum by matching on Read codes to make a like-for-like comparison. The analysis shows that, for these conditions, use of the same Read codes gives similar results in CPRD Aurum and CPRD GOLD. There are some medical codes that are only employed in EMIS, which might be omitted through this process, and this merits further evaluation. Experience shows that in Read-coded data, the majority of events are associated with a small number of codes, consequently omission of infrequently used codes is seldom important; our main findings with respect to medical codes were consistent between databases. We also note that records of antibiotic prescribing do not indicate whether medicines were dispensed, whether they were taken, or whether they were taken by the patient they were prescribed to or by someone else. It is also possible that prescriptions recorded at out of hours visits, home visits or during attendance at residential care homes may be missing from the patient electronic record. Finally, the analysis undertaken was cross-sectional in nature and does not provide evidence about trends in antibiotic recording over time between CPRD GOLD and CPRD Aurum. We used data from July and October 2019 releases of CPRD Aurum and CPRD Gold respectively, it may be preferable to compare the same month's releases but data for 2017 should be complete by 2019.

Conclusion
This study finds that analysis of EMIS-derived data in CPRD Aurum gives generally similar estimates for antibiotic prescribing and infection recording to those reported for Visionderived data in CPRD GOLD. CPRD GOLD includes general practices in Scotland, Wales and Northern Ireland which have slightly higher antibiotic prescribing than either EMIS or Vision general practices in England. Based on these results, we believe that future research studies can be conducted in CPRD Aurum, informed by previous results from CPRD GOLD or THIN. It may be possible to combine CPRD GOLD and CPRD Aurum data in research on . CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 10, 2020. ; https://doi.org/10.1101/2020.03.07.20028290 doi: medRxiv preprint antibiotic prescribing. However, further work is needed in other clinical topics and to understand the quality and completeness of information recorded in areas such as dosing regimen and treatment duration which is important in estimating treatment exposure in pharmacoepidemiology and pharmacovigilance research.

Data sources
The study is based on data from the Clinical Practice Research Datalink obtained under license from the UK Medicines and Healthcare products Regulatory Agency. However, the interpretation and conclusions contained in this report are those of the authors alone.

Data sharing
Requests for access to data from the study should be addressed to martin.gulliford@kcl.ac.uk. All proposals requesting data access will need to specify planned uses with approval of the study team and CPRD before data release.

Funding
The study is funded by the National Institute for Health Research (NIHR) Health Services and Delivery Programme (16/116/46). MG was supported by the NIHR Biomedical Research Centre at Guy's and St Thomas' Hospitals. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health. The funder of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report. The authors had full access to all the data in the study and all authors shared final responsibility for the decision to submit for publication.

Conflict of Interest
The authors have no conflicts of interest.

Author Contributions
All authors contributed to the study design and definition of code lists. MG completed the analysis and drafted the paper. All authors reviewed and contributed to the final draft. MG is guarantor.
. CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 10, 2020. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 10, 2020. ; https://doi.org/10.1101/2020.03.07.20028290 doi: medRxiv preprint Figure 1: Antibiotic prescribing rates in CPRD Aurum and CPRD GOLD by age-group and sex.