Incidence and prevalence of inflammatory bowel disease in UK primary care: a population-based cohort study

Objectives We describe temporal trends in the recorded incidence of inflammatory bowel disease (IBD) in UK primary care patients between 2000 and 2018. Design A cohort study. Setting The IQVIA Medical Research data (IMRD) primary care database. Participants All individuals registered with general practices contributing to IMRD during the period 01 January 2000–31 December 2018. Main outcome measures The primary outcome was the recorded diagnosis of IBD. Results 11 325 025 individuals were included and 65 700 IBD cases were identified, of which 22 560 were incident diagnoses made during the study period. Overall, there were 8077 incident cases of Crohn’s disease (CD) and 12 369 incident cases of ulcerative colitis (UC). Crude incidence estimates of ‘IBD overall’, CD and UC were 28.6 (28.2 to 28.9), 10.2 (10.0 to 10.5) and 15.7 (15.4 to 15.9)/100 000 person years, respectively. No change in IBD incidence was observed for adults aged 17–40 years and children aged 0–9 years. However, for adults aged over 40 years, incidence fell from 37.8 (34.5 to 41.4) to 23.6 (21.3 to 26.0)/100 000 person years (average decrease 2.3% (1.9 to 2.7)/year (p<0.0001)). In adolescents aged 10–16 years, incidence rose from 13.1 (8.4 to 19.5) to 25.4 (19.5 to 32.4)/100 000 person years (average increase 3.0% (1.7 to 4.3)/year (p<0.0001)). Point prevalence estimates on 31 December 2018 for IBD overall, CD and UC were 725, 276 and 397 per 100 000 people, respectively. Conclusions This is one of the largest studies ever undertaken to investigate trends in IBD epidemiology. Although we observed stable or falling incidence of IBD in adults, our results are consistent with some of the highest reported global incidence and prevalence rates for IBD, with a 94% rise in incidence in adolescents. Further investigation is required to understand the aetiological drivers.


INTRODUCTION
The inflammatory bowel diseases (IBD; Crohn's disease (CD), ulcerative colitis (UC) and IBD unclassified (IBDU)) are chronic inflammatory conditions of unknown aetiology that affect the gastrointestinal tract. 1 2 In North America, over 1.5 million individuals are living with IBD and in Europe it is estimated that 2.5-3 million individuals are affected, with an estimated direct healthcare cost of 4.6-5.6 billion Euros/year. 3 4 Historically, IBD was regarded as a disease of high-income western countries with a substantial rise in incidence observed during the latter half of the 20th century. 5 However, there is evidence that the rate has plateaued in western countries while rising rapidly in newly industrialised countries. 4 Accurate and up-to-date estimates of trends in incidence and prevalence of IBD are an essential step in preparing services for the delivery of future IBD care.
Studies using local hospital records and secondary care databases have been conducted to describe the epidemiology of IBD in the UK. [6][7][8][9] However, patient follow-up is challenging and loss to follow-up may introduce bias, notably where patients do Strengths and limitations of this study ► We used a large sample size. ► The data source was prospectively collected healthcare records representative of 'real-life' clinical practice. ► Unlike many previous inflammatory bowel disease incidence/prevalence studies that have relied on external data sources to estimate denominator population characteristics, we were able to extract demographics and person-time follow-up for all individuals in our cohort. ► Data in IQVIA Medical Research Data are recorded for patient management purposes rather than medical research. ► As our records were not linked to secondary care, we were not able to confirm our cases by evidence of radiological, endoscopic or histological findings.
Open access not require hospitalisation and/or move geographical location. More recently, estimates of incidence and prevalence of IBD were reported in a rigorously validated IBD cohort of 10 926 cases in Lothian, Scotland. 10 However, it remains unknown if these findings are generalisable across the UK. In the UK, over 60 000 000 people are registered with a general practice (GP; about 91% of the population). 11 Electronic GP health records databases can enable largescale investigation of relatively rare diagnoses such as IBD. 12 The largest such UK study performed to date was undertaken in Northern England and included 179 incident cases of IBD diagnosed in a population of 135 723 during the period 1984-1995. 13 In the present study, we investigated temporal trends in the incidence of IBD diagnoses from 2000 to the end of 2018 using electronic GP data from the IQVIA Medical Research Data (IMRD) primary care database (formally The Health Improvement Network database).

MATERIALS AND METHODS
Data source IMRD is a large longitudinal database currently containing the anonymised electronic medical records of 18.3 million patients collected from 797 GPs throughout the UK; 3 million of these patients are presently registered with a practice contributing to IMRD and are currently providing data. It is one of the most comprehensive data sources of its kind and is used worldwide for research by academic institutions, government departments and the pharmaceutical industry. Data are based on patient consultation records and include demographics (eg, gender, age and socioeconomic level of deprivation), presenting symptoms and diagnoses, referrals to secondary care, medications, results of investigations, vaccinations and additional health data such as height, blood pressure, weight and smoking status. Data are recorded in IMRD using the Read code hierarchical coding system. 14 No other coding system was used in IMRD for the duration of the observation period. The GPs in IMRD are broadly representative of all primary care practices in the UK in terms of age and sex of patients, practice size, geographical distribution and the prevalence of numerous chronic conditions such as hypertension, diabetes, asthma and epilepsy. 15 A previous validation study using electronic GP records showed that for individuals with a code for IBD, the diagnosis was highly probable or probable for 92% (95% CI 86% to 96%) of patients. 12 Although IMRD data are not linked to secondary care records, diagnoses made in secondary care are captured in IMRD; either through letters and communications to the GP or during patient consultations.

Study population
All data included in this study were from time periods after the GPs had met acceptable computer usage (ACU) and acceptable mortality reporting (AMR) standards. 16 17 The date of meeting ACU was the date after which the practice was confirmed to have electronically logged an average of at least one medical record, one additional health record and two prescriptions per person year. The date of meeting AMR standards was the date after which the practice was confirmed to record mortality at a similar rate to that expected for a population with comparable demographics as per the Office for National Statistics.
All individuals of any age contributing data between the 1 January 2000 and the 31 December 2018 were included. The study was a dynamic cohort with individuals entering and exiting at different times. Cohort entry was defined as the latest date of the following: 1 January 2000; the date of registration with the GP plus 9 months to account for prevalent disease being recorded as incident disease when patients register with the practice (this time period was selected using previously published methodology) 18 ; or the date the practice met predefined quality indicators for electronic data (AMR and ACU). Cohort exit was defined as the earliest date of the following: first diagnosis of IBD; deregistration with the GP contributing data; death; or 31 December 2018.

Main outcome definitions
The main outcomes of interest were newly diagnosed CD, UC or any IBD. The any IBD category included specific and general terms for IBD (comprising CD, UC, IBDU and unspecified IBD). Read code lists, adapted from those used in previous literature, 12 13 19 were generated for all three main outcomes using published methodology, 20 then subsequently discussed with a panel of experts including gastroenterologists, epidemiologists, a GP and a statistician (online supplementary appendix Read code lists).
As a quality filter, individuals were only included in the study as cases if they had at least two IBD Read codes recorded on separate dates or at least one IBD Read code plus at least one prescription for a drug commonly used to treat IBD (any aminosalicylate or rectal steroid enema listed in chapter 1.5 of the British National Formulary, 21 azathioprine, mercaptopurine, methotrexate, ciclosporin, infliximab or adalimumab) (online supplementary appendix Read code lists). As 'incident IBD cases' (who have not been prescribed 'IBD drugs') were required to have their diagnosis verified on a subsequent GP visit, we anticipated that this would result in under ascertainment of those diagnosed in the final months of the study period. Thus, temporal trends in incidence are graphically presented for the period 2000-2017 as opposed to 2000-2018. The date at which the first recording of any IBD code or IBD drug prescription was made was classified as the incident date. For individuals who had been given a code for both UC and CD in their lifetime, the most recent code recorded was used as their final diagnosis.
A separate algorithm was developed to explore the validity of the diagnosis of IBD in IMRD. This involved checking whether individuals who had ever been given a medical Read code for IBD had a record of (1) presentation with symptoms suggestive of IBD (abdominal pain, diarrhoea, bloody stools, weight loss); (2) a prescription for a drug commonly used to treat IBD. 'Incident cases' were not required to meet these criteria to be included in the analysis.

Covariates
The following covariates were included in the analyses: (1) birth gender, (2) age by Montreal/Paris classification 22 23 (A classification system for IBD whereby the 'A' variable describes 'age at diagnosis', the levels of which are: A1a (0-9 years); A1b (10-17 years); A2 (17-40 years) and A3 (40+ years)), (3) calendar time, (4) Townsend Deprivation Index (a quintile measurement of social deprivation based on post code linked census data) 24 and (5) geographical location of GP. This was included at the level of former Strategic Health Authority (a defined region responsible for the management of health services for that particular area) for England and at the level of country for Scotland, Wales and Northern Ireland.

Statistical analyses
Crude incidence estimates per 100 000 person years at risk were calculated by dividing the total number of cases by the total number of person years of follow-up then multiplying by 100 000. This was done separately for CD, UC and any IBD with 95% CIs estimated assuming a Poisson distribution. Stratified incidence rates were calculated by sex, age, Townsend Deprivation Index and geographical location. Time period was fitted as both a continuous variable and a categorical variable by calendar year. Mixed multivariable Poisson regression was used to estimate incidence rate ratios (IRRs). Individuals with missing data on Townsend score were included in the analysis using 'missing' as a level to the Townsend variable. GP was included as a random effect to account for any data clustering by practice; the other covariates were included as fixed effects. The Wald test was used to test for significance of categorical variables in the regression model and to test for multiplicative interactions.
Point prevalence was calculated by dividing all cases of IBD (both incident and prevalent) by the total number of individuals contributing data to the cohort on the last day of the study period.
StataCorp. 2017. Stata Statistical Software: Release 15. College Station, Texas: StataCorp LLC was used for all analyses.

Sensitivity analysis
In the sensitivity analysis, we broadened our case definition to include any individual who had a single IBD medical Read code (as opposed to two medical Read codes or one medical Read code plus one relevant prescription).

Patient and public involvement
We involved representatives from the University College Hospitals NHS Foundation Trust patient with IBD panel in the early stages of protocol design. However, we did not involve patient and public involvement (PPI) representatives in other aspects of study design or analysis due to IMRD licence agreements and the technical computer programming methods that were involved. We intend to involve PPI representatives in writing a plain language summary for dissemination to peers and patient groups.
Of 28 879, 24 173 (83.7%) individuals given a new code for IBD since entering the study had a record of a prescription for a drug commonly used to treat IBD, comparing to 1.8% for the whole cohort. Additionally, 23 337/28 879 (80.8%) with a code for IBD had a record of presentation to their GP with either diarrhoea, bloody stools, abdominal pain or weight loss, comparing to 33.6% for the whole cohort.
Incidence     However, no fall in CD incidence was observed (online supplementary appendix figure 1). When stratifying IBD incidence by 5 year age bands, the peak in incidence later in life was higher and occurred later than in the primary analysis (online supplementary appendix figures 2 and 3).

DISCUSSION
This is one of the largest observational studies undertaken to investigate trends in IBD epidemiology. Although incidence of IBD remained relatively stable for those aged 17-40 years and those aged 0-9 years, we observed a 38% fall in incidence for those aged over 40 years and a 94% rise in incidence in the adolescent population. The most recent incidence estimates are in line with some of the highest reported rates of paediatric IBD internationally. [25][26][27] Study strengths include the large sample size and the prospective collection of healthcare records representative of 'real-life' clinical practice. Unlike previous incidence/prevalence studies that have relied on external data sources to estimate denominator population characteristics, we were able to extract demographics and person-time follow-up for all individuals in our cohort, including those who did not develop IBD. Additionally, IMRD has been shown to be broadly representative of the UK in terms of age, sex, mortality rates and prevalence of numerous comorbidities, 15 allowing us to draw inferences from our data and relate this to the UK population as a whole. Not only has the diagnosis of IBD been validated in a similar GP database, 12 but we have demonstrated that the majority of individuals coded for IBD in IMRD have been prescribed drugs commonly used to treat IBD and presented with symptoms in keeping with IBD. This would support the argument that IMRD represents an important and useful resource for further epidemiological studies of IBD.
Limitations arise when conducting GP database research, particularly as the primary use of the software that contributes to IMRD is for patient management purposes rather than medical research. Thus, data can be incomplete and will often only reflect those events   Open access that are deemed to be relevant to the patient's care. Given that we were also reasonably strict with our case definition, this may have resulted in underascertainment of cases. Although we find reason to be confident in the validity of the data, we were not able to confirm our cases by evidence of radiological, endoscopic or histological findings. Therefore, it is possible that some individuals were misclassified. There was a small risk of duplication of medical records. This could occur if a patient deregistered with one practice contributing to IMRD then subsequently registered with another IMRD practice during the observation period. This is likely to be the case for a very small number of individuals as IMRD only covered 5%-6% of UK GP during the study period. Although the total number of individuals contributing may be a slight overestimate, this would have no effect on incidence or prevalence rates. This is for two reasons: (1) duplicated records would cover different time periods during the study without overlap; (2) we took steps to ensure that prevalent cases of IBD newly transferring to practices were not counted as incident cases 18 Therefore, incident cases were not counted twice. Data from a multicentre European study (including two UK sites in North West London and East Yorkshire) reported site incidence rates of 2.6 and 8.4/100 000 person years for CD and 15.9 and 8.2/100 000 years for UC. 28 However, only a small number of UK cases were included (n=167). Incidence rates of 8.3 (3.4 to 13.2) and 13.9 (95% CI 7.5 to 20.3)/100 000 person years for CD and UC, respectively, have been reported in North-East England for the period 1990-1994. 13 We report overall incidence rates of 10.2 (95% CI 10.0 to 10.5) and 15.6 (95% CI 15.3 to 15.9)/100 000 person years for CD and UC, respectively, in a far larger cohort and at a national level. We report considerable geographical variation in IBD incidence across the UK with notably high CD incidence in Scotland and Northern Ireland and high UC incidence in the East of England. This may reflect variation in lifestyle factors such as dietary habits and importantly smoking (it is estimated that 14.4% adults in England smoke compared with 15.9% in Scotland and 16.3% in Northern Ireland). 29 A Danish study based on nationwide registry data (1995-2012) observed comparable incidence rates: 8.9 (95% CI 8.3 to 9.5) and 10.3 (95% CI 9.7 to 11.0)/100 000 person years for CD and 23.4 (95% CI 22.4 to 24.5) and 23.2 (95% CI 22.2 to 24.3)/100 000 person years for UC in males and females, respectively. 30 In contrast to our results, they observed overall rising incidence rates of IBD, but their study was conducted in a different country over an earlier time period including the 1990s when a rise in IBD incidence was described in many high-income countries. Although they adjusted for age in their analysis, temporal trends in incidence stratified by age group were not reported.
In the sensitivity analysis, we observed higher than expected overall incidence rates. Additionally, for UC, the observed peak in incidence for older individuals was higher than the peak in incidence for younger individuals (online supplementary appendix figure 2); this would be unusual in clinical practice. An explanation for this could be that a number of these patients, who perhaps had colitis of a different aetiology, had been misclassified as IBD. On the basis of these findings, one Read code alone was deemed not specific enough for the diagnosis of IBD.
In keeping with published literature, we observed a rising incidence of paediatric IBD during the early 21st century 31 and we provide further evidence of male preponderance in paediatric IBD when compared with adult onset disease. 32 33 Uniquely, in our study, we have demonstrated rising incidence of adolescent IBD in the context of stable incidence in those aged 17-40 years and falling incidence in the over 40s. This may represent a general shift towards earlier diagnosis of IBD for all age groups except the very young (age 0-9 years). Given that IBD most commonly presents in the second to fourth decade of life, 1 2 rising incidence in adolescents might be explained by a number of factors, including improvements in referral pathways and the introduction of new diagnostic tools (eg, faecal calprotectin testing or capsule endoscopy) resulting in cases being picked up earlier. However, one would expect stepwise increases in incidence when new diagnostic tools are rolled out, which we did not observe. Moreover, if rising incidence of adolescent IBD is due to improved referral pathways, a corresponding rise in incidence might be expected in very young children as well as adolescents. It could be argued that changes in GP coding practice may be contributing. But again, one would expect comparable changes in younger age groups if this were the case. On the other hand, if the epidemiological patterns we observed reflect real increases in the incidence of pathology, this is of great concern and could represent earlier manifestation of disease related to environmental exposures in childhood and adolescence.
Our prevalence estimates were very similar to those reported in a well-validated IBD cohort in Lothian, Scotland 10 ; our estimate of IBD prevalence for Scotland on 31 August 2018 was 810 per 100 000 compared with 832 per 100 000 reported by their group. Although our study lacked linkage to secondary care records, the similar prevalence estimates would support the argument that few cases were missed. In 2018, 67 150 000 people were estimated to be living in the UK from which we might extrapolate from our data that there were approximately 487 000 people living with IBD in the UK at that time.
Compounding prevalence of IBD has been demonstrated in Canada and in Scotland. 10 34 This relates to the principle that although IBD incidence may be static or falling, while IBD mortality remains very low, overall prevalence will increase (more people are being diagnosed than are dying). In Scotland, IBD prevalence is estimated to reach 1.0% by 2028. We report rising incidence rates of IBD in younger populations and falling incidence in older age groups. Thus, not only will services need to be attuned to rising IBD prevalence and an ageing Open access demographic, but also to increasing numbers of new diagnoses in young people who will require lifelong care. This is in the context of significant financial challenges to health services.

CONCLUSION
Although we observed a stable or falling incidence of IBD in adults over an 18-year period, our results are consistent with some of the highest reported global incidence and prevalence rates for IBD, with a 94% rise in incidence in adolescents. These findings are concerning and suggest that detailed prospective studies are required to understand the aetiological drivers. Contributors TJP: study concept and design; analysis and interpretation of data; drafting of the manuscript; statistical analysis; obtained funding. LH: study concept and design; analysis and interpretation of data; critical revision of the manuscript for important intellectual content; statistical analysis. SB and NF: study concept and design; critical revision of the manuscript for important intellectual content. AWS: study concept and design; critical revision of the manuscript for important intellectual content; obtained funding. CS: study concept and design; critical revision of the manuscript for important intellectual content; statistical analysis. GR: study concept and design; analysis and interpretation of data; critical revision of the manuscript for important intellectual content; study supervision. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.
Funding This work was supported by The Charles Wolfson Charitable Trust (grant number 539234) and The Harbour Foundation (grant number 549321). Laura Horsfall is supported by a Wellcome Trust Fellowship (grant number 209207/Z/17/Z).
Map disclaimer The depiction of boundaries on this map does not imply the expression of any opinion whatsoever on the part of BMJ (or any member of its group) concerning the legal status of any country, territory, jurisdiction or area or of its authorities. This map is provided without any warranty of any kind, either express or implied.
Competing interests TJP and AWS report research grants from The Charles Wolfson Charitable Trust and The Harbour Foundation for the submitted work.
Patient and public involvement Patients and/or the public were involved in the design, or conduct, or reporting, or dissemination plans of this research. Refer to the Methods section for further details.
Patient consent for publication Not required.
Ethics approval IMRD data collection was approved by the NHS South-East Multicentre Research Ethics Committee in 2003. This study was approved by the Scientific Research Committee (SRC) on 29/09/2018 (SRC reference 18THIN082).
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement Due to licence agreements with IQVIA, we are unable to share patient level data from the IQVIA Medical Research Database. However, we are happy to share our data extractions upon reasonable request. Data requesters should email the corresponding author to request the relevant data.
Open access This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http:// creativecommons. org/ licenses/ by-nc/ 4. 0/.