

Exposure to combined oral contraceptives and risk of venous thromboembolism: a protocol for nested case–control studies using the QResearch and the CPRD databases
Yana Vinogradova, Carol Coupland, Julia Hippisley-Cox
Division of Primary Care, University of Nottingham, University Park, Nottingham, UK
Correspondence to Yana Vinogradova; Yana.Vinogradova{at}


Introduction Many studies have found an increased risk of venous thromboembolism (VTE) associated with the use of combined hormonal contraceptives, but they have used various methodologies, particularly in how the VTE event is defined and which cases are selected for analysis. This study will focus on common oral hormonal contraceptives, including cyproterone-containing preparations because of their contraceptive effect, and will perform a number of sensitivity analyses to compare its findings with those of previous studies.

Methods and analysis Two nested case–control studies will be based on the general population using records from UK general practices within the QResearch and Clinical Practice Research Datalink databases. Cases will be female patients aged 15–49 with primary VTE diagnosed between 2001 and 2013. Each case will be matched by age, year of birth and practice to five female controls who are alive and registered with the practice at the time of diagnosis of the case (index date). Exposure to different hormonal contraceptives will be defined as at least one prescription for that contraceptive in the year before the index date. The effects of duration of use and of the length of any gap since last use will also be investigated. Conditional logistic regression will be applied to calculate ORs adjusted for smoking, ethnicity, comorbidities and use of other medications. Possible indications for prescribing hormonal contraceptives, such as menstrual disorders, acne or hirsutism, will be included in the analyses as confounding factors. A number of sensitivity analyses will be carried out.

Ethics and dissemination The initial protocol has been reviewed and approved by ISAC (Independent Scientific Advisory Committee) for Medicines and Healthcare products Regulatory Agency Database Research. The project has also been reviewed by QResearch and meets the requirements of the Trent Research Ethics Committee. The results will be published in a peer-reviewed journal.


Strengths and limitations of this study

  • Primary care research databases.

  • Large size and great statistical power.

  • A range of sensitivity analyses to compare the results with other studies.

  • Results being adjusted for all confounders for which data are available.

  • Prescription-based study.

  • Possible uncertainty in the diagnosis of venous thromboembolism.

  • Underestimation of hormonal contraceptive use.

  • Lack of information on some confounding factors that might affect the choice of contraceptive drug.


An increased risk of thrombosis in users of hormonal contraceptives has been identified by a number of studies, and this has led to British National Formulary (BNF) recommendations1 to consider risk factors for venous thromboembolism (VTE) before prescribing these drugs and to avoid them if two or more risk factors are present. Since oral contraceptives came into general use in the female population in the 1960s, studies have demonstrated associations between the drugs and a range of adverse effects, including an increased risk of VTE. The composition of hormonal contraceptives has therefore changed over time. A ‘second generation’ aimed to reduce the VTE risk, lowering the oestrogen component through the use of potent testosterone-derived progestogens.2 A later ‘third generation’ introduced progestogens with low androgenic activity to lower androgenic and vascular risk3 and to reduce arterial vascular impact.2 Effects of third-generation contraceptives on VTE have, however, been complex, with some studies reporting increased risks.4

There are a large number of observational studies of the effect of contraceptive drugs in the general female population, but they have handled three key methodological issues very differently. The first concerns verification of the VTE diagnosis. Standardised diagnostic criteria define four levels of verification: positive imaging tests (eg, positive Doppler ultrasound or impedance plethysmography) and subsequent therapy (1: definite VTE), uncertain imaging tests with subsequent therapy (2: probable VTE), positive imaging tests without subsequent therapy or negative imaging tests but with subsequent therapy (3: possible VTE), and ‘typical symptoms’ without confirming tests or therapy (4: potential VTE).5

To date, observational studies have treated the verification of VTE in a number of different ways. An Austrian study distinguished between confirmed and unconfirmed cases, concentrating on cases with definite and probable VTE for the main analysis and performing an additional analysis on the sample including possible and potential VTE cases, which produced results statistically indistinguishable from those of the main analysis.6 An Israeli study based on a healthcare provider’s database used clinical records only, without any verification.7 A Danish study based on national healthcare databases used anticoagulant prescriptions for verification and produced a stratified analysis of confirmed and unconfirmed diagnoses, demonstrating a twofold to threefold higher risk of VTE in the confirmed group.8 A number of studies based on electronically collected data included cases whose VTE diagnosis was confirmed by subsequent anticoagulant prescriptions, without using any diagnostic tests.4 ,9–12 A Dutch study based on hospital and general practitioners’ (GP) records required confirmation of the VTE diagnosis with Doppler ultrasonography.13 Both the differing levels of verification and the different analysis strategies complicate comparison of study findings.

The second area of variation between studies lies in sample selection. For example, although women who have had an oophorectomy, hysterectomy or sterilisation should not remain in the group potentially exposed to contraceptives, among the major studies using a non-user reference group only Lidegaard et al8 mention excluding such patients. Equally important is the handling of non-idiopathic cases, that is, those with other potentially important proximate causes and risk factors. Almost all studies excluded women with previous VTE (as they focused on incident cases) and the majority8–10 ,13 ,14 excluded pregnant and postpartum women (who are unlikely to use contraceptive drugs and have a higher risk of VTE), but the handling of non-idiopathic cases based on morbidities has varied significantly.

The study of Farmer et al4 excluded patients with recent major surgery and trauma, cancer and congenital heart disease, while the studies of Jick and coauthors9 ,11 added renal failure, chronic cardiovascular disease, inflammatory or autoimmune conditions and an operation or major trauma before the diagnosis as exclusion criteria. The study of Parkin et al10 extended the exclusion list even further with type I diabetes, colitis, systemic lupus erythematosus, spondylitis, cystic fibrosis, psoriatic arthritis and coagulation disturbances. The study of Lidegaard et al,8 however, excluded only those with selected cancers and coagulation disturbances, while a number of studies15–18 did not exclude any such morbidity-related, non-idiopathic cases. A study by Heinemann et al6 identified a subgroup of idiopathic cases and used an additional analysis to demonstrate that the ORs for this subgroup were twice those for the whole study sample.

The third methodological issue involves the related questions of exposure definition and reference group selection. Some studies were based only on current users, estimating the risk associated with use of one drug in comparison with another,7 ,9 ,10 ,15 ,16 while others compared current users with a non-exposed group.6 ,8 ,18 ‘Current use’ has also been defined in a range of ways, which is problematic because the increased risk of VTE in women taking oral contraceptives decreases after therapy stops and disappears within 3 months.19 The study of Heinemann et al6 considered a patient to be a current user for 6 weeks after discontinuation, while the studies of Jick et al9 and Parkin et al10 extended the period of current use by only 30 days. The study of Lidegaard et al8 allowed 4 weeks after the end date of the prescription before changing a woman's exposure status to previous user, while the study of Gronich et al7 allowed 3 months. Seeger et al15 considered women as current users only if they had not reached the end date of the prescription, while in the questionnaire-based studies16 ,18 current use was derived from participants’ responses.

Twenty-six studies based on data up to 2013 contributed to a recent meta-analysis20 for combined oral contraceptives, which demonstrated a twofold or more increased risk of VTE in users of any type or generation of ‘the pill’ compared with non-users. All the studies listed above were included in the meta-analysis but had a wide variation in estimates because of their heterogeneity of approach, in particular with respect to the definition of cases, inclusion criteria and reference group selection and definition of exposure. This overview shows that there are no established criteria for selecting patients for assessment of the VTE risk associated with use of oral contraceptives. Excluding cases without anticoagulant therapy might introduce a selection bias as doctors may be more likely to start medical treatment in patients on contraceptive drugs even with mild symptoms of VTE.21 Excluding non-idiopathic cases restricts analysis to relatively healthy patients but does not remove patients with known risk factors such as smoking, obesity or other unmeasured lifestyle factors. Established risk factors, however, do not prevent doctors from prescribing contraceptive drugs and their decisions are affected by all information available to them, so the question of how inclusive or exclusive the sample should be is important from a practical point of view.

The proposed nested case–control studies based on the general female population will investigate the association between use of the most common hormonal contraceptives and risk of VTE, adjusted for indications other than contraception (polycystic ovarian syndrome (PCOS) and menstrual disorders), comorbidities affecting prescribing and other confounding factors. To make the results comparable with those of other studies, a number of sensitivity analyses with different inclusion and exclusion criteria will be performed. The study will provide an overview of the risks associated with the currently most common types of oral contraceptives and will gain statistical power by combining the results obtained from the two largest electronic medical records databases, QResearch and Clinical Practice Research Datalink (CPRD).

Methods and analysis

Data source

This study will use two separate data sources: the QResearch primary care research database and the CPRD. Both consist of routinely collected data from GP clinical computer systems, and each contains information from around 7% of all UK general practices. The information recorded in the databases includes patient demographics (year of birth and sex), characteristics (height, weight and smoking status), clinical diagnoses, symptoms and prescribed medications, including repeat prescriptions. Both databases have been created for research purposes and linked to other sources of information, such as Hospital Episode Statistics (HES) and Office for National Statistics mortality data. Both have been validated against other sources of information in the UK, which has demonstrated their accuracy and completeness.22 Although QResearch has not been validated as extensively as CPRD, a recent study of cancer risk and use of bisphosphonates based on these databases demonstrated similar prevalence of outcomes and prescribing.23 ,24 Both databases have been used in previous studies of VTE associated with prescribed drugs.10 ,25

Sample selection

The studies will use records from UK general practices within the QResearch and CPRD databases. An open cohort of women aged 15–49 years and registered with the study practices during the study period (1 January 2001 to 31 December 2013) will be identified from each database. The right censor date will be the earliest of the following, where applicable: date of diagnosis of VTE, date of death, date of leaving the practice, date of the latest data download and the study end date. Diagnosis of VTE will be based on recording in the electronic patient records using Read codes; the list of Read codes is presented in table 1.

Table 1

Read codes for VTE used in QResearch and CPRD data extraction

Within each of these two cohorts we will design a nested case–control study with incident cases of VTE recorded during the study period. Each case will be individually matched with up to five female controls with the same age and year of birth, from the same general practice. The controls will be selected using incidence density sampling and allocated an index date, which is the date on which their matched case was first diagnosed with VTE.
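The incidence density sampling step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code (the analyses use Stata): the `Woman` record, its field names, the `sample_controls` function and the toy cohort are all hypothetical. A woman is treated as eligible to be a control if she is from the same practice and birth year, is registered on her matched case's index date and has not yet been diagnosed with VTE on that date.

```python
import random
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class Woman:
    # Hypothetical cohort record; field names are illustrative only.
    patient_id: int
    practice: str
    birth_year: int
    entry: date                      # start of follow-up in the study period
    exit: date                       # right-censor date
    vte_date: Optional[date] = None  # first VTE diagnosis, if any


def at_risk(w: Woman, index_date: date) -> bool:
    """Registered on the index date and still free of VTE on that date."""
    registered = w.entry <= index_date <= w.exit
    vte_free = w.vte_date is None or w.vte_date > index_date
    return registered and vte_free


def sample_controls(cohort: List[Woman], max_controls: int = 5,
                    seed: int = 0) -> Dict[int, Tuple[date, List[int]]]:
    """Incidence density sampling: for each case, draw up to `max_controls`
    women from the same practice and birth year who are at risk on the
    case's index date. Women may be sampled for several cases and may
    later become cases themselves."""
    rng = random.Random(seed)
    matched: Dict[int, Tuple[date, List[int]]] = {}
    cases = sorted((w for w in cohort if w.vte_date is not None),
                   key=lambda w: w.vte_date)
    for case in cases:
        index_date = case.vte_date
        pool = [w for w in cohort
                if w.patient_id != case.patient_id
                and w.practice == case.practice
                and w.birth_year == case.birth_year
                and at_risk(w, index_date)]
        chosen = rng.sample(pool, min(max_controls, len(pool)))
        matched[case.patient_id] = (index_date, sorted(w.patient_id for w in chosen))
    return matched


# Toy cohort (invented data) to show the behaviour.
d = date
cohort = [
    Woman(1, "P1", 1980, d(2001, 1, 1), d(2005, 6, 1), vte_date=d(2005, 6, 1)),
    Woman(2, "P1", 1980, d(2001, 1, 1), d(2013, 12, 31)),
    Woman(3, "P1", 1980, d(2006, 1, 1), d(2013, 12, 31)),  # not yet registered in 2005
    Woman(4, "P2", 1980, d(2001, 1, 1), d(2013, 12, 31)),  # different practice
    Woman(5, "P1", 1980, d(2001, 1, 1), d(2009, 3, 1), vte_date=d(2009, 3, 1)),
]
matched = sample_controls(cohort)
```

In the toy data, woman 5 serves as a control for case 1 in 2005 before becoming a case herself in 2009; allowing this is what distinguishes incidence density sampling from drawing controls only from women who never develop VTE.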


Cases and controls with any previous VTE diagnosis prior to entry into the study will be excluded. Cases with anticoagulant prescriptions (BNF 2.8) earlier than 6 weeks prior to the diagnosis will be excluded since this could indicate a previous VTE, and controls with such prescriptions before the index date will also be excluded.
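A minimal sketch of these exclusion rules, assuming the relevant dates have already been extracted for each woman. The function name and arguments are hypothetical, and ‘earlier than 6 weeks prior to the diagnosis’ is interpreted here as strictly more than 42 days before the index date.

```python
from datetime import date, timedelta
from typing import Iterable

SIX_WEEKS = timedelta(weeks=6)


def exclude_for_prior_vte(is_case: bool, index_date: date, study_entry: date,
                          vte_dates: Iterable[date],
                          anticoagulant_dates: Iterable[date]) -> bool:
    """Return True if a case or control should be excluded because of a
    possible earlier VTE. Hypothetical helper for illustration only."""
    # Any VTE diagnosis recorded before entry into the study.
    if any(v < study_entry for v in vte_dates):
        return True
    if is_case:
        # Cases: anticoagulant (BNF 2.8) prescriptions more than 6 weeks
        # before the VTE diagnosis suggest a previous VTE.
        cutoff = index_date - SIX_WEEKS
        return any(a < cutoff for a in anticoagulant_dates)
    # Controls: any anticoagulant prescription before the index date.
    return any(a < index_date for a in anticoagulant_dates)
```

The asymmetry is deliberate: a case's anticoagulant treatment usually starts around the diagnosis, so only earlier prescriptions are treated as evidence of previous VTE, whereas any anticoagulant prescription in a control before the index date is.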

Cases and controls will be excluded from the analysis if they have conditions precluding use of contraceptives, such as oophorectomy, hysterectomy or sterilisation. We will also exclude women who were pregnant at the index date or in the first 3 months after delivery (identified using pregnancy codes, with the conception date estimated as the delivery date minus 280 days, or minus the gestational age if recorded), because these women have a higher VTE risk26 and because breastfeeding women are less likely to have been using hormonal contraceptives.
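The pregnancy and postpartum window can be sketched as below. This is an illustration only: the function names are hypothetical, gestational age is assumed to be recorded in whole weeks, and ‘3 months’ after delivery is approximated as 91 days. Pregnancies without a recorded delivery are not handled in this sketch.

```python
from datetime import date, timedelta
from typing import Iterable, Optional, Tuple

FULL_TERM = timedelta(days=280)
POSTPARTUM = timedelta(days=91)  # assumption: "3 months" approximated as 91 days


def estimated_conception(delivery: date,
                         gestational_age_weeks: Optional[int] = None) -> date:
    """Delivery date minus the recorded gestational age, else minus 280 days."""
    if gestational_age_weeks is None:
        return delivery - FULL_TERM
    return delivery - timedelta(weeks=gestational_age_weeks)


def pregnant_or_postpartum(index_date: date,
                           deliveries: Iterable[Tuple[date, Optional[int]]]) -> bool:
    """True if the index date falls between the estimated conception date
    and 3 months after delivery for any recorded delivery.

    deliveries: (delivery_date, gestational_age_weeks or None) pairs.
    """
    for delivery, ga_weeks in deliveries:
        if estimated_conception(delivery, ga_weeks) <= index_date <= delivery + POSTPARTUM:
            return True
    return False
```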

Eligible cases and controls will have at least 1 year of records prior to the index date.


The observational period for assessing exposure for each patient will be defined as the last year before the index date.

Exposure to hormonal contraceptives will be based on all prescriptions for combined hormonal and progestogen-only contraceptives within the 1 year observation period (BNF 7.3.1, 7.3.2) and hormonal treatment of acne (co-cyprindiol or cyproterone, from BNF 13.6.2). Cyproterone will be included as a hormonal contraceptive because it has a similar effect to progestogen on the release of testosterone by ovaries.27 A participant will be considered as ever exposed if they had at least one prescription for a hormonal contraceptive.

The main focus will be on combined oral contraceptives. We will consider the compositions containing levonorgestrel, desogestrel, norgestimate, norethisterone, gestodene, drospirenone and cyproterone. As an increased risk of VTE has been found in users of transdermal compared with oral contraceptives,28 women exposed to non-oral combined contraceptives will be identified and kept in the analysis. Progestogen-only drugs are not expected to be associated with an increased risk of VTE8 but will be kept in the analysis for comparison, with oral and non-oral preparations treated as two different types of exposure. For all analyses, non-users of hormonal contraceptives in the previous year will be the reference category.

Numbers permitting, oestrogen doses (20 µg, and 30 µg or more, of ethinylestradiol) will be analysed for the most common compositions: norethisterone, desogestrel and gestodene.

The duration of exposure will be assessed by calculating the number of days prescribed within the previous year. If the gap between the end of one prescription and the start of the next is not more than 30 days, use will be considered as continuous and the duration of the prescriptions will be summed. If a gap between prescriptions is more than 30 days only the latest period of exposure will be considered.
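A minimal sketch of the 30-day gap rule for one contraceptive, assuming each prescription is available as an issue date and a number of days supplied, and that the estimated end of a prescription is its issue date plus the days supplied. The function name is hypothetical. As a simplification, prescriptions issued before the 1-year window are ignored even if they were still running at its start.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

MAX_GAP_DAYS = 30  # gaps of up to 30 days are treated as continuous use


def latest_episode(prescriptions: List[Tuple[date, int]],
                   index_date: date) -> Optional[Tuple[date, date, int]]:
    """prescriptions: (issue_date, days_supplied) for one contraceptive.

    Returns (episode_start, estimated_end, summed_days) for the latest
    continuous episode in the year before the index date, or None.
    """
    window_start = index_date - timedelta(days=365)
    rx = sorted(p for p in prescriptions if window_start <= p[0] < index_date)
    if not rx:
        return None
    start, days = rx[0]
    end, total = start + timedelta(days=days), days
    for issued, days in rx[1:]:
        if (issued - end).days <= MAX_GAP_DAYS:
            # Continuous use (overlaps give a negative gap): sum the days.
            total += days
            end = max(end, issued + timedelta(days=days))
        else:
            # Gap of more than 30 days: keep only the newer episode.
            start, end, total = issued, issued + timedelta(days=days), days
    return start, end, total
```

For example, two 84-day prescriptions issued on 10 January and 10 April 2012 (a 7-day gap) form one episode of 168 days; a further prescription on 1 September 2012 follows a 60-day gap, so only that last 84-day episode is kept.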

Recency of use will be analysed by calculating the gap in days between the estimated date of last use and the index date and categorising it as: current use (using drugs at the index date or last use no more than 28 days before the index date), past use (last use between 29 and 365 days before the index date) and no use in the last year. If a woman was exposed to more than one oral contraceptive in the 28 days before the index date, only the most recently prescribed drug will be analysed, and a variable indicating whether or not she had switched in the last 28 days will be included in the analysis.
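The recency categories and the switching indicator can be sketched as follows. The function names and the (drug, start, estimated end) representation are assumptions for illustration; a drug counts as used in the 28-day window if its estimated period of use overlaps it.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple


def recency_category(last_use: Optional[date], index_date: date) -> str:
    """Classify the gap between estimated last use and the index date."""
    if last_use is None:
        return "no use in last year"
    gap = (index_date - last_use).days  # negative if still in use at the index date
    if gap <= 28:
        return "current"
    if gap <= 365:
        return "past"
    return "no use in last year"


def latest_drug_and_switch(exposures: List[Tuple[str, date, date]],
                           index_date: date) -> Tuple[Optional[str], bool]:
    """exposures: (drug, start, estimated_end) for oral contraceptives.

    Returns the most recently started drug used in the 28 days before the
    index date, and whether more than one drug was used in that window.
    """
    window_start = index_date - timedelta(days=28)
    in_window = [(start, drug) for drug, start, end in exposures
                 if start <= index_date and end >= window_start]
    if not in_window:
        return None, False
    latest = max(in_window)[1]
    switched = len({drug for _, drug in in_window}) > 1
    return latest, switched
```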

We will estimate the effect of the duration of the last exposure by categorising it as up to 84 days (short term) and more than 84 days (long term). The cut point of 84 days (12 weeks) is chosen because VTE risk decreases after 3 months of exposure19 and 84 days is the most common length of a contraceptive prescription. We will combine recency and duration of exposure to give four categories for each drug exposure: current use with short-term exposure, current use with long-term exposure, past exposure and no use in the last year.
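Combining recency with the duration of the latest episode gives the four exposure categories described above. A minimal sketch, with a hypothetical function name and category labels:

```python
SHORT_TERM_MAX_DAYS = 84  # 12 weeks: up to 84 days is short term


def exposure_category(recency: str, duration_days: int) -> str:
    """Combine recency ('current', 'past' or 'no use in last year') with
    the duration of the latest episode into four exposure categories."""
    if recency == "current":
        if duration_days <= SHORT_TERM_MAX_DAYS:
            return "current, short term"
        return "current, long term"
    if recency == "past":
        return "past"
    if recency == "no use in last year":
        return "no use in last year"
    raise ValueError(f"unknown recency category: {recency!r}")
```

Duration only splits current users; past users form a single category, matching the four-level scheme in the text.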

Confounding factors

All analyses will be adjusted for confounders that are established risk factors for VTE, because they are listed in National Health Service (NHS) guidelines29 and affect doctors’ decisions about prescribing hormonal contraceptives. These will include comorbidities associated with increased risk of VTE30: cancer, congestive cardiac failure, varicose veins, cardiovascular disease, rheumatoid arthritis, systemic lupus erythematosus, chronic renal disease, asthma, chronic obstructive pulmonary disease, Crohn's disease or ulcerative colitis and coagulation disturbances (factor V Leiden, protein C and S deficiencies).31 Specific medical events will also be included if recorded in the 6 months before the index date: acute infections, surgery, hospitalisation, leg or hip fracture.12 ,30 Patients with these comorbidities and conditions will be identified as non-idiopathic cases for further sensitivity analysis.
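The idiopathic/non-idiopathic split can be sketched as below. The condition labels are illustrative stand-ins for the coded lists the study would use, the function name is hypothetical, comorbidities are counted if recorded at any time before the index date, and ‘6 months’ is approximated as 183 days.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple

# Illustrative labels only; the study identifies these from coded records.
COMORBIDITIES = {
    "cancer", "congestive cardiac failure", "varicose veins",
    "cardiovascular disease", "rheumatoid arthritis",
    "systemic lupus erythematosus", "chronic renal disease", "asthma",
    "chronic obstructive pulmonary disease", "crohn's disease",
    "ulcerative colitis", "coagulation disturbance",
}
RECENT_EVENTS = {"acute infection", "surgery", "hospitalisation", "leg or hip fracture"}
RECENT_WINDOW = timedelta(days=183)  # assumption: "6 months" taken as 183 days


def is_non_idiopathic(records: Iterable[Tuple[str, date]], index_date: date) -> bool:
    """records: (condition_label, record_date) pairs for one woman."""
    for label, recorded in records:
        if recorded >= index_date:
            continue  # only information recorded before the index date counts
        if label in COMORBIDITIES:
            return True
        if label in RECENT_EVENTS and index_date - recorded <= RECENT_WINDOW:
            return True
    return False
```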

Other confounders—patients’ characteristics measured at the closest date before the index date—will be: body mass index (BMI, continuous variable)7, smoking status (current smoker: light 1–9 cigarettes/day, medium 10–19, heavy 20 or more, ex-smoker and non-smoker)32, alcohol consumption and ethnicity (White, Black, Asian, other).33

As a large group of women are likely to be taking hormonal contraceptives for treatment of PCOS, this condition will also be included, because it is associated with an increased risk of VTE.34 Other reasons for combined hormonal contraceptive use, such as acne, hirsutism and menstrual disorders, will be included in the analysis if their inclusion changes the OR for at least one of the exposure variables by more than 10%.
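The 10% change-in-estimate rule can be expressed as a small check. This sketch assumes the relative change is measured on the OR scale, comparing models with and without the candidate confounder; some analysts use the log-OR scale instead, which gives slightly different thresholds. The function name and the dictionary layout are hypothetical.

```python
from typing import Mapping


def changes_estimate(or_without: Mapping[str, float], or_with: Mapping[str, float],
                     threshold: float = 0.10) -> bool:
    """True if adding a candidate confounder changes the OR of at least one
    exposure variable by more than `threshold` (relative change, OR scale).

    Both mappings are keyed by exposure variable name.
    """
    return any(abs(or_with[k] - or_without[k]) / or_without[k] > threshold
               for k in or_without)
```

For instance, an OR moving from 2.0 to 2.1 (5%) would not bring the candidate variable into the model, whereas a move from 2.0 to 2.3 (15%) would.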

Statistical analysis

Conditional logistic regression will be used to estimate ORs with 95% CIs for VTE. The initial model will estimate unadjusted ORs for VTE associated with the key exposure variables of interest (specific types of drugs, recency and duration of use, and dose). A multivariable model will then estimate ORs for VTE associated with hormonal contraceptive prescriptions, adjusted for the confounding variables listed above. The main analyses will include all cases of VTE identified from the general practice data and their matched controls. A first sensitivity analysis will be restricted to the subgroup of cases, and their matched controls, whose diagnosis is supported by anticoagulant prescriptions in the 6 weeks before or after the VTE diagnosis. A second sensitivity analysis will be run on idiopathic cases and their controls, excluding all cases and controls with medical conditions and recent events established as VTE risk factors. A third sensitivity analysis will be run on all non-idiopathic cases and controls.
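To show what conditional logistic regression estimates, here is a minimal one-exposure sketch fitted by Newton–Raphson on the conditional likelihood. It is not the study's model (which will be fitted in Stata, for example with `clogit`, with many covariates); the function name and the toy data are invented. It also does not handle data in which all discordant sets point the same way, where no finite estimate exists.

```python
import math
from typing import List, Sequence, Tuple


def clogit_single(matched_sets: List[Tuple[float, Sequence[float]]],
                  tol: float = 1e-10, max_iter: int = 50) -> Tuple[float, float]:
    """Conditional maximum likelihood for one exposure variable.

    matched_sets: (case_exposure, [control_exposures...]) per matched set.
    Returns (log OR, standard error). Newton-Raphson on the conditional
    log-likelihood sum over sets of b*x_case - log(sum_j exp(b*x_j)).
    """
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for case_x, control_xs in matched_sets:
            xs = [case_x, *control_xs]
            weights = [math.exp(beta * x) for x in xs]
            total = sum(weights)
            mean = sum(w * x for w, x in zip(weights, xs)) / total
            second = sum(w * x * x for w, x in zip(weights, xs)) / total
            score += case_x - mean        # first derivative
            info += second - mean ** 2    # minus the second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)


# Toy 1:1 matched data (invented): 30 sets with only the case exposed,
# 10 with only the control exposed, 50 concordant sets.
sets = [(1, [0])] * 30 + [(0, [1])] * 10 + [(1, [1])] * 25 + [(0, [0])] * 25
log_or, se = clogit_single(sets)
odds_ratio = math.exp(log_or)
ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
```

For 1:1 matching with a binary exposure, the conditional estimate equals the ratio of discordant sets (here 30/10 = 3), and concordant sets contribute nothing, which is a useful sanity check. With up to five controls per case, as planned, there is no such closed form and the iterative fit is needed.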

For practices linked to HES data another sensitivity analysis will be run. New cases of VTE identified in HES will be added to the analysis and controls with VTE recorded in HES prior to the index date will be removed.

As the proportion of women obtaining contraception from contraceptive clinics (where contraception is not recorded in their GP records) is higher among younger women aged 15–24 years (10%) than among women aged 25 years and older (3%),35 separate subgroup analyses will be run for the younger and older groups, and we will test for an interaction with age.

As BMI, smoking status and alcohol consumption may be important confounders but have non-negligible amounts of missing data, multiple imputation will be used to impute missing values.36 Ten imputed datasets will be created. Index year, case/control status, age, years of records, potential confounders and exposure to hormonal contraceptives and other drugs will be included in the imputation model. The distribution of BMI will be assessed and, if it is not normal, a transformation will be applied before inclusion in the imputation model. Characteristics of women with missing values and with complete data will be compared to assess whether it is plausible that data are missing at random. A sensitivity analysis restricted to women without missing data for BMI, smoking status and alcohol consumption will also be performed.
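Estimates from the ten imputed datasets are combined with Rubin's rules. Stata's `mi estimate` does this automatically; the sketch below (with a hypothetical function name) only shows the arithmetic for a single coefficient such as a log OR, which should be pooled on the log scale and exponentiated afterwards.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def rubin_pool(estimates: Sequence[float],
               std_errors: Sequence[float]) -> Tuple[float, float]:
    """Pool one coefficient (e.g. a log OR) across m imputed datasets.

    Total variance = within-imputation variance
                     + (1 + 1/m) * between-imputation variance.
    Returns (pooled estimate, total standard error).
    """
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need at least two imputations with matching standard errors")
    pooled = mean(estimates)
    within = mean([se ** 2 for se in std_errors])
    between = variance(estimates)  # sample variance (denominator m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)
```

The between-imputation term inflates the standard error to reflect uncertainty about the missing values; with no variation across imputations it reduces to the ordinary standard error.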

The nested case–control studies for QResearch and CPRD will be carried out separately and in exactly the same way, selecting the same confounders and running the same procedures. All observations will be from general practices in the UK, from the same time period, with similar exposures and using similar methods for recording outcomes. The sizes of the studies are also expected to be similar. Any differences in associations observed across the databases are likely to be caused only by sampling variation. The results from the two studies will then be combined using the method of Mantel and Haenszel for fixed effect models.
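For reference, the Mantel–Haenszel pooled OR over stratified 2×2 tables is sketched below, treating each database as a stratum. This only illustrates the estimator; how the protocol applies the method to the adjusted, matched estimates is not specified here, and the counts and variable names are invented.

```python
from typing import Iterable, Tuple


def mantel_haenszel_or(strata: Iterable[Tuple[int, int, int, int]]) -> float:
    """Mantel-Haenszel pooled OR over 2x2 tables.

    Each stratum is (a, b, c, d): exposed cases, unexposed cases,
    exposed controls, unexposed controls.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den


# Invented counts, one stratum per database.
qresearch = (20, 80, 50, 400)
cprd = (10, 90, 30, 470)
pooled_or = mantel_haenszel_or([qresearch, cprd])
```

Each stratum is weighted by its own size, so a fixed-effect pooled estimate is obtained without assuming that the two databases have the same exposure prevalence.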

A 1% level of statistical significance will be used to allow for multiple comparisons. Stata V.12 will be used for all the analyses.

Sample size calculation

All eligible cases from QResearch and CPRD will be used. According to the Office for National Statistics, combined contraceptives are used by 25% of women aged 15–49 in the UK.37 For an individual drug with exposure of 5%, 2115 cases and 10 575 controls will be needed to detect a clinically important OR of 1.5 at a significance level of 1% with 90% power. For rarer compositions such as drospirenone or cyproterone with exposure of 1% and a clinically important OR of 2.0, 2882 cases and 14 410 controls will be needed. The numbers of cases from QResearch and CPRD are expected to be fairly similar. The January 2014 version of CPRD contains 8673 cases with first time VTE recorded between 2001 and 2013. After removing pregnant and postpartum women, cases with previous anticoagulant prescriptions and cases with less than a year of medical records, a sample of 5920 cases will be available for analysis in CPRD.
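The protocol does not state which sample size formula produced the figures above. As a rough illustration only, the sketch below implements a standard unmatched case–control formula (Fleiss type, m controls per case, two-sided α, no continuity correction), deriving the exposure prevalence in cases from the control prevalence and the OR. It ignores the matching, so it is not expected to reproduce the 2115 and 2882 cases quoted; the function name and defaults are assumptions.

```python
import math
from statistics import NormalDist


def cases_needed(p0: float, odds_ratio: float, controls_per_case: int = 5,
                 alpha: float = 0.01, power: float = 0.90) -> int:
    """Approximate number of cases for an unmatched case-control design.

    p0: exposure prevalence among controls.
    """
    # Exposure prevalence among cases implied by the OR.
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    m = controls_per_case
    p_bar = (p1 + m * p0) / (1 + m)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    num = (z_a * math.sqrt((1 + 1 / m) * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p0 * (1 - p0) / m)) ** 2
    return math.ceil(num / (p1 - p0) ** 2)


n_common = cases_needed(0.05, 1.5)  # individual drug: 5% exposure, OR 1.5
n_rare = cases_needed(0.01, 2.0)    # rarer composition: 1% exposure, OR 2.0
```

The number of controls is `controls_per_case` times the number of cases. Under this simplified formula the rarer compositions still need more cases despite the larger target OR, consistent with the ordering of the figures in the text.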


This is an observational study based on routinely collected data from two large primary care research databases and will have the strengths and limitations common to all such studies. By combining results from two databases, the study will have greater statistical power than previous studies. It will allow analyses to be carried out investigating the effects of the recency and duration of use for the most commonly used hormonal contraceptives. Because the data on prescriptions and potential confounding variables are routinely and prospectively collected and recorded before the index date, the study will be free from recall bias. Similarly, as all eligible cases and randomly selected controls will be included, there should be no selection bias.

The study will conduct a number of sensitivity analyses to address the conflicting methodological approaches of previous studies, giving readers an opportunity to decide which estimates are the most valid.

The limitations of the study include possible uncertainty in VTE diagnosis. A systematic review of General Practice Research Database (GPRD) validation studies reported that, on average, 85% of diagnoses of circulatory system problems recorded in the GP electronic record were confirmed by other data sources.38 Lawrenson et al39 looked specifically at validation of VTE and found that 84% of diagnoses were supported by hospitalisation records or death certificates. Any misclassification (assuming it is non-differential between cases and controls) would bias associations with hormonal contraceptives towards the null, shifting the ORs towards unity. The sensitivity analysis restricted to validated VTE diagnoses, along with descriptive statistics, will address the issues raised in the Danish study8 about differential attention to different types of contraceptives.

Another limitation is potential underestimation of hormonal contraceptive use. In addition to GPs, other NHS services such as family planning or contraceptive clinics supply hormonal contraceptives. According to the Health & Social Care Information Centre,35 approximately 0.6 million women in England are supplied with hormonal contraceptives by contraceptive clinics, about 5% of the target population. Although a survey of contraceptive service use in Britain40 reported that only 59% of responding women would use general practice and 15% would use contraceptive clinics, the survey response rate was only 65%, and the actual proportions might be smaller. From currently available CPRD data, overall use of hormonal contraceptives based on GP prescriptions is about 50%, so additional use from other sources is expected to be minor.

Other limitations are also common to any general practice database. Information on certain risk factors such as level of physical activity or use of air travel is not reliably recorded so these factors cannot be included in the analyses. Important confounders such as smoking or BMI have non-negligible amounts of missing data so these will be imputed.

The results of this study will help to establish risks of VTE associated with different oral hormonal contraceptive drugs.

Ethics and dissemination

The protocol has been approved by the Independent Scientific Advisory Committee for MHRA Database Research (N 13_118R). The project has also been reviewed by QResearch and meets the requirements of the Trent Research Ethics Committee. To guarantee the confidentiality of personal and health information, only the authors will have access to the data during the study. After publication of the results, it will be possible to access the QResearch and CPRD data, but only on the premises of the University of Nottingham, in accordance with QResearch standard procedures and the CPRD licence. The full protocol and statistical code will be available from the authors after publication of the results.


The authors would like to acknowledge the helpful contribution of the reviewers, Professor Susan Jick and Professor Øjvind Lidegaard.




  • Contributors JH-C had the original idea for this study. CC contributed to the development of the idea and the study design. YV reviewed the literature, contributed to the study design and wrote the draft of the manuscript. JH-C and CC critically reviewed the paper. YV is the guarantor of the study. All authors have approved the submitted version.

  • Competing interests JHC is a professor of clinical epidemiology at the University of Nottingham and unpaid director of QResearch, a not-for-profit organisation which is a joint partnership between the University of Nottingham and EMIS (commercial IT supplier for 60% of general practices in the UK); JHC is also a paid director of ClinRisk Limited, which produces open and closed source software to ensure the reliable and updatable implementation of clinical risk algorithms within clinical computer systems to help improve patient care.

  • Ethics approval This protocol has been approved by Independent Scientific Advisory Committee for MHRA Database Research (N 13_118R).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement To guarantee the confidentiality of personal and health information, only the authors will have access to the data during the study. After publication of the results, it will be possible to access the QResearch and CPRD data, but only on the premises of the University of Nottingham, in accordance with QResearch standard procedures and the CPRD licence. The full protocol and statistical code will be available from the authors after publication of the results.
