Objectives We aimed to design and test a method to extract information on antithrombotic therapy from anonymised free-text notes in the Clinical Practice Research Datalink (CPRD).
Setting General practice database representative of the UK.
Participants All patients undergoing total hip replacement (THR, n=25 898) or total knee replacement (TKR, n=22 231) between January 2008 and October 2012 were included. Antithrombotic drug use related to THR or TKR was identified using anonymised free text and prescription data.
Primary and secondary outcome measures Internal validity of our newly designed method was determined by calculating positive predictive values (PPVs) of hits for predefined keywords in a random sample of anonymised free-text notes. To assess potential detection bias, characteristics of total joint replacement (TJR) patients were compared according to their antithrombotic exposure status.
Results Our method yielded PPVs between 97% and 99% for exposure to new oral anticoagulants (NOACs) or low-molecular-weight heparins (LMWHs) related to TJR. Our search strategy increased detection rates by 57%, identifying drug use in a total of 18.5% of all THR and 18.6% of all TKR surgeries. Identified users of NOACs and LMWHs were largely similar to patients without identified drug use with regard to age, sex, lifestyle, and disease and drug history.
Conclusions We have developed a useful method to identify additional exposure to NOACs or LMWHs with TJR surgery.
- STATISTICS & RESEARCH METHODS
- ORTHOPAEDIC & TRAUMA SURGERY
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
This is the first study reporting an effective method to identify antithrombotic drug use in anonymised free text in a large population-based database.
Our method can be implemented at relatively low costs.
We were unable to determine the specificity and sensitivity of our method.
Osteoarthritis (OA) is the most common musculoskeletal condition in older people in the UK.1 Based on 7-year consultation prevalence in British general practice, an estimated 8.75 million people have been treated for OA, which constitutes one-third of people aged 45 years and over in the UK.1 Total joint replacements (TJRs), such as total hip replacement (THR) or total knee replacement (TKR), substantially improve quality of life in these patients.2 However, the risk of potentially fatal venous thromboembolic events, such as deep-vein thrombosis (DVT), is increased up to 14-fold following these surgical procedures when no drug is given for thromboprophylaxis.3 Intensive antithrombotic treatment for up to 35 days is, therefore, recommended; in a meta-analysis of randomised trials it reduced the risk of asymptomatic DVT (RR=0.51, 95% CI 0.45 to 0.59, p<0.001).4,5
There has been much debate about the risk/benefit ratio of antithrombotic agents following THR or TKR surgery.6–8 Low-molecular-weight heparins (LMWHs) have gained popularity in the past decade, but can only be administered subcutaneously. The recently introduced orally administered direct thrombin inhibitors and direct factor Xa inhibitors (eg, dabigatran and rivaroxaban) seem to combine the advantages of vitamin K antagonists (VKAs) and LMWHs. Their benefit/risk profile has been suggested to outperform that of VKAs in patients with atrial fibrillation.9 To further understand the benefit/risk profile of these antithrombotic drugs in real-life use, data from large databases, such as the Clinical Practice Research Datalink (CPRD), could make a substantial contribution. However, some antithrombotic therapies are 'red-listed' and are therefore predominantly dispensed during inpatient hospitalisation and the period after discharge. Consequently, patients do not need to visit a public pharmacy for a repeat prescription. Unfortunately, inhospital pharmacy data are often lacking in general practitioner (GP) databases, which limits the use of these databases for drug utilisation or drug effect studies. However, Martinez et al10 found a feasible approach to overcome some of these restrictions. They identified a total of 754 rivaroxaban users by combining information from prescription files with an electronic search of anonymised free text in the CPRD between October 2008 and January 2012. Approximately 66% of these users were identified by anonymised free text only, thereby tripling the information on exposure to rivaroxaban. Unfortunately, that study did not describe the electronic search method in detail, and it focused on only one drug (rivaroxaban).
In order to assess safety and efficacy of the new oral anticoagulants (NOACs), new methods are needed to efficiently extract the additional information from anonymised free-text notes. In some fields this has already been done. For example, Shah designed an algorithm that identifies dosage instructions for a variety of drugs in anonymised free text. Their algorithm converted unstructured data from these notes to structured, usable data. They subsequently validated their structured data by manually checking a random sample of these records. Their algorithm identified correct dosage information in 99% of the analysed anonymised free text.11 Since inhospital pharmacy data are not included in most primary care databases, we aimed to design and test a method to extract additional information on antithrombotic therapy in patients undergoing TJR surgery from anonymised free-text notes in the CPRD.
This study was conducted using CPRD GOLD, formerly known as the General Practice Research Database (GPRD). The CPRD is a GP database currently containing approximately 8% of the UK population. GPs play a key role in the UK healthcare system, as they are responsible for primary healthcare and specialist referrals. Consequently, medical information from GPs, specialist referrals and hospitalisation data are recorded in the CPRD.12 Most data are recorded using CPRD's therapy files, but some data, such as hospital discharge summaries, are collected as anonymised free text. CPRD has been used to study outcomes after TJR surgery.13,14
The study population comprised all patients who underwent a primary THR or TKR surgery from January 2008 to October 2012. This time frame was chosen because NOACs have been registered for antithrombotic therapy after TKR and THR in Europe since 2008 (dabigatran, 18 March 2008; rivaroxaban, 13 September 2008).
Selection of comparison groups
We determined exposure to antithrombotics using CPRD product codes and analysis of anonymised free text. Prescriptions of NOACs (dabigatran and rivaroxaban), LMWHs (bemiparin, certoparin, reviparin, enoxaparin, tinzaparin, dalteparin) and aspirin identified by product codes during the 10 days before and 10 days after surgery were included. According to the annual reports of the National Joint Registry (NJR), VKAs were used for thromboprophylaxis in <1% of TJR surgeries; they were, therefore, not included in the analyses. The procedure of the anonymised free-text analysis (figure 1) was based on methods used in a previous study that identified rivaroxaban use and in a study that restructured dosage instructions.10,11 For the anonymised free-text analysis, we first ran our search strategy on the study population to determine the use of NOACs, LMWHs and aspirin. This search strategy searched the anonymised free-text notes of TJR patients for specific keywords, including abbreviations, misspelled variants, and the actual generic and brand names of the products (see online supplementary appendix table S1). Anonymised free-text notes recorded between 10 days before and 10 days after the date of surgery were included in the analysis. This time window was based on the mean duration of hospital stay after TJR, as reported in the National Registry of Hospitalisations of England (2008–2013).15
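The free-text search step can be sketched in Python as below. This is a minimal illustration, not the authors' implementation (they report using SAS 9.3): the `Note` record, the `keyword_hits` function and the short keyword lists are hypothetical, and the lists, including the illustrative misspelling `rivaroxoban`, stand in for the full keyword set in supplementary table S1. A note is kept only if it is dated within ±10 days of the patient's surgery, and is then matched case-insensitively against one whole-word pattern per drug class.

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical, abbreviated keyword lists; the study's full list of generic and
# brand names, abbreviations and misspellings is in supplementary table S1.
KEYWORDS = {
    "NOAC": ["dabigatran", "pradaxa", "rivaroxaban", "rivaroxoban", "xarelto"],
    "LMWH": ["lmwh", "enoxaparin", "clexane", "dalteparin", "fragmin", "tinzaparin"],
    "aspirin": ["aspirin"],
}

# One case-insensitive, whole-word regular expression per drug class.
PATTERNS = {
    drug_class: re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)
    for drug_class, words in KEYWORDS.items()
}


@dataclass
class Note:
    patient_id: int
    note_date: date
    text: str


def keyword_hits(notes, surgery_dates, window_days=10):
    """Return {patient_id: set of drug classes} with at least one keyword hit
    in a note dated within +/- window_days of that patient's surgery."""
    hits = {}
    for note in notes:
        surgery = surgery_dates.get(note.patient_id)
        if surgery is None or abs((note.note_date - surgery).days) > window_days:
            continue  # no surgery recorded, or note outside the time window
        for drug_class, pattern in PATTERNS.items():
            if pattern.search(note.text):
                hits.setdefault(note.patient_id, set()).add(drug_class)
    return hits


surgery_dates = {1: date(2010, 5, 3), 2: date(2011, 2, 14)}
notes = [
    Note(1, date(2010, 5, 8), "Discharged on Xarelto 10mg od for 2/52"),
    Note(1, date(2010, 6, 30), "aspirin 75mg started"),  # outside the window
    Note(2, date(2011, 2, 10), "pre-op: stop aspirin, clexane post op"),
]
print(keyword_hits(notes, surgery_dates))
```

Patient 2's aspirin hit comes from "stop aspirin": a keyword hit alone does not establish use, which is why a sample of hits was validated manually.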
Patients with a positive hit for these keywords were further analysed to determine whether the hit reflected actual drug use. For both THR and TKR patients, we randomly selected 100 patients with at least one positive NOAC hit, 100 patients with at least one positive LMWH hit and 100 patients with at least one positive aspirin hit, resulting in a total of 300 THR patients and 300 TKR patients. The free-text notes of these randomly selected patients were then analysed further: in each record, the free text surrounding the positive hit was anonymised. By anonymising and reviewing the 20 words before and after the positive hit, we either confirmed or refuted drug use. In the following examples, drug use was confirmed (1) or refuted (2), respectively:
(1) …still on dabigatran…
(2) …aspirin was stopped in …
Within the anonymised free text, we manually searched for negating or confirming terms. Two pharmacists (JTHN and FdV) then independently decided whether a patient had been exposed or unexposed to antithrombotic drugs. In case of disagreement, the case was discussed until agreement was reached.
Subsequently, we determined exposure to antithrombotics in the entire study population by combining drug use based on a positive response in the anonymised free-text analysis with drug use based on product codes (see online supplementary appendix table S2). Concomitant use of NOACs and aspirin was categorised as NOAC use, and concomitant use of LMWHs and aspirin as LMWH use. This classification was chosen because in these cases the exposure to NOACs or LMWHs is more likely to be associated with TJR. Patients without identified drug use were categorised as unknown users. Because concomitant exposure to a NOAC and a LMWH is highly unlikely, such cases were also classified as unknown use.
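The sampling and context-extraction steps can be illustrated as follows. This is a hypothetical sketch: the study used SAS 9.3 for randomisation, and the exposed/unexposed decision was made manually by two pharmacists, not by code. `sample_patients` draws up to 100 patients per stratum without replacement, with strata assumed to be (surgery type, drug class) pairs and an arbitrary fixed seed for reproducibility; `context_window` returns the words around each keyword occurrence for review.

```python
import random


def sample_patients(hit_ids_by_stratum, n=100, seed=2014):
    """Draw up to n patients at random, without replacement, per stratum,
    e.g. per (surgery type, drug class) pair. The seed is arbitrary."""
    rng = random.Random(seed)
    return {
        stratum: rng.sample(sorted(ids), min(n, len(ids)))
        for stratum, ids in hit_ids_by_stratum.items()
    }


def context_window(text, keyword, width=20):
    """Return up to `width` words either side of each word containing keyword
    (case-insensitive), for manual review by the assessors."""
    words = text.split()
    snippets = []
    for i, word in enumerate(words):
        if keyword.lower() in word.lower():
            snippets.append(" ".join(words[max(0, i - width): i + width + 1]))
    return snippets


strata = {("THR", "NOAC"): set(range(1000, 1400)), ("TKR", "NOAC"): {7, 8, 9}}
sample = sample_patients(strata)
print({stratum: len(ids) for stratum, ids in sample.items()})
print(context_window("pt still on dabigatran 110mg bd, wound healing well", "dabigatran", width=2))
```

A stratum with fewer hits than requested is taken whole, and the substring match in `context_window` tolerates punctuation attached to the keyword (eg, "dabigatran,").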
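The classification rules above can be written as a small decision function. This is an illustrative sketch, not the authors' code: `classify_exposure` and `combine_sources` are hypothetical names, and each source is assumed to be a mapping from patient identifier to the set of drug classes ("NOAC", "LMWH", "aspirin") found for that patient within the ±10-day window.

```python
def classify_exposure(drug_classes):
    """Map the drug classes found for one patient to a single user category:
    NOAC + LMWH -> unknown (considered implausible); NOAC (+/- aspirin) -> NOAC;
    LMWH (+/- aspirin) -> LMWH; aspirin only -> aspirin; nothing -> unknown."""
    found = set(drug_classes)
    if {"NOAC", "LMWH"} <= found:
        return "unknown"
    if "NOAC" in found:
        return "NOAC"
    if "LMWH" in found:
        return "LMWH"
    if "aspirin" in found:
        return "aspirin"
    return "unknown"


def combine_sources(free_text, product_codes):
    """Union the drug classes from both sources per patient, then classify."""
    patients = set(free_text) | set(product_codes)
    return {
        pid: classify_exposure(free_text.get(pid, set()) | product_codes.get(pid, set()))
        for pid in patients
    }


print(combine_sources({1: {"NOAC"}, 3: {"NOAC"}}, {1: {"aspirin"}, 2: {"LMWH"}, 3: {"LMWH"}}))
```

Patients absent from both sources are not returned and fall into the unknown category by default.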
Positive predictive values (PPVs) of the positive hits from the free-text search were calculated by dividing the number of positive hits confirmed by manual analysis of the surrounding free text by the total number of positive hits. PPVs can range between 0% and 100%, where 100% corresponds to perfect predictive value of positive hits. We were unable to calculate false or true negatives because we did not analyse free-text notes without a positive hit; as a result, we could not determine the specificity and sensitivity of our method. To assess the additional detection rate from our free-text search, we determined the distribution of cases within each drug category according to identification method. Cases were categorised by type of surgery and by the previously described user categories. To gain insight into the external validity of the method, we first compared the proportion of NOAC and LMWH use in the CPRD with the proportion reported by the NJR13 using the χ2 test for independent samples (p<0.05). Second, we compared the characteristics of the different user groups (NOACs, LMWHs and aspirin) with those of patients without identified drug use using χ2 tests or t tests (p<0.05). We used SAS 9.3 software for statistical testing and randomisation. This study protocol was approved by the Independent Scientific Advisory Committee (ISAC), protocol number: 14_026R.
A total of 22 231 TKR patients and 25 898 THR patients were included in our study. Of all patients with a positive hit for antithrombotic drug use (n=6324), 600 were randomly selected for anonymisation of the free text surrounding a keyword. However, 14 of these patients did not give consent for this analysis. The remaining 586 patients were further analysed to determine actual drug use (figure 1). On average, TKR patients were younger than THR patients. They also had a higher BMI and used more drugs, such as statins, non-steroidal anti-inflammatory drugs (NSAIDs) and antihypertensives (table 1).
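The PPV and the χ2 comparison with the NJR can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: the function names are hypothetical, the test shown is the Pearson χ2 test for a 2×2 table without continuity correction (the paper does not state whether a correction was applied), and the counts in the usage lines are made up, not study data.

```python
import math


def ppv(confirmed_hits, total_hits):
    """Positive predictive value: confirmed positive hits / all positive hits."""
    return confirmed_hits / total_hits


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the 2x2 table
        [[a, b],    e.g. CPRD: NOAC users, LMWH users
         [c, d]]         NJR:  NOAC users, LMWH users
    Returns (statistic, p-value) with 1 degree of freedom."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 df the statistic is Z**2, so P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


print(ppv(97, 100))              # made-up counts
print(chi2_2x2(30, 10, 10, 30))  # made-up counts
```

The 1 − PPV complement gives the share of false-positive hits, which is the quantity used later to bound overestimation from the free-text search.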
PPVs of drugs used for thromboprophylaxis related to TJR ranged between 97% and 99% for NOAC and LMWH use, and between 91% and 95% for aspirin use. Overall, 96.2% of the hits were true positives and 3.8% were false positives. Combining exposure according to positive anonymised free-text hits with exposure according to product codes increased the identification of drug use by 57% compared with product codes only. Ultimately, drug use was identified in 18.5% of all THR and 18.6% of all TKR surgeries. Antithrombotic drug use was identified by free-text analysis only in 65–70% of the LMWH group and in >80% of the NOAC group, which increased the detection rates of these drugs threefold to fivefold compared with detection by product codes. In the aspirin group, drug use was mostly (∼90%) identified from coded prescription data (figure 2 or see online supplementary appendix table S3). When the ratio of NOAC to LMWH use was compared, NOAC use was higher in the CPRD with our method than in the NJR reports, for both THR and TKR patients. However, the NOAC/LMWH ratio differed statistically significantly only for TKR patients in 2009 and 2010; all other comparisons were not statistically significant (table 2). Given the lower PPV for aspirin, the substantial differences in baseline characteristics and the limited role of aspirin in thromboprophylaxis related to TJR, we did not further compare our aspirin data with those from the NJR.
Table 3 shows that TKR and THR patients using NOACs or LMWHs were largely similar to patients with unknown drug use with regard to age, sex, comorbidities and drug use. Aspirin users, however, differed from unknown users, particularly with regard to a history of ischaemic heart disease. Compared with unknown use, prescription of antithrombotic drugs was higher in the highest socioeconomic status (SES) quintile and lower in the lowest SES quintile (except for THR patients using NOACs). All user groups showed a different distribution across the four major regions compared with unknown users; notably, only one NOAC prescription was identified in Northern Ireland. Furthermore, THR patients using NOACs or LMWHs had a higher BMI than unknown users.
We have developed a useful method to identify exposure to NOACs or LMWHs related to TJR surgery. Our search method identified NOAC and LMWH use with PPVs between 97% and 99%; aspirin use yielded PPVs between 91% and 95%. When our anonymised free-text method was combined with a traditional method using CPRD product codes, identification of drug use increased by 57% on average. Moreover, identification of NOAC use showed a fivefold increase, and identification of LMWH exposure a more than threefold increase, compared with identification by product codes. Users of NOACs or LMWHs were largely similar to patients with unknown thromboprophylaxis related to TJR surgery with regard to age, sex, comorbidities and use of other drugs. As expected, aspirin users were different; for example, they more often had a history of ischaemic heart disease.
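The link between the share of users found by free text only and the fold increase in detection is simple arithmetic: if a fraction f of all identified users was found only in free text, product codes captured the remaining 1 − f, so combined detection is 1/(1 − f) times detection by product codes. The helper names below are hypothetical; the loop shows that free-text-only shares of 65–70% and 80% correspond to roughly threefold and fivefold gains.

```python
def fold_increase(free_text_only_share):
    """Combined detection relative to product codes alone, given the fraction of
    identified users found by free text only (the rest were found by product codes)."""
    return 1 / (1 - free_text_only_share)


def percent_increase(n_product_codes, n_combined):
    """Relative increase (%) in identified users when free text is added."""
    return 100 * (n_combined - n_product_codes) / n_product_codes


for share in (0.65, 0.70, 0.80):
    print(f"free text only: {share:.0%} -> {fold_increase(share):.1f}-fold")

# A 57% overall increase means product codes alone found 1 / 1.57 (about 64%)
# of the users identified by both sources combined.
print(percent_increase(100, 157))
```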
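Baseline comparisons of continuous characteristics such as BMI were made with t tests. As a hedged illustration only, the sketch below computes Welch's unequal-variance t statistic (the paper does not state which t-test variant was used) and a two-sided p-value from the standard normal distribution, an approximation that is reasonable only for large groups such as those in this study. The function names and the BMI values are hypothetical toy data; with so few values a t distribution would be needed, so the example only demonstrates the calls.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic for the difference in means of two independent samples."""
    standard_error = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / standard_error


def two_sided_p_normal(t):
    """Two-sided p-value from the standard normal (large-sample approximation)."""
    return math.erfc(abs(t) / math.sqrt(2))


bmi_noac_users = [28.1, 30.4, 31.2, 29.8, 27.5, 32.0]   # toy values
bmi_unknown_use = [26.9, 27.4, 28.8, 26.1, 29.0, 27.7]  # toy values
t = welch_t(bmi_noac_users, bmi_unknown_use)
print(round(t, 2), round(two_sided_p_normal(t), 3))
```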
To our knowledge, anonymised free text has been used only once to identify antithrombotic drug use in the CPRD.10 However, details of the electronic search method used in that study are unclear, and results were restricted to rivaroxaban use. In this paper, we present an effective and highly efficient approach to additionally identify exposure to multiple antithrombotic drug classes using a predefined set of keywords. Positive hits for the selected keywords in anonymised free-text notes were highly associated with actual use, especially for NOACs and LMWHs. Thus, we believe that our free-text search method is valid for the additional identification of drug use in the CPRD. Combining our anonymised free-text method with a traditional method using CPRD product codes increased identification of drug use by 57% compared with identification based on product codes alone. The fivefold increase in NOAC identification is somewhat higher than the threefold increase in rivaroxaban use reported by Martinez et al.10 This difference could be caused by overestimation in our method due to false-positive responses; however, the PPVs described above indicate that this can account for only 1–3% overestimation. It could also be due to the fact that we determined both rivaroxaban and dabigatran use, whereas Martinez and colleagues measured only rivaroxaban use. An alternative explanation is that Martinez et al underestimated drug utilisation: our method could be more sensitive owing to the wider variety of keywords used in our electronic search.
Although we found a method to increase the detection rate of drug use, the question arises whether there is a reason why drug use was identified in these specific patients (ie, whether these patients differ from patients without identified drug use, and whether this difference could explain why drug use was identified in the former group and not in the latter). To determine whether we were dealing with a deviating group of patients and to confirm external validity, we first compared use of NOACs and LMWHs with annual NJR reports. We did not include aspirin use in this analysis because differences were expected, given that aspirin is indicated for various other cardiovascular diseases. The ratio of NOAC to LMWH use in our CPRD analysis differed from the NJR reports only in TKR patients in 2009 and 2010. This could be because NOAC prescription was considered new and was therefore more likely than LMWH use to be mentioned in the free-text notes. To investigate this further, we also compared the comorbidities and drug history of NOAC, LMWH and aspirin users with those of patients with unknown exposure status.
Only minor differences in characteristics were found between NOAC or LMWH users and patients with an unknown exposure status associated with TJR surgery (table 3). This suggests that these groups are comparable and that there is no apparent reason for a detection bias of NOAC or LMWH use in these specific patients. This is reassuring when long-term effects and side effects of the different drugs are compared in further research. As expected, aspirin users were very different compared to unknown users with regard to comorbidities and drug use (table 3). This is most likely due to the fact that aspirin is indicated for multiple other conditions such as angina pectoris, the prevention of myocardial infarction and other types of ischaemic heart disease. Therefore, our method seems to have limited usability for the identification of aspirin use in chemical thromboprophylaxis related to TJR. Differences in drug use according to region may be caused by use of regional guidelines. NICE guidance is predominantly designed for use in England and Wales. The Health and Social Care (HSC) and the Scottish Intercollegiate Guidelines Network (SIGN) are responsible for guidance in Northern Ireland (NI) and Scotland, respectively. However, linkage with NICE guidelines for implementation in NI has been available since 2006.
Our study had various strengths. To the best of our knowledge, this is the first study published in a peer-reviewed scientific journal reporting an effective method to identify antithrombotic drug use from anonymised free text in a large population-based GP database. Moreover, our method can be implemented at relatively low cost, since actual anonymisation of free text is not required. This will allow us to use the wealth of data in the world's largest primary care database to study, for example, the potential side effects of thromboprophylaxis associated with TJR. Our study also had limitations. In practice, we were unable to calculate false or true negatives; as a result, we could not determine the specificity and sensitivity of our method. To evaluate whether documentation of exposure was differential between patients with a positive hit and patients without a hit, we applied two approaches. First, we compared the distribution of NOAC and LMWH use with an external source, the National Joint Registry (NJR). Second, we assessed differences in baseline characteristics between the various exposure groups and the group without a hit. These surrogate measures provide information on potential differential detection of drug exposure. Another limitation was that we were able to identify thromboprophylaxis related to TJR for only 18% of our patients, whereas actual thromboprophylaxis is likely to be close to 100%. This is probably the result of under-reporting of thromboprophylaxis by hospitals or GPs, or simply because discharge letters were sent to GPs as scanned PDF files rather than as searchable free text. Nevertheless, we were still able to substantially increase identification using anonymised free text compared with identification based on product codes only.
In the future, this may be further enhanced by linking to other data sources such as the inhospital prescribing data, or by making the existing linkage between the NJR and CPRD available to researchers without any restrictions.
In conclusion, we have developed a useful method to identify exposure to NOACs or LMWHs associated with TJR surgery. Positive hits for the selected keywords in anonymised free-text notes are strongly associated with actual use, and this method increased the identification of NOAC use fivefold. Furthermore, identified NOAC and LMWH users appear to be reasonably similar to patients without identified drug use (table 3). In contrast, aspirin users were very different from patients without identified drug use, possibly because aspirin is prescribed for various other health problems. Identification of drugs used for a specific postsurgical condition can be substantially enhanced using our method; similar methods may be used to identify drugs prescribed for other diseases or hospital-specific medications such as biologicals and blood products. Our method is a useful tool to identify exposure to NOACs or LMWHs related to TKR or THR surgery in the CPRD, and it increases statistical power to evaluate potential side effects of these drugs in pharmacoepidemiological studies.
Contributors JTHN, FdV, PCD, PJE, NV, BJFvdB, T-PvS and AB contributed to the conception and design of the study. T-PvS, AL, FdV and JTHN contributed to the acquisition of data. JTHN and AL contributed to analysis of data. All authors contributed to interpretation of data. JTHN and FdV drafted the manuscript, and all authors contributed intellectual content while revising the article. All authors gave final approval of the version to be submitted and any revised version.
Funding This study was internally funded by the Maastricht University Medical Center+.
Competing interests JTHN, AL, T-PvS and FdV are employed by the Division of Pharmacoepidemiology and Clinical Pharmacology at Utrecht Institute for Pharmaceutical Sciences, which has received unrestricted funding from the Netherlands Organisation for Health Research and Development (ZonMW), the Dutch Health Care Insurance Board (CVZ), the Royal Dutch Pharmacists Association (KNMP), the private-public funded Top Institute Pharma (http://www.tipharma.nl) (including co-funding from universities, government, and industry), the EU Innovative Medicines Initiative (IMI), the EU 7th Framework Program (FP7), and the Dutch Ministry of Health and Industry (including GlaxoSmithKline, Pfizer, and others). AL reports grants from the Netherlands Organisation for Scientific Research (NWO), outside the submitted work. BJFvdB reports research grants to his department from Pfizer and Roche and occasional speaker’s honoraria from Pfizer, Roche, Abbvie and MSD. AB reports research grants to her department from Amgen, Abbvie, Pfizer and Merck, and occasional speaker’s honoraria from Pfizer, UCB and Sandoz. PJE reports research grants to his department from Stryker, Active implants, Carbylan Biosurgery, DSM Biomedical and Regentis, and occasional speaker’s honoraria from Biomet and Push braces. PCD has received unrestricted grants from NWO, EU and nutritional industry for research unrelated to this topic.
Ethics approval Independent Scientific Advisory Committee MHRA Database Research.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No additional data are available.