Cohort profile: prescriptions dispensed in the community linked to the national cancer registry in England
  1. Katherine E Henson1,
  2. Rachael Brock1,
  3. Brian Shand1,
  4. Victoria H Coupland1,
  5. Lucy Elliss-Brookes1,
  6. Georgios Lyratzopoulos1,2,
  7. Philip Godfrey3,
  8. Abigail Haigh3,
  9. Kelvin Hunter4,
  10. Martin G McCabe1,5,
  11. Graham Mitchell3,
  12. Nina Monckton3,
  13. Robert Robson3,
  14. Thomas Round6,
  15. Kwok Wong1,
  16. Jem Rashbass1
  1. 1 National Cancer Registration and Analysis Service, Public Health England, London, UK
  2. 2 Department of Behavioural Science and Health, ECHO (Epidemiology of Cancer Healthcare and Outcomes) Group, University College London, London, UK
  3. 3 NHS Prescription Services, NHS Business Services Authority, Newcastle upon Tyne, UK
  4. 4 Department of Medicine (Cambridge University), Addenbrooke’s Hospital, Cambridge, UK
  5. 5 Institute of Cancer Sciences, Manchester Academic Health Science Centre, Core Technology Facility, University of Manchester, Manchester, UK
  6. 6 Department of Primary Care and Public Health Sciences, King’s College London, London, UK
  1. Correspondence to Dr Katherine E Henson; katherine.henson{at}

Abstract

Purpose The linked prescriptions cancer registry data resource was set up to extend our understanding of the pathway of patients with cancer beyond secondary care and into the community, with the ultimate aim of improving patient outcomes.

Participants The linked prescriptions cancer registry data resource is currently available for April to July 2015, for all patients diagnosed with cancer in England with a dispensed prescription in that time frame.

The dispensed prescriptions data are collected by National Health Service (NHS) Prescription Services, and the cancer registry data are processed by Public Health England. All data are routine healthcare data, used for secondary purposes, linked using a pseudonymised version of the patient’s NHS number and date of birth.

Detailed demographic and clinical information on the type of cancer diagnosed and its treatment is collected by the cancer registry. The dispensed prescriptions data contain basic demographic information, geographical measures relating to the dispensed prescription, drug information (quantity, strength and presentation), the cost of the drug and the date on which the dispensed prescription was submitted to the NHS Business Services Authority.

Findings to date Findings include a study of end of life prescribing in the community among patients with cancer, an investigation of repeat prescriptions to derive measures of prior morbidity status in patients with cancer and studies of prescription activity surrounding the date of cancer diagnosis.

Future plans This English linked resource could be used for cancer epidemiological studies of diagnostic pathways, health outcomes and inequalities; to establish primary care comorbidity indices; and for guideline concordance studies of treatment, particularly hormonal therapy, a major treatment modality for breast and prostate cancer that has largely been delivered in the community for a number of years.

  • oncology
  • epidemiology
  • primary care
  • health informatics

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:

Strengths and limitations of this study

  • The linkage of community-dispensed prescriptions data to cancer registration data can greatly enhance our understanding of the patient pathway, as it can provide novel insight into symptom profiles and potentially identify patterns which could indicate opportunities for the earlier diagnosis of cancer.

  • The key strength of this data source is its population coverage. The pseudonymisation process adds to its value by making available an anonymised national control population, which can be used to understand how patients with cancer differ from the general population.

  • Sex is not recorded in the prescriptions data and therefore cannot be used to match non-cancer controls.

  • The indication for the dispensed drug is not included in the data, which may be problematic for drugs with multiple indications; for example, antidepressants can be indicated for depression or chronic pain.

  • For pharmacovigilance studies, one should be aware that dispensed drug information does not guarantee that the patient has taken the medication.

Introduction

National healthcare systems make it more feasible to routinely collect detailed healthcare data for the entire population, which are vital for population health research. For example, in the National Health Service (NHS) in England, Public Health England (PHE) holds cancer and other disease data; NHS Digital collects hospital activity1 and other process data, while the NHS Business Services Authority (NHSBSA) gathers administrative data for payments of services. NHSBSA gathers this through NHS Prescription Services, which calculates the remuneration and reimbursement owed to dispensing pharmacy contractors in the community across England, as per the Drug Tariff.2

The resulting data resource collected by NHSBSA is vast. Over one billion prescription items were dispensed in England in 2014, an increase of over 55% since 2004, at a cost of nearly £9 billion per year.3 As in many developed countries, prescribing rates have continued to rise in England, largely because of an ageing and increasingly multimorbid population.4 The importance of collecting and using such data is therefore increasing.

In April 2015, NHSBSA expanded the dataset to include the NHS number, the primary patient identifier in England. This transformed the data by allowing linkage to other health data, for example, national cancer registration data.

The National Cancer Registration and Analysis Service (NCRAS) in PHE is responsible for collecting data on all cases of cancer and certain benign tumours that occur in people living in England.5 Hospital trusts submit multiple feeds of electronic health data, which are processed and combined by trained registration officers into a clinically comprehensive record for each tumour. The majority of cancer care is delivered in secondary care; therefore, the data collected reflect this.

A Data Sharing Agreement was established between PHE and NHSBSA, for NHSBSA to supply pseudonymised dispensed prescriptions data to PHE, providing hitherto missing information on this aspect of community healthcare for patients with cancer. We aim to describe the dispensed prescriptions data, the linkage and key data quality implications, with a focus on the linked data available within PHE.

Cohort description

Data collected

The dispensed prescriptions data are collected from the Electronic Prescribing Service (EPS) and from paper FP10 prescription forms, which are captured using high-speed scanners and character recognition software, with human input when required. FP10 forms are the legal prescription forms issued to patients in England, which patients take to their chosen pharmacy. As of summer 2017, over half of the prescriptions dispensed in the community in England were issued via the EPS,6 a proportion that has increased since 2015. Both EPS and FP10 prescriptions are legal prescriptions, usually issued by a doctor but also by a nurse, pharmacist prescriber or supplementary prescriber. Only prescriptions dispensed at a community pharmacy are captured. The data therefore do not include drugs supplied during a hospital stay (including those supplied at discharge), drugs provided by private healthcare institutions, drugs supplied during a stay at a hospice or drugs supplied by other healthcare institutions, for example, urgent care centres.

Demographic, prescriber and drug details are collected for the dispensed prescriptions, including quantity, strength (dose) and presentation (tablet, injection etc). The full drug item is recorded as per the British National Formulary (BNF) V.68.7 Exceptions to this classification, including additional items not covered by the BNF such as dressings and appliances, are detailed in the BNF Classification booklet published by NHSBSA.8 A full data dictionary is shown in table 1, with the percentage completeness for each data item.

Table 1

Description of data items included, with data quality completeness of linked dispensed prescriptions cancer registry data, for patients diagnosed with malignant cancer (excluding non-melanoma skin cancer) after 1994 with a dispensed prescription record during April to July 2015 (n=1 680 764 patients and 33 669 294 prescription items)

Dispensed prescriptions data are available from April 2015; data for April to July 2015 are currently linked to cancer registration data. Data from August 2015 onwards will be available in due course, with updated linked data available in 2018. The objective is to link updated prescriptions data to the cancer registry data quarterly, with a lag of approximately 6 months behind real time. During April to July 2015, 332 655 118 dispensed prescription items among 29 481 344 individuals were identified. Ten per cent of the dispensed items and 6% of the individuals linked to the cancer registry data and therefore had a history of cancer (restricted to malignant tumours, excluding non-melanoma skin cancer [ICD-10 (International Statistical Classification of Diseases and Related Health Problems 10th Revision) C00–C97, excluding C44], diagnosed after 1994). Using these linked data, 90 840 patients were identified who were diagnosed with a malignant tumour during April to July 2015 and had a prescription dispensed during the same period. Among this cohort, 99% of dispensed prescriptions were issued by a General Practitioner (GP) practice9 (see table 2).
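As a quick consistency check on these counts, the shares of linked items and individuals can be recomputed from the totals in this paragraph and the linked cohort sizes in the table 1 caption. This is a minimal Python sketch; the variable names are illustrative, and the items-per-person ratios are derived arithmetic rather than figures reported by the authors.

```python
# Totals for April to July 2015, as quoted in the text.
TOTAL_ITEMS = 332_655_118
TOTAL_PEOPLE = 29_481_344

# Linked cohort (malignant cancer excl. NMSC, diagnosed after 1994), from the table 1 caption.
LINKED_ITEMS = 33_669_294
LINKED_PEOPLE = 1_680_764


def pct(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


item_share = pct(LINKED_ITEMS, TOTAL_ITEMS)        # about 10.1%, reported as 10%
people_share = pct(LINKED_PEOPLE, TOTAL_PEOPLE)    # about 5.7%, reported as 6%

# Derived ratio: mean dispensed items per person over the four months.
items_per_linked_person = LINKED_ITEMS / LINKED_PEOPLE
items_per_person_overall = TOTAL_ITEMS / TOTAL_PEOPLE

print(f"linked items:  {item_share:.1f}%")
print(f"linked people: {people_share:.1f}%")
print(f"items per person: {items_per_linked_person:.1f} (linked) vs {items_per_person_overall:.1f} (all)")
```

Linked individuals average roughly 20 items each over the four months against about 11 for the whole population; this is an unadjusted comparison that takes no account of age.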

Table 2

Number of dispensed prescription items by the prescribing institution for a cohort of patients with cancer diagnosed during April to July 2015, with a prescription dispensed in the same time period

Pseudonymisation process

The partnership between NCRAS and NHSBSA has allowed NCRAS to obtain information on prescriptions dispensed in the community for patients both with and without cancer. The information on (anonymous) individuals without cancer can be used as ‘controls’ in analyses. These data had to be anonymised because NCRAS does not have the legal permission to hold identifiable data on patients without a cancer diagnosis. A secure pseudonymisation process was therefore developed within NCRAS so that the data linkage could be performed without sharing the identities of individual patients.

The stages performed by both NHSBSA and NCRAS of the data extraction, pseudonymisation and linkage are detailed in figure 1. The pseudonymisation procedure was run at source by NHSBSA on the dispensed prescriptions data, and the pseudonymised data were sent to NCRAS on an encrypted disk. The same pseudonymisation procedure was also run by NCRAS on the identifiers of patients with cancer stored in the cancer registry. A script was then run which linked the pseudonymised dispensed prescriptions data with patient records in the cancer registry using the pseudonym identifiers. A subset of the pseudonymised row-level dispensed prescriptions data linked to the cancer registry data was therefore securely created.

Figure 1

Pseudonymisation and linkage process for the dispensed prescriptions data and cancer registry data. NCRAS, National Cancer Registration and Analysis Service; NHSBSA, NHS Business Services Authority.

The pseudonymisation procedure uses standard third-party encryption and hashing modules (a modified version of the OpenPseudonymiser approach10). The fields that identify a patient, namely the NHS number and date of birth, are hashed, and the hashed values replace the originals. Because NHSBSA (running the procedure at source on the prescriptions data) and NCRAS use an identical procedure, pseudonyms in the dispensed prescriptions data can be matched, where possible, with those in the cancer registry without revealing the original patient details. A ‘brute force’ approach could theoretically be used to recover the original values, but with current technology the probability of success is minuscule;11 in practice, therefore, NHS numbers cannot be reidentified.
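The pseudonymise-then-link steps can be sketched in Python. This is not the modified OpenPseudonymiser procedure used by NHSBSA and NCRAS, whose configuration is not described here; as a stand-in it uses a keyed hash (HMAC-SHA-256 from the standard library) over the normalised NHS number and date of birth. The shared key, the normalisation rule and all identifiers below are hypothetical. Keying the hash is one common way to make brute-force enumeration of the identifier space impractical for anyone without the key.

```python
import hashlib
import hmac

# Hypothetical shared secret agreed between the two organisations; a stand-in
# for whatever key or salt the real procedure uses.
SHARED_KEY = b"example-shared-secret"


def pseudonymise(nhs_number: str, date_of_birth: str, key: bytes = SHARED_KEY) -> str:
    """Return a keyed SHA-256 pseudonym for an NHS number and ISO date of birth."""
    normalised = nhs_number.replace(" ", "") + "|" + date_of_birth
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


# Prescriptions side: pseudonymised at source; only the pseudonym travels with each item.
# All identifiers below are fictitious.
prescriptions = [
    (pseudonymise("999 000 0001", "1950-03-14"), "item-001"),
    (pseudonymise("999 000 0002", "1962-11-02"), "item-002"),
    (pseudonymise("999 000 0001", "1950-03-14"), "item-003"),
]

# Registry side: the identical procedure run independently on registry identifiers.
registry = {pseudonymise("9990000001", "1950-03-14"): "tumour-A"}

# Linkage: keep prescription items whose pseudonym is also present in the registry.
linked = [(item, registry[p]) for p, item in prescriptions if p in registry]
print(linked)  # [('item-001', 'tumour-A'), ('item-003', 'tumour-A')]
```

Both sides must apply exactly the same normalisation and key: a difference such as spaces in the NHS number or a different date format would silently prevent a match, which is why this sketch strips spaces before hashing.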

Data quality

NHS Prescription Services reprocess a random sample of 50 000 dispensed prescription items a month to assess the accuracy of these data. As of December 2016, the processing accuracy was 99.5%.12
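To give a sense of the precision of the monthly accuracy audit, a normal-approximation (Wald) 95% interval can be placed around the 99.5% figure. This Python sketch assumes, without confirmation from the source, that the figure is estimated from a single month's simple random sample of 50 000 items.

```python
import math


def wald_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation interval for a proportion (z=1.96 gives 95%)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


# Assumption: 99.5% accuracy estimated from one month's random sample of 50 000 items.
low, high = wald_interval(0.995, 50_000)
print(f"{100 * low:.2f}% to {100 * high:.2f}%")
```

Under these assumptions the interval is roughly ±0.06 percentage points, so sampling error is small relative to the 0.5% of items processed inaccurately.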

Preliminary analyses of the linked data highlighted several data quality issues. Several of these relate to FP10 forms, which account for 73% of the items in the April to July 2015 data. For these items, the date recorded is the month in which the claim was submitted to NHSBSA, rather than the date on which the drug was dispensed. This leads to inconsistent date orders; for example, the date of the patient's death precedes the date of prescription for 0.3% of dispensed prescription items. Among FP10 items, age at prescription is missing for 19% (in the unlinked data). For patients with cancer, however, age at prescription can be approximated from the prescription date and the patient's date of birth (from the cancer registration data). For items with a recorded age at prescription, this calculation highlighted inconsistencies, defined as a difference in age of more than 1 year, for 0.2% of dispensed prescription items in the linked data. Examples included a recorded age of 2 years on the prescription record, compared with a calculated age of 102 years.
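The two checks described above (death recorded before the prescription date, and a recorded age differing by more than 1 year from the age calculated from the registry date of birth) can be expressed as simple row-level rules. This Python sketch uses a hypothetical `LinkedItem` record; the field names are illustrative, not the dataset's actual column names. For FP10 items the prescription date is the claim month (represented here as the first day of that month), so a flagged date order may reflect late submission rather than an error.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class LinkedItem:
    item_id: str
    prescription_date: date        # for FP10 items, the claim month (day set to 1)
    recorded_age: Optional[int]    # age on the prescription record; None if missing
    date_of_birth: date            # from the cancer registry
    date_of_death: Optional[date]  # from the cancer registry; None if no death recorded


def age_on(dob: date, on: date) -> int:
    """Completed years of age on a given date."""
    return on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))


def quality_flags(item: LinkedItem) -> list[str]:
    """Return the data quality flags raised by one linked item."""
    flags = []
    if item.date_of_death is not None and item.date_of_death < item.prescription_date:
        flags.append("death_before_prescription")
    if item.recorded_age is None:
        flags.append("age_missing")  # age can still be derived from the registry DOB
    elif abs(item.recorded_age - age_on(item.date_of_birth, item.prescription_date)) > 1:
        flags.append("age_inconsistent")
    return flags


# Hypothetical items.
items = [
    LinkedItem("a", date(2015, 5, 1), 2, date(1913, 2, 10), None),
    LinkedItem("b", date(2015, 6, 1), None, date(1948, 8, 30), date(2015, 5, 20)),
    LinkedItem("c", date(2015, 4, 1), 66, date(1948, 8, 30), None),
]
for it in items:
    print(it.item_id, quality_flags(it))
```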

NHSBSA processing of Electronic Transfer of Prescriptions (ETP) messages was interrupted on a small number of days, and no dispensed prescriptions are recorded for the first day of April or of May because of system downtime. A spike in the number of dispensed prescriptions at the end of each affected month may indicate a recording bias arising from bulk processing of the missed records. A further issue is that the net ingredient cost of a particular drug can change from month to month, so assessments of cost must adjust for this; this field is present for both FP10 and EPS items.
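Because only EPS items carry a full dispensing date, gaps and catch-up spikes like those described above can be looked for in daily EPS counts. The Python sketch below flags weekdays with no records and weekdays whose count exceeds twice the weekday median; the weekday restriction, the threshold and the simulated counts are all assumptions for illustration.

```python
import calendar
from datetime import date
from statistics import median


def audit_month(year: int, month: int, daily_counts: dict[date, int], spike_ratio: float = 2.0):
    """Return (weekdays with no records, weekdays above spike_ratio x the weekday median).

    `daily_counts` maps a dispensing date to the number of EPS items recorded on it;
    dates absent from the mapping are treated as having zero records.
    """
    n_days = calendar.monthrange(year, month)[1]
    weekdays = [date(year, month, d) for d in range(1, n_days + 1)
                if date(year, month, d).weekday() < 5]  # Monday=0 ... Friday=4
    typical = median([daily_counts.get(d, 0) for d in weekdays])
    missing = [d for d in weekdays if daily_counts.get(d, 0) == 0]
    spikes = [d for d in weekdays if daily_counts.get(d, 0) > spike_ratio * typical]
    return missing, spikes


# Simulated EPS counts for April 2015: a flat 1 000 items per weekday,
# no records on the first day (downtime) and a catch-up spike on the last.
counts = {date(2015, 4, d): 1000 for d in range(1, 31) if date(2015, 4, d).weekday() < 5}
counts[date(2015, 4, 1)] = 0
counts[date(2015, 4, 30)] = 3000

missing, spikes = audit_month(2015, 4, counts)
print("no records:", missing)
print("spikes:", spikes)
```

Using the median rather than the mean keeps a single zero day or spike from shifting the baseline against which the other days are judged.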

Comparing the linked data with the cancer registration data showed systematic differences between patients with cancer who were and were not dispensed a prescription. To demonstrate this, a cohort of patients diagnosed with malignant cancer (ICD-10: C00–C97, excluding C44) during April to July 2015 was identified. Of these 100 424 patients, 90 840 (90%) received a prescription in the same period. The distribution of demographic and tumour factors, for example, age and cancer type, differed between patients with and without a dispensed prescription record in that period (figure 2). For all but two factors the confidence intervals did not overlap; however, apart from age, the absolute differences were small. The most marked difference was for age at cancer diagnosis: for example, 49% of patients with a dispensed prescription were aged 70–89, compared with only 18% of patients without one. Variation by age is clinically expected, as individuals often need treatment for an increasing number of conditions as they age (so the background rate of prescribing changes), and patients with cancer who have no prescriptions dispensed in the community may be particularly unwell and admitted to hospital. This highlights the need to control for age in analyses, particularly when matching to the general population. For sex, cancer site, stage at diagnosis and ethnicity, the absolute differences were small (≤1.0).
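The contrast for age can be illustrated by placing 95% intervals around the two reported proportions aged 70–89. The article does not state how its confidence intervals were computed; this Python sketch assumes Wald intervals and uses the rounded percentages from the text, with the group without a prescription taken as 100 424 − 90 840 = 9 584 patients.

```python
import math


def wald_ci(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a proportion."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if two intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


cohort = 100_424            # diagnosed April to July 2015
with_rx = 90_840            # dispensed a prescription in the same period
without_rx = cohort - with_rx

ci_with = wald_ci(0.49, with_rx)        # proportion aged 70-89, with a prescription
ci_without = wald_ci(0.18, without_rx)  # proportion aged 70-89, without one

print(f"with a prescription: {with_rx / cohort:.1%} of the cohort")
print(f"aged 70-89, with:    {ci_with[0]:.3f} to {ci_with[1]:.3f}")
print(f"aged 70-89, without: {ci_without[0]:.3f} to {ci_without[1]:.3f}")
print("intervals overlap:", overlaps(ci_with, ci_without))
```

With groups this large the intervals are narrow, which is why even small absolute differences for other factors can produce non-overlapping intervals.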

Figure 2

Representativeness of linked cancer registry and dispensed prescription data as compared with cancer registry data alone, by key patient and tumour characteristics (age at diagnosis, ethnicity, sex, tumour site and stage at cancer diagnosis). *Indicates non-overlapping confidence intervals. 

Patient and public involvement

Patients and the public were not involved in the development of this study. Data for this study are based on information collected by the NHS.

Findings to date

Timely, record-level data on prescriptions dispensed in the community are a rich source for public health, especially when coverage is national and near complete. Linking dispensed prescriptions data to national cancer registration data creates a powerful resource for understanding healthcare use over the course of an illness. To our knowledge, this may become the largest resource of its kind globally, owing to the size of England's population and the longitudinal coverage of the national cancer registry. Projects of interest to NCRAS include a study of end of life prescribing in the community among patients with cancer, an investigation of repeat prescriptions to derive measures of prior morbidity in patients with cancer and studies of prescription activity around the date of cancer diagnosis.13 This work has not yet (as of March 2018) been published in peer-reviewed journals.

A very similar resource, the Prescribing Information System, is available in Scotland, where a unique numeric patient identifier allows linkage to other Scottish healthcare data, including the Scottish National Cancer Registry. Scotland's population is smaller than England's, but the benefit of such a linked resource has been demonstrated.14 The national structure of healthcare in Denmark has also allowed a similar resource to be created, and epidemiological studies relating to early diagnosis have recently emerged.15–17 This English linked resource could be used for cancer epidemiological studies of diagnostic pathways, health outcomes and inequalities; to establish primary care comorbidity indices; and for guideline concordance studies of treatment. Hormonal therapy is one example: a major treatment modality for breast and prostate cancer, it has largely been delivered in the community for a number of years.

Strengths and limitations

The linkage of community-dispensed prescriptions data to cancer registration data can greatly enhance our understanding of the patient pathway. It can provide novel insight into symptom profiles and improve our understanding of long-term trends in the patterns of drugs dispensed before and after a diagnosis of cancer (the latter may be useful as a proxy for quality of life). It could also help identify patterns that may indicate opportunities for the earlier diagnosis of cancer. The key strength of this data source is its population coverage. Linkage to the cancer registration data also provides the date and cause of death, allowing effective censoring. The linked dispensed prescriptions and cancer registration data can in turn be linked to other datasets held by NCRAS, including Hospital Episode Statistics,18 the Radiotherapy Dataset19 and the Systemic Anti-Cancer Therapy dataset.20 The pseudonymisation process adds further value by making available an anonymised national control population, which can be used to understand how patients with cancer differ from the general population.
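A control population drawn from the pseudonymised, unlinked prescriptions data can only be matched on characteristics the prescriptions data hold, which excludes sex. A minimal Python sketch of frequency matching on 5-year age band follows; the band width, the 2:1 ratio, the record layout and the assumption that the pool contains only pseudonyms that did not link to the registry are all illustrative choices, not the authors' method. Because age is missing for some FP10 items, records without an age would need separate handling before matching.

```python
import random
from collections import defaultdict


def age_band(age: int, width: int = 5) -> int:
    """Lower bound of the age band containing `age` (e.g. 72 -> 70 for width 5)."""
    return age // width * width


def frequency_match(cases, pool, ratio=2, seed=0):
    """Sample `ratio` controls per case from `pool` within the same age band.

    `cases` and `pool` are lists of (pseudonym, age) pairs. Sex is not used
    because it is not recorded in the prescriptions data. Sampling is without
    replacement; bands with too few candidates return as many as are available.
    """
    rng = random.Random(seed)
    pool_by_band = defaultdict(list)
    for pseudonym, age in pool:
        pool_by_band[age_band(age)].append(pseudonym)

    needed = defaultdict(int)
    for _, age in cases:
        needed[age_band(age)] += ratio

    controls = []
    for band, k in needed.items():
        candidates = pool_by_band.get(band, [])
        controls.extend(rng.sample(candidates, min(k, len(candidates))))
    return controls


# Hypothetical data: three cases and a pool of 300 unlinked pseudonyms aged 50-79.
cases = [("case-1", 72), ("case-2", 74), ("case-3", 58)]
pool = [(f"ctrl-{i}", 50 + i % 30) for i in range(300)]
controls = frequency_match(cases, pool)
print(len(controls), controls[:3])
```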

There are a number of limitations to this data resource, as with all routinely collected healthcare data.21 22 First, prescriptions dispensed in a private setting or a prison setting, or without an NHS number recorded, are not captured. These are estimated to account for less than 3%,23 less than 1%24 25 and 10% of all prescriptions dispensed, respectively. In addition, prescriptions that were written but not dispensed, or dispensed but not submitted by the pharmacy to NHS Prescription Services, are not captured, although this is thought to be minimal because the dispenser would not otherwise be reimbursed. Sex is not captured in the dispensed prescriptions dataset, nor is any other item from which sex could be derived (eg, title). Because sex is available from the cancer registry for patients with cancer, this is only problematic for the control population, where it limits the type of comparison that can be made.

The drug indication is not recorded on the prescription and therefore cannot be captured. For drugs with multiple indications, this may be problematic, as it may be unclear which underlying condition required the prescription. The patient's prescribing history could be used to approximate the indication, and future work could link drugs to their likely indications as recommended by the BNF. Ideally, the patient's medical history as recorded in primary care would be available, but no national data source for this has been identified. Many primary care databases exist, for example, the Clinical Practice Research Datalink and The Health Improvement Network database, but these cover only 6.9%26 and 6.2%27 of the UK population, respectively.

The full date of the dispensed prescription and the prescribing organisation's postcode are only available for prescriptions dispensed via the EPS. In the linked data (patients with cancer diagnosed after 1994), the proportion of dispensed prescription items from FP10 forms, and therefore without the day of prescription or the prescriber's postcode, was 76% in April, 74% in May, 71% in June and 70% in July 2015, a proportion that is continually falling. The EPS is currently in phase 4 of deployment and release 2 of the system, with each phase improving functionality for both GP practices and pharmacies and making the system easier to use. The latest statistics (as of 6 November 2017) showed that 91.6% of GP practices and 99.2% of pharmacies were using the EPS.6

Two further limitations are primarily relevant for pharmacovigilance studies, particularly long-term studies investigating whether a certain drug leads to cancer. First, the information captured relates to drugs dispensed, irrespective of whether the patient actually took them; the data can nevertheless be used as a proxy for an underlying disease or the presence of a disease risk factor. Second, for studies requiring detailed information on drug strength, text fields must be parsed and the amount of active ingredient calculated accurately from combinations of the available data items.
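As an illustration of the parsing work described above, the Python sketch below extracts a strength from a drug description string and multiplies it by the dispensed quantity. The description formats, the units handled (mg and micrograms, optionally per ml) and the meaning of `quantity` (dose units for solids, millilitres for liquids) are assumptions; real presentation names are more varied, and combination products with two strengths would need dedicated rules.

```python
import re
from typing import Optional

# Hypothetical patterns for strings such as "Tamoxifen 20mg tablets" or
# "Morphine sulfate 10mg/5ml oral solution".
LIQUID = re.compile(r"(\d+(?:\.\d+)?)\s*(mg|microgram)s?\s*/\s*(\d+(?:\.\d+)?)\s*ml", re.IGNORECASE)
SOLID = re.compile(r"(\d+(?:\.\d+)?)\s*(mg|microgram)s?\b", re.IGNORECASE)

TO_MG = {"mg": 1.0, "microgram": 0.001}


def total_active_mg(description: str, quantity: float) -> Optional[float]:
    """Estimate total active ingredient (mg) for one dispensed item, or None if unparsed.

    Assumes `quantity` is in millilitres for per-ml liquids and in dose units
    (tablets, capsules) otherwise. Only the first strength found is used.
    """
    m = LIQUID.search(description)
    if m:
        amount, unit, volume = float(m.group(1)), m.group(2).lower(), float(m.group(3))
        return amount * TO_MG[unit] / volume * quantity
    m = SOLID.search(description)
    if m:
        amount, unit = float(m.group(1)), m.group(2).lower()
        return amount * TO_MG[unit] * quantity
    return None


print(total_active_mg("Tamoxifen 20mg tablets", 30))                    # 600.0
print(total_active_mg("Morphine sulfate 10mg/5ml oral solution", 300))  # 600.0
print(total_active_mg("Dressing 10cm x 10cm", 10))                      # None
```

The liquid pattern is tried first because a per-ml strength also contains a bare "mg" figure that the solid pattern would otherwise pick up.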

Collaboration

The Data Sharing Agreement outlined the terms of the partnership, data transfer and subsequent use. Under these terms, NHSBSA will remain the primary data owner for the prescriptions dataset and will continue to be responsible for the continuity, quality, timeliness and availability of the data it contains. PHE will assume the role of data controller in common, as appropriate, for all data transferred to it from NHSBSA under the agreement. PHE will be responsible for determining and approving the specific purposes, within the objectives of the agreed data exchange, for which the linked dispensed prescriptions and cancer registration dataset is used to support its statutory functions and core remit. PHE will also be responsible for reviewing and approving the release of extracts of the linked data.

PHE has been granted specific legal permission to collect information about patients with cancer for specific purposes, without the need to seek consent. These purposes include health improvement and service provision. This permission was granted to PHE through Section 251 of the NHS Act 2006, and this support is reviewed annually by the Confidentiality Advisory Group of the Health Research Authority.28 PHE manages the release of potentially identifiable data through the Office for Data Release (ODR). The ODR provides a common governance framework for responding to requests to access PHE data, and is subject to strict confidentiality provisions in line with the requirements of the Common Law Duty of Confidentiality,29 the Data Protection Act 199830 (to be superseded by the General Data Protection Regulation (EU) 2016/679, which will take effect on 25 May 2018)31 and the seven Caldicott principles.32 Applications to access these linked prescriptions data for patients with cancer should be directed through the ODR,33 and application forms are available on its website.33

The ODR accepts applications from UK, EEA and international organisations; however, approvals to process any data controlled by PHE will be subject to adequate safeguards being established with the data recipient to ensure that: the level of protection afforded to individuals by UK data protection laws is not undermined; the purpose of any request complements the permissions to process the data without consent granted to PHE by the Secretary of State under the Health Service (Control of Patient Information) Regulations 200234 and that appropriate ethical assurances are met.

Conclusion

Understanding the healthcare of patients with cancer in the community is paramount, both following diagnosis and to improve earlier diagnosis. Repurposing administrative datasets is an efficient way of achieving this, provided that the data quality and content are sufficient. National sources of reimbursed prescriptions, such as the resource described in this profile, have been valuable to epidemiological research in Scandinavian countries, particularly Denmark.15–17 We have demonstrated the value of dispensed prescriptions data for research and health monitoring in England, and the added value when these data are linked to cancer registration information. The long-term coverage of the national cancer registration data has produced a large linked resource even with only 4 months of dispensed prescriptions data, and as more months of data are added the resource will grow to an unparalleled scale. The implications of data quality must be considered when designing studies, but the value to cancer research is clear.

Acknowledgements

The authors acknowledge and thank the entire team at NHS Business Services Authority for supporting and facilitating this partnership. Data for this study are based on patient-level information collected by the NHS, as part of the care and support of cancer patients. The data are collated, maintained and quality assured by the National Cancer Registration and Analysis Service, which is part of Public Health England (PHE).

References

  1. 1.
  2. 2.
  3. 3.
  4. 4.
  5. 5.
  6. 6.
  7. 7.
  8. 8.
  9. 9.
  10. 10.
  11. 11.
  12. 12.
  13. 13.
  14. 14.
  15. 15.
  16. 16.
  17. 17.
  18. 18.
  19. 19.
  20. 20.
  21. 21.
  22. 22.
  23. 23.
  24. 24.
  25. 25.
  26. 26.
  27. 27.
  28. 28.
  29. 29.
  30. 30.
  31. 31.
  32. 32.
  33. 33.
  34. 34.


  • Contributors Conception and design of the work: KEH, RB, JR, LE-B, GL, MGM and TR. Acquisition and pseudonymisation of the data: KEH, BS, PG, AH, KH, GM, NM and RR. Analysis of the data: KEH, VHC and KW. All authors made substantial contributions to the interpretation of the findings. All authors contributed to drafting the manuscript or revising it critically for important intellectual content and approved the final version submitted. All authors have agreed to be accountable for all aspects of the work.

  • Funding GL is supported by a Cancer Research UK Advanced Clinician Scientist Fellowship (C18081/A18180).

  • Competing interests None declared.

  • Patient consent Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement Collaborations can be proposed to the National Cancer Registration and Analysis Service via Enquiries for data access can be made to Public Health England’s Office for Data Release (
