
Original research
Linkage of maternity hospital episode statistics birth records to birth registration and notification records for births in England 2005–2006: quality assurance of linkage
Victoria Coathup (1), Alison Macfarlane (2), Maria Quigley (1)

(1) National Perinatal Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
(2) Centre for Maternal and Child Health Research, School of Health Sciences, City University, London, UK

Correspondence to Dr Victoria Coathup; victoria.coathup{at}


Objectives The objectives of this study were to describe the methods used to assess the quality of linkage between records of babies’ birth registration and hospital birth records, and to evaluate the potential bias that may be introduced because of these methods.

Design/setting Data from the civil registration and the notification of births previously linked by the Office for National Statistics (ONS) had been further linked to birth records from the Hospital Episode Statistics (HES) for babies born in England. We developed a deterministic, six-stage algorithm to assess the quality of this linkage.
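As an illustration of what a deterministic, staged check on linked pairs can look like, the minimal sketch below tests each linked pair against progressively less strict agreement rules and reports the first stage at which the pair agrees. The field names (`date_of_birth`, `sex`, `birthweight_g`, `postcode`), the three stages and the function `confirm_link` are all hypothetical and chosen only for illustration; they are not the six stages or the variables used in this study.

```python
from typing import Optional

# Hypothetical stages, strictest first. Each stage lists the fields that must
# agree exactly between the two records. These are NOT the study's six stages.
STAGES = [
    ("date_of_birth", "sex", "birthweight_g", "postcode"),
    ("date_of_birth", "sex", "birthweight_g"),
    ("date_of_birth", "sex"),
]


def confirm_link(ons: dict, hes: dict) -> Optional[int]:
    """Return the 1-based stage at which a linked pair agrees, or None.

    A field counts as agreeing only if it is present (not None) in the ONS
    record and equal to the value in the HES record. A pair that agrees at
    no stage would be flagged for review as a possible incorrect link.
    """
    for stage_no, fields in enumerate(STAGES, start=1):
        if all(ons.get(f) is not None and ons.get(f) == hes.get(f) for f in fields):
            return stage_no
    return None


# Made-up example records (not real data).
ons_record = {"date_of_birth": "2005-03-14", "sex": "F",
              "birthweight_g": 3400, "postcode": "OX3 7LF"}
hes_record = {"date_of_birth": "2005-03-14", "sex": "F",
              "birthweight_g": 3400, "postcode": None}

print(confirm_link(ons_record, hes_record))
```

Ordering the stages from strictest to most relaxed means a pair is always credited with the strongest agreement it can achieve, and missing values in one dataset push a pair to a later stage rather than rejecting it outright.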

Participants All 1 170 790 live, singleton births, occurring in National Health Service hospitals in England between 1 January 2005 and 31 December 2006.

Primary outcome measure The primary outcome was the number of successful links between ONS birth records and HES birth records. Rates of successful linkage were calculated for the cohort and the characteristics associated with unsuccessful linkage were identified.

Results Approximately 92% (1 074 572) of the birth registration records were successfully linked with a HES birth record. Data quality and completeness were somewhat poorer in HES birth records than in the linked birth registration and birth notification records. The quality assurance algorithms identified 1456 incorrect linkages (<1%). Compared with linked records, birth records were more likely to remain unlinked if babies were of white ethnic origin; born to unmarried mothers; born in East England, London, North West England or the West Midlands; or born in March.
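The headline proportions can be reproduced from the counts reported above. The short calculation below is a sketch only: the variable names are illustrative, and taking the linked records as the denominator for the incorrect-link share is an assumption, since the abstract does not state which denominator was used.

```python
# Counts as reported in the abstract.
total_births = 1_170_790    # live singleton births in the cohort
linked = 1_074_572          # birth registration records linked to a HES birth record
incorrect_links = 1_456     # incorrect linkages found by the quality assurance algorithms

linkage_rate = linked / total_births
unlinked = total_births - linked

# Assumption: share of incorrect links is taken relative to the linked records.
incorrect_share = incorrect_links / linked

print(f"Linked:          {linkage_rate:.1%}")
print(f"Unlinked births: {unlinked}")
print(f"Incorrect links: {incorrect_share:.2%} of linked records")
```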

Conclusions It is possible to link administrative datasets to create large cohorts, allowing researchers to explore important questions about exposures and childhood outcomes. Missing data, coding errors and inconsistencies mean it is important that the quality of linkage is assessed prior to analysis.

  • births
  • hospital records
  • data-linkage
  • hospital episode statistics

This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See:


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • Contributors All authors were involved in the design of the study, with input from Rod Gibson (data management consultant) and Nirupa Dattani (data analyst at City, University of London). RG and ND were involved in the cleaning of the ONS births dataset. VC performed the data manipulation and analysis. All authors were involved in the interpretation of the data. VC was responsible for the initial draft of the manuscript. AM and MQ reviewed and contributed to drafts of the manuscript, and all authors have reviewed the final version.

  • Funding This work was funded by the Medical Research Council: MR/M01228X/1. VC and MQ had full access to all the data in the study and final responsibility for the decision to submit for publication.

  • Disclaimer The funder had no input into the study design, data analysis, interpretation of results or writing of the manuscript.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were involved in the design, or conduct, or reporting, or dissemination plans of this research. Refer to the Methods section for further details.

  • Patient consent for publication Not required.

  • Ethics approval Ethics approval for this study was granted by the Health Research Authority Research Ethics Committee (South West – Frenchay; REC reference Ethics 15/SW/0294). The TIGAR study used data linked as part of a previous study led by City, University of London.6 For that study, ethics approval 05/Q0603/108 and subsequent substantial amendments were granted by East London and City Local Research Ethics Committee 1 and its successors. Permission to use patient-identifiable data without consent under Regulation 5 of the Health Service (Control of Patient Information) Regulations 2002 (‘section 251 support’) was initially granted by the Patient Information Advisory Group PIAG 2-10(g)/2005. Renewals and amendments and a second permission, CAG 9-08(b)2014, under Regulation 5 of the Health Service (Control of Patient Information) Regulations 2002 (or ‘same legislation’) were granted by the Secretary of State for Health and the Health Research Authority following advice from the Confidentiality Advisory Group (CAG) to use patient-identifiable data without consent and create a research database held at the ONS for analyses relating to inequalities in the outcome of pregnancy and to inform maternity service users about the outcome of midwifery, obstetric and neonatal care. For the TIGAR study, permission from the Health and Social Care Information Centre for the work described in this article was included in Data Sharing Agreement NIC-273840-N0N0 N.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement The authors do not have permission to supply data or identifiable information to third parties, including other researchers, but the team at City, University of London has permission under Regulation 5 of the Health Service (Control of Patient Information) Regulations 2002 to analyse patient-identifiable data for England and Wales without consent and create a research database that could be accessed by other researchers using the SRS at the ONS. The TIGAR team has permission under Regulation 5 of the Health Service (Control of Patient Information) Regulations 2002 to analyse these. Anyone wishing to access the linked datasets for research purposes should apply via the Confidentiality Advisory Group (CAG) to the Health Research Authority to access patient-identifiable data without consent and then to the ONS and NHS Digital. In the first instance, enquiries about access to the data should be addressed to Alison Macfarlane.
