Introduction Maternity Hospital Episode Statistics (HES) data for 2005–2014 were linked to birth registration and birth notification data (previously known as NHS Numbers for Babies or NN4B) to bring together some key demographic and clinical data items not otherwise available at a national level. The linkage algorithm that was previously used to link 2005–2007 data was revised to improve the linkage rate and reduce the number of duplicate HES records.
Methods Birth registration and notification linked records from the Office for National Statistics (‘ONS birth records’) were further linked to Maternity HES delivery and birth records using the NHS Number and other direct identifiers if the NHS Number was missing.
Results For the period 2005–2014, over 94% of birth registration and notification records were correctly linked to HES delivery records. Two per cent of the ONS birth records were incorrectly linked to a HES delivery record and 5% were linked to more than one HES delivery record. As a result, a considerable amount of time was spent quality assuring these files.
Conclusion The linkage rate for birth registration and notification records to HES delivery records steadily improved from 2005 to 2014 due to improvement in the quality and completeness of patient identifiers in both HES and birth notification data.
This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See: http://creativecommons.org/licenses/by/4.0/
Strengths and limitations of this study
Linking three national datasets on births greatly increased the number of variables available for analysis.
The findings are relevant to other users of trusted third party linkage, who should not assume that datasets linked using patient identifiers are error free, as linkage errors may affect any analyses carried out on them.
The data are held in a secure environment at the Office for National Statistics, so access is restricted, but they can be used by approved researchers.
When a baby is born in England and Wales, data are recorded in several separate information systems. Birth registration collects mainly socio-demographic data. A smaller set of data is recorded when the birth is notified to the NHS and the NHS Number, a unique identifier, is issued. Data about care at delivery are recorded in Hospital Episode Statistics (HES) if the birth occurs in England. Data about care at delivery in Wales are recorded in the Patient Episode Database for Wales, which is linked to the National Community Child Health database. Each of these systems includes common data items, such as the baby’s and mother’s dates of birth, postcode of residence and NHS Number, which can be used as identifiers for record linkage.
In England and Wales, all live births must be registered within 42 days. The data recorded at registration include names, address of residence, place of birth, occupation of the parents and country of birth of the mother and father.1 The introduction of the interim NHS Numbers for Babies (NN4B) Service in 2002 provided the opportunity to obtain information such as gestational age and the baby’s ethnicity. Information on gestational age at birth is of key importance, as babies born preterm, before 37 completed weeks of gestation, are at particularly high risk of morbidity and mortality in the early years of life.2–4
A collaborative project was set up in 2004 between City University London, the Office for National Statistics (ONS) and the Welsh Assembly to link these datasets for all births that occurred in England and Wales from 2005 to 2007. Stage 1 of the project involved linkage of birth registration data with the birth notification data (previously known as the NN4B dataset) and assessment of the quality and completeness of the notification data. This was piloted on the 2005 data.5 6 Since 2007, these datasets have been routinely linked by ONS and gestation-specific infant mortality and birth statistics have been published annually.7
Stage 2 of the project involved linkage of the dataset created in stage 1 to Maternity HES and assessment of data quality and completeness by comparison with the birth registration or notification data, where possible.8 The linkage of Welsh data for all three years (2005–2007) was carried out separately.9
The primary focus of the first two projects was to test the feasibility of the linkages and assess the quality of the linked datasets. The next project aimed to answer a specific set of research questions. In 2013, a project was funded to describe and analyse daily, weekly and yearly cyclical variations in births and their outcome and explore the potential implications of the patterns observed for NHS staffing and for service users. This involved extension of the linkage in stages 1 and 2 to include births occurring in England and Wales between 1 January 2005 and 31 December 2014. This article describes the linkage of data for England. As before, data for Wales were linked separately.
Several variables are common to all three data sources, Maternity HES, birth registration and birth notification, as can be seen in table 1. In addition, some data items are unique to each data source and linkage enables new analyses using these linked data. For example, it is now possible to compare time of birth with birth outcomes, and report on the outcomes of birth by care at birth in terms of onset of labour and mode of delivery by gestational age, time of day and day of the week.
The source data, birth registration, birth notification and Maternity HES are described in detail in the article describing the linkage of data for 2005 and 2006.8
There are two types of maternity records in HES: the delivery record and the birth record. Both types of records consist of an admitted patient care record with an additional 19 fields, in an appended baby ‘tail’.
The HES delivery record is a mother-based record containing the mother’s details with a maternity tail and a baby tail which can accommodate up to nine babies born in one maternity. In contrast, the birth registration and notification linked data consist of one record per baby. Therefore, the linkage was based on linking babies to their mothers' records.
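Because the HES delivery record holds up to nine babies per maternity while the ONS linked data hold one record per baby, the delivery record effectively has to be expanded to one row per baby before babies can be compared with ONS records. The sketch below illustrates this under stated assumptions: the class and field names (`DeliveryRecord`, `BabyTail`, `birth_order` and so on) are hypothetical and are not the HES field names.

```python
from dataclasses import dataclass
from typing import List

MAX_BABIES = 9  # the HES baby tail can hold up to nine babies per maternity


@dataclass
class BabyTail:
    # Hypothetical subset of baby-tail items
    birth_order: int
    sex: str
    birthweight_g: int


@dataclass
class DeliveryRecord:
    # One mother-based HES delivery record (hypothetical fields)
    record_id: str
    mother_nhs_number: str
    babies: List[BabyTail]


@dataclass
class BabyRow:
    # One row per baby, carrying the mother's identifiers for linkage
    delivery_record_id: str
    mother_nhs_number: str
    birth_order: int
    sex: str
    birthweight_g: int


def explode_delivery(record: DeliveryRecord) -> List[BabyRow]:
    """Expand a mother-based delivery record into one row per baby."""
    if len(record.babies) > MAX_BABIES:
        raise ValueError("a baby tail holds at most nine babies")
    return [
        BabyRow(record.record_id, record.mother_nhs_number,
                b.birth_order, b.sex, b.birthweight_g)
        for b in record.babies
    ]
```

For twins, one delivery record yields two rows that share the mother's NHS Number, so each can be matched to its own ONS birth record.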
A HES birth record is generated for the baby. It contains the baby’s details and also has a baby tail containing the same type of information that is recorded in the corresponding baby tail of the mother’s delivery record.
The baby tail data are less complete than the rest of the HES data. There are a number of reasons for this incompleteness and for the data quality issues, including:
- trusts submitting significantly more delivery episodes than birth episodes
- trusts that record a higher number of delivery episodes failing to submit data on birth episodes.
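A simple trust-level check for the issues listed above could compare counts of delivery and birth episodes. Because multiple births produce more than one birth episode per delivery, a complete submission would broadly be expected to have at least as many birth episodes as delivery episodes. The sketch below is illustrative only: the trust codes, the input dictionaries and the `ratio_threshold` of 1.1 standing in for "significantly higher" are assumptions, not values used by NHS Digital.

```python
from typing import Dict, List


def flag_trusts(delivery_episodes: Dict[str, int],
                birth_episodes: Dict[str, int],
                ratio_threshold: float = 1.1) -> List[str]:
    """Return trusts (sorted) whose delivery episodes exceed birth episodes
    by more than ratio_threshold, or which submitted no birth episodes."""
    flagged = []
    for trust, deliveries in sorted(delivery_episodes.items()):
        births = birth_episodes.get(trust, 0)
        if deliveries > 0 and births == 0:
            flagged.append(trust)  # delivery episodes but no birth episodes
        elif births > 0 and deliveries / births > ratio_threshold:
            flagged.append(trust)  # far more deliveries than births
    return flagged
```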
Patient identifiers including mother’s and baby’s NHS numbers, postcode of residence, mother’s and baby’s date of birth and baby’s sex together with a unique record ID were extracted by ONS from the linked birth registration and notification file and sent to the data linkage team at the Health and Social Care Information Centre, now known as NHS Digital.
The linkage algorithm previously used to link 2005–2007 births was initially used to link data for 2008–2014. ONS identifiers were first linked to the HES index to obtain HES patient identifiers known as HESIDs.10 These were then linked to the HES delivery records, but the number of duplicate HES delivery records linked to ONS birth records was much higher than when the data for 2005–2007 were linked. NHS Digital therefore recommended using the in-house linkage algorithm it uses routinely to link ONS death registration data to HES,11 with one modification: in our study, step 8 of the algorithm used only the NHS Number, as shown in online supplementary appendix A. This revised algorithm was piloted on the 2005 data, and the number of duplicate HES records linked to ONS birth records and the linkage rate were ascertained before data for 2006–2014 were linked.
Supplementary file 1
The linked data provided by NHS Digital consisted of two files for each financial year from 2004/2005 to 2014/2015. One file contained ONS birth records linked to HES delivery records, and the other contained ONS birth records linked to HES baby records.
The linked data were accessed by researchers from City, University of London in the secure setting of the Virtual Microdata Laboratory facility at ONS. The researchers concerned had ONS Approved Researcher status.
The quality of linkage was assessed to ensure that the ONS birth record was linked to the correct delivery record in HES. This involved use of deterministic stepwise rules based on a combination of data items common to both datasets such as place of birth, birthweight, date of birth of the baby, gestational age, multiplicity and sex of baby.
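The checks described above can be sketched as a small set of ordered rules comparing the six common items. The actual rules used in the study are not given here, so the ordering, the tolerances for birthweight and gestational age, and the labels ‘confirmed’, ‘probable’ and ‘suspect’ are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict

BIRTHWEIGHT_TOLERANCE_G = 50  # hypothetical tolerance
GESTATION_TOLERANCE_WKS = 1   # hypothetical tolerance


@dataclass
class BirthItems:
    """Items common to the ONS birth record and the HES delivery record."""
    place_of_birth: str
    birthweight_g: int
    baby_dob: date
    gestation_weeks: int
    multiplicity: int
    sex: str


def agreement(ons: BirthItems, hes: BirthItems) -> Dict[str, bool]:
    """Compare each common item, allowing small tolerances on measurements."""
    return {
        "place": ons.place_of_birth == hes.place_of_birth,
        "birthweight": abs(ons.birthweight_g - hes.birthweight_g) <= BIRTHWEIGHT_TOLERANCE_G,
        "dob": ons.baby_dob == hes.baby_dob,
        "gestation": abs(ons.gestation_weeks - hes.gestation_weeks) <= GESTATION_TOLERANCE_WKS,
        "multiplicity": ons.multiplicity == hes.multiplicity,
        "sex": ons.sex == hes.sex,
    }


def classify_link(ons: BirthItems, hes: BirthItems) -> str:
    """Apply hypothetical stepwise rules, strictest first."""
    a = agreement(ons, hes)
    if all(a.values()):
        return "confirmed"
    core = a["dob"] and a["sex"] and a["multiplicity"]
    if core and (a["place"] or a["birthweight"] or a["gestation"]):
        return "probable"
    return "suspect"
```

Missing values, which are common in Maternity HES, are not handled in this sketch and would need their own rule.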
A pilot study was carried out using the 2005 data. The file sent to NHS Digital consisted of 617 613 babies who were either born in England or resident in England. The ‘resident in England’ category was used for births that occurred at home in the ONS linked dataset. NHS Digital first linked these records to the HES index to obtain the HESID and then to the HES delivery records. NHS Digital returned two files to ONS: one of 624 326 records with a HESID and a second of 582 963 ONS birth records that were linked to the HES delivery record. The number of records with a HESID exceeded the number of ONS births because it included both old and new HESIDs for some women. This happens when a woman is allocated a new HESID and it subsequently becomes evident that she had already been assigned one. In addition, there were 25 188 duplicate HES delivery records, that is, where the ONS birth record was linked to more than one HES delivery record (table 2). By using the revised linkage algorithm, the proportion of duplicate HES delivery records linked to ONS birth records was reduced from 6% to 4%.8 Data for 2006–2014 were then linked to HES delivery records using the revised linkage algorithm shown in online supplementary appendix A.
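As a rough check on the pilot figures, taking the 617 613 ONS records sent to NHS Digital as the denominator (an assumption, since the denominator for the 4% figure is not stated) and treating the 582 963 linked records as distinct ONS birth records:

```latex
\begin{aligned}
\text{linkage rate} &= \frac{582\,963}{617\,613} \approx 0.944 \; (94.4\%) \\
\text{duplicate rate} &= \frac{25\,188}{617\,613} \approx 0.041 \; (4.1\%) \\
\text{extra HESID records} &= 624\,326 - 617\,613 = 6\,713
\end{aligned}
```

The 94.4% is consistent with the 94% linkage rate reported for 2005, and the 4.1% with the reduced duplicate proportion of 4%.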
Around 66% of the previously linked ONS birth registration and notification records were linked to HES delivery records in stage 1 of the linkage algorithm shown in table 3. This stage matched records with the same mother’s NHS Number, exact date of birth, sex and exact full postcode. A further 29% of the ONS birth records were matched to HES delivery records using exact date of birth, the mother’s postcode of residence and sex (stage 6 of the algorithm). About 5% of the records were linked using a combination of mother’s NHS Number, exact or partial date of birth, sex and postcode. ONS birth records that were not linked to HES accounted for 3% of all records.
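The two stages described above can be sketched as deterministic stepwise matching: each ONS record is tried against an exact-match key at each stage in turn and is linked at the first stage that yields a match. This is a minimal sketch, not the NHS Digital algorithm: it implements only stage 1 and stage 6 as described in the text, omits the intermediate stages in online supplementary appendix A, and uses hypothetical dictionary keys (`mother_nhs_number`, `dob`, `sex`, `postcode`). Returning every matching HES identifier, rather than one, makes duplicate links visible.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]

# Only the two stages described in the text; intermediate stages omitted.
STAGES: List[Tuple[str, Tuple[str, ...]]] = [
    ("stage 1", ("mother_nhs_number", "dob", "sex", "postcode")),
    ("stage 6", ("dob", "sex", "postcode")),
]


def match_key(record: Record, fields: Tuple[str, ...]) -> Optional[Tuple[str, ...]]:
    """Exact-match key, or None if any field is missing or blank."""
    values = tuple(record.get(f) or "" for f in fields)
    return None if "" in values else values


def stepwise_link(ons_records: List[Record],
                  hes_records: List[Record]) -> Dict[str, Tuple[str, List[str]]]:
    """Map each ONS id to (stage, matching HES ids) at the first stage that matches."""
    links: Dict[str, Tuple[str, List[str]]] = {}
    for stage, fields in STAGES:
        index: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
        for hes in hes_records:
            key = match_key(hes, fields)
            if key is not None:
                index[key].append(hes["id"])
        for ons in ons_records:
            if ons["id"] in links:
                continue  # already linked at an earlier stage
            key = match_key(ons, fields)
            if key is not None and key in index:
                links[ons["id"]] = (stage, list(index[key]))
    return links
```

A record with a missing mother’s NHS Number cannot form a stage 1 key and so falls through to stage 6, which mirrors why stage 6 accounted for fewer links as NHS Number completeness improved.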
Linkage of ONS birth records to HES delivery records for births from 2005 to 2014 showed that the proportion of records linked at stage 1 of the algorithm increased from 66% in 2005 to 93% in 2014. There was a corresponding decrease, from 29% to 3%, in the proportion linked at stage 6, which does not use the mother’s NHS Number.
Each year there were about 36 000 duplicate HES delivery records linked to ONS birth records, that is, where the same ONS birth record was linked to multiple HES delivery records. During assessment of the quality of linkage, the HES delivery record whose mother’s and baby’s information matched the ONS birth record and which had the greatest amount of information on onset of labour and method of delivery was retained for analysis. The other records were discarded. In addition, 13 300 HES delivery records were incorrectly linked to ONS birth records. It took over 71 days to assess the quality of linkage and to produce a final linked dataset consisting of one ONS birth record linked to the relevant HES delivery record.
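The retention rule can be sketched as: discard candidates whose information does not match the ONS record, then keep the one with the most complete onset of labour and method of delivery fields. The field names and the `same_baby` matching predicate below are hypothetical; in the study the match was judged on the mother’s and baby’s information, which is only approximated here by the baby’s date of birth and sex.

```python
from typing import Callable, Dict, List, Optional

HesRecord = Dict[str, Optional[str]]


def info_score(hes: HesRecord) -> int:
    """Number of the two key delivery fields that are recorded."""
    return sum(1 for f in ("onset_of_labour", "method_of_delivery") if hes.get(f))


def same_baby(ons: Dict[str, str], hes: HesRecord) -> bool:
    """Hypothetical consistency check on baby's date of birth and sex."""
    return ons["baby_dob"] == hes.get("baby_dob") and ons["sex"] == hes.get("sex")


def select_record(ons: Dict[str, str], candidates: List[HesRecord],
                  matches: Callable[[Dict[str, str], HesRecord], bool] = same_baby
                  ) -> Optional[HesRecord]:
    """Keep the consistent candidate with the most delivery information.

    Returns None if no candidate is consistent with the ONS record; ties are
    resolved in favour of the first candidate in input order."""
    consistent = [h for h in candidates if matches(ons, h)]
    if not consistent:
        return None
    return max(consistent, key=info_score)
```

A `None` result corresponds to the incorrectly linked records, where no candidate agrees with the ONS birth record.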
The baby file was much more straightforward to link than the mother file as it involved a one-to-one link between an ONS birth record and a HES birth record, also referred to as the HES baby record.
The numbers of HES birth records linked to ONS birth records, for each year from 2005 to 2014, were higher than the numbers of HES delivery records linked to the ONS birth records (see table 4). The quality of linkage of the baby file has yet to be assessed.
Although the linkage rate increased from 94% in 2005 to 97% in 2014, there were statistically significant differences between the distributions of records that were linked to HES delivery records by NHS Digital and those that were not, in terms of multiplicity, age of mother, ethnicity and region of residence (table 5). The linkage rate was 3% lower for multiple births than for singletons, and 2% lower for mothers aged under 15 years and for those aged 40 years and over than for all other age groups. A comparison by baby’s ethnicity showed that over 5% of black African babies were not linked to Maternity HES. Over 98% of babies resident in East Midlands, North West, South Central, South West, West Midlands, and Yorkshire and The Humber were successfully linked to Maternity HES, whereas this proportion was slightly lower, 95%, among babies resident in London.
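Linkage rates by subgroup, and a test of whether they differ, can be computed from counts of linked and unlinked records in each group. The text does not state which test was used; the sketch below uses a Pearson chi-squared test as one common choice, with hypothetical group labels, and returns the statistic and degrees of freedom rather than a p-value.

```python
from typing import Dict, Tuple

Counts = Dict[str, Tuple[int, int]]  # group -> (linked, not linked)


def linkage_rates(counts: Counts) -> Dict[str, float]:
    """Proportion of records linked in each group."""
    return {g: linked / (linked + unlinked) for g, (linked, unlinked) in counts.items()}


def chi_squared(counts: Counts) -> Tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom for a 2 x k table."""
    groups = list(counts)
    group_totals = [sum(counts[g]) for g in groups]
    status_totals = [sum(counts[g][i] for g in groups) for i in (0, 1)]
    n = sum(group_totals)
    if n == 0 or 0 in group_totals or 0 in status_totals:
        raise ValueError("every group and linkage status needs a non-zero total")
    stat = 0.0
    for j, g in enumerate(groups):
        for i in (0, 1):
            expected = status_totals[i] * group_totals[j] / n
            stat += (counts[g][i] - expected) ** 2 / expected
    return stat, len(groups) - 1
```

With one degree of freedom, as when comparing singletons with multiple births, a statistic above 3.841 corresponds to p < 0.05.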
Although the data linkage team at NHS Digital has experience of linking external datasets to HES, and we used a linkage algorithm similar to that routinely used by NHS Digital to link ONS death records to HES records, there were issues with the quality of linkage. In the period 2005–2014, 2% of HES delivery records were incorrectly linked to ONS birth records, as shown by differences between the HES delivery and ONS birth records in common data items such as place of birth, baby’s date of birth, gestational age, birthweight, multiplicity and sex. In addition, 366 000 duplicate HES delivery records were linked to ONS birth records. As a result, a considerable amount of time was spent quality assuring these files.
The number of birth registration and notification records linked to HES delivery records using the NHS Number increased from 2005 to 2014. This was not surprising, as completeness of the mother’s NHS Number in the registration and notification linked records improved over time: in 2005 the mother’s NHS Number was present in over two-thirds of the records, and this increased to over 90% in 2014. A small proportion of HES records also had the mother’s NHS Number missing. A further quarter of the registration and notification linked records in 2005 were linked using exact date of birth, sex and postcode, a proportion that fell to 3% in 2014. There were concerns about using postcode in the linkage algorithm for earlier years, as the HES index may not hold all historical postcodes of residence of the mother, while the postcode in the registration and notification linked data was recorded at the time of registration. The mother may have moved since having the baby, and this variable is also subject to recording and reporting errors.
Overall, a linkage rate of over 90% was achieved, and it improved over time, especially in 2014, when less time had elapsed between the birth and linkage, so the HESID was less likely to have changed. This suggests that the HESID at birth could be retained as a separate field for linkage.
Although the linkage rate for ONS birth records to HES birth records was higher than that for delivery records, we did not assess the quality of this linkage, and our previous linkage study showed that many duplicate HES birth records were linked to ONS birth records.8 In addition, NHS Digital acknowledges that a high proportion of baby records are known to be missing in Maternity HES.12 Because HES delivery records include information about both the baby and the mother, the quality of information in HES was assessed using the delivery records.
While ONS birth registration data have remained of consistently high quality, there have been issues with data quality and completeness in Maternity HES.8 12 13 Births and deliveries in London are under-represented in Maternity HES, which could be due to under-reporting, or a complete lack of reporting, of births by several hospitals. HES also currently captures few home births and none occurring in private hospitals, even though data about all births should be submitted to Maternity HES.
This study shows that it is possible to link a large majority of the linked birth registration and notification records to Maternity HES records, but linkage would be considerably more valuable if data quality and completeness improved in Maternity HES. Information about parity, onset of labour, method of delivery and complications in pregnancy can only be obtained at a national level from Maternity HES, so linking all three national datasets on births and maternity would expand the scope and range of data available.
The authors would like to thank all the relevant colleagues in the Office for National Statistics, NHS Digital (formerly the Health and Social Care Information Centre) and the NHS Wales Informatics Service for their help. In particular, they would like to thank Emma Gordon, Joanne Evans, Claudia Wells, Alex Lloyd, Justine Pooley, Elizabeth Mclaren and members of the VML Team at the Office for National Statistics, Ariane Alamdari and Garry Coleman at NHS Digital and Gareth John at the NHS Informatics Service. The authors would also like to thank everyone who took part in our Public and Patient Involvement activities for the advice and insights they gave and the members of our Study Advisory Group for their help and advice.
Contributors Nirupa Dattani was responsible for the linkage methodology and writing the first draft of the paper, and Professor Alison Macfarlane redrafted it and provided final approval of the version to be published. We agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Funding This work was supported by the National Institute for Health Research HS&DR Programme, project number HS&DR 12/136/93. Project title: ‘Births and their outcome: analysing the daily, weekly and yearly cycle and their implications for the NHS’.
Disclaimer The views and opinions expressed here are those of the authors and do not necessarily reflect those of the HS&DR programme, NIHR, NHS or the Department of Health. The data were processed in the secure environment of the Office for National Statistics’ Virtual Microdata Laboratory and the following disclaimer applies: This work contains statistical data from ONS which is Crown Copyright. The use of the ONS statistical data in this work does not imply the endorsement of the ONS in relation to the interpretation or analysis of the statistical data. This work uses research datasets which may not exactly reproduce National Statistics aggregates.
Competing interests None declared.
Patient consent Not required.
Ethics approval East London and City Local Research Ethics Committee 1 and its successors.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement The authors do not have permission to supply data or identifiable information to third parties, including other researchers, but they have permission under Section 251 of the Health Service (Control of Patient Information) Regulations 2002 to analyse patient identifiable data for England and Wales without consent and create a research database which could be accessed by other researchers using the VML at the Office for National Statistics. Anyone wishing to access the linked datasets for research purposes should apply to the Office for National Statistics and NHS Digital as well as to the Confidentiality Advisory Group of the Health Research Authority to access patient identifiable data without consent. We are currently in discussion with ONS and NHS Digital about the application process.