Linkage of Maternity Hospital Episode Statistics data to birth registration and notification records for births in England 2005-2014: methods. A population-based birth cohort study

Introduction: Maternity Hospital Episode Statistics (HES) data for 2005–2014 were linked to birth registration and birth notification data (previously known as NHS Numbers for Babies or NN4B) to bring together some key demographic and clinical data items not otherwise available at a national level. The linkage algorithm that was previously used to link 2005–2007 data was revised to improve the linkage rate and reduce the number of duplicate HES records.
Methods: Birth registration and notification linked records from the Office for National Statistics (‘ONS birth records’) were further linked to Maternity HES delivery and birth records using the NHS Number and other direct identifiers if the NHS Number was missing.
Results: For the period 2005–2014, over 94% of birth registration and notification records were correctly linked to HES delivery records. Two per cent of the ONS birth records were incorrectly linked to the HES delivery record and 5% of ONS birth records were linked to more than one HES delivery record. Therefore, a considerable amount of time was spent in quality assuring these files.
Conclusion: The linkage rate for birth registration and notification records to HES delivery records steadily improved from 2005 to 2014 due to improvement in the quality and completeness of patient identifiers in both HES and birth notification data.
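As an illustration of the rule stated in the Methods summary (link on the NHS Number, and fall back to other direct identifiers only when the NHS Number is missing), a minimal Python sketch follows. It is not the NHS Digital algorithm, which has several ranked steps that are not reproduced here. The function name `link_record` and the fallback fields `dob`, `sex` and `postcode` are assumptions chosen for illustration only.

```python
from typing import Dict, List, Optional

# Hypothetical fallback identifiers; the real algorithm's fields are not given in this text.
FALLBACK_FIELDS = ("dob", "sex", "postcode")


def link_record(birth: Dict, hes_records: List[Dict]) -> Optional[Dict]:
    """Return the first HES record that links to the given ONS birth record.

    If the birth record has an NHS number, match on that alone.
    If the NHS number is missing, match on all fallback identifiers instead.
    """
    nhs_number = birth.get("nhs_number")
    if nhs_number:
        for rec in hes_records:
            if rec.get("nhs_number") == nhs_number:
                return rec
        return None

    # NHS number missing: require every fallback identifier to be present and equal.
    key = tuple(birth.get(f) for f in FALLBACK_FIELDS)
    if any(value is None for value in key):
        return None
    for rec in hes_records:
        if tuple(rec.get(f) for f in FALLBACK_FIELDS) == key:
            return rec
    return None
```

Returning only the first match keeps the sketch short. A real linkage would also need to detect birth records that match more than one HES record, which is the duplicate problem the abstract describes.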


Introduction
This is clear.

Methods
The second paragraph of the record linkage section suggests that step 8 of the usual NHS Digital linkage algorithm was omitted for this project but this does not tally with Table 2: please clarify.
Can a reference be provided for the paper on quality assuring the linked dataset?

Results
In the first paragraph of the mother file section can the authors clarify why NHS Digital returned a higher number of births linked to HESID (624,326) than there were in the birth registration/notification file (617,613)?
In the second paragraph of the baby file section, the authors state that a lower proportion of birth registration/notification records linked to HES birth records than to HES delivery records but this does not tally with Tables 1 and 3.
In the linkage bias section, the authors state that >2% of records relating to Black African babies did not link to HES delivery records but Table 4 suggests it was >5%.

Discussion
In the first paragraph the authors state that 4-6% of birth registration/notification records were 'incorrectly linked' to HES delivery records. Can this statement be clarified?
Reference 14 is cited but is missing from the reference list.

Conclusions
The second paragraph may well be true but it is not supported by results presented in this paper. Please either provide supporting results or amend.

Tables
Should the heading of the 3rd column in Table 1 read including (rather than excluding) duplicated HES delivery records?
In general, the headings of the tables could be tidied up and made more consistent. For example, Table 3 does not provide results by match rank, and the terminology for specific datasets is variable, e.g. birth notification/NN4B.
No reporting checklist accompanies this paper as far as I can tell. A STROBE or, preferably, a RECORD statement would be appropriate.

REVIEWER
Toan C Ong, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA. No competing interest.
REVIEW RETURNED 14-Jun-2017

GENERAL COMMENTS
Thank you for the opportunity to review this paper. This paper describes the efforts to link clinical and demographic data to ultimately generate a more complete dataset of mothers and babies. It is very encouraging to learn that the authors were able to link large datasets using a small set of overlapping linkage variables. While there are very interesting insights from the results, there are major concerns:
- The value of linking real patient data is apparent. However, that alone is insufficient to justify a separate publication. I struggled to see the scientific contributions of this paper. This entire paper could fit into the Results and Discussion of a more comprehensive paper.
- Other than the description of the linkage variables and how they were compared (exact or partial), the record linkage method was not adequately described.
- It was not clear how the linkage result was validated. The authors mention that the process is described elsewhere, but I disagree with that separation.
- Many observations were described vaguely without further articulation. For example, quote from the paper: "The linkage rate was 3 per cent lower for multiple births, 2 per cent lower in mother's aged under 15 and 3 per cent lower for those aged 40 years and above." Questions: Lower than what? Why lower?
- Using the NHS number helped link the majority of the datasets. It would be interesting to know how many records were linked using just the NHS number as the linkage variable. What is the performance when the NHS number is absent?
Overall, I find the findings of this paper very interesting. However, the authors should consider a comprehensive paper that discusses in detail the linkage methods, the results, and the validation process. Such a paper would be much more impactful than separate ones.

Reviewer Name: Rachael Wood
Abstract Comment: The second paragraph of the results section introduces findings that are not in the main paper. If the authors wish to present these findings, they should add a substantive table showing completeness of key data items in different data sources.
Response: Thanks for pointing this out. These findings were identified in the 2005 to 2007 data that we linked previously, and that paper was published in the ONS journal Health Statistics Quarterly. I have now removed these findings from the abstract.

Methods
Comment: The second paragraph of the record linkage section suggests that step 8 of the usual NHS Digital linkage algorithm was omitted for this project but this does not tally with Table 2: please clarify.
Response: Sorry to have confused you. I have now re-written this paragraph to say that step 8 of the linkage algorithm was used in our study.
Comment: Can a reference be provided for the paper on quality assuring the linked dataset?
Response: We had hoped that the paper by Gill Harper, titled 'Quality assuring linked ONS birth records and maternity Hospital Episode Statistics delivery records on singleton and multiple births in England 2005 to 2014', would be published by BMJ Open at the same time as this linkage paper, but this may not happen. I have therefore briefly described in this paper how the linkage was quality assured and how long it took.

Results
Comment: In the first paragraph of the mother file section can the authors clarify why NHS Digital returned a higher number of births linked to HESID (624,326) than there were in the birth registration/notification file (617,613)?
Response: The number of births linked to HESID was higher because it included both the old and the new HESID for some women. This normally happens when a woman is allocated a new HESID and it subsequently becomes evident that she had already been assigned a HESID.
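The effect described in this response can be sketched in a few lines of Python: when some births carry both an old and a new HESID, the number of linked rows exceeds the number of distinct births. The figures quoted in the comment differ by 624,326 − 617,613 = 6,713 rows. The sketch does not claim that every one of those rows arises this way, and the function name `summarise_links` and the `(birth_id, hesid)` pair layout are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarise_links(links: Iterable[Tuple[str, str]]) -> Dict:
    """Count linked rows and distinct births, and list births with more than one HESID.

    `links` is an iterable of (birth_id, hesid) pairs, one per returned linked row.
    """
    hesids_by_birth = defaultdict(set)
    linked_rows = 0
    for birth_id, hesid in links:
        hesids_by_birth[birth_id].add(hesid)
        linked_rows += 1

    # Births that were returned with two or more different HESIDs (e.g. old and new).
    multiple = {b: sorted(h) for b, h in hesids_by_birth.items() if len(h) > 1}
    return {
        "linked_rows": linked_rows,
        "distinct_births": len(hesids_by_birth),
        "births_with_multiple_hesids": multiple,
    }
```

A summary like this makes the row inflation visible before analysis, so that old and new HESIDs for the same woman can be reconciled rather than counted as separate links.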
Comment: In the second paragraph of the baby file section, the authors state that a lower proportion of birth registration/notification records linked to HES birth records than to HES delivery records but this does not tally with Tables 1 and 3.
Response: Thanks for picking this up. You are correct that the proportion of birth registration/notification records linked to HES birth records is higher than HES delivery records so the text has now been corrected.
Comment: In the linkage bias section, the authors state that >2% of records relating to Black African babies did not link to HES delivery records but Table 4 suggests it was >5%.