The United Kingdom National Neonatal Research Database: A validation study

Abstract

Background

The National Neonatal Research Database (NNRD) is a rich repository of pre-defined clinical data extracted at regular intervals from point-of-care, clinician-entered electronic patient records on all admissions to National Health Service neonatal units in England, Wales, and Scotland. We describe population coverage for England and assess data completeness and accuracy.

Methods

We determined population coverage of the NNRD in 2008–2014 through comparison with data on live births in England from the Office for National Statistics. We determined the completeness of seven data items on the NNRD. We assessed the accuracy of 44 data items (16 patient characteristics, 17 processes, 11 clinical outcomes) for infants enrolled in the multi-centre randomised controlled trial, Probiotics in Preterm Study (PiPs). We compared NNRD to PiPs data, the gold standard, and calculated discordancy rates using predefined criteria, and sensitivity, specificity and positive predictive values (PPV) of binary outcomes.

Results

The NNRD holds complete population data for England for infants born alive from 25+0 to 31+6 (completed weeks) of gestation; and 70% and 90% for those born at 23 and 24 weeks respectively. Completeness of patient characteristics was over 90%. Data were linked for 2257 episodes of care received by 1258 of the 1310 babies recruited to PiPs. Discordancy rates were <5% for 13/16 patient characteristics (exceptions: mode of delivery 8.7%; maternal ethnicity 10.2%; Lower layer Super Output Area 16.5%); <5% for 9/16 processes (exceptions: medical treatment for patent ductus arteriosus 6.1%, high-dependency days 10.2%, central line days 11.2%, type of first milk 22.3%; and during first 14 days, summary of types of milk 13.8%; number of days of antibiotics 9.0%; whether antacid given 5.1%); and <5% for 10/11 clinical outcomes (exception: bronchopulmonary dysplasia, defined as oxygen dependency at 36 weeks postmenstrual age, 13.3%). The specificity of NNRD data was >85% for all outcomes; sensitivity ranged from 50–100%; PPV ranged from 58.8% (95% CI 40.8–75.4%) for porencephalic cyst to 99.7% (95% CI 99.2–99.9%) for survival to discharge.

Conclusions

The completeness and quality of data held in the NNRD is high, providing assurance in relation to use for multiple purposes, including national audit, health service evaluations, quality improvement, and research.

Introduction

Accurate data describing interventions and outcomes from well-defined populations are important for monitoring and planning healthcare while also offering opportunities for national and international benchmarking and a platform for clinical research. Worldwide there is a paucity of population based neonatal data [1]. In the United Kingdom healthcare is provided for all through the National Health Service (NHS) funded directly from taxation, and offers opportunity for complete capture of population data.

Neonatal care is a nationally commissioned specialised service delivered through networks of hospitals. The implementation of routine electronic data capture across all networks provides a unique opportunity to acquire population-based data without additional data collection systems. Data on newborn infants receiving hospital care, whether on the neonatal unit, postnatal ward or transitional care ward, are captured on electronic patient records (EPR) held on a web-based platform, BadgerNet, managed by an approved NHS supplier, Clevermed Ltd (Level 6, Edinburgh Quay, 133 Fountainbridge, Edinburgh, EH3 9QG, www.clevermed.com). An extract of over 400 items for each baby forms the Neonatal Data Set (NDS) [2], approved in 2013 as a national NHS Information Standard by the NHS Information Standards Board (now NHS Digital) (ISB1595 version 1.0; now Standardisation Committee for Care Information (SCCI) 1595) [3]. An increasing number of hospitals, currently including all of those in England, Wales and Scotland, are members of the UK Neonatal Collaborative (UKNC), and the necessary regulatory approvals are in place for the data from each of those hospitals to be transferred quarterly from the BadgerNet platform to the Neonatal Data Analysis Unit (NDAU), an independent research unit of Imperial College London set up in 2007. The NDAU has approvals to use these data to create the National Neonatal Research Database (NNRD). In 2012, the UKNC was formed, consisting of all NHS neonatal units contributing data to the NNRD. This database now includes details of 100,000 infants admitted to neonatal care each year; it has provided data for a wide range of NHS reports and research studies published in peer-reviewed journals [4–8].

Neonatal databases have been established in many countries; the NNRD differs from most by being compiled from EPR with no extra data collection. It is one of the largest clinical databases and holds the largest range of patient characteristics [1]. Data completeness and accuracy are also important considerations, yet formal quality assurance of databases is rarely reported and probably rarely undertaken [9]. In 2010, funding was secured by NDAU from the National Institute for Health Research (NIHR) Medicines for Neonates programme to explore the potential of the NNRD to facilitate research. In this study, we aimed to evaluate the population coverage and data quality of the NNRD data for English hospitals.

Ethics approval

The National Neonatal Research Database has Research Ethics Approval (London Queen Square Research Ethics Committee Reference number 16/LO/1930).

Methods

We compared data held in the NNRD with independently collected data from the Office for National Statistics [10] and the Probiotics in Preterm babies Study (PiPS) [11]. The latter was a multi-centre, double-blind, placebo-controlled, randomised trial funded by the Health Technology Assessment programme of the UK National Institute for Health Research. Twenty-four centres in South East England recruited patients between July 2010 and July 2013. PiPS trial data were collected using conventional paper Clinical Record Forms (CRF), subjected to a standard series of range, logic and missing data checks, and double entered onto a dedicated trial database fulfilling standards of ICH-GCP at the Clinical Trials Unit at the National Perinatal Epidemiology Unit, University of Oxford.

Data flows

EPR records are held by Clevermed Ltd and stored on a secure NHS server from which individual neonatal units access their data. The NDAU obtained approvals from the Caldicott Guardians of the NHS Trust of each contributing neonatal unit, to receive a predefined data extract (the Neonatal Data Set) from the EPR of each infant admission. Clevermed Ltd transmits these data to the NDAU, where the NNRD is formed. Data are ‘cleaned’ by applying completeness, logic and range checks.

Neonatal services are arranged so that babies move between hospitals according to their clinical need, thus the in-patient period between birth and the first discharge home (or death) may include several episodes in different hospitals. For each infant, to create the NNRD, a single record is compiled by linking the episodes of care across different neonatal units using a unique identifier created by Clevermed, the BadgerID.

The NNRD is held on the NHS servers of Chelsea & Westminster NHS Foundation Trust and updated quarterly using MS SQL and SAS programming to include updated patient records from the previous time period. To date, the NNRD contains data from 2006 onwards on over 800,000 infants admitted to NHS neonatal units, and over 10 million care-days. Fig 1 illustrates the data flows and examples of outputs from the NNRD.

The neonatal dataset

Infants are identified by their unique BadgerID; no patient identifiers (NHS numbers or names) are stored in the NNRD. Age in minutes from birth, and month and year of birth, are stored instead of exact dates. An episode of care is defined as a continuous admission in the same neonatal unit. An infant can have multiple episodes of care: for example, an infant transferred from hospital A to hospital B has two episodes, and a subsequent transfer back to hospital A would count as a third.
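The episode definition above can be illustrated with a minimal Python sketch. It is not the NDAU's actual linkage code (the text states that MS SQL and SAS are used); the function name `episodes_per_infant` and the record layout `(badger_id, age_minutes, unit)` are assumptions made purely for illustration.

```python
from collections import defaultdict
from itertools import groupby


def episodes_per_infant(daily_records):
    """Collapse daily records into episodes of care for each infant.

    daily_records: iterable of (badger_id, age_minutes, unit) tuples.
    This layout is hypothetical; it is not the NNRD schema.
    Returns {badger_id: [unit of episode 1, unit of episode 2, ...]}.
    """
    by_infant = defaultdict(list)
    for badger_id, age_minutes, unit in daily_records:
        by_infant[badger_id].append((age_minutes, unit))

    episodes = {}
    for badger_id, stays in by_infant.items():
        stays.sort()  # order each infant's records by age in minutes
        units_in_order = [unit for _, unit in stays]
        # A run of consecutive records in the same unit is one episode.
        episodes[badger_id] = [unit for unit, _ in groupby(units_in_order)]
    return episodes


# Hypothetical infant moved from hospital A to B and back to A.
records = [
    ("X1", 0, "A"), ("X1", 1440, "A"),
    ("X1", 2880, "B"), ("X1", 4320, "B"),
    ("X1", 5760, "A"),
]
print(episodes_per_infant(records))
```

Under these assumptions the example infant has three episodes (A, B, A), which matches the counting rule described in the text.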

There are three different types of data: demographic details (e.g. date and place of birth, birth weight) entered only once for all infants; episodic items (e.g. blood culture, clinical outcomes and diagnoses), which may be entered during each episode of care; and ‘daily’ items that include level of care (special/high dependency/intensive), which is categorised from raw data by embedded programming following data entry, and clinical interventions (e.g. respiratory support, type of feeds, surgical procedures, high cost drugs). Daily location and whether the infant’s mother is resident and providing care are items required to distinguish between infants cared for on a neonatal unit, postnatal ward or transitional care ward; these data are required for the categorisation of level of care. Diagnoses include fixed choice and some free-text items. Each data item is clearly defined in an accompanying meta-data set, and mapped to existing national standards and ICD codes; conversion to the international medical nomenclature, SNOMED CT, is underway [12]. The NDAU is the data guardian; the data controller is Chelsea & Westminster Hospital NHS Foundation Trust. The NDS was approved after data harmonisation was undertaken across NHS datasets, assisted by the NHS Data Dictionary team. This included a public consultation to obtain views on included data items, a process undertaken annually to revise the NDS to reflect current practice.

Population coverage and data quality

We determined the proportion of neonatal units in England contributing data to the NNRD, and the proportion of infants born in England with an NNRD record (by gestational week) in 2008–2014. We obtained denominator figures for the annual number of neonatal units in England and live births from the National Neonatal Audit Programme [13] and Office for National Statistics, respectively [10]. To examine NNRD data completeness, we calculated the percentage of missing data for seven items applicable to all infants (gestational age (GA), sex, birth-weight, antenatal steroids, mode of delivery, multiple birth and survival to discharge from neonatal care). In addition, for antenatal steroids and mode of delivery, we performed a subgroup analysis to determine whether completeness was higher among infants born <32 weeks GA, compared to all GA.
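The two calculations described above (percentage of missing data for an item, and percentage of live births with an NNRD record by gestational week) can be sketched in Python as follows. This is an illustrative sketch only: the function names, the dictionary-based record layout and all numbers in the example are hypothetical and are not taken from the NNRD or ONS data.

```python
def percent_missing(records, item):
    """Percentage of records in which `item` is absent or None."""
    if not records:
        raise ValueError("no records supplied")
    missing = sum(1 for record in records if record.get(item) is None)
    return 100.0 * missing / len(records)


def coverage_by_week(database_counts, live_births):
    """Percentage of live births with a database record, per gestational week.

    database_counts and live_births map gestational week -> count.
    Weeks with no live births in the denominator are skipped.
    """
    return {
        week: 100.0 * database_counts.get(week, 0) / births
        for week, births in live_births.items()
        if births > 0
    }


# Hypothetical records and counts, for illustration only.
records = [
    {"sex": "F", "mode_of_delivery": None},
    {"sex": "M", "mode_of_delivery": "vaginal"},
    {"sex": "F"},
    {"sex": None, "mode_of_delivery": "caesarean"},
]
print(percent_missing(records, "mode_of_delivery"))
print(coverage_by_week({23: 70, 24: 90}, {23: 100, 24: 100}))
```

Restricting `records` to infants born at <32 weeks GA before calling `percent_missing` would give the subgroup comparison described in the text.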

To assess data accuracy at patient level, we performed data linkage between the NNRD and PiPs trial database and compared the agreement between 44 pre-specified items present on both databases (16 patient characteristics, 17 processes, 11 clinical outcomes).

Levels of agreement with criteria for minor and major discordancy were predefined for all 44 items by two of the authors (CB & KC) (S1–S4 Tables). For instance, for a binary item such as whether or not an infant had surgery for a PDA, any difference was considered a major discordancy, whereas for an item such as the number of days that a central venous line had been in place, a tolerance of +/- 2 days was deemed acceptable, +/- 3–4 days a minor discordancy and +/- 5 or more days a major discordancy.

The 16 patient characteristics were expected date of delivery (EDD), GA, month of birth, year of birth, birth-weight, sex, five minute APGAR score, born in this hospital, singleton or multiple birth, birth order, maternal year of birth, maternal ethnicity, any antenatal corticosteroids given, caesarean or vaginal delivery, instrumental delivery and maternal lower layer super output area (LSOA) (derived from maternal postcode) [14]. In England, the smallest geographical area of practical use, that is, the level at which most national datasets are collected, is the LSOA [15]. These areas are revised after each decennial census to ensure that they contain around 1500 inhabitants. LSOA can be linked to the Index of Multiple Deprivation (IMD) 2010, which reports continuous scores for seven domains of deprivation for each LSOA in England and Wales [16].

The 17 processes were intensive care days, high dependency care days, central venous line days, length of stay, transfer to another hospital, discharge month, discharge year, surgery for patent ductus arteriosus (PDA), medical treatment for PDA, retinopathy of prematurity (ROP) treatment by laser or cryotherapy, day of first milk, type of first milk, summary of types of milk in the first 14 days, any antibiotics given and number of days of antibiotics in the first 14 days, any antacid given and number of days of antacid given in the first 14 days.

The 11 clinical outcomes were worst stage of ROP in any eye, bronchopulmonary dysplasia (defined as supplementary oxygen at 36 weeks postmenstrual age (PMA)), mechanical respiratory support at 36 weeks PMA, any diagnosis of perforated necrotising enterocolitis (NEC), any gastrointestinal perforation, any abdominal surgery for NEC, haemorrhagic parenchymal infarct, hydrocephalus, periventricular leucomalacia, porencephalic cyst, and survival to discharge from neonatal care.

Analyses were conducted at the level of the episode in each hospital and at infant-level for each infant’s total hospitalisation. The PiPS trial captured EDD, from which GA was calculated, and feeding data for the first 14 postnatal days. Therefore GA discordancy was only calculated for infants with EDD in both databases; ‘first feed’ discordancy was only calculated for infants fed within the first 14 days with no missing data prior to the first reported feed in the NNRD.
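The predefined discordancy bands can be expressed as a small classification rule. The sketch below, in Python, uses the central venous line example from the text (agreement within 2 days, minor at 3–4 days, major at 5 or more days) and the binary rule (any difference is major). The function names and the use of `None` for missing values are assumptions for illustration; the thresholds for other items are given in S1–S4 Tables and are not reproduced here.

```python
def classify_day_count(nnrd_days, pips_days, agree_within=2, minor_up_to=4):
    """Classify agreement between two day counts.

    Defaults follow the central venous line example in the text:
    difference <= 2 days agrees, 3-4 days is minor, >= 5 days is major.
    """
    if nnrd_days is None or pips_days is None:
        return "missing"  # excluded from discordancy calculations
    difference = abs(nnrd_days - pips_days)
    if difference <= agree_within:
        return "agree"
    if difference <= minor_up_to:
        return "minor"
    return "major"


def classify_binary(nnrd_value, pips_value):
    """For a binary item, any difference counts as a major discordancy."""
    if nnrd_value is None or pips_value is None:
        return "missing"
    return "agree" if nnrd_value == pips_value else "major"


print(classify_day_count(10, 12))   # within tolerance
print(classify_day_count(10, 14))   # 4 days apart
print(classify_day_count(10, 15))   # 5 days apart
print(classify_binary(True, False))
```

Keeping the thresholds as parameters reflects the fact that the bands were set item by item, so the same function can be reused with other limits.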

Statistical methods

We calculated the percentage of missing data for each item in the PiPs database and the NNRD, and minor and major discordances using the predefined criteria. To explore variation in completeness of data by centre, we presented the proportion of missing data for incomplete variables using centre-specific box-plots. Infants with missing data were excluded from comparisons. For continuous items, we calculated mean and median differences and 95% limits of agreement for the differences. For binary items, we calculated the percentage of infants for whom the NNRD and PiPS trial data differed, with the 95% confidence interval. For items with discordancy rates of less than 5%, we used the Poisson approximation to the Binomial to calculate confidence intervals; otherwise we used the Agresti and Coull method for Binomial confidence intervals as this method has better coverage properties [17]. In addition, for binary processes and outcomes, we calculated sensitivity and specificity, treating PiPs data as the gold standard. We report the prevalence of outcomes in both databases. Sensitivity = number of infants with the disease correctly identified by the NNRD / number of infants with the disease identified in the PiPS data. Specificity = number of infants without the disease correctly identified by the NNRD / number of infants without the disease in the PiPS data. PPV = number of infants with the disease correctly identified by the NNRD / number of infants (correctly or incorrectly) identified by the NNRD as having the disease. NPV = number of infants without the disease correctly identified by the NNRD / number of infants (correctly or incorrectly) identified by the NNRD as not having the disease (Fig 2). Analyses were performed using code written in SAS version 9.3 and STATA version 11.
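The four accuracy measures defined above, and the Agresti and Coull interval, can be sketched in Python as follows. This is not the authors' SAS or STATA code; the function names and the example counts are hypothetical, and the sketch assumes every denominator is non-zero.

```python
from math import sqrt
from statistics import NormalDist


def diagnostic_accuracy(tp, fp, fn, tn):
    """Accuracy of the database against the gold standard.

    tp: condition recorded in both; fp: in the database only;
    fn: in the gold standard only; tn: recorded in neither.
    Assumes all four denominators are non-zero.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def agresti_coull_interval(x, n, level=0.95):
    """Agresti-Coull confidence interval for a binomial proportion x/n."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    n_adj = n + z * z                   # adjusted sample size
    p_adj = (x + z * z / 2) / n_adj     # adjusted point estimate
    half_width = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


# Hypothetical 2x2 counts, for illustration only.
print(diagnostic_accuracy(tp=40, fp=10, fn=10, tn=940))
# Hypothetical discordancy of 10 infants out of 100.
print(agresti_coull_interval(10, 100))
```

The interval adds z²/2 pseudo-successes and z²/2 pseudo-failures before applying the usual normal approximation, which is what gives it better coverage than the simple Wald interval for proportions near 0 or 1.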

Fig 2. Sensitivity, specificity and positive predictive value of data held on NNRD.

https://doi.org/10.1371/journal.pone.0201815.g002

Regulatory approvals.

National Research Ethics Service approval was granted in 2010 to establish the NNRD (10/H0803/151). Caldicott Guardian and Lead Neonatal Clinician approvals from every NHS Trust are also held. A Parent Information Leaflet offers parents the opportunity to opt out, although to date this has not occurred. The Research Ethics Committee advised that the NNRD PiPS data comparison was a data quality assurance study and did not require research ethics approval; a data sharing agreement for this study was agreed between the National Perinatal Epidemiology Unit, where the PiPs database was held, and NDAU [11].

Results

NNRD population coverage and data completeness

The proportion of neonatal units in England contributing to the NNRD rose from 78% in 2008 to 100% (163) in 2012 (Fig 3). Closures and mergers resulted in fluctuations in the number of neonatal units over the years. The percentage of live births with an NNRD record increased over the years for all gestational ages, and has been fairly constant since 2012 (Fig 4). Between 2012 and 2014, almost 100% of infants born in England at a GA of 25–31 weeks had an NNRD record; the figures for live-born infants at 23w and 24w GA were lower, at 70% and over 90%, respectively. The percentage of infants with an NNRD record diminishes with increasing GA: 98% for infants at 32–33w, 90% at 34w, 60% at 35w, 40% at 36w and 20% at 37w. However, over time there has been an increase in the proportion of more mature live births (≥32 weeks GA) with an NNRD record (Fig 4).

Fig 3. Percentage of neonatal units in England contributing data to the NNRD 2008–2014.

https://doi.org/10.1371/journal.pone.0201815.g003

Fig 4. Bar chart showing percentage of infants born in England with an NNRD record.

https://doi.org/10.1371/journal.pone.0201815.g004

Table 1 shows the completeness of seven data items for 568,143 infants admitted over 2008–2015. At national level, well-completed data items (<1% missing) in the NNRD include GA, sex, multiple birth and birth-weight. Compared to all GA, the subgroup of infants born <32w GA had less missing data for antenatal corticosteroids (4% <32w GA; 6.7% all GA) and mode of delivery (6.7% <32w GA; 20.4% all GA).

Table 1. Percentage of infants with an NNRD record with complete data by year of birth.

https://doi.org/10.1371/journal.pone.0201815.t001

Assessment of data accuracy

Data were available for analysis for 1,310 infants recruited into the PiPs trial in the South East of England over 37 months from July 2010. Clevermed was able to provide a BadgerID for 1,280 (98%) infants. We excluded a further 22 infants who had episodes missing from the NNRD because the episodes occurred in a paediatric ward that does not use BadgerNet, because of inaccuracies in admission and discharge dates, or because of inconsistencies in the names of hospitals and NHS Trusts (which were stored as free text on the PiPs database but selected from a drop-down menu on BadgerNet). In total, we excluded 103 episodes of care and 52 infants, leaving a final dataset of 2257 episodes of care from 1258 infants for comparison (Fig 5). Infants with missing data were excluded from calculations of discordancy.

Fig 5. Records from PiPs Clinical Record Forms (CRF) and the NNRD.

https://doi.org/10.1371/journal.pone.0201815.g005

For baseline characteristics, the proportion of missing data was higher on the NNRD than on the PiPs database, and was >4% for EDD, Apgar score at 5 minutes, maternal ethnicity, maternal LSOA, and mode of delivery (Table 2). Box-plots show variation in data completeness across the 24 PiPS recruiting units for these five variables (Fig 6).

Fig 6. Box plot to show unit variation in data completeness across 24 neonatal units for five variables.

https://doi.org/10.1371/journal.pone.0201815.g006

Table 2. PiPs vs NNRD: Comparison of baseline infant and maternal characteristics.

https://doi.org/10.1371/journal.pone.0201815.t002

There was no major discordancy for month and year of birth; <1% discordancy for birth weight, sex, APGAR Score at 5 minutes, birth order; 1–3% discordancy for whether born in this hospital, singleton/multiple, maternal year of birth, antenatal steroids and instrumental delivery.

For EDD and GA, major discordancy (difference of ≥7 days) was low, at 4.1% (95% CI 3.0–5.5%) and 3% (95% CI 2.1–4.1%), respectively. Discordancy was highest for maternal ethnicity (10.2%, 95% CI 8.6–12.1%) and maternal Lower layer Super Output Area (16.5%, 95% CI 14.4–18.4%) (Table 2).

For processes/interventions, compared at episodic or infant-level, major discordances (difference of ≥5 days) were highest for days of high dependency care and central venous lines, at 10.2% (95% CI 9.0–11.5%) and 11.2% (95% CI 10.0–12.6%), respectively (Table 3). Discordancy for medical treatment of a PDA was 6.0%. For all other items (intensive care days, length of stay, transfer, discharge month and year, surgery for PDA, ROP treatment by laser or cryotherapy), discordancies were less than 5%.

Table 3. PiPs vs NNRD: Comparison of processes and interventions by episode or infant-level.

https://doi.org/10.1371/journal.pone.0201815.t003

For processes in the first 14 days, any antibiotics given had a low discordancy (0.6%, 95% CI 0.2–1.1%) but the number of days given had a major discordancy (>2 days difference) of 9.0% (95% CI 7.6–10.8%) (Table 4). Discordancy for any use of antacid was 5.1% (95% CI 4.0–6.4%) and for number of days given, 4.8% (95% CI 3.7–6.2%). There was high agreement for day of first milk feed, with 2.8% major discordancy (≥2 day difference). There was high discordancy for type of milk given on first day of milk feed (22.3%, 95% CI 19.6–25.1%) and the summary of different milks given over the first 14 days (13.8%, 95% CI 12.0–15.8%).

Table 4. PiPs vs NNRD: Comparison of feeds and medications in first 14 postnatal days.

https://doi.org/10.1371/journal.pone.0201815.t004

For all outcomes, discordancy was below 10% except use of oxygen at 36 weeks postmenstrual age, which had a discordancy rate of 13.3% (95% CI 11.2–15.8%) (Table 5). The lowest discordancy was for survival to discharge from neonatal care (0.2%, 95% CI 0.02–0.6%). Discordancy was 1–2% for worst stage of ROP in any eye, any gastrointestinal perforation, hydrocephalus and periventricular leucomalacia; and 2–3% for any diagnosis of perforated NEC, any abdominal surgery for NEC, haemorrhagic parenchymal infarct and porencephalic cyst.

Sensitivity and specificity

The prevalence of all outcomes using the NNRD was similar to that derived from the PiPs database (Table 6). The sensitivity of NNRD data for identifying survival was 100% and for adverse outcomes was 50–87%. Specificity was over 85% for all outcomes, with the majority above 90%. The prevalence of adverse outcomes among infants <32 weeks was low, at less than 6%, with the exception of BPD (defined as oxygen dependency at 36 weeks PMA) and medical treatment for PDA (49.0% and 20.3%, respectively). The PPV of all outcomes, with the exception of perforated NEC (66.0%; 95% CI 51.2–78.8%) and details of cerebral ultrasound scans, was over 75% (Table 6).

Table 6. Sensitivity, specificity and positive predictive values of key processes and outcomes reported on the NNRD as determined by comparison with PiPs data.

https://doi.org/10.1371/journal.pone.0201815.t006

Discussion

This is the first study to formally evaluate the population coverage and accuracy of data held on the NNRD. Completeness and accuracy are fundamental components of data quality [15], yet worldwide there are few published reports on the accuracy of population health data. We believe such assessments to be essential both to confirm the validity of data that potentially underpin a range of important research and service functions and also to highlight areas where modification of the data collection tools will improve data quality.

The number of neonatal units contributing to the NNRD has steadily increased over the years, including all 163 neonatal units in England since 2012, and units in Wales and Scotland since 2015. National ONS data cover all reported live births in England and Wales, including any infants who die in the delivery room and healthy babies with no involvement with neonatal medical services; neither of these groups is entered on the NNRD. Our data show that the NNRD represents complete population-based data for live-born infants born at 25 to 31w GA. The discrepancies at 23 and 24w of gestation (70% and 90% representation on the NNRD) are presumably due to death on the labour ward, and suggest a continuing increase in admission rates at these gestations compared with those reported by the population-based EPICure studies, which in 2006 reported admissions of live births of 64% at 23w and 86% at 24w of gestation [18]. These changes are likely to be related to improved condition at birth and changing attitudes towards the management of extremely preterm infants. We speculate that the increase over time in the percentage of infants ≥32w with NNRD records may be due to changes in commissioning and a drive to capture, for payment purposes, medical care outside the neonatal unit, e.g. on postnatal or transitional care wards.

For baseline patient characteristics, the completeness of data on the NNRD was generally high with the exception of maternal ethnicity and LSOA (derived from maternal postcode), five minute Apgar score and vaginal/caesarean birth. Linkage of maternal and neonatal datasets to create a seamless perinatal dataset would address these problems and avoid the need for duplication of data entry and the risk of transcription errors.

Discordancy was low for most patient characteristics but for processes/interventions we found a high discordancy for the type of feed given on the first day of feeding and in general discordancy was higher for items involving counting days e.g. days of antibiotic treatment in the first 14 days and days with central venous lines. For infant-level outcome data, major discordancy was low except for whether infants were receiving supplementary oxygen on the day they reached 36 weeks PMA (13.3%).

There are a number of possible reasons why differences between the two data sources were found. The choice of data variables for comparison was constrained by what was available on the two databases. Most items describing baseline characteristics were entered onto both in response to the same direct question, e.g. ‘What was the birthweight?’. In contrast, the majority of processes/interventions and outcomes were asked for directly at the end of each episode on the PiPS CRF, e.g. ‘In this hospital did the infant have a PDA treated surgically?’, whereas in the EPR these data could be entered into and extracted from any of three places (daily data, discharge diagnoses or procedures during the stay), with no direct questions or check lists requiring negative entries. Absence of a positive entry on the NNRD was interpreted to mean that the intervention or outcome did not occur, whereas it might simply have been missing. This might lead to under-reporting within the NNRD and thereby increase discordancy. This problem could easily be overcome with redesign of some entry screens, or the introduction of check lists on the EPR to be completed at discharge.

One of the great strengths of the EPR system underpinning the NNRD and contributing to its richness, is the acquisition of daily data with details of management including items such as the presence of central venous lines, oxygen use, mechanical respiratory support and details of medications. In practice these are used to compute the infant’s level of care (normal, special care, high dependency, intensive) and form the basis of charging within the NHS with mechanisms to avoid double counting when babies move between hospitals. It was agreed when this study was planned that the data on the PiPS trial database should be taken as the gold standard. Data describing length of stay in intensive/high dependency care etc for the PiPS trial were collected in response to the appropriate question at the end of each episode ‘In this hospital for how many days………’ and it is possible that for these items the NNRD data, derived as they are from the raw daily data, are the more accurate.

The levels of agreement and discordancy limits preset by the authors seemed reasonable at the time. As the study proceeded and the complexity of the data including the matching of episodes of care within the total stay emerged, and on subsequent consideration of the structure of the two databases, we have to conclude that it was unrealistic to hope that data describing varying practice such as what different milks a baby received in any one day would be recorded identically in both systems. The accurate recording of complex data such as these and of medications would be helped by standardisation of the structure of questionnaires across clinical and research databases.

We identified high specificity but low sensitivity for some important outcomes. The fact that the PPV was generally high despite low overall prevalence for key outcomes highlights the potential utility of the NNRD as a large and growing population database. Smaller local or regional databases would be unlikely to have adequate statistical power to detect clinically important signals. Overall findings were similar to those of an assessment of the accuracy of routinely collected hospital discharge data in New South Wales against data from a statewide audit of selected neonatal intensive care unit (NICU) admissions. That assessment also found that, though under-ascertained, routinely collected hospital discharge data had high PPVs for most validated items, but that procedures tended to be more accurately recorded than diagnoses [19].

A key strength of our study is the comparison with data from an independent clinical trial conducted to the standards of ICH-GCP. The lack of such a comparator is a common limitation of other database validation studies [9]. We were able to assess patient-level rather than aggregated data, and were able to calculate sensitivity, specificity and negative predictive value, rather than only the PPV as in previous validation studies of the General Practice Research Database (GPRD) [20]. Validation studies often only report the proportion of cases that were confirmed by medical record review or responses to questionnaires, thereby only providing an estimate of PPV. Further, whilst many validation studies have not been blinded or reported by blinded reviewers, our comparisons were automated using computer codes written without knowledge of the dataset identity. We also defined the minor and major discordancy a priori to mitigate bias.

Our study has a number of limitations, the principal one being the constraints imposed on the scope of the comparison by the lack of standardisation of data items. We were also not able to validate all episodes held on the PiPs trial database against the NNRD. Data linkage was considered at two levels: first, whether an infant recruited into PiPS appeared on the NNRD; second, whether all of the episodes of care reported to PiPS were identified on the NNRD. For 2% of recruits into the PiPS trial no EPR data could be identified. Whether this was because of errors in the date of birth and NHS number on either the PiPS database or the NNRD or whether, which seems unlikely, the infants were never entered onto the EPR, is unclear. A further limitation is that the comparison of PiPS and NNRD data was confined to the hospitals participating in the PiPS trial in the South East of England (24 recruiting and 33 step-down sites) and may not be generalisable throughout the UK.

Despite these limitations, we have shown that high quality, complete data can be extracted from the routinely collected electronic record and that, with some minor changes to the EPR data collection, the accuracy of recording of processes, interventions and outcomes within the NNRD could be improved. As electronic records become widely incorporated into daily care and replace paper records, it is expected that data quality will continue to improve. The creation of a static database such as the NNRD from real-time electronic data is a cost-effective means to create a national resource, obviating the need for duplicate data capture by busy clinical teams, and supporting multiple outputs. The secondary utilities of EPR are increasingly recognised, with advantages that include minimising data entry errors and better population coverage. The NNRD is now used for a growing number of purposes by a number of research groups, professional organisations and Government bodies [21]. The successful creation of the NNRD is a testament to the collaborative efforts of the UK neonatal community. The NNRD has the potential to revolutionise the approach to conducting clinical research, and offers a time- and cost-efficient method for conducting clinical trials and population epidemiological studies.

Supporting information

S1 Table. Items selected for comparison: Baseline characteristics, including details of the data held in each database, with pre-set definitions of limits of agreement, and minor and major discrepancies.

https://doi.org/10.1371/journal.pone.0201815.s001

(DOCX)

S2 Table. Items selected for comparison: Processes of care and interventions, including details of the data held in each database, with pre-set definitions of limits of agreement, and minor and major discrepancies.

https://doi.org/10.1371/journal.pone.0201815.s002

(DOCX)

S3 Table. Items selected for comparison: Processes of care and interventions in the first 14 days, including details of the data held in each database, with pre-set definitions of limits of agreement, and minor and major discrepancies.

https://doi.org/10.1371/journal.pone.0201815.s003

(DOCX)

S4 Table. Items selected for comparison: Outcomes, including details of the data held in each database, with pre-set definitions of limits of agreement, and minor and major discrepancies.

https://doi.org/10.1371/journal.pone.0201815.s004

(DOCX)

Acknowledgments

Medicines for Neonates Investigators: Zoe Chivers, Deborah Ashby, Peter Brocklehurst, Elizabeth Draper, Michael Goldacre, Azeem Majeed, Stavros Petrou, Andrew Wilkinson, Alys Young. We acknowledge the invaluable administrative assistance of Richard Colquhoun, Surbhi Shah and clinical teams from all contributing neonatal units (listed below with the current lead clinician for each unit).

Airedale General Hospital (Dr Matthew Babirecki), Alexandra Hospital (Dr Liza Harry), Arrowe Park Hospital (Dr Oliver Rackham), Barnet Hospital (Dr Tim Wickham), Barnsley District General Hospital (Dr Sanaa Hamdan), Basildon Hospital (Dr Aashish Gupta), Basingstoke & North Hampshire Hospital (Dr Ruth Wigfield), Bassetlaw District General Hospital (Dr L M Wong), Bedford Hospital (Dr Anita Mittal), Birmingham City Hospital (Dr Julie Nycyk), Birmingham Heartlands Hospital (Dr Phil Simmons), Birmingham Women's Hospital (Dr Anju Singh), Bradford Royal Infirmary (Dr Sunita Seal), Broomfield Hospital, Chelmsford (Dr Ahmed Hassan), Calderdale Royal Hospital (Dr Karin Schwarz), Chelsea & Westminster Hospital (Dr Mark Thomas), Chesterfield & North Derbyshire Royal Hospital (Dr Aiwyne Foo), Colchester General Hospital (Dr Aravind Shastri), Conquest Hospital (Dr Graham Whincup), Countess of Chester Hospital (Dr Stephen Brearey), Croydon University Hospital (Dr John Chang), Cumberland Infirmary (Dr Khairy Gad), Darent Valley Hospital (Dr Abdul Hasib), Darlington Memorial Hospital (Dr Mehdi Garbash), Derriford Hospital (Dr Alex Allwood), Diana Princess of Wales Hospital (Dr Pauline Adiotomre), Doncaster Royal Infirmary (Dr Jamal S Ahmed), Dorset County Hospital (Dr Abby Deketelaere), East Surrey Hospital (Dr K Abdul Khader), Epsom General Hospital (Dr Ruth Shephard), Frimley Park Hospital (Dr Abdus Mallik), Furness General Hospital (Dr Belal Abuzgia), George Eliot Hospital (Dr Mukta Jain), Gloucester Royal Hospital (Dr Simon Pirie), Good Hope Hospital (Dr Phil Simmons), Great Western Hospital (Dr Stanley Zengeya), Guy's & St Thomas' Hospital (Dr Timothy Watts), Harrogate District Hospital (Dr C Jampala), Hereford County Hospital (Dr Cath Seagrave), Hillingdon Hospital (Dr Michele Cruwys), Hinchingbrooke Hospital (Dr Hilary Dixon), Homerton Hospital (Dr Narendra Aladangady), Hull Royal Infirmary (Dr Hassan Gaili), Ipswich Hospital (Dr Matthew James), James Cook University Hospital (Dr M Lal), James Paget Hospital (Dr Ambadkar), Kettering General Hospital (Dr Patty Rao), Kings College Hospital (Dr Ann Hickey), King's Mill Hospital (Dr Dhaval Dave), Kingston Hospital (Dr Vinay Pai), Lancashire Women and Newborn Centre (Dr Meera Lama), Leeds Neonatal Service (Dr Lawrence Miall), Leicester General Hospital (Dr Jonathan Cusack), Leicester Royal Infirmary (Dr Venkatesh Kairamkonda), Leighton Hospital (Dr Jayachandran), Lincoln County Hospital (Dr Kollipara), Lister Hospital (Dr J Kefas), Liverpool Women's Hospital (Dr Bill Yoxall), Luton & Dunstable Hospital (Dr Jennifer Birch), Macclesfield District General Hospital (Dr Gail Whitehead), Manor Hospital (Dr Krishnamurthy), Medway Maritime Hospital (Dr Aung Soe), Milton Keynes General Hospital (Dr I Misra), New Cross Hospital (Dr Tilly Pillay), Newham General Hospital (Dr Imdad Ali), Norfolk & Norwich University Hospital (Dr Mark Dyke), North Devon District Hospital (Dr Michael Selter), North Manchester General Hospital (Dr Nagesh Panasa), North Middlesex University Hospital (Dr Lesley Alsford), North Tyneside General Hospital (Dr Vivien Spencer), Northampton General Hospital (Dr Subodh Gupta), Northwick Park Hospital (Dr Richard Nicholl), Nottingham City Hospital (Dr Steven Wardle), Nottingham University Hospital (QMC) (Dr Steven Wardle), Ormskirk District General Hospital (Dr Tim McBride), Oxford University Hospitals, Horton Hospital (Dr Naveen Shettihalli), Oxford University Hospitals, John Radcliffe Hospital (Dr Eleri Adams), Peterborough City Hospital (Dr Seif Babiker), Pilgrim Hospital (Dr Margaret Crawford), Pinderfields General Hospital (Pontefract General Infirmary) (Dr David Gibson), Poole General Hospital (Prof Minesh Khashu), Princess Alexandra Hospital (Dr Caitlin Toh), Princess Anne Hospital (Dr Mike Hall), Princess Royal Hospital (Dr P Amess), Princess Royal University Hospital (Dr Elizabeth Sleight), Queen Alexandra Hospital (Dr Charlotte Groves), Queen Charlotte's Hospital (Dr Sunit Godambe), Queen Elizabeth Hospital, Gateshead (Dr Dennis Bosman), Queen Elizabeth Hospital, King's Lynn (Dr Glynis Rewitzky), Queen Elizabeth Hospital, Woolwich (Dr Olutoyin Banjoko), Queen Elizabeth the Queen Mother Hospital (Dr N Kumar), Queen's Hospital, Burton on Trent (Dr Azhar Manzoor), Queen's Hospital, Romford (Dr Wilson Lopez), Rosie Maternity Hospital, Addenbrookes (Dr Angela D'Amore), Rotherham District General Hospital (Dr Shameel Mattara), Royal Albert Edward Infirmary (Dr Christos Zipitis), Royal Berkshire Hospital (Dr Peter De Halpert), Royal Bolton Hospital (Dr Paul Settle), Royal Cornwall Hospital (Dr Paul Munyard), Royal Derby Hospital (Dr John McIntyre), Royal Devon & Exeter Hospital (Dr David Bartle), Royal Hampshire County Hospital (Dr Katie Yallop), Royal Lancaster Infirmary (Dr Joanne Fedee), Royal Oldham Hospital (Dr Natasha Maddock), Royal Preston Hospital (Dr Richa Gupta), Royal Shrewsbury Hospital (Dr Deshpande), Royal Stoke University Hospital (Dr Alison Moore), Royal Surrey County Hospital (Dr Charles Godden), Royal Sussex County Hospital (Dr P Amess), Royal United Hospital (Dr Stephen Jones), Royal Victoria Infirmary (Dr Alan Fenton), Russells Hall Hospital (Dr Mahadevan), Salisbury District Hospital (Dr Nick Brown), Scarborough General Hospital (Dr Kirsten Mack), Scunthorpe General Hospital (Dr Pauline Adiotomre), South Tyneside District Hospital (Dr Rob Bolton), Southend Hospital (Dr Arfa Khan), Southmead Hospital (Dr Paul Mannix), St George's Hospital (Dr Charlotte Huddy), St Helier Hospital (Dr Salim Yasin), St Mary's Hospital, Isle of Wight (Dr Sian Butterworth), St Mary's Hospital, London (Dr Sunit Godambe), St Mary's Hospital, Manchester (Dr Ngozi Edi-Osagie), St Michael's Hospital (Dr Pamela Cairns), St Peter's Hospital (Dr Peter Reynolds), St Richard's Hospital (Dr Nick Brennan), Stepping Hill Hospital (Dr Carrie Heal), Stoke Mandeville Hospital (Dr Sanjay Salgia), Sunderland Royal Hospital (Dr Majd Abu-Harb), Tameside General Hospital (Dr Jacqeline Birch), Taunton & Somerset Hospital (Dr Chris Knight), The Jessop Wing, Sheffield (Dr Simon Clark), The Royal Free Hospital (Dr V Van Sommen), The Royal London Hospital—Constance Green (Dr Vadivelam Murthy), Torbay Hospital (Dr Siba Paul), Tunbridge Wells Hospital (Dr Hamudi Kisat), University College Hospital (Dr Giles Kendall), University Hospital Coventry (Dr Kate Blake), University Hospital Lewisham (Dr Jauro Kuna), University Hospital of North Durham (Dr Mehdi Garbash), University Hospital of North Tees (Dr Hari Kumar), University Hospital of South Manchester (Dr Gopi Vemuri), Victoria Hospital, Blackpool (Dr Chris Rawlingson), Warrington Hospital (Dr Delyth Webb), Warwick Hospital (Dr Bird), Watford General Hospital (Dr Sankara Narayanan), West Cumberland Hospital (Dr Jason Gane), West Middlesex University Hospital (Dr Elizabeth Eyre), West Suffolk Hospital (Dr Ian Evans), Wexham Park Hospital (Dr Rekha Sanghavi), Whipps Cross University Hospital (Dr Caroline Sullivan), Whiston Hospital (Dr Laweh Amegavie), Whittington Hospital (Dr Wynne Leith), William Harvey Hospital (Dr Vimal Vasu), Worcestershire Royal Hospital (Dr Andrew Gallagher), Worthing Hospital (Dr Katia Vamvakiti), Yeovil District Hospital (Dr Megan Eaton), York District Hospital (Dr Guy Millman)

References

  1. Statnikov Y, Ibrahim B, Modi N. A systematic review of administrative and clinical databases of infants admitted to neonatal units. Archives of Disease in Childhood—Fetal and Neonatal Edition. 2017. pmid:28087722
  2. ISB. Neonatal Data Set ISB 1595 Specification 2014 [cited 2016 April 14]. Available from: http://webarchive.nationalarchives.gov.uk/+/http://www.isb.nhs.uk/documents/isb-1595/amd-32-2012/1595322012spec.pdf.
  3. Information Standards Board for Health and Social Care. Information Standards Board for Health and Social Care (ISB) 2014 [cited 2016 October]. Available from: http://webarchive.nationalarchives.gov.uk/+/http://www.isb.nhs.uk/library/standard/260.
  4. Watson SI, Arulampalam W, Petrou S, Marlow N, Morgan AS, Draper ES, et al. The effects of designation and volume of neonatal care on mortality and morbidity outcomes of very preterm infants in England: retrospective population-based cohort study. BMJ Open. 2014;4(7). pmid:25001393
  5. Battersby C, Longford N, Mandalia S, Costeloe K, Modi N. Incidence and enteral feed antecedents of severe neonatal necrotising enterocolitis across neonatal networks in England, 2012–3: a whole-population surveillance study. The Lancet Gastroenterology & Hepatology. 2017;2(1):43–51.
  6. Battersby C, Longford N, Costeloe K, Modi N. Development of a Gestational Age-Specific Case Definition for Neonatal Necrotizing Enterocolitis. JAMA Pediatr. 2017;3(10).
  7. Battersby C, Santhakumaran S, Upton M, Radbone L, Birch J, Modi N. The impact of a regional care bundle on maternal breast milk use in preterm infants: outcomes of the East of England quality improvement programme. Arch Dis Child Fetal Neonatal Ed. 2014;99(5):2013–305475.
  8. Wong HS, Santhakumaran S, Statnikov Y, Gray D, Watkinson M, Modi N, et al. Retinopathy of prematurity in English neonatal units: a national population-based analysis using NHS operational data. Archives of Disease in Childhood—Fetal and Neonatal Edition. 2013. pmid:24361602
  9. Herrett E, Thomas SL, Schoonen WM, Smeeth L, Hall AJ. Validation and validity of diagnoses in the General Practice Research Database: a systematic review. Br J Clin Pharmacol. 2010;69(1):4–14. pmid:20078607
  10. Office for National Statistics. Birth Summary Tables - England and Wales 2015 [cited 2015 December 12]. Available from: https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/datasets/birthsummarytables.
  11. Costeloe K, Hardy P, Juszczak E, Wilks M, Millar MR. Bifidobacterium breve BBG-001 in very preterm infants: a randomised controlled phase 3 trial. The Lancet. 2015.
  12. Neonatal Data Analysis Unit. Neonatal Data Set 2014 [cited 2016 October]. Available from: https://www1.imperial.ac.uk/resources/7567FFEA-2D3D-493C-A959-46BB76794CEE/neonataldatasetisb1595release1version22_circulationversion1_appendix_one.pdf.
  13. Royal College of Paediatrics and Child Health. National Neonatal Audit Programme 2014. Available from: http://www.rcpch.ac.uk/system/files/protected/page/NNAP%20report%20updated%2004.11.15WEB.pdf.
  14. Office for National Statistics Census. LSOA 2011. Available from: https://www.ons.gov.uk/methodology/geography/ukgeographies/censusgeography.
  15. NHS Data Dictionary. Lower Layer Super Output Area. Available from: https://www.datadictionary.nhs.uk/data_dictionary/nhs_business_definitions/l/lower_layer_super_output_area_de.asp?shownav=1.
  16. NHS Information Standards. NHS data dictionary. Available from: https://www.datadictionary.nhs.uk/data_dictionary/nhs_business_definitions/l/lower_layer_super_output_area_de.asp?shownav=1.
  17. Agresti A, Coull BA. Approximate is Better than “Exact” for Interval Estimation of Binomial Proportions. The American Statistician. 1998;52(2):119–26.
  18. Costeloe KL, Hennessy EM, Haider S, Stacey F, Marlow N, Draper ES. Short term outcomes after extreme preterm birth in England: comparison of two birth cohorts in 1995 and 2006 (the EPICure studies). BMJ. 2012;4(345).
  19. Ford JB, Roberts CL, Algert CS, Bowen JR, Bajuk B, Henderson-Smart DJ, et al. Using hospital discharge data for determining neonatal morbidity and mortality: a validation study. BMC Health Services Research. 2007;7:188. pmid:18021458; PubMed Central PMCID: PMC2216019.
  20. Khan NF, Harrison SE, Rose PW. Validity of diagnostic coding within the General Practice Research Database: a systematic review. Br J Gen Pract. 2010;60(572):e128–36. Epub 2010/03/06. pmid:20202356; PubMed Central PMCID: PMC2828861.
  21. Gale C, Morris I. The UK National Neonatal Research Database: using neonatal data for research, quality improvement and more. Arch Dis Child Educ Pract Ed. 2016;101(4):216–8. pmid:26968617