Performance of a postnatal metabolic gestational age algorithm: a retrospective validation study among ethnic subgroups in Canada
  1. Steven Hawken1,2,3,
  2. Robin Ducharme3,
  3. Malia S Q Murphy1,
  4. Katherine M Atkinson1,4,
  5. Beth K Potter1,3,2,
  6. Pranesh Chakraborty5,6,
  7. Kumanan Wilson1,3,2,7
  1. 1 Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada
  2. 2 School of Epidemiology, Public Health and Preventative Medicine, University of Ottawa, Ottawa, Ontario, Canada
  3. 3 uOttawa, Institute for Clinical Evaluative Sciences, Ottawa, Ontario, Canada
  4. 4 Department of Public Health Sciences, Karolinska Institute, Stockholm, Sweden
  5. 5 Department of Paediatrics, University of Ottawa, Ottawa, Ontario, Canada
  6. 6 Newborn Screening Ontario, Children’s Hospital of Eastern Ontario, Ottawa, Ontario, Canada
  7. 7 Department of Medicine, University of Ottawa, Ottawa, Ontario, Canada
  1. Correspondence to Dr Kumanan Wilson; kwilson{at}


Objectives Biological modelling of routinely collected newborn screening data has emerged as a novel method for deriving postnatal gestational age estimates. Validation of published models has previously been limited to cohorts largely consisting of infants of white Caucasian ethnicity. In this study, we sought to determine the validity of a published gestational age estimation algorithm among recent immigrants to Canada, where maternal landed immigrant status was used as a surrogate measure of infant ethnicity.

Design We conducted a retrospective validation study in infants born in Ontario between April 2009 and September 2011.

Setting Provincial data from Ontario, Canada were obtained from the Institute for Clinical Evaluative Sciences.

Participants The dataset included 230 034 infants born to non-immigrant mothers and 70 098 infants born to immigrant mothers. The five most common countries of maternal origin were India (n=10 038), China (n=7468), Pakistan (n=5824), The Philippines (n=5441) and Vietnam (n=1408). Maternal country of origin was obtained from Citizenship and Immigration Canada’s Landed Immigrant Database.

Primary and secondary outcome measures Performance of a postnatal gestational age algorithm was evaluated across non-immigrant and immigrant populations.

Results Root mean squared error (RMSE) of 1.05 weeks was observed for infants born to non-immigrant mothers, whereas RMSE ranged from 0.98 to 1.15 weeks among infants born to immigrant mothers. Area under the receiver operating characteristic curve for distinguishing term versus preterm infants (≥37 vs <37 weeks gestational age or >34 vs ≤34 weeks gestational age) was 0.958 and 0.986, respectively, in the non-immigrant subgroup and ranged from 0.927 to 0.964 and 0.966 to 0.994 in the immigrant subgroups.

Conclusions Algorithms for postnatal determination of gestational age may be further refined by development and validation of region or ethnicity-specific models. However, our results provide reassurance that an algorithm developed from Ontario-born infant cohorts performs well across a range of ethnicities and maternal countries of origin without modification.

  • gestational age
  • prediction modelling
  • newborn screening

This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited.

Strengths and limitations of this study

  • This validation study has successfully demonstrated that a gestational age estimation algorithm performs well across infants from diverse backgrounds without modification.

  • Population-based design: The validation cohort included 300 132 live-born infants born between April 2009 and September 2011 who underwent newborn screening at Newborn Screening Ontario.

  • Defining ethnic subpopulations: Landed immigrant status, rather than self-reported ethnicity, was used as a surrogate measure for identification of ethnic subpopulations.

  • Model development: The model demonstrates poorer performance among more severely preterm infants, in part due to smaller numbers of preterm infants available for model development.


Introduction

Knowledge of gestational age at the time of birth is vital for ensuring adequate provision of newborn care and for assessing population-level estimates of the burden of preterm birth to guide allocation of health services resources and targeted global health initiatives.1 2 In jurisdictions with challenging socioeconomic conditions and/or limited access to antenatal care and ultrasound dating technology due to rurality, determination of gestational age can be challenging. Other antenatal dating methods, including last menstrual period and fundal height measurements, are hampered by poor recall history and a high prevalence of low birth weight. Where prenatal estimations are unavailable or unreliable, a variety of standardised fetal assessments have been developed for clinicians seeking to determine fetal maturation after birth. Commonly used postnatal assessments that score infants on neurological and physical criteria are only accurate to within 3–4 weeks of true, ultrasound-validated gestational age.3 World health and philanthropic organisations are now seeking novel ways of determining gestational age at the time of birth, both to improve individual care and to provide reliable, high-quality data for population surveillance.

Secondary analysis of newborn screening samples routinely collected within the first few days of an infant’s life has emerged as a unique opportunity for postnatal gestational age assessment. We and others have recently demonstrated the accuracy of postnatal gestational age algorithms derived from newborn screening data in three independent North American cohorts.4–6 A significant limitation of the approaches published to date has been the predominance of white infants in the populations used for model validation. Metabolic profiles are subject to biological variation as a result of in utero environmental exposure,7 and recent work suggests that ethnic diversity within a population needs to be considered when establishing newborn screening reference intervals for some conditions.8 This study sought to validate a postnatal gestational age metabolic model in ethnic subpopulations in Ontario, Canada.

Materials and methods

Study design

We conducted a population-based retrospective validation study of infants born in Ontario, Canada using a combination of linked health administrative, newborn screening and immigration datasets maintained by the Institute for Clinical Evaluative Sciences. The Newborn Screening Ontario (NSO) database includes the analyte profiles of each infant completing newborn screening in the province (>99% of all infants born in Ontario). Over 40 screening analytes and analyte ratios including acylcarnitines, amino acids, endocrine markers, enzymes and coenzyme markers among others are available in the NSO database (table 1). Maternal country of origin was used as a surrogate marker of infant ethnicity and was ascertained from Citizenship and Immigration Canada’s Landed Immigrant Database. This study was approved by the Ottawa Health Science Network Research Ethics Board, Ottawa, Canada (20140724-01H).

Table 1

Newborn screening analytes used in model development

Original model

The postnatal gestational age estimation model previously published by our group was based on a cohort of 249 700 infants born between April 2007 and March 2009. The details of the original model have been described previously.9 Briefly, infants who were identified as positive for any disorder screened for by NSO were excluded, as were infants with unsatisfactory samples or with missing gestational age or birth weight. Because complete metabolite profiles were required to score new observations, infants with missing analyte data or missing values for other covariates, including gestational age, were excluded from this validation exercise. The infants excluded due to missing covariates constituted <5% of the cohort. Gestational age was based on best obstetrical estimate (last menstrual period, dating ultrasound or a combination). It is to be noted that >99% of women in Ontario receive at least one ultrasound during the course of pregnancy.10

The data were randomly partitioned into model development (50%), validation (25%) and test (25%) subsamples. Multiple linear regression was performed in the model development set, in which all analyte main effects were included in the model, as well as birth weight and sex. For analytes and birth weight, squared and cubic terms were also included. A stepwise variable selection algorithm was then conducted and all pairwise interactions were considered for inclusion. The Schwarz Bayesian Criterion, which rewards improved model fit and penalises model complexity, was used in variable selection. Once this process was complete, the mean square error (MSE) for the fitted model at each step was calculated in the independent validation sample subset, and the model with the smallest MSE was selected. This approach provides a high level of protection from overfitting in the final model.9 Parameter estimates for the fitted model were fixed and used to score (ie, estimate gestational age in) the independent test sample subset (n=62 434), and model performance characteristics (r-square, MSE, proportion with observed vs estimated gestational age within 1 week) were calculated. The final regression model included a total of 311 parameter estimates, including main effects, quadratic and cubic effects plus interaction terms.
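
The partition-and-select logic described above can be sketched in a few lines. This is a minimal Python illustration, not the authors' SAS code: the function names, the toy data and the two stand-in candidate models are all hypothetical, and the `sbc` helper assumes the usual least-squares form of the Schwarz Bayesian Criterion.

```python
import math
import random


def partition(records, seed=0):
    """Randomly split records into development (50%), validation (25%)
    and test (25%) subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_dev = len(shuffled) // 2
    n_val = len(shuffled) // 4
    return (shuffled[:n_dev],
            shuffled[n_dev:n_dev + n_val],
            shuffled[n_dev + n_val:])


def mse(model, rows):
    """Mean squared error of model(x) against observed y over (x, y) rows."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


def sbc(sse, n_obs, n_params):
    """Schwarz Bayesian Criterion for a least-squares fit (smaller is better).
    The first term rewards fit; the second penalises each extra parameter."""
    return n_obs * math.log(sse / n_obs) + n_params * math.log(n_obs)


def select_by_validation_mse(candidates, validation_rows):
    """Return the name of the candidate with the smallest validation MSE."""
    return min(candidates,
               key=lambda name: mse(candidates[name], validation_rows))


# Toy data (purely illustrative): y depends linearly on x, plus noise.
rng = random.Random(1)
data = [(x, 30 + 0.5 * x + rng.gauss(0, 0.3)) for x in range(200)]
development, validation, test = partition(data)

# Hypothetical candidates standing in for the model fitted at each
# stepwise step; in practice each would be fitted on `development`.
candidates = {
    "intercept_only": lambda x: 80.0,
    "linear": lambda x: 30 + 0.5 * x,
}
best = select_by_validation_mse(candidates, validation)
print(best, round(mse(candidates[best], test), 3))
```

Keeping the test subset out of both fitting and selection is what allows its error to be read as an estimate of performance on unseen infants.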

The deviation of each calculated gestational age from the true gestational age of each infant is the residual model error for that infant (in weeks). The residual model error can be positive or negative depending on the direction of the difference. The MSE is the mean of the squared residual errors across all infants (squaring also renders all values positive) and is expressed in units of weeks². Taking the square root of the MSE yields the root mean squared error (RMSE), an overall ‘average deviation’ in weeks.
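
As a worked illustration of these definitions, the following Python sketch computes the residuals, MSE, RMSE and the proportion of estimates within 1 week of the observed value. The four gestational ages are invented for the example and the function names are hypothetical.

```python
import math


def residuals(estimated, observed):
    """Signed error (estimated minus observed gestational age), in weeks."""
    return [e - o for e, o in zip(estimated, observed)]


def mse(estimated, observed):
    """Mean of the squared residuals, in weeks squared."""
    res = residuals(estimated, observed)
    return sum(r * r for r in res) / len(res)


def rmse(estimated, observed):
    """Square root of the MSE, back in units of weeks."""
    return math.sqrt(mse(estimated, observed))


def proportion_within(estimated, observed, tolerance_weeks=1.0):
    """Share of infants whose absolute residual is within the tolerance."""
    res = residuals(estimated, observed)
    return sum(abs(r) <= tolerance_weeks for r in res) / len(res)


# Invented example values, not study data.
estimated = [39.0, 36.5, 40.0, 33.0]
observed = [40.0, 36.0, 40.0, 35.0]

print(residuals(estimated, observed))          # [-1.0, 0.5, 0.0, -2.0]
print(mse(estimated, observed))                # 1.3125
print(proportion_within(estimated, observed))  # 0.75
```

The single 2-week miss dominates the MSE here, which shows how squaring weights large errors more heavily than small ones.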

Gestational age was then dichotomised to distinguish term or preterm infant subgroups. Model performance to correctly classify infants across two thresholds of preterm birth categories by logistic regression analysis was assessed (area under the receiver operating characteristic curve (AUC), sensitivity, specificity and positive predictive value). Preterm birth thresholds were <37 weeks gestational age, the clinical threshold for preterm birth, and ≤34 weeks gestational age, a threshold that represents the lower limit of the late preterm period. Infants born ≤34 weeks gestational age (early preterm and severe preterm infants) are at increased health risk compared with those born late preterm or at term. Estimated gestational age identified by multiple linear regression was used as a continuous independent variable in logistic regressions to determine the probability of preterm birth.9
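
The classification metrics named above can be illustrated with a short Python sketch. It makes two simplifying assumptions that are not taken from the study: infants are called preterm when their estimated gestational age falls on the preterm side of the same threshold, and the AUC is computed directly from the estimated gestational age. The second assumption relies on the fact that a logistic model with a single continuous predictor produces probabilities that are a monotone function of that predictor, so the ranking of infants, and hence the AUC, is unchanged. All names and values are hypothetical.

```python
def is_preterm(ga_weeks, threshold, inclusive=False):
    """Preterm if GA < threshold, or GA <= threshold when inclusive is True."""
    return ga_weeks <= threshold if inclusive else ga_weeks < threshold


def auc(estimated, observed, threshold, inclusive=False):
    """Probability that a randomly chosen preterm infant has a lower estimated
    GA than a randomly chosen term infant (ties count one half)."""
    pairs = list(zip(estimated, observed))
    pre = [e for e, o in pairs if is_preterm(o, threshold, inclusive)]
    term = [e for e, o in pairs if not is_preterm(o, threshold, inclusive)]
    wins = sum(1.0 if p < t else 0.5 if p == t else 0.0
               for p in pre for t in term)
    return wins / (len(pre) * len(term))


def classification_metrics(estimated, observed, threshold, inclusive=False):
    """Sensitivity, specificity and positive predictive value for preterm
    status. Assumes at least one infant in every cell that appears in a
    denominator."""
    tp = fp = tn = fn = 0
    for e, o in zip(estimated, observed):
        truth = is_preterm(o, threshold, inclusive)
        call = is_preterm(e, threshold, inclusive)
        if truth and call:
            tp += 1
        elif truth:
            fn += 1
        elif call:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


# Invented example values, not study data.
observed = [40.0, 39.0, 38.0, 36.0, 35.0, 33.0, 41.0, 36.5]
estimated = [39.5, 38.0, 36.8, 36.2, 35.5, 34.0, 40.2, 37.3]

print(auc(estimated, observed, threshold=37))                     # 0.9375
print(classification_metrics(estimated, observed, threshold=37))
print(auc(estimated, observed, threshold=34, inclusive=True))     # 1.0
```

The `inclusive` flag covers the two thresholds used in the text, <37 weeks and ≤34 weeks.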

Validation cohort

The validation cohort included 300 132 live-born infants born between April 2009 and September 2011 who underwent newborn screening at NSO. A total of 348 098 records were available for the prescribed study period. Records with missing or implausible (data entry error) data on birth weight, sex or gestational age were removed from the analysis. Infants who received a ‘positive’ flag for any one of the newborn screening conditions were excluded. Also excluded were those with missing newborn screening analyte values, and those for which the samples were collected before 24 hours or after 7 days of birth. Figure 1 summarises the logic used to create the study cohort. The validation cohort was independent from the data used in the development of the original model.
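
The exclusion rules in this paragraph amount to a record-level filter, sketched below in Python. The field names and the two example records are hypothetical and do not reflect the actual NSO or ICES data layout. The check for implausible values is omitted because the text does not state the plausibility limits that were applied.

```python
HOURS_PER_DAY = 24


def is_eligible(record):
    """Apply the stated exclusions to one screening record (a dict).
    All field names are illustrative assumptions."""
    # Missing birth weight, sex or gestational age.
    required = ("birth_weight_g", "sex", "gestational_age_weeks")
    if any(record.get(field) is None for field in required):
        return False
    # Screen-positive for any newborn screening condition.
    if record.get("screen_positive"):
        return False
    # Missing analyte values (a complete profile is needed for scoring).
    analytes = record.get("analytes") or {}
    if not analytes or any(value is None for value in analytes.values()):
        return False
    # Sample collected before 24 hours or after 7 days of age.
    age_hours = record.get("sample_age_hours")
    if age_hours is None:
        return False
    if age_hours < 24 or age_hours > 7 * HOURS_PER_DAY:
        return False
    return True


# Two invented records: the first passes, the second was sampled too early.
records = [
    {"id": 1, "birth_weight_g": 3400, "sex": "F",
     "gestational_age_weeks": 39, "screen_positive": False,
     "analytes": {"C2": 21.5, "TSH": 3.1}, "sample_age_hours": 30},
    {"id": 2, "birth_weight_g": 2900, "sex": "M",
     "gestational_age_weeks": 38, "screen_positive": False,
     "analytes": {"C2": 18.0, "TSH": 4.4}, "sample_age_hours": 12},
]
cohort = [r for r in records if is_eligible(r)]
print([r["id"] for r in cohort])  # [1]
```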

Figure 1

Cohort creation strategy.

Model validation

The coefficients from the reference linear regression model were fixed and used to score the validation cohort. Calculated gestational age was used as the independent variable in logistic regression models to estimate the dichotomous categories of preterm birth, for which model performance characteristics including AUC, sensitivity, specificity and positive predictive value were calculated. Confidence intervals for AUC were also calculated, using the approach described previously by Hanley and McNeil.11 SAS V.9.4 was used for all statistical analyses.
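
For readers unfamiliar with the Hanley and McNeil approach, the sketch below shows the widely used closed-form standard error for an AUC and a normal-approximation 95% confidence interval. It is a Python illustration that assumes this standard formula is the one intended; the study itself used SAS. The AUC of 0.958 is the non-immigrant value reported in this article, but the group sizes in the example are hypothetical.

```python
import math


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUC from the Hanley and McNeil closed form.
    n_pos and n_neg are the numbers of positive (eg, preterm) and
    negative (eg, term) infants."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (auc * (1 - auc)
                + (n_pos - 1) * (q1 - auc ** 2)
                + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(variance)


def auc_confidence_interval(auc, n_pos, n_neg, z=1.96):
    """Normal-approximation interval, clipped to the [0, 1] range."""
    se = hanley_mcneil_se(auc, n_pos, n_neg)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Hypothetical group sizes chosen only to show the call.
lower, upper = auc_confidence_interval(0.958, n_pos=500, n_neg=9500)
print(round(lower, 3), round(upper, 3))
```

Because the variance shrinks with the product of the two group sizes, intervals for the smaller immigrant subgroups would be expected to be wider than for the non-immigrant group at a similar AUC.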


Results

The validation dataset included 230 034 infants born to non-immigrant mothers and 70 098 born to immigrant mothers. The five most common countries of maternal birth were India (n=10 038), China (n=7468), Pakistan (n=5824), The Philippines (n=5441) and Vietnam (n=1408). The most common countries from the African continent were Somalia (n=833) and Nigeria (n=800). Descriptive characteristics of the validation cohort are provided in table 2.

Table 2

Cohort characteristics

Overall model performance characteristics for non-immigrant populations and the top eight landed immigrant subgroups are presented in table 3. Model performance characteristics from the original validation cohort are provided for comparison. Absolute gestational age estimation was within 1 week in continuous linear regression models; root mean squared error ranged from 0.98 weeks (maternal Indian heritage) to 1.15 weeks (maternal Somalian heritage).

Table 3

Model performance comparison for original validation study and new validation in immigrant and non-immigrant subgroups

Model performance among preterm infants is provided in table 4. Among non-immigrants, our algorithm performed comparably to our original validation in classifying infants as term versus preterm: AUC=0.958 for ≥37 vs <37 weeks gestational age and AUC=0.986 for >34 vs ≤34 weeks gestational age.

Table 4

Comparison of model performance for predicting preterm status

Our gestational age estimation model discriminated well between dichotomous preterm birth categories among infants born to immigrant mothers. For discrimination of ≥37 vs <37 weeks gestation, AUC ranged from 0.927 among infants of maternal Somalian descent to 0.964 among infants of maternal Nigerian heritage. Similarly, the model was able to discriminate well between >34 weeks and ≤34 weeks gestational age, with AUC ranging from 0.966 for Nigerian and Bangladeshi infants to 0.994 among Filipino infants.


Discussion

This study validates a postnatal gestational age estimation model in subgroups of infants born to immigrant mothers living in Ontario, Canada. This work demonstrates reasonable performance of our previously published model to determine gestational age in infants born to immigrants of diverse countries of origin. Our findings provide proof-of-principle that metabolic modelling strategies could be robust in a variety of international infant cohorts.

The strength of our approach lies in our ability to use population-based datasets to aggregate health and administrative data with ample sample size to evaluate model performance metrics across infant subgroups. In lieu of self-reported race or ethnicity, which were unavailable for our analyses, use of the Canadian Landed Immigrant Database provided data on infants born to immigrant mothers from a diverse range of countries. We acknowledge, however, that naturalised immigrants living in North America may not be representative of individuals living in the country of origin, owing either to admixture (ie, inter-racial families) or to environmental factors (eg, climate, socioeconomic status, diet, sanitation). Indeed, whether subtle variation in model performance in specific immigrant subgroups, such as those from Bangladesh and Somalia, indicates a degree of biological or environmental influence warrants further consideration.

As we work towards evaluating our model in other infant populations, we must consider existing limitations. First, our original approach included a relatively small sample size of preterm infants for model development, which is reflected in decreased accuracy of the model among the most severely preterm infants.4 We are now working to refine our model to optimise performance across all gestational age categories. Overall precision of metabolic dating methods to within 1–2 weeks compares favourably to other commonly used postnatal dating methods, the accuracy of which varies widely (3–4 weeks)3 12 13 depending on the method, the level of training of the specialist performing the measurements and whether the child is small for gestational age. Although limited sample size prevented evaluation of the model among small-for-gestational-age or low birthweight infants, recent work from our group has demonstrated comparable accuracy among this infant subpopulation.14 This is particularly important when considering the potential to implement metabolic dating tools in low- and middle-income countries given the prevalence of low birthweight infants in low-resource communities. In addition to continuous estimates, provision of gestational age estimates across dichotomous thresholds (eg, 37 weeks gestational age) may be useful for the purpose of population surveillance. Second, currently published models are complex and require data on a large number of analytes measured by mass spectrometry. Ultimately, simplification of models to reduce the number of metabolic variables while maintaining model performance will be required to streamline the approach for scalable, cost-effective applications. An ideal model would include analytes that may be reliably measured from samples obtained immediately after birth (ie, cord blood), and those that are stable through weeks to months of appropriate storage prior to analysis.14

In summary, our results provide reassurance that an algorithm developed from an Ontario-based population performs consistently well across a range of ethnicities. Ultimately, further validation studies will be required to evaluate the performance of this and other postnatal dating models in infants born and living across a range of international settings to determine if a single global algorithm or multiple regional algorithms should be derived.




  • Contributors SH and RD were involved in data acquisition and statistical analysis. SH, MSQM and KW drafted and edited the manuscript. KMA and MSQM provided project coordination. BKP and PC critically edited the manuscript for important intellectual content. KW was responsible for the conceptual design of the study.

  • Funding This work was supported by The Bill & Melinda Gates Foundation [OPP1141535]. It was also supported by the Institute for Clinical Evaluative Sciences (ICES), which is funded by an annual grant from the Ontario Ministry of Health and Long-Term Care (MOHLTC).

  • Competing interests All authors have completed the ICMJE uniform disclosure form at and declare: the authors had financial support from The Bill & Melinda Gates Foundation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work. The opinions, results and conclusions reported in this paper are those of the authors and are independent from the funding sources. No endorsement by ICES or the Ontario MOHLTC is intended or should be inferred. Parts of this material are based on data and information compiled and provided by the Canadian Institute of Health Information (CIHI). However, the analyses, conclusions, opinions and statements expressed herein are those of the author, and not necessarily those of CIHI.

  • Ethics approval Ottawa Health Science Network Research Ethics Board.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement All data used in this study were obtained from the Institute for Clinical Evaluative Sciences and are accessible to individuals with appropriate authorisation.