INTERGROWTH-21st Project international INTER-NDA standards for child development at 2 years of age: an international prospective population-based study

Objectives To describe the construction of the international INTERGROWTH-21st Neurodevelopment Assessment (INTER-NDA) standards for child development at 2 years by reporting the cognitive, language, motor and behaviour outcomes in optimally healthy and nourished children in the INTERGROWTH-21st Project.
Design Population-based cohort study, the INTERGROWTH-21st Project.
Setting Brazil, India, Italy, Kenya and the UK.
Participants 1181 children prospectively recruited from early fetal life according to the prescriptive WHO approach, and confirmed to be at low risk of adverse perinatal and postnatal outcomes.
Primary measures Scaled INTER-NDA domain scores for cognition, language, fine and gross motor skills and behaviour; vision outcomes measured on the Cardiff tests; attentional problems and emotional reactivity measured on the respective subscales of the preschool Child Behaviour Checklist; and the age of acquisition of the WHO gross motor milestones.
Results Scaled INTER-NDA domain scores are presented as centiles, which were constructed according to the prescriptive WHO approach and excluded children born preterm and those with significant postnatal/neurological morbidity. For all domains, except negative behaviour, higher scores reflect better outcomes and the threshold for normality was defined as ≥10th centile. For the INTER-NDA’s cognitive, fine motor, gross motor, language and positive behaviour domains these are ≥38.5, ≥25.7, ≥51.7, ≥17.8 and ≥51.4, respectively. The threshold for normality for the INTER-NDA’s negative behaviour domain is ≤50.0, that is, ≤90th centile. At 22–30 months of age, the cohort overlapped with the WHO motor milestone centiles, showed low postnatal morbidity (<10%), and vision outcomes, attentional problems and emotional reactivity scores within the respective normative ranges.
Conclusions From this large, healthy and well-nourished, international cohort, we have constructed, using the WHO prescriptive methodology, international INTER-NDA standards for child development at 2 years of age. Standards, rather than references, are recommended for population-level screening and the identification of children at risk of adverse outcomes.
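
The thresholds for normality reported in the Results can be applied mechanically to a child's scaled domain scores. The following is a minimal sketch in Python, not part of the INTER-NDA itself: the dictionary layout, the domain keys and the function name are assumptions made for illustration, and only the cut-off values come from the abstract above.

```python
# Cut-offs quoted in the abstract (scaled INTER-NDA domain scores).
# ">=" domains: higher is better (threshold is the 10th centile).
# "<=" domain: lower is better (threshold is the 90th centile).
# The key names and data layout are hypothetical, chosen for this sketch.
NORMAL_THRESHOLDS = {
    "cognitive": (">=", 38.5),
    "fine_motor": (">=", 25.7),
    "gross_motor": (">=", 51.7),
    "language": (">=", 17.8),
    "positive_behaviour": (">=", 51.4),
    "negative_behaviour": ("<=", 50.0),
}


def within_normal_range(domain, score):
    """Return True if a scaled score meets the reported threshold for its domain."""
    direction, cutoff = NORMAL_THRESHOLDS[domain]
    if direction == ">=":
        return score >= cutoff
    return score <= cutoff


def domains_below_threshold(scores):
    """List the domains in `scores` (domain -> scaled score) outside the normal range."""
    return [d for d, s in scores.items() if not within_normal_range(d, s)]
```

For example, `domains_below_threshold({"cognitive": 40.0, "language": 15.0})` returns `["language"]`, because 15.0 is below the language cut-off of 17.8.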


I. Brief Summary
Charitable Purpose:
To develop new "prescriptive" standards describing normal fetal growth and newborn nutritional status in eight geographically diverse populations, and to relate these standards to neonatal health risk. The worldwide use of these tools should improve infants' health and nutritional status.

Project Description:
The project aims to develop scientifically robust clinical tools to assess fetal growth and the nutritional status of newborn infants, as adjuncts to the recently produced WHO charts for children aged 0 to 5. These will be incorporated into national and international maternal and neonatal programs, and be used to monitor and evaluate maternal wellbeing, infant health and nutrition at a population level.
To achieve these objectives, primary data will be collected on a population-based sample of healthy pregnant women. The tools will describe how fetuses and newborns should grow in all countries rather than the more limited objective of past growth references which describe how they have grown at specific times and locations. They will allow for evidence-based evaluation of nutritional status at birth and measurement of the impact of preventive and treatment interventions in the community.

V. Executive Summary
At least 60% of the 4 million neonatal deaths that occur worldwide every year are associated with low birth weight (LBW), caused by intrauterine growth restriction (IUGR), preterm delivery, and genetic/chromosomal abnormalities (1), demonstrating that undernutrition is already a leading health problem at birth. Accurate assessment of fetal growth and gestational age for timely identification and management of growth restriction, risk assessment of undernourished newborns, and monitoring LBW trends are therefore public health priorities, especially in developing countries where 98% of all neonatal deaths occur. The long-term health implications of these conditions are now well recognized for both developed and developing countries.
Traditionally, fetal nutritional status has been assessed by measurement of uterine height in pregnancy, which can be used in first level screening but needs more precision for suspected IUGR. Birth weight, as the final point of intrauterine growth, is also used but does not adequately describe the fetal growth process. Furthermore, the newborn standards presently recommended by WHO are based on a population of births from California, USA, in the 1970s and are unlikely to be a suitable international reference in the 21st century (2). This is a particular problem for preterm babies, who have different growth patterns from fetuses still in utero, and for whom there are no high-quality postnatal growth standards, i.e. standards built using the prescriptive approach of the new WHO infant and child growth standards (3).
In the developed world and in urban areas in most developing countries, especially if complications are suspected, fetal growth is assessed by comparing ultrasound measurements of fetal size with reference percentiles obtained from fetuses whose growth was assumed to be normal. Unfortunately, most of the ultrasound charts of size by gestational age were obtained from very small populations of fetuses in the USA or Europe, and may not be appropriate for use internationally.
These limitations have generated concern regarding the value of ultrasound in routine clinical care, particularly late in pregnancy, as well as when newborn standards are used to determine the health status of populations or monitor trends and progress in reducing undernutrition. For example, the WHO Expert Committee on "Physical Status: the use and interpretation of anthropometry" (1995) recommended the development of international standards for both fetal and newborn growth, as well as child growth (4). The latter were recently produced by WHO and are being implemented worldwide; the former are still in the research priority agenda (5). We now propose in this project to extend the WHO work by constructing a new set of international Fetal and Newborn Growth Standards. It will be the first large, population-based, multi-ethnic, longitudinal fetal growth standard based on early assessment of gestational age.

Goals and Objectives
The goal is to develop new "prescriptive" standards describing normal fetal and preterm neonatal growth over time and newborn nutritional status, and to relate these to neonatal health risk.
The primary objective is to produce a set of international Fetal and Newborn Growth Standards (fetal growth, birth weight for gestational age and postnatal growth of preterm infants) for practical applications in clinical use and for monitoring trends in populations. We will relate the new newborn standards to neonatal morbidity and mortality to identify levels of perinatal risk.
The secondary objectives are: a) Clinical: to develop a prediction model, based on multiple 2-dimensional (2D) ultrasound measurements, for estimating gestational age during mid-late pregnancy for use in populations of pregnant women without access to early/frequent antenatal care; b) Epidemiological: to investigate in this multi-ethnic, population-based sample the determinants of LBW and its components (preterm delivery, impaired fetal growth) under current healthcare conditions.
The project comprises three component studies:
1. Fetal Growth Longitudinal Study (FGLS) from <14+0 weeks gestational age to birth, to monitor and measure fetal growth clinically (symphyseal-fundal height) and by ultrasound in a healthy population.
2. Preterm Postnatal Follow-up Study (PPFS) of preterm infants (>26+0 but <37+0 weeks) in the FGLS, to describe their postnatal growth pattern.
3. Newborn Cross-sectional Study (NCSS) of all newborns at the study centres over 12 months, obtaining anthropometric measures and neonatal morbidity and mortality rates.
Eight geographically diverse populations will participate, covering North and South America, Europe, Africa, Western Asia and the Indian Subcontinent. Each will contribute a population-based sample of healthy pregnant women expected to provide full growth potential to their fetuses, making it possible to produce prescriptive standards, similar to the WHO Child Growth Standards. Participating sites will contribute to all the components of the project.
The study populations obtained from these geographically defined areas (e.g. city or county) should have no socio-economic constraints on growth, low morbidity and perinatal mortality and adequate nutritional status. To be included, women should be non-smokers, with a normal pregnancy history, and without health problems likely to influence fetal growth or indicate a risk for pregnancy-related pathological conditions. In FGLS, women will be screened before 14+0 weeks at the time of their first antenatal visit and followed up with standard clinical and 2D ultrasound examinations every five weeks, i.e. up to six times during pregnancy. In PPFS, preterm infants (>26+0 but <37+0 weeks) born from this sample will be followed up during the first 8 months of life with the same protocol and set of anthropometric measures that were used in the WHO Child Growth Study. Postnatal growth will be evaluated from both delivery and conception for comparison with the corresponding in utero measurements. In the NCSS study, all newborns at the study centres, born during a fixed (e.g. 12-month) period, will have anthropometric measurements taken immediately after birth. Only babies born to women who meet the same inclusion criteria used in FGLS will be selected to construct the newborn standards. Birth weight and gestational age will also be related to neonatal morbidity and mortality outcomes to construct risk-related newborn weight for gestational age standards. Approximately 5,000 pregnancies will be included in FGLS, allowing for ~350 preterm infants to be included in PPFS. We expect ~500 women will have to be recruited at each study site to achieve the required number of participants. In NCSS, ~50,000 newborns will be recruited, of whom we expect close to 75% will be eligible for the standards, yielding ~1,800 cases with severe newborn outcomes.
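
The sample-size figures in this paragraph imply some simple proportions, which can be checked with a few lines of arithmetic. The sketch below, in Python, uses only the approximate numbers quoted above; the variable and function names are illustrative assumptions, and no rate is derived for the ~1,800 severe outcomes because the text does not state their denominator.

```python
# Approximate planning figures quoted in the Executive Summary.
FGLS_PREGNANCIES = 5_000      # pregnancies to be included in FGLS
EXPECTED_PRETERM = 350        # preterm infants expected for PPFS
NCSS_NEWBORNS = 50_000        # newborns to be recruited in NCSS
ELIGIBLE_SHARE = 0.75         # share expected to be eligible for the standards


def implied_figures():
    """Derive the proportions implied by the quoted planning numbers."""
    return {
        # Share of FGLS pregnancies expected to end in preterm birth (26+0 to 37+0 weeks)
        "implied_preterm_share": EXPECTED_PRETERM / FGLS_PREGNANCIES,
        # Number of NCSS newborns expected to contribute to the newborn standards
        "eligible_newborns": round(NCSS_NEWBORNS * ELIGIBLE_SHARE),
    }


print(implied_figures())
```

Under these figures, about 7% of FGLS pregnancies are expected to yield a preterm infant for PPFS, and about 37,500 NCSS newborns are expected to be eligible for the newborn standards.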
Standard quality control measures will be included, as well as a unique system of random evaluation and repetition of ultrasound measurements (from stored images) to monitor validity and reliability, and continuous real time assessment of all data collected. A limited number of experienced obstetric ultrasonographers and neonatal anthropometrists will form the research teams at each study site. Neonatal anthropometric measures will be monitored and standardized centrally. All data will be entered and managed in an on-line data management system specifically developed for the study, including a system for direct transfer of blinded data from the ultrasound equipment to the database. This on-line system allows the initiation of data analysis soon after data collection is completed. Results from all populations will be pooled (if biologically relevant differences are not observed between sites) to construct the curves for international applications using the same analytical strategy as in the WHO Child Growth Standards. We expect that extensive secondary analyses will be conducted from these data that will make major contributions to science and clinical practice in the near future (Objectives II and III) and for many years to come. This study involves minimal risk to participants, no greater than at routine antenatal or clinical

A. Goals and Attributable Benefits
The goal is to develop new "prescriptive" standards describing normal fetal and preterm neonatal growth and newborn nutritional status, and relate these to perinatal health risk.
The anticipated attributable benefits of this project are that by 2014: a) the newly developed fetal growth standards will be incorporated into all obstetric ultrasound equipment produced worldwide; b) preterm infants' growth will be evaluated using the new postnatal growth curves; and c) all newborns worldwide will be assessed using the new, risk-related, international birth weight for gestational age standards.

B. Objectives
Primary objective: I. To produce a set of international Fetal and Newborn Growth Standards (fetal growth, birth weight for gestational age and postnatal growth of preterm infants) for practical applications in clinical use and for monitoring trends in populations. The newborn standards will then be related to perinatal morbidity and mortality to identify levels of risk and target interventions.

Secondary objectives:
II. To develop a prediction model, based on multiple 2D ultrasound measurements, for estimating gestational age during mid-late pregnancy for use in populations of pregnant women without access to early/frequent antenatal care.
III. To determine in this multi-ethnic sample the determinants of LBW, as well as associated conditions, e.g. preterm delivery and impaired fetal growth and their subgroups, under current healthcare conditions.
IV. To acquire additional 3D images to create an anatomical and growth databank of individual fetal organs as a unique source of biological information for future research.
To the best of our knowledge, this project is unique because of its comprehensiveness and scale (never attempted before). It has all the characteristics recommended in the literature for constructing ideal standards for international use (6,7): it is prescriptive, longitudinal, population-based, and gestational age will be established by ultrasound early in pregnancy. We will be able to explore associated research questions, such as possible ethnic differences in fetal/newborn growth, risk factors for LBW and its subgroups. We will also be able to predict gestational age in women with poor antenatal care and simultaneously provide a set of practical tools, which can be introduced into maternal and newborn care worldwide in a short space of time.
There is considerable demand from maternal and neonatal services to WHO and professional organizations for the standards that this study will produce (5). Hence, we have designed this project in consultation with experts in the field, as well as other research groups and institutions, in such a way that it could easily complement other yet to be developed studies, thereby reducing the risk of duplication.

VII. Project Design and Implementation
Background
In April 2006, WHO released the WHO Child Growth Standards for children aged 0 to 5, which were generated by the WHO Multicentre Growth Reference Study (MGRS) (8). Two characteristics made the MGRS unique and unprecedented as a study in its field: 1) The study included populations from several countries (Brazil, Ghana, India, Norway, Oman and the USA) and 2) A prescriptive approach was used to select the study populations, i.e. only children from populations with minimal environmental constraints on growth were included. This was achieved by recruiting children of affluent and educated parents, because high education and family income have been identified as the environmental variables most likely to be associated with optimal child growth. In addition, chronic illness, failure to adhere to MGRS feeding recommendations and maternal smoking were used as exclusion criteria.
By virtue of these characteristics, the MGRS provided the strong scientific foundations for developing standards that indicate how children should grow, as opposed to previous studies that simply described actual patterns of growth at a particular time and place. Consequently, the WHO Child Growth Standards (8) are now being used worldwide to judge children's growth because they demonstrate how healthy children grow in an environment which allows them to achieve their full growth potential.
The proposed study extends the MGRS into fetal and neonatal life. It will be based on the same prescriptive approach and international representation will be ensured by including populations from several countries. The design, implementation and conduct of the study, and dissemination of the results, as well as their incorporation into clinical practice guidelines and health care policy, will build on what has been achieved with the WHO Child Growth Standards, ensuring continuity between the development and implementation at country level of pre- and postnatal growth standards.

Conceptual Issues
There are important conceptual differences between growth references and standards. References, traditionally regarded as descriptive, are used for comparing different populations, while standards are prescriptive, implying a value judgment of optimal growth to be followed by individual pregnancies. A number of developed, as well as some developing countries have their own national standards for obstetric and neonatal use. In general, it is accepted that there is some variation in growth patterns among children of different racial or ethnic groups; however, these variations are relatively small compared to the large, worldwide variation in growth related to health, nutrition and socio-economic status. Therefore, the WHO Expert Committee on Physical Status (4) argued that "for this reason, a common reference has the advantage of uniform application allowing international comparisons without losing the usefulness for local application". Such advantages outweigh the disadvantage of not taking into account racial and ethnic variations, if actually observed. There are also practical reasons for not developing local growth standards for each country: 1) Many populations in less developed areas experience growth deficits as a result of poor health and nutrition, so that a local reference will have less screening value for the detection of health and growth disorders; 2) Significant secular changes in growth status within a relatively short period of time may render local standards less useful for clinical screening, and 3) Proper reference development is not a task that can be done easily or repeatedly, and it is very expensive to develop local standards.
In addition, as this is a longitudinal study, we will be able to study both fetal size over time (most commonly referred to as growth charts, i.e. centiles of size conditional on age) and actual fetal growth, expressed by the rate of growth between time periods (velocity charts).
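
The distinction between size and growth velocity can be illustrated with serial measurements from a single fetus. The sketch below is in Python and is only an illustration: the function name and the example values are made up (they are not clinical data), and velocity is taken simply as the change in a measurement divided by the interval between two consecutive visits.

```python
def growth_velocity(visits):
    """Return (midpoint_age_weeks, change_per_week) for each pair of consecutive visits.

    `visits` is a list of (gestational_age_weeks, measurement) pairs for one
    fetus, sorted by gestational age. Size charts use the pairs themselves;
    velocity charts use the rates computed here.
    """
    velocities = []
    for (age0, size0), (age1, size1) in zip(visits, visits[1:]):
        interval = age1 - age0
        if interval <= 0:
            raise ValueError("gestational ages must be strictly increasing")
        velocities.append(((age0 + age1) / 2, (size1 - size0) / interval))
    return velocities


# Made-up measurements taken five weeks apart, mirroring the planned visit schedule.
example_visits = [(14.0, 100.0), (19.0, 150.0), (24.0, 210.0)]
print(growth_velocity(example_visits))  # [(16.5, 10.0), (21.5, 12.0)]
```

A cross-sectional design, with one measurement per fetus, can supply the size values but not these within-fetus rates, which is why the longitudinal design is needed for velocity charts.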

Rationale for International Population Based Standards
We believe that the current use of multiple local charts presents sufficient problems to warrant preparing simple, international standards. However, measuring fetuses and newborns from a single, ethnically homogeneous community would be inappropriate, as it would fail to reflect any worldwide variation in growth. Using multiple populations from several countries would enhance the diversity in the biological characteristics (such as parental size and maternal weight gain during pregnancy) as well as the socio-cultural factors influencing fetal growth. This approach would also be politically more acceptable than constructing the standards based on data obtained from a single country.
Furthermore, there have been suggestions that the so-called "customized" birth weight standards, including ultrasound-derived curves of intrauterine weight (based on estimated fetal weight), are better predictors of perinatal mortality than population-based birth weight standards similar to those we plan to produce in the newborn component of the study. We do not agree with this concept, as the maternal variables included in "customized" charts, such as ethnicity, parity and maternal weight, are not physiological characteristics; they are intermediate variables. There is now strong evidence, using perinatal mortality as the substantive outcome, that the benefit in prediction is not a consequence of the "customization" by maternal characteristics but rather an effect of the higher birth weight distribution obtained from ultrasound-derived fetal weight charts. These adjusted charts will therefore identify more preterm infants as growth restricted, and so predict mortality better because of a higher cut-off and not because they fit fetal growth patterns better (9). The proposed population-based newborn standards from a well-nourished, healthy population should produce results similar to those obtained by the "customized" charts, and in addition will be related to perinatal morbidity and mortality in the same study population.

Introducing the New Standards to Pregnancy and Newborn Care
Antenatal care consists mostly of a series of screening tests of varied complexity, implemented at different levels of care that, as a group, contribute to evaluating the overall status of each pregnancy. Objective assessments of fetal growth deviations can play a major role in prenatal care, neonatal care and outcome-based research. We are not proposing to recommend routine serial ultrasound examinations for all pregnant women in either developed or developing countries (see (10) and NICE Recommendations 2008); rather, we propose that, when they have to be used because of a clinical condition, they are interpreted correctly. Our study will produce three new, scientifically valid tools for use at different levels of care to complement other tools already in use. The new standards will facilitate the correct interpretation of ultrasound scanning at levels of care where it is already widely used, e.g. referral and most urban hospitals in developing countries. It is anticipated therefore that the new charts will have a major impact on overall care: for example, they are likely to result in fewer unnecessary interventions, such as Caesarean sections. The newborn standards will be used at all levels of delivery including rural areas; the preterm standards will be used in NICUs and similar levels up to primary health care. As an additional contribution to primary health care, we will produce new, multi-ethnic, uterine fundal height charts. We believe that such a comprehensive approach is the most effective format of care for pregnancy and newborns in developing countries.

Overall Study Strategy
Our overall strategy has been to adopt a very pragmatic approach to the study. In other words, we have tried to incorporate the study procedures into routine clinical care, so as to inconvenience the mothers as little as possible and avoid disrupting service delivery. An additional advantage of this strategy is that it increases the likelihood that the study's recommendations will be implemented and introduced into real life settings.
Nevertheless, during the preparation and implementation of the study, there will be many instances when a clear-cut answer to a clinical problem will not exist. We plan, therefore, to adopt a policy of asking the Principal Investigators and other members of the Steering Committee to make decisions in such cases. This policy will broaden our knowledge base and create a genuine feeling of team membership, which is so important in carrying out complex, multi-centre studies. The decision-making process will be supported by an independent Advisory Group, which will be asked to resolve strategic conflicts. Ultimately, however, responsibility for finalizing the Study Protocol and the Operations Manual rests with the Executive Committee.

Selection of Study Sites and Population Framework:
There are two levels in the selection process of the study population: a) the cluster level, i.e. the geographical region (e.g. city or part of a city) and, within this geographical area, the selection of health institutions where women attend for antenatal and delivery care; and b) the individual level, i.e. women or newborns that have the characteristics to be eligible for each of the sub-studies. We discuss the former here, as it is common to all sub-studies, and leave the latter for the section on each specific sub-study.
It is not necessary and would be unrealistic, especially in developing countries, for all institutions in a study area to fulfill all selection criteria. The basic requirement for an institution to be considered is that it serves a population at low risk of impaired fetal growth. The concept is that such target populations, from which the individual participants will be drawn, should have no, or only a very low risk of, health, environmental or socio-economic constraints on fetal and newborn growth. Definitions for identifying the socio-economic characteristics associated with unconstrained growth in these populations will be locally adapted. In principle, these will include markers of household income, housing tenure, education, occupation and employment status using locally selected cut-off points. These variables have been recently identified as explanatory factors in the relationship between birth weight and ethnicity (11).
Specifically, from each study site we will identify health institutions that serve populations with the above characteristics. We will first conduct a census of all hospitals where deliveries take place that are classified under local definitions as "private" or "corporation" hospitals or as serving the upper socio-economic sector of the selected region. This is very important for developing country sites. In developed countries, potential centres should also serve a general population, which will mostly have low-risk characteristics. We should concentrate on large institutions (>1000 deliveries a year), as it is important to involve only a small number of hospitals. Thus, from this census we will select those hospitals that cover at least 80% of all deliveries in the target population. However, it is recognized that some sites will not have large private hospitals and so local adaptations will have to be made.
The selected hospitals will need to agree to collaborate with the University of Oxford (Study Coordinating Unit) and provide evidence of being able to implement the study protocol; in particular, being able to coordinate referrals for ultrasound scans and having a policy of confirming gestational age by early ultrasound examination in all pregnant women. They will also need to show, using the latest data available, that the population they serve is actually of low risk as defined by: a) LBW rate <10% and mean birth weight >3,100 g; b) located at an altitude below 1,600 m; c) perinatal mortality <20 per 1000 live births; d) mothers attending antenatal care in these institutions should plan to deliver in that or a similar hospital located in the same region; e) >75% of mothers have attained an educational level greater than the locally defined cut-off point; and f) lack of known non-microbiological contamination such as pollution, radiation or any other toxic substances (this will be evaluated in collaboration with the WHO Department of Protection of the Human Environment).
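
The numeric low-risk criteria a), b), c) and e) lend themselves to a simple screening check. The sketch below, in Python, is an illustration under stated assumptions rather than a study tool: the `SiteProfile` class, its field names and the function name are hypothetical, rates are assumed to be supplied as proportions between 0 and 1, and criteria d) and f) are left out because they require local judgement rather than a numeric comparison.

```python
from dataclasses import dataclass


@dataclass
class SiteProfile:
    """Summary statistics for the population served by a candidate institution (hypothetical layout)."""
    lbw_rate: float                              # proportion of births that are LBW, 0-1
    mean_birth_weight_g: float                   # grams
    altitude_m: float                            # metres
    perinatal_mortality_per_1000: float          # per 1000 live births
    share_mothers_above_education_cutoff: float  # proportion, 0-1


def unmet_criteria(site):
    """Return labels of the numeric low-risk criteria that the site does not meet."""
    checks = {
        "a) LBW rate <10%": site.lbw_rate < 0.10,
        "a) mean birth weight >3,100 g": site.mean_birth_weight_g > 3100,
        "b) altitude <1,600 m": site.altitude_m < 1600,
        "c) perinatal mortality <20 per 1000": site.perinatal_mortality_per_1000 < 20,
        "e) >75% of mothers above education cut-off": site.share_mothers_above_education_cutoff > 0.75,
    }
    return [label for label, met in checks.items() if not met]
```

An empty list means all five numeric criteria are met; for instance, a site that is otherwise eligible but located at 2,000 m would return only the altitude label.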
From this pool of eligible institutions that have the potential to provide the required sample size for ALL components of the study (target population), the Executive Committee and its advisors, in collaboration with the local study team, will identify the final list of units that will contribute to all the different sub-studies according to their capacity to provide the sample size needed, logistics, availability of equipment and population distribution. Individual participants will be selected from such a target population following the individual criteria listed below (see Section 1.1). Sampling for the fetal growth study will consist of all women who agree to participate and meet the selection criteria, from the given date for initiation of the study until the sample size is reached (500 pregnancies). For the preterm study we will take all preterm babies from this cohort who meet the inclusion criteria, and for the newborn study we will take all newborns delivered at these institutions during the 12-month study period.

FGLS: to develop the Fetal Growth Standards (Activity 1)
Limitations of existing intrauterine growth charts: There are three important issues relating to the construction of fetal growth standards that, in general, were not considered when the ones presently in use were produced (4). The issues are: a) Longitudinal approach: To create a standard that reflects patterns of healthy fetal growth, serial size measurements at different stages of pregnancy are required. This provides estimates of both the amount of variability in measures among fetuses in a population, as well as the amount of variability within individuals from their own average growth trajectory (15). Some early charts were built using a single measurement from each fetus based on the assumption that cross-sectional measurements from different fetuses can be used to estimate longitudinal patterns of growth. Other authors collected multiple measurements per fetus but analyzed the data in a cross-sectional manner, resulting in an underestimation of the true variance. The present study will collect and analyze data longitudinally. b) Small sample size: Most intrauterine standards are based on small sample sizes, which may lead to unstable estimates of percentiles, in particular at the extremes of the distribution. The extreme centiles are those that matter for the identification of growth deviations. c) Lack of a prescriptive approach: In the past, references have often been built based on existing practices or from specific research units; their final aim was to represent the population in a particular place and time (population representativeness). This is a descriptive approach. We believe this strategy is incorrect as the underlying population often does not possess the best health and/or nutrition. For example, the observed secular increase in birth weight could reflect the tendency towards an increase in maternal body mass index (BMI) and obesity observed in most countries and not necessarily better health status of the pregnant population.
Similarly, the very high Caesarean section rates in some populations (12) can affect birth weight patterns as reflected by the increase in preterm deliveries in these populations. Therefore, the proposed new standards will be built on the basis of recommended health practices (i.e. prescriptive approach): that is, defining the target population as well-nourished, healthy (both before and during pregnancy), non-smoking, not at the extremes of age, and with adequate education and socio-economic status. The sample to be used for the new standards should be representative of this "prescriptive" population, rather than the general population.

FGLS Individual Entry Criteria
Introduction: How to select a "healthy population" or its counterpart, the risk factors associated with fetal growth restriction, is open to considerable debate. The most important factors to consider when selecting a healthy population with no obvious risk factors for IUGR or over-growth, at the first antenatal care visit early in pregnancy, are: a) achieving a balance between strict criteria for risk and external validity of the study population and b) the logistics of screening for factors that are not part of routine care or for which consensus is lacking about their effect on fetal growth.
We have reviewed the extensive literature on this topic which has tried to separate out the risk factors for preterm delivery and IUGR (16) particularly in low and middle income countries, as well as data from our own large-scale studies which have systematically explored the associations between risk factors and perinatal outcomes. In defining the eligibility criteria for the FGLS, one could exclude women with every possible risk factor for poor pregnancy outcome. However, we believe it would be preferable, in the initial screening process, to identify factors that are proven to be related to pregnancy outcomes, specifically those that are recommended as ways of identifying women who would benefit from low-risk antenatal care.
The approach is based upon our overview of the evidence of the effectiveness of antenatal care (17), which was recently updated (Abalos, WHO 2008). Similar patterns of care are recommended in the NICE/UK Clinical Guideline (2003) and by WHO for the antenatal booking visit. We will try to ensure that the approach is implemented in the participating centres in a standardized way.
All women attending the first antenatal care visit in the selected institutions with an initial evaluation of gestational age <14+0 weeks will be screened at study entry based upon the criteria listed below. These women define a population that is likely to need only routine antenatal care in pregnancy, i.e. it is a group of clinically healthy women who can follow basic antenatal care models. Some variables have clear thresholds (e.g. urinary culture positive or negative); for others with less clear values (e.g. maternal height), we have selected a cut-off of 153 cm from the WHO Global Survey, a very large, global, multi-ethnic Maternal & Perinatal Health Study (for a detailed description of one of the 3 regions studied, see (18)). This cut-off is supported by corresponding results from a number of other international studies (19,20).
Taking all these factors into consideration, we have decided not to base gestational age solely on LMP; rather we shall use a standard gynecological definition of LMP as the first level in calculating the gestational age. Among women with a certain LMP and regular periods who are 9+0 to 14+0 weeks pregnant, we will corroborate the gestational age with an ultrasound CRL measurement, using an internationally recognized and validated chart (23). If the difference in gestational age estimation by CRL and LMP is ≤ 7 days we will consider the LMP to be reliable and take it as the true biological date. It is acknowledged that we may exclude a number of women who might otherwise have been eligible if only ultrasound were used, but the aim is to define a population with as exact a gestational age at entry as possible.

Nutritional adequacy during pregnancy
In the WHO Child Growth Study, it was assumed from the outset that children recruited from affluent populations consumed adequate complementary foods. This proved to be the case when we analyzed their complementary diets (24). In the present study, we shall adopt the same concept but in addition will develop general nutritional guidelines for pregnant/lactating women, suitable for local use, based on the best available evidence, e.g. NICE recommended, UK Food Standards Agency advice http://www.eatwell.gov.uk/agesand Stages/pregnancy/whenyrpregnanct/ for promotion amongst the participating mothers and care providers.
Routine nutritional supplements, e.g. protein or energy, will not be given because a) they are not a component of the recommended antenatal care package (10) and b) we are only planning to recruit women with adequate nutritional status. Iron-folate supplementation will be prescribed if necessary for anemia during/after pregnancy but given routinely only if such a policy is in place in the institution. A similar position will be taken with calcium supplementation for the prevention of pre-eclampsia and preterm delivery if such a policy is in place in the hospital.
Women will be asked to indicate if they are taking nutritional supplements and this information will be recorded in the data collection form. It is not practical in a study of this size to measure adherence in any other way or to obtain individual intakes, e.g. 24 hr recall, considering the poor reliability of such instruments for individual assessment.

Pregnancy follow-up
Women in FGLS will receive standardized antenatal care (with some local variations) based on the recommended WHO package, part of which involves screening for conditions that emerge during pregnancy.
All women recruited will be followed throughout pregnancy from the time of the first visit, irrespective of the pregnancy outcome. As a general principle, the number of exclusions will be as small as possible. They are likely to be confined to fetuses with congenital abnormalities (based on a final evaluation at birth); multiple pregnancies that were not identified at recruitment; mothers diagnosed with catastrophic or very severe medical conditions which were not present at recruitment (e.g. cancer, HIV), and those with pregnancy related conditions requiring hospital admission (e.g. eclampsia or severe preeclampsia). Hospital admission per se is not a reason for exclusion: women admitted simply for "observation" will still contribute data to the fetal growth standards unless they develop one of the conditions listed above. Most sites are malaria free ecozones; thus, intermittent preventive treatment is not routinely recommended although insecticide-treated mosquito nets may be used. In other potential sites malaria is prevalent. We will adhere to local protocols and exclude from incorporation in the fetal growth standards any woman with evidence of malaria infection during the pregnancy.
For the sake of clarity, we will produce during the preparatory phase of the study, in collaboration with the Steering Committee and experts in the field, a very detailed list of those conditions, which would result in a pregnancy being excluded from the preparation of the fetal growth standards. In any case, if any exclusion is needed it will be done only during the data analysis period and local investigators will complete the follow-up of all enrolled women irrespective of any follow-up experience. Data from preterm deliveries, in the absence of one of the exclusion criteria, will contribute to the fetal growth standards until the time of delivery.
We also expect that INTERGROWTH-21st will have fewer follow-up problems than randomized trials as it does not involve an intervention; a well-educated population will be recruited, and women and their families will like some components of the study such as ultrasound pictures or close monitoring of preterm infants. Nevertheless, to ensure that the loss to follow-up remains below our predicted <5% level, we will maintain very close contact with participants; remind them about imminent visits and ultimately conduct home visits if necessary.

FGLS 2D Ultrasound Measurements
Justification: In preparing the protocol it became clear that more than 6 ultrasound examinations after the dating scan would present unwanted logistical problems and inconvenience to mothers. It was also evident that a minimum growth change has to occur between visits, to be reliably measured by ultrasound, considering the errors in the equipment and observers. Thus, even if growth velocity per unit of time (e.g. one week) for certain parameters is high during some gestational periods, the actual change may not be reliably measured. We have considered alternative spacing between measurements such as longer intervals early in pregnancy (e.g. 8 weeks) and shorter ones in later pregnancy (e.g. 4 weeks). There is evidence, however, that measures such as BPD and femur length continue to increase fairly linearly until 34 weeks without a clear period of very high velocity growth. The logistics of coordinating multiple antenatal visits at different time intervals in a large sample militate against adopting variable timings. Finally, as women will be recruited more or less randomly at gestational ages between 9+0 and 14+0 weeks, we will have a spread of visits throughout pregnancy, which is also advantageous in logistical terms. In a previous report, the Guatemalan study, 21% of participants had one scan, 50% had two scans, 24% had 3 scans, and 5% had 4 or 5 scans at varying gestational ages (25). In the Western Australia study, participants were recruited at 16-18 weeks and scanned again at approximately 24, 28, 34 and 38 weeks (26) which we believe is less satisfactory than our schedule. We have explored this question further in a large Scandinavian data set: with 5 measures at a fixed interval of 7 weeks between measures, no problems were detected in modeling growth. Therefore, after the first scan between 9+0 and 14+0 weeks, we will perform scans at ~5 weekly (±1 week) intervals, i.e. 14-18, 19-23, 24-28, 29-33, 34-38 and 39-42 weeks.
Thus, a woman could have scans at 10, 15, 20, 25, 30, 35 and 40 weeks or 11, 16, 21, 26, 31, 36 and 41 weeks depending upon the gestational age at recruitment and duration of pregnancy. This has the advantage of providing greater coverage of the complete gestational period and easier scheduling of the scans.
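The two example schedules above can be reproduced with a short sketch. This is illustrative only: the function name `scan_schedule` and its parameters are hypothetical, and the sketch returns target weeks only, ignoring the ±1 week tolerance and the actual duration of pregnancy.

```python
def scan_schedule(first_scan_week, interval_weeks=5, max_week=42):
    """Return target gestational weeks for the dating scan and later scans.

    Assumes (for illustration) that every scan falls exactly on the
    5-week target; the protocol allows +/- 1 week around each target.
    """
    if not 9 <= first_scan_week <= 14:
        raise ValueError("dating scan must fall between 9+0 and 14+0 weeks")
    # range() stops before max_week + 1, so no visit is scheduled past 42 weeks
    return list(range(first_scan_week, max_week + 1, interval_weeks))


print(scan_schedule(10))  # [10, 15, 20, 25, 30, 35, 40]
print(scan_schedule(11))  # [11, 16, 21, 26, 31, 36, 41]
```

A woman who delivers before a scheduled week would simply have fewer scans than the list suggests.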
Schedule: After the dating scan, 6 further visits (for fetal biometry) will be scheduled at ~5 weekly (± 1 week) intervals (i.e. 14-18, 19-23, 24-28, 29-33, 34-38 and 39-42 weeks). Seven measurements will be taken at each visit from 14 weeks onwards: Biparietal Diameter (BPD); Occipito-Frontal Diameter (OFD); Head Circumference (HC); Transverse abdominal diameter (TAD); Antero-posterior abdominal diameter (APAD); Abdominal Circumference (AC) and Femur Length (FL). At each visit, the measurements will be obtained 3 times from 3 separately generated ultrasound images in a "blinded" fashion, and submitted electronically (with the associated images) to the Coordinating Unit. The BPD, OFD, HC, TAD, APAD, AC and FL images should fill at least 30% of the monitor screen. The last recorded value of each fetal measurement will be revealed after submission for clinical purposes, as per local protocols. Standardized images will be obtained for all measurements based on internationally accepted protocols (27).
For example, the BPD will be measured from the outer-outer edges of the parietal bones in a cross-sectional view of the fetal head at the level of the ventricles. The measurement should be obtained from an image with the midline echo as close as possible to the horizontal plane with the angle of insonation of the ultrasound beam at 90°. The image should be oval containing a centrally positioned midline echo interrupted in its anterior third by the cavum septum pellucidum. The anterior walls of the lateral ventricles should be centrally placed around the midline echo and the choroid should be visible in the posterior horn in the distal hemisphere.
The CRL to gestational age conversion will be performed using the charts developed by Robinson and Fleming (23), which remain the most widely accepted reference for early gestational age estimation. These charts are, in any case, based on a cohort of women with known gestational age estimation by LMP. This is a circular argument that limits the use of ultrasound alone for the estimation of gestational age. We are aware that an ultrasound expert group in the UK is reviewing the use of these charts. We are, therefore, prepared to consider changing to a new reference if one becomes available.
We also prefer not to include participants in whom fetal size is discrepant from LMP. If we use ultrasound alone to estimate gestational age and the measurement is erroneous, that error becomes built into the growth chart. This seems another circular argument that we must try to avoid. Furthermore, the ultrasound estimation of gestational age has the limitation that all fetuses with a given CRL value will have the same gestational age estimation. This lack of biological variability is a major limitation for evaluating growth. A very interesting discussion on this point can be seen in (28).
In short, we will only consider for recruitment women with a certain LMP and regular periods; from these, we will only select those women with a CRL that is equivalent to their LMP, based on the Robinson and Fleming charts. This population is at the lowest risk of negative pregnancy outcomes (29).
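The dating rule can be expressed as a simple check. The sketch below is a minimal illustration, assuming the woman has already been confirmed to have a certain LMP and regular periods, and that the CRL has already been converted to gestational age in days using the chosen chart; the function names are hypothetical.

```python
def days(weeks, extra_days=0):
    """Convert a gestational age written as weeks+days into days."""
    return weeks * 7 + extra_days


def eligible_by_dating(ga_lmp_days, ga_crl_days, tolerance_days=7):
    """Apply the dating criteria described in the text.

    ga_lmp_days: gestational age from LMP, in days
    ga_crl_days: gestational age from the CRL chart, in days
                 (chart conversion is assumed to happen elsewhere)
    """
    in_window = days(9) <= ga_lmp_days <= days(14)   # 9+0 to 14+0 weeks
    agrees = abs(ga_lmp_days - ga_crl_days) <= tolerance_days
    return in_window and agrees


print(eligible_by_dating(days(11, 3), days(12, 1)))  # True: 5 days apart
print(eligible_by_dating(days(11, 0), days(12, 2)))  # False: 9 days apart
```

When the check passes, the LMP-based age is the one carried forward as the gestational age.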
Additional measurements: There is a unique opportunity to complement the study by taking additional measurements, e.g. BPD using the scanner's ellipse facility; transcerebellar diameter; humerus, radius/ulna, tibia/fibula and fetal foot length, and Doppler measurements of the umbilical cord artery, middle cerebral artery and maternal uterine artery. However, we are concerned that introducing extra tasks could reduce the overall quality of the seven main measurements and inconvenience the participants. Therefore, the feasibility of taking additional measurements will be evaluated in consultation with each centre before a final decision is made. In general, only some centres are expected to participate in these additional studies, which are mostly descriptive and exploratory with smaller sample sizes than in the main study.
Variability of measurements: We intend to explore several factors that might explain variability, e.g. parity, BMI, gestational age, position/presentation of the fetus, liquor volume, gender and number of measurements. We shall also introduce a comprehensive system for evaluating intra- and inter-observer variability using the 2D and 3D measures, as well as a random sample of retaken measures by an external ultrasonographer in each centre.
Equipment: All the study centres will use equipment with the same characteristics (determined by an independent group), including a state-of-the-art scanner. The staff will be appropriately trained following standardized procedures. The study will provide the equipment free to all the centres in developing countries; centres in developed countries will be expected to provide equivalent equipment themselves, as their contribution to the study. It is very important that ultrasound probes provide an extremely high quality grey scale image. We would like to stress that the results of the study will be in the public domain and all manufacturers will be able to use the results free of charge. A detailed description of the methodology for ultrasound measures is presented in Appendix C.

FGLS Quality Control Measures
Quality control measures will include a unique system of random evaluation and repetition of ultrasound measurements and continuous real time assessment of all data collected. A limited number of experienced obstetric ultrasonographers and neonatal anthropometrists will form the research teams at each study site. They will be provided with standardized equipment and an operations manual that will describe all measurement techniques, protocols and procedures for training and clinical use in all the centres. The ultrasonographers will receive specific training under the supervision of qualified instructors. Intra-observer and inter-observer measurement errors will be assessed during the training course before initiating the study.
Both the examiner and the instructor will scan each fetus briefly. Each examiner will obtain two images of each fetal anatomical parameter under study. Differences between these measurements will be expressed as the percentage of the measurement obtained from the technically better image. Percentage differences will be used to take into account the increase in the dimensions of the fetal anatomical parameters with advancing gestational age. Percentage differences for each examiner will be averaged and the mean values compared to zero and to the instructor's measurements by t-test. In addition, measurement error will be evaluated by defining a maximum acceptable discrepancy, e.g. 5%, and requiring that 95% of differences lie within that value. This will provide information on the accuracy of individual measurements. The same model of ultrasound machine will be used at all sites, and the equipment will be serviced periodically by specialized technicians as per the manufacturer's recommendations.
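As an illustration of the calculation described above, the following sketch computes percentage differences for one examiner, a one-sample t statistic against zero, and the share of differences within the maximum acceptable discrepancy. The function names and the example values are hypothetical, and the comparison with the instructor's measurements is not shown.

```python
from math import sqrt
from statistics import mean, stdev


def percent_differences(pairs):
    """pairs: (reference, repeat) measurements in mm, where the reference
    comes from the technically better image."""
    return [100.0 * (repeat - ref) / ref for ref, repeat in pairs]


def summarize(pairs, max_discrepancy_pct=5.0, required_fraction=0.95):
    """Summarize one examiner's measurement error (needs at least 2 pairs)."""
    diffs = percent_differences(pairs)
    n = len(diffs)
    m, sd = mean(diffs), stdev(diffs)
    # One-sample t statistic for H0: mean percentage difference = 0
    t = m / (sd / sqrt(n)) if sd > 0 else float("nan")
    within = sum(abs(d) <= max_discrepancy_pct for d in diffs) / n
    return {
        "mean_pct_diff": m,
        "t_statistic": t,
        "fraction_within": within,
        "acceptable": within >= required_fraction,
    }


# Hypothetical paired measurements, for illustration only
example = [(50.0, 51.0), (60.0, 59.4), (70.0, 70.7), (80.0, 79.2)]
print(summarize(example))
```

The t statistic would still need to be referred to a t distribution with n - 1 degrees of freedom to obtain a p-value.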

FGLS 3-Dimensional Ultrasound Volumes
3-Dimensional (3D) ultrasound technology, a relatively new scanning technique in obstetrics, offers exciting possibilities to assess the fetus. The technique is based upon volume acquisition using specially designed probes with later, off-line review and "3D reconstruction" on a computer. This allows examination of multiple planes of the imaged structure from 3 different right angles (top-to-bottom; front-to-back and side-to-side). Additionally, a variety of techniques for measuring fetal organ and limb volumes have been described. The multiplanar display allows a clearer and more comprehensive assessment of fetal anatomy, and volume calculation produces more accurate estimates of fetal size and the impact of growth restriction on specific fetal organs (30).
The primary purpose of the 3D component in this study is quality control, i.e. to provide volumetric data to assess the quality of the 2D measures (BPD, OFD, HC, TAD, APAD, AC and FL). Hence, the 3D volumes will be the head (at the level of the BPD), the abdomen (at the level of the AC) and the femur. The task of controlling quality with these data will be carried out by an independent group. The technique of 3D volume acquisition is similar for all structures. Initially, the structure to be imaged is identified during the last of the three 2D scans taken. Once imaged correctly, the volume acquisition facility is engaged and the volume obtained over a brief (10-15 seconds) interval and stored in digital form on computer for later analysis, which will be performed in Oxford by the Noble research group after the stored images have been transferred there electronically.

Criteria for excluding participants from preparation of fetal growth standards
Women in FGLS will receive standardized antenatal care (with some local variations) based on the recommended packages, part of which involves screening for conditions that emerge during pregnancy and is presently used in the hospitals. All women recruited will be followed throughout pregnancy from the time of the first visit irrespective of the pregnancy outcome, except for those with a late diagnosis of multiple pregnancy. However, we do not expect to include all pregnancies in the preparation of the fetal growth standards as some of them may develop conditions that can affect fetal growth. As a general principle, the number of exclusions will be as small as possible. They are likely to be confined to fetuses with congenital abnormalities (based on a final evaluation at birth); severe IUGR requiring early delivery; multiple pregnancies that were not identified at recruitment; mothers diagnosed with catastrophic or very severe medical conditions which were not present at recruitment (e.g. cancer, HIV), those with pregnancy related conditions requiring hospital admission (e.g. eclampsia, placental abruption or severe pre-eclampsia) or those with acute conditions such as an episode of malaria infection. Hospital admission per se is not a reason for exclusion: women admitted simply for observation will still contribute data to the fetal growth standards unless they develop one of the conditions listed above.
For the sake of clarity, we will produce during the preparatory phase of the study, in collaboration with the Steering Committee and experts in the field, a very detailed list of those conditions, which would result in a pregnancy being excluded from the preparation of the fetal growth standards. Data from preterm deliveries, in the absence of one of the exclusion criteria, will contribute to the fetal growth standards until the time of delivery.

Symphyseal-fundal height charts
We will produce, in addition to the ultrasound charts, new data on symphyseal-fundal height measurements to update the charts we first produced 30 years ago based on a single sample (31). To produce one standardized, well-validated, multi-ethnic chart using this selected population to replace several used in primary health care units and hospitals around the world would be extremely useful. We expect these measures to be taken at the same time as the 2D ultrasound measures, following the same schedule, by specifically trained staff.

PEA POD measurements
For the Oxford site only, we will include an additional anthropometric measurement of body composition from FGLS pregnancies. Body weight alone can be very misleading; the weight scale cannot tell the difference between an ounce of fat and an ounce of muscle. Body composition estimation provides a measure of the amount of fat mass and the fat-free mass in the body and is now recognized as the most accurate method of measuring infant growth and nutritional status. It is now widely recognised that the accurate assessment and tracking of body composition in the critical period immediately following birth can provide key information in both clinical and research settings. Body composition information can be used to help monitor and evaluate infant growth patterns, optimise nutritional interventions, obtain important feedback during drug treatments, and optimise discharge criteria. Previously, obtaining reliable infant body composition data has been difficult, with available methods limited by problems with accuracy, practicality, invasiveness, and safety. Air Displacement Plethysmography has solved all of these issues by offering accurate and precise measurements of infant body fat and fat-free mass quickly, safely, and comfortably. This technique has now become a vital tool in evaluating infant growth and nutritional status and it has been widely applied in a large number of studies of newborns and infants throughout the world as it is quick and entirely safe.
Over many thousands of applications in infants the use of the PEA POD has not proven to be distressing at all to infants or their parents. In the unlikely event that a child were to be distressed after being placed in the PEA POD (or if for any reason a parent became distressed) then the procedure would be immediately terminated. A further attempt would be made only when appropriate i.e. the child was calm and with consent from the parent.
We will use the PEA POD, which is very similar to a standard neonatal incubator, to determine percent fat and fat-free mass in infants. The simple, 7-minute test consists of measuring the subject's mass (weight) using a very precise electronic scale, and volume, which is determined by air displacement plethysmography while the infant lies inside the PEA POD chamber. From these two measurements, the infant's body composition is calculated. For the PEA POD manual of operations see Appendix J.

1 and 2 year follow-up
All infants born to mothers in FGLS will be followed up at the ages of 1 and 2 years (± 2 weeks) to evaluate their postnatal growth. The same methods used to measure the head circumference, weight and length at birth will be employed in the follow-up of infants in this cohort. These will allow us to monitor the growth and health of all infants for whom we have accurate, ultrasound-derived, intrauterine growth patterns and detailed information about pregnancy and perinatal events.
Four additional, non-invasive and painless measures of development will also be taken at the 2 year follow-up using standard protocols as defined by the authors of the respective tests: 1) Assessments of motor skills, cognition, speech and behaviour using a tool consisting of the Rapid Neurodevelopmental Assessment of Children (32) and components from the Malawi Development Assessment Tool and the Bayley Scales of Infant Development; 2) Visual acuity and contrast sensitivity measurements using the Cardiff Tests (33); 3) Cortical auditory processing assessments to an auditory novelty, odd-ball paradigm using wireless and gel-free electroencephalography (34), and 4) sleep-wake measures using actigraphy (35).
These measurements will allow us to monitor the growth, health and development of all infants for whom we have detailed ultrasound-derived, intrauterine growth patterns and detailed information about pregnancy and perinatal events.
At the 1 and 2 year follow-up appointments of the infants, we will also measure maternal weight. We will contact women and their GPs by letter about these appointments and invite them to attend. If they are unable to attend, we will ask them if we can instead arrange a home visit or a telephone interview, or if they can complete the questionnaire by post or email.

Biological samples

Oxford FGLS extension (INTERBIO-21st Fetal & Infant Growth Study)
From 2011 to 2014, we will collect and store maternal blood (12ml) at booking, a urine sample (5ml) at each follow-up ultrasound appointment, as well as maternal faeces (5ml, only if the mother has opened her bowels) and cord blood (12ml)/placental (9x 5mg) samples at delivery (in addition to the pregnancy and fetal growth data) from a total of 1,000 pregnancies. The samples will be used for nutritional and epigenetic studies into fetal growth. The INTERBIO-21st Fetal & Infant Growth Study will continue to monitor fetal growth and the growth and development of the infants using the same protocols as the FGLS component of INTERGROWTH-21st http://www.medscinet.net/intergrowth/protocol.aspx http://www.medscinet.net/Interbio/protocol.aspx?lang=1 (see Appendix K).

Oxford NCSS extension (INTERBIO-21st Neonatal & Infant Growth Study)
From 2011 to 2014, we will collect and store maternal blood (12ml) at delivery, as well as maternal faeces (5ml, only if the mother has opened her bowels) and cord blood (12ml)/placental (9x 5mg) samples at delivery (in addition to the pregnancy data) from a total of 10,000 pregnancies. The INTERBIO-21st Neonatal & Infant Growth Study will continue to monitor the growth and development of the infants using the same protocols as the NCSS component of INTERGROWTH-21st http://www.medscinet.net/intergrowth/protocol.aspx.

Epigenetic studies in INTERGROWTH-21st participants
Use of Guthrie cards: We will perform the same analyses (i.e. nutritional and epigenetic) on DNA extracted from routinely collected blood spots (stored as Guthrie cards and no longer required for clinical purposes) obtained from the cohort of 5,941 babies born at the John Radcliffe Hospital who participated in the original INTERGROWTH-21st Study. The Guthrie cards will be barcoded, linked anonymously to the clinical data collected from the INTERGROWTH-21st participants and then unlinked.

Discarded 1st trimester Down's screening samples
From 2011 to 2014, we will collect and store all discarded 1st trimester samples that were used to screen Oxford patients for Down's Syndrome (approximately 7,000/year). The results of the analysis of these samples will be linked to INTERBIO-21st Fetal and Neonatal & Infant Growth Study data, as well as anonymised, routinely collected pregnancy outcome data from all other women. As serum samples from other hospital trusts are also routinely sent to the John Radcliffe Hospital for Down's screening, we will: a) store these samples; b) link them to anonymised, routinely collected pregnancy outcome data from all women delivering at the Royal Berkshire Hospital, Reading; Milton Keynes Hospital, and Buckinghamshire Healthcare NHS Trust (approximately 21,000/year), and c) similarly use them in future studies to test novel biomarkers predictive of adverse pregnancy outcomes.

Sample use
All the samples described above will be used for a range of biochemical, nutritional, genetic and epigenetic studies into placental function and fetal growth. Any unused samples in the future may be donated to the Oxford Radcliffe Biobank for studies into the causes of pregnancy-related problems http://wyvern.ndcls.ox.ac.uk/orb/.

The Pregnancy Physiology Pattern Prediction (4P) Study
An urgent need to develop an evidence-based, national, Modified Early Obstetric Warning Score (MEOWS) was highlighted in the two most recent Confidential Enquiries into Maternal Deaths in the UK. An essential prerequisite to developing such an early warning system is knowledge of the normal distributions of physiological data (blood pressure, temperature, respiratory rate, heart rate and oxygen saturation) in "low-risk" pregnant women. The 4P Study aims to obtain these longitudinal data, from <14 weeks' gestation to 2 weeks after delivery, from women participating in the Oxford arm of the INTERBIO-21st Fetal & Infant Growth Study. For details of observation measurement in the 4P Study, see Appendix I.

Outcomes
• A national early warning system will be developed for all gestational ages and the immediate post-partum period.
• An NHS web-based programme will be developed, allowing age, parity and gestational-age specific MEOWS charts to be created for individual mothers on admission to hospital.
• An automated, multi-parameter, age, parity and gestational-age specific alerting system will be created for use in monitored areas and hospitals with an EPR system.
• Fetal size will be correlated with maternal physiological values.

Follow-up of preterm babies in FGLS and INTERBIO-21st Fetal & Infant Growth Study
All preterm newborns (>26+0 to <37+0 weeks) from the FGLS and INTERBIO-21st Fetal & Infant Growth Study cohorts will be followed up for 8 months after delivery and at 15, 18 and 21 months to evaluate postnatal growth. The same anthropometric techniques for measuring head circumference, weight and length will be used to follow up infants in these cohorts. At 15, 18 and 21 months, we will use the same data collection forms that are used to follow up these infants at 1 year. These measurements will allow us to monitor the growth and health of all infants for whom we have detailed ultrasound-derived, intrauterine growth patterns and detailed information about pregnancy and perinatal events.
In addition, we will assess the motor development of the preterm cohort every month from 4 months of age to 8 months of age using the motor development assessment questionnaire, which is part of the 1 year assessment. This consists of two parts: 1) a questionnaire for the parents about the age their child achieved some gross motor milestones, and 2) a questionnaire completed by the Measuring Team reporting whether or not the child demonstrated those milestones during the visit.
At 2 years of age, the same anthropometric measurements (head circumference, weight and length) will be taken. In addition, we will also measure developmental outcomes in the children. These measurements will include a test of hearing (cortical auditory processing); a test of visual acuity and contrast sensitivity; an assessment of the child's cognitive, language, motor and behavioural development, and an assessment of sleep. Section 1.9 details the techniques used to measure these developmental outcomes.
These measurements will allow us to monitor the growth, health and development of all infants for whom we have detailed ultrasound-derived, intrauterine growth patterns and detailed information about pregnancy and perinatal events.
We plan not to include preterm newborns ≤26 weeks (only ~3% of all preterm newborns) because of the very high morbidity/mortality in this group and the need for very special intensive care. A fixed follow-up period was chosen to simplify organization of the study and reduce loss to follow-up. This length of follow-up should minimize a) inconvenience to the mothers; b) the need for home visits, and c) the study's cost and complexity. Nevertheless, an analysis based on the time from conception (corrected age) will be performed to compare preterm babies with their in utero counterparts conceived around the same time. Although we want to produce growth charts for the first 6 months of life, we have extended the follow-up period to 8 months for these infants to avoid the so-called "right-edge" effect in the construction of the growth standards.
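A minimal sketch of the age arithmetic behind such a comparison is given below. It assumes gestational age at birth is expressed in days from LMP and uses the conventional 40-week term as the reference point; both function names and the 40-week constant are assumptions for illustration, not definitions taken from this protocol.

```python
TERM_DAYS = 40 * 7  # assumption: conventional 40-week term


def postmenstrual_age_days(ga_birth_days, postnatal_days):
    """Age on the same time axis as a fetus still in utero:
    gestational age at birth plus time since birth."""
    return ga_birth_days + postnatal_days


def corrected_age_days(ga_birth_days, postnatal_days, term_days=TERM_DAYS):
    """Postnatal age minus the number of days the baby was born before term.
    Negative values mean the infant has not yet reached term-equivalent age."""
    return postnatal_days - (term_days - ga_birth_days)


# A baby born at 32+0 weeks, measured 8 weeks after birth
ga_birth = 32 * 7
print(postmenstrual_age_days(ga_birth, 56))  # 280 days, i.e. 40+0 weeks
print(corrected_age_days(ga_birth, 56))      # 0 days, i.e. term-equivalent
```

On this axis, the 8-week-old infant born at 32+0 weeks is comparable with a fetus at 40+0 weeks.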
Exploratory sub-group analyses will also be conducted, e.g. very preterm infants <32 weeks (~20% of all preterm newborns) and induced for medical reasons vs. spontaneous preterm deliveries. However, to be included in the PPFS growth charts, newborns have to meet the criteria decided a priori. A detailed protocol describing clinical conditions for inclusion in the final growth curves has been prepared in consultation with expert neonatologists before any data analysis is undertaken.
This protocol will focus on definitions that can be made operational across institutions. Newborns will be excluded retrospectively if they have congenital malformations or severe medical morbidities likely to impact upon growth (see also section 2.4).
Clearly, for some subgroups of very preterm infants this will represent only an exploratory analysis; however, it will still be done because of the unique opportunity to study continuous fetal-preterm growth. It is expected that many babies, especially those born <32 weeks, will be managed in some form of neonatal intensive care unit. Measurements of these babies will be taken using the study protocol, but in accordance with the clinical status of the infant and the unit's protocols. It is clear that to ensure that "clinically stable" babies are comparable across different NICUs, standardization and continuous monitoring are needed. Ultimately, in such an acute clinical care setting, the final judgment will depend on the attending doctor; realistically, this is how the standards will be used in practice anyway.

PPFS Anthropometric Measurements
The anthropometric measurements will include weight, length and head circumference. The three measurements (and a standard clinical evaluation) will be taken every 2 weeks during the first 8 weeks, and then every 4 weeks until 8 postnatal months, using essentially the same methodology and operating manual employed in the WHO Child Growth Study (6). The only difference will be that in the proposed study all measures, interviews and clinical evaluations will be conducted at a special follow-up clinic in the corresponding hospital. Routine home visits are not planned, except for those mothers who do not comply with the protocol's scheduled visits. It is expected that the preterm babies will have a maximum of 11 follow-up visits over 8 months, but this number may be reduced for some as complications, including death, are expected especially in the very preterm subgroup. Abdominal circumference will not be included as it is not used in routine neonatal practice and respiratory movements in these tiny newborns make the measurements unreliable.
Standardized, electronic, digital, newborn weighing scales will be used and serviced regularly; they will be replaced if they are faulty and cannot be repaired. All anthropometrists in the study will be trained centrally; they in turn will train the nurses/midwives in how to use the scales. We will follow the same procedures used in MGRS to certify all staff responsible for taking the measurements, as was the case in our own previous randomized trials. During routine measurement sessions, two anthropometrists will independently take duplicate measurements. Maximum allowable differences between duplicate measurement values will be checked to detect outliers and trigger immediate re-measurement if necessary.

PPFS Quality Control Measures
An Anthropometric Data Quality Unit, organized on similar lines to the Ultrasound Quality Control Unit, will be created. Its role will be to conduct, following the WHO MGRS's training procedures, the standardization, monitoring, site visits, and retraining when necessary of all staff responsible for anthropometric measurements (MGRS Manual of Operation, WHO Geneva). Using the on-line data management system, samples of measurements will be obtained to evaluate variability and consistency within each site and across sites. The same protocol will be followed for measures at birth for all newborns in the fetal and newborn studies. For the preterm follow-up we will use the MGRS protocol but include only weight, length and head circumference. We will assign 2 staff at each centre to take all newborn (and postnatal follow-up) anthropometric measurements. This will permit replication of the WHO MGRS measurement standardization protocol (6).
As stated above, the standardization of all anthropometric measurements is a central element and challenge of this protocol. The measurers (or lead measurer from each site) will be brought together at a central location for two initial training meetings to be conducted by experienced staff from the WHO-MGRS team with the assistance of the MGRS anthropometry video. This video will be used for local training as well as for on-going standardization activities. The study will provide all sites with new infantometers with digital reading for infant length measures. For birth weight, hospitals will use the balances they are currently using but they will be calibrated daily. Other equipment will be calibrated at least once a week during data collection. The overall sample will also be monitored daily, using the on-line data entry system, for unexpected values and outliers so that measurements can then be repeated within a few days, if appropriate.

Infant Morbidity During Follow-Up
For the Preterm Follow-up Study, we will use an indicator of morbidity at entry and an extended version for the follow-up itself. The protocol states (see section 2.1) that the study will exclude newborns ≤26+0 weeks (only ~3% of all preterm newborns) because of the very high morbidity/mortality in this group and need for very special intensive care. A detailed protocol describing clinical conditions, "minimum criteria", for inclusion in the final growth curves will be prepared a priori before any data analysis is undertaken. Babies that do not meet the "minimum criteria" will nevertheless be followed up to form a subgroup for comparative analysis with the "healthy" preterm infants. Efforts will therefore be made to standardize the major components of infant care throughout the participating institutions. The protocol will focus on definitions that can be made operational across institutions. Criteria are likely to include any morbidity severe enough to have a negative impact on growth, feeding or the feasibility of measuring the infants at the required times.

Infant Feeding Practices
We expect newborn feeding to be protocol driven and based on a) the recent WHO recommendations (Edmond and Bahl "Optimal feeding of low-birth-weight infants" WHO, 2006) and b) standard NICU practice for clinical status. Descriptive data, collected in an adapted version of the WHO MGRS form, will serve to document that the centres conformed to recommended feeding patterns and clinical practice. The comparison with intrauterine charts is an important, biological secondary objective, but the main objective of the PPFS is to describe the postpartum growth that can be expected of, or aspired to by, "healthy preterm newborns".
It is anticipated that all the sites will adhere to internationally recommended standards for feeding preterm infants; see review (36). The recommended first choice for LBW infants is breastfeeding or mother's own expressed milk. This is associated with the lowest incidence of infection and necrotizing enterocolitis, and best outcomes in terms of neurodevelopment. Where that is not possible, the next best choice is donor human milk and, in third place, preterm infant formula (Edmond and Bahl "Optimal feeding of low-birth-weight infants" WHO, 2006).

NCSS Individual Entry Criteria
The Newborn Cross-Sectional Study will include a) all babies born to mothers in FGLS and b) all other babies born at the study centres during a fixed 12 month period. The vast majority of babies will have had their gestational age confirmed by an early ultrasound because all the hospitals included as INTERGROWTH-21st sites should already have a policy of checking gestational age with an early dating scan. Measures will be taken during the study period to reinforce this policy to ensure that all newborns have their gestational age at birth corroborated by ultrasound scan estimation early in pregnancy. The strategy of including the complete newborn population will allow us to study a large number of LBW infants and subgroups under a wide range of conditions that can affect these outcomes. We will also be able to compare the total populations across centres in terms of sociodemographics and pregnancy outcomes to confirm that the populations contributing to the new standards have the required characteristics.

Constructing the Standards
For construction of the birth weight for gestational age standards, babies born to mothers without health problems likely to influence fetal growth will be selected from the total number of births at each centre. However, the total population of newborns during the same study period will be used to explore Objective III (determinants of LBW and its components, preterm delivery and impaired fetal growth, under current healthcare conditions). In addition, we plan to use an outcome-based approach, based on recent recommendations.

NCSS Anthropometric Measurements
All babies born during the study period will have these anthropometric measures taken within 24 hours of delivery: weight, length and head circumference. To select those newborns eligible for the NCSS (newborn standards), we will use the same criteria as for the fetal growth study.
Standardized, electronic, digital, newborn weighing scales with a precision of 10 grams will be used and their calibration status will be checked daily; they will be replaced if they are faulty and cannot be repaired. We shall also provide all clinics with standardized infantometers for length (precision 0.1 cm) and tape measures for head circumference (precision 0.1 cm); these will be similarly calibrated and maintained. All anthropometrists will be trained centrally and monitored during the study following standard procedures by the Anthropometric Standardization Unit; they in turn will train the nurses/midwives in how to apply the study's measurement protocol. We will follow the same procedures used in MGRS to certify, before and during the study, all staff responsible for taking the measurements, as was the case in our own previous randomized trials. As in MGRS, routine data collection will comprise duplicate measurements of weight, length and head circumference taken independently by two anthropometrists. They will check the maximum allowable differences between their replicate measurement values immediately after taking the measurements. These maximum allowable differences are approximately twice the Technical Error of Measurement of a well-trained observer (e.g. 0.7 cm for length and 0.5 cm for head circumference). If the maximum allowable difference is exceeded, the anthropometrists will immediately repeat the duplicate measurements until the difference is acceptable. Replicate measurement values will be averaged before analysis to increase accuracy further.
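The duplicate-measurement rule described above can be illustrated with a minimal sketch. The tolerances for length and head circumference are the examples quoted in the text; the function names and the data layout are hypothetical, and no tolerance is stated for weight, so weight is deliberately left out.

```python
from statistics import mean

# Maximum allowable differences between the two observers' readings (cm).
# These are the example values quoted in the protocol text; a weight
# tolerance is not stated there and is therefore not included.
MAX_ALLOWABLE_DIFF_CM = {"length": 0.7, "head_circumference": 0.5}


def within_tolerance(measure, reading_a, reading_b):
    """Return True if two independent readings agree closely enough.

    A tiny epsilon guards against floating-point noise when the
    difference sits exactly on the limit.
    """
    limit = MAX_ALLOWABLE_DIFF_CM[measure]
    return abs(reading_a - reading_b) <= limit + 1e-9


def accepted_value(measure, attempts):
    """Walk through successive duplicate attempts (pairs of readings).

    Returns the mean of the first pair that is within tolerance, or
    None if no pair is acceptable and measurement must continue.
    """
    for reading_a, reading_b in attempts:
        if within_tolerance(measure, reading_a, reading_b):
            return mean([reading_a, reading_b])
    return None
```

For example, `accepted_value("length", [(49.0, 50.0), (49.6, 49.8)])` rejects the first pair (difference 1.0 cm) and averages the second pair.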

NCSS Quality Control Measures
The same quality control measures and standardization strategy (adapted from the WHO Child Growth Study protocol) (7) which are described above for PPFS will be employed for the three measurements to be taken for the whole study population in NCSS. A special Anthropometric Data Quality Control Unit, independent of the Study Coordinating Unit, will be responsible for the monitoring and quality control of these measurements during the entire study.

Follow-up of the NCSS
All newborns during the study period, including those on NICU or special care, will be followed on a daily basis until hospital discharge to detect neonatal death and document severe morbidity. We will make strenuous efforts to coordinate and promote evidence-based care for the preterm neonates using materials developed as part of the WHO best practice programme, by liaising with the lead neonatologist in each NICU before and during the study. We recognize that differences in practice will persist despite our best efforts. However, we believe this is unavoidable in a very pragmatic study such as this, which is trying to reflect what happens on a daily basis in clinical practice. Furthermore, we will similarly make strenuous efforts to standardize the main protocols for feeding practices in each NICU before the study starts. During the routine site-visits by members of the Study Coordinating Unit and the Anthropometric Team we will monitor the implementation of the protocols.

Biological Samples
We have debated the issue of anonymized genetic/epigenetic data at great length. In principle, we would not wish to miss an opportunity to put together what would be a unique databank. However, we are very concerned about a) the effect that a request to collect DNA might have on recruitment rates; b) ethical questions that might be raised by IRBs; c) the additional time required to obtain informed consent, and d) the extra cost of processing the samples. Having said this, it nonetheless remains an important issue. Therefore, we shall bring together a group of international experts and potential academic partners, to discuss the advantages and disadvantages of collecting biological samples for a specific hypothesis relating to fetal growth and/or preterm delivery. The final decision will depend largely on whether the study has sufficient power to answer a specific question, relating for example to the effect of a genetic polymorphism on a pregnancy outcome. After much discussion, it was finally decided to collect biological samples from FGLS and NCSS pregnancies from 2011 onwards. For details of biological sample collection refer to Section 1.11 and Appendix K.

Severe Perinatal Morbidity and Mortality Outcomes
Birth weight for gestational age standards will be related to indicators of perinatal outcome to establish risk levels associated with different growth patterns. The 'ideal' outcome is perinatal mortality, but its anticipated infrequent occurrence in this low risk population makes it unrealistic to have a sample large enough for the necessary number of events across the gestational age distribution (perinatal mortality is expected to be close to 1% in these populations). We have therefore decided to use an un-weighted composite outcome including at least one of the following conditions: stillbirth, neonatal death until hospital discharge of the newborn, newborn stay in NICU for ≥7 days or other severe neonatal complications. We have used such an outcome recently (12,13); it requires limited standardization of clinical diagnoses across hospitals and is well accepted as a marker in large, international, population based studies of newborns that are severely ill. It could be argued, however, that intrapartum stillbirth may not be related to fetal growth and should not be included in this index. We believe this is a valid point but as it will not be possible to separate those intrapartum deaths that are related to IUGR from those that are unrelated, we suggest keeping the index as it is. As we expect a very low fetal death rate, it should not have a major impact on the observed rates.
We believe this is a good proxy for adverse perinatal outcomes across countries. We have used it as a primary neonatal outcome in recent publications and it has been well accepted. Its only disadvantage is that it risks excluding from the total number of early neonatal deaths some cases amongst healthy, mostly term babies delivered vaginally who, after hospital discharge at 48 hours, develop severe complications or death up to 7 days post-natally without returning to the same hospital. However, missing these isolated cases is preferable to performing thousands of unnecessary home visits.
Data from our population-based studies in some of the sites indicate that the incidence of this outcome is close to 5%. We will confirm this estimate during the preparatory phase of the study in the study centres. We will also reach consensus with the investigators and consultants on the few very severe clinical conditions to be included in the composite outcome in addition to perinatal death and ≥7 days NICU stay.
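As an illustration only, the un-weighted composite outcome described in this section could be derived from per-newborn records along the following lines. The record fields and function names are hypothetical, and the "other severe complication" flag stands in for the list of conditions that is still to be agreed by consensus.

```python
from dataclasses import dataclass

NICU_STAY_THRESHOLD_DAYS = 7  # "NICU stay for >=7 days" in the protocol text


@dataclass
class NewbornRecord:
    """Hypothetical per-newborn record; field names are assumptions."""
    stillbirth: bool = False
    neonatal_death_before_discharge: bool = False
    nicu_days: int = 0
    # Placeholder for the severe conditions to be agreed by consensus.
    other_severe_complication: bool = False


def has_composite_outcome(record):
    """Un-weighted composite: positive if at least one component is present."""
    return (
        record.stillbirth
        or record.neonatal_death_before_discharge
        or record.nicu_days >= NICU_STAY_THRESHOLD_DAYS
        or record.other_severe_complication
    )


def composite_outcome_rate(records):
    """Proportion of records with the composite outcome."""
    records = list(records)
    if not records:
        raise ValueError("no records supplied")
    return sum(has_composite_outcome(r) for r in records) / len(records)
```

Because the outcome is un-weighted, a newborn with several components counts once, exactly like a newborn with a single component.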

On-line Data Management and Statistical Analysis
All data will be entered into an on-line data management system specifically developed for the study; it will include a system for direct transfer of blinded data from the ultrasound machines to the database. This on-line system has the practical benefit of allowing on-going quality control, correction of errors or missing values and the initiation of data analysis soon after data collection is completed. We have previously used such a system in very large multi-centre studies (12) and randomized clinical trials in developing countries, and we are confident that there will be few problems in adapting this concept to the proposed study. The system will be field tested during 2008 using the data flow model shown in Appendix B.
This system will be used for data management and monitoring all sub-studies, including patient recruitment and follow-up, and is based on the INTERGROWTH-21st Electronic Data Management System (IDAMlS). The system permits all participants' data to be incorporated contemporaneously into the data files via the Internet. Included within the system is a review process to ensure that all data are complete, and that the research team is notified about imminent patient visits and expected delivery dates. At an institutional level, each centre will have its own progress enrolment charts and weekly recruitment targets. The system will also provide the Data Coordinating Unit with a detailed daily record of patient enrolment and data entry, at both individual and institutional levels, to monitor progress against the milestones listed in the protocol. Corresponding actions, such as telephone calls, web conferences or site visits, will take place within a week of detecting a problem in a centre to ensure that appropriate corrective measures are introduced. Lastly, the system will allow the database to be analyzed as soon as the data collection period has ended. Data from all sites will be pooled to construct the curves for international applications using the WHO Child Growth Study analytical strategy (8). Data analysis will be conducted during Years 3-5, following the same strategy used in the construction of the WHO MGRS curves. It will be coordinated by the Statistical Director (D. Altman), consultant statistician/epidemiologist and the team's senior investigators.
Extensive secondary analyses of these data will be conducted including the development of velocity standards, thereby contributing in a major way to answering questions relating to clinical practice and public health now and for years to come.
The appropriateness of pooling data from all sites will be assessed by comparisons of site means, standard deviations and the 3rd and 97th centiles to the pooled values of primary data. Consistent differences ≥0.5 SD between the mean values of individual sites and the pooled sample will be used as a pre-set trigger for considering whether to adjust by site for purposes of pooling data. This has to be supported by site-specific consistent differences across primary measures. For the WHO-MGRS it was concluded that data from all sites were remarkably similar and thus should be pooled (8). This is the strongest biological argument to date and will be the basis for the conceptualization of our analysis.
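A minimal sketch of the 0.5 SD trigger is given below for a single primary measure. It assumes that the difference is expressed in units of the pooled standard deviation, which the text does not state explicitly; the function names and the dictionary layout are hypothetical. The protocol also requires the difference to be consistent across primary measures, which would mean repeating this check for each measure and comparing the results.

```python
from statistics import mean, stdev

TRIGGER_SD = 0.5  # pre-set trigger quoted in the protocol text


def standardized_site_differences(site_values):
    """Map each site to (site mean - pooled mean) / pooled SD.

    site_values: dict mapping a site name to a list of measurements
    for one primary measure. Using the pooled SD as the unit is an
    assumption made for this sketch.
    """
    pooled = [v for values in site_values.values() for v in values]
    pooled_mean = mean(pooled)
    pooled_sd = stdev(pooled)
    return {
        site: (mean(values) - pooled_mean) / pooled_sd
        for site, values in site_values.items()
    }


def sites_to_review(site_values):
    """Return the sites whose mean differs from the pooled mean by >= 0.5 SD."""
    diffs = standardized_site_differences(site_values)
    return sorted(site for site, d in diffs.items() if abs(d) >= TRIGGER_SD)
```

A site flagged here is only a candidate for site adjustment; the decision itself rests on whether the same pattern recurs across the other primary measures.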
The planned approach will be based on the experience of the WHO-MGRS adapted to the fetal growth parameters (37); we shall follow the same exploratory strategy used in this study. We are aware that these analyses may not be automatically applicable, but we feel comfortable that the considerable experience of this team of statisticians in carrying out a similar study will enable them to select a final model that is fit for purpose. The methods do not require similar intervals. We are not making any assumptions about the shapes of the individual curves, except that they will be smooth.
For the physical growth standards, a class of distributions that can fit the data for all measurements will be explored with an appropriate smoothing technique to generate centiles. Candidate distributions (e.g. Box-Cox t, modulus-exponential-normal, Box-Cox power exponential, Johnson's SU or Johnson's SB) will be compared on the basis of goodness of fit for each age group using the following diagnostic tools: (a) log-likelihood comparisons, (b) Q-Q plots (i.e. comparing empirical and distributional centiles), and (c) comparisons of observed percentages that occur below estimated centiles against expected values. Once the most appropriate distribution(s) for our data is (are) identified, smoothing techniques (such as polynomial smoothing, the Box-Cox t distribution smoothed using natural splines, and a variation of it with the Box-Cox power exponential distribution) will be tested against the same diagnostic criteria applied to the selection of distributions. Sir David Cox (University of Oxford Statistics Department), who advised us on another pregnancy-based study, will be invited to contribute to the selection process.
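Diagnostic (c) can be sketched for a single age group as follows. This is an illustration under a loud assumption: a normal distribution from the Python standard library stands in for the candidate distributions named above (Box-Cox t and the others are not implemented here), and the centile list and function name are hypothetical.

```python
from statistics import NormalDist

CENTILES = (3, 10, 50, 90, 97)  # illustrative choice of centiles


def observed_vs_expected(values, fitted, centiles=CENTILES):
    """Compare expected and observed percentages below fitted centiles.

    values: measurements for one age group.
    fitted: a fitted distribution exposing inv_cdf(p); NormalDist is
            used here only as a stand-in for the candidate families.
    Returns a list of (expected %, observed %) pairs.
    """
    n = len(values)
    rows = []
    for c in centiles:
        cutoff = fitted.inv_cdf(c / 100)
        observed = 100 * sum(v < cutoff for v in values) / n
        rows.append((c, observed))
    return rows


def fit_normal(values):
    """Stand-in fit: estimate mean and SD from the sample."""
    return NormalDist.from_samples(values)
```

A well-fitting distribution gives observed percentages close to the expected ones at every centile; systematic gaps in the tails suggest that skewness or kurtosis is not being captured.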
It could be argued that only parous women should be included in the construction of the standards. Doing so would reduce the incidence of LBW babies amongst the mothers recruited, as nulliparous women are known to be at higher risk of LBW babies. However, it is also the case that newborns of parous and nulliparous women have similar overall long-term morbidity and mortality rates. Overall, we believe that the advantages of recruiting solely parous women are outweighed by the loss in external validity and credibility in doing so. Therefore, we intend to include all women regardless of parity, but we will try to recruit similar proportions across the study sites. We will also explore in the analysis whether, in this sample of healthy women, parity is an important effect modifier for fetal growth, but we do not expect to produce parity-specific fetal growth standards. In addition, based on the same biological principles, sex-specific charts should be constructed for fetuses as they have been for newborns. However, we do not plan to separate fetuses by sex as this is not a routine practice and it is unlikely that it will become one in the future. Conversely, separate sex-specific charts will be developed to monitor the postnatal growth of preterm babies.

Sample Sizes for Objective I (Activities 1-3)
General principles: The precision of growth chart centiles is determined by several factors, of which the most important is sample size. Other factors are also relevant, including study design (cross-sectional vs. longitudinal), the timing of measurements, and the method of curve-fitting. Many criteria can be used to estimate sample size, but three are considered critical here, namely: a) the precision of a given centile at a particular gestational age; b) the logistics of selecting and following up a large cohort, and c) the rate of the expected perinatal outcome for Activity 3, in Objective I. (The outcomes to be used in Objectives II-III are either continuous variables or far more prevalent than severe neonatal morbidity/mortality, i.e. a preterm delivery rate of 9-10%, and therefore sample size limitations are of less concern for these parts of the study). Sample sizes have been estimated taking each of these criteria into account. The numbers quoted below are combined figures for all study sites.
It is possible that the WHO-MGRS findings of similar infant growth patterns amongst different ethnic groups may not apply to fetal growth, even though recent data for US white/black newborns support the concept of pathological rather than physiological factors being responsible for observed variations (16). We question why fetuses should grow differently to infants or be affected differently by environmental influences, or why biologically different growth patterns should exist across ethnic groups. Nevertheless, we will make all practical efforts to obtain the largest possible sample size per ethnic group to allow separate evaluation in case the alternative hypothesis (different fetal growth by ethnic group) is observed. Even if this is the case for some groups it is very unlikely that all ethnic groups will have biologically relevant different growth patterns.
FGLS: To obtain complete data from 4,000 pregnancies at 8 study sites, 500 mothers will have to be enrolled at each centre. Data from previous years will be used at each study site to estimate the expected number of eligible women available and the recruitment rates. We expect that ~75% of the total pregnant population at the sites selected will be eligible for this cohort, although local investigators will be strongly advised to be conservative in their estimates of compliance rates and to allow for attrition due to other reasons. The 4,000 women should provide close to 800 cases of each of the 5 general ethnic groups. Sample size calculations (15), based on a width of the 95% CI of the 5th birth weight percentile of 2.5% at term (2,700g), demonstrate that 600 women per ethnic group should be adequate, after excluding complicated pregnancies and women lost to follow-up. This sample size is larger than most previous studies and adequate, we believe, to produce reliable curves. However, we will not produce ethnic-specific standards because such pregnant populations are rarely representative of a unique ethnic group, particularly in urban areas in developing and developed countries. It would therefore be impractical to alternate between standards based on each patient's ethnic classification, even if this could be done. We will not be able to estimate outcomes across groups with any degree of power but that is not an objective of this study. These numbers should also fulfill the requirements for PPFS of ~360 preterm infants (assuming a preterm rate of 9% based on our previous study involving similar populations (12)).
We estimated that fewer than 5% of women will be lost to follow up (the figure is usually ~ 3% in our large trials). We also adjusted for the fact that 10-15% of women will be excluded from the preparation of the fetal growth standards because they will have developed problems severe enough to have affected fetal growth.
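The FGLS figures quoted above can be checked with a short worked calculation. All inputs are taken from the text; the one assumption, flagged in the code, is that loss to follow-up and exclusions act independently and multiplicatively, which the protocol does not state. The function and key names are hypothetical.

```python
SITES = 8
MOTHERS_PER_SITE = 500
ETHNIC_GROUPS = 5
REQUIRED_PER_GROUP = 600         # from the sample size calculation (15)
PRETERM_RATE = 0.09              # assumed preterm rate quoted in the text
LOSS_TO_FOLLOW_UP = 0.05         # upper bound quoted in the text
EXCLUSION_RANGE = (0.10, 0.15)   # excluded for problems affecting fetal growth


def fgls_sample_sizes():
    """Reproduce the headline FGLS numbers from the protocol inputs."""
    enrolled = SITES * MOTHERS_PER_SITE
    per_group = enrolled / ETHNIC_GROUPS
    # Assumption: attrition and exclusion apply independently.
    retained = [
        per_group * (1 - LOSS_TO_FOLLOW_UP) * (1 - excluded)
        for excluded in EXCLUSION_RANGE
    ]
    return {
        "enrolled": enrolled,
        "per_ethnic_group": per_group,
        "expected_preterm": enrolled * PRETERM_RATE,
        "analysable_per_group_best": max(retained),
        "analysable_per_group_worst": min(retained),
    }
```

Under these assumptions the calculation gives 4,000 enrolled women, about 800 per ethnic group, about 360 preterm infants, and roughly 646 to 684 analysable women per ethnic group, which stays above the 600 required.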
PPFS: A cohort of ~360 preterm babies will be recruited from FGLS for this study. It is expected that the gestational age distribution will provide 310 babies >30 weeks gestation. Only the subgroup of babies free of major clinical problems will be used to create the standards. We recognize that the sample size calculation here is influenced by logistic issues and the availability of newborns from FGLS rather than statistical calculations. However, it is still large by preterm study standards and we shall have very detailed follow-up data. We consider that the possibility of having a full set of fetal and newborn growth patterns from a cohort of preterm newborns is a biological priority even if we shall not have the power to explore other subgroup analyses such as gestational age sub-groups or early postnatal morbidity. Postnatal growth from healthy populations has been shown to be similar among ethnic groups (38,39).
NCSS: The sample size calculations for this component of the study depend mostly on the number of preterm newborns needed at the lower end of the gestational age distribution and the perinatal outcome selected. A fixed data collection period (12 months) is planned for all hospitals but it is clear that some variability will have to be accepted considering the size of each institution and the population characteristics. From the pool of all newborns, we will select the ones born to eligible women using the FGLS criteria listed above. We expect the refusal rate to be minimal. We will aim for a total of 50,000 newborns with very detailed information, which will provide, based on data from similar institutions (40), a sample of ~75% eligible babies for the newborn charts, including ~1,800 with a severe outcome using our established criteria. Of these 1,800, we expect that 70% (about 1,200) will be between 26 and 36 completed weeks of gestation.
Missing values: We consider that women without newborn data cannot contribute to any of the three studies. Fetal deaths and newborns with congenital malformation will be excluded. All women should have at least 2 follow-up values after the dating scan to be included in the fetal growth study.

B. Objective II: Activities 4-6
Objective II will be achieved by the following three activities: 1) Preparation of computer files of 2D ultrasound measures for 3-week gestational windows between 20-34 weeks; 2) Development of analytical strategies for the construction of predictive models to estimate gestational age during mid-late pregnancy and 3) Completion of a prediction model to estimate gestational age during mid-late pregnancy based on several ultrasound measures obtained at a single visit. Ultrasound-based gestational age estimation in infrequent attendees or women with limited access to care is usually based on a single ultrasound measure (e.g. BPD), but the estimates have large errors and wide confidence intervals. We are planning to develop equations that use several ultrasound measures obtained at a single visit. If we achieve this, it will be a major contribution to the care of high-risk women attending hospital for only one or two antenatal visits late in pregnancy.
Statistical strategies for the production of the prediction models include multiple regression analysis using gestational age at delivery as the gold standard (dependent variable), then recalculating the expected gestational age at different time points. The independent variables or predictors will be the basic ultrasound measures at a given [...]

C. Objective III: Determinants of LBW and its components, preterm delivery and impaired fetal growth, under current healthcare conditions (Activity 7)
The complex interactions between risk factors, clinical presentations and underlying biological processes are poorly understood in relation to adverse perinatal outcomes, especially LBW which is such a heterogeneous entity. This has been a major limitation in preventing preterm delivery and impaired fetal growth. We therefore aim to explore risk factors and perinatal outcomes for LBW (<2,500g), preterm delivery and impaired fetal growth in the entire NCSS study population of 50,000 newborns. For the identification of risk factors, data collected during routine and specialist antenatal care from ALL women delivering in the institutions over a fixed period will be included in the study. A summary antenatal and delivery form will be introduced into these institutions, which will conform with a) the new WHO model of antenatal care for basic routine care (10) and b) local protocols for special cases, standardized by us from previous trials in pre-eclampsia, hypertension, urinary tract conditions, and intra- and post-partum care. An important conceptual issue is that we do not aim to detect any new, unexplored risk factors. Rather, we plan to determine how risk factors that are routinely recorded during standard antenatal care are distributed in the preterm/IUGR cases and their subgroups across these populations, in view of the considerable heterogeneity in risks and outcomes within these two conditions.
We plan to investigate the determinants of preterm delivery and IUGR in sub-groups from clinical, routine laboratory, demographic and socio-economic variables obtained from all women attending the study centres without any exclusion (as opposed to FGLS which aims to produce standards from a sample of selected, healthy women). It would, of course, be very interesting to collect more detailed information about other variables or test biomarkers of, for example, infection. The question, as always, is when to stop adding more variables to an already complex study. One alternative is to add ancillary studies to selected centres that are interested in collaborative research with other groups. It would certainly not be impossible for us to coordinate the collection, storage and testing of such samples (as our group has experience of collecting thousands of samples for similar studies) but funding is not available for such an activity at this point.
We will apply standard statistical strategies (e.g. logistic regression analysis) using the ~4,500 preterm newborns or IUGRs from the total NCSS study population to study these relationships compared to term babies, and we shall explore some recently proposed analytical strategies as well (41). We will consider subgroups of preterm deliveries, e.g. induced vs. spontaneous deliveries; premature rupture of membranes; pre-eclampsia related vs. unexplained impaired fetal growth, as well as other pathological and physiological conditions (42,43). The adjusted ORs and confidence intervals for these risk factors as related to the different subgroups will be estimated and biologically plausible interactions will be explored. The association between subgroups of newborns and perinatal outcomes will also be examined. We also intend to explore several factors that might explain variability in fetal growth, e.g. parity, BMI, gestational age, position/presentation of the fetus, liquor volume, gender and number of measurements. We shall also introduce a comprehensive system for evaluating intra- and inter-observer variability using the 2D and 3D measures, as well as a random sample of retaken measures by an external ultrasonographer in each centre.
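As a simple illustration of the odds ratios and confidence intervals mentioned above, the following sketch computes an unadjusted OR with a log-based (Woolf) 95% confidence interval from a 2x2 table. It is not the adjusted logistic regression analysis planned for the study; the function name and the counts are hypothetical and chosen only for illustration.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-based) confidence interval.

    a: exposed cases      b: exposed non-cases
    c: unexposed cases    d: unexposed non-cases
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, not study data: risk factor present/absent
# among preterm (cases) and term (non-cases) newborns.
or_, lo, hi = odds_ratio_ci(30, 70, 20, 80)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

An adjusted analysis would instead fit a multivariable logistic regression and exponentiate the coefficient for each risk factor; the interval construction on the log scale follows the same idea.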

Activity 8: 3D Data Collection during FGLS
Data collection for this activity will be conducted immediately after the 2D measurements have been taken for FGLS. The 3D volumetric images of the head, abdomen, and femur will be acquired using the methods described in Section 1.6. We expect that the provider of the equipment will provide technical assistance to optimize data collection.
The primary purpose of the 3D component is to provide volumetric data to assess the quality of the 2D measures (BPD, OFD, HC, TAD, APAD, AC and FL). The secondary objectives are a) to evaluate currently used ultrasound measures of fetal volume with the expectation that some of them can contribute to the improvement of the presently used estimation of fetal weight and b) to establish a unique "biological databank" for future research. Additional 3D volumes of the chest (at the level of the 4-chamber view of the heart) and the humerus will be taken specifically for these purposes in some centres.
The secondary objective a) is very interesting from a clinical point of view as most of the ultrasound-derived equations for estimation of fetal weight have low predictability and were developed using data collected from small studies with less sophisticated ultrasound equipment than is now available. Thus, the feasibility of combining 2D and 3D measures will be explored including the possibility of doing so across gestational ages. We expect to evaluate the additional benefits that could be obtained by adding 3D values in a sub-sample of the study.
Other uses of the 3D data are planned: for example, the accuracy of fetal weight estimation based on 2D measures is known to be poor in clinical practice. It is expected that 3D data will improve the formulae that are still in use despite being produced over 20 years ago. Furthermore, the likelihood of low-cost, portable 3D equipment being available in the future raises the possibility of managing high-risk pregnancies in rural or distant regions by sending the images to a referral centre rather than the mother herself. We plan to take the final decision regarding the amount of 3D data to be collected and the number of participating sites after conducting a pilot study assessing the human resources available at the centres, the need for additional training, the time that 3D scanning adds to each visit, equipment considerations and the feasibility of transferring the very large datasets to the Coordinating Unit in Oxford. Issues we shall explore in the pilot study include: a) difficulties obtaining individual organ volumes (44); b) the need to obtain at least two good volume measures from most of the areas of interest which may require 20-30 minutes additional scanning; c) the need for the audit staff to know the values of the calculated volumes and the way the operator arrived at each volume which could introduce considerable bias, and d) how to save and transmit the volumetric data reliably given the risks of a hard drive crashing or being corrupted. In any case, it is unlikely that we shall record more than the 5 volumes listed above at each visit, and it is also worth mentioning that the Gates Foundation has not funded the 3D component of the study.
We have debated the issue of adding more ultrasound measures (i.e. we have a unique opportunity that may not be repeated again) with the Ultrasound Advisory Group and other experts in the field, as we are concerned that the potential list of additional 2D/3D measures is long. As is the case in selecting which data to collect in any large study, one has to balance the enthusiasm of the researchers against pragmatism. We have chosen to concentrate on taking a few measures well (and more than once) in the time available during each scanning session rather than risk the possibility that additional measurements (including some not used in routine clinical practice) will be taken hurriedly. Of course, it will be possible to take further 2D measurements retrospectively for research purposes, if needed, from the acquired 3D volumes.

Activity 9: Preparation of 3D Files and Exploratory Analyses
There are two goals of the 3D component of this project. The first is to understand better the clinical advantages of making manual and automated fetal measurements on 3D v. 2D data. This should lead to a clearer understanding of IUGR and the relationship between different growth patterns and fetal/neonatal outcomes, as well as the impact of delayed growth in individual fetal organs and health care problems in adult life (e.g. an increased risk of osteoporosis). The second goal is to derive statistical models of the fetus or parts of the fetus to provide informative visualizations of nominal shapes and growth data at different gestational ages.
One possible approach will be to build a parts model of the fetus, and to deform it to different instances of data acquired at the same gestational age. This will not be a simple task and will require theoretical and technical advances in biomedical image analysis. The research will be led by A. Noble, Professor of Engineering Science, University of Oxford, and forms the basis of a separate grant application to a UK agency, the UK Engineering and Physical Sciences Research Council. The aims of Activity 9 are therefore simply to prepare the 3D images and commence these exploratory analyses. Examples of what is currently possible include studying the relationship between a) brain growth and neonatal/infant

Ethical Issues
The study will comply with the International Ethical Guidelines for Biomedical Research Involving Human Subjects. Ethical safeguards will include: Ethical approval at international, national and local levels: The study protocol will be submitted to the relevant UK ethics committee in keeping with the University of Oxford's responsibilities as the study sponsor, as well as the appropriate national bodies of the countries included in the study (if required) and all local ethics committees. Ultrasound and anthropometric research involves minimal, if any, risk to the mother and fetus and is part of routine care in these institutions.
Individual informed consent: Written informed consent will be obtained from all women enrolled in the study after the study objectives and procedures have been described in detail. The information will be provided in written form (translated into all the required languages) and orally by the researchers themselves based on a standard template, which can be referred to whenever needed. Women will only be recruited if it is clear they understand what the research entails and once they have had sufficient time (at least 48 hours) to consider whether to participate or not. All information collected in these studies will be made available to the clinicians responsible for the women's care; however, the 2D ultrasound measurements will only be made available after the data have been submitted on-line (see 1.2 above).
Discontinuation: Women who have agreed to participate in FGLS may decide to leave the study at any time without adversely affecting their care in any way.
Confidentiality: All phenotypic and imaging data will be available for clinical use at each centre. For research purposes, privacy rules will be maintained in line with the Health Insurance Portability and Accountability Act of 1996 (HIPAA). All individuals will be deidentified to ensure confidentiality. At the Coordinating Unit, all phenotypic and imaging data will therefore be stored, transmitted and analyzed anonymously.
Data sharing for clinical care: All the clinical data we plan to collect are being obtained anyway as part of routine clinical practice in these institutions; the data will therefore be available to the providers at all times. Our proposed policy with regard to the ultrasound measures is as follows: a) the gestational age estimate will be incorporated into the medical records; b) an ultrasound examination for structural congenital malformations will be performed at the time of the 3rd scan, if this conforms with local practice, and the results will be incorporated into the medical records; c) as far as 2D ultrasound measurements are concerned, after the blinded values have been submitted electronically to the dataset, the last measurement will be provided for clinical use, and d) no 3D data will be provided for clinical use.
Finally, there are concerns about the misuse of ultrasound for sex selection in places where it may be used (especially if the practice is illegal). We shall therefore be very vigilant in regions where this practice is known to occur and monitor sex ratios in the sample continuously.

E. Challenges
Implementing this project presents major challenges for the research team. However, our experience in conducting large-scale studies in pregnancy means that we are well prepared to face the challenges, some of which may be difficult to resolve:
Recruitment: We do not anticipate capacity problems in recruiting a large number of women (4,000) from 8 centres, but it remains unclear what proportion of women in each centre will actually be eligible for FGLS given the inclusion criteria. We anticipate an average eligibility of 75%, and expect the centres to devise local strategies to maximize that number.
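The recruitment figures above imply a minimum screening workload, which can be sketched as follows. This is only a back-of-envelope calculation under two assumptions that the text does not state: that the 4,000 women are the recruitment target (rather than the number screened), and that every eligible woman agrees to take part. The function name is hypothetical.

```python
import math


def women_to_screen(target_recruited, eligibility_rate):
    """Minimum number of women to screen to reach the recruitment target,
    assuming (optimistically) that all eligible women consent."""
    if not 0 < eligibility_rate <= 1:
        raise ValueError("eligibility_rate must be in (0, 1]")
    return math.ceil(target_recruited / eligibility_rate)


total_screened = women_to_screen(4000, 0.75)           # whole study
per_centre_screened = women_to_screen(4000 / 8, 0.75)  # equal split over 8 centres
print(total_screened, per_centre_screened)
```

Any refusal or loss before enrolment would raise these numbers, so they are best read as lower bounds.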
Follow-up: We do not anticipate problems retaining pregnant women during follow-up but we do recognize that such a large follow-up study requires a commitment from participants and staff. In addition, the planning and coordination of the ultrasound scans with only one ultrasound machine available per centre will require considerable organization in the hospitals that should not be underestimated. Follow-up of the newborns presents the following challenges: a) difficulties measuring very small infants, especially those in an incubator; b) morbidities associated with prematurity, which may modify the measures to be taken; c) mortality affecting the final sample size, which explains why we intend focusing on healthy preterms, and d) follow-up after discharge is a problem in any such study, but we expect that adherence to the protocol will be very good for the preterm babies given the parents' concerns for their health.
Data quality: Standardization of a large number of health professionals is always a major challenge for which we are prepared. Coordinating travel arrangements between centres to monitor data quality will need to be well organized.
Data analysis: Constructing growth curves is a sophisticated task for which several statistical methods are available; the experience gained in the WHO study is a major asset to overcome this challenge.
Sample size: Selecting an adequately powered sample size for studies like this one is always a compromise between statisticians' estimates, logistics and cost. We are concerned that we should not over-extend our study by attempting to answer too many questions. Ideally, we should have a sample size to create preterm standards with stable outer centiles and birth weight for gestational age standards with sufficient newborns with severe outcomes across the gestational age distribution. We would also like to explore ethnic-specific growth, in the event that multiple ethnic differences emerge from the data. However, based on the findings of the WHO Child Growth Study, we consider it unlikely that each ethnic group will have its own distinct growth pattern. Nevertheless, we shall explore this question by including women from each of the 5 ethnic groups: Whites, Latin Americans, African blacks, Orientals and Indians.
Motivation: The final challenge, but perhaps the most important, is to maintain motivation of the local investigators and data collection teams throughout the project, as they are the people who can guarantee the quality of the study.
Resources: Finally, we appreciate that advocating the introduction of new fetal growth standards into clinical practice in developing countries could be criticized on the grounds that it will require already under-resourced institutions to purchase relatively expensive ultrasound equipment. We accept the possibility that some institutions will purchase ultrasound equipment based on the results of this study, despite our recommendation that serial ultrasound measurements are not necessary for the evaluation of fetal growth during routine antenatal care (10). However, many district-level and referral hospitals in developing countries already have ultrasound machines and, when they are used to evaluate high-risk pregnancies and/or women with uncertain gestational age, the results may be misinterpreted or misused as the charts available are often inadequate. Therefore, much of this project is orientated towards more effective use of equipment that is already in such institutions.
Interestingly, the expensive and very sophisticated 3D imaging system being investigated in this study, for the first time on this scale, has potential to benefit high-risk pregnancies in rural areas and isolated communities. The rationale is that it will become feasible to transfer images from peripheral clinics to specialized centres for reconstruction and interpretation as the cost of the equipment falls and it becomes more portable. Such a strategy would reduce the transfer of large numbers of women unnecessarily as occurs at present and contribute to the decentralization of antenatal care services worldwide.

F. Regional Distribution of Study Sites
We have recruited study sites in the following regions of the world: Europe (UK); North America (USA); Central and South America (Brazil); Africa (Kenya); Arab World (Oman); Asia (China) and the Indian subcontinent (India).

G. Intervention Effectiveness for Impaired Fetal Growth
We ourselves have explored the question of what to do after IUGR is diagnosed and we have published a series of review papers critically evaluating RCT interventions for IUGR (47-50) and another group's recent update (51). There are not many interventions for treating (as opposed to preventing) IUGR, other than planned elective delivery, to which use of ultrasound contributes greatly in developing countries. This explains why referral to an adequate level of perinatal care is so important an option. Our study therefore focuses on helping clinicians to detect impaired fetal growth accurately, which should avoid incorrect diagnoses and, thereby, iatrogenic preterm births. The misuse of this technology is one of the factors responsible for unnecessary medical interventions and why some treatments or interventions are shown to be ineffective, i.e. treatment of fetuses that may not have required it in the first place. Furthermore, in the light of information contained in the Lancet 2008 Nutrition Series and other similar recent reports, we will consider the possibility of implementing nested case-control studies to obtain information related to maternal body composition, metabolism and micronutrient status, or growth factors and birth outcomes. We will explore them in detail as this is a unique opportunity for such studies.
We think that interventions can only be effective if they focus on the factor responsible for the growth restriction. IUGR is a very heterogeneous condition, as we have shown, and it is unlikely that a "silver bullet" will resolve all IUGR (or preterm deliveries), even if correctly detected. Overall, we believe that accurate early identification, avoiding false positive cases, and appropriate referral to adequate levels of obstetric and newborn care will have a great impact on neonatal mortality and resource allocation in developing countries.

A. Monitoring and Evaluation
Data collection will be monitored using routine procedures and protocols employed by our network in several similarly sized, multi-centre studies. The production and evaluation of specific standards will undergo a methodical and rigorous peer review process through the expert consultation mechanisms used in previous projects: the WHO Child Growth Study and the perinatal multi-centre studies conducted by our network. Monitoring the implementation process at the study sites will follow principles established in implementing previous, similarly complex, studies carried out by Consortium members. This will include site visits and on-going quality control measures (see Section IV).

B. Dissemination of Results
The generous participation of hundreds of scientists in all aspects of dissemination was the key to the success of the WHO Child Growth Study and we will therefore rely on their already established system. So, involving the Maternal Health Task Force at the Harvard School of Public Health in disseminating the results locally will be our principal strategy; this includes local and regional meetings and symposia. We will then focus on the local and global professional societies that are an integral part of the application, as well as health authorities in the participating countries and regional health institutions. Finally, a formal dissemination strategy similar to the one used by the WHO Child Growth Study will be implemented, including partnerships with other key players in the international field such as UNICEF, other UN agencies, NGOs and government institutions.

IX. Optimizing Public Health Outcomes and Intellectual Property Plans to Achieve Global Access
It is expected that these international standards will be used to assess the growth of fetuses and newborns throughout the world. The basic assumption behind the proposed standards is that fetal growth is optimal in healthy populations. Therefore, the proposed curves will constitute "optimal" standards. Widespread uptake of these fetal curves will be ensured by ultrasound equipment manufacturers incorporating them into their systems. The potential uses are:
1) Population assessment to a) provide a reference for comparison of the means (or medians), standard deviations, and trajectories of the population means (or medians) for any given sample, b) estimate the proportions of fetuses-newborns below a given cut-off in a sample, and c) standardize for gestational age and sex differences between samples, thereby allowing comparisons of prevalence rates below a given cut-off across samples.
2) Individual assessment to a) screen for fetal growth on a single occasion to identify fetuses with excess or deficient growth leading to appropriate diagnostic and/or therapeutic interventions (including delivery), and b) provide a clinical tool for assessing the efficacy of treatments or helping to choose time of delivery, particularly in severely growth restricted fetuses. Hence, the primary use of the curves at an individual level will be to identify unacceptable deviations as early as possible to prevent severe deficits and excesses, although they will need to be introduced into clinical practice with appropriate care and auditing. We expect to link with, and build upon, WHO's global efforts to disseminate the child growth charts. The large network of proposed partners and consortium members should prove, with the help of industry, to be a very effective method for disseminating the results at both clinical and public health levels.

Yes No
1. Is the proposed research likely to lead to any patentable or commercially exploitable results? X The study will provide each centre with a sophisticated (commercially available) 3D ultrasound machine, specially adapted for the needs of the study, as well as technical expertise for data transfer and storage of 2D and 3D data. The proposed research is unlikely to lead to patentable results. Ultrasound machine manufacturers throughout the world will be encouraged as part of the Dissemination and Implementation Plan to incorporate the new standards into their machines from which they may derive commercial benefit.

B. Commitment to Sharing Data and Materials
It is highly unlikely that patentable IP will be generated by this project. The growth standard and related analyses represent the new evidence-based knowledge that will be generated. These will be widely disseminated, especially in open access publications, and shared with research and clinical communities worldwide, as well as the appropriate national and international agencies.
The Fetal Growth Standards charts will be made available to all manufacturers of ultrasound machines. Data from each centre will be provided to the local investigators with the understanding that the primary and secondary objectives of this study can only be explored with the pooled data set and cannot be replicated using local data. Other locally relevant issues can be analyzed by individual centres.
The study Steering Committee (SC) and Executive Committee (EC) will also, after the main tasks are completed, engage in negotiations with other bona fide researchers for access to the data set to allow scientific and public health relevant questions to be explored. The SC and the EC have developed a set of rules and procedures for the evaluation of such requests in keeping with the commitment of the present investigators to ensure the widest possible utilization of the data (see Appendix F).

C. How will we bridge the new data to the existing data?
We aim to produce the following new standards:
1. Fetal growth by ultrasound: Based on discussions with ultrasound manufacturers and the known limitations of existing charts, we anticipate that the newly developed fetal growth charts will replace all charts that are currently incorporated into ultrasound machines.
2. Postnatal growth for preterm babies: Based on discussions with a number of pediatricians in different countries and WHO information, we anticipate that the postnatal growth charts for preterm babies will be incorporated into routine clinical practice without much difficulty because there is a lack of well-developed charts for this specific purpose. We believe that the international community is looking forward to such charts being developed.
3. Birth weight for gestational age standards: It is likely that the new, international, risk-related, birth weight for gestational age standards will be widely adopted following implementation of the new fetal growth and preterm postnatal growth standards. However, we acknowledge that this process may take longer because other standards are already being used (even though their limitations are recognized). We shall therefore work closely with the Maternal Health Task Force at the Harvard School of Public Health to follow the strategy they are using for the infant growth standards.
We do not anticipate any conceptual limitations to the future use of the standards, particularly in the light of the statement in the recently published Lancet "Maternal and Child Undernutrition" series that international fetal and newborn growth standards need to be developed (5). Nevertheless, it is clear that whenever efforts are being made to change clinical practice (even if the need to do so is recognized), common barriers and resistance to change will be encountered. We feel that our experience in introducing the MGRS standards will be invaluable for this task although, in theory at least, introducing the ultrasound standards should present fewer problems as they will be built into the equipment and we hope they will be endorsed by the International Society of Ultrasound in Obstetrics & Gynecology and other professional associations responsible for recommending new technology into clinical practice.
Rolling out the fundal-height charts and the newborn and preterm growth standards will use the same procedures as WHO did when it rolled out its Child Growth Standards. We shall coordinate these processes from the start of INTERGROWTH-21st by liaising closely with both the Maternal Health Task Force at the Harvard School of Public Health and the leading manufacturers of ultrasound machines. The network already developed by the team will serve as a means of rolling out the standards we propose developing in INTERGROWTH-21st.

A. Organizational Capacity and Facilities
History: The University of Oxford enjoys an international reputation as a world-class centre of excellence in research and teaching. It employs over 7,800 academic, research and support staff across a wide range of academic disciplines. The Medical Sciences Division, within which the Nuffield Department of Obstetrics & Gynaecology (NDOG) is located, is one of the major centres for clinical and basic biomedical research in Europe, with more than 2,200 staff and 800 postgraduate students. It achieved top scores in both the 1996 and 2001 UK HEFCE research assessment exercises, and fosters the highest possible standards in research, teaching and patient care. The annual grant income from external sources for the Medical Sciences Division is approximately $200M.
Experience: The Consortium members (University of Oxford and collaborating centres) have extensive experience in conducting multi-centre studies and constitute perhaps the largest research network in the world in the field of women's and perinatal health. NDOG has coordinated large-scale genetic epidemiology studies across multiple international sites, involving data collection from thousands of individuals in the ENDOGENE Study. Office space within its existing facilities at the John Radcliffe Hospital, Oxford, will be provided to house the Coordinating Unit (CU) staff, as well as the necessary IT support and administrative assistance. Sites have been chosen on the understanding that they have the necessary infrastructure to conduct the study (e.g. IT and capacity for ultrasound research). Finally, the professional societies recruited to help coordinate implementation and dissemination of the results are among the leading groups in the world in the field of obstetrics.

B. Management and Staffing Plan
Management: The study will be coordinated and managed by NDOG (University of Oxford), where the CU will be located (see Appendix F). Four units responsible for day-to-day monitoring of quality control and data collection (2D Ultrasound Data Quality Unit; Anthropometric Data Quality Control Unit; Data Management Unit and 3D Ultrasound Data Quality Unit) will report directly to the CU. All data, except the 3D data, will be centralized at the Data Management Unit.
The Steering Committee (SC) consists of representatives of Consortium members and the lead investigator from each of the study centres. It also has external members to provide support in areas related to the study. The SC will meet twice a year to review progress, ensure uniformity of data collection from the study sites, and discuss any substantive issues that arise. Any adaptations to the protocol or technical variations required at individual sites because of local needs will be reviewed and approved by the SC.
The Executive Committee (EC) will monitor the progress of the study on a regular basis and resolve substantive issues arising from implementation of the study. The Study Coordinator will be an ex-officio member of the EC. The EC will make the final selection of study sites, exclude a site if data collection standards are not sufficiently high, and make final decisions about sample selection and analytical issues related to construction of the standards.
The AC, consisting of internationally recognized experts in ultrasound, obstetrics, statistics, nutrition, fetal development, anthropometrics, epidemiology and biology, will provide technical advice to the CU, EC and SC. The study will be indemnified by the University of Oxford, which will carry liability for any adverse outcomes arising from breaches of research protocol.

Study coordination:
The Principal Investigator (José Villar) will be responsible for all scientific and technical components of the project. The Project Director (Stephen Kennedy) will be the focal person in NDOG responsible for overall study management. The Project Leader (Leila Cheikh Ismail) will be responsible for day-to-day implementation of the study, coordinating and monitoring of all activities as well as training, monitoring and evaluation. A Statistical Director (Doug Altman) will advise on sampling strategies, sample sizes and all statistical methods, in particular those for the construction of the new standards. He will be assisted in Years 1-5 by a Data Management Supervisor (Stephen Ash), who will coordinate all aspects of data management between the Data Management Unit (DMU), the CU in Oxford and the other data quality units. This will include the preparation of data sets, quality control strategies and data files for use by the team of statisticians. This strategy for data management and analysis has been extremely efficient in previous, large multi-centre studies conducted by us as it allows for direct contact between the data collection centres and the coordinating unit. In Years 3-5, Dr Altman will also be assisted by a Senior Statistician to develop, test and implement computer programs to construct the new growth standards.
There will be a Senior Technical Coordinator (Aris Papageorghiou, St George's, London) of the ultrasound component of the study, who will have responsibility for overall coordination of the ultrasound measurements, including quality control and standardization. He will be assisted by an External Ultrasound Expert (Laurent Salomon), acting as a consultant, and a Clinical Research Fellow (Caroline Knight) to oversee the quality of a random sample of ultrasound measurements during the data collection period. The CU team will be completed by a Secretary (Melissa Shorten) responsible for secretarial support to the CU, a Project Administrator (Ann Lambert) in charge of all aspects of transferring funds to the study centres, equipment, meetings, travel and coordination between the CU, DMU and study centres, and a Research Associate (Isabelle Wilson) responsible for piloting the protocol and monitoring its implementation.

XII. Appendices

Appendix A: Data Collection Forms
Three different types of data recording forms will be used:

NCSS: 1) Pregnancy and Delivery (including newborn anthropometrics) -same as FGLS
It should be emphasized that all forms are as short as possible to improve compliance. Therefore, all questions have been carefully evaluated to ensure that they serve at least one of the following purposes: a) eligibility (e.g. socio-economic status); b) sample description (demographic and environmental variables, etc.); c) exclusion criteria (major illnesses, etc.); d) standardization of results across centres; e) future use of references (vitamin/mineral supplements), or f) to assess possible selection biases.

Background
A central plank of INTERGROWTH-21st is the creation of fetal growth charts during the FGLS part of the study. The correct use of ultrasound is essential to ensure that accurate, reproducible and applicable results are obtained. This document sets out all the ultrasound requirements for the study, including measurements, training, equipment and quality control.

Initial ultrasound examination and dating:
All mothers will have a trans-abdominal ultrasound scan between 9 +0 and 14 +0 weeks. This will be performed to rule out ectopic pregnancy, missed abortion and multiple gestations and to estimate gestational age (GA).
For the purposes of the study, gestational age will be calculated by LMP and confirmed by fetal crown-rump length (CRL). In order to allow accurate assessment of gestation, the inclusion criteria are:
• Certain LMP;
• Regular 26-30 day menstrual cycles in the last 3 months;
• No hormonal contraception use, pregnancy or breastfeeding in the 3 months preceding the LMP;
• Spontaneous conception, as embryonic growth may be different in pregnancies from assisted reproductive techniques 2,3 .
The CRL will be plotted on well-established charts 4,5 . If the difference in gestational age estimation by CRL and LMP is ≤ 7 days we will consider the LMP to be reliable and take it as the true biological date. Women in whom the difference in gestational age between LMP and CRL is greater than 7 days will not be eligible.
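The dating rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the CRL-based gestational age is assumed to have already been read from the dating charts cited in the protocol.

```python
def ga_days(weeks: int, days: int = 0) -> int:
    """Convert a gestational age given as weeks + days into days."""
    return weeks * 7 + days


def lmp_confirmed_by_crl(ga_lmp_days: int, ga_crl_days: int,
                         max_discrepancy_days: int = 7) -> bool:
    """Return True when the LMP and CRL estimates differ by 7 days or less.

    ga_crl_days is assumed to come from the CRL dating chart; the chart
    lookup itself is not modelled here.
    """
    return abs(ga_lmp_days - ga_crl_days) <= max_discrepancy_days
```

For example, a woman at 11 +2 weeks by LMP and 12 +2 weeks by CRL differs by exactly 7 days, so her LMP would be taken as reliable; at 12 +3 weeks by CRL she would not be eligible.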

Subsequent ultrasound examinations
Following the initial scan, women will have up to six scheduled scans at ~5 weekly (±1 week) intervals until delivery. When rescanning does not occur within the allotted time, the women will be asked to attend at their next scheduled study appointment.

Essential documentation
At all examinations after dating the following will be documented:

Optional measurements
This study presents a unique opportunity to obtain other measurements of fetal biometry.
However, the introduction of extra tasks could reduce the overall quality of the seven main measurements and inconvenience the participants. Therefore, the feasibility of taking additional measurements will be evaluated in consultation with each centre before a final decision is made and only some centres are expected to participate in these additional studies. In descending order of importance: 1. Humerus 2. Radius / Ulna 3. Tibia / Fibula

Essential 3D measurements
The objective of the 3D component is to provide volumetric data to assess the quality of the 2D measurements (BPD, OFD, HC, AC, FL). Hence, the 3D volumes that we obtain will be of the: (1) head (capture 3D volume with 2D view at the level of the BPD) (2) abdomen (volume at the level of the AC) (3) femur (volume in the same view as measuring the FL)

Optional 3D measurements
In order to establish a unique "biological databank" for future research additional 3D volumes of other organs, e.g. the heart at the level of the 4-chamber view of the heart; placental volumes, can be taken specifically for these purposes. These will be considered ancillary studies and will be dependent upon local capacity and the approval of the Steering Committee.
We plan to take the final decision regarding the amount of 3D data to be collected and the number of participating sites for these ancillary studies after conducting a pilot study assessing the human resources available at the centres, the need for additional training, time that 3D scanning adds to each visit, equipment considerations and the feasibility of transferring the very large datasets to the Coordinating Unit in Oxford.
Issues that we will explore in the pilot study include:
a) difficulties obtaining liver, brain and fetal cerebellar volumes;
b) the need to obtain at least two good volume measures from the areas of interest, which may require 20-30 mins additional scanning;
c) the temptation to divert attention away from standard measurements to more "exciting new technology" which will be popular with women. For these purposes the surface rendering function will be disabled;
d) how to save and transmit the volumetric data reliably given the risks of a hard drive crashing or being corrupted;
e) with advancing gestation (especially in the third trimester), it is likely that we will not be able to record many of the 5 volumes listed above accurately (i.e. brain and abdomen).

Methodology of ultrasound and definitions
The first visit (dating scan) will be between 9 +0 and 14 +0 completed weeks. The aim of this scan is to confirm fetal viability, and to exclude multiple pregnancy or major fetal abnormality. Fetal CRL is measured at this visit. In women who meet the inclusion criteria and where the discrepancy between gestational age by LMP and CRL is ≤7 days, subsequent visits (for fetal biometry) will be scheduled at approximately 5 weekly (± 1 week) intervals (i.e. 14-18, 19-23, 24-28, 29-33, 34-38 and 39-42 weeks).
A full morphological evaluation will be conducted at 19-23 weeks following standard practice at each centre. Fetuses diagnosed with any minor abnormalities will be managed as per local clinical guidelines. If the clinical decision is to continue with the pregnancy the mother will remain in the study. Study follow-up for these cases will be identical, but eventual exclusion at the time of data analysis will be carried out.
Fetuses with major abnormalities that may affect morphometric measurements will be excluded from further study. All infants will receive anthropometrical assessment after delivery as per local protocols.
6. In each view a single 3D volume is stored for quality control purposes.
7. Ultrasound equipment provides measurements to tenth of a millimetre.
8. Data are submitted electronically to the web-based data set.
9. Depending on the availability of appropriate technology, associated images are also submitted electronically to the Coordinating Unit. If real time submission of images is not available these will be downloaded onto a memory card and submitted via email.
10. Ultrasound images of CRL, BPD, OFD, HC, AC and FL must fill at least 30% of the monitor screen.

Crown -Rump Length (CRL) -only at the dating scan (9 +0 to 14 +0 weeks)
Obtaining the image • A mid-sagittal section of the fetus should be obtained • The fetus should be horizontal (at 90° to the angle of insonation) • The fetus should be in a neutral position (not hyperextended or flexed)

Magnification
• The image must fill at least 30% of the monitor screen.

Caliper placement
• The intersection of the calipers should be placed on the outer borders of the head and rump.

Biparietal Diameter
Obtaining the image • A cross-sectional view of the fetal head at the level of the thalami • As close as possible to the horizontal with the angle of insonation as close as possible to 90° • Oval shape • Symmetrical • Centrally positioned, continuous midline echo (falx cerebri) broken anteriorly at one third of its length by the cavum septum pellucidum • The thalami should be located symmetrically on each side of the midline falx.

Magnification
• The cross section of the fetal head must fill at least 30% of the monitor.

Caliper placement
• The intersection of the calipers should be placed on the outer border of the parietal bones ('outer to outer') at the widest part of the skull.

Occipito-frontal diameter (OFD) and Head Circumference (HC)
Obtaining the image and magnification: obtained from the same still image as the BPD.

Caliper placement
The intersection of the calipers should be placed on the outer border of the occipital and frontal edges of the skull at the point of the midline ('outer to outer') across the longest part of the skull. The HC will be calculated from the BPD and OFD measurements using the ellipse facility (using the formula HC = π(BPD + OFD)/2).

Abdominal Circumference
Obtaining the image • Transverse section of the fetal abdomen as close as possible to circular • Umbilical vein in its anterior third • Stomach bubble visible • Kidneys and bladder not visible

Magnification
• The cross section of the fetal abdomen must fill at least 30% of the monitor screen.

Caliper placement
• The anterior-posterior abdominal diameter (APAD) and transverse abdominal diameters (TAD) are measured. • To measure the APAD the intersection of the calipers is placed on the outer borders of the body outline from the posterior aspect (skin covering the spine) to the anterior abdominal wall. • To measure the TAD the intersection of the calipers is placed on the outer borders of the body outline at 90° to the APAD, across the abdomen at the widest point. • The AC will be calculated from the APAD and TAD measurements using the ellipse facility (based on the formula AC = π(APAD + TAD)/2).
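As a worked illustration of the ellipse formulas quoted above, HC = π(BPD + OFD)/2 and AC = π(APAD + TAD)/2, the short sketch below computes a circumference from two perpendicular diameters. The function name is hypothetical, and the ultrasound machine's own ellipse facility may use a different internal implementation.

```python
import math


def ellipse_circumference(diameter_1_mm: float, diameter_2_mm: float) -> float:
    """Approximate circumference from two perpendicular outer-to-outer diameters.

    Implements circumference = pi * (d1 + d2) / 2, the formula quoted in the
    protocol for HC (from BPD and OFD) and AC (from APAD and TAD).
    """
    if diameter_1_mm <= 0 or diameter_2_mm <= 0:
        raise ValueError("diameters must be positive")
    return math.pi * (diameter_1_mm + diameter_2_mm) / 2


# Illustrative values only, not study data.
hc_mm = ellipse_circumference(90.0, 110.0)   # BPD = 90 mm, OFD = 110 mm
ac_mm = ellipse_circumference(100.0, 105.0)  # APAD = 100 mm, TAD = 105 mm
```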

Femur Length Technique
Obtaining the image • To be imaged as close as possible to the horizontal plane.
• Angle of insonation of the ultrasound beam is 90°. • The full length of the bone is visualised. • Not obscured by shadowing from adjacent bony parts.

Magnification
• The femur must fill at least 30% of the monitor.

Caliper placement
• The intersection of the calipers is placed on the outer borders of the edges of the femoral bone ('outer to outer'). • The trochanter is not to be measured.

Techniques for optional measurements:
The humerus, radius, ulna, tibia and fibula can all be measured from 14 +0 weeks onwards. Similar to the guidelines of femur length measurement, the longest length of each bone is measured with the bone at as close as possible to 90° to the ultrasound beam. The humerus is measured from upper to lower lateral margins. In the forearm the ulna is distinguished from the radius by its longer proximal length (both radius and ulna end at the same level distally). In the lower leg the lateral bone is the fibula with the tibia lying medially to it. All long bones are to be measured along their longest lengths from upper to lower lateral margins. The bone measured should fill at least 30% of the width of the screen. Only one side (the anterior or most easily accessible limb) should be measured.

Inability to take measurements
The fetal position may on occasion be so persistently unfavourable as not to allow measurements according to the guidelines above.
CRL: Accurate measurement of CRL is of obvious importance in ensuring that a woman is eligible to take part in the study. If despite repeated attempts it is not possible to obtain a CRL measurement that allows confirmation of gestational age the woman is not eligible for the study (in the same way she will not be eligible for the study if the gestation calculated from CRL and LMP is discrepant by more than 7 days).
Essential measurements: With the exception of CRL, every effort should be made to obtain the best possible measurements taking into account the definitions above. This may require allowing the woman to go and return for the scan at a later date/time (within one week). If it remains impossible to obtain a good quality image that allows accurate measurement, the measurement should not be taken. The next appointment should be kept as scheduled.
Optional measurements: The same principle applies to the optional measurements: these should not be taken if it is impossible to obtain a good quality image.

Equipment (based on RCR recommendations) 13 :
In all applications of ultrasound, three things are of prime importance: image clarity, resolution and the ability to differentiate tissue structures. Hence, although it is accepted that the best possible images may not be obtained from all patients at all times, the equipment chosen must be capable of visualizing tissue structures in the overwhelming majority of patients with different body sizes and shapes to a level sufficient to meet the exacting needs of the study.
The specifications below are the minimum requirements for the study, but it is important to recognize that image clarity and resolution may differ between manufacturers despite machines having similar specifications. It is therefore essential that candidate machines are tested before a final choice is made to assess their quality and to determine operator preferences. It is envisaged that this will be done at the Study Coordinating Unit (Oxford) or the Ultrasound Coordinating Unit (St George's, London). We realize that there is a potential trade-off between quality and cost and, therefore, both these factors need to be considered in a balanced way in making a decision.

Study specific considerations
1. All ultrasound equipment to be used will be standardized:
• Uniform probes
• Uniform factory presets ("study preset")
2. Commercially available high quality real-time ultrasound scanner.
3. Less than 2 years old.
4. Transabdominal probes suitable for scanning throughout pregnancy.
5. Facility for on-line transfer of measurements and associated images.
6. Facility to "blind" measurements from examiner until after data transfer.
7. Facility to "unblind" the final measurement to allow clinical use.
8. Equipment will be serviced periodically by specialized technicians as per the manufacturer's recommendations.

Scanheads
• The scanner should have at least 2 active transducer ports from which one can be selected from the control panel.
• Broadband 2D probe, which must be suitable for examinations at 9-14 weeks, 21-23 weeks and third trimester growth scans, with a penetration suitable for a wide variety of patients, e.g. broadband 3-6 MHz and broadband 5-9 MHz probes or a single broadband probe of 3-9 MHz.

Accessories
• Thermal printer capable of printing onto continuous standard thermal paper • DVD player/recorder

General Configuration
• The unit should be compact and provide reasonable portability.

Safety
• The unit should meet or exceed performance and safety requirements of UL 544, CSA C22.2 and IEC 60601-1 and be CE marked.

Maintenance
• In order to ensure the equipment is maintained in proper working order, the basic maintenance requirements are listed below. Suppliers are required to specify the services they will provide and their costs:

Quality Control
Quality assurance and control of the Ultrasound component of INTERGROWTH-21st will be the remit of an Ultrasound Quality Unit (USQU) (see section 6 below).
In order to ensure ongoing quality control we will employ rigorous tools for training, assessment and certification under the supervision of qualified instructors. Intra-observer and inter-observer measurement errors will be assessed during the training course before initiating the study.
Ongoing quality control and adherence to protocols will be performed. This will include quality assessment of images, random evaluation and repetition of ultrasound measurements, and assessment of collected data. It is envisaged that a limited number of experienced obstetric ultrasonographers will form the research teams at each study site. They will be provided with standardized equipment and the ultrasound protocol describing all measurement techniques, protocols and procedures for training and clinical use.

• 3D Broadband 3-6 MHz or 3D broadband 5-9 MHz probes
• Spatial Temporal Image Correlation (STIC) option: capture full fetal heart cycle in real time using volumetric transducer
• Surface rendering disabled
• Volume Contrast Imaging
• Specific software for post-processing stored 3D volumes.
• Any non-compliance with the specification must be clearly identified together with any alternative or additional features.
• A decision may be taken to lease the selected device. If so, a tendering exercise will be undertaken to choose an appropriate leasing company.
• The availability of any training including courses offered / recommended by the manufacturer / supplier should be stated, together with any associated costs that may be incurred. Preference will be given to manufacturers / suppliers offering free fault finding, preventative maintenance, and first line maintenance training courses to two members of each participating centre's maintenance team.
• The cost of any contracts available for the regular service and/or maintenance of the devices should be separately stated.
• Prices should include any discounts or special offers available, including those of any ex-demonstration units that may be available, but these should be separately detailed.
• The terms of guarantee must be stated.

Initial training, assessment and standardization of ultrasonographers
It is recognized that the local ultrasonographers will already have a high standard of training. Therefore, the goals of initial training are:
• To ensure all ultrasonographers are familiar with equipment to be used in the study, including the ultrasound machine, the SOPs, the automated recording of ultrasound measurement values, software and entry of data.
• To ensure standardization by training, assessment and certification.
The schedule for this is as follows:
1. Individual and group theoretical training (Coordinating Unit)
2. Hands-on ultrasound training (Coordinating Unit)
3. Submission of a log of 10 measurements of each parameter. These measurements will be repeated three times from three different images (to assess intra-observer variability), and repeated offline by the external sonographer (to assess inter-observer variability).
4. Measurements should, on average, score >75% of the maximum score (i.e. 4 and 3 respectively; see 5.2.1 below).
5. No more than one out of the ten repeated measurements by the same operator should vary by more than two standard deviations (SD) of the measurement error of each given parameter at the given GA.
6. No more than two out of the ten repeated measurements by a different operator should vary by more than two SD (proxy for random error) and there should not be a systematic bias of more than 0.5 SD.
7. If these criteria are not fulfilled, certification will not be given.
8. If the criteria are fulfilled, certification of competence will be given prior to the start of the study.
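The two quantitative certification criteria (no more than one of ten intra-observer repeats, and no more than two of ten inter-observer repeats, beyond two SD, with systematic bias not exceeding 0.5 SD) could be checked as sketched below. This is a minimal illustration under stated assumptions: each difference is assumed to have already been divided by the SD of the measurement error for that parameter and GA, and the function names are hypothetical.

```python
from statistics import mean


def count_beyond_two_sd(diffs_in_sd_units):
    """Count repeat differences larger than two SD in absolute value."""
    return sum(abs(d) > 2 for d in diffs_in_sd_units)


def passes_intra_observer(diffs_in_sd_units, max_outliers=1):
    """Criterion 5: at most one of the repeats by the same operator beyond 2 SD."""
    return count_beyond_two_sd(diffs_in_sd_units) <= max_outliers


def passes_inter_observer(diffs_in_sd_units, max_outliers=2, max_bias_sd=0.5):
    """Criterion 6: at most two repeats beyond 2 SD and mean bias within 0.5 SD."""
    few_outliers = count_beyond_two_sd(diffs_in_sd_units) <= max_outliers
    small_bias = abs(mean(diffs_in_sd_units)) <= max_bias_sd
    return few_outliers and small_bias
```

Note that a set of differences can pass the outlier count yet still fail on bias: ten differences of +0.6 SD contain no outliers but show a systematic bias of 0.6 SD.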

Pilot test-retest study
Similar to the anthropometry component of the study, quality control of ultrasound measurements will be primarily based on the comparison of repeat measurements by the same or different observers. Routine data collection will be guided by a system of maximum allowable differences between replicates. However, in contrast to anthropometry, for ultrasound measurements not much is known about what the maximum allowable differences should be for the different fetal measures at different stages of pregnancy. It is not known how exactly fetal size and GA influence measurement error in ultrasound. In addition, it is not clear how much measurement error in ultrasound is influenced by expertise and how expert-level and non-expert-level measurement errors compare. A pilot test-retest study is therefore proposed to determine the Technical Errors of Measurement at both levels of expertise and trends to bias (in comparison with expert), specifically for:
- CRL between 9 +0 and 14 +0 weeks
- BPD, HC, FL, AC in the different GA intervals that will be used in the study: 14-18, 19-23, 24-28, 29-33, 34-38 and 39-42 weeks
This pilot study will be carried out in the unit at Oxford. It will involve one or several other obstetricians who regularly do ultrasound measurements in the same hospital. All observers should be familiar with the measurement protocol of the study and should have received at least one training session by the LU prior to the start of the pilot study. The same equipment must be used as the one that will be used in the main study. For each of the seven GA intervals (from 9 +0 -14 +0 to 39-42 weeks) a number of 10 or 15 women will be enrolled over a 2-month period (November -December 2008).
It is proposed that each woman undergoes 4 independent assessments in a random order, two assessments by the LU and two by another ultrasonographer in the hospital. Each assessment will consist of an ultrasound session during which all the measurements needed for the GA interval will be performed in a random sequence. For each measurement a number of metadata (factors potentially modifying measurement error) could be collected so that the pilot study of itself becomes more publishable. All observers should ideally be blinded to the measurement values they obtain by automatic transfer of values to the online system.
This pilot study will yield GA-specific Technical Errors of Measurement and other statistics of inter-and intra-observer reliability and bias. These, in turn, will be used to define the maximum allowable differences for each fetal measure at each age that will be used to: (1) evaluate success of initial training (2) identify the need for re-measurement in the routine data collection, and (3) interpret data quality statistics from the standardization exercises and the random re-measurements.
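The protocol does not state how the Technical Error of Measurement (TEM) will be computed. A commonly used definition for two replicate measurements per subject is TEM = sqrt(Σd² / 2n), where d is the difference between the two replicates and n is the number of subjects. The sketch below assumes that definition; the function name and the example values are illustrative only.

```python
import math


def technical_error_of_measurement(replicate_pairs):
    """TEM for paired replicates: sqrt(sum of squared differences / (2 * n)).

    replicate_pairs is a sequence of (first, second) measurements of the
    same subject, in the same units (e.g. mm).
    """
    pairs = list(replicate_pairs)
    if not pairs:
        raise ValueError("at least one replicate pair is required")
    sum_sq = sum((first - second) ** 2 for first, second in pairs)
    return math.sqrt(sum_sq / (2 * len(pairs)))


# Illustrative femur length replicates in mm, not study data.
example_tem = technical_error_of_measurement([(50.0, 52.0), (60.0, 58.0)])
```

Computed separately for each fetal measure and GA interval, and separately for intra-observer and inter-observer pairs, values of this kind could feed the maximum allowable differences described above.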

Assessing local factors: the initial site visit
Prior to the start of the study, an initial site visit will be performed by a member of the USQU. The aims are to ensure:
• Preparation of the local data quality control activities, which will involve:
  o Identification of a local supervisor of ultrasound data quality control activities. This will often be the ultrasonographer with the best data quality statistics and closest in performance to the LU during the initial LU visit. The person will have good organizational and team working skills and be competent in using the required software. The local supervisor should be instructed and trained to do the following tasks:
    - Conduct refresher sessions every two months, and document observer performances
    - Conduct data quality control re-measurements on a random sample of routine measurements
    - Observe as many routine measurements as possible performed by all site ultrasonographers
    - Extract and analyze, in collaboration with a LU, data from routine standardization sessions and from random re-measurements done at the site
    - Liaise with LU about problems with ultrasonographers, SOPs and data quality statistics
  o Identification of a local data management coordinator who will liaise with the overall study coordinator.

Qualitative quality control
All measurements taken by ultrasonographers will be qualitatively controlled and scored by the Research Fellow.

Scores will be given according to the described scheme by Salomon et al 11 . Each specific criterion scores one point; thus, the maximum score is 6 points for BPD/HC; 6 points for AC; and 4 points for FL. Any ultrasonographer with more than 10% of images rejected in a given period will have his/her certification withdrawn and will undergo re-training.
Objective scoring system for still images (modified from Salomon et al 11 ).
In case of clearly incorrect caliper placement, images will be rejected and the measurement excluded. If the image scores half or less of the maximum score (i.e. 3 points or 2 points, respectively), measurements will also be excluded.
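The exclusion and re-training rules above lend themselves to a small sketch. It assumes the image score has already been assigned by the reviewer; the individual scoring criteria are not reproduced here, and the function and dictionary names are hypothetical.

```python
# Maximum qualitative scores stated in the protocol.
MAX_SCORE = {"BPD/HC": 6, "AC": 6, "FL": 4}


def measurement_excluded(view, score, calipers_clearly_incorrect=False):
    """Exclude when calipers are clearly wrong or score is half or less of the maximum."""
    max_score = MAX_SCORE[view]
    if not 0 <= score <= max_score:
        raise ValueError("score out of range for this view")
    return calipers_clearly_incorrect or score * 2 <= max_score


def certification_withdrawn(n_rejected, n_total):
    """Withdraw certification when more than 10% of images in a period are rejected."""
    # Integer comparison avoids floating point rounding at exactly 10%.
    return n_total > 0 and n_rejected * 10 > n_total
```

Under these rules an FL image scoring 2 of 4 is excluded while one scoring 3 is kept, and 10 rejections out of 100 images do not trigger withdrawal whereas 11 do.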

Quantitative quality control: Intra-observer reliability
Intra-observer reliability will be prospectively assessed based on the three concealed measurements taken routinely. No more than one out of the ten repeated measurements by the same operator should vary by more than two SD of the given parameter at the given GA.
If this is not fulfilled, certification is cancelled and the sonographer identified as requiring retraining. Feedback and discussion between USQU and site supervisors will follow in order to highlight the need for re-training and to carry out re-training.

Quantitative quality control: Random re-measurement on 2D images
In order to assess correct caliper placement, a random sample of 10% of all measurements will be re-measured on still images by the Clinical Research Fellow / LU. This will allow production and evaluation of inter-observer reliability and bias (against the LU) for each observer based on the random re-measurements. No more than one out of the ten repeated measurements by a different operator should vary by more than two SD and there should not be a systematic bias of more than 0.5 SD.
If this is not fulfilled, certification is cancelled and the sonographer identified as requiring retraining. Feedback and discussion between USQU and site supervisors will follow in order to highlight the need for re-training and to carry out re-training.
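Drawing the 10% random sample for re-measurement could look like the sketch below. It is only an illustration: the function name, the identifier list and the seed parameter are assumptions, and the study's data system may implement the selection differently.

```python
import random


def select_for_remeasurement(measurement_ids, fraction=0.10, seed=None):
    """Pick a simple random sample of measurements for blinded re-measurement.

    A seed may be supplied so that the draw can be reproduced during audit.
    """
    ids = list(measurement_ids)
    if not ids:
        return []
    rng = random.Random(seed)
    sample_size = max(1, round(len(ids) * fraction))
    return sorted(rng.sample(ids, sample_size))


# 200 hypothetical measurement identifiers give a sample of 20.
selected = select_for_remeasurement(range(200), fraction=0.10, seed=1)
```

Because the draw happens after the measurements are stored, ultrasonographers cannot know in advance which of their measurements will be checked.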

Quantitative quality control: Random use of 3D images to re-measure planes
In order to assess correct image acquisition, a random sample of 5% of all 3D volumes will be used to acquire the ideal measurement plane by the Clinical Research Fellow / LU. This will allow evaluation of inter-observer reliability and bias (against the LU) for each observer based on the random re-measurements. No more than two out of the ten repeated measurements by a different operator should vary by more than two SD (proxy for random error) and there should not be a systematic bias of more than 0.5 SD.

If this is not fulfilled, certification is cancelled and the sonographer identified as requiring retraining. Feedback and discussion between USQU and site supervisors will follow in order to highlight the need for re-training and to carry out re-training.

Site standardization exercises.
Local test-retest exercises involving all site observers are an opportunity for refreshing the understanding of the protocol, better focus on the protocol, re-training and better comparison of observers as each observer measures the same subjects, which is not the case for routine measurements and for the random quality control re-measurements.
• Organize periodic production and evaluation of data quality statistics for each observer based on the measurements taken during the standardization sessions.
• Feedback and discussion between USQU and site supervisors about data quality statistics from the standardization sessions; identification of need for re-training; organize and carry out re-training.
• These exercises will be done on 10 patients every 6 months, or more frequently depending on USQU recommendations.

Optimizing the data systems to accommodate the needs for quality control
The data entry system should allow incorporating measurement values and data from quality control and standardization exercises and be able to identify these as such.
• The data system could select a random sample of non-supervisor routine measurements for QC re-measurement. Ultrasonographers will be unaware which measurements will be re-measured when they do their measurements.
• Data extraction routine for monitoring of data quality statistics of entire study, sites, and individual ultrasonographers.
• Establishment of a research dataset on determinants of ultrasound data quality, to be extracted from the main database.
  o Possible factors causing variability in ultrasound measurements with a given type of instrument may include, among others:

Central reporting
The USQU will produce 3-monthly data quality statistics based on standardization sessions, QC re-measurements and routine replicates. This may include, as appropriate, the production of statistics and trend plots of:

Research questions using the research dataset on ultrasound data quality
• Determinants of observer bias in ultrasonography
• Determinants of observer reliability in ultrasonography
• Differences between concealed and revealed measurements
• Difference between 2D and 3D measurements
• Differences in data quality statistics obtained from training sessions, routine data collection, quality control, and standardization sessions

Ultrasound protocol and standard operating procedures (SOP).
This SOP includes procedural guidelines on:
- Circumstances under which one should delay or forego the assessment or part of the assessment
- Release of measurement values and GA calculations for clinical purposes
- Preparation of measurement setting and instruments; linkage with computer and database
- Ultrasound device and accessories: maintenance, transport, storage, calibration checks

Unanticipated changes to this protocol / SOP
Despite all efforts, it is likely to be impossible to anticipate all possible eventualities in a study of this size. Therefore, changes to this protocol will be possible. These will be proposed to the Steering Committee by the USQU, and incorporated if approved.

Appendix E: Data Management: the Electronic System
The data management process will be performed electronically, simultaneously with data collection, at each study site. Data entry will be implemented on a customized web-based system developed for the study. The system automatically carries out range and consistency checks for immediate correction and reviews percentages of missing/unknown values for each variable. This technology also allows a real-time audit of each site by a set of predefined reports included in the system (recruitment, data inconsistencies, etc) run periodically by the Project Data Manager. Additional modules for internal/external messenger, hosting of study documentation (protocol, guidelines, etc) and user administrator make up an integral data management solution for multicentre studies.
The INTERGROWTH-21st website homepage will consist of an open part and a part protected by username and password. The open part will be used for publication of information, news etc. The protected part will contain the data entry application, a reporting module, an alert module, a descriptive statistics module, and an administration module. We are looking into the possibility of linking the application to the external ultrasound data sources.
Ultrasound measurements obtained during the follow-up visits will be electronically transferred from the ultrasound equipment to the web-based system, to avoid transcription errors, using a database format agreed between the ultrasound equipment manufacturer and the system development group. Routine analyses will be carried out regularly for each variable to check digit preference and unusual frequencies of answers that may reflect poor understanding of the instructions. After the information is electronically transferred, the last measurement will be made available to the attending staff for clinical use. The application includes an automated user and event logging system. Data collected will be evaluated by the Data Management Unit for further quality control analyses and compliance with the study protocol, including the analysis of eligibility criteria, the timing of visits and drop-out rates. Measurements for the study taken from the equipment will have a facility to blind all fetal measurement values until they are submitted for the study. Following submission the last value will be printed for use in clinical care.
All other clinical and anthropometric data will be collected initially on paper forms and then entered onto the on-line system at each local institution by specially trained data management staff. Data should be entered into the system within one week of being collected, facilitating the retrieval of possible missing data or other inconsistencies.
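One simple way to screen for the digit preference mentioned above is to tabulate the terminal digit of each recorded value and compare the counts with a uniform distribution. The sketch below assumes measurements recorded to a tenth of a millimetre and uses a plain Pearson chi-square statistic; the protocol does not specify the method, so both the approach and the function names are illustrative.

```python
from collections import Counter


def terminal_digit_counts(values_mm):
    """Count how often each final digit (0-9) occurs in values recorded to 0.1 mm."""
    counts = Counter(round(value * 10) % 10 for value in values_mm)
    return [counts.get(digit, 0) for digit in range(10)]


def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts per digit."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no observations")
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


# With 10 digits there are 9 degrees of freedom; a statistic above about
# 16.92 corresponds to p < 0.05 and would suggest digit preference.
CRITICAL_VALUE_DF9_P05 = 16.92
```

A statistic near zero indicates that terminal digits are evenly spread, whereas a large value, for instance when most values end in 0 or 5, would flag an observer or site for review.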

Data management and statistical analysis:
• Monitor data collection procedures at study sites and regularly advise SC about all data received and processed
• Validate data and produce queries about inconsistent and/or apparently incorrect data
• Send data queries to study sites so that errors are corrected at collection phase
• Update master file using corrections from validation checks and/or answers to queries
• Evaluate electronic monitoring reports by study site based on recruitment rates, dropouts, data completeness, quality assurance checks, etc.
• Maintain communication channels between study sites and supporting units relating to queries
• Inform SC whenever a study site requires retraining in data collection or is failing to respond to queries and implement corresponding actions
• Conduct interim statistical analyses of data in accordance with analysis plan approved by SC
• Conduct final statistical analyses of data in accordance with plan approved by SC
• In collaboration with SC, coordinate preparation of articles for publication in peer-reviewed scientific journals etc.

Study Committees
The following Committees will oversee the implementation of the study:

Steering Committee (SC)
The SC will consist of the following:
• Professional staff from CU and its supporting units
• Principal Investigators (one per site)
• Selected senior scientists
• Representatives from Department of Nutrition and Department of Environmental Health, and other related Institutions as considered appropriate
The SC will meet regularly by conference call or face-to-face meetings to review the study's progress and discuss substantive issues that arise from the study's implementation and conduct. Its role is to make managerial decisions that affect the conduct of the study; to ensure the comparability of data from the different sites, and to resolve problems that arise in running this complex project.
The SC will have an Executive Committee (EC) that will meet frequently to review study progress and problems, protocol divergences, and other substantive issues that may arise from the study's implementation. The EC will have overall administrative responsibility for the study's implementation. All changes or alterations to the protocol or issues related to the technical, financial or administrative conduct of the study must be approved by the EC. This Committee will also select the study sites; approve the continuing participation of the sites selected, and determine which data are to be included in the pooled, international data set. Every effort will be made to achieve consensus for all decisions but where this is not possible resolution will be by simple majority with a quorum of at least 5 members. The EC membership will be JV, SK, AN, DA, MdO, ZB, AP, the SC Chair and a Principal Investigator selected by his/her peers. The Project Manager will be an ex-officio member.

Advisory Committee (AC)
The AC will consist of senior scientists in the areas of Medicine, Epidemiology and Statistics, Obstetrics and Gynaecology, Paediatrics and Human Biology. The task of the AC will be to provide overall and strategic advice to the SC.

Data Management
Electronic data management will be contracted to Medscinet, a private company that has provided similar services in previous multicentre studies we have conducted. It has extensive experience of conducting large multicentre trials and observational studies, and it is recognized as a leading research organization in reproductive health. Although based in Sweden, the company has a UK base in the Department of Women's Health, St Thomas' Hospital, London. It will provide a full on-line data management system, as well as supervision and training support to the participating sites for the purposes of data entry, cleaning, file preparation, and study monitoring. All participating sites will be required to adhere to the study's data management system, which will be available on the study web page. We have successfully used this system in two previous multicentre studies involving sites included in INTERGROWTH-21st.

Ancillary Studies
Proposals for ancillary studies from one or more local Principal Investigators will be considered as long as a) they do not result in alterations to the main protocol and b) they do not impact adversely on subject recruitment and/or participation in the main study. All ancillary studies and/or additional data collection must be submitted to the EC for written approval before implementation. A letter of intent must be submitted to the CU at the conceptual stage of the ancillary study. If approved by the EC, the local Principal Investigator(s) will be invited to submit a detailed proposal that should include the study's rationale, objective, methodology, expected outcomes and budget.

Site Monitoring
Study progress will be monitored by:
a) Monthly reports for each study site produced by the CU using data from the electronic data bank, which will be shared with the study sites and EC members.
b) Visits (the frequency as required by the progress of the study) to all study sites to ensure that the study is uniformly implemented (the timing to be decided by the CU in coordination with the EC and Anthropometric Quality Control Unit (ADQU)). The activities to be performed during each visit include, but are not restricted to:
• Meeting site Principal Investigator and members of study staff
• Review of study procedures, use of data forms and related documents
• Observation of study personnel carrying out specific procedures
• Particular attention will be paid to standardization of data collection (e.g., adherence to study protocol, and measurement/standardization protocols)
• Review of coverage and participation rates, including characteristics of participants and non-participants, and reasons for loss to follow-up
• Observation of data validation procedures
• Physical walk-through of certain procedures (e.g., screening to determine subject eligibility or taking of informed consent)
• Conversations with key support personnel to assess their practice with regard to data collection
• Inspection of study facilities and subject study files
c) Exchange visits by field supervisors, regional and global meetings of field directors to facilitate standardization of procedures across sites.

Dissemination of Results
The results of the study will be disseminated as widely as possible, in publications and the mass media. Activities will include:
• Publication in peer-reviewed journals of all papers reporting primary and secondary outcomes based on the pooled international data set
• Publication of papers and editorials in regional journals that are published in local languages, such as Bulletin of PAHO (Spanish), Chinese, French or African Journals
• Electronic study newsletter
• Publication in periodicals from other collaborating institutions
• Presentations at local, regional and international meetings
• Publication of a book including all collected information (e.g., growth curves) and detailed methodological issues unlikely to be included in reports to peer-reviewed journals. The Executive Committee in coordination with Local investigators will take responsibility for this task.

Data Ownership
All data derived from the INTERGROWTH-21st multicentre project will be the property of the Oxford Maternal and Perinatal Health Institute (OMPHI) at the University of Oxford. Individual investigators will share the ownership of their site-specific data sets with OMPHI. After publication of the growth curves and the main papers related to the secondary objectives, the pooled data set will enter the public domain at a time to be determined by the SC. The anonymity of all study participants will be ensured.

Publications and Authorship
All papers reporting the new growth charts (Primary Objective, components 1-3), based on the pooled international data set (the growth curves), will be published under corporate authorship (International Fetal and Newborn Growth Consortium). The address for correspondence will be the Oxford Maternal & Perinatal Health Institute (OMPHI), Nuffield Department of Obstetrics & Gynaecology, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, UK.
For these publications, the names of the members of all committees and study units will be listed at the end of the paper (see Lancet 2006;367:1819 for an example of the format) plus their individual contributions to the study (e.g., protocol development, study coordination, data management, data analysis, manuscript preparation, etc) as required by the individual journal.
Sites participating in the multicentre project will be listed alphabetically by country; the local Principal Investigator will be responsible for selecting the authors within each country. It is understood that authorship, within each site, will be offered to those who have made a substantial contribution to the study. Within each site, the order of the authors' names will be the responsibility of the local Principal Investigator. The SC will provide a list of people to be mentioned in the standard Acknowledgments, once it has been agreed what type of support will be acknowledged. All papers arising from the study should include such a list of acknowledgements. The University of Oxford will give certificates of collaboration to doctors, nurses and local staff who contributed to the study but whose names cannot appear in publications. For all other papers arising from the study (except the main growth charts described above), including those relating to secondary objectives, a modified system of corporative authorship will be used, in which SC members will be listed by name according to an agreed rotation based on personal interests and leadership in the preparation of the paper. These names will be followed by "for the International Fetal and Newborn Growth Consortium" or a similar statement as agreed by the SC. The authorship order will rotate based on a system agreed by the SC, i.e. initial lottery and rotation thereafter or some other system.
The study will have a Publications Review Committee (PRC) consisting of the EC members, a Local Principal Investigator and one or two ad hoc Senior Scientists invited according to specific needs. The PRC will be responsible for:
• Preparation of a list of expected publications on the primary and secondary outcomes of the study, with the authorship strategy for each set according to the rules above which are to be approved by the SC.
• Setting-up writing committees for the primary outcome, corporate authorship papers and a review mechanism by the SC of the final drafts that are submitted for publication.
• Reviewing and approving all papers sent for publication with the goal of maintaining internal consistency of material and methods, as well as authorship policies.
• Reviewing and approving all site-specific analyses proposed by local Principal Investigators.
The following more general rules will apply: • Individual sites will not be allowed to publish analyses of country-specific data or subsets of sites that, in the EC's assessment, have the potential to be misconstrued as standard data for particular nations or population groups.
• However, individual sites and groups of sites are encouraged to publish other analyses of data based on their site-specific data. These analyses must be cleared by the EC before preparation. The order of authors' names in publications based on site-specific analyses will be the responsibility of the local Principal Investigator. The EC will provide the Acknowledgments and funding sources, in a standard format.
• The CU is encouraged to publish methodological and conceptual papers that describe the methods and procedures used in the study. The order of authors' names for these publications will follow the modified corporative authorship model including the SC as well as the CU members that lead the preparation of the papers. The CU will provide the Acknowledgments in a standard format.
• Scientists are encouraged to present the study at scientific conferences and meetings. When scientists are invited to international or regional meetings to present the study, the CU should be informed and prior written approval is required with details of the type, venue and organizers of the meeting. The CU will keep an archive of all materials presented at meetings and make them available at the study's web page.
Press enquiries will be honoured unless there are some operational or scientific reasons for withholding information. Requests for interim results or other details arising during the study which if honoured are likely to have an adverse effect on the study, will be denied. Only one individual will be authorized to interact with the press (the local Principal Investigator or public relations officer of his/her institution) at every site in coordination with the CU. In the multicentre study context, the EC will respond to queries concerning the overall study design or results. The SC will define the type of queries that may be answered locally and those that must be referred for response to the EC. Publicity concerning study results in preparation will be avoided.

Appendix I: The Pregnancy Physiology Pattern Prediction (4P) Study

Details of observation measurement

Additional observations
Minimum dataset: In addition to blood pressure (that is currently recorded), temperature, respiratory rate, heart rate and oxygen saturation will be recorded at each ultrasound scan visit; this should only add 2 minutes to each visit.
Minimum intra-partum and post-partum dataset (hospital): All routinely collected intrapartum and post-partum measurements (blood pressure, temperature, respiratory rate, heart rate and oxygen saturation) will be added to the dataset.
Full dataset: Each participant will, if willing, be trained in the use of home monitoring equipment. The participants will then be asked to provide a daily dataset for the 2 weeks after delivery.
• Resting blood pressure will be recorded, but not displayed; readings will be downloaded into the research database at each scan visit.
• Resting pulse oximetry and heart rate will be recorded and directly transferred via the (bluetooth-connected) smartphone to the research database.
• Temperature will be taken and entered by the participant into the smartphone provided and automatically transferred to the research database.
• Automatic text reminders will be sent to the smartphone when datasets have not been received.
A research midwife will be employed to visit participants, reinforcing observation techniques.
A complete dataset will be obtained during each visit. Two weeks after delivery, study personnel will arrange pick-up of the home monitoring equipment; a final dataset will be collected at this visit.
Blood pressure: When measuring blood pressure in the clinic or in the home, standardise the environment and provide a relaxed, temperate setting, with the woman quiet and seated, and her arm outstretched and supported. Blood pressure should be measured approximately 5 minutes after the woman has been seated and relaxed, with legs uncrossed. The arm should be supported at heart level. The correct cuff size should be used:
• British Hypertension Society Standard cuff: Bladder 12-26 cm for the majority of adult arms
• Large cuff: Bladder 12-40 cm for obese arms
• Small cuff: Bladder 12-18 cm for lean adult arms
(Combined European Society/NICE/British Hypertension Society recommendations)
Blood pressures will be stored within the device and downloaded into the database at scan visits/home visits.
Temperature: Tympanic and oral temperature will be measured at each scan visit, using standard techniques. Self-measurement of temperature will be via the tympanic route only and will be entered into the smartphone by the participant. Midwife home visits will record both tympanic and oral temperature.
Pulse oximetry: Oxygen saturation and heart rate will be recorded at each scan visit using standard hospital equipment. For each self-observation set, oxygen saturation and heart rate will be relayed to the smartphone, and uploaded directly to the research database.
Respiratory rate: Respiratory rate will be manually counted for 1 minute at scan and home visits. At the end of each home observation set, respiration rate will be estimated using an accelerometer within the smartphone, with the smartphone applied to the chest by the participant sitting back in her seat quietly for 30 seconds.
The PEA POD uses a series of standard equations to estimate body composition. Those equations require a subject's weight (w), length (l) and volume (V). Weight and volume are measured by the machine itself. Length needs to be measured beforehand.

Warnings and cautions
The machine also requires a quiet and stable environment:
• It must be on a flat floor (can be checked in System setup → Inclinometer)
• It must not be near a heater, air conditioning machine or fan, or under direct sunlight
• It must not be near a window or a door (especially one that could be opened during testing)
• It is sensitive to low frequency noise
• It must not be touched during a test
• The room temperature (RT) should be between 20-28°C. If RT is out of range (by even 0.1°C), an alarm will go off and what you are doing will be lost. To prevent that from happening, ideally keep RT around 25°C (open windows, use the AC and limit the number of people in the room). Moreover, RT should not vary by more than ±0.5°C during testing. The test chamber temperature is maintained at 31°C
• Room humidity 20-70%, variation during testing ±5%
• Atmospheric pressure between 86-106 kPa
The machine is designed for newborns and infants between 1 to 8 kg (from birth to 5-6 months). Do not turn the machine on if you are not using it. The machine will need cleaning after each usage. Also worth knowing, the temperature in the PEA POD room increases very quickly.
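The environmental limits above can be checked against a simple room log before and during a session. The following is a minimal sketch only: the limits are copied from the text, but the `Reading` structure, the function name and the idea of logging readings by hand are assumptions made for illustration and are not part of the PEA POD software.

```python
from dataclasses import dataclass

# Limits taken from the text above; all names are illustrative.
ROOM_TEMP_C = (20.0, 28.0)
HUMIDITY_PCT = (20.0, 70.0)
PRESSURE_KPA = (86.0, 106.0)
MAX_TEMP_DRIFT_C = 0.5      # allowed change in room temperature during testing
MAX_HUMIDITY_DRIFT_PCT = 5.0  # allowed change in humidity during testing


@dataclass
class Reading:
    temp_c: float
    humidity_pct: float
    pressure_kpa: float


def check_environment(start: Reading, now: Reading) -> list[str]:
    """Return a list of problems; an empty list means all limits are met."""
    problems = []
    if not ROOM_TEMP_C[0] <= now.temp_c <= ROOM_TEMP_C[1]:
        problems.append("room temperature out of range")
    if not HUMIDITY_PCT[0] <= now.humidity_pct <= HUMIDITY_PCT[1]:
        problems.append("humidity out of range")
    if not PRESSURE_KPA[0] <= now.pressure_kpa <= PRESSURE_KPA[1]:
        problems.append("atmospheric pressure out of range")
    # Drift is measured against the reading taken at the start of testing.
    if abs(now.temp_c - start.temp_c) > MAX_TEMP_DRIFT_C:
        problems.append("room temperature drifted during testing")
    if abs(now.humidity_pct - start.humidity_pct) > MAX_HUMIDITY_DRIFT_PCT:
        problems.append("humidity drifted during testing")
    return problems
```

A reading taken at the start of testing is passed as `start` and the latest reading as `now`; any returned message suggests pausing before the machine raises its own alarm.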

Working sequence
• Turn PEA POD and air-conditioning on
• Warm-up (2h)
• Calibration: Analyze hardware → Scale calibration* → Scale check → Autorun → Volume
• Test: Volume → Enter patient's data → Scale → Volume measurement → Cleaning
• Repeat as many times as the number of babies to measure
• Print out results & Back-up
• Log out
• Turn machine off
*Scale calibration is to be performed every 2 weeks or if the PEA POD has been moved.

Turning on/off the machine and warm up
Check the machine is on at the wall socket and at the back of the machine (main). The PEA POD and the PC will turn on together and the PEA POD software will launch automatically. The log-in window appears.

Log-in information: User name: jiang Password: peapod (Data management password: admin)
When you turn the machine on, a system error message 'Test Chamber Too Cool' will be displayed → it will disappear when the test chamber has warmed up. The machine needs 2 hours to warm up. If tests are planned in the morning, the machine can be left on overnight. However, the running temperature of the PEA POD is 28.8°C, so you will need to turn the AC on or open a window in the morning before calibrating. When the machine is turned on, the air circulation system starts. It will turn itself off after 10 min of inactivity. The PEA POD will require an extra 5 min to re-circulate air when becoming active. On the day, if the machine is left inactive for a couple of hours, it is recommended to do a couple of Autoruns: QC → Autorun (each takes about 10 min). Do not turn the machine on if you are not using it.
To turn the machine off:
• Log off and exit the PEA POD program (the PEA POD will stop)
• Turn the PEA POD and the PC off → bottom right of keyboard, hold pressed for 3 s
• Turn everything off (main at the back and plug)

Operation
The interface of the machine is very user friendly. Follow the instructions on screen. Remember to always execute the QC menu after the warm-up. Autoruns should be performed if the machine is left idle for a couple of hours. Subject's behaviour and preparation: The subject's hair should be flattened against their head using baby oil. In case the baby has a lot of hair or has curly hair, the baby should wear a cap, provided with the machine.
Calmer subjects make the experience more agreeable; testing after feeding results in calmer subjects. However, the subject's behaviours (crying, movement) have been shown not to affect the results.
By default, the models used by the PEA POD are:
• Body composition: Fomon
• Thoracic gas volume: Stocks
• Surface area artefact: Boyd

Menu titles
This is the main screen:
In the blue column on the right you will find:
• List of menus
• Environmental conditions (there can be 1, as here, or 5 indicators)
• Error messages, which appear just underneath these indicators
In the middle of the screen:
• The system notification box, where error messages appear
Do leave that bin outside to be emptied regularly.
• Weekly: clean sides and top using cleaning solution and procedures specified in manual
• Monthly: check pre-filter (replace if dirty) and clean test chamber (Zedong)
• Bi-monthly: replace pre-filter (Zedong)
The pre-filter is situated at the back of the machine, on the bottom left if you face the rear of the machine. It is behind a grid and is blue. It has to be replaced if it appears to have collected a lot of dirt. The Brillianize spray is for the window only.

System errors
Several messages can appear:
• Test chamber too cool
• Ambient temperature out of range
• Scale out of level
• Test and reference chamber temperature too far apart
• Volume sequence time limit**
→ Contact customer service if persistent
• Calibration valve pneumatic pressure out of specification
• Test chamber door does not close
• Door opened by door fail safe
→ Contact customer service → techsupport@lifemeasurement.com
** The volume sequence time limit means that there should be no delays during the volume measurement test: you need to stay at the machine and follow the instructions as soon as they appear on the screen. The measuring sequence is as follows: V calibration → Patient data entry → Scale taring → Mass measurement → Volume measurement → Results. If 10 min have elapsed between the end of calibration and the beginning of V measurement, the alarm will go off and the whole measuring sequence, starting from calibration, is to be repeated: Click 'Cancel' → Test → Body Composition

Results print-outs, export and backup
To avoid transcription errors, we thought it would be easier to print out 2 copies of the results:
• One to be stored in the corresponding folder for future reference in the PEA POD room
• One to be attached to the result sheet

To Export data
The results are exported to a text file in a tab-delimited format. Further data analysis can then be performed using software programs (e.g. Excel).
Data management → Export results → Enter password → Click OK
Select Results to Export: The following filters can be applied: last name, gender, test date, body mass, ID_1, ID_2, ID_3 and/or ID_4. Once you have selected the filter(s) you want, click 'Apply'.
If you don't select any filter and click 'Apply', all results will be listed in the window.
If your search yields more than 200 results, a dialogue box will appear with the following options:
• Click 'Back' to further limit the number of records
• Click 'Preview' to populate the table (this may take some time)
• Click 'Export' to export all records without previewing the data
Modifications to models can be implemented. If you scroll sideways through the result list, you will find columns for the default models used. You can change them from their drop-down menu (down arrow). This will not affect the database, just the exported results.
Select the results you want to export → Click 'Next' → Name your file → Click 'Next'
The file will be saved in C:\PEAPOD Data\Export Results\
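Because the export is a tab-delimited text file, it can also be read outside Excel. The sketch below uses Python's standard `csv` module; the column headers in the sample (`ID_1`, `Body Mass`, `Fat Mass`) and the sample values are hypothetical placeholders, since the actual headers written by the software are not listed here.

```python
import csv
import io


def read_export(stream):
    """Parse a tab-delimited export into a list of dicts keyed by column header."""
    return list(csv.DictReader(stream, delimiter="\t"))


# Hypothetical two-row export; a real file will carry the software's own headers.
sample = io.StringIO(
    "ID_1\tBody Mass\tFat Mass\n"
    "A001\t3.512\t0.402\n"
    "A002\t2.980\t0.310\n"
)
rows = read_export(sample)

# Values arrive as text, so numeric columns must be converted explicitly.
masses = [float(row["Body Mass"]) for row in rows]
```

For a file on disk, the same function can be given a handle opened with `open(path, newline="")`; the text encoding of the export is not stated in this document and may need to be specified.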

To transfer data
To transfer the data, exit the software and go to the above address or use the shortcut on the desktop. There are 2 USB ports in the panel on the left side of the screen. Copy or move your file to a memory stick.

To import data into Excel
In Excel, go to Open.
Select your drive and folder. In the 'Files of type' box at the bottom of the 'Open' dialog window, select 'All Files'. Select your file → Click 'Open' → Click 'Next' → Click 'Next' → Click 'Finish'. The results will appear in an Excel spreadsheet; each column will be a field from the 'Enter Subject information' screen. Each test will occupy a line.
The software only gives the option of 'finite' hour. So it was decided that:
• If a baby was born before half past the hour, we would enter as its time of birth the last complete hour, and
• If a baby was born on or after half past the hour, we would enter as its time of birth the next complete hour.
For example, if a baby was born between 1pm and 1.29pm, time of birth is 1pm and if a baby was born between 1.30pm and 1.59pm, time of birth would be 2pm.
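The rounding rule for time of birth can be written as a short function, which may help when checking entries afterwards. This is a sketch only; the function name is illustrative and nothing like it exists in the PEA POD software.

```python
from datetime import datetime, timedelta


def round_time_of_birth(born: datetime) -> datetime:
    """Round to the whole hour: minutes 0-29 round down, minutes 30-59 round up."""
    last_complete_hour = born.replace(minute=0, second=0, microsecond=0)
    if born.minute >= 30:
        # Adding one hour also rolls the date over for births from 23:30 onwards.
        return last_complete_hour + timedelta(hours=1)
    return last_complete_hour
```

Note that, under this rule, a baby born at or after 11.30pm is entered as born at midnight on the following calendar day; the text does not say how that case should be handled, so it is worth confirming before relying on the date.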

* Gest age
Gestational age at birth. Enter weeks and days. If in any doubt, please refer to the FGLS list that should be in the room. On that list, the women are arranged by EDD (i.e. 40+0 weeks). There should also be a wheel available in a folder.
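Since the EDD corresponds to 40+0 weeks (280 days), gestational age at birth can be derived from the EDD and the date of birth when no wheel is at hand. The sketch below assumes exactly that convention; the function name and the example dates are illustrative.

```python
from datetime import date

DAYS_AT_EDD = 40 * 7  # the EDD is defined as 40+0 weeks


def gestational_age_at_birth(edd: date, birth: date) -> tuple[int, int]:
    """Return gestational age at birth as (completed weeks, extra days)."""
    days_before_edd = (edd - birth).days  # negative if born after the EDD
    return divmod(DAYS_AT_EDD - days_before_edd, 7)


# A baby born 14 days before the EDD is 38+0 weeks.
weeks, days = gestational_age_at_birth(date(2010, 6, 15), date(2010, 6, 1))
```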

* Gender
Male or Female

* Length
The length has to be measured prior to the test. There should be an infantometer in the PEA POD room. The software gives you the option of unit (cm or inches).

Scale taring and volume calibration
Remember to check what non-clothing items the baby is wearing and use the duplicate props during volume calibration and scale taring.
If the baby had his/her length measured earlier, check which items the baby is wearing but do not undress the baby just yet, as you need to execute volume calibration and scale taring first. You will be provided with a PEA POD form that you or an anthropometrist should have completed with the baby's length. Do not forget to record the information requested.

The primary objective of INTERGROWTH-21st is to develop new "prescriptive" standards, conceptually similar to the WHO Child Growth Standards, describing optimal fetal and preterm neonatal growth and newborn nutritional status, and to relate these to neonatal health risk. This objective is being achieved by implementing three studies involving detailed and highly standardised recording of maternal characteristics and anthropometry, pregnancy complications, exposure to pollutants, fetal growth, neonatal anthropometry and perinatal outcomes:

Fetal Growth Longitudinal Study (FGLS):
ultrasound and clinical assessment of fetal growth every five weeks throughout pregnancy from <14 weeks, with accurate early pregnancy dating, in eight populations with optimal health, in defined geographical areas with low environmental risks. It will produce ultrasound and clinical Fetal Growth Standards.

Preterm Postnatal Follow-up Study (PPFS):
follow-up of infants from the FGLS cohort born prematurely with regular anthropometry and nutritional evaluation to describe their postnatal growth pattern up to 2 years. It will produce Preterm Postnatal Growth Standards. All newborns from the complete cohort (FGLS and PPFS) will be seen at 1 and 2 years to evaluate health, nutrition and development.

Newborn Cross-Sectional Study (NCSS):
anthropometric measures, neonatal morbidity and mortality, and pregnancy complications assessed in all newborns at each of the study centres over a 12 month period, i.e. all deliveries are being captured over 12 months from the same areas. It will produce Newborn Birth Weight for Gestational Age Standards.
The secondary objectives are:
d) Clinical: to develop a prediction model, based on multiple 2-dimensional (2D) ultrasound measurements, for estimating gestational age during mid-late pregnancy for use in populations of pregnant women without access to early/frequent antenatal care;
e) Epidemiological: to study in this multi-ethnic, population-based sample the determinants of LBW and its components (preterm delivery, impaired fetal growth and their subgroups) under current healthcare conditions, and
f) Biological: to acquire additional 3-D images to create an anatomical and growth databank of individual fetal organs as a unique source of biological information for future research.
The study populations from these geographically defined areas have no socio-economic constraints on growth; low morbidity and perinatal mortality, and adequate nutritional status. To be included, women must be non-smokers, with a normal pregnancy history, and without health problems likely to influence fetal growth or indicate a risk for pregnancy-related pathological conditions.
In FGLS, women are screened <14+0 weeks at their first antenatal visit and followed-up with standard clinical and 2D ultrasound examinations every five weeks, i.e. up to six times during pregnancy. In PPFS, preterm infants (>26+0 but <38+0 weeks) born from this sample are being followed-up during their first 8 months of life with the same protocol and set of anthropometric measures used in the WHO Child Growth Study. Postnatal growth is being evaluated from both delivery and conception for comparison with the corresponding in utero measurements. All infants from FGLS and PPFS will also be seen at 1 and 2 years to evaluate health, nutrition and development.
In NCSS, all newborns at the study centres, born during a fixed 12 month period, have anthropometric measurements taken immediately after birth. Only babies born to women who meet the same inclusion criteria used in FGLS are being selected to construct the newborn standards. Birth weight and gestational age will also be related to neonatal morbidity and mortality outcomes to construct risk-related newborn weight for gestational age standards.
Standard quality control measures are being used, including adaptation of the ultrasound machines to ensure that blinded measurements are taken; a unique system of random evaluation and repetition of ultrasound measurements (from stored images) to monitor validity and reliability, and continuous real time assessment of all data collected. Anthropometric measures of all neonates are being monitored and standardised centrally. All data are entered and managed in an on-line system specifically developed for the study, including a means of transferring blinded data directly from the ultrasound equipment to the database. This allows initiation of data analysis soon after data collection is completed.

Background
The aim is to establish a biobank (INTERBIO-Bank) of maternal blood, maternal faeces and cord blood/placental samples from healthy and complicated pregnancies to allow nutritional, epigenetic and other biomarker studies to be performed.
Collecting a heterogeneous group of cases will allow us to explore the wide range of aetiological factors (genetic, metabolic, vascular, autoimmune, infectious etc.) contributing to the development of complicated pregnancies that may present in the same way phenotypically (e.g. low gestational age), as well as the interactions between risk factors and outcomes. Ultimately, we aim to integrate all the pregnancy-related, clinical and biomarker data to improve the phenotypic characterisation of newborns, so as to facilitate the development of targeted interventions and screening strategies in pregnancy and early infant life. The pathways leading to pregnancy complications, e.g. preterm delivery, IUGR and SGA syndromes, are almost certainly controlled by multiple molecular, genetic, epigenetic and biochemical mechanisms. What is less clear is the relative contributions from risk factors such as infections, nutritional status and other environmental exposures, especially in resource-poor settings.
We hypothesise that:
1. There is more than one preterm delivery phenotype associated with inter-related pathways, i.e. the heterogeneous causes have different functional effects on the fetus/newborn.
2. Similarly, the IUGR/SGA phenotype has several intrauterine growth patterns, multiple causes (e.g. small maternal stature, poor maternal nutrition, infection, prematurity and utero-placental insufficiency), and neonatal and infant outcomes.
3. Hence, it is inappropriate to manage SGA and preterm newborns as single clinical entities, as usually occurs, based on the potentially false assumption that, irrespective of the cause, the adverse effects on the fetus and the clinical manifestations in the newborn are uniform.
4. These phenotypes will best be characterised by integrating measures of maternal health, fetal growth patterns, better estimation of gestational age and metabolic function, with biomarker data.
More rigorous clinical and laboratory-based characterisation of such phenotypic subgroups and their different aetiologies should lead to better clinical management of newborn complications and the development of more effective preventive interventions and screening strategies by improving their specificity. This is important because a lack of specificity of interventions tested in previous RCTs, particularly those to prevent preterm delivery, could have resulted in interventions that are actually effective in some phenotypic subgroups, being abandoned because they failed to show an overall protective effect.
A good example is the finding that calcium supplementation in low-risk women with low-calcium diets significantly reduces the risk of pre-eclampsia (RR 0.48; 95% CI 0.33-0.69) but its impact on preterm birth (RR 0.81; 95% CI 0.64-1.03) borders on significance¹. However, when the analysis was restricted to the four small RCTs including women at high risk of pre-eclampsia (n=568), there was a large and significant decrease in preterm birth (RR 0.45, 95% CI 0.24 to 0.83)². Hence, it is possible that the magnitude of the effect of supplementation varies because the predominant preterm birth subgroups are different.
Similarly, although malaria infection clearly affects birth weight and gestational duration in epidemiological studies, a Cochrane systematic review of anti-malarial interventions in pregnancy showed that, among women in their 1st or 2nd pregnancies, treatment reduced anaemia, parasitaemia, placental malaria, perinatal deaths and low birth weight (RR 0.57; 95% CI 0.46-0.72), but had no effect on preterm births in the only trial assessing this outcome²,³. Thus, anti-malarial interventions may be effective in preventing only a subgroup of preterm births that is not seen when small trials use overall preterm rate as the primary outcome. Lastly, despite the considerable epidemiological evidence that gynaecological infections and bacterial vaginosis are associated with preterm birth, the results of several RCTs of antibiotic treatment of such infections have generally
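The distinction drawn in the calcium example rests on whether each 95% confidence interval for the relative risk includes 1 (no effect). As a minimal illustration using only the figures quoted above (the function name and dictionary labels are our own):

```python
def excludes_null(lower: float, upper: float, null: float = 1.0) -> bool:
    """True if the confidence interval does not contain the null value."""
    return not (lower <= null <= upper)


# 95% confidence intervals for the relative risks quoted in the text.
intervals = {
    "pre-eclampsia, low-risk women": (0.33, 0.69),
    "preterm birth, low-risk women": (0.64, 1.03),
    "preterm birth, high-risk women": (0.24, 0.83),
}
significant = {name: excludes_null(lo, hi) for name, (lo, hi) in intervals.items()}
```

Only the preterm birth interval in low-risk women (0.64-1.03) spans 1, which is why that result is described as borderline while the other two are reported as significant.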

INTERBIO-Bank study design
We aim to collect and store maternal blood, maternal faeces and cord blood/placental samples (see Figure 3) to create a biobank from the following populations:
• 1) "Fetal Study" pregnancies in three centres currently in the INTERGROWTH-21st Project (Pelotas, Brazil; Nairobi, Kenya; Oxford, UK), supplemented by high-risk pregnancies in centres in resource-poor settings, monitored using the same protocol: INTERBIO-21st Fetal Study
• 2) "Neonatal Study" pregnancies in the same three centres (Pelotas, Brazil; Nairobi, Kenya; Oxford, UK), supplemented by high-risk pregnancies in centres in resource-poor settings, monitored using the same protocol: INTERBIO-21st Neonatal Study

The INTERBIO-21st Fetal Study will provide detailed phenotypic information based on fetal growth patterns and biological samples to investigate maternal/fetal nutritional status and maternal/placental/fetal biomarkers in pregnancies with optimal outcomes, as well as those complicated by a range of factors, including HIV, malaria, malnutrition and anaemia in resource-poor settings. In the field of DNA methylation in particular, this will be an important first step in describing normal variability in fetal/placental methylomes and how methylation signatures relate to both healthy and adverse clinical outcomes.
The INTERBIO-21st Neonatal Study will provide detailed newborn phenotypic information (including accurate gestational age at birth and neonatal morbidity) and biological samples for case-control studies of maternal/fetal nutritional and maternal/placental/fetal biomarkers in healthy pregnancies, as well as those complicated by a range of factors, including HIV, malaria, malnutrition and anaemia in resource-poor settings. NCSS pregnancies in INTERGROWTH-21st are ideal, population-based cohorts for nutritional, epigenetic and other biomarker studies of the causes of pregnancy complications and how they influence growth and development, principally for the reasons outlined in Box 1. However, we recognise the need to enrich the collection of complicated pregnancies from populations with other risk factors that are especially relevant to the needs of developing countries. Therefore, we will supplement sample collection in the three INTERGROWTH-21st centres by also collecting samples from pregnancies from the general population in resource-poor settings where there is a high risk of fetal growth impairment and preterm delivery because of infection, malnutrition, poor socioeconomic status and past adverse pregnancy outcomes. Because these conditions have a higher incidence in such settings, this strategy will yield more cases from a relatively small population.
In these centres, we will collect and store samples from 800 pregnancies per centre. Maternal blood, maternal faeces, cord blood and placental samples will be collected from pregnancies (cases) that have delivered at <38+0 weeks' gestation (n=200 per centre) or have resulted in IUGR/SGA newborns (n=200 per centre). Newborns that were born at <38+0 weeks' gestation and were IUGR/SGA will be included in both sets of cases as the case-control analysis will be carried out separately for each outcome.
We will also collect the same samples from term AGA newborns (controls), i.e. non-IUGR, normal birth weight newborns at term, as a reference group (n=400 per centre, i.e. one control for each case).
All cases and controls are required to have had, reported in their medical records, an estimation of gestational age by ultrasound measurement of either CRL <14 weeks or HC <24 weeks. When LMP is available this should also be recorded. If the LMP is not available it should be recorded as such and ultrasound estimations will be used.
Because of the different populations in the centres selected, all analyses in this case-control strategy will be stratified by centre, and will only be pooled if there is no statistical evidence of heterogeneity.
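The pooling rule above can be illustrated with a minimal sketch. The protocol does not name a heterogeneity test; Cochran's Q and the I² statistic on centre-specific log odds ratios are used here purely as an assumption, and the function names and the 2x2 table layout are hypothetical.

```python
import math

def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its variance for one centre's 2x2 table.

    a, b: exposed / unexposed cases; c, d: exposed / unexposed controls.
    A 0.5 continuity correction is always added (an illustrative choice)
    so that empty cells do not cause a division by zero.
    """
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def cochran_q(tables):
    """Inverse-variance pooled log OR, Cochran's Q, its degrees of freedom and I^2."""
    estimates = [log_odds_ratio(*t) for t in tables]
    weights = [1 / var for _, var in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    q = sum(w * (est - pooled) ** 2 for w, (est, _) in zip(weights, estimates))
    df = len(tables) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, df, i2

# Hypothetical counts for two centres with opposite associations.
pooled, q, df, i2 = cochran_q([(50, 50, 20, 80), (20, 80, 50, 50)])
print(f"Q = {q:.1f} on {df} df, I^2 = {i2:.2f}")
```

Q is compared with a chi-squared distribution on (number of centres - 1) degrees of freedom; a large Q relative to that reference would argue for reporting centre-specific estimates only.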

Anthropometric measurements
All babies, i.e. all cases and controls, born during the study period will have weight, length and head circumference taken within 24 hours of delivery.
Standardised, electronic, digital, newborn weighing scales with a precision of 10g will be used and their calibration status will be checked twice a week; they will be replaced if they are faulty and cannot be repaired. We shall also provide all clinics with standardised infantometers for length (precision 0.1 cm) and tape measures for head circumference (precision 0.1 cm); these will be similarly calibrated and maintained. All anthropometrists will be trained centrally and monitored during the study following standard procedures by the Anthropometric Standardisation Unit; they in turn will train the nurses/midwives in how to apply the study's measurement protocol.

For a small subgroup, the following additional anthropometric measurements will be taken: arm circumference; thigh circumference; abdominal circumference and skinfold thickness, as well as neonatal body composition using air displacement plethysmography (PEA POD) in some centres.

Follow-up
All newborns during the study period, including those on NICU or special care, will be followed on a daily basis until hospital discharge to document severe morbidity and detect neonatal death. We will make strenuous efforts to coordinate and promote evidence-based care for the neonates born <38+0 weeks' gestation using materials developed as part of our best practice programme, by liaising with the lead neonatologist in each NICU before and during the study. We recognise that differences in practice will persist despite our best efforts, especially in resource-poor settings. However, we believe this is unavoidable in a very pragmatic study such as this, which is trying to reflect what happens on a daily basis in clinical practice. Furthermore, we will similarly make strenuous efforts to standardise the main protocols for feeding practices in each NICU before the study starts. During the routine site-visits by members of the Study Coordinating Unit and the Anthropometric Team we will monitor the implementation of the protocols.

Severe perinatal morbidity and mortality outcomes
We have decided to use an unweighted composite outcome including at least one of the following conditions: stillbirth, neonatal death until hospital discharge of the newborn, newborn stay in NICU for ≥7 days or other severe neonatal complications. We have used such an outcome recently 7 8 ; it requires limited standardisation of clinical diagnoses across hospitals and is well accepted as a marker in large, international, population-based studies of newborns that are severely ill. It could be argued, however, that intrapartum stillbirth may not be related to fetal growth and should not be included in this index. We believe this is a valid point but as it will not be possible to separate those intrapartum deaths that are related to IUGR from those that are unrelated, we suggest keeping the index as it is. We believe this is a good proxy for adverse perinatal outcomes across countries.

On-line data management and statistical analysis
All clinical data will be entered into an on-line data management system specifically developed for the study. It includes a method for direct transfer of blinded data from the ultrasound machines to the database. This on-line system has the practical benefit of allowing on-going quality control, correction of errors or missing values and the initiation of data analysis soon after data collection is completed. It will be used for data management and monitoring all sub-studies, including patient recruitment and follow-up, and is based on the INTERGROWTH-21st Electronic Data Management System. The system permits all participants' data to be incorporated into the data files via the Internet as soon as they are available. Included within the system is a review process to ensure that all data are complete.
All sample related data will be entered separately into a data management system specifically developed for the study. The system allows samples to be tracked from the time of collection through processing, storage in the participating centres, and transport to a centralised facility. Each participant will have a unique identifier number, which will be used to link the clinical and sample databases. The number will also be used to barcode individual samples and aliquots. Quality control for this aspect of the study will be monitored by a team from GAPPS. These systems will provide the Data Management Unit with a detailed daily record of patient enrolment and data entry, at both individual and institutional levels to monitor progress against the milestones listed in the protocol. Corresponding actions, such as telephone calls, web conferences and site visits will take place within a week of detecting a problem in a centre to ensure that appropriate corrective measures are taken.

Selection of Cases and Controls
All live and stillborn infants in the study hospitals during the data collection period, whether or not they survive until hospital discharge, will be screened. However, multiple births and post-term births (>42+0 weeks) will not be included.
Each newborn infant will fall into one of the four groups below:

Gestational age at birth | IUGR/SGA | Non-IUGR/SGA
Born <38+0 weeks | Group C | Group A
Born ≥38+0 weeks | Group B | Group D

All mothers admitted for delivery (spontaneous or induced labour, or elective C-section) will be screened to check if they had gestational age estimated by CRL at <14 weeks or HC at <24 weeks. If not, they are not eligible for the study. If a mother had one or both of these two measurements, the screening form will be completed to collect the information required to classify her infant as: a) <38+0 weeks or ≥38+0 weeks, and b) IUGR/SGA or non-IUGR/SGA (based on the charts provided).
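The screening logic can be summarised in a short sketch. The function and parameter names are hypothetical, and the mapping of groups A to D is inferred from the operational definitions in this section; the birth weight centile is assumed to have already been read from the INTERGROWTH-21st charts.

```python
from typing import Optional

TERM_CUTOFF_DAYS = 38 * 7   # 38+0 weeks
POST_TERM_DAYS = 42 * 7     # births after 42+0 weeks are excluded
SGA_CENTILE = 10            # IUGR/SGA: birth weight below the 10th centile

def is_eligible(crl_scan_weeks: Optional[float],
                hc_scan_weeks: Optional[float],
                ga_at_birth_days: int,
                multiple_birth: bool) -> bool:
    """Eligibility: early ultrasound dating, singleton, not post-term."""
    dated_by_crl = crl_scan_weeks is not None and crl_scan_weeks < 14
    dated_by_hc = hc_scan_weeks is not None and hc_scan_weeks < 24
    if not (dated_by_crl or dated_by_hc):
        return False
    return not multiple_birth and ga_at_birth_days <= POST_TERM_DAYS

def classify_group(ga_at_birth_days: int, birthweight_centile: float) -> str:
    """Return the screening group (A, B, C or D) for an eligible infant."""
    early = ga_at_birth_days < TERM_CUTOFF_DAYS
    sga = birthweight_centile < SGA_CENTILE
    if early:
        return "C" if sga else "A"
    return "B" if sga else "D"

# Example: an infant born at 37+6 weeks on the 50th centile falls in group A.
print(classify_group(37 * 7 + 6, 50))
```

Gestational age is held in days so that the 38+0 and 42+0 week boundaries are exact integer comparisons.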

Operational definition of cases and controls in the maternity wards
To simplify the identification of cases and controls during screening, the following procedures will be used (see instructions in Appendix II): First, gestational age will be assessed using CRL or HC. Cases, born at <38+0 weeks, will be live or stillborn infants with gestational age assessed by an early ultrasound (either CRL at <14 weeks or HC at <24 weeks), regardless of whether or not they presented with IUGR/SGA at any time during pregnancy or at birth. These infants correspond to groups A and C in the table above. Second, BW for gestational age will be assessed for infants born ≥38+0 weeks. Cases, IUGR/SGA, will be live births or stillbirths whose BW for gestational age is below the 10th centile of the INTERGROWTH-21st neonatal standard as defined on the form. These infants correspond to group B in the table above. In the data analysis phase, infants from group C (IUGR/SGA infants born <38+0 weeks) will be added to those in group B so as to include all IUGR/SGA infants regardless of their gestational age at birth.
Third, the screening form will also identify potential controls, that is, non-IUGR/SGA infants who were not born <38+0 weeks (group D in the table above). The first potential control born after each case (either a case born <38+0 weeks or an IUGR/SGA case) in the same hospital (note e) will be enrolled in the study as a control. After enrolling a case, a control must be recruited. If two cases are born in succession, the second case cannot be recruited and instead screening for a control continues. Once a case-control pair has been recruited and processed, sites then screen for another case.
Note e: If there is more than one hospital at a given study site, and if presumed risk factors vary by hospital (e.g. one primarily attracts mothers of low socioeconomic status, and another attracts high income mothers), it may be necessary to weight the analyses to reproduce a control group that is representative of the study population; ignoring such differences may lead to overmatching.
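The alternating recruitment rule can be sketched as a small state machine over births in one hospital, in birth order. This is an illustration only; the function name and the (infant_id, group) input format are hypothetical.

```python
CASE_GROUPS = {"A", "B", "C"}   # born <38+0 weeks and/or IUGR/SGA
CONTROL_GROUP = "D"             # born >=38+0 weeks and non-IUGR/SGA

def enrol(births):
    """Apply the case-then-control rule to births listed in birth order.

    births: iterable of (infant_id, group) for eligible infants in one hospital.
    Returns a list of (infant_id, role), role being "case" or "control".
    """
    enrolled = []
    awaiting_control = False
    for infant_id, group in births:
        if not awaiting_control and group in CASE_GROUPS:
            enrolled.append((infant_id, "case"))
            awaiting_control = True
        elif awaiting_control and group == CONTROL_GROUP:
            enrolled.append((infant_id, "control"))
            awaiting_control = False
        # Otherwise the infant is screened but not recruited, e.g. a second
        # case born before a control has been found for the first.
    return enrolled

births = [(1, "D"), (2, "A"), (3, "B"), (4, "D"), (5, "D"), (6, "C"), (7, "D")]
print(enrol(births))
```

In this example infant 3 is a case born immediately after case 2 and is therefore skipped, and infant 5 is a potential control born when no case is awaiting a match.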

Definitions of cases and controls for the data analyses
Cases born <38+0 weeks' gestation for the data analyses will include all births at <38+0 weeks whether or not they present with IUGR/SGA (groups A and C).
IUGR/SGA cases for the data analyses will include the operational definition of IUGR/SGA cases (group B) plus those cases born <38+0 weeks who are also IUGR/SGA (group C); the latter were collected as a sub-set of cases born <38+0 weeks.
Infants in group C (IUGR/SGA infants born <38+0 weeks) will be included in both groups of cases, as the case-control analyses will be carried out separately for each outcome.
The table below provides the definition of controls for the analyses.

Controls for cases born <38+0 weeks will be a sample of live and stillborn infants born ≥38+0 weeks. In the statistical analyses, a proportion of term IUGR/SGA (xB) cases will be added to the operational controls (group D).
Controls for IUGR/SGA cases will be a sample of live and stillborn infants who are not IUGR/SGA at birth. In the analyses, they will include all operational controls (group D) plus a proportion of infants born <38+0 weeks who are not IUGR/SGA at birth (xA).

Table 2. Case-control comparisons in the data analyses.

Comparison | Cases | Controls | Comments
Infants born <38+0 weeks case-control study | A + C | D + xB | To reproduce the control population, set B (IUGR/SGA only) will be down-weighted by a factor x which is equal to the sampling fraction for set D, that is the proportion of all infants in the control pool who were included in the detailed study (cases).
IUGR/SGA case-control study | B + C | D + xA | As above, for set A (infants born <38+0 weeks only).
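As a rough illustration of the down-weighting in Table 2, the sketch below computes the sampling fraction x for set D and the analysis weights for each control set. The function names, the comparison labels and the counts are hypothetical; the protocol specifies only that x equals the sampling fraction for set D.

```python
def sampling_fraction(n_d_enrolled: int, n_d_screened: int) -> float:
    """x: share of all screened group D infants who were enrolled as controls."""
    if n_d_screened <= 0:
        raise ValueError("at least one group D infant must have been screened")
    return n_d_enrolled / n_d_screened

def control_weights(comparison: str, x: float) -> dict:
    """Analysis weights for the control sets in each comparison of Table 2."""
    if comparison == "born_before_38_weeks":   # cases A + C, controls D + xB
        return {"D": 1.0, "B": x}
    if comparison == "iugr_sga":               # cases B + C, controls D + xA
        return {"D": 1.0, "A": x}
    raise ValueError(f"unknown comparison: {comparison}")

# Hypothetical centre: 400 of 1,600 screened group D infants were enrolled.
x = sampling_fraction(400, 1600)
print(control_weights("born_before_38_weeks", x))
```

Because all infants in sets A and B are enrolled as cases while only a fraction x of set D is enrolled, weighting A or B by x restores their relative frequency in the control population.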
Subgroup analyses will include cases born <38+0 weeks, stratified according to: a) gestational age groups (the exact groupings will be decided based on the number of births at each week of gestational age, so that there will be at least 100 cases in each sub-group) or b) preterm phenotype, using the newly proposed INTERGROWTH-21st classification system. For IUGR/SGA, subgroup analyses will include stratification by: a) IUGR/SGA severity (<3rd, 3rd-5th, 6th-9th centiles) and b) gestational age.
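The severity bands for IUGR/SGA can be expressed as a simple lookup. The handling of non-integer centiles (for example, treating 5.5 as part of the 3rd-5th band) is an assumption made for this sketch, as is the function name.

```python
from typing import Optional

def sga_severity(birthweight_centile: float) -> Optional[str]:
    """Return the IUGR/SGA severity band, or None if the infant is not IUGR/SGA."""
    if birthweight_centile < 3:
        return "<3rd"
    if birthweight_centile < 6:    # assumption: band covers 3 up to just under 6
        return "3rd-5th"
    if birthweight_centile < 10:   # assumption: band covers 6 up to just under 10
        return "6th-9th"
    return None

print(sga_severity(4))
```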
Appendix II provides more detailed information on the different strategies for selecting controls for case-control studies that we considered when planning the study; some of these proposed strategies were not practical. As proposed above, INTERBIO-21st will adopt a traditional case-non-case design, and odds ratios will be used to estimate relative risks. This is based on the assumption that cases will be relatively rare, i.e. <10% of the overall number of births.
We estimate that the overall birth rate for infants born <38+0 weeks will be <10% and the overall IUGR/SGA rate will be <10-15%. However, by collecting data on all four sets (A, B, C and D), it will also be possible, with appropriate statistical weights in the analyses, to carry out case-base analyses using Poisson regression with robust variance, if the outcomes end up being more common (>10%).
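The reason the rare-outcome assumption matters can be shown numerically: the odds ratio tracks the relative risk closely when the outcome is uncommon and drifts away from it as the outcome becomes more frequent. The risks used below are hypothetical and the function names are illustrative.

```python
def relative_risk(risk_exposed: float, risk_unexposed: float) -> float:
    """Ratio of outcome risks in exposed vs. unexposed pregnancies."""
    return risk_exposed / risk_unexposed

def odds_ratio(risk_exposed: float, risk_unexposed: float) -> float:
    """Ratio of outcome odds in exposed vs. unexposed pregnancies."""
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return odds_exposed / odds_unexposed

# Hypothetical risks: the exposure doubles the risk in both scenarios.
for exposed, unexposed in [(0.10, 0.05), (0.40, 0.20)]:
    rr = relative_risk(exposed, unexposed)
    or_ = odds_ratio(exposed, unexposed)
    print(f"baseline risk {unexposed:.2f}: RR = {rr:.2f}, OR = {or_:.2f}")
```

With a 5% baseline risk the odds ratio overstates the relative risk only slightly; with a 20% baseline risk the gap is much wider, which is the situation in which a case-base analysis estimating risk ratios directly would be preferred.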

Sample processing
The sample collection, processing and storage procedures will be performed in a standardised manner based on protocols described in detail in the INTERBIO-21st Operations Manual, which has been developed with the assistance of the GAPPS team and researchers at the Universities of Oxford 9 and Cambridge 10 , and Johns Hopkins Bloomberg School of Public Health.
In brief, maternal and cord blood samples will be collected to store whole blood, plasma and the buffy coat for a wide range of purposes, including DNA extraction for genetic and epigenetic studies and micronutrient assays. Two placental biopsies will be taken: one for immunohistochemistry and DNA extraction, and one stored in RNAlater for expression studies (if the sample is obtained <30 mins after delivery).
In addition, we intend to collect and store samples for a number of future, as yet unspecified, biomarker assays relating to preterm delivery and fetal growth.
It is vitally important to ensure that samples are collected in a standardised way with adequate monitoring of quality control, principally because sample quality, quantity and handling can greatly influence the results of microarray and sequencing experiments 11 .

The primary reason (aside from quality control) for ensuring that samples are collected, stored and processed in a uniform manner is to facilitate the anticipated interchange of data, in the future, with other biobanks. Standardising phenotypic definitions, sample collection methods and analyses fosters trans-national collaboration and networking 12 . We shall therefore also seek advice from groups such as the Public Population Project in Genomics (http://www.p3g.org), which promotes international harmonisation and collaboration in population genomics and biobanking by sharing research tools and expertise.
Faecal samples: We wish to collect a faecal sample from mothers, opportunistically at the time of delivery, for metabiomic studies. Although it has been suggested that we should also collect stool samples from infants every 6 months, we feel that this is a rather large-scale undertaking that is beyond our remit.
We certainly appreciate the importance of looking for maternal intestinal co-infections and microbiota.
In fact, we published on this subject in 1989: in a prospective study of 14,914 pregnant Guatemalan women, the incidence of IUGR increased with the number of parasitic species detected 13 .

Sample size
This is a great challenge in any field-study of this magnitude and even more difficult when exploring risk factors with relatively unknown degrees of association and prevalence in the population. The key issue is to reach a balance between logistical demands, including the need to maintain data quality in these populations, and power calculations, especially for the planned epigenetic studies. Having said that, our co-investigators, Krina Zondervan and Cecilia Lindgren in the Wellcome Trust Centre for Human Genetics, Oxford, have considerable experience of conducting candidate gene and genome-wide association studies (GWAS) in related fields and the lessons learned over the last 15 years will be pertinent to the proposed studies.
To illustrate the point, Cardon & Zondervan reviewed how the complex interplay between genotype, phenotype, environmental factors and sample size affects the ability to detect disease susceptibility variants in population-based association studies 14 . They concluded that thousands of cases and controls are required to detect common variants with small effect sizes in such studies.
Three examples demonstrate the need to study large numbers to identify genes influencing quantitative traits involved in metabolic function, such as birth weight. Nearly 120,000 individuals were genotyped to identify three loci influencing anthropometric measures (waist circumference and waist-hip ratio) of central obesity and fat distribution in a recently published meta-analysis of 16 GWAS, followed by large-scale replication testing 15 . Using similar methodology (GWAS meta-analysis followed by replication), over 90,000 individuals were genotyped to confirm that two loci are associated with BMI and to identify six additional loci 16 and, more recently, nearly 40,000 European individuals were genotyped in identifying variants in ADCY5 and near CCNI associated with fetal growth and birth weight 17 .
For the nested case-control studies, we are collecting samples from 2,000 controls and 2,000 cases from pregnancies with adverse outcomes, e.g. delivery at <38+0 weeks' gestation, term IUGR/SGA. In addition, we have the potential to include 400 cases and 2,100 controls from the FGLS population in the analysis, taking into consideration the possibility of selection bias in the selection of FGLS population controls.
It is very unlikely that fewer than these numbers will be needed to study the effects of adverse intrauterine effects on epigenetic profiles, especially as there is emerging evidence from genome-wide epigenetic studies in animals that imprinted quantitative trait loci (iQTL) affect body weight and growth 18 and adult body composition 19 in much more complex and diverse patterns than previously assumed.

Selection of study centres
We aim to use the same rigorous processes to select the new sites for this extension as originally adopted in the selection of the current INTERGROWTH-21st centres. However, in this case, the selection criteria will inevitably involve finding a balance between obvious opportunities (e.g. having access to a malnourished pregnant population with a high prevalence of malaria/HIV) and the risks of working in a research-naïve environment with limited existing access to antenatal care.
The criteria the INTERBIO-21st Steering Committee will use to select the centres will include factors such as: 1) existing research infrastructure and capacity; 2) existing maternity services, including antenatal ultrasound; 3) support of local health authorities; 4) previous experience in collecting biological samples; 5) geographical location to retain global coverage; 6) prevalence of key exposure variables, i.e. risk factors; 7) costs; 8) leveraged funding from other donors, and 9) the need, ideally, for all samples in the proof-of-concept study to be analysed in a centralised facility.

Background
Understanding the gene-environmental interactions underlying the plasticity of the epigenome at certain times from fetal life to infancy will be crucial to developing interventions, particularly in pregnancy, that might correct or at least prevent the long-term, adverse consequences 20 . We believe that the key to doing so effectively is to recognise that phenotypes other than birth weight and gestational age alone are needed to determine the nutritional status of the newborn and assess the effectiveness of interventions.
The redefinition of newborn phenotypes will arise from evaluating a combination of factors in pregnancies with normal and abnormal outcomes. These include maternal health; fetal growth patterns measured using 2D ultrasound; growth patterns of individual fetal organs measured using 3D ultrasound; newborn body composition and physiological function; micronutrient levels and data from epigenetic experiments, which will initially characterise normal variability across the epigenome in uncomplicated pregnancy and then, in carefully designed nested case-control studies, evaluate the effects of adverse environmental and nutritional factors on the epigenome (and other biomarkers) in a pool of complicated and uncomplicated pregnancies.

General Objectives
The aim is to conduct a hypothesis-testing, proof-of-concept study comparing 500 normal birth weight and 500 term IUGR/SGA newborns (using both cord blood and placental samples) taken from the samples collected in the context of both the INTERBIO-21st Fetal and Newborn Studies. This will be the first in a series of experiments utilising samples collected for the INTERBIO-Bank.
We aim to assess DNA methylation patterns in ~100 imprinted genes previously implicated in fetal growth. Our hypothesis is that maternal micronutrient deficiency, particularly of folate and other methyl donor factors, results in impaired fetal growth, development and pregnancy outcomes, through altered DNA methylation.
We will therefore correlate these methylation patterns with pregnancy (clinical outcomes, fetal growth), nutritional (micronutrient assays), and neonatal (growth, development and body composition) data, which will allow us to:
1. Study the effects of environmental and nutritional factors on the epigenome;
2. Develop new phenotypic definitions of LBW and other adverse pregnancy outcomes.

If validated, the results could inform knowledge-based actions to address underlying problems, such as poor nutrition and infection, leading to improved outcomes. The data will, in addition, serve to define normal variability in the epigenome and inform the design of future epigenome-wide studies, once the cost has fallen, as inevitably it will with technological advances.
In the long-term, we would also wish to correlate these epigenetic findings with single nucleotide polymorphism (SNP) genotyping data from a GWAS given the increasing evidence that epigenetic regulation is influenced by genetic factors and the recently published data implicating variants in ADCY5 and near CCNI with fetal growth and birth weight 17 .

Specific Objectives
We plan to study the methylation profiles of the ~100 imprinted genes that have to date been implicated in fetal growth, although the final list of candidate genes will be taken from our own systematic search of the literature, as well as existing databases, such as http://www.geneimprint.com and http://igc.otago.ac.nz.
Where possible, we will analyse cord blood and placental tissue separately to compare the methylation profiles of both tissues. The underlying rationale is as follows:
• There is increasing evidence that placental function and gene expression respond to, and are marked by, environmental insults. The placenta can therefore serve as a 'record of in utero exposure and pathology' 21 . Effects on the fetus almost certainly occur downstream of these events and so comparing the epigenetic profiles of both tissues in individual pregnancies may help to differentiate the various causes of IUGR/SGA and preterm delivery.
CpG sites between cord blood and placenta 22 .
o Guo et al. (2008) have described similar findings in two imprinting clusters: the H19 promoter is unmethylated and IGF2 DMR2 hypomethylated in placenta. However, in cord blood, these two regions maintain the differential methylation status seen in most other tissues 23 .
o Yuen et al. (2009) have observed DNA methylation of the promoter in TUSC3 and WNT2 in placental, and not the associated fetal, tissues; within individual placentas, methylation was confined to trophoblastic chorionic villi, and not amnion, chorion, cord or decidua 24 .

Study design
For this proof-of-concept study, we will randomly select 500 term IUGR/SGA cases from the INTERBIO-Bank. The 500 normal birth weight controls will be taken either from the population at least risk within the INTERBIO-21st Fetal Study or from the total with normal outcomes from the entire study population, and matched with the cases. A final decision will be made by the INTERBIO-21st Steering Committee.

Methods
We have given considerable thought to the best technological platform for assessing methylation profiles and we have consulted widely with leading experts in the scientific community and industry. There are a large number of different platforms available and many more being developed; in general, there is an inverse relationship between the cost of analysis and the resolution/coverage of the genomic region being studied. At this stage, however, we have decided to use Methylated-Cytosine DNA Immunoprecipitation-Microarray Chip (MeDIP-Chip) followed by bisulfite-(BS) PCR and high throughput sequencing for validation of differentially methylated loci 11 .
The approach is well described in a recently published proof-of-concept study assessing whether 'DNA methylation in a subset of genomic loci may connect end-stage cardiomyopathy with different etiologies' 25 . In brief, these authors performed a preliminary analysis using MeDIP-Chip (Nimblegen, WI, US); validated differential methylation loci by BS-PCR and high throughput sequencing; identified three angiogenesis-related genetic loci that were differentially methylated with the BATMAN algorithm 26 , and using quantitative RT-PCR, found that the expression of these genes differed significantly between cardiomyopathy hearts and normal controls.
However, we are aware that the samples may not be analysed for at least another two years by which time the technology is likely to have changed considerably, costs will have fallen and genome-wide profiling in large numbers of samples will be affordable. We are therefore in preliminary discussions with a number of companies, including Nanopore (Oxford, UK), http://www.nanoporetech.com, and Pacific Biosciences (Menlo Park, CA, US), http://www.pacificbiosciences.com, who may soon be able to offer high-throughput, single molecule sequencing 27 . Whichever platform is used, however, the intention ideally is to analyse all samples in a centralised facility; in fact, this applies to all the experiments proposed in the proof-of-concept study.

Specific experiments
Placenta v. cord blood methylation profiles: To the best of our knowledge, no study has compared the methylation profiles of the ~100 imprinted genes in placental tissue and cord blood. The outcomes of these comparisons will potentially shed light on the regulatory mechanisms and epigenetic profiles of adverse and healthy pregnancy outcomes.

The PEA POD system compares well with the 4-compartment reference model, which is considered the best choice for assessing body composition in humans. In contrast, however, it is easy to perform; takes only a few minutes to complete; infant movement during the measurement is not a significant problem; the measurements can be repeated as frequently as needed, and the results are immediately available 31 .
The system is now recognised as an established method to assess neonatal body composition in developed countries [32][33][34] , and it is suggested that it may offer important insights into which fetal growth parameters most closely reflect the generalised nutritional state of neonates and infants 34 . However, there are no published data about its use in resource-poor settings as, to the best of our knowledge, the system has been installed in only one site in such a setting, as part of a collaboration between Jimma University, Ethiopia, and the Department of Human Nutrition, University of Copenhagen.
We now propose installing PEA POD systems in four of the centres in resource-poor settings to give a much more detailed assessment of nutritional status and growth than birth weight and gestational age alone.

Sample size for epigenetic studies
As discussed on page 18, it is extremely difficult to provide reliable power calculations at the moment for epigenetic studies: the field is too new and very few relevant studies have been conducted, especially in humans, to enable power calculations to be performed. It is also unclear at present to what extent it will be necessary to map DNA methylation at high resolution across the entire genome 35 , which will inevitably influence the epigenotyping strategy and choice of platform, e.g. bisulfite sequencing or array-based technology. However, having said that, the sample size chosen matches that in the NIH National Standard for Normal Fetal Growth Study and we feel comfortable that it provides a reasonable compromise between cost, expediency and logistical demands.
The estimated sample sizes required to detect the effects of methylation status on adverse pregnancy outcomes are inevitably based on a range of assumptions, since the spectrum of methylation changes and their corresponding effect sizes are unknown. Table 1 shows the sample sizes required to detect differential methylation in cases vs. controls. The following assumptions are made:
- Methylation status is either on/off, and so the proportion of cases vs. controls with methylated status is analysed.
- Methylation proportion among controls of 0.2, with proportion in cases varying from 0.3-0.5, corresponds to an odds ratio (OR) of the effect of methylation status on outcome from 1.7-4.0.
- A significance threshold α of 5.0 x 10^-4 (Bonferroni-corrected threshold for 100 candidate imprinted genes) vs. 5.0 x 10^-7 (commonly applied genome-wide significance threshold in GWA studies 36 ).
- Power of 80% vs. 90%.
- Case:control ratio either 1:1 or 1:3.
Table 1 demonstrates the approximate power of the proposed experiments with 500 cases and 500 controls. However, for future experiments, based on conservative estimates (OR=2.2 and PAF=0.2), we will be well placed to detect differences even at 90% power, given that we could have a 1:3 case:control ratio (i.e. 1,000 infants born at <38+0 weeks' gestation or 1,000 term IUGR/SGA newborns and at least 3,000 term, non-IUGR/SGA controls). Nevertheless, it is worth emphasising that these are approximate calculations and that, in a study of this magnitude and complexity, logistical and budgetary considerations must inevitably play an important role in the selection of the sample size.
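The arithmetic behind these assumptions can be reproduced with a minimal sketch. The odds ratio and Bonferroni calculations follow directly from the figures in the text; the sample size function uses a standard two-sided normal-approximation formula for comparing two proportions with unequal allocation, which is an assumption here and not necessarily the method used to produce Table 1. All function names are hypothetical.

```python
import math
from statistics import NormalDist

def odds_ratio(p_cases: float, p_controls: float) -> float:
    """OR for methylated vs. unmethylated status, from the two proportions."""
    return (p_cases / (1 - p_cases)) / (p_controls / (1 - p_controls))

def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return family_alpha / n_tests

def cases_needed(p_cases: float, p_controls: float, alpha: float,
                 power: float, controls_per_case: float = 1.0) -> int:
    """Approximate number of cases for a two-sided test of two proportions."""
    k = controls_per_case
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_cases + k * p_controls) / (1 + k)   # pooled proportion under H0
    null_term = z_alpha * math.sqrt((1 + 1 / k) * p_bar * (1 - p_bar))
    alt_term = z_beta * math.sqrt(p_cases * (1 - p_cases)
                                  + p_controls * (1 - p_controls) / k)
    return math.ceil((null_term + alt_term) ** 2 / (p_cases - p_controls) ** 2)

alpha = bonferroni_alpha(0.05, 100)   # 100 candidate imprinted genes
for p_cases in (0.3, 0.4, 0.5):
    print(f"p_cases={p_cases}: OR={odds_ratio(p_cases, 0.2):.1f}, "
          f"cases needed (1:1)={cases_needed(p_cases, 0.2, alpha, 0.8)}, "
          f"cases needed (1:3)={cases_needed(p_cases, 0.2, alpha, 0.8, 3)}")
```

Moving from a 1:1 to a 1:3 case:control ratio reduces the number of cases required for the same power, which is the rationale for the larger control pool mentioned above.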
Data quality: Standardisation of the research staff, who will be responsible for obtaining the neonatal body composition data, represents a challenge. However, we will employ the same quality control measures that are now being used in FGLS and PPFS for the ultrasound and anthropometric data to ensure that the quality of the data is maintained.

Publications and Authorship
The policy regarding publications arising from the study is identical to that in place for the INTERGROWTH-21st Project as a whole and was approved at the first INTERBIO-21st Study Steering Committee.