Purpose Preterm birth (PTB) is a public health issue. Interventions to prolong gestation have not achieved the expected results because selecting the population at risk of PTB remains a challenge. Cervical length (CL) is the most widely accepted biomarker; however, even in the best scenario, CL identifies only half of the patients. Considering the multiple pathways theory, it is unlikely that a single measure will identify all pregnant women who will deliver before 37 weeks of gestation. We planned this cohort to study the link between the vaginal microbiome, proteome and metabolome candidates, characteristics of the cervix and PTB.
Participants Pregnant women in the first trimester of a singleton pregnancy are invited to participate in the study. We collect biological samples, including vaginal fluid and blood, from every patient and perform ultrasound measurements that include the Consistency Cervical Index (CCI) and CL. The main outcome is the delivery of a neonate before 37 weeks of gestation.
Findings to date We have recruited 244 pregnant women, all of whom have CL and CCI measurements. A vaginal sample for microbiome analysis has been collected from all 244 patients, and most of them (216, 89%) agreed to blood collection. By August 2021, 100 participants had delivered. Eleven participants (11%) had a spontaneous PTB.
Future plans A reference value chart for the first trimester CCI will be created. We will gather information regarding the feasibility, reproducibility and limitations of CCI. Proteomic and metabolomic analyses will be done to identify the best candidates, and we will validate their use as predictors. Finally, we plan to integrate clinical data, ultrasound measurements and biological profiles into an algorithm to obtain a multidimensional biomarker to identify the individual risk for PTB.
- public health
- biotechnology & bioinformatics
- molecular biology
Data availability statement
All data relevant to the study are included in the article or uploaded as online supplemental information. The information will be accessible as each stage of the work is completed. Data will be available in the papers and in online supplemental data section.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
To the best of our knowledge, this is the first Colombian cohort designed to obtain differential demographic and biological characteristics from the early stages of pregnancy in women who deliver preterm.
The biomarkers identified will allow us to create algorithms to establish an individualised risk of preterm birth (PTB). The study includes stratified risk based on the antecedents of PTB.
We will have a significant repository of clinical data, biological samples and ultrasound images to extend the research.
The study includes biological markers evaluated only at one point in the first trimester, and we will not obtain trends over time.
Preterm birth (PTB) is a public health issue around the world. It has been estimated that at least 15 million babies are born preterm each year.1 Prematurity is the leading cause of mortality in neonates and in children under 5 years old. Approximately one million babies die each year due to PTB complications.2 Hyaline membrane disease, necrotizing enterocolitis and intracerebral hemorrhage are the most significant consequences of prematurity, and these conditions are the basis of increased mortality and short-term disabilities.3 According to the Global Burden of Disease report, PTB accounts for 3.1% of the years of healthy life lost due to disability.4 The frequency of PTB varies worldwide, with rates ranging from 9% to 12%.1 The global report on PTB published by the WHO shows a rise in frequency worldwide. In Colombia, the rate of prematurity is 14%, higher than the world’s average.5 According to the Colombian National Institute of Health, in 2019, 45.2% of perinatal and late neonatal deaths were caused by prematurity, asphyxia and related causes.6
Some strategies have contributed to the reduction of the morbidity related to prematurity. These include the use of uterine inhibitors, steroid-induced lung development, neuroprotection by magnesium sulfate, technological and care improvements in paediatric intensive care units and ‘kangaroo mother’ programmes.7 However, none of these measures, alone or combined, has reduced the incidence of preterm delivery and, therefore, of prematurity.8–10
One of the objectives of the WHO is the reduction of perinatal mortality and morbidity of children under 5 years old; this ambitious objective involves diligent work in reducing PTB, which requires basic sciences research to predict and prevent this condition, as well as clinical research in a translational programme to help translate all biological findings into a valuable clinical scenario for the patients.7
The most reliable risk factor for PTB prediction is a history of previous PTB.11 In a recent meta-analysis, the authors concluded that the risk of PTB before 37 weeks in a subsequent pregnancy in a woman with a medical history of PTB is 30%.12 The benefit of progesterone therapy was demonstrated for the first time in pregnant women with a history of PTB, reducing PTB risk in the subsequent pregnancy by 34%.13 This result was confirmed later with vaginal progesterone, which reduced the frequency of PTB before 34 or 37 weeks compared with the placebo group.14 However, only 15% of pregnant women have a history of PTB, so this risk factor does not allow us to classify correctly the risk of PTB in the other 85%.
Regarding the biomarkers for PTB prediction, cervical changes throughout gestation have been studied, and cervical shortening is the most evaluated characteristic. These changes can be identified through the measurement of CL by transvaginal ultrasound and, when they occur at an early gestational age, can predict the occurrence of PTB.15 A short cervix, with a length less than 25 mm (10th percentile), was identified in 18% of patients who had PTB before 37 weeks.16 When combined with a history of PTB, a CL less than 25 mm between 20 and 24 weeks may identify up to 28% of patients who will deliver between 34 and 37 weeks.17 A meta-analysis of individual patient data showed that the vaginal use of progesterone in patients with a short cervix could reduce the risk of PTB by 38%.18 More recently, two proposals to increase the sensitivity (Se) of CL have been published: one changes the CL cut-off in nulliparous women, and the other uses customized charts based on maternal characteristics.19 20
Other cervical measurements have been proposed.21 Parra-Saavedra et al postulated the Consistency Cervical Index (CCI) as a predictor of spontaneous PTB. The authors demonstrated greater Se and positive predictive value (PPV) than CL measurement.21 However, more evidence is needed to validate the reproducibility of this measure as an effective tool to predict PTB.
New sequencing techniques have allowed us to better understand the bacterial communities present in the vagina and their characterization throughout gestation to predict PTB.22 23 These studies show differences between women who delivered preterm and those who delivered at term, as well as racial differences that reflect the variability of these bacterial communities among populations and their potential role as a predictor of PTB.24
The proteome characterization in the serum of pregnant women has improved the understanding of the pathways involved in PTB.25 Inflammation has been recognized as one of the critical processes associated with the changes in maternal and fetal tissues during delivery. Inflammatory mediators released from senescent fetal tissues during pregnancy spread to maternal gestational tissues to signal preparation for delivery. The increased load of immune cells and inflammatory proteins is reflected in the tissues with accumulation in the cervical stroma favouring rapid remodelling and releasing proteolytic enzymes that digest the proteins of the extracellular matrix to culminate in the process of cervical dilation, which is a step before expulsion.26
The search for biomarkers using proteomics has been carried out on different types of samples, in which some differences have been found between the groups of patients with PTB and full-term delivery. From cervical samples, it has been determined that the proteins expressed at the highest levels in women with PTB are fibronectin, extracellular matrix protein 1, laminin and calsyntenin.27 Other studies found proteins such as desmoplakin 1, which participates in intercellular junctions and cell–cell communication, localized in human fetal membranes.28
From the beginning of the pregnancy, many physiological changes and metabolic adaptations in the mother are present, and they are necessary for gestational development and maintaining maternal–fetal well-being.29 In recent years, interest has increased regarding understanding how these processes occur in the maternal system to identify possible candidate biomarkers for the early detection of pathologies in pregnancy, such as pre-eclampsia and PTB, through metabolomic analysis. This branch of science is responsible for studying the molecules present in cells, tissues and body fluids and how these are reflected in the state of health and well-being. The metabolic profile based on gene expression analysis provides a quantifiable reading of the biochemical state from normal physiology to various pathologies. This information allows a realistic approach to the phenotype of the disease and the possible identification of a set of predictive markers with higher reliability.30
Advances in biomarker detection technology related to metabolomics and proteomics have resulted in the generation of a large amount of multidimensional data, which requires the development of more efficient analytical techniques. Machine learning methods represent the ability of algorithms to learn from a set of data and apply this knowledge to a new set of data to make accurate classifications and predictions. This approach allows better identification of the essential characteristics present in the data and, therefore, a better understanding of the factors relevant to the problem studied than traditional approaches. The use of machine learning methods in omics data analysis has recently increased, with strategies developed for each specific study.31 32
Because several routes contribute to the onset of labour,33 it is unlikely that a single marker will identify pregnant women who will deliver before term, which explains the limited results of current interventions for the containment of PTB. In this regard, we believe that it is necessary to integrate the clinical, epidemiological and environmental aspects, as well as the characteristics of the cervix and biological markers of the different pathways involved in the onset of labour. This task will generate a large volume of data, which requires not only clinical expertise but also the integration of all the information in artificial intelligence algorithms. This will allow us to establish an individual risk, derived from biological and environmental determinants, on the way to personalised medicine.34
From January 2020, all consecutive pregnant women who attended their first-trimester screening ultrasound at the maternal–fetal medicine units of two centres, Hospital Universitario de Santander and Centro de Atencion Materno Fetal INUTERO in Bucaramanga, Colombia, have been invited to participate in the study.
Inclusion criteria were a singleton pregnancy between 11+0 and 13+6 weeks’ gestation and a plan to stay in the nearby area during gestation and delivery. Exclusion criteria were a medical history of cancer, HIV infection or cervical surgery. Pregnant women whose gestation ended before 37 weeks on medical indication for fetal or maternal causes were also excluded. Patients were invited consecutively by a research team member at the recruitment centres. After the patient accepted and signed the consent form, the pre-established data collection form was completed for the following categories: age, parity, gestational age, marital status, employment status, educational level, social security type, PTB history, associated medical conditions (arterial hypertension, diabetes), smoking status, health insurance, and height and weight to calculate the body mass index.
The main outcome is spontaneous PTB, defined as delivery before 37 weeks of gestation, excluding deliveries indicated for maternal or fetal causes.
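As an illustration only, the gestational age window and the outcome definition above can be written as simple rules. The following Python sketch is not part of the study software; the function names and the encoding of gestational age as weeks plus days are assumptions made for the example, and it covers only the criteria that reduce to simple checks.

```python
def ga_in_days(weeks: int, days: int = 0) -> int:
    """Convert a gestational age written as weeks+days into days."""
    return weeks * 7 + days


def is_eligible(ga_weeks, ga_days, singleton, stays_in_area,
                cancer_history=False, hiv=False, cervical_surgery=False):
    """Screening-visit check: 11+0 to 13+6 weeks, singleton, no exclusion criterion."""
    in_window = ga_in_days(11, 0) <= ga_in_days(ga_weeks, ga_days) <= ga_in_days(13, 6)
    excluded = cancer_history or hiv or cervical_surgery
    return in_window and singleton and stays_in_area and not excluded


def classify_delivery(ga_weeks, ga_days, spontaneous):
    """Label a delivery according to the main outcome definition."""
    if ga_in_days(ga_weeks, ga_days) >= ga_in_days(37, 0):
        return "term"
    if spontaneous:
        return "spontaneous PTB"
    return "excluded: medically indicated preterm delivery"


# Hypothetical participants, for illustration only.
print(is_eligible(12, 3, singleton=True, stays_in_area=True))  # True
print(classify_delivery(36, 6, spontaneous=True))              # spontaneous PTB
```

Working in days avoids ambiguity at the boundaries: 13+6 is inside the window, and 37+0 is already a term delivery.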
All women selected to participate in the study underwent the following procedures:
A medical team of specialists in maternal–fetal medicine, who carried out the screening studies (11+0–13+6 weeks) at these institutions, received training from Dr Miguel Parra-Saavedra in the measurement technique of the CCI. Subsequently, the physicians acquired images with the requested characteristics, which were assessed against a predesigned format; the technique was qualified as appropriate after achieving compliance greater than 90%.
Proposed protocol for the first trimester cervical consistency index measurement
1. The maternal bladder must be empty.
2. With the woman in the lithotomy position, the vaginal transducer is directed toward the anterior vaginal fornix.
3. To obtain an adequate image (figure 1):
   a. Use a vision angle of 180°.
   b. Obtain a sagittal view of the cervix, visualising the endocervical canal.
   c. Identify the internal and the external os.
   d. Magnify the image until the cervix occupies 75% of the screen.
4. To obtain cervical measurements (figure 2):
   a. Measure the CL without including the isthmus: place one calliper at the external os and the second calliper where the cervical glands end, at the level of the vesicovaginal fold (this line will serve as a reference).
   b. Freeze the image and press the double image screen button; make sure that the anterior lip of the cervix is equal to the posterior lip (without over-pressing the cervix with the vaginal probe).
   c. Press the transducer toward the posterior vaginal wall until the cervix starts to displace; use the cineloop to centre the image with characteristics similar to the left image and freeze.
   d. Measure the anterior–posterior diameter of the cervix in each image, tracing a line perpendicular to the cervical canal.
   e. The callipers must be placed on the bright line at the anterior and posterior edges of the cervix.
   f. Obtain the cervical consistency index by dividing the anterior–posterior diameter under pressure (AP2) by the basal anterior–posterior diameter without pressure (AP1). The result is a number between 0 and 1.
5. Repeat step 4 to obtain a second measurement and use the smaller value obtained (the smaller value represents the maximum compressibility of the cervix).
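The arithmetic in the last two steps can be sketched in a few lines of Python. This is an illustrative example, not the software used in the study; the function names and the diameters are hypothetical.

```python
def cervical_consistency_index(ap1_mm: float, ap2_mm: float) -> float:
    """CCI = AP2 (diameter under pressure) / AP1 (basal diameter without pressure)."""
    if ap1_mm <= 0 or ap2_mm <= 0:
        raise ValueError("diameters must be positive")
    if ap2_mm > ap1_mm:
        raise ValueError("the compressed diameter should not exceed the basal diameter")
    return ap2_mm / ap1_mm


def reported_cci(measurements):
    """Return the smaller CCI of the repeated measurements (maximum compressibility)."""
    return min(cervical_consistency_index(ap1, ap2) for ap1, ap2 in measurements)


# Hypothetical diameters in millimetres: (AP1 basal, AP2 under pressure).
pairs = [(30.0, 25.5), (30.0, 24.0)]
print(round(reported_cci(pairs), 2))  # 0.8
```

A softer cervix compresses more, so AP2 is smaller relative to AP1 and the index is lower.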
According to the national recommendations, all patients underwent a screening study that includes a crown-rump length (CRL) measurement to establish gestational age.35 36 In addition, a CL measurement was performed using the described technique,37 and two CCI measurements were performed.21 The images of these measurements are saved in the ultrasound equipment (General Electric Voluson E8, General Electric Voluson E6, General Electric Voluson S8). Additionally, 50 patients were evaluated by two operators. The images and the study information are exported to an external hard disc to protect the data.
Vaginal sample for the microbiome
At the time of screening, a sample from the middle third of the vagina is taken with a device designed for this purpose (Patent Ref. No: NC2016/0002338. Res. 1214 of the Colombian Industry and Commerce Superintendency). The acceptability of the sampling to women was demonstrated previously.38 The single-use device has a brush to collect the vaginal sample; the brush is subsequently separated from the device and placed in a sterile container, and 2 mL of cell-preserving liquid is added to maintain the sample’s integrity. The vials with the brushes inside are stored at 4°C until DNA extraction.
DNA from vaginal samples is extracted with the salting-out technique. The DNA is resuspended in 50 μL of Tris-EDTA buffer for its preservation. DNA quantification and purity assessment are carried out in a NanoDrop One spectrophotometer (Thermo Scientific, USA), and the DNA sample is stored at 20°C until sequencing.
Blood sample for omics analysis
Two blood samples are collected from each patient in tubes with EDTA anticoagulant. The samples are drawn with the vacuum system technique from the veins located in the antecubital area of the forearm. In the laboratory, the samples are centrifuged at 3000 rpm for 10 min in a Sorvall ST 16 Centrifuge (Thermo Scientific, USA) to separate the components and obtain blood plasma. From each patient, four aliquots of 800 μL are collected in Eppendorf tubes and stored in an Ultra-Low Temperature Freezer (Thermo Scientific, USA) at –70°C.
After the first visit that includes the registration of the initial information, the ultrasound study and the collection of the vaginal and blood samples, a research team member makes a monthly telephone call to monitor the evolution of the pregnancy until delivery. A record is kept with the date of delivery, the delivery route, the delivery characteristics, and if it was spontaneous or occurred on medical indication. In the case of medical indication, the reason for it is also recorded. These data are confirmed in the file of each patient.
Clinical data are stored in a password-protected, web-based electronic database, REDCap, with deidentification capability to protect patient information. After extraction from the ultrasound machine, the deidentified ultrasound images are stored in a repository and linked through the unique assigned code. The information will be analysed with the Stata statistical software.
Calculations are based on the main outcome, which is PTB before 37 weeks. Because this is a diagnostic test analysis, we calculated the number of patients needed according to the expected increase in Se at a fixed 15% false positive rate (FPR), using the current best predictive model for PTB as the baseline Se. According to the Fetal Medicine Foundation model of Celik et al,39 the Se at a 15% FPR of a comprehensive model using CL, obstetric history and maternal characteristics for the prediction of PTB before 37 weeks is 34.7%. We expect to increase the Se to 50% at the same 15% FPR. For an expected increase (delta) of 15.3 percentage points, with 80% power to detect a statistically significant difference at a significance threshold of 0.05 and an expected 15% attrition, we would need a total of 384 patients in the cohort. In addition, we plan to increase the cohort as required by the results of the exploratory omics studies.
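The quantity on which this calculation rests, Se at a fixed FPR, can be obtained from any set of predicted risks. The Python sketch below is an illustration under stated assumptions, not the study code: the function name, the risk scores and the outcome labels are hypothetical, and it does not attempt to reproduce the sample size of 384.

```python
def sensitivity_at_fpr(scores, labels, max_fpr=0.15):
    """Highest sensitivity reachable while keeping the false positive rate <= max_fpr.

    scores: predicted risks (higher means higher risk); labels: 1 = PTB, 0 = term.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both outcome classes are required")
    best = 0.0
    for threshold in sorted(set(scores)):
        # Screen-positive means a score at or above the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        if fp / negatives <= max_fpr:
            best = max(best, tp / positives)
    return best


# Hypothetical predicted risks and outcomes for ten women.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 1, 0, 0, 0, 0, 0]
print(round(sensitivity_at_fpr(scores, labels), 3))  # 0.667
```

Fixing the FPR makes two models comparable at the same screen-positive burden, which is why the baseline Se of 34.7% and the target of 50% are both quoted at a 15% FPR.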
Sociodemographic data will be summarised as proportions or measures of central tendency: mean and SD for normally distributed data, and median and IQR for non-normally distributed data. The population included for analysis will be divided into two groups: spontaneous premature birth (<37 weeks) and term birth (≥37 weeks). The differences between groups will be evaluated via the χ2, χ2 for linear trend (χ2LT) or Wilcoxon tests. Depending on the situation, differences between dichotomous categories will be evaluated with χ2 or Fisher’s exact tests, with results considered statistically significant at α=0.05. For continuous variables, ROC analysis will be performed to find the best cut-off points to predict premature labour, and those distributions will be grouped into two to four categories according to the association relationships.
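The text does not state which criterion will define the best cut-off point on the ROC curve. One common choice, assumed here purely for illustration, is Youden's J statistic (Se + Sp − 1). The Python sketch below applies it to hypothetical CL values, for which a shorter cervix indicates higher risk; the function name and the data are not from the study.

```python
def youden_cutoff(values, labels, higher_is_risk=True):
    """Return (cut-off, J) maximising Youden's J = sensitivity + specificity - 1."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both outcome classes are required")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(values)):
        if higher_is_risk:
            predicted = [v >= cut for v in values]
        else:
            predicted = [v <= cut for v in values]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        tn = sum(1 for p, y in zip(predicted, labels) if not p and y == 0)
        j = tp / positives + tn / negatives - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical cervical lengths (mm) and outcomes (1 = PTB, 0 = term).
cl_mm = [22, 25, 28, 30, 33, 35, 38, 40, 42, 45]
ptb = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
cut, j = youden_cutoff(cl_mm, ptb, higher_is_risk=False)
print(cut, round(j, 3))  # 30 0.857
```

Other criteria, such as the point closest to the top-left corner of the ROC curve or a fixed FPR, would give different cut-offs on the same data.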
Regarding the predictive model, we decided to combine a classical multivariate logistic model with a machine learning approach that uses automatic variable selection and penalisation of weakly predictive variables, to avoid overfitting and a lack of parsimony and robustness in the model. First, training and testing datasets will be constructed in a 50%/50% ratio. The training dataset will be used for the construction of the models and the testing dataset for testing and calibration of the models. In the training dataset, an elastic net model will be used as the primary approach for model creation. The elastic net combines two penalties. The ridge (L2) penalty shrinks the coefficients of weakly predictive variables towards zero, which helps to avoid a non-robust model. The lasso (L1) penalty can set coefficients exactly to zero and thereby performs an automatic selection of the variables that best predict the outcome (PTB). The final model will be selected at the lambda value identified by 10-fold cross-validation, together with the lowest Akaike information criterion and Bayesian information criterion. Because the L1 penalty selects predictors automatically, no prior subset selection, which has typically been used in previous methods, needs to be performed, thereby reducing the variance and instability of the prediction model. The automatic selection of predictors in the elastic net results in a simpler, sparse model that includes only a subset of variables, thereby allowing for better interpretation of the model.
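To make the roles of the two penalties concrete, the sketch below shows the elastic net penalty term and the standard coordinate-wise update for a single standardised predictor under squared-error loss. It is a simplified illustration, not the study's Stata or R code; the logistic model planned here uses the same penalty with a different loss, and the function names and numbers are hypothetical.

```python
def elastic_net_penalty(coefs, lam, alpha):
    """lam * (alpha * L1 norm + (1 - alpha) / 2 * squared L2 norm)."""
    l1 = sum(abs(b) for b in coefs)
    l2 = sum(b * b for b in coefs)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)


def soft_threshold(z, gamma):
    """Move z towards zero by gamma and return exactly 0 when |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_update(z, lam, alpha):
    """Penalised coefficient for one standardised predictor.

    z is the unpenalised coefficient; alpha = 1 is the lasso, alpha = 0 is ridge.
    """
    return soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))


weak_effect = 0.3  # hypothetical unpenalised coefficient
print(elastic_net_update(weak_effect, lam=0.5, alpha=0.0))  # ridge: shrunk, not zero
print(elastic_net_update(weak_effect, lam=0.5, alpha=1.0))  # lasso: exactly 0.0
print(elastic_net_update(weak_effect, lam=0.5, alpha=0.5))  # elastic net: in between
```

The L1 part is what removes variables from the model, while the L2 part only shrinks them, so the mixing parameter alpha controls how sparse the final model is.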
For the classic approach, we will use the training dataset to create a multivariate nested model using forward and backward stepwise regression to assess the association between several predictors and the main outcome. A variance inflation factor (VIF) analysis will be performed to establish possible multicollinearity among the variables in the model: variables with a VIF greater than 10 will be considered highly multicollinear and will be excluded from the model, and those with a VIF of 4 or more will be explored. The multivariable model will be built using a nested logistic approach. The base model will comprise all maternal characteristics found to be statistically significant in the univariate analysis. A second model will add biomarkers such as CL and CCI to the first model to evaluate the added value of these markers for the prediction of PTB. A Nagelkerke R2 with χ2 analysis will be used to determine whether there is a statistically significant difference between models. As a third step, a third model will be created by adding new markers for PTB, such as the microbiome, to the first two models.
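The VIF of a predictor is 1/(1 − R2), where R2 comes from regressing that predictor on all the other predictors. The Python sketch below illustrates the thresholds described above for the simplest case of two predictors, in which R2 equals the squared Pearson correlation. The function names and the CL and CCI values are hypothetical and are not study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_from_r_squared(r_squared):
    """VIF = 1 / (1 - R^2); infinite when a predictor is fully explained by the others."""
    if r_squared >= 1.0:
        return math.inf
    return 1.0 / (1.0 - r_squared)


def vif_action(vif):
    """Apply the thresholds in the text: > 10 exclude, >= 4 explore, otherwise keep."""
    if vif > 10:
        return "exclude"
    if vif >= 4:
        return "explore"
    return "keep"


# Hypothetical measurements for five women.
cl_mm = [38.0, 35.0, 41.0, 30.0, 36.0]
cci = [0.82, 0.80, 0.86, 0.70, 0.79]
r = pearson_r(cl_mm, cci)
vif = vif_from_r_squared(r * r)
print(round(vif, 1), vif_action(vif))
```

With more than two predictors, each R2 would come from a multiple regression, which statistical packages such as Stata and R compute directly.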
The testing dataset will be used for validation of both models using receiver operating characteristic (ROC) curve analysis (compared by the DeLong method), Se, specificity (Sp), PPV, negative predictive value and accuracy as discrimination measures. Calibration of the models will be performed by plotting the observed versus predicted values in the testing dataset, and the models will be compared using the Hosmer-Lemeshow test for goodness-of-fit. Finally, a Kaplan-Meier survival analysis will be used to establish the difference in time to delivery between women classified as high and low risk using the previous model approach. Data will be analysed using STATA V.17 for Mac and R statistics.
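The discrimination measures listed above all derive from the same 2×2 table of predicted versus observed outcomes. The Python sketch below shows the definitions; the function name and the counts are hypothetical and do not represent study results.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Discrimination measures from a 2x2 table (test positive/negative by PTB/term)."""
    return {
        "sensitivity": tp / (tp + fn),   # PTB cases correctly flagged
        "specificity": tn / (tn + fp),   # term deliveries correctly cleared
        "ppv": tp / (tp + fp),           # flagged women who deliver preterm
        "npv": tn / (tn + fn),           # cleared women who deliver at term
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical testing dataset of 100 women.
metrics = diagnostic_metrics(tp=8, fp=12, fn=2, tn=78)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Se and Sp are properties of the test, whereas PPV and negative predictive value also depend on how common PTB is in the population tested.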
16S/18S NGS sequencing
Vaginal samples are processed using the Illumina MiSeq Sequencing System, with kits for index addition (Nextera XT index) and a flow cell (500-Cycle MiSeq Nano) to obtain a 2×250 paired-end configuration on the Illumina MiSeq V2 platform. Primers are directed to the V3 and V4 regions of the 16S rRNA gene for the identification of bacteria and archaea, as suggested by Illumina (F-16S 5’-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG and R-16S 3'-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC). Likewise, primers targeting the 18S SSU rRNA gene of eukaryotes are used, based on Amaral-Zettler et al 40 and Stoeck et al 41 (F-1391 5’- GTACACACCGCCCGTC and R-EukBr 3’- TGATCCTTCTGCAGGTTCACCTAC). The protocol Part No. 15044223 Rev. B is used for amplicon generation and index addition, as the company suggests.
The initial analysis of the sequences is carried out using the MG-RAST server, which detects rRNA through an initial BLAST search against a reduced SILVA database data set. Subsequently, the rRNA sequences are clustered at 97% identity and finally identified against the SILVA SSU database.42 The processing options to be used include removing artificial replicate reads, using Bowtie to eliminate host contamination with A. thaliana, TAIR, TAIR9 as a reference, and filtering low-quality sequences using 15 as the lowest Phred score.42 The MEGAN software will be used for the taxonomic and phylogenetic analysis of the data.43 For statistical hypothesis testing, taxonomic profile analysis and calculation of effect sizes and CIs, the STAMP software (statistical analysis of taxonomic and functional profiles) V.2.1.3 will be used; the software allows comparisons between pairs of samples or between groups of samples arranged in two or more treatment groups.44
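As a rough illustration of what a Phred threshold of 15 means, the Python sketch below decodes a FASTQ quality string and applies a simplified read-level filter. It assumes the Phred+33 encoding used by current Illumina instruments and is not the MG-RAST implementation, whose trimming differs in detail; the function names and the quality strings are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(phred):
    """Probability that a base call is wrong: 10 ** (-Q / 10)."""
    return 10 ** (-phred / 10)


def passes_quality(quality_string, min_phred=15, offset=33):
    """Simplified filter: keep a read only if every base has Q >= min_phred."""
    return all(q >= min_phred for q in phred_scores(quality_string, offset))


# Hypothetical quality strings: 'I' encodes Q40, '0' encodes Q15, '+' encodes Q10.
print(passes_quality("IIII0"))  # True
print(passes_quality("III+"))   # False
print(round(error_probability(15), 4))  # 0.0316
```

A Phred score of 15 corresponds to roughly a 3% chance that the base call is wrong, so the threshold removes bases that are more error-prone than that.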
Findings to date
From January 2020 to August 2021, 261 pregnant women were invited to participate in the cohort; 251 (96.2%) agreed to participate and signed the informed consent, and minors also signed the assent. Seven patients who reported being under antibiotic treatment were excluded (figure 3): three were under systemic treatment for urinary tract infection, and the rest were under topical treatment for genital infections. Four patients who did not answer follow-up calls during pregnancy and up to 30 days after the probable delivery date were also excluded.
From the initial phase of the study, 244 patients are included in this analysis. The main results are as follows. The mean age of the patients is 28 years (min 16 and max 45 years). The marital status of the patients is distributed as follows: 73 (28.6%) were married, 35 (13.7%) were single and 147 (57.6%) were in a civil union. The occupations reported by the patients are 110 (43.3%) housewives, 86 (33.8%) employees, 44 (17.3%) freelancers and 14 (5.5%) students (for a total of 254). Concerning the social security plan, 141 (55.3%) are in the contributive system, 90 (35.3%) in the subsidised, 15 (5.9%) in the special regimen and 9 (3.5%) are not in any social security plan (for a total of 255). The median body mass index is 25.3 kg/m2 (min 16.5 and max 43.3). The mean gestational age at the time of the study is 13.2 weeks. The median CL is 38 mm (min 22 mm and max 56 mm). The median CCI is 0.82 (min 0.64 and max 0.95).
In all cases, the amount of DNA extracted from vaginal fluid samples has been considered sufficient, with a mean DNA quantification of 275 ng/μL.
At the time of writing, 100 (39%) patients had delivered, of whom 11 (11%) delivered before 37 weeks.
Patient consent for publication
This study involves human participants and was approved by the Research Ethics Committee (CEINCI) of the Universidad Industrial de Santander on 17 October 2019 (Acta No 17). Participants gave informed consent to participate in the study before taking part.
The authors would like to acknowledge the contributions of Monica Beltran and Carolina Parra. The authors also thank the health institutions that have facilitated the recruitment and the mothers who participate in the cohort study.
Contributors CHB-M, BRO, MAP-S, LAD-M and RJM-P conceptualised the protocol and designed the data collection tool; all the authors were involved in writing the final draft of the manuscript. LAD-M and RJM-P supported the methodology. MAP-S was in charge of teaching the CCI technique and evaluating the competence. BRO supported the laboratory technique aspects. CHB-M led the team that recruits patients and performs ultrasound measurements. All authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work. CHB-M is the guarantor and accepts full responsibility for the conduct of this study.
Funding This project is funded by 'Convocatoria Santander Cientifico' (project code 2542), Universidad Industrial de Santander.
Competing interests None declared.
Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Provenance and peer review Not commissioned; externally peer reviewed.