Objective This study developed and internally validated a predictive model for preterm birth (PTB) to examine the ability of neighbourhood socioeconomic status (SES) to predict PTB.
Design Cohort study using individual-level data from two community-based prospective pregnancy cohort studies (All Our Families (AOF) and Alberta Pregnancy Outcomes and Nutrition (APrON)) and neighbourhood SES data from the 2011 Canadian census.
Setting Calgary, Alberta, Canada.
Participants Pregnant women who were <24 weeks of gestation and >15 years old were enrolled in the cohort studies between 2008 and 2012. Overall, 5297 women participated in at least one of these cohorts: 3341 women participated in the AOF study, 2187 women participated in the APrON study and 231 women participated in both studies. Women who participated in both studies were only counted once.
Primary and secondary outcome measures PTB (delivery prior to 37 weeks of gestation).
Results The rates of PTB in the least and most deprived neighbourhoods were 7.54% and 10.64%, respectively. The neighbourhood-level variance in PTB was 0.20, with an intra-class correlation of 5.72%. Neighbourhood SES, combined with individual-level predictors, predicted PTB with an area under the receiver-operating characteristic curve (AUC) of 0.75. At the lowest risk threshold, sensitivity was 91.80% with a high false-positive rate (71.50%); at the highest risk threshold, sensitivity was 5.70% with a low false-positive rate (0.90%). Agreement between predicted and observed PTB indicated modest model calibration. Individual-level predictors alone predicted PTB with an AUC of 0.60.
Conclusion Although neighbourhood SES combined with individual-level predictors improved the overall prediction of PTB compared with individual-level predictors alone, the detection rate was insufficient for application in clinical or public health practice. A prediction model with better predictive ability is required to effectively find women at high risk of preterm delivery.
- preterm birth
- neighbourhood socioeconomic status
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
Use of a multilevel model with a random intercept at the neighbourhood level allowed us to examine the ability of neighbourhood socioeconomic status (SES) to predict preterm birth (PTB) while taking into account the neighbourhood-level variation and intra-class correlation in PTB (ie, the relevance of neighbourhood).
The prediction model used the simplest multilevel structure with individual-level and neighbourhood-level predictors of PTB, data which can be easily collected in both community and clinical settings.
Internal validation of the prediction model using the bootstrap method provided confidence in the reproducibility of the model, although external validation is required to fully understand its performance.
Relevant individual-level and neighbourhood-level predictors such as previous PTB and neighbourhood access to healthcare, which may help to optimise the prediction, are not included in the prediction model.
Our sample over-represents women with high SES from urban areas of Alberta, so the findings may generalise only to urban settings.
Globally, 11.1% of births are preterm.1 Preterm birth (PTB), that is, delivery prior to 37 weeks of gestation, is a major contributing factor to neonatal deaths,2 3 and among the survivors, PTB is also a significant risk factor for short-term and long-term morbidity.3–5 The incidence of PTB and its associated mortality and morbidity could potentially be reduced if women at risk of delivering preterm were identified early in gestation and appropriately managed.6 7 The aetiology of PTB is multi-factorial,8–10 and one risk factor for PTB may be neighbourhood socioeconomic status (SES)10–12: the rate of PTB in low SES neighbourhoods is higher than the rate in high SES neighbourhoods.13–15 Neighbourhood SES is an area-level measure of SES, which aggregates individual SES (such as income, education and employment status) at a certain geographical level.11 The high rate of PTB in low-SES neighbourhoods is not only related to the fact that women living in these neighbourhoods have higher individual-level risk factors for PTB. Neighbourhoods themselves can also increase the risk of PTB by exposing individuals to an elevated risk.11 12 16 Low SES neighbourhoods influence an individual’s ability to fulfil daily needs, access resources, make lifestyle choices and cope with different situations.11 12 16 Accordingly, women living in low SES neighbourhoods have less access to healthy foods, quality health services, opportunities for leisure activity and social support, and have more exposure to societal stressors, crimes and poor air and water quality. All of these neighbourhood-level factors can increase the risk of PTB among women living in these neighbourhoods through material, psycho-social, behavioural, and biological mechanisms.11 12 16 17
While many studies have examined the association between neighbourhood SES and PTB,13–15 our understanding about the ability of neighbourhood SES to predict the risk of PTB is limited. It is possible that even strongly associated risk factors can have a low capacity to discriminate PTB in the population.18–20 Similarly, a statistically significant association between neighbourhood SES and PTB may exist, with small/no variation of PTB at the neighbourhood level.21–23 Thus, the association may provide unreliable information about the likelihood of delivering preterm infants among women living in certain neighbourhoods and may mislead decision-makers in implementing public health interventions targeted at specific areas.21 22 As previous studies have not developed and validated a prediction model for PTB to evaluate the predictive ability of neighbourhood SES, information about the ability of neighbourhood SES to predict PTB is lacking.
A better understanding of the ability of neighbourhood SES to predict PTB has its own importance as it may improve our capacity to accurately discriminate between women at high and low risk of delivering preterm infants.19 24 The accurate discrimination capacity may offer a more valid prediction about the future probability of delivering a preterm infant in an individual woman coming from certain neighbourhoods.19 24 The use of valid prediction models may help us effectively identify women at high risk of delivering preterm infants, and in planning suitable public health interventions targeting women from low SES neighbourhoods, such as appropriate triage of women into low and high risk prenatal care. This is timely and relevant given that individual-level risk factors (including biomarkers) have shown a low discriminatory accuracy in predicting PTB,18 20 resulting in ineffective early identification of women at risk for delivering preterm infants. Therefore, this study developed and internally validated a predictive model to examine the ability of neighbourhood SES to predict PTB.
This study combined existing data sets from two community-based prospective pregnancy cohort studies in Alberta, Canada: All Our Families (AOF; n=3341) and Alberta Pregnancy Outcomes and Nutrition (APrON; n=2187) (figure 1). The description and comparability of these two cohort studies are available elsewhere25 26 and justify combining these data sources.27 Briefly, the two cohort studies had similar recruitment periods (2008–2012), inclusion criteria, sampling designs and data-collection methods.25 26 Both studies collected data on socio-demographics, lifestyle, social support, depression and PTB,25 the core individual-level variables necessary for this research.
We obtained two de-identified cohort datasets linked with neighbourhood SES data from Secondary Analysis to Generate Evidence, the secure data repository developed by PolicyWise for Children & Families, which houses these data sets. Neighbourhood SES data were measured by the median personal income and the Pampalon material deprivation index (both measures were derived from 2011 Statistics Canada census),28 29 which were both aggregated at the dissemination area (DA) level. DA is the smallest geographic unit available in the Canadian census, consisting of 400–700 persons.30 The Pampalon material deprivation index is a composite measure of neighbourhood SES that combines the proportion of persons without high-school diplomas (education), the average personal income (income) and the rate of unemployment (employment) within the DA.28
Patient and public involvement
This study used de-identified secondary data. Patients and public were not involved in this study.
Data harmonisation and combination
Individual-level variables in the two studies were harmonised in each data set considering multiple factors: whether the variables were completely or partially identical in the question asked and the responses offered, the response coding (value level, value definition, data type), the frequency of measurement, the pregnancy time-point of measurement and missing values. If the variables matched exactly on each of these factors, they were pooled as is. If the variables matched partially, data harmonisation was performed considering these factors. Variables that were deemed completely unmatched were not combined and were therefore not included in this study; however, no important variables had to be excluded for this reason. Once the selected variables were harmonised in each data set, the two data sets were appended into a single new data set. Women who participated in both studies (n=231) were counted only once.
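The append-and-count-once step can be illustrated with a minimal sketch. The source does not say how women enrolled in both cohorts were identified or which record was retained, so the `woman_id` linkage key, the keep-first rule and the synthetic records below are all assumptions made for illustration; only the cohort sizes (3341, 2187, 231 overlapping) come from the text.

```python
def append_cohorts(first_cohort, second_cohort, key="woman_id"):
    """Append two harmonised cohorts, keeping each woman once.

    `key` is a hypothetical linkage identifier; when a woman appears in
    both cohorts, the record from `first_cohort` is kept (an assumption).
    """
    combined = {}
    for record in list(first_cohort) + list(second_cohort):
        combined.setdefault(record[key], record)
    return list(combined.values())


# Synthetic records sized like the two cohorts, with 231 shared identifiers.
aof = [{"woman_id": i, "cohort": "AOF"} for i in range(3341)]
apron_start = 3341 - 231
apron = [{"woman_id": i, "cohort": "APrON"}
         for i in range(apron_start, apron_start + 2187)]

combined = append_cohorts(aof, apron)
print(len(combined))  # 3341 + 2187 - 231 = 5297
```

The resulting count matches the combined sample size reported in the text.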
The harmonised variables included maternal age, marital status, ethnicity, duration of stay in Canada, body mass index, parity, education, household income, depression during pregnancy, smoking/alcohol consumption and drug abuse before the pregnancy. Deliveries that occurred before the completion of 37 weeks of gestation were considered as PTB.
Univariate analysis was performed to observe the distribution of each variable. Bivariate analysis using χ2 tests was performed to identify individual-level variables associated with PTB (p<0.25). Multivariable conventional logistic regression models, followed by multilevel logistic regression models, as outlined by Merlo et al,23 were developed using bootstrapped samples with 1000 replications (training data set) (online supplementary appendix 1). Missing data were handled by variable-wise or pairwise deletion for the bivariate analyses and by listwise deletion for the regression models. All analyses were performed using STATA/IC software V.14.1.
Model validation and model performance assessment
The bootstrap procedure was employed for internal validation of the model.19 31 Model performance was evaluated in the original sample (validation data set) using measures of model calibration (the correspondence between predicted and observed outcome rates), risk stratification capacity (the proportion of women categorised as low risk vs high risk, or the distribution of women across the predicted risk categories), and classification performance or discrimination accuracy (true-positive and false-positive rates, positive and negative predictive values, positive and negative likelihood ratios and area under the receiver-operating characteristic curve (AUC)). To obtain these measures, the predicted probability of PTB for each woman was estimated and categorised into four risk groups (<5%, ≥5% to <10%, ≥10% to <15% and ≥15%). The difference in AUC estimates between the bootstrapped sample and the original sample was assessed as optimism.19 31 Data on prenatal care and previous PTB were not available in the APrON cohort data set. A sensitivity analysis was therefore performed using only the AOF data set, whereby two variables (previous PTB and total number of prenatal care visits) were added to the final models (conventional logistic regression model and multilevel random effect model) to assess whether the addition of these variables improved model performance.
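The classification measures described above can be sketched in a few lines. This is an illustration only, not the authors' Stata code: the eight predicted probabilities and outcomes are made-up values, the function names are hypothetical, and the AUC is computed in its rank-based form (the probability that a randomly chosen woman with PTB has a higher predicted risk than a randomly chosen woman without PTB, with ties counted as one half).

```python
from bisect import bisect_right

RISK_CUTS = [0.05, 0.10, 0.15]
RISK_LABELS = ["<5%", ">=5% to <10%", ">=10% to <15%", ">=15%"]


def risk_category(probability):
    """Assign a predicted probability to one of the four risk groups."""
    return RISK_LABELS[bisect_right(RISK_CUTS, probability)]


def classification_measures(probs, outcomes, threshold):
    """Sensitivity, false-positive rate and PPV when 'high risk' means p >= threshold."""
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, outcomes) if p < threshold and y == 1)
    tn = sum(1 for p, y in zip(probs, outcomes) if p < threshold and y == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
    }


def auc(probs, outcomes):
    """Rank-based AUC over all (case, non-case) pairs; ties count as 0.5."""
    cases = [p for p, y in zip(probs, outcomes) if y == 1]
    non_cases = [p for p, y in zip(probs, outcomes) if y == 0]
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0
               for c in cases for n in non_cases)
    return wins / (len(cases) * len(non_cases))


# Made-up predicted probabilities and observed outcomes (1 = preterm birth).
probs = [0.03, 0.04, 0.06, 0.08, 0.09, 0.12, 0.16, 0.20]
outcomes = [0, 0, 0, 1, 0, 0, 1, 1]

print([risk_category(p) for p in probs])
print(classification_measures(probs, outcomes, threshold=0.05))
print(classification_measures(probs, outcomes, threshold=0.15))
print(round(auc(probs, outcomes), 3))
```

Even in this toy data the trade-off discussed in the study is visible: lowering the threshold raises sensitivity at the cost of more false positives, while raising it does the reverse.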
The total sample size from the combined cohort was 5297. The proportion of missing data ranged from 1.52% for depression to 7.51% for gestational age at delivery. The majority of women were younger than 35 years, were married or living with a common-law partner, were Caucasian and approximately half of the women were primiparous. Almost three-quarters of women had completed more than high-school education and had a household income ≥$70,000, while approximately one-quarter of women were living in the least deprived neighbourhood (table 1). Overall, 7.26% (95% CI: 6.57 to 8.07) of women delivered preterm infants, with 7.54% among women living in the least deprived neighbourhoods and 10.64% among women living in the most deprived neighbourhoods. Compared with women who delivered at term, a higher proportion of women who delivered preterm infants were primiparous, non-white, obese and were living in the most deprived neighbourhood (table 1).
As shown in table 2, a conventional logistic regression model that included individual-level predictors (parity, ethnicity, body mass index, smoking, depression and household income) showed an AUC of 0.60 (95% CI: 0.56, 0.63). The multi-level model that included individual-level predictors and a random effect at the neighbourhood level showed large variation in PTB at the neighbourhood level (neighbourhood variance: 0.20, intra-class correlation (ICC): 5.72%, median OR (MOR): 1.53), with an AUC of 0.75 (95% CI: 0.73, 0.78). After inclusion of neighbourhood SES (deprivation index) in the multi-level model, although the deprivation index was not significantly associated with PTB (OR: 1.19, 95% CI: 0.78, 1.79), neighbourhood variance decreased to 0.15, the ICC to 4.45% and the MOR to 1.46, with an AUC of 0.75 (95% CI: 0.73, 0.78). The MOR of 1.46 indicates that, in the median case, when two women with the same individual-level characteristics are picked at random from different neighbourhoods, the odds of PTB for the woman in the higher-risk neighbourhood are 1.46 times those for the woman in the lower-risk neighbourhood. Furthermore, the multi-level model that contained median personal income as a measure of neighbourhood SES showed similar variance to the model that contained the deprivation index.
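The relationship between the neighbourhood variance, the ICC and the MOR can be made concrete with a short sketch. It assumes the latent-variable formulation commonly used with multilevel logistic models, in which the individual-level variance is fixed at π²/3; the source does not state which ICC method was used, so this choice and the function names are assumptions. Because the inputs are the rounded variances from the text (0.20 and 0.15), the outputs agree with the reported ICC and MOR only approximately.

```python
import math

# Assumed latent-variable formulation: individual-level variance of a
# logistic model is pi^2 / 3 (about 3.29).
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def icc(neighbourhood_variance):
    """Share of total latent variance that lies between neighbourhoods."""
    return neighbourhood_variance / (
        neighbourhood_variance + LOGISTIC_RESIDUAL_VARIANCE
    )


def median_odds_ratio(neighbourhood_variance):
    """MOR = exp(sqrt(2 * variance) * z_0.75), where z_0.75 is about 0.6745."""
    return math.exp(math.sqrt(2 * neighbourhood_variance) * 0.6745)


def proportional_change_in_variance(variance_before, variance_after):
    """Fraction of neighbourhood variance explained by the added predictor."""
    return (variance_before - variance_after) / variance_before


for label, variance in [("individual-level predictors only", 0.20),
                        ("plus deprivation index", 0.15)]:
    print(f"{label}: ICC = {icc(variance):.2%}, "
          f"MOR = {median_odds_ratio(variance):.2f}")

print(f"variance explained by deprivation index: "
      f"{proportional_change_in_variance(0.20, 0.15):.0%}")
```

The drop in variance from 0.20 to 0.15 corresponds to about one quarter of the neighbourhood-level variance being explained by the deprivation index.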
Predicted probabilities of PTB in the multi-level model that contained individual-level predictors and the deprivation index ranged from 2.77% to 27.00%. Calibration of the model predicting PTB was adequate, as shown by the agreement between the model-predicted probability of PTB and the observed proportion of PTB, particularly for the low-risk categories. Specifically, the observed PTB rate within the predicted risk category of ≥5% to <10% was 7.30%, which falls within the range of that category; the same was true for the risk category of <5%. The risk-stratification capacity of the model was adequate: it assigned women to different risk categories for PTB, with almost 90% of women assigned to the low-risk categories (table 3).
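The calibration check described here, comparing the observed PTB rate with the predicted risk within each category, can be sketched as follows. The eight predicted probabilities and outcomes are made-up values and the function name is hypothetical; only the category cut-points (5%, 10%, 15%) come from the text.

```python
from collections import defaultdict


def calibration_by_band(probs, outcomes, cuts=(0.05, 0.10, 0.15)):
    """Per risk band: number of women, mean predicted risk, observed event rate.

    Bands are indexed 0..len(cuts): band 0 is below the first cut,
    band 1 is from the first cut up to (not including) the second, and so on.
    """
    totals = defaultdict(lambda: {"n": 0, "sum_predicted": 0.0, "events": 0})
    for p, y in zip(probs, outcomes):
        band = sum(p >= c for c in cuts)
        totals[band]["n"] += 1
        totals[band]["sum_predicted"] += p
        totals[band]["events"] += y
    return {
        band: {
            "n": t["n"],
            "mean_predicted": t["sum_predicted"] / t["n"],
            "observed_rate": t["events"] / t["n"],
        }
        for band, t in sorted(totals.items())
    }


# Made-up predicted probabilities and observed outcomes (1 = preterm birth).
probs = [0.03, 0.04, 0.06, 0.07, 0.08, 0.09, 0.12, 0.18]
outcomes = [0, 0, 0, 0, 1, 0, 0, 1]

table = calibration_by_band(probs, outcomes)
for band, row in table.items():
    print(band, row)
```

A well-calibrated band is one whose observed rate falls within the band's predicted range, which is the criterion the text applies to the two low-risk categories.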
The classification accuracy of the model ranged from 33.09% to 92.30% in the different predicted risk categories: the proportion of women with preterm delivery who were identified as high risk for PTB (sensitivity) ranged from 5.70% to 91.80% and the proportion of women without preterm delivery who were identified as low risk (specificity) ranged from 28.50% to 99.10%. The positive and negative likelihood ratios of the model for the highest predicted risk category for PTB were 6.22 and 0.95, respectively. The difference in the AUCs between the bootstrap sample (AUC: 0.75, 95% CI: 0.73, 0.78) and the original sample (AUC: 0.75, 95% CI: 0.73, 0.78) was negligible (ie, optimism: 0.0001). While the multi-level model that contained median personal income showed similar model performance to the model that contained the deprivation index (except for sensitivity and positive predictive values for the highest risk category), the logistic regression model that included only individual-level variables showed lower model performance (table 3 and figure 2). In the sensitivity analysis, the addition of variables related to prenatal care visits and previous PTB did not materially change the model performance. The AUC increased by 2.00% for the conventional logistic regression model but did not increase for the multi-level random effect model that contained the neighbourhood SES variable.
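The likelihood ratios follow directly from sensitivity and specificity, as this short sketch shows. It uses the rounded values reported for the highest risk category (sensitivity 5.70%, specificity 99.10%), so the positive likelihood ratio it produces is close to, but not identical with, the reported 6.22; the function name is hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary classification at one threshold.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Rounded values reported for the highest predicted risk category.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.0570, specificity=0.9910)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

An LR- near 1 means that being classified below the highest-risk threshold barely lowers a woman's odds of PTB, which is consistent with the low sensitivity at that threshold.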
This study developed and internally validated a prediction model to examine the ability of neighbourhood SES to predict the risk of PTB. We found that approximately 6% of the total variance in PTB was attributable to neighbourhood circumstances (ICC: 5.72%), and neighbourhood SES explained one quarter of the neighbourhood-level variation in PTB. Neighbourhood SES combined with individual-level predictors (parity, ethnicity, body mass index, smoking, depression and household income) predicted the risk of delivering a preterm infant with an AUC of 0.75. At the lowest risk threshold, sensitivity was 91.80% at the cost of a high false-positive rate (71.50%); at the highest risk threshold, sensitivity was 5.70% with a low false-positive rate (0.90%). Neighbourhood SES combined with individual-level predictors showed good risk stratification and modest calibration for identifying women at risk of delivering a preterm infant.
Model discrimination (measured by AUC) was improved substantially when we combined individual-level predictors with neighbourhood level information. While it has been previously demonstrated that individual-level predictors including maternal characteristics, clinical risk factors and biomarkers have low discriminatory accuracy in predicting the risk of PTB (AUC ranged from 0.60 to 0.67),18 20 our study enhances our understanding that adding the neighbourhood-level information can improve the discriminatory accuracy of PTB. Furthermore, it is important to note that a multi-level model that included a random effect for neighbourhood and individual-level information gives the maximum AUC that can be obtained by combining available individual-level information and the neighbourhood identity.23 Neighbourhood identity captures the totality of potentially observable and unobservable neighbourhood factors.23 32 33
As suggested by the classification performance of the model including neighbourhood SES and individual-level predictors, a large proportion of women who were identified as high risk did not actually deliver preterm. Positive predictive value improved as the predicted risk threshold increased, reflecting the higher proportion of PTB above the threshold, but remained too low. The model had low sensitivity (5.70%) at the highest risk threshold, with a low false-positive rate (0.90%). This means that a substantial number of women who were at high risk of delivering preterm would be identified as low risk.34 The positive likelihood ratio (LR+) improved (up to 6.22) at the highest risk threshold; however, this group includes <6% of all women who actually delivered preterm. This dichotomy between improved LR+ and poor detection rates has also been noted previously.35
While the prediction of PTB risk using neighbourhood SES is suboptimal, other commonly recognised risk factors for PTB have also failed to sufficiently predict PTB. For example, it has been noted that a history of prior PTB has an LR+ of 3.24, short cervical length has an LR+ of 2.0 and vaginal fetal fibronectin has an LR+ of 3 in predicting PTB.36 Similarly, for a fixed false-positive rate of 10%, maternal characteristics and obstetrical history have a sensitivity of 27.5% for PTB with an AUC of 0.61.20 The less than optimal predictive performance for identifying the risk of PTB may be related to the complex underlying aetiology of PTB, and a combination of multiple types of predictors (such as biomarkers, clinical risk factors, socio-demographics and health behaviours) may be required to adequately predict such an outcome.35 37 Our study further shows that inclusion of neighbourhood SES along with multiple individual-level predictors can further improve the prediction of PTB. Altogether, this implies that identification of women at risk of delivering preterm infants should rely on multiple factors, that even women identified as low risk for PTB may need further monitoring or assessment, and that high-quality prenatal care should be universal.
Our findings on neighbourhood variation and clustering of PTB suggest that pregnant women from the same neighbourhood are more similar to each other than to women from different neighbourhoods with respect to the risk of PTB, and that some portion of this variation is related to neighbourhood SES. Overall, this finding reflects the presence of health disparities in PTB between neighbourhoods in Alberta and supports the relevance of neighbourhood, including neighbourhood SES, and of neighbourhood-targeted interventions. Furthermore, the share of the variance in PTB that is explained by neighbourhood-level variance (as measured by the ICC) offers an understanding of the discriminatory accuracy, as it corresponds to the AUC23; when the ICC is high, the AUC is also high.23 However, previous research has emphasised identifying neighbourhood-level risk factors associated with PTB or causal effects, which are difficult to establish because of several challenges. These challenges include reverse causation between neighbourhood circumstances and health, unmeasured confounding, residential mobility, the possibility of the same individual-level variable being both a confounder and a mediator, and changes in neighbourhood context over the life course.11 12 38 Thus, a study aiming to establish a causal association demands a longitudinal design with repeated measurement of neighbourhood characteristics and outcomes over the life course.11 12 38
Strengths and limitations of study
To our knowledge, our study is the first to develop and internally validate a prediction model for PTB to investigate the ability of neighbourhood SES to predict the risk of PTB, in contrast to previous studies, which mostly examined the association between neighbourhood SES and PTB. Our findings allow us to understand the relevance of area of residence in general, and of area-level SES more specifically, in predicting the risk of maternal health outcomes. Our study used the simplest multilevel structure with individual-level and neighbourhood-level predictors of PTB, data which can be easily collected in both community and clinical settings.
Our findings should be interpreted in light of the limitations of our study. We were not able to separate spontaneous and iatrogenic PTB in the model because of data limitations; the predictive performance might be improved with a focus on spontaneous PTB. Our sample over-represents women with high SES from urban areas of Alberta,26 39 40 so the findings may generalise only to urban settings. The observed predictive ability of neighbourhood SES may have been underestimated, as the relevance of neighbourhood SES might be higher for those with low SES. Although the small observed difference in discriminatory accuracy between the bootstrapped sample and the original sample gave us confidence in the reproducibility of our prediction model, the model was only internally validated and may therefore have shown artificially high performance; thus, the model should be validated against external data. Use of area-based variables, where women living in the same area share the same value for the variable, can be a methodological problem. Results can be affected by the geographical level or unit chosen to define area in the study. Individuals who live in the same area may also experience different contextual influences from many other areal units, and the timing and duration of exposure to these contextual influences are also uncertain. Thus, it is hard to interpret neighbourhood influences on outcomes, including the performance of a model that contains a neighbourhood-level variable. However, we defined neighbourhoods using the smallest available area (ie, the dissemination area), where residents are more likely to be similar with respect to the outcome, and used multi-level analysis that accounts for area-level variation, an appropriate analytical approach for multi-level data.
Although the predictive performance of the model that contained neighbourhood SES and individual-level predictors was better than the performance of individual-level predictors alone, the performance was too low to consider its application in clinical or public health practice. While the development and validation of our predictive model is an important first step towards the early identification of women at high risk for PTB based on neighbourhood risk assessment, a clinically relevant validated model to predict the risk of PTB is yet to be identified. Future studies could develop a prediction model for PTB that considers other clinically relevant individual-level and neighbourhood-level predictors, separates spontaneous and iatrogenic PTB, and is externally validated, in order to optimise the prediction and improve its usefulness. The application of a clinically useful prediction model would support healthcare providers and public health practitioners in making informed decisions about care by improving their ability to identify women most at risk of delivering preterm. As such, community-level interventions combined with an individual-centred approach that attempts to change neighbourhood circumstances (health-promoting or health-damaging features of the neighbourhood, including SES) and population characteristics (with a focus on modifiable predictors) may be effective in reducing the incidence of PTB.
KA is supported by the Vanier Canada Graduate Scholarship from the Canadian Institutes of Health Research and the Alberta Innovates Studentship Award from the Alberta Innovates Graduate Studentship. AM is supported by a Canadian Institutes of Health Research New Investigator Award. We acknowledge the All Our Families and the Alberta Pregnancy Outcomes and Nutrition cohort study teams for providing permission to use their data. We acknowledge SAGE (Secondary Analysis to Generate Evidence), the secure data repository developed by PolicyWise for Children and Families, which houses these data sets, for providing access to these data sets.
Patient consent for publication Not required.
Contributors KA was involved in the conception and design of the study. She was also responsible for conducting the analysis, interpreting the data and drafting the manuscript. AM provided overall supervision to KA in conducting this study, contributed to the conception and design of the study and the interpretation of data, and provided intellectual content and revisions to the manuscript. SBP, TW, ABP, SP, ST, NL and GG were involved in the conception and design of the study and provided interpretation and intellectual content to subsequent drafts of the manuscript. All authors read and approved the final draft.
Funding KA received the Vanier Canada Graduate Scholarship (Award code: 201611CGV- 382013-267341) and the Alberta Innovates Studentship Award (Award code: 201610474) to conduct this study.
Competing interests None declared.
Ethics approval Ethics approval for this study was obtained from the Conjoint Health Research Ethics Board at the University of Calgary.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Additional data such as statistical codes, supplementary tables and technical appendix are available upon request (by emailing Kamala Adhikari: firstname.lastname@example.org)