

Association between access to social service resources and cardiometabolic risk factors: a machine learning and multilevel modeling analysis
  1. Seth A Berkowitz1,
  2. Sanjay Basu2,
  3. Atheendar Venkataramani3,
  4. Gally Reznor4,
  5. Eric W Fleegler5,
  6. Steven J Atlas4
  1. Department of General Medicine and Clinical Epidemiology, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, North Carolina, USA
  2. Departments of Medicine and of Health Research and Policy, Stanford University, Stanford, California, USA
  3. Department of Medical Ethics and Health Policy, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
  4. Division of General Internal Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA
  5. Division of Emergency Medicine, Children’s Hospital Boston, Boston, Massachusetts, USA
  1. Correspondence to Dr Seth A Berkowitz; seth_berkowitz{at}


Objectives Interest in linking patients with unmet social needs to area-level resources, such as food pantries and employment centres in one’s ZIP code, is growing. However, whether the presence of these resources is associated with better health outcomes is unclear. We sought to determine if area-level resources, defined as organisations that assist individuals with meeting health-related social needs, are associated with lower levels of cardiometabolic risk factors.

Design Cross-sectional.

Setting Data were collected in a primary care network in eastern Massachusetts in 2015.

Participants and primary and secondary outcome measures 123 355 participants were included. The primary outcome was body mass index (BMI). The secondary outcomes were systolic blood pressure (SBP), low-density lipoprotein (LDL) cholesterol and haemoglobin A1c (HbA1c). All participants were included in BMI analyses. Participants with hypertension were included in SBP analyses. Participants with an indication for cholesterol lowering were included in LDL analyses and participants with diabetes mellitus were included in HbA1c analyses. We used a random forest-based machine-learning algorithm to identify types of resources associated with study outcomes. We then tested the association of ZIP-level selected resource types (three for BMI, two each for SBP and HbA1c analyses and one for LDL analyses) with these outcomes, using multilevel models to account for individual-level, clinic-level and other area-level factors.

Results Resources associated with lower BMI included more food resources (−0.08 kg/m2 per additional resource, 95% CI −0.13 to −0.03 kg/m2), employment resources (−0.05 kg/m2, 95% CI −0.11 to −0.002 kg/m2) and nutrition resources (−0.07 kg/m2, 95% CI −0.13 to −0.01 kg/m2). No area resources were associated with differences in SBP, LDL or HbA1c.

Conclusions Access to specific local resources is associated with lower BMI. Efforts to link patients to area resources, and to improve the resource landscape within communities, may help reduce BMI and improve population health.

  • cardiovascular disease
  • food insecurity
  • health disparities
  • socioeconomic status

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:


Strengths and limitations of this study

  • Extensive individual-level and area-level data.

  • Innovative machine learning methods to overcome issues of collinearity and avoid multiple testing.

  • Use of hierarchical linear modelling to account for the nested data structure.

  • Cross-sectional study.

  • No information on use of resources.

Cardiometabolic disease remains the most common cause of morbidity and mortality in the USA.1 Though better control of cardiometabolic risk factors could substantially reduce this morbidity and mortality, individuals with low socioeconomic status (SES) are less likely to achieve recommended goals.2 Among the reasons for this are patient-reported health-related social needs, including food insecurity, housing instability and lack of transportation. These health-related social needs have been associated with higher levels of important cardiometabolic risk factors including increased body mass index (BMI), systolic blood pressure (SBP), low-density lipoprotein (LDL) cholesterol and haemoglobin A1c (HbA1c), even after adjusting for factors like race/ethnicity, income and education.3–8 Proposed mechanisms linking health-related social needs to cardiometabolic risk factors include reduced dietary quality, cost-related medication underuse, reduced cognitive ‘bandwidth’ to attend to health and disruptions in clinical care.9–11 

Healthcare systems are increasingly interested in working with community partners to help link their patients to local resources, such as food pantries or housing agencies, to help meet health-related social needs.12–16 This approach is exemplified by the Accountable Health Communities initiative from the Centers for Medicare & Medicaid Services, which involves screening for adverse social circumstances and linking those who screen positive to community resources.17 However, significant gaps in knowledge regarding such approaches remain. Critically, healthcare systems need to know which organisations to partner with, and potentially what types of resources to invest in.18 Which specific resources best address a particular health-related need is not always obvious. For example, a food pantry could help alleviate food insecurity, but so could employment.

To help address these issues, and inform further interventions, we sought to study associations between area resources and cardiometabolic risk factors in a large primary care network. Our goal was to understand which resource types were associated with improved levels of BMI, SBP, LDL and HbA1c, and to determine whether area resources had stronger associations with cardiometabolic risk factors for conditions that are less amenable to clinical management.

Methods

Setting and study sample

Data for this study came from two primary sources: an asset mapping of community resources and electronic health records. The asset mapping came from the HelpSteps database, a comprehensive asset mapping of area resources in eastern Massachusetts.19 The clinical records came from a network of 18 primary care practices in eastern Massachusetts, including hospital-based, academic and community health centre sites. All adult (age ≥18 years) primary care patients seen between 1 January 2012 and 31 December 2015 were included. Data were current as of 31 December 2015. Each patient’s most recent address was geocoded for the study. Patients without available addresses were excluded; prior work has shown that only 0.15% of patients in this cohort could not be geocoded.20

The Partners Healthcare Human Research Committee approved this analysis, which entailed use of secondary data without patient contact (Protocol Number: 2017P000964).

Patient and public involvement

The study research question was developed in reference to patient priorities regarding the incorporation of neighbourhood factors that promote health into population health management. Patients were not involved in the design of the study or in recruitment. We plan to disseminate study results via open-access publication.

Area resources

HelpSteps ( is a web and mobile screening and referral system for social needs. Launched in 2010, the system uses a database of social services throughout the greater Boston area to connect families to appropriate services. The database is maintained jointly by Boston Children’s Hospital and the Mayor’s Health Line at the Boston Public Health Commission. Every agency is contacted at least once per year to keep the data accurate and to grow the database. HelpSteps contains information on area resources across 16 non-mutually exclusive domains: health, housing, food, employment, violence, safety, substance abuse, mental health, education, parenting, nutrition, after school, sexual health, transportation, diabetes and care transitions. For example, the food domain includes food pantries, the employment domain consists of job placement or job training services, and the nutrition domain includes organisations that provide food counselling. Agencies providing multiple types of resources can appear in more than one domain. Because individual-level data for this study came from 2015, we used HelpSteps information that was current as of 2015. For this study, ‘area resources’ are defined as the number of organisations in the HelpSteps database that provide assistance in a given domain within a given geographic area.

After geocoding the addresses of both individuals and area resource organisations, we counted, for each individual, the number of resources in each domain located in the same geographic area as the individual’s residence. We did this at four geographic levels, in roughly increasing order of size: census tract (US Census 2010 boundaries); ZIP code tabulation area (US Census 2010 boundaries; referred to throughout this paper as the ‘ZIP’ level, owing to common use of the term); ‘neighbourhood’ (eg, Allston, Roxbury), a designation based on Boston city planning that may better capture actual movement patterns; and county.
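The counting step can be illustrated with a minimal Python sketch. It assumes resource organisations and patients have already been geocoded and assigned to a unit at each of the four levels; the record layout, field names and area identifiers below are hypothetical, not the study's actual data structures. Because domains are non-mutually exclusive, an organisation listed under several domains is counted once in each of them.

```python
from collections import Counter

LEVELS = ("tract", "zip", "neighbourhood", "county")

# Hypothetical geocoded records: each organisation has an area identifier at
# every level and one or more (non-mutually exclusive) domains.
resources = [
    {"domains": {"food"}, "tract": "T1", "zip": "Z1", "neighbourhood": "N1", "county": "C1"},
    {"domains": {"food", "nutrition"}, "tract": "T2", "zip": "Z1", "neighbourhood": "N1", "county": "C1"},
    {"domains": {"employment"}, "tract": "T3", "zip": "Z2", "neighbourhood": "N2", "county": "C1"},
]
patients = [
    {"id": "p1", "tract": "T1", "zip": "Z1", "neighbourhood": "N1", "county": "C1"},
    {"id": "p2", "tract": "T3", "zip": "Z2", "neighbourhood": "N2", "county": "C1"},
]


def count_resources(resources, levels=LEVELS):
    """Count organisations per (level, area, domain); multi-domain organisations count once per domain."""
    counts = Counter()
    for org in resources:
        for level in levels:
            for domain in org["domains"]:
                counts[(level, org[level], domain)] += 1
    return counts


def patient_exposures(patient, counts, domains, levels=LEVELS):
    """Look up the count for every level/domain pair in the patient's own areas (0 if none)."""
    return {
        f"{level}_{domain}": counts[(level, patient[level], domain)]
        for level in levels
        for domain in domains
    }


counts = count_resources(resources)
domains = ("food", "employment", "nutrition")
p1 = patient_exposures(patients[0], counts, domains)
print(p1["tract_food"], p1["zip_food"], p1["zip_employment"], p1["county_food"])
# prints: 1 2 0 2
```

Each patient ends up with one count per level and domain (12 candidate exposures here), which is the kind of wide set of correlated candidates the variable selection step then has to screen.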

Clinical outcomes

To assess clinical outcomes, we calculated, for each individual, the mean of all values recorded in their electronic health record in 2015 for the following measurements: BMI (in kg/m2), SBP (in mm Hg), LDL cholesterol (in mg/dL) and HbA1c (%). All values were obtained in the process of usual care.
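A minimal Python sketch of this averaging step, assuming a hypothetical long-format extract with one row per recorded measurement (the field layout and values are illustrative only). Values recorded outside 2015 are ignored, and every 2015 value for a patient and measure contributes equally to that patient's mean.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical EHR extract: (patient_id, measure, date recorded, value).
observations = [
    ("p1", "bmi", date(2014, 11, 3), 31.0),  # recorded in 2014, so excluded
    ("p1", "bmi", date(2015, 2, 10), 29.0),
    ("p1", "bmi", date(2015, 9, 22), 28.0),
    ("p1", "sbp", date(2015, 2, 10), 138.0),
    ("p2", "bmi", date(2015, 6, 1), 24.5),
]


def annual_means(observations, year=2015):
    """Average every value recorded in `year` for each (patient, measure) pair."""
    values = defaultdict(list)
    for patient_id, measure, recorded, value in observations:
        if recorded.year == year:
            values[(patient_id, measure)].append(value)
    return {key: mean(vals) for key, vals in values.items()}


means = annual_means(observations)
print(means[("p1", "bmi")])  # prints 28.5
```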


To account for possible confounding of the association between area resources and health outcomes, we collected the following variables from the electronic health record: age (years), gender (male or female), race/ethnicity (non-Hispanic white, non-Hispanic black, Hispanic or Asian/other/multi), education (less than high school diploma, high school diploma [including General Educational Development certificate] or greater than high school diploma), insurance (commercial, Medicare, Medicaid [including dual-eligibles] and uninsured/self-pay), number of clinic visits in 2015, primary language (English vs other), connectedness to their primary care clinic using a previously validated algorithm21 and comorbidity (Charlson comorbidity score, and individual indicators of depression, hypertension, coronary heart disease, osteoarthritis and diabetes). To account for area-level differences arising from factors other than resources, we used data from the US Census’ American Community Survey (5-year estimates 2010–2015) and the USDA’s Food Access Research Atlas: median household income, percent living in poverty, ‘food desert’ status (low income, low food access census tract at 1/2 mile in urban areas and 10 miles in rural areas), unemployment rate, proportion of the area population living in group quarters (eg, those living in a nursing facility, who are unlikely to be exposed to area-level conditions), vehicle access and housing segregation.22 23

Statistical analysis

In this study, we wanted to evaluate the relationship between many resource types and cardiometabolic risk factors. A secondary goal was to understand how specific geographic levels and resource types related to clinical outcomes. Because the nested structure of our data violates the statistical independence assumption that underlies parametric, regression-based variable selection approaches (such as forward, backward or stepwise selection), and to avoid multiple hypothesis testing that may lead to the identification of spurious associations, we used a non-parametric machine learning technique, variable selection using random forests (VSURF), to screen candidate variables.24 25 Screening was done in a derivation data set, which consisted of a random partition of the entire data set. We then used multilevel modelling in the test set (not used in the derivation stage) to test the small number of candidate variables that VSURF identified as most important for explaining variation in the derivation set. VSURF is described in more detail in the online supplementary technical appendix and eFigure 1.
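The split-sample logic can be sketched in Python. VSURF itself is an R package with its own multi-step procedure (described in the supplementary appendix); the sketch below is a simplified stand-in, not VSURF: it uses scikit-learn's `RandomForestRegressor` and `permutation_importance` on simulated data to rank candidate exposures within a derivation partition and keeps the top few, leaving the test partition untouched for confirmatory modelling. The simulated variables, the number kept (three) and the internal validation split are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p = 2000, 8
# Simulated candidate exposures (eg, counts for several domain/level pairs).
X = rng.poisson(3, size=(n, p)).astype(float)
X[:, 1] = X[:, 0] + rng.normal(0, 0.5, n)  # deliberately collinear with column 0
y = 28 - 0.5 * X[:, 0] - 0.3 * X[:, 2] + rng.normal(0, 1, n)

# 1. Random partition: screening uses only the derivation half.
X_deriv, X_test, y_deriv, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# 2. Within the derivation half, fit a forest and score importance on held-out rows.
X_fit, X_val, y_fit, y_val = train_test_split(X_deriv, y_deriv, test_size=0.3, random_state=1)
forest = RandomForestRegressor(n_estimators=200, min_samples_leaf=5, random_state=0)
forest.fit(X_fit, y_fit)
imp = permutation_importance(forest, X_val, y_val, n_repeats=10, random_state=0)

# 3. Keep a small number of top-ranked candidates.
selected = sorted(np.argsort(imp.importances_mean)[::-1][:3].tolist())
print("selected columns:", selected)

# 4. Only these candidates would go forward to confirmatory models fit on
#    X_test / y_test, which played no part in the selection.
```

The design choice being illustrated is the separation of roles: the forest is free to explore many correlated candidates because nothing it sees is later used for hypothesis tests.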

Supplementary file 1

Multilevel modelling

In the test data set, we fit multilevel linear mixed models to test the association between variables identified in the VSURF step and the outcome of interest. The BMI model included all study participants. The SBP model included those with a diagnosis of hypertension. The LDL model included those with common diagnoses (hypertension, diabetes, coronary heart disease, cerebrovascular disease, congestive heart failure) for which LDL lowering is most beneficial. The HbA1c model included those with a diagnosis of diabetes. The models used fixed effects to adjust for age, gender, race/ethnicity, education, insurance, number of clinic visits, language, clinic connectedness, comorbidity and census tract-level median household income, poverty rates, ‘food desert’ status, unemployment, numbers living in group quarters, vehicle access and segregation. To account for clustering within practices, we included a practice-level random effects term. To account for area-level clustering, we used a ZIP-level random effects term. These were fit as crossed random effects models (ie, practices were not nested within ZIP codes) to allow for the fact that patients are often seen in practices outside their ZIP code of residence.
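A crossed practice-by-ZIP random effects structure can be sketched in Python with statsmodels, which fits crossed effects by placing all observations in a single group, dropping the group-level random intercept (`re_formula="0"`) and declaring one variance component per crossed factor. This is an illustration on simulated data with a few hypothetical covariates, not the study's model or software (the analyses were run in SAS, Stata and R).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n, n_practice, n_zip = 1500, 8, 30

# Simulated patients: practice and ZIP are drawn independently, so patients in one
# ZIP attend several practices and vice versa (crossed, not nested, clustering).
df = pd.DataFrame({
    "practice": rng.integers(0, n_practice, n),
    "zip_code": rng.integers(0, n_zip, n),
    "age": rng.normal(52, 17, n),
})
zip_food = rng.poisson(3, n_zip)  # hypothetical ZIP-level food resource counts
practice_effect = rng.normal(0, 0.8, n_practice)
zip_effect = rng.normal(0, 0.5, n_zip)
df["food"] = zip_food[df["zip_code"].to_numpy()]
df["bmi"] = (27.0 + 0.02 * (df["age"] - 52) - 0.3 * df["food"]
             + practice_effect[df["practice"].to_numpy()]
             + zip_effect[df["zip_code"].to_numpy()]
             + rng.normal(0, 3, n))

# One group containing every row; each crossed factor is a variance component.
df["all_rows"] = 1
model = smf.mixedlm(
    "bmi ~ food + age",
    data=df,
    groups="all_rows",
    re_formula="0",
    vc_formula={"practice": "0 + C(practice)", "zip": "0 + C(zip_code)"},
)
result = model.fit()
print(result.params["food"])
print(result.conf_int().loc["food"])
```

Nesting practices within ZIPs would instead create a separate practice effect for every practice-ZIP combination, which does not match patients who travel outside their ZIP for care.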

Falsification tests

To reduce the possibility that observed associations were due to other unmeasured characteristics of the area, rather than to the specific area resource tested, we also conducted falsification analyses. To do this, we used the same modelling approach as above but tested the association between area afterschool resources for children and the outcome of interest. Our reasoning was that, since afterschool resources for children were unlikely to have any direct effect on adult BMI, any observed association would reflect unmeasured area characteristics not appropriately adjusted for in our model (such as high levels of civic engagement or community organisation, or other beneficial resources).
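The logic of a negative-control (falsification) exposure can be illustrated with simulated data. In this hypothetical Python sketch, an unmeasured area characteristic increases the number of afterschool resources; when that characteristic also lowers BMI, the afterschool coefficient moves away from zero, flagging residual area-level confounding, and when it does not, the coefficient stays near zero. Ordinary least squares stands in for the multilevel model to keep the example short, and all variable names and effect sizes are assumptions.

```python
import numpy as np


def ols_with_ci(y, covariates):
    """OLS coefficients (intercept first) with normal-approximation 95% CIs."""
    X = np.column_stack([np.ones(len(y))] + list(covariates))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta - 1.96 * se, beta + 1.96 * se


rng = np.random.default_rng(2)
n = 5000
income = rng.normal(0, 1, n)  # measured area covariate (adjusted for)
civic = rng.normal(0, 1, n)   # unmeasured area characteristic (not adjusted for)
afterschool = rng.poisson(np.exp(0.7 + 0.5 * civic)).astype(float)  # negative-control exposure

noise = rng.normal(0, 4, n)
bmi_clean = 27.8 - 0.5 * income + noise                    # civic has no effect on BMI
bmi_confounded = 27.8 - 0.5 * income - 1.0 * civic + noise  # civic lowers BMI

beta_clean, lo_clean, hi_clean = ols_with_ci(bmi_clean, [income, afterschool])
beta_conf, lo_conf, hi_conf = ols_with_ci(bmi_confounded, [income, afterschool])
print(f"no residual confounding: {beta_clean[2]:.3f} ({lo_clean[2]:.3f} to {hi_clean[2]:.3f})")
print(f"residual confounding:    {beta_conf[2]:.3f} ({lo_conf[2]:.3f} to {hi_conf[2]:.3f})")
```

A null afterschool association, as reported in the Results, is what the first scenario predicts; it does not rule out every form of confounding, only confounders that also track afterschool resources.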

Variations in clinical management

To explore whether the intensity of clinical management could explain whether community resources were associated with health outcomes, we also used the above modelling approach to test whether area resources were associated with SBP among those without a diagnosis of hypertension. The primary care network in the study has a quality improvement programme that emphasises SBP, LDL and HbA1c control in appropriate clinical populations. Because BMI (in any population) and SBP (in those without a diagnosis of hypertension) are not targeted by these programmes, we reasoned that area resources may matter more when clinicians are not intensively attempting to influence an outcome. We focused on BMI and on SBP among those without hypertension because both are routinely measured at all practice visits for all patients.

Because food resources have a mechanistically plausible relationship with BMI, we designated the association between ZIP-level food resources and BMI as the primary analysis, with the associations between the other VSURF-selected area resources and clinical outcomes as secondary analyses.
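The analytic samples described under Multilevel modelling and in this section can be summarised as simple filters. This Python sketch assumes each patient carries a set of diagnosis flags with hypothetical names; the study's EHR-based diagnosis definitions are not reproduced here.

```python
from dataclasses import dataclass, field

# Hypothetical flag names for the LDL-lowering indications listed under Multilevel modelling.
LDL_INDICATIONS = frozenset({
    "hypertension", "diabetes", "coronary_heart_disease",
    "cerebrovascular_disease", "congestive_heart_failure",
})


@dataclass
class Patient:
    pid: str
    diagnoses: set = field(default_factory=set)


def analytic_samples(patients):
    """Return the patient IDs eligible for each outcome model."""
    return {
        "bmi": [p.pid for p in patients],  # all participants
        "sbp_hypertension": [p.pid for p in patients if "hypertension" in p.diagnoses],
        # exploratory sample: SBP is not a clinical management target here
        "sbp_no_hypertension": [p.pid for p in patients if "hypertension" not in p.diagnoses],
        "ldl": [p.pid for p in patients if p.diagnoses & LDL_INDICATIONS],
        "hba1c": [p.pid for p in patients if "diabetes" in p.diagnoses],
    }


cohort = [
    Patient("a", {"hypertension"}),
    Patient("b", {"diabetes", "depression"}),
    Patient("c", {"osteoarthritis"}),
    Patient("d", {"coronary_heart_disease"}),
]
samples = analytic_samples(cohort)
print(samples)
```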

Robustness checks

In addition to the main analyses, we conducted a series of robustness checks that examined whether different specifications of area resources (eg, resources per capita or resources per capita living in poverty) or different functional forms (eg, polynomial terms or splines) would alter the observed associations between area-level resources and outcomes. We also conducted analyses restricted to those with indicators of lower SES (high school diploma or lower educational attainment; living in a ZIP where >15% of individuals are in poverty) to assess whether the results applied to those most likely to use the resources studied.
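The alternative exposure specifications and lower-SES restrictions can be sketched as follows. The scaling constant (resources per 10 000 residents), the education codes and treating the two SES indicators as separate flags are assumptions for illustration; the text does not state how the per-capita measures were scaled or whether the two indicators were combined.

```python
import math


def exposure_variants(count, population, population_in_poverty, per=10_000):
    """Alternative specifications of one area's resource count.

    `per` (resources per 10 000 residents) is a hypothetical scaling choice.
    """
    return {
        "count": count,
        "count_squared": count ** 2,  # eg, a quadratic term for a polynomial specification
        "per_capita": count / population * per,
        "per_capita_in_poverty": (count / population_in_poverty * per
                                  if population_in_poverty > 0 else math.nan),
    }


def low_ses_flags(education, zip_poverty_rate):
    """Two restriction indicators; education codes mirror the covariate categories."""
    return {
        "hs_or_less": education in {"less_than_hs", "hs_diploma"},
        "high_poverty_zip": zip_poverty_rate > 0.15,
    }


v = exposure_variants(count=6, population=30_000, population_in_poverty=4_500)
print(v)
print(low_ses_flags("hs_diploma", 0.12))
```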

A p value of <0.05 was taken to indicate statistical significance. Analyses were conducted in SAS V.9.4 (Cary, North Carolina, USA), Stata 14 (College Station, Texas, USA) and R V.3.3.4 (Vienna, Austria).

Results

Overall, 123 355 participants were included in the study, all of whom were eligible for BMI analyses. Based on inclusion criteria, 43 509 were included in the hypertension analyses, 46 940 in the LDL analyses and 13 127 in the diabetes analyses. Demographic characteristics of the overall sample are presented in table 1, and those of the samples used in the hypertension, LDL cholesterol and diabetes analyses in online supplementary eTables 1–3. Overall, the mean age was 52.4 (SD 16.9) years, and the sample was 41.5% men, 82.1% non-Hispanic white, 5.8% non-Hispanic black and 6.5% Hispanic. The median number of years participants had been followed in our network was 9 (IQR 3, 10), and the median number of address changes per year of follow-up was 0.1 (IQR 0.1, 0.25), suggesting that participants resided at their current address for most of their time in our network.

Table 1

Demographics of study sample

In general, individuals living in areas with more resources had lower educational attainment and higher rates of Medicaid insurance coverage (online supplementary eTable 4). Maps depicting the distribution of the resources are presented in figure 1 and online supplementary eFigures 2–3.

Figure 1

Food resource density by ZIP.

The mean BMI in the sample was 27.8 (SD 6.2) kg/m2. In the hypertension analyses, the mean SBP was 131.6 (SD 15.8) mm Hg. In the LDL analyses, the mean LDL was 102.9 (SD 39.8) mg/dL, and in the diabetes analyses the mean HbA1c was 7.1 (SD 1.5)%.

Of the geographic levels assessed, all selected resources were at the ZIP level (table 2). For the BMI analyses, the selected resources were ZIP-level food, employment and nutrition resources. For hypertension analyses, the selected resources were ZIP-level housing and nutrition resources. For LDL analyses, the only selected resource was ZIP-level nutrition resources. For diabetes analyses, the selected resources were ZIP-level mental health and substance abuse resources.

Table 2

Distribution of the number of resources in the selected resource categories

For the BMI analyses, we tested the association between selected resources and BMI, adjusting for the factors described in the statistical analysis section and accounting for clustering at the clinic and ZIP level with multilevel linear mixed models. Resources associated with lower BMI included more food resources (−0.08 kg/m2 per additional resource, 95% CI −0.13 to −0.03 kg/m2, p=0.001), employment resources (−0.05 kg/m2, 95% CI −0.11 to −0.002 kg/m2, p=0.04) and nutrition resources (−0.07 kg/m2, 95% CI −0.13 to −0.01 kg/m2, p=0.02) (full models for these and all robustness checks are in online supplementary eTables 5–16). Table 3 compares mean BMI and obesity prevalence at selected numbers of resources, adjusted for the other factors in the model. For example, the mean BMI in ZIPs with the median number of food resources (0) was 27.8 kg/m2, compared with 27.5 kg/m2 in ZIPs at the 75th percentile (three resources) and 27.1 kg/m2 at the 90th percentile (eight resources). Falsification tests found the expected lack of association between afterschool resources and BMI (p=0.67).

Table 3

Estimated BMI, in kg/m2, by resource level
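Because the main models enter each resource count linearly, adjusted mean BMI at k food resources can be approximated from the reported coefficient. The following is a rough arithmetic check on the table 3 values quoted above, using the rounded figures from the text; it is not the authors' calculation.

$$\widehat{\mathrm{BMI}}(k) \approx \widehat{\mathrm{BMI}}(0) + \hat{\beta}_{\text{food}}\,k, \qquad \hat{\beta}_{\text{food}} = -0.08~\text{kg/m}^2 \text{ per resource}$$

$$\widehat{\mathrm{BMI}}(3) \approx 27.8 - 0.08 \times 3 = 27.56~\text{kg/m}^2, \qquad \widehat{\mathrm{BMI}}(8) \approx 27.8 - 0.08 \times 8 = 27.16~\text{kg/m}^2$$

Both are within about 0.05 kg/m² of the tabulated 27.5 and 27.1 kg/m²; the small gap is presumably due to rounding of the published coefficient and baseline. The 0.7 kg/m² contrast between well-resourced and poorly resourced ZIPs corresponds to 27.8 − 27.1 kg/m².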

Robustness checks found that our results did not vary substantially with other specifications of area-level resources (online supplementary eTables 5–7).

In the hypertension analyses, neither housing resources (−0.05 mm Hg per additional resource, 95% CI −0.16 to 0.06 mm Hg, p=0.41) nor nutrition resources (0.01 mm Hg, 95% CI −0.13 to 0.16 mm Hg, p=0.87) were associated with SBP after adjustment for individual-level and area-level characteristics. In LDL analyses, nutrition resources were not associated with LDL cholesterol in adjusted models (0.10 mg/dL per additional resource, 95% CI −0.36 to 0.55 mg/dL, p=0.67). In diabetes analyses, neither substance abuse resources (−0.003% per additional resource, 95% CI −0.03% to 0.02%, p=0.86) nor mental health resources (−0.003%, 95% CI −0.03% to 0.02%, p=0.76) were associated with HbA1c.

In analyses looking at SBP among those without a diagnosis of hypertension (ie, those with no reason for clinical management of blood pressure), food resources were associated with lower SBP in linear mixed models adjusted for the same factors as above (−0.08 mm Hg per additional resource, 95% CI −0.15 to −0.01 mm Hg, p=0.03). Mean SBP was approximately 1 mm Hg lower at the 95th percentile (118.9 mm Hg) of food resources compared with the 50th percentile (119.8 mm Hg).

Full models for all analyses are presented in online supplementary eTables 8–16.

Discussion

This study assessed the relationship between area resources and cardiometabolic risk factors. We found that greater numbers of food, employment and nutrition resources were associated with lower BMI, and that more food resources were associated with lower SBP among those without hypertension. The magnitude of the difference was meaningful at the population level: the 0.7 kg/m2 difference in BMI between individuals in a well-resourced versus a poorly resourced ZIP is similar to the 0.6 kg/m2 increase in BMI in the overall US population from 2006 to 2016.26

Conversely, we found that area resources were not associated with SBP among those with hypertension, LDL cholesterol among those with an indication for LDL lowering or haemoglobin A1c among those with diabetes. This suggests that the relationship between area resources and cardiometabolic risk factors may vary based on whether these factors are targets of intensive clinical management.

This study enhances our knowledge regarding the association of area-level factors and cardiometabolic risk factors. Prior studies have consistently found that adverse area-level factors, such as poverty, are associated with increased cardiometabolic risk, even when adjusting for individual-level factors, such as income.2 27–29 However, it was not known whether the presence of area resources that might plausibly support health, such as food and nutrition resources, would be associated with lower cardiometabolic risk.

Both the associations we observed and the null findings between community resources and cardiometabolic risk factors may have important public health implications. The association between greater area resources and lower BMI suggests that efforts to help link patients to community resources, and to help improve the resource landscape within communities, may be a successful strategy for improving population health, particularly for risk factors such as BMI for which clinical management may not be prioritised.13 14 30 This is reinforced by the finding that SBP, among those without hypertension, is lower in those living in areas with more food resources. Because SBP is not under clinical management for those without hypertension, this finding supports the potential for area resources to affect population health, and is consistent with guidelines that recommend lifestyle, rather than pharmacological, approaches to prehypertension treatment.31 Future work in this area should investigate whether interventions that link individuals to area resources show clinical benefits.

Our findings should be interpreted in light of several limitations. We did not have access to data on resource use, so we do not know whether individuals made use of the resources in their community. In light of this, the association between ZIP-level resources and outcomes could be viewed as analogous to an ‘encouragement design’ intervention. This means that the association estimated in this study is likely different from the association that would be estimated among those known to use the resource. That association is clearly of policy interest and should be examined in future work. While we adjusted for several individual-level and area-level SES indicators to capture the multidimensional nature of SES and, thus, reduce confounding, residual confounding owing to unmeasured characteristics may exist, which would tend to reduce the observed associations between area resources and outcomes. Additional unmeasured covariates that could affect the observed associations include local culture and the quality of the resources available. Devising methodology to determine the quality of services that help meet health-related social needs is a pressing need and will be an important direction for future investigation. Next, our study was cross-sectional, so we cannot establish time ordering between the exposure and the cardiometabolic outcomes. However, we think it is less likely that lower BMI would drive individuals into areas with more resources than vice versa, as areas with more resources tended to have other adverse features, such as lower income and higher poverty, which are likely more salient considerations for those choosing where to live. Finally, because of the relatively high residential stability within this primary care population, we examined only the association between current area of residence and the study outcomes. For those who did move, however, this could lead to misclassification, which would tend to bias results towards the null. These limitations are balanced by several strengths. We had access to a detailed mapping of area resources, along with detailed individual-level health information. Further, in addition to the multilevel framework we used, the falsification tests suggested that unadjusted area-level factors are unlikely to explain the observed results.

In summary, ZIP-level food, employment and nutrition resources were associated with BMI differences that were clinically meaningful and statistically significant. Further, the association between area resources and cardiometabolic risk factors differed by risk factor. Investing in area resources and linkage programmes may be an important way to help reduce cardiometabolic risk for vulnerable individuals, especially for risk factors not under intensive clinical management.

References

  1.
  2.
  3.
  4.
  5.
  6.
  7.
  8.
  9.
  10.
  11.
  12.
  13.
  14.
  15.
  16.
  17.
  18.
  19.
  20.
  21.
  22.
  23.
  24.
  25.
  26.
  27.
  28.
  29.
  30.
  31.


  • Contributors SAB conducted the data analysis, wrote the first draft of the manuscript and is the guarantor of the article. SAB, EWF and SJA conceived the study. GR assisted with data analysis. SB, AV and GR contributed to interpretation of results and critical revision of the manuscript for important intellectual content. All authors (SAB, SB, AV, GR, EWF and SJA) read and approved the final manuscript for submission.

  • Funding Research reported in this publication was supported by the National Institute of Diabetes and Digestive and Kidney Diseases of the National Institutes of Health, and the National Institute on Minority Health and Health Disparities of the National Institutes of Health under Award Numbers DP2MD010478 (SB), U54MD010724 (SB) and K23DK109200 (SAB).

  • Disclaimer The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement Statistical code will be available concurrent with publication from Owing to privacy concerns, study data cannot be made publicly available.

  • Patient consent for publication Not required.
