Decomposition of the US black/white inequality in premature mortality, 2010–2015: an observational study
  1. Mathew V Kiang1,2,
  2. Nancy Krieger2,
  3. Caroline O Buckee3,4,
  4. Jukka Pekka Onnela5,
  5. Jarvis T Chen2
  1. 1 Center for Population Health Sciences, Stanford University, Palo Alto, California, USA
  2. 2 Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
  3. 3 Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
  4. 4 Center for Communicable Disease Dynamics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
  5. 5 Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
  1. Correspondence to Dr Mathew V Kiang; mkiang{at}


Objective Decompose the US black/white inequality in premature mortality into shared and group-specific risks to better inform health policy.

Setting All 50 US states and the District of Columbia, 2010 to 2015.

Participants A total of 2.85 million non-Hispanic white and 762 639 non-Hispanic black US-resident decedents.

Primary and secondary outcome measures The race-specific county-level relative risks for US blacks and whites, separately, and the risk ratio between groups.

Results There is substantial geographic variation in premature mortality for both groups and the risk ratio between groups. After adjusting for median household income, county-level relative risks ranged from 0.46 to 2.04 (median: 1.03) for whites and from 0.31 to 3.28 (median: 1.15) for blacks. County-level risk ratios (black/white) ranged from 0.33 to 4.56 (median: 1.09). Half of the geographic variation in white premature mortality was shared with blacks, while only 15% of the geographic variation in black premature mortality was shared with whites. Non-Hispanic blacks experience substantial geographic variation in premature mortality that is not shared with whites. Moreover, black-specific geographic variation was not accounted for by median household income.

Conclusion Understanding geographic variation in mortality is crucial to informing health policy; however, estimating mortality is difficult at small spatial scales or for small subpopulations. Bayesian joint spatial models ameliorate many of these issues and can provide a nuanced decomposition of risk. Using premature mortality as an example application, we show that Bayesian joint spatial models are a powerful tool as researchers grapple with disentangling neighbourhood contextual effects and sociodemographic compositional effects of an area when evaluating health outcomes. Further research is necessary in fully understanding when and how these models can be applied in an epidemiological setting.

  • Bayesian joint model
  • racial/ethnic inequality
  • risk decomposition
  • spatial epidemiology

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Strengths and limitations of this study

  • Descriptive epidemiology is often limited in situations where areas or subpopulations may have few observations of the health outcome of interest. We present a small area estimation method used in spatial statistics that allows estimating these outcomes in a disciplined way.

  • This method may be useful for understanding spatial patterns, generating new hypotheses and estimating health outcomes when data are limited.

  • Like all models, this method makes assumptions that must be assessed and carefully explored and is limited to descriptive (ie, non-causal) questions.

Introduction


Understanding differences in health outcomes across geographic and sociodemographic subpopulations is essential for improving population health. Research has consistently found large differences in health outcomes by geography and sociodemographic characteristics including race/ethnicity and income.1–3 This important heterogeneity is masked when using aggregated measures of health such as national life expectancy.4 5 Measuring how health varies across geographic space and sociodemographic characteristics is crucial to informing health policy and improving overall health.

Examining geographic variation of health outcomes is often complicated due to small populations or few observations of the outcome of interest. This data sparsity results in highly variable estimates, making it difficult to understand the underlying risk. In addition, investigating geographic variation in health disparities between two subpopulations is difficult when using these volatile estimates, especially for relative measures where unstable estimates in the denominator may result in undefined or extreme over/under-estimates.6

In order to address these issues of small area estimation, researchers employ three common approaches—aggregation, suppression and multilevel modelling. First, researchers may choose to aggregate over geography, time, sociodemographic characteristics or a combination of these to obtain sufficiently large sample sizes for stable estimates. Second, researchers may choose to suppress estimates from areas with small populations or remove areas with inadequate sample size from their analyses. While convenient, these two approaches result in loss of information, thus limiting their ability to inform health policy. Third, researchers may use Bayesian hierarchical models to pool information across geography, time or subpopulations. However, most Bayesian hierarchical approaches share information across geography or time while treating subpopulations independently.

One type of Bayesian hierarchical model that treats subpopulations in a more sophisticated way is the shared component model. This model jointly estimates the spatial variation in two subpopulations, which ameliorates issues of sparse data by pooling information across both geographic areas and subpopulations.7 The simultaneous modelling of common spatial patterns of risk across different population groups can be conceptualised as providing evidence of latent risk factors that have shared spatial structure. Conversely, this model also estimates divergent spatial patterns, which may provide evidence of risk factors unique to only one group.8 Finally, joint modelling approaches also produce estimates that are more precise than separate stratified models, especially when one subpopulation is small relative to the other.8 9

While originally developed for jointly modelling two related diseases, this model has also been used to estimate geographic variation in health inequalities by gender10 and racial/ethnic disparities in epilepsy11 and infant mortality.6 However, the shared component model is still underused in the social epidemiology literature. In the cases where it is used, computational restrictions often require estimating only a single region11 or using a simplified version of the model.6

In the USA, racial/ethnic disparities in a range of health outcomes, including premature mortality, have been noted since the earliest health records,12 fluctuating in magnitude,13 while remaining spatially persistent over time.14 Premature mortality is a key indicator of population health and is often used to guide health policy worldwide.15 Many premature deaths are considered preventable16 and thus amenable to preventive health interventions. In addition to reducing population-wide premature mortality, a central goal of the US Healthy People 2020 initiative17 and the WHO Health for All programme18 is to reduce health disparities between subpopulations. We used the shared component model to estimate non-Hispanic white and non-Hispanic black premature mortality risk for 3108 US counties. In addition, for each county, we calculated the racial/ethnic disparity in terms of risk ratios.

The shared component model decomposes geographic variation in premature mortality risk into three spatial ‘risk surfaces’. First, the shared surface represents unobserved covariates that may explain geographic variation for both blacks and whites. In addition, the model produces two race/ethnicity-specific risk surfaces, which represent the risk in each county that is specific to the corresponding racial/ethnic group. In this paper, we present a computationally feasible way of estimating the full shared component model on the entire contiguous USA. While this model is general enough for any pair of outcomes or groups, we present an application of the method to black/white disparities in premature mortality in the USA. We show that this nuanced decomposition of premature mortality risk at the county level could be used to inform health policy, create targeted intervention strategies and improve health while decreasing geographic and racial/ethnic disparities in US premature mortality.

Methods



We used the US National Center for Health Statistics’ compressed mortality files and accompanying population files19 to tabulate premature deaths by sex, race/ethnicity and 5-year age category (0–4, 5–9, …, 85+) for non-Hispanic whites and non-Hispanic blacks—hereafter referred to as ‘whites’ and ‘blacks’—from 2010 to 2015. Following previous studies,13 20 we conservatively defined premature mortality as death before age 65. The majority of counties observed fewer than 10 deaths among the black population (see online supplementary appendix figure 1). We calculated age/sex-standardised mortality ratios (SMRs) using the indirect method with the overall national premature mortality rate, including both non-Hispanic whites and non-Hispanic blacks, as the reference rate. Data are available by request using the National Vital Statistics System website (
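
As an illustration of the indirect method described above, the following minimal Python sketch computes an expected count and an SMR for a single county and group. The function name, the dictionary layout and the toy numbers are assumptions made for illustration only; they are not taken from the study's code or data.

```python
def indirect_smr(observed_deaths, stratum_population, reference_rates):
    """Return (expected, smr) for one county and one racial/ethnic group.

    observed_deaths    -- total observed premature deaths in the county
    stratum_population -- dict: (sex, age_group) -> person-years at risk
    reference_rates    -- dict: (sex, age_group) -> reference deaths per person-year
    """
    # Expected deaths if the county experienced the reference (national) rates.
    expected = sum(pop * reference_rates[stratum]
                   for stratum, pop in stratum_population.items())
    # The SMR is undefined when no deaths are expected.
    smr = observed_deaths / expected if expected > 0 else float("nan")
    return expected, smr


# Toy inputs (hypothetical, not study values).
reference_rates = {("F", "40-44"): 0.001, ("M", "40-44"): 0.002}
county_population = {("F", "40-44"): 5000, ("M", "40-44"): 4000}

expected, smr = indirect_smr(26, county_population, reference_rates)
print(expected, smr)
```

With very small populations the expected count is close to zero, so a single additional death can move the SMR substantially; this is the instability that motivates the smoothing model.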

We used the 5-year American Community Survey to extract 2015 median income in the previous 12 months (in 2015 dollars) of all households for each county.21 Income was mean-centred and scaled to $10 000 per unit. Four counties (0.1%) with missing values for median income were mean-imputed (see online supplementary appendix figure 2).
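
The covariate preparation described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: missing values are represented as `None` or NaN, imputation uses the mean of the observed counties, and the function name and toy values are hypothetical.

```python
import math


def prepare_income(median_income_usd):
    """Mean-impute missing values, then centre and scale to $10,000 units."""
    def is_missing(x):
        return x is None or math.isnan(x)

    observed = [x for x in median_income_usd if not is_missing(x)]
    mean_income = sum(observed) / len(observed)
    # Mean imputation: a missing county takes the mean of the observed counties.
    filled = [mean_income if is_missing(x) else x for x in median_income_usd]
    # After centring, one unit corresponds to a $10,000 difference from the mean.
    return [(x - mean_income) / 10_000 for x in filled]


# Toy values (hypothetical, not study data).
print(prepare_income([40_000.0, 60_000.0, None, 50_000.0]))
```

A mean-imputed county ends up with a centred value of zero, so it contributes no income effect to its own linear predictor.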

County adjacency was defined using the US Census Bureau’s 2010 County Adjacency File.22 Consistent with previous research,6 we limited our analysis to the contiguous USA and the District of Columbia.
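
A spatial prior needs the adjacency information in the form of a symmetric neighbour map. The sketch below assumes the adjacency file has already been parsed into (county, neighbour) pairs; the parsing step, the function name and the placeholder county codes are assumptions for illustration, not a description of the authors' pipeline.

```python
from collections import defaultdict


def build_neighbours(pairs):
    """Build a symmetric neighbour map from (county, neighbour) pairs."""
    neighbours = defaultdict(set)
    for county, other in pairs:
        if county == other:
            # Skip records that list a county as adjacent to itself, if present.
            continue
        neighbours[county].add(other)
        neighbours[other].add(county)  # enforce symmetry
    return {county: sorted(adj) for county, adj in neighbours.items()}


# Placeholder county codes (hypothetical).
pairs = [("A", "A"), ("A", "B"), ("B", "C"), ("C", "A")]
print(build_neighbours(pairs))
```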

Patient and public involvement

This study used de-identified data on deceased individuals and did not require patient or public involvement.


Shared component model

Our model builds on the Knorr-Held and Best7 shared component model. Specifically, assume we have observed and expected counts $O_{id}$ and $E_{id}$, respectively, for each race/ethnicity $d \in \{1, 2\}$ (white and black, respectively) within counties $i \in \{1, \dots, n\}$, which are nested in states $j$. The county-level SMR is $O_{id}/E_{id}$. The county-specific relative risks $\theta_{i1}$ and $\theta_{i2}$ are modelled hierarchically as a log-linear function of a Poisson distribution. Each race/ethnicity $d$ is modelled with group-specific intercepts $\alpha_d$, vectors of coefficients $\beta_d$, design matrices $X_d$ (with rows $X_{id}$) and spatial parameter $\psi_{id}$. Both racial/ethnic groups have a shared spatial parameter $\phi_i$, which is scaled by $\delta$. In addition to the county-level spatial parameters, there is an independent, normally distributed state random effect $\nu_j$, which represents our belief that multiple levels of governmental characteristics and policies can affect health.

$$O_{id} \sim \mathrm{Poisson}(E_{id}\,\theta_{id})$$

$$\log \theta_{i1} = \alpha_1 + X_{i1}\beta_1 + \delta\,\phi_i + \psi_{i1} + \nu_j$$

$$\log \theta_{i2} = \alpha_2 + X_{i2}\beta_2 + \phi_i/\delta + \psi_{i2} + \nu_j$$

The racial/ethnic-specific spatial components are assumed to be independent with $\psi_{i1}$ and $\psi_{i2}$ representing race/ethnicity-specific risk separately. Conversely, $\phi_i$ represents a latent risk surface that is shared between both blacks and whites. All three spatial components are assigned the Besag-York-Mollié (BYM) convolution prior. In a BYM prior, the spatial effect of a particular county is dependent on the effects of all its neighbours. In addition, the BYM includes an unstructured random effect to account for independent county-specific noise.23 The magnitude of the shared component may differ by the scaling factor $\delta$, which is estimated from the data and allows each race/ethnicity to have a different ‘risk gradient’. A simplified schematic of the shared component model is shown in online supplementary appendix figure 3.
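
The neighbour dependence in the structured part of a BYM prior can be made concrete with a short sketch. Assuming the intrinsic conditional autoregressive form commonly used for this component, the conditional distribution of one county's structured effect, given all others, is normal with mean equal to the average of its neighbours' effects and variance inversely proportional to its number of neighbours. The function names, county labels and values below are hypothetical.

```python
def icar_conditional(structured, neighbours, county, sd):
    """Conditional mean and SD of one county's structured effect given its neighbours."""
    adjacent = neighbours[county]
    mean = sum(structured[n] for n in adjacent) / len(adjacent)
    # More neighbours give a tighter conditional distribution.
    return mean, sd / len(adjacent) ** 0.5


def bym_effect(structured, unstructured, county):
    """BYM convolution: spatially structured effect plus independent county noise."""
    return structured[county] + unstructured[county]


# Toy example with three mutually adjacent counties (hypothetical values).
neighbours = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
structured = {"A": 0.0, "B": 0.2, "C": 0.4}
unstructured = {"A": 0.05, "B": -0.01, "C": 0.0}

print(icar_conditional(structured, neighbours, "A", sd=1.0))
print(bym_effect(structured, unstructured, "A"))
```

The structured term borrows strength from adjacent counties, while the unstructured term lets an individual county depart from its neighbours.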

Priors and hyperpriors

We assigned both intercepts a flat prior of Embedded Image . For the income-adjusted model, the coefficient of county median household income was assigned a weakly informative prior of Embedded Image . The inverse square of the scaling factor, $1/\delta^{2}$, can be interpreted as the risk ratio of the shared component. Accordingly, we assigned a prior of Embedded Image , which corresponds to our belief that the risk ratio of the shared component has a 95% probability of being in the interval Embedded Image and is consistent with previous research.6 7 We used weakly informative hyperpriors for the precision terms in the BYM components. Following Gelman,24 we reparameterised our precisions as SD with priors of Embedded Image .

Risk ratio

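For each county, we calculated the within-county black/white disparity in premature mortality on the relative scale, which we refer to as the ‘risk ratio’ to differentiate it from the area-specific relative risk $\theta_{id}$. Specifically, the log risk ratio was defined from the earlier equations as:

$$
\begin{aligned}
\log \mathrm{RR}_i &= \log(\theta_{i2}/\theta_{i1}) \\
&= \log\theta_{i2} - \log\theta_{i1} \\
&= (\alpha_2 - \alpha_1) + (X_{i2}\beta_2 - X_{i1}\beta_1) + \phi_i\,(1/\delta - \delta) + (\psi_{i2} - \psi_{i1})
\end{aligned}
$$

The state random effect $\nu_j$ is common to both groups within a county and therefore cancels.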
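
The risk ratio calculation can be sketched in a few lines of Python. This is a minimal illustration that assumes the usual shared component parameterisation, in which the shared surface is multiplied by the scaling factor for one group (here whites) and divided by it for the other (here blacks). That assignment, the function names and all numerical values are assumptions for illustration.

```python
import math


def log_relative_risk(intercept, covariate_effect, shared, scale, specific, state):
    """Log relative risk for one group in one county."""
    return intercept + covariate_effect + scale * shared + specific + state


def county_risk_ratio(white, black, shared, delta, state):
    """Black/white risk ratio for one county.

    white, black -- dicts with keys 'intercept', 'covariate_effect', 'specific'
    shared       -- value of the shared spatial surface in this county
    delta        -- scaling factor (assumed: delta for whites, 1/delta for blacks)
    state        -- state random effect, common to both groups
    """
    log_white = log_relative_risk(white["intercept"], white["covariate_effect"],
                                  shared, delta, white["specific"], state)
    log_black = log_relative_risk(black["intercept"], black["covariate_effect"],
                                  shared, 1 / delta, black["specific"], state)
    return math.exp(log_black - log_white)


# Hypothetical values for a single county.
white = {"intercept": 0.0, "covariate_effect": -0.10, "specific": 0.05}
black = {"intercept": 0.1, "covariate_effect": -0.08, "specific": 0.20}
print(county_risk_ratio(white, black, shared=0.3, delta=1.2, state=0.4))
```

Because the state effect enters both log relative risks identically, changing `state` leaves the risk ratio unchanged.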


All analyses were conducted using the statistical programming language R V.3.3.225 with the RStan V.2.14.1 package.26 The split-chain Gelman-Rubin $\hat{R}$ diagnostic27 was used to check convergence. All parameters had Embedded Image 0, indicating convergence with adequate effective sample size. Due to computational limitations, we did not conduct formal sensitivity analyses of model parameters and hyperparameters.
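
For readers unfamiliar with the split-chain diagnostic, the following pure-Python sketch shows the standard calculation for a single scalar parameter: each chain is split in half, and the pooled variance estimate is compared with the average within-chain variance. It is an illustration of the general technique, not the RStan implementation; the function name and toy chains are hypothetical, and chains are assumed to have equal length with at least four draws each.

```python
from statistics import mean, variance


def split_rhat(chains):
    """Split-chain potential scale reduction factor for one scalar parameter.

    chains -- list of equal-length lists of posterior draws, one list per chain
    """
    halves = []
    for chain in chains:
        half = len(chain) // 2
        halves.append(chain[:half])                # first half
        halves.append(chain[len(chain) - half:])   # second half (middle draw dropped if odd)
    n = len(halves[0])
    half_means = [mean(h) for h in halves]
    within = mean([variance(h) for h in halves])   # average within-chain variance
    between = n * variance(half_means)             # between-chain variance
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


# Two toy chains exploring the same region, and two that disagree.
agreeing = [[0, 1, 0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 1, 0, 1, 0]]
disagreeing = [[0, 1, 0, 1, 0, 1, 0, 1], [10, 11, 10, 11, 10, 11, 10, 11]]
print(split_rhat(agreeing), split_rhat(disagreeing))
```

Values close to 1 indicate that the chains agree with one another; values well above 1 indicate that they have not mixed.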

To quantify the between-county variation, we calculated the empirical variance for each risk surface. Similarly, we calculated the empirical variance of the overall (log) county-specific relative risks and the fraction of total spatial empirical variation shared by both racial/ethnic groups.
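
One way to compute these quantities is sketched below. The definition used here, in which the shared fraction is the variance of the (scaled) shared surface divided by the sum of the shared and group-specific variances, evaluated for each posterior draw and then summarised by the median, is an assumption made for illustration; the text above does not spell out the exact formula. Function names and values are hypothetical.

```python
from statistics import median, pvariance


def shared_fraction(shared_surface, specific_surface):
    """Fraction of between-county variance due to the shared surface, for one draw.

    shared_surface   -- shared log risk per county, already on the group's scale
    specific_surface -- group-specific log risk per county
    """
    v_shared = pvariance(shared_surface)      # empirical variance across counties
    v_specific = pvariance(specific_surface)
    return v_shared / (v_shared + v_specific)


def summarise(draws):
    """Posterior median of the shared fraction over (shared, specific) draws."""
    return median([shared_fraction(s, g) for s, g in draws])


# Three toy posterior draws over three counties (hypothetical values).
draws = [
    ([-0.2, 0.0, 0.2], [-0.1, 0.0, 0.1]),
    ([-0.1, 0.0, 0.1], [-0.1, 0.0, 0.1]),
    ([-0.3, 0.0, 0.3], [-0.1, 0.0, 0.1]),
]
print(summarise(draws))
```

Computing the fraction within each draw, rather than from posterior means, carries the posterior uncertainty through to the reported interval.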

We estimated the posterior probability of excess relative risk greater than 1 or reduced relative risk less than 1 for each region, each surface and for the spatially varying disparity. Consistent with previous literature,28 we used a cutpoint of 80%, which simulation studies have shown to have adequate sensitivity and specificity,29 and only plot counties that meet this cutpoint. Estimates for all counties and their posterior probability are shown in the online supplementary appendix figures 4 and 5.
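
The exceedance rule can be illustrated with a short sketch. It assumes that the posterior probability is estimated as the proportion of posterior draws above (or below) 1 and that a county is flagged when that proportion is greater than the cutpoint; the function name, labels and draws are hypothetical.

```python
def classify_county(relative_risk_draws, cutpoint=0.80):
    """Flag a county as elevated or reduced risk from its posterior draws."""
    n = len(relative_risk_draws)
    p_above = sum(rr > 1 for rr in relative_risk_draws) / n
    p_below = sum(rr < 1 for rr in relative_risk_draws) / n
    if p_above > cutpoint:
        return "elevated", p_above
    if p_below > cutpoint:
        return "reduced", p_below
    return "not flagged", max(p_above, p_below)


# Toy posterior draws for three counties (hypothetical values).
print(classify_county([1.2] * 9 + [0.9]))
print(classify_county([0.8] * 9 + [1.1]))
print(classify_county([1.1] * 5 + [0.9] * 5))
```

Counties that are not flagged are not necessarily average; their posterior distributions are simply too diffuse to place them clearly on either side of 1.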


Results

Standardised mortality ratios and county-specific relative risks

From 2010 to 2015, 2.85 million non-Hispanic white premature deaths and 762 639 non-Hispanic black premature deaths were recorded from observed populations of 995.5 million and 220.9 million, respectively (table 1). The spread of county-level premature mortality risk for blacks was greater than that of whites in both the raw data (SMRs) and modelled relative risk estimates (table 1). High white SMRs are observed in Appalachia, Oklahoma and other southern states, as well as parts of Nevada and Arizona (figure 1). In the income-adjusted model, the 95th percentile county had three times higher white relative risk than the 5th percentile county (table 1). High black relative risks are observed in the South, Mid-Atlantic region and parts of California and Nevada (figure 1). In the income-adjusted model, the 95th percentile county had 5.6 times higher black relative risk than the 5th percentile county (table 1). Model parameters are shown in online supplementary appendix table 1.

Figure 1

County-level premature mortality risk by race/ethnicity after sex- and age-standardisation. The top row is the unsmoothed (raw) standardised mortality ratio. The middle row is smoothed county-specific relative risk with no income adjustment. The bottom row is smoothed county-specific relative risk and adjusted for county-level median household income.

Table 1

Distribution of counts of deaths and population for non-Hispanic blacks and non-Hispanic whites under 65, and county-level standardised mortality ratios (SMRs) by race/ethnicity for the USA, 2010–2015. Modelled SMRs are county-level posterior median values with and without adjusting for county median household income. All SMRs are in reference to the total US population after sex/age-standardisation

County median household income was associated with lower premature mortality rates for both whites and blacks. Specifically, a $10 000 increase in median household income corresponded to within-race rate ratio of 0.88 (95% uncertainty interval (UI): 0.87 to 0.88) and 0.91 (95% UI: 0.90 to 0.92) for whites and blacks, respectively. After adjusting for county median household income, the between-county empirical variance in (log) relative risk was reduced. Specifically, the between-county empirical variance in white (log) relative risk was reduced 50% from 0.08 (95% UI: 0.07 to 0.08) to 0.04 (95% UI: 0.03 to 0.04). On the other hand, adjustment for county median household income reduced the larger between-county variance in the black (log) relative risk by only 14%, from 0.14 (95% UI: 0.13 to 0.16) to 0.12 (95% UI: 0.11 to 0.13). The between-state variance remained small (0.00; 95% UI: 0.00 to 0.01) in both models.
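
Because the model is log-linear, the reported rate ratio per $10 000 scales multiplicatively with the size of the income difference. The sketch below illustrates this, together with the percentage-change arithmetic used for the variance comparisons; the function names are hypothetical, and percentages recomputed from the rounded posterior medians will only approximately match those derived from unrounded values.

```python
def rate_ratio(rate_ratio_per_10k, income_difference_usd):
    """Rate ratio implied by a log-linear income effect for a given difference."""
    return rate_ratio_per_10k ** (income_difference_usd / 10_000)


def percent_change(before, after):
    """Percentage change from `before` to `after` (negative means a reduction)."""
    return 100 * (after - before) / before


# White rate ratio for a $20,000 difference in county median household income.
print(round(rate_ratio(0.88, 20_000), 4))

# Change in between-county variance of log relative risk after income adjustment.
print(round(percent_change(0.08, 0.04)))  # whites
print(round(percent_change(0.14, 0.12)))  # blacks
```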

Spatial components

Geographic variation of shared risk

Before adjusting for county median household income, patches of high shared risk exist in the South and around Appalachia (figure 2, middle row). Meanwhile, the Midwest and Pacific Northwest have areas of low shared risk; however, after accounting for income, the shared risk is attenuated (online supplementary appendix figure 6) such that only Appalachia and parts of Texas remain at an elevated shared risk. After income adjustment, the between-county variance of the shared (log) risk decreased 33% from 0.03 (95% UI: 0.02 to 0.05) to 0.02 (95% UI: 0.01 to 0.03).

Figure 2

County-level white-specific (top), shared (middle) and black-specific (bottom) premature mortality risk before (left) and after (right) adjusting for county median household income. Both models use sex/age-standardised rates. Non-significant counties are grey. Significant counties are defined as counties with greater than 80% of posterior estimates above or below 1.

Geographic variation of white-specific risk

The white-specific risk in the unadjusted model shows significant excess risk in Appalachia and parts of the South and Nevada (figure 2, top row) as well as lower risk in the Midwest and coastal Mid-Atlantic states. However, after adjusting for county median household income, there is substantial attenuation of white-specific risk (online supplementary appendix figure 6) with some elevated risk still present in Appalachia and the lower risk in the Mid-Atlantic no longer significant. The between-county variance of the white-specific log risk decreased 60% from 0.05 (95% UI: 0.03 to 0.06) to 0.02 (95% UI: 0.00 to 0.03). The fraction of the total geographic variation of premature mortality risk among whites that is shared with blacks increased after adjustment for county median income, from 30% (95% UI: 13% to 36%) to 52% (95% UI: 13% to 88%) (table 2).

Table 2

Posterior median (95% uncertainty interval) for variance components. The adjusted model takes into account county median household income while the unadjusted model does not. Both models are based on sex/age SMRs

Geographic variation of black-specific risk

In the unadjusted model, the black-specific risk is elevated in the South, Mississippi Delta and parts of California (figure 2, bottom row). There is reduced risk in the Northeast, Colorado and northern Michigan. However, unlike the shared and white-specific risks, the geographic variation in the black-specific risk was amplified after adjusting for county median household income (online supplementary appendix figure 6), with the between-county variance increasing 29% from 0.07 (95% UI: 0.03 to 0.11) to 0.09 (95% UI: 0.05 to 0.11). Specifically, high-risk counties in the South become even higher risk while low-risk counties of the Midwest and West become even lower risk (figure 2) with more high-risk areas in Wisconsin and California. The fraction of total geographic variation of black risk that is shared with whites decreased after adjustment for county median household income from 42% (95% UI: 23% to 81%) to 15% (95% UI: 6% to 57%) (table 2).

County-specific black/white risk ratio

The within-county black/white risk ratio ranged from a minimum of 0.33 (95% UI: 0.20 to 0.44) in both models to a maximum of 5.07 (95% UI: 4.77 to 5.40) in the model without county median household income and 4.56 (95% UI: 4.24 to 4.89) in the model with income adjustment. Counties in the South, California and the Mid-Atlantic region had statistically significantly higher premature mortality risk for blacks than for whites. Conversely, counties in Appalachia, Nevada and parts of Colorado showed statistically significantly lower premature mortality risk for blacks than whites. The county-level risk ratio remained largely unchanged after adjusting for county median household income (figure 3 and online supplementary appendix figure 7), despite substantial changes in the race/ethnicity-specific and shared components. The global disparity (ie, the spatially invariant difference in risk) was only a small part of the county-level risk ratio at 1.10 (95% UI: 1.08 to 1.12) for both the income-adjusted and unadjusted models.

Figure 3

County-level risk ratio (black/white) of premature mortality before (top) and after (bottom) adjusting for county median household income. Both models use sex/age-standardised rates. Non-significant counties are grey. Significant counties are defined as counties with greater than 80% of posterior estimates above or below 1.

Discussion


Within the contiguous USA, there is considerable geographic variation in premature mortality for both whites and blacks. For reference, the fifth percentile of county premature mortality rates for both whites and blacks is approximately 30% higher than the average premature mortality rate in high-income European countries such as Norway, the Netherlands and Luxembourg (approximately 180 vs 140 per 100,000; see online supplementary appendix table 2 for a list of premature mortality rates across various countries). Conversely, the 95th percentile county premature mortality rate for whites is similar to the average rate in Bulgaria or Latvia (approximately 330 per 100,000); for blacks, the 95th percentile rate is similar to post-Soviet nations such as Belarus, Ukraine and Lithuania (approximately 420 per 100,000; see online supplementary appendix table 2). In addition, there is large geographic variation in the county-level racial/ethnic disparity between these two groups. Both within-race geographic variation and within-county racial/ethnic disparity persist after adjusting for county median household income. This geographic variation is masked when aggregating over larger areas and cannot be directly estimated for many counties with small populations; however, the shared component model offers a valuable method of smoothing data across both geography and race/ethnicity simultaneously. This method allows for reliable estimates of premature mortality by race/ethnicity, as well as estimates of the disparities between groups. Further, the shared component model allows for premature mortality risk to be decomposed into county-level risk specific to each race/ethnicity as well as the county-level risk shared between the two groups.

Before interpreting our results, it is important to note the limitations of this study. First, we refrain from making any causal interpretations due to the ecological nature of the study and the lack of temporal data. This descriptive model is designed to inform policy, generate hypotheses and predict areas of risk, but is not suited for causal inference. This is especially important in regard to attempts to interpret the impact of county median household income. Our finding that county income explains more white risk than black risk is, at least in part, likely due to whites comprising a larger proportion of the population and thus county median household income is more correlated with white income. Finally, it is possible that the composition of people within counties changed substantially over the 6-year period. However, despite these limitations, we believe this study demonstrates the value of the shared component model in health disparities research.

Consistent with the literature, our results suggest that whites experience excess premature mortality in Appalachia,30–32 while blacks experience excess premature mortality in the South.1 5 33 The race/ethnicity-specific components reflect spatially patterned risk factors that are not shared with the other group. Thus, for whites, this heightened risk in Appalachia likely reflects noted issues of low income and education, geographic isolation, reduced access to care and environmental factors.34 Similarly, the black-specific component in the South is consistent with the research about the lasting impact of slavery and racism, differences in opportunity structure, black-specific experience of county-level poverty and socioeconomic conditions, and differential access to care.13 35–37 In addition, the shared component describes areas that are at higher or lower risk for both blacks and whites, simultaneously. Even after adjusting for county median household income, much of the South has increased premature mortality risk for both whites and blacks while the Midwest has decreased risk for both.

In addition, our results suggest that county median household income explains more of the county-level white risk than black risk. Furthermore, we find that half of the geographic variation in white risk is shared with blacks, while only 15% of the geographic variation in the black risk is shared with whites. That is, after adjusting for county median household income, the majority of geographic variation in premature mortality risk for blacks is not shared with whites. Further research should be conducted to understand sources of county-level variability in black risk. Finally, we found between-state variation to be small relative to between-county variation in both models, again reiterating the importance of estimating geographic variation at the local level.

There is wide geographic variation in the racial/ethnic disparity of premature mortality risk. Specifically, the county-level black/white risk ratio estimates ranged from 0.33 (95% UI: 0.20 to 0.44) to 4.56 (95% UI: 4.24 to 4.89) in the income-adjusted model. However, despite substantial changes in the shared and race/ethnicity-specific surfaces after adjusting for county median household income, we find almost no change in the spatial pattern of the black/white risk ratios of premature mortality before and after adjustment. Other aspects of structural racism, such as racial residential and occupational segregation, could plausibly contribute to the inequalities not accounted for solely by median household income.12

This study is the first to jointly model premature mortality risk in non-Hispanic blacks and non-Hispanic whites. The joint modelling approach identified counties with higher or lower risk unique to blacks or whites as well as counties with shared risk, despite small counts of the black population and deaths in many counties. This decomposition of risk, in addition to more precise estimates in small populations, suggests joint spatial modelling in general, and the shared component model specifically, may be useful tools for researchers to measure the impact of interventions, inform policy and generate new hypotheses in studying health disparities across geography and sociodemographic characteristics. This nuanced decomposition of risk may be a powerful tool as researchers grapple with disentangling neighbourhood contextual effects and sociodemographic compositional effects of an area when evaluating health outcomes.


We would like to thank Brent Coull, Alexander Tsai and Monica Alexander for valuable comments and feedback. MK was previously at Harvard T.H. Chan School of Public Health where the majority of this work was conducted. MK is currently at Stanford University School of Medicine.



  • Twitter @mathewkiang, @Caroline_OF_B, @jponnela

  • Correction notice This article has been corrected since it was published. Equation on page 3 is updated.

  • Contributors MK and JC designed the study, acquired the data and developed the statistical plan. MK did the data analysis. MK, JPO, NK, CB and JC interpreted the results and suggested critical additional analyses. MK prepared the initial draft of the manuscript, tables and figures. MK, JPO, NK, CB and JC provided critical revisions in successive drafts. MK, JPO, NK, CB and JC reviewed the manuscript and approved of the version to be published.

  • Funding JC and MK were supported through a Nodal Award from Dana Farber/Harvard Cancer Center (P30CA006516). JC was funded in part by the National Institute on Child Health and Human Development of the National Institutes of Health (R01HD092580). MK received support from the National Institute on Minority Health and Health Disparities of the National Institutes of Health (DP2MD010478). The content is solely the responsibility of the authors and does not necessarily reflect the official views of the funders.

  • Map disclaimer The depiction of boundaries on the map(s) in this article do not imply the expression of any opinion whatsoever on the part of BMJ (or any member of its group) concerning the legal status of any country, territory, jurisdiction or area or of its authorities. The map(s) are provided without any warranty of any kind, either express or implied.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data may be obtained from a third party and are not publicly available.
