Original research
Racial and ethnic disparities in SARS-CoV-2 pandemic: analysis of a COVID-19 observational registry for a diverse US metropolitan population
  1. Farhaan S Vahidy1,2,
  2. Juan Carlos Nicolas1,
  3. Jennifer R Meeks1,
  4. Osman Khan1,
  5. Alan Pan1,
  6. Stephen L Jones1,3,4,
  7. Faisal Masud3,5,
  8. H Dirk Sostman3,6,7,
  9. Robert Phillips1,3,8,
  10. Julia D Andrieni3,9,
  11. Bita A Kash1,3,10,
  12. Khurram Nasir1,8
  1. Center for Outcomes Research, Houston Methodist Research Institute, Houston, Texas, USA
  2. Houston Methodist Neurological Institute, Houston, Texas, USA
  3. Weill Cornell Medicine, New York, New York, USA
  4. Department of Surgery, Houston Methodist Hospital, Houston, TX, United States
  5. Department of Anesthesiology and Critical Care, Houston Methodist Hospital, Houston, TX, United States
  6. Houston Methodist Research Institute, Houston, Texas, USA
  7. Houston Methodist Academic Institute, Houston, TX, United States
  8. Department of Cardiology, DeBakey Heart and Vascular Center, Houston Methodist Hospital, Houston, TX, United States
  9. Department of Medicine, Houston Methodist Hospital, Houston, TX, United States
  10. Texas A&M University School of Rural Public Health, College Station, Texas, USA
  1. Correspondence to Dr Farhaan S Vahidy; fvahidy{at}


Introduction Data on race and ethnic disparities for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection are limited. We analysed sociodemographic factors associated with higher likelihood of SARS-CoV-2 infection and explored mediating pathways for race and ethnic disparities in the SARS-CoV-2 pandemic.

Methods This is a cross-sectional analysis of the COVID-19 Surveillance and Outcomes Registry, which captures data for a large healthcare system, comprising one central tertiary care hospital, seven large community hospitals and an expansive ambulatory/emergency care network in the Greater Houston area. Nasopharyngeal samples for individuals inclusive of all ages, races, ethnicities and sex were tested for SARS-CoV-2. We analysed sociodemographic (age, sex, race, ethnicity, household income, residence population density) and comorbidity (Charlson Comorbidity Index, hypertension, diabetes, obesity) factors. Multivariable logistic regression models were fitted to provide adjusted OR (aOR) and 95% CI for likelihood of a positive SARS-CoV-2 test. Structural equation modelling (SEM) framework was used to explore three mediation pathways (low income, high population density, high comorbidity burden) for the association between non-Hispanic black (NHB) race, Hispanic ethnicity and SARS-CoV-2 infection.

Results Among 20 228 tested individuals, 1551 (7.7%) tested positive. The overall mean (SD) age was 51.1 (19.0) years, 62% were female, 22% were black and 18% were Hispanic. NHB race and Hispanic ethnicity were associated with lower socioeconomic status and higher population density residence. In the fully adjusted model, NHB race (vs non-Hispanic white; aOR, 2.23, CI 1.90 to 2.60) and Hispanic ethnicity (vs non-Hispanic; aOR, 1.95, CI 1.72 to 2.20) were associated with a higher likelihood of infection. Older individuals and males were also at higher risk of infection. The SEM framework demonstrated a significant indirect effect of NHB race and Hispanic ethnicity on SARS-CoV-2 infection mediated via a pathway including residence in a densely populated zip code.

Conclusions There is strong evidence of race and ethnic disparities in the SARS-CoV-2 pandemic that are potentially mediated through unique social determinants of health.

  • epidemiology
  • public health
  • infectious diseases

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Strengths and limitations of this study

  • This is one of the first studies to systematically evaluate race and ethnic disparities in susceptibility to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection, while accounting for multiple sociodemographic characteristics and comorbidities.

  • The study population represents a large and diverse US metropolitan area, with data from one of the largest healthcare providers across the greater metropolitan area.

  • The study evaluates potential mediation pathways for race disparities and demonstrates that residence in areas with high population density may mediate race and ethnic disparities in susceptibility to SARS-CoV-2 infection.

  • This is a single-centre study with limited information on burden of comorbidity and lifestyle factors.


COVID-19, caused by infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a pandemic that has thus far resulted in over 9.5 million cases globally in under 6 months. At the time of this reporting, the USA has approximately 25% of the total global cases and has surpassed all countries in terms of absolute number of cases, cases per one million population and fatalities.1 2 Experts project these numbers to continue rising as widespread testing is instituted and newer patterns of infectivity emerge. The geographical distribution of cases across the USA demonstrates that the predominant pandemic burden hits major metropolitan areas. However, cases of COVID-19 have been reported across all 50 states, the District of Columbia, Guam, Puerto Rico, the Northern Mariana Islands and the US Virgin Islands.3 As of 31 May 2020, the state of Texas has 64 287 reported cases of COVID-19, with about one-third in the Greater Houston area.4 The Greater Houston area is home to approximately seven million individuals, is the fourth largest metropolitan area by population in the USA and is considered one of the nation’s most diverse regions.5 6

Initial reports indicate that specific individuals such as the elderly, males and people with comorbidities, including hypertension, diabetes, obesity, coronary artery disease and heart failure, have poor COVID-19 outcomes.7–10 As the pandemic spread over the continental USA during the last 4 months, patterns of high-risk phenotypes started to emerge and reports of poor outcomes (particularly high case fatality) among racial minorities surfaced.11–13 Although it is important to understand the determinants of poor outcomes among patients with COVID-19, it is equally imperative, from a public health perspective, to systematically examine the likelihood of SARS-CoV-2 infection across large diverse communities in the USA. Data on higher likelihood of SARS-CoV-2 infection among racial and ethnic minorities across diverse US metropolitan areas are limited. Furthermore, the mediators of SARS-CoV-2 infection among racial and ethnic minorities have not been described.

We explored sociodemographic characteristics such as age, sex, race, ethnicity, median household income by zip codes, population density of residents’ zip codes and health insurance status associated with positive SARS-CoV-2 testing in an urban and diverse population served by one of the leading healthcare systems of the Greater Houston area. We further examined the association between pre-existing comorbidities and higher likelihood of SARS-CoV-2 infection in our study population. We hypothesised that older age and racial and ethnic minority status would be associated with significantly higher likelihood of SARS-CoV-2 infection, and that factors such as low socioeconomic status, residence in high population density areas (proxy for potential difficulties in social distancing) and higher comorbidity burden would mediate the effect of race and ethnicity on SARS-CoV-2 infection.


We analysed data between 5 March and 31 May 2020 collected as part of the COVID-19 Surveillance and Outcomes Registry (CURATOR) at Houston Methodist (HM). The CURATOR, designed and managed by the big data team at the Center for Outcomes Research at HM, is populated from multiple data sources across the HM system, such as electronic medical records, electronic databanks for laboratory and pharmacy, and electronic interactive patient interface tools. The HM system comprises a flagship tertiary care hospital in the Texas Medical Center, seven large community hospitals, a continuing care hospital, and multiple emergency centres and clinics throughout the Greater Houston area. Data from various sources are curated into a harmonised format, assessed for quality and integrity, and stored on a secure institutional Health Insurance Portability and Accountability Act (HIPAA)-compliant server.

We flagged all individuals who were tested for SARS-CoV-2 using real-time reverse transcriptase PCR diagnostic panels. The three cross-validated PCR tests used were the WHO nucleic acid amplification test, Panther Fusion SARS-CoV-2 Assay and Cepheid Xpert Xpress SARS-CoV-2 Assay. These assays were verified for quantitative detection of novel SARS-CoV-2 isolated and purified from nasopharyngeal swab specimens obtained from individuals and immersed in universal transport medium. Testing was carried out for symptomatic individuals or for individuals who had a self-reported history of exposure to a COVID-19 case, including recent travel to other countries with high infection rates or hotspots within the USA.

Sociodemographic characteristics including age, sex, race, ethnicity and payer status (insurance type) were obtained from the HM CURATOR for analyses. We also extracted information on the presence of the comorbidities that make up the Charlson Comorbidity Index (CCI), which includes history of myocardial infarction, congestive heart failure, peripheral vascular disease, cerebrovascular disease, dementia, chronic pulmonary disease, rheumatic disease, peptic ulcer disease, liver disease, diabetes with or without complications, hemiplegia, renal disease, any malignancy (excluding skin neoplasms), metastatic solid tumours and AIDS/HIV. Data on hypertension and obesity were additionally obtained. We used the US Census Bureau’s American Community Survey 5-year data (2014–2018) to determine the median household income by individual zip code tabulation area (ZCTA).14 The median ZCTA household income was inflation adjusted to 2018 US dollars. We also used the same data source to obtain population estimates by ZCTA, and calculated ZCTA-level population density (population per square mile) by standardising the population estimates to the land area of each ZCTA. For the purpose of population density determination, land area estimates were obtained from the Census Bureau’s US Gazetteer Files 2010.15 In the absence of granular and precise social distancing data, we used population density as a proxy for potential difficulties in social distancing among crowded communities.
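The population density calculation described above can be illustrated with a minimal sketch. It assumes the land area is supplied in square metres and converts it to square miles; the function name and the example ZCTA values are hypothetical and are not taken from the registry or from the Census files.

```python
# Exact conversion factor: one international mile is 1609.344 m.
SQ_METRES_PER_SQ_MILE = 1609.344 ** 2


def population_density(population: int, land_area_sq_m: float) -> float:
    """Return residents per square mile for a single ZCTA."""
    if land_area_sq_m <= 0:
        raise ValueError("land area must be positive")
    land_area_sq_mi = land_area_sq_m / SQ_METRES_PER_SQ_MILE
    return population / land_area_sq_mi


# Hypothetical ZCTA: 35 000 residents on 5 square miles of land.
example_area_sq_m = 5 * SQ_METRES_PER_SQ_MILE
density = population_density(35_000, example_area_sq_m)
print(f"{density:.0f} residents per square mile")
```

Using land area rather than total area keeps water bodies from diluting the density estimate, which is in line with the land area estimates mentioned in the text.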

We provide descriptive summary data as mean (SD) and proportion. We fitted univariable and multivariable logistic regression models to assess unadjusted and adjusted associations between sociodemographic characteristics and the likelihood of testing positive for SARS-CoV-2. We additionally provide univariable comparisons of sociodemographic and comorbidity variables between non-Hispanic black (NHB) and non-Hispanic white (NHW) race categories, as well as between Hispanic and non-Hispanic ethnic groups. Age, income, population density and CCI were categorised for certain analyses. We included age, sex, race, ethnicity, zip code household income, insurance type, zip code population density and CCI in our initial multivariable model. Zip code household income, zip code population density and CCI were evaluated as mediators. Factors demonstrating mediation were excluded from the final models, whereas factors that did not demonstrate mediation were retained, as we believe that they continue to importantly inform the variance of estimates for direct effects.16 We assessed model fit using the Hosmer-Lemeshow goodness of fit test, and report crude OR, adjusted OR (aOR) and 95% CI. Postestimation marginal probabilities of SARS-CoV-2 infection were determined from the final adjusted model for major covariates (race, ethnicity and age).

We explored the mediating influence of comorbidity burden (CCI), socioeconomic status (median income) and lack of social distancing (population density) on the relationship of black race and Hispanic ethnicity with high likelihood of SARS-CoV-2 infection using the generalised structural equation modelling (GSEM) framework. The GSEM framework was set up to provide estimates of direct and indirect effects of black race and Hispanic ethnicity on SARS-CoV-2 infection. A statistically significant (p<0.05) indirect effect was interpreted as full or partial mediation by the tested covariate.

We included all individuals tested for SARS-CoV-2 across our healthcare system and did not perform formal sample size calculations.
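As an illustration of the unadjusted associations described above, a univariable logistic regression on a single binary characteristic yields the same OR and Wald interval as the corresponding 2×2 table. The sketch below works from such a table; the counts are invented for illustration and do not come from the registry, and the function name is hypothetical.

```python
import math


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Crude OR and Wald CI from a 2x2 table.

    a: exposed, test positive      b: exposed, test negative
    c: unexposed, test positive    d: unexposed, test negative
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    log_or = math.log((a * d) / (b * c))
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )


# Invented counts, for illustration only.
or_, lo, hi = odds_ratio_wald(a=120, b=880, c=80, d=920)
print(f"OR {or_:.2f}, CI {lo:.2f} to {hi:.2f}")
```

The adjusted estimates reported in the Results require a multivariable model fitted to individual-level data and cannot be reproduced from a single table in this way.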

Patient and public involvement

There was no direct patient or public involvement in the design and conduct of this study.


Sociodemographic and comorbidity characteristics of the study population

Across the time period of analysis, we identified a total of 20 228 presumed cases tested for SARS-CoV-2, among whom 1551 (7.7%, CI 7.3 to 8.0) tested positive. Overall, the mean (SD) age of the study population was 51.1 (19.0) years; 61.9% were female and 62.3% were white (including Hispanic ethnicity). The study sample was comparable with the overall population of patients treated across HM, who have a mean (SD) age of 49.0 (22) years, are 56% female and 53% white. The HM system metrics were derived from a sample of 3 216 290 patients managed across the system since 22 May 2016.
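The overall positivity estimate can be checked with a short calculation. The text does not state which interval method was used; the sketch below assumes a normal-approximation (Wald) interval, which reproduces the reported figures to one decimal place.

```python
import math


def wald_ci(successes: int, n: int, z: float = 1.96):
    """Proportion with a normal-approximation (Wald) confidence interval."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


# Counts reported in the text: 1551 positives among 20 228 tested.
p, lo, hi = wald_ci(1551, 20228)
print(f"{100 * p:.1f}% (CI {100 * lo:.1f} to {100 * hi:.1f})")
# prints "7.7% (CI 7.3 to 8.0)"
```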

The overall median (IQR) household income was US$70 658 (US$53 313–US$99 276), and 42.6% of the study population had private or employer-based insurance. In our univariable analysis, black race (vs white; OR, 1.55, CI 1.37 to 1.75), Hispanic ethnicity (vs non-Hispanic; OR, 2.02, CI 1.79 to 2.27) and male sex (vs female; OR, 1.17, CI 1.06 to 1.31) were associated with significantly higher likelihood of testing positive for SARS-CoV-2. Among the SARS-CoV-2-positive patients, 40.8% were in the 51–75 years age category and 11.4% were older than 75 years. Individuals in both categories were significantly more likely to test positive than those in the reference group (up to 35 years; for 51–75 years vs up to 35 years: OR 1.29, CI 1.12 to 1.48; for >75 years vs up to 35 years: OR 1.23, CI 1.02 to 1.49). Furthermore, individuals in higher pentiles of socioeconomic status had significantly lower likelihood, whereas those residing in higher population density ZCTA had higher likelihood of SARS-CoV-2 infection. We observed a significantly higher proportion of SARS-CoV-2-positive individuals in the CCI 1–2 category compared with CCI of 0 (OR, 1.35, CI 1.18 to 1.54). However, similar differences for higher CCI categories were not observed. For specific comorbidities, a significantly greater proportion of individuals with diabetes had SARS-CoV-2-positive results (OR, 1.40, CI 1.24 to 1.57). The sociodemographic characteristics and comorbidity profiles for the overall and SARS-CoV-2 positive and negative patients are summarised in table 1.

Table 1

Summary measures and univariable association of sociodemographic characteristics with SARS-CoV-2 infection from HM CURATOR

Sociodemographic and comorbidity characteristics associated with minority race and ethnicity

In our study sample comprising 13 754 non-Hispanic black and white individuals, we compared the association between race and various sociodemographic and comorbidity characteristics (table 2). Similarly, we also evaluated univariable differences for sociodemographic variables and comorbidities between Hispanic and non-Hispanic individuals (table 3). Minority race (NHB) and ethnicity (Hispanic) were both associated with younger age, higher proportion of females, and residence in low income and higher population density ZCTA. However, NHB and Hispanic groups were both associated with an overall lower burden of comorbidities (as demonstrated by significantly lower median CCI) compared, respectively, with NHW and non-Hispanic categories. A higher proportion of individuals among minority race and ethnicity were diabetic and a higher proportion of NHB were also hypertensive compared with NHW.

Table 2

Univariable comparison of sociodemographic and comorbidity factors between NHB and NHW race categories

Table 3

Univariable comparison of sociodemographic and comorbidity factors between Hispanic and non-Hispanic ethnicities

Multivariable model and marginal probabilities for likelihood of SARS-CoV-2 infection and racial and ethnic minorities

The significantly higher likelihood of SARS-CoV-2 infection among minority race and ethnic groups persisted after controlling for other demographics, insurance type, median household income, population density and comorbidities. The aOR (CI) for NHB versus NHW was 2.23 (1.90 to 2.60) and for Hispanic versus non-Hispanic was 1.95 (1.72 to 2.20). The higher risk of infection among males (compared with females) and the higher likelihood of SARS-CoV-2 infection among older individuals also remained statistically significant. Detailed outputs of the fully adjusted logistic regression models for minority race and ethnic groups are presented in table 4. Based on the marginal probabilities obtained from our fully adjusted model, the probability of SARS-CoV-2 infection in a 45-year-old NHB individual is 9.6%, whereas it is 4.5% in a 45-year-old NHW individual, all other adjusted variables being constant. At the age of 75, this probability is 14.0% for an NHB individual and 6.9% for an NHW individual. A similar differential was observed for Hispanic versus non-Hispanic individuals. Multivariable model-derived probabilities of SARS-CoV-2 infection for NHB versus NHW and for Hispanic versus non-Hispanic across the age spectrum are presented in figures 1 and 2.
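The marginal probabilities above can be related back to the adjusted OR by converting each probability to odds and taking the ratio. The following sketch does this for the reported values; because the probabilities are rounded and marginal estimates depend on how the other covariates are held fixed, an approximate rather than exact match with the aOR of 2.23 is expected.

```python
def odds(p: float) -> float:
    """Convert a probability to odds."""
    if not 0 < p < 1:
        raise ValueError("probability must lie strictly between 0 and 1")
    return p / (1 - p)


def implied_odds_ratio(p_exposed: float, p_reference: float) -> float:
    """Odds ratio implied by two probabilities."""
    return odds(p_exposed) / odds(p_reference)


# Reported marginal probabilities, NHB vs NHW.
or_age_45 = implied_odds_ratio(0.096, 0.045)
or_age_75 = implied_odds_ratio(0.140, 0.069)
print(f"age 45: {or_age_45:.2f}, age 75: {or_age_75:.2f}")
```

Both implied ratios fall close to the reported aOR, while the absolute gap in probability widens with age (5.1 percentage points at 45 years vs 7.1 at 75 years), which is what a constant OR applied to a rising baseline risk produces.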

Figure 1

Adjusted probability and 95% CI of positive SARS-CoV-2 PCR in non-Hispanic black versus non-Hispanic white by increasing age. SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.

Figure 2

Adjusted probability and 95% CI of positive SARS-CoV-2 PCR in Hispanic versus non-Hispanic by increasing age. SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.

Table 4

Adjusted OR and 95% CI for likelihood of SARS-CoV-2 positivity among minority race and ethnic groups

GSEM for mediation by income, population density and comorbidity index

Using the GSEM framework, we determined the direct and indirect effects of NHB race and Hispanic ethnicity on SARS-CoV-2 infection with median income, population density and CCI modelled as mediators in six separate equations adjusted for age and sex. The indirect effect of NHB race mediated through population density was statistically significant (OR, 1.03, CI 1.01 to 1.05, p=0.001); however, the indirect effects mediated via median income and comorbidity scores were not statistically significant (p=0.14 and p=0.64, respectively). Among individuals identifying as Hispanic or Latino, both population density and income partially mediated the effect of ethnicity on SARS-CoV-2 positivity (for population density: OR 1.02, CI 1.01 to 1.02, p<0.001; for income: OR 1.04, CI 1.02 to 1.06, p<0.001). Evaluation of comorbidities did not suggest a mediation influence for either NHB or Hispanic categories.
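The text does not describe how the indirect effects were computed within the GSEM framework. One common approach, shown here purely as a hedged sketch, is the product-of-coefficients method: the indirect effect on the log-odds scale is the product of the exposure-to-mediator coefficient and the mediator-to-outcome coefficient, with a first-order (Sobel) standard error. The coefficients below are invented for illustration and are not estimates from this study.

```python
import math


def indirect_effect_or(a: float, se_a: float, b: float, se_b: float,
                       z: float = 1.96):
    """Product-of-coefficients indirect effect, reported as an odds ratio.

    a, se_a: exposure -> mediator coefficient and its standard error
    b, se_b: mediator -> outcome coefficient (log-odds scale) and its SE
    """
    ab = a * b
    # First-order (Sobel) standard error of the product a*b.
    se_ab = math.sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)
    return (
        math.exp(ab),
        math.exp(ab - z * se_ab),
        math.exp(ab + z * se_ab),
    )


# Invented coefficients, for illustration only.
or_ind, lo_ind, hi_ind = indirect_effect_or(a=0.40, se_a=0.05,
                                            b=0.07, se_b=0.02)
print(f"indirect OR {or_ind:.2f}, CI {lo_ind:.2f} to {hi_ind:.2f}")
```

An indirect OR only slightly above 1 with an interval that excludes 1, as in the results above, indicates a statistically detectable but small mediated component, so most of the association remains in the direct path.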


The underlying race and ethnic healthcare disparities have been painfully highlighted in the wake of the COVID-19 pandemic. Most reports indicate higher mortality or case fatality among minority racial groups (black/African–American) across major US metropolitan areas.11–13 However, robust insights on the racial differences for SARS-CoV-2 infection are limited. Furthermore, comprehensive data evaluating higher susceptibility to SARS-CoV-2 infection among Hispanic communities are also scarce. This is perhaps because of comparatively homogeneous populations in non-US regions of the world. Houston, as an exceptionally ethnically diverse population centre,17 is well suited for an investigation of racial, ethnic and socioeconomic gradients in COVID-19 test positivity. We focus on highlighting the mechanisms of racial and ethnic disparities in susceptibility to SARS-CoV-2 infection and provide evidence of mediation of such disparities by novel social determinants of health (SDoH).

Our study adds to the current literature by analysing emerging data for individuals being tested across one of the largest healthcare systems in the Greater Houston area. We report that racial and ethnic minorities (NHB and Hispanic individuals) are almost twice as likely to test positive for SARS-CoV-2 as the NHW and non-Hispanic population. These findings illuminate systematic racial/ethnic disparities in testing positive for SARS-CoV-2 infection. Although there are limited prior SARS-CoV-2 data, such racial and ethnic disparities have previously been described for the US H1N1 influenza pandemic.18 These data indicated that Spanish-speaking Hispanic and black individuals were at a greater risk of H1N1 infection, primarily attributable to lack of healthcare access.

We explored three possible mechanisms of race disparities in our data. These included lower socioeconomic status, residence in areas of higher population density and a higher burden of comorbidities. We demonstrate that NHB race is significantly associated with all three potential disparity pathways, and in the traditional multivariable analyses racial and ethnic disparities persisted even after controlling for these pathways. However, our mediation analyses highlighted the potential influence of residence in high population density areas as a viable pathway that at least partially explains the observed racial and ethnic disparity. Furthermore, residence in low-income areas emerged as a significant mediation pathway for ethnic differences in SARS-CoV-2 positivity. Pathways mediating the influence of comorbidity status did not demonstrate a significant effect. We used population density as a marker for potential inability to maintain adequate social distancing, as it has been indicated that maintaining the WHO-recommended safe distance between people becomes challenging at high population densities.19 Furthermore, overall effects of population density on disease spread have been previously described in the literature.20 21 In addition to lack of social distancing, higher population density may also be associated with several other behavioural and sociodemographic attributes that may predispose populations to both viral spread and increased susceptibility. For example, there are reports linking obesity, lack of physical activity and higher mortality with residence in densely populated neighbourhoods.22 23

Consistent with previous reports, our data indicate that older populations may be more susceptible to SARS-CoV-2 infection.10 However, younger populations remain a cause for concern, as nearly one in four of the infected cases in our sample were between 36 and 50 years of age. Finally, our data demonstrate that males may be approximately 20% more likely to test positive for SARS-CoV-2 infection. Potential sex differences in susceptibility to SARS-CoV-2 and their intersection with racial, ethnic and socioeconomic factors need to be explored in future analyses. Additional policy-oriented research should prioritise studying the intersectionality of economic vulnerability and the racial disparities in SARS-CoV-2 infection indicated by the present study.

Findings of our study need to be interpreted in the light of certain limitations. Our data are from a single centre and may not be generalisable to the wider US population. These findings need to be replicated in larger data sets across other large heterogeneous US metropolitan areas. However, the Houston metropolitan area is one of the most diverse and representative in the USA17 and our healthcare system is one of the largest systems providing care to patients with COVID-19 in the Greater Houston area. Our sample was 22% black, 18% Hispanic and 62% female. Our final multivariable models included potential mediators, which may produce biased estimates.16 However, these potential mediators did not demonstrate a statistically significant indirect effect in our analyses. We did not have information on certain demographic covariates such as education or household size. Educational status has been linked to healthcare awareness and may be important to adjust for in analyses of potential disparities, and household size may be used to provide more precise estimates of socioeconomic status. However, we obtained and adjusted for zip code income data from the US Census, as income has previously been shown to have strong correlation with educational attainment and socioeconomic status.24 Since testing was based on suspicion of infection and may have been influenced by factors such as access to care, the potential for selection bias cannot be ruled out. Furthermore, limited sensitivity of SARS-CoV-2 diagnostic tests has been reported; however, the three assays used for testing were cross-validated for internal consistency. Finally, we did not have detailed information on comorbidities and their management in the study population. However, we did control for major comorbidities which are being reported as associated with COVID-19 outcomes.25


The strong association between racial and ethnic minorities and SARS-CoV-2 infection demonstrated in our data, even after adjustment for other important sociodemographic and comorbidity factors, highlights a potential catastrophe of inequality within the existential crisis of a global pandemic. Our data, representing a large heterogeneous US metropolitan area, also provide preliminary evidence on the potential pathways for this disparity. It is highly likely that higher comorbidity burden and detrimental effects of adverse social determinants, including those that may not adequately permit safe practices of social distancing, mediate higher SARS-CoV-2 infection rates among racial and ethnic minorities.

As the pandemic continues to spread and evolve across the continental USA, emerging data on the association between SARS-CoV-2 infection and various sociodemographic factors will continue to enhance our understanding of targeted risks related to SARS-CoV-2 infection, and such data would enable us to comprehend healthcare services and access factors related to development and outcomes of COVID-19 among minority populations. Our findings substantiate prior calls for collection of robust data on race and ethnicity as part of international collaborations,26 and further drive home the critical importance of quantifying novel SDoH.


The authors thank Jacob M Kolman, Senior Scientific Writer of the Houston Methodist Center for Outcomes Research, for reviewing the language and format of the manuscript.



  • Twitter @HMAIChief

  • Contributors FSV: design, data analysis and interpretation, drafting the manuscript, critical revision for important intellectual content, final approval. JCN, OK, AP: data acquisition, data analysis, drafting the manuscript, final approval. JRM: data acquisition, drafting the manuscript, final approval. SLJ: data acquisition, data interpretation, critical revision for important intellectual content, final approval. FM, HDS, RP, JDA, BAK: critical revision for important intellectual content, final approval. KN: design, interpretation of data, critical revision for important intellectual content, final approval.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Ethics approval This work was carried out under an approved protocol for the Houston Methodist COVID-19 Surveillance and Outcomes Registry (HM CURATOR) by the Houston Methodist Research Institute Institutional Review Board (HMRI IRB). HM CURATOR has been approved by the HM IRB as an observational quality of care registry for all suspected and confirmed patients with COVID-19. HM IRB granted CURATOR a waiver of informed consent and HIPAA (Health Insurance Portability and Accountability Act) authorisation in accordance with current federal regulations.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data are available upon reasonable request. All requests for de-identified data should be made to the corresponding author. All reasonable requests will be evaluated by the CURATOR Data Governance and Sharing Committee comprising FSV, SLJ, BAK and KN in the light of institutional policies and guidelines.
