Original research
Geography versus sociodemographics as predictors of changes in daily mobility across the USA during the COVID-19 pandemic: a two-stage regression analysis across 26 metropolitan areas
  1. Kathryn Schaber1,
  2. Rohan Arambepola1,
  3. Catherine Schluth1,
  4. Alain B Labrique2,
  5. Shruti H Mehta1,
  6. Sunil S Solomon1,3,
  7. Derek A T Cummings4,
  8. Amy Wesolowski1
  1. 1 Department of Epidemiology, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, USA
  2. 2 Department of International Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, USA
  3. 3 Department of Infectious Disease, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, USA
  4. 4 Department of Biology and the Emerging Pathogens Institute, University of Florida, Gainesville, Florida, USA
  1. Correspondence to Kathryn Schaber; kathrynschaber{at}gmail.com

Abstract

Objective We investigated whether a zip code’s location or demographics are more predictive of changes in daily mobility throughout the course of the COVID-19 pandemic.

Design We used a population-level study to examine the predictability of daily mobility during the COVID-19 pandemic using a two-stage regression approach, where generalised additive models (GAM) predicted mobility trends over time at a large spatial level, then the residuals were used to determine which factors (location, zip code-level features or number of non-pharmaceutical interventions (NPIs) in place) best predict the difference between a zip code’s measured mobility and the average trend on a given date.

Setting We analysed zip code-level mobile phone records from 26 metropolitan areas in the USA from 15 March to 30 September 2020, relative to October 2020.

Results While relative mobility had a general trend, a zip code’s city-level location significantly helped to predict its daily mobility patterns. This effect was time-dependent, with a city’s deviation from general mobility trends differing in both direction and magnitude throughout the course of 2020. The characteristics of a zip code further increased predictive power, with the densest zip codes closest to a city centre tending to have the largest decrease in mobility. However, the effect on mobility change varied by city and became less important over the course of the pandemic.

Conclusions The location and characteristics of a zip code are important for determining changes in daily mobility patterns throughout the course of the COVID-19 pandemic. These results can determine the efficacy of NPI implementation on multiple spatial scales and inform policy makers on whether certain NPIs should be implemented or lifted during the ongoing COVID-19 pandemic and when preparing for future public health emergencies.

  • COVID-19
  • EPIDEMIOLOGY
  • PUBLIC HEALTH

Data availability statement

Data are available upon reasonable request. Data on mobile phone provider subscriber movement and demographics are confidential and cannot be shared. Mock data and code to replicate the analysis using these data are available at https://github.com/KathrynSchaber/COVID-Mobility-Analyses. All other data used have been referenced and are publicly available. The data set generated during this study is available from the corresponding author upon reasonable request.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Strengths and limitations of this study

  • This study compared geographic and demographic factors as predictors of changes in mobility throughout the course of the COVID-19 pandemic.

  • This study used zip code-level daily mobility data from mobile phone records in 26 US metropolitan areas across various stages of the pandemic (15 March–30 September 2020).

  • To examine the predictability of daily mobility, this study used a two-stage regression approach with generalised additive models and univariate linear models.

  • This study did not have data on mobility within zip codes or mobility between zip codes when fewer than 50 trips were made.

  • This study had an over-representation of individuals in higher incomes and older age groups, based on who was subscribed to the mobile phone carrier.

Introduction

At the start of the COVID-19 pandemic, there was an initial decrease in mobility observed throughout the USA as various non-pharmaceutical interventions (NPIs) such as business and school closures, travel restrictions and stay-at-home orders were implemented to slow the spread of SARS-CoV-2.1–3 Decreases in travel have been broadly identified, despite substantial heterogeneity in the total number and date of interventions put in place across geographic scales throughout the country.4–8 In many instances, discordance across national, state, county and city-level NPI policies resulted in a patchwork of interventions. This has complicated identifying an overall impact of NPIs on ultimately slowing down SARS-CoV-2 transmission. Furthermore, directly tying these policies to actual disease data is difficult, particularly at early stages of the pandemic when surveillance data were often limited. In lieu of these approaches, human mobility data have served as an important proxy measure for changes in human behaviour, with data sources increasingly being made available (eg, Cuebiq,9 SafeGraph,10 Facebook data for good,11 Apple mobility data12 and Google community mobility reports13). These data sets served as a measure of NPI adherence and implementation that allowed policy makers to evaluate how behaviours, and ultimately transmission patterns, may have changed.

In addition, these data allowed for detailed examinations of mobility patterns to more broadly understand human behaviour and implications for general infectious disease dynamics. For example, when mobile phone data were examined, mobility was found to vary between regions and metro areas,3 14–19 as well as within cities.15–17 19–28 In particular, heterogeneous mobility changes within cities were correlated with population density,15 17 24 25 median income,15 17–19 24 26 29–31 urbanicity28 and partisanship.23 25 These socioeconomic factors were also found to correlate with the spatial variance of COVID-19 case positivity and mortality rates.26 32 33 Many data analyses and modelling studies found correlations between the implementation of NPIs, changes in mobility and the incidence of COVID-19 cases.16 21 26 28 34–37 It is unclear, however, to what extent mobility patterns changed by location (region, state and metropolitan area) regardless of demographic characteristics (population density, distance to city centre, income levels, etc.) and vice versa.

In addition, the effects of income, population density and NPIs on mobility patterns15 27 and COVID-19 incidence data26 are not static, but change temporally. Few studies have investigated mobility following the initial stages of the pandemic to understand and identify factors associated with continued changes in mobility patterns.15 Therefore, it is critical to understand not only what factors are related to changes in mobility but how those effects change throughout the course of the epidemic. Here, we use fine spatially resolved daily mobility data from mobile phone records encompassing zip codes throughout 26 metropolitan areas in the USA to investigate whether the location or characteristics of a zip code are more predictive of changes in mobility patterns. We analyse data across various stages of the pandemic (15 March–30 September 2020) to further evaluate how consistent these effects are throughout the pandemic and whether factors associated with changes in daily mobility are temporally consistent.

Methods

Study area and data

A mobile phone operator supplied non-identifiable, aggregated data on the daily movement of subscribers between zip codes for 26 cities across the USA from 15 March to 31 October 2020 (figure 1A). For each city, all zip codes within the metro area and those within 25 miles of the city centre were included. We further included all remaining zip codes within these counties (ie, those further than 25 miles from the city centre) to ensure county-level complete coverage, totalling 5247 zip codes, 12.6% of the total 41 407 US zip codes (figure 1A). Daily mobility data were provided for the number of trips subscribers took between any two zip codes in the city, where a trip between two zip codes consisted of a subscriber being registered in the first zip code for at least 10 min then being registered in the second zip code for at least 10 min. Daily trip counts were also provided by three subscriber age groups (18–34, 35–54, 55+). If fewer than 50 subscribers made a trip (by day, for each age group and for pairs of zip codes), then these data were not provided to help protect subscriber anonymity. The provider of these data cannot be listed by name due to the agreement under which the data were provided.
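The operator's processing pipeline is not described beyond the two rules above. As a minimal sketch of how those rules (the 10 min dwell requirement and the suppression of counts below 50) could be applied, the following assumes a hypothetical input of time-ordered (zip code, minutes registered) segments per subscriber for one day, and assumes that trips rather than unique subscribers are counted. All function and variable names are illustrative.

```python
from collections import Counter

MIN_DWELL_MIN = 10   # minimum minutes registered in each zip code (from the text)
MIN_COUNT = 50       # counts below this are withheld for anonymity (from the text)

def trips_from_dwells(dwells):
    """Return (origin, destination) trips for one subscriber-day.

    `dwells` is a hypothetical time-ordered list of (zip_code, minutes) segments.
    A trip is recorded when two consecutive segments are in different zip codes
    and both last at least MIN_DWELL_MIN minutes.
    """
    trips = []
    for (z1, m1), (z2, m2) in zip(dwells, dwells[1:]):
        if z1 != z2 and m1 >= MIN_DWELL_MIN and m2 >= MIN_DWELL_MIN:
            trips.append((z1, z2))
    return trips

def daily_trip_counts(all_subscriber_dwells):
    """Count trips per zip code pair and drop pairs below the anonymity threshold."""
    counts = Counter()
    for dwells in all_subscriber_dwells:
        counts.update(trips_from_dwells(dwells))
    return {pair: n for pair, n in counts.items() if n >= MIN_COUNT}
```

Under these assumptions, a short stop of under 10 min between two longer stays breaks the chain, so no trip is recorded across it.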

Figure 1

Predicted effect of city on variation from average mobility patterns. (A) Location of zip codes used in analyses, grouped by city. (B) Relative mobility predicted from GAM with countrywide effect of date (eq. 1). The three date ranges used in analyses (initial, rebound and stable phases) are denoted using dotted lines and coloured backgrounds. (C) Residuals from this trendline (eq. 2) were then used as the response variable in univariate linear models with zip code locations and features as predictors (eq. 3). Coefficient estimates from the best-fitting model during each phase of the pandemic (initial, rebound and stable) are provided. GAM, generalised additive models.

Information was also provided on the demographics of subscribers living in each zip code. The total number of subscribers by gender, income class (<$25 000, $25 000–$50 000, >$50 000) and age group (18–34, 35–44, 45–54, 55–64, 65+) was provided for each zip code. Again, if the value was less than 50, it was excluded to protect anonymity. Subscribers were found to be over-representative of the higher income and older age groups compared with the general adult population.38 Information on the population density, land area and median income of each zip code was obtained using the zipcodeR package in R, which extracts information from the US Census Bureau.39 The Cartesian minimum distance between each zip code and the boundaries of the city limits was calculated using shape files from the US Census Bureau and Centres for Disease Control and Prevention’s 500 Cities dataset.40

Data management and cleaning

Data for zip codes with fewer than 100 residents, according to the US Census Bureau, were removed to avoid PO Boxes and building-specific zip codes, leaving 5076 of 5247 zip codes for analysis. To remove outlying trip counts between any given zip codes for each age group, which may be the result of collection errors and could skew average trends, values outside 1.5 * IQR were removed (1.1% of the daily trip counts reported by zip code pair and age group). We further excluded data for a specific age group and origin/destination zip code pair if there were trip counts available for fewer than 15% (35 of 231) of days in the data set (removing 32 of 5076 zip codes). Finally, for each city, data were removed on days where fewer than 50% of the age-group and origin–destination zip code pairs were available, which left 95.2% of the original data points.

Data on the number of trips between zip codes were aggregated across the three age groups then aggregated again across the origin zip code to provide a value for the number of trips made from any given zip code on each date. Zip codes with data available on fewer than 75% of dates were removed (56 of 5044 zip codes, 0.5% of data). In our analysis, we focused on travel from 15 March to 30 September 2020. For each zip code, we calculated the percentage of trips relative to a ‘baseline’ value that represented the average number of trips made from that zip code in October 2020, when the majority of NPIs were no longer in place. While this time period is still during the COVID-19 pandemic, it provides a way to focus on how mobility patterns were relatively impacted over time, including early and later stages of the pandemic. Outlying values for the number of trips during ‘baseline’ could have a large impact on the relative difference in travel; therefore, baseline-specific outliers were removed for each zip code. That is, we removed data points where the trip count was outside the range of 1.5*IQR for each zip code for the month of October 2020 (1.4% of data points). We also removed zip codes with an average number of trips in October below 100, as they would be likely to have trip counts below 50 during the period of interest, which would not be recorded for anonymity (see above) and would impact the relative per cent change (27 of 4988 zip codes, 0.5% of data points). The number of daily trips relative to the average number of trips during the October baseline was then calculated for each zip code. From these relative values, severe outliers (outside 3 times the IQR) from the entire data set were removed to account for data collection issues, as these are most likely those with large random decreases or increases in trip counts (1.4% of data points). Relative values were then recalculated for any zip code that had outliers in the month of October. After cleaning, 96.2% of data points remained. Note that while 4961 distinct zip codes remained, 99 of these were examined twice, as they were in the collection region for two cities due to overlapping borders (22 were in both Omaha and Lincoln and 77 were in both San Francisco and San Jose).
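The outlier rule and the baseline-relative measure described above can be sketched as follows. This is an illustration only, not the authors' code: it assumes that ‘outside 1.5 * IQR’ means beyond fences placed 1.5 × IQR below the first quartile and above the third quartile, and that relative mobility is expressed as a per cent change from the October mean (consistent with the negative values reported in the Results). The quartile interpolation method is also an assumption.

```python
from statistics import mean, quantiles

def iqr_fences(values, k=1.5):
    """Return (low, high) fences placed k * IQR beyond the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread

def drop_outliers(values, k=1.5):
    """Keep only the values that lie within the IQR fences."""
    low, high = iqr_fences(values, k)
    return [v for v in values if low <= v <= high]

def relative_mobility(daily_trips, october_trips):
    """Per cent change of each daily trip count from the cleaned October mean."""
    baseline = mean(drop_outliers(october_trips))
    return [100.0 * (t - baseline) / baseline for t in daily_trips]
```

Calling `drop_outliers(values, k=3)` would correspond to the ‘severe outlier’ rule applied to the relative values.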

Zip code characteristics (population density, land area and median income) were split into overall and city-specific quartiles to account for differences in the distributions of these characteristics by location (online supplemental tables S1–S3). Distance between the zip code and the city boundary was considered as a binary (those that were contained within or intersecting the city boundary vs those outside the city limits) with those outside the city limits having minimum Cartesian distance split into quartiles (online supplemental table S4). For each zip code, the daily count of the number of NPIs in place for the corresponding county was calculated. Scores ranged from zero (no NPIs in place) to four (stay-at-home, shelter-in-place, non-essential business closure and lifestyle/entertainment closures all in place). We chose to use count variables as the NPIs tended to be put in place in the same combinations in our counties of interest. Indeed, when one NPI was in place, it was school closures for 98.5% of counties, the two NPIs in place were school and lifestyle closures for 98.7% of counties, and the three NPIs in place were school closures, lifestyle closures and stay-at-home orders for 97.1% of counties (online supplemental table S5). County-level daily COVID-19 case incidence was obtained using the covidcast package in R.41 Cities were also grouped into six informally defined geographic regions: ‘Great Lakes’ (Chicago, IL; Columbus, OH; Detroit, MI), ‘Midwest’ (Fargo, ND; Lincoln, NE; Omaha, NE; Sioux Falls, SD), ‘Northeast’ (Baltimore, MD; New York City, NY; Philadelphia, PA), ‘West Coast’ (Los Angeles, CA; San Diego, CA; San Francisco, CA; San Jose, CA), ‘South’ (Austin, TX; Dallas, TX; El Paso, TX; Houston, TX; Phoenix, AZ; San Antonio, TX) and ‘Southeast’ (Atlanta, GA; Charlotte, NC; Jacksonville, FL; Miami, FL; Nashville, TN; Tampa, FL).
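To illustrate the difference between overall and city-specific quartiles, the sketch below assigns each zip code a quartile label (1 to 4) computed either across all zip codes or within its own city. The record layout, the function names and the quartile interpolation method are assumptions for illustration. Labels are keyed by (city, zip code) because some zip codes belong to two cities.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import quantiles

def quartile_labels(values):
    """Label each value 1-4 according to the quartile cut points of `values`."""
    cuts = quantiles(values, n=4, method="inclusive")
    return [bisect_left(cuts, v) + 1 for v in values]

def overall_quartiles(records):
    """records: list of (city, zip_code, value). Quartiles across all records."""
    labels = quartile_labels([value for _, _, value in records])
    return {(city, z): q for (city, z, _), q in zip(records, labels)}

def city_specific_quartiles(records):
    """Quartiles computed separately within each city."""
    by_city = defaultdict(list)
    for city, _, value in records:
        by_city[city].append(value)
    cuts = {c: quantiles(v, n=4, method="inclusive") for c, v in by_city.items()}
    return {(city, z): bisect_left(cuts[city], value) + 1
            for city, z, value in records}
```

A zip code that is only moderately dense countrywide can therefore fall in the top quartile of its own city, which is the distinction the two sets of quartiles are meant to capture.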

Data analysis

The predictability of mobility during the COVID-19 pandemic was examined using a two-stage regression approach. First, a generalised additive model (GAM) was used to investigate relative mobility (per cent of trips from a given zip code relative to the average number of trips in October). GAMs allow for non-linear relationships between response and explanatory variables, where the effect of each explanatory variable depends on some unknown smooth function and interpretation of effects focuses on these smooth functions. Here, we examine the relationship between response variable relative mobility (M) and predictor variable date while accounting for the variation by day of the week:

Embedded Image (1)

Here, s1 is the smooth function for date and s2 is a cyclic cubic regression spline (bs = ‘cc’) for day of the week, that is, a spline whose ends match.
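The equation image is not reproduced in this text version. Based only on the description around it, eq. 1 plausibly has the following form; the intercept, the zip code and day indices (z, t) and the error term are notational assumptions rather than the authors' published notation.

```latex
% Hedged reconstruction of eq. 1 from the surrounding description
M_{z,t} = \beta_0 + s_1(\mathrm{date}_t) + s_2(\mathrm{weekday}_t) + \varepsilon_{z,t}
```

Because s_2 is cyclic, its value and slope at the end of the week match those at the start, so the weekly pattern has no jump between the last and first day.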

The residuals from this model, r_country, represented the difference between a zip code’s measured mobility on a given date and the mobility predicted by the countrywide average trend in eq. 1:

Embedded Image (2)

These residuals were then used as the response variable in several univariate linear models to determine which factors predict the variation from the mean trend over time.

Embedded Image (3)
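The images for eqs. 2 and 3 are likewise not reproduced here. A form consistent with the text is sketched below; the hat notation for the fitted countrywide trend, the coefficient symbols and the indices are assumptions. For a categorical predictor, the single slope term stands for one coefficient per level.

```latex
% Hedged reconstruction of eqs. 2 and 3 from the surrounding description
r^{\mathrm{country}}_{z,t} = M_{z,t} - \hat{M}_{t}
\qquad (2)

r^{\mathrm{country}}_{z,t} = \alpha + \beta \, \mathrm{predictor}_{i} + \epsilon_{z,t}
\qquad (3)
```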

Predictor_i in the different models was either location (city or region), zip code-level features (population density, land area, median income, distance to city limits) or number of NPIs in place (available as a county-level variable). Zip code-level features were categorised into quartiles, based both on the city-specific and countrywide values. To determine whether the effects of predictor variables changed over the course of an epidemic, we further split the data into three phases: ‘initial phase’ (15 March to 15 May 2020), ‘rebound phase’ (16 May to 31 July 2020) and ‘stable phase’ (1 August to 30 September 2020). Univariate linear models were compared using corrected Akaike information criterion (AICc). County-level daily COVID-19 incidence was also considered as a predictor, with results provided in the online supplemental information, since temporal and location effects were already being accounted for, as were number of NPIs in place. Population density, land area and median income were also aggregated to the city level and considered as predictors in the online supplemental information.
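The second stage can be illustrated with a minimal sketch, assuming a Gaussian linear model with a single categorical predictor. In that case the ordinary least squares fit reduces to the mean residual within each level, and candidate predictors can be ranked by AICc. The helper names, and the convention of counting the residual variance as a parameter, are assumptions; this is not the authors' R code.

```python
import math
from collections import defaultdict
from datetime import date

def phase_of(day):
    """Assign a date within the study period to a pandemic phase (end dates from the text)."""
    if day <= date(2020, 5, 15):
        return "initial"
    if day <= date(2020, 7, 31):
        return "rebound"
    return "stable"

def fit_group_means(residuals, levels):
    """Least squares fit with one categorical predictor: fitted value = level mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r, g in zip(residuals, levels):
        sums[g] += r
        counts[g] += 1
    means = {g: sums[g] / counts[g] for g in sums}
    rss = sum((r - means[g]) ** 2 for r, g in zip(residuals, levels))
    return means, rss

def aicc(rss, n, n_coef):
    """Corrected AIC for a Gaussian linear model (requires rss > 0 and n > k + 1)."""
    k = n_coef + 1  # regression coefficients plus the residual variance
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return 2 * k - 2 * log_lik + 2 * k * (k + 1) / (n - k - 1)
```

Fitting one such model per candidate predictor within each phase and choosing the lowest AICc mirrors the comparison described above; the level means play the role of the coefficient estimates.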

To further examine whether zip code-level characteristics are predictive of variation from average city-level trends, two-stage regression was again used. This allowed exploration of not only what variables were predictive of mobility, but also where the model was unable to accurately predict mobility patterns. First, a GAM was run where per cent relative trips was predicted by a countrywide smoothing function and by the factor smooth interaction between city and date:

Embedded Image (4)

Factor smooth interaction allows for each city to have its own trend over time while penalising functions too far from the average, so city-specific smoothing functions still have similar underlying shapes. City-level residuals, r_city, were then calculated

Embedded Image (5)

and used as the response variable in univariate linear models

Embedded Image (6)

where predictor_i was zip code-level features (population density, land area, distance to city limits and median income) or county-level number of NPIs in place, as in the previous model. Again, each model was fit on each of the three time phases defined above (initial, rebound and stable) and univariate models were compared using AICc score.
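As the images for eqs. 4 to 6 are not reproduced here, the following is a hedged sketch of forms consistent with the description. The symbols, the indices and the notation c(z) for the city containing zip code z are assumptions. The text does not state whether the day-of-week term from eq. 1 was retained in eq. 4, so it is omitted here.

```latex
% Hedged reconstruction of eqs. 4-6 from the surrounding description
M_{z,t} = \beta_0 + s_1(\mathrm{date}_t) + s_{c(z)}(\mathrm{date}_t) + \varepsilon_{z,t}
\qquad (4)

r^{\mathrm{city}}_{z,t} = M_{z,t} - \hat{M}_{c(z),t}
\qquad (5)

r^{\mathrm{city}}_{z,t} = \alpha + \beta \, \mathrm{predictor}_{i} + \epsilon_{z,t}
\qquad (6)
```

Here s_{c(z)} denotes the city-specific smooth from the factor smooth interaction, which shares a smoothing penalty across cities so that city curves are shrunk towards similar shapes.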

The coefficients learnt in the previous model represent the effect of different predictors on whether zip code-level mobility was above or below the city-level trends. However, it is possible that a predictor could have different effects on zip code-level mobility in different cities. To determine whether the effect of the best-fit predictor was city-specific, linear models were run looking at the interaction of the best-fit predictor and city for each of the three time phases:

Embedded Image (7)

These were compared with their respective univariate model to determine whether city-specific effects were present. All statistical analyses were performed in R V.3.6.3 and V.4.0.3 statistical computing software.42
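The comparison between a pooled effect and a city-specific effect can be sketched as below, assuming categorical predictors and a Gaussian linear model. With every predictor-by-city combination observed, a full interaction model has the same fitted values as a model with one mean per combination, so both models can be fitted as cell means. The example data, names and parameter-counting convention are illustrative assumptions.

```python
import math
from collections import defaultdict
from statistics import mean

def cell_mean_fit(residuals, *factors):
    """Fit one mean per combination of factor levels; return (means, RSS)."""
    keys = list(zip(*factors))
    cells = defaultdict(list)
    for r, key in zip(residuals, keys):
        cells[key].append(r)
    means = {key: mean(vals) for key, vals in cells.items()}
    rss = sum((r - means[key]) ** 2 for r, key in zip(residuals, keys))
    return means, rss

def aicc(rss, n, n_coef):
    """Corrected AIC for a Gaussian linear model (requires rss > 0 and n > k + 1)."""
    k = n_coef + 1  # cell means plus the residual variance
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return 2 * k - 2 * log_lik + 2 * k * (k + 1) / (n - k - 1)

# Made-up residuals: density matters a lot in city A and little in city B.
residuals = [10.5, 10, 9.5, -10.5, -10, -9.5, 1.5, 1, 0.5, -1.5, -1, -0.5]
density = ["Q1"] * 3 + ["Q4"] * 3 + ["Q1"] * 3 + ["Q4"] * 3
city = ["A"] * 6 + ["B"] * 6

pooled_means, pooled_rss = cell_mean_fit(residuals, density)
city_means, city_rss = cell_mean_fit(residuals, density, city)
pooled_aicc = aicc(pooled_rss, len(residuals), len(pooled_means))
city_aicc = aicc(city_rss, len(residuals), len(city_means))
```

In this made-up example the city-specific model has the lower AICc despite its extra parameters, which is the pattern the comparison with the univariate model is designed to detect.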

Patient and public involvement

None.

Results

Over the course of the analysis (15 March to 30 September 2020), the percentage of trips made from a zip code relative to ‘baseline’ had a large range in values (−78.6% to 150.2%), with a median of −6.3% (IQR: −18.2% to 2.8%). During the ‘initial phase’ (15 March to 15 May 2020), the percentage of trips made from a zip code relative to its ‘baseline’ decreased (median: −20.1%, IQR: −31.2% to −9.3%). Relative travel then slowly returned to the baseline (October) values across the ‘rebound phase’ (16 May to 31 July 2020) (median: −3.3%, IQR: −12.3% to 4.9%) and ‘stable phase’ (1 August to 30 September 2020) (median: −1.1%, IQR: −8.7% to 5.3%). While most zip codes followed this temporal trend, the magnitude of the initial decrease, the rate at which movement recovered and whether mobility went above ‘baseline’ trends during recovery all varied across zip codes. Indeed, the additive model accounting for the effect of time on per cent relative trips made (eq. 1) only explained around 25% of the deviance (figure 1B). We then considered residuals from this temporal trendline to investigate how zip codes deviate from the overall pattern.

City was the most important predictor of variation from mobility trends across the study period

When examining the residuals from the countrywide temporal trendline (eq. 2), we found that the city a zip code was in explained its variation from overall mobility better than any other variable (ie, region or zip code characteristics) for all three epidemic phases (eq. 3) (table 1). While most cities (70%) were relatively similar to the countrywide trendline, with residual values within a 5% window around zero (figure 1C), city-specific effects were more prominent in the initial phase, where 42% of cities had large effects as compared with the rebound phase when only four cities (15%) were outside this window. There was also a larger number of cities with positive residuals (ie, more mobility than predicted by the countrywide trend) during the initial phase (65%) as compared with the rebound (54%) and stable phases (50%) (figure 1C). There were, however, five cities with larger decreases in mobility than predicted (ie, negative residuals) across all three epidemic phases and eight cities with consistently smaller decreases in mobility than predicted (ie, positive residuals) (figure 1C). This included all cities in the Midwest region, where residuals were positive throughout the epidemic, particularly during the initial and rebound phases (figure 1C). Indeed, region was the second-best predictor (after city) for residuals in the initial and stable phases (eq. 3) (table 1). During the initial phase, all cities in the Midwest and Southeast regions had positive deviations from the global trend, whereas all cities in the Northeast region had negative deviations (ie, a larger decrease in mobility than the countrywide trend) (figure 1C). In the stable phase, cities in the Midwest and the Great Lakes regions had a smaller change in mobility than average, while all cities in the South had a larger decrease than average (figure 1C).
For most cities, however, there were a variety of ways in which residuals changed across the initial, rebound and stable phases of the pandemic, emphasising the importance of a time-dependent city effect. Indeed, the GAM with city-specific mobility trends (eq. 4) explained around 34% of deviance, as compared with 25% without city-specific effects (eq. 1) (online supplemental figures S1 and S2).
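For readers unfamiliar with the ‘deviance explained’ figures quoted for the two GAMs, the quantity can be written as below. The second line assumes a Gaussian response with identity link, which the text does not state explicitly; under that assumption the measure coincides with the unadjusted R².

```latex
% Deviance explained; the Gaussian form is an assumption about the model family
\text{deviance explained} = 1 - \frac{D_{\mathrm{model}}}{D_{\mathrm{null}}}

\text{Gaussian case:}\quad
D_{\mathrm{model}} = \sum_{z,t} \bigl(M_{z,t} - \hat{M}_{z,t}\bigr)^{2},
\qquad
D_{\mathrm{null}} = \sum_{z,t} \bigl(M_{z,t} - \bar{M}\bigr)^{2}
```

On this reading, moving from about 25% to about 34% means that allowing city-specific trends removes roughly a further 9 percentage points of the total variation in relative mobility.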

Table 1

Comparison of linear models predicting variation from the time-dependent model of mobility

When accounting for time and location, remaining variation was best explained by zip code population density

Based on the model that accounted for overall and city-specific patterns over time (eq. 4), residuals were analysed to examine the impact of zip code-level characteristics on variations from the city-level mobility patterns (eq. 6). The best predictor variable of the residuals for city-level mobility was population density for the initial and rebound phases of the pandemic, while distance from city limit was slightly better during the stable phase (table 2). During the initial phase, zip codes with the lowest population density had relative mobility values around 5.5% higher than predicted by their city, while the densest zip codes had an average of 3.7% less relative mobility than predicted by their city (figure 2A). During the stable phase, those in the city limit were slightly less mobile (−1.2%) than predicted, with increasing distance from the city corresponding to increased residuals until those furthest from the city had 2.3% higher amounts of travel than predicted by the city-specific trend (figure 2B). However, we also saw that the effects of zip code characteristics were smaller in magnitude during the stable phase (figure 2). Interestingly, the effects of each zip code characteristic were markedly similar to the effects seen when predicting variation from countrywide mobility patterns (figure 2 and online supplemental figure S3). The notable exception to this is the number of NPIs in place, which had different effects on countrywide and city-specific residuals, with the inclusion of city resulting in considerably smaller effects of NPI count (figure 2 and online supplemental figure S3). Interestingly, income was not an important predictor in any of these analyses.

Table 2

Comparison of linear models predicting variation from the time-dependent and city-dependent model of mobility

Figure 2

Predicted effect of zip code characteristics on variation from average city-level mobility patterns. Residuals (eq. 5) from the generalised additive models (GAM) predicting mobility with city-level trends (eq. 4) were used as the response variable in univariate linear models with zip code features as predictor variables (eq. 6). Coefficient estimates are provided for these models during each phase of the pandemic (initial, rebound and stable). Predictor variables are (A) zip code population density quartile, (B) zip code distance to city border quartile, (C) zip code land area quartile, (D) zip code median income quartile, and (E) zip code number of non-pharmaceutical interventions (NPIs) in place.

The effect of zip code characteristics on variation from city-level mobility differs by city

To determine whether the effects of zip code characteristics were dependent on city, we focused on population density for the initial and rebound phases of the pandemic, as it was the characteristic best at predicting variation from city-level mobility trends (eq. 6). Indeed, the city-based effect of population density (eq. 7) was significantly better than a global effect of population density (table 2). In most cities, the least dense zip codes had the highest residual values and the densest zip codes had the lowest residual values; however, the magnitude of the effects varied by city (figure 3A,B). In Los Angeles, California, for example, there was less than a 1.8% spread between the predicted residuals of the zip codes based on population density, whereas in Charlotte, North Carolina, the difference in residuals of the zip codes with the least and most dense populations was around 22.5% (figure 3A). Furthermore, the most densely populated zip codes had residual values much greater in magnitude than the other zip codes in the city (online supplemental figure S4A). There are also a few cities where the least dense areas were most like the densest areas, such as in Sioux Falls, South Dakota (figure 3).

Figure 3

Predicted city-specific effect of zip code characteristics on the variation from average city-level mobility during each epidemic phase. For each pandemic phase (initial, rebound and stable), the zip code characteristic best at predicting variation from city level mobility trends (found with eq. 6) was further examined using its interaction with city as a predictor of city-specific residuals (eq. 7). The zip code characteristic (A) for the initial phase was population density, (B) for the rebound phase was population density and (C) for the stable phase was distance from city limit. Predicted residual values are provided here for one city from each region (for all cities, see online supplemental figure S4).

We also investigated the best predictor during the stable phase (distance to city limit). Indeed, the interaction with city (eq. 7) was more predictive than the effect of just the distance to city limits (table 2). When accounting for the city-based effects of distance, we see vast differences in both the order and magnitude of distance from city effects (figure 3C). The significantly better fit of this model may be due to the large effect seen in New York City (online supplemental figure S4C), where those in and near the city have less mobility than predicted but those furthest from the city have almost 10% more mobility than predicted by the city-specific effect of time.

Discussion

Across the USA, there was significant heterogeneity in response to the COVID-19 pandemic, both in terms of policies enacted and mobility patterns changed. Previous studies found location (region, metro area and city) and sociodemographic features (population density, median income and urbanicity) to correlate with both mobility changes and COVID-19 case positivity;3 14–37 however, it is unclear which of these features (geographic or sociodemographic) are more predictive of the fine-scale changes in mobility. Furthermore, it is not known whether these effects are consistent throughout the pandemic or change after the initial phase. We found that on fine spatial scales, location (city) was the most important predictor of how daily mobility changed over various phases of the COVID-19 pandemic. This effect was time dependent, with city-level deviations changing in both magnitude and direction throughout the three phases of the epidemic. Even on larger scales, geographic location remained important, with the overall region a city was in being an important predictor. For example, across the three phases of the pandemic explored, mobility in the midwestern cities did not decrease as much as predicted by the general US trend. The effects seen within a city or region may be partially attributable to their overall demographics relative to other cities/regions; however, the demographic characteristics of individual zip codes, when split into quartiles across all cities, were still not as significant in predicting mobility change compared with city, likely because city accounts for the combined effects of all the demographic variables (online supplemental tables S6 and S7). Indeed, while city-level median income and population density were moderate predictors of deviations from countrywide mobility trends, they were not as significant as the ‘city’ variable (online supplemental table S6). Furthermore, they became the worst predictors for deviations from city-specific mobility trends (online supplemental table S7).

After accounting for the time-dependent effect of city, the population density of a zip code relative to its city added significant predictive power. There was also a significant effect of relative distance from the city limits, where those zip codes within city limits had the biggest decrease in mobility and those furthest from the city centre had the smallest change, a trend also seen for stay-at-home behaviour in Levin et al.17 However, the effect size of these factors differed by phase, where predicted variation from the overall mobility trendline was much smaller in the stable phase compared with the initial and rebound phases of the pandemic. The effect was also significantly city-specific, where both the direction and the magnitude of effects differed between cities, an effect that warrants future investigation.

Notably, we found median income to be the demographic variable with the least significant effect both overall and for each time phase. This is contrary to many results seen in the literature.15 17–19 24 26 29–31 This may be due to the over-representation of higher income individuals in our data versus the general population or the coarseness of our income variables. The number of NPIs in place and the COVID-19 incidence were also some of the least significant predictors (online supplemental tables S6 and S7). This may be because we are already accounting for the role of time, which inherently encompasses differences in NPI policies and COVID-19 incidence over time. Furthermore, the time-dependent effect of city is partially explained by the differing COVID-19 incidence levels and NPI implementation dates by city/state. Indeed, the effect of NPI count on residual mobility was considerably smaller in magnitude once city-specific mobility patterns were included.

While these data allowed for an understanding of the variations in mobility change across location, demographics and time on a fine spatial scale, our model could not completely explain changing mobility patterns during a pandemic. Other factors that could have affected mobility, such as adherence to NPIs or smart working, were not quantified during 2020 and could not be considered. While we account for the distance of a zip code from the city limits, we do not account for factors such as transportation systems. There were also some limitations with the data provided, such as the over-representation of individuals with higher incomes and in older age groups, based on who was subscribed to the mobile phone carrier, and the inability to adjust for sampling bias due to the unknown true underlying subscriber population. Furthermore, data were only available for mobility between zip codes, not within them, and values were only provided when more than 50 trips were made between any two zip codes. The impact of this is irregular across locations due to the varying populations and areas of zip codes. While a large proportion of trips will be captured by between-zip code movement when zip codes are small, large zip codes may still have substantial intra-zip code mobility that we were unable to measure. This may somewhat explain the role of location, as zip codes in the western USA tend to be larger than those in the eastern USA. It may also contribute to the observed role of distance from city limits, as in the eastern USA, zip codes closer to the city centre are smaller than those further away. Analyses were also limited by the use of October 2020 as a baseline for mobility, as interventions were still in place and the COVID-19 pandemic continued. However, we found that this time period was the most consistent across locations and was also during a period of relatively low case counts across the country. It is also important to note that while other papers have focused on establishing causality,43 44 our methods focused solely on identifying correlations between mobility and possible predictor variables.
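The measurement limitation described above, where within-zip code trips are never reported and zip code pairs with 50 or fewer trips are suppressed, can be made concrete with a small sketch. The flow matrix below is invented for illustration, and the function name `observed_share` is hypothetical; it is not part of the study's data or code.

```python
import numpy as np

CENSOR_THRESHOLD = 50  # a pair is reported only when it has more than 50 trips

def observed_share(flows, threshold=CENSOR_THRESHOLD):
    """Share of each origin zip code's trips that survives the reporting rules.

    flows[i, j] is the true number of trips from zip code i to zip code j;
    the diagonal holds within-zip code trips, which are never reported.
    """
    flows = np.asarray(flows, dtype=float)
    reported = flows.copy()
    np.fill_diagonal(reported, 0.0)        # within-zip code movement is unobserved
    reported[reported <= threshold] = 0.0  # small pair counts are suppressed
    return reported.sum(axis=1) / flows.sum(axis=1)

# Hypothetical true flows: zip 0 is small, zip 1 is large with mostly internal trips.
true_flows = [
    [100, 400,  40],
    [ 60, 900,  40],
    [300, 500, 200],
]
share = observed_share(true_flows)
print(np.round(share, 3))
```

Under these made-up counts, most trips from the small zip code remain visible, while the large zip code with mainly internal movement retains only a small fraction, which is the asymmetry the limitation refers to.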

Human mobility is an important driver of infectious disease transmission and has been shown to correlate with case positivity and mortality rates during the COVID-19 pandemic. It is critical, therefore, to understand which factors affect fine-scale mobility change during a disease outbreak, as well as how those effects change throughout the course of the epidemic. While location was found to be the best predictor of zip code-level changes in daily movement, these effects shifted and evolved over time, possibly due to case counts and interventions in place. Furthermore, the characteristics of a zip code contributed to how its mobility changed, with the densest zip codes closest to a city centre tending to have the largest decrease in mobility. These data and analyses can not only provide additional evidence on the relationship between NPI implementation and mobility across multiple spatial scales, but also inform policy makers on whether certain NPIs should be implemented or lifted as the COVID-19 pandemic continues. The interplaying effects of a zip code’s location and characteristics are also important to consider when trying to develop accurate projections of human mobility behaviours and their impact on COVID-19 spread moving forward.

Data availability statement

Data are available upon reasonable request. Data on mobile phone provider subscriber movement and demographics are confidential and cannot be shared. Mock data and code to replicate the analysis using these data are available at https://github.com/KathrynSchaber/COVID-Mobility-Analyses. All other data used have been referenced and are publicly available. The data set generated during this study is available from the corresponding author upon reasonable request.

Ethics statements

Ethics approval

This study was submitted to the Institutional Review Board (IRB) at Johns Hopkins Bloomberg School of Public Health and declared exempt from IRB oversight as it does not qualify as human subjects research.

Footnotes

  • Contributors AW conceptualised the study, acquired funding, supervised, and acted as guarantor. CS, ABL, SHM, SSS and AW curated the data. KS, RA, DATC and AW were involved in methodology and formal analysis. KS provided visualisations and wrote the original draft with AW. All authors were involved in review and editing of the manuscript.

  • Funding This work was supported by the National Institutes of Health Director’s New Innovator Award grant number DP2LM013102-0 (RA, KS, CS and AW), the National Institute of Allergy and Infectious Diseases grant 1R01A1160780-01 (RA, KS, CS, DATC and AW) and a Career Award at the Scientific Interface from the Burroughs Wellcome Fund project number 1015823.01 (G127203) (AW).

  • Map disclaimer The inclusion of any map (including the depiction of any boundaries there), or of any geographic or locational reference, does not imply the expression of any opinion whatsoever on the part of BMJ concerning the legal status of any country, territory, jurisdiction or area or of its authorities. Any such expression remains solely that of the relevant source and is not endorsed by BMJ. Maps are provided without any warranty of any kind, either express or implied.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.