
Original research
Modifiable and non-modifiable risk factors for COVID-19, and comparison to risk factors for influenza and pneumonia: results from a UK Biobank prospective cohort study
  1. Frederick K Ho1,
  2. Carlos A Celis-Morales1,2,
  3. Stuart R Gray2,
  4. S Vittal Katikireddi1,
  5. Claire L Niedzwiedz1,
  6. Claire Hastie1,
  7. Lyn D Ferguson2,
  8. Colin Berry2,
  9. Daniel F Mackay1,
  10. Jason MR Gill2,
  11. Jill P Pell1,
  12. Naveed Sattar2,
  13. Paul Welsh2
  1. 1Institute of Health and Wellbeing, University of Glasgow, Glasgow, UK
  2. 2Institute of Cardiovascular and Medical Sciences, University of Glasgow, Glasgow, UK
  1. Correspondence to Dr Paul Welsh; Paul.Welsh{at}


Objectives We aimed to investigate demographic, lifestyle, socioeconomic and clinical risk factors for COVID-19, and compared them to risk factors for pneumonia and influenza in UK Biobank.

Design Cohort study.

Setting UK Biobank.

Participants 49–83 year olds (in 2020) from a general population study.

Main outcome measures Confirmed COVID-19 infection (positive SARS-CoV-2 test). Incident influenza and pneumonia were obtained from primary care data. Poisson regression was used to study the association of exposure variables with outcomes.

Results Among 235 928 participants, 397 had confirmed COVID-19. After multivariable adjustment, modifiable risk factors were higher body mass index and higher glycated haemoglobin (HbA1C) (RR 1.28 and RR 1.14 per SD increase, respectively), smoking (RR 1.39), slow walking pace as a proxy for physical fitness (RR 1.53), and use of blood pressure medications as a proxy for hypertension (RR 1.33). Higher forced expiratory volume in 1 s (FEV1) and high-density lipoprotein (HDL) cholesterol were both associated with lower risk (RR 0.84 and RR 0.83 per SD increase, respectively). Non-modifiable risk factors included male sex (RR 1.72), black ethnicity (RR 2.00), socioeconomic deprivation (RR 1.17 per SD increase in Townsend Index), and high cystatin C (RR 1.13 per SD increase). The risk factors overlapped somewhat with those for pneumonia, and less so with those for influenza. The associations with modifiable risk factors were generally stronger for COVID-19 than for pneumonia or influenza.

Conclusion These findings suggest that modification of lifestyle may help to reduce the risk of COVID-19 and could be a useful adjunct to other interventions, such as social distancing and shielding of those at high risk.

  • epidemiology
  • cardiology
  • diabetes & endocrinology

This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See:


Strengths and limitations of this study

  • Large cohort from a general population, as opposed to a cohort of hospitalised patients, at an age relevant to more severe COVID-19 symptoms.

  • Biochemistry assays performed in a single dedicated central laboratory.

  • Extensive exploration of emerging and novel risk factors for COVID-19.

  • UK Biobank is not representative of the whole UK population.

  • Exposures were measured several years before the development of the outcomes, and misclassification of the exposures will likely bias our results to the null.


COVID-19, caused by SARS-CoV-2 infection, includes a spectrum of morbidity from asymptomatic infection1 to severe pneumonia in patients presenting for medical care.2 The COVID-19 pandemic3 has led to concerted research efforts to identify people at greatest risk of developing the infection and progressing to critical illness and dying. Predictors of disease severity include older age, smoking, diabetes, hypertension, kidney disease, chronic obstructive pulmonary disease (COPD) and previous cardiovascular disease (CVD).4–7 In addition, it is becoming clear that other risk factors might include obesity and low physical fitness.8 9 However, many studies investigate risk factors for disease progression to death or critical illness among hospitalised patients7 as opposed to healthy comparators.

Identifying the risk factors for COVID-19 is important both for finding factors that can be modified to reduce risk and for finding non-modifiable risk factors that mark out high risk groups who require shielding and targeting for testing, and eventual vaccine and anti-viral therapies. Pneumonia is a life-threatening complication of COVID-19 infection.10 Established major risk factors for community-acquired pneumonia include many of the emerging risk factors for COVID-19.11 As COVID-19 is caused by SARS-CoV-2 viral infection, it may have risk factors for contraction of the disease that are common to other respiratory virus conditions such as influenza.

UK Biobank is a large prospective, deeply phenotyped, population-based cohort study carried out in the UK.12 Over the study period, testing for COVID-19 in England was conducted in accident and emergency (A&E) departments and in-hospital. These data were provided by Public Health England (PHE) and linked to UK Biobank baseline data. We aimed to establish modifiable and non-modifiable risk factors for confirmed COVID-19. We also aimed to compare these risk factors to risk factors for incident pneumonia over a similar time span.



UK Biobank was conducted via 22 assessment centres across England, Scotland and Wales between March 2006 and December 2010 and recruited 502 624 participants aged 37–73 years. The present study was restricted to participants living in England for whom COVID-19 test results were available. Death data were available to the end of January 2018 in England. To reduce competing risks, whereby risk factors may influence risk of death before the pandemic occurred, we excluded from the study all participants known to have died prior to the COVID-19 pandemic. Baseline biological measurements were recorded and touch-screen questionnaires were administered according to a standardised protocol.12 13


Results of COVID-19 PCR tests for UK Biobank participants were provided by PHE.7 14 Data provided by PHE included the specimen date, specimen type (eg, upper respiratory tract), laboratory, origin (whether there was evidence from the microbiological record that the test was conducted in a hospital setting or not) and result (positive or negative). At the time the manuscript was developed,15 data were available for the period 16 March 2020 to 3 May 2020. Results were available for 5356 tests conducted on 3003 individuals. Confirmed COVID-19 infection (primary outcome) was defined as at least one positive result in the context of an in-hospital or A&E test. Any positive test result was used as a sensitivity analysis. Longer follow-up of the UK Biobank cohort has become available, but in this analysis, we have elected to investigate risk factors for more severe disease (in-hospital positive tests) in the early stages of the pandemic.
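The outcome definitions above can be illustrated with a minimal sketch. The record layout (`SwabResult` and its field names) and the example data are hypothetical and are not the PHE data format; the sketch only shows how the primary outcome (a positive test taken in hospital or A&E) differs from the sensitivity outcome (any positive test).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwabResult:
    """One test record. Field names are illustrative assumptions."""
    participant_id: int
    positive: bool      # result: True if the test was positive
    in_hospital: bool   # origin: True if taken in hospital or A&E


def classify_outcomes(records):
    """Return (primary, sensitivity) sets of participant ids.

    primary: at least one positive in-hospital or A&E test.
    sensitivity: at least one positive test in any setting.
    """
    primary, sensitivity = set(), set()
    for record in records:
        if record.positive:
            sensitivity.add(record.participant_id)
            if record.in_hospital:
                primary.add(record.participant_id)
    return primary, sensitivity


# Invented example: participant 1 is positive in hospital, participant 2 is
# only ever negative, participant 3 is positive outside hospital.
records = [
    SwabResult(1, positive=True, in_hospital=True),
    SwabResult(1, positive=False, in_hospital=True),
    SwabResult(2, positive=False, in_hospital=True),
    SwabResult(3, positive=True, in_hospital=False),
    SwabResult(3, positive=False, in_hospital=False),
]
primary, sensitivity = classify_outcomes(records)
```

Because a participant can have several tests, the classification is made per participant rather than per test.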

Pneumonia was defined based on the 10th revision of the International Classification of Diseases (ICD-10) codes J12–J18, and influenza based on J09–J11, converted into Read Codes using the UK Biobank’s look-up table. Incident pneumonia and influenza, occurring after 1 January 2016 (an arbitrary date taken to mimic the time lag from baseline exposure measurement to incident COVID-19 infection, while also obtaining sufficient case numbers), were obtained from a 41% sample of participants with available data from primary care. A sensitivity analysis was conducted using cases after 1 January 2015 and gave consistent results. An analysis was also conducted exploring risk factors for all cases of pneumonia and influenza after baseline.
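As an illustration of the code ranges above, the following sketch maps an ICD-10 code string to an outcome label. It works on ICD-10 codes directly and does not reproduce the conversion to Read Codes via the UK Biobank look-up table; the function name `classify_icd10` is a hypothetical helper.

```python
import re


def classify_icd10(code):
    """Return 'pneumonia', 'influenza' or None for an ICD-10 code string.

    Pneumonia: J12 to J18. Influenza: J09 to J11.
    Only the three-character category is inspected, so 'J18.9' counts as J18.
    """
    match = re.match(r"^J(\d{2})", code.strip().upper())
    if match is None:
        return None
    category = int(match.group(1))
    if 12 <= category <= 18:
        return "pneumonia"
    if 9 <= category <= 11:
        return "influenza"
    return None


labels = [classify_icd10(c) for c in ["J18.9", "J09", "J10.1", "J45", "I21"]]
```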


Exposures were measured at the baseline assessment visit between 2006 and 2010. ‘Modifiable’ risk factors were considered to include smoking, anthropometric measurements, glycated haemoglobin (HbA1C), lung function measurements, hypertension, high-density lipoprotein (HDL) cholesterol, other lipid measurements, and measures of physical activity.

Current age (on 1 March 2020) was derived from the assessment date and age at recruitment. Ethnicity, smoking, alcohol consumption, physician-diagnosed prevalent conditions and medication use were self-reported. For the present analyses, ethnicity was coded as white, south Asian, black, or mixed/other. Smoking status was categorised into never vs former/current smoking. Systolic and diastolic blood pressures were measured at the baseline visit, preferentially using an automated measurement, but using manual measurement where this was not available, and the average of the available measures was used. Area-level socioeconomic deprivation was assessed by the Townsend score (incorporating measures of unemployment, non-car ownership, non-home ownership and household overcrowding) corresponding to the participants’ home postcode. Higher Townsend scores represent greater socioeconomic deprivation. Self-reported walking pace was rated by each participant as slow, steady/average, or brisk.16

The definition of baseline diabetes included self-reported type 1 or type 2 diabetes, those with a primary or secondary hospital diagnosis relating to diabetes at baseline (ICD-10 codes E10-E14.9), and those who reported using diabetes medications. Baseline CVD was defined as self-reported myocardial infarction, stroke, or transient ischaemic attack. Cancer, longstanding illness, and depression were self-reported on touchscreen questionnaire. Other previous health complaints, including asthma, rheumatoid arthritis, chronic kidney disease, systemic lupus erythematosus, sleep apnoea, COPD, pneumonia, bronchitis (including bronchitis, bronchiectasis and emphysema), and other respiratory diseases (including interstitial lung disease, asbestosis, pulmonary fibrosis, alveolitis, respiratory failure, pleurisy, pneumothorax, other respiratory condition), were derived from self-report at nurse interview. Some conditions were not included in multivariable analysis due to low case numbers and/or high correlations with other conditions, but univariable results are presented for completeness. Statin (categorised to include other cholesterol lowering medications) and blood pressure medication use were also recorded from self-report, with blood pressure medication being used as a proxy for baseline diagnosed hypertension.

Body mass index (BMI) is the measured body mass in kilograms divided by the square of height in metres. Height was measured using a Seca 202 height measure. Weight, whole-body fat mass and fat free mass were measured to the nearest 0.1 kg using the Tanita BC-418 MA body composition analyser. Socks and shoes were removed when height was measured. Grip strength was measured using a Jamar J00105 hydraulic hand dynamometer and the mean was derived from the right and left hand values expressed in kilograms.17

Lung function was assessed by spirometry using a Vitalograph Pneumotrac 6800 spirometer (Vitalograph, Buckingham, UK). Participants did not perform spirometry if they answered yes or unsure to any of the following: chest infection in the last month, history of detached retina, heart attack, surgery to eyes, chest or abdomen in the last 3 months, history of collapsed lung, pregnancy, or currently on medication for tuberculosis. The aim was to record two acceptable blows from a maximum of three attempts. The spirometer software compared the acceptability of the first two blows and, if acceptable (defined as a ≤5% difference in forced vital capacity (FVC) and forced expiratory volume in 1 s (FEV1)), the third blow was not required. The mean observation was taken for both measures.
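The acceptability rule for the first two blows can be sketched as follows. The text does not state what the 5% difference is measured relative to, so taking it relative to the larger of the two values is an assumption made here for illustration; the function names are hypothetical and do not describe the spirometer software itself.

```python
def within_tolerance(a, b, tolerance=0.05):
    """True if a and b differ by at most `tolerance` of the larger value.

    Using the larger value as the reference is an assumption.
    """
    return abs(a - b) <= tolerance * max(a, b)


def third_blow_required(blow1, blow2):
    """Each blow is a (fvc, fev1) pair in litres.

    The third blow is skipped only if both FVC and FEV1 agree within 5%.
    """
    fvc_ok = within_tolerance(blow1[0], blow2[0])
    fev1_ok = within_tolerance(blow1[1], blow2[1])
    return not (fvc_ok and fev1_ok)


def mean_of_blows(blows):
    """Mean FVC and mean FEV1 over the recorded blows."""
    n = len(blows)
    return (sum(b[0] for b in blows) / n, sum(b[1] for b in blows) / n)


# Invented readings: the first pair agrees closely, the second does not.
consistent = third_blow_required((4.00, 3.00), (3.90, 2.95))
inconsistent = third_blow_required((4.00, 3.00), (3.50, 2.95))
```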

Blood collection sampling procedures for the study have been previously described and validated.18 Biochemical assays were performed at a dedicated central laboratory on around 480 000 samples. Further details of these measurements can be found in the UK Biobank Data Showcase and Protocol ( For the present study we included total cholesterol, HDL cholesterol, rheumatoid factor, cystatin C, HbA1C, C reactive protein (CRP), differential white cell count and red cell distribution width as exposures of interest. Biomarkers with data below the limit of detection were imputed as the square root of the limit of detection. The majority of participants had undetectable rheumatoid factor and risk ratios were derived for detectable rheumatoid factor, with the referent being undetectable.


Mean and SD were reported for continuous outcomes except for biomarkers, where median and IQR were reported. Poisson regression with robust ‘sandwich’ standard errors was used to study the associations of exposure variables with confirmed COVID-19 and pneumonia. Poisson regression was used because it provides risk ratios (RRs), which are easier to interpret, and robust error estimation ensures accurate inference.19 Three adjustment schemes were considered: model 0, univariate (ie, no adjustment); model 1, adjusted for age, sex, ethnicity and deprivation index; and model 2, further adjusted for behavioural (smoking and alcohol drinking) and physical (adiposity, blood pressure, spirometry and physical capability) factors that were found to be significant in model 1. Some candidate variables for model 2 were derived from the same measurement (eg, BMI and BMI categories) or were strongly correlated (r=0.87 between BMI and body fat mass, and r=0.97 between FEV1 and FVC). To avoid multicollinearity, in model 2 we chose one from BMI, BMI categories, body fat-free mass, body fat mass and body fat per cent, and one from FEV1, FVC and FEV1/FVC. For continuous variables, the linearity of exposure-outcome associations was tested using penalised cubic splines in a generalised additive model.20 Nonlinearity was tested using a likelihood ratio test comparing a model with the exposure fitted on a spline with a model assuming a linear exposure-outcome relationship. A p value for nonlinearity <0.05 suggests evidence against the linearity assumption. Spline smoothness was chosen using generalised cross validation.21 Population attributable fractions (PAFs) were calculated to determine the relative contribution of each risk factor to the overall number of confirmed COVID-19 cases within UK Biobank. Another Poisson model, which included all significant factors in model 2, was fitted to estimate the mutually adjusted RRs. These RRs were biased towards the null because of overadjustment bias but were used to construct the PAFs to ensure the PAF estimates did not overlap or exceed 100%. In general, two-tailed p values <0.05 were considered statistically significant. Analyses were conducted in R Statistical Software V.3.5.3 with the package ‘mgcv’.
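To make the RR concrete, the sketch below computes an unadjusted risk ratio for a single binary exposure from a 2×2 table. In this univariable special case, the exponentiated coefficient of a Poisson model with a log link equals the simple ratio of risks. The confidence interval here uses the usual large-sample formula on the log scale, which is a stand-in assumption and not the robust sandwich estimator fitted in R; the counts are invented for illustration.

```python
import math


def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed, z=1.96):
    """Unadjusted risk ratio with an approximate 95% CI.

    The interval is computed on the log scale using the standard
    large-sample variance of log(RR) for a 2x2 table.
    """
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    se_log_rr = math.sqrt(
        1 / cases_exposed - 1 / n_exposed
        + 1 / cases_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Invented counts: 30 cases among 1000 exposed, 20 cases among 2000 unexposed.
rr, lower, upper = risk_ratio(30, 1000, 20, 2000)
print(f"RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Adjusted models 1 and 2 need a full regression fit and cannot be reduced to a 2×2 table in this way.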

Patient and public involvement

UK Biobank maintains a website and twitter feed to keep participants, the general public and researchers up to date on the study ( There is an annual scientific meeting which is recorded and available to the public as webcast. The results of the present study are shared through these channels as the UK Biobank organisation deem appropriate, our own twitter feeds and open-access publication.

The study was set up by the MRC, Department of Health and Wellcome Trust with input from major patient representative organisations (British Heart Foundation and Cancer Research UK:


Of 445 857 participants in England, 428 225 were alive during the available follow-up period. Complete data on covariates were available for 235 928 (55.1%) participants. Primary care data on incident pneumonia were available in 96 814 (41.0%) participants (online supplemental figure 1). At 1 March 2020, the age range of eligible participants was 49–83 years, and time elapsed from baseline was median 10.97 years (IQR 10.36–11.55 years). Of these participants, 1525 received at least one COVID-19 test, and 518 had confirmed SARS-CoV-2 infection, of whom 397 had a positive result from a test conducted in hospital or A&E (primary outcome).

Univariable risk factors for incident COVID-19

Key univariable potentially modifiable risk factors for confirmed COVID-19 included current and former smoking (RR 1.56), higher BMI, body fat and HbA1C (RR 2.32 for obesity), poor lung function (RR 0.84 for 1 SD increase in FEV1), treated hypertension (RR 1.89), HDL cholesterol (RR 0.71 per SD increase) and slow walking pace (RR 2.29; table 1). Among non-modifiable risk factors were older age (particularly the over 75 year olds; RR 1.44), male sex (RR 1.38), black ethnicity (RR 2.88), socioeconomic deprivation (RR 1.30 per SD increase in Townsend Index; table 1). Among baseline comorbidities, general long standing illness (RR 1.59), baseline diabetes (RR 2.16), baseline CVD (RR 1.93), sleep apnoea (RR 3.32), statin use (RR 1.80), higher inflammatory markers (particularly white blood cell count; RR 1.17 per SD increase) and higher cystatin C (RR 1.34 per SD increase) were also associated with confirmed COVID-19 (table 1). Risk factor associations were similar when all 518 test positive COVID-19 cases were considered, rather than only in-hospital positive tests (online supplemental table 1).

Table 1

Univariable association of baseline risk factors with in-hospital COVID-19 in 2020, pneumonia occurring after 2016 and influenza occurring after 2016

Univariable risk factors for incident pneumonia

Of the 96 814 participants with primary care data, 209 had pneumonia recorded after 2016.

Of the modifiable risk factors, pneumonia was associated with BMI, body fat and HbA1C (more moderately than COVID-19; RR 1.43 for obesity), poor lung function (more strongly than COVID-19; RR 0.63 for 1 SD increase in FEV1), and slow walking pace (RR 2.07), but not smoking, lipids, blood pressure or blood pressure medication use. Among non-modifiable risk factors, incident pneumonia was more common in participants who were older (RR 2.09 in the over 75 year olds), and in women (RR 0.59 among men) (table 1). Ethnicity was not associated with pneumonia. Among comorbidities, baseline diabetes, CVD and cancer were approximately twice as common in those who developed pneumonia (table 1), and any longstanding illness was also associated with pneumonia. Inflammatory markers were similarly associated with pneumonia as with COVID-19.

When investigating risk factors for pneumonia over the full follow-up time from baseline, risk factors were generally similar although there was increased power due to a higher number of incident cases (online supplemental table 1). In this analysis, smoking was associated with pneumonia (RR 1.15) as was blood pressure medication use (RR 1.13). Similar findings were found when we studied pneumonia occurring after 2015 (online supplemental table 22).

Univariable risk factors for incident influenza

Of the 96 814 participants with primary care data, 94 had influenza recorded after 2016.

Those who developed influenza were generally slightly more socioeconomically deprived (RR 1.20 per SD increase in Townsend score), more likely to have bronchitis at baseline, and less likely to take statins. No other risk factors showed associations (table 1). When all influenza cases occurring after baseline were considered, evidence of statistically significant associations emerged due to increased power. Smoking (RR 1.18), poor lung function (RR 0.84 for 1 SD increase in FEV1), slow walking pace (RR 1.35), and high BMI (RR 1.20 for obesity) were all associated with higher influenza risk (online supplemental table 1). Cases were less common in older people (RR 0.69 in the over 75 year olds) and more common in women (RR 0.83 in men). South Asians (RR 2.20) were at increased risk.

Multivariable models for COVID-19 and pneumonia

After multivariable adjustment for age, sex, ethnicity and socioeconomic status, the modifiable risk factors for confirmed COVID-19 included smoking (RR 1.45 (95% CI 1.19 to 1.79)), higher BMI (RR 1.36 per SD increase (95% CI 1.25 to 1.48)) and other measures of body fat, higher HbA1C (RR 1.23 per 1 SD increase (95% CI 1.15 to 1.32)), as well as blood pressure medication, FEV1 and FVC, and slow walking pace (RR 1.99 (95% CI 1.48 to 2.68); model 1, table 2). Other risk factors were older age (RR 1.12 per 5 years (95% CI 1.05 to 1.19)), male sex (RR 1.36 (95% CI 1.11 to 1.66)), black ethnicity (RR 2.32 (95% CI 1.33 to 4.04)), South Asian ethnicity (RR 1.98 (95% CI 1.10 to 3.55)), and socioeconomic deprivation (RR 1.27 per SD increase in Townsend Index (95% CI 1.16 to 1.39)). Among comorbidities, risk factors included longstanding illness (RR 1.44 (95% CI 1.17 to 1.77)), diabetes, CVD and statin use. Among blood biomarkers, risk factors included lower total and HDL cholesterol, higher cystatin C (RR 1.27 per 1 SD increase (95% CI 1.16 to 1.39)), CRP and white cell count (table 2).

Table 2

Association of risk factors for COVID-19 and pneumonia in UK Biobank

After adding BMI, blood pressure, FEV1 and walking pace as mediators/covariates to the adjustment model, modifiable risk factors for COVID-19 admission continued to include smoking (RR 1.39 (95% CI 1.13 to 1.73)), higher BMI (RR 1.28 (95% CI 1.16 to 1.40)), higher HbA1C (RR 1.14 (95% CI 1.05 to 1.23)), treated hypertension (RR 1.33 (95% CI 1.04 to 1.70)) and slow walking pace (RR 1.53 (95% CI 1.12 to 2.08)). The age association was no longer significant after these additional adjustments. Other risk factors included male sex (RR 1.72 (95% CI 1.25 to 2.35)), black ethnicity (RR 2.00 (95% CI 1.16 to 3.53)), socioeconomic deprivation (RR 1.17 (95% CI 1.06 to 1.29)), lower HDL cholesterol (RR 0.83 (95% CI 0.73 to 0.95)) and higher cystatin C (RR 1.13 (95% CI 1.02 to 1.25); model 2, table 2).

Nonlinear associations of continuous variables with COVID-19 and pneumonia are shown in figure 1 (and online supplemental table 3). Age was associated with COVID-19 admission in a J-shaped curve, where for participants aged 60–70 years the curve was relatively flat, and for participants aged over 75 years risk increased exponentially with age. The associations of BMI and FEV1 with COVID-19 were fairly linear. Associations were similar when adjusting for body fat percentage rather than BMI (online supplemental table 4).

Figure 1

Non-linear associations of significant continuous variables with COVID-19 and pneumonia. BMI, body mass index; FEV1, forced expiratory volume in one second.

Directly contrasting model 2 for COVID-19 and pneumonia (figure 2), the risk factors in common for both conditions were higher BMI and slow walking pace, although less strongly associated with pneumonia than for COVID-19. Highlighting specific differences, smoking and treated blood pressure were only associated with COVID-19. Pneumonia was more common in women, and socioeconomic deprivation, HDL cholesterol, cystatin C and HbA1c were not associated with pneumonia. Low FEV1 was more strongly associated with pneumonia than with COVID-19. Pneumonia also showed an association with baseline cancer.

Figure 2

Risk factors for COVID-19 and pneumonia in the UK Biobank cohort. Data presented as risk ratios (RR) and their 95% CI. Analyses were adjusted for age, sex, ethnicity, deprivation, body mass index (BMI), forced expiratory volume in 1 s (FEV1) and walking pace (and diastolic blood pressure for COVID-19 only). Continuous exposures were standardised and presented per 1-SD increment (deprivation index SD=3.01, cystatin C SD=0.14, BMI SD=4.59, HbA1c SD=5.80, FEV1 SD=0.77 and HDL cholesterol SD=0.37). BP, blood pressure.
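A per-SD risk ratio can be re-expressed on the original measurement scale using the SDs listed in the caption. The sketch below assumes a log-linear exposure-outcome relationship, which is what a Poisson model with a log link implies for a linearly fitted exposure; the function name `rr_for_change` is a hypothetical helper.

```python
def rr_for_change(rr_per_sd, sd, delta):
    """Risk ratio for a change of `delta` units in the exposure.

    Under a log-linear model, log(RR) scales linearly with the size of the
    change, so RR(delta) = rr_per_sd ** (delta / sd).
    """
    return rr_per_sd ** (delta / sd)


# BMI in model 2: RR 1.28 per SD, with SD = 4.59 kg/m2 (from the caption).
rr_bmi_per_unit = rr_for_change(1.28, 4.59, 1.0)   # per 1 kg/m2, about 1.055
rr_bmi_per_five = rr_for_change(1.28, 4.59, 5.0)   # per 5 kg/m2
```

The same conversion applies to the other standardised exposures, provided the association is close to linear on the log scale.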

Population attributable risks

Factors that were significant in table 2 were mutually adjusted to compare their PAFs for COVID-19 and pneumonia (table 3). Among potentially modifiable risk factors, smoking accounted for 14.9% of COVID-19 cases that occurred within the UK Biobank population, obesity for 6.3%, high HbA1C for 5.3%, treated BP for 5.1%, and slow walking pace for 4.0% (total 35.6%). In contrast, none of these factors were large contributors to pneumonia cases within UK Biobank (table 3).

Table 3

Population attributable fractions of COVID-19 in the UK Biobank
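For readers unfamiliar with PAFs, one commonly used estimator for adjusted risk ratios is Miettinen's formula, PAF = p_c × (RR − 1) / RR, where p_c is the proportion of cases exposed to the risk factor. The text does not state which estimator was used, so the choice of formula here is an assumption, and the input values are invented and are not the study's estimates.

```python
def paf_miettinen(prop_cases_exposed, rr):
    """Population attributable fraction via Miettinen's formula.

    prop_cases_exposed: proportion of cases who carry the risk factor.
    rr: (adjusted) risk ratio for the risk factor; must be positive.
    """
    if not 0.0 <= prop_cases_exposed <= 1.0:
        raise ValueError("prop_cases_exposed must be between 0 and 1")
    if rr <= 0:
        raise ValueError("rr must be positive")
    return prop_cases_exposed * (rr - 1.0) / rr


# Invented inputs: half of cases exposed, risk ratio of 1.5.
example_paf = paf_miettinen(0.5, 1.5)
print(f"PAF {example_paf:.1%}")
```

A risk ratio of 1 gives a PAF of zero, and the PAF rises with both the strength of the association and how common the exposure is among cases.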


In this population-based study, we found that confirmed COVID-19 infection was associated with a number of modifiable risk factors, a trend that was less apparent for pneumonia and influenza. In particular, the associations of smoking, BMI (and body fat), hypertension and physical fitness (as measured by slow walking pace) with COVID-19 are of note; even when such factors were measured a decade before infection they potentially accounted for one-third of COVID-19 cases in UK Biobank. The associations of FEV1 and BMI with COVID-19 are linear, suggesting even modest improvements in lifestyle may be beneficial to risk of presumed severe COVID-19 symptoms. The other independent risk factors for COVID-19 infection included older age, male sex, black ethnicity, socioeconomic deprivation, longstanding illness and reduced renal function as measured by cystatin C, the latter also notable given renal complications in severe COVID-19. Of note, the modifiable risk factors explained some of the association of age with COVID-19 in our adjusted model.

Since this analysis was conducted,15 several other studies have also shown the association between modifiable and non-modifiable risk factors and COVID-19 risk,22–29 particularly with obesity. Generally, these data support our analyses showing the importance of both modifiable and non-modifiable risk factors. It is important to recognise the competing risk for health of lockdown and social distancing. Social distancing reduces viral transmission, but also has consequences for lifestyle. Previous data suggest that physical fitness can be rapidly lost when activity levels decrease,30 31 and this will also result in an increase in BMI.32 33 Further, social distancing may increase loneliness, depression and psychological stress in some people and consequently adversely affect eating habits34 and other health behaviours. Anecdotal evidence from our clinics and media suggests many are struggling with overeating. This study suggests public health guidance should focus on reducing the risk of severe complications of COVID-19 by advocating a healthy lifestyle during the ongoing pandemic, not just for general cardiovascular and metabolic health, but also to help to protect against COVID-19 infection, used alongside other public health interventions.

Understanding the actual causes of disease, rather than markers that simply correlate with exposures, is clearly a key issue to consider. The independent association of socioeconomic deprivation with COVID-19 after multiple adjustment may be explained by the accumulation of earlier life socioeconomic adversities which can result in less physiological reserve and more multimorbidity.34 35 It may also be linked to more overcrowding, reduced social distancing and potential exposure to greater viral load. Asthma, diabetes and high blood pressure, which have been shown to be associated with a higher risk of severe COVID-19 outcomes36 also showed some trends to be associated with COVID-19 in this dataset. While substantial focus of COVID-19 research has been its apparently more aggressive symptoms and disease progression in older people, it is important to recognise that people who are older have less cardiorespiratory reserve to cope with COVID-19 infection. Older age is also associated with more hypertension and diabetes, poorer lung function and greater relative fat mass.34

Previous reports have suggested that obesity or excess ectopic fat deposition may be a unifying risk factor for severe COVID-19 infection,8 37 reducing both protective cardiorespiratory reserve as well as potentiating the immune dysregulation that appears, at least in part, to mediate the progression to critical illness in a proportion of patients with COVID-19. Our analysis bears out these hypotheses. Once BMI or body fat was adjusted for, inflammatory markers were no longer associated with incident COVID-19. This suggests that proinflammatory markers, arising from increased adipose deposition, are probably acting as a marker for body fat which seems to be an adverse risk factor for severe COVID-19.8 37–39 Furthermore, obesity enhances thrombosis,40 which is relevant given the association between severe COVID-19 and prothrombotic disseminated intravascular coagulation and high rates of venous thromboembolism,41 42 as well as the association with D-dimer seen in other reports.43 D-dimer was not measured in blood samples in UK Biobank.

Given that pneumonia is a critical clinical complication of COVID-19, it is important to understand how risk factors for COVID-19 related pneumonia differ from ‘classic’ community acquired pneumonia.11 Our comparison between other common respiratory diseases and COVID-19 suggests that identification of at-risk groups based on our understanding of other respiratory diseases may be inadequate—a more refined approach to risk stratification based on COVID-19 specific risks is needed, and future data should focus on some of the exposures we identify to achieve this.

The strengths of our study include the large cohort size at an age relevant to more severe COVID-19 symptoms, and biochemistry assays performed in a single dedicated central laboratory. We were also able to extensively explore emerging and novel risk factors for COVID-19, while simultaneously comparing to risk factors for pneumonia identified to primary care providers. Limitations include that UK Biobank is not representative of the whole UK population44 (our data focus on England specifically) although this is generally not a concern in investigating risk associations.45 Care should be taken in generalising the PAF estimates. These related to cases occurring within the UK Biobank population, but are not directly applicable to the general population where the prevalence of risk factors is different. In addition, mutual adjustment for overlapping risk factors can lead to problems in interpretability of the PAFs.46 However, the direction of causal associations with this new outcome is not clear, and these exploratory analyses allow a direct comparison between COVID-19 and pneumonia in the same cohort. Due to the under-representation of non-white ethnicities in UK Biobank, we have limited power to explore important interactions by ethnic group (although black ethnicity was associated with the outcome), and we recognise this as an important risk factor. Ascertainment bias, including differential healthcare seeking, differential testing and differential prognosis may explain some differences in outcomes given poor coverage of testing in the UK. It is also likely that cases will generally be at the more severe end of the clinical spectrum by using hospitalised cases as the outcome, although use of admission to Intensive Care units would be additionally informative once sufficient case numbers accrue. 
Despite this, we still observe many similar risk factors to incident pneumonia, which is more likely to have close to complete case ascertainment in primary care records, although, as discussed, other risk factors seem more strongly linked to COVID-19 infections. Exposures were measured several years before the development of the outcomes, and misclassification of the exposures will likely bias our results to the null. However, this also serves to illustrate that the risk factors were generally present many years before development of the disease, and as is well known, risk factors tend to track with age. We have also been unable to fully exclude all deaths that occurred prior to the pandemic, due to lack of up-to-date linkage to mortality records.

In conclusion, these data from UK Biobank suggest risk factors for confirmed COVID-19 infection differ in some important ways from risk factors for pneumonia, being more common in men than women, in lower SES, and with stronger associations with ethnicity, CV risk markers, prior smoking and adiposity. Such findings suggest possible merit in advocating improvements in lifestyle as an additional measure to reduce the risk of COVID-19 alongside existing public health measures such as social distancing and shielding of high risk groups. They also have implications for health advice targeted at the public to lessen risks during this pandemic.




  • Twitter @DrStuGray, @claire_niedz, @MetaMedTeam

  • Contributors PW, JPP and NS conceived the idea for the paper. FH conducted the analysis. All authors contributed to the interpretation of the findings. PW, FH, CC-M and NS jointly wrote the first draft. All authors critically revised the paper for intellectual content and approved the final version of the manuscript. PW and NS are guarantors of the work.

  • Funding The work in this study is supported by the British Heart Foundation Centre of Research Excellence Grant RE/18/6/34217. CLN acknowledges funding from a Medical Research Council Fellowship (MR/R024774/1). SVK acknowledges funding from the Medical Research Council (MC_UU_12017/13), Scottish Government Chief Scientist Office (SPHSU13), and NRS Senior Clinical Fellowship (SCAF/15/02).

  • Disclaimer The funders played no part in the research.

  • Competing interests PW has received research grants from Roche Diagnostics, AstraZeneca and Boehringer Ingelheim outside the submitted work, and NS has received grant and personal fees from Boehringer Ingelheim, and personal fees from Amgen, AstraZeneca, Eli Lilly, Novo Nordisk, Pfizer, and Sanofi outside the submitted work. All authors declare no other relationships or activities that could appear to have influenced the submitted work.

  • Patient consent for publication Not required.

  • Ethics approval UK Biobank received ethical approval from the North West Multi-centre Research Ethics Committee (REC reference: 11/NW/03820). All participants gave written informed consent before enrolment in the study, which was conducted in accordance with the principles of the Declaration of Helsinki. Direct dissemination of the results to participants is not possible/applicable. This study was performed under UK Biobank application number 7155.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data may be obtained from a third party and are not publicly available. UK Biobank data can be requested by bona fide researchers for approved projects, including replication, through

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.