Original research
Assessment of the concordance between individual-level and area-level measures of socio-economic deprivation in a cancer patient cohort in England and Wales
  1. Fiona C Ingleby1,
  2. Aurélien Belot1,
  3. Iain Atherton2,
  4. Matthew Baker3,
  5. Lucy Elliss-Brookes4,
  6. Laura M Woods1
  1. 1 Inequalities in Cancer Outcomes Network, Faculty of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine, London, UK
  2. 2 School of Health and Social Care, Edinburgh Napier University, Edinburgh, UK
  3. 3 Consumer Forum, National Cancer Research Institute, London, UK
  4. 4 National Cancer Registration and Analysis Service, Public Health England, London, UK
  1. Correspondence to Dr Fiona C Ingleby; fiona.ingleby{at}lshtm.ac.uk

Abstract

Objectives Most research on health inequalities uses aggregated deprivation scores assigned to the small area where the patient lives; however, the concordance between aggregate area-level deprivation measures and personal deprivation experienced by individuals living in the area is poorly understood. Our objective was to examine the agreement between individual and ecological deprivation. We tested the concordance between metrics of income, occupation and education at individual and area levels, and assessed the reliability of area-based deprivation measures to predict individual deprivation circumstances.

Setting England and Wales.

Participants A cancer patient cohort of 9547 individuals extracted from the Office for National Statistics Longitudinal Study.

Outcomes We quantified the concordance between measures of income, occupation and education at individual and area level. In addition, we used ROC (receiver operating characteristic) curves and the area under the curve (AUC) to assess the reliability of area-based deprivation measures to predict individual deprivation circumstances.

Results We found low concordance between individual-level and area-level indicators of deprivation (Cramer’s V statistics range between 0.07 and 0.20). The most commonly used indicator in health inequalities research, area-based income deprivation, was a poor predictor of individual income status (AUC between 0.56 and 0.59), whereas education and occupation were slightly better predictors (AUC between 0.62 and 0.65). The results were consistent across sexes and across six major cancer types.

Conclusions Our results indicate that ecological deprivation measures capture only part of the relationship between deprivation and health outcomes, especially with respect to income measurement. This has important implications for our understanding of the relationship between deprivation and health, and, as a consequence, healthcare policy. The results have a wide-reaching impact for the way in which we measure and monitor inequalities, and in turn, fund and organise current UK healthcare policy aimed at reducing them.

  • epidemiology
  • public health

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Strengths and limitations of this study

  • This study presents a detailed description of concordance between aggregate area-level deprivation metrics and individual-level deprivation data, enabling an assessment of whether the widely used aggregate metrics are actually representative of individual deprivation circumstances or not.

  • The study assesses education, occupation and income indicators of deprivation separately, and quantifies concordance between individual-level and area-level measures for each, allowing a more detailed understanding of deprivation than has been possible to date.

  • The cohort focusses on cancer types known to have significant socio-economic inequalities in terms of cancer survival, meaning that extension to a broader population (other cancers or the general population) would be of interest in future work.

  • The data used are the most recent individual deprivation data available from the UK census and are therefore limited to 2011; once data are available from the planned 2021 census, the results could be updated.

  • A small proportion of individual-level deprivation data was missing and so we completed this information where possible using another household adult, which could have led to a very small number of individuals being misclassified.

Introduction

There is strong evidence across economically advanced countries that people who live in more socio-economically deprived areas have poorer health outcomes than those living in more advantaged areas.1–8 These inequalities can be substantial: for example, in England, they account for around 1 in 10 cancer deaths in the first 5 years after diagnosis.9–11 There is little evidence of these inequalities narrowing, despite efforts to reduce them.5 12 13

Much of the research exploring health inequalities across deprivation groups has been conducted using data aggregated to small geographical areas. These ecological measures represent aggregated individual characteristics for the population. Arguably, attributing these measures to individuals invokes an implicit assumption that area-level measures are at least somewhat representative of an individual’s personal deprivation. In reality, while these studies have improved our understanding of trends in health outcomes across ecological deprivation groups, they have not directly addressed the relationship between individual deprivation and mortality because the concordance between ecological measures of deprivation and individual deprivation status is not well understood.

The relationship between individual measures, ecological measures and health outcomes is potentially made more complex by the possible existence of contextual effects: that is, that the relationship between individual deprivation and health outcomes might vary by the patient’s socio-economic context (ecological deprivation). The degree to which this occurs is likely to depend on the mechanism by which deprivation (either at individual or ecological level) affects outcomes as well as the type of deprivation examined. For example, within oncology a small number of studies have examined the relative effects of individual-level and ecological-level deprivation on both cancer risk14–16 and outcomes.17–19 Generally, these studies have quantified independent effects of both individual and ecological deprivation, and for both, more deprived areas or individuals have higher risk and lower survival.14 17–19 However, the strength and nature of these trends varies considerably across factors including sex, level of geographical aggregation and which type of deprivation metric is used.18 Furthermore, these associations are not well understood in a UK context, especially in terms of making use of recent data, and an improved understanding will be important in order to reduce inequalities as part of the National Health Service (NHS) long-term plan for 2020 to 2030.20 The research on health inequalities on which the NHS long-term plan is based uses data aggregated to small area level, and so improving our understanding of how reliably this matches individual-level circumstances is important in terms of developing further policies which more specifically target individual-level variation in health outcomes.

Here, we focus on two key research questions: (1) how strong is the concordance between individual and ecological socio-economic deprivation measures in a cohort of cancer patients; and (2) how strong is the concordance between different types of deprivation variables? These questions enable us to comment on the predictive ability of area-level measures to provide information on individual-level deprivation status in a cancer patient cohort. We discuss the implications of these results in the context of the existing literature on cancer outcome inequalities.

Methods

We analysed data from the Office for National Statistics Longitudinal Study (ONS LS), individually linked to cancer registrations for England and Wales recorded by the National Cancer Data Repository. The LS is a long-term census-based multi-cohort study using four annual birthdates as the selection criterion. This provides a random 1% sample of the population of England and Wales, clustered by date of birth.21 22 Data are available for all census variables from the 1971 census through to the most recent 2011 census, as well as for variables derived from external, individual linkage, including cancer registrations and administrative data (births and deaths).

The analysis cohort included LS members present at either or both of the 2001 and 2011 census (figure 1). We defined the adult cancer patient subpopulation as anyone with a first primary malignant cancer diagnosis recorded in the national cancer registry between 1 January 2008 and 30 April 2016 for six common cancer types in England and Wales: breast (ICD-10 (International Classification of Diseases, 10th Revision) code C50), colon (C18), rectum (C19 to C21), prostate (C61), bladder (C67) and non-Hodgkin's lymphoma (C82 to C86). These cancers were selected for analysis based on evidence of wide socio-economic inequalities in cancer survival in the UK.5 A small number (<20) of sex-site inconsistencies, and also a small number (<30) of men with breast cancer were excluded. Only those aged 18 to 99 at the time of diagnosis were included.

Figure 1

CONSORT (Consolidated Standards of Reporting Trials) diagram describing the data set linkage and variables used in the analysis, as well as the flow of LS members through the data processing steps: overall numbers, cancer patient subpopulation filtering and missing data exclusions. Data source: ONS LS. ICD-10, International Classification of Diseases, 10th Revision; IMD, Indices of Multiple Deprivation; LSOA, Lower-level Super Output Area; NS-SEC, National Statistics Socio-Economic Classification; ONS LS, Office for National Statistics Longitudinal Study; SOC, Standard Occupational Classification.

Both at individual and area level, we focussed on three main variables: occupation, education and income; which are commonly used to summarise the broad spectrum of socio-economic status in the social sciences.23

Ecological deprivation metrics

The Indices of Multiple Deprivation (IMD) were used to measure area-based deprivation. The IMD statistics are calculated for each Lower-level Super Output Area (LSOA) in England and Wales and consist of seven domains. We used the income, employment (occupation) and education domains. LSOA codes were recorded directly for individuals in the 2011 census data, while in 2001 census, LSOA codes were derived from concatenating district and ward codes. The temporally closest data were used for each census: for the 2001 census this was the English IMD 200424 and Welsh 2005 report,25 and for the 2011 census this was the English IMD 201526 and Welsh 2014 report.27 Each domain was included as ventiles (ie, 20 equal quantile groups) of the national distribution of areas, as opposed to the raw scores, to avoid LS members being identified in LSOAs with low population size.
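As a rough illustration of the ventile grouping described above, the sketch below ranks a set of area scores and assigns each area to one of 20 equal-sized groups. The function name `assign_quantile_groups`, the area codes and the scores are hypothetical and are not taken from the IMD releases; the ranking direction (group 1 holds the lowest scores) is likewise an assumption made for the example.

```python
def assign_quantile_groups(scores, n_groups=20):
    """Map each key in `scores` to a quantile group from 1 to n_groups.

    Areas are ranked by score in ascending order; the area at 0-based
    rank r out of n areas is placed in group floor(r * n_groups / n) + 1.
    """
    ranked = sorted(scores, key=lambda area: scores[area])
    n = len(ranked)
    return {area: (rank * n_groups) // n + 1 for rank, area in enumerate(ranked)}


# Hypothetical deprivation scores for 40 made-up areas (not real LSOA data).
area_scores = {f"area_{i:02d}": i * 1.5 for i in range(40)}
ventiles = assign_quantile_groups(area_scores, n_groups=20)
```

Working with the group label rather than the raw score means that each value is shared by many areas, which is the disclosure-control motivation given in the text.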

Individual-level deprivation metrics

Individual data on age, sex, qualifications and occupation at the 2011 census were extracted for each patient, while individual income was derived using a previously published method (see below). Individual data were not available from the 2011 census for a small proportion of individuals; in part accounted for by those who were diagnosed with cancer between 2008 and 2010 and had died prior to the 2011 census (figure 1). Where possible, data from the 2001 census was used for these individuals. For missing data on qualifications or occupation, data was completed where possible by proxy, using another adult resident in the household (usually household head or spouse). The rationale for this use of information by proxy is based on evidence that partners tend to have similar incomes,28 occupations29 and educational attainment.30 We tested the sensitivity of the estimated concordance statistics to this use of proxy data by comparing results with and without these imputed values, and found very little difference (online supplemental table S1). Prior to data completion by proxy, missingness was 12% for occupation data, 2% for education and 9% for income. After completion of missing data by proxy, missingness was 6%, <1% and 5%, respectively, for each of occupation, education and income individual-level deprivation variables (figure 1).
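The completion-by-proxy step can be sketched as follows. This is a minimal illustration only: the record layout (`person_id`, `household_id`, `age`, `occupation`) is hypothetical, and the rule of taking the first available adult in the household is a simplifying assumption, whereas the study usually used the household head or spouse.

```python
def complete_by_proxy(people, field):
    """Fill missing `field` values from another adult in the same household.

    `people` is a list of dicts with hypothetical keys "person_id",
    "household_id", "age" and the deprivation field. Returns the completed
    records and the set of person_ids whose value was filled by proxy.
    """
    # Adults with an observed value can act as donors for their household.
    donors = {}
    for p in people:
        if p["age"] >= 18 and p[field] is not None:
            donors.setdefault(p["household_id"], []).append(p)

    completed, proxied = [], set()
    for p in people:
        record = dict(p)  # copy so the input records are left unchanged
        if record[field] is None and p["household_id"] in donors:
            # Assumption: use the first available adult donor.
            record[field] = donors[p["household_id"]][0][field]
            proxied.add(p["person_id"])
        completed.append(record)
    return completed, proxied


# Made-up records for illustration.
people = [
    {"person_id": 1, "household_id": "h1", "age": 67, "occupation": None},
    {"person_id": 2, "household_id": "h1", "age": 64, "occupation": "intermediate"},
    {"person_id": 3, "household_id": "h2", "age": 71, "occupation": None},
    {"person_id": 4, "household_id": "h3", "age": 58,
     "occupation": "technical, routine and manual"},
]
completed, proxied = complete_by_proxy(people, "occupation")
missing_before = sum(p["occupation"] is None for p in people) / len(people)
missing_after = sum(p["occupation"] is None for p in completed) / len(completed)
```

Returning the set of proxied identifiers makes it straightforward to rerun an analysis with and without the imputed values, in the spirit of the sensitivity check described above.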

Occupation type was derived from the National Statistics Socio-Economic Classification (NS-SEC). The three-group version of the NS-SEC was used, which categorised LS member occupations as (1) technical, routine and manual occupations; (2) intermediate occupations; or (3) higher managerial, administrative and professional occupations.31 Unlike the finer-scaled versions of the NS-SEC, the three-group version classifies occupations into approximately hierarchical groups. As recommended for the three-group version of the NS-SEC, those without an occupation classification due to long-term unemployment or studentship were treated as missing.31 We carried out a sensitivity analysis where these individuals were included in the technical, routine and manual group, which did not cause any appreciable differences to the concordance estimates.

Education level was categorised as one of the six groups based on the standard levels of UK qualifications used in the census:32 (1) no qualifications; (2) 1 to 4 GCSEs (General Certificate of Secondary Education) or equivalent; (3) 5+ GCSEs or equivalent; (4) apprenticeships and vocational qualifications; (5) A-levels or equivalent; or (6) degree-level education and higher.

Weekly income (GBP) was estimated per individual following the method described by Clemens and Dibben,33 which required information on sex, age and Standard Occupational Classification (SOC) code. Income is therefore linked to occupation. The SOC codes used, however, capture specific detail not available within the NS-SEC codes used for the occupation variable, which more broadly classifies types of occupation. We took a data-driven approach to adjust income estimates for those aged over 60 who are most likely to be retired, using observed annualised percentage decreases in income for those aged over 60 reported by the English Longitudinal Study of Ageing34 (see online supplemental tables S2 and S3). After applying this correction, LS members were grouped into quintiles by estimated income, from least deprived (Q1) to most deprived (Q5). Quintiles were calculated based on all available LS members (not just patients with cancer), separately for each sex.
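The quintile assignment can be illustrated with a short sketch in which cut points are derived from a reference sample for each sex and then applied to patients. The income values and function names below are hypothetical, and the sketch does not attempt to reproduce the Clemens and Dibben estimation or the retirement adjustment.

```python
def quintile_cut_points(incomes):
    """Return the four income values separating five equal-sized groups."""
    ordered = sorted(incomes)
    n = len(ordered)
    return [ordered[(k * n) // 5] for k in range(1, 5)]


def income_quintile(income, cuts):
    """Q1 = least deprived (highest income), Q5 = most deprived (lowest)."""
    n_cuts_reached = sum(income >= c for c in cuts)
    return 5 - n_cuts_reached


# Hypothetical weekly incomes (GBP) for the full reference sample, by sex.
reference = {
    "F": [100 + 10 * i for i in range(50)],  # 100, 110, ..., 590
    "M": [150 + 10 * i for i in range(50)],  # 150, 160, ..., 640
}
cuts = {sex: quintile_cut_points(values) for sex, values in reference.items()}

# Made-up patients, each classified against the cut points for their sex.
patients = [("F", 120.0), ("F", 585.0), ("M", 120.0), ("M", 400.0)]
patient_quintiles = [income_quintile(inc, cuts[sex]) for sex, inc in patients]
```

Because the cut points come from the whole reference sample rather than from the patients alone, the patient cohort need not be spread evenly across the five groups.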

Patient and public involvement

Due to data protection, we do not have access to individual identifying data from the ONS LS and so it was not possible to directly involve these participants in the analyses and discussion for this study. Our aim is to share these results with patients and public through publication, in order to address public health issues surrounding health inequalities. In addition, we included cancer patient representatives at each stage of the design, implementation and analysis of this study, as part of the research team.

Data analysis

Men and women were analysed separately, for all cancer types combined and for individual cancers. We tested the degree of concordance between each pairwise combination of the six deprivation variables: individual-level income quintile, education and occupation groups; and LSOA-level quintiles for income, education and occupation. Concordance was quantified using Cramer’s V statistic, a measure of the concordance between pairs of categorical variables derived from a χ2 statistic, with 95% CIs also approximated from the χ2 distribution.35 The measure has the advantage of not assuming that categories are ordinal. Values of Cramer’s V<0.10 are generally interpreted as low concordance and V>0.30 as high, although the values depend in part on the number of categories in the variable with the fewest groups (V can be slightly higher where there are fewer groups35). In most comparisons here, this is the same (five groups), except for comparisons involving individual-level occupation (three groups).
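For readers unfamiliar with the statistic, the sketch below computes Cramer's V from a contingency table using the standard definition V = sqrt(χ2 / (n × (k − 1))), where n is the total count and k is the smaller of the number of rows and columns. It is an illustration rather than the code used in the study (which was written in R), it omits the confidence interval, and it assumes that no row or column of the table is empty. The example tables are made up.

```python
import math


def cramers_v(table):
    """Cramer's V for a contingency table given as a list of rows of counts.

    Assumes every row and column has a non-zero total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals))
    return math.sqrt(chi2 / (n * (k - 1)))


# Made-up tables: rows might be individual-level groups, columns area-level groups.
independent = [[10, 20], [20, 40]]          # rows proportional: no association
perfect = [[30, 0, 0], [0, 30, 0], [0, 0, 30]]  # each row maps to one column
v_independent = cramers_v(independent)
v_perfect = cramers_v(perfect)
```

The two example tables mark the ends of the scale: V is 0 when the row and column classifications are independent and 1 when one fully determines the other.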

For each type of deprivation metric (ie, education, income or occupation) we assessed the extent to which the area-level value accurately predicted the ‘true’ individual-level value. Individuals were considered ‘deprived’ if their individual-level value was either no qualifications or 1 to 4 GCSEs (education), technical, routine and manual (occupation) or below the 40th centile of income (quintiles 4 and 5). A binary classification was applied to the corresponding area-level deprivation variable, which was repeated using each ventile of the area-level variable as the binary threshold. For ventile 1 as threshold, individuals in ventiles 2 to 20 were categorised as deprived; for ventile 2 as threshold, individuals in ventiles 3 to 20 were categorised as deprived; and so on. Three aspects of predictive ability were then measured: (1) accuracy, the total proportion of individuals correctly classified; (2) sensitivity, the proportion of ‘deprived’ individuals correctly classified by the area-level measure; and (3) specificity, the proportion of ‘not deprived’ individuals correctly classified by the area-level measure. Using these measures, we generated receiver operating characteristic (ROC) curves36 for each type of deprivation measure and calculated the area under the curve (AUC) to summarise the ability of the area-based measure to predict individual-level deprivation.
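The threshold procedure described above can be sketched as follows. This is an illustrative reconstruction, not the study's own code: the function names and toy data are hypothetical, higher ventiles are taken to mean more deprived areas, and a threshold of 0 is added (an assumption beyond the text) so that the ROC curve includes the point at which everyone is classified as deprived. The AUC is obtained with the trapezoidal rule.

```python
def roc_from_ventiles(ventiles, deprived):
    """Sensitivity, specificity and accuracy for each ventile threshold.

    ventiles: area-level ventile (1 to 20) for each individual.
    deprived: True where the individual-level measure marks 'deprived'.
    For threshold t, individuals in ventiles above t are predicted deprived.
    """
    pairs = list(zip(ventiles, deprived))
    n_pos = sum(1 for _, d in pairs if d)
    n_neg = len(pairs) - n_pos
    rows = []
    for t in range(0, 21):  # t = 0 is an added endpoint (assumption)
        tp = sum(1 for v, d in pairs if v > t and d)
        tn = sum(1 for v, d in pairs if v <= t and not d)
        rows.append({
            "threshold": t,
            "sensitivity": tp / n_pos,
            "specificity": tn / n_neg,
            "accuracy": (tp + tn) / len(pairs),
        })
    return rows


def auc(rows):
    """Area under the ROC curve (sensitivity against 1 - specificity)."""
    points = sorted((1 - r["specificity"], r["sensitivity"]) for r in rows)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoidal rule
    return area


# Toy data: area ventile separates the two groups perfectly.
separable = roc_from_ventiles([1, 2, 3, 18, 19, 20],
                              [False, False, False, True, True, True])
# Toy data: area ventile carries no information about individual status.
uninformative = roc_from_ventiles([5, 5, 15, 15], [True, False, True, False])

auc_separable = auc(separable)
auc_uninformative = auc(uninformative)
best = max(separable, key=lambda r: r["accuracy"])  # first threshold with top accuracy
```

The two toy data sets bracket the values reported in this study: an AUC near 1 would mean that the area measure identifies deprived individuals almost perfectly, whereas an AUC near 0.5 means it does little better than chance.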

All analyses were carried out in R V.3.6.1. Graphs were generated using the package ggplot2 (V.3.2.1).

Results

The linked data set consisted of 4826 male patients with cancer and 4721 female patients with cancer with non-missing individual deprivation data for analysis (figure 1). The patient cohort tended to include more individuals from the more deprived groups (table 1).

Table 1

Numbers and percentages of patients with cancer included in the analysis, by sex; showing distribution across deprivation groups at both individual-level and LSOA-level and across cancer types. Data source: ONS LS.

Our analyses set out first to investigate concordance between individual and ecological deprivation measures in patients with cancer. We found that concordance between individual-level and ecological-level measures was generally low for both men and women (figure 2), despite a general trend of the highest proportion of deprived individuals being found in the most deprived areas (figure 3). We also used binary deprived/not deprived individual and area-level categories to assess how well area-level status predicted individual status and found that none of the area-based measures were strongly reliable predictors of individual-level deprivation status (figure 4), although occupation performed better than education or income. For occupation, using ventiles 14 (men) and 16 (women) to predict a binary deprivation status yielded the highest predictive accuracy (figure 4A). The ROC curves showed that for each sex the ability to discriminate was higher than the 0.5 expected by chance, with AUC values of 0.65 and 0.62 for men and women, respectively (figure 4B). Predictive ability for education was slightly lower, with an AUC of 0.62 for both sexes (figure 4C,D). For income, the predictive ability of area-level income was very low with AUC values of 0.59 for men and 0.56 for women (figure 4E,F), indicating the predictive ability was not much greater than expected by chance.

Figure 2

Cramer’s V±95% CI for all pairwise combinations of deprivation metrics. Strength of concordance is indicated by darker shading for men in top half (green; n=4826), and women in bottom half (purple; n=4721). Data source: ONS LS. Individ, individual; LSOA, Lower-level Super Output Area; ONS LS, Office for National Statistics Longitudinal Study.

Figure 3

Stacked barplots showing proportions of men and women in each combination of categories for (A) individual occupation versus LSOA occupation quintiles; (B) individual education versus LSOA education quintiles; and (C) individual income versus LSOA income quintiles. Data source: ONS LS. Appren, Apprenticeship; dep, deprived; GCSE, General Certificate of Secondary Education; Intermed, Intermediate; LSOA, Lower-level Super Output Area; Manag/Prof, Managerial/Professional; ONS LS, Office for National Statistics Longitudinal Study; Quals, qualifications; Tech/Routine, Technical/Routine

Figure 4

Predictive accuracy of LSOA-level variables to predict deprived/not deprived individual deprivation status (left); and ROC curves (right) plotted as sensitivity (true positive rate) against 1-specificity (false positive rate). A/B: occupation; C/D: education; and E/F: income. Dashed lines indicate LSOA ventile value with maximum predictive accuracy when used as the threshold value to differentiate between deprived/not deprived, where deprived are those above this threshold. AUC values are shown next to ROC curves. Data source: ONS LS. AUC, area under the curve; LSOA, Lower-level Super Output Area; ONS LS, Office for National Statistics Longitudinal Study; ROC, receiver operating characteristic

A secondary aim of the analyses was to test the concordance between the different types of deprivation variables included in the study. For both men and women, concordance between deprivation variables at the individual level was moderately high, while high concordance was found between the different ecological-level deprivation variables at the LSOA level (figure 2). There is some evidence of higher concordance between variables at the individual level for women than for men.

The patterns observed in the overall cancer patient cohort were also observed for each cancer when examined separately (online supplemental tables S4-S9). There was suggestive evidence of higher concordance between deprivation variables for patients with bladder cancer than for other cancer types, but small sample size and wide CIs around the estimates make these results hard to interpret.

Discussion

The main aim of this study was to assess the concordance between individual and ecological deprivation measures. Area-level income displayed particularly low concordance with individual-level income status; whereas area-level occupation, and, to a lesser extent education, appear to have slightly higher concordance with individual-level measures. Additionally, the results showed that aggregated area-level deprivation metrics are weak predictors of individual-level deprivation status in the cancer patient cohort analysed here. These results have important and wide-ranging implications for the interpretation of studies that examine the impact of deprivation on health outcomes, particularly those that form the basis of policies aimed at addressing inequalities. If aggregated area-level deprivation metrics do not fully represent socio-economic variation between individuals, then policies based on these measures risk misunderstanding the relationship between health and deprivation.

The calculation of the IMD income domain is based on the proportion of individuals in an area eligible for low-income tax credits or benefits. It is therefore principally an estimator of the distribution of very low incomes, and provides relatively little information about the distribution of mid-incomes to high-incomes. On the other hand, the individual-level income estimation method we used generates a continuous scale of income, the quintiles of which separate individuals with higher incomes from middle and lower incomes more effectively. An additional consideration is the calculation of an individual’s income, which is not directly collected as part of census data in the UK and we therefore had to use an estimation method.33 While this method is validated on UK data, it is nonetheless likely to introduce a degree of error, and perhaps especially so for those individuals managing periods of insecure employment or unemployment, whose occupations will be the least well-documented in the census. As such, ecological and individual metrics quantify income variation in different ways and might not be expected to closely match with one another. Income deprivation carries a major weight in the calculation of the IMD for area-level statistics, but our analyses show that it is not straightforward to translate this to individual circumstances. Differentially targeting healthcare funding towards the poorest communities, based on area-level income metrics, is a sensible policy with important potential benefits in terms of reducing inequalities, but it is nonetheless also important to recognise that this could overlook some individuals, and perhaps especially those with low income but not in the lowest income bracket.

For occupation, the area-level IMD domain is based on the proportion of unemployment in an area. In our individual-level data, unemployed individuals were treated as missing data31 and would therefore have been categorised by proxy (wherever possible) using the occupational category of another adult in the same household. This approach makes an imperfect assumption that the type of occupation of an unemployed individual can be approximated by the occupation of another adult in the same household (usually a spouse or partner). However, the relatively good predictive accuracy of area-level and individual-level occupation variables in our results suggests that there is a fair degree of geographical clustering of levels of unemployment and occupation types. Interestingly, concordance between individual and ecological occupation measures was not affected by a sensitivity analysis we carried out with unemployed individuals included in the analysis as part of the technical, routine and manual group, which could be explained by levels of unemployment being highest in these types of jobs.37

Our results showed that the ability of area-level education to predict individual status was similar to occupation, although slightly lower. In the case of education, the area-level IMD domain represents the proportion of people in an area with no qualifications, which was one of the individual-level categories we included for education, and this data was directly available from the census. As such, we might have expected close concordance between the two education variables. Although concordance is higher than for the respective income metrics, concordance is low overall and the predictive ability is consistent with the full picture presented by our results that area-level measures only capture some of the variation in deprivation, and do not fully represent individual deprivation status.

Our results suggest that, at least for patients with cancer diagnosed in England and Wales, area-level statistics are not a very good proxy for individual-level deprivation status; indeed, for income deprivation they are only a small improvement on the toss of a coin. This is somewhat consistent with a recent study of a French population by Bryere et al,38 although we generally found slightly lower predictive power for area-level variables to predict individual-level deprivation. A major difference between the two analyses is that where Bryere et al used data that was a random sample of the population, we focussed on a cancer patient cohort. In particular, the cohort focussed on cancer types with wide socio-economic inequalities in survival,5 and survival inequalities were of interest as survival differences can be readily interpreted in terms of healthcare provision and performance. However, it may be interesting for further research to validate these results on the overall population cohort in the ONS LS.

Data availability has undoubtedly been a limiting factor in the ability of previous research to consider both area-level and individual-level effects of deprivation. Aggregated data is typically more easily accessible and therefore predominantly features in inequalities research. Our results have implications for the interpretation of studies that rely solely on area-level measures of deprivation such as the IMD. These are useful tools for summarising geographical trends, but our results suggest that caution is needed in terms of extending the interpretation to individual deprivation circumstances. We are not suggesting that aggregated deprivation statistics should not be used, or that the use of aggregated data produces unreliable results for the effect of ecological deprivation. On the contrary, our results show that area-level and individual-level health inequalities should be viewed as independent phenomena, both of interest, and that their separate effects as well as their interaction are likely to be important for understanding and reducing socio-economic differences. For example, further research could address the extent to which inequalities in cancer outcomes are related to area-level factors such as the availability of healthcare services and resources, in comparison to individual-level factors such as symptom awareness and individual means to access appointments and treatment. Further, establishing whether or not, for instance, more deprived patients with cancer experience better outcomes when living in an affluent area compared with living in a more deprived area, due to increased availability of healthcare services and resources, is integral to fully understanding these differentials and thus the way in which resources should be deployed to address them.

Our data suggest, in fact, that where interventions such as cancer symptom awareness campaigns or screening have been directed at ecologically deprived areas, a significant minority of patients who are deprived will have missed out. The policies to reduce health inequalities set out in the NHS long-term plan20 are based on research using aggregate measures of deprivation. If the mechanism by which deprivation affects cancer survival principally functions at an individual level, it follows that such campaigns may have had limited efficiency. Conversely, if ecological factors are the predominant driver of inequalities this approach will have had greater traction. The fact that inequalities are not significantly reducing, even in the context of policy change,13 suggests the former is, even if only partially, at work.

In conclusion, we have shown that individual and contextual deprivation are not highly concordant with each other in a cancer patient cohort, and we argue that this shows the potential for individual and contextual factors to have independent effects on health inequalities. Further research will be important to disentangle these factors and enable more targeted policy recommendations, especially in terms of individual-level deprivation effects, which have not received much research attention to date. An improved understanding of how individual deprivation affects health outcomes has potential to inform more effective policies to reduce health inequalities.

Acknowledgments

This work makes use of data from the National Cancer Data Repository prepared by the National Cancer Intelligence Network in association with the National Disease Registration Service and the cancer registries of England and Wales. The permission of the Office for National Statistics to use the Longitudinal Study is gratefully acknowledged, as is the help provided by staff of the Centre for Longitudinal Study Information and User Support (CeLSIUS). CeLSIUS is supported by the ESRC Census of Population Programme under project ES/R00823X/1. The authors alone are responsible for the interpretation of the data in this paper. This work contains statistical data from ONS which is Crown Copyright. The use of the ONS statistical data in this work does not imply the endorsement of the ONS in relation to the interpretation or analysis of the statistical data. This work uses research data sets which may not exactly reproduce National Statistics aggregates. We gratefully acknowledge the participation of all members of the ONS Longitudinal Study.

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Twitter @DrFionaIngleby

  • Contributors All authors (FCI, AB, IA, MB, LE-B and LMW) contributed to the study design. FCI, AB, IA and LMW analysed the data. All authors (FCI, AB, IA, MB, LE-B and LMW) contributed to the interpretation of the results. FCI, AB, IA and LMW prepared the manuscript. All authors (FCI, AB, IA, MB, LE-B and LMW) commented on and approved the final manuscript.

  • Funding This study was supported by a grant from the Economic and Social Research Council (ES/S001808/1) and partially by a programme grant from Cancer Research UK (grant number C7923/A18525).

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Ethics approval London School of Hygiene & Tropical Medicine Research Ethics Committee: online application 14600; approved 01/02/2018.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data are not publicly available but may be accessed via appropriate application to the ONS Longitudinal Study.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.