Introduction The Canadian Population Attributable Risk of Cancer project aims to quantify the number and proportion of cancer cases incident in Canada, now and projected to 2042, that could be prevented through changes in the prevalence of modifiable exposures associated with cancer. The broad risk factor categories of interest include tobacco, diet, energy imbalance, infectious diseases, hormonal therapies and environmental factors such as air pollution and residential radon.
Methods and analysis Using a national network, we will use population-attributable risks (PAR) and potential impact fractions (PIF) to model both attributable (current) and avoidable (future) cancers. The latency periods and the temporal relationships between exposures and cancer diagnoses will be accounted for in the analyses. For PAR estimates, historical exposure prevalence data and the most recent provincial and national cancer incidence data will be used. For PIF estimates, we will model alternative or ‘counterfactual’ distributions of cancer risk factor exposures to assess how cancer incidence could be reduced under different scenarios of population exposure, projecting incidence to 2042.
Dissemination The framework provided can be readily extended and applied to other populations or jurisdictions outside of Canada. An embedded knowledge translation and exchange component of this study with our Canadian Cancer Society partners will ensure that these findings are translated to cancer programmes and policies aimed at population-based cancer risk reduction strategies.
- potential impact fraction
- population attributable risk
- risk factors
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
We report a detailed and transparent approach for conducting large attributable risk estimation projects to assess the impact of multiple risk factors.
We have considered projections of both the exposure prevalence and cancer incidence with multiple approaches, which is an improvement over unrealistic fixed projection models.
Long-term projections of exposure prevalence and cancer incidence are statistically challenging and involve a great deal of uncertainty.
Many of our exposure measures are based on self-reported data, which introduces the possibility of misreporting.
Estimates of the current and future burden of cancer in Canada attributable to known and probable causes of the disease are required for allocating prevention resources optimally. National1,2 and global cancer incidence projections3 suggest that the burden of cancer will continue to rise. In Canada and other developed nations, this is largely attributable to growing and ageing populations. In addition, despite established associations between modifiable risk factors and cancer risk, a sufficient reduction in the prevalence of these risk factors has not been achieved in Canada.1–3 Identifying the exposures and interventions with the greatest potential to reduce cancer risk will aid in implementing prevention programmes and policies to combat this growing health challenge.
Several groups, including some members of our Canadian Burden of Cancer—Population Attributable Risk (ComPARe) Study Group, have produced estimates of the current burden of cancer attributable to lifestyle, environmental and infectious exposures in Canadian national4–6 and provincial7–15 populations. Additional studies have estimated the future avoidable national16–20 and global21 cancer burdens attributable to single exposures. However, population attributable risk (PAR) estimates depend on risk factor prevalence, which varies over time and is population specific. Therefore, it is important to update PAR estimates frequently. In addition, several methodological extensions to these approaches, including modelling the combined impact of multiple risk factors and defining the timing of intervention impacts on subsequent cancer incidence, are lacking. A comprehensive estimation of the current and future cancer burden, and of the impact of potential reductions in exposure prevalence on cancer incidence in the population, is needed.
For the ComPARe Study, we developed a methodological framework to estimate the burden of cancer in Canada using cancer incidence data (2015) and projected incidence trends (2015–2042). The ComPARe study team brings together the substantive and quantitative expertise of cancer researchers from across the country. This collaborative, pan-Canadian study also involves a partnership with the Canadian Cancer Society, a main knowledge end-user for this work, which worked with the researchers throughout the project. To ensure that methods were rigorously applied and standardised across research labs, the framework covers the estimation of both current attributable and future avoidable cancers associated with modifiable risk factors. It extends the work of other groups22–29 and is applicable to a range of diseases and populations. In this paper, we describe the approach and methods used in the ComPARe Study. An overview of earlier methods used to estimate PARs and potential impact fractions (PIFs) is presented. We then describe how we used these methods in the ComPARe Study, and the innovations that we developed to extend them. See figure 1 for an outline of our approach.
Figure 1 shows the methodological framework for the ComPARe Study. The concept of PAR and population attributable fraction was initially developed by Levin in 1953 to estimate the burden of disease in the general population attributable to a given factor.30 Attributable risks are predicated on the assumption that there are causal relationships between exposures and disease outcomes, and on the concept of the counterfactual, a scenario counter to actual experience, where exposures to the causal agents no longer exist or can be mitigated.31
Since the PAR method was first introduced, several statistical and theoretical extensions to the framework have been developed, including methods to measure the uncertainty around PARs and the PIF. The PIF is an extension of the PAR for situations in which complete removal of the exposure cannot be assumed.32 Instead, it examines the impact on disease incidence of a reduction in the prevalence, or a shift in the population distribution, of an exposure. The PAR and PIF form the statistical foundation of the ComPARe Study.
To apply the PAR and PIF to estimate the impacts of reducing exposures, three sources of data are essential (table 1): (1) the relative risk of incident disease, or risk distribution associated with exposure; (2) the proportion of the population or cancer cases exposed to the risk or protective factor (sex and age-specific exposure prevalence); and (3) sex-specific and age-specific disease incidence data. These three elements are needed to estimate the proportion of cancer cases that could be prevented, based on the PARs or PIFs. In the following sections, we present the methods used in the ComPARe Study for estimating the current attributable (PAR) and future avoidable (PIF) burdens of cancer.
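As a minimal illustration of how these three inputs combine, the sketch below applies Levin's formula for a single dichotomous exposure and the corresponding prevalence-shift form of the PIF. The function names and all numbers are hypothetical and are not ComPARe estimates; the exposure-specific formulas actually used in the study are those listed in table 1.

```python
def levin_par(prevalence, rr):
    """Levin's PAR for a dichotomous exposure: p(RR - 1) / (1 + p(RR - 1))."""
    excess = prevalence * (rr - 1.0)
    return excess / (1.0 + excess)


def pif(prevalence, counterfactual_prevalence, rr):
    """PIF when prevalence shifts from p to p*: (p - p*)(RR - 1) / (1 + p(RR - 1))."""
    shift = (prevalence - counterfactual_prevalence) * (rr - 1.0)
    return shift / (1.0 + prevalence * (rr - 1.0))


# Hypothetical inputs: 20% exposed, RR = 2.0, 1000 incident cases in one stratum.
p, rr, cases = 0.20, 2.0, 1000

par = levin_par(p, rr)                 # fraction of cases attributable to the exposure
attributable_cases = par * cases       # number of attributable cases
pif_halved = pif(p, 0.5 * p, rr)       # fraction avoidable if prevalence were halved
```

When the counterfactual prevalence is zero, the PIF reduces to the PAR, which is why the PAR can be read as the limiting case of complete exposure removal.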
Identifying risk factors for inclusion
A crucial component of attributable cancer estimation is determining which exposures should be included as causal for incident cancers. Given the considerable amount of epidemiological and basic science literature evaluating aetiological associations for cancer, we needed criteria to determine the level of evidence required for inclusion in our analyses. We developed a hierarchy of evidence for the ComPARe Study (figure 2), in which study quality was determined using the STrengthening the Reporting of OBservational studies in Epidemiology33 guidelines for cohort and case-control studies and the Meta-analysis Of Observational Studies in Epidemiology34 guidelines for meta-analyses. The World Cancer Research Fund’s (WCRF) Continuous Update Project35 and the International Agency for Research on Cancer’s (IARC) Monographs on the Evaluation of Carcinogenic Risks to Humans36 have devoted substantial resources, including expert panels, to classifying potentially carcinogenic risks to humans. We used the recommendations from these international and national panels as our first level of inclusion. IARC group 1 (carcinogenic to humans) and group 2A (probably carcinogenic to humans) carcinogens were included. As a second level of evidence, we included exposure–cancer site pairs for which high-quality meta-analyses of epidemiological studies published since the WCRF and IARC reports demonstrated consistent associations, as well as IARC group 2B exposures for sensitivity analyses. The exposure and cancer site associations included in the ComPARe Study are presented in online supplementary table 1.
Estimation of attributable cancers
Exposure prevalence data: including latency
The biologically relevant time period from the initiation of an exposure to development of disease is highly variable, depending on the exposure and cancer site, and it is likely to be measured in years or even decades for solid tumours. Therefore, we allowed for a period of latency from exposure to cancer incidence and diagnosis in our assessments. However, exposure prevalence data were not always available for the long relevant time periods implied by latency. As a proxy measure for each exposure, we extracted the median or mean follow-up time from exposure measurement to cancer incidence from large cohort studies. We assessed the quality of the cohort studies based on their sample size, methods of exposure assessment and length of follow-up; large cohorts with detailed exposure assessment and longer follow-up were considered the highest quality. This information on the latency period was then compared with the time periods for which high-quality data on exposure prevalence were available. We selected prevalence estimates that corresponded to the midpoint of the range of potential latency periods, as identified from the cohort studies. When these data were not available, we assumed a 10-year latency period between exposure measurement and cancer incidence, or used the closest available prevalence estimates. We attempted to strike a pragmatic balance between selecting a biologically plausible and relevant period of time and feasibly collecting prevalence data. For example, for the infectious agents, the latency period was determined by the availability of prevalence data. For Helicobacter pylori, there was one seroprevalence survey in 1999–2000, and for hepatitis B virus and hepatitis C virus, the prevalence data were collected from the Canadian Health Measures Survey and the Canadian Notifiable Disease Surveillance System from 2007 to 2012. A diagram of our approach to modelling relevant exposures is shown in figure 3.
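The selection rule described above can be sketched as follows. This is an illustrative reading of the text rather than the study's procedure: the function name, the tie-breaking rule (prefer the earlier survey) and the example years are all assumptions.

```python
def select_prevalence_year(incidence_year, latency_range, available_years,
                           default_latency=10):
    """Pick the survey year closest to (incidence year - assumed latency).

    latency_range: (shortest, longest) follow-up in years taken from cohort
    studies, or None when no usable cohort data exist, in which case the
    10-year default is applied.
    """
    if latency_range is not None:
        shortest, longest = latency_range
        latency = (shortest + longest) / 2.0   # midpoint of the latency range
    else:
        latency = default_latency
    target_year = incidence_year - latency
    # Closest available year; ties go to the earlier survey (an assumption).
    return min(available_years, key=lambda y: (abs(y - target_year), y))


# Hypothetical survey years for one exposure.
surveys = [1994, 2001, 2005]
year_with_cohort_data = select_prevalence_year(2012, (8, 16), surveys)
year_with_default = select_prevalence_year(2012, None, surveys)
```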
To estimate the attributable burden of cancer due to past exposures in Canada, we developed a hierarchy to select prevalence data from Canadian national and region-specific data sources, where available. For lifestyle exposures we considered data from large Canadian cohort studies when data from national population-based surveys were not available. For several environmental exposures, environmental monitoring data from sites in various parts of Canada were used. We collected exposure prevalence data overall and, where the data allowed, by sex, age and province.
Cancer incidence data
We obtained cancer incidence data for those 18 years of age and older from the Canadian Cancer Registry (CCR), a national registry of cancer cases covering the entire population of Canada, including by province and territory. Statistics Canada produces annual data quality reports for the CCR and each Canadian province and territory has a legislated responsibility for cancer collection and control, which improves the completeness and population coverage of the data.37 We obtained data by province, sex and 5-year age group for 2012, the most recent year of national data available at the time of the study (except for Quebec data, which were extrapolated from 2010). Cancer cases were coded in the CCR using the International Classification of Diseases for Oncology, 3rd Edition. Cancer mortality was not considered in this study as we were interested in cancer prevention through changes in behaviours and exposures. Furthermore, the inclusion of survival requires an additional set of modelling assumptions related to survival across exposure groups, where the evidence base is far less developed.
Estimation of population attributable cancers—including uncertainty
The PAR estimation methods employed for the individual exposures in the ComPARe Study are presented in table 1. Since 95% CIs cannot be easily calculated for PARs,38 we used Monte Carlo simulation to estimate them. Relative risk (RR) values were drawn from a log-normal distribution derived from the RR and its associated variance, which was estimated from the 95% CI of the RR. Prevalence values were drawn from a binomial distribution with parameter n as the number of survey participants and parameter p as the prevalence of exposure estimated from the survey. We simulated 10 000 samples and used the 2.5th and 97.5th percentiles of the resulting PAR distribution as the lower and upper limits of its 95% CI.39,40
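The simulation described above can be sketched with the Python standard library alone. The sketch assumes a single dichotomous exposure evaluated with Levin's formula, recovers the standard error of log(RR) from the width of the reported 95% CI, and uses simple order statistics as approximate percentiles. Function names, the seed and the example inputs are hypothetical.

```python
import math
import random


def levin_par(prevalence, rr):
    """Levin's PAR for a dichotomous exposure."""
    excess = prevalence * (rr - 1.0)
    return excess / (1.0 + excess)


def draw_prevalence(n, p, rng):
    """One binomial(n, p) draw, returned as a proportion of n participants."""
    exposed = sum(1 for _ in range(n) if rng.random() < p)
    return exposed / n


def monte_carlo_par_ci(rr, rr_lower, rr_upper, n, p, n_sims=10_000, seed=1):
    """Approximate 95% CI for the PAR from simulated RR and prevalence values."""
    rng = random.Random(seed)
    mu = math.log(rr)
    # SE of log(RR), assuming the reported CI is symmetric on the log scale.
    sigma = (math.log(rr_upper) - math.log(rr_lower)) / (2 * 1.96)
    sims = []
    for _ in range(n_sims):
        rr_draw = rng.lognormvariate(mu, sigma)
        p_draw = draw_prevalence(n, p, rng)
        sims.append(levin_par(p_draw, rr_draw))
    sims.sort()
    # 2.5th and 97.5th percentiles taken as order statistics.
    return sims[int(0.025 * n_sims)], sims[int(0.975 * n_sims) - 1]
```

A call such as `monte_carlo_par_ci(1.5, 1.2, 1.9, n=2000, p=0.2)` returns the lower and upper limits for a hypothetical exposure with RR 1.5 (95% CI 1.2 to 1.9) and 20% prevalence in a survey of 2000 participants. The pure-Python binomial draw is slow for very large surveys; a vectorised sampler would be the natural substitute in practice.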
Estimation of avoidable cancers
Exposure prevalence data
To estimate the future avoidable cancer burden to 2042, it is necessary to project exposure prevalence (eg, to 2032 if a 10-year latency period is used). We used the exposure prevalence data hierarchy outlined above to identify the optimal exposure prevalence data. For these data, we focused on sources with longitudinal surveys. For exposures where historical data allowed past trends to be observed, one of several approaches to modelling future prevalence was used. These included linear, logistic growth, multinomial logistic regression and exponential curves to predict the future proportion of the population exposed. Prevalence estimates were projected by sex and by level of exposure. Models were selected based on expert visual evaluation of the fit to past data trends and by avoiding extreme projection scenarios that might have arisen because of overly influential data points. The different approaches to modelling future prevalence reflect different potential scenarios. Logistic growth assumes that the prevalence of exposure will reach a future steady state, while multinomial logistic regression predicts that the observed past trend in exposure will continue relatively unchanged into the future. Exponential and logarithmic curves are a compromise between the logistic and multinomial approaches, and involve an assumption that the past trend will continue, but at a slower pace. We projected exposure data for the combined population and for males and females separately, for both national and provincial estimates, where the data allowed.
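To illustrate how the choice of curve changes a long-range projection, the sketch below fits two of the simpler forms mentioned above, a linear trend and an exponential (log-linear) trend, by ordinary least squares and projects both to 2032. The data are invented, the helper names are hypothetical, and the logistic growth and multinomial logistic regression models are not shown.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def project_linear(years, prevalence, target_year):
    """Linear trend, truncated to the valid range [0, 1]."""
    intercept, slope = fit_line(years, prevalence)
    return min(1.0, max(0.0, intercept + slope * target_year))


def project_exponential(years, prevalence, target_year):
    """Exponential trend: a straight line fitted to log(prevalence)."""
    intercept, slope = fit_line(years, [math.log(p) for p in prevalence])
    return min(1.0, math.exp(intercept + slope * target_year))


# Hypothetical survey cycles showing a steady decline in prevalence.
years = [2000, 2005, 2010, 2015]
prevalence = [0.24, 0.21, 0.18, 0.15]

linear_2032 = project_linear(years, prevalence, 2032)
exponential_2032 = project_exponential(years, prevalence, 2032)
```

With these invented data the exponential curve gives the higher 2032 prevalence, because it lets the decline continue at a progressively slower absolute pace rather than at a constant one.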
Cancer incidence projections
Cancer incidence frequencies and rates were projected by extrapolating past trends using various statistical models. In the past, trends over age at diagnosis, year of diagnosis (period) and/or year of birth (cohort) as well as hybrids of these models have been used. More recently, the age-period-cohort41 and the age-drift-period-cohort (Nordpred)42 models have been widely used. For the ComPARe Study, the R package ‘Canproj’43 was used to project cancer incidence from 2012 to 2042. The package projects forward to a maximum of 30 years, which suited our needs, based on the uncertainty surrounding cancer sites for which secondary or primary prevention interventions were being scaled up (eg, colorectal, breast, lung and cervical cancers) or reduced (eg, prostate cancer).
Canproj combines cancer projection methods that have been used in the last 30 years to select the best-fitting model for the data, using a decision algorithm to identify the most appropriate projection (online supplementary figure 1). The models available in Canproj include age-only, age-period (including common trend and age-specific trend), age-cohort and Nordpred42 (age-drift-period-cohort; a negative binomial distribution may replace the Poisson distribution when over-dispersion appears). All models provide projected age-specific incidence rates and counts. Through the decision algorithm, the Canproj methods produce more realistic projection estimates than other approaches, such as the Poisson regression method,44 the polynomial regression and natural spline methods,45 the joinpoint method46 and the Bayesian Markov chain Monte Carlo methods,47 by taking advantage of specific aspects of all these methods to fit the best model. We evaluated all the findings, independently of goodness-of-fit, to inspect the face validity of the projections.
Defining counterfactual scenarios
Within our avoidable cancers (PIF) framework we examined a range of exposure prevalence reduction scenarios or counterfactuals. Our primary counterfactuals were based on population-based interventions that have been shown to be beneficial in experimental studies, and which could be scaled up to the population level. We conducted a systematic literature search of interventions for each exposure and identified their effects from reviews, meta-analyses or large intervention (individual and/or community level) trials. For all exposures, we also included models with fixed prevalence reductions of 10%, 25%, 50% and 100% for every year between 2018 and 2042.
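The fixed-reduction scenarios can be expressed compactly for a single dichotomous exposure. In the sketch below the projected case count, prevalence and RR are hypothetical, the function names are illustrative, and latency is ignored, so the result is the long-run effect of each scenario rather than its effect in any particular year.

```python
def pif(prevalence, counterfactual_prevalence, rr):
    """PIF for a dichotomous exposure whose prevalence shifts from p to p*."""
    shift = (prevalence - counterfactual_prevalence) * (rr - 1.0)
    return shift / (1.0 + prevalence * (rr - 1.0))


# Relative reductions in prevalence named in the text.
REDUCTIONS = (0.10, 0.25, 0.50, 1.00)


def avoidable_cases(projected_cases, prevalence, rr, reduction):
    """Cases avoided if prevalence were lowered by the given relative reduction."""
    counterfactual = prevalence * (1.0 - reduction)
    return projected_cases * pif(prevalence, counterfactual, rr)


# Hypothetical stratum: 5000 projected cases, 30% exposed, RR = 1.5.
scenario_results = {
    reduction: avoidable_cases(5000, 0.30, 1.5, reduction)
    for reduction in REDUCTIONS
}
```

Under this simple form the avoidable cases scale linearly with the size of the reduction, and the 100% scenario equals the number of cases given by the PAR.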
Potential impact fraction estimation: defining latency of interventions
Using projected exposure prevalence, cancer incidence and a range of counterfactual scenarios, we then estimated the proportions and numbers of avoidable cancers in Canada from 2018 to 2042. To present these results, we plotted the number of projected cancers under the baseline projection scenario (if no change in exposure prevalence were to occur), followed by the incidence estimated under a range of counterfactual scenarios.
To evaluate the assumed fixed latency period, we conducted sensitivity analyses using alternative assumptions for the statistical distribution of latency periods, including the uniform, modified Weibull and binomial distributions. These alternative distributions were each chosen to have a mean of 10 years and range from 0 to 15 years. Incorporating a distribution of latency periods into PIF estimation allowed us to better predict the transitional effect of counterfactual interventions.
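One simple way to see the transitional effect is to scale the long-run PIF by the share of the latency distribution that has elapsed since the intervention began. The sketch below assumes a one-time, sustained reduction in prevalence starting in 2018; the discrete latency weights are hypothetical and are not the uniform, Weibull or binomial distributions used in the study.

```python
def realised_fraction(years_elapsed, latency_weights):
    """Share of the full effect realised: cumulative weight of lags <= elapsed."""
    return sum(w for lag, w in latency_weights.items() if lag <= years_elapsed)


def transitional_pif(full_pif, start_year, year, latency_weights):
    """PIF in a given year after an intervention that starts in start_year."""
    return full_pif * realised_fraction(year - start_year, latency_weights)


# Fixed 10-year latency versus a hypothetical spread with the same 10-year mean.
fixed_latency = {10: 1.0}
spread_latency = {5: 0.25, 10: 0.50, 15: 0.25}

full_pif = 0.08  # hypothetical long-run PIF for one scenario
pif_2023_fixed = transitional_pif(full_pif, 2018, 2023, fixed_latency)
pif_2023_spread = transitional_pif(full_pif, 2018, 2023, spread_latency)
pif_2033_spread = transitional_pif(full_pif, 2018, 2033, spread_latency)
```

With a fixed latency the full effect appears abruptly after 10 years, whereas the spread distribution produces a partial effect from 5 years onward and the full effect only after 15 years.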
Consideration of multiple risk factors and joint effects
As with other burden estimation efforts, our primary analyses focused on the attributable and avoidable proportions and numbers of cancers related to individual exposures separately. This approach is an oversimplification because several exposures are known to have joint effects or interactions on cancer risk. Well-characterised examples include alcohol and tobacco for various cancer sites,48 and overweight or obesity and physical inactivity for colorectal cancer.49 Where possible, we have also estimated the impact of multiple risk factors for a series of scenarios where the scientific literature has suggested the existence of combined or synergistic effects. When exposures are strongly associated and/or their interaction on cancer risk departs from multiplicative risk, Levin’s formula to estimate PAR of individual risk factors must be used with caution. To combine PARs across exposures, we used the Miettinen-Steenland approach for any combined or ‘summary’ estimates.
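A widely used way to combine exposure-specific PARs into a summary estimate is the product-complement formula, 1 minus the product of (1 minus each PAR), which assumes that the exposures are independent and act multiplicatively on risk. The sketch below shows that formula only; whether it corresponds exactly to the form of the Miettinen-Steenland approach applied in ComPARe is an assumption, and the input values are hypothetical.

```python
def combined_par(pars):
    """Summary PAR as 1 - prod(1 - PAR_i), assuming independent exposures."""
    remaining = 1.0
    for par in pars:
        remaining *= 1.0 - par   # fraction of cases not yet attributed
    return 1.0 - remaining


# Hypothetical PARs for two exposures acting on the same cancer site.
summary = combined_par([0.20, 0.10])
naive_sum = 0.20 + 0.10
```

The summary estimate is smaller than the naive sum because a case attributed to one exposure cannot be prevented a second time by removing another, which also keeps the combined value below 100%.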
Our sensitivity analyses sought to characterise potential bias in the available prevalence and risk data. Since we relied on data from self-report questionnaires for some exposures, such as alcohol, physical activity and body weight, we expected a certain degree of misreporting. In our sensitivity analyses, we corrected the reported prevalence using sex-specific correction factors derived from studies that had validated the survey data against objective measurements in small samples. Some exposures had considerable (>10%) non-response rates (ie, responded ‘don’t know’ or ‘refuse to answer’), and for these cases in our main analysis, we assumed that non-responders had been unexposed to the risk factors in question. In the sensitivity analyses, we imputed exposure values using both missing-at-random and missing-not-at-random assumptions. For the missing-at-random scenario, we assumed that non-response was unrelated to the exposure status, and hence that the exposure distribution among non-responders was identical to that of responders. For the missing-not-at-random scenario, we assumed that the non-responders were all exposed, and that their exposure distribution was identical to that of the exposed survey responders.
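For a dichotomous exposure, the three treatments of non-response described above lead to three different prevalence estimates. The sketch below computes them from hypothetical survey counts; the function and key names are illustrative, and multi-level exposure distributions are not covered.

```python
def prevalence_scenarios(exposed, unexposed, non_response):
    """Prevalence under the main analysis and the two sensitivity scenarios."""
    total = exposed + unexposed + non_response
    responders = exposed + unexposed
    return {
        # Main analysis: non-responders treated as unexposed.
        "main_non_responders_unexposed": exposed / total,
        # Missing at random: non-responders resemble responders.
        "missing_at_random": exposed / responders,
        # Missing not at random: all non-responders treated as exposed.
        "missing_not_at_random_all_exposed": (exposed + non_response) / total,
    }


# Hypothetical survey: 200 exposed, 650 unexposed, 150 non-responders (15%).
scenarios = prevalence_scenarios(200, 650, 150)
```

The main-analysis and all-exposed values bracket the missing-at-random value, so together they give a lower and upper bound on the prevalence that feeds into the PAR.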
Patient and public involvement
No patients or members of the public were involved in this study protocol.
In the ComPARe Study, we developed approaches for each step of data collection, analysis, uncertainty estimation and sensitivity analyses in order to arrive at plausible PAR estimates for cancer incidence. Furthermore, this approach provides a methodologically rigorous framework for long-term projections of cancer burden and the relative impacts of different population-based interventions for cancer prevention. As new cancer risk factor prevention strategies are developed, their subsequent impact on the future cancer burden can easily be integrated into this project for a comparative analysis of intervention strategies.
The estimates from this project will be relevant to a broad audience, ranging from those working in cancer prevention and more broadly in health promotion, to cancer advocacy groups, public health and healthcare planners, health policymakers, clinicians and the public to inform priority setting in prevention programming and resources; allocation of funding to areas of unmet need; and so on. We have developed this project in collaboration with the Canadian Cancer Society (CCS), our knowledge translation partner. As a primary end-user of the data generated from this project, CCS’s input into the design and desired output of the project has been invaluable. We encourage other groups to plan knowledge translation via similar partnership arrangements from the initiation phase of the project.
During this project we encountered several methodological components that were comparatively under-developed. For example, while several groups have conducted large attributable risk estimation projects, few, if any, have systematically assessed the impact of multiple risk factors. Our examination of approaches for multiple risk factors adds to the literature and provides validation of the estimates produced in this project. In addition, we have considered projections of both the exposure prevalence and cancer incidence data with multiple approaches. Previous projects have assumed fixed cancer incidence or exposure prevalence for future projections, and both are unrealistic. Furthermore, in the application of our counterfactual scenarios, we tested and applied several lag time models to fit the most likely windows of exposure and their associated subsequent changes in cancer incidence. In addition, we have worked in collaboration with key knowledge end-users to develop counterfactual scenarios that best match realistic expectations for cancer prevention programmes.
Our framework, while building on previous approaches, has a number of limitations. Long-term projections of exposure prevalence and cancer incidence are statistically challenging and involve a great deal of uncertainty. Although we have strived to identify the highest quality exposure prevalence and cancer incidence datasets, and used methodologically sound approaches for modelling, our results still need to be interpreted with caution. The resulting projections are a direct product of the validity of the input data on exposure prevalence and associated relative risks. Using data of poor quality or having questionable validity may result in erroneous projections. For this reason, we included population-based, nationally representative surveys to estimate exposure prevalence when they were available. Many of our exposure measures, particularly for the lifestyle risk and protective factors, were based on self-reported data. Where possible, we modelled the potential impact of reporting biases on our estimates and included analyses focused on directly measured exposures.
For several infectious agents, including Epstein-Barr virus, H. pylori and human papillomavirus, large population-based estimates of prevalence were not available for Canada. In these instances, we included case series, case–control and cohort studies, as well as population-based surveys, from populations in the USA or, if these were not available, Western Europe. The use of a more sensitive assay for the detection of H. pylori has substantially increased the proportion of non-cardia gastric cancers attributable to this infectious agent.50 To account for the new gold standard, the included studies will be corrected for measurement error.
In terms of cancer incidence projections, we relied on the Canproj programme,43 which uses age-period-cohort models and the extension of the Nordpred model that has been widely used by other research groups for long-term projections of cancer incidence. However, errors in estimates are inevitable when projecting 30 years into the future as the models do not account for future changes in risk factors (eg, population changes in smoking patterns and diet). In addition, to deal with some of the uncertainty inherent in projections, expert opinion was used when the projection model selected by Canproj was implausible, which introduces some degree of bias to the decisions.
The CCR is a high-quality database with good case ascertainment of malignant tumours. Very few incident cancer cases are missed in the CCR and therefore any bias would be minimal and would not affect our results.37 However, data for the province of Quebec were extrapolated from 2010, as data for 2012 were not available, which is a limitation for the national counts. Ethnicity was not taken into account in these estimates for various reasons. Unlike other national cancer registries, the CCR does not provide incidence data by ethnicity. Canada is not a populous country and stratifying cancer incidence by sex, age and ethnicity would lead to few observations. Furthermore, ethnicity-specific risk estimates and prevalence data are not available at this time. However, for ultraviolet radiation (UVR) exposure, ethnicity was taken into account, as there is a strong interaction between UVR and ethnicity.
We have described a methodological framework for attributable risk estimation and cancer projection that extends our previous research in PAR and PIFs. The application of this framework will provide estimates of both current attributable and future avoidable disease risk in Canada. These findings will be of use to those working in cancer prevention, public health and healthcare planners, health policymakers, healthcare providers and the general public for a wide range of applications in cancer control and prevention.
Contributors DRB, CMF, SDW, WDK, ELF, PAD, PJV, RN and PD were responsible for the study conception. DRB, AEP, SDW, WDK, ELF, PAD, PJV, YR, FK, XG, RN, LS, PD, KV, DO’S, PH and CMF contributed substantially to the study design. DRB, AEP and YR drafted the manuscript. DRB, AEP, SDW, WDK, ELF, PAD, PJV, YR, FK, XG, RN, LS, PD, KV, DO’S, PH and CMF revised the draft paper, gave final approval of this version and agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Funding This research is supported by a Canadian Cancer Society Research Institute Partner Prevention Research Grant (#703106).
Competing interests None declared.
Patient consent Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Please contact the corresponding author for inquiries related to the data resulting from this work.
Collaborators Elizabeth Holmes (Canadian Cancer Society, Toronto, Ontario, Canada), Zeinab El-Masri (Cancer Care Ontario, Toronto, Ontario, Canada), Mariam El-Zein, Sheila Bouten (Department of Oncology, McGill University, Montréal, Québec, Canada), Tasha Narain, Priyanka Gogna (Department of Public Health Sciences, Queen’s University, Kingston, Ontario, Canada).