Statistical projection methods for lung cancer incidence and mortality: a systematic review
  1. Xue Qin Yu1,2,
  2. Qingwei Luo1,
  3. Suzanne Hughes1,
  4. Stephen Wade1,
  5. Michael Caruana1,
  6. Karen Canfell1,2,3,
  7. Dianne L O'Connell1,2,4
  1. 1Cancer Research Division, Cancer Council NSW, Sydney, New South Wales, Australia
  2. 2The University of Sydney School of Public Health, Faculty of Medicine and Health, The University of Sydney, Sydney, New South Wales, Australia
  3. 3Prince of Wales Clinical School, University of New South Wales, Sydney, New South Wales, Australia
  4. 4School of Medicine and Public Health, University of Newcastle, Newcastle, New South Wales, Australia
  1. Correspondence to Dr Qingwei Luo; qingweil{at}


Abstract

Objectives To identify and summarise all studies using statistical methods to project lung cancer incidence or mortality rates more than 5 years into the future.

Study type Systematic review.

Methods We performed a systematic literature search in multiple electronic databases to identify studies published from 1 January 1988 to 14 August 2018, which used statistical methods to project lung cancer incidence and/or mortality rates. Reference lists of relevant articles were checked for additional potentially relevant articles. We developed an organisational framework to classify methods into groups according to the type of data and the statistical models used. Included studies were critically appraised using prespecified criteria.

Results One hundred and one studies met the inclusion criteria; six studies used more than one statistical method. The number of studies reporting statistical projections for lung cancer increased substantially over time. Eighty-eight studies used projection methods that did not incorporate data on smoking in the population, and 16 studies used a method that did incorporate data on smoking. Age–period–cohort models (44 studies) were the most commonly used methods, followed by other generalised linear models (35 studies). The majority of models were developed using observed rates for more than 10 years and used data that were considered to be of good quality. A quarter of studies provided comparisons of fitted and observed rates. While validation by withholding the most recent observed data from the model and then comparing the projected and observed rates for the most recent period provides important information on the model’s performance, only 12 studies reported doing this.

Conclusion This systematic review provides an up-to-date summary of the statistical methods used in published lung cancer incidence or mortality projections. The assessment of the strengths of existing methods will help researchers to better apply and develop statistical methods for projecting lung cancer rates. Some of the common methods described in this review can be applied to the projection of rates for other cancer types or other non-infectious diseases.

  • statistical projections
  • statistical modelling
  • lung cancer
  • tobacco smoking
  • systematic review

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Strengths and limitations of this study

  • This is the first systematic review summarising statistical methods used in projecting lung cancer incidence or mortality rates over the past three decades.

  • The review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.

  • Using predefined assessment criteria and a standardised data extraction form resulted in a high level of agreement in the data extractions performed by two independent reviewers.

  • The review provided theoretical and practical information, including a comprehensive summary of the methods and relevant software.

  • Meta-analysis was not possible due to the wide variation in study populations and time periods used in the projections.


Introduction

Lung cancer has been the most commonly diagnosed cancer in the world for several decades and is the leading cause of cancer deaths worldwide, accounting for nearly 20% of all cancer deaths.1 Reliable projections of future patterns of lung cancer incidence and mortality are, therefore, important for planning health service requirements and managing healthcare resources.2 3 Given the well-documented association between tobacco smoking and lung cancer risk,4 5 projections of lung cancer incidence and mortality are also important for evaluating the effectiveness of existing tobacco control programmes and for projecting the potential impact of new evidence-based tobacco control strategies.2 6 7 A variety of statistical methods for projecting cancer incidence or mortality rates have been developed and reported in the literature.2 These methods range from assuming that the current rate remains unchanged into the future to more complex statistical models of past trends, such as age–period–cohort (APC) models, which may involve a range of assumptions, software and techniques.

Projecting future cancer incidence and mortality trends is always a complex exercise because risk factor profiles change over time and there is a long latency period between risk factor exposure and the development of some cancers.8 For lung cancer in particular, projections can be inaccurate if changes in past smoking behaviour are not accurately taken into account.2 3 Unfortunately, data on smoking behaviour are not always available with the requisite level of detail (eg, sex- and age-specific data), so choosing and implementing an appropriate projection method largely depends on data availability and the purpose of the projections.8 Given the complexity involved in such projections, information on the available statistical methods, the utilisation of these methods and further developments in this area is of particular interest to researchers working in this field. However, while some of these methods have been reviewed and evaluated,8–11 to our knowledge there are currently no published systematic reviews of all statistical methods available for projecting lung cancer incidence or mortality rates.

Therefore, we carried out a methodological systematic review to identify and summarise published population-based studies that used statistical methods to project lung cancer incidence or mortality rates over the long term (ie, more than 5 years). The aim was to provide up-to-date and comprehensive information on the statistical methods available for projecting lung cancer rates. In doing so, we intended to give readers an understanding of these statistical methods, the availability of software to implement them and their use in different circumstances, and to highlight the differences and similarities between methods.


Methods

This systematic review adhered to the checklist presented in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses.12 A protocol was developed for this review and is included as online supplementary resource 1.

Patient and public involvement

As this was a systematic review of statistical methods used to obtain lung cancer rate projections, no patients or public were involved.

Literature search

In August 2016, Embase, Medline and PreMEDLINE databases were searched using text terms and, where available, database-specific subject headings, for studies published since 1988, which used statistical methods to project lung cancer incidence and/or mortality. Searches for lung cancer-related terms were combined with searches for terms related to projection, forecasting and statistical models. Reference lists of relevant articles were checked for additional potentially relevant articles. In August 2018, Embase and Medline, including Epub Ahead of Print, In-Process and other Non-Indexed Citations databases, were searched for studies published from 2016 onwards using an updated search strategy, which aimed to capture all newly published articles. A complete list of the terms used is included in online supplementary resource 2.

Selection criteria

Full inclusion and exclusion criteria are listed in table 1. Studies were included if they used a statistical method to project lung cancer incidence and/or mortality over a period greater than 5 years using population-based data and were published in English from 1 January 1988 to 14 August 2018. ‘Statistical method’ was defined as a method that analyses the observed data using traditional regression, correlation or other statistical summaries. ‘Projection’ was defined as the use of all or part of the observed data to forecast lung cancer incidence or mortality rates beyond the time period covered by the data included in the statistical models. Mathematical models, which generate outcomes based on a proposed theoretical model of the disease’s natural history, were not included in this review.

Table 1

Inclusion and exclusion criteria employed

Application of selection criteria

The literature search and the review followed the stages described in figure 1. After removing duplicates, 1878 studies were retained for screening. One author (SH) screened the titles and abstracts against the inclusion criteria to exclude articles that were clearly irrelevant. The main reason for exclusion at the screening stage was that the studies did not report on lung cancer incidence or mortality. Others were excluded because they used mathematical models or were not population-based studies. Further studies were excluded because they were editorial commentaries or literature reviews. After the screening process, a total of 166 studies were eligible for full-text review.

Figure 1

PRISMA flow chart of study selection process. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.

Full-text articles were independently reviewed and assessed for inclusion by two authors (XQY and QL) and a total of 101 studies were retained for final inclusion (92% agreement). Disagreements were discussed and if an agreement could not be reached the study was assessed for inclusion by a third reviewer (DLO). Excluded studies and the reasons for exclusion are listed in online supplementary resource 3. The main reasons for exclusion of studies at this stage were that they did not report lung cancer rates separately or the projections were for fewer than 6 years.

Critical appraisal

As the purpose of this methodological review was to provide an overview of statistical methods, and the projections of lung cancer rates were conducted in different populations and over different time periods, no meta-analysis was possible and specific projection results were not compared or analysed in this review. Therefore, the risk of bias evaluation of the included studies was not applicable.

The methodological quality of the studies was independently assessed by two reviewers using prespecified criteria (table 2): quality of the data source, length of the period covered by the observed data, availability of software information, model fitting and validation. Validation provides information on the performance and reliability of the projection model and can be undertaken by withholding the most recent observed data from the model fitting and then comparing the projected rates for those years with the actual observed values.7 The use of scales for assessing study quality is discouraged in Cochrane reviews13 and meta-analyses,14 because the calculation of an overall score inevitably involves assigning (often arbitrary) weights to the quality criteria being assessed. It is difficult to justify the weights used, and it has been shown that the overall quality score is not a reliable assessment of a study’s validity.13 Moreover, each method included in this review has its own merits and limitations and, depending on specific circumstances, may be more or less reliable or relevant. Therefore, an overall score for the methodological quality of each study was not provided.
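A minimal sketch of this hold-out validation is given below in Python. It assumes that annual rates are held in a dictionary keyed by calendar year and that a projection method can be expressed as a function taking the fitting data and a list of future years; the function names and the toy constant-rate projector are hypothetical and are not taken from any included study.

```python
from typing import Callable, Dict, List

# A projector takes {year: rate} for the fitting period and a list of future
# years, and returns {year: projected rate}. (Hypothetical interface.)
Projector = Callable[[Dict[int, float], List[int]], Dict[int, float]]


def holdout_validate(rates: Dict[int, float], n_holdout: int,
                     project: Projector) -> Dict[str, float]:
    """Fit on all but the last n_holdout years, then score the withheld years."""
    years = sorted(rates)
    if not 0 < n_holdout < len(years):
        raise ValueError("n_holdout must leave at least one fitting year")
    fit_years, test_years = years[:-n_holdout], years[-n_holdout:]
    projected = project({y: rates[y] for y in fit_years}, test_years)
    # Relative error (projected - observed) / observed for each withheld year
    rel_err = [(projected[y] - rates[y]) / rates[y] for y in test_years]
    return {
        "mean_abs_rel_error": sum(abs(e) for e in rel_err) / len(rel_err),
        # Signed mean: positive values mean the method over-projected
        "mean_rel_error": sum(rel_err) / len(rel_err),
    }


def constant_rate(fit: Dict[int, float], future: List[int]) -> Dict[int, float]:
    """Hypothetical toy projector: carry the last observed rate forward."""
    last = fit[max(fit)]
    return {y: last for y in future}


if __name__ == "__main__":
    observed = {2000: 50.0, 2001: 48.0, 2002: 46.0, 2003: 44.0, 2004: 42.0}
    print(holdout_validate(observed, n_holdout=2, project=constant_rate))
```

With the toy data, the last fitted rate (46.0) is carried forward and compared with the withheld observations for 2003 and 2004; a positive signed error indicates over-projection. Any of the methods reviewed could, in principle, be substituted for `constant_rate`.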

Table 2

Prespecified criteria for assessing studies included in this review

Data extraction

For each included study, two reviewers (XQY and QL) independently extracted details including data sources, study population, year of publication, observed data period used for the projections, statistical methods and software used, and whether the method incorporated information about smoking, the main risk factor for lung cancer. The extracted data were collected using a standardised form (see online supplementary resource 4), which was pilot tested using 10 studies. Any differences between the two reviewers were discussed, and when agreement could not be reached the studies were assessed by a third reviewer (DLO). The overall agreement between the two reviewers was 91.6%.

The selection of an appropriate statistical method for projecting cancer rates is largely restricted by the quality and availability of cancer data, which are generally better in more developed countries.15 The Human Development Index (HDI), developed by the United Nations,16 is a summary measure of life expectancy, education and gross domestic product per head of population. We, therefore, recorded the HDI ranking for each of the study populations, so that we could describe the distribution of projection methods used according to the country’s level of development.

Classification of statistical methods

To summarise the differences and similarities between the methods reported, we developed an organisational framework to classify methods into groups according to both the type of observed data used and the statistical models reported (figure 2). As tobacco exposure is well known to be the most significant risk factor for lung cancer4 and can be used as an important predictor of lung cancer incidence and mortality, we first divided the studies into two large categories according to whether or not they included data on smoking in the projection method. For each category, we then subdivided studies into groups according to the projection method used. Methods not incorporating data on smoking in the population were grouped as: (1) APC models, a special form of generalised linear model (GLM) that includes age, period and cohort components; (2) other GLMs, in which the logarithm of the expected number of cases (deaths) was modelled as a linear or non-linear function of the explanatory factors, with the logarithm of the population size as an offset and a Poisson or negative binomial distribution; and (3) the present state method (eg, assuming that future age-specific rates will be the same as the most recent observed rates, or assuming a constant annual rate of change as observed in a selected time period). Methods incorporating data on smoking were grouped into: (1) GLMs with a smoking variable as one of the covariates; (2) APC models that included an effect for smoking; (3) projections adjusted for the smoking attributable fraction (SAF); and (4) other methods (including all methods that do not use detailed historical cancer data or do not include detailed data on smoking). More detailed descriptions of each of these methods are provided in online supplementary resource 5.
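To make the ‘other GLMs’ group concrete, the following Python sketch fits the simplest such model: a Poisson log-linear model for a single age group with a linear calendar-year term and the logarithm of population size as an offset, estimated by Newton-Raphson. It is an illustrative example under stated assumptions (one age group, no cohort term, Poisson rather than negative binomial errors, hypothetical function names), not the specification used by any particular study.

```python
import math
from typing import List, Tuple


def fit_poisson_trend(years: List[int], cases: List[float], pop: List[float],
                      n_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float, float]:
    """Fit log(E[cases]) = log(pop) + b0 + b1 * (year - centre) by Newton-Raphson.

    log(pop) is an offset (its coefficient is fixed at 1). Returns (b0, b1, centre).
    """
    centre = sum(years) / len(years)               # centring improves stability
    t = [y - centre for y in years]
    b0, b1 = math.log(sum(cases) / sum(pop)), 0.0  # start from the crude rate
    for _ in range(n_iter):
        mu = [n * math.exp(b0 + b1 * ti) for n, ti in zip(pop, t)]
        # Score vector (u0, u1) of the Poisson log-likelihood
        u0 = sum(y - m for y, m in zip(cases, mu))
        u1 = sum(ti * (y - m) for ti, y, m in zip(t, cases, mu))
        # Fisher information matrix [[i00, i01], [i01, i11]]
        i00 = sum(mu)
        i01 = sum(ti * m for ti, m in zip(t, mu))
        i11 = sum(ti * ti * m for ti, m in zip(t, mu))
        det = i00 * i11 - i01 * i01
        # Newton step = inverse(information) x score, written out for 2 x 2
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1, centre


def project_cases(b0: float, b1: float, centre: float,
                  future_years: List[int], future_pop: List[float]) -> List[float]:
    """Expected cases = projected population x extrapolated rate."""
    return [n * math.exp(b0 + b1 * (y - centre))
            for y, n in zip(future_years, future_pop)]
```

Because log(population) enters as an offset with its coefficient fixed at 1, the fitted coefficients describe the rate rather than the count; projected counts then follow by multiplying the extrapolated rate by projected population estimates.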

Figure 2

Organisational framework to categorise methods for lung cancer mortality projections. APC, age–period–cohort; GLM, generalised linear model; SAF, smoking attributable fraction.


Results

A total of 101 eligible studies were included (table 3). All these studies are ecological studies that used single-year or 5-year aggregated population incidence or mortality data, or were based on cancer rates reported in the literature. Table 4 shows the study characteristics grouped according to the method used for the projections. Eighty-eight studies used projection methods not incorporating data on smoking,1 2 9 17–101 while 16 studies used a method incorporating data on smoking,3 7 33 41 42 102–112 and six studies used multiple methods.18 33 36 41 42 62 Overall, APC models were the most commonly used method to project lung cancer rates (44 studies used this method),2 9 17–58 and other GLMs were the next most commonly used (35 studies).18 36 59–89 100 101 Only 12 studies used the present state method, assuming that the average cancer rates in the most recent years would remain constant into the future.1 62 90–99 Of the 16 studies incorporating data on smoking, eight directly used GLMs with a variable reflecting detailed historical smoking-related behaviour as one of the covariates.3 7 33 103 106 108 111 112 These variables included number of cigarettes consumed and average tar content,3 7 33 smoking prevalence,111 number of years of smoking106 112 and smoking intensity.103 108 Two studies used APC models and predefined coefficients based on recent trends in smoking prevalence and tar content to adjust the estimates for the period parameter.41 42 Two studies made projections adjusted for the SAF, which required limited data on smoking behaviour,102 107 and the remaining four studies used other methods, which required limited data on both cancer rates and smoking behaviour.104 105 109 110

Table 3

Summary of included studies

Table 4

Summary of study characteristics grouped according to projection method used

The majority of models were developed using more than 10 years of observed data that were considered to be of good quality, that is, incidence data included in the Cancer Incidence in Five Continents series,15 or mortality data from a source considered by WHO to have high population coverage.113 Most studies provided projections for 10 years or more, and the proportion of studies providing projections for more than 19 years was higher for studies using methods incorporating data on smoking (50.0%) than for studies using methods that did not incorporate smoking patterns (18.2%). Only 25.7% of the studies provided comparisons of fitted and observed rates, and 11.9% reported validation of the projection model using observed data.

The numbers of studies by publication period and by the country’s HDI rank are presented in figure 3. The number of publications increased substantially over time, especially the number of studies using APC models, which more than tripled in the most recent period (2008–2018) compared with 1998–2007. The majority of the articles included in this systematic review used data from countries with very high or high HDI, including studies from the USA, Europe and Australia; 16 studies used data from countries in the medium or low HDI groups, including studies from China and India; and 22 studies used data from multiple countries.

Figure 3

Studies included by year of publication, 1988–2018, and level of human development of the country providing the data, stratified by method. *Six studies used more than one method, and 22 studies used data from multiple countries. HDI, Human Development Index.

The statistical software packages used, by method and year of publication, are shown in figure 4. Among the studies using APC models, the most commonly used software package was Nordpred (R package developed by Harald Fekjær and Bjørn Møller, Cancer Registry of Norway),10 38 and most of these studies were published in recent years. GLIM (Oxford, UK)114 was the second most commonly used software for APC modelling, but it was mainly used in the earlier years, with the latest study using it published in 2000.45 Specialised software, namely WinBUGS (Cambridge, UK),115 INLA (R package developed by Rue and Martino, Department of Mathematical Sciences NTNU, Norway)116 or BAMP (Institute of Biomedical Engineering, Imperial College, London, UK),117 was used for studies employing Bayesian methods.2 20 22 25 26 31–33 48 85 Among studies using other GLMs, Joinpoint (National Cancer Institute, USA)118 and Stata119 were the two most commonly used software packages. Most studies using the present state method did not mention which software was used. Nordpred, Stata, Joinpoint, SAS,120 other R packages and WinBUGS were the software programs most commonly used in the recent time period. An overview of these software packages is provided in table 5. Each of these packages has different features, and some are freely available to researchers.

Figure 4

Statistical software packages used by method and year of publication. *Six studies used more than one method, 20 studies used more than one software package. **Others include BMDP, BAMP, S-Plus, S and Can*Trol.

Table 5

Summary of software packages commonly used in 2008–2018


Discussion

This review highlights the scope and diversity of the statistical methods used to project lung cancer rates over the longer term, and summarises the main methods used in studies conducted over the last three decades. These methods range from a basic assumption that the current rate will remain unchanged into the future to more complex statistical models involving a range of different assumptions, statistical techniques and software packages. We found that both lung cancer incidence and mortality projections were commonly based solely on past cancer trends, and only a limited number of studies incorporated smoking data in the projection models, most likely due to the scarcity of data on past smoking behaviour in the population.21 Methods that do not incorporate smoking data are also generalisable to projections for other cancer types. We found that the number of studies reporting statistical projections for lung cancer increased substantially over time, and that the majority of these studies used good quality data from countries with a very high or high HDI.

The three-factor APC model was the most commonly used method for projecting lung cancer rates. This method does not require knowledge of aetiological factors,25 as the period and cohort effects are considered to be surrogates for exposure to a range of risk factors.11 For example, period effects can reflect diagnostic and treatment factors, which lead to changes in disease incidence and survival across all age groups.11 On the other hand, the cohort effect may represent risk factors such as smoking behaviour that change from generation to generation.7 11 106 This method is considered to be appropriate for long-term projections.10 However, because birth cohort is determined exactly by period minus age, the linear components of the age, period and cohort effects are not identifiable, so the linear trends attributable to period and to cohort cannot be distinguished from the data alone. This non-identifiability can be addressed by introducing constraints on the time effects; however, the parameter estimates can be sensitive to the choice of constraint on the period and cohort factors.3 121 In addition, the APC model used in this context generally assumes that current and past trends continue into the future, an assumption that would be questionable if any interventions had a significant impact on cancer rates. Given the latency period between exposure to a carcinogen and the development of some cancers, projections based on past trends may be inaccurate.8 Nonetheless, with the development of strategies to deal with the inherent non-identifiability problem in such models, the APC model has been implemented in various statistical software packages in recent years.10 122–124
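The identification problem can be checked numerically. Writing the model as log(rate) = μ + α_a + β_p + γ_c, where c = p − a indexes birth cohort, adding k·a to every age effect, subtracting k·p from every period effect and adding k·(p − a) to every cohort effect leaves all fitted rates unchanged for any k. The Python sketch below demonstrates this with arbitrary simulated effects; the function names and values are hypothetical, and it is a check of the algebra rather than a fitting routine.

```python
import random
from typing import List, Tuple


def linear_predictor(mu: float, alpha: List[float], beta: List[float],
                     gamma: List[float], a: int, p: int) -> float:
    """log(rate) for age index a and period index p.

    Cohort index c = p - a + (A - 1), so cohorts run from 0 to A + P - 2.
    """
    c = p - a + len(alpha) - 1
    return mu + alpha[a] + beta[p] + gamma[c]


def shift_linear_trend(alpha: List[float], beta: List[float], gamma: List[float],
                       k: float) -> Tuple[List[float], List[float], List[float]]:
    """Move a linear trend of slope k between the age, period and cohort effects."""
    n_age = len(alpha)
    new_alpha = [x + k * a for a, x in enumerate(alpha)]                  # + k*a
    new_beta = [x - k * p for p, x in enumerate(beta)]                    # - k*p
    new_gamma = [x + k * (c - (n_age - 1)) for c, x in enumerate(gamma)]  # + k*(p - a)
    return new_alpha, new_beta, new_gamma


if __name__ == "__main__":
    random.seed(1)
    A, P = 5, 4  # hypothetical numbers of age groups and periods
    alpha = [random.gauss(0, 1) for _ in range(A)]
    beta = [random.gauss(0, 0.2) for _ in range(P)]
    gamma = [random.gauss(0, 0.2) for _ in range(A + P - 1)]
    alpha2, beta2, gamma2 = shift_linear_trend(alpha, beta, gamma, k=0.3)
    worst = max(abs(linear_predictor(-8.0, alpha, beta, gamma, a, p)
                    - linear_predictor(-8.0, alpha2, beta2, gamma2, a, p))
                for a in range(A) for p in range(P))
    print(f"largest change in log(rate): {worst:.1e}")  # zero up to rounding
```

Because the data cannot choose between such parameter sets, an additional constraint (for example, setting two adjacent cohort effects equal) has to be imposed, which is why estimates can depend on the constraint chosen.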

In contrast to the APC model, other methods using GLMs do not include all three time components in the same model, making them less complicated to use. GLMs are more flexible and can be easily implemented using commonly available software, including Stata,119 SAS,120 R and Joinpoint. The interpretation of results from a standard GLM is relatively straightforward, and the model can be extended to incorporate other factors.125 This method has been evaluated using Finnish Cancer Registry data, and it was concluded that the GLM performed reasonably well for short-term (eg, 5-year) projections.125 However, GLMs may not be appropriate for long-term projections (>10 years) as the model does not consider period and cohort effects at the same time. For example, a GLM without a cohort component may not be appropriate for cancer types where significant changes in risk factors have occurred, because cohort-specific effects are not captured in the projections.125 On the other hand, a GLM without a period component will not be able to capture changes in period effects for cancer types affected by screening programmes or by improvements in treatment over time.33 It is recommended that the potential significance of period and cohort effects be examined prior to implementing any projections using GLMs.9

The present state method is the simplest projection method: it projects future numbers of lung cancer cases or deaths by applying the average of the age-specific incidence or mortality rates observed in the most recent years to the projected future age-specific population estimates. The projection is based on the very strong assumption that the rates will remain constant over the projection period, which could be 20 or 30 years long. This method does not need special software, and it is a practical method to use when long-term historical data are not available. Although this assumption may not be realistic, especially for long-term projections, the results of present state projections can provide a baseline from which to examine the impact of population growth and ageing on the cancer burden, and a benchmark that is useful for evaluating the effect of cancer prevention or intervention activities.
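The present state calculation can be written in a few lines. The Python sketch below assumes rates per person-year for a small set of hypothetical age groups and projected population counts for each future year; the age groups, numbers and function name are invented for illustration.

```python
from typing import Dict, List


def present_state_projection(recent_rates: List[Dict[str, float]],
                             future_pop: Dict[int, Dict[str, float]]) -> Dict[int, float]:
    """Apply averaged recent age-specific rates to projected populations.

    recent_rates: one dict per recent year, age group -> rate per person-year.
    future_pop:   projected year -> {age group -> population}.
    """
    groups = list(recent_rates[0])
    # Average each age-specific rate over the recent years supplied
    avg = {g: sum(year[g] for year in recent_rates) / len(recent_rates) for g in groups}
    # Expected cases in each future year = sum over age groups of rate x population
    return {yr: sum(avg[g] * pop[g] for g in groups) for yr, pop in future_pop.items()}


if __name__ == "__main__":
    recent = [{"50-69": 0.0010, "70+": 0.0030},   # hypothetical rates, year 1
              {"50-69": 0.0012, "70+": 0.0028}]   # hypothetical rates, year 2
    pops = {2030: {"50-69": 1_000_000, "70+": 500_000},
            2040: {"50-69": 1_050_000, "70+": 700_000}}
    print(present_state_projection(recent, pops))
```

Because the averaged rates are held fixed, any difference between the projected totals for 2030 and 2040 in the example reflects only population growth and ageing, which is what makes the method useful as a benchmark.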

Due to the association between tobacco smoking and lung cancer risk,4 5 past smoking behaviour is considered to be an important predictor of lung cancer rates.3 7 The accuracy of lung cancer projections can, therefore, be improved if historical data on smoking exposure in the population are incorporated into the models. This is likely to be particularly important if smoking trends peak and then reverse over time, as has occurred in a number of high-income countries,126 since the simple projection of lung cancer trends based only on data reflecting the burgeoning epidemic will not reflect the impact of a turnaround in smoking prevalence.3 However, our review found that only a very limited number of published studies incorporated smoking data in the projection models, with only eight studies including detailed historical data on smoking exposure along with lung cancer data in their projection models.3 7 33 103 106 108 111 112 Another eight studies used less detailed information or a limited amount of smoking data, which was not directly included in the projection models.41 42 102 104 105 107 109 110 Negri et al developed a method to incorporate smoking patterns into an APC model, multiplying the estimated period parameters by predefined coefficients based on recent trends in smoking prevalence and the tar yield of cigarettes.41 42 Two studies reported projections adjusted for the SAF, which involved modelling projections based on observed cancer data and then modifying the projected rates by multiplying them by the SAF, which was estimated from a previous population-based study.102 107 This method can be used for data from any country where lung cancer is primarily caused by smoking,107 but is more suitable for countries where lung cancer mortality for males reached its peak some time ago and recent smoking prevalence is similar for males and females.107 In addition, it should be noted that the SAF based on the relative risk of death for current smokers estimated by the American Cancer Society’s Cancer Prevention Study II (ACS CPS-II) in the USA may not be applicable to other countries.107 A few other studies used methods that were based on cancer rates reported in the literature or on less detailed data;104 105 109 these methods are useful for countries where it is not realistic to use more sophisticated models due to the lack of detail in the available cancer and smoking data. However, for projections in populations at an earlier stage in the smoking epidemic, more detailed information on tobacco exposure would be necessary so that the complex changes over time in the smoking behaviour of the population are captured.127
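The final multiplication step of the SAF adjustment can be sketched as follows. The included studies took the SAF from a previous population-based study; here, purely as an assumption for illustration, it is computed with Levin’s population attributable fraction formula from hypothetical smoking prevalence and relative risk values, and it is held constant across projection years. The function names are hypothetical.

```python
from typing import Dict, List


def smoking_attributable_fraction(prevalence: Dict[str, float],
                                  relative_risk: Dict[str, float]) -> float:
    """Levin-type SAF: sum_i p_i (RR_i - 1) / (1 + sum_i p_i (RR_i - 1)).

    prevalence:    proportion of the population in each exposed category
                   (eg, current and former smokers); never smokers are the reference.
    relative_risk: relative risk for each category compared with never smokers.
    """
    excess = sum(p * (relative_risk[cat] - 1.0) for cat, p in prevalence.items())
    return excess / (1.0 + excess)


def smoking_attributable_rates(projected_rates: List[float], saf: float) -> List[float]:
    """Multiply projected lung cancer rates by the SAF (held fixed across years)."""
    return [rate * saf for rate in projected_rates]
```

For example, with a hypothetical current-smoking prevalence of 0.2 and relative risk of 20, the excess is 0.2 × 19 = 3.8 and the SAF is 3.8/4.8. As the text notes, relative risks estimated in one population (such as ACS CPS-II) may not transfer to another.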

As previously discussed, GLMs are flexible and can be extended to incorporate other covariates, including smoking exposure, at the requisite level of detail. Log-linear models assuming a Poisson distribution based on age, cohort and cigarette tar exposure were reported by Brown and Kessler3 using data from the USA, and by Shibuya et al7 using data for four countries (the USA, UK, Canada and Australia). Both studies were based on sex-specific tobacco consumption over time for two broad age groups (30–49 years and ≥50 years).3 7 These studies take into account the effects of changes in tobacco consumption and differences in exposure among birth cohorts, and both demonstrated improvements in projections from incorporating tar exposure measurements into the projection models. This approach was also reported by Knorr-Held and Rainer33 using data from Germany, but they concluded that the available smoking data in Germany could not improve their projections, because there was no information on sex-specific cigarette consumption or on the average tar content per cigarette. This confirms that accurate projections and the selection of appropriate projection methods depend on the quality and availability of data at the requisite level of detail. Other smoking-related variables have also been used, including smoking intensity103 108 and the number of years of smoking prior to age 40.106 None of the studies using GLMs included constraints on the period and cohort components. This method has the advantage of flexibility and allows the performance of models based on different covariates to be examined in turn, which is particularly relevant when detailed data on risk factors are available. However, the application of this method for a specific cancer type requires reasonable justification and validation, to ensure that the covariates included in the projection model are sufficient to reflect the factors that affect cancer rates in the population. In addition, the potential risk of ecological bias should be considered.

The availability of suitable software is paramount when dealing with complex models and inferences, such as when using APC models. The increasing number of studies using APC models is likely to be due to recent developments in statistical software packages, including R and Stata. Nordpred is a free software package in R and S-PLUS for APC modelling, which was developed by Møller et al at the Cancer Registry of Norway.10 It incorporates a smoothing technique and has become the most commonly used software for fitting APC models in recent years. However, Nordpred only provides projections for a maximum of 25 years beyond the observed data, and no other covariates can be incorporated into the model. Other R packages, including ‘Epi’,9 ‘apc’123 and ‘INLA’,48 can also be used for cancer incidence or mortality projections. Two Stata packages for APC models were developed in the early 2010s and offer more flexible model specification,122 124 although one package requires additional programming when projecting beyond the observed data.122 Joinpoint118 is another popular package that has been increasingly used to project cancer rates into the future by extrapolating the most recent trend.128 However, Joinpoint is only considered to be suitable for short-term projections.118
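Extrapolating the most recent trend amounts to carrying a log-linear slope forward. The Python sketch below is a simplified illustration that assumes the recent segment has already been chosen and fits it by ordinary least squares on the log scale; it does not reproduce Joinpoint’s selection of joinpoints or its weighting options, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def fit_log_linear_trend(years: List[int], rates: List[float]) -> Tuple[float, float]:
    """OLS fit of log(rate) = a + b * year over the chosen recent segment."""
    n = len(years)
    logs = [math.log(r) for r in rates]
    x_bar, y_bar = sum(years) / n, sum(logs) / n
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, logs))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def annual_percent_change(b: float) -> float:
    """Annual percentage change implied by a log-linear slope b."""
    return 100.0 * (math.exp(b) - 1.0)


def extrapolate(a: float, b: float, future_years: List[int]) -> List[float]:
    """Carry the fitted log-linear trend forward unchanged."""
    return [math.exp(a + b * y) for y in future_years]
```

Because a constant annual percentage change compounds over the projection horizon, a small error in the estimated slope grows with the length of the projection, consistent with this kind of extrapolation being regarded as suitable mainly for short-term projections.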

We acknowledge that each method included in this review has its own merits and limitations, depending on the length of the projection period, data quality and availability, and the timing of the analysis in relation to the stage of the smoking epidemic in a country (in particular, whether smoking prevalence is assumed to peak within the time frame of the analysis). It is important to note that all projections of cancer incidence and mortality based on historical trends may be inaccurate, regardless of the method used, if the underlying trends in risk or interventions change.9 This is particularly relevant to lung cancer because of its strong relationship with tobacco exposure.8 No single ‘best model’ can be identified for all situations, nor can one method be concluded to be superior to another. Furthermore, even projections using the same method can be sensitive to the model settings and the length of the projection base.10 Therefore, wherever possible, the selected projection method should be appropriately validated, as such information is useful for checking the model specification and helps researchers understand the potential limitations of the projection model. Validating a projection model by withholding the most recent observed data from model fitting and then comparing the projected with the observed rates for that most recent period can provide important information on the model’s performance.7 Surprisingly, however, only 12 studies (fewer than 12%) reported doing this. As high-quality data on lung cancer rates are now available for several decades or longer in many countries, this type of validation is likely to become more feasible and more frequently performed.
In addition, as more data become available over time, earlier statistical projections can be compared with the emerging data, allowing an even better understanding of the general strengths and pitfalls of the various methods; this exercise is underway and will yield further insights.
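
The hold-out validation described above can be sketched in Python (NumPy assumed): fit a projection method to all but the last few years, project those years, and summarise the discrepancy. Mean absolute percentage error (MAPE) is used here as one possible summary; the two projection functions are simple hypothetical benchmarks rather than methods from the reviewed studies, and the rate series is synthetic.

```python
import numpy as np


def project_constant(years, rates, future_years):
    """Naive benchmark: carry the last observed rate forward."""
    return np.full(len(future_years), float(rates[-1]))


def project_log_linear(years, rates, future_years, base_length=10):
    """Extend a log-linear trend fitted to the last `base_length` years."""
    b, a = np.polyfit(years[-base_length:], np.log(rates[-base_length:]), 1)
    return np.exp(a + b * np.asarray(future_years, dtype=float))


def holdout_validate(years, rates, n_holdout, project):
    """Fit on all but the last n_holdout years, project them, and compare.

    Returns the mean absolute percentage error over the held-out years.
    """
    years = np.asarray(years, dtype=float)
    rates = np.asarray(rates, dtype=float)
    fit_years, test_years = years[:-n_holdout], years[-n_holdout:]
    fit_rates, test_rates = rates[:-n_holdout], rates[-n_holdout:]
    projected = project(fit_years, fit_rates, test_years)
    return 100.0 * np.mean(np.abs(projected - test_rates) / test_rates)


# Synthetic series declining 3% per year; hold out the last 5 years.
years = np.arange(1990, 2016)
rates = 60.0 * 0.97 ** (years - 1990)
mape_ll = holdout_validate(years, rates, 5, project_log_linear)
mape_const = holdout_validate(years, rates, 5, project_constant)
```

Passing, for example, `lambda y, r, f: project_log_linear(y, r, f, base_length=5)` as `project` is one way to check how sensitive a method is to the length of the projection base.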

Strengths and weaknesses

Although we searched multiple electronic databases (Medline, Embase and PreMEDLINE), this review is limited to studies published in English, so it may be incomplete if relevant studies were published in other languages. We may also have missed some articles in the initial search, as we were unable to search the grey literature exhaustively for eligible studies. This review is also limited to lung cancer (International Classification of Diseases, 10th Revision (ICD-10) codes C33–C34), so it does not capture the literature on every type of cancer involving the lungs (eg, mesothelioma). In addition, the wide variability in study populations and time periods made meta-analysis infeasible.

Despite these potential limitations, we believe this review is a valuable resource with many strengths. Searching the reference lists of all included articles should have ensured thorough and extensive coverage of the literature, and prespecified assessment criteria with clear definitions for the different assessment areas allowed an objective assessment of the studies. A pretested and revised standardised form was used for data extraction, which should have minimised differences between reviewers, as confirmed by the high agreement (91.6%) between the data extracted by the two reviewers. We also developed an organisational framework to categorise and summarise the projection methods used in the literature, which provides comprehensive information and highlights the similarities and differences across methods. To our knowledge, this systematic review is the first to provide comprehensive, up-to-date coverage of the literature on statistical methods for projecting lung cancer rates.
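
The article does not state exactly how agreement between the two reviewers was calculated. Assuming simple item-level percent agreement, it could be computed as in the following Python sketch, in which the function name, study identifiers, field names and values are all hypothetical.

```python
def percent_agreement(reviewer_a, reviewer_b):
    """Percentage of extracted data items on which two reviewers agree.

    reviewer_a, reviewer_b: dicts mapping (study_id, field) -> extracted value.
    Only items extracted by both reviewers are compared.
    """
    shared = reviewer_a.keys() & reviewer_b.keys()
    if not shared:
        raise ValueError("no items extracted by both reviewers")
    matches = sum(reviewer_a[k] == reviewer_b[k] for k in shared)
    return 100.0 * matches / len(shared)


# Hypothetical extraction records for two studies.
reviewer_a = {("S1", "method"): "APC", ("S1", "years_of_data"): 20,
              ("S2", "method"): "GLM", ("S2", "validation"): "none"}
reviewer_b = {("S1", "method"): "APC", ("S1", "years_of_data"): 20,
              ("S2", "method"): "GLM", ("S2", "validation"): "holdout"}
print(percent_agreement(reviewer_a, reviewer_b))  # → 75.0
```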

Implications for research

This systematic review provides a comprehensive summary of the statistical methods used over the past three decades in published lung cancer incidence or mortality projections. The assessment of the strengths and limitations of existing methods will help researchers better understand the statistical methods currently used for projecting lung cancer rates. We summarised both theoretical and practical aspects, including software information and the generalisability of the methods, and some of the common methods described here can be applied to other cancer types, so we hope this review will serve as a resource for researchers interested in using or developing one or more of these methods for projecting cancer rates. In particular, the methods incorporating a covariate such as smoking may also be applicable to projecting rates for other cancers for which data on risk factors or diagnostic factors are available at the requisite level of detail, such as prostate specific antigen (PSA) testing rates for prostate cancer.

Acknowledgements

We would like to thank Clare Kahn for editorial assistance, Harriet Hui for assistance in collecting full-text articles, and Victoria Freeman for assistance in updating the final search for this review.

References

  1.
  2.
  3.
  4.
  5.
  6.
  7.
  8.
  9.
  10.
  11.
  12.
  13.
  14.
  15.
  16.
  17.
  18.
  19.
  20.
  21.
  22.
  23.
  24.
  25.
  26.
  27.
  28.
  29.
  30.
  31.
  32.
  33.
  34.
  35.
  36.
  37.
  38.
  39.
  40.
  41.
  42.
  43.
  44.
  45.
  46.
  47.
  48.
  49.
  50.
  51.
  52.
  53.
  54.
  55.
  56.
  57.
  58.
  59.
  60.
  61.
  62.
  63.
  64.
  65.
  66.
  67.
  68.
  69.
  70.
  71.
  72.
  73.
  74.
  75.
  76.
  77.
  78.
  79.
  80.
  81.
  82.
  83.
  84.
  85.
  86.
  87.
  88.
  89.
  90.
  91.
  92.
  93.
  94.
  95.
  96.
  97.
  98.
  99.
  100.
  101.
  102.
  103.
  104.
  105.
  106.
  107.
  108.
  109.
  110.
  111.
  112.
  113.
  114.
  115.
  116.
  117.
  118.
  119.
  120.
  121.
  122.
  123.
  124.
  125.
  126.
  127.
  128.


  • XQY and QL contributed equally.

  • Contributors All authors contributed substantially to the conception and design of the study. KC and DLO conceived the study. SH, QL and XQY drafted the study protocol and designed the data extraction form with input from DLO. SH did the initial scan of the literature search results to exclude articles that were clearly irrelevant. XQY, QL and DLO acted as reviewers. XQY and QL conducted the data extraction and data analysis, and drafted the initial manuscript. DLO contributed to the interpretation of results and drafting of the manuscript. SW and MC contributed to the interpretation of results. All authors critically reviewed the manuscript and approved the final version.

  • Funding This project did not receive any specific funding; the authors are employed by Cancer Council NSW, Australia.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement All data relevant to the study are included in the article or uploaded as online supplementary information.