

Lung cancer in South Africa: a forecast to 2025 based on smoking prevalence data
Volker Winkler (1), Nosimanana J Mangolo (1), Heiko Becher (1,2)
  (1) Institut für Public Health, Universitätsklinikum Heidelberg, Heidelberg, Germany
  (2) Institut für Medizinische Biometrie und Epidemiologie, Universitätsklinikum Hamburg-Eppendorf, Hamburg, Germany
Correspondence to Dr Volker Winkler; volker.winkler{at}


Objective This study aims to forecast lung cancer mortality with respect to recent changes in smoking prevalence and compares the results to estimates from GLOBOCAN and the Global Burden of Disease study.

Setting An established epidemiological model is applied to detailed smoking prevalence data from South Africa to estimate lung cancer mortality from 2010 to 2025.

Participants Data from the South Africa Demographic and Health Survey conducted in 2003 were analysed by sex and ethnic group, and combined with longitudinal estimates of smoking prevalence from 1980 to 2010.

Primary and secondary outcome measures Results provide detailed data on tobacco smoking behaviour by age, sex and ethnic group as well as modelled age-adjusted lung cancer mortality and number of yearly lung cancer deaths.

Results From 2010 to 2025, a decrease in age-adjusted lung cancer mortality is shown from 17.1 to 14.1 among men; whereas rates were stable around 7.2 among women. As a consequence, the estimated number of yearly lung cancer deaths is expected to increase slightly for men and more for women. With respect to ethnic groups, male mortality is expected to be highest for Asians and lowest for blacks. Female rates were lowest for Asians and highest for whites and for coloured.

Conclusions Mortality estimates of this study are close to the WHO mortality database and to Global Burden of Disease estimates for 2010, but significantly lower compared with GLOBOCAN estimates. In conclusion, our study demonstrates the impact of demographic changes and the positive effects of antismoking policy on lung cancer mortality in South Africa. Results may help decision makers to further improve smoking control.


Strengths and limitations of this study

  • Widely used estimates do not take into account changes in risk factor prevalence and demographic changes at the same time; instead, they use invalid vital statistics to estimate current and future disease burden.

  • This study forecasts lung cancer mortality by taking into account recent developments in tobacco consumption as well as demographic change using South Africa as an example.

  • Results show the impact of successful tobacco control policy on lung cancer mortality and question the much higher estimates published by the International Agency for Research on Cancer (IARC); however, results across ethnic groups suffer from limited sample size.

  • Detailed information on tobacco consumption is scarce in many low and middle income countries and therefore limits the application of more sophisticated models.


Today, many low and middle income countries are going through an epidemiological transition and are facing a double burden of infectious and chronic diseases. Better control of infectious disease mortality leads to an increasing life expectancy, which generally results in a rise of chronic diseases.1–3

In order to assign budgets and resources to healthcare programmes and to estimate costs caused by disease, valid information on current and future disease burden is of great importance for every country. However, many low and middle income countries lack important health statistics such as disease-specific mortality.4 ,5 To fill this gap, methods have been developed to estimate the burden of disease in every country worldwide, for example, IARC's GLOBOCAN for cancer.5

The GLOBOCAN method is based on national and subnational data on cancer mortality or incidence for countrywide or regionwide extrapolation. However, we have shown in earlier work that these estimates are questionable for African countries.6 ,7 With respect to future cancer burden, GLOBOCAN only takes into account demographic effects caused by the epidemiological transition and does not consider risk factor prevalence.5 In addition to demographic changes, risk factor prevalence and its development over time largely determine the chronic disease burden in a population.

Tobacco consumption is an important risk factor for many chronic diseases such as cardiovascular diseases, respiratory diseases and many types of cancer.8 ,9 The strongest association has been identified for tobacco smoking and lung cancer (LC), where it causes more than 90% of all cases in developed countries.10 Taking into consideration latency periods for smoking behaviour and LC, tobacco consumption decades ago mainly determines today's LC burden. Therefore, detailed analysis of smoking prevalence data may allow better prediction of current and future LC mortality.

In 1993, South Africa (SA) started to introduce strong tobacco control policies, which included a ban on advertising tobacco products, restrictions on smoking in public places and an increase in excise duties on cigarettes, as well as interventions such as health education programmes.11 ,12 Those policy interventions resulted in a 54% decrease of the per capita cigarette consumption, and smoking prevalence among school children declined from 23.0% to 16.9% between 1999 and 2011.12 Between 1995 and 2010, smoking prevalence among men was estimated to have decreased from about 40% to 22%, while prevalence among women remained almost unchanged at 9%.13

The aims of this study are to estimate current LC mortality with respect to ethnicity as well as to predict LC mortality in SA by taking into account data on smoking prevalence up to 2010 and with realistic assumptions about future smoking prevalence. Results of this study will help to evaluate the success of the antismoking policy with respect to LC deaths. Additionally, comparisons of our estimates to results from other methods such as GLOBOCAN and the Global Burden of Disease (GBD) study may help validate the use of our model in an African setting.


Data on tobacco smoking

We used data from the population-based Adult Health Survey, which was part of the South Africa Demographic and Health Survey conducted in 2003, to analyse smoking prevalence.14 The survey sample was designed to be a nationally representative probability sample based on the enumeration areas provided by Statistics South Africa. Therefore, the Adult Health Survey offered representative and detailed data on various health aspects including detailed data on tobacco consumption. Altogether, 9614 adults aged 15 years and over were eligible, of which 8115 were interviewed, yielding a response of 84%. The principal reason for non-response was the failure to find persons at home despite repeated visits to the household.

Owing to missing or implausible information on tobacco smoking, 25 persons (0.3%) were dropped from the analysis. The data set included detailed information on current and former smoking behaviour with regard to age, sex, ethnicity, tobacco products smoked per day, age at starting and for ex-smokers’ age at smoking cessation.

Every individual was assigned to one of the following categories: never-smoker, non-daily smoker, ex-smoker or current smoker. Current smokers were further categorised with respect to age at starting to smoke and number of cigarette equivalents smoked per day. Information on type of consumed tobacco products per day comprised manufactured cigarettes, pipes, cigars and hand rolled cigarettes. Cigarette equivalents per day were calculated by multiplying the daily number of each of these products by the factors 1, 3, 4 and 1, respectively, and summing the results.15 For 236 (12.8%) current smokers, smoking dose was missing and had to be imputed as the mean smoking dose according to age group, sex and ethnicity. Age at starting to smoke was imputed accordingly for 234 (12.7%) current smokers. The category of current smokers also included individuals who stopped smoking less than 5 years ago.8

Ex-smokers were categorised into people who smoked less than or more than 10 pack-years. The data set did not include information on smoking dose and age at starting for ex-smokers. Therefore, both variables were estimated by using the distribution from current smokers according to age group, sex and ethnicity. Duration of smoking was then calculated as the difference between age at smoking cessation and age at starting. The number of pack-years was estimated as smoked cigarette equivalents per day divided by 20 and multiplied with smoking duration in years.
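The dose calculations described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names, the dictionary keys and the example person are hypothetical, and the handling of exactly 10 pack-years is an assumption, since the text only distinguishes less than from more than 10.

```python
# Conversion factors taken from the text: manufactured cigarettes 1,
# pipes 3, cigars 4, hand rolled cigarettes 1.
EQUIVALENT_FACTORS = {
    "manufactured": 1,
    "pipe": 3,
    "cigar": 4,
    "hand_rolled": 1,
}


def cigarette_equivalents(products_per_day):
    """Sum the daily tobacco products, each weighted by its factor."""
    return sum(EQUIVALENT_FACTORS[name] * count
               for name, count in products_per_day.items())


def pack_years(equivalents_per_day, age_start, age_stop):
    """Packs per day (20 cigarette equivalents each) times years smoked."""
    duration_years = age_stop - age_start
    return equivalents_per_day * duration_years / 20


def ex_smoker_category(n_pack_years, threshold=10):
    """Assumption: exactly `threshold` pack-years counts as the upper group."""
    return "<10 pack-years" if n_pack_years < threshold else ">=10 pack-years"


# Hypothetical ex-smoker: 10 manufactured cigarettes and 2 pipes per day,
# started at age 18 and stopped at age 43.
dose = cigarette_equivalents({"manufactured": 10, "pipe": 2})
exposure = pack_years(dose, age_start=18, age_stop=43)
category = ex_smoker_category(exposure)
```

For this hypothetical person the dose is 16 cigarette equivalents per day over 25 years, that is, 20 pack-years, which falls in the higher ex-smoker category.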

These procedures result in nine smoking dose categories as listed in table 1.

Table 1

Smoking dose categories and corresponding relative risks (RRs) for lung cancer

The analysis of smoking prevalence by the four ethnic groups, Asian (including Indian), blacks, coloured and white, is based on 8007 individuals, since 83 persons had missing information on ethnicity.

To project detailed SA smoking prevalence between 1995 and 2010, we used estimates of daily smoking prevalence from another study.13 Ng et al estimated the prevalence of daily smoking by sex for 187 countries from 1980 to 2012, based on nationally representative sources that measured tobacco use (n=2102 country-years of data). The study included data from eight SA surveys conducted between 1995 and 2007 among adults. We used their yearly prevalence estimates for SA to calculate the factor of prevalence change in relation to the baseline prevalence in 2003. For example, current smoking prevalence among men in 1995 was estimated by multiplying all age-specific smoking prevalence data of current smoker dose categories from 2003 with 1.38 (the factor for prevalence change of that year). Ex-smoker prevalence of that year was estimated by multiplying ex-smoker dose categories with 0.72 (the reciprocal value of prevalence change). Detailed numbers for estimated longitudinal smoking prevalence are displayed as age-adjusted prevalence in table 2.
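The rescaling step can be illustrated as follows. This is a sketch under stated assumptions: the 2003 baseline values and category names are invented for illustration, and only the factor 1.38 for men in 1995 and the use of its reciprocal for ex-smokers come from the text. How the never-smoker share was adjusted is not described in the text and is therefore left out.

```python
def scale_to_year(current_2003, ex_2003, change_factor):
    """Rescale 2003 dose-category prevalences to another year.

    Current-smoker categories are multiplied by the prevalence change
    factor; ex-smoker categories are multiplied by its reciprocal.
    """
    current = {cat: p * change_factor for cat, p in current_2003.items()}
    ex = {cat: p / change_factor for cat, p in ex_2003.items()}
    return current, ex


# Hypothetical age-specific 2003 prevalences (proportions, not from the paper).
current_2003 = {"light": 0.10, "heavy": 0.05}
ex_2003 = {"<10 pack-years": 0.04}

# Factor for men in 1995 relative to 2003, as given in the text.
FACTOR_MEN_1995 = 1.38

current_1995, ex_1995 = scale_to_year(current_2003, ex_2003, FACTOR_MEN_1995)
reciprocal = 1 / FACTOR_MEN_1995  # rounds to the 0.72 quoted in the text
```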

Table 2

Estimated prevalence (age-adjusted to world standard) of current and ex-smokers in South Africa based on Ng et al13

Statistical methods

The model to estimate LC mortality has been described in more detail elsewhere.16 In the first step, age-specific and sex-specific LC mortality rates were estimated based on LC mortality among non-smokers combined with (1) country-specific smoking prevalence data by dose-specific smoking categories and (2) relative risk estimates for these categories.

LC mortality rates among non-smokers, $\lambda_{0j}$, were available from different recent studies from various countries.17–20 These estimates were combined into a summary estimate by using a linear regression model on age, where j is the mid-age of each 5-year age group.16

The age-specific LC mortality rate in year y for country k and age group j, $\lambda_{jky}$, was estimated by

$$\lambda_{jky} = \lambda_{0j} \sum_{s} p_{jks(y-15)} \, RR_{s},$$

where the sum runs over the smoking dose categories s of table 1.

$p_{jks(y-15)}$ was the proportion of smokers in age group j, country k and dose group s in year $y-15$, that is, 15 years before the year of the LC mortality rate, and $RR_{s}$ was the relative risk associated with that dose (see table 1). Dose-specific relative risks were estimated based on previous studies.21–23 Relative risk estimates for current smokers took into account current age and age at start of smoking. The estimation procedure was performed for each sex separately.
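As an illustration of how such a rate can be computed, the sketch below assumes that the age-specific rate equals the non-smoker baseline rate multiplied by the prevalence-weighted sum of relative risks over all dose categories, with never-smokers entering at a relative risk of 1. Both this functional form and all numbers are assumptions made for the example; they are not values from table 1.

```python
def lc_mortality_rate(baseline_rate, prevalence, relative_risk):
    """Baseline non-smoker rate times the prevalence-weighted relative risk.

    `prevalence` maps each dose category to its population share measured
    15 years earlier; the shares are expected to sum to 1.
    """
    total_share = sum(prevalence.values())
    if abs(total_share - 1.0) > 1e-6:
        raise ValueError("prevalence shares must sum to 1")
    weighted_rr = sum(prevalence[s] * relative_risk[s] for s in prevalence)
    return baseline_rate * weighted_rr


# Hypothetical inputs for one age group (rate per 100 000 person-years).
baseline = 10.0
prevalence = {"never": 0.7, "ex": 0.1, "current": 0.2}
relative_risk = {"never": 1.0, "ex": 4.0, "current": 15.0}

rate = lc_mortality_rate(baseline, prevalence, relative_risk)
```

With these made-up inputs the weighted relative risk is 0.7 + 0.4 + 3.0 = 4.1, so the rate is about 41 per 100 000.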

The relationship between smoking patterns and LC is complex and a latency of 10–20 years is stated between smoking and LC.24 In this study we assumed a latency of 15 years, meaning smoking prevalence data from 2003 are used for LC mortality estimates for the year 2018. LC mortality was calculated for 5-year age groups from age 25 to 75+ years.
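The latency assumption amounts to a simple year shift, and the age grouping to a fixed list of bands. The helper names below are hypothetical; the 15-year latency and the age range 25 to 75+ are taken from the text.

```python
LATENCY_YEARS = 15  # latency assumed in this study


def prevalence_year(mortality_year, latency=LATENCY_YEARS):
    """Year of the smoking prevalence data used for a given mortality year."""
    return mortality_year - latency


def age_groups(first=25, last_open=75, width=5):
    """Five-year age bands from `first` up to an open-ended last band."""
    groups = [f"{age}-{age + width - 1}"
              for age in range(first, last_open, width)]
    groups.append(f"{last_open}+")
    return groups


# The forecast window 2010 to 2025 draws on prevalence data from 1995 to 2010.
window = [prevalence_year(year) for year in (2010, 2018, 2025)]
bands = age_groups()
```

This gives 11 age bands, from 25-29 to 75+, and shows why prevalence estimates for 1995 to 2010 are sufficient for forecasts up to 2025.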

Following this, age-standardised mortality rates (ASR) according to the world standard population as well as the numbers of expected LC deaths were calculated.25 The numbers of LC deaths $D_{ky}$ were estimated as $D_{ky} = \sum_{j} \lambda_{jky} \, n_{jky}$, where $\lambda_{jky}$ denotes the age-specific LC mortality rate in age group j, country k and year y. Population figures $n_{jky}$ were taken from UN Population Prospects referring to the years of the estimated LC rates.26
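Direct age standardisation and the death count can be sketched as below. The three age groups, rates, weights and population sizes are hypothetical toy values chosen for easy arithmetic; they are neither the world standard population nor UN population figures.

```python
def age_standardised_rate(rates, standard_weights):
    """Weighted mean of age-specific rates using standard population weights."""
    total_weight = sum(standard_weights)
    return sum(r * w for r, w in zip(rates, standard_weights)) / total_weight


def expected_deaths(rates_per_100k, population):
    """Sum over age groups of rate times population size."""
    return sum(r * n / 100_000 for r, n in zip(rates_per_100k, population))


# Toy example with three age groups (all values are illustrative).
rates = [5.0, 50.0, 200.0]               # per 100 000 person-years
weights = [60, 30, 10]                   # standard population weights
population = [1_000_000, 400_000, 100_000]

asr = age_standardised_rate(rates, weights)
deaths = expected_deaths(rates, population)
```

In this toy case the ASR is (5×60 + 50×30 + 200×10)/100 = 38 per 100 000 and the expected number of deaths is 50 + 200 + 200 = 450. Because the ASR uses fixed weights while the death count uses the actual population, an ageing or growing population can raise the number of deaths even when the ASR falls.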

For comparison of our results, we used LC mortality estimates from the WHO mortality database, the GBD study and IARC's GLOBOCAN.5 ,27 ,28


Results of the consumption of tobacco products in 2003 are presented in table 3. Analysis of smoking prevalence shows that overall 27.7% of men and 8.4% of women in SA were current smokers, and 5.3% and 1.9%, respectively, stopped smoking more than 5 years ago. The highest prevalence of smokers was seen in the mid age group of 30–59 years for both sexes. This was also true for all ethnic subgroups. On average, male smokers consumed slightly more cigarettes per day than female smokers and their mean age to start smoking was about 5 months lower.

Table 3

South African smoking prevalence in 2003 by sex and ethnic group (83 persons missing ethnicity)

With respect to ethnicity, there were some remarkable differences. The highest total smoking prevalence was seen among Asian (38.4%) and coloured (38.4%) men, whereas among white men the prevalence of ex-smokers was highest with 9.1%. At the same time, the average number of consumed cigarettes was also highest among white men. The mean age to start smoking was lowest among Asian men and highest among black women. On average, women started smoking at an older age compared with men across all ethnic groups. The lowest total prevalence was seen among black women (1.3%). The male-to-female prevalence ratio of current smoking was therefore highest among black people at 6.7. For coloured and white people the male-to-female ratio was small at 1.3.

Results of the smoking prevalence analysis were reflected in estimates of LC mortality in the year 2018, which were based on smoking prevalence data from 2003 (see table 4). Estimated ASR was 15.0 and 7.1 for men and women, respectively. The highest ASR was estimated among Asian men (23.4) and the lowest among Asian women (5.9). With regard to sex, the lowest rates among men were expected for black (13.3) and the highest rates among women for coloured (11.9) and white (11.8).

Table 4

Estimated lung cancer mortality rates per 100 000 in South Africa 2018 (age-adjusted to world standard), based on prevalence data from 2003

Among all subgroups, age-specific rates increased sharply in older age groups.

Figure 1 shows the estimated ASR as well as the estimated number of LC deaths in SA between the years 2010 and 2025. Our modelling results (blue) showed a decrease in male ASR from 17.1 to 14.1. The yearly number of male LC deaths increased slowly from about 2784 to about 3002. With regard to female LC mortality, we expected an almost stable ASR, being highest in the year 2010 at 7.4 and lowest around the year 2018 at 7.1. In consequence, the estimated number of female LC deaths showed a steady increase from 1813 cases in the year 2010 to 2407 cases 15 years later.

Figure 1

Estimated number of lung cancer deaths (dots, left scale) and ASR (crosses, right scale) in South Africa from 2010 to 2025. ASR, age-standardised mortality rate; GBD, Global Burden of Disease.

Comparison of our results with the WHO mortality database (orange), the GBD study (green) and GLOBOCAN (red) showed that our estimates were similar to those of the WHO mortality database and the GBD study. The most recent data in the WHO mortality database show 3164 (ASR=21.1) LC deaths among men and 1539 (ASR=6.9) among women in 2010. The GBD study estimated slightly higher numbers than our model for men (3113 deaths) and a somewhat lower mortality for women (1447 deaths). In contrast, all GLOBOCAN estimates were considerably higher than our results with respect to ASR as well as estimated numbers of deaths. GLOBOCAN assumed ASRs to be stable for both sexes at 26.1 for men and 10.0 for women. As a consequence, the numbers of LC deaths were expected to increase for both sexes, in men up to 5591 and in women up to 3462 deaths in the year 2025.


The results of this study show great variation in LC mortality in SA with respect to sex and ethnicity. Across all ethnic groups combined, age-adjusted mortality rates among men were about twice as high as female rates. Among Asian people, male mortality was about four times higher, but among coloured and white people, the sex ratio was only about 1.5. The lowest rates were estimated for black people, and overall differences by ethnic group are in line with another recent study.29

Predicted time trends of the LC burden in SA showed a decrease in male ASR and a 3% increase of male LC deaths until the year 2025. The successful antismoking policy effectively slows the increasing numbers of chronic diseases due to demographic changes.12 As a result and in contrast to IARC's GLOBOCAN estimations, our model predicts a slow increase in yearly LC deaths among men. Among women there were only small changes in ASR; therefore, demographic changes show their impact and our model calculates a 33% increase in female LC deaths between the years 2010 and 2025. From 2015 onwards, the increase in female LC deaths estimated in this study was almost parallel to the estimates by GLOBOCAN, although on a significantly lower level. Furthermore, all GLOBOCAN figures were significantly higher than estimates from this study.

In general, differences to GLOBOCAN have already been shown in other publications on LC mortality estimation and resulted in discussions on validity.6 ,30 Previous results of our model have shown significantly higher LC estimates compared with GLOBOCAN for all of sub-Saharan Africa except for SA.6 In 2012, GLOBOCAN estimates a mean ASR of 4.3 for sub-Saharan African men and an ASR of 18.1 for South African men.5 This could only be true if the smoking prevalence in SA was extremely high, and in other parts of Africa extremely low. Available data, however, do not support this extreme assumption. According to GLOBOCAN, the methodology used to estimate data on SA was a direct projection of available rates on incidence and mortality. The same method is used for countries with very good data such as Sweden, Finland and the USA. However, it is also stated that SA offers national low-quality data on cancer incidence and low-quality but complete vital registration.

In 2004, the South African National Burden of Disease Study reported substantially higher cancer mortality rates than cancer incidence rates reported by the National Cancer Registry of South Africa.31 ,32 For the year 2010, Statistics South Africa states that about 24% of all registered deaths are coded as ill-defined.33 Therefore, in our view, the estimation procedures applied by GLOBOCAN may not be feasible for SA. A comparison with LC mortality estimates by the GBD study and the most recent data from the WHO mortality database showed that our estimates were in line for men and somewhat higher for women. According to the WHO, coverage of cause-of-death registration has improved to 92.3% since 2007; however, the proportion of ill-defined causes of death is still high at 21%.28

When comparing completely different methods of estimation without knowing the true result, the validity of the methods is strengthened when they produce similar results. Therefore, we are confident our results are reasonably close to the true LC mortality in SA.

Our motivation to use such a simplistic model to estimate LC mortality was due to scarce data in numerous low-income countries. In comparison to many other low and middle income countries, SA offers relatively detailed data on tobacco consumption. Many of the recently published reports in other countries only provide smoking as a binary indicator and the smoking dose is often unknown. However, the relation between smoking and LC is rather complex, involving many aspects, for example, age at the start of smoking, duration of smoking, type of cigarettes smoked, and many more.21 Therefore, it is not possible to address all aspects in detail and biased estimates cannot be ruled out. Additionally, there are other risk factors for LC, such as occupational exposures, environmental radon exposure and diet, which have not been taken into account separately.

A recent study compared the adequacy of different models in estimating LC mortality rates.34 This study also considered a model using non-smoker LC mortality rates that are close to our baseline rates.16 ,23 ,35 The relative risks for the smoking categories of current smokers in our model were directly obtained from this considered model.35 In summary, this considered model results in very good agreement for period and cohort trends, but in weaker agreement with age for younger individuals.34

The baseline mortality rate for LC in non-smokers reflects the joint impact of all other risk factors combined and may, therefore, be imprecise. However, we showed that country-specific variations are limited.16 ,30 Finally, the latency time between smoking and LC cannot be seen as a fixed time period; on a population level, however, 15 years is a reasonable assumption.24 Despite all known limitations, we have validated the modelling procedure in various western countries with very different smoking patterns.16

Other limitations of this study concern the data used on smoking prevalence. Subgroup analysis by ethnicity should be interpreted with some caution due to limited sample size. Another aspect is the assumption that the smoking dose pattern of ex-smokers was equal to that of current smokers. However, overall fewer than 8% of individuals were ex-smokers; therefore, the effect of this assumption is likely to be negligible. Finally, the estimation of changes in smoking prevalence over time did not take into account the possibility of differential effects on different smoking dose groups and on different ethnic groups.

In conclusion, we are confident that our results provide valid estimates on current and future LC mortality in SA. Our study takes into account changes in risk factor prevalence to estimate future disease burden, which is rarely done. Results demonstrate the positive effect of antismoking policy on mortality and may therefore help decision makers to further improve smoking control.




  • Contributors VW coordinated the study, wrote the major part of the manuscript, interpreted the data and results, and reviewed the literature. NJM carried out part of the analysis and contributed to writing the manuscript. HB contributed important intellectual content and helped in critically revising the manuscript.

  • Funding We acknowledge the financial support of the Deutsche Forschungsgemeinschaft (grant number WI 3510/1-2) and Ruprecht-Karls-Universität Heidelberg within the funding programme Open Access Publishing.

  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data are available.
