Article Text

Using Monte Carlo simulation to assess variability and uncertainty of tobacco consumption in a city by sewage epidemiology
  1. De-Gao Wang1,
  2. Qian-Qian Dong1,
  3. Juan Du1,
  4. Shuo Yang1,
  5. Yun-Jie Zhang2,
  6. Guang-Shui Na3,
  7. Stuart G Ferguson4,
  8. Zhuang Wang5,
  9. Tong Zheng6
  1. 1School of Environmental Science and Technology, Dalian Maritime University, Dalian, Liaoning, China
  2. 2Department of Mathematics, Dalian Maritime University, Dalian, Liaoning, China
  3. 3National Marine Environmental Monitoring Center, Dalian, Liaoning, China
  4. 4Faculty of Health Science, School of Medicine, University of Tasmania, Hobart, Tasmania, Australia
  5. 5Jiangsu Key Laboratory of Atmospheric Environment Monitoring and Pollution Control (AEMPC), School of Environmental Science and Engineering, Nanjing University of Information Science and Technology, Nanjing, China
  6. 6State Key Laboratory of Urban Water Resource and Environment, Harbin Institute of Technology, Harbin, China
  1. Correspondence to Dr De-Gao Wang; degaowang{at}


Objective To use Monte Carlo simulation to assess the uncertainty and variability of tobacco consumption through wastewater analysis in a city.

Methods A total of 11 wastewater treatment plants (WWTPs) (serving 2.2 million people; approximately 83% of urban population in Dalian) were selected and sampled. By detection and quantification of principal metabolites of nicotine, cotinine (COT) and trans-3′-hydroxycotinine (OH-COT), in raw wastewater, back calculation of tobacco use in the population of WWTPs can be realised.

Results COT and OH-COT were detected in the entire set of samples with an average concentration of 2.33±0.30 and 2.76±0.91 µg/L, respectively. The mass load of absorbed NIC during the sampling period ranged from 0.25 to 4.22 mg/day/capita with an average of 1.92 mg/day/capita. Using these data, we estimated that smokers in the sampling area consumed an average of 14.6 cigarettes per day for active smoker. Uncertainty and variability analysis by Monte Carlo simulation were used to refine this estimate: the procedure concluded that smokers in Dalian smoked between 10 and 27 cigarettes per day. This estimate showed good agreement with estimates from epidemiological research.

Conclusions Sewage-based epidemiology may be a useful additional tool for the large-scale monitoring of patterns of tobacco use. Probabilistic methods can be used to strengthen the reliability of estimated use generated from wastewater analysis.

  • wastewater analysis
  • Monte Carlo simulation
  • cotinine
  • trans-3’-hydroxycotinine
  • tobacco use

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:

Statistics from

Strengths and limitations of this study

  • This paper describes a probabilistic method to assess the uncertainty of tobacco use based on wastewater analysis.

  • The approach offers a model to estimate tobacco consumption in a city.

  • The main limitation to this study is that sewage-based epidemiology gives no information on demographic characteristics of smokers.


China is the largest consumer of tobacco in the world, with an estimated 301 million smokers in 2010.1 The average cigarette consumption per smoker has remained relatively consistent during this century (estimated at 14.8 cigarettes per day (CPD) in 2002, and 14.2 CPD in 2010).2 ,3 In May 2015, the Chinese Government approved an increase in the tax applied to tobacco products, however, more tobacco control programmes and initiatives are needed to reduce the smoking.4 Monitoring tobacco consumption is essential for evaluating the effectiveness of tobacco control programmes and initiatives.

Cross-sectional household surveys are the main sources of cross-sectional smoking prevalence estimates. The reliability of these estimates relies on large representative samples of the population providing accurate information about their use of tobacco. These methods are labour-intensive and time-consuming.5 In part, to overcome these issues, a new method of sewage epidemiology based on wastewater analysis has been developed to investigate tobacco consumption. The approach is based on the principle that the metabolites of nicotine (NIC) are excreted along with urine into the urban sewer networks. By the detection and quantification of the principal metabolites—cotinine (COT) and trans-3′-hydroxycotinine (OH-COT)—in raw wastewater, back calculation of tobacco use in the population of a wastewater treatment plant (WWTP) catchment area can be estimated. Such wastewater analysis procedures have been used to estimate tobacco consumption in several cities in Spain6 and Italy.7 The results show good agreement with prevalence data from national epidemiological surveys, and demonstrate the potential of the approach to complement existing socioepidemiological methods.8

The primary advantages of sewage epidemiology procedures are that they provide evidence-based, objective and real-time estimates of drug consumption in a defined population.9–17 However, this approach is also subject to a number of uncertainties associated with the different steps involved.11 ,18 ,19 The key uncertainties are those related to the sampling of wastewater,20 ,21 the chemical analysis,22 the method to back-calculate tobacco use, the estimation of the size of the population,23–25 and human urinary excretion of metabolites.16 ,26 ,27 Gaussian error propagation has become widely used to calculate the uncertainty of estimations generated.11 ,19 ,28 However, this approach analyses the uncertainty of average values of estimation, not the variability of the estimation itself. As a representative of probabilistic approach, Monte Carlo simulations can quantify model inputs in an estimation model, and therefore assess the variability and uncertainty of tobacco consumption. The objectives of the present study were to assess tobacco consumption using wastewater analyses in a region of China, and to use Monte Carlo simulations to estimate the uncertainties of the estimation of tobacco consumption generated.


Wastewater sampling and analysis

WWTPs were selected for sampling to achieve a wide geographical distribution in Dalian, China. In total, 11 plants were included, servicing a population of 2.2 million people. All 11 plants operated with the secondary wastewater treatment systems, such as biological aerated filter, cyclic active sludge technology, sequencing batch reactor, and anoxic/oxic activated sludge process. Twenty-four-hour composite samples of raw wastewater were collected in polyethylene terephthalate containers by the staff of each plant on two consecutive weekdays in June 2015. A previous study suggested that the concentrations of COT and OH-COT were consistent between weekdays and weekends.7 Each WWTP also provided information on special events that occurring during the sampling period, flow variations (as measured by a flow sensor in the inlet of the WWTP), sampling mode, and sampling frequency. The population served by each WWTP was estimated using statistical data from the government.

A solid-phase extraction with reverse-phase cartridges was applied for the cleanup of the wastewater sample. Liquid chromatography coupled with tandem mass spectrometry was then used to analyse the samples. The details of sample treatment, analysis (see online supplementary table S1), and chemicals used in this study can be found in the supporting information.

Back-calculation of NIC and tobacco consumption

The per cent of COT and OH-COT excreted, the molecular weight ratio between NIC and the metabolites, the flow rate, and the population served, were used to calculate the mass load of NIC consumed in each WWTP per day (mNIC), as summarised in equation 1.Embedded Image 1where, CCOT and COH-COT are the concentrations of COT and OH-COT, F the flow rate of raw wastewater in WWTPs, MNIC, MCOT and MOH-COT the molecular weights of NIC, COT, and OH-COT, xCOT and xOH-COT the excretion percentage of COT and OH-COT relative to NIC (%).

The average of the amount of NIC consumed (mNIC_Mean) was calculated by multiplying each mass load estimate by the population-equivalent weight (equation 2).Embedded Image 2where mNIC,i is mass load of WWTP of i, WP,i the population-equivalent weight for WWTP of i, and n the number of the WWTPs from 1 to 11.

The population-equivalent weight for each WWTP was calculated based on the population served and total population for all 11 WWTPs (equation 3).Embedded Image 3where Pi is the population served by the WWTP of i, and Embedded Image the total population from 11 WWTPs.

From an epidemiological and public health perspective, a desirable goal would be to estimate the number of cigarettes consumed per smoker rather than the net mass of NIC within a community. Recognising this, the number of cigarettes consumed per smoker (nC) was calculated using equation 4.Embedded Image 4where D is the content of NIC in an average cigarette, Y the average yield of NIC uptake during smoking, R15 the ratio of population aged 15 years and older (15+) in Dalian and xSmoker the smoker ratio (15+) in China. D was estimated at 0.90±0.15 mg of NIC per cigarette sold in Chinese market. The value of Y was determined to be 0.66±0.19 (based on the amount measured for that cigarette brand by standard Federal Trade Commission method and metabolites of NIC in urine29). The values of 91% and 24% for R15 and xSmoker were used to calculate the number of cigarettes smoked per smoker.

Urinary excretion profile of NIC

Studies of NIC metabolism have been conducted over several decades. Many studies have reported excretory profiles of NIC administered through smoking, oral ingestion and transdermal administration.30–38 The analysis of compiled data suggests that the urinary excretion profile is quite similar when NIC is either administered through smoking or transdermal patches; however, the oral ingestion has higher extraction rates of OH-COT and COT (see online supplementary table S2). The main elements of the excretory profile of NIC are the NIC metabolites COT and OH-COT (and their conjugated forms), and NIC itself.39 The conjugated forms of these metabolites are completely transformed to the free form by β-glucuronidase enzymes from faecal bacteria in the raw wastewater.7 Therefore, the total amount (per cent of free and conjugated forms) of OH-COT and COT accounts for 43.4±13.8% and 32.3±8.0% of the total amount of NIC equivalents excreted when smoking is the main route of NIC administration. These mean excretion percentages were used to calculate the mass of NIC consumed.

Uncertainty analysis

The calculated estimates of tobacco consumption are based on the average values of the input parameters. However, their implementations are complicated since the parameters cannot be treated as fixed-point values: each parameter may take on a range of values depending on if the actual values are uncertain (eg, concentrations of COT and OH-COT), or if they vary from person to person (eg, excretion percentage of COT and OH-COT). The most effective quantification method for uncertainty and variability is to assign to each parameter a probability density function. The Monte Carlo approach (Oracle Crystal Ball software, V.7.3.1) allows the possibility of describing the uncertainty and variability associated with the input parameters, and incorporating them into the estimate of tobacco consumption.

Selecting appropriate input distributions for uncertain and variable inputs is crucial to the integrity of Monte Carlo simulation results. These distributions of density functions can take on several different forms (eg, normal, lognormal, uniform, triangular). Normal distribution can be used to represent a quantity for which the underlying mechanism can be described by the central limit theorem, such as the result of a large number of additive independent errors. In this study, we assumed the flow rates, population, the content of NIC in cigarettes and the yield of NIC uptake during smoking were normal distributions, with a conservative estimate of 10% relative SD. The central limit theorem can also be used as the basis for selecting a lognormal distribution to represent a quantity. The lognormal distribution has often been found to be a good representation of positive real values and results from the production of many random variation of the dilution of pollutant concentrations. As such, the concentrations and excretion percentage of COT and OH-COT were taken as lognormal distributions. Online supplementary tables S3 and S4 show the description of the parameter distributions for Monte Carlo simulations.

Sensitivity analysis shows how much each input parameter contributes to the uncertainty or variability of the estimations. In turn, it adds information on which portion of the variability is from natural fluctuation versus how much is caused by lack of knowledge. The contribution of an assumption to the total variance of the forecast is obtained as the square of the correlation coefficient normalised to the sum of the squared correlation coefficients (adjusted to 100%).

Results and discussion

Concentrations in wastewater

NIC, COT and OH-COT were detected in all 22 samples with average concentrations of 11.5±8.73, 2.33±0.30 and 2.76±0.91 µg/L (table 1), respectively. As expected, OH-COT concentrations were approximately 1.2 times higher than those of COT, in concordance with the known metabolic fate of NIC in humans (approximately 1.3 times). These values were highly correlated (R2=0.90). The concentrations measured in the samples collected on 11 WWTPs were very similar and stable, as expected. By contrast, parent NIC concentrations were higher than expected considering only urinary excretion, and showed variability across the different WWTPs. Cigarette butts and ash contribute to the amount of NIC in wastewater, suggesting that the human metabolites of COT and OH-COT are the best biomarkers for estimating tobacco consumption. NIC concentrations did not correlated well with either COT (R2=0.69) or OH-COT (R2=0.72).

Table 1

The concentration of NIC, COT and OH-COT in wastewater, flow rate, and served population, mass loads, and smoking cigarette number for 11 WWTPs in Dalian, China

These concentrations are in the same range of those measured by Lopes et al40 in Lisbon, Portugal (1.1–3.5 µg/L for COT) and slightly higher than those reported by Rodríguez-Álvarez et al6 in Spain (2.2–5.9, 1.0–1.6 and 1.8–2.8 µg/L for NIC, COT and OH-COT, respectively). The concentrations can be influenced by many factors, such as tobacco prevalence and flows arriving to the WWTPs.

Mass load of NIC consumed and uncertain analysis

The mass load of absorbed NIC during the sampling period ranges from 0.25 to 4.22 mg/day/capita, with an average of 1.92 mg/day/capita (table 1). The mass load of NIC consumed, reported in this work, are similar than those reported in Spain (1.8 mg/day/capita), and lower than in Italy (3.42 mg/day/capita). These differences in the mass load of NIC consumed may be explained by the underlying differences in tobacco consumption.

Figure 1 shows the probability distribution density of the NIC mass load in Dalian generated by the Monte Carlo simulations. The figure shows the 10th centile, the central tendency of consumption (50th centile) and the reasonable maximum mass load (90th centile). The value of mNIC_Mean was estimated to be 2.04 mg/day/capita, ranging from 1.56 to 2.39 mg/day/capita in Dalian. The results reveal that the uncertainty of mass load, as defined by the ratio of the 90th–10th centile, is 1.5.

Figure 1

Distribution of estimated mass load of nicotine (mg/day/capita) in Dalian: mean, median, 10th and 90th centiles.

Figure 2 shows the sensitivity analysis for mNIC_Mean and the eight most significant variables influencing mass load (>1%). The two most significant parameters influencing mass load were xCOT and xOH-COT, which contribute 52% and 31% of total uncertainty of mass load, respectively. By contrast, population, flow rate, and concentration combined only contribute 17% of the observed variance. In this analysis, it can be seen that variability of extraction percentage are far more important than the uncertainty about population, flow rate, and concentration to overall variance in mass load. The variability of extraction percentage for smokers refers to interindividual heterogeneity with respect to different extraction rates of OH-COT and COT. The mechanisms underlying these differences of urinary of NIC metabolites likely involve both genetic and behavioural factors.41 Variability is an inherent property of a system under study and, therefore, is irreducible. While the uncertainty of population, flow rate and concentration due to random or systematic error in measurement can be reduced with better data and/or better models. The main uncertainty of estimation results originates from the interindividual variability, which cannot be reduced through improving the measurement technique.

Figure 2

The results of sensitivity analysis for nicotine (NIC) mass load. xCOT and xOH-COT are excretion percentage of cotinine (COT) and hydroxycotinine (OH-COT) relative to NIC (%); Pcer and Fcer are the population and flow rate in wastewater treatment plant (WWTP) of CER; CCOT_cyi is the concentration of COT in WWTP of CYI; Pseg and Fseg are the population and flow rate in WWTP of SEG; Pmyi is the population in WWTP of MYI.

Consumption of tobacco and uncertain analysis

Given that mean value of 0.90 mg/cigarette of NIC content in the Chinese market, and 66% yield of NIC uptake, we estimated an average of 3.0 cigarettes/day/capita across Dalian. Assuming that 24% of population (15+) in China are active smokers, and 91% of population in Dalian are adults (15+), our procedure estimates that the average Dalian smoker consumed 14.6 CPD. This compares well with survey estimates of population smoking. The 2010 Global Adult Tobacco Survey found that adult Chinese smokers consumed, on average, 14.2 manufactured CPD.3 Similarly, a 2009 smoking prevalence investigation among urban residents of Liaoning province—which includes Dalian—found that the smokers consumed, on average, 14.2±0.3 CPD.42 Hence, the smoking rate obtained from sewage analysis shows good agreement with the smoking rate estimated from epidemiological survey data.

The probability distribution density for consumed number of cigarettes per smoker in Dalian is shown in figure 3. Since this estimation is calculated as the product of several probability distributions, the mass load distribution shown in figure 3 is approximately lognormal. The estimated cigarette consumption ranged from 10 to 27 CPD, with a median of 16 CPD. The uncertainty and variability of number of cigarettes was estimated as 2.0. The probability distribution of daily cigarette consumption is consistent with the questionnaire review from Jiangsu Province in 2010, which showed the median number of cigarettes consumed per day was 15, and ranged from 10 to 20 for the 25th and 75th centiles.43

Figure 3

Distribution of estimated number of cigarette consumed (cigarettes/day/smoker) in Dalian: mean, median, 10th and 90th centiles.

The sensitivity analysis for number of cigarettes smoked shows that the four parameters of Y, D, xOH-COT and xCOT contributed to 63%, 20%, 9% and 5% of variance, respectively (figure 4). The contribution of the other parameter is lower than 1%. The content of NIC in one cigarette is the most important parameter than other factors, which contribute to almost half the total uncertainty. Owing to interindividual heterogeneity of three parameters of Y, xOH-COT and xCOT, further study on improving estimating of the true values of D should be done to reduce the uncertainty of estimation for cigarette consumption.

Figure 4

The results of sensitivity analysis for number of cigarettes consumed. xCOT and xOH-COT are excretion percentage of cotinine (COT) and hydroxycotinine (OH-COT) relative to nicotine (NIC) (%); Y is the yield of NIC during smoking, and D the content of NIC in one cigarette.

In summary, the present study provides a systematic assessment of tobacco use in a city based on sewage epidemiology. Generally, the estimated tobacco consumption agrees quite well with those derived from a separate epidemiology survey. Monte Carlo simulation shows that the uncertainty of mass load and cigarette number are 1.5 and 2.0. The results of sensitivity analysis show that the interindividual variability plays an important role on uncertainty of estimation results. These data are the first published estimates generated using this tool for tobacco consumption in a Chinese city. Although sewage epidemiology has some major strengths, it is not without limitations. This approach gives no information on demographic characteristics of smokers. As such, it is unlikely that such an approach could entirely replace traditional epidemiological sampling.

Further work is required to develop new techniques to reduce the uncertainty of consumption. Sewage epidemiology is a potential approach to estimate consumption of NIC, and could complement conventional socioepidemiological surveys, particularly when it comes to the real-time monitoring of the impact of new tobacco control policies.


The authors are thankful for the discussion with Dr Lian-Zheng Yu from the Department of Noncommunicable Chronic Disease Prevention, Liaoning Center for Disease Control and Prevention, and Dr Lin-Ke, Ge from National Marine Environmental Monitoring Center.


View Abstract

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • Contributors D-GW designed whole study and wrote the paper. Q-QD and JD pretreated the samples. SY collected the samples. Y-JZ designed the model. G-SN analysed the samples. SGF provided input on nicotine metabolism and revised the paper. ZW and TZ revised the draft paper.

  • Funding This study was supported by the Fundamental Research Funds for the Central Universities, Open Fund by Jiangsu Key Laboratory of Atmospheric Environment Monitoring and Pollution Control (KHK1309), A Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), Open Project of State Key Laboratory of Urban Water Resource and Environment, Harbin Institute of Technology (QAK201503), and The National Science Foundation of China (grant numbers 41476084 and 21577029).

  • Competing interests None declared.

  • Patient consent Obtained.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data are available.

Request Permissions

If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.