Abstract
Objective To estimate overdiagnosis of colorectal cancer (CRC) for screening with sigmoidoscopy and faecal occult blood testing (FOBT).
Design Simulation study using data from randomised trials.
Setting Primary screening, UK, Norway
Participants 152 850 individuals from the Nottingham trial and 98 678 individuals from the Norwegian Colorectal Cancer Prevention (NORCCAP) trial.
Intervention CRC screening.
Outcome measure We estimated overdiagnosis using long-term data from two randomised trials: the Nottingham trial comparing FOBT screening every other year to no-screening, and the NORCCAP trial comparing once-only sigmoidoscopy screening to no-screening. To estimate the natural growth of adenomas to CRC, we used the following microsimulation models: (i) the Microsimulation Screening Analysis; (ii) the CRC Simulated Population model for Incidence and Natural history; (iii) the Simulation Model of Colorectal Cancer; (iv) a model derived by the German Cancer Research Center. We defined overdiagnosed cancers as the difference between the observed number of CRCs in the no-screening arm and the expected number of cancers in screening arm (sum of observed and prevented by adenoma removal). The amount of overdiagnosis is defined as the number of overdiagnosed cancers over the number of cancers observed in the no-screening arm.
Results Overdiagnosis estimates were highly dependent on model assumptions. For FOBT screening with 2354 cancers observed in the control arm, four out of five model scenarios predicted overdiagnosis, ranging from 2.0% (2400 cancers expected in the screening arm) to 7.6% (2533 cancers expected in the screening arm). For sigmoidoscopy screening with 452 cancers observed in the control arm, all models predicted overdiagnosis, ranging from 25.2% (566 cancers expected in the screening arm) to 128.1% (1031 cancers expected in the screening arm).
Conclusions The amount of overdiagnosis estimated based on the microsimulation models varied substantially. Microsimulation models may not give reliable estimates of the preventive effect of adenoma removal, and should be used with caution to inform guidelines.
- gastrointestinal tumours
- health policy
- public health
- preventive medicine
- risk management
Data availability statement
Data are available upon reasonable request. Data and computing code available on request to the corresponding author.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
Data from two large randomised trials with long follow-up and detailed information of adenomas and cancers were used.
Four different microsimulation models to estimate the natural growth of adenomas to colorectal cancer were applied.
The amount of overdiagnosis was highly dependent on the transition rates in the different models of natural history.
Microsimulation models may not give reliable estimates of the preventive effect of adenoma removal and should be used with caution.
Introduction
Overdiagnosis is diagnosis of a disease that would not have developed to cause symptoms or death in the patient’s lifetime if not detected by screening.1 2 While largely disregarded 10 years ago, overdiagnosis is now recognised as a major harm of cancer screening.1–3
A substantial amount of overdiagnosis has been observed for mammography screening for breast cancer, ultrasound screening for thyroid cancer, and prostate-specific antigen screening for prostate cancer.1
Although colorectal cancer (CRC) screening is endorsed in the USA, Europe and in many other areas of the world, there is a striking paucity of studies on overdiagnosis. CRC screening entails different tests with varying characteristics, and most likely different amounts of overdiagnosis. The most commonly used tests are faecal occult blood testing (FOBT), sigmoidoscopy and colonoscopy.3 Faecal tests are mainly early detection screening tests as they primarily detect invasive cancers. Colonoscopy and sigmoidoscopy are mainly preventive screening tests as they primarily detect and remove adenomas which are cancer precursors. Because all CRC screening tests entail colonoscopy (either for primary screening, or for individuals positive with FOBT or sigmoidoscopy), all entail a component of preventive screening.3
The few studies investigating overdiagnosis in CRC screening report CRC overdiagnosis of 6%–9% after a positive FOBT, and that 1 per 1000 screening colonoscopies detects and removes an adenoma that would never progress to CRC. However, these studies have been hampered by a failure to account for the cancer-preventive effect of screening when estimating overdiagnosis, and by a lack of proper comparison groups, which made it necessary to rely entirely on modelling.4 5
We took advantage of recent long-term follow-up data from large randomised trials of FOBT and sigmoidoscopy screening, with optimal comparative groups.6 7 The aim of the study was to estimate the risk of overdiagnosis of CRC based on the numbers of adenomas and cancers removed at FOBT and sigmoidoscopy screening.
Methods
Concept of estimating overdiagnosis
Estimating overdiagnosis for CRC screening requires knowledge about the natural history of adenomas to CRC. The natural history of adenomas cannot be observed directly. However, microsimulation models for CRC screening which are widely used for guidelines and policy making, for example, by the US Preventive Services Task Force and the American Cancer Society8 have natural history assumptions as an important part of their models.9 10
Study design
We estimated overdiagnosis of CRC by combining data from two sources:
We used data of observed adenomas and CRCs from two large randomised trials of FOBT screening6 and flexible sigmoidoscopy.7 Both trials entailed no-screening control arms.
For the natural history of adenomas to cancer, we applied estimates from the three most commonly used microsimulation models of CRC screening (a–c) and estimates derived from an observational setting (d).
All microsimulation models (a–c) simulate the lifetime risk of developing adenoma and CRC for a large population of individuals. The models were developed independently and were calibrated to the same data regarding adenoma prevalence, cancer incidence and stage distribution. The data were collected by Cancer Intervention and Surveillance Modeling Network. The model derived by the German Cancer Research Center (d) was developed using the German screening colonoscopy data and is based on the observed numbers of adenomas and cancers detected in the German population.
By combining the natural history estimates from these models with the data of actual observed adenomas and cancers in the randomised trials, we estimated the number of expected cancers if adenomas had not been removed at screening, and compared this to the observed number of CRCs in the no-screening control arms of the two randomised trials.
If the observed and expected numbers of cancer in the screening arm are similar to the control arm, there is no overdiagnosis. If the numbers are higher, this would reflect overdiagnosis, or that the models, which have been validated and are used to inform guidelines, may be incorrect.
Data sources
Screening with FOBT
The Nottingham trial is a randomised trial comparing FOBT screening with no-screening.6 13 Between 1981 and 1991, 152 850 individuals age 45–74 years living in the Nottingham area in England were randomised in a ratio of 1:1 to either screening by guaiac-based FOBT every other year, or to no screening.13 Between three and six screening rounds were applied to the screening arm, depending on the date of study entry. Attendance to at least one FOBT screening round was 59.9% (44 838 individuals), attendance to all screening rounds was 38.2% (28 720 individuals).
For the current study, we used data from the most recent follow-up of the trial published in 2012 (median follow-up time 19.5 years) reporting on removed adenomas and on CRC incidence and mortality for the screening as compared with the control arm.6
Screening with sigmoidoscopy
The Norwegian Colorectal Cancer Prevention Trial (NORCCAP) is a randomised trial comparing once-only sigmoidoscopy screening to no screening.7 14 Between 1999 and 2001, all residents of Oslo and Telemark County in Norway aged 50–64 years were randomised to sigmoidoscopy screening with (10 388 individuals) or without (10 392 individuals) an immunochemical FOBT, or to no screening (79 430 individuals).14 In all analyses the number of observed cancers in the control arm is rescaled to fit the screening arm.
For the current study, we used individual patient data provided by the NORCCAP investigators, including sex, age, CRC diagnosis, death and immigration, and for the screening participants the date of adenoma removal. Median follow-up was 14.8 years.7 We used the actual age and sex distribution among individuals with adenomas, consisting of 60% men and 40% women for adenomas removed at screening.
Model assumptions
Since we did not have access to individual data for adenoma patients in the FOBT trial, we assumed a follow-up time of the entire 19-year study period for patients who had adenomas removed in the first screening round. For adenomas removed in subsequent rounds, we assumed a uniform number of adenomas removed in each round, with correspondingly shorter follow-up time. For detailed information on our assumptions, see online supplemental appendix 1.
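The follow-up assumption for the FOBT trial can be sketched as below. This is a minimal illustration rather than the authors' code: the function name, the 2-year spacing between rounds (taken here from the biennial screening schedule), the number of rounds and the adenoma count of 600 are all assumptions made for this example; the actual assumptions are given in online supplemental appendix 1.

```python
def follow_up_by_round(total_adenoma_patients, n_rounds,
                       study_years=19, round_interval=2):
    """Spread adenoma patients evenly over screening rounds.

    Patients from round 1 get the full study period as follow-up;
    each later round is assumed (hypothetically) to start
    `round_interval` years later, so its follow-up is shorter.
    """
    per_round = total_adenoma_patients / n_rounds
    schedule = []
    for round_index in range(n_rounds):
        follow_up_years = study_years - round_index * round_interval
        schedule.append((round_index + 1, per_round, follow_up_years))
    return schedule


# Hypothetical input: 600 adenoma patients spread over 6 rounds.
for round_number, patients, years in follow_up_by_round(600, n_rounds=6):
    print(f"round {round_number}: {patients:.0f} patients, {years} years follow-up")
```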
Since we had access to individual data in the sigmoidoscopy trial, including date of adenoma removal, size and histology of the adenomas, and follow-up time, no assumptions were needed.
Modeling
We used Markov models with discrete time (1-year intervals) to calculate the number of prevented cancers. Each year any adenoma patient could stay in the same state, move from the non-advanced/advanced adenoma state to preclinical cancer, or from preclinical cancer to cancer. Additionally, at each state patients could move to overall death. The state transition probabilities for overall death were obtained from population registries (see online supplemental appendix 1).
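A discrete-time Markov cohort model of this kind can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' implementation: it uses a single adenoma state, a constant annual death probability (the study used registry-based mortality), and it counts cancers cumulatively without following patients after diagnosis. The function and parameter names are hypothetical.

```python
def simulate_cohort(n_patients, years, p_adenoma_to_preclinical,
                    p_preclinical_to_cancer, p_death=0.0):
    """Follow a cohort of adenoma patients in 1-year steps.

    States: adenoma -> preclinical cancer -> (clinical) cancer,
    with death possible from the adenoma and preclinical states.
    Returns the expected number of patients in each state.
    """
    adenoma, preclinical, cancer, dead = float(n_patients), 0.0, 0.0, 0.0
    for _ in range(years):
        # Competing mortality is applied first (simplifying assumption).
        dead += (adenoma + preclinical) * p_death
        adenoma_alive = adenoma * (1.0 - p_death)
        preclinical_alive = preclinical * (1.0 - p_death)
        # Survivors may stay in their state or move one state forward.
        to_preclinical = adenoma_alive * p_adenoma_to_preclinical
        to_cancer = preclinical_alive * p_preclinical_to_cancer
        adenoma = adenoma_alive - to_preclinical
        preclinical = preclinical_alive + to_preclinical - to_cancer
        cancer += to_cancer
    return {"adenoma": adenoma, "preclinical": preclinical,
            "cancer": cancer, "dead": dead}


# 100 adenoma patients, 10 years, no competing mortality, with the
# 3.0% and 29.3% annual rates reported for CRC-SPIN in the Results.
result = simulate_cohort(100, 10, 0.03, 0.293)
print(f"expected cancers: {result['cancer']:.1f}")
```

With these inputs the sketch yields roughly 18 cancers per 100 patients over 10 years, in line with the figure quoted for CRC-SPIN in the Results.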
In order to estimate the transition probabilities from one state to another, we used the four above-mentioned CRC screening models, comprising five scenarios.11 12 Each of the models provides assumptions for the natural history of adenoma growth, as presented in figure 1. The MISCAN, CRC-SPIN and SimCRC models simulate time (in years) from adenoma to preclinical cancer and from preclinical cancer to symptomatic cancer.11 In our analyses using these three models, we assumed that all adenomas removed at screening were non-advanced (the most conservative approach). We did not adjust our analyses for multiple adenomas because we focused on the time to the first cancer (ie, the time for one of the detected adenomas to develop into cancer). We used the median dwell time to estimate the annual transition between stages in the models. In the MISCAN model, non-progression of some adenomas is part of the natural history. We used the probabilities of progressive adenomas reported by the authors, that is, 14% for individuals aged less than 65 years and 14% to 96% (linearly increasing) for individuals aged 65–100 years (25.7% progression at age 70, 49.1% progression at age 80, etc).15
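Two of the steps above lend themselves to a short sketch. The conversion from a median dwell time to an annual transition probability is not spelled out in the text; the version below assumes a constant annual probability, so that half of the patients have moved on after the median time. The linear interpolation for the MISCAN progressive fraction follows the description above. Function names are hypothetical.

```python
def annual_probability_from_median(median_years):
    """Annual transition probability implied by a median dwell time.

    Assumption (not stated in the source): the annual probability p is
    constant, so (1 - p) ** median_years = 0.5.
    """
    return 1.0 - 0.5 ** (1.0 / median_years)


def miscan_progressive_fraction(age):
    """Share of progressive adenomas by age, as described for MISCAN:
    14% below age 65, rising linearly from 14% to 96% between 65 and 100."""
    if age < 65:
        return 0.14
    capped_age = min(age, 100)
    return 0.14 + (0.96 - 0.14) * (capped_age - 65) / (100 - 65)


print(f"{miscan_progressive_fraction(70):.1%}")
print(f"{miscan_progressive_fraction(80):.1%}")
# Hypothetical median dwell time of 2 years.
print(f"{annual_probability_from_median(2):.1%}")
```

The interpolation reproduces the 25.7% and 49.1% quoted for ages 70 and 80. Under the constant-probability assumption, a median of 2 years corresponds to an annual probability of about 29.3%.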
The German model12 estimates age-dependent and sex-dependent annual transition rates from non-advanced adenoma to advanced adenoma, from advanced adenoma to preclinical cancer, and from preclinical cancer to clinical cancer. In our estimates we used the highest (German, high) and lowest (German, low) point-estimates for annual transition rates. Since no information on adenoma histology was available for the FOBT trial, we defined all adenomas ≥10 mm as advanced, and all adenomas <10 mm as non-advanced.
To obtain additional details on the transition rates applied for adenomas and cancer, we contacted the MISCAN, CRC-SPIN and SimCRC investigators. We received responses from the MISCAN and CRC-SPIN modellers and, following their advice, performed the following sensitivity analyses. For the MISCAN model, the authors assume that the time from adenoma to cancer follows an exponential distribution with a mean of 140 years (meaning that most of the adenoma patients will not develop cancer within their lifespan). Thus, we drew 1000 samples from an exponential distribution with a mean of 140 years and calculated the median over the samples to estimate the annual transition between adenoma and cancer. For the CRC-SPIN model, we used annual transition rates for the following steps of the natural history: from adenoma <5 mm to adenoma 5–9 mm to adenoma ≥10 mm to preclinical cancer to cancer.16
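The MISCAN sensitivity analysis can be sketched as follows. The sampling step follows the description above; the final conversion from the sampled median to an annual transition probability is an assumption made here (constant annual probability), as are the function name and the random seed.

```python
import random
import statistics


def sampled_median_dwell_time(mean_years=140.0, n_samples=1000, seed=0):
    """Median of n_samples draws from an exponential distribution.

    random.expovariate takes the rate (1 / mean) as its argument.
    """
    rng = random.Random(seed)
    samples = [rng.expovariate(1.0 / mean_years) for _ in range(n_samples)]
    return statistics.median(samples)


median_years = sampled_median_dwell_time()
# Assumed conversion: constant annual probability p with
# (1 - p) ** median_years = 0.5.
annual_probability = 1.0 - 0.5 ** (1.0 / median_years)
print(f"sampled median: {median_years:.1f} years")
print(f"implied annual transition probability: {annual_probability:.2%}")
```

The sampled median should fall close to the theoretical median of the distribution, 140 × ln 2 ≈ 97 years, which implies an annual transition probability well below 1%.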
Data analysis
Overdiagnosis calculations
We estimated overdiagnosis as the difference between the expected number of cancers in the screening arm if no adenomas had been removed at screening and the number of cancers observed in the control arm. In our primary analysis, we used the following formula to calculate the number of overdiagnosed cancers:
$$N_{\text{overdiagnosed}} = E_{\text{screening}} - O_{\text{control}} \qquad \text{(Primary analysis)}$$
where $E_{\text{screening}}$ is the number of expected cancers among screening participants in the screening arm if no adenomas had been removed, and $O_{\text{control}}$ is the number of observed cancers in the control arm. The number of expected cancers, $E_{\text{screening}}$, is the sum of the number of cancers observed (diagnosed) in the screening arm, $O_{\text{screening}}$, and the number of cancers that would develop from adenomas if they were not removed at screening (figure 2 and box 1).
Glossary of terms for overdiagnosis in preventive screening
Overdiagnosis: diagnosis of a disease that would not have developed to cause symptoms or death in the patient’s lifetime if not detected by screening.
Number of overdiagnosed cancers: number of cancers expected in the screening arm minus the number of cancers observed (diagnosed) in the control arm.
Number of cancers expected in the screening arm: the sum of the numbers of cancers observed (diagnosed) in the screening arm, and the numbers of cancers that would develop from precursor lesions if they were not removed at screening (estimated through microsimulation models).
Amount of overdiagnosis: number of overdiagnosed cancers divided by the number of observed (diagnosed) cancers in the control arm.
In a secondary analysis, we included all adenomas, both in the screening and control arm of the FOBT trial. This was not done for the sigmoidoscopy trial since no information existed on adenoma removal in the non-attenders and in the control arm. The following formula was used to estimate overdiagnosis:
(Secondary analysis)
where is the number of cancers expected if no adenomas had been removed among non-attenders and are the number of cancers expected if no adenomas had been removed in the control arm.
We estimated the amount of overdiagnosis as the number of overdiagnosed cancers divided by the number of observed cancers in the control arm. For transparency, we also calculated overdiagnosis using the observed and the expected number of CRCs in the screening arm as alternative denominators. In a non-randomised setting without a valid control group, the number of cancers observed in the presence of screening may be the only number available, and it is sometimes used as the denominator. However, for preventive screening methods (which reduce cancer incidence), such as CRC screening, this may result in falsely high overdiagnosis estimates.17
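The primary calculation and the effect of the denominator can be illustrated with a short sketch. The function and key names are hypothetical. The example uses the sigmoidoscopy scenario with the highest estimate reported in the Abstract and Results (452 cancers observed in the control arm, 638 prevented cancers, 1031 expected cancers); the 393 cancers observed in the screening arm are inferred here as 1031 − 638 and are not quoted directly in the text.

```python
def overdiagnosis(observed_screening, prevented, observed_control):
    """Primary-analysis overdiagnosis with three alternative denominators."""
    # Expected cancers in the screening arm had no adenomas been removed.
    expected_screening = observed_screening + prevented
    overdiagnosed = expected_screening - observed_control
    return {
        "expected_screening": expected_screening,
        "overdiagnosed": overdiagnosed,
        # Main definition: relative to cancers observed in the control arm.
        "vs_control": overdiagnosed / observed_control,
        # Alternative denominators from the screening arm.
        "vs_observed_screening": overdiagnosed / observed_screening,
        "vs_expected_screening": overdiagnosed / expected_screening,
    }


result = overdiagnosis(observed_screening=393, prevented=638,
                       observed_control=452)
for key in ("vs_control", "vs_observed_screening", "vs_expected_screening"):
    print(f"{key}: {result[key]:.1%}")
```

With these inputs the sketch gives 579 overdiagnosed cancers and 128.1%, 147.3% and 56.2% for the three denominators, matching the upper-bound sigmoidoscopy figures reported in the Results.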
Results
Trial data
Table 1 shows the baseline characteristics of the randomised trials on FOBT and sigmoidoscopy screening, including detected cancers and adenomas, and cancer incidence in the follow-up period.
Model data
The estimated annual transition rates from adenoma to cancer used in the five scenarios (four models) are displayed in table 2. The highest annual transition rates from adenoma to cancer were found in the CRC-SPIN and the SimCRC models, and the lowest in the German model (German, low) (table 2). The annual transition rate from adenoma to preclinical cancer ranged from 3.0% in the CRC-SPIN model to 10.9% in the MISCAN model; because not all adenomas progress in the MISCAN model, its estimate applies to the progressive adenomas only. The transition rate from preclinical to clinical cancer ranged from 17.3% in the German model (German, low) to 29.3% in the MISCAN and CRC-SPIN models. Accordingly, if one follows 100 adenoma patients for 10 years and assumes the median reported dwell times, between 1 (German, low) and 18 (CRC-SPIN and SimCRC) cancers would develop according to the different models. Annual transition rates for the sensitivity analyses can be found in online supplemental table 1.
Overdiagnosis
In the FOBT screening arm, the number of prevented cancers in the different models ranged from 72 to 254. Four out of the five model scenarios predicted overdiagnosis, ranging from 2.0% (2400 cancers expected in the screening arm vs 2354 cancers observed in the control arm) to 7.6% (2533 cancers expected in the screening arm). The MISCAN model did not predict any overdiagnosis (table 3). In the sigmoidoscopy screening arm, the number of prevented cancers in the different models ranged from 173 to 638. All models predicted overdiagnosis, ranging from 25.2% (566 cancers expected in the screening arm vs 452 observed in the control arm) to 128.1% (1031 cancers expected in the screening arm) (table 3). Similar results were obtained in the sensitivity analyses (see online supplemental tables 2 and 3). A comparison of cancers observed and cancers expected in the screening and control arms can be found in figure 3.
Overdiagnosis was highest when cancers observed in the screening arm were used as the denominator (up to 7.9% for FOBT and 147.3% for sigmoidoscopy) and lowest when cancers expected in the screening arm were used as the denominator (up to 7.1% for FOBT and 56.2% for sigmoidoscopy); see online supplemental table 4.
In the secondary analysis for FOBT screening (when we included estimates of cancers prevented through adenoma removal outside of screening in the screening arm and in the control arm), <1% overdiagnosis was estimated with the MISCAN model, whereas for the remaining four model scenarios, the estimated overdiagnosis ranged from 3.1% to 10.2%.
Discussion
This study is the first to estimate overdiagnosis in colorectal cancer screening using randomised trials with valid comparison groups and sufficiently long follow-up. We found that overdiagnosis with FOBT screening is between 0% and 7.6%, depending on adenoma transition rates. The highest amount of overdiagnosis was observed for the German model (German, high), with 179 overdiagnosed cancers. For comparison, only 236 cancers were detected at screening.
For sigmoidoscopy screening, our analyses yielded overdiagnosis between 25.2% and 128.1%. Two of the five models predicted overdiagnosis over 100%, which means that the majority of cancers detected in the screening arm are overdiagnosed. The apparent difference in overdiagnosis between FOBT and sigmoidoscopy may be explained by different characteristics of the adenomas detected at screening (less advanced in sigmoidoscopy and more advanced in FOBT). However, even if we use the lowest reported transition rates for adenomas removed in FOBT screening and the highest reported for adenomas removed in sigmoidoscopy screening, the results are hard to believe clinically. Although similar overdiagnosis (50%–70%) was estimated with the MISCAN model for cervical cancer screening (which, like sigmoidoscopy, is mostly preventive),18 there is no clinical justification for the observed results.
Overdiagnosis was highly dependent on the model used, and thus on the transition rates from adenoma to cancer. The choice of denominator, or whether we included only adenomas detected at screening (primary analysis) or all adenomas (secondary analysis), did not change the results significantly. The CRC-SPIN and SimCRC models had the highest proportion of adenomas progressing to cancer within 10 years from adenoma detection (18%); for the German model with the lowest annual transition rates (ie, German, low), it was 1.3%.
A reasonable explanation for the calculated overdiagnosis is that adenoma growth rates in at least some of the models are wrong. Our study may indicate that microsimulation models do not provide reliable estimates for the natural history of adenomas. As a consequence, because adenoma development is a central assumption in CRC screening microsimulation models, we are concerned that the model estimates for the effect of preventive CRC screening tests, such as sigmoidoscopy or colonoscopy screening, may be wrong. Since the estimates for flexible endoscopy screening were much more unlikely than for FOBT screening it may be that the models have specific problems with screening tests where multiple non-advanced adenomas are detected. It could be interpreted that either the proportion of non-advanced adenomas destined to progress to cancer or the time to progress to cancer is overestimated in these models.
Strengths of our study include the application of data from large randomised trials with long follow-up and detailed information on adenomas and cancers. Although some observational studies with long follow-up were also available,19 20 we decided not to use them due to the lack of appropriate control groups. Weaknesses include the inability to directly observe transition times of adenomas and cancers, which this study shares with all others in this area. Modelling studies of the natural history of CRC show a variation in mean transition time from adenoma to preclinical cancer between 7.6 and 24.2 years.11 The annual transition rate from advanced adenoma to CRC is estimated to be 2.5%–5.6%, depending on age and sex, and from adenoma to advanced adenoma 3.6%–4.2%.12 Thus, differences in transition rates do have an impact on the results obtained. Our assumption on adenoma growth relied on the median transition rate from the microsimulation models11 and the lowest and highest point estimates for annual transition rates from the German model.12 The advantage of this assumption is that it is transparent and easy to understand. In addition to assumptions on adenoma growth rate, the microsimulation models include several other assumptions, such as participation rates, performance and quality of the screening intervention. We did not make assumptions on any of these variables, and some may consider this a limitation of our study. However, we were able to use the observed data from the published trials, where these variables are observed in a real-life scenario and do not have to be assumed. Since there are no randomised trials of the faecal immunochemical test (FIT), which is used as the benchmark test for CRC in the UK,21 we used data from the randomised trial of FOBT screening. We believe that our results may be generalisable to FIT screening, depending on the threshold for test positivity, which can be chosen with FIT but not with FOBT screening.
Overdiagnosis in preventive screening is difficult to ascertain, because the early detection effect which may harbour the risk of overdiagnosis is counteracted by the preventive effect, which reduces cancer incidence. Disentangling the two effects is necessary to ascertain the net overdiagnosis. As all approaches used for CRC screening include colonoscopy which has a cancer prevention effect due to adenoma removal, it is invalid to assume that there is no overdiagnosis in CRC screening simply because the cumulative incidence of CRC is similar or lower in the screening arm compared with the control arm at the end of follow-up. In the present study, we achieved this by using information about removed adenomas in the randomised trials and by applying modelled transition rates from adenoma to cancer from the most commonly used microsimulation models. This approach should provide a good and valid approximation of the true amount of overdiagnosis in CRC screening, but as our results show, it did not result in plausible estimates for sigmoidoscopy and FOBT screening.
Microsimulation modelling has been used extensively to guide policy making about CRC screening. The models we used for this study are regarded as valid and trustworthy. Important guidelines, which are followed by physicians and their patients around the world, rely on these models for their recommendations for or against screening. The US Preventive Services Task Force relies heavily on microsimulation modelling,8 and many other organisations increasingly do so.9 10 However, based on our results, the four presented microsimulation models may not give reliable estimates of the preventive effect of screening, and their results should be used with caution.
Ethics statements
Ethics approval
Ethics approval was not required for this study. The authors affirm that the manuscript is an honest, accurate and transparent account of the study. The Nottingham trial was approved by the Nottingham Family Practitioner Committee and BMA Medical Ethics committee. The NORCCAP trial was approved by the Ethics Committee of South East Norway and the Norwegian Data Protection Authority.
References
Footnotes
Contributors PW: study design, data analyses, first draft of the paper. MFK: study design, data interpretation, drafting the paper. ML: study design, data interpretation, drafting the paper. MBugajski: study design, data interpretation, drafting the paper. MBretthauer: study design, data interpretation, drafting the paper. MK: study design, data analyses, data interpretation, drafting the paper.
Funding This work was supported by Fulbright scholarship (Wieszczy P) and research grants from Norwegian Research Council and Norwegian Cancer Society (Bretthauer M, Kalager M, Loberg M). Award/Grant number is not applicable to any of the fundings.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.