Introduction

The main goal of mammography screening is to reduce mortality and morbidity from breast cancer through early detection. The diagnostic accuracy of full-field digital mammography (FFDM) is clearly at least as good as that of screen-film mammography (SFM) [1]. However, there are some controversies on the specific effect on different performance indicators, including the detection of ductal carcinoma in situ (DCIS). Recent studies have shown that the detection rate is higher with FFDM than with SFM, partly owing to greater detection of DCIS [2, 3]. Some studies providing data on the characteristics of tumours detected by FFDM have described a trend to identify less advanced invasive cancers [2, 4]. These results point to two possible situations: an improvement in early diagnosis or, on the contrary, a worrying increase in overdiagnosis. The effect of the switch to FFDM on recall rate and positive predictive value (PPV) is unclear, with some studies showing a higher recall rate and lower PPV [2, 5], while others report a lower recall rate [6, 7] and similar or higher PPV [8, 9]. The effect on interval cancer has been studied less, but several reports have found no effect of FFDM on interval cancer rate [6, 10, 11].

However, to our knowledge, no studies have presented data on all these quality indicators together (cancer rates, including interval cancer, sensitivity, specificity, PPV and tumour characteristics). Moreover, one of the main limitations of studies comparing the two technologies is the short time period since the introduction of FFDM. Most of the studies provide data only from the first one or two screening rounds performed with FFDM [2, 5, 7–9, 12], which may have been influenced by the transition from one technology to the other, as well as by the learning curve. Therefore, the evolution of quality indicators in subsequent digital screening rounds has scarcely been evaluated. Evaluation of the long-term effect of FFDM on the screening performance indicators within a cohort perspective is required for a complete evaluation of screening.

The aim of this study was to analyse trends in the cancer detection rate (invasive cancers and DCIS), tumoral characteristics, interval cancer rate and the sensitivity, specificity and PPV of FFDM over a 16-year period with 6 years of complete digitalization.

Patients and methods

Setting and study population

A retrospective cohort study was performed in women participating in a population-based breast cancer screening program in an area of 300,000 inhabitants in the city of Barcelona, Spain. Women aged 50–69 years were invited by personal letter to undergo mammography with a 2-year interval between screening rounds. We included data from 61,859 participating women from 1 November 1995 to 31 December 2010, and followed up until 31 December 2012. These women were screened in two radiology units and underwent a total of 182,002 screens.

The two radiology units began screening activities with SFM (SSH 140 A; Toshiba and Bennett, Trex Medical, Copiague, NY. Film: Mamoray-HT, AGFA, Greenville, SC) in 1995 and 1999, respectively, and shifted to FFDM (DM 1000 Agfa; Lorad, Danbury, Conn) in 2007 and 2004, respectively. Mediolateral oblique and craniocaudal views were available for each breast. All mammograms were read by two radiologists using the BI-RADS classification [13], and when double reading led to different assessments, a third radiologist served as a tiebreaker. Prior screening mammograms were always available in the original format during reading in successive screenings.

The program was based on the European Guidelines for Quality Assurance in Mammographic Screening [14] and its results met the Europe Against Cancer standards. The study was approved by the Ethics Committee of Parc de Salut Mar. Informed consent was not required.

Screening procedures

The screening program keeps mammogram registers with data from participants and the final outcome of screening. Two results of a screening test are possible: normal findings (for which screening mammography at 2 years is recommended) and abnormal findings, which require further assessments to confirm or exclude malignancy. When, after further assessments, a tumour is found (DCIS or invasive cancer), the result is considered a true positive. Otherwise, the result is considered a false positive.

Further assessments can include noninvasive procedures (additional mammography, ultrasound, magnetic resonance imaging) and/or invasive procedures (fine-needle aspiration, core-needle biopsy, open biopsy). Once a malignant tumour is histologically confirmed, the woman is sent to the referral hospital for treatment and follow-up. These women are not invited to further screening.

After a negative result of a screening episode, with or without further assessments, and before the next screening invitation, a woman may be diagnosed with interval breast cancer [14]. Interval cancers were identified by merging data from the register of the screening program with data from the hospital-based cancer registry and telephone contact with women who underwent mammography in the last scheduled screening but who did not attend the following screening invitation. This procedure covered 98 % of women lost to follow-up from the program [15]. In our study, we extended the definition of interval cancer until the 30th month, because each screening round can last up to 6 months. All data sources kept information on the date of diagnosis, which allowed us to ensure that all interval cancers fitted the case definition.
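The 30-month case definition above can be sketched as a simple date rule. This is a minimal illustration, not the program's actual record-linkage code: the function names are hypothetical, and treating the 30th month as inclusive is an assumption.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return the date `months` calendar months after d, clamping the day
    to the last day of the target month when needed."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def is_interval_cancer(negative_screen_date, diagnosis_date, window_months=30):
    """True if a cancer diagnosed after a negative screening episode falls
    within the extended window (30 months here; inclusive bound assumed)."""
    cutoff = add_months(negative_screen_date, window_months)
    return negative_screen_date < diagnosis_date <= cutoff


# Hypothetical dates, for illustration only.
print(is_interval_cancer(date(2008, 3, 10), date(2010, 9, 10)))  # inside window
print(is_interval_cancer(date(2008, 3, 10), date(2010, 9, 11)))  # outside window
```

A rule of this kind only classifies by date; in the study, cases were first identified by merging the program register with the hospital-based cancer registry and by telephone contact.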

This study included 161,992 screening mammograms performed from the second screening round onwards (82,961 SFM and 79,031 FFDM). In all, 684 tumours were detected in screening (345 with SFM and 339 with FFDM), and 226 tumours were diagnosed as interval cancers (114 with SFM and 112 with FFDM) (Fig. 1).

Fig. 1 Flow chart of the study. Parc de Salut Mar breast cancer screening program, Barcelona, 1995–2010

Study period

For the purpose of this study, the first screening round after the implementation of the screening program in both radiology units (RU1 and RU2) (prevalent round, all performed with SFM) was excluded from the analyses. We excluded the first round of the screening program because of its particular characteristics, such as the prevalence peak of cancer detection and the higher proportion of larger tumours, which could bias our results. Thus, we excluded all screening mammograms performed from November 1995 to December 1997 for RU1 and from January 1999 to December 2000 in RU2 (n = 20,010). Overall, 9 years were covered by SFM (from January 1998 to March 2007—9 years—in RU1, and from January 2001 to September 2004—3 years—in RU2). Similarly, the digital period covered 6 years (from March 2007 to December 2010—3 years—in RU1, and from September 2004 to December 2010 in RU2—6 years). Both radiology units cover two closed neighbourhoods in Barcelona and use the same screening protocol. The mammograms were read by the same radiologists.

Study variables

Information from screening (initial or successive screening, date of screening mammogram, further assessments, the use of SFM or FFDM, and the final outcome of screening) was obtained from the screening program database.

Age at diagnosis was obtained from the date of birth and date of the screening mammogram.
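One way to derive age in completed years from the two dates is sketched below. The function name and the example dates are hypothetical, and the choice of completed years (rather than, say, fractional years) is an assumption.

```python
from datetime import date


def age_at(date_of_birth, screening_date):
    """Age in completed years on the day of the screening mammogram."""
    years = screening_date.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred that year.
    if (screening_date.month, screening_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


# Hypothetical woman born on 15 June 1950.
print(age_at(date(1950, 6, 15), date(2005, 6, 14)))  # day before her birthday
print(age_at(date(1950, 6, 15), date(2005, 6, 15)))  # on her birthday
```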

Tumour-related information (invasiveness, pathological tumour–node–metastasis [TNM] status and histological grade) was drawn from the hospital-based cancer registry.

Statistical analyses

Cancer detection rates (overall, invasive and DCIS) were computed as the number of cancers detected per 100 screening tests performed with SFM and FFDM. To calculate the interval cancer rate, women with cancer diagnosed at screening were excluded from the denominator, because they were not “at risk” of an interval cancer. Recall rate was defined as the percentage of screened women requiring at least one further assessment after a positive mammogram. False-positive rates were defined as the number of further assessments (including noninvasive and/or invasive procedures) with no cancer diagnosis divided by the number of screening tests, performed with SFM and FFDM.
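As a rough check on these definitions, the sketch below recomputes the screen-detection and interval cancer rates from the counts given in this article (82,961 SFM and 79,031 FFDM screens). The function names are illustrative, and each screening test is treated as one unit in the denominator, which is an assumption about how the rates were tabulated.

```python
def detection_rate(screen_detected, screens):
    """Screen-detected cancers per 100 screening tests."""
    return 100.0 * screen_detected / screens


def interval_cancer_rate(interval_cancers, screens, screen_detected):
    """Interval cancers per 100 screens, with screen-detected cases removed
    from the denominator because they are no longer at risk."""
    return 100.0 * interval_cancers / (screens - screen_detected)


# Counts reported in the text for each technology (second round onwards).
COUNTS = {
    "SFM": {"screens": 82_961, "screen_detected": 345, "interval": 114},
    "FFDM": {"screens": 79_031, "screen_detected": 339, "interval": 112},
}

for technique, c in COUNTS.items():
    dr = detection_rate(c["screen_detected"], c["screens"])
    icr = interval_cancer_rate(c["interval"], c["screens"], c["screen_detected"])
    print(f"{technique}: detection {dr:.2f} %, interval cancer {icr:.2f} %")
```

Rounded to two decimals, these counts give 0.42 % vs. 0.43 % for detection and 0.14 % vs. 0.14 % for interval cancer. The false-positive rate is not recomputed here because the underlying counts are not given in the text.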

To calculate the accuracy measures, we used the definitions from the European Guidelines for Quality Assurance in Mammographic Screening [14]. Sensitivity was computed as the number of cancers identified at screening (screen-detected cancers) divided by the sum of screen-detected cancers and interval cancers. Specificity was the number of truly negative screening examinations divided by the sum of true negatives and false positives. The PPV of the screening test was computed as the number of screen-detected cancers divided by the sum of false positives and screen-detected cancers. Proportions were compared by the two-sided chi-square test.
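The accuracy measures and the comparison of proportions can be sketched as follows. This is an illustration under stated assumptions rather than the study's SPSS procedure: the function names are hypothetical, and the test shown is Pearson's chi-square for a 2×2 table without continuity correction. False-positive and true-negative counts are not given in the text, so specificity and PPV appear as functions only.

```python
import math


def sensitivity(screen_detected, interval_cancers):
    return screen_detected / (screen_detected + interval_cancers)


def specificity(true_negatives, false_positives):
    return true_negatives / (true_negatives + false_positives)


def ppv(screen_detected, false_positives):
    return screen_detected / (screen_detected + false_positives)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]. Returns (statistic, two-sided p value) with 1 df."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Sensitivity from the counts reported in the text.
sens_sfm = sensitivity(345, 114)
sens_ffdm = sensitivity(339, 112)
print(f"sensitivity: SFM {sens_sfm:.3f}, FFDM {sens_ffdm:.3f}")

# Overall detection: cancers vs. no cancer, SFM row then FFDM row.
stat, p = chi_square_2x2(345, 82_961 - 345, 339, 79_031 - 339)
print(f"chi-square = {stat:.3f}, p = {p:.3f}")
```

Applied to the overall detection counts, this test gives a p value close to the 0.685 reported in the Results, which suggests, but does not confirm, that an uncorrected chi-square was used.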

We computed a time/technique variable, which allowed us to exclude possible confounding due to time trends and to analyse information from the two radiology units together, because they covered different periods using SFM and FFDM [4, 7]. We divided the screening history in each radiology unit into four equal time intervals (quartiles) both for the SFM period and FFDM period. Therefore, each time interval for SFM and FFDM had a similar number of screening tests.
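A minimal sketch of how such a time/technique variable could be built is shown below. It assumes that the quartiles are defined on the ordered screening dates within each radiology unit and technique, so that each interval holds a similar number of screens. The function name, the label format and the example records are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def assign_periods(screens, n_periods=4):
    """screens: list of (unit, technique, screening_date) tuples.
    Returns one label per screen, in input order, such as 'SFM-1' .. 'SFM-4'."""
    groups = defaultdict(list)
    for idx, (unit, technique, screening_date) in enumerate(screens):
        groups[(unit, technique)].append((screening_date, idx))

    labels = [None] * len(screens)
    for (unit, technique), members in groups.items():
        members.sort()  # chronological order; same-day ties fall back to input order
        n = len(members)
        for rank, (_, idx) in enumerate(members):
            period = rank * n_periods // n + 1  # 1 .. n_periods
            labels[idx] = f"{technique}-{period}"
    return labels


# Hypothetical records: 8 SFM screens in RU1 and 4 FFDM screens in RU2.
demo = [("RU1", "SFM", date(1998 + i, 1, 15)) for i in range(8)]
demo += [("RU2", "FFDM", date(2005 + i, 6, 1)) for i in range(4)]
labels = assign_periods(demo)
print(labels)
```

Because the periods are numbered within each unit, the first FFDM period corresponds to different calendar years in RU1 and RU2, which is what allows the two units to be pooled.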

Logistic regression analyses were performed to assess the odds of cancer detection and interval cancer in the SFM and FFDM periods, adjusting for radiology unit, age and whether the diagnosis was established at the initial or successive screenings. The logistic regression models were replicated to assess the odds of invasive cancer and DCIS detection. For the former, we censored DCIS diagnoses and for the latter, we censored invasive diagnoses. Repeated measures in the logistic models were considered to be independent observations, because cancer detection always depends on the absence of a previous diagnosis. The outputs of the logistic regression models were plotted, showing the adjusted odds ratios (OR) and the 95 % confidence intervals (95 % CI).
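For readers less familiar with how logistic regression output becomes a plotted OR with its interval, the sketch below converts a coefficient and its standard error into an OR and a 95 % CI, assuming Wald-type intervals. The function names are hypothetical. The worked numbers use the interval cancer OR for the first FFDM period reported in the Results, with the standard error back-calculated from the published interval rather than taken from the model itself.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95 % interval


def odds_ratio_ci(beta, se, z=Z_95):
    """Return (OR, lower, upper) for a logistic regression coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=Z_95):
    """Back-calculate the standard error of log(OR) from a reported CI."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


# Reported: OR = 1.69 (95 % CI 1.03-2.77) for interval cancer, first FFDM period.
se = se_from_ci(1.03, 2.77)
or_, lo, hi = odds_ratio_ci(math.log(1.69), se)
print(f"SE(log OR) = {se:.3f}; OR = {or_:.2f} (95 % CI {lo:.2f}-{hi:.2f})")
```

The interval is symmetric on the log scale, which is why the plotted bars look asymmetric around the OR on a linear axis.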

We conducted a sensitivity analysis by including versus excluding the 3-month learning period after the switch to digital technology. We also compared the results by including the program’s first screening round.

All p values were based on two-sided tests and were considered statistically significant if less than 0.05. Statistical analyses were performed using SPSS (version 12.0).

Results

The overall cancer detection rate showed no statistically significant differences between the SFM and FFDM periods (0.42 % vs. 0.43 %; p = 0.685, Table 1). However, the detection rate of DCIS rose during the FFDM period (0.05 % vs. 0.09 %; p = 0.010), with the increase by screening type seen only in the initial screenings (0.06 % vs. 0.12 %; p = 0.031). The rate of interval cancers showed no statistically significant differences between the two study periods (0.14 % vs. 0.14 %, respectively; p = 0.816), whereas the rate of false positives decreased markedly during the FFDM period, from 4.79 % to 3.38 % (p < 0.001).

Table 1 Screening performance indicators for screen-film mammography and digital mammography

Table 2 shows tumour-related characteristics according to whether cancers were detected by SFM or FFDM. Tumours detected by FFDM tended to be diagnosed at earlier stages, the proportion of DCIS being 20.30 % in FFDM and 13.14 % in SFM (p = 0.092). When only invasive cancers were considered, the proportion of cancers smaller than 20 mm was significantly higher during the FFDM period than during the SFM period (78.90 % vs. 69.37 %, respectively; p = 0.040). In DCIS, the percentage of high-grade tumours was higher in those detected with FFDM than with SFM, although this difference was not statistically significant (60.34 % vs. 51.22 %, respectively; p = 0.581).

Table 2 Tumour-related characteristics of cancers detected by screen-film mammography and full-field digital mammography

Figure 2 shows the distribution of tumour size of invasive screen-detected cancers by study period. During the digital periods, the percentage of invasive cancers smaller than 20 mm tended to increase (73.1 % in the 4th SFM period and 80.3 % in the 4th FFDM period; p = 0.095), whereas the proportion of T2 tumours (from 20 to 50 mm) became smaller in comparison with the SFM periods (16.4 % in the 4th SFM period and 11.5 % in the 4th FFDM period; p = 0.095).

Fig. 2 Distribution of tumour size of invasive screen-detected cancers by study periods

Table 3 shows the rates of screen-detected cancers, interval cancers, false positives, recall rate and accuracy measures during the study period. The rate of screen-detected cancers remained fairly stable across the study periods, with no particular trend. The interval cancer rate was 0.13 % in the first SFM period and increased to 0.21 % in the first FFDM period. Thereafter, it gradually declined, reaching its lowest value in the last FFDM period (0.11 %). The false-positive rate and the recall rate showed a decreasing trend over time, especially from the beginning of the digital period. Sensitivity, specificity and PPV increased during the digital period.

Table 3 Rates of screen-detected cancers, interval cancers, false positives and accuracy measures by study period

Rates of screen-detected cancers, invasive cancer and DCIS over time are shown in Fig. 3. Invasive cancer rates slightly decreased in the last FFDM periods, while those of DCIS increased over the same periods.

Fig. 3 Rates of overall screen-detected cancer, invasive cancer and ductal carcinoma in situ (DCIS) over the study period

Figure 4 plots the adjusted OR for cancer detection at screening (overall, invasive and DCIS) and for interval cancers. The introduction of FFDM did not increase cancer detection at screening, although the risk increased in the 2nd and 4th FFDM periods, without reaching statistical significance [2nd FFDM period, OR = 1.19 (95 % CI 0.88–1.62); 4th FFDM period, OR = 1.14 (95 % CI 0.84–1.55)]. The highest detection of interval cancer was found in the first FFDM period [OR = 1.69 (95 % CI 1.03–2.77)] but the trend then decreased markedly over the subsequent FFDM periods. In the FFDM periods, the risk of DCIS detection increased significantly from an OR = 1.58 (95 % CI 0.65–3.80) in the first FFDM period to an OR = 2.68 (95 % CI 1.19–6.00) in the fourth FFDM period. Nevertheless, detection of invasive cancers did not show a clearly declining pattern in the adjusted model.

Fig. 4 Logistic regression models for the detection of screen-detected cancer, interval cancer, ductal carcinoma in situ (DCIS) and invasive cancer. The models are adjusted by radiology unit, age (continuous) and whether the diagnosis was established at the initial or successive screenings. a Screen-detected cancers, b interval cancers, c ductal carcinoma in situ (DCIS), d invasive cancers. The reference category is the first screen-film mammography period. The black points and the vertical lines represent the odds ratio (OR) and the corresponding 95 % confidence intervals, respectively

The results of the sensitivity analysis revealed that inclusion versus exclusion of the 3-month learning period after the switch to digital technology did not significantly affect any of the screening indicators presented. Equally, the inclusion of the first screening round did not modify the direction of our findings.

Discussion

No differences were observed in overall cancer detection rates between the SFM and FFDM periods. However, DCIS rates increased with digital mammography, and remained higher throughout the digital periods, while the invasive carcinoma detection rate was somewhat lower in the FFDM periods. Information on tumour characteristics pointed to a stage shift, especially when we analysed the size of invasive tumours. The proportion of smaller invasive tumours tended to increase during the FFDM periods. The proportion of high-grade DCIS was also higher in the FFDM period, although the difference was not statistically significant. False-positive rates remained lower in all the digital periods with no increase in the interval cancer rate. No differences were observed in sensitivity between the two periods, but specificity and especially PPV increased throughout the FFDM periods. The adjusted screen-detected cancer (overall and invasive) and interval cancer rates showed no statistically significant differences throughout the study period, but detection of DCIS increased in the FFDM period.

Most studies comparing the two technologies have detected an increase in DCIS detection rates with FFDM [2, 4, 6, 7, 9, 12]. In agreement with our results, the increase has been observed especially in initial screenings [4, 7] but also in successive ones [4, 7, 9]. The significance of this increase in DCIS has been little studied and it is not known whether it is due to earlier diagnosis [4, 6] or to overdiagnosis [2].

An earlier diagnosis should be followed by a decrease of invasive cancers or at least by detection of less advanced invasive cancers. Our results pointed to an earlier diagnosis, showing not only significant differences in TNM stages, with a higher proportion of DCIS in the FFDM period, but also a higher proportion of smaller, invasive tumours along the FFDM periods, which has also been reported by other authors [6]. In line with our findings, recent work by Drukker et al. reported a higher detection of biologically high-risk cancers with FFDM [16]. Moreover, as previously reported [17], among DCIS we found a higher proportion of high-grade tumours in the FFDM period, although this difference was not statistically significant. Because low-grade DCIS may have a lower potential to become invasive [18, 19], the similar proportions observed in the two periods do not support the hypothesis of overdiagnosis. However, larger series are necessary to confirm this trend.

After the shift to FFDM, numerous studies in Spain [7, 20] and elsewhere [21, 22] observed a reduction in the recall rate, which led to suspicions of an increase in interval cancers and false negatives. The current work, in agreement with previous studies [6, 11], confirms that the reduction in recall rate was not due to an increase in interval cancer during the FFDM period. Although we have no information on interval cancer subtypes, previously published data refute the hypothesis that the introduction of FFDM has increased the number of false-negative cancers [10, 11]. However, an increasing trend in the detection of interval cancers was observed in the SFM periods, with the highest interval cancer rate in the first FFDM period and a clear subsequent decrease. These increases could be partly attributed to an improvement in the mechanisms to detect interval cancers introduced in the screening programs during their implementation (active follow-up of women, merging data from the screening program registers with the regional Minimum Basic Data Set—based on hospital discharges with information on the principal diagnosis—and hospital-based cancer registries). However, after the shift to FFDM, no further changes were made to those mechanisms.

Some studies have described a reduction in PPV after the introduction of FFDM [2, 5], whereas others [4, 6–8, 12], including ours, have shown an increase, especially in successive screenings. The implementation of digital technology in our population-based screening program reduced the number of adverse effects related to false-positive results and costs [23], with similar cancer detection and interval cancer rates.

Because cancer detection rates are affected by other factors, a logistic model was used, adjusting by radiology unit, initial and successive screening, and age. Overall, the adjusted OR of detection of cancer or interval cancer remained unchanged throughout the period. However, the odds of DCIS detection were higher in the FFDM periods than in the SFM periods. To date, no other studies have adjusted the risk by time trends; a recent Norwegian study found that the risk of DCIS detection was increased during the whole FFDM period, but no increased risk was found for screen-detected cancers overall, invasive cancers or interval cancers [6].

This study has some limitations. Although the study period is one of the longest ever analysed, the number of DCIS does not allow exploration of trends in tumoral grade during the digital period. We had no information on interval cancer subtypes, and therefore we could not assess the behaviour of false negatives before and after the shift to digital technology.

The main strength of the current work is the long study period, with more than 6 years of complete digitalization. This allowed us to study trends over time, beyond the first digital period after the transition. Because of the wide variability observed in the performance indicators of screening programs, both within and between countries, cross-sectional observations cannot reflect the real impact of one intervention. Differences observed in a given screening participation (screening round) may be compensated for in subsequent screening participations. After confirming that the direction of our findings was not modified, we excluded the first screening round of both radiology units, both of which used SFM. This reduced the sample size in the SFM period but ensured that the indicators in the SFM period were not confounded by the prevalence peak after the implementation of screening, or by the larger tumour sizes found in the first screening round [24]. We included the first 3 months after the switch to digital mammography because it did not modify the outcomes.

In conclusion, this study supports the idea that the increase in DCIS detection observed with the introduction of FFDM is partly due to an improvement in early diagnosis and confirms the reduction in the false-positive rate with no increase in the interval cancer rate. In view of the current results, which are supported by those from previous studies performed in different countries, the use of digital technology should not be seen as a threat that increases the negative effects of screening through overdiagnosis. Future recommendations on screening performance and quality standards should be updated with information from screening settings using this technology.