Objectives Many clinical practice guidelines and consensus statements (CPGs/consensus statements) have been developed for the surgical treatments for breast cancer. This study aims to evaluate the quality of these CPGs/consensus statements.
Methods We systematically searched the PubMed and EMBASE databases, as well as four guideline repositories, to identify CPGs and consensus statements regarding surgical treatments for breast cancer between January 2009 and December 2016. We used the Appraisal of Guidelines for Research and Evaluation (AGREE) instrument to assess the quality of the CPGs and consensus statements included. The overall assessment scores from the AGREE instrument and radar maps were used to evaluate the overall quality. We also evaluated some factors that may affect the quality of CPGs and consensus statements using the Mann-Whitney U test or Kruskal-Wallis H test. All analyses were performed using SPSS V.19.0. This systematic review was conducted according to Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines.
Results A total of 19 CPGs and four consensus statements were included. In general, the included CPGs/consensus statements (n=23) performed well in the ‘Scope and Purpose’ and ‘Clarity and Presentation’ domains, but performed poorly in the ‘Applicability’ domain. The American Society of Clinical Oncology (ASCO), National Institute for Health and Care Excellence (NICE), Scottish Intercollegiate Guidelines Network (SIGN), New Zealand Guidelines Group (NZGG) and Belgium Health Care Knowledge Centre (KCE) guidelines had the highest overall quality, whereas the Saskatchewan Cancer Agency, Spanish Society of Medical Oncology (SEOM), Japanese Breast Cancer Society (JBCS) guidelines and the D.A.C.H and European School of Oncology (ESO) consensus statements had the lowest overall quality. The updating frequency of CPGs/consensus statements varied, with the quality of consensus statements generally lower than that of CPGs. A total of six, eight and five CPGs were developed in the North American, European and Asian/Pacific regions, respectively. However, geographic region was not associated with overall quality.
Conclusions The ASCO, NICE, SIGN, NZGG and KCE guidelines had the best overall quality, and the quality of consensus statements was generally lower than that of CPGs. More efforts are needed to identify barriers and facilitators for CPGs/consensus statement implementation and to improve their applicability.
- breast cancer
- surgical management
- AGREE instrument
- quality of guidance document
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
This was a systematic review conducted following Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines, including descriptions of key methodological steps, results and discussion.
This was the first study, to our knowledge, to systematically assess the methodological quality of CPGs and consensus statements regarding surgical treatments for breast cancer using the Appraisal of Guidelines for Research and Evaluation II instrument.
We only searched two databases and four guideline repositories and only included literature published in English. Only CPGs and consensus statements published after January 2009 were included.
Surgical treatment is the major approach for patients with non-metastatic breast cancer.1 The quality of surgical treatment of breast cancer depends on a variety of factors, including the surgeons’ perspective as well as the patient’s socioeconomic status and resources.2 Among these, surgeons’ perspective is an important factor that is associated with the services provided and is shaped by a variety of factors, including clinical practice guidelines or consensus statements (CPGs/consensus statements). CPGs/consensus statements have been developed to optimise and standardise the surgical management of breast cancer to improve the quality of care. They should provide clear, comprehensive and evidence-based recommendations to reduce the gap between research and clinical practice.3 However, when developed by different institutions and/or countries, CPGs/consensus statements may provide equivocal or inconsistent recommendations due to different perspectives, local resources or updating frequency, among other factors. This can result in confusion among healthcare providers in clinical practice regarding which CPGs/consensus statements to follow and what to consider when applying the recommendations. Such confusion may affect healthcare providers’ implementation and adherence to the CPGs/consensus statements, which may in turn affect long-term patient outcomes.4–6 For example, the National Comprehensive Cancer Network (NCCN) has incorporated the conclusions of the ACOSOG Z0011 study7; patients fitting the Z0011 criteria may be spared from axillary lymph node dissection (ALND) if their surgeons follow the NCCN recommendations. Some CPGs/consensus statements suggest that patients meeting the Z0011 criteria may be eligible to avoid ALND, a recommendation that may sometimes be ambiguous. Healthcare providers may not be able to find clear statements in these CPGs/consensus statements regarding additional considerations in patients fitting the Z0011 criteria to avoid ALND. 
Clarity and unambiguity of recommendations are important factors in the implementation of CPGs/consensus statements and reflect their methodological quality.
The methodological quality of CPGs/consensus statements is an important factor in guiding surgeons on which CPGs/consensus statements to follow, and it also helps developers plan their strategy for developing and updating their CPGs/consensus statements.8 Several instruments have been developed to assess the methodological quality of CPGs/consensus statements. Among them, the Appraisal of Guidelines for Research and Evaluation (AGREE) II instrument is the most widely used and has been validated internationally.9–11 In this study, we conducted a systematic review of the CPGs/consensus statements regarding the surgical management of breast cancer and assessed their methodological quality using the AGREE II instrument. We also investigated potential factors that might be associated with quality.
This review was performed following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines, thus providing a comprehensive framework for objectively assessing quality indicators and the risk of bias in the included CPGs/consensus statements.
Data sources and searches
Recent progress in scientific research has led to advances in surgical treatments for breast cancer over the past decade, resulting in a need to update many CPGs and consensus statements. We therefore restricted the search to documents published between January 2009 and December 2016. Two independent reviewers screened the PubMed and EMBASE databases for guidelines and consensus statements on surgical treatments for breast cancer. The search strategy included terms related to breast cancer, surgical treatments, guideline and consensus. The full PubMed search strategy, which was adapted for the other databases, is given in online supplementary file 1. Additionally, four guideline repositories were manually searched: the National Guideline Clearinghouse (USA), the National Library for Health (UK) on Guideline Finder, the Canadian Medical Association Infobase (Canada) and the Guidelines International Network (G-I-N) International Guideline Library. We also searched the websites of the organisations that developed those CPGs/consensus statements.
Inclusion and exclusion criteria
According to the National Guideline Clearinghouse, we defined CPGs as statements that included recommendations intended to optimise and standardise patient care informed by a systematic review of evidence and assessment of the benefits and risks of alternative care options.12 13 Consensus statements based on comprehensive or systematic reviews and providing clinically relevant suggestions based on the collective opinion of an expert panel12 were also included.
We included CPGs/consensus statements if they met the following criteria:
addressed issues about surgical management of breast cancer, including breast surgery and axillary surgery;
published in English.
We excluded the following:
CPGs/consensus statements focused on a specific topic that was irrelevant to the surgical management of breast cancer, for example, screening guidelines;
CPGs/consensus statements focused only on metastatic breast cancer, as surgical management in these patients is not the primary recommendation14 15;
CPGs/consensus statements focused on breast reconstruction surgery, such as prosthesis implantation, autologous reconstruction;
CPG/consensus statements ‘for education and information purpose’ or ‘out of date’ because the organisations declared that CPG/consensus statements may no longer be consistent with recent evidence;
draft or unpublished guidelines, discussion papers, personal opinions and obsolete guidelines replaced by updated recommendations from the same organisation.
Several additional principles were followed:
If multiple updated versions of a CPG/consensus statement were available, the most recent one was included.
If doubts existed regarding whether an article was a CPG/consensus statement, we verified its eligibility by checking the inclusion criteria of similar reports in the National Guideline Clearinghouse.
Two authors (XL and FTL) independently searched and identified eligible CPGs/consensus statements and collected the full text of the CPGs/consensus statements and related supplementary materials if available. The authors met to gather and compile all available information to ensure that no relevant information was missed and that all three reviewers reviewed the same materials. Discrepancies or inconsistent findings were discussed together with the third author (SYL). Because this was a systematic review, ethical approval from Sun Yat-Sen Memorial Hospital and the First Affiliated Hospital, Sun Yat-Sen University, was waived.
Guideline quality assessment
The quality of each CPG/consensus statement was independently evaluated by three different reviewers (XL, KC and YS) using the AGREE II Instrument9 11 (updated: September 2009). The AGREE II11 instrument evaluates 23 items categorised into six domains, including Scope and Purpose, Stakeholder Involvement, Rigour of Development, Clarity and Presentation, Applicability and Editorial Independence. Reviewers scored each item ranging from 1 (strongly disagree) to 7 (strongly agree). A score of 1 is assigned when no information concerning that item is available, while a score of 7 indicates that clear information is evident and the full criteria were met. The domain score is calculated by scaling the total obtained score of that domain as a percentage of the maximum possible score for that domain using the following formula:
Domain score = (obtained score − minimum possible score)/(maximum possible score − minimum possible score)
For example, if three reviewers assessed a domain containing four items, the maximum and minimum possible scores were 7×4×3=84 and 1×4×3=12, respectively. If a total score of 30 was obtained, the domain score was (30 – 12)/(84 – 12)=0.25, that is, 25%.
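The scaled domain score can be illustrated with a short sketch. The function name and the nested-list input layout are assumptions made for illustration only; the analyses in this study were run in SPSS, not with this code.

```python
def agree_domain_score(item_scores, min_item=1, max_item=7):
    """Scaled AGREE II domain score as a fraction between 0 and 1.

    item_scores: one list per reviewer, each holding that reviewer's
    scores (1-7) for every item in the domain (hypothetical layout).
    """
    n_reviewers = len(item_scores)
    n_items = len(item_scores[0])
    for reviewer in item_scores:
        if len(reviewer) != n_items:
            raise ValueError("every reviewer must score every item")
        if any(not (min_item <= s <= max_item) for s in reviewer):
            raise ValueError("item scores must lie between 1 and 7")

    obtained = sum(sum(reviewer) for reviewer in item_scores)
    minimum = min_item * n_items * n_reviewers
    maximum = max_item * n_items * n_reviewers
    return (obtained - minimum) / (maximum - minimum)


# Three reviewers, four items, total obtained score of 30
# (the worked example in the text).
scores = [[3, 2, 3, 2], [2, 3, 2, 3], [3, 2, 3, 2]]
example = agree_domain_score(scores)  # 0.25, reported as 25%
```

Multiplying the returned fraction by 100 gives the percentage form used in the results tables.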
If the actual total score of all items within one domain between any two of the three appraisers differed by >30% of the maximal total score of all items within that domain, disagreements were discussed by all reviewers, together with a fourth author (SXF), to ensure that all necessary information (supplementary files, website pages, full-text) was collected. After discussion, the three reviewers (XL, KC and YS) re-evaluated the CPGs/consensus statements and resubmitted their final domain scores. The reviewers could keep the previous score without any changes after discussion. Consistency among reviewers on AGREE II scores was assessed using the intraclass correlation coefficient (ICC).
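The discrepancy rule can be sketched as follows. This is a minimal illustration that assumes the "maximal total score" of a domain is 7 times the number of items for a single appraiser; the function and parameter names are hypothetical.

```python
from itertools import combinations


def needs_discussion(reviewer_totals, n_items, threshold=0.30, max_item=7):
    """Return True if any two appraisers' domain totals differ by more
    than `threshold` of the maximal domain total for one appraiser.

    reviewer_totals: each appraiser's summed item scores for one domain.
    Assumption: maximal domain total = max_item * n_items.
    """
    max_total = max_item * n_items
    limit = threshold * max_total
    return any(abs(a - b) > limit for a, b in combinations(reviewer_totals, 2))


# Four-item domain: maximal total 28, so the limit is 8.4 points.
flagged = needs_discussion([10, 12, 20], n_items=4)      # largest gap is 10
not_flagged = needs_discussion([10, 12, 18], n_items=4)  # largest gap is 8
```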
In addition to the six domains, an overall assessment is included in the AGREE II instrument. Three reviewers (XL, KC and YS) scored the overall quality of the CPGs/consensus statements from 1 to 7; the overall assessment score was calculated using the same equation as that used for the domain scores. Additionally, we used a radar map to illustrate the domain scores of each CPG/consensus statement and calculated the total area of the radar map as a reflection of the overall quality of the CPGs/consensus statements. The radar map areas were expressed as percentages of the maximal area. The association between radar map areas and overall assessment score was tested using linear regression analysis. The reviewers also categorised the CPGs/consensus statements into three groups: recommending the CPGs/consensus statements for use, recommending the CPGs/consensus statements for use with modifications and not recommending the CPGs/consensus statements for use.
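One way to compute the radar map area as a percentage of the maximal area is sketched below. It assumes the six domain scores are plotted as fractions (0 to 1) on equally spaced axes in the AGREE II domain order; the text does not state the axis order or the software used, and the area of such a polygon depends on that order. The function name is hypothetical.

```python
import math


def radar_area_fraction(domain_scores):
    """Area of the radar polygon divided by the maximal possible area.

    domain_scores: scaled domain scores (0-1), in the order in which the
    axes are drawn (assumed here: the AGREE II domain order).
    """
    n = len(domain_scores)
    if n < 3:
        raise ValueError("a radar polygon needs at least three axes")

    # Neighbouring axes are separated by 2*pi/n, so each wedge between
    # them is a triangle of area 0.5 * r_i * r_j * sin(2*pi/n).
    wedge = 0.5 * math.sin(2 * math.pi / n)
    area = wedge * sum(
        domain_scores[i] * domain_scores[(i + 1) % n] for i in range(n)
    )
    max_area = wedge * n  # every domain score equal to 1
    return area / max_area


# Hypothetical domain scores for one guideline (illustrative values only).
fraction = radar_area_fraction([0.80, 0.40, 0.55, 0.85, 0.25, 0.60])
```

Because neighbouring scores are multiplied, a document that scores 50% in every domain covers only 25% of the maximal area, which is why the radar map areas tend to be lower than the corresponding domain scores.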
Factors associated with guideline quality
Two authors (XL and LLZ) developed a data extraction plan to collect the main features of each CPG/consensus statement (eg, year of publication, country/region, update frequency). The quality (radar map and overall assessment scores) of the CPGs/consensus statements according to these factors was compared using the Mann-Whitney U test or the Kruskal-Wallis H test as appropriate. A p value <0.05 was considered statistically significant. All analyses were performed using SPSS V.19.0.
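For readers unfamiliar with the Mann-Whitney U test, the rank-based statistic behind it can be sketched in plain Python. This illustrates the statistic only (tied values receive average ranks); it does not compute a p value and is not the SPSS procedure used in this study. The function names and input values are hypothetical.

```python
def midranks(values):
    """1-based ranks of `values`, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical scores for two groups of documents (illustrative only).
u_low, u_high = mann_whitney_u([1, 2, 3], [4, 5, 6, 7])
```

A U value of 0 for one group means that every observation in that group ranks below every observation in the other. Converting U to a p value requires the exact distribution or a normal approximation, which statistical packages provide.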
Search results and characteristics
A total of 19 guidelines16–34 and 4 consensus statements35–38 were identified for final evaluation (figure 1). Among the 19 CPGs included, six,18 23 28–30 33 eight16 17 24–26 31 32 34 and five19–22 27 CPGs were developed in North American, European and Asian/Pacific regions, respectively (table 1).
The update frequency of the CPGs/consensus statements varied (table 2). The National Institute for Health and Care Excellence (NICE),17 Malaysia,21 New Zealand Guidelines Group (NZGG),20 Cancer Australia-National Breast and Ovarian Cancer Centre (CA-NBOCC)22 guidelines and the Biedenkopf35 consensus statement have not been updated since 2011. The German Group for Gynaecological Oncology (AGO)32 and the National Comprehensive Cancer Network (NCCN)18 guidelines have been updated annually, and the St. Gallen36 consensus statement has been updated every other year.
The website links of the included CPG/consensus statements are listed in online supplementary table 1. The ICCs of the three reviewers for each guideline/consensus statement ranged between 0.90 and 0.99 (online supplementary table 2.1); the ICCs of the three reviewers for each domain of AGREE II ranged between 0.82 and 0.96 (online supplementary table 2.2), suggesting good agreement of rating scores among the three reviewers.
Overall quality assessment
The overall assessment scores and radar map areas were significantly correlated (online supplementary figure 1) (R2=0.835, p<0.05). The overall assessment scores suggested that the American Society of Clinical Oncology (ASCO),28 NICE,17 NCCN,18 Scottish Intercollegiate Guidelines Network (SIGN),16 NZGG20 and Belgium Health Care Knowledge Centre (KCE)26 guidelines had the best overall quality, whereas the Saskatchewan Cancer Agency (SASK),23 Spanish Society of Medical Oncology (SEOM),31 Japanese Breast Cancer Society (JBCS)19 guidelines, and the D.A.C.H.37 and European School of Oncology (ESO)38 consensus statements had the poorest overall quality (table 3). Radar map areas suggested that the ASCO,28 SIGN,16 NICE,17 NZGG20 and KCE26 guidelines had the best overall quality, whereas the SASK,23 SEOM31 and JBCS19 guidelines and the D.A.C.H.37 and ESO38 consensus statements had the poorest overall quality (table 3, figure 2). All three reviewers categorised the SIGN,16 KCE,26 ASCO28 and Malaysia21 guidelines as ‘recommend for use’, whereas all three reviewers categorised the SASK23 guideline as ‘not recommend for use’ in the overall assessment (table 3).
In general, the median domain scores (range) of the Scope and Purpose, Stakeholder Involvement, Rigour of Development, Clarity and Presentation, Applicability and Editorial Independence domains were 74.1% (57.4%–92.6%), 38.9% (13.0%–94.4%), 54.9% (14.6%–89.6%), 79.6% (40.7%–92.6%), 23.6% (4.2%–73.6%) and 63.9% (8.3%–91.7%), respectively. All included CPGs/consensus statements scored >50% in the Scope and Purpose domain. In contrast, only five CPGs/consensus statements16–18 20 28 scored >50% in the Applicability domain. Five of the CPGs/consensus statements16–18 20 28 had all domain scores >50%, while in SEOM,31 all but the Scope and Purpose domain scored <50%. The domain scores of each CPG/consensus statement are listed in table 2.
Factors associated with quality
CPGs versus consensus statements
In total, 4 consensus statements and 19 CPGs were included in this study. In general, consensus statements had lower overall quality than CPGs. The median (range) of the radar map area was 19.0% (15.1%–22.6%) and 34.6% (5.8%–67.7%) for consensus statements and CPGs, respectively (p=0.10). The median (range) of the overall assessment scores was 41.7% (33.3%–44.4%) and 72.2% (22.2%–94.4%) for consensus statements and CPGs, respectively (p=0.01). As shown in table 2, consensus statements had lower average domain scores than CPGs in the Stakeholder Involvement (consensus statements 30.1% vs CPGs 52.7%, p=0.133), Rigour of Development (consensus statements 30.1% vs CPGs 61.3%, p=0.062) and Applicability (consensus statements 14.6% vs CPGs 33.9%, p=0.088) domains, but none of these differences was statistically significant.
The median (range) of the radar map areas were 26.4% (7.6%–67.5%), 41.4% (5.8%–67.7%) and 47.6% (9.4%–59.6%), and the median (range) of overall assessment scores were 63.9% (22.2%–88.9%), 72.2% (44.4%–94.4%) and 72.2% (38.9%–77.8%) for CPGs published in Europe, North America and Asian/Pacific regions, respectively. However, we did not observe any statistically significant differences in the radar map areas or the overall assessment scores among CPGs developed in different geographic regions (p>0.05).
Importance of CPGs/consensus statements and our major findings
The practice of breast cancer surgery varies in clinical practice. The underlying reasons for this variation may be multifactorial such as patient preferences, local resources and surgeons’ perspectives. CPGs/consensus statements with clear structure and presentation may help reduce the disparity in clinical practice and potentially increase the quality of care.
Because a growing number of institutions, working groups and/or governmental agencies have developed CPGs/consensus statements regarding surgical treatment, it would be helpful to know which CPGs/consensus statements are the most reliable. Therefore, assessing the quality of CPGs/consensus statements for breast cancer surgical treatments is important and informative. In this study, we found that the ASCO,28 NICE,17 SIGN,16 NZGG20 and KCE26 guidelines had the best overall quality, whereas the SEOM,31 SASK,23 JBCS19 guidelines and the D.A.C.H.37 and ESO38 consensus statements had the poorest overall quality. These results were similar to those reported by Gandhi et al,39 who appraised CPGs/consensus statements for systemic therapy of early breast cancer. They found that the NICE, ASCO and NZGG guidelines had the highest overall assessment scores, whereas the SASK, SEOM guidelines and the St. Gallen consensus statement had the lowest overall assessment scores. The SASK23 guideline had the poorest quality in both our study and that of Gandhi et al; it scored poorly in the ‘Applicability’ and ‘Editorial Independence’ domains. Low scores in the Applicability domain might suggest poor guideline implementation. In addition, we did not find any statement in the SASK guideline regarding conflicts of interest of the guideline development group members, which led to its low score in the Editorial Independence domain. Healthcare providers should therefore use caution when choosing which CPG/consensus statement to follow.
The Rigour of Development domain of the AGREE II instrument assesses whether the CPGs/consensus statements provide a procedure for updating the guideline. However, there is no recommended optimal schedule for updating CPGs/consensus statements. We found that the update frequency of CPGs/consensus statements varied. Timely updates based on newly published studies could facilitate the acceptance and implementation of these CPGs/consensus statements. For example, the Z0011 study7 was published in 2011 amid some controversy; nevertheless, the NCCN guideline incorporated the Z0011 results in 2012. Meanwhile, the ALND rate significantly decreased in the USA, from 71% in the pre-Z0011 era (January 2007–April 2011) to 7% in the post-Z0011 era (April 2011–February 2014).40 The reduction in the ALND rate was also observed in studies from different countries.40–42 Therefore, the timely updating of the NCCN guidelines may have accelerated the change in clinical practice. In contrast, the Malaysia21 and the NZGG20 guidelines did not include any recommendations about the Z0011 trial because they have not been updated since 2011. Therefore, these two CPGs should be considered out of date. Physicians should use caution when adhering to these CPGs, despite their higher scores on the AGREE II instrument.
CPGs versus consensus statements
We found that, although without statistical significance, the overall methodological quality of CPGs was better than that of consensus statements, which was consistent with Jacobs’ findings.43 In their study, they found that the score of the Rigour of Development domain for consensus statements was 32% lower than that of CPGs (p<0.0001). The score of the Editorial Independence domain was 15% lower for consensus statements than for CPGs (p = 0.0003). The differences between CPGs and consensus statements may be multifactorial. First, systematic reviews are performed more frequently for CPGs than for consensus statements. Some consensus statements are based on comprehensive literature searches rather than systematic reviews. Second, most consensus statements are developed by one round of voting of panel members, whereas for CPGs, several rounds of drafting, revision and discussion, voting and peer reviews are used. Third, the authors of consensus statements may not necessarily comply with all domains of the AGREE instrument. However, despite less rigorous development of consensus statements, they are still valuable resources if they are developed in response to a recently identified issue or newly recognised gap in healthcare based on high-quality evidence, such as the optimal negative margin for DCIS patients who will receive BCS. Therefore, physicians should weigh the advantages and disadvantages of consensus statements when they apply their recommendations in clinical practice.
The median scores of the Scope and Purpose and Clarity and Presentation domains for all CPGs/consensus statements were >70%, suggesting that most of them had clear purposes and provided clear recommendations. The most poorly performing domain was the Applicability domain, which refers to the facilitators and barriers to guideline implementation, the strategies used to improve uptake and resource availability.11 Poor performance of CPGs in the Applicability domain is a common problem,12 39 44 reflecting that the implementation of guidelines and its barriers were not well addressed globally. To facilitate CPGs/consensus statement implementation, pilot studies and/or barrier analysis45 may identify facilitators and barriers to implementation.44 46 47 Feedback from stakeholders and users could also be informative and help to improve the incorporation of CPGs/consensus statements. Furthermore, widely accepted resource-stratified CPGs/consensus statements would be helpful. In some low-income and middle-income countries where certain diagnostic tests and treatments are unavailable, CPGs/consensus statements should be able to differentiate services that are the basic standard of care from services that could provide major improvements in disease outcomes but are cost prohibitive. Although this may be difficult for several reasons, such as considerations of patient values and preferences in each country/region, costs and resource-use implications, it is possible. The NCCN Framework for Resource Stratification stratified treatment pathways into four levels based on available resources (Basic, Core, Enhanced and NCCN guidelines)18 48 and provided a tool to optimise treatment options given specific resource constraints. Additionally, ongoing efforts in healthcare quality improvement policy, such as the establishment of the National Quality Strategy49–52 and the Institute for Healthcare Improvement (http://www.ihi.org/Pages/default.aspx), should be recognised.
Several limitations of this study should be addressed.
First, lack of content appraisal is one of the major limitations of our study. To comprehensively evaluate CPGs/consensus statements, we need to assess not only the strength of their development processes, structure and presentation but also the content and strength of the evidence. Therefore, gathering a panel of experts or using an instrument, such as the GRADE approach53 developed by the GRADE Working Group, to evaluate the content and strength of evidence of CPGs/consensus statements should be considered in the future.
Second, the AGREE II instrument has a manual to guide reviewers on how to appraise CPGs/consensus statements, and reviewers score each item based on how much information is provided related to that item. However, reviewers cannot evaluate how much information is provided quantitatively, and scoring each item is therefore a subjective process.
Third, we only included CPGs/consensus statements published in English, so relevant non-English CPGs/consensus statements may have been missed.
Fourth, we included CPGs/consensus statements with different scopes, which may have used different approaches for development and presentation and therefore may have affected the methodological quality.
Our study showed that the ASCO, NICE, SIGN, NZGG and KCE guidelines had the highest overall quality, whereas the SASK, SEOM and JBCS guidelines and the D.A.C.H. and ESO consensus statements had the lowest overall quality. The CPGs/consensus statements generally had lower scores in the Applicability domain than in other domains. The consensus statements generally had lower quality than CPGs. The geographic regions in which the CPGs/consensus statements were developed were not associated with methodological quality. To comprehensively assess CPGs/consensus statements in the future, more efforts are needed to appraise content and the frequency of updates. Additional resource-stratified CPGs/consensus statements with more applicability for implementation in clinical practice are necessary.
XL, FL and SL contributed equally.
Contributors Each author certifies that he/she has made a direct and substantial contribution to the conception and design of the study, development of the search strategy, establishment of the inclusion and exclusion criteria, data extraction, analysis and interpretation. XL was involved in the literature search, data collection and analysis, quality appraisal and writing. FTL was involved in the literature search and writing. SYL was involved in the data collection and writing. YS conducted the quality appraisal. LLZ extracted and analysed the data. FS provided critical revision of the paper. KC was involved in the design of this study, conducted the quality appraisal and provided critical revision of the paper. SL was involved in the design of this study and provided critical revision of the paper. All authors read and provided final approval of the version to be published.
Funding This study was supported by the National Natural Science Foundation of China (grant # 81402201/81372817), National Natural Science Foundation of Guangdong Province (grant # 2014A030310070) and grant 163 from Key Laboratory of Malignant Tumour Molecular Mechanism of Guangzhou Bureau of Science and Information Technology. We appreciate the statistical advice provided by Yilong Education.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement All the tables and figures can be accessed on BMJ Open, and all the supplementary materials can be accessed upon request via email to the corresponding authors of this study.