
Original research
Risk prediction models for breast cancer: a systematic review
  1. Yadi Zheng1,
  2. Jiang Li1,
  3. Zheng Wu1,
  4. He Li1,
  5. Maomao Cao1,
  6. Ni Li1,2,
  7. Jie He3
  1. 1Office of Cancer Screening, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
  2. 2Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Jiangsu Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China
  3. 3Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
  1. Correspondence to Dr Ni Li; nli{at}cicams.ac.cn; Dr Jiang Li; lij{at}cicams.ac.cn

Abstract

Objectives To systematically review and critically appraise published studies of risk prediction models for breast cancer in the general population without breast cancer, and to provide evidence for future research in the field.

Design Systematic review using the Prediction model study Risk Of Bias Assessment Tool (PROBAST) framework.

Data sources PubMed, the Cochrane Library and Embase were searched from inception to 16 December 2021.

Eligibility criteria We included studies reporting multivariable models to estimate the individualised risk of developing female breast cancer among different ethnic groups. The search was limited to English-language publications.

Data extraction and synthesis Two reviewers independently screened, reviewed, extracted and assessed studies with discrepancies resolved through discussion or a third reviewer. Risk of bias was assessed according to the PROBAST framework.

Results 63 894 studies were screened and 40 studies with 47 risk prediction models were included in the review. Most of the studies used logistic regression to develop breast cancer risk prediction models for Caucasian women using case–control data. The most widely used risk factors were reproductive factors, and the highest area under the curve was 0.943 (95% CI 0.919 to 0.967). All the models included in the review were at high risk of bias.

Conclusions No risk prediction model for breast cancer could be recommended across the different ethnic groups, and models incorporating mammographic density or single-nucleotide polymorphisms among Asian women are few and urgently needed. High-quality breast cancer risk prediction models, assessed with PROBAST, should be developed and validated, especially among Asian women.

PROSPERO registration number CRD42020202570.

  • Epidemiology
  • Public health
  • Breast tumours
  • Risk management



This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.


Strengths and limitations of this study

  • Thoroughly conducted systematic review collecting data from major existing databases.

  • Critically appraised published studies of risk prediction models for breast cancer in the general population, providing evidence for future research in the field.

  • The Prediction model study Risk Of Bias Assessment Tool (PROBAST), developed through a consensus process involving a group of methodological experts in the area of clinical prediction tools and quality assessment, was used to assess the quality of the prediction models.

  • Studies reporting only the external validation of existing risk models were not included in the review.

  • Our study highlights that high-quality breast cancer risk prediction models, assessed with PROBAST, should be developed and validated among different ethnic groups, especially among Asian women.

Introduction

Breast cancer is a major public health problem and one of the most burdensome cancers among women worldwide,1 accounting for 11.7% of new cancer cases and 6.9% of cancer deaths in 2020. It became the most common cancer in women in 2020, and its prevalence is projected to increase over the coming years.2 Breast cancer prevention is associated with a reduction in mortality,3 and more research is needed to improve the methods of identifying women at elevated risk and preventing the disease. Numerous breast cancer risk prediction models have been developed to capture the combined effect of risk factors for breast cancer, guide routine screening and genetic testing, and reduce the burden of breast cancer. Risk-stratified screening can improve cost-effectiveness, maximise benefits and minimise harms such as overdiagnosis.4 Individualised prediction models for breast cancer could be used in practice to support decision making about mass screening, opportunistic screening and treatment strategy.

A recent breast cancer screening guideline5 suggests that breast cancer screening increases the early detection rate and reduces incidence when it is applied to appropriate at-risk populations. However, major gaps remain in our ability to determine breast cancer risk accurately enough to apply these approaches to the appropriate populations of women.

Many breast cancer risk prediction models have been developed over the past few decades. Many of them have undergone validation, including assessment of discrimination and calibration, in study populations other than those used in the initial development, or have been further assessed in comparative studies. These models have been based on breast cancer-related predictors including hormonal, environmental, family history, genetic and radiographic factors, which improves their generalisability. For example, the Gail model,6 one of the best-known models, has been widely used and validated worldwide since it was developed in 1989.7–12

This study is a systematic review of breast cancer risk prediction models using meta-analysis and the Prediction model study Risk Of Bias Assessment Tool (PROBAST).13 14 The aim of our study was to systematically review published studies of risk prediction models for breast cancer in the general population, to identify methods of predicting female breast cancer risk among one or more ethnic groups, to prepare for the development of new risk prediction models, and to provide evidence for future research in the field.

Methods

The current review was designed according to the Checklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS).15

Literature search and eligibility criteria

We systematically searched PubMed, the Cochrane Library and Embase from inception to 16 December 2021. The detailed search strategies are reported in online supplemental table 1. Articles identified from the search were loaded into EndNote V.X7 and duplicates were removed.

Inclusion criteria: (1) the model used data from cross-sectional studies, cohort studies, case–control studies or randomised controlled trials; (2) the model estimated the individualised risk of female breast cancer among one or more ethnic groups; (3) the model was developed for the general population without breast cancer; (4) the study reported a multivariable (ie, at least two variables or predictors) model; and (5) the study was published in English.

Exclusion criteria: (1) external validation studies that only validated previous models in a different population without adding any additional information, such as modifications of the risk factors; and (2) models developed using machine learning.

Data extraction

Two reviewers screened the search results independently. Full-text reports were then assessed for eligibility with discrepancies resolved through discussion or a third reviewer.

We extracted information in two categories. (1) For all studies included in the review, we extracted the following: author, publication year, study design, research method, target population, number of risk factors, risk factors, model performance and sample size for development. (2) For studies that included a validation component, we also extracted: type of validation, study design, target population, model performance and sample size for validation. The information was extracted by one reviewer and checked by a second reviewer.

Risk of bias assessment

We used PROBAST to assess the reported prediction models. PROBAST is a tool designed by an international group of experts to assess the risk of bias and applicability of diagnostic and prognostic prediction models, and it can be used in the critical appraisal of studies that develop, validate or update prediction models for individualised predictions.13 14 In brief, it contains 20 signalling questions in four domains: participants, predictors, outcome and statistical analysis. Signalling questions can be answered as yes, probably yes, no, probably no or no information. A domain in which at least one signalling question is answered no or probably no is judged to be at high risk of bias. The overall risk of bias is judged to be low only if all domains are judged to be at low risk of bias.
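
The aggregation rule described above can be summarised in a few lines of code. The sketch below is a simplified illustration only (the published PROBAST also supports 'unclear' judgements and separate applicability ratings), and the question identifiers are placeholders rather than the official wording; it is not part of the review's methods.

```python
# Simplified sketch of the PROBAST risk-of-bias aggregation rule.
# Question identifiers are placeholders; domain sizes (2, 3, 6, 9 = 20
# signalling questions) follow the published tool.

DOMAINS = {
    "participants": ["1.1", "1.2"],
    "predictors": ["2.1", "2.2", "2.3"],
    "outcome": ["3.1", "3.2", "3.3", "3.4", "3.5", "3.6"],
    "analysis": ["4.1", "4.2", "4.3", "4.4", "4.5", "4.6", "4.7", "4.8", "4.9"],
}

NEGATIVE = {"no", "probably no"}


def domain_risk(answers: dict, questions: list) -> str:
    """A domain is at high risk of bias if any signalling question is
    answered 'no' or 'probably no'."""
    return "high" if any(answers.get(q, "no information").lower() in NEGATIVE
                         for q in questions) else "low"


def overall_risk(answers: dict) -> str:
    """Overall risk of bias is low only if every domain is at low risk."""
    judgements = {d: domain_risk(answers, qs) for d, qs in DOMAINS.items()}
    return "low" if all(j == "low" for j in judgements.values()) else "high"


# Example: a single negative answer in the analysis domain makes the model high risk.
example = {q: "yes" for qs in DOMAINS.values() for q in qs}
example["4.8"] = "probably no"   # e.g. overfitting/optimism not accounted for
print(overall_risk(example))     # -> "high"
```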

Before putting PROBAST into use, we formed a 10-person study group, including prediction model researchers, statisticians and evidence-based medicine specialists, to learn and practise the appropriate use of this new tool systematically. Only after every member fully understood all 20 signalling questions did we move to the peer quality assessment. The risk of bias of every prediction model was assessed by two reviewers independently, with discrepancies resolved through discussion or by a third reviewer.

If more than one model was developed in a single study, we assessed the risk of bias only once because of their similarity. We also assessed the risk of bias of the external validation of a prediction model when it was conducted in the same article as the model development.

Data synthesis and analysis

We calculated and reported descriptive statistics to summarise the characteristics of the models. We identified the most frequently used risk factors and classified all risk factors into eight categories: age, reproductive factors, family history of cancer, hormone, gene-related factors, lifestyle, medical history and test, and basic information. Classification details are given in online supplemental table 2. We then used a network diagram to visualise the connections between the categorised risk factors and a forest plot to describe model performance. The expected/observed (E/O) ratio was not included in the forest plot because it was reported in only 7 of the 40 studies. All analyses were performed using Stata V.16.0 and NetDraw.
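
As an illustration of this synthesis step, the sketch below shows how the pairwise co-occurrence counts underlying a network diagram such as figure 2 can be computed. It is written in Python purely for exposition (the review used Stata and NetDraw), and the three example models are hypothetical.

```python
# Count how often pairs of risk-factor categories appear in the same model:
# edge weights for the network diagram, plus a per-category total (node size).
from itertools import combinations
from collections import Counter

# Hypothetical: categories used by each included model.
models = [
    {"age", "reproductive factors", "family history of cancer"},
    {"reproductive factors", "family history of cancer", "hormone"},
    {"age", "reproductive factors", "gene-related factors", "lifestyle"},
]

pair_counts = Counter()
for factors in models:
    for a, b in combinations(sorted(factors), 2):
        pair_counts[(a, b)] += 1          # edge weight in the network diagram

node_size = Counter()                      # how often a category co-occurs with others
for (a, b), n in pair_counts.items():
    node_size[a] += n
    node_size[b] += n

print(pair_counts.most_common(3))
print(node_size.most_common(1))            # largest node, cf. reproductive factors
```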

Patient and public involvement

There was no patient or public involvement in this study.

Results

Study selection

A total of 92 519 indexed records were identified (54 653 in PubMed, 30 374 in the Cochrane Library and 7492 in Embase); 28 625 were eliminated as duplicates across the databases, leaving 63 894 publications. Forty-three articles were initially included after screening by title and abstract. Three studies that only externally validated previous models were excluded during full-text screening, so 40 studies with 47 models were ultimately included in the review (figure 1).

Figure 1

Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow chart.

Study characteristics

A brief summary of the 40 included studies6 16–54 is presented in online supplemental table 3. The included studies were published from 1989 to 2021; 25 were conducted over the past 10 years, with five published in 2017 alone. Seventeen of the 40 studies used data from case–control studies to develop prediction models,6 17 19 23–26 29–31 39 41 43 46 49 51 54 13 used prospective cohorts,16 18 20–22 27 33–37 40 52 8 used nested case–control studies28 38 42 44 47 48 50 53 and 2 used cross-sectional studies.32 45 Thirty-one studies used logistic regression to fit prediction models,6 17–19 22–26 28–32 34 38–51 53 54 seven used Cox proportional hazards regression,20 21 27 33 35 36 52 one used Poisson regression16 and one used competing risk regression.37 Of all 47 models in the 40 studies, 16 were developed in Caucasian women,6 16 18 23 26 28 29 34 40 42 45 47 50 53 13 in women of multiple ethnicities,20–22 24 27 30 35–38 44 48 12 in Asian women,17 19 31 32 39 43 49 51 52 2 in African-American women,25 33 2 in Hispanic women,41 1 in Nigerian women46 and 1 in Cypriot women.54

The associations between the eight categories of risk factors are shown in figure 2. Reproductive factors had the largest node size, meaning that this category was most frequently connected with other factors across the prediction models. The number between two factors indicates how many times the two factors were included in the same model, in some cases more than 30. For instance, reproductive factors and family history of cancer appeared in the same model 40 times, and reproductive factors and age appeared together 31 times.

Figure 2

Network diagram of eight categorised risk factors (age, basic information, family history of cancer, gene-related factors, hormone, lifestyle, medical history and test, and reproductive factors).

Twenty-nine studies reported c-statistics,18–22 26–28 30–32 34–40 42 43 45–48 50–54 ranging from 0.59 (95% CI 0.57 to 0.61) to 0.943 (95% CI 0.919 to 0.967). Qiu et al51 reported the highest c-statistic (0.943, 95% CI 0.919 to 0.967), and Lee et al19 and Salih et al45 reported areas under the curve (AUCs) over 0.8 (0.867 and 0.864 (95% CI 0.81 to 0.92), respectively). E/O ratios could be obtained from eight studies.22 27 29 32 35 36 46 52 Figure 3 shows that the overall AUC was 0.68 (95% CI 0.63 to 0.73) for the 16 studies21 26 27 30 32 34 37 38 42 45 46 48 50–52 54 that reported an AUC with a 95% CI. The AUCs of the subgroups in five studies18 22 31 39 47 were between 0.6 and 0.7.
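
For readers interested in how an overall AUC with a 95% CI can be obtained from study-level estimates, the sketch below pools AUCs with a DerSimonian-Laird random-effects model, deriving standard errors from the reported CIs. The review states only that Stata was used, so the exact pooling method is an assumption, and the three studies in the example are invented.

```python
# Random-effects (DerSimonian-Laird) pooling of AUCs, with standard errors
# back-calculated from the reported 95% CIs. Illustrative values only.
import math

studies = [  # (AUC, lower 95% CI, upper 95% CI) - hypothetical values
    (0.62, 0.58, 0.66),
    (0.70, 0.65, 0.75),
    (0.68, 0.63, 0.73),
]

aucs = [a for a, _, _ in studies]
ses = [(hi - lo) / (2 * 1.96) for _, lo, hi in studies]    # SE from CI width

# Fixed-effect step, needed to estimate between-study variance tau^2.
w = [1 / se ** 2 for se in ses]
fixed = sum(wi * yi for wi, yi in zip(w, aucs)) / sum(w)
q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, aucs))
df = len(studies) - 1
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (q - df) / c)

# Random-effects pooling.
w_star = [1 / (se ** 2 + tau2) for se in ses]
pooled = sum(wi * yi for wi, yi in zip(w_star, aucs)) / sum(w_star)
se_pooled = 1 / math.sqrt(sum(w_star))
print(f"pooled AUC {pooled:.2f} "
      f"(95% CI {pooled - 1.96 * se_pooled:.2f} to {pooled + 1.96 * se_pooled:.2f})")
```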

Figure 3

Area under the curve (AUC) and CIs reported by the included studies.

Of these 40 studies, nine assessed prediction models with internal validation,22 26 27 33 39 44–47 10 with external validation23 25 29 31 37 41 49 51–53 and one with both.32 Fifteen studies reported the discriminatory accuracy as the AUC,23 25 27 29 31–33 37 39 41 46 49 51–53 and 11 studies used the expected/observed event ratio (or observed/expected event ratio) to measure the calibration accuracy of the model.23 25 27 29 31 33 37 41 45 49 52

Quality assessment

A summary of the quality assessment is shown in table 1. Overall, all models assessed with PROBAST in the review were at high risk of bias. The risk of bias was low in the outcome domain and high in the analysis domain. Over 60% of the models (32) were at low risk of bias in the participants domain and about 70% (36) in the predictors domain (figure 4).

Figure 4

Risk of bias assessment (using PROBAST) of all assessed models based on four domains. PROBAST, Prediction model study Risk Of Bias Assessment Tool.

Table 1

Summary of risk of bias assessment

The main reasons for the high risk of bias in the analysis domain were inappropriate evaluation of model performance measures, categorisation of continuous predictors, failure to account for or report overfitting and optimism in model performance, and inappropriate handling of missing data (online supplemental table 4).

Discussion

Summary of main results

This systematic review identified 40 studies with 47 risk prediction models developed and/or validated for breast cancer among different ethnic groups. Most of the studies used logistic regression to develop breast cancer risk prediction models for Caucasian women using case–control data. The most widely used risk factors were reproductive factors, which, together with family history, were included in most models. The highest AUC was 0.943 (95% CI 0.919 to 0.967), from Qiu et al.51 The overall AUC was 0.68 (95% CI 0.63 to 0.73) for the 16 studies21 26 27 30 32 34 37 38 42 45 46 48 50–52 54 that reported an AUC with a 95% CI. All the studies were at high risk of bias, driven by the analysis domain, mainly because of inappropriate evaluation of model performance measures, categorisation of continuous predictors, failure to account for overfitting and optimism in model performance, and inappropriate handling of missing data.

Agreements and disagreements with other reviews

The review shows that increasing numbers of breast cancer risk prediction models have been developed over the past 30 years. Most of the models were developed in Caucasian women, which agrees with the systematic review published by Louro et al.55 Compared with that review, we identified more prediction models and used a newly published tool to assess the quality of the included models.

Over the past 10 years, some new variables (such as oral contraceptives, diabetes and alcohol consumption) have been included in prediction models. The increasing inclusion of common genetic variation in prediction models is in accord with Louro et al55 and Anothaisintawee et al.56 However, neither of those reviews included models developed with potential biomarkers such as tumour-associated antigens. By contrast, we included a model developed by Qiu et al51 that incorporated five tumour-associated antigens; it performed well, with a high AUC of 0.943 (95% CI 0.919 to 0.967).

Strengths and limitations of the study

PROBAST was developed through a consensus process involving a group of methodological experts in the field of clinical prediction tools and quality assessment. We used it to assess the quality of the prediction models; it has been widely used in many fields57–60 since its publication.

Despite these strengths, there are four main limitations. First, we did not systematically search grey literature, so some models may not have been identified. Second, quality assessment may be considered subjective, an inherent source of bias in systematic reviews; however, two independent reviewers extracted and assessed the risk prediction models using PROBAST, whose authors provide essentially objective guidelines and explanations. Third, studies reporting only the external validation of existing risk models were not included in the review, although the original developments of these models were covered. For instance, the study describing the original development of the Gail model6 was included in our review, whereas studies reporting only external validations of the Gail model61–64 were not. Fourth, papers on genetically oriented models such as BOADICEA65 66 and BRCAPRO67 were not included because they require testing for rare truncating/pathogenic variants such as BRCA1 and BRCA2, which may be too expensive for use in mass screening of the general population.55

Implication to research and clinical practice

Eleven models19 30–32 37–39 43 45 50 54 selected predictors on the basis of univariable analysis, which contributed to a high risk of bias in the analysis domain and should be avoided. Risk prediction models should include predictors that are well established and clinically credible, regardless of statistical significance,68 69 because some predictors show an important relationship with the outcome only after adjustment for confounding covariates, while others hold no independent predictive power once other covariates are included.13 70

Some models were at high risk of bias in the analysis domain because missing data were handled inappropriately, which may lead to biased associations between risk factors and breast cancer, as well as biased model performance, owing to the selectivity of participants.71 Imputation techniques should therefore be applied when data are missing.72 73
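
As a concrete illustration of this recommendation, the sketch below imputes missing predictor values with scikit-learn's IterativeImputer instead of discarding incomplete records. The toy data and variable names are hypothetical; the cited studies describe the principle rather than this specific implementation.

```python
# Multivariate imputation of missing predictor values (illustrative only).
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                  # e.g. age, BMI, age at menarche
X[rng.random(X.shape) < 0.15] = np.nan         # ~15% of values missing at random

imputer = IterativeImputer(max_iter=10, random_state=0)
X_imputed = imputer.fit_transform(X)           # all missing entries filled in
print(np.isnan(X_imputed).sum())               # -> 0
```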

When developing the risk prediction models, only nine studies included internal validation,22 26 27 33 39 44–47 leaving most models without it. Lack of internal validation may increase the risk of overfitting.74 We therefore suggest that internal validation should be performed before external validation.
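
One common way to carry out such internal validation is bootstrap optimism correction of the AUC, sketched below on simulated data; this is offered as a generic illustration, not as the procedure used by any of the included studies.

```python
# Bootstrap internal validation: estimate the optimism of the apparent AUC
# and subtract it to obtain an optimism-corrected estimate. Simulated data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
y = (rng.random(500) < 1 / (1 + np.exp(-X[:, 0] - 0.5 * X[:, 1]))).astype(int)

model = LogisticRegression().fit(X, y)
apparent_auc = roc_auc_score(y, model.predict_proba(X)[:, 1])

optimism = []
for _ in range(200):                               # bootstrap resamples
    idx = rng.integers(0, len(y), len(y))
    boot = LogisticRegression().fit(X[idx], y[idx])
    auc_boot = roc_auc_score(y[idx], boot.predict_proba(X[idx])[:, 1])
    auc_orig = roc_auc_score(y, boot.predict_proba(X)[:, 1])
    optimism.append(auc_boot - auc_orig)

corrected_auc = apparent_auc - np.mean(optimism)
print(f"apparent {apparent_auc:.3f}, optimism-corrected {corrected_auc:.3f}")
```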

PROBAST was created by many international experts and provides a series of guidelines on model development and validation that can be easily applied and can improve the clinical use of prediction models. The newest and most strongly recommended methodology should therefore be used when a new model is developed or an existing model is updated.

In the light of the results of our review, it is still difficult to recommend any of the models for use in breast cancer screening because of their high risk of bias. Adding variables such as mammographic density or single-nucleotide polymorphisms (SNPs) to risk prediction models can improve model performance and has been well validated in the general population of women of European ancestry.40 75–80 However, models incorporating breast density or SNPs among Asian women are few and urgently needed. Cost-effectiveness should also be considered before a model is applied in clinical practice: even if a model that includes costly risk factors (eg, high-risk genes) performs better, it remains difficult to apply in low-resource areas.81 Furthermore, an existing model should be modified or updated before being used in another population with different characteristics, which may improve its performance.

By 2020, breast cancer incidence had risen to first place worldwide, making it even more crucial to develop breast cancer prediction models for different ethnic groups. In China, many breast cancer screening programmes have been launched. For example, the Rural Women 'Two Cancers' Check Project Management Solutions has covered 31 provinces and 1437 counties since 2009. The Cancer Screening Programme in Urban China, conducted by the National Cancer Centre, has covered 28 provinces and 67 cities since 2012, with more than 4 million people involved and 2 million screened by ultrasound and mammography. These programmes will provide large datasets with which to develop a high-quality breast cancer risk prediction model for the Chinese population and will be of great significance for breast cancer prevention among Asian women.

Conclusions

All 47 models assessed in our review using PROBAST showed a high risk of bias, so no model can be recommended for routine screening programmes. Some new variables, such as oral contraceptives, diabetes and alcohol consumption, have been widely used in prediction models over the past 10 years. Models incorporating mammographic density or SNPs among Asian women are few and urgently needed. It is necessary to develop and validate high-quality breast cancer risk prediction models among different ethnic groups, especially among Asian women.

Data availability statement

All data relevant to the study are included in the article or uploaded as online supplemental information. All data from the current study are presented in the main manuscript, figures, tables and online supplemental material.

Ethics statements

Patient consent for publication

Ethics approval

This study does not involve human participants.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Contributors YZ and JL conceptualised the study and created the first version of the review protocol. ZW, HL, MC, NL and JH critically reviewed the review protocol and approved it. YZ and HL screened eligible articles. YZ extracted the data, supported by ZW and MC. YZ drafted the first version of the manuscript, supported by JL, NL and JH. All authors contributed to data interpretation and critically assessed it. All authors approved the final version of the manuscript. NL was responsible for the overall content as the guarantor.

  • Funding This work was supported by the Non-profit Central Research Institute Fund of Chinese Academy of Medical Sciences (grant number: 2019PT320027).

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.