Original research
Prediction of breast cancer risk among women of the Mariana Islands: the BRISK retrospective case–control study
Yurii B Shvetsov (1), Lynne R Wilkens (1), Kami K White (1), Marie Chong (1), Arielle Buyum (2), Grazyna Badowski (3), Rachael T Leon Guerrero (3), Rachel Novotny (4)

  1. Cancer Center, University of Hawai'i at Mānoa, Honolulu, Hawaii, USA
  2. AB Consulting, LLC, Saipan, Northern Mariana Islands
  3. College of Natural and Applied Sciences, University of Guam, Mangilao, Guam
  4. College of Tropical Agriculture and Human Resources, University of Hawai'i at Mānoa, Honolulu, Hawaii, USA

Correspondence to Dr Yurii B Shvetsov; YShvetso@cc.hawaii.edu

Abstract

Objectives To develop a breast cancer risk prediction model for Chamorro and Filipino women of the Mariana Islands and compare its performance to that of the Breast Cancer Risk Assessment Tool (BCRAT).

Design Case–control study.

Setting Clinics/facilities and other community-based settings on Guam and Saipan (Northern Mariana Islands).

Participants 245 women (87 breast cancer cases and 158 controls) of Chamorro or Filipino ethnicity, age 25–80 years, with no prior history of cancer (other than skin cancer), residing on Guam or Saipan for at least 5 years.

Primary and secondary outcome measures Breast cancer risk models were constructed using combinations of exposures previously identified to affect breast cancer risk in this population, population breast cancer incidence rates and all-cause mortality rates for Guam.

Results Models using ethnic-specific relative risks performed better than those with relative risks estimated from all women. The model with the best performance among both ethnicities (the Breast Cancer Risk Model (BRISK) model; area under the receiver operating characteristic curve (AUC): 0.64 and 0.67 among Chamorros and Filipinos, respectively) included age at menarche, age at first live birth, number of relatives with breast cancer and waist circumference. The 10-year breast cancer risk predicted by the BRISK model was 1.28% for Chamorros and 0.89% for Filipinos. Performance of the BCRAT was modest among both Chamorros (AUC: 0.60) and Filipinos (AUC: 0.55), possibly due to incomplete information on BCRAT risk factors.

Conclusions The ability to develop breast cancer risk models for Mariana Islands women is constrained by the small population size and limited availability of health services and data. Nonetheless, we have demonstrated that breast cancer risk prediction models with adequate discriminatory performance can be built for small populations such as in the Mariana Islands. Anthropometry, in particular waist circumference, was important for estimating breast cancer risk in this population.

  • Breast tumours
  • Epidemiology
  • PUBLIC HEALTH

Data availability statement

Data are available upon reasonable request. The datasets generated and analysed during the current study are not publicly available because they contain protected health information. De-identified datasets are available from the senior author (RN at novotny@hawaii.edu) on reasonable request.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Strengths and limitations of the study

  • The small sample size of this study is a direct consequence of the small population size.

  • Our model construction method is designed to overcome the challenge of small population size.

  • Bootstrap validation was used to minimise optimism bias.

  • Evaluation of model coefficients separately for Chamorro and Filipino women of the Mariana Islands accounted for possible differential effect of model predictors between these two ethnic groups.

Introduction

Breast cancer is the most common cancer among women worldwide.1 It is the second most common cause of cancer mortality among US women2 and has been the leading cause of cancer mortality among women on Guam over the last three decades.3

The Mariana Islands consist of two administrative units: Guam, a US territory, and the Commonwealth of the Northern Mariana Islands (CNMI), which includes the islands of Saipan, Tinian and Rota. The current population of Guam is ethnically mixed,4 with 37% Chamorro, 26% Filipino, 12% other Pacific Islander and 25% other ethnicity. CNMI is also diverse; its ethnic breakdown includes 24% Chamorro, 35% Filipino, 11% other Pacific Islander and 30% other ethnicity.5

While the breast cancer incidence rate on Guam is lower than across the USA, breast cancer mortality among some ethnicities on Guam, especially Chamorros, is higher than among US women.6 During 1998–2002 on Guam, the age-adjusted breast cancer incidence rate among Chamorro women was nearly twice as high as among Filipino women and second only to that among white women (115.9, 60.7 and 148.6 per 100 000 for Chamorro, Filipino and white women, respectively).7 The age-adjusted incidence rate for US women (not including data from the US-affiliated Mariana Islands) during this time was 131 per 100 000 women. Chamorro women also had the highest breast cancer mortality rate on Guam, at 32 per 100 000 women.8 This contrasts with the overall US mortality rate for that time period of 28 per 100 000.

The reasons for higher breast cancer mortality rates, and relatively high incidence rates, among Chamorro Pacific Islanders compared with other ethnic groups in the Mariana Islands are not well understood. The Breast Cancer Risk Model (BRISK) Project was conducted to improve understanding of the risk factors for breast cancer in this region.9

Estimation of a woman’s breast cancer risk is an important tool for risk assessment and stratification in breast cancer screening and prevention efforts. One of the most widely used models for predicting breast cancer risk is the Gail model, developed for white women10 11 and subsequently extended to other racial/ethnic groups such as African American and Asian American women.12 13 This extended model is available as the National Cancer Institute’s Breast Cancer Risk Assessment Tool (BCRAT).14 Although BCRAT includes Filipinos as one of the Asian American ethnicities, it is built from the Filipino population in SEER 9 registries,15 whose age-specific breast cancer incidence rates differ from those for Filipinos on Guam (figure 1). A similar situation exists for Pacific Islanders, where only rates for Native Hawaiians are present in BCRAT. Additionally, BCRAT uses the same risk factors and relative risk estimates for all Asian American ethnicities; however, different breast cancer risk models are needed for adequate risk estimation for women of diverse racial/ethnic backgrounds,16 and while some of the established risk factors are associated with breast cancer risk among Mariana Islands women, others are not.9 Given these considerations, the utility of the BCRAT model for Mariana Islands women is unknown.

Figure 1

Cumulative incidence rates of invasive breast cancer in Guam and the USA, 2000–2009. Sources: (1) Guam Cancer Registry; (2) Hawaii Tumor Registry; (3) Surveillance, Epidemiology and End Results (SEER) 18-registry data.

In the present report, we evaluate performance of the BCRAT model and its modified version among Chamorro and Filipino participants in the BRISK study. In so doing, we propose a method of risk model development for small populations which we use here for the development and internal validation of a new breast cancer risk model for Chamorro and Filipino women of the Mariana Islands.

Methods

BRISK study design and population

BRISK is a retrospective case–control study of mostly Asian and Pacific Islander women living on the Mariana Islands of Guam and Saipan.

A detailed description of the study design and recruitment is provided elsewhere.9 17 Briefly, breast cancer cases and controls were recruited between 2010 and 2013. Breast cancer cases were identified through the Guam Cancer Registry (GCR), CNMI Department of Public Health and health clinics on Guam. Controls were recruited in local clinics/facilities and other community-based settings on Guam and Saipan from among women with mammography screening and were frequency-matched to cases on age, ethnicity and location (Saipan or Guam). Eligibility criteria for all participants were: (1) no prior history of cancer (other than skin cancer); (2) residence on Guam or Saipan for at least 5 years; (3) ability to provide consent for the study and (4) age between 25 and 80 years. An additional eligibility criterion for cases was primary, invasive breast cancer newly diagnosed between 2009 and 2012.

During an interview, participants completed a detailed questionnaire including demographic, anthropometric, behavioural and lifestyle information; personal and family medical history; reproductive history; and acculturation based on a survey used in a multiethnic study.18 19 The reference date for the interview was the diagnosis date for cases and the interview date for controls. In addition, a trained anthropometrist measured current waist circumference (WC), using an inelastic tape measure at the level of the umbilicus,20 as well as weight, height and sitting height. Body mass index (BMI) was calculated as weight in kg divided by the square of height in m (kg/m²). Waist-height ratio (WHtR) was calculated as WC in cm divided by height in cm.
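As a minimal illustration of the two derived anthropometric indices defined above, the following Python sketch computes BMI and WHtR from measured values. The function names and the example measurements are hypothetical and are not taken from the study.

```python
def body_mass_index(weight_kg: float, height_cm: float) -> float:
    """BMI: weight in kg divided by the square of height in m."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def waist_height_ratio(waist_cm: float, height_cm: float) -> float:
    """WHtR: waist circumference in cm divided by height in cm."""
    return waist_cm / height_cm


# Hypothetical participant: 64 kg, 160 cm tall, 88 cm waist.
example_bmi = body_mass_index(64.0, 160.0)
example_whtr = waist_height_ratio(88.0, 160.0)
```

Both indices are unit-sensitive, so the sketch takes height in cm for both functions and converts to metres only inside the BMI calculation.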

Of the 275 cases contacted, 38% agreed to participate, 21% were ineligible and 41% refused due to scheduling conflicts, lack of transportation, family, psychological or cultural reasons, or off-island travel.17 The corresponding percentages for controls were 74%, 20% and 6%. The study included 104 breast cancer cases (83 from Guam and 21 from CNMI) and 185 controls (140 from Guam and 45 from CNMI) between 27 and 80 years of age. A summary ethnicity variable was defined based on each participant’s self-reported composition of her mother’s and father’s ethnicities. The present analysis was limited to participants with a summary ethnicity of Chamorro or Filipino residing on Guam or Saipan (87 cases and 158 controls).

Patient and public involvement

Patients were not involved in the development of the research question, the design of the study, or the recruitment and conduct of the study. However, the study provided funds to the CNMI Public Health mammography programme to expand access and facilitate recruitment. The results were disseminated to study participants through public talks given at the University of Guam.

Breast cancer incidence and all-cause mortality rates

We obtained data from the GCR for all reportable female breast cancer diagnoses (n=576) on Guam for 2000–2009 (online supplemental table S1).17 Since data for CNMI were unavailable, Guam rates were also used to represent Saipan. Average annual age-specific incidence rates for female breast cancer were computed per ethnicity and 5-year age group, using interpolations between the US 2000 and 2010 female census counts for Guam as denominators. All-cause mortality rates were obtained from the Guam Statistical Yearbook.21 Since 2004 was the only year these rates were published, the rates for 2004 were used as a reasonable approximation for the 2000–2009 all-cause mortality rates.
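The rate calculation described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify the type of interpolation, so linear interpolation between the two census counts is assumed here, and all counts in the example are hypothetical rather than Guam data.

```python
def interpolated_population(pop_2000: float, pop_2010: float, year: int) -> float:
    """Linearly interpolate a census count between 2000 and 2010 (assumed method)."""
    fraction = (year - 2000) / 10.0
    return pop_2000 + fraction * (pop_2010 - pop_2000)


def average_annual_rate(cases: int, pop_2000: float, pop_2010: float,
                        first_year: int = 2000, last_year: int = 2009) -> float:
    """Average annual incidence per 100 000 for one ethnicity and age group."""
    person_years = sum(
        interpolated_population(pop_2000, pop_2010, year)
        for year in range(first_year, last_year + 1)
    )
    return cases / person_years * 100_000


# Hypothetical age group: 29 diagnoses over 2000-2009,
# with 1000 women at the 2000 census and 2000 women at the 2010 census.
example_rate = average_annual_rate(29, 1000, 2000)
```

The denominator is the sum of the interpolated yearly counts, so it approximates the person-years at risk over the 10-year window.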

Construction and selection of risk models

We assumed the general form of the Gail model,10 13 22 23 which projects absolute risk of breast cancer over a specified time interval using relative risk estimates for a set of risk factors, population breast cancer incidence rates and all-cause mortality rates.

Risk factors considered for inclusion in the models were those identified in our previous report9 as having a statistically significant (p<0.05) association with breast cancer risk among Guam and Saipan women: age at first live birth (<20 or missing, 20–24, 25–29 or nulliparous, ≥30 years); BMI (<25, 25–29, ≥30); WHtR (≤0.54, 0.55–0.61 or missing, 0.62–0.67, >0.67) and WC (≤89, 90–99.5 or missing, >99.5 cm). Also considered for inclusion were the risk factors included in the original Gail model10 13 although they did not have a statistically significant association with breast cancer risk in our study: age at menarche (<12, 12–13, ≥14 years or missing); first-degree relatives with breast cancer (yes, no) and menopausal status (premenopausal, postmenopausal). As BMI, WHtR and WC were strongly correlated in our study, only one of these three factors was allowed to enter the model at a time. Following the approach of Gail et al,10 for each risk factor, missing values were grouped with the category showing the closest risk of breast cancer to participants with missing values, according to minimally adjusted logistic models.

We constructed and evaluated models that included every combination of the above seven risk factors as main effects (a total of 127 models). For each such combination, the entire dataset was used to estimate ORs for the included risk factors using multivariable unconditional logistic regression, with adjustment for study participants’ age, among both ethnicities combined and separately for Chamorros and Filipinos. Model-based adjusted attributable risk (AR) corresponding to these risk factors was then computed.24 The Hosmer-Lemeshow statistic was computed to assess model fit.

A risk model was constructed using the OR and AR estimates from the logistic model. To assess model performance, a bootstrap validation method was used, whereby a validation subset was randomly selected, containing 50% of breast cancer cases (n=42) and two age- and ethnicity-matched controls per case. The model was applied to all participants in the validation subset to project the absolute risk of breast cancer for a 5-year period preceding the study interview date, and the area under the receiver operating characteristic curve (AUC) statistic was computed.
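To make the Gail-type projection concrete, the sketch below computes absolute risk from a relative risk, an attributable risk, and age-specific incidence and mortality rates. It is a simplified illustration rather than the study's code: it assumes hazards that are constant within each year of age, takes the baseline hazard as the population incidence rate multiplied by (1 − AR), and uses hypothetical rates and function names.

```python
import math


def absolute_risk(age_start, years, relative_risk, attributable_risk,
                  incidence, mortality, band_width=5):
    """Projected probability of breast cancer over `years` starting at `age_start`.

    `incidence` and `mortality` map the lower bound of each age band to an
    annual rate per woman (hypothetical inputs in this sketch).
    """
    risk = 0.0
    survival = 1.0  # probability of being alive and cancer-free so far
    for age in range(age_start, age_start + years):
        band = (age // band_width) * band_width
        # Woman-specific breast cancer hazard: baseline hazard times relative risk.
        h1 = incidence[band] * (1.0 - attributable_risk) * relative_risk
        h2 = mortality[band]  # competing mortality hazard
        total = h1 + h2
        if total > 0.0:
            risk += survival * (h1 / total) * (1.0 - math.exp(-total))
            survival *= math.exp(-total)
    return risk


# Hypothetical annual rates per woman for two 5-year age bands.
incidence = {50: 0.0020, 55: 0.0025}
mortality = {50: 0.0040, 55: 0.0060}
ten_year_risk = absolute_risk(50, 10, relative_risk=1.5, attributable_risk=0.3,
                              incidence=incidence, mortality=mortality)
```

With mortality set to zero and a single constant hazard h, the function reduces to 1 − exp(−h × years), which is a convenient sanity check on the accumulation logic.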

This bootstrap validation step was performed 100 times for each model, and the median AUC was computed. The top performing BRISK model was selected based on the highest median AUC for each ethnicity.
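The repeated validation step can be sketched in Python as below. This is a simplified stand-in, not the procedure exactly as run in the study: controls are drawn at random rather than matched on age and ethnicity, the AUC is computed with the rank-based (Mann-Whitney) definition, and the function names and seed are hypothetical.

```python
import random
import statistics


def auc(case_scores, control_scores):
    """Probability that a random case scores higher than a random control (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def median_validation_auc(case_scores, control_scores, n_runs=100, seed=0):
    """Median AUC over repeated random validation subsets.

    Each subset holds half of the cases and two controls per selected case
    (unmatched here, which is a simplification).
    """
    rng = random.Random(seed)
    n_cases = len(case_scores) // 2
    n_controls = min(2 * n_cases, len(control_scores))
    aucs = []
    for _ in range(n_runs):
        cases = rng.sample(case_scores, n_cases)
        controls = rng.sample(control_scores, n_controls)
        aucs.append(auc(cases, controls))
    return statistics.median(aucs)
```

Here `case_scores` and `control_scores` would be lists of predicted absolute risks for cases and controls; taking the median rather than the mean makes the summary less sensitive to an occasional extreme subset.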

Evaluation of model performance

The final BRISK model was examined for its calibration and discrimination. The median AUC across bootstrap validation steps and its 95% CI were taken as the measure of discriminatory performance of the model. Calibration of the model was assessed by examining the case/control distribution within quintiles of predicted 5-year and 10-year absolute risk across the entire sample. The mean predicted risk of breast cancer was also computed for each quintile. Performance was compared with that of BCRAT.13 As Native Hawaiians are the only Pacific Islander ethnicity represented in BCRAT and are the closest to Chamorros in terms of culture and lifestyle, we used Native Hawaiian incidence and mortality rates when applying BCRAT to Chamorro women. Due to a lack of breast biopsy information in our sample, all women were assumed to have had no breast biopsies, the default value in BCRAT.
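The quintile-based check described above can be illustrated with the short Python sketch below, which tabulates cases, controls and mean predicted risk within each fifth of the risk distribution. The function name, the rule for splitting participants into fifths and the example values are assumptions made for illustration.

```python
def quintile_summary(predicted_risk, is_case):
    """Count cases and controls and average the predicted risk within each risk quintile."""
    n = len(predicted_risk)
    if n < 5 or n != len(is_case):
        raise ValueError("need at least 5 participants and equal-length inputs")
    # Participant indices ordered from lowest to highest predicted risk.
    order = sorted(range(n), key=lambda i: predicted_risk[i])
    summary = []
    for q in range(5):
        members = order[q * n // 5:(q + 1) * n // 5]
        cases = sum(1 for i in members if is_case[i])
        summary.append({
            "quintile": q + 1,
            "cases": cases,
            "controls": len(members) - cases,
            "mean_risk": sum(predicted_risk[i] for i in members) / len(members),
        })
    return summary


# Hypothetical predicted risks for 10 participants; the last four are cases.
risks = [i / 100 for i in range(1, 11)]
status = [False] * 6 + [True] * 4
table = quintile_summary(risks, status)
```

A model that discriminates well should show cases concentrated in the upper quintiles and controls in the lower ones, which is the pattern examined in the case/control distribution.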

Additionally, to examine whether calibrating the BCRAT model to the Guam breast cancer incidence rates would improve its performance, we modified the BCRAT model by replacing incidence and mortality rates with those for Filipino and Chamorro women on Guam, while retaining risk factors and their relative risk estimates specified in the BCRAT; this modified model is referred to as BCRAT-G.

Results

The demographic, lifestyle and reproductive characteristics of the study participants included in the present analysis (n=245) are summarised in table 1. Briefly, the largest age group among both cases and controls was 50–59 years. One third of the participants (33%) were of Filipino ethnicity; the rest were Chamorro. The ethnic composition was similar among cases and controls by design, although the proportion of cases was somewhat higher among Filipino than Chamorro women (43% and 32% cases, respectively). Cases and controls had similar proportions of women who were ever pregnant, premenopausal, parous or had ever breastfed, but differed somewhat in BMI, WC, WHtR, alcohol consumption and smoking.

Table 1

Characteristics* of breast cancer cases and controls among Chamorro and Filipino women of the Mariana Islands in the BRISK study

The composition of the top BRISK model and its performance are summarised in table 2. The model included separate relative risk estimates among Chamorros and Filipinos for the included risk factors: age at menarche, age at first live birth and the number of first-degree relatives with breast cancer for both ethnicities, and additionally WC for Filipino women. The AUCs among Chamorros and Filipinos, respectively, were 0.64 and 0.67, based on the median across 100 validation runs.

Table 2

Performance of the BRISK model and BCRAT among Mariana Islands women in the BRISK study.

The BRISK model classified more cases than controls into the highest risk stratum and more controls than cases into the lowest risk stratum among both ethnicities (figures 2 and 3), which indicates good performance in terms of case/control distribution. Using case and control data, the BRISK model predicted a median 10-year absolute risk of breast cancer of 1.28% for Chamorro women and 0.89% for Filipino women.

Figure 2

Classification of breast cancer cases and controls into risk strata among Chamorro women in the BRISK study: the BRISK Model, BCRAT and BCRAT-G. (A) Frequency of cases and controls by quintile of predicted risk. (B) Mean predicted 5-year risk by quintile of predicted risk. (C) Mean predicted 10-year risk by quintile of predicted risk. BCRAT, Breast Cancer Risk Assessment Tool; BRISK, Breast Cancer Risk Model.

Figure 3

Classification of breast cancer cases and controls into risk strata among Filipino women in the BRISK study: the BRISK Model, BCRAT and BCRAT-G. (A) Frequency of cases and controls by quintile of predicted risk. (B) Mean predicted 5-year risk by quintile of predicted risk. (C) Mean predicted 10-year risk by quintile of predicted risk. BCRAT, Breast Cancer Risk Assessment Tool; BRISK, Breast Cancer Risk Model.

The unmodified BCRAT and the modified BCRAT-G model exhibited similar performance among Chamorros (AUC: 0.60 and 0.59, respectively) while BCRAT performed non-significantly better than BCRAT-G among Filipinos (AUC: 0.55 and 0.51, respectively; table 2). Both models performed better among Chamorros than among Filipinos. Both BCRAT and BCRAT-G classified more controls than cases into the lower risk stratum among Filipinos, but not among Chamorros (figures 2 and 3). Both BCRAT and BCRAT-G classified more cases than controls into the higher risk stratum among both Chamorros and Filipinos.

Discussion

To our knowledge, this is the first study to test existing, and develop new, breast cancer risk models in a small, isolated population such as that of the Mariana Islands and in Pacific Islander populations other than Native Hawaiians. Developing or validating cancer risk models for populations such as that of the Mariana Islands is challenging. Due to its unique ethnic composition and lifestyle, this population may be subject to unique risk factors not affecting other populations. The small population size places a natural restriction on the sample size of any epidemiologic study and reduces statistical power for potential model development. The population’s geographic isolation results in the absence of sufficiently large comparable populations for external model validation.

A key challenge in our study was its small sample size, largely precipitated by the small size of the target population and newly emerging breast cancer registries. It is generally recommended that any new risk prediction model should include internal validation, either as bootstrap validation or using training and validation subsets.25 26 As splitting a small dataset into training and validation parts would cause instability in the relative risk estimates and consequently in the resulting model, we have implemented a bootstrap validation procedure and used the entire dataset for parameter estimation. Our method produced a model that performed reasonably well, with AUC of 0.64–0.67 comparable to the AUC range of 0.53–0.68 for other published models.27 28 We also found that performance of the BCRAT model was modest among Chamorro and Filipino women in our study, with AUCs not exceeding 0.60. The poor performance of BCRAT-G indicates that replacing population incidence and mortality curves with those from the target population did not improve model performance.

There are several possible reasons that could explain the observed differences in model performance. First, in addition to the established risk factors in the Gail model, only the risk factors that exhibited significant associations with breast cancer risk in BRISK were considered for inclusion in the development of the model. Including risk factors not significantly associated with the outcome may cause model overfitting,29 which may in turn bias the predicted absolute risk. In our previous report,9 no significant association between breast cancer risk and a number of known risk factors was found, but significant effects of several anthropometric factors such as BMI, WC and WHtR were observed on the risk of breast cancer. This may indicate a unique risk profile for this population or minimal variation in the known risk factors.

The BRISK model uses separate relative risk estimates by ethnicity with an additional risk factor (WC) for Filipinos; the model using joint estimates did not perform as well. This indicates that Chamorros and Filipinos have different breast cancer risk profiles, which should be taken into account in risk prediction models. The BRISK model included anthropometrics in the form of WC, which reinforces the need to consider anthropometric measures in breast cancer risk models. Body size is dramatically different among the Asian and Pacific Islander residents in the Mariana Islands, with Filipino women generally having smaller body size than Chamorro women.30 31 BMI and central obesity have been found to be associated with higher breast cancer risk among Asian women,32–34 and studies have demonstrated that the addition of body size variables improves prediction of breast cancer risk.35 The inclusion of WC for Filipinos only may have to do with the issue of differing body sizes and excess overweight/obesity rates among Chamorros, thus diminishing the predictive value of body size for breast cancer in this ethnic group.

The BRISK model included 3–4 risk factors out of seven considered for inclusion. It has been suggested that the complexity threshold for a risk prediction model is 20 cases per model parameter.25 29 Exceeding this threshold in terms of the number of model parameters increases the danger of overfitting. In our study, with 87 breast cancer cases, the optimal number of model parameters is <5, which is evidenced in the final model. Applying a similar method of model selection and validation to a larger dataset may have resulted in a model with more parameters.
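The arithmetic behind this threshold is simple enough to state in code. The sketch below applies the 20-cases-per-parameter rule of thumb quoted above to the 87 cases in this analysis; the function name is hypothetical.

```python
def max_parameters(n_cases: int, cases_per_parameter: int = 20) -> int:
    """Largest whole number of model parameters supported by the rule of thumb."""
    return n_cases // cases_per_parameter


# 87 cases at 20 cases per parameter allow at most 4 parameters (87 / 20 = 4.35).
brisk_limit = max_parameters(87)
```

This is consistent with a final model of three to four risk factors and with the statement that the optimal number of parameters is fewer than five.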

A recent focus of the breast cancer risk model improvement efforts has been examination of modifiable risk factors and their impact on predicted breast cancer risk.36 The BRISK model includes WC, a modifiable factor. This opens the possibility of the model being used as a supplemental health assessment tool in health behaviour interventions, providing additional motivation for adoption of a healthier lifestyle that could decrease WC. As all predictors in the model can be collected from a patient questionnaire, the model can easily be implemented in most clinic settings including local clinics.

Limitations of our study include the small sample size as noted above, which may have prevented us from detecting important risk factors and, combined with the limited response rate, may limit generalisability of the findings. The failure to detect some expected associations (and thus to include the corresponding risk factors in the models) may also be due to the small sample size and lack of variability of some exposures in the study sample. The information on risk exposures was limited; in particular, performance of the BCRAT model could have been affected by the lack of information on breast biopsies in our study. The Guam breast cancer incidence rates covered a 10-year period and, thus, can be deemed reliable; however, the all-cause mortality rates in our study are based on 1 year and thus may not be sufficiently stable. No CNMI breast cancer incidence or mortality rates were available, so these had to be approximated by the Guam rates. Although the ethnic composition of CNMI is similar to that of Guam, it is possible that, due to differing lifestyle and poorer access to healthcare among the CNMI population, the breast cancer incidence and overall mortality rates in CNMI differ from those on Guam. However, since the majority of study participants were from Guam, this potential difference in rates was unlikely to have a substantial impact on our results. Nonetheless, efforts are needed to collect and disseminate data on cancer incidence and mortality rates for CNMI, which would allow researchers to improve study results.

Because BRISK was a case–control study, we were unable to assess model calibration to population incidence rates, although we examined the internal calibration of the model. We note, however, that the AUC-based comparison of models is robust to mis-calibration23 and thus is a valid method in our study. We were also unable to perform external validation of the BRISK model, which is challenging given the unique nature and small size of this population, and remains a topic for future studies. Finally, AUCs based on the same dataset used for model construction may be overly optimistic.29 We used the bootstrap validation method to minimise the optimism bias, although some of it may still persist. Despite these limitations, our model construction method has produced a reasonably well performing breast cancer risk model for Chamorro and Filipino women of the Mariana Islands, and the first and only model for this population.

Conclusions

We have demonstrated that breast cancer risk prediction models with adequate discriminatory performance can be built for small populations such as the Mariana Islands. The proposed model has the potential of being useful as a supplemental tool for risk assessment and stratification in breast cancer screening and prevention in the Mariana Islands, but needs further refinement on larger samples of women and external validation on comparable Pacific Island populations.

Ethics statements

Patient consent for publication

Ethics approval

This study involves human participants and was approved by Institutional Review Boards at the University of Guam (approval ref. number 0982) and the University of Hawaii (approval ref. number 17796). Participants gave informed consent to participate in the study before taking part.

Acknowledgments

We thank Michelle Blas-Laguana, Ashley Yamanaka and Frances Santos-Hofschneider for recruiting and interviewing the BRISK Project participants, and the staff of Guam Radiology Consultants, FHP Clinic and Guam Seventh-Day Adventist Clinic for their assistance in the recruitment of participants. We also thank the participants on Guam and Saipan who volunteered to take part in the BRISK study.

Footnotes

  • Contributors YBS conducted the primary statistical analysis, had primary responsibility for the final manuscript and is responsible for the overall manuscript as the guarantor. RTLG and RN led study concept and design. LW ensured integrity and accuracy of the study data. AB led data collection in Saipan. YBS, LW, KKW, MC, GB contributed to statistical analysis. YBS, LW, KKW, RTLG, RN interpreted the results and wrote the manuscript. YBS, LW, KKW, AB, RTLG, RN reviewed and approved the final manuscript.

  • Funding This work was supported by the US National Cancer Institute, Comprehensive Partnerships to Reduce Cancer Health Disparities grants U54-CA143727 and U54-CA-143738 and by the US National Cancer Institute grant R21-CA-220080.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.
