Abstract
Objectives Women in mid-life often develop chronic conditions and experience declines in physical health and function. Identifying factors associated with declines provides opportunity for targeted interventions. We derived and externally validated a risk score for clinically important declines over 10 years among women ages 55–65 using the Physical Component Summary Score (PCS) of the SF-36.
Design Derivation and validation of a risk score.
Setting Two longitudinal cohorts from sites in the USA were used.
Participants Women from the Study of Women’s Health Across the Nation (SWAN) and women from the Women’s Health Initiative (WHI) Observational Study and/or clinical trials.
Outcome measures A clinically important decline over 10 years among women ages 55–65, assessed using the PCS of the SF-36. Predictors were measured at the beginning of the 10 years of follow-up.
Results Seven factors—lower educational attainment, smoking, higher body mass index, history of cardiovascular disease, history of osteoarthritis, depressive symptoms and baseline PCS level—were found to be significant predictors of PCS decline among women in SWAN with an area under the curve (AUC)=0.71 and a Brier Score=0.14. The same factors were associated with a decline in PCS in WHI with an AUC=0.64 and a Brier Score=0.18. Regression coefficients from the SWAN analysis were used to estimate risk scores for PCS decline in both cohorts. Using a threshold of a 30% probability of a significant decline, the risk score created a binary test with a specificity between 89%–93% and an accuracy of 73%–79%.
Conclusions Seven clinical variables were used to create a valid risk score for PCS declines that was replicated in an external cohort. The risk score provides a method for identifying women at high risk for a significant mid-life PCS decline.
- EPIDEMIOLOGY
- GERIATRIC MEDICINE
- GENERAL MEDICINE (see Internal Medicine)
Data availability statement
Data are available on reasonable request.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
STRENGTHS AND LIMITATIONS OF THIS STUDY
These cohorts only include women, so application of the risk score in men requires testing in other cohorts.
The definition of variables was not identical across the two cohorts used for derivation and validation.
The large representative cohorts of women were both followed for at least a decade.
The methods employed were rigorous and follow standard reporting recommendations.
The variables included in the risk score are all easily measured, and the outcome measure has been used widely in various studies of longitudinal health.
Introduction
The mid-life is a period when physical health and function often begin to decline1 2 and may provide a window of opportunity to prevent future declines.3 Research suggests that patient characteristics, including sociodemographics, anthropometrics, health behaviours and the development of comorbid conditions, are associated with health and functioning during older adult years.2 4 In prior literature, higher body mass index (BMI), lower physical activity levels, sleep problems and older age were identified as potential correlates of future physical function declines in women.2 3 5
Research in this area may help identify epidemiological factors associated with physical health status, as well as facilitate clinical interventions directed at women during the mid-life. Although there are multimodal interventions targeting specific behaviours, such as exercise or cognitive training,1 6 more effective strategies may be developed that focus on a high-risk group. Risk scores provide a method for identifying high-risk groups and are widely used in clinical care. These scores help predict future cardiovascular events,7 future fractures8 and survival among hospitalised patients with COVID-19.9
Developing a clinical risk score typically begins with consideration of covariates from epidemiological models.10 After identifying potential correlates of future health status, regression coefficients for each individual variable can be used to derive a scoring system that allows for easier grouping of populations into different risk categories.11 A vital aspect of risk score development is external validation.12 Thus, after a risk score has been derived in one population, it is imperative to validate it externally in a different population to ensure its portability. These methods have been formalised and applied to clinical epidemiological data.13
We developed epidemiological models predicting clinically relevant declines in physical health and function, using the Physical Component Summary (PCS) score of the 36-Item Short Form Survey (SF-36), among women in their mid-life using data from the Study of Women’s Health Across the Nation (SWAN).2 14 These regression models had strong model fit characteristics when assessing correlates of declines in physical health and function as assessed by the PCS score.14 We are not aware of any existing clinical risk score for mid-life declines in physical health and function; the current analyses therefore aimed to derive and externally validate such a risk score.
Methods
Study design and study populations
We aimed to derive and externally validate a risk score for predicting clinically important declines in physical health and function during women’s mid-life. The risk score derivation is based on epidemiological models developed using data from SWAN.2 14 Several modelling exercises examined predictors of declines in PCS; these predictors were then tested in the Women’s Health Initiative (WHI) cohort and model fit was assessed. The regression coefficients from SWAN were used to calculate risk scores separately in SWAN and WHI. Thresholds for the risk score were assessed to determine whether the risk score could be used as a binary test for PCS declines among women in the mid-life.
We focused on 10-year declines in PCS between ages 55 and 65. Thus, women included in these analyses were required to have a PCS measured within 3 years of age 55 and again within 3 years of age 65. Women with less than 9 years between the measurements were excluded, but no other exclusion criteria were applied.
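The eligibility rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the `PcsVisit` record, the function name and the choice of the follow-up visit closest to age 65 are assumptions, as is treating "within 3 years" as inclusive. Taking the first visit in the baseline window follows the definition of 'baseline' given under Risk factors.

```python
from dataclasses import dataclass


@dataclass
class PcsVisit:
    """Hypothetical record of one study visit."""
    age: float  # age in years at the visit
    pcs: float  # SF-36 Physical Component Summary score


def select_visit_pair(visits, window=3.0, min_gap=9.0):
    """Return (baseline, follow_up) visits, or None if the woman is ineligible.

    Eligibility: one PCS within `window` years of age 55, another within
    `window` years of age 65, and at least `min_gap` years between them.
    """
    near_55 = sorted((v for v in visits if abs(v.age - 55) <= window),
                     key=lambda v: v.age)
    if not near_55:
        return None
    baseline = near_55[0]  # first visit in the baseline window

    # Assumption: among follow-up visits far enough from baseline,
    # pick the one closest to age 65.
    candidates = [v for v in visits
                  if abs(v.age - 65) <= window and v.age - baseline.age >= min_gap]
    if not candidates:
        return None
    follow_up = min(candidates, key=lambda v: abs(v.age - 65))
    return baseline, follow_up
```

A woman with visits at ages 53 and 64 would be retained under this sketch, whereas one with visits at ages 57.5 and 62.5 would be excluded because the two measurements are only 5 years apart.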
SWAN is a longitudinal cohort of 3302 women enrolled at seven geographically diverse sites in the USA. Details of the recruitment and study design have been previously reported.15 At enrolment, women were between ages 42 and 52 years and were premenopausal or perimenopausal. Each site intentionally enrolled a specific racial and/or ethnic minority population for approximately half of their sample and White non-Hispanic women for the other half to create a more representative and balanced cohort. One site that targeted enrolment to Hispanic women was closed during study follow-up, and women from this site (both Hispanic and non-Hispanic) were excluded from analysis. Women in SWAN have been followed on a nearly annual basis since 1996 (baseline).
The WHI is a large, prospective study investigating major determinants of chronic diseases among postmenopausal women.16–18 Briefly, a total of 161 808 postmenopausal women ages 50–79 years were enrolled between 1993 and 1998 from 40 clinical centres in the USA. Women participated in the observational study and/or one or more of three clinical trials.16 For this study, women were followed through 2021.16
Women in SWAN and WHI provided informed consent to have their information used in a deidentified manner.
Patient and public involvement
None. While women participants are involved in the overall design of SWAN and WHI, they were not involved in these secondary analyses.
Outcomes
The main outcome of the risk score was defined as a clinically important decline in physical health and function, based on the PCS of the SF-36.19 The PCS is a weighted score that focuses on physical health and function as opposed to mental health or other domains of health and quality of life.20 In prior work, a decline of 7–8 points was found to be clinically important in patients who had sustained cervical injuries.21 22 Based on this literature, we used an 8-point decline as a consensus measure of the minimally clinically important difference for this study.21 22 We specifically examined the period between ages 55 and 65 as a critical period during the mid-life. This 10-year period was chosen because it allows for the consideration of interventions to modify risk factors, whereas a shorter period may not provide enough time for an intervention.
Risk factors
As with any risk score that may have clinical value, we focused on variables that were relatively easy to collect in a clinical setting. We were interested in variables that were available to us in both the SWAN and WHI data at or within 3 years of age 55; the first visit in this range was considered ‘baseline’ in these analyses.
In a prior study using data from SWAN, we assessed a broad list of over 30 variables as potential predictors of PCS declines.14 These variables included sociodemographics (age, race, ethnicity, educational attainment, marital status), anthropometric measures (BMI), lifestyle factors (alcohol use, tobacco use, physical activity), clinical conditions (menopausal status, thyroid disease, osteoarthritis, osteoporosis, diabetes, cardiovascular disease (CVD) (including myocardial infarction, stroke or angina), hypertension, hyperlipidaemia, cancer, venous thromboembolic disease and depressive symptoms as defined by the Center for Epidemiologic Studies Depression Scale (CES-D)23) and several laboratory measures.14
We identified seven significant variables in the prior SWAN epidemiological analyses; these include six risk factors (lower educational attainment, smoking, higher BMI, history of CVD, history of osteoarthritis, depressive symptoms) and one adjustment variable (baseline PCS level).14 These variables were identified in WHI, but there were differences in how some questions were asked (see online supplemental methods). In both cohorts: BMI was calculated using height and weight measurements; smoking status included never smoker, current smoker or past smoker; and educational attainment was categorised as less than college versus some college or greater. Depressive symptoms were collected from the full CES-D in SWAN23 and the shortened CES-D screening instrument (range 0–1) in WHI24; we used a cut-off of 0.06 for the shortened CES-D to indicate depressive symptoms. Osteoarthritis was self-reported in SWAN; in WHI, it was defined as a woman reporting arthritis but not rheumatoid arthritis, or reporting ever having had a hip or other joint replacement. CVD was self-reported in SWAN, including stroke or myocardial infarction. In WHI, CVD was defined as ever having a doctor-diagnosed stroke or myocardial infarction. Adjudicated stroke and myocardial infarction were also counted as CVD in both cohorts.
Statistical analyses
We first compared relevant characteristics across women in the two cohorts. The seven variables identified in prior work in SWAN (greater BMI, lower educational attainment, smoking, depressive symptoms, history of CVD, history of osteoarthritis and higher baseline PCS level)14 were entered into a logistic regression model that included the eligible women from WHI, where the dependent variable was the 10-year decline in PCS. Model fit statistics were assessed in both the SWAN and WHI models, including Akaike information criterion (AIC), Bayesian information criterion (BIC) and area under the receiver operating characteristic curve (AUC).25
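For readers less familiar with the AUC, it can be read as the probability that a randomly chosen woman who declined received a higher predicted probability than a randomly chosen woman who did not. The following Python sketch computes it directly from that definition; it is illustrative only (the analyses themselves were run in SAS and R), and the function name is an assumption.

```python
def auc(predicted, observed):
    """Area under the ROC curve via pairwise comparison.

    predicted: predicted probabilities of the outcome
    observed:  0/1 indicators of whether the outcome occurred
    Ties between a case and a non-case count as half a concordant pair.
    """
    cases = [p for p, y in zip(predicted, observed) if y == 1]
    non_cases = [p for p, y in zip(predicted, observed) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for p_case in cases:
        for p_non in non_cases:
            if p_case > p_non:
                concordant += 1.0
            elif p_case == p_non:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of women with and without the outcome.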
The coefficients from the SWAN logistic regression were then used to derive a risk score for the outcome of interest. Each woman in SWAN was assigned a score by multiplying her value for each risk factor by the corresponding regression coefficient and summing the results with the intercept:
PCS Decline Risk Score = logit(x) = −7.90096 + 0.07153 × (baseline PCS) + 0.06494 × (BMI) + 0.62221 × (current smoker) + 0.24087 × (past smoker) + 1.05925 × (CVD) + 0.33326 × (osteoarthritis) + 0.68612 × (CES-D) + 0.68367 × (less than college education).
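As an illustration, the equation can be evaluated and converted to a predicted probability with the inverse logit. The sketch below uses the published coefficients; the function and argument names are assumptions, as is the coding of the binary variables as 0/1 indicators (with depressive symptoms coded 1 when the cohort-specific CES-D cut-off is met) and BMI in kg/m2.

```python
import math

INTERCEPT = -7.90096
COEFFICIENTS = {
    "baseline_pcs": 0.07153,
    "bmi": 0.06494,
    "current_smoker": 0.62221,
    "past_smoker": 0.24087,
    "cvd": 1.05925,
    "osteoarthritis": 0.33326,
    "depressive_symptoms": 0.68612,
    "less_than_college": 0.68367,
}


def risk_logit(**values):
    """Linear predictor (log-odds) of a 10-year PCS decline.

    Expects one keyword argument per key in COEFFICIENTS; binary
    variables are assumed to be coded 0 (absent) or 1 (present).
    """
    missing = set(COEFFICIENTS) - set(values)
    if missing:
        raise ValueError(f"missing predictors: {sorted(missing)}")
    return INTERCEPT + sum(COEFFICIENTS[name] * values[name]
                           for name in COEFFICIENTS)


def risk_probability(**values):
    """Predicted probability of decline: inverse logit of the risk score."""
    return 1.0 / (1.0 + math.exp(-risk_logit(**values)))


# Hypothetical woman: baseline PCS 50, BMI 27, no other risk factors.
example = risk_probability(
    baseline_pcs=50, bmi=27, current_smoker=0, past_smoker=0,
    cvd=0, osteoarthritis=0, depressive_symptoms=0, less_than_college=0,
)
```

For this hypothetical woman the predicted probability is roughly 0.07; setting `current_smoker=1` raises it, as expected from the positive coefficient.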
The distribution of scores was assessed and the observed risk of a 10-year decline was compared with the predicted probability of 10-year decline. As noted above, we assessed the model fit of the logistic regression and applied the risk score derived in SWAN to the women in WHI. The calibration of the risk score in SWAN and WHI was assessed by plotting the observed versus the predicted per cent of women with the outcome of interest. In addition, the Brier Score for model calibration was assessed; this score ranges from 0 to 1.0 and is a measure of the accuracy of a predicted probability, with scores less than 0.25 demonstrating a predictive model.25
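The Brier Score is the mean squared difference between each predicted probability and the observed 0/1 outcome. A minimal Python sketch of the score and of a binned observed-versus-predicted table follows; the function names and the use of equal-width bins are assumptions, not a description of the plots in this paper.

```python
def brier_score(predicted, observed):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(predicted)


def calibration_table(predicted, observed, n_bins=10):
    """Rows of (mean predicted, observed proportion, n) per non-empty bin.

    Bins are equal-width intervals of predicted probability (an assumption).
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, observed):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))

    rows = []
    for members in bins:
        if members:
            mean_predicted = sum(p for p, _ in members) / len(members)
            observed_rate = sum(y for _, y in members) / len(members)
            rows.append((mean_predicted, observed_rate, len(members)))
    return rows
```

Predicting 0.5 for everyone yields a Brier Score of exactly 0.25 whatever the outcomes, which is why values below 0.25 are taken to indicate that a model carries predictive information.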
Risk score categories were created based on a priori probability thresholds; these were determined based on clinical opinion and not through formal methods. These categories denote the probability of the outcome, that is, 10-year decline in PCS of at least 8 points. Categories included: very low (0%–5%), low (>5%–15%), medium (>15%–30%), high (>30%–50%) and very high (>50%). The number of women in each category and their observed outcomes were assessed across SWAN and WHI. Based on the distributions of women in each category, we further simplified the risk score into two categories: 0%–30% and >30%. We tested this simplified risk score’s performance characteristics in SWAN and WHI. Other thresholds were tested in sensitivity analyses.
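The category boundaries above translate into a simple lookup. The sketch below is illustrative; the function names are assumptions, and boundary values are assigned to the lower category, following the ">5%", ">15%", ">30%" and ">50%" notation in the text.

```python
def risk_category(probability):
    """Map a predicted probability of decline to one of five categories."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability <= 0.05:
        return "very low"
    if probability <= 0.15:
        return "low"
    if probability <= 0.30:
        return "medium"
    if probability <= 0.50:
        return "high"
    return "very high"


def is_high_risk(probability, threshold=0.30):
    """Simplified two-category score: True when probability exceeds the threshold."""
    return probability > threshold
```

Changing `threshold` reproduces the kind of alternative cut-offs examined in the sensitivity analyses.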
Missing data were imputed using prior values where available. All analyses were conducted in SAS V.9.4 and R.
Results
We first compared the populations of women from SWAN (n=1084) and WHI (n=2535) for characteristics at the analytic baseline (at or around age 55 years) (see table 1). The women were similar in age, but they had different racial/ethnic make-up. SWAN included broader diversity with 24% black, 24% Chinese or Japanese, and 52% white women. WHI had 1.4% Asian or Pacific Islander, 11% black, and 86% white women. Educational attainment was higher in WHI with 49% describing college or beyond vs 26% in SWAN. BMI was slightly lower in SWAN than in WHI (27.0 vs 27.8 kg/m2). Current smoking was almost identical between the two cohorts (9.9%–10%), but alcohol use was more frequent in WHI with 38% describing 1 or more drinks per week vs 26.6% in SWAN. All measured comorbid health conditions were reported more frequently in SWAN than WHI.
The distribution of the change in PCS scores over 10 years was similar across the two cohorts. The median change in PCS was −1.0 (IQR −6.1 to 2.5) points in SWAN and −2.7 (IQR −8.5 to 1.3) in WHI. Among the groups that had a clinically important decline in physical health and function (ie, at least 8-point decline in PCS), the median change was −14.2 (IQR −19.0 to −10.6) in SWAN and −14.1 (IQR −19.8 to −10.5) in WHI. Importantly, for the assessment of the risk score, 19% of women in SWAN (n=206) and 26% of those in WHI (n=670) had a decline of at least 8 points over 10 years of follow-up.
The seven variables (six risk factors: lower educational attainment, smoking, higher BMI, history of CVD, history of osteoarthritis, depressive symptoms; and one adjustment variable: baseline PCS level) from a logistic regression model developed in SWAN14 were then tested in WHI (see table 2). The variables had similar ORs for 10-year decline in PCS, with moderate areas under the curve (AUCs) in both models: AUC 0.71 in SWAN and 0.64 in WHI. Model fit statistics were good and showed better fit for WHI with lower AIC and BIC. Calibration curves (predicted vs observed outcomes) for SWAN and WHI show results aligning with the diagonal (see figure 1). While the predicted probabilities were underestimates of the observed probabilities in WHI, the Brier Scores for both cohorts were in the excellent calibration range: 0.14 in SWAN and 0.18 in WHI.
The regression coefficients from SWAN were then used to calculate a risk score for PCS declines in both SWAN and WHI (see the Methods for risk score equation). As noted in figure 2, the distributions of the PCS decline risk score were very similar in both cohorts; for SWAN, median and IQR was 0.16 (0.10–0.25), and for WHI, 0.13 (0.089–0.20). Risk score categories were created based on a priori thresholds: very low (0% to 5%), low (>5% to 15%), medium (>15% to 30%), high (>30% to 50%) and very high (>50%). There were similar increases in the percentage of women who had significant declines with increasing risk scores (see online supplemental table 1). We further simplified the risk score to 0%–30% (ie, very low to medium) and >30% (ie, high to very high) probability of a clinically important decline. Using this threshold, the risk score created a binary clinical test with a specificity between 89% and 93% (depending on the cohort) and an overall accuracy of 73%–79% (see table 3) across the two cohorts. We also tested other thresholds in sensitivity analyses; these gave similar results (see online supplemental table 2).
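The performance characteristics of a binary test at a given threshold follow from a two-by-two table of flagged versus observed outcomes. The sketch below shows the arithmetic; the function name is an assumption, and any values passed to it would be the reader's own data, since individual-level results are not reproduced here.

```python
def binary_test_metrics(predicted, observed, threshold=0.30):
    """Sensitivity, specificity and accuracy of 'probability > threshold'.

    predicted: predicted probabilities of decline
    observed:  0/1 indicators of a clinically important decline
    """
    tp = fp = tn = fn = 0
    for p, y in zip(predicted, observed):
        flagged = p > threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 0:
            tn += 1
        else:
            fn += 1

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one woman with and one without the outcome")
    return {
        "sensitivity": tp / (tp + fn),  # declines correctly flagged
        "specificity": tn / (tn + fp),  # non-declines correctly not flagged
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```

A high threshold such as 30% tends to favour specificity over sensitivity, because fewer women are flagged overall.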
Discussion
Patient-reported outcomes, such as the SF-36, are being used more commonly for clinical decision-making. Individual changes on the SF-36 can help inform and target interventions for at-risk individuals. The mid-life is an important time in the lifespan, as adverse health trajectories often begin during this phase of life.5 26 We used information from two large US cohorts of women, SWAN and WHI, to develop and validate a risk score for clinically meaningful declines in the PCS of the SF-36. We found that the set of predictors developed in SWAN (six risk factors: lower educational attainment, smoking, higher BMI, history of CVD, history of osteoarthritis, depressive symptoms; and one adjustment variable: baseline PCS level) also identified women in WHI with an increased probability of significant declines. The regression coefficients from the model derived in SWAN allowed us to estimate a risk score in both cohorts that was shown to be potentially useful as a binary clinical test to predict future decline in women.
This risk score has several potential implications. First, the variables in the risk score may help women and their clinicians predict future health status. Accurate prediction of functional health and functional declines can motivate behaviour change. Second, several of the predictors could be the targets of strategies to impact health trajectories, including depressive symptoms, BMI and tobacco use. Finally, by identifying women in the mid-life at an increased risk of health declines, the risk score can provide a target population for future tests of various interventions and/or programmes; such interventions may be provider-based or health system-based and should focus on slowing health and functional declines during the mid-life.
Most interventions in healthcare focus on treatments for specific conditions, for example, heart disease, diabetes or arthritis. However, some types of general interventions have demonstrated benefits, although inconsistently, in reducing declines in health and functional status. A large meta-analysis of randomised controlled trials testing multidimensional home visits in the elderly demonstrated their benefit in reducing disability.27 Trials of home-based group exercises have demonstrated improvements in physical function, balance and muscle strength.28 Several of the predictors we identified are clearly independent of the PCS, but others are more directly related (eg, baseline PCS, depressive symptoms); this may have implications for designing interventions.
The clinical risk score development we describe could have applications beyond this specific example. At a time when biomedical research is increasingly searching for biomarkers (eg, proteomic, genomic or transcriptomic) that might correlate with a given phenotype, it is important for clinical researchers to search for easily accessible clinical variables that may provide similar insights. Other clinical risk scores are widely used in current clinical practice, including the American College of Cardiology/American Heart Association (ACC/AHA) cardiovascular risk score29 and the FRAX osteoporosis risk score.8
Limitations of the current study include good, but not excellent, model fit. The AUCs are adequate but there may be other variables that could improve model fit, especially in WHI. While we tested a wide variety of variables in SWAN as the derivation cohort,14 other variables in other cohorts (eg, laboratory measures, muscle mass, exercise, dietary factors and urban vs rural location) might be useful to improve model fit. Furthermore, the definition of variables was not identical across SWAN and WHI (see online supplemental methods). We anticipate better model fit in WHI if the definitions were more similar, but the variation reflects what would be expected in typical practice. The attempt to create a binary risk score with a single threshold was pursued to improve the clinical application of the risk score. The sensitivity of the current binary threshold is suboptimal. While we considered other thresholds (see online supplemental table 2), the performance characteristics were similar to the original 30% threshold. These cohorts only include women who survived 10 years between ages 55 and 65 (<5% of women died during this period), so application of the risk score in men and in women with a limited life expectancy requires testing in other cohorts. Finally, the 8-point threshold for the minimally clinically important difference (MCID) in the PCS was based on consensus review of prior literature.21 22 However, this is not the ideal method for determining an MCID. Guidelines have been set out by a consensus group of experts in clinimetrics, referring to a minimally important difference; they acknowledge that many different statistical methods are feasible but are unable to agree on one preferred method.30
Strengths of the current study are the large representative cohorts of women, followed for at least a decade. While the cohorts were similar, their differences were important and suggest that the risk score should work across populations of diverse race, ethnicity and educational attainment. The methods employed were rigorous and follow standard reporting recommendations.13 The variables included in the risk score are all easily measured. The PCS has been used widely in various studies of longitudinal health, and changes in PCS have been examined in different populations.31–33
In summary, seven easy-to-obtain clinical factors (six risk factors: lower educational attainment, smoking, higher BMI, history of CVD, history of osteoarthritis, depressive symptoms; and one adjustment factor: baseline PCS level) can be used to predict which women in the mid-life are more or less likely to have significant declines over the next decade. These risk factors have been organised into a simple risk score that works well in two large diverse cohorts of women from the USA. The performance of the risk score as a diagnostic test suggests that it might be useful for identifying women at elevated risk for significant declines in physical health and function. Finding a high-risk cohort would facilitate development and testing of future interventions.
Ethics statements
Patient consent for publication
Ethics approval
This study involves human participants and was approved by MassGeneral Brigham Human Research Committee, 1999 P006353. Participants gave informed consent to participate in the study before taking part.
References
Footnotes
Twitter @DanielHSolomon
Contributors DHS: design, supervision, writing and revising manuscript. LS: analysis, revising. AS: revising manuscript. BH: revising manuscript. S-AMB-B: revising manuscript. CK-G: revising manuscript. AC: revising manuscript. RJ: revising manuscript. MSL: revising manuscript. KR: revising manuscript. CIV: revising manuscript. NEA: revising manuscript. JEM: supervision, revising manuscript. DHS is the guarantor of the data and accepts full responsibility for the work and/or the conduct of the study, had access to the data, and controlled the decision to publish.
Funding The Study of Women's Health Across the Nation (SWAN) has grant support from the National Institutes of Health (NIH), DHHS, through the National Institute on Aging (NIA), the National Institute of Nursing Research (NINR) and the NIH Office of Research on Women’s Health (ORWH) (Grants U01NR004061; U01AG012505, U01AG012535, U01AG012531, U01AG012539, U01AG012546, U01AG012553, U01AG012554, U01AG012495, and U19AG063720). The WHI program is funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, US Department of Health and Human Services through contracts HHSN268201600018C, HHSN268201600001C, HHSN268201600002C, HHSN268201600003C and HHSN268201600004C.
Disclaimer The content of this manuscript is solely the responsibility of the authors and does not necessarily represent the official views of the NIA, NINR, ORWH or the NIH.
Competing interests DHS reports unrelated research contracts to his institution from Amgen, Abbvie, CorEvitas, Janssen and Moderna. He also receives royalties for chapters in UpToDate on NSAIDs.
Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.