INTRODUCTION

Despite ongoing efforts to improve quality of care for patients with serious illness approaching the end of life,1 many patients still receive care that is inconsistent with their preferences, uncomfortable, and costly.2,3,4,5,6,7 Access to palliative care for seriously ill patients is associated with improved patient and caregiver experience and improved outcomes.3,4,5,6,7,8,9,10,11,12,13,14 Nevertheless, use of palliative care interventions, such as those targeting improved communication, remains uneven and suboptimal.7, 15, 16

To improve the value of care for the seriously ill, experts and policy makers have called for a more thorough integration of palliative care into primary care.7, 17, 18 By virtue of their longitudinal relationship with patients and their role in care coordination and access, primary care clinicians are uniquely positioned to support improvements in end-of-life care directly, via delivery of primary palliative care.7, 17, 19 However, the primary care setting is inherently time- and resource-constrained, and access to community resources and specialty palliative care is limited.20, 21 The success of primary care–based palliative care programs, therefore, rests on the ability to focus interventions on the patients who need them most.22 Successful identification of target populations is difficult because primary care populations are heterogeneous and composed of patients with multiple comorbid illnesses at varying stages of severity. Furthermore, mortality in the primary care population is a relatively rare event,23 complicating efforts to effectively identify patients with poor prognosis.

The “Surprise Question” (SQ)—“Would you be surprised if this patient died in the next 12 months?”24—has shown promise in several populations of seriously ill patients, including those with cancer and renal failure, to identify patients nearing the end of life who might benefit from palliative care interventions.25,26,27,28,29 The SQ is an attractive, simple alternative to mortality predictors that rely on administrative data, which can have poor performance in undifferentiated populations.30 Despite optimism about the potential for integrating the SQ into screening efforts in primary care,24 initial evidence in primary care demonstrated that the SQ may perform poorly.31 Two recently published meta-analyses demonstrate poor to moderate performance of the SQ, finding poorer performance in diseases other than cancer; both call for ongoing research into better means of using the SQ in clinical practice.32, 33

We have been implementing the Serious Illness Care Program (SICP), a multi-step systematic intervention aimed at driving more, better, and earlier communication about goals and values in advancing serious illness, in the primary care setting for the past 4 years.34, 35 Based on discussions about prognostic uncertainty with practicing primary care physicians (PCPs) and the poor performance of the traditional SQ (1 year) in our setting,31 we hypothesized that modifying the horizon of the SQ to 2 years would improve performance. In this manuscript, we present data collected prospectively to evaluate the utility of the 2-year SQ as a tool for prioritizing patients for an early serious illness communication intervention in a heterogeneous, high-risk primary care population.

METHODS

Study Cohort

The study cohort consisted of 1699 chronically ill, complex primary care patients eligible for the Brigham and Women’s Hospital (BWH) Integrated Care Management Program (iCMP) who were screened for the SICP. The iCMP is a primary care–based care management program designed to improve care for complex patients at risk of poor outcomes and high costs.36, 37 Primary care patients are screened for eligibility for the program using a proprietary claims–based algorithm weighing comorbid illnesses and past utilization patterns (https://www.optum.com/solutions/data-analytics/analytics-technology/impact-pro-cpl.html).36 PCPs then review lists of eligible patients and select the most appropriate patients to enroll in the iCMP.36 Then, to identify which iCMP patients were eligible for the SICP, PCPs and nurse care coordinators (RNCCs) answered the 2-year SQ between March 2014 and March 2015.

We administered the SQ to PCPs and RNCCs separately via a voluntary electronic survey as part of the SICP activities before training, and clinicians were aware of the purpose of the survey. For each clinician, the survey listed that clinician’s patients who were active in the iCMP and asked the 2-year SQ about each patient. A “No” answer to the question represents a clinician concern for death in the next 2 years and thus indicates that a patient may be a good candidate for a serious illness conversation. Study data were collected and managed using Research Electronic Data Capture (REDCap).38 REDCap is a secure, web-based application designed to support data capture, providing (1) an intuitive interface for validated data entry; (2) audit trails for tracking data procedures; (3) automated export procedures for data downloads to statistical packages; and (4) procedures for importing data from external sources.

Outcomes and Covariates

We ascertained the primary outcome, vital status (alive/deceased) 2 years after SQ response, using data from medical records, insurance eligibility files, and the Massachusetts state registry of vital records and statistics; data from the state registry were treated as the primary source unless they were missing, in which case we used insurance eligibility or electronic health record (EHR) data. Secondary outcomes included vital status at 1 year for PCP SQ response and at 1 and 2 years for RNCC SQ response. Patient demographic variables and comorbidity data were derived from EHR data and intake surveys completed by nurse care coordinators. Clinicians were surveyed for years in practice, gender, and time spent on clinical duties. We assessed comorbidity (both individual and aggregate burden) using the Gagne methodology, a composite comorbidity index combining Elixhauser and Charlson scores,39, 40 which has been validated for mortality prediction (Appendix).30

Statistical Analysis

For all analyses, we separated the cohort into 2 groups according to the response to the SQ (yes/no). We assessed the performance of the SQ as a predictor of 1- and 2-year mortality, stratifying by clinician type. First, a univariate logistic regression model quantified the stand-alone effect of the 2-year SQ response on 1- and 2-year mortality. Next, we incorporated the 2-year SQ into a multivariate logistic regression of 1- and 2-year mortality on additional variables: age, gender, ethnicity, marital status, and comorbidity score. For physicians and nurses, we calculated the following screening test characteristics: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive and negative likelihood ratios (+LR and −LR, respectively), and area under the receiver operating characteristic curve (AUC). We also constructed collaborative responses that combine responses pertaining to the same patient from both professions.41 Finally, we compared the survival between the SQ response groups after the date when PCP or RNCC screening occurred using Kaplan-Meier curves. We performed all analyses in Stata (version 15.0; Stata Corporation), considering p < 0.05 statistically significant. The institutional review board of Partners HealthCare approved this study.
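As an illustration of how collaborative responses can be constructed, the following minimal Python sketch combines a PCP and an RNCC answer for the same patient under two rules ("both" answered “No” or "either" answered “No”) and tabulates the result against vital status. The function names, the rule labels, and the toy records are hypothetical and are not study data; the study's analyses were performed in Stata.

```python
from collections import Counter

def combine(pcp_no: bool, rncc_no: bool, rule: str) -> bool:
    """Return True when the combined screen is positive.

    A "No" answer to the SQ (clinician would not be surprised) is coded True.
    The rule names "both" and "either" are labels chosen for this sketch.
    """
    if rule == "both":
        return pcp_no and rncc_no
    if rule == "either":
        return pcp_no or rncc_no
    raise ValueError(f"unknown rule: {rule}")

def tabulate(records, rule):
    """Count true/false positives and negatives for (pcp_no, rncc_no, died) records."""
    counts = Counter(tp=0, fp=0, fn=0, tn=0)
    for pcp_no, rncc_no, died in records:
        positive = combine(pcp_no, rncc_no, rule)
        if positive:
            counts["tp" if died else "fp"] += 1
        else:
            counts["fn" if died else "tn"] += 1
    return counts

# Hypothetical toy records (NOT study data): (PCP said "No", RNCC said "No", died)
records = [
    (True, True, True),
    (True, False, True),
    (False, True, False),
    (False, False, False),
    (True, True, False),
    (False, False, True),
]

both = tabulate(records, "both")
either = tabulate(records, "either")
print("both:", dict(both))
print("either:", dict(either))
```

By construction, the "either" rule can only add positives relative to the "both" rule, so it trades specificity for sensitivity.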

RESULTS

Of 66 PCPs (58.4%) who returned the survey, 56.1% were female, averaging 16.9 years in practice (interquartile range (IQR) 9–20, 5 missing values), 72.7% completed the training program before returning the survey, and the mean time spent on clinical duties was 68.3% (IQR 32.5–100%, 6 missing values). All 16 nurses answered the survey, 93.8% were female, averaging 23.8 years in practice (IQR 20–30), 68.8% completed the training program, and the mean clinical time was 66.2% (IQR 50–70%, 3 missing values).

For the 1163 patients in this high-risk population for whom their PCP answered the SQ, 57.4% were female and the mean age (recorded at the time of the SQ answer) was 70.1. The most commonly observed Gagne comorbidities in the PCP cohort were hypertension (66.8%), cardiac arrhythmias (27.6%), and chronic pulmonary disease (21.7%). Slightly fewer than half of patients (41.0%) had 3 or more Gagne comorbidities. Most patients were independent for all activities of daily living (60.9%) and either lived with family/spouse (43.3%) or alone (20.6%). Approximately one-third of patients (31.5%) rated their health as fair or poor, and few (11.3%) reported falling in the last 6 months (Table 1). The 1- and 2-year mortality for patients in the PCP cohort was 7.8% and 15.5% respectively. For the 1448 patients in this study for whom their RNCC answered the SQ, 60.4% were female and the mean age was 69.8. The most commonly observed comorbidities in the RNCC cohort were hypertension (68.4%), cardiac arrhythmias (27.6%), and chronic pulmonary disease (22.3%). Fewer than half of patients (41.7%) had 3 or more comorbidities. Functional status, living situation, and self-reported health and falls were similar in the nursing and physician cohorts (Table 1). The 1- and 2-year mortality for patients in the RNCC cohort was 7.1% and 14.7% respectively.

Table 1 Baseline Characteristics of Surprise Question Cohort

Descriptive Results of Answer to 2-Year SQ

Overall, PCPs classified 452 (38.9%) of these high-risk patients into the “No” group, representing a physician concern for death in the next 2 years, and 711 (61.1%) into the “Yes” group, representing low concern for death in the next 2 years. Among patients whom PCPs identified as at risk of death, 143 patients died in 2 years (31.6% mortality rate); among those whom PCPs assessed at low risk of death, 37 patients died in 2 years (5.2% mortality rate) (Table 2). The 2-year sensitivity of the PCP SQ response was 79.4%, specificity 68.6%, PPV 31.6%, NPV 94.8%, +LR 2.53, and −LR 0.30. In a sensitivity analysis evaluating whether performance changes for the 2-year SQ were related to longer follow-up time and to compare with prior studies examining SQ performance at 1 year, we assessed the performance of the 2-year SQ response for 1-year mortality. The 1-year sensitivity of the PCP response to the 2-year SQ was 84.6%, specificity 65.0%, PPV 17.0%, NPV 98.0%, +LR 2.42, and −LR 0.24.

Table 2 Tabulation of 2-year Surprise Question Response and Patient Vital Status After 2 Years by Profession

Nurses classified 352 (24.3%) patients into the “No” group, representing an RNCC concern for death in the next 2 years, and 1096 (75.7%) into the “Yes” group, representing low concern for death in the next 2 years. Among patients whom RNCCs identified as at risk of death, 112 patients died in 2 years (31.8% mortality rate); among those whom RNCCs assessed as low risk of death, 101 patients died in 2 years (9.2% mortality rate) (Table 2). The 2-year sensitivity of the RNCC SQ response was 52.6%, specificity 80.6%, PPV 31.8%, NPV 90.8%, +LR 2.71, and −LR 0.59. In a sensitivity analysis assessing RNCC 2-year SQ performance on 1-year mortality, results were similar to those of the 2-year performance. Kaplan-Meier curves for PCP and RNCC 2-year SQ are presented in Figure 1 (log rank test χ2 = 144.1, P < 0.0001 for PCP, χ2 = 118.2, P < 0.0001 for RNCC curves).
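The PCP and RNCC test characteristics reported above follow directly from the counts given in the text. The short Python sketch below recomputes them from the standard 2×2 definitions; the function and variable names are illustrative, and only the counts come from the manuscript.

```python
def screening_metrics(tp, fp, fn, tn):
    """Standard screening test characteristics from a 2x2 table.

    tp: "No" answer and patient died      fp: "No" answer and patient alive
    fn: "Yes" answer and patient died     tn: "Yes" answer and patient alive
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
    }

# Counts taken from the text: PCPs had 452 "No" (143 died) and 711 "Yes" (37 died)
pcp = screening_metrics(tp=143, fp=452 - 143, fn=37, tn=711 - 37)
# RNCCs had 352 "No" (112 died) and 1096 "Yes" (101 died)
rncc = screening_metrics(tp=112, fp=352 - 112, fn=101, tn=1096 - 101)

for name, m in (("PCP", pcp), ("RNCC", rncc)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Rounded to the precision used in the text, these values agree with the reported sensitivity, specificity, PPV, NPV, and likelihood ratios for both professions.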

Figure 1 Kaplan-Meier curves of physician and nurse 2-year Surprise Question answers. SQ, Surprise Question.

For the 1034 patients for whom there was agreement between clinicians on high risk of death status, where both PCP and RNCC answered “No,” the 2-year sensitivity of the SQ response was 50.3%, specificity 86.7%, PPV 40.0%, NPV 90.8%, +LR 3.78, and −LR 0.57. For patients where either clinician answered “No,” the 2-year sensitivity of the SQ response was 82.6%, specificity 62.7%, PPV 28.1%, NPV 95.3%, +LR 2.21, and −LR 0.28 (Table 2).

Univariate Regression Results

In univariate regression for PCP response, the odds of dying within 2 years for high-risk patients (SQ answer of “No”) were 8.4 times higher than those for low-risk patients (SQ answer of “Yes”) (95% CI 5.7–12.4, p < 0.001) with an AUC of 0.74. For PCP response, the odds of dying within 1 year for high-risk patients were 10.2 times higher than those for low-risk patients (95% CI 5.7–18.3, p < 0.001) with an AUC of 0.75. For RNCC response, the odds of dying within 2 years for high-risk patients (SQ answer of “No”) were 4.6 times higher than those for low-risk patients (SQ answer of “Yes”) (95% CI 3.4–6.2, p < 0.001) with an AUC of 0.67 and the odds of dying within 12 months were similar.
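With a single binary predictor, the odds ratio from a univariate logistic regression equals the cross-product ratio of the 2×2 table, and the Wald confidence interval matches the classical log-odds (Woolf) interval. The Python sketch below applies that calculation to the 2-year counts reported in the text; it is a check under that assumption, not a reproduction of the Stata models, and the function name is illustrative.

```python
import math

def odds_ratio_ci(tp, fp, fn, tn, z=1.96):
    """Cross-product odds ratio with a Woolf (log-odds) 95% confidence interval."""
    odds_ratio = (tp * tn) / (fp * fn)
    se_log_or = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# 2-year counts from the text: "No" and died, "No" and alive, "Yes" and died, "Yes" and alive
pcp_or = odds_ratio_ci(tp=143, fp=309, fn=37, tn=674)
rncc_or = odds_ratio_ci(tp=112, fp=240, fn=101, tn=995)

print("PCP  OR %.1f (95%% CI %.1f-%.1f)" % pcp_or)
print("RNCC OR %.1f (95%% CI %.1f-%.1f)" % rncc_or)
```

Under this calculation, the PCP and RNCC estimates round to the odds ratios and intervals reported for 2-year mortality.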

In univariate analysis, for patients whom both clinicians agreed were at high risk, the odds of dying were 6.6 times higher than those for low-risk patients (where one or both clinicians deemed the patient to be low risk) (95% CI 4.6–9.6, p < 0.001) with an AUC of 0.69. For patients whom either clinician deemed to be at high risk of death (SQ answer of “No”), the odds of dying were 8.0 times higher than those for low-risk patients (95% CI 5.1–12.3, p < 0.001) with an AUC of 0.73.
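For a yes/no test, the ROC curve has a single interior point, so the area under it reduces to the average of sensitivity and specificity. The Python sketch below applies this identity to the 2-year sensitivities and specificities reported in the Results; the dictionary keys are labels chosen for this illustration.

```python
def binary_auc(sensitivity, specificity):
    """AUC of a binary test: trapezoid through (0,0), (1-spec, sens), (1,1)."""
    return (sensitivity + specificity) / 2

# Reported 2-year (sensitivity, specificity) pairs from the Results section
reported = {
    "pcp": (0.794, 0.686),
    "rncc": (0.526, 0.806),
    "both": (0.503, 0.867),
    "either": (0.826, 0.627),
}

aucs = {name: binary_auc(sens, spec) for name, (sens, spec) in reported.items()}
for name, auc in aucs.items():
    print(f"{name}: AUC ~ {auc:.3f}")
```

The resulting values are consistent, to within rounding, with the univariate AUCs of 0.74, 0.67, 0.69, and 0.73 reported for the PCP, RNCC, both-clinician, and either-clinician responses.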

Multivariate Regression Results

In multivariate analysis, a PCP high-risk designation was more strongly associated with the odds of 2-year mortality than age, sex, marital status, ethnicity, or Gagne comorbidity score, with an odds ratio of 3.2 (95% CI 2.1–5.1, p < 0.001) and an AUC of 0.86. The same was true for nursing response, with an odds ratio of 1.8 (95% CI 1.3–2.6, p = 0.001) and an AUC of 0.83 (Table 3). Sensitivity regression analysis removing the SQ yielded similar coefficients for the covariates, indicating that the SQ may contribute to prediction independently of other measured patient characteristics (Appendix).

Table 3 Multivariate Logistic Regression with an Outcome of Vital Status After 2 Years

DISCUSSION

In this study, we tested a new 2-year SQ as a screening tool for an early serious illness communication intervention in a diverse high-risk primary care population. The new tool was based on primary care clinician input and initial poor performance with the more traditional 1-year question in a primary care population.31 The 2-year question performed relatively well in predicting mortality at 1 and 2 years and thus for identifying patients for serious illness communication interventions. Patients whom the physician identified as high risk (a “No” response to the SQ) were significantly more likely to die within 1 year (unadjusted OR of 10.2) and 2 years (unadjusted OR of 8.4). A nurse response of “No” was similarly associated with a higher likelihood of death, indicating that answering clinician screens in palliative care programs, such as the SQ, may also be a role for nurses within busy primary care practices. Prior studies have examined the SQ in different care settings and with multiple lengths of time, demonstrating considerable variability across settings and underlying conditions.32, 33, 42 AUC for the SQ has ranged from 0.51 to 0.82, and our findings of 0.74–0.75 for doctors and 0.67–0.69 for nurses are within that range. However, while data are sparse in primary care, the SQ has generally performed more poorly in general practice populations, where the AUC of the tool has been lower (0.55–0.59).31, 43

In the current study, we observed the highest sensitivity (82.6%) when either clinician gave a “No” answer to the 2-year SQ. At the same time, the rates of false positive results (a “No” answer in a patient who remained alive) were also relatively low, at 2.6 times the rate of true positives (a “No” answer in a patient who died). This indicates that, by having conversations with all patients with a “No” answer, each clinician team would need to have a conversation with approximately 3 to 4 patients; this effort would result in conversations with over 80% of the patients who died in this population. For clinical practice, our results suggest that using a “No” answer from either the doctor or nurse to the 2-year SQ is a reasonable marker for identifying patients for serious illness communication interventions in the high-risk care management primary care setting: it identifies primary care patients who could benefit from earlier goals and values conversations. Furthermore, in the context of improving serious illness communication, the cost of a false positive result for the SQ is time spent talking with a patient about their goals and values. In that same vein, screening tests without perfect sensitivity may be an acceptable first step in the context of busy clinician practices and clinician shortages. While primary care clinicians work in a highly time-pressured environment, our results suggest that the 2-year SQ is specific enough to avoid generating a significant burden of extra work for clinicians. However, we do not believe these results indicate that the SQ is an appropriate tool for all palliative care interventions, some of which have a narrower application, such as hospice referral or completion of Physician Orders for Life-Sustaining Treatment Forms.
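The workload figures in this paragraph can be derived from the positive predictive value of the either-clinician rule (28.1%). The Python sketch below shows the arithmetic; the variable names are illustrative, and the interpretation of "3 to 4 patients" as conversations per patient who died is an assumption made for this sketch.

```python
# PPV and sensitivity of the either-clinician "No" rule, as reported in the Results
ppv_either = 0.281
sensitivity_either = 0.826

# False positives per true positive among patients with a "No" answer
false_per_true_positive = (1 - ppv_either) / ppv_either

# Conversations held per patient who went on to die (assumed reading of "3 to 4")
conversations_per_death_reached = 1 / ppv_either

print(f"False positives per true positive: {false_per_true_positive:.2f}")
print(f"Conversations per patient who died: {conversations_per_death_reached:.2f}")
print(f"Share of deaths reached: {sensitivity_either:.1%}")
```

The first figure is close to the 2.6 ratio quoted above, and the second falls between 3 and 4.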

Our study did not analyze the full spectrum of possible uses of the SQ. For example, since the SQ captures information not present in demographics or comorbidities, it is possible that analytical methods could use the SQ in combination with more advanced predictive modeling strategies. However, recent work highlights the significant challenges in accurately predicting death, even with advanced analytic methods, especially in the context of targeting interventions.44 In addition, given frequent EMR-generated alerts presented to primary care doctors and attendant “alert fatigue,”45 clinician-driven tools, such as the SQ, hold promise as a means to generate some clinician control and “buy-in,” activating movement from a pre-contemplative state to a contemplative state in considering interventions aimed at improving end of life for their seriously ill patients.46

This study has several important limitations. First, the study cohort was identified as a population at high risk of increased cost and utilization by a proprietary computer algorithm, followed by a clinician screen for the program, before the SQ was asked of clinicians; thus, the data are most applicable to high-risk care management programs. Future studies should assess the performance of the SQ when paired with other predictive variables and models, as well as the additional possible effect on clinician activation. Furthermore, the physicians, but not the nurses, who answered the SQ in this study had prior experience with the screening question. In the first study the question was delivered as part of screening for care coordination, whereas in this one it was asked about patients already in the care coordination program as part of the SICP. That familiarity with the SQ and the differences in delivery mechanisms may have affected our results when compared with our prior evaluation of the 1-year SQ. Additionally, we did not collect the qualitative or quantitative data needed to answer questions about why the tool performs as it does, for example, what clinicians consider when they answer that they would not be surprised if a patient dies. This should be a component of further research in this area. Lastly, it is important to note that prediction of death is not the only, and possibly not the most important, factor in identifying seriously ill patients for early palliative care interventions.

CONCLUSIONS

In our study, clinician use of a new 2-year SQ identified primary care patients at increased risk of death and successfully captured most of the patients who eventually died over the next 2 years. The 2-year SQ had notably improved performance over the traditional 1-year SQ in this heterogeneous primary care population. These results suggest that a 2-year SQ holds promise for identifying appropriate patients for serious illness communication interventions in the primary care setting. Future work should focus on pairing it with appropriate analytical tools and on studying the effect of using the 2-year SQ on physician and nurse behavior, such as conducting discussions about care goals with seriously ill patients.