Original research
Psychometric properties of self-reported financial toxicity measures in cancer survivors: a systematic review
  1. Zheng Zhu1,2,
  2. Weijie Xing1,2,
  3. Huan Wen3,
  4. Yanling Sun3,
  5. Winnie K W So4,
  6. Lucylynn Lizarondo5,
  7. Jian Peng1,
  8. Yan Hu1,2
  1. School of Nursing, Fudan University, Shanghai, China
  2. Fudan University Centre for Evidence-based Nursing: A Joanna Briggs Institute Centre of Excellence, Fudan University, Shanghai, China
  3. School of Public Health, Fudan University, Shanghai, China
  4. The Nethersole School of Nursing, The Chinese University of Hong Kong, Hong Kong, China
  5. The Joanna Briggs Institute, University of Adelaide, Adelaide, South Australia, Australia
  1. Correspondence to Dr Weijie Xing; xingweijie{at}fudan.edu.cn

Abstract

Objective The aim of this systematic review was to summarise the psychometric properties of patient-reported outcome measures (PROMs) measuring financial toxicity (FT) in cancer survivors.

Design This systematic review was conducted according to the guidance of the Consensus-Based Standards for the Selection of Health Measurement Instruments (COSMIN) methodology.

Data sources Comprehensive searches were performed in PubMed, MEDLINE, Embase, CINAHL, PsycINFO, Web of Science, ProQuest and Cochrane Library from database inception to February 2022.

Eligibility criteria for selecting studies We included studies that reported any PROMs for measuring FT in cancer survivors who were ≥18 years old. FT was defined as perceived subjective financial distress resulting from objective financial burden. Studies that were not validation studies and that used a PROM only as an outcome measurement were excluded.

Data extraction and synthesis Two reviewers independently extracted data from the included papers. We used the COSMIN criteria to summarise and evaluate the psychometric properties of each study regarding structural validity, internal consistency, reliability, measurement error, hypothesis testing for construct validity, cross-cultural validity/measurement invariance, criterion validity and responsiveness.

Results A total of 23 articles (21 PROMs) were eligible for inclusion. The Comprehensive Score for Financial Toxicity (COST) had an adequate development process and showed better psychometric properties than the other PROMs, especially for internal consistency (Cronbach’s α=0.92), reliability (intraclass correlation coefficient=0.80) and hypothesis testing (r=0.20–0.42).

Conclusions From a psychometric perspective, the COST could be recommended as the most suitable internationally available measure for use in research and clinical practice across different contexts. We suggest that PROMs be selected only after careful consideration of the local socioeconomic context. Future studies are warranted to develop FT PROMs grounded in different social and cultural backgrounds and to clarify the theoretical basis for assessing FT.

  • oncology
  • health economics
  • quality in health care

Data availability statement

No data are available.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Strengths and limitations of this study

  • This is the first systematic review that comprehensively summarised the psychometric properties of 21 patient-reported outcome measures (PROMs) evaluating financial toxicity in cancer survivors.

  • The results may provide quantitative evidence for researchers and healthcare professionals to choose PROMs measuring cancer survivors’ financial toxicity in future scientific research and clinical practice.

  • This review only included studies that aimed to evaluate the measurement properties of financial toxicity PROMs.

Introduction

The rising costs associated with advances in cancer treatment and the lengthening of cancer survivorship pose a significant challenge to survivors, caregivers and public healthcare systems.1 2 Total global spending on cancer medications grew at a compound annual growth rate of 6.5%, from US$96 billion in 2013 to US$173 billion in 2020, nearly twice the rate of global gross domestic product growth.3–5 Most cancer survivors in low-income and middle-income countries/regions depend on out-of-pocket payments, which may contribute to global inequalities in healthcare expenditure and to financial insecurity for vulnerable groups.6 7

The term ‘financial toxicity (FT)’ has been described as the economic effect of cancer treatment in the age of precision medicine.2 8 9 Witte et al described FT as ‘the patient-reported outcome (PRO) of perceived subjective financial distress resulting from objective financial burden’.10 This concept covers both the objective financial burden and the subjective financial distress that cancer survivors face as a result of high out-of-pocket medical expenses. Regarding the terminology, ‘financial toxicity’, ‘financial burden’ and ‘financial distress’ are often used interchangeably in research and share a similar definition.10 11 In this review, the authors agreed to consistently use the term ‘financial toxicity’. Financial toxicity is usually measured by PRO measures (PROMs); choosing a PROM with high validity and reliability is a prerequisite for robust results.

Several cancer-specific and generic FT PROMs have been reported and used in different contexts. The Comprehensive Score for Financial Toxicity (COST), a relatively recent cancer-specific FT PROM, is the most commonly used measure for assessing FT.12 Other widely used measures include the Breast Cancer Finances Survey Inventory,13 the Socioeconomic Well-being Scale (SWBS)14 and the InCharge Financial Distress/Financial Well-being Scale (InCharge).15 Additionally, validated subscales of the Social Difficulties Inventory (SDI), the Cancer Care Outcomes Research and Surveillance Consortium patient survey and the Italian version of the Edmonton Symptom Assessment System-Total Care (TC) have also been used to evaluate FT.16–18 However, existing PROMs vary substantially in their stage of development and degree of validation, and many have not been psychometrically tested.

A preliminary literature search in PubMed, PsycINFO (EBSCO), Cochrane Library (Wiley) and Joanna Briggs Institute (Ovid) identified several existing reviews of FT measures. Witte et al summarised the content of 352 items from 34 studies measuring FT in cancer survivors.10 However, this review did not report the psychometric properties of the included PROMs, and most of those PROMs had not been validated through a rigorous scientific process, which made it difficult for readers to choose the best available measure of FT. Salman et al conducted a systematic review and identified eight PROMs and two caregiver-reported measures for assessing financial burden in adolescents and young adults.19 However, that review focused only on adolescents and young adults with cancer. The psychometric properties of FT measures in adult cancer survivors therefore remain unknown.

The reproducibility, reliability and accuracy of PROMs are fundamental prerequisites for robust results. It is therefore necessary to summarise the psychometric properties of existing PROMs to inform future research; however, this information is still lacking. The aim of this systematic review was to summarise the psychometric properties of PROMs measuring FT in cancer survivors. The review was conducted according to the guidance of the Consensus-Based Standards for the Selection of Health Measurement Instruments (COSMIN) methodology and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement.20 21 The protocol of this review was published in BMJ Open in 2020,22 and the review was registered in PROSPERO (CRD42021254721).

Methods

Search strategy

First, we conducted a limited search in PubMed to identify keywords from which to develop a search strategy for each database. The database-specific strategies were then run in PubMed/MEDLINE, MEDLINE (Ovid), Embase (Ovid), CINAHL (EBSCO), PsycINFO (EBSCO), Web of Science, ProQuest Dissertations and Theses, and Cochrane Library (Wiley). The search covered the period from database inception to February 2022; compared with the published protocol, the end date was updated to February 2022 to include more studies published in 2021 and 2022.22 In PubMed/MEDLINE, we searched for papers in English using MeSH terms ([cancer OR neoplasms] AND [“cancer survivors” OR patient OR survivors] AND “cost of illness”) combined with free-text terms (cancer OR [patient* OR survivor*] AND [cost OR bill* OR expense OR productivity loss OR “out-of-pocket” OR “economic burden” OR “financial toxicity” OR “financial hardship” OR “financial burden”]). The COSMIN measurement properties filter and exclusion filter were also applied. The search strategies for each database are presented in online supplemental appendix 1. Finally, the reference lists of all included studies were manually screened to supplement the database search.
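The Boolean structure of the free-text PubMed block can be made explicit in code. PubMed evaluates Boolean operators from left to right, whereas some other interfaces parse the same string differently (Web of Science, for example, documents an order of precedence in which AND is applied before OR), so fully parenthesised concept blocks transfer more safely between databases. The sketch below is illustrative only: the helper names `or_block` and `and_blocks` are hypothetical, and reading the free-text block as (cancer OR patient* OR survivor*) AND (cost terms) is an assumption about the intended grouping; the actual strategies are those in online supplemental appendix 1.

```python
def or_block(terms: list[str]) -> str:
    """Join synonyms with OR and wrap them in parentheses (hypothetical helper)."""
    return "(" + " OR ".join(terms) + ")"


def and_blocks(*blocks: str) -> str:
    """Join already-parenthesised concept blocks with AND (hypothetical helper)."""
    return "(" + " AND ".join(blocks) + ")"


# Concept blocks taken from the free-text part of the PubMed search;
# the grouping into two blocks is an assumption, not the authors' string.
population_terms = or_block(["cancer", "patient*", "survivor*"])
cost_terms = or_block([
    "cost", "bill*", "expense", "productivity loss", '"out-of-pocket"',
    '"economic burden"', '"financial toxicity"', '"financial hardship"',
    '"financial burden"',
])

free_text_query = and_blocks(population_terms, cost_terms)
print(free_text_query)
```

Because every block carries its own parentheses, the string no longer depends on any interface’s default operator precedence.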

Inclusion and exclusion criteria

The inclusion criteria were as follows: (1) studies that reported any PROM for measuring FT in cancer survivors aged ≥18 years (studies of mixed populations that also included survivors aged <18 years were considered if most participants were ≥18 years old); (2) studies that evaluated at least one measurement property; and (3) studies published in English. The exclusion criteria were as follows: (1) studies that were not validation studies and used a PROM only as an outcome measure; (2) studies that used a PROM as a comparator for another instrument; (3) studies that did not provide empirical data; and (4) for quality-of-life PROMs with a domain assessing FT, studies of versions other than the original, because only the original version of such PROMs was included. If such a measure/domain consisted of only one item and its measurement properties were reported as an independent domain, the measure/domain was also considered.
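The eligibility rules can be summarised as a single predicate. This is a minimal sketch, assuming a hypothetical `Study` record with one field per criterion and reading ‘most participants’ as more than 50%; it is not the screening tool the reviewers used, and real screening decisions also involve reviewer judgement.

```python
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical record of what a screener extracts from one paper."""
    reports_ft_prom: bool           # reports a PROM measuring FT in cancer survivors
    share_adults: float             # proportion of participants aged >=18 (0 to 1)
    evaluates_property: bool        # evaluates at least one measurement property
    language: str                   # publication language
    is_validation_study: bool       # False if the PROM is used only as an outcome
    prom_is_comparator: bool        # PROM used only as comparator for another instrument
    has_empirical_data: bool        # provides empirical data
    is_translated_qol_domain: bool  # FT domain from a non-original QoL PROM version


def eligible(s: Study) -> bool:
    """Apply the inclusion criteria, then the exclusion criteria."""
    included = (
        s.reports_ft_prom
        and s.share_adults > 0.5  # assumption: "most" read as more than half
        and s.evaluates_property
        and s.language == "English"
    )
    excluded = (
        not s.is_validation_study
        or s.prom_is_comparator
        or not s.has_empirical_data
        or s.is_translated_qol_domain
    )
    return included and not excluded
```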

Study screening and selection

We imported all citations identified by the search strategies into EndNote X8 (Clarivate Analytics, Pennsylvania, USA). After duplicates were removed, two reviewers (ZZ and WX) independently screened all titles, abstracts and full texts against the established inclusion and exclusion criteria. Any disagreements were resolved by a third reviewer (YH).

Quality appraisal

Two reviewers (HW and YS) assessed the methodological quality of the included studies using the COSMIN Risk of Bias Checklist (online supplemental appendix 2).19 The checklist consists of 10 domains (116 items): PROM development, content validity, structural validity, internal consistency, cross-cultural validity, reliability, measurement error, criterion validity, hypothesis testing and responsiveness. Each measurement property was rated as ‘very good’, ‘adequate’, ‘doubtful’ or ‘inadequate’. Following the COSMIN guidelines, the methodological quality of a single study was rated using the ‘worst score counts’ method: for example, if the lowest item rating in the PROM development domain is ‘inadequate’, the overall methodological quality of that domain is ‘inadequate’. This method reflects the principle that a single inadequately performed aspect of a study can compromise the overall result for that measurement property. Any discrepancies were resolved by a third reviewer (ZZ).
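The ‘worst score counts’ rule amounts to taking the minimum over an ordered rating scale. A sketch, assuming the item ratings for one COSMIN box have already been collected and that any items marked ‘not applicable’ were dropped beforehand (the function and constant names are illustrative):

```python
# COSMIN item ratings, ordered from worst to best.
RATINGS = ["inadequate", "doubtful", "adequate", "very good"]


def worst_score_counts(item_ratings: list[str]) -> str:
    """Overall rating of one measurement-property box = its lowest item rating."""
    if not item_ratings:
        raise ValueError("at least one rated item is required")
    for rating in item_ratings:
        if rating not in RATINGS:
            raise ValueError(f"unknown rating: {rating!r}")
    # The position in RATINGS encodes quality, so the minimum is the worst rating.
    return min(item_ratings, key=RATINGS.index)
```

For instance, a box whose items are rated ‘very good’, ‘adequate’ and ‘doubtful’ is rated ‘doubtful’ overall.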

Data extraction

Two reviewers (ZZ and WX) independently extracted data from the included papers, including authors, year of publication, PROM, country/language, study design, target population, sample size, domains, number of items, total score range and main findings. The main findings regarding psychometric properties, including content validity, structural validity, internal consistency, cross-cultural validity, reliability, measurement error, criterion validity, hypothesis testing and responsiveness, were also extracted. Any discrepancies were resolved through discussion between the two reviewers.

Data synthesis

We used the COSMIN criteria to summarise and evaluate the psychometric properties reported in each study regarding structural validity, internal consistency, reliability, measurement error, hypothesis testing for construct validity, cross-cultural validity/measurement invariance, criterion validity and responsiveness. Each measurement property in each study was rated as sufficient (+), insufficient (−) or indeterminate (?); the rating criteria are given in online supplemental appendix 2. If all studies of a PROM rated a given property as sufficient (+), or all rated it as insufficient (−), the results were pooled and the overall rating was sufficient (+) or insufficient (−), respectively. If the ratings were inconsistent, we explored possible explanations; in this review, differences in language and in social, economic and cultural context could plausibly contribute to inconsistent psychometric properties. Our review team (ZZ, WX, HW and YS) discussed the potential explanations. If the team regarded an explanation as reasonable, we provided ratings (‘+’, ‘−’ or ‘?’) by subgroup (eg, by language); otherwise, the overall rating of that measurement property was inconsistent (±).
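The pooling rule can be sketched as follows, assuming ASCII ‘-’ stands for insufficient (−) and that the subgroup labels (eg, language) passed in have already been accepted by the review team as a reasonable explanation of inconsistency. Two details are simplifying assumptions rather than rules stated above: all-indeterminate ratings pool to ‘?’, and any other mixture is reported as inconsistent (±). The function names are hypothetical.

```python
VALID = {"+", "-", "?"}  # sufficient, insufficient, indeterminate


def pool_ratings(ratings: list[str]) -> str:
    """Pool per-study ratings of one measurement property for one PROM."""
    if not ratings:
        raise ValueError("no ratings to pool")
    if not set(ratings) <= VALID:
        raise ValueError(f"unexpected rating in {ratings}")
    if len(set(ratings)) == 1:
        return ratings[0]  # consistent: all '+', all '-' (or, by assumption, all '?')
    return "±"             # inconsistent and not explained


def pool_by_subgroup(results: list[tuple[str, str]]) -> dict[str, str]:
    """Pool separately within subgroups judged to explain the inconsistency."""
    groups: dict[str, list[str]] = {}
    for subgroup, rating in results:
        groups.setdefault(subgroup, []).append(rating)
    return {name: pool_ratings(rs) for name, rs in groups.items()}
```

Splitting by subgroup first turns a pooled ‘±’ into separate, interpretable ratings per language version.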

Assessing certainty of evidence

We used a modified Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach to assess the certainty of evidence.19 The evidence for each measurement property was graded for risk of bias, inconsistency, imprecision and indirectness. The instructions for downgrading for each of these factors are shown in Appendix II. Four reviewers (ZZ, WX, HW and YS) independently assessed the grade, and any discrepancies were resolved by discussion.
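A minimal sketch of the downgrading logic, assuming the modified GRADE scheme described in the COSMIN user manual: evidence starts at ‘high’ and may be downgraded by up to three levels for risk of bias and up to two levels each for inconsistency, imprecision and indirectness, with imprecision judged from the pooled sample size. These caps and sample-size thresholds are assumptions to be checked against the manual and Appendix II, and the function names are hypothetical.

```python
LEVELS = ["very low", "low", "moderate", "high"]

# Assumed maximum number of levels each factor may remove.
MAX_DOWNGRADE = {
    "risk_of_bias": 3,
    "inconsistency": 2,
    "imprecision": 2,
    "indirectness": 2,
}


def imprecision_downgrade(total_n: int) -> int:
    """Assumed rule of thumb: pooled n < 50 -> 2 levels, 50-100 -> 1 level."""
    if total_n < 50:
        return 2
    if total_n <= 100:
        return 1
    return 0


def certainty(downgrades: dict[str, int]) -> str:
    """Start at 'high' and subtract the levels removed by each factor."""
    total = 0
    for factor, levels in downgrades.items():
        if factor not in MAX_DOWNGRADE:
            raise ValueError(f"unknown factor: {factor}")
        if not 0 <= levels <= MAX_DOWNGRADE[factor]:
            raise ValueError(f"{factor}: invalid downgrade of {levels}")
        total += levels
    # Evidence can never fall below 'very low'.
    return LEVELS[max(0, len(LEVELS) - 1 - total)]
```

For example, `certainty({"risk_of_bias": 1, "imprecision": imprecision_downgrade(73)})` yields ‘low’ under these assumed thresholds.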

Patient and public involvement

No patients or the public were directly involved in the development of the research question, selection of the outcome measures, design and implementation of the study, or interpretation of the results.

Results

Literature search

Figure 1 shows the process of literature screening and selection. A total of 9399 articles were identified via databases, and six articles were found by additional supplementary searches. After duplicates were removed, 11 731 articles remained; 11 669 were excluded after title and abstract screening and 39 after full-text review. Finally, 23 articles (21 PROMs) were eligible for inclusion in this study.12 14 16 23–42
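The screening counts can be checked with simple bookkeeping. This sketch uses only the post-deduplication figures reported above and derives the number of full texts assessed, which the text does not state explicitly:

```python
# Counts reported for the screening stages (after duplicate removal).
after_deduplication = 11_731
excluded_on_title_abstract = 11_669
excluded_on_full_text = 39

# Records surviving title/abstract screening go forward to full-text review.
full_texts_assessed = after_deduplication - excluded_on_title_abstract
included = full_texts_assessed - excluded_on_full_text

print(full_texts_assessed, included)  # prints: 62 23
```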

Figure 1

PRISMA flow chart of the selection process. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.

Study description

Table 1 shows the characteristics of the included studies. The included studies were published between 2005 and 2022. Eight studies were conducted in the USA,12 14 23 27 30 37 39 41 four in the UK,16 29 35 38 and two each in Canada,31 36 China (mainland and Hong Kong),25 39 India26 34 and Italy.33 42 One study was conducted in 12 countries in Europe and North America.22 23 One study each was conducted in Brazil32 and Iran.34 A total of 12 362 participants were included, with sample sizes ranging from 73 participants36 to 5901 participants41 per study. Most studies assessed FT across multiple cancer types; only two focused on specific cancers (lung, colorectal, or head and neck cancer).31 37

Table 1

Overview of the included studies

Among the 21 PROMs, 7 were FT-related domains of quality-of-life PROMs and 14 were standalone PROMs focusing on FT. All PROMs were validated in cancer survivors. Fifteen PROMs were in English,12 14 16 23 25–31 35 37 38 40–42 and two were in Chinese.24 39 Other languages included French,36 Portuguese,32 Italian,33 42 Hindi25 26 and Persian.34 The number of items evaluating FT ranged from 3 items40 to 23 items.36 The French version of the Patient Self-Administered Financial Effects Questionnaire (P-SAFE) did not report the total score range of the whole PROM.36

Quality assessment

Methodological quality assessment

Table 2 shows the methodological quality of the 23 included studies assessed with the COSMIN checklist. In the PROM development domain, only one study was rated as adequate,42 three studies were rated as doubtful12 24 27 29 and the others were rated as inadequate. Two studies reported adequate information on testing the relevance, comprehensiveness and comprehensibility of PROMs.12 27 29 One study reported adequate relevance and comprehensiveness.42 Internal consistency was the most frequently reported property, assessed in all studies except one.36 Limited information could be retrieved on cross-cultural validity (3 studies),31 32 36 criterion validity (6 studies),16 23 33 35 38 40 reliability (10 studies)12 16 24 27 28 33 35 38–40 42 and responsiveness (2 studies).31 39 No study reported data on measurement error.

Table 2

Methodological quality assessment of the measures

Measurement property assessment

Table 3 shows the ratings of the psychometric properties retrieved from the 21 PROMs. Only the Persian version of the COST-v2 and the Subjective Financial Distress Questionnaire (SFDQ) were rated as ‘+’ for structural validity.26 34 Seventeen PROMs were rated as ‘+’ for internal consistency,12 14 16 23 24 26–29 31 32 34 35 37–39 41 42 eight for reliability12 24 26–29 31 35 and ten for hypothesis testing.12 14 23 24 27–31 33 35 39 Limited information was retrieved on cross-cultural validity (two PROMs),32 36 criterion validity (six PROMs)16 24 33 35 38 40 and responsiveness (two PROMs).31 39 No PROMs reported data on measurement error.
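Internal consistency is usually summarised by Cronbach’s alpha (the abstract reports α=0.92 for the COST): α = k/(k − 1) × (1 − sum of item variances / variance of the total score), where k is the number of items. The sketch below computes alpha from raw item scores and applies a simplified COSMIN rating, assuming the commonly used criterion of α ≥ 0.70 per unidimensional scale given at least low evidence for sufficient structural validity; the exact criteria applied are in online supplemental appendix 2, and the function names are illustrative.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """scores[r][i] is respondent r's answer to item i of one unidimensional scale."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def rate_internal_consistency(alpha: float, structural_validity_ok: bool) -> str:
    """Simplified COSMIN rating: '+', '-' or '?' (indeterminate)."""
    if not structural_validity_ok:
        return "?"  # structural-validity precondition not met
    return "+" if alpha >= 0.70 else "-"
```

Using the same (sample) variance estimator for items and totals keeps the ratio, and hence alpha, unaffected by the choice of denominator.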

Table 3

Rating of measurement properties

Certainty of evidence

Table 4 shows the certainty of evidence for each measurement property. Among all included PROMs, the COST showed the best psychometric properties. The COST, across its seven versions, was rated as having high-certainty evidence for structural validity, internal consistency, hypothesis testing and criterion validity.12 24 25 27 28 32–34 39 The Financial Index of Toxicity (FIT) and the Impact of Cancer-Childhood Survivors (IOC-CS) financial problems domain reported data on five properties, with certainty of evidence ranging from very low to high.31 40

Table 4

Certainty of evidence of measurement properties

Discussion

This systematic review identified 21 PROMs and domains of PROMs evaluating FT in cancer survivors, including the COST (original, Brazilian, Indian, Italian, Persian, Simplified Chinese and Traditional Chinese versions), FIT, Personal Financial Burden, P-SAFE, SWBS, Quality of Life in Adult Cancer Survivors (QLACS) financial problems domain, Chronic Cancer Experiences Questionnaire financial advice domain, Patient-Reported Outcome for Fighting Financial Toxicity (PROFFIT), Patient Roles and Responsibilities Scale financial well-being domain, SDI-21 providing for the family domain, SDI-16 money matters domain, SFDQ, IOC-CS financial problems domain and Cancer Problems in Living Scale (CPILS) employment/financial domain. Overall, the COST had a more complete development process than the other PROMs and showed the best psychometric properties, especially in terms of internal consistency, reliability and hypothesis testing. To the best of our knowledge, this is the first systematic review to summarise the psychometric properties of FT PROMs in cancer survivors and to report the certainty of evidence for each property. The results may provide quantitative evidence for researchers and healthcare professionals choosing PROMs to measure cancer survivors’ FT in future research and clinical practice.

The results highlighted that the COST (both version 1 and version 2 were included) had better psychometric properties than other specific and generic PROMs in terms of internal consistency, reliability and hypothesis testing. The COST could therefore be recommended as the most suitable internationally available measure for use in research and clinical practice across different contexts. Other systematic reviews have also suggested that the COST is a promising measure from a content perspective.10 11 From a psychometric standpoint, however, several issues should be considered when using the COST to evaluate FT in cancer survivors. First, caution is needed when using the COST in socioeconomic settings outside the USA. In some countries in Europe or Asia, most medical expenses are covered by social health insurance, and direct out-of-pocket payments are replaced by prepaid health insurance contributions.43 44 In addition, social security systems can support cancer survivors who are unable to work.45 These two socioeconomic factors may affect how cancer survivors interpret items related to medical spending and indirect costs. However, few COST validation studies have considered socioeconomic issues, adapted the measure to the local context or provided data on cross-cultural validity. Future COST validation studies should recruit cancer survivors from multiple social and cultural backgrounds to assess cross-cultural measurement invariance.

Second, the original construct and item generation for the COST were based on a literature search; thus, the theoretical grounds for the measure are unclear, and the instrument may not capture detailed information related to the construct. Theoretical frameworks and conceptual models are crucial for self-reported measures to capture subtle changes in constructs.46 Although FT is a relatively new concept, several existing models can guide item generation for future FT PROMs. Tucker-Seeley and colleagues developed a conceptual model of FT that emphasises three components of financial burden: the material, psychosocial and behavioural domains.47 Head developed the SWBS based on James Coleman’s Theory of Social Class; this scale contains 17 items across 3 domains: human capital, material capital and social capital.14 30 48 Witte et al’s systematic review analysed 352 different questions regarding financial spending and identified six domains (financial spending, financial resources, psychosocial affect, support seeking, coping care and coping lifestyle) that represent reactions to subjective financial distress.10 Other theories and models, including the Wreckers theory of financial distress, ecological theory and the functionalist tradition, have also been widely applied in research on cancer survivors.49–51 As theoretical work on FT grows, the theoretical grounds of future PROMs need to be made explicit.

In addition to the COST, two other PROMs, the FIT and the IOC-CS financial problems domain, also provided adequate data on psychometric properties. The FIT is relatively new and has fewer items than the other included measures. This measure was developed by Hueniken et al and has been validated only in survivors of head and neck cancer.31 Head and neck cancer, especially laryngeal and hypopharyngeal cancer, has particularly large impacts on survivors’ daily function (eg, speech and eating) after treatment and affects survivors’ ability to return to work.52 53 Only 32%–59% of head and neck cancer survivors return to work after treatment.54 This form of cancer also has short-term and long-term financial consequences for caregivers and their families.55 Because the FIT was developed and validated in this distinctive population, it may not be directly applicable to other cancer populations.

Regarding PROM development, only two PROMs, PROFFIT and SFDQ, were developed outside English-speaking developed countries such as the USA, the UK and Canada. The socioeconomic contexts and healthcare systems in these countries may differ substantially from those elsewhere, leading to differences in the perceived causes and consequences of FT. Previous studies have reported that FT is closely related to broader social determinants of economic circumstances. Factors including healthcare policy, the healthcare system, the insurance system, specific micro-level contexts and the level of regional economic development may not only affect cancer survivors’ perceived level of FT but also determine its origins.56 57 Cultural factors (eg, a cultural emphasis on saving or a cultural imperative to have a large family) also affect cancer survivors’ perceived financial security and economic burden.58

PROFFIT, which was developed in 2021 in the Italian context, also showed higher-quality PROM development and content validity than the other PROMs. It could qualify as a recommended FT PROM under the COSMIN criteria if further validation studies provide stronger evidence on its measurement properties. We therefore recommend that researchers use context-specific measures to assess FT in cancer survivors (eg, PROFFIT in Italy). Further studies are warranted to develop FT PROMs for different social and cultural backgrounds, and internationally used measures such as the COST should be examined for differences across social, cultural and economic contexts.

Limitations

We acknowledge some limitations of this study. First, this review included only studies that aimed to evaluate the measurement properties of FT PROMs. Many studies that aimed to explore the level of FT in cancer survivors also reported the reliability and validity of the PROMs they used; such PROMs were not captured, so the PROMs summarised here may be those that have undergone more extensive psychometric evaluation than measures not listed in this review. Second, we included only studies published in English, so relevant studies published in other languages may have been missed, which may affect the conclusions of this review. Third, we included only the original version of the FT domain from PROMs assessing quality of life in cancer survivors, such as the EORTC QLQ-C30 and the QLACS, because the more than 20 language versions of these PROMs do not report sufficient details on the FT domain separately.

Conclusion

This systematic review summarised the psychometric properties of 21 PROMs evaluating FT in cancer survivors. From a psychometric perspective, the COST had an adequate PROM development process and showed the best psychometric properties among all examined PROMs, especially in internal consistency, reliability and hypothesis testing; we therefore recommend the COST as the most suitable internationally available measure for use in research and clinical practice across different contexts. The FIT and the IOC-CS financial problems domain also had adequate psychometric properties. We suggest that PROMs be selected only after careful consideration of the local socioeconomic context. Future studies are warranted to develop FT PROMs for different social and cultural backgrounds, grounded in a clear theoretical basis for assessing FT.

Ethics statements

Patient consent for publication

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Twitter @lucylizarondo20

  • Contributors WX took full responsibility for the work, had access to the data, and controlled the decision to publish. ZZ and WX designed the systematic review, conducted the data search, extraction and analysis, assessed the certainty of evidence, and wrote the draft of the manuscript. HW and YS conducted the quality appraisal and assessed the certainty of evidence. WKWS, LL, JP and YH provided critical comments. All authors approved the final version of the manuscript. WX is the guarantor.

  • Funding This work was supported by the National Natural Science Foundation of China (Grant number: 72004034), the China Medical Board Open Competition Program (Grant number: 20-371), the Shanghai Pujiang Program (Grant number: 2019PJC017) and the Shanghai Soft Science Key Program (Grant number: 20692104800).

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.