Objective A potential psychological harm of screening is unexpected diagnosis—labelling. We need to know the frequency and severity of this harm to make informed decisions about screening. We asked whether current evidence allows an estimate of any psychological harm of labelling. As case studies, we used two conditions for which screening is common: prostate cancer (PCa) and abdominal aortic aneurysm (AAA).
Design Systematic review with narrative synthesis.
Data sources and eligibility criteria We searched the English language literature in PubMed, PsycINFO and Cumulative Index of Nursing and Allied Health Literature (CINAHL) for research of any design published between 1 January 2002 and 23 January 2017 that provided valid data about the psychological state of people recently diagnosed with early stage PCa or AAA. Two authors independently used explicit criteria to review and critically appraise all studies for bias, applicability and the extent to which each provided evidence about the frequency and severity of harm from labelling.
Results 35 quantitative studies (30 of PCa and 5 of AAA) met our criteria, 17 (48.6%) of which showed possible or definite psychological harm from labelling. None of these studies, however, had either appropriate measures or relevant comparisons to estimate the frequency and severity of psychological harm. Four PCa and three AAA qualitative studies all showed clear evidence of at least moderate psychological harm from labelling. Seven population-based studies found increased suicide in patients recently diagnosed with PCa.
Conclusions Although qualitative and population-based studies show that at least moderate psychological harm due to screening for PCa and AAA does occur, the current quantitative evidence is insufficient to allow a more precise estimation of frequency and severity. More sensitive measures and improved research designs are needed to fully characterise this harm. In the meantime, clinicians and recommendation panels should be aware of the occurrence of this harm.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
Our systematic review included English language studies of all research designs and all psychological measures of people newly diagnosed with prostate cancer and abdominal aortic aneurysm.
We examined both study quality (internal validity) and generalisability (external validity).
We report both the design flaws of studies and their results.
Although it is likely that most included studies included participants diagnosed by screening, some studies did not make this clear.
Some studies reported results in a manner that made it impossible to determine the frequency and severity of harm from labelling.
When a physician makes a diagnosis, a patient receives a label. Patients with unexplained symptoms may be reassured by such labelling, but asymptomatic patients given a label may experience negative psychological consequences as a result of an unexpected diagnosis. In this paper, we use the term ‘labelling’ to refer to a healthcare professional giving an unexpected diagnosis to an asymptomatic patient, usually in the process of screening.
In deciding about screening, any negative effects of labelling should be added to other harms and costs, and then weighed against any benefits. The negative effects of labelling would be most problematic if the label were unnecessary (ie, the patient was actually overdiagnosed), given prematurely (ie, earlier than would be needed to achieve the same benefit) or given to a patient not destined to benefit from earlier detection (ie, due to unavailability of effective treatment or due to limited life expectancy). As these three categories often make up a majority of patients labelled by screening, it is important to quantify this harm as much as possible.
An older study of labelling found people diagnosed with hypertension had substantially increased absenteeism from work in the year after compared with the year prior to labelling (80% increase compared with 9% increase among workers not labelled). In another older study, 41% of asymptomatic adolescents previously screened and told they had a ‘heart problem’ had some restriction placed on their activity, although 79% had no actual cardiac disease.1–3 Yet, guideline panels rarely discuss the potential harm of labelling in making recommendations for screening. For example, of the 19 screening recommendations made by the US Preventive Services Task Force between January 2015 and January 2017, for only one has considerations of labelling been a factor (unpublished data, RPH, April 2017). We asked what current evidence is available, and what available evidence tells us about the frequency and severity of negative psychological effects from labelling of two conditions, prostate cancer (PCa) and abdominal aortic aneurysm (AAA), that are the focus of widespread screening.
We conducted a systematic review of research studies of any design published in the English language between 1 January 2002 and 23 January 2017 that assessed the psychological state of patients newly diagnosed with PCa or AAA. We chose this time period to focus on the most recent literature that would involve the current public psychological response to labelling, and that would most likely be included in new systematic reviews of screening. We chose these conditions, one cancer and one non-cancer, as they are commonly diagnosed by screening, but not all are immediately treated after diagnosis. This time lag means that psychological harms of diagnosis can be observed without interference from additional psychological effects of treatment. We also focused only on studies with newly diagnosed early stage PCa and AAA. Although included studies did not always discuss whether participants were diagnosed by screening or by symptoms, the natural history of PCa and AAA is such that early stage patients are almost always asymptomatic, and thus most likely to have been diagnosed by screening.
Study inclusion or exclusion
After formulating our research question in the population, intervention/exposure, comparison, timing, and outcome (PICOTS) format (table 1), we worked with a research librarian to search PubMed, PsycINFO and Cumulative Index of Nursing and Allied Health Literature (CINAHL) for articles on PCa and AAA published between 1 January 2002 and 23 January 2017 (search terms, online supplementary appendix A). We also searched the reference lists of included studies for further studies meeting our criteria.
Two trained reviewers (the authors) independently examined each abstract identified by our searches, and the full text of any article that any single reviewer thought could meet inclusion criteria. For each full-text study, one reviewer extracted data, a second reviewer checked the extraction and then the first reviewer compiled extraction notes and comments. Uncertainty or disagreements between abstract or full-text reviewers about inclusion were resolved by discussion and final decisions were made by a third senior reviewer. Data extracted included the recruitment site, sample size, average age of the study population, comparison group (when available), measurement instruments used to assess psychological state, assessment time points and results.
We categorised all included studies into one of three research designs: quantitative, qualitative and population-based studies. Quantitative studies were those using quantitative rating scales in defined populations. Qualitative studies were those that did not use rating scales but rather engaged participants in structured or semistructured interviews and thus reported themes rather than frequencies. Population-based studies were studies from large databases providing frequencies of fatal or severe complications (eg, suicide rates) associated with a diagnosis of either PCa or AAA. As most cases of PCa and AAA are detected by screening at an early stage, we accepted population-based studies that did not stratify results by stage at diagnosis. For these studies, we also collected information about outcomes such as cardiovascular disease and psychiatric medical care that could have been associated with psychological distress, as well as longer-term outcomes that could be related to psychological distress. Our focus, however, was on outcomes that occurred shortly after diagnosis.
Determination of quality and answering the question of harm from labelling
We assessed the ‘quality’ (ie, internal validity) and generalisability (ie, applicability or external validity) of each of the three types of studies in two steps, using criteria modified from the US Preventive Services Task Force Procedure Manual (uspreventiveservicestaskforce.org) (table 2), described below.
After all articles meeting eligibility criteria were selected, we assessed each included quantitative study first for quality and second for the degree to which it answered our research question. Quality assessment addressed the validity and reliability of the psychological measure, the response rate (ie, greater than 60%) and measurement bias (ie, equal measurement of compared groups with valid and reliable measures).
As our research question was to determine the change in the frequency and severity of specific psychological states as a result of screening in the general primary care population, we also assessed each study for its ability to answer the research question. Thus, we examined the presence of a relevant comparison group, the applicability of study participants to a primary care population (which is the usual population for screening) and the presentation of results as frequencies of specific psychological states. As important psychological harm from screening may not reach pathological levels, we assessed whether studies used only general psychological measures (which are often insensitive to less severe distress) or whether they also used more sensitive condition-specific measures. As above, these assessments were done independently by two co-authors, with discrepancies resolved by discussion and, when necessary, appeal to a third senior author. Table 2 provides a list of the assessment criteria.
After the assessments above, we grouped quantitative studies into one of four mutually exclusive categories (table 3), depending on the presence of a relevant comparison group, presentation of results and the results of the psychological measures. These categories are: no evidence of harm, possible evidence of harm, definite evidence of harm and uncertain evidence of harm (table 3).
Qualitative studies and population-based studies
We assessed the quality of qualitative studies and population-based studies using criteria in table 2. We then extracted themes from the qualitative studies, reflecting different types of psychological problems without ranking their severity. Frequency of problems cannot be obtained from this type of study. A limited number of outcomes were reported in the population-based studies, all of which were rare but severe (eg, suicide, cardiovascular events and psychiatric hospitalisation).
Patients were not involved in this systematic review. As we did not study patients but rather medical studies, we did not seek ethics committee approval.
Protocol and registration
There was no written protocol or registration of this systematic review.
We did not examine publication bias.
A total of 5348 articles were identified through database searching. Of these, 403 were duplicates and removed, leaving 4945 unique articles (308 AAA, 4637 PCa). At title and abstract review, a total of 4581 were excluded; 364 articles were assessed at full-text review (55 AAA, 309 PCa). For those articles that were excluded at full-text review, the most frequent reasons for exclusion were the wrong patient population (eg, already treated; 135/315, 42.9%) and the wrong time frame (eg, psychological assessment longer than 6 months after diagnosis; 59/315, 18.7%). A total of 49 studies (8 AAA, 41 PCa) met inclusion criteria. Seven were qualitative studies (three AAA; four PCa)4–10; seven were population-based studies (all were PCa)11–17 and 35 were quantitative studies (5 AAA; 30 PCa).18–52 No studies were excluded for bias (see Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) figure, online supplementary appendix B).
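The counts in this paragraph can be checked against one another. The short Python sketch below is an editorial consistency check that uses only the numbers reported above; the variable names are illustrative and are not part of the review's methods.

```python
# Study-flow counts as reported in the text (names are illustrative).
identified = 5348
duplicates = 403
unique = identified - duplicates                 # articles left after de-duplication
unique_by_condition = {"AAA": 308, "PCa": 4637}

excluded_at_abstract = 4581
full_text = unique - excluded_at_abstract        # articles read in full
full_text_by_condition = {"AAA": 55, "PCa": 309}

included = {"qualitative": 7, "population_based": 7, "quantitative": 35}
n_included = sum(included.values())
excluded_at_full_text = full_text - n_included   # denominator for exclusion reasons


def pct(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


wrong_population_pct = pct(135, excluded_at_full_text)
wrong_time_frame_pct = pct(59, excluded_at_full_text)

print(unique, full_text, n_included, excluded_at_full_text)
print(wrong_population_pct, wrong_time_frame_pct)
```

The denominator of 315 used for the exclusion reasons is the 364 full-text articles minus the 49 included studies.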
Of the 35 included quantitative studies, 13 (37.1%)21 23–28 36 37 45 46 50 52 included relevant comparison groups, only 2 (5.7%)44 47 used more sensitive condition-specific measures and only 5 (14.3%)18 21 36 46 52 involved generally representative populations. Nearly half (17/35, 48.6%) of the studies indicated the presence of possible or definite psychological harm from the recent diagnosis; seven of these 17 studies (41.2%)21 26 36 37 45 46 50 included a relevant comparison group, although only one46 of these seven studies presented results in terms of frequencies and severity. This study, however, used only general, less sensitive measures in a non-representative population. For more than a third (14/35, 40.0%)18–20 27–35 38 51 of the included studies, the results were presented in such a way that the evidence of any harm was uncertain (ie, no comparison group and the results were given only in means, making determination of frequency and severity impossible). Only four studies (11.4%)23–25 52 used relevant comparison groups such that we could conclude that there was no evidence of harm, although all four of these studies used less sensitive general measures and three studied non-representative populations (table 3 and online supplementary appendix C).
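The proportions in this paragraph can be reproduced from the citation lists. The Python sketch below is an editorial check, not part of the original analysis: it treats each reference number cited above as one study, which is an assumption about how the citations map to studies, and the set names are illustrative.

```python
N_QUANT = 35  # included quantitative studies

# Reference numbers as cited in the text, one per study (assumption).
comparison_group = {21, 23, 24, 25, 26, 27, 28, 36, 37, 45, 46, 50, 52}
representative = {18, 21, 36, 46, 52}
harm_with_comparison = {21, 26, 36, 37, 45, 46, 50}
uncertain = set(range(18, 21)) | set(range(27, 36)) | {38, 51}
no_harm = {23, 24, 25, 52}

# Studies not classed as uncertain or no-harm fall under possible/definite harm.
n_harm = N_QUANT - len(uncertain) - len(no_harm)


def pct(n, total=N_QUANT):
    """Percentage to one decimal place, as reported in the text."""
    return round(100 * n / total, 1)


print(len(comparison_group), pct(len(comparison_group)))
print(len(representative), pct(len(representative)))
print(n_harm, pct(n_harm))
print(len(harm_with_comparison), pct(len(harm_with_comparison), n_harm))
print(len(uncertain), pct(len(uncertain)))
print(len(no_harm), pct(len(no_harm)))
print(sorted(no_harm & representative))
```

Under that assumption, the three result categories sum to 35, and exactly one of the four no-harm studies appears in the list of representative populations, which matches the statement that three of them studied non-representative populations.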
No study presented results in terms of frequencies and used condition-specific measures with a relevant comparison group in a representative population, criteria for allowing a valid estimate of the frequency and severity of psychological harm from labelling. By inspection, there were no major differences in the results between AAA and PCa studies, although there were so few AAA studies that no statistical comparison could be made.
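The first sentence of this paragraph follows from the citation lists in the preceding paragraph. As a minimal sketch in Python, assuming each cited reference number stands for one study, intersecting the lists shows that no study satisfies even two of the four criteria together. The set names are illustrative, and the fourth criterion (results presented as frequencies) is not enumerated study by study in the text, so it is not encoded here.

```python
# Reference numbers as cited in the results text, one per study (assumption).
comparison_group = {21, 23, 24, 25, 26, 27, 28, 36, 37, 45, 46, 50, 52}
condition_specific = {44, 47}
representative = {18, 21, 36, 46, 52}

# Studies with both a relevant comparison group and a condition-specific measure.
meets_two = comparison_group & condition_specific

# Adding the representative-population criterion can only shrink the set further.
meets_three = meets_two & representative

print(len(meets_two), len(meets_three))
```

Because the two studies that used condition-specific measures had no relevant comparison group, the combined criteria cannot be met regardless of how results were presented.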
Because of clinical heterogeneity—multiple different measurement instruments and multiple different psychological outcomes—in these studies, no meta-analysis was possible.
Our systematic review identified three AAA and four PCa qualitative studies addressing the psychological effect of labelling. Nine themes of psychological effects emerged from the AAA qualitative studies and 10 from the PCa studies. Themes common to both conditions included shock, anxiety, fatalism, general distress and burden about protecting others from worrying. Several other themes were reported from one or the other condition alone (table 4 and online supplementary appendix D). All seven qualitative studies showed evidence of negative psychological effects of labelling, which we assessed as at least moderate in severity in that they interfered with individuals’ usual lives.
These studies primarily involved volunteers from non-representative groups; thus they cannot be used to estimate the population frequency of psychological symptoms due to labelling. They do, however, provide more in-depth information about the severity of some patients’ reactions to diagnosis. None of these studies assessed patients for pathological levels of psychological states (frequently the target of quantitative measures), but they do demonstrate considerable distress.
Our systematic review yielded seven PCa population-based studies from two different countries, the USA and Sweden. Only three outcomes were studied: suicide, cardiovascular outcomes and psychiatric outcomes. All seven studies found increased suicide rates or medical care for psychological problems compared with similar groups without PCa, usually early after diagnosis (table 5 and online supplementary appendix E). These outcomes are severe but infrequent, although they occurred more often than in relevant comparison groups. It is likely that many of these newly diagnosed patients had PCa detected by screening.
Our systematic review examined negative psychological states of individuals soon after receiving a label of either PCa or AAA and whether these states could be attributed to unexpected diagnosis—labelling—usually by screening. We found that nearly half (48.6%) of the 35 quantitative studies and all of the qualitative (n=7) and population-based studies (n=7) showed either possible or definite evidence of psychological harm due to labelling. Only 11.4% (4/35) of the quantitative studies found no evidence of harm, and none of these studies used more sensitive condition-specific measures and only one studied a representative population (table 3 and online supplementary appendix C). The qualitative and population-based studies show that this harm can be at least moderate in severity, although they do not allow us to estimate the frequency of life-changing distress. Population-based studies in particular show increased suicide rates in men recently diagnosed with PCa.
We also found that the literature concerning labelling has a number of deficiencies. A previous review from our group53 found a lack of studies on labelling. This study extends that finding to examine the quality and results of studies that have been done. In this review, we found that few studies used appropriate measures and studied appropriate populations. Most studies had no relevant comparison group to determine whether a negative psychological state after diagnosis was pre-existing or was truly caused by labelling. Most studies gave results in ways that prevented calculation of the frequency of people harmed in important ways.
Our findings are in agreement with our knowledge of previous studies of long-term negative psychological effects associated with labelling.1–3 54 55 As noted earlier, there were two pioneer studies that gave early evidence of the harms of labelling. Subsequent studies have confirmed that labelling may have adverse effects on psychological well-being and perceived health.2 54–57 Our review of more recent studies found that although a number of studies have examined the psychological state of individuals recently labelled, no study defined the construct of labelling. Indeed, no study in our review even used the term ‘labelling’.
This study has several limitations. We only reviewed two of many possible risk factors and diseases that are currently the target of screening programmes. It is likely that the labelling effect differs by the label. We also looked primarily at the time shortly after a diagnosis, with the exception of the population-based studies, which followed individuals over a longer period of time. Thus, the time course of any negative psychological effects is uncertain. Longitudinal studies would be needed to examine this issue, although these studies would have the challenge of separating negative effects of labelling from the psychological effects of treatment. In some of our included studies, it was difficult to determine the exact timing between diagnosis and psychological assessment. It was also unclear whether participants had decided on a treatment method at that time. Anticipated treatment could influence a patient’s psychological state. We examined only studies in the English language; a review of non-English language studies is needed.
Another limitation is that some studies reported results in such a way that the frequency and severity of any psychological problems could not be determined. In future, it would be helpful for quantitative studies to report both the frequency and the severity of various psychological states. In addition, the great majority of quantitative studies used only general psychological measures, while condition-specific measures are likely more sensitive to the psychological distress usually experienced by individuals who have been labelled.23 Thus, the quantitative studies we included may have systematically underestimated the frequency and severity of negative psychological states. Even low-grade distress, if experienced by a large number of people, should be considered a substantial concern for a screened population.
Although not all included studies were clear that all of their participants were diagnosed by screening, by including in our review only quantitative and qualitative studies that provided results for early stage PCa and AAA, conditions that are almost always asymptomatic in their early stage, we ensured that these results are primarily if not entirely for patients diagnosed by screening. For population-based studies that often do not stratify results by stage at diagnosis, we focused on outcomes shortly after diagnosis. In any case, the inclusion of even a small number of symptomatic patients would likely have biased the results in the direction of less psychological harm. Thus, it is possible that our review underestimates the psychological effects of labelling due to diagnosis by screening.
Although we included qualitative studies, we are aware that, in general, they do not provide adequate evidence about the frequency of negative psychological states. They can, however, provide a deeper understanding of the severity of psychological distress associated with labelling.
Finally, few papers contained a relevant comparison group and those that did often used less than optimal comparisons. It is interesting to consider what an ideal comparison group would be. Because the very idea of screening may have negative psychological effects for some, the ideal comparison group would be a similar group in the general population who had not even been offered screening.58 59 Because the severity of negative psychological effects of labelling likely differs by one’s usual psychological state, it is important to make sure that the psychological state of individuals in the comparison group matches that of the labelled group before diagnosis.
Lack of appropriate comparison groups and prediagnostic psychological measurements makes it difficult to interpret reported levels of psychological distress. For some studies (ie, those assessed as ‘uncertain evidence of harm’), it leaves us uncertain about the extent to which the psychological distress found after labelling can be attributed to labelling itself. Also, some studies excluded individuals who had a history of mental illness; this may reduce generalisability of the results to real populations and may thus underestimate the effect of labelling on a particularly vulnerable population.
The main strengths of this systematic review are that we were able to look at different types of studies—quantitative, qualitative and population-based studies. Each helped to give a different understanding of the possible influence of labelling on psychological state. Furthermore, we were able to identify specific instruments that were used in each of these studies, such as the short form 36 item health survey (SF-36), and compare and contrast these general measures with ones that were more condition specific. The general measures found less psychological distress than the condition-specific measures.
Our findings suggest that when screening guidelines are being created, the harms of labelling should be considered when weighing benefits and harms. Even if the evidence is scant, guideline developers should consider the potential psychological harm of labelling. In addition, practitioners should be aware of the potential for labelling so that they can inform patients when making screening decisions and offer support if a patient is labelled.
Our findings should also send a message to researchers. Even with the limitations above, we found adequate evidence that labelling is a potential harm of screening. It is clear that the construct has yet to be fully and precisely defined, and that a valid and reliable measure of the psychological harm of labelling has yet to be developed. Only with more and better definition of labelling and better measures will we be able to accurately estimate the frequency, magnitude and time pattern of labelling harms. We need to know much more about variation of labelling effects across different conditions and among different types of individuals.
We conclude that labelling is a potential psychological harm of screening. It is a real phenomenon that has been underappreciated and understudied. We need more and better research on labelling to be able to weigh all of the potential benefits and potential harms of screening programmes. In the meantime, clinicians and guideline developers should be aware of the potential harm of labelling when assisting patients in deciding about screening.
Contributors Conceptual design and organisation; assessing complex articles and resolving disagreements: RPH. Organisation of initial search and initial database: CJB. Maintenance of database: ARC and KV. Updating searches: YY and MR. Initial assessment of abstracts, full-text articles: KV, ARC, LLM, YY and MR. Completing evidence tables and revision of manuscript: all authors. Initial draft of manuscript: LLM. All authors approved of the final manuscript.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No further data available.