Diagnostic accuracy of the Whooley questions for the identification of depression: a diagnostic meta-analysis

Katharine Bosanquet; Della Bailey; Simon Gilbody; Melissa Harden; Laura Manea; Sarah Nutbrown; Dean McMillan

doi:10.1136/bmjopen-2015-008913

Katharine Bosanquet¹, Della Bailey¹, Simon Gilbody¹,², Melissa Harden³, Laura Manea¹,², Sarah Nutbrown¹, Dean McMillan¹,²

¹Department of Health Sciences, University of York, York, UK
²Hull York Medical School, University of York, York, UK
³Centre for Reviews and Dissemination, University of York, York, UK

Correspondence to Katharine Bosanquet; kate.bosanquet{at}york.ac.uk

Abstract

Objectives To determine the diagnostic accuracy of the Whooley questions in the identification of depression; and, to examine the effect of an additional ‘help’ question.

Design Systematic review with random effects bivariate diagnostic meta-analysis. Search strategies included electronic databases, examination of reference lists, and forward citation searches.

Inclusion criteria Studies were included that provided sufficient data to calculate the diagnostic accuracy of the Whooley questions against a gold standard diagnosis of major depression.

Data extraction Descriptive information, methodological quality criteria, and 2×2 contingency tables were extracted.

Results Ten studies met inclusion criteria. Pooled sensitivity was 0.95 (95% CI 0.88 to 0.97) and pooled specificity was 0.65 (95% CI 0.56 to 0.74). Heterogeneity was low (I²=24.1%). Primary care subgroup analysis gave broadly similar results. Four of the ten studies provided information on the effect of an additional help question. The addition of this question did not consistently improve specificity while retaining high sensitivity as reported in the original validation study.

Conclusions The two-item Whooley questions have high sensitivity and modest specificity in the detection of depression. The current evidence for the use of an additional help question is not consistent and there is, as yet, insufficient data to recommend its use for screening or case finding.

Trial registration number CRD42014009695.

diagnostic accuracy
major depression
Whooley questions
screening
diagnostic meta-analysis

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/

Strengths and limitations of this study

An original study: the first diagnostic accuracy meta-analysis of the Whooley questions as a screening test for depression.
Rigorous methodology (strict inclusion/exclusion and quality assessment criteria) identified 10 studies of sufficient quality for inclusion.
Substantial variability observed in methodological quality of included studies.
Inconsistency in how Whooley questions are referred to means further relevant studies may have been missed.

Introduction

Depression is a highly prevalent condition that affects a substantial proportion of the population, varying from around 1 in 4 women to 1 in 10 men.1 ,2 It leads to impairments in functioning that are as significant as those seen in chronic physical health conditions.3 Although depression is a common condition, it is often hard to detect in primary care and other non-psychiatric settings. Despite the significance of the problem, there is remarkable uncertainty about the value of screening or case finding for depression. The guidance from different Western countries is contradictory,4 ,5 and from a UK health perspective, recommendations offered by different UK bodies are also inconsistent.6–10 The UK National Screening Committee11 concluded that there is insufficient evidence to recommend the adoption of screening for depression and also identified a lack of robust evidence for case finding among populations at elevated risk. In contrast, the National Institute for Health and Care Excellence (NICE) guidance recommends that, in the UK, general practitioners (GPs) consider asking two brief questions to identify potential depression in certain patient groups7–9 such as people with long-term conditions and women during the perinatal period; if someone responds positively to either question, a more comprehensive assessment is carried out to determine whether or not that individual is depressed.

NICE guidance recommends considering using the Whooley questions,12 derived from the original Prime-MD,13 to identify potential depression. The Whooley questions consist of two questions asking about low mood and loss of interest or pleasure. In the original validation study, the questions had a sensitivity of 0.95 (0.89 to 0.98) and specificity of 0.56 (0.52 to 0.61). A subsequent validation study added a third question, which asks whether the person wants help with the difficulties identified.14 Although NICE endorses the use of the Whooley questions, the guidance recognises that this is based on limited evidence of the diagnostic accuracy of the measure. Perhaps as a consequence of this, practitioners also have doubts about the ability of the questions to detect depression.15 There is further uncertainty about whether the two or three-item version of the questions should be used, with some NICE guidance recommending the use of the third question,9—though recent policy changes have seen this removed10—while other guidance specifically chose not to adopt this additional question because of a lack of evidence on its effectiveness.8

The Whooley questions are at the centre of the UK's approach to the identification of depression, yet at the time the UK guidance was published there was limited evidence on the diagnostic performance of the test. It remains unclear whether a review of the current evidence base would lead to a revision of UK guidance. We conducted a systematic review, therefore, to identify all studies that had examined the diagnostic accuracy of the Whooley questions against a gold standard method of establishing a diagnosis of major depression according to internationally recognised criteria. A further component of the review was to assess the effect of the ‘help’ question in those studies that included it in the screen.

Method

A protocol for the systematic review was developed and published on PROSPERO (registration number: CRD42014009695 http://www.crd.york.ac.uk/PROSPERO/). We adhered to Centre for Reviews and Dissemination guidance in the conduct of the review and PRISMA guidelines in the reporting of the review.16

Data sources and searches

The following databases were searched to identify studies assessing the diagnostic test accuracy of the Whooley questions: MEDLINE, MEDLINE In-Process, PsycINFO, EMBASE, Cumulative Index to Nursing & Allied Health (CINAHL Plus), Cochrane Central Register of Controlled Trials (CENTRAL), Cochrane Database of Systematic Reviews (CDSR), Database of Abstracts of Reviews of Effects (DARE), and the Health Technology Assessment (HTA) database. A number of additional sources were searched to identify studies in progress, unpublished research or grey literature: Conference Proceedings Citation Index—Science and Social Science, OAIster, ClinicalTrials.gov, Health Services Research Projects in Progress (HSRProj) and the Trip database.

Searches covered the period from 1994, the year of publication of the PRIME-MD from which the Whooley questions were derived, to September 2013. No language restrictions or study design filters were applied to the search strategy. In addition, a forward citation search of the Whooley 1997 paper was carried out in the Web of Science database to identify any further papers on the Whooley questions. We examined the reference lists and conducted a reverse-citation search of all included studies.

A search strategy, consisting of relevant free-text terms and subject headings, was developed in MEDLINE (OvidSP) and then adapted for use in the other databases searched. Online supplementary appendix 1 gives the full search strategy for MEDLINE. Furthermore, we contacted six key experts in the field, including Whooley et al12 and Arroll and colleagues,14 ,17 to obtain information about potential unpublished data and for clarification on aspects of their work.

An update of the searches was conducted in April 2015. No further diagnostic accuracy studies using the Whooley questions were found. However, we did observe changes to policy. NICE had amended guidance on perinatal depression (CG192).10 It now recommends considering asking the Whooley questions alone rather than with the addition of a help question.

Study selection

Studies were selected using a prepiloted form based on the PICO inclusion criteria in the review protocol. Three reviewers assessed titles and abstracts to identify potentially eligible studies. Any queries were discussed with a second reviewer. Full text was obtained for all articles included after this initial screen. Each of these was assessed using the prepiloted form by two reviewers. At each stage any disagreements were resolved by consensus and where necessary arbitration by further reviewers.

Studies that met the following inclusion criteria were included: Participants/population: No restrictions were made in terms of the participants or population. Instrument: Studies that used either the two-item or three-item Whooley questions were included. The two-item questions had to use the standard Whooley wording, as outlined in the original article.12

“During the past month, have you often been bothered by feeling down, depressed, or hopeless?” (yes/no)
“During the past month, have you often been bothered by little interest or pleasure in doing things?” (yes/no)12

For translated versions, the wording had to be derived from the original. The questions also had to be scored as a dichotomous ‘yes’/‘no’. For the two-item Whooley questions, only studies that defined a positive screen as ‘yes’ to one or both of the questions were included. Given inconsistencies in the literature about the precise phrasing of the ‘help question’, all variations in phrasing were accepted. No restrictions were made in terms of mode of administration (eg, telephone or face-to-face) or the person administering the measure (eg, clinician, researcher or self-administered). Comparator (reference standard): Studies that used a gold standard diagnostic interview to establish a diagnosis of major depression according to international criteria (Diagnostic and Statistical Manual (DSM) or International Classification of Disease (ICD)) were eligible for inclusion. Studies were excluded if the target diagnosis was not solely major depression (eg, any depressive disorder). No restrictions were made in terms of who administered the gold standard or its mode of administration. Outcome: For a study to meet inclusion criteria, it had to report sufficient data to extract 2×2 contingency tables for either the two-item Whooley questions or the two-item questions plus an additional help question. Study design: No restrictions were made in the type of study design.

Data extraction and quality assessment

Two reviewers independently extracted the following data to a prepiloted standardised form: (1) descriptive characteristics of the sample and setting (country, setting, age of sample, gender of sample, sample size, proportion depressed); (2) descriptive characteristics of the Whooley (mode of administration, who administered, language); (3) descriptive characteristics of the gold standard (type of gold standard, whether DSM or ICD diagnoses); (4) quality assessment criteria (see below); and (5) the 2×2 contingency tables for the two-item Whooleys and/or two-item Whooleys plus help question against gold standard diagnosis of major depression. Any disagreements were resolved through consensus or, where necessary, arbitration by a third reviewer. Study authors were contacted to provide additional data or clarification as necessary.

Quality assessment was conducted at the study level and used criteria based on the QUADAS-2.18 The QUADAS-2 guidelines require that the tool is adapted for each specific review; this can involve adding or omitting questions and providing clarification about how specific questions are to be rated. We developed specific guidance on the coding of the questions in the form of a brief field guide.

We retained all of the risk of bias signalling questions and applicability questions, with the exception of one item (prespecified threshold on the index test). This item was removed because the standard method of scoring the Whooley provides a dichotomous cut-off; there is no ordinal or continuous scale that requires the prespecification of a threshold. For the signalling question ‘Is the reference standard likely to correctly classify the target condition?’ we operationalised this as whether the researchers who conducted the gold standard interview had received appropriate training. For the signalling question ‘Was there an appropriate interval between the index test and reference standard?’ we defined an appropriate interval as less than 2 weeks in keeping with how this item has been applied in previous diagnostic test accuracy studies of depression.19

We added two additional questions that were applied to studies using translated versions of the Whooley and reference test. For translations of the reference test, we asked whether appropriate forward and back translation methods were used and whether psychometric properties of the translated version were reported. Similarly, we asked whether appropriate translation methods were used and also applied to any translated version of the Whooley. We also added an additional question to establish whether the studies had used strategies to exclude people already known to a service to have depression. This reflects Thombs et al's20 concern that studies which include people already known to be depressed may provide an artificially inflated indication of a test's performance, because the typical aim of a screening or case finding tool is to identify depression in those not already known to be depressed. Studies met this criterion if they used strategies to exclude people already known to be depressed, such as excluding people already known to be using psychotropic medication.

Data synthesis and analysis

We constructed 2×2 contingency tables with true positive, true negative, false positive and false negative results. We performed a bivariate diagnostic meta-analysis to obtain pooled estimates of specificity, sensitivity, likelihood ratios, diagnostic ORs and their associated 95% CIs. The bivariate model is a 2-level model which takes into account the precision by which differences in sensitivity and specificity have been calculated while incorporating and estimating the amount of between-study variability in sensitivity and specificity.21 A priori subgroup analyses were conducted on descriptive variables and quality assessment criteria.
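As an illustration of the per-study quantities derived from each 2×2 table, the following is a minimal sketch. It computes study-level measures only and is not the bivariate pooling model used in the review; the function name, the 0.5 continuity correction and the example counts are assumptions made for illustration and are not taken from any included study.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn):
    """Per-study accuracy measures from one 2x2 table (hypothetical helper)."""
    cells = [tp, fp, fn, tn]
    # Assumption: add 0.5 to every cell when any cell is zero, a common
    # continuity correction that keeps the ratios and log terms finite.
    if any(c == 0 for c in cells):
        tp, fp, fn, tn = (c + 0.5 for c in cells)

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    dor = (tp * tn) / (fp * fn)

    # 95% CI for the diagnostic OR, computed on the log scale.
    se_log_dor = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    log_dor = math.log(dor)
    dor_ci = (math.exp(log_dor - 1.96 * se_log_dor),
              math.exp(log_dor + 1.96 * se_log_dor))

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
        "dor": dor,
        "dor_ci": dor_ci,
    }


# Invented counts, chosen only so that sensitivity is 0.95 and specificity 0.65.
example = diagnostic_metrics(tp=95, fp=140, fn=5, tn=260)
```

The log diagnostic OR and its standard error from such a function are the kind of study-level inputs that heterogeneity statistics and meta-regression operate on.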

Heterogeneity

We measured the between-study heterogeneity using the I² statistic of the pooled diagnostic OR.22 I² describes the percentage of total variation across studies that is caused by heterogeneity rather than chance. The I² has a greater statistical power to detect clinical heterogeneity when fewer studies are available compared to other measures of heterogeneity. I² values of 25% may be considered low, 50% moderate and 75% high. We explored the causes of heterogeneity where there was significant between-study heterogeneity by visually inspecting the summary receiver operating characteristic curves and identifying the studies that were outside the 95% confidence ellipse. We also undertook a meta-regression analysis of logit diagnostic OR using a priori potential sources of heterogeneity entered as covariates in the meta-regression model.23
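The I² statistic described above can be sketched from Cochran's Q as follows. This is a generic illustration under stated assumptions: the inputs are taken to be per-study log diagnostic ORs with their variances, the function name is hypothetical, and the numbers in the example are invented rather than drawn from the included studies.

```python
def i_squared(effects, variances):
    """Return (Q, I2 in percent) for study effects with known variances."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")

    # Inverse-variance weights and the fixed-effect pooled estimate.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1

    # I2 = (Q - df) / Q, truncated at zero and expressed as a percentage.
    i2 = 0.0 if q <= 0 else max(0.0, (q - df) / q) * 100.0
    return q, i2


# Two invented studies: Q = 4 on 1 degree of freedom gives I2 = 75%.
q_value, i2_value = i_squared([0.0, 2.0], [0.5, 0.5])
```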

We investigated the heterogeneity resulting from sample or study design characteristics by exploring the effects of potential predictive variables.24 For the sample we examined the effect of language (translated vs not translated), baseline prevalence of major depressive disorder in the screened population, as a proxy measure of the spectrum of severity of disorder within the screened population, and study settings (primary care vs general hospital). For study quality, we considered blinding (of the assessor to the results of the Whooley questions as well as the gold standard) and whether the studies avoided a case–control design or an artificially inflated base rate of major depression. If these items were important sources of heterogeneity, then they would be predictive in a meta-regression analysis, and would reduce the level of between-study heterogeneity in the meta-regression model.
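The idea of testing a study-level covariate as a predictor of the log diagnostic OR can be sketched with a single-covariate weighted regression. This is a deliberately simplified fixed-effect version, assuming known within-study variances and no residual between-study variance; it is not the model fitted in the review, and the function name and example values are hypothetical.

```python
import math


def weighted_meta_regression(log_dors, variances, covariate):
    """Inverse-variance weighted regression of log DOR on one covariate.

    Returns (intercept, slope, se_slope). Simplified fixed-effect sketch.
    """
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)

    # Weighted means of the covariate and the outcome.
    x_bar = sum(w * x for w, x in zip(weights, covariate)) / total_w
    y_bar = sum(w * y for w, y in zip(weights, log_dors)) / total_w

    # Weighted sums of squares and cross-products.
    s_xx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, covariate))
    s_xy = sum(w * (x - x_bar) * (y - y_bar)
               for w, x, y in zip(weights, covariate, log_dors))

    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / s_xx)  # valid when variances are treated as known
    return intercept, slope, se_slope


# Invented data lying exactly on log DOR = 1 + 2 * prevalence.
prevalence = [0.1, 0.2, 0.3]
log_dor = [1.2, 1.4, 1.6]
intercept, slope, se_slope = weighted_meta_regression(
    log_dor, [0.2, 0.1, 0.4], prevalence)
```

A covariate that explains heterogeneity would show a slope that is large relative to its standard error and would leave less unexplained variation between studies.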

Analyses were conducted using STATA V.12, with the metandi, metabias, metareg and metafunnel user-written commands.

Results

The initial search identified 6846 unique citations (10 589 citations before de-duplication). Twenty-two of these citations met initial inclusion criteria and were selected for further screening of the full article (figure 1). Ten of the 22 met final stage inclusion criteria. The reasons for exclusion of the 12 studies are as follows: three used the PHQ-2 not the Whooley,25–27 for one study we were unable to establish whether the two-item questionnaire used was the Whooley,28 four did not use a gold standard reference test,13 ,29–31 two did not report data on a diagnosis of major depression alone (eg, outcome was any depression diagnosis)32 ,33 and for two it was not possible to extract information to calculate a 2×2 contingency table.34 ,35

Figure 1

Overview of selection of studies (PRISMA).

Overview of included studies

Table 1 summarises the characteristics of the included studies. The studies took place in a variety of countries and settings. The samples included adults and older adults and ranged from predominantly male12 to entirely female samples.36 ,37 Sample sizes ranged from 89 (ref 38) to over 1000 (refs 14, 39), and the proportion depressed according to the gold standard ranged from 3.3% (ref 38) to 34% (ref 40). Clinicians administered the Whooley questions in the majority of studies. The language of administration was English in six of the studies; translated versions were used in the remainder. A variety of gold standard measures were used, though the CIDI was used in 4 of the 10 studies.

Table 1

Descriptive characteristics of the included studies

Quality assessment

Table 2 summarises the results of the quality assessment using QUADAS-2. None of the studies was rated as at low risk of bias across all domains. A rating of an unclear risk of bias was the most common rating across the domains. All studies avoided the use of a case–control design. Only three clearly made attempts to exclude people with a known history of depression. Six of the 10 studies provided evidence of blinding in both directions (ie, Whooley interpreted blind to reference, reference interpreted blind to Whooley). In terms of the QUADAS-2 applicability criteria, all studies were rated as applicable on all three domains.

Table 2

Quality assessment of included studies

Diagnostic properties of the Whooley questions (no help question)

Ten studies reported the diagnostic properties of the Whooley questions. One study41 reported a significantly lower sensitivity and higher specificity than other studies. In the remaining nine studies, the sensitivity ranged between 0.90 (ref 39) and 1.00 (refs 36–38, 42). Specificity values ranged between 0.44 (refs 37, 42) and 0.78 (ref 14). Table 3 presents the individual performance of the 10 studies including sensitivity, specificity, likelihood ratios and diagnostic ORs and their corresponding 95% CIs.

Table 3

Performance of individual studies (no help question)

The pooled sensitivity was 0.95 (CI 0.88 to 0.97), pooled specificity 0.65 (CI 0.56 to 0.74), pooled positive likelihood ratio 2.78 (CI 2.16 to 3.57), pooled negative likelihood ratio 0.07 (CI 0.03 to 0.16) and diagnostic OR 36.91 (17.52 to 77.76). The level of between-study heterogeneity was low (I²=24.1%). Figure 2 shows the Whooley questions summary receiver operating characteristic plot of major depression diagnosis. Figure 3 shows the posterior probabilities given positive and negative test results. The figure shows that, at the prevalence rate expected in the general population (less than 20%), the probability that a person with a negative test result is depressed is very low, whereas the probability that a person with a positive test result is depressed is around 40%.

Figure 2

Whooley questions summary receiver operating characteristic plot of diagnosis of major depressive disorder. Pooled sensitivity and specificity using a bivariate meta-analysis.

Figure 3

Bayesian graph for major depressive disorder for Whooley questions.
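The posterior probabilities plotted in figure 3 follow from Bayes' theorem applied to sensitivity, specificity and prevalence. The sketch below is illustrative only: the function name is hypothetical, the pooled point estimates (0.95 and 0.65) are used without their uncertainty, and a prevalence of exactly 20% is an assumed value at the upper end of the range discussed.

```python
def post_test_probabilities(sensitivity, specificity, prevalence):
    """Probability of depression after a positive and after a negative screen."""
    # Probability of a positive screen: true positives plus false positives.
    p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    # Probability of a negative screen: false negatives plus true negatives.
    p_negative = (1 - sensitivity) * prevalence + specificity * (1 - prevalence)

    p_depressed_given_positive = sensitivity * prevalence / p_positive
    p_depressed_given_negative = (1 - sensitivity) * prevalence / p_negative
    return p_depressed_given_positive, p_depressed_given_negative


# Pooled point estimates with an assumed prevalence of 20%.
after_positive, after_negative = post_test_probabilities(0.95, 0.65, 0.20)
```

Under these assumptions the probability of depression is about 0.40 after a positive screen and under 0.02 after a negative screen, consistent with the pattern described for the figure.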

We conducted a meta-regression to explore possible sources of heterogeneity. Descriptive variables and quality assessment criteria (setting, baseline prevalence of major depression, language, whether the study avoided a case–control design and blinding) were examined as predictors. Out of these variables, only the prevalence of major depression was significant (p=0.026).

Subgroup analyses

One of the possible reasons for heterogeneity is the various clinical settings in which the Whooley questions have been validated. On a priori grounds we conducted subgroup analyses to examine the diagnostic performance of the Whooley questions in similar clinical settings.

Five studies were conducted in primary care settings,14 ,17 ,37 ,40 ,42 three studies recruited in hospital or out-patient-based medical settings12 ,36 ,39 and two in community settings.38 ,41 In primary care settings the Whooley questions had a pooled sensitivity of 0.96 (CI 0.91 to 0.98), pooled specificity 0.61 (CI 0.48 to 0.73), pooled positive likelihood ratio 2.53 (CI 1.80 to 3.56), pooled negative likelihood ratio 0.04 (CI 0.01 to 0.13) and diagnostic OR 52.07 (15.65 to 173.18). Heterogeneity in primary care studies was moderate (I²=49.9%).

We did not identify a sufficient number of studies (minimum of four studies for a diagnostic meta-analysis) using a comparable clinical setting to conduct further subgroup analyses for other settings. There were not enough studies to pool the results separately for different age groups.

Six studies validated the original (English) version of the Whooley questions.12 ,14 ,17 ,36 ,37 ,39 Pooled sensitivity for these studies was 0.95 (0.89 to 0.98), pooled specificity was 0.64 (0.54 to 0.72), positive likelihood ratio 2.67 (2.11 to 3.38), negative likelihood ratio 0.06 (0.02 to 0.15) and pooled diagnostic OR 40.64 (17.00 to 97.14). Heterogeneity in the English studies was low (I²=7.3%).

Whooley questions and help question

Lack of consistency in the phrasing of the questions and how the data were combined meant that we were unable to combine results for a meta-analysis of the help question. Instead we described the results of the studies individually. Two studies14 ,41 considered a positive screen as a positive response to either or both Whooley questions and yes to the help question (yes today; or yes, but not today). The psychometric properties of this method of scoring the Whooley questions were, as reported by Arroll et al14: sensitivity 0.95 (95% CI 0.85 to 0.99), specificity 0.89 (95% CI 0.87 to 0.91), positive likelihood ratio 9.06 (95% CI 7.41 to 11.10), negative likelihood ratio 0.04 (95% CI 0.01 to 0.18) and OR 190.00 (95% CI 50.00 to *; * value unable to be estimated). The psychometric properties reported by Suija et al showed a lower sensitivity of 0.68 (95% CI 0.46 to 0.85) but comparable specificity of 0.85 (95% CI 0.82 to 0.88). Positive likelihood ratio was 4.77 (95% CI 3.36 to 6.78), negative likelihood ratio 0.37 (95% CI 0.21 to 0.66) and OR 12.80 (95% CI 5.40 to 30.20). Arroll et al14 made the distinction between ‘help, yes but not today’ and ‘yes, help today’, though we were unable to extract 2×2 tables for these different responses to the help question from the data presented in the paper.

The remaining two studies36 ,42 reported the psychometric properties of the help question only in those who scored positive on either Whooley question. Mann et al used the help question ‘is this something you feel you need or want help with?’ rather than the one proposed by Arroll et al14. Psychometric properties of a positive answer to either Whooley question and a positive answer to this question were as follows: sensitivity 0.66 (95% CI 0.38 to 0.88), specificity 0.91 (95% CI 0.78 to 0.98), positive likelihood ratio 8.22 (95% CI 2.62 to 25.80), negative likelihood ratio 0.36 (95% CI 0.17 to 0.74) and OR 22.70 (95% CI 4.83 to 105.00).

Mohd-Sidik et al used the help question proposed by Arroll et al14, and made the distinction between ‘help, yes but not today’ and ‘yes, help today’. For this study we were able to ascertain how distinguishing between these two options can affect the ability of the help question to detect depression in people who responded yes to either of the Whooley questions. If a positive answer to the help question was considered ‘yes today’, sensitivity was 0.61 (95% CI 0.42 to 0.78), specificity was 0.94 (95% CI 0.80 to 0.99), positive likelihood ratio was 10.4 (95% CI 2.64 to 41.1), negative likelihood ratio 0.41 (95% CI 0.262 to 0.64) and OR 25.3 (95% CI 5.55 to *; * value unable to be estimated). If a positive answer to the help question was considered to be ‘yes today, or yes, but not today’, sensitivity was higher at 0.87 (95% CI 0.70 to 0.96), but specificity lower at 0.82 (95% CI 0.65 to 0.93); positive likelihood ratio was 4.94 (95% CI 2.36 to 10.30), negative likelihood ratio was 0.15 (95% CI 0.06 to 0.39) and OR 31.5 (95% CI 8.22 to 120.00). In this study, therefore, restricting a positive help response to ‘yes, help today’ increases the specificity of the Whooley questions when used in conjunction with the help question.
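The scoring rules compared in this section can be written down explicitly. The sketch below is a hypothetical encoding, not code from any included study: the response labels, function names and the `strict` flag are assumptions introduced for illustration. It shows the two-item rule (yes to either question) and the two ways of counting the help question.

```python
HELP_RESPONSES = ("no", "yes_not_today", "yes_today")  # assumed labels


def two_item_positive(felt_down, little_interest):
    """Two-item rule: a positive screen is 'yes' to one or both questions."""
    return bool(felt_down or little_interest)


def three_item_positive(felt_down, little_interest, help_response, strict=False):
    """Two-item rule plus the help question.

    strict=False counts 'yes today' or 'yes, but not today' as positive;
    strict=True counts only 'yes today'.
    """
    if help_response not in HELP_RESPONSES:
        raise ValueError(f"unknown help response: {help_response!r}")
    if not two_item_positive(felt_down, little_interest):
        return False
    if strict:
        return help_response == "yes_today"
    return help_response in ("yes_not_today", "yes_today")


# The same respondent is classified differently under the two thresholds.
lenient = three_item_positive(True, False, "yes_not_today")
stringent = three_item_positive(True, False, "yes_not_today", strict=True)
```

Raising the threshold can only move respondents from positive to negative, which is why it tends to raise specificity at the cost of sensitivity.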

Discussion

NICE guidance recommends that, in the UK, GPs consider using the Whooley questions to identify potential depression in certain patient groups7–9 such as people with long-term conditions and women during the perinatal period. The guidance suggests that the Whooley questions are used as a case-finding tool for depression, so if an individual responds positively to one or both of the questions a more comprehensive assessment is carried out to determine whether or not that individual is depressed. The guidance acknowledges, though, that this recommendation is based on limited evidence. Furthermore, there is inconsistency between NICE guidance about whether the Whooley questions should be combined with an additional help question.

This review sought to establish the current evidence for the diagnostic performance of both the original two-item Whooley questions and their combination with an additional help question. The original validation study reported that the two-item version of the questions had high sensitivity (0.95, 95% CI 0.89 to 0.98) and modest specificity (0.56, 95% CI 0.52 to 0.61). The current review found comparable results. Pooled sensitivity was 0.95 (95% CI 0.88 to 0.97) and pooled specificity was 0.65 (95% CI 0.56 to 0.74). Similar figures were also reported in the subgroup analysis examining primary care studies (sensitivity: 0.96, 95% CI 0.91 to 0.98; specificity: 0.61, 95% CI 0.48 to 0.73).

Our search identified four studies that used the help questions. The authors of the original validation study14 developed the help question in order to encourage the patient to take an active role in making decisions about their own treatment. They also suggested that the help question may improve specificity. Two categories of help were proposed in this study (help ‘but not today’, and help ‘yes today’).14 ,42 However, of the four studies identified in our review, only two studies, one of which was the original validation study, distinguished between these two help categories: one study combined the two responses41 and the fourth study36 used a different response. Given the small number of studies and the variability in how the help question was used, we were unable to combine these studies in a meaningful way in order to ascertain the diagnostic performance of the help question when used with the original Whooley questions.

Limitations

The results of the systematic review need to be considered in light of the limitations of the primary studies used in the review and the review itself. As the QUADAS-2 ratings indicate, there are a number of limitations of the primary studies and often details about key methodological criteria were not reported. Only a small number made attempts to exclude people already known to have depression. The aim of depression screening is typically to identify depression in those not known to have that problem. It is possible that excluding those known to be depressed may alter the diagnostic performance of a test. Blinding in both directions was established in some but not all studies. Lack of blinding may artificially inflate the diagnostic performance of a test. It is possible then that the results may overestimate the performance of the Whooley.

Four of the 10 studies used the CIDI as the reference test, an instrument that has been described as an imperfect gold standard for mental health diagnosis.43 However, the results of these studies for the two-item Whooley questions appeared broadly comparable with studies using a different gold standard. For the studies using the additional help question, the two studies that used the CIDI were the same two studies that reported increased specificity without an impact on sensitivity,14 ,42 findings that were not replicated in the two studies that used other gold standards.36 ,41 It is unclear to what extent these differences are linked to the use of different gold standards.

There are also a number of limitations of the review itself. First, we did not include the ‘help’ question in the search terms, which may have meant we missed articles focused solely on its effect. Second, although efforts were made to identify grey literature, it remains possible that unpublished studies were missed, so we cannot rule out the possibility of publication bias. Third, there is inconsistency in the published studies in how the Whooley questions are referred to, and while the inclusion of various alternative terms for the Whooley questions in the search strategy attempted to address this, it is possible that further relevant studies may have been missed.

Recommendations

The limitations suggest a number of research recommendations. Future diagnostic validation studies should report sufficient detail on the method to permit an assessment of key methodological criteria, such as those given in the QUADAS-2. Subsequent reviews of the Whooley would benefit from a more consistent method of referring to the Whooley in primary studies. We would recommend the use of the term ‘Whooley questions’ and avoidance of the term ‘PHQ-2’. Although the PHQ-2 shares similarities with the Whooley questions, the PHQ-244 asks about a different time frame and uses a different scoring system (see online supplementary appendix 2). We recommend that future studies should refer to Whooley in the title or abstract to facilitate future reviews of the measure.

Conclusion

This review of the diagnostic accuracy of the Whooley questions provides evidence of consistently high sensitivity and moderate specificity for the two questions across a range of settings and populations. The Whooley questions demonstrate discriminatory power at ruling out depression: few people who answer no to both questions are depressed according to a gold standard diagnostic interview. Given that depression is a common condition, this finding should be valuable to clinicians in general practice for use with patients they have concerns about. Despite its modest specificity, which means that many people who screen positive will not meet diagnostic criteria for depression, the test retains value in its ability to rule out the target condition. Although this review identified some evidence that the addition of a help question, when used as a second-tier test, appeared to improve specificity, the inconsistency in both how the question was phrased and how data were combined means evidence of its performance remains limited.
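
The rule-out argument can be made concrete with a short worked calculation. The sketch below applies Bayes' theorem to the pooled sensitivity (0.95) and specificity (0.65) reported in this review; the prevalence of 10% is a purely illustrative assumption and is not a figure from the review, and the function name is hypothetical.

```python
# Worked example: predictive values from pooled accuracy estimates.
# ASSUMED_PREVALENCE is an illustrative assumption, not a result of this review.

def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)   # P(depressed | positive screen)
    npv = true_neg / (true_neg + false_neg)   # P(not depressed | negative screen)
    return ppv, npv


POOLED_SENSITIVITY = 0.95   # pooled estimate from this review
POOLED_SPECIFICITY = 0.65   # pooled estimate from this review
ASSUMED_PREVALENCE = 0.10   # hypothetical prevalence for illustration

ppv, npv = predictive_values(POOLED_SENSITIVITY, POOLED_SPECIFICITY, ASSUMED_PREVALENCE)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Under this assumed prevalence the negative predictive value is about 0.99 and the positive predictive value about 0.23: a negative screen makes depression very unlikely, whereas most positive screens would not be confirmed at diagnostic interview. Both values shift with prevalence, so the figures should be recalculated for the setting of interest.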

References

1. Mental Health Foundation. Mental Health Statistics [cited 2015 07/04/15]. http://www.mentalhealth.org.uk/help-information/mental-health-statistics/
2. National Institute for Health and Clinical Excellence. Clinical knowledge summaries: depression prevalence. NICE, 2015. [last revised March 2015; cited 2015 07/04/15]. http://cks.nice.org.uk/depression#!backgroundsub:1
3. Moussavi S, Chatterji S, Verdes E, et al. Depression, chronic diseases, and decrements in health: results from the World Health Surveys. Lancet 2007;370:851–8. doi:10.1016/S0140-6736(07)61415-9
4. Joffres M, Jaramillo A, Dickinson J, et al, Canadian Task Force on Preventive Health Care. Recommendations on screening for depression in adults. CMAJ 2013;185:775–82. doi:10.1503/cmaj.130403
5. US Preventive Services Task Force. Guide to clinical preventive services. Alexandria, VA: Williams & Wilkins, 1996.
6. Allaby M. Screening for depression: a report for the National Screening Committee. Oxford: NHS PHRU, 2010.
7. National Institute for Health and Clinical Excellence. CG90 depression: the NICE guideline on the treatment and management of depression in adults. London, 2010. http://www.nice.org.uk/guidance/cg90/evidence/cg90-depression-in-adults-full-guidance2
8. National Institute for Health and Clinical Excellence. CG91 Depression in adults with a chronic physical health problem. London, 2010. http://www.nice.org.uk/guidance/cg91/evidence/cg91-depression-with-a-chronic-physical-health-problem-full-guideline2
9. National Institute for Health and Clinical Excellence. Clinical guideline 45: antenatal and postnatal mental health. London: NICE, 2007.
10. National Institute for Health and Clinical Excellence. NICE guidelines [CG192]: antenatal and postnatal mental health: clinical management and service guidance. NICE, 2014. [updated December 2014; cited 2015 08/04/15]. http://www.nice.org.uk/guidance/cg192/chapter/1-recommendations#recognising-mental-health-problems-in-pregnancy-and-the-postnatal-period-and-referral-2
11. National Screening Committee. The UK National Screening Committee's criteria for appraising the viability, effectiveness and appropriateness of a screening programme. London: NSC, 2003.
12. Whooley M, Avins A, Miranda J, et al. Case-finding instruments for depression. Two questions are as good as many. J Gen Intern Med 1997;12:439–45. doi:10.1046/j.1525-1497.1997.00076.x
13. Spitzer R, Williams J, Kroenke K, et al. Utility of a new procedure for diagnosing mental disorders in primary care: the PRIME-MD 1000 study. JAMA 1994;272:1749–56. doi:10.1001/jama.1994.03520220043029
14. Arroll B, Goodyear-Smith F, Kerse N, et al. Effect of the addition of a “help” question to two screening questions on specificity for diagnosis of depression in general practice: diagnostic validity study. BMJ 2005;331:884. doi:10.1136/bmj.38607.464537.7C
15. Beauchamp H. What factors influence the use of the Whooley questions by health visitors? J Health Visiting 2014;2:378–87. doi:10.12968/johv.2014.2.7.378
16. Moher D, Liberati A, Tetzlaff J, et al. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med 2009;151:264–9. doi:10.7326/0003-4819-151-4-200908180-00135
17. Arroll B, Khin N, Kerse N. Screening for depression in primary care with two verbally asked questions: cross sectional study. BMJ 2003;327:1144–6. doi:10.1136/bmj.327.7424.1144
18. Whiting P, Rutjes A, Westwood M, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med 2011;155:529–36. doi:10.7326/0003-4819-155-8-201110180-00009
19. Mann R, Hewitt C, Gilbody S. Assessing the quality of diagnostic studies using psychometric instruments: applying QUADAS. Soc Psychiatry Psychiatr Epidemiol 2009;44:300–7. doi:10.1007/s00127-008-0440-z
20. Thombs B, Arthurs E, El-Baalbaki G, et al. Risk of bias from inclusion of patients who already have diagnosis of or are undergoing treatment for depression in diagnostic accuracy studies of screening tools for depression: systematic review. BMJ 2011;343:d4825.
21. Reitsma J, Glas A, Rutjes AW, et al. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol 2005;58:982–90. doi:10.1016/j.jclinepi.2005.02.022
22. Higgins J, Thompson S, Deeks J, et al. Measuring inconsistency in meta-analyses. BMJ 2003;327:557–60. doi:10.1136/bmj.327.7414.557
23. Thompson S, Higgins J. How should meta-regression analyses be undertaken and interpreted? Stat Med 2002;21:1559–73. doi:10.1002/sim.1187
24. Lijmer J, Bossuyt P, Heisterkamp S, et al. Exploring sources of heterogeneity in systematic reviews of diagnostic tests. Stat Med 2002;21:1525–37. doi:10.1002/sim.1185
25. Chagas M, Crippa J, Loureiro S, et al. Validity of the PHQ-2 for the screening of major depression in Parkinson's disease: two questions and one important answer. Aging Ment Health 2011;15:838–43. doi:10.1080/13607863.2011.569482
26. Henkel V, Mergl R, Coyne J, et al. Screening for depression in primary care: will one or two items suffice? Eur Arch Psychiatry Clin Neurosci 2004;254:215–23. doi:10.1007/s00406-004-0476-3
27. Zuithoff N, Vergouwe Y, King M, et al. The Patient Health Questionnaire-9 for detection of major depressive disorder in primary care: consequences of current thresholds in a cross-sectional study. BMC Fam Pract 2010;11:98. doi:10.1186/1471-2296-11-98
28. Chochinov HK, Wilson KG, Enns M, et al. “Are you depressed?” Screening for depression in the terminally ill. Am J Psychiatry 1997;154:674–6. doi:10.1176/ajp.154.5.674
29. Burton C, Simpson C, Anderson N. Diagnosis and treatment of depression following routine screening in patients with coronary heart disease or diabetes: a database cohort study. Psychol Med 2013;43:529–37. doi:10.1017/S0033291712001481
30. Lombardo P, Vaucher P, Haftgoli N, et al. The ‘help’ question doesn't help when screening for major depression: external validation of the three-question screening test for primary care patients managed for physical complaints. BMC Med 2011;9:114. doi:10.1186/1741-7015-9-114
31. Shah M, Karuza J, Rueckmann E, et al. Reliability and validity of prehospital case finding for depression and cognitive impairment. J Am Geriatr Soc 2009;57:697–702. doi:10.1111/j.1532-5415.2009.02185.x
32. Biswas S, Gupta R, Vanjare H, et al. Depression in the elderly in Vellore, South India: the use of a two-question screen. Int Psychogeriatr 2009;21:369–71. doi:10.1017/S1041610208008259
33. Ryan D, Gallagher P, Wright S, et al. Sensitivity and specificity of the Distress Thermometer and a two-item depression screen (Patient Health Questionnaire-2) with a ‘help’ question for psychological distress and psychiatric morbidity in patients with advanced cancer. Psychooncology 2012;21:1275–84. doi:10.1002/pon.2042
34. Brody D, Hahn S, Spitzer R, et al. Identifying patients with depression in the primary care setting: a more efficient method. Arch Intern Med 1998;158:2469–75. doi:10.1001/archinte.158.22.2469
35. Suzuki T, Nobata R, Kim N, et al. Evaluation of Questionnaires (Two question case finding instrument & Beck Depression Inventory) as a tool for screening and intervention of depression in work place. Seishin Igaku (Clinical Psychiatry) 2003;45:699–708.
36. Mann R, Adamson J, Gilbody S. Diagnostic accuracy of case-finding questions to identify perinatal depression. CMAJ 2012;184:E424–30. doi:10.1503/cmaj.111213
37. Gjerdingen D, Crow S, McGovern P, et al. Postpartum depression screening at well-child visits: validity of a 2-question screen and the PHQ-9. Ann Fam Med 2009;7:63–70. doi:10.1370/afm.933
38. Adachi Y, Aleksic B, Nobata R, et al. Combination use of Beck Depression Inventory and two-question case-finding instrument as a screening tool for depression in the workplace. BMJ Open 2012;2:e000596. doi:10.1136/bmjopen-2011-000596
39. McManus D, Pipkin SS, Whooley MA. Screening for depression in patients with coronary heart disease (data from the Heart and Soul Study). Am J Cardiol 2005;96:1076–81. doi:10.1016/j.amjcard.2005.06.037
40. Robison J, Gruman C, Gaztambide S, et al. Screening for depression in middle-aged and older Puerto Rican primary care patients. J Gerontol A Biol Sci Med Sci 2002;57:M308–14. doi:10.1093/gerona/57.5.M308
41. Suija K, Rajala U, Jokelainen J, et al. Validation of the Whooley questions and the Beck Depression Inventory in older adults. Scand J Prim Health Care 2012;30:259–64. doi:10.3109/02813432.2012.732473
42. Mohd-Sidik S, Arroll B, Goodyear-Smith F, et al. Screening for depression with a brief questionnaire in a primary care setting: validation of the two questions with help question (Malay version). Int J Psychiatry Med 2011;41:143–54. doi:10.2190/PM.41.2.d
43. Gelaye B, Tadesse M, Williams M, et al. Assessing validity of a depression screening instrument in the absence of a gold standard. Ann Epidemiol 2014;24:527–31. doi:10.1016/j.annepidem.2014.04.009
44. Kroenke K, Spitzer R, Williams J. The Patient Health Questionnaire-2: validity of a two-item depression screener. Med Care 2003;41:1284–92. doi:10.1097/01.MLR.0000093487.78664.3C

Supplementary materials

Supplementary Data

This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Data supplement 1 - Online appendix 1
Data supplement 2 - Online appendix 2

Footnotes

Twitter Follow Simon Gilbody at @SimonGilbody
Contributors KB led on all stages of the review from development of the protocol, through screening studies, to data extraction and assessing the quality of the included studies, to production of the final report. DB was involved in all stages of the review from development of the protocol, through screening studies and data extraction to synthesis and production of the final report. SG provided expert advice on methodology and approaches to assessment of the evidence base. MH devised the search strategy, carried out the literature searches and wrote the search methodology section of the report. LM reviewed the included studies and assessed their quality, performed the statistical analysis and wrote the results section of the final report. SN was involved in the development of the protocol, screening studies for inclusion and data extraction. DM supervised the quality assessment, methodology and approaches to evidence synthesis, provided senior advice and support throughout the review, contributed to the production of the final report and is guarantor. All parties were involved in drafting and/or commenting on the report.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No additional data are available.
