Introduction
The Depression subscale of the Hospital Anxiety and Depression Scale (HADS-D) has been recommended for depression screening in medically ill patients. Many existing HADS-D studies have used exploratory methods to select optimal cut-offs. Often, these studies report results from a small range of cut-off thresholds; cut-offs with more favourable accuracy results are more likely to be reported than others with worse accuracy estimates. When published data are combined in meta-analyses, selective reporting may generate biased summary estimates. Individual patient data (IPD) meta-analyses can address this problem by estimating accuracy with data from all studies for all relevant cut-off scores. In addition, a predictive algorithm can be generated to estimate the probability that a patient has depression based on a HADS-D score and clinical characteristics rather than dichotomous screening classification alone. The primary objectives of our IPD meta-analyses are to determine the diagnostic accuracy of the HADS-D to detect major depression among adults across all potentially relevant cut-off scores and to generate a predictive algorithm for individual patients. We are already aware of over 100 eligible studies, and more may be identified with our comprehensive search.
Methods and analysis
Data sources will include MEDLINE, MEDLINE In-Process & Other Non-Indexed Citations, PsycINFO and Web of Science. Eligible studies will have datasets in which patients are assessed for major depression with a validated structured or semistructured clinical interview and complete the HADS-D within 2 weeks (before or after) of that interview. Risk of bias will be assessed with the Quality Assessment of Diagnostic Accuracy Studies-2 tool. Bivariate random-effects meta-analysis will be conducted for the full range of plausible cut-off values, and a predictive algorithm for individual patients will be generated.
Ethics and dissemination
The findings of this study will be of interest to stakeholders involved in research, clinical practice and policy.
- Diagnostic accuracy
- Individual Patient Data Meta-Analysis
- Major depression
- PRIMARY CARE
- Chronic illness
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
The study will use individual patient data to estimate diagnostic accuracy for all relevant cut-off scores of the Depression subscale of the Hospital Anxiety and Depression Scale (HADS-D). Using data from all patients at each cut-off score will overcome limitations related to selective cut-off reporting in primary study publications.
The study will conduct analyses that exclude patients with current diagnoses of depression or who are undergoing mental health treatment at the time of study enrolment, as these patients would not be screened in clinical practice. This will overcome potential bias in primary diagnostic test accuracy studies where these patients are often included.
The study will generate a predictive model to estimate the probability that a patient with a particular HADS-D score and relevant covariates typically available in clinical practice has depression. This will facilitate more informed clinical decision-making than can be done with standard diagnostic accuracy metrics.
A potential limitation is that the success of the study depends on obtaining the relevant individual patient data and on avoiding selective availability of studies with better or worse accuracy results. We do not yet know what proportion of eligible datasets we will be able to include.
Major depressive disorder (MDD) may be present in 10–20% of patients with acute and chronic medical conditions and is independently associated with poor prognosis.1–6 However, healthcare teams in non-psychiatric settings, where most depression care is provided, typically have little formal mental health training,7 and mental healthcare is often inconsistent. Many depressed patients are not diagnosed, and a high proportion of patients treated for depression do not meet diagnostic criteria.8–12 Routine screening for depression has been proposed as a solution,13 but this is controversial.14–18
A necessary, but not sufficient, condition for depression screening to benefit patients is a screening tool with demonstrated high diagnostic accuracy. To correctly identify optimal cut-off thresholds and to obtain unbiased diagnostic accuracy estimates, several major problems common to studies of depression screening tools must be addressed. These include (1) overreliance on primary studies with small sample sizes that selectively report results from only well-performing cut-off thresholds;19,20 (2) the inclusion in primary studies of patients already being treated for depression, even though these patients are not screened in actual practice, since screening is done to detect unrecognised cases;21,22 and (3) the lack of consideration of individual patient depression risk factors (eg, age, sex, inpatient vs outpatient care) in estimates of screening accuracy.23
The Depression subscale of the Hospital Anxiety and Depression Scale (HADS-D) is the most commonly used screening tool in medically ill patients.24–26 Most existing studies of the accuracy of the HADS-D have (1) been conducted in samples too small to precisely estimate accuracy, (2) included already-diagnosed and treated patients and (3) selectively published accuracy results from cut-offs that perform well, but not from other relevant cut-offs.25,27–29 For instance, in one meta-analysis of the diagnostic accuracy of the HADS-D,27 the authors excluded 16 of 41 (39%) primary studies that examined the diagnostic accuracy of the HADS because those studies reported results only from study-specific optimal cut-offs and not from the standard cut-offs that were pooled in the meta-analysis. Being unable to include studies that did not publish the more pessimistic results associated with standard HADS cut-offs likely inflated accuracy estimates in the meta-analysis relative to real-world performance at standard cut-offs. In other meta-analyses,25,28,29 authors have reported results from a single bivariate accuracy model that pooled results from different cut-offs in different primary studies, sometimes the best-performing sample-specific cut-off in each study.
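To illustrate why reporting only the best-performing, sample-specific cut-off inflates accuracy, the Python sketch below simulates many small hypothetical studies, picks the cut-off that maximises Youden's J (sensitivity + specificity - 1) in each one, and compares that reported J with the true population J at the same cut-off. The score distributions (binomial scores with the hypothetical parameters `P_DEP` and `P_NON`), the sample size and the prevalence are illustrative assumptions rather than HADS-D data, and Youden's J is only one possible optimality criterion.

```python
import math
import random
from typing import List, Tuple

MAX_SCORE = 21  # HADS-D total scores range from 0 to 21
P_DEP = 0.50    # hypothetical per-point probability for patients with MDD
P_NON = 0.20    # hypothetical per-point probability for patients without MDD


def tail_prob(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def true_youden(cutoff: int) -> float:
    """Population Youden's J when a score >= cutoff counts as a positive screen."""
    sens = tail_prob(cutoff, MAX_SCORE, P_DEP)
    spec = 1.0 - tail_prob(cutoff, MAX_SCORE, P_NON)
    return sens + spec - 1.0


def simulate_study(rng: random.Random, n: int, prevalence: float) -> List[Tuple[int, bool]]:
    """Draw n (score, has_mdd) pairs; binomial scores are a crude stand-in for HADS-D."""
    data = []
    for _ in range(n):
        has_mdd = rng.random() < prevalence
        p = P_DEP if has_mdd else P_NON
        score = sum(rng.random() < p for _ in range(MAX_SCORE))
        data.append((score, has_mdd))
    return data


def sample_youden(data: List[Tuple[int, bool]], cutoff: int) -> float:
    """In-sample Youden's J at a given cut-off."""
    cases = [s for s, d in data if d]
    noncases = [s for s, d in data if not d]
    sens = sum(s >= cutoff for s in cases) / len(cases)
    spec = sum(s < cutoff for s in noncases) / len(noncases)
    return sens + spec - 1.0


def mean_optimism(n_studies: int = 500, n: int = 100,
                  prevalence: float = 0.15, seed: int = 2016) -> float:
    """Average gap between the reported (in-sample optimal) J and the true J at that cut-off."""
    rng = random.Random(seed)
    gaps: List[float] = []
    while len(gaps) < n_studies:
        data = simulate_study(rng, n, prevalence)
        n_cases = sum(d for _, d in data)
        if n_cases == 0 or n_cases == n:
            continue  # accuracy is undefined without both cases and non-cases
        best = max(range(MAX_SCORE + 1), key=lambda c: sample_youden(data, c))
        gaps.append(sample_youden(data, best) - true_youden(best))
    return sum(gaps) / len(gaps)
```

A positive mean gap means the reported accuracy overstates the true accuracy at the chosen cut-offs; under these assumptions the gap should shrink as the per-study sample size grows.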
Individual patient data (IPD) meta-analysis can potentially address some of these problems. IPD meta-analysis involves using actual patient data obtained from researchers who conducted primary studies, rather than summary results from published or unpublished study reports.30 The steps involved in conducting a systematic review with an IPD meta-analysis, in terms of defining a research question, establishing study inclusion and exclusion criteria, identifying and screening studies and analysing data, are similar to those in a traditional systematic review and meta-analysis and diverge only in analysing individual-level data rather than summary data.31 In the context of evaluating the diagnostic accuracy of depression screening tools, IPD meta-analysis has a number of potential advantages compared with a conventional meta-analysis. First, it can address bias from the selective reporting in publications of only well-performing cut-off thresholds since accuracy can be evaluated across all relevant cut-off scores. Second, it allows the systematic exclusion of already-treated patients, for whom the tool would not be used to screen for unidentified depression. Third, IPD meta-analysis with large numbers of patients and large numbers of depression cases allows the incorporation of study variables (eg, study setting, risk of bias factors) and individual factors that may influence screening accuracy (eg, age, sex, inpatient vs outpatient). In addition, a large IPD database allows the development of a predictive algorithm to generate estimates of the probability of having depression based on patient characteristics and actual HADS-D scores, rather than classifying patients as simply negative or positive based on screening results. 
This is an important consideration because a patient with a score of 0 on the HADS-D, for example, would almost certainly have a lower likelihood of having depression than a patient with a substantially higher, but subthreshold, score of 7, although typically both would be classified as negative screens.
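With IPD, accuracy can be computed at every cut-off rather than only at those a publication chose to report. The following is a minimal sketch, assuming each patient contributes a HADS-D total score (0-21) and a binary MDD status, and that a score at or above the cut-off counts as a positive screen; the function name `accuracy_by_cutoff` is hypothetical.

```python
from typing import Dict, Sequence, Tuple

HADS_D_MAX = 21  # 7 items scored 0-3


def accuracy_by_cutoff(scores: Sequence[int],
                       mdd: Sequence[bool]) -> Dict[int, Tuple[float, float]]:
    """Return {cut-off: (sensitivity, specificity)} for every possible cut-off."""
    if len(scores) != len(mdd):
        raise ValueError("scores and mdd must have the same length")
    n_cases = sum(1 for d in mdd if d)
    n_noncases = len(mdd) - n_cases
    if n_cases == 0 or n_noncases == 0:
        raise ValueError("need at least one case and one non-case")
    results = {}
    for cutoff in range(0, HADS_D_MAX + 1):
        # True positives: patients with MDD who screen positive at this cut-off
        tp = sum(1 for s, d in zip(scores, mdd) if d and s >= cutoff)
        # True negatives: patients without MDD who screen negative
        tn = sum(1 for s, d in zip(scores, mdd) if not d and s < cutoff)
        results[cutoff] = (tp / n_cases, tn / n_noncases)
    return results
```

For example, `accuracy_by_cutoff([0, 3, 7, 9, 12, 15], [False, False, False, True, True, True])` returns sensitivity and specificity for all 22 cut-offs from a single pass over the same patients, so no cut-off can be silently omitted.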
A potential downside of IPD meta-analyses is that they are resource intensive. Furthermore, they can be biased if the primary datasets obtained are not representative.30–33 Our team is currently conducting an IPD meta-analysis of the Patient Health Questionnaire depression screening tool, the first IPD meta-analysis of the diagnostic accuracy of a depression screening tool.34 In that study, which was still in progress as of 14 March 2016, we had verified 76 eligible primary datasets and obtained usable data for 60 (79%) of them. This suggests that investigators are generally able and willing to provide primary data from studies of the diagnostic accuracy of depression screening tools for use in IPD meta-analyses. On the basis of preliminary searches, we are aware of at least 100 eligible studies on the HADS-D.
Thus, the objectives of this IPD meta-analysis are to determine the diagnostic accuracy of the HADS-D to detect major depression among patients in medical settings and to develop an algorithm to predict the probability that individual patients have MDD based on HADS-D scores and patient characteristics.
Methods and analysis
This systematic review has been funded by the Canadian Institutes of Health Research (Funding Reference Number KRS-144045). The protocol has been registered in the PROSPERO prospective register of systematic reviews (CRD42015016761), and any changes to the study protocol will be registered as amendments with PROSPERO.
The IPD meta-analysis has been designed and will be conducted in accordance with best-practice standards as elaborated in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy35 and other key sources.30,31,36 Results will be reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses of individual patient data (PRISMA-IPD) statement.37 To conduct the meta-analysis, we will seek primary datasets that allow us to compare HADS-D scores with MDD or major depressive episode (MDE) diagnostic status. Most primary studies use MDD as the reference standard, but some may use MDE, which has the same depressive symptom criteria but, unlike MDD, does not exclude patients with psychotic disorders or a history of manic episodes. If both are available, we will use MDD.
Sources of evidence
Our search strategy, which is based on strategies used in previous systematic reviews,34,38 was developed by a medical librarian and peer reviewed by another medical librarian. MEDLINE, MEDLINE In-Process & Other Non-Indexed Citations, PsycINFO (OvidSP platform) and Web of Science (Web of Knowledge platform) will be searched. The MEDLINE search strategy was validated by testing against already-identified publications from preliminary searches. The strategy was then adapted for PsycINFO and Web of Science. We limited our search strategy to these databases based on research showing that adding other databases (eg, EMBASE) when the MEDLINE search is highly sensitive does not identify additional eligible studies.39 The Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy35 suggests combining concepts of the index test and the target conditions, but this was redundant for depression screening tools as these tests are limited to testing for depression. Thus, the search strategy for electronic databases was composed of two concepts: the index test of interest and studies of screening accuracy. There are no published search hedges designed specifically for mental health screening, but several key articles were consulted in developing search terms to ensure retrieval of relevant publications.40–42 Search strategies use a combination of subject headings, when available in the database, as well as keywords in the title, abstract or anywhere else in the record. See online supplementary file 1 for detailed information on searches. To supplement electronic searches, reference lists of all included publications and relevant reviews will be scanned. In addition, a related articles search will be conducted for included papers indexed in MEDLINE using the PubMed ‘related articles’ search feature. We will also contact researchers who have published on the topic to obtain information about additional, unpublished studies.
Search results will be initially uploaded into the citation management database RefWorks (RefWorks, RefWorks-COS, Bethesda, Maryland, USA), and the RefWorks duplicate check function will be used to identify citations retrieved from multiple sources. Unique citations will then be uploaded into the systematic review program DistillerSR (Evidence Partners, Ottawa, Canada), and DistillerSR will be used to store and track search results and to track results of the review process.
To identify relevant datasets, we will review articles in any language. Datasets will be sought for inclusion if patients completed the HADS-D and were assessed for MDD or MDE using Diagnostic and Statistical Manual of Mental Disorders (DSM-III, DSM-IV or DSM-5) or International Classification of Diseases, 10th revision (ICD-10) criteria within 2 weeks (before or after) of completing the HADS-D, since major depression criteria refer to symptoms over the preceding 2 weeks. Diagnostic assessments must be based on a validated structured or semistructured interview (eg, Structured Clinical Interview for DSM,43 Composite International Diagnostic Interview44). Datasets in which some, but not all, patients completed the HADS-D within 2 weeks of the diagnostic interview will be included if the original data allow us to select the patients who did. Data from studies where all patients are known to have psychiatric diagnoses, have been referred for mental health evaluation or are undergoing treatment for depression will be excluded, with the exception of patients treated for substance use disorders, for whom depression screening may be considered. The coding manual for inclusion and exclusion decisions is shown in online supplementary file 2.
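The 2-week timing rule amounts to a simple patient-level filter. A minimal sketch, assuming each record holds the HADS-D completion date and the diagnostic interview date, and reading "within 2 weeks" as at most 14 days apart in either direction; the `Assessment` record and `eligible_patients` function are hypothetical.

```python
from datetime import date
from typing import List, NamedTuple

WINDOW_DAYS = 14  # assumption: "within 2 weeks (before or after)" = at most 14 days apart


class Assessment(NamedTuple):
    patient_id: str
    hads_date: date       # date the HADS-D was completed
    interview_date: date  # date of the structured or semistructured diagnostic interview


def eligible_patients(records: List[Assessment],
                      window_days: int = WINDOW_DAYS) -> List[Assessment]:
    """Keep patients whose HADS-D and interview fall within the window, in either order."""
    return [r for r in records
            if abs((r.interview_date - r.hads_date).days) <= window_days]
```

Taking the absolute difference handles interviews conducted either before or after the HADS-D, which is what allows partially eligible datasets to contribute only their in-window patients.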
Two investigators will independently review titles and abstracts for eligibility. If either reviewer judges that a study may be eligible based on title or abstract review, the full-text article will be reviewed. Disagreements between reviewers after full-text review will be resolved by consensus, involving a third investigator as necessary. Translators will be consulted to evaluate titles, abstracts and articles in languages in which no team member is fluent. See online supplementary file 3 for a preliminary PRISMA flow diagram.
Transfer of data and dataset management
Authors of studies containing datasets that meet inclusion criteria will be contacted to invite them to contribute primary data for inclusion. Data will only be used from studies that received ethics approval, and all data will be properly deidentified prior to transfer. All IPD that are obtained will be cleaned and coded to make patient data as uniform as possible across datasets, then entered into a single database. A preliminary codebook has been developed for coding data from original studies of the HADS-D. Actual data coding and transfer from original studies into the IPD database will be done by a supervised staff member or trainee on the team. Patient characteristics and screening accuracy results for each study, computed from the cleaned datasets, will be compared with those from the original datasets to identify any discrepancies.
In addition to obtaining original patient-level data, data will also be extracted from the published articles of included studies. We will crosscheck the published data with the original patient-level data obtained from each dataset, and any inconsistencies will be discussed with the original authors. Corrections will be made as necessary.
The Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool45 will be used to assess risk of bias factors in primary studies. QUADAS-2 incorporates assessments of risk of bias across four core domains: patient selection, the index test, the reference standard, and the flow and timing of assessments. Two reviewers will independently assess risk of bias with any discrepancies resolved by consensus.
Analyses will estimate sensitivity and specificity, from which positive and negative predictive values, which are more clinically useful, will be derived for a range of plausible depression prevalences. A bivariate random-effects meta-analysis, estimated via adaptive Gauss-Hermite quadrature as described in Riley et al,46 will be fit for the full range of plausible cut-off values. This approach models sensitivity and specificity simultaneously and accounts for variation in within-study precision.46 Data from all primary studies will be analysed together in a random-effects model, which allows sensitivity and specificity to vary across studies. For each cut-off separately, this model will provide an overall summary sensitivity, specificity and diagnostic OR. We will compare results that include only datasets that allow the exclusion of patients diagnosed with depression or receiving depression treatment (including antidepressants with reason unspecified) with results that also include studies where these data are not available. Additionally, a subgroup analysis will be conducted that includes only data from countries listed as ‘very high development’ on the United Nations’ Human Development Index.47
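For reference, the bivariate random-effects model for a single cut-off is commonly written as follows (notation ours, assuming an exact binomial within-study likelihood). In study i, y_Ai is the number of true positives among the n_Ai patients with MDD and y_Bi the number of true negatives among the n_Bi patients without MDD; tau_A^2 and tau_B^2 are the between-study variances of logit sensitivity and logit specificity, and rho is their between-study correlation.

```latex
\begin{aligned}
y_{Ai} &\sim \mathrm{Binomial}(n_{Ai}, \mathrm{Se}_i), \qquad
y_{Bi} \sim \mathrm{Binomial}(n_{Bi}, \mathrm{Sp}_i) \\
\operatorname{logit}(\mathrm{Se}_i) &= \mu_A + u_{Ai}, \qquad
\operatorname{logit}(\mathrm{Sp}_i) = \mu_B + u_{Bi} \\
\begin{pmatrix} u_{Ai} \\ u_{Bi} \end{pmatrix} &\sim
N\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \tau_A^2 & \rho\,\tau_A\tau_B \\ \rho\,\tau_A\tau_B & \tau_B^2 \end{pmatrix}\right) \\
\widehat{\mathrm{Se}} &= \operatorname{expit}(\hat{\mu}_A), \qquad
\widehat{\mathrm{Sp}} = \operatorname{expit}(\hat{\mu}_B), \qquad
\widehat{\mathrm{DOR}} = \exp(\hat{\mu}_A + \hat{\mu}_B)
\end{aligned}
```

The summary diagnostic OR follows directly because DOR = [Se/(1 - Se)] / [(1 - Sp)/Sp], so log DOR = logit(Se) + logit(Sp).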
Heterogeneity will be quantified for each cut-off analysis by reporting the estimated variances of the random effects for sensitivity and specificity and by estimating R, the ratio of the estimated SD of the summary sensitivity (or specificity) from the random-effects model to the estimated SD of the summary sensitivity (or specificity) from the fixed-effects model.48 We will explore underlying reasons for heterogeneity using patient-level and study-level factors. In diagnostic accuracy analyses, this can be done by including these factors, or their interaction terms, in the random-effects model described above.46 These analyses take advantage of the richness of IPD. When analysed at the patient level, accounting for the correlation between patients from the same study and, via the bivariate random-effects model, for the correlation between sensitivity and specificity, they have more power to detect interactions than traditional meta-analyses and are not vulnerable to ecological bias.49–53 At the patient level, covariates will include age (<60 years vs ≥60 years), sex and inpatient versus outpatient status. Study-level covariates, including risk of bias factors described in QUADAS-2,45 will also be evaluated. QUADAS-2 factors include patient selection factors, blinding of the reference standard to index test results, type of reference standard (eg, semistructured diagnostic interview, structured diagnostic interview) and timing of administration of the index test and reference standard (eg, same day, delay of 1–7 days, delay of >7 days). Significance levels for the prespecified interaction analyses will not be adjusted for multiple comparisons.
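The R statistic can be illustrated for logit sensitivity alone. The sketch below is a simplification: it swaps in a univariate DerSimonian-Laird estimate of between-study variance for the bivariate model specified above and uses a 0.5 continuity correction; the function names are hypothetical. R equals 1 when no between-study variance is estimated and increases with heterogeneity.

```python
import math
from typing import Sequence, Tuple


def logit_and_var(events: int, total: int) -> Tuple[float, float]:
    """Logit proportion and its approximate variance, with a 0.5 continuity correction."""
    a, b = events + 0.5, total - events + 0.5
    return math.log(a / b), 1.0 / a + 1.0 / b


def heterogeneity_r(studies: Sequence[Tuple[int, int]]) -> float:
    """R = SE(random-effects summary) / SE(fixed-effect summary) for logit sensitivity.

    studies: (true positives, number of patients with MDD) for each study.
    Between-study variance uses the univariate DerSimonian-Laird estimator (a stand-in).
    """
    if len(studies) < 2:
        raise ValueError("need at least two studies")
    ys, vs = zip(*(logit_and_var(tp, n) for tp, n in studies))
    w = [1.0 / v for v in vs]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (len(studies) - 1)) / c)
    se_fe = math.sqrt(1.0 / sw)
    se_re = math.sqrt(1.0 / sum(1.0 / (v + tau2) for v in vs))
    return se_re / se_fe
```

For example, three identical studies give R = 1, whereas studies with 5/50, 25/50 and 45/50 true positives give R well above 1.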
In addition to estimating sensitivity and specificity for each relevant cut-off, we will build a predictive model that uses the screening questionnaire score and any other key factors that account for substantial heterogeneity to estimate the probability that a patient has major depression. The model will be evaluated in terms of its calibration (eg, slope of the linear predictor; are average, low and high predictions correct?) and discrimination (eg, c-statistic; how well are low-risk patients distinguished from high-risk patients?).54 Validating a model in the same patients used to develop it gives overly optimistic estimates of performance. Internal validation will therefore be assessed via the bootstrap method, which is preferable to split-sample approaches (eg, developing the model in half the sample and evaluating it in the other half).55 Although external validation has advantages, given the wide range of study samples that will be used, it is unlikely that another comparable dataset large enough for validation will exist. Thus, bootstrap internal validation will help develop an understanding of how the model is likely to perform in a clinical setting. Furthermore, by using regression coefficients adjusted for optimism (ie, shrinkage estimates), we will improve the model's expected accuracy in new patients. On the basis of our pilot work, we anticipate that missing data will be minimal for the variables of primary interest. Regardless, multiple imputation by chained equations54,56 will be used to impute binary and continuous variables, with study included as a fixed effect in the imputation model.57 This will allow imputation of variables missing for entire studies as well as those missing more sporadically.
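Bootstrap internal validation can be sketched as Harrell-style optimism correction: refit the model in each bootstrap resample, measure how much better it performs in that resample than in the original data, and subtract the average difference from the apparent performance. To keep the sketch self-contained, the model is a deliberately simple, hypothetical stand-in (the observed proportion with MDD at each HADS-D score), not the planned predictive model, and only the c-statistic is corrected.

```python
import random
from typing import Callable, List, Sequence, Tuple


def c_statistic(preds: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random case is ranked above a random non-case (ties count 1/2)."""
    cases = [p for p, y in zip(preds, labels) if y == 1]
    noncases = [p for p, y in zip(preds, labels) if y == 0]
    if not cases or not noncases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for pc in cases:
        for pn in noncases:
            if pc > pn:
                concordant += 1.0
            elif pc == pn:
                concordant += 0.5
    return concordant / (len(cases) * len(noncases))


def fit_score_table(scores: Sequence[int], labels: Sequence[int]) -> Callable[[int], float]:
    """Hypothetical stand-in model: observed proportion with MDD at each score.

    Scores not seen during fitting are given the overall proportion.
    """
    totals: dict = {}
    cases: dict = {}
    for s, y in zip(scores, labels):
        totals[s] = totals.get(s, 0) + 1
        cases[s] = cases.get(s, 0) + y
    overall = sum(labels) / len(labels)
    return lambda s: cases[s] / totals[s] if s in totals else overall


def optimism_corrected_c(scores: Sequence[int], labels: Sequence[int],
                         n_boot: int = 200, seed: int = 1) -> Tuple[float, float]:
    """Return (apparent c, optimism-corrected c) using bootstrap resampling."""
    rng = random.Random(seed)
    n = len(scores)
    model = fit_score_table(scores, labels)
    apparent = c_statistic([model(s) for s in scores], labels)
    optimisms: List[float] = []
    while len(optimisms) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        b_scores = [scores[i] for i in idx]
        b_labels = [labels[i] for i in idx]
        if len(set(b_labels)) < 2:
            continue  # resample lacks cases or non-cases; draw again
        b_model = fit_score_table(b_scores, b_labels)
        c_boot = c_statistic([b_model(s) for s in b_scores], b_labels)  # apparent c in resample
        c_orig = c_statistic([b_model(s) for s in scores], labels)      # same model, original data
        optimisms.append(c_boot - c_orig)
    return apparent, apparent - sum(optimisms) / len(optimisms)
```

On real data with sparse score groups, the corrected c-statistic would typically fall below the apparent one; that gap is the optimism the bootstrap is intended to remove. A regression model would slot in by replacing `fit_score_table`.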
Sensitivity and specificity in studies included in the IPD meta-analysis will be compared with those in eligible studies that do not provide IPD, using published summary data from the latter. Depending on the number of such studies, a sensitivity analysis may also be conducted that combines aggregate summary estimates of sensitivity and specificity from studies that do not provide IPD with data from studies that contribute to the IPD meta-analysis.46 If a large number of studies do not contribute primary data, this analysis may become the primary analysis.
Ethics and dissemination
This IPD meta-analysis does not require ethics approval, although only individual studies that obtained ethical clearance and informed consent will be included. Ethics review is not required because the objectives of the IPD meta-analysis are consistent with those of the primary studies, which already received ethics approval, and only anonymised data will be provided by the investigators of the original studies.
The main outcomes of the IPD meta-analysis reflect knowledge that will influence future research, clinical practice and policy. Strategies for effective dissemination and specific outputs will be based on research showing how best to tailor research outputs to different user groups,58–63 including research on improving the usefulness of reports of systematic reviews and meta-analyses for healthcare managers and policymakers.61,63 Dissemination will include publication of results in high-impact medical journals with open access, as well as presentations in seminars and symposia to policymakers, healthcare providers and researchers at national and international conferences.
If the predictive model performs well, we will create a freely available online calculator to estimate the probability that a given patient has major depression based on depression screening results and key patient characteristics. An example of a tool that is based on robust research evidence and effectively disseminated is the Fracture Risk Assessment Tool (FRAX) (http://www.shef.ac.uk/FRAX/index.aspx). Our tool will similarly be presented in an easy-to-use format, with tablet and app versions. In addition, simpler nomogram-based presentations, which are user-friendly graphical depictions of positive and negative predictive value by prevalence, will be generated and made available.
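Presenting predictive values by prevalence rests on Bayes' theorem: for fixed sensitivity and specificity, PPV and NPV change with the prevalence of depression. A minimal sketch follows; the function names are hypothetical and the accuracy values in the example are illustrative, not study results.

```python
from typing import List, Tuple


def predictive_values(sens: float, spec: float, prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity and prevalence via Bayes' theorem."""
    tp = sens * prevalence                  # expected true-positive fraction
    fp = (1.0 - spec) * (1.0 - prevalence)  # expected false-positive fraction
    tn = spec * (1.0 - prevalence)          # expected true-negative fraction
    fn = (1.0 - sens) * prevalence          # expected false-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


def pv_table(sens: float, spec: float,
             prevalences: List[float]) -> List[Tuple[float, float, float]]:
    """Rows of (prevalence, PPV, NPV), the quantities a prevalence nomogram would display."""
    return [(p, *predictive_values(sens, spec, p)) for p in prevalences]


if __name__ == "__main__":
    # Hypothetical accuracy values for illustration only
    for prev, ppv, npv in pv_table(0.80, 0.90, [0.05, 0.10, 0.15, 0.20]):
        print(f"prevalence {prev:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

The same pair of summary accuracy estimates thus yields a different PPV in each setting, which is why a single dichotomous "positive screen" is less informative than a prevalence-specific display.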
Contributors BDT, AB, LAK, BL, MA, KER, NS, PC, SG, JPAI, DM, SBP, IS, RJS, RCZ, MH, ZI, CGL, NM and MT contributed to the conception and design of the systematic review and meta-analysis. BDT, AB, LAK, BL, MA, KER and NS will be involved in the acquisition of data. BDT, AB and BL will analyse the data. BDT, AB, LAK, BL, MA, KER, NS, PC, SG, JPAI, DM, SBP, IS, RJS, RCZ, MH, ZI, CGL, NM and MT will interpret the results. BDT and AB drafted this protocol. All authors provided critical revisions of the protocol and approved submission of the final manuscript. BDT is the guarantor.
Funding This research is supported by a grant from the Canadian Institutes of Health Research (CIHR; Funding Reference Number KRS-144045; PI Thombs). BDT receives support from an Investigator Award from the Arthritis Society. AB is supported by the Fonds de recherche du Québec—Santé (FRQS) researcher salary award. BL is supported by the Fonds de recherche du Québec—Santé (FRQS) doctoral award. MA is supported by a CIHR Frederick Banting and Charles Best Canadian Graduate Scholarships—Master's Award. SBP is a Senior Health Scholar with Alberta Innovates—Health Solutions. ZI receives funding from the Alzheimer Society Calgary via the Hotchkiss Brain Institute. CGL's work is supported by the Christine and Herschel Victor/Hope and Cope Research Chair in Psychosocial Oncology, McGill University. No funding body had any input into any aspect of this protocol.
Competing interests SBP received a research grant from a competition cosponsored by the Hotchkiss Brain Institute and Pfizer Canada. All other authors declare that they have no competing interests.
Provenance and peer review Not commissioned; peer reviewed for ethical and funding approval prior to submission.