Research report
A comparison of the PRIME-MD PHQ-9 and PHQ-8 in a large military prospective study, the Millennium Cohort Study

https://doi.org/10.1016/j.jad.2012.11.052

Abstract

Background

In light of increased concerns about suicide in the military, institutional review boards have mandated increased scrutiny of the final item on the depression screening tool, the PHQ-9, which asks about suicidal thoughts. Since real-time monitoring of all individual responses in most observational studies is not feasible, many investigators have adopted the PHQ-8, choosing to remove the ninth item. This study compares the performance of the PHQ-8 with the PHQ-9 in a population-based sample of military or nonmilitary subjects.

Methods

The Millennium Cohort Study administers a self-reported questionnaire that includes the PHQ-9 at 3-year intervals to current and former U.S. military personnel. PHQ-9 responses of 143,705 Millennium Cohort members were investigated. Cross-sectional comparisons of the PHQ-9 and PHQ-8 and prospective analyses to detect a 5-unit change in these measures were performed.

Results

Greater than substantial agreement was found between the PHQ-8 and PHQ-9 instruments (κ=0.966–0.974, depending on survey cycle). There was similarly high agreement between the PHQ-8 and PHQ-9 in detecting a 5-point increase (κ=0.987) or decrease (κ=0.984) in score.
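The agreement statistic reported here is Cohen's kappa, which corrects the observed proportion of agreement for the agreement expected by chance. A minimal sketch of the calculation follows. The function name and the ten screening results are hypothetical illustrations, not study data or the authors' code.

```python
def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    labels = set(a) | set(b)
    # Observed agreement: share of respondents classified identically.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the marginal proportions, summed over labels.
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical screening results (True = positive screen); not study data.
phq9_positive = [True, True, True, False, False, False, False, False, False, False]
phq8_positive = [True, True, False, False, False, False, False, False, False, False]
kappa = cohen_kappa(phq9_positive, phq8_positive)
print(f"kappa = {kappa:.3f}")
```

In this toy example one of ten respondents is classified differently by the two instruments, so kappa falls well below the values reported above; the study's kappas near 0.97 reflect far rarer disagreement.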

Limitations

One potential limitation of this study is that participants completed the PHQ-9, and PHQ-8 scores were extrapolated from the PHQ-9. In addition, the Millennium Cohort may not fully represent the U.S. military, although previous evaluations have shown the cohort to be a representative sample.
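To illustrate what extrapolating the PHQ-8 from PHQ-9 responses can look like, the sketch below drops the ninth item and compares the two totals. It assumes simple sum scoring (each item scored 0 to 3) with a positive screen at a total of 10 or more, a commonly used convention that may differ from the scoring approach used in this study. The responses, names, and threshold constant are hypothetical.

```python
def phq_scores(items):
    """Return (PHQ-9 total, PHQ-8 total) from nine item responses scored 0-3."""
    if len(items) != 9 or any(v not in (0, 1, 2, 3) for v in items):
        raise ValueError("expected nine responses, each scored 0-3")
    # The PHQ-8 is the PHQ-9 without the ninth (final) item.
    return sum(items), sum(items[:8])


CUTOFF = 10  # assumed screening threshold, not taken from the study
CHANGE = 5   # size of the score change examined in the prospective analyses

# Hypothetical responses for one participant at two survey cycles.
baseline = [2, 2, 1, 1, 1, 1, 1, 0, 1]
followup = [1, 1, 0, 1, 1, 0, 0, 0, 0]

b9, b8 = phq_scores(baseline)
f9, f8 = phq_scores(followup)

print("baseline positive:", b9 >= CUTOFF, b8 >= CUTOFF)           # PHQ-9, PHQ-8
print("5-point decrease:", b9 - f9 >= CHANGE, b8 - f8 >= CHANGE)  # PHQ-9, PHQ-8
```

The hypothetical baseline responses show the only way the two screens can disagree under these assumptions: a participant whose endorsement of the ninth item lifts the PHQ-9 total to the threshold while the PHQ-8 total stays just below it.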

Conclusions

Since excellent agreement was detected between the PHQ-8 and PHQ-9 instruments, the PHQ-8 would capture nearly all the same cases of depression as the PHQ-9 in populations similar to the one in this study.

Introduction

The Primary Care Evaluation of Mental Disorders Patient Health Questionnaire (PHQ) is a standardized instrument that provides an assessment of mental health status based on scores of several health concepts (Spitzer et al., 1999, Spitzer et al., 2000, Spitzer et al., 1994). A 9-item scale from this instrument (PHQ-9), used to screen for major depressive disorder (Spitzer et al., 1999), has been shown to have high sensitivity (0.93) and specificity (0.89) (Fann et al., 2005) and correlates well with a diagnosis of depression, as outlined in the Diagnostic and Statistical Manual of Mental Disorders (2000). The ninth and final item on the scale asks about thoughts of being better off dead or hurting oneself, which is known to be related to suicidal thoughts and ideation (Corson et al., 2004). Exclusion of the final item results in the PHQ-8, which has also been shown to be a valid instrument for evaluating depression symptoms in specific populations (Kroenke et al., 2009).

The Millennium Cohort Study (Ryan et al., 2007, Smith, 2009), the largest population-based longitudinal cohort study in military history, has included the PHQ-9 as part of the standard questionnaire since the study launched in 2001. Symptoms of depression have previously been examined in this population using the PHQ-9 (Ryan et al., 2007, Wells et al., 2010), with positive screens for new-onset depression occurring in approximately 4% of men, and 8% of women at the time of the first follow-up questionnaire (Wells et al., 2010). However, in light of increased concerns about suicide in the military (Kuehn, 2009, Oquendo et al., 2005), institutional review boards have mandated increased scrutiny of this item with the intent of initiating provider referral when respondents positively endorse it. In nonclinical research settings, many investigators have adopted the PHQ-8, choosing to remove the final item that may indicate suicidal thoughts. Since real-time monitoring of more than 150,000 responses from Millennium Cohort members is not feasible, starting in 2011 the final item was removed from the questionnaire and the study currently uses the PHQ-8 to screen for depression. Although the PHQ-8 has been shown to have similar operating characteristics as the PHQ-9 (Kroenke and Spitzer, 2002), this was done using a sample of 6000 individuals seeking treatment in primary care or obstetrics-gynecology clinics (Kroenke et al., 2001). To our knowledge, no study has validated the PHQ-8 in a large, population-based cohort, military or otherwise. The objective of this study was to compare the PHQ-8 with the PHQ-9 to understand differences in depression screening capability between these two instruments in the Millennium Cohort Study, which may be generalizable to other similar population-based studies. Unique features of this study include investigating differences in a large military population that is probably healthier than a general population sample. 
Also, the large sample size allows subgroup differences, such as those by sex, to be explored when assessing the performance of the PHQ-8 compared with the PHQ-9.

Study population

The Millennium Cohort Study began collection of self-reported health outcome and exposure data in 2001, prior to the start of the operations in Iraq and Afghanistan. The Millennium Cohort currently includes over 150,000 U.S. service members who enrolled during three separate cycles (panels) between 2001 and 2008. With the goal of evaluating long-term health outcomes related to military service, participants are surveyed every 3 years throughout a 21-year planned follow-up period. Detailed

Results

Of the 151,569 participants who completed at least one Millennium Cohort questionnaire, 7864 were excluded from this study due to missing data. Thus, the study population consisted of 143,705 participants (94.8%). At baseline, 6531 (4.5%) participants screened positive for depression using the PHQ-9. Of those, nearly all screened positive for depression using the PHQ-8 (n=6163, 94.4%) (Table 1).
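The counts and percentages in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the variable names are illustrative.

```python
completed = 151_569        # completed at least one questionnaire
excluded = 7_864           # excluded for missing data
phq9_positive = 6_531      # screened positive on the PHQ-9 at baseline
both_positive = 6_163      # of those, also positive on the PHQ-8

study_n = completed - excluded
pct_included = 100 * study_n / completed
pct_phq9_positive = 100 * phq9_positive / study_n
pct_concordant = 100 * both_positive / phq9_positive
phq9_only = phq9_positive - both_positive  # PHQ-9 positive, PHQ-8 negative

print(study_n, round(pct_included, 1), round(pct_phq9_positive, 1),
      round(pct_concordant, 1), phq9_only)
```

The derived values match the reported 143,705 participants, 94.8%, 4.5%, and 94.4%, and imply that 368 baseline PHQ-9 positive screens were not positive on the PHQ-8.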

Participants in the concordant positive and the PHQ-9 positive/PHQ-8 negative groups had more similar

Discussion

Clearly, the PHQ-9 is the preferred instrument for use in a clinical setting. The objective of this research was to determine how screening results may differ for the PHQ-8 compared with the PHQ-9 in a large population-based military study that uses self-reported survey data to identify participants who may have depression symptoms. To our knowledge, this is the first study to assess operational differences of the PHQ-8 compared with the PHQ-9 in a large, homogeneous, population-based sample of

Disclaimer

This represents report 12–20, supported by the Department of Defense, under work unit no. 60002. The views expressed in this article are those of the authors and do not reflect the official policy or position of the Department of the Navy, Department of the Army, Department of the Air Force, Department of Defense, or the U.S. Government. This research has been conducted in compliance with all applicable federal regulations governing the protection of human subjects in research (Protocol

Role of funding source

The Millennium Cohort Study is funded through the Military Operational Medicine Research Program of the U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland. Resources from the VA Puget Sound Health Care System supported Dr. Boyko’s involvement in this research. The funding organizations had no role in the design and conduct of the study; collection, analysis, or preparation of data; or preparation, review or approval of the manuscript.

Conflict of interest

None reported.

Acknowledgements

In addition to the authors, the Millennium Cohort Study Team includes Melissa Bagnell, MPH; Nancy Crum-Cianflone, MD, MPH; James Davies; Nisara Granado, MPH, PhD; Dennis Hernando; Kelly Jones, MPH; Lauren Kipp, MPH; Michelle Linfesty; Gordon Lynch; Hope McMaster, MA, PhD; Amanda Pietrucha, MPH; Teresa Powell, MS; Amber Seelig, MPH; Besa Smith, MPH, PhD; Katherine Snell; Steven Speigle; Kari Sausedo, MA; Beverly Sheppard; Martin White, MPH; James Whitmer; and Charlene Wong, MPH; from the

References (36)

  • J.A. Cohen. A coefficient of agreement for nominal scales. Educational and Psychological Measurement (1960)
  • J.R. Cornelius et al. Disproportionate suicidality in patients with comorbid major depression and alcoholism. The American Journal of Psychiatry (1995)
  • Diagnostic and Statistical Manual of Mental Disorders, Text Revision (2000)
  • S. Dhalla et al. The CAGE questionnaire for alcohol misuse: a review of reliability and validity studies. Clinical & Investigative Medicine (2007)
  • J.R. Fann et al. Validity of the Patient Health Questionnaire-9 in assessing depression following traumatic brain injury. The Journal of Head Trauma Rehabilitation (2005)
  • G.C. Gray et al. The Millennium Cohort Study: a 21-year prospective cohort study of 140,000 military personnel. Military Medicine (2002)
  • S. Glantz et al. Primer of Applied Regression and Analysis of Variance (1990)
  • D. Ganz et al. Suicidal behavior in adolescents with comorbid depression and alcohol abuse. Minerva Pediatrica (2009)