Poor psychological health and 8-year mortality: a population-based prospective cohort study stratified by gender in Scania, Sweden

Objectives We investigated gender differences in the association between mortality and general psychological distress (measured by the 12-item General Health Questionnaire, GHQ-12), as an increased mortality risk has been shown in community studies but gender differences are largely unknown. Setting We used data from a cross-sectional population-based public health survey conducted in 2008 in the Swedish region of Skåne (Scania) among people 18–80 years old (response rate 54.1%). The relationship between psychological distress and subsequent all-cause and cause-specific mortality was examined with logistic regression models for the total study population and stratified by gender, adjusting for age, socioeconomic status, lifestyle (physical activity, smoking, alcohol consumption), and chronic disease. Participants Of 28 198 respondents, 25 503 were included in the analysis after applying the exclusion criteria. Outcome measures Overall and cause-specific mortality up to 31 December 2016. Results More women (20.2%) than men (15.7%) reported psychological distress at baseline (GHQ ≥3). During a mean follow-up of 8.1 years, 1389 participants died: 425 (30.6%) from cardiovascular diseases, 539 (38.8%) from cancer, and 425 (30.6%) from other causes. After multiple adjustments, the overall association between psychological distress and mortality risk held for all mortality end-points except cancer (eg, all-cause mortality OR 1.8 (95% CI 1.4 to 2.2) for men and women combined). However, stratification revealed a clear gender difference, as the association between GHQ-12 and mortality was consistently stronger and more robust among men than among women. Conclusion More women than men reported psychological distress, while mortality was higher among men (ie, the morbidity-mortality gender paradox). The GHQ-12 could potentially be used as one of several predictors of mortality, especially for men. In the future, screening tools for psychological distress should be validated for both men and women.
Further research regarding the underlying mechanisms of the gender paradox is warranted.

6. The Kaplan-Meier curves only show the unadjusted effects. Adjusted survival curves are important as well, especially given the different baseline characteristics between psychological distress yes/no and M/F.
7. Limitations should include the low response rate of the Scania public health survey (54.1%) and its implications for the study findings; e.g. do we have information on non-respondents (% women/men, % with chronic or mental illnesses, etc.)? Also, what percentage of Scania's population is covered by the survey, and who is excluded (those in hospitals, long-term care, etc.)?
*8. The discussion reiterates a lot of the results already presented in the text and tables. This section also introduces new analyses not mentioned in the Methods or Results. Once these are taken out or moved, the authors will have sufficient space to delve deeper into the interpretation and the "so-what" of their findings, which is currently lacking in the discussion. For example, I would have liked to see more explanation around the gender paradox drawing from earlier literature on gender differences in psychological distress. What are the implications of these findings and what are the future directions?
Minor comments:
1. Abstract: mention GHQ-12 as the measure used to ascertain psychological distress.
2. Please provide rationale or references for why GHQ-12 cutpoints were used.
3. Report median follow-up time (in addition to or in place of maximum and mean).
4. Please include more details about how the PH assumption was not violated in the Methods.
5. A few typos: continuous (p. 7), 5.3 years (p. 7), "SES was defined by 12 categories *of employment*" (p. 7).
6. Were other SES measures (income, education) available? Why were these not examined and what was the rationale for focusing on employment?
7. Covariates: more information about the names of the registries/datasets and where other covariate information came from. Is it from the Scania public health survey?
8. p. 8: not typical for the Table numbers to be spelled out in the Methods; these can be removed.
9. p. 17. Not sure what "The loss of these participants in the present study is a selection bias in a weakening direction but of low magnitude on final results" means.

Reviewer 1 (Andreas Lundin, Karolinska Institutet, PHS, Sweden):
The introduction
1. Reviewer: The GHQ-12 predicts all-cause mortality, and also death from cancer and CVD, as expected from previous literature. This could be taken as evidence of the predictive validity of the GHQ-12 in this setting and, because of the gender-stratified analyses, that this is invariant across sex. What I think is missing is a sentence or two explaining why a measure of psychological distress would lead to these specific outcomes rather than, say, depression or suicide. A mechanism could be presented for this (in the introduction).
Answer: We thank the reviewer for this comment. We have now rewritten this section, please see the first paragraph in the Introduction section.

Reviewer:
The authors make the sex stratified analysis an important feature of their article, but this is not reflected in the title.
Answer: We thank the reviewer for this comment. The title has now been changed to "Poor psychological health and 5-year mortality: A population-based prospective cohort study stratified by gender in Scania, Sweden".

Reviewer:
A design weight corrected for non-participation is presented, but a bit too briefly. I suggest that the construction of the weights is presented in more detail. The strata are geographical areas (municipalities and city areas), so a weight compensating for the different sampling probabilities of these is probably the first weight. This is then corrected for non-participation by using register information on all those in the sampling frame. Relatedly, the use of this weight should be mentioned in the statistical analysis section.
Answer: We thank the reviewer for this comment. To compensate for selection bias, the geographically stratified random sample was weighted by age, sex, country of birth, marital status, income, and education through a weighting variable designed by Statistics Sweden. This has now been clarified under Participants and study design.
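The two-step construction the reviewer outlines (a base weight for unequal selection probabilities across geographic strata, then a correction for non-participation using register information on everyone in the sampling frame) can be sketched as below. This is a generic illustration under stated assumptions, not the weighting variable produced by Statistics Sweden, which may be built differently (for example by calibration on the listed register variables); the strata, adjustment cells, counts and the function name `design_weights` are hypothetical.

```python
from collections import defaultdict


def design_weights(sample, stratum_sizes):
    """Two-step survey weights: base weight times a non-response adjustment.

    Hypothetical inputs:
    sample        -- list of dicts with keys 'stratum', 'cell', 'responded' (bool)
    stratum_sizes -- population count N_h for each sampling stratum h
    Returns one weight per sampled person (None for non-respondents).
    Assumes every adjustment cell contains at least one respondent.
    """
    # Step 1: base weight N_h / n_h, the inverse selection probability in stratum h.
    n_sampled = defaultdict(int)
    for person in sample:
        n_sampled[person["stratum"]] += 1
    base = [stratum_sizes[p["stratum"]] / n_sampled[p["stratum"]] for p in sample]

    # Step 2: within each adjustment cell (e.g. sex x age group from the registers),
    # scale respondents up so that they also carry the weight of non-respondents.
    cell_total = defaultdict(float)
    cell_resp = defaultdict(float)
    for p, w in zip(sample, base):
        cell_total[p["cell"]] += w
        if p["responded"]:
            cell_resp[p["cell"]] += w

    return [
        w * cell_total[p["cell"]] / cell_resp[p["cell"]] if p["responded"] else None
        for p, w in zip(sample, base)
    ]
```

Because the weight of non-respondents is redistributed within cells, the final weights of respondents still sum to the population total of the strata, which is the property a non-response adjustment is meant to preserve.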

Reviewer:
Last sentence in participants section: those 136 most likely emigrated or were dead? Please clarify.
Answer: We thank the reviewer for this comment. The 136 persons were most likely not dead, as all deaths of Swedish citizens, whether in Sweden or abroad, would have been recorded in the death register by the National Board of Health and Welfare. For some reason, these 136 individuals were not traceable in the health care registers during the follow-up period of more than 5 years. We can only speculate that they may have lived abroad and therefore not been in contact with Swedish health care.

5.
Reviewer: Under the description of the GHQ-12 you describe that scores were used to construct zones. Could you describe why these specific cut-offs were used? Here it would also be good to add Cronbach's coefficient alpha for the GHQ-12, separately for men and women, to show whether the scale had the same properties for men and women. Any differences in predictive ability could be due to a lack of gender invariance.
Answer: We thank the reviewer for this comment. The GHQ-12 score was divided into four categories based on previous literature in order to investigate a dose-response association between GHQ-12 and mortality. The rationale for our division was that the majority (66%) scored zero (69% of men; 63% of women); sub-clinical distress was, by the conventional Swedish definition, a score below 3, i.e. 1-2 (16% scored 1-2: 15% of men; 17% of women); and the most even distribution of the remaining population was at cut-off 5/6 (9% were defined as moderately and 9% as highly distressed: 8% + 8% of men and 10% + 11% of women). Reliability was good, with Cronbach's coefficient alpha 0.897 for men and 0.894 for women.
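For readers who want to reproduce a gender-specific reliability check like the one reported above, Cronbach's alpha is k/(k-1) * (1 - sum of item variances / variance of the total score), with k items. A minimal sketch follows; the item scores are hypothetical 0/1 codes, and splitting respondents by gender before calling the function is left to the caller.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` is a list of respondents, each a list of k item scores.

    Population variances are used throughout; sample variances would scale the
    numerator and denominator alike and give the same alpha.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # one tuple of scores per item
    item_var_sum = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(r) for r in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)
```

Calling the function once on men's item responses and once on women's gives the kind of side-by-side comparison the reviewer asked for.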

6.
Reviewer: Comparison with previous literature is made with Finland. Finland is not a Scandinavian country, contrary to what is stated.

Tables
7. Reviewer: The tables in the text are good. The tables following the article, numbered in a similar way, seem to test a different cut-off score (four or more symptoms). Is this correct?
Answer: We thank the reviewer for this comment. Yes, this is correct! The comparison between the two different cut-off scores has now been clarified as a minor aim. Supplementary Tables 1-4 (GHQ ≥ 4) show identical analyses to Tables 1-4 (GHQ ≥ 3). The reason for the comparison is that a recent Swedish case-control study concluded that the best sensitivity and specificity of GHQ-12 was seen at cut-off ≥4 when discriminating between healthy controls and psychiatric outpatients. We were curious as to how results of the present study would be affected by a higher GHQ-12 cut-off.
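Choosing the cut-off with the best sensitivity and specificity, as in the case-control study mentioned above, amounts to scanning candidate GHQ-12 thresholds. The sketch below picks the threshold maximizing Youden's J (sensitivity + specificity - 1); that criterion, the function names and the scores are assumptions for illustration, and the cited study may have used a different criterion.

```python
def cutoff_performance(cases, controls, cutoff):
    """Sensitivity and specificity of the screening rule 'GHQ-12 score >= cutoff'."""
    sensitivity = sum(score >= cutoff for score in cases) / len(cases)
    specificity = sum(score < cutoff for score in controls) / len(controls)
    return sensitivity, specificity


def best_cutoff(cases, controls, cutoffs=range(1, 13)):
    """Cut-off with the largest Youden's J = sensitivity + specificity - 1."""
    def youden(cutoff):
        sensitivity, specificity = cutoff_performance(cases, controls, cutoff)
        return sensitivity + specificity - 1
    return max(cutoffs, key=youden)
```

With different case and control score distributions the optimum shifts, which is one reason a clinical validation study and a general-population convention can settle on different thresholds (≥4 versus ≥3).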

Reviewer 2 (Alexander M. Ponizovsky, Ministry of Health, Israel):
This study, using a representative sample of the general population in the Swedish region of Scania, has replicated previous findings on the well-established relationships between GHQ-12 psychological distress scores and all-cause and cause-specific mortality rates. In addition, gender differences were shown in the psychological distress-mortality relationships, with higher distress scores in women but higher mortality rates in men.

Reviewer:
The study methodology is adequate, although the moderating/mediating effects of gender on the psychological distress-mortality association could be explored further.
Answer: We thank the reviewer for this comment. We have now added a test for effect modification by gender as stated in the second sentence under Statistics: "As the main aim of the study was to test gender differences, an initial test for effect modification by gender was performed. The interaction term between psychological distress and sex was significant (Hazard Ratio (HR) = 0.6; P = 0.002) indicating that the effect of psychological distress on mortality was different for men and women."

Reviewer: Results of the study are presented clearly. Limitations of the study are generally noted.

Reviewer:
Because the declared main aim of the study was to identify gender differences in the distress-mortality relationships, the Introduction section needs more theoretical background on biological/psychological/social differences in psychological morbidity and distress expression in men and women.
3. Reviewer: Also, in the Discussion section, it is absolutely insufficient to say "Among the possible explanations of the gender paradox are differentials between the sexes in biological risk, in health behaviors and social roles, and in health seeking behavior". "The possible explanations" should not only be listed but also discussed in detail, and they should be supported by the findings of previous studies and the above-mentioned theoretical background.
Answer: We thank the reviewer for this comment. We have thoroughly rewritten both the Introduction and the Discussion sections based on your good advice.

Reviewer 3 (Maria Chiu, University of Toronto):
Thank you for the opportunity to review this interesting and well-written paper entitled "Psychological distress measured by GHQ-12 and mortality: A 5-year prospective population-based study in Scania, Sweden." The aim of this paper was to investigate whether the association between psychological distress and mortality differed between men and women in Scania, Sweden. The authors studied 28198 adults followed for up to 5 years and found that those who reported psychological distress at baseline had a higher hazard of dying than those with no psychological distress, and that this effect differed between men and women. They describe a morbidity-mortality gender paradox, in which women had a higher prevalence of psychological distress but men with psychological distress had a higher risk of mortality.

Major comments (*most important to address):
Methods: 1a*. Reviewer: It is not clear whether weighted analyses were performed for this study. If the intention was to generalize these findings to the population of Scania and adjust at least in part for differential survey sampling, then weighted analyses would be recommended. Specifically, all statistical tests (i.e. t-tests, chi-square tests, interaction testing, Cox PH, etc.) would have to be weighted.
Answer: We thank the reviewer for this comment. All analyses have been performed on weighted data, which is now stated in the Statistics section.

1b*. Reviewer: Furthermore, the authors should describe what methods were used to ensure that appropriate variance estimations were performed with weighted data (e.g. bootstrapped p-values, 95% confidence intervals?)
Answer: We thank the reviewer for this comment. Variances were estimated with the bootstrap method (2000 replicates), from which 95% confidence intervals were obtained; this is now specified in the Statistics section.
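A percentile bootstrap for a weighted estimate, in the spirit of the 2000-replicate procedure mentioned above, can be sketched as follows. It resamples respondents with replacement and carries each respondent's survey weight along. This simple version ignores the stratified sampling design (a design-based bootstrap would resample within strata), and the function names, seed and data are hypothetical.

```python
import random


def weighted_mean(values, weights):
    """Weighted mean; with 0/1 values this is a weighted prevalence."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def bootstrap_ci(values, weights, n_boot=2000, level=0.95, seed=2008):
    """Point estimate and percentile bootstrap CI for a weighted mean.

    Respondents are resampled with replacement and keep their own weights.
    Simplification: the stratified sampling design is ignored.
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    replicates = []
    for _ in range(n_boot):
        idx = rng.choices(range(n), k=n)
        replicates.append(
            weighted_mean([values[i] for i in idx], [weights[i] for i in idx])
        )
    replicates.sort()
    tail = (1 - level) / 2
    lower = replicates[round(tail * n_boot)]
    upper = replicates[round((1 - tail) * n_boot) - 1]
    return weighted_mean(values, weights), lower, upper
```

The same resampling loop works for any weighted statistic (an odds ratio from a weighted logistic model, for instance) by replacing `weighted_mean` with the estimator of interest.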

Reviewer: What did you mean by "The differences between unweighted and weighted data were small". Which variables were examined?
Answer: We thank the reviewer for this comment. Overall, the differences were small regarding prevalence for practically all included variables regardless of whether data was weighted or not. However, all analyses in the revised manuscript have now been performed on weighted data as recommended.

3*. Reviewer: If the main aim of the study was to test gender differences, was there an initial statistical test for effect modification by gender? I don't doubt that there's an interaction, but would be good to describe in methods and state that the interaction term was significant in model 0, 1, 2 (p=)?
Answer: We thank the reviewer for this comment. We have now added the following under Statistics: "As the main aim of the study was to test gender differences, an initial test for effect modification by gender was performed. The interaction term between psychological distress and sex was significant (Hazard Ratio (HR) = 0.6; P = 0.002) indicating that the effect of psychological distress on mortality was different for men and women."
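Besides an interaction term in the regression model, a difference between men and women can be checked directly from the two stratum-specific estimates; overlapping confidence intervals do not by themselves rule out a significant difference. The sketch below recovers the standard error of each log ratio from its 95% CI and tests the ratio of the two estimates. It assumes the estimates are independent (separate samples of men and women) and that the CIs are symmetric on the log scale; it is an alternative technique, not the method used in the manuscript, and the function names are hypothetical.

```python
import math

Z95 = 1.959963984540054  # standard normal quantile for a 95% CI


def log_se_from_ci(lower, upper):
    """Standard error of log(ratio) recovered from a 95% CI on the ratio scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def compare_ratios(est_a, ci_a, est_b, ci_b):
    """Ratio of two independent ratio estimates (e.g. OR in women / OR in men).

    Returns the ratio, its 95% CI and a two-sided Wald p-value.
    """
    diff = math.log(est_a) - math.log(est_b)
    se = math.hypot(log_se_from_ci(*ci_a), log_se_from_ci(*ci_b))
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    ci = (math.exp(diff - Z95 * se), math.exp(diff + Z95 * se))
    return math.exp(diff), ci, p
```

A ratio below 1 with a CI excluding 1 indicates a weaker association in the first group, which is the multiplicative-scale counterpart of a significant interaction term.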

4.
Reviewer: it would be good to add more details describing findings from Table 1; i.e. factors that are present in those with psychological distress in men vs. women.
Answer: We thank the reviewer for this comment. We have now added this information.

5*. Reviewer:
One major concern in the interpretation of the gender differences is whether they are statistically different. The 95% CIs for HRs, for example, overlap between men and women. I strongly suggest calculating and reporting the *mortality rates* (# of deaths per 100 person-years) and comparing whether these are significantly different (95% CI and p-values) between men and women. This could be done for all-cause and cause-specific mortality. Any reference to "rates" in the paper as it stands is actually referring to hazards or risk, not rates.
Answer: We thank the reviewer for this comment. We have now calculated mortality rates for all-cause and cause-specific mortality. As can be seen from the tables below, the male-to-female incidence rate ratios (IRRs) show statistically significant differences between men and women for all mortality outcomes (p<0.001 for all outcomes except cancer mortality, p=0.013). The 95% CIs for incidence rates per 100 person-years overlapped minimally for men and women in cancer mortality and were non-overlapping for all other mortality outcomes.

[Tables not reproduced: risk of overall mortality; male-to-female incidence rate ratios]
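The crude mortality rates per 100 person-years and the male-to-female incidence rate ratios described in the answer above could be computed, in unweighted form, as in the following sketch. It uses the usual Poisson approximations (standard error of the log rate = 1/sqrt(deaths); standard error of the log IRR = sqrt(1/d_m + 1/d_f)) with a Wald test; the function names are hypothetical and the death counts and person-years in any example are not the study's figures.

```python
import math

Z95 = 1.959963984540054  # standard normal quantile for a 95% CI


def rate_per_100py(deaths, person_years):
    """Crude mortality rate per 100 person-years with an approximate 95% CI."""
    rate = 100 * deaths / person_years
    se_log = 1 / math.sqrt(deaths)  # Poisson approximation on the log scale
    return rate, (rate * math.exp(-Z95 * se_log), rate * math.exp(Z95 * se_log))


def rate_ratio(deaths_m, py_m, deaths_f, py_f):
    """Male-to-female incidence rate ratio, 95% CI and two-sided Wald p-value."""
    irr = (deaths_m / py_m) / (deaths_f / py_f)
    se_log = math.sqrt(1 / deaths_m + 1 / deaths_f)
    z = math.log(irr) / se_log
    p = math.erfc(abs(z) / math.sqrt(2))
    ci = (irr * math.exp(-Z95 * se_log), irr * math.exp(Z95 * se_log))
    return irr, ci, p
```

Because the standard error of the log IRR depends only on the two death counts, outcomes with few deaths (here, individual causes) give wider intervals than all-cause mortality.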
Answer: We thank the reviewer for this comment. A total of 52142 persons aged 18-80 years (a random stratified sample drawn from the official population registers of people living in Scania, comprising 5.8% of the total population aged 18-80 years) were invited to participate in the Scania public health survey 2008. The response rate of 54.1% is in line with those normally obtained at this time in this type of population-based public health survey in Sweden and Europe. However, we have now specifically mentioned the response rate of the survey as a limitation and stated that non-responders were more often young, male, low-educated or foreign-born. We do not know whether non-responders had more psychological distress than responders. There were probably more non-responders among those with serious mental disorders (such as psychosis, schizophrenia or severe depression) than in the general population, owing to the mental effort involved in answering a large survey with 134 main questions (totalling 273 items including subqueries and follow-up questions). As the surveys were posted to the residential address specified in the official population registers, we do not know how many people hospitalized at the time had the opportunity to answer the survey (they could do so if the survey was brought to them by their spouse).

Answer: We thank the reviewer for this comment. We have now completely rewritten the discussion based on your good advice.

Minor comments:
Abstract 1. Reviewer: mention GHQ-12 as the measure used to ascertain psychological distress.
Answer: We thank the reviewer for this comment. This has now been clarified in the abstract.

2.
Reviewer: Please provide rationale or references for why GHQ-12 cutpoints were used.
Answer: We thank the reviewer for this comment. The two cut-off points GHQ-12 ≥ 3 (conventionally used in Sweden) and GHQ-12 ≥ 4 (widely used internationally) are based on the public health literature. A minor aim of the present study was to compare these two cut-off points (results for GHQ-12 ≥ 3 shown in Tables 1-3 and results for GHQ-12 ≥ 4 shown in Supplementary Tables 1-3). The GHQ-12 score was furthermore divided into four categories (also based on previous literature) in order to investigate a dose-response association (results shown in Table 4 and Supplementary Table 4). The rationale for our division was that the majority (66%) scored zero, sub-clinical distress was by definition in the Swedish context a score below 3, i.e. 1-2 (16%), and the most even distribution of the remaining population was at cut-off 5/6, i.e. moderate distress 3-5 (9%) and high distress 6-12 (9%). With the international cut-offs, the proportions in our data were: no distress (66%), sub-clinical distress 1-3 (20%), moderate distress 4-6 (7%) and high distress 7-12 (7%).
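The two category schemes described in this answer can be written down as a small lookup. The sketch assumes the binary GHQ scoring method (each item's four response options coded 0-3, with higher codes meaning more distress and the two highest options counted as 1), which yields the 0-12 range used here; the function and scheme names are hypothetical.

```python
# Inclusive upper bounds of the first three zones; higher scores fall in "high".
ZONES = {
    "swedish": (0, 2, 5),        # 0 | 1-2 | 3-5 | 6-12, caseness at GHQ >= 3
    "international": (0, 3, 6),  # 0 | 1-3 | 4-6 | 7-12, caseness at GHQ >= 4
}
LABELS = ("no distress", "sub-clinical", "moderate", "high")


def ghq12_score(item_codes):
    """Binary GHQ scoring (assumed): items coded 0-3, codes 2 and 3 count as 1."""
    if len(item_codes) != 12:
        raise ValueError("GHQ-12 needs exactly 12 item responses")
    return sum(1 for code in item_codes if code >= 2)


def ghq_zone(score, scheme="swedish"):
    """Map a 0-12 GHQ-12 score to one of four distress zones."""
    for label, upper in zip(LABELS, ZONES[scheme]):
        if score <= upper:
            return label
    return LABELS[-1]
```

A score of 3 is thus "moderate" under the Swedish scheme but "sub-clinical" under the international one, which is exactly the group whose classification differs between the main and supplementary tables.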

Reviewer: Report median follow-up time (in addition to or in place of maximum and mean).
Answer: We thank the reviewer for this comment.

Reviewer: Please include more details about how the PH assumption was not violated in the Methods.
Answer: We thank the reviewer for this comment. The proportional hazards assumption was considered fulfilled after inspection of the survival curves according to psychological distress (yes/no). Additional statistical tests on our study data indicated an absence of perfect proportionality with regard to poor GHQ-12 and mortality across the 5.3-year period (see Supplemental file on PH assumption). This is in line with other studies showing a dilution over time of the mortality impact of baseline psychological distress (see discussion under Strengths and limitations on pages 20-21).
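The graphical check mentioned above can be made more explicit. Under proportional hazards S_1(t) = S_0(t)^HR, so log(-log S_1(t)) - log(-log S_0(t)) = log HR at every t: the log-minus-log Kaplan-Meier curves of the two groups should run parallel, and a gap that shrinks over time (as with a diluting effect of baseline distress) signals non-proportionality. A minimal sketch with hypothetical input follows; it is a visual aid, not the formal test reported in the Supplemental file.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct event time.

    times: follow-up time per person; events: 1 = died, 0 = censored.
    Returns a list of (time, survival) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censorings both leave the risk set
    return curve


def log_minus_log(curve):
    """log(-log S(t)); curves for two groups should be roughly parallel under PH."""
    return [(t, math.log(-math.log(s))) for t, s in curve if 0 < s < 1]
```

Plotting `log_minus_log` for the distressed and non-distressed groups against time (or log time) and looking for converging or crossing lines is the usual informal complement to a residual-based test.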

Reviewer: Were other SES measures (income, education) available? Why were these not examined and what was the rationale for focusing on employment?
Answer: We thank the reviewer for this comment. Employment was based on register data from Statistics Sweden (SCB), with 1.6% internal missing data. Level of education was a self-reported variable in the Scania Public Health Survey, with 3.5% internal missing data. We had no information on income. This is why we chose employment as the SES measure.

Reviewer: Covariates: more information about the names of the registries/datasets and where other covariate information came from. Is it from the Scania public health survey?
Answer: We thank the reviewer for this comment. We have now clarified on page 9 under Covariates that the participants' age and SES were registry data from Statistics Sweden (SCB) while all other covariates were self-reported data from the Scania public health survey.

Reviewer: p. 8: not typical for the Table numbers to be spelled out in the Methods; these can be removed.
Answer: We thank the reviewer for this comment. However, for reasons of clarity we have chosen to keep the Table numbers in the Methods section, as we refer both to Tables and to Supplementary Tables.

Reviewer: p. 17. Not sure what "The loss of these participants in the present study is a selection bias in a weakening direction but of low magnitude on final results" means.
Answer: We thank the reviewer for this comment. We meant that the loss of participants with poor psychological health and higher risk of premature mortality would bias results towards the null if a selection bias of non-participation among persons with serious psychiatric disease had an impact on final results. As this sentence was not easy to understand, we have removed it from the revised manuscript.

Additional information regarding references
Due to the major revision of this manuscript several new references have been added while others have been removed. We have marked the new references in bold to make it easier for you to see which ones are new to the revised manuscript (Nr 3, 4, 17, 18, 20-26, 30, 31).

GENERAL COMMENTS
This is an interesting study which demonstrates that there is a relationship between scoring poorly on the GHQ12 and mortality in men but not women some 5.3 years later. It uses data from a population survey taken in 2008 with linked mortality records to the end of December 2013. The authors chose to analyse the data using a Cox proportional hazards model. Whilst the study appears to be well conducted and reported, there are a couple of areas where I have concerns. In the methods section it is stated that "The proportional hazards assumption was considered fulfilled after inspection of the survival curves according to psychological distress". However, later there is a statement which contradicts this, "Statistical tests on our study data indicated absence of perfect proportionality with regard to poor GHQ-12 and mortality across the 5.3-year period (see Supplemental file on PH assumption)", and those investigations do demonstrate deviation from the assumptions of the model. Given this is the case, why did the authors choose such a model? There are other alternatives such as accelerated failure time models or simple logistic regression, given that the follow-up period was the same for all. Age was included as a continuous variable. Was the relationship linear? Table 2 shows attenuation of hazard ratios as more explanatory variables are included in the model. However, interaction terms between male/female are not included, but reference is made earlier to these being significant. This being the case means that the sex-specific HRs are not interpretable as shown. Table 3 improves the analysis by calculating sex-specific effect sizes and confidence intervals in fully adjusted models, which show that there are no significant associations between psychological distress and mortality in women and that the degree of association decreases in men but is still substantial.
Table 4 explores a dose response relationship between four categories of psychological distress and mortality and demonstrated this clearly in men and not in women.
However, given the very high survival rates in women, the power to detect any relationship between psychological distress and mortality will be very low. The linked mortality data are from 2013, which is 7-8 years ago. Are more up-to-date data available to improve the power of the analysis in women?
The abstract states that "GHQ12 could potentially be used as one of several predictors of mortality, especially for men. In the future, screening tools for psychological distress should be validated for both men and women". Why especially for men when no association is shown for women? Perhaps delete especially. I have no additional comments.
Answer: We thank the reviewer for this comment.

Prof. Ronan Lyons, University of Wales Swansea
Comments to the Author: This is an interesting study which demonstrates that there is a relationship between scoring poorly on the GHQ12 and mortality in men but not women some 5.3 years later. It uses data from a population survey taken in 2008 with linked mortality records to the end of December 2013. The authors chose to analyse the data using a Cox proportional hazards model. Whilst the study appears to be well conducted and reported there are a couple of areas where I have concerns.
1_Reviewer: In the methods section it is stated that "The proportional hazards assumption was considered fulfilled after inspection of the survival curves according to psychological distress". However, later there is a statement which contradicts this "Statistical tests on our study data indicated absence of perfect proportionality with regard to poor GHQ-12 and mortality across the 5.3-year period (see Supplemental file on PH assumption)" and those investigations do demonstrate deviation from the assumptions of the model. Given this is the case why did the authors choose such a model? There are other alternatives such as accelerated failure time models or simple logistic regression given that the follow up period was the same for all.
Answer: We thank the reviewer for this comment. We have now re-performed all statistical analyses using logistic regression instead of Cox regression survival analysis, and with an 8.3-year instead of a 5.3-year follow-up.
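With a shared follow-up window, a logistic model estimates the odds ratio of having died by the end of follow-up. When the outcome is rare (1389 deaths among 25 503 participants, roughly 5%), the odds ratio is close to the risk ratio, which is one reason Cox hazard ratios and logistic odds ratios can come out similar. The sketch below compares the two measures on hypothetical 2x2 counts; the function names are illustrative.

```python
def risk_ratio(d1, n1, d0, n0):
    """Risk ratio: cumulative risk of death in the exposed over that in the unexposed."""
    return (d1 / n1) / (d0 / n0)


def odds_ratio(d1, n1, d0, n0):
    """Odds ratio from the same 2x2 table (deaths d, group sizes n)."""
    return (d1 / (n1 - d1)) / (d0 / (n0 - d0))
```

With 10% versus 5% cumulative mortality the odds ratio is 19/9 (about 2.11) against a risk ratio of 2.0; with 40% versus 20% it grows to 8/3 (about 2.67), so the approximation only holds while deaths are rare.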
In our previous manuscript we considered the proportional hazards assumption to be adequately fulfilled by visual inspection of the survival curves according to psychological distress (yes/no). As we argued in the Discussion section, a certain dilution over time of the effect of psychological distress on mortality can be expected (this has been shown, e.g., by Lee and Singh in their 2020 paper: Psychological distress and heart disease mortality in the United States: Results from the 1997-2014 NHIS-NDI Record Linkage Study). In our literature search we also found that Cox regression analysis was clearly the method of choice, as it was used by all previous comparable population-based studies on GHQ-12 and mortality.
After receiving your feedback, we compared results from Cox regression analysis and logistic regression analysis (in Tables 2-3) for both time frames (5.3 years and 8.3 years, respectively). The effect measures and 95% confidence intervals were fairly similar for the two methods; see below.

Table 2 with 5.3 years follow-up: a) HRs by Cox regression analysis and b) ORs by logistic regression analysis [table values not reproduced]
Model 0 adjusted for age and gender.
Model 1 furthermore adjusted for socioeconomic status, physical activity, smoking, and alcohol.
Model 2 furthermore adjusted for chronic disease.

Table 3 with 5.3 years follow-up: a) HRs by Cox regression analysis and b) ORs by logistic regression analysis [table values not reproduced]
Model 1 furthermore adjusted for socioeconomic status, physical activity, smoking, and alcohol.
Model 2 furthermore adjusted for chronic disease.

Table 3 with 8.3 years follow-up: a) HRs by Cox regression analysis and b) ORs by logistic regression analysis [table values not reproduced]
Model 1 furthermore adjusted for socioeconomic status, physical activity, smoking, and alcohol.
Model 2 furthermore adjusted for chronic disease.
Significance levels: * p<0.05, ** p<0.01, *** p<0.001. Weighted odds ratios. Bootstrap method (2000 replicates) for variance estimation.

Reviewer: Age was included as a continuous variable. Was the relationship linear?
Answer: We thank the reviewer for this comment. The relationship between age and mortality was not linear but exponential. The mortality risk was approximately 100 times higher in the oldest group than in the youngest in our data, which mirrors the relationship between age and mortality in the general population.

Reviewer: Table 2 shows attenuation of hazard ratios as more explanatory variables are included in the model. However, interaction terms between male/female are not included, but reference is made earlier to these being significant. This being the case means that the sex-specific HRs are not interpretable as shown. Table 3 improves the analysis by calculating sex-specific effect sizes and confidence intervals in fully adjusted models, which show that there is no significant association between psychological distress and mortality in women and that the degree of association decreases in men but is still substantial.
Answer: We thank the reviewer for this comment. The intended logic of Table 2 and Table 3 was to first present data on men and women combined with adjustments for age and sex (as in most comparable studies) and then present separate effect measures for men and women by gender stratification (an extension of previous research).

Reviewer: Table 4 explores a dose-response relationship between four categories of psychological distress and mortality and demonstrates this clearly in men and not in women. However, given the very high survival rates in women, the power to detect any relationship between psychological distress and mortality will be very low. The linked mortality data are from 2013, which is 7-8 years ago. Are more up-to-date data available to improve the power of the analysis in women?
Answer: We thank the reviewer for this comment. When we first submitted this manuscript in 2018, we had access to mortality data with a follow-up of 5.3 years. The mortality data have now been extended to a follow-up of 8.3 years.

Reviewer:
The abstract states that "GHQ12 could potentially be used as one of several predictors of mortality, especially for men. In the future, screening tools for psychological distress should be validated for both men and women". Why especially for men when no association is shown for women? Perhaps delete especially.
Answer: We thank the reviewer for this comment. We have retained the word "especially", as it is motivated by the weaker association found in women than in men.