Original research
Comparison of predicted psychological distress among workers between artificial intelligence and psychiatrists: a cross-sectional study in Tsukuba Science City, Japan
  1. Shotaro Doki (1),
  2. Shinichiro Sasahara (1),
  3. Daisuke Hori (1),
  4. Yuichi Oi (1),
  5. Tsukasa Takahashi (2),
  6. Nagisa Shiraki (2),
  7. Yu Ikeda (2),
  8. Tomohiko Ikeda (2),
  9. Yo Arai (2),
  10. Kei Muroi (2),
  11. Ichiyo Matsuzaki (1, 3)
  1. Faculty of Medicine, University of Tsukuba, Tsukuba, Japan
  2. Graduate School of Comprehensive Human Sciences, University of Tsukuba, Tsukuba, Japan
  3. International Institute for Integrative Sleep Medicine, University of Tsukuba, Tsukuba, Japan
Correspondence to Dr Shinichiro Sasahara; s-sshara{at}md.tsukuba.ac.jp

Abstract

Objectives Psychological distress is a serious worldwide problem that needs to be addressed in the field of occupational health. This study aimed to use artificial intelligence (AI) to predict psychological distress among workers using sociodemographic, lifestyle and sleep factors, not subjective information such as mood and emotion, and to examine the performance of the AI models through a comparison with psychiatrists.

Design Cross-sectional study.

Setting We conducted a survey on psychological distress and living conditions among workers. An AI model for predicting psychological distress was created and then the results were compared in terms of accuracy with predictions made by psychiatrists.

Participants An AI model of the neural network and six psychiatrists.

Primary outcome The accuracies of the AI model and psychiatrists for predicting psychological distress.

Methods In total, data from 7251 workers were analysed to predict moderate and severe psychological distress. An AI model of the neural network was created and accuracy, sensitivity and specificity were calculated. Six psychiatrists used the same data as the AI model to predict psychological distress and conduct a comparison with the AI model.

Results The accuracies of the AI model and psychiatrists for predicting moderate psychological distress were 65.2% and 64.4%, respectively, showing no significant difference. The accuracies of the AI model and psychiatrists for predicting severe psychological distress were 89.9% and 85.5%, respectively, indicating that the AI model had significantly higher accuracy.

Conclusions A machine learning model was successfully developed to screen workers with depressed mood. The explanatory variables used for the predictions did not directly ask about mood. Therefore, this newly developed model appears to be able to predict psychological distress among workers easily, regardless of their subjective views.

  • occupational & industrial medicine
  • psychiatry
  • depression & mood disorders
  • public health

Data availability statement

No data are available.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Strengths and limitations of this study

  • An artificial intelligence (AI) model for predicting psychological distress among workers was created using sociodemographic, lifestyle and sleep factors.

  • The explanatory variables used for the predictions did not directly ask about mood; therefore, the developed model could predict psychological distress, regardless of their subjective views.

  • The results of the AI model were compared in terms of accuracy with predictions made by psychiatrists.

  • It is difficult to generalise our results to wider populations because a large proportion of the participants in this study were researchers.

  • The model used in this study cannot indicate which parameters/features are most important for predicting the outcome of interest.

Introduction

An estimated 350 million people around the world suffer from depression. In terms of disability-adjusted life years, depression was ranked third in 2004 and is predicted to be ranked first in 2030.1 If further measures are not taken to prevent depression, up to US$16 trillion in economic losses between 2010 and 2030 are expected.2 In Japan, economic losses due to depression in the labour market were reported to total 2 trillion JPY (10 billion GBP) in 2005.3 Therefore, preventive measures for depression are needed for both individual health and the economy.

In recent years, new technologies using artificial intelligence (AI) have been applied to the field of clinical medicine. For example, AI has achieved higher accuracy than clinicians for the pathological diagnosis of lymph node metastasis in breast cancer.4

In the dermatological clinical setting, using dermoscopic images, AI has achieved higher accuracy than most dermatologists for the diagnosis of melanoma.5 Another AI system that distinguishes malignant tumours from skin lesions has been shown to outperform board-certified dermatologists.6

AI has been applied not only to physical diseases but also to mental illnesses. Applying AI in psychiatry has also been shown to be useful for diagnosis and treatment.7 Using machine learning, body part movements have been applied to diagnose depressive disorder with 97.2% accuracy,8 and, using a three-dimensional camera (Kinect; Microsoft, Redmond, Washington, USA), head movements, facial expressions and voice have been applied to diagnose moderate depressive disorder with 89.7% accuracy.9 AI analysis of gait data has also been used to identify mood disorders.10 Using a wearable device, a neural network and random forest model was found to detect stress levels with 92% accuracy and major depressive disorder with 87% accuracy.11 12 Using natural language processing, the tendency towards depression was predicted from posts to the internet with 0.67 precision.13 Electroencephalogram signals have also been reported to be capable of detecting major depressive disorder with high accuracy using a three-dimensional convolutional neural network.14 However, from a neuroimaging point of view, the development of technology to detect mood disorders using AI is still in the exploratory stage.15 In current neural network models, not only biological but also sociodemographic data are used for the prediction of depression.16

To our knowledge, no studies have been conducted to screen depression/psychological distress in a large population using machine learning. For earlier interventions to treat depression/psychological distress, a more rapid and effective screening tool is needed.

AI has been used to monitor safety and health among workers in the field of occupational health17 and to assess sick-listed employees returning to work based on individual, occupational, support and psychological factors.18

Given this background, the present study aimed to use AI to predict psychological distress among workers using sociodemographic, lifestyle and sleep factors, not subjective information such as mood and emotion, and to examine the predictive accuracy of AI through a comparison with psychiatrists.

Methods

Study design

In this cross-sectional study, workers took part in an online survey about psychological distress and living conditions. The study was conducted in Tsukuba Science City, Japan, from February through March 2017. An AI model for predicting psychological distress was created and the results were then compared in terms of accuracy with predictions made by psychiatrists.

Participants

The study participants were mainly researchers and office workers. Tsukuba Science City is located northeast of Tokyo in Ibaraki Prefecture. It has a population of about 230 000 and features two universities and a number of public and private research institutes. As of 2017, the Tsukuba Science City Network consisted of 89 organisations around the city, including research institutions, universities, educational foundations, local governments and private companies. A total of 53 organisations, mainly comprising research and educational institutions, with a total of 19 481 workers, agreed to participate.

Items

The Kessler Screening Scale for Psychological Distress (K6) was used to assess psychological distress. Previous studies have shown that the K6 has two cut-off points: a score of ≥5 points indicates moderate psychological distress and a score of ≥13 points indicates severe psychological distress.19 According to Kessler et al, when the cut-off is set at 12/13 points, the sensitivity and specificity are 0.36 and 0.96, respectively, with an area under the curve value of 0.86 for detecting serious mental illness, as diagnosed by the Diagnostic and Statistical Manual of Mental Disorders, 4th Edition (DSM-IV).20
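The two cut-off points described above can be sketched as a pair of binary targets, one per prediction task. This is only an illustrative sketch: the function name `k6_labels` is hypothetical, and the 0 to 24 score range assumes the standard six-item K6 scored 0 to 4 per item. Note that the two targets are nested, so a respondent with severe distress also meets the moderate cut-off.

```python
from typing import Tuple


def k6_labels(score: int) -> Tuple[int, int]:
    """Return (moderate, severe) binary targets for a K6 total score.

    moderate = 1 when the score is >= 5, severe = 1 when the score is >= 13.
    The function name and the range check are illustrative assumptions.
    """
    if not 0 <= score <= 24:
        raise ValueError("K6 total score must be between 0 and 24")
    return int(score >= 5), int(score >= 13)


if __name__ == "__main__":
    for s in (4, 5, 12, 13):
        print(s, k6_labels(s))
```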

The explanatory variables used in this study were age, sex, marital status, employment status, type of organisation, type of job, education level, living with family members, household income, exercise habit, smoking status and sleep status. Sleep status was measured using the eight-item Athens Insomnia Scale (AIS) sleep questionnaire. Since this study aimed to detect psychological distress from sociodemographic, lifestyle and sleep factors, not by asking directly about mood, two of the AIS items asking about subjective mental status—‘well-being during the day’ and ‘functioning capacity during the day’—were excluded from the analysis.

Neural network protocol

Two neural network models were created to determine the two depressive states: moderate psychological distress (a K6 Score of ≥5 points) and severe psychological distress (a K6 Score of ≥13 points). Model training was repeated 10 times (models 1–10). Because the nature of psychological distress has been reported to differ between men and women,21 we created separate models for each sex.

In total, 100 cases (the hold-out test set) were randomly selected for validation through a comparison between the trained AI models and psychiatrists. Machine learning for the model creation was performed using 80% of the total data as training data and 20% as validation data. A neural network was used to perform the training. The loss function was set to SparseCategoricalCrossentropy, which calculates the cross-entropy loss between the predictions and labels, the optimisation function was set to Adam, an optimisation tool combining RMSProp and momentum, and the activation function was set to the rectified linear unit activation function. The number of layers was set to 3, the number of nodes in the middle layer was set to 100 and the number of epochs was set to 20. These conditions were determined with reference to previous studies.4 10 Python V.3.7 (https://www.python.org/downloads/), Keras V.2.2.4 (https://keras.io/) and TensorFlow V.1.13.1 (https://www.tensorflow.org/) were used for the analysis. A schematic of the neural network model used in this study is shown in online supplemental material appendix A.
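The protocol above can be illustrated with a small, dependency-free sketch. It is not the authors' Keras code. It assumes that the 100 hold-out cases are set aside first and that the 80%/20% split applies to the remaining cases, and it reads "3 layers" as input, one hidden layer of 100 nodes and a two-class output; both readings are assumptions. All function names are hypothetical.

```python
import math
import random


def split_cases(case_ids, n_holdout=100, train_fraction=0.8, seed=0):
    """Shuffle ids, set aside a hold-out test set, then split the rest 80/20."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    holdout, rest = ids[:n_holdout], ids[n_holdout:]
    n_train = int(len(rest) * train_fraction)
    return rest[:n_train], rest[n_train:], holdout


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Convert raw output scores to class probabilities."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer; weights[j] holds the incoming weights of node j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(features, hidden_w, hidden_b, out_w, out_b):
    """Input -> hidden layer with ReLU -> two-class softmax output."""
    hidden = relu(dense(features, hidden_w, hidden_b))
    return softmax(dense(hidden, out_w, out_b))


def sparse_categorical_crossentropy(probabilities, label):
    """Cross-entropy for one example whose label is an integer class index."""
    return -math.log(probabilities[label])


if __name__ == "__main__":
    train, val, holdout = split_cases(range(7251))
    print(len(train), len(val), len(holdout))
```

In practice the weights would be fitted by an optimiser such as Adam over the stated 20 epochs; the sketch only shows how the named components fit together for a single example.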

Psychiatrists

The number of psychiatrists to be used in the comparison was determined with reference to a previous systematic review comparing AI and clinicians that reported median numbers of five clinicians and four experts.22 Six psychiatrists (A–F) participated in the study, five of whom were experts (two government-designated psychiatrists and three occupational health consultants). Two had more than 10 years of experience, three had more than 5 years of experience and one had less than 5 years of experience. To predict moderate and severe psychological distress, the psychiatrists used the same information that was used to validate the AI. For each individual participant, the psychiatrists made a dichotomous choice between moderate/severe psychological distress and healthy. The psychiatrists independently evaluated the data from the 100-case hold-out test set, from which the accuracy, sensitivity and specificity were calculated.

Statistical analysis

The accuracy, sensitivity and specificity for each of the 10 AI models and psychiatrists were calculated using the hold-out test set. The results from the AI and psychiatrists were compared using analysis of variance (ANOVA).
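As an illustration of how these measures are obtained from a set of binary predictions, the following sketch counts the four cells of a confusion matrix and derives accuracy, sensitivity, specificity and the positive and negative predictive values. The function name `screening_metrics` and the example data are hypothetical and are not taken from the study.

```python
def screening_metrics(y_true, y_pred):
    """Compute screening measures from binary labels (1 = distressed, 0 = healthy)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(numerator, denominator):
        # Return NaN rather than raising when a denominator is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    # Hypothetical 100-case test set: 10 distressed and 90 healthy cases.
    y_true = [1] * 10 + [0] * 90
    y_pred = [1] * 2 + [0] * 8 + [1] * 3 + [0] * 87
    print(screening_metrics(y_true, y_pred))
```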

Patient and public involvement statement

Neither patients nor the public were involved in this study.

Results

Of the 19 481 potential participants, responses were received from 7255 (response rate: 37.2%). Overall, 4 respondents with missing data for age were excluded, leaving 7251 respondents (4574 (63.1%) males, 2677 (36.9%) females; mean age 44.3 years) for inclusion in the analysis. The participants’ characteristics are shown in table 1. The largest occupational group (41.1%) was teaching and research, 47.0% of respondents exercised at least once a week and 10.8% were smokers. The median annual household income was 6–8 million JPY. The mean (SD) K6 Score was 5.6 (5.1), and the numbers of respondents classified as having moderate psychological distress (K6 Score ≥5) and severe psychological distress (K6 Score ≥13) were 3560 (49.1%) and 721 (9.9%), respectively. The mean K6 Score was significantly higher among women than among men (p<0.001).

Table 1

Characteristics of the participants in this study (N=7251)

Machine learning was used to create 10 trained models, each validated on 100 cases (tables 2 and 3). For moderate psychological distress, the accuracy, sensitivity, specificity and positive and negative predictive values were 65.2%, 64.6%, 65.9%, 65.9% and 65.1%, respectively. For severe psychological distress, the accuracy, sensitivity, specificity and positive and negative predictive values were 89.9%, 17.5%, 96.2%, 29.7% and 93.1%, respectively. The results for the six psychiatrists are shown in tables 2 and 3. For moderate psychological distress, the accuracy, sensitivity and specificity were 64.4%, 51.4% and 78.0%, respectively. Among the government-designated psychiatrists, the mean accuracy, sensitivity and specificity were 62.0%, 46.4% and 78.2%, respectively. For severe psychological distress, the accuracy, sensitivity and specificity were 85.8%, 25.0% and 91.1%, respectively. Among the government-designated psychiatrists, the mean accuracy, sensitivity and specificity were 85.7%, 20.8% and 91.3%, respectively.

Table 2

Comparison of predictive performance between our artificial intelligence (AI) model and psychiatrists for moderate psychological distress

Table 3

Comparison of predictive performance between our artificial intelligence (AI) model and psychiatrists for severe psychological distress

The results of the ANOVA conducted to compare the accuracy of the trained AI model with that of the psychiatrists revealed no significant difference for moderate psychological distress (p=0.263). On the other hand, the trained AI model showed significantly higher accuracy than did the psychiatrists for severe psychological distress (p=0.001) (figures 1 and 2).
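For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be sketched as follows. The function name `one_way_anova_f` and the accuracy values in the usage example are hypothetical; they are not the study data. With 10 AI models and 6 psychiatrists, the degrees of freedom would be 1 and 14, and with only two groups the test is equivalent to an unpaired t-test with pooled variance. Obtaining the p value additionally requires the F distribution, which is omitted here.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical accuracies (%) for 10 models and 6 raters.
    ai = [90, 89, 91, 90, 88, 90, 91, 89, 90, 91]
    raters = [86, 84, 88, 85, 87, 85]
    print(one_way_anova_f(ai, raters))
```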

Figure 1

Comparison between our artificial intelligence (AI) model and psychiatrists of the percentage of correct answers for moderate psychological distress (K6≥5). The results of analysis of variance revealed no significant difference between the two groups (p=0.263). Whiskers extend to the maximum value and minimum value. Data points exceeding a distance of 1.5 times the IQR below the first quartile or above the third quartile are shown as outliers. K6, Kessler Screening Scale for Psychological Distress.

Figure 2

Comparison between our artificial intelligence (AI) model and psychiatrists of the percentage of correct answers for severe psychological distress (K6≥13). The results of analysis of variance revealed that the AI model achieved a significantly higher proportion of correct answers (p=0.001). Whiskers extend to the maximum value and minimum value. K6, Kessler Screening Scale for Psychological Distress.

Discussion

In this study, we successfully created a machine learning model to screen workers with depressed mood. The accuracy was 65.2% for moderate psychological distress and 89.9% for severe psychological distress; for severe distress, this was significantly higher than that of the psychiatrists, whereas for moderate distress there was no significant difference. The explanatory variables used for the predictions did not directly ask about mood. Therefore, the developed model could be easily applied to workers, regardless of their subjective views.

This model appears to be appropriate as a screening tool for psychological distress because it consists of questions that do not directly ask about mood and emotion. Many depression/psychological distress questionnaires ask about subjective mood, but due to stigma, many depressed patients remain undiagnosed or untreated.23 Disclosing mental health status in the workplace has become an extraordinarily complex issue.24 Although predicting psychological distress using only assessment items that do not directly ask about mood and emotion can be difficult, it can help prevent the manipulation of questionnaire responses due to stigma towards mental healthcare and inappropriate responses to questions asking about subjective mood. The model developed in this study appears useful as a screening tool for psychological distress because it avoids this issue.

However, due to its low sensitivity and specificity, our model still needs to be improved for practical use. The sensitivity and specificity of the K6 for the prediction of severe psychological distress have been reported to be 0.36 and 0.96, respectively,20 and those of the Depression Anxiety Stress Scales, which detect depression in workers, have been reported to be 0.91 and 0.46, respectively.25 The sensitivity and specificity of the model in this study were 0.18 and 0.96, respectively, and although the specificity was high, due to its low sensitivity, the accuracy of the model still needs to be improved before it can be applied as a screening tool. To improve the accuracy of our model, the amount of training data needs to be increased, ensemble learning has to be performed in conjunction with other classifiers (such as decision trees and support vector machines) and facial expression and speech analysis need to be combined.

Because of its high specificity for the prediction of severe psychological distress and because it performed better than did the psychiatrists, our trained AI model shows potential for clinical applications. At this stage, it is difficult to use AI models as screening tools, but their high specificity suggests that they can aid clinicians in prediction. It is difficult for occupational physicians to predict psychological distress because they are not specialised in psychiatry.26 The use of tools such as guidelines by general practitioners has been shown to improve the diagnosis and management of workers' mental health.27 It could also be useful for occupational physicians to interview and check for psychological distress in staff who receive a positive prediction from the AI model, which could help improve mental health in the workplace.

As a future prospect, we believe that the AI model developed in this study could be used to monitor the risk of psychological distress among workers in the field of occupational health. If smart devices could be used to obtain information on heart rate, activity levels, sleep patterns and working hours automatically, the AI model would be able to provide a prediction without requiring workers to enter data manually. Our AI model could also be effective when combined with the use of a smartphone, as an association has been reported between depression and the frequency of smartphone use.28 In addition, the use of smartphone apps has been shown to help prevent depression in workers.29 In the future, we would like to obtain time-series data and create a more accurate model using a recurrent neural network.

A weak point of this study is that it is difficult to diagnose depression accurately based on assessment items that do not ask directly about mood and emotion. The diagnostic criteria for depression according to the eleventh revision of the International Classification of Diseases and the DSM-5 are based on subjective items, such as depressed mood and loss of pleasure and interest. Based on the results of this study, our AI model could be expected to be used as an adjunct to diagnosis rather than for a definitive diagnosis and in the future, as a screening tool for large populations.

This study had some limitations. First, we used an online questionnaire to predict psychological distress as opposed to interviews with psychiatrists. Interviews with psychiatrists are necessary to make a more accurate diagnosis, but the cost of conducting such interviews would be extremely high because of the large amount of data. Second, as the K6 was used to predict psychological distress, we could not exclude other illnesses that present with depression, such as bipolar disorder and schizophrenia. In addition, it is difficult to generalise our results to wider populations because a large proportion of the participants in this study were researchers, who tend to burn out and feel more qualitative and quantitative pressure compared with workers in other types of occupations.30 31 Finally, the AI black-box problem remains.32 The model used in this study cannot indicate which parameters/features are the most important for predicting the outcome of interest.

Ethics statements

Ethics approval

Data were obtained anonymously and the participants were not disadvantaged in any way if they refused to participate in the study. This study was approved by the Ethics Committee of Faculty of Medicine, University of Tsukuba (No. 1374).

Acknowledgments

The authors would like to thank all participants for their generous cooperation and Aranha Claus de Castro for his help. This study was conducted by the Tsukuba Science City Network Occupational Health Technical Committee.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Twitter @NP

  • Contributors All authors contributed to the survey design and approved the final version. SD designed the model, analysed the data and wrote the first draft of the paper. SS, DH, YO, TT, NS, YI, TI, YA, KM and IM commented on all versions of the manuscript.

  • Funding This work was supported by JSPS KAKENHI (Grant No. 19K19431).

  • Competing interests SS is the Chair of the Occupational Health Technical Committee, Tsukuba Science City Network; SD, YO and DH are members of the Occupational Health Technical Committee and the other authors are members of the Occupational Health Technical Committee Working Group. None of the authors received remuneration.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.