Observational Study
Copyright: ©Author(s) 2026. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial (CC BY-NC 4.0) license. No commercial re-use. Published by Baishideng Publishing Group Inc.
World J Psychiatry. Apr 19, 2026; 16(4): 116428
Published online Apr 19, 2026. doi: 10.5498/wjp.v16.i4.116428
Bridging the gap between subjective and objective measures: A multimodal protocol for adolescent depression detection
Yan Zeng, Jian Yang, Li Kuang
Yan Zeng, Department of Psychology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing 400010, China
Jian Yang, Department of Gastroenterology, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China
Jian Yang, Department of Gastroenterology, Changdu People’s Hospital of Xizang, Changdu 854000, Xizang Autonomous Region, China
Li Kuang, Department of Psychology, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China
Author contributions: Zeng Y wrote the original manuscript; Zeng Y and Yang J performed the literature search and analyzed the data; Yang J and Kuang L edited the final manuscript; Kuang L conceptualized and designed the research; All authors read and approved the final manuscript.
Supported by Chongqing Science and Health Joint Medical Research Project, No. 2021MSXM034; Natural Science Foundation of Xizang Autonomous Region, No. XZ2024ZR-ZY100(Z); Program for Youth Innovation in Future Medicine, Chongqing Medical University, China, No. W0138; and the Education and Teaching Reform Project of the First Clinical College of Chongqing Medical University, No. CMER202305.
Institutional review board statement: This study was reviewed and approved by the Ethics Committee of the Second Affiliated Hospital of Chongqing Medical University (approval No. 2023-6).
Informed consent statement: All study participants, or their legal guardian, provided informed written consent prior to study enrollment.
Conflict-of-interest statement: All authors report no relevant conflicts of interest for this article.
STROBE statement: The authors have read the STROBE Statement-checklist of items, and the manuscript was prepared and revised according to the STROBE Statement-checklist of items.
Data sharing statement: All supporting data underlying this study are available upon reasonable request to the corresponding author.
Corresponding author: Li Kuang, Chief Physician, Professor, Department of Psychology, The First Affiliated Hospital of Chongqing Medical University, No. 1 Youyi Road, Yuzhong District, Chongqing 400016, China. kuangli0388@126.com
Received: November 12, 2025
Revised: December 7, 2025
Accepted: January 14, 2026
Published online: April 19, 2026
Processing time: 139 Days and 4.6 Hours
Abstract
BACKGROUND

Adolescent depression is a pressing global public health challenge. Current screening largely depends on self-reported questionnaires, which are vulnerable to response biases and underreporting. Integrating objective behavioral signals with validated scales may bridge this subjective-objective gap and improve detection performance.

AIM

To develop a novel multimodal protocol integrating video-recorded facial expressions, vocal prosody, and the Chinese Secondary School Students Depression Scale (CSSSDS) to improve the accuracy and robustness of depression detection for adolescents in Mainland China.

METHODS

A total of 771 adolescents (aged 12-18 years, mean 15.23 ± 1.68) were recruited. Video-recorded facial expressions, read-aloud speech recordings, and CSSSDS responses were collected from all participants. Five machine learning algorithms [extreme gradient boosting (XGBoost), logistic regression, random forest, support vector machine, and artificial neural network] were trained under two conditions: (1) A multimodal protocol that combined facial expressions, vocal prosody, and the CSSSDS; and (2) A bimodal protocol that combined only facial expressions and vocal prosody. Performance was evaluated with repeated 10-fold cross-validation using accuracy, precision, recall, F1 score, area under the receiver operating characteristic curve (AUC-ROC), and area under the precision-recall curve (AUC-PR).
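
The evaluation scheme described above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. The function names, the number of repeats, the random seed, the feature-group labels, and the toy class labels are all assumptions made for illustration, because the abstract reports none of them.

```python
import random

# Hypothetical feature groups for the two conditions named in the abstract.
PROTOCOLS = {
    "multimodal": ["facial_expression", "vocal_prosody", "csssds"],
    "bimodal": ["facial_expression", "vocal_prosody"],
}

def repeated_stratified_kfold(labels, k=10, repeats=3, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) with class balance kept per fold.

    The number of repeats is an assumption; the abstract says only "repeated".
    """
    rng = random.Random(seed)
    for r in range(repeats):
        folds = [[] for _ in range(k)]
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            # Deal the indices of each class round-robin across the folds.
            for j, i in enumerate(idx):
                folds[j % k].append(i)
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield r, f, train, test

def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Toy labels (hypothetical class balance, not the study data).
toy_labels = [1] * 30 + [0] * 70
splits = list(repeated_stratified_kfold(toy_labels, k=10, repeats=3))
print(len(splits), "train/test splits generated")
```

In a full pipeline, each of the five algorithms would be fitted on the training indices of every split, once per entry in `PROTOCOLS`, and the per-split metrics would then be summarized and compared.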

RESULTS

Statistical analysis confirmed XGBoost as the preferred algorithm in both multimodal and bimodal protocols, showing statistically significant superiority (P < 0.05) across several key metrics (multimodal recall and F1 score; bimodal AUC-ROC, AUC-PR, and F1 score). In stark contrast, the artificial neural network exhibited high volatility and low precision despite achieving perfect recall in both protocols (all P < 0.001). Statistical comparisons further confirmed the superiority of the multimodal XGBoost over its bimodal counterpart, demonstrating higher AUC-ROC (t = 4.52, P < 0.001) and AUC-PR (t = 3.87, P < 0.001), both with large effect sizes (Cohen’s d > 1.0). The multimodal model also demonstrated significantly greater stability in core discriminative metrics (AUC-ROC, AUC-PR, and recall; all P < 0.05).
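
As a rough illustration of how a t statistic and an effect size can be derived from matched cross-validation scores, the sketch below computes a paired t statistic and the paired-samples variant of Cohen's d (d_z, the mean difference divided by the standard deviation of the differences). This is an assumption. The abstract does not state whether the test was paired or which Cohen's d formula was used, and the scores below are made-up placeholders rather than study results.

```python
import math
import statistics

def paired_t_and_cohens_dz(scores_a, scores_b):
    """Return (t, d_z) for two equally long lists of matched scores.

    t   = mean(diff) / (sd(diff) / sqrt(n))
    d_z = mean(diff) / sd(diff)   # paired-samples Cohen's d (an assumption)
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two matched lists with at least two scores")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    t = mean_diff / (sd_diff / math.sqrt(n))
    d_z = mean_diff / sd_diff
    return t, d_z

# Placeholder per-fold AUC-ROC values (hypothetical, not from the study).
multimodal_auc = [0.91, 0.89, 0.93, 0.90, 0.92]
bimodal_auc = [0.85, 0.86, 0.84, 0.88, 0.83]
t_stat, d_z = paired_t_and_cohens_dz(multimodal_auc, bimodal_auc)
print(f"t = {t_stat:.2f}, d_z = {d_z:.2f}")
```

Scores from repeated cross-validation share training data across folds and are therefore not independent, so an uncorrected t-test of this kind tends to overstate significance. A variance-corrected resampled t-test is a common alternative.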

CONCLUSION

The XGBoost-driven multimodal model demonstrated superior discriminative power, greater stability, and a balanced precision-recall profile compared with bimodal models and other algorithms. Nevertheless, limitations related to sample size, use of a region-specific scale, and task-driven data collection mean that further validation in larger, more diverse, and ecologically valid settings is warranted.

Keywords: Adolescent depression; Multimodal detection; Facial expression; Vocal prosody; Machine learning; Extreme gradient boosting

Core Tip: Current screening for adolescent depression relies heavily on subjective questionnaires. Therefore, we developed a multimodal protocol combining the Chinese Secondary School Students Depression Scale with objective facial and vocal data to improve detection. Our analysis showed that extreme gradient boosting outperformed other machine learning models under multimodal and bimodal settings, achieving superior performance across multiple metrics. Statistical comparisons confirmed that the multimodal extreme gradient boosting model significantly surpassed its bimodal counterpart, demonstrating the advantage of integrating subjective scales with objective data to enhance the accuracy and robustness of depression screening in Chinese adolescents. Further validation in diverse populations is warranted.