Case Control Study
Copyright: ©Author(s) 2026. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial (CC BY-NC 4.0) license. No commercial re-use. See permissions. Published by Baishideng Publishing Group Inc.
World J Psychiatry. Jun 19, 2026; 16(6): 119773
Published online Jun 19, 2026. doi: 10.5498/wjp.v16.i6.119773
Accuracy and reproducibility of ChatGPT responses to parent and patient inquiries on attention-deficit/hyperactivity disorder
Serkan Turan, Berrin Bilgiç
Berrin Bilgiç, Department of Child and Adolescent Psychiatry, Adnan Menderes University Faculty of Medicine, Aydın 09100, Türkiye
Serkan Turan, Department of Child and Adolescent Psychiatry, Uludag University Faculty of Medicine, Bursa 16059, Türkiye
Author contributions: Bilgiç B conceptualized and designed the study and drafted the manuscript; Turan S supervised the research, performed the statistical analysis, and critically revised the manuscript for important intellectual content; Bilgiç B and Turan S collected the data; all authors reviewed and approved the final manuscript.
AI contribution statement: AI-based tools (e.g., ChatGPT and language editing tools) were used in a limited manner to support language refinement and clarity.
Institutional review board statement: This study did not require institutional review board approval, as no patient data, clinical records, or personally identifiable information were used. The study was based exclusively on publicly available questions and artificial intelligence-generated responses. The authors declare no affiliation with OpenAI, the developer of ChatGPT.
Informed consent statement: Informed consent was not required because this study did not involve human participants or patient data.
Conflict-of-interest statement: The authors declare that there are no conflicts of interest regarding the publication of this article.
STROBE statement: The authors have read the STROBE Statement-checklist of items, and the manuscript was prepared and revised according to the STROBE Statement-checklist of items.
Data sharing statement: The datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request.
Corresponding author: Serkan Turan, MD, PhD, Associate Professor, Department of Child and Adolescent Psychiatry, Uludag University Faculty of Medicine, Gorukle Campus, Bursa 16059, Türkiye. serkanturan@uludag.edu.tr
Received: February 10, 2026
Revised: February 23, 2026
Accepted: March 12, 2026
Published online: June 19, 2026
Processing time: 108 Days and 1 Hours
Abstract
BACKGROUND

Children and adolescents with attention-deficit/hyperactivity disorder (ADHD) and their caregivers increasingly turn to artificial intelligence-based chatbots for information about symptoms, functional difficulties, and treatment-related concerns. Large language models (LLMs), such as ChatGPT, are of particular interest due to their ability to generate fluent, natural-language responses. However, empirical evidence regarding their clinical performance in child and adolescent psychiatry remains limited, especially with respect to the reproducibility and clinical reliability of ADHD-related responses.

AIM

To systematically evaluate the accuracy and reproducibility of ChatGPT (GPT-4o)-generated responses to commonly asked ADHD-related questions from parents and patients.

METHODS

In this cross-sectional study, 88 frequently asked ADHD-related questions were identified through internet search engines, parent-oriented forums, and professional organization websites. Questions were categorized into four domains: basic knowledge (n = 30), diagnosis and assessment (n = 22), treatment and medication use (n = 21), and long-term outcomes and psychosocial impact (n = 15). Each question was submitted twice to the subscription-based version of ChatGPT (GPT-4o) in separate chat sessions. Two blinded child and adolescent psychiatrists independently evaluated each response for accuracy (comprehensive/correct, incomplete, mixed or potentially misleading, or inaccurate) and reproducibility. Inter-rater agreement and domain-specific differences were analyzed statistically.
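
For readers unfamiliar with the agreement statistic, the following is a minimal sketch of how Cohen's κ can be computed for two raters assigning categorical labels. It assumes the unweighted form of κ; the abstract does not state which variant or software the authors used. The function name `cohen_kappa`, the single-letter labels, and the six example ratings are hypothetical and are not the study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings for six responses (NOT the study data).
# C = comprehensive/correct, I = incomplete, M = mixed or potentially misleading
rater_a = ["C", "C", "I", "M", "C", "I"]
rater_b = ["C", "I", "I", "M", "C", "C"]
kappa = cohen_kappa(rater_a, rater_b)
print(round(kappa, 3))
```

In this toy example the raters agree on four of six items, but a substantial part of that agreement is expected by chance, so κ is well below the raw agreement rate.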

RESULTS

Overall, 59.1% (52/88) of responses were rated as comprehensive/correct, 27.3% (24/88) as incomplete, and 13.6% (12/88) as mixed or potentially misleading; no inaccurate or irrelevant responses were identified. Accuracy was highest for basic knowledge questions (66.7%) and lowest for treatment and medication-related questions (47.6%). Overall reproducibility was 87.5% (77/88), with no significant differences across domains (χ², P = 0.61). Inter-rater reliability was moderate (Cohen’s κ = 0.52).
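
The overall percentages follow directly from the reported counts. As a quick arithmetic check, the sketch below recomputes them from the numerators and the denominator given above; the variable and function names are illustrative only, and the per-domain figures are not recomputed because their raw counts are not reported here.

```python
# Counts as reported in the abstract (denominator: 88 questions).
TOTAL = 88
accuracy_counts = {
    "comprehensive/correct": 52,
    "incomplete": 24,
    "mixed or potentially misleading": 12,
    "inaccurate": 0,
}
reproducible = 77


def pct(count, total):
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100 * count / total, 1)


# The four accuracy categories should account for every question.
assert sum(accuracy_counts.values()) == TOTAL

accuracy_pct = {label: pct(n, TOTAL) for label, n in accuracy_counts.items()}
reproducibility_pct = pct(reproducible, TOTAL)

print(accuracy_pct)
print(reproducibility_pct)
```

The recomputed values match the reported 59.1%, 27.3%, 13.6%, and 87.5%.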

CONCLUSION

ChatGPT (GPT-4o) demonstrated relatively high accuracy and reproducibility overall, with stronger performance in basic informational and diagnostic domains but greater variability in clinically sensitive areas such as treatment, medication use, and long-term outcomes. These findings highlight both the potential utility and important limitations of LLM-based chatbots in ADHD-related information-seeking, underscoring the need for cautious interpretation, particularly in treatment-related contexts where responses may require professional clinical guidance.

Keywords: Attention-deficit/hyperactivity disorder; Large language models; ChatGPT; Accuracy; Reproducibility

Core Tip: The clinical reliability of large language models (LLMs) in addressing attention-deficit/hyperactivity disorder (ADHD)-related questions from patients and caregivers has not been sufficiently characterized. This study systematically evaluates the accuracy and reproducibility of ChatGPT (GPT-4o) across clinically relevant domains in child and adolescent psychiatry. The findings indicate stronger and more consistent performance in basic informational and diagnostic domains, whereas greater variability was observed in clinically sensitive areas such as treatment, medication use, and long-term outcomes. These results highlight both the potential utility and the limitations of LLM-based tools in ADHD-related information-seeking, emphasizing the need for cautious, developmentally informed interpretation in higher-risk clinical contexts.
