Published online Oct 15, 2025. doi: 10.4251/wjgo.v17.i10.109792
Revised: June 17, 2025
Accepted: August 27, 2025
Processing time: 145 Days and 19.4 Hours
With the rising use of endoscopic submucosal dissection (ESD) and endoscopic mucosal resection (EMR), patients increasingly have questions about various aspects of these endoscopic procedures. At the same time, conversational artificial intelligence tools such as ChatGPT are emerging as readily accessible sources of medical information.
To evaluate the reliability and usefulness of ChatGPT in answering questions about ESD and EMR for patients and healthcare professionals.
In this study, 30 specific questions related to ESD and EMR were identified. Each question was then entered into ChatGPT repeatedly, generating two independent answers per question. A Likert scale was used to rate the accuracy, completeness, and comprehensibility of the responses. In addition, a binary category (high/low) was used to evaluate each aspect of the two responses generated by ChatGPT and of the response retrieved from Google.
Based on the average scores of the three raters, the responses generated by ChatGPT received high ratings for accuracy (mean score of 5.14 out of 6), completeness (mean score of 2.34 out of 3), and comprehensibility (mean score of 2.96 out of 3). Kendall’s coefficients of concordance indicated good agreement among raters (all P < 0.05). More than half of the responses generated by Google were classified by experts as having low accuracy and low completeness.
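The inter-rater agreement statistic used here, Kendall’s coefficient of concordance (W), can be sketched as follows. The rating matrix shown is hypothetical (not the study’s data), and the correction term for tied ranks is omitted for brevity; W ranges from 0 (no agreement) to 1 (perfect agreement).

```python
import numpy as np
from scipy.stats import rankdata


def kendalls_w(ratings):
    """Kendall's coefficient of concordance for an (items x raters) matrix.

    Each column holds one rater's scores; scores are converted to ranks
    within each rater before computing W. Tie correction is omitted.
    """
    ratings = np.asarray(ratings, dtype=float)
    n, m = ratings.shape  # n questions, m raters
    # Rank each rater's scores across the n items (ties get average ranks)
    ranks = np.apply_along_axis(rankdata, 0, ratings)
    rank_sums = ranks.sum(axis=1)
    # Sum of squared deviations of rank sums from their mean
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12.0 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical example: three raters score five questions on a Likert scale.
# Raters rank the questions identically here, so W should equal 1.
perfect = np.array([
    [2, 2, 2],
    [3, 3, 3],
    [4, 4, 4],
    [5, 5, 5],
    [6, 6, 6],
])
print(kendalls_w(perfect))  # → 1.0
```

A permutation or chi-square approximation on W is what typically yields the P values reported for such agreement tests.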
ChatGPT provided accurate and reliable answers to questions about ESD and EMR. Future studies should address ChatGPT’s current limitations by incorporating more detailed and up-to-date medical information, which could establish AI chatbots as a significant resource for both patients and healthcare professionals.
Core Tip: This study evaluated the reliability and usefulness of chat generative pretrained transformer in addressing questions related to endoscopic submucosal dissection and endoscopic mucosal resection. A set of 30 targeted questions was repeatedly entered, and responses were independently rated for accuracy, completeness, and comprehensibility. Compared with Google, chat generative pretrained transformer produced more accurate, detailed, and easier-to-understand answers, with consistent agreement among evaluators. The findings indicate that chat generative pretrained transformer may serve as a valuable and accessible source of medical information for both patients and healthcare professionals.
