Published online Dec 15, 2025. doi: 10.4251/wjgo.v17.i12.114341
Revised: September 26, 2025
Accepted: October 27, 2025
Processing time: 86 Days and 0.2 Hours
As large language models increasingly permeate medical workflows, a recent study evaluating ChatGPT 4.0’s performance in addressing patient queries about endoscopic submucosal dissection and endoscopic mucosal resection offers critical insights into three domains: Performance parity, cost democratization, and clinical readiness. The findings highlight ChatGPT’s high accuracy, completeness, and comprehensibility, suggesting potential cost efficiency in patient education. Yet cost-effectiveness alone does not ensure clinical utility. Notably, the study relied exclusively on text-based prompts, omitting multimodal data such as photographs or scans, an important limitation in a visually driven domain such as endoscopy.
Core Tip: As large language models increasingly permeate medical workflows, this study offers insight into three domains: Performance parity, cost democratization, and clinical readiness. One compelling finding was cost efficiency. Yet cost-effectiveness alone does not ensure clinical utility. Notably, the study relied exclusively on text-based prompts, omitting multimodal data such as photographs or scans. This is an important limitation in a domain like endoscopy, which is often visually driven. Large language model performance can drop precipitously when deprived of image context. Without multimodal integration, artificial intelligence tools may fail to capture key diagnostic signals.
