Copyright
©The Author(s) 2025.
World J Gastrointest Oncol. Dec 15, 2025; 17(12): 114341
Published online Dec 15, 2025. doi: 10.4251/wjgo.v17.i12.114341
Table 1 Strengths and weaknesses of text-based vs multimodal large language models
| | Text-based LLMs | Multimodal LLMs |
| --- | --- | --- |
| Strengths | Provide real-time textual guidance, differential diagnoses, and automated report generation from clinical notes and patient history | Integrate images, videos, and text for comprehensive analysis, improving lesion detection, classification, and spatial localization in procedures like gastroscopy and colonoscopy |
| | Support patient education and reduce health education load on physicians | Enhance diagnostic accuracy and real-time decision support through multi-scale feature fusion and domain-adaptive learning |
| | Effective for processing textual data like electronic health records and guidelines, aiding in treatment suggestions | Support fine-grained visual understanding and task-specific improvements via fine-tuning, outperforming text-only models in visual tasks |
| Weaknesses | Cannot interpret endoscopic images or videos, missing critical visual diagnostic cues such as mucosal abnormalities | Performance gaps compared to human experts, with lower sensitivity to increased task complexity |
| | Limited real-time responsiveness and adaptability to new techniques due to reliance on pre-existing textual data | High computational demands, data fusion challenges, and scalability issues for real-time processing of high-resolution endoscopic data |
| | Struggle with complex scenarios requiring visual context, leading to potential incomplete assessments | Limited generalization across institutions and need for large, diverse datasets, plus interpretability concerns |
- Citation: Au SCL. Cost vs clinical utility on application of large language models in clinical practice: A double-edged sword. World J Gastrointest Oncol 2025; 17(12): 114341
- URL: https://www.wjgnet.com/1948-5204/full/v17/i12/114341.htm
- DOI: https://dx.doi.org/10.4251/wjgo.v17.i12.114341
