BPG is committed to discovery and dissemination of knowledge
Review Open Access
Copyright ©The Author(s) 2025. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Gastroenterol. Dec 21, 2025; 31(47): 112921
Published online Dec 21, 2025. doi: 10.3748/wjg.v31.i47.112921
Foundation models: Insights and implications for gastrointestinal cancer
Lei Shi, Rui Huang, An-Jie Guo, School of Life Sciences, Chongqing University, Chongqing 400044, China
Li-Ling Zhao, Department of Stomatology, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400042, China
ORCID number: Lei Shi (0000-0002-7995-5664).
Author contributions: Shi L and Huang R designed the study and collected the data; Shi L, Huang R, and Zhao LL analyzed and interpreted the data; Shi L and Huang R wrote the manuscript; Zhao LL and Guo AJ revised the manuscript; all authors approved the final version of the manuscript.
Supported by the Open Project Program of Panxi Crops Research and Utilization Key Laboratory of Sichuan Province, No. SZKF202302; and the Fundamental Research Funds for the Central Universities No. 2019CDYGYB024.
Conflict-of-interest statement: The authors deny any conflict of interest.
Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Lei Shi, Associate Professor, School of Life Sciences, Chongqing University, No. 55 University City South Road, Shapingba District, Chongqing 400044, China. shil@cqu.edu.cn
Received: August 11, 2025
Revised: September 10, 2025
Accepted: November 3, 2025
Published online: December 21, 2025
Processing time: 132 Days and 8.5 Hours

Abstract

Gastrointestinal (GI) cancers represent a major global health concern due to their high incidence and mortality rates. Foundation models (FMs), also referred to as large models, represent a novel class of artificial intelligence technologies that have demonstrated considerable potential in addressing these challenges. These models encompass large language models (LLMs), vision FMs (VFMs), and multimodal LLMs (MLLMs), all of which utilize transformer architectures and self-supervised pre-training on extensive unlabeled datasets to achieve robust cross-domain generalization. This review delineates the principal applications of these models: LLMs facilitate the structuring of clinical narratives, extraction of insights from medical records, and enhancement of physician-patient communication; VFMs are employed in the analysis of endoscopic, radiological, and pathological images for lesion detection and staging; MLLMs integrate heterogeneous data modalities, including imaging, textual information, and genomic data, to support diagnostic processes, treatment prediction, and prognostic evaluation. Despite these promising developments, several challenges remain, such as the need for data standardization, limited diversity within training datasets, substantial computational resource requirements, and ethical-legal concerns. In conclusion, FMs exhibit significant potential to advance research and clinical management of GI cancers. Future research efforts should prioritize the refinement of these models, promote international collaborations, and adopt interdisciplinary approaches. Such a comprehensive strategy is essential to fully harness the capabilities of FMs, driving substantial progress in the fight against GI malignancies.

Key Words: Foundation models; Gastrointestinal cancers; Large language models; Vision foundation models; Multimodal large language models

Core Tip: This review synthesizes applications of foundation models in gastrointestinal cancer, from clinical text structuring and image analysis to multimodal data integration. Despite current knowledge gaps and challenges like data standardization, it highlights foundation models’ transformative potential, urging refined models and collaborations to advance gastrointestinal cancer research.



INTRODUCTION

Gastrointestinal (GI) cancers represent some of the most prevalent and lethal malignancies worldwide, imposing a substantial burden on public health[1]. Their multifactorial etiology and heterogeneous clinical manifestations make them difficult to study and treat using current methods[2]. Nevertheless, the advent of next-generation artificial intelligence (AI) models, known as foundation models (FMs), offers novel avenues for addressing these challenges[3]. These models, trained on vast amounts of data, can handle complex tasks, thereby presenting promising strategies to mitigate this worldwide health concern[4].

Unlike early AI methods that targeted isolated tasks or limited data modalities, FMs can integrate diverse medical data types, including endoscopic images, pathology slides, electronic health records (EHRs), genomic data, and clinical narratives[5]. This integrative capability is particularly pertinent to GI cancers, which often progress through a defined pattern (e.g., Correa’s cascade from gastritis to cancer)[6]. Accurate risk assessment, early diagnosis, and therapeutic decision-making require comprehensive data interpretation. However, current knowledge regarding the application of FMs in GI cancer remains limited, underscoring the imperative to systematically review current implementations and delineate prospective research trajectories to advance FM utilization in this domain.

Traditional computational biology techniques, such as support vector machines (SVMs) and random forests, alongside more recent deep-learning approaches like convolutional neural networks (CNNs), have made incremental advances in GI cancer research[7]. Nevertheless, these methods face major limitations, including dependence on labor-intensive, high-quality annotations; heterogeneity of datasets across institutions; and a predominant focus on unimodal data (e.g., imaging or genomics in isolation). These constraints highlight the necessity for cross-modal, large-scale pre-trained models[8].

Recent breakthroughs in general-purpose FMs, exemplified by ChatGPT, Stable Diffusion, and related architectures, have introduced a paradigm shift in GI cancer research[5,9]. Their innovation resides in exceptional generalizability and cross-domain adaptability, facilitated by transformer-based architectures comprising billions of parameters pre-trained on vast, diverse datasets[10]. This pre-training engenders universal representations transferable to a broad spectrum of downstream tasks, maintaining robust performance even with limited or unlabeled data. Compared to traditional methods, FMs offer distinct advantages: Billion-scale parameterization combined with self-supervised learning (SSL) enables deep feature extraction and fusion of heterogeneous data, while zero- or few-shot transfer learning substantially diminishes reliance on annotated datasets[11]. This review retrospectively synthesizes key FMs applied in GI cancer research, focusing on three principal categories: Large language models (LLMs) for clinical decision support leveraging EHRs; vision models [e.g., Vision Transformer (ViT) architectures] for endoscopic image analysis; and multimodal fusion models integrating imaging, omics, and pathology data. It is noteworthy that this research field is rapidly evolving, with some models already operational and others exploratory yet exhibiting considerable translational potential.

OVERVIEW OF FMS

This section provides a concise historical overview of AI development to contextualize the emergence of FMs for researchers less familiar with the field. The conceptual foundation of AI traces back to Alan Turing's 1950 proposal of the "Turing Test", envisioning computational simulation of human intelligence[12]. The 1956 Dartmouth Conference marked a seminal milestone, formally introducing the term "artificial intelligence" and transitioning the field from theoretical inquiry to systematic investigation[13]. AI evolution encompasses three major phases. The nascent period (1950s-1970s) was dominated by symbolic logic and expert systems; for example, the Perceptron model developed by Frank Rosenblatt in 1957 attempted to realize classification learning through neural networks but stalled owing to hardware limitations[14]. The revival period (1980s-2000s) was characterized by statistical learning and big data, exemplified by IBM's Deep Blue (which defeated the world chess champion in 1997) and Watson (which won the Jeopardy! championship in 2011), verifying AI's potential in specific tasks[15]. The contemporary era (2010s-present) is defined by deep learning and large-scale models. The introduction of the transformer architecture in 2017 revolutionized natural language processing (NLP)[16]. Following 2020, pre-trained large models, exemplified by the GPT series and BERT, demonstrated universal representation capabilities by leveraging massive datasets and extensive parameter counts (e.g., GPT-3 with 175 billion parameters), thereby facilitating a transition in AI from task-specific adaptation to knowledge-driven approaches[17,18]. In 2022, ChatGPT achieved human-aligned conversational abilities; by 2024, the multimodal model GPT-4o had made significant progress in cross-modal understanding[19]; and by 2025, reasoning models such as DeepSeek-R1 approximated human cognitive processes[20]. Consequently, AI has undergone a paradigm shift from rule-based systems to data-driven methodologies, culminating in the "pre-training plus fine-tuning" framework and ushering in a new era of general intelligence dominated by FMs. This evolutionary trajectory has produced the groundbreaking advancements of contemporary FMs, establishing a novel technical foundation for addressing complex scientific challenges, such as protein folding, as well as clinical applications, including medical diagnosis.

The concept of FMs was initially introduced by the Center for Research on Foundation Models (CRFM) at Stanford University in 2021[11]. CRFM characterizes FMs as models trained on extensive and diverse datasets, typically via large-scale SSL, that can be adapted to a variety of downstream tasks through fine-tuning. These models transcend the traditional reliance of machine learning on task-specific annotated data by capturing the fundamental structures and patterns inherent in data, such as linguistic grammar, visual textures, or cross-modal relationships, thereby substantially enhancing their generalization capabilities[10]. Prominent examples include NLP models like GPT and BERT, as well as multimodal models such as CLIP[17,18,21]. For instance, in the biomedical domain, FMs can learn universal representations by integrating heterogeneous microscopic imaging modalities (e.g., bright-field and fluorescence microscopy) and subsequently be fine-tuned for specific pathological tasks using minimal annotated data, which reduces annotation costs while enabling efficient cross-contextual transfer[22].

A principal distinction between FMs and conventional AI models lies in their methodological approach. Traditional models, such as SVMs and CNNs, are typically designed for narrowly defined tasks and require substantial labeled datasets for each specific application[15]. Consequently, these models exhibit limited generalizability and are not readily adaptable to novel tasks; for example, a model trained to detect gastric cancer pathology cannot be directly repurposed for colorectal cancer (CRC) lymph node identification[23]. In contrast, FMs employ a two-stage process involving self-supervised pre-training followed by downstream fine-tuning[11]. During pre-training, FMs learn from vast quantities of unlabeled data, such as medical images and textual corpora, through tasks like masked reconstruction. Subsequently, fine-tuning enables adaptation to new tasks with relatively small labeled datasets. This paradigm allows a single pre-trained model to be deployed across multiple scenarios. Architectures such as GPT utilize the Transformer framework and autoregressive language modeling, training on extensive internet text corpora to internalize language patterns without manual annotation[16]. This SSL strategy endows FMs with adaptability across diverse tasks, including medical question answering and clinical case summarization, requiring only modest fine-tuning. The capacity for one-time training followed by multi-task reuse underpins FMs’ ability to generalize across domains and modalities, encompassing text, images, and speech, thereby advancing from task-specific models toward more generalized intelligence[5].

The foundational principles of FMs rest upon the integration of architectural design, algorithmic strategies, and technical paradigms, collectively facilitating their versatility and scalability[11]. Architecturally, FMs predominantly adopt the Transformer framework[16], wherein the self-attention mechanism dynamically assigns weights to different elements within a sequence, enabling context-sensitive processing. For example, the term “gastric” may activate distinct medical concepts depending on its contextual usage, such as in “gastric cancer” vs “gastric bezoar”. Algorithmically, FMs follow a pre-training and fine-tuning paradigm. Pre-training constructs a universal knowledge base from large-scale unlabeled data via SSL techniques; for instance, Masked Language Modeling tasks involve predicting obscured text segments (e.g., “Colorectal [MASK] screening guidelines”) to learn associations among medical concepts. Contrastive learning methods align multimodal features, such as correlating endoscopic images with corresponding pathological descriptions[11]. This data-driven approach diminishes dependence on annotated datasets and, when combined with extensive model parameters and massive training corpora, yields substantial performance gains. Fine-tuning adjusts model parameters on task-specific datasets, enabling rapid adaptation to downstream applications; for example, after fine-tuning on tumor classification, the model can accurately delineate cancerous regions in pathology images[11].
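To make the masked-language-modeling objective concrete, the following is a minimal sketch, assuming the open-source Hugging Face transformers library and a general-purpose BERT checkpoint (a biomedical checkpoint such as BioBERT could be substituted); the guideline-style sentence mirrors the example given above and is purely illustrative.

```python
# Minimal sketch of masked language modeling with a pre-trained encoder.
# Assumes the Hugging Face `transformers` library; the checkpoint and the
# example sentence are illustrative, not drawn from any study in this review.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The model ranks candidate tokens for the masked position from context,
# which is the self-supervised signal exploited during FM pre-training.
for prediction in fill_mask("Colorectal [MASK] screening guidelines."):
    print(f'{prediction["token_str"]:>12}  {prediction["score"]:.3f}')
```

In a domain-adapted FM, the same objective is applied to large clinical corpora before any task-specific fine-tuning.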

FMs can be classified into three categories based on input modalities: LLMs, Vision FMs (VFMs), and Multimodal LLMs (MLLMs)[10]. LLMs are sophisticated neural networks comprising billions of parameters, surpassing traditional language models in performance, with model size generally correlating with efficacy. For example, BioBERT, pre-trained on PubMed abstracts and clinical notes, has enhanced the accuracy of drug-drug interaction predictions[24]. GPT-3 employs in-context learning to generate text completions, overcoming prior limitations[17]. Applications of LLMs include structuring clinical narratives (e.g., extracting gastric cancer TNM staging from medical records), synthesizing evidence from literature (e.g., summarizing clinical trial outcomes for PD-1 inhibitors), and facilitating doctor-patient communication (e.g., generating patient-friendly colonoscopy reports). VFMs specialize in processing visual data such as images and videos, achieving significant advances in visual understanding and generation by integrating Transformer architectures with generative adversarial networks. Representative models include ViT and CLIP[21,25]. Diffusion models like Stable Diffusion have further propelled high-quality, controllable image synthesis. In GI oncology, VFMs have been applied to pathological image analysis (e.g., segmenting gastric mucosal dysplasia), endoscopic video interpretation (e.g., detecting microvascular patterns indicative of early gastric cancer), and multi-scale feature fusion [e.g., generating tumor invasiveness maps by integrating computed tomography (CT)/magnetic resonance imaging (MRI) scans with histopathological sections]. MLLMs unify vision, text, and audio data within a single framework, overcoming the limitations inherent in unimodal systems through cross-modal fusion. While traditional LLMs excel in textual data and VFMs in visual data, MLLMs effectively handle heterogeneous data types. Their applications encompass image-text alignment (e.g., correlating endoscopic images with pathological reports), temporal data fusion (e.g., linking imaging changes with genomic profiles), and clinical decision support (e.g., generating personalized treatment recommendations based on pathological reports)[10].
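To illustrate the image-text alignment idea behind models such as CLIP, the sketch below assumes the Hugging Face transformers implementation of the public CLIP checkpoint; the image file and candidate captions are placeholders, and in practice a medically adapted variant (e.g., BiomedCLIP) would be preferred.

```python
# Hedged sketch of CLIP-style zero-shot image-text alignment; the endoscopic
# frame and the candidate captions below are illustrative placeholders.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("endoscopy_frame.png")  # hypothetical image file
captions = [
    "an endoscopic image showing a colorectal polyp",
    "an endoscopic image of normal colonic mucosa",
]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds image-text similarity; softmax turns it into a
# zero-shot "probability" over the candidate captions.
probs = outputs.logits_per_image.softmax(dim=-1)[0]
for caption, p in zip(captions, probs.tolist()):
    print(f"{p:.3f}  {caption}")
```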

To provide a contextual understanding of how FMs tackle challenges associated with GI cancers, Figure 1 presents their application framework. It delineates five primary data inputs and details the processes of pre-training and subsequent fine-tuning applied to the various FMs, including LLMs, VFMs, and MLLMs. The framework also highlights the spectrum of downstream tasks facilitated by these models, ranging from information extraction to molecular subtyping.

Figure 1
Figure 1 Application of foundation models in gastrointestinal cancer. This figure illustrates the workflow and applications of foundation models (FMs) in addressing challenges within gastrointestinal cancer research and clinical practice. Starting with a variety of input data sources such as clinical text documents, endoscopic imaging, radiomics from computed tomography/magnetic resonance imaging scans, pathological slides, and multi-omics data, these inputs are categorized into textual or image data for pre-training FMs using self-supervised learning, transformer architecture, and self-attention mechanisms to develop models like large language models, vision FMs, and multi-modal learning models. After pre-training, these models undergo fine-tuning through methods such as low-rank adaptation, enabling them to perform a wide range of downstream tasks, thereby showcasing the versatility and potential of FMs in advancing gastrointestinal cancer diagnosis, treatment, and research.

To offer a focused overview of FMs relevant to GI cancer research, we first present a summary of FMs with validated applications in GI cancer across language, vision, and multimodal domains in Table 1. This summary emphasizes, most importantly, their distinct use cases within GI cancer research, categorized as NLP, endoscopy (Endo), radiology (Radio), and pathology (PA). A critical annotation in the "GI cancer applications" column, denoted as "Directly", signifies that the model was employed for GI cancer-related tasks (e.g., NLP, Endo, Radio, PA, or MLLM) without requiring further modification or fine-tuning, thereby underscoring its intrinsic adaptability to clinical demands.

Table 1 Summary of common general-purpose foundation models used in gastrointestinal cancer.
Name | Type | Creator | Year | Architecture | Parameters | Modality | OSS | GI cancer applications
BERT | LLM | Google | 2018 | Encoder-only transformer | 110M (base), 340M (large) | Text | Yes | NLP, Radio, MLLM
GPT-3 | LLM | OpenAI | 2020 | Decoder-only transformer | 175B | Text | No | NLP
ViT | Vision | Google | 2020 | Encoder-only transformer | 86M (base), 307M (large), 632M (huge) | Image | Yes | Endo, Radio, PA, MLLM
DINOv1 | Vision | Meta | 2021 | Encoder-only transformer | 22M, 86M | Image | Yes | Endo, PA
CLIP | MM | OpenAI | 2021 | Encoder-encoder | 120-580M | Text, Image | Yes | Endo, Radio, MLLM, directly1
GLM-130B | LLM | Tsinghua | 2022 | Encoder-decoder | 130B | Text | Yes | NLP
Stable Diffusion | MM | Stability AI | 2022 | Diffusion model | 1.45B | Text, Image | Yes | NLP, Endo, MLLM, directly
BLIP | MM | Salesforce | 2022 | Encoder-decoder | 120M (base), 340M (large) | Text, Image | Yes | Radio, MLLM, directly
YouChat | LLM | You.com | 2022 | Fine-tuned LLMs | Unknown | Text | No | NLP
Bard | MM | Google | 2023 | Based on PaLM 2 | 340B (estimated) | Text, Image, Audio, Code | No | NLP
Bing Chat | MM | Microsoft | 2023 | Fine-tuned GPT-4 | Unknown | Text, Image | No | NLP
Mixtral 8x7B | LLM | Mistral AI | 2023 | Decoder-only, Mixture-of-Experts (MoE) | 46.7B total (12.9B active per token) | Text | | NLP
LLaVA | MM | Microsoft | 2023 | Vision encoder, LLM | 7B, 13B | Text, Image | Yes | PA, MLLM
DINOv2 | Vision | Meta | 2023 | Encoder-only transformer | 86M to 1.1B | Image | Yes | Endo, Radio, PA, MLLM, directly
Claude 2 | LLM | Anthropic | 2023 | Decoder-only transformer | Unknown | Text | No | NLP
GPT-4 | MM | OpenAI | 2023 | Decoder-only transformer | 1.8T (estimated) | Text, Image | No | NLP, Endo, MLLM, directly
LLaMa 2 | LLM | Meta | 2023 | Decoder-only transformer | 7B, 13B, 34B, 70B | Text | Yes | NLP, Endo, MLLM, directly
SAM | Vision | Meta | 2023 | Encoder-decoder | 375M, 1.25G, 2.56G | Image | Yes | Endo, directly
GPT-4V | MM | OpenAI | 2023 | MM transformer | 1.8T | Text, Image | No | Endo, MLLM
Qwen | LLM | Alibaba | 2023 | Decoder-only transformer | 70B, 180B, 720B | Text | Yes | NLP, MLLM
GPT-4o | MM | OpenAI | 2024 | MM transformer | Unknown (larger than GPT-4) | Text, Image, Video | No | NLP
LLaMa 3 | LLM | Meta | 2024 | Decoder-only transformer | 8B, 70B, 400B | Text | Yes | NLP, directly
Gemini 1.5 | MM | Google | 2024 | MM transformer | 1.6T | Text, Image, Video, Audio | No | NLP, Radio, directly
Claude 3.7 | MM | Anthropic | 2024 | Decoder-only transformer | Unknown | Text, Image | No | NLP, directly
YOLO-World | Vision | IDEA | 2024 | CNN + RepVL-PAN vision-language fusion | 13-110M (depending on scale) | Text, Image | Yes | Endo, directly
DeepSeek | LLM | DeepSeek | 2025 | Decoder-only transformer | 671B | Text | Yes | NLP
Phi-4 | LLM | Microsoft | 2025 | Decoder-only transformer | 14B (plus), 7B (mini) | Text | Yes | Endo

The evolution of GI-related FMs reveals a discernible trajectory of enhanced capabilities and improved alignment with clinical requirements. The introduction of Transformer-based architectures by models such as BERT in 2018 laid the foundational groundwork for contemporary FMs, facilitating subsequent advancements in their medical domain adaptation. Between 2020 and 2021, language-centric FMs, including GPT-3 and GLM-130B, experienced substantial scaling, encompassing tens to hundreds of billions of parameters. This expansion augmented their proficiency in managing unstructured GI cancer data, enabling tasks such as the extraction of phenotypic characteristics and treatment information from EHRs and scientific literature. Concurrently, vision-oriented FMs, exemplified by ViT and DINO, adapted Transformer architectures for image-based applications, addressing pivotal challenges in GI cancer diagnosis. Leveraging transfer learning, these models demonstrated high accuracy in detecting early gastric and colorectal lesions within pathology slides and endoscopic video data.

Post-2021 developments witnessed a shift towards multimodal FMs, which further enhanced clinical utility. Models such as CLIP, BLIP, and Stable Diffusion integrated textual and visual encoding capabilities, facilitating end-to-end workflows including lesion localization in radiological imaging and cross-validation of pathology reports with endoscopic observations. Since 2023, advanced FMs like GPT-4, Gemini 1.5, and Claude 3 have extended their input modalities to encompass video and audio data. Employing a Mixture of Experts (MoE) architecture, these models achieve a balance between computational efficiency and performance, a critical factor for processing lengthy endoscopic videos and multimodal patient datasets. Notably, the Segment Anything Model (SAM) has emerged as a versatile instrument for segmenting GI lesions, exemplifying how general-purpose multimodal FMs can be rapidly adapted to meet specific clinical application requirements.

FMS IN GI CANCERS
Applications of LLMs in GI cancers

The analysis of natural language, characterized by its inherent unstructured nature, has historically posed significant challenges for computational processing through rule-based or traditional algorithmic approaches, particularly within the domain of medical texts that contain specialized terminology and complex syntactic structures[26]. However, since the early 2020s, the rapid advancement of LLMs has transformed the field of NLP, establishing these models as the predominant methodology for managing diverse textual data. Notably, LLMs grounded in the Transformer architecture have emerged as the leading solution in NLP applications due to their superior performance, as summarized in Table 1.

LLMs possess the capability to generate novel, contextually relevant text rather than merely reproducing or summarizing existing information[17]. The widespread adoption and standardization of LLMs have significantly democratized NLP, enabling researchers without extensive technical expertise to employ models such as GPT and BERT for practical applications[10]. These models can store and retrieve extensive knowledge bases and extract structured information from medical documents, including radiology and pathology reports, and can even offer medical recommendations based on imaging data[24]. This functionality is particularly valuable in GI cancer research, where integrating data from laboratory records, scientific literature, and clinical reports is essential[27]. LLMs facilitate the effective analysis and utilization of these heterogeneous data sources. Furthermore, LLMs have enhanced accessibility by allowing researchers to utilize pre-trained models like ChatGPT, Gemini, and open-source alternatives such as BERT and DeepSeek without necessitating retraining[19,20,28]. This capability supports tasks including the analysis of radiology reports, endoscopic findings, pathology records, clinical trial documentation, and research notes within GI oncology, thereby streamlining the generation of structured insights, risk stratification, and therapeutic recommendations.
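As a concrete, hedged illustration of the report-structuring use case described above (not the pipeline of any specific study cited here), the sketch below sends a short synthetic pathology sentence to a general-purpose LLM through the OpenAI Python client and asks for TNM fields in JSON; the model name, prompt, and report text are illustrative assumptions.

```python
# Hedged sketch of LLM-based extraction of structured TNM fields from free
# text. The report below is synthetic and the prompt/model are illustrative;
# assumes the OpenAI Python client with OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

report = ("Gastric adenocarcinoma invading the muscularis propria; "
          "3 of 15 regional lymph nodes positive; no distant metastasis.")

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {"role": "system",
         "content": ("Extract the TNM stage from the pathology report. "
                     'Reply only with JSON of the form {"T": "...", "N": "...", "M": "..."}.')},
        {"role": "user", "content": report},
    ],
)

# The raw output should still be validated (e.g., with rule-based checks)
# before any downstream clinical or research use.
print(response.choices[0].message.content)
```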

Table 2 and Supplementary Table 1 provide a comprehensive overview of 69 representative studies on NLP and LLM applications in GI cancers conducted between 2011 and 2025. These studies encompass traditional NLP methodologies based on rule sets, lexicons, and statistical learning (Supplementary Table 1), alongside the rapidly emerging Transformer-based LLM approaches post-2020 (Table 2). The inclusion of traditional NLP methods alongside LLMs is justified by several factors: Early rule-based and statistical NLP investigations achieved high accuracy in tasks such as colonoscopy quality control and pathological entity recognition, thereby supplying valuable annotated datasets and task frameworks for subsequent LLM development. Moreover, due to the necessity for interpretability and controllability in clinical environments, certain rule-based techniques continue to function as safety mechanisms or post-validation modules for LLM outputs.

Table 2 Summary of key studies of large language models in the field of gastrointestinal cancer.
Ref. | Year | Models | Objectives | Datasets | Performance | Evaluation
Syed et al[29] | 2022 | BERTi | Developed fine-tuned BERTi for integrated colonoscopy reports | 34165 reports | F1-scores of 91.76%, 92.25%, and 88.55% for colonoscopy, pathology, and radiology | Manual chart review by 4 expert-guided reviewers
Lahat et al[30] | 2023 | GPT | Assessed GPT performance in addressing 110 real-world gastrointestinal inquiries | 110 real-life questions | Moderate accuracy (3.4-3.9/5) for treatment and diagnostic queries | Assessed by three gastroenterologists using a 1-5 scale for accuracy etc.
Lee et al[31] | 2023 | GPT-3.5 | Examined GPT-3.5's responses to eight frequently asked colonoscopy questions | 8 colonoscopy-related questions | GPT answers had extremely low text similarity (0%-16%) | Four gastroenterologists rated the answers on a 7-point Likert scale
Emile et al[32] | 2023 | GPT-3.5 | Analyzed GPT-3.5's ability to generate appropriate responses to CRC questions | 38 CRC questions | 86.8% deemed appropriate, with 95% concordance with 2022 ASCRS guidelines | Three surgery experts assessed answers using ASCRS guidelines
Moazzam et al[33] | 2023 | GPT | Investigated the quality of GPT's responses to pancreatic cancer-related questions | 30 pancreatic cancer questions | 80% of responses were "very good" or "excellent" | Responses were graded by 20 experts against a clinical benchmark
Yeo et al[34] | 2023 | GPT | Assessed GPT's performance in answering questions regarding cirrhosis and HCC | 164 questions about cirrhosis and HCC | 79.1% correctness for cirrhosis and 74% for HCC, but only 47.3% comprehensiveness | Responses were reviewed by two hepatologists and resolved by a 3rd reviewer
Cao et al[35] | 2023 | GPT-3.5 | Examined GPT-3.5's capacity to answer questions on liver cancer screening and diagnosis | 20 questions | 48% of answers were accurate, with frequent errors in LI-RADS categories | Six fellowship-trained physicians from three centers assessed answers
Gorelik et al[36] | 2024 | GPT-4 | Evaluated GPT-4's ability to provide guideline-aligned recommendations | 275 colonoscopy reports | Aligned with experts in 87% of scenarios, showing no significant accuracy gap | Advice assessed by consensus review with multiple experts
Gorelik et al[37] | 2023 | GPT-4 | Analyzed GPT-4's effectiveness in post-colonoscopy management guidance | 20 clinical scenarios | 90% followed guidelines, with 85% correctness and strong agreement (κ = 0.84) | Assessed by two senior gastroenterologists for guideline compliance
Zhou et al[38] | 2023 | GPT-3.5 and GPT-4 | Developed a gastric cancer consultation system and automated report generator | 23 medical knowledge questions | 91.3% appropriate gastric cancer advice (GPT-4), 73.9% for GPT-3.5 | The evaluation was conducted by reviewers with medical standards
Yang et al[39] | 2025 | RECOVER (LLM) | Designed an LLM-based remote patient monitoring system for postoperative care | 7 design sessions, 5 interviews | Six major design strategies for integrating clinical guidelines and information | Clinical staff reviewed and provided feedback on the design and functionality
Kerbage et al[40] | 2024 | GPT-4 | Evaluated GPT-4's accuracy in responding to IBS, IBD, and CRC screening questions | 65 questions (45 patients, 20 doctors) | 84% of answers were accurate | Assessed independently by three senior gastroenterologists
Tariq et al[41] | 2024 | GPT-3.5, GPT-4, and Bard | Compared the efficacy of GPT-3.5, GPT-4, and Bard (July 2023 version) in answering 47 common colonoscopy patient queries | 47 queries | GPT-4 outperformed GPT-3.5 and Bard, with 91.4% fully accurate responses vs 6.4% and 14.9%, respectively | Responses were scored by two specialists on a 0-2 point scale and resolved by a 3rd reviewer
Maida et al[42] | 2025 | GPT-4 | Evaluated GPT-4's suitability in addressing screening, diagnostic, and therapeutic inquiries | 15 CRC screening inquiries | 4.8/6 for CRC screening accuracy, 2.1/3 for completeness | Assessment involved 20 experts and 20 non-experts rating the answers
Atarere et al[43] | 2024 | BingChat, GPT, YouChat | Tested the appropriateness of GPT, BingChat, and YouChat in patient education and patient-physician communication | 20 questions (15 on CRC screening and 5 patient-related) | GPT and YouChat provided more reliable answers than BingChat, but all models had occasional inaccuracies | Two board-certified physicians and one gastroenterologist graded the responses
Chang et al[44] | 2024 | GPT-4 | Compared GPT-4's accuracy, reliability, and alignment of colonoscopy recommendations | 505 colonoscopy reports | 85.7% of cases matched USMSTF guidelines | Assessment was conducted by an expert panel under USMSTF guidelines
Lim et al[45] | 2024 | GPT-4 | Compared a contextualized GPT model with standard GPT in colonoscopy screening | 62 example use cases | Contextualized GPT-4 outperformed standard GPT-4 | GPT-4 was compared against a model supplied with relevant screening guidelines
Munir et al[46] | 2024 | GPT | Evaluated the quality and utility of responses for three GI surgeries | 24 research questions | Modest quality, varying significantly by type of procedure | Responses were graded by 45 expert surgeons
Truhn et al[47] | 2024 | GPT-4 | Created a structured data parsing module with GPT-4 for clinical text processing | 100 CRC reports | 99% accuracy for T-stage extraction, 96% for N-stage, and 94% for M-stage | Accuracy of GPT-4 was compared with manually extracted data by experts
Choo et al[48] | 2024 | GPT | Designed a clinical decision-support system to generate personalized management plans | 30 stage III recurrent CRC patients | 86.7% agreement with tumor board decisions, 100% for second-line therapies | The recommendations were compared with the decision plans made by the MDT
Huo et al[49] | 2024 | GPT, BingChat, Bard, Claude 2 | Established a multi-AI platform framework to optimize CRC screening recommendations | Responses for 3 patient cases | GPT aligned with guidelines in 66.7% of cases, while other AIs showed greater divergence | Clinician and patient advice was compared to guidelines
Pereyra et al[50] | 2024 | GPT-3.5 | Optimized GPT-3.5 for personalized CRC screening recommendations | 238 physicians | GPT scored 4.57/10 for CRC screening, vs 7.72/10 for physicians | Answers were compared against a group of surgeons
Peng et al[51] | 2024 | GPT-3.5 | Built a GPT-3.5-powered system for answering CRC-related queries | 131 CRC questions | 63.01 mean accuracy, but low comprehensiveness scores (0.73-0.83) | Two physicians reviewed each response, with a third consulted for discrepancies
Ma et al[52] | 2024 | GPT-3.5 | Established GPT-3.5-based quality control for post-esophageal ESD procedures | 165 esophageal ESD cases | 92.5%-100% accuracy across post-esophageal ESD quality metrics | Two QC members and a senior supervisor conducted the assessment
Cohen et al[53] | 2025 | LLaMA-2, Mistral-v0.1 | Explored the ability of LLMs to extract PD-L1 biomarker details for research purposes | 232 EHRs from 10 cancer types | Fine-tuned LLMs outperformed an LSTM trained on > 10000 examples | Assessed by 3 clinical experts against manually curated answers
Scherbakov et al[54] | 2025 | Mixtral 8 × 7B | Assessed an LLM for extracting stressful events from the social history of clinical notes | 109556 patients, 375334 notes | Arrest or incarceration (OR = 0.26, 95%CI: 0.06-0.77) | One human reviewer assessed the precision and recall of extracted events
Chatziisaak et al[55] | 2025 | GPT-4 | Evaluated the concordance of therapeutic recommendations generated by GPT | 100 consecutive CRC patients | 72.5% complete concordance, 10.2% partial concordance, and 17.3% discordance | Three reviewers independently assessed concordance with the MDT
Saraiva et al[56] | 2025 | GPT-4 | Assessed GPT-4's performance in interpreting images in gastroenterology | 740 images | Capsule endoscopy: Accuracies 50.0%-90.0% (AUCs 0.50-0.90) | Three experts reviewed and labeled images for CE
Siu et al[57] | 2025 | GPT-4 | Evaluated the efficacy, quality, and readability of GPT-4's responses | 8 patient-style questions | Accurate (40), safe (4.25), appropriate (4.00), actionable (4.00), effective (4.00) | Evaluated by 8 colorectal surgeons
Horesh et al[58] | 2025 | GPT-3.5 | Evaluated management recommendations of GPT in clinical settings | 15 colorectal or anal cancer patients | Rating 48 for GPT recommendations, 4.11 for decision justification | Evaluated by 3 experienced colorectal surgeons
Ellison et al[59] | 2025 | GPT-3.5, Perplexity | Compared readability using different prompts | 52 colorectal surgery materials | Average 7.0-9.8, Ease 53.1-65.0, Modified 9.6-11.5 | Compared mean scores between baseline and documents generated by AI
Ramchandani et al[60] | 2025 | GPT-4 | Validated the use of GPT-4 for identifying articles discussing perioperative and preoperative risk factors for esophagectomy | 1967 studies for title and abstract screening | Perioperative: Agreement rate = 85.58%, AUC = 0.87; Preoperative: Agreement rate = 78.75%, AUC = 0.75 | Decisions were compared with those of three independent human reviewers
Zhang et al[61] | 2025 | GPT-4, DeepSeek, GLM-4, Qwen, LLaMa3 | Evaluated the consistency of LLMs in generating diagnostic records for hepatobiliary cases using the HepatoAudit dataset | 684 medical records covering 20 hepatobiliary diseases | Precision: GPT-4 reached a maximum of 93.42%; Recall: Generally below 70%, with some diseases below 40% | Professional physicians manually verified and corrected all the data
Spitzl et al[62] | 2025 | Claude-3.5, GPT-4o, DeepSeekV3, Gemini 2 | Assessed the capability of state-of-the-art LLMs to classify liver lesions based solely on textual descriptions from MRI reports | 88 fictitious MRI reports designed to resemble real clinical documentation | Micro and macro F1-scores: Claude 3.5 Sonnet 0.91 and 0.78, GPT-4o 0.76 and 0.63, DeepSeekV3 0.84 and 0.70, Gemini 2.0 Flash 0.69 and 0.55 | Model performance was assessed using micro and macro F1-scores benchmarked against ground truth labels
Sheng et al[63] | 2025 | GPT-4o and Gemini | Investigated the diagnostic accuracies for focal liver lesions | 228 adult patients with CT/MRI reports | Two-step GPT-4o, single-step GPT-4o, and single-step Gemini (78.9%, 68.0%, 73.2%) | Six radiologists reviewed the images and clinical information in two rounds (alone, with LLM)
Williams et al[64] | 2025 | GPT-4-32K | Determined whether an LLM could extract reasons for a lack of follow-up colonoscopy | 846 patients' clinical notes | Overall accuracy: 89.3%; reasons: Refused/not interested (35.2%) | A physician reviewer checked 10% of LLM-generated labels
Lu et al[65] | 2025 | MoE-HRS | Used a novel MoE combined with LLMs for risk prediction and personalized healthcare recommendations | SNPs, medical and lifestyle data from United Kingdom Biobank | MoE-HRS outperformed state-of-the-art cancer risk prediction models in terms of ROC-AUC, precision, recall, and F1 score | LLM-generated advice was validated by clinical medical staff
Yang et al[66] | 2025 | GPT-4 | Explored the use of LLMs to enhance doctor-patient communication | 698 pathology reports of tumors | Average communication time decreased by over 70%, from 35 to 10 min (P < 0.001) | Pathologists evaluated the consistency between original and AI reports
Jain et al[67] | 2025 | GPT-4, GPT-3.5, Gemini | Studied the performance of LLMs across 20 clinicopathologic scenarios in gastrointestinal pathology | 20 clinicopathologic scenarios in GI | Diagnostic accuracy: Gemini Advanced (95%, P = 0.01), GPT-4 (90%, P = 0.05), GPT-3.5 (65%) | Two fellowship-trained pathologists independently assessed the responses of the models
Xu et al[68] | 2025 | GPT-4, GPT-4o, Gemini | Assessed the performance of LLMs in predicting immunotherapy response in unresectable HCC | Multimodal data from 186 patients | Accuracy and sensitivity: GPT-4o (65% and 47%), Gemini-GPT (68% and 58%), physicians (72% and 70%) | Six physicians (three radiologists and three oncologists) independently assessed the same dataset
Deroy et al[69] | 2025 | GPT-3.5 Turbo | Explored the potential of LLMs as a question-answering (QA) tool | 30 training and 50 testing queries | A1: 0.546 (maximum value); A2: 0.881 (maximum value across three runs) | Model-generated answers were compared to the gold standard
Ye et al[70] | 2025 | BioBERT-based | Proposed a novel framework that incorporates clinical features to enhance multi-omics clustering for cancer subtyping | Six cancer datasets across three omics levels | Mean survival score of 2.20, significantly higher than other methods | Three independent clinical experts reviewed and validated the clustering results

The volume and temporal distribution of studies reveal distinct trends between traditional NLP and modern LLM research. Over a 14-year span (2011-2025), only 25 studies focused on traditional NLP approaches, whereas LLM-related publications surged from zero to 42 within five years following 2020, indicating rapid expansion. Since 2023, more than ten new investigations annually have employed frameworks such as LLaMA-2 and Gemini, establishing LLMs as the most dynamic area in intelligent text processing for GI cancers.

As detailed in Table 2[29-70], LLMs have been extensively applied to address a variety of GI cancer-related challenges. For example, GPT series models have been utilized to respond to diverse clinical inquiries, including colon cancer screening, pancreatic cancer treatment, and the diagnosis of cirrhosis and liver cancer. These applications underscore the robust language comprehension and generation capabilities of LLMs, enabling them to manage medical knowledge across multiple domains and provide preliminary informational support for clinicians and patients. For instance, in 2023, Emile et al[32] found that GPT-3.5 could generate appropriate responses for 86.8% of 38 CRC questions, with 95% concordance with the 2022 ASCRS guidelines.

Several studies have focused on leveraging LLMs to develop personalized medical systems. Choo et al[48] designed a clinical decision support system that used GPT to generate personalized management plans for stage III recurrent CRC patients. The plans showed 86.7% agreement with the decisions of the tumor board, and 100% agreement for second-line therapies. This indicates that LLMs have the potential to provide customized medical solutions based on patients' specific conditions. LLMs have also been applied to automated report generation and data processing. In 2023, Zhou et al[38] developed a gastric cancer consultation system and an automated report generator based on GPT-3.5 and GPT-4; GPT-4 provided appropriate gastric cancer advice in 91.3% of cases. Moreover, in 2024, Truhn et al[47] used GPT-4 to create a structured data parsing module for clinical text processing, achieving 99%, 96%, and 94% accuracy in extracting T-stage, N-stage, and M-stage, respectively, which greatly improved the efficiency of data processing. To further facilitate the application of LLMs, some researchers have turned to model comparison and optimization. In 2024, Tariq et al[41] compared the performance of GPT-3.5, GPT-4, and Bard in answering 47 common colonoscopy patient queries and found that GPT-4 outperformed the others, with 91.4% fully accurate responses. Such comparisons help researchers understand the performance of different models and select more suitable ones for optimization and application.

Despite these advancements, several challenges hinder the clinical translation of LLMs in GI cancer. First, while LLMs often exhibit remarkable accuracy and promising applications, these models are not specifically designed for medical contexts. Several studies have further revealed inconsistencies or uncertainties in their reported outcomes. Pereyra et al[50] found GPT-3.5 scored just 4.57/10 for CRC screening recommendations, far below physicians’ 7.72/10, while Tariq et al[41] revealed stark model disparities: GPT-4 delivered 91.4% fully accurate colonoscopy query responses, but GPT-3.5 and Bard only achieved 6.4% and 14.9%, respectively. Even for common tasks, Cao et al[35] noted GPT-3.5 had only 48% accuracy in liver cancer screening (with frequent category errors), demonstrating that generalization issues extend beyond rare cases.

Second, data privacy and compliance risks persist. For example, most widely adopted LLMs (e.g., Claude-3.5) are trained on heterogeneous non-medical datasets, lacking inherent safeguards for sensitive GI cancer data. This creates significant HIPAA/GDPR compliance concerns, raising questions about how patient data is protected during model deployment.

Third, interpretability gaps undermine clinical trust. While GPT-4 shows strong guideline alignment (87% agreement with experts in Gorelik et al[36]), its black-box nature means clinicians cannot trace the reasoning behind its outputs, a critical flaw in high-stakes scenarios. This is exemplified by Yeo et al[34], where GPT achieved 74% correctness for HCC-related queries but only 47.3% comprehensiveness; clinicians could not verify why incomplete information was generated, limiting reliance on such tools.

Together, these challenges (inconsistent performance, privacy risks, and opaque reasoning) create barriers to integrating LLMs into routine GI cancer care, as the models do not yet meet the rigor and reliability required for clinical decision-making. To address these challenges and accelerate the clinical integration of LLMs, future research should prioritize several directions suggested by the findings in Table 2: fine-tuning LLMs on GI-specific datasets, integrating rule-based checks to verify outputs (complementing the traditional NLP methods in Supplementary Table 1), and using open-source models with local deployment for privacy-sensitive data handling.
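As a minimal sketch of such a rule-based check (the regular expression and field names are illustrative and do not reproduce any cited pipeline), the snippet below accepts LLM-extracted TNM fields only when they match a permissible pattern and flags everything else for manual review.

```python
# Minimal sketch of a rule-based post-validation step for LLM output.
# The accepted TNM pattern and field names are illustrative assumptions.
import re

TNM_PATTERN = re.compile(r"^(T(is|[0-4][ab]?)|N[0-3][abc]?|M[01])$")

def validate_tnm(extraction: dict) -> dict:
    """Keep only T/N/M values that match the expected format; flag the rest."""
    validated = {}
    for field in ("T", "N", "M"):
        value = str(extraction.get(field, "")).strip()
        validated[field] = value if TNM_PATTERN.match(value) else None
    return validated

print(validate_tnm({"T": "T3", "N": "N1", "M": "MX"}))
# {'T': 'T3', 'N': 'N1', 'M': None} -> the malformed "MX" is flagged for review
```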

Applications of VFMs in GI cancers

Since the early 2020s, VFMs have revolutionized biomedical image analysis[25,71]. These models acquire universal visual representations from extensive collections of unlabeled medical images and can be adapted to specialized tasks, such as GI cancer detection, through fine-tuning on relatively small labeled datasets[72]. For example, in CRC screening, FMs have demonstrated substantial improvements in polyp detection accuracy following fine-tuning. Moreover, VFMs are increasingly employed in cross-modal applications[73]. They integrate different modalities of data to achieve a more comprehensive understanding of disease pathology. This integration necessitates the processing of diverse datasets and significant computational resources. However, the emergence of open-source VFMs, including MedSAM and BiomedCLIP, has enhanced accessibility to these advanced tools[71,74]. Although current usage requires foundational programming skills, the advent of low-code/no-code platforms and Model-as-a-Service (MaaS) frameworks is poised to enable non-expert users to leverage these technologies. Such developments are expected to catalyze advancements in early screening, diagnosis, and personalized treatment strategies for GI cancers.
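To make this adaptation workflow more tangible, the following is a hedged sketch of prompt-based lesion segmentation with the publicly released Segment Anything (SAM) weights, whose interface MedSAM broadly mirrors; the checkpoint file, image, and bounding box coordinates are placeholders rather than values from any cited study.

```python
# Hedged sketch of box-prompted segmentation with the public SAM weights;
# the checkpoint path, image file, and bounding box are placeholders.
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

frame = cv2.cvtColor(cv2.imread("colonoscopy_frame.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(frame)

# A rough bounding box around the suspected polyp serves as the prompt.
box = np.array([120, 80, 260, 210])
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
print("mask pixels:", int(masks[0].sum()), "predicted IoU:", float(scores[0]))
```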

VFMs in endoscopy: Endoscopy constitutes a critical modality for the diagnosis and management of GI cancers, generating vast quantities of images that capture essential information ranging from early lesions to advanced tumor stages. Traditionally, the interpretation of these images has relied heavily on the expertise of experienced endoscopists, a process that is both time-intensive and susceptible to human error, especially given the increasing volume of examinations[75]. VFMs offer a novel solution by enabling direct analysis of endoscopic video streams, facilitating the automatic localization and classification of lesions such as polyps and ulcers.

Table 3 summarizes 19 recent studies (2023-2025), all of which deliberately adapt VFMs for endoscopy applications. Due to space constraints, more detailed information about these models, such as country, dataset sizes, evaluation metrics, fine-tuning strategies, performance benchmarks, and GPUs, is presented in Supplementary Table 2. In contrast, Supplementary Table 3 focuses on VFMs benchmarked in endoscopy: it includes models that were not specifically trained or fine-tuned for endoscopy but that some models in Table 3[76-94] use for benchmarking. This table is significant because it provides reference results from general or medical-general VFMs, highlighting the transferability of VFMs' visual feature extraction capabilities and enriching the overall analysis of VFMs in the context of endoscopy.

Table 3 Summary of key studies of vision foundation models-assisted endoscopy in the field of gastrointestinal cancer.
Model | Year | Architecture | Training algorithm | Parameters | Datasets | Disease studied | Model type | Source code link
Surgical-DINO[76] | 2023 | DINOv2 | LoRA layers added to DINOv2, optimizing only the LoRA layers | 86.72M | SCARED, Hamlyn | Endoscopic surgery | Vision | https://github.com/BeileiCui/SurgicalDINO
ProMISe[77] | 2023 | SAM (ViT-B) | APM and IPS modules are trained while keeping SAM frozen | 1.3-45.6M | EndoScene, ColonDB etc. | Polyps, skin cancer | Vision | NA
Polyp-SAM[78] | 2023 | SAM | Pre-trains only the mask decoder while freezing all encoders | NA | CVC-ColonDB, Kvasir etc. | Colon polyps | Vision | https://github.com/ricklisz/Polyp-SAM
Endo-FM[79] | 2023 | ViT-B/16 | Pre-trained using a self-supervised teacher-student framework and fine-tuned on downstream tasks | 121M | Colonoscopic, LDPolyp etc. | Polyps, erosion, etc. | Vision | https://github.com/med-air/Endo-FM
ColonGPT[80] | 2024 | SigLIP-SO, Phi1.5 | Pre-alignment with image-caption pairs, followed by supervised fine-tuning using LoRA | 0.4-1.3B | ColonINST (30k+ images) | Colorectal polyps | Vision | https://github.com/ColonGPT/ColonGPT
DeepCPD[81] | 2024 | ViT | Hyperparameters are optimized for colonoscopy datasets, including the Adam optimizer | NA | PolypsSet, CP-CHILD-A etc. | CRC | Vision | https://github.com/Zhang-CV/DeepCPD
OneSLAM[82] | 2024 | Transformer (CoTracker) | Zero-shot adaptation using TAP + local bundle adjustment | NA | SAGE-SLAM, C3VD etc. | Laparoscopy, colon | Vision | https://github.com/arcadelab/OneSLAM
EIVS[83] | 2024 | Vision Mamba, CLIP | Unsupervised cycle-consistency | 63.41M | 613 WLE, 637 images | Gastrointestinal | Vision | NA
APT[84] | 2024 | SAM | Parameter-efficient fine-tuning | NA | Kvasir-SEG, EndoTect etc. | CRC | Vision | NA
FCSAM[85] | 2024 | SAM | LayerNorm LoRA fine-tuning strategy | 1.2M | Gastric cancer (630 pairs) etc. | GC, colon polyps | Vision | NA
DuaPSNet[86] | 2024 | PVTv2-B3 | Transfer learning with PVTv2-B3 pre-trained on ImageNet | NA | LaribPolypDB, ColonDB etc. | CRC | Vision | https://github.com/Zachary-Hwang/Dua-PSNet
EndoDINO[87] | 2025 | ViT (B, L, g) | DINOv2 methodology, hyperparameter tuning | 86M to 1B | HyperKvasir, LIMUC | GI endoscopy | Vision | https://github.com/ZHANGBowen0208/EndoDINO/
PolypSegTrack[88] | 2025 | DINOv2 | One-step fine-tuning on colonoscopic videos without first pre-training | NA | ETIS, CVC-ColonDB etc. | Colon polyps | Vision | NA
AiLES[89] | 2025 | RF-Net | Not fine-tuned from an external model | NA | 100 GC patients | Gastric cancer | Vision | https://github.com/CalvinSMU/AiLES
PPSAM[90] | 2025 | SAM | Fine-tuning with variable bounding box prompt perturbations | NA | EndoScene, ColonDB etc. | Investigated in Ref. | Vision | https://github.com/SLDGroup/PP-SAM
SPHINX-Co[91] | 2024 | LLaMA-2 + SPHINX-X | Fine-tuned SPHINX-X on CoPESD with a cosine learning rate scheduler | 7B, 13B | CoPESD | Gastric cancer | Multimodal | https://github.com/gkw0010/CoPESD
LLaVA-Co[91] | 2024 | LLaVA-1.5 (CLIP-ViT-L) | Fine-tuned LLaVA-1.5 on CoPESD with a cosine learning rate scheduler | 7B, 13B | CoPESD | Gastric cancer | Multimodal | https://github.com/gkw0010/CoPESD
ColonCLIP[92] | 2025 | CLIP | Prompt tuning with frozen CLIP, then encoder fine-tuning with frozen prompts | 57M, 86M | OpenColonDB | CRC | Multimodal | https://github.com/Zoe-TAN/ColonCLIP-OpenColonDB
PSDM[93] | 2025 | Stable Diffusion + CLIP | Continual learning with prompt replay to incrementally train on multiple datasets | NA | PolypGen, ColonDB, Polyplus etc. | CRC | Vision, generative | The original paper reported a GitHub link for this model, but it is currently unavailable
PathoPolypDiff[94] | 2025 | Stable Diffusion v1-4 | Fine-tuned Stable Diffusion v1-4 with the first U-Net block locked and the remaining blocks fine-tuned | NA | ISIT-UMR Colonoscopy Dataset | CRC | Generative | https://github.com/Vanshali/PathoPolyp-Diff

VFMs demonstrate notable strengths in GI cancer endoscopy through multiple advanced approaches. Parameter-efficient variants such as Surgical-DINO (LoRA, 0.3% trainable) and APT/FCSAM (adapter-based, < 1%) achieve competitive results, while fully-fine-tuned Endo-FM reaches 73.9 Dice on CVC-12k[76,79]. With respect to multimodal reasoning, LLaVA-Co achieves GPT scores of 85.6/100 and mIoU 60.2% on ESD benchmarks[91]. Regarding unified architectures across tasks, SAM-derived pipelines (e.g., ProMISe[77], Polyp-SAM[78], APT[84], FCSAM[85], PP-SAM[90]) have so far been individually evaluated for either segmentation or detection metrics. This suggests a single foundation backbone could replace the current patchwork of bespoke CNNs. For generative augmentation, PSDM[93] and PathoPolyp-Diff[94] utilize Stable Diffusion to synthesize polyp subtypes and show good performance in improving relevant downstream tasks. VFMs first benefit from limited real data, then generate synthetic data to further refine themselves. In terms of hardware economy, while billion-scale models such as EndoDINO[87] require 8 × H100, many adapter-based systems (e.g., DuaPSNet[86], AiLES[89]) train on 2 or fewer consumer GPUs (e.g., RTX 3060/3090) (Supplementary Table 2). This is largely because self-supervised pre-training has already handled the bulk of computations, making the democratization of high-quality GI AI feasible even for resource-constrained centers.
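The parameter-efficient adaptation strategies highlighted above can be sketched in a few lines. The example below assumes the Hugging Face transformers and peft libraries, with an ImageNet-pre-trained ViT standing in for an endoscopy-adapted backbone and a binary label space chosen purely for illustration; it attaches LoRA adapters to the attention projections so that only a small fraction of weights is trained.

```python
# Hedged sketch of LoRA-style parameter-efficient fine-tuning of a ViT
# backbone; the checkpoint, label count, and LoRA ranks are illustrative.
from peft import LoraConfig, get_peft_model
from transformers import ViTForImageClassification

backbone = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    num_labels=2,  # e.g. polyp vs normal mucosa (illustrative)
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],  # attention projections in the ViT
    modules_to_save=["classifier"],     # also train the new classification head
)
model = get_peft_model(backbone, lora_config)

# Only the low-rank adapter matrices (and the small head) are updated; the
# pre-trained backbone stays frozen, keeping the trainable fraction below 1%.
model.print_trainable_parameters()
```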

Supplementary Table 3 has unique value in the context of VFMs for GI endoscopy: It includes models that are not optimized specifically for endoscopy but still prove useful in benchmarking. For example, models like TimeSformer and ST-Adapter, despite lacking endoscopy-specific refinement, demonstrate certain value when used in the benchmarking of Endo-FM[79]. Meanwhile, general-purpose models such as SAM, Gemini-1.5, and Stable Diffusion are also tested in the benchmarking of other models like PPSAM[90], ColonCLIP[92] and PathoPolyp-Diff[94] respectively, showing their potential to support performance evaluation in this specialized field. These results confirm the general-purpose vision-language capabilities of models like CLIP and Gemini-1.5 (Supplementary Table 3), even when the base model has never been exposed to endoscope data.

Collectively, these findings show that VFMs, whether applied directly or through secondary development, play a pivotal role in GI cancer endoscopy tasks including polyp recognition and early lesion monitoring. They contribute to enhanced diagnostic efficiency and accuracy. Furthermore, the reviewed studies highlight the complementary strengths of diverse models in specific tasks, thereby laying the groundwork for future multi-model fusion systems aimed at intelligent endoscopic diagnosis.

VFMs in radiology: VFMs have become increasingly significant in radiology, particularly for GI cancer diagnosis, complementing traditional endoscopic approaches. Radiological modalities such as CT, MRI, and positron emission tomography play essential roles in initial cancer staging, metastasis detection, treatment monitoring, and postoperative recurrence identification[95]. Traditional radiology methods involve manually marking regions of interest and extracting features, which is reliable but time-consuming and constrained by limited data[96]. In contrast, VFMs using Transformer-based architectures enable automated processing of entire images, capturing intricate details of tumors and adjacent tissues and reducing the need for manual annotation. The recent availability of large-scale, open-source VFMs pre-trained on millions of radiographs has facilitated fine-tuning on relatively small datasets, such as several dozen enhanced CT scans for gastric cancer or CRC, using modest computing resources[97].

To summarize the application and development of VFMs in radiology for GI cancer, three key tables are presented in this section. Table 4 encapsulates 10 representative VFM studies, covering essential information such as model architecture, training algorithm, and applied datasets. Supplementary Table 4 extends Table 4 by providing further methodological details for the same models, including dataset sizes, evaluation metrics, fine-tuning strategies, and performance benchmarks. Meanwhile, Supplementary Table 5 presents a set of models that were not specifically trained or fine-tuned for radiology tasks but were adopted as benchmarks by several models in Table 4[97-105], thereby providing a comparative context for assessing the relative performance of VFMs tailored for radiology.

Table 4 Summary of key studies of vision foundation models-assisted radiology in the field of gastrointestinal cancer.
Model | Year | Architecture | Training algorithm | Parameters | Datasets | Disease studied | Model type | Source code link
PubMedCLIP[98] | 2021 | CLIP | Fine-tuned on the ROCO dataset for 50 epochs with the Adam optimizer | NA | ROCO, VQA-RAD, SLAKE | Abdomen samples | Multimodal | https://github.com/sarahESL/PubMedCLIP
RadFM[97] | 2023 | MedLLaMA-13B | Pre-trained on MedMD and fine-tuned on RadMD | 14B | MedMD, RadMD etc. | Over 5000 diseases | Multimodal | https://github.com/chaoyi-wu/RadFM
Merlin[99] | 2024 | I3D-ResNet152 | Multi-task learning with EHR and radiology reports, plus fine-tuning for specific tasks | NA | 6M images, 6M codes and reports | Multiple diseases, abdominal | Multimodal | NA
MedGemini[100] | 2024 | Gemini | Fine-tuning Gemini 1.0/1.5 on medical QA, multimodal, and long-context corpora | 1.5B | MedQA, NEJM, GeneTuring | Various | Multimodal | https://github.com/Google-Health/med-gemini-medqa-relabelling
HAIDEF[101] | 2024 | VideoCoCa | Fine-tuning on downstream tasks with limited labeled data | NA | CT volumes and reports | Various | Vision | https://huggingface.co/collections/google/
CTFM[102] | 2024 | Vision Model1 | Trained using a self-supervised learning strategy, employing a SegResNet encoder for the pre-training phase | NA | 26298 CT scans | CT scans (stomach, colon) | Vision | https://aim.hms.harvard.edu/ct-fm
MedVersa[103] | 2024 | Vision Model1 | Trained from scratch on the MedInterp dataset and adapted to various medical imaging tasks | NA | MedInterp | Various | Vision | https://github.com/3clyp50/MedVersa_Internal
iMD4GC[104] | 2024 | Transformer-based2 | A novel multimodal fusion architecture with cross-modal interaction and knowledge distillation | NA | GastricRes/Sur, TCGA etc. | Gastric cancer | Multimodal | https://github.com/FT-ZHOU-ZZZ/iMD4GC/
Yasaka et al[105] | 2025 | BLIP-2 | LoRA with specific fine-tuning of the fc1 layer in the vision and Q-Former models | NA | 5777 CT scans | Esophageal cancer via chest CT | Multimodal | NA

First, in terms of architectural diversity and technical adaptation, VFMs have evolved from single-modal vision models to integrated multimodal systems. On one hand, vision-specific models focus on optimizing image feature extraction for GI-related scans. For example, CT-FM adopts a SegResNet encoder and uses SSL to process 26298 CT scans, targeting stomach and colon cancer imaging[102]; MedVersa, trained from scratch on the MedInterp dataset, is adapted to multiple medical imaging tasks, including GI cancer detection[103]. On the other hand, multimodal models integrate non-imaging data to enhance diagnostic accuracy. Merlin uses an I3D-ResNet152 architecture and incorporates multi-task learning with EHR and radiology reports, enabling it to handle abdominal GI diseases alongside other conditions[99]. Second, regarding disease coverage and clinical targeting, VFMs now address a broader spectrum of GI cancers while maintaining specificity for individual disease types. Some models achieve wide applicability across GI malignancies. RadFM, built on MedLLaMA-13B and trained on 16M image-text pairs from MedMD, covers over 5000 diseases including various GI cancers[97]; HAI-DEF, based on VideoCoCa, processes CT volumes and reports to support diagnosis for multiple GI-related conditions[101]. In contrast, other models focus on specific GI cancer subtypes to meet targeted clinical needs: Yasaka et al's model[105], which fine-tunes BLIP-2 via LoRA, uses 5777 CT scans to specifically detect esophageal cancer; iMD4GC is exclusively developed for gastric cancer, leveraging disease-specific datasets to improve diagnostic precision for this subtype[104]. Third, in terms of performance validation and benchmarking, VFMs demonstrate robust results through standardized metrics and multi-center validation, with Supplementary Table 4 providing detailed performance data. For classification tasks, PubMedCLIP, a CLIP-based model fine-tuned on the ROCO dataset, achieves up to a 3% improvement in overall accuracy over MAML (a traditional meta-learning model)[98]. For predictive tasks, Merlin shows strong performance in multi-disease 5-year prediction for GI cancers, with an AUROC of 0.757, and maintains reliability in external validation[99]. Additionally, RadFM outperforms existing multimodal models (e.g., OpenFlamingo) on RadBench and other public benchmarks, with high scores in classification and open-ended tasks, as confirmed by radiologist evaluation[97].

Unlike these specialized radiology models, several general-purpose VFMs that were not tailored to radiology serve as benchmarks for GI cancer-specific VFMs such as RadFM or Merlin. These VFMs target broad visual or multimodal understanding outside medicine, which makes their role in radiology benchmarking noteworthy. For example, GPT-4V, designed for general visual-language tasks, was benchmarked against RadFM[97]; RadFM outperformed it in radiology tasks, as shown by higher BLEU and F1 scores. MedFlamingo and OpenFlamingo also served as RadFM benchmarks, with RadFM excelling in open-ended radiology Q&A on RadBench. OpenCLIP, pre-trained on non-medical data, was used as a benchmark in Merlin’s evaluation[99]; Merlin’s radiology-specific training achieved an internal zero-shot F1 score of 0.741, outperforming OpenCLIP. The inclusion of these general-purpose models in benchmarking experiments offers a universal performance baseline for radiology models, enabling objective assessment of the benefits of domain adaptation. It also improves the reproducibility and comparability of research, as publicly accessible models such as GPT-4V and OpenCLIP allow consistent cross-study alignment.
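To make this benchmarking setup concrete, the following minimal sketch shows how a general-purpose CLIP-style model can be used for zero-shot classification of a radiological image against text prompts, the kind of baseline that radiology-specific VFMs such as Merlin are compared with. The checkpoint name, label prompts, and image path are illustrative assumptions rather than details from the cited studies.

```python
# Minimal sketch of zero-shot lesion classification with a general-purpose
# CLIP-style model, as used when benchmarking radiology-specific VFMs.
# The checkpoint name, label prompts, and image path are illustrative only.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai")          # general-purpose baseline
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

labels = ["a CT slice showing esophageal cancer",
          "a CT slice with no evidence of malignancy"]   # hypothetical prompts
image = preprocess(Image.open("example_ct_slice.png")).unsqueeze(0)
text = tokenizer(labels)

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

print(dict(zip(labels, probs.squeeze(0).tolist())))
```

In the cited evaluations, domain-adapted models outperform such general-purpose baselines on radiology-specific tasks, which is precisely the benefit this kind of benchmark is designed to quantify.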

Despite the progress of VFMs in GI cancer radiology, several radiology-specific limitations and challenges remain evident in current research. For example, dataset bias and scarcity hinder model generalizability: the model of Yasaka et al[105] relies on a relatively small, single-center dataset of 5777 CT scans, which may fail to capture the variability of GI cancer imaging across different populations or clinical settings. There is also limited focus on 3D radiological data. Most models (e.g., PubMedCLIP, RadFM) primarily process 2D images, whereas 3D CT/MRI volumes, critical for assessing tumor depth and spread in GI cancer, are less frequently addressed (Merlin reports 3D semantic segmentation with a Dice score of 0.798). To address these issues, future research should pursue radiology-tailored strategies. For instance, expanding multi-center, diverse training datasets would allow future models to integrate data from global GI cancer centers and reduce bias; in practice, TCGA data (used by iMD4GC) could be combined with real-world clinical scans to cover more ethnicities and disease stages[104]. It would also be useful to enhance 3D data processing capabilities: building on Merlin’s progress in 3D segmentation, future VFMs should optimize architectures for 3D GI cancer imaging to improve tumor staging accuracy, a key radiological task for treatment planning[99].

VFMs in pathology: Histopathology plays a pivotal role in cancer diagnosis, prognosis, and treatment. Traditionally, pathologists examined tissue slides under microscopes, a process that was slow, labor-intensive, and prone to errors stemming from variability in expertise. Such limitations occasionally resulted in misdiagnoses, particularly in complex cases[106]. The integration of digital technologies revolutionized this domain through whole-slide imaging (WSI), which converts glass slides into high-resolution digital images that retain all microscopic details[107]. However, manual analysis of these extensive datasets was impractical, which led to the rise of computational pathology, the use of computer algorithms to analyze these images more efficiently[108]. The initial applications of digital pathology primarily supported clinical decision-making by enhancing cancer detection accuracy. In 2021, the Food and Drug Administration (FDA) approved the first AI pathology system, marking a major step forward[109]. Advancements in FMs have since spurred research into their application for WSI analysis, advancing tumor pathology toward greater automation and intelligence and making it faster and more accurate.

To elaborate on the application and advancement of VFMs in GI pathology, Table 5 encapsulates 28 representative VFM studies, showing the deployment of VFMs for tasks like detection & classification, segmentation, and histopathological assessment in GI WSIs. Due to space constraints, Supplementary Table 6 provides comprehensive methodological details for each corresponding model. These applications have markedly enhanced diagnostic efficiency and accuracy. Unlike the direct utilization of FMs in LLMs or endoscopic imaging, GI histopathology adopts a distinct technical approach, likely influenced by the extensive research in computational pathology favoring customized and specialized model architectures. By training and fine-tuning models on domain-specific pathological data, these VFMs achieve precise recognition and analysis of tumor features, rather than relying on general-purpose models.
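Many of the Table 5 entries below describe a "frozen backbone with linear classifier" recipe for adapting a pre-trained encoder to a GI-specific task. The following minimal sketch, with a torchvision ResNet-50 standing in for a pathology foundation model (models such as UNI or Phikon would be loaded from their own repositories) and random tensors standing in for annotated GI patches, illustrates how such a probe is trained on frozen features; it is a conceptual illustration, not any specific study's pipeline.

```python
# Minimal sketch of the "frozen encoder + linear probe" recipe that recurs in
# Table 5. A torchvision ResNet-50 stands in for a pathology foundation model;
# the patch tensors and labels are placeholders.
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights
from sklearn.linear_model import LogisticRegression

encoder = resnet50(weights=ResNet50_Weights.DEFAULT)
encoder.fc = nn.Identity()          # expose the 2048-d feature vector
encoder.eval()
for p in encoder.parameters():      # freeze the backbone
    p.requires_grad = False

# Placeholder batch of 3-channel 224x224 histology patches with binary labels
patches = torch.rand(64, 3, 224, 224)
labels = torch.randint(0, 2, (64,)).numpy()

with torch.no_grad():
    feats = encoder(patches).numpy()   # (64, 2048) frozen embeddings

probe = LogisticRegression(max_iter=1000).fit(feats, labels)
print("training accuracy of the linear probe:", probe.score(feats, labels))
```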

Table 5 Summary of key studies of Vision Foundation Models-assisted pathology in the field of gastrointestinal cancer.
Model | Year | Architecture | Training Algorithm | Paras | WSIs | Tissues | Open source link
LUNIT-SSL[110] | 2021 | ViT-S | DINO; full fine-tuning and linear evaluation on downstream tasks | 22M | 3.7K | 32 | https://Lunitio.github.io/research/publications/pathology_ssl
CTransPath[111] | 2022 | Swin Transformer | MoCoV3 (SRCL); frozen backbone with linear classifier fine-tuning | 28M | 32K | 32 | https://github.com/Xiyue-Wang/TransPath
Phikon[112] | 2023 | ViT-B | iBOT (Masked Image Modeling); fine-tuned with ABMIL/TransMIL on frozen features | 86M | 6K | 16 | https://github.com/owkin/HistoSSLscaling
REMEDIS[113] | 2023 | BiT-L (ResNet-152) | SimCLR (contrastive learning); end-to-end fine-tuning on labeled ID/OOD data | 232M | 29K | 32 | https://github.com/google-research/simclr
Virchow[114] | 2024 | ViT-H, DINOv2 | DINOv2 (SSL); used frozen embeddings with simple aggregators | 632M | 1.5M | 17 | https://huggingface.co/paige-ai/Virchow
Virchow2[115] | 2024 | ViT-H | DINOv2 (SSL); fine-tuned with linear probes or full-tuning on downstream tasks | 632M | 3.1M | 25 | https://huggingface.co/paige-ai/Virchow2
Virchow2G[115] | 2024 | ViT-G | DINOv2 (SSL); fine-tuned with linear probes or full fine-tuning | 1.9B | 3.1M | 25 | https://huggingface.co/paige-ai/Virchow2
Virchow2G mini[115]1 | 2024 | ViT-S, Virchow2G | DINOv2 (SSL); distilled from Virchow2G, then fine-tuned on downstream tasks | 22M | 3.2M | 25 | https://huggingface.co/paige-ai/Virchow2
UNI[9] | 2024 | ViT-L | DINOv2 (SSL); used frozen features with linear probes or few-shot learning | 307M | 100K | 20 | https://github.com/mahmoodlab/UNI
Phikon-v2[116] | 2024 | ViT-L | DINOv2 (SSL); frozen ViT and ABMIL ensemble fine-tuning | 307M | 58K | 30 | https://huggingface.co/owkin/phikon-v2
RudolfV[117] | 2024 | ViT-L | DINOv2 (SSL); fine-tuned with optimizing linear classification layer and adapting encoder weights | 304M | 103K | 58 | https://github.com/rudolfv
HIBOU-B[118] | 2024 | ViT-B | DINOv2 (SSL); frozen feature extractor, trained linear classifier or attention pooling | 86M | 1.1M | 12 | https://github.com/HistAI/hibou
HIBOU-L[118]2 | 2024 | ViT-L | DINOv2 (SSL); frozen feature extractor, trained linear classifier or attention pooling | 307M | 1.1M | 12 | https://github.com/HistAI/hibou
H-Optimus-03 | 2024 | ViT-G | DINOv2 (SSL); linear probe and ABMIL on frozen features | 1.1B | > 500K | 32 | https://github.com/bioptimus/releases/
Madeleine[119] | 2024 | CONCH | MAD-MIL; linear probing, prototyping, and full fine-tuning for downstream tasks | 86M | 23K | 2 | https://github.com/mahmoodlab/MADELEINE
COBRA[120] | 2024 | Mamba-2 | Self-supervised contrastive pretraining with multiple FMs and Mamba2 architecture | 15M | 3K | 6 | https://github.com/KatherLab/COBRA
PLUTO[121] | 2024 | FlexiVit-S | DINOv2; frozen backbone with task-specific heads for fine-tuning | 22M | 158K | 28 | NA
HIPT[122] | 2025 | ViT-HIPT | DINO (SSL); fine-tune with gradient accumulation | 10M | 11K | 33 | https://github.com/mahmoodlab/HIPT
PathoDuet[123] | 2025 | ViT-B | MoCoV3; fine-tuned using standard supervised learning on labeled downstream task data | 86M | 11K | 32 | https://github.com/openmedlab/PathoDuet
Kaiko[124] | 2025 | ViT-L | DINOv2 (SSL); linear probing with frozen encoder on downstream tasks | 303M | 29K | 32 | https://github.com/kaiko-ai/towards_large_pathology_fms
PathOrchestra[125] | 2025 | ViT-L | DINOv2; ABMIL, linear probing, weakly supervised classification | 304M | 300K | 20 | https://github.com/yanfang-research/PathOrchestra
THREADS[126] | 2025 | ViT-L, CONCHv1.5 | Fine-tune gene encoder, initialize patch encoder randomly | 16M | 47K | 39 | https://github.com/mahmoodlab/trident
H0-mini[127] | 2025 | ViT | Using knowledge distillation from H-Optimus-0 | 86M | 6K | 16 | https://huggingface.co/bioptimus/H0-mini
TissueConcepts[128] | 2025 | Swin Transformer | Frozen encoder with linear probe for downstream tasks | 27.5M | 7K | 14 | https://github.com/FraunhoferMEVIS/MedicalMultitaskModeling
OmniScreen[129] | 2025 | Virchow2 | Attention-aggregated Virchow2 embeddings fine-tuning | 632M | 48K | 27 | https://github.com/OmniScreen
BROW[130] | 2025 | ViT-B | DINO (SSL); self-distillation with multi-scale and augmented views | 86M | 11K | 6 | NA
BEPH[131] | 2025 | BEiTv2 | BEiTv2 (SSL); supervised fine-tuning on clinical tasks with labeled data | 86M | 11K | 32 | https://github.com/Zhcyoung/BEPH
Atlas[132] | 2025 | ViT-H, RudolfV | DINOv2; linear probing with frozen backbone on downstream tasks | 632M | 1.2M | 70 | NA

The current research on VFMs in GI pathology presents distinct characteristics across three dimensions, with evidence supported by models from Table 5 and Supplementary Table 6. First, in terms of model architecture, there has been a clear trend toward diversification and scale expansion, with ViT variants becoming the dominant framework while complementary architectures continue to emerge. As shown in Table 5[9,110-132], early models (e.g., LUNIT-SSL) adopted lightweight ViT-S architectures with only 22M parameters, which laid the foundation for VFM application in pathology[110]. By 2024-2025, large-scale ViT-based models had become mainstream: Virchow2 and Virchow2G[115] use ViT-H and ViT-G architectures with 632M and 1.9B parameters, respectively, enabling more complex feature extraction for GI cancer tissue analysis. Meanwhile, specialized architectures such as Swin Transformer (CTransPath[111]) and Mamba-2 (COBRA[120]) have been introduced to address the spatial hierarchy of WSI data. For example, CTransPath’s Swin Transformer design, as detailed in Supplementary Table 6, contributes to its ability to outperform ImageNet-pretrained models by +0.6% accuracy on CRC datasets, demonstrating the adaptability of diverse architectures to GI pathology tasks[111]. Second, training data scale expansion and algorithm innovation have significantly enhanced the feature learning capabilities of VFMs, with SSL remaining the core training paradigm. Table 5 reveals a dramatic increase in training WSI volume: From LUNIT-SSL’s 3.7K WSIs (2021) to Virchow2’s 3.1M WSIs (2024) and Atlas’s 1.2M WSIs (2025), covering up to 70 tissue types (Atlas) that include multiple GI cancer subtypes[110,115,132]. Several models further complement this by highlighting dataset diversity. For instance, Phikon was pre-trained on 4M CRC-specific tiles (TCGA-COAD) and scaled to 40M pan-cancer tiles, allowing it to capture GI cancer-specific histological features more effectively[112]. In terms of algorithms, SSL methods have evolved from early contrastive and self-distillation approaches (e.g., LUNIT-SSL’s DINO[110], CTransPath’s MoCoV3[111]) to masked image modeling (e.g., Phikon’s iBOT[112]) and knowledge distillation (e.g., Virchow2G mini[115]). Virchow2G mini, a distilled version of Virchow2G (1.9B parameters), retains only 22M parameters yet still outperforms larger models such as H-Optimus-0 (1.1B parameters) on multiple GI cancer-related benchmarks (Table 5), showing that algorithm optimization can balance model efficiency and performance. Third, performance improvement and enhanced generalization have become key indicators, with models consistently outperforming traditional supervised baselines across diverse tasks; Supplementary Table 6 provides detailed performance evidence. REMEDIS, which uses a two-stage SSL strategy (contrastive learning on unlabeled medical images followed by end-to-end fine-tuning), achieves up to an 11.5% in-distribution gain and a 10.7% out-of-distribution gain over ImageNet/JFT baselines, even when using only 1%-33% of labeled GI cancer data[113]. This is critical for scenarios with limited annotated pathology samples. Virchow, trained on 1.5M H&E-stained WSIs, demonstrates state-of-the-art pan-cancer detection performance, with the highest or statistically tied AUC across nearly all GI cancer types (e.g., colorectal and gastric cancer) and superior generalization to external institutional data[114].
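Knowledge distillation, the strategy behind compact models such as Virchow2G mini and H0-mini, can be illustrated with a short sketch in which a small student network is trained to reproduce the embeddings of a large frozen teacher. The tiny networks, tile size, and training loop below are placeholders under stated assumptions; production pipelines distill ViT-G/ViT-H teachers on millions of pathology tiles.

```python
# Minimal sketch of embedding-level knowledge distillation. The "teacher" and
# "student" networks, tile size, and training loop are all placeholders.
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256)).eval()  # large frozen model stand-in
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))          # compact model being trained
project = nn.Linear(64, 256)                        # map student space to teacher space
params = list(student.parameters()) + list(project.parameters())
optimizer = torch.optim.AdamW(params, lr=1e-4)
loss_fn = nn.CosineEmbeddingLoss()

for step in range(100):                             # placeholder loop over tile batches
    tiles = torch.rand(32, 3, 32, 32)               # unlabeled histology tiles (toy)
    with torch.no_grad():
        target = teacher(tiles)                     # teacher embeddings, no gradients
    pred = project(student(tiles))
    loss = loss_fn(pred, target, torch.ones(tiles.size(0)))  # pull embeddings together
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```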
Additionally, models like Phikon[112], Madeleine[119] and HIPT[122] extend VFM application to GI cancer-related tasks beyond classification, such as survival prediction (using Harrell C-index as an evaluation metric), further expanding the utility of VFMs in clinical GI pathology workflows.
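For readers unfamiliar with the Harrell C-index mentioned above, the following self-contained sketch computes it from toy follow-up times, event indicators, and model-predicted risk scores; ties in follow-up time are skipped for simplicity, and libraries such as lifelines provide equivalent, more complete implementations.

```python
# Minimal sketch of the Harrell C-index used to evaluate survival prediction
# from WSI-derived risk scores (higher score = higher predicted risk). The
# times, events, and scores below are toy values.
from itertools import combinations

def harrell_c_index(times, events, risk_scores):
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:        # skip tied times for simplicity
            continue
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:           # earlier time must be an observed event
            continue
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5
    return concordant / comparable

times = [5, 12, 9, 20, 7]           # follow-up in months
events = [1, 0, 1, 1, 1]            # 1 = event observed, 0 = censored
scores = [0.9, 0.6, 0.4, 0.1, 0.8]  # model-predicted risk (toy values)
print("Harrell C-index:", harrell_c_index(times, events, scores))
```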

Despite their promising progress, VFMs still face distinct limitations and challenges when applied to GI pathology, most of which are closely tied to the unique characteristics of pathological analysis and clinical workflows. First, over-reliance on large-scale, high-quality pathological datasets restricts accessibility. For example, models such as Virchow2[115] and Atlas[132] use 3.1M and 1.2M WSIs, respectively (Table 5), but such multi-institutional, well-annotated cohorts (e.g., covering rare GI cancer subtypes) are scarce in clinical practice, and smaller datasets (e.g., COBRA’s 3K WSIs) can lead to limited generalization to diverse pathological scenarios[120]. Second, the mismatch between model design and pathological interpretation remains a barrier. While models such as HIBOU-L[118] and Phikon-v2[116] achieve high AUC in classification (Supplementary Table 6), they lack interpretability for key pathological features. Unlike pathologists, who rely on visible morphological cues, VFMs often function as “black boxes”, making clinical validation difficult. Third, computational cost and deployment feasibility hinder clinical translation. Large models such as Virchow2G (1.9B parameters) require large numbers of GPUs for training (Supplementary Table 6), while even compressed models like Virchow2G mini (22M parameters) need specialized hardware; most clinical pathology laboratories, especially those with limited resources, cannot meet these requirements[115].

Future research on VFMs in GI cancer pathology should target these specific limitations. First, to address data scarcity, developing small-dataset-adaptable VFMs is a priority; the H0-mini model, for example, succeeded with only 6K WSIs (Table 5) by applying knowledge distillation from H-Optimus-0[127]. Future models could integrate distillation and cross-stain transfer learning, enabling reliable training even with limited GI cohorts (similar to Virchow2G mini)[115]. Second, to enhance pathological interpretability, designing feature-aligned VFMs would be valuable. Drawing on Phikon-v2, particularly its biomarker prediction tasks (Supplementary Table 6), future models could link image features to pathological biomarkers (e.g., MSI, HER2, ER in GI tumors), bridging the gap between model outputs and pathologists’ morphological analysis[116]. Third, to improve clinical deployment, optimizing lightweight VFMs for laboratory hardware is critical. Following TissueConcepts’ 27.5M-parameter design (Table 5) and efficient linear-probe fine-tuning (Supplementary Table 6), future research should focus on compressing models to run on standard laboratory workstations, avoiding reliance on large GPU clusters (as needed by larger models such as Virchow2 or Phikon-v2)[128]. Finally, to tackle sample variability, training VFMs on heterogeneous pathological datasets is necessary; models could incorporate augmented data simulating staining inconsistencies and tissue folding to enhance robustness to real-world GI biopsy variations, as sketched below.
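As a concrete illustration of the last recommendation, the following sketch builds a simple augmentation pipeline that perturbs color and sharpness to mimic staining inconsistencies; the specific transforms and their parameters are illustrative choices, not values reported by any of the cited models.

```python
# Minimal sketch of augmentation that simulates staining inconsistencies and
# slide-preparation artifacts when training on heterogeneous GI biopsy data.
# The jitter ranges and probabilities are illustrative, not tuned values.
from torchvision import transforms

stain_variation = transforms.Compose([
    transforms.RandomApply(
        [transforms.ColorJitter(brightness=0.2, contrast=0.2,
                                saturation=0.3, hue=0.05)], p=0.8),
    transforms.RandomApply([transforms.GaussianBlur(kernel_size=5)], p=0.2),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ToTensor(),
])

# Usage: pass a PIL histology patch through the pipeline during training, e.g.
# augmented = stain_variation(patch_image)
```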

Applications of MLLMs in GI cancers

In the preceding overview of endoscopic and radiological imaging, multimodal FMs have been recurrently highlighted (Tables 2 and 3). These models integrate different types of data, like endoscopic images with text, or CT and MRI scans alongside clinical records and genetic information, to yield superior diagnostic and prognostic performance relative to unimodal approaches. For instance, the ColonCLIP model analyzes endoscopic images and reports together, and GPT-4V uses a multimodal approach for radiological image analysis[92,133]. MLLMs are designed to process and integrate diverse data modalities (text, images, etc.), thereby capturing intermodal relationships that facilitate more efficient learning and enhanced predictive accuracy[134]. They work by merging diverse data into a unified representation, extracting key features from each data type (e.g., word embeddings from text or CNN features from images), and subsequently integrating these features through mechanisms like multilayer perceptrons or graph neural networks. Such integrative modeling holds considerable promise in medical contexts, offering comprehensive diagnostic insights that can improve therapeutic strategies for diseases including GI cancers[135].
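A minimal sketch of this feature-level fusion is shown below: pre-extracted image and text embeddings are concatenated and passed through a small multilayer perceptron. All dimensions and inputs are placeholders; real MLLMs use far richer encoders and fusion mechanisms (e.g., cross-attention or graph neural networks).

```python
# Minimal sketch of late fusion: pre-extracted image features (e.g., from a CNN
# or ViT encoder) are concatenated with text embeddings (e.g., from a report
# encoder) and passed through an MLP head. Dimensions and inputs are placeholders.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, img_dim=1024, txt_dim=768, hidden=512, n_classes=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, img_feat, txt_feat):
        fused = torch.cat([img_feat, txt_feat], dim=-1)  # unified representation
        return self.head(fused)

model = LateFusionClassifier()
img_feat = torch.rand(8, 1024)   # e.g., endoscopy/CT embeddings
txt_feat = torch.rand(8, 768)    # e.g., clinical-report embeddings
logits = model(img_feat, txt_feat)
print(logits.shape)              # torch.Size([8, 2])
```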

Table 6 summarizes pivotal studies investigating MLLMs within GI pathology, while Supplementary Table 7 extends this overview by detailing methodological aspects constrained by space in the main table. The Supplementary material elaborates on training datasets, specifying sources and volumes of image-text pairs or WSIs, performance evaluation metrics across various tasks, and the training and fine-tuning protocols employed. Collectively, these resources provide a thorough depiction of the current landscape of MLLMs in GI cancer research, enabling an in-depth examination of their potential applications.

Table 6 Summary of key studies of multimodal large language models in the field of gastrointestinal cancer.
Model | Year | Vision architecture | Vision dataset | WSIs | Text model | Text dataset | Parameters | Tissues | Generative | Open source link
PLIP[136] | 2023 | CLIP | OpenPath | 28K | CLIP | OpenPath | NA | 32 | Captioning | https://github.com/PathologyFoundation/plip
HistGen[137] | 2023 | DINOv2, ViT-L | Multiple | 55K | LGH Module | TCGA | Approximately 100M | 32 | Report generation | https://github.com/dddavid4real/HistGen
PathAlign[138] | 2023 | PathSSL | Custom | 350K | BLIP-2 | Diagnostic reports | Approximately 100M | 32 | Report generation | https://github.com/elonybear/PathAlign
CHIEF[139] | 2024 | CTransPath | 14 Sources | 60K | CLIP | Anatomical information | 27.5M, 63M | 19 | No | https://github.com/hms-dbmi/CHIEF
PathGen[140] | 2024 | LLaVA, CLIP | TCGA | 7K | CLIP | 1.6M pairs | 13B | 32 | WSI assistant | https://github.com/PathFoundation/PathGen-1.6M
PathChat[141] | 2024 | UNI | Multiple | 999K | LLaMa 2 | Pathology instructions | 13B | 20 | AI assistant | https://github.com/fedshyvana/pathology_mllm_training
PathAsst[142] | 2024 | PathCLIP | PathCap | 207K | Vicuna-13B | Pathology instructions | 13B | 32 | AI assistant | https://github.com/superjamessyx/Generative-Foundation-AI-Assistant-for-Pathology
ProvGigaPath[143] | 2024 | ViT | Prov-Path | 171K | OpenCLIP | 17K Reports | 1135 | 31 | No | https://github.com/prov-gigapath/prov-gigapath
TITAN[144] | 2024 | ViT | Mass340K | 336K | CoCa | Medical reports | Approximately 5B | 20 | Report generation | https://github.com/your-repo/TITAN
CONCH[145] | 2024 | ViT | Multiple | 21K | GPTstyle | 1.17M pairs | NA | 19 | Captioning | http://github.com/mahmoodlab/CONCH
SlideChat[146] | 2024 | CONCH, LongNet | TCGA | 4915 | Qwen2.5-7B | Slide Instructions | 7B | 10 | WSI assistant | https://github.com/uni-medical/SlideChat
PMPRG[147] | 2024 | MR-ViT | Custom | 7422 | GPT-2 | Pathology Reports | NA | 2 | Multi-organ report | https://github.com/hvcl/Clinical-grade-PathologyReport-Generation
MuMo[148] | 2024 | MnasNet | Custom | 429 | Transformer | PathoRadio Reports | NA | 1 | No | https://github.com/czifan/MuMo
ConcepPath[149] | 2024 | ViT-B, CONCH | Quilt-1M | 2243 | CLIP, GPT | PubMed | Approximately 187M | 3 | No | https://github.com/HKU-MedAI/ConcepPath
GPT-4V[150] | 2024 | Phikon ViT-B | CRC-7K, MHIST etc. | 338K | GPT-4 | NA | 40M | 3 | Report generation | https://github.com/Dyke-F/GPT-4V-In-Context-Learning
MINIM[151] | 2024 | Stable diffusion | Multiple | NA | BERT, CLIP | Multiple | NA | 6 | Report generation | https://github.com/WithStomach/MINIM
PathM3[152] | 2024 | ViT-g/14 | PatchGastric | 991 | FlanT5XL | PatchGastric | NA | 1 | WSI assistant | NA
FGCR[153] | 2024 | ResNet50 | Custom, GastrADC | 3598, 991 | BERT | NA | 9.21 Mb | 6 | Report generation | https://github.com/hudingyi/FGCR
PromptBio[154] | 2024 | PLIP | TCGA, CPTAC | 482, 105 | GPT-4 | NA | NA | 1 | Report generation | https://github.com/DeepMed-Lab-ECNU/PromptBio
HistoCap[155] | 2024 | ViT | NA | 10K | BERT, BioBERT | GTEx datasets | NA | 40 | Report generation | https://github.com/ssen7/histo_cap_transformers
mSTAR[156] | 2024 | UNI | TCGA | 10K | BioBERT | Pathology Reports 11K | NA | 32 | Report generation | https://github.com/Innse/mSTAR
GPT-4 Enhanced[157] | 2025 | CTransPath | TCGA | NA | GPT-4 | ASCO, ESMO, Onkopedia | NA | 4 | Recommendation generation | https://github.com/Dyke-F/LLM_RAG_Agent
PRISM[158] | 2025 | Virchow, ViT-H | Virchow dataset | 587K | BioGPT | 195K Reports | 632M | 17 | Report generation | NA
HistoGPT[159] | 2025 | CTransPath, UNI | Custom | 15K | BioGPT | Pathology Reports | 30M to 1.5B | 1 | WSI assistant | https://github.com/marrlab/HistoGPT
PathologyVLM[160] | 2025 | PLIP, CLIP | PCaption-0.8M | NA | LLaVA | PCaption-0.5M | NA | Multi | Report generation | https://github.com/ddw2AIGROUP2CQUP/PA-LLaVA
MUSK[161] | 2025 | Transformer | TCGA | 33K | Transformer | PubMed Central | 675M | 33 | Question answering | https://github.com/Lilab-stanford/MUSK

Starting with model development and architecture, a key trend lies in the integration of vision and language modules, as exemplified by SlideChat (Table 6)[136-161]. This model employs a dedicated vision encoder to process gigapixel WSIs and pairs it with a language model to enable multimodal conversational capabilities. Notably, this integration design allows SlideChat to answer complex GI tissue pathology questions based on WSI input, achieving an overall accuracy of 81.17% on the SlideBench-VQA (TCGA) benchmark[146]. This result not only validates the effectiveness of cross-modality integration but also highlights the need for targeted parameterization and optimization. Many MLLMs in this field, including those detailed in Supplementary Table 7, undergo fine-tuning of their text-component parameters on GI-cancer-specific datasets, a process that adjusts the models to better capture features such as the histological subtypes of gastric cancer, thereby laying a technical foundation for subsequent dataset utilization and clinical applications.
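Such parameter-efficient fine-tuning is often implemented with low-rank adapters (LoRA). The sketch below wraps a single linear layer with a LoRA-style adapter to show how only a small fraction of parameters is updated; it is a conceptual illustration, and practical pipelines typically apply such adapters to the attention projections of the language model via libraries such as PEFT.

```python
# Minimal sketch of a LoRA-style adapter wrapped around one linear layer of a
# text component, illustrating how only a small set of parameters is updated
# during GI-specific fine-tuning. Dimensions, rank, and scaling are placeholders.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():     # freeze the pre-trained weights
            p.requires_grad = False
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # adapter initially adds nothing
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable} / {total}")
```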

Closely tied to model advancement is the development of dataset utilization, as high-performance MLLMs rely on both diverse and specialized data sources to generalize to real-world GI cancer scenarios. On one hand, models in Table 6 leverage multi-modal datasets combining publicly available GI cancer image repositories and paired pathology reports, textual documents that detail histological features, diagnoses, and even patient clinical histories. These datasets, often containing thousands of image-text pairs, train MLLMs to establish meaningful correlations between tissue visual appearance and textual descriptions, a prerequisite for accurate clinical interpretation. On the other hand, to address unique challenges in GI pathology (such as WSI-specific analysis), specialized datasets have been developed. An example is the PathCap dataset (Supplementary Table 7), which focuses on multi-modal comprehension for pathology[142]. This dataset integrates WSI patches, associated clinical reports, and a rich collection of 207k image-caption pairs designed to simulate real-world diagnostic queries. By leveraging this multimodal dataset, researchers can train models to better understand the complex interplay between visual and textual information, thereby accelerating the translation of advanced AI techniques into actionable clinical insights.
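The image-text correlation learning described above typically relies on a symmetric contrastive objective. The following sketch implements that loss over a toy batch of paired embeddings; the embeddings are random placeholders standing in for patch- and caption-encoder outputs.

```python
# Minimal sketch of the symmetric contrastive (InfoNCE) objective used to align
# WSI patches with their captions in CLIP-style pre-training. The embeddings
# below are random placeholders standing in for encoder outputs.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.T / temperature        # pairwise similarities
    targets = torch.arange(img_emb.size(0))           # matched pairs on the diagonal
    loss_i = F.cross_entropy(logits, targets)         # image -> text direction
    loss_t = F.cross_entropy(logits.T, targets)       # text -> image direction
    return (loss_i + loss_t) / 2

img_emb = torch.rand(16, 512)   # patch embeddings for 16 image-caption pairs
txt_emb = torch.rand(16, 512)   # caption embeddings
print("contrastive loss:", clip_contrastive_loss(img_emb, txt_emb).item())
```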

The technical advancements in models and datasets have ultimately driven applications of MLLMs in GI cancer diagnosis and prognosis. In diagnosis, MLLMs excel at identifying distinct GI cancer types by linking histological image patterns to text-based diagnostic criteria; for example, models such as MuMo[148] and ConcepPath[149] can distinguish or predict EBV- or HER2-positive gastric cancer subtypes. Beyond diagnosis, MLLMs are also advancing prognosis prediction by integrating multi-source data: they extract histological features from images and combine them with patient-specific information from text reports (e.g., tumor stage, grade, molecular markers). Findings suggest that these multimodal prognostic models (e.g., CHIEF[139], PathGen[140], MuMo[148]) offer more comprehensive and accurate predictions than traditional methods relying solely on single-modality data, reflecting the synergistic progress of MLLMs across model design, data curation, and clinical translation in GI cancer pathology.

Despite this progress, current MLLMs in GI cancer pathology also face distinct limitations. First, data dependence and scarcity hinder generalization, limiting a model's ability to perform well on diverse datasets when training data are insufficient. Models like PathM3 (Table 6) rely on only 991 WSIs from the PatchGastric dataset[152], while MuMo uses a mere 429 WSIs[148]; such small sample sizes risk overfitting to specific tissue types or institutions. Even larger-scale models such as PathChat (999K WSIs) draw on broader but still non-representative datasets that lack diverse clinical settings[141]. Second, limited model accessibility and transparency pose barriers to widespread adoption and trust, owing to restricted availability and unclear operational mechanisms. Models including PRISM[158] and PathM3[152] lack open-source links, preventing independent validation by other researchers (Table 6), and even open models such as CHIEF require 8 V100 GPUs (Supplementary Table 7), a resource beyond many clinical laboratories[139]. Finally, many current models are designed for specific tasks, making them less useful for broader or more varied needs. Several models (e.g., HistGen[137], CONCH[145], FGCR[153]) focus solely on report generation, converting WSI features into text without supporting diagnostic or prognostic assistance. Only 3 of the 26 models (e.g., MUSK[161]) support question answering for rare GI cancer subtypes, and five models (e.g., CHIEF[139], ConcepPath[149]) are explicitly non-generative, performing only basic tasks such as classification and unable to address complex clinical needs such as report interpretation or treatment suggestions.

Future research on MLLMs in GI cancer pathology could address these weaknesses by making better use of the models’ latent potential and tackling key missing capabilities. First, the models’ ability to perform a broader range of clinical tasks could be expanded, enabling them to support diverse applications such as diagnostic assistance, prognosis prediction, and treatment recommendation. Second, the diversity, quality, and clinical relevance of training data could be enhanced by including a broader range of patient demographics, cancer subtypes (including rare forms), disease stages, and multimodal information to ensure that models generalize well across real-world clinical scenarios. Third, the integration of these models with real-world clinical workflows could be improved by ensuring that their outputs are not only accurate and interpretable but also actionable and relevant to practical needs.

DISCUSSION

This review retrospectively summarizes key and representative studies concerning the application of FMs in GI cancer research. Given that many artificial intelligence terms (e.g., zero-shot learning, the black-box problem) may be unfamiliar to medical researchers, Supplementary Table 8 defines the key terms used in this review for improved clarity. Owing to inherent limitations in literature search and screening, it is acknowledged that some studies may not have been included. Although numerous investigations have already shown that FMs have considerable potential in this domain, challenges remain in applying them and bringing them into clinical practice. For example, medical imaging and pathology data often follow different formats and standards across institutions, which hampers model generalization across settings, particularly for single-center studies[162]. Furthermore, publication bias remains a concern, whereby studies reporting positive outcomes are preferentially published, whereas negative or inconclusive results often remain unpublished, thereby skewing the overall scientific evidence base.

The extant evidence supporting the use of FMs in GI oncology is constrained by several methodological and practical limitations. First, with respect to data privacy and security, FMs typically necessitate large-scale datasets to achieve optimal performance, which inherently increases the risk of data breaches and unauthorized access[163]. Conventional de-identification techniques are increasingly insufficient, especially when integrating multimodal data types such as imaging, genomics, and EHRs, which may facilitate re-identification. To mitigate these risks, the incorporation of privacy-preserving technologies into model development is imperative[164]. Approaches such as federated learning enable model training across multiple institutions without sharing raw data, effectively shifting the model rather than the data. Differential privacy techniques introduce controlled noise during training to safeguard individual identities, while blockchain technology offers immutable systems for tracking data access and consent. Ensuring global compliance necessitates governance frameworks aligned with regulations such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA), thereby promoting secure and ethical data utilization.
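The following sketch combines two of these ideas in a toy setting: several simulated sites train a shared model locally, and only (noise-perturbed) parameters are aggregated. It is a conceptual illustration only; the noise level is arbitrary and does not constitute a calibrated differential-privacy guarantee.

```python
# Minimal sketch of federated averaging with additive Gaussian noise. Each
# simulated "site" trains locally on its own data; only noised weights are
# aggregated. All data, dimensions, and the noise scale are placeholders.
import copy
import torch
import torch.nn as nn

def local_update(model, data, target, lr=0.01):
    local = copy.deepcopy(model)                      # raw data never leaves the site
    opt = torch.optim.SGD(local.parameters(), lr=lr)
    loss = nn.functional.cross_entropy(local(data), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return local.state_dict()

def federated_average(states, noise_std=0.001):
    avg = copy.deepcopy(states[0])
    for key in avg:
        stacked = torch.stack([s[key].float() for s in states])
        avg[key] = stacked.mean(dim=0) + noise_std * torch.randn_like(avg[key].float())
    return avg

global_model = nn.Linear(20, 2)                       # toy risk classifier
sites = [(torch.rand(32, 20), torch.randint(0, 2, (32,))) for _ in range(3)]

for round_ in range(5):                               # a few federated rounds
    states = [local_update(global_model, x, y) for x, y in sites]
    global_model.load_state_dict(federated_average(states))
```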

Second, regarding model interpretability and clinical trust, FMs often function as "black boxes", with limited transparency in their decision-making processes, even to developers and end-users[165]. This lack of transparency can undermine clinician and patient confidence, as clear explanations for model-driven recommendations (e.g., the rationale for classifying a polyp as malignant) are typically required. Although explainable AI (XAI) tools such as Grad-CAM (for imaging models), SHAP, and LIME exist, their application within FMs remains limited and predominantly provides correlational rather than causal insights[166]. For example, Grad-CAM can highlight regions of interest in endoscopic images but does not elucidate causal relationships, such as why a specific genetic mutation influences treatment response predictions. This discrepancy highlights a critical gap between clinical needs for causal explanations and the correlational outputs currently provided by FMs. Bridging this gap necessitates the development of clinician-centric visualization interfaces that link model predictions to specific clinical features, including polyp size or histological characteristics. Interpretability should be regarded as a core performance metric alongside accuracy and sensitivity in FM validation studies, rather than an ancillary consideration. Additionally, integrating principles from human factors engineering into FM design can ensure that explanations align with clinical workflows and cognitive demands, thereby fostering greater acceptance.
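For illustration, the sketch below implements a minimal Grad-CAM using forward and backward hooks on a generic torchvision network; in practice, dedicated libraries and the actual endoscopic or CT model would be used, and, as noted above, the resulting heatmap indicates correlation rather than causation.

```python
# Minimal sketch of Grad-CAM via forward/backward hooks on a torchvision
# ResNet-18; the input tensor is a placeholder for an endoscopic or CT image.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()
activations, gradients = {}, {}

def fwd_hook(module, inp, out):
    activations["value"] = out.detach()

def bwd_hook(module, grad_in, grad_out):
    gradients["value"] = grad_out[0].detach()

layer = model.layer4[-1]                 # last convolutional block
layer.register_forward_hook(fwd_hook)
layer.register_full_backward_hook(bwd_hook)

image = torch.rand(1, 3, 224, 224)       # placeholder input
logits = model(image)
logits[0, logits.argmax()].backward()    # gradient of the predicted class score

weights = gradients["value"].mean(dim=(2, 3), keepdim=True)      # channel importance
cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)         # normalized heatmap
print(cam.shape)                          # torch.Size([1, 1, 224, 224])
```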

Third, with respect to bias and equity, many FM training datasets predominantly originate from high-income countries and large academic centers, resulting in the underrepresentation of minority populations and low-resource settings[167]. This imbalance introduces biases that may exacerbate health disparities. For example, existing studies have largely focused on specific patient groups, such as those from Asia, Europe, or the United States, potentially limiting model applicability to other populations. To address these issues, it is essential to actively curate diverse and representative datasets[167]. Fairness-aware training methodologies can adjust for demographic imbalances, and ongoing bias audits should be conducted post-deployment to monitor and recalibrate model performance across different subgroups, as illustrated in the sketch below.
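A bias audit of the kind recommended above can be as simple as stratifying a performance metric by subgroup, as in the following sketch with toy labels, scores, and a two-group demographic variable.

```python
# Minimal sketch of a post-deployment bias audit: the same model predictions are
# scored separately for each demographic subgroup to surface performance gaps.
# Labels, scores, and the subgroup column are toy placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
subgroups = np.array(["group_A"] * 200 + ["group_B"] * 200)
y_true = rng.integers(0, 2, size=400)
y_score = np.clip(y_true * 0.6 + rng.normal(0.2, 0.3, size=400), 0, 1)

for group in np.unique(subgroups):
    mask = subgroups == group
    auc = roc_auc_score(y_true[mask], y_score[mask])
    print(f"{group}: AUROC = {auc:.3f} (n = {mask.sum()})")
```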

With regard to regulatory pathways, current frameworks for medical AI are inadequately suited to FMs, which differ from traditional tools in their generalizability and capacity for continuous learning from new data[168]. Regulatory pathways such as the United States FDA’s De Novo classification and 510(k) clearance have been applied to certain AI-based diagnostic tools, such as the FDA-approved Paige Prostate software for identifying cancer cells in prostate pathology images[109]. However, FMs, which can be adapted for multiple tasks (e.g., CRC detection, chemotherapy response prediction, and high-risk patient identification), do not conform to these static, task-specific approval models. Consequently, novel regulatory paradigms are required: regulatory sandboxes may facilitate controlled pilot testing in real-world environments, while robust post-market surveillance could become standard practice to monitor long-term safety and efficacy. Ethical and legal challenges also warrant consideration[169]. For example, if an FM makes a mistake in diagnosing GI cancer, it could lead to inappropriate treatment, and it remains unclear who should be held responsible in such cases: the doctor, the model provider, or the patient. Until clear rules exist, it is difficult to balance risks and benefits for patients.

Finally, in regard to clinical validation and real-world deployment, most FM studies remain confined to technical validation phases, demonstrating high accuracy under controlled conditions[170]. However, such findings do not necessarily translate into clinical utility, defined by improvements in diagnosis, treatment decision-making, or patient outcomes. Operational feasibility, including seamless integration into existing clinical workflows without imposing additional burdens on healthcare providers, is infrequently evaluated. Moreover, cost-effectiveness analyses, such as whether FMs predicting chemotherapy response reduce unnecessary treatment expenditures, are scarce. Addressing these gaps requires rigorous, multicenter, prospective randomized controlled trials. Implementation science research should investigate FM performance across diverse healthcare systems and resource settings. Enhancing transparency through the establishment of public clinical trial registries, where study protocols, data, and outcomes are openly accessible, is also advocated.

CONCLUSION

In summary, FMs possess transformative potential for GI cancer care, ranging from facilitating early detection to enabling personalized therapeutic strategies. Nonetheless, technological advancements alone are insufficient for successful clinical translation; addressing technical limitations alongside ethical, regulatory, and equity-related challenges is imperative. The future role of FMs in GI oncology is not to supplant clinicians but to augment precision medicine. It is important to recognize that, both presently and prospectively, FMs and related tools will not replace endoscopists, radiologists, or pathologists. Their main role lies in providing professional analytical support, while final diagnosis and treatment decisions will still be led by clinicians. This partnership between humans and machines will remain key to improving patient care.

Footnotes

Provenance and peer review: Invited article; Externally peer reviewed.

Peer-review model: Single blind

Specialty type: Gastroenterology and hepatology

Country of origin: China

Peer-review report’s classification

Scientific Quality: Grade B, Grade B, Grade B

Novelty: Grade B, Grade B, Grade C

Creativity or Innovation: Grade B, Grade C, Grade C

Scientific Significance: Grade B, Grade B, Grade C

P-Reviewer: Guo TH, MD, PhD, Researcher, China; Ma X, MD, China; Tlaiss Y, MD, Lebanon S-Editor: Li L L-Editor: A P-Editor: Zhao YQ

References
1.  Bray F, Laversanne M, Sung H, Ferlay J, Siegel RL, Soerjomataram I, Jemal A. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2024;74:229-263.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 5690]  [Cited by in RCA: 10818]  [Article Influence: 10818.0]  [Reference Citation Analysis (3)]
2.  Bordry N, Astaras C, Ongaro M, Goossens N, Frossard JL, Koessler T. Recent advances in gastrointestinal cancers. World J Gastroenterol. 2021;27:4493-4503.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in CrossRef: 10]  [Cited by in RCA: 21]  [Article Influence: 5.3]  [Reference Citation Analysis (0)]
3.  Lipkova J, Kather JN. The age of foundation models. Nat Rev Clin Oncol. 2024;21:769-770.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 9]  [Reference Citation Analysis (0)]
4.  Tsang KK, Kivelson S, Acitores Cortina JM, Kuchi A, Berkowitz JS, Liu H, Srinivasan A, Friedrich NA, Fatapour Y, Tatonetti NP. Foundation Models for Translational Cancer Biology. Annu Rev Biomed Data Sci. 2025;8:51-80.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 2]  [Cited by in RCA: 3]  [Article Influence: 3.0]  [Reference Citation Analysis (0)]
5.  Moor M, Banerjee O, Abad ZSH, Krumholz HM, Leskovec J, Topol EJ, Rajpurkar P. Foundation models for generalist medical artificial intelligence. Nature. 2023;616:259-265.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 105]  [Cited by in RCA: 610]  [Article Influence: 305.0]  [Reference Citation Analysis (0)]
6.  Zeng R, Gou H, Lau HCH, Yu J. Stomach microbiota in gastric cancer development and clinical implications. Gut. 2024;73:2062-2073.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 6]  [Cited by in RCA: 59]  [Article Influence: 59.0]  [Reference Citation Analysis (0)]
7.  Cao JS, Lu ZY, Chen MY, Zhang B, Juengpanich S, Hu JH, Li SJ, Topatana W, Zhou XY, Feng X, Shen JL, Liu Y, Cai XJ. Artificial intelligence in gastroenterology and hepatology: Status and challenges. World J Gastroenterol. 2021;27:1664-1690.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in CrossRef: 16]  [Cited by in RCA: 20]  [Article Influence: 5.0]  [Reference Citation Analysis (1)]
8.  Kröner PT, Engels MM, Glicksberg BS, Johnson KW, Mzaik O, van Hooft JE, Wallace MB, El-Serag HB, Krittanawong C. Artificial intelligence in gastroenterology: A state-of-the-art review. World J Gastroenterol. 2021;27:6794-6824.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in CrossRef: 28]  [Cited by in RCA: 101]  [Article Influence: 25.3]  [Reference Citation Analysis (7)]
9.  Chen RJ, Ding T, Lu MY, Williamson DFK, Jaume G, Song AH, Chen B, Zhang A, Shao D, Shaban M, Williams M, Oldenburg L, Weishaupt LL, Wang JJ, Vaidya A, Le LP, Gerber G, Sahai S, Williams W, Mahmood F. Towards a general-purpose foundation model for computational pathology. Nat Med. 2024;30:850-862.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 660]  [Cited by in RCA: 382]  [Article Influence: 382.0]  [Reference Citation Analysis (0)]
10.  Zhou C, Li Q, Li C, Yu J, Liu Y, Wang G, Zhang K, Ji C, Yan Q, He L, Peng H, Li J, Wu J, Liu Z, Xie P, Xiong C, Pei J, Yu PS, Sun L. A comprehensive survey on pretrained foundation models: a history from BERT to ChatGPT. Int J Mach Learn Cyber.  2024.  [PubMed]  [DOI]  [Full Text]
11.  Bommasani R, Hudson DA, Adeli E, Altman R, Arora S, von Arx S, Bernstein MS, Bohg J, Bosselut A, Brunskill E, Brynjolfsson E, Buch S, Card D, Castellon R, Chatterji N, Chen A, Creel K, Quincy Davis J, Demszky D, Donahue C, Doumbouya M, Durmus E, Ermon S, Etchemendy J, Ethayarajh K, Fei-Fei L, Finn C, Gale T, Gillespie L, Goel K, Goodman N, Grossman S, Guha N, Hashimoto T, Henderson P, Hewitt J, Ho DE, Hong J, Hsu K, Huang J, Icard T, Jain S, Jurafsky D, Kalluri P, Karamcheti S, Keeling G, Khani F, Khattab O, Koh PW, Krass M, Krishna R, Kuditipudi R, Kumar A, Ladhak F, Lee M, Lee T, Leskovec J, Levent I, Li XL, Li X, Ma T, Malik A, Manning CD, Mirchandani S, Mitchell E, Munyikwa Z, Nair S, Narayan A, Narayanan D, Newman B, Nie A, Niebles JC, Nilforoshan H, Nyarko J, Ogut G, Orr L, Papadimitriou I, Park JS, Piech C, Portelance E, Potts C, Raghunathan A, Reich R, Ren H, Rong F, Roohani Y, Ruiz C, Ryan J, Ré C, Sadigh D, Sagawa S, Santhanam K, Shih A, Srinivasan K, Tamkin A, Taori R, Thomas AW, Tramèr F, Wang RE, Wang W, Wu B, Wu J, Wu Y, Xie SM, Yasunaga M, You J, Zaharia M, Zhang M, Zhang T, Zhang X, Zhang Y, Zheng L, Zhou K, Liang P.   On the Opportunities and Risks of Foundation Models. 2022 Preprint. Available from: arXiv:2108.07258.  [PubMed]  [DOI]  [Full Text]
12.  Turing AM. I.—Computing Machinery And Intelligence. Mind. 1950;LIX:433-460.  [PubMed]  [DOI]  [Full Text]
13.  McCarthy J, Minsky ML, Rochester N, Shannon CE. A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence: August 31, 1955. AI Mag. 1955;27:12-14.  [PubMed]  [DOI]  [Full Text]
14.  Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev. 1958;65:386-408.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 4749]  [Cited by in RCA: 2150]  [Article Influence: 32.1]  [Reference Citation Analysis (0)]
15.  LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436-444.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 36149]  [Cited by in RCA: 20727]  [Article Influence: 2072.7]  [Reference Citation Analysis (0)]
16.  Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I.   Attention is all you need. In: NIPS'17. Proceedings of the 31st International Conference on Neural Information Processing Systems; 2017; Long Beach, CA, United States. Red Hook, NY, United States: Curran Associates Inc., 2017: 6000-6010.  [PubMed]  [DOI]
17.  Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler DM, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D.   Language models are few-shot learners. In: NIPS '20. Proceedings of the 34th International Conference on Neural Information Processing Systems; 2020; Vancouver, BC, Canada. Red Hook, NY, United States: Curran Associates Inc., 2020: 25.  [PubMed]  [DOI]
18.  Devlin J, Chang M, Lee K, Toutanova K.   BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein J, Doran C, Solorio T, editors. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, MN: Association for Computational Linguistics, 2019: 4171-4186.  [PubMed]  [DOI]  [Full Text]
19.  Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, Leoni Aleman F, Almeida D, Altenschmidt J, Altman S, Anadkat S, Avila R, Babuschkin I, Balaji S, Balcom V, Baltescu P, Bao H, Bavarian M, Belgum J, Bello I, Berdine J, Bernadett-Shapiro G, Berner C, Bogdonoff L, Boiko O, Boyd M, Brakman A-L, Brockman G, Brooks T, Brundage M, Button K, Cai T, Campbell R, Cann A, Carey B, Carlson C, Carmichael R, Chan B, Chang C, Chantzis F, Chen D, Chen S, Chen R, Chen J, Chen M, Chess B, Cho C, Chu C, Chung HW, Cummings D, Currier J, Dai Y, Decareaux C, Degry T, Deutsch N, Deville D, Dhar A, Dohan D, Dowling S, Dunning S, Ecoffet A, Eleti A, Eloundou T, Farhi D, Fedus L, Felix N, Posada Fishman S, Forte J, Fulford I, Gao L, Georges E, Gibson C, Goel V, Gogineni T, Goh G, Gontijo-Lopes R, Gordon J, Grafstein M, Gray S, Greene R, Gross J, Gu SS, Guo Y, Hallacy C, Han J, Harris J, He Y, Heaton M, Heidecke J, Hesse C, Hickey A, Hickey W, Hoeschele P, Houghton B, Hsu K, Hu S, Hu X, Huizinga J, Jain S, Jain S, Jang J, Jiang A, Jiang R, Jin H, Jin D, Jomoto S, Jonn B, Jun H, Kaftan T, Kaiser Ł, Kamali A, Kanitscheider I, Shirish Keskar N, Khan T, Kilpatrick L, Kim JW, Kim C, Kim Y, Hendrik Kirchner J, Kiros J, Knight M, Kokotajlo D, Kondraciuk Ł, Kondrich A, Konstantinidis A, Kosic K, Krueger G, Kuo V, Lampe M, Lan I, Lee T, Leike J, Leung J, Levy D, Li CM, Lim R, Lin M, Lin S, Litwin M, Lopez T, Lowe R, Lue P, Makanju A, Malfacini K, Manning S, Markov T, Markovski Y, Martin B, Mayer K, Mayne A, McGrew B, McKinney SM, McLeavey C, McMillan P, McNeil J, Medina D, Mehta A, Menick J, Metz L, Mishchenko A, Mishkin P, Monaco V, Morikawa E, Mossing D, Mu T, Murati M, Murk O, Mély D, Nair A, Nakano R, Nayak R, Neelakantan A, Ngo R, Noh H, Ouyang L, O'Keefe C, Pachocki J, Paino A, Palermo J, Pantuliano A, Parascandolo G, Parish J, Parparita E, Passos A, Pavlov M, Peng A, Perelman A, de Avila Belbute Peres F, Petrov M, Ponde de Oliveira Pinto H, Michael, Pokorny, Pokrass M, Pong VH, Powell T, Power A, Power B, Proehl E, Puri R, Radford A; OpenAI.   GPT-4 Technical Report. 2023 Preprint. Available from: eprint arXiv:2303.08774.  [PubMed]  [DOI]  [Full Text]
20.  Guo D, Yang D, Zhang H, Song J, Zhang R, Xu R, Zhu Q, Ma S, Wang P, Bi X, Zhang X, Yu X, Wu Y, Wu ZF, Gou Z, Shao Z, Li Z, Gao Z, Liu A, Xue B, Wang B, Wu B, Feng B, Lu C, Zhao C, Deng C, Zhang C, Ruan C, Dai D, Chen D, Ji D, Li E, Lin F, Dai F, Luo F, Hao G, Chen G, Li G, Zhang H, Bao H, Xu H, Wang H, Ding H, Xin H, Gao H, Qu H, Li H, Guo J, Li J, Wang J, Chen J, Yuan J, Qiu J, Li J, Cai JL, Ni J, Liang J, Chen J, Dong K, Hu K, Gao K, Guan K, Huang K, Yu K, Wang L, Zhang L, Zhao L, Wang L, Zhang L, Xu L, Xia L, Zhang M, Zhang M, Tang M, Li M, Wang M, Li M, Tian N, Huang P, Zhang P, Wang Q, Chen Q, Du Q, Ge R, Zhang R, Pan R, Wang R, Chen RJ, Jin RL, Chen R, Lu S, Zhou S, Chen S, Ye S, Wang S, Yu S, Zhou S, Pan S, Li SS, Zhou S, Wu S, Ye S, Yun T, Pei T, Sun T, Wang T, Zeng W, Zhao W, Liu W, Liang W, Gao W, Yu W, Zhang W, Xiao WL, An W, Liu X, Wang X, Chen X, Nie X, Cheng X, Liu X, Xie X, Liu X, Yang X, Li X, Su X, Lin X, Li XQ, Jin X, Shen X, Chen X, Sun X, Wang X, Song X, Zhou X, Wang X, Shan X, Li YK, Wang YQ, Wei YX, Zhang Y, Xu Y, Li Y, Zhao Y, Sun Y, Wang Y, Yu Y, Zhang Y, Shi Y, Xiong Y, He Y, Piao Y, Wang Y, Tan Y, Ma Y, Liu Y, Guo Y, Ou Y, Wang Y, Gong Y, Zou Y, He Y, Xiong Y, Luo Y, You Y, Liu Y, Zhou Y, Zhu YX, Xu Y, Huang Y, Li Y, Zheng Y, Zhu Y, Ma Y, Tang Y, Zha Y, Yan Y, Ren ZZ, Ren Z, Sha Z, Fu Z, Xu Z, Xie Z, Zhang Z, Hao Z, Ma Z, Yan Z, Wu Z, Gu Z, Zhu Z, Liu Z, Li Z, Xie Z, Song Z, Pan Z, Huang Z, Xu Z, Zhang Z, Zhang Z; DeepSeek-AI.   DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. 2025 Preprint. Available from: eprint arXiv:2501.12948.  [PubMed]  [DOI]  [Full Text]
21.  Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Krueger G, Sutskever I.   Learning Transferable Visual Models From Natural Language Supervision. In: Meila M, Zhang T, editors. Proceedings of Machine Learning Research. Proceedings of the 38th International Conference on Machine Learning. PMLR, 2021: 8748-8763.  [PubMed]  [DOI]
22.  Pai S, Bontempi D, Hadzic I, Prudente V, Sokač M, Chaunzwa TL, Bernatz S, Hosny A, Mak RH, Birkbak NJ, Aerts HJWL. Foundation model for cancer imaging biomarkers. Nat Mach Intell. 2024;6:354-367.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 5]  [Cited by in RCA: 70]  [Article Influence: 70.0]  [Reference Citation Analysis (0)]
23.  Shen D, Wu G, Suk HI. Deep Learning in Medical Image Analysis. Annu Rev Biomed Eng. 2017;19:221-248.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 2581]  [Cited by in RCA: 2033]  [Article Influence: 254.1]  [Reference Citation Analysis (0)]
24.  Alsentzer E, Murphy J, Boag W, Weng W, Jindi D, Naumann T, Mcdermott M.   Publicly Available Clinical BERT Embeddings. In: Rumshisky A, Roberts K, Bethard S, Naumann T, editors. Proceedings of the 2nd Clinical Natural Language Processing Workshop. Minneapolis, MN, United States: Association for Computational Linguistics, 2019: 72-78.  [PubMed]  [DOI]  [Full Text]
25.  Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J, Houlsby N.   An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. 9th International Conference on Learning Representations. Austria: ICLR, 2021.  [PubMed]  [DOI]
26.  Zhou B, Yang G, Shi Z, Ma S. Natural Language Processing for Smart Healthcare. IEEE Rev Biomed Eng. 2024;17:4-18.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 58]  [Cited by in RCA: 34]  [Article Influence: 34.0]  [Reference Citation Analysis (0)]
27.  Hou JK, Imler TD, Imperiale TF. Current and future applications of natural language processing in the field of digestive diseases. Clin Gastroenterol Hepatol. 2014;12:1257-1261.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 22]  [Cited by in RCA: 26]  [Article Influence: 2.4]  [Reference Citation Analysis (0)]
28.  Team G, Anil R, Borgeaud S, Alayrac JB, Yu J, Soricut R, Schalkwyk J, Dai AM, Hauth A, Millican K, Silver D, Johnson M, Antonoglou I, Schrittwieser J, Glaese A, Chen J, Pitler E, Lillicrap T, Lazaridou A, Firat O, Molloy J, Isard M, Barham PR, Hennigan T, Lee B, Viola F, Reynolds M, Xu Y, Doherty R, Collins E, Meyer C, Rutherford E, Moreira E, Ayoub K, Goel M, Krawczyk J, Du C, Chi E, Cheng H-T, Ni E, Shah P, Kane P, Chan B, Faruqui M, Severyn A, Lin H, Li Y, Cheng Y, Ittycheriah A, Mahdieh M, Chen M, Sun P, Tran D, Bagri S, Lakshminarayanan B, Liu J, Orban A, Güra F, Zhou H, Song X, Boffy A, Ganapathy H, Zheng S, Choe H, Weisz Á, Zhu T, Lu Y, Gopal S, Kahn J, Kula M, Pitman J, Shah R, Taropa E, Al Merey M, Baeuml M, Chen Z, El Shafey L, Zhang Y, Sercinoglu O, Tucker G, Piqueras E, Krikun M, Barr I, Savinov N, Danihelka I, Roelofs B, White A, Andreassen A, von Glehn T, Yagati L, Kazemi M, Gonzalez L, Khalman M, Sygnowski J, Frechette A, Smith C, Culp L, Proleev L, Luan Y, Chen X, Lottes J, Schucher N, Lebron F, Rrustemi A, Clay N, Crone P, Kocisky T, Zhao J, Perz B, Yu D, Howard H, Bloniarz A, Rae JW, Lu H, Sifre L, Maggioni M, Alcober F, Garrette D, Barnes M, Thakoor S, Austin J, Barth-Maron G, Wong W, Joshi R, Chaabouni R, Fatiha D, Ahuja A, Singh Tomar G, Senter E, Chadwick M, Kornakov I, Attaluri N, Iturrate I, Liu R, Li Y, Cogan S, Chen J, Jia C, Gu C, Zhang Q, Grimstad J, Jakse Hartman A, Garcia X, Sankaranarayana Pillai T, Devlin J, Laskin M, de Las Casas D, Valter D, Tao C, Blanco L, Puigdomènech Badia A, Reitter D, Chen M, Brennan J, Rivera C, Brin S, Iqbal S, Surita G, Labanowski J, Rao A, Winkler S, Parisotto E, Gu Y, Olszewska K, Addanki R, Miech A, Louis A, Teplyashin D, Brown G, Catt E, Balaguer J, Xiang J, Wang P, Ashwood Z, Briukhov A, Webson A, Ganapathy S, Sanghavi S, Kannan A, Chang M-W, Stjerngren A, Djolonga J, Sun Y, Bapna A, Aitchison M, Pejman P, Michalewski H, Yu T, Wang C, Love J, Ahn J, Bloxwich D, Han K, Humphreys P, Sellam T, Bradbury J, Godbole V, Samangooei S, Damoc B, Kaskasoli A.   Gemini: A Family of Highly Capable Multimodal Models. 2023 Preprint. Available from: eprint arXiv:2312.11805.  [PubMed]  [DOI]  [Full Text]
29.  Syed S, Angel AJ, Syeda HB, Jennings CF, VanScoy J, Syed M, Greer M, Bhattacharyya S, Zozus M, Tharian B, Prior F. The h-ANN Model: Comprehensive Colonoscopy Concept Compilation Using Combined Contextual Embeddings. Biomed Eng Syst Technol Int Jt Conf BIOSTEC Revis Sel Pap. 2022;5:189-200.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 1]  [Cited by in RCA: 7]  [Article Influence: 2.3]  [Reference Citation Analysis (0)]
30.  Lahat A, Shachar E, Avidan B, Glicksberg B, Klang E. Evaluating the Utility of a Large Language Model in Answering Common Patients' Gastrointestinal Health-Related Questions: Are We There Yet? Diagnostics (Basel). 2023;13:1950.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 65]  [Reference Citation Analysis (0)]
31.  Lee TC, Staller K, Botoman V, Pathipati MP, Varma S, Kuo B. ChatGPT Answers Common Patient Questions About Colonoscopy. Gastroenterology. 2023;165:509-511.e7.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 55]  [Cited by in RCA: 100]  [Article Influence: 50.0]  [Reference Citation Analysis (0)]
32.  Emile SH, Horesh N, Freund M, Pellino G, Oliveira L, Wignakumar A, Wexner SD. How appropriate are answers of online chat-based artificial intelligence (ChatGPT) to common questions on colon cancer? Surgery. 2023;174:1273-1275.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 35]  [Cited by in RCA: 35]  [Article Influence: 17.5]  [Reference Citation Analysis (0)]
33.  Moazzam Z, Cloyd J, Lima HA, Pawlik TM. Quality of ChatGPT Responses to Questions Related to Pancreatic Cancer and its Surgical Care. Ann Surg Oncol. 2023;30:6284-6286.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 3]  [Cited by in RCA: 20]  [Article Influence: 10.0]  [Reference Citation Analysis (0)]
34.  Yeo YH, Samaan JS, Ng WH, Ting PS, Trivedi H, Vipani A, Ayoub W, Yang JD, Liran O, Spiegel B, Kuo A. Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma. Clin Mol Hepatol. 2023;29:721-732.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 177]  [Cited by in RCA: 370]  [Article Influence: 185.0]  [Reference Citation Analysis (0)]
35.  Cao JJ, Kwon DH, Ghaziani TT, Kwo P, Tse G, Kesselman A, Kamaya A, Tse JR. Accuracy of Information Provided by ChatGPT Regarding Liver Cancer Surveillance and Diagnosis. AJR Am J Roentgenol. 2023;221:556-559.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 16]  [Cited by in RCA: 46]  [Article Influence: 23.0]  [Reference Citation Analysis (0)]
36.  Gorelik Y, Ghersin I, Arraf T, Ben-ishai O, Klein A, Khamaysi I. Using A Customized Gpt To Provide Guideline-Based Recommendations For The Management Of Pancreatic Mucinous Cystic Lesions. Gastrointest Endosc. 2024;99:AB42.  [PubMed]  [DOI]  [Full Text]
37.  Gorelik Y, Ghersin I, Maza I, Klein A. Harnessing language models for streamlined postcolonoscopy patient management: a novel approach. Gastrointest Endosc. 2023;98:639-641.e4.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 1]  [Cited by in RCA: 29]  [Article Influence: 14.5]  [Reference Citation Analysis (0)]
38.  Zhou J, Li T, James Fong S, Dey N, González Crespo R. Exploring ChatGPT's Potential for Consultation, Recommendations and Report Diagnosis: Gastric Cancer and Gastroscopy Reports’ Case. Int J Interact Multimed Artif Intell. 2023;8:7-13.  [PubMed]  [DOI]  [Full Text]
39.  Yang Z, Lu Y, Bagdasarian J, Das Swain V, Agarwal R, Campbell C, Al-Refaire W, El-Bayoumi J, Gao G, Wang D, Yao B, Shara N.   RECOVER: Designing a Large Language Model-based Remote Patient Monitoring System for Postoperative Gastrointestinal Cancer Care. 2025 Preprint. Available from: eprint arXiv:2502.05740.  [PubMed]  [DOI]  [Full Text]
40.  Kerbage A, Kassab J, El Dahdah J, Burke CA, Achkar JP, Rouphael C. Accuracy of ChatGPT in Common Gastrointestinal Diseases: Impact for Patients and Providers. Clin Gastroenterol Hepatol. 2024;22:1323-1325.e3.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 6]  [Cited by in RCA: 36]  [Article Influence: 36.0]  [Reference Citation Analysis (0)]
41.  Tariq R, Malik S, Khanna S. Evolving Landscape of Large Language Models: An Evaluation of ChatGPT and Bard in Answering Patient Queries on Colonoscopy. Gastroenterology. 2024;166:220-221.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 26]  [Cited by in RCA: 23]  [Article Influence: 23.0]  [Reference Citation Analysis (0)]
42.  Maida M, Ramai D, Mori Y, Dinis-Ribeiro M, Facciorusso A, Hassan C; and the AI-CORE (Artificial Intelligence COlorectal cancer Research) Working Group. The role of generative language systems in increasing patient awareness of colon cancer screening. Endoscopy. 2025;57:262-268.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 4]  [Cited by in RCA: 11]  [Article Influence: 11.0]  [Reference Citation Analysis (0)]
43.  Atarere J, Naqvi H, Haas C, Adewunmi C, Bandaru S, Allamneni R, Ugonabo O, Egbo O, Umoren M, Kanth P. Applicability of Online Chat-Based Artificial Intelligence Models to Colorectal Cancer Screening. Dig Dis Sci. 2024;69:791-797.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 17]  [Reference Citation Analysis (0)]
44.  Chang PW, Amini MM, Davis RO, Nguyen DD, Dodge JL, Lee H, Sheibani S, Phan J, Buxbaum JL, Sahakian AB. ChatGPT4 Outperforms Endoscopists for Determination of Postcolonoscopy Rescreening and Surveillance Recommendations. Clin Gastroenterol Hepatol. 2024;22:1917-1925.e17.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 1]  [Cited by in RCA: 13]  [Article Influence: 13.0]  [Reference Citation Analysis (0)]
45.  Lim DYZ, Tan YB, Koh JTE, Tung JYM, Sng GGR, Tan DMY, Tan CK. ChatGPT on guidelines: Providing contextual knowledge to GPT allows it to provide advice on appropriate colonoscopy intervals. J Gastroenterol Hepatol. 2024;39:81-106.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 2]  [Cited by in RCA: 34]  [Article Influence: 34.0]  [Reference Citation Analysis (0)]
46.  Munir MM, Endo Y, Ejaz A, Dillhoff M, Cloyd JM, Pawlik TM. Online artificial intelligence platforms and their applicability to gastrointestinal surgical operations. J Gastrointest Surg. 2024;28:64-69.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 14]  [Reference Citation Analysis (0)]
47.  Truhn D, Loeffler CM, Müller-Franzes G, Nebelung S, Hewitt KJ, Brandner S, Bressem KK, Foersch S, Kather JN. Extracting structured information from unstructured histopathology reports using generative pre-trained transformer 4 (GPT-4). J Pathol. 2024;262:310-319.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 2]  [Cited by in RCA: 45]  [Article Influence: 45.0]  [Reference Citation Analysis (0)]
48.  Choo JM, Ryu HS, Kim JS, Cheong JY, Baek SJ, Kwak JM, Kim J. Conversational artificial intelligence (chatGPT™) in the management of complex colorectal cancer patients: early experience. ANZ J Surg. 2024;94:356-361.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 6]  [Cited by in RCA: 24]  [Article Influence: 24.0]  [Reference Citation Analysis (0)]
49.  Huo B, McKechnie T, Ortenzi M, Lee Y, Antoniou S, Mayol J, Ahmed H, Boudreau V, Ramji K, Eskicioglu C. Dr. GPT will see you now: the ability of large language model-linked chatbots to provide colorectal cancer screening recommendations. Health Technol. 2024;14:463-469.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 6]  [Cited by in RCA: 15]  [Article Influence: 15.0]  [Reference Citation Analysis (0)]
50.  Pereyra L, Schlottmann F, Steinberg L, Lasa J. Colorectal Cancer Prevention: Is Chat Generative Pretrained Transformer (Chat GPT) Ready to Assist Physicians in Determining Appropriate Screening and Surveillance Recommendations? J Clin Gastroenterol. 2024;58:1022-1027.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 2]  [Cited by in RCA: 9]  [Article Influence: 9.0]  [Reference Citation Analysis (0)]
51.  Peng W, Feng Y, Yao C, Zhang S, Zhuo H, Qiu T, Zhang Y, Tang J, Gu Y, Sun Y. Evaluating AI in medicine: a comparative analysis of expert and ChatGPT responses to colorectal cancer questions. Sci Rep. 2024;14:2840.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 21]  [Reference Citation Analysis (0)]
52.  Ma H, Ma X, Yang C, Niu Q, Gao T, Liu C, Chen Y. Development and evaluation of a program based on a generative pre-trained transformer model from a public natural language processing platform for efficiency enhancement in post-procedural quality control of esophageal endoscopic submucosal dissection. Surg Endosc. 2024;38:1264-1272.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 1]  [Cited by in RCA: 3]  [Article Influence: 3.0]  [Reference Citation Analysis (0)]
53.  Cohen AB, Adamson B, Larch JK, Amster G. Large Language Model Extraction of PD-L1 Biomarker Testing Details From Electronic Health Records. AI Precis Oncol. 2025;2:57-64.  [PubMed]  [DOI]  [Full Text]
54.  Scherbakov D, Heider PM, Wehbe R, Alekseyenko AV, Lenert LA, Obeid JS. Using large language models for extracting stressful life events to assess their impact on preventive colon cancer screening adherence. BMC Public Health. 2025;25:12.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 4]  [Reference Citation Analysis (0)]
55.  Chatziisaak D, Burri P, Sparn M, Hahnloser D, Steffen T, Bischofberger S. Concordance of ChatGPT artificial intelligence decision-making in colorectal cancer multidisciplinary meetings: retrospective study. BJS Open. 2025;9:zraf040.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in RCA: 6]  [Reference Citation Analysis (0)]
56.  Saraiva MM, Ribeiro T, Agudo B, Afonso J, Mendes F, Martins M, Cardoso P, Mota J, Almeida MJ, Costa A, Gonzalez Haba Ruiz M, Widmer J, Moura E, Javed A, Manzione T, Nadal S, Barroso LF, de Parades V, Ferreira J, Macedo G. Evaluating ChatGPT-4 for the Interpretation of Images from Several Diagnostic Techniques in Gastroenterology. J Clin Med. 2025;14:572.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 3]  [Reference Citation Analysis (0)]
57.  Siu AHY, Gibson DP, Chiu C, Kwok A, Irwin M, Christie A, Koh CE, Keshava A, Reece M, Suen M, Rickard MJFX. ChatGPT as a patient education tool in colorectal cancer-An in-depth assessment of efficacy, quality and readability. Colorectal Dis. 2025;27:e17267.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 5]  [Reference Citation Analysis (0)]
58.  Horesh N, Emile SH, Gupta S, Garoufalia Z, Gefen R, Zhou P, da Silva G, Wexner SD. Comparing the Management Recommendations of Large Language Model and Colorectal Cancer Multidisciplinary Team: A Pilot Study. Dis Colon Rectum. 2025;68:41-47.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 2]  [Cited by in RCA: 6]  [Article Influence: 6.0]  [Reference Citation Analysis (0)]
59.  Ellison IE, Oslock WM, Abdullah A, Wood L, Thirumalai M, English N, Jones BA, Hollis R, Rubyan M, Chu DI. De novo generation of colorectal patient educational materials using large language models: Prompt engineering key to improved readability. Surgery. 2025;180:109024.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 2]  [Cited by in RCA: 4]  [Article Influence: 4.0]  [Reference Citation Analysis (0)]
60.  Ramchandani R, Guo E, Rakab E, Rathod J, Strain J, Klement W, Shorr R, Williams E, Jones D, Gilbert S. Validation of automated paper screening for esophagectomy systematic review using large language models. PeerJ Comput Sci. 2025;11:e2822.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in RCA: 1]  [Reference Citation Analysis (0)]
61.  Zhang H, Dong F, Li W, Ren Y, Dong H.   HepatoAudit: A Comprehensive Dataset for Evaluating Consistency of Large Language Models in Hepatobiliary Case Record Diagnosis. 2025 IEEE 17th International Conference on Computer Research and Development (ICCRD); 2025 Jan 17-19; Shangrao, China. IEEE, 2025: 234-239.  [PubMed]  [DOI]  [Full Text]
62.  Spitzl D, Mergen M, Bauer U, Jungmann F, Bressem KK, Busch F, Makowski MR, Adams LC, Gassert FT. Leveraging large language models for accurate classification of liver lesions from MRI reports. Comput Struct Biotechnol J. 2025;27:2139-2146.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 2]  [Cited by in RCA: 4]  [Article Influence: 4.0]  [Reference Citation Analysis (0)]
63.  Sheng L, Chen Y, Wei H, Che F, Wu Y, Qin Q, Yang C, Wang Y, Peng J, Bashir MR, Ronot M, Song B, Jiang H. Large Language Models for Diagnosing Focal Liver Lesions From CT/MRI Reports: A Comparative Study With Radiologists. Liver Int. 2025;45:e70115.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 2]  [Reference Citation Analysis (0)]
64.  Williams CY, Sarkar U, Adler-Milstein J, Rotenstein L.   Using Large Language Models to Determine Reasons for Missed Colon Cancer Screening Follow-Up. 2025 Preprint. Available from: medrxiv:25329439.  [PubMed]  [DOI]  [Full Text]
65.  Lu K, Lu J, Xu H, Guo K, Zhang Q, Lin H, Grosser M, Zhang Y, Zhang G. Genomics-Enhanced Cancer Risk Prediction for Personalized LLM-Driven Healthcare Recommender Systems. ACM Trans Inf Syst. 2025;43:1-30.  [PubMed]  [DOI]  [Full Text]
66.  Yang X, Xiao Y, Liu D, Zhang Y, Deng H, Huang J, Shi H, Liu D, Liang M, Jin X, Sun Y, Yao J, Zhou X, Guo W, He Y, Tang W, Xu C. Enhancing doctor-patient communication using large language models for pathology report interpretation. BMC Med Inform Decis Mak. 2025;25:36.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 3]  [Cited by in RCA: 8]  [Article Influence: 8.0]  [Reference Citation Analysis (0)]
67.  Jain S, Chakraborty B, Agarwal A, Sharma R. Performance of Large Language Models (ChatGPT and Gemini Advanced) in Gastrointestinal Pathology and Clinical Review of Applications in Gastroenterology. Cureus. 2025;17:e81618.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 2]  [Reference Citation Analysis (0)]
68.  Xu J, Wang J, Li J, Zhu Z, Fu X, Cai W, Song R, Wang T, Li H. Predicting Immunotherapy Response in Unresectable Hepatocellular Carcinoma: A Comparative Study of Large Language Models and Human Experts. J Med Syst. 2025;49:64.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 2]  [Reference Citation Analysis (0)]
69.  Deroy A, Maity S.   Cancer-Answer: Empowering Cancer Care with Advanced Large Language Models. 2025 Preprint. Available from: eprint arXiv:2411.06946.  [PubMed]  [DOI]  [Full Text]
70.  Ye X, Shi T, Huang D, Sakurai T. Multi-Omics clustering by integrating clinical features from large language model. Methods. 2025;239:64-71.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 1]  [Reference Citation Analysis (0)]
71.  Ma J, He Y, Li F, Han L, You C, Wang B. Segment anything in medical images. Nat Commun. 2024;15:654.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 267]  [Cited by in RCA: 530]  [Article Influence: 530.0]  [Reference Citation Analysis (0)]
72.  Ryu JS, Kang H, Chu Y, Yang S. Vision-language foundation models for medical imaging: a review of current practices and innovations. Biomed Eng Lett. 2025;15:809-830.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in RCA: 2]  [Reference Citation Analysis (0)]
73.  Rao VM, Hla M, Moor M, Adithan S, Kwak S, Topol EJ, Rajpurkar P. Multimodal generative AI for medical image interpretation. Nature. 2025;639:888-896.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 1]  [Cited by in RCA: 18]  [Article Influence: 18.0]  [Reference Citation Analysis (0)]
74.  Zhang S, Xu Y, Usuyama N, Xu H, Bagga J, Tinn R, Preston S, Rao R, Wei M, Valluri N, Wong C, Tupini A, Wang Y, Mazzola M, Shukla S, Liden L, Gao J, Crabtree A, Piening B, Bifulco C, Lungren MP, Naumann T, Wang S, Poon H. A Multimodal Biomedical Foundation Model Trained from Fifteen Million Image-Text Pairs. NEJM AI. 2025;2.  [PubMed]  [DOI]  [Full Text]
75.  Zippelius C, Alqahtani SA, Schedel J, Brookman-Amissah D, Muehlenberg K, Federle C, Salzberger A, Schorr W, Pech O. Diagnostic accuracy of a novel artificial intelligence system for adenoma detection in daily practice: a prospective nonrandomized comparative study. Endoscopy. 2022;54:465-472.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 8]  [Cited by in RCA: 20]  [Article Influence: 6.7]  [Reference Citation Analysis (0)]
76.  Cui B, Islam M, Bai L, Ren H. Surgical-DINO: adapter learning of foundation models for depth estimation in endoscopic surgery. Int J Comput Assist Radiol Surg. 2024;19:1013-1020.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 1]  [Cited by in RCA: 6]  [Article Influence: 6.0]  [Reference Citation Analysis (0)]
77.  Wang J, Song S, Wang X, Wang Y, Miao Y, Su J, Zhou SK.   ProMISe: Promptable Medical Image Segmentation using SAM. 2024 Preprint. Available from: eprint arXiv:2403.04164.  [PubMed]  [DOI]  [Full Text]
78.  Li Y, Hu M, Yang X.   Polyp-SAM: transfer SAM for polyp segmentation. Proceedings of the Medical Imaging 2024: Computer-Aided Diagnosis; 2024 Feb 18-22; San Diego, CA, United States. SPIE, 2024: 759-765.  [PubMed]  [DOI]  [Full Text]
79.  Wang Z, Liu C, Zhang S, Dou Q.   Foundation Model for Endoscopy Video Analysis via Large-Scale Self-supervised Pre-train. In: Greenspan H, Madabhushi A, Mousavi P, Salcudean S, Duncan J, Syeda-Mahmood T, Taylor R, editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14228. Cham: Springer, 2023.  [PubMed]  [DOI]  [Full Text]
80.  Ji GP, Liu J, Xu P, Barnes N, Shahbaz Khan F, Khan S, Fan DP.   Frontiers in Intelligent Colonoscopy. 2024 Preprint. Available from: eprint arXiv:2410.17241.  [PubMed]  [DOI]  [Full Text]
81.  Raseena TP, Kumar J, Balasundaram SR. DeepCPD: deep learning with vision transformer for colorectal polyp detection. Multimed Tools Appl. 2024;83:78183-78206.  [PubMed]  [DOI]  [Full Text]
82.  Teufel T, Shu H, Soberanis-Mukul RD, Mangulabnan JE, Sahu M, Vedula SS, Ishii M, Hager G, Taylor RH, Unberath M. OneSLAM to map them all: a generalized approach to SLAM for monocular endoscopic imaging based on tracking any point. Int J Comput Assist Radiol Surg. 2024;19:1259-1266.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 4]  [Reference Citation Analysis (0)]
83.  Liu Y, Yuan X, Zhou Y.   EIVS: Unpaired Endoscopy Image Virtual Staining via State Space Generative Model. 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); 2024 Dec 03-06; Lisbon, Portugal. IEEE, 2025.  [PubMed]  [DOI]  [Full Text]
84.  Jing X, Zhou H, Mao K, Zhao Y, Chu L.   A Novel Automatic Prompt Tuning Method for Polyp Segmentation. 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); 2024 Dec 03-06; Lisbon, Portugal. IEEE, 2025.  [PubMed]  [DOI]  [Full Text]
85.  He D, Ma Z, Li C, Li Y. Dual-Branch Fully Convolutional Segment Anything Model for Lesion Segmentation in Endoscopic Images. IEEE Access. 2024;12:125654-125667.  [PubMed]  [DOI]  [Full Text]
86.  Li F, Huang Z, Zhou L, Chen Y, Tang S, Ding P, Peng H, Chu Y. Improved dual-aggregation polyp segmentation network combining a pyramid vision transformer with a fully convolutional network. Biomed Opt Express. 2024;15:2590-2621.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 1]  [Reference Citation Analysis (0)]
87.  Dermyer P, Kalra A, Schwartz M.   EndoDINO: A Foundation Model for GI Endoscopy. 2025 Preprint. Available from: eprint arXiv:2501.05488.  [PubMed]  [DOI]  [Full Text]
88.  Choudhuri A, Gao Z, Zheng M, Planche B, Chen T, Wu Z.   PolypSegTrack: Unified Foundation Model for Colonoscopy Video Analysis. 2025 Preprint. Available from: eprint arXiv:2503.24108.  [PubMed]  [DOI]  [Full Text]
89.  Chen H, Gou L, Fang Z, Dou Q, Chen H, Chen C, Qiu Y, Zhang J, Ning C, Hu Y, Deng H, Yu J, Li G. Artificial intelligence assisted real-time recognition of intra-abdominal metastasis during laparoscopic gastric cancer surgery. NPJ Digit Med. 2025;8:9.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 8]  [Reference Citation Analysis (0)]
90.  Mostafijur Rahman M, Munir M, Jha D, Bagci U, Marculescu R.   PP-SAM: Perturbed Prompts for Robust Adaptation of Segment Anything Model for Polyp Segmentation. 2024 Preprint. Available from: eprint arXiv:2405.16740.  [PubMed]  [DOI]  [Full Text]
91.  Wang G, Xiao H, Gao H, Zhang R, Bai L, Yang X, Li Z, Li H, Ren H.   CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection. 2024 Preprint. Available from: eprint arXiv:2410.07540.  [PubMed]  [DOI]  [Full Text]
92.  Tan S, Cai Y, Lin X, Qi W, Li Z, Wan X, Li G.   ColonCLIP: An Adaptable Prompt-Driven Multi-Modal Strategy for Colonoscopy Image Diagnosis. 2024 IEEE International Symposium on Biomedical Imaging (ISBI); 2024 May 27-30; Athens, Greece. IEEE, 2024.  [PubMed]  [DOI]  [Full Text]
93.  Yu J, Zhu Y, Fu P, Chen T, Huang J, Li Q, Zhou P, Wang Z, Wu F, Wang S, Yang X. Robust Polyp Detection and Diagnosis through Compositional Prompt-Guided Diffusion Models. IEEE Trans Med Imaging. 2025;PP.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 1]  [Reference Citation Analysis (0)]
94.  Sharma V, Jha D, Bhuyan MK, Das PK, Bagci U.   Diverse Image Generation with Diffusion Models and Cross Class Label Learning for Polyp Classification. 2025 Preprint. Available from: eprint arXiv:2502.05444.  [PubMed]  [DOI]  [Full Text]
95.  Karaosmanoglu AD, Onur MR, Arellano RS.   Imaging in Gastrointestinal Cancers. In: Yalcin S, Philip P, editors. Textbook of Gastrointestinal Oncology. Cham: Springer, 2019.  [PubMed]  [DOI]  [Full Text]
96.  Chong JJR, Kirpalani A, Moreland R, Colak E. Artificial Intelligence in Gastrointestinal Imaging: Advances and Applications. Radiol Clin North Am. 2025;63:477-490.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 1]  [Reference Citation Analysis (0)]
97.  Wu C, Zhang X, Zhang Y, Wang Y, Xie W.   Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data. 2023 Preprint. Available from: eprint arXiv:2308.02463.  [PubMed]  [DOI]  [Full Text]
98.  Cherti M, Beaumont R, Wightman R, Wortsman M, Ilharco G, Gordon C, Schuhmann C, Schmidt L, Jitsev J.   Reproducible Scaling Laws for Contrastive Language-Image Learning. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2023 Jun 17-24; Vancouver, BC, Canada. IEEE, 2023.  [PubMed]  [DOI]  [Full Text]
99.  Blankemeier L, Cohen JP, Kumar A, Van Veen D, Gardezi SJS, Paschali M, Chen Z, Delbrouck JB, Reis E, Truyts C, Bluethgen C, Jensen MEK, Ostmeier S, Varma M, Valanarasu JMJ, Fang Z, Huo Z, Nabulsi Z, Ardila D, Weng WH, Amaro E, Ahuja N, Fries J, Shah NH, Johnston A, Boutin RD, Wentland A, Langlotz CP, Hom J, Gatidis S, Chaudhari AS. Merlin: A Vision Language Foundation Model for 3D Computed Tomography. Res Sq. 2024;rs.3.rs-4546309.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 24]  [Cited by in RCA: 20]  [Article Influence: 20.0]  [Reference Citation Analysis (0)]
100.  Saab K, Tu T, Weng WH, Tanno R, Stutz D, Wulczyn E, Zhang F, Strother T, Park C, Vedadi E, Zambrano Chaves J, Hu SY, Schaekermann M, Kamath A, Cheng Y, Barrett DGT, Cheung C, Mustafa B, Palepu A, McDuff D, Hou L, Golany T, Liu L, Alayrac JB, Houlsby N, Tomasev N, Freyberg J, Lau C, Kemp J, Lai J, Azizi S, Kanada K, Man S, Kulkarni K, Sun R, Shakeri S, He L, Caine B, Webson A, Latysheva N, Johnson M, Mansfield P, Lu J, Rivlin E, Anderson J, Green B, Wong R, Krause J, Shlens J, Dominowska E, Eslami SMA, Chou K, Cui C, Vinyals O, Kavukcuoglu K, Manyika J, Dean J, Hassabis D, Matias Y, Webster D, Barral J, Corrado G, Semturs C, Mahdavi SS, Gottweis J, Karthikesalingam A, Natarajan V.   Capabilities of Gemini Models in Medicine. 2024 Preprint. Available from: eprint arXiv:2404.18416.  [PubMed]  [DOI]  [Full Text]
101.  Kiraly AP, Baur S, Philbrick K, Mahvar F, Yatziv L, Chen T, Sterling B, George N, Jamil F, Tang J, Bailey K, Ahmed F, Goel A, Ward A, Yang L, Sellergren A, Matias Y, Hassidim A, Shetty S, Golden D, Azizi S, Steiner DF, Liu Y, Thelin T, Pilgrim R, Kirmizibayrak C.   Health AI Developer Foundations. 2024 Preprint. Available from: eprint arXiv:2411.15128.  [PubMed]  [DOI]  [Full Text]
102.  Pai S, Hadzic I, Bontempi D, Bressem K, Kann BH, Fedorov A, Mak RH, Aerts HJWL.   Vision Foundation Models for Computed Tomography. 2025 Preprint. Available from: eprint arXiv:2501.09001.  [PubMed]  [DOI]  [Full Text]
103.  Zhou HY, Nicolás Acosta J, Adithan S, Datta S, Topol EJ, Rajpurkar P.   MedVersa: A Generalist Foundation Model for Medical Image Interpretation. 2024 Preprint. Available from: eprint arXiv:2405.07988.  [PubMed]  [DOI]  [Full Text]
104.  Zhou F, Xu Y, Cui Y, Zhang S, Zhu Y, He W, Wang J, Wang X, Chan R, Lau LHS, Han C, Zhang D, Li Z, Chen H.   iMD4GC: Incomplete Multimodal Data Integration to Advance Precise Treatment Response Prediction and Survival Analysis for Gastric Cancer. 2024 Preprint. Available from: eprint arXiv:2404.01192.  [PubMed]  [DOI]  [Full Text]
105.  Yasaka K, Kawamura M, Sonoda Y, Kubo T, Kiryu S, Abe O. Large multimodality model fine-tuned for detecting breast and esophageal carcinomas on CT: a preliminary study. Jpn J Radiol. 2025;43:779-786.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 1]  [Reference Citation Analysis (0)]
106.  Dika E, Curti N, Giampieri E, Veronesi G, Misciali C, Ricci C, Castellani G, Patrizi A, Marcelli E. Advantages of manual and automatic computer-aided compared to traditional histopathological diagnosis of melanoma: A pilot study. Pathol Res Pract. 2022;237:154014.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 7]  [Reference Citation Analysis (0)]
107.  Hanna MG, Parwani A, Sirintrapun SJ. Whole Slide Imaging: Technology and Applications. Adv Anat Pathol. 2020;27:251-259.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 18]  [Cited by in RCA: 79]  [Article Influence: 15.8]  [Reference Citation Analysis (0)]
108.  Niazi MKK, Parwani AV, Gurcan MN. Digital pathology and artificial intelligence. Lancet Oncol. 2019;20:e253-e261.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 507]  [Cited by in RCA: 684]  [Article Influence: 114.0]  [Reference Citation Analysis (0)]
109.  da Silva LM, Pereira EM, Salles PG, Godrich R, Ceballos R, Kunz JD, Casson A, Viret J, Chandarlapaty S, Ferreira CG, Ferrari B, Rothrock B, Raciti P, Reuter V, Dogdas B, DeMuth G, Sue J, Kanan C, Grady L, Fuchs TJ, Reis-Filho JS. Independent real-world application of a clinical-grade automated prostate cancer detection system. J Pathol. 2021;254:147-158.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in RCA: 104]  [Reference Citation Analysis (0)]
110.  Kang M, Song H, Park S, Yoo D, Pereira S.   Benchmarking Self-Supervised Learning on Diverse Pathology Datasets. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2023 Jun 17-24; Vancouver, BC, Canada. IEEE, 2023.  [PubMed]  [DOI]  [Full Text]
111.  Wang X, Yang S, Zhang J, Wang M, Zhang J, Yang W, Huang J, Han X. Transformer-based unsupervised contrastive learning for histopathological image classification. Med Image Anal. 2022;81:102559.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 247]  [Reference Citation Analysis (0)]
112.  Filiot A, Ghermi R, Olivier A, Jacob P, Fidon L, Camara A, Mac Kain A, Saillard C, Schiratti J.   Scaling Self-Supervised Learning for Histopathology with Masked Image Modeling. 2024 Preprint. Available from: medrxiv:23292757.  [PubMed]  [DOI]  [Full Text]
113.  Azizi S, Culp L, Freyberg J, Mustafa B, Baur S, Kornblith S, Chen T, Tomasev N, Mitrović J, Strachan P, Mahdavi SS, Wulczyn E, Babenko B, Walker M, Loh A, Chen PC, Liu Y, Bavishi P, McKinney SM, Winkens J, Roy AG, Beaver Z, Ryan F, Krogue J, Etemadi M, Telang U, Liu Y, Peng L, Corrado GS, Webster DR, Fleet D, Hinton G, Houlsby N, Karthikesalingam A, Norouzi M, Natarajan V. Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging. Nat Biomed Eng. 2023;7:756-779.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 85]  [Reference Citation Analysis (0)]
114.  Vorontsov E, Bozkurt A, Casson A, Shaikovski G, Zelechowski M, Severson K, Zimmermann E, Hall J, Tenenholtz N, Fusi N, Yang E, Mathieu P, van Eck A, Lee D, Viret J, Robert E, Wang YK, Kunz JD, Lee MCH, Bernhard JH, Godrich RA, Oakley G, Millar E, Hanna M, Wen H, Retamero JA, Moye WA, Yousfi R, Kanan C, Klimstra DS, Rothrock B, Liu S, Fuchs TJ. A foundation model for clinical-grade computational pathology and rare cancers detection. Nat Med. 2024;30:2924-2935.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 2]  [Cited by in RCA: 135]  [Article Influence: 135.0]  [Reference Citation Analysis (0)]
115.  Zimmermann E, Vorontsov E, Viret J, Casson A, Zelechowski M, Shaikovski G, Tenenholtz N, Hall J, Klimstra D, Yousfi R, Fuchs T, Fusi N, Liu S, Severson K.   Virchow2: Scaling Self-Supervised Mixed Magnification Models in Pathology. 2024 Preprint. Available from: eprint arXiv:2408.00738.  [PubMed]  [DOI]  [Full Text]
116.  Filiot A, Jacob P, Mac Kain A, Saillard C.   Phikon-v2, A large and public feature extractor for biomarker prediction. 2024 Preprint. Available from: eprint arXiv:2409.09173.  [PubMed]  [DOI]  [Full Text]
117.  Dippel J, Feulner B, Winterhoff T, Milbich T, Tietz S, Schallenberg S, Dernbach G, Kunft A, Heinke S, Eich M-L, Ribbat-Idel J, Krupar R, Anders P, Prenißl N, Jurmeister P, Horst D, Ruff L, Müller K-R, Klauschen F, Alber M.   RudolfV: A Foundation Model by Pathologists for Pathologists. 2024 Preprint. Available from: eprint arXiv:2401.04079.  [PubMed]  [DOI]  [Full Text]
118.  Nechaev D, Pchelnikov A, Ivanova E.   Hibou: A Family of Foundational Vision Transformers for Pathology. 2024 Preprint. Available from: eprint arXiv:2406.05074.  [PubMed]  [DOI]  [Full Text]
119.  Jaume G, Vaidya A, Zhang A, Song AH, Chen RJ, Sahai S, Mo D, Madrigal E, Phi Le L, Mahmood F.   Multistain Pretraining for Slide Representation Learning in Pathology. In: Leonardis A, Ricci E, Roth S, Russakovsky O, Sattler T, Varol G, editors. Computer Vision - ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15091. Cham: Springer, 2025.  [PubMed]  [DOI]  [Full Text]
120.  Lenz T, Neidlinger P, Ligero M, Wölflein G, van Treeck M, Kather JN.   Unsupervised foundation model-agnostic slide-level representation learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2025 Jun 10-17; Nashville, TN, United States. IEEE, 2025.  [PubMed]  [DOI]  [Full Text]
121.  Juyal D, Padigela H, Shah C, Shenker D, Harguindeguy N, Liu Y, Martin B, Zhang Y, Nercessian M, Markey M, Finberg I, Luu K, Borders D, Ashar Javed S, Krause E, Biju R, Sood A, Ma A, Nyman J, Shamshoian J, Chhor G, Sanghavi D, Thibault M, Yu L, Najdawi F, Hipp JA, Fahy D, Glass B, Walk E, Abel J, Pokkalla H, Beck AH, Grullon S.   PLUTO: Pathology-Universal Transformer. 2024 Preprint. Available from: eprint arXiv:2405.07905.  [PubMed]  [DOI]  [Full Text]
122.  Chen RJ, Chen C, Li Y, Chen TY, Trister AD, Krishnan RG, Mahmood F.   Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2022 Jun 18-24; New Orleans, LA, United States. IEEE, 2022.  [PubMed]  [DOI]  [Full Text]
123.  Hua S, Yan F, Shen T, Ma L, Zhang X. PathoDuet: Foundation models for pathological slide analysis of H&E and IHC stains. Med Image Anal. 2024;97:103289.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 12]  [Reference Citation Analysis (0)]
124.  kaiko.ai, Aben N, de Jong ED, Gatopoulos I, Känzig N, Karasikov M, Lagré A, Moser R, van Doorn J, Tang F.   Towards Large-Scale Training of Pathology Foundation Models. 2024 Preprint. Available from: eprint arXiv:2404.15217.  [PubMed]  [DOI]  [Full Text]
125.  Yan F, Wu J, Li J, Wang W, Lu J, Chen W, Gao Z, Li J, Yan H, Ma J, Chen M, Lu Y, Chen Q, Wang Y, Ling X, Wang X, Wang Z, Huang Q, Hua S, Liu M, Ma L, Shen T, Zhang X, He Y, Chen H, Zhang S, Wang Z.   PathOrchestra: A Comprehensive Foundation Model for Computational Pathology with Over 100 Diverse Clinical-Grade Tasks. 2025 Preprint. Available from: eprint arXiv:2503.24345.  [PubMed]  [DOI]  [Full Text]
126.  Vaidya A, Zhang A, Jaume G, Song AH, Ding T, Wagner SJ, Lu MY, Doucet P, Robertson H, Almagro-Perez C, Chen RJ, ElHarouni D, Ayoub G, Bossi C, Ligon KL, Gerber G, Phi Le L, Mahmood F.   Molecular-driven Foundation Model for Oncologic Pathology. 2025 Preprint. Available from: eprint arXiv:2501.16652.  [PubMed]  [DOI]  [Full Text]
127.  Filiot A, Dop N, Tchita O, Riou A, Dubois R, Peeters T, Valter D, Scalbert M, Saillard C, Robin G, Olivier A.   Distilling foundation models for robust and efficient models in digital pathology. 2025 Preprint. Available from: eprint arXiv:2501.16239.  [PubMed]  [DOI]  [Full Text]
128.  Nicke T, Schäfer JR, Höfener H, Feuerhake F, Merhof D, Kießling F, Lotz J. Tissue concepts: Supervised foundation models in computational pathology. Comput Biol Med. 2025;186:109621.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 1]  [Reference Citation Analysis (0)]
129.  Kan Wang Y, Tydlitatova L, Kunz JD, Oakley G, Chow BKB, Godrich RA, Lee MCH, Aghdam H, Bozkurt A, Zelechowski M, Vanderbilt C, Kanan C, Retamero JA, Hamilton P, Yousfi R, Fuchs TJ, Klimstra DS, Liu S.   Screen Them All: High-Throughput Pan-Cancer Genetic and Phenotypic Biomarker Screening from H&E Whole Slide Images. 2024 Preprint. Available from: eprint arXiv:2408.09554.  [PubMed]  [DOI]  [Full Text]
130.  Wu Y, Li S, Du Z, Zhu W.   BROW: Better featuRes fOr Whole slide image based on self-distillation. 2023 Preprint. Available from: eprint arXiv:2309.08259.  [PubMed]  [DOI]  [Full Text]
131.  Yang Z, Wei T, Liang Y, Yuan X, Gao R, Xia Y, Zhou J, Zhang Y, Yu Z. A foundation model for generalizable cancer diagnosis and survival prediction from histopathological images. Nat Commun. 2025;16:2366.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 1]  [Cited by in RCA: 14]  [Article Influence: 14.0]  [Reference Citation Analysis (0)]
132.  Alber M, Tietz S, Dippel J, Milbich T, Lesort T, Korfiatis P, Krügener M, Perez Cancer B, Shah N, Möllers A, Seegerer P, Carpen-Amarie A, Standvoss K, Dernbach G, de Jong E, Schallenberg S, Kunft A, Hoffer von Ankershoffen H, Schaeferle G, Duffy P, Redlon M, Jurmeister P, Horst D, Ruff L, Müller K-R, Klauschen F, Norgan A.   Atlas: A Novel Pathology Foundation Model by Mayo Clinic, Charité, and Aignostics. 2025 Preprint. Available from: eprint arXiv:2501.05409.  [PubMed]  [DOI]  [Full Text]
133.  Yang Z, Li L, Lin K, Wang J, Lin CC, Liu Z, Wang L.   The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision). 2023 Preprint. Available from: eprint arXiv:2309.17421.  [PubMed]  [DOI]  [Full Text]
134.  Wu J, Gan W, Chen Z, Wan S, Yu PS.   Multimodal Large Language Models: A Survey. 2023 IEEE International Conference on Big Data (BigData); 2023 Dec 15-18; Sorrento, Italy. IEEE, 2024.  [PubMed]  [DOI]  [Full Text]
135.  Kaczmarczyk R, Wilhelm TI, Martin R, Roos J. Evaluating multimodal AI in medical diagnostics. NPJ Digit Med. 2024;7:205.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 18]  [Reference Citation Analysis (0)]
136.  Huang Z, Bianchi F, Yuksekgonul M, Montine TJ, Zou J. A visual-language foundation model for pathology image analysis using medical Twitter. Nat Med. 2023;29:2307-2316.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 1]  [Cited by in RCA: 207]  [Article Influence: 103.5]  [Reference Citation Analysis (0)]
137.  Guo Z, Ma J, Xu Y, Wang Y, Wang L, Chen H.   HistGen: Histopathology Report Generation via Local-Global Feature Encoding and Cross-Modal Context Interaction. In: Linguraru MG, Dou Q, Feragen A, Giannarou S, Glocker B, Lekadir K, Schnabel JA, editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15004. Cham: Springer, 2024.  [PubMed]  [DOI]  [Full Text]
138.  Ahmed F, Sellergen A, Yang L, Xu S, Babenko B, Ward A, Olson N, Mohtashamian A, Matias Y, Corrado GS, Duong Q, Webster DR, Shetty S, Golden D, Liu Y, Steiner DF, Wulczyn E.   PathAlign: A vision-language model for whole slide images in histopathology. Proceedings of the MICCAI Workshop on Computational Pathology; 2024; Proceedings of Machine Learning Research. PMLR, 2024: 72-108.  [PubMed]  [DOI]
139.  Wang X, Zhao J, Marostica E, Yuan W, Jin J, Zhang J, Li R, Tang H, Wang K, Li Y, Wang F, Peng Y, Zhu J, Zhang J, Jackson CR, Zhang J, Dillon D, Lin NU, Sholl L, Denize T, Meredith D, Ligon KL, Signoretti S, Ogino S, Golden JA, Nasrallah MP, Han X, Yang S, Yu KH. A pathology foundation model for cancer diagnosis and prognosis prediction. Nature. 2024;634:970-978.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 87]  [Cited by in RCA: 142]  [Article Influence: 142.0]  [Reference Citation Analysis (0)]
140.  Sun Y, Zhang Y, Si Y, Zhu C, Shui Z, Zhang K, Li J, Lyu X, Lin T, Yang L.   PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration. 2024 Preprint. Available from: eprint arXiv:2407.00203.  [PubMed]  [DOI]  [Full Text]
141.  Lu MY, Chen B, Williamson DFK, Chen RJ, Zhao M, Chow AK, Ikemura K, Kim A, Pouli D, Patel A, Soliman A, Chen C, Ding T, Wang JJ, Gerber G, Liang I, Le LP, Parwani AV, Weishaupt LL, Mahmood F. A multimodal generative AI copilot for human pathology. Nature. 2024;634:466-473.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 145]  [Cited by in RCA: 127]  [Article Influence: 127.0]  [Reference Citation Analysis (0)]
142.  Sun Y, Zhu C, Zheng S, Zhang K, Sun L, Shui Z, Zhang Y, Li H, Yang L. PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology. Proc AAAI Conf Artif Intell. 2024;38:5034-5042.  [PubMed]  [DOI]  [Full Text]
143.  Xu H, Usuyama N, Bagga J, Zhang S, Rao R, Naumann T, Wong C, Gero Z, González J, Gu Y, Xu Y, Wei M, Wang W, Ma S, Wei F, Yang J, Li C, Gao J, Rosemon J, Bower T, Lee S, Weerasinghe R, Wright BJ, Robicsek A, Piening B, Bifulco C, Wang S, Poon H. A whole-slide foundation model for digital pathology from real-world data. Nature. 2024;630:181-188.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in RCA: 237]  [Reference Citation Analysis (0)]
144.  Ding T, Wagner SJ, Song AH, Chen RJ, Lu MY, Zhang A, Vaidya AJ, Jaume G, Shaban M, Kim A, Williamson DFK, Chen B, Almagro-Perez C, Doucet P, Sahai S, Chen C, Komura D, Kawabe A, Ishikawa S, Gerber G, Peng T, Phi Le L, Mahmood F.   Multimodal Whole Slide Foundation Model for Pathology. 2024 Preprint. Available from: eprint arXiv:2411.19666.  [PubMed]  [DOI]  [Full Text]
145.  Lu MY, Chen B, Williamson DFK, Chen RJ, Liang I, Ding T, Jaume G, Odintsov I, Le LP, Gerber G, Parwani AV, Zhang A, Mahmood F. A visual-language foundation model for computational pathology. Nat Med. 2024;30:863-874.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 407]  [Cited by in RCA: 239]  [Article Influence: 239.0]  [Reference Citation Analysis (0)]
146.  Chen Y, Wang G, Ji Y, Li Y, Ye J, Li T, Hu M, Yu R, Qiao Y, He J.   SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding. 2024 Preprint. Available from: eprint arXiv:2410.11761.  [PubMed]  [DOI]  [Full Text]
147.  Tan JW, Kim S, Kim E, Lee SH, Ahn S, Jeong W.   Clinical-Grade Multi-organ Pathology Report Generation for Multi-scale Whole Slide Images via a Semantically Guided Medical Text Foundation Model. In: Linguraru MG, Dou Q, Feragen A, Giannarou S, Glocker B, Lekadir K, Schnabel JA, editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15004. Cham: Springer, 2024.  [PubMed]  [DOI]  [Full Text]
148.  Chen Z, Chen Y, Sun Y, Tang L, Zhang L, Hu Y, He M, Li Z, Cheng S, Yuan J, Wang Z, Wang Y, Zhao J, Gong J, Zhao L, Cao B, Li G, Zhang X, Dong B, Shen L. Predicting gastric cancer response to anti-HER2 therapy or anti-HER2 combined immunotherapy based on multi-modal data. Signal Transduct Target Ther. 2024;9:222.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 28]  [Cited by in RCA: 35]  [Article Influence: 35.0]  [Reference Citation Analysis (0)]
149.  Zhao W, Guo Z, Fan Y, Jiang Y, Yeung MCF, Yu L. Aligning knowledge concepts to whole slide images for precise histopathology image analysis. NPJ Digit Med. 2024;7:383.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 2]  [Reference Citation Analysis (0)]
150.  Ferber D, Wölflein G, Wiest IC, Ligero M, Sainath S, Ghaffari Laleh N, El Nahhas OSM, Müller-Franzes G, Jäger D, Truhn D, Kather JN. In-context learning enables multimodal large language models to classify cancer pathology images. Nat Commun. 2024;15:10104.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in RCA: 33]  [Reference Citation Analysis (0)]
151.  Wang J, Wang K, Yu Y, Lu Y, Xiao W, Sun Z, Liu F, Zou Z, Gao Y, Yang L, Zhou HY, Miao H, Zhao W, Huang L, Zeng L, Guo R, Chong I, Deng B, Cheng L, Chen X, Luo J, Zhu MH, Baptista-Hon D, Monteiro O, Li M, Ke Y, Li J, Zeng S, Guan T, Zeng J, Xue K, Oermann E, Luo H, Yin Y, Zhang K, Qu J. Self-improving generative foundation model for synthetic medical image generation and clinical applications. Nat Med. 2025;31:609-617.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 6]  [Cited by in RCA: 25]  [Article Influence: 25.0]  [Reference Citation Analysis (0)]
152.  Zhou Q, Zhong W, Guo Y, Xiao M, Ma H, Huang J.   PathM3: A Multimodal Multi-task Multiple Instance Learning Framework for Whole Slide Image Classification and Captioning. In: Linguraru MG, Dou Q, Feragen A, Giannarou S, Glocker B, Lekadir K, Schnabel JA, editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15004. Cham: Springer, 2024.  [PubMed]  [DOI]  [Full Text]
153.  Hu D, Jiang Z, Shi J, Xie F, Wu K, Tang K, Cao M, Huai J, Zheng Y. Histopathology language-image representation learning for fine-grained digital pathology cross-modal retrieval. Med Image Anal. 2024;95:103163.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 5]  [Reference Citation Analysis (0)]
154.  Zhang L, Yun B, Xie X, Li Q, Li X, Wang Y.   Prompting Whole Slide Image Based Genetic Biomarker Prediction. In: Linguraru MG, Dou Q, Feragen A, Giannarou S, Glocker B, Lekadir K, Schnabel JA, editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15004. Cham: Springer, 2024.  [PubMed]  [DOI]  [Full Text]
155.  Sengupta S, Brown DE.   Automatic Report Generation for Histopathology Images Using Pre-Trained Vision Transformers and BERT. 2024 IEEE International Symposium on Biomedical Imaging (ISBI); 2024 May 27-30; Athens, Greece. IEEE, 2024.  [PubMed]  [DOI]  [Full Text]
156.  Xu Y, Wang Y, Zhou F, Ma J, Yang S, Lin H, Wang X, Wang J, Liang L, Han A, Chan RCK, Chen H.   A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model. 2024 Preprint. Available from: eprint arXiv:2407.15362.  [PubMed]  [DOI]  [Full Text]
157.  Ferber D, El Nahhas OSM, Wölflein G, Wiest IC, Clusmann J, Leßmann ME, Foersch S, Lammert J, Tschochohei M, Jäger D, Salto-Tellez M, Schultz N, Truhn D, Kather JN. Development and validation of an autonomous artificial intelligence agent for clinical decision-making in oncology. Nat Cancer. 2025;6:1337-1349.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 9]  [Cited by in RCA: 19]  [Article Influence: 19.0]  [Reference Citation Analysis (0)]
158.  Shaikovski G, Casson A, Severson K, Zimmermann E, Kan Wang Y, Kunz JD, Retamero JA, Oakley G, Klimstra D, Kanan C, Hanna M, Zelechowski M, Viret J, Tenenholtz N, Hall J, Fusi N, Yousfi R, Hamilton P, Moye WA, Vorontsov E, Liu S, Fuchs TJ.   PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology. 2024 Preprint. Available from: eprint arXiv:2405.10254.  [PubMed]  [DOI]  [Full Text]
159.  Tran M, Schmidle P, Guo RR, Wagner SJ, Koch V, Lupperger V, Novotny B, Murphree DH, Hardway HD, D'Amato M, Lefkes J, Geijs DJ, Feuchtinger A, Böhner A, Kaczmarczyk R, Biedermann T, Amir AL, Mooyaart AL, Ciompi F, Litjens G, Wang C, Comfere NI, Eyerich K, Braun SA, Marr C, Peng T. Generating dermatopathology reports from gigapixel whole slide images with HistoGPT. Nat Commun. 2025;16:4886.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 4]  [Reference Citation Analysis (0)]
160.  Dai D, Zhang Y, Yang Q, Xu L, Shen X, Xia S, Wang G. Pathologyvlm: a large vision-language model for pathology image understanding. Artif Intell Rev. 2025;58:186.  [PubMed]  [DOI]  [Full Text]
161.  Xiang J, Wang X, Zhang X, Xi Y, Eweje F, Chen Y, Li Y, Bergstrom C, Gopaulchan M, Kim T, Yu KH, Willens S, Olguin FM, Nirschl JJ, Neal J, Diehn M, Yang S, Li R. A vision-language foundation model for precision oncology. Nature. 2025;638:769-778.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 22]  [Cited by in RCA: 39]  [Article Influence: 39.0]  [Reference Citation Analysis (0)]
162.  Deshpande P, Rasin A, Tchoua R, Furst J, Raicu D, Schinkel M, Trivedi H, Antani S. Biomedical heterogeneous data categorization and schema mapping toward data integration. Front Big Data. 2023;6:1173038.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in RCA: 2]  [Reference Citation Analysis (0)]
163.  Mohammed Yakubu A, Chen YP. Ensuring privacy and security of genomic data and functionalities. Brief Bioinform. 2020;21:511-526.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 43]  [Cited by in RCA: 23]  [Article Influence: 4.6]  [Reference Citation Analysis (0)]
164.  Shin H, Ryu K, Kim JY, Lee S. Application of privacy protection technology to healthcare big data. Digit Health. 2024;10:20552076241282242.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in RCA: 3]  [Reference Citation Analysis (0)]
165.  Quinn TP, Jacobs S, Senadeera M, Le V, Coghlan S. The three ghosts of medical AI: Can the black-box present deliver? Artif Intell Med. 2022;124:102158.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 23]  [Cited by in RCA: 72]  [Article Influence: 18.0]  [Reference Citation Analysis (0)]
166.  Karim MR, Islam T, Shajalal M, Beyan O, Lange C, Cochez M, Rebholz-Schuhmann D, Decker S. Explainable AI for Bioinformatics: Methods, Tools and Applications. Brief Bioinform. 2023;24:bbad236.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 51]  [Reference Citation Analysis (0)]
167.  Caton S, Haas C. Fairness in Machine Learning: A Survey. ACM Comput Surv. 2024;56:1-38.  [PubMed]  [DOI]  [Full Text]
168.  Ong JCL, Chang SY, William W, Butte AJ, Shah NH, Chew LST, Liu N, Doshi-Velez F, Lu W, Savulescu J, Ting DSW. Ethical and regulatory challenges of large language models in medicine. Lancet Digit Health. 2024;6:e428-e432.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 73]  [Cited by in RCA: 98]  [Article Influence: 98.0]  [Reference Citation Analysis (0)]
169.  Hantel A, Walsh TP, Marron JM, Kehl KL, Sharp R, Van Allen E, Abel GA. Perspectives of Oncologists on the Ethical Implications of Using Artificial Intelligence for Cancer Care. JAMA Netw Open. 2024;7:e244077.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 16]  [Cited by in RCA: 23]  [Article Influence: 23.0]  [Reference Citation Analysis (0)]
170.  El Arab RA, Abu-Mahfouz MS, Abuadas FH, Alzghoul H, Almari M, Ghannam A, Seweid MM. Bridging the Gap: From AI Success in Clinical Trials to Real-World Healthcare Implementation-A Narrative Review. Healthcare (Basel). 2025;13:701.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in RCA: 10]  [Reference Citation Analysis (0)]