BPG is committed to discovery and dissemination of knowledge
Review Open Access
Copyright ©The Author(s) 2025. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Gastroenterol. Dec 21, 2025; 31(47): 112921
Published online Dec 21, 2025. doi: 10.3748/wjg.v31.i47.112921
Foundation models: Insights and implications for gastrointestinal cancer
Lei Shi, Rui Huang, An-Jie Guo, School of Life Sciences, Chongqing University, Chongqing 400044, China
Li-Ling Zhao, Department of Stomatology, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400042, China
ORCID number: Lei Shi (0000-0002-7995-5664).
Author contributions: Shi L and Huang R designed the study and collected the data; Shi L, Huang R, and Zhao LL analyzed and interpreted the data; Shi L and Huang R wrote the manuscript; Zhao LL and Guo AJ revised the manuscript; all authors approved the final version of the manuscript.
Supported by the Open Project Program of Panxi Crops Research and Utilization Key Laboratory of Sichuan Province, No. SZKF202302; and the Fundamental Research Funds for the Central Universities No. 2019CDYGYB024.
Conflict-of-interest statement: The authors deny any conflict of interest.
Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Lei Shi, Associate Professor, School of Life Sciences, Chongqing University, No. 55 University City South Road, Shapingba District, Chongqing 400044, China. shil@cqu.edu.cn
Received: August 11, 2025
Revised: September 10, 2025
Accepted: November 3, 2025
Published online: December 21, 2025
Processing time: 132 Days and 8.5 Hours

Abstract

Gastrointestinal (GI) cancers represent a major global health concern due to their high incidence and mortality rates. Foundation models (FMs), also referred to as large models, represent a novel class of artificial intelligence technologies that have demonstrated considerable potential in addressing these challenges. These models encompass large language models (LLMs), vision FMs (VFMs), and multimodal LLMs (MLLMs), all of which utilize transformer architectures and self-supervised pre-training on extensive unlabeled datasets to achieve robust cross-domain generalization. This review delineates the principal applications of these models: LLMs facilitate the structuring of clinical narratives, extraction of insights from medical records, and enhancement of physician-patient communication; VFMs are employed in the analysis of endoscopic, radiological, and pathological images for lesion detection and staging; MLLMs integrate heterogeneous data modalities, including imaging, textual information, and genomic data, to support diagnostic processes, treatment prediction, and prognostic evaluation. Despite these promising developments, several challenges remain, such as the need for data standardization, limited diversity within training datasets, substantial computational resource requirements, and ethical-legal concerns. In conclusion, FMs exhibit significant potential to advance research and clinical management of GI cancers. Future research efforts should prioritize the refinement of these models, promote international collaborations, and adopt interdisciplinary approaches. Such a comprehensive strategy is essential to fully harness the capabilities of FMs, driving substantial progress in the fight against GI malignancies.

Key Words: Foundation models; Gastrointestinal cancers; Large language models; Vision foundation models; Multimodal large language models

Core Tip: This review synthesizes applications of foundation models in gastrointestinal cancer, from clinical text structuring and image analysis to multimodal data integration. Despite current knowledge gaps and challenges like data standardization, it highlights foundation models’ transformative potential, urging refined models and collaborations to advance gastrointestinal cancer research.



INTRODUCTION

Gastrointestinal (GI) cancers represent some of the most prevalent and lethal malignancies worldwide, imposing a substantial burden on public health[1]. Their multifactorial etiology and heterogeneous clinical manifestations make them difficult to study and treat using current methods[2]. Nevertheless, the advent of next-generation artificial intelligence (AI) models, known as foundation models (FMs), offers novel avenues for addressing these challenges[3]. These models, trained on vast amounts of data, can handle complex tasks, thereby presenting promising strategies to mitigate this worldwide health concern[4].

Unlike early AI methods that targeted isolated tasks or limited data modalities, FMs can integrate diverse medical data types, including endoscopic images, pathology slides, electronic health records (EHRs), genomic data, and clinical narratives[5]. This integrative capability is particularly pertinent to GI cancers, which often progress through a defined pattern (e.g., Correa’s cascade from gastritis to cancer)[6]. Accurate risk assessment, early diagnosis, and therapeutic decision-making require comprehensive data interpretation. However, current knowledge regarding the application of FMs in GI cancer remains limited, underscoring the imperative to systematically review current implementations and delineate prospective research trajectories to advance FM utilization in this domain.

Traditional computational biology techniques, such as support vector machines (SVMs) and random forests, alongside more recent deep-learning approaches like convolutional neural networks (CNNs), have made incremental advances in GI cancer research[7]. Nevertheless, these methods face major limitations, including dependence on labor-intensive, high-quality annotations; heterogeneity of datasets across institutions; and a predominant focus on unimodal data (e.g., imaging or genomics in isolation). These constraints highlight the necessity for cross-modal, large-scale pre-trained models[8].

Recent breakthroughs in general-purpose FMs, exemplified by ChatGPT, Stable Diffusion, and related architectures, have introduced a paradigm shift in GI cancer research[5,9]. Their innovation resides in exceptional generalizability and cross-domain adaptability, facilitated by transformer-based architectures comprising billions of parameters pre-trained on vast, diverse datasets[10]. This pre-training engenders universal representations transferable to a broad spectrum of downstream tasks, maintaining robust performance even with limited or unlabeled data. Compared to traditional methods, FMs offer distinct advantages: Billion-scale parameterization combined with self-supervised learning (SSL) enables deep feature extraction and fusion of heterogeneous data, while zero- or few-shot transfer learning substantially diminishes reliance on annotated datasets[11]. This review retrospectively synthesizes key FMs applied in GI cancer research, focusing on three principal categories: Large language models (LLMs) for clinical decision support leveraging EHRs; vision models [e.g., Vision Transformer (ViT) architectures] for endoscopic image analysis; and multimodal fusion models integrating imaging, omics, and pathology data. It is noteworthy that this research field is rapidly evolving, with some models already operational and others exploratory yet exhibiting considerable translational potential.

OVERVIEW OF FMS

This section provides a concise historical overview of AI development to contextualize the emergence of FMs for researchers less familiar with the field. The conceptual foundation of AI traces back to Alan Turing's 1950 proposal of the "Turing Test", envisioning computational simulation of human intelligence[12]. The 1956 Dartmouth Conference marked a seminal milestone, formally introducing the term "artificial intelligence" and transitioning the field from theoretical inquiry to systematic investigation[13]. AI evolution encompasses three major phases. The nascent period (1950s-1970s) was dominated by symbolic logic and expert systems; for example, the Perceptron model developed by Frank Rosenblatt in 1957 attempted to realize classification learning through neural networks but stalled owing to hardware limitations[14]. The revival period (1980s-2000s) was characterized by statistical learning and big data, exemplified by IBM's Deep Blue (which defeated the world chess champion in 1997) and Watson (which won the Jeopardy! championship in 2011), verifying AI's potential in specific tasks[15]. The contemporary era (2010s-present) is defined by deep learning and large-scale models. The introduction of the transformer architecture in 2017 revolutionized natural language processing (NLP)[16]. Following 2020, pre-trained large models, exemplified by the GPT series and BERT, demonstrated universal representation capabilities by leveraging massive datasets and extensive parameter counts (e.g., GPT-3 with 175 billion parameters), thereby facilitating a transition in AI from task-specific adaptation to knowledge-driven approaches[17,18]. In 2022, ChatGPT achieved human-aligned conversational abilities; by 2024, the multimodal model GPT-4o had made significant progress in cross-modal understanding[19]; and by 2025, reasoning models such as DeepSeek-R1 approximated human cognitive processes[20]. Consequently, AI has undergone a paradigm shift from rule-based systems to data-driven methodologies, culminating in the "pre-training plus fine-tuning" framework and ushering in a new era of general intelligence dominated by FMs. This evolutionary trajectory has produced the groundbreaking advancements of contemporary FMs, establishing a novel technical foundation for addressing complex scientific challenges, such as protein folding, as well as clinical applications, including medical diagnosis.

The concept of FMs was initially introduced by the Center for Research on Foundation Models (CRFM) at Stanford University in 2021[11]. CRFM characterizes FMs as models trained on extensive and diverse datasets, typically via large-scale SSL, that can be adapted to a variety of downstream tasks through fine-tuning. These models transcend the traditional reliance of machine learning on task-specific annotated data by capturing the fundamental structures and patterns inherent in data, such as linguistic grammar, visual textures, or cross-modal relationships, thereby substantially enhancing their generalization capabilities[10]. Prominent examples include NLP models like GPT and BERT, as well as multimodal models such as CLIP[17,18,21]. For instance, in the biomedical domain, FMs can learn universal representations by integrating heterogeneous microscopic imaging modalities (e.g., bright-field and fluorescence microscopy) and subsequently be fine-tuned for specific pathological tasks using minimal annotated data, which reduces annotation costs while enabling efficient cross-contextual transfer[22].

A principal distinction between FMs and conventional AI models lies in their methodological approach. Traditional models, such as SVMs and CNNs, are typically designed for narrowly defined tasks and require substantial labeled datasets for each specific application[15]. Consequently, these models exhibit limited generalizability and are not readily adaptable to novel tasks; for example, a model trained to detect gastric cancer pathology cannot be directly repurposed for colorectal cancer (CRC) lymph node identification[23]. In contrast, FMs employ a two-stage process involving self-supervised pre-training followed by downstream fine-tuning[11]. During pre-training, FMs learn from vast quantities of unlabeled data, such as medical images and textual corpora, through tasks like masked reconstruction. Subsequently, fine-tuning enables adaptation to new tasks with relatively small labeled datasets. This paradigm allows a single pre-trained model to be deployed across multiple scenarios. Architectures such as GPT utilize the Transformer framework and autoregressive language modeling, training on extensive internet text corpora to internalize language patterns without manual annotation[16]. This SSL strategy endows FMs with adaptability across diverse tasks, including medical question answering and clinical case summarization, requiring only modest fine-tuning. The capacity for one-time training followed by multi-task reuse underpins FMs’ ability to generalize across domains and modalities, encompassing text, images, and speech, thereby advancing from task-specific models toward more generalized intelligence[5].

The foundational principles of FMs rest upon the integration of architectural design, algorithmic strategies, and technical paradigms, collectively facilitating their versatility and scalability[11]. Architecturally, FMs predominantly adopt the Transformer framework[16], wherein the self-attention mechanism dynamically assigns weights to different elements within a sequence, enabling context-sensitive processing. For example, the term “gastric” may activate distinct medical concepts depending on its contextual usage, such as in “gastric cancer” vs “gastric bezoar”. Algorithmically, FMs follow a pre-training and fine-tuning paradigm. Pre-training constructs a universal knowledge base from large-scale unlabeled data via SSL techniques; for instance, Masked Language Modeling tasks involve predicting obscured text segments (e.g., “Colorectal [MASK] screening guidelines”) to learn associations among medical concepts. Contrastive learning methods align multimodal features, such as correlating endoscopic images with corresponding pathological descriptions[11]. This data-driven approach diminishes dependence on annotated datasets and, when combined with extensive model parameters and massive training corpora, yields substantial performance gains. Fine-tuning adjusts model parameters on task-specific datasets, enabling rapid adaptation to downstream applications; for example, after fine-tuning on tumor classification, the model can accurately delineate cancerous regions in pathology images[11].
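To make the masked-language-modeling objective concrete, the following is a minimal sketch, assuming the open-source Hugging Face transformers library and a general-purpose BERT checkpoint (a biomedical checkpoint such as BioBERT could be substituted); the guideline-style sentence mirrors the example given above and is purely illustrative.

```python
# Minimal sketch of masked language modeling with a pre-trained encoder.
# Assumes the Hugging Face `transformers` library; the checkpoint and the
# example sentence are illustrative, not drawn from any study in this review.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The model ranks candidate tokens for the masked position from context,
# which is the self-supervised signal exploited during FM pre-training.
for prediction in fill_mask("Colorectal [MASK] screening guidelines."):
    print(f'{prediction["token_str"]:>12}  {prediction["score"]:.3f}')
```

In a domain-adapted FM, the same objective is applied to large clinical corpora before any task-specific fine-tuning.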

FMs can be classified into three categories based on input modalities: LLMs, Vision FMs (VFMs), and Multimodal LLMs (MLLMs)[10]. LLMs are sophisticated neural networks comprising billions of parameters, surpassing traditional language models in performance, with model size generally correlating with efficacy. For example, BioBERT, pre-trained on PubMed abstracts and clinical notes, has enhanced the accuracy of drug-drug interaction predictions[24]. GPT-3 employs in-context learning to generate text completions, overcoming prior limitations[17]. Applications of LLMs include structuring clinical narratives (e.g., extracting gastric cancer TNM staging from medical records), synthesizing evidence from literature (e.g., summarizing clinical trial outcomes for PD-1 inhibitors), and facilitating doctor-patient communication (e.g., generating patient-friendly colonoscopy reports). VFMs specialize in processing visual data such as images and videos, achieving significant advances in visual understanding and generation by integrating Transformer architectures with generative adversarial networks. Representative models include ViT and CLIP[21,25]. Diffusion models like Stable Diffusion have further propelled high-quality, controllable image synthesis. In GI oncology, VFMs have been applied to pathological image analysis (e.g., segmenting gastric mucosal dysplasia), endoscopic video interpretation (e.g., detecting microvascular patterns indicative of early gastric cancer), and multi-scale feature fusion [e.g., generating tumor invasiveness maps by integrating computed tomography (CT)/magnetic resonance imaging (MRI) scans with histopathological sections]. MLLMs unify vision, text, and audio data within a single framework, overcoming the limitations inherent in unimodal systems through cross-modal fusion. While traditional LLMs excel in textual data and VFMs in visual data, MLLMs effectively handle heterogeneous data types. Their applications encompass image-text alignment (e.g., correlating endoscopic images with pathological reports), temporal data fusion (e.g., linking imaging changes with genomic profiles), and clinical decision support (e.g., generating personalized treatment recommendations based on pathological reports)[10].
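To illustrate the image-text alignment idea behind models such as CLIP, the sketch below assumes the Hugging Face transformers implementation of the public CLIP checkpoint; the image file and candidate captions are placeholders, and in practice a medically adapted variant (e.g., BiomedCLIP) would be preferred.

```python
# Hedged sketch of CLIP-style zero-shot image-text alignment; the endoscopic
# frame and the candidate captions below are illustrative placeholders.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("endoscopy_frame.png")  # hypothetical image file
captions = [
    "an endoscopic image showing a colorectal polyp",
    "an endoscopic image of normal colonic mucosa",
]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds image-text similarity; softmax turns it into a
# zero-shot "probability" over the candidate captions.
probs = outputs.logits_per_image.softmax(dim=-1)[0]
for caption, p in zip(captions, probs.tolist()):
    print(f"{p:.3f}  {caption}")
```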

To provide a contextual understanding of how FMs tackle challenges associated with GI cancers, Figure 1 presents their application framework. It delineates five primary data inputs and details the processes of pre-training and subsequent fine-tuning applied to the various FMs, including LLMs, VFMs, and MLLMs. The framework also highlights the spectrum of downstream tasks facilitated by these models, ranging from information extraction to molecular subtyping.

Figure 1
Figure 1 Application of foundation models in gastrointestinal cancer. This figure illustrates the workflow and applications of foundation models (FMs) in addressing challenges within gastrointestinal cancer research and clinical practice. Starting with a variety of input data sources such as clinical text documents, endoscopic imaging, radiomics from computed tomography/magnetic resonance imaging scans, pathological slides, and multi-omics data, these inputs are categorized into textual or image data for pre-training FMs using self-supervised learning, transformer architecture, and self-attention mechanisms to develop models like large language models, vision FMs, and multi-modal learning models. After pre-training, these models undergo fine-tuning through methods such as low-rank adaptation, enabling them to perform a wide range of downstream tasks, thereby showcasing the versatility and potential of FMs in advancing gastrointestinal cancer diagnosis, treatment, and research.

To offer a focused overview of FMs relevant to GI cancer research, we first present a summary of FMs with validated applications in GI cancer across language, vision, and multimodal domains in Table 1. This summary emphasizes, most importantly, their distinct use cases within GI cancer research, categorized as NLP, endoscopy (Endo), radiology (Radio), and pathology (PA). A critical annotation in the "GI cancer applications" column, denoted as "Directly", signifies that the model was employed for GI cancer-related tasks (e.g., NLP, Endo, Radio, PA, or MLLM) without requiring further modification or fine-tuning, thereby underscoring its intrinsic adaptability to clinical demands.

Table 1 Summary of common general-purpose foundation models used in gastrointestinal cancer.
Name | Type | Creator | Year | Architecture | Parameters | Modality | OSS | GI cancer applications
BERT | LLM | Google | 2018 | Encoder-only transformer | 110M (base), 340M (large) | Text | Yes | NLP, Radio, MLLM
GPT-3 | LLM | OpenAI | 2020 | Decoder-only transformer | 175B | Text | No | NLP
ViT | Vision | Google | 2020 | Encoder-only transformer | 86M (base), 307M (large), 632M (huge) | Image | Yes | Endo, Radio, PA, MLLM
DINOv1 | Vision | Meta | 2021 | Encoder-only transformer | 22M, 86M | Image | Yes | Endo, PA
CLIP | MM | OpenAI | 2021 | Encoder-encoder | 120-580M | Text, Image | Yes | Endo, Radio, MLLM, directly1
GLM-130B | LLM | Tsinghua | 2022 | Encoder-decoder | 130B | Text | Yes | NLP
Stable Diffusion | MM | Stability AI | 2022 | Diffusion model | 1.45B | Text, Image | Yes | NLP, Endo, MLLM, directly
BLIP | MM | Salesforce | 2022 | Encoder-decoder | 120M (base), 340M (large) | Text, Image | Yes | Radio, MLLM, directly
YouChat | LLM | You.com | 2022 | Fine-tuned LLMs | Unknown | Text | No | NLP
Bard | MM | Google | 2023 | Based on PaLM 2 | 340B (estimated) | Text, Image, Audio, Code | No | NLP
Bing Chat | MM | Microsoft | 2023 | Fine-tuned GPT-4 | Unknown | Text, Image | No | NLP
Mixtral 8x7B | LLM | Mistral AI | 2023 | Decoder-only, Mixture-of-Experts (MoE) | 46.7B total (12.9B active per token) | Text | | NLP
LLaVA | MM | Microsoft | 2023 | Vision encoder, LLM | 7B, 13B | Text, Image | Yes | PA, MLLM
DINOv2 | Vision | Meta | 2023 | Encoder-only transformer | 86M to 1.1B | Image | Yes | Endo, Radio, PA, MLLM, directly
Claude 2 | LLM | Anthropic | 2023 | Decoder-only transformer | Unknown | Text | No | NLP
GPT-4 | MM | OpenAI | 2023 | Decoder-only transformer | 1.8T (estimated) | Text, Image | No | NLP, Endo, MLLM, directly
LLaMa 2 | LLM | Meta | 2023 | Decoder-only transformer | 7B, 13B, 34B, 70B | Text | Yes | NLP, Endo, MLLM, directly
SAM | Vision | Meta | 2023 | Encoder-decoder | 375M, 1.25G, 2.56G | Image | Yes | Endo, directly
GPT-4V | MM | OpenAI | 2023 | MM transformer | 1.8T | Text, Image | No | Endo, MLLM
Qwen | LLM | Alibaba | 2023 | Decoder-only transformer | 70B, 180B, 720B | Text | Yes | NLP, MLLM
GPT-4o | MM | OpenAI | 2024 | MM transformer | Unknown (larger than GPT-4) | Text, Image, Video | No | NLP
LLaMa 3 | LLM | Meta | 2024 | Decoder-only transformer | 8B, 70B, 400B | Text | Yes | NLP, directly
Gemini 1.5 | MM | Google | 2024 | MM transformer | 1.6T | Text, Image, Video, Audio | No | NLP, Radio, directly
Claude 3.7 | MM | Anthropic | 2024 | Decoder-only transformer | Unknown | Text, Image | No | NLP, directly
YOLO-World | Vision | IDEA | 2024 | CNN + RepVL-PAN vision-language fusion | 13-110M (depending on scale) | Text, Image | Yes | Endo, directly
DeepSeek | LLM | DeepSeek | 2025 | Decoder-only transformer | 671B | Text | Yes | NLP
Phi-4 | LLM | Microsoft | 2025 | Decoder-only transformer | 14B (plus), 7B (mini) | Text | Yes | Endo

The evolution of GI-related FMs reveals a discernible trajectory of enhanced capabilities and improved alignment with clinical requirements. The introduction of Transformer-based architectures by models such as BERT in 2018 laid the foundational groundwork for contemporary FMs, facilitating subsequent advancements in their medical domain adaptation. Between 2020 and 2021, language-centric FMs, including GPT-3 and GLM-130B, experienced substantial scaling, encompassing tens to hundreds of billions of parameters. This expansion augmented their proficiency in managing unstructured GI cancer data, enabling tasks such as the extraction of phenotypic characteristics and treatment information from EHRs and scientific literature. Concurrently, vision-oriented FMs, exemplified by ViT and DINO, adapted Transformer architectures for image-based applications, addressing pivotal challenges in GI cancer diagnosis. Leveraging transfer learning, these models demonstrated high accuracy in detecting early gastric and colorectal lesions within pathology slides and endoscopic video data.

Post-2021 developments witnessed a shift towards multimodal FMs, which further enhanced clinical utility. Models such as CLIP, BLIP, and Stable Diffusion integrated textual and visual encoding capabilities, facilitating end-to-end workflows including lesion localization in radiological imaging and cross-validation of pathology reports with endoscopic observations. Since 2023, advanced FMs like GPT-4, Gemini 1.5, and Claude 3 have extended their input modalities to encompass video and audio data. Employing a Mixture of Experts (MoE) architecture, these models achieve a balance between computational efficiency and performance, a critical factor for processing lengthy endoscopic videos and multimodal patient datasets. Notably, the Segment Anything Model (SAM) has emerged as a versatile instrument for segmenting GI lesions, exemplifying how general-purpose multimodal FMs can be rapidly adapted to meet specific clinical application requirements.

FMS IN GI CANCERS
Applications of LLMs in GI cancers

The analysis of natural language, characterized by its inherent unstructured nature, has historically posed significant challenges for computational processing through rule-based or traditional algorithmic approaches, particularly within the domain of medical texts that contain specialized terminology and complex syntactic structures[26]. However, since the early 2020s, the rapid advancement of LLMs has transformed the field of NLP, establishing these models as the predominant methodology for managing diverse textual data. Notably, LLMs grounded in the Transformer architecture have emerged as the leading solution in NLP applications due to their superior performance, as summarized in Table 1.

LLMs possess the capability to generate novel, contextually relevant text rather than merely reproducing or summarizing existing information[17]. The widespread adoption and standardization of LLMs have significantly democratized NLP, enabling researchers without extensive technical expertise to employ models such as GPT and BERT for practical applications[10]. These models can store and retrieve extensive knowledge bases and extract structured information from medical documents, including radiology and pathology reports, and can even offer medical recommendations based on imaging data[24]. This functionality is particularly valuable in GI cancer research, where integrating data from laboratory records, scientific literature, and clinical reports is essential[27]. LLMs facilitate the effective analysis and utilization of these heterogeneous data sources. Furthermore, LLMs have enhanced accessibility by allowing researchers to utilize pre-trained models like ChatGPT, Gemini, and open-source alternatives such as BERT and DeepSeek without necessitating retraining[19,20,28]. This capability supports tasks including the analysis of radiology reports, endoscopic findings, pathology records, clinical trial documentation, and research notes within GI oncology, thereby streamlining the generation of structured insights, risk stratification, and therapeutic recommendations.
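As a concrete, hedged illustration of the report-structuring use case described above (not the pipeline of any specific study cited here), the sketch below sends a short synthetic pathology sentence to a general-purpose LLM through the OpenAI Python client and asks for TNM fields in JSON; the model name, prompt, and report text are illustrative assumptions.

```python
# Hedged sketch of LLM-based extraction of structured TNM fields from free
# text. The report below is synthetic and the prompt/model are illustrative;
# assumes the OpenAI Python client with OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

report = ("Gastric adenocarcinoma invading the muscularis propria; "
          "3 of 15 regional lymph nodes positive; no distant metastasis.")

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {"role": "system",
         "content": ("Extract the TNM stage from the pathology report. "
                     'Reply only with JSON of the form {"T": "...", "N": "...", "M": "..."}.')},
        {"role": "user", "content": report},
    ],
)

# The raw output should still be validated (e.g., with rule-based checks)
# before any downstream clinical or research use.
print(response.choices[0].message.content)
```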

Table 2 and Supplementary Table 1 provide a comprehensive overview of 69 representative studies on NLP and LLM applications in GI cancers conducted between 2011 and 2025. These studies encompass traditional NLP methodologies based on rule sets, lexicons, and statistical learning (Supplementary Table 1), alongside the rapidly emerging Transformer-based LLM approaches post-2020 (Table 2). The inclusion of traditional NLP methods alongside LLMs is justified by several factors: Early rule-based and statistical NLP investigations achieved high accuracy in tasks such as colonoscopy quality control and pathological entity recognition, thereby supplying valuable annotated datasets and task frameworks for subsequent LLM development. Moreover, due to the necessity for interpretability and controllability in clinical environments, certain rule-based techniques continue to function as safety mechanisms or post-validation modules for LLM outputs.

Table 2 Summary of key studies of large language models in the field of gastrointestinal cancer.
Ref. | Year | Models | Objectives | Datasets | Performance | Evaluation
Syed et al[29] | 2022 | BERTi | Developed fine-tuned BERTi for integrated colonoscopy reports | 34165 reports | F1-scores of 91.76%, 92.25%, and 88.55% for colonoscopy, pathology, and radiology | Manual chart review by 4 expert-guided reviewers
Lahat et al[30] | 2023 | GPT | Assessed GPT performance in addressing 110 real-world gastrointestinal inquiries | 110 real-life questions | Moderate accuracy (3.4-3.9/5) for treatment and diagnostic queries | Assessed by three gastroenterologists using a 1-5 scale for accuracy etc.
Lee et al[31] | 2023 | GPT-3.5 | Examined GPT-3.5's responses to eight frequently asked colonoscopy questions | 8 colonoscopy-related questions | GPT answers had extremely low text similarity (0%-16%) | Four gastroenterologists rated the answers on a 7-point Likert scale
Emile et al[32] | 2023 | GPT-3.5 | Analyzed GPT-3.5's ability to generate appropriate responses to CRC questions | 38 CRC questions | 86.8% deemed appropriate, with 95% concordance with 2022 ASCRS guidelines | Three surgery experts assessed answers using ASCRS guidelines
Moazzam et al[33] | 2023 | GPT | Investigated the quality of GPT's responses to pancreatic cancer-related questions | 30 pancreatic cancer questions | 80% of responses were "very good" or "excellent" | Responses were graded by 20 experts against a clinical benchmark
Yeo et al[34] | 2023 | GPT | Assessed GPT's performance in answering questions regarding cirrhosis and HCC | 164 questions about cirrhosis and HCC | 79.1% correctness for cirrhosis and 74% for HCC, but only 47.3% comprehensiveness | Responses were reviewed by two hepatologists and resolved by a 3rd reviewer
Cao et al[35] | 2023 | GPT-3.5 | Examined GPT-3.5's capacity to answer questions on liver cancer screening and diagnosis | 20 questions | 48% of answers were accurate, with frequent errors in LI-RADS categories | Six fellowship-trained physicians from three centers assessed answers
Gorelik et al[36] | 2024 | GPT-4 | Evaluated GPT-4's ability to provide guideline-aligned recommendations | 275 colonoscopy reports | Aligned with experts in 87% of scenarios, showing no significant accuracy gap | Advice assessed by consensus review with multiple experts
Gorelik et al[37] | 2023 | GPT-4 | Analyzed GPT-4's effectiveness in post-colonoscopy management guidance | 20 clinical scenarios | 90% followed guidelines, with 85% correctness and strong agreement (κ = 0.84) | Assessed by two senior gastroenterologists for guideline compliance
Zhou et al[38] | 2023 | GPT-3.5 and GPT-4 | Developed a gastric cancer consultation system and automated report generator | 23 medical knowledge questions | 91.3% appropriate gastric cancer advice (GPT-4), 73.9% for GPT-3.5 | The evaluation was conducted by reviewers with medical standards
Yang et al[39] | 2025 | RECOVER (LLM) | Designed an LLM-based remote patient monitoring system for postoperative care | 7 design sessions, 5 interviews | Six major design strategies for integrating clinical guidelines and information | Clinical staff reviewed and provided feedback on the design and functionality
Kerbage et al[40] | 2024 | GPT-4 | Evaluated GPT-4's accuracy in responding to IBS, IBD, and CRC screening questions | 65 questions (45 patients, 20 doctors) | 84% of answers were accurate | Assessed independently by three senior gastroenterologists
Tariq et al[41] | 2024 | GPT-3.5, GPT-4, and Bard | Compared the efficacy of GPT-3.5, GPT-4, and Bard (July 2023 version) in answering 47 common colonoscopy patient queries | 47 queries | GPT-4 outperformed GPT-3.5 and Bard, with 91.4% fully accurate responses vs 6.4% and 14.9%, respectively | Responses were scored by two specialists on a 0-2 point scale and resolved by a 3rd reviewer
Maida et al[42] | 2025 | GPT-4 | Evaluated GPT-4's suitability in addressing screening, diagnostic, and therapeutic inquiries | 15 CRC screening inquiries | 4.8/6 for CRC screening accuracy, 2.1/3 for completeness | Assessment involved 20 experts and 20 non-experts rating the answers
Atarere et al[43] | 2024 | BingChat, GPT, YouChat | Tested the appropriateness of GPT, BingChat, and YouChat in patient education and patient-physician communication | 20 questions (15 on CRC screening and 5 patient-related) | GPT and YouChat provided more reliable answers than BingChat, but all models had occasional inaccuracies | Two board-certified physicians and one gastroenterologist graded the responses
Chang et al[44] | 2024 | GPT-4 | Compared GPT-4's accuracy, reliability, and alignment of colonoscopy recommendations | 505 colonoscopy reports | 85.7% of cases matched USMSTF guidelines | Assessment was conducted by an expert panel under USMSTF guidelines
Lim et al[45] | 2024 | GPT-4 | Compared a contextualized GPT model with standard GPT in colonoscopy screening | 62 example use cases | Contextualized GPT-4 outperformed standard GPT-4 | GPT-4 was compared against a model supplied with relevant screening guidelines
Munir et al[46] | 2024 | GPT | Evaluated the quality and utility of responses for three GI surgeries | 24 research questions | Modest quality, varying significantly by type of procedure | Responses were graded by 45 expert surgeons
Truhn et al[47] | 2024 | GPT-4 | Created a structured data parsing module with GPT-4 for clinical text processing | 100 CRC reports | 99% accuracy for T-stage extraction, 96% for N-stage, and 94% for M-stage | Accuracy of GPT-4 was compared with manually extracted data by experts
Choo et al[48] | 2024 | GPT | Designed a clinical decision-support system to generate personalized management plans | 30 stage III recurrent CRC patients | 86.7% agreement with tumor board decisions, 100% for second-line therapies | The recommendations were compared with the decision plans made by the MDT
Huo et al[49] | 2024 | GPT, BingChat, Bard, Claude 2 | Established a multi-AI platform framework to optimize CRC screening recommendations | Responses for 3 patient cases | GPT aligned with guidelines in 66.7% of cases, while other AIs showed greater divergence | Clinician and patient advice was compared to guidelines
Pereyra et al[50] | 2024 | GPT-3.5 | Optimized GPT-3.5 for personalized CRC screening recommendations | 238 physicians | GPT scored 4.57/10 for CRC screening, vs 7.72/10 for physicians | Answers were compared against a group of surgeons
Peng et al[51] | 2024 | GPT-3.5 | Built a GPT-3.5-powered system for answering CRC-related queries | 131 CRC questions | 63.01 mean accuracy, but low comprehensiveness scores (0.73-0.83) | Two physicians reviewed each response, with a third consulted for discrepancies
Ma et al[52] | 2024 | GPT-3.5 | Established GPT-3.5-based quality control for post-esophageal ESD procedures | 165 esophageal ESD cases | 92.5%-100% accuracy across post-esophageal ESD quality metrics | Two QC members and a senior supervisor conducted the assessment
Cohen et al[53] | 2025 | LLaMA-2, Mistral-v0.1 | Explored the ability of LLMs to extract PD-L1 biomarker details for research purposes | 232 EHRs from 10 cancer types | Fine-tuned LLMs outperformed an LSTM trained on > 10000 examples | Assessed by 3 clinical experts against manually curated answers
Scherbakov et al[54] | 2025 | Mixtral 8 × 7B | Assessed an LLM for extracting stressful events from the social history of clinical notes | 109556 patients, 375334 notes | Arrest or incarceration (OR = 0.26, 95%CI: 0.06-0.77) | One human reviewer assessed the precision and recall of extracted events
Chatziisaak et al[55] | 2025 | GPT-4 | Evaluated the concordance of therapeutic recommendations generated by GPT | 100 consecutive CRC patients | 72.5% complete concordance, 10.2% partial concordance, and 17.3% discordance | Three reviewers independently assessed concordance with the MDT
Saraiva et al[56] | 2025 | GPT-4 | Assessed GPT-4's performance in interpreting images in gastroenterology | 740 images | Capsule endoscopy: Accuracies 50.0%-90.0% (AUCs 0.50-0.90) | Three experts reviewed and labeled images for CE
Siu et al[57] | 2025 | GPT-4 | Evaluated the efficacy, quality, and readability of GPT-4's responses | 8 patient-style questions | Accurate (40), safe (4.25), appropriate (4.00), actionable (4.00), effective (4.00) | Evaluated by 8 colorectal surgeons
Horesh et al[58] | 2025 | GPT-3.5 | Evaluated management recommendations of GPT in clinical settings | 15 colorectal or anal cancer patients | Rating 48 for GPT recommendations, 4.11 for decision justification | Evaluated by 3 experienced colorectal surgeons
Ellison et al[59] | 2025 | GPT-3.5, Perplexity | Compared readability using different prompts | 52 colorectal surgery materials | Average 7.0-9.8, Ease 53.1-65.0, Modified 9.6-11.5 | Compared mean scores between baseline and documents generated by AI
Ramchandani et al[60] | 2025 | GPT-4 | Validated the use of GPT-4 for identifying articles discussing perioperative and preoperative risk factors for esophagectomy | 1967 studies for title and abstract screening | Perioperative: Agreement rate = 85.58%, AUC = 0.87; Preoperative: Agreement rate = 78.75%, AUC = 0.75 | Decisions were compared with those of three independent human reviewers
Zhang et al[61] | 2025 | GPT-4, DeepSeek, GLM-4, Qwen, LLaMa3 | Evaluated the consistency of LLMs in generating diagnostic records for hepatobiliary cases using the HepatoAudit dataset | 684 medical records covering 20 hepatobiliary diseases | Precision: GPT-4 reached a maximum of 93.42%; Recall: Generally below 70%, with some diseases below 40% | Professional physicians manually verified and corrected all the data
Spitzl et al[62] | 2025 | Claude-3.5, GPT-4o, DeepSeekV3, Gemini 2 | Assessed the capability of state-of-the-art LLMs to classify liver lesions based solely on textual descriptions from MRI reports | 88 fictitious MRI reports designed to resemble real clinical documentation | Micro and macro F1-scores: Claude 3.5 Sonnet 0.91 and 0.78, GPT-4o 0.76 and 0.63, DeepSeekV3 0.84 and 0.70, Gemini 2.0 Flash 0.69 and 0.55 | Model performance was assessed using micro and macro F1-scores benchmarked against ground truth labels
Sheng et al[63] | 2025 | GPT-4o and Gemini | Investigated the diagnostic accuracies for focal liver lesions | 228 adult patients with CT/MRI reports | Two-step GPT-4o, single-step GPT-4o, and single-step Gemini (78.9%, 68.0%, 73.2%) | Six radiologists reviewed the images and clinical information in two rounds (alone, with LLM)
Williams et al[64] | 2025 | GPT-4-32K | Determined whether an LLM could extract reasons for a lack of follow-up colonoscopy | 846 patients' clinical notes | Overall accuracy: 89.3%; reasons: Refused/not interested (35.2%) | A physician reviewer checked 10% of LLM-generated labels
Lu et al[65] | 2025 | MoE-HRS | Used a novel MoE combined with LLMs for risk prediction and personalized healthcare recommendations | SNPs, medical and lifestyle data from United Kingdom Biobank | MoE-HRS outperformed state-of-the-art cancer risk prediction models in terms of ROC-AUC, precision, recall, and F1 score | LLM-generated advice was validated by clinical medical staff
Yang et al[66] | 2025 | GPT-4 | Explored the use of LLMs to enhance doctor-patient communication | 698 pathology reports of tumors | Average communication time decreased by over 70%, from 35 to 10 min (P < 0.001) | Pathologists evaluated the consistency between original and AI reports
Jain et al[67] | 2025 | GPT-4, GPT-3.5, Gemini | Studied the performance of LLMs across 20 clinicopathologic scenarios in gastrointestinal pathology | 20 clinicopathologic scenarios in GI | Diagnostic accuracy: Gemini Advanced (95%, P = 0.01), GPT-4 (90%, P = 0.05), GPT-3.5 (65%) | Two fellowship-trained pathologists independently assessed the responses of the models
Xu et al[68] | 2025 | GPT-4, GPT-4o, Gemini | Assessed the performance of LLMs in predicting immunotherapy response in unresectable HCC | Multimodal data from 186 patients | Accuracy and sensitivity: GPT-4o (65% and 47%), Gemini-GPT (68% and 58%), physicians (72% and 70%) | Six physicians (three radiologists and three oncologists) independently assessed the same dataset
Deroy et al[69] | 2025 | GPT-3.5 Turbo | Explored the potential of LLMs as a question-answering (QA) tool | 30 training and 50 testing queries | A1: 0.546 (maximum value); A2: 0.881 (maximum value across three runs) | Model-generated answers were compared to the gold standard
Ye et al[70] | 2025 | BioBERT-based | Proposed a novel framework that incorporates clinical features to enhance multi-omics clustering for cancer subtyping | Six cancer datasets across three omics levels | Mean survival score of 2.20, significantly higher than other methods | Three independent clinical experts reviewed and validated the clustering results

The volume and temporal distribution of studies reveal distinct trends between traditional NLP and modern LLM research. Over a 14-year span (2011-2025), only 25 studies focused on traditional NLP approaches, whereas LLM-related publications surged from zero to 42 within five years following 2020, indicating rapid expansion. Since 2023, more than ten new investigations annually have employed frameworks such as LLaMA-2 and Gemini, establishing LLMs as the most dynamic area in intelligent text processing for GI cancers.

As detailed in Table 2[29-70], LLMs have been extensively applied to address a variety of GI cancer-related challenges. For example, GPT series models have been utilized to respond to diverse clinical inquiries, including colon cancer screening, pancreatic cancer treatment, and the diagnosis of cirrhosis and liver cancer. These applications underscore the robust language comprehension and generation capabilities of LLMs, enabling them to manage medical knowledge across multiple domains and provide preliminary informational support for clinicians and patients. For instance, in 2023, Emile et al[32] found that GPT-3.5 could generate appropriate responses for 86.8% of 38 CRC questions, with 95% concordance with the 2022 ASCRS guidelines.

Several studies have focused on leveraging LLMs to develop personalized medical systems. Choo et al[48] designed a clinical decision support system that used GPT to generate personalized management plans for stage III recurrent CRC patients. The plans showed 86.7% agreement with the decisions of the tumor board, and 100% agreement for second-line therapies. This indicates that LLMs have the potential to provide customized medical solutions based on patients' specific conditions. LLMs have also been applied to automated report generation and data processing. In 2023, Zhou et al[38] developed a gastric cancer consultation system and an automated report generator based on GPT-3.5 and GPT-4; GPT-4 provided appropriate gastric cancer advice in 91.3% of cases. Moreover, in 2024, Truhn et al[47] used GPT-4 to create a structured data parsing module for clinical text processing, achieving 99%, 96%, and 94% accuracy in extracting T-stage, N-stage, and M-stage, respectively, which greatly improved the efficiency of data processing. To further facilitate the application of LLMs, some researchers have turned to model comparison and optimization. In 2024, Tariq et al[41] compared the performance of GPT-3.5, GPT-4, and Bard in answering 47 common colonoscopy patient queries and found that GPT-4 outperformed the others, with 91.4% fully accurate responses. Such comparisons help researchers understand the performance of different models and select more suitable ones for optimization and application.

Despite these advancements, several challenges hinder the clinical translation of LLMs in GI cancer. First, while LLMs often exhibit remarkable accuracy and promising applications, these models are not specifically designed for medical contexts. Several studies have further revealed inconsistencies or uncertainties in their reported outcomes. Pereyra et al[50] found GPT-3.5 scored just 4.57/10 for CRC screening recommendations, far below physicians’ 7.72/10, while Tariq et al[41] revealed stark model disparities: GPT-4 delivered 91.4% fully accurate colonoscopy query responses, but GPT-3.5 and Bard only achieved 6.4% and 14.9%, respectively. Even for common tasks, Cao et al[35] noted GPT-3.5 had only 48% accuracy in liver cancer screening (with frequent category errors), demonstrating that generalization issues extend beyond rare cases.

Second, data privacy and compliance risks persist. For example, most widely adopted LLMs (e.g., Claude-3.5) are trained on heterogeneous non-medical datasets, lacking inherent safeguards for sensitive GI cancer data. This creates significant HIPAA/GDPR compliance concerns, raising questions about how patient data is protected during model deployment.

Third, interpretability gaps undermine clinical trust. While GPT-4 shows strong guideline alignment (87% agreement with experts in Gorelik et al[36]), its black-box nature means clinicians cannot trace the reasoning behind its outputs, a critical flaw in high-stakes scenarios. This is exemplified by Yeo et al[34], where GPT achieved 74% correctness for HCC-related queries but only 47.3% comprehensiveness; clinicians could not verify why incomplete information was generated, limiting reliance on such tools.

Together, these challenges (inconsistent performance, privacy risks, and opaque reasoning) create barriers to integrating LLMs into routine GI cancer care, as the models do not yet meet the rigor and reliability required for clinical decision-making. To address these challenges and accelerate the clinical integration of LLMs, future research should prioritize several directions suggested by the findings in Table 2: fine-tuning LLMs on GI-specific datasets, integrating rule-based checks to verify outputs (complementing the traditional NLP methods in Supplementary Table 1), and using open-source models with local deployment for privacy-sensitive data handling.
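As a minimal sketch of such a rule-based check (the regular expression and field names are illustrative and do not reproduce any cited pipeline), the snippet below accepts LLM-extracted TNM fields only when they match a permissible pattern and flags everything else for manual review.

```python
# Minimal sketch of a rule-based post-validation step for LLM output.
# The accepted TNM pattern and field names are illustrative assumptions.
import re

TNM_PATTERN = re.compile(r"^(T(is|[0-4][ab]?)|N[0-3][abc]?|M[01])$")

def validate_tnm(extraction: dict) -> dict:
    """Keep only T/N/M values that match the expected format; flag the rest."""
    validated = {}
    for field in ("T", "N", "M"):
        value = str(extraction.get(field, "")).strip()
        validated[field] = value if TNM_PATTERN.match(value) else None
    return validated

print(validate_tnm({"T": "T3", "N": "N1", "M": "MX"}))
# {'T': 'T3', 'N': 'N1', 'M': None} -> the malformed "MX" is flagged for review
```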

Applications of VFMs in GI cancers

Since the early 2020s, VFMs have revolutionized biomedical image analysis[25,71]. These models acquire universal visual representations from extensive collections of unlabeled medical images and can be adapted to specialized tasks, such as GI cancer detection, through fine-tuning on relatively small labeled datasets[72]. For example, in CRC screening, FMs have demonstrated substantial improvements in polyp detection accuracy following fine-tuning. Moreover, VFMs are increasingly employed in cross-modal applications[73]. They integrate different modalities of data to achieve a more comprehensive understanding of disease pathology. This integration necessitates the processing of diverse datasets and significant computational resources. However, the emergence of open-source VFMs, including MedSAM and BiomedCLIP, has enhanced accessibility to these advanced tools[71,74]. Although current usage requires foundational programming skills, the advent of low-code/no-code platforms and Model-as-a-Service (MaaS) frameworks is poised to enable non-expert users to leverage these technologies. Such developments are expected to catalyze advancements in early screening, diagnosis, and personalized treatment strategies for GI cancers.
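To make this adaptation workflow more tangible, the following is a hedged sketch of prompt-based lesion segmentation with the publicly released Segment Anything (SAM) weights, whose interface MedSAM broadly mirrors; the checkpoint file, image, and bounding box coordinates are placeholders rather than values from any cited study.

```python
# Hedged sketch of box-prompted segmentation with the public SAM weights;
# the checkpoint path, image file, and bounding box are placeholders.
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

frame = cv2.cvtColor(cv2.imread("colonoscopy_frame.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(frame)

# A rough bounding box around the suspected polyp serves as the prompt.
box = np.array([120, 80, 260, 210])
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
print("mask pixels:", int(masks[0].sum()), "predicted IoU:", float(scores[0]))
```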

VFMs in endoscopy: Endoscopy constitutes a critical modality for the diagnosis and management of GI cancers, generating vast quantities of images that capture essential information ranging from early lesions to advanced tumor stages. Traditionally, the interpretation of these images has relied heavily on the expertise of experienced endoscopists, a process that is both time-intensive and susceptible to human error, especially given the increasing volume of examinations[75]. VFMs offer a novel solution by enabling direct analysis of endoscopic video streams, facilitating the automatic localization and classification of lesions such as polyps and ulcers.

Table 3 summarizes 19 recent studies (2023-2025), all of which deliberately adapt VFMs for endoscopy applications. Due to space constraints, more detailed information about these models, such as country, dataset sizes, evaluation metrics, fine-tuning strategies, performance benchmarks, and GPUs, is presented in Supplementary Table 2. In contrast, Supplementary Table 3 focuses on VFMs benchmarked in endoscopy: it includes models that were not specifically trained or fine-tuned for endoscopy but that some models in Table 3[76-94] use for benchmarking. This table is significant because it provides reference results from general or medical-general VFMs, highlighting the transferability of VFMs' visual feature extraction capabilities and enriching the overall analysis of VFMs in the context of endoscopy.

Table 3 Summary of key studies of vision foundation models-assisted endoscopy in the field of gastrointestinal cancer.
Model | Year | Architecture | Training algorithm | Parameters | Datasets | Disease studied | Model type | Source code link
Surgical-DINO[76] | 2023 | DINOv2 | LoRA layers added to DINOv2, optimizing only the LoRA layers | 86.72M | SCARED, Hamlyn | Endoscopic surgery | Vision | https://github.com/BeileiCui/SurgicalDINO
ProMISe[77] | 2023 | SAM (ViT-B) | APM and IPS modules are trained while keeping SAM frozen | 1.3-45.6M | EndoScene, ColonDB etc. | Polyps, skin cancer | Vision | NA
Polyp-SAM[78] | 2023 | SAM | Pre-trains only the mask decoder while freezing all encoders | NA | CVC-ColonDB, Kvasir etc. | Colon polyps | Vision | https://github.com/ricklisz/Polyp-SAM
Endo-FM[79] | 2023 | ViT-B/16 | Pre-trained using a self-supervised teacher-student framework and fine-tuned on downstream tasks | 121M | Colonoscopic, LDPolyp etc. | Polyps, erosion, etc. | Vision | https://github.com/med-air/Endo-FM
ColonGPT[80] | 2024 | SigLIP-SO, Phi1.5 | Pre-alignment with image-caption pairs, followed by supervised fine-tuning using LoRA | 0.4-1.3B | ColonINST (30k+ images) | Colorectal polyps | Vision | https://github.com/ColonGPT/ColonGPT
DeepCPD[81] | 2024 | ViT | Hyperparameters are optimized for colonoscopy datasets, including the Adam optimizer | NA | PolypsSet, CP-CHILD-A etc. | CRC | Vision | https://github.com/Zhang-CV/DeepCPD
OneSLAM[82] | 2024 | Transformer (CoTracker) | Zero-shot adaptation using TAP + local bundle adjustment | NA | SAGE-SLAM, C3VD etc. | Laparoscopy, colon | Vision | https://github.com/arcadelab/OneSLAM
EIVS[83] | 2024 | Vision Mamba, CLIP | Unsupervised cycle-consistency | 63.41M | 613 WLE, 637 images | Gastrointestinal | Vision | NA
APT[84] | 2024 | SAM | Parameter-efficient fine-tuning | NA | Kvasir-SEG, EndoTect etc. | CRC | Vision | NA
FCSAM[85] | 2024 | SAM | LayerNorm LoRA fine-tuning strategy | 1.2M | Gastric cancer (630 pairs) etc. | GC, colon polyps | Vision | NA
DuaPSNet[86] | 2024 | PVTv2-B3 | Transfer learning with PVTv2-B3 pre-trained on ImageNet | NA | LaribPolypDB, ColonDB etc. | CRC | Vision | https://github.com/Zachary-Hwang/Dua-PSNet
EndoDINO[87] | 2025 | ViT (B, L, g) | DINOv2 methodology, hyperparameter tuning | 86M to 1B | HyperKvasir, LIMUC | GI endoscopy | Vision | https://github.com/ZHANGBowen0208/EndoDINO/
PolypSegTrack[88] | 2025 | DINOv2 | One-step fine-tuning on colonoscopic videos without first pre-training | NA | ETIS, CVC-ColonDB etc. | Colon polyps | Vision | NA
AiLES[89] | 2025 | RF-Net | Not fine-tuned from an external model | NA | 100 GC patients | Gastric cancer | Vision | https://github.com/CalvinSMU/AiLES
PPSAM[90] | 2025 | SAM | Fine-tuning with variable bounding box prompt perturbations | NA | EndoScene, ColonDB etc. | Investigated in Ref. | Vision | https://github.com/SLDGroup/PP-SAM
SPHINX-Co[91] | 2024 | LLaMA-2 + SPHINX-X | Fine-tuned SPHINX-X on CoPESD with a cosine learning rate scheduler | 7B, 13B | CoPESD | Gastric cancer | Multimodal | https://github.com/gkw0010/CoPESD
LLaVA-Co[91] | 2024 | LLaVA-1.5 (CLIP-ViT-L) | Fine-tuned LLaVA-1.5 on CoPESD with a cosine learning rate scheduler | 7B, 13B | CoPESD | Gastric cancer | Multimodal | https://github.com/gkw0010/CoPESD
ColonCLIP[92] | 2025 | CLIP | Prompt tuning with frozen CLIP, then encoder fine-tuning with frozen prompts | 57M, 86M | OpenColonDB | CRC | Multimodal | https://github.com/Zoe-TAN/ColonCLIP-OpenColonDB
PSDM[93] | 2025 | Stable Diffusion + CLIP | Continual learning with prompt replay to incrementally train on multiple datasets | NA | PolypGen, ColonDB, Polyplus etc. | CRC | Vision, generative | The original paper reported a GitHub link for this model, but it is currently unavailable
PathoPolypDiff[94] | 2025 | Stable Diffusion v1-4 | Fine-tuned Stable Diffusion v1-4 with the first U-Net block locked and the remaining blocks fine-tuned | NA | ISIT-UMR Colonoscopy Dataset | CRC | Generative | https://github.com/Vanshali/PathoPolyp-Diff

VFMs demonstrate notable strengths in GI cancer endoscopy through multiple advanced approaches. Parameter-efficient variants such as Surgical-DINO (LoRA, 0.3% trainable) and APT/FCSAM (adapter-based, < 1%) achieve competitive results, while fully-fine-tuned Endo-FM reaches 73.9 Dice on CVC-12k[76,79]. With respect to multimodal reasoning, LLaVA-Co achieves GPT scores of 85.6/100 and mIoU 60.2% on ESD benchmarks[91]. Regarding unified architectures across tasks, SAM-derived pipelines (e.g., ProMISe[77], Polyp-SAM[78], APT[84], FCSAM[85], PP-SAM[90]) have so far been individually evaluated for either segmentation or detection metrics. This suggests a single foundation backbone could replace the current patchwork of bespoke CNNs. For generative augmentation, PSDM[93] and PathoPolyp-Diff[94] utilize Stable Diffusion to synthesize polyp subtypes and show good performance in improving relevant downstream tasks. VFMs first benefit from limited real data, then generate synthetic data to further refine themselves. In terms of hardware economy, while billion-scale models such as EndoDINO[87] require 8 × H100, many adapter-based systems (e.g., DuaPSNet[86], AiLES[89]) train on 2 or fewer consumer GPUs (e.g., RTX 3060/3090) (Supplementary Table 2). This is largely because self-supervised pre-training has already handled the bulk of computations, making the democratization of high-quality GI AI feasible even for resource-constrained centers.
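The parameter-efficient adaptation strategies highlighted above can be sketched in a few lines. The example below assumes the Hugging Face transformers and peft libraries, with an ImageNet-pre-trained ViT standing in for an endoscopy-adapted backbone and a binary label space chosen purely for illustration; it attaches LoRA adapters to the attention projections so that only a small fraction of weights is trained.

```python
# Hedged sketch of LoRA-style parameter-efficient fine-tuning of a ViT
# backbone; the checkpoint, label count, and LoRA ranks are illustrative.
from peft import LoraConfig, get_peft_model
from transformers import ViTForImageClassification

backbone = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    num_labels=2,  # e.g. polyp vs normal mucosa (illustrative)
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],  # attention projections in the ViT
    modules_to_save=["classifier"],     # also train the new classification head
)
model = get_peft_model(backbone, lora_config)

# Only the low-rank adapter matrices (and the small head) are updated; the
# pre-trained backbone stays frozen, keeping the trainable fraction below 1%.
model.print_trainable_parameters()
```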

Supplementary Table 3 has unique value in the context of VFMs for GI endoscopy: It includes models that are not optimized specifically for endoscopy but still prove useful in benchmarking. For example, models like TimeSformer and ST-Adapter, despite lacking endoscopy-specific refinement, demonstrate certain value when used in the benchmarking of Endo-FM[79]. Meanwhile, general-purpose models such as SAM, Gemini-1.5, and Stable Diffusion are also tested in the benchmarking of other models like PPSAM[90], ColonCLIP[92] and PathoPolyp-Diff[94] respectively, showing their potential to support performance evaluation in this specialized field. These results confirm the general-purpose vision-language capabilities of models like CLIP and Gemini-1.5 (Supplementary Table 3), even when the base model has never been exposed to endoscope data.

Collectively, these findings show that VFMs, whether applied directly or through secondary development, play a pivotal role in GI cancer endoscopy tasks including polyp recognition and early lesion monitoring. They contribute to enhanced diagnostic efficiency and accuracy. Furthermore, the reviewed studies highlight the complementary strengths of diverse models in specific tasks, thereby laying the groundwork for future multi-model fusion systems aimed at intelligent endoscopic diagnosis.

VFMs in radiology: VFMs have become increasingly significant in radiology, particularly for GI cancer diagnosis, complementing traditional endoscopic approaches. Radiological modalities such as CT, MRI, and positron emission tomography play essential roles in initial cancer staging, metastasis detection, treatment monitoring, and postoperative recurrence identification[95]. Traditional radiology methods involve manually marking regions of interest and extracting features, which is reliable but time-consuming and constrained by limited data[96]. In contrast, VFMs using Transformer-based architectures enable automated processing of entire images, capturing intricate details of tumors and adjacent tissues and reducing the need for manual annotation. The recent availability of large-scale, open-source VFMs pre-trained on millions of radiographs has facilitated fine-tuning on relatively small datasets, such as several dozen enhanced CT scans for gastric cancer or CRC, using modest computing resources[97].

To summarize the application and development of VFMs in radiology for GI cancer, three key tables are presented in this section. Table 4 encapsulates 10 representative VFM studies, covering essential information such as model architecture, training algorithm, and applied datasets. Supplementary Table 4 extends Table 4 by providing further methodological details for the same models, including dataset sizes, evaluation metrics, fine-tuning strategies, and performance benchmarks. Meanwhile, Supplementary Table 5 presents a set of models that were not specifically trained or fine-tuned for radiology tasks but were adopted as benchmarks by several models in Table 4[97-105], thereby providing a comparative context for assessing the relative performance of VFMs tailored for radiology.

Table 4 Summary of key studies of vision foundation models-assisted radiology in the field of gastrointestinal cancer.
Model | Year | Architecture | Training algorithm | Parameters | Datasets | Disease studied | Model type | Source code link
PubMedCLIP[98] | 2021 | CLIP | Fine-tuned on the ROCO dataset for 50 epochs with the Adam optimizer | NA | ROCO, VQA-RAD, SLAKE | Abdomen samples | Multimodal | https://github.com/sarahESL/PubMedCLIP
RadFM[97] | 2023 | MedLLaMA-13B | Pre-trained on MedMD and fine-tuned on RadMD | 14B | MedMD, RadMD etc. | Over 5000 diseases | Multimodal | https://github.com/chaoyi-wu/RadFM
Merlin[99] | 2024 | I3D-ResNet152 | Multi-task learning with EHR and radiology reports, plus fine-tuning for specific tasks | NA | 6M images, 6M codes and reports | Multiple diseases, abdominal | Multimodal | NA
MedGemini[100] | 2024 | Gemini | Fine-tuning Gemini 1.0/1.5 on medical QA, multimodal, and long-context corpora | 1.5B | MedQA, NEJM, GeneTuring | Various | Multimodal | https://github.com/Google-Health/med-gemini-medqa-relabelling
HAIDEF[101] | 2024 | VideoCoCa | Fine-tuning on downstream tasks with limited labeled data | NA | CT volumes and reports | Various | Vision | https://huggingface.co/collections/google/
CTFM[102] | 2024 | Vision Model1 | Trained using a self-supervised learning strategy, employing a SegResNet encoder for the pre-training phase | NA | 26298 CT scans | CT scans (stomach, colon) | Vision | https://aim.hms.harvard.edu/ct-fm
MedVersa[103] | 2024 | Vision Model1 | Trained from scratch on the MedInterp dataset and adapted to various medical imaging tasks | NA | MedInterp | Various | Vision | https://github.com/3clyp50/MedVersa_Internal
iMD4GC[104] | 2024 | Transformer-based2 | A novel multimodal fusion architecture with cross-modal interaction and knowledge distillation | NA | GastricRes/Sur, TCGA etc. | Gastric cancer | Multimodal | https://github.com/FT-ZHOU-ZZZ/iMD4GC/
Yasaka et al[105] | 2025 | BLIP-2 | LoRA with specific fine-tuning of the fc1 layer in the vision and Q-Former models | NA | 5777 CT scans | Esophageal cancer via chest CT | Multimodal | NA

First, in terms of architectural diversity and technical adaptation, VFMs have evolved from single-modal vision models to integrated multimodal systems. On one hand, vision-specific models focus on optimizing image feature extraction for GI-related scans. For example, CT-FM adopts a SegResNet encoder and uses SSL to process 26298 CT scans, targeting stomach and colon cancer imaging[102]; MedVersa, trained from scratch on the MedInterp dataset, is adapted to multiple medical imaging tasks, including GI cancer detection[103]. On the other hand, multimodal models integrate non-imaging data to enhance diagnostic accuracy. Merlin uses an I3D-ResNet152 architecture and incorporates multi-task learning with EHR and radiology reports, enabling it to handle abdominal GI diseases alongside other conditions[99]. Second, regarding disease coverage and clinical targeting, VFMs now address a broader spectrum of GI cancers while maintaining specificity for individual disease types. Some models achieve wide applicability across GI malignancies. RadFM, built on MedLLaMA-13B and trained on 16M image-text pairs from MedMD, covers over 5000 diseases including various GI cancers[97]; HAI-DEF, based on VideoCoCa, processes CT volumes and reports to support diagnosis for multiple GI-related conditions[101]. In contrast, other models focus on specific GI cancer subtypes to meet targeted clinical needs: Yasaka et al's model[105], which fine-tunes BLIP-2 via LoRA, uses 5777 CT scans to specifically detect esophageal cancer; iMD4GC is exclusively developed for gastric cancer, leveraging disease-specific datasets to improve diagnostic precision for this subtype[104]. Third, in terms of performance validation and benchmarking, VFMs demonstrate robust results through standardized metrics and multi-center validation, with Supplementary Table 4 providing detailed performance data. For classification tasks, PubMedCLIP, a CLIP-based model fine-tuned on the ROCO dataset, achieves up to a 3% improvement in overall accuracy over MAML (a traditional meta-learning model)[98]. For predictive tasks, Merlin shows strong performance in multi-disease 5-year prediction for GI cancers, with an AUROC of 0.757, and maintains reliability in external validation[99]. Additionally, RadFM outperforms existing multimodal models (e.g., OpenFlamingo) on RadBench and other public benchmarks, with high scores in classification and open-ended tasks, as confirmed by radiologist evaluation[97].

Unlike these specialized radiology models, several general-purpose VFMs that were not tailored to radiology serve as benchmarks for GI cancer-specific VFMs such as RadFM or Merlin. These VFMs target broad visual or multimodal understanding outside medicine, which makes their role in radiology benchmarking noteworthy. For example, GPT-4V, designed for general visual-language tasks, was benchmarked against RadFM[97]; RadFM outperformed it in radiology tasks, as shown by higher BLEU and F1 scores. MedFlamingo and OpenFlamingo also served as RadFM benchmarks, with RadFM excelling in open-ended radiology Q&A on RadBench. OpenCLIP, pre-trained on non-medical data, was used as a benchmark in Merlin’s evaluation[99]; Merlin’s radiology-specific training achieved an internal zero-shot F1 score of 0.741, outperforming OpenCLIP. The inclusion of these general-purpose models in benchmarking experiments offers a universal performance baseline for radiology models, enabling objective assessment of the benefits of domain adaptation. It also improves the reproducibility and comparability of research, as publicly accessible models such as GPT-4V and OpenCLIP allow consistent cross-study alignment.
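To make this benchmarking setup concrete, the following minimal sketch shows how a general-purpose CLIP-style model can be used for zero-shot classification of a radiological image against text prompts, the kind of baseline that radiology-specific VFMs such as Merlin are compared with. The checkpoint name, label prompts, and image path are illustrative assumptions rather than details from the cited studies.

```python
# Minimal sketch of zero-shot lesion classification with a general-purpose
# CLIP-style model, as used when benchmarking radiology-specific VFMs.
# The checkpoint name, label prompts, and image path are illustrative only.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai")          # general-purpose baseline
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

labels = ["a CT slice showing esophageal cancer",
          "a CT slice with no evidence of malignancy"]   # hypothetical prompts
image = preprocess(Image.open("example_ct_slice.png")).unsqueeze(0)
text = tokenizer(labels)

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

print(dict(zip(labels, probs.squeeze(0).tolist())))
```

In the cited evaluations, domain-adapted models outperform such general-purpose baselines on radiology-specific tasks, which is precisely the benefit this kind of benchmark is designed to quantify.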

Despite the progress of VFMs in GI cancer radiology, several radiology-specific limitations and challenges remain evident in current research. For example, dataset bias and scarcity hinder model generalizability: the model of Yasaka et al[105] relies on a relatively small, single-center dataset of 5777 CT scans, which may fail to capture the variability of GI cancer imaging across different populations or clinical settings. There is also limited focus on 3D radiological data. Most models (e.g., PubMedCLIP, RadFM) primarily process 2D images, whereas 3D CT/MRI volumes, critical for assessing tumor depth and spread in GI cancer, are less frequently addressed (Merlin reports 3D semantic segmentation with a Dice score of 0.798). To address these issues, future research should pursue radiology-tailored strategies. For instance, expanding multi-center, diverse training datasets would allow future models to integrate data from global GI cancer centers and reduce bias; in practice, TCGA data (used by iMD4GC) could be combined with real-world clinical scans to cover more ethnicities and disease stages[104]. It would also be useful to enhance 3D data processing capabilities: building on Merlin’s progress in 3D segmentation, future VFMs should optimize architectures for 3D GI cancer imaging to improve tumor staging accuracy, a key radiological task for treatment planning[99].

VFMs in pathology: Histopathology plays a pivotal role in cancer diagnosis, prognosis, and treatment. Traditionally, pathologists examined tissue slides under microscopes, a process that was slow, labor-intensive, and prone to errors stemming from variability in expertise. Such limitations occasionally resulted in misdiagnoses, particularly in complex cases[106]. The integration of digital technologies revolutionized this domain through whole-slide imaging (WSI), which converts glass slides into high-resolution digital images that retain all microscopic details[107]. However, manual analysis of these extensive datasets was impractical, which led to the rise of computational pathology, the use of computer algorithms to analyze these images more efficiently[108]. The initial applications of digital pathology primarily supported clinical decision-making by enhancing cancer detection accuracy. In 2021, the Food and Drug Administration (FDA) approved the first AI pathology system, marking a major step forward[109]. Advancements in FMs have since spurred research into their application for WSI analysis, advancing tumor pathology toward greater automation and intelligence and making it faster and more accurate.

To elaborate on the application and advancement of VFMs in GI pathology, Table 5 encapsulates 28 representative VFM studies, showing the deployment of VFMs for tasks like detection & classification, segmentation, and histopathological assessment in GI WSIs. Due to space constraints, Supplementary Table 6 provides comprehensive methodological details for each corresponding model. These applications have markedly enhanced diagnostic efficiency and accuracy. Unlike the direct utilization of FMs in LLMs or endoscopic imaging, GI histopathology adopts a distinct technical approach, likely influenced by the extensive research in computational pathology favoring customized and specialized model architectures. By training and fine-tuning models on domain-specific pathological data, these VFMs achieve precise recognition and analysis of tumor features, rather than relying on general-purpose models.
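Many of the Table 5 entries below describe a "frozen backbone with linear classifier" recipe for adapting a pre-trained encoder to a GI-specific task. The following minimal sketch, with a torchvision ResNet-50 standing in for a pathology foundation model (models such as UNI or Phikon would be loaded from their own repositories) and random tensors standing in for annotated GI patches, illustrates how such a probe is trained on frozen features; it is a conceptual illustration, not any specific study's pipeline.

```python
# Minimal sketch of the "frozen encoder + linear probe" recipe that recurs in
# Table 5. A torchvision ResNet-50 stands in for a pathology foundation model;
# the patch tensors and labels are placeholders.
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights
from sklearn.linear_model import LogisticRegression

encoder = resnet50(weights=ResNet50_Weights.DEFAULT)
encoder.fc = nn.Identity()          # expose the 2048-d feature vector
encoder.eval()
for p in encoder.parameters():      # freeze the backbone
    p.requires_grad = False

# Placeholder batch of 3-channel 224x224 histology patches with binary labels
patches = torch.rand(64, 3, 224, 224)
labels = torch.randint(0, 2, (64,)).numpy()

with torch.no_grad():
    feats = encoder(patches).numpy()   # (64, 2048) frozen embeddings

probe = LogisticRegression(max_iter=1000).fit(feats, labels)
print("training accuracy of the linear probe:", probe.score(feats, labels))
```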

Table 5 Summary of key studies of Vision Foundation Models-assisted pathology in the field of gastrointestinal cancer.
Model | Year | Architecture | Training Algorithm | Paras | WSIs | Tissues | Open source link
LUNIT-SSL[110] | 2021 | ViT-S | DINO; full fine-tuning and linear evaluation on downstream tasks | 22M | 3.7K | 32 | https://Lunitio.github.io/research/publications/pathology_ssl
CTransPath[111] | 2022 | Swin Transformer | MoCoV3 (SRCL); frozen backbone with linear classifier fine-tuning | 28M | 32K | 32 | https://github.com/Xiyue-Wang/TransPath
Phikon[112] | 2023 | ViT-B | iBOT (Masked Image Modeling); fine-tuned with ABMIL/TransMIL on frozen features | 86M | 6K | 16 | https://github.com/owkin/HistoSSLscaling
REMEDIS[113] | 2023 | BiT-L (ResNet-152) | SimCLR (contrastive learning); end-to-end fine-tuning on labeled ID/OOD data | 232M | 29K | 32 | https://github.com/google-research/simclr
Virchow[114] | 2024 | ViT-H, DINOv2 | DINOv2 (SSL); used frozen embeddings with simple aggregators | 632M | 1.5M | 17 | https://huggingface.co/paige-ai/Virchow
Virchow2[115] | 2024 | ViT-H | DINOv2 (SSL); fine-tuned with linear probes or full-tuning on downstream tasks | 632M | 3.1M | 25 | https://huggingface.co/paige-ai/Virchow2
Virchow2G[115] | 2024 | ViT-G | DINOv2 (SSL); fine-tuned with linear probes or full fine-tuning | 1.9B | 3.1M | 25 | https://huggingface.co/paige-ai/Virchow2
Virchow2G mini[115]1 | 2024 | ViT-S, Virchow2G | DINOv2 (SSL); distilled from Virchow2G, then fine-tuned on downstream tasks | 22M | 3.2M | 25 | https://huggingface.co/paige-ai/Virchow2
UNI[9] | 2024 | ViT-L | DINOv2 (SSL); used frozen features with linear probes or few-shot learning | 307M | 100K | 20 | https://github.com/mahmoodlab/UNI
Phikon-v2[116] | 2024 | ViT-L | DINOv2 (SSL); frozen ViT and ABMIL ensemble fine-tuning | 307M | 58K | 30 | https://huggingface.co/owkin/phikon-v2
RudolfV[117] | 2024 | ViT-L | DINOv2 (SSL); fine-tuned with optimizing linear classification layer and adapting encoder weights | 304M | 103K | 58 | https://github.com/rudolfv
HIBOU-B[118] | 2024 | ViT-B | DINOv2 (SSL); frozen feature extractor, trained linear classifier or attention pooling | 86M | 1.1M | 12 | https://github.com/HistAI/hibou
HIBOU-L[118]2 | 2024 | ViT-L | DINOv2 (SSL); frozen feature extractor, trained linear classifier or attention pooling | 307M | 1.1M | 12 | https://github.com/HistAI/hibou
H-Optimus-03 | 2024 | ViT-G | DINOv2 (SSL); linear probe and ABMIL on frozen features | 1.1B | > 500K | 32 | https://github.com/bioptimus/releases/
Madeleine[119] | 2024 | CONCH | MAD-MIL; linear probing, prototyping, and full fine-tuning for downstream tasks | 86M | 23K | 2 | https://github.com/mahmoodlab/MADELEINE
COBRA[120] | 2024 | Mamba-2 | Self-supervised contrastive pretraining with multiple FMs and Mamba2 architecture | 15M | 3K | 6 | https://github.com/KatherLab/COBRA
PLUTO[121] | 2024 | FlexiVit-S | DINOv2; frozen backbone with task-specific heads for fine-tuning | 22M | 158K | 28 | NA
HIPT[122] | 2025 | ViT-HIPT | DINO (SSL); fine-tune with gradient accumulation | 10M | 11K | 33 | https://github.com/mahmoodlab/HIPT
PathoDuet[123] | 2025 | ViT-B | MoCoV3; fine-tuned using standard supervised learning on labeled downstream task data | 86M | 11K | 32 | https://github.com/openmedlab/PathoDuet
Kaiko[124] | 2025 | ViT-L | DINOv2 (SSL); linear probing with frozen encoder on downstream tasks | 303M | 29K | 32 | https://github.com/kaiko-ai/towards_large_pathology_fms
PathOrchestra[125] | 2025 | ViT-L | DINOv2; ABMIL, linear probing, weakly supervised classification | 304M | 300K | 20 | https://github.com/yanfang-research/PathOrchestra
THREADS[126] | 2025 | ViT-L, CONCHv1.5 | Fine-tune gene encoder, initialize patch encoder randomly | 16M | 47K | 39 | https://github.com/mahmoodlab/trident
H0-mini[127] | 2025 | ViT | Using knowledge distillation from H-Optimus-0 | 86M | 6K | 16 | https://huggingface.co/bioptimus/H0-mini
TissueConcepts[128] | 2025 | Swin Transformer | Frozen encoder with linear probe for downstream tasks | 27.5M | 7K | 14 | https://github.com/FraunhoferMEVIS/MedicalMultitaskModeling
OmniScreen[129] | 2025 | Virchow2 | Attention-aggregated Virchow2 embeddings fine-tuning | 632M | 48K | 27 | https://github.com/OmniScreen
BROW[130] | 2025 | ViT-B | DINO (SSL); self-distillation with multi-scale and augmented views | 86M | 11K | 6 | NA
BEPH[131] | 2025 | BEiTv2 | BEiTv2 (SSL); supervised fine-tuning on clinical tasks with labeled data | 86M | 11K | 32 | https://github.com/Zhcyoung/BEPH
Atlas[132] | 2025 | ViT-H, RudolfV | DINOv2; linear probing with frozen backbone on downstream tasks | 632M | 1.2M | 70 | NA

The current research on VFMs in GI pathology presents distinct characteristics across three dimensions, with evidence supported by models from Table 5 and Supplementary Table 6. First, in terms of model architecture, there has been a clear trend toward diversification and scale expansion, with ViT variants becoming the dominant framework while complementary architectures continue to emerge. As shown in Table 5[9,110-132], early models (e.g., LUNIT-SSL) adopted lightweight ViT-S architectures with only 22M parameters, which laid the foundation for VFM application in pathology[110]. By 2024-2025, large-scale ViT-based models had become mainstream: Virchow2 and Virchow2G[115] use ViT-H and ViT-G architectures with 632M and 1.9B parameters, respectively, enabling more complex feature extraction for GI cancer tissue analysis. Meanwhile, specialized architectures such as Swin Transformer (CTransPath[111]) and Mamba-2 (COBRA[120]) have been introduced to address the spatial hierarchy of WSI data. For example, CTransPath’s Swin Transformer design, as detailed in Supplementary Table 6, contributes to its ability to outperform ImageNet-pretrained models by +0.6% accuracy on CRC datasets, demonstrating the adaptability of diverse architectures to GI pathology tasks[111]. Second, training data scale expansion and algorithm innovation have significantly enhanced the feature learning capabilities of VFMs, with SSL remaining the core training paradigm. Table 5 reveals a dramatic increase in training WSI volume: From LUNIT-SSL’s 3.7K WSIs (2021) to Virchow2’s 3.1M WSIs (2024) and Atlas’s 1.2M WSIs (2025), covering up to 70 tissue types (Atlas) that include multiple GI cancer subtypes[110,115,132]. Several models further complement this by highlighting dataset diversity. For instance, Phikon was pre-trained on 4M CRC-specific tiles (TCGA-COAD) and scaled to 40M pan-cancer tiles, allowing it to capture GI cancer-specific histological features more effectively[112]. In terms of algorithms, SSL methods have evolved from early contrastive and self-distillation approaches (e.g., LUNIT-SSL’s DINO[110], CTransPath’s MoCoV3[111]) to masked image modeling (e.g., Phikon’s iBOT[112]) and knowledge distillation (e.g., Virchow2G mini[115]). Virchow2G mini, a distilled version of Virchow2G (1.9B parameters), retains only 22M parameters yet still outperforms larger models such as H-Optimus-0 (1.1B parameters) on multiple GI cancer-related benchmarks (Table 5), showing that algorithm optimization can balance model efficiency and performance. Third, performance improvement and enhanced generalization have become key indicators, with models consistently outperforming traditional supervised baselines across diverse tasks; Supplementary Table 6 provides detailed performance evidence. REMEDIS, which uses a two-stage SSL strategy (contrastive learning on unlabeled medical images followed by end-to-end fine-tuning), achieves up to an 11.5% in-distribution gain and a 10.7% out-of-distribution gain over ImageNet/JFT baselines, even when using only 1%-33% of labeled GI cancer data[113]. This is critical for scenarios with limited annotated pathology samples. Virchow, trained on 1.5M H&E-stained WSIs, demonstrates state-of-the-art pan-cancer detection performance, with the highest or statistically tied AUC across nearly all GI cancer types (e.g., colorectal and gastric cancer) and superior generalization to external institutional data[114].
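Knowledge distillation, the strategy behind compact models such as Virchow2G mini and H0-mini, can be illustrated with a short sketch in which a small student network is trained to reproduce the embeddings of a large frozen teacher. The tiny networks, tile size, and training loop below are placeholders under stated assumptions; production pipelines distill ViT-G/ViT-H teachers on millions of pathology tiles.

```python
# Minimal sketch of embedding-level knowledge distillation. The "teacher" and
# "student" networks, tile size, and training loop are all placeholders.
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256)).eval()  # large frozen model stand-in
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))          # compact model being trained
project = nn.Linear(64, 256)                        # map student space to teacher space
params = list(student.parameters()) + list(project.parameters())
optimizer = torch.optim.AdamW(params, lr=1e-4)
loss_fn = nn.CosineEmbeddingLoss()

for step in range(100):                             # placeholder loop over tile batches
    tiles = torch.rand(32, 3, 32, 32)               # unlabeled histology tiles (toy)
    with torch.no_grad():
        target = teacher(tiles)                     # teacher embeddings, no gradients
    pred = project(student(tiles))
    loss = loss_fn(pred, target, torch.ones(tiles.size(0)))  # pull embeddings together
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```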
Additionally, models like Phikon[112], Madeleine[119] and HIPT[122] extend VFM application to GI cancer-related tasks beyond classification, such as survival prediction (using Harrell C-index as an evaluation metric), further expanding the utility of VFMs in clinical GI pathology workflows.
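For readers unfamiliar with the Harrell C-index mentioned above, the following self-contained sketch computes it from toy follow-up times, event indicators, and model-predicted risk scores; ties in follow-up time are skipped for simplicity, and libraries such as lifelines provide equivalent, more complete implementations.

```python
# Minimal sketch of the Harrell C-index used to evaluate survival prediction
# from WSI-derived risk scores (higher score = higher predicted risk). The
# times, events, and scores below are toy values.
from itertools import combinations

def harrell_c_index(times, events, risk_scores):
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:        # skip tied times for simplicity
            continue
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:           # earlier time must be an observed event
            continue
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5
    return concordant / comparable

times = [5, 12, 9, 20, 7]           # follow-up in months
events = [1, 0, 1, 1, 1]            # 1 = event observed, 0 = censored
scores = [0.9, 0.6, 0.4, 0.1, 0.8]  # model-predicted risk (toy values)
print("Harrell C-index:", harrell_c_index(times, events, scores))
```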

Despite their promising progress, VFMs still face distinct limitations and challenges when applied to GI pathology, most of which are closely tied to the unique characteristics of pathological analysis and clinical workflows. First, over-reliance on large-scale, high-quality pathological datasets restricts accessibility. For example, models such as Virchow2[115] and Atlas[132] use 3.1M and 1.2M WSIs, respectively (Table 5), but such multi-institutional, well-annotated cohorts (e.g., covering rare GI cancer subtypes) are scarce in clinical practice, and smaller datasets (e.g., COBRA’s 3K WSIs) can lead to limited generalization to diverse pathological scenarios[120]. Second, the mismatch between model design and pathological interpretation remains a barrier. While models such as HIBOU-L[118] and Phikon-v2[116] achieve high AUC in classification (Supplementary Table 6), they lack interpretability for key pathological features. Unlike pathologists, who rely on visible morphological cues, VFMs often function as “black boxes”, making clinical validation difficult. Third, computational cost and deployment feasibility hinder clinical translation. Large models such as Virchow2G (1.9B parameters) require large numbers of GPUs for training (Supplementary Table 6), while even compressed models like Virchow2G mini (22M parameters) need specialized hardware; most clinical pathology laboratories, especially those with limited resources, cannot meet these requirements[115].

Future research on VFMs in GI cancer pathology should target these specific limitations. First, to address data scarcity, developing small-dataset-adaptable VFMs is a priority; the H0-mini model, for example, succeeded with only 6K WSIs (Table 5) by applying knowledge distillation from H-Optimus-0[127]. Future models could integrate distillation and cross-stain transfer learning, enabling reliable training even with limited GI cohorts (similar to Virchow2G mini)[115]. Second, to enhance pathological interpretability, designing feature-aligned VFMs would be valuable. Drawing on Phikon-v2, particularly its biomarker prediction tasks (Supplementary Table 6), future models could link image features to pathological biomarkers (e.g., MSI, HER2, ER in GI tumors), bridging the gap between model outputs and pathologists’ morphological analysis[116]. Third, to improve clinical deployment, optimizing lightweight VFMs for laboratory hardware is critical. Following TissueConcepts’ 27.5M-parameter design (Table 5) and efficient linear-probe fine-tuning (Supplementary Table 6), future research should focus on compressing models to run on standard laboratory workstations, avoiding reliance on large GPU clusters (as needed by larger models such as Virchow2 or Phikon-v2)[128]. Finally, to tackle sample variability, training VFMs on heterogeneous pathological datasets is necessary; models could incorporate augmented data simulating staining inconsistencies and tissue folding to enhance robustness to real-world GI biopsy variations, as sketched below.
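As a concrete illustration of the last recommendation, the following sketch builds a simple augmentation pipeline that perturbs color and sharpness to mimic staining inconsistencies; the specific transforms and their parameters are illustrative choices, not values reported by any of the cited models.

```python
# Minimal sketch of augmentation that simulates staining inconsistencies and
# slide-preparation artifacts when training on heterogeneous GI biopsy data.
# The jitter ranges and probabilities are illustrative, not tuned values.
from torchvision import transforms

stain_variation = transforms.Compose([
    transforms.RandomApply(
        [transforms.ColorJitter(brightness=0.2, contrast=0.2,
                                saturation=0.3, hue=0.05)], p=0.8),
    transforms.RandomApply([transforms.GaussianBlur(kernel_size=5)], p=0.2),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ToTensor(),
])

# Usage: pass a PIL histology patch through the pipeline during training, e.g.
# augmented = stain_variation(patch_image)
```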

Applications of MLLMs in GI cancers

In the preceding overview of endoscopic and radiological imaging, multimodal FMs have been recurrently highlighted (Tables 2 and 3). These models integrate different types of data, like endoscopic images with text, or CT and MRI scans alongside clinical records and genetic information, to yield superior diagnostic and prognostic performance relative to unimodal approaches. For instance, the ColonCLIP model analyzes endoscopic images and reports together, and GPT-4V uses a multimodal approach for radiological image analysis[92,133]. MLLMs are designed to process and integrate diverse data modalities (text, images, etc.), thereby capturing intermodal relationships that facilitate more efficient learning and enhanced predictive accuracy[134]. They work by merging diverse data into a unified representation, extracting key features from each data type (e.g., word embeddings from text or CNN features from images), and subsequently integrating these features through mechanisms like multilayer perceptrons or graph neural networks. Such integrative modeling holds considerable promise in medical contexts, offering comprehensive diagnostic insights that can improve therapeutic strategies for diseases including GI cancers[135].
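A minimal sketch of this feature-level fusion is shown below: pre-extracted image and text embeddings are concatenated and passed through a small multilayer perceptron. All dimensions and inputs are placeholders; real MLLMs use far richer encoders and fusion mechanisms (e.g., cross-attention or graph neural networks).

```python
# Minimal sketch of late fusion: pre-extracted image features (e.g., from a CNN
# or ViT encoder) are concatenated with text embeddings (e.g., from a report
# encoder) and passed through an MLP head. Dimensions and inputs are placeholders.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, img_dim=1024, txt_dim=768, hidden=512, n_classes=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, img_feat, txt_feat):
        fused = torch.cat([img_feat, txt_feat], dim=-1)  # unified representation
        return self.head(fused)

model = LateFusionClassifier()
img_feat = torch.rand(8, 1024)   # e.g., endoscopy/CT embeddings
txt_feat = torch.rand(8, 768)    # e.g., clinical-report embeddings
logits = model(img_feat, txt_feat)
print(logits.shape)              # torch.Size([8, 2])
```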

Table 6 summarizes pivotal studies investigating MLLMs within GI pathology, while Supplementary Table 7 extends this overview by detailing methodological aspects constrained by space in the main table. The Supplementary material elaborates on training datasets, specifying sources and volumes of image-text pairs or WSIs, performance evaluation metrics across various tasks, and the training and fine-tuning protocols employed. Collectively, these resources provide a thorough depiction of the current landscape of MLLMs in GI cancer research, enabling an in-depth examination of their potential applications.

Table 6 Summary of key studies of multimodal large language models in the field of gastrointestinal cancer.
Model | Year | Vision architecture | Vision dataset | WSIs | Text model | Text dataset | Parameters | Tissues | Generative | Open source link
PLIP[136] | 2023 | CLIP | OpenPath | 28K | CLIP | OpenPath | NA | 32 | Captioning | https://github.com/PathologyFoundation/plip
HistGen[137] | 2023 | DINOv2, ViT-L | Multiple | 55K | LGH Module | TCGA | Approximately 100M | 32 | Report generation | https://github.com/dddavid4real/HistGen
PathAlign[138] | 2023 | PathSSL | Custom | 350K | BLIP-2 | Diagnostic reports | Approximately 100M | 32 | Report generation | https://github.com/elonybear/PathAlign
CHIEF[139] | 2024 | CTransPath | 14 Sources | 60K | CLIP | Anatomical information | 27.5M, 63M | 19 | No | https://github.com/hms-dbmi/CHIEF
PathGen[140] | 2024 | LLaVA, CLIP | TCGA | 7K | CLIP | 1.6M pairs | 13B | 32 | WSI assistant | https://github.com/PathFoundation/PathGen-1.6M
PathChat[141] | 2024 | UNI | Multiple | 999K | LLaMa 2 | Pathology instructions | 13B | 20 | AI assistant | https://github.com/fedshyvana/pathology_mllm_training
PathAsst[142] | 2024 | PathCLIP | PathCap | 207K | Vicuna-13B | Pathology instructions | 13B | 32 | AI assistant | https://github.com/superjamessyx/Generative-Foundation-AI-Assistant-for-Pathology
ProvGigaPath[143] | 2024 | ViT | Prov-Path | 171K | OpenCLIP | 17K Reports | 1135 | 31 | No | https://github.com/prov-gigapath/prov-gigapath
TITAN[144] | 2024 | ViT | Mass340K | 336K | CoCa | Medical reports | Approximately 5B | 20 | Report generation | https://github.com/your-repo/TITAN
CONCH[145] | 2024 | ViT | Multiple | 21K | GPTstyle | 1.17M pairs | NA | 19 | Captioning | http://github.com/mahmoodlab/CONCH
SlideChat[146] | 2024 | CONCH, LongNet | TCGA | 4915 | Qwen2.5-7B | Slide Instructions | 7B | 10 | WSI assistant | https://github.com/uni-medical/SlideChat
PMPRG[147] | 2024 | MR-ViT | Custom | 7422 | GPT-2 | Pathology Reports | NA | 2 | Multi-organ report | https://github.com/hvcl/Clinical-grade-PathologyReport-Generation
MuMo[148] | 2024 | MnasNet | Custom | 429 | Transformer | PathoRadio Reports | NA | 1 | No | https://github.com/czifan/MuMo
ConcepPath[149] | 2024 | ViT-B, CONCH | Quilt-1M | 2243 | CLIP, GPT | PubMed | Approximately 187M | 3 | No | https://github.com/HKU-MedAI/ConcepPath
GPT-4V[150] | 2024 | Phikon ViT-B | CRC-7K, MHIST etc. | 338K | GPT-4 | NA | 40M | 3 | Report generation | https://github.com/Dyke-F/GPT-4V-In-Context-Learning
MINIM[151] | 2024 | Stable diffusion | Multiple | NA | BERT, CLIP | Multiple | NA | 6 | Report generation | https://github.com/WithStomach/MINIM
PathM3[152] | 2024 | ViT-g/14 | PatchGastric | 991 | FlanT5XL | PatchGastric | NA | 1 | WSI assistant | NA
FGCR[153] | 2024 | ResNet50 | Custom, GastrADC | 3598, 991 | BERT | NA | 9.21 Mb | 6 | Report generation | https://github.com/hudingyi/FGCR
PromptBio[154] | 2024 | PLIP | TCGA, CPTAC | 482, 105 | GPT-4 | NA | NA | 1 | Report generation | https://github.com/DeepMed-Lab-ECNU/PromptBio
HistoCap[155] | 2024 | ViT | NA | 10K | BERT, BioBERT | GTEx datasets | NA | 40 | Report generation | https://github.com/ssen7/histo_cap_transformers
mSTAR[156] | 2024 | UNI | TCGA | 10K | BioBERT | Pathology Reports 11K | NA | 32 | Report generation | https://github.com/Innse/mSTAR
GPT-4 Enhanced[157] | 2025 | CTransPath | TCGA | NA | GPT-4 | ASCO, ESMO, Onkopedia | NA | 4 | Recommendation generation | https://github.com/Dyke-F/LLM_RAG_Agent
PRISM[158] | 2025 | Virchow, ViT-H | Virchow dataset | 587K | BioGPT | 195K Reports | 632M | 17 | Report generation | NA
HistoGPT[159] | 2025 | CTransPath, UNI | Custom | 15K | BioGPT | Pathology Reports | 30M to 1.5B | 1 | WSI assistant | https://github.com/marrlab/HistoGPT
PathologyVLM[160] | 2025 | PLIP, CLIP | PCaption-0.8M | NA | LLaVA | PCaption-0.5M | NA | Multi | Report generation | https://github.com/ddw2AIGROUP2CQUP/PA-LLaVA
MUSK[161] | 2025 | Transformer | TCGA | 33K | Transformer | PubMed Central | 675M | 33 | Question answering | https://github.com/Lilab-stanford/MUSK

Starting with model development and architecture, a key trend lies in the integration of vision and language modules, as exemplified by SlideChat (Table 6)[136-161]. This model employs a dedicated vision encoder to process gigapixel WSIs and pairs it with a language model to enable multimodal conversational capabilities. Notably, this integration design allows SlideChat to answer complex GI tissue pathology questions based on WSI input, achieving an overall accuracy of 81.17% on the SlideBench-VQA (TCGA) benchmark[146]. This result not only validates the effectiveness of cross-modality integration but also highlights the need for targeted parameterization and optimization. Many MLLMs in this field, including those detailed in Supplementary Table 7, undergo fine-tuning of their text-component parameters on GI-cancer-specific datasets, a process that adjusts the models to better capture features such as the histological subtypes of gastric cancer, thereby laying a technical foundation for subsequent dataset utilization and clinical applications.
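Such parameter-efficient fine-tuning is often implemented with low-rank adapters (LoRA). The sketch below wraps a single linear layer with a LoRA-style adapter to show how only a small fraction of parameters is updated; it is a conceptual illustration, and practical pipelines typically apply such adapters to the attention projections of the language model via libraries such as PEFT.

```python
# Minimal sketch of a LoRA-style adapter wrapped around one linear layer of a
# text component, illustrating how only a small set of parameters is updated
# during GI-specific fine-tuning. Dimensions, rank, and scaling are placeholders.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():     # freeze the pre-trained weights
            p.requires_grad = False
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # adapter initially adds nothing
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable} / {total}")
```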

Closely tied to model advancement is the development of dataset utilization, as high-performance MLLMs rely on both diverse and specialized data sources to generalize to real-world GI cancer scenarios. On one hand, models in Table 6 leverage multi-modal datasets combining publicly available GI cancer image repositories and paired pathology reports, textual documents that detail histological features, diagnoses, and even patient clinical histories. These datasets, often containing thousands of image-text pairs, train MLLMs to establish meaningful correlations between tissue visual appearance and textual descriptions, a prerequisite for accurate clinical interpretation. On the other hand, to address unique challenges in GI pathology (such as WSI-specific analysis), specialized datasets have been developed. An example is the PathCap dataset (Supplementary Table 7), which focuses on multi-modal comprehension for pathology[142]. This dataset integrates WSI patches, associated clinical reports, and a rich collection of 207k image-caption pairs designed to simulate real-world diagnostic queries. By leveraging this multimodal dataset, researchers can train models to better understand the complex interplay between visual and textual information, thereby accelerating the translation of advanced AI techniques into actionable clinical insights.
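The image-text correlation learning described above typically relies on a symmetric contrastive objective. The following sketch implements that loss over a toy batch of paired embeddings; the embeddings are random placeholders standing in for patch- and caption-encoder outputs.

```python
# Minimal sketch of the symmetric contrastive (InfoNCE) objective used to align
# WSI patches with their captions in CLIP-style pre-training. The embeddings
# below are random placeholders standing in for encoder outputs.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.T / temperature        # pairwise similarities
    targets = torch.arange(img_emb.size(0))           # matched pairs on the diagonal
    loss_i = F.cross_entropy(logits, targets)         # image -> text direction
    loss_t = F.cross_entropy(logits.T, targets)       # text -> image direction
    return (loss_i + loss_t) / 2

img_emb = torch.rand(16, 512)   # patch embeddings for 16 image-caption pairs
txt_emb = torch.rand(16, 512)   # caption embeddings
print("contrastive loss:", clip_contrastive_loss(img_emb, txt_emb).item())
```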

The technical advancements in models and datasets have ultimately driven applications of MLLMs in GI cancer diagnosis and prognosis. In diagnosis, MLLMs excel at identifying distinct GI cancer types by linking histological image patterns to text-based diagnostic criteria; for example, models such as MuMo[148] and ConcepPath[149] can distinguish or predict EBV- or HER2-positive gastric cancer subtypes. Beyond diagnosis, MLLMs are also advancing prognosis prediction by integrating multi-source data: they extract histological features from images and combine them with patient-specific information from text reports (e.g., tumor stage, grade, molecular markers). Findings suggest that these multimodal prognostic models (e.g., CHIEF[139], PathGen[140], MuMo[148]) offer more comprehensive and accurate predictions than traditional methods relying solely on single-modality data, reflecting the synergistic progress of MLLMs across model design, data curation, and clinical translation in GI cancer pathology.

Despite this progress, current MLLMs in GI cancer pathology also face distinct limitations. First, data dependence and scarcity hinder generalization, limiting a model's ability to perform well on diverse datasets when training data are insufficient. Models like PathM3 (Table 6) rely on only 991 WSIs from the PatchGastric dataset[152], while MuMo uses a mere 429 WSIs[148]; such small sample sizes risk overfitting to specific tissue types or institutions. Even larger-scale models such as PathChat (999K WSIs) draw on broader but still non-representative datasets that lack diverse clinical settings[141]. Second, limited model accessibility and transparency pose barriers to widespread adoption and trust, owing to restricted availability and unclear operational mechanisms. Models including PRISM[158] and PathM3[152] lack open-source links, preventing independent validation by other researchers (Table 6), and even open models such as CHIEF require 8 V100 GPUs (Supplementary Table 7), a resource beyond many clinical laboratories[139]. Finally, many current models are designed for specific tasks, making them less useful for broader or more varied needs. Several models (e.g., HistGen[137], CONCH[145], FGCR[153]) focus solely on report generation, converting WSI features into text without supporting diagnostic or prognostic assistance. Only 3 of the 26 models (e.g., MUSK[161]) support question answering for rare GI cancer subtypes, and five models (e.g., CHIEF[139], ConcepPath[149]) are explicitly non-generative, performing only basic tasks such as classification and unable to address complex clinical needs such as report interpretation or treatment suggestions.

Future research on MLLMs in GI cancer pathology could address these weaknesses by making better use of the models’ latent potential and tackling key missing capabilities. First, the models’ ability to perform a broader range of clinical tasks could be expanded, enabling them to support diverse applications such as diagnostic assistance, prognosis prediction, and treatment recommendation. Second, the diversity, quality, and clinical relevance of training data could be enhanced by including a broader range of patient demographics, cancer subtypes (including rare forms), disease stages, and multimodal information to ensure that models generalize well across real-world clinical scenarios. Third, the integration of these models with real-world clinical workflows could be improved by ensuring that their outputs are not only accurate and interpretable but also actionable and relevant to practical needs.

DISCUSSION

This review retrospectively summarizes key and representative studies concerning the application of FMs in GI cancer research. Given that many artificial intelligence terms (e.g., zero-shot learning, the black-box problem) may be unfamiliar to medical researchers, Supplementary Table 8 defines the key terms used in this review for improved clarity. Owing to inherent limitations in literature search and screening, it is acknowledged that some studies may not have been included. Although numerous investigations have already shown that FMs have considerable potential in this domain, challenges remain in applying them and bringing them into clinical practice. For example, medical imaging and pathology data often follow different formats and standards across institutions, which hampers model generalization across settings, particularly for single-center studies[162]. Furthermore, publication bias remains a concern, whereby studies reporting positive outcomes are preferentially published, whereas negative or inconclusive results often remain unpublished, thereby skewing the overall scientific evidence base.

The extant evidence supporting the use of FMs in GI oncology is constrained by several methodological and practical limitations. First, with respect to data privacy and security, FMs typically necessitate large-scale datasets to achieve optimal performance, which inherently increases the risk of data breaches and unauthorized access[163]. Conventional de-identification techniques are increasingly insufficient, especially when integrating multimodal data types such as imaging, genomics, and EHRs, which may facilitate re-identification. To mitigate these risks, the incorporation of privacy-preserving technologies into model development is imperative[164]. Approaches such as federated learning enable model training across multiple institutions without sharing raw data, effectively shifting the model rather than the data. Differential privacy techniques introduce controlled noise during training to safeguard individual identities, while blockchain technology offers immutable systems for tracking data access and consent. Ensuring global compliance necessitates governance frameworks aligned with regulations such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA), thereby promoting secure and ethical data utilization.
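The following sketch combines two of these ideas in a toy setting: several simulated sites train a shared model locally, and only (noise-perturbed) parameters are aggregated. It is a conceptual illustration only; the noise level is arbitrary and does not constitute a calibrated differential-privacy guarantee.

```python
# Minimal sketch of federated averaging with additive Gaussian noise. Each
# simulated "site" trains locally on its own data; only noised weights are
# aggregated. All data, dimensions, and the noise scale are placeholders.
import copy
import torch
import torch.nn as nn

def local_update(model, data, target, lr=0.01):
    local = copy.deepcopy(model)                      # raw data never leaves the site
    opt = torch.optim.SGD(local.parameters(), lr=lr)
    loss = nn.functional.cross_entropy(local(data), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return local.state_dict()

def federated_average(states, noise_std=0.001):
    avg = copy.deepcopy(states[0])
    for key in avg:
        stacked = torch.stack([s[key].float() for s in states])
        avg[key] = stacked.mean(dim=0) + noise_std * torch.randn_like(avg[key].float())
    return avg

global_model = nn.Linear(20, 2)                       # toy risk classifier
sites = [(torch.rand(32, 20), torch.randint(0, 2, (32,))) for _ in range(3)]

for round_ in range(5):                               # a few federated rounds
    states = [local_update(global_model, x, y) for x, y in sites]
    global_model.load_state_dict(federated_average(states))
```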

Second, regarding model interpretability and clinical trust, FMs often function as "black boxes", with limited transparency in their decision-making processes, even to developers and end-users[165]. This lack of transparency can undermine clinician and patient confidence, as clear explanations for model-driven recommendations (e.g., the rationale for classifying a polyp as malignant) are typically required. Although explainable AI (XAI) tools such as Grad-CAM (for imaging models), SHAP, and LIME exist, their application within FMs remains limited and predominantly provides correlational rather than causal insights[166]. For example, Grad-CAM can highlight regions of interest in endoscopic images but does not elucidate causal relationships, such as why a specific genetic mutation influences treatment response predictions. This discrepancy highlights a critical gap between clinical needs for causal explanations and the correlational outputs currently provided by FMs. Bridging this gap necessitates the development of clinician-centric visualization interfaces that link model predictions to specific clinical features, including polyp size or histological characteristics. Interpretability should be regarded as a core performance metric alongside accuracy and sensitivity in FM validation studies, rather than an ancillary consideration. Additionally, integrating principles from human factors engineering into FM design can ensure that explanations align with clinical workflows and cognitive demands, thereby fostering greater acceptance.
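For illustration, the sketch below implements a minimal Grad-CAM using forward and backward hooks on a generic torchvision network; in practice, dedicated libraries and the actual endoscopic or CT model would be used, and, as noted above, the resulting heatmap indicates correlation rather than causation.

```python
# Minimal sketch of Grad-CAM via forward/backward hooks on a torchvision
# ResNet-18; the input tensor is a placeholder for an endoscopic or CT image.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()
activations, gradients = {}, {}

def fwd_hook(module, inp, out):
    activations["value"] = out.detach()

def bwd_hook(module, grad_in, grad_out):
    gradients["value"] = grad_out[0].detach()

layer = model.layer4[-1]                 # last convolutional block
layer.register_forward_hook(fwd_hook)
layer.register_full_backward_hook(bwd_hook)

image = torch.rand(1, 3, 224, 224)       # placeholder input
logits = model(image)
logits[0, logits.argmax()].backward()    # gradient of the predicted class score

weights = gradients["value"].mean(dim=(2, 3), keepdim=True)      # channel importance
cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)         # normalized heatmap
print(cam.shape)                          # torch.Size([1, 1, 224, 224])
```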

Third, with respect to bias and equity, many FM training datasets predominantly originate from high-income countries and large academic centers, resulting in the underrepresentation of minority populations and low-resource settings[167]. This imbalance introduces biases that may exacerbate health disparities. For example, existing studies have largely focused on specific patient groups, such as those from Asia, Europe, or the United States, potentially limiting model applicability to other populations. To address these issues, it is essential to actively curate diverse and representative datasets[167]. Fairness-aware training methodologies can adjust for demographic imbalances, and ongoing bias audits should be conducted post-deployment to monitor and recalibrate model performance across different subgroups, as illustrated in the sketch below.
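A bias audit of the kind recommended above can be as simple as stratifying a performance metric by subgroup, as in the following sketch with toy labels, scores, and a two-group demographic variable.

```python
# Minimal sketch of a post-deployment bias audit: the same model predictions are
# scored separately for each demographic subgroup to surface performance gaps.
# Labels, scores, and the subgroup column are toy placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
subgroups = np.array(["group_A"] * 200 + ["group_B"] * 200)
y_true = rng.integers(0, 2, size=400)
y_score = np.clip(y_true * 0.6 + rng.normal(0.2, 0.3, size=400), 0, 1)

for group in np.unique(subgroups):
    mask = subgroups == group
    auc = roc_auc_score(y_true[mask], y_score[mask])
    print(f"{group}: AUROC = {auc:.3f} (n = {mask.sum()})")
```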

With regard to regulatory pathways, current frameworks for medical AI are inadequately suited to FMs, which differ from traditional tools in their generalizability and capacity for continuous learning from new data[168]. Regulatory pathways such as the United States FDA’s De Novo classification and 510(k) clearance have been applied to certain AI-based diagnostic tools, such as the FDA-approved Paige Prostate software for identifying cancer cells in prostate pathology images[109]. However, FMs, which can be adapted for multiple tasks (e.g., CRC detection, chemotherapy response prediction, and high-risk patient identification), do not conform to these static, task-specific approval models. Consequently, novel regulatory paradigms are required: regulatory sandboxes may facilitate controlled pilot testing in real-world environments, while robust post-market surveillance could become standard practice to monitor long-term safety and efficacy. Ethical and legal challenges also warrant consideration[169]. For example, if an FM makes a mistake in diagnosing GI cancer, it could lead to inappropriate treatment, and it remains unclear who should be held responsible in such cases: the doctor, the model provider, or the patient. Until clear rules exist, it is difficult to balance risks and benefits for patients.

Finally, in regard to clinical validation and real-world deployment, most FM studies remain confined to technical validation phases, demonstrating high accuracy under controlled conditions[170]. However, such findings do not necessarily translate into clinical utility, defined by improvements in diagnosis, treatment decision-making, or patient outcomes. Operational feasibility, including seamless integration into existing clinical workflows without imposing additional burdens on healthcare providers, is infrequently evaluated. Moreover, cost-effectiveness analyses, such as whether FMs predicting chemotherapy response reduce unnecessary treatment expenditures, are scarce. Addressing these gaps requires rigorous, multicenter, prospective randomized controlled trials. Implementation science research should investigate FM performance across diverse healthcare systems and resource settings. Enhancing transparency through the establishment of public clinical trial registries, where study protocols, data, and outcomes are openly accessible, is also advocated.

CONCLUSION

In summary, FMs possess transformative potential for GI cancer care, ranging from facilitating early detection to enabling personalized therapeutic strategies. Nonetheless, technological advancements alone are insufficient for successful clinical translation; addressing technical limitations alongside ethical, regulatory, and equity-related challenges is imperative. The future role of FMs in GI oncology is not to supplant clinicians but to augment precision medicine. It is important to recognize that, both presently and prospectively, FMs and related tools will not replace endoscopists, radiologists, or pathologists. Their main role lies in providing professional analytical support, while final diagnosis and treatment decisions will still be led by clinicians. This partnership between humans and machines will remain key to improving patient care.

Footnotes

Provenance and peer review: Invited article; Externally peer reviewed.

Peer-review model: Single blind

Specialty type: Gastroenterology and hepatology

Country of origin: China

Peer-review report’s classification

Scientific Quality: Grade B, Grade B, Grade B

Novelty: Grade B, Grade B, Grade C

Creativity or Innovation: Grade B, Grade C, Grade C

Scientific Significance: Grade B, Grade B, Grade C

P-Reviewer: Guo TH, MD, PhD, Researcher, China; Ma X, MD, China; Tlaiss Y, MD, Lebanon S-Editor: Li L L-Editor: A P-Editor: Zhao YQ

References
1.  Bray F, Laversanne M, Sung H, Ferlay J, Siegel RL, Soerjomataram I, Jemal A. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2024;74:229-263.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 5690]  [Cited by in RCA: 10818]  [Article Influence: 10818.0]  [Reference Citation Analysis (3)]
2.  Bordry N, Astaras C, Ongaro M, Goossens N, Frossard JL, Koessler T. Recent advances in gastrointestinal cancers. World J Gastroenterol. 2021;27:4493-4503.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in CrossRef: 10]  [Cited by in RCA: 21]  [Article Influence: 5.3]  [Reference Citation Analysis (0)]
3.  Lipkova J, Kather JN. The age of foundation models. Nat Rev Clin Oncol. 2024;21:769-770.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 9]  [Reference Citation Analysis (0)]
4.  Tsang KK, Kivelson S, Acitores Cortina JM, Kuchi A, Berkowitz JS, Liu H, Srinivasan A, Friedrich NA, Fatapour Y, Tatonetti NP. Foundation Models for Translational Cancer Biology. Annu Rev Biomed Data Sci. 2025;8:51-80.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 2]  [Cited by in RCA: 3]  [Article Influence: 3.0]  [Reference Citation Analysis (0)]
5.  Moor M, Banerjee O, Abad ZSH, Krumholz HM, Leskovec J, Topol EJ, Rajpurkar P. Foundation models for generalist medical artificial intelligence. Nature. 2023;616:259-265.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 105]  [Cited by in RCA: 610]  [Article Influence: 305.0]  [Reference Citation Analysis (0)]
6.  Zeng R, Gou H, Lau HCH, Yu J. Stomach microbiota in gastric cancer development and clinical implications. Gut. 2024;73:2062-2073.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 6]  [Cited by in RCA: 59]  [Article Influence: 59.0]  [Reference Citation Analysis (0)]
7.  Cao JS, Lu ZY, Chen MY, Zhang B, Juengpanich S, Hu JH, Li SJ, Topatana W, Zhou XY, Feng X, Shen JL, Liu Y, Cai XJ. Artificial intelligence in gastroenterology and hepatology: Status and challenges. World J Gastroenterol. 2021;27:1664-1690.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in CrossRef: 16]  [Cited by in RCA: 20]  [Article Influence: 5.0]  [Reference Citation Analysis (1)]
8.  Kröner PT, Engels MM, Glicksberg BS, Johnson KW, Mzaik O, van Hooft JE, Wallace MB, El-Serag HB, Krittanawong C. Artificial intelligence in gastroenterology: A state-of-the-art review. World J Gastroenterol. 2021;27:6794-6824.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in CrossRef: 28]  [Cited by in RCA: 101]  [Article Influence: 25.3]  [Reference Citation Analysis (7)]
9.  Chen RJ, Ding T, Lu MY, Williamson DFK, Jaume G, Song AH, Chen B, Zhang A, Shao D, Shaban M, Williams M, Oldenburg L, Weishaupt LL, Wang JJ, Vaidya A, Le LP, Gerber G, Sahai S, Williams W, Mahmood F. Towards a general-purpose foundation model for computational pathology. Nat Med. 2024;30:850-862.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 660]  [Cited by in RCA: 382]  [Article Influence: 382.0]  [Reference Citation Analysis (0)]
10.  Zhou C, Li Q, Li C, Yu J, Liu Y, Wang G, Zhang K, Ji C, Yan Q, He L, Peng H, Li J, Wu J, Liu Z, Xie P, Xiong C, Pei J, Yu PS, Sun L. A comprehensive survey on pretrained foundation models: a history from BERT to ChatGPT. Int J Mach Learn Cyber.  2024.  [PubMed]  [DOI]  [Full Text]
11.  Bommasani R, Hudson DA, Adeli E, Altman R, Arora S, von Arx S, Bernstein MS, Bohg J, Bosselut A, Brunskill E, Brynjolfsson E, Buch S, Card D, Castellon R, Chatterji N, Chen A, Creel K, Quincy Davis J, Demszky D, Donahue C, Doumbouya M, Durmus E, Ermon S, Etchemendy J, Ethayarajh K, Fei-Fei L, Finn C, Gale T, Gillespie L, Goel K, Goodman N, Grossman S, Guha N, Hashimoto T, Henderson P, Hewitt J, Ho DE, Hong J, Hsu K, Huang J, Icard T, Jain S, Jurafsky D, Kalluri P, Karamcheti S, Keeling G, Khani F, Khattab O, Koh PW, Krass M, Krishna R, Kuditipudi R, Kumar A, Ladhak F, Lee M, Lee T, Leskovec J, Levent I, Li XL, Li X, Ma T, Malik A, Manning CD, Mirchandani S, Mitchell E, Munyikwa Z, Nair S, Narayan A, Narayanan D, Newman B, Nie A, Niebles JC, Nilforoshan H, Nyarko J, Ogut G, Orr L, Papadimitriou I, Park JS, Piech C, Portelance E, Potts C, Raghunathan A, Reich R, Ren H, Rong F, Roohani Y, Ruiz C, Ryan J, Ré C, Sadigh D, Sagawa S, Santhanam K, Shih A, Srinivasan K, Tamkin A, Taori R, Thomas AW, Tramèr F, Wang RE, Wang W, Wu B, Wu J, Wu Y, Xie SM, Yasunaga M, You J, Zaharia M, Zhang M, Zhang T, Zhang X, Zhang Y, Zheng L, Zhou K, Liang P.   On the Opportunities and Risks of Foundation Models. 2022 Preprint. Available from: arXiv:2108.07258.  [PubMed]  [DOI]  [Full Text]
12.  Turing AM. I.—Computing Machinery And Intelligence. Mind. 1950;LIX:433-460.  [PubMed]  [DOI]  [Full Text]
13.  McCarthy J, Minsky ML, Rochester N, Shannon CE. A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence: August 31, 1955. AI Mag. 1955;27:12-14.  [PubMed]  [DOI]  [Full Text]
14.  Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev. 1958;65:386-408.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 4749]  [Cited by in RCA: 2150]  [Article Influence: 32.1]  [Reference Citation Analysis (0)]
15.  LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436-444.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 36149]  [Cited by in RCA: 20727]  [Article Influence: 2072.7]  [Reference Citation Analysis (0)]
16.  Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I.   Attention is all you need. In: NIPS'17. Proceedings of the 31st International Conference on Neural Information Processing Systems; 2017; Long Beach, CA, United States. Red Hook, NY, United States: Curran Associates Inc., 2017: 6000-6010.  [PubMed]  [DOI]
17.  Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler DM, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D.   Language models are few-shot learners. In: NIPS '20. Proceedings of the 34th International Conference on Neural Information Processing Systems; 2020; Vancouver, BC, Canada. Red Hook, NY, United States: Curran Associates Inc., 2020: 25.  [PubMed]  [DOI]
18.  Devlin J, Chang M, Lee K, Toutanova K.   BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein J, Doran C, Solorio T, editors. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, MN: Association for Computational Linguistics, 2019: 4171-4186.  [PubMed]  [DOI]  [Full Text]
19.  Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, Leoni Aleman F, Almeida D, Altenschmidt J, Altman S, Anadkat S, Avila R, Babuschkin I, Balaji S, Balcom V, Baltescu P, Bao H, Bavarian M, Belgum J, Bello I, Berdine J, Bernadett-Shapiro G, Berner C, Bogdonoff L, Boiko O, Boyd M, Brakman A-L, Brockman G, Brooks T, Brundage M, Button K, Cai T, Campbell R, Cann A, Carey B, Carlson C, Carmichael R, Chan B, Chang C, Chantzis F, Chen D, Chen S, Chen R, Chen J, Chen M, Chess B, Cho C, Chu C, Chung HW, Cummings D, Currier J, Dai Y, Decareaux C, Degry T, Deutsch N, Deville D, Dhar A, Dohan D, Dowling S, Dunning S, Ecoffet A, Eleti A, Eloundou T, Farhi D, Fedus L, Felix N, Posada Fishman S, Forte J, Fulford I, Gao L, Georges E, Gibson C, Goel V, Gogineni T, Goh G, Gontijo-Lopes R, Gordon J, Grafstein M, Gray S, Greene R, Gross J, Gu SS, Guo Y, Hallacy C, Han J, Harris J, He Y, Heaton M, Heidecke J, Hesse C, Hickey A, Hickey W, Hoeschele P, Houghton B, Hsu K, Hu S, Hu X, Huizinga J, Jain S, Jain S, Jang J, Jiang A, Jiang R, Jin H, Jin D, Jomoto S, Jonn B, Jun H, Kaftan T, Kaiser Ł, Kamali A, Kanitscheider I, Shirish Keskar N, Khan T, Kilpatrick L, Kim JW, Kim C, Kim Y, Hendrik Kirchner J, Kiros J, Knight M, Kokotajlo D, Kondraciuk Ł, Kondrich A, Konstantinidis A, Kosic K, Krueger G, Kuo V, Lampe M, Lan I, Lee T, Leike J, Leung J, Levy D, Li CM, Lim R, Lin M, Lin S, Litwin M, Lopez T, Lowe R, Lue P, Makanju A, Malfacini K, Manning S, Markov T, Markovski Y, Martin B, Mayer K, Mayne A, McGrew B, McKinney SM, McLeavey C, McMillan P, McNeil J, Medina D, Mehta A, Menick J, Metz L, Mishchenko A, Mishkin P, Monaco V, Morikawa E, Mossing D, Mu T, Murati M, Murk O, Mély D, Nair A, Nakano R, Nayak R, Neelakantan A, Ngo R, Noh H, Ouyang L, O'Keefe C, Pachocki J, Paino A, Palermo J, Pantuliano A, Parascandolo G, Parish J, Parparita E, Passos A, Pavlov M, Peng A, Perelman A, de Avila Belbute Peres F, Petrov M, Ponde de Oliveira Pinto H, Michael, Pokorny, Pokrass M, Pong VH, Powell T, Power A, Power B, Proehl E, Puri R, Radford A; OpenAI.   GPT-4 Technical Report. 2023 Preprint. Available from: eprint arXiv:2303.08774.  [PubMed]  [DOI]  [Full Text]
20.  Guo D, Yang D, Zhang H, Song J, Zhang R, Xu R, Zhu Q, Ma S, Wang P, Bi X, Zhang X, Yu X, Wu Y, Wu ZF, Gou Z, Shao Z, Li Z, Gao Z, Liu A, Xue B, Wang B, Wu B, Feng B, Lu C, Zhao C, Deng C, Zhang C, Ruan C, Dai D, Chen D, Ji D, Li E, Lin F, Dai F, Luo F, Hao G, Chen G, Li G, Zhang H, Bao H, Xu H, Wang H, Ding H, Xin H, Gao H, Qu H, Li H, Guo J, Li J, Wang J, Chen J, Yuan J, Qiu J, Li J, Cai JL, Ni J, Liang J, Chen J, Dong K, Hu K, Gao K, Guan K, Huang K, Yu K, Wang L, Zhang L, Zhao L, Wang L, Zhang L, Xu L, Xia L, Zhang M, Zhang M, Tang M, Li M, Wang M, Li M, Tian N, Huang P, Zhang P, Wang Q, Chen Q, Du Q, Ge R, Zhang R, Pan R, Wang R, Chen RJ, Jin RL, Chen R, Lu S, Zhou S, Chen S, Ye S, Wang S, Yu S, Zhou S, Pan S, Li SS, Zhou S, Wu S, Ye S, Yun T, Pei T, Sun T, Wang T, Zeng W, Zhao W, Liu W, Liang W, Gao W, Yu W, Zhang W, Xiao WL, An W, Liu X, Wang X, Chen X, Nie X, Cheng X, Liu X, Xie X, Liu X, Yang X, Li X, Su X, Lin X, Li XQ, Jin X, Shen X, Chen X, Sun X, Wang X, Song X, Zhou X, Wang X, Shan X, Li YK, Wang YQ, Wei YX, Zhang Y, Xu Y, Li Y, Zhao Y, Sun Y, Wang Y, Yu Y, Zhang Y, Shi Y, Xiong Y, He Y, Piao Y, Wang Y, Tan Y, Ma Y, Liu Y, Guo Y, Ou Y, Wang Y, Gong Y, Zou Y, He Y, Xiong Y, Luo Y, You Y, Liu Y, Zhou Y, Zhu YX, Xu Y, Huang Y, Li Y, Zheng Y, Zhu Y, Ma Y, Tang Y, Zha Y, Yan Y, Ren ZZ, Ren Z, Sha Z, Fu Z, Xu Z, Xie Z, Zhang Z, Hao Z, Ma Z, Yan Z, Wu Z, Gu Z, Zhu Z, Liu Z, Li Z, Xie Z, Song Z, Pan Z, Huang Z, Xu Z, Zhang Z, Zhang Z; DeepSeek-AI.   DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. 2025 Preprint. Available from: eprint arXiv:2501.12948.  [PubMed]  [DOI]  [Full Text]
21.  Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Krueger G, Sutskever I.   Learning Transferable Visual Models From Natural Language Supervision. In: Meila M, Zhang T, editors. Proceedings of Machine Learning Research. Proceedings of the 38th International Conference on Machine Learning. PMLR, 2021: 8748-8763.  [PubMed]  [DOI]
22.  Pai S, Bontempi D, Hadzic I, Prudente V, Sokač M, Chaunzwa TL, Bernatz S, Hosny A, Mak RH, Birkbak NJ, Aerts HJWL. Foundation model for cancer imaging biomarkers. Nat Mach Intell. 2024;6:354-367.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 5]  [Cited by in RCA: 70]  [Article Influence: 70.0]  [Reference Citation Analysis (0)]
23.  Shen D, Wu G, Suk HI. Deep Learning in Medical Image Analysis. Annu Rev Biomed Eng. 2017;19:221-248.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 2581]  [Cited by in RCA: 2033]  [Article Influence: 254.1]  [Reference Citation Analysis (0)]
24.  Alsentzer E, Murphy J, Boag W, Weng W, Jindi D, Naumann T, Mcdermott M.   Publicly Available Clinical BERT Embeddings. In: Rumshisky A, Roberts K, Bethard S, Naumann T, editors. Proceedings of the 2nd Clinical Natural Language Processing Workshop. Minneapolis, MN, United States: Association for Computational Linguistics, 2019: 72-78.  [PubMed]  [DOI]  [Full Text]
25.  Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J, Houlsby N.   An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. 9th International Conference on Learning Representations. Austria: ICLR, 2021.  [PubMed]  [DOI]
26.  Zhou B, Yang G, Shi Z, Ma S. Natural Language Processing for Smart Healthcare. IEEE Rev Biomed Eng. 2024;17:4-18.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 58]  [Cited by in RCA: 34]  [Article Influence: 34.0]  [Reference Citation Analysis (0)]
27.  Hou JK, Imler TD, Imperiale TF. Current and future applications of natural language processing in the field of digestive diseases. Clin Gastroenterol Hepatol. 2014;12:1257-1261.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 22]  [Cited by in RCA: 26]  [Article Influence: 2.4]  [Reference Citation Analysis (0)]
28.  Team G, Anil R, Borgeaud S, Alayrac JB, Yu J, Soricut R, Schalkwyk J, Dai AM, Hauth A, Millican K, Silver D, Johnson M, Antonoglou I, Schrittwieser J, Glaese A, Chen J, Pitler E, Lillicrap T, Lazaridou A, Firat O, Molloy J, Isard M, Barham PR, Hennigan T, Lee B, Viola F, Reynolds M, Xu Y, Doherty R, Collins E, Meyer C, Rutherford E, Moreira E, Ayoub K, Goel M, Krawczyk J, Du C, Chi E, Cheng H-T, Ni E, Shah P, Kane P, Chan B, Faruqui M, Severyn A, Lin H, Li Y, Cheng Y, Ittycheriah A, Mahdieh M, Chen M, Sun P, Tran D, Bagri S, Lakshminarayanan B, Liu J, Orban A, Güra F, Zhou H, Song X, Boffy A, Ganapathy H, Zheng S, Choe H, Weisz Á, Zhu T, Lu Y, Gopal S, Kahn J, Kula M, Pitman J, Shah R, Taropa E, Al Merey M, Baeuml M, Chen Z, El Shafey L, Zhang Y, Sercinoglu O, Tucker G, Piqueras E, Krikun M, Barr I, Savinov N, Danihelka I, Roelofs B, White A, Andreassen A, von Glehn T, Yagati L, Kazemi M, Gonzalez L, Khalman M, Sygnowski J, Frechette A, Smith C, Culp L, Proleev L, Luan Y, Chen X, Lottes J, Schucher N, Lebron F, Rrustemi A, Clay N, Crone P, Kocisky T, Zhao J, Perz B, Yu D, Howard H, Bloniarz A, Rae JW, Lu H, Sifre L, Maggioni M, Alcober F, Garrette D, Barnes M, Thakoor S, Austin J, Barth-Maron G, Wong W, Joshi R, Chaabouni R, Fatiha D, Ahuja A, Singh Tomar G, Senter E, Chadwick M, Kornakov I, Attaluri N, Iturrate I, Liu R, Li Y, Cogan S, Chen J, Jia C, Gu C, Zhang Q, Grimstad J, Jakse Hartman A, Garcia X, Sankaranarayana Pillai T, Devlin J, Laskin M, de Las Casas D, Valter D, Tao C, Blanco L, Puigdomènech Badia A, Reitter D, Chen M, Brennan J, Rivera C, Brin S, Iqbal S, Surita G, Labanowski J, Rao A, Winkler S, Parisotto E, Gu Y, Olszewska K, Addanki R, Miech A, Louis A, Teplyashin D, Brown G, Catt E, Balaguer J, Xiang J, Wang P, Ashwood Z, Briukhov A, Webson A, Ganapathy S, Sanghavi S, Kannan A, Chang M-W, Stjerngren A, Djolonga J, Sun Y, Bapna A, Aitchison M, Pejman P, Michalewski H, Yu T, Wang C, Love J, Ahn J, Bloxwich D, Han K, Humphreys P, Sellam T, Bradbury J, Godbole V, Samangooei S, Damoc B, Kaskasoli A.   Gemini: A Family of Highly Capable Multimodal Models. 2023 Preprint. Available from: eprint arXiv:2312.11805.  [PubMed]  [DOI]  [Full Text]
29.  Syed S, Angel AJ, Syeda HB, Jennings CF, VanScoy J, Syed M, Greer M, Bhattacharyya S, Zozus M, Tharian B, Prior F. The h-ANN Model: Comprehensive Colonoscopy Concept Compilation Using Combined Contextual Embeddings. Biomed Eng Syst Technol Int Jt Conf BIOSTEC Revis Sel Pap. 2022;5:189-200.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 1]  [Cited by in RCA: 7]  [Article Influence: 2.3]  [Reference Citation Analysis (0)]
30.  Lahat A, Shachar E, Avidan B, Glicksberg B, Klang E. Evaluating the Utility of a Large Language Model in Answering Common Patients' Gastrointestinal Health-Related Questions: Are We There Yet? Diagnostics (Basel). 2023;13:1950.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 65]  [Reference Citation Analysis (0)]
31.  Lee TC, Staller K, Botoman V, Pathipati MP, Varma S, Kuo B. ChatGPT Answers Common Patient Questions About Colonoscopy. Gastroenterology. 2023;165:509-511.e7.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 55]  [Cited by in RCA: 100]  [Article Influence: 50.0]  [Reference Citation Analysis (0)]
32.  Emile SH, Horesh N, Freund M, Pellino G, Oliveira L, Wignakumar A, Wexner SD. How appropriate are answers of online chat-based artificial intelligence (ChatGPT) to common questions on colon cancer? Surgery. 2023;174:1273-1275.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 35]  [Cited by in RCA: 35]  [Article Influence: 17.5]  [Reference Citation Analysis (0)]
33.  Moazzam Z, Cloyd J, Lima HA, Pawlik TM. Quality of ChatGPT Responses to Questions Related to Pancreatic Cancer and its Surgical Care. Ann Surg Oncol. 2023;30:6284-6286.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 3]  [Cited by in RCA: 20]  [Article Influence: 10.0]  [Reference Citation Analysis (0)]
34.  Yeo YH, Samaan JS, Ng WH, Ting PS, Trivedi H, Vipani A, Ayoub W, Yang JD, Liran O, Spiegel B, Kuo A. Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma. Clin Mol Hepatol. 2023;29:721-732.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 177]  [Cited by in RCA: 370]  [Article Influence: 185.0]  [Reference Citation Analysis (0)]
35.  Cao JJ, Kwon DH, Ghaziani TT, Kwo P, Tse G, Kesselman A, Kamaya A, Tse JR. Accuracy of Information Provided by ChatGPT Regarding Liver Cancer Surveillance and Diagnosis. AJR Am J Roentgenol. 2023;221:556-559.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 16]  [Cited by in RCA: 46]  [Article Influence: 23.0]  [Reference Citation Analysis (0)]
36.  Gorelik Y, Ghersin I, Arraf T, Ben-ishai O, Klein A, Khamaysi I. Using A Customized Gpt To Provide Guideline-Based Recommendations For The Management Of Pancreatic Mucinous Cystic Lesions. Gastrointest Endosc. 2024;99:AB42.  [PubMed]  [DOI]  [Full Text]
37.  Gorelik Y, Ghersin I, Maza I, Klein A. Harnessing language models for streamlined postcolonoscopy patient management: a novel approach. Gastrointest Endosc. 2023;98:639-641.e4.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 1]  [Cited by in RCA: 29]  [Article Influence: 14.5]  [Reference Citation Analysis (0)]
38.  Zhou J, Li T, James Fong S, Dey N, González Crespo R. Exploring ChatGPT's Potential for Consultation, Recommendations and Report Diagnosis: Gastric Cancer and Gastroscopy Reports’ Case. Int J Interact Multimed Artif Intell. 2023;8:7-13.  [PubMed]  [DOI]  [Full Text]
39.  Yang Z, Lu Y, Bagdasarian J, Das Swain V, Agarwal R, Campbell C, Al-Refaire W, El-Bayoumi J, Gao G, Wang D, Yao B, Shara N.   RECOVER: Designing a Large Language Model-based Remote Patient Monitoring System for Postoperative Gastrointestinal Cancer Care. 2025 Preprint. Available from: eprint arXiv:2502.05740.  [PubMed]  [DOI]  [Full Text]
40.  Kerbage A, Kassab J, El Dahdah J, Burke CA, Achkar JP, Rouphael C. Accuracy of ChatGPT in Common Gastrointestinal Diseases: Impact for Patients and Providers. Clin Gastroenterol Hepatol. 2024;22:1323-1325.e3.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 6]  [Cited by in RCA: 36]  [Article Influence: 36.0]  [Reference Citation Analysis (0)]
41.  Tariq R, Malik S, Khanna S. Evolving Landscape of Large Language Models: An Evaluation of ChatGPT and Bard in Answering Patient Queries on Colonoscopy. Gastroenterology. 2024;166:220-221.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 26]  [Cited by in RCA: 23]  [Article Influence: 23.0]  [Reference Citation Analysis (0)]
42.  Maida M, Ramai D, Mori Y, Dinis-Ribeiro M, Facciorusso A, Hassan C; and the AI-CORE (Artificial Intelligence COlorectal cancer Research) Working Group. The role of generative language systems in increasing patient awareness of colon cancer screening. Endoscopy. 2025;57:262-268.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 4]  [Cited by in RCA: 11]  [Article Influence: 11.0]  [Reference Citation Analysis (0)]
43.  Atarere J, Naqvi H, Haas C, Adewunmi C, Bandaru S, Allamneni R, Ugonabo O, Egbo O, Umoren M, Kanth P. Applicability of Online Chat-Based Artificial Intelligence Models to Colorectal Cancer Screening. Dig Dis Sci. 2024;69:791-797.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 17]  [Reference Citation Analysis (0)]
44.  Chang PW, Amini MM, Davis RO, Nguyen DD, Dodge JL, Lee H, Sheibani S, Phan J, Buxbaum JL, Sahakian AB. ChatGPT4 Outperforms Endoscopists for Determination of Postcolonoscopy Rescreening and Surveillance Recommendations. Clin Gastroenterol Hepatol. 2024;22:1917-1925.e17.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 1]  [Cited by in RCA: 13]  [Article Influence: 13.0]  [Reference Citation Analysis (0)]
45.  Lim DYZ, Tan YB, Koh JTE, Tung JYM, Sng GGR, Tan DMY, Tan CK. ChatGPT on guidelines: Providing contextual knowledge to GPT allows it to provide advice on appropriate colonoscopy intervals. J Gastroenterol Hepatol. 2024;39:81-106.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 2]  [Cited by in RCA: 34]  [Article Influence: 34.0]  [Reference Citation Analysis (0)]
46.  Munir MM, Endo Y, Ejaz A, Dillhoff M, Cloyd JM, Pawlik TM. Online artificial intelligence platforms and their applicability to gastrointestinal surgical operations. J Gastrointest Surg. 2024;28:64-69.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 14]  [Reference Citation Analysis (0)]
47.  Truhn D, Loeffler CM, Müller-Franzes G, Nebelung S, Hewitt KJ, Brandner S, Bressem KK, Foersch S, Kather JN. Extracting structured information from unstructured histopathology reports using generative pre-trained transformer 4 (GPT-4). J Pathol. 2024;262:310-319.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 2]  [Cited by in RCA: 45]  [Article Influence: 45.0]  [Reference Citation Analysis (0)]
48.  Choo JM, Ryu HS, Kim JS, Cheong JY, Baek SJ, Kwak JM, Kim J. Conversational artificial intelligence (chatGPT™) in the management of complex colorectal cancer patients: early experience. ANZ J Surg. 2024;94:356-361.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 6]  [Cited by in RCA: 24]  [Article Influence: 24.0]  [Reference Citation Analysis (0)]
49.  Huo B, McKechnie T, Ortenzi M, Lee Y, Antoniou S, Mayol J, Ahmed H, Boudreau V, Ramji K, Eskicioglu C. Dr. GPT will see you now: the ability of large language model-linked chatbots to provide colorectal cancer screening recommendations. Health Technol. 2024;14:463-469.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 6]  [Cited by in RCA: 15]  [Article Influence: 15.0]  [Reference Citation Analysis (0)]
50.  Pereyra L, Schlottmann F, Steinberg L, Lasa J. Colorectal Cancer Prevention: Is Chat Generative Pretrained Transformer (Chat GPT) Ready to Assist Physicians in Determining Appropriate Screening and Surveillance Recommendations? J Clin Gastroenterol. 2024;58:1022-1027.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 2]  [Cited by in RCA: 9]  [Article Influence: 9.0]  [Reference Citation Analysis (0)]
51.  Peng W, Feng Y, Yao C, Zhang S, Zhuo H, Qiu T, Zhang Y, Tang J, Gu Y, Sun Y. Evaluating AI in medicine: a comparative analysis of expert and ChatGPT responses to colorectal cancer questions. Sci Rep. 2024;14:2840.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 21]  [Reference Citation Analysis (0)]
52.  Ma H, Ma X, Yang C, Niu Q, Gao T, Liu C, Chen Y. Development and evaluation of a program based on a generative pre-trained transformer model from a public natural language processing platform for efficiency enhancement in post-procedural quality control of esophageal endoscopic submucosal dissection. Surg Endosc. 2024;38:1264-1272.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 1]  [Cited by in RCA: 3]  [Article Influence: 3.0]  [Reference Citation Analysis (0)]
53.  Cohen AB, Adamson B, Larch JK, Amster G. Large Language Model Extraction of PD-L1 Biomarker Testing Details From Electronic Health Records. AI Precis Oncol. 2025;2:57-64.  [PubMed]  [DOI]  [Full Text]
54.  Scherbakov D, Heider PM, Wehbe R, Alekseyenko AV, Lenert LA, Obeid JS. Using large language models for extracting stressful life events to assess their impact on preventive colon cancer screening adherence. BMC Public Health. 2025;25:12.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 4]  [Reference Citation Analysis (0)]
55.  Chatziisaak D, Burri P, Sparn M, Hahnloser D, Steffen T, Bischofberger S. Concordance of ChatGPT artificial intelligence decision-making in colorectal cancer multidisciplinary meetings: retrospective study. BJS Open. 2025;9:zraf040.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in RCA: 6]  [Reference Citation Analysis (0)]
56.  Saraiva MM, Ribeiro T, Agudo B, Afonso J, Mendes F, Martins M, Cardoso P, Mota J, Almeida MJ, Costa A, Gonzalez Haba Ruiz M, Widmer J, Moura E, Javed A, Manzione T, Nadal S, Barroso LF, de Parades V, Ferreira J, Macedo G. Evaluating ChatGPT-4 for the Interpretation of Images from Several Diagnostic Techniques in Gastroenterology. J Clin Med. 2025;14:572.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 3]  [Reference Citation Analysis (0)]
57.  Siu AHY, Gibson DP, Chiu C, Kwok A, Irwin M, Christie A, Koh CE, Keshava A, Reece M, Suen M, Rickard MJFX. ChatGPT as a patient education tool in colorectal cancer-An in-depth assessment of efficacy, quality and readability. Colorectal Dis. 2025;27:e17267.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 5]  [Reference Citation Analysis (0)]
58.  Horesh N, Emile SH, Gupta S, Garoufalia Z, Gefen R, Zhou P, da Silva G, Wexner SD. Comparing the Management Recommendations of Large Language Model and Colorectal Cancer Multidisciplinary Team: A Pilot Study. Dis Colon Rectum. 2025;68:41-47.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 2]  [Cited by in RCA: 6]  [Article Influence: 6.0]  [Reference Citation Analysis (0)]
59.  Ellison IE, Oslock WM, Abdullah A, Wood L, Thirumalai M, English N, Jones BA, Hollis R, Rubyan M, Chu DI. De novo generation of colorectal patient educational materials using large language models: Prompt engineering key to improved readability. Surgery. 2025;180:109024.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 2]  [Cited by in RCA: 4]  [Article Influence: 4.0]  [Reference Citation Analysis (0)]
60.  Ramchandani R, Guo E, Rakab E, Rathod J, Strain J, Klement W, Shorr R, Williams E, Jones D, Gilbert S. Validation of automated paper screening for esophagectomy systematic review using large language models. PeerJ Comput Sci. 2025;11:e2822.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in RCA: 1]  [Reference Citation Analysis (0)]
61.  Zhang H, Dong F, Li W, Ren Y, Dong H.   HepatoAudit: A Comprehensive Dataset for Evaluating Consistency of Large Language Models in Hepatobiliary Case Record Diagnosis. 2025 IEEE 17th International Conference on Computer Research and Development (ICCRD); 2025 Jan 17-19; Shangrao, China. IEEE, 2025: 234-239.  [PubMed]  [DOI]  [Full Text]
62.  Spitzl D, Mergen M, Bauer U, Jungmann F, Bressem KK, Busch F, Makowski MR, Adams LC, Gassert FT. Leveraging large language models for accurate classification of liver lesions from MRI reports. Comput Struct Biotechnol J. 2025;27:2139-2146.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 2]  [Cited by in RCA: 4]  [Article Influence: 4.0]  [Reference Citation Analysis (0)]
63.  Sheng L, Chen Y, Wei H, Che F, Wu Y, Qin Q, Yang C, Wang Y, Peng J, Bashir MR, Ronot M, Song B, Jiang H. Large Language Models for Diagnosing Focal Liver Lesions From CT/MRI Reports: A Comparative Study With Radiologists. Liver Int. 2025;45:e70115.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 2]  [Reference Citation Analysis (0)]
64.  Williams CY, Sarkar U, Adler-Milstein J, Rotenstein L.   Using Large Language Models to Determine Reasons for Missed Colon Cancer Screening Follow-Up. 2025 Preprint. Available from: medrxiv:25329439.  [PubMed]  [DOI]  [Full Text]
65.  Lu K, Lu J, Xu H, Guo K, Zhang Q, Lin H, Grosser M, Zhang Y, Zhang G. Genomics-Enhanced Cancer Risk Prediction for Personalized LLM-Driven Healthcare Recommender Systems. ACM Trans Inf Syst. 2025;43:1-30.  [PubMed]  [DOI]  [Full Text]
66.  Yang X, Xiao Y, Liu D, Zhang Y, Deng H, Huang J, Shi H, Liu D, Liang M, Jin X, Sun Y, Yao J, Zhou X, Guo W, He Y, Tang W, Xu C. Enhancing doctor-patient communication using large language models for pathology report interpretation. BMC Med Inform Decis Mak. 2025;25:36.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 3]  [Cited by in RCA: 8]  [Article Influence: 8.0]  [Reference Citation Analysis (0)]
67.  Jain S, Chakraborty B, Agarwal A, Sharma R. Performance of Large Language Models (ChatGPT and Gemini Advanced) in Gastrointestinal Pathology and Clinical Review of Applications in Gastroenterology. Cureus. 2025;17:e81618.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 2]  [Reference Citation Analysis (0)]
68.  Xu J, Wang J, Li J, Zhu Z, Fu X, Cai W, Song R, Wang T, Li H. Predicting Immunotherapy Response in Unresectable Hepatocellular Carcinoma: A Comparative Study of Large Language Models and Human Experts. J Med Syst. 2025;49:64.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 2]  [Reference Citation Analysis (0)]
69.  Deroy A, Maity S.   Cancer-Answer: Empowering Cancer Care with Advanced Large Language Models. 2025 Preprint. Available from: eprint arXiv:2411.06946.  [PubMed]  [DOI]  [Full Text]
70.  Ye X, Shi T, Huang D, Sakurai T. Multi-Omics clustering by integrating clinical features from large language model. Methods. 2025;239:64-71.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 1]  [Reference Citation Analysis (0)]
71.  Ma J, He Y, Li F, Han L, You C, Wang B. Segment anything in medical images. Nat Commun. 2024;15:654.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 267]  [Cited by in RCA: 530]  [Article Influence: 530.0]  [Reference Citation Analysis (0)]
72.  Ryu JS, Kang H, Chu Y, Yang S. Vision-language foundation models for medical imaging: a review of current practices and innovations. Biomed Eng Lett. 2025;15:809-830.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in RCA: 2]  [Reference Citation Analysis (0)]
73.  Rao VM, Hla M, Moor M, Adithan S, Kwak S, Topol EJ, Rajpurkar P. Multimodal generative AI for medical image interpretation. Nature. 2025;639:888-896.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 1]  [Cited by in RCA: 18]  [Article Influence: 18.0]  [Reference Citation Analysis (0)]
74.  Zhang S, Xu Y, Usuyama N, Xu H, Bagga J, Tinn R, Preston S, Rao R, Wei M, Valluri N, Wong C, Tupini A, Wang Y, Mazzola M, Shukla S, Liden L, Gao J, Crabtree A, Piening B, Bifulco C, Lungren MP, Naumann T, Wang S, Poon H. A Multimodal Biomedical Foundation Model Trained from Fifteen Million Image-Text Pairs. NEJM AI. 2025;2.  [PubMed]  [DOI]  [Full Text]
75.  Zippelius C, Alqahtani SA, Schedel J, Brookman-Amissah D, Muehlenberg K, Federle C, Salzberger A, Schorr W, Pech O. Diagnostic accuracy of a novel artificial intelligence system for adenoma detection in daily practice: a prospective nonrandomized comparative study. Endoscopy. 2022;54:465-472.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 8]  [Cited by in RCA: 20]  [Article Influence: 6.7]  [Reference Citation Analysis (0)]
76.  Cui B, Islam M, Bai L, Ren H. Surgical-DINO: adapter learning of foundation models for depth estimation in endoscopic surgery. Int J Comput Assist Radiol Surg. 2024;19:1013-1020.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 1]  [Cited by in RCA: 6]  [Article Influence: 6.0]  [Reference Citation Analysis (0)]
77.  Wang J, Song S, Wang X, Wang Y, Miao Y, Su J, Zhou SK.   ProMISe: Promptable Medical Image Segmentation using SAM. 2024 Preprint. Available from: eprint arXiv:2403.04164.  [PubMed]  [DOI]  [Full Text]
78.  Li Y, Hu M, Yang X.   Polyp-SAM: transfer SAM for polyp segmentation. Proceedings of the Medical Imaging 2024: Computer-Aided Diagnosis; 2024 Feb 18-22; San Diego, CA, United States. SPIE, 2024: 759-765.  [PubMed]  [DOI]  [Full Text]
79.  Wang Z, Liu C, Zhang S, Dou Q.   Foundation Model for Endoscopy Video Analysis via Large-Scale Self-supervised Pre-train. In: Greenspan H, Madabhushi A, Mousavi P, Salcudean S, Duncan J, Syeda-Mahmood T, Taylor R, editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14228. Cham: Springer, 2023.  [PubMed]  [DOI]  [Full Text]
80.  Ji GP, Liu J, Xu P, Barnes N, Shahbaz Khan F, Khan S, Fan DP.   Frontiers in Intelligent Colonoscopy. 2024 Preprint. Available from: eprint arXiv:2410.17241.  [PubMed]  [DOI]  [Full Text]
81.  Raseena TP, Kumar J, Balasundaram SR. DeepCPD: deep learning with vision transformer for colorectal polyp detection. Multimed Tools Appl. 2024;83:78183-78206.  [PubMed]  [DOI]  [Full Text]
82.  Teufel T, Shu H, Soberanis-Mukul RD, Mangulabnan JE, Sahu M, Vedula SS, Ishii M, Hager G, Taylor RH, Unberath M. OneSLAM to map them all: a generalized approach to SLAM for monocular endoscopic imaging based on tracking any point. Int J Comput Assist Radiol Surg. 2024;19:1259-1266.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 4]  [Reference Citation Analysis (0)]
83.  Liu Y, Yuan X, Zhou Y.   EIVS: Unpaired Endoscopy Image Virtual Staining via State Space Generative Model. 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); 2024 Dec 03-06; Lisbon, Portugal. IEEE, 2025.  [PubMed]  [DOI]  [Full Text]
84.  Jing X, Zhou H, Mao K, Zhao Y, Chu L.   A Novel Automatic Prompt Tuning Method for Polyp Segmentation. 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); 2024 Dec 03-06; Lisbon, Portugal. IEEE, 2025.  [PubMed]  [DOI]  [Full Text]
85.  He D, Ma Z, Li C, Li Y. Dual-Branch Fully Convolutional Segment Anything Model for Lesion Segmentation in Endoscopic Images. IEEE Access. 2024;12:125654-125667.  [PubMed]  [DOI]  [Full Text]
86.  Li F, Huang Z, Zhou L, Chen Y, Tang S, Ding P, Peng H, Chu Y. Improved dual-aggregation polyp segmentation network combining a pyramid vision transformer with a fully convolutional network. Biomed Opt Express. 2024;15:2590-2621.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 1]  [Reference Citation Analysis (0)]
87.  Dermyer P, Kalra A, Schwartz M.   EndoDINO: A Foundation Model for GI Endoscopy. 2025 Preprint. Available from: eprint arXiv:2501.05488.  [PubMed]  [DOI]  [Full Text]
88.  Choudhuri A, Gao Z, Zheng M, Planche B, Chen T, Wu Z.   PolypSegTrack: Unified Foundation Model for Colonoscopy Video Analysis. 2025 Preprint. Available from: eprint arXiv:2503.24108.  [PubMed]  [DOI]  [Full Text]
89.  Chen H, Gou L, Fang Z, Dou Q, Chen H, Chen C, Qiu Y, Zhang J, Ning C, Hu Y, Deng H, Yu J, Li G. Artificial intelligence assisted real-time recognition of intra-abdominal metastasis during laparoscopic gastric cancer surgery. NPJ Digit Med. 2025;8:9.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 8]  [Reference Citation Analysis (0)]
90.  Mostafijur Rahman M, Munir M, Jha D, Bagci U, Marculescu R.   PP-SAM: Perturbed Prompts for Robust Adaptation of Segment Anything Model for Polyp Segmentation. 2024 Preprint. Available from: eprint arXiv:2405.16740.  [PubMed]  [DOI]  [Full Text]
91.  Wang G, Xiao H, Gao H, Zhang R, Bai L, Yang X, Li Z, Li H, Ren H.   CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection. 2024 Preprint. Available from: eprint arXiv:2410.07540.  [PubMed]  [DOI]  [Full Text]
92.  Tan S, Cai Y, Lin X, Qi W, Li Z, Wan X, Li G.   ColonCLIP: An Adaptable Prompt-Driven Multi-Modal Strategy for Colonoscopy Image Diagnosis. 2024 IEEE International Symposium on Biomedical Imaging (ISBI); 2024 May 27-30; Athens, Greece. IEEE, 2024.  [PubMed]  [DOI]  [Full Text]
93.  Yu J, Zhu Y, Fu P, Chen T, Huang J, Li Q, Zhou P, Wang Z, Wu F, Wang S, Yang X. Robust Polyp Detection and Diagnosis through Compositional Prompt-Guided Diffusion Models. IEEE Trans Med Imaging. 2025;PP.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 1]  [Reference Citation Analysis (0)]
94.  Sharma V, Jha D, Bhuyan MK, Das PK, Bagci U.   Diverse Image Generation with Diffusion Models and Cross Class Label Learning for Polyp Classification. 2025 Preprint. Available from: eprint arXiv:2502.05444.  [PubMed]  [DOI]  [Full Text]
95.  Karaosmanoglu AD, Onur MR, Arellano RS.   Imaging in Gastrointestinal Cancers. In: Yalcin S, Philip P, editors. Textbook of Gastrointestinal Oncology. Cham: Springer, 2019.  [PubMed]  [DOI]  [Full Text]
96.  Chong JJR, Kirpalani A, Moreland R, Colak E. Artificial Intelligence in Gastrointestinal Imaging: Advances and Applications. Radiol Clin North Am. 2025;63:477-490.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 1]  [Reference Citation Analysis (0)]
97.  Wu C, Zhang X, Zhang Y, Wang Y, Xie W.   Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data. 2023 Preprint. Available from: eprint arXiv:2308.02463.  [PubMed]  [DOI]  [Full Text]
98.  Cherti M, Beaumont R, Wightman R, Wortsman M, Ilharco G, Gordon C, Schuhmann C, Schmidt L, Jitsev J.   Reproducible Scaling Laws for Contrastive Language-Image Learning. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2023 Jun 17-24; Vancouver, BC, Canada. IEEE, 2023.  [PubMed]  [DOI]  [Full Text]
99.  Blankemeier L, Cohen JP, Kumar A, Van Veen D, Gardezi SJS, Paschali M, Chen Z, Delbrouck JB, Reis E, Truyts C, Bluethgen C, Jensen MEK, Ostmeier S, Varma M, Valanarasu JMJ, Fang Z, Huo Z, Nabulsi Z, Ardila D, Weng WH, Amaro E, Ahuja N, Fries J, Shah NH, Johnston A, Boutin RD, Wentland A, Langlotz CP, Hom J, Gatidis S, Chaudhari AS. Merlin: A Vision Language Foundation Model for 3D Computed Tomography. Res Sq. 2024;rs.3.rs-4546309.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 24]  [Cited by in RCA: 20]  [Article Influence: 20.0]  [Reference Citation Analysis (0)]
100.  Saab K, Tu T, Weng WH, Tanno R, Stutz D, Wulczyn E, Zhang F, Strother T, Park C, Vedadi E, Zambrano Chaves J, Hu SY, Schaekermann M, Kamath A, Cheng Y, Barrett DGT, Cheung C, Mustafa B, Palepu A, McDuff D, Hou L, Golany T, Liu L, Alayrac JB, Houlsby N, Tomasev N, Freyberg J, Lau C, Kemp J, Lai J, Azizi S, Kanada K, Man S, Kulkarni K, Sun R, Shakeri S, He L, Caine B, Webson A, Latysheva N, Johnson M, Mansfield P, Lu J, Rivlin E, Anderson J, Green B, Wong R, Krause J, Shlens J, Dominowska E, Eslami SMA, Chou K, Cui C, Vinyals O, Kavukcuoglu K, Manyika J, Dean J, Hassabis D, Matias Y, Webster D, Barral J, Corrado G, Semturs C, Mahdavi SS, Gottweis J, Karthikesalingam A, Natarajan V.   Capabilities of Gemini Models in Medicine. 2024 Preprint. Available from: eprint arXiv:2404.18416.  [PubMed]  [DOI]  [Full Text]
101.  Kiraly AP, Baur S, Philbrick K, Mahvar F, Yatziv L, Chen T, Sterling B, George N, Jamil F, Tang J, Bailey K, Ahmed F, Goel A, Ward A, Yang L, Sellergren A, Matias Y, Hassidim A, Shetty S, Golden D, Azizi S, Steiner DF, Liu Y, Thelin T, Pilgrim R, Kirmizibayrak C.   Health AI Developer Foundations. 2024 Preprint. Available from: eprint arXiv:2411.15128.  [PubMed]  [DOI]  [Full Text]
102.  Pai S, Hadzic I, Bontempi D, Bressem K, Kann BH, Fedorov A, Mak RH, Aerts HJWL.   Vision Foundation Models for Computed Tomography. 2025 Preprint. Available from: eprint arXiv:2501.09001.  [PubMed]  [DOI]  [Full Text]
103.  Zhou HY, Nicolás Acosta J, Adithan S, Datta S, Topol EJ, Rajpurkar P.   MedVersa: A Generalist Foundation Model for Medical Image Interpretation. 2024 Preprint. Available from: eprint arXiv:2405.07988.  [PubMed]  [DOI]  [Full Text]
104.  Zhou F, Xu Y, Cui Y, Zhang S, Zhu Y, He W, Wang J, Wang X, Chan R, Lau LHS, Han C, Zhang D, Li Z, Chen H.   iMD4GC: Incomplete Multimodal Data Integration to Advance Precise Treatment Response Prediction and Survival Analysis for Gastric Cancer. 2024 Preprint. Available from: eprint arXiv:2404.01192.  [PubMed]  [DOI]  [Full Text]
105.  Yasaka K, Kawamura M, Sonoda Y, Kubo T, Kiryu S, Abe O. Large multimodality model fine-tuned for detecting breast and esophageal carcinomas on CT: a preliminary study. Jpn J Radiol. 2025;43:779-786.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 1]  [Reference Citation Analysis (0)]
106.  Dika E, Curti N, Giampieri E, Veronesi G, Misciali C, Ricci C, Castellani G, Patrizi A, Marcelli E. Advantages of manual and automatic computer-aided compared to traditional histopathological diagnosis of melanoma: A pilot study. Pathol Res Pract. 2022;237:154014.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 7]  [Reference Citation Analysis (0)]
107.  Hanna MG, Parwani A, Sirintrapun SJ. Whole Slide Imaging: Technology and Applications. Adv Anat Pathol. 2020;27:251-259.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 18]  [Cited by in RCA: 79]  [Article Influence: 15.8]  [Reference Citation Analysis (0)]
108.  Niazi MKK, Parwani AV, Gurcan MN. Digital pathology and artificial intelligence. Lancet Oncol. 2019;20:e253-e261.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 507]  [Cited by in RCA: 684]  [Article Influence: 114.0]  [Reference Citation Analysis (0)]
109.  da Silva LM, Pereira EM, Salles PG, Godrich R, Ceballos R, Kunz JD, Casson A, Viret J, Chandarlapaty S, Ferreira CG, Ferrari B, Rothrock B, Raciti P, Reuter V, Dogdas B, DeMuth G, Sue J, Kanan C, Grady L, Fuchs TJ, Reis-Filho JS. Independent real-world application of a clinical-grade automated prostate cancer detection system. J Pathol. 2021;254:147-158.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in RCA: 104]  [Reference Citation Analysis (0)]
110.  Kang M, Song H, Park S, Yoo D, Pereira S.   Benchmarking Self-Supervised Learning on Diverse Pathology Datasets. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2023 Jun 17-24; Vancouver, BC, Canada. IEEE, 2023.  [PubMed]  [DOI]  [Full Text]
111.  Wang X, Yang S, Zhang J, Wang M, Zhang J, Yang W, Huang J, Han X. Transformer-based unsupervised contrastive learning for histopathological image classification. Med Image Anal. 2022;81:102559.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 247]  [Reference Citation Analysis (0)]
112.  Filiot A, Ghermi R, Olivier A, Jacob P, Fidon L, Camara A, Mac Kain A, Saillard C, Schiratti J.   Scaling Self-Supervised Learning for Histopathology with Masked Image Modeling. 2024 Preprint. Available from: medrxiv:23292757.  [PubMed]  [DOI]  [Full Text]
113.  Azizi S, Culp L, Freyberg J, Mustafa B, Baur S, Kornblith S, Chen T, Tomasev N, Mitrović J, Strachan P, Mahdavi SS, Wulczyn E, Babenko B, Walker M, Loh A, Chen PC, Liu Y, Bavishi P, McKinney SM, Winkens J, Roy AG, Beaver Z, Ryan F, Krogue J, Etemadi M, Telang U, Liu Y, Peng L, Corrado GS, Webster DR, Fleet D, Hinton G, Houlsby N, Karthikesalingam A, Norouzi M, Natarajan V. Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging. Nat Biomed Eng. 2023;7:756-779.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 85]  [Reference Citation Analysis (0)]
114.  Vorontsov E, Bozkurt A, Casson A, Shaikovski G, Zelechowski M, Severson K, Zimmermann E, Hall J, Tenenholtz N, Fusi N, Yang E, Mathieu P, van Eck A, Lee D, Viret J, Robert E, Wang YK, Kunz JD, Lee MCH, Bernhard JH, Godrich RA, Oakley G, Millar E, Hanna M, Wen H, Retamero JA, Moye WA, Yousfi R, Kanan C, Klimstra DS, Rothrock B, Liu S, Fuchs TJ. A foundation model for clinical-grade computational pathology and rare cancers detection. Nat Med. 2024;30:2924-2935.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 2]  [Cited by in RCA: 135]  [Article Influence: 135.0]  [Reference Citation Analysis (0)]
115.  Zimmermann E, Vorontsov E, Viret J, Casson A, Zelechowski M, Shaikovski G, Tenenholtz N, Hall J, Klimstra D, Yousfi R, Fuchs T, Fusi N, Liu S, Severson K.   Virchow2: Scaling Self-Supervised Mixed Magnification Models in Pathology. 2024 Preprint. Available from: eprint arXiv:2408.00738.  [PubMed]  [DOI]  [Full Text]
116.  Filiot A, Jacob P, Mac Kain A, Saillard C.   Phikon-v2, A large and public feature extractor for biomarker prediction. 2024 Preprint. Available from: eprint arXiv:2409.09173.  [PubMed]  [DOI]  [Full Text]
117.  Dippel J, Feulner B, Winterhoff T, Milbich T, Tietz S, Schallenberg S, Dernbach G, Kunft A, Heinke S, Eich M-L, Ribbat-Idel J, Krupar R, Anders P, Prenißl N, Jurmeister P, Horst D, Ruff L, Müller K-R, Klauschen F, Alber M.   RudolfV: A Foundation Model by Pathologists for Pathologists. 2024 Preprint. Available from: eprint arXiv:2401.04079.  [PubMed]  [DOI]  [Full Text]
118.  Nechaev D, Pchelnikov A, Ivanova E.   Hibou: A Family of Foundational Vision Transformers for Pathology. 2024 Preprint. Available from: eprint arXiv:2406.05074.  [PubMed]  [DOI]  [Full Text]
119.  Jaume G, Vaidya A, Zhang A, Song AH, Chen RJ, Sahai S, Mo D, Madrigal E, Phi Le L, Mahmood F.   Multistain Pretraining for Slide Representation Learning in Pathology. In: Leonardis A, Ricci E, Roth S, Russakovsky O, Sattler T, Varol G, editors. Computer Vision - ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15091. Cham: Springer, 2025.  [PubMed]  [DOI]  [Full Text]
120.  Lenz T, Neidlinger P, Ligero M, Wölflein G, van Treeck M, Kather JN.   Unsupervised foundation model-agnostic slide-level representation learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2025 Jun 10-17; Nashville, TN, United States. IEEE, 2025.  [PubMed]  [DOI]  [Full Text]
121.  Juyal D, Padigela H, Shah C, Shenker D, Harguindeguy N, Liu Y, Martin B, Zhang Y, Nercessian M, Markey M, Finberg I, Luu K, Borders D, Ashar Javed S, Krause E, Biju R, Sood A, Ma A, Nyman J, Shamshoian J, Chhor G, Sanghavi D, Thibault M, Yu L, Najdawi F, Hipp JA, Fahy D, Glass B, Walk E, Abel J, Pokkalla H, Beck AH, Grullon S.   PLUTO: Pathology-Universal Transformer. 2024 Preprint. Available from: eprint arXiv:2405.07905.  [PubMed]  [DOI]  [Full Text]
122.  Chen RJ, Chen C, Li Y, Chen TY, Trister AD, Krishnan RG, Mahmood F.   Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2022 Jun 18-24; New Orleans, LA, United States. IEEE, 2022.  [PubMed]  [DOI]  [Full Text]
123.  Hua S, Yan F, Shen T, Ma L, Zhang X. PathoDuet: Foundation models for pathological slide analysis of H&E and IHC stains. Med Image Anal. 2024;97:103289.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 12]  [Reference Citation Analysis (0)]
124.  kaiko.ai, Aben N, de Jong ED, Gatopoulos I, Känzig N, Karasikov M, Lagré A, Moser R, van Doorn J, Tang F.   Towards Large-Scale Training of Pathology Foundation Models. 2024 Preprint. Available from: eprint arXiv:2404.15217.  [PubMed]  [DOI]  [Full Text]
125.  Yan F, Wu J, Li J, Wang W, Lu J, Chen W, Gao Z, Li J, Yan H, Ma J, Chen M, Lu Y, Chen Q, Wang Y, Ling X, Wang X, Wang Z, Huang Q, Hua S, Liu M, Ma L, Shen T, Zhang X, He Y, Chen H, Zhang S, Wang Z.   PathOrchestra: A Comprehensive Foundation Model for Computational Pathology with Over 100 Diverse Clinical-Grade Tasks. 2025 Preprint. Available from: eprint arXiv:2503.24345.  [PubMed]  [DOI]  [Full Text]
126.  Vaidya A, Zhang A, Jaume G, Song AH, Ding T, Wagner SJ, Lu MY, Doucet P, Robertson H, Almagro-Perez C, Chen RJ, ElHarouni D, Ayoub G, Bossi C, Ligon KL, Gerber G, Phi Le L, Mahmood F.   Molecular-driven Foundation Model for Oncologic Pathology. 2025 Preprint. Available from: eprint arXiv:2501.16652.  [PubMed]  [DOI]  [Full Text]
127.  Filiot A, Dop N, Tchita O, Riou A, Dubois R, Peeters T, Valter D, Scalbert M, Saillard C, Robin G, Olivier A.   Distilling foundation models for robust and efficient models in digital pathology. 2025 Preprint. Available from: eprint arXiv:2501.16239.  [PubMed]  [DOI]  [Full Text]
128.  Nicke T, Schäfer JR, Höfener H, Feuerhake F, Merhof D, Kießling F, Lotz J. Tissue concepts: Supervised foundation models in computational pathology. Comput Biol Med. 2025;186:109621.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 1]  [Reference Citation Analysis (0)]
129.  Kan Wang Y, Tydlitatova L, Kunz JD, Oakley G, Chow BKB, Godrich RA, Lee MCH, Aghdam H, Bozkurt A, Zelechowski M, Vanderbilt C, Kanan C, Retamero JA, Hamilton P, Yousfi R, Fuchs TJ, Klimstra DS, Liu S.   Screen Them All: High-Throughput Pan-Cancer Genetic and Phenotypic Biomarker Screening from H&E Whole Slide Images. 2024 Preprint. Available from: eprint arXiv:2408.09554.  [PubMed]  [DOI]  [Full Text]
130.  Wu Y, Li S, Du Z, Zhu W.   BROW: Better featuRes fOr Whole slide image based on self-distillation. 2023 Preprint. Available from: eprint arXiv:2309.08259.  [PubMed]  [DOI]  [Full Text]
131.  Yang Z, Wei T, Liang Y, Yuan X, Gao R, Xia Y, Zhou J, Zhang Y, Yu Z. A foundation model for generalizable cancer diagnosis and survival prediction from histopathological images. Nat Commun. 2025;16:2366.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 1]  [Cited by in RCA: 14]  [Article Influence: 14.0]  [Reference Citation Analysis (0)]
132.  Alber M, Tietz S, Dippel J, Milbich T, Lesort T, Korfiatis P, Krügener M, Perez Cancer B, Shah N, Möllers A, Seegerer P, Carpen-Amarie A, Standvoss K, Dernbach G, de Jong E, Schallenberg S, Kunft A, Hoffer von Ankershoffen H, Schaeferle G, Duffy P, Redlon M, Jurmeister P, Horst D, Ruff L, Müller K-R, Klauschen F, Norgan A.   Atlas: A Novel Pathology Foundation Model by Mayo Clinic, Charité, and Aignostics. 2025 Preprint. Available from: eprint arXiv:2501.05409.  [PubMed]  [DOI]  [Full Text]
133.  Yang Z, Li L, Lin K, Wang J, Lin CC, Liu Z, Wang L.   The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision). 2023 Preprint. Available from: eprint arXiv:2309.17421.  [PubMed]  [DOI]  [Full Text]
134.  Wu J, Gan W, Chen Z, Wan S, Yu PS.   Multimodal Large Language Models: A Survey. 2023 IEEE International Conference on Big Data (BigData); 2023 Dec 15-18; Sorrento, Italy. IEEE, 2024.  [PubMed]  [DOI]  [Full Text]
135.  Kaczmarczyk R, Wilhelm TI, Martin R, Roos J. Evaluating multimodal AI in medical diagnostics. NPJ Digit Med. 2024;7:205.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 18]  [Reference Citation Analysis (0)]
136.  Huang Z, Bianchi F, Yuksekgonul M, Montine TJ, Zou J. A visual-language foundation model for pathology image analysis using medical Twitter. Nat Med. 2023;29:2307-2316.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 1]  [Cited by in RCA: 207]  [Article Influence: 103.5]  [Reference Citation Analysis (0)]
137.  Guo Z, Ma J, Xu Y, Wang Y, Wang L, Chen H.   HistGen: Histopathology Report Generation via Local-Global Feature Encoding and Cross-Modal Context Interaction. In: Linguraru MG, Dou Q, Feragen A, Giannarou S, Glocker B, Lekadir K, Schnabel JA, editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15004. Cham: Springer, 2024.  [PubMed]  [DOI]  [Full Text]
138.  Ahmed F, Sellergen A, Yang L, Xu S, Babenko B, Ward A, Olson N, Mohtashamian A, Matias Y, Corrado GS, Duong Q, Webster DR, Shetty S, Golden D, Liu Y, Steiner DF, Wulczyn E.   PathAlign: A vision-language model for whole slide images in histopathology. Proceedings of the MICCAI Workshop on Computational Pathology; 2024; Proceedings of Machine Learning Research. PMLR, 2024: 72-108.  [PubMed]  [DOI]
139.  Wang X, Zhao J, Marostica E, Yuan W, Jin J, Zhang J, Li R, Tang H, Wang K, Li Y, Wang F, Peng Y, Zhu J, Zhang J, Jackson CR, Zhang J, Dillon D, Lin NU, Sholl L, Denize T, Meredith D, Ligon KL, Signoretti S, Ogino S, Golden JA, Nasrallah MP, Han X, Yang S, Yu KH. A pathology foundation model for cancer diagnosis and prognosis prediction. Nature. 2024;634:970-978.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 87]  [Cited by in RCA: 142]  [Article Influence: 142.0]  [Reference Citation Analysis (0)]
140.  Sun Y, Zhang Y, Si Y, Zhu C, Shui Z, Zhang K, Li J, Lyu X, Lin T, Yang L.   PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration. 2024 Preprint. Available from: eprint arXiv:2407.00203.  [PubMed]  [DOI]  [Full Text]
141.  Lu MY, Chen B, Williamson DFK, Chen RJ, Zhao M, Chow AK, Ikemura K, Kim A, Pouli D, Patel A, Soliman A, Chen C, Ding T, Wang JJ, Gerber G, Liang I, Le LP, Parwani AV, Weishaupt LL, Mahmood F. A multimodal generative AI copilot for human pathology. Nature. 2024;634:466-473.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 145]  [Cited by in RCA: 127]  [Article Influence: 127.0]  [Reference Citation Analysis (0)]
142.  Sun Y, Zhu C, Zheng S, Zhang K, Sun L, Shui Z, Zhang Y, Li H, Yang L. PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology. Proc AAAI Conf Artif Intell. 2024;38:5034-5042.  [PubMed]  [DOI]  [Full Text]
143.  Xu H, Usuyama N, Bagga J, Zhang S, Rao R, Naumann T, Wong C, Gero Z, González J, Gu Y, Xu Y, Wei M, Wang W, Ma S, Wei F, Yang J, Li C, Gao J, Rosemon J, Bower T, Lee S, Weerasinghe R, Wright BJ, Robicsek A, Piening B, Bifulco C, Wang S, Poon H. A whole-slide foundation model for digital pathology from real-world data. Nature. 2024;630:181-188.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in RCA: 237]  [Reference Citation Analysis (0)]
144.  Ding T, Wagner SJ, Song AH, Chen RJ, Lu MY, Zhang A, Vaidya AJ, Jaume G, Shaban M, Kim A, Williamson DFK, Chen B, Almagro-Perez C, Doucet P, Sahai S, Chen C, Komura D, Kawabe A, Ishikawa S, Gerber G, Peng T, Phi Le L, Mahmood F.   Multimodal Whole Slide Foundation Model for Pathology. 2024 Preprint. Available from: eprint arXiv:2411.19666.  [PubMed]  [DOI]  [Full Text]
145.  Lu MY, Chen B, Williamson DFK, Chen RJ, Liang I, Ding T, Jaume G, Odintsov I, Le LP, Gerber G, Parwani AV, Zhang A, Mahmood F. A visual-language foundation model for computational pathology. Nat Med. 2024;30:863-874.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 407]  [Cited by in RCA: 239]  [Article Influence: 239.0]  [Reference Citation Analysis (0)]
146.  Chen Y, Wang G, Ji Y, Li Y, Ye J, Li T, Hu M, Yu R, Qiao Y, He J.   SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding. 2024 Preprint. Available from: eprint arXiv:2410.11761.  [PubMed]  [DOI]  [Full Text]
147.  Tan JW, Kim S, Kim E, Lee SH, Ahn S, Jeong W.   Clinical-Grade Multi-organ Pathology Report Generation for Multi-scale Whole Slide Images via a Semantically Guided Medical Text Foundation Model. In: Linguraru MG, Dou Q, Feragen A, Giannarou S, Glocker B, Lekadir K, Schnabel JA, editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15004. Cham: Springer, 2024.  [PubMed]  [DOI]  [Full Text]
148.  Chen Z, Chen Y, Sun Y, Tang L, Zhang L, Hu Y, He M, Li Z, Cheng S, Yuan J, Wang Z, Wang Y, Zhao J, Gong J, Zhao L, Cao B, Li G, Zhang X, Dong B, Shen L. Predicting gastric cancer response to anti-HER2 therapy or anti-HER2 combined immunotherapy based on multi-modal data. Signal Transduct Target Ther. 2024;9:222.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 28]  [Cited by in RCA: 35]  [Article Influence: 35.0]  [Reference Citation Analysis (0)]
149.  Zhao W, Guo Z, Fan Y, Jiang Y, Yeung MCF, Yu L. Aligning knowledge concepts to whole slide images for precise histopathology image analysis. NPJ Digit Med. 2024;7:383.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 2]  [Reference Citation Analysis (0)]
150.  Ferber D, Wölflein G, Wiest IC, Ligero M, Sainath S, Ghaffari Laleh N, El Nahhas OSM, Müller-Franzes G, Jäger D, Truhn D, Kather JN. In-context learning enables multimodal large language models to classify cancer pathology images. Nat Commun. 2024;15:10104.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in RCA: 33]  [Reference Citation Analysis (0)]
151.  Wang J, Wang K, Yu Y, Lu Y, Xiao W, Sun Z, Liu F, Zou Z, Gao Y, Yang L, Zhou HY, Miao H, Zhao W, Huang L, Zeng L, Guo R, Chong I, Deng B, Cheng L, Chen X, Luo J, Zhu MH, Baptista-Hon D, Monteiro O, Li M, Ke Y, Li J, Zeng S, Guan T, Zeng J, Xue K, Oermann E, Luo H, Yin Y, Zhang K, Qu J. Self-improving generative foundation model for synthetic medical image generation and clinical applications. Nat Med. 2025;31:609-617.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 6]  [Cited by in RCA: 25]  [Article Influence: 25.0]  [Reference Citation Analysis (0)]
152.  Zhou Q, Zhong W, Guo Y, Xiao M, Ma H, Huang J.   PathM3: A Multimodal Multi-task Multiple Instance Learning Framework for Whole Slide Image Classification and Captioning. In: Linguraru MG, Dou Q, Feragen A, Giannarou S, Glocker B, Lekadir K, Schnabel JA, editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15004. Cham: Springer, 2024.  [PubMed]  [DOI]  [Full Text]
153.  Hu D, Jiang Z, Shi J, Xie F, Wu K, Tang K, Cao M, Huai J, Zheng Y. Histopathology language-image representation learning for fine-grained digital pathology cross-modal retrieval. Med Image Anal. 2024;95:103163.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 5]  [Reference Citation Analysis (0)]
154.  Zhang L, Yun B, Xie X, Li Q, Li X, Wang Y.   Prompting Whole Slide Image Based Genetic Biomarker Prediction. In: Linguraru MG, Dou Q, Feragen A, Giannarou S, Glocker B, Lekadir K, Schnabel JA, editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15004. Cham: Springer, 2024.  [PubMed]  [DOI]  [Full Text]
155.  Sengupta S, Brown DE.   Automatic Report Generation for Histopathology Images Using Pre-Trained Vision Transformers and BERT. 2024 IEEE International Symposium on Biomedical Imaging (ISBI); 2024 May 27-30; Athens, Greece. IEEE, 2024.  [PubMed]  [DOI]  [Full Text]
156.  Xu Y, Wang Y, Zhou F, Ma J, Yang S, Lin H, Wang X, Wang J, Liang L, Han A, Chan RCK, Chen H.   A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model. 2024 Preprint. Available from: eprint arXiv:2407.15362.  [PubMed]  [DOI]  [Full Text]
157.  Ferber D, El Nahhas OSM, Wölflein G, Wiest IC, Clusmann J, Leßmann ME, Foersch S, Lammert J, Tschochohei M, Jäger D, Salto-Tellez M, Schultz N, Truhn D, Kather JN. Development and validation of an autonomous artificial intelligence agent for clinical decision-making in oncology. Nat Cancer. 2025;6:1337-1349.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 9]  [Cited by in RCA: 19]  [Article Influence: 19.0]  [Reference Citation Analysis (0)]
158.  Shaikovski G, Casson A, Severson K, Zimmermann E, Kan Wang Y, Kunz JD, Retamero JA, Oakley G, Klimstra D, Kanan C, Hanna M, Zelechowski M, Viret J, Tenenholtz N, Hall J, Fusi N, Yousfi R, Hamilton P, Moye WA, Vorontsov E, Liu S, Fuchs TJ.   PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology. 2024 Preprint. Available from: eprint arXiv:2405.10254.  [PubMed]  [DOI]  [Full Text]
159.  Tran M, Schmidle P, Guo RR, Wagner SJ, Koch V, Lupperger V, Novotny B, Murphree DH, Hardway HD, D'Amato M, Lefkes J, Geijs DJ, Feuchtinger A, Böhner A, Kaczmarczyk R, Biedermann T, Amir AL, Mooyaart AL, Ciompi F, Litjens G, Wang C, Comfere NI, Eyerich K, Braun SA, Marr C, Peng T. Generating dermatopathology reports from gigapixel whole slide images with HistoGPT. Nat Commun. 2025;16:4886.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 4]  [Reference Citation Analysis (0)]
160.  Dai D, Zhang Y, Yang Q, Xu L, Shen X, Xia S, Wang G. Pathologyvlm: a large vision-language model for pathology image understanding. Artif Intell Rev. 2025;58:186.  [PubMed]  [DOI]  [Full Text]
161.  Xiang J, Wang X, Zhang X, Xi Y, Eweje F, Chen Y, Li Y, Bergstrom C, Gopaulchan M, Kim T, Yu KH, Willens S, Olguin FM, Nirschl JJ, Neal J, Diehn M, Yang S, Li R. A vision-language foundation model for precision oncology. Nature. 2025;638:769-778.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 22]  [Cited by in RCA: 39]  [Article Influence: 39.0]  [Reference Citation Analysis (0)]
162.  Deshpande P, Rasin A, Tchoua R, Furst J, Raicu D, Schinkel M, Trivedi H, Antani S. Biomedical heterogeneous data categorization and schema mapping toward data integration. Front Big Data. 2023;6:1173038.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in RCA: 2]  [Reference Citation Analysis (0)]
163.  Mohammed Yakubu A, Chen YP. Ensuring privacy and security of genomic data and functionalities. Brief Bioinform. 2020;21:511-526.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 43]  [Cited by in RCA: 23]  [Article Influence: 4.6]  [Reference Citation Analysis (0)]
164.  Shin H, Ryu K, Kim JY, Lee S. Application of privacy protection technology to healthcare big data. Digit Health. 2024;10:20552076241282242.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in RCA: 3]  [Reference Citation Analysis (0)]
165.  Quinn TP, Jacobs S, Senadeera M, Le V, Coghlan S. The three ghosts of medical AI: Can the black-box present deliver? Artif Intell Med. 2022;124:102158.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 23]  [Cited by in RCA: 72]  [Article Influence: 18.0]  [Reference Citation Analysis (0)]
166.  Karim MR, Islam T, Shajalal M, Beyan O, Lange C, Cochez M, Rebholz-Schuhmann D, Decker S. Explainable AI for Bioinformatics: Methods, Tools and Applications. Brief Bioinform. 2023;24:bbad236.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 51]  [Reference Citation Analysis (0)]
167.  Caton S, Haas C. Fairness in Machine Learning: A Survey. ACM Comput Surv. 2024;56:1-38.  [PubMed]  [DOI]  [Full Text]
168.  Ong JCL, Chang SY, William W, Butte AJ, Shah NH, Chew LST, Liu N, Doshi-Velez F, Lu W, Savulescu J, Ting DSW. Ethical and regulatory challenges of large language models in medicine. Lancet Digit Health. 2024;6:e428-e432.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 73]  [Cited by in RCA: 98]  [Article Influence: 98.0]  [Reference Citation Analysis (0)]
169.  Hantel A, Walsh TP, Marron JM, Kehl KL, Sharp R, Van Allen E, Abel GA. Perspectives of Oncologists on the Ethical Implications of Using Artificial Intelligence for Cancer Care. JAMA Netw Open. 2024;7:e244077.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 16]  [Cited by in RCA: 23]  [Article Influence: 23.0]  [Reference Citation Analysis (0)]
170.  El Arab RA, Abu-Mahfouz MS, Abuadas FH, Alzghoul H, Almari M, Ghannam A, Seweid MM. Bridging the Gap: From AI Success in Clinical Trials to Real-World Healthcare Implementation-A Narrative Review. Healthcare (Basel). 2025;13:701.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in RCA: 10]  [Reference Citation Analysis (0)]