Published online Dec 21, 2025. doi: 10.3748/wjg.v31.i47.112921
Revised: September 10, 2025
Accepted: November 3, 2025
Published online: December 21, 2025
Processing time: 132 Days and 8.5 Hours
Gastrointestinal (GI) cancers represent a major global health concern due to their high incidence and mortality rates. Foundation models (FMs), also referred to as large models, constitute a novel class of artificial intelligence technologies that have demonstrated considerable potential in addressing these challenges. These models encompass large language models (LLMs), vision FMs (VFMs), and multimodal LLMs (MLLMs), all of which utilize transformer architectures and self-supervised pre-training on extensive unlabeled datasets to achieve robust cross-domain generalization. This review delineates the principal applications of these models: LLMs facilitate the structuring of clinical narratives, extraction of insights from medical records, and enhancement of physician-patient communication.
Core Tip: This review synthesizes applications of foundation models in gastrointestinal cancer, from clinical text structuring and image analysis to multimodal data integration. Despite current knowledge gaps and challenges like data standardization, it highlights foundation models’ transformative potential, urging refined models and collaborations to advance gastrointestinal cancer care.
- Citation: Shi L, Huang R, Zhao LL, Guo AJ. Foundation models: Insights and implications for gastrointestinal cancer. World J Gastroenterol 2025; 31(47): 112921
- URL: https://www.wjgnet.com/1007-9327/full/v31/i47/112921.htm
- DOI: https://dx.doi.org/10.3748/wjg.v31.i47.112921
Gastrointestinal (GI) cancers represent some of the most prevalent and lethal malignancies worldwide, imposing a substantial burden on public health[1]. Their multifactorial etiology and heterogeneous clinical manifestations make them difficult to study and treat using current methods[2]. Nevertheless, the advent of next-generation artificial intelligence (AI) models, known as foundation models (FMs), offers novel avenues for addressing these challenges[3]. These models, trained on vast amounts of data, have the power to handle complex tasks, thereby presenting promising strategies to mitigate this worldwide health concern[4].
Unlike early AI methods that targeted isolated tasks or limited data modalities, FMs can integrate diverse medical data types, including endoscopic images, pathology slides, electronic health records (EHRs), genomic data, and clinical narratives[5]. This integrative capability is particularly pertinent to GI cancers, which often progress through a defined pattern (e.g., Correa’s cascade from gastritis to cancer)[6]. Accurate risk assessment, early diagnosis, and therapeutic decision-making require comprehensive data interpretation. However, current knowledge regarding the application of FMs in GI cancer remains limited, underscoring the imperative to systematically review current implementations and delineate prospective research trajectories to advance FM utilization in this domain.
Traditional computational biology techniques, such as support vector machines (SVMs) and random forests, alongside more recent deep-learning approaches like convolutional neural networks (CNNs), have made incremental advances in GI cancer research[7]. Nevertheless, these methods face major limitations, including dependence on labor-intensive, high-quality annotations; heterogeneity of datasets across institutions; and a predominant focus on unimodal data (e.g., imaging or genomics in isolation). These constraints highlight the necessity for cross-modal, large-scale pre-trained models[8].
Recent breakthroughs in general-purpose FMs, exemplified by ChatGPT, Stable Diffusion, and related architectures, have introduced a new paradigm shift in GI cancer research[5,9]. Their innovation resides in exceptional generalizability and cross-domain adaptability, facilitated by transformer-based architectures comprising billions of parameters pre-trained on vast, diverse datasets[10]. This pre-training engenders universal representations transferable to a broad spectrum of downstream tasks, maintaining robust performance even with limited or unlabeled data. Compared to traditional methods, FMs offer distinct advantages: Billion-scale parameterization combined with self-supervised learning (SSL) enables deep feature extraction and fusion of heterogeneous data; zero- or few-shot transfer learning substantially diminishes reliance on annotated datasets[11]. This review retrospectively synthesizes key FMs applied in GI cancer research, focusing on three principal categories: Large language models (LLMs) for clinical decision support leveraging EHRs; vision models [e.g., Vision Transformer (ViT) architectures] for endoscopic image analysis; and multimodal fusion models integrating imaging, omics, and pathology data. It is noteworthy that this research field is rapidly evolving, with some models already operational and others exploratory yet exhibiting considerable translational potential.
This section provides a concise historical overview of AI development to contextualize the emergence of FMs for researchers less familiar with the field. The conceptual foundation of AI traces back to Alan Turing’s 1950 proposal of the "Turing Test", envisioning computational simulation of human intelligence[12]. The 1956 Dartmouth Conference marked a seminal milestone, formally introducing the term "artificial intelligence" and transitioning the field from theoretical inquiry to systematic investigation[13]. AI evolution encompasses three major phases: The nascent period (1950s-1970s), dominated by symbolic logic and expert systems. For example, the Perceptron model developed by Frank Rosenblatt in 1957 attempted to realize classification learning through neural networks but hit a bottleneck due to hardware limitations[14]; the revival period (1980s-2000s), characterized by statistical learning and big data, exemplified by IBM's Deep Blue (which defeated the world chess champion in 1997) and Watson (which won the Jeopardy! championship in 2011), verifying the power of data-driven approaches; and the deep learning era (2010s to the present), propelled by deep neural networks and large-scale computing, which set the stage for the emergence of FMs.
The concept of FMs was initially introduced by the Center for Research on Foundation Models (CRFM) at Stanford University in 2021[11]. CRFM characterizes FMs as models trained on extensive and diverse datasets, typically via large-scale SSL, that can be adapted to a variety of downstream tasks through fine-tuning. These models transcend the traditional paradigm of narrowly scoped, task-specific AI systems.
A principal distinction between FMs and conventional AI models lies in their methodological approach. Traditional models, such as SVMs and CNNs, are typically designed for narrowly defined tasks and require substantial labeled datasets for each specific application[15]. Consequently, these models exhibit limited generalizability and are not readily adaptable to novel tasks; for example, a model trained to detect gastric cancer pathology cannot be directly repurposed for colorectal cancer (CRC) lymph node identification[23]. In contrast, FMs employ a two-stage process involving self-supervised pre-training followed by downstream fine-tuning[11]. During pre-training, FMs learn from vast quantities of unlabeled data, such as medical images and textual corpora, through tasks like masked reconstruction. Subsequently, fine-tuning enables adaptation to new tasks with relatively small labeled datasets. This paradigm allows a single pre-trained model to be deployed across multiple scenarios. Architectures such as GPT utilize the Transformer framework and autoregressive language modeling, training on extensive internet text corpora to internalize language patterns without manual annotation[16]. This SSL strategy endows FMs with adaptability across diverse tasks, including medical question answering and clinical case summarization, requiring only modest fine-tuning. The capacity for one-time training followed by multi-task reuse underpins FMs’ ability to generalize across domains and modalities, encompassing text, images, and speech, thereby advancing from task-specific models toward more generalized intelligence[5].
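To make the two-stage workflow described above concrete, the following is a minimal, hedged sketch using the Hugging Face Transformers library: a BERT-style backbone is first exercised with its self-supervised masked-language-modeling objective and then fine-tuned on a small labeled classification task. The checkpoint name, example sentences, and two-class labels are illustrative assumptions, not any specific model or dataset discussed in this review.

```python
# Sketch of the pre-train / fine-tune paradigm: the same backbone serves
# a self-supervised objective and, later, a small supervised downstream task.
import torch
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          AutoModelForSequenceClassification)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Stage 1: self-supervised pre-training objective (masked language modeling).
# Real pre-training runs over billions of unlabeled tokens; one step is shown here.
mlm_model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
text = "Colorectal [MASK] screening guidelines recommend colonoscopy."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = mlm_model(**inputs).logits
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
predicted = tokenizer.decode(logits[0, mask_pos].argmax(-1).item())
print("MLM prediction for [MASK]:", predicted)

# Stage 2: fine-tuning the same backbone on a small labeled downstream task,
# e.g. flagging report sentences as cancer-related or not (toy labels).
clf_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
batch = tokenizer(["Adenocarcinoma identified in the antrum.",
                   "No abnormality detected."], padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])
loss = clf_model(**batch, labels=labels).loss
loss.backward()  # one gradient step; a real run iterates over a labeled dataset
```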
The foundational principles of FMs rest upon the integration of architectural design, algorithmic strategies, and technical paradigms, collectively facilitating their versatility and scalability[11]. Architecturally, FMs predominantly adopt the Transformer framework[16], wherein the self-attention mechanism dynamically assigns weights to different elements within a sequence, enabling context-sensitive processing. For example, the term “gastric” may activate distinct medical concepts depending on its contextual usage, such as in “gastric cancer” vs “gastric bezoar”. Algorithmically, FMs follow a pre-training and fine-tuning paradigm. Pre-training constructs a universal knowledge base from large-scale unlabeled data via SSL techniques; for instance, Masked Language Modeling tasks involve predicting obscured text segments (e.g., “Colorectal [MASK] screening guidelines”) to learn associations among medical concepts. Contrastive learning methods align multimodal features, such as correlating endoscopic images with corresponding pathological descriptions[11]. This data-driven approach diminishes dependence on annotated datasets and, when combined with extensive model parameters and massive training corpora, yields substantial performance gains. Fine-tuning adjusts model parameters on task-specific datasets, enabling rapid adaptation to downstream applications; for example, after fine-tuning on tumor classification, the model can accurately delineate cancerous regions in pathology images[11].
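The self-attention mechanism referenced above can be summarized in a few lines of code. The sketch below implements scaled dot-product attention on toy tensors; the dimensions and random inputs are illustrative assumptions intended only to show how context-dependent weights are computed.

```python
# Minimal sketch of scaled dot-product self-attention, the core operation of
# Transformer-based foundation models. Shapes and inputs are illustrative.
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_*: (d_model, d_k) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / (k.shape[-1] ** 0.5)   # pairwise token affinities
    weights = F.softmax(scores, dim=-1)       # context-dependent attention weights
    return weights @ v                        # weighted mixture of value vectors

# Toy example: 5 token embeddings of dimension 16, projected to dimension 8.
torch.manual_seed(0)
x = torch.randn(5, 16)
w_q, w_k, w_v = (torch.randn(16, 8) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # torch.Size([5, 8])
```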
FMs can be classified into three categories based on input modalities: LLMs, Vision FMs (VFMs), and Multimodal LLMs (MLLMs)[10]. LLMs are sophisticated neural networks comprising billions of parameters, surpassing traditional language models in performance, with model size generally correlating with efficacy. For example, BioBERT, pre-trained on PubMed abstracts and clinical notes, has enhanced the accuracy of drug-drug interaction predictions[24]. GPT-3 employs in-context learning to generate text completions, overcoming prior limitations[17]. Applications of LLMs include structuring clinical narratives (e.g., extracting gastric cancer TNM staging from medical records), synthesizing evidence from literature (e.g., summarizing clinical trial outcomes for PD-1 inhibitors), and facilitating doctor-patient communication.
To provide a contextual understanding of how FMs tackle challenges associated with GI cancers, Figure 1 presents their application framework. It delineates five primary data inputs and details the pre-training and subsequent fine-tuning processes applied to various FMs, including LLMs, VFMs, and MLLMs. The framework also highlights the spectrum of downstream tasks facilitated by these models, ranging from information extraction to multimodal decision support.
To offer a focused overview of FMs specifically designed for GI cancer research, we first present a summary of FMs with validated applications in GI cancer across language, vision, and multimodal domains in Table 1. Most importantly, this summary emphasizes their distinct use cases within GI cancer research, categorized as natural language processing (NLP), endoscopy (Endo), radiology (Radio), and pathology (PA). A critical annotation in the "GI cancer applications" column, denoted as "Directly", signifies that the model was employed for GI cancer-related tasks (e.g., NLP, Endo, Radio, PA, or MLLM) without requiring further modification or fine-tuning, thereby underscoring its intrinsic adaptability to clinical demands.
| Name | Type | Creator | Year | Architecture | Parameters | Modality | OSS | GI cancer applications |
| BERT | LLM | Google | 2018 | Encoder-only transformer | 110M (base), 340M (large) | Text | Yes | NLP, Radio, MLLM |
| GPT-3 | LLM | OpenAI | 2020 | Decoder-only transformer | 175B | Text | No | NLP |
| ViT | Vision | Google | 2020 | Encoder-only transformer | 86M (base), 307M (large), 632M (huge) | Image | Yes | Endo, Radio, PA, MLLM |
| DINOv1 | Vision | Meta | 2021 | Encoder-only transformer | 22M, 86M | Image | Yes | Endo, PA |
| CLIP | MM | OpenAI | 2021 | Encoder-encoder | 120-580M | Text, Image | Yes | Endo, Radio, MLLM, directly1 |
| GLM-130B | LLM | Tsinghua | 2022 | Encoder-decoder | 130B | Text | Yes | NLP |
| Stable Diffusion | MM | Stability AI | 2022 | Diffusion model | 1.45B | Text, Image | Yes | NLP, Endo, MLLM, directly |
| BLIP | MM | Salesforce | 2022 | Encoder-decoder | 120M (base), 340M (large) | Text, Image | Yes | Radio, MLLM, directly |
| YouChat | LLM | You.com | 2022 | Fine-tuned LLMs | Unknown | Text | No | NLP |
| Bard | MM | Google | 2023 | Based on PaLM 2 | 340B estimated | Text, Image, Audio, Code | No | NLP |
| Bing Chat | MM | Microsoft | 2023 | Fine-tuned GPT-4 | Unknown | Text, Image | No | NLP |
| Mixtral 8x7B | LLM | Mistral AI | 2023 | Decoder-only, Mixture-of-Experts (MoE) | 46.7B total (12.9B active per token) | Text | Yes | NLP |
| LLaVA | MM | Microsoft | 2023 | Vision encoder, LLM | 7B, 13B | Text, Image | Yes | PA, MLLM |
| DINOv2 | Vision | Meta | 2023 | Encoder-only transformer | 86M to 1.1B | Image | Yes | Endo, Radio, PA, MLLM, directly |
| Claude 2 | LLM | Anthropic | 2023 | Decoder-only transformer | Unknown | Text | No | NLP |
| GPT-4 | MM | OpenAI | 2023 | Decoder-only transformer | 1.8T (Estimated) | Text, Image | No | NLP, Endo, MLLM, directly |
| LLaMa 2 | LLM | Meta | 2023 | Decoder-only transformer | 7B, 13B, 34B, 70B | Text | Yes | NLP, Endo, MLLM, directly |
| SAM | Vision | Meta | 2023 | Encoder-decoder | 375M, 1.25G, 2.56G | Image | Yes | Endo, directly |
| GPT-4V | MM | OpenAI | 2023 | MM transformer | 1.8T | Text, Image | No | Endo, MLLM |
| Qwen | LLM | Alibaba | 2023 | Decoder-only transformer | 70B, 180B, 720B | Text | Yes | NLP, MLLM |
| GPT-4o | MM | OpenAI | 2024 | MM transformer | Unknown (Larger than GPT-4) | Text, Image, Video | No | NLP |
| LLaMa 3 | LLM | Meta | 2024 | Decoder-only transformer | 8B, 70B, 400B | Text | Yes | NLP, directly |
| Gemini 1.5 | MM | Google | 2024 | MM transformer | 1.6T | Text, Image, Video, Audio | No | NLP, Radio, directly |
| Claude 3.7 | MM | Anthropic | 2024 | Decoder-only transformer | Unknown | Text, Image | No | NLP, directly |
| YOLOWorld | Vision | IDEA | 2024 | CNN + RepVL-PAN vision-language fusion | 13-110M (depending on scale) | Text, Image | Yes | Endo, directly |
| DeepSeek | LLM | DeepSeek | 2025 | Decoder-only transformer | 671B | Text | Yes | NLP |
| Phi-4 | LLM | Microsoft | 2025 | Decoder-only transformer | 14B (plus), 7B (mini) | Text | Yes | Endo |
The evolution of GI-related FMs reveals a discernible trajectory of enhanced capabilities and improved alignment with clinical requirements. The introduction of Transformer-based architectures by models such as BERT in 2018 laid the foundational groundwork for contemporary FMs, facilitating subsequent advancements in their medical domain adaptation. Between 2020 and 2021, language-centric FMs, including GPT-3 and GLM-130B, experienced substantial scaling, encompassing tens to hundreds of billions of parameters. This expansion augmented their proficiency in managing unstructured GI cancer data, enabling tasks such as the extraction of phenotypic characteristics and treatment information from EHRs and scientific literature. Concurrently, vision-oriented FMs, exemplified by ViT and DINO, adapted Transformer architectures for image-based applications, addressing pivotal challenges in GI cancer diagnosis. Leveraging transfer learning, these models demonstrated high accuracy in detecting early gastric and colorectal lesions within pathology slides and endoscopic video data.
Post-2021 developments witnessed a shift towards multimodal FMs, which further enhanced clinical utility. Models such as CLIP, BLIP, and Stable Diffusion integrated textual and visual encoding capabilities, facilitating end-to-end workflows including lesion localization in radiological imaging and cross-validation of pathology reports with endoscopic findings.
The analysis of natural language, characterized by its inherent unstructured nature, has historically posed significant challenges for computational processing through rule-based or traditional algorithmic approaches, particularly within the domain of medical texts that contain specialized terminology and complex syntactic structures[26]. However, since the early 2020s, the rapid advancement of LLMs has transformed the field of NLP, establishing these models as the predominant paradigm for medical text analysis.
LLMs possess the capability to generate novel, contextually relevant text rather than merely reproducing or summarizing existing information[17]. The widespread adoption and standardization of LLMs have significantly democratized NLP, enabling researchers without extensive technical expertise to employ models such as GPT and BERT for practical applications[10]. These models can store and retrieve extensive knowledge bases and extract structured information from medical documents, including radiology and pathology reports, and can even offer medical recommendations.
Table 2 and Supplementary Table 1 provide a comprehensive overview of 69 representative studies on NLP and LLM applications in GI cancers conducted between 2011 and 2025. These studies encompass traditional NLP methodologies based on rule sets, lexicons, and statistical learning (Supplementary Table 1), alongside the rapidly emerging Transformer-based LLM approaches (Table 2).
| Ref. | Year | Models | Objectives | Datasets | Performance | Evaluation |
| Syed et al[29] | 2022 | BERTi | Developed fine-tuned BERTi for integrated colonoscopy reports | 34165 reports | F1-scores of 91.76%, 92.25%, 88.55% for colonoscopy, pathology, and radiology | Manual chart review by 4 expert-guided reviewers |
| Lahat et al[30] | 2023 | GPT | Assessed GPT performance in addressing 110 real-world gastrointestinal inquiries | 110 real-life questions | Moderate accuracy (3.4-3.9/5) for treatment and diagnostic queries | Assessed by three gastroenterologists using a 1-5 scale for accuracy etc. |
| Lee et al[31] | 2023 | GPT-3.5 | Examined GPT-3.5’s responses to eight frequently asked colonoscopy questions | 8 colonoscopy-related questions | GPT answers had extremely low text similarity (0%-16%) | Four gastroenterologists rated the answers on a 7-point Likert scale |
| Emile et al[32] | 2023 | GPT-3.5 | Analyzed GPT-3.5’s ability to generate appropriate responses to CRC questions | 38 CRC questions | 86.8% deemed appropriate, with 95% concordance with 2022 ASCRS guidelines | Three surgery experts assessed answers using ASCRS guidelines |
| Moazzam et al[33] | 2023 | GPT | Investigated the quality of GPT’s responses to pancreatic cancer-related questions | 30 pancreatic cancer questions | 80% of responses were “very good” or “excellent” | Responses were graded by 20 experts against a clinical benchmark |
| Yeo et al[34] | 2023 | GPT | Assessed GPT’s performance in answering questions regarding cirrhosis and HCC | 164 questions about cirrhosis and HCC | 79.1% correctness for cirrhosis and 74% for HCC, but only 47.3% comprehensiveness | Responses were reviewed by two hepatologists and resolved by a 3rd reviewer |
| Cao et al[35] | 2023 | GPT-3.5 | Examined GPT-3.5’s capacity to answer on liver cancer screening and diagnosis | 20 questions | 48% answers were accurate, with frequent errors in LI-RADS categories | Six fellowship-trained physicians from three centers assessed answers |
| Gorelik et al[36] | 2024 | GPT-4 | Evaluated GPT-4’s ability to provide guideline-aligned recommendations | 275 colonoscopy reports | Aligned with experts in 87% of scenarios, showing no significant accuracy gap | Advice assessed by consensus review with multiple experts |
| Gorelik et al[37] | 2023 | GPT-4 | Analyzed GPT-4’s effectiveness in post-colonoscopy management guidance | 20 clinical scenarios | 90% followed guidelines, with 85% correctness and strong agreement (κ = 0.84) | Assessed by two senior gastroenterologists for guideline compliance |
| Zhou et al[38] | 2023 | GPT-3.5 and GPT-4 | Developed a gastric cancer consultation system and automated report generator | 23 medical knowledge questions | 91.3% appropriate gastric cancer advice (GPT-4), 73.9% for GPT-3.5 | The evaluation was conducted by reviewers with medical standards |
| Yang et al[39] | 2025 | RECOVER (LLM) | Designed a LLM-based remote patient monitoring system for postoperative care | 7 design sessions, 5 interviews | Six major design strategies for integrating clinical guidelines and information | Clinical staff reviewed and provided feedback on the design and functionality |
| Kerbage et al[40] | 2024 | GPT-4 | Evaluated GPT-4’s accuracy in responding to IBS, IBD, and CRC screening | 65 questions (45 patients, 20 doctors) | 84% of answers were accurate | Assessed independently by three senior gastroenterologists |
| Tariq et al[41] | 2024 | GPT-3.5, GPT-4, and Bard | Compared the efficacy of GPT-3.5, GPT-4, and Bard (July 2023 version) in answering 47 common colonoscopy patient queries | 47 queries | GPT-4 outperformed GPT-3.5 and Bard, with 91.4% fully accurate responses vs 6.4% and 14.9%, respectively | Responses were scored by two specialists on a 0-2 point scale and resolved by a 3rd reviewer |
| Maida et al[42] | 2025 | GPT-4 | Evaluated GPT-4’s suitability in addressing screening, diagnostic, therapeutic inquiries | 15 CRC screening inquiries | 4.8/6 for CRC screening accuracy, 2.1/3 for completeness scored | Assessment involved 20 experts and 20 non-experts rating the answers |
| Atarere et al[43] | 2024 | BingChat, GPT, YouChat | Tested the appropriateness of GPT, BingChat, and YouChat in patient education and patient-physician communication | 20 questions (15 on CRC screening and 5 patient-related) | GPT and YouChat provided more reliable answers than BingChat, but all models had occasional inaccuracies | Two board-certified physicians and one Gastroenterologist graded the responses |
| Chang et al[44] | 2024 | GPT-4 | Compared GPT-4’s accuracy, reliability, and alignment of colonoscopy recommendations | 505 colonoscopy reports | 85.7% of cases matched USMSTF guidelines | Assessment was conducted by an expert panel under USMSTF guidelines |
| Lim et al[45] | 2024 | GPT-4 | Compared a contextualized GPT model with standard GPT in colonoscopy screening | 62 example use cases | Contextualized GPT-4 outperformed standard GPT-4 | Compared standard GPT-4 against a model supplied with relevant screening guidelines |
| Munir et al[46] | 2024 | GPT | Evaluated the quality and utility of responses for three GI surgeries | 24 research questions | Modest quality, varying significantly by type of procedure | Responses were graded by 45 expert surgeons |
| Truhn et al[47] | 2024 | GPT-4 | Created a structured data parsing module with GPT-4 for clinical text processing | 100 CRC reports | 99% accuracy for T-stage extraction, 96% for N-stage, and 94% for M-stage | Accuracy of GPT-4 was compared with manually extracted data by experts |
| Choo et al[48] | 2024 | GPT | Designed a clinical decision-support system to generate personalized management plans | 30 stage III recurrent CRC patients | 86.7% agreement with tumor board decisions, 100% for second-line therapies | The recommendations were compared with the decision plans made by the MDT |
| Huo et al[49] | 2024 | GPT, BingChat, Bard, Claude 2 | Established a multi-AI platform framework to optimize CRC screening recommendations | Responses for 3 patient cases | GPT aligned with guidelines in 66.7% of cases, while other AIs showed greater divergence | Clinician and patient advice was compared to guidelines |
| Pereyra et al[50] | 2024 | GPT-3.5 | Optimized GPT-3.5 for personalized CRC screening recommendations | 238 physicians | GPT scored 4.57/10 for CRC screening, vs 7.72/10 for physicians | Answers were compared against a group of surgeons |
| Peng et al[51] | 2024 | GPT-3.5 | Built a GPT-3.5-powered system for answering CRC-related queries | 131 CRC questions | 63.01 mean accuracy, but low comprehensiveness scores (0.73-0.83) | Two physicians reviewed each response, with a third consulted for discrepancies |
| Ma et al[52] | 2024 | GPT-3.5 | Established GPT-3.5-based quality control for post-esophageal ESD procedures | 165 esophageal ESD cases | 92.5%-100% accuracy across post-esophageal ESD quality metrics | Two QC members and a senior supervisor conducted assessment |
| Cohen et al[53] | 2025 | LLaMA-2, Mistral-v0.1 | Explored the ability of LLMs to extract PD-L1 biomarker details for research purposes | 232 EHRs from 10 cancer types | Fine-tuned LLMs outperformed LSTM trained on > 10000 examples | Assessed by 3 clinical experts against manually curated answers |
| Scherbakov et al[54] | 2025 | Mixtral 8 × 7 B | Assessed LLM to extract stressful events from social history of clinical notes | 109556 patients, 375334 notes | Arrest or incarceration (OR = 0.26, 95%CI: 0.06-0.77) | One human reviewer assessed the precision and recall of extracted events |
| Chatziisaak et al[55] | 2025 | GPT-4 | Evaluated the concordance of therapeutic recommendations generated by GPT | 100 consecutive CRC patients | 72.5% complete concordance, 10.2% partial concordance, and 17.3% discordance | Three reviewers independently assessed concordance with MDT |
| Saraiva et al[56] | 2025 | GPT-4 | Assessed GPT-4’s performance in interpreting images in gastroenterology | 740 images | Capsule endoscopy: Accuracies 50.0%-90.0% (AUCs 0.50-0.90) | Three experts reviewed and labeled images for CE |
| Siu et al[57] | 2025 | GPT-4 | Evaluated the efficacy, quality, and readability of GPT-4’s responses | 8 patient-style questions | Accurate (4.00), safe (4.25), appropriate (4.00), actionable (4.00), effective (4.00) | Evaluated by 8 colorectal surgeons |
| Horesh et al[58] | 2025 | GPT-3.5 | Evaluated management recommendations of GPT in clinical settings | 15 colorectal or anal cancer patients | Rating 4.8 for GPT recommendations, 4.11 for decision justification | Evaluated by 3 experienced colorectal surgeons |
| Ellison et al[59] | 2025 | GPT-3.5, Perplexity | Compared readability using different prompts | 52 colorectal surgery materials | Average 7.0-9.8, Ease 53.1-65.0, Modified 9.6-11.5 | Compared mean scores between baseline and documents generated by AI |
| Ramchandani et al[60] | 2025 | GPT-4 | Validated the use of GPT-4 for identifying articles discussing perioperative and preoperative risk factors for esophagectomy | 1967 studies for title and abstract screening | Perioperative: Agreement rate = 85.58%, AUC = 0.87. Preoperative: Agreement rate = 78.75%, AUC = 0.75 | Decisions were compared with those of three independent human reviewers |
| Zhang et al[61] | 2025 | GPT-4, DeepSeek, GLM-4, Qwen, LLaMa3 | To evaluate the consistency of LLMs in generating diagnostic records for hepatobiliary cases using the HepatoAudit dataset | 684 medical records covering 20 hepatobiliary diseases | Precision: GPT-4 reached a maximum of 93.42%. Recall: Generally below 70%, with some diseases below 40% | Professional physicians manually verified and corrected all the data |
| Spitzl et al[62] | 2025 | Claude-3.5, GPT-4o, DeepSeekV3, Gemini 2 | Assessed the capability of state-of-the-art LLMs to classify liver lesions based solely on textual descriptions from MRI reports | 88 fictitious MRI reports designed to resemble real clinical documentation | Micro F1-score and macro F1-score: Claude 3.5 Sonnet 0.91 and 0.78, GPT-4o 0.76 and 0.63, DeepSeekV3 0.84 and 0.70, Gemini 2.0 Flash 0.69 and 0.55 | Model performance was assessed using micro and macro F1-scores benchmarked against ground truth labels |
| Sheng et al[63] | 2025 | GPT-4o and Gemini | Investigated the diagnostic accuracies for focal liver lesions | 228 adult patients with CT/MRI reports | Two-step GPT-4o, single-step GPT-4o and single-step Gemini (78.9%, 68.0%, 73.2%) | Six radiologists reviewed the images and clinical information in two rounds (alone, with LLM) |
| Williams et al[64] | 2025 | GPT-4-32K | Determined LLM extract reasons for a lack of follow-up colonoscopy | 846 patients' clinical notes | Overall accuracy: 89.3%, reasons: Refused/not interested (35.2%) | A physician reviewer checked 10% of LLM-generated labels |
| Lu et al[65] | 2025 | MoE-HRS | Used a novel MoE combined with LLMs for risk prediction and personalized healthcare recommendations | SNPs, medical and lifestyle data from United Kingdom Biobank | MoE-HRS outperformed state-of-the-art cancer risk prediction models in terms of ROC-AUC, precision, recall, and F1 score | LLMs-generated advice were validated by clinical medical staff |
| Yang et al[66] | 2025 | GPT-4 | Explored the use of LLMs to enhance doctor-patient communication | 698 pathology reports of tumors | Average communication time decreased by over 70%, from 35 to 10 min (P < 0.001) | Pathologists evaluated the consistency between original and AI reports |
| Jain et al[67] | 2025 | GPT-4, GPT-3.5, Gemini | Studied the performance of LLMs across 20 clinicopathologic scenarios in gastrointestinal pathology | 20 clinicopathologic scenarios in GI | Diagnostic accuracy: Gemini Advanced (95%, P = 0.01), GPT-4 (90%, P = 0.05), GPT-3.5 (65%) | Two fellowship-trained pathologists independently assessed the responses of the models |
| Xu et al[68] | 2025 | GPT-4, GPT-4o, Gemini | Assessed the performance of LLMs in predicting immunotherapy response in unresectable HCC | Multimodal data from 186 patients | Accuracy and sensitivity: GPT-4o (65% and 47%) Gemini-GPT (68% and 58%). Physicians (72% and 70%) | Six physicians (three radiologists and three oncologists) independently assessed the same dataset |
| Deroy et al[69] | 2025 | GPT-3.5 Turbo | Explored the potential of LLMs as a question-answering (QA) tool | 30 training and 50 testing queries | A1: 0.546 (maximum value); A2: 0.881 (maximum value across three runs) | Model-generated answers were compared to the gold standard |
| Ye et al[70] | 2025 | BioBERT-based | Proposed a novel framework that incorporates clinical features to enhance multi-omics clustering for cancer subtyping | Six cancer datasets across three omics levels | Mean survival score of 2.20, significantly higher than other methods | Three independent clinical experts review and validate the clustering results |
The volume and temporal distribution of studies reveal distinct trends between traditional NLP and modern LLM research. Over a 14-year span (2011-2025), only 25 studies focused on traditional NLP approaches, whereas LLM-related publications surged from zero to 42 within five years following 2020, indicating rapid expansion. Since 2023, more than ten new investigations annually have employed frameworks such as LLaMA-2 and Gemini, establishing LLMs as the most dynamic area in intelligent text processing for GI cancers.
As detailed in Table 2[29-70], LLMs have been extensively applied to address a variety of GI cancer-related challenges. For example, GPT series models have been utilized to respond to diverse clinical inquiries, including colon cancer screening, pancreatic cancer treatment, and the diagnosis of cirrhosis and liver cancer. These applications underscore the robust language comprehension and generation capabilities of LLMs, enabling them to manage medical knowledge across multiple domains and provide preliminary informational support for clinicians and patients. For example, in 2023, Emile et al[32] found that GPT-3.5 could generate appropriate responses for 86.8% of 38 CRC questions, with 95% concordance with the 2022 ASCRS guidelines.
Several studies have focused on leveraging LLMs to develop personalized medical systems. Choo et al[48] designed a clinical decision support system that used GPT to generate personalized management plans for stage III recurrent CRC patients. The plans showed 86.7% agreement with the decisions of the tumor board, and 100% agreement for second-line therapies. This indicates that LLMs have the potential to provide customized medical solutions based on patients' specific conditions. LLMs have also been applied to automated report generation and data processing. In 2023, Zhou et al[38] developed a gastric cancer consultation system and an automated report generator based on GPT-3.5 and GPT-4. GPT-4 provided appropriate gastric cancer advice in 91.3% of cases. Moreover, in 2024, Truhn et al[47] used GPT-4 to create a structured data parsing module for clinical text processing, achieving 99%, 96%, and 94% accuracy in extracting T-stage, N-stage, and M-stage respectively, which greatly improved the efficiency of data processing. To further facilitate the application of LLMs, some researchers have dedicated efforts to model comparison and optimization. In 2024, Tariq et al[41] compared the performance of GPT-3.5, GPT-4, and Bard in answering 47 common colonoscopy patient queries. They found that GPT-4 outperformed the others, with 91.4% fully accurate responses. This helps researchers understand the performance of different models and select more suitable ones for optimization and application.
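In the spirit of the structured-parsing approach described above, the following is a hedged sketch of prompting a general-purpose LLM to return TNM staging as JSON via the OpenAI Python client. The model name, prompt wording, and report text are illustrative assumptions, not the published pipeline of Truhn et al, and any output would still require clinician verification.

```python
# Hedged sketch: prompting an LLM to extract TNM staging from a pathology report.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

report = ("Moderately differentiated gastric adenocarcinoma invading the "
          "muscularis propria; 2 of 15 regional lymph nodes positive; "
          "no distant metastasis identified.")

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {"role": "system",
         "content": "Extract TNM staging from the pathology report. "
                    "Reply only with JSON: {\"T\": ..., \"N\": ..., \"M\": ...}."},
        {"role": "user", "content": report},
    ],
    temperature=0,
)
staging = json.loads(response.choices[0].message.content)
print(staging)  # e.g. {"T": "T2", "N": "N1", "M": "M0"}; must be verified by a clinician
```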
Despite these advancements, several challenges hinder the clinical translation of LLMs in GI cancer. First, while LLMs often exhibit remarkable accuracy and promising applications, these models are not specifically designed for medical contexts. Several studies have further revealed inconsistencies or uncertainties in their reported outcomes. Pereyra et al[50] found GPT-3.5 scored just 4.57/10 for CRC screening recommendations, far below physicians’ 7.72/10, while Tariq et al[41] revealed stark model disparities: GPT-4 delivered 91.4% fully accurate colonoscopy query responses, but GPT-3.5 and Bard only achieved 6.4% and 14.9%, respectively. Even for common tasks, Cao et al[35] noted GPT-3.5 had only 48% accuracy in liver cancer screening (with frequent category errors), demonstrating that generalization issues extend beyond rare cases.
Second, data privacy and compliance risks persist. For example, most widely adopted LLMs (e.g., Claude-3.5) are trained on heterogeneous non-medical datasets, lacking inherent safeguards for sensitive GI cancer data. This creates significant HIPAA/GDPR compliance concerns, raising questions about how patient data is protected during model deployment.
Third, interpretability gaps undermine clinical trust. Although GPT-4 shows strong guideline alignment (87% agreement with experts in Gorelik et al[36]), its black-box nature means clinicians cannot trace the reasoning behind its outputs, a critical flaw in high-stakes scenarios. This is exemplified by Yeo et al[34], where GPT achieved 74% correctness for HCC-related queries but only 47.3% comprehensiveness. Clinicians could not verify why incomplete information was generated, limiting reliance on such tools.
Together, these challenges (inconsistent performance, privacy risks, and opaque reasoning) create barriers to integrating LLMs into routine GI cancer care, as the models fail to meet the rigor and reliability required for clinical decision-making. To address these challenges and accelerate the clinical integration of LLMs, future research should prioritize directions suggested by the findings in Table 2, such as fine-tuning LLMs on GI-specific datasets, integrating rule-based checks to verify outputs (complementing the traditional NLP methods in Supplementary Table 1), and deploying open-source models locally for privacy-sensitive data handling.
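As a minimal illustration of the rule-based output checks suggested above, the sketch below validates LLM-extracted TNM values against simple regular-expression vocabularies. The allowed patterns are simplified assumptions, not a complete AJCC staging list.

```python
# Minimal sketch of a rule-based sanity check on LLM-extracted TNM values.
import re

ALLOWED = {
    "T": re.compile(r"^T(is|[0-4][ab]?|x)$", re.IGNORECASE),
    "N": re.compile(r"^N([0-3][ab]?|x)$", re.IGNORECASE),
    "M": re.compile(r"^M([01]|x)$", re.IGNORECASE),
}

def validate_tnm(staging: dict) -> list[str]:
    """Return a list of problems; an empty list means the extraction passed the checks."""
    problems = []
    for key, pattern in ALLOWED.items():
        value = staging.get(key)
        if value is None:
            problems.append(f"missing {key} stage")
        elif not pattern.match(str(value)):
            problems.append(f"implausible {key} stage: {value!r}")
    return problems

print(validate_tnm({"T": "T2", "N": "N1", "M": "M0"}))  # []
print(validate_tnm({"T": "T7", "N": "N1"}))             # flags T7 and the missing M stage
```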
Since the early 2020s, VFMs have revolutionized biomedical image analysis[25,71]. These models acquire universal visual representations from extensive collections of unlabeled medical images and can be adapted to specialized tasks, such as GI cancer detection, through fine-tuning on relatively small labeled datasets[72]. For example, in CRC screening, FMs have demonstrated substantial improvements in polyp detection accuracy following fine-tuning. Moreover, VFMs are increasingly employed in cross-modal applications[73]. They integrate different modalities of data to achieve a more comprehensive understanding of disease pathology. This integration necessitates the processing of diverse datasets and significant computational resources. However, the emergence of open-source VFMs, including MedSAM and other biomedical models, has lowered the barriers to such applications.
VFMs in endoscopy: Endoscopy constitutes a critical modality for the diagnosis and management of GI cancers, generating vast quantities of images that capture essential information ranging from early lesions to advanced tumor stages. Traditionally, the interpretation of these images has relied heavily on the expertise of experienced endoscopists, a process that is both time-intensive and susceptible to human error, especially given the increasing volume of examinations[75]. VFMs offer a novel solution by enabling direct analysis of endoscopic video streams, facilitating the automatic localization and classification of lesions such as polyps and ulcers.
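To illustrate how a pre-trained vision backbone can be adapted to the lesion-classification tasks described above, the following is a hedged sketch of supervised fine-tuning with the timm library. The checkpoint, folder layout, and two-class polyp-vs-normal setup are illustrative assumptions rather than any specific published pipeline.

```python
# Hedged sketch: adapting a pre-trained ViT to polyp classification on endoscopy frames.
import timm
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Pre-trained ViT backbone with a fresh 2-class head (polyp vs normal mucosa).
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=2)

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),
])
# Expects an ImageFolder layout such as endoscopy_frames/train/{polyp,normal}/ (placeholder path).
train_set = datasets.ImageFolder("endoscopy_frames/train", transform=preprocess)
loader = DataLoader(train_set, batch_size=16, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one epoch of supervised fine-tuning
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```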
Table 3 summarizes 19 recent studies (2023-2025), all of which intentionally adapt VFMs for endoscopy applications. Due to space constraints, more detailed information about these models, such as Country, Dataset sizes, Evaluation metrics, Fine-tuning strategies, Performance benchmarks, and GPUs, is presented in Supplementary Table 2. In contrast, Supplementary Table 3 focuses on VFMs benchmarked in endoscopy. It includes models that are not specifically trained or fine-tuned for endoscopy, but some models in Table 3[76-94] use these for benchmarking. This table holds significance as it provides reference results from general or medical-general VFMs. It highlights the transferability of VFMs’ visual feature extraction capabilities and enriches the overall analysis of VFMs in the context of endoscopy.
| Model | Year | Architecture | Training algorithm | Parameters | Datasets | Disease studied | Model type | Source code link |
| Surgical-DINO[76] | 2023 | DINOv2 | LoRA layers added to DINOv2, optimizing the LoRA layers | 86.72M | SCARED, Hamlyn | Endoscopic Surgery | Vision | https://github.com/BeileiCui/SurgicalDINO |
| ProMISe[77] | 2023 | SAM (ViT-B) | APM and IPS modules are trained while keeping SAM frozen | 1.3-45.6M | EndoScene, ColonDB etc. | Polyps, Skin Cancer | Vision | NA |
| Polyp-SAM[78] | 2023 | SAM | Strategy as pretrain only the mask decoder while freezing all encoders | NA | CVC-ColonDB Kvasir etc. | Colon Polyps | Vision | https://github.com/ricklisz/Polyp-SAM |
| Endo-FM[79] | 2023 | ViT B/16 | Pretrained using a self-supervised teacher-student framework, and fine-tuned on downstream tasks | 121M | Colonoscopic, LDPolyp etc. | Polyps, erosion, etc. | Vision | https://github.com/med-air/Endo-FM |
| ColonGPT[80] | 2024 | SigLIP-SO, Phi1.5 | Pre-alignment with image-caption pairs, followed by supervised fine-tuning using LoRA | 0.4-1.3B | ColonINST (30k+ images) | Colorectal polyps | Vision | https://github.com/ColonGPT/ColonGPT |
| DeepCPD[81] | 2024 | ViT | Hyperparameters are optimized for colonoscopy datasets, including Adam optimizer | NA | PolypsSet, CP-CHILD-A etc. | CRC | Vision | https://github.com/Zhang-CV/DeepCPD |
| OneSLAM[82] | 2024 | Transformer (CoTracker) | Zero-shot adaptation using TAP + Local Bundle Adjustment | NA | SAGE-SLAM, C3VD etc. | Laparoscopy, Colon | Vision | https://github.com/arcadelab/OneSLAM |
| EIVS[83] | 2024 | Vision Mamba, CLIP | Unsupervised Cycle‑Consistency | 63.41M | 613 WLE, 637 images | Gastrointestinal | Vision | NA |
| APT[84] | 2024 | SAM | Parameter-efficient fine-tuning | NA | Kvasir-SEG, EndoTect etc. | CRC | Vision | NA |
| FCSAM[85] | 2024 | SAM | LayerNorm LoRA fine-tuning strategy | 1.2M | Gastric cancer (630 pairs) etc. | GC, Colon Polyps | Vision | NA |
| DuaPSNet[86] | 2024 | PVTv2-B3 | Transfer learning with pre-trained PVTv2-B3 on ImageNet | NA | LaribPolypDB, ColonDB etc. | CRC | Vision | https://github.com/Zachary-Hwang/Dua-PSNet |
| EndoDINO[87] | 2025 | ViT (B, L, g) | DINOv2 methodology, hyperparameters tuning | 86M to 1B | HyperKvasir, LIMUC | GI Endoscopy | Vision | https://github.com/ZHANGBowen0208/EndoDINO/ |
| PolypSegTrack[88] | 2025 | DINOv2 | One-step fine-tuning on colonoscopic videos without first pre-training | NA | ETIS, CVC-ColonDB etc. | Colon polyps | Vision | NA |
| AiLES[89] | 2025 | RF-Net | Not fine-tuned from external model | NA | 100 GC patients | Gastric cancer | Vision | https://github.com/CalvinSMU/AiLES |
| PPSAM[90] | 2025 | SAM | Fine-tuning with variable bounding box prompt perturbations | NA | EndoScene, ColonDB etc. | Investigated in Ref. | Vision | https://github.com/SLDGroup/PP-SAM |
| SPHINX-Co[91] | 2024 | LLaMA-2 + SPHINX-X | Fine-tuned SPHINX-X on CoPESD with cosine learning rate scheduler | 7B, 13B | CoPESD | Gastric cancer | Multimodal | https://github.com/gkw0010/CoPESD |
| LLaVA-Co[91] | 2024 | LLaVA-1.5 (CLIP-ViT-L) | Fine-tuned LLaVA-1.5 on CoPESD with cosine learning rate scheduler | 7B, 13B | CoPESD | Gastric cancer | Multimodal | https://github.com/gkw0010/CoPESD |
| ColonCLIP[92] | 2025 | CLIP | Prompt tuning with frozen CLIP, then encoder fine-tuning with frozen prompts | 57M, 86M | OpenColonDB | CRC | Multimodal | https://github.com/Zoe-TAN/ColonCLIP-OpenColonDB |
| PSDM[93] | 2025 | Stable Diffusion + CLIP | Continual learning with prompt replay to incrementally train on multiple datasets | NA | PolypGen, ColonDB, Polyplus etc. | CRC | Vision, Generative | The original paper reported a GitHub link for this model, but it is currently unavailable |
| PathoPolypDiff[94] | 2025 | Stable Diffusion v1-4 | Fine-tuned Stable Diffusion v1-4 and locked first U-Net block, fine-tuned remaining blocks | NA | ISIT-UMR Colonoscopy Dataset | CRC | Generative | https://github.com/Vanshali/PathoPolyp-Diff |
VFMs demonstrate notable strengths in GI cancer endoscopy through multiple advanced approaches. Parameter-efficient variants such as Surgical-DINO (LoRA, 0.3% trainable) and APT/FCSAM (adapter-based, < 1%) achieve competitive results, while fully fine-tuned Endo-FM reaches 73.9 Dice on CVC-12k[76,79]. With respect to multimodal reasoning, LLaVA-Co achieves GPT scores of 85.6/100 and mIoU 60.2% on ESD benchmarks[91]. Regarding unified architectures across tasks, SAM-derived pipelines (e.g., ProMISe[77], Polyp-SAM[78], APT[84], FCSAM[85], PP-SAM[90]) have so far been individually evaluated for either segmentation or detection metrics. This suggests a single foundation backbone could replace the current patchwork of bespoke CNNs. For generative augmentation, PSDM[93] and PathoPolyp-Diff[94] synthesize realistic polyp images to supplement scarce annotated training data.
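The parameter-efficient LoRA/adapter strategies noted above (e.g., Surgical-DINO, APT, FCSAM) can be approximated with the Hugging Face peft library, as in the hedged sketch below. It assumes a recent peft release that supports custom (non-transformers) backbones; the timm checkpoint and the "qkv" target-module name are illustrative assumptions, not the published configurations of those models.

```python
# Hedged sketch of LoRA-style parameter-efficient fine-tuning of a ViT backbone.
import timm
from peft import LoraConfig, get_peft_model

backbone = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=2)

lora_config = LoraConfig(
    r=8,                       # low-rank update dimension
    lora_alpha=16,
    target_modules=["qkv"],    # attention projection layers in timm ViT blocks
    lora_dropout=0.1,
    modules_to_save=["head"],  # keep the new classification head trainable
)
model = get_peft_model(backbone, lora_config)
model.print_trainable_parameters()  # typically only ~1% of parameters remain trainable
# Training then proceeds as in the full fine-tuning sketch above, but gradients
# flow only through the LoRA adapters and the classification head.
```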
Supplementary Table 3 has unique value in the context of VFMs for GI endoscopy: It includes models that are not optimized specifically for endoscopy but still prove useful in benchmarking. For example, models like TimeSformer and ST-Adapter, despite lacking endoscopy-specific refinement, provide useful reference points in the benchmarking of Endo-FM[79]. Meanwhile, general-purpose models such as SAM, Gemini-1.5, and Stable Diffusion are also tested in the benchmarking of other models, such as PPSAM[90], ColonCLIP[92], and PathoPolyp-Diff[94], respectively, showing their potential to support performance evaluation in this specialized field. These results confirm the general-purpose vision-language capabilities of models like CLIP and Gemini-1.5 (Supplementary Table 3), even when the base model has never been exposed to endoscopy data.
Collectively, these findings show that VFMs, whether applied directly or through secondary development, play a pivotal role in GI cancer endoscopy tasks including polyp recognition and early lesion monitoring. They contribute to enhanced diagnostic efficiency and accuracy. Furthermore, the reviewed studies highlight the complementary strengths of diverse models in specific tasks, thereby laying the groundwork for future multi-model fusion systems aimed at intelligent endoscopic diagnosis.
VFMs in radiology: VFMs have become increasingly significant in radiology, particularly for GI cancer diagnosis, complementing traditional endoscopic approaches. Radiological modalities such as CT, MRI, and positron emission tomography play essential roles in initial cancer staging, metastasis detection, treatment monitoring, and postoperative recurrence identification[95]. Traditional radiology methods involve manually marking regions of interest and extracting features, which is reliable but time-consuming and constrained by limited data[96]. In contrast, VFMs using Transformer-based architectures enable automated processing of entire images, capturing intricate details of tumors and adjacent tissues. This reduces the need for manual annotation. The recent availability of large-scale, open-source VFMs pre-trained on millions of radiographs has facilitated fine-tuning on relatively small datasets, such as several dozen enhanced CT scans for gastric or CRC, using modest computing resources[97].
To summarize the application and development of VFMs in radiology for GI cancer, three key tables are presented in this section. Table 4 encapsulates 10 representative VFM studies, covering essential information such as model architecture, training algorithm, and applied datasets. Supplementary Table 4 extends the content of Table 4 by providing more methodological details for the same 10 models, including specific dataset sizes, evaluation metrics, fine-tuning strategies, and performance benchmarks. Meanwhile, Supplementary Table 5 offers a set of models that were not specifically trained or fine-tuned for radiology tasks but were adopted as benchmarks by several models in Table 4[97-105], thereby providing a comparative context to assess the relative performance of VFMs tailored for radiology.
| Model | Year | Architecture | Training algorithm | Parameters | Datasets | Disease studied | Model type | Source code link |
| PubMedCLIP[98] | 2021 | CLIP | Fine-tuned on ROCO dataset for 50 epochs with Adam optimizer | NA | ROCO, VQA-RAD, SLAKE | Abdomen samples | Multimodal | https://github.com/sarahESL/PubMedCLIP |
| RadFM[97] | 2023 | MedLLaMA-13B | Pre-trained on MedMD and fine-tuned on RadMD | 14B | MedMD, RadMD etc. | Over 5000 diseases | Multimodal | https://github.com/chaoyi-wu/RadFM |
| Merlin[99] | 2024 | I3D-ResNet152 | Multi-task learning with EHR and radiology reports and fine-tuning for specific tasks | NA | 6M images, 6M codes and reports | Multiple diseases, Abdominal | Multimodal | NA |
| MedGemini[100] | 2024 | Gemini | Fine-tuning Gemini 1.0/1.5 on medical QA, multimodal and long-context corpora | 1.5B | MedQA, NEJM, GeneTuring | Various | Multimodal | https://github.com/Google-Health/med-gemini-medqa-relabelling |
| HAIDEF[101] | 2024 | VideoCoCa | Fine-tuning on downstream tasks with limited labeled data | NA | CT volumes and reports | Various | Vision | https://huggingface.co/collections/google/ |
| CTFM[102] | 2024 | Vision Model1 | Trained using a self-supervised learning strategy, employing a SegResNet encoder for the pre-training phase | NA | 26298 CT scans | CT scans (stomach, colon) | Vision | https://aim.hms.harvard.edu/ct-fm |
| MedVersa[103] | 2024 | Vision Model1 | Trained from scratch on the MedInterp dataset and adapted to various medical imaging tasks | NA | MedInterp | Various | Vision | https://github.com/3clyp50/MedVersa_Internal |
| iMD4GC[104] | 2024 | Transformer-based2 | A novel multimodal fusion architecture with cross-modal interaction and knowledge distillation | NA | GastricRes/Sur, TCGA etc. | Gastric cancer | Multimodal | https://github.com/FT-ZHOU-ZZZ/iMD4GC/ |
| Yasaka et al[105] | 2025 | BLIP-2 | LORA with specific fine-tuning of the fc1 layer in the vision and q-former models | NA | 5777 CT scans | Esophageal cancer via chest CT | Multimodal | NA |
First, in terms of architectural diversity and technical adaptation, VFMs have evolved from single-modal vision models to integrated multimodal systems. On one hand, vision-specific models focus on optimizing image feature extraction for GI-related scans. For example, CT-FM adopts a SegResNet encoder and uses SSL to process 26298 CT scans, targeting stomach and colon cancer imaging[102]; MedVersa, trained from scratch on the MedInterp dataset, is adapted to multiple medical imaging tasks, including GI cancer detection[103]. On the other hand, multimodal models integrate non-imaging data to enhance diagnostic accuracy. Merlin uses an I3D-ResNet152 architecture and incorporates multi-task learning with EHR and radiology reports, enabling it to handle abdominal GI diseases alongside other conditions[99]. Second, regarding disease coverage and clinical targeting, VFMs now address a broader spectrum of GI cancers while maintaining applicability to broader disease categories.
Unlike these specialized radiology models, several general-purpose VFMs, untailored for radiology, serve as benchmarks for evaluating them (Supplementary Table 5).
Despite the progress of VFMs in GI cancer radiology, several radiology-specific limitations and challenges remain evident in current research. For example, dataset bias and scarcity hinder model generalizability. Models such as that of Yasaka et al[105] rely on a relatively small, single-center dataset of 5777 CT scans, which may fail to capture the variability of GI cancer imaging across different populations or clinical settings. There is also limited focus on 3D radiological data. Most models (e.g., PubMedCLIP, RadFM) primarily process 2D images, while 3D CT/MRI volumes, critical for assessing tumor depth and spread in GI cancer, are less addressed (Merlin reports 3D semantic segmentation with a Dice score of 0.798). To address these issues, future research should prioritize radiology-tailored solutions. For instance, multi-center, diverse training datasets should be expanded so that future models can integrate data from GI cancer centers worldwide to reduce bias; in practice, TCGA data (used by iMD4GC) could be combined with real-world clinical scans to cover more ethnicities and disease stages[104]. Moreover, it is useful to enhance 3D data processing capabilities. Leveraging Merlin’s progress in 3D segmentation, future VFMs should optimize architectures for 3D GI cancer imaging to improve tumor staging accuracy, a key radiological task for treatment planning[99].
VFMs in pathology: Histopathology plays a pivotal role in cancer diagnosis, prognosis, and treatment. Traditionally, pathologists examined tissue slides under microscopes, a process that was slow, labor-intensive, and prone to errors stemming from variability in expertise. Such limitations occasionally resulted in misdiagnoses, particularly in complex cases[106]. The integration of digital technologies revolutionized this domain through whole-slide imaging (WSI), which converts glass slides into high-resolution digital images that retain all microscopic details[107]. However, manual analysis of these extensive datasets remained impractical. This led to the rise of computational pathology, which uses computational algorithms to analyze digitized slides at scale.
To elaborate on the application and advancement of VFMs in GI pathology, Table 5 encapsulates 28 representative VFM studies, showing the deployment of VFMs for tasks like detection & classification, segmentation, and histopathological assessment in GI WSIs. Due to space constraints, Supplementary Table 6 provides comprehensive methodological details for each corresponding model. These applications have markedly enhanced diagnostic efficiency and accuracy. Unlike the direct utilization of FMs in LLMs or endoscopic imaging, GI histopathology adopts a distinct technical approach, likely influenced by the extensive research in computational pathology favoring customized and specialized model architectures. By training and fine-tuning models on domain-specific pathological data, these VFMs achieve precise recognition and analysis of tumor features, rather than relying on general-purpose models.
| Model | Year | Architecture | Training Algorithm | Paras | WSIs | Tissues | Open source link |
| LUNIT-SSL[110] | 2021 | ViT-S | DINO; full fine-tuning and linear evaluation on downstream tasks | 22M | 3.7K | 32 | https://Lunitio.github.io/research/publications/pathology_ssl |
| CTransPath[111] | 2022 | Swin Transformer | MoCoV3 (SRCL); frozen backbone with linear classifier fine-tuning | 28M | 32K | 32 | https://github.com/Xiyue-Wang/TransPath |
| Phikon[112] | 2023 | ViT-B | iBOT (Masked Image Modeling); fine-tuned with ABMIL/TransMIL on frozen features | 86M | 6K | 16 | https://github.com/owkin/HistoSSLscaling |
| REMEDIS[113] | 2023 | BiT-L (ResNet-152) | SimCLR (contrastive learning); end-to-end fine-tuning on labeled ID/OOD data | 232M | 29K | 32 | https://github.com/google-research/simclr |
| Virchow[114] | 2024 | ViT-H, DINOv2 | DINOv2 (SSL); used frozen embeddings with simple aggregators | 632M | 1.5M | 17 | https://huggingface.co/paige-ai/Virchow |
| Virchow2[115] | 2024 | ViT-H | DINOv2 (SSL); fine-tuned with linear probes or full-tuning on downstream tasks | 632M | 3.1M | 25 | https://huggingface.co/paige-ai/Virchow2 |
| Virchow2G[115] | 2024 | ViT-G | DINOv2 (SSL); fine-tuned with linear probes or full fine-tuning | 1.9B | 3.1M | 25 | https://huggingface.co/paige-ai/Virchow2 |
| Virchow2G mini[115]1 | 2024 | ViT-S, Virchow2G | DINOv2 (SSL); distilled from Virchow2G, then fine-tuned on downstream tasks | 22M | 3.2M | 25 | https://huggingface.co/paige-ai/Virchow2 |
| UNI[9] | 2024 | ViT-L | DINOv2 (SSL); used frozen features with linear probes or few-shot learning | 307M | 100K | 20 | https://github.com/mahmoodlab/UNI |
| Phikon-v2[116] | 2024 | ViT-L | DINOv2 (SSL); frozen ViT and ABMIL ensemble fine-tuning | 307M | 58K | 30 | https://huggingface.co/owkin/phikon-v2 |
| RudolfV[117] | 2024 | ViT-L | DINOv2 (SSL); fine-tuned with optimizing linear classification layer and adapting encoder weights | 304M | 103K | 58 | https://github.com/rudolfv |
| HIBOU-B[118] | 2024 | ViT-B | DINOv2 (SSL); frozen feature extractor, trained linear classifier or attention pooling | 86M | 1.1M | 12 | https://github.com/HistAI/hibou |
| HIBOU-L[118]2 | 2024 | ViT-L | DINOv2 (SSL); frozen feature extractor, trained linear classifier or attention pooling | 307M | 1.1M | 12 | https://github.com/HistAI/hibou |
| H-Optimus-03 | 2024 | ViT-G | DINOv2 (SSL); linear probe and ABMIL on frozen features | 1.1B | > 500K | 32 | https://github.com/bioptimus/releases/ |
| Madeleine[119] | 2024 | CONCH | MAD-MIL; linear probing, prototyping, and full fine-tuning for downstream tasks | 86M | 23K | 2 | https://github.com/mahmoodlab/MADELEINE |
| COBRA[120] | 2024 | Mamba-2 | Self-supervised contrastive pretraining with multiple FMs and Mamba2 architecture | 15M | 3K | 6 | https://github.com/KatherLab/COBRA |
| PLUTO[121] | 2024 | FlexiVit-S | DINOv2; frozen backbone with task-specific heads for fine-tuning | 22M | 158K | 28 | NA |
| HIPT[122] | 2025 | ViT-HIPT | DINO (SSL); fine-tune with gradient accumulation | 10M | 11K | 33 | https://github.com/mahmoodlab/HIPT |
| PathoDuet[123] | 2025 | ViT-B | MoCoV3; fine-tuned using standard supervised learning on labeled downstream task data | 86M | 11K | 32 | https://github.com/openmedlab/PathoDuet |
| Kaiko[124] | 2025 | ViT-L | DINOv2 (SSL); linear probing with frozen encoder on downstream tasks | 303M | 29K | 32 | https://github.com/kaiko-ai/towards_large_pathology_fms |
| PathOrchestra[125] | 2025 | ViT-L | DINOv2; ABMIL, linear probing, weakly supervised classification | 304M | 300K | 20 | https://github.com/yanfang-research/PathOrchestra |
| THREADS[126] | 2025 | ViT-L, CONCHv1.5 | Fine-tune gene encoder, initialize patch encoder randomly | 16M | 47K | 39 | https://github.com/mahmoodlab/trident |
| H0-mini[127] | 2025 | ViT | Using knowledge distillation from H-Optimus-0 | 86M | 6K | 16 | https://huggingface.co/bioptimus/H0-mini |
| TissueConcepts[128] | 2025 | Swin Transformer | Frozen encoder with linear probe for downstream tasks | 27.5M | 7K | 14 | https://github.com/FraunhoferMEVIS/MedicalMultitaskModeling |
| OmniScreen[129] | 2025 | Virchow2 | Attention-aggregated Virchow2 embeddings fine-tuning | 632M | 48K | 27 | https://github.com/OmniScreen |
| BROW[130] | 2025 | ViT-B | DINO (SSL); self-distillation with multi-scale and augmented views | 86M | 11K | 6 | NA |
| BEPH[131] | 2025 | BEiTv2 | BEiTv2 (SSL); supervised fine-tuning on clinical tasks with labeled data | 86M | 11K | 32 | https://github.com/Zhcyoung/BEPH |
| Atlas[132] | 2025 | ViT-H, RudolfV | DINOv2; linear probing with frozen backbone on downstream tasks | 632M | 1.2M | 70 | NA |
The current research of VFMs in GI pathology presents distinct characteristics across three dimensions, with evidence supported by models from Table 5 and Supplementary Table 6. First, in terms of model architecture, there has been a clear trend toward diversification and scale expansion, with ViT variants becoming the dominant framework while complementary architectures continue to emerge. As shown in Table 5[9,110-132], early models (e.g., LUNIT-SSL) adopted compact ViT-S backbones of roughly 22M parameters, whereas more recent models such as Virchow2G (1.9B parameters) and H-Optimus-0 (1.1B parameters) have scaled toward billion-parameter designs.
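Many of the training-algorithm entries in Table 5 follow a frozen-encoder, linear-probe workflow. The following hedged sketch illustrates that workflow using a publicly available DINOv2 checkpoint as a stand-in for a pathology-specific foundation model; the checkpoint choice and the random "tiles" are illustrative assumptions, whereas real pipelines operate on patches extracted from WSIs.

```python
# Hedged sketch of the frozen-encoder + linear-probe workflow common in Table 5.
import torch
from sklearn.linear_model import LogisticRegression

# Frozen self-supervised encoder (DINOv2 ViT-S/14 via torch.hub as a stand-in).
encoder = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
encoder.eval()

def embed(patches: torch.Tensor) -> torch.Tensor:
    """patches: (n, 3, 224, 224) normalized tiles -> (n, feature_dim) embeddings."""
    with torch.no_grad():
        return encoder(patches)

# Toy data: random "tiles" standing in for tumor vs non-tumor patches.
tiles = torch.randn(32, 3, 224, 224)
labels = torch.randint(0, 2, (32,)).numpy()

features = embed(tiles).numpy()
probe = LogisticRegression(max_iter=1000).fit(features, labels)  # the only trained component
print("training accuracy of the linear probe:", probe.score(features, labels))
```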
Despite their promising progress, VFMs still face distinct limitations and challenges when applied to GI pathology, most of which are closely tied to the unique characteristics of pathological analysis and clinical workflows. First, over-reliance on large-scale, high-quality pathological datasets restricts accessibility. For example, models like Virchow2[115] and Atlas[132] use 3.1M and 1.2M WSIs respectively (Table 5), but such multi-institutional, well-annotated cohorts (e.g., covering rare GI cancer subtypes) are scarce in clinical practice. Smaller datasets (e.g., COBRA’s 3K WSIs) sometimes lead to limited generalization to diverse pathological scenarios[120]. Second, a mismatch between model design and pathology workflow requirements limits clinical integration.
Future research on VFMs in GI cancer pathology should target these specific limitations. First, to address data scarcity, developing VFMs that adapt well to small datasets is a priority; H0-mini demonstrates this direction by leveraging only 6K WSIs (Table 5) through knowledge distillation from H-Optimus-0[127]. Future models could combine distillation with cross-stain transfer learning, enabling reliable training even with limited GI cohorts (similar to Virchow2G mini)[115]. Second, to enhance pathological interpretability, designing feature-aligned VFMs would be valuable. Drawing on Phikon-v2, particularly its biomarker prediction tasks (Supplementary Table 6), future models could link image features to pathological biomarkers (e.g., MSI, HER2, and ER in GI tumors), bridging the gap between model outputs and pathologists' morphological analysis[116]. Third, to improve clinical deployment, optimizing lightweight VFMs for laboratory hardware is critical. Following TissueConcepts' 27.5M-parameter design (Table 5) and efficient linear-probe fine-tuning (Supplementary Table 6), future work should focus on compressing models to run on standard laboratory workstations, avoiding reliance on large GPU clusters (as required by larger models such as Virchow2 or Phikon-v2)[128]. Finally, to tackle sample variability, training VFMs on heterogeneous pathological datasets is necessary; models could incorporate augmented data simulating staining inconsistencies and tissue folding, enhancing robustness to real-world GI biopsy variations.
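To make the distillation idea above concrete, the sketch below shows a generic feature-level distillation step in which a compact student encoder learns to reproduce a frozen teacher's tile embeddings. The function name, projector, and cosine objective are assumptions for illustration and do not reflect H0-mini's actual training pipeline.

```python
# Hedged sketch of feature-level knowledge distillation: a small student
# encoder is trained to match the embeddings of a large frozen teacher on
# unlabeled tiles. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_step(teacher: nn.Module, student: nn.Module,
                      projector: nn.Module, tiles: torch.Tensor,
                      optimizer: torch.optim.Optimizer) -> float:
    teacher.eval()
    with torch.no_grad():
        target = teacher(tiles)              # (batch, teacher_dim), frozen targets
    pred = projector(student(tiles))         # map student_dim -> teacher_dim
    # Cosine-style loss: align the direction of student and teacher features.
    loss = 1.0 - F.cosine_similarity(pred, target, dim=-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```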
In the preceding overview of endoscopic and radiological imaging, multimodal FMs have been repeatedly highlighted (Tables 2 and 3). These models integrate different types of data, such as endoscopic images paired with text, or CT and MRI scans combined with clinical records and genomic information, to yield superior diagnostic and prognostic performance relative to unimodal approaches. For instance, the ColonCLIP model analyzes endoscopic images and reports together, and GPT-4V applies a multimodal approach to radiological image analysis[92,133]. MLLMs are designed to process and integrate diverse data modalities (text, images, etc.), thereby capturing intermodal relationships that facilitate more efficient learning and enhanced predictive accuracy[134]. They operate by merging diverse data into a unified representation: key features are first extracted from each data type (e.g., word embeddings from text or CNN features from images) and then integrated through mechanisms such as multilayer perceptrons or graph neural networks. Such integrative modeling holds considerable promise in medical contexts, offering comprehensive diagnostic insights that can improve therapeutic strategies for diseases including GI cancers[135].
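As a concrete illustration of the fusion mechanism just described, the PyTorch sketch below concatenates pre-extracted image and text features and passes them through a small multilayer perceptron. The encoder outputs, dimensions, and class name are hypothetical; real MLLMs typically use far more elaborate alignment and attention schemes.

```python
# Minimal sketch of late fusion: modality-specific features (e.g., image
# embeddings and clinical-text embeddings) are concatenated and fed to an
# MLP classifier. Dimensions and names are illustrative assumptions.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, img_dim: int = 512, txt_dim: int = 768, n_classes: int = 2):
        super().__init__()
        self.fusion_mlp = nn.Sequential(
            nn.Linear(img_dim + txt_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(256, n_classes),
        )

    def forward(self, img_feat: torch.Tensor, txt_feat: torch.Tensor) -> torch.Tensor:
        # img_feat: (batch, img_dim) from a vision encoder;
        # txt_feat: (batch, txt_dim) from a text encoder (e.g., report embeddings).
        fused = torch.cat([img_feat, txt_feat], dim=-1)
        return self.fusion_mlp(fused)

# Example: combining endoscopy-image features with report embeddings.
model = LateFusionClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 768))
```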
Table 6 summarizes pivotal studies investigating MLLMs within GI pathology, while Supplementary Table 7 extends this overview by detailing methodological aspects constrained by space in the main table. The Supplementary material elaborates on training datasets, specifying sources and volumes of image-text pairs or WSIs, performance evaluation metrics across various tasks, and the training and fine-tuning protocols employed. Collectively, these resources provide a thorough depiction of the current landscape of MLLMs in GI cancer research, enabling an in-depth examination of their potential applications.
| Model | Year | Vision architecture | Vision dataset | WSIs | Text model | Text dataset | Parameters | Tissues | Generative | Open source link |
| PLIP[136] | 2023 | CLIP | OpenPath | 28K | CLIP | OpenPath | NA | 32 | Captioning | https://github.com/PathologyFoundation/plip |
| HistGen[137] | 2023 | DINOv2, ViT-L | Multiple | 55K | LGH Module | TCGA | Approximately 100M | 32 | Report generation | https://github.com/dddavid4real/HistGen |
| PathAlign[138] | 2023 | PathSSL | Custom | 350K | BLIP-2 | Diagnostic reports | Approximately 100M | 32 | Report generation | https://github.com/elonybear/PathAlign |
| CHIEF[139] | 2024 | CTransPath | 14 Sources | 60K | CLIP | Anatomical information | 27.5M, 63M | 19 | No | https://github.com/hms-dbmi/CHIEF |
| PathGen[140] | 2024 | LLaVA, CLIP | TCGA | 7K | CLIP | 1.6M pairs | 13B | 32 | WSI assistant | https://github.com/PathFoundation/PathGen-1.6M |
| PathChat[141] | 2024 | UNI | Multiple | 999K | LLaMa 2 | Pathology instructions | 13B | 20 | AI assistant | https://github.com/fedshyvana/pathology_mllm_training |
| PathAsst[142] | 2024 | PathCLIP | PathCap | 207K | Vicuna-13B | Pathology instructions | 13B | 32 | AI assistant | https://github.com/superjamessyx/Generative-Foundation-AI-Assistant-for-Pathology |
| ProvGigaPath[143] | 2024 | ViT | Prov-Path | 171K | OpenCLIP | 17K Reports | 1.13B | 31 | No | https://github.com/prov-gigapath/prov-gigapath |
| TITAN[144] | 2024 | ViT | Mass340K | 336K | CoCa | Medical reports | Approximately 5B | 20 | Report generation | https://github.com/mahmoodlab/TITAN |
| CONCH[145] | 2024 | ViT | Multiple | 21K | GPTstyle | 1.17M pairs | NA | 19 | Captioning | http://github.com/mahmoodlab/CONCH |
| SlideChat[146] | 2024 | CONCH, LongNet | TCGA | 4915 | Qwen2.5-7B | Slide Instructions | 7B | 10 | WSI assistant | https://github.com/uni-medical/SlideChat |
| PMPRG[147] | 2024 | MR-ViT | Custom | 7422 | GPT-2 | Pathology Reports | NA | 2 | Multi-organ report | https://github.com/hvcl/Clinical-grade-PathologyReport-Generation |
| MuMo[148] | 2024 | MnasNet | Custom | 429 | Transformer | PathoRadio Reports | NA | 1 | No | https://github.com/czifan/MuMo |
| ConcepPath[149] | 2024 | ViT-B, CONCH | Quilt-1M | 2243 | CLIPGPT | PubMed | Approximately 187M | 3 | No | https://github.com/HKU-MedAI/ConcepPath |
| GPT-4V[150] | 2024 | Phikon ViT-B | CRC-7K, MHIST etc. | 338K | GPT-4 | NA | 40M | 3 | Report generation | https://github.com/Dyke-F/GPT-4V-In-Context-Learning |
| MINIM[151] | 2024 | Stable diffusion | Multiple | NA | BERT, CLIP | Multiple | NA | 6 | Report generation | https://github.com/WithStomach/MINIM |
| PathM3[152] | 2024 | ViT-g/14 | PatchGastric | 991 | FlanT5XL | PatchGastric | NA | 1 | WSI assistant | NA |
| FGCR[153] | 2024 | ResNet50 | Custom, GastrADC | 3598, 991 | BERT | NA | 9.21M | 6 | Report generation | https://github.com/hudingyi/FGCR |
| PromptBio[154] | 2024 | PLIP | TCGA, CPTAC | 482, 105 | GPT-4 | NA | NA | 1 | Report generation | https://github.com/DeepMed-Lab-ECNU/PromptBio |
| HistoCap[155] | 2024 | ViT | NA | 10K | BERT, BioBERT | GTEx datasets | NA | 40 | Report generation | https://github.com/ssen7/histo_cap_transformers |
| mSTAR[156] | 2024 | UNI | TCGA | 10K | BioBERT | Pathology Reports 11K | NA | 32 | Report generation | https://github.com/Innse/mSTAR |
| GPT-4 Enhanced[157] | 2025 | CTransPath | TCGA | NA | GPT-4 | ASCO, ESMO, Onkopedia | NA | 4 | Recommendation generation | https://github.com/Dyke-F/LLM_RAG_Agent |
| PRISM[158] | 2025 | Virchow, ViT-H | Virchow dataset | 587K | BioGPT | 195K Reports | 632M | 17 | Report generation | NA |
| HistoGPT[159] | 2025 | CTransPath, UNI | Custom | 15K | BioGPT | Pathology Reports | 30M to 1.5B | 1 | WSI assistant | https://github.com/marrlab/HistoGPT |
| PathologyVLM[160] | 2025 | PLIP, CLIP | PCaption-0.8M | NA | LLaVA | PCaption-0.5M | NA | Multi | Report generation | https://github.com/ddw2AIGROUP2CQUP/PA-LLaVA |
| MUSK[161] | 2025 | Transformer | TCGA | 33K | Transformer | PubMed Central | 675M | 33 | Question answering | https://github.com/Lilab-stanford/MUSK |
Starting with model development and architecture, a key trend lies in the integration of vision and language modules, as exemplified by SlideChat (Table 6)[136-161]. This model employs a dedicated vision encoder to process gigapixel WSIs and pairs it with a language model to enable multimodal conversational capabilities. This integration design allows SlideChat to answer complex questions about GI tissue pathology from WSI input, achieving an overall accuracy of 81.17% on the SlideBench-VQA (TCGA) benchmark[146]. This result not only validates the effectiveness of cross-modality integration but also highlights the need for targeted parameterization and optimization. Many MLLMs in this field, including those detailed in Supplementary Table 7, fine-tune their text-component parameters on GI-cancer-specific datasets, a process that adapts the models to better capture features such as histological subtypes of gastric cancer, thereby laying a technical foundation for subsequent dataset utilization and clinical applications.
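One simple way to realize the vision-language coupling described above is to project a slide-level embedding into the language model's hidden space and prepend the resulting "visual tokens" to the text input. The sketch below illustrates this prefix-projection pattern; the module name, dimensions, and prefix length are assumptions and do not correspond to SlideChat's published implementation.

```python
# Hedged sketch of coupling a slide-level visual embedding to a language
# model: a linear projector maps the WSI embedding into the LLM's hidden
# size, producing a short prefix of visual "tokens". Illustrative only.
import torch
import torch.nn as nn

class VisualPrefixProjector(nn.Module):
    def __init__(self, wsi_dim: int = 768, llm_hidden: int = 4096, n_prefix: int = 8):
        super().__init__()
        self.n_prefix = n_prefix
        self.llm_hidden = llm_hidden
        self.proj = nn.Linear(wsi_dim, n_prefix * llm_hidden)

    def forward(self, wsi_embedding: torch.Tensor) -> torch.Tensor:
        # wsi_embedding: (batch, wsi_dim) slide-level representation.
        prefix = self.proj(wsi_embedding)                        # (batch, n_prefix * llm_hidden)
        return prefix.view(-1, self.n_prefix, self.llm_hidden)   # visual "tokens" for the LLM

# The prefix would be concatenated with the embedded question tokens before
# being fed to the decoder; typically only the projector (and optionally the
# LLM) is fine-tuned on pathology instruction data.
proj = VisualPrefixProjector()
visual_tokens = proj(torch.randn(2, 768))   # shape: (2, 8, 4096)
```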
Closely tied to model advancement is the development of dataset utilization, as high-performance MLLMs rely on both diverse and specialized data sources to generalize to real-world GI cancer scenarios. On one hand, models in Table 6 leverage multi-modal datasets combining publicly available GI cancer image repositories and paired pathology reports, textual documents that detail histological features, diagnoses, and even patient clinical histories. These datasets, often containing thousands of image-text pairs, train MLLMs to establish meaningful correlations between tissue visual appearance and textual descriptions, a prerequisite for accurate clinical interpretation. On the other hand, to address unique challenges in GI pathology (such as WSI-specific analysis), specialized datasets have been developed. An example is the PathCap dataset (Supplementary Table 7), which focuses on multi-modal comprehension for pathology[142]. This dataset integrates WSI patches, associated clinical reports, and a rich collection of 207k image-caption pairs designed to simulate real-world diagnostic queries. By leveraging this multimodal dataset, researchers can train models to better understand the complex interplay between visual and textual information, thereby accelerating the translation of advanced AI techniques into actionable clinical insights.
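Such paired image-text datasets are typically exploited with a contrastive alignment objective, in which matched image-caption pairs are pulled together and mismatched pairs pushed apart in a shared embedding space. The sketch below shows a CLIP-style InfoNCE loss under the assumption that image and text embeddings have already been computed; it is illustrative rather than any specific model's training code.

```python
# Minimal sketch of CLIP-style contrastive alignment on image-text pairs.
# Encoders are omitted; embeddings are stand-ins.
import torch
import torch.nn.functional as F

def clip_style_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    # img_emb, txt_emb: (batch, dim) embeddings of matched image-caption pairs.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature       # pairwise similarities
    targets = torch.arange(img_emb.size(0))            # i-th image matches i-th caption
    loss_i = F.cross_entropy(logits, targets)          # image -> text direction
    loss_t = F.cross_entropy(logits.t(), targets)      # text -> image direction
    return (loss_i + loss_t) / 2

loss = clip_style_loss(torch.randn(8, 512), torch.randn(8, 512))
```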
The technical advances in models and datasets have ultimately driven applications of MLLMs in GI cancer diagnosis and prognosis. In diagnosis, MLLMs excel at identifying distinct GI cancer types by linking histological image patterns to text-based diagnostic criteria; for example, several models can distinguish or predict EBV-positive and HER2-positive gastric cancer subtypes (MuMo[148] and ConcepPath[149], respectively). Beyond diagnosis, MLLMs are also advancing prognosis prediction by integrating multi-source data: they extract histological features from images and combine them with patient-specific information from text reports (e.g., tumor stage, grade, and molecular markers). Findings suggest that these multimodal prognostic models offer more comprehensive and accurate predictions than traditional methods relying on single-modality data alone, reflecting the synergistic progress of MLLMs across model design, data curation, and clinical translation in GI cancer pathology (e.g., CHIEF[139], PathGen[140], MuMo[148]).
Despite this progress, current MLLMs in GI cancer pathology face distinct limitations. First, data dependence and scarcity hinder generalization: insufficient training data limit a model's ability to perform well on diverse datasets. PathM3 (Table 6), for example, relies on only 991 WSIs from the PatchGastric dataset[152], and MuMo uses a mere 429 WSIs[148]; such small samples risk overfitting to specific tissue types or institutions. Even larger-scale models such as PathChat (999K WSIs) draw on broader but still non-representative datasets that lack diverse clinical settings[141]. Second, limited model accessibility and transparency pose barriers to widespread adoption and trust. Models including PRISM[158] and PathM3[152] lack open-source links, preventing independent validation by other researchers (Table 6), and even open models such as CHIEF require eight V100 GPUs (Supplementary Table 7), a resource beyond many clinical laboratories[139]. Finally, many current models are designed for narrow tasks, limiting their usefulness for broader or more varied needs. Several models (e.g., HistGen[137], CONCH[145], FGCR[153]) focus solely on report generation, converting WSI features into text without supporting diagnostic or prognostic assistance; only 3 of 26 models (e.g., MUSK[161]) support question answering for rare GI cancer subtypes; and five models (e.g., CHIEF[139], ConcepPath[149]) are explicitly non-generative, performing only basic tasks such as classification and unable to address complex clinical needs such as report interpretation or treatment suggestions.
Future research on MLLMs in GI cancer pathology could address these weaknesses by exploiting the models' untapped potential and filling key capability gaps. First, models could be extended to support a broader range of clinical tasks, including diagnosis assistance, prognosis prediction, and treatment recommendation. Second, the diversity, quality, and clinical relevance of training data could be improved by including a wider range of patient demographics, cancer subtypes (including rare forms), disease stages, and multimodal information, ensuring that models generalize across real-world clinical scenarios. Third, integration with real-world clinical workflows could be strengthened by ensuring that model outputs are not only accurate and interpretable but also actionable and relevant to practical needs.
This review retrospectively summarizes key and representative studies on the application of FMs in GI cancer research. Because many artificial intelligence terms (e.g., zero-shot learning, black-box problem) may be unfamiliar to medical researchers, Supplementary Table 8 defines the key terms used in this review for clarity. Owing to inherent limitations in literature search and screening, it is acknowledged that some studies may not have been included. Although numerous investigations have shown that FMs hold considerable potential in this domain, substantial challenges remain for their practical use and clinical translation. For example, medical imaging and pathology data often differ in format and standards across institutions, which limits model performance across settings, particularly for models developed in single-center studies[162]. Furthermore, publication bias remains a concern: studies reporting positive outcomes are preferentially published, whereas negative or inconclusive results often remain unpublished, thereby skewing the overall scientific evidence base.
The extant evidence supporting the use of FMs in GI oncology is constrained by several methodological and practical limitations. First, with respect to data privacy and security, FMs typically necessitate large-scale datasets to achieve optimal performance, which inherently increases the risk of data breaches and unauthorized access[163]. Conventional de-identification techniques are increasingly insufficient, especially when integrating multimodal data types such as imaging, genomics, and EHRs, which may facilitate re-identification. To mitigate these risks, the incorporation of privacy-preserving technologies into model development is imperative[164]. Approaches such as federated learning enable model training across multiple institutions without sharing raw data, effectively shifting the model rather than the data. Differential privacy techniques introduce controlled noise during training to safeguard individual identities, while blockchain technology offers immutable systems for tracking data access and consent. Ensuring global compliance necessitates governance frameworks aligned with regulations such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA), thereby promoting secure and ethical data utilization.
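The federated learning approach mentioned above can be illustrated with a minimal federated-averaging (FedAvg) step, in which each institution's locally trained parameters are combined into a weighted average while raw patient data remain on site. The function and toy models below are illustrative assumptions, not a production federated framework.

```python
# Hedged sketch of federated averaging (FedAvg): sites train local copies of
# a shared model and only parameters are aggregated, so raw data never leave
# each institution. Purely illustrative.
import copy
from typing import List
import torch
import torch.nn as nn

def federated_average(global_model: nn.Module,
                      local_models: List[nn.Module],
                      weights: List[float]) -> nn.Module:
    # weights: each site's share of the total training samples (should sum to 1).
    new_state = copy.deepcopy(global_model.state_dict())
    for key in new_state:
        new_state[key] = sum(w * m.state_dict()[key] for w, m in zip(weights, local_models))
    global_model.load_state_dict(new_state)
    return global_model

# Example with a toy model at three hypothetical sites.
site_models = [nn.Linear(10, 2) for _ in range(3)]
global_model = federated_average(nn.Linear(10, 2), site_models, [0.5, 0.3, 0.2])
```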
Second, regarding model interpretability and clinical trust, FMs often function as "black boxes", with limited transparency in their decision-making processes, even to developers and end-users[165]. This lack of transparency can undermine clinician and patient confidence, as clear explanations for model-driven recommendations (e.g., the rationale for classifying a polyp as malignant) are typically required. Although explainable AI (XAI) tools such as Grad-CAM (for imaging models), SHAP, and LIME exist, their application within FMs remains limited and predominantly provides correlational rather than causal insights[166]. For example, Grad-CAM can highlight regions of interest in endoscopic images but does not elucidate causal relationships, such as why a specific genetic mutation influences treatment response predictions. This discrepancy highlights a critical gap between clinical needs for causal explanations and the correlational outputs currently provided by FMs. Bridging this gap necessitates the development of clinician-centric visualization interfaces that link model predictions to specific clinical features, including polyp size or histological characteristics. Interpretability should be regarded as a core performance metric alongside accuracy and sensitivity in FM validation studies, rather than an ancillary consideration. Additionally, integrating principles from human factors engineering into FM design can ensure that explanations align with clinical workflows and cognitive demands, thereby fostering greater acceptance.
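As a brief illustration of how Grad-CAM produces such region-level highlights, the sketch below computes a class-activation heatmap for a generic CNN by weighting the last convolutional feature maps with their pooled gradients. The ResNet-18 backbone, hook, and random input are placeholders; clinical use would typically rely on a maintained explainability library rather than this hand-rolled version.

```python
# Hedged sketch of Grad-CAM: gradients of the top-class score w.r.t. the last
# convolutional feature maps are pooled into channel weights, yielding a
# coarse localization heatmap. Illustrative only.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()
feats = {}

def save_activation(_module, _inputs, output):
    output.retain_grad()          # keep the gradient of this intermediate tensor
    feats["act"] = output

model.layer4.register_forward_hook(save_activation)   # last convolutional block

image = torch.randn(1, 3, 224, 224)                   # stand-in for an endoscopic frame
logits = model(image)
logits[0, logits.argmax()].backward()                  # backprop the top-class score

act, grad = feats["act"].detach(), feats["act"].grad   # both (1, 512, 7, 7)
weights = grad.mean(dim=(2, 3), keepdim=True)           # channel-wise importance
cam = F.relu((weights * act).sum(dim=1, keepdim=True))  # (1, 1, 7, 7)
cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalized heatmap in [0, 1]
```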
Third, with respect to bias and equity, many FM training datasets predominantly originate from high-income countries and large academic centers, resulting in the underrepresentation of minority populations and low-resource settings[167]. This imbalance introduces biases that may exacerbate health disparities. For example, existing studies have largely focused on specific patient groups, such as those from Asia or Europe/the United States, potentially limiting model applicability to underrepresented populations.
Fourth, as regards regulatory pathways, current frameworks for medical AI are inadequately suited to FMs, which differ from traditional tools in their generalizability and capacity for continuous learning from new data[168]. Regulatory pathways such as the United States FDA's De Novo classification and 510(k) clearance have been applied to certain AI-based diagnostic tools, such as the FDA-approved Paige Prostate software for identifying cancer cells in prostate pathology images[109]. However, FMs, which can be adapted for multiple tasks (e.g., CRC detection, chemotherapy response prediction, and high-risk patient identification), do not conform to these static, task-specific approval models.
Finally, in regard to clinical validation and real-world deployment, most FM studies remain confined to technical validation phases, demonstrating high accuracy under controlled conditions[170]. However, such findings do not necessarily translate into clinical utility, defined by improvements in diagnosis, treatment decision-making, or patient outcomes. Operational feasibility, including seamless integration into existing clinical workflows without imposing additional burdens on healthcare providers, is infrequently evaluated. Moreover, cost-effectiveness analyses, such as whether FMs predicting chemotherapy response reduce unnecessary treatment expenditures, are scarce. Addressing these gaps requires rigorous, multicenter, prospective randomized controlled trials. Implementation science research should investigate FM performance across diverse healthcare systems and resource settings. Enhancing transparency through the establishment of public clinical trial registries, where study protocols, data, and outcomes are openly accessible, is also advocated.
In summary, FMs possess transformative potential for GI cancer care, ranging from facilitating early detection to enabling personalized therapeutic strategies. Nonetheless, technological advancements alone are insufficient for successful clinical translation; technical limitations must be addressed alongside ethical, regulatory, and equity-related challenges. The future role of FMs in GI oncology is not to supplant clinicians but to augment precision medicine. It is important to recognize that, both now and in the foreseeable future, FMs and related tools will not replace endoscopists, radiologists, or pathologists. Their principal role is to provide professional analytical support, while final diagnostic and treatment decisions remain led by clinicians. This human-machine partnership will remain central to improving patient care.
| 1. | Bray F, Laversanne M, Sung H, Ferlay J, Siegel RL, Soerjomataram I, Jemal A. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2024;74:229-263. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 5690] [Cited by in RCA: 10818] [Article Influence: 10818.0] [Reference Citation Analysis (3)] |
| 2. | Bordry N, Astaras C, Ongaro M, Goossens N, Frossard JL, Koessler T. Recent advances in gastrointestinal cancers. World J Gastroenterol. 2021;27:4493-4503. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in CrossRef: 10] [Cited by in RCA: 21] [Article Influence: 5.3] [Reference Citation Analysis (0)] |
| 3. | Lipkova J, Kather JN. The age of foundation models. Nat Rev Clin Oncol. 2024;21:769-770. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 9] [Reference Citation Analysis (0)] |
| 4. | Tsang KK, Kivelson S, Acitores Cortina JM, Kuchi A, Berkowitz JS, Liu H, Srinivasan A, Friedrich NA, Fatapour Y, Tatonetti NP. Foundation Models for Translational Cancer Biology. Annu Rev Biomed Data Sci. 2025;8:51-80. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 2] [Cited by in RCA: 3] [Article Influence: 3.0] [Reference Citation Analysis (0)] |
| 5. | Moor M, Banerjee O, Abad ZSH, Krumholz HM, Leskovec J, Topol EJ, Rajpurkar P. Foundation models for generalist medical artificial intelligence. Nature. 2023;616:259-265. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 105] [Cited by in RCA: 610] [Article Influence: 305.0] [Reference Citation Analysis (0)] |
| 6. | Zeng R, Gou H, Lau HCH, Yu J. Stomach microbiota in gastric cancer development and clinical implications. Gut. 2024;73:2062-2073. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 6] [Cited by in RCA: 59] [Article Influence: 59.0] [Reference Citation Analysis (0)] |
| 7. | Cao JS, Lu ZY, Chen MY, Zhang B, Juengpanich S, Hu JH, Li SJ, Topatana W, Zhou XY, Feng X, Shen JL, Liu Y, Cai XJ. Artificial intelligence in gastroenterology and hepatology: Status and challenges. World J Gastroenterol. 2021;27:1664-1690. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in CrossRef: 16] [Cited by in RCA: 20] [Article Influence: 5.0] [Reference Citation Analysis (1)] |
| 8. | Kröner PT, Engels MM, Glicksberg BS, Johnson KW, Mzaik O, van Hooft JE, Wallace MB, El-Serag HB, Krittanawong C. Artificial intelligence in gastroenterology: A state-of-the-art review. World J Gastroenterol. 2021;27:6794-6824. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in CrossRef: 28] [Cited by in RCA: 101] [Article Influence: 25.3] [Reference Citation Analysis (7)] |
| 9. | Chen RJ, Ding T, Lu MY, Williamson DFK, Jaume G, Song AH, Chen B, Zhang A, Shao D, Shaban M, Williams M, Oldenburg L, Weishaupt LL, Wang JJ, Vaidya A, Le LP, Gerber G, Sahai S, Williams W, Mahmood F. Towards a general-purpose foundation model for computational pathology. Nat Med. 2024;30:850-862. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 660] [Cited by in RCA: 382] [Article Influence: 382.0] [Reference Citation Analysis (0)] |
| 10. | Zhou C, Li Q, Li C, Yu J, Liu Y, Wang G, Zhang K, Ji C, Yan Q, He L, Peng H, Li J, Wu J, Liu Z, Xie P, Xiong C, Pei J, Yu PS, Sun L. A comprehensive survey on pretrained foundation models: a history from BERT to ChatGPT. Int J Mach Learn Cyber. 2024. [DOI] [Full Text] |
| 11. | Bommasani R, Hudson DA, Adeli E, Altman R, Arora S, von Arx S, Bernstein MS, Bohg J, Bosselut A, Brunskill E, Brynjolfsson E, Buch S, Card D, Castellon R, Chatterji N, Chen A, Creel K, Quincy Davis J, Demszky D, Donahue C, Doumbouya M, Durmus E, Ermon S, Etchemendy J, Ethayarajh K, Fei-Fei L, Finn C, Gale T, Gillespie L, Goel K, Goodman N, Grossman S, Guha N, Hashimoto T, Henderson P, Hewitt J, Ho DE, Hong J, Hsu K, Huang J, Icard T, Jain S, Jurafsky D, Kalluri P, Karamcheti S, Keeling G, Khani F, Khattab O, Koh PW, Krass M, Krishna R, Kuditipudi R, Kumar A, Ladhak F, Lee M, Lee T, Leskovec J, Levent I, Li XL, Li X, Ma T, Malik A, Manning CD, Mirchandani S, Mitchell E, Munyikwa Z, Nair S, Narayan A, Narayanan D, Newman B, Nie A, Niebles JC, Nilforoshan H, Nyarko J, Ogut G, Orr L, Papadimitriou I, Park JS, Piech C, Portelance E, Potts C, Raghunathan A, Reich R, Ren H, Rong F, Roohani Y, Ruiz C, Ryan J, Ré C, Sadigh D, Sagawa S, Santhanam K, Shih A, Srinivasan K, Tamkin A, Taori R, Thomas AW, Tramèr F, Wang RE, Wang W, Wu B, Wu J, Wu Y, Xie SM, Yasunaga M, You J, Zaharia M, Zhang M, Zhang T, Zhang X, Zhang Y, Zheng L, Zhou K, Liang P. On the Opportunities and Risks of Foundation Models. 2022 Preprint. Available from: arXiv:2108.07258. [DOI] [Full Text] |
| 13. | McCarthy J, Minsky ML, Rochester N, Shannon CE. A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence: August 31, 1955. AI Mag. 1955;27:12-14. [DOI] [Full Text] |
| 14. | Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev. 1958;65:386-408. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 4749] [Cited by in RCA: 2150] [Article Influence: 32.1] [Reference Citation Analysis (0)] |
| 15. | LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436-444. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 36149] [Cited by in RCA: 20727] [Article Influence: 2072.7] [Reference Citation Analysis (0)] |
| 16. | Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. In: NIPS'17. Proceedings of the 31st International Conference on Neural Information Processing Systems; 2017; Long Beach, CA, United States. Red Hook, NY, United States: Curran Associates Inc., 2017: 6000-6010. |
| 17. | Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler DM, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D. Language models are few-shot learners. In: NIPS '20. Proceedings of the 34th International Conference on Neural Information Processing Systems; 2020; Vancouver, BC, Canada. Red Hook, NY, United States: Curran Associates Inc., 2020: 25. |
| 18. | Devlin J, Chang M, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein J, Doran C, Solorio T, editors. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, MN: Association for Computational Linguistics, 2019: 4171-4186. [DOI] [Full Text] |
| 19. | Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, Leoni Aleman F, Almeida D, Altenschmidt J, Altman S, Anadkat S, Avila R, Babuschkin I, Balaji S, Balcom V, Baltescu P, Bao H, Bavarian M, Belgum J, Bello I, Berdine J, Bernadett-Shapiro G, Berner C, Bogdonoff L, Boiko O, Boyd M, Brakman A-L, Brockman G, Brooks T, Brundage M, Button K, Cai T, Campbell R, Cann A, Carey B, Carlson C, Carmichael R, Chan B, Chang C, Chantzis F, Chen D, Chen S, Chen R, Chen J, Chen M, Chess B, Cho C, Chu C, Chung HW, Cummings D, Currier J, Dai Y, Decareaux C, Degry T, Deutsch N, Deville D, Dhar A, Dohan D, Dowling S, Dunning S, Ecoffet A, Eleti A, Eloundou T, Farhi D, Fedus L, Felix N, Posada Fishman S, Forte J, Fulford I, Gao L, Georges E, Gibson C, Goel V, Gogineni T, Goh G, Gontijo-Lopes R, Gordon J, Grafstein M, Gray S, Greene R, Gross J, Gu SS, Guo Y, Hallacy C, Han J, Harris J, He Y, Heaton M, Heidecke J, Hesse C, Hickey A, Hickey W, Hoeschele P, Houghton B, Hsu K, Hu S, Hu X, Huizinga J, Jain S, Jain S, Jang J, Jiang A, Jiang R, Jin H, Jin D, Jomoto S, Jonn B, Jun H, Kaftan T, Kaiser Ł, Kamali A, Kanitscheider I, Shirish Keskar N, Khan T, Kilpatrick L, Kim JW, Kim C, Kim Y, Hendrik Kirchner J, Kiros J, Knight M, Kokotajlo D, Kondraciuk Ł, Kondrich A, Konstantinidis A, Kosic K, Krueger G, Kuo V, Lampe M, Lan I, Lee T, Leike J, Leung J, Levy D, Li CM, Lim R, Lin M, Lin S, Litwin M, Lopez T, Lowe R, Lue P, Makanju A, Malfacini K, Manning S, Markov T, Markovski Y, Martin B, Mayer K, Mayne A, McGrew B, McKinney SM, McLeavey C, McMillan P, McNeil J, Medina D, Mehta A, Menick J, Metz L, Mishchenko A, Mishkin P, Monaco V, Morikawa E, Mossing D, Mu T, Murati M, Murk O, Mély D, Nair A, Nakano R, Nayak R, Neelakantan A, Ngo R, Noh H, Ouyang L, O'Keefe C, Pachocki J, Paino A, Palermo J, Pantuliano A, Parascandolo G, Parish J, Parparita E, Passos A, Pavlov M, Peng A, Perelman A, de Avila Belbute Peres F, Petrov M, Ponde de Oliveira Pinto H, Michael, Pokorny, Pokrass M, Pong VH, Powell T, Power A, Power B, Proehl E, Puri R, Radford A; OpenAI. GPT-4 Technical Report. 2023 Preprint. Available from: eprint arXiv:2303.08774. [DOI] [Full Text] |
| 20. | Guo D, Yang D, Zhang H, Song J, Zhang R, Xu R, Zhu Q, Ma S, Wang P, Bi X, Zhang X, Yu X, Wu Y, Wu ZF, Gou Z, Shao Z, Li Z, Gao Z, Liu A, Xue B, Wang B, Wu B, Feng B, Lu C, Zhao C, Deng C, Zhang C, Ruan C, Dai D, Chen D, Ji D, Li E, Lin F, Dai F, Luo F, Hao G, Chen G, Li G, Zhang H, Bao H, Xu H, Wang H, Ding H, Xin H, Gao H, Qu H, Li H, Guo J, Li J, Wang J, Chen J, Yuan J, Qiu J, Li J, Cai JL, Ni J, Liang J, Chen J, Dong K, Hu K, Gao K, Guan K, Huang K, Yu K, Wang L, Zhang L, Zhao L, Wang L, Zhang L, Xu L, Xia L, Zhang M, Zhang M, Tang M, Li M, Wang M, Li M, Tian N, Huang P, Zhang P, Wang Q, Chen Q, Du Q, Ge R, Zhang R, Pan R, Wang R, Chen RJ, Jin RL, Chen R, Lu S, Zhou S, Chen S, Ye S, Wang S, Yu S, Zhou S, Pan S, Li SS, Zhou S, Wu S, Ye S, Yun T, Pei T, Sun T, Wang T, Zeng W, Zhao W, Liu W, Liang W, Gao W, Yu W, Zhang W, Xiao WL, An W, Liu X, Wang X, Chen X, Nie X, Cheng X, Liu X, Xie X, Liu X, Yang X, Li X, Su X, Lin X, Li XQ, Jin X, Shen X, Chen X, Sun X, Wang X, Song X, Zhou X, Wang X, Shan X, Li YK, Wang YQ, Wei YX, Zhang Y, Xu Y, Li Y, Zhao Y, Sun Y, Wang Y, Yu Y, Zhang Y, Shi Y, Xiong Y, He Y, Piao Y, Wang Y, Tan Y, Ma Y, Liu Y, Guo Y, Ou Y, Wang Y, Gong Y, Zou Y, He Y, Xiong Y, Luo Y, You Y, Liu Y, Zhou Y, Zhu YX, Xu Y, Huang Y, Li Y, Zheng Y, Zhu Y, Ma Y, Tang Y, Zha Y, Yan Y, Ren ZZ, Ren Z, Sha Z, Fu Z, Xu Z, Xie Z, Zhang Z, Hao Z, Ma Z, Yan Z, Wu Z, Gu Z, Zhu Z, Liu Z, Li Z, Xie Z, Song Z, Pan Z, Huang Z, Xu Z, Zhang Z, Zhang Z; DeepSeek-AI. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. 2025 Preprint. Available from: eprint arXiv:2501.12948. [DOI] [Full Text] |
| 21. | Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Krueger G, Sutskever I. Learning Transferable Visual Models From Natural Language Supervision. In: Meila M, Zhang T, editors. Proceedings of Machine Learning Research. Proceedings of the 38th International Conference on Machine Learning. PMLR, 2021: 8748-8763. |
| 22. | Pai S, Bontempi D, Hadzic I, Prudente V, Sokač M, Chaunzwa TL, Bernatz S, Hosny A, Mak RH, Birkbak NJ, Aerts HJWL. Foundation model for cancer imaging biomarkers. Nat Mach Intell. 2024;6:354-367. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 5] [Cited by in RCA: 70] [Article Influence: 70.0] [Reference Citation Analysis (0)] |
| 23. | Shen D, Wu G, Suk HI. Deep Learning in Medical Image Analysis. Annu Rev Biomed Eng. 2017;19:221-248. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 2581] [Cited by in RCA: 2033] [Article Influence: 254.1] [Reference Citation Analysis (0)] |
| 24. | Alsentzer E, Murphy J, Boag W, Weng W, Jindi D, Naumann T, Mcdermott M. Publicly Available Clinical BERT Embeddings. In: Rumshisky A, Roberts K, Bethard S, Naumann T, editors. Proceedings of the 2nd Clinical Natural Language Processing Workshop. Minneapolis, MN, United States: Association for Computational Linguistics, 2019: 72-78. [DOI] [Full Text] |
| 25. | Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J, Houlsby N. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. 9th International Conference on Learning Representations. Austria: ICLR, 2021. |
| 26. | Zhou B, Yang G, Shi Z, Ma S. Natural Language Processing for Smart Healthcare. IEEE Rev Biomed Eng. 2024;17:4-18. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 58] [Cited by in RCA: 34] [Article Influence: 34.0] [Reference Citation Analysis (0)] |
| 27. | Hou JK, Imler TD, Imperiale TF. Current and future applications of natural language processing in the field of digestive diseases. Clin Gastroenterol Hepatol. 2014;12:1257-1261. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 22] [Cited by in RCA: 26] [Article Influence: 2.4] [Reference Citation Analysis (0)] |
| 28. | Team G, Anil R, Borgeaud S, Alayrac JB, Yu J, Soricut R, Schalkwyk J, Dai AM, Hauth A, Millican K, Silver D, Johnson M, Antonoglou I, Schrittwieser J, Glaese A, Chen J, Pitler E, Lillicrap T, Lazaridou A, Firat O, Molloy J, Isard M, Barham PR, Hennigan T, Lee B, Viola F, Reynolds M, Xu Y, Doherty R, Collins E, Meyer C, Rutherford E, Moreira E, Ayoub K, Goel M, Krawczyk J, Du C, Chi E, Cheng H-T, Ni E, Shah P, Kane P, Chan B, Faruqui M, Severyn A, Lin H, Li Y, Cheng Y, Ittycheriah A, Mahdieh M, Chen M, Sun P, Tran D, Bagri S, Lakshminarayanan B, Liu J, Orban A, Güra F, Zhou H, Song X, Boffy A, Ganapathy H, Zheng S, Choe H, Weisz Á, Zhu T, Lu Y, Gopal S, Kahn J, Kula M, Pitman J, Shah R, Taropa E, Al Merey M, Baeuml M, Chen Z, El Shafey L, Zhang Y, Sercinoglu O, Tucker G, Piqueras E, Krikun M, Barr I, Savinov N, Danihelka I, Roelofs B, White A, Andreassen A, von Glehn T, Yagati L, Kazemi M, Gonzalez L, Khalman M, Sygnowski J, Frechette A, Smith C, Culp L, Proleev L, Luan Y, Chen X, Lottes J, Schucher N, Lebron F, Rrustemi A, Clay N, Crone P, Kocisky T, Zhao J, Perz B, Yu D, Howard H, Bloniarz A, Rae JW, Lu H, Sifre L, Maggioni M, Alcober F, Garrette D, Barnes M, Thakoor S, Austin J, Barth-Maron G, Wong W, Joshi R, Chaabouni R, Fatiha D, Ahuja A, Singh Tomar G, Senter E, Chadwick M, Kornakov I, Attaluri N, Iturrate I, Liu R, Li Y, Cogan S, Chen J, Jia C, Gu C, Zhang Q, Grimstad J, Jakse Hartman A, Garcia X, Sankaranarayana Pillai T, Devlin J, Laskin M, de Las Casas D, Valter D, Tao C, Blanco L, Puigdomènech Badia A, Reitter D, Chen M, Brennan J, Rivera C, Brin S, Iqbal S, Surita G, Labanowski J, Rao A, Winkler S, Parisotto E, Gu Y, Olszewska K, Addanki R, Miech A, Louis A, Teplyashin D, Brown G, Catt E, Balaguer J, Xiang J, Wang P, Ashwood Z, Briukhov A, Webson A, Ganapathy S, Sanghavi S, Kannan A, Chang M-W, Stjerngren A, Djolonga J, Sun Y, Bapna A, Aitchison M, Pejman P, Michalewski H, Yu T, Wang C, Love J, Ahn J, Bloxwich D, Han K, Humphreys P, Sellam T, Bradbury J, Godbole V, Samangooei S, Damoc B, Kaskasoli A. Gemini: A Family of Highly Capable Multimodal Models. 2023 Preprint. Available from: eprint arXiv:2312.11805. [DOI] [Full Text] |
| 29. | Syed S, Angel AJ, Syeda HB, Jennings CF, VanScoy J, Syed M, Greer M, Bhattacharyya S, Zozus M, Tharian B, Prior F. The h-ANN Model: Comprehensive Colonoscopy Concept Compilation Using Combined Contextual Embeddings. Biomed Eng Syst Technol Int Jt Conf BIOSTEC Revis Sel Pap. 2022;5:189-200. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 1] [Cited by in RCA: 7] [Article Influence: 2.3] [Reference Citation Analysis (0)] |
| 30. | Lahat A, Shachar E, Avidan B, Glicksberg B, Klang E. Evaluating the Utility of a Large Language Model in Answering Common Patients' Gastrointestinal Health-Related Questions: Are We There Yet? Diagnostics (Basel). 2023;13:1950. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 65] [Reference Citation Analysis (0)] |
| 31. | Lee TC, Staller K, Botoman V, Pathipati MP, Varma S, Kuo B. ChatGPT Answers Common Patient Questions About Colonoscopy. Gastroenterology. 2023;165:509-511.e7. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 55] [Cited by in RCA: 100] [Article Influence: 50.0] [Reference Citation Analysis (0)] |
| 32. | Emile SH, Horesh N, Freund M, Pellino G, Oliveira L, Wignakumar A, Wexner SD. How appropriate are answers of online chat-based artificial intelligence (ChatGPT) to common questions on colon cancer? Surgery. 2023;174:1273-1275. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 35] [Cited by in RCA: 35] [Article Influence: 17.5] [Reference Citation Analysis (0)] |
| 33. | Moazzam Z, Cloyd J, Lima HA, Pawlik TM. Quality of ChatGPT Responses to Questions Related to Pancreatic Cancer and its Surgical Care. Ann Surg Oncol. 2023;30:6284-6286. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 3] [Cited by in RCA: 20] [Article Influence: 10.0] [Reference Citation Analysis (0)] |
| 34. | Yeo YH, Samaan JS, Ng WH, Ting PS, Trivedi H, Vipani A, Ayoub W, Yang JD, Liran O, Spiegel B, Kuo A. Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma. Clin Mol Hepatol. 2023;29:721-732. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 177] [Cited by in RCA: 370] [Article Influence: 185.0] [Reference Citation Analysis (0)] |
| 35. | Cao JJ, Kwon DH, Ghaziani TT, Kwo P, Tse G, Kesselman A, Kamaya A, Tse JR. Accuracy of Information Provided by ChatGPT Regarding Liver Cancer Surveillance and Diagnosis. AJR Am J Roentgenol. 2023;221:556-559. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 16] [Cited by in RCA: 46] [Article Influence: 23.0] [Reference Citation Analysis (0)] |
| 36. | Gorelik Y, Ghersin I, Arraf T, Ben-ishai O, Klein A, Khamaysi I. Using A Customized Gpt To Provide Guideline-Based Recommendations For The Management Of Pancreatic Mucinous Cystic Lesions. Gastrointest Endosc. 2024;99:AB42. [DOI] [Full Text] |
| 37. | Gorelik Y, Ghersin I, Maza I, Klein A. Harnessing language models for streamlined postcolonoscopy patient management: a novel approach. Gastrointest Endosc. 2023;98:639-641.e4. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 1] [Cited by in RCA: 29] [Article Influence: 14.5] [Reference Citation Analysis (0)] |
| 38. | Zhou J, Li T, James Fong S, Dey N, González Crespo R. Exploring ChatGPT's Potential for Consultation, Recommendations and Report Diagnosis: Gastric Cancer and Gastroscopy Reports’ Case. Int J Interact Multimed Artif Intell. 2023;8:7-13. [DOI] [Full Text] |
| 39. | Yang Z, Lu Y, Bagdasarian J, Das Swain V, Agarwal R, Campbell C, Al-Refaire W, El-Bayoumi J, Gao G, Wang D, Yao B, Shara N. RECOVER: Designing a Large Language Model-based Remote Patient Monitoring System for Postoperative Gastrointestinal Cancer Care. 2025 Preprint. Available from: eprint arXiv:2502.05740. [DOI] [Full Text] |
| 40. | Kerbage A, Kassab J, El Dahdah J, Burke CA, Achkar JP, Rouphael C. Accuracy of ChatGPT in Common Gastrointestinal Diseases: Impact for Patients and Providers. Clin Gastroenterol Hepatol. 2024;22:1323-1325.e3. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 6] [Cited by in RCA: 36] [Article Influence: 36.0] [Reference Citation Analysis (0)] |
| 41. | Tariq R, Malik S, Khanna S. Evolving Landscape of Large Language Models: An Evaluation of ChatGPT and Bard in Answering Patient Queries on Colonoscopy. Gastroenterology. 2024;166:220-221. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 26] [Cited by in RCA: 23] [Article Influence: 23.0] [Reference Citation Analysis (0)] |
| 42. | Maida M, Ramai D, Mori Y, Dinis-Ribeiro M, Facciorusso A, Hassan C; and the AI-CORE (Artificial Intelligence COlorectal cancer Research) Working Group. The role of generative language systems in increasing patient awareness of colon cancer screening. Endoscopy. 2025;57:262-268. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 4] [Cited by in RCA: 11] [Article Influence: 11.0] [Reference Citation Analysis (0)] |
| 43. | Atarere J, Naqvi H, Haas C, Adewunmi C, Bandaru S, Allamneni R, Ugonabo O, Egbo O, Umoren M, Kanth P. Applicability of Online Chat-Based Artificial Intelligence Models to Colorectal Cancer Screening. Dig Dis Sci. 2024;69:791-797. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 17] [Reference Citation Analysis (0)] |
| 44. | Chang PW, Amini MM, Davis RO, Nguyen DD, Dodge JL, Lee H, Sheibani S, Phan J, Buxbaum JL, Sahakian AB. ChatGPT4 Outperforms Endoscopists for Determination of Postcolonoscopy Rescreening and Surveillance Recommendations. Clin Gastroenterol Hepatol. 2024;22:1917-1925.e17. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 1] [Cited by in RCA: 13] [Article Influence: 13.0] [Reference Citation Analysis (0)] |
| 45. | Lim DYZ, Tan YB, Koh JTE, Tung JYM, Sng GGR, Tan DMY, Tan CK. ChatGPT on guidelines: Providing contextual knowledge to GPT allows it to provide advice on appropriate colonoscopy intervals. J Gastroenterol Hepatol. 2024;39:81-106. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 2] [Cited by in RCA: 34] [Article Influence: 34.0] [Reference Citation Analysis (0)] |
| 46. | Munir MM, Endo Y, Ejaz A, Dillhoff M, Cloyd JM, Pawlik TM. Online artificial intelligence platforms and their applicability to gastrointestinal surgical operations. J Gastrointest Surg. 2024;28:64-69. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 14] [Reference Citation Analysis (0)] |
| 47. | Truhn D, Loeffler CM, Müller-Franzes G, Nebelung S, Hewitt KJ, Brandner S, Bressem KK, Foersch S, Kather JN. Extracting structured information from unstructured histopathology reports using generative pre-trained transformer 4 (GPT-4). J Pathol. 2024;262:310-319. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 2] [Cited by in RCA: 45] [Article Influence: 45.0] [Reference Citation Analysis (0)] |
| 48. | Choo JM, Ryu HS, Kim JS, Cheong JY, Baek SJ, Kwak JM, Kim J. Conversational artificial intelligence (chatGPT™) in the management of complex colorectal cancer patients: early experience. ANZ J Surg. 2024;94:356-361. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 6] [Cited by in RCA: 24] [Article Influence: 24.0] [Reference Citation Analysis (0)] |
| 49. | Huo B, Mckechnie T, Ortenzi M, Lee Y, Antoniou S, Mayol J, Ahmed H, Boudreau V, Ramji K, Eskicioglu C. Dr. GPT will see you now: the ability of large language model-linked chatbots to provide colorectal cancer screening recommendations. Health Technol. 2024;14:463-469. [RCA] [DOI] [Full Text] [Cited by in Crossref: 6] [Cited by in RCA: 15] [Article Influence: 15.0] [Reference Citation Analysis (0)] |
| 50. | Pereyra L, Schlottmann F, Steinberg L, Lasa J. Colorectal Cancer Prevention: Is Chat Generative Pretrained Transformer (Chat GPT) ready to Assist Physicians in Determining Appropriate Screening and Surveillance Recommendations? J Clin Gastroenterol. 2024;58:1022-1027. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 2] [Cited by in RCA: 9] [Article Influence: 9.0] [Reference Citation Analysis (0)] |
| 51. | Peng W, Feng Y, Yao C, Zhang S, Zhuo H, Qiu T, Zhang Y, Tang J, Gu Y, Sun Y. Evaluating AI in medicine: a comparative analysis of expert and ChatGPT responses to colorectal cancer questions. Sci Rep. 2024;14:2840. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 21] [Reference Citation Analysis (0)] |
| 52. | Ma H, Ma X, Yang C, Niu Q, Gao T, Liu C, Chen Y. Development and evaluation of a program based on a generative pre-trained transformer model from a public natural language processing platform for efficiency enhancement in post-procedural quality control of esophageal endoscopic submucosal dissection. Surg Endosc. 2024;38:1264-1272. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 1] [Cited by in RCA: 3] [Article Influence: 3.0] [Reference Citation Analysis (0)] |
| 53. | Cohen AB, Adamson B, Larch JK, Amster G. Large Language Model Extraction of PD-L1 Biomarker Testing Details From Electronic Health Records. AI Precis Oncol. 2025;2:57-64. [DOI] [Full Text] |
| 54. | Scherbakov D, Heider PM, Wehbe R, Alekseyenko AV, Lenert LA, Obeid JS. Using large language models for extracting stressful life events to assess their impact on preventive colon cancer screening adherence. BMC Public Health. 2025;25:12. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 4] [Reference Citation Analysis (0)] |
| 55. | Chatziisaak D, Burri P, Sparn M, Hahnloser D, Steffen T, Bischofberger S. Concordance of ChatGPT artificial intelligence decision-making in colorectal cancer multidisciplinary meetings: retrospective study. BJS Open. 2025;9:zraf040. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in RCA: 6] [Reference Citation Analysis (0)] |
| 56. | Saraiva MM, Ribeiro T, Agudo B, Afonso J, Mendes F, Martins M, Cardoso P, Mota J, Almeida MJ, Costa A, Gonzalez Haba Ruiz M, Widmer J, Moura E, Javed A, Manzione T, Nadal S, Barroso LF, de Parades V, Ferreira J, Macedo G. Evaluating ChatGPT-4 for the Interpretation of Images from Several Diagnostic Techniques in Gastroenterology. J Clin Med. 2025;14:572. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 3] [Reference Citation Analysis (0)] |
| 57. | Siu AHY, Gibson DP, Chiu C, Kwok A, Irwin M, Christie A, Koh CE, Keshava A, Reece M, Suen M, Rickard MJFX. ChatGPT as a patient education tool in colorectal cancer-An in-depth assessment of efficacy, quality and readability. Colorectal Dis. 2025;27:e17267. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 5] [Reference Citation Analysis (0)] |
| 58. | Horesh N, Emile SH, Gupta S, Garoufalia Z, Gefen R, Zhou P, da Silva G, Wexner SD. Comparing the Management Recommendations of Large Language Model and Colorectal Cancer Multidisciplinary Team: A Pilot Study. Dis Colon Rectum. 2025;68:41-47. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 2] [Cited by in RCA: 6] [Article Influence: 6.0] [Reference Citation Analysis (0)] |
| 59. | Ellison IE, Oslock WM, Abdullah A, Wood L, Thirumalai M, English N, Jones BA, Hollis R, Rubyan M, Chu DI. De novo generation of colorectal patient educational materials using large language models: Prompt engineering key to improved readability. Surgery. 2025;180:109024. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 2] [Cited by in RCA: 4] [Article Influence: 4.0] [Reference Citation Analysis (0)] |
| 60. | Ramchandani R, Guo E, Rakab E, Rathod J, Strain J, Klement W, Shorr R, Williams E, Jones D, Gilbert S. Validation of automated paper screening for esophagectomy systematic review using large language models. PeerJ Comput Sci. 2025;11:e2822. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in RCA: 1] [Reference Citation Analysis (0)] |
| 61. | Zhang H, Dong F, Li W, Ren Y, Dong H. HepatoAudit: A Comprehensive Dataset for Evaluating Consistency of Large Language Models in Hepatobiliary Case Record Diagnosis. 2025 IEEE 17th International Conference on Computer Research and Development (ICCRD); 2025 Jan 17-19; Shangrao, China. IEEE, 2025: 234-239. [DOI] [Full Text] |
| 62. | Spitzl D, Mergen M, Bauer U, Jungmann F, Bressem KK, Busch F, Makowski MR, Adams LC, Gassert FT. Leveraging large language models for accurate classification of liver lesions from MRI reports. Comput Struct Biotechnol J. 2025;27:2139-2146. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 2] [Cited by in RCA: 4] [Article Influence: 4.0] [Reference Citation Analysis (0)] |
| 63. | Sheng L, Chen Y, Wei H, Che F, Wu Y, Qin Q, Yang C, Wang Y, Peng J, Bashir MR, Ronot M, Song B, Jiang H. Large Language Models for Diagnosing Focal Liver Lesions From CT/MRI Reports: A Comparative Study With Radiologists. Liver Int. 2025;45:e70115. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 2] [Reference Citation Analysis (0)] |
| 64. | Williams CY, Sarkar U, Adler-milstein J, Rotenstein L. Using Large Language Models to Determine Reasons for Missed Colon Cancer Screening Follow-Up. 2025 Preprint. Available from: medrxiv:25329439. [DOI] [Full Text] |
| 65. | Lu K, Lu J, Xu H, Guo K, Zhang Q, Lin H, Grosser M, Zhang Y, Zhang G. Genomics-Enhanced Cancer Risk Prediction for Personalized LLM-Driven Healthcare Recommender Systems. ACM Trans Inf Syst. 2025;43:1-30. [DOI] [Full Text] |
| 66. | Yang X, Xiao Y, Liu D, Zhang Y, Deng H, Huang J, Shi H, Liu D, Liang M, Jin X, Sun Y, Yao J, Zhou X, Guo W, He Y, Tang W, Xu C. Enhancing doctor-patient communication using large language models for pathology report interpretation. BMC Med Inform Decis Mak. 2025;25:36. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 3] [Cited by in RCA: 8] [Article Influence: 8.0] [Reference Citation Analysis (0)] |
| 67. | Jain S, Chakraborty B, Agarwal A, Sharma R. Performance of Large Language Models (ChatGPT and Gemini Advanced) in Gastrointestinal Pathology and Clinical Review of Applications in Gastroenterology. Cureus. 2025;17:e81618. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 2] [Reference Citation Analysis (0)] |
| 68. | Xu J, Wang J, Li J, Zhu Z, Fu X, Cai W, Song R, Wang T, Li H. Predicting Immunotherapy Response in Unresectable Hepatocellular Carcinoma: A Comparative Study of Large Language Models and Human Experts. J Med Syst. 2025;49:64. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 2] [Reference Citation Analysis (0)] |
| 69. | Deroy A, Maity S. Cancer-Answer: Empowering Cancer Care with Advanced Large Language Models. 2025 Preprint. Available from: eprint arXiv:2411.06946. [DOI] [Full Text] |
| 70. | Ye X, Shi T, Huang D, Sakurai T. Multi-Omics clustering by integrating clinical features from large language model. Methods. 2025;239:64-71. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 1] [Reference Citation Analysis (0)] |
| 71. | Ma J, He Y, Li F, Han L, You C, Wang B. Segment anything in medical images. Nat Commun. 2024;15:654. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 267] [Cited by in RCA: 530] [Article Influence: 530.0] [Reference Citation Analysis (0)] |
| 72. | Ryu JS, Kang H, Chu Y, Yang S. Vision-language foundation models for medical imaging: a review of current practices and innovations. Biomed Eng Lett. 2025;15:809-830. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in RCA: 2] [Reference Citation Analysis (0)] |
| 73. | Rao VM, Hla M, Moor M, Adithan S, Kwak S, Topol EJ, Rajpurkar P. Multimodal generative AI for medical image interpretation. Nature. 2025;639:888-896. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 1] [Cited by in RCA: 18] [Article Influence: 18.0] [Reference Citation Analysis (0)] |
| 74. | Zhang S, Xu Y, Usuyama N, Xu H, Bagga J, Tinn R, Preston S, Rao R, Wei M, Valluri N, Wong C, Tupini A, Wang Y, Mazzola M, Shukla S, Liden L, Gao J, Crabtree A, Piening B, Bifulco C, Lungren MP, Naumann T, Wang S, Poon H. A Multimodal Biomedical Foundation Model Trained from Fifteen Million Image-Text Pairs. NEJM AI. 2025;2. [DOI] [Full Text] |
| 75. | Zippelius C, Alqahtani SA, Schedel J, Brookman-Amissah D, Muehlenberg K, Federle C, Salzberger A, Schorr W, Pech O. Diagnostic accuracy of a novel artificial intelligence system for adenoma detection in daily practice: a prospective nonrandomized comparative study. Endoscopy. 2022;54:465-472. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 8] [Cited by in RCA: 20] [Article Influence: 6.7] [Reference Citation Analysis (0)] |
| 76. | Cui B, Islam M, Bai L, Ren H. Surgical-DINO: adapter learning of foundation models for depth estimation in endoscopic surgery. Int J Comput Assist Radiol Surg. 2024;19:1013-1020. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 1] [Cited by in RCA: 6] [Article Influence: 6.0] [Reference Citation Analysis (0)] |
| 77. | Wang J, Song S, Wang X, Wang Y, Miao Y, Su J, Zhou SK. ProMISe: Promptable Medical Image Segmentation using SAM. 2024 Preprint. Available from: eprint arXiv:2403.04164. [DOI] [Full Text] |
| 78. | Li Y, Hu M, Yang X. Polyp-SAM: transfer SAM for polyp segmentation. Proceedings of the Medical Imaging 2024: Computer-Aided Diagnosis; 2024 Feb 18-22; San Diego, CA, United States. SPIE, 2024: 759-765. [DOI] [Full Text] |
| 79. | Wang Z, Liu C, Zhang S, Dou Q. Foundation Model for Endoscopy Video Analysis via Large-Scale Self-supervised Pre-train. In: Greenspan H, Madabhushi A, Mousavi P, Salcudean S, Duncan J, Syeda-Mahmood T, Taylor R, editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14228. Cham: Springer, 2023. [DOI] [Full Text] |
| 80. | Ji GP, Liu J, Xu P, Barnes N, Shahbaz Khan F, Khan S, Fan DP. Frontiers in Intelligent Colonoscopy. 2024 Preprint. Available from: eprint arXiv:2410.17241. [DOI] [Full Text] |
| 81. | Raseena TP, Kumar J, Balasundaram SR. DeepCPD: deep learning with vision transformer for colorectal polyp detection. Multimed Tools Appl. 2024;83:78183-78206. [DOI] [Full Text] |
| 82. | Teufel T, Shu H, Soberanis-Mukul RD, Mangulabnan JE, Sahu M, Vedula SS, Ishii M, Hager G, Taylor RH, Unberath M. OneSLAM to map them all: a generalized approach to SLAM for monocular endoscopic imaging based on tracking any point. Int J Comput Assist Radiol Surg. 2024;19:1259-1266. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 4] [Reference Citation Analysis (0)] |
| 83. | Liu Y, Yuan X, Zhou Y. EIVS: Unpaired Endoscopy Image Virtual Staining via State Space Generative Model. 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); 2024 Dec 03-06; Lisbon, Portugal. IEEE, 2025. [DOI] [Full Text] |
| 84. | Jing X, Zhou H, Mao K, Zhao Y, Chu L. A Novel Automatic Prompt Tuning Method for Polyp Segmentation. 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); 2024 Dec 03-06; Lisbon, Portugal. IEEE, 2025. [DOI] [Full Text] |
| 85. | He D, Ma Z, Li C, Li Y. Dual-Branch Fully Convolutional Segment Anything Model for Lesion Segmentation in Endoscopic Images. IEEE Access. 2024;12:125654-125667. [DOI] [Full Text] |
| 86. | Li F, Huang Z, Zhou L, Chen Y, Tang S, Ding P, Peng H, Chu Y. Improved dual-aggregation polyp segmentation network combining a pyramid vision transformer with a fully convolutional network. Biomed Opt Express. 2024;15:2590-2621. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 1] [Reference Citation Analysis (0)] |
| 87. | Dermyer P, Kalra A, Schwartz M. EndoDINO: A Foundation Model for GI Endoscopy. 2025 Preprint. Available from: eprint arXiv:2501.05488. [DOI] [Full Text] |
| 88. | Choudhuri A, Gao Z, Zheng M, Planche B, Chen T, Wu Z. PolypSegTrack: Unified Foundation Model for Colonoscopy Video Analysis. 2025 Preprint. Available from: eprint arXiv:2503.24108. [DOI] [Full Text] |
| 89. | Chen H, Gou L, Fang Z, Dou Q, Chen H, Chen C, Qiu Y, Zhang J, Ning C, Hu Y, Deng H, Yu J, Li G. Artificial intelligence assisted real-time recognition of intra-abdominal metastasis during laparoscopic gastric cancer surgery. NPJ Digit Med. 2025;8:9. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 8] [Reference Citation Analysis (0)] |
| 90. | Mostafijur Rahman M, Munir M, Jha D, Bagci U, Marculescu R. PP-SAM: Perturbed Prompts for Robust Adaptation of Segment Anything Model for Polyp Segmentation. 2024 Preprint. Available from: eprint arXiv:2405.16740. [DOI] [Full Text] |
| 91. | Wang G, Xiao H, Gao H, Zhang R, Bai L, Yang X, Li Z, Li H, Ren H. CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection. 2024 Preprint. Available from: eprint arXiv:2410.07540. [DOI] [Full Text] |
| 92. | Tan S, Cai Y, Lin X, Qi W, Li Z, Wan X, Li G. ColonCLIP: An Adaptable Prompt-Driven Multi-Modal Strategy for Colonoscopy Image Diagnosis. 2024 IEEE International Symposium on Biomedical Imaging (ISBI); 2024 May 27-30; Athens, Greece. IEEE, 2024. [DOI] [Full Text] |
| 93. | Yu J, Zhu Y, Fu P, Chen T, Huang J, Li Q, Zhou P, Wang Z, Wu F, Wang S, Yang X. Robust Polyp Detection and Diagnosis through Compositional Prompt-Guided Diffusion Models. IEEE Trans Med Imaging. 2025;PP. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 1] [Reference Citation Analysis (0)] |
| 94. | Sharma V, Jha D, Bhuyan MK, Das PK, Bagci U. Diverse Image Generation with Diffusion Models and Cross Class Label Learning for Polyp Classification. 2025 Preprint. Available from: eprint arXiv:2502.05444. [DOI] [Full Text] |
| 95. | Karaosmanoglu AD, Onur MR, Arellano RS. Imaging in Gastrointestinal Cancers. In: Yalcin S, Philip P, editors. Textbook of Gastrointestinal Oncology. Cham: Springer, 2019. [DOI] [Full Text] |
| 96. | Chong JJR, Kirpalani A, Moreland R, Colak E. Artificial Intelligence in Gastrointestinal Imaging: Advances and Applications. Radiol Clin North Am. 2025;63:477-490. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 1] [Reference Citation Analysis (0)] |
| 97. | Wu C, Zhang X, Zhang Y, Wang Y, Xie W. Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data. 2023 Preprint. Available from: eprint arXiv:2308.02463. [DOI] [Full Text] |
| 98. | Cherti M, Beaumont R, Wightman R, Wortsman M, Ilharco G, Gordon C, Schuhmann C, Schmidt L, Jitsev J. Reproducible Scaling Laws for Contrastive Language-Image Learning. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2023 Jun 17-24; Vancouver, BC, Canada. IEEE, 2023. [DOI] [Full Text] |
| 99. | Blankemeier L, Cohen JP, Kumar A, Van Veen D, Gardezi SJS, Paschali M, Chen Z, Delbrouck JB, Reis E, Truyts C, Bluethgen C, Jensen MEK, Ostmeier S, Varma M, Valanarasu JMJ, Fang Z, Huo Z, Nabulsi Z, Ardila D, Weng WH, Amaro E, Ahuja N, Fries J, Shah NH, Johnston A, Boutin RD, Wentland A, Langlotz CP, Hom J, Gatidis S, Chaudhari AS. Merlin: A Vision Language Foundation Model for 3D Computed Tomography. Res Sq. 2024;rs.3.rs-4546309. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 24] [Cited by in RCA: 20] [Article Influence: 20.0] [Reference Citation Analysis (0)] |
| 100. | Saab K, Tu T, Weng WH, Tanno R, Stutz D, Wulczyn E, Zhang F, Strother T, Park C, Vedadi E, Zambrano Chaves J, Hu SY, Schaekermann M, Kamath A, Cheng Y, Barrett DGT, Cheung C, Mustafa B, Palepu A, McDuff D, Hou L, Golany T, Liu L, Alayrac JB, Houlsby N, Tomasev N, Freyberg J, Lau C, Kemp J, Lai J, Azizi S, Kanada K, Man S, Kulkarni K, Sun R, Shakeri S, He L, Caine B, Webson A, Latysheva N, Johnson M, Mansfield P, Lu J, Rivlin E, Anderson J, Green B, Wong R, Krause J, Shlens J, Dominowska E, Eslami SMA, Chou K, Cui C, Vinyals O, Kavukcuoglu K, Manyika J, Dean J, Hassabis D, Matias Y, Webster D, Barral J, Corrado G, Semturs C, Mahdavi SS, Gottweis J, Karthikesalingam A, Natarajan V. Capabilities of Gemini Models in Medicine. 2024 Preprint. Available from: eprint arXiv:2404.18416. [DOI] [Full Text] |
| 101. | Kiraly AP, Baur S, Philbrick K, Mahvar F, Yatziv L, Chen T, Sterling B, George N, Jamil F, Tang J, Bailey K, Ahmed F, Goel A, Ward A, Yang L, Sellergren A, Matias Y, Hassidim A, Shetty S, Golden D, Azizi S, Steiner DF, Liu Y, Thelin T, Pilgrim R, Kirmizibayrak C. Health AI Developer Foundations. 2024 Preprint. Available from: eprint arXiv:2411.15128. [DOI] [Full Text] |
| 102. | Pai S, Hadzic I, Bontempi D, Bressem K, Kann BH, Fedorov A, Mak RH, Aerts HJWL. Vision Foundation Models for Computed Tomography. 2025 Preprint. Available from: eprint arXiv:2501.09001. [DOI] [Full Text] |
| 103. | Zhou HY, Nicolás Acosta J, Adithan S, Datta S, Topol EJ, Rajpurkar P. MedVersa: A Generalist Foundation Model for Medical Image Interpretation. 2024 Preprint. Available from: eprint arXiv:2405.07988. [DOI] [Full Text] |
| 104. | Zhou F, Xu Y, Cui Y, Zhang S, Zhu Y, He W, Wang J, Wang X, Chan R, Lau LHS, Han C, Zhang D, Li Z, Chen H. iMD4GC: Incomplete Multimodal Data Integration to Advance Precise Treatment Response Prediction and Survival Analysis for Gastric Cancer. 2024 Preprint. Available from: eprint arXiv:2404.01192. [DOI] [Full Text] |
| 105. | Yasaka K, Kawamura M, Sonoda Y, Kubo T, Kiryu S, Abe O. Large multimodality model fine-tuned for detecting breast and esophageal carcinomas on CT: a preliminary study. Jpn J Radiol. 2025;43:779-786. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 1] [Reference Citation Analysis (0)] |
| 106. | Dika E, Curti N, Giampieri E, Veronesi G, Misciali C, Ricci C, Castellani G, Patrizi A, Marcelli E. Advantages of manual and automatic computer-aided compared to traditional histopathological diagnosis of melanoma: A pilot study. Pathol Res Pract. 2022;237:154014. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 7] [Reference Citation Analysis (0)] |
| 107. | Hanna MG, Parwani A, Sirintrapun SJ. Whole Slide Imaging: Technology and Applications. Adv Anat Pathol. 2020;27:251-259. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 18] [Cited by in RCA: 79] [Article Influence: 15.8] [Reference Citation Analysis (0)] |
| 108. | Niazi MKK, Parwani AV, Gurcan MN. Digital pathology and artificial intelligence. Lancet Oncol. 2019;20:e253-e261. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 507] [Cited by in RCA: 684] [Article Influence: 114.0] [Reference Citation Analysis (0)] |
| 109. | da Silva LM, Pereira EM, Salles PG, Godrich R, Ceballos R, Kunz JD, Casson A, Viret J, Chandarlapaty S, Ferreira CG, Ferrari B, Rothrock B, Raciti P, Reuter V, Dogdas B, DeMuth G, Sue J, Kanan C, Grady L, Fuchs TJ, Reis-Filho JS. Independent real-world application of a clinical-grade automated prostate cancer detection system. J Pathol. 2021;254:147-158. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in RCA: 104] [Reference Citation Analysis (0)] |
| 110. | Kang M, Song H, Park S, Yoo D, Pereira S. Benchmarking Self-Supervised Learning on Diverse Pathology Datasets. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2023 Jun 17-24; Vancouver, BC, Canada. IEEE, 2023. [DOI] [Full Text] |
| 111. | Wang X, Yang S, Zhang J, Wang M, Zhang J, Yang W, Huang J, Han X. Transformer-based unsupervised contrastive learning for histopathological image classification. Med Image Anal. 2022;81:102559. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 247] [Reference Citation Analysis (0)] |
| 112. | Filiot A, Ghermi R, Olivier A, Jacob P, Fidon L, Camara A, Mac Kain A, Saillard C, Schiratti J. Scaling Self-Supervised Learning for Histopathology with Masked Image Modeling. 2024 Preprint. Available from: medrxiv:23292757. [DOI] [Full Text] |
| 113. | Azizi S, Culp L, Freyberg J, Mustafa B, Baur S, Kornblith S, Chen T, Tomasev N, Mitrović J, Strachan P, Mahdavi SS, Wulczyn E, Babenko B, Walker M, Loh A, Chen PC, Liu Y, Bavishi P, McKinney SM, Winkens J, Roy AG, Beaver Z, Ryan F, Krogue J, Etemadi M, Telang U, Liu Y, Peng L, Corrado GS, Webster DR, Fleet D, Hinton G, Houlsby N, Karthikesalingam A, Norouzi M, Natarajan V. Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging. Nat Biomed Eng. 2023;7:756-779. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 85] [Reference Citation Analysis (0)] |
| 114. | Vorontsov E, Bozkurt A, Casson A, Shaikovski G, Zelechowski M, Severson K, Zimmermann E, Hall J, Tenenholtz N, Fusi N, Yang E, Mathieu P, van Eck A, Lee D, Viret J, Robert E, Wang YK, Kunz JD, Lee MCH, Bernhard JH, Godrich RA, Oakley G, Millar E, Hanna M, Wen H, Retamero JA, Moye WA, Yousfi R, Kanan C, Klimstra DS, Rothrock B, Liu S, Fuchs TJ. A foundation model for clinical-grade computational pathology and rare cancers detection. Nat Med. 2024;30:2924-2935. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 2] [Cited by in RCA: 135] [Article Influence: 135.0] [Reference Citation Analysis (0)] |
| 115. | Zimmermann E, Vorontsov E, Viret J, Casson A, Zelechowski M, Shaikovski G, Tenenholtz N, Hall J, Klimstra D, Yousfi R, Fuchs T, Fusi N, Liu S, Severson K. Virchow2: Scaling Self-Supervised Mixed Magnification Models in Pathology. 2024 Preprint. Available from: eprint arXiv:2408.00738. [DOI] [Full Text] |
| 116. | Filiot A, Jacob P, Mac Kain A, Saillard C. Phikon-v2, A large and public feature extractor for biomarker prediction. 2024 Preprint. Available from: eprint arXiv:2409.09173. [DOI] [Full Text] |
| 117. | Dippel J, Feulner B, Winterhoff T, Milbich T, Tietz S, Schallenberg S, Dernbach G, Kunft A, Heinke S, Eich M-L, Ribbat-Idel J, Krupar R, Anders P, Prenißl N, Jurmeister P, Horst D, Ruff L, Müller K-R, Klauschen F, Alber M. RudolfV: A Foundation Model by Pathologists for Pathologists. 2024 Preprint. Available from: eprint arXiv:2401.04079. [DOI] [Full Text] |
| 118. | Nechaev D, Pchelnikov A, Ivanova E. Hibou: A Family of Foundational Vision Transformers for Pathology. 2024 Preprint. Available from: eprint arXiv:2406.05074. [DOI] [Full Text] |
| 119. | Jaume G, Vaidya A, Zhang A, Song AH, Chen RJ, Sahai S, Mo D, Madrigal E, Phi Le L, Mahmood F. Multistain Pretraining for Slide Representation Learning in Pathology. In: Leonardis A, Ricci E, Roth S, Russakovsky O, Sattler T, Varol G, editors. Computer Vision - ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15091. Cham: Springer, 2025. [DOI] [Full Text] |
| 120. | Lenz T, Neidlinger P, Ligero M, Wölflein G, van Treeck M, Kather JN. Unsupervised foundation model-agnostic slide-level representation learning. 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2025 Jun 10-17; Nashville, TN, United States. IEEE, 2025. [DOI] [Full Text] |
| 121. | Juyal D, Padigela H, Shah C, Shenker D, Harguindeguy N, Liu Y, Martin B, Zhang Y, Nercessian M, Markey M, Finberg I, Luu K, Borders D, Ashar Javed S, Krause E, Biju R, Sood A, Ma A, Nyman J, Shamshoian J, Chhor G, Sanghavi D, Thibault M, Yu L, Najdawi F, Hipp JA, Fahy D, Glass B, Walk E, Abel J, Pokkalla H, Beck AH, Grullon S. PLUTO: Pathology-Universal Transformer. 2024 Preprint. Available from: eprint arXiv:2405.07905. [DOI] [Full Text] |
| 122. | Chen RJ, Chen C, Li Y, Chen TY, Trister AD, Krishnan RG, Mahmood F. Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2022 Jun 18-24; New Orleans, LA, United States. IEEE, 2022. [DOI] [Full Text] |
| 123. | Hua S, Yan F, Shen T, Ma L, Zhang X. PathoDuet: Foundation models for pathological slide analysis of H&E and IHC stains. Med Image Anal. 2024;97:103289. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 12] [Reference Citation Analysis (0)] |
| 124. | Ai K, Aben N, de Jong ED, Gatopoulos I, Känzig N, Karasikov M, Lagré A, Moser R, van Doorn J, Tang F. Towards Large-Scale Training of Pathology Foundation Models. 2024 Preprint. Available from: eprint arXiv:2404.15217. [DOI] [Full Text] |
| 125. | Yan F, Wu J, Li J, Wang W, Lu J, Chen W, Gao Z, Li J, Yan H, Ma J, Chen M, Lu Y, Chen Q, Wang Y, Ling X, Wang X, Wang Z, Huang Q, Hua S, Liu M, Ma L, Shen T, Zhang X, He Y, Chen H, Zhang S, Wang Z. PathOrchestra: A Comprehensive Foundation Model for Computational Pathology with Over 100 Diverse Clinical-Grade Tasks. 2025 Preprint. Available from: eprint arXiv:2503.24345. [DOI] [Full Text] |
| 126. | Vaidya A, Zhang A, Jaume G, Song AH, Ding T, Wagner SJ, Lu MY, Doucet P, Robertson H, Almagro-Perez C, Chen RJ, ElHarouni D, Ayoub G, Bossi C, Ligon KL, Gerber G, Phi Le L, Mahmood F. Molecular-driven Foundation Model for Oncologic Pathology. 2025 Preprint. Available from: eprint arXiv:2501.16652. [DOI] [Full Text] |
| 127. | Filiot A, Dop N, Tchita O, Riou A, Dubois R, Peeters T, Valter D, Scalbert M, Saillard C, Robin G, Olivier A. Distilling foundation models for robust and efficient models in digital pathology. 2025 Preprint. Available from: eprint arXiv:2501.16239. [DOI] [Full Text] |
| 128. | Nicke T, Schäfer JR, Höfener H, Feuerhake F, Merhof D, Kießling F, Lotz J. Tissue concepts: Supervised foundation models in computational pathology. Comput Biol Med. 2025;186:109621. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 1] [Reference Citation Analysis (0)] |
| 129. | Wang YK, Tydlitatova L, Kunz JD, Oakley G, Chow BKB, Godrich RA, Lee MCH, Aghdam H, Bozkurt A, Zelechowski M, Vanderbilt C, Kanan C, Retamero JA, Hamilton P, Yousfi R, Fuchs TJ, Klimstra DS, Liu S. Screen Them All: High-Throughput Pan-Cancer Genetic and Phenotypic Biomarker Screening from H&E Whole Slide Images. 2024 Preprint. Available from: eprint arXiv:2408.09554. [DOI] [Full Text] |
| 130. | Wu Y, Li S, Du Z, Zhu W. BROW: Better featuRes fOr Whole slide image based on self-distillation. 2023 Preprint. Available from: eprint arXiv:2309.08259. [DOI] [Full Text] |
| 131. | Yang Z, Wei T, Liang Y, Yuan X, Gao R, Xia Y, Zhou J, Zhang Y, Yu Z. A foundation model for generalizable cancer diagnosis and survival prediction from histopathological images. Nat Commun. 2025;16:2366. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 1] [Cited by in RCA: 14] [Article Influence: 14.0] [Reference Citation Analysis (0)] |
| 132. | Alber M, Tietz S, Dippel J, Milbich T, Lesort T, Korfiatis P, Krügener M, Perez Cancer B, Shah N, Möllers A, Seegerer P, Carpen-Amarie A, Standvoss K, Dernbach G, de Jong E, Schallenberg S, Kunft A, Hoffer von Ankershoffen H, Schaeferle G, Duffy P, Redlon M, Jurmeister P, Horst D, Ruff L, Müller K-R, Klauschen F, Norgan A. Atlas: A Novel Pathology Foundation Model by Mayo Clinic, Charité, and Aignostics. 2025 Preprint. Available from: eprint arXiv:2501.05409. [DOI] [Full Text] |
| 133. | Yang Z, Li L, Lin K, Wang J, Lin CC, Liu Z, Wang L. The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision). 2023 Preprint. Available from: eprint arXiv:2309.17421. [DOI] [Full Text] |
| 134. | Wu J, Gan W, Chen Z, Wan S, Yu PS. Multimodal Large Language Models: A Survey. 2023 IEEE International Conference on Big Data (BigData); 2023 Dec 15-18; Sorrento, Italy. IEEE, 2024. [DOI] [Full Text] |
| 135. | Kaczmarczyk R, Wilhelm TI, Martin R, Roos J. Evaluating multimodal AI in medical diagnostics. NPJ Digit Med. 2024;7:205. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 18] [Reference Citation Analysis (0)] |
| 136. | Huang Z, Bianchi F, Yuksekgonul M, Montine TJ, Zou J. A visual-language foundation model for pathology image analysis using medical Twitter. Nat Med. 2023;29:2307-2316. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 1] [Cited by in RCA: 207] [Article Influence: 103.5] [Reference Citation Analysis (0)] |
| 137. | Guo Z, Ma J, Xu Y, Wang Y, Wang L, Chen H. HistGen: Histopathology Report Generation via Local-Global Feature Encoding and Cross-Modal Context Interaction. In: Linguraru MG, Dou Q, Feragen A, Giannarou S, Glocker B, Lekadir K, Schnabel JA, editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15004. Cham: Springer, 2024. [DOI] [Full Text] |
| 138. | Ahmed F, Sellergen A, Yang L, Xu S, Babenko B, Ward A, Olson N, Mohtashamian A, Matias Y, Corrado GS, Duong Q, Webster DR, Shetty S, Golden D, Liu Y, Steiner DF, Wulczyn E. PathAlign: A vision-language model for whole slide images in histopathology. Proceedings of the MICCAI Workshop on Computational Pathology; 2024. Proceedings of Machine Learning Research (PMLR), 2024: 72-108. |
| 139. | Wang X, Zhao J, Marostica E, Yuan W, Jin J, Zhang J, Li R, Tang H, Wang K, Li Y, Wang F, Peng Y, Zhu J, Zhang J, Jackson CR, Zhang J, Dillon D, Lin NU, Sholl L, Denize T, Meredith D, Ligon KL, Signoretti S, Ogino S, Golden JA, Nasrallah MP, Han X, Yang S, Yu KH. A pathology foundation model for cancer diagnosis and prognosis prediction. Nature. 2024;634:970-978. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 87] [Cited by in RCA: 142] [Article Influence: 142.0] [Reference Citation Analysis (0)] |
| 140. | Sun Y, Zhang Y, Si Y, Zhu C, Shui Z, Zhang K, Li J, Lyu X, Lin T, Yang L. PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration. 2024 Preprint. Available from: eprint arXiv:2407.00203. [DOI] [Full Text] |
| 141. | Lu MY, Chen B, Williamson DFK, Chen RJ, Zhao M, Chow AK, Ikemura K, Kim A, Pouli D, Patel A, Soliman A, Chen C, Ding T, Wang JJ, Gerber G, Liang I, Le LP, Parwani AV, Weishaupt LL, Mahmood F. A multimodal generative AI copilot for human pathology. Nature. 2024;634:466-473. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 145] [Cited by in RCA: 127] [Article Influence: 127.0] [Reference Citation Analysis (0)] |
| 142. | Sun Y, Zhu C, Zheng S, Zhang K, Sun L, Shui Z, Zhang Y, Li H, Yang L. PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology. Proc AAAI Conf Artif Intell. 2024;38:5034-5042. [DOI] [Full Text] |
| 143. | Xu H, Usuyama N, Bagga J, Zhang S, Rao R, Naumann T, Wong C, Gero Z, González J, Gu Y, Xu Y, Wei M, Wang W, Ma S, Wei F, Yang J, Li C, Gao J, Rosemon J, Bower T, Lee S, Weerasinghe R, Wright BJ, Robicsek A, Piening B, Bifulco C, Wang S, Poon H. A whole-slide foundation model for digital pathology from real-world data. Nature. 2024;630:181-188. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in RCA: 237] [Reference Citation Analysis (0)] |
| 144. | Ding T, Wagner SJ, Song AH, Chen RJ, Lu MY, Zhang A, Vaidya AJ, Jaume G, Shaban M, Kim A, Williamson DFK, Chen B, Almagro-Perez C, Doucet P, Sahai S, Chen C, Komura D, Kawabe A, Ishikawa S, Gerber G, Peng T, Phi Le L, Mahmood F. Multimodal Whole Slide Foundation Model for Pathology. 2024 Preprint. Available from: eprint arXiv:2411.19666. [DOI] [Full Text] |
| 145. | Lu MY, Chen B, Williamson DFK, Chen RJ, Liang I, Ding T, Jaume G, Odintsov I, Le LP, Gerber G, Parwani AV, Zhang A, Mahmood F. A visual-language foundation model for computational pathology. Nat Med. 2024;30:863-874. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 407] [Cited by in RCA: 239] [Article Influence: 239.0] [Reference Citation Analysis (0)] |
| 146. | Chen Y, Wang G, Ji Y, Li Y, Ye J, Li T, Hu M, Yu R, Qiao Y, He J. SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding. 2024 Preprint. Available from: eprint arXiv:2410.11761. [DOI] [Full Text] |
| 147. | Tan JW, Kim S, Kim E, Lee SH, Ahn S, Jeong W. Clinical-Grade Multi-organ Pathology Report Generation for Multi-scale Whole Slide Images via a Semantically Guided Medical Text Foundation Model. In: Linguraru MG, Dou Q, Feragen A, Giannarou S, Glocker B, Lekadir K, Schnabel JA, editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15004. Cham: Springer, 2024. [DOI] [Full Text] |
| 148. | Chen Z, Chen Y, Sun Y, Tang L, Zhang L, Hu Y, He M, Li Z, Cheng S, Yuan J, Wang Z, Wang Y, Zhao J, Gong J, Zhao L, Cao B, Li G, Zhang X, Dong B, Shen L. Predicting gastric cancer response to anti-HER2 therapy or anti-HER2 combined immunotherapy based on multi-modal data. Signal Transduct Target Ther. 2024;9:222. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 28] [Cited by in RCA: 35] [Article Influence: 35.0] [Reference Citation Analysis (0)] |
| 149. | Zhao W, Guo Z, Fan Y, Jiang Y, Yeung MCF, Yu L. Aligning knowledge concepts to whole slide images for precise histopathology image analysis. NPJ Digit Med. 2024;7:383. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 2] [Reference Citation Analysis (0)] |
| 150. | Ferber D, Wölflein G, Wiest IC, Ligero M, Sainath S, Ghaffari Laleh N, El Nahhas OSM, Müller-Franzes G, Jäger D, Truhn D, Kather JN. In-context learning enables multimodal large language models to classify cancer pathology images. Nat Commun. 2024;15:10104. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in RCA: 33] [Reference Citation Analysis (0)] |
| 151. | Wang J, Wang K, Yu Y, Lu Y, Xiao W, Sun Z, Liu F, Zou Z, Gao Y, Yang L, Zhou HY, Miao H, Zhao W, Huang L, Zeng L, Guo R, Chong I, Deng B, Cheng L, Chen X, Luo J, Zhu MH, Baptista-Hon D, Monteiro O, Li M, Ke Y, Li J, Zeng S, Guan T, Zeng J, Xue K, Oermann E, Luo H, Yin Y, Zhang K, Qu J. Self-improving generative foundation model for synthetic medical image generation and clinical applications. Nat Med. 2025;31:609-617. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 6] [Cited by in RCA: 25] [Article Influence: 25.0] [Reference Citation Analysis (0)] |
| 152. | Zhou Q, Zhong W, Guo Y, Xiao M, Ma H, Huang J. PathM3: A Multimodal Multi-task Multiple Instance Learning Framework for Whole Slide Image Classification and Captioning. In: Linguraru MG, Dou Q, Feragen A, Giannarou S, Glocker B, Lekadir K, Schnabel JA, editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15004. Cham: Springer, 2024. [DOI] [Full Text] |
| 153. | Hu D, Jiang Z, Shi J, Xie F, Wu K, Tang K, Cao M, Huai J, Zheng Y. Histopathology language-image representation learning for fine-grained digital pathology cross-modal retrieval. Med Image Anal. 2024;95:103163. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 5] [Reference Citation Analysis (0)] |
| 154. | Zhang L, Yun B, Xie X, Li Q, Li X, Wang Y. Prompting Whole Slide Image Based Genetic Biomarker Prediction. In: Linguraru MG, Dou Q, Feragen A, Giannarou S, Glocker B, Lekadir K, Schnabel JA, editors. Medical Image Computing and Computer Assisted Intervention - MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15004. Cham: Springer, 2024. [DOI] [Full Text] |
| 155. | Sengupta S, Brown DE. Automatic Report Generation for Histopathology Images Using Pre-Trained Vision Transformers and BERT. 2024 IEEE International Symposium on Biomedical Imaging (ISBI); 2024 May 27-30; Athens, Greece. IEEE, 2024. [DOI] [Full Text] |
| 156. | Xu Y, Wang Y, Zhou F, Ma J, Yang S, Lin H, Wang X, Wang J, Liang L, Han A, Chan RCK, Chen H. A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model. 2024 Preprint. Available from: eprint arXiv:2407.15362. [DOI] [Full Text] |
| 157. | Ferber D, El Nahhas OSM, Wölflein G, Wiest IC, Clusmann J, Leßmann ME, Foersch S, Lammert J, Tschochohei M, Jäger D, Salto-Tellez M, Schultz N, Truhn D, Kather JN. Development and validation of an autonomous artificial intelligence agent for clinical decision-making in oncology. Nat Cancer. 2025;6:1337-1349. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 9] [Cited by in RCA: 19] [Article Influence: 19.0] [Reference Citation Analysis (0)] |
| 158. | Shaikovski G, Casson A, Severson K, Zimmermann E, Wang YK, Kunz JD, Retamero JA, Oakley G, Klimstra D, Kanan C, Hanna M, Zelechowski M, Viret J, Tenenholtz N, Hall J, Fusi N, Yousfi R, Hamilton P, Moye WA, Vorontsov E, Liu S, Fuchs TJ. PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology. 2024 Preprint. Available from: eprint arXiv:2405.10254. [DOI] [Full Text] |
| 159. | Tran M, Schmidle P, Guo RR, Wagner SJ, Koch V, Lupperger V, Novotny B, Murphree DH, Hardway HD, D'Amato M, Lefkes J, Geijs DJ, Feuchtinger A, Böhner A, Kaczmarczyk R, Biedermann T, Amir AL, Mooyaart AL, Ciompi F, Litjens G, Wang C, Comfere NI, Eyerich K, Braun SA, Marr C, Peng T. Generating dermatopathology reports from gigapixel whole slide images with HistoGPT. Nat Commun. 2025;16:4886. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 4] [Reference Citation Analysis (0)] |
| 160. | Dai D, Zhang Y, Yang Q, Xu L, Shen X, Xia S, Wang G. PathologyVLM: a large vision-language model for pathology image understanding. Artif Intell Rev. 2025;58:186. [DOI] [Full Text] |
| 161. | Xiang J, Wang X, Zhang X, Xi Y, Eweje F, Chen Y, Li Y, Bergstrom C, Gopaulchan M, Kim T, Yu KH, Willens S, Olguin FM, Nirschl JJ, Neal J, Diehn M, Yang S, Li R. A vision-language foundation model for precision oncology. Nature. 2025;638:769-778. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 22] [Cited by in RCA: 39] [Article Influence: 39.0] [Reference Citation Analysis (0)] |
| 162. | Deshpande P, Rasin A, Tchoua R, Furst J, Raicu D, Schinkel M, Trivedi H, Antani S. Biomedical heterogeneous data categorization and schema mapping toward data integration. Front Big Data. 2023;6:1173038. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in RCA: 2] [Reference Citation Analysis (0)] |
| 163. | Mohammed Yakubu A, Chen YP. Ensuring privacy and security of genomic data and functionalities. Brief Bioinform. 2020;21:511-526. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 43] [Cited by in RCA: 23] [Article Influence: 4.6] [Reference Citation Analysis (0)] |
| 164. | Shin H, Ryu K, Kim JY, Lee S. Application of privacy protection technology to healthcare big data. Digit Health. 2024;10:20552076241282242. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in RCA: 3] [Reference Citation Analysis (0)] |
| 165. | Quinn TP, Jacobs S, Senadeera M, Le V, Coghlan S. The three ghosts of medical AI: Can the black-box present deliver? Artif Intell Med. 2022;124:102158. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 23] [Cited by in RCA: 72] [Article Influence: 18.0] [Reference Citation Analysis (0)] |
| 166. | Karim MR, Islam T, Shajalal M, Beyan O, Lange C, Cochez M, Rebholz-Schuhmann D, Decker S. Explainable AI for Bioinformatics: Methods, Tools and Applications. Brief Bioinform. 2023;24:bbad236. [RCA] [PubMed] [DOI] [Full Text] [Cited by in RCA: 51] [Reference Citation Analysis (0)] |
| 167. | Caton S, Haas C. Fairness in Machine Learning: A Survey. ACM Comput Surv. 2024;56:1-38. [DOI] [Full Text] |
| 168. | Ong JCL, Chang SY, William W, Butte AJ, Shah NH, Chew LST, Liu N, Doshi-Velez F, Lu W, Savulescu J, Ting DSW. Ethical and regulatory challenges of large language models in medicine. Lancet Digit Health. 2024;6:e428-e432. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 73] [Cited by in RCA: 98] [Article Influence: 98.0] [Reference Citation Analysis (0)] |
| 169. | Hantel A, Walsh TP, Marron JM, Kehl KL, Sharp R, Van Allen E, Abel GA. Perspectives of Oncologists on the Ethical Implications of Using Artificial Intelligence for Cancer Care. JAMA Netw Open. 2024;7:e244077. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 16] [Cited by in RCA: 23] [Article Influence: 23.0] [Reference Citation Analysis (0)] |
| 170. | El Arab RA, Abu-Mahfouz MS, Abuadas FH, Alzghoul H, Almari M, Ghannam A, Seweid MM. Bridging the Gap: From AI Success in Clinical Trials to Real-World Healthcare Implementation-A Narrative Review. Healthcare (Basel). 2025;13:701. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in RCA: 10] [Reference Citation Analysis (0)] |
