BPG is committed to discovery and dissemination of knowledge
Review
Copyright ©The Author(s) 2025.
World J Gastroenterol. Dec 21, 2025; 31(47): 112921
Published online Dec 21, 2025. doi: 10.3748/wjg.v31.i47.112921
Table 1 Summary of common general-purpose foundation models used in gastrointestinal cancer
| Name | Type | Creator | Year | Architecture | Parameters | Modality | OSS | GI cancer applications |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BERT | LLM | Google | 2018 | Encoder-only transformer | 110M (base), 340M (large) | Text | Yes | NLP, Radio, MLLM |
| GPT-3 | LLM | OpenAI | 2020 | Decoder-only transformer | 175B | Text | No | NLP |
| ViT | Vision | Google | 2020 | Encoder-only transformer | 86M (base), 307M (large), 632M (huge) | Image | Yes | Endo, Radio, PA, MLLM |
| DINOv1 | Vision | Meta | 2021 | Encoder-only transformer | 22M, 86M | Image | Yes | Endo, PA |
| CLIP | MM | OpenAI | 2021 | Encoder-encoder | 120-580M | Text, Image | Yes | Endo, Radio, MLLM, directly1 |
| GLM-130B | LLM | Tsinghua | 2022 | Encoder-decoder | 130B | Text | Yes | NLP |
| Stable Diffusion | MM | Stability AI | 2022 | Diffusion model | 1.45B | Text, Image | Yes | NLP, Endo, MLLM, directly |
| BLIP | MM | Salesforce | 2022 | Encoder-decoder | 120M (base), 340M (large) | Text, Image | Yes | Radio, MLLM, directly |
| YouChat | LLM | You.com | 2022 | Fine-tuned LLMs | Unknown | Text | No | NLP |
| Bard | MM | Google | 2023 | Based on PaLM 2 | 340B (estimated) | Text, Image, Audio, Code | No | NLP |
| Bing Chat | MM | Microsoft | 2023 | Fine-tuned GPT-4 | Unknown | Text, Image | No | NLP |
| Mixtral 8x7B | LLM | Mistral AI | 2023 | Decoder-only, mixture-of-experts (MoE) | 46.7B total (12.9B active per token) | Text | Yes | NLP |
| LLaVA | MM | Microsoft | 2023 | Vision encoder + LLM | 7B, 13B | Text, Image | Yes | PA, MLLM |
| DINOv2 | Vision | Meta | 2023 | Encoder-only transformer | 86M to 1.1B | Image | Yes | Endo, Radio, PA, MLLM, directly |
| Claude 2 | LLM | Anthropic | 2023 | Decoder-only transformer | Unknown | Text | No | NLP |
| GPT-4 | MM | OpenAI | 2023 | Decoder-only transformer | 1.8T (estimated) | Text, Image | No | NLP, Endo, MLLM, directly |
| LLaMa 2 | LLM | Meta | 2023 | Decoder-only transformer | 7B, 13B, 34B, 70B | Text | Yes | NLP, Endo, MLLM, directly |
| SAM | Vision | Meta | 2023 | Encoder-decoder | 375M, 1.25G, 2.56G | Image | Yes | Endo, directly |
| GPT-4V | MM | OpenAI | 2023 | MM transformer | 1.8T | Text, Image | No | Endo, MLLM |
| Qwen | LLM | Alibaba | 2023 | Decoder-only transformer | 70B, 180B, 720B | Text | Yes | NLP, MLLM |
| GPT-4o | MM | OpenAI | 2024 | MM transformer | Unknown (larger than GPT-4) | Text, Image, Video | No | NLP |
| LLaMa 3 | LLM | Meta | 2024 | Decoder-only transformer | 8B, 70B, 400B | Text | Yes | NLP, directly |
| Gemini 1.5 | MM | Google | 2024 | MM transformer | 1.6T | Text, Image, Video, Audio | No | NLP, Radio, directly |
| Claude 3.7 | MM | Anthropic | 2024 | Decoder-only transformer | Unknown | Text, Image | No | NLP, directly |
| YOLO-World | Vision | IDEA | 2024 | CNN + RepVL-PAN vision-language fusion | 13-110M (depending on scale) | Text, Image | Yes | Endo, directly |
| DeepSeek | LLM | DeepSeek | 2025 | Decoder-only transformer | 671B | Text | Yes | NLP |
| Phi-4 | LLM | Microsoft | 2025 | Decoder-only transformer | 14B (plus), 7B (mini) | Text | Yes | Endo |
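The parameter counts in Table 1 mix M/B/T suffixes. For readers who want to compare model scales programmatically, a small sketch follows (a hypothetical helper, not part of any cited work) that normalizes such strings to raw counts:

```python
# Convert parameter-count strings such as "110M" or "175B" into
# integer counts, using the table's convention:
# M = million, B = billion, T = trillion.
SUFFIX = {"M": 10**6, "B": 10**9, "T": 10**12}

def parse_param_count(text: str) -> int:
    text = text.strip()
    value, unit = text[:-1], text[-1].upper()
    return int(float(value) * SUFFIX[unit])
```

For example, `parse_param_count("110M")` yields 110000000, making it easy to sort the table's entries by scale.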
Table 2 Summary of key studies of large language models in the field of gastrointestinal cancer
| Ref. | Year | Models | Objectives | Datasets | Performance | Evaluation |
| --- | --- | --- | --- | --- | --- | --- |
| Syed et al[29] | 2022 | BERTi | Developed fine-tuned BERTi for integrated colonoscopy reports | 34165 reports | F1-scores of 91.76%, 92.25%, and 88.55% for colonoscopy, pathology, and radiology | Manual chart review by 4 expert-guided reviewers |
| Lahat et al[30] | 2023 | GPT | Assessed GPT performance in addressing 110 real-world gastrointestinal inquiries | 110 real-life questions | Moderate accuracy (3.4-3.9/5) for treatment and diagnostic queries | Assessed by three gastroenterologists using a 1-5 scale for accuracy etc. |
| Lee et al[31] | 2023 | GPT-3.5 | Examined GPT-3.5's responses to eight frequently asked colonoscopy questions | 8 colonoscopy-related questions | GPT answers had extremely low text similarity (0%-16%) | Four gastroenterologists rated the answers on a 7-point Likert scale |
| Emile et al[32] | 2023 | GPT-3.5 | Analyzed GPT-3.5's ability to generate appropriate responses to CRC questions | 38 CRC questions | 86.8% deemed appropriate, with 95% concordance with 2022 ASCRS guidelines | Three surgery experts assessed answers using ASCRS guidelines |
| Moazzam et al[33] | 2023 | GPT | Investigated the quality of GPT's responses to pancreatic cancer-related questions | 30 pancreatic cancer questions | 80% of responses were "very good" or "excellent" | Responses were graded by 20 experts against a clinical benchmark |
| Yeo et al[34] | 2023 | GPT | Assessed GPT's performance in answering questions regarding cirrhosis and HCC | 164 questions about cirrhosis and HCC | 79.1% correctness for cirrhosis and 74% for HCC, but only 47.3% comprehensiveness | Responses were reviewed by two hepatologists, with disagreements resolved by a third reviewer |
| Cao et al[35] | 2023 | GPT-3.5 | Examined GPT-3.5's capacity to answer questions on liver cancer screening and diagnosis | 20 questions | 48% of answers were accurate, with frequent errors in LI-RADS categories | Six fellowship-trained physicians from three centers assessed answers |
| Gorelik et al[36] | 2024 | GPT-4 | Evaluated GPT-4's ability to provide guideline-aligned recommendations | 275 colonoscopy reports | Aligned with experts in 87% of scenarios, showing no significant accuracy gap | Advice assessed by consensus review with multiple experts |
| Gorelik et al[37] | 2023 | GPT-4 | Analyzed GPT-4's effectiveness in post-colonoscopy management guidance | 20 clinical scenarios | 90% followed guidelines, with 85% correctness and strong agreement (κ = 0.84) | Assessed by two senior gastroenterologists for guideline compliance |
| Zhou et al[38] | 2023 | GPT-3.5 and GPT-4 | Developed a gastric cancer consultation system and automated report generator | 23 medical knowledge questions | 91.3% appropriate gastric cancer advice (GPT-4), 73.9% for GPT-3.5 | The evaluation was conducted by reviewers against medical standards |
| Yang et al[39] | 2025 | RECOVER (LLM) | Designed an LLM-based remote patient monitoring system for postoperative care | 7 design sessions, 5 interviews | Six major design strategies for integrating clinical guidelines and information | Clinical staff reviewed and provided feedback on the design and functionality |
| Kerbage et al[40] | 2024 | GPT-4 | Evaluated GPT-4's accuracy in responding to IBS, IBD, and CRC screening questions | 65 questions (45 patients, 20 doctors) | 84% of answers were accurate | Assessed independently by three senior gastroenterologists |
| Tariq et al[41] | 2024 | GPT-3.5, GPT-4, and Bard | Compared the efficacy of GPT-3.5, GPT-4, and Bard (July 2023 version) in answering 47 common colonoscopy patient queries | 47 queries | GPT-4 outperformed GPT-3.5 and Bard, with 91.4% fully accurate responses vs 6.4% and 14.9%, respectively | Responses were scored by two specialists on a 0-2 point scale, with disagreements resolved by a third reviewer |
| Maida et al[42] | 2025 | GPT-4 | Evaluated GPT-4's suitability in addressing screening, diagnostic, and therapeutic inquiries | 15 CRC screening inquiries | 4.8/6 for CRC screening accuracy, 2.1/3 for completeness | Assessment involved 20 experts and 20 non-experts rating the answers |
| Atarere et al[43] | 2024 | BingChat, GPT, YouChat | Tested the appropriateness of GPT, BingChat, and YouChat in patient education and patient-physician communication | 20 questions (15 on CRC screening and 5 patient-related) | GPT and YouChat provided more reliable answers than BingChat, but all models had occasional inaccuracies | Two board-certified physicians and one gastroenterologist graded the responses |
| Chang et al[44] | 2024 | GPT-4 | Compared GPT-4's accuracy, reliability, and alignment of colonoscopy recommendations | 505 colonoscopy reports | 85.7% of cases matched USMSTF guidelines | Assessment was conducted by an expert panel under USMSTF guidelines |
| Lim et al[45] | 2024 | GPT-4 | Compared a contextualized GPT model with standard GPT in colonoscopy screening | 62 example use cases | Contextualized GPT-4 outperformed standard GPT-4 | Standard GPT-4 was compared against a model supplied with relevant screening guidelines |
| Munir et al[46] | 2024 | GPT | Evaluated the quality and utility of responses for three GI surgeries | 24 research questions | Responses were of modest quality and varied significantly by type of procedure | Responses were graded by 45 expert surgeons |
| Truhn et al[47] | 2024 | GPT-4 | Created a structured data parsing module with GPT-4 for clinical text processing | 100 CRC reports | 99% accuracy for T-stage extraction, 96% for N-stage, and 94% for M-stage | GPT-4's accuracy was compared with data manually extracted by experts |
| Choo et al[48] | 2024 | GPT | Designed a clinical decision-support system to generate personalized management plans | 30 stage III recurrent CRC patients | 86.7% agreement with tumor board decisions, 100% for second-line therapies | The recommendations were compared with the decision plans made by the MDT |
| Huo et al[49] | 2024 | GPT, BingChat, Bard, Claude 2 | Established a multi-AI platform framework to optimize CRC screening recommendations | Responses for 3 patient cases | GPT aligned with guidelines in 66.7% of cases, while other AIs showed greater divergence | Clinician and patient advice was compared to guidelines |
| Pereyra et al[50] | 2024 | GPT-3.5 | Optimized GPT-3.5 for personalized CRC screening recommendations | 238 physicians | GPT scored 4.57/10 for CRC screening, vs 7.72/10 for physicians | Answers were compared against a group of surgeons |
| Peng et al[51] | 2024 | GPT-3.5 | Built a GPT-3.5-powered system for answering CRC-related queries | 131 CRC questions | 63.01 mean accuracy, but low comprehensiveness scores (0.73-0.83) | Two physicians reviewed each response, with a third consulted for discrepancies |
| Ma et al[52] | 2024 | GPT-3.5 | Established GPT-3.5-based quality control for post-esophageal ESD procedures | 165 esophageal ESD cases | 92.5%-100% accuracy across post-esophageal ESD quality metrics | Two QC members and a senior supervisor conducted the assessment |
| Cohen et al[53] | 2025 | LLaMA-2, Mistral-v0.1 | Explored the ability of LLMs to extract PD-L1 biomarker details for research purposes | 232 EHRs from 10 cancer types | Fine-tuned LLMs outperformed an LSTM trained on > 10000 examples | Assessed by 3 clinical experts against manually curated answers |
| Scherbakov et al[54] | 2025 | Mixtral 8x7B | Assessed an LLM's ability to extract stressful events from the social history of clinical notes | 109556 patients, 375334 notes | Arrest or incarceration (OR = 0.26, 95%CI: 0.06-0.77) | One human reviewer assessed the precision and recall of extracted events |
| Chatziisaak et al[55] | 2025 | GPT-4 | Evaluated the concordance of therapeutic recommendations generated by GPT | 100 consecutive CRC patients | 72.5% complete concordance, 10.2% partial concordance, and 17.3% discordance | Three reviewers independently assessed concordance with the MDT |
| Saraiva et al[56] | 2025 | GPT-4 | Assessed GPT-4's performance in interpreting images in gastroenterology | 740 images | Capsule endoscopy: accuracies 50.0%-90.0% (AUCs 0.50-0.90) | Three experts reviewed and labeled images for CE |
| Siu et al[57] | 2025 | GPT-4 | Evaluated the efficacy, quality, and readability of GPT-4's responses | 8 patient-style questions | Accurate (4.00), safe (4.25), appropriate (4.00), actionable (4.00), effective (4.00) | Evaluated by 8 colorectal surgeons |
| Horesh et al[58] | 2025 | GPT-3.5 | Evaluated management recommendations of GPT in clinical settings | 15 colorectal or anal cancer patients | Rating of 4.8 for GPT recommendations, 4.11 for decision justification | Evaluated by 3 experienced colorectal surgeons |
| Ellison et al[59] | 2025 | GPT-3.5, Perplexity | Compared readability using different prompts | 52 colorectal surgery materials | Average 7.0-9.8, ease 53.1-65.0, modified 9.6-11.5 | Compared mean scores between baseline and AI-generated documents |
| Ramchandani et al[60] | 2025 | GPT-4 | Validated the use of GPT-4 for identifying articles discussing perioperative and preoperative risk factors for esophagectomy | 1967 studies for title and abstract screening | Perioperative: agreement rate = 85.58%, AUC = 0.87; preoperative: agreement rate = 78.75%, AUC = 0.75 | Decisions were compared with those of three independent human reviewers |
| Zhang et al[61] | 2025 | GPT-4, DeepSeek, GLM-4, Qwen, LLaMa3 | Evaluated the consistency of LLMs in generating diagnostic records for hepatobiliary cases using the HepatoAudit dataset | 684 medical records covering 20 hepatobiliary diseases | Precision: GPT-4 reached a maximum of 93.42%; recall: generally below 70%, with some diseases below 40% | Professional physicians manually verified and corrected all the data |
| Spitzl et al[62] | 2025 | Claude-3.5, GPT-4o, DeepSeekV3, Gemini 2 | Assessed the capability of state-of-the-art LLMs to classify liver lesions based solely on textual descriptions from MRI reports | 88 fictitious MRI reports designed to resemble real clinical documentation | Micro/macro F1-scores: Claude 3.5 Sonnet 0.91/0.78, GPT-4o 0.76/0.63, DeepSeekV3 0.84/0.70, Gemini 2.0 Flash 0.69/0.55 | Model performance was assessed using micro and macro F1-scores benchmarked against ground-truth labels |
| Sheng et al[63] | 2025 | GPT-4o and Gemini | Investigated the diagnostic accuracies for focal liver lesions | 228 adult patients with CT/MRI reports | Two-step GPT-4o, single-step GPT-4o, and single-step Gemini: 78.9%, 68.0%, 73.2% | Six radiologists reviewed the images and clinical information in two rounds (alone, then with the LLM) |
| Williams et al[64] | 2025 | GPT-4-32K | Determined whether an LLM can extract reasons for a lack of follow-up colonoscopy | 846 patients' clinical notes | Overall accuracy: 89.3%; most common reason: refused/not interested (35.2%) | A physician reviewer checked 10% of LLM-generated labels |
| Lu et al[65] | 2025 | MoE-HRS | Used a novel MoE combined with LLMs for risk prediction and personalized healthcare recommendations | SNPs, medical and lifestyle data from the United Kingdom Biobank | MoE-HRS outperformed state-of-the-art cancer risk prediction models in ROC-AUC, precision, recall, and F1 score | LLM-generated advice was validated by clinical medical staff |
| Yang et al[66] | 2025 | GPT-4 | Explored the use of LLMs to enhance doctor-patient communication | 698 pathology reports of tumors | Average communication time decreased by over 70%, from 35 to 10 min (P < 0.001) | Pathologists evaluated the consistency between original and AI reports |
| Jain et al[67] | 2025 | GPT-4, GPT-3.5, Gemini | Studied the performance of LLMs across 20 clinicopathologic scenarios in gastrointestinal pathology | 20 clinicopathologic scenarios in GI | Diagnostic accuracy: Gemini Advanced (95%, P = 0.01), GPT-4 (90%, P = 0.05), GPT-3.5 (65%) | Two fellowship-trained pathologists independently assessed the models' responses |
| Xu et al[68] | 2025 | GPT-4, GPT-4o, Gemini | Assessed the performance of LLMs in predicting immunotherapy response in unresectable HCC | Multimodal data from 186 patients | Accuracy and sensitivity: GPT-4o (65% and 47%), Gemini-GPT (68% and 58%), physicians (72% and 70%) | Six physicians (three radiologists and three oncologists) independently assessed the same dataset |
| Deroy et al[69] | 2025 | GPT-3.5 Turbo | Explored the potential of LLMs as a question-answering (QA) tool | 30 training and 50 testing queries | A1: 0.546 (maximum value); A2: 0.881 (maximum value across three runs) | Model-generated answers were compared to the gold standard |
| Ye et al[70] | 2025 | BioBERT-based | Proposed a novel framework that incorporates clinical features to enhance multi-omics clustering for cancer subtyping | Six cancer datasets across three omics levels | Mean survival score of 2.20, significantly higher than other methods | Three independent clinical experts reviewed and validated the clustering results |
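The studies in Table 2 lean on a small set of agreement statistics: per-class F1-scores (Syed et al) and inter-rater agreement such as Cohen's kappa (κ = 0.84 in Gorelik et al). A toy pure-Python sketch of both metrics follows (hypothetical helper names; the labels are illustrative, not study data):

```python
from collections import Counter

def f1_score(y_true, y_pred, positive=1):
    # Precision/recall/F1 for a single positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def cohens_kappa(rater1, rater2):
    # Observed agreement corrected for agreement expected by chance.
    n = len(rater1)
    observed = sum(1 for a, b in zip(rater1, rater2) if a == b) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[k] * c2[k] for k in c1) / (n * n)
    return (observed - expected) / (1 - expected)
```

Kappa near 1 indicates the two raters (or a model and a guideline-based reference) agree well beyond chance, which is why several of the cited studies report it alongside raw accuracy.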
Table 3 Summary of key studies of vision foundation models-assisted endoscopy in the field of gastrointestinal cancer
| Model | Year | Architecture | Training algorithm | Parameters | Datasets | Disease studied | Model type | Source code link |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Surgical-DINO[76] | 2023 | DINOv2 | LoRA layers added to DINOv2, optimizing only the LoRA layers | 86.72M | SCARED, Hamlyn | Endoscopic surgery | Vision | https://github.com/BeileiCui/SurgicalDINO |
| ProMISe[77] | 2023 | SAM (ViT-B) | APM and IPS modules are trained while keeping SAM frozen | 1.3-45.6M | EndoScene, ColonDB etc. | Polyps, skin cancer | Vision | NA |
| Polyp-SAM[78] | 2023 | SAM | Pretrains only the mask decoder while freezing all encoders | NA | CVC-ColonDB, Kvasir etc. | Colon polyps | Vision | https://github.com/ricklisz/Polyp-SAM |
| Endo-FM[79] | 2023 | ViT-B/16 | Pretrained using a self-supervised teacher-student framework, then fine-tuned on downstream tasks | 121M | Colonoscopic, LDPolyp etc. | Polyps, erosion, etc. | Vision | https://github.com/med-air/Endo-FM |
| ColonGPT[80] | 2024 | SigLIP-SO, Phi-1.5 | Pre-alignment with image-caption pairs, followed by supervised fine-tuning using LoRA | 0.4-1.3B | ColonINST (30k+ images) | Colorectal polyps | Vision | https://github.com/ColonGPT/ColonGPT |
| DeepCPD[81] | 2024 | ViT | Hyperparameters optimized for colonoscopy datasets, including the Adam optimizer | NA | PolypsSet, CP-CHILD-A etc. | CRC | Vision | https://github.com/Zhang-CV/DeepCPD |
| OneSLAM[82] | 2024 | Transformer (CoTracker) | Zero-shot adaptation using TAP + local bundle adjustment | NA | SAGE-SLAM, C3VD etc. | Laparoscopy, colon | Vision | https://github.com/arcadelab/OneSLAM |
| EIVS[83] | 2024 | Vision Mamba, CLIP | Unsupervised cycle-consistency | 63.41M | 613 WLE, 637 images | Gastrointestinal | Vision | NA |
| APT[84] | 2024 | SAM | Parameter-efficient fine-tuning | NA | Kvasir-SEG, EndoTect etc. | CRC | Vision | NA |
| FCSAM[85] | 2024 | SAM | LayerNorm LoRA fine-tuning strategy | 1.2M | Gastric cancer (630 pairs) etc. | GC, colon polyps | Vision | NA |
| DuaPSNet[86] | 2024 | PVTv2-B3 | Transfer learning with PVTv2-B3 pre-trained on ImageNet | NA | LaribPolypDB, ColonDB etc. | CRC | Vision | https://github.com/Zachary-Hwang/Dua-PSNet |
| EndoDINO[87] | 2025 | ViT (B, L, g) | DINOv2 methodology, hyperparameter tuning | 86M to 1B | HyperKvasir, LIMUC | GI endoscopy | Vision | https://github.com/ZHANGBowen0208/EndoDINO/ |
| PolypSegTrack[88] | 2025 | DINOv2 | One-step fine-tuning on colonoscopic videos without prior pre-training | NA | ETIS, CVC-ColonDB etc. | Colon polyps | Vision | NA |
| AiLES[89] | 2025 | RF-Net | Not fine-tuned from an external model | NA | 100 GC patients | Gastric cancer | Vision | https://github.com/CalvinSMU/AiLES |
| PPSAM[90] | 2025 | SAM | Fine-tuning with variable bounding-box prompt perturbations | NA | EndoScene, ColonDB etc. | Investigated in Ref. | Vision | https://github.com/SLDGroup/PP-SAM |
| SPHINX-Co[91] | 2024 | LLaMA-2 + SPHINX-X | Fine-tuned SPHINX-X on CoPESD with a cosine learning-rate scheduler | 7B, 13B | CoPESD | Gastric cancer | Multimodal | https://github.com/gkw0010/CoPESD |
| LLaVA-Co[91] | 2024 | LLaVA-1.5 (CLIP-ViT-L) | Fine-tuned LLaVA-1.5 on CoPESD with a cosine learning-rate scheduler | 7B, 13B | CoPESD | Gastric cancer | Multimodal | https://github.com/gkw0010/CoPESD |
| ColonCLIP[92] | 2025 | CLIP | Prompt tuning with frozen CLIP, then encoder fine-tuning with frozen prompts | 57M, 86M | OpenColonDB | CRC | Multimodal | https://github.com/Zoe-TAN/ColonCLIP-OpenColonDB |
| PSDM[93] | 2025 | Stable Diffusion + CLIP | Continual learning with prompt replay to incrementally train on multiple datasets | NA | PolypGen, ColonDB, Polyplus etc. | CRC | Vision, generative | The original paper reported a GitHub link for this model, but it is currently unavailable |
| PathoPolypDiff[94] | 2025 | Stable Diffusion v1-4 | Fine-tuned Stable Diffusion v1-4 with the first U-Net block locked and the remaining blocks fine-tuned | NA | ISIT-UMR Colonoscopy Dataset | CRC | Generative | https://github.com/Vanshali/PathoPolyp-Diff |
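Several entries in Table 3 (e.g., Surgical-DINO, ColonGPT, FCSAM) adapt a frozen backbone with LoRA rather than full fine-tuning. A minimal pure-Python sketch of the underlying update, W_eff = W + (alpha/r)·BA, on toy matrices (all names and dimensions are illustrative, not taken from any cited model):

```python
def matmul(X, Y):
    # Naive matrix multiply for small toy matrices.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_effective_weight(W, A, B, alpha):
    # LoRA: the frozen weight W is augmented by a low-rank update
    # B @ A scaled by alpha / r, where r is the rank. Only A and B
    # are trained, so the number of trainable parameters stays small.
    r = len(A)  # A has shape (r, d_in); B has shape (d_out, r)
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]
```

Because the update is low-rank, a model like FCSAM can report only 1.2M trainable parameters while the full SAM backbone stays frozen.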
Table 4 Summary of key studies of vision foundation models-assisted radiology in the field of gastrointestinal cancer
| Model | Year | Architecture | Training algorithm | Parameters | Datasets | Disease studied | Model type | Source code link |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PubMedCLIP[98] | 2021 | CLIP | Fine-tuned on the ROCO dataset for 50 epochs with the Adam optimizer | NA | ROCO, VQA-RAD, SLAKE | Abdomen samples | Multimodal | https://github.com/sarahESL/PubMedCLIP |
| RadFM[97] | 2023 | MedLLaMA-13B | Pre-trained on MedMD and fine-tuned on RadMD | 14B | MedMD, RadMD etc. | Over 5000 diseases | Multimodal | https://github.com/chaoyi-wu/RadFM |
| Merlin[99] | 2024 | I3D-ResNet152 | Multi-task learning with EHRs and radiology reports, with fine-tuning for specific tasks | NA | 6M images, 6M codes and reports | Multiple diseases, abdominal | Multimodal | NA |
| MedGemini[100] | 2024 | Gemini | Fine-tuning Gemini 1.0/1.5 on medical QA, multimodal, and long-context corpora | 1.5B | MedQA, NEJM, GeneTuring | Various | Multimodal | https://github.com/Google-Health/med-gemini-medqa-relabelling |
| HAIDEF[101] | 2024 | VideoCoCa | Fine-tuning on downstream tasks with limited labeled data | NA | CT volumes and reports | Various | Vision | https://huggingface.co/collections/google/ |
| CTFM[102] | 2024 | Vision model1 | Trained using a self-supervised learning strategy, employing a SegResNet encoder for the pre-training phase | NA | 26298 CT scans | CT scans (stomach, colon) | Vision | https://aim.hms.harvard.edu/ct-fm |
| MedVersa[103] | 2024 | Vision model1 | Trained from scratch on the MedInterp dataset and adapted to various medical imaging tasks | NA | MedInterp | Various | Vision | https://github.com/3clyp50/MedVersa_Internal |
| iMD4GC[104] | 2024 | Transformer-based2 | A novel multimodal fusion architecture with cross-modal interaction and knowledge distillation | NA | GastricRes/Sur, TCGA etc. | Gastric cancer | Multimodal | https://github.com/FT-ZHOU-ZZZ/iMD4GC/ |
| Yasaka et al[105] | 2025 | BLIP-2 | LoRA with specific fine-tuning of the fc1 layer in the vision and Q-Former modules | NA | 5777 CT scans | Esophageal cancer via chest CT | Multimodal | NA |
Table 5 Summary of key studies of vision foundation models-assisted pathology in the field of gastrointestinal cancer
| Model | Year | Architecture | Training algorithm | Parameters | WSIs | Tissues | Open source link |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LUNIT-SSL[110] | 2021 | ViT-S | DINO; full fine-tuning and linear evaluation on downstream tasks | 22M | 3.7K | 32 | https://Lunitio.github.io/research/publications/pathology_ssl |
| CTransPath[111] | 2022 | Swin Transformer | MoCoV3 (SRCL); frozen backbone with linear classifier fine-tuning | 28M | 32K | 32 | https://github.com/Xiyue-Wang/TransPath |
| Phikon[112] | 2023 | ViT-B | iBOT (masked image modeling); fine-tuned with ABMIL/TransMIL on frozen features | 86M | 6K | 16 | https://github.com/owkin/HistoSSLscaling |
| REMEDIS[113] | 2023 | BiT-L (ResNet-152) | SimCLR (contrastive learning); end-to-end fine-tuning on labeled ID/OOD data | 232M | 29K | 32 | https://github.com/google-research/simclr |
| Virchow[114] | 2024 | ViT-H, DINOv2 | DINOv2 (SSL); frozen embeddings with simple aggregators | 632M | 1.5M | 17 | https://huggingface.co/paige-ai/Virchow |
| Virchow2[115] | 2024 | ViT-H | DINOv2 (SSL); fine-tuned with linear probes or full fine-tuning on downstream tasks | 632M | 3.1M | 25 | https://huggingface.co/paige-ai/Virchow2 |
| Virchow2G[115] | 2024 | ViT-G | DINOv2 (SSL); fine-tuned with linear probes or full fine-tuning | 1.9B | 3.1M | 25 | https://huggingface.co/paige-ai/Virchow2 |
| Virchow2G mini[115]1 | 2024 | ViT-S, Virchow2G | DINOv2 (SSL); distilled from Virchow2G, then fine-tuned on downstream tasks | 22M | 3.2M | 25 | https://huggingface.co/paige-ai/Virchow2 |
| UNI[9] | 2024 | ViT-L | DINOv2 (SSL); frozen features with linear probes or few-shot learning | 307M | 100K | 20 | https://github.com/mahmoodlab/UNI |
| Phikon-v2[116] | 2024 | ViT-L | DINOv2 (SSL); frozen ViT and ABMIL ensemble fine-tuning | 307M | 58K | 30 | https://huggingface.co/owkin/phikon-v2 |
| RudolfV[117] | 2024 | ViT-L | DINOv2 (SSL); fine-tuned by optimizing a linear classification layer and adapting encoder weights | 304M | 103K | 58 | https://github.com/rudolfv |
| HIBOU-B[118] | 2024 | ViT-B | DINOv2 (SSL); frozen feature extractor with a trained linear classifier or attention pooling | 86M | 1.1M | 12 | https://github.com/HistAI/hibou |
| HIBOU-L[118]2 | 2024 | ViT-L | DINOv2 (SSL); frozen feature extractor with a trained linear classifier or attention pooling | 307M | 1.1M | 12 | https://github.com/HistAI/hibou |
| H-Optimus-03 | 2024 | ViT-G | DINOv2 (SSL); linear probe and ABMIL on frozen features | 1.1B | > 500K | 32 | https://github.com/bioptimus/releases/ |
| Madeleine[119] | 2024 | CONCH | MAD-MIL; linear probing, prototyping, and full fine-tuning for downstream tasks | 86M | 23K | 2 | https://github.com/mahmoodlab/MADELEINE |
| COBRA[120] | 2024 | Mamba-2 | Self-supervised contrastive pretraining with multiple FMs and the Mamba-2 architecture | 15M | 3K | 6 | https://github.com/KatherLab/COBRA |
| PLUTO[121] | 2024 | FlexiViT-S | DINOv2; frozen backbone with task-specific heads for fine-tuning | 22M | 158K | 28 | NA |
| HIPT[122] | 2025 | ViT-HIPT | DINO (SSL); fine-tuned with gradient accumulation | 10M | 11K | 33 | https://github.com/mahmoodlab/HIPT |
| PathoDuet[123] | 2025 | ViT-B | MoCoV3; fine-tuned using standard supervised learning on labeled downstream-task data | 86M | 11K | 32 | https://github.com/openmedlab/PathoDuet |
| Kaiko[124] | 2025 | ViT-L | DINOv2 (SSL); linear probing with a frozen encoder on downstream tasks | 303M | 29K | 32 | https://github.com/kaiko-ai/towards_large_pathology_fms |
| PathOrchestra[125] | 2025 | ViT-L | DINOv2; ABMIL, linear probing, weakly supervised classification | 304M | 300K | 20 | https://github.com/yanfang-research/PathOrchestra |
| THREADS[126] | 2025 | ViT-L, CONCHv1.5 | Fine-tuned gene encoder; patch encoder initialized randomly | 16M | 47K | 39 | https://github.com/mahmoodlab/trident |
| H0-mini[127] | 2025 | ViT | Knowledge distillation from H-Optimus-0 | 86M | 6K | 16 | https://huggingface.co/bioptimus/H0-mini |
| TissueConcepts[128] | 2025 | Swin Transformer | Frozen encoder with a linear probe for downstream tasks | 27.5M | 7K | 14 | https://github.com/FraunhoferMEVIS/MedicalMultitaskModeling |
| OmniScreen[129] | 2025 | Virchow2 | Fine-tuning of attention-aggregated Virchow2 embeddings | 632M | 48K | 27 | https://github.com/OmniScreen |
| BROW[130] | 2025 | ViT-B | DINO (SSL); self-distillation with multi-scale and augmented views | 86M | 11K | 6 | NA |
| BEPH[131] | 2025 | BEiTv2 | BEiTv2 (SSL); supervised fine-tuning on clinical tasks with labeled data | 86M | 11K | 32 | https://github.com/Zhcyoung/BEPH |
| Atlas[132] | 2025 | ViT-H, RudolfV | DINOv2; linear probing with a frozen backbone on downstream tasks | 632M | 1.2M | 70 | NA |
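A recurring evaluation protocol in Table 5 is probing frozen embeddings with a lightweight classifier while the foundation-model backbone stays untouched. As an illustration only, the sketch below uses a nearest-centroid rule as a simple stand-in for such a probe (hypothetical helper names and toy 2-D "embeddings"; the cited studies typically use linear probes or ABMIL heads on much higher-dimensional features):

```python
def centroid(vectors):
    # Mean embedding of one class.
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def fit_probe(embeddings, labels):
    # One centroid per class, computed over frozen backbone embeddings.
    by_class = {}
    for emb, label in zip(embeddings, labels):
        by_class.setdefault(label, []).append(emb)
    return {label: centroid(vs) for label, vs in by_class.items()}

def predict(probe, embedding):
    # Assign the class of the nearest centroid (squared Euclidean distance).
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(embedding, c))
    return min(probe, key=lambda label: dist(probe[label]))
```

The appeal of this setup is that only the tiny head is trained per task, which is why a single pretrained encoder can be benchmarked across dozens of tissue types at low cost.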
Table 6 Summary of key studies of multimodal large language models in the field of gastrointestinal cancer
| Model | Year | Vision architecture | Vision dataset | WSIs | Text model | Text dataset | Parameters | Tissues | Generative | Open source link |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PLIP[136] | 2023 | CLIP | OpenPath | 28K | CLIP | OpenPath | NA | 32 | Captioning | https://github.com/PathologyFoundation/plip |
| HistGen[137] | 2023 | DINOv2, ViT-L | Multiple | 55K | LGH module | TCGA | Approximately 100M | 32 | Report generation | https://github.com/dddavid4real/HistGen |
| PathAlign[138] | 2023 | PathSSL | Custom | 350K | BLIP-2 | Diagnostic reports | Approximately 100M | 32 | Report generation | https://github.com/elonybear/PathAlign |
| CHIEF[139] | 2024 | CTransPath | 14 sources | 60K | CLIP | Anatomical information | 27.5M, 63M | 19 | No | https://github.com/hms-dbmi/CHIEF |
| PathGen[140] | 2024 | LLaVA, CLIP | TCGA | 7K | CLIP | 1.6M pairs | 13B | 32 | WSI assistant | https://github.com/PathFoundation/PathGen-1.6M |
| PathChat[141] | 2024 | UNI | Multiple | 999K | LLaMa 2 | Pathology instructions | 13B | 20 | AI assistant | https://github.com/fedshyvana/pathology_mllm_training |
| PathAsst[142] | 2024 | PathCLIP | PathCap | 207K | Vicuna-13B | Pathology instructions | 13B | 32 | AI assistant | https://github.com/superjamessyx/Generative-Foundation-AI-Assistant-for-Pathology |
| ProvGigaPath[143] | 2024 | ViT | Prov-Path | 171K | OpenCLIP | 17K reports | 1135 | 31 | No | https://github.com/prov-gigapath/prov-gigapath |
| TITAN[144] | 2024 | ViT | Mass340K | 336K | CoCa | Medical reports | Approximately 5B | 20 | Report generation | https://github.com/your-repo/TITAN |
| CONCH[145] | 2024 | ViT | Multiple | 21K | GPT-style | 1.17M pairs | NA | 19 | Captioning | http://github.com/mahmoodlab/CONCH |
| SlideChat[146] | 2024 | CONCH, LongNet | TCGA | 4915 | Qwen2.5-7B | Slide instructions | 7B | 10 | WSI assistant | https://github.com/uni-medical/SlideChat |
| PMPRG[147] | 2024 | MR-ViT | Custom | 7422 | GPT-2 | Pathology reports | NA | 2 | Multi-organ report | https://github.com/hvcl/Clinical-grade-PathologyReport-Generation |
| MuMo[148] | 2024 | MnasNet | Custom | 429 | Transformer | Patho-radio reports | NA | 1 | No | https://github.com/czifan/MuMo |
| ConcepPath[149] | 2024 | ViT-B, CONCH | Quilt-1M | 2243 | CLIP, GPT | PubMed | Approximately 187M | 3 | No | https://github.com/HKU-MedAI/ConcepPath |
| GPT-4V[150] | 2024 | Phikon, ViT-B | CRC-7K, MHIST etc. | 338K | GPT-4 | NA | 40M | 3 | Report generation | https://github.com/Dyke-F/GPT-4V-In-Context-Learning |
| MINIM[151] | 2024 | Stable Diffusion | Multiple | NA | BERT, CLIP | Multiple | NA | 6 | Report generation | https://github.com/WithStomach/MINIM |
| PathM3[152] | 2024 | ViT-g/14 | PatchGastric | 991 | Flan-T5 XL | PatchGastric | NA | 1 | WSI assistant | NA |
| FGCR[153] | 2024 | ResNet50 | Custom, GastrADC | 3598, 991 | BERT | NA | 9.21 Mb | 6 | Report generation | https://github.com/hudingyi/FGCR |
| PromptBio[154] | 2024 | PLIP | TCGA, CPTAC | 482, 105 | GPT-4 | NA | NA | 1 | Report generation | https://github.com/DeepMed-Lab-ECNU/PromptBio |
| HistoCap[155] | 2024 | ViT | NA | 10K | BERT, BioBERT | GTEx datasets | NA | 40 | Report generation | https://github.com/ssen7/histo_cap_transformers |
| mSTAR[156] | 2024 | UNI | TCGA | 10K | BioBERT | Pathology reports (11K) | NA | 32 | Report generation | https://github.com/Innse/mSTAR |
| GPT-4 Enhanced[157] | 2025 | CTransPath | TCGA | NA | GPT-4 | ASCO, ESMO, Onkopedia | NA | 4 | Recommendation generation | https://github.com/Dyke-F/LLM_RAG_Agent |
| PRISM[158] | 2025 | Virchow, ViT-H | Virchow dataset | 587K | BioGPT | 195K reports | 632M | 17 | Report generation | NA |
| HistoGPT[159] | 2025 | CTransPath, UNI | Custom | 15K | BioGPT | Pathology reports | 30M to 1.5B | 1 | WSI assistant | https://github.com/marrlab/HistoGPT |
| PathologyVLM[160] | 2025 | PLIP, CLIP | PCaption-0.8M | NA | LLaVA | PCaption-0.5M | NA | Multi | Report generation | https://github.com/ddw2AIGROUP2CQUP/PA-LLaVA |
| MUSK[161] | 2025 | Transformer | TCGA | 33K | Transformer | PubMed Central | 675M | 33 | Question answering | https://github.com/Lilab-stanford/MUSK |
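Several multimodal models in Table 6 (e.g., PLIP, CONCH) follow the CLIP recipe of scoring image-text pairs by cosine similarity in a shared embedding space. A toy sketch of that retrieval step follows (the embeddings here are made up; in the real systems they come from trained image and text encoders):

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def rank_captions(image_emb, caption_embs):
    # CLIP-style retrieval: score each candidate caption against the
    # image embedding and return caption indices, best match first.
    scored = [(cosine(image_emb, c), i) for i, c in enumerate(caption_embs)]
    return [i for _, i in sorted(scored, reverse=True)]
```

Zero-shot classification works the same way: each class name is embedded as a caption, and the highest-scoring caption's class is taken as the prediction.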