Copyright ©The Author(s) 2026.
Artif Intell Gastroenterol. Jan 8, 2026; 7(1): 115498
Published online Jan 8, 2026. doi: 10.35712/aig.v7.i1.115498
Table 1 Application of multimodal data in gastrointestinal tumors
| Data type | Core characteristics and key technologies | Main clinical application scenarios | AI empowerment and value |
| --- | --- | --- | --- |
| Imaging data | CT: High spatial resolution, rapid imaging, morphological analysis; MRI: Excellent soft-tissue contrast (DWI, DCE), microenvironment assessment; PET: High metabolic sensitivity (standardized uptake value, SUV), assessment of biological activity | Tumor localization, staging, efficacy evaluation, recurrence monitoring | AI application: CNN-based automatic segmentation; radiomics feature mining. Value: Improves diagnostic consistency; predicts treatment efficacy and metastasis risk |
| Endoscopic data | Provides high-definition, real-time visualization of the mucosal layer; chromoendoscopy and electronic (virtual) staining enhance contrast | Early screening and diagnosis (e.g., early gastric cancer, colorectal polyp detection) | AI application: CNN models for automatic lesion identification, classification, and invasion-depth assessment. Value: Increases early detection rate, assists treatment decisions |
| Omics data | Genomics: Reveals driver alterations (e.g., HER2 amplification). Transcriptomics/proteomics/metabolomics: Reflect gene expression, protein function, and metabolic status | Deciphering tumor heterogeneity, predicting treatment response and prognosis, facilitating personalized therapy | AI application: Feature selection and dimensionality reduction; multimodal fusion (e.g., GNN model StereoMM, drug response prediction model DROEG). Value: Uncovers molecular mechanisms, enables precise molecular subtyping, predicts drug sensitivity |
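Table 1 lists feature selection and dimensionality reduction as an AI application for omics (and radiomics) data. The Python sketch below shows one of the simplest options, a univariate filter that ranks features by the absolute Pearson correlation with a binary response label and keeps the top k. It is an illustrative assumption, not the method used by StereoMM, DROEG, or any other model named here; the feature names and values are hypothetical, and high-dimensional cohorts usually call for penalized or multivariate methods (e.g., LASSO) with the selection step nested inside cross-validation to avoid optimistic bias.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        return 0.0
    return cov / (sd_x * sd_y)


def select_top_k(features: Dict[str, List[float]], labels: List[int], k: int) -> List[Tuple[str, float]]:
    """Rank features by |r| with the label and keep the k strongest (univariate filter)."""
    scored = [(name, pearson(values, labels)) for name, values in features.items()]
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical cohort of four patients: 1 = responder, 0 = non-responder.
    labels = [0, 0, 1, 1]
    features = {
        "gene_a_expression": [1.0, 1.2, 3.0, 3.1],  # tracks the label closely
        "texture_entropy": [5.0, 4.0, 5.0, 4.0],    # uncorrelated with the label
        "constant_marker": [2.0, 2.0, 2.0, 2.0],    # no variance, scored as 0
    }
    print(select_top_k(features, labels, k=1))
```

A filter like this is cheap and easy to audit, but it scores each feature in isolation, so redundant or jointly informative features are not handled; that is why multivariate selection is usually preferred once sample sizes allow.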
Table 2 Core framework of multimodal data fusion technologies
| Core stage | Key methods/technologies | Core challenges & solutions | Primary application value |
| --- | --- | --- | --- |
| Data preprocessing & standardization | Imaging data: N4 bias field correction, CLAHE, SMORE; Text data: Tokenization, word embedding, language models (e.g., BioBERT, GPT-4o); Standardization: Z-score normalization, batch normalization, FHIR standard | Challenges: Data heterogeneity, missing values, noise, privacy. Solutions: Dedicated preprocessing, automated tools, unified standards (e.g., FHIR) | Improves data quality & consistency, lays the foundation for fusion |
| Fusion strategy | Early fusion (data-level): Directly concatenates raw data. Middle fusion (feature-level): Multi-stream CNNs, attention mechanisms, GNNs. Late fusion (decision-level): Weighted averaging, voting, meta-learning | Challenges: Data heterogeneity, inter-modal relationships, information loss. Solutions: Select or combine strategies based on data characteristics and task goals (e.g., using attention to capture cross-modal dependencies) | Integrates complementary multi-source information, enhances model robustness & prediction accuracy |
| Model training & validation | Training techniques: Data augmentation, missing-value handling, regularization, early stopping. Validation methods: K-fold cross-validation, external validation, multi-center validation. Evaluation metrics: Accuracy (ACC), AUC, sensitivity, specificity, F1-score | Challenges: Data imbalance, overfitting, limited generalization. Solutions: Employ rigorous internal/external validation; use explainable AI (e.g., SHAP) to enhance trust | Ensures model reliability, stability, and clinical applicability; promotes clinical translation |
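The difference between early (data-level) and late (decision-level) fusion in Table 2 can be made concrete with a short Python sketch. Early fusion here simply concatenates per-modality feature vectors into one model input; late fusion takes the probabilities produced by separately trained modality-specific models and combines them by weighted averaging. The modality names, weights, and probabilities are hypothetical, and skipping a missing modality while renormalizing the remaining weights is one assumed way of handling missing data, not a prescribed one. Middle (feature-level) fusion with attention or GNNs requires a learned model and is not shown.

```python
from typing import Dict, List, Sequence


def early_fusion(modality_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Early fusion: concatenate per-modality vectors into a single input vector."""
    fused: List[float] = []
    for vector in modality_vectors:
        fused.extend(vector)
    return fused


def late_fusion(probabilities: Dict[str, float], weights: Dict[str, float]) -> float:
    """Late fusion: weighted average of per-modality predicted probabilities.

    Modalities absent from `probabilities` (e.g. no omics profile for this patient)
    are skipped, and the remaining weights are renormalized to sum to 1.
    """
    available = [m for m in probabilities if m in weights]
    total_weight = sum(weights[m] for m in available)
    if total_weight == 0:
        raise ValueError("no weighted modality is available for this patient")
    return sum(weights[m] * probabilities[m] for m in available) / total_weight


if __name__ == "__main__":
    # Hypothetical reliability weights for three modality-specific models.
    weights = {"ct": 0.5, "endoscopy": 0.3, "omics": 0.2}
    print(early_fusion([[0.1, 0.4], [0.7], [0.2, 0.9, 0.3]]))  # 6-dimensional input
    print(late_fusion({"ct": 0.8, "endoscopy": 0.6, "omics": 0.9}, weights))
    print(late_fusion({"ct": 0.8, "endoscopy": 0.6}, weights))  # omics missing
```

Late fusion tolerates a missing modality gracefully, as the last call shows, whereas early fusion needs every input present (or imputed) to build a fixed-length vector; this trade-off is one reason the choice of strategy depends on the data and the task.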
Table 3 Clinical applications of multimodal artificial intelligence in personalized gastrointestinal cancer therapy
| Application area | Core function | Key technologies/data | Primary value |
| --- | --- | --- | --- |
| Intelligent diagnosis & staging | Early screening & precise staging: Enhances tumor identification and classification, predicts metastasis risk | Imaging data: CT, EUS, PET/CT; Omics data: Radiomics, genomics; Clinical data: EHR | Increases early detection rates, reduces missed diagnoses; enables more accurate preoperative staging to inform treatment decisions |
| Treatment optimization | Treatment response prediction: Guides the selection of surgery, radiotherapy, chemotherapy, and targeted/immunotherapy regimens | Multimodal fusion models: e.g., MuMo model; Data integration: Radiomics, genomics, immunomics, tumor microbiome | Accurately predicts efficacy, avoids unnecessary treatments; guides personalized medication (e.g., targeted drug combinations) to overcome drug resistance and improve response rates |
| Prognostic assessment & follow-up management | Risk stratification & recurrence prediction: Precisely assesses patient survival and recurrence risk. Dynamic follow-up management: Enables personalized long-term monitoring | Prognostic models: Integrate clinical, imaging, genomic data. Intelligent systems: Clinical Decision Support Systems, EHR analysis | Enables precise risk stratification to guide adjuvant therapy; improves follow-up efficiency, provides timely recurrence alerts, and optimizes resource allocation |
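Table 3 describes prognostic models that integrate clinical, imaging, and genomic data for risk stratification. A minimal Python sketch of the final step is shown below: a linear risk score over already standardized multimodal features, with patients split into high- and low-risk groups at the cohort median. The feature names and coefficients are hypothetical placeholders; in practice the coefficients would be fitted (for example with a Cox proportional hazards model), and the cutoff would be chosen and validated on independent data rather than fixed at the median.

```python
from statistics import median
from typing import Dict

# Hypothetical coefficients for illustration only; a real prognostic model would
# estimate them from training data (e.g. with a Cox proportional hazards model).
HYPOTHETICAL_WEIGHTS: Dict[str, float] = {
    "clinical_stage": 0.6,
    "radiomics_score": 0.3,
    "mutation_burden": 0.1,
}


def risk_score(features: Dict[str, float],
               weights: Dict[str, float] = HYPOTHETICAL_WEIGHTS) -> float:
    """Linear risk score over already standardized multimodal features."""
    return sum(w * features[name] for name, w in weights.items())


def stratify(cohort: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Label each patient 'high' or 'low' risk relative to the cohort median score."""
    scores = {pid: risk_score(feats) for pid, feats in cohort.items()}
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


if __name__ == "__main__":
    # Hypothetical standardized features for three patients.
    cohort = {
        "p1": {"clinical_stage": 2.0, "radiomics_score": 1.0, "mutation_burden": 0.5},
        "p2": {"clinical_stage": -1.0, "radiomics_score": 0.0, "mutation_burden": 0.0},
        "p3": {"clinical_stage": 0.0, "radiomics_score": 0.0, "mutation_burden": 0.0},
    }
    print(stratify(cohort))
```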
Table 4 Challenges and future directions of multimodal artificial intelligence in gastrointestinal cancer therapy
| Core challenges | Key technologies/methods | Future directions |
| --- | --- | --- |
| Data quality & privacy protection: Data heterogeneity (divergent formats/standards); data noise (equipment/operator variations); patient privacy risks (esp. genomic/imaging data) | Data standardization: Common data models (e.g., OMOP CDM, medical imaging CDM); Privacy-preserving techniques: Federated learning (FL), differential privacy (DP), blockchain; Legal compliance: Adherence to frameworks such as GDPR and greater policy transparency | To build a more secure and reliable data environment, promoting seamless integration and controlled sharing of high-quality data |
| Model interpretability & clinical acceptability: "Black-box" problem erodes clinical trust. Opaque decision-making hinders regulatory approval & integration | Explainable AI: Attention mechanisms, prototype networks (ProtoPNet), Counterfactual explanations; Interpretability tools: LIME, SHAP, Grad-CAM for visualization & feature importance ranking; Clinical integration: Displaying model uncertainty & key decision factors in CDSS | To develop transparent and trustworthy AI systems, enhance clinician trust, and promote deep integration of AI into clinical workflows |
| Multi-center collaboration & standardization: Significant data heterogeneity across centers (equipment, protocols, populations). Poor model generalizability, hindering cross-institutional application | Multi-center data sharing & standardization: Unified data formats and acquisition standards; privacy-preserving collaborative training: Federated learning for joint modeling; standardized multimodal databases: Integrating genomics, radiomics, and other multidimensional data | To promote large-scale, high-quality multi-center collaboration, establish industry standards, and improve model generalizability and clinical applicability |
| Technical integration & clinical translation: Reliance on large annotated datasets limits generalizability; barriers in translating research findings into clinical application | Emerging ML paradigms: Reinforcement learning (RL) for dynamic treatment optimization; self-supervised learning (SSL) to reduce annotation dependency; Integration of novel data types: e.g., digital pathology, patient behavior data; Robust clinical validation: Validating model efficacy and robustness through clinical trials and real-world data (RWD) | To integrate multimodal AI with cutting-edge technologies and validate it through rigorous clinical trials, ultimately enabling its routine use in personalized therapy |
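Federated learning (FL) appears in Table 4 as a way for centers to train a shared model without pooling patient data. The Python sketch below shows the server-side aggregation step in the style of federated averaging (FedAvg): each center reports its locally trained parameter vector and its sample count, and the server returns the sample-weighted mean. The hospital names and numbers are hypothetical, local training and communication are omitted, and averaging alone does not provide a formal privacy guarantee; it is typically combined with techniques such as differential privacy or secure aggregation.

```python
from typing import List, Sequence, Tuple

# (locally trained parameter vector, number of local training samples)
ClientUpdate = Tuple[List[float], int]


def fed_avg(updates: Sequence[ClientUpdate]) -> List[float]:
    """Sample-weighted average of client parameter vectors (FedAvg-style aggregation)."""
    total_samples = sum(n for _, n in updates)
    if total_samples == 0:
        raise ValueError("clients reported no training samples")
    dim = len(updates[0][0])
    global_params = [0.0] * dim
    for params, n in updates:
        if len(params) != dim:
            raise ValueError("all clients must share the same model architecture")
        for i, value in enumerate(params):
            global_params[i] += value * n / total_samples
    return global_params


if __name__ == "__main__":
    # Hypothetical round: only parameters leave each hospital, never patient records.
    hospital_a = ([1.0, 2.0], 100)
    hospital_b = ([3.0, 4.0], 300)
    print(fed_avg([hospital_a, hospital_b]))  # larger center contributes 3/4 of the weight
```

Weighting by sample count keeps a small center from pulling the global model as strongly as a large one, but it also means heterogeneous centers can be underrepresented, which is one reason multi-center generalizability remains a challenge even with FL.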
Table 5 Translation roadmap for clinical application of reinforcement learning and self-supervised learning in gastrointestinal tumors
| Phase | Timeframe | Core objective | Key technical milestones | Clinical & regulatory milestones |
| --- | --- | --- | --- | --- |
| Short-term | 1-3 years | Foundational development & algorithmic validation | (1) Complete SSL model pre-training using large-scale historical data; (2) Construct RL simulation environments based on historical outcomes; and (3) Validate superior predictive accuracy of integrated models vs baselines on retrospective data | (1) Publication of proof-of-concept studies; and (2) Establishment of open-source benchmark datasets and simulation platforms |
| Mid-term | 3-5 years | Clinical trials in limited settings & system integration | (1) Develop interpretable, human-in-the-loop CDSS; (2) Model outputs serve as assistive decision aids for clinicians; and (3) Validate system usability and clinician acceptance in prospective observational studies | (1) Obtain initial regulatory approval (e.g., as Class II medical device software); and (2) Develop clinical workflow integration guidelines |
| Long-term | 5+ years | Widespread integration & adaptive learning systems | (1) Achieve multi-center deployment using privacy-preserving techniques (e.g., federated learning); (2) Explore regulated continuous learning and model adaptation; and (3) Conduct large-scale RCTs with overall survival (OS) as a primary endpoint | (1) Confirm clinical benefit through high-level evidence; (2) Establish new standards for individualized care; and (3) Advocate for healthcare reimbursement policy coverage |
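Table 5's short-term milestones include building RL environments from historical outcomes. As a minimal illustration of the underlying idea, the Python sketch below runs tabular Q-learning over a fixed log of transitions, repeatedly applying the update Q(s, a) ← Q(s, a) + α [r + γ max over a' of Q(s', a') - Q(s, a)], and then reads off a greedy treatment policy. The states, actions, and rewards are hypothetical toy values, not clinical data; real offline RL on retrospective records must also address confounding and distribution shift, which this sketch ignores.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# (state, action, reward, next_state, episode_ended)
Transition = Tuple[str, str, float, str, bool]


def fit_q_values(transitions: List[Transition], actions: List[str],
                 alpha: float = 0.5, gamma: float = 0.9,
                 sweeps: int = 100) -> Dict[Tuple[str, str], float]:
    """Tabular Q-learning replayed over a fixed log of transitions."""
    q: DefaultDict[Tuple[str, str], float] = defaultdict(float)
    for _ in range(sweeps):
        for state, action, reward, next_state, done in transitions:
            # Bootstrapped target: no future value once the episode has ended.
            future = 0.0 if done else max(q[(next_state, a)] for a in actions)
            target = reward + gamma * future
            q[(state, action)] += alpha * (target - q[(state, action)])
    return dict(q)


def greedy_action(q: Dict[Tuple[str, str], float], state: str, actions: List[str]) -> str:
    """Pick the action with the highest estimated value (unseen pairs count as 0)."""
    return max(actions, key=lambda a: q.get((state, a), 0.0))


if __name__ == "__main__":
    actions = ["continue_regimen", "switch_regimen"]
    # Hypothetical logged transitions from a toy treatment simulation.
    log: List[Transition] = [
        ("progressing", "continue_regimen", -1.0, "end", True),
        ("progressing", "switch_regimen", 0.0, "stable", False),
        ("stable", "continue_regimen", 1.0, "end", True),
        ("stable", "switch_regimen", -0.5, "end", True),
    ]
    q = fit_q_values(log, actions)
    print(greedy_action(q, "progressing", actions))  # switch_regimen
    print(greedy_action(q, "stable", actions))       # continue_regimen
```

Switching regimen earns no immediate reward here, yet it is learned as the better action for a progressing patient because the discounted value of reaching the "stable" state (0.9 × 1.0) is propagated back; capturing such delayed effects is the motivation for RL in dynamic treatment optimization.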
- Citation: Nian H, Wu YB, Bai Y, Zhang ZL, Tu XH, Liu QZ, Zhou DH, Du QC. Multimodal artificial intelligence integrates imaging, endoscopic, and omics data for intelligent decision-making in individualized gastrointestinal tumor treatment. Artif Intell Gastroenterol 2026; 7(1): 115498
- URL: https://www.wjgnet.com/2644-3236/full/v7/i1/115498.htm
- DOI: https://dx.doi.org/10.35712/aig.v7.i1.115498
