Retrospective Study
Copyright: ©Author(s) 2026.
World J Gastroenterol. Jun 28, 2026; 32(24): 118690
Published online Jun 28, 2026. doi: 10.3748/wjg.118690
Table 1 Mayo Endoscopic Subscore

| Score | Description | Endoscopic findings |
| --- | --- | --- |
| 0 | Normal or inactive disease | Normal mucosal appearance; intact vascular pattern; no friability, bleeding, or ulceration |
| 1 | Mild disease | Mild friability; decreased but visible vascular pattern; mild erythema; no erosions |
| 2 | Moderate disease | Marked erythema; absent vascular pattern; friability; erosions may be present |
| 3 | Severe disease | Spontaneous bleeding; ulceration; denuded mucosa; severe friability |
Table 2 Overall performance comparison of five multimodal large language models in phase A

| Metric | Gemini-2.5-Pro | Grok-4 | GPT-4o | GPT-5 | Qwen-VL-Max |
| --- | --- | --- | --- | --- | --- |
| Accuracy | 0.502 | 0.417 | 0.594 | 0.717 | 0.353 |
| Precision | 0.639 | 0.552 | 0.635 | 0.731 | 0.488 |
| Recall | 0.502 | 0.417 | 0.594 | 0.717 | 0.353 |
| F1 score (95%CI) | 0.480 (0.452-0.574) | 0.415 (0.369-0.481) | 0.602 (0.522-0.648) | 0.720 (0.665-0.773) | 0.338 (0.263-0.382) |
| Cohen’s κ | 0.343 | 0.239 | 0.449 | 0.608 | 0.133 |
| MAE | 0.611 | 0.770 | 0.452 | 0.297 | 0.767 |
| MSE | 0.866 | 1.194 | 0.544 | 0.325 | 1.021 |
| RMSE | 0.930 | 1.093 | 0.738 | 0.570 | 1.011 |
| SD | 0.781 | 0.918 | 0.697 | 0.564 | 0.953 |
| CV | 0.713 | 0.838 | 0.637 | 0.515 | 0.870 |
| r value¹ | 0.681 | 0.590 | 0.777 | 0.852 | 0.478 |
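As an illustration of how the agreement and error metrics in these tables relate to one another, the following minimal Python sketch computes accuracy, support-weighted recall, unweighted Cohen's κ, MAE, MSE, and RMSE from paired reference and predicted Mayo Endoscopic Subscore grades. The function name `ordinal_metrics` and the eight example grades are hypothetical and are not taken from the study. The choice of support-weighted recall and unweighted κ is an assumption; it is suggested, but not confirmed, by the identical accuracy and recall rows in the tables.

```python
import math
from collections import Counter


def ordinal_metrics(y_true, y_pred):
    """Agreement and error metrics for ordinal grades (hypothetical helper)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Signed grade error for each image: predicted grade minus reference grade.
    errors = [p - t for t, p in zip(y_true, y_pred)]
    accuracy = sum(1 for e in errors if e == 0) / n
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)

    # Support-weighted recall: per-grade recall weighted by that grade's share
    # of the reference labels. Algebraically this equals overall accuracy.
    weighted_recall = 0.0
    for grade, support in true_counts.items():
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == grade and p == grade)
        weighted_recall += (support / n) * (hits / support)

    # Unweighted Cohen's kappa (assumption: the study may use a weighted form).
    expected = sum(true_counts[g] * pred_counts[g] for g in true_counts) / (n * n)
    kappa = 1.0 if expected == 1 else (accuracy - expected) / (1 - expected)

    return {
        "accuracy": accuracy,
        "weighted_recall": weighted_recall,
        "kappa": kappa,
        "mae": mae,
        "mse": mse,
        "rmse": rmse,
    }


# Hypothetical reference and model grades for eight images (not study data).
demo = ordinal_metrics([0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 1, 1, 2, 3, 3, 1])
print(demo)
```

Because RMSE is the square root of MSE, the two rows can be cross-checked directly; for example, the GPT-5 phase A values give √0.325 ≈ 0.570.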
Table 3 Overall performance comparison of five multimodal large language models in phase B

| Metric | Gemini-2.5-Pro | Grok-4 | GPT-4o | GPT-5 | Qwen-VL-Max |
| --- | --- | --- | --- | --- | --- |
| Accuracy | 0.541 | 0.420 | 0.583 | 0.710 | 0.406 |
| Precision | 0.621 | 0.480 | 0.590 | 0.724 | 0.515 |
| Recall | 0.541 | 0.420 | 0.583 | 0.710 | 0.406 |
| F1 score (95%CI) | 0.546 (0.490-0.611) | 0.426 (0.363-0.481) | 0.584 (0.502-0.618) | 0.715 (0.664-0.768) | 0.408 (0.313-0.439) |
| Cohen’s κ | 0.381 | 0.229 | 0.425 | 0.596 | 0.190 |
| MAE | 0.565 | 0.753 | 0.473 | 0.300 | 0.707 |
| MSE | 0.799 | 1.141 | 0.587 | 0.322 | 0.940 |
| RMSE | 0.894 | 1.068 | 0.766 | 0.567 | 0.969 |
| SD | 0.843 | 0.973 | 0.755 | 0.566 | 0.951 |
| CV | 0.769 | 0.888 | 0.689 | 0.517 | 0.868 |
| r value¹ | 0.644 | 0.573 | 0.749 | 0.850 | 0.499 |
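Table 4 Segment-wise performance comparison of five multimodal large language models in phase B

| Segments | Models | Accuracy | Precision | Recall | F1 score | Cohen’s κ | MAE | MSE | RMSE | SD | CV | r value¹ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ileocecal region | Gemini-2.5-Pro | 0.426 | 0.752 | 0.426 | 0.478 | 0.226 | 0.787 | 1.213 | 1.101 | 0.77 | 1.508 | 0.606 |
| Ileocecal region | Grok-4 | 0.362 | 0.727 | 0.362 | 0.456 | 0.144 | 1.043 | 1.979 | 1.407 | 1.031 | 2.018 | 0.453 |
| Ileocecal region | GPT-4o | 0.532 | 0.643 | 0.532 | 0.566 | 0.25 | 0.574 | 0.787 | 0.887 | 0.79 | 1.547 | 0.639 |
| Ileocecal region | GPT-5 | 0.596 | 0.718 | 0.596 | 0.625 | 0.356 | 0.426 | 0.468 | 0.684 | 0.593 | 1.162 | 0.751 |
| Ileocecal region | Qwen-VL-Max | 0.298 | 0.547 | 0.298 | 0.339 | 0.058 | 0.957 | 1.468 | 1.212 | 0.956 | 1.872 | 0.315 |
| Ascending colon | Gemini-2.5-Pro | 0.49 | 0.739 | 0.49 | 0.514 | 0.309 | 0.673 | 1.041 | 1.02 | 0.766 | 1.211 | 0.665 |
| Ascending colon | Grok-4 | 0.388 | 0.641 | 0.388 | 0.434 | 0.175 | 0.939 | 1.673 | 1.294 | 1.004 | 1.586 | 0.5 |
| Ascending colon | GPT-4o | 0.673 | 0.711 | 0.673 | 0.679 | 0.477 | 0.367 | 0.449 | 0.67 | 0.624 | 0.986 | 0.793 |
| Ascending colon | GPT-5 | 0.796 | 0.812 | 0.796 | 0.798 | 0.657 | 0.204 | 0.204 | 0.452 | 0.435 | 0.687 | 0.894 |
| Ascending colon | Qwen-VL-Max | 0.367 | 0.727 | 0.367 | 0.432 | 0.144 | 0.776 | 1.061 | 1.03 | 0.857 | 1.355 | 0.513 |
| Transverse colon | Gemini-2.5-Pro | 0.549 | 0.72 | 0.549 | 0.573 | 0.394 | 0.471 | 0.51 | 0.714 | 0.621 | 0.621 | 0.803 |
| Transverse colon | Grok-4 | 0.333 | 0.527 | 0.333 | 0.368 | 0.12 | 0.804 | 1.078 | 1.038 | 0.893 | 0.893 | 0.612 |
| Transverse colon | GPT-4o | 0.647 | 0.71 | 0.647 | 0.665 | 0.504 | 0.373 | 0.412 | 0.642 | 0.604 | 0.604 | 0.839 |
| Transverse colon | GPT-5 | 0.784 | 0.818 | 0.784 | 0.796 | 0.688 | 0.216 | 0.216 | 0.464 | 0.454 | 0.454 | 0.9 |
| Transverse colon | Qwen-VL-Max | 0.392 | 0.543 | 0.392 | 0.431 | 0.155 | 0.725 | 0.961 | 0.98 | 0.956 | 0.956 | 0.495 |
| Descending colon | Gemini-2.5-Pro | 0.63 | 0.723 | 0.63 | 0.628 | 0.503 | 0.435 | 0.565 | 0.752 | 0.74 | 0.587 | 0.738 |
| Descending colon | Grok-4 | 0.413 | 0.432 | 0.413 | 0.416 | 0.211 | 0.739 | 1.087 | 1.043 | 1.034 | 0.82 | 0.518 |
| Descending colon | GPT-4o | 0.63 | 0.626 | 0.63 | 0.615 | 0.491 | 0.435 | 0.565 | 0.752 | 0.72 | 0.571 | 0.777 |
| Descending colon | GPT-5 | 0.696 | 0.732 | 0.696 | 0.689 | 0.583 | 0.326 | 0.37 | 0.608 | 0.598 | 0.474 | 0.842 |
| Descending colon | Qwen-VL-Max | 0.565 | 0.532 | 0.565 | 0.524 | 0.408 | 0.5 | 0.63 | 0.794 | 0.791 | 0.628 | 0.689 |
| Sigmoid colon | Gemini-2.5-Pro | 0.667 | 0.715 | 0.667 | 0.682 | 0.532 | 0.333 | 0.333 | 0.577 | 0.556 | 0.313 | 0.828 |
| Sigmoid colon | Grok-4 | 0.689 | 0.696 | 0.689 | 0.683 | 0.57 | 0.311 | 0.311 | 0.558 | 0.558 | 0.314 | 0.855 |
| Sigmoid colon | GPT-4o | 0.556 | 0.571 | 0.556 | 0.541 | 0.378 | 0.489 | 0.578 | 0.76 | 0.739 | 0.416 | 0.686 |
| Sigmoid colon | GPT-5 | 0.778 | 0.796 | 0.778 | 0.783 | 0.688 | 0.222 | 0.222 | 0.471 | 0.452 | 0.254 | 0.891 |
| Sigmoid colon | Qwen-VL-Max | 0.467 | 0.653 | 0.467 | 0.434 | 0.222 | 0.556 | 0.6 | 0.775 | 0.748 | 0.421 | 0.63 |
| Rectum | Gemini-2.5-Pro | 0.489 | 0.5 | 0.489 | 0.491 | 0.253 | 0.689 | 1.133 | 1.065 | 1.062 | 0.724 | 0.332 |
| Rectum | Grok-4 | 0.356 | 0.387 | 0.356 | 0.35 | 0.125 | 0.644 | 0.644 | 0.803 | 0.788 | 0.537 | 0.665 |
| Rectum | GPT-4o | 0.444 | 0.502 | 0.444 | 0.453 | 0.238 | 0.622 | 0.756 | 0.869 | 0.865 | 0.59 | 0.606 |
| Rectum | GPT-5 | 0.6 | 0.611 | 0.6 | 0.594 | 0.433 | 0.422 | 0.467 | 0.683 | 0.665 | 0.454 | 0.752 |
| Rectum | Qwen-VL-Max | 0.356 | 0.352 | 0.356 | 0.348 | 0.06 | 0.711 | 0.889 | 0.943 | 0.926 | 0.631 | 0.403 |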
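Table 5 Diagnostic accuracy by Mayo Endoscopic Subscore grade among physicians and multimodal large language models

| Rater | MES grade 0 | MES grade 1 | MES grade 2 | MES grade 3 |
| --- | --- | --- | --- | --- |
| Expert 1 | 100 | 53.4 | 81.8 | 82.9 |
| Expert 2 | 98.2 | 67.1 | 66.7 | 62.9 |
| GPT-5 | 86.5 | 56.3 | 62.4 | 87.1 |
| GPT-4o | 85.3 | 45.2 | 53.8 | 52.2 |
| Gemini-2.5-Pro | 93.1 | 38.9 | 45.4 | 60.0 |
| Grok-4 | 88.9 | 30.1 | 35.1 | 40.3 |
| Qwen-VL-Max | 76.7 | 26.3 | 33.3 | 38.5 |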
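A per-grade breakdown of this kind can be sketched as follows, assuming that each value is the percentage of images with a given reference grade that the rater assigned to that same grade. That definition is an assumption, since the table does not state the denominator. The function name `per_grade_accuracy` and the example grades are hypothetical and are not study data.

```python
from collections import defaultdict


def per_grade_accuracy(y_true, y_pred):
    """Percentage of correctly graded images for each reference MES grade."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be equal in length")

    totals = defaultdict(int)   # images per reference grade
    correct = defaultdict(int)  # of those, images graded identically by the rater
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            correct[t] += 1

    # Only grades that actually occur in the reference labels are reported.
    return {g: 100.0 * correct[g] / totals[g] for g in sorted(totals)}


# Hypothetical reference and rater grades for eight images (not study data).
by_grade = per_grade_accuracy([0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 1, 1, 2, 3, 3, 1])
print(by_grade)
```

Reporting the result per reference grade, rather than as a single pooled figure, shows where a rater struggles, such as the intermediate grades 1 and 2.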

