Copyright
©The Author(s) 2025.
World J Gastroenterol. Nov 21, 2025; 31(43): 112000
Published online Nov 21, 2025. doi: 10.3748/wjg.v31.i43.112000
Table 1 Summary of artificial intelligence models for acute appendicitis: Methods and performance metrics
| No. | Ref. | Year | Country | Dataset size | Variables used | AI methods | Performance metrics |
| 1 | Sibic et al[84] | 2025 | Turkey | AAp: 400; non-AAp: 400 | Demographic and radiological data [CT images (CNN architectures)] | MobileNet v2, ResNet v2, EfficientNet b2, Inception v3 (MobileNet v2 best results) | Accuracy: 79.1; precision: 82.0; sensitivity: 74.7; F1 score: 78.1; AUC: 0.877 |
| 2 | Navaei et al[17] | 2025 | Iran | AAp: 465; non-AAp: 317 | Demographic, clinical and biochemical data | DT, RF, SVM, KNN, GBM, AdaBoost, XGBoost, LightBoost, CatBoost (RF best results) | Accuracy: 94.6; sensitivity: 93.9; specificity: 95.7; F1 score: 93.6 |
| 3 | Li et al[8] | 2025 | China | Compl AAp: 88; uncompl AAp: 213 | Demographic, clinical and biochemical data | LR, SVM, RF, DT, GBM, KNN, GNB, MLP (RF best results) | Accuracy: 81.0; sensitivity: 76.0; specificity: 83.0; F1 score: 74.0; AUC: 0.840 |
| 4 | Kucukakcali et al[86] | 2025 | Turkey | Compl AAp: 34; uncompl AAp: 65; non-AAp: 41 | Demographic and biochemical data | SGB (non-AAp vs AAp) | Accuracy: 96.3; sensitivity: 94.7; specificity: 100; F1 score: 97.3; AUC: 0.947 |
| | | | | | | SGB (uncompl vs compl AAp) | Accuracy: 78.9; sensitivity: 83.3; specificity: 76.9; F1 score: 71.4; AUC: 0.790 |
| 5 | Kucukakcali et al[87] | 2025 | Turkey | Compl AAp: 183; uncompl AAp: 290; negative AAp: 117 | Demographic and biochemical data | AdaBoost, XGBoost, SGB, bagged CART, RF (XGBoost best results) | Accuracy: 80.0; sensitivity: 70.8; specificity: 85.4; F1 score: 72.3 |
| | | | | | | AdaBoost, XGBoost, SGB, bagged CART, RF (XGBoost best results) | Accuracy: 90.7; sensitivity: 100; specificity: 61.5; F1 score: 94.3 |
| 6 | Kim et al[29] | 2025 | South Korea | Compl AAp: 655; uncompl AAp: 2789; negative AAp: 551; non-AAp: 3058 | CT images (non vs uncomplicated) | 3D-CNN (transfer learning, ResNet/DenseNet/EfficientNet) (DenseNet best results) | Accuracy: 79.5; sensitivity: 70.1; specificity: 87.6; AUC: 0.865 |
| | | | | | CT images (complicated vs uncomplicated) | 3D-CNN (transfer learning, ResNet/DenseNet/EfficientNet) (DenseNet best results) | Accuracy: 76.1; sensitivity: 82.6; specificity: 74.2; AUC: 0.827 |
| 7 | Kendall et al[88] | 2025 | | Compl AAp: 1192; uncompl AAp: 344; non-AAp: 317 | Demographic, clinical, biochemical and radiological data | RF, LightGBM, LR, SGD, KNN, Dummy, GANDALF, RF + embedded LightGBM (best result) | Accuracy: 98.1; sensitivity: 97.8; specificity: 96.1; AUROC: 0.993 |
| | | | | | | RF, LightGBM, LR, SGD, KNN, Dummy, GANDALF, LightGBM + filter FS (best result) | Accuracy: 90.1; sensitivity: 78.8; specificity: 95.1; AUROC: 0.931 |
| 8 | Erman et al[35] | 2025 | Canada | Compl AAp: 602; uncompl AAp: 1378 | Demographic, clinical and biochemical data | ML pipeline | Accuracy: 70.1; NPV: 82.8; PPV: 56.4 |
| 9 | Chen et al[3] | 2025 | China | Compl AAp: 357; uncompl AAp: 416 | Demographic, clinical and biochemical data | XGBoost, RF, DT (CART), SVM (XGBoost best results) | Accuracy: 85.5; sensitivity: 86.5; specificity: 84.6; AUC: 0.914 |
| 10 | Aydin et al[89] | 2025 | Turkey | Compl AAp: 296; uncompl AAp: 3658; non-AAp: 4632; validation: compl AAp: 1580; uncompl AAp: 1287; non-AAp: 169 | Demographic, clinical, biochemical and radiological data | LR, KNN, SVM, CART, RF (RF best results for AAp diagnosis) | Accuracy: 99.2; sensitivity: 99.8; specificity: 99.3; AUC: 0.996 |
| | | | | | | LR, KNN, SVM, CART, RF (RF best results for severity of AAp) | Accuracy: 99.2; sensitivity: 99.3; specificity: 99.1; AUC: 0.995 |
| 11 | Zhao et al[90] | 2024 | China | Compl AAp: 258; uncompl AAp: 76 | Demographic, clinical, biochemical and radiological data (CT images) | Radiomics model (CT images), CT model (clinical and CT features), combined model | Accuracy: 75.4; sensitivity: 74.6; specificity: 82.6; AUC: 0.817 |
| 12 | Yazici et al[37] | 2024 | Turkey | Compl AAp: 142; uncompl AAp: 990 | Demographic, clinical and biochemical data | KNN, DT, LR, SVM, MLP, GNB (LR best result) | Accuracy: 96.0; sensitivity: 60.0; specificity: 100 |
| 13 | Wei et al[40] | 2024 | China | Compl AAp: 103; uncompl AAp: 219 | Demographic, clinical and biochemical data | LR, CART, FR, SVM, Bayes, KNN, NN, FDA, GBM (GBM best result) | Accuracy: 95.6; sensitivity: 91.7; specificity: 97.4; F1 score: 93.0 |
| 14 | Schipper et al[33] | 2024 | Netherlands | AAp: 167; non-AAp: 169 | Data including physical examination | XGBoost | AUC: 0.919 |
| | | | | | Data including physical examination and biochemical data | XGBoost | AUC: 0.923 |
| 15 | Roshanaei et al[36] | 2024 | Iran | AAp: 138; non-AAp: 396 | Demographic, clinical and biochemical data | GNB | Accuracy: 95.0; sensitivity: 87.2; specificity: 97.5; F1 score: 89.0 |
| 16 | Marcinkevičs et al[52] | 2024 | Germany | Compl AAp: 97; uncompl AAp: 482 | Radiological data (US images) (diagnosis) | CBM; MVCBM; SSMVCBM | AUROC: 0.800; AUPR: 0.920 |
| | | | | | Radiological data (US images) (severity) | CBM; MVCBM; SSMVCBM | AUROC: 0.780; AUPR: 0.580 |
| 17 | Males et al[39] | 2024 | Croatia | Compl AAp: 252; uncompl AAp: 252; negative AAp: 47 (pediatric cases) | Demographic, clinical and biochemical data | RF | Sensitivity: 99.7; specificity: 17.0 |
| | | | | | | XGBoost | Sensitivity: 99.8; specificity: 12.0 |
| | | | | | | LR | Sensitivity: 99.7; specificity: 5.2 |
| 18 | Liang et al[91] | 2024 | China | Training cohort: compl AAp: 236; uncompl AAp: 464; validation cohort: compl AAp: 182; uncompl AAp: 283 | Demographic, clinical, biochemical and radiological data | Conventional combined model (clinical + CT features); deep learning radiomics (DL + radiomics); combined model (clinical + CT + DL + radiomics); radiologist’s diagnosis | Accuracy: 79.0; sensitivity: 66.5; specificity: 85.3; AUC: 0.816 |
| | | | | | | | Accuracy: 72.5; sensitivity: 70.2; specificity: 73.9; AUC: 0.799 |
| 19 | Gollapalli et al[38] | 2024 | Saudi Arabia | 411 patients | Demographic, clinical and biochemical data | DT (experiment 1) | Accuracy: 75.0; sensitivity: 13.8; precision: 40.0; F1 score: 20.5 |
| | | | | | | KNN (experiment 1) | Accuracy: 83.1; sensitivity: 41.4; precision: 75.0; F1 score: 53.3 |
| | | | | | | DT (experiment 2) | Accuracy: 87.4; sensitivity: 91.2; precision: 83.8; F1 score: 87.4 |
| | | | | | | KNN (experiment 2) | Accuracy: 84.7; sensitivity: 84.6; precision: 83.7; F1 score: 84.2 |
| | | | | | | KNN bagging (experiment 3) | Accuracy: 92.1; sensitivity: 91.2; precision: 92.2; F1 score: 91.7 |
| | | | | | | DT bagging (experiment 3) | Accuracy: 89.5; sensitivity: 83.5; precision: 93.8; F1 score: 88.4 |
| | | | | | | Stacking (experiment 4) | Accuracy: 92.6; sensitivity: 89.0; precision: 95.3; F1 score: 92.0 |
| 20 | Chadaga et al[42] | 2024 | India | AAp: 465; non-AAp: 317 (pediatric cases) | Demographic, clinical and biochemical data | RF, LR, DT, KNN, AdaBoost, CatBoost, LightGBM, XGBoost, APPSTACK. Bayesian optimization, hybrid bat algorithm, hybrid self-adaptive bat algorithm, firefly algorithm, grid search, randomized search (hybrid bat algorithm with APPSTACK best results) | Accuracy: 94.0; sensitivity: 74.0; precision: 85.0; F1 score: 78.0; AUC: 0.960 |
| 21 | Abu-Ashour et al[41] | 2024 | Canada | AAp: 2100 (pediatric cases) | Ultrasound reports | Human | Precision: 57.3; sensitivity: 88.1; F score: 69.4 |
| | | | | | | ChatGPT (large language model) | Precision: 92.3; sensitivity: 68.4; F score: 78.5 |
| | | | | | Operative reports | Human | Precision: 59.2; sensitivity: 95.3; F score: 73.1 |
| | | | | | | ChatGPT (large language model) | Precision: 97.1; sensitivity: 75.8; F score: 85.1 |
| 22 | Phan-Mai et al[46] | 2023 | Vietnam | Compl AAp: 483; uncompl AAp: 1467 | Demographic, clinical and biochemical data | SVM (SMOTE-adjusted) | Accuracy: 65.5; AUC: 0.730 |
| | | | | | | DT (SMOTE-adjusted) | Accuracy: 73.8; AUC: 0.738 |
| | | | | | | KNN (SMOTE-adjusted) | Accuracy: 74.1; AUC: 0.831 |
| | | | | | | LR (SMOTE-adjusted) | Accuracy: 72.9; AUC: 0.789 |
| | | | | | | ANN (SMOTE-adjusted) | Accuracy: 74.2; AUC: 0.810 |
| | | | | | | GBM (SMOTE-adjusted) | Accuracy: 82.0; AUC: 0.890 |
| 23 | Pati et al[30] | 2023 | India | Compl AAp: 514; uncompl AAp: 196; non-AAp: 183 (pediatric cases) | Demographic, clinical, biochemical and radiological data | LR, NB, KNN, SVM, DT, RF, MLP, AdaBoost (RF best for diagnostic) | Accuracy: 91.6; precision: 89.0; sensitivity: 92.0; specificity: 91.3; F1 score: 90.4 |
| | | | | | | LR, NB, KNN, SVM, DT, RF, MLP, AdaBoost (AdaBoost best for complication prediction) | Accuracy: 92.2; precision: 94.6; sensitivity: 96.3; specificity: 68.6; F1 score: 95.4 |
| 24 | Park et al[45] | 2023 | South Korea | AAp: 246; non-AAp: 215; diverticulitis: 254 | CT images | CNN-EfficientNet algorithm (single image method) | Accuracy: 86.1; precision: 85.4; sensitivity: 85.6; specificity: 86.5; AUC: 0.937 |
| | | | | | CT images | CNN-EfficientNet algorithm (RGB method) | Accuracy: 87.9; precision: 87.1; sensitivity: 87.9; specificity: 88.1; AUC: 0.951 |
| 25 | Lin et al[93] | 2023 | Taiwan | Compl AAp: 49; uncompl AAp: 362 | Demographic, clinical, biochemical and radiological data | 9 different MLP-ANN analyzed (Lin et al[93] ANN model best results) | AUC: 0.897; sensitivity: 85.7; specificity: 91.7 |
| 26 | Li et al[92] | 2023 | China | Compl AAp: 141; uncompl AAp: 201 (pregnant patients) | Demographic, clinical, biochemical and radiological data | DT | AUC: 0.780 |
| 27 | Harmantepe et al[44] | 2023 | Turkey | AAp: 189; negative AAp: 156 | Demographic and biochemical data | LR, SVM, NN, KNN, voting classifier (voting best result) | Accuracy: 86.2; sensitivity: 83.7; specificity: 88.6 |
| 28 | Akbulut et al[43] | 2023 | Turkey | Compl AAp: 304; uncompl AAp: 1161; negative AAp: 332 | Demographic and biochemical data | CatBoost + SHAP (non-AAp vs AAp) | Accuracy: 88.2; sensitivity: 84.2; specificity: 93.2; F1 score: 88.7; AUC: 0.947 |
| | | | | | | CatBoost + SHAP (compl vs uncompl AAp) | Accuracy: 92.0; sensitivity: 94.1; specificity: 90.5; F1 score: 91.1; AUC: 0.969 |
| 29 | Xia et al[51] | 2022 | China | Compl AAp: 148; uncompl AAp: 150 | Demographic and clinical data | SVM | Accuracy: 83.6; sensitivity: 81.7; specificity: 85.3; Matthews: 0.6732 |
| 30 | Su et al[49] | 2022 | United States | AAp: 28002; non-AAp: 655 (adult cases) | Demographic and clinical data | LR | Accuracy: 96.0; sensitivity: 73.0; specificity: 68.0; AUC: 0.780 |
| | | | | | | RF | Accuracy: 97.0; sensitivity: 67.0; specificity: 71.0; AUC: 0.750 |
| | | | | AAp: 11128; non-AAp: 256 (pediatric cases) | Demographic and clinical data | LR | Accuracy: 95.0; sensitivity: 81.0; specificity: 78.0; AUC: 0.870 |
| | | | | | | RF | Accuracy: 96.0; sensitivity: 82.0; specificity: 75.0; AUC: 0.860 |
| 31 | Shikha and Kasem[48] | 2023 | Brunei | Compl AAp: 25; uncompl AAp: 24; negative AAp: 97 (pediatric cases) | Demographic, clinical and biochemical data | AI pediatric appendicitis DT | Accuracy: 97.1; sensitivity: 96.7; specificity: 97.4 |
| 32 | Mijwil and Aggarwal[47] | 2022 | Iraq | Appendectomy: 3185; medical: 307 | Demographic and biochemical data | RF, LR, NB, GLM, DT, SVM, GBT (RF best results) | Accuracy: 83.8; precision: 84.1; sensitivity: 81.1; specificity: 81.0 |
| 33 | Akgül et al[50] | 2021 | Turkey | Compl AAp: 45; uncompl AAp: 147; negative AAp: 24; non-AAp: 106 (pediatric cases) | Demographic, clinical, biochemical and radiological data | ANN | Sensitivity: 89.8; specificity: 81.2; AUC: 0.910 |
| 34 | Marcinkevics et al[53] | 2021 | Germany | Compl AAp: 51; uncompl AAp: 196; non-AAp: 183 (pediatric cases) | Demographic, clinical, biochemical and radiological data | LR (diagnostic) | Sensitivity: 88.0; specificity: 76.0; AUC: 0.910 |
| | | | | | | RF (diagnostic) | Sensitivity: 91.0; specificity: 86.0; AUC: 0.960 |
| | | | | | | GBM (diagnostic) | Sensitivity: 93.0; specificity: 86.0; AUC: 0.960 |
| | | | | | | LR (severity) | Sensitivity: 93.0; specificity: 42.0; AUC: 0.820 |
| | | | | | | RF (severity) | Sensitivity: 98.0; specificity: 45.0; AUC: 0.900 |
| | | | | | | GBM (severity) | Sensitivity: 97.0; specificity: 46.0; AUC: 0.900 |
| 35 | Aparicio et al[79] | 2021 | Switzerland | AAp: 430 (pediatric cases) | Demographic, clinical, and biochemical data | SLIM risk model | AUC: 0.850; AUPR: 0.900 |
| 36 | Hayashi et al[55] | 2021 | Japan | AAp: 70 videos (pediatric cases) | 70 videos (between 85-347 images per video) | U-net-based CNN | Not indicated |
| 37 | Reismann et al[56] | 2021 | Germany | AAp: 29 | Gene expression data (56666 genes) | LR-based biomarker signature (4 genes) | AUC: 0.84 |
| 38 | Ghareeb et al[54] | 2021 | Egypt | 319 | Clinical findings, chronic diseases, patient characteristics, laboratory and imaging data | Ensemble model (subspace KNN) | AUC: 0.82; accuracy: 91.1 |
| 39 | Stiel et al[57] | 2020 | Germany | Compl AAp: 102; uncompl AAp: 234; negative AAp: 12; non-AAp: 115 (pediatric cases) | Demographic, clinical, biochemical and radiological data | Modified HAS based CART, AI score based RF (AAp vs nonoperative) | Sensitivity: 86.6; specificity: 70.9; AUC: 0.920 |
| | | | | | | Modified HAS based CART, AI score based RF (uncompl vs compl AAp) | Sensitivity: 97.1; specificity: 17.9; AUC: 0.710 |
| 40 | Akmese et al[58] | 2020 | Turkey | AAp: 214; non-AAp: 214 | Demographic and biochemical data | RF, CART, SVM, LR, KNN, ANN, GB (GB best results) | Accuracy: 95.3; sensitivity: 93.2; specificity: 97.1 |
| 41 | Aydin et al[59] | 2020 | Turkey | Control: 4244; negative AAp: 169; compl AAp: 1559; uncompl AAp: 1272 (pediatric cases) | Demographic and biochemical data | KNN, NB, DT, SVM, GLM, RF (RF best results) | Accuracy: 97.5; sensitivity: 97.8; specificity: 97.2; AUC: 0.997 |
| 42 | Rajpurkar et al[60] | 2020 | United States | AAp: 359; non-AAp: 287 | CT images | Average of 2D Res-Net18, average of 2D Res-Net34, LRCN Res-Net18, LRCN Res-Net34, SE-ResNeXt-50, AppendiXNet (3D-ResNet CNN) | Accuracy: 72.5; sensitivity: 78.4; specificity: 66.7; AUC: 0.810 |
| 43 | Park et al[61] | 2020 | United States | AAp: 215; non-AAp: 452 | CT images | 3D-CNN + grad-CAM | Accuracy: 91.5; sensitivity: 90.2; specificity: 92.0 |
| 44 | Zhao et al[63] | 2020 | China | AAp: 48; non-AAp: 86 | Midstream urine samples | Urinary proteomics + RF, SVM, NB (RF best results) | Accuracy: 83.6; sensitivity: 81.2; specificity: 84.4 |
| 45 | Ramirez-garcialuna et al[62] | 2020 | Mexico | AAp: 51; non-AAp: 17; negative AAp: 3; control: 51 | Demographic, clinical, biochemical, radiological and infrared thermal data | Infrared thermography + RF classifier | Accuracy: 92.3; sensitivity: 90.0; specificity: 96.1; AUC: 0.906 |
| 46 | Reismann et al[65] | 2019 | Germany | Compl AAp: 183; uncompl AAp: 290; negative AAp: 117 (pediatric cases) | Signature: appendiceal diameter, CRP, leukocytes, neutrophils | CRP, leukocytes, neutrophils, linear model (LBFGS) (AAp vs non-AAp) | Accuracy: 90.0; sensitivity: 93.0; specificity: 67.0; AUC: 0.910 |
| | | | | | | CRP, leukocytes, neutrophils, linear model (LBFGS) (compl vs uncompl AAp) | Accuracy: 51.0; sensitivity: 95.0; specificity: 33.0; AUC: 0.800 |
| 47 | Kang et al[64] | 2019 | South Korea | AAp: 80; non-AAp: 164 | Demographic, clinical, biochemical and radiological data | Alvarado, AAS, Eskelinen, DT based CHAID algorithm | AUC: 0.850 |
| 48 | Gudelis et al[66] | 2019 | Spain | AAp: 93; non-AAp: 159 | Demographic, clinical, biochemical and radiological data | ANN | AUC: 0.950; PCC: 93.5 |
| | | | | | | CHAID | AUC: 0.930; PCC: 81.7 |
| 49 | Shahmoradi et al[67] | 2018 | Iran | AAp: 133; negative AAp: 48 | Demographic, clinical and biochemical data | MLP | Accuracy: 92.9; sensitivity: 80.0; specificity: 97.5; AUC: 0.832 |
| | | | | | | RBFN | Accuracy: 77.6; sensitivity: 28.0; specificity: 87.8 |
| | | | | | | LR | Accuracy: 83.9; sensitivity: 58.3; specificity: 93.2; AUC: 0.808 |
| 50 | Jamshidnezhad | 2017 | Iran | NA | Demographic, clinical, biochemical and radiological data | ACSS, MLNN, SVM, NN, hybrid fuzzy model, evolutionary–fuzzy + HBRC | Accuracy: 89.9 |
| 51 | Afshari Safavi | 2015 | Iran | Compl AAp: 24; uncompl AAp: 59; negative AAp: 17 | Demographic and biochemical data | ANN (MLP) | Accuracy: 88.0; sensitivity: 97.6; AUC: 0.875 |
| 52 | Park and Kim[70] | 2015 | South Korea | Compl AAp: 62; uncompl AAp: 143; non-AAp: 596 | Demographic, clinical and radiological data | MLNN | Accuracy: 97.8; sensitivity: 96.6; specificity: 99.5 |
| | | | | | | RBF | AUC: 99.8; sensitivity: 99.7; specificity: 100 |
| | | | | | | PNN | AUC: 99.4; sensitivity: 98.1; specificity: 100 |
| 53 | Lee et al[75] | 2013 | Taiwan | AAp: 464; negative-AAp: 110 | Demographic, clinical and biochemical data | PEL, SVM, SMOTE, MCC, CM, WCUS, Alvarado (PEL best results) | Sensitivity: 57.3; specificity: 66.7; AUC: 0.619 |
| 54 | Iliou et al[94] | 2013 | Greece | AAp: 71; non-AAp: 236 (pediatric cases) | Demographic, clinical and biochemical data | K1, JRip, bagging ensemble (majority voting) | Accuracy: 87.8 |
| 55 | Deleger et al[95] | 2013 | United States | AAp: 534; control: 1566 | Components of the pediatric appendicitis score | NLP | Sensitivity: 86.9; precision: 86.8; specificity: 93.8 |
| 56 | Yoldaş et al[71] | 2012 | Turkey | AAp: 132; negative-AAp: 24 | Demographic, clinical and biochemical data | ANN | Sensitivity: 100; specificity: 97.2; AUC: 0.950 |
| 57 | Son et al[76] | 2012 | South Korea | AAp: 152; non-AAp: 174 | Demographic, clinical and biochemical data | DT C5.0 model (univariate) | Accuracy: 80.2; sensitivity: 82.4; specificity: 78.3; AUC: 0.803 |
| | | | | | | DT C5.0 model (multivariate) | Accuracy: 73.5; sensitivity: 66.0; specificity: 80.0; AUC: 0.730 |
| 58 | Malley et al[96] | 2012 | United States | AAp: 85; negative AAp: 21 | Biochemical data | b-NN, class RF, Iboost, LR, KNN, regRF (regRF best results) | Brier score: 0.061; AUC: 0.976 |
| 59 | Grigull and Lechner[74] | 2012 | Germany | AAp: 45 (pediatric cases) | Demographic, clinical and biochemical data | SVM, ANN, fuzzy logic, voting algorithm (combination best results) | Accuracy: 97.4 |
| 60 | Hsieh et al[72] | 2011 | Taiwan | Compl AAp: 28; uncompl AAp: 87; negative AAp: 11; non-AAp: 65 | Demographic, clinical and biochemical data | RF, SVM, ANN, LR (RF best results) | Accuracy: 96.0; sensitivity: 94.0; specificity: 100; AUC: 0.980 |
| 61 | Ting et al[77] | 2010 | Taiwan | Compl AAp: 80; uncompl AAp: 340; negative-AAp: 112 | Demographic, clinical and biochemical data | DT | Sensitivity: 94.5; specificity: 80.5 |
| 62 | Prabhudesai et al[73] | 2008 | United Kingdom | AAp: 24; non-AAp: 36 | Demographic, clinical and biochemical data | Alvarado (≥ 7), Alvarado (≥ 6), clinical, ANN (ANN best results) | Sensitivity: 100; specificity: 97.2; PPV: 96.0; NPV: 100 |
| 63 | Sakai et al[78] | 2007 | Japan | AAp: 86; negative AAp: 12; non-AAp: 71 | Demographic, clinical and biochemical data | LR | Sensitivity: 21.4; specificity: 80.4; AUC: 0.719 |
| | | | | | | ANN | Sensitivity: 19.9; specificity: 78.5; AUC: 0.741 |
| 64 | Pesonen et al[98] | 1996 | Finland | Suspected AAp: 911 | Demographic, clinical and biochemical data | NN (ART1) | Sensitivity: 79.0; specificity: 78.0 |
| | | | | | | NN (SOM) | Sensitivity: 55.0; specificity: 83.0 |
| | | | | | | NN (LVQ) | Sensitivity: 87.0; specificity: 90.0 |
| | | | | | | NN (BP) | Sensitivity: 83.0; specificity: 92.0 |
| 65 | Forsström et al[97] | 1995 | Finland | AAp: 145; negative AAp: 41 | Biochemical data | LR | AUC: 0.678 |
| | | | | | | DiagaiD | AUC: 0.683 |
| | | | | | | NN (BP) | AUC: 0.622 |
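The metrics reported in Table 1 (accuracy, sensitivity, specificity, precision, F1 score, Matthews correlation coefficient and AUC) have standard definitions. As a minimal sketch, assuming a binary AAp vs non-AAp task, the Python below computes them from confusion-matrix counts and from per-case model scores. The function names, counts and scores are hypothetical illustrations and are not taken from any study in the table.

```python
from math import sqrt


def _ratio(numerator, denominator):
    """Divide safely: return 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def confusion_metrics(tp, fp, tn, fn):
    """Return common diagnostic metrics (as fractions) from confusion-matrix counts."""
    sensitivity = _ratio(tp, tp + fn)  # recall: detected cases among true cases
    specificity = _ratio(tn, tn + fp)  # correctly excluded among non-cases
    precision = _ratio(tp, tp + fp)    # positive predictive value (PPV)
    npv = _ratio(tn, tn + fn)          # negative predictive value
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc = _ratio(
        tp * tn - fp * fn,
        sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "npv": npv,
        "f1": f1,
        "mcc": mcc,
    }


def auc_from_scores(positive_scores, negative_scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return _ratio(wins, len(positive_scores) * len(negative_scores))


# Hypothetical counts and scores, for illustration only.
metrics = confusion_metrics(tp=90, fp=20, tn=80, fn=10)
auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

Table 1 reports most of these values as percentages, so the fractions returned here would be multiplied by 100 for comparison.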
Table 2 Definitions of artificial intelligence techniques employed in acute appendicitis research
| Method | Definition | Relation to deep learning | Advantages |
| Deep learning | A subset of ML that uses multi-layered neural networks to automatically extract features from large datasets | DL is commonly used in image analysis, text processing and predictive modeling. FL and edge AI can enhance the efficiency and privacy of DL models | High ACC; strong capability in handling image and language data |
| Federated learning | A decentralized ML approach where models are trained across multiple institutions without sharing patient data | FL allows DL models to be trained across different centers while preserving patient privacy. It is useful for multi-center AI studies in appendicitis diagnosis | Enhances data privacy; allows for cross-institutional AI model development |
| Edge AI | AI models that run directly on local hospital devices, portable ultrasound scanners or mobile systems instead of relying on cloud computing | Edge AI enables DL models to operate in real time on local devices, reducing dependence on internet connectivity | Real-time processing; improved data security; reduced latency in decision-making |
| Bayesian networks | Probabilistic models that establish relationships between variables and handle uncertainty in data | Can be integrated with DL models to improve decision-making under incomplete information | Useful for risk prediction, particularly in cases with missing clinical data |
| Transformer-based AI models (BERT, GPT) | Large language models capable of understanding and processing medical text | Can be used in combination with DL for automated triage systems and clinical note analysis | Efficient text processing; potential for real-time clinical decision support |
| Graph neural networks | AI models that analyze relationships between data points in a structured graph format | GNNs can enhance DL models by incorporating complex patient relationships and comorbidities | Improves risk prediction models; enhances interpretability of patient data interactions |
| Automated machine learning | AI systems that automatically optimize model selection, hyperparameters and feature engineering | AutoML can generate optimized DL models without requiring manual tuning | Reduces the need for expert AI developers; accelerates model deployment |
| Natural language processing | AI systems designed to interpret and extract information from human language, including clinical notes and radiology reports | NLP models can be integrated with DL to analyze unstructured medical data | Enhances electronic health record analysis; supports AI-assisted triage systems |
| Computer vision | AI field enabling machines to interpret visual data, particularly useful in medical imaging | Computer vision models, including DL-based CNNs, improve diagnostic ACC in radiology | Reduces diagnostic variability; increases ACC in CT and MRI interpretation |
| Reinforcement learning and explainable AI | AI models that learn optimal decision pathways based on cumulative rewards; XAI ensures transparency in model predictions | Can optimize treatment strategies, while SHAP and LIME techniques make AI models interpretable for clinicians | Improves AI adoption in healthcare; enables better treatment planning |
| Machine learning | A broad AI field encompassing various algorithms, including supervised and unsupervised learning | ML models, such as SVM, random forest and XGBoost, form the foundation for AI in clinical decision-making | Provides adaptable and scalable models for medical data analysis |
| Vision transformers | A deep learning model specifically designed for image segmentation and classification | Enhances medical image analysis by capturing spatial relationships within radiology images | Improves segmentation ACC, particularly in CT- and MRI-based diagnosis |
| Lazy learning algorithms (KNN) | Classification method that identifies the closest data points in a dataset | Used in ML for patient clustering and classification | Simple yet effective, but computationally expensive in large datasets |
| Extra trees classifier | A variant of random forest that introduces additional randomness to improve ACC | Works alongside ensemble learning to enhance classification performance | High ACC; robustness in medical data analysis |
| Hybrid AI models | AI models combining ML and DL techniques to improve diagnostic performance | Used in multimodal AI-based appendicitis detection | Enhances ACC by integrating structured and unstructured data sources |
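Table 2 describes lazy learning (KNN) as classifying by the closest data points and as computationally expensive on large datasets. A minimal Python sketch of that idea follows, assuming Euclidean distance and simple majority voting. The two features (leukocyte count and CRP), the function name and all values are made-up illustrations, not data or code from any cited study.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank every training point by Euclidean distance to the query.
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (leukocyte count, CRP) pairs, for illustration only.
train_points = [
    (7.0, 5.0), (8.0, 10.0), (6.5, 3.0),       # labelled non-AAp
    (15.0, 80.0), (17.0, 120.0), (14.0, 60.0),  # labelled AAp
]
train_labels = ["non-AAp", "non-AAp", "non-AAp", "AAp", "AAp", "AAp"]

high_marker_case = knn_predict(train_points, train_labels, (16.0, 90.0))
low_marker_case = knn_predict(train_points, train_labels, (7.5, 6.0))
```

Each prediction scans the entire training set, which is why the cost grows with dataset size. In practice the features would usually be scaled first so that one variable does not dominate the distance.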
- Citation: Akbulut S, Kucukakcali Z, Colak C. Artificial intelligence in acute appendicitis: A comprehensive review of machine learning and deep learning applications. World J Gastroenterol 2025; 31(43): 112000
- URL: https://www.wjgnet.com/1007-9327/full/v31/i43/112000.htm
- DOI: https://dx.doi.org/10.3748/wjg.v31.i43.112000
