Published online Nov 21, 2025. doi: 10.3748/wjg.v31.i43.112000
Revised: August 26, 2025
Accepted: October 14, 2025
Published online: November 21, 2025
Processing time: 128 Days and 8.4 Hours
Acute appendicitis (AAp) remains one of the most common abdominal emergen
Core Tip: This comprehensive review explores the emerging role of artificial intelligence (AI), including machine learning and deep learning techniques, in diagnosing acute appendicitis (AAp). Despite advancements in imaging and clinical scoring, diagnosing AAp remains challenging, particularly in atypical cases. AI models such as random forests, support vector machines, and convolutional neural networks have demonstrated promising results in enhancing diagnostic accuracy and decision-making. In addition to aiding in the differential diagnosis of AAp from other causes of acute abdominal pain, AI approaches have also been applied to distinguish between complicated and uncomplicated appendicitis, thereby sup
- Citation: Akbulut S, Kucukakcali Z, Colak C. Artificial intelligence in acute appendicitis: A comprehensive review of machine learning and deep learning applications. World J Gastroenterol 2025; 31(43): 112000
- URL: https://www.wjgnet.com/1007-9327/full/v31/i43/112000.htm
- DOI: https://dx.doi.org/10.3748/wjg.v31.i43.112000
Acute appendicitis (AAp) is one of the most common acute abdominal emergencies worldwide, and timely and accurate diagnosis is crucial for optimal patient management[1-3]. AAp usually develops as a result of obstruction of the ap
Traditional diagnostic approaches encompass a combination of patient history, physical examination, and evaluation of biochemical markers such as white blood cell count, bilirubin, and C-reactive protein. Imaging modalities, including ultrasonography (US) and computed tomography (CT), are routinely employed, while magnetic resonance imaging (MRI) is preferentially used in specific populations such as pregnant patients. In addition, clinical scoring systems serve as valuable adjuncts to improve diagnostic accuracy and guide clinical decision-making[3,14-17]. These scoring systems include: Alvarado; Eskelinen; Ohmann; Appendicitis inflammatory response (AIR); Raja Isteri Pengiran Anak Saleha Appendicitis; Pediatric appendicitis score (PAS); Adult appendicitis score; Tzanakis; Lintula; Fenyo-Lindberg; Karaman, and others[14,16,18-23]. For instance, among established clinical scoring systems, the AIR score has demonstrated utility in severity stratification, with a recent pediatric study showing that a score ≥ 9 distinguishes perforated from non-perforated AAp with 89.5% sensitivity, 71.9% specificity, and an area under the curve (AUC) of 0.80, establishing it as a clinically accessible reference standard for complicated AAp assessment[24]. In adults, the Alvarado score (cutoff ≥ 8) demonstrated moderate accuracy in distinguishing AAp from negative appendectomy, with reported sensitivity of 72.9%, specificity of 70.6%, and an AUC of 0.782[25]. When applied to distinguish complicated from uncomplicated AAp, the Alvarado score with a cutoff of ≥ 6 achieved 80.6% sensitivity, 44.5% specificity, and an AUC of 0.605[26].
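The operating points quoted above can be compared on a common footing by deriving Youden's J and the likelihood ratios from each sensitivity/specificity pair. The following is a minimal sketch, not part of the cited studies: the function name `summarize_cutoff` and the dictionary labels are illustrative assumptions, and only the sensitivity and specificity percentages are taken from the text.

```python
def summarize_cutoff(sensitivity_pct, specificity_pct):
    """Derive Youden's J and likelihood ratios from percentages.

    Assumes specificity < 100 and > 0, otherwise the ratios are undefined.
    """
    sens = sensitivity_pct / 100.0
    spec = specificity_pct / 100.0
    youden_j = sens + spec - 1.0        # 0 = uninformative, 1 = perfect
    lr_positive = sens / (1.0 - spec)   # odds multiplier after a positive result
    lr_negative = (1.0 - sens) / spec   # odds multiplier after a negative result
    return youden_j, lr_positive, lr_negative


# Sensitivity/specificity pairs quoted in the paragraph above.
reported = {
    "AIR >= 9 (perforated vs non-perforated, pediatric)": (89.5, 71.9),
    "Alvarado >= 8 (AAp vs negative appendectomy)": (72.9, 70.6),
    "Alvarado >= 6 (complicated vs uncomplicated)": (80.6, 44.5),
}

for label, (sens, spec) in reported.items():
    j, lr_pos, lr_neg = summarize_cutoff(sens, spec)
    print(f"{label}: J={j:.3f}, LR+={lr_pos:.2f}, LR-={lr_neg:.2f}")
```

A higher J or LR+ indicates a more informative cutoff, which is consistent with the ordering of the AUC values reported for the same three comparisons.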
Despite the widespread use of traditional diagnostic tools, accurately distinguishing between a normal appendix and the spectrum of appendiceal inflammation, ranging from uncomplicated to complicated (perforated) AAp, remains a significant clinical challenge, particularly in atypical presentations, pediatric patients, and pregnant women, where diagnostic nuances and imaging limitations complicate decision-making[3,9,15]. While conventional approaches are often effective, their sensitivity and specificity vary widely across patient populations. To address these limitations, artificial intelligence (AI) models, including those from its subdomain machine learning (ML) and, more specifically, deep learning (DL), have emerged as promising tools capable of integrating multimodal clinical, laboratory, and radiological data to improve diagnostic accuracy and risk stratification[16,17,27-30]. Importantly, false-negative or false-positive diagnostic outcomes, whether they originate from conventional clinical tools or from AI-supported systems, can lead to serious clinical consequences such as perforation or unnecessary surgery, thereby increasing patient morbidity and healthcare burden[31-34].
In recent years, AI and ML techniques have gained significant importance not only in the initial diagnosis of AAp but also in accurately differentiating uncomplicated cases from complicated ones, such as perforated or gangrenous AAp, as well as in assessing disease severity, guiding treatment planning, and predicting postoperative complications[3,8,16,27-30,33,35-84]. AI-based models, particularly ensemble learning approaches, not only improve diagnostic accuracy but also support clinical decision-making by reducing unnecessary surgeries and identifying high-risk perforation cases earlier. DL algorithms can accurately diagnose AAp from radiological images, while ML-based models effectively analyze laboratory data and patient characteristics to predict the risk of perforation[31,33,43,60]. Notably, ML and DL models such as random forest (RF), support vector machine (SVM), gradient boosting machine (GBM), and convolutional neural networks (CNNs) have achieved higher accuracy rates than traditional diagnostic methods in AAp diagnosis[47,85].
Recent literature highlights the diagnostic superiority of various ML models across diverse populations and clinical settings. Erman et al[35] developed ML models for pediatric patients that achieved 76.4% accuracy and an area under the receiver operating characteristic curve (AUROC) of 0.79 for detecting perforation, and 70.1% accuracy with an AUROC of 0.77 for grading severity. Gollapalli et al[38] demonstrated that bagging and stacking ensemble methods, particularly those based on k-nearest neighbors (KNN) and decision trees, achieved up to 92.6% accuracy and F1 scores above 90% when combined with upsampling techniques. In a resource-limited setting, Phan-Mai et al[46] showed that GBM distinguished complicated AAp with an AUROC of 0.858, performing better than other classifiers such as SVM or artificial neural networks (ANN). Additional high-performance models include the Gaussian naive Bayes model used by Roshanaei et al[36], yielding 95% accuracy, and the GBM-based model in Wei et al’s study[40], which reached 95.56% accuracy with high sensitivity (91.67%) and specificity (97.39%). These findings reinforce the growing utility of ML in clinical AAp decision-making, particularly when tailored with proper sampling and algorithm selection. Issaiy et al[28] conducted a systematic review of 29 studies, most of which addressed diagnostic applications of AI in AAp. ANNs were frequently used and demonstrated high performance, with accuracy rates commonly exceeding 80% and AUROC values reaching up to 0.985. Despite these promising results, most studies suffered from selection bias and lacked internal validation.
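Because AUROC is the headline metric in most of these studies, it may help to recall what it measures: the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison; the labels and scores are made-up illustrative values, not data from any cited study.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison (equivalent to the Mann-Whitney U statistic).

    labels: 1 for the positive class, 0 for the negative class.
    scores: model outputs, where higher means "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: one positive case (score 0.35) is ranked below
# one negative case (score 0.4), so the AUROC falls short of 1.0.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.2, 0.1, 0.7]
print(auroc(toy_labels, toy_scores))
```

The quadratic loop is chosen for clarity; for large cohorts a rank-based implementation would be preferable.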
While an earlier systematic review[16] provides a broad overview of AI integration across the entire spectrum of AAp management, our review offers a more focused and technically detailed analysis of ML and DL methodologies for diagnosis, with in-depth evaluation of model architectures, performance metrics, imaging modalities, and emerging areas such as radiomics, explainable AI (XAI), and multimodal data fusion.
This review evaluates AI applications in AAp diagnosis, highlighting their clinical impact, comparative performance, and future implications for emergency medicine and surgical decision-making. In particular, ensemble learning techniques and hybrid models have been highlighted as key approaches to improving diagnostic sensitivity[38]. The integration of AI into radiological imaging (US, CT, MRI) and clinical data is analyzed in terms of diagnostic performance metrics such as sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), and future research directions are discussed.
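The four performance metrics named above all derive from the same 2 × 2 confusion matrix. As a minimal sketch, the class below computes them from raw counts; the class name `ConfusionMatrix` and the example counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # AAp present, model positive
    fn: int  # AAp present, model negative
    fp: int  # AAp absent, model positive
    tn: int  # AAp absent, model negative

    def sensitivity(self) -> float:
        # Share of true AAp cases that the model detects.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of non-AAp cases that the model correctly clears.
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Probability of AAp given a positive prediction.
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Probability of no AAp given a negative prediction.
        return self.tn / (self.tn + self.fn)


# Hypothetical cohort of 200 patients, half of whom have AAp.
example = ConfusionMatrix(tp=90, fn=10, fp=20, tn=80)
print(f"sensitivity={example.sensitivity():.3f}, specificity={example.specificity():.3f}")
print(f"PPV={example.ppv():.3f}, NPV={example.npv():.3f}")
```

Unlike sensitivity and specificity, PPV and NPV depend on the prevalence of AAp in the cohort, so values reported from balanced study datasets do not transfer directly to populations with a different case mix.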
Since 2023, six systematic reviews[5,16,27,28,81,82] and two narrative reviews[9,80] have examined the application of AI in AAp, addressing diagnostic, therapeutic, and prognostic dimensions. In contrast, our study differs by providing a more focused and technically detailed evaluation of ML and DL methodologies, particularly for the diagnosis of AAp. Furthermore, it distinctly incorporates the correct use of AI, ML, and DL terminologies, an aspect often overlooked in prior reviews. Unlike these earlier studies, our study emphasizes algorithmic architectures, imaging methods, and XAI approaches, thereby offering a complementary perspective tailored for researchers and clinicians interested in diagnostic applications. Finally, this study uniquely considers the application of AI architectures by analyzing a total of 65 studies covering both adult and pediatric AAp research[3,8,17,29,30,33,35-79,84,86-98]. Table 1 provides detailed information on the referenced studies, including the AI models and their performance metrics, serving as a consolidated reference for comparative evaluation.
| No. | Ref. | Year | Country | Dataset size | Variables used | AI methods | Performance metrics |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Sibic et al[84] | 2025 | Turkey | AAp: 400; non-AAp: 400 | Demographic, and radiological data [CT images (CNN architectures)] | MobileNet v2, ResNet v2, EfficientNet b2, Inception v3 (MobileNet v2 best results) | Accuracy: 79.1; precision: 82.0; sensitivity: 74.7; F1 score: 78.1; AUC: 0.877 |
| 2 | Navaei et al[17] | 2025 | Iran | AAp: 465; non-AAp: 317 | Demographic, clinical and biochemical data | DT, RF, SVM, KNN, GBM, AdaBoost, XGBoost, LightBoost, CatBoost (RF best results) | Accuracy: 94.6; sensitivity: 93.9; specificity: 95.7; F1 score: 93.6 |
| 3 | Li et al[8] | 2025 | China | Compl AAp: 88; uncompl AAp: 213 | Demographic, clinical and biochemical data | LR, SVM, RF, DT1, GBM, KNN, GNB, MLP (RF best results) | Accuracy: 81.0; sensitivity: 76.0; specificity: 83.0; F1 score: 74.0; AUC: 0.840 |
| 4 | Kucukakcali et al[86] | 2025 | Turkey | Compl AAp: 34; uncompl AAp: 65; non-AAp: 41 | Demographic and biochemical data | SGB (non-AAp vs AAp) | Accuracy: 96.3; sensitivity: 94.7; specificity: 100; F1 score: 97.3; AUC: 0.947 |
| | | | | | | SGB (uncompl vs compl AAp) | Accuracy: 78.9; sensitivity: 83.3; specificity: 76.9; F1 score: 71.4; AUC: 0.790 |
| 5 | Kucukakcali et al[87] | 2025 | Turkey | Compl AAp: 183; uncompl AAp: 290; negative AAp: 117 | Demographic and biochemical data | AdaBoost, XGBoost, SGB, bagged CART, RF (XGBoost best results) | Accuracy: 80.0; sensitivity: 70.8; specificity: 85.4; F1 score: 72.3 |
| | | | | | | AdaBoost, XGBoost, SGB, bagged CART, RF (XGBoost best results) | Accuracy: 90.7; sensitivity: 100; specificity: 61.5; F1 score: 94.3 |
| 6 | Kim et al[29] | 2025 | South Korea | Compl AAp: 655; uncompl AAp: 2789; negative AAp: 551; non-AAp: 3058 | CT images (non vs uncomplicated) | 3D-CNN (transfer learning, ResNet/DenseNet/EfficientNet) (DenseNet best results) | Accuracy: 79.5; sensitivity: 70.1; specificity: 87.6; AUC: 0.865 |
| | | | | | CT images (complicated vs uncomplicated) | 3D-CNN (transfer learning, ResNet/DenseNet/EfficientNet) (DenseNet best results) | Accuracy: 76.1; sensitivity: 82.6; specificity: 74.2; AUC: 0.827 |
| 7 | Kendall et al[88] | 2025 | | Compl AAp: 1192; uncompl AAp: 344; non-AAp: 317 | Demographic, clinical, biochemical and radiological data | RF, LightGBM, LR, SGD, KNN, Dummy, GANDALF, RF + embedded LightGBM (best result) | Accuracy: 98.1; sensitivity: 97.8; specificity: 96.1; AUROC: 0.993 |
| | | | | | | RF, LightGBM, LR, SGD, KNN, Dummy, GANDALF, LightGBM + filter FS (best result) | Accuracy: 90.1; sensitivity: 78.8; specificity: 95.1; AUROC: 0.931 |
| 8 | Erman et al[35] | 2025 | Canada | Compl AAp: 602; uncompl AAp: 1378 | Demographic, clinical and biochemical data | ML pipeline | Accuracy: 70.1; NPV: 82.8; PPV: 56.4 |
| 9 | Chen et al[3] | 2025 | China | Compl AAp: 357; uncompl AAp: 416 | Demographic, clinical and biochemical data | XGBoost, RF, DT (CART), SVM (XGBoost best results) | Accuracy: 85.5; sensitivity: 86.5; specificity: 84.6; AUC: 0.914 |
| 10 | Aydin et al[89] | 2025 | Turkey | Compl AAp: 296; uncompl AAp: 3658; non-AAp: 4632; validation: Compl AAp: 1580; Uncompl AAp: 1287; Non-AAp: 169 | Demographic, clinical, biochemical and radiological data | LR, KNN, SVM, CART, RF (RF best results for AAp diagnosis) | Accuracy: 99.2; sensitivity: 99.8; specificity: 99.3; AUC: 0.996 |
| | | | | | | LR, KNN, SVM, CART, RF (RF best results for severity of AAp) | Accuracy: 99.2; sensitivity: 99.3; specificity: 99.1; AUC: 0.995 |
| 11 | Zhao et al[90] | 2024 | China | Compl AAp: 258; uncompl AAp: 76 | Demographic, clinical, biochemical and radiological data (CT images) | Radiomics model (CT images), CT model (clinical and CT features), combined model | Accuracy: 75.4; sensitivity: 74.6; specificity: 82.6; AUC: 0.817 |
| 12 | Yazici et al[37] | 2024 | Turkey | Compl AAp: 142; uncompl AAp: 990 | Demographic, clinical and biochemical data | KNN, DT, LR, SVM, MLP, GNB (LR best result) | Accuracy: 96.0; sensitivity: 60.0; specificity: 100 |
| 13 | Wei et al[40] | 2024 | China | Compl AAp: 103; uncompl AAp: 219 | Demographic, clinical and biochemical data | LR, CART, FR, SVM, Bayes, KNN, NN, FDA, GBM (GBM best result) | Accuracy: 95.6; sensitivity: 91.7; specificity: 97.4; F1 score: 93.0 |
| 14 | Schipper et al[33] | 2024 | Netherlands | AAp: 167; non-AAp: 169 | Data including physical examination | XGBoost | AUC: 0.919 |
| | | | | | Data including physical examination and biochemical data | XGBoost | AUC: 0.923 |
| 15 | Roshanaei et al[36] | 2024 | Iran | AAp: 138; non-AAp: 396 | Demographic, clinical and biochemical data | GNB | Accuracy: 95.0; sensitivity: 87.2; specificity: 97.5; F1 score: 89.0 |
| 16 | Marcinkevičs et al[52] | 2024 | Germany | Compl AAp: 97; uncompl AAp: 482 | Radiological data (US images) (diagnosis) | CBM; MVCBM; SSMVCBM | AUROC: 0.800; AUPR: 0.920 |
| | | | | | Radiological data (US images) (severity) | CBM; MVCBM; SSMVCBM | AUROC: 0.780; AUPR: 0.580 |
| 17 | Males et al[39] | 2024 | Croatia | Compl AAp: 252; uncompl AAp: 252; negative AAp: 47 (pediatric cases) | Demographic, clinical and biochemical data | RF | Sensitivity: 99.7; specificity: 17.0 |
| | | | | | | XGBoost | Sensitivity: 99.8; specificity: 12.0 |
| | | | | | | LR | Sensitivity: 99.7; specificity: 5.2 |
| 18 | Liang et al[91] | 2024 | China | Training cohort: Compl AAp: 236; uncompl AAp: 464; validation cohort: Compl AAp: 182; uncompl AAp: 283 | Demographic, clinical, biochemical and radiological data | Conventional combined model (clinical + CT features); deep learning radiomics (DL + radiomics) our combined model (clinical + CT + DL + radiomics) radiologist’s diagnosis | Accuracy: 79.0; sensitivity: 66.5; specificity: 85.3; AUC: 0.816 |
| | | | | | | | Accuracy: 72.5; sensitivity: 70.2; specificity: 73.9; AUC: 0.799 |
| 19 | Gollapalli et al[38] | 2024 | Saudi Arabia | 411 patients3 | Demographic, clinical and biochemical data | DT (experiment 1) | Accuracy: 75.0; sensitivity: 13.8; precision: 40.0; F1 score: 20.5 |
| | | | | | | KNN (experiment 1) | Accuracy: 83.1; sensitivity: 41.4; precision: 75.0; F1 score: 53.3 |
| | | | | | | DT (experiment 2) | Accuracy: 87.4; sensitivity: 91.2; precision: 83.8; F1 score: 87.4 |
| | | | | | | KNN (experiment 2) | Accuracy: 84.7; sensitivity: 84.6; precision: 83.7; F1 score: 84.2 |
| | | | | | | KNN bagging (experiment 3) | Accuracy: 92.1; sensitivity: 91.2; precision: 92.2; F1 score: 91.7 |
| | | | | | | DT bagging (experiment 3) | Accuracy: 89.5; sensitivity: 83.5; precision: 93.8; F1 score: 88.4 |
| | | | | | | Stacking (experiment 4) | Accuracy: 92.6; sensitivity: 89.0; precision: 95.3; F1 score: 92.0 |
| 20 | Chadaga et al[42] | 2024 | India | AAp: 465; non-AAp: 317 (pediatric cases) | Demographic, clinical and biochemical data | RF, LR, DT, KNN, AdaBoost, CatBoost, LightGBM, XGBoost, APPSTACK. Bayesian optimization, hybrid bat algorithm, hybrid self-adaptive bat algorithm, firefly algorithm, grid search, randomized search (hybrid bat algorithm with APPSTACK best results) | Accuracy: 94.0; sensitivity: 74.0; precision: 85.0; F1 score: 78.0; AUC: 0.960 |
| 21 | Abu-Ashour et al[41] | 2024 | Canada | AAp: 2100 (pediatric cases) | Ultrasound reports | Human | Precision: 57.3; sensitivity: 88.1; F score: 69.4 |
| | | | | | | ChatGPT (large language model) | Precision: 92.3; sensitivity: 68.4; F score: 78.5 |
| | | | | | Operative reports | Human | Precision: 59.2; sensitivity: 95.3; F score: 73.1 |
| | | | | | | ChatGPT (large language model) | Precision: 97.1; sensitivity: 75.8; F score: 85.1 |
| 22 | Phan-Mai et al[46] | 2023 | Vietnam | Compl AAp: 483; uncompl AAp: 1467 | Demographic, clinical and biochemical data | SVM (SMOTE-adjusted) | Accuracy: 65.5; AUC: 0.730 |
| | | | | | | DT (SMOTE-adjusted) | Accuracy: 73.8; AUC: 0.738 |
| | | | | | | KNN (SMOTE-adjusted) | Accuracy: 74.1; AUC: 0.831 |
| | | | | | | LR (SMOTE-adjusted) | Accuracy: 72.9; AUC: 0.789 |
| | | | | | | ANN (SMOTE-adjusted) | Accuracy: 74.2; AUC: 0.810 |
| | | | | | | GBM (SMOTE-adjusted) | Accuracy: 82.0; AUC: 0.890 |
| 23 | Pati et al[30] | 2023 | India | Compl AAp: 514; uncompl AAp: 196; non-AAp: 183 (pediatric cases) | Demographic, clinical, biochemical and radiological data | LR, NB, KNN, SVM, DT, RF, MLP, AdaBoost (RF best for diagnostic) | Accuracy: 91.6; precision: 89.0; sensitivity: 92.0; specificity: 91.3; F1 score: 90.4 |
| | | | | | | LR, NB, KNN, SVM, DT, RF, MLP, AdaBoost (AdaBoost best for complication prediction) | Accuracy: 92.2; precision: 94.6; sensitivity: 96.3; specificity: 68.6; F1 score: 95.4 |
| 24 | Park et al[45] | 2023 | South Korea | AAp: 246; non-AAp: 215; diverticulitis: 254 | CT images | CNN-EfficientNet algorithm (single image method) | Accuracy: 86.1; precision: 85.4; sensitivity: 85.6; specificity: 86.5; AUC: 0.937 |
| | | | | | CT images | CNN-EfficientNet algorithm (RGB method) | Accuracy: 87.9; precision: 87.1; sensitivity: 87.9; specificity: 88.1; AUC: 0.951 |
| 25 | Lin et al[93] | 2023 | Taiwan | Compl AAp: 49; uncompl AAp: 362 | Demographic, clinical, biochemical and radiological data | 9 different MLP-ANN analyzed (Lin et al[93] ANN model best results) | AUC: 0.897; sensitivity: 85.7; specificity: 91.7 |
| 26 | Li et al[92] | 2023 | China | Compl AAp: 141; uncompl AAp: 201 (pregnant patients) | Demographic, clinical, biochemical and radiological data | DT | AUC: 0.780 |
| 27 | Harmantepe et al[44] | 2023 | Turkey | AAp: 189; negative AAp: 156 | Demographic and biochemical data | LR, SVM, NN, KNN, voting classifier (voting best result) | Accuracy: 86.2; sensitivity: 83.7; specificity: 88.6 |
| 28 | Akbulut et al[43] | 2023 | Turkey | Compl AAp: 304; uncompl AAp: 1161; negative AAp: 332 | Demographic and biochemical data | CatBoost + SHAP (non-AAp vs AAp) | Accuracy: 88.2; sensitivity: 84.2; specificity: 93.2; F1 score: 88.7; AUC: 0.947 |
| | | | | | | CatBoost + SHAP (compl vs uncompl AAp) | Accuracy: 92.0; sensitivity: 94.1; specificity: 90.5; F1 score: 91.1; AUC: 0.969 |
| 29 | Xia et al[51] | 2022 | China | Compl AAp: 148; uncompl AAp: 150 | Demographic and clinical data | SVM | Accuracy: 83.6; sensitivity: 81.7; specificity: 85.3; Matthews: 0.6732 |
| 30 | Su et al[49] | 2022 | United States | AAp: 28002; non-AAp: 655 (adult cases) | Demographic and clinical data | LR | Accuracy: 96.0; sensitivity: 73.0; specificity: 68.0; AUC: 0.780 |
| | | | | | | RF | Accuracy: 97.0; sensitivity: 67.0; specificity: 71.0; AUC: 0.750 |
| | | | | AAp: 11128; non-AAp: 256 (pediatric cases) | Demographic and clinical data | LR | Accuracy: 95.0; sensitivity: 81.0; specificity: 78.0; AUC: 0.870 |
| | | | | | | RF | Accuracy: 96.0; sensitivity: 82.0; specificity: 75.0; AUC: 0.860 |
| 31 | Shikha and Kasem[48] | 2023 | Brunei | Compl AAp: 25; uncompl AAp: 24; negative AAp: 97 (pediatric cases) | Demographic, Clinical, and biochemical data | AI pediatric appendicitis DT | Accuracy: 97.1; sensitivity: 96.7; specificity: 97.4 |
| 32 | Mijwil and Aggarwal[47] | 2022 | Iraq | Appendectomy: 3185; medical: 307 | Demographic, and biochemical data | RF, LR, NB, GLM, DT, SVM, GBT (RF best results) | Accuracy: 83.8; precision: 84.1; sensitivity: 81.1; specificity: 81.0 |
| 33 | Akgül et al[50] | 2021 | Turkey | Compl AAp: 45; uncompl AAp: 147; negative AAp: 24; non-AAp: 106 (pediatric cases) | Demographic, clinical, biochemical and radiological data | ANN | Sensitivity: 89.8; specificity: 81.2; AUC: 0.910 |
| 34 | Marcinkevics et al[53] | 2021 | Germany | Compl AAp: 51; uncompl AAp: 196; non-AAp: 183 (pediatric cases) | Demographic, clinical, biochemical and radiological data | LR (diagnostic) | Sensitivity: 88.0; specificity: 76.0; AUC: 0.910 |
| | | | | | | RF (diagnostic) | Sensitivity: 91.0; specificity: 86.0; AUC: 0.960 |
| | | | | | | GBM (diagnostic) | Sensitivity: 93.0; specificity: 86.0; AUC: 0.960 |
| | | | | | | LR (severity) | Sensitivity: 93.0; specificity: 42.0; AUC: 0.820 |
| | | | | | | RF (severity) | Sensitivity: 98.0; specificity: 45.0; AUC: 0.900 |
| | | | | | | GBM (severity) | Sensitivity: 97.0; specificity: 46.0; AUC: 0.900 |
| 35 | Aparicio et al[79] | 2021 | Switzerland | AAp: 430 (pediatric cases) | Demographic, clinical, and biochemical data | SLIM risk model | AUC: 0.850; AUPR: 0.900 |
| 36 | Hayashi et al[55] | 2021 | Japan | AAp: 70 videos (pediatric cases) | 70 videos (between 85-347 images per video) | U-net-based CNN | Not indicated |
| 37 | Reismann et al[56] | 2021 | Germany | AAp: 29 | Gene expression data (56.666 gene) | LR-based biomarker signature (4 genes) | AUC: 0.84 |
| 38 | Ghareeb et al[54] | 2021 | Egypt | 319 | Clinical findings. Chronic diseases. Patient characteristics. Laboratory and imaging | Ensemble model (subspace KNN) | AUC: 0.82; accuracy: 91.1 |
| 39 | Stiel et al[57] | 2020 | Germany | Compl AAp: 102; uncompl AAp: 234; negative AAp: 12; non-AAp: 115 (pediatric cases) | Demographic, clinical, biochemical and radiological data | Modified HAS based CART, AI score based RF (AAp vs nonoperative) | Sensitivity: 86.6; specificity: 70.9; AUC: 0.920 |
| | | | | | | Modified HAS based CART, AI score based RF (uncompl vs compl AAp) | Sensitivity: 97.1; specificity: 17.9; AUC: 0.710 |
| 40 | Akmese et al[58] | 2020 | Turkey | AAp: 214; non-AAp: 214 | Demographic and biochemical data | RF, CART, SVM, LR, KNN, ANN, GB (GB best results) | Accuracy: 95.3; sensitivity: 93.2; specificity: 97.1 |
| 41 | Aydin et al[59] | 2020 | Turkey | Control: 4244; negative AAp: 169; compl AAp: 1559; uncompl AAp: 1272 (pediatric cases) | Demographic and biochemical data | KNN, NB, DT, SVM, GLM, RF (RF best results) | Accuracy: 97.5; sensitivity: 97.8; specificity: 97.2; AUC: 0.997 |
| 42 | Rajpurkar et al[60] | 2020 | United States | AAp: 359; non-AAp: 287 | CT images | Average of 2D Res-Net18, average of 2D Res-Net34, LRCN Res-Net18, LRCN Res-Net34, SE-ResNeXt-50, AppendiXNet (3D-ResNet CNN) | Accuracy: 72.5; sensitivity: 78.4; specificity: 66.7; AUC: 0.810 |
| 43 | Park et al[61] | 2020 | United States | AAp: 215; non-AAp: 452 | CT images | 3D-CNN + grad-CAM | Accuracy: 91.5; sensitivity: 90.2; specificity: 92.0 |
| 44 | Zhao et al[63] | 2020 | China | AAp: 48; non-AAp: 86 | Midstream urine samples | Urinary proteomics + RF, SVM, NB (RF best results) | Accuracy: 83.6; sensitivity: 81.2; specificity: 84.4 |
| 45 | Ramirez-garcialuna et al[62] | 2020 | Mexico | AAp: 51; non-AAp: 17; negative AAp: 3; control: 51 | Demographic, clinical biochemical, radiological and infrared thermal data | Infrared thermography + RF classifier | Accuracy: 92.3; sensitivity: 90.0; specificity: 96.1; AUC: 0.906 |
| 46 | Reismann et al[65] | 2019 | Germany | Compl AAp: 183; uncompl AAp: 290; negative AAp: 117 (pediatric cases) | Signature appendiceal diameter CRP leukocytes neutrophils | CRP, leukocytes, neutrophils, linear model (LBFGS) (AAp vs non-AAp) | Accuracy: 90.0; sensitivity: 93.0; specificity: 67.0; AUC: 0.910 |
| | | | | | | CRP, leukocytes, neutrophils, linear model (LBFGS) (compl vs uncompl AAp) | Accuracy: 51.0; sensitivity: 95.0; specificity: 33.0; AUC: 0.800 |
| 47 | Kang et al[64] | 2019 | South Korea | AAp: 80; non-AAp: 164 | Demographic, clinical biochemical and radiological data | Alvarado, AAS, Eskelinen, DT based CHAID algorithm | AUC: 0.850 |
| 48 | Gudelis et al[66] | 2019 | Spain | AAp: 93; non-AAp: 159 | Demographic, clinical biochemical and radiological data | ANN | AUC: 0.950; PCC: 93.5 |
| | | | | | | CHAID | AUC: 0.930; PCC: 81.7 |
| 49 | Shahmoradi et al[67] | 2018 | Iran | AAp: 133; negative AAp: 48 | Demographic, clinical and biochemical data | MLP | Accuracy: 92.9; sensitivity: 80.0; specificity: 97.5; AUC: 0.832 |
| | | | | | | RBFN | Accuracy: 77.6; sensitivity: 28.0; specificity: 87.8 |
| | | | | | | LR | Accuracy: 83.9; sensitivity: 58.3; specificity: 93.2; AUC: 0.808 |
| 50 | Jamshidnezhad | 2017 | Iran | NA | Demographic, clinical biochemical and radiological data | ACSS, MLNN, SVM, NN, hybrid fuzzy model, evolutionary–fuzzy + HBRC | Accuracy: 89.9 |
| 51 | Afshari Safavi | 2015 | Iran | Compl AAp: 24; uncompl: 59; negative AAp: 17 | Demographic, and biochemical data | ANN (MLP) | Accuracy: 88.0; sensitivity: 97.6; AUC: 0.875 |
| 52 | Park and Kim[70] | 2015 | South Korea | Compl AAp: 62; uncompl AAp: 143; non-AAp: 596 | Demographic, clinical and radiological data | MLNN | Accuracy: 97.8; sensitivity: 96.6; specificity: 99.5 |
| | | | | | | RBF | AUC: 99.8; sensitivity: 99.7; specificity: 100 |
| | | | | | | PNN | AUC: 99.4; sensitivity: 98.1; specificity: 100 |
| 53 | Lee et al[75] | 2013 | Taiwan | AAp: 464; negative-AAp: 110 | Demographic, clinical and biochemical data | PEL, SVM, SMOTE, MCC, CM, WCUS, Alvarado (PEL best results) | Sensitivity: 57.3; specificity: 66.7; AUC: 0.619 |
| 54 | Iliou et al[94] | 2013 | Greece | AAp: 71 Non-AAp: 236 (pediatric cases) | Demographic, clinical and biochemical data | K1, JRip, bagging ensemble (majority voting) | Accuracy: 87.8 |
| 55 | Deleger et al[95] | 2013 | United States | AAp: 534; control: 1566 | Components of the pediatric appendicitis score | NLP | Sensitivity: 86.9; precision: 86.8; specificity: 93.8 |
| 56 | Yoldaş et al[71] | 2012 | Turkey | AAp: 132; negative-AAp: 24 | Demographic, clinical and biochemical data | ANN | Sensitivity: 100; specificity: 97.2; AUC: 0.950 |
| 57 | Son et al[76] | 2012 | South Korea | AAp: 152; non-AAp: 174 | Demographic, clinical and biochemical data | DT C5.0 model (univariate) | Accuracy: 80.2; sensitivity: 82.4; specificity: 78.3; AUC: 0.803 |
| | | | | | | DT C5.0 model (multivariate) | Accuracy: 73.5; sensitivity: 66.0; specificity: 80.0; AUC: 0.730 |
| 58 | Malley et al[96] | 2012 | United States | AAp: 85; negative AAp: 21 | Biochemical data | b-NN, class RF, Iboost, LR, KNN, regRF (regRF best results) | Brier score: 0.061; AUC: 0.976 |
| 59 | Grigull and Lechner[74] | 2012 | Germany | AAp: 45 (pediatric cases) | Demographic, clinical and biochemical data | SVM, ANN, fuzzy logic, voting algorithm (combination best results) | Accuracy: 97.4 |
| 60 | Hsieh et al[72] | 2011 | Taiwan | Compl AAp: 28; uncompl AAp: 87; negative AAp: 11; non-AAp: 65 | Demographic, clinical and biochemical data | RF, SVM, ANN, LR (RF best results) | Accuracy: 96.0; sensitivity: 94.0; specificity: 100; AUC: 0.980 |
| 61 | Ting et al[77] | 2010 | Taiwan | Compl AAp: 80; uncompl: 340; negative-AAp: 112 | Demographic, clinical and biochemical data | DT | Sensitivity: 94.5; specificity: 80.5 |
| 62 | Prabhudesai et al[73] | 2008 | United Kingdom | AAp: 24; non-AAp: 36 | Demographic, clinical and biochemical data | Alvarado (≥ 7), Alvarado (≥ 6), clinical, ANN (ANN best results) | Sensitivity: 100; specificity: 97.2; PPV: 96.0; NPV: 100 |
| 63 | Sakai et al[78] | 2007 | Japan | AAp: 86; negative AAp: 12; non-AAp: 71 | Demographic, clinical and biochemical data | LR | Sensitivity: 21.4; specificity: 80.4; AUC: 0.719 |
| | | | | | | ANN | Sensitivity: 19.9; specificity: 78.5; AUC: 0.741 |
| 64 | Pesonen et al[98] | 1996 | Finland | Suspected AAp: 911 | Demographic, clinical and biochemical data | NN (ART1) | Sensitivity: 79.0; specificity: 78.0 |
| | | | | | | NN (SOM) | Sensitivity: 55.0; specificity: 83.0 |
| | | | | | | NN (LVQ) | Sensitivity: 87.0; specificity: 90.0 |
| | | | | | | NN (BP) | Sensitivity: 83.0; specificity: 92.0 |
| 65 | Forsström et al[97] | 1995 | Finland | AAp: 145; negative AAp: 41 | Biochemical data | LR | AUC: 0.678 |
| | | | | | | DiagaiD | AUC: 0.683 |
| | | | | | | NN (BP) | AUC: 0.622 |
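
Several rows of Table 1 report precision, sensitivity, and F1 score together. Because the F1 score is the harmonic mean of precision and sensitivity (recall), such rows can be cross-checked against one another. The sketch below recomputes F1 for two of the Gollapalli et al[38] rows; the helper name `f1_from_precision_recall` is an illustrative assumption, and small differences from the published values are expected because the table inputs are already rounded.

```python
def f1_from_precision_recall(precision_pct, recall_pct):
    """Harmonic mean of precision and recall, both given as percentages."""
    return 2 * precision_pct * recall_pct / (precision_pct + recall_pct)


# (precision, sensitivity, reported F1) as listed in Table 1 for Gollapalli et al[38].
table_rows = {
    "KNN bagging (experiment 3)": (92.2, 91.2, 91.7),
    "Stacking (experiment 4)": (95.3, 89.0, 92.0),
}

for method, (precision, sensitivity, reported_f1) in table_rows.items():
    recomputed = f1_from_precision_recall(precision, sensitivity)
    print(f"{method}: recomputed F1={recomputed:.2f}, reported F1={reported_f1}")
```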
AI is transforming modern medicine by enabling advanced data analysis, pattern recognition, and predictive decision-making across a wide range of clinical specialties. From diagnostic imaging to electronic health record analysis, personalized treatment planning, differential diagnosis, and disease classification, AI technologies have demonstrated growing utility in enhancing clinical workflows, improving diagnostic accuracy, and supporting evidence-based decisions. The key concepts and subfields of AI most commonly used or actively researched in clinical medicine are as follows[27,28,80-82,99].
ML enables computers to identify patterns in data and make decisions without explicit programming. It includes su
Decision tree is a fundamental supervised learning algorithm that builds a hierarchical tree-like structure to classify data points based on feature values. Decision trees are widely used in medicine due to their transparency and interpretability, especially in clinical decision support systems. While they can function independently, they also serve as the foundational base for more complex ensemble learning methods such as RF and GBMs. Despite their simplicity, decision trees can effectively model non-linear relationships and are often favored in clinical settings where explainability is essential[102,103].
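To make the tree-building idea concrete, the sketch below finds the single best split on one feature by minimizing weighted Gini impurity, which is the basic step that tree algorithms such as CART repeat recursively. It is a simplified illustration, not the method of any cited study; the white blood cell counts and labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`."""
    best = (None, float("inf"))
    ordered = sorted(set(values))
    # Candidate thresholds lie midway between consecutive distinct values.
    for low, high in zip(ordered, ordered[1:]):
        threshold = (low + high) / 2
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical white blood cell counts (10^9/L) and labels (1 = AAp, 0 = no AAp).
wbc = [6.0, 7.5, 9.0, 13.0, 15.5, 18.0]
has_aap = [0, 0, 0, 1, 1, 1]
print(best_threshold(wbc, has_aap))
```

The resulting rule ("WBC above the threshold suggests AAp") can be read directly by a clinician, which illustrates the interpretability advantage described above.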
KNN is a non-parametric, instance-based classification algorithm that predicts outcomes by comparing new data points to the most similar cases in the training set. It has been applied in various clinical tasks and has shown utility in AAp risk prediction using structured clinical data. While simple and interpretable, its application in high-dimensional datasets can be computationally expensive[38].
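A minimal KNN classifier can be written in a few lines, which also shows where the computational cost comes from: every prediction measures the distance to every stored training case. The feature vectors below are hypothetical, pre-scaled values (for example, normalized WBC and CRP) chosen for illustration only.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the majority label among the k nearest training points."""
    # Euclidean distance from the query to every training case.
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical scaled features: (normalized WBC, normalized CRP).
train_x = [(0.20, 0.10), (0.30, 0.20), (0.25, 0.15),
           (0.80, 0.90), (0.70, 0.80), (0.90, 0.70)]
train_y = [0, 0, 0, 1, 1, 1]  # 1 = AAp, 0 = no AAp

print(knn_predict(train_x, train_y, (0.75, 0.85)))
print(knn_predict(train_x, train_y, (0.22, 0.12)))
```

Features are assumed to be on comparable scales here; with raw laboratory values, the variable with the largest numeric range would dominate the distance.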
A specialized branch of ML, DL uses ANNs to process large and complex datasets, particularly for image and text analysis. Common architectures include multi-layer perceptron (MLP), CNNs for image classification, recurrent neural networks (RNNs) and long short-term memory networks for time-series and clinical note analysis, and generative adversarial networks for medical image augmentation. Vision transformers have shown promise in image segmentation tasks, while graph neural networks (GNNs) are discussed separately due to their unique capacity to model relational data. Emerging self-supervised models have also improved representation learning in limited-labeled datasets. Hybrid DL models, such as CNN-RNN combinations, are increasingly used to enhance diagnostic accuracy[28,104,105].
Natural language processing (NLP) focuses on the interaction between computers and human language, enabling AI systems to extract structured insights from unstructured clinical notes, discharge summaries, and radiology reports. It plays a crucial role in clinical information retrieval, temporal event extraction, and predictive modeling based on narrative patient data[106]. Abu-Ashour et al[41] integrated NLP into decision-support tools, significantly improving the triage efficiency for patients with suspected AAp.
Computer vision is the AI subfield that enables machines to interpret and analyze visual data, making it particularly valuable for diagnostic imaging. In clinical medicine, it is widely used for the classification, segmentation, and detection of anomalies in radiologic images such as US, CT, and MRI. Rajpurkar et al[60] developed AppendiXNet, a DL model that achieved an AUROC of 0.81 in identifying AAp from CT scans, demonstrating AI’s potential in radiology-based triage. In addition, vision transformers have recently been explored for improving segmentation accuracy and enhancing the classification of complex medical images.
RL is an AI paradigm where agents learn to make optimal sequential decisions by interacting with their environment and receiving feedback in the form of rewards. Although rarely applied to AAp, RL has been successfully used in other me
Federated learning enables AI models to be collaboratively trained across multiple healthcare institutions without centralizing sensitive patient data, thus enhancing privacy and data security. This approach could be particularly beneficial for future multi-center AAp research, allowing models to generalize across diverse populations while pre
Bayesian inference methods provide a probabilistic framework for modeling uncertainty in clinical data, particularly when information is incomplete or ambiguous. Bayesian networks use conditional dependencies among variables to support diagnostic reasoning and treatment decision-making. In AAp research, Bayesian networks could be applied to risk stratification by integrating prior knowledge from patient demographics, laboratory markers, and imaging findings[110]. However, several barriers hinder their widespread adoption in clinical practice, in particular the need for expert knowledge to establish accurate prior probabilities, the challenges of constructing model structures, and the difficulty of real-time application in emergency settings.
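As a simplified illustration of probabilistic updating, the sketch below applies Bayes' rule to one binary finding at a time. The pretest probability and the test characteristics are hypothetical, and chaining two findings in this way assumes that they are conditionally independent given the disease state, which a full Bayesian network would not need to assume.

```python
def posterior_probability(prior, sensitivity, specificity, test_positive):
    """Update a disease probability with one binary finding via Bayes' rule."""
    if test_positive:
        p_finding_given_disease = sensitivity
        p_finding_given_healthy = 1.0 - specificity
    else:
        p_finding_given_disease = 1.0 - sensitivity
        p_finding_given_healthy = specificity
    numerator = p_finding_given_disease * prior
    evidence = numerator + p_finding_given_healthy * (1.0 - prior)
    return numerator / evidence

# Hypothetical values chosen only for illustration
prior = 0.30
after_lab = posterior_probability(prior, sensitivity=0.80, specificity=0.70, test_positive=True)
after_imaging = posterior_probability(after_lab, sensitivity=0.85, specificity=0.90, test_positive=True)
```

Under these assumed inputs, each positive finding raises the estimated probability of disease, and a negative finding would lower it.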
Transformer-based DL models are highly effective in processing and analyzing medical text, including clinical notes, discharge summaries, and radiology reports. They enable robust extraction of clinical features from unstructured data and have been used for tasks such as automated chart review, symptom classification, and clinical risk prediction. In the context of acute care, these models may support triage systems by identifying high-risk cases based on electronic health record narratives[111].
GNNs are DL architectures designed to process data that are structured as graphs, where entities (nodes) and their interactions (edges) are central to modeling. Unlike traditional neural networks, GNNs can capture complex relationships between clinical variables, making them suitable for representing patient comorbidities, disease trajectories, and treatment outcomes. Although not yet widely applied in AAp, GNNs could support risk stratification by integrating patient history and clinical interactions in a graph-based format[112].
Automated ML (AutoML) systems provide an automated framework for model selection, hyperparameter tuning, and feature engineering, thereby reducing the need for intensive manual intervention and allowing users with limited data science expertise to develop robust ML models[113]. In the context of AAp, AutoML could streamline the development of diagnostic and prognostic AI tools, particularly in clinical settings where dedicated data science resources are scarce, thus facilitating broader clinical adoption.
Edge AI refers to executing AI algorithms directly on local devices, such as hospital bedside monitors, portable US scanners, or smartphones without relying on centralized cloud infrastructure. This enables real-time inference, reduces latency, enhances data privacy, and allows decision support in settings with limited or unstable internet access. Edge AI holds particular promise for emergency rooms, rural clinics, and prehospital environments where rapid, autonomous decision-making is essential[103].
AI’s growing role in AAp diagnosis is evident across various ML and DL applications. By integrating multimodal data sources including patient history, physical examination findings, laboratory markers, and imaging studies, AI models enhance diagnostic precision. Studies have demonstrated that AI-powered decision support systems can aid clinicians in distinguishing complicated from uncomplicated AAp, optimizing treatment strategies, and reducing unnecessary surgeries. As AI continues to evolve, its clinical applications in AAp detection and management will likely expand, further enhancing precision medicine and individualized treatment approaches[28]. The summary of the terminology was provided in Table 2.
| Method | Definition | Relation to deep learning | Advantages |
| Deep learning | A subset of ML that uses multi-layered neural networks to automatically extract features from large datasets | DL is commonly used in image analysis, text processing, and predictive modeling. FL and edge AI can enhance the efficiency and privacy of DL models | High ACC; strong capability in handling image and language data |
| Federated learning | A decentralized ML approach where models are trained across multiple institutions without sharing patient data | FL allows DL models to be trained across different centers while preserving patient privacy. It is useful for multi-center AI studies in appendicitis diagnosis | Enhances data privacy; allows for cross-institutional AI model development |
| Edge AI | AI models that run directly on local hospital devices, portable ultrasound scanners, or mobile systems instead of relying on cloud computing | Edge AI enables DL models to operate in real-time on local devices, reducing dependence on internet connectivity | Real-time processing; improved data security; reduced latency in decision-making |
| Bayesian networks | Probabilistic models that establish relationships between variables and handle uncertainty in data | Can be integrated with DL models to improve decision-making under incomplete information | Useful for risk prediction, particularly in cases with missing clinical data |
| Transformer-based AI models (BERT, GPT) | Large language models capable of understanding and processing medical text | Can be used in combination with DL for automated triage systems and clinical note analysis | Efficient text processing; potential for real-time clinical decision support |
| Graph neural networks | AI models that analyze relationships between data points in a structured graph format | GNNs can enhance DL models by incorporating complex patient relationships and comorbidities | Improves risk prediction models; enhances interpretability of patient data interactions |
| Automated machine learning | AI systems that automatically optimize model selection, hyperparameters, and feature engineering | AutoML can generate optimized DL models without requiring manual tuning | Reduces the need for expert AI developers; accelerates model deployment |
| Natural language processing | AI systems designed to interpret and extract information from human language, including clinical notes and radiology reports | NLP models can be integrated with DL to analyze unstructured medical data | Enhances electronic health record analysis; supports AI-assisted triage systems |
| Computer vision | AI field enabling machines to interpret visual data; particularly useful in medical imaging | Computer vision models, including DL-based CNNs, improve diagnostic ACC in radiology | Reduces diagnostic variability; increases ACC in CT and MRI interpretation |
| Reinforcement learning and explainable AI | AI models that learn optimal decision pathways based on cumulative rewards; XAI ensures transparency in model predictions | Can optimize treatment strategies, while SHAP and LIME techniques make AI models interpretable for clinicians | Improves AI adoption in healthcare; enables better treatment planning |
| Machine learning | A broad AI field encompassing various algorithms, including supervised and unsupervised learning | ML models, such as SVM, random forest, and XGBoost, form the foundation for AI in clinical decision-making | Provides adaptable and scalable models for medical data analysis |
| Vision transformers | A deep learning model specifically designed for image segmentation and classification | Enhances medical image analysis by capturing spatial relationships within radiology images | Improves segmentation ACC, particularly in CT and MRI-based diagnosis |
| Lazy learning algorithms (KNN) | Classification method that identifies the closest data points in a dataset | Used in ML for patient clustering and classification | Simple yet effective, but computationally expensive in large datasets |
| Extra trees classifier | A variant of random forest that introduces additional randomness to improve ACC | Works alongside ensemble learning to enhance classification performance | High ACC; robustness in medical data analysis |
| Hybrid AI models | AI models combining ML and DL techniques to improve diagnostic performance | Used in multimodal AI-based appendicitis detection | Enhances ACC by integrating structured and unstructured data sources |
Traditional clinical scores assist in suspected AAp cases; however, they do not always provide sufficient sensitivity and specificity. AI methods leverage clinical data such as patient history, physical examination findings, and laboratory values to develop more accurate diagnostic models. As of 2025, numerous studies have demonstrated the success of AI-based clinical models in AAp diagnosis[3,8,16,27,28,33,35-84].
For example, in the study by Chadaga et al[42], ensemble learning methods outperformed traditional clinical scores in diagnosing AAp. The study reported an AUROC of 0.82 when using a combination of XGBoost and LightGBM models, highlighting a significant improvement over standard clinical evaluations. The systematic review by Rey et al[82] showed that AI models trained with multimodal data (clinical + laboratory + imaging) outperformed conventional diagnostic approaches for AAp. This review particularly emphasized the high sensitivity and specificity of models such as CatBoost, LightGBM, and RF. In another study, Phan-Mai et al[46] reported an AUROC of 0.894 in the prediction of perforation using DL models. Similarly, Akbulut et al[43] found that the CatBoost algorithm, when based solely on clinical data, achieved 92% specificity and 88% sensitivity in diagnosing AAp.
Systematic reviews highlight the clinical performance of various ML models by comparing their sensitivity, specificity, and AUROC values. For example, the systematic review by Issaiy et al[28] reported that the RF model achieved 94% sensitivity and 96% specificity in diagnosing AAp. Similarly, Rey et al[82] conducted a systematic review of pediatric AAp studies and found that all AI models, many of which integrated clinical, laboratory, and imaging data, achieved AUC/AUROC values above 0.9, demonstrating superior diagnostic performance compared to conventional methods.
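Because AUROC is the metric most often compared across these studies, a short sketch of how it can be computed may be helpful. The function below uses the rank-based interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case); the labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # tied scores count as half a win
    return wins / (len(positives) * len(negatives))

# Hypothetical model outputs: 1 = AAp, 0 = no AAp
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
area = auroc(labels, scores)
```

In this toy example, 11 of the 12 positive-negative pairs are ranked correctly, so the AUROC is 11/12. A value of 0.5 corresponds to chance-level ranking.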
In another study, Yoldaş et al[71] evaluated a neural network model on 156 patients, reporting near-perfect results: 100% sensitivity and 97.2% specificity (PPV = 96%, NPV approaching 100%). However, such small-sample studies (< 200 cases in total) risk overfitting, particularly in DL, and raise concerns about whether AI models will maintain the same accuracy when applied to larger datasets. Our analysis suggests a minimum target of 150 cases per severity class for ML and > 500 cases for DL architectures in AAp. For rare complications, prospective re
ML has also demonstrated success in larger datasets. In a study conducted in Taiwan, a RF model trained on demographic and clinical data from 180 patients achieved 94% sensitivity, 100% specificity, 96% accuracy, and an AUC of 0.98 for AAp diagnosis, outperforming other algorithms such as SVM and ANN[72]. In a larger dataset from Turkey, which included 7244 patients, models trained on demographic and laboratory parameters showed that RF achieved the highest performance (AUC = 0.99), particularly in identifying complicated cases[59]. In this study, the decision tree model provided slightly lower accuracy (AUC = 0.94) but offered a more interpretable approach. XAI techniques facilitate the integration of such models into clinical practice.
Ensemble learning methods have been increasingly integrated into clinical decision support systems in recent years. According to the systematic review by Rey et al[82], AI models demonstrated consistently high diagnostic performance in pediatric AAp. In particular, models that combined clinical, laboratory, and imaging features achieved accuracies above 90% and AUROC values exceeding 0.9, substantially outperforming conventional diagnostic approaches. Similarly, Males et al[39] demonstrated that an XGBoost model incorporating SHAP and LIME techniques improved diagnostic per
Models trained solely on laboratory data have shown limited success. In a study by Mijwil and Aggarwal[47], the RF model, when trained exclusively on laboratory data, achieved 81% sensitivity, 81% specificity, and 84% accuracy. This finding suggests that ML models trained only on laboratory values may underperform in the absence of clinical and imaging data. However, Ghareeb et al[54] developed a hybrid model using ensemble learning techniques that achieved 93% accuracy even when trained only on laboratory data.
AI algorithms are not only used to detect AAp but also to differentiate between complicated and uncomplicated cases. In a study involving 1797 patients, Akbulut et al[43] developed a CatBoost algorithm based on demographic and bio
Recent studies have demonstrated that integrating multimodal data, such as clinical, laboratory, and imaging findings, enhances the accuracy of AI models in differentiating complicated from uncomplicated AAp. According to the systematic review by Rey et al[82], AI models that incorporate multimodal inputs, including CT-based features and inflammatory markers such as C-reactive protein and leukocytosis, consistently achieved very high diagnostic accuracy (generally > 90%) or AUC values (> 0.9) for distinguishing AAp, indicating strong predictive performance across diverse modeling approaches. Similarly, in a systematic review, Issaiy et al[28] reported that advanced ML models, such as RF, SVMs, ANNs, and XGBoost, achieved AUROC values ranging from 0.84 to 0.94, consistently outperforming traditional clinical assessment.
In addition to ML techniques, DL models have shown promising results in distinguishing complicated AAp. Phan-Mai et al[46] utilized a CNN trained on US and CT images, achieving an AUROC of 0.894, with 91% sensitivity and 88% specificity in detecting complicated AAp cases. Another study by Liang et al[91] reported that radiomics-based DL approaches improved prediction of perforation compared to standard radiological evaluations.
Furthermore, XAI techniques, such as SHAP and LIME, have been integrated into predictive models to enhance interpretability. According to Chadaga et al[42], SHAP-based feature analysis indicated that the most influential predictors for AAp detection in pediatric patients were length of hospital stay, visibility of the vermiform appendix on US, white blood cell count, and appendix diameter. Such feature-level explanations make AI-driven diagnostic decision-making more interpretable and reliable in clinical practice.
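To make the idea behind SHAP more concrete, the sketch below computes exact Shapley values for a toy risk score by enumerating all feature coalitions. This is not the SHAP library and not the model used in any cited study: the scoring function, its coefficients, the feature names, and the single reference patient used as a baseline are all invented for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, instance, baseline):
    """Exact Shapley attributions; absent features are set to their baseline value."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        # Features in the coalition take the patient's value, the rest the baseline
        x = {f: (instance[f] if f in coalition else baseline[f]) for f in names}
        return model(x)

    attributions = {}
    for feature in names:
        others = [f for f in names if f != feature]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(set(subset) | {feature})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        attributions[feature] = total
    return attributions

def toy_model(x):
    """Hypothetical risk score with one interaction term (invented coefficients)."""
    interaction = 0.2 * x["appendix_visible"] * (x["appendix_diameter_mm"] > 6)
    return 0.04 * x["wbc"] + 0.05 * x["appendix_diameter_mm"] + interaction

patient = {"wbc": 16.0, "appendix_diameter_mm": 9.0, "appendix_visible": 1}
reference = {"wbc": 8.0, "appendix_diameter_mm": 5.0, "appendix_visible": 0}
phi = shapley_values(toy_model, patient, reference)
```

The attributions sum to the difference between the patient's score and the reference score, and the interaction term is shared equally between the two features that jointly trigger it. Practical SHAP implementations approximate this computation, since exact enumeration grows exponentially with the number of features.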
In conclusion, systematic review studies have shown that ML-based approaches provide higher accuracy than traditional clinical methods in diagnosing AAp. Particularly, ensemble learning models such as XGBoost, LightGBM, and CatBoost play a crucial role in diagnosing atypical cases. DL methods and radiomics-based AI models further enhance the differentiation between complicated and uncomplicated AAp. These technologies can be integrated into clinical decision support systems to improve diagnostic accuracy, reduce unnecessary surgeries, and optimize patient management strategies.
US is typically the first-line imaging modality for diagnosing AAp. However, its diagnostic accuracy is operator-dependent and may vary significantly among less experienced users. AI assists in this area by enabling automatic appendix detection in US images, particularly benefiting less experienced practitioners. In the study by Abu-Ashour et al[41], ChatGPT-4 was applied to label free-text operative and US reports for grading pediatric AAp. Compared with human data abstractors, ChatGPT-4 substantially reduced misclassification rates (2.9% vs 28.2%) and prevented 59.2% of errors, while being nearly 40 times faster. Several studies have demonstrated that the addition of US findings to clinical and laboratory parameters improves diagnostic performance. For instance, Anandalwar et al[114] reported that in
According to a systematic review by Rey et al[82], AI integration into US-based assessment was reported to facilitate appendix visualization and improve diagnostic performance, particularly in low-resolution or otherwise challenging cases. In recent years, ensemble learning and transfer learning techniques have been increasingly applied in US analysis. Marcinkevičs et al[52] developed an interpretable ML framework for pediatric AAp detection using US images. Their best-performing model, a semi-supervised multiview concept bottleneck model, achieved an AUROC of 0.80 and an area under the precision-recall curve of 0.92, demonstrating competitive accuracy while maintaining model explainability. According to a recent systematic review by Rey et al[82], ensemble learning-based approaches such as XGBoost and LightGBM out
A recent study by Hayashi et al[55] further explored AI-assisted US in pediatric AAp diagnosis. Their research involved training a DL model with 70 US videos and evaluating its effectiveness in two phases. The first phase assessed AI performance in detecting the appendix, with successful identification in shallow scans but decreased accuracy in deeper scans (> 8 cm). Potential technical solutions to improve performance in deep tissue imaging include the integration of higher-frequency transducers with enhanced penetration capabilities, optimization of DL architectures through multi-scale feature extraction, implementation of contrast-enhanced US techniques, and development of hybrid models that combine AI with real-time operator feedback systems to guide probe positioning and optimize image acquisition parameters. The second phase analyzed how AI affects pediatricians’ diagnostic confidence. Results indicated that AI assistance was beneficial when the appendix was at least partially detected but could negatively influence decision-making when the appendix was not identified. This highlights the need for further refinement in AI models, particularly in handling deep tissue scans and minimizing false negatives to avoid misleading clinicians.
CT is considered the gold standard for diagnosing AAp due to its high sensitivity, especially in ambiguous cases, where low-dose abdominal CT is recommended. AI enhances CT image analysis by automatically detecting AAp features and complications, identifying details that may be missed by human observation. DL has played a pivotal role in this field. The AppendiXNet model developed by Rajpurkar et al[60] was trained on 438 patient CT images, achieving an accuracy of 72% (AUC = 0.81). The study by Gollapalli et al[38] found that AI-assisted CT analysis achieved up to 90% sensitivity in early AAp cases. Zhao et al[90] reported that radiomics models integrated with clinical information on CT images achieved higher diagnostic accuracy in differentiating simple from non-simple AAp compared with CT-based as
Differentiating between complicated and uncomplicated AAp using CT images is crucial for determining surgical necessity. Traditional radiological criteria (e.g., abscess, free air) do not always reliably indicate perforation. AI-enhanced imaging analysis can more accurately assess perforation risk. In the study by Liang et al[91], CT data from 1165 patients were analyzed using DL and radiomics techniques. A CatBoost model, when combined with radiologist evaluation, achieved an AUC of 0.79 for identifying complicated AAp. The model’s sensitivity reached 70%, whereas traditional radiologist assessments achieved only 45%. Its NPV was 80%, which was 7 percentage points higher than that of radiologists. However, its specificity was 74%, lower than the 90% specificity of radiologists, indicating a potential tendency for overdiagnosis in non-complicated cases.
ML techniques are also effectively applied in CT analysis. Issaiy et al[28] conducted a systematic review of AI and ML models in AAp diagnosis and prognosis. They highlighted that ensemble learning models including XGBoost, CatBoost, and LightGBM consistently outperformed traditional diagnostic methods (e.g., clinical scoring systems or standard imaging interpretation) across multiple studies, demonstrating higher accuracy, sensitivity, and specificity. According to Rey et al[82], ensemble learning and XAI techniques applied to CT analysis have demonstrated superior diagnostic performance compared with conventional interpretation methods, with several studies reporting high AUROC and sensitivity values across different patient cohorts. Dogan and Selcuk[31] developed a novel DL approach for AAp diagnosis, utilizing a hybrid CNN model integrated with ensemble learning techniques, including SVM, KNN, and RF. Their method demonstrated an independent diagnostic accuracy of 96% in cases with definitive CT-based radiological findings and 83.3% in cases with radiologically ambiguous CT findings, surpassing traditional radiologist-based evaluations. The hybrid model achieved a sensitivity of 95.7%, specificity of 69.7%, overall accuracy of 92.8%, and an F1 score of 94.2%, highlighting its robustness in diagnosing AAp, particularly in challenging cases where conventional imaging interpretation is difficult.
MRI is primarily used for AAp diagnosis in pregnant women and pediatric patients, where radiation exposure must be minimized. Despite its proven diagnostic accuracy, its routine use in emergency settings remains limited due to longer acquisition times and limited availability compared to CT and US. Current literature on AI-assisted MRI analysis for AAp diagnosis is notably scarce: the vast majority of published AI studies have used CT, US, and clinical-laboratory parameter combinations as primary data sources, reflecting the more widespread availability and faster acquisition times of CT and US in emergency departments. However, emerging developments in pediatric MRI and AI demonstrate significant potential for future applications, particularly in image optimization, organ segmentation, and automated diagnosis. The integration of AI with MRI technology represents an underexplored frontier that could potentially enhance diagnostic accuracy and efficiency in AAp evaluation, especially in radiation-sensitive populations.
ML and DL approaches offer different advantages in the diagnosis of AAp. ML methods (e.g., logistic regression, decision trees, RF, SVM, XGBoost, and LightGBM) can work with relatively small datasets and provide somewhat interpretable results. Indeed, many studies have demonstrated that ML models offer significantly higher accuracy than traditional clinical scores. For example, in a study by Gollapalli et al[38], the RF model outperformed other ML algorithms in both diagnosing AAp and predicting its complicated form, achieving an AUC of up to 0.99. Similarly, Chadaga et al[42] reported that XGBoost and LightGBM algorithms, when integrated with clinical data, achieved an accuracy of 91%, outperforming traditional methods. ML-based systems also use XAI techniques such as SHAP to determine which clinical parameters are most influential in diagnosis, providing clinicians with more transparent models[116].
On the DL side, ANNs and, more specifically, CNNs for image analysis have gained significant attention in recent years. DL can autonomously learn features from raw data through multi-layered neural networks[117]. This allows it to detect complex data patterns that human experts may overlook. ANNs were first applied to AAp diagnosis in the late 2000s and have been shown to outperform clinical scores even in small-scale studies[73]. For example, in a study by Prabhudesai et al[73], an ANN model significantly outperformed clinicians, completely eliminating false-negative cases and diagnosing with 100% sensitivity (97% specificity). Similarly, in a study conducted in Turkey (Yoldaş et al[71]), an ANN model achieved 100% sensitivity and 97% specificity, demonstrating excellent performance in preventing false negatives. These models statistically outperform scoring systems such as the Alvarado score.
The superiority of DL in image analysis is also becoming more evident. The aforementioned AppendiXNet study achieved a reasonable AUC (0.81) in AAp diagnosis using a limited number of CT samples[60]. However, more recent systematic reviews have shown that CNN models become more robust in clinical diagnosis when trained on large datasets. In the study by Schipper et al[33], two ML models based on the XGBoost algorithm (history intake vitals examination and history intake vitals examination-laboratory tests) were developed to predict AAp in patients presenting with acute abdominal pain in the emergency department. The models demonstrated high discriminative performance (AUROC = 0.919 and 0.923, respectively), outperforming the Alvarado score (AUROC = 0.824) and showing comparable or superior accuracy to emergency physicians, particularly when laboratory results were incorporated. In a systematic review, Rey et al[82] noted that all included studies developed their own ML or CNN-based models and consistently reported diagnostic performances exceeding 90% accuracy or an AUC greater than 0.9.
Multimodal AI models, which integrate different types of data, are particularly effective in improving diagnostic success. Zhao et al[90] developed a radiomics model that integrated clinical data, laboratory parameters, and CT imaging and demonstrated that a combined model using both radiomics features and clinical information achieved a significantly higher AUC than a CT-only model (P = 0.041) in differentiating simple from complicated AAp. In a retrospective cohort study by Phan-Mai et al[46], various ML models, including SVM, decision trees, logistic regression, KNN, ANNs, and gradient boosting, were used to classify complicated vs uncomplicated AAp. AUC and accuracy values ranged from approximately 0.69 to 0.82 on the raw data, with gradient boosting models achieving higher accuracy and an AUC ≥ 0.8 after class balancing with the synthetic minority oversampling technique.
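The class-balancing step mentioned above can be sketched as follows. This is a minimal illustration of the interpolation idea behind the synthetic minority oversampling technique, not the implementation used in the cited study; the feature values, the helper name `smote`, and the choice of `k` are hypothetical.

```python
import math
import random

def smote(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority samples by interpolating toward near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the chosen sample (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, neighbour)))
    return synthetic

# Hypothetical standardized features for the minority class (e.g. complicated AAp)
minority = [(1.0, 1.2), (1.3, 0.9), (0.8, 1.5), (1.1, 1.0)]
new_points = smote(minority, n_synthetic=6)
```

Each synthetic point lies on a line segment between two real minority cases. Oversampling of this kind is generally applied to the training data only, so that validation and test sets still reflect the original class distribution.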
However, it is important to note that some of the striking results reported in the literature were obtained in small and selectively chosen patient groups. Early ANN studies that reported 100% sensitivity and 97% specificity have been subject to methodological criticisms, particularly regarding small sample sizes and the risk of overfitting. Because small-sample studies (< 200 total cases) risk overfitting, particularly in DL, our analysis suggests a minimum target of 150 cases per severity class for ML and > 500 cases for DL architectures in AAp. For rare complications, prospective registries should target ≥ 50 confirmed events per model class. A systematic review highlighted that many AI-based AAp diagnosis studies suffer from selection bias and inadequate model validation. Therefore, before DL models can be implemented in real-world practice, they must be tested on larger, multicenter, and heterogeneous datasets.
In the future, it will be crucial for these models to gain clinicians’ trust by combining the interpretability of decision tree-based approaches with the accuracy of DL. Additionally, more prospective studies and randomized controlled trials are needed to integrate AI models into clinical practice. Particularly, the combination of ensemble learning techniques and DL models holds promise for developing the most accurate and reliable decision support systems in AAp diagnosis.
The performance of AI models in diagnosing AAp varies between studies but is generally high. Table 1 presents the performance metrics for distinguishing between normal appendix and AAp across various studies, while Table 2 summarizes the accuracy of AI models in detecting complicated (perforated) cases among AAp patients. These tables provide key performance indicators such as sensitivity, specificity, PPV, NPV, and AUROC.
Comprehensive systematic reviews indicate that DL and ensemble learning methods (e.g., XGBoost, LightGBM, CatBoost) offer significant advantages over traditional clinical scoring methods in AAp diagnosis. While these reviews highlight promising performance, they also underscore a critical limitation: A great proportion of the underlying evidence derives from retrospective, single-center studies with heterogeneous definitions of ‘complicated’ AAp, which can introduce selection bias and limit generalizability. In the systematic review by Rey et al[82], ANN models were reported to reach an AUROC of up to 0.985. Models trained on large datasets, such as RF and XGBoost, achieved sen
PPV and NPV are as crucial as sensitivity and specificity in clinical practice. For example, in the ANN model by Yoldaş et al[71], NPV was reported as 100%, meaning the model was highly reliable in ruling out AAp cases and preventing unnecessary surgeries. Similarly, Liang et al[91] reported that a DL + radiomics model for complicated AAp achieved an NPV of 80%, which was 7 percentage points higher than that of traditional radiology-based assessments (73%). The model also demonstrated a substantially higher sensitivity (70%) compared to radiologists (45%). Shahmoradi et al[67] developed an MLP network-based DL model with 80% sensitivity, 97.5% specificity, 92.3% PPV, and 93% NPV for AAp diagnosis. Hsieh et al[72] compared three models: an RF model (94% sensitivity, 100% specificity, 100% PPV, and 87% NPV), an SVM-based model (91% sensitivity, 100% specificity, 85% PPV, and 73% NPV), and an ANN model (94% sensitivity, 85% specificity, 94% PPV, and 85% NPV). Yazici et al[37] demonstrated that a logistic regression model with only three readily available clinical features (age, C-reactive protein, and peri-appendicular fluid collection) achieved a diagnostic accuracy of approximately 96% in differentiating uncomplicated and complicated AAp. These studies highlight the potential of various AI approaches in ensuring reliable clinical diagnosis.
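The relationship between these four metrics can be made explicit with a short sketch. The confusion-matrix counts and the two prevalence values below are hypothetical, and the sensitivity and specificity passed to `predictive_values` are used purely as illustrative inputs.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV implied by sensitivity, specificity and disease prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)

# Hypothetical counts, not taken from any cited study
metrics = diagnostic_metrics(tp=45, fp=5, tn=40, fn=10)

# Same test characteristics evaluated at two assumed prevalences
ppv_high, npv_high = predictive_values(0.80, 0.975, prevalence=0.50)
ppv_low, npv_low = predictive_values(0.80, 0.975, prevalence=0.10)
```

Unlike sensitivity and specificity, PPV and NPV shift with disease prevalence: in this sketch the lower assumed prevalence reduces PPV and raises NPV. This is one reason why predictive values reported in one cohort may not transfer directly to another.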
AI plays a crucial role in supporting decision-making processes in managing patients with suspected AAp. AI-based models enhance diagnostic accuracy, reduce unnecessary appendectomies, and improve the prediction of complications such as perforation, contributing directly to patient care. The advantages of AI-based clinical decision support systems include: (1) More accurate classification of uncertain cases in AAp diagnosis; (2) Better determination of surgical necessity for complicated AAp cases; (3) Improved selection of patients suitable for non-operative treatment; and (4) Enhanced postoperative complication prediction and patient management.
Cappuccio et al[118] conducted a comprehensive literature review on AI applications in AAp diagnosis and ma
The integration of AI into clinical decision-making also requires XAI techniques for transparency. SHAP and LIME interpretation algorithms help increase clinical trust by clarifying why a model predicts high or low risk for a particular patient. Chadaga et al[42] demonstrated that SHAP-based analyses identified key factors affecting the likelihood of complicated AAp, aiding physicians in clinical decision-making. In their study, Schipper et al[33] introduced ML models utilizing clinical and laboratory information that achieved AUROCs of 0.919 without and 0.923 with laboratory data. Compared with the Alvarado score (AUROC = 0.824), these models showed markedly improved accuracy and performed on par with or better than emergency physicians, whose AUROCs ranged between 0.791 and 0.923.
From a clinical perspective, AI models should serve as decision-support tools, for example as a second opinion or triage aid, rather than as a substitute for physicians or a primary decision-maker. The final clinical decision should always be made by an experienced physician.
In conclusion, AI-based approaches have demonstrated superior diagnostic accuracy over traditional methods in many studies. However, the success of these models depends on the type of data used (clinical data alone or with imaging), the structure of the algorithm, and the population on which the model was trained. Therefore, reported figures in the literature should be interpreted cautiously, and each healthcare institution should select the AI model most appropriate for its patient profile.
The integration of AI into the diagnosis and management of AAp is rapidly advancing; however, there are still areas that require further development. Several key directions and recommendations for future research are as follows.
Many existing studies have been conducted in single centers with limited patient populations. There is a need for large-scale datasets that include patients from different geographic regions, age groups, and risk categories. Special attention should be given to subgroups such as pediatric patients, the elderly, and pregnant women to evaluate model per
To assess the real-world clinical impact of AI models, prospective studies conducted across multiple centers are essential. This will enable independent validation of the models on diverse patient populations and provide concrete evidence of their contribution to routine clinical practice, such as reducing negative appendectomy rates and minimizing perforation complications. Kelly et al[120] emphasized that for AI to achieve a measurable clinical impact, models must be validated on large, diverse, and multicenter datasets, as reliance on small single-center cohorts risks overestimating performance and limits real-world applicability.
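One way to see why small single-center cohorts can overstate performance is to attach a confidence interval to a reported proportion such as sensitivity or NPV. The sketch below uses the Wilson score interval, a standard method for binomial proportions. The function name `wilson_interval` and the cohort sizes (20 and 200 patients) are hypothetical and serve only to illustrate the effect of sample size.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half = z * sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# An observed value of 100% in a small and in a larger hypothetical cohort.
for correct, total in ((20, 20), (200, 200)):
    low, high = wilson_interval(correct, total)
    print(f"{correct}/{total}: 95% CI {low:.3f} to {high:.3f}")
```

An observed 100% based on 20 patients is compatible with a true value well below 90%, whereas the same observation in 200 patients leaves far less room for doubt. Reporting such intervals alongside point estimates makes single-center results easier to interpret.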
Future AAp diagnostic algorithms should not rely on a single data type but should incorporate clinical, laboratory, and imaging information in a multimodal framework. For example, integrating symptom duration, physical examination findings, blood test results, and imaging data into AI models can lead to more accurate predictions. A notable example is the study by Liang et al[91], in which a combined clinical + CT + radiomics model demonstrated promising results. Additional studies have shown that combining clinical and imaging data can increase AUROC values by 15% compared to single-source models. However, technical challenges such as data heterogeneity, varying acquisition protocols, missing data across modalities, and computational complexity must be addressed through standardized data preprocessing pipelines, advanced feature alignment techniques, imputation strategies for missing values, and efficient DL architectures designed for multimodal fusion. Future advancements may include real-time AI-assisted decision-making systems in emergency departments that integrate clinical data with portable US imaging.
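As a simple illustration of how a multimodal system might cope with a missing modality, the sketch below combines per-modality probabilities by late fusion, averaging their log-odds over whichever modalities are available for a given patient. This is one basic fusion strategy chosen for clarity. It is not the method of Liang et al[91] or of any other cited study, and the function name, modality names, and probabilities are hypothetical.

```python
from math import exp, log


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return log(p / (1.0 - p))


def fuse_probabilities(modality_probs, weights=None):
    """Late fusion: weighted mean of log-odds over the modalities present.

    modality_probs maps a modality name to a probability in (0, 1),
    or to None when that modality is missing for the patient.
    """
    available = {m: p for m, p in modality_probs.items() if p is not None}
    if not available:
        raise ValueError("at least one modality is required")
    if weights is None:
        weights = {m: 1.0 for m in available}
    total_weight = sum(weights[m] for m in available)
    fused = sum(weights[m] * logit(p) for m, p in available.items()) / total_weight
    return 1.0 / (1.0 + exp(-fused))


# Hypothetical patient with clinical and laboratory predictions but no imaging.
patient_probs = {"clinical": 0.60, "laboratory": 0.70, "imaging": None}
print(f"fused probability: {fuse_probabilities(patient_probs):.3f}")
```

Skipping the absent modality avoids having to impute an entire image, but it assumes that the per-modality models are individually calibrated. Learned fusion layers or explicit imputation are alternatives that would need to be validated on multicenter data.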
For clinicians to trust AI, it is crucial to ensure model interpretability. Therefore, future research should not only focus on accuracy but also on elucidating the reasoning behind model predictions. The use of techniques such as SHAP and LIME should be expanded to visualize how AI models reach decisions. In practice, SHAP values can be integrated into electronic health records dashboards to highlight patient-specific feature contributions (e.g., elevated C-reactive protein or leukocytosis) directly within the clinical interface, enabling physicians to rapidly assess the rationale behind an AI-generated risk score; early implementations in sepsis prediction systems have demonstrated that such real-time, in
AI systems can also contribute to clinician education. In the future, interactive training platforms for surgical residents and emergency physicians could utilize AI to simulate diagnostic scenarios. This would allow practitioners to learn the key clinical features that AI identifies, improving their diagnostic accuracy in real-world settings. Educational AI modules have already been shown to improve junior physicians’ diagnostic accuracy by 18% in pilot studies, under
In patients presenting with acute abdominal pain, accurate diagnosis extends beyond confirming or excluding AAp. A wide range of alternative conditions including mesenteric lymphadenitis, ovarian cyst-related pathology, diverticulitis, and gallbladder disease may mimic AAp both clinically and radiologically, often leading to diagnostic uncertainty and unnecessary interventions. AI has the potential to bridge this gap by advancing from traditional binary classification to multi-diagnostic frameworks that can simultaneously evaluate several plausible conditions. DL applied to CT imaging, particularly when integrated with clinical and laboratory parameters, could enable a more holistic interpretation of abdominal pain presentations. Unlike binary AAp models, multi-class systems would be able to provide differential probabilities across multiple diagnoses, offering clinicians a ranked diagnostic spectrum rather than a single yes/no output. To achieve this, models must be trained on carefully curated, multi-institutional datasets with explicit labels for both AAp and its common mimickers, while addressing class imbalance and overlapping clinical features. Equally im
In reviewing the broader literature on the applications of AI in healthcare, it becomes evident that many of the published studies are authored primarily by computer scientists, software developers, and engineers, often without sufficient involvement of clinical experts. As a result, a considerable proportion of these papers contain critical inaccuracies or oversimplifications in their medical content. In fact, when preparing this review, we observed that in more than half of the AAp and AI-related papers, the authors appeared to have limited understanding of what AAp truly entails. This underscores a crucial reality: Healthcare cannot be reduced to a purely mathematical exercise. Therefore, editors and reviewers should be particularly vigilant when evaluating such submissions. They should carefully examine author lists and ensure the inclusion of clinical expertise, and they should scrutinize more rigorously any manuscripts on AI in healthcare that lack clinician involvement. The responsible and ethical application of AI in medicine requires that data be carefully analyzed and interpreted by specialists in the relevant clinical domains before being processed through computational models. Moreover, the outcomes generated by such models must be rigorously evaluated for their clinical applicability and real-world translational value. Going forward, stronger interdisciplinary collaboration between clinicians and computer scientists will be essential to ensure that AI research in healthcare is accurate, clinically relevant, and ethically responsible.
AI is an emerging tool for differentiating normal appendix, AAp, and perforated AAp. Current literature suggests that well-trained algorithms can surpass clinical scoring systems and, in some cases, even outperform experienced clinicians in diagnostic accuracy. ML and DL techniques have achieved sensitivity and specificity values exceeding 90% in AAp diagnosis. These models can expedite decision-making in emergency settings, helping to prevent life-threatening complications and reduce unnecessary surgeries. However, for AI to become part of routine clinical practice, extensive validation studies and increased model transparency are required to gain clinician trust. Further integration of AI into healthcare workflows, coupled with regulatory approvals and physician training programs, will be critical for large-scale adoption. In the future, AI systems integrated with multidisciplinary approaches and clinical workflows may not only revolutionize AAp diagnosis but also become a standard tool in the management of general surgical emergencies.
| 1. | Di Saverio S, Podda M, De Simone B, Ceresoli M, Augustin G, Gori A, Boermeester M, Sartelli M, Coccolini F, Tarasconi A, De' Angelis N, Weber DG, Tolonen M, Birindelli A, Biffl W, Moore EE, Kelly M, Soreide K, Kashuk J, Ten Broek R, Gomes CA, Sugrue M, Davies RJ, Damaskos D, Leppäniemi A, Kirkpatrick A, Peitzman AB, Fraga GP, Maier RV, Coimbra R, Chiarugi M, Sganga G, Pisanu A, De' Angelis GL, Tan E, Van Goor H, Pata F, Di Carlo I, Chiara O, Litvin A, Campanile FC, Sakakushev B, Tomadze G, Demetrashvili Z, Latifi R, Abu-Zidan F, Romeo O, Segovia-Lohse H, Baiocchi G, Costa D, Rizoli S, Balogh ZJ, Bendinelli C, Scalea T, Ivatury R, Velmahos G, Andersson R, Kluger Y, Ansaloni L, Catena F. Diagnosis and treatment of acute appendicitis: 2020 update of the WSES Jerusalem guidelines. World J Emerg Surg. 2020;15:27. |
| 2. | Kabir SA, Kabir SI, Sun R, Jafferbhoy S, Karim A. How to diagnose an acutely inflamed appendix; a systematic review of the latest evidence. Int J Surg. 2017;40:155-162. |
| 3. | Chen S, Xia J, Xu B, Huang Y, Teng M, Pan J. Risk prediction and effect evaluation of complicated appendicitis based on XGBoost modeling. BMC Gastroenterol. 2025;25:295. |
| 4. | Singh JP, Mariadason JG. Role of the faecolith in modern-day appendicitis. Ann R Coll Surg Engl. 2013;95:48-51. |
| 5. | Bhandarkar S, Tsutsumi A, Schneider EB, Ong CS, Paredes L, Brackett A, Ahuja V. Emergent Applications of Machine Learning for Diagnosing and Managing Appendicitis: A State-of-the-Art Review. Surg Infect (Larchmt). 2024;25:7-18. |
| 6. | Petrauskas V, Poskus E, Luksaite-Lukste R, Kryzauskas M, Petrulionis M, Strupas K, Poskus T. Suspected and Confirmed Acute Appendicitis During the COVID-19 Pandemic: First and Second Quarantines-a Prospective Study. Front Surg. 2022;9:896206. |
| 7. | Akbulut S, Koc C, Kocaaslan H, Gonultas F, Samdanci E, Yologlu S, Yilmaz S. Comparison of clinical and histopathological features of patients who underwent incidental or emergency appendectomy. World J Gastrointest Surg. 2019;11:19-26. |
| 8. | Li L, Sun Y, Sun Y, Gao Y, Zhang B, Qi R, Sheng F, Yang X, Liu X, Liu L, Lu C, Chen L, Zhang K. Clinical-radiomics models with machine-learning algorithms to distinguish uncomplicated from complicated acute appendicitis in adults: a multiphase multicenter cohort study. Gastroenterol Rep (Oxf). 2025;13:goaf039. |
| 9. | Li J, Ye J, Luo Y, Xu T, Jia Z. Progress in the application of machine learning in CT diagnosis of acute appendicitis. Abdom Radiol (NY). 2025;50:4040-4049. |
| 10. | Dongarwar D, Taylor J, Ajewole V, Anene N, Omoyele O, Ogba C, Oluwatoba A, Giger D, Thuy A, Argueta E, Naik E, Salemi JL, Spooner K, Olaleye O, Salihu HM. Trends in Appendicitis Among Pregnant Women, the Risk for Cardiac Arrest, and Maternal-Fetal Mortality. World J Surg. 2020;44:3999-4005. |
| 11. | Jearwattanakanok K, Yamada S, Suntornlimsiri W, Smuthtai W, Patumanond J. Validation of the diagnostic score for acute lower abdominal pain in women of reproductive age. Emerg Med Int. 2014;2014:320926. |
| 12. | Wahhab RASA, Mohammed LK. Incidental Gynecological Conditions in Patients Presented with Acute Appendicitis. Med J Babylon. 2024;21:259-262. |
| 13. | Raman SS, Osuagwu FC, Kadell B, Cryer H, Sayre J, Lu DS. Effect of CT on false positive diagnosis of appendicitis and perforation. N Engl J Med. 2008;358:972-973. |
| 14. | Köse E, Hasbahçeci M, Aydın MC, Toy C, Saydam T, Özsoy A, Karahan SR. Is it beneficial to use clinical scoring systems for acute appendicitis in adults? Ulus Travma Acil Cerrahi Derg. 2019;25:12-19. |
| 15. | Bom WJ, Scheijmans JCG, Salminen P, Boermeester MA. Diagnosis of Uncomplicated and Complicated Appendicitis in Adults. Scand J Surg. 2021;110:170-179. |
| 16. | Maleš I, Kumrić M, Huić Maleš A, Cvitković I, Šantić R, Pogorelić Z, Božić J. A Systematic Integration of Artificial Intelligence Models in Appendicitis Management: A Comprehensive Review. Diagnostics (Basel). 2025;15:866. |
| 17. | Navaei M, Doogchi Z, Gholami F, Tavakoli MK. Leveraging Machine Learning for Pediatric Appendicitis Diagnosis: A Retrospective Study Integrating Clinical, Laboratory, and Imaging Data. Health Sci Rep. 2025;8:e70756. |
| 18. | Shahul Hameed MR, Shahul Hameed S, Rafi Ahamed R, Thomas FA, George B. WBC Count vs. CRP Level in Laboratory Markers and USG vs. CT Abdomen in Imaging Modalities: A Retrospective Study in the United Arab Emirates to Determine Which Are the Better Diagnostic Tools for Acute Appendicitis. Cureus. 2023;15:e47454. |
| 19. | Benabbas R, Hanna M, Shah J, Sinert R. Diagnostic Accuracy of History, Physical Examination, Laboratory Tests, and Point-of-care Ultrasound for Pediatric Acute Appendicitis in the Emergency Department: A Systematic Review and Meta-analysis. Acad Emerg Med. 2017;24:523-551. |
| 20. | Balakrishnan P, Munisamy P, Vijayakumar S, Sinha P. Clinical Scoring Systems to Diagnose Complicated Acute Appendicitis in a Rural Hospital: Are They Good Enough? Cureus. 2024;16:e64927. |
| 21. | Andersson RE, Stark J. Diagnostic value of the appendicitis inflammatory response (AIR) score. A systematic review and meta-analysis. World J Emerg Surg. 2025;20:12. |
| 22. | Mantoglu B, Gonullu E, Akdeniz Y, Yigit M, Firat N, Akin E, Altintoprak F, Erkorkmaz U. Which appendicitis scoring system is most suitable for pregnant patients? A comparison of nine different systems. World J Emerg Surg. 2020;15:34. |
| 23. | Gonullu E, Bayhan Z, Capoglu R, Mantoglu B, Kamburoglu B, Harmantepe T, Altıntoprak F, Erkorkmaz U. Diagnostic Accuracy Rates of Appendicitis Scoring Systems for the Stratified Age Groups. Emerg Med Int. 2022;2022:2505977. |
| 24. | Pogorelić Z, Mihanović J, Ninčević S, Lukšić B, Elezović Baloević S, Polašek O. Validity of Appendicitis Inflammatory Response Score in Distinguishing Perforated from Non-Perforated Appendicitis in Children. Children (Basel). 2021;8:309. |
| 25. | Karaman K, Ercan M, Demir H, Yalkın Ö, Uzunoğlu Y, Gündoğdu K, Zengin İ, Aksoy YE, Bostancı EB. The Karaman score: A new diagnostic score for acute appendicitis. Ulus Travma Acil Cerrahi Derg. 2018;24:545-551. |
| 26. | Kaya MG, Acar E. The Role of Inflammatory Parameters and Scoring Systems in Predicting Complicated Acute Appendicitis. Meand Med Dent J. 2024;25:305-316. |
| 27. | Lam A, Squires E, Tan S, Swen NJ, Barilla A, Kovoor J, Gupta A, Bacchi S, Khurana S. Artificial intelligence for predicting acute appendicitis: a systematic review. ANZ J Surg. 2023;93:2070-2078. |
| 28. | Issaiy M, Zarei D, Saghazadeh A. Artificial Intelligence and Acute Appendicitis: A Systematic Review of Diagnostic and Prognostic Models. World J Emerg Surg. 2023;18:59. |
| 29. | Kim M, Park T, Kang J, Kim MJ, Kwon MJ, Oh BY, Kim JW, Ha S, Yang WS, Cho BJ, Son I. Development and validation of automated three-dimensional convolutional neural network model for acute appendicitis diagnosis. Sci Rep. 2025;15:7711. |
| 30. | Pati A, Panigrahi A, Nayak DSK, Sahoo G, Singh D. Predicting Pediatric Appendicitis using Ensemble Learning Techniques. Procedia Comput Sci. 2023;218:1166-1175. |
| 31. | Dogan K, Selcuk T. A Novel Deep Learning Approach for the Automatic Diagnosis of Acute Appendicitis. J Clin Med. 2024;13:4949. |
| 32. | Echevarria S, Rauf F, Hussain N, Zaka H, Farwa UE, Ahsan N, Broomfield A, Akbar A, Khawaja UA. Typical and Atypical Presentations of Appendicitis and Their Implications for Diagnosis and Treatment: A Literature Review. Cureus. 2023;15:e37024. |
| 33. | Schipper A, Belgers P, O'Connor R, Jie KE, Dooijes R, Bosma JS, Kurstjens S, Kusters R, van Ginneken B, Rutten M. Machine-learning based prediction of appendicitis for patients presenting with acute abdominal pain at the emergency department. World J Emerg Surg. 2024;19:40. |
| 34. | Roupakias S, Kambouri K, Al Nimer A, Bekiaridou K, Blevrakis E, Tsalikidis C, Sinopidis X. Balancing Between Negative Appendectomy and Complicated Appendicitis: A Persisting Reality Under the Rule of the Uncertainty Principle. Cureus. 2025;17:e81516. |
| 35. | Erman A, Ferreira J, Ashour WA, Guadagno E, St-Louis E, Emil S, Cheung J, Poenaru D. Machine-learning-assisted Preoperative Prediction of Pediatric Appendicitis Severity. J Pediatr Surg. 2025;60:162151. |
| 36. | Roshanaei G, Salimi R, Mahjub H, Faradmal J, Yamini A, Tarokhian A. Accurate diagnosis of acute appendicitis in the emergency department: an artificial intelligence-based approach. Intern Emerg Med. 2024;19:2347-2357. |
| 37. | Yazici H, Ugurlu O, Aygul Y, Ugur MA, Sen YK, Yildirim M. Predicting severity of acute appendicitis with machine learning methods: a simple and promising approach for clinicians. BMC Emerg Med. 2024;24:101. |
| 38. | Gollapalli M, Rahman A, Kudos SA, Foula MS, Alkhalifa AM, Albisher HM, Al-Hariri MT, Mohammad N. Appendicitis Diagnosis: Ensemble Machine Learning and Explainable Artificial Intelligence-Based Comprehensive Approach. Big Data Cogn Comput. 2024;8:108. |
| 39. | Males I, Boban Z, Kumric M, Vrdoljak J, Berkovic K, Pogorelic Z, Bozic J. Applying an explainable machine learning model might reduce the number of negative appendectomies in pediatric patients with a high probability of acute appendicitis. Sci Rep. 2024;14:12772. |
| 40. | Wei W, Tongping S, Jiaming W. Construction of a clinical prediction model for complicated appendicitis based on machine learning techniques. Sci Rep. 2024;14:16473. |
| 41. | Abu-Ashour W, Emil S, Poenaru D. Using Artificial Intelligence to Label Free-Text Operative and Ultrasound Reports for Grading Pediatric Appendicitis. J Pediatr Surg. 2024;59:783-790. |
| 42. | Chadaga K, Khanna V, Prabhu S, Sampathila N, Chadaga R, Umakanth S, Bhat D, Swathi KS, Kamath R. An interpretable and transparent machine learning framework for appendicitis detection in pediatric patients. Sci Rep. 2024;14:24454. |
| 43. | Akbulut S, Yagin FH, Cicek IB, Koc C, Colak C, Yilmaz S. Prediction of Perforated and Nonperforated Acute Appendicitis Using Machine Learning-Based Explainable Artificial Intelligence. Diagnostics (Basel). 2023;13:1173. |
| 44. | Harmantepe AT, Dikicier E, Gönüllü E, Ozdemir K, Kamburoğlu MB, Yigit M. A different way to diagnosis acute appendicitis: machine learning. Pol Przegl Chir. 2023;96:38-43. |
| 45. | Park SH, Kim YJ, Kim KG, Chung JW, Kim HC, Choi IY, You MW, Lee GP, Hwang JH. Comparison between single and serial computed tomography images in classification of acute appendicitis, acute right-sided diverticulitis, and normal appendix using EfficientNet. PLoS One. 2023;18:e0281498. |
| 46. | Phan-Mai TA, Thai TT, Mai TQ, Vu KA, Mai CC, Nguyen DA. Validity of Machine Learning in Detecting Complicated Appendicitis in a Resource-Limited Setting: Findings from Vietnam. Biomed Res Int. 2023;2023:5013812. |
| 47. | Mijwil MM, Aggarwal K. A diagnostic testing for people with appendicitis using machine learning techniques. Multimed Tools Appl. 2022;81:7011-7023. |
| 48. | Shikha A, Kasem A. The Development and Validation of Artificial Intelligence Pediatric Appendicitis Decision-Tree for Children 0 to 12 Years Old. Eur J Pediatr Surg. 2023;33:395-402. |
| 49. | Su D, Li Q, Zhang T, Veliz P, Chen Y, He K, Mahajan P, Zhang X. Prediction of acute appendicitis among patients with undifferentiated abdominal pain at emergency department. BMC Med Res Methodol. 2022;22:18. |
| 50. | Akgül F, Er A, Ulusoy E, Çağlar A, Çitlenbik H, Keskinoğlu P, Şişman AR, Karakuş OZ, Özer E, Duman M, Yılmaz D. Integration of Physical Examination, Old and New Biomarkers, and Ultrasonography by Using Neural Networks for Pediatric Appendicitis. Pediatr Emerg Care. 2021;37:e1075-e1081. |
| 51. | Xia J, Wang Z, Yang D, Li R, Liang G, Chen H, Heidari AA, Turabieh H, Mafarja M, Pan Z. Performance optimization of support vector machine with oppositional grasshopper optimization for acute appendicitis diagnosis. Comput Biol Med. 2022;143:105206. |
| 52. | Marcinkevičs R, Reis Wolfertstetter P, Klimiene U, Chin-Cheong K, Paschke A, Zerres J, Denzinger M, Niederberger D, Wellmann S, Ozkan E, Knorr C, Vogt JE. Interpretable and intervenable ultrasonography-based machine learning models for pediatric appendicitis. Med Image Anal. 2024;91:103042. |
| 53. | Marcinkevics R, Reis Wolfertstetter P, Wellmann S, Knorr C, Vogt JE. Using Machine Learning to Predict the Diagnosis, Management and Severity of Pediatric Appendicitis. Front Pediatr. 2021;9:662183. |
| 54. | Ghareeb WM, Emile SH, Elshobaky A. Artificial Intelligence Compared to Alvarado Scoring System Alone or Combined with Ultrasound Criteria in the Diagnosis of Acute Appendicitis. J Gastrointest Surg. 2022;26:655-658. |
| 55. | Hayashi K, Ishimaru T, Lee J, Hirai S, Ooke T, Hosokawa T, Omata K, Sanmoto Y, Kakihara T, Kawashima H. Identification of Appendicitis Using Ultrasound with the Aid of Machine Learning. J Laparoendosc Adv Surg Tech A. 2021;31:1412-1419. |
| 56. | Reismann J, Kiss N, Reismann M. The application of artificial intelligence methods to gene expression data for differentiation of uncomplicated and complicated appendicitis in children and adolescents - a proof of concept study. BMC Pediatr. 2021;21:268. |
| 57. | Stiel C, Elrod J, Klinke M, Herrmann J, Junge CM, Ghadban T, Reinshagen K, Boettcher M. The Modified Heidelberg and the AI Appendicitis Score Are Superior to Current Scores in Predicting Appendicitis in Children: A Two-Center Cohort Study. Front Pediatr. 2020;8:592892. |
| 58. | Akmese OF, Dogan G, Kor H, Erbay H, Demir E. The Use of Machine Learning Approaches for the Diagnosis of Acute Appendicitis. Emerg Med Int. 2020;2020:7306435. |
| 59. | Aydin E, Türkmen İU, Namli G, Öztürk Ç, Esen AB, Eray YN, Eroğlu E, Akova F. A novel and simple machine learning algorithm for preoperative diagnosis of acute appendicitis in children. Pediatr Surg Int. 2020;36:735-742. |
| 60. | Rajpurkar P, Park A, Irvin J, Chute C, Bereket M, Mastrodicasa D, Langlotz CP, Lungren MP, Ng AY, Patel BN. AppendiXNet: Deep Learning for Diagnosis of Appendicitis from A Small Dataset of CT Exams Using Video Pretraining. Sci Rep. 2020;10:3958. |
| 61. | Park JJ, Kim KA, Nam Y, Choi MH, Choi SY, Rhie J. Convolutional-neural-network-based diagnosis of appendicitis via CT scans in patients with acute abdominal pain presenting in the emergency department. Sci Rep. 2020;10:9556. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 15] [Cited by in RCA: 32] [Article Influence: 6.4] [Reference Citation Analysis (0)] |
| 62. | Ramirez-Garcialuna JL, Vera-Bañuelos LR, Guevara-Torres L, Martínez-Jiménez MA, Ortiz-Dosal A, Gonzalez FJ, Kolosovas-Machuca ES. Infrared thermography of abdominal wall in acute appendicitis: Proof of concept study. Infrared Phys Techn. 2020;105:103165. [DOI] [Full Text] |
| 63. | Zhao Y, Yang L, Sun C, Li Y, He Y, Zhang L, Shi T, Wang G, Men X, Sun W, He F, Qin J. Discovery of Urinary Proteomic Signature for Differential Diagnosis of Acute Appendicitis. Biomed Res Int. 2020;2020:3896263. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in RCA: 7] [Reference Citation Analysis (0)] |
| 64. | Kang HJ, Kang H, Kim B, Chae MS, Ha YR, Oh SB, Ahn JH. Evaluation of the diagnostic performance of a decision tree model in suspected acute appendicitis with equivocal preoperative computed tomography findings compared with Alvarado, Eskelinen, and adult appendicitis scores: A STARD compliant article. Medicine (Baltimore). 2019;98:e17368. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 3] [Cited by in RCA: 8] [Article Influence: 1.3] [Reference Citation Analysis (0)] |
| 65. | Reismann J, Romualdi A, Kiss N, Minderjahn MI, Kallarackal J, Schad M, Reismann M. Diagnosis and classification of pediatric acute appendicitis by artificial intelligence methods: An investigator-independent approach. PLoS One. 2019;14:e0222030. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 24] [Cited by in RCA: 53] [Article Influence: 8.8] [Reference Citation Analysis (0)] |
| 66. | Gudelis M, Lacasta Garcia JD, Trujillano Cabello JJ. Diagnosis of pain in the right iliac fossa. A new diagnostic score based on Decision-Tree and Artificial Neural Network Methods. Cir Esp (Engl Ed). 2019;97:329-335. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 4] [Cited by in RCA: 11] [Article Influence: 1.8] [Reference Citation Analysis (0)] |
| 67. | Shahmoradi L, Safdari R, Mirhosseini MM, Arji G, Jannat B, Abdar M. Predicting Risk of Acute Appendicitis: A Comparison of Artificial Neural Network and Logistic Regression Models. Acta Med Iran. 2019;56:784-795. |
| 68. | Afshari Safavi A, Zand Karimi E, Rezaei M, Mohebi H, Mehrvarz S, Khorrami MR. Comparing the accuracy of neural network models and conventional tests in diagnosis of suspected acute appendicitis. J Mazandaran Univ Med Sci. 2015;25:58-65. |
| 69. | Jamshidnezhad A, Azizi A, Zadeh SR, Shirali S, Shoushtari MH, Sabaghan Y, Ziagham V, Attarzadeh M. A Computer Based Model in Comparison with Sonography Imaging to Diagnosis of Acute Appendicitis in Iran. J Acute Med. 2017;7:10-18. |
| 70. | Park SY, Kim SM. Acute appendicitis diagnosis using artificial neural networks. Technol Health Care. 2015;23 Suppl 2:S559-S565. |
| 71. | Yoldaş Ö, Tez M, Karaca T. Artificial neural networks in the diagnosis of acute appendicitis. Am J Emerg Med. 2012;30:1245-1247. |
| 72. | Hsieh CH, Lu RH, Lee NH, Chiu WT, Hsu MH, Li YC. Novel solutions for an old disease: diagnosis of acute appendicitis with random forest, support vector machines, and artificial neural networks. Surgery. 2011;149:87-93. |
| 73. | Prabhudesai SG, Gould S, Rekhraj S, Tekkis PP, Glazer G, Ziprin P. Artificial neural networks: useful aid in diagnosing acute appendicitis. World J Surg. 2008;32:305-309; discussion 310. |
| 74. | Grigull L, Lechner WM. Supporting diagnostic decisions using hybrid and complementary data mining applications: a pilot study in the pediatric emergency department. Pediatr Res. 2012;71:725-731. |
| 75. | Lee YH, Hu PJ, Cheng TH, Huang TC, Chuang WY. A preclustering-based ensemble learning technique for acute appendicitis diagnoses. Artif Intell Med. 2013;58:115-124. |
| 76. | Son CS, Jang BK, Seo ST, Kim MS, Kim YN. A hybrid decision support model to discover informative knowledge in diagnosing acute appendicitis. BMC Med Inform Decis Mak. 2012;12:17. |
| 77. | Ting HW, Wu JT, Chan CL, Lin SL, Chen MH. Decision model for acute appendicitis treatment with decision tree technology--a modification of the Alvarado scoring system. J Chin Med Assoc. 2010;73:401-406. |
| 78. | Sakai S, Kobayashi K, Toyabe S, Mandai N, Kanda T, Akazawa K. Comparison of the levels of accuracy of an artificial neural network model and a logistic regression model for the diagnosis of acute appendicitis. J Med Syst. 2007;31:357-364. |
| 79. | Aparicio PR, Marcinkevics R, Wolfertstetter PR, Wellmann S, Knorr C, Vogt JE. Learning Medical Risk Scores for Pediatric Appendicitis. 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA); 2021 Dec 13-16; Pasadena, CA, United States. IEEE, 2021: 1507-1512. |
| 80. | Bianchi V, Giambusso M, De Iacob A, Chiarello MM, Brisinda G. Artificial intelligence in the diagnosis and treatment of acute appendicitis: a narrative review. Updates Surg. 2024;76:783-792. |
| 81. | Chekmeyan M, Liu SH. Artificial intelligence for the diagnosis of pediatric appendicitis: A systematic review. Am J Emerg Med. 2025;92:18-31. |
| 82. | Rey R, Gualtieri R, La Scala G, Posfay Barbe K. Artificial Intelligence in the Diagnosis and Management of Appendicitis in Pediatric Departments: A Systematic Review. Eur J Pediatr Surg. 2024;34:385-391. |
| 83. | Hua R, O'Brien MK, Carter M, Pitt JB, Kwon S, Ghomrawi HMK, Jayaraman A, Abdullah F. Improving Early Prediction of Abnormal Recovery after Appendectomy in Children using Real-world Data from Wearables. Annu Int Conf IEEE Eng Med Biol Soc. 2024;2024:1-4. |
| 84. | Sibic O, Somuncu E, Yilmaz S, Avsar E, Bozdag E, Ozcan A, Aydin MO, Ozkan C. Diagnosis of Acute Appendicitis with Machine Learning-Based Computer Tomography: Diagnostic Reliability and Role in Clinical Management. J Laparoendosc Adv Surg Tech A. 2025;35:313-317. |
| 85. | Singh D, Nagaraj S, Mashouri P, Drysdale E, Fischer J, Goldenberg A, Brudno M. Assessment of Machine Learning-Based Medical Directives to Expedite Care in Pediatric Emergency Medicine. JAMA Netw Open. 2022;5:e222599. |
| 86. | Kucukakcali Z, Akbulut S. Role of immature granulocyte and blood biomarkers in predicting perforated acute appendicitis using machine learning model. World J Clin Cases. 2025;13:104379. |
| 87. | Kucukakcali Z, Akbulut S, Colak C. Evaluating Ensemble-Based Machine Learning Models for Diagnosing Pediatric Acute Appendicitis: Insights from a Retrospective Observational Study. J Clin Med. 2025;14:4264. |
| 88. | Kendall J, Gaspar G, Berger D, Levman J. Machine Learning and Feature Selection in Pediatric Appendicitis. Tomography. 2025;11:90. |
| 89. | Aydın E, Sarnıç TE, Türkmen İU, Khanmammadova N, Ateş U, Öztan MO, Sekmenli T, Aras NF, Öztaş T, Yalçınkaya A, Özbek M, Gökçe D, Yalçın Cömert HS, Uzunlu O, Kandırıcı A, Ertürk N, Süzen A, Akova F, Paşaoğlu M, Eroğlu E, Göllü Bahadır G, Çakmak AM, Bilici S, Karabulut R, İmamoğlu M, Sarıhan H, Karakuş SC. Diagnostic Accuracy of a Machine Learning-Derived Appendicitis Score in Children: A Multicenter Validation Study. Children (Basel). 2025;12:937. |
| 90. | Zhao Y, Wang X, Zhang Y, Liu T, Zuo S, Sun L, Zhang J, Wang K, Liu J. Combination of clinical information and radiomics models for the differentiation of acute simple appendicitis and non simple appendicitis on CT images. Sci Rep. 2024;14:1854. |
| 91. | Liang D, Fan Y, Zeng Y, Zhou H, Zhou H, Li G, Liang Y, Zhong Z, Chen D, Chen A, Li G, Deng J, Huang B, Wei X. Development and Validation of a Deep Learning and Radiomics Combined Model for Differentiating Complicated From Uncomplicated Acute Appendicitis. Acad Radiol. 2024;31:1344-1354. |
| 92. | Li P, Zhang Z, Weng S, Nie H. Establishment of predictive models for acute complicated appendicitis during pregnancy-A retrospective case-control study. Int J Gynaecol Obstet. 2023;162:744-751. |
| 93. | Lin HA, Lin LT, Lin SF. Application of Artificial Neural Network Models to Differentiate Between Complicated and Uncomplicated Acute Appendicitis. J Med Syst. 2023;47:38. |
| 94. | Iliou T, Anagnostopoulos C, Stephanakis IM, Anastassopoulos G. Combined Classification of Risk Factors for Appendicitis Prediction in Childhood. In: Iliadis L, Papadopoulos H, Jayne C, editors. Engineering Applications of Neural Networks. Berlin: Springer, 2013: 203-211. |
| 95. | Deleger L, Brodzinski H, Zhai H, Li Q, Lingren T, Kirkendall ES, Alessandrini E, Solti I. Developing and evaluating an automated appendicitis risk stratification algorithm for pediatric patients in the emergency department. J Am Med Inform Assoc. 2013;20:e212-e220. |
| 96. | Malley JD, Kruppa J, Dasgupta A, Malley KG, Ziegler A. Probability machines: consistent probability estimation using nonparametric learning machines. Methods Inf Med. 2012;51:74-81. |
| 97. | Forsström JJ, Irjala K, Selén G, Nyström M, Eklund P. Using data preprocessing and single layer perceptron to analyze laboratory data. Scand J Clin Lab Invest Suppl. 1995;222:75-81. |
| 98. | Pesonen E, Eskelinen M, Juhola M. Comparison of different neural network algorithms in the diagnosis of acute appendicitis. Int J Biomed Comput. 1996;40:227-233. |
| 99. | Alowais SA, Alghamdi SS, Alsuhebany N, Alqahtani T, Alshaya AI, Almohareb SN, Aldairem A, Alrashed M, Bin Saleh K, Badreldin HA, Al Yami MS, Al Harbi S, Albekairy AM. Revolutionizing healthcare: the role of artificial intelligence in clinical practice. BMC Med Educ. 2023;23:689. |
| 100. | Utmal DM. Machine Learning Its Applications, Challenges & Tools: A Review. Int J Comput Sci Mob Comput. 2021;10:32-38. |
| 101. | Yadalam PK, Thirukkumaran PV, Natarajan PM, Ardila CM. Light gradient boost tree classifier predictions on appendicitis with periodontal disease from biochemical and clinical parameters. Front Oral Health. 2024;5:1462873. |
| 102. | Obaido G, Mienye ID, Egbelowo OF, Emmanuel ID, Ogunleye A, Ogbuokiri B, Mienye P, Aruleba K. Supervised machine learning in drug discovery and development: Algorithms, applications, challenges, and prospects. Mach Learn Appl. 2024;17:100576. |
| 103. | Sarker IH. Machine Learning: Algorithms, Real-World Applications and Research Directions. SN Comput Sci. 2021;2:160. |
| 104. | Sarker IH. Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions. SN Comput Sci. 2021;2:420. |
| 105. | Li M, Jiang Y, Zhang Y, Zhu H. Medical image analysis using deep learning algorithms. Front Public Health. 2023;11:1273253. |
| 106. | Jiang F, Jiang Y, Zhi H, Dong Y, Li H, Ma S, Wang Y, Dong Q, Shen H, Wang Y. Artificial intelligence in healthcare: past, present and future. Stroke Vasc Neurol. 2017;2:230-243. |
| 107. | Wells L, Bednarz T. Explainable AI and Reinforcement Learning-A Systematic Review of Current Approaches and Trends. Front Artif Intell. 2021;4:550030. |
| 108. | Dazeley R, Vamplew P, Cruz F. Explainable Reinforcement Learning for Broad-XAI: A Conceptual Framework and Survey. Available from: arXiv:2108.09003. |
| 109. | Yurdem B, Kuzlu M, Gullu MK, Catak FO, Tabassum M. Federated learning: Overview, strategies, applications, tools and future directions. Heliyon. 2024;10:e38137. |
| 110. | Kyrimi E, Dube K, Fenton N, Fahmi A, Neves MR, Marsh W, McLachlan S. Bayesian networks in healthcare: What is preventing their adoption? Artif Intell Med. 2021;116:102079. |
| 111. | Denecke K, May R, Rivera-Romero O. Transformer Models in Healthcare: A Survey and Thematic Analysis of Potentials, Shortcomings and Risks. J Med Syst. 2024;48:23. |
| 112. | Oss Boll H, Amirahmadi A, Ghazani MM, Morais WO, Freitas EP, Soliman A, Etminani F, Byttner S, Recamonde-Mendoza M. Graph neural networks for clinical risk prediction based on electronic health records: A survey. J Biomed Inform. 2024;151:104616. |
| 113. | Nagarajah T, Poravi G. A Review on Automated Machine Learning (AutoML) Systems. 2019 IEEE 5th International Conference for Convergence in Technology (I2CT); 2019 Mar 29-31; Bombay, India. IEEE, 2019: 1-6. |
| 114. | Anandalwar SP, Callahan MJ, Bachur RG, Feng C, Sidhwa F, Karki M, Taylor GA, Rangel SJ. Use of White Blood Cell Count and Polymorphonuclear Leukocyte Differential to Improve the Predictive Value of Ultrasound for Suspected Appendicitis in Children. J Am Coll Surg. 2015;220:1010-1017. |
| 115. | Hao TK, Chung NT, Huy HQ, Linh NTM, Xuan NT. Combining Ultrasound with a Pediatric Appendicitis Score to Distinguish Complicated from Uncomplicated Appendicitis in a Pediatric Population. Acta Inform Med. 2020;28:114-118. |
| 116. | Sadeghi Z, Alizadehsani R, Cifci MA, Kausar S, Rehman R, Mahanta P, Bora PK, Almasri A, Alkhawaldeh RS, Hussain S, Alatas B, Shoeibi A, Moosaei H, Hladík M, Nahavandi S, Pardalos PM. A review of Explainable Artificial Intelligence in healthcare. Comput Electr Eng. 2024;118:109370. |
| 117. | Alzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O, Santamaría J, Fadhel MA, Al-Amidie M, Farhan L. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data. 2021;8:53. |
| 118. | Cappuccio M, Bianco P, Rotondo M, Spiezia S, D'Ambrosio M, Menegon Tasselli F, Guerra G, Avella P. Current use of artificial intelligence in the diagnosis and management of acute appendicitis. Minerva Surg. 2024;79:326-338. |
| 119. | Arora A, Alderman JE, Palmer J, Ganapathi S, Laws E, McCradden MD, Oakden-Rayner L, Pfohl SR, Ghassemi M, McKay F, Treanor D, Rostamzadeh N, Mateen B, Gath J, Adebajo AO, Kuku S, Matin R, Heller K, Sapey E, Sebire NJ, Cole-Lewis H, Calvert M, Denniston A, Liu X. The value of standards for health datasets in artificial intelligence-based applications. Nat Med. 2023;29:2929-2938. |
| 120. | Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 2019;17:195. |
| 121. | Adeniran AA, Onebunne AP, William P. Explainable AI (XAI) in healthcare: Enhancing trust and transparency in critical decision-making. World J Adv Res Rev. 2024;23:2447-2658. |
| 122. | Han W, Li W, Zhang H. A comprehensive review on the fundamental principles, innovative designs, and multidisciplinary applications of micromixers. Phys Fluids. 2024;36:101306. |
| 123. | Chen X, Tang T, Zhai J, Liang A, Li X, Chen X. Bioinspired Leaf-Vein Micromixer for a Rapid and Efficient Synthesis of Monodisperse Ciprofloxacin Lipid Nanoparticles. Langmuir. 2025;41:19572-19581. |
| 124. | Han W, Li W, Zhang H. Insight into mixing performance of bionic fractal baffle micromixers based on Murray's Law. Int Commun Heat Mass. 2024;157:107843. |
| 125. | Hamilton A. Artificial Intelligence and Healthcare Simulation: The Shifting Landscape of Medical Education. Cureus. 2024;16:e59747. |
