Observational Study
Copyright ©The Author(s) 2025. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Gastrointest Endosc. Jul 16, 2025; 17(7): 108307
Published online Jul 16, 2025. doi: 10.4253/wjge.v17.i7.108307
Construction and validation of a machine learning algorithm-based predictive model for difficult colonoscopy insertion
Ren-Xuan Gao, Xin-Lei Wang, Ming-Jie Tian, Xiao-Ming Li, Jia-Jia Zhang, Jun-Jing Wang, Jing Gao, Chao Zhang, Zhi-Ting Li
Ren-Xuan Gao, Zhi-Ting Li, Department of Gastroenterology, North China University of Science and Technology Affiliated Hospital, Tangshan 063000, Hebei Province, China
Xin-Lei Wang, Department of Gastroenterology, Tangshan Fengrun District People's Hospital, Tangshan 064000, Hebei Province, China
Ming-Jie Tian, Jia-Jia Zhang, Jun-Jing Wang, Chao Zhang, School of Clinical Medicine, North China University of Science and Technology, Tangshan 063000, Hebei Province, China
Xiao-Ming Li, North China University of Science and Technology, School of Public Health, Tangshan 063000, Hebei Province, China
Jing Gao, Department of Gastroenterology, Tangshan Maternal and Child Health Hospital, Tangshan 063000, Hebei Province, China
Co-first authors: Ren-Xuan Gao and Xin-Lei Wang.
Co-corresponding authors: Chao Zhang and Zhi-Ting Li.
Author contributions: Gao RX and Wang XL performed the data analysis and wrote the manuscript; Tian MJ and Li XM performed the data curation; Zhang JJ, Wang JJ, and Gao J collected the data; all authors have read and approved the final manuscript.
Supported by Natural Science Foundation of Hebei Province, No. H2020209166.
Institutional review board statement: This study has been registered at the Chinese Clinical Trial Registry (No. ChiCTR2000040109) and approved by the Hospital Ethics Committee (No. 20210130017).
Informed consent statement: Consent was obtained from all patients before their data were collected.
Conflict-of-interest statement: The authors declare no conflicts of interest with respect to the research, authorship, and/or publication of this article.
STROBE statement: The authors have read the STROBE Statement checklist of items, and the manuscript was prepared and revised according to it.
Data sharing statement: All data and code associated with this study have been deposited in GitHub and are publicly available at: https://github.com/chao2025/data.
Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Chao Zhang, Associate Professor, Chief Physician, School of Clinical Medicine, North China University of Science and Technology, Construction South Road, Tangshan 063000, Hebei Province, China. handsomechao2025@126.com
Received: April 11, 2025
Revised: May 7, 2025
Accepted: May 30, 2025
Published online: July 16, 2025
Processing time: 90 Days and 4.9 Hours
Abstract
BACKGROUND

Difficulty of colonoscopy insertion (DCI) significantly affects colonoscopy effectiveness and serves as a key quality indicator. Predicting and evaluating DCI risk preoperatively is crucial for optimizing intraoperative strategies.

AIM

To evaluate the predictive performance of machine learning (ML) algorithms for DCI by comparing three modeling approaches, identify factors influencing DCI, and develop a preoperative prediction model using ML algorithms to enhance colonoscopy quality and efficiency.

METHODS

This cross-sectional study enrolled 712 patients who underwent colonoscopy at a tertiary hospital between June 2020 and May 2021. Demographic data, past medical history, medication use, and psychological status were collected. The endoscopist rated insertion difficulty on a visual analogue scale. After univariate screening, predictive models were developed using multivariable logistic regression, least absolute shrinkage and selection operator (LASSO) regression, and random forest (RF) algorithms. Model performance was evaluated in terms of discrimination, calibration, and clinical utility by decision curve analysis (DCA), and the results were visualized using nomograms.
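The abstract does not spell out the formulas behind its evaluation criteria. As a minimal sketch, assuming the standard definitions, the code below computes two of them: the AUC (discrimination), taken as the probability that a randomly chosen DCI case receives a higher predicted risk than a randomly chosen non-DCI case, and the decision-curve net benefit at a threshold probability p_t, NB = TP/n - (FP/n) × p_t/(1 - p_t). The function names and the eight-patient toy data are hypothetical and are not taken from the study or its GitHub repository.

```python
from typing import Sequence


def auc_pairwise(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as 0.5 (the Mann-Whitney formulation)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], probs: Sequence[float], pt: float) -> float:
    """Net benefit when every patient with predicted risk >= pt is flagged:
    NB = TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 <= pt < 1.0:
        raise ValueError("threshold probability must be in [0, 1)")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= pt and y == 0)
    return tp / n - fp / n * pt / (1.0 - pt)


def net_benefit_treat_all(y_true: Sequence[int], pt: float) -> float:
    """Reference strategy in a decision curve: flag every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)


# Hypothetical toy data: 1 = difficult insertion, 0 = not difficult.
y = [1, 1, 1, 0, 0, 0, 0, 1]
risk = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.3, 0.8]

print("AUC:", auc_pairwise(y, risk))
for pt in (0.1, 0.3, 0.5, 0.7):
    print(f"pt={pt:.1f}  model NB={net_benefit(y, risk, pt):.3f}  "
          f"treat-all NB={net_benefit_treat_all(y, pt):.3f}")
```

On the toy data this gives an AUC of 15/16 = 0.9375 and a net benefit of 0.25 at p_t = 0.5, against 0 for flagging everyone. In a decision curve, a model adds value over the range of thresholds where its net benefit exceeds both the treat-all curve and the treat-none line (NB = 0).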

RESULTS

A total of 712 patients (53.8% male; mean age 54.5 years ± 12.9 years) were included. Multivariable logistic regression identified constipation [odds ratio (OR) = 2.254, 95% confidence interval (CI): 1.289-3.931], abdominal circumference (AC) (77.5-91.9 cm: OR = 1.895, 95%CI: 1.065-3.350; ≥ 92 cm: OR = 1.271, 95%CI: 0.730-2.188), and anxiety (OR = 1.071, 95%CI: 1.044-1.100) as predictors of DCI, and these factors were also supported by the LASSO and RF analyses. For the logistic regression, LASSO, and RF models, respectively, the training/validation sensitivities were 0.826/0.925, 0.924/0.868, and 1.000/0.981; the specificities were 0.602/0.511, 0.510/0.562, and 0.977/0.526; and the areas under the receiver operating characteristic curve (AUCs) were 0.780 (0.737-0.823)/0.726 (0.654-0.799), 0.754 (0.710-0.798)/0.723 (0.656-0.791), and 1.000 (1.000-1.000)/0.754 (0.688-0.820). DCA indicated the greatest net benefit within threshold probability ranges of 0-0.9 and 0.05-0.37. The RF model achieved the highest validation AUC (0.754) and validation sensitivity (0.981), outperforming the other methods in clinical applicability, although the gap between its training AUC (1.000) and validation AUC suggests overfitting to the training data.
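In a logistic model each odds ratio is exp(β), and a Wald 95% CI is exp(β ± 1.96 × SE), so each reported interval should be close to symmetric around ln(OR) on the log scale. The sketch below assumes Wald intervals (the abstract does not say how the CIs were computed) and back-calculates β, SE, and the Wald z from the figures reported above. It is a consistency check on the abstract's numbers, not a re-analysis of the study data, and the helper name `wald_from_or_ci` is hypothetical.

```python
import math

Z_975 = 1.959963984540054  # two-sided 95% standard-normal quantile


def wald_from_or_ci(odds_ratio: float, lower: float, upper: float):
    """Back-calculate the log-odds coefficient, its standard error, the Wald z,
    and the log-scale asymmetry of a 95% CI, assuming a Wald interval."""
    beta = math.log(odds_ratio)
    log_lo, log_hi = math.log(lower), math.log(upper)
    se = (log_hi - log_lo) / (2 * Z_975)
    asymmetry = beta - (log_lo + log_hi) / 2  # close to 0 for a Wald interval
    return beta, se, beta / se, asymmetry


# (OR, lower 95% CI, upper 95% CI) as reported in the abstract.
REPORTED = {
    "constipation": (2.254, 1.289, 3.931),
    "AC 77.5-91.9 cm": (1.895, 1.065, 3.350),
    "AC >= 92 cm": (1.271, 0.730, 2.188),
    "anxiety": (1.071, 1.044, 1.100),
}

for name, (or_, lo, hi) in REPORTED.items():
    beta, se, z, asym = wald_from_or_ci(or_, lo, hi)
    excludes_one = lo > 1.0 or hi < 1.0
    print(f"{name:16s} beta={beta:+.3f} SE={se:.3f} z={z:5.2f} "
          f"asym={asym:+.4f} CI excludes 1: {excludes_one}")
```

With these inputs, constipation, the 77.5-91.9 cm AC band, and anxiety all have z above 1.96, whereas the ≥ 92 cm band has z of roughly 0.86 and a CI that spans 1, so that category's association is not statistically significant at the 5% level.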

CONCLUSION

The RF-based model showed better predictive accuracy for DCI than the multivariable logistic regression and LASSO regression models. This approach supports individualized preoperative optimization, enhancing colonoscopy quality through targeted risk stratification.

Keywords: Colonoscopy; Difficulty of colonoscopy insertion; Machine learning algorithms; Predictive model; Logistic regression; Least absolute shrinkage and selection operator regression; Random forest

Core Tip: This study developed machine learning models to predict the difficulty of colonoscopy insertion using abdominal circumference, constipation, anxiety, and clinical history. In a cohort of 712 patients, the random forest model performed best, with high sensitivity and favorable clinical utility. It integrates anatomical, psychological, and medical-history factors into a novel preoperative risk-stratification tool intended to improve procedural success and patient comfort. This approach supports tailored interventions, improving colonoscopy quality through personalized risk assessment.