Construction and validation of a machine learning algorithm-based predictive model for difficult colonoscopy insertion

doi:10.4253/wjge.v17.i7.108307

Journal: World Journal of Gastrointestinal Endoscopy
ISSN: 1948-5190
Publisher: Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA

Observational Study

World J Gastrointest Endosc. Jul 16, 2025; 17(7): 108307
Published online Jul 16, 2025. doi: 10.4253/wjge.v17.i7.108307

Ren-Xuan Gao, Xin-Lei Wang, Ming-Jie Tian, Xiao-Ming Li, Jia-Jia Zhang, Jun-Jing Wang, Jing Gao, Chao Zhang, Zhi-Ting Li

Ren-Xuan Gao, Zhi-Ting Li, Department of Gastroenterology, North China University of Science and Technology Affiliated Hospital, Tangshan 063000, Hebei Province, China

Xin-Lei Wang, Department of Gastroenterology, Tangshan Fengrun District People's Hospital, Tangshan 064000, Hebei Province, China

Ming-Jie Tian, Jia-Jia Zhang, Jun-Jing Wang, Chao Zhang, School of Clinical Medicine, North China University of Science and Technology, Tangshan 063000, Hebei Province, China

Xiao-Ming Li, North China University of Science and Technology, School of Public Health, Tangshan 063000, Hebei Province, China

Jing Gao, Department of Gastroenterology, Tangshan Maternal and Child Health Hospital, Tangshan 063000, Hebei Province, China

Co-first authors: Ren-Xuan Gao and Xin-Lei Wang.

Co-corresponding authors: Chao Zhang and Zhi-Ting Li.

Author contributions: Gao RX and Wang XL performed the data analysis and wrote the manuscript; Tian MJ and Li XM performed the data curation; Zhang JJ, Wang JJ, and Gao J performed the data collection; all authors have read and approved the final manuscript.

Supported by Natural Science Foundation of Hebei Province, No. H2020209166.

Institutional review board statement: This study has been registered at the Chinese Clinical Trial Registry (No. ChiCTR2000040109) and approved by the Hospital Ethics Committee (No. 20210130017).

Informed consent statement: Informed consent for data collection was obtained from all patients.

Conflict-of-interest statement: The authors have no conflicts of interest with respect to the research, authorship, and/or publication of this article.

STROBE statement: The authors have read the STROBE Statement—checklist of items, and the manuscript was prepared and revised according to the STROBE Statement—checklist of items.

Data sharing statement: All data and code associated with this study have been deposited in GitHub and are publicly available at: https://github.com/chao2025/data.

Corresponding author: Chao Zhang, Associate Professor, Chief Physician, School of Clinical Medicine, North China University of Science and Technology, Construction South Road, Tangshan 063000, Hebei Province, China. handsomechao2025@126.com

Received: April 11, 2025
Revised: May 7, 2025
Accepted: May 30, 2025
Published online: July 16, 2025
Processing time: 90 Days and 4.9 Hours

Abstract

BACKGROUND

Difficulty of colonoscopy insertion (DCI) significantly affects colonoscopy effectiveness and serves as a key quality indicator. Predicting and evaluating DCI risk preoperatively is crucial for optimizing intraoperative strategies.

AIM

To evaluate the predictive performance of machine learning (ML) algorithms for DCI by comparing three modeling approaches, identify factors influencing DCI, and develop a preoperative prediction model using ML algorithms to enhance colonoscopy quality and efficiency.

METHODS

This cross-sectional study enrolled 712 patients who underwent colonoscopy at a tertiary hospital between June 2020 and May 2021. Demographic data, past medical history, medication use, and psychological status were collected. The endoscopist rated DCI on a visual analogue scale. After univariate screening, predictive models were developed using multivariable logistic regression, least absolute shrinkage and selection operator (LASSO) regression, and random forest (RF) algorithms. Model performance was evaluated in terms of discrimination, calibration, and decision curve analysis (DCA), and results were visualized using nomograms.
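The evaluation criteria named above can be made concrete with a small sketch. The code below is illustrative only and is not the authors' code: the labels and predicted probabilities are invented toy values, the function names are hypothetical, and the 0.5 cut-off is an assumption. It computes sensitivity and specificity at a threshold, the AUC as the probability that a randomly chosen difficult case is scored above a randomly chosen non-difficult case, and the net benefit that underlies a decision curve.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) when predicting positive if prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(y_true, y_prob, threshold) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, y_prob) -> float:
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_prob, threshold) -> float:
    """Decision-curve net benefit: TP/n - FP/n * pt / (1 - pt)."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Toy data (invented for illustration; 1 = difficult insertion).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_probs = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.6, 0.7]

sens, spec = sensitivity_specificity(toy_labels, toy_probs, 0.5)
toy_auc = auc(toy_labels, toy_probs)
toy_nb = net_benefit(toy_labels, toy_probs, 0.5)
print(sens, spec, toy_auc, toy_nb)
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing the result with the "treat all" and "treat none" strategies; the pairwise AUC is quadratic in sample size and is used here only for transparency.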

RESULTS

A total of 712 patients (53.8% male; mean age 54.5 ± 12.9 years) were included. Logistic regression analysis identified constipation [odds ratio (OR) = 2.254, 95% confidence interval (CI): 1.289-3.931], abdominal circumference (AC) (77.5–91.9 cm, OR = 1.895, 95%CI: 1.065-3.350; AC ≥ 92 cm, OR = 1.271, 95%CI: 0.730-2.188), and anxiety (OR = 1.071, 95%CI: 1.044-1.100) as predictive factors for DCI, and these were validated by the LASSO and RF methods. For the multivariable logistic regression, LASSO, and RF models, respectively, training/validation sensitivities were 0.826/0.925, 0.924/0.868, and 1.000/0.981; specificities were 0.602/0.511, 0.510/0.562, and 0.977/0.526; and areas under the receiver operating characteristic curve (AUCs) were 0.780 (0.737-0.823)/0.726 (0.654-0.799), 0.754 (0.710-0.798)/0.723 (0.656-0.791), and 1.000 (1.000-1.000)/0.754 (0.688-0.820). DCA indicated optimal net benefit within probability thresholds of 0-0.9 and 0.05-0.37. The RF model demonstrated superior diagnostic accuracy, reflected by perfect training sensitivity (1.000) and the highest validation AUC (0.754), outperforming the other methods in clinical applicability.
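As a reading aid for the odds ratios above, the sketch below shows the standard relation between a logistic regression coefficient and its OR, namely OR = exp(β) with 95%CI = exp(β ± 1.96·SE). It assumes that the anxiety OR of 1.071 applies per one-point increase in the anxiety score and that the effect is linear on the log-odds scale; the abstract does not state the scale, so the ten-point figure is purely illustrative. The function names are hypothetical, and the standard error is back-calculated from the rounded CI, so the reconstructed interval matches the reported one only approximately.

```python
import math
from typing import Tuple


def odds_ratio_ci(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Convert a log-odds coefficient and its SE into (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Back-calculate the SE of the log-odds from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


# Reported values for anxiety: OR = 1.071, 95%CI: 1.044-1.100.
beta_anxiety = math.log(1.071)
se_anxiety = se_from_ci(1.044, 1.100)

or_1, ci_low, ci_high = odds_ratio_ci(beta_anxiety, se_anxiety)

# Assumption: log-linear effect, so a 10-point increase multiplies the odds
# by exp(10 * beta) = 1.071 ** 10.
or_10 = math.exp(10 * beta_anxiety)

print(f"per point: {or_1:.3f} ({ci_low:.3f}-{ci_high:.3f})")
print(f"per 10 points (illustrative): {or_10:.2f}")
```

Under these assumptions a modest per-point OR compounds to roughly a doubling of the odds across ten points, which is why a small-looking OR for a continuous predictor can still matter clinically.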

CONCLUSION

The RF-based model exhibited superior predictive accuracy for DCI compared to multivariable logistic and LASSO regression models. This approach supports individualized preoperative optimization, enhancing colonoscopy quality through targeted risk stratification.

Keywords: Colonoscopy; Difficulty of colonoscopy insertion; Machine learning algorithms; Predictive model; Logistic regression; Least absolute shrinkage and selection operator regression; Random forest

Core Tip: This study developed machine learning models to predict the difficulty of colonoscopy insertion using abdominal circumference, constipation, anxiety, and clinical history. Among the 712 patients, the random forest model achieved optimal performance, demonstrating high sensitivity and clinical utility. It uniquely integrates anatomical, psychological, and medical factors, offering a novel preoperative risk-stratification tool to enhance procedural success and patient comfort. This approach supports tailored interventions, improving colonoscopy quality through personalized risk assessment.