Published online Jul 28, 2020. doi: 10.35712/aig.v1.i1.30
Peer-review started: April 23, 2020
First decision: June 4, 2020
Revised: June 15, 2020
Accepted: June 17, 2020
Article in press: June 17, 2020
The use of machine learning (ML) to predict colonoscopy procedure duration has not been examined.
To assess whether ML, using data available at the time a colonoscopy procedure is scheduled, could estimate procedure duration more accurately than the current practice.
A total of 40168 colonoscopies from the Clinical Outcomes Research Initiative database were collected. ML models predicting procedure duration were developed using data available at the time of scheduling. The top-performing model was compared against historical practice. Models were evaluated on accuracy, defined as the share of predictions whose error (prediction – actual time) fell within ± 5, 10, and 15 min.
ML outperformed historical practice, with accuracies of 77.1% vs 68.9%, 87.3% vs 79.6%, and 92.1% vs 86.8% at the 5, 10, and 15 min thresholds, respectively.
The use of ML to estimate colonoscopy procedure duration may lead to more accurate scheduling.
Core tip: Machine learning has been utilized to predict surgical procedure duration and enhance operating room proficiency; however, its usefulness for predicting colonoscopy procedure duration has not been examined. Procedure duration predictions from a machine learning algorithm trained on data from the Clinical Outcomes Research Initiative database outperformed historical practice.
- Citation: Podboy AJ, Scheinker D. Machine learning better predicts colonoscopy duration. Artif Intell Gastroenterol 2020; 1(1): 30-36
- URL: https://www.wjgnet.com/2644-3236/full/v1/i1/30.htm
- DOI: https://dx.doi.org/10.35712/aig.v1.i1.30
Current colonoscopy scheduling models utilize either historical averages or predetermined time allotments (usually 30-45 min). Scheduling has not evolved to incorporate patient information, case complexity, procedure environment, or operator proficiency. Failure to account for these variables can lead to significant misjudgments of procedural duration. These errors can result in both under- and overutilization of endoscopy room time, leading to increased cost, misappropriation of endoscopy resources, delays, and decreased patient and provider satisfaction[1]. Machine learning (ML) has been utilized to predict surgical procedure duration and enhance operating room proficiency; however, its usefulness for predicting colonoscopy procedure duration has not been examined[2,3].
Our aim was to assess if ML and data available at the time a colonoscopy procedure is scheduled could be used to estimate procedure duration more accurately than the current practice.
The Clinical Outcomes Research Initiative (CORIv.4) database was queried for all colonoscopies with complete procedural duration times from 2008-2014 following approval from our institutional review board.
The CORI database is a national central repository of endoscopic procedures from a physician network of academic, community, and Veterans Administration hospitals and practices. The details of the repository can be found in previous publications[4]. ML models were trained on variables that were available prior to the procedure and had < 20% missing values. Procedures with a duration of < 5 min or > 280 min were excluded. All statistical analyses were performed in RStudio version 3.5.3 (Boston, Massachusetts). Eighty percent of the cases were used as training data, and the remaining 20% were used to compare the performance of the models. To reduce skew in the data, the target variable (procedural duration) was logarithmically transformed, in line with previous publications[3,5].
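The data preparation steps described above can be illustrated with a minimal Python sketch. It is not the authors' R code. The record layout, the field names (`duration_min`, `age`, `bmi`), the use of `None` for missing values, the random shuffle, and the natural logarithm are all assumptions made for illustration. The toy records are invented.

```python
import math
import random

def usable_variables(records, max_missing=0.20):
    """Return variable names whose share of missing (None) values is below max_missing."""
    names = sorted({key for record in records for key in record})
    keep = []
    for name in names:
        missing = sum(1 for record in records if record.get(name) is None)
        if missing / len(records) < max_missing:
            keep.append(name)
    return keep

def prepare(records, min_minutes=5, max_minutes=280, train_fraction=0.8, seed=0):
    """Drop out-of-range durations, log-transform the target, and split 80/20."""
    kept = [r for r in records if min_minutes <= r["duration_min"] <= max_minutes]
    # Natural log is an assumption; the text only says "logarithmically transformed".
    rows = [dict(r, log_duration=math.log(r["duration_min"])) for r in kept]
    random.Random(seed).shuffle(rows)  # assumed random split
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

# Invented toy records: "bmi" is missing in half of them, two durations are out of range.
durations = [3, 12, 18, 22, 25, 31, 40, 55, 90, 300, 16, 28]
toy = [{"duration_min": d, "age": 50 + i, "bmi": None if i % 2 else 27.0}
       for i, d in enumerate(durations)]

variables = usable_variables(toy)
train, test = prepare(toy)
print(variables, len(train), len(test))
```

In this sketch a prediction made on the log scale would be converted back to minutes with `math.exp` before being compared with the actual duration.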
Following established methodology[3,5,6], several models were tuned using cross-validation to predict procedure duration. The models included random forest, gradient boosting machine, least absolute shrinkage and selection operator (LASSO), and extreme gradient boosting (xgboost). The best-performing model was selected on the basis of the lowest root mean squared error and was then trained on historical data (2008-2013) to predict “current” data (2014). Predictions derived from the best-performing model were compared with the current standard of using historical means. Models were evaluated on accuracy, defined as the share of predictions whose error (prediction – actual time) fell within thresholds of 5, 10, and 15 min, chosen to account for operational considerations.
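The evaluation metrics described above can be sketched as follows. This is an illustrative Python example, not the authors' implementation. The function names and all durations are hypothetical. Treating the threshold as inclusive (|prediction – actual| ≤ threshold) and using a single overall mean as the historical baseline are assumptions, since the text does not specify either.

```python
import math

def within_threshold_accuracy(predicted, actual, threshold_min):
    """Share of cases whose absolute error is at most threshold_min minutes."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= threshold_min)
    return hits / len(actual)

def rmse(predicted, actual):
    """Root mean squared error, the criterion used here to rank candidate models."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical durations in minutes: "historical" stands in for 2008-2013,
# "current" for 2014. None of these numbers come from the study.
historical = [20, 25, 22, 30, 18, 24, 21, 24]
current = [19, 27, 23, 41, 16]

# Baseline: every current case is forecast as the historical mean (assumption).
baseline = [sum(historical) / len(historical)] * len(current)
# Invented stand-in for the output of a trained model.
model = [20.0, 25.0, 24.0, 33.0, 18.0]

for threshold in (5, 10, 15):
    print(threshold,
          within_threshold_accuracy(model, current, threshold),
          within_threshold_accuracy(baseline, current, threshold))
print(rmse(model, current), rmse(baseline, current))
```

Reporting accuracy at several thresholds shows how often a forecast would fit a scheduling slot with a given amount of slack, which a single RMSE value does not convey.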
A total of 40168 colonoscopies with procedural duration information from 75 different sites from 2008 to 2014 were obtained. Of these, 32136 (80%) were used for training the algorithm, and the remaining 8032 (20%) were used to compare the performance of the models. Five patient variables (age, gender, race, ASA class, pediatric status), eight provider variables (endoscopist ID, degree of performing provider, degree year of performing provider, specialty of provider, gender and race/ethnicity of the provider, fellow involvement), and twelve procedure-specific variables [procedure year, procedure order, site ID, site type (University vs Community), location of procedure/facility type, duration of procedure, primary indication of procedure, depth intended of the procedure, sedation type used, state, and region] were selected for model analysis and training.
Table 1 summarizes the background characteristics of the final cohort. The best-performing ML algorithm was the xgboost model. Figure 1 depicts the final model’s accuracy. The percentages of procedures for which the xgboost and historical models generated forecasts within the 5, 10, and 15 min thresholds were 77.1% vs 68.9%, 87.3% vs 79.6%, and 92.1% vs 86.8%, respectively (P < 0.001). The most important features of the model were patient age, procedure year, and the degree year of the provider (Figure 2).
Table 1 Background characteristics of the final cohort
| Demographic information | | |
| Total patients | 40168 | |
| Mean age | 58.95 | |
| Sex | Female | 17682 |
| | Male | 22485 (56.0%) |
| ASA Class | I | 7071 |
| | II | 27699 |
| | III | 5237 |
| | IV | 158 |
| | V | 3 |
| Race | Caucasian | 32031 |
| | Hispanic | 2219 |
| | Black | 2193 |
| | Asian | 1140 |
| | Native American | 679 |
| | Other | 1906 |
| Procedural information | | |
| Median procedure year | 2012 | (2008-2014) |
| Total No. of sites | 75 | |
| Fellow involved | 3575 | |
| Indication for procedure | Average risk screening | 12687 |
| | Surveillance of adenomatous polyps | 8213 |
| | Hematochezia | 3795 |
| | High risk screening | 3272 |
| | Anemia | 1508 |
| | Diarrhea | 1469 |
| | Other | 9224 |
| Procedure order | 1st | 37864 |
| | 2nd | 2056 |
| | Other | 248 |
| Mean duration of procedure | 23.4 min | |
| Depth intended | Cecum | 31745 |
| | Terminal Ileum | 6798 |
| | Ascending colon | 570 |
| | Ileum | 424 |
| | Anastomosis site | 447 |
| | Other | 163 |
| Location of the procedure | Hospital endoscopy suite | 15589 |
| | Ambulatory surgery center | 14730 |
| | Unknown | 5739 |
| | Office | 2501 |
| | Endoscopy suite | 1450 |
| | ICU | 88 |
| Region | North Central | 3490 |
| | Northeast | 11156 |
| | Northwest | 12329 |
| | South Central | 776 |
| | South East | 1466 |
| | South West | 10947 |
| Site type | Community | 25133 |
| | HMO | 1000 |
| | University | 5676 |
| | VA | 8359 |
| Sedation | None | 241 |
| | Moderate/Conscious sedation | 28009 |
| | “Deep” Sedation | 7289 |
| | General Anesthesia | 2510 |
| | Anxiolytic Sedation | 78 |
| Provider information | | |
| Gender of provider | Female | 9881 |
| | Male | 30287 |
| Median degree year of provider | 1989 | (1962-2009) |
| Degree of performing provider | DO | 1253 |
| | MD | 38851 |
| | PA | 64 |
| Provider specialty | Gastroenterology | 33059 |
| | Surgery | 2976 |
| | Colorectal surgery | 995 |
| | Internal medicine | 1589 |
| | Family medicine | 581 |
| | Other | 968 |
| Ethnicity of provider | Hispanic | 419 |
| | Non-Hispanic | 37148 |
We demonstrated that machine learning predicts colonoscopy procedure duration more accurately than the currently accepted standard practice and the improvement was greater as the tolerance for error decreased.
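The statement that the improvement grew as the tolerance for error decreased can be checked directly from the reported percentages. The short Python sketch below uses only the accuracies reported in this article. The variable names are illustrative.

```python
# Reported accuracy (%) at each threshold in minutes: (ML model, historical means).
results = {5: (77.1, 68.9), 10: (87.3, 79.6), 15: (92.1, 86.8)}

# Gain of the ML model over historical means, in percentage points.
gains = {threshold: round(ml - hist, 1) for threshold, (ml, hist) in results.items()}

thresholds = sorted(gains)
# True when every tighter threshold shows a larger gain than the next looser one.
gain_shrinks_with_tolerance = all(
    gains[tight] > gains[loose] for tight, loose in zip(thresholds, thresholds[1:])
)

for threshold in thresholds:
    print(f"within {threshold} min: +{gains[threshold]} percentage points")
print(gain_shrinks_with_tolerance)
```

The gains are 8.2, 7.7, and 5.3 percentage points at the 5, 10, and 15 min thresholds, so the advantage of the ML model is largest at the tightest tolerance.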
Our results are consistent with similar applications of ML algorithms. Bartek et al[6] compared two standard practices, average historical procedure duration and surgeon estimates of procedural duration, with predictions derived from an ML model. Using a 10% accuracy threshold, the ML algorithm outperformed both traditional practices (39% ML vs 32% surgeon derived and 30% historical means). In an analysis of feature importance, the authors noted that fundamental case information, such as the mean duration of the last ten procedures, was the most important predictive feature, with patient health metrics having a smaller total impact. Our results, however, suggest that patient-specific factors may play a greater role in determining colonoscopy procedure duration. While provider and procedural factors again demonstrated high importance, patient-specific factors (such as age and female sex) factored substantially into our model’s final predictions.
There are several strengths to our analysis. A large number of colonoscopies from a national repository of endoscopic procedures composed of a wide array of procedures, patients, and providers from an assortment of practice environments were analyzed. Inclusion of a national database increases generalizability by limiting regional or practice related biases.
However, there are several limitations to our analysis. Procedure reporting to the CORI database is voluntary, and there may be an inherent selection bias in which easier colonoscopies were more likely to be reported to the database. This is supported by the relatively short overall procedural duration in our cohort. While the effects of a longer average procedure duration on our model are unknown, we anticipate that the ML model would be more resilient to increased error than historical means, which would further widen its accuracy advantage over traditional practice.
While the algorithm was successful, it largely represents a rudimentary proof of concept. Several variables that have been associated with difficult or lengthy colonoscopies in previous reports[7] were either not available or too incomplete in the current data set to allow for inclusion in our analysis. Adding variables associated with difficult colonoscopies, including body mass index, previous abdominal or pelvic surgeries, bowel habits, weight, and height, would potentially improve the model’s accuracy.
The use of an algorithm trained on prospectively collected data with greater provider, environmental, patient, and procedural information may lead to improvements in colonoscopy procedure scheduling. Such improvements may contribute to improved efficiency, patient and provider satisfaction, and reduced costs. Further study is necessary to examine the implications of the deployment of such a model in a clinical setting, and assess if such models can be used in other gastrointestinal procedures.
The usefulness of machine learning (ML) for predicting colonoscopy procedure duration has not been examined.
An ML algorithm trained on endoscopic data derived from the Clinical Outcomes Research Initiative database predicted colonoscopy procedure duration more accurately than the currently accepted standard practice, and the improvement was greater as the tolerance for error decreased.
The aim of this study was to assess if ML and data available at the time a colonoscopy procedure is scheduled could be used to estimate procedure duration more accurately than the current practice.
A total of 40168 colonoscopies were collected. ML models predicting procedure duration were developed using data available at the time of scheduling. The top-performing model was compared against historical practice.
ML outperformed historical practice, with accuracies of 77.1% vs 68.9%, 87.3% vs 79.6%, and 92.1% vs 86.8% at the 5, 10, and 15 min thresholds, respectively. The most important features of the model were patient age, procedure year, and the degree year of the provider.
The use of ML to estimate colonoscopy procedure duration may lead to more accurate scheduling.
Further study is necessary to examine the implications of the deployment of such a model in a clinical setting, and assess if such models can be used in other gastrointestinal procedures.
Manuscript source: Unsolicited manuscript
Specialty type: Gastroenterology and Hepatology
Country/Territory of origin: United States
Peer-review report’s scientific quality classification
Grade A (Excellent): 0
Grade B (Very good): 0
Grade C (Good): C
Grade D (Fair): 0
Grade E (Poor): 0
P-Reviewer: Mohamed SY S-Editor: Wang JL L-Editor: A E-Editor: Ma YJ
1. Almeida R, Paterson WG, Craig N, Hookey L. A Patient Flow Analysis: Identification of Process Inefficiencies and Workflow Metrics at an Ambulatory Endoscopy Unit. Can J Gastroenterol Hepatol. 2016;2016:2574076.
2. Stepaniak PS, Heij C, Mannaerts GH, de Quelerij M, de Vries G. Modeling procedure and surgical times for current procedural terminology-anesthesia-surgeon combinations and evaluation in terms of case-duration prediction and operating room efficiency: a multicenter study. Anesth Analg. 2009;109:1232-1245.
3. Master N, Zhou Z, Miller D, Scheinker D, Bambos N, Glynn P. Improving predictions of pediatric surgical durations with supervised learning. Int J Data Sci Anal. 2017;4:33-52.
4. Holub JL, Morris C, Fagnan LJ, Logan JR, Michaels LC, Lieberman DA. Quality of Colonoscopy Performed in Rural Practice: Experience From the Clinical Outcomes Research Initiative and the Oregon Rural Practice-Based Research Network. J Rural Health. 2018;34 Suppl 1:s75-s83.
5. Scheinker D, Valencia A, Rodriguez F. Identification of Factors Associated With Variation in US County-Level Obesity Prevalence Rates Using Epidemiologic vs Machine Learning Models. JAMA Netw Open. 2019;2:e192884.
6. Bartek MA, Saxena RC, Solomon S, Fong CT, Behara LD, Venigandla R, Velagapudi K, Lang JD, Nair BG. Improving Operating Room Efficiency: Machine Learning Approach to Predict Case-Time Duration. J Am Coll Surg. 2019;229:346-354.e3.
7. Anderson JC, Messina CR, Cohn W, Gottfried E, Ingber S, Bernstein G, Coman E, Polito J. Factors predictive of difficult colonoscopy. Gastrointest Endosc. 2001;54:558-562.