TO THE EDITOR
We were delighted to read the recent minireview by Ding et al[1] published in Artificial Intelligence in Gastrointestinal Endoscopy. This minireview provides the reader with an introduction to the applications of artificial intelligence (AI) in gastrointestinal endoscopy. It also presents the limitations of operator-dependent endoscopy, positioning deep learning models, such as convolutional neural networks, as a feasible solution to overcome these challenges. However, we recognized that some sections were lacking in analytical depth.
The authors reported the rate of missed early-stage gastrointestinal tumors as 20%-30% but did not analyze the underlying reasons, which limits the analytical depth of the review. More precisely, diagnostic performance metrics, and the miss rate in particular, vary markedly for incipient lesions, correlating directly with both anatomical localization and lesion morphology. Regarding anatomical localization, the adenoma detection rate is frequently cited as being lower in the right colon than in the distal colon. Lesion morphology likewise contributes substantially to the variation in miss rates: multiple studies have consistently found sessile serrated lesions to be challenging to identify[2-4].
The authors also oversimplified the definitions of machine learning and deep learning and of the models involved in improving gastrointestinal endoscopy. From the standpoint of how an algorithm learns from data, machine learning models can be categorized into three main types. Supervised learning aims to learn a mapping function that connects input data to the correct output. Conversely, unsupervised learning autonomously discovers the underlying structure and relationships within the data itself, forming its outputs from these patterns in the absence of labels. Lastly, hybrid deep learning models merge the two aforementioned approaches to provide improved accuracy and robustness[5].
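The contrast between the supervised and unsupervised paradigms can be sketched with a deliberately minimal, purely illustrative example. The 1-D "lesion feature" values and class names below are hypothetical toy data, not real endoscopic measurements; a nearest-centroid classifier stands in for supervised learning, and a tiny k-means routine for unsupervised learning. A hybrid model, in the sense described above, would combine both stages.

```python
# Toy sketch contrasting supervised and unsupervised learning.
# All numbers and class names are hypothetical illustrations.

def train_supervised(samples, labels):
    """Supervised: learn a mapping from input to known labels
    by computing one centroid per labeled class."""
    centroids = {}
    for label in set(labels):
        values = [x for x, y in zip(samples, labels) if y == label]
        centroids[label] = sum(values) / len(values)
    return centroids

def predict(centroids, x):
    """Assign a new sample to the class with the nearest centroid."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def cluster_unsupervised(samples, k=2, iters=20):
    """Unsupervised: discover structure without labels (1-D k-means)."""
    centers = [min(samples), max(samples)]  # simple initialization
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in samples:
            idx = min(range(k), key=lambda i: abs(x - centers[i]))
            groups[idx].append(x)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return sorted(centers)

# Toy data: one cluster around 1.0 and one around 5.0
samples = [0.9, 1.1, 1.0, 4.8, 5.2, 5.0]
labels = ["benign", "benign", "benign",
          "neoplastic", "neoplastic", "neoplastic"]

model = train_supervised(samples, labels)
print(predict(model, 4.9))            # supervised: uses the labels
print(cluster_unsupervised(samples))  # unsupervised: recovers group centers without labels
```

The supervised routine needs the label column, whereas the clustering routine recovers the two group centers from the raw values alone, which is the essential distinction the review glosses over.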
Even though these condensed definitions were accurate, they hindered readers from gaining an in-depth understanding of the technology and models, potentially creating confusion. For example, generative adversarial networks (GANs) were described as a beneficial tool for clinical applications, whereas the primary purpose of GANs is to generate new data rather than to contribute directly to diagnosis. Fundamentally, GANs are a deep learning architecture whose intrinsic role is to generate realistic new data from a given training dataset[6]. Moreover, they were placed in the category of supervised models, yet GANs can also be used in an unsupervised or hybrid manner, which warrants further explanation from the authors. The other two most common deep learning models (hybrid and unsupervised) were broadly outlined with a list of limitations (e.g., computational complexity or lack of validity). These limitations require in-depth investigation because they are crucial factors determining the successful adoption of an AI model in a clinical setting.
Furthermore, computer-aided systems represent applications of machine learning. Computer-aided detection (CADe) mainly focuses on enhancing detection sensitivity by localizing and flagging regions of interest. In contrast, computer-aided diagnosis (CADx) systems facilitate characterization and classification of the lesion and therefore provide a more quantitative assessment, ultimately refining clinical diagnostic interpretation[7]. The two systems work sequentially and interdependently, rendering them complementary. Accordingly, in contemporary research, modern AI systems are designed as integrated CADe/CADx solutions that provide a complete spectrum of support[8].
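The sequential CADe-to-CADx handoff can be illustrated with a deliberately oversimplified sketch. Real systems use deep networks on video frames; here, a hypothetical intensity threshold on a toy 2-D grid stands in for detection, and a toy intensity rule stands in for characterization. Nothing below reflects an actual clinical system.

```python
# Hypothetical sketch of a sequential CADe -> CADx pipeline on a toy
# "image" (a grid of intensities). Real systems use deep networks.

def cade_detect(image, threshold=0.5):
    """CADe step: localize and flag regions of interest (here, bright pixels)."""
    return [(r, c) for r, row in enumerate(image)
            for c, v in enumerate(row) if v > threshold]

def cadx_classify(image, region):
    """CADx step: characterize a flagged region (toy intensity rule)."""
    r, c = region
    return "suspicious" if image[r][c] > 0.8 else "likely benign"

image = [
    [0.1, 0.2, 0.1],
    [0.1, 0.9, 0.6],
    [0.2, 0.1, 0.1],
]

# The two stages run sequentially: detection output feeds characterization.
for region in cade_detect(image):
    print(region, cadx_classify(image, region))
```

The point of the sketch is the data flow: CADx never scans the whole image; it only characterizes what CADe has already flagged, which is why the two stages are complementary rather than interchangeable.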
It is well known in the AI field that AI models are only as good as the data they are trained on[9]. Consequently, we were concerned by the paucity of details regarding the origin, quality, and diversity of the datasets when the AI model combined endoscopic, histological, and risk-factor features. Combining three distinct feature types in AI models without robust details on the dataset carries a potential for data bias (e.g., different ethnic backgrounds or inconsistent annotation and labeling) that could compromise generalizability. In line with this concept, one study on the impact of data quality on machine learning model performance concluded that automated labeling methods could mitigate the challenge of poor data quality, specifically by resolving wrong labels[10]. Additionally, the section addressing denoising algorithms such as U-Net variants could be improved by providing illustrative examples to enhance clarity.
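The idea of automated label cleaning can be illustrated in a highly simplified form: corrupt a few labels in a toy 1-D dataset, then reassign each sample to its nearest class centroid, recovering the wrong labels. The data, class names, and cleaning rule below are hypothetical illustrations and do not represent the method actually used in the cited study[10].

```python
# Toy illustration of automated label cleaning: corrupt some labels,
# then relabel each sample to its nearest class centroid.
import random

def centroids(samples, labels):
    """Compute one centroid per class from labeled 1-D samples."""
    out = {}
    for lab in set(labels):
        vals = [x for x, y in zip(samples, labels) if y == lab]
        out[lab] = sum(vals) / len(vals)
    return out

def relabel(samples, labels):
    """One automated-cleaning pass: reassign each sample to its nearest centroid."""
    cents = centroids(samples, labels)
    return [min(cents, key=lambda lab: abs(x - cents[lab])) for x in samples]

random.seed(1)
# Two well-separated hypothetical classes
samples = ([random.gauss(1.0, 0.3) for _ in range(50)] +
           [random.gauss(5.0, 0.3) for _ in range(50)])
true = ["benign"] * 50 + ["neoplastic"] * 50

# Simulate inconsistent annotation by deliberately corrupting five labels
noisy = list(true)
for i in (0, 1, 2, 50, 51):
    noisy[i] = "neoplastic" if noisy[i] == "benign" else "benign"

cleaned = relabel(samples, noisy)
wrong_before = sum(a != b for a, b in zip(noisy, true))
wrong_after = sum(a != b for a, b in zip(cleaned, true))
print(f"wrong labels before cleaning: {wrong_before}, after: {wrong_after}")
```

Because the two toy classes are well separated, the centroids remain on the correct sides despite the corrupted labels, so the cleaning pass recovers all five wrong labels; with heavily overlapping classes or higher noise rates, such a naive pass would be far less reliable.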
We also identified a discrepancy between research findings and clinical applicability. The results of the study by Namikawa et al[11] failed to demonstrate the impact that a less specific model has in a clinical setting, including a higher rate of false positives that leads to unnecessary biopsies and burdens both the patient and the healthcare system. To mitigate this challenge, the European Society of Gastrointestinal Endoscopy (ESGE) recommended that, for AI-assisted detection of colorectal polyps to be acceptable, the false-positive rate should not significantly prolong withdrawal time; otherwise, endoscopists would spend excessive time discarding false-positive alerts, which may itself result in unnecessary procedures such as polypectomy, along with the related avoidable adverse events[12]. On the other hand, the authors successfully presented the hurdles in traditional endoscopy imaging and provided explicit solutions through examples from the studies conducted by Fang et al[13] on super-resolution and Daher et al[14] on specular highlights.
FUTURE DIRECTIONS
For AI to be applied and to achieve its maximum potential, clinicians must pursue training in this innovative technology while acknowledging the limitations and challenges created by the integration of AI[15]. Data on the knowledge, perceptions, and attitudes of endoscopists regarding the use of AI in endoscopy were presented in a systematic review[16], revealing an overall positive sentiment toward AI. Moreover, the same review determined that 92% of endoscopists believed that AI should become part of endoscopy training[16].
AI offers a solution to the shortage of expert endoscopists available to act as training directors, optimizing the endoscopic training of novices. The study conducted by Zhang et al[17] suggested that AI-assisted training systems, specifically those with real-time detection and characterization, can help novice endoscopists optimize specific tasks. The group trained with AI assistance exhibited superior outcomes, including reduced examination time, decreased blind spots, improved completeness of photodocumentation, and enhanced detection rates in specific anatomical areas. Although that study had a small number of participants, it serves as a starting point for confirming AI technology as a valuable tool in endoscopy training, facilitating skill development and enhancing overall endoscopist proficiency[17-19].
Moreover, international bodies have provided itemized key performance measures to be adopted by all endoscopy services across Europe to ensure the standardization of practice across gastrointestinal endoscopy procedures. On the one hand, the key performance measures for upper gastrointestinal endoscopy (UGI) were: Appropriate indication; fasting instructions received; visibility score recorded; accurate photodocumentation; examination time of at least 7 minutes; standardized terminology used; Seattle protocol used for Barrett's esophagus (BE); management of precancerous conditions and lesions in the stomach protocol used for gastric precancerous assessment; complications recorded after therapeutic procedures; BE surveillance according to guidelines; and gastric precancerous conditions surveillance according to guidelines; together with minor performance measures, namely a time slot of 20 minutes allocated for UGI, an observation time of 1 minute/cm and chromoendoscopy in BE inspection, chromoendoscopy in patients at risk for squamous cell carcinoma, and patient experience[20]. On the other hand, the ESGE and United European Gastroenterology provided a list of key performance measures for lower gastrointestinal endoscopy to be adopted in daily practice by all endoscopy services across Europe, consisting of: Rate of adequate bowel preparation; cecal intubation rate; adenoma detection rate; appropriate polypectomy technique; complication rate; patient experience; and appropriate post-polypectomy surveillance recommendations[21]. Additionally, quality indicators were assessed in AI training. For lower gastrointestinal endoscopy, these included withdrawal time, cecal intubation rate, adequate bowel preparation rate, and polyp detection rate; for UGI, these included photodocumented stomach sites and inspection time[22].
Beyond this, feedback is a key factor in improving outcomes, and constructive, timely feedback accelerates skill acquisition. Accordingly, the use of AI for feedback during simulated colonoscopies has been shown to improve trainee performance, lowering risks for patients. Huang et al[23] developed an AI-based system to assess red-out views during intubation in colonoscopy. A red-out view appears when the tip of the endoscope is pressed against the mucosa; when this pressure is forceful, it can lead to colorectal perforation. AI feedback on red-out views can therefore facilitate safer colonoscopies, which is particularly valuable during training as novices build their skills[24]. Training with simulators is effective for building skills that trainees can then apply in real clinical situations.
AI also facilitates objective performance evaluations during training and promotes asynchronous learning that optimizes gastrointestinal training. The development of an AI-powered virtual mentor will further facilitate asynchronous training and provide adaptive guidance, thereby eliminating the need for constant human expert guidance. Needless to say, funded programs and projects to sustain this development are of great importance in ensuring long-term quality and sustainability. The American Society for Gastrointestinal Endoscopy[25] and the ESGE[12] have provided position statements outlining priorities for AI in gastrointestinal endoscopy and its expected value, offering standardization.
Despite the promising results of AI in improving gastrointestinal endoscopy, the lack of multicenter trials with extended follow-up periods is an important limitation[17,26]. Multicenter trials are essential for validating the scalability, cost-effectiveness, and educational sustainability of AI models. Moreover, data privacy concerns and annotation quality hinder model training. A study by Buendgens et al[27] reported that weakly supervised AI systems can achieve high performance and maintain explainability in end-to-end image analysis in gastrointestinal endoscopy, showing that manual annotation does not necessarily bottleneck future clinical applications of AI.
CONCLUSION
While Ding et al[1] provided a valuable overview, a deeper analytical approach to the specifics and dataset limitations of the AI models is essential for advancing the field. We have emphasized herein the potential of AI to revolutionize endoscopist training and skill acquisition, a critical direction for the successful clinical integration of this innovative technology. In conclusion, the integration of robust data quality assurance mechanisms is crucial for achieving reliable, generalizable, and sustained high performance of AI-supported training outcomes.
Provenance and peer review: Unsolicited article; Externally peer reviewed.
Peer-review model: Single blind
Specialty type: Computer science, artificial intelligence
Country of origin: Italy
Peer-review report’s classification
Scientific Quality: Grade B
Novelty: Grade C
Creativity or Innovation: Grade C
Scientific Significance: Grade B
P-Reviewer: Turan B, MD, Assistant Professor, Researcher, Türkiye S-Editor: Hu XY L-Editor: A P-Editor: Xu J