Review
Copyright ©The Author(s) 2025.
World J Gastroenterol. Sep 28, 2025; 31(36): 111137
Published online Sep 28, 2025. doi: 10.3748/wjg.v31.i36.111137
Figure 1 Organizational framework of the review. The diagram illustrates the logical flow of the manuscript, beginning with the clinical background of gastrointestinal diseases. It then progresses to the fundamental principles of convolutional neural networks and their specific applications in endoscopic examination, categorized into three main domains. The review concludes by addressing the key challenges and future perspectives for the clinical translation of these technologies. GI: Gastrointestinal; CNNs: Convolutional neural networks; AI: Artificial intelligence.
Figure 2 The basic architecture of a convolutional neural network. The architecture includes an input layer that receives endoscopic image data, multiple convolutional layers that extract spatial features, pooling layers that downsample the feature maps while retaining the most salient responses, fully connected layers that integrate high-level representations, and an output layer for classification or regression. Non-linear activation functions such as the rectified linear unit (ReLU) are applied between layers to introduce non-linearity, which allows the network to model complex relationships.
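The final stages described in this caption (fully connected layers, a ReLU activation, and a classification output) can be illustrated with a minimal sketch in plain Python. The feature vector, the weights `W1`/`W2`, the biases, and the two-class softmax output are all hypothetical values chosen for illustration; they are not taken from the review.

```python
import math

def relu(values):
    """Rectified linear unit: negative inputs become 0, positive inputs pass through."""
    return [max(0.0, v) for v in values]

def dense(values, weights, bias):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    """Convert raw scores into class probabilities that sum to 1."""
    peak = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical flattened feature vector, standing in for the output of the
# convolution and pooling stages.
features = [0.5, -1.2, 2.0]

# Hypothetical weights and biases (illustrative only, not trained values).
W1 = [[1.0, 0.0, -1.0],
      [0.5, 0.5, 0.5]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0],
      [-1.0, 1.0]]
b2 = [0.0, 0.0]

hidden = relu(dense(features, W1, b1))      # non-linearity between the two layers
probs = softmax(dense(hidden, W2, b2))      # two-class classification output
print(hidden, probs)
```

Without the `relu` call, the two `dense` layers would collapse into a single linear mapping, which is why an activation function is placed between them.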
Figure 3 Convolution and pooling diagram. A: Convolution with a 3 × 3 kernel, producing a 4 × 4 output feature map; B: Max pooling with a 2 × 2 window and a stride of 2, reducing the feature map to 2 × 2. Together, these operations show how convolutional neural networks extract features and reduce spatial resolution while preserving the most salient information.
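The sizes in this caption follow from the usual output-size rule, output = (input − window) / stride + 1. The sketch below reproduces them in plain Python under stated assumptions: the caption does not give the input size, so a 6 × 6 input with stride 1 and no padding is assumed, since that configuration yields a 4 × 4 output for a 3 × 3 kernel. The pixel values and the kernel are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image with stride 1 and no padding.

    As in most deep learning libraries, the kernel is not flipped, so this
    is technically cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool2d(fmap, size=2, stride=2):
    """Keep the maximum value inside each size x size window."""
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    return [[max(fmap[i * stride + di][j * stride + dj]
                 for di in range(size) for dj in range(size))
             for j in range(out_w)]
            for i in range(out_h)]

# Assumed 6 x 6 input with hypothetical pixel values.
image = [[(r * 6 + c) % 7 for c in range(6)] for r in range(6)]

# Hypothetical 3 x 3 kernel that responds to vertical edges.
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]

fmap = conv2d_valid(image, kernel)   # 6 x 6 input -> 4 x 4 feature map
pooled = max_pool2d(fmap)            # 4 x 4 feature map -> 2 x 2
print(len(fmap), len(fmap[0]), len(pooled), len(pooled[0]))
```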
Figure 4 VGGNet-16 architecture. VGGNet-16 comprises 16 weight layers (13 convolutional and 3 fully connected) and is characterized by the repeated use of small (3 × 3) convolution filters and a uniform layer design. This deep and consistent architecture enhances its ability to capture hierarchical features and laid the foundation for deeper networks such as ResNet.
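The layer count in this caption can be checked by walking through the standard VGG-16 configuration. The sketch below is a bookkeeping exercise, not a trainable model. It assumes the common ImageNet setting of a 224 × 224 RGB input and 1000 output classes, neither of which is stated in the caption; the names `VGG16_CFG`, `FC_UNITS`, and `summarize` are hypothetical.

```python
# Numbers are output channels of 3 x 3 convolutions; "M" marks a 2 x 2 max pool.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]
FC_UNITS = [4096, 4096, 1000]   # assumed ImageNet-style classifier head

def summarize(cfg, fc_units, input_size=224, in_channels=3):
    """Count weight layers and parameters, and track the spatial size."""
    conv_layers, params = 0, 0
    size, channels = input_size, in_channels
    for item in cfg:
        if item == "M":
            size //= 2                                   # pooling halves height and width
        else:
            params += 3 * 3 * channels * item + item     # kernel weights plus biases
            channels = item
            conv_layers += 1
    features = size * size * channels                    # flattened input to the first FC layer
    for units in fc_units:
        params += features * units + units
        features = units
    return {
        "conv_layers": conv_layers,
        "fc_layers": len(fc_units),
        "weight_layers": conv_layers + len(fc_units),
        "final_spatial_size": size,
        "parameters": params,
    }

summary = summarize(VGG16_CFG, FC_UNITS)
print(summary)
```

Under these assumptions most of the parameters sit in the fully connected layers rather than in the stacked 3 × 3 convolutions.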
Figure 5 The U-Net architecture, widely used in medical image segmentation. The left side is the contracting path (encoder), which reduces spatial resolution while increasing feature complexity. The right side is the expansive path (decoder), which restores spatial resolution. Skip connections link corresponding encoder-decoder levels to retain spatial detail and improve segmentation accuracy.
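The encoder-decoder symmetry and the role of the skip connections can be traced with a small shape calculation. The sketch below makes several assumptions that are not in the caption: a 256 × 256 input, four resolution levels, channel widths starting at 64 and doubling at each level, and padded convolutions that preserve resolution (the original U-Net used unpadded convolutions, so its exact sizes differ). The function name `unet_shapes` is hypothetical.

```python
def unet_shapes(input_size=256, base_channels=64, depth=4):
    """Trace (resolution, channels) through a U-Net-style encoder and decoder."""
    encoder = []
    size, channels = input_size, base_channels
    for _ in range(depth):
        encoder.append((size, channels))   # feature map kept for the skip connection
        size //= 2                         # 2 x 2 pooling halves the resolution
        channels *= 2                      # feature complexity increases
    bottleneck = (size, channels)

    decoder = []
    for skip_size, skip_channels in reversed(encoder):
        size *= 2                          # upsampling restores resolution
        channels //= 2                     # up-convolution halves the channels
        if size != skip_size:
            raise ValueError("skip connection resolution mismatch")
        concat_channels = channels + skip_channels   # concatenate encoder features
        decoder.append((size, concat_channels, channels))
    return encoder, bottleneck, decoder

encoder, bottleneck, decoder = unet_shapes()
for size, concat_channels, out_channels in decoder:
    print(f"{size} x {size}: {concat_channels} channels after skip concat "
          f"-> {out_channels} after convolution")
```

Each decoder level receives features at the same resolution from the matching encoder level, which is how fine spatial detail lost during downsampling is made available again for segmentation.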