Retrospective Study
Copyright ©The Author(s) 2026. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Gastrointest Oncol. Feb 15, 2026; 18(2): 113959
Published online Feb 15, 2026. doi: 10.4251/wjgo.v18.i2.113959
Risk prediction for chronic atrophic gastritis using a random forest model: A multicenter study
Hui Cao, Jing-Lue Han, Hao Wu, Shu-Ping Si, Li-Jia Ding, Lin Ji, Hua-Zhen Zhang, Jie Yin, Zhi-Yi Zhou, Yu-Nan Zhang, Zhi-Fa Lv, Wen-Ying Tian, Qiang Zhan, Hui Wang, Fang-Mei An
Hui Cao, Jing-Lue Han, Shu-Ping Si, Li-Jia Ding, Lin Ji, Hua-Zhen Zhang, Jie Yin, Yu-Nan Zhang, Zhi-Fa Lv, Wen-Ying Tian, Qiang Zhan, Hui Wang, Fang-Mei An, Department of Gastroenterology, National Clinical Research Center for Digestive Diseases (Xi’an) Jiangsu Branch, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi Medical Center, Nanjing Medical University, Wuxi 214000, Jiangsu Province, China
Hao Wu, Department of Gastroenterology, Yixing Fifth People's Hospital, Yixing 214200, Jiangsu Province, China
Zhi-Yi Zhou, Department of Pathology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi 214023, Jiangsu Province, China
Co-first authors: Hui Cao and Jing-Lue Han.
Co-corresponding authors: Hui Wang and Fang-Mei An.
Author contributions: Cao H and Han JL contributed equally as co-first authors. Cao H was responsible for the study design, data collection, and writing part of the initial draft; Han JL was responsible for data analysis, model construction, and writing the remaining sections of the initial draft. An FM and Wang H contributed equally to this article as co-corresponding authors and were responsible for the overall project design, manuscript proofreading, and project supervision. An FM and Zhan Q provided financial support. Wu H, Si SP, Ding LJ, Ji L, Zhang HZ, Yin J, Zhou ZY, Zhang YN, Lv ZF, Tian WY, and Zhan Q were responsible for collecting and organizing the data. All authors read and approved the final manuscript.
Supported by the Wuxi "Double Hundred" Young and Middle-aged Medical Talents Project, No. BJ2023008; the Wuxi Medical Center of Nanjing Medical University Special Disease Cohort and Clinical Research Project, No. WMCC202502; the Wuxi Medical Center of Nanjing Medical University Key Project, No. WMCM202501; and the Jiangsu Branch of the National Clinical Research Center for Digestive Diseases, No. JSZX202301.
Institutional review board statement: The research protocol was reviewed and approved by the Research Ethics Committee of Wuxi People's Hospital (Approval No. KY23001) and registered with the Chinese Clinical Trial Registry (ChiCTR2400085856).
Informed consent statement: All participants provided written informed consent after being fully informed of the study's purpose, procedures, risks, and rights.
Conflict-of-interest statement: The authors declare that they have no competing interests.
Data sharing statement: The datasets generated and analysed during the current study are not publicly available due to privacy or ethical restrictions but are available from the corresponding author on reasonable request.
Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Fang-Mei An, MD, Associate Chief Physician, Associate Professor, Department of Gastroenterology, National Clinical Research Center for Digestive Diseases (Xi’an) Jiangsu Branch, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi Medical Center, Nanjing Medical University, No. 299 Qingyang Road, Liangxi District, Wuxi 214000, Jiangsu Province, China. fangmeian@njmu.edu.cn
Received: September 8, 2025
Revised: November 9, 2025
Accepted: December 11, 2025
Published online: February 15, 2026
Processing time: 148 Days and 13.3 Hours
Abstract
BACKGROUND

Chronic atrophic gastritis (CAG) is a significant precancerous condition of gastric cancer (GC). CAG often lacks typical symptoms in its early stages, and clinical diagnosis relies on gastroscopy and pathological examination, which are invasive and have limitations such as poor patient compliance. Therefore, developing a noninvasive, simple, and generalizable prediction tool is crucial for the early identification of CAG.

AIM

To construct and validate a CAG risk prediction model to achieve noninvasive and accurate identification of high-risk patients.

METHODS

This study included 1268 subjects from a GC screening program. Multimodal data, including serological marker, demographic, lifestyle, and family history data, were collected. Subjects were grouped by pathological biopsy results. Least absolute shrinkage and selection operator regression was used for feature selection. A model was constructed using the random forest algorithm, evaluated with metrics such as the area under the curve (AUC), and interpreted using the SHapley Additive exPlanation (SHAP) method. The model was validated in an independent external cohort, and a web-based prediction platform was developed using Shiny.
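
As an illustration of the AUC metric named above, the following minimal sketch computes it as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The function name `auc` and the six example labels and scores are hypothetical and are not data from this study; the authors' own evaluation code is not described in the source.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: iterable of 0/1 outcomes (1 = condition present).
    scores: iterable of predicted risks, same length as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive case ranked above negative case
            elif p == q:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
example_labels = [1, 1, 1, 0, 0, 0]
example_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
example_auc = auc(example_labels, example_scores)  # 8 of 9 pairs ranked correctly
```

The pairwise form is quadratic in the number of subjects, which is acceptable for a sketch; a rank-based implementation would be preferable for larger cohorts.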

RESULTS

Six key features were ultimately included: Age, Helicobacter pylori (H. pylori) infection status, pepsinogen I/II ratio (PGR), smoking history, alcohol consumption history, and family history of GC. The model achieved AUCs of 0.8542 and 0.8073 in the training and testing sets, respectively, and an AUC of 0.8505 in the external validation cohort, demonstrating good generalizability and stability. SHAP analysis indicated that H. pylori infection, age, and PGR were the most important variables influencing CAG risk. The final model was successfully embedded into a web-based platform for convenient clinical application.
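
To make the idea behind the SHAP ranking more concrete, the sketch below computes exact Shapley values for a toy three-feature score by enumerating all feature subsets. It rests on several assumptions: `toy_score` and its coefficients are invented for illustration and are not the study's random forest, and "absent" features are replaced by a single reference subject, a simplification of the background averaging that SHAP implementations typically use.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the subject's values, the rest the baseline's.
        mixed = [x[j] if j in subset else baseline[j] for j in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_score(features):
    """Hypothetical score; the coefficients are NOT taken from the study."""
    hp_positive, age, pgr = features
    return 0.3 * hp_positive + 0.01 * age - 0.02 * pgr + 0.005 * hp_positive * age


subject = [1, 60, 3.0]     # hypothetical: H. pylori positive, 60 years, PGR 3.0
reference = [0, 40, 8.0]   # hypothetical reference subject
phi = shapley_values(toy_score, subject, reference)
```

The contributions in `phi` sum to the difference between the subject's score and the reference score, which is the additivity property that lets per-feature attributions be read as shares of a prediction.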

CONCLUSION

The random forest-based CAG prediction model is a highly accurate and interpretable tool with significant clinical utility in early screening and identifying high-risk patients.

Keywords: Chronic atrophic gastritis; Machine learning; Risk prediction; Gastric cancer screening; Random forest

Core Tip: This study addresses the need for a noninvasive method to screen for chronic atrophic gastritis (CAG), a key precancerous condition of gastric cancer (GC). We developed and validated a random forest machine learning model using data from 1268 subjects. The model predicts CAG risk using six easily obtainable features: Helicobacter pylori infection status, age, pepsinogen ratio, smoking history, alcohol use history, and family history of GC. The model demonstrated good accuracy and generalizability (area under the curve of 0.8073 in the testing set and 0.8505 in the external validation cohort). A user-friendly web calculator was created for clinical application, providing a practical tool for the early identification of high-risk individuals.