Basic Study
Copyright ©The Author(s) 2025. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Gastrointest Oncol. Nov 15, 2025; 17(11): 111670
Published online Nov 15, 2025. doi: 10.4251/wjgo.v17.i11.111670
Early cancer diagnosis via interpretable two-layer machine learning of plasma extracellular vesicle long RNA
Shi-Cai Liu, Han Zhang
Shi-Cai Liu, School of Medical Information, Wannan Medical College, Wuhu 241002, Anhui Province, China
Han Zhang, School of Basic Medical Sciences, Wannan Medical College, Wuhu 241002, Anhui Province, China
Co-corresponding authors: Shi-Cai Liu and Han Zhang.
Author contributions: Liu SC and Zhang H collected and analyzed the data, wrote the manuscript, and made equal contributions as co-corresponding authors; Liu SC supervised the project. Both authors have read and approved the final version to be published.
Supported by Talent Scientific Research Start-up Foundation of Wannan Medical College, No. WYRCQD2023045.
Institutional review board statement: This study did not involve human participants or animal subjects; therefore, neither Institutional Review Board nor Institutional Animal Care and Use Committee approval was required.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
Data sharing statement: The data that support the findings of this study are available from the authors upon reasonable request.
Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Shi-Cai Liu, PhD, School of Medical Information, Wannan Medical College, No. 22 Wenchang West Road, Wuhu 241002, Anhui Province, China. liushicainj@163.com
Received: July 7, 2025
Revised: August 7, 2025
Accepted: October 9, 2025
Published online: November 15, 2025
Processing time: 131 Days and 14.2 Hours
Abstract
BACKGROUND

The early diagnosis rate of pancreatic ductal adenocarcinoma (PDAC) is low, and its prognosis is poor. An interpretable, noninvasive model for early diagnosis is therefore of considerable clinical importance.

AIM

To develop an interpretable noninvasive early diagnostic model for PDAC using plasma extracellular vesicle long RNA (EvlRNA).

METHODS

The diagnostic model was constructed from plasma EvlRNA data. During model construction, an EvlRNA-index was introduced and calculated with four algorithms. Once the model had been constructed, its performance was evaluated. A series of bioinformatics methods was then used to explore the potential mechanism underlying the EvlRNA-index as the model's input feature, and the relationship between key characteristics and PDAC was explored at the single-cell level.
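
The abstract does not specify how the EvlRNA-index is computed, only that four algorithms were used; the model names in the results indicate that one of them is probabilistic principal component analysis. The sketch below is a minimal illustration under that assumption: it summarizes a set of EvlRNA expression values into one index per sample by projecting each sample onto the first principal component, using plain PCA via power iteration as a stand-in. With a single latent dimension, the posterior mean of the maximum-likelihood probabilistic PCA model is proportional to this same projection, so the ranking of samples agrees up to scale and sign. The function names (`evlrna_index`, `covariance`, `leading_component`), the choice of input genes, and the sign convention are hypothetical and are not taken from the paper.

```python
import math
import random


def column_means(rows):
    """Per-feature means of a samples x features table (list of equal-length lists)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows):
    """Sample covariance matrix (features x features) and the feature means."""
    mu = column_means(rows)
    n, p = len(rows), len(mu)
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [x - m for x, m in zip(r, mu)]  # centered sample
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j]
    return [[c / (n - 1) for c in row] for row in cov], mu


def leading_component(cov, iters=500, seed=0):
    """Unit-length leading eigenvector of a symmetric PSD matrix via power iteration."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(len(cov))]  # strictly positive start
    for _ in range(iters):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero-variance input: no direction to find
            break
        v = [x / norm for x in w]
    return v


def evlrna_index(rows):
    """Hypothetical EvlRNA-index: each sample's score on the first principal component.

    The sign of a principal component is arbitrary; here it is fixed so that the
    loadings sum to a non-negative value (an assumed convention).
    """
    cov, mu = covariance(rows)
    v = leading_component(cov)
    if sum(v) < 0:
        v = [-x for x in v]
    return [sum((x - m) * w for x, m, w in zip(r, mu, v)) for r in rows]
```

For three samples of two perfectly correlated transcripts, `evlrna_index([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])` returns approximately `[-2.236, 0.0, 2.236]` (that is, minus and plus the square root of 5): the two expression values collapse into a single centered score per sample, which is the kind of compact feature a downstream classifier could consume.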

RESULTS

A novel interpretable machine learning framework based on plasma EvlRNA was developed. Within this framework, a two-layer classifier was established and a new concept, the EvlRNA-index, was proposed. A cancer diagnostic model built on the EvlRNA-index achieved good diagnostic performance. In the first layer, PDACandCPvsHealth-Probabilistic PCA Index-SVM (PDAC and chronic pancreatitis vs health-probabilistic principal component analysis index-support vector machine) (1-18) achieved an accuracy of 91.51%, a Matthews correlation coefficient of 0.7760, and an area under the curve of 0.9560. In the second layer, PDACvsCP-Probabilistic PCA Index-RF (PDAC vs chronic pancreatitis-probabilistic principal component analysis index-random forest) (2-17) achieved an accuracy of 93.83%, a Matthews correlation coefficient of 0.8422, and an area under the curve of 0.9698. Forty-nine PDAC-related genes were identified; 16 of these are already known to be associated with PDAC, suggesting that the remaining 33 may also be PDAC-related.
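
The two-layer design separates PDAC and chronic pancreatitis (CP) from healthy controls first, then distinguishes PDAC from CP, and each layer is reported with accuracy and the Matthews correlation coefficient (MCC). The sketch below is a minimal illustration, assuming that the second layer is applied only to samples the first layer flags as diseased. The names `two_layer_predict`, `accuracy`, `matthews_corrcoef`, and the `BinaryClassifier` type are hypothetical, and the classifiers are stand-in callables rather than the trained SVM and random forest from the study. Accuracy and MCC follow their standard confusion-matrix definitions; the area under the curve, which requires ranked scores, is not shown.

```python
import math
from typing import Callable, Sequence

# A fitted binary classifier, abstracted as a function from a feature vector
# (e.g. EvlRNA-index values) to a yes/no decision. Placeholder type only.
BinaryClassifier = Callable[[Sequence[float]], bool]


def two_layer_predict(x: Sequence[float],
                      layer1: BinaryClassifier,
                      layer2: BinaryClassifier) -> str:
    """Route one sample through an assumed two-layer cascade.

    layer1 answers "PDAC or CP (True) vs healthy (False)?";
    layer2 is consulted only for layer-1 positives and answers "PDAC (True) vs CP (False)?".
    """
    if not layer1(x):
        return "Health"
    return "PDAC" if layer2(x) else "CP"


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from a binary confusion matrix.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)),
    returned as 0.0 when any row or column of the matrix is empty.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

With toy threshold rules such as `layer1 = lambda x: x[0] > 0.5` and `layer2 = lambda x: x[1] > 0.5`, the sample `[0.9, 0.1]` is routed to `"CP"` and `[0.1, 0.9]` to `"Health"`. MCC is useful alongside accuracy because it stays at 0 for a classifier no better than chance even when classes are imbalanced: `matthews_corrcoef(5, 5, 5, 5)` is `0.0`, while a perfect confusion matrix gives `1.0`.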

CONCLUSION

An interpretable two-layer machine learning framework was proposed for early diagnosis and prediction of PDAC based on plasma EvlRNA, providing new insights into the clinical value of EvlRNA.

Keywords: Pancreatic ductal adenocarcinoma; Extracellular vesicle long RNA; Noninvasive early diagnosis; Interpretable machine learning; Two-layer classifier

Core Tip: The early diagnosis rate of pancreatic ductal adenocarcinoma is low, and its prognosis is poor, so an interpretable, noninvasive model for early diagnosis is needed in clinical practice. In this study, an interpretable two-layer machine learning framework based on plasma extracellular vesicle long RNA was proposed for the early diagnosis and prediction of pancreatic ductal adenocarcinoma. This study offers new insights into the clinical value of extracellular vesicle long RNA and may help advance precision medicine.