Early cancer diagnosis via interpretable two-layer machine learning of plasma extracellular vesicle long RNA

doi:10.4251/wjgo.v17.i11.111670

Publication Name: World Journal of Gastrointestinal Oncology
ISSN: 1948-5204
Publisher: Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA

Basic Study Open Access

World J Gastrointest Oncol. Nov 15, 2025; 17(11): 111670
Published online Nov 15, 2025. doi: 10.4251/wjgo.v17.i11.111670

Shi-Cai Liu, Han Zhang

Shi-Cai Liu, School of Medical Information, Wannan Medical College, Wuhu 241002, Anhui Province, China

Han Zhang, School of Basic Medical Sciences, Wannan Medical College, Wuhu 241002, Anhui Province, China

ORCID number: Shi-Cai Liu (0000-0003-1270-4729); Han Zhang (0009-0002-1010-5909).

Co-corresponding authors: Shi-Cai Liu and Han Zhang.

Author contributions: Liu SC and Zhang H collected and analyzed the data, wrote the manuscript, and made equal contributions as co-corresponding authors; Liu SC supervised the project. Both authors have read and approved the final version to be published.

Supported by Talent Scientific Research Start-up Foundation of Wannan Medical College, No. WYRCQD2023045.

Institutional review board statement: This study did not involve human participants or animal subjects; therefore, neither Institutional Review Board nor Institutional Animal Care and Use Committee approval was required.

Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.

Data sharing statement: The data that support the findings of this study are available from the authors upon reasonable request.

Corresponding author: Shi-Cai Liu, PhD, School of Medical Information, Wannan Medical College, No. 22 Wenchang West Road, Wuhu 241002, Anhui Province, China. liushicainj@163.com

Received: July 7, 2025
Revised: August 7, 2025
Accepted: October 9, 2025
Published online: November 15, 2025
Processing time: 131 Days and 14.2 Hours

Abstract

BACKGROUND

The early diagnosis rate of pancreatic ductal adenocarcinoma (PDAC) is low and the prognosis is poor. It is important to develop an interpretable noninvasive early diagnostic model in clinical practice.

AIM

To develop an interpretable noninvasive early diagnostic model for PDAC using plasma extracellular vesicle long RNA (EvlRNA).

METHODS

The diagnostic model was constructed from plasma EvlRNA data. During model construction, the EvlRNA-index was introduced, and four algorithms were used to calculate it. After the model was constructed, its performance was evaluated. A series of bioinformatics methods were used to explore the potential mechanism underlying the EvlRNA-index as the input feature of the model, and the relationship between key features and PDAC was explored at the single-cell level.

RESULTS

A novel interpretable machine learning framework was developed based on plasma EvlRNA. In this framework, a two-layer classifier was established and a new concept, the EvlRNA-index, was proposed. A cancer diagnostic model based on the EvlRNA-index achieved good diagnostic performance. In the first layer, the accuracy of PDACandCPvsHealth-Probabilistic PCA Index-SVM (PDAC and chronic pancreatitis vs health-probabilistic principal component analysis index-support vector machine) (1-18) was 91.51%, with a Matthews correlation coefficient of 0.7760 and an area under the curve of 0.9560. In the second layer, the accuracy of PDACvsCP-Probabilistic PCA Index-RF (PDAC vs chronic pancreatitis-probabilistic principal component analysis index-random forest) (2-17) was 93.83%, with a Matthews correlation coefficient of 0.8422 and an area under the curve of 0.9698. Forty-nine PDAC-related genes were identified, 16 of which were already known, suggesting that the remaining genes may also be related to PDAC.

CONCLUSION

An interpretable two-layer machine learning framework was proposed for early diagnosis and prediction of PDAC based on plasma EvlRNA, providing new insights into the clinical value of EvlRNA.

Key Words: Pancreatic ductal adenocarcinoma; Extracellular vesicle long RNA; Noninvasive early diagnosis; Interpretable machine learning; Two-layer classifier

Core Tip: The early diagnosis rate of pancreatic ductal adenocarcinoma is low and the prognosis is poor. It is important to develop an interpretable noninvasive early diagnostic model in clinical practice. In this study, an interpretable two-layer machine learning framework was proposed for the early diagnosis and prediction of pancreatic ductal adenocarcinoma based on plasma extracellular vesicle long RNA. This study provides new insights into the clinical value of extracellular vesicle long RNA for promoting the development of precision medicine.

Citation: Liu SC, Zhang H. Early cancer diagnosis via interpretable two-layer machine learning of plasma extracellular vesicle long RNA. World J Gastrointest Oncol 2025; 17(11): 111670
URL: https://www.wjgnet.com/1948-5204/full/v17/i11/111670.htm
DOI: https://dx.doi.org/10.4251/wjgo.v17.i11.111670

INTRODUCTION

Pancreatic cancer is a malignant neoplasm that primarily arises from the pancreatic duct epithelium and acinar cells. It is highly aggressive, and its insidious onset makes early diagnosis challenging. The disease progresses rapidly, and patients typically have a short survival period[1]. Pancreatic cancer is regarded as one of the malignant tumors with the poorest prognosis and is often referred to as “the king of cancers”. Pancreatic ductal adenocarcinoma (PDAC), a tumor occurring in the ductal epithelium of the pancreas, is the main type of pancreatic cancer, accounting for > 90% of cases[2]. PDAC has an insidious onset and is highly aggressive, and most patients are diagnosed at an advanced stage[3]. Although early-stage cancers can be effectively treated with surgery and radiation, late-stage cancers often cannot be controlled. The emergence of cancer is a multifactorial, multistage, complex and progressive process. During disease progression, early screening, early diagnosis, early treatment, and the management of cancer as a chronic disease are the most effective ways to improve the cure rate, reduce the pain of treatment, improve prognosis and reduce the economic burden. Therefore, early diagnosis is crucial for the successful treatment of cancer.

Although some existing techniques have been applied to the early diagnosis of PDAC, their overall performance still falls short of expectations. For example, although the carbohydrate antigen 19-9 (CA19-9) level is helpful for the prediction and efficacy assessment of pancreatic cancer, its sensitivity and specificity are low [a sensitivity of 75.4% and a specificity of 77.6% for differentiating malignant from non-malignant disease; the specificity for distinguishing PDAC from chronic pancreatitis (CP) often does not exceed 60%][4,5]. CA19-9 levels may also be elevated in cases of biliary tract infection, cholangitis, bile duct obstruction, or jaundice[6-8]. In Lewis antigen-negative patients, CA19-9 levels usually do not increase[9]. Therefore, there is an urgent need for new diagnostic biomarkers for PDAC, especially liquid biopsy biomarkers suitable for early detection and diagnosis.

Plasma extracellular vesicles (EVs) are one of the important materials for liquid biopsy. EVs, which include exosomes and microvesicles, are a special class of membrane-like, nanosized endocytotic vesicles secreted by most cell types. EVs contain a variety of molecular components (including RNA, proteins, lipids, and metabolites) that reflect the type of cell from which they are derived[10]. Initially, EVs were considered to be cellular waste. However, EVs are now increasingly acknowledged as crucial mediators of intercellular communication and as biomarkers for disease. EVs are associated with most pathological conditions, including cancer[11], and cardiovascular[12], neurological[13], and infectious[14] diseases. EVs contain and stabilize various types of RNA[15]. In EVs, microRNA (miRNA) has been well characterized and investigated[16]. Nevertheless, the small number of miRNAs in EVs and the lack of specificity in their production limit their wide application. Increasing evidence shows that plasma EV long RNA (EvlRNA), including mRNA, long non-coding RNA (lncRNA) and circular RNA, has functional and clinical significance[17,18]. For example, androgen receptor splice variant 7 can be detected in the blood EVs of patients with castration-resistant prostate cancer and can be used as a predictive biomarker of hormone therapy resistance[19]. In patients with melanoma and non-small cell lung cancer, CD274 mRNA in plasma-derived EVs is associated with anti-programmed death-1 antibody response[20]. Nabet et al[21] found that unshielded RN7SL1 can be transferred into breast cancer cells via EVs and activate the pattern recognition receptor retinoic acid-inducible gene I, promoting cancer invasion. These findings reveal that plasma EVs are rich in valuable and functional EvlRNAs. Therefore, it is feasible to identify tumor-specific genes in the plasma EvlRNA library for early cancer diagnosis. This is a non-invasive strategy for early diagnosis, detection, and treatment evaluation of human cancer.

In recent years, with the increasing availability of clinical data to support diagnosis and prognosis, advances in science and technology have made it possible to study cancer using high-throughput biomedical data. However, due to the complexity of cancer, clinical bioinformatic analysis and genetic interpretation remain challenging, and these data have yet to be fully explored. At present, most machine learning algorithms, especially complex algorithms that rely on neural networks, classify well but are “black box” models: it is difficult to extract the knowledge learned during model training[22,23]. For biological problems, it is necessary not only to build a high-performance diagnostic model that meets prediction needs, but also to uncover the rules the model learned during training, visualize the corresponding important feature weights, and explore the correlation between important features and actual biological processes, so as to help biomedical researchers understand the model. Therefore, how to design an interpretable model and visualize the corresponding important feature weights has been an active and difficult topic in machine learning in recent years.

In view of this, we propose an interpretable machine learning framework for early diagnostic prediction of PDAC based on plasma EvlRNA. In this framework, we build on our previous research ideas[24] to establish a two-layer classifier. The first layer distinguishes normal from non-normal samples, and the second layer identifies whether a sample belongs to PDAC or CP. We also propose a new concept, the EvlRNA-index, and establish a cancer diagnosis model based on it that achieves good diagnostic performance. Finally, we examine the interpretability of the entire machine learning framework and explore the correlation between important features and actual biological processes, in order to help biomedical researchers understand the model.

MATERIALS AND METHODS

Dataset

The dataset S used in this study was obtained from the multicenter study of Yu et al[25]. The dataset can be formulated as S = S^non-normal ∪ S^normal, where S^non-normal is the non-normal dataset consisting of PDAC and CP samples, S^normal is the normal dataset consisting of normal samples only, and ∪ denotes set union. The non-normal dataset can be further divided into two categories: S^non-normal = S₁^non-normal ∪ S₂^non-normal, where the subscripts 1 and 2 represent PDAC and CP, respectively. The dataset S was generated by next-generation sequencing of plasma EvlRNA. It contained 501 patient records in total, comprising 284 patients with PDAC, 100 with CP and 117 healthy subjects (Table 1). For each record, we obtained 54148 explanatory variables and one response variable. These plasma EvlRNA expression data had been standardized as transcripts per kilobase million.

Table 1 Dataset information.

Sample	Dataset	Type	Number of samples (male)	Number of samples (female)	Number of samples (both)
Non-normal	S₁^non-normal	Pancreatic ductal adenocarcinoma	167	117	284
Non-normal	S₂^non-normal	Chronic pancreatitis	73	27	100
Normal	S^normal	Healthy	71	46	117

Feature selection

Feature selection is important for building an effective model. We used mean decrease in accuracy (MDA) in combination with the DESeq2 and edgeR methods to screen important features. Figure 1 shows the feature selection workflow. MDA represents the average decrease in classification accuracy on the “out of bag” samples when the values of a particular feature are randomly permuted. MDA was calculated using the randomForest package[26] in R (http://cran.r-project.org//). DESeq2 and edgeR were implemented with the DESeq2 package[27] and the edgeR package[28], respectively.
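
The permutation idea behind MDA can be illustrated with a minimal sketch. It is written in Python rather than R, it is not the randomForest implementation used in the study, and it scores a generic fitted model on a fixed sample set instead of per-tree out-of-bag samples. The toy data, `toy_model`, and all function names are hypothetical.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(rows)

def mean_decrease_in_accuracy(predict, rows, labels, n_repeats=20, seed=0):
    """Average accuracy drop after shuffling one feature column at a time."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    n_features = len(rows[0])
    mda = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only; all other columns stay untouched.
            column = [row[j] for row in rows]
            rng.shuffle(column)
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        mda.append(sum(drops) / n_repeats)
    return mda

# Hypothetical toy data: feature 0 determines the label, feature 1 is noise.
rng = random.Random(42)
rows = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
labels = [1 if row[0] > 0 else 0 for row in rows]

def toy_model(row):
    """Stand-in for a fitted classifier; it only looks at feature 0."""
    return 1 if row[0] > 0 else 0

mda_scores = mean_decrease_in_accuracy(toy_model, rows, labels)
print(mda_scores)
```

In this toy setting, shuffling the informative feature lowers accuracy substantially, while shuffling the noise feature leaves it unchanged, which is the signal that MDA ranks features by.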

Figure 1 Feature selection method flowchart. EvlRNA: Extracellular vesicle long RNA; PDAC: Pancreatic ductal adenocarcinoma; CP: Chronic pancreatitis.

EvlRNA-index calculation of plasma EvlRNA samples

To calculate the EvlRNA-index, we applied four algorithms to the samples (Figure 2). With each algorithm, every sample was scored by 10 EvlRNA-index values, so each sample yielded a 1 × 10 matrix (EvlRNA-index1, EvlRNA-index2, EvlRNA-index3, …, EvlRNA-index8, EvlRNA-index9, EvlRNA-index10). The four algorithms were singular value decomposition (SVD) principal component analysis (PCA), nonlinear iterative partial least squares (Nipals) PCA, probabilistic PCA and FastHCS (high-dimensional congruent subsets).

Figure 2 Extracellular vesicle long RNA-index calculation of plasma extracellular vesicle long RNA samples. EvlRNA: Extracellular vesicle long RNA; SVD: Singular value decomposition; PCA: Principal component analysis; FastHCS: High-dimensional congruent subsets; Nipals: Nonlinear iterative partial least squares.

SVD PCA is a conventional PCA algorithm[29]. Nipals[30] is the algorithm at the root of partial least squares regression; it can perform PCA with missing values by simply leaving them out of the relevant inner products, and it tolerates small amounts (generally not more than 5%) of missing data. Probabilistic PCA[31] combines an expectation maximization approach to PCA with a probabilistic model. FastHCS[32] is a robust PCA algorithm suitable for high-dimensional applications, including cases where the number of variables exceeds the number of observations. SVD PCA, Nipals PCA and probabilistic PCA were implemented with the pcaMethods package. FastHCS was implemented with the FastHCS package.
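
To make the index construction concrete, the following is a minimal sketch of Nipals PCA in Python, not the pcaMethods implementation used in the study. It assumes complete data (no missing values) and uses a hypothetical random matrix of 40 samples by 20 genes; the names `nipals_pca`, `expr`, and `index_matrix` are illustrative only. Each sample ends up with 10 component scores, analogous to the 1 × 10 EvlRNA-index vector described above.

```python
import random

def center_columns(X):
    """Subtract the column mean from every entry."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in X]

def nipals_pca(X, n_components, n_iter=1000, tol=1e-20):
    """Return an n_samples x n_components score matrix computed by NIPALS."""
    X = center_columns(X)
    n, m = len(X), len(X[0])
    scores = []
    for _ in range(n_components):
        # Start from the residual column with the largest sum of squares.
        start = max(range(m), key=lambda j: sum(row[j] ** 2 for row in X))
        t = [row[start] for row in X]
        for _ in range(n_iter):
            tt = sum(v * v for v in t)
            # Loadings: regress every column on the current scores, then normalize.
            p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
            norm = sum(v * v for v in p) ** 0.5
            p = [v / norm for v in p]
            # Scores: project every sample onto the loadings.
            t_new = [sum(X[i][j] * p[j] for j in range(m)) for i in range(n)]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change <= tol * sum(v * v for v in t):
                break
        # Deflate: remove the extracted component before finding the next one.
        X = [[X[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        scores.append(t)
    return [[scores[c][i] for c in range(n_components)] for i in range(n)]

def pearson(u, v):
    """Pearson correlation of two equally long sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

# Hypothetical expression matrix: 40 samples x 20 genes with decreasing variance.
rng = random.Random(1)
expr = [[rng.gauss(0, 0.6 ** j) for j in range(20)] for _ in range(40)]
index_matrix = nipals_pca(expr, n_components=10)

columns = [[row[c] for row in index_matrix] for c in range(10)]
max_abs_corr = max(
    abs(pearson(columns[a], columns[b])) for a in range(10) for b in range(a + 1, 10)
)
print(len(index_matrix), len(index_matrix[0]), max_abs_corr)
```

Because each component is extracted from the residual of the previous ones, the resulting scores are mutually uncorrelated up to numerical error.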

Two-layer classifier

To enable the first-layer classifier to distinguish normal from non-normal samples, we selected four machine learning approaches: Support vector machine (SVM), random forest (RF), deep learning (DL), and extreme gradient boosting (XGBoost). In R (http://cran.r-project.org//), SVM was implemented with the e1071 package, RF with the randomForest package, DL with the h2o package, and XGBoost with the XGBoost package. The second-layer classifier identified whether a sample belonged to PDAC or CP, and was built with the same four methods. For each layer, the basic classifier was chosen from SVM, RF, DL, and XGBoost according to performance. Figure 3 shows the diagnostic model construction flowchart.
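
The routing logic of the two-layer design can be sketched as follows. This is a Python illustration with hypothetical threshold rules standing in for the trained first-layer and second-layer models; it is not the authors' R code, and the function names and cutoffs are assumptions made for the example.

```python
from typing import Callable, Sequence

Features = Sequence[float]

def two_layer_predict(
    features: Features,
    layer1: Callable[[Features], bool],
    layer2: Callable[[Features], bool],
) -> str:
    """Route one sample through the cascade.

    layer1 returns True for non-normal samples (PDAC or CP).
    layer2 is consulted only for non-normal samples and returns True for PDAC.
    """
    if not layer1(features):
        return "normal"
    return "PDAC" if layer2(features) else "CP"

# Hypothetical stand-ins for trained classifiers (e.g. an SVM and an RF).
def toy_layer1(x: Features) -> bool:
    return x[0] > 0.5

def toy_layer2(x: Features) -> bool:
    return x[1] > 0.0

samples = [[0.1, 2.0], [0.9, 1.5], [0.9, -1.5]]
calls = [two_layer_predict(s, toy_layer1, toy_layer2) for s in samples]
print(calls)
```

One consequence of this design is that the second layer never sees samples the first layer calls normal, so first-layer false negatives cannot be recovered downstream.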

Figure 3 Diagnostic model construction flowchart. NGS: Next-generation sequencing; EvlRNA: Extracellular vesicle long RNA; PDAC: Pancreatic ductal adenocarcinoma; CP: Chronic pancreatitis.

Evaluating performance

After the models were built, classifier performance was evaluated by sensitivity, specificity, accuracy, and the Matthews correlation coefficient (MCC). The formulas for these four metrics are detailed in the Supplementary material. To compare the overall performance of the models, the area under the receiver operating characteristic (ROC) curve (AUC), which ranges from 0 to 1, was calculated. The ROC curve depicts the relationship between the true positive rate and the false positive rate. The AUC represents the probability that a randomly chosen positive sample is ranked higher than a randomly chosen negative sample, so a higher AUC indicates better predictive performance. Five-fold cross-validation was used to assess the performance of the model. In addition, we established an independent dataset consisting of 35 normal samples randomly selected from the 117 normal samples and 115 non-normal samples randomly selected from the 384 non-normal samples. These samples were not used in the training, feature selection, or parameter optimization of the model.
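
As an illustration of these metrics, the sketch below uses the standard textbook definitions in Python; the exact formulas used in the study are in its Supplementary material and are not reproduced here. The confusion-matrix counts are an assumption: they were back-calculated so that they reproduce the percentages reported for model 1-18 on the training dataset, and the paper itself does not list them. The score lists used for the AUC are hypothetical.

```python
from math import sqrt

def confusion_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }

def auc_by_ranking(pos_scores, neg_scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Assumed counts: 81 non-normal and 25 normal samples, inferred from the
# reported 92.59% sensitivity, 88.00% specificity and 91.51% accuracy.
metrics = confusion_metrics(tp=75, fn=6, tn=22, fp=3)
print({name: round(value, 4) for name, value in metrics.items()})

# Hypothetical classifier scores for three positive and two negative samples.
auc = auc_by_ranking([0.9, 0.8, 0.4], [0.7, 0.3])
print(round(auc, 4))
```

Under these assumed counts, the computed MCC rounds to 0.7760, matching the value reported for model 1-18.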

Data exploration and functional analysis

The t-distributed stochastic neighbor embedding (t-SNE) analysis was implemented with the Rtsne package in R, and PCA with the FactoMineR package in R. The biological significance of the genes was determined by functional enrichment analysis in DAVID[33], with a Benjamini-Hochberg-adjusted P value < 0.05 taken as the significance threshold. Protein-protein interaction analysis was performed with the STRING database (https://string-db.org/)[34]. Pathway analysis was performed with the Reactome Knowledgebase (https://reactome.org)[35]. Single-cell RNA expression analysis was performed using TISCH2 (Tumor Immune Single-cell Hub 2, http://tisch.comp-genomics.org/), a single-cell RNA expression database that focuses on the tumor microenvironment[36].
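
For readers unfamiliar with the Benjamini-Hochberg adjustment, a minimal Python sketch of the standard step-up procedure is given below. DAVID performs this adjustment internally; the function name and the example P values here are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P values for four enrichment terms.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [a < 0.05 for a in adj]
print(adj, significant)
```

With these inputs, every adjusted value stays below 0.05, so all four hypothetical terms would pass the threshold described above.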

Survival outcome assessments, including overall survival (OS) and disease-free survival (DFS), were conducted through Kaplan-Meier survival curves complemented by log-rank test comparisons. Cohort stratification according to gene expression levels (high vs low) was determined using median expression values as the cutoff criterion. Statistical calculations encompassed hazard ratio quantification with corresponding 95% confidence intervals. All analytical procedures were executed via the GEPIA bioinformatics platform (http://gepia.cancer-pku.cn/)[37].

RESULTS

Identification of potential biomarkers

Figure 1 shows the identification process of potential biomarkers. The EvlRNA differences between PDAC and CP, PDAC and normal, and CP and normal were analyzed by DESeq2 (false discovery rate < 0.05, log₂fold change > 0.5), yielding the sets PDACvsCP_By_DEseq2, PDACvsNormal_By_DEseq2 and CPvsNormal_By_DEseq2. The same three comparisons were also analyzed by edgeR (false discovery rate < 0.05, log₂fold change > 0.5), yielding the sets PDACvsCP_By_edger, PDACvsNormal_By_edger and CPvsNormal_By_edger. For each comparison, the intersection of the DESeq2 and edgeR results was then taken.
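
The filtering and intersection step can be sketched in Python as below. The gene names and statistics are hypothetical, and the result tables are simplified to (false discovery rate, log₂ fold change) pairs rather than real DESeq2 or edgeR output. The source states the fold-change threshold as log₂fold change > 0.5 without saying whether the absolute value was used, so the sketch applies the threshold literally.

```python
def significant_genes(results, fdr_cutoff=0.05, lfc_cutoff=0.5):
    """Keep genes with FDR below the cutoff and log2 fold change above the cutoff."""
    return {
        gene
        for gene, (fdr, log2_fc) in results.items()
        if fdr < fdr_cutoff and log2_fc > lfc_cutoff
    }

# Hypothetical per-gene results for one comparison (PDAC vs CP): (FDR, log2FC).
deseq2_pdac_vs_cp = {
    "geneA": (0.001, 1.2),
    "geneB": (0.20, 2.0),   # fails the FDR cutoff
    "geneC": (0.01, 0.8),
    "geneD": (0.03, 0.3),   # fails the fold-change cutoff
}
edger_pdac_vs_cp = {
    "geneA": (0.004, 1.1),
    "geneB": (0.01, 1.9),
    "geneC": (0.02, 0.7),
    "geneD": (0.01, 0.9),
}

by_deseq2 = significant_genes(deseq2_pdac_vs_cp)
by_edger = significant_genes(edger_pdac_vs_cp)

# Only genes called by both methods are carried forward.
consensus = by_deseq2 & by_edger
print(sorted(consensus))
```

Requiring agreement between two methods trades some sensitivity for a more conservative candidate list, which is then narrowed further in the later steps of the workflow.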

To obtain cancer-associated EvlRNA biomarkers, these EvlRNAs were combined with RNA profiles from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) (178 PDAC tissue samples and 171 normal pancreatic tissue samples). The PDAC vs CP comparison yielded 1623 differential genes (set 1), the PDAC vs normal comparison yielded 1376 differential genes (set 2), and the CP vs normal comparison yielded 326 differential genes (set 3) (Figure 1). Set 1 and set 2 contained seven genes finally screened by Yu et al[25], supporting the reliability of our analysis. The three sets were further intersected and 96 genes were obtained. Finally, MDA was used for feature selection, and the 20 genes selected by MDA were used for further analysis (Figure 4A). These genes include well-known cancer-related genes, such as S100A9[38], P2RX1[39], and PTPRJ[40]. Heat map visualization showed significant differences in EvlRNA expression among PDAC, CP and normal samples (Figure 4B). However, unsupervised learning did not separate the three types of samples well. Figure 4C shows the results of PCA and t-SNE analysis of the 20 identified genes based on EvlRNA expression. PCA and t-SNE analysis of the eight genes screened in the study of Yu et al[25] gave similar results (Figure 4D). Figure 4E shows the expression differences of these 20 genes in pancreatic cancer, which included both significantly up-regulated and significantly down-regulated genes, whereas the eight genes screened in Yu et al’s study[25] were all significantly up-regulated. Gene Ontology (GO) analysis of these genes showed that apoptotic process (GO: 0006915), regulation of apoptotic signaling pathway (GO: 2001233), regulation of apoptotic process (GO: 0042981), positive regulation of apoptotic signaling pathway (GO: 2001235), and negative regulation of apoptotic process (GO: 0043066) were significantly enriched, which suggests that these genes play a key role in regulating the apoptotic signaling pathway (Figure 4F). Other enriched GO terms were associated with programmed cell death (GO: 0012501), regulation of growth (GO: 0040008), cell death (GO: 0008219), positive regulation of tumor necrosis factor production (GO: 0032760), regulation of programmed cell death (GO: 0043067), and inflammatory response (GO: 0006954) (Figure 4F), processes that play an important role in the development of cancer.

Figure 4 Analysis of potential biomarkers and visualization. A: Feature (gene set) selection with mean decrease in accuracy; B: Heatmap analysis of the selected biomarkers (20 genes); C: Principal component analysis and t-distributed stochastic neighbor embedding analysis based on the extracellular vesicle long RNA expression of the selected biomarkers (20 genes); D: Principal component analysis and t-distributed stochastic neighbor embedding analysis based on the extracellular vesicle long RNA expression of the eight genes screened by Yu et al[25]; E: Gene expression analysis of potential biomarkers was performed using RNA-seq data from The Cancer Genome Atlas and Genotype-Tissue Expression databases, which included 178 pancreatic ductal adenocarcinoma tissue samples and 171 normal pancreatic tissue samples; F: Gene Ontology analysis of 20 genes (potential biomarkers). PDAC: Pancreatic ductal adenocarcinoma; CP: Chronic pancreatitis; PCA: Principal component analysis; t-SNE: T-distributed stochastic neighbor embedding; FPKM: Fragments per kilobase of exon per million mapped fragments; GO: Gene Ontology; RAGE: Receptor for advanced glycation end products; BP: Biological process; CC: Cellular component; MF: Molecular function.

Analysis and visualization of EvlRNA-index

After calculating the EvlRNA-index (Figure 2), we examined the relationship between the indexes by correlation analysis using the corrplot package. Figure 5A-D shows the correlation analyses of the EvlRNA-index calculated with the SVD PCA, Nipals PCA, probabilistic PCA, and FastHCS algorithms, respectively. The indexes were not correlated with each other, indicating that they were not redundant as input features of the model. We also visualized the top three indexes for each sample; PDAC/CP and normal samples could not be accurately distinguished by visual inspection alone (Figure 5E-H).

Figure 5 Correlation analysis and visualization of extracellular vesicle long RNA-index calculated by different algorithms. A: Correlation analysis of extracellular vesicle long RNA (EvlRNA)-index calculated based on singular value decomposition principal component analysis (PCA) algorithm; B: Correlation analysis of EvlRNA-index calculated based on nonlinear iterative partial least squares PCA algorithm; C: Correlation analysis of EvlRNA-index calculated based on Probabilistic PCA algorithm; D: Correlation analysis of EvlRNA-index calculated based on FastHCS algorithm; E: Visualization of EvlRNA-index calculated by singular value decomposition PCA algorithm; F: Visualization of EvlRNA-index calculated by nonlinear iterative partial least squares PCA algorithm; G: Visualization of EvlRNA-index calculated by Probabilistic PCA algorithm; H: Visualization of EvlRNA-index calculated by FastHCS algorithm. SVD: Singular value decomposition; PCA: Principal component analysis; EvlRNA: Extracellular vesicle long RNA; PDAC: Pancreatic ductal adenocarcinoma; CP: Chronic pancreatitis; FastHCS: High-dimensional congruent subsets; Nipals: Nonlinear iterative partial least squares.

First classifier-identifying normal or non-normal

The first layer of the classifier identifies whether a sample is normal or non-normal. With each of RF, SVM, DL, and XGBoost, first-layer models were built using the biomarkers selected by MDA and the EvlRNA-index calculated by conventional_SVD_PCA, FastHCS, Nipals_PCA, and Probabilistic_PCA, respectively. Table 2 shows the performance of the first-layer models. The accuracy of PDACandCPvsHealth-MDA-SVM (1-2) was 90.57%, with sensitivity 93.83%, specificity 80.00%, MCC 0.7383, and AUC 0.9146. The accuracy of PDACandCPvsHealth-SVD PCA Index-RF (1-5) was 96.23%, with sensitivity 100.00%, specificity 84.00%, MCC 0.8947, and AUC 0.9901. The accuracy of PDACandCPvsHealth-SVD PCA Index-XGB (1-8) was 91.51%, with sensitivity 98.77%, specificity 68.00%, MCC 0.7549, and AUC 0.9294. The accuracy of PDACandCPvsHealth-Nipals PCA Index-DL (1-15) was 90.57%, with sensitivity 95.06%, specificity 76.00%, MCC 0.7319, and AUC 0.9679. The accuracy of PDACandCPvsHealth-Nipals PCA Index-XGB (1-16) was 91.51%, with sensitivity 98.77%, specificity 68.00%, MCC 0.7549, and AUC 0.9294. The accuracy of PDACandCPvsHealth-Probabilistic PCA Index-RF (1-17) was 93.40%, with sensitivity 96.30%, specificity 84.00%, MCC 0.8145, and AUC 0.9763. The accuracy of PDACandCPvsHealth-Probabilistic PCA Index-SVM (1-18) was 91.51%, with sensitivity 92.59%, specificity 88.00%, MCC 0.7760, and AUC 0.9560. In the first layer of the classifier, the models based on the EvlRNA-index performed better than the models based on the genes screened by MDA. On the training dataset, PDACandCPvsHealth-SVD PCA Index-RF (1-5) showed the best performance of all models, with accuracy 96.23%, MCC 0.8947, and AUC 0.9901 (Table 2). However, on the independent dataset and in five-fold cross-validation, the predictive ability of PDACandCPvsHealth-SVD PCA Index-RF (1-5) was poor (Tables 3 and 4). Among all models, PDACandCPvsHealth-Probabilistic PCA Index-SVM (1-18) performed best in terms of internal stability and external predictability, with accuracy 91.51%, MCC 0.7760, and AUC 0.9560 on the training dataset (Table 2, Figure 6A), and accuracy 93.33%, MCC 0.8137, and AUC 0.9717 on the independent dataset (Table 3, Figure 6B). In five-fold cross-validation, its accuracy was 91.19%, MCC 0.7430, and AUC 0.9389 (Table 4).

Figure 6 Receiver operating characteristic curve of pancreatic ductal adenocarcinoma and chronic pancreatitis vs health-probabilistic principal component analysis index-support vector machine (1-18). A: Based on training dataset; B: Based on the independent dataset. AUC: Area under the receiver operating characteristic curve.

Table 2 Performance of the first-layer models on training dataset.

Models	ACC, %	Sen, %	Sp, %	MCC	AUC
PDACandCPvsHealth-MDA-RF (1-1)	88.68	96.30	64.00	0.6674	0.9328
PDACandCPvsHealth-MDA-SVM (1-2)	90.57	93.83	80.00	0.7383	0.9146
PDACandCPvsHealth-MDA-DL (1-3)	87.74	88.89	84.00	0.6869	0.9151
PDACandCPvsHealth-MDA-XGB (1-4)	87.74	95.06	64.00	0.6408	0.9304
PDACandCPvsHealth-SVD PCA Index-RF (1-5)	96.23	100.00	84.00	0.8947	0.9901
PDACandCPvsHealth-SVD PCA Index-SVM (1-6)	85.85	95.06	56.00	0.5773	0.9200
PDACandCPvsHealth-SVD PCA Index-DL (1-7)	89.62	90.12	88.00	0.7363	0.9580
PDACandCPvsHealth-SVD PCA Index-XGB (1-8)	91.51	98.77	68.00	0.7549	0.9294
PDACandCPvsHealth-FastHCS Index-RF (1-9)	89.62	96.30	68.00	0.6976	0.9462
PDACandCPvsHealth-FastHCS Index-SVM (1-10)	88.68	96.30	64.00	0.6674	0.9230
PDACandCPvsHealth-FastHCS Index-DL (1-11)	82.08	88.89	60.00	0.4959	0.8993
PDACandCPvsHealth-FastHCS Index-XGB (1-12)	85.85	93.83	60.00	0.5840	0.9407
PDACandCPvsHealth-Nipals PCA Index-RF (1-13)	89.62	98.77	60.00	0.6969	0.9059
PDACandCPvsHealth-Nipals PCA Index-SVM (1-14)	89.62	97.53	64.00	0.6957	0.9294
PDACandCPvsHealth-Nipals PCA Index-DL (1-15)	90.57	95.06	76.00	0.7319	0.9679
PDACandCPvsHealth-Nipals PCA Index-XGB (1-16)	91.51	98.77	68.00	0.7549	0.9294
PDACandCPvsHealth-Probabilistic PCA Index-RF (1-17)	93.40	96.30	84.00	0.8145	0.9763
PDACandCPvsHealth-Probabilistic PCA Index-SVM (1-18)	91.51	92.59	88.00	0.7760	0.9560
PDACandCPvsHealth-Probabilistic PCA Index-DL (1-19)	88.68	90.12	84.00	0.7059	0.9462
PDACandCPvsHealth-Probabilistic PCA Index-XGB (1-20)	86.79	93.83	64.00	0.6159	0.9007

ACC: Accuracy; Sen: Sensitivity; Sp: Specificity; MCC: Matthews correlation coefficient; AUC: Area under the receiver operating characteristic curve; PDAC: Pancreatic ductal adenocarcinoma; CP: Chronic pancreatitis; MDA: Mean decrease in accuracy; RF: Random forest; SVM: Support vector machine; DL: Deep learning; XGB: Extreme gradient boosting; SVD: Singular value decomposition; PCA: Principal component analysis; FastHCS: High-dimensional congruent subsets; Nipals: Nonlinear iterative partial least squares.

Table 3 Performance of the first-layer models on independent dataset.

Models	ACC, %	Sen, %	Sp, %	MCC	AUC
PDACandCPvsHealth-MDA-RF (1-1)	91.33	98.26	68.57	0.7467	0.9606
PDACandCPvsHealth-MDA-SVM (1-2)	87.33	94.78	62.86	0.6257	0.9409
PDACandCPvsHealth-MDA-DL (1-3)	84.00	87.83	71.43	0.5714	0.9153
PDACandCPvsHealth-MDA-XGB (1-4)	90.67	96.52	71.43	0.7278	0.9595
PDACandCPvsHealth-SVD PCA Index-RF (1-5)	90.67	97.39	68.57	0.7262	0.9288
PDACandCPvsHealth-SVD PCA Index-SVM (1-6)	88.67	96.52	62.86	0.6635	0.9198
PDACandCPvsHealth-SVD PCA Index-DL (1-7)	85.33	86.09	82.86	0.6363	0.9190
PDACandCPvsHealth-SVD PCA Index-XGB (1-8)	90.67	98.26	65.71	0.7261	0.9160
PDACandCPvsHealth-FastHCS Index-RF (1-9)	84.00	93.91	51.43	0.5146	0.8737
PDACandCPvsHealth-FastHCS Index-SVM (1-10)	87.33	94.78	62.86	0.6257	0.9155
PDACandCPvsHealth-FastHCS Index-DL (1-11)	84.67	90.43	65.71	0.5672	0.9016
PDACandCPvsHealth-FastHCS Index-XGB (1-12)	86.67	93.91	62.86	0.6080	0.9058
PDACandCPvsHealth-Nipals PCA Index-RF (1-13)	90.67	98.26	65.71	0.7261	0.8961
PDACandCPvsHealth-Nipals PCA Index-SVM (1-14)	90.67	99.13	62.86	0.7276	0.9324
PDACandCPvsHealth-Nipals PCA Index-DL (1-15)	88.00	91.30	77.14	0.6716	0.9093
PDACandCPvsHealth-Nipals PCA Index-XGB (1-16)	90.67	98.26	65.71	0.7261	0.9160
PDACandCPvsHealth-Probabilistic PCA Index-RF (1-17)	91.33	96.52	74.29	0.7487	0.9165
PDACandCPvsHealth-Probabilistic PCA Index-SVM (1-18)	93.33	95.65	85.71	0.8137	0.9717
PDACandCPvsHealth-Probabilistic PCA Index-DL (1-19)	89.33	92.17	80.00	0.7081	0.9347
PDACandCPvsHealth-Probabilistic PCA Index-XGB (1-20)	89.33	93.04	77.14	0.7019	0.9426


Table 4 The results of the first-layer models five-fold cross-validation.

Models	ACC, %	MCC	AUC
PDACandCPvsHealth-MDA-RF (1-1)	88.61	0.6710	0.9427
PDACandCPvsHealth-MDA-SVM (1-2)	85.74	0.5850	0.9059
PDACandCPvsHealth-MDA-DL (1-3)	83.19	0.5267	0.8910
PDACandCPvsHealth-MDA-XGB (1-4)	89.17	0.6858	0.9475
PDACandCPvsHealth-SVD PCA Index-RF (1-5)	88.90	0.6724	0.9288
PDACandCPvsHealth-SVD PCA Index-SVM (1-6)	92.31	0.7802	0.9760
PDACandCPvsHealth-SVD PCA Index-DL (1-7)	89.72	0.7130	0.9471
PDACandCPvsHealth-SVD PCA Index-XGB (1-8)	88.89	0.6753	0.9472
PDACandCPvsHealth-FastHCS Index-RF (1-9)	84.60	0.5352	0.9025
PDACandCPvsHealth-FastHCS Index-SVM (1-10)	85.73	0.5670	0.9268
PDACandCPvsHealth-FastHCS Index-DL (1-11)	83.73	0.5386	0.8951
PDACandCPvsHealth-FastHCS Index-XGB (1-12)	86.90	0.6230	0.9250
PDACandCPvsHealth-Nipals PCA Index-RF (1-13)	88.02	0.6540	0.9274
PDACandCPvsHealth-Nipals PCA Index-SVM (1-14)	87.46	0.6306	0.9334
PDACandCPvsHealth-Nipals PCA Index-DL (1-15)	88.86	0.6957	0.9336
PDACandCPvsHealth-Nipals PCA Index-XGB (1-16)	88.89	0.6753	0.9474
PDACandCPvsHealth-Probabilistic PCA Index-RF (1-17)	89.16	0.6796	0.9478
PDACandCPvsHealth-Probabilistic PCA Index-SVM (1-18)	91.19	0.7430	0.9389
PDACandCPvsHealth-Probabilistic PCA Index-DL (1-19)	89.47	0.7163	0.9576
PDACandCPvsHealth-Probabilistic PCA Index-XGB (1-20)	88.61	0.6686	0.9106

ACC: Accuracy; MCC: Matthews correlation coefficient; AUC: Area under the receiver operating characteristic curve; PDAC: Pancreatic ductal adenocarcinoma; CP: Chronic pancreatitis; MDA: Mean decrease in accuracy; RF: Random forest; SVM: Support vector machine; DL: Deep learning; XGB: Extreme gradient boosting; SVD: Singular value decomposition; PCA: Principal component analysis; FastHCS: High-dimensional congruent subsets; Nipals: Nonlinear iterative partial least squares.


To validate our approach, we evaluated the models on an independent dataset of 150 samples: 115 positive (85 PDAC and 30 CP) and 35 negative (normal). All models performed well on this dataset, with accuracies from 84.00% to 93.33% and AUCs from 0.8737 to 0.9717, supporting their reliability (Table 3). Five-fold cross-validation was also performed (Table 4). Based on performance on both the training and independent datasets, we selected PDACandCPvsHealth-Probabilistic PCA Index-SVM (1-18) as the first-layer classifier of our method.
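As a consistency check on how the reported metrics relate to one another, the following minimal sketch recomputes ACC, Sen, Sp, and MCC from confusion-matrix counts. The counts used here (113 true positives, 2 false negatives, 24 true negatives, 11 false positives) are not reported in the article; they are an assumption inferred from the Table 3 row for model 1-1 and a split of 115 positive and 35 negative samples. The function name `binary_metrics` is illustrative only.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, specificity (all in %) and MCC
    from the four cells of a binary confusion matrix."""
    total = tp + fn + tn + fp
    acc = 100.0 * (tp + tn) / total
    sen = 100.0 * tp / (tp + fn)   # true positive rate
    sp = 100.0 * tn / (tn + fp)    # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, sen, sp, mcc


# Counts inferred (assumption) from Table 3, model 1-1:
# 115 positives (PDAC + CP) and 35 negatives (normal).
acc, sen, sp, mcc = binary_metrics(tp=113, fn=2, tn=24, fp=11)
print(f"ACC={acc:.2f}% Sen={sen:.2f}% Sp={sp:.2f}% MCC={mcc:.4f}")
```

With these inferred counts the printed values agree with the Table 3 entries for model 1-1 (91.33, 98.26, 68.57, 0.7467), which is consistent with the independent dataset containing 150 samples in total.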

Second classifier-identifying PDAC or CP

The second layer of the classifier distinguishes PDAC from CP. Four machine learning algorithms (RF, SVM, DL, and XGBoost) were used to implement it. The statistics for the second-layer models on the training dataset are summarized in Table 5. In the second layer, the best EvlRNA-index-based models outperformed the best models based on the genes screened by MDA (Tables 5-7), indicating that the EvlRNA-index is a suitable input feature. On the training dataset, the accuracy of PDACvsCP-Probabilistic PCA Index-RF (2-17) was 93.83%, with sensitivity 95.00%, specificity 90.48%, MCC 0.8422, and AUC 0.9698 (Figure 7). The accuracy of PDACvsCP-Probabilistic PCA Index-SVM (2-18) was 96.30%, with sensitivity 100.00%, specificity 85.71%, MCC 0.9035, and AUC 0.9833. On the independent dataset, PDACvsCP-Probabilistic PCA Index-RF (2-17) matched PDACvsCP-Probabilistic PCA Index-SVM (2-18) in accuracy (94.78% each) and achieved higher specificity (86.67% vs 80.00%) and AUC (0.9741 vs 0.9255) (Table 6). The combination of the Probabilistic PCA-based EvlRNA-index and RF is therefore a good choice, with high internal stability and strong external prediction ability. In five-fold cross-validation, PDACvsCP-Probabilistic PCA Index-RF (2-17) had the best performance among all the models, with an accuracy of 98.13% (Table 7).


Figure 7 Receiver operating characteristic curve of pancreatic ductal adenocarcinoma vs chronic pancreatitis-probabilistic principal component analysis index-random forest (2-17). A: Based on training dataset; B: Based on the independent dataset. AUC: Area under the receiver operating characteristic curve.
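Because an AUC is reported for every model, a short sketch of how such a value can be obtained from classifier scores may be useful. The function below uses the rank-based (Mann-Whitney) formulation, in which the AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample, with ties counted as one half. The scores are made-up toy values rather than study data, and the article does not state which software computed its AUCs, so this is an illustration rather than the authors' procedure.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    pos_scores: classifier scores of the positive samples (e.g. PDAC)
    neg_scores: classifier scores of the negative samples (e.g. CP)
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos_scores) * len(neg_scores))


# Toy scores (hypothetical): 8 of the 9 positive/negative pairs are ordered correctly.
toy_auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(round(toy_auc, 4))
```

The pairwise form is quadratic in the number of samples, which is acceptable for cohorts of this size; a sort-based version would be preferable for much larger datasets.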

Table 5 Performance of the second-layer models on training dataset.

Models	ACC, %	Sen, %	Sp, %	MCC	AUC
PDACvsCP-MDA-RF (2-1)	90.12	95.00	76.19	0.7363	0.9444
PDACvsCP-MDA-SVM (2-2)	93.83	96.67	85.71	0.8372	0.9563
PDACvsCP-MDA-DL (2-3)	85.19	90.00	71.43	0.6143	0.9159
PDACvsCP-MDA-XGB (2-4)	86.42	91.67	71.43	0.6412	0.9246
PDACvsCP-SVD PCA Index-RF (2-5)	93.83	96.67	85.71	0.8372	0.9802
PDACvsCP-SVD PCA Index-SVM (2-6)	93.83	98.33	80.95	0.8357	0.9627
PDACvsCP-SVD PCA Index-DL (2-7)	88.89	90.00	85.71	0.7266	0.9159
PDACvsCP-SVD PCA Index-XGB (2-8)	86.42	90.00	76.19	0.6521	0.9603
PDACvsCP-FastHCS Index-RF (2-9)	93.83	98.33	80.95	0.8357	0.9563
PDACvsCP-FastHCS Index-SVM (2-10)	88.89	93.33	76.19	0.7065	0.9750
PDACvsCP-FastHCS Index-DL (2-11)	93.83	96.67	85.71	0.8372	0.9333
PDACvsCP-FastHCS Index-XGB (2-12)	93.83	98.33	80.95	0.8357	0.9357
PDACvsCP-Nipals PCA Index-RF (2-13)	92.59	93.33	90.48	0.8145	0.9651
PDACvsCP-Nipals PCA Index-SVM (2-14)	87.65	93.33	71.43	0.6698	0.9397
PDACvsCP-Nipals PCA Index-DL (2-15)	88.89	91.67	80.95	0.7155	0.9246
PDACvsCP-Nipals PCA Index-XGB (2-16)	86.42	90.00	76.19	0.6521	0.9603
PDACvsCP-Probabilistic PCA Index-RF (2-17)	93.83	95.00	90.48	0.8422	0.9698
PDACvsCP-Probabilistic PCA Index-SVM (2-18)	96.30	100.00	85.71	0.9035	0.9833
PDACvsCP-Probabilistic PCA Index-DL (2-19)	91.36	96.67	76.19	0.7680	0.9579
PDACvsCP-Probabilistic PCA Index-XGB (2-20)	90.12	91.67	85.71	0.7520	0.9563


Table 6 Performance of the second-layer models on independent dataset.

Models	ACC, %	Sen, %	Sp, %	MCC	AUC
PDACvsCP-MDA-RF (2-1)	88.70	97.65	63.33	0.6931	0.9500
PDACvsCP-MDA-SVM (2-2)	91.30	97.65	73.33	0.7670	0.9549
PDACvsCP-MDA-DL (2-3)	89.57	90.59	86.67	0.7434	0.9282
PDACvsCP-MDA-XGB (2-4)	86.09	92.94	66.67	0.6257	0.9149
PDACvsCP-SVD PCA Index-RF (2-5)	93.04	95.29	86.67	0.8196	0.9641
PDACvsCP-SVD PCA Index-SVM (2-6)	93.04	92.94	93.33	0.8302	0.9753
PDACvsCP-SVD PCA Index-DL (2-7)	86.09	84.71	90.00	0.6888	0.9392
PDACvsCP-SVD PCA Index-XGB (2-8)	92.17	95.29	83.33	0.7951	0.9702
PDACvsCP-FastHCS Index-RF (2-9)	86.96	92.94	70.00	0.6521	0.9380
PDACvsCP-FastHCS Index-SVM (2-10)	90.43	90.59	90.00	0.7691	0.9561
PDACvsCP-FastHCS Index-DL (2-11)	86.96	89.41	80.00	0.6738	0.9149
PDACvsCP-FastHCS Index-XGB (2-12)	85.22	91.76	66.67	0.6053	0.9145
PDACvsCP-Nipals PCA Index-RF (2-13)	93.91	96.47	86.67	0.8407	0.9714
PDACvsCP-Nipals PCA Index-SVM (2-14)	86.96	94.12	66.67	0.6471	0.9604
PDACvsCP-Nipals PCA Index-DL (2-15)	86.96	89.41	80.00	0.6738	0.9192
PDACvsCP-Nipals PCA Index-XGB (2-16)	92.17	95.29	83.33	0.7951	0.9702
PDACvsCP-Probabilistic PCA Index-RF (2-17)	94.78	97.65	86.67	0.8626	0.9741
PDACvsCP-Probabilistic PCA Index-SVM (2-18)	94.78	100.00	80.00	0.8644	0.9255
PDACvsCP-Probabilistic PCA Index-DL (2-19)	86.96	91.76	73.33	0.6582	0.8859
PDACvsCP-Probabilistic PCA Index-XGB (2-20)	92.17	95.29	83.33	0.7951	0.9722


Table 7 The results of the second-layer models five-fold cross-validation.

Models	ACC, %	MCC	AUC
PDACvsCP-MDA-RF (2-1)	89.60	0.7239	0.9431
PDACvsCP-MDA-SVM (2-2)	91.44	0.7739	0.9573
PDACvsCP-MDA-DL (2-3)	89.21	0.7293	0.9493
PDACvsCP-MDA-XGB (2-4)	88.09	0.6791	0.9157
PDACvsCP-SVD PCA Index-RF (2-5)	89.60	0.7209	0.9066
PDACvsCP-SVD PCA Index-SVM (2-6)	91.42	0.7796	0.9298
PDACvsCP-SVD PCA Index-DL (2-7)	85.14	0.6272	0.8874
PDACvsCP-SVD PCA Index-XGB (2-8)	82.54	0.5395	0.8957
PDACvsCP-FastHCS Index-RF (2-9)	87.38	0.6612	0.9071
PDACvsCP-FastHCS Index-SVM (2-10)	88.11	0.6882	0.9345
PDACvsCP-FastHCS Index-DL (2-11)	87.37	0.6672	0.9063
PDACvsCP-FastHCS Index-XGB (2-12)	86.62	0.6449	0.9117
PDACvsCP-Nipals PCA Index-RF (2-13)	89.22	0.7158	0.9130
PDACvsCP-Nipals PCA Index-SVM (2-14)	91.06	0.7684	0.9109
PDACvsCP-Nipals PCA Index-DL (2-15)	86.25	0.6401	0.8944
PDACvsCP-Nipals PCA Index-XGB (2-16)	82.54	0.5395	0.8964
PDACvsCP-Probabilistic PCA Index-RF (2-17)	98.13	0.9520	0.9956
PDACvsCP-Probabilistic PCA Index-SVM (2-18)	92.94	0.8195	0.9502
PDACvsCP-Probabilistic PCA Index-DL (2-19)	88.11	0.6951	0.9519
PDACvsCP-Probabilistic PCA Index-XGB (2-20)	82.91	0.5606	0.9011


Model interpretability and functional analysis

Among all models, PDACandCPvsHealth-Probabilistic PCA Index-SVM (1-18) in the first layer and PDACvsCP-Probabilistic PCA Index-RF (2-17) in the second layer showed the best diagnostic performance. To explore the potential mechanism of the EvlRNA-index as an input feature of the diagnostic model and to improve the interpretability of the model, the five genes with the largest weights in each Probabilistic PCA-based EvlRNA-index calculation were extracted, giving 10 × 5 = 50 genes (Figure 8A). After removing duplicate genes and converting gene names, 49 genes were obtained. By gene type, these comprise protein-coding genes (51.02%), pseudogenes (26.53%), lncRNAs (18.37%), and others (4.08%) (Figure 8B). One of these 49 genes, MAL2, also appears in the gene set of Yu et al[25]. Through database and literature searches, we found that this gene set contains 16 known PDAC-related genes (Figure 8C): ANXA4[41], PF4[42], MUC3A[43], MAL2[44,45], EIF4G2[46], NEAT1[47-49], MALAT1[50-52], NRGN[53], SCARNA10[54], GAPDH[55], TUBA1C[56], CALM1[57], DUOX2[58,59], FXYD3[60-62], LGALS4[63,64], and LENG8[53]. We therefore hypothesize that the remaining 33 genes may also be related to PDAC.


Figure 8 Extraction and classification of important genes. A: The top five genes in terms of weight in each probabilistic principal component analysis-based extracellular vesicle long RNA index calculation process; B: Classification of gene types; C: The 49 genes identified in this study were compared with pancreatic ductal adenocarcinoma-related genes reported in the literature. PDAC: Pancreatic ductal adenocarcinoma; lncRNA: Long non-coding RNA.
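The gene-extraction step described above (top five genes by weight for each EvlRNA-index, followed by removal of duplicates) can be sketched as follows. This is a minimal illustration under stated assumptions: the loadings are toy numbers, the gene names `GENE_A` to `GENE_D` and the function names are hypothetical, and ranking by absolute weight is an assumption, since the article does not say whether signed or absolute weights were used.

```python
def top_genes_per_index(loadings, k=5):
    """Pick the k genes with the largest absolute weight for each index.

    loadings: dict mapping an index name to a dict of gene -> weight.
    Returns a dict mapping each index name to its list of top-k genes.
    """
    selected = {}
    for index_name, weights in loadings.items():
        ranked = sorted(weights, key=lambda g: abs(weights[g]), reverse=True)
        selected[index_name] = ranked[:k]
    return selected


def unique_genes(selected):
    """Merge the per-index gene lists, dropping duplicates but keeping order."""
    seen = []
    for genes in selected.values():
        for gene in genes:
            if gene not in seen:
                seen.append(gene)
    return seen


# Toy loadings (hypothetical values, two indices instead of ten, k=2 instead of 5).
toy_loadings = {
    "index_1": {"GENE_A": 0.61, "GENE_B": -0.72, "GENE_C": 0.05, "GENE_D": 0.30},
    "index_2": {"GENE_A": 0.10, "GENE_B": 0.55, "GENE_C": -0.80, "GENE_D": 0.02},
}
selected = top_genes_per_index(toy_loadings, k=2)
signature = unique_genes(selected)
print(selected)
print(signature)
```

In the toy example `GENE_B` is selected by both indices and is kept only once, which mirrors how 10 × 5 = 50 selections can reduce to 49 distinct genes.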

To verify the relationship between these 49 genes and PDAC, RNA-seq data (178 PDAC tissue samples and 171 normal pancreatic tissue samples) were downloaded from the TCGA and GTEx databases for analysis. Most genes were significantly up-regulated or down-regulated in cancer samples (Figure 9A). Because expression at the protein level more directly reflects the disease phenotype, we further examined protein expression in PDAC using the Clinical Proteomic Tumor Analysis Consortium dataset. Most protein-coding genes were also significantly up-regulated or down-regulated at the protein level in PDAC tissues (Figure 9B), supporting the reliability of these results.


Figure 9 Differential expression of selected genes in normal and pancreatic ductal adenocarcinoma tissues. A: Expression of selected genes was analyzed using RNA-seq data from The Cancer Genome Atlas and Genotype-Tissue Expression databases; B: Protein expression of protein-coding genes in normal and primary pancreatic ductal adenocarcinoma tissues based on the Clinical Proteomic Tumor Analysis Consortium dataset. FPKM: Fragments per kilobase of exon per million mapped fragments; NA: Not available. ^aP < 0.05, ^bP < 0.01, and ^cP < 0.001.

GO analysis showed that these genes were enriched in terms related to molecular function regulator activity (GO: 0098772), plasma membrane (GO: 0005886), vesicle (GO: 0031982), nucleus (GO: 0005634), miRNA inhibitor activity via base-pairing (GO: 0140869), lncRNA-mediated post-transcriptional gene silencing (GO: 0000512), regulation of miRNA catabolic process (GO: 2000625), and gene expression (GO: 0010467) (Figure 10A), which play important roles in the development of cancer. To better understand the interrelationships among the protein-coding genes, protein-protein interaction analysis was used to show the interactions between their proteins (Figure 10B). Pathway analysis showed that the pathways enriched by these genes include cAMP responsive element binding protein 1 (CREB1) phosphorylation through the activation of adenylate cyclase, protein kinase A activation, and protein kinase A-mediated phosphorylation of CREB, among others (Figure 10C); these pathways are closely related to cancer.


Figure 10 Functional analysis. A: Gene Ontology analysis; B: Protein-protein interaction analysis (protein-protein interaction enrichment P value: < 1.0e-16), network nodes represent proteins, edges represent protein-protein associations; C: Pathway analysis. GO: Gene Ontology; miRNA: MicroRNA; lncRNA: Long non-coding RNA; Cam-PDE: Calmodulin-dependent phosphodiesterase; PKA: Protein kinase A; CREB1: cAMP response element-binding protein 1; AMPK: AMP-activated protein kinase; NMDARs: N-methyl-D-aspartate receptors; DAG: Diacylglycerol; IP3: Inositol 1,4,5-trisphosphate; eIFs: Eukaryotic initiation factors; PLC: Phospholipase C.

Survival analysis

We conducted survival analysis on these 49 genes and found that high expression of some of them was linked to the survival prognosis of PDAC, in terms of overall survival (OS) and disease-free survival (DFS). Specifically, CCDC13-AS1 (OS and DFS), LENG8 (OS), and NRGN (OS and DFS) were correlated with a more favorable outcome for PDAC patients. In contrast, PTP4A2 (OS), OST4 (OS), MAL2 (OS and DFS), GAPDH (DFS), TUBA1C (OS and DFS), and DUOX2 (OS) were associated with a poor outcome. The Kaplan-Meier plots of these genes are shown in Figure 11. These results indicate that these biomarkers are relevant to both the diagnosis and the prognosis of patients with PDAC.


Figure 11 Analysis of overall survival and disease-free survival. TPM: Transcripts per million; HR: Hazard ratio.
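For readers unfamiliar with how Kaplan-Meier curves such as those in Figure 11 are built, the sketch below implements the standard product-limit estimator, in which the survival probability is multiplied by (1 - d/n) at each event time, where d is the number of events and n the number of patients still at risk. The follow-up times and event flags are toy values, not patient data, and the article does not describe which tool produced its plots, so this illustrates the estimator only.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate.

    times:  follow-up duration of each patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after time t.
        n_at_risk -= removed
    return curve


# Toy cohort (hypothetical): five patients, two of them censored.
toy_curve = kaplan_meier(times=[1, 2, 2, 3, 4], events=[1, 1, 0, 1, 0])
print(toy_curve)
```

Comparing two such curves (for example, high vs low expression of one gene) additionally requires a test such as the log-rank test, which is not shown here.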

Single-cell RNA expression analysis in PDAC tumor microenvironment

To further investigate the potential mechanism of the EvlRNA-index as an input feature of the model and the possible pathways by which it affects the tumor microenvironment, we explored the expression of the 49 identified genes (the EvlRNA signature) at the single-cell level. Two single-cell RNA-seq datasets (CRA001160[65] and GSE154778[66]) were used to determine the expression level of the EvlRNA signature in immune cells. The EvlRNA signature was mainly expressed in malignant cells in the PDAC tumor microenvironment (Figure 12), which further supports the reliability of the preceding results and suggests a basis for the accurate classification achieved by the model.


Figure 12 Analysis of single-cell RNA expression in pancreatic ductal adenocarcinoma tumor microenvironment. A: Based on CRA001160, number of cells: 57443; B: Based on GSE154778, number of cells: 14953. EvlRNA: Extracellular vesicle long RNA; CD8T: CD8+ T cell; DC: Dendritic cell; B: B cell.

DISCUSSION

In this study, we proposed an interpretable machine learning framework called ECD-itMLF (early cancer diagnosis: an interpretable two-layer machine learning framework with plasma EvlRNA) for early diagnosis and prediction of PDAC based on plasma EvlRNA. This framework builds on our previous research ideas[24] to establish a two-layer classifier. The first layer distinguishes normal from non-normal samples, and the second layer determines whether a non-normal sample belongs to PDAC or CP. We also proposed a new concept, the EvlRNA-index, and established a cancer diagnosis model based on it that achieved good diagnostic performance. On the training dataset, the first-layer model PDACandCPvsHealth-Probabilistic PCA Index-SVM (1-18) reached an accuracy of 91.51%, with MCC 0.7760 and AUC 0.9560, and the second-layer model PDACvsCP-Probabilistic PCA Index-RF (2-17) reached an accuracy of 93.83%, with MCC 0.8422 and AUC 0.9698. Unlike traditional black-box models, ECD-itMLF exposes the gene weights behind each input feature while maintaining strong diagnostic performance.
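The decision flow of the two-layer classifier can be sketched as a simple cascade. In this minimal illustration the two layers are passed in as callables; the threshold rules `is_non_normal` and `is_pdac`, and the feature names `index_a` and `index_b`, are hypothetical stand-ins for the trained SVM (1-18) and RF (2-17) models and their EvlRNA-index inputs, which are not reproduced here.

```python
def two_layer_predict(sample, layer1, layer2):
    """Cascade two binary classifiers into a three-class decision.

    layer1(sample) -> True if the sample is non-normal (PDAC or CP)
    layer2(sample) -> True if a non-normal sample is PDAC, False if CP
    """
    if not layer1(sample):
        return "Normal"          # second layer is never consulted
    return "PDAC" if layer2(sample) else "CP"


# Hypothetical stand-ins for the trained models (assumptions, not the study's models).
def is_non_normal(sample):
    return sample["index_a"] > 0.5


def is_pdac(sample):
    return sample["index_b"] > 0.0


toy_samples = [
    {"index_a": 0.1, "index_b": 0.9},
    {"index_a": 0.8, "index_b": 0.4},
    {"index_a": 0.9, "index_b": -0.3},
]
for s in toy_samples:
    print(two_layer_predict(s, is_non_normal, is_pdac))
```

One consequence of this design is that first-layer errors propagate: a PDAC sample misclassified as normal in the first layer cannot be recovered by the second, which is why first-layer sensitivity matters for the overall framework.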

In the feature selection process, we ultimately screened 20 genes, none of which overlapped with the eight genes screened by Yu et al[25]. The main reason is that our study considered the differences between CP and normal samples during screening. In addition, the feature selection process was more rigorous (Figure 1). A diagnostic model was established using the 20 genes obtained from the final screening. The second layer of the model classified PDAC and CP. The accuracy of PDACvsCP-MDA-SVM (2-2) was 93.83%, with sensitivity 96.67%, specificity 85.71%, MCC 0.8372, and AUC 0.9563. In the study of Yu et al[25], SVM based on the selected eight genes classified PDAC and CP with accuracy 92.70%, sensitivity 93.39%, specificity 85.00%, and AUC 0.9280. Compared with their model, the performance indicators of PDACvsCP-MDA-SVM (2-2) were improved. Several of these 20 genes are well-known cancer-related genes. For example, S100A9 can promote the occurrence, growth, and metastasis of tumors by interfering with tumor metabolism and the microenvironment[38]. The loss of P2RX1 modulates immunosuppressive activity in specific neutrophil subpopulations, thereby facilitating hepatic metastatic progression in PDAC[39]. The results of GO analysis showed that these 20 genes play a key role in regulating the apoptotic signaling pathway.

Unlike the existing complex machine learning models, the method in this study solved the problems of model transparency and interpretability to some extent. Traditional black box models often face challenges in interpreting predictions, especially when complex analyses of high-dimensional data are required. The EvlRNA-index proposed in this study not only simplifies the feature space, but also provides an intuitive and easily understood interpretative framework, making the prediction process transparent to non-technical users. More importantly, using EvlRNA-index as the input feature improved the predictive performance and generalization ability of the model. By carefully designing and calculating EvlRNA-index, we captured key patterns and trends in the data, thereby improving the accuracy and robustness of the model. In the EvlRNA-index based model, in the first layer, the accuracy of PDACandCPvsHealth-Probabilistic PCA Index-SVM (1-18) was 91.51%, with MCC 0.7760 and AUC 0.9560. In the second layer, the accuracy of PDACvsCP-Probabilistic PCA Index-RF (2-17) was 93.83%, with MCC 0.8422 and AUC 0.9698. Compared to the study by Yu et al[25], the performance indicators of PDACvsCP-Probabilistic PCA Index-RF (2-17) have been improved.

In this study, we investigated the potential mechanism of the EvlRNA-index as an input feature of the diagnostic prediction model. Using the EvlRNA-index extracted by Probabilistic PCA as the input feature not only improved model performance but also enhanced interpretability through gene weight ranking. The five top-weighted genes in the calculation of each EvlRNA-index were extracted. The resulting set of 49 key genes demonstrates the ability of the method to mine core markers from high-dimensional data. We verified the expression differences of the candidate genes using both RNA-seq (TCGA/GTEx) and proteomics (Clinical Proteomic Tumor Analysis Consortium) data. The results indicated that RNA and protein expression were consistent. For example, the consistent changes of the protein-coding genes ANXA4 and NRGN at the transcriptional and translational levels suggest that they may be directly involved in the pathological process of PDAC. Pseudogenes (26.53%) and lncRNAs (18.37%) were included in the biomarker list, suggesting that the regulatory role of ncRNAs (such as CCDC13-AS1) in PDAC deserves in-depth exploration and may involve epigenetic or post-transcriptional regulatory mechanisms. GO analysis revealed that the candidate genes were enriched in multiple biological processes closely related to cancer development. The survival analysis revealed that some of these genes had prognostic value for PDAC. The analysis of single-cell RNA expression showed that the EvlRNA signature was mainly expressed in malignant cells in the PDAC tumor microenvironment, which further supports the reliability of the experimental results and suggests a basis for the accurate classification achieved by the model.

Our method has good scalability and adaptability. Although the focus of this study was on PDAC, the core analytical strategy possesses transferable utility across diverse oncological contexts and various pathophysiological conditions. Our method extracts EvlRNA-index information from some biological states and can be applied to better understand other biological states. However, fully developing and validating the model to address different clinical applications will require additional samples in these specific populations. In contrast to conventional approaches that rely on disease-specific biomarker identification, our genome-wide screening methodology facilitates impartial detection of biological signatures independent of pathological specificity. This technical advantage permits potential adaptation for evaluating non-pathological physiological variations. Furthermore, the method demonstrated potential for identifying distinctive genomic fingerprints associated with various disease entities, enabling machine learning algorithms to distinguish cancer subtypes through the EvlRNA-index. Efforts are under way to assess these assumptions.

There were limitations to our study. Firstly, although the research was based on a multi-center sample for model construction and validation, in the future, the generalization ability, diagnostic efficacy and stability of the model need to be verified in a larger and prospective independent cohort to evaluate its practical application in a broader population. Secondly, the second layer of the model focuses on differentiating PDAC from CP. However, in clinical practice, the differential diagnosis of PDAC also needs to consider other benign pancreatic diseases or adjacent organ tumors that are easily confused with PDAC. Therefore, including a more comprehensive disease spectrum for control will help to evaluate more comprehensively the differential diagnostic ability of the model. Thirdly, although the research revealed the association between key biomarkers and PDAC by visualizing feature weights, thus enhancing the interpretability of the model, the specific biological functions of these important EvlRNA markers and their molecular mechanisms in the occurrence and development of PDAC have not been fully explored. Subsequent experimental studies (such as functional verification) are needed to further clarify them.

CONCLUSION

We proposed an interpretable machine learning framework, called ECD-itMLF, for early diagnostic prediction of PDAC based on plasma EvlRNA. In this framework, a two-layer classifier was established: the first layer identifies normal and non-normal samples, and the second layer identifies whether a sample belongs to PDAC or CP. We also proposed a new concept, the EvlRNA-index. A cancer diagnostic model based on the EvlRNA-index was established and achieved good diagnostic performance. The interpretability of the entire framework was studied, and the close correlation between important features and actual biological processes was explored, to help biomedical researchers understand the model. Finally, this study provides new insights into the clinical value of EvlRNA.

References

1.	Stoffel EM, Brand RE, Goggins M. Pancreatic Cancer: Changing Epidemiology and New Approaches to Risk Assessment, Early Detection, and Prevention. Gastroenterology. 2023;164:752-765.

2.	Ying H, Kimmelman AC, Bardeesy N, Kalluri R, Maitra A, DePinho RA. Genetics and biology of pancreatic ductal adenocarcinoma. Genes Dev. 2025;39:36-63.

3.	Mosalem OM, Abdelhakeem A, Abdel-Razeq NH, Babiker H. Pancreatic ductal adenocarcinoma (PDAC): clinical progress in the last five years. Expert Opin Investig Drugs. 2025;34:149-160.

4.	Zhang Y, Yang J, Li H, Wu Y, Zhang H, Chen W. Tumor markers CA19-9, CA242 and CEA in the diagnosis of pancreatic cancer: a meta-analysis. Int J Clin Exp Med. 2015;8:11683-11691.

5.	Schultz NA, Dehlendorff C, Jensen BV, Bjerregaard JK, Nielsen KR, Bojesen SE, Calatayud D, Nielsen SE, Yilmaz M, Holländer NH, Andersen KK, Johansen JS. MicroRNA biomarkers in whole blood for detection of pancreatic cancer. JAMA. 2014;311:392-404.

6.	Venkatesh PG, Navaneethan U, Shen B, McCullough AJ. Increased serum levels of carbohydrate antigen 19-9 and outcomes in primary sclerosing cholangitis patients without cholangiocarcinoma. Dig Dis Sci. 2013;58:850-857.

7.	Marrelli D, Caruso S, Pedrazzani C, Neri A, Fernandes E, Marini M, Pinto E, Roviello F. CA19-9 serum levels in obstructive jaundice: clinical value in benign and malignant conditions. Am J Surg. 2009;198:333-339.

8.	Chen J, Liang J, Xu B, Liang J, Ma M, Wang Z, Zeng G, Xu Q, Liang L, Lai J, Huang L. High Bile Titer and High Bile to Serum Ratio of CYFRA 21 - 1 Reliably Discriminate Malignant Biliary Obstruction Caused by Cholangiocarcinoma. J Gastrointest Cancer. 2024;55:800-808.

9.	Scarà S, Bottoni P, Scatena R. CA 19-9: Biochemical and Clinical Aspects. Adv Exp Med Biol. 2015;867:247-260.

10.

Kumar MA, Baba SK, Sadida HQ, Marzooqi SA, Jerobin J, Altemani FH, Algehainy N, Alanazi MA, Abou-Samra AB, Kumar R, Al-Shabeeb Akil AS, Macha MA, Mir R, Bhat AA. Extracellular vesicles as tools and targets in therapy for diseases. Signal Transduct Target Ther. 2024;9:27. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 983] [Cited by in RCA: 848] [Article Influence: 424.0] [Reference Citation Analysis (0)]

11.

González Á, López-Borrego S, Sandúa A, Vales-Gomez M, Alegre E. Extracellular vesicles in cancer: challenges and opportunities for clinical laboratories. Crit Rev Clin Lab Sci. 2024;61:435-457. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 18] [Cited by in RCA: 18] [Article Influence: 9.0] [Reference Citation Analysis (3)]

12.

Xiao J, Sluijter JPG. Extracellular vesicles in cardiovascular homeostasis and disease: potential role in diagnosis and therapy. Nat Rev Cardiol. 2025;. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 19] [Cited by in RCA: 16] [Article Influence: 16.0] [Reference Citation Analysis (0)]

13.

Chen J, Tian C, Xiong X, Yang Y, Zhang J. Extracellular vesicles: new horizons in neurodegeneration. EBioMedicine. 2025;113:105605. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 30] [Cited by in RCA: 32] [Article Influence: 32.0] [Reference Citation Analysis (1)]

14.

Cruz CG, Sodawalla HM, Mohanakumar T, Bansal S. Extracellular Vesicles as Biomarkers in Infectious Diseases. Biology (Basel). 2025;14:182. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 12] [Cited by in RCA: 12] [Article Influence: 12.0] [Reference Citation Analysis (1)]

15.

Lai H, Li Y, Zhang H, Hu J, Liao J, Su Y, Li Q, Chen B, Li C, Wang Z, Li Y, Wang J, Meng Z, Huang Z, Huang S. exoRBase 2.0: an atlas of mRNA, lncRNA and circRNA in extracellular vesicles from human biofluids. Nucleic Acids Res. 2022;50:D118-D128. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 185] [Cited by in RCA: 169] [Article Influence: 42.3] [Reference Citation Analysis (1)]

16.	Li C, Zhou T, Chen J, Li R, Chen H, Luo S, Chen D, Cai C, Li W. The role of Exosomal miRNAs in cancer. J Transl Med. 2022;20:6. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 145] [Cited by in RCA: 127] [Article Influence: 31.8] [Reference Citation Analysis (0)]

17.

Zhou R, Chen KK, Zhang J, Xiao B, Huang Z, Ju C, Sun J, Zhang F, Lv XB, Huang G. The decade of exosomal long RNA species: an emerging cancer antagonist. Mol Cancer. 2018;17:75. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 139] [Cited by in RCA: 134] [Article Influence: 16.8] [Reference Citation Analysis (0)]

18.

He X, Chen L, Di Y, Li W, Zhang X, Bai Z, Wang Z, Liu S, Corpe C, Wang J. Plasma-derived exosomal long noncoding RNAs of pancreatic cancer patients as novel blood-based biomarkers of disease. BMC Cancer. 2024;24:961. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 10] [Cited by in RCA: 9] [Article Influence: 4.5] [Reference Citation Analysis (0)]

19.

Del Re M, Biasco E, Crucitta S, Derosa L, Rofi E, Orlandini C, Miccoli M, Galli L, Falcone A, Jenster GW, van Schaik RH, Danesi R. The Detection of Androgen Receptor Splice Variant 7 in Plasma-derived Exosomal RNA Strongly Predicts Resistance to Hormonal Therapy in Metastatic Prostate Cancer Patients. Eur Urol. 2017;71:680-687. [RCA] [PubMed] [DOI] [Full Text] [Cited by in Crossref: 248] [Cited by in RCA: 219] [Article Influence: 24.3] [Reference Citation Analysis (2)]

20.

Del Re M, Marconcini R, Pasquini G, Rofi E, Vivaldi C, Bloise F, Restante G, Arrigoni E, Caparello C, Bianco MG, Crucitta S, Petrini I, Vasile E, Falcone A, Danesi R. PD-L1 mRNA expression in plasma-derived exosomes is associated with response to anti-PD-1 antibodies in melanoma and NSCLC. Br J Cancer. 2018;118:820-824. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 220] [Cited by in RCA: 208] [Article Influence: 26.0] [Reference Citation Analysis (1)]

21.

Nabet BY, Qiu Y, Shabason JE, Wu TJ, Yoon T, Kim BC, Benci JL, DeMichele AM, Tchou J, Marcotrigiano J, Minn AJ. Exosome RNA Unshielding Couples Stromal Activation to Pattern Recognition Receptor Signaling in Cancer. Cell. 2017;170:352-366.e13. [RCA] [PubMed] [DOI] [Full Text] [Full Text (PDF)] [Cited by in Crossref: 406] [Cited by in RCA: 368] [Article Influence: 40.9] [Reference Citation Analysis (1)]

22. Zhang XW, Qi GX, Liu MX, Yang YF, Wang JH, Yu YL, Chen S. Deep Learning Promotes Profiling of Multiple miRNAs in Single Extracellular Vesicles for Cancer Diagnosis. ACS Sens. 2024;9:1555-1564.

23. Bahrambeigi V, Lee JJ, Branchi V, Rajapakshe KI, Xu Z, Kui N, Henry JT, Kun W, Stephens BM, Dhebat S, Hurd MW, Sun R, Yang P, Ruppin E, Wang W, Kopetz S, Maitra A, Guerrero PA. Transcriptomic Profiling of Plasma Extracellular Vesicles Enables Reliable Annotation of the Cancer-Specific Transcriptome and Molecular Subtype. Cancer Res. 2024;84:1719-1732.

24. Liu SC, Tang HL, Liu HD, Wang JK. Multi-label Learning for the Diagnosis of Cancer and Identification of Novel Biomarkers with High-throughput Omics. Curr Bioinform. 2021;16:261-273.

25. Yu S, Li Y, Liao Z, Wang Z, Wang Z, Li Y, Qian L, Zhao J, Zong H, Kang B, Zou WB, Chen K, He X, Meng Z, Chen Z, Huang S, Wang P. Plasma extracellular vesicle long RNA profiling identifies a diagnostic signature for the detection of pancreatic ductal adenocarcinoma. Gut. 2020;69:540-550.

26. Breiman L. Random Forests. Mach Learn. 2001;45:5-32.

27. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550.

28. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139-140.

29. Brunton SL, Kutz JN. Singular Value Decomposition (SVD). In: Data-Driven Science and Engineering. United Kingdom: Cambridge University Press, 2019: 3-46.

30. de Souza VC, Rodrigues SA, Filho LRAG. Comparison of principal component analysis algorithms for imputation in agrometeorological data in high dimension and reduced sample size. PLoS One. 2024;19:e0315574.

31. Tipping ME, Bishop CM. Probabilistic Principal Component Analysis. J R Stat Soc Ser B Stat Methodol. 1999;61:611-622.

32. Schmitt E, Vakili K. The FastHCS algorithm for robust PCA. Stat Comput. 2016;26:1229-1242.

33. Sherman BT, Hao M, Qiu J, Jiao X, Baseler MW, Lane HC, Imamichi T, Chang W. DAVID: a web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res. 2022;50:W216-W221.

34. Szklarczyk D, Kirsch R, Koutrouli M, Nastou K, Mehryary F, Hachilif R, Gable AL, Fang T, Doncheva NT, Pyysalo S, Bork P, Jensen LJ, von Mering C. The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 2023;51:D638-D646.

35. Gillespie M, Jassal B, Stephan R, Milacic M, Rothfels K, Senff-Ribeiro A, Griss J, Sevilla C, Matthews L, Gong C, Deng C, Varusai T, Ragueneau E, Haider Y, May B, Shamovsky V, Weiser J, Brunson T, Sanati N, Beckman L, Shao X, Fabregat A, Sidiropoulos K, Murillo J, Viteri G, Cook J, Shorser S, Bader G, Demir E, Sander C, Haw R, Wu G, Stein L, Hermjakob H, D'Eustachio P. The reactome pathway knowledgebase 2022. Nucleic Acids Res. 2022;50:D687-D692.

36. Han Y, Wang Y, Dong X, Sun D, Liu Z, Yue J, Wang H, Li T, Wang C. TISCH2: expanded datasets and new tools for single-cell transcriptome analyses of the tumor microenvironment. Nucleic Acids Res. 2023;51:D1425-D1431.

37. Tang Z, Kang B, Li C, Chen T, Zhang Z. GEPIA2: an enhanced web server for large-scale expression profiling and interactive analysis. Nucleic Acids Res. 2019;47:W556-W560.

38. Chen Y, Ouyang Y, Li Z, Wang X, Ma J. S100A8 and S100A9 in Cancer. Biochim Biophys Acta Rev Cancer. 2023;1878:188891.

39. Wang X, Hu LP, Qin WT, Yang Q, Chen DY, Li Q, Zhou KX, Huang PQ, Xu CJ, Li J, Yao LL, Wang YH, Tian GA, Yang JY, Yang MW, Liu DJ, Sun YW, Jiang SH, Zhang XL, Zhang ZG. Identification of a subset of immunosuppressive P2RX1-negative neutrophils in pancreatic cancer liver metastasis. Nat Commun. 2021;12:174.

40. Toland AE, Rozek LS, Presswala S, Rennert G, Gruber SB. PTPRJ haplotypes and colorectal cancer risk. Cancer Epidemiol Biomarkers Prev. 2008;17:2782-2785.

41. Wei B, Guo C, Liu S, Sun MZ. Annexin A4 and cancer. Clin Chim Acta. 2015;447:72-78.

42. Poruk KE, Firpo MA, Huerter LM, Scaife CL, Emerson LL, Boucher KM, Jones KA, Mulvihill SJ. Serum platelet factor 4 is an independent predictor of survival and venous thromboembolism in patients with pancreatic adenocarcinoma. Cancer Epidemiol Biomarkers Prev. 2010;19:2605-2610.

43. Wang J, Zhou H, Wang Y, Huang H, Yang J, Gu W, Zhang X, Yang J. Serum mucin 3A as a potential biomarker for extrahepatic cholangiocarcinoma. Saudi J Gastroenterol. 2020;26:129-136.

44. Zhang B, Xiao J, Cheng X, Liu T. MAL2 interacts with IQGAP1 to promote pancreatic cancer progression by increasing ERK1/2 phosphorylation. Biochem Biophys Res Commun. 2021;554:63-70.

45. Eguchi D, Ohuchida K, Kozono S, Ikenaga N, Shindo K, Cui L, Fujiwara K, Akagawa S, Ohtsuka T, Takahata S, Tokunaga S, Mizumoto K, Tanaka M. MAL2 expression predicts distant metastasis and short survival in pancreatic cancer. Surgery. 2013;154:573-582.

46. Chen Y, Wang J, Guo Q, Li X, Zou X. Identification of MYEOV-Associated Gene Network as a Potential Therapeutic Target in Pancreatic Cancer. Cancers (Basel). 2022;14:5439.

47. Gu J, Wang Q, Mo J, Qin T, Qian W, Duan W, Han L, Wang Z, Ma Q, Ma J. NEAT1 promotes the perineural invasion of pancreatic cancer via the E2F1/GDNF axis. Cancer Lett. 2025;613:217497.

48. Gao Y, Zandieh K, Zhao K, Khizanishvili N, Fazio PD, Yu X, Schulte L, Aillaud M, Chung HR, Ball Z, Meixner M, Bauer UM, Bartsch DK, Buchholz M, Lauth M, Nimsky C, Cook L, Bartsch JW. The long non-coding RNA NEAT1 contributes to aberrant STAT3 signaling in pancreatic cancer and is regulated by a metalloprotease-disintegrin ADAM8/miR-181a-5p axis. Cell Oncol (Dordr). 2025;48:391-409.

49. Feng Y, Gao L, Cui G, Cao Y. LncRNA NEAT1 facilitates pancreatic cancer growth and metastasis through stabilizing ELF3 mRNA. Am J Cancer Res. 2020;10:237-248.

50. Lee JE, Cho SG, Ko SG, Ahrmad SA, Puga A, Kim K. Regulation of a long noncoding RNA MALAT1 by aryl hydrocarbon receptor in pancreatic cancer cells and tissues. Biochem Biophys Res Commun. 2020;532:563-569.

51. Xu J, Xu W, Xuan Y, Liu Z, Sun Q, Lan C. Pancreatic Cancer Progression Is Regulated by IPO7/p53/LncRNA MALAT1/MiR-129-5p Positive Feedback Loop. Front Cell Dev Biol. 2021;9:630262.

52. Song Z, Wang X, Chen F, Chen Q, Liu W, Yang X, Zhu X, Liu X, Wang P. LncRNA MALAT1 regulates METTL3-mediated PD-L1 expression and immune infiltrates in pancreatic cancer. Front Oncol. 2022;12:1004212.

53. Uhlen M, Zhang C, Lee S, Sjöstedt E, Fagerberg L, Bidkhori G, Benfeitas R, Arif M, Liu Z, Edfors F, Sanli K, von Feilitzen K, Oksvold P, Lundberg E, Hober S, Nilsson P, Mattsson J, Schwenk JM, Brunnström H, Glimelius B, Sjöblom T, Edqvist PH, Djureinovic D, Micke P, Lindskog C, Mardinoglu A, Ponten F. A pathology atlas of the human cancer transcriptome. Science. 2017;357:eaan2507.

54. Jafari S, Ravan M, Karimi-Sani I, Aria H, Hasan-Abad AM, Banasaz B, Atapour A, Sarab GA. Screening and identification of potential biomarkers for pancreatic cancer: An integrated bioinformatics analysis. Pathol Res Pract. 2023;249:154726.

55. Guo C, Liu S, Sun MZ. Novel insight into the role of GAPDH playing in tumor. Clin Transl Oncol. 2013;15:167-172.

56. Albahde MAH, Zhang P, Zhang Q, Li G, Wang W. Upregulated Expression of TUBA1C Predicts Poor Prognosis and Promotes Oncogenesis in Pancreatic Ductal Adenocarcinoma via Regulating the Cell Cycle. Front Oncol. 2020;10:49.

57. Zhao S, Xue Z, Yao Wang J, Song P. Gene Expression Array Analyses Predict Proto-Oncogene Expression During Perineural Invasion in Pancreatic Ductal Adenocarcinoma. Turk J Gastroenterol. 2024;35:48-60.

58. Wang SL, Wu Y, Konaté M, Lu J, Mallick D, Antony S, Meitzler JL, Jiang G, Dahan I, Juhasz A, Diebold B, Roy K, Doroshow JH. Exogenous DNA enhances DUOX2 expression and function in human pancreatic cancer cells by activating the cGAS-STING signaling pathway. Free Radic Biol Med. 2023;205:262-274.

59. Lyu PW, Xu XD, Zong K, Qiu XG. Overexpression of DUOX2 mediates doxorubicin resistance and predicts prognosis of pancreatic cancer. Gland Surg. 2022;11:115-124.

60. Rasko NB, Nahm CB, Turchini J, Teh R, Rasmussen H, Byeon S, Sahni S, Samra JS, Gill AJ, Mittal A. FXYD3 Is Frequently Expressed in Pancreatic Ductal Adenocarcinoma but Does Not Predict Survival. Cancer Med. 2025;14:e70500.

61. Kayed H, Kleeff J, Kolb A, Ketterer K, Keleg S, Felix K, Giese T, Penzel R, Zentgraf H, Büchler MW, Korc M, Friess H. FXYD3 is overexpressed in pancreatic ductal adenocarcinoma and influences pancreatic cancer cell growth. Int J Cancer. 2006;118:43-54.

62. Yee KX, Lee YC, Nguyen HD, Chen MY, Ni YC, Wu YF, Lee KH. Uncovering the role of FXYD3 as a potential oncogene and early biomarker in pancreatic cancer. Am J Cancer Res. 2024;14:4353-4366.

63. Hu D, Ansari D, Zhou Q, Sasor A, Said Hilmersson K, Andersson R. Galectin 4 is a biomarker for early recurrence and death after surgical resection for pancreatic ductal adenocarcinoma. Scand J Gastroenterol. 2019;54:95-100.

64. Lidström T, Cumming J, Gaur R, Frängsmyr L, Pateras IS, Mickert MJ, Franklin O, Forsell MNE, Arnberg N, Dongre M, Patthey C, Öhlund D. Extracellular Galectin 4 Drives Immune Evasion and Promotes T-cell Apoptosis in Pancreatic Cancer. Cancer Immunol Res. 2023;11:72-92.

65. Peng J, Sun BF, Chen CY, Zhou JY, Chen YS, Chen H, Liu L, Huang D, Jiang J, Cui GS, Yang Y, Wang W, Guo D, Dai M, Guo J, Zhang T, Liao Q, Liu Y, Zhao YL, Han DL, Zhao Y, Yang YG, Wu W. Single-cell RNA-seq highlights intra-tumoral heterogeneity and malignant progression in pancreatic ductal adenocarcinoma. Cell Res. 2019;29:725-738.

66. Lin W, Noel P, Borazanci EH, Lee J, Amini A, Han IW, Heo JS, Jameson GS, Fraser C, Steinbach M, Woo Y, Fong Y, Cridebring D, Von Hoff DD, Park JO, Han H. Single-cell transcriptome analysis of tumor and stromal compartments of pancreatic ductal adenocarcinoma primary tumors and metastatic lesions. Genome Med. 2020;12:80.

Footnotes

Provenance and peer review: Invited article; Externally peer reviewed.

Peer-review model: Single blind

Specialty type: Oncology

Country of origin: China

Peer-review report’s classification

Scientific Quality: Grade B, Grade B

Novelty: Grade B, Grade B

Creativity or Innovation: Grade B, Grade B

Scientific Significance: Grade B, Grade B

Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/

P-Reviewer: Yan SY, PhD, Associate Professor, China; S-Editor: Wu S; L-Editor: A; P-Editor: Xu J