1
Shi WJ, Cao Z, Long XB, Yao CR, Zhang JG, Chen CE, Ying GG. Predicting estrogen receptor agonists from plastic additives across various aquatic-related species using machine learning and AlphaFold2. J Hazard Mater 2025; 494:138629. [PMID: 40378742 DOI: 10.1016/j.jhazmat.2025.138629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2025] [Revised: 04/27/2025] [Accepted: 05/13/2025] [Indexed: 05/19/2025]
Abstract
The absence of effective public databases greatly limits high-throughput prediction of hormonal effects mediated by nuclear receptors in aquatic organisms. In this study, we developed novel strategies for multi-species screening of estrogen receptor (ER) agonists in plastic additives using AlphaFold2. First, Deep Forest (DF), artificial neural network (ANN), and conventional machine learning (ML) models were used to screen ERα agonists. The DF models using RDKit.Chem.Descriptors and MorganFingerprint achieved a sensitivity of 0.96, a specificity > 0.99, and an F1 score > 0.95, identifying 42 plastic additives as ERα agonists. Subsequently, ERα structures for Danio rerio (Dr), Oryzias melastigma (Om), Delphinus delphis (Dd), Physeter catodon (Pc), Mytilus edulis (Me), Xenopus tropicalis (Xt), Nipponia nippon (Nn), and Aptenodytes forsteri (Af) were constructed using AlphaFold2. Except for Me ERα, most species shared two common key amino acid residues responsible for ERα activity: arginine 85 and glutamic acid 44 (aligned serial numbers in the ligand-binding domain, LBD). However, aquatic-related species exhibited three additional key residues: glycine 212, leucine 216, and phenylalanine 95 (aligned serial numbers in the LBD). The numbers of compounds with docking energy < -9 kcal/mol for Dr, Om, Dd, Pc, Me, Xt, Nn, and Af were 4, 8, 4, 12, 10, 13, 7, and 9, respectively. The docking energy of estrone was < -9 kcal/mol in all species, while that of bisphenol P varied greatly among species. The combined application of ML and AlphaFold2 enables high-throughput evaluation of the ecotoxicity posed by emerging pollutants across multiple aquatic-related species.
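The screening metrics quoted in this abstract (sensitivity, specificity, F1 score) all derive from confusion-matrix counts. The following is a minimal sketch of those definitions, not the authors' evaluation code; the function name `classification_metrics` and the counts passed to it are hypothetical and were chosen only so that the outputs fall in the reported ranges.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one positive and one
    negative sample, and at least one predicted positive).
    """
    sensitivity = tp / (tp + fn)          # true-positive rate (recall)
    specificity = tn / (tn + fp)          # true-negative rate
    precision = tp / (tp + fp)            # positive predictive value
    f1 = 2 * tp / (2 * tp + fp + fn)      # harmonic mean of precision and recall
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for illustration only; NOT data from the study.
metrics = classification_metrics(tp=48, fp=3, tn=995, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With heavily imbalanced screening sets (few agonists among many inactive compounds), specificity can stay close to 1 even when precision is noticeably lower, which is one reason F1 is reported alongside it.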
Affiliation(s)
- Wen-Jun Shi
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China.
- Zhou Cao
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China
- Xiao-Bing Long
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China
- Chong-Rui Yao
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China
- Jin-Ge Zhang
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China
- Chang-Er Chen
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China
- Guang-Guo Ying
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China
2
Huang W, Xu Y, Li Z, Li J, Chen Q, Huang Q, Wu Y, Chen H. Enhancing noninvasive pancreatic cystic neoplasm diagnosis with multimodal machine learning. Sci Rep 2025; 15:16398. [PMID: 40355497 PMCID: PMC12069609 DOI: 10.1038/s41598-025-01502-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2024] [Accepted: 05/06/2025] [Indexed: 05/14/2025] [Open Access]
Abstract
Pancreatic cystic neoplasms (PCNs) are a complex group of lesions with a spectrum of malignancy. Accurate differentiation of PCN types is crucial for patient management, as misdiagnosis can result in unnecessary surgeries or treatment delays, affecting quality of life. The need to improve patient outcomes and reduce the impact of these conditions underscores the significance of developing a non-invasive, accurate diagnostic model. We developed a machine learning model capable of accurately identifying different types of PCNs non-invasively, using a dataset comprising 449 MRI and 568 CT scans from adult patients collected from 2009 to 2022. The results indicate that our multimodal machine learning algorithm, which integrates both clinical and imaging data, significantly outperforms single-source data algorithms. Specifically, it demonstrated state-of-the-art performance in classifying PCN types, achieving an average accuracy of 91.2%, precision of 91.7%, sensitivity of 88.9%, and specificity of 96.5%. Remarkably, for patients with mucinous cystic neoplasms (MCNs), the model achieved a 100% prediction accuracy rate regardless of whether they underwent MRI or CT imaging. This indicates that our non-invasive multimodal machine learning model offers strong support for the early screening of MCNs and represents a significant advancement in PCN diagnosis for improving clinical practice and patient outcomes. We also achieved the best results on an additional pancreatic cancer dataset, which further supports the generalizability of our model.
Affiliation(s)
- Wei Huang
- Department of Gastroenterology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Yue Xu
- Department of Gastroenterology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Zhao Li
- Research Center for Data Hub and Security, Zhejiang Lab, Hangzhou, China.
- Jun Li
- Department of Pathology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Qing Chen
- Department of Pathology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Qiang Huang
- Department of Imaging, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Yaping Wu
- Department of Imaging, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Hongtan Chen
- Department of Gastroenterology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China.
3
He Y, Deng K, Han J. Patent value prediction in biomedical textiles: A method based on a fusion of machine learning models. PLoS One 2025; 20:e0322182. [PMID: 40273052 PMCID: PMC12021132 DOI: 10.1371/journal.pone.0322182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2024] [Accepted: 03/18/2025] [Indexed: 04/26/2025] [Open Access]
Abstract
Patent value prediction is essential for technology innovation management. This study aims to enhance technology innovation management in the field of biomedical textiles by processing complex biomedical patent information to improve the accuracy of predicting patent values. A patent value grading prediction method based on a fusion of machine learning models is proposed, using 113,428 biomedical textile patents as the research sample. The method combines BERT (Bidirectional Encoder Representations from Transformers) and a stacking strategy to classify and predict the value class of biomedical textile patents using both textual information and structured patent features. We implemented this method for patent value prediction in biomedical textiles, leading to the development of BioTexVal, the first dedicated patent value prediction model for this domain. BioTexVal's innovation lies in employing a stacking strategy that integrates multiple machine learning models to enhance predictive accuracy while leveraging unstructured data during training. The results show that this approach significantly outperforms previous predictive methods. Validated on 113,428 biomedical textile patents spanning 2003 to 2023, BioTexVal achieved an accuracy of 88.38%. This study uses average annual forward citations as the indicator for distinguishing patent value grades. When applied to other research fields, the method may require adjustments based on data characteristics to ensure its effectiveness.
Affiliation(s)
- Yifan He
- Department of Humanities, Donghua University, Shanghai, China
- Kehui Deng
- Department of Humanities, Donghua University, Shanghai, China
- Jiawei Han
- Department of Humanities, Donghua University, Shanghai, China
4
Qiu Y, Shan D, Wang Y, Dong P, Wu D, Yang X, Hong Q, Shen D. A topology-preserving three-stage framework for fully-connected coronary artery extraction. Med Image Anal 2025; 103:103578. [PMID: 40239457 DOI: 10.1016/j.media.2025.103578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2024] [Revised: 03/28/2025] [Accepted: 03/28/2025] [Indexed: 04/18/2025]
Abstract
Coronary artery extraction is a crucial prerequisite for computer-aided diagnosis of coronary artery disease. Accurately extracting the complete coronary tree remains challenging due to several factors, including the presence of thin distal vessels, tortuous topological structures, and insufficient contrast. These issues often result in over-segmentation and under-segmentation in current segmentation methods. To address these challenges, we propose a topology-preserving three-stage framework for fully-connected coronary artery extraction. This framework includes vessel segmentation, centerline reconnection, and missing vessel reconstruction. First, we introduce a new centerline-enhanced loss in the segmentation process. Second, for the broken vessel segments, we propose a regularized walk algorithm that integrates distance, probabilities predicted by a centerline classifier, and directional cosine similarity to reconnect the centerlines. Third, we apply implicit neural representation and implicit modeling to reconstruct the geometric model of the missing vessels. Experimental results show that our proposed framework outperforms existing methods, achieving Dice scores of 88.53% and 85.07%, with Hausdorff Distances (HD) of 1.07 mm and 1.63 mm on the ASOCA and PDSCA datasets, respectively. Code will be available at https://github.com/YH-Qiu/CorSegRec.
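The Dice score reported in this abstract measures the overlap between a predicted segmentation mask and the ground truth. The following is a minimal sketch of the standard definition on flattened binary masks, not the authors' evaluation code; the function name `dice_score`, the convention for two empty masks, and the toy masks are assumptions made for illustration.

```python
def dice_score(pred, truth):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two flattened binary masks.

    `pred` and `truth` are equal-length sequences of 0/1 (or False/True) voxels.
    By convention (an assumption here), two empty masks score 1.0.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy masks for illustration only; real masks would be flattened 3D volumes.
predicted = [1, 1, 0, 1, 0]
reference = [1, 0, 0, 1, 1]
print(f"Dice: {dice_score(predicted, reference):.4f}")
```

Dice is dominated by the bulk of the vessel volume, so a broken but mostly correct tree can still score well, which is why the abstract pairs it with a distance-based measure such as the Hausdorff Distance.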
Affiliation(s)
- Yuehui Qiu
- Center for Digital Media Computing, School of Film, School of Informatics, Xiamen University, Xiamen, 361005, China
- Dandan Shan
- Institute of Artificial Intelligence, Xiamen University, Xiamen, 361005, China
- Yining Wang
- Peking Union Medical College Hospital, Beijing, 100006, China
- Pei Dong
- Shanghai United Imaging Intelligence Co., Ltd., Shanghai, 200232, China
- Dijia Wu
- Shanghai United Imaging Intelligence Co., Ltd., Shanghai, 200232, China
- Xinnian Yang
- City University of Hong Kong, 999077, Hong Kong, China
- Qingqi Hong
- Center for Digital Media Computing, School of Film, School of Informatics, Xiamen University, Xiamen, 361005, China; Institute of Artificial Intelligence, Xiamen University, Xiamen, 361005, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, 361005, China.
- Dinggang Shen
- Shanghai United Imaging Intelligence Co., Ltd., Shanghai, 200232, China; School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, 201210, China; Shanghai Clinical Research and Trial Center, Shanghai, 201210, China.
5
Yan Y, Chai X, Liu J, Wang S, Li W, Huang T. DeepMethyGene: a deep-learning model to predict gene expression using DNA methylations. BMC Bioinformatics 2025; 26:99. [PMID: 40200144 PMCID: PMC11977931 DOI: 10.1186/s12859-025-06115-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2024] [Accepted: 03/17/2025] [Indexed: 04/10/2025] [Open Access]
Abstract
Gene expression is the basis for cells to achieve various functions, while DNA methylation constitutes a critical epigenetic mechanism governing gene expression regulation. Here we propose DeepMethyGene, an adaptive recursive convolutional neural network model based on ResNet that predicts gene expression using DNA methylation information. Our model transforms methylation Beta values to M values so that the data are closer to a Gaussian distribution, dynamically adjusts the output channels according to the input dimension, and implements residual blocks to mitigate the vanishing-gradient problem when training very deep networks. Benchmarked against the state-of-the-art geneEXPLORE model (R² = 0.449), DeepMethyGene (R² = 0.640) demonstrated superior predictive performance. Further analysis revealed that the number of methylation sites and the average distance between these sites and gene transcription start sites (TSS) significantly affected the prediction accuracy. By exploring the complex relationship between methylation and gene expression, this study provides theoretical support for disease progression prediction and clinical intervention. Relevant data and code are available at https://github.com/yaoyao-11/DeepMethyGene.
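The Beta-to-M conversion mentioned in this abstract is commonly written as the logit transform M = log2(β / (1 − β)). The following is a minimal sketch of that standard formula, not code from the DeepMethyGene repository; the function names and the clipping constant `eps` are assumptions introduced here to keep the logarithm finite at β = 0 or β = 1.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a methylation Beta value in [0, 1] to an M value.

    Uses M = log2(beta / (1 - beta)). `eps` is an assumed clipping constant
    that avoids infinite results for fully (un)methylated sites.
    """
    clipped = min(max(beta, eps), 1.0 - eps)
    return math.log2(clipped / (1.0 - clipped))


def m_to_beta(m):
    """Inverse transform: beta = 2^M / (2^M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


for beta in (0.1, 0.5, 0.9):
    m = beta_to_m(beta)
    print(f"beta={beta:.1f} -> M={m:+.3f} -> beta={m_to_beta(m):.3f}")
```

Beta values are bounded and tend to pile up near 0 and 1, whereas M values are unbounded and usually more homoscedastic, which is the usual motivation for modelling on the M scale.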
Affiliation(s)
- Yuyao Yan
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Xinyi Chai
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Jiajun Liu
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- School of Life Sciences, Shanghai University, Shanghai, China
- Sijia Wang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Wenran Li
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China.
- Tao Huang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China.
- Department of Artificial Intelligence and Digital Health, CAS Engineering Laboratory for Nutrition, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China.
6
Wang X, Zhao Z, Pan D, Zhou H, Hou J, Sun H, Shen X, Mehta S, Wang W. Deep cross entropy fusion for pulmonary nodule classification based on ultrasound imagery. Front Oncol 2025; 15:1514779. [PMID: 40255427 PMCID: PMC12005990 DOI: 10.3389/fonc.2025.1514779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2024] [Accepted: 03/18/2025] [Indexed: 04/22/2025] [Open Access]
Abstract
Introduction: Accurate differentiation of benign and malignant pulmonary nodules in ultrasound remains a clinical challenge due to insufficient diagnostic precision. We propose the Deep Cross-Entropy Fusion (DCEF) model to enhance classification accuracy. Methods: A retrospective dataset of 135 patients (training: 27 benign, 68 malignant; testing: 11 benign, 29 malignant) was analyzed. Manually annotated ultrasound ROIs were preprocessed and input into DCEF, which integrates ResNet, DenseNet, VGG, and InceptionV3 via entropy-based fusion. Performance was evaluated using AUC, accuracy, sensitivity, specificity, precision, and F1-score. Results: DCEF achieved an AUC of 0.873 (training) and 0.792 (testing), outperforming traditional methods. Test metrics included 71.5% accuracy, 70.69% sensitivity, 70.58% specificity, 72.55% precision, and 71.13% F1-score, demonstrating robust diagnostic capability. Discussion: DCEF's multi-architecture fusion enhances diagnostic reliability for ultrasound-based nodule assessment. While promising, validation in larger multi-center cohorts is needed to address single-center data limitations. Future work will explore next-generation architectures and multi-modal integration.
Affiliation(s)
- Xian Wang
- Department of Ultrasound, Affiliated People’s Hospital of Jiangsu University, Zhenjiang, Jiangsu, China
- Medical College of Yangzhou University, Yangzhou, Jiangsu, China
- Ziou Zhao
- Department of Ultrasound, Affiliated People’s Hospital of Jiangsu University, Zhenjiang, Jiangsu, China
- Donggang Pan
- Department of Radiology, Affiliated People’s Hospital of Jiangsu University, Zhenjiang, Jiangsu, China
- Hui Zhou
- Department of Ultrasound, Affiliated People’s Hospital of Jiangsu University, Zhenjiang, Jiangsu, China
- Jie Hou
- Department of Ultrasound, Affiliated People’s Hospital of Jiangsu University, Zhenjiang, Jiangsu, China
- Hui Sun
- Department of Pathology, Affiliated People’s Hospital of Jiangsu University, Zhenjiang, Jiangsu, China
- Xiangjun Shen
- School of Computer Science & Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu, China
- Sumet Mehta
- School of Computer Science & Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu, China
- Wei Wang
- Department of Radiology, Affiliated Hospital of Yangzhou University, Yangzhou, Jiangsu, China
7
Peduzzi G, Felici A, Pellungrini R, Campa D. Explainable machine learning identifies a polygenic risk score as a key predictor of pancreatic cancer risk in the UK Biobank. Dig Liver Dis 2025; 57:915-922. [PMID: 39632152 DOI: 10.1016/j.dld.2024.11.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2024] [Revised: 11/11/2024] [Accepted: 11/12/2024] [Indexed: 12/07/2024]
Abstract
BACKGROUND: Predicting the risk of developing pancreatic ductal adenocarcinoma (PDAC) is of paramount importance, given its high mortality rate. Current PDAC risk prediction models rely on a limited number of variables, do not include genetics, and have modest accuracy. AIM: This study aimed to develop an interpretable PDAC risk prediction model based on machine learning (ML). METHODS: Five ML models (Adaptive Boosting, eXtreme Gradient Boosting, CatBoost, Deep Forest and Random Forest) built on 56 exposome variables and a polygenic risk score (PRS) were tested in 654 PDAC cases and 1,308 controls of the UK Biobank. Additionally, SHapley Additive exPlanation (SHAP) and Global model Interpretation via the Recursive Partitioning (Girp) were employed to explain the models. RESULTS: All models provided similar performance, but based on recall the best was CatBoost (77.10%). SHAP highlighted age and the PRS as primary contributors across all models. Girp developed rules to discern cases from controls, identifying age, PRS, and pancreatitis in most of the rules. CONCLUSION: The predictive models tested exhibited good performance, indicating their potential application in the clinical field in the near future, with the PRS playing a key role in identifying high-risk individuals, as demonstrated by the explainers.
Affiliation(s)
- Giulia Peduzzi
- Department of Biology, University of Pisa, Via Luca Ghini, 13 - 56126, Pisa, Italy.
- Alessio Felici
- Department of Biology, University of Pisa, Via Luca Ghini, 13 - 56126, Pisa, Italy.
- Roberto Pellungrini
- Classe di scienze, Scuola Normale Superiore, Piazza dei Cavalieri, 7 - 56126, Pisa, Italy.
- Daniele Campa
- Department of Biology, University of Pisa, Via Luca Ghini, 13 - 56126, Pisa, Italy.
8
Tan M, Zhao J, Tao Y, Sehar U, Yan Y, Zou Q, Liu Q, Xu L, Xia Z, Feng L, Xiong J. Utilizing machine learning algorithms for predicting Anxiety-Depression Comorbidity Syndrome in Gastroenterology Inpatients (ADCS-GI). BMC Psychiatry 2025; 25:253. [PMID: 40102794 PMCID: PMC11921569 DOI: 10.1186/s12888-025-06666-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Accepted: 02/27/2025] [Indexed: 03/20/2025] [Open Access]
Abstract
BACKGROUND: Accurately diagnosing Anxiety-Depression Comorbidity Syndrome in Gastroenterology Inpatients (ADCS-GI) poses significant challenges, as traditional diagnostic methods fail to meet expectations due to patient hesitance and the limitations of non-psychiatric healthcare professionals. The need for objective diagnostics therefore highlights the potential of machine learning in identifying and treating ADCS-GI. METHODS: A total of 1186 ADCS patients were recruited for this study. We conducted extensive analyses of the dataset, including data quantification, equilibrium, and correlation analysis. Eight machine learning models, including Gaussian Naive Bayes (NB), Support Vector Classifier (SVC), K-Neighbors Classifier, RandomForest, XGB, CatBoost, Cascade Forest, and Decision Tree, were used to compare prediction efficacy, with an effort to minimize the dependency on subjective questionnaires. RESULTS: Among the eight machine learning algorithms, the Decision Tree and K-nearest neighbors models demonstrated an accuracy exceeding 81% and a sensitivity in the same range for detecting ADCS in patients. Notably, when identifying moderate and severe cases, the models achieved an accuracy above 88% and a sensitivity of 90%. Furthermore, the models trained without reliance on subjective questionnaires showed promising performance, indicating the feasibility of developing questionnaire-free early detection applications. CONCLUSION: Machine learning algorithms can be used to identify ADCS among gastroenterology patients. This can help facilitate the early detection of, and intervention for, psychological disorders in the care of gastroenterology patients.
Affiliation(s)
- Min Tan
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- University of Chinese Academy of Sciences, Beijing, 101400, China
- Jinjin Zhao
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Yushun Tao
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- University of Chinese Academy of Sciences, Beijing, 101400, China
- Uroosa Sehar
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- University of Chinese Academy of Sciences, Beijing, 101400, China
- Yan Yan
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Qian Zou
- Department of Gastroenterology and Hepatology, Shenzhen University General Hospital, Shenzhen University, Shenzhen, 518055, China
- Qing Liu
- Department of Gastroenterology, Futian District Second People's Hospital, Shenzhen, 518049, China
- Long Xu
- Department of Gastroenterology and Hepatology, Shenzhen University General Hospital, Shenzhen University, Shenzhen, 518055, China
- Zeyang Xia
- School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
- Lijuan Feng
- Department of Gastroenterology and Hepatology, Shenzhen University General Hospital, Shenzhen University, Shenzhen, 518055, China.
- Department of Gastroenterology, Xiangya Hospital, Central South University, Changsha, 410008, China.
- Jing Xiong
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
- University of Chinese Academy of Sciences, Beijing, 101400, China.
9
Du M, Ren Y, Zhang Y, Li W, Yang H, Chu H, Zhao Y. CSEL-BGC: A Bioinformatics Framework Integrating Machine Learning for Defining the Biosynthetic Evolutionary Landscape of Uncharacterized Antibacterial Natural Products. Interdiscip Sci 2025; 17:27-41. [PMID: 39348072 DOI: 10.1007/s12539-024-00656-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 08/26/2024] [Accepted: 08/28/2024] [Indexed: 10/01/2024]
Abstract
The sluggish pace of new antibacterial drug development reflects a vulnerability in the face of the current severe threat posed by bacterial resistance. Microbial natural products (NPs), as a reservoir of immense chemical potential, have emerged as the most promising avenue for the discovery of next-generation antibacterial agents. Directly assessing the antibacterial activity of potential products derived from biosynthetic gene clusters (BGCs) would significantly expedite the process. To tackle this issue, we propose a CSEL-BGC framework that integrates machine learning (ML) techniques. This framework involves the development of a novel cascade-stacking ensemble learning (CSEL) model and the establishment of a groundbreaking model evaluation system. Based on this framework, we predict 6,666 BGCs with antibacterial activity from 3,468 complete bacterial genomes and elucidate a biosynthetic evolutionary landscape to reveal their antibacterial potential. This provides crucial insights for interpreting the synthesis and secretion mechanisms of unknown NPs.
Affiliation(s)
- Minghui Du
- School of Life Science and Bio-Pharmaceutics, Shenyang Pharmaceutical University, Shenyang, 110016, China
- Yuxiang Ren
- School of Life Science and Bio-Pharmaceutics, Shenyang Pharmaceutical University, Shenyang, 110016, China
- Yang Zhang
- School of Life Science and Bio-Pharmaceutics, Shenyang Pharmaceutical University, Shenyang, 110016, China
- Wenwen Li
- School of Life Science and Bio-Pharmaceutics, Shenyang Pharmaceutical University, Shenyang, 110016, China
- Hongtao Yang
- School of Life Science and Bio-Pharmaceutics, Shenyang Pharmaceutical University, Shenyang, 110016, China
- Huiying Chu
- State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116000, China
- Yongshan Zhao
- School of Life Science and Bio-Pharmaceutics, Shenyang Pharmaceutical University, Shenyang, 110016, China.
10
Huang W, Zeng R, Li Y, Hua Y, Liu L, Chen M, Xue M, Tu S, Huang F, Hu J. Identification of Alzheimer's disease and vascular dementia based on a Deep Forest and near-infrared spectroscopy analysis method. Spectrochim Acta A Mol Biomol Spectrosc 2025; 326:125209. [PMID: 39340951 DOI: 10.1016/j.saa.2024.125209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 09/14/2024] [Accepted: 09/22/2024] [Indexed: 09/30/2024]
Abstract
Alzheimer's disease (AD) and vascular dementia (VaD) typically do not exhibit distinct differences in clinical manifestations and auxiliary examination results, which leads to a high misdiagnosis rate. However, significant differences in treatment approaches and prognosis between these two diseases underscore the critical need for an accurate diagnosis of AD and VaD. In this study, serum samples from 33 patients with AD, 37 patients with VaD, and 130 healthy individuals were collected and analyzed using near-infrared aquaphotomics technology in combination with deep learning for differential diagnosis. Through an aquaphotomics analysis of water absorption patterns among the different diseases, the efficacies of traditional machine learning methods (Support Vector Machine and Decision Trees) and deep learning approaches (Deep Forest) in modeling were compared. Ultimately, by leveraging feature extraction techniques in conjunction with deep learning, a differential diagnostic model for AD and VaD was successfully developed. The results revealed that aquaphotomics could identify a certain correlation between the number of hydrogen bonds in water molecules and the development of AD and VaD; the deep learning model was found to be superior to traditional machine learning models, achieving an accuracy of 98.67%, sensitivity of 97.33%, and specificity of 100.00%. The bands identified using the Competitive Adaptive Reweighting Algorithm method, primarily located at approximately 1300-1500 nm, showed a significant correlation with water molecules containing four hydrogen bonds. These results highlighted the potential role of the water molecule hydrogen-bond network in disease development and were consistent with the aquaphotomics analysis results. Therefore, the differential diagnostic model developed by integrating near-infrared spectroscopy and deep learning proved effective and feasible, providing an accurate and rapid method for diagnosing AD and VaD.
Affiliation(s)
- Wenchang Huang
- College of Physical Science and Technology, Guangxi Normal University, Guilin, Guangxi 541004, China
- Rui Zeng
- College of Physical Science and Technology, Guangxi Normal University, Guilin, Guangxi 541004, China
- Yuanpeng Li
- College of Physical Science and Technology, Guangxi Normal University, Guilin, Guangxi 541004, China; Guangxi Key Laboratory of Nuclear Physics and Technology, Guangxi Normal University, 541004, China.
- Yisheng Hua
- College of Physical Science and Technology, Guangxi Normal University, Guilin, Guangxi 541004, China
- Lingli Liu
- College of Physical Science and Technology, Guangxi Normal University, Guilin, Guangxi 541004, China
- Meiyuan Chen
- College of Physical Science and Technology, Guangxi Normal University, Guilin, Guangxi 541004, China
| | - Mengjiao Xue
- College of Physical Science and Technology, Guangxi Normal University, Guilin, Guangxi 541004, China
| | - Shan Tu
- College of Physical Science and Technology, Guangxi Normal University, Guilin, Guangxi 541004, China; Guangxi Key Laboratory of Nuclear Physics and Technology, Guangxi Normal University, 541004, China.
| | - Furong Huang
- Department of Optoelectronic Engineering, Jinan University, Guangzhou, Guangdong 510632, China.
| | - Junhui Hu
- College of Physical Science and Technology, Guangxi Normal University, Guilin, Guangxi 541004, China; Guangxi Key Laboratory of Nuclear Physics and Technology, Guangxi Normal University, 541004, China
| |
|
11
|
Gliozzo J, Soto-Gomez M, Guarino V, Bonometti A, Cabri A, Cavalleri E, Reese J, Robinson PN, Mesiti M, Valentini G, Casiraghi E. Intrinsic-dimension analysis for guiding dimensionality reduction and data fusion in multi-omics data processing. Artif Intell Med 2025; 160:103049. [PMID: 39673960 DOI: 10.1016/j.artmed.2024.103049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 12/03/2024] [Accepted: 12/04/2024] [Indexed: 12/16/2024]
Abstract
Multi-omics data have revolutionized biomedical research by providing a comprehensive understanding of biological systems and the molecular mechanisms of disease development. However, analyzing multi-omics data is challenging due to high dimensionality and limited sample sizes, necessitating proper data-reduction pipelines to ensure reliable analyses. Additionally, their multimodal nature requires effective data-integration pipelines. While several dimensionality reduction and data fusion algorithms have been proposed, crucial aspects are often overlooked. Specifically, the choice of projection space dimension is typically heuristic and uniformly applied across all omics, neglecting the unique high-dimension, small-sample-size challenges faced by individual omics. This paper introduces a novel multi-modal dimensionality reduction pipeline tailored to individual views. By leveraging intrinsic dimensionality estimators, we assess the curse-of-dimensionality impact on each view and propose a two-step reduction strategy for significantly affected views, combining feature selection with feature extraction. Compared to traditional uniform reduction pipelines in a crucial supervised multi-omics analysis setting, our approach shows significant improvement. Additionally, we explore three effective unsupervised multi-omics data fusion methods rooted in the main data fusion strategies to gain insights into their performance under crucial, yet overlooked, settings.
Affiliation(s)
- Jessica Gliozzo
- AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy; European Commission, Joint Research Centre (JRC), Ispra, Italy
| | - Mauricio Soto-Gomez
- AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy
| | - Valentina Guarino
- AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy
| | - Arturo Bonometti
- Department of Biomedical Sciences, Humanitas University, Milan, Italy; Department of Pathology, IRCCS Humanitas Clinical and Research Hospital, Milan, Italy
| | - Alberto Cabri
- AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy
| | - Emanuele Cavalleri
- AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy
| | - Justin Reese
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Marco Mesiti
- AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Giorgio Valentini
- AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy; CINI, Infolife National Laboratory, Roma, Italy
| | - Elena Casiraghi
- AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA; CINI, Infolife National Laboratory, Roma, Italy; Department of Computer Science, Aalto University, Espoo, Finland.
| |
|
12
|
Nareklishvili M, Geitle M. Deep Ensemble Transformers for Dimensionality Reduction. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:2091-2102. [PMID: 38294917 DOI: 10.1109/tnnls.2024.3357621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2024]
Abstract
We propose deep ensemble transformers (DETs), a fast, scalable approach for dimensionality reduction problems. This method leverages the power of deep neural networks and employs cascade ensemble techniques as its fundamental feature extraction tool. To handle high-dimensional data, our approach employs a flexible number of intermediate layers sequentially. These layers progressively transform the input data into decision tree predictions. To further enhance prediction performance, the output from the final intermediate layer is fed through a feed-forward neural network architecture for final prediction. We derive an upper bound of the disparity between the generalization error and the empirical error and demonstrate that it converges to zero. This highlights the generalizability of our method to parameter estimation and feature selection problems. In our experimental evaluations, DETs outperform existing models in terms of prediction accuracy, representation learning ability, and computational time. Specifically, the method achieves over 95% accuracy in gene expression data and can be trained on average 50% faster than traditional artificial neural networks (ANNs).
|
13
|
Liu L, Tan Z, Wei Y, Sun Q. A multi-perspective deep learning framework for enhancer characterization and identification. Comput Biol Chem 2025; 114:108284. [PMID: 39577030 DOI: 10.1016/j.compbiolchem.2024.108284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2024] [Revised: 11/02/2024] [Accepted: 11/13/2024] [Indexed: 11/24/2024]
Abstract
Enhancers are vital elements in the genome that boost the transcriptional activity of neighboring genes and are essential in regulating cell-specific gene expression. Therefore, accurately identifying and characterizing enhancers is essential for comprehending gene regulatory networks and the development of related diseases. This study introduces MPDL-Enhancer, a novel multi-perspective deep learning framework aimed at enhancer characterization and identification. In this study, enhancer sequences are encoded using the dna2vec model along with features derived from the structural properties of DNA sequences. Subsequently, these representations are processed through a novel dual-scale deep neural network designed to discern subtle correlations and extended interactions embedded within the semantic content of DNA. The predictive phase of our methodology employs a Support Vector Machine classifier to render the final classification. To rigorously assess the efficacy of our approach, a comprehensive evaluation was executed utilizing an independent test dataset, thereby substantiating the robustness and accuracy of our model. Our methodology demonstrated superior performance over existing computational techniques, with an accuracy (ACC) of 81.00 %, a sensitivity (SN) of 79.00 %, and specificity (SP) of 83.00 %. The innovative dual-scale deep neural network and the unique feature representation strategy contributed to this performance improvement. MPDL-Enhancer has effectively characterized enhancer sequences and achieved excellent predictive performance. Building upon this foundation, we conducted an interpretability analysis of the model, which can assist researchers in identifying key features and patterns that affect the functionality of enhancers, thereby promoting a deeper understanding of gene regulatory networks.
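The three reported figures are tied together by the confusion matrix. The minimal Python sketch below shows how ACC, SN, and SP follow from the four counts. The counts are hypothetical: they were chosen only so that they reproduce the reported percentages under the assumption of a balanced test set of 100 enhancers and 100 non-enhancers, and they are not taken from the paper.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of true enhancers recovered
    specificity = tn / (tn + fp)   # fraction of non-enhancers correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical counts for a balanced test set (assumption, not from the paper).
acc, sn, sp = classification_metrics(tp=79, fn=21, tn=83, fp=17)
print(f"ACC={acc:.2%} SN={sn:.2%} SP={sp:.2%}")
```

On a balanced test set, accuracy is the mean of sensitivity and specificity, which is consistent with the reported 81.00 % lying midway between 79.00 % and 83.00 %.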
Affiliation(s)
- Liwei Liu
- College of Science, Dalian Jiaotong University, Dalian 116028, China.
| | - Zhebin Tan
- College of Software, Dalian Jiaotong University, Dalian 116028, China
| | - Yuxiao Wei
- College of Software, Dalian Jiaotong University, Dalian 116028, China
| | - Qianhui Sun
- College of Software, Dalian Jiaotong University, Dalian 116028, China
| |
|
14
|
Ji S, Wu J, An F, Lou M, Zhang T, Guo J, Wu P, Zhu Y, Wu R. Umami-gcForest: Construction of a predictive model for umami peptides based on deep forest. Food Chem 2025; 464:141826. [PMID: 39522377 DOI: 10.1016/j.foodchem.2024.141826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Revised: 10/07/2024] [Accepted: 10/27/2024] [Indexed: 11/16/2024]
Abstract
Umami peptides have recently gained attention for their ability to enhance umami flavor, reduce salt content, and provide nutritional benefits. However, traditional wet laboratory methods to identify them are time-consuming, laborious, and costly. Therefore, we developed the Umami-gcForest model using the deep forest algorithm. It constructs amino acid feature matrices using ProtBERT, amino acid composition, composition-transition-distribution, and pseudo amino acid composition, applying mutual information for feature selection to optimize dimensions. Compared with other machine learning baselines, umami peptide predictors, and composite models, Umami-gcForest demonstrated outstanding predictive accuracy in validation on different test sets. Using SHapley Additive exPlanations to calculate feature contributions, we found that the key features of Umami-gcForest were hydrophobicity, charge, and polarity. Based on this, an online platform was developed to facilitate its use. In conclusion, Umami-gcForest serves as a powerful tool, providing a solid foundation for the efficient and accurate screening of umami peptides.
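To make the feature-selection step concrete, the following Python sketch ranks feature columns by their mutual information with the class labels and keeps the top k. It is only an illustration: it assumes discrete (already binned) feature values, whereas the features named in the abstract are largely continuous, and the function names `mutual_information` and `select_top_k` are hypothetical rather than part of Umami-gcForest.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def select_top_k(columns, labels, k):
    """Return the indices of the k columns with the highest MI with the labels."""
    scores = [mutual_information(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy data (hypothetical): column 1 tracks the label, column 0 does not.
labels = [0, 0, 1, 1]
columns = [[0, 1, 0, 1], [0, 0, 1, 1]]
print(select_top_k(columns, labels, k=1))
```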
Affiliation(s)
- Shuaiqi Ji
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Shenyang Key Laboratory of Microbial Fermentation Technology Innovation, Shenyang 110866, PR China
| | - Junrui Wu
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Shenyang Key Laboratory of Microbial Fermentation Technology Innovation, Shenyang 110866, PR China
| | - Feiyu An
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Liaoning Engineering Research Center of Food Fermentation Technology, Shenyang 110866, PR China
| | - Mengxue Lou
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Shenyang Key Laboratory of Microbial Fermentation Technology Innovation, Shenyang 110866, PR China
| | - Taowei Zhang
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Shenyang Key Laboratory of Microbial Fermentation Technology Innovation, Shenyang 110866, PR China
| | - Jiawei Guo
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Shenyang Key Laboratory of Microbial Fermentation Technology Innovation, Shenyang 110866, PR China
| | - Penggong Wu
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Liaoning Engineering Research Center of Food Fermentation Technology, Shenyang 110866, PR China
| | - Yi Zhu
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Shenyang Key Laboratory of Microbial Fermentation Technology Innovation, Shenyang 110866, PR China
| | - Rina Wu
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Liaoning Engineering Research Center of Food Fermentation Technology, Shenyang 110866, PR China.
| |
|
15
|
Li T, Zheng X, Liu X, Zhang H, Grieneisen ML, He C, Ji M, Zhan Y, Yang F. Enhancing Space-Based Tracking of Fossil Fuel CO2 Emissions via Synergistic Integration of OCO-2, OCO-3, and TROPOMI Measurements. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2025; 59:1587-1597. [PMID: 39453935 DOI: 10.1021/acs.est.4c05896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2024]
Abstract
Top-down estimates of fossil fuel CO2 (FFCO2) emissions are crucial for tracking emissions and evaluating mitigation strategies. However, their practical application is hindered by limited data coverage and overreliance on NOx-to-CO2 emission ratios from emission inventories. We developed the Machine Learning-Driven Mapping Satellite-based XCO2en (ML-MSXE) model using the column-averaged dry-air mole fraction of CO2 enhancement (XCO2en) derived from OCO-2 and OCO-3 measurements to reconstruct the XCO2en distribution for monitoring FFCO2 emissions. Compared to the previous Machine Learning-Driven Deriving XCO2en from Mapped XCO2 (ML-DXEMX) model, ML-MSXE enhances the utilization of TROPOMI NO2 measurements, increasing their relative contribution from 4.3% to 21.7%, thereby improving XCO2en reconstruction accuracy and enhancing the ability to track emissions. Despite the COVID-19 lockdown, XCO2en levels in China rose from 1.33 ± 1.06 ppm in 2019 to 1.39 ± 1.01 ppm in 2021. In February 2020, while the national average rate of XCO2en decline (16.3%) aligned with the reduction in FFCO2 emissions estimated by inventories, XCO2en further revealed varying rates of decline between cities. Furthermore, the spatial distribution of XCO2en identified hotspots where FFCO2 emissions might be underestimated by inventories. This study presents a space-based approach for monitoring FFCO2 emissions, offering valuable insights for assessing carbon neutrality progress and informing policy.
Affiliation(s)
- Tao Li
- College of Carbon Neutrality Future Technology, Sichuan University, Chengdu 610065, China
- Department of Environmental Science and Engineering, Sichuan University, Chengdu 610065, China
| | - Xi Zheng
- College of Carbon Neutrality Future Technology, Sichuan University, Chengdu 610065, China
- Department of Environmental Science and Engineering, Sichuan University, Chengdu 610065, China
| | - Xinyi Liu
- College of Carbon Neutrality Future Technology, Sichuan University, Chengdu 610065, China
- Department of Environmental Science and Engineering, Sichuan University, Chengdu 610065, China
| | - Han Zhang
- State Grid Sichuan Electric Power Research Institute, Chengdu 610041, China
| | - Michael L Grieneisen
- Department of Land, Air, and Water Resources, University of California, Davis, California 95616, United States
| | - Changpei He
- Department of Environmental Science and Engineering, Sichuan University, Chengdu 610065, China
| | - Mingrui Ji
- Department of Environmental Science and Engineering, Sichuan University, Chengdu 610065, China
| | - Yu Zhan
- College of Carbon Neutrality Future Technology, Sichuan University, Chengdu 610065, China
| | - Fumo Yang
- College of Carbon Neutrality Future Technology, Sichuan University, Chengdu 610065, China
| |
|
16
|
Emmanuel J, Isewon I, Oyelade J. An optimized deep-forest algorithm using a modified differential evolution optimization algorithm: A case of host-pathogen protein-protein interaction prediction. Comput Struct Biotechnol J 2025; 27:595-611. [PMID: 39995682 PMCID: PMC11849198 DOI: 10.1016/j.csbj.2025.01.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2024] [Revised: 01/21/2025] [Accepted: 01/21/2025] [Indexed: 02/26/2025] Open
Abstract
Deep Forest employs forest structures and leverages deep architecture to learn feature vector information adaptively. However, deep forest-based models have limitations such as manual hyperparameter optimization and time and memory usage inefficiencies. Bayesian optimization is a widely used model-based hyperparameter optimization method. Evolutionary algorithms such as Differential Evolution (DE) have recently been introduced to improve Bayesian optimization's acquisition function. Despite its effectiveness, DE has a significant drawback as it relies on randomly selecting indices from the population of target vectors to construct donor vectors in search of optimal solutions. This randomness is ineffective, as suboptimal or redundant indices may be selected. Therefore, in this research we developed a modified differential evolution (DE) acquisition function for improved host-pathogen protein-protein interaction prediction. The modified DE introduces a weighted and adaptive donor vector technique that selects the best-fitted donor vectors as opposed to the random approach. This modified optimization approach was implemented in a deep forest model for automatic hyperparameter optimization. The performance of the optimized deep forest model was evaluated on human-Plasmodium falciparum protein sequence datasets using 10-fold cross-validation. The results were compared with standard optimization methods such as traditional Bayesian optimization, genetic algorithms, evolutionary strategies, and other machine learning models. The optimized model achieved an accuracy of 89.3 %, outperforming other models across all metrics, including a sensitivity of 85.4 % and a precision of 91.6 %. Additionally, the optimized model predicted seven novel host-pathogen interactions. Finally, the model was implemented as a web application which is accessible at http://dfh3pi.covenantuniversity.edu.ng.
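The contrast between random and fitness-guided donor construction can be sketched in a few lines of Python. The first function is the classic DE/rand/1 mutation, v = x_r1 + F (x_r2 - x_r3), with three indices drawn uniformly at random. The second is only an illustrative stand-in for a "weighted" scheme: it samples the indices with probability proportional to fitness, which is an assumption made here and not the authors' exact weighted and adaptive technique. All names and numbers are hypothetical, and fitness values are assumed to be non-negative with higher meaning better.

```python
import random


def rand_donor(population, target_idx, f, rng):
    """Classic DE/rand/1 donor: three distinct random indices, none equal to the target."""
    candidates = [i for i in range(len(population)) if i != target_idx]
    r1, r2, r3 = rng.sample(candidates, 3)
    return [a + f * (b - c)
            for a, b, c in zip(population[r1], population[r2], population[r3])]


def weighted_donor(population, fitness, target_idx, f, rng):
    """Illustrative variant: draw the three indices without replacement,
    each with probability proportional to its fitness."""
    candidates = [i for i in range(len(population)) if i != target_idx]
    chosen = []
    for _ in range(3):
        weights = [fitness[i] for i in candidates]
        pick = rng.choices(candidates, weights=weights, k=1)[0]
        chosen.append(pick)
        candidates.remove(pick)
    r1, r2, r3 = chosen
    return [a + f * (b - c)
            for a, b, c in zip(population[r1], population[r2], population[r3])]


rng = random.Random(0)
# Hypothetical hyperparameter vectors and validation scores.
population = [[0.1, 0.9], [0.4, 0.2], [0.8, 0.5], [0.3, 0.7], [0.6, 0.1]]
fitness = [0.62, 0.71, 0.89, 0.55, 0.80]
print(rand_donor(population, 0, 0.5, rng))
print(weighted_donor(population, fitness, 0, 0.5, rng))
```

With the weighted variant, poorly performing vectors can still be selected, only less often, so some of the diversity of the random scheme is retained.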
Affiliation(s)
- Jerry Emmanuel
- Department of Computer and Information Sciences, Covenant University, Ota, Nigeria
- Covenant Applied Informatics and Communication African Centre of Excellence (CApIC-ACE), Nigeria
- Covenant University Bioinformatics Research (CUBRe), Nigeria
| | - Itunuoluwa Isewon
- Department of Computer and Information Sciences, Covenant University, Ota, Nigeria
- Covenant Applied Informatics and Communication African Centre of Excellence (CApIC-ACE), Nigeria
- Covenant University Bioinformatics Research (CUBRe), Nigeria
| | - Jelili Oyelade
- Department of Computer and Information Sciences, Covenant University, Ota, Nigeria
- Covenant Applied Informatics and Communication African Centre of Excellence (CApIC-ACE), Nigeria
- Covenant University Bioinformatics Research (CUBRe), Nigeria
| |
|
17
|
Qin Z, Yang H, Shu Q, Yu J, Yang Z, Ma X, Duan D. Estimation of Dendrocalamus giganteus leaf area index by combining multi-source remote sensing data and machine learning optimization model. FRONTIERS IN PLANT SCIENCE 2025; 15:1505414. [PMID: 39881727 PMCID: PMC11775760 DOI: 10.3389/fpls.2024.1505414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2024] [Accepted: 12/12/2024] [Indexed: 01/31/2025]
Abstract
The Leaf Area Index (LAI) is an essential parameter that affects the exchange of energy and materials between the vegetative canopy and the surrounding environment. Estimating LAI using machine learning models with remote sensing data has become a prevalent method for large-scale LAI estimation. However, existing machine learning models have exhibited various flaws, hindering the accurate estimation of LAI. Thus, a new method for large-scale estimation of Dendrocalamus giganteus LAI was proposed, which integrates ICESat-2/ATLAS and Sentinel-1/-2 data and refines machine learning models through the application of Bayesian Optimization (BO), Particle Swarm Optimization (PSO), Genetic Algorithms (GA), and Simulated Annealing (SA). First, spatial interpolation was performed using the Sequential Gaussian Conditional Simulation (SGCS) method. Then, multi-source remote sensing data were leveraged to optimize feature variables through the Pearson correlation coefficient approach. Subsequently, optimization algorithms were applied to Random Forest Regression (RFR), Gradient Boosting Regression Tree (GBRT), and Support Vector Machine Regression (SVR) models, leading to efficient large-scale LAI estimation. The results showed that the BO-GBRT model achieved high accuracy in LAI estimation, with a coefficient of determination (R²) of 0.922, a root mean square error (RMSE) of 0.263, a mean absolute error (MAE) of 0.187, and an overall estimation accuracy (P1) of 92.38%. Compared to existing machine learning methods, the proposed approach demonstrated superior performance. This method holds significant potential for large-scale forest LAI inversion and can facilitate further research on other forest structure parameters.
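For readers comparing these figures with their own models, the sketch below computes the three standard regression metrics named in the abstract (coefficient of determination, RMSE, and MAE) in plain Python. The observed and predicted LAI values are hypothetical. The overall estimation accuracy P1 is not included because the abstract does not give its formula.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (r2, rmse, mae) for paired observed and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae


# Hypothetical field-measured and model-estimated LAI values.
observed = [2.1, 3.4, 4.0, 2.8, 3.9]
estimated = [2.3, 3.1, 4.2, 2.9, 3.6]
r2, rmse, mae = regression_metrics(observed, estimated)
print(f"R2={r2:.3f} RMSE={rmse:.3f} MAE={mae:.3f}")
```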
Affiliation(s)
- Zhen Qin
- College of Forestry, Southwest Forestry University, Kunming, Yunnan, China
| | - Huanfen Yang
- College of Forestry, Southwest Forestry University, Kunming, Yunnan, China
| | - Qingtai Shu
- College of Forestry, Southwest Forestry University, Kunming, Yunnan, China
| | - Jinge Yu
- School of Ecology and Applied Meteorology, Nanjing University of Information Science & Technology, Nanjing, China
| | - Zhengdao Yang
- College of Forestry, Southwest Forestry University, Kunming, Yunnan, China
| | - Xu Ma
- College of Geography and Remote Sensing Sciences, Xinjiang University, Urumqi, China
| | - Dandan Duan
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China
| |
|
18
|
Wang Y, Yang Y, Yuan Q, Li T, Zhou Y, Zong L, Wang M, Xie Z, Ho HC, Gao M, Tong S, Lolli S, Zhang L. Substantially underestimated global health risks of current ozone pollution. Nat Commun 2025; 16:102. [PMID: 39747001 PMCID: PMC11696706 DOI: 10.1038/s41467-024-55450-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Accepted: 12/11/2024] [Indexed: 01/04/2025] Open
Abstract
Existing assessments might have underappreciated ozone-related health impacts worldwide. Here our study assesses current global ozone pollution using the high-resolution (0.05°) estimation from a geo-ensemble learning model, with key focuses on population exposure and all-cause mortality burden. Our model demonstrates strong performance, achieving a mean bias of less than -1.5 parts per billion against in-situ measurements. We estimate that 66.2% of the global population is exposed to excess ozone for short term (> 30 days per year), and 94.2% suffers from long-term exposure. Furthermore, severe ozone exposure levels are observed in Cropland areas, particularly over Asia. Importantly, the all-cause ozone-attributable deaths significantly surpass previous recognition from specific diseases worldwide. Notably, mid-latitude Asia (30°N) and the western United States show high mortality burden, contributing substantially to global ozone-attributable deaths. Our study highlights current significant global ozone-related health risks and may benefit the ozone-exposed population in the future.
Affiliation(s)
- Yuan Wang
- School of Atmospheric Physics, Nanjing University of Information Science and Technology, Nanjing, China
| | - Yuanjian Yang
- School of Atmospheric Physics, Nanjing University of Information Science and Technology, Nanjing, China
| | - Qiangqiang Yuan
- School of Geodesy and Geomatics, Wuhan University, Wuhan, China.
| | - Tongwen Li
- School of Geospatial Engineering and Science, Sun Yat-sen University, Zhuhai, China
| | - Yi Zhou
- School of Atmospheric Physics, Nanjing University of Information Science and Technology, Nanjing, China
| | - Lian Zong
- School of Atmospheric Physics, Nanjing University of Information Science and Technology, Nanjing, China
| | - Mengya Wang
- School of Atmospheric Physics, Nanjing University of Information Science and Technology, Nanjing, China
| | - Zunyi Xie
- College of Geography and Environmental Science, Henan University, Kaifeng, China
| | - Hung Chak Ho
- Department of Public and International Affairs, The City University of Hong Kong, Hong Kong, China
| | - Meng Gao
- Department of Geography, Hong Kong Baptist University, Hong Kong, China
| | - Shilu Tong
- School of Atmospheric Physics, Nanjing University of Information Science and Technology, Nanjing, China
- National Institute of Environmental Health, Chinese Centre for Disease Control and Prevention, Beijing, China
- School of Public Health and Social Work, Queensland University of Technology, Brisbane, Australia
| | | | - Liangpei Zhang
- The State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China.
| |
|
19
|
Li HH, Liao YH. Application and effectiveness of adaptive AI in elderly healthcare. Psychogeriatrics 2025; 25:e13214. [PMID: 39537559 DOI: 10.1111/psyg.13214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Revised: 09/30/2024] [Accepted: 10/31/2024] [Indexed: 11/16/2024]
Abstract
BACKGROUND: In addressing elderly healthcare issues, cognitive impairment can cause significant disruptions in daily life and may potentially develop into dementia. Thus, finding ways to delay the progression of cognitive impairment is a critical issue.
METHODS: This study aims to develop an adaptive artificial intelligence (AI) mechanism that creates enjoyable and beneficial content to help delay cognitive impairment in the elderly. Utilising virtual reality (VR) and a fishing game, the design enhances reaction time and attention through interactive fishing activities. The AI personalises content based on individual performance to improve cognitive function.
RESULTS: Experimental results showed that adaptive AI increased participant satisfaction from 86.84 to 91.05 points and future willingness from 75.26 to 85.68 points. The number of fish caught rose from 98 to 120, with the average per participant increasing from 2.64 to 2.85.
CONCLUSIONS: This is undoubtedly the trend of the future. VR allows the elderly to have a more impactful and memorable first experience, while AI dynamically adjusts the game's difficulty based on the elderly's performance, addressing the issue of reduced willingness to continue due to inappropriate game difficulty. The VR game developed in this study is designed to be relaxing and incorporates mechanisms to promote the elderly's health. It is not restricted by location or time and, more importantly, meets the health promotion needs of the elderly.
Affiliation(s)
- Hsiao-Hui Li
- Department of Maritime Information and Technology, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
| | - Yuan-Hsun Liao
- Department of Computer Science, Tunghai University, Taichung, Taiwan
| |
|
20
|
Wang C, Wang R, Leng Y, Iramina K, Yang Y, Ge S. An Eye Movement Classification Method Based on Cascade Forest. IEEE J Biomed Health Inform 2024; 28:7184-7194. [PMID: 39106144 DOI: 10.1109/jbhi.2024.3439568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/09/2024]
Abstract
Eye tracking technology has become increasingly important in scientific research and practical applications. In the field of eye tracking research, analysis of eye movement data is crucial, particularly for classifying raw eye movement data into eye movement events. Current classification methods exhibit considerable variation in adaptability across different participants, and it is necessary to address the issues of class imbalance and data scarcity in eye movement classification. In the current study, we introduce a novel eye movement classification method based on cascade forest (EMCCF), which comprises two modules: 1) a feature extraction module that employs a multi-scale time window method to extract features from raw eye movement data; 2) a classification module that employs a layered ensemble architecture, integrating the cascade forest structure with ensemble learning principles, specifically for eye movement classification. Consequently, EMCCF not only enhances the accuracy and efficiency of eye movement classification but also represents an advancement in applying ensemble learning techniques within this domain. Furthermore, experimental results indicated that EMCCF outperformed existing deep learning-based classification models in several metrics and demonstrated robust performance across different datasets and participants.
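A multi-scale time window can be illustrated with a short Python sketch: for each gaze sample, one feature is computed per window size, so that short windows react to fast events and long windows capture slower trends. The feature used here (mean point-to-point displacement), the window sizes, and the function name are assumptions made for illustration; the abstract does not specify which features EMCCF extracts.

```python
import math


def multiscale_speed_features(xs, ys, window_sizes=(3, 5, 9)):
    """For each sample, return the mean point-to-point displacement inside
    centred windows of several sizes (one value per window size)."""
    n = len(xs)
    # steps[j] is the displacement between sample j and sample j + 1.
    steps = [math.hypot(xs[j + 1] - xs[j], ys[j + 1] - ys[j]) for j in range(n - 1)]
    features = []
    for i in range(n):
        row = []
        for w in window_sizes:
            half = w // 2
            lo = max(0, i - half)          # first sample in the window
            hi = min(n - 1, i + half)      # last sample in the window
            segment = steps[lo:hi]         # steps between samples lo..hi
            row.append(sum(segment) / len(segment) if segment else 0.0)
        features.append(row)
    return features


# Hypothetical gaze trace: a fixation, a jump, then another fixation.
xs = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
ys = [0.0] * 6
for row in multiscale_speed_features(xs, ys):
    print(row)
```

Each row could then be passed, together with the sample's label, to whichever classifier is being trained.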
|
21
|
Hao X, Liu D, Fan L. YabXnization platform: A monoclonal antibody heterologization server based on rational design and artificial intelligence-assisted computation. Comput Struct Biotechnol J 2024; 23:3222-3231. [PMID: 39660217 PMCID: PMC11630649 DOI: 10.1016/j.csbj.2024.08.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2024] [Revised: 08/12/2024] [Accepted: 08/12/2024] [Indexed: 12/12/2024] Open
Abstract
The application of antibody therapeutics is promising in the field of immunotherapy. However, in most cases therapeutic antibodies must be heterologized before use, e.g., humanized, caninized, or felinized for humans, dogs, and cats, respectively. Here we report YabXnization, a platform that performs antibody heterologization on the basis of rational design and artificial intelligence (AI)-assisted computation. YabXnization provides two approaches to heterologization: rational design based on traditional CDR grafting and backmutation, and AI-assisted fusion computational design. Taking humanization as an example, both approaches first find a suitable template for the heavy and light chains, followed by CDR grafting. For rational design, bioinformatics analysis-based backmutation is then conducted. For AI-assisted computational design, backmutation and humanness evaluation are implemented through an evolutionary computation framework, with a DeepForest-based humanness evaluation model and the distance to the previously found human template as objective functions. Finally, the top K heterologized antibodies are provided by the YabXnization platform. We examined the platform with 18 antibodies to be heterologized, of which 10 were for humanization, 6 for caninization, and 2 for felinization. The heterologized antibodies were measured by indirect ELISA and BLI (Octet)/SPR (Biacore) binding affinity measurement methods. Test results show a 90% success rate, with the binding affinity loss of heterologized antibodies within an order of magnitude compared to the corresponding chimeric antibodies. Some of the heterologized antibodies even show an increase in binding affinity. The platform can be reached through https://www.genscript.com/tools/yabxnization-service.
Affiliation(s)
- Xiaohu Hao
- Production and R&D Center I of LSS (Life Science Service), GenScript Biotech Corporation, No. 28, Yongxi Rd., Nanjing, 211100, Jiangsu, China
- Dongping Liu
- Production and R&D Center I of LSS (Life Science Service), GenScript Biotech Corporation, No. 28, Yongxi Rd., Nanjing, 211100, Jiangsu, China
- Long Fan
- Production and R&D Center I of LSS (Life Science Service), GenScript Biotech Corporation, No. 28, Yongxi Rd., Nanjing, 211100, Jiangsu, China
- Production and R&D Center I of LSS (Life Science Service), GenScript (Shanghai) Biotech Corporation, No. 186, Hedan Rd., Shanghai, 200100, China
22
Jiang L, Jia L, Wang Y, Wu Y, Yue J. Adap-BDCM: Adaptive Bilinear Dynamic Cascade Model for Classification Tasks on CNV Datasets. Interdiscip Sci 2024; 16:1019-1037. [PMID: 38758306 DOI: 10.1007/s12539-024-00635-w] [Received: 12/21/2023] [Revised: 04/18/2024] [Accepted: 04/23/2024] [Indexed: 05/18/2024]
Abstract
Copy number variation (CNV) is an essential genetic driving factor of cancer formation and progression, making intelligent classification based on CNV feasible. However, there are a few challenges in the current machine learning and deep learning methods, such as the design of base classifier combination schemes in ensemble methods and the selection of layers of neural networks, which often result in low accuracy. Therefore, an adaptive bilinear dynamic cascade model (Adap-BDCM) is developed to further enhance the accuracy and applicability of these methods for intelligent classification on CNV datasets. In this model, a feature selection module is introduced to mitigate the interference of redundant information, and a bilinear model based on the gated attention mechanism is proposed to extract more beneficial deep fusion features. Furthermore, an adaptive base classifier selection scheme is designed to overcome the difficulty of manually designing base classifier combinations and enhance the applicability of the model. Lastly, a novel feature fusion scheme with an attribute recall submodule is constructed, effectively avoiding getting stuck in local solutions and missing some valuable information. Numerous experiments have demonstrated that our Adap-BDCM model exhibits optimal performance in cancer classification, stage prediction, and recurrence on CNV datasets. This study can assist physicians in making diagnoses faster and better.
Affiliation(s)
- Liancheng Jiang
- College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, 030600, China
- Liye Jia
- College of Computer Science and Technology, Taiyuan Normal University, Taiyuan, 030619, China
- Yizhen Wang
- College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, 030600, China
- Yongfei Wu
- College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, 030600, China
- Junhong Yue
- College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, 030600, China.
23
Ma L, Yan Y, Dai S, Shao D, Yi S, Wang J, Li J, Yan J. Research on prediction of human oral bioavailability of drugs based on improved deep forest. J Mol Graph Model 2024; 133:108851. [PMID: 39232489 DOI: 10.1016/j.jmgm.2024.108851] [Received: 06/06/2024] [Revised: 08/22/2024] [Accepted: 08/26/2024] [Indexed: 09/06/2024]
Abstract
Human oral bioavailability is a crucial factor in drug discovery. In recent years, researchers have constructed a variety of prediction models. However, given the limited size of human oral bioavailability data sets, making accurate predictions from small samples has become a critical issue in the field. The deep forest model, with its adaptively determined number of cascade levels, can perform exceptionally well even on small-scale data. However, the original deep forest suffers from an unbalanced multi-grained scanning process and premature stopping of cascade forest training. In this paper, we propose a human oral bioavailability prediction method based on an improved deep forest, called balanced multi-grained scanning mapping cascade forest (bgmc-forest). First, the Mordred descriptor method is used for feature extraction; enhanced features are then obtained by the improved balanced multi-grained scanning, which solves the problem of missing features at both ends. Finally, the prediction results are obtained by feature-mapping cascade forests, which combine principal component analysis with cascade forests to ensure the effectiveness of the cascade forest. The superiority of the model is demonstrated through comparative experiments, and the effectiveness of the improved modules is verified through ablation experiments. Finally, the decision-making process of the model is explained with the SHapley Additive exPlanations (SHAP) interpretation algorithm.
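The "missing features at both ends" problem mentioned above can be made concrete with a short sketch. It counts how many sliding windows cover each feature position in a plain (unbalanced) scan. It does not implement the paper's balanced variant, whose details the abstract does not give; the function name and the toy sizes are assumptions chosen for illustration.

```python
from typing import List


def window_coverage(n_features: int, window: int, stride: int = 1) -> List[int]:
    """Count how many sliding windows include each feature position."""
    if not 0 < window <= n_features:
        raise ValueError("window must be between 1 and n_features")
    if stride < 1:
        raise ValueError("stride must be at least 1")
    counts = [0] * n_features
    # Slide a fixed-size window from left to right, as in plain
    # multi-grained scanning, and record every position it touches.
    for start in range(0, n_features - window + 1, stride):
        for i in range(start, start + window):
            counts[i] += 1
    return counts


coverage = window_coverage(n_features=6, window=3)
print(coverage)  # → [1, 2, 3, 3, 2, 1]
```

The first and last features appear in only one window each, while the middle features appear in three, so features at the ends contribute less to the scanned representation.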
Affiliation(s)
- Lei Ma
- Kunming University of Science and Technology, Kunming, CN 650500, China
- Yukun Yan
- Kunming University of Science and Technology, Kunming, CN 650500, China
- Shaoxing Dai
- Kunming University of Science and Technology, Kunming, CN 650500, China
- Dangguo Shao
- Kunming University of Science and Technology, Kunming, CN 650500, China.
- Sanli Yi
- Kunming University of Science and Technology, Kunming, CN 650500, China
- Jiawei Wang
- Kunming University of Science and Technology, Kunming, CN 650500, China
- Jingtao Li
- Kunming University of Science and Technology, Kunming, CN 650500, China
- Jiangkai Yan
- Kunming University of Science and Technology, Kunming, CN 650500, China
24
Cai K, Guan L, Li S, Zhang S, Liu Y, Liu Y. Full-coverage estimation of CO2 concentrations in China via multisource satellite data and Deep Forest model. Sci Data 2024; 11:1231. [PMID: 39543183 PMCID: PMC11564725 DOI: 10.1038/s41597-024-04063-9] [Received: 05/10/2024] [Accepted: 10/31/2024] [Indexed: 11/17/2024] Open
Abstract
Monitoring China's carbon dioxide (CO2) concentration is essential for formulating effective carbon cycle policies to achieve carbon peaking and neutrality. Despite insufficient satellite observation coverage, this study utilizes high-resolution spatiotemporal data from the Orbiting Carbon Observatory 2 (OCO-2), supplemented with various auxiliary datasets, to estimate full-coverage, monthly, column-averaged carbon dioxide (XCO2) values across China from 2015 to 2022 at a spatial resolution of 0.05° via the deep forest model. The 10-fold cross-validation results indicate a correlation coefficient (R) of 0.95 and a determination coefficient (R²) of 0.90. Validation against ground-based station data yielded an R value of 0.93 and an R² value of 0.81. Further validation against the Greenhouse Gases Observing Satellite (GOSAT) and the Copernicus Atmosphere Monitoring Service Reanalysis dataset (CAMS) produced R² values of 0.87 and 0.80, respectively. During the study period, CO2 concentrations in China were higher in spring and winter than in summer and autumn and showed a clear annual increase. The estimates generated by this study could potentially support CO2 monitoring in China.
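The abstract reports both a correlation coefficient (R) and a determination coefficient (R²). A minimal sketch of the two quantities follows, assuming the common definitions (Pearson correlation, and R² = 1 − SS_res/SS_tot); the abstract does not state which formulas were used, and the toy numbers are invented for illustration.

```python
import math
from typing import Sequence


def pearson_r(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation between observed and predicted values."""
    mo = sum(obs) / len(obs)
    mp = sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)


def r_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mo = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Toy data: predictions track the observations perfectly but are biased by +1.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
r = pearson_r(observed, predicted)    # 1.0: perfect linear association
r2 = r_squared(observed, predicted)   # 0.2: the bias is penalized
```

Under these definitions R² need not equal the square of R, because R ignores bias and scale errors while R² does not. That is one possible reason a validation could report R = 0.93 together with R² = 0.81.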
Affiliation(s)
- Kun Cai
- School of Computer and Information Engineering, Henan University, Kaifeng, 475004, China
- Liuyin Guan
- School of Computer and Information Engineering, Henan University, Kaifeng, 475004, China
- Shenshen Li
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, 100094, China.
- Shuo Zhang
- School of Computer and Information Engineering, Henan University, Kaifeng, 475004, China
- Yang Liu
- School of Computer and Information Engineering, Henan University, Kaifeng, 475004, China
- Yang Liu
- Gangarosa Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, GA, 30322, United States of America
25
Bai Q, Chen H, Li W, Li L, Li J, Gao Z, Li Y, Li X, Song B. DeepForest-HTP: A novel deep forest approach for predicting antihypertensive peptides. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024; 258:108514. [PMID: 39549393 DOI: 10.1016/j.cmpb.2024.108514] [Received: 07/01/2024] [Revised: 11/07/2024] [Accepted: 11/09/2024] [Indexed: 11/18/2024]
Abstract
Hypertension is a major preventable risk factor for cardiovascular disease, affecting over 1.5 billion adults worldwide. Antihypertensive peptides (AHTPs) have gained attention as a natural therapeutic option with minimal side effects. This study proposes a Deep Forest-based machine learning framework for AHTP prediction, leveraging a multi-granularity cascade structure to enhance classification accuracy. We integrated data from BIOPEP-UWM and three previously used datasets, totaling 2000 peptide sequences, and introduced novel feature extraction methods to build a comprehensive dataset for model training. This study represents the first application of Deep Forest for AHTP identification, demonstrating substantial classification performance advantages over traditional methods (e.g., SVM, CNN, and XGBoost) as well as recent mainstream prediction models (Ensemble-AHTPpred, CNN-SVM Ensemble, and mAHTPred). Requiring no complex manual feature engineering, the model adapts flexibly to various data needs, offering a novel perspective for efficient AHTP prediction and promising utility in hypertension management. On the benchmark dataset, the model achieved high accuracy, sensitivity, and AUC, providing a robust tool for identifying safe and effective AHTPs. However, future efforts should incorporate larger and more diverse independent validation datasets to further improve the model and enhance its generalizability. Additionally, the model's predictive accuracy relies on known AHTP targets and sequence features, potentially limiting its ability to detect AHTPs with uncharacterized or atypical properties.
Affiliation(s)
- Qiyuan Bai
- The First Clinical Medical College of Lanzhou University, 199 Donggang West Road, Lanzhou 730000, China
- Hao Chen
- The First Clinical Medical College of Lanzhou University, 199 Donggang West Road, Lanzhou 730000, China
- Wenshuo Li
- School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China; Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Lei Li
- Department of Infectious Diseases, State Key Laboratory of Antiviral Drugs, Pingyuan Laboratory, The First Affiliated Hospital of Zhengzhou University, 1 Jianshe East Road, Zhengzhou 450052, China
- Junhao Li
- Department of General Surgery, School of Medicine, The Fourth Affiliated Hospital, Zhejiang University, N1 Mall Avenue, Yiwu 322000, China
- Zhen Gao
- Department of Cardiac Surgery, Beijing Institute of Heart Lung and Blood Vessel Diseases, Capital Medical University Affiliated Beijing Anzhen Hospital, 2 Anzhen Road, Beijing 100029, China
- Yuan Li
- Department of Cardiovascular Surgery, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences, Peking Union Medical College, 167 Lisi Road, Beijing 100006, China
- Xuhua Li
- The First Clinical Medical College of Lanzhou University, 199 Donggang West Road, Lanzhou 730000, China
- Bing Song
- The First Clinical Medical College of Lanzhou University, 199 Donggang West Road, Lanzhou 730000, China; Department of Cardiovascular Surgery, First Hospital of Lanzhou University, 1 Donggang West Road, Lanzhou 730000, China.
26
Wang D, Wang Q, Chen Z, Guo J, Li S. CVAE-DF: A hybrid deep learning framework for fertilization status detection of pre-incubation duck eggs based on VIS/NIR spectroscopy. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2024; 320:124569. [PMID: 38878719 DOI: 10.1016/j.saa.2024.124569] [Received: 03/10/2024] [Revised: 04/30/2024] [Accepted: 05/29/2024] [Indexed: 07/08/2024]
Abstract
Unfertilized duck eggs that are not removed prior to incubation deteriorate quickly, posing a risk of contaminating the normally fertilized duck eggs. Thus, detecting the fertilization status of breeding duck eggs as early as possible is a meaningful and challenging task. Most existing work focuses on the characteristics of chicken eggs during mid-term hatching, and little attention has been paid to the detection of duck eggs prior to incubation. In this paper, we present a novel hybrid deep learning detection framework for the fertilization status of pre-incubation duck eggs, termed CVAE-DF, based on visible/near-infrared (VIS/NIR) transmittance spectroscopy. The framework comprises the encoder of a convolutional variational autoencoder (CVAE) and an improved deep forest (DF) model. More specifically, we first collected transmittance spectral data (400-1000 nm) of 255 duck eggs before hatching. The multiplicative scatter correction (MSC) method was then used to eliminate noise and extraneous information from the raw spectral data. Two efficient data augmentation methods were adopted to provide sufficient data. After that, the CVAE was applied to extract representative features and reduce the feature dimension for the detection task. Finally, an improved DF model was employed to build the classification model on the enhanced feature set. The CVAE-DF model achieved an overall accuracy of 95.94% on the test dataset. The experimental results on four metrics demonstrate that our CVAE-DF method outperforms traditional methods by a significant margin. Furthermore, the results indicate that the CVAE holds great promise as a novel feature extraction method for the VIS/NIR spectral analysis of other agricultural products, which is highly beneficial for practical engineering.
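The MSC preprocessing step named above follows a well-known recipe: regress each spectrum on a reference spectrum (commonly the mean spectrum) and remove the fitted offset and slope. The sketch below is a minimal pure-Python version under that common formulation; the choice of the mean spectrum as reference and the toy spectra are assumptions, since the abstract does not give the authors' exact settings.

```python
from typing import List


def msc(spectra: List[List[float]]) -> List[List[float]]:
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum x is fitted as x = a + b * ref by least squares,
    and the corrected spectrum is (x - a) / b.
    """
    n, m = len(spectra), len(spectra[0])
    ref = [sum(s[j] for s in spectra) / n for j in range(m)]
    ref_mean = sum(ref) / m
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    if sxx == 0:
        raise ValueError("reference spectrum is flat; slope is undefined")
    corrected = []
    for s in spectra:
        s_mean = sum(s) / m
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / sxx
        if slope == 0:
            raise ValueError("spectrum is uncorrelated with the reference")
        offset = s_mean - slope * ref_mean
        corrected.append([(x - offset) / slope for x in s])
    return corrected


# Toy spectra: the second is the first scaled by 2 and shifted by 1,
# mimicking a pure scatter effect with no chemical difference.
raw = [[1.0, 2.0, 3.0], [3.0, 5.0, 7.0]]
clean = msc(raw)
```

Because the two toy spectra differ only by scale and offset, both collapse onto the mean spectrum after correction, which is the behavior MSC is meant to provide.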
Affiliation(s)
- Dongqiao Wang
- College of Engineering, Huazhong Agricultural University, Wuhan 430070, China
- Qiaohua Wang
- College of Engineering, Huazhong Agricultural University, Wuhan 430070, China; Key Laboratory of Agricultural Equipment in Mid-Lower Yangtze River, Ministry of Agriculture and Rural Agriculture, Wuhan 430070, China.
- Zhuoting Chen
- College of Engineering, Huazhong Agricultural University, Wuhan 430070, China
- Juncai Guo
- School of Computer Science, Wuhan University, Wuhan 430072, China
- Shijun Li
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Veterinary Medicine, Huazhong Agricultural University, Wuhan 430070, China
27
Jia L, Jiang L, Yue J, Hao F, Wu Y, Liu X. MLW-BFECF: A Multi-Weighted Dynamic Cascade Forest Based on Bilinear Feature Extraction for Predicting the Stage of Kidney Renal Clear Cell Carcinoma on Multi-Modal Gene Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:2568-2579. [PMID: 39453793 DOI: 10.1109/tcbb.2024.3486742] [Indexed: 10/27/2024]
Abstract
The stage prediction of kidney renal clear cell carcinoma (KIRC) is important for the diagnosis, personalized treatment, and prognosis of patients. Many prediction methods have been proposed, but most of them are based on unimodal gene data, and their accuracy is difficult to improve further. Therefore, we propose a novel multi-weighted dynamic cascade forest based on bilinear feature extraction (MLW-BFECF) for stage prediction of KIRC using multimodal gene data (RNA-seq, CNA, and methylation). The proposed model utilizes a dynamic cascade framework with shuffle layers to prevent early degradation of the model. In each cascade layer, a voting technique based on three gene selection algorithms is first employed to retain the gene features most relevant to KIRC and eliminate redundant information. Then, two new bilinear models based on the gated attention mechanism are proposed to better extract new intra-modal and inter-modal gene features. Finally, following the idea of bagging, a multi-weighted ensemble forest classifier module is proposed to extract and fuse probabilistic features of the three-modal gene data. A series of experiments demonstrates that the MLW-BFECF model achieves the highest prediction performance on the three-modal KIRC dataset, with an accuracy of 88.9%.
28
Bai J, Huang Y, Fan X, Cui J, Chen B, Chen Y, Guo L. Production of high calorific value hydrogen-rich combustible gas by supercritical water gasification of straw assisted by machine learning. BIORESOURCE TECHNOLOGY 2024; 410:131275. [PMID: 39151570 DOI: 10.1016/j.biortech.2024.131275] [Received: 05/27/2024] [Revised: 08/07/2024] [Accepted: 08/13/2024] [Indexed: 08/19/2024]
Abstract
This article reveals the basic laws of straw supercritical water gasification (SCWG) and provides basic experimental data for the effective utilization of straw. The paper studied the impact of three operational conditions on the production of high-calorific-value, hydrogen-rich combustible gas through SCWG of straw within a quartz tube reactor. The findings reveal that elevated reaction temperatures, extended residence times, and reduced feedstock concentrations favor the SCWG of straw. When the combustible gas contains carbon dioxide, the maximum lower heating value (LHV) of the gas is 21 MJ/Nm³. After carbon dioxide is removed, the LHV of the gas reaches 38 MJ/Nm³. Subsequently, a machine learning (ML) model was developed to forecast gas yield and LHV during the SCWG process. The results demonstrate that the model exhibits robust generalization capabilities. ML can be extensively applied to forecast biomass SCWG processes across various operational conditions.
Affiliation(s)
- Jingui Bai
- State Key Laboratory of Multiphase Flow in Power Engineering, Xi'an Jiaotong University, No.28, Xianning West Road, Xi'an 710049, China
- Yong Huang
- State Key Laboratory of Multiphase Flow in Power Engineering, Xi'an Jiaotong University, No.28, Xianning West Road, Xi'an 710049, China
- Xihang Fan
- State Key Laboratory of Multiphase Flow in Power Engineering, Xi'an Jiaotong University, No.28, Xianning West Road, Xi'an 710049, China
- Jinhua Cui
- State Key Laboratory of Multiphase Flow in Power Engineering, Xi'an Jiaotong University, No.28, Xianning West Road, Xi'an 710049, China
- Bin Chen
- State Key Laboratory of Multiphase Flow in Power Engineering, Xi'an Jiaotong University, No.28, Xianning West Road, Xi'an 710049, China
- Yunan Chen
- State Key Laboratory of Multiphase Flow in Power Engineering, Xi'an Jiaotong University, No.28, Xianning West Road, Xi'an 710049, China.
- Liejin Guo
- State Key Laboratory of Multiphase Flow in Power Engineering, Xi'an Jiaotong University, No.28, Xianning West Road, Xi'an 710049, China
29
Song Y, Li X, Zheng Y, Zhang G. Quantitative prediction of water quality in Dongjiang Lake watershed based on LUCC. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2024; 284:117005. [PMID: 39250859 DOI: 10.1016/j.ecoenv.2024.117005] [Received: 03/01/2024] [Revised: 08/19/2024] [Accepted: 09/02/2024] [Indexed: 09/11/2024]
Abstract
Land Use/Cover Change (LUCC) plays a crucial role in influencing hydrological processes, nutrient cycling, and sediment transport in watersheds, ultimately impacting water quality on both spatial and temporal scales. Accurately predicting changes in watershed water quality is beneficial for the sustainable management of water resources. Current models often lack the ability to effectively predict water quality changes in a dynamic spatio-temporal context, particularly in complex watershed environments. The overall purpose of the study is to establish a comprehensive and dynamic modeling framework that links LUCC with water quality, allowing for accurate predictions of future water quality under varying land use scenarios. The model, which uses water quality as the dependent variable and LUCC as the independent variable, was developed to quantitatively predict changes in watershed water quality. To achieve this, annual multi-period remote sensing images from the Landsat-5, Landsat-8, or Sentinel-2 satellites spanning 1992 to 2022 were analyzed. Random Forest (achieving a Kappa coefficient of 0.9468) was employed to classify land use within the watershed. Based on the classification results, a Cellular Automata-Markov chain model (CA-Markov) was constructed to simulate and predict the spatio-temporal patterns of land use, incorporating driving factors such as proximity to water systems, roads, elevation, and slope. Validation of the model using LUCC data from 2020 yielded a high prediction accuracy, with a Kappa coefficient of 0.9505. The CA-Markov model was further utilized to project LUCC between 2023 and 2033 under three scenarios: natural development, ecological protection, and arable land protection. Based on these projections, the coupled water quality and LUCC model was employed to predict water quality changes in the watershed over the same period.
Key findings indicate that water quality is likely to improve under the ecological protection scenario, while deterioration is expected under the natural development and arable land protection scenarios due to urban expansion, agricultural practices, and water diversion for irrigation. This study provides a robust framework for watershed management, offering scientific guidance for source management and water purification efforts, thereby contributing significantly to the sustainable development of water resources.
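The Kappa coefficients quoted above measure agreement beyond chance between predicted and reference land-use classes. A minimal sketch of Cohen's kappa computed from a confusion matrix follows; the two-class matrix is invented for illustration and is unrelated to the study's data.

```python
from typing import List


def cohen_kappa(confusion: List[List[int]]) -> float:
    """Cohen's kappa from a square confusion matrix.

    Rows are reference classes and columns are predicted classes.
    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is the agreement expected by chance.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    p_o = sum(confusion[i][i] for i in range(k)) / total
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(row[j] for row in confusion) for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical two-class map check (e.g. forest vs. cropland pixels).
toy_matrix = [[45, 5],
              [10, 40]]
kappa = cohen_kappa(toy_matrix)  # observed 0.85, chance 0.50, kappa 0.70
```

A kappa of 0.70 here sits well below the raw 85% agreement because half of that agreement would be expected by chance, which is why kappa values near 0.95 indicate very strong map agreement.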
Affiliation(s)
- Yang Song
- College of Environmental Science and Engineering, Hunan University, Changsha 410082, PR China; ASEM Water Resources Research and Development Center, Changsha 410031, PR China
- Xiaoming Li
- College of Environmental Science and Engineering, Hunan University, Changsha 410082, PR China.
- Ying Zheng
- College of Forestry, Central South University of Forestry and Technology, Changsha 410004, PR China
- Gui Zhang
- College of Forestry, Central South University of Forestry and Technology, Changsha 410004, PR China
30
Liu W, Zhou B, Li G, Luo X. Enhanced diagnostics for generalized anxiety disorder: leveraging differential channel and functional connectivity features based on frontal EEG signals. Sci Rep 2024; 14:22789. [PMID: 39354007 PMCID: PMC11445517 DOI: 10.1038/s41598-024-73615-1] [Received: 06/30/2024] [Accepted: 09/19/2024] [Indexed: 10/03/2024] Open
Abstract
Generalized Anxiety Disorder (GAD) is a chronic anxiety condition characterized by persistent excessive worry, anxiety, and fear. Current diagnostic practices primarily rely on clinicians' subjective assessments and experience, highlighting a need for more objective and reliable methods. This study collected 10-minute resting-state electroencephalogram (EEG) recordings from 45 GAD patients and 36 healthy controls (HC), focusing on six frontal EEG channels for preprocessing, data segmentation, and frequency band division. The study introduced the "Differential Channel" method, which improves classification performance by amplifying anxiety-related information in the data, thereby highlighting signal differences. From the preprocessed EEG signals, undirected functional connectivity features (Phase Lag Index, Pearson Correlation Coefficient, and Mutual Information) and directed functional connectivity features (Partial Directed Coherence) were extracted. Multiple machine learning models were applied to distinguish between GAD patients and HC. The results show that the Deep Forest classifier achieves excellent performance with a 12-second time window of DiffFeature. In particular, combining OriFeature and DiffFeature on Mutual Information separated GAD from HC with a maximum accuracy of 98.08%. Furthermore, undirected functional connectivity features significantly outperformed directed functional connectivity features when fewer frontal channels were used. Overall, the methodologies developed in this study offer accurate and practical identification strategies for the early screening and clinical diagnosis of GAD, and provide theoretical and technical support for further enhancing the portability of EEG devices.
Affiliation(s)
- Wei Liu
- College of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004, China
- Bin Zhou
- College of Mathematical Medicine, Zhejiang Normal University, Jinhua, 321004, China
- Gang Li
- College of Mathematical Medicine, Zhejiang Normal University, Jinhua, 321004, China.
- Xiaodong Luo
- The Second Hospital of Jinhua, Jinhua, 321016, China.
31
Xu J, Hao J, Liao X, Shang X, Li X. SSCI: Self-Supervised Deep Learning Improves Network Structure for Cancer Driver Gene Identification. Int J Mol Sci 2024; 25:10351. [PMID: 39408682 PMCID: PMC11476395 DOI: 10.3390/ijms251910351] [Received: 08/27/2024] [Revised: 09/21/2024] [Accepted: 09/23/2024] [Indexed: 10/20/2024] Open
Abstract
The pathogenesis of cancer is complex, involving abnormalities in some genes in organisms. Accurately identifying cancer genes is crucial for the early detection of cancer and personalized treatment, among other applications. Recent studies have used graph deep learning methods to identify cancer driver genes based on biological networks. However, the incompleteness and noise of these networks weaken model performance. To address this, we propose SSCI, a cancer driver gene identification method based on self-supervision for graph convolutional networks, which efficiently enhances the structure of the network and further improves predictive accuracy. The reliability of SSCI is verified by the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), and the F1 score, with respective values of 0.966, 0.964, and 0.913. The results show that our method can identify cancer driver genes with strong discriminative power and biological interpretability.
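Two of the metrics named above, AUROC and the F1 score, can be sketched in a few lines. The AUROC here uses the pairwise-ranking definition (the fraction of positive/negative pairs in which the positive is scored higher, with ties counted as half); the labels, scores, and 0.5 threshold are invented for illustration and are not from the study.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive outranks a negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(labels: Sequence[int], preds: Sequence[int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels."""
    tp = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 1)
    fp = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 1)
    fn = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Hypothetical driver-gene labels (1 = driver) and model scores.
y_true = [1, 1, 0, 0]
y_score = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed 0.5 threshold

area = auroc(y_true, y_score)  # 3 of 4 positive/negative pairs ranked correctly
f1 = f1_score(y_true, y_pred)
```

AUROC is threshold-free, while F1 depends on the chosen cutoff, so the two can move independently when the decision threshold changes.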
Affiliation(s)
- Jialuo Xu
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China; (J.X.); (J.H.); (X.L.); (X.S.)
- Jun Hao
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China; (J.X.); (J.H.); (X.L.); (X.S.)
- Xingyu Liao
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China; (J.X.); (J.H.); (X.L.); (X.S.)
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China; (J.X.); (J.H.); (X.L.); (X.S.)
- Xingyi Li
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China; (J.X.); (J.H.); (X.L.); (X.S.)
- Research & Development Institute of Northwestern Polytechnical University in Shenzhen, Shenzhen 518063, China
32
Shi K, Liu Q, Ji Q, He Q, Zhao XM. MicroHDF: predicting host phenotypes with metagenomic data using a deep forest-based framework. Brief Bioinform 2024; 25:bbae530. [PMID: 39446191 PMCID: PMC11500453 DOI: 10.1093/bib/bbae530] [Received: 03/13/2024] [Revised: 09/25/2024] [Accepted: 10/07/2024] [Indexed: 10/25/2024] Open
Abstract
The gut microbiota plays a vital role in human health, and significant effort has been made to predict human phenotypes, especially diseases, with machine learning (ML) methods that use the microbiota as an indicator or predictor. However, the accuracy of predicting host phenotypes from metagenomic data is affected by many factors, e.g., small sample size, class imbalance, and high-dimensional features. To address these challenges, we propose MicroHDF, an interpretable deep learning framework for predicting host phenotypes, in which cascade layers of deep forest units are designed to handle sample class imbalance and high-dimensional features. The experimental results show that the performance of MicroHDF is competitive with that of existing state-of-the-art methods on 13 publicly available datasets covering six different diseases. In particular, it performs best for inflammatory bowel disease (IBD) and liver cirrhosis, with areas under the receiver operating characteristic curve of 0.9182 ± 0.0098 and 0.9469 ± 0.0076, respectively. MicroHDF also shows better performance and robustness in cross-study validation. Furthermore, MicroHDF is applied to two high-risk diseases, IBD and autism spectrum disorder, as case studies to identify potential biomarkers. In conclusion, our method provides an effective and reliable prediction of the host phenotype and discovers informative features with biological insights.
Affiliation(s)
- Kai Shi
- College of Computer Science and Engineering, Guilin University of Technology, Guilin, Guangxi 541004, China
- Guangxi Key Laboratory of Embedded Technology and Intelligent Systems, Guilin University of Technology, Guilin, Guangxi 541004, China
- Qiaohui Liu
- College of Computer Science and Engineering, Guilin University of Technology, Guilin, Guangxi 541004, China
- Qingrong Ji
- College of Computer Science and Engineering, Guilin University of Technology, Guilin, Guangxi 541004, China
- Qisheng He
- College of Computer Science and Engineering, Guilin University of Technology, Guilin, Guangxi 541004, China
- Xing-Ming Zhao
- Huzhou Central Hospital, Affiliated Central Hospital Huzhou University, Huzhou, Zhejiang 313000, China
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
33
Wei P, Hao S, Shi Y, Anand A, Wang Y, Chu M, Ning Z. Combining Google traffic map with deep learning model to predict street-level traffic-related air pollutants in a complex urban environment. ENVIRONMENT INTERNATIONAL 2024; 191:108992. [PMID: 39250881 DOI: 10.1016/j.envint.2024.108992] [Received: 03/28/2024] [Revised: 08/26/2024] [Accepted: 08/29/2024] [Indexed: 09/11/2024]
Abstract
BACKGROUND Traffic-related air pollution (TRAP) is a major contributor to urban pollution and varies sharply at the street level, posing a challenge for air quality modeling. Traditional land use regression models combined with data from fixed monitoring stations may be unable to predict and characterize fine-scale TRAP, especially in complex urban environments influenced by various features. This study aims to estimate fine-scale (50 m) concentrations of nitrogen oxides (NO and NO₂) in Hong Kong using a deep learning (DL) structured model. METHODS We collected data from mobile air quality sensors on buses and crowd-sourced Google real-time traffic status as a proxy for real-time traffic emissions. Our DL model was compared with existing machine learning models to assess performance improvements. Using an interpretable machine learning method, we hierarchically evaluated the global, local, and interaction effects for different features. RESULTS Our DL model outperformed existing machine learning models, achieving R² values of 0.72 for NO and 0.69 for NO₂. The incorporation of traffic status as a key predictor improved model performance by 9% to 17%. The interpretable machine learning method revealed the importance of traffic-related features and their pairwise interactions. CONCLUSION The results indicate that traffic-related features significantly contribute to TRAP and provide insights and guidance for urban planning. By incorporating crowd-sourced Google traffic information, we assessed traffic abatement scenarios that could inform targeted strategies for improving urban air quality.
Affiliation(s)
- Peng Wei
- College of Geography and Environment, Shandong Normal University, Jinan, China; Division of Environment and Sustainability, The Hong Kong University of Science and Technology, Hong Kong, China
| | - Song Hao
- State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China.
| | - Yuan Shi
- Department of Geography & Planning, University of Liverpool, Liverpool, UK.
| | - Abhishek Anand
- Department of Mechanical Engineering, Carnegie Mellon University, United States
| | - Ya Wang
- Division of Environment and Sustainability, The Hong Kong University of Science and Technology, Hong Kong, China
| | - Mengyuan Chu
- Division of Environment and Sustainability, The Hong Kong University of Science and Technology, Hong Kong, China
| | - Zhi Ning
- Division of Environment and Sustainability, The Hong Kong University of Science and Technology, Hong Kong, China.
| |
|
34
|
Hu H, Qian C, Xue K, Jörgensen RG, Keiluweit M, Liang C, Zhu X, Chen J, Sun Y, Ni H, Ding J, Huang W, Mao J, Tan RX, Zhou J, Crowther TW, Zhou ZH, Zhang J, Liang Y. Reducing the uncertainty in estimating soil microbial-derived carbon storage. Proc Natl Acad Sci U S A 2024; 121:e2401916121. [PMID: 39172788 PMCID: PMC11363314 DOI: 10.1073/pnas.2401916121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 07/22/2024] [Indexed: 08/24/2024] Open
Abstract
Soil organic carbon (SOC) is the largest carbon pool in terrestrial ecosystems and plays a crucial role in mitigating climate change and enhancing soil productivity. Microbial-derived carbon (MDC) is the main component of the persistent SOC pool. However, current formulas used to estimate the proportional contribution of MDC are plagued by uncertainties due to limited sample sizes and the neglect of bacterial group composition effects. Here, we compiled a comprehensive global dataset and employed machine learning approaches to refine our quantitative understanding of MDC contributions to total carbon storage. Our efforts reduced the relative standard errors of prevailing estimations by an average of 71% and minimized the effect of global variations in bacterial group compositions on estimating MDC. Our estimation indicates that MDC contributes approximately 758 Pg, representing approximately 40% of the global soil carbon stock. Our study updates the formulas for MDC estimation, improving accuracy while preserving simplicity and practicality. Given the unique biochemistry and functioning of the MDC pool, our study has direct implications for modeling efforts and for predicting the land-atmosphere carbon balance under current and future climate scenarios.
Affiliation(s)
- Han Hu
- State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, China
- University of the Chinese Academy of Sciences, Beijing 100049, China
| | - Chao Qian
- National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
- School of Artificial Intelligence, Nanjing University, Nanjing 210023, China
| | - Ke Xue
- National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
- School of Artificial Intelligence, Nanjing University, Nanjing 210023, China
| | - Rainer Georg Jörgensen
- Department of Soil Biology and Plant Nutrition, University of Kassel, Kassel 34117, Germany
| | - Marco Keiluweit
- Institute of Earth Surface Dynamics, University of Lausanne, Lausanne CH-1015, Switzerland
| | - Chao Liang
- Institute of Applied Ecology, Chinese Academy of Sciences, Shenyang 110016, China
- Key Lab of Conservation Tillage and Ecological Agriculture, Liaoning Province, Shenyang 110016, China
| | - Xuefeng Zhu
- Institute of Applied Ecology, Chinese Academy of Sciences, Shenyang 110016, China
- Key Lab of Conservation Tillage and Ecological Agriculture, Liaoning Province, Shenyang 110016, China
| | - Ji Chen
- Department of Agroecology, Aarhus University, Tjele 8830, Denmark
- Aarhus University Centre for Circular Bioeconomy, Aarhus University, Tjele 8830, Denmark
- Interdisciplinary Centre for Climate Change, Aarhus University, Roskilde 4000, Denmark
| | - Yishen Sun
- State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, China
- University of the Chinese Academy of Sciences, Beijing 100049, China
| | - Haowei Ni
- State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, China
- University of the Chinese Academy of Sciences, Beijing 100049, China
| | - Jixian Ding
- State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, China
| | - Weigen Huang
- State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, China
- University of the Chinese Academy of Sciences, Beijing 100049, China
| | - Jingdong Mao
- Department of Chemistry and Biochemistry, Old Dominion University, Norfolk, VA 23529
| | - Rong-Xi Tan
- National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
- School of Artificial Intelligence, Nanjing University, Nanjing 210023, China
| | - Jizhong Zhou
- School of Biological Sciences, University of Oklahoma, Norman, OK 73069
| | - Thomas W. Crowther
- Department of Environmental Systems Science, Institute of Integrative Biology, ETH Zurich 8092, Switzerland
| | - Zhi-Hua Zhou
- National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
- School of Artificial Intelligence, Nanjing University, Nanjing 210023, China
| | - Jiabao Zhang
- State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, China
| | - Yuting Liang
- State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, China
- University of the Chinese Academy of Sciences, Beijing 100049, China
| |
|
35
|
Arif M, Musleh S, Fida H, Alam T. PLMACPred prediction of anticancer peptides based on protein language model and wavelet denoising transformation. Sci Rep 2024; 14:16992. [PMID: 39043738 PMCID: PMC11266708 DOI: 10.1038/s41598-024-67433-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2024] [Accepted: 07/11/2024] [Indexed: 07/25/2024] Open
Abstract
Anticancer peptides (ACPs) play a promising role in the discovery of anti-cancer drugs. Research on ACPs as therapeutic agents is growing because of their minimal side effects. However, identifying novel ACPs using wet-lab experiments is generally time-consuming, labor-intensive, and expensive. Leveraging computational methods for fast and accurate prediction of ACPs would expedite the drug discovery process. Herein, a machine learning-based predictor, called PLMACPred, is developed for identifying ACPs from peptide sequence only. PLMACPred adopts a set of encoding schemes representing evolutionary properties, compositional properties, and protein language model (PLM) embeddings, i.e., evolutionary scale modeling (ESM-2)- and ProtT5-based embeddings, to encode peptides. Then, two-dimensional (2D) wavelet denoising (WD) is employed to remove noise from the extracted features. Finally, an ensemble-based cascade deep forest (CDF) model is developed to identify ACPs. The PLMACPred model attained superior performance on all three benchmark datasets, namely, ACPmain, ACPAlter, and ACP740, over tenfold cross-validation and on an independent dataset. PLMACPred outperformed the existing models and improved the prediction accuracy by 18.53%, 2.4%, and 7.59% on the ACPmain, ACPalter, and ACP740 datasets, respectively. We showed that embeddings from ProtT5 and ESM-2 captured contextual information from the entire sequence better than the other encoding schemes for ACP prediction. For the explainability of the proposed model, the SHAP (SHapley Additive exPlanations) method was used to analyze the effect of features on ACP prediction. A list of novel sequence motifs was proposed from the ACP sequences using the MEME Suite. We believe PLMACPred will help accelerate the discovery of novel ACPs as well as other activities of microbial peptides.
Affiliation(s)
- Muhammad Arif
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
| | - Saleh Musleh
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
| | - Huma Fida
- Department of Microbiology, Abdul Wali Khan University, Mardan, KPK, Pakistan
| | - Tanvir Alam
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar.
| |
|
36
|
Shen J, Guo X, Bai H, Luo J. CAEM-GBDT: a cancer subtype identifying method using multi-omics data and convolutional autoencoder network. FRONTIERS IN BIOINFORMATICS 2024; 4:1403826. [PMID: 39077754 PMCID: PMC11284046 DOI: 10.3389/fbinf.2024.1403826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Accepted: 06/13/2024] [Indexed: 07/31/2024] Open
Abstract
The identification of cancer subtypes plays a very important role in the field of medicine. Accurate identification of cancer subtypes is helpful for both cancer treatment and prognosis. Currently, most methods for cancer subtype identification are based on single-omics data, such as gene expression data. However, multi-omics data can show various characteristics of cancer, which can also improve the accuracy of cancer subtype identification. Therefore, how to extract features from multi-omics data for cancer subtype identification is the main challenge currently faced by researchers. In this paper, we propose a cancer subtype identification method named CAEM-GBDT, which takes gene expression data, miRNA expression data, and DNA methylation data as input, and adopts a convolutional autoencoder network to identify cancer subtypes. Through a convolutional encoder layer, the method performs feature extraction on the input data. Within the convolutional encoder layer, a convolutional self-attention module is embedded to recognize higher-level representations of the multi-omics data. The extracted high-level representations from the convolutional encoder are then concatenated with the input to the decoder. The GBDT (Gradient Boosting Decision Tree) is utilized for cancer subtype identification. In the experiments, we compare CAEM-GBDT with existing cancer subtype identifying methods. Experimental results demonstrate that the proposed CAEM-GBDT outperforms other methods. The source code is available from GitHub at https://github.com/gxh-1/CAEM-GBDT.git.
Affiliation(s)
| | | | | | - Junwei Luo
- School of Software, Henan Polytechnic University, Jiaozuo, China
| |
|
37
|
Shi WJ, Long XB, Xin L, Chen CE, Ying GG. Predicting the new psychoactive substance activity of antitussives and evaluating their ecotoxicity to fish. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 932:172872. [PMID: 38692322 DOI: 10.1016/j.scitotenv.2024.172872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 04/25/2024] [Accepted: 04/27/2024] [Indexed: 05/03/2024]
Abstract
The misuse of antitussive preparations is a continuing problem worldwide, which implies that they may have new psychoactive substance (NPS) activity. However, few studies have focused on their ecological toxicity to fish. In the present study, the machine learning (ML) methods gcForest and random forest (RF) were employed to predict NPS activity in 30 antitussives. The potential toxic target, mode of action (MOA), acute toxicity and chronic toxicity to fish were further investigated. The results showed that both gcForest and RF achieved optimal performance when utilizing combined features of molecular fingerprint (MF) and molecular descriptor (MD), with area under the curve (AUC) = 0.99, accuracy > 0.94 and F1 score > 0.94, and were applied to screen the NPS activity in antitussives. A total of 15 antitussives exhibited potential NPS activity, including frequently used substances such as codeine and dextromethorphan. The binding affinity of these antitussives with zebrafish dopamine transporter (zDAT) was high, even surpassing that of some traditional narcotics and NPS. Some antitussives formed hydrogen bonds or salt bridges with aspartate (Asp) 95 and tyrosine (Tyr) 171 of zDAT. Regarding ecotoxicity, the MOA of these 15 antitussives in fish was predicted to be narcosis. Prenoxdiazine, pholcodine, codeine, dextromethorphan and dextrorphan were classified as very toxic or toxic to fish. Close attention should therefore be paid to the ecotoxicity of these antitussives. In this study, the integration of ML, molecular docking and ECOSAR approaches provides a powerful tool for understanding the toxicity profiles and ecological hazards posed by new pollutants.
Affiliation(s)
- Wen-Jun Shi
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China.
| | - Xiao-Bing Long
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China
| | - Lei Xin
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China
| | - Chang-Er Chen
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China
| | - Guang-Guo Ying
- SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China
| |
|
38
|
Yin X, Müller F, Laguna AF, Li C, Huang Q, Shi Z, Lederer M, Laleni N, Deng S, Zhao Z, Imani M, Shi Y, Niemier M, Hu XS, Zhuo C, Kämpfe T, Ni K. Deep random forest with ferroelectric analog content addressable memory. SCIENCE ADVANCES 2024; 10:eadk8471. [PMID: 38838137 DOI: 10.1126/sciadv.adk8471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Accepted: 05/01/2024] [Indexed: 06/07/2024]
Abstract
Deep random forest (DRF), which combines deep learning and random forest, offers accuracy comparable to that of deep neural networks (DNNs) in edge intelligence tasks, together with interpretability and low memory and computational overhead. However, efficient DRF accelerators lag behind their DNN counterparts. The key to DRF acceleration lies in realizing the branch-split operation at decision nodes. In this work, we propose implementing DRF through associative searches realized with ferroelectric analog content addressable memory (ACAM). Utilizing only two ferroelectric field effect transistors (FeFETs), the ultra-compact ACAM cell performs energy-efficient branch-split operations by storing decision boundaries as analog polarization states in FeFETs. The DRF accelerator architecture and its model mapping to ACAM arrays are presented. The functionality, characteristics, and scalability of the FeFET ACAM DRF and its robustness against FeFET device non-idealities are validated in experiments and simulations. Evaluations show that the FeFET ACAM DRF accelerator achieves ∼10⁶×/10× and ∼10⁶×/2.5× improvements in energy and latency, respectively, compared to other DRF hardware implementations on state-of-the-art CPU/ReRAM.
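To make the idea of replacing branch-split operations with an associative search more concrete, here is a small software sketch. It assumes one common mapping for tree models on analog CAM, in which each root-to-leaf path becomes one row of per-feature intervals; the abstract does not spell out the mapping used in the paper, and the tree, thresholds, and function name below are hypothetical.

```python
import math

INF = math.inf

# Each row stores one root-to-leaf path of a small hypothetical tree as
# half-open intervals [low, high), one per feature. (-INF, INF) acts as
# a "don't care" cell for a feature the path never tests.
ROWS = [
    ([(-INF, 0.5), (-INF, INF)], "A"),  # x0 < 0.5
    ([(0.5, INF), (-INF, 2.0)], "B"),   # x0 >= 0.5 and x1 < 2.0
    ([(0.5, INF), (2.0, INF)], "C"),    # x0 >= 0.5 and x1 >= 2.0
]


def acam_search(rows, query):
    """Return the labels of all rows whose every interval contains the query.

    In hardware all rows are compared in parallel; here we simply loop.
    """
    return [
        label
        for ranges, label in rows
        if all(low <= value < high for (low, high), value in zip(ranges, query))
    ]


match_a = acam_search(ROWS, [0.2, 9.0])
match_b = acam_search(ROWS, [0.7, 1.0])
match_c = acam_search(ROWS, [0.7, 2.0])
```

Because the paths of a decision tree partition the input space, exactly one row matches any query, so a single parallel search stands in for a sequence of node-by-node comparisons.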
Affiliation(s)
- Xunzhao Yin
- Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of CS&AUS of Zhejiang Province, Hangzhou, China
| | | | | | - Chao Li
- Zhejiang University, Hangzhou, Zhejiang, China
| | | | - Zhiguo Shi
- Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of CS&AUS of Zhejiang Province, Hangzhou, China
| | | | | | - Shan Deng
- University of Notre Dame, Notre Dame, IN 46614, USA
| | - Zijian Zhao
- University of Notre Dame, Notre Dame, IN 46614, USA
| | | | - Yiyu Shi
- University of Notre Dame, Notre Dame, IN 46614, USA
| | | | | | - Cheng Zhuo
- Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of CS&AUS of Zhejiang Province, Hangzhou, China
| | | | - Kai Ni
- University of Notre Dame, Notre Dame, IN 46614, USA
| |
|
39
|
Fan J, Hu X. Towards Efficient Neural Decoder for Dexterous Finger Force Predictions. IEEE Trans Biomed Eng 2024; 71:1831-1840. [PMID: 38215325 DOI: 10.1109/tbme.2024.3353145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/14/2024]
Abstract
OBJECTIVE Dexterous control of robot hands requires a robust neural-machine interface capable of accurately decoding multiple finger movements. Existing studies primarily focus on single-finger movement or rely heavily on multi-finger data for decoder training, which requires large datasets and high computation demand. In this study, we investigated the feasibility of using limited single-finger surface electromyogram (sEMG) data to train a neural decoder capable of predicting the forces of unseen multi-finger combinations. METHODS We developed a deep forest-based neural decoder to concurrently predict the extension and flexion forces of three fingers (index, middle, and ring-pinky). We trained the model using varying amounts of high-density EMG data in a limited condition (i.e., single-finger data). RESULTS We showed that the deep forest decoder could achieve consistently commendable performance, with a force prediction error of 7.0% and an R² value of 0.874, significantly surpassing the conventional EMG amplitude method and the convolutional neural network approach. However, the accuracy of the deep forest decoder degraded when a smaller amount of data was used for training and when the testing data became noisy. CONCLUSION The deep forest decoder shows accurate performance in multi-finger force prediction tasks. The efficiency of the deep forest lies in its short training time and small volume of training data, which are two critical factors in current neural decoding applications. SIGNIFICANCE This study offers insights into efficient and accurate neural decoder training for advanced robotic hand control, which has the potential for real-life applications during human-machine interactions.
|
40
|
Xu W, Rong Z, Ma W, Zhu B, Li N, Huang J, Liu Z, Yu Y, Zhang F, Zhang X, Ge M, Hou Y. Improving the classification of multiple sclerosis and cerebral small vessel disease with interpretable transfer attention neural network. Comput Biol Med 2024; 176:108530. [PMID: 38749324 DOI: 10.1016/j.compbiomed.2024.108530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2024] [Revised: 04/14/2024] [Accepted: 04/28/2024] [Indexed: 05/31/2024]
Abstract
As an autoimmune-mediated inflammatory demyelinating disease of the central nervous system, multiple sclerosis (MS) is often confused with cerebral small vessel disease (cSVD), a regional pathological change in brain tissue with unknown pathogenesis, because the two have similar clinical presentations and imaging manifestations. Misdiagnosis can significantly increase the occurrence of adverse events, and delayed or incorrect treatment is one of the most important causes of MS progression. Therefore, the development of a practical diagnostic imaging aid could significantly reduce the risk of misdiagnosis and improve patient prognosis. We propose an interpretable deep learning (DL) model that differentiates MS and cSVD using T2-weighted fluid-attenuated inversion recovery (FLAIR) images. Transfer learning (TL) from the ImageNet dataset was used for feature extraction. This model is the first of its kind in neuroimaging and shows great potential for enhancing differential diagnosis of neurological disorders. Our model extracts the texture features of the images and achieves more robust feature learning through two attention modules. The attention maps produced by the attention modules offer model interpretation to validate model learning and reveal more information to physicians. Finally, the proposed model is trained end-to-end using focal loss to reduce the influence of class imbalance. The model was validated using clinically diagnosed MS (n=112) and cSVD (n=321) patients from the Beijing Tiantan Hospital. The performance of the proposed model was better than that of two commonly used DL approaches, with a mean balanced accuracy of 86.06% and a mean area under the receiver operating characteristic curve of 98.78%. Moreover, the generated attention heat maps showed that the proposed model could focus on the lesion signatures in the image.
The proposed model provides a practical diagnostic aid that uses routinely available imaging techniques such as magnetic resonance imaging to classify MS and cSVD, linking DL to human brain disease. We anticipate a substantial improvement in accurately distinguishing between various neurological conditions through this novel model.
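The abstract mentions focal loss as the remedy for class imbalance. The sketch below shows the standard binary form of that loss for a single sample; the default values of `gamma` and `alpha` are common choices from the focal loss literature and are assumptions here, since the abstract does not report the settings used.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample.

    p is the predicted probability of the positive class and y is the
    true label (0 or 1). gamma and alpha are assumed defaults, not
    values taken from the paper.
    """
    p_t = p if y == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class weighting factor
    p_t = min(max(p_t, 1e-12), 1.0)             # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)  # confidently correct prediction
hard = focal_loss(0.6, 1)  # less confident prediction
```

The factor `(1 - p_t) ** gamma` shrinks the loss of well-classified samples, so training is driven more by hard examples, which in an imbalanced dataset tend to come from the minority class. With `gamma=0` the expression reduces to class-weighted cross-entropy.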
Affiliation(s)
- Wangshu Xu
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China; China National Clinical Research Center for Neurological Diseases, Beijing, 100070, China
| | - Zhiwei Rong
- Department of Biostatistics, School of Public Health, Peking University, Beijing, 100191, China
| | - Wenping Ma
- Department of Neurosurgery, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, 100045, China
| | - Bin Zhu
- Department of Pharmacy, Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China
| | - Na Li
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China
| | - Jiansong Huang
- Peking University Health Science Center, Beijing, 100191, China
| | - Zhilin Liu
- Department of Biostatistics, School of Public Health, Peking University, Beijing, 100191, China
| | - Yipei Yu
- Department of Biostatistics, School of Public Health, Peking University, Beijing, 100191, China
| | - Fa Zhang
- The School of Medical Technology, Beijing Institute of Technology, Beijing, 100081, China.
| | - Xinghu Zhang
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China; China National Clinical Research Center for Neurological Diseases, Beijing, 100070, China.
| | - Ming Ge
- Department of Neurosurgery, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, 100045, China.
| | - Yan Hou
- Department of Biostatistics, School of Public Health, Peking University, Beijing, 100191, China; Peking University Clinical Research Center, Beijing, 100191, China.
| |
|
41
|
Yao L, Guan J, Xie P, Chung C, Deng J, Huang Y, Chiang Y, Lee T. AMPActiPred: A three-stage framework for predicting antibacterial peptides and activity levels with deep forest. Protein Sci 2024; 33:e5006. [PMID: 38723168 PMCID: PMC11081525 DOI: 10.1002/pro.5006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Revised: 04/10/2024] [Accepted: 04/13/2024] [Indexed: 05/13/2024]
Abstract
The emergence and spread of antibiotic-resistant bacteria pose a significant public health threat, necessitating the exploration of alternative antibacterial strategies. Antibacterial peptides (ABPs) are a class of antimicrobial peptides (AMPs) with the potential to fight bacterial infection, offering a promising avenue for developing novel therapeutic interventions. This study introduces AMPActiPred, a three-stage computational framework designed to identify ABPs, characterize their activity against diverse bacterial species, and predict their activity levels. AMPActiPred employs multiple peptide descriptors to capture the compositional features and physicochemical properties of peptides. AMPActiPred uses the deep forest architecture, a cascading architecture similar to deep neural networks, capable of effectively processing and exploring original features to enhance predictive performance. In the first stage, AMPActiPred focuses on ABP identification, achieving an accuracy of 87.6% and an MCC of 0.742 on an elaborate dataset, demonstrating state-of-the-art performance. In the second stage, AMPActiPred achieved an average GMean of 82.8% in identifying ABPs targeting 10 bacterial species, indicating that AMPActiPred can achieve balanced predictions regarding the functional activity of ABPs across this set of species. In the third stage, AMPActiPred demonstrates robust predictive capabilities for ABP activity levels, with an average PCC of 0.722. Furthermore, AMPActiPred exhibits excellent interpretability, elucidating crucial features associated with antibacterial activity. AMPActiPred is the first computational framework capable of predicting targets and activity levels of ABPs. Finally, to facilitate the utilization of AMPActiPred, we have established a user-friendly web interface deployed at https://awi.cuhk.edu.cn/~AMPActiPred/.
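For readers less familiar with two of the metrics quoted in this abstract, the following sketch computes the Matthews correlation coefficient (MCC) and the geometric mean of sensitivity and specificity (G-mean) from the four cells of a binary confusion matrix. The function names and the example counts are hypothetical and are not taken from the paper.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical counts: 8 true positives, 2 false positives,
# 8 true negatives, 2 false negatives.
example_mcc = mcc(8, 2, 8, 2)
example_gmean = g_mean(8, 2, 8, 2)
```

Both metrics account for performance on the negative class as well as the positive class, which is why they are often reported alongside accuracy when class sizes differ.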
Affiliation(s)
- Lantian Yao
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, China
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China
| | - Jiahui Guan
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, China
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, China
| | - Peilin Xie
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, China
| | - Chia‐Ru Chung
- Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
| | - Junyang Deng
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, China
| | - Yixian Huang
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, China
| | - Ying‐Chih Chiang
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, China
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, China
| | - Tzong‐Yi Lee
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Center for Intelligent Drug Systems and Smart Bio‐devices (IDS2B), National Yang Ming Chiao Tung University, Hsinchu, Taiwan
| |
|
42
|
Zhou Y, Chen P, Fan Y, Wu Y. A Multimodal Feature Fusion Brain Fatigue Recognition System Based on Bayes-gcForest. SENSORS (BASEL, SWITZERLAND) 2024; 24:2910. [PMID: 38733015 PMCID: PMC11086115 DOI: 10.3390/s24092910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2024] [Revised: 04/28/2024] [Accepted: 04/30/2024] [Indexed: 05/13/2024]
Abstract
Modern society increasingly recognizes brain fatigue as a critical factor affecting human health and productivity. This study introduces a novel, portable, cost-effective, and user-friendly system for real-time collection, monitoring, and analysis of physiological signals aimed at enhancing the precision and efficiency of brain fatigue recognition and broadening its application scope. Utilizing raw physiological data, this study constructed a compact dataset that incorporated EEG and ECG data from 20 subjects to index fatigue characteristics. By employing a Bayesian-optimized multi-granularity cascade forest (Bayes-gcForest) for fatigue state recognition, this study achieved recognition rates of 95.71% and 96.13% on the DROZY public dataset and constructed dataset, respectively. These results highlight the effectiveness of the multi-modal feature fusion model in brain fatigue recognition, providing a viable solution for cost-effective and efficient fatigue monitoring. Furthermore, this approach offers theoretical support for designing rest systems for researchers.
Affiliation(s)
- You Zhou
- College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, China (Y.F.)
| | - Pukun Chen
- Shanghai Shentian Industrial Co., Ltd., Shanghai 200090, China
- Shanghai Radio Equipment Research Institute, Shanghai 201109, China
| | - Yifan Fan
- College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, China (Y.F.)
| | - Yin Wu
- College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, China (Y.F.)
| |
|
43
|
Arif M, Fang G, Ghulam A, Musleh S, Alam T. DPI_CDF: druggable protein identifier using cascade deep forest. BMC Bioinformatics 2024; 25:145. [PMID: 38580921 PMCID: PMC11334562 DOI: 10.1186/s12859-024-05744-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Accepted: 03/13/2024] [Indexed: 04/07/2024] Open
Abstract
BACKGROUND Drug targets in living beings play pivotal roles in the discovery of potential drugs. Conventional wet-lab characterization of drug targets, although accurate, is generally expensive, slow, and resource intensive. Computational methods are therefore highly desirable as an alternative to expedite the large-scale identification of druggable proteins (DPs); however, the performance of existing in silico predictors is still not satisfactory. METHODS In this study, we developed a novel deep learning-based model, DPI_CDF, for predicting DPs based on protein sequence only. DPI_CDF utilizes evolutionary-based (i.e., histograms of oriented gradients for position-specific scoring matrix), physicochemical-based (i.e., component protein sequence representation), and compositional-based (i.e., normalized qualitative characteristic) properties of protein sequence to generate features. A hierarchical deep forest model then fuses these three encoding schemes to build the proposed model DPI_CDF. RESULTS The empirical outcomes of 10-fold cross-validation demonstrate that the proposed model achieved 99.13% accuracy and a Matthews correlation coefficient (MCC) of 0.982 on the training dataset. The generalization power of the trained model was further examined on an independent dataset, where it achieved a maximum accuracy of 95.01% and an MCC of 0.900. Compared with current state-of-the-art methods, DPI_CDF improves accuracy by 4.27% and 4.31% on the training and testing datasets, respectively. We believe DPI_CDF will help the research community identify druggable proteins and accelerate the drug discovery process. AVAILABILITY The benchmark datasets and source code are available on GitHub: http://github.com/Muhammad-Arif-NUST/DPI_CDF.
Affiliation(s)
- Muhammad Arif
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
| | - Ge Fang
- State Key Laboratory for Organic Electronics and Information Displays, Institute of Advanced Materials (IAM), Nanjing 210023, P. R. China
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Ali Ghulam
- Information Technology Centre, Sindh Agriculture University, Sindh, Pakistan
| | - Saleh Musleh
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
| | - Tanvir Alam
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar.
| |
|
44
|
Zahiri Z, Mehrshad N, Mehrshad M. DF-Phos: Prediction of Protein Phosphorylation Sites by Deep Forest. J Biochem 2024; 175:447-456. [PMID: 38153271 DOI: 10.1093/jb/mvad116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 12/10/2023] [Accepted: 12/12/2023] [Indexed: 12/29/2023] Open
Abstract
Phosphorylation is the most important and most studied post-translational modification (PTM), and it plays a crucial role in protein function studies and experimental design. Many significant studies have predicted phosphorylation sites using various machine-learning methods. Recently, several studies have claimed that deep learning-based methods are the best way to predict phosphorylation sites, because deep learning, as an advanced machine learning method, can automatically detect complex representations of phosphorylation patterns from raw sequences and thus offers a powerful tool for improving phosphorylation site prediction. In this study, we report DF-Phos, a new phosphosite predictor based on the Deep Forest. In DF-Phos, the feature vector obtained with the CkSAApair method is used as input to a Deep Forest framework for predicting phosphorylation sites. The results of 10-fold cross-validation show that the Deep Forest method has the highest performance among the available methods. We implemented DF-Phos as a Python program, which is freely available for non-commercial use at https://github.com/zahiriz/DF-Phos. Moreover, users can apply it to various other PTM predictions.
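The 10-fold cross-validation mentioned in the abstract can be illustrated with a minimal index-splitting sketch. This is not the DF-Phos code: the function name `k_fold_indices` is hypothetical, the folds are contiguous and unshuffled for simplicity, and a real evaluation would normally shuffle (and often stratify) the samples first.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k contiguous folds.

    Illustrative helper (hypothetical name); folds differ in size by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


if __name__ == "__main__":
    # Each sample appears in exactly one test fold across the k iterations.
    for train, test in k_fold_indices(25, k=10):
        print(len(train), len(test))
```

Each model is trained on `train` and scored on `test`; the reported figure is usually the mean of the k per-fold scores.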
Affiliation(s)
- Zeynab Zahiri
- Faculty of Electrical and Computer Engineering, University of Birjand, Birjand, Iran
| | - Nasser Mehrshad
- Faculty of Electrical and Computer Engineering, University of Birjand, Birjand, Iran
| | - Maliheh Mehrshad
- Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences, Uppsala, 750 07 Sweden
| |
|
45
|
Yan Y, Li W, Wang S, Huang T. Seq-RBPPred: Predicting RNA-Binding Proteins from Sequence. ACS OMEGA 2024; 9:12734-12742. [PMID: 38524500 PMCID: PMC10955590 DOI: 10.1021/acsomega.3c08381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 12/18/2023] [Accepted: 12/28/2023] [Indexed: 03/26/2024]
Abstract
RNA-binding proteins (RBPs) can interact with RNAs to regulate RNA translation, modification, splicing, and other important biological processes. The accurate identification of RBPs is of paramount importance for gaining insights into the intricate mechanisms underlying organismal life activities. Traditional experimental methods for identifying RBPs require a lot of time and money, so it is important to develop computational methods to predict RBPs. However, existing approaches for RBP prediction still require further improvement because RBPs remain unidentified in many species. In this study, we present Seq-RBPPred (predicting RBPs from sequence), a novel method that utilizes a comprehensive feature representation encompassing both biophysical properties and hidden-state features derived from protein sequences. Comprehensive performance evaluations show that Seq-RBPPred is superior to state-of-the-art methods, yielding an overall accuracy of 0.922, a sensitivity of 0.926, a specificity of 0.903, and a Matthews correlation coefficient (MCC) of 0.757 on the testing set. The data and code of Seq-RBPPred are available at https://github.com/yaoyao-11/Seq-RBPPred.
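The four metrics reported in the abstract (accuracy, sensitivity, specificity, MCC) all derive from the counts of a binary confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the Seq-RBPPred paper.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity, and MCC from confusion counts.

    Ratios with a zero denominator are reported as 0.0 (a common convention).
    """
    def safe_div(num: float, den: float) -> float:
        return num / den if den else 0.0

    total = tp + fp + tn + fn
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, total),
        "sensitivity": safe_div(tp, tp + fn),   # true-positive rate
        "specificity": safe_div(tn, tn + fp),   # true-negative rate
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(binary_metrics(tp=50, fp=10, tn=30, fn=10))
```

MCC stays informative on imbalanced test sets, which is one reason an MCC (here 0.757) can look modest next to an accuracy above 0.9.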
Affiliation(s)
- Yuyao Yan
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200021, China
| | - Wenran Li
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200021, China
| | - Sijia Wang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200021, China
| | - Tao Huang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200021, China
| |
|
46
|
Ma Y, Zhang B, Liu Z, Liu Y, Wang J, Li X, Feng F, Ni Y, Li S. IAS-FET: An intelligent assistant system and an online platform for enhancing successful rate of in-vitro fertilization embryo transfer technology based on clinical features. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024; 245:108050. [PMID: 38301430 DOI: 10.1016/j.cmpb.2024.108050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Revised: 01/20/2024] [Accepted: 01/23/2024] [Indexed: 02/03/2024]
Abstract
BACKGROUND Among all assisted reproductive technology (ART) methods, in vitro fertilization-embryo transfer (IVF-ET) holds a prominent position as a key solution for overcoming infertility. However, its success rate hovers at a modest 30% to 70%. Adding to the challenge is the absence of effective models and clinical tools capable of predicting the outcome of IVF-ET before embryo formation. Our study aims to fill this critical gap by predicting IVF-ET outcomes and ultimately enhancing the success rate of the procedure. METHODS In this retrospective study, infertile patients who received assisted pregnancy treatment at Gansu Provincial Maternity and Child-care Hospital in China from 2016 to 2020 were enrolled. Individuals' clinical information was studied with a cascade XGBoost method to build an intelligent assistant system for predicting the outcome of IVF-ET, called IAS-FET. The cascade XGBoost model was trained using clinical information from 2292 couples and externally tested using clinical information from 573 couples. In addition, IAS-FET suggested several schemes to help patients adjust their physical condition and improve their success rate with ART. RESULTS The IAS-FET method predicts the outcome of IVF-ET with an area under the curve (AUC) of 0.8759 on the external test set. In addition, IAS-FET can provide several schemes for improving the success rate of IVF-ET. The IAS-FET tool is available as a free online platform at http://www.cppdd.cn/ART. CONCLUSIONS The results suggest that personal clinical features significantly influence the success of ART. The proposed system IAS-FET, based on the top 27 factors, could be a promising tool for predicting the outcome of ART and proposing a plan for the patient's physical adjustment. With the help of IAS-FET, patients can take informed steps towards increasing their chances of a successful outcome on their journey to parenthood.
Affiliation(s)
- Ying Ma
- Gansu Provincial Maternity and Child-care Hospital, Lanzhou, Gansu 730030, China
| | - Bowen Zhang
- School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China; School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, Hubei 430073, China
| | - Zhaoqing Liu
- School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
| | - Yujie Liu
- School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
| | - Jiarui Wang
- School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
| | - Xingxuan Li
- School of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou, Gansu 730030, China
| | - Fan Feng
- Gansu Provincial Maternity and Child-care Hospital, Lanzhou, Gansu 730030, China
| | - Yali Ni
- Gansu Provincial Maternity and Child-care Hospital, Lanzhou, Gansu 730030, China
| | - Shuyan Li
- School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China.
| |
|
47
|
Xie K, Hou Y, Zhou X. Deep centroid: a general deep cascade classifier for biomedical omics data classification. Bioinformatics 2024; 40:btae039. [PMID: 38305432 PMCID: PMC10868341 DOI: 10.1093/bioinformatics/btae039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 01/13/2024] [Accepted: 01/30/2024] [Indexed: 02/03/2024] Open
Abstract
MOTIVATION Classification of samples using biomedical omics data is a widely used method in biomedical research. However, these datasets often possess challenging characteristics, including high dimensionality, limited sample sizes, and inherent biases across diverse sources. These factors limit the performance of traditional machine learning models, particularly when applied to independent datasets. RESULTS To address these challenges, we propose a novel classifier, Deep Centroid, which combines the stability of the nearest centroid classifier and the strong fitting ability of the deep cascade strategy. Deep Centroid is an ensemble learning method with a multi-layer cascade structure, consisting of feature scanning and cascade learning stages that can dynamically adjust the training scale. We apply Deep Centroid to three precision medicine applications (cancer early diagnosis, cancer prognosis, and drug sensitivity prediction) using cell-free DNA fragmentations, gene expression profiles, and DNA methylation data. Experimental results demonstrate that Deep Centroid outperforms six traditional machine learning models in all three applications, showcasing its potential in biological omics data classification. Furthermore, functional annotations reveal that the features scanned by the model exhibit biological significance, indicating its interpretability from a biological perspective. Our findings underscore the promising application of Deep Centroid in the classification of biomedical omics data, particularly in the field of precision medicine. AVAILABILITY AND IMPLEMENTATION Deep Centroid is available at both GitHub (github.com/xiexiexiekuan/DeepCentroid) and Figshare (https://figshare.com/articles/software/Deep_Centroid_A_General_Deep_Cascade_Classifier_for_Biomedical_Omics_Data_Classification/24993516).
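For readers unfamiliar with the base learner named in the abstract, the following is a minimal sketch of a plain nearest centroid classifier. It does not reproduce Deep Centroid's feature scanning or cascade stages; the function names `fit_centroids` and `predict` and the toy data are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Sequence


def fit_centroids(X: Sequence[Sequence[float]], y: Sequence[Hashable]) -> Dict[Hashable, List[float]]:
    """Return the per-class mean vector (centroid) of the training samples."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
            counts[label] = 0
        for j, value in enumerate(row):
            sums[label][j] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def predict(centroids: Dict[Hashable, List[float]], row: Sequence[float]) -> Hashable:
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], row))


if __name__ == "__main__":
    # Toy two-class data, for illustration only.
    X = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
    y = ["a", "a", "b", "b"]
    model = fit_centroids(X, y)
    print(predict(model, [1.0, 1.0]), predict(model, [9.0, 9.0]))
```

Because each class is summarized by a single mean vector, the classifier has few parameters, which is consistent with the stability on small, high-dimensional datasets that the abstract attributes to it.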
Affiliation(s)
- Kuan Xie
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, People’s Republic of China
| | - Yuying Hou
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, People’s Republic of China
| | - Xionghui Zhou
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, People’s Republic of China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan 430070, People’s Republic of China
| |
|
48
|
Yao L, Guan J, Li W, Chung CR, Deng J, Chiang YC, Lee TY. Identifying Antitubercular Peptides via Deep Forest Architecture with Effective Feature Representation. Anal Chem 2024; 96:1538-1546. [PMID: 38226973 DOI: 10.1021/acs.analchem.3c04196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/17/2024]
Abstract
Tuberculosis (TB) is a severe disease caused by Mycobacterium tuberculosis that poses a significant threat to human health. The emergence of drug-resistant strains has made the global fight against TB even more challenging. Antituberculosis peptides (ATPs) have shown promising results as a potential treatment for TB. However, conventional wet lab-based approaches to ATP discovery are time-consuming and costly and often fail to discover peptides with desired properties. To address these challenges, we propose a novel machine learning-based framework called ATPfinder that can significantly accelerate the discovery of ATP. Our approach integrates various efficient peptide descriptors and utilizes the deep forest algorithm to construct the model. This neural network-like cascading structure can effectively process and mine features without complex hyperparameter tuning. Our experimental results show that ATPfinder outperforms existing ATP prediction tools, achieving state-of-the-art performance with an accuracy of 89.3% and an MCC of 0.70. Moreover, our framework exhibits better robustness than baseline algorithms commonly used for other sequence analysis tasks. Additionally, the excellent interpretability of our model can assist researchers in understanding the critical features of ATP. Finally, we developed a downloadable desktop application to simplify the use of our framework for researchers. Therefore, ATPfinder can facilitate the discovery of peptide drugs and provide potential solutions for TB treatment. Our framework is freely available at https://github.com/lantianyao/ATPfinder/ (data sets and code) and https://awi.cuhk.edu.cn/dbAMP/ATPfinder.html (software).
Affiliation(s)
- Lantian Yao
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
| | - Jiahui Guan
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
| | - Wenshuo Li
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
| | - Chia-Ru Chung
- Department of Computer Science and Information Engineering, National Central University, 320317 Taoyuan, Taiwan
| | - Junyang Deng
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
| | - Ying-Chih Chiang
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
| | - Tzong-Yi Lee
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, 300093 Hsinchu, Taiwan
- Center for Intelligent Drug Systems and Smart Bio-devices (IDS2B), National Yang Ming Chiao Tung University, 300093 Hsinchu, Taiwan
| |
|
49
|
Arif M, Fang G, Fida H, Musleh S, Yu DJ, Alam T. iMRSAPred: Improved Prediction of Anti-MRSA Peptides Using Physicochemical and Pairwise Contact-Energy Properties of Amino Acids. ACS OMEGA 2024; 9:2874-2883. [PMID: 38250405 PMCID: PMC10795061 DOI: 10.1021/acsomega.3c08303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2023] [Revised: 12/06/2023] [Accepted: 12/13/2023] [Indexed: 01/23/2024]
Abstract
Methicillin-resistant Staphylococcus aureus (MRSA) is a growing threat to human lives worldwide. Anti-MRSA peptides act as potential antibiotic agents and play a significant role in combating MRSA infection. Traditional laboratory-based methods for annotating Anti-MRSA peptides, although precise, are challenging, costly, and time-consuming. Computational methods capable of identifying Anti-MRSA peptides can therefore accelerate the drug design process for treating bacterial infections. In this study, we developed a novel sequence-based predictor, "iMRSAPred", for screening Anti-MRSA peptides by incorporating energy estimation and physicochemical and sequential information. We resolved the class imbalance by using the synthetic minority oversampling technique plus Tomek link (SMOTETomek) algorithm. Furthermore, the Shapley additive explanation method was leveraged to analyze the impact of top-ranked features in the prediction task. We evaluated multiple machine learning algorithms, namely CatBoost, Cascade Deep Forest, Kernel and Tree Boosting, support vector machine, and HistGBoost classifiers, by 10-fold cross-validation and independent testing. The proposed iMRSAPred method significantly improved the overall performance in terms of accuracy and Matthews correlation coefficient (MCC) by 5.45 and 0.083%, respectively, on the training data set. On the independent data set, iMRSAPred improved accuracy and MCC by 3.98 and 0.055%, respectively. We believe that the proposed method will be useful in large-scale Anti-MRSA peptide prediction and provide insights into other bioactive peptides.
Affiliation(s)
- Muhammad Arif
- College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar
| | - Ge Fang
- State Key Laboratory for Organic Electronics and Information Displays, Institute of Advanced Materials (IAM), Nanjing University of Posts and Telecommunications, 9 Wenyuan Road, Nanjing 210023, P. R. China
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Huma Fida
- Department of Microbiology, Abdul Wali Khan University, Mardan 23200, KPK, Pakistan
| | - Saleh Musleh
- College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210023, China
| | - Tanvir Alam
- College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar
| |
|
50
|
Jiang X, Ma C, Nazarpour K. One-shot random forest model calibration for hand gesture decoding. J Neural Eng 2024; 21:016006. [PMID: 38225863 DOI: 10.1088/1741-2552/ad1786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 12/20/2023] [Indexed: 01/17/2024]
Abstract
Objective. Most existing machine learning models for myoelectric control require a large amount of data to learn user-specific characteristics of the electromyographic (EMG) signals, which is burdensome. Our objective is to develop an approach to enable the calibration of a pre-trained model with minimal data from a new myoelectric user. Approach. We trained a random forest (RF) model with EMG data from 20 people collected during the performance of multiple hand grips. To adapt the decision rules for a new user, first, the branches of the pre-trained decision trees were pruned using the validation data from the new user. Then new decision trees trained merely with data from the new user were appended to the pruned pre-trained model. Results. Real-time myoelectric experiments with 18 participants over two days demonstrated the improved accuracy of the proposed approach when compared to benchmark user-specific RF and the linear discriminant analysis models. Furthermore, the RF model that was calibrated on day one for a new participant yielded significantly higher accuracy on day two, when compared to the benchmark approaches, which reflects the robustness of the proposed approach. Significance. The proposed model calibration procedure is completely source-free, that is, once the base model is pre-trained, no access to the source data from the original 20 people is required. Our work promotes the use of efficient, explainable, and simple models for myoelectric control.
Affiliation(s)
- Xinyu Jiang
- School of Informatics, The University of Edinburgh, Edinburgh, United Kingdom
| | - Chenfei Ma
- School of Informatics, The University of Edinburgh, Edinburgh, United Kingdom
| | - Kianoush Nazarpour
- School of Informatics, The University of Edinburgh, Edinburgh, United Kingdom
| |
|