1
Liu T, Liang L, Che C, Liu Y, Jin B. A transformer-based framework for temporal health event prediction with graph-enhanced representations. J Biomed Inform 2025; 166:104826. [PMID: 40324665 DOI: 10.1016/j.jbi.2025.104826] [Received: 12/17/2024] [Revised: 03/31/2025] [Accepted: 04/03/2025] [Indexed: 05/07/2025]
Abstract
OBJECTIVE Deep learning approaches have demonstrated significant potential in predicting temporal health events in recent years. However, existing methods have not fully leveraged the complex interactions among comorbidities and have overlooked imbalances and temporal irregularities in admission records. METHODS This study proposes GLT-Net, a deep learning approach that combines graph learning with a Transformer framework to tackle these challenges. GLT-Net begins by constructing a patient association graph to generate unique representations for each individual. At the same time, the hierarchical structure of diagnosis codes is used to pre-train the diagnosis code embeddings. Subsequently, a comorbidity association matrix is created to capture the relationships between comorbidities, and graph neural networks are employed to enhance the feature representations of diagnosis codes. Finally, a Transformer encoder captures the dependencies in historical admission records by incorporating time information. RESULTS We demonstrate our approach on two tasks in temporal health event prediction. Experimental results on real-world datasets show that GLT-Net outperforms baseline models in forecasting temporal health events. Additionally, a case study demonstrates the effectiveness of GLT-Net in predicting health events. CONCLUSION Understanding progression patterns over time, comorbidity associations, and patient characterization is essential for predicting temporal health events. Our study provides new insights and methods for a deeper understanding of patient health status and disease trends. Moreover, our model can be extended to other data sources, enhancing its versatility.
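As a rough illustration of the comorbidity association matrix mentioned in the abstract, the sketch below counts how often pairs of diagnosis codes appear in the same admission. The abstract does not say how GLT-Net builds its matrix, so the co-occurrence counting, the function name `cooccurrence_matrix`, and the toy codes are all assumptions made for illustration.

```python
from itertools import combinations


def cooccurrence_matrix(admissions):
    """Count how often each pair of diagnosis codes shares an admission.

    `admissions` is a list of admissions, each a list of diagnosis codes.
    Returns (codes, matrix), where matrix[i][j] is the number of admissions
    containing both codes[i] and codes[j]. This is an assumed, simplified
    stand-in for the paper's comorbidity association matrix.
    """
    codes = sorted({code for adm in admissions for code in adm})
    index = {code: i for i, code in enumerate(codes)}
    matrix = [[0] * len(codes) for _ in codes]
    for adm in admissions:
        # Deduplicate within an admission so each pair counts once per visit.
        for a, b in combinations(sorted(set(adm)), 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return codes, matrix


# Toy admissions with made-up ICD-style codes (hypothetical data).
codes, matrix = cooccurrence_matrix([
    ["E11", "I10"],
    ["E11", "I10", "N18"],
    ["I10"],
])
```

A symmetric count matrix like this could serve as the adjacency input to a graph neural network, although normalisation and thresholding choices would depend on the actual method.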
Affiliation(s)
- Tianci Liu
- Key Laboratory of Advanced Design and Intelligent Computing Ministry of Education, Dalian University, Dalian, 116622, Liaoning, China; School of Software Engineering, Dalian University, Dalian, 116622, Liaoning, China
- Lizhong Liang
- School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou, 510006, Guangdong, China; Affiliated Hospital of Guangdong Medical University, Zhanjiang, 524013, Guangdong, China; The Marine Biomedical Research Institute, School of Ocean and Tropical Medicine, Guangdong Medical University, Zhanjiang, 524023, Guangdong, China
- Chao Che
- Key Laboratory of Advanced Design and Intelligent Computing Ministry of Education, Dalian University, Dalian, 116622, Liaoning, China; School of Software Engineering, Dalian University, Dalian, 116622, Liaoning, China
- Yunjiong Liu
- Key Laboratory of Advanced Design and Intelligent Computing Ministry of Education, Dalian University, Dalian, 116622, Liaoning, China; School of Software Engineering, Dalian University, Dalian, 116622, Liaoning, China
- Bo Jin
- School of Innovation and Entrepreneurship, Dalian University of Technology, Dalian, 116024, Liaoning, China
2
Cui H, Shen Z, Zhang J, Shao H, Qin L, Ho JC, Yang C. LLMs-based Few-Shot Disease Predictions using EHR: A Novel Approach Combining Predictive Agent Reasoning and Critical Agent Instruction. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2025; 2024:319-328. [PMID: 40417470 PMCID: PMC12099430] [Indexed: 05/27/2025]
Abstract
Electronic health records (EHRs) contain valuable patient data for health-related prediction tasks, such as disease prediction. Traditional approaches rely on supervised learning methods that require large labeled datasets, which can be expensive and challenging to obtain. In this study, we investigate the feasibility of applying Large Language Models (LLMs) to convert structured patient visit data (e.g., diagnoses, labs, prescriptions) into natural language narratives. We evaluate the zero-shot and few-shot performance of LLMs using various EHR-prediction-oriented prompting strategies. Furthermore, we propose a novel approach that utilizes LLM agents with different roles: a predictor agent that makes predictions and generates reasoning processes and a critic agent that analyzes incorrect predictions and provides guidance for improving the reasoning of the predictor agent. Our results demonstrate that with the proposed approach, LLMs can achieve decent few-shot performance compared to traditional supervised learning methods in EHR-based disease predictions, suggesting its potential for health-oriented applications.
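The conversion of structured visit data into a natural language narrative can be pictured with a small sketch. The field names (`diagnoses`, `labs`, `prescriptions`), the function name `visit_to_narrative`, and the sentence template are hypothetical; the abstract does not describe the authors' actual prompt format.

```python
def visit_to_narrative(visit):
    """Render one structured visit as a short plain-text narrative.

    `visit` is a dict with optional keys "diagnoses" (list of str),
    "labs" (dict of name -> value), and "prescriptions" (list of str).
    The template is an assumption, not the format used in the paper.
    """
    parts = []
    if visit.get("diagnoses"):
        parts.append("Diagnoses: " + ", ".join(visit["diagnoses"]) + ".")
    if visit.get("labs"):
        labs = ", ".join(f"{name} {value}" for name, value in visit["labs"].items())
        parts.append("Labs: " + labs + ".")
    if visit.get("prescriptions"):
        parts.append("Prescriptions: " + ", ".join(visit["prescriptions"]) + ".")
    return " ".join(parts) if parts else "No recorded events."


# Toy visit with made-up values.
narrative = visit_to_narrative({
    "diagnoses": ["type 2 diabetes"],
    "labs": {"HbA1c": "8.1%"},
    "prescriptions": ["metformin"],
})
```

A narrative of this kind could then be placed in a zero-shot or few-shot prompt alongside the prediction question.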
Affiliation(s)
- Hejie Cui
- Department of Computer Science, Emory University, Atlanta, GA, USA
- Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
- Zhuocheng Shen
- Department of Computer Science, Emory University, Atlanta, GA, USA
- Jieyu Zhang
- School of Computer Science & Engineering, University of Washington, Seattle, WA, USA
- Hui Shao
- Rollins School of Public Health, Emory University, Atlanta, GA, USA
- School of Medicine, Emory University, Atlanta, GA, USA
- Lianhui Qin
- Department of Computer Science & Engineering, UCSD, San Diego, CA, USA
- Joyce C Ho
- Department of Computer Science, Emory University, Atlanta, GA, USA
- Carl Yang
- Department of Computer Science, Emory University, Atlanta, GA, USA
- Rollins School of Public Health, Emory University, Atlanta, GA, USA
3
Savorgnan F, Checchia PA. From Prediction to Practice: Evaluating Real-Time Clinical Decision Support in Pediatric Cardiac Intensive Care. Crit Care Med 2025:00003246-990000000-00522. [PMID: 40331870 DOI: 10.1097/ccm.0000000000006696] [Indexed: 05/08/2025]
Affiliation(s)
- Fabio Savorgnan
- Both authors: Division of Pediatric Critical Care Medicine, Department of Pediatrics, Baylor College of Medicine and the Texas Children's Hospital, Houston, TX
4
Sah AK, Elshaikh RH, Shalabi MG, Abbas AM, Prabhakar PK, Babker AMA, Choudhary RK, Gaur V, Choudhary AS, Agarwal S. Role of Artificial Intelligence and Personalized Medicine in Enhancing HIV Management and Treatment Outcomes. Life (Basel) 2025; 15:745. [PMID: 40430173 PMCID: PMC12112836 DOI: 10.3390/life15050745] [Received: 03/13/2025] [Revised: 04/25/2025] [Accepted: 04/29/2025] [Indexed: 05/29/2025] Open
Abstract
The integration of artificial intelligence and personalized medicine is transforming HIV management by enhancing diagnostics, treatment optimization, and disease monitoring. Advances in machine learning, deep neural networks, and multi-omics data analysis enable precise prognostication, tailored antiretroviral therapy, and early detection of drug resistance. AI-driven models analyze vast genomic, proteomic, and clinical datasets to refine treatment strategies, predict disease progression, and pre-empt therapy failures. Additionally, AI-powered diagnostic tools, including deep learning imaging and natural language processing, improve screening accuracy, particularly in resource-limited settings. Despite these innovations, challenges such as data privacy, algorithmic bias, and the need for clinical validation remain. Successful integration of AI into HIV care requires robust regulatory frameworks, interdisciplinary collaboration, and equitable technology access. This review explores both the potential and limitations of AI in HIV management, emphasizing the need for ethical implementation and expanded research to maximize its impact. AI-driven approaches hold great promise for a more personalized, efficient, and effective future in HIV treatment and care.
Affiliation(s)
- Ashok Kumar Sah
- Department of Medical Laboratory Sciences, College of Applied & Health Sciences, A’Sharqiyah University, Ibra 400, Oman
- Rabab H. Elshaikh
- Department of Medical Laboratory Sciences, College of Applied & Health Sciences, A’Sharqiyah University, Ibra 400, Oman
- Manar G. Shalabi
- Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, Jouf University, Sakala 72388, Saudi Arabia
- Anass M. Abbas
- Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, Jouf University, Sakala 72388, Saudi Arabia
- Pranav Kumar Prabhakar
- Department of Biotechnology, School of Engineering and Technology, Nagaland University, Meriema, Kohima 797004, India
- Asaad M. A. Babker
- Department of Medical Laboratory Sciences, College of Health Sciences, Gulf Medical University, Ajman 4184, United Arab Emirates
- Ranjay Kumar Choudhary
- Department of Medical Laboratory Technology, UIAHS, Chandigarh University, Chandigarh 160036, India
- School of Paramedics and Allied Health Sciences, Centurion University of Technology and Management, R. Sitapur 761211, India
- Vikash Gaur
- Meerabai Institute of Technology, Delhi Skill and Entrepreneurship University, New Delhi 110077, India
- Ajab Singh Choudhary
- Department of Medical Laboratory Technology, School of Allied Health Sciences, Noida International University, Greater Noida 203201, India
- Shagun Agarwal
- School of Allied Health Sciences, Galgotias University, Greater Noida 203201, India
5
Huang X, Ren S, Mao X, Chen S, Chen E, He Y, Jiang Y. Association Between Risk Factors and Major Cancers: Explainable Machine Learning Approach. JMIR Cancer 2025; 11:e62833. [PMID: 40315870 PMCID: PMC12064211 DOI: 10.2196/62833] [Received: 06/02/2024] [Revised: 03/08/2025] [Accepted: 03/20/2025] [Indexed: 05/04/2025] Open
Abstract
Background Cancer is a life-threatening disease and a leading cause of death worldwide, with an estimated 611,000 deaths and over 2 million new cases in the United States in 2024. The rising incidence of major cancers, including among younger individuals, highlights the need for early screening and monitoring of risk factors to manage and decrease cancer risk. Objective This study aimed to leverage explainable machine learning models to identify and analyze the key risk factors associated with breast, colorectal, lung, and prostate cancers. By uncovering significant associations between risk factors and these major cancer types, we sought to enhance the understanding of cancer diagnosis risk profiles. Our goal was to facilitate more precise screening, early detection, and personalized prevention strategies, ultimately contributing to better patient outcomes and promoting health equity. Methods Deidentified electronic health record data from the Medical Information Mart for Intensive Care (MIMIC)-III were used to identify patients with 4 types of cancer who had longitudinal hospital visits prior to their diagnosis. Their records were matched and combined with those of patients without cancer diagnoses using propensity scores based on demographic factors. Three advanced models (penalized logistic regression, random forest, and multilayer perceptron [MLP]) were used to rank risk factors for each cancer type, with feature importance analysis for the random forest and MLP models. Rank-biased overlap was adopted to compare the similarity of ranked risk factors across cancer types. Results Our framework evaluated the prediction performance of explainable machine learning models, with the MLP model demonstrating the best performance. 
The MLP model achieved an area under the receiver operating characteristic curve of 0.78 for breast cancer (n=58), 0.76 for colorectal cancer (n=140), 0.84 for lung cancer (n=398), and 0.78 for prostate cancer (n=104), outperforming other baseline models (P<.001). In addition to demographic risk factors, the most prominent nontraditional risk factors overlapped across models and cancer types, including hyperlipidemia (odds ratio [OR] 1.14, 95% CI 1.11-1.17; P<.01), diabetes (OR 1.34, 95% CI 1.29-1.39; P<.01), depressive disorders (OR 1.11, 95% CI 1.06-1.16; P<.01), heart diseases (OR 1.42, 95% CI 1.32-1.52; P<.01), and anemia (OR 1.22, 95% CI 1.14-1.30; P<.01). The similarity analysis indicated that lung cancer had a risk factor pattern distinct from those of the other cancer types. Conclusions The study's findings demonstrated the effectiveness of explainable ML models in assessing nontraditional risk factors for major cancers and highlighted the importance of considering unique risk profiles for different cancer types. Moreover, this research served as a hypothesis-generating foundation, providing preliminary results for future investigation into cancer diagnosis risk analysis and management. Furthermore, expanding collaboration with clinical experts for external validation would be essential to refine model outputs, integrate findings into practice, and enhance their impact on patient care and cancer prevention efforts.
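The rank-biased overlap used to compare ranked risk factors can be sketched as follows. This is the truncated (finite-depth) form of the measure, without the extrapolation used in some implementations; the function name, the default persistence `p=0.9`, and the toy rankings are assumptions, and the abstract does not state which variant or parameters the authors used.

```python
def rank_biased_overlap(ranking_a, ranking_b, p=0.9, depth=None):
    """Truncated rank-biased overlap between two rankings without duplicates.

    At each depth d, the agreement is the fraction of items shared by the
    top-d prefixes; agreements are weighted by p**(d - 1) and scaled by
    (1 - p). Top-ranked items therefore count more than lower-ranked ones.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    max_depth = min(len(ranking_a), len(ranking_b))
    depth = max_depth if depth is None else min(depth, max_depth)
    seen_a, seen_b = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_a.add(ranking_a[d - 1])
        seen_b.add(ranking_b[d - 1])
        agreement = len(seen_a & seen_b) / d
        score += p ** (d - 1) * agreement
    return (1 - p) * score


# Toy risk factor rankings for two cancer types (made-up order).
lung = ["smoking", "age", "anemia"]
breast = ["age", "smoking", "diabetes"]
similarity = rank_biased_overlap(lung, breast, p=0.9)
```

Because the sum is truncated, two identical rankings of length k score 1 - p**k rather than 1, so scores are only comparable between rankings evaluated at the same depth.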
Affiliation(s)
- Xiayuan Huang
- Department of Biostatistics, Yale University, New Haven, CT, United States
- Shushun Ren
- School of Nursing, University of Michigan–Ann Arbor, 400 North Ingalls Street, Ann Arbor, MI, 48109, United States
- Xinyue Mao
- College of Literature Science and the Arts, University of Michigan–Ann Arbor, Ann Arbor, MI, United States
- Sirui Chen
- College of Literature Science and the Arts, University of Michigan–Ann Arbor, Ann Arbor, MI, United States
- Elle Chen
- School of Nursing, University of Michigan–Ann Arbor, 400 North Ingalls Street, Ann Arbor, MI, 48109, United States
- Yuqi He
- University Library, San Jose State University, San Jose, CA, United States
- Yun Jiang
- School of Nursing, University of Michigan–Ann Arbor, 400 North Ingalls Street, Ann Arbor, MI, 48109, United States
6
Bornet A, Proios D, Yazdani A, Jaume-Santero F, Haller G, Choi E, Teodoro D. Comparing neural language models for medical concept representation and patient trajectory prediction. Artif Intell Med 2025; 163:103108. [PMID: 40086407 DOI: 10.1016/j.artmed.2025.103108] [Received: 06/01/2023] [Revised: 01/22/2024] [Accepted: 03/09/2025] [Indexed: 03/16/2025]
Abstract
Effective representation of medical concepts is crucial for secondary analyses of electronic health records. Neural language models have shown promise in automatically deriving medical concept representations from clinical data. However, the comparative performance of different language models for creating these empirical representations, and the extent to which they encode medical semantics, has not been extensively studied. This study aims to address this gap by evaluating the effectiveness of three popular language models - word2vec, fastText, and GloVe - in creating medical concept embeddings that capture their semantic meaning. By using a large dataset of digital health records, we created patient trajectories and used them to train the language models. We then assessed the ability of the learned embeddings to encode semantics through an explicit comparison with biomedical terminologies, and implicitly by predicting patient outcomes and trajectories with different levels of available information. Our qualitative analysis shows that empirical clusters of embeddings learned by fastText exhibit the highest similarity with theoretical clustering patterns obtained from biomedical terminologies, with a similarity score between empirical and theoretical clusters of 0.88, 0.80, and 0.92 for diagnosis, procedure, and medication codes, respectively. Conversely, for outcome prediction, word2vec and GloVe tend to outperform fastText, with the former achieving AUROC as high as 0.78, 0.62, and 0.85 for length-of-stay, readmission, and mortality prediction, respectively. In predicting medical codes in patient trajectories, GloVe achieves the highest performance for diagnosis and medication codes (AUPRC of 0.45 and of 0.81, respectively) at the highest level of the semantic hierarchy, while fastText outperforms the other models for procedure codes (AUPRC of 0.66). 
Our study demonstrates that subword information is crucial for learning medical concept representations, but global embedding vectors are better suited for more high-level downstream tasks, such as trajectory prediction. Thus, these models can be harnessed to learn representations that convey clinical meaning, and our insights highlight the potential of using machine learning techniques to semantically encode medical data.
Affiliation(s)
- Alban Bornet
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Dimitrios Proios
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland; Geneva School of Business Administration, HES-SO University of Applied Sciences and Arts of Western Switzerland, Geneva, Switzerland
- Anthony Yazdani
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland; Geneva School of Business Administration, HES-SO University of Applied Sciences and Arts of Western Switzerland, Geneva, Switzerland
- Fernando Jaume-Santero
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Guy Haller
- Department of Acute Care Medicine, Division of Anaesthesiology, Geneva University Hospitals, Switzerland; Department of Epidemiology and Preventive Medicine, Health Services Management and Research Unit, Monash University, Melbourne, Victoria, Australia
- Douglas Teodoro
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
7
Awe OO, Mwangi PN, Goudoungou SK, Esho RV, Oyejide OS. Explainable AI for enhanced accuracy in malaria diagnosis using ensemble machine learning models. BMC Med Inform Decis Mak 2025; 25:162. [PMID: 40217281 PMCID: PMC11987329 DOI: 10.1186/s12911-025-02874-3] [Received: 07/29/2024] [Accepted: 01/16/2025] [Indexed: 04/14/2025] Open
Abstract
BACKGROUND Malaria, an infectious disease caused by protozoan parasites belonging to the Plasmodium genus, remains a significant public health challenge, with African regions bearing the heaviest burden. Machine learning techniques have shown great promise in improving the diagnosis of infectious diseases, such as malaria. OBJECTIVES This study aims to integrate ensemble machine learning models and Explainable Artificial Intelligence (XAI) frameworks to enhance the diagnostic accuracy of malaria. METHODS The study utilized a dataset from the Federal Polytechnic Ilaro Medical Centre, Ilaro, Ogun State, Nigeria, which includes information from 337 patients aged between 3 and 77 years (180 females and 157 males) over a 4-week period. Ensemble methods, namely Random Forest, AdaBoost, Gradient Boost, XGBoost, and CatBoost, were employed after addressing class imbalance through oversampling techniques. Explainable AI techniques, such as LIME, Shapley Additive Explanations (SHAP), and Permutation Feature Importance, were utilized to enhance transparency and interpretability. RESULTS Among the ensemble models, Random Forest demonstrated the highest performance with an ROC AUC score of 0.869, followed by CatBoost at 0.787. XGBoost, Gradient Boost, and AdaBoost achieved ROC AUC scores of 0.770, 0.747, and 0.633, respectively. The explainability methods evaluated the influence of different characteristics on the probability of malaria diagnosis, revealing critical features that contribute to prediction outcomes. CONCLUSION By integrating ensemble machine learning models with explainable AI frameworks, the study promoted transparency in decision-making processes, thereby empowering healthcare providers with actionable insights for improved treatment strategies and enhanced patient outcomes, particularly in malaria management.
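For readers unfamiliar with the ROC AUC scores reported above, the metric can be computed directly as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below is a plain pairwise implementation on made-up labels and scores; it is not the authors' evaluation code, and the function name `roc_auc` is an assumption.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    `labels` holds 0/1 class labels and `scores` the predicted scores.
    Each positive/negative pair contributes 1 if the positive scores
    higher, 0.5 on a tie, and 0 otherwise.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in the number of samples, which is acceptable for a dataset of a few hundred patients; rank-based implementations are preferable at larger scale.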
Affiliation(s)
- Peter Njoroge Mwangi
- Department of Data Science, African Institute for Mathematical Sciences (AIMS), Limbe, Cameroon
- Samuel Kotva Goudoungou
- Department of Data Science, African Institute for Mathematical Sciences (AIMS), Limbe, Cameroon
- Ruth Victoria Esho
- Life and Health Sciences Research Institute (ICVS), School of Medicine, University of Minho, Braga, Portugal
- Olanrewaju Samuel Oyejide
- Department of Clinical Pharmacology and Clinical Pharmacy, Bogomolets National Medical University, Kiev, Ukraine
8
Cai J, Li P, Li W, Hao X, Li S, Zhu T. Digital Decision Support for Perioperative Care of Patients With Type 2 Diabetes: A Call to Action. JMIR Diabetes 2025; 10:e70475. [PMID: 40198903 PMCID: PMC11999379 DOI: 10.2196/70475] [Received: 12/23/2024] [Revised: 02/27/2025] [Accepted: 03/05/2025] [Indexed: 04/10/2025] Open
Abstract
Type 2 diabetes mellitus affects over 500 million people globally, with 10%-20% requiring surgery. Patients with diabetes are at increased risk for perioperative complications, including prolonged hospital stays and higher mortality, primarily due to perioperative hyperglycemia. Managing blood glucose during the perioperative period is challenging, and conventional monitoring is often inadequate to detect rapid fluctuations. Clinical decision support systems (CDSS) are emerging tools to improve perioperative diabetes management by providing real-time glucose data and medication recommendations. This viewpoint examines the role of CDSS in perioperative diabetes care, highlighting their benefits and limitations. CDSS can help manage blood glucose more effectively, preventing both hyperglycemia and hypoglycemia. However, technical and integration challenges, along with clinician acceptance, remain significant barriers.
Affiliation(s)
- Jianwen Cai
- Department of Anesthesiology, West China Hospital of Sichuan University, No. 17 Section 3 Renmin South Road, Chengdu, 610000, China
- Laboratory of Anesthesia and Critical Care Medicine, West China Hospital of Sichuan University, Chengdu, China
- Peiyi Li
- Department of Anesthesiology, West China Hospital of Sichuan University, No. 17 Section 3 Renmin South Road, Chengdu, 610000, China
- Laboratory of Anesthesia and Critical Care Medicine, West China Hospital of Sichuan University, Chengdu, China
- The Research Units of West China (2018RU012)-Chinese Academy of Medical Sciences, West China Hospital of Sichuan University, Chengdu, China
- Weimin Li
- Department of Respiratory and Critical Care Medicine, West China Hospital of Sichuan University, Chengdu, China
- Institute of Respiratory Health, Frontiers Science Center for Disease-related Molecular Network, West China Hospital of Sichuan University, Chengdu, China
- State Key Laboratory of Respiratory Health and Multimorbidity, West China Hospital of Sichuan University, Chengdu, China
- Xuechao Hao
- Department of Anesthesiology, West China Hospital of Sichuan University, No. 17 Section 3 Renmin South Road, Chengdu, 610000, China
- The Research Units of West China (2018RU012)-Chinese Academy of Medical Sciences, West China Hospital of Sichuan University, Chengdu, China
- Sheyu Li
- Department of Endocrinology and Metabolism and Department of Guideline and Rapid Recommendation, Cochrane China Center, MAGIC China Center, Chinese Evidence-Based Medicine Center, West China Hospital of Sichuan University, Chengdu, China
- Tao Zhu
- Department of Anesthesiology, West China Hospital of Sichuan University, No. 17 Section 3 Renmin South Road, Chengdu, 610000, China
- The Research Units of West China (2018RU012)-Chinese Academy of Medical Sciences, West China Hospital of Sichuan University, Chengdu, China
9
Almahadeen L, Vijay R, Shabaz M, Soni M, Singh PP, Patel P, Byeon H. Clinical deep model to analyse medical multivariate time-series data for health diagnosis. CYBER-PHYSICAL SYSTEMS 2025; 11:139-164. [DOI: 10.1080/23335777.2024.2329677] [Received: 06/20/2023] [Accepted: 03/08/2024] [Indexed: 12/09/2024]
Affiliation(s)
- Richa Vijay
- Department of computer science, Amity University
- Mohammad Shabaz
- Department of Computer Science Engineering, Model Institute of Engineering and Technology Jammu
- Mukesh Soni
- University Centre for Research & Development, Chandigarh University
- Pavan Patel
- Department of Computer Science Engineering, Ahmedabad Institute of Technology
10
Dong F, Li S, Li W. TCKAN: a novel integrated network model for predicting mortality risk in sepsis patients. Med Biol Eng Comput 2025; 63:1013-1025. [PMID: 39560917 DOI: 10.1007/s11517-024-03245-2] [Received: 07/12/2024] [Accepted: 11/07/2024] [Indexed: 11/20/2024]
Abstract
Sepsis poses a major global health threat, accounting for millions of deaths annually and significant economic costs. Accurately predicting the risk of mortality in sepsis patients enables early identification, promotes the efficient allocation of medical resources, and facilitates timely interventions, thereby improving patient outcomes. Current methods typically utilize only one type of data: either constant data, temporal data, or ICD codes. This study introduces a novel approach, the Time-Constant Kolmogorov-Arnold Network (TCKAN), which integrates temporal data, constant data, and ICD codes within a single predictive model. Unlike existing methods that typically rely on one type of data, TCKAN leverages a multi-modal data integration strategy, resulting in superior predictive accuracy and robustness in identifying high-risk sepsis patients. Validated on the MIMIC-III and MIMIC-IV datasets, TCKAN surpasses existing machine learning and deep learning methods in accuracy, sensitivity, and specificity. Notably, TCKAN achieved AUCs of 87.76% and 88.07%, demonstrating superior capability in identifying high-risk patients. Additionally, TCKAN effectively combats the prevalent issue of data imbalance in clinical settings, improving the detection of patients at elevated risk of mortality and facilitating timely interventions. These results confirm the model's effectiveness and its potential to transform patient management and treatment optimization in clinical practice. Although the TCKAN model already incorporates temporal, constant, and ICD code data, future research could include more diverse medical data types, such as imaging and laboratory test results, to achieve more comprehensive data integration and further improve predictive accuracy.
Affiliation(s)
- Fanglin Dong
- Yunnan University, Kunming, 650000, Yunnan Province, China
- Shibo Li
- Yunnan University, Kunming, 650000, Yunnan Province, China
- Weihua Li
- Yunnan University, Kunming, 650000, Yunnan Province, China
11
Liu T, Krentz AJ, Huo Z, Ćurčin V. Opportunities and Challenges of Cardiovascular Disease Risk Prediction for Primary Prevention Using Machine Learning and Electronic Health Records: A Systematic Review. Rev Cardiovasc Med 2025; 26:37443. [PMID: 40351688 PMCID: PMC12059770 DOI: 10.31083/rcm37443] [Received: 01/29/2025] [Revised: 03/13/2025] [Accepted: 03/20/2025] [Indexed: 05/14/2025] Open
Abstract
Background Cardiovascular disease (CVD) remains the foremost cause of morbidity and mortality worldwide. Recent advancements in machine learning (ML) have demonstrated substantial potential in augmenting risk stratification for primary prevention, surpassing conventional statistical models in predictive performance. Thus, integrating ML with Electronic Health Records (EHRs) enables refined risk estimation by leveraging the granularity and breadth of longitudinal individual patient data. However, fundamental barriers persist, including limited generalizability, challenges in interpretability, and the absence of rigorous external validation, all of which impede widespread clinical deployment. Methods This review adheres to the methodological rigor of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and Scale for the Assessment of Narrative Review Articles (SANRA) guidelines. A systematic literature search was performed in March 2024, encompassing the Medline and Embase databases, to identify studies published since 2010. Supplementary references were retrieved from the Institute for Scientific Information (ISI) Web of Science, and manual searches were curated. The selection process, conducted via Rayyan, focused on systematic and narrative reviews evaluating ML-driven models for long-term CVD risk prediction within primary prevention contexts utilizing EHR data. Studies investigating short-term prognostication, highly specific comorbid cohorts, or conventional models devoid of ML components were excluded. Results Following an exhaustive screening of 1757 records, 22 studies met the inclusion criteria. Of these, 10 were systematic reviews (four incorporating meta-analyses), while 12 constituted narrative reviews, with the majority published post-2020. The synthesis underscores the superiority of ML in modeling intricate EHR-derived risk factors, facilitating precision-driven cardiovascular risk assessment. 
Nonetheless, salient challenges endure: heterogeneity in CVD outcome definitions undermines comparability, data incompleteness and inconsistency compromise model robustness, and a dearth of external validation constrains clinical translatability. Moreover, ethical and regulatory considerations, including algorithmic opacity, equity in predictive performance, and the absence of standardized evaluation frameworks, pose formidable obstacles to seamless integration into clinical workflows. Conclusions Despite the transformative potential of ML-based CVD risk prediction, it remains encumbered by methodological, technical, and regulatory impediments that hinder its full-scale adoption in real-world healthcare settings. This review underscores the need for standardized validation protocols, stringent regulatory oversight, and interdisciplinary collaboration to bridge the translational divide. Our findings established an integrative framework for developing, validating, and applying ML-based CVD risk prediction algorithms, addressing both clinical and technical dimensions. To further advance this field, we propose a standardized, transparent, and regulated EHR platform that facilitates fair model evaluation, reproducibility, and clinical translation by providing a high-quality, representative dataset with structured governance and benchmarking mechanisms. Meanwhile, future endeavors must prioritize enhancing model transparency, mitigating biases, and ensuring adaptability to heterogeneous clinical populations, fostering equitable and evidence-based implementation of ML-driven predictive analytics in cardiovascular medicine.
Affiliation(s)
- Tianyi Liu
- School of Life Course & Population Sciences, King’s College London, SE1 1UL London, UK
| | - Andrew J. Krentz
- School of Life Course & Population Sciences, King’s College London, SE1 1UL London, UK
- Metadvice, 1025 St-Sulpice, Switzerland
| | - Zhiqiang Huo
- School of Life Course & Population Sciences, King’s College London, SE1 1UL London, UK
| | - Vasa Ćurčin
- School of Life Course & Population Sciences, King’s College London, SE1 1UL London, UK
| |
|
12
|
Garg S, Kitchen R, Gupta R, Pearson E. Applications of AI in Predicting Drug Responses for Type 2 Diabetes. JMIR Diabetes 2025; 10:e66831. [PMID: 40146874 PMCID: PMC11967697 DOI: 10.2196/66831] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2024] [Revised: 01/24/2025] [Accepted: 01/27/2025] [Indexed: 03/29/2025] Open
Abstract
Type 2 diabetes mellitus has seen a continuous rise in prevalence in recent years, and a similar trend has been observed in the increased availability of glucose-lowering drugs. There is a need to understand the variation in treatment response to these drugs to be able to predict people who will respond well or poorly to a drug. Electronic health records, clinical trials, and observational studies provide a huge amount of data to explore predictors of drug response. The use of artificial intelligence (AI), which includes machine learning and deep learning techniques, has the capacity to improve the prediction of treatment response in patients. AI can assist in the analysis of vast datasets to identify patterns and may provide valuable information on selecting an effective drug. Predicting an individual's response to a drug can aid in treatment selection, optimizing therapy, exploring new therapeutic options, and personalized medicine. This viewpoint highlights the growing evidence supporting the potential of AI-based methods to predict drug response with accuracy. Furthermore, the methods highlight a trend toward using ensemble methods as preferred models in drug response prediction studies.
Affiliation(s)
- Shilpa Garg
- Diabetes Endocrinology and Reproductive Biology, School of Medicine, University of Dundee, Ninewells Avenue, Dundee, DD1 9SY, United Kingdom, 44 7443787733
| | | | | | - Ewan Pearson
- Diabetes Endocrinology and Reproductive Biology, School of Medicine, University of Dundee, Ninewells Avenue, Dundee, DD1 9SY, United Kingdom, 44 7443787733
| |
|
13
|
Chen X, Yu B, Zhang Y, Wang X, Huang D, Gong S, Hu W. A machine learning model based on emergency clinical data predicting 3-day in-hospital mortality for stroke and trauma patients. Front Neurol 2025; 16:1512297. [PMID: 40183016 PMCID: PMC11966482 DOI: 10.3389/fneur.2025.1512297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2024] [Accepted: 03/05/2025] [Indexed: 04/05/2025] Open
Abstract
Background Accurately predicting the short-term in-hospital mortality risk for patients with stroke and TBI (Traumatic Brain Injury) is crucial for improving the quality of emergency medical care. Method This study analyzed data from 2,125 emergency admission patients with stroke and traumatic brain injury at two Grade A hospitals in China from January 2021 to March 2024. LASSO regression was used for feature selection, and the predictive performance of logistic regression was compared with six machine learning algorithms. A 70:30 ratio was applied for cross-validation, and confidence intervals were calculated using the bootstrap method. Temporal validation was performed on the best-performing model. SHAP values were employed to assess variable importance. Results The random forest algorithm excelled in predicting in-hospital 3-day mortality, achieving an AUC of 0.978 (95% CI: 0.966-0.986). Time series validation demonstrated the model's strong generalization capability, with an AUC of 0.975 (95% CI: 0.963-0.986). Key predictive factors in the final model included metabolic syndrome, NEWS2 score, Glasgow Coma Scale (GCS), whether surgery was performed, bowel movement status, potassium level (K), aspartate transaminase (AST) level, and temporal factors. SHAP value analysis further confirmed the significant contributions of these variables to the predictive outcomes. Conclusion The random forest model developed in this study demonstrates good accuracy in predicting short-term in-hospital mortality rates for stroke and traumatic brain injury patients. The model integrates emergency scores, clinical signs, and key biochemical indicators, providing a comprehensive perspective for risk assessment. This approach, which incorporates emergency data, holds promise for assisting decision-making in clinical practice, thereby improving patient outcomes.
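The abstract reports AUC values with bootstrap confidence intervals but does not describe the procedure in detail. A minimal sketch of one common variant (percentile bootstrap over resampled cases) might look like the following; the toy labels and scores, the function names, and the choice of 2,000 resamples are illustrative assumptions, not the authors' data or code.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample cases with replacement,
    recompute the AUC each time, and read off the empirical quantiles."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical toy data (1 = died within 3 days, 0 = survived).
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
toy_scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]

point = auc(toy_labels, toy_scores)
lower, upper = bootstrap_auc_ci(toy_labels, toy_scores)
print(f"AUC = {point:.3f}, 95% CI approx. ({lower:.3f}, {upper:.3f})")
```

With only ten cases the interval is very wide; the narrow intervals quoted in the abstract reflect a much larger sample.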
Affiliation(s)
- Xu Chen
- Shangrao People's Hospital, Shangrao, China
| | - Bin Yu
- Shangrao People's Hospital, Shangrao, China
| | | | - Xin Wang
- Huaian Hospital of Huaian City, Huai'an, China
| | | | | | - Wei Hu
- School of Nursing, Jinzhou Medical University, Jinzhou, China
| |
|
14
|
Daphne S, Rajam VMA, Hemanth P, Dinesh S. An Ensemble Patient Graph Framework for Predictive Modelling from Electronic Health Records and Medical Notes. Diagnostics (Basel) 2025; 15:756. [PMID: 40150098 PMCID: PMC11941089 DOI: 10.3390/diagnostics15060756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2025] [Revised: 03/01/2025] [Accepted: 03/12/2025] [Indexed: 03/29/2025] Open
Abstract
Objective: Electronic health records (EHRs) are becoming increasingly important in both academic research and business applications. Recent studies indicate that predictive tasks, such as heart failure detection, perform better when the geometric structure of EHR data, including the relationships between diagnoses and treatments, is considered. However, many EHRs lack essential structural information. This study aims to improve predictive accuracy in healthcare by constructing a Patient Knowledge Graph Ensemble Framework (PKGNN) to analyse ICU patient cohorts and predict mortality and hospital readmission outcomes. Methods: This study utilises a cohort of 42,671 patients from the MIMIC-IV dataset to build the PKGNN framework, which consists of three main components: (1) medical note extraction, (2) patient graph construction, and (3) prediction tasks. Advanced Natural Language Processing (NLP) models, including Clinical BERT, BioBERT, and BlueBERT, extract and integrate semantic representations from discharge summaries into a patient knowledge graph. This structured representation is then used to enhance predictive tasks. Results: Performance evaluations on the MIMIC-IV dataset indicate that the PKGNN framework outperforms state-of-the-art baseline models in predicting mortality and 30-day hospital readmission. A thorough framework analysis reveals that incorporating patient graph structures improves prediction accuracy. Furthermore, an ensemble model enhances risk prediction performance and identifies crucial clinical indicators. Conclusions: This study highlights the importance of leveraging structured knowledge graphs in EHR analysis to improve predictive modelling for critical healthcare outcomes. The PKGNN framework enhances the accuracy of mortality and readmission predictions by integrating advanced NLP techniques with patient graph structures. 
This work contributes to the literature by advancing knowledge graph-based EHR analysis strategies, ultimately supporting better clinical decision-making and risk assessment.
Affiliation(s)
- S. Daphne
- Department of Computer Science and Engineering, CEG Campus, Anna University, Chennai 600025, India; (S.D.); (P.H.); (S.D.)
| | - V. Mary Anita Rajam
- Centre for Cyber Security, Department of Computer Science and Engineering, CEG Campus, Anna University, Chennai 600025, India
| | - P. Hemanth
- Department of Computer Science and Engineering, CEG Campus, Anna University, Chennai 600025, India; (S.D.); (P.H.); (S.D.)
| | - Sundarrajan Dinesh
- Department of Computer Science and Engineering, CEG Campus, Anna University, Chennai 600025, India; (S.D.); (P.H.); (S.D.)
| |
|
15
|
Quang Tran V, Byeon H. Explainable hybrid tabular Variational Autoencoder and feature Tokenizer Transformer for depression prediction. Expert Systems with Applications 2025; 265:126084. [DOI: 10.1016/j.eswa.2024.126084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
|
16
|
Yazdani S, Henry RC, Byrne A, Henry IC. Utility of word embeddings from large language models in medical diagnosis. J Am Med Inform Assoc 2025; 32:526-534. [PMID: 39786898 PMCID: PMC11833464 DOI: 10.1093/jamia/ocae314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2024] [Revised: 12/07/2024] [Accepted: 12/13/2024] [Indexed: 01/12/2025] Open
Abstract
OBJECTIVE This study evaluates the utility of word embeddings, generated by large language models (LLMs), for medical diagnosis by comparing the semantic proximity of symptoms to their eponymic disease embedding ("eponymic condition") and the mean of all symptom embeddings associated with a disease ("ensemble mean"). MATERIALS AND METHODS Symptom data for 5 diagnostically challenging pediatric diseases (CHARGE syndrome, Cowden disease, POEMS syndrome, Rheumatic fever, and Tuberous sclerosis) were collected from PubMed. Using the Ada-002 embedding model, disease names and symptoms were translated into vector representations in a high-dimensional space. Euclidean and Chebyshev distance metrics were used to classify symptoms based on their proximity to both the eponymic condition and the ensemble mean of the condition's symptoms. RESULTS The ensemble mean approach showed significantly higher classification accuracy, correctly classifying between 80% (Cowden disease) and 100% (Tuberous sclerosis) of the sample disease symptoms using the Euclidean distance metric. In contrast, the eponymic condition approach using the Euclidean and Chebyshev distance metrics, in general, showed poor symptom classification performance, with erratic results (0%-100% accuracy), largely ranging between 0% and 3% accuracy. DISCUSSION The ensemble mean captures a disease's collective symptom profile, providing a more nuanced representation than the disease name alone. However, some misclassifications were due to superficial semantic similarities, highlighting the need for LLMs trained on medical corpora. CONCLUSION The ensemble mean of symptom embeddings improves classification accuracy over the eponymic condition approach. Future efforts should focus on medical-specific training of LLMs to enhance their diagnostic accuracy and clinical utility.
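The classification rule described in the abstract (assign a symptom to whichever reference vector is nearest) can be sketched as below. This is only an illustration under stated assumptions: the 3-dimensional vectors are made-up stand-ins for real Ada-002 embeddings, the disease labels and function names are hypothetical, and each symptom is included in its own disease mean because the abstract does not say whether a leave-one-out scheme was used.

```python
from math import sqrt

def euclidean(a, b):
    """Straight-line (L2) distance between two vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def chebyshev(a, b):
    """Largest absolute coordinate difference (L-infinity distance)."""
    return max(abs(x - y) for x, y in zip(a, b))

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def nearest_label(vector, references, distance):
    """Label of the reference vector closest to `vector`."""
    return min(references, key=lambda label: distance(vector, references[label]))

def accuracy(symptoms, references, distance):
    """Fraction of symptoms whose nearest reference is their own disease."""
    total = 0
    correct = 0
    for label, vectors in symptoms.items():
        for vector in vectors:
            total += 1
            if nearest_label(vector, references, distance) == label:
                correct += 1
    return correct / total

# Made-up toy "embeddings"; real ones would come from an embedding model.
symptom_vectors = {
    "disease_a": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0], [0.9, 0.0, 0.1]],
    "disease_b": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0], [0.0, 0.8, 0.2]],
}
# "Eponymic condition": one vector per disease name.
name_vectors = {"disease_a": [0.5, 0.5, 0.0], "disease_b": [0.4, 0.6, 0.1]}
# "Ensemble mean": average of each disease's symptom vectors.
ensemble_means = {label: mean_vector(v) for label, v in symptom_vectors.items()}

for metric in (euclidean, chebyshev):
    print(metric.__name__,
          "ensemble:", accuracy(symptom_vectors, ensemble_means, metric),
          "eponymic:", accuracy(symptom_vectors, name_vectors, metric))
```

The toy data are too clean to reproduce the gap between the two approaches reported in the abstract; the sketch only shows the mechanics of the comparison.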
Affiliation(s)
- Shahram Yazdani
- Department of Pediatrics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, United States
| | - Ronald Claude Henry
- Department of Civil Engineering, University of Southern California, Los Angeles, CA 90089, United States
| | | | | |
|
17
|
Huang X, Arora J, Erzurumluoglu AM, Stanhope SA, Lam D, Zhao H, Ding Z, Wang Z, de Jong J. Enhancing patient representation learning with inferred family pedigrees improves disease risk prediction. J Am Med Inform Assoc 2025; 32:435-446. [PMID: 39723811 PMCID: PMC11833479 DOI: 10.1093/jamia/ocae297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Revised: 10/29/2024] [Accepted: 11/19/2024] [Indexed: 12/28/2024] Open
Abstract
BACKGROUND Machine learning and deep learning are powerful tools for analyzing electronic health records (EHRs) in healthcare research. Although family health history has been recognized as a major predictor for a wide spectrum of diseases, research has so far adopted a limited view of family relations, essentially treating patients as independent samples in the analysis. METHODS To address this gap, we present ALIGATEHR, which models inferred family relations in a graph attention network augmented with an attention-based medical ontology representation, thus accounting for the complex influence of genetics, shared environmental exposures, and disease dependencies. RESULTS Taking disease risk prediction as a use case, we demonstrate that explicitly modeling family relations significantly improves predictions across the disease spectrum. We then show how ALIGATEHR's attention mechanism, which links patients' disease risk to their relatives' clinical profiles, successfully captures genetic aspects of diseases using longitudinal EHR diagnosis data. Finally, we use ALIGATEHR to successfully distinguish the 2 main inflammatory bowel disease subtypes with highly shared risk factors and symptoms (Crohn's disease and ulcerative colitis). CONCLUSION Overall, our results highlight that family relations should not be overlooked in EHR research and illustrate ALIGATEHR's great potential for enhancing patient representation learning for predictive and interpretable modeling of EHRs.
Affiliation(s)
- Xiayuan Huang
- Department of Biostatistics, Yale University School of Public Health, New Haven, CT 06510, United States
| | - Jatin Arora
- Human Genetics, Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß 88400, Germany
| | - Abdullah Mesut Erzurumluoglu
- Human Genetics, Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß 88400, Germany
| | - Stephen A Stanhope
- Real World Data and Analytics, Global Medical Affairs, Boehringer Ingelheim, Ridgefield, CT 06877, United States
| | - Daniel Lam
- CB CMDR, Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß 88400, Germany
| | - Hongyu Zhao
- Department of Biostatistics, Yale University School of Public Health, New Haven, CT 06510, United States
| | - Zhihao Ding
- Human Genetics, Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß 88400, Germany
| | - Zuoheng Wang
- Department of Biostatistics, Yale University School of Public Health, New Haven, CT 06510, United States
- Department of Biomedical Informatics & Data Science, Yale University School of Medicine, New Haven, CT 06510, United States
| | - Johann de Jong
- Statistical Modeling, Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß 88400, Germany
| |
|
18
|
Zhan Z, Zhou S, Li M, Zhang R. RAMIE: retrieval-augmented multi-task information extraction with large language models on dietary supplements. J Am Med Inform Assoc 2025; 32:545-554. [PMID: 39798153 PMCID: PMC11833482 DOI: 10.1093/jamia/ocaf002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2024] [Revised: 12/20/2024] [Accepted: 01/03/2025] [Indexed: 01/15/2025] Open
Abstract
OBJECTIVE To develop an advanced multi-task large language model (LLM) framework for extracting diverse types of information about dietary supplements (DSs) from clinical records. METHODS We focused on 4 core DS information extraction tasks: named entity recognition (2 949 clinical sentences), relation extraction (4 892 sentences), triple extraction (2 949 sentences), and usage classification (2 460 sentences). To address these tasks, we introduced the retrieval-augmented multi-task information extraction (RAMIE) framework, which incorporates: (1) instruction fine-tuning with task-specific prompts; (2) multi-task training of LLMs to enhance storage efficiency and reduce training costs; and (3) retrieval-augmented generation, which retrieves similar examples from the training set to improve task performance. We compared the performance of RAMIE to LLMs with instruction fine-tuning alone and conducted an ablation study to evaluate the individual contributions of multi-task learning and retrieval-augmented generation to overall performance improvements. RESULTS Using the RAMIE framework, Llama2-13B achieved an F1 score of 87.39 on the named entity recognition task, reflecting a 3.51% improvement. It also excelled in the relation extraction task with an F1 score of 93.74, a 1.15% improvement. For the triple extraction task, Llama2-7B achieved an F1 score of 79.45, representing a significant 14.26% improvement. MedAlpaca-7B delivered the highest F1 score of 93.45 on the usage classification task, with a 0.94% improvement. The ablation study highlighted that while multi-task learning improved efficiency with a minor trade-off in performance, the inclusion of retrieval-augmented generation significantly enhanced overall accuracy across tasks. CONCLUSION The RAMIE framework demonstrates substantial improvements in multi-task information extraction for DS-related data from clinical records.
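The retrieval step mentioned in the abstract (fetching similar training examples to place in the prompt) can be illustrated with a small sketch. The abstract does not say how similarity is measured, so the Jaccard token overlap used here is a stand-in assumption; the example sentences, labels, prompt layout, and function names are likewise hypothetical and not taken from RAMIE.

```python
def tokenize(text):
    """Lowercased whitespace tokens as a set (a deliberately crude tokenizer)."""
    return set(text.lower().split())

def jaccard(a, b):
    """Size of the intersection divided by size of the union of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def retrieve_similar(query, training_set, k=2):
    """Return the k training examples most similar to the query sentence."""
    query_tokens = tokenize(query)
    ranked = sorted(
        training_set,
        key=lambda ex: jaccard(query_tokens, tokenize(ex["sentence"])),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(instruction, query, examples):
    """Assemble an instruction, the retrieved demonstrations, and the query."""
    lines = [instruction, ""]
    for ex in examples:
        lines.append(f"Sentence: {ex['sentence']}")
        lines.append(f"Answer: {ex['label']}")
        lines.append("")
    lines.append(f"Sentence: {query}")
    lines.append("Answer:")
    return "\n".join(lines)

# Hypothetical training examples for a usage-classification style task.
training_examples = [
    {"sentence": "Patient takes fish oil daily for cholesterol", "label": "continuing"},
    {"sentence": "She stopped taking melatonin last month", "label": "discontinued"},
    {"sentence": "Started vitamin D after clinic visit", "label": "started"},
]

query_sentence = "Patient takes vitamin D daily"
retrieved = retrieve_similar(query_sentence, training_examples, k=2)
prompt = build_prompt(
    "Classify the usage status of the dietary supplement.",
    query_sentence,
    retrieved,
)
print(prompt)
```

In practice a dense embedding retriever would likely replace the token-overlap score; the surrounding logic of ranking, truncating to k, and prepending demonstrations stays the same.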
Affiliation(s)
- Zaifu Zhan
- Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455, United States
| | - Shuang Zhou
- Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
| | - Mingchen Li
- Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
| | - Rui Zhang
- Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
| |
|
19
|
Gong M, Jiang Y, Sun Y, Liao R, Liu Y, Yan Z, He A, Zhou M, Yang J, Wu Y, Wu Z, Huang Z, Wu H, Jiang L. Knowledge domain and frontier trends of artificial intelligence applied in solid organ transplantation: A visualization analysis. Int J Med Inform 2025; 195:105782. [PMID: 39761617 DOI: 10.1016/j.ijmedinf.2024.105782] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/09/2024] [Revised: 12/30/2024] [Accepted: 12/30/2024] [Indexed: 02/12/2025]
Abstract
BACKGROUND Solid organ transplantation (SOT) is vital for end-stage organ failure but faces challenges like organ shortage and rejection. Artificial intelligence (AI) offers potential to improve outcomes through better matching, success prediction, and automation. However, the evolution of AI in SOT research remains underexplored. This study uses bibliometric analysis to identify trends, hotspots, and key contributors in the field. METHODS 821 articles from the Web of Science Core Collection were exported for analysis. Microsoft Excel 2021 was used for descriptive statistics. VOSviewer, CiteSpace, Scimago Graphica, and Biblioshiny were used for bibliometric analysis. The ggalluvial package in R was utilized to create Sankey diagrams, and top articles were selected based on citation count. RESULTS This analysis reveals the rapid expansion of AI in SOT. Key areas include robotic surgery, organ allocation, outcome prediction, immunosuppression management, and precision medicine. Robotic surgery has improved transplant outcomes. AI algorithms optimize organ matching and enhance fairness. Machine learning models predict outcomes and guide treatment, while AI-based systems advance personalized immunosuppression. AI in precision medicine, including diagnostics and imaging, is crucial for transplant success. CONCLUSION This study highlights AI's transformative potential in SOT, with significant contributions from countries like the USA, Canada, and the UK. Key institutions such as the University of Toronto and the University of Pittsburgh have played vital roles. However, practical challenges like ethical issues, bias, and data integration remain. Fostering international and interdisciplinary collaborations is crucial for overcoming these challenges and accelerating AI's integration into clinical practice, ultimately improving patient outcomes.
Affiliation(s)
- Miao Gong
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
| | - Yingsong Jiang
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
| | - Yingshuo Sun
- Department of Obstetrics and Gynecology, Jinan Central Hospital of Shandong Province, Jinan, Shandong, China
| | - Rui Liao
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
| | - Yanyao Liu
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
| | - Zikang Yan
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
| | - Aiting He
- Department of Hepatobiliary Pancreatic Tumor Center, Chongqing University Cancer Hospital, Chongqing, China
| | - Mingming Zhou
- Department of Hepatobiliary Pancreatic Tumor Center, Chongqing University Cancer Hospital, Chongqing, China
| | - Jie Yang
- Department of Hepatobiliary Pancreatic Tumor Center, Chongqing University Cancer Hospital, Chongqing, China
| | - Yongzhong Wu
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
| | - Zhongjun Wu
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
| | - ZuoTian Huang
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China; Department of Hepatobiliary Pancreatic Tumor Center, Chongqing University Cancer Hospital, Chongqing, China.
| | - Hao Wu
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China.
| | - Liqing Jiang
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China.
| |
|
20
|
Cabral BP, Braga LAM, Conte Filho CG, Penteado B, Freire de Castro Silva SL, Castro L, Fornazin M, Mota F. Future Use of AI in Diagnostic Medicine: 2-Wave Cross-Sectional Survey Study. J Med Internet Res 2025; 27:e53892. [PMID: 40053779 PMCID: PMC11907171 DOI: 10.2196/53892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 05/06/2024] [Accepted: 10/18/2024] [Indexed: 03/09/2025] Open
Abstract
BACKGROUND The rapid evolution of artificial intelligence (AI) presents transformative potential for diagnostic medicine, offering opportunities to enhance diagnostic accuracy, reduce costs, and improve patient outcomes. OBJECTIVE This study aimed to assess the expected future impact of AI on diagnostic medicine by comparing global researchers' expectations using 2 cross-sectional surveys. METHODS The surveys were conducted in September 2020 and February 2023. Each survey captured a 10-year projection horizon, gathering insights from >3700 researchers with expertise in AI and diagnostic medicine from all over the world. The survey sought to understand the perceived benefits, integration challenges, and evolving attitudes toward AI use in diagnostic settings. RESULTS Results indicated a strong expectation among researchers that AI will substantially influence diagnostic medicine within the next decade. Key anticipated benefits include enhanced diagnostic reliability, reduced screening costs, improved patient care, and decreased physician workload, addressing the growing demand for diagnostic services outpacing the supply of medical professionals. Specifically, x-ray diagnosis, heart rhythm interpretation, and skin malignancy detection were identified as the diagnostic tools most likely to be integrated with AI technologies due to their maturity and existing AI applications. The surveys highlighted the growing optimism regarding AI's ability to transform traditional diagnostic pathways and enhance clinical decision-making processes. Furthermore, the study identified barriers to the integration of AI in diagnostic medicine. The primary challenges cited were the difficulties of embedding AI within existing clinical workflows, ethical and regulatory concerns, and data privacy issues. 
Respondents emphasized uncertainties around legal responsibility and accountability for AI-supported clinical decisions, data protection challenges, and the need for robust regulatory frameworks to ensure safe AI deployment. Ethical concerns, particularly those related to algorithmic transparency and bias, were noted as increasingly critical, reflecting a heightened awareness of the potential risks associated with AI adoption in clinical settings. Differences between the 2 survey waves indicated a growing focus on ethical and regulatory issues, suggesting an evolving recognition of these challenges over time. CONCLUSIONS Despite these barriers, there was notable consistency in researchers' expectations across the 2 survey periods, indicating a stable and sustained outlook on AI's transformative potential in diagnostic medicine. The findings show the need for interdisciplinary collaboration among clinicians, AI developers, and regulators to address ethical and practical challenges while maximizing AI's benefits. This study offers insights into the projected trajectory of AI in diagnostic medicine, guiding stakeholders, including health care providers, policy makers, and technology developers, on navigating the opportunities and challenges of AI integration.
Affiliation(s)
- Bernardo Pereira Cabral
- Cellular Communication Laboratory, Oswaldo Cruz Institute, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
- Department of Economics, Faculty of Economics, Federal University of Bahia, Salvador, Brazil
| | - Luiza Amara Maciel Braga
- Cellular Communication Laboratory, Oswaldo Cruz Institute, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
| | | | - Bruno Penteado
- Fiocruz Strategy for the 2030 Agenda, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
| | - Sandro Luis Freire de Castro Silva
- National Cancer Institute, Rio de Janeiro, Brazil
- Graduate Program in Management and Strategy, Federal Rural University of Rio de Janeiro, Seropedica, Brazil
| | - Leonardo Castro
- Fiocruz Strategy for the 2030 Agenda, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
- National School of Public Health, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
| | - Marcelo Fornazin
- Fiocruz Strategy for the 2030 Agenda, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
- National School of Public Health, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
| | - Fabio Mota
- Cellular Communication Laboratory, Oswaldo Cruz Institute, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
| |
|
21
|
Guo Y, Ma F, Li P, Guo L, Liu Z, Huo C, Shi C, Zhu L, Gu M, Na R, Zhang W. Comprehensive SHAP Values and Single-Cell Sequencing Technology Reveal Key Cell Clusters in Bovine Skeletal Muscle. Int J Mol Sci 2025; 26:2054. [PMID: 40076676 PMCID: PMC11900076 DOI: 10.3390/ijms26052054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2025] [Revised: 02/24/2025] [Accepted: 02/25/2025] [Indexed: 03/14/2025] Open
Abstract
The skeletal muscle of cattle is the main component of their muscular system, responsible for support and movement. However, there are still many unknown areas regarding the ranking of the importance of different types of cell populations within it. This study conducted in-depth research and made a series of significant findings. First, we trained 15 bovine skeletal muscle models and selected the best-performing model as the initial model. Based on the SHAP (Shapley Additive exPlanations) analysis of this initial model, we obtained the SHAP values of 476 important genes. Using the contributions of these 476 genes, we reconstructed a 476-gene SHAP value matrix, and relying solely on the interactions among these 476 genes, successfully mapped the single-cell atlas of bovine skeletal muscle. After retraining the model and further interpretation, we found that Myofiber cells are the most representative cell type in bovine skeletal muscle, followed by neutrophils. By determining the key genes of each cell type through SHAP values, we conducted analyses on the correlations among key genes and between cells for Myofiber cells, revealing the critical role these genes play in muscle growth and development. Further, by using protein language models, we performed cross-species comparisons between cattle and pigs, deepening our understanding of Myofiber cells as key cells in skeletal muscle, and exploring the common regulatory mechanisms of muscle development across species.
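SHAP values are built on Shapley values from cooperative game theory. As a rough illustration of the quantity involved, the sketch below computes exact Shapley values for a three-feature toy model by enumerating every feature subset. The toy model, the zero baseline for "absent" features, and the function names are assumptions made for this example; the study's own pipeline works on hundreds of genes, where such exhaustive enumeration would not be feasible and approximate explainers are used instead.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values: for each feature, the weighted average of its
    marginal contribution over all subsets of the other features."""
    n = n_features
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

def toy_model(x):
    """Hypothetical model with one interaction term (features 0 and 2)."""
    return 2.0 * x[0] + 1.0 * x[1] + x[0] * x[2]

instance = [1.0, 1.0, 1.0]   # the input being explained
baseline = [0.0, 0.0, 0.0]   # value used for features treated as absent

def coalition_value(subset):
    """Model output when only the features in `subset` take their real values."""
    x = [instance[j] if j in subset else baseline[j] for j in range(len(instance))]
    return toy_model(x)

phi = shapley_values(coalition_value, n_features=3)
print(phi)
```

The attributions sum to the difference between the model output for the instance and for the baseline, and the interaction term is split evenly between the two features that take part in it.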
Affiliation(s)
- Yaqiang Guo
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot 010010, China; (Y.G.); (F.M.); (L.G.); (C.H.); (C.S.); (L.Z.); (M.G.)
- Inner Mongolia Engineering Research Center of Genomic Big Data for Agriculture, Hohhot 010018, China
| | - Fengying Ma
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot 010010, China; (Y.G.); (F.M.); (L.G.); (C.H.); (C.S.); (L.Z.); (M.G.)
- Inner Mongolia Engineering Research Center of Genomic Big Data for Agriculture, Hohhot 010018, China
| | - Peipei Li
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot 010010, China; (Y.G.); (F.M.); (L.G.); (C.H.); (C.S.); (L.Z.); (M.G.)
| | - Lili Guo
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot 010010, China; (Y.G.); (F.M.); (L.G.); (C.H.); (C.S.); (L.Z.); (M.G.)
| | - Zaixia Liu
- College of Life Sciences, Inner Mongolia University, Hohhot 010020, China;
| | - Chenxi Huo
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot 010010, China; (Y.G.); (F.M.); (L.G.); (C.H.); (C.S.); (L.Z.); (M.G.)
| | - Caixia Shi
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot 010010, China; (Y.G.); (F.M.); (L.G.); (C.H.); (C.S.); (L.Z.); (M.G.)
| | - Lin Zhu
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot 010010, China; (Y.G.); (F.M.); (L.G.); (C.H.); (C.S.); (L.Z.); (M.G.)
| | - Mingjuan Gu
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot 010010, China; (Y.G.); (F.M.); (L.G.); (C.H.); (C.S.); (L.Z.); (M.G.)
| | - Risu Na
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot 010010, China; (Y.G.); (F.M.); (L.G.); (C.H.); (C.S.); (L.Z.); (M.G.)
| | - Wenguang Zhang
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot 010010, China; (Y.G.); (F.M.); (L.G.); (C.H.); (C.S.); (L.Z.); (M.G.)
| |
|
22
|
Li R. Integrative diagnosis of psychiatric conditions using ChatGPT and fMRI data. BMC Psychiatry 2025; 25:145. [PMID: 39972267 PMCID: PMC11837688 DOI: 10.1186/s12888-025-06586-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2025] [Accepted: 02/06/2025] [Indexed: 02/21/2025] Open
Abstract
BACKGROUND Traditional diagnostic methods for psychiatric disorders often rely on subjective assessments, leading to inconsistent diagnoses. Integrating advanced natural language processing (NLP) techniques with neuroimaging data may improve diagnostic accuracy. METHODS We propose a novel approach that uses ChatGPT to conduct interactive patient interviews, capturing nuanced emotional and psychological data. By analyzing these dialogues using NLP, we generate a comprehensive feature matrix. This matrix, combined with 4D fMRI data, is input into a neural network to predict psychiatric diagnoses. We conducted comparative analysis with survey-based and app-based methods, providing detailed statistical validation. RESULTS Our model achieved an accuracy of 85.7%, significantly outperforming traditional methods. Statistical analysis confirmed the superiority of the ChatGPT-based approach in capturing nuanced patient information, with p-values indicating significant improvements over baseline models. CONCLUSIONS Integrating NLP-driven patient interactions with fMRI data offers a promising approach to psychiatric diagnosis, enhancing precision and reliability. This method could advance clinical practice by providing a more objective and comprehensive diagnostic tool, although more research is needed to generalize these findings.
Affiliation(s)
- Runda Li
- Vanderbilt University, 2301 Vanderbilt Place, Nashville, 37235, TN, USA.
|
23
|
Wang B, Sheu YH, Lee H, Mealer RG, Castro VM, Smoller JW. Prediction of early-onset bipolar using electronic health records. J Child Psychol Psychiatry 2025. [PMID: 39967306 DOI: 10.1111/jcpp.14131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/11/2024] [Indexed: 02/20/2025]
Abstract
BACKGROUND Early identification of bipolar disorder (BD) provides an important opportunity for timely intervention. In this study, we aimed to develop machine learning models using large-scale electronic health record (EHR) data including clinical notes for predicting early-onset BD. METHODS Structured and unstructured data were extracted from the longitudinal EHR of the Mass General Brigham health system. We defined three cohorts aged 10-25 years: (1) the full youth cohort (N = 300,398); (2) a subcohort defined by having a mental health visit (N = 105,461); and (3) a subcohort defined by having a diagnosis of mood disorder or ADHD (N = 35,213). By adopting a prospective landmark modeling approach that aligns with clinical practice, we developed and validated a range of machine learning models, across different cohorts and prediction windows. RESULTS We found the two tree-based models, random forests (RF) and light gradient-boosting machine (LGBM), achieving good discriminative performance across different clinical settings (area under the receiver operating characteristic curve 0.76-0.88 for RF and 0.74-0.89 for LGBM). In addition, we showed comparable performance can be achieved with a greatly reduced set of features, demonstrating computational efficiency can be attained without significant compromise of model accuracy. CONCLUSIONS Good discriminative performance for models predicting early-onset BD can be achieved utilizing large-scale EHR data. Our study offers a scalable and accurate method for identifying youth at risk for BD that could help inform clinical decision-making and facilitate early intervention. Future work includes evaluating the portability of our approach to other healthcare systems and exploring considerations regarding possible implementation.
Affiliation(s)
- Bo Wang
- Center for Precision Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Yi-Han Sheu
- Center for Precision Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Hyunjoon Lee
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Robert G Mealer
- Department of Psychiatry, Oregon Health & Science University, Portland, OR, USA
| | - Victor M Castro
- Research Information Science and Computing, Mass General Brigham, Somerville, MA, USA
| | - Jordan W Smoller
- Center for Precision Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
|
24
|
Miranda O, Jiang C, Qi X, Kofler J, Sweet RA, Wang L. Exploring Potential Medications for Alzheimer's Disease with Psychosis by Integrating Drug Target Information into Deep Learning Models: A Data-Driven Approach. Int J Mol Sci 2025; 26:1617. [PMID: 40004081 PMCID: PMC11855865 DOI: 10.3390/ijms26041617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2024] [Revised: 01/24/2025] [Accepted: 02/11/2025] [Indexed: 02/27/2025] Open
Abstract
Approximately 50% of Alzheimer's disease (AD) patients develop psychotic symptoms, leading to a subtype known as psychosis in AD (AD + P), which is associated with accelerated cognitive decline compared to AD without psychosis. Currently, no FDA-approved medication specifically addresses AD + P. This study aims to improve psychosis predictions and identify potential therapeutic agents using the DeepBiomarker deep learning model by incorporating drug-target interactions. Electronic health records from the University of Pittsburgh Medical Center were analyzed to predict psychosis within three months of AD diagnosis. AD + P patients were classified as those with either a formal psychosis diagnosis or antipsychotic prescriptions post-AD diagnosis. Two approaches were employed as follows: (1) a drug-focused method using individual medications and (2) a target-focused method pooling medications by shared targets. The updated DeepBiomarker model achieved an area under the receiver operating curve (AUROC) above 0.90 for psychosis prediction. A drug-focused analysis identified gabapentin, amlodipine, levothyroxine, and others as potentially beneficial. A target-focused analysis highlighted significant proteins, including integrins, calcium channels, and tyrosine hydroxylase, confirming several medications linked to these targets. Integrating drug-target information into predictive models improves the identification of medications for AD + P risk reduction, offering a promising strategy for therapeutic development.
Affiliation(s)
- Oshin Miranda
- Computational Chemical Genomics Screening Center, Department of Pharmaceutical Sciences, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15213, USA; (O.M.); (C.J.); (X.Q.)
| | - Chen Jiang
- Computational Chemical Genomics Screening Center, Department of Pharmaceutical Sciences, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15213, USA; (O.M.); (C.J.); (X.Q.)
| | - Xiguang Qi
- Computational Chemical Genomics Screening Center, Department of Pharmaceutical Sciences, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15213, USA; (O.M.); (C.J.); (X.Q.)
| | - Julia Kofler
- Division of Neuropathology, Department of Pathology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Alzheimer Disease Research Center, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Robert A. Sweet
- Alzheimer Disease Research Center, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Department of Psychiatry, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Lirong Wang
- Computational Chemical Genomics Screening Center, Department of Pharmaceutical Sciences, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15213, USA; (O.M.); (C.J.); (X.Q.)
|
25
|
Ding H, Xia W, Zhou Y, Wei L, Feng Y, Wang Z, Song X, Li R, Mao Q, Chen B, Wang H, Huang X, Zhu B, Jiang D, Sun J, Dong G, Jiang F. Evaluation and practical application of prompt-driven ChatGPTs for EMR generation. NPJ Digit Med 2025; 8:77. [PMID: 39894840 PMCID: PMC11788423 DOI: 10.1038/s41746-025-01472-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2024] [Accepted: 01/19/2025] [Indexed: 02/04/2025] Open
Abstract
This study investigates the application of prompt engineering to optimize prompt-driven ChatGPT for generating electronic medical records (EMRs) during lung nodule screening. We assessed the performance of ChatGPT in generating EMRs from patient-provider verbal consultations and integrated this approach into practical tools, such as WeChat mini-programs, accessible to patients before hospital visits. The findings highlight ChatGPT's potential to enhance workflow efficiency and improve diagnostic processes in clinical settings.
Affiliation(s)
- Hanlin Ding
- Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, 21009, Nanjing, China
- Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing, China
- The Fourth Clinical College of Nanjing Medical University, Nanjing, China
| | - Wenjie Xia
- Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, 21009, Nanjing, China
- Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing, China
| | - Yujia Zhou
- The Second Clinical Medical School of Nanjing Medical University, Nanjing, China
| | - Lei Wei
- Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing, China
- The Fourth Clinical College of Nanjing Medical University, Nanjing, China
- Department of Cardiothoracic Surgery, Jinling Hospital, Nanjing University School of Medicine, Nanjing, China
| | - Yipeng Feng
- Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, 21009, Nanjing, China
- Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing, China
- The Fourth Clinical College of Nanjing Medical University, Nanjing, China
| | - Zi Wang
- Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, 21009, Nanjing, China
- Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing, China
- The Fourth Clinical College of Nanjing Medical University, Nanjing, China
| | - Xuming Song
- Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, 21009, Nanjing, China
- Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing, China
- The Fourth Clinical College of Nanjing Medical University, Nanjing, China
| | - Rutao Li
- Department of Thoracic Surgery, Dushu Lake Hospital Affiliated to Soochow University, Suzhou, China
| | - Qixing Mao
- Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, 21009, Nanjing, China
- Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing, China
| | - Bing Chen
- Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, 21009, Nanjing, China
- Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing, China
| | - Hui Wang
- Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, 21009, Nanjing, China
- Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing, China
| | - Xing Huang
- Pathological Department of Jiangsu Cancer Hospital, Nanjing, P. R. China
| | - Bin Zhu
- Hospital Development Management Office, Nanjing Medical University, Nanjing, China
| | - Dongyu Jiang
- Department of Orthopedics, Wuxi People's Hospital Affiliated to Nanjing Medical University, Wuxi, China
| | - Jingyu Sun
- Department of Cardiology, First Affiliated Hospital of Nanjing Medical University, Jiangsu Province Hospital, Nanjing, China
| | - Gaochao Dong
- Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, 21009, Nanjing, China.
- Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing, China.
- The Fourth Clinical College of Nanjing Medical University, Nanjing, China.
| | - Feng Jiang
- Department of Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, 21009, Nanjing, China.
- Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Cancer Institute of Jiangsu Province, Nanjing, China.
- The Fourth Clinical College of Nanjing Medical University, Nanjing, China.
|
26
|
Luo J, Huang S, Lan L, Yang S, Cao T, Yin J, Qiu J, Yang X, Guo Y, Zhou X. EMR-LIP: A lightweight framework for standardizing the preprocessing of longitudinal irregular data in electronic medical records. Computer Methods and Programs in Biomedicine 2025; 259:108521. [PMID: 39615196 DOI: 10.1016/j.cmpb.2024.108521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Revised: 10/26/2024] [Accepted: 11/18/2024] [Indexed: 12/11/2024]
Abstract
OBJECTIVE Longitudinal data from Electronic Medical Records (EMRs) are increasingly utilized to construct predictive models for various clinical tasks, offering enhanced insights into patient health. However, significant discrepancies exist in preprocessing the irregular and intricate EMR data across studies due to the absence of universally accepted tools and standardization methods. This study introduces the Electronic Medical Record Longitudinal Irregular Data Preprocessing (EMR-LIP) framework, a lightweight approach for optimizing the preprocessing of longitudinal, irregular EMR data, aiming to enhance research efficiency, consistency, reproducibility, and comparability. MATERIALS AND METHODS EMR-LIP modularizes the preprocessing of longitudinal irregular EMR data, offering tools with a low level of encapsulation. Compared to other pipelines, EMR-LIP categorizes variables in a more granular manner, designing specific preprocessing techniques for each type. To demonstrate its versatility, EMR-LIP was applied in an empirical study to two public EMR databases, MIMIC-IV and eICU-CRD. Data processed with EMR-LIP was then used to test several renowned deep learning models on a range of commonly used benchmark tasks. RESULTS In both the MIMIC-IV and eICU-CRD databases, models based on EMR-LIP showed superior baseline performance compared to previous studies. Interestingly, using data preprocessed by EMR-LIP, traditional models such as LSTM and GRU outperformed more complex models, achieving an AUROC of up to 0.94 for in-hospital death prediction. Additionally, models based on EMR-LIP showed stable performance across various resampling intervals and exhibited better fairness in performance across different ethnic groups. CONCLUSION EMR-LIP streamlines the preprocessing of irregular longitudinal EMR data, offering an end-to-end solution for model-ready data creation, and has been open-sourced for collaborative refinement by the research community.
Affiliation(s)
- Jiawei Luo
- Department of Cardiovascular Surgery and West China Biomedical Big Data Center, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, Sichuan, 610041, China; Med-X Center for Informatics, Sichuan University, Chengdu, 610041, China.
| | - Shixin Huang
- Department of Scientific Research, The People's Hospital of Yubei District of Chongqing, Chongqing, 401120, China; School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
| | - Lan Lan
- IT Center, Beijing Tiantan Hospital, Capital Medical University, Beijing, 100070, China.
| | - Shu Yang
- College of Medical Information Engineering, Chengdu University of Traditional Chinese Medicine, Chengdu, 610075, China.
| | - Tingqian Cao
- Integrated Care Management Center, West China Hospital, Sichuan University, Chengdu 610041, China.
| | - Jin Yin
- Department of Cardiovascular Surgery and West China Biomedical Big Data Center, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, Sichuan, 610041, China; Med-X Center for Informatics, Sichuan University, Chengdu, 610041, China.
| | - Jiajun Qiu
- Department of Cardiovascular Surgery and West China Biomedical Big Data Center, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, Sichuan, 610041, China; Med-X Center for Informatics, Sichuan University, Chengdu, 610041, China.
| | - Xiaoyan Yang
- Department of Cardiovascular Surgery and West China Biomedical Big Data Center, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, Sichuan, 610041, China; Med-X Center for Informatics, Sichuan University, Chengdu, 610041, China.
| | - Yingqiang Guo
- Department of Cardiovascular Surgery, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, Sichuan, 610041, China.
| | - Xiaobo Zhou
- Center for Computational Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, 77030, USA.
|
27
|
Lindenmeyer A, Blattmann M, Franke S, Neumuth T, Schneider D. Towards Trustworthy AI in Healthcare: Epistemic Uncertainty Estimation for Clinical Decision Support. J Pers Med 2025; 15:58. [PMID: 39997335 PMCID: PMC11856777 DOI: 10.3390/jpm15020058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2024] [Revised: 01/20/2025] [Accepted: 01/23/2025] [Indexed: 02/26/2025] Open
Abstract
Introduction: Widespread adoption of AI for medical decision-making is still hindered by ethical and safety-related concerns. For AI-based decision support systems in healthcare settings, it is paramount to be reliable and trustworthy. Common deep learning approaches, however, tend towards overconfidence when faced with unfamiliar or changing conditions. Inappropriate extrapolation beyond well-supported scenarios may have dire consequences, highlighting the importance of the reliable estimation of local knowledge uncertainty and its communication to the end user. Materials and Methods: While neural network ensembles (ENNs) have been heralded as a potential solution to these issues for many years, deep learning methods that specifically model the amount of knowledge promise more principled and reliable behavior. This study compares their reliability in clinical applications. We centered our analysis on experiments with low-dimensional toy datasets and the exemplary case study of mortality prediction for intensive care unit hospitalizations using Electronic Health Records (EHRs) from the MIMIC3 study. For predictions on the EHR time series, Encoder-Only Transformer models were employed. Knowledge uncertainty estimation is achieved with both ensemble and Spectral Normalized Neural Gaussian Process (SNGP) variants of the common Transformer model. We designed two datasets to test their reliability in detecting token-level and more subtle discrepancies, both for toy datasets and an EHR dataset. Results: While both SNGP and ENN model variants achieve similar prediction performance (AUROC: ≈0.85, AUPRC: ≈0.52 for in-hospital mortality prediction from a selected MIMIC3 benchmark), the former demonstrates improved capabilities to quantify knowledge uncertainty for individual samples/patients. Discussion/Conclusions: Methods including a knowledge model, such as SNGP, offer superior uncertainty estimation compared to traditional stochastic deep learning, leading to more trustworthy and safe clinical decision support.
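As an illustration of what ensemble-based knowledge (epistemic) uncertainty can look like in practice, the sketch below splits the uncertainty of a binary prediction into an aleatoric and an epistemic part using the disagreement between ensemble members. This entropy-based decomposition is a common textbook choice and is an assumption here; the abstract does not state which measure the authors used. The function names and the example probabilities are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy (in nats) of a Bernoulli(p) prediction."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def ensemble_uncertainty(member_probs: list[float]) -> dict[str, float]:
    """Decompose predictive uncertainty for one patient.

    member_probs holds each ensemble member's predicted probability of the
    positive class (hypothetical values, e.g. in-hospital mortality).
    """
    n = len(member_probs)
    mean_prob = sum(member_probs) / n
    # Total uncertainty: entropy of the averaged prediction.
    total = binary_entropy(mean_prob)
    # Aleatoric part: average entropy of the individual members.
    aleatoric = sum(binary_entropy(p) for p in member_probs) / n
    # Epistemic part: what remains, i.e. the disagreement between members.
    return {
        "mean_prob": mean_prob,
        "total": total,
        "aleatoric": aleatoric,
        "epistemic": total - aleatoric,
    }


# Members agree: epistemic uncertainty is (numerically) zero.
agree = ensemble_uncertainty([0.3, 0.3, 0.3])
# Members disagree: epistemic uncertainty is clearly positive.
disagree = ensemble_uncertainty([0.05, 0.95, 0.5])
```

Under this measure, a patient on whom all members agree has near-zero epistemic uncertainty even if the shared prediction is itself uncertain, whereas strong disagreement between members flags a case the ensemble has little knowledge about.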
Affiliation(s)
- Adrian Lindenmeyer
- Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden/Leipzig, Leipzig University, Humboldtstraße 25, 04105 Leipzig, Germany
- Innovation Center Computer Assisted Surgery (ICCAS), Leipzig University, Semmelweisstrasse 14, 04103 Leipzig, Germany
| | - Malte Blattmann
- Innovation Center Computer Assisted Surgery (ICCAS), Leipzig University, Semmelweisstrasse 14, 04103 Leipzig, Germany
| | - Stefan Franke
- Innovation Center Computer Assisted Surgery (ICCAS), Leipzig University, Semmelweisstrasse 14, 04103 Leipzig, Germany
| | - Thomas Neumuth
- Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden/Leipzig, Leipzig University, Humboldtstraße 25, 04105 Leipzig, Germany
- Innovation Center Computer Assisted Surgery (ICCAS), Leipzig University, Semmelweisstrasse 14, 04103 Leipzig, Germany
| | - Daniel Schneider
- Innovation Center Computer Assisted Surgery (ICCAS), Leipzig University, Semmelweisstrasse 14, 04103 Leipzig, Germany
|
28
|
Basubrin O. Current Status and Future of Artificial Intelligence in Medicine. Cureus 2025; 17:e77561. [PMID: 39958114 PMCID: PMC11830112 DOI: 10.7759/cureus.77561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/30/2024] [Indexed: 02/18/2025] Open
Abstract
Artificial intelligence (AI) has rapidly emerged as a transformative force in medicine, revolutionizing various aspects of healthcare from diagnostics and treatment to public health and patient care. This narrative review synthesizes evidence from diverse study designs, exploring the current and future applications of AI in medicine. We highlight AI's role in improving diagnostic accuracy, optimizing treatment strategies, and enhancing patient care through personalized interventions and remote monitoring, drawing upon recent advancements and landmark studies. Emerging trends such as explainable AI and federated learning are also examined. While acknowledging the tremendous potential of AI in medicine, the review also addresses the barriers and ethical challenges that need to be overcome, including concerns about algorithmic bias, transparency, over-reliance, and the potential impact on the healthcare workforce. We emphasize the importance of establishing regulatory guidelines, fostering collaboration between clinicians and AI developers, and ensuring ongoing education for healthcare professionals. Despite these challenges, the future of AI in medicine holds immense promise, with the potential to significantly improve patient outcomes, transform healthcare delivery, and address healthcare disparities.
Affiliation(s)
- Omar Basubrin
- Department of Medicine, Umm Al-Qura University, Makkah, SAU
|
29
|
Obeidat R, Alsmadi I, Baker QB, Al-Njadat A, Srinivasan S. Researching public health datasets in the era of deep learning: a systematic literature review. Health Informatics J 2025; 31:14604582241307839. [PMID: 39794941 DOI: 10.1177/14604582241307839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2025]
Abstract
Objective: Explore deep learning applications in predictive analytics for public health data, identify challenges and trends, and then understand the current landscape. Materials and Methods: A systematic literature review was conducted in June 2023 to search articles on public health data in the context of deep learning, published from the inception of medical and computer science databases through June 2023. The review focused on diverse datasets, abstracting applications, challenges, and advancements in deep learning. Results: 2004 articles were reviewed, identifying 14 disease categories. Observed trends include explainable-AI, patient embedding learning, and integrating different data sources and employing deep learning models in health informatics. Noted challenges were technical reproducibility and handling sensitive data. Discussion: There has been a notable surge in deep learning applications on public health data publications since 2015. Consistent deep learning applications and models continue to be applied across public health data. Despite the wide applications, a standard approach still does not exist for addressing the outstanding challenges and issues in this field. Conclusion: Guidelines are needed for applying deep learning and models in public health data to improve FAIRness, efficiency, transparency, comparability, and interoperability of research. Interdisciplinary collaboration among data scientists, public health experts, and policymakers is needed to harness the full potential of deep learning.
Affiliation(s)
- Rand Obeidat
- Department of Management Information Systems, Bowie State University, Bowie, USA
| | - Izzat Alsmadi
- Department of Computational, Engineering and Mathematical Sciences, Texas A & M San Antonio, San Antonio, USA
| | - Qanita Bani Baker
- Department of Computer Science, Jordan University of Science and Technology, Irbid, Jordan
| | | | - Sriram Srinivasan
- Department of Management Information Systems, Bowie State University, Bowie, USA
|
30
|
Ozdemir C, Olaimat MA, Bozdag S. A Dynamic Model for Early Prediction of Alzheimer's Disease by Leveraging Graph Convolutional Networks and Tensor Algebra. Pacific Symposium on Biocomputing 2025; 30:675-689. [PMID: 39670404 PMCID: PMC11649016 DOI: 10.1142/9789819807024_0048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2025]
Abstract
Alzheimer's disease (AD) is a neurocognitive disorder that deteriorates memory and impairs cognitive functions. Mild Cognitive Impairment (MCI) is generally considered an intermediate phase between normal cognitive aging and more severe conditions such as AD. Although not all individuals with MCI will develop AD, they are at an increased risk of developing AD. Diagnosing AD once strong symptoms are already present is of limited value, as AD leads to irreversible cognitive decline and brain damage. Thus, it is crucial to develop methods for the early prediction of AD in individuals with MCI. Recurrent Neural Networks (RNN)-based methods have been effectively used to predict the progression from MCI to AD by analyzing electronic health records (EHR). However, despite their widespread use, existing RNN-based tools may introduce increased model complexity and often face difficulties in capturing long-term dependencies. In this study, we introduced a novel Dynamic deep learning model for Early Prediction of AD (DyEPAD) to predict MCI subjects' progression to AD utilizing EHR data. In the first phase of DyEPAD, embeddings for each time step or visit are captured through Graph Convolutional Networks (GCN) and aggregation functions. In the final phase, DyEPAD employs tensor algebraic operations for frequency domain analysis of these embeddings, capturing the full scope of evolutionary patterns across all time steps. Our experiments on the Alzheimer's Disease Neuroimaging Initiative (ADNI) and National Alzheimer's Coordinating Center (NACC) datasets demonstrate that our proposed model outperforms or is on par with the state-of-the-art and baseline methods.
Affiliation(s)
- Cagri Ozdemir
- Department of Computer Science and Engineering, University of North Texas, TX, USA
| | - Mohammad Al Olaimat
- Department of Computer Science and Engineering, University of North Texas, TX, USA
| | - Serdar Bozdag
- Department of Computer Science and Engineering, University of North Texas, TX, USA
|
31
|
Dhurandhar D, Dhamande M, C S, Bhadoria P, Chandrakar T, Agrawal J. Exploring Medical Artificial Intelligence Readiness Among Future Physicians: Insights From a Medical College in Central India. Cureus 2025; 17:e76835. [PMID: 39897272 PMCID: PMC11787952 DOI: 10.7759/cureus.76835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/02/2025] [Indexed: 02/04/2025] Open
Abstract
INTRODUCTION Medical students, as future healthcare professionals, are pivotal in the adoption and application of artificial intelligence (AI) in clinical settings. Their ability to effectively engage with AI technologies is shaped by their understanding, attitudes, and perceived significance of AI in medicine. Given the growing prominence of AI in the medical field, it is crucial to evaluate how well-prepared medical students are to integrate and use these technologies proficiently. MATERIALS AND METHODS A cross-sectional study was conducted among 482 undergraduate medical students at a medical college in Central India to evaluate their readiness for the integration of medical AI into their future clinical practice, utilizing the Medical Artificial Intelligence Readiness Scale for Medical Students (MAIRS-MS) questionnaire. RESULTS The mean age of respondents was 21.39 ± 1.770 years, with 282 (58.5%) male participants. The respondents were almost equally distributed among all Bachelor of Medicine and Bachelor of Surgery (MBBS) batches. The average MAIRS-MS score was 74.61 ± 10.137 out of a maximum of 110, whereas the mean values of the MAIRS-MS subscales were as follows: Cognition Factor, 26.23 ± 4.417; Ability Factor, 27.62 ± 4.372; Vision Factor, 10.37 ± 1.803; and Ethics Factor, 10.39 ± 1.789. CONCLUSION Although there is overall readiness for AI among the respondents, significant variation exists among individuals, especially in the areas of Cognition and Ability. The data highlight the necessity for focused educational programs to improve AI knowledge, skills, and ethical understanding, ensuring that every respondent is well-equipped to handle the advancing field of AI in medicine.
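The reported scores can be cross-checked with simple arithmetic. The sketch below assumes that the total MAIRS-MS score is the sum of the four subscale scores (an assumption, though the reported means are consistent with it), in which case the subscale means must add up to the total mean. The variable names are illustrative only.

```python
# Subscale means as reported in the abstract.
subscale_means = {
    "Cognition": 26.23,
    "Ability": 27.62,
    "Vision": 10.37,
    "Ethics": 10.39,
}
reported_total_mean = 74.61
max_score = 110  # maximum possible score stated in the abstract

# Means are additive, so the subscale means should sum to the total mean
# if the total score is the sum of the subscales (assumption).
total_from_subscales = round(sum(subscale_means.values()), 2)

# Mean score expressed as a fraction of the maximum attainable score.
share_of_max = reported_total_mean / max_score

print(f"Sum of subscale means: {total_from_subscales}")
print(f"Share of maximum score: {share_of_max:.1%}")
```

The subscale means sum exactly to the reported total mean, and the average respondent scored roughly two thirds of the maximum.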
Affiliation(s)
- Diwakar Dhurandhar
- Anatomy, Pt. Jawahar Lal Nehru Memorial (JNM) Medical College, Raipur, IND
| | | | - Shivaleela C
- Anatomy, Sri Siddhartha Medical College, Tumkur, IND
| | - Pooja Bhadoria
- Anatomy, All India Institute of Medical Sciences, Rishikesh, IND
| | - Tripti Chandrakar
- Community Medicine, Pt. Jawahar Lal Nehru Memorial (JNM) Medical College, Raipur, IND
| | - Jagriti Agrawal
- Anatomy, Pt. Jawahar Lal Nehru Memorial (JNM) Medical College, Raipur, IND
|
32
|
Mușat F, Păduraru DN, Bolocan A, Palcău CA, Copăceanu AM, Ion D, Jinga V, Andronic O. Machine Learning Models in Sepsis Outcome Prediction for ICU Patients: Integrating Routine Laboratory Tests-A Systematic Review. Biomedicines 2024; 12:2892. [PMID: 39767798 PMCID: PMC11727033 DOI: 10.3390/biomedicines12122892] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/26/2024] [Revised: 12/11/2024] [Accepted: 12/15/2024] [Indexed: 01/16/2025] Open
Abstract
Background. Sepsis presents significant diagnostic and prognostic challenges, and traditional scoring systems, such as SOFA and APACHE, show limitations in predictive accuracy. Machine learning (ML)-based predictive survival models can support risk assessment and treatment decision-making in the intensive care unit (ICU) by accounting for the numerous and complex factors that influence the outcome in the septic patient. Methods. A systematic literature review of studies published from 2014 to 2024 was conducted using the PubMed database. Eligible studies investigated the development of ML models incorporating commonly available laboratory and clinical data for predicting survival outcomes in adult ICU patients with sepsis. Study selection followed the PRISMA guidelines and relied on predefined inclusion criteria. All records were independently assessed by two reviewers, with conflicts resolved by a third senior reviewer. Data related to study design, methodology, results, and interpretation of the results were extracted in a predefined grid. Results. Overall, 19 studies were identified, encompassing primarily logistic regression, random forests, and neural networks. The most frequently used datasets were US-based (MIMIC-III, MIMIC-IV, and eICU-CRD). The most common variables used in model development were age, albumin levels, lactate levels, and ventilator use. ML models demonstrated superior performance metrics compared to conventional methods and traditional scoring systems. The best-performing model was a gradient boosting decision tree, with an area under the curve of 0.992, an accuracy of 0.954, and a sensitivity of 0.917. However, several critical limitations should be carefully considered when interpreting the results, such as population selection bias (i.e., single-center studies), small sample sizes, limited external validation, and model interpretability. Conclusions. Through real-time integration of routine laboratory and clinical data, ML-based tools can assist clinical decision-making and enhance the consistency and quality of sepsis management across various healthcare contexts, including ICUs with limited resources.
Affiliation(s)
- Florentina Mușat
- Carol Davila University of Medicine and Pharmacy, Faculty of Medicine, General Surgery Department, University Emergency Hospital of Bucharest, 050098 Bucharest, Romania; (F.M.); (A.B.); (C.A.P.); (D.I.); (O.A.)
| | - Dan Nicolae Păduraru
- Carol Davila University of Medicine and Pharmacy, Faculty of Medicine, General Surgery Department, University Emergency Hospital of Bucharest, 050098 Bucharest, Romania; (F.M.); (A.B.); (C.A.P.); (D.I.); (O.A.)
| | - Alexandra Bolocan
- Carol Davila University of Medicine and Pharmacy, Faculty of Medicine, General Surgery Department, University Emergency Hospital of Bucharest, 050098 Bucharest, Romania; (F.M.); (A.B.); (C.A.P.); (D.I.); (O.A.)
| | - Cosmin Alexandru Palcău
- Carol Davila University of Medicine and Pharmacy, Faculty of Medicine, General Surgery Department, University Emergency Hospital of Bucharest, 050098 Bucharest, Romania; (F.M.); (A.B.); (C.A.P.); (D.I.); (O.A.)
| | - Andreea-Maria Copăceanu
- Bucharest University of Economic Studies, Faculty of Cybernetics, Statistics and Informatics, 010374 Bucharest, Romania;
| | - Daniel Ion
- Carol Davila University of Medicine and Pharmacy, Faculty of Medicine, General Surgery Department, University Emergency Hospital of Bucharest, 050098 Bucharest, Romania; (F.M.); (A.B.); (C.A.P.); (D.I.); (O.A.)
| | - Viorel Jinga
- Carol Davila University of Medicine and Pharmacy, Faculty of Medicine, Urology Department, “Prof. Dr. Th. Burghele” Clinical Hospital, 061344 Bucharest, Romania;
| | - Octavian Andronic
- Carol Davila University of Medicine and Pharmacy, Faculty of Medicine, General Surgery Department, University Emergency Hospital of Bucharest, 050098 Bucharest, Romania; (F.M.); (A.B.); (C.A.P.); (D.I.); (O.A.)
- Innovation and eHealth Center, Carol Davila University of Medicine and Pharmacy Bucharest, 010451 Bucharest, Romania
|
33
|
Abbas S, Iftikhar M, Shah MM, Khan SJ. ChatGPT-Assisted Machine Learning for Chronic Disease Classification and Prediction: A Developmental and Validation Study. Cureus 2024; 16:e75851. [PMID: 39822450 PMCID: PMC11736518 DOI: 10.7759/cureus.75851] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/17/2024] [Indexed: 01/19/2025] Open
Abstract
Background Chronic diseases such as chronic kidney disease (CKD), chronic liver disease (CLD), tuberculosis (TB), dementia, and heart disease are global health concerns of significant importance, representing major causes of morbidity and mortality worldwide. Early diagnosis and interventions are critical to improve patient outcomes and reduce healthcare costs. Methods This prospective observational study analyzed clinical data from 270 patients (calculated using G*Power 3.1.9.7 analysis (Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany), α = 0.05, power = 0.80), with 260 (96.3%) completing the protocol. The cohort comprised 149 (55.2%) males and 121 (44.8%) females, distributed across CKD (n=55, 21.2%), CLD (n=52, 20.0%), TB (n=51, 19.6%), dementia (n=50, 19.2%), and heart disease (n=52, 20.0%). Three ML models were employed with ChatGPT version 3.5 assistance (OpenAI, San Francisco, CA, USA) in feature selection and hyperparameter optimization: logistic regression, random forest, and support vector machines. Model performance was evaluated using accuracy, sensitivity, specificity, precision, recall, F1-score, and AUC-ROC metrics. Ten-fold cross-validation was applied to ensure robustness. Results The random forest model demonstrated superior performance, achieving the highest accuracy in predicting CKD (47/55, 85.3%, p < 0.001, sensitivity 45/55, 82.5%, specificity 48/55, 87.2%) and heart disease (46/52, 88.2%, p < 0.001, sensitivity 45/52, 85.7%, specificity 47/52, 90.1%). Logistic regression effectively predicted TB (41/51, 80.1%, p < 0.01) and dementia (41/50, 82.4%, p < 0.01). Key predictive parameters included hemoglobin (median 10.2 g/dL, IQR 8.4-12.6) and erythrocyte sedimentation rate (median 42.0 mm/hr, IQR 20.0-65.0). Model validation showed high consistency, with positive acid-fast bacilli in 40/51 (78.4%) TB cases and characteristic radiological findings in 43/51 (84.3%) cases. 
Conclusion ML algorithms, particularly random forest, show promising potential in chronic disease classification and prediction. The integration of ChatGPT enhanced model development through optimized feature selection and hyperparameter tuning. Future research should focus on external validation through multi-center studies and prospective clinical trials.
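The evaluation metrics named in the Methods (accuracy, sensitivity, specificity, precision, recall, F1-score) all derive from the four cells of a binary confusion matrix, and recall is simply another name for sensitivity. A minimal sketch of those definitions follows. The counts in the usage example are hypothetical and are not taken from the study, and `binary_metrics` is an illustrative name rather than anything the authors describe.

```python
def _safe_div(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(tp, fp, tn, fn):
    """Standard metrics computed from binary confusion-matrix counts.

    tp/fn: positive cases predicted positive/negative.
    tn/fp: negative cases predicted negative/positive.
    """
    sensitivity = _safe_div(tp, tp + fn)   # also called recall
    specificity = _safe_div(tn, tn + fp)
    precision = _safe_div(tp, tp + fp)
    accuracy = _safe_div(tp + tn, tp + fp + tn + fn)
    f1 = _safe_div(2 * tp, 2 * tp + fp + fn)  # harmonic mean of precision and recall
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "recall": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not data from the study).
example = binary_metrics(tp=40, fp=5, tn=45, fn=10)

if __name__ == "__main__":
    for name, value in example.items():
        print(f"{name}: {value:.3f}")
```

Because sensitivity uses only the positive cases and specificity only the negative cases, the two have different denominators whenever the classes are imbalanced.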
Affiliation(s)
- Sumira Abbas
- Department of Pathology, Peshawar Medical College, Peshawar, PAK
| | - Mahwish Iftikhar
- Department of Medicine, Medical Teaching Institution (MTI) Hayatabad Medical Complex, Peshawar, PAK
| | - Mian Mufarih Shah
- Department of Medicine, Medical Teaching Institution (MTI) Hayatabad Medical Complex, Peshawar, PAK
| | - Sheraz J Khan
- Department of Medicine, Medical Teaching Institution (MTI) Hayatabad Medical Complex, Peshawar, PAK
|
34
|
Onthoni DD, Lin MY, Lan KY, Huang TH, Lin HM, Chiou HY, Hsu CC, Chung RH. Latent space representation of electronic health records for clustering dialysis-associated kidney failure subtypes. Comput Biol Med 2024; 183:109243. [PMID: 39369548 DOI: 10.1016/j.compbiomed.2024.109243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2023] [Revised: 09/30/2024] [Accepted: 10/01/2024] [Indexed: 10/08/2024]
Abstract
OBJECTIVE Kidney failure manifests in various forms, from sudden occurrences such as Acute Kidney Injury (AKI) to progressive conditions such as Chronic Kidney Disease (CKD). Given its intricate nature, marked by overlapping comorbidities and clinical similarities, including treatment modalities like dialysis, we sought to design and validate an end-to-end framework for clustering kidney failure subtypes. MATERIALS AND METHODS Our emphasis was on dialysis, utilizing a comprehensive dataset from the UK Biobank (UKB). We transformed raw Electronic Health Record (EHR) data into standardized matrices that incorporate patient demographics, clinical visit data, and the innovative feature of visit time-gaps. This matrix structure was achieved using a unique data cutting method. Latent space transformation was facilitated using a convolutional autoencoder (ConvAE) model, and the resulting latent space was then clustered using Principal Component Analysis (PCA) and K-means algorithms. RESULTS Our transformation model effectively reduced data dimensionality, thereby accelerating computational processes. The derived latent space demonstrated remarkable clustering capacities. Through cluster analysis, two distinct groups were identified: a CKD-majority group (cluster 1) and a mixed group of non-CKD and some CKD subtypes (cluster 0). Cluster 1 exhibited notably low survival probability, suggesting it predominantly represented severe CKD. In contrast, cluster 0, with substantially higher survival probability, is likely to include milder CKD forms and severe AKI. Our end-to-end framework effectively differentiates kidney failure subtypes using the UKB dataset, offering potential for nuanced therapeutic interventions. CONCLUSIONS This innovative approach integrates diverse data sources, providing a holistic understanding of kidney failure, which is imperative for patient management and targeted therapeutic interventions.
Affiliation(s)
- Djeane Debora Onthoni
- Institute of Population Health Sciences, National Health Research Institutes, Miaoli County, Taiwan.
| | - Ming-Yen Lin
- Division of Nephrology, Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan.
| | - Kuei-Yuan Lan
- Institute of Population Health Sciences, National Health Research Institutes, Miaoli County, Taiwan.
| | - Tsung-Hsien Huang
- Institute of Population Health Sciences, National Health Research Institutes, Miaoli County, Taiwan.
| | - Hong-Ming Lin
- Institute of Population Health Sciences, National Health Research Institutes, Miaoli County, Taiwan.
| | - Hung-Yi Chiou
- Institute of Population Health Sciences, National Health Research Institutes, Miaoli County, Taiwan; School of Public Health, College of Public Health, Taipei Medical University, Taipei, Taiwan.
| | - Chih-Cheng Hsu
- Institute of Population Health Sciences, National Health Research Institutes, Miaoli County, Taiwan; National Center for Geriatrics and Welfare Research, National Health Research Institutes, Yunlin, Taiwan.
| | - Ren-Hua Chung
- Institute of Population Health Sciences, National Health Research Institutes, Miaoli County, Taiwan.
|
35
|
Tafavvoghi M, Bongo LA, Shvetsov N, Busund LTR, Møllersen K. Publicly available datasets of breast histopathology H&E whole-slide images: A scoping review. J Pathol Inform 2024; 15:100363. [PMID: 38405160 PMCID: PMC10884505 DOI: 10.1016/j.jpi.2024.100363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 11/24/2023] [Accepted: 01/23/2024] [Indexed: 02/27/2024] Open
Abstract
Advancements in digital pathology and computing resources have made a significant impact in the field of computational pathology for breast cancer diagnosis and treatment. However, access to high-quality labeled histopathological images of breast cancer is a big challenge that limits the development of accurate and robust deep learning models. In this scoping review, we identified the publicly available datasets of breast H&E-stained whole-slide images (WSIs) that can be used to develop deep learning algorithms. We systematically searched 9 scientific literature databases and 9 research data repositories and found 17 publicly available datasets containing 10 385 H&E WSIs of breast cancer. Moreover, we reported image metadata and characteristics for each dataset to assist researchers in selecting proper datasets for specific tasks in breast cancer computational pathology. In addition, we compiled 2 lists of breast H&E patches and private datasets as supplementary resources for researchers. Notably, only 28% of the included articles utilized multiple datasets, and only 14% used an external validation set, suggesting that the performance of other developed models may be susceptible to overestimation. The TCGA-BRCA was used in 52% of the selected studies. This dataset has a considerable selection bias that can impact the robustness and generalizability of the trained algorithms. There is also a lack of consistent metadata reporting of breast WSI datasets that can be an issue in developing accurate deep learning models, indicating the necessity of establishing explicit guidelines for documenting breast WSI dataset characteristics and metadata.
Affiliation(s)
- Masoud Tafavvoghi
- Department of Community Medicine, UiT The Arctic University of Norway, Tromsø, Norway
| | - Lars Ailo Bongo
- Department of Computer Science, UiT The Arctic University of Norway, Tromsø, Norway
| | - Nikita Shvetsov
- Department of Computer Science, UiT The Arctic University of Norway, Tromsø, Norway
| | | | - Kajsa Møllersen
- Department of Community Medicine, UiT The Arctic University of Norway, Tromsø, Norway
|
36
|
Lin N, Paul R, Guerra S, Liu Y, Doulgeris J, Shi M, Lin M, Engeberg ED, Hashemi J, Vrionis FD. The Frontiers of Smart Healthcare Systems. Healthcare (Basel) 2024; 12:2330. [PMID: 39684952 DOI: 10.3390/healthcare12232330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2024] [Revised: 11/14/2024] [Accepted: 11/15/2024] [Indexed: 12/18/2024] Open
Abstract
Artificial Intelligence (AI) is poised to revolutionize numerous aspects of human life, with healthcare among the most critical fields set to benefit from this transformation. Medicine remains one of the most challenging, expensive, and impactful sectors, with challenges such as information retrieval, data organization, diagnostic accuracy, and cost reduction. AI is uniquely suited to address these challenges, ultimately improving the quality of life and reducing healthcare costs for patients worldwide. Despite its potential, the adoption of AI in healthcare has been slower compared to other industries, highlighting the need to understand the specific obstacles hindering its progress. This review identifies the current shortcomings of AI in healthcare and explores its possibilities, realities, and frontiers to provide a roadmap for future advancements.
Affiliation(s)
- Nan Lin
- Department of Gastroenterology, The Affiliated Hospital of Putian University, Putian 351100, China
| | - Rudy Paul
- Department of Ocean & Mechanical Engineering, Florida Atlantic University, Boca Raton, FL 33431, USA
| | - Santiago Guerra
- Department of Ocean & Mechanical Engineering, Florida Atlantic University, Boca Raton, FL 33431, USA
| | - Yan Liu
- Department of Gastroenterology, The Affiliated Hospital of Putian University, Putian 351100, China
- Department of Neurosurgery, Marcus Neuroscience Institute, Boca Raton Regional Hospital, Boca Raton, FL 33486, USA
| | - James Doulgeris
- Department of Biomedical Engineering, Florida Atlantic University, Boca Raton, FL 33431, USA
| | - Min Shi
- Harvard Ophthalmology AI Lab, Schepens Eye Research Institute of Massachusetts Eye and Ear, Harvard Medical School, Boston, MA 02115, USA
- School of Computing and Informatics, University of Louisiana, Lafayette, LA 70504, USA
| | - Maohua Lin
- Department of Biomedical Engineering, Florida Atlantic University, Boca Raton, FL 33431, USA
| | - Erik D Engeberg
- Department of Ocean & Mechanical Engineering, Florida Atlantic University, Boca Raton, FL 33431, USA
- Department of Biomedical Engineering, Florida Atlantic University, Boca Raton, FL 33431, USA
- Center for Complex Systems and Brain Science, Florida Atlantic University, Boca Raton, FL 33431, USA
| | - Javad Hashemi
- Department of Biomedical Engineering, Florida Atlantic University, Boca Raton, FL 33431, USA
| | - Frank D Vrionis
- Department of Neurosurgery, Marcus Neuroscience Institute, Boca Raton Regional Hospital, Boca Raton, FL 33486, USA
|
37
|
Tian H, He X, Yang K, Dai X, Liu Y, Zhang F, Shu Z, Zheng Q, Wang S, Xia J, Wen T, Liu B, Yu J, Zhou X. DAPNet: multi-view graph contrastive network incorporating disease clinical and molecular associations for disease progression prediction. BMC Med Inform Decis Mak 2024; 24:345. [PMID: 39563302 PMCID: PMC11575134 DOI: 10.1186/s12911-024-02756-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Accepted: 11/07/2024] [Indexed: 11/21/2024] Open
Abstract
BACKGROUND Timely and accurate prediction of disease progression is crucial for facilitating early intervention and treatment for various chronic diseases. However, due to the complicated and longitudinal nature of disease progression, the capacity and completeness of clinical data required for training deep learning models remain a significant challenge. This study aims to explore a new method that reduces data dependency and achieves predictive performance comparable to existing research. METHODS This study proposed DAPNet, a deep learning-based disease progression prediction model that solely utilizes the comorbidity duration (without relying on multi-modal data or comprehensive medical records) and disease associations from biomedical knowledge graphs to deliver high-performance prediction. DAPNet is the first to apply multi-view graph contrastive learning to disease progression prediction tasks. Compared with other studies on comorbidities, DAPNet innovatively integrates molecular-level disease association information, combines disease co-occurrence and ICD10, and fully explores the associations between diseases. RESULTS This study validated DAPNet using a de-identified clinical dataset derived from medical claims, which includes 2,714 patients and 10,856 visits. Meanwhile, a kidney dataset (606 patients) based on MIMIC-IV has also been constructed to fully validate its performance. The results showed that DAPNet achieved state-of-the-art performance on the severe pneumonia dataset (F1=0.84, with an improvement of 8.7%), and outperformed the six baseline models on the kidney disease dataset (F1=0.80, with an improvement of 21.3%). Through case analysis, we elucidated the clinical and molecular associations identified by the DAPNet model, which facilitated a better understanding and explanation of potential disease associations, thereby providing interpretability for the model. 
CONCLUSIONS The proposed DAPNet is the first to utilize comorbidity duration and a disease association network, enabling more accurate disease progression prediction based on multi-view graph contrastive learning and providing valuable insights for early diagnosis and treatment of patients. Based on disease association networks, our research has enhanced the interpretability of disease progression predictions.
Affiliation(s)
- Haoyu Tian
- School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100063, Beijing, China
| | - Xiong He
- School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100063, Beijing, China
| | - Kuo Yang
- School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100063, Beijing, China
| | - Xinyu Dai
- School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100063, Beijing, China
| | - Yiming Liu
- School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100063, Beijing, China
| | - Fengjin Zhang
- Department of Nephrology, Third Hospital of Hebei Medical University, China Academy of Chinese Medical Sciences, Shijiazhuang, 050051, Hebei, China
| | - Zixin Shu
- School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100063, Beijing, China
| | - Qiguang Zheng
- School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100063, Beijing, China
| | - Shihua Wang
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, 100700, Beijing, China
| | - Jianan Xia
- School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100063, Beijing, China
| | - Tiancai Wen
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, 100700, Beijing, China
| | - Baoyan Liu
- Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing, 100700, Beijing, China
| | - Jian Yu
- School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100063, Beijing, China
| | - Xuezhong Zhou
- School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100063, Beijing, China.
|
38
|
Wu G, Wang H, Yang Z, He D, Chan S. Electronic Health Records Sharing Based on Consortium Blockchain. J Med Syst 2024; 48:106. [PMID: 39557726 DOI: 10.1007/s10916-024-02120-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/05/2024] [Accepted: 11/04/2024] [Indexed: 11/20/2024]
Abstract
In recent years, electronic health records (EHRs) have gradually become mainstream in the healthcare field. However, because EHR systems are provided by different vendors, data are stored in a dispersed manner, which leads to data silos, fragments medical information, and poses challenges for current medical services. Therefore, in view of the difficulties in sharing EHRs between medical institutions, the risk of privacy leakage, and patients' lack of control over EHR usage, this paper proposes an EHR sharing model based on consortium blockchain. Firstly, the InterPlanetary File System is combined with consortium blockchain to form a hybrid EHR storage scheme that effectively improves data security, privacy protection, and operational efficiency. Secondly, the model combines unidirectional multi-hop conditional proxy re-encryption based on type and identity with distributed key generation technology to achieve secure EHR sharing with fine-grained control. At the same time, users are required to link the operation records of EHRs, so as to realize the traceability of EHR usage. A dynamic Byzantine fault-tolerant algorithm based on reputation and clustering is then proposed to solve the problems of arbitrary master node selection, high latency, and low throughput in PBFT, enabling the nodes to reach consensus more efficiently. Finally, the model is analyzed in terms of security and user control, showing that the model is less energy intensive in terms of communication overhead and time consumption, and can effectively achieve secure sharing of medical data.
Affiliation(s)
- Guangfu Wu
- Department of Information Engineering, Jiangxi University of Science and Technology, Jiangxi, 341000, China.
| | - Haiping Wang
- Department of Information Engineering, Jiangxi University of Science and Technology, Jiangxi, 341000, China
| | - Zi Yang
- Department of Information Engineering, Jiangxi University of Science and Technology, Jiangxi, 341000, China
| | - Daojing He
- Department of Software Engineering, Harbin Institute of Technology, Shenzhen, 518000, Guangdong, China
| | - Sammy Chan
- Department of Electronic Engineering, City University of Hong Kong, 999077, Kowloon Tong, Hong Kong
|
39
|
Tian M, Chen B, Guo A, Jiang S, Zhang AR. Reliable generation of privacy-preserving synthetic electronic health record time series via diffusion models. J Am Med Inform Assoc 2024; 31:2529-2539. [PMID: 39222376 PMCID: PMC11491591 DOI: 10.1093/jamia/ocae229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 08/04/2024] [Accepted: 08/12/2024] [Indexed: 09/04/2024] Open
Abstract
OBJECTIVE Electronic health records (EHRs) are rich sources of patient-level data, offering valuable resources for medical data analysis. However, privacy concerns often restrict access to EHRs, hindering downstream analysis. Current EHR deidentification methods are flawed and can lead to potential privacy leakage. Additionally, existing publicly available EHR databases are limited, preventing the advancement of medical research using EHRs. This study aims to overcome these challenges by efficiently generating realistic and privacy-preserving synthetic EHR time series. MATERIALS AND METHODS We introduce a new method for generating diverse and realistic synthetic EHR time series data using denoising diffusion probabilistic models. We conducted experiments on 6 databases: Medical Information Mart for Intensive Care III and IV, the eICU Collaborative Research Database (eICU), and non-EHR datasets on Stocks and Energy. We compared our proposed method with 8 existing methods. RESULTS Our results demonstrate that our approach significantly outperforms all existing methods in terms of data fidelity while requiring less training effort. Additionally, data generated by our method yield a lower discriminative accuracy compared to other baseline methods, indicating the proposed method can generate data with less privacy risk. DISCUSSION The proposed model utilizes a mixed diffusion process to generate realistic synthetic EHR samples that protect patient privacy. This method could be useful in tackling data availability issues in the field of healthcare by reducing barriers to EHR access and supporting research in machine learning for health. CONCLUSION The proposed diffusion model-based method can reliably and efficiently generate synthetic EHR time series, which facilitates downstream medical data analysis. Our numerical results show the superiority of the proposed method over all other existing methods.
Affiliation(s)
- Muhang Tian
- Department of Computer Science, Duke University, Durham, NC 27708, United States
| | - Bernie Chen
- Department of Electrical & Computer Engineering, Duke University, Durham, NC 27708, United States
| | - Allan Guo
- Department of Computer Science, Duke University, Durham, NC 27708, United States
| | - Shiyi Jiang
- Department of Electrical & Computer Engineering, Duke University, Durham, NC 27708, United States
| | - Anru R Zhang
- Department of Computer Science, Duke University, Durham, NC 27708, United States
- Department of Biostatistics & Bioinformatics, Duke University, Durham, NC 27708, United States
|
40
|
Wang K, Tan X, Nan S, Sang L, Chen H, Duan H. OLR-Net: Object Label Retrieval Network for principal diagnosis extraction. Comput Biol Med 2024; 182:109130. [PMID: 39288555 DOI: 10.1016/j.compbiomed.2024.109130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 08/26/2024] [Accepted: 09/06/2024] [Indexed: 09/19/2024]
Abstract
BACKGROUND Extracting the principal diagnosis from patient discharge summaries is an essential task for the meaningful use of medical data. The extraction process, usually performed by medical staff, is laborious and time-consuming. Although automatic models have been proposed to retrieve principal diagnoses from medical records, the large number of rare diagnoses and the small amount of training data per rare diagnosis pose significant statistical and computational challenges. OBJECTIVE In this study, we aimed to extract principal diagnoses with limited available data. METHODS We proposed the OLR-Net, Object Label Retrieval Network, to extract principal diagnoses from discharge summaries. Our approach included semantic extraction, label localization, label retrieval, and recommendation. The semantic information of discharge summaries was mapped into the diagnoses set. Then, one-dimensional convolutional neural networks slid into the bottom-up region for diagnosis localization to enrich rare diagnoses. Finally, OLR-Net detected the principal diagnosis in the localized region. The evaluation metrics focus on the hit ratio, mean reciprocal rank, and the area under the receiver operating characteristic curve (AUROC). RESULTS 12,788 desensitized discharge summary records were collected from the oncology department at Hainan Hospital of Chinese People's Liberation Army General Hospital. We designed five distinct settings based on the number of training data per diagnosis: the full dataset, the top-50 dataset, the few-shot dataset, the one-shot dataset, and the zero-shot dataset. Our model achieved the highest HR@5 of 0.8778 and a macro-AUROC of 0.9851. In the limited-data (few-shot and one-shot) settings, the macro-AUROC values were 0.9833 and 0.9485, respectively. CONCLUSIONS OLR-Net has great potential for extracting principal diagnoses with limited available data through label localization and retrieval.
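Two of the ranking metrics named above, the hit ratio at k (HR@k) and the mean reciprocal rank (MRR), can be stated compactly in code. The sketch below uses the standard definitions for the case of one correct label per record, which is an assumption about how the study applies them. The function names and the toy diagnosis labels are hypothetical and are not drawn from the paper.

```python
def hit_ratio_at_k(ranked_lists, true_labels, k):
    """Fraction of records whose true label appears in the top-k predictions."""
    if not true_labels:
        raise ValueError("at least one record is required")
    hits = sum(
        1 for ranked, truth in zip(ranked_lists, true_labels) if truth in ranked[:k]
    )
    return hits / len(true_labels)


def mean_reciprocal_rank(ranked_lists, true_labels):
    """Average of 1/rank of the true label; a missing label contributes 0."""
    if not true_labels:
        raise ValueError("at least one record is required")
    total = 0.0
    for ranked, truth in zip(ranked_lists, true_labels):
        if truth in ranked:
            total += 1.0 / (ranked.index(truth) + 1)  # ranks are 1-based
    return total / len(true_labels)


# Toy example: three records, each with a ranked list of candidate diagnoses.
# The labels "A", "B", "C" are placeholders, not real diagnosis codes.
ranked = [["A", "B", "C"], ["B", "A", "C"], ["C", "B", "A"]]
truth = ["A", "A", "A"]

if __name__ == "__main__":
    print("HR@1:", hit_ratio_at_k(ranked, truth, 1))
    print("HR@2:", hit_ratio_at_k(ranked, truth, 2))
    print("MRR:", mean_reciprocal_rank(ranked, truth))
```

HR@k only records whether the correct diagnosis appears among the top k candidates, whereas MRR also rewards placing it nearer the top, so the two are usually reported together.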
Affiliation(s)
- Kai Wang
- Key Laboratory of Biomedical Engineering of Hainan Province, School of Biomedical Engineering, Hainan University, Haikou 570228, China; School of Information and Communication Engineering, Hainan University, Haikou 570228, China
| | - Xin Tan
- Key Laboratory of Biomedical Engineering of Hainan Province, School of Biomedical Engineering, Hainan University, Haikou 570228, China; College of Biomedical Engineering and Instrumental Science, Zhejiang University, Hangzhou 310027, China
| | - Shan Nan
- Key Laboratory of Biomedical Engineering of Hainan Province, School of Biomedical Engineering, Hainan University, Haikou 570228, China.
| | - Lei Sang
- Hainan Hospital of Chinese People's Liberation Army General Hospital, Sanya 572013, China
| | - Han Chen
- Hainan Hospital of Chinese People's Liberation Army General Hospital, Sanya 572013, China
| | - Huilong Duan
- Key Laboratory of Biomedical Engineering of Hainan Province, School of Biomedical Engineering, Hainan University, Haikou 570228, China; College of Biomedical Engineering and Instrumental Science, Zhejiang University, Hangzhou 310027, China
| |
|
41
|
Morís DI, de Moura J, Marcos PJ, Míguez Rey E, Novo J, Ortega M. Efficient clinical decision-making process via AI-based multimodal data fusion: A COVID-19 case study. Heliyon 2024; 10:e38642. [PMID: 39640748 PMCID: PMC11619951 DOI: 10.1016/j.heliyon.2024.e38642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2024] [Accepted: 09/26/2024] [Indexed: 12/07/2024] Open
Abstract
COVID-19 is an infectious disease that caused a global pandemic in 2020. In the critical moments of such healthcare emergencies, medical staff need to make important decisions with limited resources that must be carefully managed. To this end, computer-aided diagnosis methods are extremely powerful and help staff better recognize evidence of high-risk patients. This can be done with the support of relevant information extracted from electronic health records, lab tests and imaging studies. In this work, we present a novel, fully automatic and efficient method to support the clinical decision-making process in the context of COVID-19 risk estimation, using multimodal data fusion of clinical features and deep features extracted from chest X-ray images. The risk estimation is studied in two of the most relevant and critical scenarios: the risk of hospitalization and the risk of mortality. This study identifies the most important features for each scenario, the ratio of clinical to imaging features in the top ranking, and the performance of the machine learning models used. The results demonstrate strong performance by the classifiers, estimating the risk of hospitalization with an AUC-ROC of 0.8452 ± 0.0133 and the risk of death with an AUC-ROC of 0.8285 ± 0.0210 using only a subset of the original features. They also highlight the significant contribution of imaging features to hospitalization risk assessment, while clinical features become more crucial for mortality risk evaluation. Furthermore, multimodal data fusion can outperform approaches that use a single data source. Despite the model's complexity, it requires fewer features, an advantage in scenarios with limited computational resources. This streamlined, fully automated method shows promising potential to improve the clinical decision-making process and better manage medical resources, not only in the context of COVID-19 but also in other clinical scenarios.
Affiliation(s)
- Daniel I. Morís
- Varpa Group, Biomedical Research Institute A Coruña (INIBIC), University of A Coruña, 15006, A Coruña, Spain
- Department of Computer Science and Information Technologies, University of A Coruña, 15071, A Coruña, Spain
| | - Joaquim de Moura
- Varpa Group, Biomedical Research Institute A Coruña (INIBIC), University of A Coruña, 15006, A Coruña, Spain
- Department of Computer Science and Information Technologies, University of A Coruña, 15071, A Coruña, Spain
| | - Pedro J. Marcos
- Dirección Asistencial y Servicio de Neumología, Complejo Hospitalario Universitario de A Coruña (CHUAC), Instituto de Investigación Biomédica de A Coruña (INIBIC), Universidade da Coruña, Sergas, 15006 A Coruña, Spain
| | - Enrique Míguez Rey
- Grupo de Investigación en Virología Clínica, Sección de Enfermedades Infecciosas, Servicio de Medicina Interna, Instituto de Investigación Biomédica de A Coruña (INIBIC), Área Sanitaria A Coruña y CEE (ASCC), SERGAS, 15006 A Coruña, Spain
| | - Jorge Novo
- Varpa Group, Biomedical Research Institute A Coruña (INIBIC), University of A Coruña, 15006, A Coruña, Spain
- Department of Computer Science and Information Technologies, University of A Coruña, 15071, A Coruña, Spain
| | - Marcos Ortega
- Varpa Group, Biomedical Research Institute A Coruña (INIBIC), University of A Coruña, 15006, A Coruña, Spain
- Department of Computer Science and Information Technologies, University of A Coruña, 15071, A Coruña, Spain
| |
|
42
|
Epelde F. How AI Could Help Us in the Epidemiology and Diagnosis of Acute Respiratory Infections? Pathogens 2024; 13:940. [PMID: 39599493 PMCID: PMC11597561 DOI: 10.3390/pathogens13110940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2024] [Revised: 10/19/2024] [Accepted: 10/20/2024] [Indexed: 11/29/2024] Open
Abstract
Acute respiratory infections (ARIs) represent a significant global health burden, contributing to high morbidity and mortality rates, particularly in vulnerable populations. Traditional methods for diagnosing and tracking ARIs often face limitations in terms of speed, accuracy, and scalability. The advent of artificial intelligence (AI) has the potential to revolutionize these processes by enhancing early detection, precise diagnosis, and effective epidemiological tracking. This review explores the integration of AI in the epidemiology and diagnosis of ARIs, highlighting its capabilities, current applications, and future prospects. By examining recent advancements and existing studies, this paper provides a comprehensive understanding of how AI can improve ARI management, offering insights into its practical applications and the challenges that must be addressed to realize its full potential.
Affiliation(s)
- Francisco Epelde
- Internal Medicine Department, Hospital Universitari Parc Taulí, 08208 Sabadell, Spain
| |
|
43
|
AbuAlrob MA, Mesraoua B. Harnessing artificial intelligence for the diagnosis and treatment of neurological emergencies: a comprehensive review of recent advances and future directions. Front Neurol 2024; 15:1485799. [PMID: 39463792 PMCID: PMC11502371 DOI: 10.3389/fneur.2024.1485799] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/24/2024] [Accepted: 09/30/2024] [Indexed: 10/29/2024] Open
Abstract
Artificial intelligence (AI) is rapidly transforming the landscape of neurology, offering innovative solutions for diagnosing and managing emergent neurological conditions such as stroke, traumatic brain injury, and acute spinal cord injury. This review critically examines the recent advancements in AI applications within the field of neurology, emphasizing both the potential and limitations of these technologies. While AI demonstrates remarkable accuracy and speed in diagnostic imaging, outcome prediction, and personalized treatment plans, its integration into clinical practice remains challenged by ethical concerns, infrastructural limitations, and the "black box" nature of many AI algorithms. The review highlights the current gaps in literature, particularly the limited research on AI's use in low-resource settings and its generalizability across diverse populations. Moreover, the review underscores the need for more longitudinal studies to assess the long-term efficacy of AI-driven interventions and calls for greater transparency in AI systems to enhance trust among clinicians. Future directions for AI in neurology emphasize the importance of interdisciplinary collaboration, regulatory oversight, and the development of equitable AI models that can benefit all patient populations. This review provides a balanced and comprehensive overview of AI's role in neurology, offering insights into both the opportunities and challenges that lie ahead.
Affiliation(s)
- Majd A. AbuAlrob
- Department of Neurosciences, Hamad Medical Corporation, Doha, Qatar
| | - Boulenouar Mesraoua
- Department of Neurosciences, Hamad Medical Corporation, Doha, Qatar
- Weill Cornell Medical College, Doha, Qatar
| |
|
44
|
Gao W, Rong F, Shao L, Deng Z, Xiao D, Zhang R, Chen C, Gong Z, Niu Z, Li F, Wei W, Ma L. Enhancing ophthalmology medical record management with multi-modal knowledge graphs. Sci Rep 2024; 14:23221. [PMID: 39369079 PMCID: PMC11455959 DOI: 10.1038/s41598-024-73316-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Accepted: 09/16/2024] [Indexed: 10/07/2024] Open
Abstract
The electronic medical record management system plays a crucial role in clinical practice, optimizing the recording and management of healthcare data. To enhance the functionality of the medical record management system, this paper develops a customized schema designed for ophthalmic diseases. A multi-modal knowledge graph is constructed, which is built upon expert-reviewed and de-identified real-world ophthalmology medical data. Based on this data, we propose an auxiliary diagnostic model based on a contrastive graph attention network (CGAT-ADM), which uses the patient's diagnostic results as anchor points and achieves auxiliary medical record diagnosis services through graph clustering. By implementing contrastive methods and feature fusion of node types, text, and numerical information in medical records, the CGAT-ADM model achieved an average precision of 0.8563 for the top 20 similar case retrievals, indicating high performance in identifying analogous diagnoses. Our research findings suggest that medical record management systems underpinned by multimodal knowledge graphs significantly enhance the development of AI services. These systems offer a range of benefits, from facilitating assisted diagnosis and addressing similar patient inquiries to delving into potential case connections and disease patterns. This comprehensive approach empowers healthcare professionals to garner deeper insights and make well-informed decisions.
Affiliation(s)
- Weihao Gao
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, P.R. China
| | - Fuju Rong
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, P.R. China
| | - Lei Shao
- Beijing Tongren Hospital, Capital Medical University, Beijing, P.R. China
| | - Zhuo Deng
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, P.R. China
| | - Daimin Xiao
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, P.R. China
| | - Ruiheng Zhang
- Beijing Tongren Hospital, Capital Medical University, Beijing, P.R. China
| | - Chucheng Chen
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, P.R. China
| | - Zheng Gong
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, P.R. China
| | - Zhiyuan Niu
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, P.R. China
| | - Fang Li
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, P.R. China
| | - Wenbin Wei
- Beijing Tongren Hospital, Capital Medical University, Beijing, P.R. China.
| | - Lan Ma
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, P.R. China.
| |
|
45
|
Yang R, Zeng Q, You K, Qiao Y, Huang L, Hsieh CC, Rosand B, Goldwasser J, Dave A, Keenan T, Ke Y, Hong C, Liu N, Chew E, Radev D, Lu Z, Xu H, Chen Q, Li I. Ascle-A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study. J Med Internet Res 2024; 26:e60601. [PMID: 39361955 PMCID: PMC11487205 DOI: 10.2196/60601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 07/08/2024] [Accepted: 07/15/2024] [Indexed: 10/05/2024] Open
Abstract
BACKGROUND Medical texts present significant domain-specific challenges, and manually curating these texts is a time-consuming and labor-intensive process. To address this, natural language processing (NLP) algorithms have been developed to automate text processing. In the biomedical field, various toolkits for text processing exist, which have greatly improved the efficiency of handling unstructured text. However, these existing toolkits tend to emphasize different perspectives, and none of them offer generation capabilities, leaving a significant gap in the current offerings. OBJECTIVE This study aims to describe the development and preliminary evaluation of Ascle. Ascle is tailored for biomedical researchers and clinical staff with an easy-to-use, all-in-one solution that requires minimal programming expertise. For the first time, Ascle provides 4 advanced and challenging generative functions: question-answering, text summarization, text simplification, and machine translation. In addition, Ascle integrates 12 essential NLP functions, along with query and search capabilities for clinical databases. METHODS We fine-tuned 32 domain-specific language models and evaluated them thoroughly on 27 established benchmarks. In addition, for the question-answering task, we developed a retrieval-augmented generation (RAG) framework for large language models that incorporated a medical knowledge graph with ranking techniques to enhance the reliability of generated answers. Additionally, we conducted a physician validation to assess the quality of generated content beyond automated metrics. RESULTS The fine-tuned models and RAG framework consistently enhanced text generation tasks. For example, the fine-tuned models improved the machine translation task by 20.27 in terms of BLEU score. In the question-answering task, the RAG framework raised the ROUGE-L score by 18% over the vanilla models. 
Physician validation of generated answers showed high scores for readability (4.95/5) and relevancy (4.43/5), with lower scores for accuracy (3.90/5) and completeness (3.31/5). CONCLUSIONS This study introduces the development and evaluation of Ascle, a user-friendly NLP toolkit designed for medical text generation. All code is publicly available through the Ascle GitHub repository. All fine-tuned language models can be accessed through Hugging Face.
Affiliation(s)
- Rui Yang
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
| | - Qingcheng Zeng
- Department of Linguistics, Northwestern University, Evanston, IL, United States
| | - Keen You
- Department of Computer Science, Yale University, New Haven, CT, United States
| | - Yujie Qiao
- Yale School of Public Health, Yale University, New Haven, CT, United States
| | - Lucas Huang
- Department of Computer Science, Yale University, New Haven, CT, United States
| | - Chia-Chun Hsieh
- Department of Computer Science, Yale University, New Haven, CT, United States
| | - Benjamin Rosand
- Department of Computer Science, Yale University, New Haven, CT, United States
| | - Jeremy Goldwasser
- Department of Computer Science, Yale University, New Haven, CT, United States
| | - Amisha Dave
- Yale New Haven Hospital, Yale School of Medicine, Yale University, New Haven, CT, United States
| | - Tiarnan Keenan
- Division of Epidemiology and Clinical Applications, National Eye Institute, National Institutes of Health, Bethesda, MD, United States
| | - Yuhe Ke
- Department of Anesthesiology, Singapore General Hospital, Singapore, Singapore
| | - Chuan Hong
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States
| | - Nan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Program in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
- Institute of Data Science, National University of Singapore, Singapore, Singapore
| | - Emily Chew
- Division of Epidemiology and Clinical Applications, National Eye Institute, National Institutes of Health, Bethesda, MD, United States
| | - Dragomir Radev
- Department of Computer Science, Yale University, New Haven, CT, United States
| | - Zhiyong Lu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States
| | - Hua Xu
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, United States
| | - Qingyu Chen
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, United States
| | - Irene Li
- Information Technology Center, University of Tokyo, Kashiwa, Japan
- Smartor LLC, Tokyo, Japan
| |
|
46
|
Agraz M, Deng Y, Karniadakis GE, Mantzoros CS. Enhancing severe hypoglycemia prediction in type 2 diabetes mellitus through multi-view co-training machine learning model for imbalanced dataset. Sci Rep 2024; 14:22741. [PMID: 39349500 PMCID: PMC11444036 DOI: 10.1038/s41598-024-69844-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Accepted: 08/09/2024] [Indexed: 10/02/2024] Open
Abstract
Severe hypoglycemia (SH) in patients with type 2 diabetes mellitus (T2DM) poses a considerable risk of long-term death, especially among the elderly, and demands urgent medical attention. Accurate prediction of SH remains challenging due to its multifaceted nature, with contributing factors such as medications, lifestyle choices, and metabolic measurements. In this study, we propose a systematic approach to improve the robustness and accuracy of SH predictions using machine learning models, guided by clinical feature selection. Our focus is on developing long-term SH prediction models using both semi-supervised and supervised learning algorithms. Using data from the Action to Control Cardiovascular Risk in Diabetes (ACCORD) trial, which includes electronic health records for over 10,000 individuals, we focus on studying adults with T2DM. Our results indicate that a multi-view co-training method incorporating the random forest algorithm improves the specificity of SH prediction, while the same setup with Naive Bayes replacing random forest demonstrates better sensitivity. Our framework also provides interpretability of machine learning models by identifying key predictors for hypoglycemia, including fasting plasma glucose, hemoglobin A1c, general diabetes education, and NPH or L insulins. The integration of data routinely available in electronic health records significantly enhances our model's capability to predict SH events, showcasing its potential to transform clinical practice by facilitating early interventions and optimizing patient management. By enhancing prediction accuracy and identifying crucial predictive features, our study contributes to advancing the understanding and management of hypoglycemia in this population.
Affiliation(s)
- Melih Agraz
- Division of Applied Mathematics, Brown University, Providence, RI, 02912, USA
- Department of Statistics, Giresun University, Giresun, 28200, Turkey
- Department of Endocrinology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, 02215, USA
| | - Yixiang Deng
- Department of Computer and Information Science, College of Engineering, University of Delaware, Newark, DE, 19716, USA
- Ragon Institute of Mass General, MIT and Harvard, Cambridge, MA, 02142, USA
| | - George Em Karniadakis
- Division of Applied Mathematics, Brown University, Providence, RI, 02912, USA
- School of Engineering, Brown University, Providence, RI, 02912, USA
| | - Christos Socrates Mantzoros
- Department of Endocrinology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, 02215, USA.
| |
|
47
|
Ganguli R, Franklin J, Yu X, Lin A, Vichare A, Wagner S. Comparison of machine learning models for the prediction of hypertension in transgender patients undergoing gynecologic surgery. Communications Medicine 2024; 4:183. [PMID: 39349936 PMCID: PMC11442826 DOI: 10.1038/s43856-024-00603-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2022] [Accepted: 09/02/2024] [Indexed: 10/04/2024] Open
Abstract
BACKGROUND Transgender patients face a higher burden of cardiovascular morbidity due to structural and biological stressors, particularly in low-resource settings. No studies have compared machine learning model development strategies for this unique patient cohort, and limited literature exists comparing data and outcomes between transgender and cisgender populations. METHODS We compare machine learning models trained solely on transgender patients against models developed on a size-matched and ratio-matched cohort of cisgender patients and on a 300-fold larger, ratio-matched cohort of cisgender patients undergoing obstetric/gynecologic procedures in the National Surgical Quality Improvement Program from January 1, 2005 through December 31, 2019. All models were developed to predict the outcome of hypertension. Statistical significance between models was calculated using 5-by-2 fold cross validation hypothesis testing. RESULTS Among 626,102 patients having an obstetric/gynecologic surgery, 1959 are transgender, and 85,405 (13.7%) of all patients have hypertension requiring medication. Notably, the logistic regression machine learning models trained solely on the transgender cohort have an AUC of 0.865 (95% CI: 0.83-0.90) and an accuracy of 85% (95% CI: 0.80-0.87), compared with (p < 0.05) the logistic regression model trained on the 300-fold larger combined cohort, which has an AUC of 0.861 (95% CI: 0.82-0.90) and an accuracy of 83% (95% CI: 0.80-0.87). CONCLUSION Machine learning models trained on smaller, exclusively transgender populations may predict cardiovascular outcomes in transgender patients as well as or better than models developed on predominantly cisgender patients; this can be useful in lower-resource settings with smaller volumes of transgender patients.
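The abstract above does not say which 5-by-2 fold cross-validation test was used. One common form is Dietterich's 5x2cv paired t-test, sketched below under that assumption. The function name and the difference values are hypothetical; each pair holds the difference in a metric (for example, error rate) between two models on the two folds of one of five repeated 2-fold splits.

```python
import math
from typing import Sequence, Tuple


def five_by_two_cv_t(diffs: Sequence[Tuple[float, float]]) -> float:
    """Dietterich's 5x2cv paired t statistic.

    diffs[i] = (p_i1, p_i2): per-fold score differences between two models
    for replication i. The statistic is compared against a t distribution
    with 5 degrees of freedom.
    """
    if len(diffs) != 5 or any(len(pair) != 2 for pair in diffs):
        raise ValueError("expected 5 replications of 2 folds each")
    variance_sum = 0.0
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2.0
        # Variance estimate s_i^2 for replication i.
        variance_sum += (p1 - mean) ** 2 + (p2 - mean) ** 2
    if variance_sum == 0.0:
        raise ValueError("all fold differences are identical; statistic undefined")
    # Numerator is the difference from the first fold of the first replication.
    return diffs[0][0] / math.sqrt(variance_sum / 5.0)


# Hypothetical per-fold differences, for illustration only.
example_diffs = [(0.02, 0.01), (0.03, 0.01), (0.01, 0.02), (0.02, 0.02), (0.00, 0.01)]
t_stat = five_by_two_cv_t(example_diffs)

# Two-sided 5% critical value of the t distribution with 5 degrees of freedom.
T_CRIT_5DF = 2.571
significant = abs(t_stat) > T_CRIT_5DF
```

The test uses only the first fold's difference in the numerator and pools variance across the five replications, which keeps the statistic simple but makes it sensitive to how the first split happens to fall.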
Affiliation(s)
- Reetam Ganguli
- Brown University, Providence, RI, USA
- University of California Los Angeles, Los Angeles, CA, USA
| | - Jordan Franklin
- Department of Computer Sciences, Georgia Institute of Technology, Atlanta, GA, USA
| | - Xiaotian Yu
- Department of Mathematics, University of Virginia, Charlottesville, VA, USA
| | - Alice Lin
- Warren Alpert Medical School, Providence, RI, USA
| | - Aditi Vichare
- University of California Los Angeles, Los Angeles, CA, USA.
| | - Stephen Wagner
- Department of Obstetrics and Gynecology, Harvard Medical School, Boston, MA, USA
| |
|
48
|
Kolasseri AE, B V. Comparative study of machine learning and statistical survival models for enhancing cervical cancer prognosis and risk factor assessment using SEER data. Sci Rep 2024; 14:22203. [PMID: 39333298 PMCID: PMC11437206 DOI: 10.1038/s41598-024-72790-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2024] [Accepted: 09/10/2024] [Indexed: 09/29/2024] Open
Abstract
Cervical cancer is a common malignant tumor of the female reproductive system and the leading cause of death among women worldwide. The survival prediction method can be used to effectively analyze the time to event, which is essential in any clinical study. This study aims to bridge the gap between traditional statistical methods and machine learning in survival analysis by revealing which techniques are most effective in predicting survival, with a particular emphasis on improving prediction accuracy and identifying key risk factors for cervical cancer. Women with cervical cancer diagnosed between 2013 and 2015 were included in our study using data from the Surveillance, Epidemiology, and End Results (SEER) database. Using this dataset, the study assesses the performance of Weibull, Cox proportional hazards models, and Random Survival Forests in terms of predictive accuracy and risk factor identification. The findings reveal that machine learning models, particularly Random Survival Forests (RSF), outperform traditional statistical methods in both predictive accuracy and the discernment of crucial prognostic factors, underscoring the advantages of machine learning in handling complex survival data. However, for a survival dataset with a small number of predictors, statistical models should be used first. The study finds that RSF models enhance survival analysis with more accurate predictions and insights into survival risk factors but highlights the need for larger datasets and further research on model interpretability and clinical applicability.
Affiliation(s)
- Anjana Eledath Kolasseri
- Department of Mathematics, School of Advanced Sciences, Vellore Institute of Technology, Vellore, Tamil Nadu, India
| | - Venkataramana B
- Department of Mathematics, School of Advanced Sciences, Vellore Institute of Technology, Vellore, Tamil Nadu, India.
| |
|
49
|
Xian S, Grabowska ME, Kullo IJ, Luo Y, Smoller JW, Wei WQ, Jarvik G, Mooney S, Crosslin D. Language-model-based patient embedding using electronic health records facilitates phenotyping, disease forecasting, and progression analysis. Research Square 2024:rs.3.rs-4708839. [PMID: 39399661 PMCID: PMC11469380 DOI: 10.21203/rs.3.rs-4708839/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2024]
Abstract
Current studies regarding the secondary use of electronic health records (EHR) predominantly rely on domain expertise and existing medical knowledge. Though significant efforts have been devoted to investigating the application of machine learning algorithms in the EHR, an efficient and powerful representation of patients is needed to unleash the potential of discovering new medical patterns underlying the EHR. Here, we present an unsupervised method for embedding high-dimensional EHR data at the patient level, aimed at characterizing patient heterogeneity in complex diseases and identifying new disease patterns associated with clinical outcome disparities. Inspired by the architecture of modern language models, specifically transformers with attention mechanisms, we use patient diagnosis and procedure codes as vocabularies and treat each patient as a sentence to perform the patient embedding. We applied this approach to 34,851 unique medical codes across 1,046,649 longitudinal patient events, including 102,739 patients from the electronic Medical Records and GEnomics (eMERGE) Network. The resulting patient vectors demonstrated excellent performance in predicting future disease events (median AUROC = 0.87 within one year) and bulk phenotyping (median AUROC = 0.84). We then illustrated the utility of these patient vectors in revealing heterogeneous comorbidity patterns, exemplified by disease subtypes in colorectal cancer and systemic lupus erythematosus, and in capturing distinct longitudinal disease trajectories. External validation using EHR data from the University of Washington confirmed robust model performance, with median AUROCs of 0.83 and 0.84 for bulk phenotyping tasks and disease onset prediction, respectively. Importantly, the model reproduced the clustering results of disease subtypes identified in the eMERGE cohort and uncovered variations in overall mortality among these subtypes. 
Together, these results underscore the potential of representation learning in EHRs to enhance patient characterization and associated clinical outcomes, thereby advancing disease forecasting and facilitating personalized medicine.
Affiliation(s)
- Su Xian
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA
| | - Monika E Grabowska
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
| | - Iftikhar J Kullo
- Department of Cardiovascular Medicine and the Gonda Vascular Center, Mayo Clinic Rochester Minnesota
| | - Yuan Luo
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine
| | - Jordan W Smoller
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA
| | - Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
| | - Gail Jarvik
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA
| | - Sean Mooney
- Center for Information Technology, National Institutes of Health
| | - David Crosslin
- Department of Medicine, Division of Biomedical Informatics and Genomics, Tulane University, New Orleans, LA
| |
|
50
|
Teo YX, Lee RE, Nurzaman SG, Tan CP, Chan PY. Action tremor features discovery for essential tremor and Parkinson's disease with explainable multilayer BiLSTM. Comput Biol Med 2024; 180:108957. [PMID: 39098236 DOI: 10.1016/j.compbiomed.2024.108957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Revised: 07/04/2024] [Accepted: 07/26/2024] [Indexed: 08/06/2024]
Abstract
The tremors of Parkinson's disease (PD) and essential tremor (ET) are known to have overlapping characteristics that make it difficult for clinicians to distinguish them. While deep learning is robust in detecting features unnoticeable to humans, an opaque trained model is impractical in clinical scenarios, as the model may use coincidental correlations in the training data to make classifications, which may result in misdiagnosis. This work aims to overcome this challenge by introducing a multilayer BiLSTM network with explainable AI (XAI) that can better explain tremulous characteristics and quantify the important regions it discovers for tremor differentiation. The proposed network classifies PD, ET, and normal tremors during drinking actions and derives the contribution of the tremor characteristics (i.e., time, frequency, amplitude, and actions) used in the classification task. The analysis shows that the XAI-BiLSTM marks the regions with high tremor amplitude as important in classification, which is verified by a high correlation between relevance distribution and tremor displacement amplitude. The XAI-BiLSTM discovered that the transition phase from arm resting to lifting (during the drinking cycle) is the most important action for classifying tremors. Additionally, the XAI-BiLSTM reveals frequency ranges that contribute to the classification of only one tremor class, which may be a distinctive feature that helps overcome the overlapping-frequencies problem. By revealing critical timing and frequency patterns unique to PD and ET tremors, the proposed XAI-BiLSTM model enables clinicians to make more informed classifications, potentially reducing misclassification rates and improving treatment outcomes.
Affiliation(s)
- Yu Xuan Teo
- Department of Electrical & Robotics Engineering, School of Engineering, Monash University Malaysia, Malaysia.
| | - Rui En Lee
- Department of Electrical & Robotics Engineering, School of Engineering, Monash University Malaysia, Malaysia.
| | - Surya Girinatha Nurzaman
- Department of Mechanical Engineering, School of Engineering, Monash University Malaysia, Bandar Sunway, Malaysia.
| | - Chee Pin Tan
- Department of Electrical & Robotics Engineering, School of Engineering, Monash University Malaysia, Malaysia.
| | - Ping Yi Chan
- Department of Electrical & Robotics Engineering, School of Engineering, Monash University Malaysia, Malaysia.
| |
|