1
Cheshire WP, Sandroni P, Shouman K, Cutsforth-Gregory JK, Coon EA, Benarroch EE, Singer W, Low PA. Accuracy of chat-based artificial intelligence for patient education on orthostatic hypotension. Clin Auton Res 2025 (online ahead of print). PMID: 40167938; DOI: 10.1007/s10286-025-01125-9.
Affiliation(s)
- W P Cheshire
- Division of Autonomic Neurology, Department of Neurology, Mayo Clinic, 4500 San Pablo Rd., Jacksonville, FL, 32224, USA.
- P Sandroni
- Department of Neurology, Mayo Clinic, Rochester, MN, 55905, USA
- K Shouman
- Department of Neurology, Mayo Clinic, Rochester, MN, 55905, USA
- E A Coon
- Department of Neurology, Mayo Clinic, Rochester, MN, 55905, USA
- E E Benarroch
- Department of Neurology, Mayo Clinic, Rochester, MN, 55905, USA
- W Singer
- Department of Neurology, Mayo Clinic, Rochester, MN, 55905, USA
- P A Low
- Department of Neurology, Mayo Clinic, Rochester, MN, 55905, USA
2
Dihan QA, Brown AD, Chauhan MZ, Alzein AF, Abdelnaem SE, Kelso SD, Rahal DA, Park R, Ashraf M, Azzam A, Morsi M, Warner DB, Sallam AB, Saeed HN, Elhusseiny AM. Leveraging large language models to improve patient education on dry eye disease. Eye (Lond) 2025;39:1115-1122. PMID: 39681711; DOI: 10.1038/s41433-024-03476-5.
Abstract
BACKGROUND/OBJECTIVES Dry eye disease (DED) is an exceedingly common diagnosis, yet recent analyses have shown patient education materials (PEMs) on DED to be of low quality and readability. Our study evaluated the utility and performance of three large language models (LLMs) in enhancing existing PEMs and generating new PEMs on DED. SUBJECTS/METHODS We evaluated PEMs generated by ChatGPT-3.5, ChatGPT-4, and Gemini Advanced in response to three separate prompts. Prompts A and B requested new PEMs on DED, with Prompt B specifying a 6th-grade reading level as measured by the SMOG (Simple Measure of Gobbledygook) readability formula. Prompt C asked for a rewrite of existing PEMs at a 6th-grade reading level. Each PEM was assessed on readability (SMOG; FKGL: Flesch-Kincaid Grade Level), quality (PEMAT: Patient Education Materials Assessment Tool; DISCERN), and accuracy (Likert misinformation scale). RESULTS All LLM-generated PEMs in response to Prompts A and B were of high quality (median DISCERN = 4), understandable (PEMAT understandability ≥70%), and accurate (Likert misinformation score = 1), but they were not actionable (PEMAT actionability <70%). ChatGPT-4 and Gemini Advanced rewrote existing PEMs (Prompt C) from a baseline readability (FKGL: 8.0 ± 2.4; SMOG: 7.9 ± 1.7) to the targeted 6th-grade reading level, and the rewrites contained little to no misinformation (median Likert misinformation score = 1; range: 1-2). However, only ChatGPT-4 maintained high quality and reliability while rewriting (median DISCERN = 4). CONCLUSION LLMs, notably ChatGPT-4, were able to generate and rewrite PEMs on DED that were readable, accurate, and of high quality. Our study underscores the value of LLMs as supplementary tools for improving PEMs.
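The SMOG and FKGL formulas referenced above are standard and fully specified in the literature; a minimal sketch of how a PEM draft could be scored is shown below (the syllable counter is a crude vowel-group heuristic assumed for illustration; published readability tools use dictionary-based counts).

```python
import re

def count_syllables(word: str) -> int:
    # Crude vowel-group heuristic; published readability tools
    # use dictionary-based syllable counts instead.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def grade_level(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]
    polysyllables = sum(1 for s in syllables if s >= 3)

    # SMOG grade: 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291
    smog = 1.0430 * (polysyllables * 30 / len(sentences)) ** 0.5 + 3.1291
    # Flesch-Kincaid Grade Level
    fkgl = (0.39 * len(words) / len(sentences)
            + 11.8 * sum(syllables) / len(words)
            - 15.59)
    return {"SMOG": round(smog, 1), "FKGL": round(fkgl, 1)}

print(grade_level("Dry eye disease happens when your eyes do not make "
                  "enough tears. Artificial tears can help your eyes."))
```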
Affiliation(s)
- Qais A Dihan
- Chicago Medical School, Rosalind Franklin University of Medicine and Science, North Chicago, IL, USA
- Department of Ophthalmology, Harvey and Bernice Jones Eye Institute, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Andrew D Brown
- UAMS College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Muhammad Z Chauhan
- UAMS College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Ahmad F Alzein
- College of Medicine, University of Illinois at Chicago, Chicago, IL, USA
- Seif E Abdelnaem
- College of Natural Sciences and Mathematics, University of Central Arkansas, Conway, AR, USA
- Sean D Kelso
- Burnett School of Medicine, Texas Christian University, Fort Worth, TX, USA
- Dania A Rahal
- UAMS College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Royce Park
- Department of Ophthalmology and Visual Sciences, University of Illinois Chicago, Chicago, IL, USA
- Mohammadali Ashraf
- Department of Ophthalmology and Visual Sciences, University of Illinois Chicago, Chicago, IL, USA
- Amr Azzam
- Department of Ophthalmology, Kasr Al-Ainy Hospitals, Cairo University, Cairo, Egypt
- Mahmoud Morsi
- Department of Anesthesia and Pain Management, Kasr Al-Ainy Hospitals, Cairo University, Cairo, Egypt
- Department of Anesthesiology and Pain Management, John H. Stroger, Jr. Hospital of Cook County, Chicago, IL, USA
- David B Warner
- Department of Ophthalmology, Harvey and Bernice Jones Eye Institute, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Ahmed B Sallam
- Department of Ophthalmology, Harvey and Bernice Jones Eye Institute, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Department of Ophthalmology, Faculty of Medicine, Ain Shams University, Cairo, Egypt
- Hajirah N Saeed
- Department of Ophthalmology and Visual Sciences, University of Illinois Chicago, Chicago, IL, USA
- Department of Ophthalmology, Loyola University Medical Center, Maywood, IL, USA
- Abdelrahman M Elhusseiny
- Department of Ophthalmology, Harvey and Bernice Jones Eye Institute, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Department of Ophthalmology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
3
Wan Z, Guo Y, Bao S, Wang Q, Malin BA. Evaluating Sex and Age Biases in Multimodal Large Language Models for Skin Disease Identification from Dermatoscopic Images. Health Data Sci 2025;5:0256. PMID: 40170800; PMCID: PMC11961048; DOI: 10.34133/hds.0256.
Abstract
Background: Multimodal large language models (LLMs) have shown potential in various health-related fields, yet many healthcare studies have raised concerns about their reliability and biases. Methods: To explore the practical application of multimodal LLMs to skin disease identification and to evaluate sex and age biases, we tested 2 popular multimodal LLMs, ChatGPT-4 and LLaVA-1.6, across diverse sex and age groups on a subset of a large dermatoscopic dataset containing around 10,000 images and 3 skin diseases (melanoma, melanocytic nevi, and benign keratosis-like lesions). Results: Compared with 3 convolutional neural network (CNN)-based deep learning models (VGG16, ResNet50, and Model Derm) and one vision transformer model (Swin-B), ChatGPT-4 and LLaVA-1.6 achieved overall accuracies 3% and 23% higher (and F1-scores 4% and 34% higher), respectively, than the best-performing CNN baseline, while remaining 38% and 26% lower in accuracy (and 38% and 19% lower in F1-score), respectively, than Swin-B. Meanwhile, ChatGPT-4 was generally unbiased in identifying these skin diseases across sex and age groups, and LLaVA-1.6 was generally unbiased across age groups, in contrast to Swin-B, which was biased in identifying melanocytic nevi. Conclusions: This study suggests the usefulness and fairness of LLMs in dermatological applications, aiding physicians and practitioners with diagnostic recommendations and patient screening. To further verify and evaluate the reliability and fairness of LLMs in healthcare, experiments using larger and more diverse datasets are needed.
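As a rough sketch of the kind of subgroup bias check described above, per-group accuracy can be compared with a chi-square test of independence between group membership and correctness; the records below are synthetic and the procedure is generic, not the authors' actual statistical pipeline.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Synthetic per-image records: patient sex and whether the model was correct.
rng = np.random.default_rng(0)
sex = rng.choice(["female", "male"], size=1000)
correct = rng.random(1000) < np.where(sex == "female", 0.62, 0.60)

# 2x2 contingency table: rows = sex group, columns = correct / incorrect.
table = np.array([
    [np.sum((sex == g) & correct), np.sum((sex == g) & ~correct)]
    for g in ("female", "male")
])
chi2, p, _, _ = chi2_contingency(table)

for g, row in zip(("female", "male"), table):
    print(f"{g}: accuracy = {row[0] / row.sum():.3f}")
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")  # large p: no evidence of bias
```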
Affiliation(s)
- Zhiyu Wan
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- School of Biomedical Engineering, ShanghaiTech University, Shanghai, China
- Yuhang Guo
- School of Biomedical Engineering, ShanghaiTech University, Shanghai, China
- Shunxing Bao
- Department of Electrical and Computer Engineering, Vanderbilt University, Nashville, TN, USA
- Qian Wang
- School of Biomedical Engineering, ShanghaiTech University, Shanghai, China
- Bradley A. Malin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
4
Syryca F, Gräßer C, Trenkwalder T, Nicol P. Automated generation of echocardiography reports using artificial intelligence: a novel approach to streamlining cardiovascular diagnostics. Int J Cardiovasc Imaging 2025 (online ahead of print). PMID: 40159559; DOI: 10.1007/s10554-025-03382-1.
Abstract
Accurate interpretation of echocardiography measurements is essential for diagnosing cardiovascular diseases and guiding clinical management. The emergence of large language models (LLMs) such as ChatGPT presents a novel opportunity to automate the generation of echocardiography reports and provide clinical recommendations. This study aimed to evaluate the ability of an LLM (ChatGPT) to (1) generate comprehensive echocardiography reports based solely on provided echocardiographic measurements and (2), when the input was enriched with clinical information, formulate accurate diagnoses along with appropriate recommendations for further tests, treatment, and follow-up. Echocardiographic data from n = 13 fictional cases (Group 1) and n = 8 clinical cases (Group 2) were input into the LLM. The model's outputs were compared against standard clinical assessments conducted by experienced cardiologists. Using a dedicated scoring system, the LLM's performance was evaluated and stratified by its accuracy in report generation, diagnostic precision, and the appropriateness of its recommendations. Patterns, frequency, and examples of misinterpretations by the LLM were analysed. Across all cases, the mean total score was 6.86 (SD = 1.12). Group 1 had a mean total score of 6.54 (SD = 1.13) and an accuracy score of 3.92 (SD = 0.86), while Group 2 scored 7.38 (SD = 0.92) and 4.38 (SD = 0.92), respectively. Recommendation scores were 2.62 (SD = 0.51) for Group 1 and 3.00 (SD = 0.00) for Group 2, with no significant difference (p = 0.096). Of the reports, 85.7% were fully acceptable, 14.3% borderline acceptable, and none were unacceptable. Of 299 parameters, 5.3% were misinterpreted. The LLM demonstrated a high level of accuracy in generating detailed echocardiography reports, mostly identifying normal and abnormal findings correctly and making accurate diagnoses across a range of cardiovascular conditions. ChatGPT, as an LLM, shows significant potential for automating the interpretation of echocardiographic data, offering accurate diagnostic insights and clinical recommendations. These findings suggest that LLMs could serve as valuable tools in clinical practice, assisting and streamlining clinical workflow.
Affiliation(s)
- Finn Syryca
- Department of Cardiovascular Diseases, German Heart Centre Munich, School of Medicine and Health, TUM University Hospital, Technical University of Munich, Munich, Germany
- Christian Gräßer
- Department of Cardiovascular Diseases, German Heart Centre Munich, School of Medicine and Health, TUM University Hospital, Technical University of Munich, Munich, Germany
- Teresa Trenkwalder
- Department of Cardiovascular Diseases, German Heart Centre Munich, School of Medicine and Health, TUM University Hospital, Technical University of Munich, Munich, Germany
- Philipp Nicol
- Department of Cardiovascular Diseases, German Heart Centre Munich, School of Medicine and Health, TUM University Hospital, Technical University of Munich, Munich, Germany
- MVZ Med 360 Grad Alter Hof Kardiologie und Nuklearmedizin, Dienerstraße 12, 80331, Munich, Germany
5
van Lent LGG, Yilmaz NG, Goosen S, Burgers J, Giani S, Schouten BC, Langendam MW. Effectiveness of interpreters and other strategies for mitigating language barriers: A systematic review. Patient Educ Couns 2025;136:108767. PMID: 40179546; DOI: 10.1016/j.pec.2025.108767.
Abstract
OBJECTIVE To examine the effectiveness of different communication strategies for mitigating language barriers on patient-, provider-, and context-related outcomes. METHODS A systematic search was conducted in nine databases for quantitative studies published from 2013 onward that compared different strategies. Study quality was assessed with the Evidence Project Risk of Bias tool and the certainty of evidence with the GRADE approach. RESULTS Twenty-six articles were included, all set in healthcare. In general, having a shared language (e.g., a provider speaking the patient's native language), followed by using professional interpreters, yielded the most positive outcomes, with in-person and video interpreters outperforming telephone interpreters. Compared with professional interpreters, the translation quality of informal interpreters was similar only when assessing patient outcomes after surgery, and the quality of digital translation tools was sufficient only for simple messages or when messages were pre-translated. CONCLUSION A provider speaking the patient's native language and professional interpreters outperform other strategies for mitigating language barriers in healthcare, though other strategies may suffice in specific situations. Future research should explore the effectiveness of (combinations of) strategies, especially in social care. PRACTICE IMPLICATIONS This review can inform policy and help develop guidelines on mitigating language barriers to support providers in their daily practice.
Affiliation(s)
- Liza G G van Lent
- Department of Communication Science, Amsterdam School for Communication Research (ASCoR), University of Amsterdam, Amsterdam, the Netherlands.
- Nida Gizem Yilmaz
- Department of Communication Science, Amsterdam School for Communication Research (ASCoR), University of Amsterdam, Amsterdam, the Netherlands
- Simone Goosen
- Netherlands Patients Federation, Utrecht, the Netherlands
- Jako Burgers
- Maastricht University, Department of General Practice, Care and Public Health Research Institute (CAPHRI), Maastricht, the Netherlands
- Stefano Giani
- University Library, University of Amsterdam, Amsterdam, the Netherlands
- Barbara C Schouten
- Department of Communication Science, Amsterdam School for Communication Research (ASCoR), University of Amsterdam, Amsterdam, the Netherlands
- Miranda W Langendam
- Department of Epidemiology and Data Science, Amsterdam University Medical Center, Amsterdam, the Netherlands; Amsterdam Public Health Research Institute, Methodology, Amsterdam, the Netherlands
6
Yang H, Li J, Zhang C, Sierra AP, Shen B. Large Language Model-Driven Knowledge Graph Construction in Sepsis Care Using Multicenter Clinical Databases: Development and Usability Study. J Med Internet Res 2025;27:e65537. PMID: 40146985; DOI: 10.2196/65537.
Abstract
BACKGROUND Sepsis is a complex, life-threatening condition characterized by significant heterogeneity and vast amounts of unstructured data, posing substantial challenges for traditional knowledge graph construction methods. Integrating large language models (LLMs) with real-world data offers a promising avenue for addressing these challenges and enhancing the understanding and management of sepsis. OBJECTIVE This study aims to develop a comprehensive sepsis knowledge graph by leveraging the capabilities of LLMs, specifically GPT-4.0, in conjunction with multicenter clinical databases, with the goal of improving the understanding of sepsis and providing actionable insights for clinical decision-making. We also established a multicenter sepsis database (MSD) to support this effort. METHODS We collected clinical guidelines, public databases, and real-world data from 3 major hospitals in Western China, encompassing 10,544 patients diagnosed with sepsis. Using GPT-4.0 with advanced prompt engineering techniques for entity recognition and relationship extraction, we constructed a nuanced sepsis knowledge graph. RESULTS The sepsis database comprises 10,544 patient records: 8497 from West China Hospital, 690 from Shangjin Hospital, and 357 from Tianfu Hospital. The resulting knowledge graph contains 1894 nodes and 2021 distinct relationships, covering 9 entity concepts (diseases, symptoms, biomarkers, imaging examinations, etc.) and 8 semantic relationships (complications, recommended medications, laboratory tests, etc.). GPT-4.0 demonstrated superior performance in entity recognition and relationship extraction, achieving an F1-score of 76.76 on a sepsis-specific dataset, outperforming models such as Qwen2 (43.77) and Llama3 (48.39). On the CMeEE dataset, GPT-4.0 achieved an F1-score of 65.42 with few-shot learning, surpassing traditional models such as BERT-CRF (62.11) and Med-BERT (60.66). CONCLUSIONS This study represents a pioneering effort to use LLMs, particularly GPT-4.0, to construct a comprehensive sepsis knowledge graph. The innovative application of prompt engineering, combined with the integration of multicenter real-world data, significantly enhanced the efficiency and accuracy of knowledge graph construction. The resulting knowledge graph provides a robust framework for understanding sepsis, supporting clinical decision-making, and facilitating further research. The success of this approach underscores the potential of LLMs in medical research and sets a new benchmark for future studies of sepsis and other complex medical conditions.
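The study's prompts are not reproduced in the abstract; the sketch below illustrates generic prompt-driven entity and relationship extraction with the OpenAI Python client, where the prompt wording, JSON schema, and model identifier are assumptions for illustration only.

```python
import json
from openai import OpenAI  # assumes the openai>=1.0 Python client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical instruction; the study's actual prompts and schema differ.
PROMPT = """Extract medical entities and relationships from the clinical text.
Entity types: disease, symptom, biomarker, imaging_examination, medication.
Relation types: complication_of, recommended_medication, laboratory_test.
Return JSON: {"entities": [{"text": "...", "type": "..."}],
              "relations": [{"head": "...", "type": "...", "tail": "..."}]}"""

def extract_triples(clinical_text: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder for the GPT-4.0 model used in the study
        messages=[
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": clinical_text},
        ],
        temperature=0,
    )
    # Assumes the model returns valid JSON; production code would validate.
    return json.loads(response.choices[0].message.content)

triples = extract_triples("Septic shock with elevated lactate; "
                          "norepinephrine recommended.")
# Extracted entities become graph nodes; relations become typed edges.
```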
Affiliation(s)
- Hao Yang
- Department of Critical Care Medicine, Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Frontiers Science Center for Disease-related Molecular Network, Institutes for Systems Genetics, Sichuan University, West China Hospital, Chengdu, China
- Information Center, Engineering Research Center of Medical Information Technology, Ministry of Education, West China Hospital, Sichuan University, Chengdu, China
- Department of Computer Science and Information Technologies, Iberian Society of Telehealth and Telemedicine, University of A Coruña, A Coruña, Spain
- Jiaxi Li
- Department of Clinical Laboratory Medicine, Jinniu Maternity and Child Health Hospital of Chengdu, Chengdu, China
- Chi Zhang
- Department of Critical Care Medicine, Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Frontiers Science Center for Disease-related Molecular Network, Institutes for Systems Genetics, Sichuan University, West China Hospital, Chengdu, China
- Alejandro Pazos Sierra
- Department of Computer Science and Information Technologies, Iberian Society of Telehealth and Telemedicine, Research Center for Information and Communications Technologies, Biomedical Research Institute of A Coruña, University of A Coruña, A Coruña, Spain
- Bairong Shen
- Department of Critical Care Medicine, Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Frontiers Science Center for Disease-related Molecular Network, Institutes for Systems Genetics, Sichuan University, West China Hospital, Chengdu, China
7
Wang J, Shue K, Liu L, Hu G. Preliminary evaluation of ChatGPT model iterations in emergency department diagnostics. Sci Rep 2025;15:10426. PMID: 40140500; PMCID: PMC11947261; DOI: 10.1038/s41598-025-95233-1.
Abstract
Large language model chatbots such as ChatGPT have shown potential for assisting health professionals in emergency departments (EDs). However, the diagnostic accuracy of newer ChatGPT models remains unclear. This retrospective study evaluated the diagnostic performance of various ChatGPT models, including the GPT-3.5, GPT-4, GPT-4o, and o1 series, in predicting diagnoses for ED patients (n = 30) and examined the impact of explicitly invoking reasoning (thoughts). Earlier models such as GPT-3.5 demonstrated high accuracy for the top-three differential diagnoses (80.0%) but underperformed in identifying the leading diagnosis (47.8%) compared with newer models such as chatgpt-4o-latest (60%, p < 0.01) and o1-preview (60%, p < 0.01). Asking the model to provide its thoughts significantly improved leading-diagnosis prediction for 4o models such as 4o-2024-0513 (from 45.6% to 56.7%; p = 0.03) and 4o-mini-2024-07-18 (from 54.4% to 60.0%; p = 0.04) but had minimal impact on o1-mini and o1-preview. In challenging cases, such as pneumonia without fever, all models generally failed to predict the correct diagnosis, marking atypical presentations as a major limitation for ED application of current ChatGPT models.
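The headline metrics here are leading-diagnosis (top-1) and top-three differential accuracy; a minimal sketch of how such scores are computed over case-level predictions follows, using hypothetical cases.

```python
def top_k_accuracy(cases: list[dict], k: int) -> float:
    """Fraction of cases whose reference diagnosis appears in the
    model's top-k ranked differential."""
    hits = sum(case["reference"] in case["differential"][:k] for case in cases)
    return hits / len(cases)

# Hypothetical ED cases: model-ranked differential vs. chart diagnosis.
cases = [
    {"differential": ["ACS", "PE", "GERD"], "reference": "PE"},
    {"differential": ["pneumonia", "CHF", "COPD"], "reference": "pneumonia"},
    {"differential": ["appendicitis", "colitis", "UTI"], "reference": "colitis"},
]
print(f"leading (top-1) accuracy: {top_k_accuracy(cases, 1):.2f}")
print(f"top-3 accuracy: {top_k_accuracy(cases, 3):.2f}")
```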
Affiliation(s)
- Jinge Wang
- Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV, 26506, USA
- Kenneth Shue
- Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV, 26506, USA
- Li Liu
- College of Health Solutions, Arizona State University, Phoenix, AZ, 85004, USA
- Biodesign Institute, Arizona State University, Tempe, AZ, 85281, USA
- Gangqing Hu
- Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV, 26506, USA
8
Gim H, Cook B, Le J, Stretton B, Gao C, Gupta A, Kovoor J, Guo C, Arnold M, Gheihman G, Bacchi S. Large language model-supported interactive case-based learning: a pilot study. Intern Med J 2025 (online ahead of print). PMID: 40125598; DOI: 10.1111/imj.70030.
Abstract
Large language models (LLMs) have been proposed as a means to augment case-based learning but are prone to generating factually incorrect content. In this study, an LLM-based tool was developed and its performance evaluated. In response to student-generated questions, the LLM adhered to the provided screenplay in 832/857 (97.1%) instances, and in the remaining instances its output was medically appropriate in 24/25 (96.0%) cases. Use of LLMs appears feasible for this purpose, and further studies are required to examine their educational impact.
Affiliation(s)
- Haelynn Gim
- Harvard Medical School, Boston, Massachusetts, USA
- Benjamin Cook
- Adelaide Medical School, The University of Adelaide, Adelaide, South Australia, Australia
- Jasmin Le
- Adelaide Medical School, The University of Adelaide, Adelaide, South Australia, Australia
- Brandon Stretton
- Adelaide Medical School, The University of Adelaide, Adelaide, South Australia, Australia
- Christina Gao
- Adelaide Medical School, The University of Adelaide, Adelaide, South Australia, Australia
- Aashray Gupta
- Adelaide Medical School, The University of Adelaide, Adelaide, South Australia, Australia
- Joshua Kovoor
- Adelaide Medical School, The University of Adelaide, Adelaide, South Australia, Australia
- Ballarat Base Hospital, Ballarat, Victoria, Australia
- Christina Guo
- The Alfred, Melbourne, Victoria, Australia
- Johns Hopkins, Baltimore, Maryland, USA
- Matthew Arnold
- Adelaide Medical School, The University of Adelaide, Adelaide, South Australia, Australia
- Galina Gheihman
- Harvard Medical School, Boston, Massachusetts, USA
- Mass General Brigham, Boston, Massachusetts, USA
- Stephen Bacchi
- Harvard Medical School, Boston, Massachusetts, USA
- Adelaide Medical School, The University of Adelaide, Adelaide, South Australia, Australia
- Massachusetts General Hospital, Boston, Massachusetts, USA
- Flinders University, Adelaide, South Australia, Australia
- Lyell McEwin Hospital, Adelaide, South Australia, Australia
9
Lo Bianco G, Robinson CL, D’Angelo FP, Cascella M, Natoli S, Sinagra E, Mercadante S, Drago F. Effectiveness of Generative Artificial Intelligence-Driven Responses to Patient Concerns in Long-Term Opioid Therapy: Cross-Model Assessment. Biomedicines 2025;13:636. PMID: 40149612; PMCID: PMC11940240; DOI: 10.3390/biomedicines13030636.
Abstract
Background: While long-term opioid therapy is a widely used strategy for managing chronic pain, many patients have understandable questions and concerns regarding its safety, efficacy, and potential for dependency and addiction. Providing clear, accurate, and reliable information is essential for fostering patient understanding and acceptance. Generative artificial intelligence (AI) applications offer interesting avenues for delivering patient education in healthcare. This study evaluates the reliability, accuracy, and comprehensibility of ChatGPT's responses to common patient inquiries about long-term opioid therapy. Methods: An expert panel selected thirteen frequently asked questions regarding long-term opioid therapy based on the authors' clinical experience in managing chronic pain patients and a targeted review of patient education materials. Questions were prioritized based on their prevalence in patient consultations, their relevance to treatment decision-making, and the complexity of the information typically required to address them comprehensively. Comprehensibility was assessed with the multimodal generative AI Copilot (Microsoft 365 Copilot Chat). Spanning three domains (pre-therapy, during therapy, and post-therapy), each question was submitted to GPT-4.0 with the prompt "If you were a physician, how would you answer a patient asking…". Ten pain physicians and two non-healthcare professionals independently assessed the responses using a Likert scale to rate reliability (1-6 points), accuracy (1-3 points), and comprehensibility (1-3 points). Results: Overall, ChatGPT's responses demonstrated high reliability (5.2 ± 0.6) and good comprehensibility (2.8 ± 0.2), with most answers meeting or exceeding predefined thresholds. Accuracy was moderate (2.7 ± 0.3), with lower performance on more technical topics such as opioid tolerance and dependency management. Conclusions: While AI applications show significant potential as a supplementary tool for patient education on long-term opioid therapy, their limitations in addressing highly technical or context-specific queries underscore the need for ongoing refinement and domain-specific training. Integrating AI systems into clinical practice should involve collaboration between healthcare professionals and AI developers to ensure safe, personalized, and up-to-date patient education in chronic pain management.
Affiliation(s)
- Giuliano Lo Bianco
- Anesthesiology and Pain Department, Foundation G. Giglio Cefalù, 90015 Palermo, Italy
- Christopher L. Robinson
- Anesthesiology, Perioperative, and Pain Medicine, Brigham and Women’s Hospital, Harvard Medical School, Harvard University, Boston, MA 02115, USA
- Francesco Paolo D’Angelo
- Department of Anaesthesia, Intensive Care and Emergency, University Hospital Policlinico Paolo Giaccone, 90127 Palermo, Italy
- Marco Cascella
- Anesthesia and Pain Medicine, Department of Medicine, Surgery and Dentistry “Scuola Medica Salernitana”, University of Salerno, 84081 Baronissi, Italy
- Silvia Natoli
- Department of Clinical-Surgical, Diagnostic and Pediatric Sciences, University of Pavia, 27100 Pavia, Italy
- Pain Unit, Fondazione IRCCS Policlinico San Matteo, 27100 Pavia, Italy
- Emanuele Sinagra
- Gastroenterology and Endoscopy Unit, Fondazione Istituto San Raffaele Giglio, 90015 Cefalù, Italy
- Sebastiano Mercadante
- Main Regional Center for Pain Relief and Supportive/Palliative Care, La Maddalena Cancer Center, Via San Lorenzo 312, 90146 Palermo, Italy
- Filippo Drago
- Department of Biomedical and Biotechnological Sciences, University of Catania, 95124 Catania, Italy
10
Phu J, Wang H, Kalloniatis M. Re: 'Using ChatGPT-4 in visual field test assessment'. Clin Exp Optom 2025:1-2 (online ahead of print). PMID: 40032637; DOI: 10.1080/08164622.2025.2472876.
Affiliation(s)
- Jack Phu
- School of Optometry and Vision Science, University of New South Wales, Kensington, New South Wales, Australia
- Henrietta Wang
- School of Optometry and Vision Science, University of New South Wales, Kensington, New South Wales, Australia
11
Trapp C, Schmidt-Hegemann N, Keilholz M, Brose SF, Marschner SN, Schönecker S, Maier SH, Dehelean DC, Rottler M, Konnerth D, Belka C, Corradini S, Rogowski P. Patient- and clinician-based evaluation of large language models for patient education in prostate cancer radiotherapy. Strahlenther Onkol 2025;201:333-342. PMID: 39792259; PMCID: PMC11839798; DOI: 10.1007/s00066-024-02342-3.
Abstract
BACKGROUND This study aims to evaluate the capabilities and limitations of large language models (LLMs) for providing patient education to men undergoing radiotherapy for localized prostate cancer, incorporating assessments from both clinicians and patients. METHODS Six questions about definitive radiotherapy for prostate cancer were designed based on common patient inquiries. These questions were presented to different LLMs [ChatGPT-4, ChatGPT-4o (both OpenAI Inc., San Francisco, CA, USA), Gemini (Google LLC, Mountain View, CA, USA), Copilot (Microsoft Corp., Redmond, WA, USA), and Claude (Anthropic PBC, San Francisco, CA, USA)] via the respective web interfaces. Responses were evaluated for readability using the Flesch Reading Ease Index. Five radiation oncologists assessed the responses for relevance, correctness, and completeness using a five-point Likert scale. Additionally, 35 prostate cancer patients evaluated the responses from ChatGPT-4 for comprehensibility, accuracy, relevance, trustworthiness, and overall informativeness. RESULTS The Flesch Reading Ease Index indicated that the responses from all LLMs were relatively difficult to understand. All LLMs provided answers that clinicians found generally relevant and correct, and the answers from ChatGPT-4, ChatGPT-4o, and Claude were also found to be complete. However, there were significant differences among the LLMs in relevance and completeness, and some answers lacked detail or contained inaccuracies. Patients perceived the information as easy to understand and relevant, with most expressing confidence in it and a willingness to use ChatGPT-4 for future medical questions. ChatGPT-4's responses helped patients feel better informed, despite the standardized information they had initially received. CONCLUSION Overall, LLMs show promise as a tool for patient education in prostate cancer radiotherapy. While improvements in accuracy and readability are needed, positive feedback from clinicians and patients suggests that LLMs can enhance patient understanding and engagement. Further research is essential to fully realize the potential of artificial intelligence in patient education.
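The Flesch Reading Ease Index used for readability above is a fixed formula; a minimal sketch follows (higher scores mean easier text, and the same crude syllable-heuristic caveat applies as with any hand-rolled implementation).

```python
import re

def flesch_reading_ease(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    # Vowel-group heuristic; dictionary-based syllable counters are better.
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower())))
                    for w in words)
    return (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * syllables / len(words))

answer = ("Radiotherapy utilizes ionizing radiation to eradicate malignant "
          "prostate tissue while sparing the adjacent organs at risk.")
print(f"FRE = {flesch_reading_ease(answer):.1f}")  # low score = hard to read
```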
Affiliation(s)
- Christian Trapp
- Department of Radiation Oncology, University Hospital, LMU Munich, Marchioninistr. 15, 81377, Munich, Germany.
- Nina Schmidt-Hegemann
- Department of Radiation Oncology, University Hospital, LMU Munich, Marchioninistr. 15, 81377, Munich, Germany
- Michael Keilholz
- Department of Radiation Oncology, University Hospital, LMU Munich, Marchioninistr. 15, 81377, Munich, Germany
- Sarah Frederike Brose
- Department of Radiation Oncology, University Hospital, LMU Munich, Marchioninistr. 15, 81377, Munich, Germany
- Sebastian N Marschner
- Department of Radiation Oncology, University Hospital, LMU Munich, Marchioninistr. 15, 81377, Munich, Germany
- Stephan Schönecker
- Department of Radiation Oncology, University Hospital, LMU Munich, Marchioninistr. 15, 81377, Munich, Germany
- Sebastian H Maier
- Department of Radiation Oncology, University Hospital, LMU Munich, Marchioninistr. 15, 81377, Munich, Germany
- Diana-Coralia Dehelean
- Department of Radiation Oncology, University Hospital, LMU Munich, Marchioninistr. 15, 81377, Munich, Germany
- Maya Rottler
- Department of Radiation Oncology, University Hospital, LMU Munich, Marchioninistr. 15, 81377, Munich, Germany
- Dinah Konnerth
- Department of Radiation Oncology, University Hospital, LMU Munich, Marchioninistr. 15, 81377, Munich, Germany
- Claus Belka
- Department of Radiation Oncology, University Hospital, LMU Munich, Marchioninistr. 15, 81377, Munich, Germany
- Bavarian Cancer Research Center (BZKF), Munich, Germany
- German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany
- Stefanie Corradini
- Department of Radiation Oncology, University Hospital, LMU Munich, Marchioninistr. 15, 81377, Munich, Germany
- Paul Rogowski
- Department of Radiation Oncology, University Hospital, LMU Munich, Marchioninistr. 15, 81377, Munich, Germany
12
García-Rudolph A, Sanchez-Pinsach D, Caridad Fernandez M, Cunyat S, Opisso E, Hernandez-Pena E. How Chatbots Respond to NCLEX-RN Practice Questions: Assessment of Google Gemini, GPT-3.5, and GPT-4. Nurs Educ Perspect 2025;46:E18-E20. PMID: 39692545; DOI: 10.1097/01.nep.0000000000001364.
Abstract
ChatGPT often "hallucinates" or misleads, underscoring the need for formal validation at the professional level before reliable use in nursing education. We evaluated two free chatbots (Google Gemini and GPT-3.5) and a commercial version (GPT-4) on 250 standardized questions from a simulated nursing licensure exam that closely matches the content and complexity of the actual exam. Gemini answered 73.2 percent of questions correctly (183/250), GPT-3.5 72 percent (180/250), and GPT-4 reached a notably higher 92.4 percent (231/250). GPT-4 exhibited its highest error rate (13.3%) in the psychosocial integrity category.
Affiliation(s)
- Alejandro García-Rudolph
- About the Authors: Alejandro García-Rudolph, PhD; David Sanchez-Pinsach, PhD; Mira Caridad Fernandez, MSc; Sandra Cunyat, MSc; Eloy Opisso, PhD; and Elena Hernandez-Pena, MSc, are faculty at the Institut Guttmann Hospital de Neurorehabilitació, Barcelona, Spain. The authors are grateful to Olga Araujo of the Institut Guttmann Documentation Office for her support in accessing the literature. For more information, contact Dr. Alejandro García-Rudolph.
13
Lehnen NC, Kürsch J, Wichtmann BD, Wolter M, Bendella Z, Bode FJ, Zimmermann H, Radbruch A, Vollmuth P, Dorn F. Llama 3.1 405B Is Comparable to GPT-4 for Extraction of Data from Thrombectomy Reports - A Step Towards Secure Data Extraction. Clin Neuroradiol 2025 (online ahead of print). PMID: 39998651; DOI: 10.1007/s00062-025-01500-z.
Abstract
PURPOSE GPT-4 has been shown to correctly extract procedural details from free-text reports on mechanical thrombectomy. However, GPT may not be suitable for analyzing reports containing personal data. The purpose of this study was to evaluate the ability of the large language models (LLMs) Llama 3.1 405B, Llama 3 70B, Llama 3 8B, and Mixtral 8x7B, which can be operated offline, to extract procedural details from free-text reports on mechanical thrombectomies. METHODS Free-text reports on mechanical thrombectomy from two institutions were included. A detailed prompt was used in German and English. The ability of the LLMs to extract procedural data was compared to GPT-4 using McNemar's test, with the manual data entries made by an interventional neuroradiologist serving as the reference standard. RESULTS A total of 100 reports from institution 1 (mean age 74.7 ± 13.2 years; 53 females) and 30 reports from institution 2 (mean age 72.7 ± 13.5 years; 18 males) were included. Llama 3.1 405B extracted 2619 of 2800 data points correctly (93.5% [95% CI: 92.6%, 94.4%], p = 0.39 vs. GPT-4). Llama 3 70B extracted 2537 data points correctly with the English prompt (90.6% [95% CI: 89.5%, 91.7%], p < 0.001 vs. GPT-4) and 2471 with the German prompt (88.2% [95% CI: 87.0%, 89.4%], p < 0.001 vs. GPT-4). Llama 3 8B extracted 2314 data points correctly (82.6% [95% CI: 81.2%, 84.0%], p < 0.001 vs. GPT-4), and Mixtral 8x7B extracted 2411 correctly (86.1% [95% CI: 84.8%, 87.4%], p < 0.001 vs. GPT-4). CONCLUSION Llama 3.1 405B was comparable to GPT-4 for data extraction from free-text reports on mechanical thrombectomies and may represent a data-secure alternative when operated locally.
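The per-model comparisons above rest on McNemar's test over paired correct/incorrect outcomes for the same data points; a minimal sketch with statsmodels is shown below, with synthetic outcomes standing in for the real extraction results.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Synthetic paired outcomes per extracted data point: whether each model
# matched the manual reference entry (the study used 2800 real data points).
rng = np.random.default_rng(42)
llama_correct = rng.random(2800) < 0.935
gpt4_correct = rng.random(2800) < 0.939

# 2x2 table of concordant/discordant pairs for the paired test.
table = np.array([
    [np.sum(llama_correct & gpt4_correct), np.sum(llama_correct & ~gpt4_correct)],
    [np.sum(~llama_correct & gpt4_correct), np.sum(~llama_correct & ~gpt4_correct)],
])
result = mcnemar(table, exact=False, correction=True)
print(f"McNemar chi-square = {result.statistic:.2f}, p = {result.pvalue:.3f}")
```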
Affiliation(s)
- Nils C Lehnen
- Department of Neuroradiology, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, Venusberg-Campus 1, 53127, Bonn, Germany.
- Johannes Kürsch
- Department of Neuroradiology, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, Venusberg-Campus 1, 53127, Bonn, Germany
- Barbara D Wichtmann
- Department of Neuroradiology, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, Venusberg-Campus 1, 53127, Bonn, Germany
- Moritz Wolter
- High Performance Computing & Analytics Lab, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Zeynep Bendella
- Department of Neuroradiology, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, Venusberg-Campus 1, 53127, Bonn, Germany
- Felix J Bode
- Department of Vascular Neurology, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, 53127, Bonn, Germany
- Hanna Zimmermann
- Institute of Neuroradiology, University Hospital, LMU Munich, Munich, Germany
- Alexander Radbruch
- Department of Neuroradiology, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, Venusberg-Campus 1, 53127, Bonn, Germany
- Philipp Vollmuth
- Department of Neuroradiology, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, Venusberg-Campus 1, 53127, Bonn, Germany
- Franziska Dorn
- Department of Neuroradiology, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, Venusberg-Campus 1, 53127, Bonn, Germany
14
Lo Bianco G, Cascella M, Li S, Day M, Kapural L, Robinson CL, Sinagra E. Reliability, Accuracy, and Comprehensibility of AI-Based Responses to Common Patient Questions Regarding Spinal Cord Stimulation. J Clin Med 2025;14:1453. PMID: 40094896; PMCID: PMC11899866; DOI: 10.3390/jcm14051453.
Abstract
Background: Although spinal cord stimulation (SCS) is an effective treatment for managing chronic pain, many patients have understandable questions and concerns regarding this therapy. Artificial intelligence (AI) has shown promise in delivering patient education in healthcare. This study evaluates the reliability, accuracy, and comprehensibility of ChatGPT's responses to common patient inquiries about SCS. Methods: Thirteen commonly asked questions regarding SCS were selected based on the authors' clinical experience in managing chronic pain patients and a targeted review of patient education materials and relevant medical literature. The questions were prioritized based on their frequency in patient consultations, their relevance to decision-making about SCS, and the complexity of the information typically required to address them comprehensively. They spanned three domains: pre-procedural, intra-procedural, and post-procedural concerns. Responses were generated using GPT-4.0 with the prompt "If you were a physician, how would you answer a patient asking…". Responses were independently assessed by 10 pain physicians and two non-healthcare professionals using a Likert scale for reliability (1-6 points), accuracy (1-3 points), and comprehensibility (1-3 points). Results: ChatGPT's responses demonstrated strong reliability (5.1 ± 0.7) and comprehensibility (2.8 ± 0.2), with 92% and 98% of responses, respectively, meeting or exceeding our predefined thresholds. Accuracy was 2.7 ± 0.3, with 95% of responses rated sufficiently accurate. General queries, such as "What is spinal cord stimulation?" and "What are the risks and benefits?", received higher scores than technical questions like "What are the different types of waveforms used in SCS?". Conclusions: ChatGPT can serve as a supplementary tool for patient education, particularly for general and procedural queries about SCS, but its performance was less robust on highly technical or nuanced questions.
Affiliation(s)
- Giuliano Lo Bianco
- Anesthesiology and Pain Department, Foundation G. Giglio Cefalù, 90015 Palermo, Italy
- Marco Cascella
- Anesthesia and Pain Medicine, Department of Medicine, Surgery and Dentistry “Scuola Medica Salernitana”, University of Salerno, 84081 Baronissi, Italy
- Sean Li
- National Spine and Pain Centers, Shrewsbury, NJ 07702, USA
- Miles Day
- Department of Anesthesiology, Texas Tech University Health Sciences Center, Lubbock, TX 79430, USA
- Christopher L. Robinson
- Anesthesiology, Perioperative, and Pain Medicine, Harvard Medical School, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Emanuele Sinagra
- Gastroenterology and Endoscopy Unit, Fondazione Istituto San Raffaele Giglio, 90015 Cefalù, Italy
15
On SW, Cho SW, Park SY, Ha JW, Yi SM, Park IY, Byun SH, Yang BE. Chat Generative Pre-Trained Transformer (ChatGPT) in Oral and Maxillofacial Surgery: A Narrative Review on Its Research Applications and Limitations. J Clin Med 2025;14:1363. PMID: 40004892; PMCID: PMC11856154; DOI: 10.3390/jcm14041363.
Abstract
Objectives: This review aimed to evaluate the role of ChatGPT in original research articles within the field of oral and maxillofacial surgery (OMS), focusing on its applications, limitations, and future directions. Methods: A literature search was conducted in PubMed using predefined search terms and Boolean operators to identify original research articles utilizing ChatGPT published up to October 2024. The selection process involved screening studies for relevance to OMS and ChatGPT applications, with 26 articles meeting the final inclusion criteria. Results: ChatGPT has been applied in various OMS-related domains, including clinical decision support in real and virtual scenarios, patient and practitioner education, scientific writing and referencing, and answering licensing exam questions. As a clinical decision support tool, ChatGPT demonstrated moderate accuracy (approximately 70-80%), and it showed moderate to high accuracy (up to 90%) in providing patient guidance and information. However, its reliability remains inconsistent across applications, necessitating further evaluation. Conclusions: While ChatGPT presents potential benefits in OMS, particularly in supporting clinical decisions and improving access to medical information, it should not be regarded as a substitute for clinicians and must be used as an adjunct tool. Further validation studies and technological refinements are required to enhance its reliability and effectiveness in clinical and research settings.
Affiliation(s)
- Sung-Woon On
- Division of Oral and Maxillofacial Surgery, Department of Dentistry, Dongtan Sacred Heart Hospital, Hallym University College of Medicine, Hwaseong 18450, Republic of Korea
- Department of Artificial Intelligence and Robotics in Dentistry, Graduate School of Clinical Dentistry, Hallym University, Chuncheon 24252, Republic of Korea
- Institute of Clinical Dentistry, Hallym University, Chuncheon 24252, Republic of Korea
- Seoung-Won Cho
- Department of Artificial Intelligence and Robotics in Dentistry, Graduate School of Clinical Dentistry, Hallym University, Chuncheon 24252, Republic of Korea
- Institute of Clinical Dentistry, Hallym University, Chuncheon 24252, Republic of Korea
- Sang-Yoon Park
- Department of Artificial Intelligence and Robotics in Dentistry, Graduate School of Clinical Dentistry, Hallym University, Chuncheon 24252, Republic of Korea
- Institute of Clinical Dentistry, Hallym University, Chuncheon 24252, Republic of Korea
- Department of Oral and Maxillofacial Surgery, Hallym University Sacred Heart Hospital, Anyang 14066, Republic of Korea
- Dental Artificial Intelligence and Robotics R&D Center, Hallym University Medical Center, Anyang 14066, Republic of Korea
- Ji-Won Ha
- Division of Oral and Maxillofacial Surgery, Department of Dentistry, Dongtan Sacred Heart Hospital, Hallym University College of Medicine, Hwaseong 18450, Republic of Korea
- Department of Artificial Intelligence and Robotics in Dentistry, Graduate School of Clinical Dentistry, Hallym University, Chuncheon 24252, Republic of Korea
- Institute of Clinical Dentistry, Hallym University, Chuncheon 24252, Republic of Korea
- Sang-Min Yi
- Department of Artificial Intelligence and Robotics in Dentistry, Graduate School of Clinical Dentistry, Hallym University, Chuncheon 24252, Republic of Korea
- Institute of Clinical Dentistry, Hallym University, Chuncheon 24252, Republic of Korea
- Department of Oral and Maxillofacial Surgery, Hallym University Sacred Heart Hospital, Anyang 14066, Republic of Korea
- Dental Artificial Intelligence and Robotics R&D Center, Hallym University Medical Center, Anyang 14066, Republic of Korea
- In-Young Park
- Department of Artificial Intelligence and Robotics in Dentistry, Graduate School of Clinical Dentistry, Hallym University, Chuncheon 24252, Republic of Korea
- Institute of Clinical Dentistry, Hallym University, Chuncheon 24252, Republic of Korea
- Dental Artificial Intelligence and Robotics R&D Center, Hallym University Medical Center, Anyang 14066, Republic of Korea
- Department of Orthodontics, Hallym University Sacred Heart Hospital, Anyang 14066, Republic of Korea
- Soo-Hwan Byun
- Department of Artificial Intelligence and Robotics in Dentistry, Graduate School of Clinical Dentistry, Hallym University, Chuncheon 24252, Republic of Korea
- Institute of Clinical Dentistry, Hallym University, Chuncheon 24252, Republic of Korea
- Department of Oral and Maxillofacial Surgery, Hallym University Sacred Heart Hospital, Anyang 14066, Republic of Korea
- Dental Artificial Intelligence and Robotics R&D Center, Hallym University Medical Center, Anyang 14066, Republic of Korea
- Byoung-Eun Yang
- Department of Artificial Intelligence and Robotics in Dentistry, Graduate School of Clinical Dentistry, Hallym University, Chuncheon 24252, Republic of Korea
- Institute of Clinical Dentistry, Hallym University, Chuncheon 24252, Republic of Korea
- Department of Oral and Maxillofacial Surgery, Hallym University Sacred Heart Hospital, Anyang 14066, Republic of Korea
- Dental Artificial Intelligence and Robotics R&D Center, Hallym University Medical Center, Anyang 14066, Republic of Korea
16
Wang Y, Yang S, Zeng C, Xie Y, Shen Y, Li J, Huang X, Wei R, Chen Y. Evaluating the performance of ChatGPT in patient consultation and image-based preliminary diagnosis in thyroid eye disease. Front Med (Lausanne) 2025;12:1546706. PMID: 40041459; PMCID: PMC11876178; DOI: 10.3389/fmed.2025.1546706.
Abstract
Background The emergence of Large Language Model (LLM) chatbots, such as ChatGPT, holds great promise for enhancing healthcare practice. Online consultation, accurate pre-diagnosis, and clinical efforts are of fundamental importance for a patient-oriented management system. Objective This cross-sectional study aims to evaluate the performance of ChatGPT on inquiries across ophthalmic domains, focusing on Thyroid Eye Disease (TED) consultation and image-based preliminary diagnosis in a non-English language. Methods We obtained frequently consulted clinical inquiries from a published reference based on patient consultation data, titled A Comprehensive Collection of Thyroid Eye Disease Knowledge. Additionally, we collected facial and Computed Tomography (CT) images from 16 patients with a definitive diagnosis of TED. From 18 to 30 May 2024, inquiries about TED consultation and preliminary diagnosis were posed to ChatGPT using a new chat for each question. Responses to questions from ChatGPT-4, ChatGPT-4o, and an experienced ocular professor were compiled into three questionnaires, which were evaluated by patients and ophthalmologists on four dimensions: accuracy, comprehensiveness, conciseness, and satisfaction. Preliminary diagnoses of TED were graded for accuracy, and differences in accuracy rates were calculated. Results For common TED consultation questions, ChatGPT-4o delivered more accurate information with logical consistency, adhering to a structured format of disease definition, detailed sections, and summarized conclusions. Notably, the answers generated by ChatGPT-4o were rated higher than those of ChatGPT-4 and the professor for accuracy (4.33 [0.69]), comprehensiveness (4.17 [0.75]), conciseness (4.12 [0.77]), and satisfaction (4.28 [0.70]). The characteristics of the evaluators, the response variables, and the other quality scores were all correlated with overall satisfaction levels. From facial images alone, ChatGPT-4 twice failed to offer a diagnosis, citing a lack of characteristic symptoms or a complete medical history, whereas ChatGPT-4o accurately identified the pathologic condition in 31.25% of cases (95% confidence interval, CI: 11.02-58.66%). When CT images were added, ChatGPT-4o performed comparably to the professor in diagnostic accuracy (87.5%, 95% CI: 61.65-98.45%). Conclusion ChatGPT-4o excelled in comprehensive and satisfactory patient consultation and imaging interpretation, indicating its potential to improve clinical practice efficiency. However, limitations in disinformation management and legal permissions remain major concerns that require further investigation in clinical practice.
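The exact binomial (Clopper-Pearson) intervals quoted above can be reproduced from the underlying counts (5/16 correct from facial images alone, 14/16 with CT added), assuming that is the method the authors used:

```python
from scipy.stats import binomtest  # requires scipy >= 1.7

for label, k, n in [("facial images only", 5, 16), ("facial + CT images", 14, 16)]:
    ci = binomtest(k, n).proportion_ci(confidence_level=0.95, method="exact")
    print(f"{label}: {k}/{n} = {k / n:.2%}, "
          f"95% CI [{ci.low:.2%}, {ci.high:.2%}]")
# Prints ~31.25% [11.02%, 58.66%] and ~87.50% [61.65%, 98.45%],
# matching the intervals reported in the abstract.
```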
Affiliation(s)
- Yue Wang
- Department of Ophthalmology, Changzheng Hospital of Naval Medical University, Shanghai, China
- Shuo Yang
- Department of Ophthalmology, Changzheng Hospital of Naval Medical University, Shanghai, China
- Chengcheng Zeng
- Department of Ophthalmology, Changzheng Hospital of Naval Medical University, Shanghai, China
- Yingwei Xie
- Department of Urology, Beijing Tongren Hospital of Capital Medical University, Beijing, China
- Ya Shen
- Department of Ophthalmology, Changzheng Hospital of Naval Medical University, Shanghai, China
- Jian Li
- Department of Ophthalmology, Changzheng Hospital of Naval Medical University, Shanghai, China
- Xiao Huang
- Department of Ophthalmology, Changzheng Hospital of Naval Medical University, Shanghai, China
- Ruili Wei
- Department of Ophthalmology, Changzheng Hospital of Naval Medical University, Shanghai, China
- Yuqing Chen
- Department of Ophthalmology, Changzheng Hospital of Naval Medical University, Shanghai, China
17
Huo B, Boyle A, Marfo N, Tangamornsuksan W, Steen JP, McKechnie T, Lee Y, Mayol J, Antoniou SA, Thirunavukarasu AJ, Sanger S, Ramji K, Guyatt G. Large Language Models for Chatbot Health Advice Studies: A Systematic Review. JAMA Netw Open 2025;8:e2457879. PMID: 39903463; PMCID: PMC11795331; DOI: 10.1001/jamanetworkopen.2024.57879.
Abstract
Importance There is much interest in the clinical integration of large language models (LLMs) in health care. Many studies have assessed the ability of LLMs to provide health advice, but the quality of their reporting is uncertain. Objective To perform a systematic review examining reporting variability among peer-reviewed studies evaluating the performance of generative artificial intelligence (AI)-driven chatbots for summarizing evidence and providing health advice, to inform the development of the Chatbot Assessment Reporting Tool (CHART). Evidence Review A search of MEDLINE via Ovid, Embase via Elsevier, and Web of Science from inception to October 27, 2023, conducted with the help of a health sciences librarian, yielded 7752 articles. Two reviewers screened articles by title and abstract, followed by full-text review, to identify primary studies evaluating the clinical accuracy of generative AI-driven chatbots in providing health advice (chatbot health advice studies). Two reviewers then performed data extraction for 137 eligible studies. Findings A total of 137 studies were included, examining topics in surgery (55 [40.1%]), medicine (51 [37.2%]), and primary care (13 [9.5%]). Many studies focused on treatment (91 [66.4%]), diagnosis (60 [43.8%]), or disease prevention (29 [21.2%]). Most studies (136 [99.3%]) evaluated inaccessible, closed-source LLMs and did not provide enough information to identify the version of the LLM under evaluation. All studies lacked a sufficient description of LLM characteristics, including temperature, token length, fine-tuning availability, layers, and other details. Most studies (136 [99.3%]) did not describe a prompt engineering phase. The date of LLM querying was reported in 54 (39.4%) studies. Most studies (89 [65.0%]) used subjective means to define successful chatbot performance, while less than one-third addressed the ethical, regulatory, and patient safety implications of the clinical integration of LLMs. Conclusions and Relevance In this systematic review of 137 chatbot health advice studies, reporting quality was heterogeneous, and the findings may inform the development of the CHART reporting standards. Ethical, regulatory, and patient safety considerations are crucial as interest grows in the clinical integration of LLMs.
Collapse
Affiliation(s)
- Bright Huo
- Division of General Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
| | - Amy Boyle
- Michael G. DeGroote School of Medicine, McMaster University, Hamilton, Ontario, Canada
| | - Nana Marfo
- H. Ross University School of Medicine, Miramar, Florida
| | - Wimonchat Tangamornsuksan
- Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
| | - Jeremy P. Steen
- Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
| | - Tyler McKechnie
- Division of General Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
| | - Yung Lee
- Division of General Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
| | - Julio Mayol
- Hospital Clinico San Carlos, IdISSC, Universidad Complutense de Madrid, Madrid, Spain
| | | | | | - Stephanie Sanger
- Health Science Library, Faculty of Health Sciences, McMaster University, Hamilton, Ontario, Canada
| | - Karim Ramji
- Division of General Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
| | - Gordon Guyatt
- Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada
| |
Collapse
|
18
|
Kahlon S, Sleet M, Sujka J, Docimo S, DuCoin C, Dimou F, Mhaskar R. Evaluating the concordance of ChatGPT and physician recommendations for bariatric surgery. Can J Physiol Pharmacol 2025; 103:70-74. [PMID: 39561352 DOI: 10.1139/cjpp-2024-0026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2024]
Abstract
Integrating artificial intelligence (AI) into healthcare prompts the need to measure its proficiency relative to human experts. This study evaluates the proficiency of ChatGPT, an OpenAI language model, in offering guidance concerning bariatric surgery compared to bariatric surgeons. Five clinical scenarios representative of diverse bariatric surgery situations were given to American Society for Metabolic and Bariatric Surgery (ASMBS)-accredited bariatric surgeons and to ChatGPT. Both groups proposed medical or surgical management for the patients depicted in each scenario. The recommendations from both the surgeons and ChatGPT were compared against the clinical benchmarks set by the ASMBS. There was a high degree of agreement between ChatGPT and physicians on the three simpler clinical scenarios, and a positive correlation between physicians' and ChatGPT's answers for not recommending surgery. ChatGPT's advice aligned with ASMBS guidelines 60% of the time, whereas the bariatric surgeons aligned with the guidelines 100% of the time. ChatGPT showcases potential in offering guidance on bariatric surgery, but it lacks the comprehensive, personalized perspective that physicians exhibit consistently. Enhancing AI's training on intricate patient situations will bolster its role in the medical field.
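The 60% versus 100% guideline-alignment figures are per-scenario concordance rates. A minimal sketch of that calculation, with hypothetical scenario labels and recommendations:

```python
# Illustrative sketch (not the study's code): percent of scenarios in which a
# recommendation matches the ASMBS benchmark. Labels and choices are hypothetical.
ASMBS_BENCHMARK = {"s1": "surgery", "s2": "surgery", "s3": "medical",
                   "s4": "medical", "s5": "surgery"}
chatgpt = {"s1": "surgery", "s2": "medical", "s3": "medical",
           "s4": "medical", "s5": "medical"}

def concordance(recs: dict, benchmark: dict) -> float:
    """Percent of scenarios where a recommendation matches the benchmark."""
    hits = sum(recs[s] == benchmark[s] for s in benchmark)
    return 100 * hits / len(benchmark)

print(f"ChatGPT concordance: {concordance(chatgpt, ASMBS_BENCHMARK):.0f}%")  # 60% here
```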
Collapse
Affiliation(s)
- Sunny Kahlon
- University of South Florida Health Morsani College of Medicine, Tampa, FL, USA
| | - Mary Sleet
- University of South Florida Health Morsani College of Medicine, Tampa, FL, USA
| | - Joseph Sujka
- Department of Surgery, University of South Florida Morsani College of Medicine, Tampa, FL, USA
| | - Salvatore Docimo
- Department of Surgery, University of South Florida Morsani College of Medicine, Tampa, FL, USA
| | - Christopher DuCoin
- Department of Surgery, University of South Florida Morsani College of Medicine, Tampa, FL, USA
| | - Francesca Dimou
- Department of Surgery, University of South Florida Morsani College of Medicine, Tampa, FL, USA
| | - Rahul Mhaskar
- Department of Internal Medicine and Medical Education, University of South Florida Morsani College of Medicine, Tampa, FL, USA
| |
Collapse
|
19
|
Mehta R, Reitz JG, Venna A, Selcuk A, Dhamala B, Klein J, Sawda C, Haverty M, Yerebakan C, Tongut A, Desai M, d'Udekem Y. Navigating the future of pediatric cardiovascular surgery: Insights and innovation powered by Chat Generative Pre-Trained Transformer (ChatGPT). J Thorac Cardiovasc Surg 2025:S0022-5223(25)00093-5. [PMID: 39894069 DOI: 10.1016/j.jtcvs.2025.01.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/22/2024] [Revised: 12/16/2024] [Accepted: 01/10/2025] [Indexed: 02/04/2025]
Abstract
INTRODUCTION Interdisciplinary consultations are essential to decision-making for patients with congenital heart disease. The integration of artificial intelligence (AI) and natural language processing into medical practice is rapidly accelerating, opening new avenues to diagnosis and treatment. The main objective of this study was to consult the AI-trained model Chat Generative Pre-Trained Transformer (ChatGPT) regarding cases discussed during a cardiovascular surgery conference (CSC) at a single tertiary center and compare the ChatGPT suggestions with CSC expert consensus results. METHODS In total, 37 cases discussed at a single CSC were retrospectively identified. Clinical information comprised deidentified data from the last electrocardiogram, echocardiogram, intensive care unit progress note (or cardiology clinic note if outpatient), as well as a patient summary. The diagnosis was removed from the summary and possible treatment options were deleted from all notes. ChatGPT (version 4.0) was asked to summarize the case, identify diagnoses, and recommend surgical procedures and timing of surgery. The responses of ChatGPT were compared with the results of the CSC. RESULTS Of the 37 cases uploaded to ChatGPT, 45.9% (n = 17) were considered less complex, with only 1 treatment option, and 54.1% (n = 20) were considered more complex, with several treatment options. ChatGPT provided a detailed and systematically structured summary for each case within 10 to 15 seconds. ChatGPT correctly identified diagnoses in approximately 94.5% (n = 35) of cases. The surgical intervention plan matched the group decision in approximately 40.5% (n = 15) of cases but differed in 27% of cases. In 23 of 37 cases, timing of surgery was the same between the CSC group and ChatGPT. Overall, the match between ChatGPT responses and CSC decisions was 94.5% for diagnosis, 40.5% for surgical intervention, and 62.2% for timing of surgery. Within the complex cases, however, agreement was 25% for surgical intervention and 67% for timing of surgery. CONCLUSIONS ChatGPT can be used as an augmentative tool for surgical conferences to systematically summarize large amounts of patient data from electronic health records and clinical notes in seconds. In addition, our study points to the potential of ChatGPT as an AI-based decision support tool in surgery, particularly for less complex cases. The discrepancy, particularly in complex cases, underscores the need for caution when using ChatGPT in decision-making for complex cases in pediatric cardiovascular surgery. There is little doubt that the public will soon use this comparative tool.
Collapse
Affiliation(s)
- Rittal Mehta
- Department of Cardiac Surgery, Children's National Heart Institute, Children's National Hospital, Washington, DC
| | - Justus G Reitz
- Department of Cardiac Surgery, Children's National Heart Institute, Children's National Hospital, Washington, DC
| | - Alyssia Venna
- Department of Cardiac Surgery, Children's National Heart Institute, Children's National Hospital, Washington, DC
| | - Arif Selcuk
- Department of Cardiac Surgery, Children's National Heart Institute, Children's National Hospital, Washington, DC
| | - Bishakha Dhamala
- Department of Cardiac Surgery, Children's National Heart Institute, Children's National Hospital, Washington, DC
| | - Jennifer Klein
- Department of Cardiac Surgery, Children's National Heart Institute, Children's National Hospital, Washington, DC
| | - Christine Sawda
- Department of Cardiac Surgery, Children's National Heart Institute, Children's National Hospital, Washington, DC
| | - Mitchell Haverty
- Department of Cardiac Surgery, Children's National Heart Institute, Children's National Hospital, Washington, DC
| | - Can Yerebakan
- Department of Cardiac Surgery, Children's National Heart Institute, Children's National Hospital, Washington, DC
| | - Aybala Tongut
- Department of Cardiac Surgery, Children's National Heart Institute, Children's National Hospital, Washington, DC
| | - Manan Desai
- Department of Cardiac Surgery, Children's National Heart Institute, Children's National Hospital, Washington, DC
| | - Yves d'Udekem
- Department of Cardiac Surgery, Children's National Heart Institute, Children's National Hospital, Washington, DC.
| |
Collapse
|
20
|
Ma Y, Achiche S, Tu G, Vicente S, Lessard D, Engler K, Lemire B, Laymouna M, de Pokomandy A, Cox J, Lebouché B. The first AI-based Chatbot to promote HIV self-management: A mixed methods usability study. HIV Med 2025; 26:184-206. [PMID: 39390632 PMCID: PMC11786622 DOI: 10.1111/hiv.13720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2024] [Accepted: 09/20/2024] [Indexed: 10/12/2024]
Abstract
BACKGROUND We developed MARVIN, an artificial intelligence (AI)-based chatbot that provides 24/7 expert-validated information on self-management-related topics for people with HIV. This study assessed (1) the feasibility of using MARVIN, (2) its usability and acceptability, and (3) four usability subconstructs (perceived ease of use, perceived usefulness, attitude towards use, and behavioural intention to use). METHODS In a mixed-methods study conducted at the McGill University Health Centre, enrolled participants were asked to have 20 conversations within 3 weeks with MARVIN on predetermined topics and to complete a usability questionnaire. Feasibility, usability, acceptability, and usability subconstructs were examined against predetermined success thresholds. Qualitatively, randomly selected participants were invited to semi-structured focus groups/interviews to discuss their experiences with MARVIN. Barriers and facilitators were identified according to the four usability subconstructs. RESULTS From March 2021 to April 2022, 28 participants were surveyed after a 3-week testing period, and nine were interviewed. Study retention was 70% (28/40). Mean usability exceeded the threshold (69.9/68), whereas mean acceptability was very close to target (23.8/24). Ratings of attitude towards MARVIN's use were positive (+14%), with the remaining subconstructs exceeding the target (5/7). Facilitators included MARVIN's reliable and useful real-time information support, its easy accessibility, provision of convivial conversations, confidentiality, and perception as being emotionally safe. However, MARVIN's limited comprehension and the use of Facebook as an implementation platform were identified as barriers, along with the need for more conversation topics and new features (e.g., memorization). CONCLUSIONS The study demonstrated MARVIN's global usability. Our findings show its potential for HIV self-management and provide direction for further development.
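The usability benchmark of 68 quoted above matches the conventional System Usability Scale (SUS) cutoff, so a SUS-style instrument is assumed in the sketch below; the responses are hypothetical:

```python
# Minimal sketch of standard System Usability Scale (SUS) scoring. Whether the
# study's questionnaire was SUS is an assumption based on its 68-point threshold.
def sus_score(responses):
    """responses: 10 Likert ratings (1-5), item 1 first. Returns a 0-100 score."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)  # odd-numbered items: r-1; even: 5-r
        for i, r in enumerate(responses)
    ]
    return sum(contributions) * 2.5

participant = [4, 2, 5, 1, 4, 2, 4, 2, 5, 2]  # hypothetical ratings
score = sus_score(participant)
print(score, "meets the 68 benchmark:", score >= 68)
```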
Collapse
Affiliation(s)
- Yuanchao Ma
- Department of Biomedical Engineering, Polytechnique Montréal, Montreal, Quebec, Canada
- Centre for Outcomes Research & Evaluation, Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada
- Infectious Diseases and Immunity in Global Health Program, Research Institute of McGill University Health Centre, Montreal, Quebec, Canada
- Chronic Viral Illness Service, Division of Infectious Disease, Department of Medicine, McGill University Health Centre, Montreal, Quebec, Canada
| | - Sofiane Achiche
- Department of Biomedical Engineering, Polytechnique Montréal, Montreal, Quebec, Canada
| | - Gavin Tu
- Faculty of Medicine, Université Laval, Quebec, Quebec, Canada
| | - Serge Vicente
- Centre for Outcomes Research & Evaluation, Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada
- Infectious Diseases and Immunity in Global Health Program, Research Institute of McGill University Health Centre, Montreal, Quebec, Canada
- Department of Family Medicine, Faculty of Medicine and Health Sciences, McGill University, Montreal, Quebec, Canada
- Department of Mathematics and Statistics, University of Montreal, Montreal, Quebec, Canada
| | - David Lessard
- Centre for Outcomes Research & Evaluation, Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada
- Infectious Diseases and Immunity in Global Health Program, Research Institute of McGill University Health Centre, Montreal, Quebec, Canada
- Chronic Viral Illness Service, Division of Infectious Disease, Department of Medicine, McGill University Health Centre, Montreal, Quebec, Canada
| | - Kim Engler
- Centre for Outcomes Research & Evaluation, Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada
- Infectious Diseases and Immunity in Global Health Program, Research Institute of McGill University Health Centre, Montreal, Quebec, Canada
| | - Benoît Lemire
- Chronic Viral Illness Service, Division of Infectious Disease, Department of Medicine, McGill University Health Centre, Montreal, Quebec, Canada
- Department of Pharmacy, McGill University Health Centre, Montreal, Quebec, Canada
| | - Moustafa Laymouna
- Centre for Outcomes Research & Evaluation, Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada
- Infectious Diseases and Immunity in Global Health Program, Research Institute of McGill University Health Centre, Montreal, Quebec, Canada
- Department of Family Medicine, Faculty of Medicine and Health Sciences, McGill University, Montreal, Quebec, Canada
| | - Alexandra de Pokomandy
- Centre for Outcomes Research & Evaluation, Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada
- Infectious Diseases and Immunity in Global Health Program, Research Institute of McGill University Health Centre, Montreal, Quebec, Canada
- Chronic Viral Illness Service, Division of Infectious Disease, Department of Medicine, McGill University Health Centre, Montreal, Quebec, Canada
- Department of Family Medicine, Faculty of Medicine and Health Sciences, McGill University, Montreal, Quebec, Canada
| | - Joseph Cox
- Centre for Outcomes Research & Evaluation, Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada
- Infectious Diseases and Immunity in Global Health Program, Research Institute of McGill University Health Centre, Montreal, Quebec, Canada
- Chronic Viral Illness Service, Division of Infectious Disease, Department of Medicine, McGill University Health Centre, Montreal, Quebec, Canada
- Department of Epidemiology, Biostatistics, and Occupational Health, Faculty of Medicine and Health Sciences, McGill University, Montreal, Quebec, Canada
| | - Bertrand Lebouché
- Centre for Outcomes Research & Evaluation, Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada
- Infectious Diseases and Immunity in Global Health Program, Research Institute of McGill University Health Centre, Montreal, Quebec, Canada
- Chronic Viral Illness Service, Division of Infectious Disease, Department of Medicine, McGill University Health Centre, Montreal, Quebec, Canada
- Department of Family Medicine, Faculty of Medicine and Health Sciences, McGill University, Montreal, Quebec, Canada
| |
Collapse
|
21
|
Koirala P, Thongprayoon C, Miao J, Garcia Valencia OA, Sheikh MS, Suppadungsuk S, Mao MA, Pham JH, Craici IM, Cheungpasitporn W. Evaluating AI performance in nephrology triage and subspecialty referrals. Sci Rep 2025; 15:3455. [PMID: 39870788 PMCID: PMC11772766 DOI: 10.1038/s41598-025-88074-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2024] [Accepted: 01/23/2025] [Indexed: 01/29/2025] Open
Abstract
Artificial intelligence (AI) has shown promise in revolutionizing medical triage, particularly in the context of the rising prevalence of kidney-related conditions with the aging global population. This study evaluates the utility of ChatGPT, a large language model, in triaging nephrology cases through simulated real-world scenarios. Two nephrologists created 100 patient cases that encompassed various aspects of nephrology. ChatGPT's performance in determining the appropriateness of nephrology consultations and identifying suitable nephrology subspecialties was assessed. The results demonstrated high accuracy; ChatGPT correctly determined the need for nephrology consultation in 99-100% of cases, and it accurately identified the most suitable nephrology subspecialty in 96-99% of cases across two evaluation rounds. The agreement between the two rounds was 97%. While ChatGPT showed promise in improving medical triage efficiency and accuracy, the study also identified areas for refinement, including the need for better integration of multidisciplinary care for patients with complex, intersecting medical conditions. These findings highlight the potential of AI to enhance decision-making processes in clinical workflows and can inform the development of AI-assisted triage systems tailored to institution-specific practices, including multidisciplinary approaches.
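The 97% inter-round figure is a simple percent-agreement measure over repeated queries of the same cases. A minimal sketch, with hypothetical subspecialty labels:

```python
# Illustrative sketch (not the study's code): percent agreement between two
# evaluation rounds of triage decisions on the same cases. Labels are hypothetical.
round_1 = ["transplant", "glomerular", "general", "dialysis", "general"]
round_2 = ["transplant", "glomerular", "general", "general", "general"]

agreement = sum(a == b for a, b in zip(round_1, round_2)) / len(round_1)
print(f"Inter-round agreement: {agreement:.0%}")  # 80% in this toy example
```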
Collapse
Affiliation(s)
| | - Charat Thongprayoon
- Division of Nephrology and Hypertension, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
| | - Jing Miao
- Division of Nephrology and Hypertension, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
| | - Oscar A Garcia Valencia
- Division of Nephrology and Hypertension, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
| | - Mohammad S Sheikh
- Division of Nephrology and Hypertension, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
| | - Supawadee Suppadungsuk
- Division of Nephrology and Hypertension, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
- Faculty of Medicine Ramathibodi Hospital, Chakri Naruebodindra Medical Institute, Mahidol University, Samut Prakan, 10540, Thailand
| | - Michael A Mao
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Jacksonville, FL, 32224, USA
| | - Justin H Pham
- Internal Medicine, Mayo Clinic, Rochester, MN, 55905, USA
| | - Iasmina M Craici
- Division of Nephrology and Hypertension, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
| | - Wisit Cheungpasitporn
- Division of Nephrology and Hypertension, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA.
| |
Collapse
|
22
|
García-Rudolph A, Sanchez-Pinsach D, Opisso E. Evaluating AI Models: Performance Validation Using Formal Multiple-Choice Questions in Neuropsychology. Arch Clin Neuropsychol 2025; 40:150-155. [PMID: 39231527 DOI: 10.1093/arclin/acae068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2024] [Revised: 08/13/2024] [Accepted: 08/19/2024] [Indexed: 09/06/2024] Open
Abstract
High-quality and accessible education is crucial for advancing neuropsychology. A recent study identified key barriers to board certification in clinical neuropsychology, such as time constraints and insufficient specialized knowledge. To address these challenges, this study explored the capabilities of advanced Artificial Intelligence (AI) language models, GPT-3.5 (free version) and GPT-4.0 (subscription version), by evaluating their performance on 300 questions styled after the American Board of Professional Psychology in Clinical Neuropsychology examination. The results indicate that GPT-4.0 achieved a higher accuracy rate of 80.0% compared to GPT-3.5's 65.7%. In the "Assessment" category, GPT-4.0 demonstrated a notable improvement, with an accuracy rate of 73.4% compared to GPT-3.5's 58.6% (p = 0.012). The "Assessment" category, which comprised 128 questions and exhibited the highest error rate for both AI models, was analyzed further. A thematic analysis of the 26 incorrectly answered questions revealed 8 main themes and 17 specific codes, highlighting significant gaps in areas such as "Neurodegenerative Diseases" and "Neuropsychological Testing and Interpretation."
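The abstract does not name the test behind p = 0.012; a two-proportion chi-squared test (without continuity correction) on the 128 "Assessment" items is one plausible reconstruction:

```python
# Illustrative reconstruction (not the study's code): comparing the two models'
# accuracy on the 128 "Assessment" questions, mirroring the reported
# 73.4% (GPT-4.0) vs 58.6% (GPT-3.5) contrast. The authors' actual test is an
# assumption here.
from scipy.stats import chi2_contingency

n = 128
gpt4_correct = round(0.734 * n)   # 94 correct
gpt35_correct = round(0.586 * n)  # 75 correct
table = [[gpt4_correct, n - gpt4_correct],
         [gpt35_correct, n - gpt35_correct]]

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # p comes out near the reported 0.012
```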
Collapse
Affiliation(s)
- Alejandro García-Rudolph
- Departmento de Investigación e Innovación, Institut Guttmann, Institut Universitari de Neurorehabilitació adscrit a la UAB, Badalona, Barcelona, Spain
- Universitat Autònoma de Barcelona, Bellaterra (Cerdanyola del Vallès), Spain
- Fundació Institut d'Investigació en Ciències de la Salut Germans Trias i Pujol, Badalona, Barcelona, Spain
| | - David Sanchez-Pinsach
- Departmento de Investigación e Innovación, Institut Guttmann, Institut Universitari de Neurorehabilitació adscrit a la UAB, Badalona, Barcelona, Spain
- Universitat Autònoma de Barcelona, Bellaterra (Cerdanyola del Vallès), Spain
- Fundació Institut d'Investigació en Ciències de la Salut Germans Trias i Pujol, Badalona, Barcelona, Spain
| | - Eloy Opisso
- Departmento de Investigación e Innovación, Institut Guttmann, Institut Universitari de Neurorehabilitació adscrit a la UAB, Badalona, Barcelona, Spain
- Universitat Autònoma de Barcelona, Bellaterra (Cerdanyola del Vallès), Spain
- Fundació Institut d'Investigació en Ciències de la Salut Germans Trias i Pujol, Badalona, Barcelona, Spain
| |
Collapse
|
23
|
Kim J, Vajravelu BN. Assessing the Current Limitations of Large Language Models in Advancing Health Care Education. JMIR Form Res 2025; 9:e51319. [PMID: 39819585 PMCID: PMC11756841 DOI: 10.2196/51319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2023] [Revised: 08/31/2024] [Accepted: 09/03/2024] [Indexed: 01/19/2025] Open
Abstract
The integration of large language models (LLMs), as seen with the generative pretrained transformer series, into health care education and clinical management represents a transformative potential. The practical use of current LLMs in health care sparks great anticipation for new avenues, yet their embrace also elicits considerable concerns that necessitate careful deliberation. This study aims to evaluate the application of state-of-the-art LLMs in health care education, highlighting the following shortcomings as areas requiring significant and urgent improvements: (1) threats to academic integrity, (2) dissemination of misinformation and risks of automation bias, (3) challenges with information completeness and consistency, (4) inequity of access, (5) risks of algorithmic bias, (6) exhibition of moral instability, (7) technological limitations in plugin tools, and (8) lack of regulatory oversight in addressing legal and ethical challenges. Future research should focus on strategically addressing the persistent challenges of LLMs highlighted in this paper, opening the door for effective measures that can improve their application in health care education.
Collapse
Affiliation(s)
- JaeYong Kim
- School of Pharmacy, Massachusetts College of Pharmacy and Health Sciences, Boston, MA, United States
| | - Bathri Narayan Vajravelu
- Department of Physician Assistant Studies, Massachusetts College of Pharmacy and Health Sciences, 179 Longwood Avenue, Boston, MA, 02115, United States, 1 6177322961
| |
Collapse
|
24
|
Jaber SA, Hasan HE, Alzoubi KH, Khabour OF. Knowledge, attitude, and perceptions of MENA researchers towards the use of ChatGPT in research: A cross-sectional study. Heliyon 2025; 11:e41331. [PMID: 39811375 PMCID: PMC11731567 DOI: 10.1016/j.heliyon.2024.e41331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2024] [Revised: 12/03/2024] [Accepted: 12/17/2024] [Indexed: 01/16/2025] Open
Abstract
Background Artificial intelligence (AI) technologies are increasingly recognized for their potential to revolutionize research practices. However, there is a gap in understanding the perspectives of MENA researchers on ChatGPT. This study explores the knowledge, attitudes, and perceptions of ChatGPT utilization in research. Methods A cross-sectional survey was conducted among 369 MENA researchers. Participants provided demographic information and responded to questions about their knowledge of AI, their experience with ChatGPT, their attitudes toward technology, and their perceptions of the potential roles and benefits of ChatGPT in research. Results The results indicate a moderate level of knowledge about ChatGPT, with a total score of 58.3 ± 19.6. Attitudes towards its use were generally positive, with a total score of 68.1 ± 8.1, reflecting enthusiasm for integrating ChatGPT into research workflows. About 56% of the sample reported using ChatGPT for various applications. In addition, 27.6% expressed their intention to use it in their research, while 17.3% had already started using it in their research. However, perceptions varied, with concerns about accuracy, bias, and ethical implications highlighted. The results showed significant differences in knowledge scores based on gender (p < 0.001), working country (p < 0.05), and work field (p < 0.01). Regarding attitude scores, there were significant differences based on highest qualification and employment field (p < 0.05). These findings underscore the need for targeted training programs and ethical guidelines to support the effective use of ChatGPT in research. Conclusion MENA researchers demonstrate significant awareness of and interest in integrating ChatGPT into their research workflows. Addressing concerns about reliability and ethical implications is essential for advancing scientific innovation in the MENA region.
Collapse
Affiliation(s)
- Sana'a A. Jaber
- Department of Clinical Pharmacy, Faculty of Pharmacy, Jordan University of Science and Technology, Irbid, 22110, Jordan
| | - Hisham E. Hasan
- Department of Clinical Pharmacy, Faculty of Pharmacy, Jordan University of Science and Technology, Irbid, 22110, Jordan
| | - Karem H. Alzoubi
- Department of Clinical Pharmacy, Faculty of Pharmacy, Jordan University of Science and Technology, Irbid, 22110, Jordan
| | - Omar F. Khabour
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, Jordan University of Science and Technology, Irbid, 22110, Jordan
| |
Collapse
|
25
|
Andriollo L, Picchi A, Iademarco G, Fidanza A, Perticarini L, Rossi SMP, Logroscino G, Benazzo F. The Role of Artificial Intelligence and Emerging Technologies in Advancing Total Hip Arthroplasty. J Pers Med 2025; 15:21. [PMID: 39852213 PMCID: PMC11767033 DOI: 10.3390/jpm15010021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2024] [Revised: 01/05/2025] [Accepted: 01/07/2025] [Indexed: 01/26/2025] Open
Abstract
Total hip arthroplasty (THA) is a widely performed surgical procedure that has evolved significantly due to advancements in artificial intelligence (AI) and robotics. As demand for THA grows, reliable tools are essential to enhance diagnosis, preoperative planning, surgical precision, and postoperative rehabilitation. AI applications in orthopedic surgery offer innovative solutions, including automated hip osteoarthritis (OA) diagnosis, precise implant positioning, and personalized risk stratification, thereby improving patient outcomes. Deep learning models have transformed OA severity grading and implant identification by automating traditionally manual processes with high accuracy. Additionally, AI-powered systems optimize preoperative planning by predicting the hip joint center and identifying complications using multimodal data. Robotic-assisted THA enhances surgical precision with real-time feedback, reducing complications such as dislocations and leg length discrepancies while accelerating recovery. Despite these advancements, barriers such as cost, accessibility, and the steep learning curve for surgeons hinder widespread adoption. Postoperative rehabilitation benefits from technologies like virtual and augmented reality and telemedicine, which enhance patient engagement and adherence. However, limitations, particularly among elderly populations with lower adaptability to technology, underscore the need for user-friendly platforms. To ensure comprehensiveness, a structured literature search was conducted using PubMed, Scopus, and Web of Science. Keywords included "artificial intelligence", "machine learning", "robotics", and "total hip arthroplasty". Inclusion criteria emphasized peer-reviewed studies published in English within the last decade focusing on technological advancements and clinical outcomes. This review evaluates the role of AI and robotics in THA, highlighting opportunities and challenges, and emphasizes the need for further research and real-world validation to integrate these technologies into clinical practice effectively.
Collapse
Affiliation(s)
- Luca Andriollo
- Sezione di Chirurgia Protesica ad Indirizzo Robotico—Unità di Traumatologia dello Sport, Ortopedia e Traumatologia, Fondazione Poliambulanza, 25124 Brescia, Italy
- Ortopedia e Traumatologia, Università Cattolica del Sacro Cuore, 00168 Rome, Italy
- Artificial Intelligence Center, Alma Mater Europaea University, 1010 Vienna, Austria
| | - Aurelio Picchi
- Unit of Orthopedics, Department of Life, Health and Environmental Sciences, University of L’Aquila, 67100 L’Aquila, Italy
| | - Giulio Iademarco
- Unit of Orthopedics, Department of Life, Health and Environmental Sciences, University of L’Aquila, 67100 L’Aquila, Italy
| | - Andrea Fidanza
- Unit of Orthopedics, Department of Life, Health and Environmental Sciences, University of L’Aquila, 67100 L’Aquila, Italy
| | - Loris Perticarini
- Sezione di Chirurgia Protesica ad Indirizzo Robotico—Unità di Traumatologia dello Sport, Ortopedia e Traumatologia, Fondazione Poliambulanza, 25124 Brescia, Italy
| | - Stefano Marco Paolo Rossi
- Sezione di Chirurgia Protesica ad Indirizzo Robotico—Unità di Traumatologia dello Sport, Ortopedia e Traumatologia, Fondazione Poliambulanza, 25124 Brescia, Italy
- Department of Life Science, Health, and Health Professions, Università degli Studi Link, Link Campus University, 00165 Rome, Italy
- Biomedical Sciences Area, IUSS University School for Advanced Studies, 27100 Pavia, Italy
| | - Giandomenico Logroscino
- Unit of Orthopedics, Department of Life, Health and Environmental Sciences, University of L’Aquila, 67100 L’Aquila, Italy
| | - Francesco Benazzo
- Sezione di Chirurgia Protesica ad Indirizzo Robotico—Unità di Traumatologia dello Sport, Ortopedia e Traumatologia, Fondazione Poliambulanza, 25124 Brescia, Italy
- Biomedical Sciences Area, IUSS University School for Advanced Studies, 27100 Pavia, Italy
| |
Collapse
|
26
|
Antonie NI, Gheorghe G, Ionescu VA, Tiucă LC, Diaconu CC. The Role of ChatGPT and AI Chatbots in Optimizing Antibiotic Therapy: A Comprehensive Narrative Review. Antibiotics (Basel) 2025; 14:60. [PMID: 39858346 PMCID: PMC11761957 DOI: 10.3390/antibiotics14010060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2024] [Revised: 01/03/2025] [Accepted: 01/07/2025] [Indexed: 01/27/2025] Open
Abstract
Background/Objectives: Antimicrobial resistance represents a growing global health crisis, demanding innovative approaches to improve antibiotic stewardship. Artificial intelligence (AI) chatbots based on large language models have shown potential as tools to support clinicians, especially non-specialists, in optimizing antibiotic therapy. This review aims to synthesize current evidence on the capabilities, limitations, and future directions for AI chatbots in enhancing antibiotic selection and patient outcomes. Methods: A narrative review was conducted by analyzing studies published in the last five years across databases such as PubMed, SCOPUS, Web of Science, and Google Scholar. The review focused on research discussing AI-based chatbots, antibiotic stewardship, and clinical decision support systems. Studies were evaluated for methodological soundness and significance, and the findings were synthesized narratively. Results: Current evidence highlights the ability of AI chatbots to assist in guideline-based antibiotic recommendations, improve medical education, and enhance clinical decision-making. Promising results include satisfactory accuracy in preliminary diagnostic and prescriptive tasks. However, challenges such as inconsistent handling of clinical nuances, susceptibility to unsafe advice, algorithmic biases, data privacy concerns, and limited clinical validation underscore the importance of human oversight and refinement. Conclusions: AI chatbots have the potential to complement antibiotic stewardship efforts by promoting appropriate antibiotic use and improving patient outcomes. Realizing this potential will require rigorous clinical trials, interdisciplinary collaboration, regulatory clarity, and tailored algorithmic improvements to ensure their safe and effective integration into clinical practice.
Collapse
Affiliation(s)
- Ninel Iacobus Antonie
- Faculty of Medicine, University of Medicine and Pharmacy Carol Davila Bucharest, 050474 Bucharest, Romania; (N.I.A.); (V.A.I.); (C.C.D.)
- Internal Medicine Department, Clinical Emergency Hospital of Bucharest, 105402 Bucharest, Romania
| | - Gina Gheorghe
- Faculty of Medicine, University of Medicine and Pharmacy Carol Davila Bucharest, 050474 Bucharest, Romania; (N.I.A.); (V.A.I.); (C.C.D.)
- Internal Medicine Department, Clinical Emergency Hospital of Bucharest, 105402 Bucharest, Romania
| | - Vlad Alexandru Ionescu
- Faculty of Medicine, University of Medicine and Pharmacy Carol Davila Bucharest, 050474 Bucharest, Romania; (N.I.A.); (V.A.I.); (C.C.D.)
- Internal Medicine Department, Clinical Emergency Hospital of Bucharest, 105402 Bucharest, Romania
| | - Loredana-Crista Tiucă
- Faculty of Medicine, University of Medicine and Pharmacy Carol Davila Bucharest, 050474 Bucharest, Romania; (N.I.A.); (V.A.I.); (C.C.D.)
- Internal Medicine Department, Clinical Emergency Hospital of Bucharest, 105402 Bucharest, Romania
| | - Camelia Cristina Diaconu
- Faculty of Medicine, University of Medicine and Pharmacy Carol Davila Bucharest, 050474 Bucharest, Romania; (N.I.A.); (V.A.I.); (C.C.D.)
- Internal Medicine Department, Clinical Emergency Hospital of Bucharest, 105402 Bucharest, Romania
- Academy of Romanian Scientists, 050045 Bucharest, Romania
| |
Collapse
|
27
|
Austin J, Benas K, Caicedo S, Imiolek E, Piekutowski A, Ghanim I. Perceptions of Artificial Intelligence and ChatGPT by Speech-Language Pathologists and Students. AMERICAN JOURNAL OF SPEECH-LANGUAGE PATHOLOGY 2025; 34:174-200. [PMID: 39496075 DOI: 10.1044/2024_ajslp-24-00218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/06/2024]
Abstract
PURPOSE This project explores the perceived implications of artificial intelligence (AI) tools and generative language tools, like ChatGPT, on practice in speech-language pathology. METHOD A total of 107 clinician (n = 60) and student (n = 47) participants completed an 87-item survey that included Likert-style questions and open-ended qualitative responses. The survey explored participants' current frequency of use, experience with AI tools, ethical concerns, and concern with replacing clinicians, as well as likelihood to use AI in particular professional and clinical areas. Results were analyzed in the context of qualitative responses to typed-response open-ended questions. RESULTS A series of analyses indicated that participants are somewhat knowledgeable about and experienced with GPT software and other AI tools. Despite a positive outlook and the belief that AI tools are helpful for practice, programs like ChatGPT and other AI tools are infrequently used by speech-language pathologists and students for clinical purposes, and their use is mostly restricted to administrative tasks. CONCLUSION While impressions of GPT and other AI tools cite the beneficial ways they can ease clinicians' workloads, participants indicated hesitancy to use AI tools and called for institutional guidelines and training for their adoption.
Collapse
Affiliation(s)
- Julianna Austin
- Department of Communication Sciences and Disorders, Kean University, Union, NJ
| | - Keith Benas
- Department of Communication Sciences and Disorders, Kean University, Union, NJ
| | - Sara Caicedo
- Department of Communication Sciences and Disorders, Kean University, Union, NJ
| | - Emily Imiolek
- Department of Communication Sciences and Disorders, Kean University, Union, NJ
| | - Anna Piekutowski
- Department of Communication Sciences and Disorders, Kean University, Union, NJ
| | - Iyad Ghanim
- Department of Communication Sciences and Disorders, Kean University, Union, NJ
| |
Collapse
|
28
|
Foltyn-Dumitru M, Rastogi A, Cho J, Schell M, Mahmutoglu MA, Kessler T, Sahm F, Wick W, Bendszus M, Brugnara G, Vollmuth P. The potential of GPT-4 advanced data analysis for radiomics-based machine learning models. Neurooncol Adv 2025; 7:vdae230. [PMID: 39780768 PMCID: PMC11707530 DOI: 10.1093/noajnl/vdae230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2025] Open
Abstract
Background This study aimed to explore the potential of the Advanced Data Analytics (ADA) package of GPT-4 to autonomously develop machine learning models (MLMs) for predicting glioma molecular types using radiomics from MRI. Methods Radiomic features were extracted from preoperative MRI of n = 615 newly diagnosed glioma patients to predict glioma molecular types (IDH-wildtype vs IDH-mutant 1p19q-codeleted vs IDH-mutant 1p19q-non-codeleted) with a multiclass ML approach. Specifically, ADA was used to autonomously develop an ML pipeline and benchmark performance against an established handcrafted model using various MRI normalization methods (N4, Zscore, and WhiteStripe). External validation was performed on 2 public glioma datasets D2 (n = 160) and D3 (n = 410). Results GPT-4 achieved the highest accuracy of 0.820 (95% CI = 0.819-0.821) on the D3 dataset with N4/WS normalization, significantly outperforming the benchmark model's accuracy of 0.678 (95% CI = 0.677-0.680) (P < .001). Class-wise analysis showed performance variations across different glioma types. In the IDH-wildtype group, GPT-4 had a recall of 0.997 (95% CI = 0.997-0.997), surpassing the benchmark's 0.742 (95% CI = 0.740-0.743). For the IDH-mut 1p/19q-non-codel group, GPT-4's recall was 0.275 (95% CI = 0.272-0.279), lower than the benchmark's 0.426 (95% CI = 0.423-0.430). In the IDH-mut 1p/19q-codel group, GPT-4's recall was 0.199 (95% CI = 0.191-0.206), below the benchmark's 0.730 (95% CI = 0.721-0.738). On the D2 dataset, GPT-4's accuracy was significantly lower (P < .001) than the benchmark's, with N4/WS achieving 0.668 (95% CI = 0.666-0.671) compared with 0.719 (95% CI = 0.717-0.722) (P < .001). Class-wise analysis revealed the same pattern as observed in D3. Conclusions GPT-4 can autonomously develop radiomics-based MLMs, achieving performance comparable to handcrafted MLMs. However, its poorer class-wise performance due to unbalanced datasets shows limitations in handling complete end-to-end ML pipelines.
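The evaluation above rests on multiclass accuracy and per-class recall for the three glioma molecular types. As an illustration of that evaluation setup (not the GPT-4 ADA pipeline itself), a minimal scikit-learn sketch with synthetic stand-in features:

```python
# Illustrative sketch: a three-class radiomics-style classifier reporting
# per-class recall, the metric the study cites for each glioma type.
# Features and labels below are random stand-ins, not real radiomics data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
classes = ["IDH-wt", "IDH-mut-1p19q-codel", "IDH-mut-1p19q-noncodel"]
X = rng.normal(size=(615, 100))          # 615 patients x 100 radiomic features
y = rng.choice(classes, size=615)        # synthetic molecular-type labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

recalls = recall_score(y_te, clf.predict(X_te), average=None, labels=classes)
for label, r in zip(classes, recalls):
    print(f"{label}: recall = {r:.3f}")
```

Class imbalance degrades per-class recall exactly as the abstract describes: a model can post high overall accuracy while recall collapses on the rare codeleted classes.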
Collapse
Affiliation(s)
- Martha Foltyn-Dumitru
- Division for Computational Radiology & Clinical AI (CCIBonn.ai), Department of Neuroradiology, Bonn University Hospital, Bonn, Germany
- Division for Computational Neuroimaging, Heidelberg University Hospital, Heidelberg, Germany
- Department of Neuroradiology, Heidelberg University Hospital, Heidelberg, Germany
| | - Aditya Rastogi
- Division for Computational Radiology & Clinical AI (CCIBonn.ai), Department of Neuroradiology, Bonn University Hospital, Bonn, Germany
- Division for Computational Neuroimaging, Heidelberg University Hospital, Heidelberg, Germany
- Department of Neuroradiology, Heidelberg University Hospital, Heidelberg, Germany
| | - Jaeyoung Cho
- Division for Computational Radiology & Clinical AI (CCIBonn.ai), Department of Neuroradiology, Bonn University Hospital, Bonn, Germany
- Division for Computational Neuroimaging, Heidelberg University Hospital, Heidelberg, Germany
- Department of Neuroradiology, Heidelberg University Hospital, Heidelberg, Germany
| | - Marianne Schell
- Division for Computational Neuroimaging, Heidelberg University Hospital, Heidelberg, Germany
- Department of Neuroradiology, Heidelberg University Hospital, Heidelberg, Germany
| | - Mustafa Ahmed Mahmutoglu
- Division for Computational Neuroimaging, Heidelberg University Hospital, Heidelberg, Germany
- Department of Neuroradiology, Heidelberg University Hospital, Heidelberg, Germany
| | - Tobias Kessler
- Clinical Cooperation Unit Neurooncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Neurology and Neurooncology Program, Heidelberg University Hospital, Heidelberg University, Heidelberg, Germany
| | - Felix Sahm
- Clinical Cooperation Unit Neuropathology, German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Neuropathology, Heidelberg University Hospital, Heidelberg, Germany
| | - Wolfgang Wick
- Clinical Cooperation Unit Neurooncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Neurology and Neurooncology Program, Heidelberg University Hospital, Heidelberg University, Heidelberg, Germany
| | - Martin Bendszus
- Department of Neuroradiology, Heidelberg University Hospital, Heidelberg, Germany
| | - Gianluca Brugnara
- Division for Medical Image Computing (MIC), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Division for Computational Radiology & Clinical AI (CCIBonn.ai), Department of Neuroradiology, Bonn University Hospital, Bonn, Germany
- Division for Computational Neuroimaging, Heidelberg University Hospital, Heidelberg, Germany
- Department of Neuroradiology, Heidelberg University Hospital, Heidelberg, Germany
| | - Philipp Vollmuth
- Division for Medical Image Computing (MIC), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Division for Computational Radiology & Clinical AI (CCIBonn.ai), Department of Neuroradiology, Bonn University Hospital, Bonn, Germany
- Division for Computational Neuroimaging, Heidelberg University Hospital, Heidelberg, Germany
- Department of Neuroradiology, Heidelberg University Hospital, Heidelberg, Germany
| |
Collapse
|
29
|
Doreswamy N, Horstmanshof L. Generative AI Decision-Making Attributes in Complex Health Services: A Rapid Review. Cureus 2025; 17:e78257. [PMID: 40026934 PMCID: PMC11871968 DOI: 10.7759/cureus.78257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 01/29/2025] [Indexed: 03/05/2025] Open
Abstract
The advent of Generative Artificial Intelligence (Generative AI or GAI) marks a significant inflection point in AI development. Although AI has long been viewed as the epitome of reasoning and logic, Generative AI incorporates programming rules that are normative. However, it also has a descriptive component based on its programmers' subjective preferences and any discrepancies in the underlying data. Generative AI generates both truth and falsehood, supports both ethical and unethical decisions, and is neither transparent nor accountable. These factors pose clear risks to optimal decision-making in complex health services such as health policy and health regulation. It is important to examine how Generative AI makes decisions, both from a rational, normative perspective and from a descriptive point of view, to ensure an ethical approach to Generative AI design, engineering, and use. The objective is to provide a rapid review that identifies and maps attributes reported in the literature that influence Generative AI decision-making in complex health services. This review follows a clear, reproducible methodology, reported in accordance with a recognised framework and Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) 2020 standards adapted for a rapid review. Inclusion and exclusion criteria were developed, and a database search was undertaken within four search systems: ProQuest, Scopus, Web of Science, and Google Scholar. The results include articles published in 2023 and early 2024. A total of 1,550 articles were identified. After removing duplicates, 1,532 articles remained. Of these, 1,511 articles were excluded based on the selection criteria, and a total of 21 articles were selected for analysis. Learning, understanding, and bias were the most frequently mentioned Generative AI attributes. Generative AI brings the promise of advanced automation but carries significant risk. Learning and pattern recognition are helpful, but the lack of a moral compass, empathy, and consideration for privacy, along with a propensity for bias and hallucination, are detrimental to good decision-making. The results suggest that there is, perhaps, more work to be done before Generative AI can be applied to complex health services.
Collapse
Affiliation(s)
- Nandini Doreswamy
- Faculty of Health Sciences, Southern Cross University, Lismore, AUS
- Health Sciences, National Coalition of Independent Scholars, Canberra, AUS
| | | |
Collapse
|
30
|
Liu J, Koopman B, Brown NJ, Chu K, Nguyen A. Generating synthetic clinical text with local large language models to identify misdiagnosed limb fractures in radiology reports. Artif Intell Med 2025; 159:103027. [PMID: 39580897 DOI: 10.1016/j.artmed.2024.103027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2024] [Revised: 09/26/2024] [Accepted: 11/15/2024] [Indexed: 11/26/2024]
Abstract
Large language models (LLMs) demonstrate impressive capabilities in generating human-like content and have much potential to improve the performance and efficiency of healthcare. An important application of LLMs is generating synthetic clinical reports that could alleviate the burden of annotating and collecting real-world data for training AI models. At the same time, there are concerns and limitations in using commercial LLMs to handle sensitive clinical data. In this study, we examined the use of open-source LLMs as an alternative for generating synthetic radiology reports to supplement real-world annotated data. We found that locally hosted LLMs can achieve performance similar to ChatGPT and GPT-4 in augmenting training data for the downstream report classification task of identifying misdiagnosed fractures. We also examined the predictive value of using synthetic reports alone for training downstream models, where our best setting achieved more than 90% of the performance obtained with real-world data. Overall, our findings show that open-source, local LLMs can be a favourable option for creating synthetic clinical reports for downstream tasks.
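A minimal sketch of the generate-then-augment idea with a locally hosted open-source model; the model choice and prompt below are hypothetical, not the paper's:

```python
# Illustrative sketch (not the paper's code): generating synthetic radiology
# reports with a locally hosted open-source LLM to augment training data for a
# downstream misdiagnosed-fracture classifier. Model name and prompt are
# assumptions for demonstration.
from transformers import pipeline

generator = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")

prompt = (
    "Write a short, de-identified emergency radiology report of a wrist X-ray "
    "in which a subtle distal radius fracture was initially missed."
)
synthetic_reports = [
    out["generated_text"]
    for out in generator(prompt, max_new_tokens=200, num_return_sequences=5,
                         do_sample=True, temperature=0.9)
]
# synthetic_reports can now be pooled with real annotated reports to train the
# downstream classifier, keeping all sensitive data on local hardware.
```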
Collapse
Affiliation(s)
- Jinghui Liu
- Australian e-Health Research Centre, CSIRO, Brisbane, Queensland, Australia.
| | - Bevan Koopman
- Australian e-Health Research Centre, CSIRO, Brisbane, Queensland, Australia
| | - Nathan J Brown
- Emergency and Trauma Centre, Royal Brisbane and Women's Hospital, Brisbane, Queensland, Australia
| | - Kevin Chu
- Emergency and Trauma Centre, Royal Brisbane and Women's Hospital, Brisbane, Queensland, Australia
| | - Anthony Nguyen
- Australian e-Health Research Centre, CSIRO, Brisbane, Queensland, Australia
| |
Collapse
|
31
|
Heisinger S, Salzmann SN, Senker W, Aspalter S, Oberndorfer J, Matzner MP, Stienen MN, Motov S, Huber D, Grohs JG. ChatGPT's Performance in Spinal Metastasis Cases-Can We Discuss Our Complex Cases with ChatGPT? J Clin Med 2024; 13:7864. [PMID: 39768787 PMCID: PMC11727723 DOI: 10.3390/jcm13247864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2024] [Revised: 12/11/2024] [Accepted: 12/19/2024] [Indexed: 01/06/2025] Open
Abstract
Background: The integration of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT-4, is transforming healthcare. ChatGPT's potential to assist in decision-making for complex cases, such as spinal metastasis treatment, is promising but largely untested. Especially in cancer patients who develop spinal metastases, precise and personalized treatment is essential. This study examines ChatGPT-4's performance in treatment planning for spinal metastasis cases compared to experienced spine surgeons. Materials and Methods: Five spine metastasis cases were randomly selected from recent literature. Subsequently, five spine surgeons and ChatGPT-4 were tasked with providing treatment recommendations for each case in a standardized manner. Responses were analyzed for frequency distribution, agreement, and subjective rater opinions. Results: ChatGPT's treatment recommendations aligned with the majority of human raters in 73% of treatment choices, with moderate to substantial agreement on systemic therapy, pain management, and supportive care. However, ChatGPT's recommendations tended towards generalized statements, a tendency the raters explicitly noted. Agreement among raters improved in sensitivity analyses excluding ChatGPT, particularly for controversial areas like surgical intervention and palliative care. Conclusions: ChatGPT shows potential in aligning with experienced surgeons on certain aspects of spinal metastasis treatment. However, its generalized approach highlights limitations, suggesting that training with specific clinical guidelines could enhance its utility in complex case management. Further studies are necessary to refine AI applications in personalized healthcare decision-making.
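The 73% figure is the share of treatment choices on which ChatGPT matched the human majority. A minimal sketch of that comparison, with hypothetical raters, items, and votes:

```python
# Illustrative sketch (not the study's analysis): how often ChatGPT's choice
# matches the majority vote of the human raters. All votes are hypothetical.
from collections import Counter

human_votes = {  # treatment item -> one vote per surgeon rater
    "surgery":     ["yes", "yes", "no", "yes", "no"],
    "systemic_tx": ["yes", "yes", "yes", "yes", "yes"],
    "palliative":  ["no", "no", "yes", "no", "no"],
}
chatgpt_choice = {"surgery": "no", "systemic_tx": "yes", "palliative": "no"}

def majority(votes):
    """Most common vote among the human raters."""
    return Counter(votes).most_common(1)[0][0]

matches = sum(chatgpt_choice[i] == majority(v) for i, v in human_votes.items())
print(f"Alignment with majority: {matches / len(human_votes):.0%}")  # 67% here
```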
Collapse
Affiliation(s)
- Stephan Heisinger
- Department of Orthopedics and Trauma Surgery, Medical University of Vienna, 1090 Vienna, Austria; (S.H.)
| | - Stephan N. Salzmann
- Department of Orthopedics and Trauma Surgery, Medical University of Vienna, 1090 Vienna, Austria; (S.H.)
| | - Wolfgang Senker
- Department of Neurosurgery, Kepler University Hospital, 4020 Linz, Austria (S.A.)
| | - Stefan Aspalter
- Department of Neurosurgery, Kepler University Hospital, 4020 Linz, Austria (S.A.)
| | - Johannes Oberndorfer
- Department of Neurosurgery, Kepler University Hospital, 4020 Linz, Austria (S.A.)
| | - Michael P. Matzner
- Department of Orthopedics and Trauma Surgery, Medical University of Vienna, 1090 Vienna, Austria; (S.H.)
| | - Martin N. Stienen
- Spine Center of Eastern Switzerland & Department of Neurosurgery, Kantonsspital St. Gallen, Medical School of St. Gallen, University of St.Gallen, 9000 St. Gallen, Switzerland
| | - Stefan Motov
- Spine Center of Eastern Switzerland & Department of Neurosurgery, Kantonsspital St. Gallen, Medical School of St. Gallen, University of St.Gallen, 9000 St. Gallen, Switzerland
| | - Dominikus Huber
- Division of Oncology, Department of Medicine I, Medical University of Vienna, 1090 Vienna, Austria
| | - Josef Georg Grohs
- Department of Orthopedics and Trauma Surgery, Medical University of Vienna, 1090 Vienna, Austria; (S.H.)
| |
Collapse
|
32
|
Başaran M, Duman C. Dialogues with artificial intelligence: Exploring medical students' perspectives on ChatGPT. MEDICAL TEACHER 2024:1-10. [PMID: 39692300 DOI: 10.1080/0142159x.2024.2438766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/25/2024] [Accepted: 12/03/2024] [Indexed: 12/19/2024]
Abstract
ChatGPT has initiated a new era of inquiry into sources of information within the scientific community. Studies leveraging ChatGPT in the medical field have demonstrated notable performance in academic processes and healthcare applications. This research describes how medical students have benefited from ChatGPT during their education and the challenges they encountered, as reported through their personal experiences. The methodological framework of this study adheres to the stages of qualitative research. An explanatory case study, a qualitative research method, was adopted to explore user experiences with ChatGPT. Content analysis based on student experiences with ChatGPT indicates that it may offer advantages in health education as a resource for scientific research activities. However, adverse reports were also identified, including ethical issues, lack of personal data protection, and potential misuse in scientific research. This study emphasizes the need for comprehensive steps to integrate new AI tools like ChatGPT effectively into medical education.
Collapse
Affiliation(s)
- Mehmet Başaran
- Curriculum and Instruction, Gaziantep University, Gaziantep, Turkey
| | - Cevahir Duman
- Curriculum and Instruction, Gaziantep University, Gaziantep, Turkey
| |
Collapse
|
33
|
Piras A, Mastroleo F, Colciago RR, Morelli I, D'Aviero A, Longo S, Grassi R, Iorio GC, De Felice F, Boldrini L, Desideri I, Salvestrini V. How Italian radiation oncologists use ChatGPT: a survey by the young group of the Italian association of radiotherapy and clinical oncology (yAIRO). LA RADIOLOGIA MEDICA 2024:10.1007/s11547-024-01945-1. [PMID: 39690359 DOI: 10.1007/s11547-024-01945-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/26/2024] [Accepted: 12/11/2024] [Indexed: 12/19/2024]
Abstract
PURPOSE To investigate the awareness and spread of ChatGPT and its possible role in both scientific research and clinical practice among young radiation oncologists (ROs). MATERIAL AND METHODS An anonymous, online survey via Google Forms (including 24 questions) was distributed among young (< 40 years old) ROs in Italy through the yAIRO network from March 15 to March 31, 2024. These ROs were officially registered with yAIRO in 2023. We particularly focused on the emerging use of ChatGPT and its future perspectives in clinical practice. RESULTS A total of 76 young physicians answered the survey. Seventy-three participants declared themselves familiar with ChatGPT, and 71.1% of the surveyed physicians had already used it. Thirty-one (40.8%) participants strongly agreed that AI has the potential to change the medical landscape in the future. Additionally, 79.1% of respondents agreed that AI will be mainly successful in research processes such as literature review and drafting articles/protocols. This belief in ChatGPT's potential translates into direct use in daily practice for 43.4% of respondents, most often with a fair grade of satisfaction (43.2%). A large proportion of participants (69.7%) believe in the implementation of ChatGPT into clinical practice, even though 53.9% fear an overall negative impact. CONCLUSIONS The results of the present survey clearly highlight the attitude of young Italian ROs toward the implementation of ChatGPT into clinical and academic RO practice. ChatGPT is considered a valuable and effective tool that can ease current and future workflows.
Collapse
Affiliation(s)
- Antonio Piras: UO Radioterapia Oncologica, Villa Santa Teresa, 90011, Bagheria, Palermo, Italy; Ri.Med Foundation, 90133, Palermo, Italy; Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties, Molecular and Clinical Medicine, University of Palermo, 90127, Palermo, Italy; Radiation Oncology, Mater Olbia Hospital, Olbia, Sassari, Italy
- Federico Mastroleo: Division of Radiation Oncology, IEO European Institute of Oncology IRCCS, 20141, Milan, Italy; Department of Oncology and Hemato-Oncology, University of Milan, 20141, Milan, Italy
- Riccardo Ray Colciago: School of Medicine and Surgery, University of Milano Bicocca, Piazza Dell'Ateneo Nuovo, 1, 20126, Milan, Italy
- Ilaria Morelli: Radiation Oncology Unit, Department of Experimental and Clinical Biomedical Sciences, Azienda Ospedaliero-Universitaria Careggi, University of Florence, Florence, Italy
- Andrea D'Aviero: Department of Radiation Oncology, "S.S. Annunziata" Chieti Hospital, Chieti, Italy; Department of Medical, Oral and Biotechnological Sciences, "G. D'Annunzio" University of Chieti, Chieti, Italy
- Silvia Longo: UOC Radioterapia Oncologica, Fondazione Policlinico Universitario "A. Gemelli" IRCCS, Rome, Italy
- Roberta Grassi: Department of Precision Medicine, University of Campania "L. Vanvitelli", Naples, Italy
- Francesca De Felice: Radiation Oncology, Policlinico Umberto I, Department of Radiological, Oncological and Pathological Sciences, "Sapienza" University of Rome, Rome, Italy
- Luca Boldrini: UOC Radioterapia Oncologica, Fondazione Policlinico Universitario "A. Gemelli" IRCCS, Rome, Italy; Università Cattolica del Sacro Cuore, Rome, Italy
- Isacco Desideri: Radiation Oncology Unit, Department of Experimental and Clinical Biomedical Sciences, Azienda Ospedaliero-Universitaria Careggi, University of Florence, Florence, Italy
- Viola Salvestrini: Radiation Oncology Unit, Department of Experimental and Clinical Biomedical Sciences, Azienda Ospedaliero-Universitaria Careggi, University of Florence, Florence, Italy
34
Chung D, Sidhom K, Dhillon H, Bal DS, Fidel MG, Jawanda G, Patel P. Real-world utility of ChatGPT in pre-vasectomy counselling, a safe and efficient practice: a prospective single-centre clinical study. World J Urol 2024; 43:32. [PMID: 39673635 DOI: 10.1007/s00345-024-05385-4] [Received: 09/17/2024] [Accepted: 11/15/2024] [Indexed: 12/16/2024]
Abstract
PURPOSE This study sought to assess whether pre-vasectomy counselling with ChatGPT can safely streamline the consultation process by reducing visit times and increasing patient satisfaction. METHODS A single-institution randomized pilot study was conducted to evaluate the safety and efficacy of ChatGPT for pre-vasectomy counselling. All adult patients interested in undergoing a vasectomy were included; unwillingness to provide consent or lack of internet access constituted exclusion. Patients were randomized 1:1 to ChatGPT counselling plus standard in-person consultation or to in-person consultation without ChatGPT. Length of visit, number of questions asked, and responses to a Likert questionnaire (on a scale of 0 to 10, with 10 defined as great and 0 as poor) were collected. Descriptive statistics and a comparative analysis were performed. RESULTS Eighteen patients were included, with a mean age of 35.8 ± 5.4 years in the intervention arm (n = 9) and 36.9 ± 7.4 years in the control arm (n = 9). Pre-vasectomy counselling with ChatGPT was associated with a higher provider perception of patient understanding of the procedure (8.8 ± 1.0 vs. 6.7 ± 2.8; p = 0.047) and a decreased length of in-person consultation (7.7 ± 2.3 min vs. 10.6 ± 3.4 min; p = 0.05). Quality of information provided by ChatGPT, ease of use, and overall experience were rated highly at 8.3 ± 1.9, 9.1 ± 1.5, and 8.6 ± 1.7, respectively. CONCLUSIONS ChatGPT for pre-vasectomy counselling improved the efficiency of consultations and the provider's perception of the patient's understanding of the procedure.
Affiliation(s)
- David Chung: Section of Urology, Department of Surgery, University of Manitoba, AD203-720 McDermot Avenue, Winnipeg, Manitoba, R3N 1B1, Canada
- Karim Sidhom: Section of Urology, Department of Surgery, University of Manitoba, AD203-720 McDermot Avenue, Winnipeg, Manitoba, R3N 1B1, Canada
- Dhiraj S Bal: Max Rady College of Medicine, University of Manitoba, Winnipeg, MB, Canada
- Maximilian G Fidel: Max Rady College of Medicine, University of Manitoba, Winnipeg, MB, Canada
- Gary Jawanda: Manitoba Men's Health Clinic, Winnipeg, MB, Canada
- Premal Patel: Section of Urology, Department of Surgery, University of Manitoba, AD203-720 McDermot Avenue, Winnipeg, Manitoba, R3N 1B1, Canada; Manitoba Men's Health Clinic, Winnipeg, MB, Canada
35
Koyama H, Kashio A, Yamasoba T. Application of Artificial Intelligence in Otology: Past, Present, and Future. J Clin Med 2024; 13:7577. [PMID: 39768500 PMCID: PMC11727971 DOI: 10.3390/jcm13247577] [Received: 11/12/2024] [Revised: 12/11/2024] [Accepted: 12/11/2024] [Indexed: 01/16/2025]
Abstract
Artificial intelligence (AI) aims to imitate human intellectual activity in computers. It emerged in the 1950s and has gone through three booms, the third of which is ongoing. Medical applications of AI include diagnosing otitis media from images of the eardrum, often outperforming human doctors. Temporal bone CT and MRI analyses also benefit from AI, with improved segmentation accuracy for anatomically significant structures and improved diagnostic accuracy in conditions such as otosclerosis and vestibular schwannoma. In treatment, AI predicts hearing outcomes for sudden sensorineural hearing loss and post-operative hearing outcomes for patients who have undergone tympanoplasty. AI helps hearing aid users hear in challenging situations, such as noisy environments or when multiple people are speaking, and provides fitting information to help improve hearing with hearing aids. AI also improves cochlear implant mapping and outcome prediction, even in cases of cochlear malformation. Future trends include generative AI, such as ChatGPT, which can provide medical advice and information, although its reliability and application in clinical settings require further investigation.
Affiliation(s)
- Hajime Koyama: Department of Otolaryngology and Head and Neck Surgery, Faculty of Medicine, University of Tokyo, Tokyo 113-8655, Japan
- Akinori Kashio: Department of Otolaryngology and Head and Neck Surgery, Faculty of Medicine, University of Tokyo, Tokyo 113-8655, Japan
- Tatsuya Yamasoba: Department of Otolaryngology and Head and Neck Surgery, Faculty of Medicine, University of Tokyo, Tokyo 113-8655, Japan; Department of Otolaryngology, Tokyo Teishin Hospital, Tokyo 102-8798, Japan
36
Yanagita Y, Yokokawa D, Uchida S, Li Y, Uehara T, Ikusaka M. Can AI-Generated Clinical Vignettes in Japanese Be Used Medically and Linguistically? J Gen Intern Med 2024; 39:3282-3289. [PMID: 39313665 PMCID: PMC11618267 DOI: 10.1007/s11606-024-09031-y] [Received: 02/29/2024] [Accepted: 09/10/2024] [Indexed: 09/25/2024]
Abstract
BACKGROUND Creating clinical vignettes requires considerable effort. Recent developments in generative artificial intelligence (AI) for natural language processing have been remarkable and may allow for the easy and immediate creation of diverse clinical vignettes. OBJECTIVE In this study, we evaluated the medical accuracy and grammatical correctness of AI-generated clinical vignettes in Japanese and verified their usefulness. METHODS Clinical vignettes were created using the generative AI model GPT-4-0613. The input prompts for the clinical vignettes specified the following seven elements: (1) age, (2) sex, (3) chief complaint and time course since onset, (4) physical findings, (5) examination results, (6) diagnosis, and (7) treatment course. The list of diseases integrated into the vignettes was based on 202 cases considered in the management of diseases and symptoms in Japan's Primary Care Physicians Training Program. The clinical vignettes were evaluated for medical and Japanese-language accuracy by three physicians, each using a five-point scale (maximum total score, 15 points). A total score of 13 points or above was defined as "sufficiently beneficial and immediately usable with minor revisions," a score between 10 and 12 points as "partly insufficient and in need of modifications," and a score of 9 points or below as "insufficient." RESULTS Regarding medical accuracy, of the 202 clinical vignettes, 118 scored 13 points or above, 78 scored between 10 and 12 points, and 6 scored 9 points or below. Regarding Japanese-language accuracy, 142 vignettes scored 13 points or above, 56 scored between 10 and 12 points, and 4 scored 9 points or below. Overall, 97% (196/202) of the vignettes were usable, either immediately or after modifications. CONCLUSION Overall, 97% of the clinical vignettes proved practically useful, based on confirmation and revision by Japanese medical physicians. Given the significant effort required by physicians to create vignettes without AI, using GPT is expected to greatly optimize this process.
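The seven-element prompt specification above is concrete enough to sketch in code. The following minimal Python sketch assembles such a vignette-generation prompt; the instruction wording and the example disease are illustrative assumptions, not the authors' actual prompt text.

```python
# Sketch only: build a GPT-4 prompt from the study's seven specified case elements.
CASE_ELEMENTS = [
    "age",
    "sex",
    "chief complaint and time course since onset",
    "physical findings",
    "examination results",
    "diagnosis",
    "treatment course",
]

def build_vignette_prompt(disease: str) -> str:
    """Return a prompt asking the model for a clinical vignette in Japanese."""
    numbered = "\n".join(f"({i}) {name}" for i, name in enumerate(CASE_ELEMENTS, 1))
    return (
        f"Write a clinical vignette in Japanese for a patient with {disease}. "
        "Include all of the following elements:\n" + numbered
    )

print(build_vignette_prompt("acute appendicitis"))  # hypothetical disease entry
```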
Affiliation(s)
- Yasutaka Yanagita: Department of General Medicine, Chiba University Hospital, Chiba, Japan
- Daiki Yokokawa: Department of General Medicine, Chiba University Hospital, Chiba, Japan
- Shun Uchida: Uchida Internal Medicine Clinic, Saitama, Japan
- Yu Li: Department of General Medicine, Chiba University Hospital, Chiba, Japan
- Takanori Uehara: Department of General Medicine, Chiba University Hospital, Chiba, Japan
- Masatomi Ikusaka: Department of General Medicine, Chiba University Hospital, Chiba, Japan
37
Sarumi OA, Heider D. Large language models and their applications in bioinformatics. Comput Struct Biotechnol J 2024; 23:3498-3505. [PMID: 39435343 PMCID: PMC11493188 DOI: 10.1016/j.csbj.2024.09.031] [Received: 07/31/2024] [Revised: 09/30/2024] [Accepted: 09/30/2024] [Indexed: 10/23/2024]
Abstract
Recent advancements in Natural Language Processing (NLP) have been significantly driven by the development of Large Language Models (LLMs), representing a substantial leap in language-based technology capabilities. These models, built on sophisticated deep learning architectures, typically transformers, are characterized by billions of parameters and extensive training data, enabling them to achieve high accuracy across various tasks. The transformer architecture of LLMs allows them to effectively handle context and sequential information, which is crucial for understanding and generating human language. Beyond traditional NLP applications, LLMs have shown significant promise in bioinformatics, transforming the field by addressing challenges associated with large and complex biological datasets. In genomics, proteomics, and personalized medicine, LLMs facilitate identifying patterns, predicting protein structures, and understanding genetic variations. This capability is crucial for, among other things, advancing drug discovery, where accurate prediction of molecular interactions is essential. This review discusses current trends in LLM research and their potential to revolutionize the field of bioinformatics and accelerate novel discoveries in the life sciences.
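The claim that transformers handle context rests on the attention mechanism; a minimal NumPy sketch of scaled dot-product attention is shown below. The toy shapes and random inputs are illustrative only and not tied to any model discussed in the review.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Q, K, V: (seq_len, d) arrays; returns one context vector per token."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # pairwise token-token similarity
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted mix of value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))              # 5 tokens, 8-dimensional embeddings
print(attention(x, x, x).shape)          # (5, 8)
```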
Affiliation(s)
- Oluwafemi A. Sarumi: University of Münster, Institute of Medical Informatics, Albert-Schweitzer-Campus, Münster, 48149, Germany; Institute of Computer Science, Heinrich-Heine-University Duesseldorf, Graf-Adolf-Str. 63, Duesseldorf, 40215, Germany
- Dominik Heider: University of Münster, Institute of Medical Informatics, Albert-Schweitzer-Campus, Münster, 48149, Germany; Institute of Computer Science, Heinrich-Heine-University Duesseldorf, Graf-Adolf-Str. 63, Duesseldorf, 40215, Germany
38
Denecke K, Gabarron E. The ethical aspects of integrating sentiment and emotion analysis in chatbots for depression intervention. Front Psychiatry 2024; 15:1462083. [PMID: 39611131 PMCID: PMC11602467 DOI: 10.3389/fpsyt.2024.1462083] [Received: 07/09/2024] [Accepted: 10/18/2024] [Indexed: 11/30/2024]
Abstract
Introduction Digital health interventions, specifically those realized as chatbots, are increasingly available for mental health. They include technologies based on artificial intelligence that assess the user's sentiment and emotions, either to respond in an empathetic way or for treatment purposes, e.g., analyzing the expressed emotions and suggesting interventions. Methods In this paper, we study the ethical dimensions of integrating these technologies into chatbots for depression intervention using the Digital Ethics Canvas and the DTx Risk Assessment Canvas. Results We identified specific risks associated with integrating sentiment and emotion analysis methods into these systems, related to the difficulty of correctly recognizing the expressed sentiment or emotion in statements from individuals with depressive symptoms and of reacting appropriately, including risk detection. Depending on the realization of the sentiment or emotion analysis, which may be dictionary-based or machine-learning-based, additional risks arise from biased training data or misinterpretations. Discussion While technology decisions during system development can be made carefully for each use case, other ethical risks cannot be prevented at a technical level; instead, such chatbots must be carefully integrated into the care process, allowing for supervision by health professionals. We conclude that careful reflection is needed when integrating sentiment and emotion analysis into chatbots for depression intervention. Balancing risk factors is key to leveraging technology in mental health in a way that enhances, rather than diminishes, user autonomy and agency.
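As a concrete illustration of the dictionary-based variant the authors contrast with machine-learning approaches, the sketch below scores a message against a sentiment lexicon. The tiny lexicon is invented, and the deliberate failure on negation mirrors the misinterpretation risk discussed above.

```python
# Toy dictionary-based sentiment scoring; real systems use validated lexicons.
LEXICON = {"hopeless": -2, "sad": -1, "tired": -1, "better": 1, "hopeful": 2}

def sentiment_score(text: str) -> int:
    """Sum lexicon weights over the tokens of a user message."""
    tokens = text.lower().replace(".", " ").replace(",", " ").split()
    return sum(LEXICON.get(tok, 0) for tok in tokens)

print(sentiment_score("I feel sad and tired"))     # -2
print(sentiment_score("I am not hopeful at all"))  # +2: negation is missed
```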
Affiliation(s)
- Kerstin Denecke: AI for Health, Institute Patient-centered Digital Health, Bern University of Applied Sciences, Biel, Switzerland
- Elia Gabarron: Department of Education, ICT and Learning, Østfold University College, Halden, Norway; Norwegian Centre for E-health Research, University Hospital of North Norway, Tromsø, Norway
39
Wang L, Wan Z, Ni C, Song Q, Li Y, Clayton E, Malin B, Yin Z. Applications and Concerns of ChatGPT and Other Conversational Large Language Models in Health Care: Systematic Review. J Med Internet Res 2024; 26:e22769. [PMID: 39509695 PMCID: PMC11582494 DOI: 10.2196/22769] [Received: 04/23/2024] [Revised: 09/19/2024] [Accepted: 10/03/2024] [Indexed: 11/15/2024]
Abstract
BACKGROUND The launch of ChatGPT (OpenAI) in November 2022 attracted public attention and academic interest to large language models (LLMs), facilitating the emergence of many other innovative LLMs. These LLMs have been applied in various fields, including health care. Numerous studies have since been conducted regarding how to use state-of-the-art LLMs in health-related scenarios. OBJECTIVE This review aims to summarize applications of and concerns regarding conversational LLMs in health care and provide an agenda for future research in this field. METHODS We used PubMed, ACM, and the IEEE digital libraries as primary sources for this review. We followed the guidance of PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) to screen and select peer-reviewed research articles that (1) were related to health care applications and conversational LLMs and (2) were published before September 1, 2023, the date when we started paper collection. We investigated these papers and classified them according to their applications and concerns. RESULTS Our search initially identified 820 papers according to targeted keywords, out of which 65 (7.9%) met our criteria and were included in the review. The most popular conversational LLM was ChatGPT (60/65, 92% of papers), followed by Bard (Google LLC; 1/65, 2% of papers), LLaMA (Meta; 1/65, 2% of papers), and other LLMs (6/65, 9% of papers). These papers were classified into four categories of applications: (1) summarization, (2) medical knowledge inquiry, (3) prediction (eg, diagnosis, treatment recommendation, and drug synergy), and (4) administration (eg, documentation and information collection), and four categories of concerns: (1) reliability (eg, training data quality, accuracy, interpretability, and consistency in responses), (2) bias, (3) privacy, and (4) public acceptability. In total, 49 (75%) papers used LLMs for summarization or medical knowledge inquiry, or both, and 58 (89%) papers expressed concerns about reliability or bias, or both. We found that conversational LLMs exhibited promising results in summarization and in providing general medical knowledge to patients with relatively high accuracy. However, conversational LLMs such as ChatGPT are not always able to provide reliable answers to complex health-related tasks (eg, diagnosis) that require specialized domain expertise. While bias or privacy issues are often noted as concerns, no experiments in our reviewed papers thoughtfully examined how conversational LLMs lead to these issues in health care research. CONCLUSIONS Future studies should focus on improving the reliability of LLM applications in complex health-related tasks, as well as investigating the mechanisms by which LLM applications introduce bias and privacy issues. Considering the broad accessibility of LLMs, legal, social, and technical efforts are all needed to address these concerns and to promote, improve, and regularize the application of LLMs in health care.
Affiliation(s)
- Leyao Wang: Department of Computer Science, Vanderbilt University, Nashville, TN, United States
- Zhiyu Wan: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States; School of Biomedical Engineering, ShanghaiTech University, Shanghai, China
- Congning Ni: Department of Computer Science, Vanderbilt University, Nashville, TN, United States
- Qingyuan Song: Department of Computer Science, Vanderbilt University, Nashville, TN, United States
- Yang Li: Department of Computer Science, Vanderbilt University, Nashville, TN, United States
- Ellen Clayton: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States; Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, United States; School of Law, Vanderbilt University Medical Center, Nashville, TN, United States
- Bradley Malin: Department of Computer Science, Vanderbilt University, Nashville, TN, United States; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States; Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, United States
- Zhijun Yin: Department of Computer Science, Vanderbilt University, Nashville, TN, United States; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
40
Chow JCL, Li K. Ethical Considerations in Human-Centered AI: Advancing Oncology Chatbots Through Large Language Models. JMIR Bioinform Biotechnol 2024; 5:e64406. [PMID: 39321336 PMCID: PMC11579624 DOI: 10.2196/64406] [Received: 07/16/2024] [Revised: 08/23/2024] [Accepted: 09/23/2024] [Indexed: 09/27/2024]
Abstract
The integration of chatbots in oncology underscores the pressing need for human-centered artificial intelligence (AI) that addresses patient and family concerns with empathy and precision. Human-centered AI emphasizes ethical principles, empathy, and user-centric approaches, ensuring technology aligns with human values and needs. This review critically examines the ethical implications of using large language models (LLMs) such as GPT-3 and GPT-4 (OpenAI) in oncology chatbots, including how these models replicate human-like language patterns and thereby shape the design of ethical AI systems. The paper identifies key strategies for ethically developing oncology chatbots, focusing on potential biases arising from extensive datasets and neural networks. Specific datasets, such as those sourced from predominantly Western medical literature and patient interactions, may introduce biases by overrepresenting certain demographic groups. Moreover, the training methodologies of LLMs, including fine-tuning processes, can exacerbate these biases, leading to outputs that disproportionately favor affluent or Western populations while neglecting marginalized communities. By providing examples of biased outputs in oncology chatbots, the review highlights the ethical challenges LLMs present and the need for mitigation strategies. The study emphasizes integrating human-centric values into AI to mitigate these biases, ultimately advocating for the development of oncology chatbots that are aligned with ethical principles and capable of serving diverse patient populations equitably.
Affiliation(s)
- James C L Chow: Department of Radiation Oncology, University of Toronto, Toronto, ON, Canada; Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Kay Li: Department of English, University of Toronto, Toronto, ON, Canada
41
Bai S, Zheng J, Wu W, Gao D, Gu X. Research on healthcare data sharing in the context of digital platforms considering the risks of data breaches. Front Public Health 2024; 12:1438579. [PMID: 39568601 PMCID: PMC11576462 DOI: 10.3389/fpubh.2024.1438579] [Received: 05/26/2024] [Accepted: 10/21/2024] [Indexed: 11/22/2024]
Abstract
Background Within China's healthcare landscape, the sharing of medical data has emerged as a pivotal force propelling advancements in the insurance sector and enhancing patient engagement with healthcare services. However, medical institutions often exhibit reluctance toward data sharing due to apprehensions regarding data security and privacy safeguards. To navigate this conundrum, our research introduces and empirically validates a model grounded in evolutionary game theory, offering a robust theoretical framework and actionable strategies for facilitating healthcare data sharing while harmonizing the dual imperatives of data utility and privacy preservation. Methods In this paper, we construct an evolutionary game model involving medical institutions, big data innovation platforms, and insurance companies within the context of digital platforms. The model integrates exogenous and endogenous causes of data breaches, compensation payments, government penalties, subsidies, unreasonable fees, claims efficiency, and insurance fraud. Results The stability analysis of the evolutionary game identifies eight equilibrium points among medical institutions, platforms, and insurance companies. Numerical simulations demonstrate convergence toward the strategy profile E7 = (0, 0, 1), in which medical institutions adopt a fully anonymous information-sharing strategy, platforms implement strict regulation, and insurance companies opt for auditing. Sensitivity analysis reveals that the parameters selected in this study significantly influence the players' behavioral choices and the game's equilibria. Conclusions When breaches occur, medical institutions tend to seek liability co-sharing with platforms and insurance companies; this promotes stricter regulation by platforms and incentivizes insurance companies to perform audits. If responsibility for the breach is attributed to the platform or the insurance company, the liability-sharing system pushes healthcare organizations to choose a fully anonymous information-sharing strategy; otherwise, medical institutions choose partially anonymous information sharing for greater benefit. In cases of widespread data leakage, compensation amounts increase and compensation takes over the supervisory role of government; additional government penalties then reduce the motivation of each party.
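To make the dynamics concrete, the sketch below runs three-population replicator dynamics of the kind used in such models. The payoff-difference functions are placeholders rather than the paper's actual payoffs; they are chosen only so that the system drifts toward the reported equilibrium E7 = (0, 0, 1) (fully anonymous sharing, strict regulation, auditing).

```python
# x: share of institutions choosing *partially* anonymous sharing
# y: share of platforms choosing *loose* regulation
# z: share of insurers choosing *auditing*
def du_x(x, y, z):  # partial- minus full-anonymity payoff (placeholder)
    return -0.5 - 0.3 * y + 0.1 * (1 - z)

def du_y(x, y, z):  # loose- minus strict-regulation payoff (placeholder)
    return -0.4 - 0.2 * z

def du_z(x, y, z):  # auditing minus non-auditing payoff (placeholder)
    return 0.6 - 0.2 * x

x, y, z, dt = 0.5, 0.5, 0.5, 0.01
for _ in range(5000):                    # forward-Euler integration of replicator ODEs
    x += dt * x * (1 - x) * du_x(x, y, z)
    y += dt * y * (1 - y) * du_y(x, y, z)
    z += dt * z * (1 - z) * du_z(x, y, z)

print(round(x, 3), round(y, 3), round(z, 3))  # drifts toward (0, 0, 1)
```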
Affiliation(s)
- Shizhen Bai: School of Management, Harbin University of Commerce, Harbin, China
- Jinjin Zheng: School of Management, Harbin University of Commerce, Harbin, China
- Wenya Wu: School of Management, Harbin University of Commerce, Harbin, China
- Dongrui Gao: School of Management, Harbin University of Commerce, Harbin, China
- Xiujin Gu: School of Management, Harbin University of Commerce, Harbin, China
42
Shi W, Xu R, Zhuang Y, Yu Y, Sun H, Wu H, Yang C, Wang MD. MedAdapter: Efficient Test-Time Adaptation of Large Language Models Towards Medical Reasoning. Proc Conf Empir Methods Nat Lang Process 2024; 2024:22294-22314. [PMID: 40028445 PMCID: PMC11868705 DOI: 10.18653/v1/2024.emnlp-main.1244] [Indexed: 03/05/2025]
Abstract
Despite their improved capabilities in generation and reasoning, adapting large language models (LLMs) to the biomedical domain remains challenging due to their immense size and privacy concerns. In this study, we propose MedAdapter, a unified post-hoc adapter for test-time adaptation of LLMs towards biomedical applications. Instead of fine-tuning the entire LLM, MedAdapter effectively adapts the original model by fine-tuning only a small BERT-sized adapter to rank candidate solutions generated by LLMs. Experiments on four biomedical tasks across eight datasets demonstrate that MedAdapter effectively adapts both white-box and black-box LLMs in biomedical reasoning, achieving average performance improvements of 18.24% and 10.96%, respectively, without requiring extensive computational resources or sharing data with third parties. MedAdapter also yields enhanced performance when combined with train-time adaptation, highlighting a flexible and complementary solution to existing adaptation methods. Faced with the challenges of balancing model performance, computational resources, and data privacy, MedAdapter provides an efficient, privacy-preserving, cost-effective, and transparent solution for adapting LLMs to the biomedical domain.
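The core mechanism, a small scorer reranking candidate LLM outputs, can be sketched briefly. In this hedged Python sketch a placeholder function stands in for the BERT-sized adapter, and the candidate answers are invented for illustration.

```python
from typing import Callable, List

def rerank(candidates: List[str], scorer: Callable[[str], float]) -> str:
    """Return the candidate that the adapter scores highest (best-of-n selection)."""
    return max(candidates, key=scorer)

def toy_scorer(answer: str) -> float:
    # Placeholder: a real adapter embeds (question, answer) pairs and outputs a
    # learned plausibility score; here we simply prefer longer, justified answers.
    return len(answer) + (5.0 if "because" in answer else 0.0)

candidates = [
    "Yes.",
    "Yes, because metformin is first-line therapy for type 2 diabetes.",
    "No.",
]
print(rerank(candidates, toy_scorer))
```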
43
Pooryousef V, Cordeil M, Besancon L, Bassed R, Dwyer T. Collaborative Forensic Autopsy Documentation and Supervised Report Generation Using a Hybrid Mixed-Reality Environment and Generative AI. IEEE Trans Vis Comput Graph 2024; 30:7452-7462. [PMID: 39250385 DOI: 10.1109/tvcg.2024.3456212] [Indexed: 09/11/2024]
Abstract
Forensic investigation is a complex procedure in which experts work together to establish the cause of death and report findings to legal authorities. While new technologies are being developed to provide better post-mortem imaging capabilities, including mixed-reality (MR) tools to support 3D visualisation of such data, these tools do not integrate seamlessly into the existing collaborative workflow and report authoring process, requiring extra steps, e.g. extracting imagery from the MR tool and combining it with physical autopsy findings for inclusion in the report. Therefore, in this work we design and evaluate a new forensic autopsy report generation workflow and present a novel documentation system that uses hybrid mixed-reality approaches to integrate visualisation, voice and hand interaction, collaboration, and procedure recording. Our preliminary findings indicate that this approach has the potential to improve data management and aid reviewability, and thus to achieve more robust standards. Further, it potentially streamlines report generation and minimises dependency on external tools and assistance, reducing autopsy time and related costs. This system also offers significant potential for education. A free copy of this paper and all supplemental materials are available at https://osf.io/ygfzx.
44
Tayebi Arasteh S, Siepmann R, Huppertz M, Lotfinia M, Puladi B, Kuhl C, Truhn D, Nebelung S. The Treasure Trove Hidden in Plain Sight: The Utility of GPT-4 in Chest Radiograph Evaluation. Radiology 2024; 313:e233441. [PMID: 39530893 DOI: 10.1148/radiol.233441] [Indexed: 11/16/2024]
Abstract
Background Limited statistical knowledge can slow radiologists' critical engagement with and adoption of artificial intelligence (AI) tools. Large language models (LLMs) such as OpenAI's GPT-4, and notably its Advanced Data Analysis (ADA) extension, may improve the adoption of AI in radiology. Purpose To validate GPT-4 ADA outputs when autonomously conducting analyses of varying complexity on a multisource clinical dataset. Materials and Methods In this retrospective study, unique itemized radiologic reports of bedside chest radiographs, associated demographic data, and laboratory markers of inflammation from patients in intensive care from January 2009 to December 2019 were evaluated. GPT-4 ADA, accessed between December 2023 and January 2024, was tasked with autonomously analyzing this dataset by plotting radiography usage rates, providing descriptive statistical measures, quantifying factors associated with pulmonary opacities, and setting up machine learning (ML) models to predict their presence. Three scientists with 6-10 years of ML experience validated the outputs by verifying the methodology, assessing coding quality, re-executing the provided code, and comparing ML models head-to-head with their human-developed counterparts (based on the area under the receiver operating characteristic curve [AUC], accuracy, sensitivity, and specificity). Statistical significance was evaluated using bootstrapping. Results A total of 43 788 radiograph reports, with their associated laboratory values, from 43 788 patients (mean age, 66 years ± 15 [SD]; 26 804 male) at University Hospital RWTH Aachen were evaluated. While GPT-4 ADA provided largely appropriate visualizations, descriptive statistical measures, quantitative statistical associations based on logistic regression, and gradient boosting machines for the predictive task (AUC, 0.75), some statistical errors and inaccuracies were encountered. ML strategies were valid and based on consistent coding routines, resulting in valid outputs on par with human specialist-developed reference models (AUC, 0.80 [95% CI: 0.80, 0.81] vs 0.80 [95% CI: 0.80, 0.81]; P = .51) (accuracy, 79% [6910 of 8758 patients] vs 78% [6875 of 8758 patients], respectively; P = .27). Conclusion LLMs may facilitate data analysis in radiology, from basic statistics to advanced ML-based predictive modeling. © RSNA, 2024 Supplemental material is available for this article.
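The bootstrap comparison of two models' AUCs described above can be illustrated with a short sketch; the labels and scores below are synthetic, and only the resampling logic reflects the kind of test reported.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=1000)                              # synthetic ground truth
p_a = np.clip(0.6 * y + rng.normal(0.2, 0.25, 1000), 0, 1)     # model A scores
p_b = np.clip(0.55 * y + rng.normal(0.22, 0.25, 1000), 0, 1)   # model B scores

deltas = []
for _ in range(2000):                                          # bootstrap resamples
    idx = rng.integers(0, len(y), size=len(y))
    if len(np.unique(y[idx])) < 2:                             # AUC needs both classes
        continue
    deltas.append(roc_auc_score(y[idx], p_a[idx]) - roc_auc_score(y[idx], p_b[idx]))

deltas = np.array(deltas)
p = min(1.0, 2 * min((deltas <= 0).mean(), (deltas >= 0).mean()))  # two-sided p-value
print(f"mean dAUC = {deltas.mean():.3f}, p = {p:.3f}")
```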
Affiliation(s)
- Soroosh Tayebi Arasteh, Robert Siepmann, Marc Huppertz, Mahshad Lotfinia, Behrus Puladi, Christiane Kuhl, Daniel Truhn, Sven Nebelung: From the Department of Diagnostic and Interventional Radiology (S.T.A., R.S., M.H., M.L., C.K., D.T., S.N.), Department of Oral and Maxillofacial Surgery (B.P.), and Institute of Medical Informatics (B.P.), University Hospital RWTH Aachen, Pauwelsstr 30, 52074 Aachen, Germany; Pattern Recognition Laboratory, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany (S.T.A.); and Institute of Heat and Mass Transfer, RWTH Aachen University, Aachen, Germany (M.L.)
45
AlShenaiber A, Datta S, Mosa AJ, Binhammer PA, Ing EB. Large Language Models in the Diagnosis of Hand and Peripheral Nerve Injuries: An Evaluation of ChatGPT and the Isabel Differential Diagnosis Generator. J Hand Surg Glob Online 2024; 6:847-854. [PMID: 39703593 PMCID: PMC11652307 DOI: 10.1016/j.jhsg.2024.07.011] [Indexed: 12/21/2024]
Abstract
Purpose Tools using artificial intelligence may help reduce missed or delayed diagnoses and improve patient care in hand surgery. This study compared the performance of two natural language processing programs, Isabel and ChatGPT-4, in diagnosing hand and peripheral nerve injuries from a set of clinical vignettes. Methods Cases from a virtual library of hand surgery case reports with no history of trauma or previous surgery were included in this study. The clinical details (age, sex, symptoms, signs, and medical history) of 16 hand cases were entered into Isabel and ChatGPT-4 to generate top-10 differential diagnosis lists. The inclusion and median rank of the correct diagnosis within each list were compared between the two systems. Two hand surgeons were then provided with each list and asked to independently evaluate the performance of the two systems. Results Isabel correctly identified 7/16 (44%) cases with a median rank of two (interquartile range = 3). ChatGPT-4 correctly identified 14/16 (88%) of cases with a median rank of one (interquartile range = 1). Physicians one and two, respectively, preferred the lists generated by ChatGPT-4 in 12/16 (75%) and 13/16 (81%) of cases and had no preference in 2/16 (13%) of cases. Conclusions ChatGPT-4 had significantly greater diagnostic accuracy within our sample (P < .05) and generated higher-quality differential diagnoses than Isabel, which produced several inappropriate and imprecise differential diagnoses. Clinical relevance Despite large language models' potential utility in generating medical diagnoses, physicians must continue to exercise high caution and use their clinical judgment when making diagnostic decisions.
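The ranking metrics used here (inclusion rate plus median rank with interquartile range) are simple to compute; the sketch below uses invented per-case ranks, with None marking a missed diagnosis.

```python
import numpy as np

ranks = [1, 1, 2, 4, 1, 3, None, 1]     # hypothetical rank of the correct diagnosis
hits = [r for r in ranks if r is not None]

accuracy = len(hits) / len(ranks)
median = np.median(hits)
iqr = np.percentile(hits, 75) - np.percentile(hits, 25)
print(f"accuracy = {accuracy:.0%}, median rank = {median}, IQR = {iqr}")
```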
Affiliation(s)
- Shaishav Datta: Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada; Division of Plastic, Reconstructive & Aesthetic Surgery, Department of Surgery, University of Toronto, Toronto, ON, Canada
- Adam J. Mosa: Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada; Division of Plastic, Reconstructive & Aesthetic Surgery, Department of Surgery, University of Toronto, Toronto, ON, Canada
- Paul A. Binhammer: Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada; Division of Plastic, Reconstructive & Aesthetic Surgery, Department of Surgery, University of Toronto, Toronto, ON, Canada
- Edsel B. Ing: Department of Ophthalmology & Vision Sciences, University of Toronto, Toronto, ON, Canada; Department of Ophthalmology & Visual Sciences, University of Alberta, Edmonton, AB, Canada
46
Kale M, Wankhede N, Pawar R, Ballal S, Kumawat R, Goswami M, Khalid M, Taksande B, Upaganlawar A, Umekar M, Kopalli SR, Koppula S. AI-driven innovations in Alzheimer's disease: Integrating early diagnosis, personalized treatment, and prognostic modelling. Ageing Res Rev 2024; 101:102497. [PMID: 39293530 DOI: 10.1016/j.arr.2024.102497] [Received: 07/02/2024] [Revised: 08/14/2024] [Accepted: 09/04/2024] [Indexed: 09/20/2024]
Abstract
Alzheimer's disease (AD) presents a significant challenge in neurodegenerative research and clinical practice due to its complex etiology and progressive nature. The integration of artificial intelligence (AI) into the diagnosis, treatment, and prognostic modelling of AD holds promising potential to transform the landscape of dementia care. This review explores recent advancements in AI applications across various stages of AD management. In early diagnosis, AI-enhanced neuroimaging techniques, including MRI, PET, and CT scans, enable precise detection of AD biomarkers. Machine learning models analyze these images to identify patterns indicative of early cognitive decline. Additionally, AI algorithms are employed to detect genetic and proteomic biomarkers, facilitating early intervention. Cognitive and behavioral assessments have also benefited from AI, with tools that enhance the accuracy of neuropsychological tests and analyze speech and language patterns for early signs of dementia. Personalized treatment strategies have been revolutionized by AI-driven approaches. In drug discovery, virtual screening and drug repurposing, guided by predictive modelling, accelerate the identification of effective treatments. AI also aids in tailoring therapeutic interventions by predicting individual responses to treatments and monitoring patient progress, allowing for dynamic adjustment of care plans. Prognostic modelling, another critical area, utilizes AI to predict disease progression through longitudinal data analysis and risk prediction models. The integration of multi-modal data, combining clinical, genetic, and imaging information, enhances the accuracy of these predictions. Deep learning techniques are particularly effective in fusing diverse data types to uncover new insights into disease mechanisms and progression. Despite these advancements, challenges remain, including ethical considerations, data privacy, and the need for seamless integration of AI tools into clinical workflows. This review underscores the transformative potential of AI in AD management while highlighting areas for future research and development. By leveraging AI, the healthcare community can improve early diagnosis, personalize treatments, and predict disease outcomes more accurately, ultimately enhancing the quality of life for individuals with AD.
Affiliation(s)
- Mayur Kale: Smt. Kishoritai Bhoyar College of Pharmacy, Kamptee, Nagpur, Maharashtra 441002, India
- Nitu Wankhede: Smt. Kishoritai Bhoyar College of Pharmacy, Kamptee, Nagpur, Maharashtra 441002, India
- Rupali Pawar: Smt. Kishoritai Bhoyar College of Pharmacy, Kamptee, Nagpur, Maharashtra 441002, India
- Suhas Ballal: Department of Chemistry and Biochemistry, School of Sciences, JAIN (Deemed to be University), Bangalore, Karnataka, India
- Rohit Kumawat: Department of Neurology, National Institute of Medical Sciences, NIMS University, Jaipur, Rajasthan, India
- Manish Goswami: Chandigarh Pharmacy College, Chandigarh Group of Colleges, Jhanjeri, Mohali, Punjab 140307, India
- Mohammad Khalid: Department of Pharmacognosy, College of Pharmacy, Prince Sattam Bin Abdulaziz University, Alkharj, Saudi Arabia
- Brijesh Taksande: Smt. Kishoritai Bhoyar College of Pharmacy, Kamptee, Nagpur, Maharashtra 441002, India
- Aman Upaganlawar: SNJB's Shriman Sureshdada Jain College of Pharmacy, Neminagar, Chandwad, Nashik, Maharashtra, India
- Milind Umekar: Smt. Kishoritai Bhoyar College of Pharmacy, Kamptee, Nagpur, Maharashtra 441002, India
- Spandana Rajendra Kopalli: Department of Bioscience and Biotechnology, Sejong University, Gwangjin-gu, Seoul 05006, Republic of Korea
- Sushruta Koppula: College of Biomedical and Health Sciences, Konkuk University, Chungju-si, Chungcheongbuk-do 27478, Republic of Korea
47
Zeng J, Zou X, Li S, Tang Y, Teng S, Li H, Wang C, Wu Y, Zhang L, Zhong Y, Liu J, Liu S. Assessing the Role of the Generative Pretrained Transformer (GPT) in Alzheimer's Disease Management: Comparative Study of Neurologist- and Artificial Intelligence-Generated Responses. J Med Internet Res 2024; 26:e51095. [PMID: 39481104 PMCID: PMC11565080 DOI: 10.2196/51095] [Received: 07/23/2023] [Revised: 10/06/2023] [Accepted: 09/25/2024] [Indexed: 11/02/2024]
Abstract
BACKGROUND Alzheimer's disease (AD) is a progressive neurodegenerative disorder posing challenges to patients, caregivers, and society. Accessible and accurate information is crucial for effective AD management. OBJECTIVE This study aimed to evaluate the accuracy, comprehensibility, clarity, and usefulness of the Generative Pretrained Transformer's (GPT) answers concerning the management and caregiving of patients with AD. METHODS In total, 14 questions related to the prevention, treatment, and care of AD were identified and posed to GPT-3.5 and GPT-4 in Chinese and English, respectively, and 4 respondent neurologists were asked to answer them. We generated 8 sets of responses (112 in total) and randomly coded them in answer sheets. Next, 5 evaluator neurologists and 5 family members of patients were asked to rate the 112 responses using separate 5-point Likert scales. We evaluated the quality of the responses using a set of 8 questions rated on a 5-point Likert scale; within this set, 3 questions each were dedicated to comprehensibility and participant satisfaction. RESULTS As of April 10, 2023, the 5 evaluator neurologists and 5 family members of patients with AD had rated the 112 responses (GPT-3.5: n=28, 25%; GPT-4: n=28, 25%; respondent neurologists: n=56, 50%). Of the top 5 (4.5%) responses rated by the evaluator neurologists, 4 (80%) were GPT (GPT-3.5 + GPT-4) responses and 1 (20%) was a respondent neurologist's response; of the top 5 (4.5%) responses rated by patients' family members, all but the third were GPT responses. Based on the evaluation by neurologists, the neurologist-generated responses achieved a mean score of 3.9 (SD 0.7), while the GPT-generated responses scored significantly higher (mean 4.4, SD 0.6; P<.001). Language and model analyses revealed no significant differences in response quality between the GPT-3.5 and GPT-4 models (GPT-3.5: mean 4.3, SD 0.7; GPT-4: mean 4.4, SD 0.5; P=.51). However, English responses outperformed Chinese responses in terms of comprehensibility (Chinese responses: mean 4.1, SD 0.7; English responses: mean 4.6, SD 0.5; P=.005) and participant satisfaction (Chinese responses: mean 4.2, SD 0.8; English responses: mean 4.5, SD 0.5; P=.04). According to the evaluator neurologists' review, Chinese responses had a mean score of 4.4 (SD 0.6), whereas English responses had a mean score of 4.5 (SD 0.5; P=.002). As for the family members of patients with AD, no significant differences were observed between GPT and neurologists, GPT-3.5 and GPT-4, or Chinese and English responses. CONCLUSIONS GPT can provide patient education materials on AD for patients, their families and caregivers, nurses, and neurologists. This capability can contribute to the effective health care management of patients with AD, leading to enhanced patient outcomes.
Affiliation(s)
- Jiaqi Zeng: West China Medical School, Sichuan University, Chengdu, China
- Xiaoyi Zou: Department of Neurology, West China Hospital, Sichuan University, Chengdu, China; Department of Neurology, Chengdu Shangjin Nanfu Hospital, Chengdu, China
- Shirong Li: Department of Neurology, Guizhou Provincial People's Hospital, Guiyang, China
- Yao Tang: Department of Neurology, Chengdu Shangjin Nanfu Hospital, Chengdu, China
- Sisi Teng: Department of Neurology, Chengdu Shangjin Nanfu Hospital, Chengdu, China
- Huanhuan Li: Mental Health Center, West China Hospital, Sichuan University, Chengdu, China
- Changyu Wang: West China College of Stomatology, Sichuan University, Chengdu, China
- Yuxuan Wu: Department of Medical Informatics, West China Medical School, Chengdu, China
- Luyao Zhang: West China School of Nursing, Sichuan University, Chengdu, China
- Yunheng Zhong: West China School of Nursing, Sichuan University, Chengdu, China
- Jialin Liu: Department of Medical Informatics, West China Medical School, Chengdu, China; Department of Otolaryngology-Head and Neck Surgery, West China Hospital, Sichuan University, Chengdu, China
- Siru Liu: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
48
Ghanta SN, Al’Aref SJ, Lala-Trinidade A, Nadkarni GN, Ganatra S, Dani SS, Mehta JL. Applications of ChatGPT in Heart Failure Prevention, Diagnosis, Management, and Research: A Narrative Review. Diagnostics (Basel) 2024; 14:2393. [PMID: 39518361 PMCID: PMC11544991 DOI: 10.3390/diagnostics14212393] [Received: 09/22/2024] [Revised: 10/22/2024] [Accepted: 10/24/2024] [Indexed: 11/16/2024]
Abstract
Heart failure (HF) is a leading cause of mortality, morbidity, and financial burden worldwide. The emergence of advanced artificial intelligence (AI) technologies, particularly Generative Pre-trained Transformer (GPT) systems, presents new opportunities to enhance HF management. In this review, we identified and examined existing studies on the use of ChatGPT in HF care by searching multiple medical databases (PubMed, Google Scholar, Medline, and Scopus). We assessed the role of ChatGPT in HF prevention, diagnosis, and management, focusing on its influence on clinical decision-making and patient education. However, ChatGPT is limited by constrained training data, inherent biases, and ethical issues that hinder its widespread clinical adoption; we review these limitations and highlight the need for improved training approaches, greater model transparency, and robust regulatory compliance. Additionally, we explore the effectiveness of ChatGPT in managing HF, particularly in reducing hospital readmissions and improving patient outcomes with customized treatment plans, while addressing social determinants of health (SDoH). In this review, we aim to provide healthcare professionals and policymakers with an in-depth understanding of ChatGPT's potential and constraints within the realm of HF care.
Affiliation(s)
- Sai Nikhila Ghanta: Department of Internal Medicine, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- Subhi J. Al’Aref: Division of Cardiology, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- Anuradha Lala-Trinidade: Division of Cardiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Girish N. Nadkarni: Division of Cardiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Sarju Ganatra: Division of Cardiology, Lahey Hospital and Medical Center, Burlington, MA 01805, USA
- Sourbha S. Dani: Division of Cardiology, Lahey Hospital and Medical Center, Burlington, MA 01805, USA
- Jawahar L. Mehta: Division of Cardiology, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
49
Değerli Yİ, Özata Değerli MN. Using ChatGPT as a tool during occupational therapy intervention: A case report in mild cognitive impairment. Assist Technol 2024:1-10. [PMID: 39446069 DOI: 10.1080/10400435.2024.2416495] [Accepted: 08/07/2024] [Indexed: 10/25/2024]
Abstract
This case report examined the impact of a computer-programmed assistive technology, developed using ChatGPT as a design tool, on a client's independence in activities of daily living within an occupational therapy intervention. A 66-year-old female client with mild cognitive impairment consulted an occupational therapist because of difficulties with activities of daily living. The occupational therapist developed two activity-assistance computer programs using ChatGPT as a resource. The client did not interact directly with ChatGPT; instead, the occupational therapist used the technology to design and implement the intervention. The computer-programmed, assistive technology-based occupational therapy intervention lasted eight weeks. The occupational therapist trained the client to use these programs in the clinical setting and at home. As a result of the intervention, the client's performance and independence in daily activities improved. These results emphasize that ChatGPT may help occupational therapists, as a tool, to design simple computer-programmed assistive technology interventions without requiring additional professional input.
Affiliation(s)
- Yusuf İslam Değerli: Kızılcahamam Vocational School of Health Services, Ankara University, Ankara, Turkey
50
Shalong W, Yi Z, Bin Z, Ganglei L, Jinyu Z, Yanwen Z, Zequn Z, Lianwen Y, Feng R. Enhancing self-directed learning with custom GPT AI facilitation among medical students: A randomized controlled trial. Med Teach 2024:1-8. [PMID: 39425996 DOI: 10.1080/0142159x.2024.2413023] [Received: 06/08/2024] [Accepted: 10/02/2024] [Indexed: 10/21/2024]
Abstract
OBJECTIVE This study aims to assess the impact of LearnGuide, a specialized ChatGPT tool designed to support self-directed learning among medical students. MATERIALS AND METHODS In this 14-week randomized controlled trial (ClinicalTrials.gov NCT06276049), 103 medical students were assigned to either an intervention group, which received 12 weeks of problem-based training with LearnGuide support, or a control group, which received identical training without AI assistance. Primary and secondary outcomes, including Self-Directed Learning Scale scores at 6 and 12 weeks, Cornell Critical Thinking Test Level Z scores, and Global Flow Scores, were evaluated with a 14-week follow-up. Mann-Whitney U tests were used for statistical comparisons between the groups. RESULTS At 6 weeks, the intervention group showed a marginally higher median Self-Directed Learning Scale score, which further improved by 12 weeks (4.15 [95% CI, 0.82 to 7.48]; p = 0.01) and was sustained at the 14-week follow-up. Additionally, this group demonstrated notable improvements in the Cornell Critical Thinking Test Score at 12 weeks (7.11 [95% CI, 4.50 to 9.72]; p < 0.001), which persisted into the 14-week follow-up. The group also experienced enhancements in the Global Flow Score from 6 weeks, maintaining superiority over the control group through 12 weeks. CONCLUSIONS LearnGuide significantly enhanced self-directed learning, critical thinking, and flow experiences in medical students, highlighting the crucial role of AI tools in advancing medical education.
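A sketch of the between-group comparison named in the methods is shown below: a Mann-Whitney U test on 12-week Self-Directed Learning Scale scores. The group sizes and score distributions are assumptions, not trial data.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(7)
intervention = rng.normal(72, 8, size=52)   # hypothetical scale scores
control = rng.normal(68, 8, size=51)

stat, p = mannwhitneyu(intervention, control, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.4f}")
```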
Affiliation(s)
- Wang Shalong: Department of General Surgery, The Second Xiangya Hospital of Central South University, Changsha, China
- Zuo Yi: School of Information Technology and Management, Hunan University of Finance and Economics, Changsha, China
- Zou Bin: Department of General Surgery, The Affiliated Changsha Central Hospital, Hengyang Medical School, University of South China, Changsha, China
- Liu Ganglei: Department of General Surgery, The Second Xiangya Hospital of Central South University, Changsha, China
- Zhou Jinyu: Department of General Surgery, The Second Xiangya Hospital of Central South University, Changsha, China
- Zheng Yanwen: Department of General Surgery, The Second Xiangya Hospital of Central South University, Changsha, China
- Zhang Zequn: Department of General Surgery, The Second Xiangya Hospital of Central South University, Changsha, China
- Yuan Lianwen: Department of General Surgery, The Second Xiangya Hospital of Central South University, Changsha, China
- Ren Feng: Department of General Surgery, The Second Xiangya Hospital of Central South University, Changsha, China