1
Zhang J, Wang J, Zhang J, Xia X, Zhou Z, Zhou X, Wu Y. Young Adult Perspectives on Artificial Intelligence-Based Medication Counseling in China: Discrete Choice Experiment. J Med Internet Res 2025;27:e67744. PMID: 40203305; PMCID: PMC12018864; DOI: 10.2196/67744.
Abstract
BACKGROUND As artificial intelligence (AI) permeates society, young people are becoming increasingly accustomed to digital solutions. AI-based medication counseling services may help people take medications more accurately and reduce adverse events, but it is not known which forms of such services young people prefer. OBJECTIVE This study aims to assess young people's preferences for AI-based medication counseling services. METHODS A discrete choice experiment (DCE) was the main analysis method, involving 6 attributes: granularity, linguistic comprehensibility, symptom-specific results, access platforms, content model, and costs. Participants were screened and recruited through web-based registration and investigator visits, and the questionnaire was completed online on the Questionnaire Star platform. The sample consisted of young adults aged 18-44 years. A mixed logit model was used to estimate attribute preference coefficients and to derive willingness to pay (WTP) and relative importance (RI) scores. Subgroup analyses were conducted to check for preference heterogeneity. RESULTS In this analysis, 340 participants were included, generating 8160 DCE observations. Participants exhibited a strong preference for receiving 100% symptom-specific results (β=3.18, 95% CI 2.54-3.81; P<.001), consistent with this attribute's relative importance (RI=36.99%). They also preferred a video content model (β=0.86, 95% CI 0.51-1.22; P<.001) and easy-to-understand language (β=0.81, 95% CI 0.46-1.16; P<.001), and, regarding granularity, refined content was preferred over general information (β=0.51, 95% CI 0.21-0.8; P<.001). Finally, participants showed a notable preference for accessing information through WeChat applets rather than websites (β=0.66, 95% CI 0.27-1.05; P<.001). WTP ranked, from highest to lowest, for symptom-specific results, easy-to-understand language, video content, the WeChat applet platform, and refined medication counseling; WTP for 100% symptom-specific results was the highest (¥24.01, 95% CI 20.16-28.77; US $1=¥7.09). High-income participants exhibited significantly higher WTP for highly accurate results (¥45.32) than low-income participants (¥20.65). Similarly, participants with higher education levels showed greater preferences for easy-to-understand language (¥5.93) and video content (¥12.53). CONCLUSIONS We conducted an in-depth investigation of young people's preferences for AI-based medication counseling services. Providers developing such services should prioritize symptom-specific results, support convenient access platforms, use easy-to-understand language, offer content models with richer digital media interaction, and provide more refined medication counseling.
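In a DCE analyzed with a mixed logit model, the willingness to pay for an attribute level is commonly derived as the negative ratio of that level's coefficient to the cost coefficient. The sketch below illustrates this calculation using the attribute coefficients reported in the abstract; the cost coefficient is a hypothetical placeholder (it is not reported above), chosen so that the top value lands near the reported ¥24.01.

```python
# Minimal sketch: deriving willingness to pay (WTP) from DCE mixed logit
# coefficients as WTP = -(beta_attribute / beta_cost).
# Attribute coefficients are the point estimates quoted in the abstract;
# beta_cost is a HYPOTHETICAL placeholder, not a reported value.

betas = {
    "100% symptom-specific results": 3.18,
    "video content model": 0.86,
    "easy-to-understand language": 0.81,
    "WeChat applet platform": 0.66,
    "refined (vs general) content": 0.51,
}
beta_cost = -0.1324  # assumed negative cost coefficient per yuan

for level, beta in betas.items():
    wtp = -beta / beta_cost  # WTP in yuan
    print(f"{level}: WTP ≈ ¥{wtp:.2f}")
```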
Affiliation(s)
- Jia Zhang: Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China
- Jing Wang: School of Public Health, Peking University, Beijing, China
- JingBo Zhang: Beijing University of Chinese Medicine, Evidence Based Medicine Research Center, Beijing, China
- XiaoQian Xia: Department of Public Health, Environments & Society, London School of Hygiene and Tropical Medicine, London, United Kingdom
- ZiYun Zhou: Xiangya School of Nursing, Central South University, Changsha, China
- XiaoMing Zhou: Department of Research, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China
- YiBo Wu: School of Public Health, Peking University, Beijing, China
2
Ren Y, Luo X, Wang Y, Li H, Zhang H, Li Z, Lai H, Li X, Ge L, Estill J, Zhang L, Yang S, Chen Y, Wen C, Bian Z. Large Language Models in Traditional Chinese Medicine: A Scoping Review. J Evid Based Med 2025;18:e12658. PMID: 39651543; DOI: 10.1111/jebm.12658.
Abstract
BACKGROUND The application of large language models (LLMs) in medicine has received increasing attention, showing significant potential in teaching, research, and clinical practice, especially in knowledge extraction, management, and understanding. However, the use of LLMs in Traditional Chinese Medicine (TCM) has not been thoroughly studied. This study aims to provide a comprehensive overview of the status and challenges of LLM applications in TCM. METHODS A systematic search of five electronic databases and Google Scholar was conducted between November 2022 and April 2024, using the Arksey and O'Malley five-stage framework to identify relevant studies. Data from eligible studies were comprehensively extracted and organized to describe LLM applications in TCM and assess their performance accuracy. RESULTS A total of 29 studies were identified: 24 peer-reviewed articles, 1 review, and 4 preprints. Two core application areas were found: the extraction, management, and understanding of TCM knowledge, and assisted diagnosis and treatment. LLMs developed specifically for TCM achieved 70% accuracy in the TCM Practitioner Exam, while general-purpose Chinese LLMs achieved 60% accuracy. Common international LLMs did not pass the exam. Models like EpidemicCHAT and MedChatZH, trained on customized TCM corpora, outperformed general LLMs in TCM consultation. CONCLUSION Despite their potential, LLMs in TCM face challenges such as data quality and security issues, the specificity and complexity of TCM data, and the nonquantitative nature of TCM diagnosis and treatment. Future efforts should focus on interdisciplinary talent cultivation, enhanced data standardization and protection, and exploring LLM potential in multimodal interaction and intelligent diagnosis and treatment.
Affiliation(s)
- Yaxuan Ren: School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Xufei Luo: Research Unit of Evidence-Based Evaluation and Guidelines, Chinese Academy of Medical Sciences (2021RU017), School of Basic Medical Sciences, Lanzhou University, Lanzhou, China; Key Laboratory of Evidence-Based Medicine of Gansu Province, Lanzhou, China; Institute of Health Data Science, Lanzhou University, Lanzhou, China; WHO Collaborating Center for Guideline Implementation and Knowledge Translation, Lanzhou, China
- Ye Wang: School of Public Health, Lanzhou University, Lanzhou, China
- Haodong Li: School of Public Health, Lanzhou University, Lanzhou, China
- Hairong Zhang: Research Unit of Evidence-Based Evaluation and Guidelines, Chinese Academy of Medical Sciences (2021RU017), School of Basic Medical Sciences, Lanzhou University, Lanzhou, China; Key Laboratory of Evidence-Based Medicine of Gansu Province, Lanzhou, China
- Zeming Li: Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
- Honghao Lai: Department of Health Policy and Health Management, School of Public Health, Lanzhou University, Lanzhou, China; Evidence-Based Social Science Research Center, School of Public Health, Lanzhou University, Lanzhou, China
- Xuanlin Li: College of Basic Medical Sciences, Zhejiang Chinese Medical University, Hangzhou, China; Key Laboratory of Chinese Medicine Rheumatology of Zhejiang Province, Hangzhou, China
- Long Ge: Department of Health Policy and Health Management, School of Public Health, Lanzhou University, Lanzhou, China; Evidence-Based Social Science Research Center, School of Public Health, Lanzhou University, Lanzhou, China
- Janne Estill: Research Unit of Evidence-Based Evaluation and Guidelines, Chinese Academy of Medical Sciences (2021RU017), School of Basic Medical Sciences, Lanzhou University, Lanzhou, China; Institute of Global Health, University of Geneva, Geneva, Switzerland
- Lu Zhang: Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
- Shu Yang: School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Yaolong Chen: Research Unit of Evidence-Based Evaluation and Guidelines, Chinese Academy of Medical Sciences (2021RU017), School of Basic Medical Sciences, Lanzhou University, Lanzhou, China; Key Laboratory of Evidence-Based Medicine of Gansu Province, Lanzhou, China; Institute of Health Data Science, Lanzhou University, Lanzhou, China; WHO Collaborating Center for Guideline Implementation and Knowledge Translation, Lanzhou, China; School of Public Health, Lanzhou University, Lanzhou, China
- Chengping Wen: College of Basic Medical Sciences, Zhejiang Chinese Medical University, Hangzhou, China; Key Laboratory of Chinese Medicine Rheumatology of Zhejiang Province, Hangzhou, China
- Zhaoxiang Bian: School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, China
3
de Donato DCB, Aguilar GJ, Ribeiro LG, Dos Santos LRA, Dos Santos LMAC, Costa WDL, de Oliveira AM. Development and Content Analysis Protocol for Evaluating Artificial Intelligence in Drug-Related Information. J Eval Clin Pract 2025;31:e14276. PMID: 39679434; DOI: 10.1111/jep.14276.
Abstract
INTRODUCTION Artificial intelligence (AI) has significant transformative potential across various sectors, particularly in health care. This study aims to develop a protocol for the content analysis of a method designed to assess AI applications in drug-related information, specifically focusing on contraindications, adverse reactions, and drug interactions. By addressing existing challenges, this preliminary research seeks to enhance the safe and reliable integration of AI into healthcare practices. METHODS A study protocol was developed for the creation of the method, followed by an initial content analysis conducted by an expert panel. The method was established in phases: (1) Analysis of drug-related databases and form development; (2) AI configuration; (3) Expert panel review and initial validation. RESULTS In Phase 1, the Micromedex, UpToDate, and Medscape databases were reviewed to establish terminology and classifications related to contraindications, adverse reactions, and drug interactions, resulting in the development of a questionnaire for the AI. Phase 2 involved configuring the Gemini AI tool to enhance response specificity. In Phase 3, AI responses to 30 questions were validated by an expert panel, yielding a 76.7% agreement rate for appropriateness, while 23.3% were deemed inappropriate, particularly concerning contraindicated drug interactions. CONCLUSION This preliminary study demonstrates the potential for using an AI-powered tool to standardize drug-related information retrieval, particularly for contraindications and adverse reactions. While AI responses were generally appropriate, improvements are needed in identifying contraindicated drug interactions. Further research with larger datasets and broader evaluations is required to enhance AI's reliability in healthcare settings.
Affiliation(s)
- Dantony Castro Barros de Donato: Ribeirão Preto Medical School, University of São Paulo, Ribeirão Preto, São Paulo, Brazil; Intersection, Ribeirão Preto, São Paulo, Brazil
- Guilherme José Aguilar: Intersection, Ribeirão Preto, São Paulo, Brazil; Faculty of Philosophy, Sciences and Letters, University of São Paulo, Ribeirão Preto, São Paulo, Brazil
- Lucas Gaspar Ribeiro: Ribeirão Preto Medical School, University of São Paulo, Ribeirão Preto, São Paulo, Brazil; Intersection, Ribeirão Preto, São Paulo, Brazil
- Luiz Ricardo Albano Dos Santos: Ribeirão Preto Medical School, University of São Paulo, Ribeirão Preto, São Paulo, Brazil; Intersection, Ribeirão Preto, São Paulo, Brazil
- Wilbert Dener Lemos Costa: Ribeirão Preto Medical School, University of São Paulo, Ribeirão Preto, São Paulo, Brazil; Intersection, Ribeirão Preto, São Paulo, Brazil
- Alan Maicon de Oliveira: Intersection, Ribeirão Preto, São Paulo, Brazil; School of Pharmaceutical Sciences of Ribeirão Preto, University of São Paulo, Ribeirão Preto, São Paulo, Brazil
4
Aydin S, Karabacak M, Vlachos V, Margetis K. Navigating the potential and pitfalls of large language models in patient-centered medication guidance and self-decision support. Front Med (Lausanne) 2025;12:1527864. PMID: 39917061; PMCID: PMC11798948; DOI: 10.3389/fmed.2025.1527864.
Abstract
Large Language Models (LLMs) are transforming patient education in medication management by providing accessible information to support healthcare decision-making. Building on our recent scoping review of LLMs in patient education, this perspective examines their specific role in medication guidance. These artificial intelligence (AI)-driven tools can generate comprehensive responses about drug interactions, side effects, and emergency care protocols, potentially enhancing patient autonomy in medication decisions. However, significant challenges exist, including the risk of misinformation and the complexity of providing accurate drug information without access to individual patient data. Safety concerns are particularly acute when patients rely solely on AI-generated advice for self-medication decisions. This perspective analyzes current capabilities, examines critical limitations, and raises questions regarding the possible integration of LLMs in medication guidance. We emphasize the need for regulatory oversight to ensure these tools serve as supplements to, rather than replacements for, professional healthcare guidance.
Affiliation(s)
- Serhat Aydin: School of Medicine, Koç University, Istanbul, Türkiye
- Mert Karabacak: Department of Neurosurgery, Mount Sinai Health System, New York, NY, United States
- Victoria Vlachos: College of Human Ecology, Cornell University, Ithaca, NY, United States
5
Wang YM, Shen HW, Chen TJ, Chiang SC, Lin TG. Performance of ChatGPT-3.5 and ChatGPT-4 in the Taiwan National Pharmacist Licensing Examination: Comparative Evaluation Study. JMIR Med Educ 2025;11:e56850. PMID: 39864950; PMCID: PMC11769692; DOI: 10.2196/56850.
Abstract
Background OpenAI released versions ChatGPT-3.5 and GPT-4 between 2022 and 2023. GPT-3.5 has demonstrated proficiency in various examinations, particularly the United States Medical Licensing Examination. However, GPT-4 has more advanced capabilities. Objective This study aims to examine the efficacy of GPT-3.5 and GPT-4 within the Taiwan National Pharmacist Licensing Examination and to ascertain their utility and potential application in clinical pharmacy and education. Methods The pharmacist examination in Taiwan consists of 2 stages: basic subjects and clinical subjects. In this study, exam questions were manually fed into the GPT-3.5 and GPT-4 models, and their responses were recorded; graphic-based questions were excluded. This study encompassed three steps: (1) determining the answering accuracy of GPT-3.5 and GPT-4, (2) categorizing question types and observing differences in model performance across these categories, and (3) comparing model performance on calculation and situational questions. Microsoft Excel and R software were used for statistical analyses. Results GPT-4 achieved an accuracy rate of 72.9%, overshadowing GPT-3.5, which achieved 59.1% (P<.001). In the basic subjects category, GPT-4 significantly outperformed GPT-3.5 (73.4% vs 53.2%; P<.001). However, in clinical subjects, only minor differences in accuracy were observed. Specifically, GPT-4 outperformed GPT-3.5 in the calculation and situational questions. Conclusions This study demonstrates that GPT-4 outperforms GPT-3.5 in the Taiwan National Pharmacist Licensing Examination, particularly in basic subjects. While GPT-4 shows potential for use in clinical practice and pharmacy education, its limitations warrant caution. Future research should focus on refining prompts, improving model stability, integrating medical databases, and designing questions that better assess student competence and minimize guessing.
Affiliation(s)
- Ying-Mei Wang: Department of Medical Education and Research, Taipei Veterans General Hospital Hsinchu Branch, Hsinchu, Taiwan; Department of Pharmacy, Taipei Veterans General Hospital Hsinchu Branch, Hsinchu, Taiwan; School of Medicine, National Tsing Hua University, Hsinchu, Taiwan; Hsinchu County Pharmacists Association, Hsinchu, Taiwan
- Hung-Wei Shen: Department of Medical Education and Research, Taipei Veterans General Hospital Hsinchu Branch, Hsinchu, Taiwan; Department of Pharmacy, Taipei Veterans General Hospital Hsinchu Branch, Hsinchu, Taiwan; Hsinchu County Pharmacists Association, Hsinchu, Taiwan
- Tzeng-Ji Chen: Department of Family Medicine, Taipei Veterans General Hospital Hsinchu Branch, Hsinchu, Taiwan; Department of Family Medicine, Taipei Veterans General Hospital, Taipei, Taiwan; Department of Post-Baccalaureate Medicine, National Chung Hsing University, Taichung, Taiwan
- Shu-Chiung Chiang: Department of Medical Education and Research, Taipei Veterans General Hospital Hsinchu Branch, Hsinchu, Taiwan; Institute of Hospital and Health Care Administration, School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Ting-Guan Lin: Department of Pharmacy, Taipei Veterans General Hospital Hsinchu Branch, Hsinchu, Taiwan; Hsinchu County Pharmacists Association, Hsinchu, Taiwan
6
Yang H, Hu M, Most A, Hawkins WA, Murray B, Smith SE, Li S, Sikora A. Evaluating accuracy and reproducibility of large language model performance on critical care assessments in pharmacy education. Front Artif Intell 2025;7:1514896. PMID: 39850846; PMCID: PMC11754395; DOI: 10.3389/frai.2024.1514896.
Abstract
Background Large language models (LLMs) have demonstrated impressive performance on medical licensing and diagnosis-related exams. However, comparative evaluations to optimize LLM performance and ability in the domain of comprehensive medication management (CMM) are lacking. The purpose of this evaluation was to test various LLM performance-optimization strategies and to assess performance on critical care pharmacotherapy questions used in the assessment of Doctor of Pharmacy students. Methods In a comparative analysis using 219 multiple-choice pharmacotherapy questions, five LLMs (GPT-3.5, GPT-4, Claude 2, Llama2-7b, and Llama2-13b) were evaluated. Each LLM was queried five times to evaluate the primary outcome of accuracy (i.e., correctness). Secondary outcomes included variance, the impact on performance of prompt engineering techniques (e.g., chain-of-thought, CoT) and of training a customized GPT, and comparison to third-year Doctor of Pharmacy students on knowledge recall vs. knowledge application questions. Accuracy and variance were compared using Student's t test across model settings. Results ChatGPT-4 exhibited the highest accuracy (71.6%), while Llama2-13b had the lowest variance (0.070). All LLMs performed more accurately on knowledge recall than on knowledge application questions (e.g., ChatGPT-4: 87% vs. 67%). When applied to ChatGPT-4, few-shot CoT across five runs improved accuracy (77.4% vs. 71.5%) with no effect on variance. Self-consistency and the custom-trained GPT demonstrated accuracy similar to ChatGPT-4 with few-shot CoT. Overall pharmacy student accuracy was 81%, compared to an optimal overall LLM accuracy of 73%. Comparing question types, six of the LLM configurations demonstrated equivalent or higher accuracy than pharmacy students on knowledge recall questions (e.g., self-consistency vs. students: 93% vs. 84%), but pharmacy students achieved higher accuracy than all LLMs on knowledge application questions (e.g., self-consistency vs. students: 68% vs. 80%). Conclusion ChatGPT-4 was the most accurate LLM on critical care pharmacy questions, and few-shot CoT improved accuracy the most. Average student accuracy was similar to that of the LLMs overall and higher on knowledge application questions. These findings support the need for future assessment of customized training for the type of output needed. Reliance on LLMs is only supported for recall-based questions.
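The design above queries each model five times and compares accuracy and variance between settings; a minimal sketch of that comparison with SciPy is shown below. The per-run accuracy values are hypothetical placeholders, not the study's data.

```python
# Minimal sketch: comparing per-run accuracy between two model settings
# (e.g., base prompting vs. few-shot chain-of-thought) over five repeated runs.
# The accuracy values are HYPOTHETICAL illustrations.
import numpy as np
from scipy import stats

base_runs = np.array([0.71, 0.72, 0.73, 0.70, 0.715])  # 5 runs, base prompting
cot_runs = np.array([0.78, 0.77, 0.76, 0.775, 0.78])   # 5 runs, few-shot CoT

t_stat, p_value = stats.ttest_ind(base_runs, cot_runs)
print(f"mean accuracy: {base_runs.mean():.3f} vs {cot_runs.mean():.3f}")
print(f"variance:      {base_runs.var(ddof=1):.5f} vs {cot_runs.var(ddof=1):.5f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```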
Affiliation(s)
- Huibo Yang: Department of Computer Science, University of Virginia, Charlottesville, VA, United States
- Mengxuan Hu: School of Data Science, University of Virginia, Charlottesville, VA, United States
- Amoreena Most: University of Georgia College of Pharmacy, Augusta, GA, United States
- W. Anthony Hawkins: Department of Clinical and Administrative Pharmacy, University of Georgia College of Pharmacy, Albany, GA, United States
- Brian Murray: University of Colorado Skaggs School of Pharmacy and Pharmaceutical Sciences, Aurora, CO, United States
- Susan E. Smith: Department of Clinical and Administrative Pharmacy, University of Georgia College of Pharmacy, Athens, GA, United States
- Sheng Li: School of Data Science, University of Virginia, Charlottesville, VA, United States
- Andrea Sikora: Department of Clinical and Administrative Pharmacy, University of Georgia College of Pharmacy, Augusta, GA, United States
7
Yassin Y, Nguyen T, Panchal K, Getchell K, Aungst T. Evaluating a generative artificial intelligence accuracy in providing medication instructions from smartphone images. J Am Pharm Assoc (2003) 2025;65:102284. PMID: 39515421; DOI: 10.1016/j.japh.2024.102284.
Abstract
BACKGROUND The Food and Drug Administration mandates patient labeling materials like the Medication Guide (MG) and Instructions for Use (IFU) to support appropriate medication use. However, challenges such as low health literacy and difficulties navigating these materials may lead to incorrect medication usage, resulting in therapy failure or adverse outcomes. The rise of generative AI presents an opportunity to provide scalable, personalized patient education through image recognition and text generation. OBJECTIVE This study aimed to evaluate the accuracy and safety of medication instructions generated by ChatGPT based on user-provided drug images, compared to the manufacturer's standard instructions. METHODS Images of 12 medications requiring multiple steps for administration were uploaded to ChatGPT's image recognition function. ChatGPT's responses were compared to the official IFU and MG using two text-vectorization approaches, Count Vectorization (CountVec) and Term Frequency-Inverse Document Frequency (TF-IDF). The clinical accuracy was further evaluated by independent pharmacists to determine if ChatGPT responses were valid for patient instruction. RESULTS ChatGPT correctly identified all medications and generated patient instructions. CountVec outperformed TF-IDF in text similarity analysis, with an average similarity score of 76%. However, clinical evaluation revealed significant gaps in the instructions, particularly for complex administration routes, where ChatGPT's guidance lacked essential details, leading to lower clinical accuracy scores. CONCLUSION While ChatGPT shows promise in generating patient-friendly medication instructions, its effectiveness varies based on the complexity of the medication. The findings underscore the need for further refinement and clinical oversight to ensure the safety and accuracy of AI-generated medical guidance, particularly for medications with complex administration processes.
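The comparison described in the methods, scoring ChatGPT's instructions against the official IFU/MG text with CountVec and TF-IDF representations, can be sketched with scikit-learn as below; the two example strings are hypothetical placeholders for an official instruction and an AI-generated one.

```python
# Minimal sketch: similarity between an official instruction text and an
# AI-generated one under CountVec and TF-IDF representations (cosine similarity).
# Both strings are HYPOTHETICAL placeholders.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

official_ifu = "Remove the cap, prime the inhaler twice, exhale fully, then inhale the dose deeply."
ai_generated = "Take off the cap, prime twice, breathe out fully, and inhale the dose deeply."

for name, vectorizer in [("CountVec", CountVectorizer()), ("TF-IDF", TfidfVectorizer())]:
    matrix = vectorizer.fit_transform([official_ifu, ai_generated])
    score = cosine_similarity(matrix[0], matrix[1])[0, 0]
    print(f"{name} cosine similarity: {score:.2f}")
```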
8
Ramasubramanian S, Balaji S, Kannan T, Jeyaraman N, Sharma S, Migliorini F, Balasubramaniam S, Jeyaraman M. Comparative evaluation of artificial intelligence systems' accuracy in providing medical drug dosages: A methodological study. World J Methodol 2024;14:92802. PMID: 39712564; PMCID: PMC11287534; DOI: 10.5662/wjm.v14.i4.92802.
Abstract
BACKGROUND Medication errors, especially in dosage calculation, pose risks in healthcare. Artificial intelligence (AI) systems like ChatGPT and Google Bard may help reduce errors, but their accuracy in providing medication information remains to be evaluated. AIM To evaluate the accuracy of AI systems (ChatGPT 3.5, ChatGPT 4, Google Bard) in providing drug dosage information per Harrison's Principles of Internal Medicine. METHODS A set of natural language queries mimicking real-world medical dosage inquiries was presented to the AI systems. Responses were analyzed using a 3-point Likert scale. The analysis, conducted with Python and its libraries, focused on basic statistics, overall system accuracy, and disease-specific and organ system accuracies. RESULTS ChatGPT 4 outperformed the other systems, showing the highest rate of correct responses (83.77%) and the best overall weighted accuracy (0.6775). Disease-specific accuracy varied notably across systems, with some diseases being accurately recognized, while others demonstrated significant discrepancies. Organ system accuracy also showed variable results, underscoring system-specific strengths and weaknesses. CONCLUSION ChatGPT 4 demonstrates superior reliability in medical dosage information, yet variations across diseases emphasize the need for ongoing improvements. These results highlight AI's potential in aiding healthcare professionals, urging continuous development for dependable accuracy in critical medical situations.
Affiliation(s)
- Swaminathan Ramasubramanian: Department of Orthopaedics, Government Medical College, Omandurar Government Estate, Chennai 600002, Tamil Nadu, India
- Sangeetha Balaji: Department of Orthopaedics, Government Medical College, Omandurar Government Estate, Chennai 600002, Tamil Nadu, India
- Tejashri Kannan: Department of Orthopaedics, Government Medical College, Omandurar Government Estate, Chennai 600002, Tamil Nadu, India
- Naveen Jeyaraman: Department of Orthopaedics, ACS Medical College and Hospital, Dr MGR Educational and Research Institute, Chennai 600077, Tamil Nadu, India
- Shilpa Sharma: Department of Paediatric Surgery, All India Institute of Medical Sciences, New Delhi 110029, India
- Filippo Migliorini: Department of Life Sciences, Health, Link Campus University, Rome 00165, Italy; Department of Orthopaedic and Trauma Surgery, Academic Hospital of Bolzano (SABES-ASDAA), Teaching Hospital of the Paracelsus Medical University, Bolzano 39100, Italy
- Suhasini Balasubramaniam: Department of Radio-Diagnosis, Government Stanley Medical College and Hospital, Chennai 600001, Tamil Nadu, India
- Madhan Jeyaraman: Department of Orthopaedics, ACS Medical College and Hospital, Dr MGR Educational and Research Institute, Chennai 600077, Tamil Nadu, India
9
Kusaka S, Akitomo T, Hamada M, Asao Y, Iwamoto Y, Tachikake M, Mitsuhata C, Nomura R. Usefulness of Generative Artificial Intelligence (AI) Tools in Pediatric Dentistry. Diagnostics (Basel) 2024;14:2818. PMID: 39767179; PMCID: PMC11674453; DOI: 10.3390/diagnostics14242818.
Abstract
Background/Objectives: Generative artificial intelligence (AI) such as ChatGPT has developed rapidly in recent years, and in the medical field, its usefulness for diagnostic assistance has been reported. However, there are few reports of AI use in dental fields. Methods: We created 20 questions that we had encountered in clinical pediatric dentistry, and collected the responses to these questions from three types of generative AI. The responses were evaluated on a 5-point scale by six pediatric dental specialists using the Global Quality Scale. Results: The average scores were >3 for the three types of generated AI tools that we tested; the overall average was 3.34. Although the responses for questions related to "consultations from guardians" or "systemic diseases" had high scores (>3.5), the score for questions related to "dental abnormalities" was 2.99, which was the lowest among the four categories. Conclusions: Our results show the usefulness of generative AI tools in clinical pediatric dentistry, indicating that these tools will be useful assistants in the dental field.
Affiliation(s)
- Satoru Kusaka: Department of Pediatric Dentistry, Hiroshima University Hospital, Hiroshima 734-8551, Japan
- Tatsuya Akitomo: Department of Pediatric Dentistry, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima 734-8553, Japan
- Masakazu Hamada: Department of Oral & Maxillofacial Oncology and Surgery, Graduate School of Dentistry, The University of Osaka, Osaka 565-0871, Japan
- Yuria Asao: Department of Pediatric Dentistry, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima 734-8553, Japan
- Yuko Iwamoto: Department of Pediatric Dentistry, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima 734-8553, Japan
- Meiko Tachikake: Department of Pediatric Dentistry, Hiroshima University Hospital, Hiroshima 734-8551, Japan
- Chieko Mitsuhata: Department of Pediatric Dentistry, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima 734-8553, Japan
- Ryota Nomura: Department of Pediatric Dentistry, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima 734-8553, Japan
10
Li X, Guo H, Li D, Zheng Y. Engine of Innovation in Hospital Pharmacy: Applications and Reflections of ChatGPT. J Med Internet Res 2024;26:e51635. PMID: 39365643; PMCID: PMC11489799; DOI: 10.2196/51635.
Abstract
Hospital pharmacy plays an important role in ensuring medical care quality and safety, especially in the area of drug information retrieval, therapy guidance, and drug-drug interaction management. ChatGPT is a powerful artificial intelligence language model that can generate natural-language texts. Here, we explored the applications and reflections of ChatGPT in hospital pharmacy, where it may enhance the quality and efficiency of pharmaceutical care. We also explored ChatGPT's prospects in hospital pharmacy and discussed its working principle, diverse applications, and practical cases in daily operations and scientific research. Meanwhile, the challenges and limitations of ChatGPT, such as data privacy, ethical issues, bias and discrimination, and human oversight, are discussed. ChatGPT is a promising tool for hospital pharmacy, but it requires careful evaluation and validation before it can be integrated into clinical practice. Some suggestions for future research and development of ChatGPT in hospital pharmacy are provided.
Affiliation(s)
- Xingang Li: Department of Pharmacy, Beijing Friendship Hospital, Capital Medical University, Beijing, China
- Heng Guo: Department of Pharmacy, Beijing Friendship Hospital, Capital Medical University, Beijing, China
- Dandan Li: Department of Pharmacy, Beijing Friendship Hospital, Capital Medical University, Beijing, China
- Yingming Zheng: Department of Pharmacy, Beijing Friendship Hospital, Capital Medical University, Beijing, China
11
Grossman S, Zerilli T, Nathan JP. Appropriateness of ChatGPT as a resource for medication-related questions. Br J Clin Pharmacol 2024;90:2691-2695. PMID: 39096130; DOI: 10.1111/bcp.16212.
Abstract
With its increasing popularity, healthcare professionals and patients may use ChatGPT to obtain medication-related information. This study was conducted to assess ChatGPT's ability to provide satisfactory responses (i.e., responses that directly answer the question and are accurate, complete, and relevant) to medication-related questions posed to an academic drug information service. ChatGPT responses were compared to responses generated by the investigators through the use of traditional resources, and references were evaluated. Thirty-nine questions were entered into ChatGPT; the three most common categories were therapeutics (8; 21%), compounding/formulation (6; 15%), and dosage (5; 13%). Ten (26%) questions were answered satisfactorily by ChatGPT. Of the 29 (74%) questions that were not answered satisfactorily, deficiencies included lack of a direct response (11; 38%), lack of accuracy (11; 38%), and/or lack of completeness (12; 41%). References were included with eight (29%) responses, and each of these included fabricated references. Presently, healthcare professionals and consumers should be cautioned against using ChatGPT for medication-related information.
Affiliation(s)
- Sara Grossman: LIU Pharmacy, Arnold & Marie Schwartz College of Pharmacy and Health Sciences, Brooklyn, New York, USA
- Tina Zerilli: LIU Pharmacy, Arnold & Marie Schwartz College of Pharmacy and Health Sciences, Brooklyn, New York, USA
- Joseph P Nathan: LIU Pharmacy, Arnold & Marie Schwartz College of Pharmacy and Health Sciences, Brooklyn, New York, USA
12
MohanaSundaram A, Emran TB. A commentary on 'ChatGPT in medicine: prospects and challenges: a review article' - correspondence. Int J Surg 2024;110:5178-5179. PMID: 38640507; PMCID: PMC11325996; DOI: 10.1097/js9.0000000000001487.
Affiliation(s)
- Talha Bin Emran: Department of Pharmacy, Faculty of Allied Health Sciences, Daffodil International University, Dhaka, Bangladesh
13
Zhu L, Mou W, Hong C, Yang T, Lai Y, Qi C, Lin A, Zhang J, Luo P. The Evaluation of Generative AI Should Include Repetition to Assess Stability. JMIR Mhealth Uhealth 2024;12:e57978. PMID: 38688841; PMCID: PMC11106698; DOI: 10.2196/57978.
Abstract
The increasing interest in the potential applications of generative artificial intelligence (AI) models like ChatGPT in health care has prompted numerous studies to explore its performance in various medical contexts. However, evaluating ChatGPT poses unique challenges due to the inherent randomness in its responses. Unlike traditional AI models, ChatGPT generates different responses for the same input, making it imperative to assess its stability through repetition. This commentary highlights the importance of including repetition in the evaluation of ChatGPT to ensure the reliability of conclusions drawn from its performance. Similar to biological experiments, which often require multiple repetitions for validity, we argue that assessing generative AI models like ChatGPT demands a similar approach. Failure to acknowledge the impact of repetition can lead to biased conclusions and undermine the credibility of research findings. We urge researchers to incorporate appropriate repetition in their studies from the outset and transparently report their methods to enhance the robustness and reproducibility of findings in this rapidly evolving field.
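A repetition protocol of the kind argued for here can be sketched as follows: submit the same prompt several times and report how often the modal answer recurs. The `ask_model` function is a hypothetical stand-in for a real chatbot call.

```python
# Minimal sketch of a repetition protocol: ask the same question N times and
# report answer consistency. `ask_model` is a HYPOTHETICAL stand-in for a real
# chatbot API call and here merely simulates non-deterministic output.
import random
from collections import Counter

def ask_model(prompt: str) -> str:
    return random.choice(["A", "A", "A", "B"])  # placeholder; replace with a real call

N_REPEATS = 10
prompt = "Which option is the first-line therapy: A, B, C, or D?"
answers = [ask_model(prompt) for _ in range(N_REPEATS)]

counts = Counter(answers)
modal_answer, modal_count = counts.most_common(1)[0]
print(f"answers: {answers}")
print(f"modal answer: {modal_answer}, consistency: {modal_count / N_REPEATS:.0%}")
```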
Affiliation(s)
- Lingxuan Zhu: Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Weiming Mou: Department of Urology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Chenglin Hong: Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Tao Yang: Department of Medical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Yancheng Lai: Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Chang Qi: Institute of Logic and Computation, TU Wien, Austria
- Anqi Lin: Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Jian Zhang: Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Peng Luo: Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
14
He W, Zhang W, Jin Y, Zhou Q, Zhang H, Xia Q. Physician Versus Large Language Model Chatbot Responses to Web-Based Questions From Autistic Patients in Chinese: Cross-Sectional Comparative Analysis. J Med Internet Res 2024;26:e54706. PMID: 38687566; PMCID: PMC11094593; DOI: 10.2196/54706.
Abstract
BACKGROUND There is a dearth of feasibility assessments regarding using large language models (LLMs) for responding to inquiries from autistic patients within a Chinese-language context. Despite Chinese being one of the most widely spoken languages globally, the predominant research focus on applying these models in the medical field has been on English-speaking populations. OBJECTIVE This study aims to assess the effectiveness of LLM chatbots, specifically ChatGPT-4 (OpenAI) and ERNIE Bot (version 2.2.3; Baidu, Inc), one of the most advanced LLMs in China, in addressing inquiries from autistic individuals in a Chinese setting. METHODS For this study, we gathered data from DXY-a widely acknowledged, web-based, medical consultation platform in China with a user base of over 100 million individuals. A total of 100 patient consultation samples were rigorously selected from January 2018 to August 2023, amounting to 239 questions extracted from publicly available autism-related documents on the platform. To maintain objectivity, both the original questions and responses were anonymized and randomized. An evaluation team of 3 chief physicians assessed the responses across 4 dimensions: relevance, accuracy, usefulness, and empathy. The team completed 717 evaluations. The team initially identified the best response and then used a Likert scale with 5 response categories to gauge the responses, each representing a distinct level of quality. Finally, we compared the responses collected from different sources. RESULTS Among the 717 evaluations conducted, 46.86% (95% CI 43.21%-50.51%) of assessors displayed varying preferences for responses from physicians, with 34.87% (95% CI 31.38%-38.36%) of assessors favoring ChatGPT and 18.27% (95% CI 15.44%-21.10%) of assessors favoring ERNIE Bot. The average relevance scores for physicians, ChatGPT, and ERNIE Bot were 3.75 (95% CI 3.69-3.82), 3.69 (95% CI 3.63-3.74), and 3.41 (95% CI 3.35-3.46), respectively. Physicians (3.66, 95% CI 3.60-3.73) and ChatGPT (3.73, 95% CI 3.69-3.77) demonstrated higher accuracy ratings compared to ERNIE Bot (3.52, 95% CI 3.47-3.57). In terms of usefulness scores, physicians (3.54, 95% CI 3.47-3.62) received higher ratings than ChatGPT (3.40, 95% CI 3.34-3.47) and ERNIE Bot (3.05, 95% CI 2.99-3.12). Finally, concerning the empathy dimension, ChatGPT (3.64, 95% CI 3.57-3.71) outperformed physicians (3.13, 95% CI 3.04-3.21) and ERNIE Bot (3.11, 95% CI 3.04-3.18). CONCLUSIONS In this cross-sectional study, physicians' responses exhibited superiority in the present Chinese-language context. Nonetheless, LLMs can provide valuable medical guidance to autistic patients and may even surpass physicians in demonstrating empathy. However, it is crucial to acknowledge that further optimization and research are imperative prerequisites before the effective integration of LLMs in clinical settings across diverse linguistic environments can be realized. TRIAL REGISTRATION Chinese Clinical Trial Registry ChiCTR2300074655; https://www.chictr.org.cn/bin/project/edit?pid=199432.
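The preference percentages above are reported with 95% CIs; a normal-approximation (Wald) interval for a proportion, sketched below, reproduces the interval reported for the 46.86% figure when applied to roughly 336 of the 717 evaluations (the count is inferred from the percentage, not stated in the abstract).

```python
# Minimal sketch: Wald (normal-approximation) 95% CI for a proportion.
# 336/717 (~46.86%) is inferred from the abstract's percentage, not stated there.
import math

def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

low, high = wald_ci(336, 717)
print(f"{336 / 717:.2%} (95% CI {low:.2%}-{high:.2%})")  # ~46.86% (43.21%-50.51%)
```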
Affiliation(s)
- Wenjie He: Tianjin University of Traditional Chinese Medicine, Tianjin, China; Dongguan Rehabilitation Experimental School, Dongguan, China
- Wenyan Zhang: Lanzhou University Second Hospital, Lanzhou University, Lanzhou, China
- Ya Jin: Dongguan Songshan Lake Central Hospital, Guangdong Medical University, Dongguan, China
- Qiang Zhou: Dongguan Rehabilitation Experimental School, Dongguan, China
- Huadan Zhang: Dongguan Rehabilitation Experimental School, Dongguan, China
- Qing Xia: Tianjin University of Traditional Chinese Medicine, Tianjin, China
15
Ashraf AR, Mackey TK, Fittler A. Search Engines and Generative Artificial Intelligence Integration: Public Health Risks and Recommendations to Safeguard Consumers Online. JMIR Public Health Surveill 2024;10:e53086. PMID: 38512343; PMCID: PMC10995787; DOI: 10.2196/53086.
Abstract
BACKGROUND The online pharmacy market is growing, with legitimate online pharmacies offering advantages such as convenience and accessibility. However, this increased demand has attracted malicious actors into this space, leading to the proliferation of illegal vendors that use deceptive techniques to rank higher in search results and pose serious public health risks by dispensing substandard or falsified medicines. Search engine providers have started integrating generative artificial intelligence (AI) into search engine interfaces, which could revolutionize search by delivering more personalized results through a user-friendly experience. However, improper integration of these new technologies carries potential risks and could further exacerbate the risks posed by illicit online pharmacies by inadvertently directing users to illegal vendors. OBJECTIVE The role of generative AI integration in reshaping search engine results, particularly related to online pharmacies, has not yet been studied. Our objective was to identify, determine the prevalence of, and characterize illegal online pharmacy recommendations within the AI-generated search results and recommendations. METHODS We conducted a comparative assessment of AI-generated recommendations from Google's Search Generative Experience (SGE) and Microsoft Bing's Chat, focusing on popular and well-known medicines representing multiple therapeutic categories including controlled substances. Websites were individually examined to determine legitimacy, and known illegal vendors were identified by cross-referencing with the National Association of Boards of Pharmacy and LegitScript databases. RESULTS Of the 262 websites recommended in the AI-generated search results, 47.33% (124/262) belonged to active online pharmacies, with 31.29% (82/262) leading to legitimate ones. However, 19.04% (24/126) of Bing Chat's and 13.23% (18/136) of Google SGE's recommendations directed users to illegal vendors, including for controlled substances. The proportion of illegal pharmacies varied by drug and search engine. A significant difference was observed in the distribution of illegal websites between search engines. The prevalence of links leading to illegal online pharmacies selling prescription medications was significantly higher (P=.001) in Bing Chat (21/86, 24%) compared to Google SGE (6/92, 6%). Regarding the suggestions for controlled substances, suggestions generated by Google led to a significantly higher number of rogue sellers (12/44, 27%; P=.02) compared to Bing (3/40, 7%). CONCLUSIONS While the integration of generative AI into search engines offers promising potential, it also poses significant risks. This is the first study to shed light on the vulnerabilities within these platforms while highlighting the potential public health implications associated with their inadvertent promotion of illegal pharmacies. We found a concerning proportion of AI-generated recommendations that led to illegal online pharmacies, which could not only potentially increase their traffic but also further exacerbate existing public health risks. Rigorous oversight and proper safeguards are urgently needed in generative search to mitigate consumer risks, making sure to actively guide users to verified pharmacies and prioritize legitimate sources while excluding illegal vendors from recommendations.
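The difference in illegal-pharmacy link rates between Bing Chat (21/86) and Google SGE (6/92) can be checked with a chi-square test on the 2x2 table taken from the abstract, as sketched below; the resulting p-value is of the same order as the reported P=.001.

```python
# Minimal sketch: chi-square test comparing the share of links to illegal online
# pharmacies for prescription medicines between Bing Chat (21/86) and Google SGE (6/92).
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([
    [21, 86 - 21],  # Bing Chat: illegal vs. other
    [6, 92 - 6],    # Google SGE: illegal vs. other
])
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
```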
Affiliation(s)
- Amir Reza Ashraf: Department of Pharmaceutics, Faculty of Pharmacy, University of Pécs, Pécs, Hungary
- Tim Ken Mackey: Global Health Program, Department of Anthropology, University of California, La Jolla, CA, United States; Global Health Policy and Data Institute, San Diego, CA, United States; S-3 Research, San Diego, CA, United States
- András Fittler: Department of Pharmaceutics, Faculty of Pharmacy, University of Pécs, Pécs, Hungary
16
Sallam M, Barakat M, Sallam M. A Preliminary Checklist (METRICS) to Standardize the Design and Reporting of Studies on Generative Artificial Intelligence-Based Models in Health Care Education and Practice: Development Study Involving a Literature Review. Interact J Med Res 2024;13:e54704. PMID: 38276872; PMCID: PMC10905357; DOI: 10.2196/54704.
Abstract
BACKGROUND Adherence to evidence-based practice is indispensable in health care. Recently, the utility of generative artificial intelligence (AI) models in health care has been evaluated extensively. However, the lack of consensus guidelines on the design and reporting of findings of these studies poses a challenge for the interpretation and synthesis of evidence. OBJECTIVE This study aimed to develop a preliminary checklist to standardize the reporting of generative AI-based studies in health care education and practice. METHODS A literature review was conducted in Scopus, PubMed, and Google Scholar. Published records with "ChatGPT," "Bing," or "Bard" in the title were retrieved. Careful examination of the methodologies employed in the included records was conducted to identify the common pertinent themes and the possible gaps in reporting. A panel discussion was held to establish a unified and thorough checklist for the reporting of AI studies in health care. The finalized checklist was used to evaluate the included records by 2 independent raters. Cohen κ was used as the method to evaluate the interrater reliability. RESULTS The final data set that formed the basis for pertinent theme identification and analysis comprised a total of 34 records. The finalized checklist included 9 pertinent themes collectively referred to as METRICS (Model, Evaluation, Timing, Range/Randomization, Individual factors, Count, and Specificity of prompts and language). Their details are as follows: (1) Model used and its exact settings; (2) Evaluation approach for the generated content; (3) Timing of testing the model; (4) Transparency of the data source; (5) Range of tested topics; (6) Randomization of selecting the queries; (7) Individual factors in selecting the queries and interrater reliability; (8) Count of queries executed to test the model; and (9) Specificity of the prompts and language used. The overall mean METRICS score was 3.0 (SD 0.58). The tested METRICS score was acceptable, with the range of Cohen κ of 0.558 to 0.962 (P<.001 for the 9 tested items). With classification per item, the highest average METRICS score was recorded for the "Model" item, followed by the "Specificity" item, while the lowest scores were recorded for the "Randomization" item (classified as suboptimal) and "Individual factors" item (classified as satisfactory). CONCLUSIONS The METRICS checklist can facilitate the design of studies guiding researchers toward best practices in reporting results. The findings highlight the need for standardized reporting algorithms for generative AI-based studies in health care, considering the variability observed in methodologies and reporting. The proposed METRICS checklist could be a preliminary helpful base to establish a universally accepted approach to standardize the design and reporting of generative AI-based studies in health care, which is a swiftly evolving research topic.
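Interrater agreement of the kind reported above (Cohen κ between two raters applying the checklist) can be computed as sketched below with scikit-learn; the rating vectors are hypothetical.

```python
# Minimal sketch: Cohen's kappa for one checklist item scored by two independent
# raters across a set of records. The rating vectors are HYPOTHETICAL.
from sklearn.metrics import cohen_kappa_score

rater_1 = [4, 3, 3, 2, 4, 4, 3, 2, 1, 3, 4, 4, 2, 3, 3, 4, 2]
rater_2 = [4, 3, 2, 2, 4, 4, 3, 2, 1, 3, 4, 3, 2, 3, 3, 4, 2]

kappa = cohen_kappa_score(rater_1, rater_2)
print(f"Cohen kappa = {kappa:.3f}")
```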
Affiliation(s)
- Malik Sallam: Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, Jordan; Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, Jordan; Department of Translational Medicine, Faculty of Medicine, Lund University, Malmo, Sweden
- Muna Barakat: Department of Clinical Pharmacy and Therapeutics, Faculty of Pharmacy, Applied Science Private University, Amman, Jordan
- Mohammed Sallam: Department of Pharmacy, Mediclinic Parkview Hospital, Mediclinic Middle East, Dubai, United Arab Emirates
17
Bazzari FH, Bazzari AH. Utilizing ChatGPT in Telepharmacy. Cureus 2024;16:e52365. PMID: 38230387; PMCID: PMC10790595; DOI: 10.7759/cureus.52365.
Abstract
BACKGROUND ChatGPT is an artificial intelligence-powered chatbot that has demonstrated capabilities in numerous fields, including medical and healthcare sciences. This study evaluates the potential for ChatGPT application in telepharmacy, the delivery of pharmaceutical care via telecommunications, through assessing its interactions, adherence to instructions, and ability to role-play as a pharmacist while handling a series of life-like scenario questions. METHODS Two versions (ChatGPT 3.5 and 4.0, OpenAI) were assessed using two independent trials each. ChatGPT was instructed to act as a pharmacist and answer patient inquiries, followed by a set of 20 assessment questions. Then, ChatGPT was instructed to stop its act, provide feedback, and list its sources for drug information. The responses to the assessment questions were evaluated in terms of accuracy, precision, and clarity using a 4-point Likert-like scale. RESULTS ChatGPT demonstrated the ability to follow detailed instructions, role-play as a pharmacist, and appropriately handle all questions. ChatGPT was able to understand case details, recognize generic and brand drug names, identify drug side effects, interactions, prescription requirements and precautions, and provide proper point-by-point instructions regarding administration, dosing, storage and disposal. The overall means of pooled scores were 3.425 (0.712) and 3.7 (0.61) for ChatGPT 3.5 and 4.0, respectively. The rank distribution of scores was not significantly different (P>0.05). None of the answers could be considered directly harmful or labeled as entirely or mostly incorrect, and most point deductions were due to other factors such as indecisiveness, adding immaterial information, missing certain considerations, or partial unclarity. The answers were similar in length across trials and appropriately concise. ChatGPT 4.0 showed superior performance, higher consistency, better character adherence, and the ability to report various reliable information sources. However, it only allowed an input of 40 questions every three hours and provided inaccurate feedback regarding the number of assessed patients, compared to 3.5, which allowed unlimited input but was unable to provide feedback. CONCLUSIONS Integrating ChatGPT in telepharmacy holds promising potential; however, a number of drawbacks must be overcome before it can function effectively.
Affiliation(s)
- Amjad H Bazzari: Basic Scientific Sciences, Applied Science Private University, Amman, Jordan
18
Tangadulrat P, Sono S, Tangtrakulwanich B. Using ChatGPT for Clinical Practice and Medical Education: Cross-Sectional Survey of Medical Students' and Physicians' Perceptions. JMIR Med Educ 2023;9:e50658. PMID: 38133908; PMCID: PMC10770783; DOI: 10.2196/50658.
Abstract
BACKGROUND ChatGPT is a well-known large language model-based chatbot. It could be used in many aspects of the medical field. However, some physicians are still unfamiliar with ChatGPT and are concerned about its benefits and risks. OBJECTIVE We aim to evaluate the perception of physicians and medical students toward using ChatGPT in the medical field. METHODS A web-based questionnaire was sent to medical students, interns, residents, and attending staff with questions regarding their perception toward using ChatGPT in clinical practice and medical education. Participants were also asked to rate their perception of ChatGPT's generated response about knee osteoarthritis. RESULTS Participants included 124 medical students, 46 interns, 37 residents, and 32 attending staff. After reading ChatGPT's response, 132 of the 239 (55.2%) participants had a positive rating about using ChatGPT for clinical practice. The proportion of positive answers was significantly lower in graduated physicians (48/115, 42%) compared with medical students (84/124, 68%; P<.001). Participants listed a lack of a patient-specific treatment plan, updated evidence, and a language barrier as ChatGPT's pitfalls. Regarding using ChatGPT for medical education, the proportion of positive responses was also significantly lower in graduated physicians (71/115, 62%) compared to medical students (103/124, 83.1%; P<.001). Participants were concerned that ChatGPT's response was too superficial, might lack scientific evidence, and might need expert verification. CONCLUSIONS Medical students generally had a positive perception of using ChatGPT for guiding treatment and medical education, whereas graduated doctors were more cautious in this regard. Nonetheless, both medical students and graduated doctors positively perceived using ChatGPT for creating patient educational materials.
Affiliation(s)
- Pasin Tangadulrat: Department of Orthopedics, Faculty of Medicine, Prince of Songkla University, Hatyai, Thailand
- Supinya Sono: Division of Family and Preventive Medicine, Faculty of Medicine, Prince of Songkla University, Hatyai, Thailand
19
Zawiah M, Al-Ashwal FY, Gharaibeh L, Abu Farha R, Alzoubi KH, Abu Hammour K, Qasim QA, Abrah F. ChatGPT and Clinical Training: Perception, Concerns, and Practice of Pharm-D Students. J Multidiscip Healthc 2023;16:4099-4110. PMID: 38116306; PMCID: PMC10729768; DOI: 10.2147/jmdh.s439223.
Abstract
Background The emergence of Chat-Generative Pre-trained Transformer (ChatGPT) by OpenAI has revolutionized AI technology, demonstrating significant potential in healthcare and pharmaceutical education, yet its real-world applicability in clinical training warrants further investigation. Methods A cross-sectional study was conducted between April and May 2023 to assess PharmD students' perceptions, concerns, and experiences regarding the integration of ChatGPT into clinical pharmacy education. The study utilized a convenience sampling method through online platforms and involved a questionnaire with sections on demographics, perceived benefits, concerns, and experience with ChatGPT. Statistical analysis was performed using SPSS, including descriptive and inferential analyses. Results The findings of the study involving 211 PharmD students revealed that the majority of participants were male (77.3%) and had prior experience with artificial intelligence (68.2%). Over two-thirds were aware of ChatGPT. Most students (n=139, 65.9%) perceived potential benefits in using ChatGPT for various clinical tasks, with concerns including over-reliance, accuracy, and ethical considerations. Adoption of ChatGPT in clinical training varied: some students did not use it at all, while others utilized it for tasks like evaluating drug-drug interactions and developing care plans. Previous users tended to have higher perceived benefits and lower concerns, but the differences were not statistically significant. Conclusion Utilizing ChatGPT in clinical training offers opportunities, but students' lack of trust in it for clinical decisions highlights the need for collaborative human-ChatGPT decision-making. It should complement healthcare professionals' expertise and be used strategically to compensate for human limitations. Further research is essential to optimize ChatGPT's effective integration.
Affiliation(s)
- Mohammed Zawiah: Department of Clinical Pharmacy, College of Pharmacy, Northern Border University, Rafha, 91911, Saudi Arabia; Department of Pharmacy Practice, College of Clinical Pharmacy, Hodeidah University, Al Hodeidah, Yemen
- Fahmi Y Al-Ashwal: Department of Clinical Pharmacy, College of Pharmacy, Al-Ayen University, Thi-Qar, Iraq
- Lobna Gharaibeh: Pharmacological and Diagnostic Research Center, Faculty of Pharmacy, Al-Ahliyya Amman University, Amman, Jordan
- Rana Abu Farha: Clinical Pharmacy and Therapeutics Department, Faculty of Pharmacy, Applied Science Private University, Amman, Jordan
- Karem H Alzoubi: Department of Pharmacy Practice and Pharmacotherapeutics, University of Sharjah, Sharjah, 27272, United Arab Emirates; Department of Clinical Pharmacy, Faculty of Pharmacy, Jordan University of Science and Technology, Irbid, 22110, Jordan
- Khawla Abu Hammour: Department of Clinical Pharmacy and Biopharmaceutics, Faculty of Pharmacy, University of Jordan, Amman, Jordan
- Qutaiba A Qasim: Department of Clinical Pharmacy, College of Pharmacy, Al-Ayen University, Thi-Qar, Iraq
- Fahd Abrah: Discipline of Social and Administrative Pharmacy, School of Pharmaceutical Sciences, Universiti Sains Malaysia, Penang, Malaysia