1
Langdon C, Haag O, Vigliano M, Levorato M, Leon-Ulate J, Adroher M. Transforming pediatric ENT documentation: Efficiency, accuracy, and adoption of speech recognition technology (Speaknosis). Int J Pediatr Otorhinolaryngol 2025;191:112275. PMID: 39987845. DOI: 10.1016/j.ijporl.2025.112275.
Abstract
INTRODUCTION Efficient and accurate medical documentation ensures patient safety, continuity of care, and clinician satisfaction. Speech recognition technology has emerged as a promising alternative to traditional documentation methods, potentially reducing administrative burden and improving workflow efficiency. However, concerns about accuracy, consistency, and clinical adoption remain significant barriers to its integration into medical practice. OBJECTIVE This study evaluates the impact of AI-powered speech recognition technology (Speaknosis) on medical documentation in pediatric ENT settings, focusing on its efficiency, accuracy, and acceptance among clinicians. The research also explores the tool's potential to enhance clinical data interpretation and decision-making. METHODS A quasi-experimental design was employed in which ten pediatric ENT physicians participated in 375 AI interactions. Speaknosis-generated documentation was assessed for semantic relevance (BERTScore), quality (PDQI-9), and clinician satisfaction using a 5-point Likert scale. Human interventions were analyzed for error correction and alignment with professional standards. Statistical analysis of quantitative data and thematic evaluation of qualitative feedback were conducted. RESULTS The AI system achieved a high average BERTScore (96.50%), though notable instances of inaccuracy required human intervention, including omission of clinical findings, redundant content, and formatting issues. The PDQI-9 mean score was 38.34, indicating overall high-quality documentation, with strengths in organization (mean = 5.0) and internal consistency (mean = 4.83). However, comprehensiveness (mean = 3.99) and timeliness (mean = 4.00) exhibited variability. Clinician satisfaction averaged 4.64, with higher satisfaction correlated with interactions of superior documentation quality and longer duration. CONCLUSION Speaknosis has the potential to improve documentation efficiency and accuracy and alleviate clinician burden. However, challenges in addressing error variability and comprehensiveness highlight the need for ongoing algorithm refinement and human oversight. This study emphasizes the transformative role of AI in healthcare documentation, contingent on robust validation and strategic implementation.
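The BERTScore metric used here scores semantic similarity between a candidate text and a reference via contextual embeddings. A minimal sketch with the open-source bert-score package is below; the note texts are hypothetical stand-ins, since the study's documentation pairs are not published.

```python
# Sketch: BERTScore comparison between an AI-generated note and a
# clinician-edited reference, using the open-source `bert-score` package.
# The example texts are hypothetical placeholders.
from bert_score import score

candidates = ["Patient presents with bilateral otitis media; amoxicillin started."]
references = ["Bilateral acute otitis media noted; treatment with amoxicillin begun."]

# P, R, F1 are tensors with one value per candidate/reference pair.
P, R, F1 = score(candidates, references, lang="en", verbose=False)
print(f"BERTScore F1: {F1.mean().item():.4f}")  # near-paraphrases score high
```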
Affiliation(s)
- Cristóbal Langdon
- Department of Pediatric Otorhinolaryngology, Hospital Sant Joan de Deu, Barcelona, Spain
- Oliver Haag
- Department of Pediatric Otorhinolaryngology, Hospital Sant Joan de Deu, Barcelona, Spain
- Melisa Vigliano
- Department of Pediatric Otorhinolaryngology, Hospital Sant Joan de Deu, Barcelona, Spain
- Maurizio Levorato
- Department of Pediatric Otorhinolaryngology, Hospital Sant Joan de Deu, Barcelona, Spain
- Johan Leon-Ulate
- Department of Pediatric Otorhinolaryngology, Hospital Sant Joan de Deu, Barcelona, Spain
- Marti Adroher
- Department of Pediatric Otorhinolaryngology, Hospital Sant Joan de Deu, Barcelona, Spain
2
Šuto Pavičić J, Marušić A, Buljan I. Using ChatGPT to Improve the Presentation of Plain Language Summaries of Cochrane Systematic Reviews About Oncology Interventions: Cross-Sectional Study. JMIR Cancer 2025;11:e63347. PMID: 40106236. PMCID: PMC11939027. DOI: 10.2196/63347.
Abstract
Background Plain language summaries (PLSs) of Cochrane systematic reviews are a simple format for presenting medical information to the lay public. This is particularly important in oncology, where patients have a more active role in decision-making. However, current PLS formats often exceed the readability requirements for the general population, and there is still a lack of cost-effective, more automated solutions to this problem. Objective This study assessed whether a large language model (eg, ChatGPT) can improve the readability and linguistic characteristics of Cochrane PLSs about oncology interventions without changing evidence synthesis conclusions. Methods The dataset included 275 scientific abstracts and corresponding PLSs of Cochrane systematic reviews about oncology interventions. ChatGPT-4 was tasked with turning each scientific abstract into a PLS using 3 prompts, as follows: (1) rewrite this scientific abstract into a PLS to achieve a Simple Measure of Gobbledygook (SMOG) index of 6, (2) rewrite the PLS from prompt 1 so it is more emotional, and (3) rewrite this scientific abstract so it is easier to read and more appropriate for the lay audience. ChatGPT-generated PLSs were analyzed for word count, level of readability (SMOG index), and linguistic characteristics using Linguistic Inquiry and Word Count (LIWC) software and compared with the original PLSs. Two independent assessors reviewed the conclusiveness categories of ChatGPT-generated PLSs and compared them with the original abstracts to evaluate consistency. The conclusion of each abstract about the efficacy and safety of the intervention was categorized as conclusive (positive/negative/equal), inconclusive, or unclear. Group comparisons were conducted using the Friedman nonparametric test. Results ChatGPT-generated PLSs using the first prompt (SMOG index 6) were the shortest and easiest to read, with a median SMOG score of 8.2 (95% CI 8-8.4), compared with the original PLSs (median SMOG score 13.1, 95% CI 12.9-13.4). These PLSs had a median word count of 240 (95% CI 232-248), compared with the original PLSs' median word count of 364 (95% CI 339-388). The second prompt (emotional tone) generated PLSs with a median SMOG score of 11.4 (95% CI 11.1-12), again lower than the original PLSs. PLSs produced with the third prompt (write simpler and easier) had a median SMOG score of 8.7 (95% CI 8.4-8.8). ChatGPT-generated PLSs across all prompts demonstrated reduced analytical tone and increased authenticity, clout, and emotional tone compared with the original PLSs. Importantly, the conclusiveness categorization of the original abstracts was unchanged in the ChatGPT-generated PLSs. Conclusions ChatGPT can be a valuable tool for simplifying PLSs and similar medical formats for lay audiences. More research is needed, including oversight mechanisms to ensure that the information is accurate, reliable, and culturally relevant for different audiences.
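The two quantitative tools named here, the SMOG readability index and the Friedman test for paired group comparisons, can both be reproduced with standard Python packages. A minimal sketch follows, using textstat and SciPy; the sample text and scores are hypothetical, not the study's data.

```python
# Sketch: scoring readability with the SMOG index (textstat) and comparing
# paired prompt variants with the Friedman test (SciPy), as the study
# describes. All values below are made-up placeholders.
import textstat
from scipy.stats import friedmanchisquare

text = "Chemotherapy may shrink the tumour. It can also cause tiredness. Ask your doctor."
print("SMOG index:", textstat.smog_index(text))

# Paired SMOG scores for the same 5 reviews under the three prompts (hypothetical).
prompt1 = [8.1, 8.4, 7.9, 8.6, 8.0]
prompt2 = [11.2, 11.8, 10.9, 12.0, 11.4]
prompt3 = [8.8, 8.5, 9.0, 8.6, 8.7]
stat, p = friedmanchisquare(prompt1, prompt2, prompt3)
print(f"Friedman chi-square={stat:.2f}, p={p:.4f}")
```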
Affiliation(s)
- Jelena Šuto Pavičić
- Department of Oncology and Radiotherapy, University Hospital of Split, Spinciceva 1, Split, 21000, Croatia
- Ana Marušić
- Department of Research in Biomedicine in Health, Centre for Evidence-based Medicine, University of Split School of Medicine, Split, Croatia
- Ivan Buljan
- Department of Psychology, Faculty of Humanities and Social Sciences, University of Split, Split, Croatia
3
Miller M, DiCiurcio WT, Meade M, Buchan L, Gleimer J, Woods B, Kepler C. Appropriateness and Consistency of an Online Artificial Intelligence System's Response to Common Questions Regarding Cervical Fusion. Clin Spine Surg 2025:01933606-990000000-00435. PMID: 39928039. DOI: 10.1097/bsd.0000000000001768.
Abstract
STUDY DESIGN Prospective survey study. OBJECTIVE To address the gap concerning ChatGPT's ability to respond to various types of questions regarding cervical surgery. SUMMARY OF BACKGROUND DATA Artificial intelligence (AI) and machine learning are reshaping the landscape of scientific research. Chat Generative Pre-trained Transformer (ChatGPT), an online AI language model, has emerged as a powerful tool in clinical medicine and surgery. Previous studies have demonstrated appropriate and reliable responses from ChatGPT concerning patient questions regarding total joint arthroplasty, distal radius fractures, and lumbar laminectomy. However, a gap exists in examining how accurate and reliable ChatGPT's responses are to common questions related to cervical surgery. MATERIALS AND METHODS Twenty questions regarding cervical surgery were presented to the online ChatGPT-3.5 web application 3 separate times, creating 60 responses. Responses were then analyzed by 3 fellowship-trained spine surgeons across 2 institutions using a modified Global Quality Scale (1-5 rating) to evaluate accuracy and utility. Descriptive statistics were reported based on responses, and intraclass correlation coefficients were then calculated to assess the consistency of response quality. RESULTS Across all questions proposed to the AI platform, the average score was 3.17 (95% CI, 2.92-3.42), with 66.7% of responses rated at least "moderate" quality by 1 reviewer. Nine (45%) questions yielded responses graded at least "moderate" quality by all 3 reviewers. Test-retest reliability was poor, with an intraclass correlation coefficient (ICC) of 0.0941 (-0.222, 0.135). CONCLUSION This study demonstrated that ChatGPT can answer common patient questions concerning cervical surgery with moderate quality in the majority of responses. Further research within AI is necessary to improve response quality and consistency.
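The test-retest reliability reported here is an intraclass correlation coefficient over repeated ratings of the same questions. A minimal sketch with the pingouin package is below; the ratings are hypothetical, and the paper does not specify which ICC model it used.

```python
# Sketch: estimating test-retest consistency of response quality with an
# intraclass correlation coefficient via the `pingouin` package. Ratings are
# hypothetical; the ICC model choice is an assumption, not the paper's.
import pandas as pd
import pingouin as pg

df = pd.DataFrame({
    "question": [1, 1, 1, 2, 2, 2, 3, 3, 3],   # question identifier
    "trial":    [1, 2, 3, 1, 2, 3, 1, 2, 3],   # repetition treated as "rater"
    "score":    [3, 4, 2, 5, 3, 4, 2, 2, 3],   # modified Global Quality Scale
})
icc = pg.intraclass_corr(data=df, targets="question", raters="trial",
                         ratings="score")
print(icc[["Type", "ICC", "CI95%"]])  # reports ICC1, ICC2, ICC3 variants
```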
4
Cross CG, Hofmann HL, Cangut B, Garcia-Reyes K, Bishay VL, Vairavamurthy J. When Does Chat-GPT Refer Someone to an Interventional Radiologist? J Vasc Interv Radiol 2025;36:355-357.e23. PMID: 39428058. DOI: 10.1016/j.jvir.2024.10.009.
Affiliation(s)
- Chloe G Cross
- Department of Diagnostic, Molecular and Interventional Radiology, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1234, New York, NY 10029
- Hayden L Hofmann
- Keck School of Medicine, University of Southern California, Los Angeles, CA
- Busra Cangut
- Department of Diagnostic, Molecular and Interventional Radiology, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1234, New York, NY 10029
- Kirema Garcia-Reyes
- Department of Diagnostic, Molecular and Interventional Radiology, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1234, New York, NY 10029
- Vivian L Bishay
- Department of Diagnostic, Molecular and Interventional Radiology, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1234, New York, NY 10029
- Jenanan Vairavamurthy
- Department of Diagnostic, Molecular and Interventional Radiology, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1234, New York, NY 10029
5
Dearani JA, Mavroudis C. The Emerging Influence of Artificial Intelligence on Traditional Medical Textbooks. Ann Thorac Surg 2025:S0003-4975(25)00085-2. PMID: 39892844. DOI: 10.1016/j.athoracsur.2025.01.018.
Abstract
This review provides a comprehensive exploration of how artificial intelligence (AI) is reshaping medical education and the role of traditional textbooks. The historical context underscores the evolution of medical knowledge, bridging past advances with current AI-driven innovations and highlighting the indispensable role of both printed and electronic medical textbooks. The strengths and limitations of traditional and digital textbooks are considered. The potential for AI to enhance medical education through real-time updates, personalized learning, and advanced visual aids is particularly compelling. This perspective is critical for practitioners and educators who are navigating the integration of AI in their fields.
Affiliation(s)
- Joseph A Dearani
- Department of Cardiovascular Surgery, Mayo Clinic, Rochester, Minnesota
- Constantine Mavroudis
- Department of Surgery, Johns Hopkins University School of Medicine, Baltimore, Maryland; Pediatric Cardiothoracic Surgery, Peyton Manning Children's Hospital, Indianapolis, Indiana
6
Jaber SA, Hasan HE, Alzoubi KH, Khabour OF. Knowledge, attitude, and perceptions of MENA researchers towards the use of ChatGPT in research: A cross-sectional study. Heliyon 2025;11:e41331. PMID: 39811375. PMCID: PMC11731567. DOI: 10.1016/j.heliyon.2024.e41331.
Abstract
Background Artificial intelligence (AI) technologies are increasingly recognized for their potential to revolutionize research practices. However, there is a gap in understanding the perspectives of MENA researchers on ChatGPT. This study explores their knowledge, attitudes, and perceptions of ChatGPT utilization in research. Methods A cross-sectional survey was conducted among 369 MENA researchers. Participants provided demographic information and responded to questions about their knowledge of AI, their experience with ChatGPT, their attitudes toward the technology, and their perceptions of the potential roles and benefits of ChatGPT in research. Results The results indicate a moderate level of knowledge about ChatGPT, with a total score of 58.3 ± 19.6. Attitudes towards its use were generally positive (total score 68.1 ± 8.1), with researchers expressing enthusiasm for integrating ChatGPT into their research workflow. About 56% of the sample reported using ChatGPT for various applications. In addition, 27.6% expressed their intention to use it in their research, while 17.3% had already started using it in their research. However, perceptions varied, with concerns about accuracy, bias, and ethical implications highlighted. The results showed significant differences in knowledge scores based on gender (p < 0.001), working country (p < 0.05), and work field (p < 0.01). Regarding attitude scores, there were significant differences based on highest qualification and employment field (p < 0.05). These findings underscore the need for targeted training programs and ethical guidelines to support the effective use of ChatGPT in research. Conclusion MENA researchers demonstrate significant awareness of and interest in integrating ChatGPT into their research workflow. Addressing concerns about reliability and ethical implications is essential for advancing scientific innovation in the MENA region.
Affiliation(s)
- Sana'a A. Jaber
- Department of Clinical Pharmacy, Faculty of Pharmacy, Jordan University of Science and Technology, Irbid, 22110, Jordan
- Hisham E. Hasan
- Department of Clinical Pharmacy, Faculty of Pharmacy, Jordan University of Science and Technology, Irbid, 22110, Jordan
- Karem H. Alzoubi
- Department of Clinical Pharmacy, Faculty of Pharmacy, Jordan University of Science and Technology, Irbid, 22110, Jordan
- Omar F. Khabour
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, Jordan University of Science and Technology, Irbid, 22110, Jordan
7
Hirschmann MT, Herbst E, Milano G, Musahl V. Embracing the opportunities of 2025: Shaping the future of KSSTA. Knee Surg Sports Traumatol Arthrosc 2025;33:7-12. PMID: 39786336. DOI: 10.1002/ksa.12573.
Affiliation(s)
- Michael T Hirschmann
- Department of Orthopedic Surgery and Traumatology, Kantonsspital Baselland, Bruderholz, Switzerland
- Department of Clinical Research, Research Group Michael T. Hirschmann, Regenerative Medicine & Biomechanics, University of Basel, Basel, Switzerland
- Elmar Herbst
- Department of Trauma, Hand and Reconstructive Surgery, University of Muenster, Muenster, Germany
- Giuseppe Milano
- Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, University of Brescia, Brescia, Italy
- Department of Bone and Joint Surgery, Spedali Civili, Brescia, Italy
- Volker Musahl
- Blue Cross of Western Pennsylvania Professor and Chief Sports Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
8
Królikowska A, Urban N, Lech M, Reichert P, Ramadanov N, Kayaalp ME, Prill R. Mapping the reporting practices in recent randomised controlled trials published in Knee Surgery, Sports Traumatology, Arthroscopy: A scoping review of methodological quality. J Exp Orthop 2025;12:e70117. PMID: 39776837. PMCID: PMC11705533. DOI: 10.1002/jeo2.70117.
Abstract
The official medical journals of scientific societies advocate for high-quality standards. It is important to assess whether randomized controlled trials (RCTs) in influential journals, such as the hybrid journal of the European Society of Sports Traumatology, Knee Surgery and Arthroscopy (ESSKA), adhere to reporting guidelines and best practices. Therefore, the present scoping review aimed to explore and map the reporting practices and methodological quality of recent RCTs published in the Knee Surgery, Sports Traumatology, Arthroscopy (KSSTA) journal, focusing on identifying gaps in adherence to reporting guidelines and transparency. The study was preregistered and followed the PRISMA-ScR checklist. RCTs published in KSSTA between 2022 and 2023 were included. The search was conducted via PubMed. A two-stage selection process was employed, with two independent reviewers conducting study selection and data extraction. Data collected included study characteristics, intervention details, sample size calculation reporting, data transparency, and adherence to Consolidated Standards of Reporting Trials (CONSORT) guidelines. Critical appraisal was conducted using the JBI tool for RCTs. All included RCTs (n = 25) reported a predetermined minimum sample size. Study protocol preregistration was reported in 52% of the RCTs, while only 24% provided data availability statements. Most RCTs offering data availability indicated data would be shared upon request. Adherence to CONSORT guidelines was reported in 96% of studies, with only one RCT not adhering to recognized reporting standards. All the included studies adequately addressed statistical conclusion validity. However, internal validity was less consistently addressed across the studies. Conclusions While most recently published RCTs in KSSTA adhered to CONSORT guidelines, there is potential for improvement in the reporting of protocol preregistration and data availability statements. Although all studies reported sample size calculations, transparency in data sharing remains limited. Level of Evidence Level I.
Affiliation(s)
- Aleksandra Królikowska
- Physiotherapy Research Laboratory, University Centre of Physiotherapy and Rehabilitation, Faculty of Physiotherapy, Wroclaw Medical University, Wroclaw, Poland
- Evidence-Based Healthcare in Wroclaw: A JBI Affiliated Group, The University of Adelaide, Adelaide, South Australia, Australia
- Natalia Urban
- Physiotherapy Research Laboratory, University Centre of Physiotherapy and Rehabilitation, Faculty of Physiotherapy, Wroclaw Medical University, Wroclaw, Poland
- Marcin Lech
- Clinical Department of Orthopedics, Traumatology and Hand Surgery, Jan Mikulicz-Radecki University Hospital, Wroclaw, Poland
- Paweł Reichert
- Clinical Department of Orthopedics, Traumatology and Hand Surgery, Jan Mikulicz-Radecki University Hospital, Wroclaw, Poland
- Department of Orthopedics, Traumatology and Hand Surgery, Faculty of Medicine, Wroclaw Medical University, Wroclaw, Poland
- Nikolai Ramadanov
- Center of Orthopaedics and Traumatology, University Hospital Brandenburg/Havel, Brandenburg Medical School Theodor Fontane, Brandenburg an der Havel, Germany
- Faculty of Health Sciences Brandenburg, Brandenburg Medical School Theodor Fontane, Brandenburg an der Havel, Germany
- Mahmut Enes Kayaalp
- Clinic of Orthopedics and Traumatology, Istanbul Kartal Dr. Lutfi Kirdar Training and Research Hospital, Istanbul, Turkey
- Robert Prill
- Center of Orthopaedics and Traumatology, University Hospital Brandenburg/Havel, Brandenburg Medical School Theodor Fontane, Brandenburg an der Havel, Germany
- Faculty of Health Sciences Brandenburg, Brandenburg Medical School Theodor Fontane, Brandenburg an der Havel, Germany
9
Leng L. Challenge, integration, and change: ChatGPT and future anatomical education. Med Educ Online 2024;29:2304973. PMID: 38217884. PMCID: PMC10791098. DOI: 10.1080/10872981.2024.2304973.
Abstract
With the vigorous development of ChatGPT and its application in the field of education, a new era of human-artificial intelligence collaboration in education has arrived. Integrating artificial intelligence (AI) into medical education has the potential to revolutionize it. Large language models, such as ChatGPT, can be used as virtual teaching aids to provide students with individualized and immediate medical knowledge and to support interactive simulated learning and assessment. In this paper, we discuss the application of ChatGPT in anatomy teaching and its various levels of application based on our own teaching experience, and we weigh the advantages and disadvantages of ChatGPT in anatomy teaching. ChatGPT increases student engagement and strengthens students' ability to learn independently. At the same time, ChatGPT faces many challenges and limitations in medical education. Medical educators must keep pace with the rapid changes in technology, taking into account ChatGPT's impact on curriculum design, assessment strategies, and teaching methods. Discussing the application of ChatGPT in medical education, especially anatomy teaching, supports the effective integration and application of artificial intelligence tools in medical education.
Affiliation(s)
- Lige Leng
- Fujian Provincial Key Laboratory of Neurodegenerative Disease and Aging Research, Institute of Neuroscience, School of Medicine, Xiamen University, Xiamen, Fujian, P.R. China
10
Heisinger S, Salzmann SN, Senker W, Aspalter S, Oberndorfer J, Matzner MP, Stienen MN, Motov S, Huber D, Grohs JG. ChatGPT's Performance in Spinal Metastasis Cases-Can We Discuss Our Complex Cases with ChatGPT? J Clin Med 2024;13:7864. PMID: 39768787. PMCID: PMC11727723. DOI: 10.3390/jcm13247864.
Abstract
Background: The integration of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT-4, is transforming healthcare. ChatGPT's potential to assist in decision-making for complex cases, such as spinal metastasis treatment, is promising but largely untested. Especially in cancer patients who develop spinal metastases, precise and personalized treatment is essential. This study examines ChatGPT-4's performance in treatment planning for spinal metastasis cases compared to experienced spine surgeons. Materials and Methods: Five spine metastasis cases were randomly selected from the recent literature. Five spine surgeons and ChatGPT-4 were then tasked with providing treatment recommendations for each case in a standardized manner. Responses were analyzed for frequency distribution, agreement, and subjective rater opinions. Results: ChatGPT's treatment recommendations aligned with the majority of human raters in 73% of treatment choices, with moderate to substantial agreement on systemic therapy, pain management, and supportive care. However, ChatGPT tended towards generalized statements, a tendency the raters noted. Agreement among raters improved in sensitivity analyses excluding ChatGPT, particularly for controversial areas like surgical intervention and palliative care. Conclusions: ChatGPT shows potential in aligning with experienced surgeons on certain treatment aspects of spinal metastasis. However, its generalized approach highlights limitations, suggesting that training with specific clinical guidelines could enhance its utility in complex case management. Further studies are necessary to refine AI applications in personalized healthcare decision-making.
Affiliation(s)
- Stephan Heisinger
- Department of Orthopedics and Trauma Surgery, Medical University of Vienna, 1090 Vienna, Austria
- Stephan N. Salzmann
- Department of Orthopedics and Trauma Surgery, Medical University of Vienna, 1090 Vienna, Austria
- Wolfgang Senker
- Department of Neurosurgery, Kepler University Hospital, 4020 Linz, Austria
- Stefan Aspalter
- Department of Neurosurgery, Kepler University Hospital, 4020 Linz, Austria
- Johannes Oberndorfer
- Department of Neurosurgery, Kepler University Hospital, 4020 Linz, Austria
- Michael P. Matzner
- Department of Orthopedics and Trauma Surgery, Medical University of Vienna, 1090 Vienna, Austria
- Martin N. Stienen
- Spine Center of Eastern Switzerland & Department of Neurosurgery, Kantonsspital St. Gallen, Medical School of St. Gallen, University of St. Gallen, 9000 St. Gallen, Switzerland
- Stefan Motov
- Spine Center of Eastern Switzerland & Department of Neurosurgery, Kantonsspital St. Gallen, Medical School of St. Gallen, University of St. Gallen, 9000 St. Gallen, Switzerland
- Dominikus Huber
- Division of Oncology, Department of Medicine I, Medical University of Vienna, 1090 Vienna, Austria
- Josef Georg Grohs
- Department of Orthopedics and Trauma Surgery, Medical University of Vienna, 1090 Vienna, Austria
11
Soulage CO, Van Coppenolle F, Guebre-Egziabher F. The conversational AI "ChatGPT" outperforms medical students on a physiology university examination. Adv Physiol Educ 2024;48:677-684. PMID: 38991037. DOI: 10.1152/advan.00181.2023.
Abstract
Artificial intelligence (AI) has gained massive interest with the public release of the conversational AI ChatGPT, but it has also become a matter of concern for academia, as it can easily be misused. We performed a quantitative evaluation of the performance of ChatGPT on a medical physiology university examination. Forty-one answers were obtained with ChatGPT and compared to the results of 24 students. The results of ChatGPT were significantly better than those of the students; the median (IQR) score was 75% (66-84%) for the AI compared to 56% (43-65%) for the students (P < 0.001). The exam success rate was 100% for ChatGPT, whereas 29% (n = 7) of students failed. ChatGPT could promote plagiarism and intellectual laziness among students and could represent a new and easy way to cheat, especially when evaluations are performed online. Considering that these powerful AI tools are now freely available, scholars should take great care to construct assessments that truly evaluate student reflection skills and prevent AI-assisted cheating. NEW & NOTEWORTHY The release of the conversational artificial intelligence (AI) ChatGPT has become a matter of concern for academia, as it can easily be misused by students for cheating purposes. We performed a quantitative evaluation of the performance of ChatGPT on a medical physiology university examination and observed that ChatGPT outperforms medical students, obtaining significantly better grades. Scholars should therefore take great care to construct assessments crafted to truly evaluate student reflection skills and prevent AI-assisted cheating.
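The abstract reports medians and IQRs for two independent groups but does not name its statistical test; a Mann-Whitney U test is one plausible nonparametric choice for such a comparison. The sketch below uses hypothetical scores, not the study's data.

```python
# Sketch: a Mann-Whitney U comparison of two independent score samples, one
# plausible test for the median/IQR comparison the abstract reports (the paper
# does not name its test). All scores below are hypothetical.
import numpy as np
from scipy.stats import mannwhitneyu

chatgpt_scores = np.array([75, 66, 84, 78, 70, 81, 69, 88])   # percent correct
student_scores = np.array([56, 43, 65, 51, 60, 38, 58, 49])

u, p = mannwhitneyu(chatgpt_scores, student_scores, alternative="two-sided")
print(f"U={u}, p={p:.4f}")
print("medians:", np.median(chatgpt_scores), np.median(student_scores))
```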
Affiliation(s)
- Christophe O Soulage
- CarMeN, INSERM U1060, INRAe U1397, Université Claude Bernard Lyon 1, Bron, France
- Fitsum Guebre-Egziabher
- CarMeN, INSERM U1060, INRAe U1397, Université Claude Bernard Lyon 1, Bron, France
- Department of Nephrology, Groupement Hospitalier Centre, Hospices Civils de Lyon, Hôpital E. Herriot, Lyon, France
12
Suárez A, Jiménez J, Llorente de Pedro M, Andreu-Vázquez C, Díaz-Flores García V, Gómez Sánchez M, Freire Y. Beyond the Scalpel: Assessing ChatGPT's potential as an auxiliary intelligent virtual assistant in oral surgery. Comput Struct Biotechnol J 2024;24:46-52. PMID: 38162955. PMCID: PMC10755495. DOI: 10.1016/j.csbj.2023.11.058.
Abstract
AI has revolutionized the way we interact with technology. Noteworthy advances in AI algorithms and large language models (LLMs) have led to the development of natural generative language (NGL) systems such as ChatGPT. Although these LLMs can simulate human conversations and generate content in real time, they face challenges related to the topicality and accuracy of the information they generate. This study aimed to assess whether ChatGPT-4 could provide accurate and reliable answers to general dentists in the field of oral surgery, and thus to explore its potential as an intelligent virtual assistant in clinical decision-making in oral surgery. Thirty questions related to oral surgery were posed to ChatGPT-4, each question repeated 30 times, yielding a total of 900 responses. Two surgeons graded the answers according to the guidelines of the Spanish Society of Oral Surgery, using a three-point Likert scale (correct, partially correct/incomplete, and incorrect). Disagreements were arbitrated by an experienced oral surgeon, who provided the final grade. Accuracy was found to be 71.7%, and the consistency of the experts' grading across iterations ranged from moderate to almost perfect. ChatGPT-4, with its potential capabilities, will inevitably be integrated into dental disciplines, including oral surgery. In the future, it could be considered an auxiliary intelligent virtual assistant, though it would never replace oral surgery experts. Proper training and expert-verified information will remain vital to the implementation of the technology. More comprehensive research is needed to ensure the safe and successful application of AI in oral surgery.
Affiliation(s)
- Ana Suárez
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Jaime Jiménez
- Department of Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- María Llorente de Pedro
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Cristina Andreu-Vázquez
- Department of Veterinary Medicine, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Víctor Díaz-Flores García
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Margarita Gómez Sánchez
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Yolanda Freire
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
13
Hasan B, Saadi S, Rajjoub NS, Hegazi M, Al-Kordi M, Fleti F, Farah M, Riaz IB, Banerjee I, Wang Z, Murad MH. Integrating large language models in systematic reviews: a framework and case study using ROBINS-I for risk of bias assessment. BMJ Evid Based Med 2024;29:394-398. PMID: 38383136. DOI: 10.1136/bmjebm-2023-112597.
Abstract
Large language models (LLMs) may facilitate and expedite systematic reviews, although the approach to integrating LLMs in the review process is unclear. This study evaluates GPT-4 agreement with human reviewers in assessing the risk of bias using the Risk Of Bias In Non-randomised Studies of Interventions (ROBINS-I) tool and proposes a framework for integrating LLMs into systematic reviews. In the case study, raw per cent agreement was highest for the ROBINS-I domain of 'Classification of Intervention'. The Kendall agreement coefficient was highest for the domains of 'Participant Selection', 'Missing Data' and 'Measurement of Outcomes', suggesting moderate agreement in these domains. Raw agreement about the overall risk of bias across domains was 61% (Kendall coefficient=0.35). The proposed framework for integrating LLMs into systematic reviews consists of four domains: rationale for LLM use, protocol (task definition, model selection, prompt engineering, data entry methods, human role and success metrics), execution (iterative revisions to the protocol) and reporting. We identify five basic task types relevant to systematic reviews: selection, extraction, judgement, analysis and narration. Considering the agreement level with a human reviewer in the case study, pairing artificial intelligence with an independent human reviewer remains necessary.
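The agreement statistics reported here (raw per cent agreement and a Kendall coefficient) are straightforward to compute. A minimal sketch follows; the judgement labels are hypothetical, and the paper's "Kendall agreement coefficient" may refer to a variant other than SciPy's tau-b.

```python
# Sketch: raw percent agreement, Cohen's kappa, and Kendall's tau between
# GPT-4 and a human reviewer on ordinal risk-of-bias judgements
# (1=low, 2=moderate, 3=serious). Labels are hypothetical; the paper's
# exact agreement statistic may differ from tau-b.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from scipy.stats import kendalltau

gpt4  = np.array([1, 2, 2, 3, 1, 2, 3, 2, 1, 2])
human = np.array([1, 2, 3, 3, 1, 1, 3, 2, 2, 2])

print("raw agreement:", np.mean(gpt4 == human))
print("Cohen's kappa:", cohen_kappa_score(gpt4, human))
tau, p = kendalltau(gpt4, human)
print(f"Kendall tau={tau:.2f}, p={p:.3f}")
```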
Affiliation(s)
- Bashar Hasan
- Kern Center for the Science of Healthcare Delivery, Mayo Clinic, Rochester, Minnesota, USA
- Public Health, Infectious Diseases and Occupational Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Samer Saadi
- Kern Center for the Science of Healthcare Delivery, Mayo Clinic, Rochester, Minnesota, USA
- Public Health, Infectious Diseases and Occupational Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Noora S Rajjoub
- Kern Center for the Science of Healthcare Delivery, Mayo Clinic, Rochester, Minnesota, USA
- Moustafa Hegazi
- Kern Center for the Science of Healthcare Delivery, Mayo Clinic, Rochester, Minnesota, USA
- Public Health, Infectious Diseases and Occupational Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Mohammad Al-Kordi
- Kern Center for the Science of Healthcare Delivery, Mayo Clinic, Rochester, Minnesota, USA
- Public Health, Infectious Diseases and Occupational Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Farah Fleti
- Kern Center for the Science of Healthcare Delivery, Mayo Clinic, Rochester, Minnesota, USA
- Public Health, Infectious Diseases and Occupational Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Magdoleen Farah
- Kern Center for the Science of Healthcare Delivery, Mayo Clinic, Rochester, Minnesota, USA
- Public Health, Infectious Diseases and Occupational Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Irbaz B Riaz
- Division of Hematology-Oncology, Department of Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Imon Banerjee
- Department of Radiology, Mayo Clinic Arizona, Scottsdale, Arizona, USA
- School of Computing and Augmented Intelligence, Arizona State University, Tempe, Arizona, USA
- Zhen Wang
- Kern Center for the Science of Healthcare Delivery, Mayo Clinic, Rochester, Minnesota, USA
- Health Care Policy and Research, Mayo Clinic Minnesota, Rochester, Minnesota, USA
- Mohammad Hassan Murad
- Kern Center for the Science of Healthcare Delivery, Mayo Clinic, Rochester, Minnesota, USA
- Public Health, Infectious Diseases and Occupational Medicine, Mayo Clinic, Rochester, Minnesota, USA
14
Ehrett C, Hegde S, Andre K, Liu D, Wilson T. Leveraging Open-Source Large Language Models for Data Augmentation in Hospital Staff Surveys: Mixed Methods Study. JMIR Med Educ 2024;10:e51433. PMID: 39560937. PMCID: PMC11590755. DOI: 10.2196/51433.
Abstract
Background Generative large language models (LLMs) have the potential to revolutionize medical education by generating tailored learning materials, enhancing teaching efficiency, and improving learner engagement. However, the application of LLMs in health care settings, particularly for augmenting small datasets in text classification tasks, remains underexplored, especially for cost- and privacy-conscious applications that do not permit the use of third-party services such as OpenAI's ChatGPT. Objective This study aims to explore the use of open-source LLMs, such as Large Language Model Meta AI (LLaMA) and Alpaca models, for data augmentation in a specific text classification task related to hospital staff surveys. Methods The surveys were designed to elicit narratives of everyday adaptation by frontline radiology staff during the initial phase of the COVID-19 pandemic. A 2-step process of data augmentation and text classification was conducted. The study generated synthetic data similar to the survey reports using 4 generative LLMs for data augmentation. A different set of 3 classifier LLMs was then used to classify the augmented text into thematic categories, and performance on the classification task was evaluated. Results The overall best-performing combination of LLM, temperature, classifier, and number of synthetic data cases was augmentation with LLaMA 7B at temperature 0.7 with 100 augments, using the Robustly Optimized BERT Pretraining Approach (RoBERTa) for the classification task, achieving an average area under the receiver operating characteristic curve (AUC) of 0.87 (SD 0.02). The results demonstrate that open-source LLMs can enhance text classifiers' performance for small datasets in health care contexts, providing promising pathways for improving medical education processes and patient care practices. Conclusions The study demonstrates the value of data augmentation with open-source LLMs, highlights the importance of privacy and ethical considerations when using LLMs, and suggests future directions for research in this field.
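The evaluation step described here, scoring a RoBERTa-based theme classifier by ROC AUC, can be sketched with the Hugging Face transformers library and scikit-learn. The sketch below deliberately skips fine-tuning (the classification head is untrained), so its AUC is meaningless until the model is trained on the augmented data; the texts and label coding are hypothetical.

```python
# Sketch: scoring a binary theme classifier built on RoBERTa with ROC AUC.
# Fine-tuning is omitted here, so the head is randomly initialized; this only
# illustrates the inference-and-scoring plumbing. Texts/labels are hypothetical.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from sklearn.metrics import roc_auc_score

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base",
                                                           num_labels=2)
texts = ["We rotated staff to cover the CT scanner.",
         "Supply shortages forced us to reuse shielding."]
labels = [1, 0]  # 1 = "adaptation" theme present (hypothetical coding)

enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    probs = model(**enc).logits.softmax(dim=-1)[:, 1]  # P(theme present)
print("AUC:", roc_auc_score(labels, probs.numpy()))
```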
Affiliation(s)
- Carl Ehrett
- Watt Family Innovation Center, Clemson University, Clemson, SC, United States
- Sudeep Hegde
- Department of Industrial Engineering, Clemson University, Clemson, SC, United States
- Kwame Andre
- Department of Computer Science, Clemson University, Clemson, SC, United States
- Dixizi Liu
- Department of Industrial Engineering, Clemson University, Clemson, SC, United States
- Timothy Wilson
- Department of Industrial Engineering, Clemson University, Clemson, SC, United States
15
Hsu TW, Tseng PT, Tsai SJ, Ko CH, Thompson T, Hsu CW, Yang FC, Tsai CK, Tu YK, Yang SN, Liang CS, Su KP. Quality and correctness of AI-generated versus human-written abstracts in psychiatric research papers. Psychiatry Res 2024;341:116145. PMID: 39213714. DOI: 10.1016/j.psychres.2024.116145.
Abstract
This study aimed to assess the ability of an artificial intelligence (AI)-based chatbot to generate abstracts from academic psychiatric articles. We provided 30 full-text psychiatric papers to ChatPDF (based on ChatGPT) and prompted it to generate a structured or unstructured abstract in a similar style. We further used 10 papers from Psychiatry Research as active comparators (unstructured format). We compared the quality of the ChatPDF-generated abstracts with the original human-written abstracts and examined the similarity, plagiarism, detected AI content, and correctness of the AI-generated abstracts. Five experts evaluated the quality of the abstracts using a blinded approach. They also tried to identify the abstracts written by the original authors and validated the conclusions produced by ChatPDF. We found that similarity and plagiarism were relatively low (only 14.07% and 8.34%, respectively). The detected AI content was 31.48% for generated structured abstracts, 75.58% for generated unstructured abstracts, and 66.48% for active-comparator abstracts. For quality, generated structured abstracts were rated similarly to the originals, but unstructured ones received significantly lower scores. Experts identified the original abstracts with 40% accuracy for structured abstracts, 73% for unstructured ones, and 77% for active comparators. However, 30% of AI-generated abstract conclusions were incorrect. In conclusion, the data-organization capabilities of AI language models hold significant potential for summarizing information in clinical psychiatry. However, using ChatPDF to summarize psychiatric papers requires caution concerning accuracy.
Affiliation(s)
- Tien-Wei Hsu
- Department of Psychiatry, E-DA Dachang Hospital, I-Shou University, Kaohsiung, Taiwan; Department of Psychiatry, E-DA Hospital, I-Shou University, Kaohsiung, Taiwan
- Ping-Tao Tseng
- Institute of Biomedical Sciences, National Sun Yat-sen University, Kaohsiung, Taiwan; Department of Psychology, College of Medical and Health Science, Asia University, Taichung, Taiwan; Prospect Clinic for Otorhinolaryngology & Neurology, Kaohsiung, Taiwan; Institute of Precision Medicine, National Sun Yat-sen University, Kaohsiung, Taiwan
- Shih-Jen Tsai
- Department of Psychiatry, Taipei Veterans General Hospital, Taipei, Taiwan
- Chih-Hung Ko
- Department of Psychiatry, Faculty of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan; Department of Psychiatry, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan; Department of Psychiatry, Kaohsiung Municipal Siaogang Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Trevor Thompson
- Centre for Chronic Illness and Ageing, University of Greenwich, London, UK
- Chih-Wei Hsu
- Department of Psychiatry, Kaohsiung Chang Gung Memorial Hospital and Chang Gung University College of Medicine, Kaohsiung, Taiwan
- Fu-Chi Yang
- Department of Neurology, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
- Chia-Kuang Tsai
- Department of Neurology, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
- Yu-Kang Tu
- Institute of Epidemiology & Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan; Department of Dentistry, National Taiwan University Hospital, Taipei, Taiwan
- Szu-Nian Yang
- Department of Psychiatry, Beitou Branch, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan; Department of Psychiatry, Armed Forces Taoyuan General Hospital, Taoyuan, Taiwan; Graduate Institute of Health and Welfare Policy, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Chih-Sung Liang
- Department of Psychiatry, Beitou Branch, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan; Department of Psychiatry, National Defense Medical Center, Taipei, Taiwan
- Kuan-Pin Su
- College of Medicine, China Medical University, Taichung, Taiwan; Mind-Body Interface Laboratory (MBI-Lab), China Medical University and Hospital, Taichung 404, Taiwan; An-Nan Hospital, China Medical University, Tainan 709, Taiwan
16
Hunter RB, Thammasitboon S, Rahman SS, Fainberg N, Renuart A, Kumar S, Jain PN, Rissmiller B, Sur M, Mehta S. Using ChatGPT to Provide Patient-Specific Answers to Parental Questions in the PICU. Pediatrics 2024;154:e2024066615. PMID: 39370900. DOI: 10.1542/peds.2024-066615.
Abstract
OBJECTIVES To determine whether ChatGPT can incorporate patient-specific information to provide high-quality answers to parental questions in the PICU. We hypothesized that ChatGPT would generate high-quality, patient-specific responses. METHODS In this cross-sectional study, we generated assessments and plans for 3 PICU patients with respiratory failure, septic shock, and status epilepticus and paired them with 8 typical parental questions. We prompted ChatGPT with instructions, an assessment and plan, and 1 question. Six PICU physicians evaluated the responses for accuracy (1-6), completeness (yes/no), empathy (1-6), and understandability (Patient Education Materials Assessment Tool [PEMAT], 0% to 100%; Flesch-Kincaid grade level). We compared answer quality among scenarios and question types using the Kruskal-Wallis and Fisher's exact tests. We used percent agreement, Cohen's kappa, and Gwet's agreement coefficient to estimate inter-rater reliability. RESULTS All answers incorporated patient details, utilizing them for reasoning in 59% of sentences. Responses had high accuracy (median 5.0 [interquartile range (IQR), 4.0-6.0]), empathy (median 5.0 [IQR, 5.0-6.0]), completeness (97% of all questions), and understandability (PEMAT % median 100 [IQR, 87.5-100]; Flesch-Kincaid level 8.7). Only 4 of 144 reviewer scores were <4/6 in accuracy, and no response was deemed likely to cause harm. There was no difference in accuracy, completeness, empathy, or understandability among scenarios or question types. We found fair, substantial, and almost perfect agreement among reviewers for accuracy, empathy, and understandability, respectively. CONCLUSIONS ChatGPT used patient-specific information to provide high-quality answers to parental questions in PICU clinical scenarios.
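The two tests named here, Kruskal-Wallis for ratings across scenarios and Fisher's exact test for yes/no completeness counts, are both available in SciPy. A minimal sketch follows; all ratings and counts are hypothetical placeholders, not the study's data.

```python
# Sketch: comparing accuracy ratings across the three clinical scenarios with
# the Kruskal-Wallis test, and completeness (yes/no) between two question
# types with Fisher's exact test, as the study describes. Hypothetical data.
from scipy.stats import kruskal, fisher_exact

resp_failure = [5, 6, 5, 4, 6, 5]   # accuracy ratings on the 1-6 scale
septic_shock = [5, 5, 6, 5, 4, 6]
status_epil  = [4, 5, 5, 6, 5, 5]
h, p = kruskal(resp_failure, septic_shock, status_epil)
print(f"Kruskal-Wallis H={h:.2f}, p={p:.3f}")

# 2x2 table: rows = question type, columns = (complete, incomplete).
odds, p2 = fisher_exact([[23, 1], [22, 2]])
print(f"Fisher's exact: OR={odds:.2f}, p={p2:.3f}")
```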
Affiliation(s)
- R Brandon Hunter
- Texas Children's Hospital, Houston, Texas
- Baylor College of Medicine, Houston, Texas
- Satid Thammasitboon
- Texas Children's Hospital, Houston, Texas
- Baylor College of Medicine, Houston, Texas
- Andrew Renuart
- Boston Children's Hospital, Boston, Massachusetts
- Harvard Medical School, Boston, Massachusetts
- Shelley Kumar
- Texas Children's Hospital, Houston, Texas
- Baylor College of Medicine, Houston, Texas
- Parag N Jain
- Texas Children's Hospital, Houston, Texas
- Baylor College of Medicine, Houston, Texas
- Brian Rissmiller
- Texas Children's Hospital, Houston, Texas
- Baylor College of Medicine, Houston, Texas
- Moushumi Sur
- Texas Children's Hospital, Houston, Texas
- Baylor College of Medicine, Houston, Texas
- Sanjiv Mehta
- The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
- University of Pennsylvania, Philadelphia, Pennsylvania
17
Özbek EA, Ertan MB, Kından P, Karaca MO, Gürsoy S, Chahla J. ChatGPT Can Offer At Least Satisfactory Responses to Common Patient Questions Regarding Hip Arthroscopy. Arthroscopy 2024:S0749-8063(24)00640-6. PMID: 39242057. DOI: 10.1016/j.arthro.2024.08.036.
Abstract
PURPOSE To assess the accuracy of answers provided by ChatGPT 4.0 (an advanced language model developed by OpenAI) to 25 common patient questions about hip arthroscopy. METHODS ChatGPT 4.0 was presented with 25 common patient questions regarding hip arthroscopy, with no follow-up questions or repetition. Each response was evaluated independently by 2 board-certified orthopaedic sports medicine surgeons. Responses were rated with scores of 1, 2, 3, and 4 corresponding to "excellent response not requiring clarification," "satisfactory requiring minimal clarification," "satisfactory requiring moderate clarification," and "unsatisfactory requiring substantial clarification," respectively. RESULTS Twenty responses were rated "excellent" and 2 responses were rated "satisfactory requiring minimal clarification" by both reviewers. Responses to the questions "What kind of anesthesia is used for hip arthroscopy?" and "What is the average age for hip arthroscopy?" were rated as "satisfactory requiring minimal clarification" by both reviewers. None of the responses were rated as "satisfactory requiring moderate clarification" or "unsatisfactory" by either reviewer. CONCLUSIONS ChatGPT 4.0 provides at least satisfactory responses to patient questions regarding hip arthroscopy. Under the supervision of an orthopaedic sports medicine surgeon, it could be used as a supplementary tool for patient education. CLINICAL RELEVANCE This study compared ChatGPT's answers to patients' questions regarding hip arthroscopy with the current literature. As ChatGPT has gained popularity among patients, the study aimed to determine whether the responses patients receive from this chatbot are consistent with the up-to-date literature.
Affiliation(s)
- Emre Anıl Özbek
- Department of Orthopaedics and Traumatology, Ankara University, Ankara, Turkey
- Mehmet Batu Ertan
- Orthopedics and Traumatology Department, Medicana International Ankara Hospital, Ankara, Turkey
- Peri Kından
- Department of Orthopaedics and Traumatology, Ankara University, Ankara, Turkey
- Mustafa Onur Karaca
- Department of Orthopaedics and Traumatology, Ankara University, Ankara, Turkey
- Safa Gürsoy
- Department of Orthopaedics and Traumatology, Acibadem Mehmet Ali Aydinlar University, Istanbul, Turkey
- Jorge Chahla
- Department of Orthopaedic Surgery, Rush University Medical Center, Chicago, Illinois, USA
18
Cavalcante-Silva V, D'Almeida V, Tufik S, Andersen ML. Artificial Intelligence, the Production of Scientific Texts, and the Implications for Sleep Science: Exploring Emerging Paradigms and Perspectives. Sleep Sci 2024;17:e322-e324. PMID: 39268338. PMCID: PMC11390161. DOI: 10.1055/s-0044-1788285.
Abstract
The emergence of artificial intelligence (AI) has revolutionized many fields, including natural language processing, and marks a potential paradigm shift in the way we evaluate knowledge. One significant innovation in this area is ChatGPT, a large language model based on the GPT-3.5 architecture created by OpenAI, one of whose main aims is to aid general text writing, including scientific texts. Here, we highlight the challenges and opportunities related to using generative AI and discuss both the benefits of its use, such as saving time by streamlining the writing process and reducing the amount of time spent on mundane tasks, and the potential drawbacks, including concerns regarding the accuracy and reliability of the information generated and its ethical use. In both education and the writing of scientific texts, clear rules, objectives, and institutional principles must be established for the use of AI. We also consider the positive and negative effects of the use of AI technologies on interpersonal interactions and behavior and, as sleep scientists, its potential impacts on sleep. Striking a balance between the benefits and potential drawbacks of integrating AI into society demands ongoing research by experts, wide dissemination of the scientific results, and continued public discourse on the subject.
Affiliation(s)
- Vanessa Cavalcante-Silva
- Departamento de Psicobiologia, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo, SP, Brazil
- Vânia D'Almeida
- Departamento de Psicobiologia, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo, SP, Brazil
- Sergio Tufik
- Departamento de Psicobiologia, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo, SP, Brazil
- Instituto do Sono, Associação Fundo Incentivo à Pesquisa (AFIP), São Paulo, SP, Brazil
- Monica L Andersen
- Departamento de Psicobiologia, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo, SP, Brazil
- Instituto do Sono, Associação Fundo Incentivo à Pesquisa (AFIP), São Paulo, SP, Brazil
19
Nietsch KS, Shrestha N, Mazudie Ndjonko LC, Ahmed W, Mejia MR, Zaidat B, Ren R, Duey AH, Li SQ, Kim JS, Hidden KA, Cho SK. Can Large Language Models (LLMs) Predict the Appropriate Treatment of Acute Hip Fractures in Older Adults? Comparing Appropriate Use Criteria With Recommendations From ChatGPT. J Am Acad Orthop Surg Glob Res Rev 2024;8:01979360-202408000-00007. PMID: 39137403. PMCID: PMC11319315. DOI: 10.5435/jaaosglobal-d-24-00206.
Abstract
BACKGROUND Acute hip fractures are a public health problem affecting primarily older adults. Chat Generative Pretrained Transformer (ChatGPT) may be useful in providing appropriate clinical recommendations for beneficial treatment. OBJECTIVE To evaluate the accuracy of ChatGPT-4.0 by comparing its appropriateness scores for acute hip fractures with the American Academy of Orthopaedic Surgeons (AAOS) Appropriate Use Criteria across 30 patient scenarios. "Appropriateness" indicates that the expected health benefits of treatment exceed the expected negative consequences by a wide margin. METHODS Using the AAOS Appropriate Use Criteria as the benchmark, numerical scores from 1 to 9 assessed appropriateness. For each patient scenario, ChatGPT-4.0 was asked to assign an appropriateness score for six treatments used to manage acute hip fractures. RESULTS Thirty patient scenarios were evaluated, yielding 180 paired scores. Comparing ChatGPT-4.0 with AAOS scores, there was a positive correlation for multiple cannulated screw fixation, total hip arthroplasty, hemiarthroplasty, and long cephalomedullary nails. Statistically significant differences were observed only between scores for long cephalomedullary nails. CONCLUSION ChatGPT-4.0 scores were not concordant with AAOS scores, overestimating the appropriateness of total hip arthroplasty, hemiarthroplasty, and long cephalomedullary nails and underestimating the other three treatments. ChatGPT-4.0 was inadequate in selecting treatments deemed acceptable, most reasonable, and most likely to improve patient outcomes.
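The abstract reports positive correlations between paired 1-9 appropriateness scores but does not name the correlation statistic; Spearman's rank correlation is one plausible choice for ordinal scores. The sketch below uses hypothetical scores for a single treatment across scenarios.

```python
# Sketch: correlating ChatGPT-4.0 appropriateness scores with AAOS Appropriate
# Use Criteria scores for one treatment across patient scenarios. The abstract
# does not name its correlation statistic; Spearman's rank correlation is an
# assumption here. All scores below are hypothetical.
from scipy.stats import spearmanr

aaos_scores    = [7, 8, 6, 9, 5, 7, 8, 6, 4, 7]
chatgpt_scores = [8, 8, 7, 9, 6, 8, 9, 7, 5, 8]

rho, p = spearmanr(aaos_scores, chatgpt_scores)
print(f"Spearman rho={rho:.2f}, p={p:.4f}")
```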
Affiliation(s)
- Katrina S. Nietsch, Wasil Ahmed, Mateo Restrepo Mejia, Bashar Zaidat, Renee Ren, Akiro H. Duey: Icahn School of Medicine at Mount Sinai, New York, NY
- Nancy Shrestha: Chicago Medical School, Rosalind Franklin University of Medicine and Science, North Chicago, IL
- Laura C. Mazudie Ndjonko: Northwestern University, Chicago, IL
- Samuel Q. Li: PGY-6, Department of Orthopedic Surgery and Neurosurgery, Mount Sinai Hospital, New York, NY
- Jun S. Kim: Department of Orthopedics and Orthopedic Surgery, Mount Sinai Hospital, New York, NY
- Krystin A. Hidden: Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN
- Samuel K. Cho: Department of Orthopedic Surgery and Neurosurgery, Mount Sinai Hospital, New York, NY
20
Large Language Models in Orthopaedic Publications: The Good, the Bad and the Ugly. Orthop J Sports Med 2024; 12:23259671241265705. [PMID: 39176267 PMCID: PMC11339935 DOI: 10.1177/23259671241265705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2024]
21
Taylor WL, Cheng R, Weinblatt AI, Bergstein V, Long WJ. An Artificial Intelligence Chatbot is an Accurate and Useful Online Patient Resource Prior to Total Knee Arthroplasty. J Arthroplasty 2024; 39:S358-S362. [PMID: 38350517 DOI: 10.1016/j.arth.2024.02.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Accepted: 02/05/2024] [Indexed: 02/15/2024]
Abstract
BACKGROUND Online information is a useful resource for patients seeking advice on their orthopaedic care. While traditional websites provide responses to specific frequently asked questions (FAQs), sophisticated artificial intelligence tools may be able to provide the same information to patients in a more accessible manner. Chat Generative Pretrained Transformer (ChatGPT) is a powerful artificial intelligence chatbot that has been shown to effectively draw on its large reserves of information in a conversational context with a user. The purpose of this study was to assess the accuracy and reliability of ChatGPT-generated responses to FAQs regarding total knee arthroplasty. METHODS We distributed a survey that challenged arthroplasty surgeons to identify which of the 2 responses to FAQs on our institution's website was human-written and which was generated by ChatGPT. All questions were total knee arthroplasty-related. The second portion of the survey investigated the potential to further leverage ChatGPT to assist with translation and accessibility as a means to better meet the needs of our diverse patient population. RESULTS Surgeons correctly identified the ChatGPT-generated responses 4 out of 10 times on average (range: 0 to 7). No consensus was reached on any of the responses to the FAQs. Additionally, over 90% of our surgeons strongly encouraged the use of ChatGPT to more effectively accommodate the diverse patient populations that seek information from our hospital's online resources. CONCLUSIONS ChatGPT provided accurate, reliable answers to our website's FAQs. Surgeons also agreed that ChatGPT's ability to provide targeted, language-specific responses to FAQs would be of benefit to our diverse patient population.
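Because each surgeon makes a binary call per response pair, the headline result (4 of 10 correct identifications on average) can be checked against chance with a simple binomial test. A sketch using the reported average as an illustrative count, not a reanalysis:

```python
# Is 4/10 correct identifications distinguishable from coin-flip guessing?
from scipy.stats import binomtest

correct, total = 4, 10  # average surgeon performance reported in the abstract
result = binomtest(correct, total, p=0.5, alternative='greater')
print(f"one-sided p = {result.pvalue:.3f}")  # far from significant
```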
Affiliation(s)
- Walter L Taylor, Ryan Cheng, Aaron I Weinblatt, Victoria Bergstein, William J Long: Department of Adult Reconstruction and Joint Replacement, Hospital for Special Surgery, New York, New York
22
Herzog I, Mendiratta D, Para A, Berg A, Kaushal N, Vives M. Assessing the potential role of ChatGPT in spine surgery research. J Exp Orthop 2024; 11:e12057. [PMID: 38873173 PMCID: PMC11170336 DOI: 10.1002/jeo2.12057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Revised: 05/12/2024] [Accepted: 05/28/2024] [Indexed: 06/15/2024]
Abstract
Purpose Since its release in November 2022, Chat Generative Pre-Trained Transformer 3.5 (ChatGPT), a complex machine learning model, has garnered more than 100 million users worldwide. The aim of this study is to determine how well ChatGPT can generate novel systematic review ideas on topics within spine surgery. Methods ChatGPT was instructed to give ten novel systematic review ideas for five popular topics in the spine surgery literature: microdiscectomy, laminectomy, spinal fusion, kyphoplasty and disc replacement. A comprehensive literature search was conducted in PubMed, CINAHL, EMBASE and Cochrane. The number of nonsystematic review articles and the number of systematic review papers that had been published on each ChatGPT-generated idea were recorded. Results Overall, ChatGPT had a 68% accuracy rate in creating novel systematic review ideas. More specifically, the accuracy rates were 80%, 80%, 40%, 70% and 70% for microdiscectomy, laminectomy, spinal fusion, kyphoplasty and disc replacement, respectively. However, for 32% of the generated ideas, no nonsystematic review articles had been published. The success rates for generating novel systematic review ideas for which nonsystematic reviews had also been published were 71.4%, 50%, 22.2%, 50%, 62.5% and 51.2% for microdiscectomy, laminectomy, spinal fusion, kyphoplasty, disc replacement and overall, respectively. Conclusions ChatGPT generated novel systematic review ideas at an overall rate of 68%. ChatGPT can help identify knowledge gaps in spine research that warrant further investigation when used under the supervision of an experienced spine specialist. This technology can be erroneous and lacks intrinsic logic, so it should never be used in isolation. Level of Evidence Not applicable.
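As a quick arithmetic check, the 68% overall rate reported above is exactly the unweighted mean of the five per-topic rates:

```python
# Verify the overall novelty rate from the per-topic rates in the abstract.
topic_rates = {"microdiscectomy": 80, "laminectomy": 80, "spinal fusion": 40,
               "kyphoplasty": 70, "disc replacement": 70}
overall = sum(topic_rates.values()) / len(topic_rates)
print(f"{overall:.0f}%")  # -> 68%
```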
Affiliation(s)
- Isabel Herzog, Ashok Para, Ari Berg, Neil Kaushal, Michael Vives: Rutgers New Jersey Medical School, Newark, New Jersey, USA
23
Rupp M, Moser LB, Hess S, Angele P, Aurich M, Dyrna F, Nehrer S, Neubauer M, Pawelczyk J, Izadpanah K, Zellner J, Niemeyer P. Orthopaedic surgeons display a positive outlook towards artificial intelligence: A survey among members of the AGA Society for Arthroscopy and Joint Surgery. J Exp Orthop 2024; 11:e12080. [PMID: 38974054 PMCID: PMC11227606 DOI: 10.1002/jeo2.12080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2024] [Revised: 06/13/2024] [Accepted: 06/21/2024] [Indexed: 07/09/2024]
Abstract
Purpose The purpose of this study was to evaluate the perspective of orthopaedic surgeons on the impact of artificial intelligence (AI) and to evaluate the influence of experience, workplace setting and familiarity with digital solutions on views on AI. Methods Orthopaedic surgeons of the AGA Society for Arthroscopy and Joint Surgery were invited to participate in an online, cross-sectional survey designed to gather information on professional background, subjective AI knowledge, opinion on the future impact of AI, openness towards different applications of AI, and perceived advantages and disadvantages of AI. Subgroup analyses were performed to examine the influence of experience, workplace setting and openness towards digital solutions on perspectives towards AI. Results Overall, 360 orthopaedic surgeons participated. The majority indicated average (43.6%) or rudimentary (38.1%) AI knowledge. Most (54.5%) expected AI to substantially influence orthopaedics within 5-10 years, predominantly as a complementary tool (91.1%). Preoperative planning (83.8%) was identified as the most likely clinical use case. A lack of consensus was observed regarding acceptable error levels. Time savings in preoperative planning (62.5%) and improved documentation (81%) were identified as notable advantages while declining skills of the next generation (64.5%) were rated as the most substantial drawback. There were significant differences in subjective AI knowledge depending on participants' experience (p = 0.021) and familiarity with digital solutions (p < 0.001), acceptable error levels depending on workplace setting (p = 0.004), and prediction of AI impact depending on familiarity with digital solutions (p < 0.001). Conclusion The majority of orthopaedic surgeons in this survey anticipated a notable positive impact of AI on their field, primarily as an assistive technology. A lack of consensus on acceptable error levels of AI and concerns about declining skills among future surgeons were observed. Level of Evidence Level IV, cross-sectional study.
Affiliation(s)
- Marco-Christopher Rupp: Sektion Sportorthopädie, Klinikum rechts der Isar, Technische Universität München, Munich, Germany; Steadman Philippon Research Institute, Vail, Colorado, USA
- Lukas B. Moser: Klinische Abteilung für Orthopädie und Traumatologie, Universitätsklinikum Krems, Krems an der Donau, Austria; Zentrum für Regenerative Medizin, Universität für Weiterbildung Krems, Krems an der Donau, Austria; Sporthopaedicum, Regensburg, Germany
- Silvan Hess: Universitätsklinik für Orthopädische Chirurgie und Traumatologie, Inselspital, Bern, Switzerland
- Peter Angele: Sporthopaedicum, Regensburg, Germany; Klinik für Unfall- und Wiederherstellungschirurgie, Universitätsklinikum Regensburg, Regensburg, Germany
- Stefan Nehrer: Klinische Abteilung für Orthopädie und Traumatologie, Universitätsklinikum Krems, Krems an der Donau, Austria; Zentrum für Regenerative Medizin, Universität für Weiterbildung Krems, Krems an der Donau, Austria; Fakultät für Gesundheit und Medizin, Universität für Weiterbildung Krems, Krems an der Donau, Austria
- Markus Neubauer: Klinische Abteilung für Orthopädie und Traumatologie, Universitätsklinikum Krems, Krems an der Donau, Austria; Zentrum für Regenerative Medizin, Universität für Weiterbildung Krems, Krems an der Donau, Austria
- Johannes Pawelczyk: Sektion Sportorthopädie, Klinikum rechts der Isar, Technische Universität München, Munich, Germany
- Kaywan Izadpanah: Klinik für Orthopädie und Unfallchirurgie, Universitätsklinikum Freiburg, Medizinische Fakultät, Albert-Ludwigs-Universität Freiburg, Freiburg, Germany
- Philipp Niemeyer: OCM – Orthopädische Chirurgie München, Munich, Germany; Albert-Ludwigs-University, Freiburg, Germany
24
Wascher DC, Ollivier M. Large Language Models in Orthopaedic Publications: The Good, the Bad and the Ugly. Am J Sports Med 2024; 52:2193-2195. [PMID: 39101739 DOI: 10.1177/03635465241265692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/06/2024]
25
Xu R, Wang Z. Generative artificial intelligence in healthcare from the perspective of digital media: Applications, opportunities and challenges. Heliyon 2024; 10:e32364. [PMID: 38975200 PMCID: PMC11225727 DOI: 10.1016/j.heliyon.2024.e32364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Accepted: 06/03/2024] [Indexed: 07/09/2024]
Abstract
Introduction The emergence and application of generative artificial intelligence/large language models (hereafter GenAI LLMs) have the potential for significant impact on the healthcare industry. However, there is currently a lack of systematic research on GenAI LLMs in healthcare based on reliable data. This article aims to conduct an exploratory study of the application of GenAI LLMs (i.e., ChatGPT) in healthcare from the perspective of digital media (i.e., online news), including the application scenarios, potential opportunities, and challenges. Methods This research used thematic qualitative text analysis in five steps: firstly, developing main topical categories based on relevant articles; secondly, encoding the search keywords using these categories; thirdly, conducting searches for news articles via Google; fourthly, encoding the sub-categories using the elaborated category system; and finally, conducting category-based analysis and presenting the results. Natural language processing techniques, including the TermRaider and AntConc tools, were applied in the aforementioned steps to assist in qualitative text analysis. Additionally, this study built a framework for analyzing the above three topics from the perspective of five different stakeholders, including healthcare demanders and providers. Results This study summarizes 26 applications (e.g., providing medical advice, providing diagnosis and triage recommendations, providing mental health support), 21 opportunities (e.g., making healthcare more accessible, reducing healthcare costs, improving patient care), and 17 challenges (e.g., generating inaccurate/misleading/wrong answers, raising privacy concerns, lacking transparency), and analyzes the reasons for the formation of these key items and the links between the three research topics. Conclusions The application of GenAI LLMs in healthcare is primarily focused on transforming the way healthcare demanders access medical services (i.e., making it more intelligent, refined, and humane) and optimizing the processes through which healthcare providers offer medical services (i.e., simplifying them, ensuring timeliness, and reducing errors). As the application becomes more widespread and deepens, GenAI LLMs are expected to have a revolutionary impact on traditional healthcare service models, but they also inevitably raise ethical and security concerns. Furthermore, the application of GenAI LLMs in healthcare is still at an early stage; it can be accelerated by starting from a specific healthcare field (e.g., mental health) or a specific mechanism (e.g., a mechanism for allocating the economic benefits of GenAI LLMs applied to healthcare) with empirical or clinical research.
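The keyword-frequency step that tools such as TermRaider and AntConc support can be approximated with the standard library alone. A minimal sketch, assuming the news corpus has already been collected into a list of strings:

```python
# Rough term-frequency pass over a (hypothetical) corpus of news articles.
import re
from collections import Counter

articles = [
    "ChatGPT can provide medical advice and triage recommendations ...",
    "Privacy concerns and inaccurate answers remain key challenges ...",
]

tokens = Counter()
for text in articles:
    tokens.update(re.findall(r"[a-z]+", text.lower()))

print(tokens.most_common(10))  # candidate terms to seed the coding categories
```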
Affiliation(s)
- Rui Xu: School of Economics, Guangdong University of Technology, Guangzhou, China
- Zhong Wang: School of Economics, Guangdong University of Technology, Guangzhou, China; Key Laboratory of Digital Economy and Data Governance, Guangdong University of Technology, Guangzhou, China
26
Costa ICP, do Nascimento MC, Treviso P, Chini LT, Roza BDA, Barbosa SDFF, Mendes KDS. Using the Chat Generative Pre-trained Transformer in academic writing in health: a scoping review. Rev Lat Am Enfermagem 2024; 32:e4194. [PMID: 38922265 PMCID: PMC11182606 DOI: 10.1590/1518-8345.7133.4194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Accepted: 02/04/2024] [Indexed: 06/27/2024]
Abstract
OBJECTIVE to map the scientific literature regarding the use of the Chat Generative Pre-trained Transformer, ChatGPT, in academic writing in health. METHOD this was a scoping review, following the JBI methodology. Conventional databases and gray literature were included. Studies were selected after duplicate removal, with individual and paired evaluation. Data were extracted using a structured script and presented in descriptive, tabular, and graphical formats. RESULTS the analysis of the 49 selected articles revealed that ChatGPT is a versatile tool, contributing to scientific production, the description of medical procedures, and the preparation of summaries aligned with the standards of scientific journals. Its application has been shown to improve the clarity of writing and benefits areas such as innovation and automation. Risks were also observed, such as the possibility of a lack of originality and ethical issues. Future perspectives highlight the need for adequate regulation, agile adaptation, and the search for an ethical balance in incorporating ChatGPT into academic writing. CONCLUSION ChatGPT presents transformative potential in academic writing in health. However, its adoption requires rigorous human supervision, solid regulation, and transparent guidelines to ensure its responsible and beneficial use by the scientific community.
Affiliation(s)
- Patrícia Treviso: Universidade do Vale do Rio dos Sinos, Escola de Saúde, São Leopoldo, RS, Brazil
- Karina Dal Sasso Mendes: Universidade de São Paulo, Escola de Enfermagem de Ribeirão Preto, PAHO/WHO Collaborating Centre for Nursing Research Development, Ribeirão Preto, SP, Brazil
27
Maggio MG, Tartarisco G, Cardile D, Bonanno M, Bruschetta R, Pignolo L, Pioggia G, Calabrò RS, Cerasa A. Exploring ChatGPT's potential in the clinical stream of neurorehabilitation. Front Artif Intell 2024; 7:1407905. [PMID: 38903157 PMCID: PMC11187276 DOI: 10.3389/frai.2024.1407905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Accepted: 05/13/2024] [Indexed: 06/22/2024]
Abstract
In several medical fields, generative AI tools such as ChatGPT have achieved optimal performance in identifying correct diagnoses solely by evaluating narrative clinical descriptions of cases. The most active fields of application include oncology and COVID-19-related symptoms, with preliminary relevant results also in psychiatric and neurological domains. This scoping review aims to introduce the arrival of ChatGPT applications in neurorehabilitation practice, where such AI-driven solutions have the potential to revolutionize patient care and assistance. First, a comprehensive overview of ChatGPT, including its design and potential applications in medicine, is provided. Second, the remarkable natural language processing skills and limitations of these models are examined with a focus on their use in neurorehabilitation. In this context, we present two case scenarios to evaluate ChatGPT's ability to resolve higher-order clinical reasoning. Overall, we provide support for the first evidence that generative AI can meaningfully integrate as a facilitator into neurorehabilitation practice, aiding physicians in defining increasingly efficacious diagnostic and personalized prognostic plans.
Affiliation(s)
- Gennaro Tartarisco, Roberta Bruschetta, Giovanni Pioggia: Institute for Biomedical Research and Innovation (IRIB), National Research Council of Italy (CNR), Messina, Italy
- Antonio Cerasa: Institute for Biomedical Research and Innovation (IRIB), National Research Council of Italy (CNR), Messina, Italy; S’Anna Institute, Crotone, Italy; Pharmacotechnology Documentation and Transfer Unit, Preclinical and Translational Pharmacology, Department of Pharmacy, Health and Nutritional Sciences, University of Calabria, Rende, Italy
28
Lee JC, Hamill CS, Shnayder Y, Buczek E, Kakarala K, Bur AM. Exploring the Role of Artificial Intelligence Chatbots in Preoperative Counseling for Head and Neck Cancer Surgery. Laryngoscope 2024; 134:2757-2761. [PMID: 38126511 DOI: 10.1002/lary.31243] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 07/06/2023] [Revised: 10/25/2023] [Accepted: 11/30/2023] [Indexed: 12/23/2023]
Abstract
OBJECTIVE To evaluate the potential use of artificial intelligence (AI) chatbots, such as ChatGPT, in preoperative counseling for patients undergoing head and neck cancer surgery. STUDY DESIGN Cross-sectional survey study. SETTING Single institution tertiary care center. METHODS ChatGPT was used to generate presurgical educational information including indications, risks, and recovery time for five common head and neck surgeries. Chatbot-generated information was compared with information gathered from a simple browser search (the first publicly available website, excluding scholarly articles). The accuracy of the information, readability, thoroughness, and number of errors were compared by five experienced head and neck surgeons in a blinded fashion. Each surgeon then chose a preference between the two information sources for each surgery. RESULTS With the exception of total word count, ChatGPT-generated presurgical information had similar readability, content of knowledge, accuracy, thoroughness, and number of medical errors when compared with publicly available websites. Additionally, ChatGPT was preferred 48% of the time by experienced head and neck surgeons. CONCLUSION Head and neck surgeons rated ChatGPT-generated and readily available online educational materials similarly. Further refinement in AI technology may soon open more avenues for patient counseling. Future investigations into the medical safety of AI counseling and into patients' perspectives would be of strong interest. LEVEL OF EVIDENCE N/A. Laryngoscope, 134:2757-2761, 2024.
Affiliation(s)
- Jason C Lee, Chelsea S Hamill, Yelizaveta Shnayder, Erin Buczek, Kiran Kakarala, Andrés M Bur: Department of Otolaryngology, University of Kansas Medical Center, Kansas City, Kansas, U.S.A
29
Levin G, Meyer R, Guigue PA, Brezinov Y. It takes one to know one-Machine learning for identifying OBGYN abstracts written by ChatGPT. Int J Gynaecol Obstet 2024; 165:1257-1260. [PMID: 38234125 DOI: 10.1002/ijgo.15365] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/29/2023] [Revised: 12/08/2023] [Accepted: 12/26/2023] [Indexed: 01/19/2024]
Abstract
OBJECTIVES To use machine learning to optimize the detection of ChatGPT-written obstetrics and gynecology (OBGYN) abstracts across all OBGYN journals. METHODS We used Web of Science to identify all original articles published in all OBGYN journals in 2022. Seventy-five original articles were randomly selected. For each, we prompted ChatGPT to write an abstract based on the title and results of the original abstract. Each abstract was tested with Grammarly software, and the reports were inserted into a database. Machine-learning models were trained and examined on the database created. RESULTS Overall, 75 abstracts from 12 different OBGYN journals were randomly selected. There were seven (58%) Q1 journals, one (8%) Q2 journal, two (17%) Q3 journals, and two (17%) Q4 journals. Use of mixed dialects of English, absence of comma misuse, absence of incorrect verb forms, and improper formatting were important prediction variables for ChatGPT-written abstracts. The deep-learning model had the highest predictive performance of all examined models, achieving the following: accuracy 0.90, precision 0.92, recall 0.85, area under the curve 0.95. CONCLUSIONS Machine-learning-based tools reach high accuracy in identifying ChatGPT-written OBGYN abstracts.
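A minimal sketch of this kind of detection pipeline is shown below, with synthetic features standing in for the Grammarly-derived writing-style variables and a logistic regression standing in for the study's deep-learning model:

```python
# Toy ChatGPT-abstract detector: synthetic features, logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((75, 4))      # e.g., comma misuse, verb-form errors, formatting flags
y = rng.integers(0, 2, 75)   # 1 = ChatGPT-written, 0 = human-written (synthetic)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)
clf = LogisticRegression().fit(X_tr, y_tr)
pred = clf.predict(X_te)
proba = clf.predict_proba(X_te)[:, 1]

print(f"accuracy={accuracy_score(y_te, pred):.2f} "
      f"precision={precision_score(y_te, pred, zero_division=0):.2f} "
      f"recall={recall_score(y_te, pred, zero_division=0):.2f} "
      f"AUC={roc_auc_score(y_te, proba):.2f}")
```

On real writing-style features the same scaffold would report the accuracy, precision, recall, and AUC figures quoted above.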
Affiliation(s)
- Gabriel Levin: The Department of Obstetrics and Gynecology, Hadassah-Hebrew University Medical Center, Jerusalem, Israel; Lady Davis Institute for Cancer Research, Jewish General Hospital, McGill University, Montreal, Quebec, Canada
- Raanan Meyer: Division of Minimally Invasive Gynecologic Surgery, Department of Obstetrics and Gynecology, Cedars Sinai Medical Center, Los Angeles, California, USA; The Dr. Pinchas Bornstein Talpiot Medical Leadership Program, Sheba Medical Center, Ramat-Gan, Israel
- Paul-Adrien Guigue: Lady Davis Institute for Cancer Research, Jewish General Hospital, McGill University, Montreal, Quebec, Canada
- Yoav Brezinov: Department of Experimental Surgery, McGill University, Montreal, Quebec, Canada
30
Morya VK, Lee HW, Shahid H, Magar AG, Lee JH, Kim JH, Jun L, Noh KC. Application of ChatGPT for Orthopedic Surgeries and Patient Care. Clin Orthop Surg 2024; 16:347-356. [PMID: 38827766 PMCID: PMC11130626 DOI: 10.4055/cios23181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 11/15/2023] [Accepted: 12/12/2023] [Indexed: 06/05/2024]
Abstract
Artificial intelligence (AI) has rapidly transformed various aspects of life, and the launch of the chatbot "ChatGPT" by OpenAI in November 2022 has garnered significant attention and user appreciation. ChatGPT utilizes natural language processing based on a generative pre-trained transformer (GPT) model, specifically the transformer architecture, to generate human-like responses to a wide range of questions and topics. Trained on approximately 57 billion words of online data and equipped with 175 billion parameters, ChatGPT has potential applications in medicine and orthopedics. One of its key strengths is its personalized, easy-to-understand, and adaptive response, which allows it to learn continuously through user interaction. This article discusses how AI, especially ChatGPT, presents numerous opportunities in orthopedics, ranging from preoperative planning and surgical techniques to patient education and medical support. Although ChatGPT's user-friendly responses and adaptive capabilities are laudable, its limitations, including biased responses and ethical concerns, necessitate its cautious and responsible use. Surgeons and healthcare providers should leverage the strengths of ChatGPT while recognizing its current limitations and verifying critical information through independent research and expert opinions. As AI technology continues to evolve, ChatGPT may become a valuable tool in orthopedic education and patient care, leading to improved outcomes and efficiency in healthcare delivery. The integration of AI into orthopedics offers substantial benefits but requires careful consideration and continuous improvement.
Affiliation(s)
- Vivek Kumar Morya, Ho-Won Lee, Hamzah Shahid, Anuja Gajanan Magar, Ju-Hyung Lee, Jae-Hyung Kim, Lang Jun, Kyu-Cheol Noh: Department of Orthopedic Surgery, Hallym University Kangnam Sacred Heart Hospital, Seoul, Korea
31
Borna S, Gomez-Cabello CA, Pressman SM, Haider SA, Sehgal A, Leibovich BC, Cole D, Forte AJ. Comparative Analysis of Artificial Intelligence Virtual Assistant and Large Language Models in Post-Operative Care. Eur J Investig Health Psychol Educ 2024; 14:1413-1424. [PMID: 38785591 PMCID: PMC11119735 DOI: 10.3390/ejihpe14050093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Revised: 05/11/2024] [Accepted: 05/14/2024] [Indexed: 05/25/2024]
Abstract
In postoperative care, patient education and follow-up are pivotal for enhancing the quality of care and satisfaction. Artificial intelligence virtual assistants (AIVA) and large language models (LLMs) like Google BARD and ChatGPT-4 offer avenues for addressing patient queries using natural language processing (NLP) techniques. However, the accuracy and appropriateness of the information vary across these platforms, necessitating a comparative study to evaluate their efficacy in this domain. We conducted a study comparing AIVA (using Google Dialogflow) with ChatGPT-4 and Google BARD, assessing the accuracy, knowledge gap, and response appropriateness. AIVA demonstrated superior performance, with significantly higher accuracy (mean: 0.9) and lower knowledge gap (mean: 0.1) compared to BARD and ChatGPT-4. Additionally, AIVA's responses received higher Likert scores for appropriateness. Our findings suggest that specialized AI tools like AIVA are more effective in delivering precise and contextually relevant information for postoperative care compared to general-purpose LLMs. While ChatGPT-4 shows promise, its performance varies, particularly in verbal interactions. This underscores the importance of tailored AI solutions in healthcare, where accuracy and clarity are paramount. Our study highlights the necessity for further research and the development of customized AI solutions to address specific medical contexts and improve patient outcomes.
Affiliation(s)
- Sahar Borna, Syed Ali Haider: Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
- Ajai Sehgal, Dave Cole: Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA
- Bradley C. Leibovich: Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA; Department of Urology, Mayo Clinic, Rochester, MN 55905, USA
- Antonio Jorge Forte: Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA; Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA
32
Ruksakulpiwat S, Phianhasin L, Benjasirisan C, Ding K, Ajibade A, Kumar A, Stewart C. Assessing the Efficacy of ChatGPT Versus Human Researchers in Identifying Relevant Studies on mHealth Interventions for Improving Medication Adherence in Patients With Ischemic Stroke When Conducting Systematic Reviews: Comparative Analysis. JMIR Mhealth Uhealth 2024; 12:e51526. [PMID: 38710069 PMCID: PMC11106699 DOI: 10.2196/51526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Revised: 02/11/2024] [Accepted: 03/27/2024] [Indexed: 05/08/2024]
Abstract
BACKGROUND ChatGPT by OpenAI emerged as a potential tool for researchers, aiding in various aspects of research. One such application was the identification of relevant studies in systematic reviews. However, a comprehensive comparison of the efficacy of relevant study identification between human researchers and ChatGPT has not been conducted. OBJECTIVE This study aims to compare the efficacy of ChatGPT and human researchers in identifying relevant studies on medication adherence improvement using mobile health interventions in patients with ischemic stroke during systematic reviews. METHODS This study used the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Four electronic databases, including CINAHL Plus with Full Text, Web of Science, PubMed, and MEDLINE, were searched to identify articles published from inception until 2023 using search terms based on MeSH (Medical Subject Headings) terms generated by human researchers versus ChatGPT. The authors independently screened the titles, abstracts, and full text of the studies identified through separate searches conducted by human researchers and ChatGPT. The comparison encompassed several aspects, including the ability to retrieve relevant studies, accuracy, efficiency, limitations, and challenges associated with each method. RESULTS A total of 6 articles identified through search terms generated by human researchers were included in the final analysis, of which 4 (67%) reported improvements in medication adherence after the intervention. However, 33% (2/6) of the included studies did not clearly state whether medication adherence improved after the intervention. A total of 10 studies were included based on search terms generated by ChatGPT, of which 6 (60%) overlapped with studies identified by human researchers. Regarding the impact of mobile health interventions on medication adherence, most included studies (8/10, 80%) based on search terms generated by ChatGPT reported improvements in medication adherence after the intervention. However, 20% (2/10) of the studies did not clearly state whether medication adherence improved after the intervention. The precision in accurately identifying relevant studies was higher in human researchers (0.86) than in ChatGPT (0.77). This is consistent with the percentage of relevance, where human researchers (9.8%) demonstrated a higher percentage of relevance than ChatGPT (3%). However, when considering the time required for both humans and ChatGPT to identify relevant studies, ChatGPT substantially outperformed human researchers as it took less time to identify relevant studies. CONCLUSIONS Our comparative analysis highlighted the strengths and limitations of both approaches. Ultimately, the choice between human researchers and ChatGPT depends on the specific requirements and objectives of each review, but the collaborative synergy of both approaches holds the potential to advance evidence-based research and decision-making in the health care field.
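The precision comparison above is simple retrieval arithmetic (relevant retrieved divided by total retrieved). The counts below are illustrative choices that land near the reported 0.86 and 0.77 figures, not the study's data:

```python
# Retrieval precision for two search strategies (illustrative counts).
def precision(relevant: int, retrieved: int) -> float:
    return relevant / retrieved

print(f"human:   {precision(6, 7):.2f}")    # ~0.86
print(f"ChatGPT: {precision(10, 13):.2f}")  # ~0.77
```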
Affiliation(s)
- Suebsarn Ruksakulpiwat, Lalipat Phianhasin: Department of Medical Nursing, Faculty of Nursing, Mahidol University, Bangkok, Thailand
- Kedong Ding: Jack, Joseph and Morton Mandel School of Applied Social Sciences, Case Western Reserve University, Cleveland, OH, United States
- Anuoluwapo Ajibade: College of Art and Science, Department of Anthropology, Case Western Reserve University, Cleveland, OH, United States
- Ayanesh Kumar: School of Medicine, Case Western Reserve University, Cleveland, OH, United States
- Cassie Stewart: Frances Payne Bolton School of Nursing, Case Western Reserve University, Cleveland, OH, United States
33
Kedia N, Sanjeev S, Ong J, Chhablani J. ChatGPT and Beyond: An overview of the growing field of large language models and their use in ophthalmology. Eye (Lond) 2024; 38:1252-1261. [PMID: 38172581 PMCID: PMC11076576 DOI: 10.1038/s41433-023-02915-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 11/23/2023] [Accepted: 12/20/2023] [Indexed: 01/05/2024]
Abstract
ChatGPT, an artificial intelligence (AI) chatbot built on large language models (LLMs), has rapidly gained popularity. The benefits and limitations of this transformative technology have been discussed across various fields, including medicine. The widespread availability of ChatGPT has enabled clinicians to study how these tools could be used for a variety of tasks such as generating differential diagnosis lists, organizing patient notes, and synthesizing literature for scientific research. LLMs have shown promising capabilities in ophthalmology by performing well on the Ophthalmic Knowledge Assessment Program, providing fairly accurate responses to questions about retinal diseases, and generating differential diagnosis lists. There are current limitations to this technology, including the propensity of LLMs to "hallucinate", or confidently generate false information; their potential role in perpetuating biases in medicine; and the challenges of incorporating LLMs into research without allowing "AI-plagiarism" or publication of false information. In this paper, we provide a balanced overview of what LLMs are and introduce some of the LLMs that have been developed in the past few years. We discuss recent literature evaluating the role of these language models in medicine, with a focus on ChatGPT. The field of AI is fast-paced, and new applications based on LLMs are being generated rapidly; therefore, it is important for ophthalmologists to be aware of how this technology works and how it may impact patient care. Here, we discuss the benefits, limitations, and future advancements of LLMs in patient care and research.
Affiliation(s)
- Nikita Kedia, Jay Chhablani: Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Joshua Ong: Department of Ophthalmology and Visual Sciences, University of Michigan Kellogg Eye Center, Ann Arbor, MI, USA
34
Butler JJ, Puleo J, Harrington MC, Dahmen J, Rosenbaum AJ, Kerkhoffs GMMJ, Kennedy JG. From technical to understandable: Artificial Intelligence Large Language Models improve the readability of knee radiology reports. Knee Surg Sports Traumatol Arthrosc 2024; 32:1077-1086. [PMID: 38488217 DOI: 10.1002/ksa.12133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 02/19/2024] [Accepted: 02/22/2024] [Indexed: 04/23/2024]
Abstract
PURPOSE The purpose of this study was to evaluate the effectiveness of an Artificial Intelligence-Large Language Model (AI-LLM) at improving the readability of knee radiology reports. METHODS Reports of 100 knee X-rays, 100 knee computed tomography (CT) scans and 100 knee magnetic resonance imaging (MRI) scans were retrieved. The following prompt command was inserted into the AI-LLM: 'Explain this radiology report to a patient in layman's terms in the second person: [Report Text]'. The Flesch-Kincaid reading level (FKRL) score, Flesch reading ease (FRE) score and report length were calculated for the original radiology report and the AI-LLM-generated report. Any 'hallucination' or inaccurate text produced by the AI-LLM-generated report was documented. RESULTS Statistically significant improvements in mean FKRL scores were observed in the AI-LLM-generated X-ray report (from 12.7 ± 1.0 to 7.2 ± 0.6), CT report (from 13.4 ± 1.0 to 7.5 ± 0.5) and MRI report (from 13.5 ± 0.9 to 7.5 ± 0.6). Statistically significant improvements in mean FRE scores were observed in the AI-LLM-generated X-ray report (from 39.5 ± 7.5 to 76.8 ± 5.1), CT report (from 27.3 ± 5.9 to 73.1 ± 5.6) and MRI report (from 26.8 ± 6.4 to 73.4 ± 5.0). Superior FKRL and FRE scores were observed in the AI-LLM-generated X-ray report compared with the AI-LLM-generated CT and MRI reports, p < 0.001. The hallucination rates in the AI-LLM-generated X-ray, CT and MRI reports were 2%, 5% and 5%, respectively. CONCLUSIONS This study highlights the promising use of AI-LLMs as an innovative, patient-centred strategy to improve the readability of knee radiology reports. The clinical relevance of this study is that an AI-LLM-generated knee radiology report may enhance patients' understanding of their imaging reports, potentially reducing the responder burden placed on the ordering physicians. However, due to the 'hallucinations' produced by the AI-LLM-generated report, the ordering physician must always engage in a collaborative discussion with the patient regarding both reports and the corresponding images. LEVEL OF EVIDENCE Level IV.
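Readability scoring of this kind is straightforward to reproduce. A brief sketch assuming the textstat Python package and placeholder report text (not the study's reports):

```python
# Compare Flesch-Kincaid grade level and Flesch reading ease before/after.
import textstat  # assumed installed: pip install textstat

original   = "There is a full-thickness tear of the anterior cruciate ligament."
simplified = "The main ligament in the middle of your knee is torn."

for label, text in [("original", original), ("AI-LLM", simplified)]:
    print(label,
          "FKRL:", textstat.flesch_kincaid_grade(text),
          "FRE:", textstat.flesch_reading_ease(text))
```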
Affiliation(s)
- James J Butler, John G Kennedy: Department of Orthopaedic Surgery, Foot and Ankle Division, NYU Langone Health, New York City, New York, USA
- James Puleo: Albany Medical Center, Albany, New York, USA
- Jari Dahmen, Gino M M J Kerkhoffs: Department of Orthopaedic Surgery and Sports Medicine, Amsterdam Movement Sciences, Amsterdam UMC, University of Amsterdam, Location AMC, Amsterdam, The Netherlands; Academic Center for Evidence-Based Sports Medicine, Amsterdam UMC, Amsterdam, The Netherlands; Amsterdam Collaboration for Health and Safety in Sports, International Olympic Committee Research Center, Amsterdam UMC, Amsterdam, The Netherlands
35
Kernberg A, Gold JA, Mohan V. Using ChatGPT-4 to Create Structured Medical Notes From Audio Recordings of Physician-Patient Encounters: Comparative Study. J Med Internet Res 2024; 26:e54419. [PMID: 38648636 DOI: 10.2196/54419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 02/20/2024] [Accepted: 03/10/2024] [Indexed: 04/25/2024]
Abstract
BACKGROUND Medical documentation plays a crucial role in clinical practice, facilitating accurate patient management and communication among health care professionals. However, inaccuracies in medical notes can lead to miscommunication and diagnostic errors. Additionally, the demands of documentation contribute to physician burnout. Although intermediaries like medical scribes and speech recognition software have been used to ease this burden, they have limitations in terms of accuracy and addressing provider-specific metrics. The integration of ambient artificial intelligence (AI)-powered solutions offers a promising way to improve documentation while fitting seamlessly into existing workflows. OBJECTIVE This study aims to assess the accuracy and quality of Subjective, Objective, Assessment, and Plan (SOAP) notes generated by ChatGPT-4, an AI model, using established transcripts of History and Physical Examination as the gold standard. We seek to identify potential errors and evaluate the model's performance across different categories. METHODS We conducted simulated patient-provider encounters representing various ambulatory specialties and transcribed the audio files. Key reportable elements were identified, and ChatGPT-4 was used to generate SOAP notes based on these transcripts. Three versions of each note were created and compared to the gold standard via chart review; errors generated from the comparison were categorized as omissions, incorrect information, or additions. We compared the accuracy of data elements across versions, transcript length, and data categories. Additionally, we assessed note quality using the Physician Documentation Quality Instrument (PDQI) scoring system. RESULTS Although ChatGPT-4 consistently generated SOAP-style notes, there were, on average, 23.6 errors per clinical case, with errors of omission (86%) being the most common, followed by addition errors (10.5%) and inclusion of incorrect facts (3.2%). There was significant variance between replicates of the same case, with only 52.9% of data elements reported correctly across all 3 replicates. The accuracy of data elements varied across cases, with the highest accuracy observed in the "Objective" section. Consequently, the measure of note quality, assessed by PDQI, demonstrated intra- and intercase variance. Finally, the accuracy of ChatGPT-4 was inversely correlated to both the transcript length (P=.05) and the number of scorable data elements (P=.05). CONCLUSIONS Our study reveals substantial variability in errors, accuracy, and note quality generated by ChatGPT-4. Errors were not limited to specific sections, and the inconsistency in error types across replicates complicated predictability. Transcript length and data complexity were inversely correlated with note accuracy, raising concerns about the model's effectiveness in handling complex medical cases. The quality and reliability of clinical notes produced by ChatGPT-4 do not meet the standards required for clinical use. Although AI holds promise in health care, caution should be exercised before widespread adoption. Further research is needed to address accuracy, variability, and potential errors. ChatGPT-4, while valuable in various applications, should not be considered a safe alternative to human-generated clinical documentation at this time.
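The chart-review tally behind the error percentages above can be expressed compactly. A sketch with hypothetical per-case error lists standing in for the study's data:

```python
# Tally error categories across cases (hypothetical error lists).
from collections import Counter

case_errors = [
    ["omission"] * 20 + ["addition"] * 3 + ["incorrect"],
    ["omission"] * 21 + ["addition"] * 2 + ["incorrect"],
]
totals = Counter(err for case in case_errors for err in case)
n = sum(totals.values())
for kind, count in totals.most_common():
    print(f"{kind}: {count} ({100 * count / n:.1f}%)")
print(f"mean errors per case: {n / len(case_errors):.1f}")
```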
Affiliation(s)
- Annessa Kernberg, Jeffrey A Gold, Vishnu Mohan: Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Sciences University, Portland, OR, United States
36
Cevasco KE, Morrison Brown RE, Woldeselassie R, Kaplan S. Patient Engagement with Conversational Agents in Health Applications 2016-2022: A Systematic Review and Meta-Analysis. J Med Syst 2024; 48:40. [PMID: 38594411 PMCID: PMC11004048 DOI: 10.1007/s10916-024-02059-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 04/01/2024] [Indexed: 04/11/2024]
Abstract
Clinicians and patients seeking electronic health applications face challenges in selecting effective solutions due to a high market failure rate. Conversational agent applications ("chatbots") show promise in increasing healthcare user engagement by creating bonds between the applications and users. It is unclear whether chatbots improve patient adherence or whether past trends to include chatbots in electronic health applications were driven by technology hype dynamics and competitive pressure to innovate. We conducted a systematic literature review of health chatbot randomized controlled trials using Preferred Reporting Items for Systematic Reviews and Meta-Analyses methodology. The goal of this review was to identify whether user engagement indicators are published in eHealth chatbot studies. A meta-analysis examined patient retention in clinical trials of chatbot apps. The results showed no effect of the chatbot arm on patient retention. The small number of studies suggests a need for ongoing eHealth chatbot research, especially given the claims regarding their effectiveness made outside the scientific literature.
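A retention meta-analysis of the kind described can be sketched as fixed-effect inverse-variance pooling of log risk ratios; this is a generic approach, not necessarily the authors' exact model, and the per-trial retention counts below are invented for illustration:

```python
# Fixed-effect pooled risk ratio for trial retention (invented counts).
import math

# (retained_chatbot, n_chatbot, retained_control, n_control) per trial
trials = [(40, 50, 38, 50), (70, 90, 72, 90), (25, 30, 24, 30)]

log_rrs, weights = [], []
for a, n1, c, n2 in trials:
    log_rrs.append(math.log((a / n1) / (c / n2)))
    var = 1/a - 1/n1 + 1/c - 1/n2   # variance of the log risk ratio
    weights.append(1 / var)

pooled = sum(w * lr for w, lr in zip(weights, log_rrs)) / sum(weights)
se = math.sqrt(1 / sum(weights))
lo, hi = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
print(f"pooled RR = {math.exp(pooled):.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

A confidence interval spanning 1.0 would correspond to the review's finding of no chatbot-arm retention effect.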
Affiliation(s)
- Kevin E Cevasco
- Department of Global and Community Health, George Mason University, 4400 University Dr., Fairfax, VA 22030, USA.
- Rachel E Morrison Brown
- Department of Global and Community Health, George Mason University, 4400 University Dr., Fairfax, VA 22030, USA
- Rediet Woldeselassie
- Department of Health Administration and Policy, George Mason University, Fairfax, VA, USA
- Seth Kaplan
- Department of Psychology, George Mason University, Fairfax, VA, USA
37
Yuan S, Li F, Browning MHEM, Bardhan M, Zhang K, McAnirlin O, Patwary MM, Reuben A. Leveraging and exercising caution with ChatGPT and other generative artificial intelligence tools in environmental psychology research. Front Psychol 2024; 15:1295275. [PMID: 38650897] [PMCID: PMC11033305] [DOI: 10.3389/fpsyg.2024.1295275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 03/01/2024] [Indexed: 04/25/2024]
Abstract
Generative Artificial Intelligence (GAI) is an emerging and disruptive technology that has attracted considerable interest from researchers and educators across various disciplines. We discuss the relevance of ChatGPT and other GAI tools to environmental psychology research, along with the concerns they raise. We propose three use categories for GAI tools: integrated and contextualized understanding, practical and flexible implementation, and two-way external communication. These categories are exemplified by topics such as the health benefits of green space, theory building, visual simulation, and identifying practical relevance. However, we also highlight the need to balance productivity with ethical issues, as well as the need for ethical guidelines, professional training, and changes in academic performance evaluation systems. We hope this perspective can foster constructive dialogue and responsible practice of GAI tools.
Affiliation(s)
- Shuai Yuan
- Virtual Reality and Nature Lab, Department of Parks, Recreation and Tourism Management, Clemson University, Clemson, SC, United States
- Fu Li
- Virtual Reality and Nature Lab, Department of Parks, Recreation and Tourism Management, Clemson University, Clemson, SC, United States
- Matthew H. E. M. Browning
- Virtual Reality and Nature Lab, Department of Parks, Recreation and Tourism Management, Clemson University, Clemson, SC, United States
- Mondira Bardhan
- Virtual Reality and Nature Lab, Department of Parks, Recreation and Tourism Management, Clemson University, Clemson, SC, United States
- Kuiran Zhang
- Virtual Reality and Nature Lab, Department of Parks, Recreation and Tourism Management, Clemson University, Clemson, SC, United States
- Olivia McAnirlin
- Virtual Reality and Nature Lab, Department of Parks, Recreation and Tourism Management, Clemson University, Clemson, SC, United States
- Muhammad Mainuddin Patwary
- Environment and Sustainability Research Initiative, Khulna, Bangladesh
- Environmental Science Discipline, Life Science School, Khulna University, Khulna, Bangladesh
- Aaron Reuben
- Department of Psychology and Neuroscience, Duke University, Durham, NC, United States
38
van Diessen E, van Amerongen RA, Zijlmans M, Otte WM. Potential merits and flaws of large language models in epilepsy care: A critical review. Epilepsia 2024; 65:873-886. [PMID: 38305763] [DOI: 10.1111/epi.17907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 12/30/2023] [Accepted: 01/19/2024] [Indexed: 02/03/2024]
Abstract
The current pace of development and application of large language models (LLMs) is unprecedented and will significantly affect future medical care. In this critical review, we provide the background needed to better understand these novel artificial intelligence (AI) models and how LLMs can be of future use in the daily care of people with epilepsy. Considering the importance of clinical history taking in diagnosing and monitoring epilepsy, combined with the established use of electronic health records, great potential exists to integrate LLMs into epilepsy care. We present the currently available LLM studies in epilepsy. Furthermore, we highlight and compare the most commonly used LLMs and elaborate on how these models can be applied in epilepsy. We further discuss important drawbacks and risks of LLMs, and we provide recommendations for overcoming these limitations.
Affiliation(s)
- Eric van Diessen
- Department of Child Neurology, UMC Utrecht Brain Center, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
- Department of Pediatrics, Franciscus Gasthuis & Vlietland, Rotterdam, The Netherlands
- Ramon A van Amerongen
- Faculty of Science, Bioinformatics and Biocomplexity, Utrecht University, Utrecht, The Netherlands
- Maeike Zijlmans
- Department of Neurology and Neurosurgery, UMC Utrecht Brain Center, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
- Stichting Epilepsie Instellingen Nederland, Heemstede, The Netherlands
- Willem M Otte
- Department of Child Neurology, UMC Utrecht Brain Center, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
39
Shorey S, Mattar C, Pereira TLB, Choolani M. A scoping review of ChatGPT's role in healthcare education and research. Nurse Educ Today 2024; 135:106121. [PMID: 38340639] [DOI: 10.1016/j.nedt.2024.106121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 01/05/2024] [Accepted: 02/04/2024] [Indexed: 02/12/2024]
Abstract
OBJECTIVES To examine and consolidate literature regarding the advantages and disadvantages of utilizing ChatGPT in healthcare education and research. DESIGN/METHODS We searched seven electronic databases (PubMed/Medline, CINAHL, Embase, PsycINFO, Scopus, ProQuest Dissertations and Theses Global, and Web of Science) from November 2022 until September 2023. This scoping review adhered to Arksey and O'Malley's framework and followed the reporting guidelines outlined in the PRISMA-ScR checklist. For analysis, we employed Thomas and Harden's thematic synthesis framework. RESULTS A total of 100 studies were included. An overarching theme, "Forging the Future: Bridging Theory and Integration of ChatGPT," emerged, accompanied by two main themes, (1) Enhancing Healthcare Education, Research, and Writing with ChatGPT and (2) Controversies and Concerns about ChatGPT in Healthcare Education, Research and Writing, and seven subthemes. CONCLUSIONS Our review underscores the importance of acknowledging legitimate concerns related to the potential misuse of ChatGPT, such as 'ChatGPT hallucinations', its limited understanding of specialized healthcare knowledge, its impact on teaching methods and assessments, confidentiality and security risks, and the controversial practice of crediting it as a co-author on scientific papers, among other considerations. Furthermore, our review recognizes the urgency of establishing timely guidelines and regulations, along with the active engagement of relevant stakeholders, to ensure the responsible and safe implementation of ChatGPT's capabilities. We advocate the use of cross-verification techniques to enhance the precision and reliability of generated content, the adaptation of higher education curricula to incorporate ChatGPT's potential, educators' need to familiarize themselves with the technology to improve their literacy and teaching approaches, and the development of innovative methods to detect ChatGPT usage. Finally, data protection measures should be prioritized when employing ChatGPT, and transparent reporting becomes crucial when integrating ChatGPT into academic writing.
Affiliation(s)
- Shefaly Shorey
- Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore.
- Citra Mattar
- Division of Maternal Fetal Medicine, Department of Obstetrics and Gynaecology, National University Health Systems, Singapore; Department of Obstetrics and Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Travis Lanz-Brian Pereira
- Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Mahesh Choolani
- Division of Maternal Fetal Medicine, Department of Obstetrics and Gynaecology, National University Health Systems, Singapore; Department of Obstetrics and Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
40
Beaulieu-Jones BR, Berrigan MT, Shah S, Marwaha JS, Lai SL, Brat GA. Evaluating capabilities of large language models: Performance of GPT-4 on surgical knowledge assessments. Surgery 2024; 175:936-942. [PMID: 38246839] [PMCID: PMC10947829] [DOI: 10.1016/j.surg.2023.12.014] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 07/17/2023] [Revised: 12/09/2023] [Accepted: 12/15/2023] [Indexed: 01/23/2024]
Abstract
BACKGROUND Artificial intelligence has the potential to dramatically alter health care by enhancing how we diagnose and treat disease. One promising artificial intelligence model is ChatGPT, a general-purpose large language model trained by OpenAI. ChatGPT has shown human-level performance on several professional and academic benchmarks. We sought to evaluate its performance on surgical knowledge questions and assess the stability of this performance on repeat queries. METHODS We evaluated the performance of ChatGPT-4 on questions from the Surgical Council on Resident Education question bank and a second commonly used surgical knowledge assessment, referred to as Data-B. Questions were entered in 2 formats: open-ended and multiple-choice. ChatGPT outputs were assessed for accuracy and insights by surgeon evaluators. We categorized reasons for model errors and the stability of performance on repeat queries. RESULTS A total of 167 Surgical Council on Resident Education and 112 Data-B questions were presented to the ChatGPT interface. ChatGPT correctly answered 71.3% and 67.9% of multiple-choice and 47.9% and 66.1% of open-ended questions for Surgical Council on Resident Education and Data-B, respectively. For both open-ended and multiple-choice questions, approximately two-thirds of ChatGPT responses contained nonobvious insights. Common reasons for incorrect responses included inaccurate information in a complex question (n = 16, 36.4%), inaccurate information in a fact-based question (n = 11, 25.0%), and accurate information with circumstantial discrepancy (n = 6, 13.6%). Upon repeat query, the answer selected by ChatGPT varied for 36.4% of questions answered incorrectly on the first query; the response accuracy changed for 6 of 16 (37.5%) questions. CONCLUSION Consistent with findings in other academic and professional domains, we demonstrate near or above human-level performance of ChatGPT on surgical knowledge questions from 2 widely used question banks. ChatGPT performed better on multiple-choice than open-ended questions, prompting questions regarding its potential for clinical application. Unique to this study, we demonstrate inconsistency in ChatGPT responses on repeat queries. This finding warrants future consideration, including efforts to train large language models to provide the safe and consistent responses required for clinical application. Despite near or above human-level performance on question banks, and given these observations, it remains unclear whether large language models such as ChatGPT can safely assist clinicians in providing care.
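The repeat-query inconsistency reported above is easy to quantify. In this sketch, the hypothetical `ask_model` stub stands in for a real LLM call and returns invented answers; stability is measured as the share of repeat queries returning the modal answer.

```python
# Sketch: answer stability across repeat submissions of the same question.
# `ask_model` is a hypothetical stub; a real implementation would call an LLM API.
import random
from collections import Counter

def ask_model(question: str) -> str:
    return random.choice(["A", "B", "B"])  # invented, deliberately unstable answers

def stability(question: str, n_queries: int = 3) -> float:
    """Fraction of repeat queries returning the most common (modal) answer."""
    answers = [ask_model(question) for _ in range(n_queries)]
    _, modal_count = Counter(answers).most_common(1)[0]
    return modal_count / n_queries

for q in ["Q1: next step in management of ...?", "Q2: most likely diagnosis of ...?"]:
    print(q, "stability:", stability(q))
```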
Affiliation(s)
- Brendin R Beaulieu-Jones
- Department of Surgery, Beth Israel Deaconess Medical Center, Boston, MA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA. https://twitter.com/bratogram
- Sahaj Shah
- Geisinger Commonwealth School of Medicine, Scranton, PA
- Jayson S Marwaha
- Division of Colorectal Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Shuo-Lun Lai
- Division of Colorectal Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Gabriel A Brat
- Department of Surgery, Beth Israel Deaconess Medical Center, Boston, MA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA.
41
Uranbey Ö, Özbey F, Kaygısız Ö, Ayrancı F. Assessing ChatGPT's Diagnostic Accuracy and Therapeutic Strategies in Oral Pathologies: A Cross-Sectional Study. Cureus 2024; 16:e58607. [PMID: 38770501] [PMCID: PMC11102887] [DOI: 10.7759/cureus.58607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/15/2024] [Indexed: 05/22/2024]
Abstract
BACKGROUND The rapid adoption of artificial intelligence (AI) models in the medical field reflects their ability to assist clinicians in the diagnosis and management of a wide range of conditions. This research assesses the diagnostic accuracy and therapeutic strategies of Chat Generative Pre-trained Transformer (ChatGPT) in comparison to dental professionals across 12 clinical cases. METHODOLOGY ChatGPT 3.5 was queried for diagnoses and management plans for 12 retrospective cases. Physicians were tasked with rating the complexity of the clinical scenarios and their agreement with the ChatGPT responses using a five-point Likert scale. Comparisons were made between the complexity of the cases and the accuracy of the diagnoses and treatment plans. RESULTS ChatGPT exhibited high accuracy in providing differential diagnoses and acceptable treatment plans. In a survey involving 30 attending physicians, the scenarios were rated with an overall median difficulty level of 3, with acceptable agreement on ChatGPT's differential diagnosis accuracy (overall median 4). Univariate ordinal regression analysis demonstrated that lower diagnosis scores correlated with lower treatment management scores. CONCLUSIONS ChatGPT's rapid processing aids healthcare by offering an objective, evidence-based approach, reducing human error and workload. However, potential biases may affect outcomes and challenge less-experienced practitioners. AI in healthcare, including ChatGPT, is still evolving, and further research is needed to understand its full potential in analyzing clinical information, establishing diagnoses, and suggesting treatments.
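The univariate ordinal regression named in the results can be sketched with statsmodels' proportional-odds OrderedModel, assuming 1-5 Likert scores. The data below are simulated; this only demonstrates the class of model, not the study's dataset or exact analysis.

```python
# Sketch: ordinal (proportional-odds) regression of treatment-plan scores on
# diagnosis scores, both 1-5 Likert ratings. All scores are simulated.
import numpy as np
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
diagnosis = rng.integers(1, 6, size=60)                               # predictor (made up)
treatment = np.clip(diagnosis + rng.integers(-1, 2, size=60), 1, 5)   # outcome (made up)

model = OrderedModel(treatment, diagnosis.reshape(-1, 1), distr="logit")
result = model.fit(method="bfgs", disp=False)
# A positive coefficient means higher diagnosis agreement predicts higher
# treatment-plan agreement, mirroring the correlation reported above.
print(result.summary())
```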
Affiliation(s)
- Ömer Uranbey
- Oral and Maxillofacial Surgery, Ordu University, Ordu, TUR
- Furkan Özbey
- Oral and Maxillofacial Radiology, Ordu University, Ordu, TUR
- Ömer Kaygısız
- Oral and Maxillofacial Surgery, Gaziantep University, Gaziantep, TUR
- Ferhat Ayrancı
- Oral and Maxillofacial Surgery, Ordu University, Ordu, TUR
42
Al Rawi ZM, Kirby BJ, Albrecht PA, Nuelle JAV, London DA. Experimenting With the New Frontier: Artificial Intelligence-Powered Chat Bots in Hand Surgery. Hand (N Y) 2024:15589447241238372. [PMID: 38525794] [PMCID: PMC11571578] [DOI: 10.1177/15589447241238372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2024]
Abstract
Background: Increased utilization of artificial intelligence (AI)-driven search and large language models by the lay and medical communities requires us to evaluate the accuracy of AI responses to common hand surgery questions. We hypothesized that the answers to most hand surgery questions posed to an AI large language model would be correct. Methods: Using the topics covered in Green's Operative Hand Surgery, 8th Edition, as a guide, 56 hand surgery questions were compiled and posed to ChatGPT (OpenAI, San Francisco, CA). Two attending hand surgeons then independently reviewed ChatGPT's answers for response accuracy, completeness, and usefulness. Cohen's kappa analysis was performed to assess interrater agreement. Results: An average of 45 of the 56 questions posed to ChatGPT were deemed correct (80%), 39 responses were deemed useful (70%), and 32 responses were deemed complete (57%) by the reviewers. Kappa analysis demonstrated "fair to moderate" agreement between the two raters. Reviewers disagreed on 11 questions regarding correctness, 16 questions regarding usefulness, and 19 questions regarding completeness. Conclusions: Large language models have the potential to both positively and negatively impact patient perceptions and to guide referral patterns based on the accuracy, completeness, and usefulness of their responses. While most responses met these criteria, more precise responses are needed to ensure patient safety and avoid misinformation. Individual hand surgeons and surgical societies must understand these technologies and interface with the companies developing them to provide our patients with the best possible care.
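Interrater agreement of the kind reported here is conventionally computed with Cohen's kappa; a minimal sketch using scikit-learn follows, with invented binary correctness grades from two reviewers.

```python
# Sketch: Cohen's kappa for two raters grading responses correct (1) / incorrect (0).
# The grades below are invented.
from sklearn.metrics import cohen_kappa_score

rater_a = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1]
rater_b = [1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1]

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa: {kappa:.2f}")  # ~0.4 here, at the fair/moderate boundary
```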
Affiliation(s)
- Benjamin J. Kirby
- Department of Surgery, University of Missouri Health Care, Columbia, USA
- Julia A. V. Nuelle
- Department of Orthopaedics, University of Missouri Health Care, Columbia, USA
- Daniel A. London
- Department of Orthopaedics, University of Missouri Health Care, Columbia, USA
43
Pal S, Bhattacharya M, Lee SS, Chakraborty C. A Domain-Specific Next-Generation Large Language Model (LLM) or ChatGPT is Required for Biomedical Engineering and Research. Ann Biomed Eng 2024; 52:451-454. [PMID: 37428337] [DOI: 10.1007/s10439-023-03306-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/29/2023] [Accepted: 07/03/2023] [Indexed: 07/11/2023]
Abstract
Large language models such as ChatGPT have recently gained extensive media coverage, and the use of ChatGPT has increased drastically. Biomedical researchers, engineers, and clinicians have shown significant interest and started using it because of its diverse applications, especially in the biomedical field. However, ChatGPT has been found to provide incorrect or only partly correct information at times, and it cannot supply the most recent information. Therefore, we urgently advocate a next-generation, domain-specific chatbot for biomedical engineering and research that provides error-free, more accurate, and up-to-date information. Such a domain-specific chatbot could perform diversified functions in biomedical engineering, such as supporting innovation and assisting in medical device design. If produced, a biomedical domain-specific, AI-enabled chatbot would revolutionize biomedical engineering and research.
Affiliation(s)
- Soumen Pal
- School of Mechanical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India
- Manojit Bhattacharya
- Department of Zoology, Fakir Mohan University, Vyasa Vihar, Balasore, Odisha, 756020, India
- Sang-Soo Lee
- Institute for Skeletal Aging & Orthopaedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon, Gangwon-Do, 24252, Republic of Korea
- Chiranjib Chakraborty
- Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal, 700126, India.
44
Abi-Rafeh J, Xu HH, Kazan R, Tevlin R, Furnas H. Large Language Models and Artificial Intelligence: A Primer for Plastic Surgeons on the Demonstrated and Potential Applications, Promises, and Limitations of ChatGPT. Aesthet Surg J 2024; 44:329-343. [PMID: 37562022] [DOI: 10.1093/asj/sjad260] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 05/26/2023] [Revised: 08/02/2023] [Accepted: 08/04/2023] [Indexed: 08/12/2023]
Abstract
BACKGROUND The rapidly evolving field of artificial intelligence (AI) holds great potential for plastic surgeons. ChatGPT, a recently released AI large language model (LLM), promises applications across many disciplines, including healthcare. OBJECTIVES The aim of this article was to provide a primer for plastic surgeons on AI, LLMs, and ChatGPT, including an analysis of currently demonstrated and proposed clinical applications. METHODS A systematic review was performed to identify medical and surgical literature on ChatGPT's proposed clinical applications. Variables assessed included applications investigated, command tasks provided, user input information, AI-emulated human skills, output validation, and reported limitations. RESULTS The analysis included 175 articles reporting on 13 plastic surgery applications and 116 additional clinical applications, categorized by field and purpose. Thirty-four applications within plastic surgery are thus proposed, with relevance to different target audiences, including attending plastic surgeons (n = 17, 50%), trainees/educators (n = 8, 24.0%), researchers/scholars (n = 7, 21%), and patients (n = 2, 6%). The 15 identified limitations of ChatGPT were categorized by training data, algorithm, and ethical considerations. CONCLUSIONS Widespread use of ChatGPT in plastic surgery will depend on rigorous research of proposed applications to validate performance and address limitations. This systematic review aims to guide research, development, and regulation to safely adopt AI in plastic surgery.
45
Christy M, Morris MT, Goldfarb CA, Dy CJ. Appropriateness and Reliability of an Online Artificial Intelligence Platform's Responses to Common Questions Regarding Distal Radius Fractures. J Hand Surg Am 2024; 49:91-98. [PMID: 38069953] [DOI: 10.1016/j.jhsa.2023.10.019] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 06/19/2023] [Revised: 10/25/2023] [Accepted: 10/26/2023] [Indexed: 02/05/2024]
Abstract
PURPOSE Chat Generative Pre-Trained Transformer (ChatGPT) is a novel artificial intelligence chatbot that is changing the way humans gather information online. The purpose of this study was to investigate ChatGPT's ability to appropriately and reliably answer common questions regarding distal radius fractures. METHODS Thirty common questions regarding distal radius fractures were presented in an identical manner to the online ChatGPT-3.5 interface three separate times, yielding 90 unique responses because ChatGPT produces an original answer with each query. All responses were graded as "appropriate," "appropriate but incomplete," or "inappropriate" through consensus discussion among three hand surgeon reviewers. The questions were additionally subcategorized into one of four domains based on Bloom's cognitive learning taxonomy, and descriptive statistics were reported. RESULTS Seventy of the 90 total responses (78%) produced by ChatGPT were "appropriate," and 29 of the 30 questions (97%) had at least one response considered appropriate (of the three possible). However, only 17 of the 30 questions (57%) were answered appropriately on all three iterations. The test-retest reliability of ChatGPT was poor, with an intraclass correlation coefficient of 0.12. Finally, ChatGPT performed best when answering questions requiring lower-order thinking skills (Bloom's levels 1-3) and less well on level 4 questions. CONCLUSIONS This study found that although ChatGPT has the capability to answer common questions regarding distal radius fractures, caution should be taken before implementing its use, given ChatGPT's inconsistency in providing a complete and accurate response to the same question every time. CLINICAL RELEVANCE As the popularity and technology of ChatGPT continue to grow, it is important to understand the potential and limitations of this platform to determine how it may be best implemented to improve patient care.
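The poor test-retest reliability quantified above uses an intraclass correlation coefficient. Below is a minimal one-way random-effects ICC(1,1) computed by hand with NumPy on invented appropriateness grades; rows are questions and columns are the three query repetitions. This illustrates the statistic, not the authors' exact ICC variant.

```python
# Sketch: one-way random-effects ICC(1,1) for test-retest reliability.
# Rows = questions, columns = repeated queries; grades (0-2) are invented.
import numpy as np

scores = np.array([
    [2, 2, 1],
    [2, 1, 1],
    [0, 2, 1],
    [2, 2, 2],
    [1, 0, 2],
])

n, k = scores.shape
row_means = scores.mean(axis=1)

# One-way ANOVA mean squares between and within targets (questions).
ms_between = k * np.sum((row_means - scores.mean()) ** 2) / (n - 1)
ms_within = np.sum((scores - row_means[:, None]) ** 2) / (n * (k - 1))

icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
print(f"ICC(1,1) = {icc:.2f}")  # values near zero indicate poor reliability
```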
Affiliation(s)
- Michele Christy
- Department of Orthopaedic Surgery, Washington University in St. Louis, St. Louis, MO
- Marie T Morris
- Department of Orthopaedic Surgery, Washington University in St. Louis, St. Louis, MO
- Charles A Goldfarb
- Department of Orthopaedic Surgery, Washington University in St. Louis, St. Louis, MO
- Christopher J Dy
- Department of Orthopaedic Surgery, Washington University in St. Louis, St. Louis, MO.
46
Liu JW, McCulloch PC. SONNET #29888: ChatGPT Finds Poetry in Anterior Cruciate Ligament Reconstruction and Return to Sport. Arthroscopy 2024; 40:197-198. [PMID: 38296427] [DOI: 10.1016/j.arthro.2023.09.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 09/26/2023] [Indexed: 02/08/2024]
Affiliation(s)
- Jennifer W Liu
- Department of Orthopedic Surgery & Sports Medicine, Houston Methodist Hospital, Houston, Texas, U.S.A
- Patrick C McCulloch
- Department of Orthopedic Surgery & Sports Medicine, Houston Methodist Hospital, Houston, Texas, U.S.A
47
Elyoseph Z, Levkovich I, Shinan-Altman S. Assessing prognosis in depression: comparing perspectives of AI models, mental health professionals and the general public. Fam Med Community Health 2024; 12:e002583. [PMID: 38199604] [PMCID: PMC10806564] [DOI: 10.1136/fmch-2023-002583] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 01/12/2024]
Abstract
BACKGROUND Artificial intelligence (AI) has rapidly permeated various sectors, including healthcare, highlighting its potential to facilitate mental health assessments. This study explores the underexplored domain of AI's role in evaluating prognosis and long-term outcomes in depressive disorders, offering insights into how AI large language models (LLMs) compare with human perspectives. METHODS Using case vignettes, we conducted a comparative analysis involving different LLMs (ChatGPT-3.5, ChatGPT-4, Claude and Bard), mental health professionals (general practitioners, psychiatrists, clinical psychologists and mental health nurses), and the general public, drawing on previously reported data. We evaluated the LLMs' ability to generate a prognosis, anticipated outcomes with and without professional intervention, and envisioned long-term positive and negative consequences for individuals with depression. RESULTS In most of the examined cases, the four LLMs consistently identified depression as the primary diagnosis and recommended a combined treatment of psychotherapy and antidepressant medication. ChatGPT-3.5 exhibited a significantly pessimistic prognosis distinct from the other LLMs, the professionals and the public. ChatGPT-4, Claude and Bard aligned closely with the perspectives of mental health professionals and the general public, all of whom anticipated no improvement or worsening without professional help. Regarding long-term outcomes, ChatGPT-3.5, Claude and Bard consistently projected significantly fewer negative long-term consequences of treatment than ChatGPT-4. CONCLUSIONS This study underscores the potential of AI to complement the expertise of mental health professionals and promote a collaborative paradigm in mental healthcare. The observation that three of the four LLMs closely mirrored the anticipations of mental health experts in scenarios involving treatment underscores the technology's prospective value in offering professional clinical forecasts. The pessimistic outlook presented by ChatGPT-3.5 is concerning, as it could potentially diminish patients' drive to initiate or continue depression therapy. In summary, although LLMs show potential in enhancing healthcare services, their utilisation requires thorough verification and seamless integration with human judgement and skills.
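One simple way to formalize such a comparison of categorical prognosis judgments is a chi-square test of independence between rater groups. The sketch below uses invented tallies and is not the authors' analysis, only an illustration of the general technique.

```python
# Sketch: chi-square test comparing prognosis category counts from an LLM
# versus mental health professionals. All counts are invented.
from scipy.stats import chi2_contingency

#                      worse  no change  improve
llm_counts          = [  14,         4,       2]
professional_counts = [  10,         7,       3]

chi2, p, dof, _ = chi2_contingency([llm_counts, professional_counts])
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3f}")
```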
Affiliation(s)
- Zohar Elyoseph
- Department of Psychology and Educational Counseling, The Center for Psychobiological Research, Max Stern Yezreel Valley College, Yezreel Valley, Israel
- Department of Brain Sciences, Imperial College London, London, UK
- Inbar Levkovich
- Faculty of Graduate Studies, Oranim Academic College, Tivon, Israel
- Shiri Shinan-Altman
- The Louis and Gabi Weisfeld School of Social Work, Bar-Ilan University, Ramat Gan, Tel Aviv, Israel
48
Padovan M, Cosci B, Petillo A, Nerli G, Porciatti F, Scarinci S, Carlucci F, Dell’Amico L, Meliani N, Necciari G, Lucisano VC, Marino R, Foddis R, Palla A. ChatGPT in Occupational Medicine: A Comparative Study with Human Experts. Bioengineering (Basel) 2024; 11:57. [PMID: 38247934] [PMCID: PMC10813435] [DOI: 10.3390/bioengineering11010057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Revised: 01/01/2024] [Accepted: 01/04/2024] [Indexed: 01/23/2024]
Abstract
The objective of this study is to evaluate ChatGPT's accuracy and reliability in answering complex medical questions related to occupational health and to explore the implications and limitations of AI in occupational health medicine. The study also provides recommendations for future research in this area and informs decision-makers about AI's impact on healthcare. A group of physicians was enlisted to create a dataset of questions and answers on Italian occupational medicine legislation. The physicians were divided into two teams, and each team member was assigned a different subject area. ChatGPT was used to generate answers for each question, both with and without legislative context. The two teams then blindly evaluated the human- and AI-generated answers, each team reviewing the other team's work. On a 5-point Likert scale, occupational physicians outperformed ChatGPT in generating accurate answers, while the answers provided by ChatGPT with access to legislative texts were comparable to those of professional doctors. Still, we found that users tend to prefer answers generated by humans, indicating that while ChatGPT is useful, users still value the opinions of occupational medicine professionals.
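Comparing Likert-scored human and AI answers, as described above, is typically done with a rank-based test. Here is a minimal sketch using a two-sided Mann-Whitney U test on invented ratings; it illustrates the approach, not the study's actual analysis.

```python
# Sketch: Mann-Whitney U test on Likert accuracy ratings (ordinal data).
# Ratings below are invented.
from scipy.stats import mannwhitneyu

human_scores   = [5, 4, 5, 4, 5, 3, 5, 4, 4, 5]
chatgpt_scores = [4, 3, 4, 5, 3, 4, 4, 3, 5, 4]

stat, p = mannwhitneyu(human_scores, chatgpt_scores, alternative="two-sided")
print(f"U = {stat}, p = {p:.3f}")  # a small p would indicate a genuine rating gap
```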
Affiliation(s)
- Martina Padovan
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy
- Bianca Cosci
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy
- Armando Petillo
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy
- Gianluca Nerli
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy
- Francesco Porciatti
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy
- Sergio Scarinci
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy
- Francesco Carlucci
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy
- Letizia Dell’Amico
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy
- Niccolò Meliani
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy
- Gabriele Necciari
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy
- Vincenzo Carmelo Lucisano
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy
- Riccardo Marino
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy
- Rudy Foddis
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126 Pisa, Italy
49
Younis HA, Eisa TAE, Nasser M, Sahib TM, Noor AA, Alyasiri OM, Salisu S, Hayder IM, Younis HA. A Systematic Review and Meta-Analysis of Artificial Intelligence Tools in Medicine and Healthcare: Applications, Considerations, Limitations, Motivation and Challenges. Diagnostics (Basel) 2024; 14:109. [PMID: 38201418] [PMCID: PMC10802884] [DOI: 10.3390/diagnostics14010109] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 10/29/2023] [Revised: 12/02/2023] [Accepted: 12/04/2023] [Indexed: 01/12/2024]
Abstract
Artificial intelligence (AI) has emerged as a transformative force in various sectors, including medicine and healthcare. Large language models like ChatGPT showcase AI's potential by generating human-like text from prompts. ChatGPT's adaptability holds promise for reshaping medical practices, improving patient care, and enhancing interactions among healthcare professionals, patients, and data. In pandemic management, it can rapidly disseminate vital information; it can also serve as a virtual assistant in surgical consultations, aid dental practices, simplify medical education, and support disease diagnosis. A systematic literature review using the PRISMA approach explored AI's transformative potential in healthcare, highlighting ChatGPT's versatile applications, limitations, motivations, and challenges. A total of 82 papers were categorised into eight major areas: G1, treatment and medicine; G2, buildings and equipment; G3, parts of the human body and areas of disease; G4, patients; G5, citizens; G6, cellular imaging, radiology, pulse and medical images; G7, doctors and nurses; and G8, tools, devices and administration. Balancing AI's role with human judgment remains a challenge. In conclusion, ChatGPT's diverse medical applications demonstrate its potential for innovation and make it a valuable resource and guide for students, academics, and researchers in medicine and healthcare.
Affiliation(s)
- Hussain A. Younis
- College of Education for Women, University of Basrah, Basrah 61004, Iraq
- Maged Nasser
- Computer & Information Sciences Department, Universiti Teknologi PETRONAS, Seri Iskandar 32610, Malaysia
- Thaeer Mueen Sahib
- Kufa Technical Institute, Al-Furat Al-Awsat Technical University, Kufa 54001, Iraq
- Ameen A. Noor
- Computer Science Department, College of Education, University of Almustansirya, Baghdad 10045, Iraq
- Sani Salisu
- Department of Information Technology, Federal University Dutse, Dutse 720101, Nigeria
- Israa M. Hayder
- Qurna Technique Institute, Southern Technical University, Basrah 61016, Iraq
- Hameed AbdulKareem Younis
- Department of Cybersecurity, College of Computer Science and Information Technology, University of Basrah, Basrah 61016, Iraq
50
Kayaalp ME, Ollivier M, Winkler PW, Dahmen J, Musahl V, Hirschmann MT, Karlsson J. Embrace responsible ChatGPT usage to overcome language barriers in academic writing. Knee Surg Sports Traumatol Arthrosc 2024; 32:5-9. [PMID: 38226673] [DOI: 10.1002/ksa.12014] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 10/21/2023] [Accepted: 11/08/2023] [Indexed: 01/17/2024]
Affiliation(s)
- M Enes Kayaalp
- Department of Orthopaedic Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Department for Orthopaedics and Traumatology, Istanbul Kartal Research and Training Hospital, Istanbul, Turkiye
- Matthieu Ollivier
- CNRS, Institute of Movement Sciences (ISM), Aix Marseille University, Marseille, France
- Philipp W Winkler
- Department for Orthopaedics and Traumatology, Kepler University Hospital GmbH, Linz, Austria
- Jari Dahmen
- Department of Orthopaedic Surgery and Sports Medicine, Amsterdam Movement Sciences, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Academic Center for Evidence Based Sports Medicine (ACES), Amsterdam, The Netherlands
- Amsterdam Collaboration for Health and Safety in Sports (ACHSS), International Olympic Committee (IOC) Research Center Amsterdam UMC, Amsterdam, The Netherlands
- Volker Musahl
- Department of Orthopaedic Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Michael T Hirschmann
- Department of Orthopedic Surgery and Traumatology, Head Knee Surgery and DKF Head of Research, Kantonsspital Baselland, Bruderholz, Bottmingen, Switzerland
- University of Basel, Basel, Switzerland
- Jon Karlsson
- Department for Orthopaedics, Sahlgrenska University Hospital, Institute of Clinical Sciences, Sahlgrenska Academy, Gothenburg University, Gothenburg, Sweden