Copyright
©The Author(s) 2025. Published by Baishideng Publishing Group Inc. All rights reserved.
Comparison of ChatGPT-3.5 and GPT-4 as potential tools in artificial intelligence-assisted clinical practice in renal and liver transplantation
Chrysanthos D Christou, Olga Sitsiani, Panagiotis Boutos, Georgios Katsanos, Georgios Papadakis, Anastasios Tefas, Vassilios Papalois, Georgios Tsoulfas
Chrysanthos D Christou, Georgios Katsanos, Georgios Tsoulfas, Center for Research and Innovation in Solid Organ Transplantation, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki 54622, Greece
Olga Sitsiani, Panagiotis Boutos, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki 54622, Greece
Georgios Papadakis, Department of Nephrology and Transplantation, Guy’s Hospital, Guy’s and St Thomas’ NHS Foundation Trust, London SE1 1UL, United Kingdom
Anastasios Tefas, Computational Intelligence and Deep Learning Group, Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki 54636, Greece
Vassilios Papalois, Renal and Transplant Unit, Hammersmith Hospital, Imperial College Healthcare NHS Trust, London W12 0HS, United Kingdom
Author contributions: Christou CD, Sitsiani O, Boutos P, Katsanos G, Papadakis G, Tefas A, Papalois V, and Tsoulfas G gathered and prepared the clinical scenarios; Christou CD, Sitsiani O, and Boutos P ran the conversations and recorded the answers; Christou CD performed the statistical analysis and drafted the manuscript; and all authors reviewed and edited the manuscript.
Institutional review board statement: This study was conducted using anonymized patient data that are derived from medical records and in compliance with the Declaration of Helsinki and its later amendments and thus does not require IRB approval.
Informed consent statement: This study was conducted using anonymized patient data that are derived from medical records and in compliance with the Declaration of Helsinki and its later amendments and thus does not require informed consent.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
STROBE statement: The authors have read the STROBE Statement-checklist of items, and the manuscript was prepared and revised according to the STROBE Statement-checklist of items.
Data sharing statement: The data underlying this article are available upon reasonable request from the corresponding author.
Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Chrysanthos D Christou, MD, Center for Research and Innovation in Solid Organ Transplantation, School of Medicine, Aristotle University of Thessaloniki, 49 Konstantinoupoleos Street, Thessaloniki 54622, Greece. christouchrysanthosd@gmail.com
Received: November 25, 2024
Revised: January 26, 2025
Accepted: March 5, 2025
Published online: September 18, 2025
Processing time: 147 Days and 12.2 Hours
BACKGROUND
Kidney and liver transplantation are two sub-specialized medical disciplines, with transplant professionals spending decades in training. While artificial intelligence (AI)-based tools could potentially assist in everyday clinical practice, comparative assessment of their effectiveness in clinical decision-making remains limited.
AIM
To compare ChatGPT and GPT-4 as potential tools in AI-assisted clinical practice in renal and liver transplantation.
METHODS
In total, 400 questions tested the knowledge and decision-making capacity of ChatGPT and GPT-4 across various renal and liver transplantation concepts. Specifically, 294 multiple-choice questions were derived from open-access sources, 63 questions were derived from published open-access case reports, and 43 were derived from unpublished cases of patients treated at our department. The evaluation covered a wide range of topics, including clinical predictors, treatment options, and diagnostic criteria.
RESULTS
ChatGPT correctly answered 50.3% of the 294 multiple-choice questions, while GPT-4 performed better, correctly answering 70.7% (P < 0.001). For the 63 questions from published cases, ChatGPT achieved an agreement rate of 50.79% and a partial agreement rate of 17.46%, while GPT-4 achieved an agreement rate of 80.95% and a partial agreement rate of 9.52% (P = 0.01). For the 43 questions from unpublished cases, ChatGPT achieved an agreement rate of 53.49% and a partial agreement rate of 23.26%, while GPT-4 achieved an agreement rate of 72.09% and a partial agreement rate of 6.98% (P = 0.004). When all cases were analyzed by the nature of the task, GPT-4 performed particularly well, providing a differential diagnosis that included the final diagnosis in 90% of the cases (P = 0.008) and successfully predicting the patient's prognosis in 100% of related questions (P < 0.001).
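As an illustration of how the multiple-choice comparison can be checked, the following is a minimal sketch of a two-sided two-proportion z-test. It is not the authors' analysis: the abstract does not state which test was used, the correct-answer counts are reconstructed from the reported percentages (an assumption), and the two models are treated as independent samples even though they answered the same questions. A paired test such as McNemar's would need per-question results, which are not reported here.

```python
import math


def two_proportion_z_test(successes_a, successes_b, n_a, n_b):
    """Two-sided z-test for the difference between two independent proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis of equal accuracy.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


N_QUESTIONS = 294  # multiple-choice questions reported in the abstract

# Assumption: counts are back-calculated from the reported percentages
# (50.3% and 70.7%); the abstract does not give the raw counts.
chatgpt_correct = round(0.503 * N_QUESTIONS)
gpt4_correct = round(0.707 * N_QUESTIONS)

z, p_value = two_proportion_z_test(
    chatgpt_correct, gpt4_correct, N_QUESTIONS, N_QUESTIONS
)
print(f"ChatGPT correct: {chatgpt_correct}/{N_QUESTIONS}")
print(f"GPT-4 correct:   {gpt4_correct}/{N_QUESTIONS}")
print(f"z = {z:.2f}, two-sided p = {p_value:.2e}")
```

Under these assumptions the p-value falls well below 0.001, which is consistent with the reported P < 0.001 for the multiple-choice questions.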
CONCLUSION
GPT-4 consistently provided more accurate and reliable clinical recommendations than ChatGPT, with higher percentages of full agreement in both renal and liver transplantation. Our findings support the potential utility of AI models like ChatGPT and GPT-4 in AI-assisted clinical practice as sources of accurate, individualized medical information and as aids to decision-making. The progression and refinement of such AI-based tools could reshape the future of clinical practice, making their early adoption and adaptation by physicians a necessity.
Core Tip: GPT-4 outperformed ChatGPT in a wide range of clinical scenarios related to kidney and liver transplantation, demonstrating greater accuracy and alignment with physician decisions across a variety of tasks, including differential diagnosis, choosing appropriate diagnostic tests and treatment, and predicting the prognosis of patients. These findings highlight the potential of artificial intelligence models like GPT-4 as valuable tools in supporting clinical decision-making in transplantation.