Published online Sep 18, 2025. doi: 10.5500/wjt.v15.i3.103536
Revised: January 26, 2025
Accepted: March 5, 2025
Published online: September 18, 2025
Processing time: 147 Days and 12.2 Hours
Kidney and liver transplantation are two sub-specialized medical disciplines, with transplant professionals spending decades in training. While artificial intelligence-based (AI-based) tools could potentially assist in everyday clinical practice, com
To compare the use of ChatGPT and GPT-4 as potential tools in AI-assisted clinical practice in these challenging disciplines.
In total, 400 different questions tested ChatGPT's and GPT-4's knowledge and decision-making capacity across various renal and liver transplantation concepts. Specifically, 294 multiple-choice questions were derived from open-access sources, 63 questions were derived from published open-access case reports, and 43 from unpublished cases of patients treated at our department. The evaluation covered a broad range of topics, including clinical predictors, treatment options, and diagnostic criteria, among others.
ChatGPT correctly answered 50.3% of the 294 multiple-choice questions, while GPT-4 performed better, correctly answering 70.7% (P < 0.001). On the 63 questions from published cases, ChatGPT achieved an agreement rate of 50.79% and partial agreement of 17.46%, compared with 80.95% agreement and 9.52% partial agreement for GPT-4 (P = 0.01). On the 43 questions from unpublished cases, ChatGPT achieved an agreement rate of 53.49% and partial agreement of 23.26%, compared with 72.09% agreement and 6.98% partial agreement for GPT-4 (P = 0.004). When results were stratified by task type across all cases, GPT-4 was notably strong, providing a differential diagnosis that included the final diagnosis in 90% of cases (P = 0.008) and correctly predicting the patient's prognosis in 100% of related questions (P < 0.001).
GPT-4 consistently provided more accurate and reliable clinical recommendations than ChatGPT, with higher rates of full agreement in both renal and liver transplantation. Our findings support the potential utility of AI models like ChatGPT and GPT-4 in AI-assisted clinical practice as sources of accurate, individualized medical information and as aids to decision-making. The progression and refinement of such AI-based tools could reshape the future of clinical practice, making their early adoption and adaptation by physicians a necessity.
Core Tip: GPT-4 outperformed ChatGPT in a wide range of clinical scenarios related to kidney and liver transplantation, demonstrating greater accuracy and closer alignment with physician decisions across a variety of tasks, including differential diagnosis, selection of appropriate diagnostic tests and treatments, and prediction of patient prognosis. These findings highlight the potential of artificial intelligence models like GPT-4 as valuable tools to support clinical decision-making in transplantation.
