Comparison of ChatGPT-3.5 and GPT-4 as potential tools in artificial intelligence-assisted clinical practice in renal and liver transplantation

doi:10.5500/wjt.v15.i3.103536

Journal Information of This Article

Publication Name

World Journal of Transplantation

ISSN

2220-3230

Publisher of This Article

Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA

Observational Study Open Access

World J Transplant. Sep 18, 2025; 15(3): 103536
Published online Sep 18, 2025. doi: 10.5500/wjt.v15.i3.103536

Comparison of ChatGPT-3.5 and GPT-4 as potential tools in artificial intelligence-assisted clinical practice in renal and liver transplantation

Chrysanthos D Christou, Olga Sitsiani, Panagiotis Boutos, Georgios Katsanos, Georgios Papadakis, Anastasios Tefas, Vassilios Papalois, Georgios Tsoulfas

Chrysanthos D Christou, Georgios Katsanos, Georgios Tsoulfas, Center for Research and Innovation in Solid Organ Transplantation, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki 54622, Greece

Olga Sitsiani, Panagiotis Boutos, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki 54622, Greece

Georgios Papadakis, Department of Nephrology and Transplantation, Guy’s Hospital, Guy’s and St Thomas’ NHS Foundation Trust, London SE1 1UL, United Kingdom

Anastasios Tefas, Computational Intelligence and Deep Learning Group, Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki 54636, Greece

Vassilios Papalois, Renal and Transplant Unit, Hammersmith Hospital, Imperial College Healthcare NHS Trust, London W12 0HS, United Kingdom

ORCID number: Chrysanthos D Christou (0000-0002-5417-8686); Georgios Katsanos (0000-0002-5845-8175); Vassilios Papalois (0000-0003-1645-8684); Georgios Tsoulfas (0000-0001-5043-7962).

Author contributions: Christou CD, Sitsiani O, Boutos P, Katsanos G, Papadakis G, Tefas A, Papalois V, and Tsoulfas G gathered and prepared the clinical scenarios; Christou CD, Sitsiani O, and Boutos P ran the conversations and recorded the answers; Christou CD performed the statistical analysis and drafted the manuscript; and all authors reviewed and edited the manuscript.

Institutional review board statement: This study was conducted using anonymized patient data that are derived from medical records and in compliance with the Declaration of Helsinki and its later amendments and thus does not require IRB approval.

Informed consent statement: This study was conducted using anonymized patient data that are derived from medical records and in compliance with the Declaration of Helsinki and its later amendments and thus does not require informed consent.

Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.

STROBE statement: The authors have read the STROBE Statement-checklist of items, and the manuscript was prepared and revised according to the STROBE Statement-checklist of items.

Data sharing statement: The data underlying this article are available upon reasonable request from the corresponding author.

Corresponding author: Chrysanthos D Christou, MD, Center for Research and Innovation in Solid Organ Transplantation, School of Medicine, Aristotle University of Thessaloniki, 49 Konstantinoupoleos Street, Thessaloniki 54622, Greece. christouchrysanthosd@gmail.com

Received: November 25, 2024
Revised: January 26, 2025
Accepted: March 5, 2025
Published online: September 18, 2025
Processing time: 147 Days and 12.2 Hours

Abstract

BACKGROUND

Kidney and liver transplantation are two sub-specialized medical disciplines, with transplant professionals spending decades in training. While artificial intelligence-based (AI-based) tools could potentially assist in everyday clinical practice, comparative assessment of their effectiveness in clinical decision-making remains limited.

AIM

To compare the use of ChatGPT and GPT-4 as potential tools in AI-assisted clinical practice in these challenging disciplines.

METHODS

In total, 400 different questions tested the knowledge and decision-making capacity of ChatGPT and GPT-4 in various renal and liver transplantation concepts. Specifically, 294 multiple-choice questions were derived from open-access sources, 63 questions were derived from published open-access case reports, and 43 from unpublished cases of patients treated at our department. The evaluation covered a wide range of topics, including clinical predictors, treatment options, and diagnostic criteria, among others.

RESULTS

ChatGPT correctly answered 50.3% of the 294 multiple-choice questions, while GPT-4 demonstrated a higher performance, answering 70.7% of questions (P < 0.001). Regarding the 63 questions from published cases, ChatGPT achieved an agreement rate of 50.79% and partial agreement of 17.46%, while GPT-4 demonstrated an agreement rate of 80.95% and partial agreement of 9.52% (P = 0.01). Regarding the 43 questions from unpublished cases, ChatGPT demonstrated an agreement rate of 53.49% and partial agreement of 23.26%, while GPT-4 demonstrated an agreement rate of 72.09% and partial agreement of 6.98% (P = 0.004). When factoring by the nature of the task for all cases, notably, GPT-4 demonstrated outstanding performance, providing a differential diagnosis that included the final diagnosis in 90% of the cases (P = 0.008), and successfully predicting the prognosis of the patient in 100% of related questions (P < 0.001).

CONCLUSION

GPT-4 consistently provided more accurate and reliable clinical recommendations with higher percentages of full agreements both in renal and liver transplantation compared with ChatGPT. Our findings support the potential utility of AI models like ChatGPT and GPT-4 in AI-assisted clinical practice as sources of accurate, individualized medical information and facilitating decision-making. The progression and refinement of such AI-based tools could reshape the future of clinical practice, making their early adoption and adaptation by physicians a necessity.

Key Words: Artificial intelligence; ChatGPT; GPT-4; Transplantation; Kidney; Liver; Clinical decision support; Generative artificial intelligence

Core Tip: GPT-4 outperformed ChatGPT in a wide range of clinical scenarios related to kidney and liver transplantation, demonstrating greater accuracy and alignment with physician decisions across a variety of tasks, including differential diagnosis, choosing appropriate diagnostic tests and treatment, and predicting the prognosis of patients. These findings highlight the potential of artificial intelligence models like GPT-4 as valuable tools in supporting clinical decision-making in transplantation.

Citation: Christou CD, Sitsiani O, Boutos P, Katsanos G, Papadakis G, Tefas A, Papalois V, Tsoulfas G. Comparison of ChatGPT-3.5 and GPT-4 as potential tools in artificial intelligence-assisted clinical practice in renal and liver transplantation. World J Transplant 2025; 15(3): 103536
URL: https://www.wjgnet.com/2220-3230/full/v15/i3/103536.htm
DOI: https://dx.doi.org/10.5500/wjt.v15.i3.103536

INTRODUCTION

Artificial intelligence (AI) is an umbrella term used to describe any application where computer systems perform tasks traditionally linked with human intelligence. AI is a broad field comprising many interrelated subfields, including, among others, machine learning and probabilistic reasoning, deep learning, fuzzy systems, computer vision, and natural language processing[1]. Despite their differences, all these subfields have one thing in common: They are driven by advances in big data and computing power. Some disciplines of AI, particularly machine learning, have found profound applications in healthcare, with models being utilized in the prevention, diagnosis, treatment, and prognosis of many diseases[2]. However, the use of other disciplines, such as natural language processing, in healthcare has been, until recently, limited. ChatGPT, released in late 2022 by the AI research company OpenAI (San Francisco, CA, United States), has had a profound impact on many industries. ChatGPT is a large language model that uses deep learning techniques to produce human-like responses to natural language inputs, based on a vast corpus of text data[3]. It is designed to interact with the user in a human-like manner, understanding users’ requests and responding appropriately to assist in problem-solving, or simply to conduct a real-time human-to-computer dialogue.

Despite not being developed as a medical tool, its potential application in healthcare has attracted widespread attention, with articles investigating its ability to provide reliable medical information on a variety of medical topics, pass medical exams, and assist in medical writing[4-6]. Contemporary healthcare systems urgently need to enhance their precision and accuracy in the examination, diagnosis, and treatment of the patient while reducing the time required for these procedures. Of utmost importance in this process is correct and timely decision-making. Two characteristics of ChatGPT, the wide variety of information on which it has been trained and its capability to provide answers constantly, raise the question of whether this breakthrough could be used in everyday medical decision-making processes[7]. GPT-4, the latest large language model introduced by OpenAI, has already been shown to outperform its “predecessor”, ChatGPT, in medical exams[8]. AI tools have shown promising performance in tasks such as data analysis, diagnostic support, and clinical decision-making. However, their adoption into the clinical setting is hindered by several challenges, such as their underperformance in complex clinical reasoning and their proneness to biases, particularly when rare conditions are underrepresented in the training data. Additionally, the lack of transparency in how their decisions are generated undermines their trustworthiness. Transplantation is a highly sub-specialized medical discipline, with transplant professionals spending decades in training. Few studies have been published regarding heart and lung transplantation[9,10]. This paper aims to investigate the performance of ChatGPT in the challenging medical disciplines of kidney and liver transplantation and compare it with the performance of GPT-4. Currently, only a few efforts exist in the literature investigating the role of ChatGPT in renal and liver transplantation and particularly comparing the performance of ChatGPT with GPT-4 in these disciplines[11-13].

MATERIALS AND METHODS

Resources

In total, 400 different questions tested the knowledge and decision-making capacity of ChatGPT and GPT-4 in various concepts regarding renal and liver transplantation as follows: (1) 294 multiple-choice questions regarding liver and renal transplantation were collected from various open-access sources, including transplantation associations, medical examinations (such as the United States Medical Licensing Examination), and book chapters; (2) 63 questions were derived from 20 open-access, published case reports regarding renal and liver transplantation collected from the literature. From each case, a series of questions regarding differential diagnosis, diagnostic test, appropriate treatment, and prognosis were derived; and (3) 43 questions were derived from 10 unpublished cases regarding renal and liver transplantation from patients treated in our department. Similarly, a series of questions regarding the diagnosis and management of these patients were derived.

These 400 questions were selected to encompass a broad spectrum of clinical scenarios commonly encountered in kidney and liver transplantation. We also ensured they covered a wide range of tasks, such as diagnostic reasoning, treatment planning, and prognostic evaluation, to comprehensively assess the models’ capabilities. Case questions were sourced from published case reports as well as unpublished department cases, which provided diverse and nuanced real-world contexts. This approach ensured the evaluation captured both the consistency and adaptability of these tools in clinical practice.

Dialogue design

Two separate conversations were conducted for each multiple-choice question, case report, or department case, one using ChatGPT and one using GPT-4. Default models (without fine-tuning or customization) were used, as provided by OpenAI at the time of research. An example of a multiple-choice question dialogue is shown below:

“Physician: I am going to provide a series of virtual scenarios regarding liver transplantation. As a first task, I will need you to assign, based on the context of the question, one of the following labels: i. General information, ii. Diagnosis, iii. Differential Diagnosis, iv. Treatment, v. Prognosis, vi. Other. Then, as a second task, I will provide you with multiple-choice answers and I will need you to give the correct answer. Are you ready?

AI: Absolutely, I’m ready! Please proceed with the virtual scenarios, and I’ll do my best to assist you with the labelling and answering tasks.

Physician: Question 1: Which of the following is an absolute contraindication for liver transplantation?

A. History of cancer

B. Active tuberculosis

C. Cystic Fibrosis

D. Portal thrombosis

E. Active substance abuse

AI: Label: Treatment

Correct Answer: E. Active substance use”

The first task was to assign a label regarding the nature of the task of the multiple-choice question. The answer was compared with the label assigned by two physicians. Secondly, provided with the multiple-choice question, ChatGPT/GPT-4 was asked to solve the question to the best of its knowledge. Regarding the published case reports and the unpublished department cases, the cases were presented gradually, in parts, with questions posed in between. All parts/questions were pre-defined and were not affected by the intermediate ChatGPT/GPT-4 answers to avoid bias. Unpublished cases were used to check that performance remained consistent for cases that could not have been part of ChatGPT’s or GPT-4’s training dataset, since published cases seen during training could lead to inflated performance due to overfitting. ChatGPT and GPT-4 were tested for the following skills: Providing a differential diagnosis for a patient following case presentation, suggesting the most probable diagnosis, suggesting the next diagnostic test, and finally, suggesting appropriate treatment options. An example of a case report/unpublished department case dialogue interaction is shown below:

“Physician: I am going to provide you with information regarding a case report of a patient who underwent liver transplantation. I am going to provide the information gradually, asking some questions in between.

A 60-year-old patient with a history of cirrhosis due to alcohol-associated liver disease […]

Could you provide a differential diagnosis for the etiology of this patient’s shock?

AI: Based on the information provided […]

Physician: Which is the most probable diagnosis for this patient?

AI: Based on the information provided […]

Physician: What further tests do you believe are needed for this patient?

AI: Based on the patient’s presentation and suspected […]

Physician: Further investigations were sent including mixed chimerism studies that revealed […], consistent with the diagnosis of graft vs host disorder.

What treatment(s) could be used for this patient?”

Study outcomes

For multiple-choice questions, we recorded agreement and disagreement with the physician’s label for concept labelling and with the correct answer for response selection. Concept labelling was conducted by two authors independently and then reviewed by a third author. Disagreements were resolved through discussion among the three. For case reports and unpublished department cases, we assigned end-points as follows: Disagreement was assigned when the ChatGPT/GPT-4 proposal did not match the physicians’ decision; partial agreement was assigned if the ChatGPT/GPT-4 proposal included a portion of the actions taken by physicians; and agreement was assigned when the ChatGPT/GPT-4 proposal either perfectly matched the physicians’ actions or included them along with additional actions. For example, if physicians used medications A and B for treatment, and ChatGPT/GPT-4 proposed drug C, this would be labelled as disagreement. If ChatGPT/GPT-4 proposed drug A, it would be labelled as partial agreement. If it proposed both drugs A and B, or suggested a choice among A, B, and C, it would be labelled as agreement. All “ground truth” labels were determined before any of the conversations with these tools took place to mitigate confirmation and observer bias.
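The end-point rule above can be sketched as a simple set comparison. This is only an illustrative simplification: in the study the free-text answers were judged by physicians, and the function name `score_proposal` and the representation of actions as sets of strings are assumptions introduced here.

```python
from typing import Set


def score_proposal(proposed: Set[str], ground_truth: Set[str]) -> str:
    """Classify a model proposal against the physicians' actions.

    Hypothetical helper: actions are reduced to sets of labels, which is a
    simplification of the physician-led scoring described in the text.
    """
    if not ground_truth:
        raise ValueError("ground_truth must contain at least one action")
    # Agreement: every physician action is proposed (extra actions allowed).
    if ground_truth <= proposed:
        return "agreement"
    # Partial agreement: only some of the physicians' actions are proposed.
    if ground_truth & proposed:
        return "partial agreement"
    # Disagreement: no overlap with the physicians' actions.
    return "disagreement"


physicians = {"A", "B"}  # the drug example from the text
print(score_proposal({"C"}, physicians))            # disagreement
print(score_proposal({"A"}, physicians))            # partial agreement
print(score_proposal({"A", "B"}, physicians))       # agreement
print(score_proposal({"A", "B", "C"}, physicians))  # agreement
```

Treating a superset of the physicians' actions as agreement mirrors the stated rule that additional proposed actions do not downgrade the score.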

Data extraction

To create a dataset that could be used for statistical analysis, we constructed dataset tables that could then be translated into variables. This procedure required two layers of data decoding. More specifically, two types of tables were created for the dialogues conducted with ChatGPT/GPT-4, with the following structure: (1) For multiple-choice questions, the table included the following details: Serial number of the dialogue with ChatGPT/GPT-4, the question posed, the resource, the predefined label for the question, the label assigned by ChatGPT/GPT-4, the agreement (A) or disagreement (D) status regarding the label, the ChatGPT/GPT-4 response to the question, and the agreement or disagreement status regarding the question; and (2) For the case reports and department cases, the table included: Serial number of the dialogue with ChatGPT/GPT-4, the question posed, the resource (not included in department cases), the action taken by the authors’ teams for published case reports and by our team for department cases, the actions proposed by ChatGPT/GPT-4, and the agreement, disagreement, or partial agreement status.
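As a rough illustration of the two table layouts described above, the rows could be modelled as records like the following. All class and field names are hypothetical, the `correct_answer` field is an assumption added so that the status can be derived, and the example values are placeholders rather than study data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MultipleChoiceRow:
    """One multiple-choice dialogue (hypothetical field names)."""
    serial_number: int
    question: str
    resource: str
    predefined_label: str
    assigned_label: str
    model_answer: str
    correct_answer: str  # assumption: stored so the status can be derived

    @property
    def label_status(self) -> str:
        # "A" = agreement, "D" = disagreement, as in the text.
        return "A" if self.assigned_label == self.predefined_label else "D"

    @property
    def answer_status(self) -> str:
        return "A" if self.model_answer == self.correct_answer else "D"


@dataclass
class CaseRow:
    """One case-report or department-case question (hypothetical field names)."""
    serial_number: int
    question: str
    resource: Optional[str]  # None for unpublished department cases
    physicians_action: str
    proposed_action: str
    status: str  # "A", "PA", or "D", assigned by the reviewing physicians


# Placeholder values for illustration only, not taken from the study data.
row = MultipleChoiceRow(1, "Example question", "Example source",
                        "Treatment", "Diagnosis", "B", "B")
print(row.label_status, row.answer_status)  # D A
```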

Statistical analysis

For each scenario, we assessed the model’s performance by categorizing the response as agreement, partial agreement, or disagreement based on the predefined criteria above. The proportions of responses in each category were then calculated across all scenarios. To compare the performance of ChatGPT and GPT-4, we used Pearson’s χ²-test to evaluate the distribution of agreement levels across the two models. Statistical significance was defined as P < 0.05. All statistical analyses were performed using SPSS version 29.
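For readers who want to see the test itself, a minimal standard-library sketch of Pearson’s χ²-test on a contingency table is given here. It is not the authors’ SPSS procedure: the function name is hypothetical, no continuity correction is applied, and the closed-form P values are only implemented for 1 and 2 degrees of freedom. The example uses the multiple-choice counts reported in the Results (148 and 208 correct answers out of 294).

```python
import math
from typing import List, Tuple


def pearson_chi_square(table: List[List[int]]) -> Tuple[float, int, float]:
    """Return (statistic, degrees of freedom, P value) for an r x c table.

    Assumes every row and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    # Closed-form upper-tail probabilities of the chi-square distribution.
    if dof == 1:
        p_value = math.erfc(math.sqrt(stat / 2))
    elif dof == 2:
        p_value = math.exp(-stat / 2)
    else:
        raise NotImplementedError("only 1 or 2 degrees of freedom supported")
    return stat, dof, p_value


# Rows: ChatGPT, GPT-4; columns: correct, incorrect (out of 294 each).
mcq_table = [[148, 294 - 148], [208, 294 - 208]]
stat, dof, p_value = pearson_chi_square(mcq_table)
print(f"chi2 = {stat:.2f}, dof = {dof}, P < 0.001: {p_value < 0.001}")
```

A 2 × 3 table of agreement, partial agreement, and disagreement counts can be passed in the same way, in which case the test has 2 degrees of freedom.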

RESULTS

Collected resources

In total, the study generated 1388 data points, 1176 from multiple-choice questions and 212 from case questions. Two hundred ninety-four multiple-choice questions regarding renal and liver transplantations were collected, 108 regarding renal and 186 regarding liver transplantation[14-20]. When it comes to the nature of each scenario, 78 (26.5%) regarded general information, 22 (7.5%) regarded differential diagnosis, 48 (16.3%) regarded appropriate diagnostic test(s), 86 (29.3%) regarded treatment, and 60 (20.4%) regarded prognosis. Twenty case reports were selected from the literature. Ten cases regarded kidney transplantation, and 29 questions were derived from those cases. Ten cases regarded liver transplantation, and 34 questions were derived from those cases. Thus, in total, 63 questions on published case reports were tested. Regarding unpublished department cases, we chose 10 cases. Five cases regarded renal transplantation, and 26 questions were derived from those cases. Five cases regarded liver transplantation, and 17 questions were derived from those cases. In total, 43 questions about department cases were tested.

Comparing ChatGPT and GPT-4 performance

Multiple-choice questions: Tables 1 and 2 show the performance of ChatGPT and GPT-4 in assigning context labels for the 294 multiple-choice questions. Supplementary Tables 1 and 2 show the performance of ChatGPT and GPT-4 in answering those questions. Overall, ChatGPT assigned 58.2% correct labels (171 out of 294). ChatGPT’s accuracy in assigning appropriate labels varied across categories. Specifically, the highest accuracy was demonstrated in the treatment category, reaching 74.42% (64 out of 86), followed by diagnosis at 68.75% (33 out of 48). Performance regarding differential diagnosis and prognosis was the same, at 50%.

Table 1 Overall ChatGPT performance in assigning context labels across 294 virtual cases, highlighting agreement with predefined labels, n (%).

Actual label	Assigned label, GI	Assigned label, diagnosis	Assigned label, DD	Assigned label, treatment	Assigned label, prognosis	Assigned label, total
GI	33 (11.22)	20 (6.8)	6 (2.04)	16 (5.44)	3 (1.02)	78 (26.53)
Diagnosis	8 (2.72)	33 (11.22)	5 (1.7)	0 (0)	2 (0.68)	48 (16.33)
DD	0 (0)	11 (3.74)	11 (3.74)	0 (0)	0 (0)	22 (7.48)
Treatment	9 (3.06)	10 (3.4)	2 (0.68)	64 (21.77)	1 (0.34)	86 (29.25)
Prognosis	12 (4.08)	12 (4.08)	3 (1.02)	3 (1.02)	30 (10.2)	60 (20.41)
Total	62 (21.09)	86 (29.25)	27 (9.18)	83 (28.23)	36 (12.24)	294 (100)

DD: Differential diagnosis; GI: General information.


Table 2 Overall GPT-4 performance in assigning context labels across 294 virtual cases, highlighting agreement with predefined labels, n (%).

Actual label	Assigned label, GI	Assigned label, diagnosis	Assigned label, DD	Assigned label, treatment	Assigned label, prognosis	Assigned label, total
GI	42 (14.29)	14 (4.76)	1 (0.34)	20 (6.8)	1 (0.34)	78 (26.53)
Diagnosis	2 (0.68)	44 (14.97)	1 (0.34)	1 (0.34)	0 (0)	48 (16.33)
DD	0 (0)	15 (5.1)	5 (1.7)	2 (0.68)	0 (0)	22 (7.48)
Treatment	5 (1.7)	7 (2.38)	1 (0.34)	73 (24.83)	0 (0)	86 (29.25)
Prognosis	10 (3.4)	11 (3.74)	5 (1.7)	7 (2.38)	27 (9.18)	60 (20.41)
Total	59 (20.07)	91 (30.95)	13 (4.42)	103 (35.03)	28 (9.52)	294 (100)

DD: Differential diagnosis; GI: General information.


GPT-4 achieved significantly higher overall labelling accuracy than ChatGPT, correctly labelling 191 out of 294 scenarios (64.97% vs 58.16%, P < 0.001). GPT-4 performed better than ChatGPT in some categories and worse in others. Notably, its performance in assigning the diagnosis label reached 91.7% (44 out of 48), a statistically significant difference compared to ChatGPT (P = 0.049). Accuracy in the treatment category was also significantly higher than that of ChatGPT, at 84.9% (73 out of 86, P < 0.001). GPT-4 performed worse than ChatGPT in assigning the differential diagnosis label (22.7%), but this did not reach statistical significance (P = 0.13).
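The per-category labelling accuracies quoted above follow directly from the counts in Tables 1 and 2: each is the diagonal cell divided by its row total. The short sketch below recomputes them; the variable and function names are introduced here for illustration only.

```python
from typing import Dict, List, Tuple

LABELS = ["GI", "Diagnosis", "DD", "Treatment", "Prognosis"]

# Rows: actual label; columns: assigned label (counts from Tables 1 and 2).
CHATGPT = [
    [33, 20, 6, 16, 3],
    [8, 33, 5, 0, 2],
    [0, 11, 11, 0, 0],
    [9, 10, 2, 64, 1],
    [12, 12, 3, 3, 30],
]
GPT4 = [
    [42, 14, 1, 20, 1],
    [2, 44, 1, 1, 0],
    [0, 15, 5, 2, 0],
    [5, 7, 1, 73, 0],
    [10, 11, 5, 7, 27],
]


def labelling_accuracy(matrix: List[List[int]]) -> Tuple[Dict[str, float], float]:
    """Return per-label accuracy (%) and overall accuracy (%)."""
    per_label = {}
    correct = 0
    total = 0
    for i, row in enumerate(matrix):
        per_label[LABELS[i]] = 100 * row[i] / sum(row)  # diagonal / row total
        correct += row[i]
        total += sum(row)
    return per_label, 100 * correct / total


for name, matrix in (("ChatGPT", CHATGPT), ("GPT-4", GPT4)):
    per_label, overall = labelling_accuracy(matrix)
    summary = ", ".join(f"{k} {v:.2f}%" for k, v in per_label.items())
    print(f"{name}: overall {overall:.2f}% ({summary})")
```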

The performance of ChatGPT and GPT-4 in answering questions regarding kidney and liver transplantation was evaluated through a detailed review of their agreement and disagreement rates across multiple scenarios. The evaluation covered a wide range of topics, including clinical predictors, treatment options, and diagnostic criteria, among others (Supplementary Tables 1 and 2). Overall, ChatGPT correctly answered 50.3% (148 out of 294) of the multiple-choice questions, while GPT-4 demonstrated a higher performance, answering 70.7% of questions correctly (208 out of 294), a statistically significant difference (P < 0.001). Regarding kidney transplantation, ChatGPT demonstrated an accuracy of 71.3% (77 out of 108), while GPT-4 had an accuracy of 83.3% (90 out of 108), a statistically significant difference (P = 0.006). Interestingly, both tools were correct in 63.9% of instances (69 out of 108), both incorrect in 9.3% (10 out of 108), ChatGPT correct and GPT-4 incorrect in 7.4% (8 out of 108), and ChatGPT incorrect and GPT-4 correct in 19.4% (21 out of 108). Regarding liver transplantation, ChatGPT demonstrated an accuracy of 38.2% (71 out of 186), while GPT-4 had an accuracy of 63.4% (118 out of 186), a statistically significant improvement (P < 0.001). Interestingly, both tools were correct in 33.9% of instances (63 out of 186), both incorrect in 32.3% (60 out of 186), ChatGPT correct and GPT-4 incorrect in 4.3% (8 out of 186), and ChatGPT incorrect and GPT-4 correct in 29.6% (55 out of 186).
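The joint counts above (both correct, both incorrect, and the two discordant cells) are consistent with the marginal accuracies reported for each model. A small sketch of that consistency check follows; the function name and argument names are hypothetical.

```python
from typing import Tuple


def marginal_correct(both_correct: int, both_incorrect: int,
                     only_chatgpt: int, only_gpt4: int) -> Tuple[int, int, int]:
    """Return (ChatGPT correct, GPT-4 correct, total questions)."""
    total = both_correct + both_incorrect + only_chatgpt + only_gpt4
    chatgpt_correct = both_correct + only_chatgpt
    gpt4_correct = both_correct + only_gpt4
    return chatgpt_correct, gpt4_correct, total


# Counts taken from the paragraph above.
for organ, cells in (("kidney", (69, 10, 8, 21)), ("liver", (63, 60, 8, 55))):
    chatgpt, gpt4, total = marginal_correct(*cells)
    print(f"{organ}: ChatGPT {chatgpt}/{total} ({100 * chatgpt / total:.1f}%), "
          f"GPT-4 {gpt4}/{total} ({100 * gpt4 / total:.1f}%)")
```

Summing the kidney and liver marginals gives the overall totals of 148 and 208 correct answers out of 294.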

When factoring based on the nature of the scenario, ChatGPT demonstrated an overall agreement of 43.6% (34 out of 78) for general information, 81.8% (18 out of 22) for differential diagnosis, 60.4% (29 out of 48) for the next diagnostic test, 45.3% for treatment (39 out of 86), and 46.7% for prognosis (28 out of 60). On the other hand, GPT-4 demonstrated superior performance in all types of scenarios except those regarding differential diagnosis. Specifically, GPT-4 demonstrated an overall agreement rate of 67.9% (53 out of 78) for general information, 77.3% (17 out of 22) for differential diagnosis, 77.1% (37 out of 48) for the next diagnostic test, 66.3% for treatment (57 out of 86), and 73.3% for prognosis (44 out of 60).

Published case reports: Table 3[21-30] and Table 4[22,31-38] compare the performance of ChatGPT and GPT-4 in various clinical tasks derived from published case reports. Overall, ChatGPT demonstrated an agreement rate of 50.79% (32 out of 63), a partial agreement rate of 17.46% (11 out of 63), and a disagreement rate of 31.75% (20 out of 63). GPT-4 demonstrated an agreement rate of 80.95% (51 out of 63), partial agreement of 9.52% (6 out of 63), and disagreement of 9.52% (6 out of 63). The overall performance of GPT-4 was found to be significantly higher compared with ChatGPT (P = 0.01). Regarding renal transplantation, ChatGPT demonstrated an agreement rate of 62.07% (18 out of 29), partial agreement of 13.79% (4 out of 29), and disagreement of 24.14% (7 out of 29). GPT-4 demonstrated an agreement rate of 89.66% (26 out of 29), partial agreement of 6.9% (2 out of 29), and disagreement of 3.45% (1 out of 29). Regarding liver transplantation, ChatGPT demonstrated an agreement rate of 41.18% (14 out of 34), partial agreement of 20.59% (7 out of 34), and disagreement of 38.24% (13 out of 34). GPT-4 demonstrated an agreement rate of 73.53% (25 out of 34), partial agreement of 11.76% (4 out of 34), and disagreement of 14.71% (5 out of 34). Supplementary Table 3 presents the performance of ChatGPT vs GPT-4 when categorized by the nature of the task. Notably, GPT-4 demonstrated outstanding performance, providing a differential diagnosis that included the final diagnosis in 94.4% of the cases (17 out of 18). Furthermore, GPT-4 suggested an appropriate diagnostic test for further investigating the patient’s symptoms in 90% of cases (9 out of 10). Finally, GPT-4 successfully suggested a treatment that agreed with the ground truth in 93.3% of the cases (14 out of 15).

Table 3 Comparative performance of ChatGPT and GPT-4 in case reports on renal transplantation, detailing agreement levels by task type.

Ref.	Question number	Task	Performance, ChatGPT/GPT-4	Physicians’ course of action/ground truth	Agreement status, ChatGPT/GPT-4
Alharbi et al[21]	1	Provide a list of suitable antibiotics for Pseudomonas aeruginosa urinary tract infection	Provided a list of suitable antibiotics including the one used by physicians (meropenem)/provided a list of suitable antibiotics including the one used by physicians (meropenem)	Meropenem was administered	A/A
	2	Suggest the next diagnostic test(s) needed for the patient	Suggested a renal ultrasound and a stool culture/suggested a renal ultrasound, abdominal CT, blood cultures, and a stool culture	Abdominal ultrasound and abdominal CT scan were conducted	PA/A
	3	Identify the most probable diagnosis for the patient	Renal allograft malignancy/renal allograft malignancy	Eosinophilic chromophobe renal cell carcinoma was confirmed by the histopathological examination of the graft	A/A
Rubin et al[22]	4	Provide a DD for the patient	Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis	CMV viremia	A/A
	5	Provide the most likely diagnosis for the patient	Post-influenza bacterial pneumonia/CMV reactivation	CMV viremia was demonstrated by antigenemia and PCR assay	D/A
	6	Suggest treatment for the patient	Suggested ganciclovir, valganciclovir, foscarnet, and cidofovir (most preferable ganciclovir or valganciclovir)/suggested ganciclovir, valganciclovir, foscarnet, and cidofovir (most preferable ganciclovir or valganciclovir)	Intravenous ganciclovir followed by oral valganciclovir at a dose of 900 mg/day was administered	A/A
Molina-Andújar et al[23]	7	Provide a DD for the patient	Provided a DD that included the final diagnosis/ Provided a DD that included the final diagnosis	Acute post-infectious glomerulonephritis	A/A
Molina-Andújar et al[23]	8	Provide the most likely diagnosis for the patient	Acute post-infectious glomerulonephritis/acute post-infectious glomerulonephritis	Acute post-infectious glomerulonephritis	A/A
Baker et al[24]	9	Provide the next step in the patient’s management	Suggested hemodynamic stabilization with transfusion of blood products and bleeding control including surgical intervention, if necessary, followed by continuous monitoring/suggested hemodynamic stabilization with transfusion of blood products and surgical exploration if bleeding is suspected to be within the surgical site. Suggested medication reevaluation with a focus on anticoagulants, prophylactic treatment for infection prevention, and continuous monitoring	The patient was taken back to theatre for exploration where ligation of the bleeding artery, removal of blood clots and blood transfusion took place. Postoperative monitoring was performed	A/A
	10	Suggest the next diagnostic test needed for the patient	Suggested an abdominal CT scan or an ultrasound/suggested imaging such as abdominal CT with contrast, an ultrasound or an angiogram is performed. Suggested evaluating the patient with new laboratory tests and for the need of re-exploration	An urgent CT angiogram was performed	PA/A
	11	Provide a DD for the bleeding	Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis	Bleeding from a small branch of the renal artery	A/A
	12	Provide the most likely diagnosis for the patient	Failure or dislodgement of a surgical clip: Bleeding from a small branch of the renal artery where a surgical clip had come off during the re-exploration surgery/Failure or dislodgement of a surgical clip: Bleeding from a small branch of the renal artery where a surgical clip had come off during the re-exploration surgery	Bleeding was noticed from a small branch of the renal artery	A/A
Gewehr et al[25]	13	Provide a DD for the patient	Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis	Fungal infection	A/A
Gewehr et al[25]	14	Provide the most likely diagnosis for the patient	Fungal Infection/fungal Infection, and specifically sporotrichosis	Fungal Infection (sporotrichosis)	A/A
Vassallo et al[26]	15	Provide a DD for the patient	Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis	Active hepatitis E virus infection	A/A
	16	Provide the most likely diagnosis for the patient	NAFLD/NAFLD or drug induced	Active hepatitis E virus infection	D/D
	17	Suggest the next diagnostic test needed for the patient	Suggested liver biopsy along with further imaging and laboratory investigations/suggested liver biopsy along with further imaging and laboratory investigations	Liver biopsy	A/A
	18	Suggest the next diagnostic test needed for the patient after the biopsy results	Suggested extensive viral serologic tests, PCR for suspected viruses, immunostaining of liver biopsy, and continuous monitoring of liver function/suggested extensive viral serologic tests, PCR for suspected viruses, immunostaining of liver biopsy, and continuous monitoring of liver function	A more extensive viral screen was conducted	A/A
Olsen et al[27]	19	Provide a DD for the patient	Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis	Epstein-Barr virus-negative, diffuse, large B-cell lymphoma	A/A
	20	Provide the most likely diagnosis for the patient	Suggested that infectious aetiologies such as disseminated tuberculosis or fungal infections are more likely. It implied that diagnosis is difficult without further diagnostic investigations/determined as PTLD as the most likely diagnosis followed by infectious aetiologies	Epstein-Barr virus-negative, diffuse, large B cell lymphoma	D/A
	21	Suggest the next diagnostic test needed for the patient	Suggested sputum and/or BAL cultures, Mantoux test or IGRA, Blood tests, further imaging and laboratory tests, and lung biopsy/suggested liver biopsy, sputum and/or BAL cultures, Mantoux test or IGRA, Blood tests, further imaging and laboratory tests, and lung biopsy	Biopsy from one of the liver lesions	D/A
Allam et al[28]	22	Suggest the next diagnostic test needed for the patient	Suggested a kidney biopsy/suggested a kidney biopsy and further laboratory tests	Transplant biopsy was performed	A/A
	23	Provide a DD for the patient	Provided a DD that did not include the final diagnosis/provided a DD including vascular complications such as vein stenosis	Biopsy-induced arteriovenous fistula and venous stenosis	D/PA
	24	Suggest treatment for the patient	Suggested intervention to address the arteriovenous fistula and stenosis of the main renal vein (embolization, angioplasty, stenting)/suggested intervention to address the arteriovenous fistula and stenosis of the main renal vein (embolization, angioplasty, stenting)	Embolization of fistula (coil occlusion)	A/A
Subramanian et al[29]	25	Provide a DD for the patient	Provided a DD that did not include the final diagnosis/provided a DD that included the final diagnosis	A small basal ganglia infarct and an infarct of the spinal cord was found	D/A
	26	Provide the most likely diagnosis for the patient	Suggested ischemic injury or infarction of the spinal cord/suggested spinal cord ischemia or infarction	A small basal ganglia infarct and an infarct of the spinal cord was found	A/A
	27	Suggest the next diagnostic test needed for the patient	Suggested spine MRI, NCS and EMG to assess peripheral nerves and muscles, lumbar puncture if infections suspected, and transplant biopsy if rejection or ischemia is suspected/suggested spine MRI-MRA, neurophysiological studies (SSEP, NCS and EMG), lumbar puncture if infections suspected	A CTAP, and spine/brain MRI were performed	PA/PA
Ainsworth et al[30]	28	Provide a DD for the patient	Provided a DD that included immune-mediated hemolysis but did not specifically include PLS/provided a DD that included the final diagnosis	PLS	PA/A
Ainsworth et al[30]	29	Provide the most likely diagnosis for the patient	Suggested hemolysis due to mismatched blood type of the donor/suggested PLS	PLS	D/A

CT: Computed tomography; DD: Differential diagnosis; A: Agreement; BAL: Bronchoalveolar lavage; D: Disagreement; CMV: Cytomegalovirus; CTAP: Computed tomography arterial portography; EMG: Electromyography; IGRA: Interferon-gamma release assay; MRA: Magnetic resonance angiography; MRI: Magnetic resonance imaging; NAFLD: Non-alcoholic fatty liver disease; NCS: Nerve conduction studies; PA: Partial agreement; PCR: Polymerase chain reaction; PLS: Passenger lymphocyte syndrome; PTLD: Post-transplant lymphoproliferative disorder; SSEP: Somatosensory evoked potentials.

Table 4 Comparative performance of ChatGPT and GPT-4 in case reports on liver transplantation, detailing agreement levels by task type.

Ref.	Question number	Task	Performance, ChatGPT/GPT-4	Physicians course of action/ground truth	Agreement status, ChatGPT/GPT-4
Rubin et al[22]	1	Case presentation/provide a DD for the patient	Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis	CMV	A/A
	2	Provide the most likely diagnosis for the patient	Suggested post-transplant infection, particularly a viral infection (CMV, EBV, or VZV)/CMV	CMV	PA/A
	3	Justify the recurrence of CMV infections despite treatment	Suggested resistance to ganciclovir/suggested resistance to ganciclovir or/and inadequate duration of initial treatment-secondary infections	Ganciclovir resistant infection	A/A
	4	Suggest alternative treatment for the patient	Suggested foscarnet/suggested foscarnet or cidofovir or letermovir or/and CMV immunoglobulins	Foscarnet was administered	A/A
Okeke et al[31]	5	Case presentation/suggest treatment for the patient given no arterial flow in the liver	Suggested interventional radiology procedures or/and surgical revascularization/suggested interventional radiology procedures or/and surgical revascularization (thrombectomy or re-anastomosis)	Interventional radiology procedure (thrombolysis) was performed. Then revascularization was achieved intraoperatively (infra-aortic jump was performed)	PA/PA
	6	Suggest the diagnostic tests needed for the patient following re-thrombosis	Suggested doppler ultrasound, CT angiogram, coagulation profile-thrombophilia testing/suggested thrombophilia workup, repeat imaging (doppler ultrasound, CT/MRI angiography), and autoimmune markers	Hypercoagulable workup was performed	A/A
	7	Provide a DD behind re-thrombosis	Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis	Antiphospholipid syndrome	A/A
	8	Provide the most likely diagnosis for the patient	Suggested hepatic artery thrombosis/suggested antiphospholipid syndrome	Antiphospholipid syndrome	D/A
Eubank et al[32]	9	Case presentation/determine the most likely microorganism to be identified by the swab	Suggested Staphylococcus aureus, Streptococcus species, Enterococcus species, and Pseudomonas aeruginosa, and fungi like Candida albicans/suggested Staphylococcus aureus, Enterococcus species, Pseudomonas aeruginosa, Escherichia coli, fungi like Candida or Aspergillus, viruses like CMV, and mycobacteria	94% Enterococcus faecalis, 93% Rhizopus oryzae, and 5% Aspergillus flavus	D/PA
Eubank et al[32]	10	Suggest treatment for the patient given the pathogens identified	Suggested intravenous liposomal amphotericin B at an appropriate dosage, along with surgical debridement of infected tissue/suggested intravenous liposomal amphotericin B at an appropriate dosage, oral posaconazole along with surgical debridement of infected tissue.	Oral posaconazole 300 mg and IV amphotericin B and micafungin daily. Amphotericin B deoxycholate irrigation in the wound vacuum	PA/A
Kim et al[33]	11	Case presentation/provide a DD for the patient’s shock	Provided a DD that did not include the final diagnosis/provided a DD that included the final diagnosis	GVHD	D/A
	12	Provide the most likely diagnosis for the patient	Suggested a surgical complication, specifically duodenal perforation/suggested duodenal perforation or drug-induced kidney injury/neutropenia	GVHD	D/D
	13	Suggest the further diagnostic tests needed for the patient	Suggested blood cultures, peritoneal fluid analysis, endoscopy or upper GI imaging/suggested blood and urine cultures, viral and fungal tests, peritoneal fluid analysis, laboratory tests, and endoscopy or upper GI imaging	Mixed chimerism studies and skin biopsy were performed	D/D
	14	Suggest further treatment for the patient given the mixed chimerism studies results	The following treatment options were suggested: Systemic corticosteroids, adjusting tacrolimus dose, consider additional immunosuppressives such as mycophenolate, and phototherapy/suggested considering the following treatment options: High-dose corticosteroids, ATG, ECP, infliximab, ruxolitinib, MSC transplantation, additional immunosuppressive agents, and IL-2 diphtheria toxin	Steroids were administered for 4 days followed by ruxolitinib due to patient not responding to treatment	PA/A
	15	Guess the survival of the patient	Suggested that the patient did not, most likely, survive/suggested that the patient did not, most likely, survive	The patient died on day 16 of re-admission, 45 days following transplantation	A/A
Kim et al[33], (b)	16	Case presentation/provide a DD for the patient	Provided a DD that did not include the final diagnosis/provided a DD that included the final diagnosis	GVHD	D/A
	17	Provide the most likely diagnosis for the patient	Suggested Clostridioides difficile colitis/suggested GVHD	GVHD	D/A
	18	Suggest treatment for the patient	The following treatment options were suggested: Glucocorticoids, CNIs, ATG, T-cell depleting agents such as basiliximab/high-dose corticosteroids, adjust immunosuppression, consider second line treatments such as ATG, ECP, sirolimus, infliximab, and basiliximab	Steroids were administered for 2 days followed by ruxolitinib due to patient not responding to treatment	PA/PA
	19	Guess the survival of the patient	Declined to make a prediction/suggested that the patient did not, most likely, survive	The patient died 29 days after transplant	D/A
Ramírez de la Piscina et al[34]	20	Case presentation/Provide a DD for the patient	Provided a DD that included the final diagnosis/ provided a DD that included the final diagnosis	Budd-Chiari syndrome secondary to ADPKD	A/A
	21	Provide the most likely diagnosis for the patient	Suggested Budd-Chiari syndrome/suggested Budd-Chiari syndrome secondary to the compression from ADPKD cysts	Budd-Chiari syndrome secondary to ADPKD	A/A
	22	Suggest treatment for the patient	Provided a list of suitable treatment options including only liver transplantation/provided a list of suitable treatment options including combined transplantation	A combined liver and renal transplantation was performed	PA/A
Arstikyte et al[35]	23	Case presentation/provide a DD for the patient	Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis	Venous air embolism	A/A
	24	Provide the most likely diagnosis for the patient	Suggested that information given is insufficient to single out a specific diagnosis/suggested that based on given information hemorrhage or venous air embolism are the two most likely diagnoses	Venous air embolism	D/A
	25	Suggest appropriate diagnostic test for the patient	Suggested TEE/suggested TEE	TEE	A/A
Aucejo et al[36]	26	Case presentation/provide a DD for the patient	Provided a DD that did not include the final diagnosis/provided a DD that did not include the final diagnosis	Narrowing of the RHV at the level of the cava-caval anastomosis	D/D
	27	Provide the most likely diagnosis for the patient	Suggested adhesions, anastomotic leakage, or biliary complications/suggested PVT	Narrowing of the RHV at the level of the cava-caval anastomosis	D/D
	28	Given the RHV stenosis diagnosis, suggest treatment for the patient	Suggested considering stent placement, TIPS or surgical revision/suggested considering stent placement, TIPS or surgical revision	A wall stent 14 mm in diameter by 40 mm in length was placed across the RHV stenosis	A/A
Ichimura et al[37]	29	Case presentation/provide a DD for the patient	Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis	VOD/SOS	A/A
	30	Provide the most likely diagnosis for the patient	Suggested GVHD/suggested VOD/SOS	VOD/SOS	D/A
	31	Suggest treatment for the patient given VOD/SOS	Suggested considering defibrotide, anticoagulant medications, and liver transplantation/suggested considering defibrotide, anticoagulant medications, TIPS, and liver transplantation	The physicians performed a liver transplantation since defibrotide had not yet been approved	A/A
	32	Provide a new differential diagnosis for the patient’s deterioration postoperatively	Provided a DD that did not include the final diagnosis/provided a DD that included the final diagnosis	GVHD, several infections	D/A
Trevizoli et al[38]	33	Case presentation/suggest appropriate treatment for the patient	Suggested considering corticosteroids, aminosalicylates, immunomodulators such as azathioprine, biologic agents such as infliximab, diuretics, variceal bleeding prophylaxis and liver transplant evaluation/suggested considering corticosteroids, aminosalicylates, immunomodulators such as azathioprine, biologic agents such as infliximab, consider surgical management (colectomy), diuretics, variceal bleeding prophylaxis and liver transplant evaluation	Sodium restriction, diuretic therapy, hydrocortisone 300 mg was started without adequate response, vedolizumab	PA/PA
Trevizoli et al[38]	34	Suggest appropriate treatment for the patient given the DVT progression	Suggested LMWH and IVF/suggested LMWH	He underwent hemodynamic intervention with the placement of a vena cava filter	A/D

A: Agreement; ADPKD: Autosomal dominant polycystic kidney disease; ATG: Anti-thymocyte globulin; CMV: Cytomegalovirus; CNI: Calcineurin inhibitor; CT: Computed tomography; D: Disagreement; DD: Differential diagnosis; EBV: Epstein-Barr virus; ECP: Extracorporeal photopheresis; GI: Gastrointestinal; GVHD: Graft-versus-host disease; IVF: Inferior vena cava filter; IL: Interleukin; LMWH: Low-molecular-weight heparin; MRI: Magnetic resonance imaging; MSC: Mesenchymal stem cells; PA: Partial agreement; PVT: Portal vein thrombosis; RHV: Right hepatic vein; SOS: Sinusoidal obstruction syndrome; TEE: Transesophageal echocardiogram; TIPS: Transjugular intrahepatic portosystemic shunt; VOD: Veno-occlusive disease; VZV: Varicella-zoster virus.

Unpublished department cases: Supplementary Tables 4 and 5 provide the case presentation of the unpublished department cases provided to ChatGPT/GPT-4 before their performance was tested on various tasks. Tables 5 and 6 compare the accuracy of ChatGPT and GPT-4 in various clinical tasks derived from those unpublished department cases. Overall, ChatGPT demonstrated an agreement rate of 53.49% (23 out of 43), partial agreement of 23.26% (10 out of 43), and disagreement of 23.26% (10 out of 43). GPT-4 demonstrated an agreement rate of 72.09% (31 out of 43), partial agreement of 6.98% (3 out of 43), and disagreement of 20.93% (9 out of 43). The overall performance of GPT-4 was found to be significantly higher compared with ChatGPT (P = 0.004).

Table 5 Comparative performance of ChatGPT and GPT-4 in department cases on renal transplantation, detailing agreement levels by task type.

Case ID	Question number	Task	Performance, ChatGPT/GPT-4	Physicians course of action/ground truth	Agreement status, ChatGPT/GPT-4
1	1	Case presentation/provide the diagnostic tests needed to investigate refractory ascites in patient with ADPKD	Suggested abdominal ultrasound, paracentesis with fluid analysis, LF tests, tumor marker tests, CT scan, serologic testing, genetic testing/suggested paracentesis with fluid analysis, LF tests, abdominal ultrasound, CT scan, echocardiogram, and endoscopy, further evaluation for elevated markers	Paracentesis (ascites fluid was sent for cytology, culture, TB investigation, SAAG calculation), abdominal CT, liver ultrasound, LF tests, tumor marker tests, serologic testing, echocardiogram, and endoscopy	PA/A
	2	Provide a DD for the patient	Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis	Tuberculous peritonitis	A/A
	3	Provide the most likely diagnosis for the patient	Suggested malignancy (most likely ovarian cancer) or SBP are the most likely diagnoses/suggested tuberculous peritonitis or malignancy or SBP as the most likely diagnoses	Tuberculous peritonitis	D/A
2	4	Case presentation/provide a differential diagnosis for the patient	Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis	Acute PE	A/A
	5	Provide the most probable diagnosis for the patient	Suggested myocardial infarction as the most probable diagnosis/suggested PE as the most probable diagnosis	Acute PE	D/A
	6	What diagnostic test is more suitable for this patient	Suggested CTPA and ECG be performed/suggested CTPA, ECG, and d-dimers tests be performed	CTPA was performed	A/A
	7	What treatment do you recommend for this patient, given PE is confirmed	Suggested a choice among LMWH, DOACs, and warfarin. No discrimination between short and long-term anticoagulation was made/suggested initial anticoagulation with either LMWH or DOACs including apixaban followed by a long-term anticoagulation with either a DOAC or warfarin	10 mg apixaban BD was commenced followed by 5 mg BD for 6 months	PA/A
3	8	Case presentation/provide a DD given the post-operative signs/symptoms of the patient	Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis	Post-operative bleeding	A/A
	9	Provide the most probable diagnosis	Suggested exacerbation or progression of her underlying thrombocytopenic disorder/suggested post-transplant acute thrombotic microangiopathy	Post-operative bleeding	D/D
	10	Predict the next diagnostic test that the patient requires	Suggested coagulation studies, renal function test, peripheral blood smear, infectious testing and imaging including ultrasound and CT/suggested peripheral blood smear, LDH level, Coombs test, renal function, immunosuppressive level tests, and infection screening.	Abdominal ultrasound and abdomen/pelvis CT with contrast	PA/D
	11	Appropriate treatment given the evidence of active bleeding	Suggested stabilization with intravenous fluids and blood products, surgical intervention, and close monitoring/suggested stabilization with intravenous fluids and blood products, surgical intervention, and close monitoring	The patient was transfused and was re-explored	A/A
4	12	Case presentation/provide a DD for the patient	Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis	Acute graft thrombosis due to renal vein thrombosis	A/A
	13	Provide the most probable diagnosis	Suggested acute graft thrombosis due to either renal artery or vein thrombosis/suggested acute graft thrombosis due to renal vein thrombosis	Acute graft thrombosis due to renal vein thrombosis	A/A
	14	Provide the most suitable diagnostic test	Suggested choosing among transplant duplex US, CT angiography, and renal scintigraphy/suggested choosing among transplant duplex US, CT angiography, and renal scintigraphy	Transplant doppler US	A/A
	15	Given the transplant US findings, provide the patient’s diagnosis	Acute renal allograft rejection/acute renal artery thrombosis or artery stenosis	Renal vein thrombosis	D/D
	16	Given the transplant US findings, suggest a diagnostic modality that could verify diagnosis	Renal biopsy/suggested CT angiography	CT angiography was performed	D/A
	17	Suggest treatment options for the patient	Suggested considering high-dose corticosteroids, antithymocyte globulin, calcineurin inhibitors, mycophenolate mofetil, basiliximab or alemtuzumab, and plasmapheresis with intravenous immunoglobulin/suggested surgical revascularization	Patient was re-explored	D/A
	18	Findings of reperfusion during benchwork after explantation	Suggested inadequate restoration of tissue perfusion and significant vascular compromise and tissue damage/suggested extensive, vascular thrombosis with poor kidney perfusion, and evidence of parenchymal damage	Artery perfusion required high pressure, kidney became turgid, swollen, and a capsular tear was seen	A/A
5	19	Case presentation/provide DD for the patient	Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis	Post-transplant obstructive LUTS due to clot retention	A/A
	20	Provide most probable diagnosis	Suggested urinary tract obstruction most probably at the site of the anastomosis/suggested urinary tract obstruction due to blood clot formation as the most probable diagnosis	Post-transplant obstructive LUTS due to clot retention	PA/A
	21	Suggest next diagnostic test to verify the diagnosis	Suggested considering transplant US, abdominal CT or renal scintigraphy/suggested transplant US as the first-line imaging modality. Suggested that other options include abdominal CT, MRI, and nuclear medicine scans	A transplant US was performed	A/A
	22	Given findings of US/suggest a suitable treatment option for the patient	Suggested considering manual irrigation, catheter flushing, cystoscopic clot evaluation, and monitoring/suggested replacing the foley catheter to flush out smaller clots, cystoscopic clot evaluation, consider percutaneous nephrostomy, and monitoring	A 3-way irrigation system was applied	PA/PA
	23	Despite resolved hematuria patient’s clearance did not improve/provide a DD	Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis	Acute kidney injury with acute tubular necrosis	A/A
	24	Provide most probable diagnosis	Suggested acute kidney injury with acute tubular necrosis as the most probable diagnosis/suggested acute kidney injury with acute tubular necrosis as the most probable diagnosis	Acute kidney injury with acute tubular necrosis	A/A
	25	Case progression update/poor renal function 3 months post-operatively provide DD for patient’s signs and symptoms	Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis	Recurrence of underlying disease	A/A
	26	Provide most probable diagnosis	Suggested chronic allograft dysfunction as the most probable diagnosis/suggested chronic allograft dysfunction and recurrence of the underlying disease as the two most probable diagnoses	Recurrence of underlying disease	D/PA

A: Agreement; ADPKD: Autosomal dominant polycystic kidney disease; BD: Twice a day; CT: Computed tomography; CTPA: Computed tomography pulmonary angiogram; D: Disagreement; DD: Differential diagnosis; DOAC: Direct oral anticoagulants; ECG: Electrocardiogram; LDH: Lactate dehydrogenase; LF: Liver function; LMWH: Low molecular weight heparin; LUTS: Lower urinary tract symptoms; MRI: Magnetic resonance imaging; PA: Partial agreement; PE: Pulmonary embolism; SAAG: Serum ascites albumin gradient; SBP: Spontaneous bacterial peritonitis; TB: Tuberculosis; US: Ultrasound.

Table 6 Comparative performance of ChatGPT and GPT-4 in department cases on liver transplantation, detailing agreement levels by task type.

Case ID	Question number	Task	Performance, ChatGPT/GPT-4	Physicians course of action/ground truth	Agreement status, ChatGPT/GPT-4
1	1	Case presentation/provide a DD for the patient	Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis	Early anastomotic bile leak	A/A
	2	Provide the most probable diagnosis	Suggested that a biliary complication including bile leak as the most probable diagnosis/suggested bile leak as the most probable diagnosis	Early anastomotic bile leak	A/A
	3	Suggest a suitable diagnostic test to confirm the diagnosis	Suggested considering abdominal US or CT, and MRCP/suggested considering abdominal US or CT, fluid drain analysis, and MRCP	Abdominal CT and fluid drain analysis were performed	PA/A
	4	Suggest a suitable treatment for this patient	Suggested considering percutaneous drainage, ERCP, surgical intervention, and antibiotics if there are signs of infection/suggested considering as a first line less invasive treatments such as percutaneous drainage and ERCP and proceed with re-exploration if those fail, while covering the patient with antibiotics	Antibiotics were commenced, followed by an ERCP which did not resolve the bile leak and the patient was re-explored	A/A
2	5	Case presentation/calculate CP score, MELD score, and MELD-sodium score	Accurately calculated CP score and MELD score, underestimated MELD-sodium score/accurately calculated the required scores	CP score = 13, MELD score = 34, and MELD-sodium score = 37	PA/A
2	6	Patient’s pre-operative assessment findings presented/evaluate patient’s eligibility to proceed with transplantation	Suggested that it’s likely that the operation was postponed or deferred until the patient's condition improved/suggested that given the findings the transplant team would have opted to delay the liver transplantation until active issues were adequately addressed	Transplantation did not proceed	A/A
3	7	Case presentation/provide a DD for the patient	Provided a DD that did not include the final diagnosis/provided a DD that did not include the final diagnosis	PLS	D/D
	8	Provide the most probable diagnosis	Suggested acute cellular rejection as the most probable diagnosis/suggested acute hemolytic transfusion reaction	PLS	D/D
	9	Suggest treatment options for the patient	Suggested high-dose intravenous corticosteroids, other anti-rejection medications, and plasmapheresis/suggested not transfusing the patient further, administering corticosteroids, and monitoring the patient	Patient was treated with high-dose corticosteroids, plasmapheresis, and intravenous immunoglobulin	PA/D
	10	Given the patient’s 3-month new signs/symptoms (recurrent ascites, low-grade fever etc.), provide a new DD for the patient	Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis	PTLD	A/A
	11	Provide the most probable diagnosis	Suggested PTLD as the most probable diagnosis/suggested nephrotic syndrome as the most probable diagnosis	PTLD	A/D
4	12	Case presentation/ suggest the most suitable diagnostic test	Brain imaging was suggested/suggested brain imaging, EEG, and tacrolimus level test	A brain CT, EEG, and tacrolimus level test were performed	PA/A
	13	Provide a DD for the patient	Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis	PRES	A/A
	14	Provide the most probable diagnosis	Suggested PRES as the most probable diagnosis/suggested tacrolimus neurotoxicity as the most probable diagnosis	PRES	A/D
5	15	Case presentation/provide DD for the patient	Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis	GVHD	A/A
	16	Provide most probable diagnosis	Suggested CMV infection as the most probable diagnosis/suggested CMV infection as the most probable diagnosis	GVHD	D/D
	17	Suggest appropriate diagnostic tests	Suggested CMV testing, biopsy, and imaging studies/suggested CMV testing, imaging studies, and skin biopsy	Peripheral blood flow cytometry, colonoscopy, and skin biopsy were performed	PA/PA

A: Agreement; CMV: Cytomegalovirus; CP: Child-Pugh; CT: Computed tomography; D: Disagreement; DD: Differential diagnosis; EEG: Electroencephalography; ERCP: Endoscopic retrograde cholangiopancreatography; GVHD: Graft-versus-host disease; MELD: Model for end-stage liver disease; MRCP: Magnetic resonance cholangiopancreatography; PA: Partial agreement; PLS: Passenger lymphocyte syndrome; PRES: Posterior reversible encephalopathy syndrome; PTLD: Post-transplantation lymphoproliferative disorder; US: Ultrasound.

Regarding renal transplantation, ChatGPT demonstrated an agreement rate of 53.85% (14 out of 26), partial agreement of 19.23% (5 out of 26), and disagreement of 26.92% (7 out of 26). GPT-4 demonstrated an agreement rate of 80.77% (21 out of 26), partial agreement of 7.69% (2 out of 26), and disagreement of 11.54% (3 out of 26). Regarding liver transplantation, ChatGPT demonstrated an agreement rate of 52.94% (9 out of 17), partial agreement of 29.41% (5 out of 17), and disagreement of 17.65% (3 out of 17). GPT-4 demonstrated an agreement rate of 58.82% (10 out of 17), partial agreement of 5.88% (1 out of 17), and disagreement of 35.29% (6 out of 17). Supplementary Table 6 shows the performance of ChatGPT vs GPT-4 when factoring in the nature of the task. Notably, GPT-4 demonstrated outstanding performance, providing a differential diagnosis that included the final diagnosis in 91.7% of the cases (11 out of 12). Furthermore, GPT-4 successfully suggested an appropriate diagnostic test for further investigating the patient’s symptoms in 77.8% of cases (7 out of 9).
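The agreement rates for the department cases can be recomputed directly from the "Agreement status" column of Tables 5 and 6. The following Python snippet is a minimal sketch of that tally, not the authors' analysis code; the helper names `tally` and `rates` are hypothetical, and the status strings are transcribed by hand from the two tables in question order.

```python
from collections import Counter

# "ChatGPT/GPT-4" agreement status per question, transcribed from
# Table 5 (renal, questions 1-26) and Table 6 (liver, questions 1-17).
RENAL = ("PA/A A/A D/A A/A D/A A/A PA/A A/A D/D PA/D A/A A/A A/A "
         "A/A D/D D/A D/A A/A A/A PA/A A/A PA/PA A/A A/A A/A D/PA").split()
LIVER = ("A/A A/A PA/A A/A PA/A A/A D/D D/D PA/D A/A A/D PA/A A/A "
         "A/D A/A D/D PA/PA").split()


def tally(statuses):
    """Count A/PA/D labels per model from 'ChatGPT/GPT-4' status strings."""
    chatgpt, gpt4 = Counter(), Counter()
    for status in statuses:
        first, second = status.split("/")
        chatgpt[first] += 1
        gpt4[second] += 1
    return {"ChatGPT": chatgpt, "GPT-4": gpt4}


def rates(counter):
    """Convert label counts to percentages rounded to two decimals."""
    total = sum(counter.values())
    return {label: round(100 * counter[label] / total, 2)
            for label in ("A", "PA", "D")}


if __name__ == "__main__":
    groups = (("renal", RENAL), ("liver", LIVER), ("overall", RENAL + LIVER))
    for name, statuses in groups:
        for model, counts in tally(statuses).items():
            print(name, model, dict(counts), rates(counts))
```

Each status string contributes one label to each model, so every domain's counts share the denominator given by the number of questions in its table (26 renal, 17 liver, 43 overall).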

When compared, the performance of ChatGPT did not differ significantly between the published case reports and the unpublished department cases (P = 0.459). Similarly, the performance of GPT-4 did not differ significantly (P = 0.232). Table 7 presents the performance of ChatGPT vs GPT-4 when factoring in the nature of the task for all cases, published or unpublished. Notably, GPT-4 demonstrated outstanding performance, providing a differential diagnosis that included the final diagnosis in 90% of the cases (27 out of 30). Furthermore, GPT-4 suggested an appropriate diagnostic test for further investigating the patient’s symptoms in 78.9% of cases (15 out of 19). Finally, GPT-4 predicted the prognosis in 100% of related questions (5 out of 5).

Table 7 Aggregated performance of ChatGPT and GPT-4 in clinical scenarios across published and unpublished cases, categorized by task type, n (%).

Type of task	Overall ChatGPT agreement level	Overall GPT-4 agreement level	ChatGPT renal transplantation agreement level	GPT-4 renal transplantation agreement level	ChatGPT liver transplantation agreement level	GPT-4 liver transplantation agreement level
DD that includes final diagnosis	A: 22/30 (73.3)	A: 27/30 (90)	A: 13/16 (81.3)	A: 15/16 (93.8)	A: 9/14 (64.3)	A: 12/14 (85.7)
DD that includes final diagnosis	PA: 1/30 (3.3)	PA: 1/30 (3.3)	PA: 1/16 (6.3)	PA: 1/16 (6.3)	PA: 0/14 (0)	PA: 0/14 (0)
Final diagnosis prediction	A: 11/31 (35.5)	A: 20/31 (64.5)	A: 7/17 (41.2)	A: 13/17 (76.5)	A: 4/14 (28.6)	A: 7/14 (50)
Final diagnosis prediction	PA: 2/31 (6.5)	PA: 2/31 (6.5)	PA: 1/17 (5.9)	PA: 1/17 (5.9)	PA: 1/14 (7.1)	PA: 1/14 (7.1)
Appropriate next diagnostic test	A: 8/19 (42.1)	A: 15/19 (78.9)	A: 6/13 (46.2)	A: 11/13 (84.6)	A: 2/6 (33.3)	A: 4/6 (66.7)
Appropriate next diagnostic test	PA: 8/19 (42.1)	PA: 2/19 (10.5)	PA: 5/13 (38.5)	PA: 1/13 (7.7)	PA: 3/6 (50)	PA: 1/6 (16.7)
Appropriate treatment	A: 11/21 (52.4)	A: 15/21 (71.4)	A: 5/8 (62.5)	A: 7/8 (87.5)	A: 6/13 (46.2)	A: 8/13 (61.5)
Appropriate treatment	PA: 9/21 (42.9)	PA: 4/21 (19)	PA: 2/8 (25)	PA: 1/8 (12.5)	PA: 7/13 (53.8)	PA: 3/13 (23.1)
Prediction of prognosis	A: 3/5 (60)	A: 5/5 (100)	A: 1/1 (100)	A: 1/1 (100)	A: 2/4 (50)	A: 4/4 (100)
Prediction of prognosis	PA: 1/5 (20)	PA: 0/5 (0)	PA: 0/1 (0)	PA: 0/1 (0)	PA: 1/4 (25)	PA: 0/4 (0)

DD: Differential diagnosis; A: Agreement; PA: Partial agreement.
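Because every question belongs to either the renal or the liver domain, each overall cell in Table 7 should equal the sum of the corresponding renal and liver cells. The Python sketch below illustrates that consistency check on the GPT-4 "Appropriate next diagnostic test" agreement row; the helper names `parse_cell`, `is_additive`, and `percent` are hypothetical and not part of the study.

```python
def parse_cell(cell):
    """Parse a Table 7 cell such as 'A: 15/19 (78.9)' into (hits, total)."""
    fraction = cell.split(":")[1].split("(")[0].strip()
    hits, total = fraction.split("/")
    return int(hits), int(total)


def is_additive(overall, renal, liver):
    """Check that the overall cell equals the renal cell plus the liver cell."""
    o_hits, o_total = parse_cell(overall)
    r_hits, r_total = parse_cell(renal)
    l_hits, l_total = parse_cell(liver)
    return (o_hits, o_total) == (r_hits + l_hits, r_total + l_total)


def percent(cell):
    """Recompute the percentage of a cell, rounded to one decimal."""
    hits, total = parse_cell(cell)
    return round(100 * hits / total, 1)


if __name__ == "__main__":
    # GPT-4, "Appropriate next diagnostic test", agreement (A) row of Table 7.
    overall, renal, liver = "A: 15/19 (78.9)", "A: 11/13 (84.6)", "A: 4/6 (66.7)"
    print(is_additive(overall, renal, liver), percent(overall))
```

The same check can be applied row by row to flag cells whose numerator, denominator, or percentage does not match the rest of the table.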

DISCUSSION

In this paper, we investigated the performance of ChatGPT and GPT-4 in various clinical scenarios in renal and liver transplantation to evaluate the potential role of these tools in AI-assisted clinical practice. GPT-4 demonstrated superior performance in all types of scenarios. Specifically, GPT-4 answered correctly approximately six out of ten times when solving challenging multiple-choice questions in renal and liver transplantation. For the published case reports, the comparative analysis shows that both models are highly capable, while GPT-4 generally has an edge in the comprehensiveness of its responses and their alignment with clinical practice. GPT-4 consistently provided more accurate and reliable clinical recommendations than ChatGPT, with higher percentages of full agreement in both renal and liver transplantation. These findings were similar in unpublished work. Notably, GPT-4 demonstrated outstanding performance in specific tasks: it provided a differential diagnosis that included the final diagnosis in 90% of the cases (27 out of 30), suggested an appropriate diagnostic test for further investigating the patient’s symptoms in 78.9% of cases (7 out of 9), and predicted the prognosis of the patient in 100% of related questions (5 out of 5). ChatGPT’s and GPT-4’s performance remained consistent when tested on unpublished material. This suggests that the performance of these tools is unaffected by whether the cases presented were potentially part of the training set. In other words, the performance is genuine and not a result of overfitting (higher performance on the training dataset, which drops significantly when unknown instances are introduced).

While both tools demonstrated notable strengths in addressing a wide range of clinical scenarios, certain areas revealed consistent underperformance. First, the more detailed a case summary was, the more comprehensive the response, and these tools underperformed when tasked with interpreting ambiguous or incomplete clinical data, as their reasoning relies on patterns learned from the training data rather than experiential understanding. Additionally, both models struggled with rare conditions, as those are underrepresented in their training datasets, leading to oversimplified or incorrect recommendations. Furthermore, although GPT-4 demonstrates improved contextual awareness, both models generate responses that, while plausible, lack the depth required for clinical decision-making. These areas of underperformance underscore the importance of human oversight and highlight opportunities for further refinement in AI models for clinical use.
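
The task-level counts reported in this discussion are small, so it can help to view them alongside a confidence interval. The Python sketch below is an illustration only and is not part of the study's analysis: it computes a 95% Wilson score interval for two of the reported counts (27 out of 30 and 5 out of 5). The function name `wilson_interval` and the choice of the Wilson method are assumptions made for this example.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Return the Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = successes / total
    denom = 1 + z**2 / total
    centre = (p_hat + z**2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)) / denom
    )
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Counts quoted in the discussion (correct responses, questions asked).
reported = {
    "differential diagnosis includes final diagnosis": (27, 30),
    "prognosis predicted correctly": (5, 5),
}

for task, (k, n) in reported.items():
    low, high = wilson_interval(k, n)
    print(f"{task}: {k}/{n} = {k / n:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

With these inputs the interval for 27 out of 30 spans roughly 74% to 97%, and the lower bound for 5 out of 5 is about 57%, which illustrates how much uncertainty remains around proportions estimated from a few dozen cases or fewer.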

Other studies have also investigated the role of ChatGPT in renal or liver transplantation. Rawashdeh et al[13] evaluated the potential use and applicability of ChatGPT in medical scenarios related to kidney transplantation. ChatGPT was tested on answering general questions about kidney transplantation, writing scientific texts on this topic, and generating summaries of texts about kidney transplantation[13]. The authors, with the help of two experts, assessed the validity, scientific accuracy, clarity, conciseness, and repeatability of the texts and answers generated by ChatGPT. The results indicated that ChatGPT demonstrated satisfactory knowledge of general issues in kidney transplantation but failed to provide detailed and accurate answers to specific questions[13]. ChatGPT’s responses maintained a scientific language and tone, but some elements were not factual. According to the two experts, none of the answers were error-free, and some of the cited references were inaccurate and unreliable. Finally, the ChatGPT answers and texts had sufficient repeatability, as there were no statistically significant differences between responses generated on separate days[13].

Endo et al[11] investigated the accuracy and reliability of ChatGPT’s responses to questions related to liver transplantation. The authors developed a set of 29 questions covering general information about liver transplantation, including: (1) 4 general questions; (2) 7 questions about the waiting list; (3) 13 questions about the pre-transplant period; and (4) 5 questions about the donor[11]. The quality of the responses was independently graded by 17 experts in the field of abdominal transplant surgery. A total of 493 “quality scores” (29 questions × 17 experts) were collected, of which 46.0% were “very good”, 30.2% were “excellent”, and 7.2% were “poor” or “fair”. Overall, 70.6% of the experts considered ChatGPT to be an accurate source of information[11]. In a different recent study regarding liver transplantation, Akabane et al[12] explored the feasibility of using ChatGPT to generate unknown clinical questions. Finally, in a recent study, Mankowski et al[39] compared the performance of ChatGPT, GPT-4, and GPT-4 visual against nephrology fellows and training program directors on 12 multiple-choice questions assessing six kidney transplant cases. Notably, GPT-4 visual performed comparably to nephrology fellows and training program directors, answering 10 questions correctly, while nephrology fellows and training program directors answered 9 and 11 questions correctly, respectively[39]. Moreover, GPT-4 visual demonstrated significantly higher performance than all its predecessors, showcasing how rapidly these models evolve.

With 400 tested questions, this is the first study of this scale and versatility in clinical transplantation. Our findings support the potential utility of AI models like ChatGPT and GPT-4 in AI-assisted clinical practice, both as sources of accurate, individualized medical information and as aids to decision-making. Our analysis underscores how the performance of these tools improves as they become more sophisticated. As these AI tools evolve, they could potentially address several gaps in renal and liver transplant practice. These include, among others, optimizing workflow by automating routine documentation, synthesizing and summarizing extensive medical literature for clinicians in seconds, enhancing non-specialist clinicians’ access to transplantation expertise in resource-limited settings (particularly in time-sensitive situations), providing personalized decision-support tools for transplant candidate selection, and enhancing patient education by simplifying complex medical concepts, thus fostering better understanding and communication between patients and healthcare providers. Notably, the ultimate goal is for these tools to augment, rather than replace, the role of physicians, ensuring safer and more effective patient care.

Physicians, like all professionals, are not infallible. In general, errors by AI are judged more harshly than errors by humans. An intriguing aspect to explore would be to provide experienced transplant physicians with the same scenarios and compare the performance of AI-based tools with the physicians’ performance. Nevertheless, it is reasonable to assume that as these models keep progressing and becoming more sophisticated, they will eventually surpass physicians in performing certain tasks. Future research should focus on validating these results across a broader range of medical fields, patient populations, and clinical environments to ensure generalizability. Additionally, continuous evaluation and updating are imperative to maintain performance and relevance in clinical decision-making as these models evolve and diverge from their initial versions. Aside from accuracy concerns, the application of AI-based tools in healthcare faces a plethora of other challenges, such as intrinsic bias, data protection and cybersecurity concerns, cost-effectiveness, interpretability, intellectual property, oversight and liability concerns, and ethical concerns[2,40]. Ethical considerations include the complexity of assigning liability in cases of erroneous AI recommendations. Current legal frameworks require physicians to provide care consistent with standard practices, which shields them from liability when standard care is followed[41]. However, this may inadvertently discourage physicians from fully leveraging AI tools, reducing them to confirmatory aids rather than tools to enhance care[41]. Without a comprehensive legal framework addressing AI liability, healthcare facilities may remain hesitant to adopt these technologies due to concerns about potential exposure to malpractice claims. This highlights the urgent need for clear policies to balance innovation with accountability in AI-assisted clinical decision-making.

Another important challenge of adopting AI tools in healthcare is the lack of interpretability (the inability to explain the inner logic that led to a recommendation)[42]. To address this, actionable steps must be taken to ensure that these tools are both trusted and effective. First, tools equipped with interpretability features, such as those demonstrating which patient characteristics most influenced a decision, should be prioritized. Currently, the latest versions of GPT-4 are able to provide citations for the information provided. Integrating AI into real-world clinical workflows will first require studies to assess its practical impact on patient outcomes, workflow efficiency, and clinical adoption. Addressing ethical and regulatory challenges, such as mitigating biases, ensuring data security, and establishing accountability frameworks, will also be critical to realizing the full potential of AI in healthcare. Policymakers must establish clear guidelines mandating a baseline level of explainability for AI tools used in healthcare. This ensures that clinicians can understand and justify AI-assisted decisions, which is essential for maintaining patient trust and ethical integrity. Both transplant professionals and policymakers should encourage ongoing education and training on the use of AI, ensuring that clinicians can effectively apply and evaluate these tools in clinical practice.

Although this study included diverse question types and scenarios related to kidney and liver transplantation, the generalizability of the findings may be limited by the regional and demographic scope of the cases used. One critical consideration is that the training sets of these models, although not fully disclosed, likely draw heavily on publicly available medical literature. Thus, the clinical scenarios tested in this paper could also have been part of the training data of these models. This introduces a risk of overfitting bias, whereby a model performs better on scenarios that are well documented in the literature and considerably worse when it encounters less commonly studied conditions or atypical clinical presentations. This lack of generalizability underscores the need for caution when using these models for conditions underrepresented in the literature. However, we mitigated this by comparing performance on published cases with performance on a set of unpublished work, which showed that performance was maintained at statistically comparable levels.
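
A comparison of this kind, between published and unpublished cases, can be illustrated with a two-sided Fisher exact test on a 2×2 table. The Python sketch below is an illustration under stated assumptions: the counts are hypothetical placeholders and are not the study's data, the function name `fisher_exact_two_sided` is illustrative, and the test shown is not necessarily the one used in this study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# HYPOTHETICAL placeholder counts (NOT taken from the study):
# rows = published vs unpublished cases, columns = full agreement vs other.
published = (18, 7)
unpublished = (17, 8)
p = fisher_exact_two_sided(*published, *unpublished)
print(f"two-sided Fisher exact p-value: {p:.3f}")
```

A large p-value from such a test indicates that no difference was detected between the two sets; with small samples this is weaker evidence than a formal demonstration of equivalence.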

CONCLUSION

In earlier publications, we predicted that AI would eventually “infiltrate” the healthcare industry[43]. It seems now that AI is at healthcare’s doorstep. It is essential to highlight that AI in healthcare should aim to embrace the complexity of our profession and augment our intelligence rather than replace it. Clinical reasoning and critical thinking involve non-quantifiable information that AI cannot integrate. In other words, we should aim for AI-assisted and not AI-driven clinical practice. As more AI tools are integrated into clinical practice, advanced evaluation systems must be developed to assess their unintended consequences and impact on patient outcomes. AI is here, and physicians must engage with it to avoid obsolescence.

References

1.	Ertel W. Introduction to Artificial Intelligence. Wiesbaden: Springer Wiesbaden, 2024.

2.	Christou CD, Tsoulfas G. Challenges and opportunities in the application of artificial intelligence in gastroenterology and hepatology. World J Gastroenterol. 2021;27:6191-6223.

3.	Radford A, Narasimhan K, Salimans T, Sutskever I. Improving language understanding by generative pre-training. [cited 15 October 2024]. Available from: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf.

4.	Kung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepaño C, Madriaga M, Aggabao R, Diaz-Candido G, Maningo J, Tseng V. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023;2:e0000198.

5.	Park JY. Could ChatGPT help you to write your next scientific paper?: concerns on research ethics related to usage of artificial intelligence tools. J Korean Assoc Oral Maxillofac Surg. 2023;49:105-106.

6.	Eysenbach G. The Role of ChatGPT, Generative Language Models, and Artificial Intelligence in Medical Education: A Conversation With ChatGPT and a Call for Papers. JMIR Med Educ. 2023;9:e46885.

7.	Dave T, Athaluri SA, Singh S. ChatGPT in medicine: an overview of its applications, advantages, limitations, future prospects, and ethical considerations. Front Artif Intell. 2023;6:1169595.

8.	Nori H, King N, McKinney SM, Carignan D, Horvitz E. Capabilities of gpt-4 on medical challenge problems. 2023 Preprint. Available from: arXiv:2303.13375.

9.	Clark SC. Can ChatGPT transform cardiac surgery and heart transplantation? J Cardiothorac Surg. 2024;19:108.

10.	Rozenberg D, Singer LG. Predicting outcomes in lung transplantation: From tea leaves to ChatGPT. J Heart Lung Transplant. 2023;42:905-907.

11.	Endo Y, Sasaki K, Moazzam Z, Lima HA, Schenk A, Limkemann A, Washburn K, Pawlik TM. Quality of ChatGPT Responses to Questions Related To Liver Transplantation. J Gastrointest Surg. 2023;27:1716-1719.

12.	Akabane M, Iwadoh K, Melcher ML, Sasaki K. Exploring the potential of ChatGPT in generating unknown clinical questions about liver transplantation: A feasibility study. Liver Transpl. 2024;30:229-234.

13.	Rawashdeh B, Kim J, AlRyalat SA, Prasad R, Cooper M. ChatGPT and Artificial Intelligence in Transplantation Research: Is It Always Correct? Cureus. 2023;15:e42150.

14.	Hricik D. Multiple choice questions. In: Primer on Transplantation. 3rd ed. Hoboken: Wiley Online Library, 2011.

15.	Clavien PA, Trotter JF. Multiple Choice Questions. In: Medical Care of the Liver Transplant Patient. 4th ed. Hoboken: Wiley Online Library, 2012.

16.	Transplant Hepatology Board Review Course Practice Module Supplement QUESTIONS 1. [cited 15 October 2024]. Available from: https://6443bb74ef7c532515d0-3858179a21f8875f9590fc888a54448a.ssl.cf2.rackcdn.com/aasld_f27253d3ef5b93f482a4d5b239a79a86.pdf.

17.	Aziz F, Parajuli S. Complications in Kidney Transplantation: A Case-Based Guide to Management. Cham: Springer Cham, 2022.

18.	MSD Manual Professional Version. [cited 15 October 2024]. Available from: https://www.msdmanuals.com/en-gb/professional.

19.	The Transplantation Society. IPTA Question Bank. [cited 15 October 2024]. Available from: https://tts.org/91-uncategorised/ipta/ipta-resources/144-ipta-question-bank.

20.	United States Medical Licensing Examination. Step 1 Exam Content | USMLE. [cited 15 October 2024]. Available from: https://www.usmle.org/step-exams/step-1/step-1-exam-content.

21.	Alharbi A, Al Turki MS, Aloudah N, Alsaad KO. Incidental Eosinophilic Chromophobe Renal Cell Carcinoma in Renal Allograft. Case Rep Transplant. 2017;2017:4232474.

22.	Rubin R. Case Studies. Transplantation. 2007;84:S15-S16.

23.	Molina-Andújar A, Montagud-Marrahí E, Cucchiari D, Ventura-Aguiar P, De Sousa-Amorim E, Revuelta I, Cofan F, Solé M, García-Herrera A, Diekmann F, Poch E, Quintana LF. Postinfectious Acute Glomerulonephritis in Renal Transplantation: An Emergent Aetiology of Renal Allograft Loss. Case Rep Transplant. 2019;2019:7438254.

24.	Baker S, Popescu M, Akoh JA. Rupture of renal transplant. Case Rep Transplant. 2015;2015:686584.

25.	Gewehr P, Jung B, Aquino V, Manfro RC, Spuldaro F, Rosa RG, Goldani LZ. Sporotrichosis in renal transplant patients. Can J Infect Dis Med Microbiol. 2013;24:e47-e49.

26.	Vassallo D, Husain MM, Greer S, McGrath S, Ijaz S, Kanigicherla D. Hepatitis E infection in a renal transplant recipient. Case Rep Nephrol. 2014;2014:865471.

27.	Olsen SR, Bhutani M. Multiple cavitating nodules in a renal transplant recipient. Can Respir J. 2009;16:195-197.

28.	Allam SR, Sankarapandian B, Memon IA, Nef PC, Livingston TS, Rofaiel G. Biopsy Induced Arteriovenous Fistula and Venous Stenosis in a Renal Transplant. Case Rep Nephrol. 2015;2015:313610.

29.	Subramanian JB, Siddiqui F, Chotai PN, Al-Adwan Y, Rajab A, Washburn K, Schenk AD, Limkemann AJ, Luttrull M, Al-Ebrahim M, Bumgardner G, Singh N. Spinal Stroke following Kidney Transplant. Case Rep Transplant. 2022;2022:2058600.

30.	Ainsworth CD, Crowther MA, Treleaven D, Evanovitch D, Webert KE, Blajchman MA. Severe hemolytic anemia post-renal transplantation produced by donor anti-D passenger lymphocytes: case report and literature review. Transfus Med Rev. 2009;23:155-159.

31.	Okeke R, Lok J, Wells R, Wycoff M, Engelhardt A, Bettag J, O'Leary C, Hallcox T, Nazzal M. Catastrophic Antiphospholipid Syndrome after Orthotopic Liver Transplant. Case Rep Transplant. 2022;2022:6209300.

32.	Eubank TA, Mobley CM, Moaddab M, Hobeika MJ, O'Neal M, Musick WL, Knight JM, Galati JS, Kodali S, Shetty A, Victor DW 3rd, Saharia A, Ghobrial RM, Grimes KA. Successful Treatment of Invasive Mucormycosis in Orthotopic Liver Transplant Population. Case Rep Transplant. 2021;2021:8667589.

33.	Kim E, Adeel A, Bozorgzadeh A, Amano S, Barry CT, Daly JS, Devuni D, Elaba Z, Houk L, Martins PN, Movahedi B, Ramanathan M, Theodoropoulos NM. Treatment of Acute Graft-versus-Host Disease in Liver Transplant Recipients. Case Rep Transplant. 2021;2021:8981429.

34.	Ramírez de la Piscina P, Duca I, Estrada S, Calderón R, Ganchegui I, Campos A, Spicakova K, Urtasun L, Salvador M, Delgado E, Bengoa R, García-Campos F. Combined liver and kidney transplant in a patient with Budd-Chiari syndrome secondary to autosomal dominant polycystic kidney disease associated with polycystic liver disease: report of a case with a 9-year follow-up. Case Rep Gastrointest Med. 2014;2014:585291.

35.	Arstikyte K, Vitkute G, Traskaite-Juskeviciene V, Macas A. Disseminated intravascular coagulation following air embolism during orthotropic liver transplantation: is this just a coincidence? BMC Anesthesiol. 2021;21:264.

36.	Aucejo F, Winans C, Henderson JM, Vogt D, Eghtesad B, Fung JJ, Sands M, Miller CM. Isolated right hepatic vein obstruction after piggyback liver transplantation. Liver Transpl. 2006;12:808-812.

37.	Ichimura K, Kawamura N, Goto R, Watanabe M, Ganchiku Y, Shimamura T, Taketomi A. Living Donor Liver Transplantation for Hepatic Venoocclusive Disease/Sinusoidal Obstruction Syndrome Originating from Hematopoietic Stem Cell Transplantation. Case Rep Transplant. 2022;2022:8361769.

38.	Trevizoli NC, Obeid EJ, Romeres SGB, Oliveira CAM, Rocha HC, Carvalho-Louro DM, Arantes Ferreira GS, De Campos PB, Ullmann RFB, Figueira AVF, Diaz LGG, Jorge FMF, Caja GON, Bortoli ZB, Watanabe ALC. Liver Transplant and Active Ulcerative Colitis: A Case Report. Transplant Proc. 2022;54:1361-1364.

39.	Mankowski MA, Jaffe IS, Xu J, Bae S, Oermann EK, Aphinyanaphongs Y, McAdams-DeMarco MA, Lonze BE, Orandi BJ, Stewart D, Levan M, Massie A, Gentry S, Segev DL. ChatGPT Solving Complex Kidney Transplant Cases: A Comparative Study With Human Respondents. Clin Transplant. 2024;38:e15466.

40.	Christou CD, Tsoulfas G. Challenges involved in the application of artificial intelligence in gastroenterology: The race is on! World J Gastroenterol. 2023;29:6168-6178.

41.	Price WN 2nd, Gerke S, Cohen IG. Potential Liability for Physicians Using Artificial Intelligence. JAMA. 2019;322:1765-1766.

42.	Gilpin LH, Bau D, Yuan BZ, Bajwa A, Specter M, Kagal L. Explaining Explanations: An Overview of Interpretability of Machine Learning. 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA); Turin, Italy. New York: IEEE, 2018: 80-89.

43.	Christou CD, Tsoulfas G. Role of three-dimensional printing and artificial intelligence in the management of hepatocellular carcinoma: Challenges and opportunities. World J Gastrointest Oncol. 2022;14:765-793.

Footnotes

Provenance and peer review: Unsolicited article; Externally peer reviewed.

Peer-review model: Single blind

Specialty type: Transplantation

Country of origin: Greece

Peer-review report’s classification

Scientific Quality: Grade A, Grade B, Grade C

Novelty: Grade A, Grade B, Grade B

Creativity or Innovation: Grade A, Grade B, Grade B

Scientific Significance: Grade A, Grade B, Grade B

Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/

P-Reviewer: Ghafourian E, Li SF, Yi G; S-Editor: Wei YF; L-Editor: A; P-Editor: Zheng XM