Observational Study Open Access
Copyright ©The Author(s) 2025. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Transplant. Sep 18, 2025; 15(3): 103536
Published online Sep 18, 2025. doi: 10.5500/wjt.v15.i3.103536
Comparison of ChatGPT-3.5 and GPT-4 as potential tools in artificial intelligence-assisted clinical practice in renal and liver transplantation
Chrysanthos D Christou, Georgios Katsanos, Georgios Tsoulfas, Center for Research and Innovation in Solid Organ Transplantation, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki 54622, Greece
Olga Sitsiani, Panagiotis Boutos, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki 54622, Greece
Georgios Papadakis, Department of Nephrology and Transplantation, Guy’s Hospital, Guy’s and St Thomas’ NHS Foundation Trust, London SE1 1UL, United Kingdom
Anastasios Tefas, Computational Intelligence and Deep Learning Group, Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki 54636, Greece
Vassilios Papalois, Renal and Transplant Unit, Hammersmith Hospital, Imperial College Healthcare NHS Trust, London W120HS, United Kingdom
ORCID number: Chrysanthos D Christou (0000-0002-5417-8686); Georgios Katsanos (0000-0002-5845-8175); Vassilios Papalois (0000-0003-1645-8684); Georgios Tsoulfas (0000-0001-5043-7962).
Author contributions: Christou CD, Sitsiani O, Boutos P, Katsanos G, Papadakis G, Tefas A, Papalois V, and Tsoulfas G gathered and prepared the clinical scenarios; Christou CD, Sitsiani O, and Boutos P ran the conversations and recorded the answers; Christou CD performed the statistical analysis and drafted the manuscript; and all authors reviewed and edited the manuscript.
Institutional review board statement: This study was conducted using anonymized patient data that are derived from medical records and in compliance with the Declaration of Helsinki and its later amendments and thus does not require IRB approval.
Informed consent statement: This study was conducted using anonymized patient data that are derived from medical records and in compliance with the Declaration of Helsinki and its later amendments and thus does not require informed consent.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
STROBE statement: The authors have read the STROBE Statement-checklist of items, and the manuscript was prepared and revised according to the STROBE Statement-checklist of items.
Data sharing statement: The data underlying this article are available upon reasonable request from the corresponding author.
Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Chrysanthos D Christou, MD, Center for Research and Innovation in Solid Organ Transplantation, School of Medicine, Aristotle University of Thessaloniki, 49 Konstantinoupoleos Street, Thessaloniki 54622, Greece. christouchrysanthosd@gmail.com
Received: November 25, 2024
Revised: January 26, 2025
Accepted: March 5, 2025
Published online: September 18, 2025
Processing time: 147 Days and 12.2 Hours

Abstract
BACKGROUND

Kidney and liver transplantation are two sub-specialized medical disciplines, with transplant professionals spending decades in training. While artificial intelligence-based (AI-based) tools could potentially assist in everyday clinical practice, comparative assessment of their effectiveness in clinical decision-making remains limited.

AIM

To compare the use of ChatGPT and GPT-4 as potential tools in AI-assisted clinical practice in these challenging disciplines.

METHODS

In total, 400 different questions tested ChatGPT’s/GPT-4 knowledge and decision-making capacity in various renal and liver transplantation concepts. Specifically, 294 multiple-choice questions were derived from open-access sources, 63 questions were derived from published open-access case reports, and 43 from unpublished cases of patients treated at our department. The evaluation covered a plethora of topics, including clinical predictors, treatment options, and diagnostic criteria, among others.

RESULTS

ChatGPT correctly answered 50.3% of the 294 multiple-choice questions, while GPT-4 demonstrated a higher performance, answering 70.7% of questions (P < 0.001). Regarding the 63 questions from published cases, ChatGPT achieved an agreement rate of 50.79% and partial agreement of 17.46%, while GPT-4 demonstrated an agreement rate of 80.95% and partial agreement of 9.52% (P = 0.01). Regarding the 43 questions from unpublished cases, ChatGPT demonstrated an agreement rate of 53.49% and partial agreement of 23.26%, while GPT-4 demonstrated an agreement rate of 72.09% and partial agreement of 6.98% (P = 0.004). When factoring by the nature of the task for all cases, notably, GPT-4 demonstrated outstanding performance, providing a differential diagnosis that included the final diagnosis in 90% of the cases (P = 0.008), and successfully predicting the prognosis of the patient in 100% of related questions (P < 0.001).

CONCLUSION

GPT-4 consistently provided more accurate and reliable clinical recommendations with higher percentages of full agreements both in renal and liver transplantation compared with ChatGPT. Our findings support the potential utility of AI models like ChatGPT and GPT-4 in AI-assisted clinical practice as sources of accurate, individualized medical information and facilitating decision-making. The progression and refinement of such AI-based tools could reshape the future of clinical practice, making their early adoption and adaptation by physicians a necessity.

Key Words: Artificial intelligence; ChatGPT; GPT-4; Transplantation; Kidney; Liver; Clinical decision support; Generative artificial intelligence

Core Tip: GPT-4 outperformed ChatGPT in a wide range of clinical scenarios related to kidney and liver transplantation, demonstrating greater accuracy and alignment with physician decisions across a variety of tasks, including differential diagnosis, choosing appropriate diagnostic tests and treatment, and predicting the prognosis of patients. These findings highlight the potential of artificial intelligence models like GPT-4 as valuable tools in supporting clinical decision-making in transplantation.



INTRODUCTION

Artificial intelligence (AI) is an umbrella term used to describe any application where computer systems perform tasks traditionally linked with human intelligence. AI is, in reality, a broad field with a plethora of interrelated fields, including, among others, machine learning and probabilistic reasoning, deep learning, fuzzy systems, computer vision, and natural language processing[1]. Despite their differences, all these fields have one thing in common: They are driven by the advancements in big data and computing power. Other disciplines of AI, particularly machine learning, have found profound applications in healthcare, with models being utilized in the prevention, diagnosis, treatment, and prognosis of a plethora of diseases[2]. However, the use of other disciplines, such as natural language processing, in healthcare has been, until recently, limited. ChatGPT emerged in late 2022, by the AI research company OpenAI (San Francisco, CA, United States), having a profound impact in many industries. ChatGPT is a large language model that uses deep learning techniques to produce human-like responses to natural language inputs, based on a vast corpus of text data[3]. It is designed to interact with the user in a human-like manner in order to understand users’ requests and answer in an appropriate manner in order to assist in the problem-solving process, or it can just be used to conduct a human-to-computer real-time dialogue.

Despite not being developed as a medical tool, its potential application in healthcare has raised widespread attention with articles investigating its ability to provide reliable medical information on a variety of medical topics, pass medical exams, and assist in medical writing[4-6]. Contemporary healthcare systems urgently need to enhance their precision and accuracy in examinations, diagnosis, and treatment of the patient while reducing the time required for these procedures. Of utmost importance in this process is deemed to be the right and timely decision-making. The two characteristics of ChatGPT, the wide variety of information that has been trained with and the capability of constant answers, raise the question of whether this breakthrough could be used in everyday medical decision-making processes[7]. GPT-, the latest large language model introduced by OpenAI, has already proved to perform superiorly to its “predecessor”, ChatGPT, in medical exams[8]. AI tools have shown promising performance in tasks such as data analysis, diagnostic support, and clinical decision-making. However, their adoption into the clinical setting is hindered by several challenges, such as their underperformance in complex clinical reasoning and their proneness to biases, particularly when rare conditions are underrepresented in the training data. Additionally, their lack of transparency in how decisions are generated undermines their trustworthiness. Transplantation is a highly sub-specialized medical discipline, with transplant professionals spending decades in training. Few studies have been published regarding heart and lung transplantation[9,10]. This paper aims to investigate the performance of ChatGPT in the challenging medical disciplines of kidney and liver transplantation and compare it with the performance of GPT-4. Currently, only a few efforts exist in the literature investigating the role of ChatGPT in renal and liver transplantation and particularly comparing the performance of ChatGPT with GPT-4 in these disciplines[11-13].

MATERIALS AND METHODS
Resources

In total 400 different questions tested ChatGPT’s/GPT-4 knowledge and decision-making capacity in various concepts regarding renal and liver transplantation as follows: (1) 294 multiple-choice questions regarding liver and renal transplantation were collected from various open-access sources, including transplantation associations, medical examinations (such as United States Medical Licensing Examination), and book chapters; (2) Secondly, 63 questions derived from 20 open-access, published case reports regarding renal and liver transplantation were collected from the literature. From each case, a series of questions regarding differential diagnosis, diagnostic test, appropriate treatment, and prognosis were derived; and (3) Lastly, 43 questions derived from 10 unpublished cases regarding renal and liver transplantation from patients treated in our department were written. Similarly, a series of questions regarding the diagnosis and management of these patients were derived.

These 400 questions were selected to encompass a broad spectrum of clinical scenarios commonly encountered in kidney and liver transplantation. We also ensured they covered a plethora of tasks such as diagnostic reasoning, treatment planning, and prognostic evaluation to comprehensively assess of the models’ capabilities. Questions were sources from published case reports as well as unpublished department cases, which provided diverse and nuanced real-world contexts. This approach ensured the evaluation captured both the consistency and adaptability of these tools in clinical practice.

Dialogue design

Two different conversations were made for each multiple question, case report, or department case, one using ChatGPT and one using GPT-4, respectively. Default models (no fine-tuning or customization models) were used, as provided by OpenAI at the time of research. An example of a multiple-choice question dialogue is shown below:

“Physician: I am going to provide a series of virtual scenarios regarding liver transplantation. As a first task, I will need you to assign, based on the context of the question, one of the following labels: i. General information, ii. Diagnosis, iii. Differential Diagnosis, iv. Treatment, v. Prognosis, vi. Other. Then, as a second task, I will provide you with multiple-choice answers and I will need you to give the correct answer. Are you ready?

AI: Absolutely, I’m ready! Please proceed with the virtual scenarios, and I’ll do my best to assist you with the labelling and answering tasks.

Physician: Question 1: Which of the following is an absolute contraindication for liver transplantation?

A. History of cancer

B. Active tuberculosis

C. Cystic Fibrosis

D. Portal thrombosis

E. Active substance abuse

AI: Label: Treatment

Correct Answer: E. Active substance use”

The first task was to assign a label regarding the nature of the task of the multiple-choice question. The answer was compared with the label assigned by two physicians. Secondly, provided with the multiple-choice question, ChatGPT/GPT-4 was asked to solve the question to the best of their knowledge. Regarding the published case reports and the unpublished department cases, the cases were presented gradually, as parts, posing questions in between. All parts/questions were pre-defined and were not affected by the in-between ChatGPT/GPT-4 answers to avoid bias. Unpublished work was used to ensure that performance remained consistent for cases that were not part of ChatGPT’s or GPT-4’s training dataset (leading to high performance due to overfitting). ChatGPT and GPT-4were tested for the following skills: Providing a differential diagnosis for a patient following case presentation, suggesting the most probable diagnosis, suggesting the next diagnostic test, and finally, suggesting appropriate treatment options. An example of a case report/unpublished department case dialogue interaction is shown below:

“Physician: I am going to provide you with information regarding a case report of a patient who underwent liver transplantation. I am going to provide the information gradually, asking some questions in between.

A 60-year-old patient with a history of cirrhosis due to alcohol-associated liver disease […]

Could you provide a differential diagnosis for the etiology of this patient’s shock?

AI: Based on the information provided […]

Physician: Which is the most probable diagnosis for this patient?

AI: Based on the information provided […]

Physician: What further tests do you believe are needed for this patient?

AI: Based on the patient’s presentation and suspected […]

Physician: Further investigations were sent including mixed chimerism studies that revealed […], consistent with the diagnosis of graft vs host disorder.

What treatment(s) could be used for this patient?”

Study outcomes

For multiple-choice questions, we recorded agreement and disagreement with the physician’s label for concept labelling and with the correct answer for response selection. Concept labelling was conducted by two authors independently and then reviewed by a third author. Disagreements were resolved through discussion among the three. For case reports and unpublished department cases, we assigned end-points as follows: Disagreement when the ChatGPT/GPT-4 proposal did not match/was different from the physicians’ decision. Partial agreement was assigned if the ChatGPT/GPT-4 proposal included a portion of the actions taken by physicians, and finally, agreement was assigned when the ChatGPT/GPT-4 proposal either perfectly matched the physicians’ actions or when the ChatGPT/GPT-4 included additional actions. For example, if physicians used medications A and B for treatment, and ChatGPT/GPT-4 proposed drug C, this would be labelled as disagreement. If ChatGPT/GPT-4 proposed drug A, it would be labelled as partial agreement. If it proposed both drugs A and B, or suggested a choice among A, B, and C, it would be labelled as agreement. All “ground truth” labels were determined before any of the conversations with these tools took place to mitigate confirmation and observer bias.

Data extraction

To create a dataset that could be used for statistical analysis, we constructed dataset tables that could then be translated into variables. This procedure required two layers of data decoding. More specifically, two types of tables were created for each resource dialogue conducted with ChatGPT/GPT-4 that had the following structure: (1) For multiple-choice questions, the table included the following details: Serial number of the dialogue with ChatGPT/GPT-4, the question posed, the resource, the predefined label for the question, the label assigned by ChatGPT/GPT-4, the agreement (A) or disagreement (D) status regarding the label, the ChatGPT/GPT-4 response to the question, and the agreement or disagreement status regarding the question; and (2) For the case reports and department cases, the table included: Serial number of the dialogue with ChatGPT/GPT-4, the question posed, the resource (not included in department cases), the action taken by the author’s teams for published case reports and our team for department cases, the actions proposed by ChatGPT/GPT-4, and the agreement, disagreement, or partial agreement status.

Statistical analysis

For each scenario, we assessed the model’s performance by categorizing the response as agreement, partial agreement, or disagreement based on the predefined criteria above. The proportions of responses in each category were then calculated across all scenarios. To compare the performance of ChatGPT and GPT-4, we used Pearson’s χ2-test to evaluate the distribution of agreement levels across the two models. Statistical significance was defined as P < 0.05. All statistical analyses were performed using SPSS29.

RESULTS
Collected resources

In total, the study generated 1388 data points, 1176 from multiple-choice questions and 212 from case questions. Two hundred ninety-four multiple-choice questions regarding renal and liver transplantations were collected, 108 regarding renal and 186 regarding liver transplantation[14-20]. When it comes to the nature of each scenario, 78 (26.5%) regarded general information, 22 (7.5%) regarded differential diagnosis, 48 (16.3%) regarded appropriate diagnostic test(s), 86 (29.3%) regarded treatment, and 60 (20.4%) regarded prognosis. Twenty case reports were selected from the literature. Ten cases regarded kidney transplantation, and 29 questions were derived from those cases. Ten cases regarded liver transplantation, and 34 questions were derived from those cases. Thus, in total, 63 questions on published case reports were tested. Regarding unpublished department cases, we chose 10 cases. Five cases regarded renal transplantation, and 26 questions were derived from those cases. Five cases regarded liver transplantation, and 17 questions were derived from those cases. In total, 43 questions about department cases were tested.

Comparing ChatGPT and GPT-4 performance

Multiple-choice questions: Tables 1 and 2 show the performance of ChatGPT and GPT-4 in assigning context labels for the 294 multiple-choice questions. Supplementary Tables 1 and 2 show the performance of ChatGPT and GPT-4 in answering those questions. Overall, ChatGPT assigned 58.2% correct labels (171 out of 294). ChatGPT’s accuracy in assigning appropriate labels varied across categories. Specifically, the highest accuracy was demonstrated in the treatment category, reaching 74.42% (64 out of 86), followed by diagnosis at 68.75% (33 out of 48). Performance regarding differential diagnosis and prognosis was the same, at 50%.

Table 1 Overall ChatGPT performance in assigning context labels across 294 virtual cases, highlighting agreement with predefined labels, n (%).
Actual label
Assigned label, GI
Assigned label, diagnosis
Assigned label, DD
Assigned label, treatment
Assigned label, prognosis
Assigned label, total
GI33 (11.22)20 (6.8)6 (24)16 (5.44)3 (12)78 (26.53)
Diagnosis8 (2.72)33 (11.22)5 (1.7)0 (0)2 (0.68)48 (16.33)
DD0 (0)11 (3.74)11 (3.74)0 (0)0 (0)22 (7.48)
Treatment9 (36)10 (3.4)2 (0.68)64 (21.77)1 (0.34)86 (29.25)
Prognosis12 (48)12 (48)3 (12)3 (12)30 (10.2)60 (20.41)
Total62 (219)86 (29.25)27 (9.18)83 (28.23)36 (12.24)294 (100)
Table 2 Overall GPT-4 performance in assigning context labels in virtual cases across 294 virtual cases, highlighting agreement with predefined labels, n (%).
Actual label
Assigned label, GI
Assigned label, diagnosis
Assigned label, DD
Assigned label, treatment
Assigned label, prognosis
Assigned label, total
GI42 (14.29)14 (4.76)1 (0.34)20 (6.8)1 (0.34)78 (26.53)
Diagnosis2 (0.68)44 (14.97)1 (0.34)1 (0.34)0 (0)48 (16.33)
DD0 (0)15 (5.1)5 (1.7)2 (0.68)0 (0)22 (7.48)
Treatment5 (1.7)7 (2.38)1 (0.34)73 (24.83)0 (0)86 (29.25)
Prognosis10 (3.4)11 (3.74)5 (1.7)7 (2.38)27 (9.18)60 (20.41)
Total59 (207)91 (30.95)13 (4.42)103 (353)28 (9.52)294 (100)

GPT-4 significantly improved overall accuracy compared to ChatGPT labelling correctly 191 out of 294 scenarios (64.97% vs 58.16%, P < 0.001). GPT-4 demonstrated improved performance in some categories compared to ChatGPT and lower in others. Notably, its performance in assigning the diagnosis label reached 91.7% (44 out of 48), a statistically significant difference compared to ChatGPT (P = 0.049). The treatment category also demonstrated a statistically significant improved accuracy compared with ChatGPT at 84.9% (73 out of 86, P < 0.001). GPT-4 performed poorer than ChatGPT in assigning differential diagnosis (22.7%), but this did not reach statistical significance (P = 0.13).

The performance of ChatGPT and GPT-4 in answering questions regarding kidney and liver transplantation was evaluated through a detailed review of their agreement and disagreement rates across multiple scenarios. The evaluation covered a plethora of topics, including clinical predictors, treatment options, and diagnostic criteria, among others (Supplementary Tables 1 and 2). Overall, ChatGPT correctly answered 50.3% (148 out of the 294) multiple-choice questions, while GPT-4 demonstrated a higher performance, answering 70.7% of questions (208 out of 294), which was found statistically significant (P < 0.001). Regarding kidney transplantation, ChatGPT demonstrated an accuracy of 71.3% (77 out of 108), while GPT-4 had an accuracy of 83.3% (90 out of 108), a statistically significant difference (P = 0.006). Interestingly, both tools were right in 63.9% of instances (69 out of 108), both incorrect in 9.3% (10 out of 108), ChatGPT correct and GPT-4 incorrect in 7.4% (8 out of 108), and ChatGPT incorrect and GPT-4 correct in 19.4% (21 out of 108). Regarding liver transplantation, ChatGPT demonstrated an accuracy of 38.2% (71 out of 186), while GPT-4 had an accuracy of 63.4% (118 out of 186), a statistically significant improvement (P < 0.001) Interestingly, both tools were right in 33.9% of instances (63 out of 186), both incorrect in 32.3% (60 out of 186), ChatGPT correct and GPT-4 incorrect in 4.3% (8 out of 186), and ChatGPT incorrect and GPT-4 correct in 29.6% (55 out of 186).

When factoring based on the nature of the scenario, ChatGPT demonstrated an overall agreement of 43.6% (34 out of 78) for general information, 81.8% (18 out of 22) for differential diagnosis, 60.4% (29 out of 48) for the next diagnostic test, 45.3% for treatment (39 out of 86), and 46.7% for prognosis (28 out of 60). On the other hand, GPT-4 demonstrated superior performance in all types of scenarios except those regarding differential diagnosis. Specifically, GPT-4 demonstrated an overall agreement rate of 67.9% (53 out of 78) for general information, 77.3% (17 out of 22) for differential diagnosis, 77.1% (37 out of 48) for the next diagnostic test, 66.3% for treatment (57 out of 86), and 73.3% for prognosis (44 out of 60).

Published case reports: Table 3[21-30] and Table 4[22,31-38] compare the performance of ChatGPT and GPT-4 in various clinical tasks derived from published case reports. Overall, ChatGPT demonstrated an agreement rate of 50.79% (32 out of 63), a partial agreement rate of 17.46% (11 out of 63), and a disagreement rate of 31.75% (20 out of 63). GPT-4 demonstrated an agreement rate of 80.95% (51 out of 63), partial agreement of 9.52% (6 out of 63), and disagreement of 9.52% (6 out of 63). The overall performance of GPT-4 was found to be significantly higher compared with ChatGPT (P = 0.01). Regarding renal transplantation, ChatGPT demonstrated an agreement rate of 62.07% (18 out of 29), partial agreement of 13.79% (4 out of 29), and disagreement of 24.14% (7 out of 29). GPT-4 demonstrated an agreement rate of 89.66% (26 out of 29), partial agreement of 6.9% (2 out of 29), and disagreement of 3.45% (1 out of 29). Regarding liver transplantation, ChatGPT demonstrated an agreement rate of 41.18% (14 out of 34), partial agreement of 20.59% (7 out of 29), and disagreement of 38.24% (13 out of 34). GPT-4 demonstrated an agreement rate of 73.53% (25 out of 34), partial agreement of 11.76% (4 out of 34), and disagreement of 14.71% (5 out of 34). Supplementary Table 3 presents the performance of ChatGPT vs GPT-4 when categorized by the nature of the task. Notably, GPT-4 demonstrated outstanding performance, providing a differential diagnosis that included the final diagnosis in 94.4% of the cases (17 out of 18). Furthermore, GPT-4 suggested an appropriate diagnostic test for further investigating patient’s symptoms in 90% of cases (9 out of 10). Finally, GPT-4 successfully suggested a treatment that agreed with the ground truth in 93.3% of the cases (14 out of 15).

Table 3 Comparative performance of ChatGPT and GPT-4 in case reports on renal transplantation, detailing agreement levels by task type.
Ref.
Question number
Task
Performance, ChatGPT/GPT-4
Physicians course of action/ground truth
Agreement status, ChatGPT/GPT-4
Alharbi et al[21]1Provide a list of suitable antibiotics for pseudomonas aeruginosa urinary tract infection.Provided a list of suitable antibiotics including the one used by physicians (meropenem)/provided a list of suitable antibiotics including the one used by physicians (meropenem)Meropenem was administratedA/A
2Suggest the next diagnostic test(s) needed for the patientSuggested a renal ultrasound and a stool culture/suggested a renal ultrasound, abdominal CT, blood cultures, and a stool cultureAbdominal ultrasound and abdominal CT scan were conductedPA/A
3Identify the most probable diagnosis for the patient Renal allograft malignancy/renal allograft malignancyEosinophilic chromophobe renal cell carcinoma was confirmed by the histopathological examination of the graftA/A
Rubin et al[22]4Provide a DD for the patientProvided a DD that included the final diagnosis/provided a DD that included the final diagnosisCMV viremiaA/A
5Provide the most likely diagnosis for the patientPost-influenza bacterial pneumonia/CMV reactivationCMV viremia was demonstrated by antigenemia and PCR assayD/A
6Suggest treatment for the patientSuggested ganciclovir, valganciclovir, foscarnet, and cidofovir (most preferable ganciclovir or valganciclovir)/suggested ganciclovir, valganciclovir, foscarnet, and cidofovir (most preferable ganciclovir or valganciclovir)Intravenous ganciclovir followed by oral valganciclovir at a dose of 900 mg/day was administeredA/A
Molina-Andújar et al[23]7Provide a DD for the patientProvided a DD that included the final diagnosis/ Provided a DD that included the final diagnosisAcute post-infectious glomerulonephritisA/A
8Provide the most likely diagnosis for the patientAcute post-infectious glomerulonephritis/acute post-infectious glomerulonephritisAcute post-infectious glomerulonephritisA/A
Baker et al[24]9Provide the next step patient’s managementSuggested hemodynamic stabilization with transfusion of blood products and bleeding control including surgical intervention, if necessary, followed by continuous monitoring/Suggested hemodynamic stabilization with transfusion of blood products and surgical exploration if bleeding if suspected to be within the surgical site. Suggested medication reevaluation focus on anticoagulants, prophylactic treatment for infection prevention and continuous monitoring. The patient was taken back to theatre for exploration where ligation of the bleeding artery, removal of blood clots and blood transfusion took place. Postoperative monitoring was performedA/A
10Suggest the next diagnostic test needed for the patientSuggested an abdominal CT scan or an ultrasound/suggested imaging such as abdominal CT with contrast, an ultrasound or an angiogram is performed. Suggested evaluating the patient with new laboratory tests and for the need of re-explorationAn urgent CT angiogram was performedPA/A
11Provide a DD for the bleedingProvided a DD that included the final diagnosis/provided a DD that included the final diagnosisBleeding from a small branch of the renal arteryA/A
12Provide the most likely diagnosis for the patientFailure or dislodgement of a surgical clip: Bleeding from a small branch of the renal artery where a surgical clip had come off during the re-exploration surgery/Failure or dislodgement of a surgical clip: Bleeding from a small branch of the renal artery where a surgical clip had come off during the re-exploration surgeryBleeding was noticed from a small branch of the renal arteryA/A
Gewehr et al[25]13Provide a DD for the patientProvided a DD that included the final diagnosis/provided a DD that included the final diagnosisFungal infectionA/A
14Provide the most likely diagnosis for the patientFungal Infection/fungal Infection, and specifically sporotrichosisFungal Infection (sporotrichosis)A/A
Vassallo et al[26] 15Provide a DD for the patientProvided a DD that included the final diagnosis/provided a DD that included the final diagnosisActive hepatitis E virus infectionA/A
16Provide the most likely diagnosis for the patientNAFLD/NAFLD or drug inducedActive hepatitis E virus infectionD/D
17Suggest the next diagnostic test needed for the patientSuggested liver biopsy along with further imaging and laboratory investigations/suggested liver biopsy along with further imaging and laboratory investigationsLiver biopsyA/A
18Suggest the next diagnostic test needed for the patient after the biopsy resultsSuggested extensive viral serologic tests, PCR for suspected viruses, immunostaining of liver biopsy, and continuous monitoring of liver function/suggested extensive viral serologic tests, PCR for suspected viruses, immunostaining of liver biopsy, and continuous monitoring of liver functionA more extensive viral screen was conductedA/A
Olsen et al[27]19Provide a DD for the patientProvided a DD that included the final diagnosis/provided a DD that included the final diagnosisEpstein-Barr virus-negative, diffuse, large B-cell lymphomaA/A
20Provide the most likely diagnosis for the patientSuggested that infectious aetiologies such as disseminated tuberculosis or fungal infections are more likely. It implied that diagnosis is difficult without further diagnostic investigations/determined as PTLD as the most likely diagnosis followed by infectious aetiologiesEpstein-Barr virus-negative, diffuse, large B cell lymphomaD/A
21Suggest the next diagnostic test needed for the patientSuggested sputum and/or BAL cultures, Mantoux test or IGRA, Blood tests, further imaging and laboratory tests, and lung biopsy/suggested liver biopsy, sputum and/or BAL cultures, Mantoux test or IGRA, Blood tests, further imaging and laboratory tests, and lung biopsyBiopsy from one of the liver lesionsD/A
Allam et al[28]22Suggest the next diagnostic test needed for the patientSuggested a kidney biopsy/suggested a kidney biopsy and further laboratory testsTransplant biopsy was performedA/A
23Provide a DD for the patientProvided a DD that did not include the final diagnosis/provided a DD including vascular complications such as vein stenosisBiopsy-induced arteriovenous fistula and venous stenosisD/PA
24Suggest treatment for the patientSuggested intervention to address the arteriovenous fistula and stenosis of the main renal vein (embolization, angioplasty, stenting)/suggested intervention to address the arteriovenous fistula and stenosis of the main renal vein (embolization, angioplasty, stenting)Embolization of fistula (coil occlusion)A/A
Subramanian et al[29]25Provide a DD for the patientProvided a DD that did not include the final diagnosis/provided a DD that included the final diagnosisA small basal ganglia infarct and an infarct of the spinal cord was foundD/A
26Provide the most likely diagnosis for the patientSuggested ischemic injury or infarction of the spinal cord/suggested spinal cord ischemia or infarctionA small basal ganglia infarct and an infarct of the spinal cord was foundA/A
27Suggest the next diagnostic test needed for the patientSuggested spine MRI, NCS and EMG to assess peripheral nerves and muscles, lumbar puncture if infections suspected, and transplant biopsy if rejection or ischemia is suspected/suggested spine MRI-MRA, neurond physiological studies (SSEP, NCS and EMG), lumbar puncture if infections suspectedA CTAP, and spine/brain MRI were performed PA/PA
Ainsworth et al[30]28Provide a DD for the patientProvided a DD that included immune-mediated hemolysis but did not specifically include PLS/provided a DD that included the final diagnosisPLSPA/A
29Provide the most likely diagnosis for the patientSuggested hemolysis due to mismatched blood type of the donor/suggested PLSPLSD/A
Table 4 Comparative performance of ChatGPT and GPT-4 in case reports on liver transplantation, detailing agreement levels by task type.
Ref.
Question number
Task
Performance, ChatGPT/GPT-4
Physicians course of action/ground truth
Agreement status, ChatGPT/GPT-4
Rubin et al[22]1Case presentation/provide a DD for the patientProvided a DD that included the final diagnosis/provided a DD that included the final diagnosisCMVA/A
2Provide the most likely diagnosis for the patientSuggested post-transplant infection, particularly a viral infection (CMV, EBV, or VZV)/CMVCMVPA/A
3Justify the recurrence of CMV infections despite treatmentSuggested resistance to ganciclovir/suggested resistance to ganciclovir or/and inadequate duration of initial treatment-secondary infectionsGanciclovir resistant infectionA/A
4Suggest alternative treatment for the patientSuggested foscarnet/suggested foscarnet or cidofovir or letermovir or/and CMV immunoglobulinsFoscarnet was administeredA/A
Okeke et al[31]5Case presentation/suggest treatment for the patient given no arterial flow in the liverSuggested interventional radiology procedures or/and surgical revascularization/suggested interventional radiology procedures or/and surgical revascularization (thrombectomy or re-anastomosis)Interventional radiology procedure (thrombolysis) was performed. Then revascularization was achieved intraoperatively (infra-aortic jump was performed)PA/PA
6Suggest the diagnostic tests needed for the patient following re-thrombosisSuggested doppler ultrasound, CT angiogram, coagulation profile-thrombophilia testing/suggested thrombophilia workup, repeat imaging (doppler ultrasound, CT/MRI angiography), and autoimmune markersHypercoagulable workup was performedA/A
7Provide a DD behind re-thrombosisProvided a DD that included the final diagnosis/provided a DD that included the final diagnosisAntiphospholipid syndromeA/A
8Provide the most likely diagnosis for the patientSuggested hepatic artery thrombosis/suggested antiphospholipid syndromeAntiphospholipid syndromeD/A
Eubank et al[32]9Case presentation/determine the most likely microorganism to be identified by the swabSuggested Staphylococcus aureus, Streptococcus species, Enterococcus species, and Pseudomonas aeruginosa, and fungi like Candida albicans/suggested Staphylococcus aureus, Enterococcus species, Pseudomonas aeruginosa, Escherichia coli, fungi like Candida or Aspergillus, viruses like CMV, and mycobacteria94% Enterococcus faecalis, 93% Rhizopus oryzae, and 5% Aspergillus flavusD/PA
10Suggest treatment for the patient given the pathogens identifiedSuggested intravenous liposomal amphotericin B at an appropriate dosage, along with surgical debridement of infected tissue/suggested intravenous liposomal amphotericin B at an appropriate dosage, oral posaconazole along with surgical debridement of infected tissue. Oral posaconazole 300 mg and IV amphotericin B and micafungin daily. Amphotericin B deoxycholate irrigation in the wound vacuumPA/A
Kim et al[33]11Case presentation/provide a DD for the patient’s shockProvided a DD that did not include the final diagnosis/provided a DD that included the final diagnosisGVHDD/A
12Provide the most likely diagnosis for the patientSuggested a surgical complication, specifically duodenal perforation/suggested duodenal perforation or drug-induced kidney injury/neutropeniaGVHDD/D
13Suggest the further diagnostic tests needed for the patientSuggested blood cultures, peritoneal fluid analysis, endoscopy or upper GI imaging/suggested blood and urine cultures, viral and fungal tests, peritoneal fluid analysis, laboratory tests, and endoscopy or upper GI imagingMixed chimerism studies and skin biopsy were performedD/D
14Suggest further treatment for the patient given the mixed chimerism studies resultsThe following treatment options were suggested: Systemic corticosteroids, adjusting tacrolimus dose, consider additional immunosuppressives such as mycophenolate, and phototherapy/suggested considering the following treatment options: High-dose corticosteroids, ATG, ECP, infliximab, ruxolitinib, MSC transplantation, additional immunosuppressive agents, and IL-2 diphtheria toxinSteroids were administrated for 4 days followed by ruxolitinib due to patient not responding to treatmentPA/A
15Guess the survival of the patientSuggested that the patient did not, most likely, survive/suggested that the patient did not, most likely, surviveThe patient died on day 16 of re-admission, 45 days following transplantationA/A
Kim et al[33], (b)16Case presentation/provide a DD for the patientProvided a DD that did not include the final diagnosis/provided a DD that included the final diagnosisGVHDD/A
17Provide the most likely diagnosis for the patientSuggested Clostridioides difficile colitis/suggested GVHDGVHDD/A
18Suggest treatment for the patientThe following treatment options were suggested: Glucocorticoids, CNIs, ATG, T-cell depleting agents such as basiliximab/high-dose corticosteroids, adjust immunosuppression, consider second line treatments such as ATG, ECP, sirolimus, infliximab, and basiliximabSteroids were administrated for 2 days followed by ruxolitinib due to patient not responding to treatmentPA/PA
19Guess the survival of the patientDeclined to make a prediction/suggested that the patient did not, most likely, surviveThe patient died 29 days after transplantD/A
Ramírez de la Piscina et al[34]20Case presentation/Provide a DD for the patientProvided a DD that included the final diagnosis/ provided a DD that included the final diagnosisBudd-Chiari syndrome secondary to ADPKDA/A
21Provide the most likely diagnosis for the patientSuggested Budd-Chiari syndrome/suggested Budd-Chiari syndrome secondary to the compression from ADPKD cystsBudd-Chiari syndrome secondary to ADPKDA/A
22Suggest treatment for the patientProvided a list of suitable treatment options including only liver transplantation/provided a list of suitable treatment options including combined transplantationA combined liver and renal transplantation was performedPA/A
Arstikyte et al[35]23Case presentation/provide a DD for the patientProvided a DD that included the final diagnosis/provided a DD that included the final diagnosisVenous air embolismA/A
24Provide the most likely diagnosis for the patientSuggested that information given is insufficient to single out a specific diagnosis/suggested that based on given information hemorrhage or venous air embolism are the two most likely diagnosesVenous air embolismD/A
25Suggest appropriate diagnostic test for the patientSuggested TEE/suggested TEETEEA/A
Aucejo et al[36]26Case presentation/provide a DD for the patientProvided a DD that did not include the final diagnosis/provided a DD that did not include the final diagnosisNarrowing of the RHV at the level of the cava-caval anastomosisD/D
27Provide the most likely diagnosis for the patientSuggested adhesions, anastomotic leakage, or biliary complications/suggested PVTNarrowing of the RHV at the level of the cava-caval anastomosisD/D
28Given the RHV stenosis diagnosis, suggest treatment for the patientSuggested considering stent placement, TIPS or surgical revision/suggested considering stent placement, TIPS or surgical revisionA wall stent 14 mm in diameter by 40 mm in length was placed across the RHV stenosisA/A
Ichimura et al[37]29Case presentation/provide a DD for the patientProvided a DD that included the final diagnosis/provided a DD that included the final diagnosisVOD/SOSA/A
30Provide the most likely diagnosis for the patientSuggested GVHD/suggested VOD/SOSVOD/SOSD/A
31Suggest treatment for the patient given VOD/SOSSuggested considering defibrotide, anticoagulant medications, and liver transplantation/suggested considering defibrotide, anticoagulant medications, TIPS, and liver transplantationThe physicians performed a liver transplantation since defibrotide had not yet been approvedA/A
32Provide a new differential diagnosis for the patient’s deterioration postoperativelyProvided a DD that did not include the final diagnosis/provided a DD that included the final diagnosisGVHD, several infectionsD/A
Trevizoli et al[38]33Case presentation/suggest appropriate treatment for the patientSuggested considering corticosteroids, aminosalicylates, immunomodulators such as azathioprine, biologic agents such as infliximab, diuretics, variceal bleeding prophylaxis and liver transplant evaluation/suggested considering corticosteroids, aminosalicylates, immunomodulators such as azathioprine, biologic agents such as infliximab, consider surgical management (colectomy), diuretics, variceal bleeding prophylaxis and liver transplant evaluationSodium restriction, diuretic therapy, hydrocortisone 300 mg was started without adequate response, vedolizumabPA/PA
34Suggest appropriate treatment for the patient given the DVT progressionSuggested LMWH and IVF/suggested LMWHHe underwent hemodynamic intervention with the placement of a vena cava filterA/D

Unpublished department cases: Supplementary Tables 4 and 5 provide the case presentation of the unpublished department cases provided to ChatGPT/GPT-4 before their performance was tested on various tasks. Tables 5 and 6 compare the accuracy of ChatGPT and GPT-4 in various clinical tasks derived from those unpublished department cases. Overall, ChatGPT demonstrated an agreement rate of 53.49% (23 out of 43), partial agreement of 23.26% (10 out of 43), and disagreement of 23.26% (10 out of 43). GPT-4 demonstrated an agreement rate of 72.09% (31 out of 43), partial agreement of 6.98% (3 out of 43), and disagreement of 20.93% (9 out of 43). The overall performance of GPT-4 was found to be significantly higher compared with ChatGPT (P = 0.004).

Table 5 Comparative performance of ChatGPT and GPT-4 in department cases on renal transplantation, detailing agreement levels by task type.
Case ID
Question number
Task
Performance, ChatGPT/GPT-4
Physicians course of action/ground truth
Agreement status, ChatGPT/GPT-4
11Case presentation/provide the diagnostic tests needed to investigate refractory ascites in patient with ADPKDSuggested abdominal ultrasound, paracentesis with fluid analysis, LF tests, tumor marker tests, CT scan, serologic testing, genetic testing/ suggested paracentesis with fluid analysis, LF tests, abdominal ultrasound, CT scan, echocardiogram, and endoscopy, further evaluation for elevated markersParacentesis (ascites fluid was send for cytology, culture, TB investigation, SAAG calculation), abdominal CT, liver ultrasound, LF tests, tumor marker tests, serologic testing, echocardiogram, and endoscopyPA/A
2Provide a DD for the patientProvided a DD that included the final diagnosis/provided a DD that included the final diagnosisTuberculous peritonitisA/A
3Provide the most likely diagnosis for the patientSuggested malignancy (most likely ovarian cancer) or SBP are the most likely diagnoses/suggested tuberculous peritonitis or malignancy or SBP as the most likely diagnosesTuberculous peritonitisD/A
24Case presentation/provide a differential diagnosis for the patientProvided a DD that included the final diagnosis/provided a DD that included the final diagnosisAcute PEA/A
5Provide the most probable diagnosis for the patientSuggested myocardial infraction as the most probable diagnosis/suggested PE as the most probable diagonalAcute PED/A
6What diagnostic test is more suitable for this patientSuggested CTPA and ECG be performed/suggested CTPA, ECG, and d-dimers tests be performed CTPA was performedA/A
7What treatment do you recommend for this patient, given PE is confirmedSuggested a choice among LMWH, DOACs, and warfarin. No discrimination between short and long-term anticoagulation was made. Suggested initial anticoagulation with either LMWH or DOACs including apixaban followed by a long-term anticoagulation with either a DOAC or warfarin10 mg apixaban BD was commenced followed by 5 mg BD for 6 monthsPA/A
38Case presentation/provide a DD given the post-operative signs/symptoms of the patientProvided a DD that included the final diagnosis/provided a DD that included the final diagnosisPost-operative bleedingA/A
9Provide the most probable diagnosisSuggested exacerbation or progression of her underlying thrombocytopenic disorder/suggested post-transplant acute thrombotic microangiopathyPost-operative bleeding D/D
10Predict the next diagnostic test that the patient requiresSuggested coagulation studies, renal function test, peripheral blood smear, infectious testing and imaging including ultrasound and CT/suggested peripheral blood smear, LDH level, Coombs test, renal function, immunosuppressive level tests, and infection screening.Abdominal ultrasound and abdomen/pelvis CT with contrastPA/D
11Appropriate treatment given the evidence of active bleedingSuggested stabilization with intravenous fluids and blood products, surgical intervention, and close monitoring/suggested stabilization with intravenous fluids and blood products, surgical intervention, and close monitoringThe patient was transfused and was re-exploredA/A
412Case presentation/provide a DD for the patientProvided a DD that included the final diagnosis/provided a DD that included the final diagnosisAcute graft thrombosis due to renal vein thrombosisA/A
13Provide the most probable diagnosisSuggested acute graft thrombosis due to either renal artery or vein thrombosis/suggested acute graft thrombosis due to renal vein thrombosisAcute graft thrombosis due to renal vein thrombosisA/A
14Provide the most suitable diagnostic testSuggested choosing among transplant duplex US, CT angiography, and renal scintigraphy/suggested choosing among transplant duplex US, CT angiography, and renal scintigraphyTransplant doppler USA/A
15Given the transplant US findings, provide the patient’s diagnosisAcute renal allograft rejection/acute renal artery thrombosis or artery stenosisRenal vein thrombosisD/D
16Given the transplant US findings, suggest a diagnostic modality that could verify diagnosisRenal biopsy/suggested CT angiographyCT angiography was performedD/A
17Suggest treatment options for the patientSuggested considering high-dose corticosteroids, antithymocyte globulin, calcineurin inhibitors, mycophenolate mofetil, basiliximab or alemtuzumab, and plasmapheresis with intravenous immunoglobulin/suggested surgical revascularizationPatient was re-exploredD/A
18Findings of reperfusion during benchwork after explanationSuggested inadequate restoration of tissue perfusion and significant vascular compromise and tissue damage/suggested extensive, vascular thrombosis with poor kidney perfusion, and evidence of parenchymal damageArtery perfusion required high pressure, kidney became turgid, swollen, and a capsular tear was seenA/A
519Case presentation/provide DD for the patientProvided a DD that included the final diagnosis/provided a DD that included the final diagnosisPost-transplant obstructive LUTS due to clot retentionA/A
20Provide most probable diagnosisSuggested urinary tract obstruction most probably at the side of the anastomosis/suggested urinary tract obstruction due to blood clot formation as the most probable diagnosisPost-transplant obstructive LUTS due clot retentionPA/A
21Suggest next diagnosis test to verify the diagnosisSuggested considering transplant US, abdominal CT or renal scintigraphy/suggested transplant US as the first-line image modality. Suggested that other option include abdominal CT, MRI, and nuclear medicine scansA transplant US was performedA/A
22Given findings of US/suggest a suitable treatment option for the patientSuggested considering manual irrigation, catheter flushing, cystoscopic clot evaluation, and monitoring/suggested replacing the foley catheter to flush out smaller clots, cystoscopic clot evaluation, consider percutaneous nephrostomy, and monitoringA 3-way irrigation system was appliedPA/PA
23Despite resolved hematuria patient’s clearance did not improved/provide a DDProvided a DD that included the final diagnosis/provided a DD that included the final diagnosisAcute kidney injury with acute tubular necrosisA/A
24Provide most probable diagnosisSuggested acute kidney injury with acute tubular necrosis as the most probable diagnosis/suggested acute kidney injury with acute tubular necrosis as the most probable diagnosisAcute kidney injury with acute tubular necrosisA/A
25Case progression update/poor renal function 3 months post-operatively provide DD for patient’s signs and symptomsProvided a DD that included the final diagnosis/provided a DD that included the final diagnosisRecurrence of underlying diseaseA/A
26Provide most probable diagnosisSuggested chronic allograft dysfunction as the most probable diagnosis/suggested chronic allograft dysfunction and recurrence of the underlying disease as the two most probable diagnosesRecurrence of underlying diseaseD/PA
Table 6 Comparative performance of ChatGPT and GPT-4 in department cases on liver transplantation, detailing agreement levels by task type.
Case ID
Question number
Task
Performance, ChatGPT/GPT-4
Physicians course of action/ground truth
Agreement status, ChatGPT/GPT-4
11Case presentation/provide a DD for the patientProvided a DD that included the final diagnosis/provided a DD that included the final diagnosisEarly anastomotic bile leakA/A
2Provide the most probable diagnosisSuggested that a biliary complication including bile leak as the most probable diagnosis/suggested bile leak as the most probable diagnosisEarly anastomotic bile leakA/A
3Suggest a suitable diagnostic test to confirm the diagnosisSuggested considering abdominal US or CT, and MRCP/suggested considering abdominal US or CT, fluid drain analysis, and MRCPAbdominal CT and fluid drain analysis were performedPA/A
4Suggest a suitable treatment for this patientSuggested considering percutaneous drainage, ERCP, surgical intervention, and antibiotics if there are signs of infection/suggested considering as a first line less invasive treatments such as percutaneous drainage and ERCP and procced with re-exploration if those fail, while covering the patient with antibioticsAntibiotics were commenced, followed by an ERCP which did not resolve the bile leak and the patient was re-exploredA/A
25Case presentation/calculate CP score, MELD score, and MELD-sodium scoreAccurately calculated CP score and MELD score, underestimated MELD-sodium score/accurately calculated the required scoresCP score = 13, MELD score = 34, and MELD-sodium score = 37PA/A
6Patient’s pre-operative assessment findings presented/evaluate patient’s eligibility to proceed with transplantationSuggested that it’s likely that the operation was postponed or deferred until the patient's condition improved/suggested that given the findings the transplant team would have opted to delay the liver transplantation until active issues were adequately addressedTransplantation did not proceedA/A
37Case presentation/provide a DD for the patientProvided a DD that did not include the final diagnosis/provided a DD that did not include the final diagnosis PLSD/D
8Provide the most probable diagnosisSuggested acute cellular rejection as the most probable diagnosis/suggested acute hemolytic transfusion reactionPLSD/D
9Suggest treatment options for the patientSuggested high-dose of intravenous corticosteroids, other anti-rejection medications, and plasmapheresis/suggested not furtherly transfusing the patient, administer corticosteroids, and monitor the patientPatient was treated with high-dose corticosteroids, plasmapheresis, and intravenous immunoglobulinPA/D
10Given the patient’s 3-month new signs/symptoms (recurrent ascites, low-grade fever etc.), provide a new DD for the patientProvided a DD that included the final diagnosis/provided a DD that included the final diagnosisPTLDA/A
11Provide the most probable diagnosisSuggested PTLD as the most probable diagnosis/suggested nephrotic syndrome as the most probable diagnosisPTLDA/D
412Case presentation/ suggest the most suitable diagnostic testBrain imaging was suggested/suggested brain imaging, EEG, and tacrolimus level testA brain CT, EEG, and tacrolimus level test were performedPA/A
13Provide a DD for the patientProvided a DD that included the final diagnosis/provided a DD that included the final diagnosisPRESA/A
14Provide the most probable diagnosisSuggested PRES as the most probable diagnosis/suggested tacrolimus neurotoxicity as the most probable diagnosisPRESA/D
515Case presentation/provide DD for the patientProvided a DD that included the final diagnosis/provided a DD that included the final diagnosisGVHDA/A
16Provide most probable diagnosisSuggested CMV infection as the most probable diagnosis/suggest CMV infection as the most probable diagnosisGVHDD/D
17Suggest appropriate diagnostic testsSuggested CMV testing, biopsy, and imaging studies/suggested CMV testing, imaging studies, and skin biopsyPeripheral blood flow cytometry, colonoscopy, and skin biopsy were performedPA/PA

Regarding renal transplantation, ChatGPT demonstrated an agreement rate of 53.85% (14 out of 26), partial agreement of 19.23% (5 out of 26), and disagreement of 26.92% (7 out of 26). GPT-4 demonstrated an agreement rate of 80.77% (21 out of 26), partial agreement of 7.69% (2 out of 29), and disagreement of 11.54% (3 out of 26). Regarding liver transplantation, ChatGPT demonstrated an agreement rate of 52.94% (9 out of 17), partial agreement of 29.41% (5 out of 17), and disagreement of 17.65% (3 out of 17). GPT-4 demonstrated an agreement rate of 58.82% (10 out of 17), partial agreement of 5.88% (1 out of 17), and disagreement of 35.29% (6 out of 17). Supplementary Table 6 shows the performance of ChatGPT vs GPT-4 when factoring in the nature of the task. Notably, GPT-4 demonstrated outstanding performance, providing a differential diagnosis that included the final diagnosis in 91.7% of the cases (11 out of 12). Furthermore, GPT-4 successfully suggested an appropriate diagnostic test for further investigating patient’s symptoms in 77.8% of cases (7 out of 9).

When compared, the performance of ChatGPT did not differ between the published case reports and the unpublished department cases (P = 0.459). Similarly, the performance of GPT-4 did not differ significantly (P = 0.232). Finally, Table 7 represents the performance of ChatGPT vs GPT-4 when factoring in the nature of the task for all cases, published or unpublished. Notably, GPT-4 demonstrated outstanding performance, providing a differential diagnosis that included the final diagnosis in 90% of the cases (27 out of 30). Furthermore, GPT- suggested an appropriate diagnostic test for further investigating patient’s symptoms in 78.9% of cases (7 out of 9). Finally, GPT-4 predicted the prognosis in 100% of related questions (5 out of 5).

Table 7 Aggregated performance of ChatGPT and GPT-4 in clinical scenarios across published and unpublished cases, categorized by task type, n (%).
Type of task
Overall chatGPT agreement level
Overall GPT-4 agreement level
chatGPT renal transplantation agreement level
GPT-4 renal transplantation agreement level
chatGPT liver transplantation agreement level
GPT-4 liver transplantation agreement level
DD that includes final diagnosisA: 22/30 (73.3)A: 27/30 (90)A: 13/16 (81.3)A: 15/16 (93.8)A: 9/14 (64.3)A: 12/14 (85.7)
PA: 1/30 (3.33)PA: 1/30 (3.3)PA: 1/16 (6.3)PA: 1/16 (6.2)PA: 0/14 (0)PA: 0/14 (0)
Final diagnosis predictionA: 11/31 (35.5)A: 20/31 (64.5)A: 7/17 (41.2)A: 13/17 (76.5)A: 4/14 (28.6)A: 7/14 (50)
PA: 2/31 (6.45)PA: 2/31 (6.5)PA: 1/17 (5.9)PA: 1/17 (5.9)PA: 1/14 (7.1)PA: 1/14 (7.1)
Appropriate next diagnostic testA: 8/19 (42.1)A: 15/19 (78.9)A: 6/13 (46.2)A: 11/13 (84.6)A: 2/6 (33.6)A: 4/6 (66.7)
PA: 8/19 (42.1)PA: 2/19 (10.5)PA: 5/13 (38.5)PA: 1/13 (7.7)PA: 3/6 (50)PA: 1/6 (16.7)
Appropriate treatment A: 11/21 (52.4)A: 15/21 (71.4)A: 5/8 (62.5)A: 7/8 (87.5)A: 6/13 (46.2)A: 4/6 (66.7)
PA: 9/21 (42.9)PA: 4/21 (19)PA: 2/8(25%)PA: 1/8 (12.5)PA: 7/13 (53.8)PA: 1/6 (16.7)
Prediction of prognosis A: 3/5 (60)A: 5/5 (100)A: 1/1 (100%)A: 1/1 (100)A: 2/4 (50)A: 4/4 (100)
PA: 1/5 (20)PA: 0/5 (0)PA: 0/0 (0%)PA: 0/0 (0)PA: 1/4 (25)PA: 0/4 (0)
DISCUSSION

In this paper, we investigated the performance of ChatGPT and GPT-4 in various clinical scenarios regarding renal and liver transplantation in an effort to evaluate the potential role of these tools in AI-assisted clinical practice. GPT-4 demonstrated a superior performance in all types of scenarios. Specifically, GPT-4 was right approximately six out of 10 times when solving challenging multiple-choice questions in renal and liver transplantation. Regarding published case reports, the comparative analysis across these real-world case reports reveals that both models are highly capable, while GPT-4 generally demonstrates an edge in comprehensive responses and alignment with clinical practices. GPT-4 consistently provided more accurate and reliable clinical recommendations with higher percentages of full agreements both in renal and liver transplantation compared with ChatGPT. These findings were similar in unpublished work. Notably, GPT-4 demonstrated outstanding performance in specific tasks, providing a differential diagnosis that included the final diagnosis in 90% of the cases (27 out of 30), suggested an appropriate diagnostic test for further investigating patient’s symptoms in 78.9% of cases (7 out of 9), and predicted the prognosis of the patient in 100% of related questions (5 out of 5). ChatGPT’s and GPT-4’ performance remained consistent when tested in unpublished material. This suggests that the performance of these tools is unaffected by whether the cases presented were potentially part of the training set. In other words, the performance is genuine, and not a result of overfitting (higher performance in the training dataset, which drops significantly when unknown instances are introduced). While both tools demonstrated notable strengths in addressing a wide range of clinical scenarios, certain areas revealed consistent underperformance. Firstly, the more detailed a case summary was, the more comprehensive the response. These tools underperformed when tasked with interpreting ambiguous or incomplete clinical data, as their reasoning relies on patterns learned from the training data rather than experiential understanding. Additionally, both models struggled with rare conditions, as those are underrepresented in their training datasets, leading to oversimplified or incorrect recommendations. Furthermore, while GPT-4 demonstrates improved contextual awareness, both models generate responses that, while plausible, lack the depth required for clinical decision-making. These areas of underperformance underscore the importance of human oversight and highlight opportunities for further refinement in AI models for clinical use.

Some other studies have also investigated the role of ChatGPT in renal or liver transplantation. Rawashdeh et al[13] evaluated the potential use of ChatGPT in medical scenarios related to kidney transplantation and its applicability. ChatGPT was tested on general questions about kidney transplantation, writing scientific texts on this topic, and generating summaries of texts about kidney transplantation[13]. The authors, with the help of two experts, assessed the validity, scientific accuracy, clarity, conciseness, and repeatability of the texts and answers generated by ChatGPT. The study results indicated that ChatGPT demonstrated satisfactory knowledge of general issues about kidney transplantation but failed to present detailed and accurate answers to specific questions[13]. ChatGPT’s responses maintained a scientific language and tone, but some elements were not factual. According to the two experts, none of the answers were error-free, and some of the bibliographies were inaccurate and unreliable. Finally, the ChatGPT answers and texts had sufficient repeatability, as there were no statistically significant differences on separate days[13].

Endo et al[11] investigated the accuracy and reliability of ChatGPT’s responses to questions related to liver transplantation . The authors developed a set of 29 questions covering general information about liver transplantation, including: (1) 4 general questions; (2) 7 questions about the waiting list; (3) 13 questions about the pre-transplant period; and (4) 5 questions about the donor[11]. The quality of the responses was independently assessed by “quality grades” by 17 experts in the field of abdominal transplant surgery. A total of 493 “quality scores” (29 questions × 17 experts) were collected, of which 46.0% were “very good”, 30.2% were “excellent” and 7.2% were “poor” or “fair”. Overall, 70.6% of the experts considered ChatGPT to be an accurate source of information[11]. In a different, recent study, regarding liver transplantation[12]. Finally, in a recent study, Mankowski et al[39] compared the performance of ChatGPT, GPT-4, GPT-4 visual against nephrology fellows and training program directors in 12 multiple-choice questions assessing six kidney transplant cases. Notably, GPT-4 visual, performed comparably to nephrology fellows and training program directors, answering correctly in 10 questions, while nephrology fellows and training program directors answered 9 and 11 questions correctly, respectively[39]. Notably, GPT-4 visual demonstrated significantly higher performance compared to all its predecessors, showcasing how these models rapidly evolve significantly in short periods of time.

With 400 tested questions, this is the first study of this scale and versatility in clinical transplantation. Our findings support the potential utility of AI models like ChatGPT and GPT-4 in AI-assisted clinical practice as sources of accurate, individualized medical information and facilitating decision-making. Our analysis underscores how the performance of these tools is enhanced as those tools become more sophisticated. As these AI tools evolve, they could potentially address several gaps in renal and liver transplant practice. These include, among others, optimizing workflow by automating routine documentation, synthesizing and summarizing extensive medical literature for clinicians in seconds, enhancing access to transplantation expertise in resource-limited settings (particularly in time-sensitive settings) to non-specialist clinicians, providing personalized decision-support tools for transplant candidate selection, and enhancing patient education by simplifying complex medical concepts, thus fostering better understanding and communication between patients and healthcare providers. Notably, the ultimate goal is for these tools to augment, rather than replace, the role of physicians, ensuring safer and more effective patient care.

Physicians, like all professionals, are not infallible. In general, errors by AI are met more harshly than errors by humans. An intriguing aspect to explore would be to provide experienced transplant physicians with the same scenarios and compare the performance of AI-based tools with the physician’s performance. Nevertheless, it’s only fair to assume that as these models keep progressing and becoming more sophisticated, they will eventually surpass physicians in performing certain tasks. Future research should focus on validating these results across a broader range of medical fields, patient populations, and clinical environments to ensure generalizability. Additionally, continuous evaluation and updating are imperative to maintain performance and relevance in clinical decision-making as these models evolve and diverge from their initial versions. Aside from accuracy concerns, the application of AI-based tools in healthcare faces a plethora of other challenges, such as intrinsic bias, data protection and cybersecurity concerns, cost-effectiveness, interpretability, intellectual property, oversight and liability concerns, and ethical concerns[2,40]. Ethical considerations include the complexity of assigning liability in cases of erroneous AI recommendations. Current legal frameworks need physicians to provide care consistent with standard practices, which shields them from liability when standard care is followed[41]. However, this may inadvertently discourage physicians from fully leveraging AI tools, reducing them to confirmatory aids rather than tools to enhance care[41]. Without a comprehensive legal framework addressing AI liability, healthcare facilities may remain hesitant to adopt these technologies due to concerns about potential exposure to malpractice claims. This highlights the urgent need for clear policies to balance innovation with accountability in AI-assisted clinical decision-making.

Another important challenge of adopting AI tools in healthcare is the lack of interpretability (inability to provide an explanation of the inner logic that led to the recommendation)[42]. To address this, actionable steps must be taken to ensure that these tools are both trusted and effective. First, prioritizing tools equipped with interpretability features, demonstrating which patient characteristics most influenced the decision. Currently, the latest versions of GPT-4 are able to provide citations for the information provided. Integrating AI into real-world clinical workflows will first require studies to assess its practical impact on patient outcomes, workflow efficiency, and clinical adoption. Addressing ethical and regulatory challenges, such as mitigating biases, ensuring data security, and establishing accountability frameworks, will also be critical to realizing the full potential of AI in healthcare. Policymakers must establish clear guidelines mandating a baseline level of explainability for AI tools used in healthcare. This ensures that clinicians can understand and justify AI-assisted decisions, which is essential for maintaining patient trust and ethical integrity. Both transplant professionals and policymakers should encourage ongoing education and training on the use of AI, ensuring that clinicians can effectively apply and evaluate those tools in clinical practice.

Although this study included diverse question types and scenarios related to kidney and liver transplantation, the generalizability of the findings may be limited by the regional and demographic scope of the cases used. One critical consideration is that while not fully disclosed, the training sets of the above models likely draw heavily on publicly available medical literature. Thus, the clinical scenarios tested in this paper could have also been part of the training data of these models. This introduces a bias of overfitting, which means that the model demonstrates superior performance on well-documented scenarios in the literature, significantly decreasing when the model encounters less commonly studied conditions or atypical clinical presentations. This lack of generalizability underscores the need for caution when using these models for conditions underrepresented in the literature. However, we have mitigated this by comparing the performance to a set of unpublished work, proving that the performance is maintained at statistically comparable levels.

CONCLUSION

In earlier publications, we predicted that AI would eventually “infiltrate” the healthcare industry[43]. It seems now that AI is at healthcare’s doorstep. It is essential to highlight that AI in healthcare should aim to embrace the complexity of our profession and augment our intelligence rather than replace it. Clinical reasoning and critical thinking involve non-quantifiable information that AI cannot integrate. In other words, we should aim for AI-assisted and not AI-driven clinical practice. As more AI tools are integrated into clinical practice, advanced evaluation systems must be developed to assess their unintended consequences and impact on patient outcomes. AI is here, and physicians must engage with it to avoid obsolescence.

Footnotes

Provenance and peer review: Unsolicited article; Externally peer reviewed.

Peer-review model: Single blind

Specialty type: Transplantation

Country of origin: Greece

Peer-review report’s classification

Scientific Quality: Grade A, Grade B, Grade C

Novelty: Grade A, Grade B, Grade B

Creativity or Innovation: Grade A, Grade B, Grade B

Scientific Significance: Grade A, Grade B, Grade B

P-Reviewer: Ghafourian E; Li SF; Yi G S-Editor: Wei YF L-Editor: A P-Editor: Zheng XM

References
1.  Ertel W  Introduction to Artificial Intelligence. Wiesbaden: Springer Wiesbaden, 2024.  [PubMed]  [DOI]  [Full Text]
2.  Christou CD, Tsoulfas G. Challenges and opportunities in the application of artificial intelligence in gastroenterology and hepatology. World J Gastroenterol. 2021;27:6191-6223.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in CrossRef: 17]  [Cited by in RCA: 17]  [Article Influence: 4.3]  [Reference Citation Analysis (7)]
3.  Radford A, Narasimhan K, Salimans T, Sutskever I.   Improving language understanding by generative pre-training. [cited 15 October 2024]. Available from: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf.  [PubMed]  [DOI]
4.  Kung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepaño C, Madriaga M, Aggabao R, Diaz-Candido G, Maningo J, Tseng V. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023;2:e0000198.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 1564]  [Cited by in RCA: 1295]  [Article Influence: 647.5]  [Reference Citation Analysis (0)]
5.  Park JY. Could ChatGPT help you to write your next scientific paper?: concerns on research ethics related to usage of artificial intelligence tools. J Korean Assoc Oral Maxillofac Surg. 2023;49:105-106.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 9]  [Reference Citation Analysis (0)]
6.  Eysenbach G. The Role of ChatGPT, Generative Language Models, and Artificial Intelligence in Medical Education: A Conversation With ChatGPT and a Call for Papers. JMIR Med Educ. 2023;9:e46885.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 18]  [Cited by in RCA: 224]  [Article Influence: 112.0]  [Reference Citation Analysis (0)]
7.  Dave T, Athaluri SA, Singh S. ChatGPT in medicine: an overview of its applications, advantages, limitations, future prospects, and ethical considerations. Front Artif Intell. 2023;6:1169595.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 467]  [Reference Citation Analysis (0)]
8.  Nori H, King N, McKinney SM, Carignan D, Horvitz E.   Capabilities of gpt-4 on medical challenge problems. 2023 Preprint. Available from: arXiv:230313375.  [PubMed]  [DOI]  [Full Text]
9.  Clark SC. Can ChatGPT transform cardiac surgery and heart transplantation? J Cardiothorac Surg. 2024;19:108.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Reference Citation Analysis (0)]
10.  Rozenberg D, Singer LG. Predicting outcomes in lung transplantation: From tea leaves to ChatGPT. J Heart Lung Transplant. 2023;42:905-907.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 1]  [Reference Citation Analysis (0)]
11.  Endo Y, Sasaki K, Moazzam Z, Lima HA, Schenk A, Limkemann A, Washburn K, Pawlik TM. Quality of ChatGPT Responses to Questions Related To Liver Transplantation. J Gastrointest Surg. 2023;27:1716-1719.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 6]  [Reference Citation Analysis (0)]
12.  Akabane M, Iwadoh K, Melcher ML, Sasaki K. Exploring the potential of ChatGPT in generating unknown clinical questions about liver transplantation: A feasibility study. Liver Transpl. 2024;30:229-234.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 1]  [Reference Citation Analysis (0)]
13.  Rawashdeh B, Kim J, AlRyalat SA, Prasad R, Cooper M. ChatGPT and Artificial Intelligence in Transplantation Research: Is It Always Correct? Cureus. 2023;15:e42150.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 9]  [Reference Citation Analysis (0)]
14.  Hricik D  Multiple choice questions. In: Primer on Transplantation. 3rd ed. Hoboken: Wiley Online Library, 2011.  [PubMed]  [DOI]  [Full Text]
15.  Clavien PA, Trotter JF.   Multiple Choice Questions. In: Medical Care of the Liver Transplant Patient. 4th ed. Hoboken: Wiley Online Library, 2012.  [PubMed]  [DOI]  [Full Text]
16.   Transplant Hepatology Board Review Course Practice Module Supplement QUESTIONS 1. [cited 15 October 2024]. Available from: https://6443bb74ef7c532515d0-3858179a21f8875f9590fc888a54448a.ssl.cf2.rackcdn.com/aasld_f27253d3ef5b93f482a4d5b239a79a86.pdf.  [PubMed]  [DOI]
17.  Aziz F, Parajuli S.   Complications in Kidney Transplantation: A Case-Based Guide to Management. Cham: Springer Cham, 2022.  [PubMed]  [DOI]  [Full Text]
18.   MSD Manual Professional Version. [cited 15 October 2024]. Available from: https://www.msdmanuals.com/en-gb/professional.  [PubMed]  [DOI]
19.  The Transplantation Society  IPTA Question Bank. [cited 15 October 2024]. Available from: https://tts.org/91-uncategorised/ipta/ipta-resources/144-ipta-question-bank.  [PubMed]  [DOI]
20.  United States Medical Licensing Examination  Step 1 Exam Content | USMLE. [cited 15 October 2024]. Available from: https://www.usmle.org/step-exams/step-1/step-1-exam-content.  [PubMed]  [DOI]
21.  Alharbi A, Al Turki MS, Aloudah N, Alsaad KO. Incidental Eosinophilic Chromophobe Renal Cell Carcinoma in Renal Allograft. Case Rep Transplant. 2017;2017:4232474.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 1]  [Cited by in RCA: 3]  [Article Influence: 0.4]  [Reference Citation Analysis (0)]
22.  Rubin R. Case Studies. Transplantation. 2007;84:S15-S16.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Reference Citation Analysis (0)]
23.  Molina-Andújar A, Montagud-Marrahí E, Cucchiari D, Ventura-Aguiar P, De Sousa-Amorim E, Revuelta I, Cofan F, Solé M, García-Herrera A, Diekmann F, Poch E, Quintana LF. Postinfectious Acute Glomerulonephritis in Renal Transplantation: An Emergent Aetiology of Renal Allograft Loss. Case Rep Transplant. 2019;2019:7438254.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 1]  [Cited by in RCA: 1]  [Article Influence: 0.2]  [Reference Citation Analysis (0)]
24.  Baker S, Popescu M, Akoh JA. Rupture of renal transplant. Case Rep Transplant. 2015;2015:686584.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Reference Citation Analysis (0)]
25.  Gewehr P, Jung B, Aquino V, Manfro RC, Spuldaro F, Rosa RG, Goldani LZ. Sporotrichosis in renal transplant patients. Can J Infect Dis Med Microbiol. 2013;24:e47-e49.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 14]  [Cited by in RCA: 16]  [Article Influence: 1.5]  [Reference Citation Analysis (0)]
26.  Vassallo D, Husain MM, Greer S, McGrath S, Ijaz S, Kanigicherla D. Hepatitis e infection in a renal transplant recipient. Case Rep Nephrol. 2014;2014:865471.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Reference Citation Analysis (0)]
27.  Olsen SR, Bhutani M. Multiple cavitating nodules in a renal transplant recipient. Can Respir J. 2009;16:195-197.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 2]  [Cited by in RCA: 3]  [Article Influence: 0.2]  [Reference Citation Analysis (0)]
28.  Allam SR, Sankarapandian B, Memon IA, Nef PC, Livingston TS, Rofaiel G. Biopsy Induced Arteriovenous Fistula and Venous Stenosis in a Renal Transplant. Case Rep Nephrol. 2015;2015:313610.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Reference Citation Analysis (0)]
29.  Subramanian JB, Siddiqui F, Chotai PN, Al-Adwan Y, Rajab A, Washburn K, Schenk AD, Limkemann AJ, Luttrull M, Al-Ebrahim M, Bumgardner G, Singh N. Spinal Stroke following Kidney Transplant. Case Rep Transplant. 2022;2022:2058600.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Reference Citation Analysis (0)]
30.  Ainsworth CD, Crowther MA, Treleaven D, Evanovitch D, Webert KE, Blajchman MA. Severe hemolytic anemia post-renal transplantation produced by donor anti-D passenger lymphocytes: case report and literature review. Transfus Med Rev. 2009;23:155-159.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 30]  [Cited by in RCA: 25]  [Article Influence: 1.6]  [Reference Citation Analysis (0)]
31.  Okeke R, Lok J, Wells R, Wycoff M, Engelhardt A, Bettag J, O'Leary C, Hallcox T, Nazzal M. Catastrophic Antiphospholipid Syndrome after Orthotopic Liver Transplant. Case Rep Transplant. 2022;2022:6209300.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Reference Citation Analysis (0)]
32.  Eubank TA, Mobley CM, Moaddab M, Hobeika MJ, O'Neal M, Musick WL, Knight JM, Galati JS, Kodali S, Shetty A, Victor DW 3rd, Saharia A, Ghobrial RM, Grimes KA. Successful Treatment of Invasive Mucormycosis in Orthotopic Liver Transplant Population. Case Rep Transplant. 2021;2021:8667589.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Reference Citation Analysis (0)]
33.  Kim E, Adeel A, Bozorgzadeh A, Amano S, Barry CT, Daly JS, Devuni D, Elaba Z, Houk L, Martins PN, Movahedi B, Ramanathan M, Theodoropoulos NM. Treatment of Acute Graft-versus-Host Disease in Liver Transplant Recipients. Case Rep Transplant. 2021;2021:8981429.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in RCA: 1]  [Reference Citation Analysis (0)]
34.  Ramírez de la Piscina P, Duca I, Estrada S, Calderón R, Ganchegui I, Campos A, Spicakova K, Urtasun L, Salvador M, Delgado E, Bengoa R, García-Campos F. Combined liver and kidney transplant in a patient with budd-Chiari syndrome secondary to autosomal dominant polycystic kidney disease associated with polycystic liver disease: report of a case with a 9-year follow-up. Case Rep Gastrointest Med. 2014;2014:585291.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 3]  [Cited by in RCA: 3]  [Article Influence: 0.3]  [Reference Citation Analysis (0)]
35.  Arstikyte K, Vitkute G, Traskaite-Juskeviciene V, Macas A. Disseminated intravascular coagulation following air embolism during orthotropic liver transplantation: is this just a coincidence? BMC Anesthesiol. 2021;21:264.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 3]  [Cited by in RCA: 2]  [Article Influence: 0.5]  [Reference Citation Analysis (0)]
36.  Aucejo F, Winans C, Henderson JM, Vogt D, Eghtesad B, Fung JJ, Sands M, Miller CM. Isolated right hepatic vein obstruction after piggyback liver transplantation. Liver Transpl. 2006;12:808-812.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 12]  [Cited by in RCA: 13]  [Article Influence: 0.7]  [Reference Citation Analysis (0)]
37.  Ichimura K, Kawamura N, Goto R, Watanabe M, Ganchiku Y, Shimamura T, Taketomi A. Living Donor Liver Transplantation for Hepatic Venoocclusive Disease/Sinusoidal Obstruction Syndrome Originating from Hematopoietic Stem Cell Transplantation. Case Rep Transplant. 2022;2022:8361769.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Reference Citation Analysis (0)]
38.  Trevizoli NC, Obeid EJ, Romeres SGB, Oliveira CAM, Rocha HC, Carvalho-Louro DM, Arantes Ferreira GS, De Campos PB, Ullmann RFB, Figueira AVF, Diaz LGG, Jorge FMF, Caja GON, Bortoli ZB, Watanabe ALC. Liver Transplant and Active Ulcerative Colitis: A Case Report. Transplant Proc. 2022;54:1361-1364.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Reference Citation Analysis (0)]
39.  Mankowski MA, Jaffe IS, Xu J, Bae S, Oermann EK, Aphinyanaphongs Y, McAdams-DeMarco MA, Lonze BE, Orandi BJ, Stewart D, Levan M, Massie A, Gentry S, Segev DL. ChatGPT Solving Complex Kidney Transplant Cases: A Comparative Study With Human Respondents. Clin Transplant. 2024;38:e15466.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Reference Citation Analysis (0)]
40.  Christou CD, Tsoulfas G. Challenges involved in the application of artificial intelligence in gastroenterology: The race is on! World J Gastroenterol. 2023;29:6168-6178.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Reference Citation Analysis (0)]
41.  Price WN 2nd, Gerke S, Cohen IG. Potential Liability for Physicians Using Artificial Intelligence. JAMA. 2019;322:1765-1766.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 147]  [Cited by in RCA: 214]  [Article Influence: 35.7]  [Reference Citation Analysis (0)]
42.  Gilpin LH, Bau D, Yuan BZ, Bajwa A, Specter M, Kagal L.   Explaining Explanations: An Overview of Interpretability of Machine Learning. 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA); Turin, Italy. New York: IEEE, 2018: 80-89.  [PubMed]  [DOI]  [Full Text]
43.  Christou CD, Tsoulfas G. Role of three-dimensional printing and artificial intelligence in the management of hepatocellular carcinoma: Challenges and opportunities. World J Gastrointest Oncol. 2022;14:765-793.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in CrossRef: 1]  [Reference Citation Analysis (2)]