Published online Sep 18, 2025. doi: 10.5500/wjt.v15.i3.103536
Revised: January 26, 2025
Accepted: March 5, 2025
Published online: September 18, 2025
Processing time: 147 Days and 12.2 Hours
Kidney and liver transplantation are two sub-specialized medical disciplines, with transplant professionals spending decades in training. While artificial intelligence-based (AI-based) tools could potentially assist in everyday clinical practice, com
To compare the use of ChatGPT and GPT-4 as potential tools in AI-assisted clinical practice in these challenging disciplines.
In total, 400 different questions tested the knowledge and decision-making capacity of ChatGPT and GPT-4 in various renal and liver transplantation concepts. Specifically, 294 multiple-choice questions were derived from open-access sources, 63 questions were derived from published open-access case reports, and 43 from unpublished cases of patients treated at our department. The evaluation covered a wide range of topics, including clinical predictors, treatment options, and diagnostic criteria, among others.
ChatGPT correctly answered 50.3% of the 294 multiple-choice questions, while GPT-4 demonstrated a higher performance, answering 70.7% of questions (P < 0.001). Regarding the 63 questions from published cases, ChatGPT achieved an agreement rate of 50.79% and partial agreement of 17.46%, while GPT-4 demonstrated an agreement rate of 80.95% and partial agreement of 9.52% (P = 0.01). Regarding the 43 questions from unpublished cases, ChatGPT demonstrated an agreement rate of 53.49% and partial agreement of 23.26%, while GPT-4 demonstrated an agreement rate of 72.09% and partial agreement of 6.98% (P = 0.004). When factoring by the nature of the task for all cases, notably, GPT-4 demonstrated outstanding performance, providing a differential diagnosis that included the final diagnosis in 90% of the cases (P = 0.008), and successfully predicting the prognosis of the patient in 100% of related questions (P < 0.001).
GPT-4 consistently provided more accurate and reliable clinical recommendations with higher percentages of full agreements both in renal and liver transplantation compared with ChatGPT. Our findings support the potential utility of AI models like ChatGPT and GPT-4 in AI-assisted clinical practice as sources of accurate, individualized medical information and facilitating decision-making. The progression and refinement of such AI-based tools could reshape the future of clinical practice, making their early adoption and adaptation by physicians a necessity.
Core Tip: GPT-4 outperformed ChatGPT in a wide range of clinical scenarios related to kidney and liver transplantation, demonstrating greater accuracy and alignment with physician decisions across a variety of tasks, including differential diagnosis, choosing appropriate diagnostic tests and treatment, and predicting the prognosis of patients. These findings highlight the potential of artificial intelligence models like GPT-4 as valuable tools in supporting clinical decision-making in transplantation.
- Citation: Christou CD, Sitsiani O, Boutos P, Katsanos G, Papadakis G, Tefas A, Papalois V, Tsoulfas G. Comparison of ChatGPT-3.5 and GPT-4 as potential tools in artificial intelligence-assisted clinical practice in renal and liver transplantation. World J Transplant 2025; 15(3): 103536
- URL: https://www.wjgnet.com/2220-3230/full/v15/i3/103536.htm
- DOI: https://dx.doi.org/10.5500/wjt.v15.i3.103536
Artificial intelligence (AI) is an umbrella term used to describe any application where computer systems perform tasks traditionally linked with human intelligence. AI is a broad field with many interrelated subfields, including, among others, machine learning and probabilistic reasoning, deep learning, fuzzy systems, computer vision, and natural language processing[1]. Despite their differences, all these fields have one thing in common: They are driven by advancements in big data and computing power. Some disciplines of AI, particularly machine learning, have found profound applications in healthcare, with models being utilized in the prevention, diagnosis, treatment, and prognosis of many diseases[2]. However, the use of other disciplines, such as natural language processing, in healthcare has been, until recently, limited. ChatGPT, released in late 2022 by the AI research company OpenAI (San Francisco, CA, United States), has had a profound impact on many industries. ChatGPT is a large language model that uses deep learning techniques to produce human-like responses to natural language inputs, based on a vast corpus of text data[3]. It is designed to interact with users conversationally, understanding their requests and responding appropriately to assist in problem-solving, or simply to carry on a real-time human-to-computer dialogue.
Despite not being developed as a medical tool, its potential application in healthcare has attracted widespread attention, with articles investigating its ability to provide reliable medical information on a variety of medical topics, pass medical exams, and assist in medical writing[4-6]. Contemporary healthcare systems urgently need to enhance their precision and accuracy in the examination, diagnosis, and treatment of patients while reducing the time required for these procedures. Correct and timely decision-making is of utmost importance in this process. Two characteristics of ChatGPT, the wide variety of information on which it has been trained and its capability of constantly providing answers, raise the question of whether this breakthrough could be used in everyday medical decision-making processes[7]. GPT-4, the latest large language model introduced by OpenAI, has already been shown to outperform its “predecessor”, ChatGPT, in medical exams[8]. AI tools have shown promising performance in tasks such as data analysis, diagnostic support, and clinical decision-making. However, their adoption into the clinical setting is hindered by several challenges, such as their underperformance in complex clinical reasoning and their proneness to biases, particularly when rare conditions are underrepresented in the training data. Additionally, their lack of transparency in how decisions are generated undermines their trustworthiness. Transplantation is a highly sub-specialized medical discipline, with transplant professionals spending decades in training. Few studies have been published regarding heart and lung transplantation[9,10]. This paper aims to investigate the performance of ChatGPT in the challenging medical disciplines of kidney and liver transplantation and compare it with the performance of GPT-4.
Currently, only a few efforts exist in the literature investigating the role of ChatGPT in renal and liver transplantation and particularly comparing the performance of ChatGPT with GPT-4 in these disciplines[11-13].
In total, 400 different questions tested the knowledge and decision-making capacity of ChatGPT and GPT-4 in various concepts regarding renal and liver transplantation, as follows: (1) 294 multiple-choice questions regarding liver and renal transplantation were collected from various open-access sources, including transplantation associations, medical examinations (such as the United States Medical Licensing Examination), and book chapters; (2) 63 questions derived from 20 open-access, published case reports regarding renal and liver transplantation were collected from the literature. From each case, a series of questions regarding differential diagnosis, diagnostic tests, appropriate treatment, and prognosis was derived; and (3) 43 questions derived from 10 unpublished cases regarding renal and liver transplantation from patients treated in our department were written. Similarly, a series of questions regarding the diagnosis and management of these patients was derived.
These 400 questions were selected to encompass a broad spectrum of clinical scenarios commonly encountered in kidney and liver transplantation. We also ensured they covered a range of tasks, such as diagnostic reasoning, treatment planning, and prognostic evaluation, to comprehensively assess the models’ capabilities. Questions were sourced from published case reports as well as unpublished department cases, which provided diverse and nuanced real-world contexts. This approach ensured the evaluation captured both the consistency and adaptability of these tools in clinical practice.
Two separate conversations were conducted for each multiple-choice question, case report, or department case, one using ChatGPT and one using GPT-4. Default models (no fine-tuning or customization) were used, as provided by OpenAI at the time of research. An example of a multiple-choice question dialogue is shown below:
“Physician: I am going to provide a series of virtual scenarios regarding liver transplantation. As a first task, I will need you to assign, based on the context of the question, one of the following labels: i. General information, ii. Diagnosis, iii. Differential Diagnosis, iv. Treatment, v. Prognosis, vi. Other. Then, as a second task, I will provide you with multiple-choice answers and I will need you to give the correct answer. Are you ready?
AI: Absolutely, I’m ready! Please proceed with the virtual scenarios, and I’ll do my best to assist you with the labelling and answering tasks.
Physician: Question 1: Which of the following is an absolute contraindication for liver transplantation?
A. History of cancer
B. Active tuberculosis
C. Cystic Fibrosis
D. Portal thrombosis
E. Active substance abuse
AI: Label: Treatment
Correct Answer: E. Active substance use”
The first task was to assign a label regarding the nature of the task of the multiple-choice question. The answer was compared with the label assigned by two physicians. Secondly, provided with the multiple-choice question, ChatGPT/GPT-4 was asked to solve the question to the best of its knowledge. Regarding the published case reports and the unpublished department cases, the cases were presented gradually, in parts, with questions posed in between. All parts/questions were pre-defined and were not affected by the in-between ChatGPT/GPT-4 answers to avoid bias. Unpublished cases were used to check that performance remained consistent on material that could not have been part of ChatGPT’s or GPT-4’s training dataset, since prior exposure could lead to high performance due to overfitting. ChatGPT and GPT-4 were tested for the following skills: Providing a differential diagnosis for a patient following case presentation, suggesting the most probable diagnosis, suggesting the next diagnostic test, and finally, suggesting appropriate treatment options. An example of a case report/unpublished department case dialogue interaction is shown below:
“Physician: I am going to provide you with information regarding a case report of a patient who underwent liver transplantation. I am going to provide the information gradually, asking some questions in between.
A 60-year-old patient with a history of cirrhosis due to alcohol-associated liver disease […]
Could you provide a differential diagnosis for the etiology of this patient’s shock?
AI: Based on the information provided […]
Physician: Which is the most probable diagnosis for this patient?
AI: Based on the information provided […]
Physician: What further tests do you believe are needed for this patient?
AI: Based on the patient’s presentation and suspected […]
Physician: Further investigations were sent including mixed chimerism studies that revealed […], consistent with the diagnosis of graft vs host disorder.
What treatment(s) could be used for this patient?”
For multiple-choice questions, we recorded agreement and disagreement with the physician’s label for concept labelling and with the correct answer for response selection. Concept labelling was conducted by two authors independently and then reviewed by a third author. Disagreements were resolved through discussion among the three. For case reports and unpublished department cases, we assigned end-points as follows: Disagreement was assigned when the ChatGPT/GPT-4 proposal differed from the physicians’ decision. Partial agreement was assigned if the ChatGPT/GPT-4 proposal included a portion of the actions taken by physicians. Agreement was assigned when the ChatGPT/GPT-4 proposal either perfectly matched the physicians’ actions or included them along with additional actions. For example, if physicians used medications A and B for treatment, and ChatGPT/GPT-4 proposed drug C, this would be labelled as disagreement. If ChatGPT/GPT-4 proposed drug A, it would be labelled as partial agreement. If it proposed both drugs A and B, or suggested a choice among A, B, and C, it would be labelled as agreement. All “ground truth” labels were determined before any of the conversations with these tools took place to mitigate confirmation and observer bias.
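The drug A/B/C example above can be sketched as a simple set comparison. This is only an illustration: in the study the end-points were assigned by physicians, and the function name `agreement_status` and the set-based matching are assumptions introduced here, not the authors' procedure.

```python
from typing import Set


def agreement_status(physician_actions: Set[str], model_proposals: Set[str]) -> str:
    """Return 'A' (agreement), 'PA' (partial agreement), or 'D' (disagreement).

    Hypothetical helper: treats each action as an exact-match string,
    which is a simplification of the physicians' judgement.
    """
    if not physician_actions:
        raise ValueError("ground truth must contain at least one action")
    # Agreement: every physician action is proposed (extra proposals allowed).
    if physician_actions <= model_proposals:
        return "A"
    # Partial agreement: only a portion of the physician actions is proposed.
    if physician_actions & model_proposals:
        return "PA"
    # Disagreement: no overlap with the physicians' actions.
    return "D"


ground_truth = {"A", "B"}
for proposal in ({"C"}, {"A"}, {"A", "B"}, {"A", "B", "C"}):
    print(sorted(proposal), "->", agreement_status(ground_truth, proposal))
```

Real answers are free text, so a string-level comparison like this would only work after each response had been reduced to a list of discrete actions by a reviewer.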
To create a dataset that could be used for statistical analysis, we constructed dataset tables that could then be translated into variables. This procedure required two layers of data decoding. More specifically, two types of tables were created for each resource dialogue conducted with ChatGPT/GPT-4 that had the following structure: (1) For multiple-choice questions, the table included the following details: Serial number of the dialogue with ChatGPT/GPT-4, the question posed, the resource, the predefined label for the question, the label assigned by ChatGPT/GPT-4, the agreement (A) or disagreement (D) status regarding the label, the ChatGPT/GPT-4 response to the question, and the agreement or disagreement status regarding the question; and (2) For the case reports and department cases, the table included: Serial number of the dialogue with ChatGPT/GPT-4, the question posed, the resource (not included in department cases), the action taken by the author’s teams for published case reports and our team for department cases, the actions proposed by ChatGPT/GPT-4, and the agreement, disagreement, or partial agreement status.
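One possible way to represent the two table types is sketched below. The class names, field names, and the numeric coding in `STATUS_CODES` are hypothetical; the source lists the columns but does not specify how they were encoded as variables. The example row reuses the first question of the Alharbi et al[21] case from Table 3.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical numeric coding for the second decoding layer (not from the source).
STATUS_CODES = {"D": 0, "PA": 1, "A": 2}


@dataclass
class MultipleChoiceRecord:
    serial_number: int      # serial number of the dialogue
    question: str
    resource: str
    predefined_label: str   # label assigned by the physicians
    assigned_label: str     # label assigned by ChatGPT/GPT-4
    label_status: str       # "A" or "D"
    model_response: str
    answer_status: str      # "A" or "D"


@dataclass
class CaseRecord:
    serial_number: int
    question: str
    resource: Optional[str]  # None for unpublished department cases
    physician_action: str
    proposed_action: str
    status: str              # "A", "PA", or "D"

    def status_code(self) -> int:
        """Translate the agreement status into a numeric variable."""
        return STATUS_CODES[self.status]


row = CaseRecord(
    serial_number=1,
    question="Provide a list of suitable antibiotics for Pseudomonas aeruginosa urinary tract infection",
    resource="Alharbi et al[21]",
    physician_action="Meropenem was administered",
    proposed_action="Provided a list of suitable antibiotics including meropenem",
    status="A",
)
print(row.status, row.status_code())
```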
For each scenario, we assessed the model’s performance by categorizing the response as agreement, partial agreement, or disagreement based on the predefined criteria above. The proportions of responses in each category were then calculated across all scenarios. To compare the performance of ChatGPT and GPT-4, we used Pearson’s χ² test to evaluate the distribution of agreement levels across the two models. Statistical significance was defined as P < 0.05. All statistical analyses were performed using SPSS 29.
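The proportion calculation can be illustrated with a short sketch. The counts are those reported in the results for ChatGPT on the 63 published case report questions (32 agreements, 11 partial agreements, 20 disagreements); the function name `category_proportions` is an assumption made for this example, and the actual analysis was run in SPSS.

```python
from collections import Counter
from typing import Dict, List


def category_proportions(statuses: List[str]) -> Dict[str, float]:
    """Percentage of responses per agreement category, rounded to 2 decimals."""
    counts = Counter(statuses)
    total = len(statuses)
    return {key: round(100 * counts[key] / total, 2) for key in ("A", "PA", "D")}


# ChatGPT on the 63 published case report questions.
chatgpt_published = ["A"] * 32 + ["PA"] * 11 + ["D"] * 20
print(category_proportions(chatgpt_published))
```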
In total, the study generated 1388 data points, 1176 from multiple-choice questions and 212 from case questions. Two hundred ninety-four multiple-choice questions regarding renal and liver transplantations were collected, 108 regarding renal and 186 regarding liver transplantation[14-20]. When it comes to the nature of each scenario, 78 (26.5%) regarded general information, 22 (7.5%) regarded differential diagnosis, 48 (16.3%) regarded appropriate diagnostic test(s), 86 (29.3%) regarded treatment, and 60 (20.4%) regarded prognosis. Twenty case reports were selected from the literature. Ten cases regarded kidney transplantation, and 29 questions were derived from those cases. Ten cases regarded liver transplantation, and 34 questions were derived from those cases. Thus, in total, 63 questions on published case reports were tested. Regarding unpublished department cases, we chose 10 cases. Five cases regarded renal transplantation, and 26 questions were derived from those cases. Five cases regarded liver transplantation, and 17 questions were derived from those cases. In total, 43 questions about department cases were tested.
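The totals above can be cross-checked with a few lines of arithmetic. The breakdown of the 1388 data points is an assumption: the source does not state it, but the figures are consistent with two tasks (labelling and answering) per multiple-choice question and one response per case question, each collected for two models.

```python
# Question counts taken from the text.
mcq_questions = 108 + 186           # renal + liver multiple-choice questions
published_case_questions = 29 + 34  # kidney + liver published case reports
department_case_questions = 26 + 17 # renal + liver department cases

models = 2     # ChatGPT and GPT-4
mcq_tasks = 2  # assumed: one labelling task and one answering task per question

mcq_points = mcq_questions * mcq_tasks * models
case_points = (published_case_questions + department_case_questions) * models

# Scenario counts by nature of the multiple-choice question.
by_nature = {"GI": 78, "DD": 22, "Diagnostic test": 48, "Treatment": 86, "Prognosis": 60}

print(mcq_points, case_points, mcq_points + case_points)
print(sum(by_nature.values()))
```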
Multiple-choice questions: Tables 1 and 2 show the performance of ChatGPT and GPT-4 in assigning context labels for the 294 multiple-choice questions. Supplementary Tables 1 and 2 show the performance of ChatGPT and GPT-4 in answering those questions. Overall, ChatGPT assigned 58.2% correct labels (171 out of 294). ChatGPT’s accuracy in assigning appropriate labels varied across categories. Specifically, the highest accuracy was demonstrated in the treatment category, reaching 74.42% (64 out of 86), followed by diagnosis at 68.75% (33 out of 48). Performance regarding differential diagnosis and prognosis was the same, at 50%.
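Table 1 Context labels assigned by ChatGPT vs actual labels for the 294 multiple-choice questions, n (%) (GI: General information; DD: Differential diagnosis)
Actual label | Assigned label, GI | Assigned label, diagnosis | Assigned label, DD | Assigned label, treatment | Assigned label, prognosis | Assigned label, total |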
GI | 33 (11.22) | 20 (6.8) | 6 (2.04) | 16 (5.44) | 3 (1.02) | 78 (26.53) |
Diagnosis | 8 (2.72) | 33 (11.22) | 5 (1.7) | 0 (0) | 2 (0.68) | 48 (16.33) |
DD | 0 (0) | 11 (3.74) | 11 (3.74) | 0 (0) | 0 (0) | 22 (7.48) |
Treatment | 9 (3.06) | 10 (3.4) | 2 (0.68) | 64 (21.77) | 1 (0.34) | 86 (29.25) |
Prognosis | 12 (4.08) | 12 (4.08) | 3 (1.02) | 3 (1.02) | 30 (10.2) | 60 (20.41) |
Total | 62 (21.09) | 86 (29.25) | 27 (9.18) | 83 (28.23) | 36 (12.24) | 294 (100) |
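Table 2 Context labels assigned by GPT-4 vs actual labels for the 294 multiple-choice questions, n (%) (GI: General information; DD: Differential diagnosis)
Actual label | Assigned label, GI | Assigned label, diagnosis | Assigned label, DD | Assigned label, treatment | Assigned label, prognosis | Assigned label, total |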
GI | 42 (14.29) | 14 (4.76) | 1 (0.34) | 20 (6.8) | 1 (0.34) | 78 (26.53) |
Diagnosis | 2 (0.68) | 44 (14.97) | 1 (0.34) | 1 (0.34) | 0 (0) | 48 (16.33) |
DD | 0 (0) | 15 (5.1) | 5 (1.7) | 2 (0.68) | 0 (0) | 22 (7.48) |
Treatment | 5 (1.7) | 7 (2.38) | 1 (0.34) | 73 (24.83) | 0 (0) | 86 (29.25) |
Prognosis | 10 (3.4) | 11 (3.74) | 5 (1.7) | 7 (2.38) | 27 (9.18) | 60 (20.41) |
Total | 59 (20.07) | 91 (30.95) | 13 (4.42) | 103 (35.03) | 28 (9.52) | 294 (100) |
GPT-4 achieved higher overall labelling accuracy than ChatGPT, correctly labelling 191 out of 294 scenarios (64.97% vs 58.16%, P < 0.001). GPT-4 performed better than ChatGPT in some categories and worse in others. Notably, its accuracy in assigning the diagnosis label reached 91.7% (44 out of 48), a statistically significant difference compared to ChatGPT (P = 0.049). Accuracy in the treatment category was also significantly higher than that of ChatGPT, at 84.9% (73 out of 86, P < 0.001). GPT-4 performed worse than ChatGPT in assigning the differential diagnosis label (22.7%), but this difference did not reach statistical significance (P = 0.13).
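The labelling accuracies quoted above follow directly from the diagonals of Tables 1 and 2. The sketch below recomputes them from the table counts; the function names are illustrative and not part of the original analysis.

```python
from typing import Dict, List, Tuple

LABELS = ["GI", "Diagnosis", "DD", "Treatment", "Prognosis"]

# Rows: actual label; columns: assigned label (counts from Tables 1 and 2).
chatgpt = [
    [33, 20, 6, 16, 3],
    [8, 33, 5, 0, 2],
    [0, 11, 11, 0, 0],
    [9, 10, 2, 64, 1],
    [12, 12, 3, 3, 30],
]
gpt4 = [
    [42, 14, 1, 20, 1],
    [2, 44, 1, 1, 0],
    [0, 15, 5, 2, 0],
    [5, 7, 1, 73, 0],
    [10, 11, 5, 7, 27],
]


def overall_accuracy(matrix: List[List[int]]) -> Tuple[int, int, float]:
    """Correct labels (diagonal sum), total questions, and accuracy in percent."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct, total, round(100 * correct / total, 2)


def per_label_accuracy(matrix: List[List[int]]) -> Dict[str, float]:
    """Share of each actual label that was assigned correctly, in percent."""
    return {
        LABELS[i]: round(100 * matrix[i][i] / sum(matrix[i]), 2)
        for i in range(len(matrix))
    }


print(overall_accuracy(chatgpt), overall_accuracy(gpt4))
print(per_label_accuracy(chatgpt))
print(per_label_accuracy(gpt4))
```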
The performance of ChatGPT and GPT-4 in answering questions regarding kidney and liver transplantation was evaluated through a detailed review of their agreement and disagreement rates across multiple scenarios. The evaluation covered a wide range of topics, including clinical predictors, treatment options, and diagnostic criteria, among others (Supplementary Tables 1 and 2). Overall, ChatGPT correctly answered 50.3% (148 out of 294) of the multiple-choice questions, while GPT-4 demonstrated a higher performance, correctly answering 70.7% of questions (208 out of 294), a statistically significant difference (P < 0.001). Regarding kidney transplantation, ChatGPT demonstrated an accuracy of 71.3% (77 out of 108), while GPT-4 had an accuracy of 83.3% (90 out of 108), a statistically significant difference (P = 0.006). Both tools were correct in 63.9% of instances (69 out of 108), both incorrect in 9.3% (10 out of 108), ChatGPT correct and GPT-4 incorrect in 7.4% (8 out of 108), and ChatGPT incorrect and GPT-4 correct in 19.4% (21 out of 108). Regarding liver transplantation, ChatGPT demonstrated an accuracy of 38.2% (71 out of 186), while GPT-4 had an accuracy of 63.4% (118 out of 186), a statistically significant improvement (P < 0.001). Both tools were correct in 33.9% of instances (63 out of 186), both incorrect in 32.3% (60 out of 186), ChatGPT correct and GPT-4 incorrect in 4.3% (8 out of 186), and ChatGPT incorrect and GPT-4 correct in 29.6% (55 out of 186).
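A minimal sketch of Pearson’s χ² test on these counts is given below. It assumes the test was applied to the 2 × 2 cross-tabulation of the two models’ correctness, without continuity correction; the source does not state the exact table layout, and the original analysis was run in SPSS rather than with this code. Under that assumption, the kidney counts give a P value that rounds to 0.006 and the liver counts give P < 0.001, in line with the reported values.

```python
import math
from typing import List, Tuple


def pearson_chi2_2x2(table: List[List[int]]) -> Tuple[float, float]:
    """Pearson chi-square statistic and P value for a 2 x 2 table (1 degree of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: ChatGPT correct / incorrect; columns: GPT-4 correct / incorrect.
kidney = [[69, 8], [21, 10]]
liver = [[63, 8], [55, 60]]

chi2_kidney, p_kidney = pearson_chi2_2x2(kidney)
chi2_liver, p_liver = pearson_chi2_2x2(liver)
print(f"kidney: chi2 = {chi2_kidney:.2f}, P = {p_kidney:.3f}")
print(f"liver:  chi2 = {chi2_liver:.2f}, P = {p_liver:.2e}")
```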
When factoring based on the nature of the scenario, ChatGPT demonstrated an overall agreement of 43.6% (34 out of 78) for general information, 81.8% (18 out of 22) for differential diagnosis, 60.4% (29 out of 48) for the next diagnostic test, 45.3% for treatment (39 out of 86), and 46.7% for prognosis (28 out of 60). On the other hand, GPT-4 demonstrated superior performance in all types of scenarios except those regarding differential diagnosis. Specifically, GPT-4 demonstrated an overall agreement rate of 67.9% (53 out of 78) for general information, 77.3% (17 out of 22) for differential diagnosis, 77.1% (37 out of 48) for the next diagnostic test, 66.3% for treatment (57 out of 86), and 73.3% for prognosis (44 out of 60).
Published case reports: Table 3[21-30] and Table 4[22,31-38] compare the performance of ChatGPT and GPT-4 in various clinical tasks derived from published case reports. Overall, ChatGPT demonstrated an agreement rate of 50.79% (32 out of 63), a partial agreement rate of 17.46% (11 out of 63), and a disagreement rate of 31.75% (20 out of 63). GPT-4 demonstrated an agreement rate of 80.95% (51 out of 63), partial agreement of 9.52% (6 out of 63), and disagreement of 9.52% (6 out of 63). The overall performance of GPT-4 was found to be significantly higher compared with ChatGPT (P = 0.01). Regarding renal transplantation, ChatGPT demonstrated an agreement rate of 62.07% (18 out of 29), partial agreement of 13.79% (4 out of 29), and disagreement of 24.14% (7 out of 29). GPT-4 demonstrated an agreement rate of 89.66% (26 out of 29), partial agreement of 6.9% (2 out of 29), and disagreement of 3.45% (1 out of 29). Regarding liver transplantation, ChatGPT demonstrated an agreement rate of 41.18% (14 out of 34), partial agreement of 20.59% (7 out of 34), and disagreement of 38.24% (13 out of 34). GPT-4 demonstrated an agreement rate of 73.53% (25 out of 34), partial agreement of 11.76% (4 out of 34), and disagreement of 14.71% (5 out of 34). Supplementary Table 3 presents the performance of ChatGPT vs GPT-4 when categorized by the nature of the task. Notably, GPT-4 demonstrated outstanding performance, providing a differential diagnosis that included the final diagnosis in 94.4% of the cases (17 out of 18). Furthermore, GPT-4 suggested an appropriate diagnostic test for further investigating the patient’s symptoms in 90% of cases (9 out of 10). Finally, GPT-4 successfully suggested a treatment that agreed with the ground truth in 93.3% of the cases (14 out of 15).
Table 3 Performance of ChatGPT and GPT-4 on questions derived from published kidney transplantation case reports (A: Agreement; PA: Partial agreement; D: Disagreement; DD: Differential diagnosis)
Ref. | Question number | Task | Performance, ChatGPT/GPT-4 | Physicians course of action/ground truth | Agreement status, ChatGPT/GPT-4 |
Alharbi et al[21] | 1 | Provide a list of suitable antibiotics for Pseudomonas aeruginosa urinary tract infection. | Provided a list of suitable antibiotics including the one used by physicians (meropenem)/provided a list of suitable antibiotics including the one used by physicians (meropenem) | Meropenem was administered | A/A |
2 | Suggest the next diagnostic test(s) needed for the patient | Suggested a renal ultrasound and a stool culture/suggested a renal ultrasound, abdominal CT, blood cultures, and a stool culture | Abdominal ultrasound and abdominal CT scan were conducted | PA/A | |
3 | Identify the most probable diagnosis for the patient | Renal allograft malignancy/renal allograft malignancy | Eosinophilic chromophobe renal cell carcinoma was confirmed by the histopathological examination of the graft | A/A | |
Rubin et al[22] | 4 | Provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | CMV viremia | A/A |
5 | Provide the most likely diagnosis for the patient | Post-influenza bacterial pneumonia/CMV reactivation | CMV viremia was demonstrated by antigenemia and PCR assay | D/A | |
6 | Suggest treatment for the patient | Suggested ganciclovir, valganciclovir, foscarnet, and cidofovir (most preferable ganciclovir or valganciclovir)/suggested ganciclovir, valganciclovir, foscarnet, and cidofovir (most preferable ganciclovir or valganciclovir) | Intravenous ganciclovir followed by oral valganciclovir at a dose of 900 mg/day was administered | A/A | |
Molina-Andújar et al[23] | 7 | Provide a DD for the patient | Provided a DD that included the final diagnosis/ Provided a DD that included the final diagnosis | Acute post-infectious glomerulonephritis | A/A |
8 | Provide the most likely diagnosis for the patient | Acute post-infectious glomerulonephritis/acute post-infectious glomerulonephritis | Acute post-infectious glomerulonephritis | A/A | |
Baker et al[24] | 9 | Provide the next step in the patient’s management | Suggested hemodynamic stabilization with transfusion of blood products and bleeding control including surgical intervention, if necessary, followed by continuous monitoring/Suggested hemodynamic stabilization with transfusion of blood products and surgical exploration if bleeding is suspected to be within the surgical site. Suggested medication reevaluation focused on anticoagulants, prophylactic treatment for infection prevention, and continuous monitoring. | The patient was taken back to theatre for exploration where ligation of the bleeding artery, removal of blood clots and blood transfusion took place. Postoperative monitoring was performed | A/A |
10 | Suggest the next diagnostic test needed for the patient | Suggested an abdominal CT scan or an ultrasound/suggested imaging such as abdominal CT with contrast, an ultrasound, or an angiogram. Suggested evaluating the patient with new laboratory tests and for the need of re-exploration | An urgent CT angiogram was performed | PA/A | |
11 | Provide a DD for the bleeding | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Bleeding from a small branch of the renal artery | A/A | |
12 | Provide the most likely diagnosis for the patient | Failure or dislodgement of a surgical clip: Bleeding from a small branch of the renal artery where a surgical clip had come off during the re-exploration surgery/Failure or dislodgement of a surgical clip: Bleeding from a small branch of the renal artery where a surgical clip had come off during the re-exploration surgery | Bleeding was noticed from a small branch of the renal artery | A/A | |
Gewehr et al[25] | 13 | Provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Fungal infection | A/A |
14 | Provide the most likely diagnosis for the patient | Fungal infection/fungal infection, and specifically sporotrichosis | Fungal infection (sporotrichosis) | A/A | |
Vassallo et al[26] | 15 | Provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Active hepatitis E virus infection | A/A |
16 | Provide the most likely diagnosis for the patient | NAFLD/NAFLD or drug induced | Active hepatitis E virus infection | D/D | |
17 | Suggest the next diagnostic test needed for the patient | Suggested liver biopsy along with further imaging and laboratory investigations/suggested liver biopsy along with further imaging and laboratory investigations | Liver biopsy | A/A | |
18 | Suggest the next diagnostic test needed for the patient after the biopsy results | Suggested extensive viral serologic tests, PCR for suspected viruses, immunostaining of liver biopsy, and continuous monitoring of liver function/suggested extensive viral serologic tests, PCR for suspected viruses, immunostaining of liver biopsy, and continuous monitoring of liver function | A more extensive viral screen was conducted | A/A | |
Olsen et al[27] | 19 | Provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Epstein-Barr virus-negative, diffuse, large B-cell lymphoma | A/A |
20 | Provide the most likely diagnosis for the patient | Suggested that infectious aetiologies such as disseminated tuberculosis or fungal infections are more likely. It implied that diagnosis is difficult without further diagnostic investigations/identified PTLD as the most likely diagnosis, followed by infectious aetiologies | Epstein-Barr virus-negative, diffuse, large B cell lymphoma | D/A | |
21 | Suggest the next diagnostic test needed for the patient | Suggested sputum and/or BAL cultures, Mantoux test or IGRA, Blood tests, further imaging and laboratory tests, and lung biopsy/suggested liver biopsy, sputum and/or BAL cultures, Mantoux test or IGRA, Blood tests, further imaging and laboratory tests, and lung biopsy | Biopsy from one of the liver lesions | D/A | |
Allam et al[28] | 22 | Suggest the next diagnostic test needed for the patient | Suggested a kidney biopsy/suggested a kidney biopsy and further laboratory tests | Transplant biopsy was performed | A/A |
23 | Provide a DD for the patient | Provided a DD that did not include the final diagnosis/provided a DD including vascular complications such as vein stenosis | Biopsy-induced arteriovenous fistula and venous stenosis | D/PA | |
24 | Suggest treatment for the patient | Suggested intervention to address the arteriovenous fistula and stenosis of the main renal vein (embolization, angioplasty, stenting)/suggested intervention to address the arteriovenous fistula and stenosis of the main renal vein (embolization, angioplasty, stenting) | Embolization of fistula (coil occlusion) | A/A | |
Subramanian et al[29] | 25 | Provide a DD for the patient | Provided a DD that did not include the final diagnosis/provided a DD that included the final diagnosis | A small basal ganglia infarct and an infarct of the spinal cord was found | D/A |
26 | Provide the most likely diagnosis for the patient | Suggested ischemic injury or infarction of the spinal cord/suggested spinal cord ischemia or infarction | A small basal ganglia infarct and an infarct of the spinal cord was found | A/A | |
27 | Suggest the next diagnostic test needed for the patient | Suggested spine MRI, NCS and EMG to assess peripheral nerves and muscles, lumbar puncture if infections suspected, and transplant biopsy if rejection or ischemia is suspected/suggested spine MRI-MRA, neurophysiological studies (SSEP, NCS and EMG), lumbar puncture if infections suspected | A CTAP, and spine/brain MRI were performed | PA/PA | |
Ainsworth et al[30] | 28 | Provide a DD for the patient | Provided a DD that included immune-mediated hemolysis but did not specifically include PLS/provided a DD that included the final diagnosis | PLS | PA/A |
29 | Provide the most likely diagnosis for the patient | Suggested hemolysis due to mismatched blood type of the donor/suggested PLS | PLS | D/A |
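Table 4 Performance of ChatGPT and GPT-4 on questions derived from published liver transplantation case reports (A: Agreement; PA: Partial agreement; D: Disagreement; DD: Differential diagnosis)
Ref. | Question number | Task | Performance, ChatGPT/GPT-4 | Physicians course of action/ground truth | Agreement status, ChatGPT/GPT-4 |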
Rubin et al[22] | 1 | Case presentation/provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | CMV | A/A |
2 | Provide the most likely diagnosis for the patient | Suggested post-transplant infection, particularly a viral infection (CMV, EBV, or VZV)/CMV | CMV | PA/A | |
3 | Justify the recurrence of CMV infections despite treatment | Suggested resistance to ganciclovir/suggested resistance to ganciclovir or/and inadequate duration of initial treatment-secondary infections | Ganciclovir resistant infection | A/A | |
4 | Suggest alternative treatment for the patient | Suggested foscarnet/suggested foscarnet or cidofovir or letermovir or/and CMV immunoglobulins | Foscarnet was administered | A/A | |
Okeke et al[31] | 5 | Case presentation/suggest treatment for the patient given no arterial flow in the liver | Suggested interventional radiology procedures or/and surgical revascularization/suggested interventional radiology procedures or/and surgical revascularization (thrombectomy or re-anastomosis) | Interventional radiology procedure (thrombolysis) was performed. Then revascularization was achieved intraoperatively (infra-aortic jump was performed) | PA/PA |
6 | Suggest the diagnostic tests needed for the patient following re-thrombosis | Suggested doppler ultrasound, CT angiogram, coagulation profile-thrombophilia testing/suggested thrombophilia workup, repeat imaging (doppler ultrasound, CT/MRI angiography), and autoimmune markers | Hypercoagulable workup was performed | A/A | |
7 | Provide a DD behind re-thrombosis | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Antiphospholipid syndrome | A/A | |
8 | Provide the most likely diagnosis for the patient | Suggested hepatic artery thrombosis/suggested antiphospholipid syndrome | Antiphospholipid syndrome | D/A | |
Eubank et al[32] | 9 | Case presentation/determine the most likely microorganism to be identified by the swab | Suggested Staphylococcus aureus, Streptococcus species, Enterococcus species, and Pseudomonas aeruginosa, and fungi like Candida albicans/suggested Staphylococcus aureus, Enterococcus species, Pseudomonas aeruginosa, Escherichia coli, fungi like Candida or Aspergillus, viruses like CMV, and mycobacteria | 94% Enterococcus faecalis, 93% Rhizopus oryzae, and 5% Aspergillus flavus | D/PA |
10 | Suggest treatment for the patient given the pathogens identified | Suggested intravenous liposomal amphotericin B at an appropriate dosage, along with surgical debridement of infected tissue/suggested intravenous liposomal amphotericin B at an appropriate dosage, oral posaconazole along with surgical debridement of infected tissue. | Oral posaconazole 300 mg and IV amphotericin B and micafungin daily. Amphotericin B deoxycholate irrigation in the wound vacuum | PA/A | |
Kim et al[33] | 11 | Case presentation/provide a DD for the patient’s shock | Provided a DD that did not include the final diagnosis/provided a DD that included the final diagnosis | GVHD | D/A |
12 | Provide the most likely diagnosis for the patient | Suggested a surgical complication, specifically duodenal perforation/suggested duodenal perforation or drug-induced kidney injury/neutropenia | GVHD | D/D | |
13 | Suggest the further diagnostic tests needed for the patient | Suggested blood cultures, peritoneal fluid analysis, endoscopy or upper GI imaging/suggested blood and urine cultures, viral and fungal tests, peritoneal fluid analysis, laboratory tests, and endoscopy or upper GI imaging | Mixed chimerism studies and skin biopsy were performed | D/D | |
14 | Suggest further treatment for the patient given the mixed chimerism studies results | The following treatment options were suggested: Systemic corticosteroids, adjusting tacrolimus dose, consider additional immunosuppressives such as mycophenolate, and phototherapy/suggested considering the following treatment options: High-dose corticosteroids, ATG, ECP, infliximab, ruxolitinib, MSC transplantation, additional immunosuppressive agents, and IL-2 diphtheria toxin | Steroids were administered for 4 days followed by ruxolitinib due to patient not responding to treatment | PA/A | |
15 | Guess the survival of the patient | Suggested that the patient did not, most likely, survive/suggested that the patient did not, most likely, survive | The patient died on day 16 of re-admission, 45 days following transplantation | A/A | |
Kim et al[33], (b) | 16 | Case presentation/provide a DD for the patient | Provided a DD that did not include the final diagnosis/provided a DD that included the final diagnosis | GVHD | D/A |
17 | Provide the most likely diagnosis for the patient | Suggested Clostridioides difficile colitis/suggested GVHD | GVHD | D/A | |
18 | Suggest treatment for the patient | The following treatment options were suggested: Glucocorticoids, CNIs, ATG, T-cell depleting agents such as basiliximab/high-dose corticosteroids, adjust immunosuppression, consider second line treatments such as ATG, ECP, sirolimus, infliximab, and basiliximab | Steroids were administered for 2 days followed by ruxolitinib due to patient not responding to treatment | PA/PA | |
19 | Guess the survival of the patient | Declined to make a prediction/suggested that the patient did not, most likely, survive | The patient died 29 days after transplant | D/A | |
Ramírez de la Piscina et al[34] | 20 | Case presentation/Provide a DD for the patient | Provided a DD that included the final diagnosis/ provided a DD that included the final diagnosis | Budd-Chiari syndrome secondary to ADPKD | A/A |
21 | Provide the most likely diagnosis for the patient | Suggested Budd-Chiari syndrome/suggested Budd-Chiari syndrome secondary to the compression from ADPKD cysts | Budd-Chiari syndrome secondary to ADPKD | A/A | |
22 | Suggest treatment for the patient | Provided a list of suitable treatment options including only liver transplantation/provided a list of suitable treatment options including combined transplantation | A combined liver and renal transplantation was performed | PA/A | |
Arstikyte et al[35] | 23 | Case presentation/provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Venous air embolism | A/A |
24 | Provide the most likely diagnosis for the patient | Suggested that information given is insufficient to single out a specific diagnosis/suggested that based on given information hemorrhage or venous air embolism are the two most likely diagnoses | Venous air embolism | D/A | |
25 | Suggest appropriate diagnostic test for the patient | Suggested TEE/suggested TEE | TEE | A/A | |
Aucejo et al[36] | 26 | Case presentation/provide a DD for the patient | Provided a DD that did not include the final diagnosis/provided a DD that did not include the final diagnosis | Narrowing of the RHV at the level of the cava-caval anastomosis | D/D |
27 | Provide the most likely diagnosis for the patient | Suggested adhesions, anastomotic leakage, or biliary complications/suggested PVT | Narrowing of the RHV at the level of the cava-caval anastomosis | D/D | |
28 | Given the RHV stenosis diagnosis, suggest treatment for the patient | Suggested considering stent placement, TIPS or surgical revision/suggested considering stent placement, TIPS or surgical revision | A wall stent 14 mm in diameter by 40 mm in length was placed across the RHV stenosis | A/A | |
Ichimura et al[37] | 29 | Case presentation/provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | VOD/SOS | A/A |
30 | Provide the most likely diagnosis for the patient | Suggested GVHD/suggested VOD/SOS | VOD/SOS | D/A | |
31 | Suggest treatment for the patient given VOD/SOS | Suggested considering defibrotide, anticoagulant medications, and liver transplantation/suggested considering defibrotide, anticoagulant medications, TIPS, and liver transplantation | The physicians performed a liver transplantation since defibrotide had not yet been approved | A/A | |
32 | Provide a new differential diagnosis for the patient’s deterioration postoperatively | Provided a DD that did not include the final diagnosis/provided a DD that included the final diagnosis | GVHD, several infections | D/A | |
Trevizoli et al[38] | 33 | Case presentation/suggest appropriate treatment for the patient | Suggested considering corticosteroids, aminosalicylates, immunomodulators such as azathioprine, biologic agents such as infliximab, diuretics, variceal bleeding prophylaxis and liver transplant evaluation/suggested considering corticosteroids, aminosalicylates, immunomodulators such as azathioprine, biologic agents such as infliximab, consider surgical management (colectomy), diuretics, variceal bleeding prophylaxis and liver transplant evaluation | Sodium restriction, diuretic therapy, hydrocortisone 300 mg was started without adequate response, vedolizumab | PA/PA |
34 | Suggest appropriate treatment for the patient given the DVT progression | Suggested LMWH and IVF/suggested LMWH | He underwent hemodynamic intervention with the placement of a vena cava filter | A/D |
Unpublished department cases: Supplementary Tables 4 and 5 provide the case presentation of the unpublished department cases provided to ChatGPT/GPT-4 before their performance was tested on various tasks. Tables 5 and 6 compare the accuracy of ChatGPT and GPT-4 in various clinical tasks derived from those unpublished department cases. Overall, ChatGPT demonstrated an agreement rate of 53.49% (23 out of 43), partial agreement of 23.26% (10 out of 43), and disagreement of 23.26% (10 out of 43). GPT-4 demonstrated an agreement rate of 72.09% (31 out of 43), partial agreement of 6.98% (3 out of 43), and disagreement of 20.93% (9 out of 43). The overall performance of GPT-4 was found to be significantly higher compared with ChatGPT (P = 0.004).
Case ID | Question number | Task | Performance, ChatGPT/GPT-4 | Physicians course of action/ground truth | Agreement status, ChatGPT/GPT-4 |
1 | 1 | Case presentation/provide the diagnostic tests needed to investigate refractory ascites in patient with ADPKD | Suggested abdominal ultrasound, paracentesis with fluid analysis, LF tests, tumor marker tests, CT scan, serologic testing, genetic testing/suggested paracentesis with fluid analysis, LF tests, abdominal ultrasound, CT scan, echocardiogram, and endoscopy, further evaluation for elevated markers | Paracentesis (ascites fluid was sent for cytology, culture, TB investigation, SAAG calculation), abdominal CT, liver ultrasound, LF tests, tumor marker tests, serologic testing, echocardiogram, and endoscopy | PA/A |
2 | Provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Tuberculous peritonitis | A/A | |
3 | Provide the most likely diagnosis for the patient | Suggested malignancy (most likely ovarian cancer) or SBP are the most likely diagnoses/suggested tuberculous peritonitis or malignancy or SBP as the most likely diagnoses | Tuberculous peritonitis | D/A | |
2 | 4 | Case presentation/provide a differential diagnosis for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Acute PE | A/A |
5 | Provide the most probable diagnosis for the patient | Suggested myocardial infarction as the most probable diagnosis/suggested PE as the most probable diagnosis | Acute PE | D/A | |
6 | What diagnostic test is more suitable for this patient | Suggested CTPA and ECG be performed/suggested CTPA, ECG, and d-dimers tests be performed | CTPA was performed | A/A | |
7 | What treatment do you recommend for this patient, given PE is confirmed | Suggested a choice among LMWH, DOACs, and warfarin. No discrimination between short and long-term anticoagulation was made/suggested initial anticoagulation with either LMWH or DOACs including apixaban followed by a long-term anticoagulation with either a DOAC or warfarin | 10 mg apixaban BD was commenced followed by 5 mg BD for 6 months | PA/A | |
3 | 8 | Case presentation/provide a DD given the post-operative signs/symptoms of the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Post-operative bleeding | A/A |
9 | Provide the most probable diagnosis | Suggested exacerbation or progression of her underlying thrombocytopenic disorder/suggested post-transplant acute thrombotic microangiopathy | Post-operative bleeding | D/D | |
10 | Predict the next diagnostic test that the patient requires | Suggested coagulation studies, renal function test, peripheral blood smear, infectious testing and imaging including ultrasound and CT/suggested peripheral blood smear, LDH level, Coombs test, renal function, immunosuppressive level tests, and infection screening. | Abdominal ultrasound and abdomen/pelvis CT with contrast | PA/D | |
11 | Appropriate treatment given the evidence of active bleeding | Suggested stabilization with intravenous fluids and blood products, surgical intervention, and close monitoring/suggested stabilization with intravenous fluids and blood products, surgical intervention, and close monitoring | The patient was transfused and was re-explored | A/A | |
4 | 12 | Case presentation/provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Acute graft thrombosis due to renal vein thrombosis | A/A |
13 | Provide the most probable diagnosis | Suggested acute graft thrombosis due to either renal artery or vein thrombosis/suggested acute graft thrombosis due to renal vein thrombosis | Acute graft thrombosis due to renal vein thrombosis | A/A | |
14 | Provide the most suitable diagnostic test | Suggested choosing among transplant duplex US, CT angiography, and renal scintigraphy/suggested choosing among transplant duplex US, CT angiography, and renal scintigraphy | Transplant doppler US | A/A | |
15 | Given the transplant US findings, provide the patient’s diagnosis | Acute renal allograft rejection/acute renal artery thrombosis or artery stenosis | Renal vein thrombosis | D/D | |
16 | Given the transplant US findings, suggest a diagnostic modality that could verify diagnosis | Renal biopsy/suggested CT angiography | CT angiography was performed | D/A | |
17 | Suggest treatment options for the patient | Suggested considering high-dose corticosteroids, antithymocyte globulin, calcineurin inhibitors, mycophenolate mofetil, basiliximab or alemtuzumab, and plasmapheresis with intravenous immunoglobulin/suggested surgical revascularization | Patient was re-explored | D/A | |
18 | Findings of reperfusion during benchwork after explantation | Suggested inadequate restoration of tissue perfusion and significant vascular compromise and tissue damage/suggested extensive vascular thrombosis with poor kidney perfusion, and evidence of parenchymal damage | Artery perfusion required high pressure, kidney became turgid, swollen, and a capsular tear was seen | A/A | |
5 | 19 | Case presentation/provide DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Post-transplant obstructive LUTS due to clot retention | A/A |
20 | Provide most probable diagnosis | Suggested urinary tract obstruction most probably at the site of the anastomosis/suggested urinary tract obstruction due to blood clot formation as the most probable diagnosis | Post-transplant obstructive LUTS due to clot retention | PA/A | |
21 | Suggest next diagnostic test to verify the diagnosis | Suggested considering transplant US, abdominal CT or renal scintigraphy/suggested transplant US as the first-line imaging modality. Suggested that other options include abdominal CT, MRI, and nuclear medicine scans | A transplant US was performed | A/A | |
22 | Given findings of US/suggest a suitable treatment option for the patient | Suggested considering manual irrigation, catheter flushing, cystoscopic clot evaluation, and monitoring/suggested replacing the foley catheter to flush out smaller clots, cystoscopic clot evaluation, consider percutaneous nephrostomy, and monitoring | A 3-way irrigation system was applied | PA/PA | |
23 | Despite resolved hematuria patient’s clearance did not improve/provide a DD | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Acute kidney injury with acute tubular necrosis | A/A | |
24 | Provide most probable diagnosis | Suggested acute kidney injury with acute tubular necrosis as the most probable diagnosis/suggested acute kidney injury with acute tubular necrosis as the most probable diagnosis | Acute kidney injury with acute tubular necrosis | A/A | |
25 | Case progression update/poor renal function 3 months post-operatively provide DD for patient’s signs and symptoms | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Recurrence of underlying disease | A/A | |
26 | Provide most probable diagnosis | Suggested chronic allograft dysfunction as the most probable diagnosis/suggested chronic allograft dysfunction and recurrence of the underlying disease as the two most probable diagnoses | Recurrence of underlying disease | D/PA |
Case ID | Question number | Task | Performance, ChatGPT/GPT-4 | Physicians course of action/ground truth | Agreement status, ChatGPT/GPT-4 |
1 | 1 | Case presentation/provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Early anastomotic bile leak | A/A |
2 | Provide the most probable diagnosis | Suggested that a biliary complication including bile leak as the most probable diagnosis/suggested bile leak as the most probable diagnosis | Early anastomotic bile leak | A/A | |
3 | Suggest a suitable diagnostic test to confirm the diagnosis | Suggested considering abdominal US or CT, and MRCP/suggested considering abdominal US or CT, fluid drain analysis, and MRCP | Abdominal CT and fluid drain analysis were performed | PA/A | |
4 | Suggest a suitable treatment for this patient | Suggested considering percutaneous drainage, ERCP, surgical intervention, and antibiotics if there are signs of infection/suggested considering less invasive treatments such as percutaneous drainage and ERCP as first line and proceeding with re-exploration if those fail, while covering the patient with antibiotics | Antibiotics were commenced, followed by an ERCP which did not resolve the bile leak and the patient was re-explored | A/A | |
2 | 5 | Case presentation/calculate CP score, MELD score, and MELD-sodium score | Accurately calculated CP score and MELD score, underestimated MELD-sodium score/accurately calculated the required scores | CP score = 13, MELD score = 34, and MELD-sodium score = 37 | PA/A |
6 | Patient’s pre-operative assessment findings presented/evaluate patient’s eligibility to proceed with transplantation | Suggested that it’s likely that the operation was postponed or deferred until the patient's condition improved/suggested that given the findings the transplant team would have opted to delay the liver transplantation until active issues were adequately addressed | Transplantation did not proceed | A/A | |
3 | 7 | Case presentation/provide a DD for the patient | Provided a DD that did not include the final diagnosis/provided a DD that did not include the final diagnosis | PLS | D/D |
8 | Provide the most probable diagnosis | Suggested acute cellular rejection as the most probable diagnosis/suggested acute hemolytic transfusion reaction | PLS | D/D | |
9 | Suggest treatment options for the patient | Suggested high-dose intravenous corticosteroids, other anti-rejection medications, and plasmapheresis/suggested withholding further transfusion, administering corticosteroids, and monitoring the patient | Patient was treated with high-dose corticosteroids, plasmapheresis, and intravenous immunoglobulin | PA/D | |
10 | Given the patient’s 3-month new signs/symptoms (recurrent ascites, low-grade fever etc.), provide a new DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | PTLD | A/A | |
11 | Provide the most probable diagnosis | Suggested PTLD as the most probable diagnosis/suggested nephrotic syndrome as the most probable diagnosis | PTLD | A/D | |
4 | 12 | Case presentation/ suggest the most suitable diagnostic test | Brain imaging was suggested/suggested brain imaging, EEG, and tacrolimus level test | A brain CT, EEG, and tacrolimus level test were performed | PA/A |
13 | Provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | PRES | A/A | |
14 | Provide the most probable diagnosis | Suggested PRES as the most probable diagnosis/suggested tacrolimus neurotoxicity as the most probable diagnosis | PRES | A/D | |
5 | 15 | Case presentation/provide DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | GVHD | A/A |
16 | Provide most probable diagnosis | Suggested CMV infection as the most probable diagnosis/suggested CMV infection as the most probable diagnosis | GVHD | D/D | |
17 | Suggest appropriate diagnostic tests | Suggested CMV testing, biopsy, and imaging studies/suggested CMV testing, imaging studies, and skin biopsy | Peripheral blood flow cytometry, colonoscopy, and skin biopsy were performed | PA/PA |
Regarding renal transplantation, ChatGPT demonstrated an agreement rate of 53.85% (14 out of 26), partial agreement of 19.23% (5 out of 26), and disagreement of 26.92% (7 out of 26). GPT-4 demonstrated an agreement rate of 80.77% (21 out of 26), partial agreement of 7.69% (2 out of 26), and disagreement of 11.54% (3 out of 26). Regarding liver transplantation, ChatGPT demonstrated an agreement rate of 52.94% (9 out of 17), partial agreement of 29.41% (5 out of 17), and disagreement of 17.65% (3 out of 17). GPT-4 demonstrated an agreement rate of 58.82% (10 out of 17), partial agreement of 5.88% (1 out of 17), and disagreement of 35.29% (6 out of 17). Supplementary Table 6 shows the performance of ChatGPT vs GPT-4 when factoring in the nature of the task. Notably, GPT-4 demonstrated outstanding performance, providing a differential diagnosis that included the final diagnosis in 91.7% of the cases (11 out of 12). Furthermore, GPT-4 successfully suggested an appropriate diagnostic test for further investigating the patient’s symptoms in 77.8% of cases (7 out of 9).
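The liver transplantation rates reported above can be re-derived from the agreement-status column of the unpublished liver cases table. The following is a minimal sketch of that arithmetic, not the authors' analysis code: the list is a manual transcription of the 17 "ChatGPT/GPT-4" status cells in question order, and the helper names `tally` and `rates` are hypothetical.

```python
from collections import Counter

# Agreement statuses (ChatGPT/GPT-4) transcribed by hand from the 17
# unpublished liver transplantation questions, in question order.
# A = agreement, PA = partial agreement, D = disagreement.
LIVER_STATUSES = [
    "A/A", "A/A", "PA/A", "A/A", "PA/A", "A/A", "D/D", "D/D", "PA/D",
    "A/A", "A/D", "PA/A", "A/A", "A/D", "A/A", "D/D", "PA/PA",
]


def tally(statuses):
    """Count A/PA/D labels separately for ChatGPT (left) and GPT-4 (right)."""
    chatgpt, gpt4 = Counter(), Counter()
    for status in statuses:
        left, right = status.split("/")
        chatgpt[left] += 1
        gpt4[right] += 1
    return chatgpt, gpt4


def rates(counts):
    """Convert label counts to percentages rounded to two decimals."""
    total = sum(counts.values())
    return {label: round(100 * counts[label] / total, 2)
            for label in ("A", "PA", "D")}


chatgpt_counts, gpt4_counts = tally(LIVER_STATUSES)
chatgpt_rates = rates(chatgpt_counts)
gpt4_rates = rates(gpt4_counts)

print("ChatGPT:", dict(chatgpt_counts), chatgpt_rates)
print("GPT-4:  ", dict(gpt4_counts), gpt4_rates)
```

Under this transcription, the tallies match the counts stated in the text (9, 5, and 3 out of 17 for ChatGPT; 10, 1, and 6 out of 17 for GPT-4). The same tally applied to the renal table's status column gives a quick consistency check on the denominators.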
When compared, the performance of ChatGPT did not differ between the published case reports and the unpublished department cases (P = 0.459). Similarly, the performance of GPT-4 did not differ significantly (P = 0.232). Table 7 presents the performance of ChatGPT vs GPT-4 when factoring in the nature of the task for all cases, published or unpublished. Notably, GPT-4 demonstrated outstanding performance, providing a differential diagnosis that included the final diagnosis in 90% of the cases (27 out of 30). Furthermore, GPT-4 suggested an appropriate diagnostic test for further investigating the patient’s symptoms in 78.9% of cases (15 out of 19). Finally, GPT-4 predicted the prognosis in 100% of related questions (5 out of 5).
Type of task | Overall ChatGPT agreement level | Overall GPT-4 agreement level | ChatGPT renal transplantation agreement level | GPT-4 renal transplantation agreement level | ChatGPT liver transplantation agreement level | GPT-4 liver transplantation agreement level |
DD that includes final diagnosis | A: 22/30 (73.3); PA: 1/30 (3.3) | A: 27/30 (90); PA: 1/30 (3.3) | A: 13/16 (81.3); PA: 1/16 (6.3) | A: 15/16 (93.8); PA: 1/16 (6.3) | A: 9/14 (64.3); PA: 0/14 (0) | A: 12/14 (85.7); PA: 0/14 (0) |
Final diagnosis prediction | A: 11/31 (35.5); PA: 2/31 (6.5) | A: 20/31 (64.5); PA: 2/31 (6.5) | A: 7/17 (41.2); PA: 1/17 (5.9) | A: 13/17 (76.5); PA: 1/17 (5.9) | A: 4/14 (28.6); PA: 1/14 (7.1) | A: 7/14 (50); PA: 1/14 (7.1) |
Appropriate next diagnostic test | A: 8/19 (42.1); PA: 8/19 (42.1) | A: 15/19 (78.9); PA: 2/19 (10.5) | A: 6/13 (46.2); PA: 5/13 (38.5) | A: 11/13 (84.6); PA: 1/13 (7.7) | A: 2/6 (33.3); PA: 3/6 (50) | A: 4/6 (66.7); PA: 1/6 (16.7) |
Appropriate treatment | A: 11/21 (52.4); PA: 9/21 (42.9) | A: 15/21 (71.4); PA: 4/21 (19) | A: 5/8 (62.5); PA: 2/8 (25) | A: 7/8 (87.5); PA: 1/8 (12.5) | A: 6/13 (46.2); PA: 7/13 (53.8) | A: 8/13 (61.5); PA: 3/13 (23.1) |
Prediction of prognosis | A: 3/5 (60); PA: 1/5 (20) | A: 5/5 (100); PA: 0/5 (0) | A: 1/1 (100); PA: 0/1 (0) | A: 1/1 (100); PA: 0/1 (0) | A: 2/4 (50); PA: 1/4 (25) | A: 4/4 (100); PA: 0/4 (0) |
In this paper, we investigated the performance of ChatGPT and GPT-4 in various clinical scenarios in renal and liver transplantation to evaluate the potential role of these tools in AI-assisted clinical practice. GPT-4 demonstrated superior performance in all types of scenarios. Specifically, GPT-4 was right approximately seven out of 10 times when solving challenging multiple-choice questions in renal and liver transplantation. Regarding published case reports, the comparative analysis across these real-world cases reveals that both models are highly capable, while GPT-4 generally demonstrates an edge in comprehensive responses and alignment with clinical practice. GPT-4 consistently provided more accurate and reliable clinical recommendations, with higher percentages of full agreement in both renal and liver transplantation compared with ChatGPT. These findings were similar in the unpublished department cases. Notably, GPT-4 demonstrated outstanding performance in specific tasks, providing a differential diagnosis that included the final diagnosis in 90% of the cases (27 out of 30), suggesting an appropriate diagnostic test for further investigating the patient’s symptoms in 78.9% of cases (15 out of 19), and predicting the prognosis of the patient in 100% of related questions (5 out of 5). The performance of ChatGPT and GPT-4 remained consistent when tested on unpublished material. This suggests that the performance of these tools is unaffected by whether the cases presented were potentially part of the training set. In other words, the performance is genuine, and not a result of overfitting (higher performance on the training dataset, which drops significantly when unseen instances are introduced). While both tools demonstrated notable strengths in addressing a wide range of clinical scenarios, certain areas revealed consistent underperformance. First, the more detailed a case summary was, the more comprehensive the response.
These tools underperformed when tasked with interpreting ambiguous or incomplete clinical data, as their reasoning relies on patterns learned from the training data rather than experiential understanding. Additionally, both models struggled with rare conditions, as those are underrepresented in their training datasets, leading to oversimplified or incorrect recommendations. Furthermore, while GPT-4 demonstrates improved contextual awareness, both models generate responses that, while plausible, lack the depth required for clinical decision-making. These areas of underperformance underscore the importance of human oversight and highlight opportunities for further refinement in AI models for clinical use.
Some other studies have also investigated the role of ChatGPT in renal or liver transplantation. Rawashdeh et al[13] evaluated the potential use of ChatGPT in medical scenarios related to kidney transplantation and its applicability. ChatGPT was tested on general questions about kidney transplantation, writing scientific texts on this topic, and generating summaries of texts about kidney transplantation[13]. The authors, with the help of two experts, assessed the validity, scientific accuracy, clarity, conciseness, and repeatability of the texts and answers generated by ChatGPT. The study results indicated that ChatGPT demonstrated satisfactory knowledge of general issues about kidney transplantation but failed to present detailed and accurate answers to specific questions[13]. ChatGPT’s responses maintained a scientific language and tone, but some elements were not factual. According to the two experts, none of the answers were error-free, and some of the bibliographies were inaccurate and unreliable. Finally, the ChatGPT answers and texts had sufficient repeatability, as there were no statistically significant differences on separate days[13].
Endo et al[11] investigated the accuracy and reliability of ChatGPT’s responses to questions related to liver transplantation. The authors developed a set of 29 questions covering general information about liver transplantation, including: (1) 4 general questions; (2) 7 questions about the waiting list; (3) 13 questions about the pre-transplant period; and (4) 5 questions about the donor[11]. The quality of the responses was independently assessed using “quality grades” by 17 experts in the field of abdominal transplant surgery. A total of 493 “quality scores” (29 questions × 17 experts) were collected, of which 46.0% were “very good”, 30.2% were “excellent” and 7.2% were “poor” or “fair”. Overall, 70.6% of the experts considered ChatGPT to be an accurate source of information[11]. In a different, recent study, regarding liver transplantation[12]. Finally, in a recent study, Mankowski et al[39] compared the performance of ChatGPT, GPT-4, and GPT-4 visual against nephrology fellows and training program directors in 12 multiple-choice questions assessing six kidney transplant cases. Notably, GPT-4 visual performed comparably to nephrology fellows and training program directors, answering 10 questions correctly, while nephrology fellows and training program directors answered 9 and 11 questions correctly, respectively[39]. GPT-4 visual also demonstrated significantly higher performance than all its predecessors, showcasing how rapidly these models evolve over short periods of time.
With 400 tested questions, this is the first study of this scale and versatility in clinical transplantation. Our findings support the potential utility of AI models like ChatGPT and GPT-4 in AI-assisted clinical practice as sources of accurate, individualized medical information and as aids to decision-making. Our analysis underscores how the performance of these tools improves as they become more sophisticated. As these AI tools evolve, they could potentially address several gaps in renal and liver transplant practice. These include, among others, optimizing workflow by automating routine documentation, synthesizing and summarizing extensive medical literature for clinicians in seconds, enhancing access to transplantation expertise for non-specialist clinicians in resource-limited settings (particularly in time-sensitive situations), providing personalized decision-support tools for transplant candidate selection, and enhancing patient education by simplifying complex medical concepts, thus fostering better understanding and communication between patients and healthcare providers. Notably, the ultimate goal is for these tools to augment, rather than replace, the role of physicians, ensuring safer and more effective patient care.
Physicians, like all professionals, are not infallible. In general, errors by AI are judged more harshly than errors by humans. An intriguing aspect to explore would be to provide experienced transplant physicians with the same scenarios and compare the performance of AI-based tools with that of the physicians. Nevertheless, it is reasonable to assume that as these models keep progressing and becoming more sophisticated, they will eventually surpass physicians in performing certain tasks. Future research should focus on validating these results across a broader range of medical fields, patient populations, and clinical environments to ensure generalizability. Additionally, continuous evaluation and updating are imperative to maintain performance and relevance in clinical decision-making as these models evolve and diverge from their initial versions. Aside from accuracy concerns, the application of AI-based tools in healthcare faces a plethora of other challenges, such as intrinsic bias, data protection and cybersecurity concerns, cost-effectiveness, interpretability, intellectual property, oversight and liability concerns, and ethical concerns[2,40]. Ethical considerations include the complexity of assigning liability in cases of erroneous AI recommendations. Current legal frameworks require physicians to provide care consistent with standard practices, which shields them from liability when standard care is followed[41]. However, this may inadvertently discourage physicians from fully leveraging AI tools, reducing them to confirmatory aids rather than tools to enhance care[41]. Without a comprehensive legal framework addressing AI liability, healthcare facilities may remain hesitant to adopt these technologies due to concerns about potential exposure to malpractice claims. This highlights the urgent need for clear policies to balance innovation with accountability in AI-assisted clinical decision-making.
Another important challenge of adopting AI tools in healthcare is the lack of interpretability, that is, the inability to explain the inner logic that led to a recommendation[42]. To address this, actionable steps must be taken to ensure that these tools are both trusted and effective. Tools equipped with interpretability features, such as showing which patient characteristics most influenced a decision, should be prioritized. Currently, the latest versions of GPT-4 are able to provide citations for the information provided. Integrating AI into real-world clinical workflows will first require studies to assess its practical impact on patient outcomes, workflow efficiency, and clinical adoption. Addressing ethical and regulatory challenges, such as mitigating biases, ensuring data security, and establishing accountability frameworks, will also be critical to realizing the full potential of AI in healthcare. Policymakers must establish clear guidelines mandating a baseline level of explainability for AI tools used in healthcare, so that clinicians can understand and justify AI-assisted decisions, which is essential for maintaining patient trust and ethical integrity. Both transplant professionals and policymakers should encourage ongoing education and training on the use of AI, ensuring that clinicians can effectively apply and evaluate these tools in clinical practice.
Although this study included diverse question types and scenarios related to kidney and liver transplantation, the generalizability of the findings may be limited by the regional and demographic scope of the cases used. One critical consideration is that the training sets of the above models, while not fully disclosed, likely draw heavily on publicly available medical literature. Thus, the clinical scenarios tested in this paper could have been part of the training data of these models. This introduces a risk of overfitting bias: a model may perform well on scenarios that are well documented in the literature, with performance declining markedly when it encounters less commonly studied conditions or atypical clinical presentations. This potential lack of generalizability underscores the need for caution when using these models for conditions underrepresented in the literature. However, we sought to mitigate this by also testing the models on a set of unpublished cases, on which performance was maintained at statistically comparable levels.
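As a minimal sketch, and not the statistical method used in this study, the comparison between published and unpublished cases can be illustrated with a pooled two-proportion z-test. The counts below are assumptions back-calculated from the GPT-4 full-agreement rates reported in the abstract (80.95% of 63 published-case questions and 72.09% of 43 unpublished-case questions), and the function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under the null hypothesis that both rates are equal.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution:
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumed counts, back-calculated from the reported percentages.
published_agree, published_total = 51, 63      # 51/63 is about 80.95%
unpublished_agree, unpublished_total = 31, 43  # 31/43 is about 72.09%

z, p = two_proportion_z_test(
    published_agree, published_total, unpublished_agree, unpublished_total
)
print(f"z = {z:.3f}, two-sided p = {p:.3f}")
```

A non-significant difference in a test of this kind does not by itself establish that the two rates are equivalent; supporting that claim formally would require an equivalence test with a prespecified margin.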
In earlier publications, we predicted that AI would eventually “infiltrate” the healthcare industry[43]. It seems now that AI is at healthcare’s doorstep. It is essential to highlight that AI in healthcare should aim to embrace the complexity of our profession and augment our intelligence rather than replace it. Clinical reasoning and critical thinking involve non-quantifiable information that AI cannot integrate. In other words, we should aim for AI-assisted and not AI-driven clinical practice. As more AI tools are integrated into clinical practice, advanced evaluation systems must be developed to assess their unintended consequences and impact on patient outcomes. AI is here, and physicians must engage with it to avoid obsolescence.
1. Ertel W. Introduction to Artificial Intelligence. Wiesbaden: Springer Wiesbaden, 2024.
2. Christou CD, Tsoulfas G. Challenges and opportunities in the application of artificial intelligence in gastroenterology and hepatology. World J Gastroenterol. 2021;27:6191-6223.
3. Radford A, Narasimhan K, Salimans T, Sutskever I. Improving language understanding by generative pre-training. [cited 15 October 2024]. Available from: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf.
4. Kung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepaño C, Madriaga M, Aggabao R, Diaz-Candido G, Maningo J, Tseng V. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023;2:e0000198.
5. Park JY. Could ChatGPT help you to write your next scientific paper?: concerns on research ethics related to usage of artificial intelligence tools. J Korean Assoc Oral Maxillofac Surg. 2023;49:105-106.
6. Eysenbach G. The Role of ChatGPT, Generative Language Models, and Artificial Intelligence in Medical Education: A Conversation With ChatGPT and a Call for Papers. JMIR Med Educ. 2023;9:e46885.
7. Dave T, Athaluri SA, Singh S. ChatGPT in medicine: an overview of its applications, advantages, limitations, future prospects, and ethical considerations. Front Artif Intell. 2023;6:1169595.
8. Nori H, King N, McKinney SM, Carignan D, Horvitz E. Capabilities of GPT-4 on medical challenge problems. 2023 Preprint. Available from: arXiv:2303.13375.
9. Clark SC. Can ChatGPT transform cardiac surgery and heart transplantation? J Cardiothorac Surg. 2024;19:108.
10. Rozenberg D, Singer LG. Predicting outcomes in lung transplantation: From tea leaves to ChatGPT. J Heart Lung Transplant. 2023;42:905-907.
11. Endo Y, Sasaki K, Moazzam Z, Lima HA, Schenk A, Limkemann A, Washburn K, Pawlik TM. Quality of ChatGPT Responses to Questions Related To Liver Transplantation. J Gastrointest Surg. 2023;27:1716-1719.
12. Akabane M, Iwadoh K, Melcher ML, Sasaki K. Exploring the potential of ChatGPT in generating unknown clinical questions about liver transplantation: A feasibility study. Liver Transpl. 2024;30:229-234.
13. Rawashdeh B, Kim J, AlRyalat SA, Prasad R, Cooper M. ChatGPT and Artificial Intelligence in Transplantation Research: Is It Always Correct? Cureus. 2023;15:e42150.
14. Hricik D. Multiple choice questions. In: Primer on Transplantation. 3rd ed. Hoboken: Wiley Online Library, 2011.
15. Clavien PA, Trotter JF. Multiple Choice Questions. In: Medical Care of the Liver Transplant Patient. 4th ed. Hoboken: Wiley Online Library, 2012.
16. Transplant Hepatology Board Review Course Practice Module Supplement QUESTIONS 1. [cited 15 October 2024]. Available from: https://6443bb74ef7c532515d0-3858179a21f8875f9590fc888a54448a.ssl.cf2.rackcdn.com/aasld_f27253d3ef5b93f482a4d5b239a79a86.pdf.
17. Aziz F, Parajuli S. Complications in Kidney Transplantation: A Case-Based Guide to Management. Cham: Springer Cham, 2022.
18. MSD Manual Professional Version. [cited 15 October 2024]. Available from: https://www.msdmanuals.com/en-gb/professional.
19. The Transplantation Society. IPTA Question Bank. [cited 15 October 2024]. Available from: https://tts.org/91-uncategorised/ipta/ipta-resources/144-ipta-question-bank.
20. United States Medical Licensing Examination. Step 1 Exam Content | USMLE. [cited 15 October 2024]. Available from: https://www.usmle.org/step-exams/step-1/step-1-exam-content.
21. Alharbi A, Al Turki MS, Aloudah N, Alsaad KO. Incidental Eosinophilic Chromophobe Renal Cell Carcinoma in Renal Allograft. Case Rep Transplant. 2017;2017:4232474.
22. Rubin R. Case Studies. Transplantation. 2007;84:S15-S16.
23. Molina-Andújar A, Montagud-Marrahí E, Cucchiari D, Ventura-Aguiar P, De Sousa-Amorim E, Revuelta I, Cofan F, Solé M, García-Herrera A, Diekmann F, Poch E, Quintana LF. Postinfectious Acute Glomerulonephritis in Renal Transplantation: An Emergent Aetiology of Renal Allograft Loss. Case Rep Transplant. 2019;2019:7438254.
24. Baker S, Popescu M, Akoh JA. Rupture of renal transplant. Case Rep Transplant. 2015;2015:686584.
25. Gewehr P, Jung B, Aquino V, Manfro RC, Spuldaro F, Rosa RG, Goldani LZ. Sporotrichosis in renal transplant patients. Can J Infect Dis Med Microbiol. 2013;24:e47-e49.
26. Vassallo D, Husain MM, Greer S, McGrath S, Ijaz S, Kanigicherla D. Hepatitis E infection in a renal transplant recipient. Case Rep Nephrol. 2014;2014:865471.
27. Olsen SR, Bhutani M. Multiple cavitating nodules in a renal transplant recipient. Can Respir J. 2009;16:195-197.
28. Allam SR, Sankarapandian B, Memon IA, Nef PC, Livingston TS, Rofaiel G. Biopsy Induced Arteriovenous Fistula and Venous Stenosis in a Renal Transplant. Case Rep Nephrol. 2015;2015:313610.
29. Subramanian JB, Siddiqui F, Chotai PN, Al-Adwan Y, Rajab A, Washburn K, Schenk AD, Limkemann AJ, Luttrull M, Al-Ebrahim M, Bumgardner G, Singh N. Spinal Stroke following Kidney Transplant. Case Rep Transplant. 2022;2022:2058600.
30. Ainsworth CD, Crowther MA, Treleaven D, Evanovitch D, Webert KE, Blajchman MA. Severe hemolytic anemia post-renal transplantation produced by donor anti-D passenger lymphocytes: case report and literature review. Transfus Med Rev. 2009;23:155-159.
31. Okeke R, Lok J, Wells R, Wycoff M, Engelhardt A, Bettag J, O'Leary C, Hallcox T, Nazzal M. Catastrophic Antiphospholipid Syndrome after Orthotopic Liver Transplant. Case Rep Transplant. 2022;2022:6209300.
32. Eubank TA, Mobley CM, Moaddab M, Hobeika MJ, O'Neal M, Musick WL, Knight JM, Galati JS, Kodali S, Shetty A, Victor DW 3rd, Saharia A, Ghobrial RM, Grimes KA. Successful Treatment of Invasive Mucormycosis in Orthotopic Liver Transplant Population. Case Rep Transplant. 2021;2021:8667589.
33. Kim E, Adeel A, Bozorgzadeh A, Amano S, Barry CT, Daly JS, Devuni D, Elaba Z, Houk L, Martins PN, Movahedi B, Ramanathan M, Theodoropoulos NM. Treatment of Acute Graft-versus-Host Disease in Liver Transplant Recipients. Case Rep Transplant. 2021;2021:8981429.
34. Ramírez de la Piscina P, Duca I, Estrada S, Calderón R, Ganchegui I, Campos A, Spicakova K, Urtasun L, Salvador M, Delgado E, Bengoa R, García-Campos F. Combined liver and kidney transplant in a patient with Budd-Chiari syndrome secondary to autosomal dominant polycystic kidney disease associated with polycystic liver disease: report of a case with a 9-year follow-up. Case Rep Gastrointest Med. 2014;2014:585291.
35. Arstikyte K, Vitkute G, Traskaite-Juskeviciene V, Macas A. Disseminated intravascular coagulation following air embolism during orthotropic liver transplantation: is this just a coincidence? BMC Anesthesiol. 2021;21:264.
36. Aucejo F, Winans C, Henderson JM, Vogt D, Eghtesad B, Fung JJ, Sands M, Miller CM. Isolated right hepatic vein obstruction after piggyback liver transplantation. Liver Transpl. 2006;12:808-812.
37. Ichimura K, Kawamura N, Goto R, Watanabe M, Ganchiku Y, Shimamura T, Taketomi A. Living Donor Liver Transplantation for Hepatic Venoocclusive Disease/Sinusoidal Obstruction Syndrome Originating from Hematopoietic Stem Cell Transplantation. Case Rep Transplant. 2022;2022:8361769.
38. Trevizoli NC, Obeid EJ, Romeres SGB, Oliveira CAM, Rocha HC, Carvalho-Louro DM, Arantes Ferreira GS, De Campos PB, Ullmann RFB, Figueira AVF, Diaz LGG, Jorge FMF, Caja GON, Bortoli ZB, Watanabe ALC. Liver Transplant and Active Ulcerative Colitis: A Case Report. Transplant Proc. 2022;54:1361-1364.
39. Mankowski MA, Jaffe IS, Xu J, Bae S, Oermann EK, Aphinyanaphongs Y, McAdams-DeMarco MA, Lonze BE, Orandi BJ, Stewart D, Levan M, Massie A, Gentry S, Segev DL. ChatGPT Solving Complex Kidney Transplant Cases: A Comparative Study With Human Respondents. Clin Transplant. 2024;38:e15466.
40. Christou CD, Tsoulfas G. Challenges involved in the application of artificial intelligence in gastroenterology: The race is on! World J Gastroenterol. 2023;29:6168-6178.
41. Price WN 2nd, Gerke S, Cohen IG. Potential Liability for Physicians Using Artificial Intelligence. JAMA. 2019;322:1765-1766.
42. Gilpin LH, Bau D, Yuan BZ, Bajwa A, Specter M, Kagal L. Explaining Explanations: An Overview of Interpretability of Machine Learning. 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA); Turin, Italy. New York: IEEE, 2018: 80-89.
43. Christou CD, Tsoulfas G. Role of three-dimensional printing and artificial intelligence in the management of hepatocellular carcinoma: Challenges and opportunities. World J Gastrointest Oncol. 2022;14:765-793.