Observational Study
Copyright © The Author(s) 2024.
World J Methodol. Dec 20, 2024; 14(4): 92802
Published online Dec 20, 2024. doi: 10.5662/wjm.v14.i4.92802
Table 1 Total count and percentage of 'Yes' responses for each Large Language Model
System     Neutral    No     Yes, n (%)     Total
GPT 4      1          74     387 (83.77)    462
GPT 3.5    5          101    356 (77.06)    462
Bard       129        80     253 (54.76)    462
Table 2 Weighted accuracy comparison across the Large Language Models
Model          Weighted accuracy
ChatGPT 4      0.6775
ChatGPT 3.5    0.5519
Bard           0.3745
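The tables do not state how the weighted accuracy is computed. The short Python sketch below assumes a scoring of +1 for 'Yes', 0 for 'Neutral', and -1 for 'No', averaged over all 462 responses. This weighting is an assumption, but it reproduces the Table 2 values from the Table 1 counts. The names `counts`, `WEIGHTS`, and `summarize` are illustrative only.

```python
# Response counts per model as (neutral, no, yes), taken from Table 1.
counts = {
    "ChatGPT 4": (1, 74, 387),
    "ChatGPT 3.5": (5, 101, 356),
    "Bard": (129, 80, 253),
}

# Assumed scoring, not stated in the tables: Yes = +1, Neutral = 0, No = -1.
WEIGHTS = {"neutral": 0, "no": -1, "yes": 1}


def summarize(neutral, no, yes):
    """Return (total, percentage of 'Yes', weighted accuracy) for one model."""
    total = neutral + no + yes
    yes_pct = 100 * yes / total
    score = (
        WEIGHTS["neutral"] * neutral
        + WEIGHTS["no"] * no
        + WEIGHTS["yes"] * yes
    )
    return total, round(yes_pct, 2), round(score / total, 4)


for model, (neutral, no, yes) in counts.items():
    total, yes_pct, weighted = summarize(neutral, no, yes)
    print(f"{model}: total={total}, yes={yes_pct}%, weighted accuracy={weighted}")
```

Under this assumed weighting, a model that answers 'No' is penalized more heavily than one that stays neutral, which would explain why Bard's weighted accuracy is not lower still despite its large share of neutral responses.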
Table 3 Disease accuracy comparison
Name of disease                                    ChatGPT 4    ChatGPT 3.5    Bard
Acromegaly                                         1.0          1.0            1.0
Orthostatic hypotension                            1.0          1.0            0.0
Myasthenia gravis                                  1.0          1.0            0.5
Myoclonus                                          1.0          1.0            1.0
Myotonic dystrophy                                 1.0          -1.0           1.0
Neonatal onset multisystem inflammatory disease    1.0          1.0            1.0
Neoplastic spinal cord compression                 1.0          1.0            1.0
Nephrolithiasis                                    1.0          1.0            1.0
Neurological infections                            1.0          1.0            0.0
Neuromyelitis optica                               1.0          0.0            1.0
Thiamine deficiency                                -1.0         1.0            1.0
Anaphylactic reaction                              -1.0         -1.0           0.0
Reactive arthritis                                 -1.0         -1.0           0.0
Fibrous dysplasia                                  -1.0         -1.0           1.0
Hypothyroidism                                     -1.0         -1.0           -1.0
Multiple sclerosis                                 -1.0         -1.0           -1.0
Hypophosphatemia                                   -1.0         -1.0           1.0
Hypomagnesemia                                     -1.0         -1.0           0.0
Alcohol intoxication                               -1.0         -1.0           0.0
Post-concussive state                              -1.0         -1.0           0.0
Table 4 Detailed accuracy values for each organ system across the three Large Language Models
Organ system                                 ChatGPT 4    ChatGPT 3.5    Bard
Cardiovascular system, respiratory system    1.0000       0.6667         0.6667
Hematology                                   1.0000       1.0000         -1.0000
Respiratory                                  1.0000       0.3333         0.3333
Respiratory system                           1.0000       1.0000         0.5000
Infectious diseases                          0.8039       0.7451         0.2059
Immune system                                0.6752       0.4188         0.2650
Central nervous system                       0.6585       0.6220         0.5610
Hematological malignancies                   0.6429       0.5714         0.4286
Cardiovascular system                        0.6000       0.6667         0.3333
Endocrine system                             0.5556       0.4444         0.5714
Renal                                        0.5556       0.3704         0.5185
Gastrointestinal tract                       0.5385       0.2308         0.2308