Ramasubramanian S, Balaji S, Kannan T, Jeyaraman N, Sharma S, Migliorini F, Balasubramaniam S, Jeyaraman M. Comparative evaluation of artificial intelligence systems' accuracy in providing medical drug dosages: A methodological study. World J Methodol 2024; 14(4): 92802 [PMID: 39712564 DOI: 10.5662/wjm.v14.i4.92802]
Corresponding Author of This Article
Madhan Jeyaraman, MS, PhD, Assistant Professor, Research Associate, Department of Orthopaedics, ACS Medical College and Hospital, Dr MGR Educational and Research Institute, Velappanchavadi, Chennai 600077, Tamil Nadu, India. madhanjeyaraman@gmail.com
Research Domain of This Article
Computer Science, Artificial Intelligence
Article-Type of This Article
Observational Study
Open-Access Policy of This Article
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
World J Methodol. Dec 20, 2024; 14(4): 92802 Published online Dec 20, 2024. doi: 10.5662/wjm.v14.i4.92802
Table 1 Total count and percentage of 'Yes' responses for each Large Language Model

| System | Neutral, n | No, n | Yes, n (%) | Total, n |
| --- | --- | --- | --- | --- |
| GPT 4 | 1 | 74 | 387 (83.77) | 462 |
| GPT 3.5 | 5 | 101 | 356 (77.06) | 462 |
| Bard | 129 | 80 | 253 (54.76) | 462 |
Table 2 Weighted accuracy comparison across the large language models

| Model | Weighted accuracy |
| --- | --- |
| ChatGPT 4 | 0.6775 |
| ChatGPT 3.5 | 0.5519 |
| Bard | 0.3745 |
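The weighted accuracies in Table 2 can be reproduced from the response counts in Table 1 if each 'Yes' is scored +1, each 'Neutral' 0, and each 'No' -1, and the scores are averaged over all 462 responses. This scoring scheme is an assumption inferred from the numbers, not a formula stated in this section; the function names below are illustrative only. A minimal Python sketch:

```python
# Response counts copied from Table 1, as (neutral, no, yes) per model.
COUNTS = {
    "ChatGPT 4": (1, 74, 387),
    "ChatGPT 3.5": (5, 101, 356),
    "Bard": (129, 80, 253),
}


def percent_yes(neutral, no, yes):
    """Share of 'Yes' responses, in percent (the bracketed values in Table 1)."""
    return 100 * yes / (neutral + no + yes)


def weighted_accuracy(neutral, no, yes):
    """Mean score under the ASSUMED weights Yes = +1, Neutral = 0, No = -1."""
    total = neutral + no + yes
    return (yes - no) / total


for model, counts in COUNTS.items():
    print(
        f"{model}: {percent_yes(*counts):.2f}% yes, "
        f"weighted accuracy {weighted_accuracy(*counts):.4f}"
    )
```

Under this assumption a 'Neutral' response neither raises nor lowers the score, which would explain why Bard's 129 neutral responses pull its weighted accuracy well below its share of 'Yes' responses.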
Table 3 Disease accuracy comparison

| Name of disease | ChatGPT 4 | ChatGPT 3.5 | Bard |
| --- | --- | --- | --- |
| Acromegaly | 1.0 | 1.0 | 1.0 |
| Orthostatic hypotension | 1.0 | 1.0 | 0.0 |
| Myasthenia gravis | 1.0 | 1.0 | 0.5 |
| Myoclonus | 1.0 | 1.0 | 1.0 |
| Myotonic dystrophy | 1.0 | -1.0 | 1.0 |
| Neonatal onset multisystem inflammatory disease | 1.0 | 1.0 | 1.0 |
| Neoplastic spinal cord compression | 1.0 | 1.0 | 1.0 |
| Nephrolithiasis | 1.0 | 1.0 | 1.0 |
| Neurological infections | 1.0 | 1.0 | 0.0 |
| Neuromyelitis optica | 1.0 | 0.0 | 1.0 |
| Thiamine deficiency | -1.0 | 1.0 | 1.0 |
| Anaphylactic reaction | -1.0 | -1.0 | 0.0 |
| Reactive arthritis | -1.0 | -1.0 | 0.0 |
| Fibrous dysplasia | -1.0 | -1.0 | 1.0 |
| Hypothyroidism | -1.0 | -1.0 | -1.0 |
| Multiple sclerosis | -1.0 | -1.0 | -1.0 |
| Hypophosphatemia | -1.0 | -1.0 | 1.0 |
| Hypomagnesemia | -1.0 | -1.0 | 0.0 |
| Alcohol intoxication | -1.0 | -1.0 | 0.0 |
| Post-concussive state | -1.0 | -1.0 | 0.0 |
Table 4 Detailed accuracy values for each organ system across the three large language models

| Organ system | ChatGPT 4 | ChatGPT 3.5 | Bard |
| --- | --- | --- | --- |
| Cardiovascular system, respiratory system | 1.0000 | 0.6667 | 0.6667 |
| Hematology | 1.0000 | 1.0000 | -1.0000 |
| Respiratory | 1.0000 | 0.3333 | 0.3333 |
| Respiratory system | 1.0000 | 1.0000 | 0.5000 |
| Infectious diseases | 0.8039 | 0.7451 | 0.2059 |
| Immune system | 0.6752 | 0.4188 | 0.2650 |
| Central nervous system | 0.6585 | 0.6220 | 0.5610 |
| Hematological malignancies | 0.6429 | 0.5714 | 0.4286 |
| Cardiovascular system | 0.6000 | 0.6667 | 0.3333 |
| Endocrine system | 0.5556 | 0.4444 | 0.5714 |
| Renal | 0.5556 | 0.3704 | 0.5185 |
| Gastrointestinal tract | 0.5385 | 0.2308 | 0.2308 |