Javid K, Driessche A, Clymer C, Abbas MJ, Pantuso A, Maier LM, Hoegler J, Hakeos WM, Guthrie ST. OpenEvidence performs at similar levels compared to current and previous GPT models on orthopedic training and education questions. World J Orthop 2026; 17(6): 118593 [DOI: 10.5312/wjo.v17.i6.118593]
Corresponding Author of This Article
Kashif Javid, Department of Orthopaedic Surgery, Henry Ford Health System, 2799 W. Grand Blvd, Detroit, MI 48202, United States. kjavid1@hfhs.org
Research Domain of This Article
Orthopedics
Article-Type of This Article
research-article
Open-Access Policy of This Article
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA
World J Orthop. Jun 18, 2026; 17(6): 118593 Published online Jun 18, 2026. doi: 10.5312/wjo.v17.i6.118593
OpenEvidence performs at similar levels compared to current and previous GPT models on orthopedic training and education questions
Kashif Javid, Alexander Driessche, Colton Clymer, Muhammad J Abbas, Annamarie Pantuso, Lindsay M Maier, Joseph Hoegler, William M Hakeos, Stuart T Guthrie
Kashif Javid, Alexander Driessche, Colton Clymer, Muhammad J Abbas, Annamarie Pantuso, Lindsay M Maier, Joseph Hoegler, William M Hakeos, Stuart T Guthrie, Department of Orthopaedic Surgery, Henry Ford Health System, Detroit, MI 48202, United States
Author contributions: All authors contributed to the study conception and design. Javid K and Driessche A contributed to material preparation, data collection, and analysis; Javid K, Driessche A, and Clymer C wrote the first draft of the manuscript; Abbas MJ, Pantuso A, Maier LM, Hoegler J, Hakeos WM, and Guthrie ST contributed to revisions; all authors commented on previous versions of the manuscript and read and approved the final manuscript.
Institutional review board statement: This study does not involve human or animal experiments and thus does not require an ethical document.
Informed consent statement: This study does not require an informed consent form.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
STROBE statement: The authors have read the STROBE Statement-checklist of items, and the manuscript was prepared and revised according to the STROBE Statement-checklist of items.
Data sharing statement: Not applicable.
Corresponding author: Kashif Javid, Department of Orthopaedic Surgery, Henry Ford Health System, 2799 W. Grand Blvd, Detroit, MI 48202, United States. kjavid1@hfhs.org
Received: January 7, 2026 Revised: February 6, 2026 Accepted: March 30, 2026 Published online: June 18, 2026 Processing time: 162 Days and 3.4 Hours
Core Tip
Core Tip: We evaluated the performance of contemporary large language models on orthopedic board-style questions, comparing ChatGPT-5 and OpenEvidence (OE) with the established GPT-4. Using a standardized orthopedic training exam question set, we found that ChatGPT-5 achieved the highest overall accuracy and consistently outperformed prior models across subspecialties and question formats. OE performed comparably to GPT-4 across multiple fields. All models demonstrated reduced accuracy on image-based questions, highlighting persistent limitations in visual interpretation. We assert that OE is a reputable addition to the tools available to orthopedists, and its training on peer-reviewed literature adds to its potential value.