OpenEvidence performs at similar levels compared to current and previous GPT models on orthopedic training and education questions

doi:10.5312/wjo.v17.i6.118593

Publication Name: World Journal of Orthopedics
ISSN: 2218-5836
Publisher: Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA

Observational Study

Copyright: ©Author(s) 2026. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial (CC BY-NC 4.0) license. No commercial re-use. See permissions. Published by Baishideng Publishing Group Inc.

World J Orthop. Jun 18, 2026; 17(6): 118593
Published online Jun 18, 2026. doi: 10.5312/wjo.v17.i6.118593

Kashif Javid, Alexander Driessche, Colton Clymer, Muhammad J Abbas, Annamarie Pantuso, Lindsay M Maier, Joseph Hoegler, William M Hakeos, Stuart T Guthrie

Kashif Javid, Alexander Driessche, Colton Clymer, Muhammad J Abbas, Annamarie Pantuso, Lindsay M Maier, Joseph Hoegler, William M Hakeos, Stuart T Guthrie, Department of Orthopaedic Surgery, Henry Ford Health System, Detroit, MI 48202, United States

Author contributions: All authors contributed to the study conception and design. Javid K and Driessche A contributed to material preparation, data collection and analysis; Javid K, Driessche A, and Clymer C wrote the first draft of the manuscript; Abbas M, Pantuso A, Maier LM, Hoegler J, Hakeos WM, and Guthrie ST contributed to revisions; all authors commented on previous versions of the manuscript, read and approved the final manuscript.

Institutional review board statement: This study does not involve human or animal experiments and thus does not require an ethical document.

Informed consent statement: This study does not require an informed consent form.

Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.

STROBE statement: The authors have read the STROBE Statement checklist of items, and the manuscript was prepared and revised according to the STROBE Statement checklist of items.

Data sharing statement: Not applicable.

Corresponding author: Kashif Javid, Department of Orthopaedic Surgery, Henry Ford Health System, 2799 W. Grand Blvd, Detroit, MI 48202, United States. kjavid1@hfhs.org

Received: January 7, 2026
Revised: February 6, 2026
Accepted: March 30, 2026
Published online: June 18, 2026
Processing time: 162 Days and 3.4 Hours

Core Tip: We evaluated the performance of contemporary large language models on orthopedic board-style questions, comparing ChatGPT-5 and OpenEvidence (OE) with the established GPT-4. Using a standardized orthopedic training exam question set, we found that ChatGPT-5 achieved the highest overall accuracy and consistently outperformed prior models across subspecialties and question formats. OE performed comparably to GPT-4 across multiple fields. All models were less accurate on image-based questions, highlighting persistent limitations in visual interpretation. We assert that OE is a reputable addition to the tools available to orthopedists, and that its training on peer-reviewed literature adds to its potential value.