Exploring the performance of large language models on hepatitis B infection-related questions: A comparative study

doi:10.3748/wjg.v31.i3.101092

Publication Name: World Journal of Gastroenterology
ISSN: 1007-9327
Publisher: Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA

Basic Study

World J Gastroenterol. Jan 21, 2025; 31(3): 101092
Published online Jan 21, 2025. doi: 10.3748/wjg.v31.i3.101092

Yu Li, Chen-Kai Huang, Yi Hu, Xiao-Dong Zhou, Cong He, Jia-Wei Zhong

Yu Li, Chen-Kai Huang, Yi Hu, Xiao-Dong Zhou, Cong He, Jia-Wei Zhong, Department of Gastroenterology, Jiangxi Provincial Key Laboratory of Digestive Diseases, Jiangxi Clinical Research Center for Gastroenterology, Digestive Disease Hospital, The First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang 330006, Jiangxi Province, China

Yu Li, HuanKui Academy, Nanchang University, Nanchang 330006, Jiangxi Province, China

Co-first authors: Yu Li and Chen-Kai Huang.

Co-corresponding authors: Cong He and Jia-Wei Zhong.

Author contributions: Li Y, Huang CK, Hu Y, and Zhou XD performed the data acquisition and statistical analysis; Li Y and Huang CK contributed equally as co-first authors; Li Y wrote the manuscript; He C and Zhong JW designed the study, revised the manuscript, and contributed equally as co-corresponding authors; and all authors read and approved the final manuscript.

Supported by National Natural Science Foundation of China, No. 82260133; the Key Laboratory Project of Digestive Diseases in Jiangxi Province, No. 2024SSY06101; and Jiangxi Clinical Research Center for Gastroenterology, No. 20223BCG74011.

Institutional review board statement: Institutional review board approval was not required for this study since it is an analysis of data and no patients or animals were affected by the study.

Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.

Data sharing statement: All data generated or analyzed during this study are included in the supplementary information files.

Corresponding author: Cong He, Associate Chief Physician, MD, Department of Gastroenterology, Jiangxi Provincial Key Laboratory of Digestive Diseases, Jiangxi Clinical Research Center for Gastroenterology, Digestive Disease Hospital, The First Affiliated Hospital, Jiangxi Medical College, Nanchang University, No. 17 Yong Waizheng Street, Nanchang 330006, Jiangxi Province, China. hecong.1987@163.com

Received: September 4, 2024
Revised: October 29, 2024
Accepted: December 3, 2024
Published online: January 21, 2025
Processing time: 106 Days and 21.9 Hours

Abstract

BACKGROUND

Patients with hepatitis B virus (HBV) infection require chronic and personalized care to improve outcomes. Large language models (LLMs) can potentially provide medical information for patients.

AIM

To examine the performance of three LLMs, ChatGPT-3.5, ChatGPT-4.0, and Google Gemini, in answering HBV-related questions.

METHODS

The LLMs’ responses to HBV-related questions were independently graded by two medical professionals using a four-point accuracy scale, and disagreements were resolved by a third reviewer. Each question was submitted three times to each of the three LLMs. Readability was assessed via the Gunning Fog index and the Flesch-Kincaid grade level.
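The grading workflow described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code. It assumes that the third reviewer's grade is taken as final whenever the two primary graders disagree, and that per-response grades are simply averaged; the abstract does not specify either detail. The function names `adjudicate` and `mean_accuracy` are hypothetical.

```python
from statistics import mean

VALID_GRADES = {1, 2, 3, 4}  # four-point accuracy scale


def adjudicate(grade_a, grade_b, grade_c=None):
    """Return the final grade for one response.

    Assumption: when the two primary graders disagree, the third
    reviewer's grade is used as the final grade.
    """
    for grade in (grade_a, grade_b):
        if grade not in VALID_GRADES:
            raise ValueError(f"grade {grade!r} is outside the 1-4 scale")
    if grade_a == grade_b:
        return grade_a
    if grade_c not in VALID_GRADES:
        raise ValueError("graders disagree and no valid third grade was given")
    return grade_c


def mean_accuracy(records):
    """Average the adjudicated grades over (grade_a, grade_b, grade_c) tuples."""
    return mean(adjudicate(a, b, c) for a, b, c in records)


# Hypothetical grades for three runs of one question on one model.
runs = [(4, 4, None), (3, 4, 3), (2, 2, None)]
print(mean_accuracy(runs))
```

Each tuple holds the two independent grades and, where needed, the third reviewer's grade; `None` marks responses on which the primary graders agreed.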

RESULTS

Overall, all three LLM chatbots achieved high average accuracy scores for subjective questions (ChatGPT-3.5: 3.50; ChatGPT-4.0: 3.69; Google Gemini: 3.53, out of a maximum score of 4). For objective questions, ChatGPT-4.0 achieved an accuracy rate of 80.8%, compared with 62.9% for ChatGPT-3.5 and 73.1% for Google Gemini. Across the six domains, ChatGPT-4.0 performed better in the diagnosis domain, whereas Google Gemini excelled in the clinical manifestations domain. Notably, in the readability analysis, the mean Gunning Fog index and Flesch-Kincaid grade level scores of the three LLM chatbots were significantly higher than the standard eighth-grade level, far exceeding the reading level of the general population.
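For readers unfamiliar with the two readability indices, the sketch below applies their standard published formulas in Python. It is illustrative only and not the tool used in the study: the syllable counter is a rough vowel-group heuristic, and "complex words" are simplified to any word with three or more syllables, so scores may differ from those of dedicated readability software. The function names are hypothetical.

```python
import re


def count_syllables(word):
    """Rough heuristic: count vowel groups, discounting a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text):
    """Return the Flesch-Kincaid grade level and Gunning Fog index of a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:[-'][A-Za-z]+)*", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = [count_syllables(w) for w in words]
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(syllables) / len(words)
    # Simplification: every word with 3+ syllables counts as complex.
    complex_share = sum(1 for s in syllables if s >= 3) / len(words)

    return {
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
        "gunning_fog": 0.4 * (words_per_sentence + 100 * complex_share),
    }


scores = readability("Hepatitis B is a viral infection. It affects the liver.")
print(scores)
```

Both indices approximate the years of schooling needed to understand a text, so a score above 8 indicates material written beyond an eighth-grade reading level.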

CONCLUSION

Our results highlight the potential of LLMs, especially ChatGPT-4.0, for delivering responses to HBV-related questions. LLMs may be an adjunctive informational tool for patients and physicians to improve outcomes. Nevertheless, current LLMs should not replace personalized treatment recommendations from physicians in the management of HBV infection.

Keywords: ChatGPT-3.5; ChatGPT-4.0; Google Gemini; Hepatitis B infection; Accuracy

Core Tip: Hepatitis B virus (HBV) infection remains a global health problem that may cause chronic hepatitis, liver cirrhosis, or hepatocellular carcinoma. There is a notable trend among the public to seek HBV-related information to improve outcomes. Large language models, a form of artificial intelligence, can provide up-to-date and helpful knowledge. Since ChatGPT was developed by OpenAI, an increasing number of studies have explored its utility in responding to medical questions. This study evaluates and compares the abilities of OpenAI’s ChatGPT and Google’s Gemini in answering test questions concerning HBV using both subjective and objective metrics.