Biases of large language models in diagnosing Cushing’s syndrome

doi:10.5662/wjm.v16.i2.115059

Publication name: World Journal of Methodology
ISSN: 2222-0682
Publisher: Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA

Minireviews

Copyright: ©Author(s) 2026. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial (CC BY-NC 4.0) license. No commercial re-use. See permissions. Published by Baishideng Publishing Group Inc.

World J Methodol. Jun 20, 2026; 16(2): 115059
Published online Jun 20, 2026. doi: 10.5662/wjm.v16.i2.115059

Table 1 Selected biases in the diagnosis of Cushing’s syndrome and their corresponding mitigation strategies

Type of bias	Mechanism of action in CS context	Clinical impact/example	Ref.	Potential mitigation strategies
Anchoring/intrinsic cognitive bias	Overreliance on early, salient information (heuristics) that suggests a common diagnosis (e.g., obesity or metabolic syndrome). LLMs can be misled by case-intrinsic biasing information (SDFs)	Delayed diagnosis of true CS because symptoms are anchored to common benign conditions (e.g., “just obesity” or “pseudo-Cushing”). LLM accuracy declines when distracting features are present	[14,15,24]	Use LLM self-reflection or sequential prompting frameworks to challenge initial impressions and improve accuracy[14,24]
Spectrum bias/effect	Training data derived from highly specialized referral centers skew the spectrum toward severe or advanced cases, so performance is overestimated relative to general practice populations	Diagnostic algorithms report inflated accuracy metrics that are not reproduced when they are applied in diverse community settings, where presentation overlaps heavily with pseudo-Cushing states	[4,5,23]	Require inclusion of representative cohorts across the full clinical spectrum and report results through subgroup analysis based on disease severity[23]
Exclusion/demographic bias	Excluding demographic factors (e.g., gender) that appear statistically irrelevant during model optimization ignores their clinical relevance and their association with diagnostic delays in real-world practice	An ML model for CS diagnosis excluded sex because of its low statistical association in the training dataset[14], potentially failing to perform optimally for female subgroups, who already face provider bias/stigma[30]	[4,11,30]	Employ mathematical modeling or stratification to control for demographic confounders[37]. Use adversarial debiasing or reweighting techniques to promote equitable treatment across demographic groups[18,41]
Measurement bias (methodological)	Variability in laboratory methods (e.g., immunoassays vs LC-MS/MS for cortisol) across the centers that supply training data compromises model transferability and predictive stability	A model developed using non-standardized immunoassay data from a single center[4] may perform poorly when used in a clinic relying on mass spectrometry, because results from the two methods are not standardized	[4,23]	Demand transparency regarding data acquisition protocols and the device/software versions used (STARD-AI items 13 and 14)[41]. Ensure dataset diversity by drawing on multiple centers with stringent protocols
Small sample size/class imbalance	Limited samples for rare subtypes (e.g., EAS) reduce model robustness and reproducibility. Reliance on simple oversampling (SMOTE) may bias accuracy estimates	The differential diagnosis model for ACTH-dependent CS included only 26 EAS patients, limiting robustness and generalizability[8]	[4,8]	Use collaborative learning techniques across multiple centers to pool data while maintaining privacy and security[8]. Conduct multi-center, collaborative trials to achieve larger, more diverse sample sizes[4,37]

CS: Cushing’s syndrome; LLM: Large language model; SDF: Salient distracting features; LC-MS/MS: Liquid chromatography-tandem mass spectrometry; EAS: Ectopic adrenocorticotropic hormone secretion; SMOTE: Synthetic minority oversampling technique; ML: Machine learning; ACTH: Adrenocorticotropic hormone; STARD: Standards for Reporting Diagnostic Accuracy Studies; AI: Artificial intelligence.
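The subgroup reporting recommended for spectrum bias can be illustrated with a minimal Python sketch that reports sensitivity separately for each severity stratum alongside the pooled value. The record structure (`Case`, with fields `severity`, `has_cs` and `predicted_cs`), the stratum labels and the counts in the usage example are hypothetical and are not drawn from the cited studies; they only show how a pooled estimate can mask poor detection of milder cases.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Case(NamedTuple):
    """One patient record (hypothetical fields, for illustration only)."""
    severity: str       # illustrative stratum label, e.g. "mild" or "overt"
    has_cs: bool        # reference-standard diagnosis of Cushing's syndrome
    predicted_cs: bool  # output of the classifier or LLM under evaluation


def sensitivity_report(cases: Iterable[Case]) -> Dict[str, Tuple[int, int, float]]:
    """Return (detected, total, sensitivity) per severity stratum plus a pooled row."""
    detected: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for case in cases:
        if not case.has_cs:
            continue  # sensitivity is defined over true CS cases only
        total[case.severity] += 1
        detected[case.severity] += int(case.predicted_cs)

    pooled_total = sum(total.values())
    if pooled_total == 0:
        raise ValueError("no reference-standard CS cases to evaluate")

    # Keeping the raw counts visible shows how small each subgroup is.
    report = {s: (detected[s], n, detected[s] / n) for s, n in total.items()}
    pooled_detected = sum(detected.values())
    report["pooled"] = (pooled_detected, pooled_total, pooled_detected / pooled_total)
    return report


if __name__ == "__main__":
    # Hypothetical cohort: every overt case is detected, most mild cases are missed.
    cases = (
        [Case("overt", True, True)] * 8
        + [Case("mild", True, True)]
        + [Case("mild", True, False)] * 3
    )
    for stratum, (hit, n, sens) in sensitivity_report(cases).items():
        print(f"{stratum}: {hit}/{n} = {sens:.2f}")
    # overt: 8/8 = 1.00
    # mild: 1/4 = 0.25
    # pooled: 9/12 = 0.75
```

In this toy example a pooled sensitivity of 0.75 looks acceptable, while the stratified rows reveal that only one of four mild cases was detected.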
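Reweighting, one of the mitigation strategies listed for exclusion/demographic bias, can be sketched in Python under the assumption that the reweighing scheme of Kamiran and Calders is used: each (group, label) combination receives the ratio of its expected frequency under independence to its observed frequency, so that group membership and outcome become independent in the weighted training data. The function name `reweighing_weights` and the toy data are illustrative assumptions and are not necessarily the method used in the cited studies.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def reweighing_weights(groups: Sequence[Hashable], labels: Sequence[Hashable]) -> List[float]:
    """Per-sample weights w(g, y) = n_g * n_y / (n * n_gy).

    This equals P(g) * P(y) / P(g, y): under-represented (group, label)
    combinations are up-weighted and over-represented ones down-weighted.
    """
    if not groups or len(groups) != len(labels):
        raise ValueError("groups and labels must be non-empty and of equal length")
    n = len(groups)
    n_g = Counter(groups)                 # samples per demographic group
    n_y = Counter(labels)                 # samples per outcome label
    n_gy = Counter(zip(groups, labels))   # samples per (group, label) cell
    return [n_g[g] * n_y[y] / (n * n_gy[(g, y)]) for g, y in zip(groups, labels)]


if __name__ == "__main__":
    # Hypothetical toy cohort: positives are rarer in group "F" than in group "M".
    groups = ["F"] * 4 + ["M"] * 4
    labels = [True, False, False, False, True, True, True, False]
    weights = reweighing_weights(groups, labels)
    print([round(w, 2) for w in weights])
    # [2.0, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 2.0]
```

The resulting weights could then be supplied to any learner that accepts per-sample weights, such as the `sample_weight` argument of many scikit-learn estimators' `fit` methods; the total weight equals the number of samples, so the overall scale of the loss is unchanged.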
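The table does not specify which collaborative learning technique was used for multi-center data; federated averaging is assumed here purely for illustration. In that scheme each center trains a model locally and shares only its parameters, which a coordinator combines as a sample-size-weighted mean. The minimal Python sketch below shows only this aggregation step; the function name `federated_average` and the example values are hypothetical, and local training, repeated communication rounds and any additional privacy safeguards are omitted.

```python
from typing import List, Sequence


def federated_average(center_params: Sequence[Sequence[float]],
                      center_sizes: Sequence[int]) -> List[float]:
    """Combine per-center parameter vectors, weighting each by its local sample size.

    Only parameter vectors are passed in; patient-level records never leave a center.
    """
    if not center_params or len(center_params) != len(center_sizes):
        raise ValueError("need one sample size per center and at least one center")
    if any(size <= 0 for size in center_sizes):
        raise ValueError("sample sizes must be positive")
    dim = len(center_params[0])
    if any(len(params) != dim for params in center_params):
        raise ValueError("all centers must share the same parameter dimension")

    total = sum(center_sizes)
    return [
        sum(params[i] * size for params, size in zip(center_params, center_sizes)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Two hypothetical centers contributing 10 and 30 patients, respectively.
    print(federated_average([[1.0, 2.0], [3.0, 6.0]], [10, 30]))  # [2.5, 5.0]
```

Weighting by sample size lets a large center contribute proportionally more, while a small center holding rare subtypes such as EAS can still add information without transferring its records.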

Citation: Savvidis C, Liakopoulos C, Ilias I. Biases of large language models in diagnosing Cushing’s syndrome. World J Methodol 2026; 16(2): 115059
URL: https://www.wjgnet.com/2222-0682/full/v16/i2/115059.htm
DOI: https://dx.doi.org/10.5662/wjm.v16.i2.115059

Savvidis C, Liakopoulos C, Ilias I. Biases of large language models in diagnosing Cushing’s syndrome. World J Methodol 2026; 16(2): 115059 [PMID: 42058814 DOI: 10.5662/wjm.v16.i2.115059]
