Copyright: ©Author(s) 2026.
World J Methodol. Jun 20, 2026; 16(2): 115059
Published online Jun 20, 2026. doi: 10.5662/wjm.v16.i2.115059
Table 1 Selected biases in the diagnosis of Cushing’s syndrome and their corresponding mitigation strategies
| Type of bias | Mechanism of action in CS context | Clinical impact/example | Ref. | Potential mitigation strategies |
| Anchoring/intrinsic cognitive bias | Overreliance on early, salient information (heuristics) that suggests a common diagnosis (e.g., obesity or metabolic syndrome). LLMs are misled by case-intrinsic biasing information (SDFs) | Delayed diagnosis of true CS because symptoms are anchored to common benign conditions (e.g., “just obesity” or “pseudo-Cushing”). LLM accuracy declines when distracting features are present | [14,15,24] | Utilize LLM self-reflection or sequential prompting frameworks to challenge initial impressions and improve accuracy[14,24] |
| Spectrum bias/effect | Training data derived from highly specialized referral centers skew the spectrum toward severe or advanced cases, so performance is overestimated relative to general practice populations | Accuracy metrics reported for diagnostic algorithms are inflated relative to their performance in diverse community settings, where presentation overlaps heavily with pseudo-Cushing states | [4,5,23] | Require inclusion of representative cohorts across the full clinical spectrum and report results via subgroup analysis based on disease severity[23] |
| Exclusion/demographic bias | Exclusion of demographic factors (e.g., gender), which may be statistically irrelevant in model optimization, ignores their clinical relevance and association with diagnostic delays in real-world practice | An ML model for CS diagnosis excluded sex due to low statistical association in the training dataset[14], potentially failing to perform optimally for female subgroups who already face provider bias/stigma[30] | [4,11,30] | Employ mathematical modeling or stratification to control for demographic confounders[37]. Use adversarial debiasing or reweighting techniques to ensure equitable treatment across demographic groups[18,41] |
| Measurement bias (methodological) | Variability in the laboratory methods used across different training centers (e.g., immunoassays vs LC-MS/MS for cortisol) compromises model transferability and predictive stability | A model developed using non-standardized immunoassay data from a single center[4] may perform poorly when used in a clinic relying on mass spectrometry, as results are not standardized | [4,23] | Demand transparency regarding data acquisition protocols and device/software versions used (STARD-AI items 13 and 14)[41]. Ensure dataset diversity from multiple centers with stringent protocols |
| Small sample size/class imbalance | Relying on limited samples for rare subtypes (e.g., EAS) affects model robustness and reproducibility. Reliance on oversampling alone (e.g., SMOTE) may bias accuracy estimates | The differential diagnosis model for ACTH-dependent CS included only 26 EAS patients, limiting robustness and generalizability[8] | [4,8] | Use collaborative learning techniques across multiple centers to pool data while maintaining privacy and security[8]. Conduct multi-center, collaborative trials to achieve larger, more diverse sample sizes[4,37] |
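
The subgroup analysis suggested in the table as a mitigation for spectrum bias can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the cited studies: the function name `sensitivity_by_subgroup`, the severity labels "overt" and "mild", and the toy counts are all hypothetical. It shows only how a pooled sensitivity can mask weaker performance in a milder-disease subgroup.

```python
from collections import defaultdict


def sensitivity_by_subgroup(records):
    """Return sensitivity (TP / (TP + FN)) per subgroup.

    records: iterable of (subgroup, has_disease, predicted_positive) tuples.
    Only patients who truly have the disease contribute to sensitivity.
    """
    true_pos = defaultdict(int)
    false_neg = defaultdict(int)
    for subgroup, has_disease, predicted_positive in records:
        if not has_disease:
            continue
        if predicted_positive:
            true_pos[subgroup] += 1
        else:
            false_neg[subgroup] += 1
    groups = set(true_pos) | set(false_neg)
    return {g: true_pos[g] / (true_pos[g] + false_neg[g]) for g in groups}


# Hypothetical toy cohort: 10 overt cases and 5 mild cases, all with true CS.
records = (
    [("overt", True, True)] * 9
    + [("overt", True, False)] * 1
    + [("mild", True, True)] * 3
    + [("mild", True, False)] * 2
)

per_group = sensitivity_by_subgroup(records)

# Pooled sensitivity over all diseased patients, ignoring severity.
diseased = [r for r in records if r[1]]
pooled = sum(1 for r in diseased if r[2]) / len(diseased)

print("pooled:", round(pooled, 2))
for group in sorted(per_group):
    print(group, round(per_group[group], 2))
```

In this toy cohort the pooled figure is dominated by the overt cases, which is the pattern the table describes when training data come mainly from referral centers. Reporting the per-subgroup values alongside the pooled one makes the gap visible.
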
- Citation: Savvidis C, Liakopoulos C, Ilias I. Biases of large language models in diagnosing Cushing’s syndrome. World J Methodol 2026; 16(2): 115059
- URL: https://www.wjgnet.com/2222-0682/full/v16/i2/115059.htm
- DOI: https://dx.doi.org/10.5662/wjm.v16.i2.115059
