Minireviews
Copyright: ©Author(s) 2026.
World J Methodol. Jun 20, 2026; 16(2): 115059
Published online Jun 20, 2026. doi: 10.5662/wjm.v16.i2.115059
Table 1 Selected biases in the diagnosis of Cushing’s syndrome and their corresponding mitigation strategies
| Type of bias | Mechanism of action in CS context | Clinical impact/example | Ref. | Potential mitigation strategies |
| --- | --- | --- | --- | --- |
| Anchoring/intrinsic cognitive bias | Overreliance on early, salient information (heuristics) that suggests a common diagnosis (e.g., obesity or metabolic syndrome). LLMs are misled by case-intrinsic biasing information (SDFs) | Delayed diagnosis of true CS because symptoms are anchored to common benign conditions (e.g., “just obesity” or “pseudo-Cushing”). LLM accuracy declines when distracting features are present | [14,15,24] | Use LLM self-reflection or sequential prompting frameworks to challenge initial impressions and improve accuracy[14,24] |
| Spectrum bias/effect | Training data derived from highly specialized referral centers skew the spectrum toward severe or advanced cases. Performance is overestimated relative to general practice populations | Diagnostic algorithms report inflated accuracy metrics that may not hold when applied in diverse community settings, where presentation overlaps heavily with pseudo-Cushing states | [4,5,23] | Require inclusion of representative cohorts across the full clinical spectrum and report results via subgroup analysis based on disease severity[23] |
| Exclusion/demographic bias | Excluding demographic factors (e.g., gender) that appear statistically irrelevant during model optimization ignores their clinical relevance and their association with diagnostic delays in real-world practice | An ML model for CS diagnosis excluded sex due to low statistical association in the training dataset[14], potentially failing to perform optimally for female subgroups who already face provider bias/stigma[30] | [4,11,30] | Employ mathematical modeling or stratification to control for demographic confounders[37]. Use adversarial debiasing or reweighting techniques to ensure equitable treatment across demographic groups[18,41] |
| Measurement bias (methodological) | Variability in laboratory methods (e.g., immunoassays vs LC-MS/MS for cortisol) across training centers compromises model transferability and predictive stability | A model developed using non-standardized immunoassay data from a single center[4] may perform poorly when used in a clinic relying on mass spectrometry, as results are not standardized | [4,23] | Demand transparency regarding data acquisition protocols and device/software versions used (STARD-AI items 13 and 14)[41]. Ensure dataset diversity from multiple centers with stringent protocols |
| Small sample size/class imbalance | Relying on limited samples for rare subtypes (e.g., EAS) affects model robustness and reproducibility. Reliance on simple oversampling (SMOTE) may bias accuracy | The differential diagnosis model for ACTH-dependent CS included only 26 EAS patients, limiting robustness and generalizability[8] | [4,8] | Use collaborative learning techniques across multiple centers to pool data while maintaining privacy and security[8]. Conduct multi-center, collaborative trials to achieve larger, more diverse sample sizes[4,37] |
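
Two of the mitigation strategies listed in Table 1, reporting results by disease-severity subgroup and reweighting across demographic groups, can be illustrated with a minimal Python sketch. This is not the method of any cited study. The function names (`subgroup_metrics`, `inverse_frequency_weights`), the severity labels, and the toy records are all hypothetical, and inverse-frequency weighting is only one simple form of reweighting, chosen here for illustration.

```python
from collections import Counter, defaultdict


def subgroup_metrics(records):
    """Return sensitivity and specificity for each subgroup.

    records: iterable of (subgroup, true_label, predicted_label),
    where a label of 1 means CS and 0 means no CS (assumed encoding).
    """
    counts = defaultdict(Counter)
    for group, truth, pred in records:
        if truth == 1:
            counts[group]["tp" if pred == 1 else "fn"] += 1
        else:
            counts[group]["tn" if pred == 0 else "fp"] += 1

    results = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        results[group] = {
            "n": positives + negatives,
            # None marks a metric that is undefined for this subgroup.
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
        }
    return results


def inverse_frequency_weights(groups):
    """Return one weight per sample: N / (K * n_g).

    N is the number of samples, K the number of distinct groups, and
    n_g the size of the sample's group, so every group carries the
    same total weight.
    """
    freq = Counter(groups)
    n_samples, n_groups = len(groups), len(freq)
    return [n_samples / (n_groups * freq[g]) for g in groups]


# Hypothetical toy data: (severity subgroup, true label, predicted label).
toy_records = [
    ("overt", 1, 1), ("overt", 1, 1), ("overt", 0, 0), ("overt", 0, 0),
    ("mild", 1, 1), ("mild", 1, 0), ("mild", 0, 0), ("mild", 0, 1),
]

if __name__ == "__main__":
    for group, metrics in subgroup_metrics(toy_records).items():
        print(group, metrics)
    print(inverse_frequency_weights(["F", "F", "F", "M"]))
```

Reporting each subgroup separately makes a performance drop in mild or overlapping presentations visible instead of averaging it away in a pooled accuracy figure. The weights could be passed to any training routine that accepts per-sample weights.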