Review
Copyright ©The Author(s) 2025.
World J Radiol. Nov 28, 2025; 17(11): 114754
Published online Nov 28, 2025. doi: 10.4329/wjr.v17.i11.114754
Table 1 Methodological comparison of standard large language models and large concept models
Feature | LLMs | LCMs
Level of abstraction | Token-level prediction (word/sub-word) | Concept-level prediction (sentence/idea)
Input representation | Processes individual tokens; language-specific | Uses sentence embeddings; language-agnostic
Reasoning and planning | Focuses on local predictions; lacks structured reasoning | Explicitly models hierarchical reasoning and structured planning
Zero-shot generalization | Requires fine-tuning for new tasks/languages | Strong zero-shot generalization across languages and modalities
Architectural modularity | Monolithic transformer; hard to modify | Modular design; allows easy extension and updates
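The first two rows of Table 1 can be made concrete with a minimal toy sketch (not from the article; the corpus, vocabulary, and two-dimensional word vectors are invented for illustration): an LLM-style model predicts the next *token* from local statistics, whereas an LCM-style model maps a whole sentence to a single fixed-size embedding and reasons at that level.

```python
# Toy contrast between token-level prediction (LLM-like) and
# concept-level sentence embeddings (LCM-like). All data is invented.
from collections import Counter
from math import sqrt

# --- Token-level view: next-token prediction from bigram counts
corpus = "the scan shows a lesion . the scan shows no lesion .".split()
bigrams = Counter(zip(corpus, corpus[1:]))

def next_token(prev):
    # Most frequent token following `prev` in the toy corpus
    candidates = {b: c for (a, b), c in bigrams.items() if a == prev}
    return max(candidates, key=candidates.get)

# --- Concept-level view: one fixed-size vector per sentence
word_vec = {"scan": [1.0, 0.0], "lesion": [0.0, 1.0], "shows": [0.5, 0.5]}

def sentence_embedding(sentence):
    # Mean of known word vectors: the whole sentence becomes one vector
    vecs = [word_vec[w] for w in sentence.split() if w in word_vec]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(u, v):
    # Similarity between two sentence-level "concepts"
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))
```

In this sketch, `next_token("the")` returns `"scan"` purely from local co-occurrence, while `cosine(sentence_embedding(s1), sentence_embedding(s2))` compares sentences as whole units, which is also why the sentence-level representation is language-agnostic in principle: any encoder producing the same vector space could be swapped in.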
Table 2 Key limitations of artificial intelligence in radiology: From machine learning to large language models
Category | Description | Ref.
Data requirements | AI models (ML, DL, LLMs) require vast amounts of high-quality, annotated data, which is scarce in the medical domain. Privacy concerns and the cost of data acquisition and annotation are significant barriers | Nadkarni and Merchant[19], 2022; Hager et al[52], 2024
Variability and bias | Differences in imaging protocols, scanner types, and patient demographics can reduce model robustness. Training on biased datasets can perpetuate and even amplify clinical disparities | Marcus et al[79], 2023; Chen et al[80], 2021; Guo et al[81], 2024
Incorrectness and hallucinations | LLMs, in particular, may produce outputs that are factually inaccurate or fabricated. This is a critical issue in high-stakes clinical scenarios where accuracy is paramount | Guo et al[81], 2024; Olabiyi et al[17], 2025; Floridi et al[88], 2018; Bajaj et al[6], 2024
Limited contextual understanding | LLMs operate at a token level and struggle with long-range dependencies, abstract reasoning, and integrating non-linguistic data. This leads to outputs that are often superficial and lack the depth of a human physician's diagnostic reasoning | Hendrycks et al[86], 2021; Marcus et al[79], 2023; Polonski et al[96], 2018; Bender et al[87], 2021; Najjar et al[18], 2023; Jiang et al[57], 2023; Nam et al[31], 2025; Grandison[64], 2025; Vaswani et al[30], 2017
Lack of interpretability and trust | Many AI models function as "black boxes", providing no insight into their decision-making process. This opacity undermines trust among clinicians and poses a significant hurdle to clinical adoption and patient safety | Hendrycks et al[86], 2021; Bender et al[87], 2021
Performance in rare diseases | AI performance often deteriorates in rare conditions, atypical presentations, and underrepresented demographics. These "edge cases" demand nuanced reasoning that goes beyond narrow pattern recognition | Busch et al[43], 2025; Ouyang et al[84], 2025; Atil et al[85], 2025
Obsolescence and adaptability | The rapid pace of innovation in radiology means models must continually adapt to new techniques and protocols. Without frequent retraining and validation, AI systems can become obsolete, or their performance can degrade | Najjar et al[18], 2023
Clinical integration and medicolegal liability | Embedding AI into clinical workflows requires extensive infrastructure and training. If an AI error leads to patient harm, the question of legal liability among developers, institutions, and clinicians remains a significant, unresolved barrier | Nadkarni and Merchant[19], 2022; Hager et al[52], 2024; Soni et al[55], 2025
Computational and environmental costs | Training and running large-scale models such as LLMs are resource-intensive, expensive, and carry a high carbon footprint, which presents a sustainability challenge | Faiz et al[90], 2024; The Institution of Engineering and Technology[92], 2024; Ren et al[93], 2024
Delusions of progress | The overestimation of current AI capabilities, often due to misleading metrics, can lead to a "delusion of progress" that results in flawed decision-making and misplaced trust in high-stakes clinical scenarios | Topol[1], 2019