Zhou AL, Chiang JYH, Chan KS, Tan N, Shelat VG. Decoding Alexander the Great’s gastrointestinal cause of death using artificial wisdom: An artificial intelligence-human inquiry into a medical mystery. World J Gastroenterol 2025; 31(46): 111669 [DOI: 10.3748/wjg.v31.i46.111669]
Research Domain of This Article
Gastroenterology & Hepatology
Article-Type of This Article
Observational Study
Journal Information of This Article
Publication Name
World Journal of Gastroenterology
ISSN
1007-9327
Publisher of This Article
Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA
World J Gastroenterol. Dec 14, 2025; 31(46): 111669. Published online Dec 14, 2025. doi: 10.3748/wjg.v31.i46.111669
Decoding Alexander the Great’s gastrointestinal cause of death using artificial wisdom: An artificial intelligence-human inquiry into a medical mystery
An-Lai Zhou, Department of General Medicine, Tan Tock Seng Hospital, Singapore City 308433, Singapore
An-Lai Zhou, Joelle Yee-Hui Chiang, Kai Siang Chan, Vishal G Shelat, Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore City 308232, Singapore
Joelle Yee-Hui Chiang, Department of General Surgery, Singapore General Hospital, Singapore City 169608, Singapore
Kai Siang Chan, Vishal G Shelat, Yong Loo Lin School of Medicine, National University of Singapore, Singapore City 117597, Singapore
Nicole Tan, Faculty of Medicine, Monash University, Melbourne 3800, Victoria, Australia
Vishal G Shelat, Department of General Surgery, Tan Tock Seng Hospital, Singapore City 308433, Singapore
Co-first authors: An-Lai Zhou and Joelle Yee-Hui Chiang.
Author contributions: Zhou AL and Chiang JYH contributed to data acquisition, analysis and interpretation, and manuscript writing; Tan N contributed to data acquisition, analysis and interpretation, and manuscript drafting; Chan KS contributed to data analysis and interpretation, manuscript drafting, critical revisions, and final approval; Shelat VG conceived and designed the study, and contributed to data interpretation, critical revisions of the manuscript, and final approval.
Institutional review board statement: This study does not involve human participants or human data and hence does not require institutional review.
Informed consent statement: This study does not involve human participants or human data and hence does not require informed consent.
Conflict-of-interest statement: The authors have no conflicts of interest to declare.
STROBE statement: The authors have read the STROBE Statement (a checklist of items), and the manuscript was prepared and revised according to the STROBE Statement.
Data sharing statement: This project does not involve any data or its sharing with any parties or persons.
Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: An-Lai Zhou, MD, Doctor, Department of General Medicine, Tan Tock Seng Hospital, 11 Jalan Tan Tock Seng, Singapore City 308433, Singapore. zhouanlai99@gmail.com
Received: July 7, 2025 Revised: August 4, 2025 Accepted: October 27, 2025 Published online: December 14, 2025 Processing time: 156 Days and 17.8 Hours
Abstract
BACKGROUND
ChatGPT was released in November 2022, and studies have shown its impressive performance in academic examinations, suggesting it is a promising tool for answering questions even on controversial topics. However, artificial intelligence (AI) achieving surface-level performance does not necessarily equate to a deep understanding of human cognition. The development of artificial wisdom therefore necessitates a shift from simply mimicking intelligent behavior to modeling the underlying mechanisms of human wisdom, including emotional understanding, ethical considerations, and contextual awareness. Several theories exist on the death of Alexander the Great, but no definitive conclusion has been reached.
AIM
To evaluate whether a hybrid approach, combining generative AI (ChatGPT) with human clinical judgment, can meaningfully reassess the cause of death of Alexander the Great.
METHODS
This is a cross-sectional study using ChatGPT (version 4 Pro). A search was performed with search terms describing the symptoms experienced by Alexander the Great and possible causes of his death: West Nile virus (WNV) encephalitis, poisoning, acute pancreatitis due to excessive alcohol consumption, typhoid fever, and malaria. The historical data and symptomatology were analyzed, weighing evidence and context in a manner akin to human wisdom.
RESULTS
The most likely cause of death of Alexander the Great, as generated by ChatGPT, was typhoid fever complicated by Guillain-Barré syndrome (GBS). The hypothesis was based on the alignment between Alexander’s reported symptoms, such as prolonged high fever, severe abdominal pain, and neurological decline, and the known clinical presentation of typhoid fever. However, careful review of the sources cited by ChatGPT showed that many did not support typhoid fever as a trigger of GBS and instead pointed to Campylobacter jejuni as the more likely precipitant. Other possible causes of death suggested by ChatGPT, including acute pancreatitis from excessive alcohol consumption, infectious causes (WNV encephalitis, malaria), and poisoning, were considered less likely.
CONCLUSION
While ChatGPT initially concluded typhoid fever with GBS as the most plausible cause of death, expert reappraisal of the sources and pathophysiology suggested that C. jejuni-associated GBS was more likely. This study exemplifies how incorporating AI’s pattern recognition with human scrutiny can yield responsible interpretations of historical records.
Core Tip: While ChatGPT is an impressive tool, its conclusions still require human validation. Although ChatGPT identified typhoid fever with Guillain-Barré syndrome (GBS) as the most likely cause of Alexander the Great’s death, cross-checking both the available literature and the sources cited by ChatGPT revealed that Campylobacter jejuni, a more common and well-established cause of GBS, better fits the clinical and historical presentation. This study shows how artificial wisdom, which combines artificial intelligence with human judgment, can support more contextually accurate interpretations in historical medical investigations.
Citation: Zhou AL, Chiang JYH, Chan KS, Tan N, Shelat VG. Decoding Alexander the Great’s gastrointestinal cause of death using artificial wisdom: An artificial intelligence-human inquiry into a medical mystery. World J Gastroenterol 2025; 31(46): 111669
INTRODUCTION
Several fields, including the automotive industry, medicine, education, and finance, have applied artificial intelligence (AI) sporadically over the years[1]. In recent years, AI has become increasingly accessible to individual users, particularly following the release of ChatGPT in 2022, which has since been widely used for problem-solving, organizing daily tasks, and content generation. In the medical context, ChatGPT has shown promise in areas such as streamlining workflows, providing accessible health education, generating diagnostic algorithms, and even improving healthcare documentation[2]. While AI’s potential has garnered much attention, critics have raised concerns about the dangers of relying blindly on generative AI for everyday decision-making and information retrieval, including the risk of accepting inaccurate outputs due to biases and stereotypes embedded in AI training data. Moreover, AI remains limited in its ability to feel empathy or compassion, engage in self-reflection, take perspective, or understand and appropriately respond to emotionally or morally complex situations, factors which may lead to inappropriate or context-insensitive conclusions[3-5]. In a field like medicine, where decisions often involve uncertainty and require context-sensitive interpretation, current AI systems may not be able to fully replicate human adaptability and contextual judgment. Additionally, ChatGPT’s outputs can appear reasonable on the surface while lacking deeper critical appraisal of sources; it is therefore essential to combine AI results with expert knowledge in the field. This combination leads to what is called “artificial wisdom” (AW), a hybrid reasoning approach that is becoming increasingly important in retrospective diagnosis and the investigation of past events[6].
In 2020, Jeste et al[6] introduced the AW concept, a form of institutional wisdom. In this model, wisdom does not arise from a single AI system or human agent alone, but from a collaborative system that integrates AI’s ability to rapidly synthesize vast amounts of information with the human capacity for ethical reasoning, emotional intelligence, and contextual understanding.
Rather than offering answers, AW offers perspectives, which hold particular value in processing ambiguous or biased data, weighing the credibility of sources, and drawing context-sensitive, ethically restrained conclusions. This makes AW especially applicable to areas such as forensic medicine and retrospective historical investigations, where unresolved or unnatural deaths often involve complex and conflicting evidence. In such cases, conclusions must be drawn with care and humility. Figure 1 illustrates a conceptual flow diagram explaining how AI and human insight interplay to derive AW.
Figure 1 Conceptual flow of artificial wisdom.
AI: Artificial intelligence.
A relevant case where AW may be applied is the death of Alexander the Great, the ancient Macedonian ruler who became king at the age of 20 and famously never lost a battle. He died unexpectedly in 323 B.C. at the age of 32, marking the fall of his empire and the beginning of the Hellenistic period, during which Greek influence persisted[7]. To this day, the exact cause of his death remains medically unconfirmed and has sparked ongoing debate, fueled by conflicting historical accounts, the passage of time, and the absence of concrete evidence.
Historians such as Plutarch, Arrian, Adrienne Mayor, Richard Stoneman, and toxicologist Leo Schep have proposed various theories to explain his demise, ranging from poisoning and infectious diseases to congenital conditions[8]. Among modern hypotheses, Marr and Calisher[7] have suggested West Nile virus (WNV) encephalitis, based on epidemiological clues and symptomatology. However, this too remains speculative.
This study applies the framework of AW to a medical-historical investigation: The gastrointestinal cause of death of Alexander the Great. By combining ChatGPT’s outputs with expert clinical reasoning, we critically reassessed competing etiologies such as typhoid fever, Campylobacter jejuni infection, malaria, pancreatitis, and poisoning in light of both symptomatology and source reliability. While ChatGPT excels at surface-level synthesis and pattern recognition, it lacks higher-order cognitive capabilities such as ethical reflection, contextual awareness, emotional nuance, and epistemic humility[3,4]. Relying solely on ChatGPT for such research questions can therefore be problematic, as its persuasive tone can create a false sense of credibility when it may be confidently wrong, leading to the uncritical acceptance of speculative claims. For instance, ChatGPT has been documented to have cited non-existent sources, misquoted references, or even made biased overgeneralizations in attempts to summarize and simplify complex concepts for end users[3,5].
As noted by Jeste et al[6], a wise system should “learn from experience, integrate multiple perspectives, and learn from its mistakes.” In line with this principle, ChatGPT was used to generate and rank possible gastrointestinal diagnoses based on symptom patterns. These outputs were then subjected to human clinical discernment to scrutinize conclusions, identify overgeneralizations, and refine them in light of available historical and medical evidence. This process illustrates how the integration of AI and expert insight (termed AW) can enable a more responsible and context-sensitive approach to historical medical analysis.
MATERIALS AND METHODS
This was a qualitative, cross-sectional simulation study applying ChatGPT (version 4 Pro, accessed June 3, 2025) to assess differential diagnoses based on Alexander the Great’s reported symptoms. Prompts were crafted to simulate clinical reasoning by integrating historical data with medical context. ChatGPT’s outputs were evaluated through a structured framework simulating the principles of AW.
Rather than simply retrieving or listing causes, the prompts were designed to encourage contextual reasoning and the weighing of evidence, in alignment with the principles of AW. Factors such as the plausibility of each hypothesis, the historical and medical context, and the need to balance conflicting information were incorporated into the prompt structure. References and justifications supporting each proposed diagnosis were embedded in the inputs to guide ChatGPT’s analysis. The full search prompts are appended verbatim in Supplementary Table 1.
The input provided to ChatGPT was derived from multiple academic and historical references, simulating a reasoning process similar to human clinical and historical analysis, integrating evidence, evaluating probabilities, and acknowledging uncertainty, which are hallmarks of AW-based decision-making.
ChatGPT was then prompted to cite sources to support its claims. All sources mentioned by ChatGPT were reviewed by two doctors who rated them on a 0 to 3 scale (0 = irrelevant, 1 = partial, 2 = relevant, 3 = strongly supporting) across four domains: (1) Clinical plausibility; (2) Source reliability; (3) Cultural/historical contextual fit; and (4) Direct relevance to determining the cause of Alexander’s death. Finally, the sources were evaluated qualitatively, and conclusions were drawn based on a combination of ChatGPT’s input and human discretion on the part of the two reviewers. Inter-reviewer agreement in each domain was calculated using Cohen’s weighted κ statistic with linear weights, and major discrepancies were resolved by consensus. Source misapplication, hallucinated citations, and speculative logic were coded as “critical weaknesses” and tabulated separately. Misapplied sources were still qualitatively reviewed by both clinicians to assess whether they might support alternative pathologies, whereas hallucinated citations were automatically assigned a score of 0 in each evaluation domain.
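To illustrate how the inter-reviewer agreement described above can be computed, the following minimal sketch (illustrative only, using hypothetical ratings and assuming Python with the scikit-learn library; it is not the authors’ analysis code) calculates a linearly weighted Cohen’s κ for one evaluation domain.

```python
# Minimal sketch: linearly weighted Cohen's kappa between two reviewers.
# Ratings are hypothetical 0-3 scores (0 = irrelevant, 1 = partial,
# 2 = relevant, 3 = strongly supporting) for a set of cited sources.
from sklearn.metrics import cohen_kappa_score

reviewer_a = [3, 2, 0, 1, 3, 2, 2, 0, 1, 3, 2, 1]  # hypothetical scores
reviewer_b = [3, 2, 0, 2, 3, 2, 1, 0, 1, 3, 2, 1]  # hypothetical scores

# weights="linear" penalizes disagreements in proportion to their distance
# on the ordinal 0-3 scale, matching the linear weighting described above.
kappa = cohen_kappa_score(reviewer_a, reviewer_b, weights="linear")
print(f"Linearly weighted Cohen's kappa: {kappa:.2f}")
```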
RESULTS
Using the search terms described in the methodology, ChatGPT’s response was obtained. The complete response is provided in the Supplementary material (ChatGPT’s in-verbatim output using Supplementary Table 1 prompt [3/6/25]); following the principles of AW, the AI’s reasoning process included examining different pieces of evidence, taking into account the historical and medical background, and weighing probabilities in its evaluation.
Based on our search entry into ChatGPT, ChatGPT determined the most likely cause of death of Alexander the Great to be typhoid fever with Guillain-Barré syndrome (GBS). Several factors, reflecting a nuanced interpretation of symptoms and relevant medical knowledge, supported this conclusion. Specifically, the symptoms exhibited by Alexander, such as high fever, severe abdominal pain, chills, restlessness and delirium, closely aligned with the clinical presentation of typhoid fever, possibly also involving enteric perforation as a complication in view of the abdominal pain and fever. Furthermore, the reported paralysis and inability to speak were linked to GBS, a rare complication associated with typhoid, illustrating an integrative analysis that considers less common but plausible neurological sequelae. Reports of Alexander’s body allegedly not decaying after death may also be attributable to GBS-induced paralysis, which could have led observers to mistakenly believe he had died while he was still alive. Typhoid fever also fits the historical context, as it was prevalent during Alexander’s time and would have been lethal before the advent of antibiotics.
WNV encephalitis was described as a less likely cause of death due to “historical and epidemiological inconsistencies regarding the presence of WNV in that era,” as the symptoms he exhibited did not match the typically earlier onset of neurological symptoms in WNV. This demonstrates AW’s capacity to incorporate historical context and evaluate the likelihood of hypotheses based on scientific plausibility. Table 1 summarizes the reasons provided by ChatGPT on why other possible causes of death were less likely. The detailed argument against the other etiologies as possible causes of death is shown verbatim in ChatGPT’s generated output in the Supplementary material (ChatGPT’s in-verbatim output using Supplementary Table 1 prompt [3/6/25]).
Table 1 ChatGPT’s reasoning for rejection of alternative diagnoses in Alexander the Great’s death.
Poisoning theories: The symptoms of the speculated poisons do not match as closely with the full range of symptoms Alexander exhibited, especially the neurological ones. Most poisons would have caused a more rapid deterioration and death, unlike the 12 days of illness Alexander endured
Acute pancreatitis from alcohol consumption: This cause is plausible given Alexander’s history of heavy drinking. However, the progression of symptoms and his ultimate neurological signs do not align as well with pancreatitis, which typically would not lead to paralysis or a ‘locked-in’ state
Malaria: While malaria is a strong contender due to the endemic nature of the disease in the region and his symptoms, the lack of specific mention of periodic fever cycles and symptom onset typical of malaria slightly weakens this argument compared to the detailed matching of typhoid fever symptoms
WNV encephalitis: This is less likely due to the historical and epidemiological inconsistencies regarding the presence of WNV in that era. Furthermore, Alexander’s symptoms do not entirely match the typical clinical presentations of WNV
The sources that ChatGPT used to generate its responses fell broadly into three categories: Historical accounts from books, governmental or organizational health guidelines, and peer-reviewed journal articles. However, while the list was extensive, not all sources were appropriately applied to support the claims made by ChatGPT. In some cases, the cited references did not adequately address the specific claims or failed to consider the cultural and historical context. In addition, 13% of ChatGPT’s 24 cited sources were inaccessible, and 8% were misapplied or quoted out of context. Table 2 demonstrates the degree of citation misapplication or irrelevance that reduced the epistemic value of ChatGPT’s output. These limitations undermined the validity of ChatGPT’s conclusions and required additional human input. To quantify these limitations, a structured review of the sources cited by ChatGPT was performed by human reviewers based on reliability, clinical relevance, contextual fit, and clinical plausibility, with the full results detailed in Supplementary Table 2. Several references, particularly for the typhoid + GBS hypothesis, were inaccessible, misapplied, or lacked historical-contextual alignment, as demonstrated.
Table 2 Evaluation of sources cited by ChatGPT: Accuracy, contextual fit, and misapplication[7,10,18,19,38,50,59-73].
Did Alexander the Great die from Guillain-Barré syndrome? In: Howe T, Anson E, Balmaceda C, Fronda M, Hollander D, McAuley A, Muller S, Vanderspoel J, Wheatley P, Dunn C, editors. Ancient History Bulletin, 2018: 106-128
Alexander the Great and the Guillain-Barré syndrome hypothesis
11
This source actually argues that Campylobacter jejuni, rather than typhoid, caused the GBS, whereas ChatGPT uses it to argue for typhoid. The source was taken out of context to support the argument that typhoid caused GBS when the paper argued the opposite
Two independent reviewers assessed the sources to verify their relevance and accuracy and to provide context-based interpretation in determining the plausibility of the AI-generated claims; after assessing the sources and cross-checking them against the relevant literature, they found the sources to be more supportive of C. jejuni infection as a cause of death. Between the two raters, Cohen’s weighted kappa statistic demonstrated good agreement in assessing the sources, with values of κ = 0.77 for clinical plausibility, κ = 0.76 for source reliability, κ = 0.76 for cultural and historical contextual fit, and κ = 0.61 for relevance to determining the cause of Alexander’s death. The comparative analysis revealed key discrepancies in diagnostic weighting and citation fidelity. ChatGPT exhibited “anchoring bias,” favoring typhoid despite conflicting citations. By contrast, expert reviewers applied an epistemically grounded approach, weighing the neurological trajectory (ascending paralysis) against the clinical epidemiology (Campylobacter vs typhoid incidence). This led to the re-ranking of C. jejuni as the most plausible etiological agent. A full list of these sources is provided in the Supplementary material, along with reviewer ratings for clinical plausibility, source reliability, cultural-context fit, relevance to the claim, and accompanying comments in Supplementary Table 2. Table 3 shows the differential diagnosis matrix using ChatGPT and human expert assessment, and the comparative plausibility of each proposed diagnosis as ranked by ChatGPT vs human reviewers. This reflects the corrective function of AW in reappraising overconfident or misinformed AI-generated hypotheses.
Table 3 Differential diagnosis matrix of ChatGPT and human expert assessment.
DISCUSSION
The past often serves as our most reliable source of information, particularly when the present is evolving and the future remains uncertain[9]. However, historical records are inherently constrained by personal opinions, subjective viewpoints, limited documentation, and genuine uncertainties about the context in which events occurred. In such settings, augmenting human reasoning with AW, embodied in tools like ChatGPT, offers a novel approach that synthesizes evidence, evaluates plausibility, and balances conflicting narratives. However, AW does not merely validate AI outputs. It explicitly scrutinizes source provenance, assesses the plausibility of mechanistic links, and incorporates medical reasoning that large language models (LLMs) cannot autonomously perform. This methodology is particularly suited to reinterpreting historical deaths where primary data are limited. However, AI-generated information must first be critically validated by subject-matter experts before it can be accepted as historical or clinical truth. This integration seeks to emulate a human-like reasoning process, where context, ethical considerations, and uncertainty are acknowledged and carefully weighed.
Born on 20 July 356 B.C., Alexander the Great lived a life of extraordinary accomplishment until his untimely death at age 32. His decline began on 29 May 323 B.C. and progressed over 12 days until his demise in Babylon on 10 June. During this time, he experienced abdominal pain, escalating fever, chills, fatigue, extreme thirst, progressive weakness, paralysis, speech difficulties, and delirium[10]. Possible causes of death can be broadly categorized into: (1) Infectious (e.g., WNV encephalitis, typhoid, malaria); (2) Inflammatory (e.g., acute pancreatitis [AP]); and (3) Toxic (e.g., poisoning).
To explore these possibilities, ChatGPT (GPT-4 Pro) was applied to simulate an AW-guided inquiry. While ChatGPT’s vast training dataset enables fluent synthesis, its reasoning lacks transparency, often presenting speculative links as confident truths. This opacity is exacerbated by its persuasive tone, which risks uncritical acceptance[11,12]. Its performance is especially noteworthy in its ability to generate structured, human-like responses based on a massive parameter base (approximately 175 billion in GPT-3)[13]. Despite its utility, concerns remain about its use of fabricated references[14]. Our study exposes this gap: ChatGPT promoted typhoid-GBS despite citing a source that explicitly argues for C. jejuni instead, a clear example of misrepresentation. Nonetheless, a study by Ayers et al[15] showed ChatGPT’s responses were preferred for quality and empathy in nearly 79% of 585 evaluations, reflecting its potential to mimic aspects of AW, particularly when human oversight is applied.
When prompted, ChatGPT determined WNV encephalitis to be a less likely cause of Alexander’s death. Marr and Calisher[7] proposed this etiology based on features of febrile illness, encephalopathy, and paralysis. However, ChatGPT’s analysis noted misalignment between the historical symptom timeline and the typical presentation of severe WNV encephalitis, which usually involves earlier onset of neurological signs such as dysarthria, nuchal rigidity, seizures, and tremors[16]. Additionally, WNV was only first described in 1937 in Uganda[17] and while its presence in 323 B.C. cannot be ruled out entirely, the epidemiological and clinical inconsistencies reduce its likelihood. ChatGPT’s rejection of WNV as a likely cause aligns with current scientific reasoning, although its sources required human vetting to confirm reliability.
ChatGPT attributed typhoid fever complicated by GBS as the most probable cause of death. This conclusion was based on symptoms including prolonged fever, abdominal pain, and neurological decline[18-20]. However, a review of the sources revealed inconsistencies: Several did not mention typhoid at all, or identified other pathogens, notably C. jejuni, as the most probable precipitant of GBS[18]. C. jejuni is now a well-established cause of bacterial gastroenteritis and one of the most common infectious triggers of GBS, with an incidence of 30.4 per 100000 infections[21,22]. First identified in 1913 in livestock and initially classified under Vibrio, Campylobacter was reclassified in 1973 and only recognized as a major human pathogen in the 1980s when culture techniques advanced. Similarly, GBS was first described in 1859 as “acute ascending paralysis” and formally characterized during World War I, in 1916, in two French soldiers. GBS as a complication of C. jejuni infection may also explain Alexander’s profound paralysis, delayed decomposition, and apparent death. GBS can result in locked-in syndrome, a state where a person is conscious but unable to move or speak, which could have been mistaken for death in ancient times[23-25].
By contrast, typhoid fever, while common both historically and today, has only a handful of documented associations with GBS. A Boolean search on PubMed using “Guillain-Barré and typhoid” yielded only 11 case reports, mostly in pediatric patients[26-36]. This suggests that typhoid-induced GBS in an adult male, especially without modern treatment, is unlikely. As the adage goes, “when you hear the sound of hooves, think horses, not zebras,” and given Alexander’s symptoms, a C. jejuni gastroenteritis precipitating GBS emerges as the more clinically plausible explanation.
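The literature search above can be reproduced programmatically; the following minimal sketch (an illustration only, not the search interface used in this study) queries the public NCBI E-utilities esearch endpoint with the same Boolean terms, assuming Python with the requests library. The returned count reflects the live PubMed database and may differ from the 11 reports cited here.

```python
# Illustrative PubMed Boolean search via the NCBI E-utilities esearch API.
# The record count is live and will change as new papers are indexed.
import requests

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
params = {
    "db": "pubmed",                        # search the PubMed database
    "term": "Guillain-Barre AND typhoid",  # Boolean query terms
    "retmode": "json",                     # return JSON instead of XML
    "retmax": 50,                          # cap on returned PMIDs
}

response = requests.get(ESEARCH_URL, params=params, timeout=30)
result = response.json()["esearchresult"]

print("Matching records:", result["count"])
print("PMIDs:", ", ".join(result["idlist"]))
```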
ChatGPT’s ranking of AP as a less likely cause was based on a lack of typical historical features associated with severe or necrotizing AP. While Alexander’s history of excessive alcohol use is well documented, including during his final illness and in the episode in which he fatally harmed Clitus, the co-commander of the aristocratic cavalry, at the end of a long drunken quarrel[37,38], the evidence leans more toward episodic intoxication than chronic alcoholism[39], which limits its relevance as a trigger for AP. Furthermore, the laboratory and radiological data necessary for severity stratification were unavailable at the time. Most AP cases are mild and self-limiting, and only 12%-20% result in severe complications[40]. Thus, while alcohol-induced or biliary AP cannot be entirely ruled out, it is a less likely primary cause of death.
Regarding malaria, ChatGPT acknowledged its plausibility due to endemicity and symptoms such as fever, chills, and fatigue. However, the absence of the intermittent fever patterns characteristic of malaria caused by Plasmodium vivax, P. ovale, and P. falciparum reduced its likelihood[41-43]. By contrast, typhoid fever, with its sustained fever pattern caused by the release of endogenous pyrogens[44], aligns more closely with the recorded symptomatology, while C. jejuni bacteremia can also present with high fevers as seen in Alexander’s case[45]. Cerebral malaria could indeed cause a coma-like state as seen in Alexander; however, its course is often marked by cerebral swelling and more overt signs of terminal illness[46].
ChatGPT also reviewed several poisoning hypotheses. Strychnine was rejected due to its rapid onset, typically within 15-60 minutes, and its characteristic convulsions, which were absent from historical accounts[47]. The Styx river water poisoning theory based on calicheamicin lacks both historical and chemical evidence[48,49]. Meanwhile, the white hellebore hypothesis proposed by Schep is plausible owing to the toxin’s slow-acting nature and the overlap with Alexander’s symptoms[50,51], but it still lacks direct supporting evidence and depends on speculative assumptions about alcohol fermentation and toxin delivery[38,52].
Despite its strengths, ChatGPT exhibits well-known limitations. Its sources often include content from health education websites, which lack peer review and are prone to updates. Issues like hallucinated references, as reported by Meyer et al[14], and a high incidence of fabricated citations (69% in one study)[53], highlight its unreliability. These issues represent persistent impurities in generative AI, where hallucinations, false claims, and fabricated references may be presented with unwarranted confidence. Furthermore, ChatGPT functions as a black box as its internal reasoning is opaque[54], limiting the trustworthiness of its outputs. As Starke et al[54] suggest, trust in AI requires reliability, competence, and ethical intent. Moreover, ChatGPT cannot interpret sensory clinical signs (for example, “toxic-looking appearance” or “board-like abdomen”), which limits its diagnostic accuracy without human interpretation. This study highlights a core epistemic lesson: Plausibility without responsibility is insufficient. ChatGPT generated a superficially credible explanation but failed to apply the evidentiary burden required in clinical reasoning. In contrast, the AW framework imposed epistemic discipline by triangulating source credibility, pathophysiological fit, and logical consistency. As LLMs become increasingly embedded in medical reasoning, such guardrails are essential to preserve trust and accuracy.
This study had notable strengths. It is one of the first to explore ChatGPT’s role in investigating a historical medical mystery using a framework rooted in AW. Each hypothesis was cross-referenced against the scientific literature to assess the validity of ChatGPT’s conclusions. This study proposes a replicable AW framework for AI-assisted historical diagnostics, introduces a novel causative hypothesis, and demonstrates how source triangulation can refine LLM outputs. Future work could include blinded expert panels, benchmarking with other LLMs, and developing a formal AW scoring protocol.
Nonetheless, limitations remain. The principle of “garbage in, garbage out”[55] underscores the importance of high-quality input; hence, we curated references from peer-reviewed and credible sources. Worryingly, our study found that ChatGPT had the potential to generate fictitious sources and quote sources out of context, as discussed in the results. Despite the small sample size, the fact that ChatGPT generated these citations and passed them off as contributory towards its conclusions greatly diminishes its reliability, and these discrepancies demonstrate the need for human vetting to validate AI outputs. The simulation relied on publicly available ChatGPT-4, which may behave differently from domain-specific medical LLMs. The assessment of citations and diagnoses involved subjective expert interpretation, though inter-rater agreement was high. The historical records of Alexander’s illness are inherently incomplete, limiting definitive conclusions. Lastly, LLMs are vulnerable to algorithmic bias, especially when user prompts subtly guide responses[56].
Beyond our findings, the growing body of literature on generative AI reflects widespread concerns about its reliability in healthcare. As of July 2025, a PubMed search using the term “ChatGPT” retrieves over 11000 publications, many of which examine its clinical utility and limitations. For example, a recent study by Steele et al[57] compared AI-generated and authentic personal statements in emergency medicine applications and found no statistically significant differences in perceived quality or influence on selection decisions, underscoring the potential for AI to mimic human reasoning even in evaluative domains without necessarily ensuring factual or contextual integrity[57]. These findings reinforce the importance of validating AI-generated content with expert clinical judgment, particularly when such outputs may be persuasive yet flawed. A recent comparative study evaluated ChatGPT-4 alongside other advanced LLMs, such as DeepSeek-R1 and Google Gemini, in the context of medical education. All three models achieved comparable accuracy exceeding 80% in both basic and clinical medical sciences, underscoring their expanding capabilities in knowledge processing and test performance[58]. However, such promising performance further amplifies the necessity of human oversight when AI tools are applied beyond controlled settings, particularly in clinical or historically ambiguous cases. These discrepancies underscore the need for AW models in which LLMs serve not as definitive experts but as co-analysts in a rigorously supervised environment.
In sum, this exercise demonstrates that ChatGPT, while generative, is not inherently discerning. The AW framework transforms AI output into responsibly reasoned insight by embedding source vetting, clinical logic, and epistemic humility.
This methodology is not only useful for historical forensics but also offers a model for safe AI integration in contemporary diagnostic practice.
CONCLUSION
This study demonstrates that while ChatGPT can generate plausible hypotheses regarding historical medical events, it remains vulnerable to citation errors, logical overreach, and source misapplication. By applying AW, which relies on human clinical judgment to critically assess AI outputs, C. jejuni-induced GBS was determined as a more credible cause of Alexander the Great’s death than typhoid fever. AW offers a novel pathway for fusing computational power with contextual nuance, especially in settings where medical, historical, and ethical ambiguity intersect.
Footnotes
Provenance and peer review: Invited article; Externally peer reviewed.
Peer-review model: Single blind
Specialty type: Gastroenterology and hepatology
Country of origin: Singapore
Peer-review report’s classification
Scientific Quality: Grade A, Grade B
Novelty: Grade A, Grade A
Creativity or Innovation: Grade A, Grade C
Scientific Significance: Grade A, Grade C
P-Reviewer: Del Carpio-Orantes L, MD, Associate Professor, Mexico; Kanaan MHG, Adjunct Associate Professor, Iraq S-Editor: Fan M L-Editor: Filipodia P-Editor: Lei YY
Cong-Lem N, Soyoof A, Tsering D. A Systematic Review of the Limitations and Associated Opportunities of ChatGPT. Int J Hum-Comput Int. 2025;41:3851-3866.
Ray PP. ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope. Int Things Cyber-Phys Syst. 2023;3:121-154.
Cobianchi L, Piccolo D, Dal Mas F, Agnoletti V, Ansaloni L, Balch J, Biffl W, Butturini G, Catena F, Coccolini F, Denicolai S, De Simone B, Frigerio I, Fugazzola P, Marseglia G, Marseglia GR, Martellucci J, Modenese M, Previtali P, Ruta F, Venturi A, Kaafarani HM, Loftus TJ; Team Dynamics Study Group. Surgeons' perspectives on artificial intelligence to support clinical decision-making in trauma and emergency contexts: results from an international survey. World J Emerg Surg. 2023;18:1.
Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler DM, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D. Language models are few-shot learners. Proceedings of the 34th International Conference on Neural Information Processing Systems; 2020 Dec 6-12; Vancouver, BC, Canada. New York: Curran Associates Inc, 2020: 1877-1901.
Hall K. Did Alexander the Great die from Guillain-Barré syndrome? In: Howe T, Anson E, Balmaceda C, Fronda M, Hollander D, McAuley A, Muller S, Vanderspoel J, Wheatley P, Dunn C, editors. Ancient History Bulletin, 2018: 106-128.
Leonhard SE, Mandarakas MR, Gondim FAA, Bateman K, Ferreira MLB, Cornblath DR, van Doorn PA, Dourado ME, Hughes RAC, Islam B, Kusunoki S, Pardo CA, Reisin R, Sejvar JJ, Shahrizaila N, Soares C, Umapathi T, Wang Y, Yiu EM, Willison HJ, Jacobs BC. Diagnosis and management of Guillain-Barré syndrome in ten steps. Nat Rev Neurol. 2019;15:671-683.
May W, Senitiri I. Guillain-Barré syndrome associated with typhoid fever. A case study in the Fiji Islands. Pac Health Dialog. 2010;16:85-88.
Banks PA, Bollen TL, Dervenis C, Gooszen HG, Johnson CD, Sarr MG, Tsiotos GG, Vege SS; Acute Pancreatitis Classification Working Group. Classification of acute pancreatitis--2012: revision of the Atlanta classification and definitions by international consensus. Gut. 2013;62:102-111.
Fu R, Huang Y, Singh PV. Artificial Intelligence and Algorithmic Bias: Source, Detection, Mitigation, and Implications. In: Pushing the Boundaries: Frontiers in Impactful OR/OM Research. Maryland: Institute for Operations Research and the Management Sciences, 2020: 39-63.
Steele E, Steratore A, Dilcher BZ, Bandi K. Comparative Analysis of Artificial Intelligence-Generated and Human-Written Personal Statements in Emergency Medicine Applications. Cureus. 2025;17:e88818.
Meo SA, Abukhalaf FA, ElToukhy RA, Sattar K. Exploring the role of DeepSeek-R1, ChatGPT-4, and Google Gemini in medical education: How valid and reliable are they? Pak J Med Sci. 2025;41:1887-1892.
Mayor A. Greek Fire, Poison Arrows & Scorpion Bombs: Biological and Chemical Warfare in the Ancient World. New York: Overlook Press, 2003.