Published online Nov 19, 2019. doi: 10.5492/wjccm.v8.i7.120
Peer-review started: May 8, 2019
First decision: August 2, 2019
Revised: August 21, 2019
Accepted: October 27, 2019
Article in press: October 27, 2019
Published online: November 19, 2019
Processing time: 212 Days and 21.9 Hours
With the recent change in the definition of sepsis and septic shock (Sepsis-3), an electronic search algorithm was needed to identify cases automatically. Such a supervised machine learning method would help screen large volumes of electronic medical records (EMR) efficiently for research purposes.
To develop and validate a computable phenotype, using a supervised machine learning method, for retrospectively identifying sepsis and septic shock in critical care patients.
A supervised machine learning method was developed based on culture orders, Sequential Organ Failure Assessment (SOFA) scores, serum lactate levels and vasopressor use in the intensive care units (ICUs). The computable phenotype was derived from a retrospective analysis of a random cohort of 100 patients admitted to the medical ICU and was then validated in an independent cohort of 100 patients. We compared the results of the computable phenotype with a gold standard: manual review of the EMR by 2 blinded reviewers, with disagreements resolved by a critical care clinician. The algorithm identified patients with a SOFA score ≥ 2 during the ICU stay and a culture within 72 h before or after the time of admission. Two versions of the sepsis phenotype were defined: Sepsis-V1 required a blood culture with a SOFA score ≥ 2, and Sepsis-V2 required any culture with a SOFA score ≥ 2. A serum lactate level ≥ 2 mmol/L, measured from 24 h before admission through the ICU stay, together with vasopressor use, identified Septic Shock-V1 in patients meeting Sepsis-V1 and Septic Shock-V2 in patients meeting Sepsis-V2.
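The criteria described above can be sketched as a small rule-based classifier. This is an illustrative sketch only, not the authors' implementation: the `IcuStay` record, its field names, and the `classify` function are hypothetical, and it assumes that each stay has already been summarized into a peak SOFA score, the time offset of the nearest culture relative to admission, a peak lactate within the stated window, and a vasopressor flag.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class IcuStay:
    """Hypothetical per-stay summary; field names are assumptions."""
    max_sofa: int                             # highest SOFA score during the ICU stay
    blood_culture_offset_h: Optional[float]   # hours from admission to nearest blood culture
    other_culture_offset_h: Optional[float]   # hours from admission to nearest non-blood culture
    max_lactate_mmol_l: Optional[float]       # peak lactate, 24 h before admission through ICU stay
    vasopressor_used: bool


def _in_window(offset_h: Optional[float], window_h: float = 72.0) -> bool:
    """True if a culture exists within +/- window_h hours of admission."""
    return offset_h is not None and abs(offset_h) <= window_h


def classify(stay: IcuStay) -> Dict[str, bool]:
    """Apply the four phenotype rules to one ICU stay."""
    sofa_ok = stay.max_sofa >= 2
    blood = _in_window(stay.blood_culture_offset_h)
    # Assumption: "any culture" includes blood cultures.
    any_culture = blood or _in_window(stay.other_culture_offset_h)

    sepsis_v1 = sofa_ok and blood
    sepsis_v2 = sofa_ok and any_culture

    shock_criteria = (
        stay.max_lactate_mmol_l is not None
        and stay.max_lactate_mmol_l >= 2.0
        and stay.vasopressor_used
    )
    return {
        "sepsis_v1": sepsis_v1,
        "sepsis_v2": sepsis_v2,
        "septic_shock_v1": sepsis_v1 and shock_criteria,
        "septic_shock_v2": sepsis_v2 and shock_criteria,
    }
```

Keeping the four labels as separate boolean outputs makes it straightforward to compare each one against the manual chart review independently.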
In the derivation subset of 100 random patients, the final machine learning strategy achieved a sensitivity and specificity of 100% and 84% for Sepsis-V1, 100% and 95% for Sepsis-V2, 78% and 80% for Septic Shock-V1, and 80% and 90% for Septic Shock-V2, respectively. Agreement between the two blinded reviewers was κ = 0.86 for Sepsis-V2 and κ = 0.90 for Septic Shock-V2. In the validation of the algorithm in a separate subset of 100 random patients, sensitivity and specificity were both 100% for all 4 diagnoses.
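For readers who want to reproduce this kind of evaluation, the following minimal sketch shows how sensitivity, specificity and Cohen's kappa are conventionally computed from paired binary labels. The function names are hypothetical and the labels in the usage note are toy values, not study data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    predicted: Sequence[bool], reference: Sequence[bool]
) -> Tuple[float, float]:
    """Sensitivity and specificity of predicted labels against a reference standard."""
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    fp = sum(1 for p, r in pairs if p and not r)
    # Return NaN when a class is absent from the reference labels.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def cohen_kappa(rater_a: Sequence[bool], rater_b: Sequence[bool]) -> float:
    """Cohen's kappa for two raters assigning binary labels to the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of cases")
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    p_a = sum(1 for a in rater_a if a) / n
    p_b = sum(1 for b in rater_b if b) / n
    # Agreement expected by chance from each rater's marginal rates.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        return float("nan")
    return (observed - expected) / (1 - expected)
```

With the toy labels `predicted = [True, True, False, False, True]` and `reference = [True, False, False, False, True]`, `sensitivity_specificity` returns a sensitivity of 1.0 and a specificity of 2/3.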
Supervised machine learning for the identification of sepsis and septic shock is a reliable and efficient alternative to manual chart review.
Core tip: This study presents and validates a supervised machine learning model for the identification of sepsis and septic shock cases using electronic medical records as an alternative to manual chart review. This method proved to be an efficient, fast and reliable option for retrospective data abstraction, with the potential to be applied to other clinical conditions.