Charilaou P, Battat R. Machine learning models and over-fitting considerations. World J Gastroenterol 2022; 28(5): 605-607 [PMID: 35316964 DOI: 10.3748/wjg.v28.i5.605]
Corresponding author: Robert Battat, MD, Assistant Professor, Jill Roberts Center for Inflammatory Bowel Disease - Division of Gastroenterology & Hepatology, Weill Cornell Medicine, 1315 York Avenue, New York, NY 10021, United States. rob9175@med.cornell.edu
Research domain: Gastroenterology & Hepatology
Article type: Letter to the Editor
Open-Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
World J Gastroenterol. Feb 7, 2022; 28(5): 605-607
Published online Feb 7, 2022. doi: 10.3748/wjg.v28.i5.605
Machine learning models and over-fitting considerations
Paris Charilaou, Robert Battat
Paris Charilaou, Robert Battat, Jill Roberts Center for Inflammatory Bowel Disease - Division of Gastroenterology & Hepatology, Weill Cornell Medicine, New York, NY 10021, United States
Author contributions: Charilaou P and Battat R drafted and edited the manuscript, and reviewed the intellectual content.
Conflict-of-interest statement: The authors have no conflict of interest to declare.
Received: October 26, 2021
Peer-review started: October 26, 2021
First decision: December 27, 2021
Revised: December 29, 2021
Accepted: January 14, 2022
Article in press: January 14, 2022
Published online: February 7, 2022
Processing time: 90 days and 12.8 hours
Abstract
Machine learning models may outperform traditional statistical regression methods in predicting clinical outcomes. Building such models and tuning their underlying algorithms require proper validation to avoid over-fitting and poor generalizability, to which smaller datasets can be more prone. For readers interested in artificial intelligence and in building models with machine-learning algorithms, we outline important details of cross-validation techniques that can enhance the performance and generalizability of such models.
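As a minimal sketch of how cross-validation supports tuning, assuming Python with only the standard library, the code below uses k-fold cross-validation to choose the number of neighbors for a toy k-nearest-neighbor classifier. The data generator, the helper names (`k_fold_indices`, `knn_predict`, `cv_accuracy`) and the candidate values are hypothetical illustrations, not the authors' method. Each fold is scored by a model fit only on the other folds, so every score comes from data the model did not see while fitting.

```python
import random
from statistics import mean

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 once, then deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train_x, train_y, x, n_neighbors):
    """Majority vote among the n_neighbors closest training points (1-D input)."""
    order = sorted(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    votes = sum(train_y[j] for j in order[:n_neighbors])
    return 1 if 2 * votes > n_neighbors else 0

def cv_accuracy(xs, ys, n_neighbors, k=5, seed=0):
    """Mean accuracy over k folds; each fold is scored by a model fit on the other k-1."""
    fold_scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        hits = sum(knn_predict(tx, ty, xs[i], n_neighbors) == ys[i] for i in held_out)
        fold_scores.append(hits / len(held_out))
    return mean(fold_scores)

# Hypothetical toy data: label is 1 when x plus Gaussian noise exceeds 0.5.
rng = random.Random(42)
xs = [rng.random() for _ in range(150)]
ys = [1 if x + rng.gauss(0, 0.15) > 0.5 else 0 for x in xs]

# Tune n_neighbors using cross-validated accuracy only.
candidates = [1, 5, 15, 31]
scores = {n: cv_accuracy(xs, ys, n) for n in candidates}
best = max(scores, key=scores.get)
print(scores, "best n_neighbors =", best)
```

Because `best` is picked by looking at these same cross-validation scores, its score is an optimistic estimate of performance on new data; an untouched holdout set, or nested cross-validation (an outer loop wrapped around this tuning loop), is the usual way to obtain a less biased estimate.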
Core Tip: Machine learning models are increasingly used in clinical medicine to predict outcomes. Proper validation of these models is essential to avoid over-fitting and poor generalization to new data.
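Over-fitting shows up as a gap between performance on the data used for fitting and performance on new data. The sketch below, again hypothetical Python on toy data, contrasts a "memorizer" that stores every training label with a one-threshold decision stump; the function names, seeds and data generator are illustrative assumptions. The memorizer scores perfectly on its training set but falls to roughly the majority-class rate on held-out data, while the simpler stump scores similarly on both.

```python
import random

def make_data(n, seed):
    """Hypothetical toy data: label is 1 when x plus Gaussian noise exceeds 0.5."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [1 if x + rng.gauss(0, 0.1) > 0.5 else 0 for x in xs]
    return xs, ys

def accuracy(predict, xs, ys):
    """Fraction of points whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(xs)

def fit_memorizer(xs, ys):
    """Over-fitted model: recalls each training label exactly and
    falls back to the majority training label for any unseen input."""
    table = dict(zip(xs, ys))
    majority = 1 if 2 * sum(ys) > len(ys) else 0
    return lambda x: table.get(x, majority)

def fit_stump(xs, ys):
    """Simple model: a single threshold t (chosen among the training x values
    to maximize training accuracy), predicting 1 when x >= t."""
    best_t = max(xs, key=lambda t: accuracy(lambda x: int(x >= t), xs, ys))
    return lambda x: int(x >= best_t)

train_x, train_y = make_data(200, seed=1)
test_x, test_y = make_data(200, seed=2)  # stands in for "new data"

results = {}
for name, fit in [("memorizer", fit_memorizer), ("stump", fit_stump)]:
    model = fit(train_x, train_y)
    results[name] = (accuracy(model, train_x, train_y), accuracy(model, test_x, test_y))
    train_acc, test_acc = results[name]
    print(f"{name}: train={train_acc:.2f} test={test_acc:.2f} gap={train_acc - test_acc:.2f}")
```

Judged on training data alone, the memorizer would look like the better model; only evaluation on data held out from fitting exposes its poor generalization.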