Machine learning integration in microRNA-based markers for cardiovascular diseases: A systematic review

doi:10.4330/wjc.120747


Publication name: World Journal of Cardiology

ISSN: 1949-8462

Publisher: Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA

Systematic Reviews

Copyright: ©Author(s) 2026. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial (CC BY-NC 4.0) license. No commercial re-use. See permissions. Published by Baishideng Publishing Group Inc.

World J Cardiol. Jun 26, 2026; 18(6): 120747
Published online Jun 26, 2026. doi: 10.4330/wjc.120747

Table 1 Characteristics of the selected studies that met the inclusion criteria

Ref.	Setting	CVD	Sample size	Identified miRNAs	Role	Main outcome
Kayvanpour et al[8], 2021	Germany	ACS	66 ACS patients and 68 healthy controls; 148 suspected ACS patients initially enrolled	Top 10 miRNAs selected via ANOVA F-value: MiR-142-5p, miR-151a-3p, miR-145-5p, miR-186-5p, miR-191-5p, miR-29c-5p, miR-30d-5p, miR-342-5p, miR-362-5p, and miR-589-5p	Diagnosis of ACS	Machine learning models, including neural networks, classified ACS with high diagnostic performance (ROC-AUC 0.87-0.99 on internal cross-validation)
Ren et al[13], 2024	United States	AMI/STEMI	24 screening samples; n = 6 each for no known CAD, known CAD, STEMI-pre, and STEMI-PCI; validation samples also used	Previously reported: MiR-499, miR-1, miR-208b. Newly identified: MiR-331-3p, miR-142-5p, miR-200b-3p, miR-132-3p, miR-3605-5p, miR-18a-5p, miR-423-5p, miR-543, miR-301a-3p	Diagnosis and differentiation of AMI/STEMI from stable CAD	SCAD/LASSO regularized LR identified a 9-miRNA profile that differentiated no known CAD, known CAD, STEMI-pre, and STEMI-PCI, with AUCs approaching 1 in selected comparisons; the profile was explored for rapid point-of-care diagnosis using MIX.miR ion-exchange membrane technology
Samadishadlou et al[14], 2023	Iran	AMI and stable CAD	Healthy (51), CAD (46), AMI (111)	Differentially expressed: Hsa-miR-21-3p, hsa-miR-32-3p, hsa-miR-186-5p. Additionally selected via AUC-ROC: Hsa-miR-197-5p, hsa-miR-29a-5p, hsa-miR-296-5p	Diagnosis of AMI; differentiating AMI from healthy samples and from CAD	Peripheral blood mononuclear cell-derived miRNA signatures were used to differentiate healthy controls, stable CAD, and MI samples
Samadishadlou et al[15], 2024	Iran	AMI	Training set: 62 MI and 94 healthy controls; independent test set: 8 MI and 6 healthy controls	Hsa-miR-375-3p, hsa-miR-601, hsa-miR-34a-5p, hsa-miR-29c-5p, hsa-miR-330-5p, hsa-miR-199b-5p, hsa-miR-142-3p, hsa-miR-200a-3p, hsa-miR-132-5p, hsa-miR-133a-3p	Diagnosis of early-stage AMI	ML model identified 10 miRNAs with accuracy of 0.86 and AUC of 0.83 for diagnosing AMI
Reel et al[16], 2025	United Kingdom	Endocrine HTN subtypes and primary HTN	Cushing’s syndrome (35), primary aldosteronism (109), paraganglioma/pheochromocytoma (75), primary HTN (111)	Hsa-miR-15a-5p, hsa-miR-32-5p, hsa-miR-485-3p, hsa-miR-495-3p, hsa-miR-1260a, hsa-miR-186-5p, hsa-miR-195-5p, hsa-miR-326, hsa-miR-139-5p, hsa-miR-133a-3p, hsa-miR-223-3p	Differentiation of endocrine HTN subtypes from primary HTN	Models trained with the miRNAs achieved balanced accuracy of 0.71-0.89 and AUCs of 0.80-0.90 across the HTN subtype comparisons
Sajid et al[17], 2024	Pakistan	CAD	CAD cases (58), controls without CAD/stenosis < 50% (55)	MiR-21, miR-33a, miR-133a, miR-145, miR-146a	Diagnosis of CAD	ML models using miRNA biomarkers showed good diagnostic performance for angiography-defined CAD
Yerukala Sathipati et al[18], 2025	United States	Post-operative AF after CABG	Cases (7), controls (8)	Hsa-miR-19a-3p, hsa-miR-19b-3p, hsa-miR-184, hsa-let-7a-5p, hsa-miR-124-3p, hsa-miR-200a-3p, hsa-miR-423-5p, hsa-miR-96-5p, hsa-miR-100-5p, hsa-miR-17-5p	Prediction of post-operative AF after CABG	A signature of 10 pre-operative circulating miRNAs was used to develop ML models for predicting POAF after CABG
Jusic et al[19], 2023	Luxembourg/Bosnia and Herzegovina	HTN	89 cases, 85 controls	MiR-361-3p and miR-501-5p	Diagnosis of HTN	SVM model using the two miRNAs plus clinical characteristics achieved accuracy of 0.87, specificity of 0.91, sensitivity of 0.83, and AUC of 0.90
Errington et al[20], 2021	United Kingdom	PAH	64 cases, 43 disease and healthy controls	MiR-636 and miR-187-5p	Diagnosis of PAH	Models using the two miRNAs differentiated PAH patients from controls with good diagnostic accuracy (AUC 0.82-0.85 for the best models)

CVD: Cardiovascular disease; CAD: Coronary artery disease; HTN: Hypertension; AMI: Acute myocardial infarction; ACS: Acute coronary syndrome; AF: Atrial fibrillation; PAH: Pulmonary arterial hypertension; CABG: Coronary artery bypass grafting; miRNA: MicroRNA; STEMI: ST elevation myocardial infarction; PCI: Percutaneous coronary intervention; SCAD: Smoothly clipped absolute deviation; LASSO: Least absolute shrinkage and selection operator; LR: Logistic regression; ROC: Receiver operating characteristic; MI: Myocardial infarction; AUC: Area under the curve; POAF: Post-operative atrial fibrillation; SVM: Support vector machines; ANOVA: Analysis of variance; ML: Machine learning.
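Kayvanpour et al[8] selected their top 10 miRNAs by ANOVA F-value. The sketch below illustrates that ranking step in plain Python, assuming a samples-by-miRNAs expression matrix with one class label per sample. The function names, miRNA names, and expression values are hypothetical, and this is not the authors' pipeline; in practice the same one-way ANOVA statistic is available in scikit-learn as `sklearn.feature_selection.f_classif`.

```python
from statistics import mean

def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature across the classes in `labels`.

    Assumes at least two classes and non-zero within-class variance.
    """
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    means = {c: mean(g) for c, g in groups.items()}
    # Between-class sum of squares: spread of class means around the grand mean
    ss_between = sum(len(g) * (means[c] - grand) ** 2 for c, g in groups.items())
    # Within-class sum of squares: spread of samples around their own class mean
    ss_within = sum((v - means[c]) ** 2 for c, g in groups.items() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def top_k_by_f(matrix, labels, names, k):
    """Rank features (columns of `matrix`) by F-value; return the k highest-scoring names."""
    scores = {name: anova_f([row[j] for row in matrix], labels) for j, name in enumerate(names)}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy data: 6 samples x 3 miRNAs
names = ["miR-a", "miR-b", "miR-c"]
labels = ["ACS", "ACS", "ACS", "control", "control", "control"]
matrix = [
    [5.0, 2.0, 3.0],
    [5.2, 3.0, 4.0],
    [4.8, 1.0, 5.0],
    [1.0, 2.1, 1.0],
    [1.2, 2.9, 2.0],
    [0.8, 1.0, 3.0],
]
print(top_k_by_f(matrix, labels, names, 2))  # ['miR-a', 'miR-c']
```

Here miR-a separates the two groups cleanly (F = 600), miR-c moderately (F = 6), and miR-b not at all (F = 0), so a top-2 cut keeps miR-a and miR-c.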


Table 2 Characteristics of the models used in the included studies

Ref.	Models evaluated	Internal validation strategy	External validation	Training dataset	Test/validation dataset	Performance metrics
Kayvanpour et al[8], 2021	LR, kNN, LDA, NB, RF, CT, SVM, XGB, and ANN	Ten-fold cross-validation: subjects were split 9:1 into training and test sets, with the split repeated 10 times so that each tenth served once as the test set	None	90% of subjects; 121 samples per split	10% of subjects; 13 samples per split	Accuracy, sensitivity, specificity, and ROC-AUC
Ren et al[13], 2024	Regularized LR using either SCAD or LASSO	Leave-one-out cross-validation	None for the ML model; selected miRNAs were biologically evaluated in matched clinical samples using an ion-exchange membrane sensor platform	24 subjects; 800-miRNA screening library (100%)	None; leave-one-out cross-validation was used because of small sample size	ROC curves and AUC (used to evaluate the selected miRNA combinations)
Samadishadlou et al[14], 2023	A 2-layer architecture utilizing SVM (with linear, polynomial, and RBF kernels), LR, RF, kNN, GB, XGB, and DT models (layer 1 separated healthy from non-healthy samples; layer 2 separated MI from CAD)	Data were split 7:3 into training and test sets, followed by ten-fold cross-validation	None	70% of all the samples	30% of all the samples	AUC-ROC, accuracy, sensitivity, specificity, and confusion matrix
Samadishadlou et al[15], 2024	SVM, GB, XGB, and hard voting ensemble model	Two phases. miRNA selection: LASSO with 10-fold cross-validation was used to select the miRNAs for model development. Model selection: the training dataset was split 7:3 into training and validation sets, models were evaluated with 5-fold cross-validation, and the best-performing models were then tested on the independent dataset	Performed using an independent dataset (GSE29532)	GSE61741 (62 MI samples and 94 healthy samples)	GSE29532 (8 MI samples and 6 healthy samples)	Accuracy, AUC-ROC, sensitivity, and specificity
Reel et al[16], 2025	J48, NB, IBk, RF, LB, LMT, SL, and SMO	Data were randomly split 8:2 into training and test sets for model development and validation	None	80% of all the samples	20% of all the samples	Balanced accuracy (primary metric); sensitivity, specificity, AUC-ROC, F1 score, and kappa statistic
Sajid et al[17], 2024	LR, SVM, nonlinear kNN, tree-based (DT, RF), GB, XGBM, CBoost, ABoost, and ensemble voting	Data were first split 8:2 into a CV subset and a hold-out subset for final evaluation. The CV subset was divided into 10 folds; in each of 10 rounds, nine folds were used for training and one fold for testing. The best models were then tested on the hold-out subset	None	80% of the 113 subjects (cohort: 58 CAD cases, 55 controls)	20% of the 113 subjects (hold-out subset)	Accuracy, sensitivity, specificity, AUC-ROC, performance evaluation measure, F-statistic, and P values
Yerukala Sathipati et al[18], 2025	kNN, XGB, SVM, and RF	Data were split 8:2 into training and validation datasets	Performed using an independent GEO dataset (GSE222739)	80% of the dataset (n = 12)	20% of the dataset (n = 3)	AUC-ROC, accuracy, specificity, and sensitivity
Jusic et al[19], 2023	RF, SVM, MLP, XGB, kNN, Logit	Hyperparameter tuning used 10-fold CV repeated twice; the final model was also evaluated using leave-one-out cross-validation	None	147 subjects (89 from the validation cohort + 58 from the sequenced discovery cohort)	23 subjects (randomly extracted as 20% of the 112-subject validation cohort)	AUC-ROC, balanced accuracy, F1 score, precision, sensitivity, and specificity
Errington et al[20], 2021	RF, Rpart, LASSO, XGB, and Ensemble	Data were split into training and validation sets, and models were evaluated with 10-fold CV within the training set	The models were externally validated using publicly available datasets	Two-thirds of the samples	One-third of the samples (validation set)	Sensitivity, specificity, AUC, correct classification rate (accuracy), positive predictive value, and negative predictive value

ML: Machine learning; CV: Cross validation; SVM: Support vector machines; CT: Classification tree; NB: Naïve Bayes; LR: Logistic regression; SCAD: Smoothly clipped absolute deviation; LASSO: Least absolute shrinkage and selection operator; RF: Random forests; XGB: XGBoost; GB: Gradient boosting; kNN: K-nearest neighbors; DT: Decision tree; LB: LogitBoost; SL: Simple logistic; LMT: Logistic model tree; SMO: Sequential minimal optimization; CBoost: CatBoost; ABoost: AdaBoost; ANN: Artificial neural networks; Rpart: Recursive partitioning and regression trees; LDA: Linear discriminant analysis; miRNA: MicroRNA; ROC: Receiver operating characteristic; AUC: Area under the curve; RBF: Radial basis function; CAD: Coronary artery disease; MI: Myocardial infarction; XGBM: Extreme gradient boosting machine; MLP: Multilayer perceptron; J48: C4.5 decision tree (Weka implementation); IBk: Instance-based k-nearest neighbors (Weka implementation); Logit: Logistic regression; GEO: Gene Expression Omnibus.
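Several studies in Table 2 combine a random hold-out split with k-fold cross-validation; Sajid et al[17], for example, set aside 20% of subjects as a hold-out subset and ran 10-fold CV on the rest. A minimal, unstratified sketch of that index bookkeeping follows, assuming samples are identified by integer indices. The function names and seed are hypothetical, and the authors' exact rounding and any class stratification are not described in the table.

```python
import random

def holdout_then_kfold(n_samples, holdout_frac=0.2, n_folds=10, seed=0):
    """Shuffle sample indices, reserve a hold-out set, and split the rest into k folds."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_holdout = round(n_samples * holdout_frac)
    holdout, cv_pool = idx[:n_holdout], idx[n_holdout:]
    # Deal the CV pool out round-robin so fold sizes differ by at most one
    folds = [cv_pool[i::n_folds] for i in range(n_folds)]
    return holdout, folds

def cv_rounds(folds):
    """Yield (train, test) index lists; each fold is the test fold exactly once."""
    for i, test in enumerate(folds):
        train = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train, test

holdout, folds = holdout_then_kfold(113)
for train, test in cv_rounds(folds):
    # fit each candidate model on `train` and score it on `test` (omitted here)
    assert not set(train) & set(test)
# the best model from the CV rounds is evaluated once on `holdout`
print(len(holdout), [len(f) for f in folds])  # 23 [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
```

With 113 subjects this sketch yields a 23-subject hold-out set and ten 9-subject folds, so each CV round trains on 81 subjects. Because the hold-out indices never enter `cv_rounds`, they stay untouched until final evaluation; a real pipeline would usually also stratify by class so the case/control ratio stays similar across folds.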


Table 3 Key diagnostic performance metrics of machine learning models integrating microRNAs for cardiovascular disease diagnosis

Ref.	Best model(s)	AUC-ROC (range or best)	Accuracy (best reported)	Sensitivity (best)	Specificity (best)	Notes on interpretation
Kayvanpour et al[8], 2021	ANN (SVM, kNN, LDA, and RF also performed well)	0.87-0.99	0.87-0.96	0.87-0.95	0.87-1.00	Good internal discriminative performance, but no external validation; risk of optimistic bias
Ren et al[13], 2024	Regularized LR (LASSO/SCAD)	0.5 to approximately 1.0	NR	NR	NR	Focus on miRNA identification; no full diagnostic model metrics
Samadishadlou et al[14], 2023	Two-layer architecture utilizing SVM (RBF)	0.96 (layer 2) to 1.0 (layer 1)	0.96 (overall two-layer architecture)	0.97 (layer 2) to 1.0 (layer 1)	0.86 (layer 2) to 1.0 (layer 1)	Good internal performance (the two-layer approach isolated healthy samples perfectly), but no external validation cohort was used
Samadishadlou et al[15], 2024	HVE (aggregating SVM, GB, and XGB)	0.83 (HVE on test set)	0.86	1.00	0.67	Very small test set (14 samples total: 8 MI, 6 healthy) limits reliability; platform differences between training and test sets impacted individual model performance
Reel et al[16], 2025	LMT/LogitBoost (along with SL and SMO)	0.80-0.90	0.71-0.89 (balanced accuracy)	0.43-0.95	0.83-1.00	Moderate-large sample; balanced accuracy used
Sajid et al[17], 2024	AdaBoost (for miRNA biomarkers) and GB (for atherosclerosis inflammatory biomarkers)	0.88-0.95 (CV)/0.76-0.93 (hold-out)	0.87-0.90 (CV)/0.78-0.96 (hold-out)	0.88-0.92 (CV)/0.71-0.86 (hold-out)	0.96-1.00 (CV)/0.81-1.00 (hold-out)	Moderate sample; strong internal metrics but no external validation
Yerukala Sathipati et al[18], 2025	RF/XGB	0.76-0.83	0.73-0.80	0.75-0.87	0.71	Very small sample and high risk of overfitting, although external validation was performed
Jusic et al[19], 2023	SVM	0.90	0.87	0.83	0.91	Moderate sample; internal validation only
Errington et al[20], 2021	RF, XGB and Ensemble model	0.82-0.85	0.81-0.83	0.86-0.91	0.64-0.71	External validation performed, so estimates are more reliable

HVE: Hard voting ensemble; CV: Cross validation; SVM: Support vector machines; ROC: Receiver operating characteristic; AUC: Area under the curve; ANN: Artificial neural networks; kNN: K-nearest neighbors; RF: Random forests; LR: Logistic regression; LASSO: Least absolute shrinkage and selection operator; SCAD: Smoothly clipped absolute deviation; miRNA: MicroRNA; RBF: Radial basis function; XGB: XGBoost; GB: Gradient boosting; LMT: Logistic model tree; SL: Simple logistic; SMO: Sequential minimal optimization; MI: Myocardial infarction; LDA: Linear discriminant analysis; NR: Not reported.
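The sensitivity, specificity, accuracy, and balanced accuracy values in Table 3 all derive from a 2 × 2 confusion matrix. The sketch below computes them, together with the trapezoidal area under an ROC curve built from a single operating point, which is the only ROC information available when a classifier outputs hard labels (as a hard voting ensemble does). As a worked check, the counts TP = 8, FN = 0, TN = 4, FP = 2 are our reconstruction from the metrics reported for Samadishadlou et al[15] (8 MI and 6 healthy test samples, sensitivity 1.00, specificity 0.67); they are not reported in the table, and the table does not state how that study computed its AUC. Function names are illustrative.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard binary diagnostic metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true-positive rate among diseased samples
    specificity = tn / (tn + fp)  # true-negative rate among non-diseased samples
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # Mean of the two class-wise rates; unlike accuracy, not dominated by the larger class
    balanced_accuracy = (sensitivity + specificity) / 2
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "balanced_accuracy": balanced_accuracy}

def trapezoid_auc(points):
    """Area under an ROC curve given (false-positive rate, true-positive rate) points sorted by FPR."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Counts reconstructed from reported metrics (an assumption), not taken from the study
m = diagnostic_metrics(tp=8, fn=0, tn=4, fp=2)
fpr = 1 - m["specificity"]
auc = trapezoid_auc([(0.0, 0.0), (fpr, m["sensitivity"]), (1.0, 1.0)])
print({k: round(v, 2) for k, v in m.items()}, round(auc, 2))
# {'sensitivity': 1.0, 'specificity': 0.67, 'accuracy': 0.86, 'balanced_accuracy': 0.83} 0.83
```

The reconstructed accuracy (12/14, about 0.86) matches the reported value. With a single operating point the ROC area reduces algebraically to (sensitivity + specificity)/2, here about 0.83, which happens to equal the reported AUC; whether the study computed AUC this way is not stated. The same average is balanced accuracy, the primary metric of Reel et al[16], which is less affected by class imbalance than raw accuracy.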


Table 4 Risk-of-bias assessment using QUADAS-2 domains

Ref.	Patient selection (risk of bias/applicability)	Index test (including feature selection and overfitting risk)	Reference standard	Flow and timing (including external validation)	Overall ML-specific concerns
Kayvanpour et al[8], 2021	Low/Low	Low (10-fold CV was reported, but feature-selection timing was not fully clear)	Low (diagnosis strictly adjudicated by a board of 3 expert cardiologists using ESC guidelines)	Low (internal CV only; no external)	Moderate overfitting risk owing to the absence of external validation
Ren et al[13], 2024	Unclear/Low	Low (LASSO/SCAD used for selection; validation on matched samples)	Unclear	High (no formal validation split reported)	Limited validation; ML mainly for feature identification
Samadishadlou et al[14], 2023	Low/Low	Low (7:3 split and 10-fold CV)	Unclear	Low (internal only)	Small test set; class imbalance addressed using sample weighting
Samadishadlou et al[15], 2024	Low/Low	Low (LASSO and 5/10-fold CV)	Unclear	Low (internal and independent test set)	Improved validation compared to prior work
Reel et al[16], 2025	Low/Low	Low (8:2 split and balanced accuracy metric)	Unclear	Low (internal only)	Use of balanced accuracy helps with potential imbalance
Sajid et al[17], 2024	Low/Low	Low (8:2 and 10-fold CV and hold-out)	Unclear	Low (internal only)	Ensemble methods; good internal practices
Yerukala Sathipati et al[18], 2025	Unclear/Low (small n = 15)	Low (8:2 split)	Unclear	Low (external validation performed via independent GEO dataset)	Very small sample; high risk of overfitting
Jusic et al[19], 2023	Low/Low	Low (8:2 and 10-fold CV)	Unclear	Low (internal only)	Small cohort
Errington et al[20], 2021	Low/Low	Low (10-fold CV and external datasets)	Unclear	Low (external validation performed)	Strongest validation approach among included studies

CV: Cross validation; ML: Machine learning; ESC: European Society of Cardiology; LASSO: Least absolute shrinkage and selection operator; SCAD: Smoothly clipped absolute deviation; GEO: Gene Expression Omnibus; QUADAS-2: Quality Assessment of Diagnostic Accuracy Studies-2.
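Table 4 flags unclear feature-selection timing as an index-test concern. If miRNAs are ranked on the full dataset before cross-validation, the held-out folds have already influenced which features are kept, and cross-validated performance can be optimistic. A minimal sketch of the leakage-safe alternative follows, re-ranking features inside each training fold only. The scoring rule (absolute difference of class means) is a deliberately simple, hypothetical stand-in for the ANOVA F-value or LASSO selection used in the included studies, and the function names and data are illustrative.

```python
from statistics import mean

def mean_gap(values, labels):
    """Toy relevance score: absolute difference in class means (labels are 0/1)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return abs(mean(pos) - mean(neg))

def select_within_fold(matrix, labels, train_idx, k):
    """Return indices of the top-k features, ranked using training rows only."""
    train_y = [labels[i] for i in train_idx]
    n_features = len(matrix[0])
    scores = [mean_gap([matrix[i][j] for i in train_idx], train_y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]

# Hypothetical data: 6 samples x 2 features; feature 0 separates the classes
matrix = [[5.0, 1.0], [6.0, 2.0], [5.5, 1.5], [1.0, 1.0], [2.0, 2.0], [1.5, 1.5]]
labels = [1, 1, 1, 0, 0, 0]
test_idx = [2, 5]  # one held-out fold
train_idx = [i for i in range(len(labels)) if i not in test_idx]
print(select_within_fold(matrix, labels, train_idx, 1))  # [0]

# Altering a held-out row cannot change the in-fold selection, because it is never scored
leaky = [row[:] for row in matrix]
leaky[5] = [0.0, 100.0]
print(select_within_fold(leaky, labels, train_idx, 1))  # [0]

# Ranking on all rows (selection before CV) lets that held-out row flip the choice
print(select_within_fold(leaky, labels, list(range(len(labels))), 1))  # [1]
```

The last call shows the failure mode: when selection sees every sample, a single held-out value decides which feature the model is trained on, so the later CV estimate is no longer independent of the test folds.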


Citation: Popat A, Sathipati S, Sharma P. Machine learning integration in microRNA-based markers for cardiovascular diseases: A systematic review. World J Cardiol 2026; 18(6): 120747
URL: https://www.wjgnet.com/1949-8462/full/v18/i6/120747.htm
DOI: https://dx.doi.org/10.4330/wjc.120747