Yang CC, Peng CH, Huang LY, Chen FY, Kuo CH, Wu CZ, Hsia TL, Lin CY. Comparison between multiple logistic regression and machine learning methods in prediction of abnormal thallium scans in type 2 diabetes. World J Clin Cases 2023; 11(33): 7951-7964 [PMID: 38075576 DOI: 10.12998/wjcc.v11.i33.7951]
Corresponding Author of This Article
Chung-Yu Lin, MD, Doctor, Department of Cardiology, Fu Jen Catholic University Hospital, No. 69 Guizi Road, Taishan District, New Taipei City 24352, Taiwan. a02076@mail.fjuh.fju.edu.tw
Research Domain of This Article
Endocrinology & Metabolism
Article-Type of This Article
Retrospective Cohort Study
Open-Access Policy of This Article
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
World J Clin Cases. Nov 26, 2023; 11(33): 7951-7964 Published online Nov 26, 2023. doi: 10.12998/wjcc.v11.i33.7951
Table 1 Participant demographics

| Variables | Mean ± SD | N |
| --- | --- | --- |
| Age | 67.38 ± 9.69 | 556 |
| Body mass index | 26.16 ± 3.9 | 556 |
| Duration of diabetes | 13.69 ± 7.94 | 556 |
| Systolic blood pressure | 131.14 ± 15.42 | 493 |
| Diastolic blood pressure | 73.32 ± 10.15 | 493 |
| Hemoglobin | 12.92 ± 1.68 | 444 |
| Triglyceride | 153.74 ± 45.85 | 539 |
| Glycated hemoglobin | 7.79 ± 1.36 | 538 |
| High density lipoprotein cholesterol | 122.65 ± 74.34 | 535 |
| Low density lipoprotein cholesterol | 49.65 ± 14.75 | 498 |
| Alanine aminotransferase | 23.87 ± 13.94 | 537 |
| Creatinine | 1.16 ± 1 | 536 |
| Microalbumin creatinine ratio | 194.18 ± 733.73 | 526 |
| Homeostasis assessment-insulin resistance | 0.63 ± 0.34 | 366 |
| Homeostasis assessment-insulin secretion | 1.71 ± 0.37 | 366 |
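Table 2 Participant demographics – sex, smoking and sum stressed score

| Variables | N (%) | N |
| --- | --- | --- |
| Sex | | 556 |
| 0 | 287 (51.62) | |
| 1 | 269 (48.38) | |
| Smoking | | 310 |
| 0 | 202 (65.16) | |
| 1 | 108 (34.84) | |
| Sum stressed score | | 556 |
| 0 | 180 (32.37) | |
| 1 | 376 (67.63) | |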
Table 3 Summary of the hyperparameter values for the best random forest, classification and regression tree, Naïve Bayes classifier, and eXtreme gradient boosting models

| Methods | Hyperparameters | Best value | Meaning |
| --- | --- | --- | --- |
| RF | Mtry | 8 | The number of features randomly sampled as candidates at each split |
| RF | Ntree | 500 | The number of trees in the forest |
| CART | Minsplit | 20 | The minimum number of observations required to attempt a split in a node |
| CART | Minbucket | 7 | The minimum number of observations in a terminal node |
| CART | Maxdepth | 10 | The maximum depth of any node in the final tree |
| CART | Xval | 10 | Number of cross-validations |
| CART | Cp | 0.03588 | Complexity parameter: the minimum improvement required in the model at each node |
| XGBoost | Nrounds | 100 | The number of tree model iterations |
| XGBoost | Max_depth | 3 | The maximum depth of a tree |
| XGBoost | Eta | 0.4 | Shrinkage coefficient (learning rate) applied to each tree |
| XGBoost | Gamma | 0 | The minimum loss reduction required to make a further split |
| XGBoost | Subsample | 0.75 | Subsample ratio of the training instances (rows) used to build each tree |
| XGBoost | Colsample_bytree | 0.8 | Subsample ratio of columns when constructing each tree |
| XGBoost | Rate_drop | 0.5 | Rate of trees dropped |
| XGBoost | Skip_drop | 0.05 | Probability of skipping the dropout procedure during a boosting iteration |
| XGBoost | Min_child_weight | 1 | The minimum sum of instance weight needed in a child node |
| NB | fL | 0 | Adjustment of the Laplace smoother |
| NB | Usekernel | TRUE | Use a kernel density estimate for continuous variables instead of a Gaussian density estimate |
| NB | Adjust | 1 | Adjust the bandwidth of the kernel density |
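The XGBoost rows of Table 3 include Rate_drop and Skip_drop, which in the xgboost library are settings of the DART booster. As a rough sketch only, the best values could be written as a parameter dictionary using the parameter names of the Python xgboost package. The choice of Python, the `booster` entry, and the names `xgb_params` and `num_boost_round` are assumptions made for illustration; the tables do not show the authors' actual code.

```python
# Hypothetical mapping of the Table 3 XGBoost values onto Python xgboost
# parameter names. This is an illustration, not the authors' configuration.
xgb_params = {
    "booster": "dart",         # assumption: rate_drop / skip_drop are DART settings
    "max_depth": 3,            # Max_depth
    "eta": 0.4,                # Eta (shrinkage / learning rate)
    "gamma": 0,                # Gamma (minimum loss reduction for a split)
    "subsample": 0.75,         # Subsample (fraction of rows per tree)
    "colsample_bytree": 0.8,   # Colsample_bytree (fraction of columns per tree)
    "rate_drop": 0.5,          # Rate_drop
    "skip_drop": 0.05,         # Skip_drop
    "min_child_weight": 1,     # Min_child_weight
}
num_boost_round = 100          # Nrounds

# With xgboost installed and a training DMatrix called dtrain (not defined
# here), training would look like:
#   xgboost.train(xgb_params, dtrain, num_boost_round=num_boost_round)

# Sanity check: every fraction-valued setting must lie in [0, 1].
for key in ("subsample", "colsample_bytree", "rate_drop", "skip_drop"):
    assert 0.0 <= xgb_params[key] <= 1.0, key
```

Keeping the values in one dictionary makes it easy to compare them against the table row by row before any model is fitted.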
Table 4 The average performance of the logistic regression, classification and regression tree, random forest, eXtreme gradient boosting, and Naïve Bayes classifier methods

| Methods | Accuracy | Sensitivity | Specificity | AUC |
| --- | --- | --- | --- | --- |
| LGR | 0.685 ± 0.072 | 0.687 ± 0.152 | 0.683 ± 0.114 | 0.703 ± 0.057 |
| CART | 0.541 ± 0.074 | 0.546 ± 0.078 | 0.529 ± 0.670 | 0.540 ± 0.070 |
| RF | 0.707 ± 0.047 | 0.711 ± 0.100 | 0.678 ± 0.099 | 0.707 ± 0.037 |
| XGBoost | 0.712 ± 0.072 | 0.727 ± 0.139 | 0.674 ± 0.088 | 0.719 ± 0.062 |
| NB | 0.692 ± 0.059 | 0.702 ± 0.116 | 0.669 ± 0.090 | 0.704 ± 0.056 |
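For readers less familiar with the metrics in Table 4, the following minimal sketch shows the standard definitions of accuracy, sensitivity, and specificity computed from a binary confusion matrix. The function name and the counts are hypothetical and are not taken from the study data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),   # share of abnormal scans detected
        "specificity": tn / (tn + fp),   # share of normal scans correctly ruled out
    }

# Hypothetical test fold: 100 abnormal scans and 60 normal scans.
metrics = classification_metrics(tp=70, fp=20, tn=40, fn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Repeating this calculation over several train/test splits and summarising the results would give mean ± SD figures of the kind reported in the table.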
Table 5 The variable importance and rank of the importance of the risk factors derived from machine learning methods

| Variables | RF | XGBoost | NB | Average | Rank |
| --- | --- | --- | --- | --- | --- |
| Sex | 100.0 ± 0 | 100.0 ± 0 | 100.0 ± 0 | 100.0 | 1.0 |
| Body mass index | 54.2 ± 6.6 | 61.1 ± 14.7 | 86.2 ± 6.8 | 67.1 | 2.0 |
| Age | 13.1 ± 7.6 | 78.3 ± 13.2 | 67.9 ± 6.5 | 53.1 | 3.0 |
| Low density lipoprotein cholesterol | 30.4 ± 3.1 | 8.4 ± 12.8 | 71.0 ± 7.8 | 36.6 | 4.0 |
| Glycated hemoglobin | 15.4 ± 5.9 | 12.8 ± 11.9 | 48.0 ± 8.3 | 25.4 | 5.0 |
| Smoking | 12.2 ± 2.7 | 28.8 ± 9.2 | 34.5 ± 6.6 | 25.2 | 6.0 |
| Creatinine | 10.1 ± 2.3 | 5.3 ± 9.12 | 53.1 ± 7.3 | 22.8 | 7.0 |
| Duration | 6.3 ± 4.61 | 41.5 ± 8.6 | 10.1 ± 8.9 | 19.3 | 8.0 |
| Hemoglobin | 8.0 ± 4.16 | 16.6 ± 8.9 | 17.0 ± 5.7 | 13.8 | 9.0 |
| Blood urea nitrogen | 9.0 ± 8.15 | 6.5 ± 6.79 | 17.3 ± 9.6 | 11.0 | 10.0 |
| Systolic blood pressure | 4.2 ± 1.03 | 21.6 ± 5.1 | 6.4 ± 2.88 | 10.7 | 11.0 |
| Triglyceride | 5.4 ± 17.5 | 15.0 ± 4.4 | 11.1 ± 12.3 | 10.5 | 12.0 |
| Microalbumin | 4.3 ± 2.23 | 3.6 ± 3.83 | 22.7 ± 6.9 | 10.2 | 13.0 |
| Diastolic blood pressure | 2.5 ± 5.91 | 18.9 ± 3.7 | 5.6 ± 9.33 | 9.0 | 14.0 |
| Alanine aminotransferase | 3.2 ± 5.96 | 6.9 ± 3.90 | 13.0 ± 12.6 | 7.7 | 15.0 |
| High density lipoprotein cholesterol | 1.3 ± 3.60 | 9.8 ± 3.29 | 7.3 ± 8.41 | 6.1 | 16.0 |
| HOMA-IR | 5.7 ± 2.85 | 2.2 ± 2.52 | 10.2 ± 8.1 | 6.0 | 17.0 |
| HOMA-B | 4.3 ± 2.22 | 0.0 ± 0.00 | 7.4 ± 8.831 | 3.9 | 18.0 |
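The Average and Rank columns of Table 5 appear to be the mean of the three per-method importance values and the position of that mean in descending order. The sketch below reproduces this for four rows of the table, under that assumption; the function name is illustrative. Because the published importances are rounded to one decimal, a recomputed average can differ from the table in the last digit.

```python
# Mean importance values (RF, XGBoost, NB) copied from four rows of Table 5.
importance = {
    "Sex": (100.0, 100.0, 100.0),
    "Body mass index": (54.2, 61.1, 86.2),
    "Age": (13.1, 78.3, 67.9),
    "Low density lipoprotein cholesterol": (30.4, 8.4, 71.0),
}

def average_and_rank(scores):
    """Average each variable's importance across methods, then rank descending."""
    averages = {name: sum(vals) / len(vals) for name, vals in scores.items()}
    ordered = sorted(averages, key=averages.get, reverse=True)
    return {name: (averages[name], rank) for rank, name in enumerate(ordered, start=1)}

summary = average_and_rank(importance)
for name, (avg, rank) in summary.items():
    print(f"{rank}. {name}: {avg:.1f}")
```

Averaging across methods gives each algorithm equal weight, so a variable that one method rates highly and another ignores (Age in RF versus XGBoost, for example) still lands in the middle of the ranking.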