The effect of different machine learning algorithms on model prediction performance (bootstrapping)
Machine learning algorithm | AUC, Mean±SD | AUC, 95% CI | Accuracy, Mean±SD | Accuracy, 95% CI | Precision, Mean±SD | Precision, 95% CI | Recall, Mean±SD | Recall, 95% CI | F1 score, Mean±SD | F1 score, 95% CI |
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
AdaBoost | 0.702±0.104 | 0.700 to 0.703 | 0.761±0.061 | 0.760 to 0.762 | 0.434±0.134 | 0.432 to 0.436 | 0.538±0.142 | 0.535 to 0.540 | 0.465±0.105 | 0.463 to 0.467 |
Bagging | 0.749±0.083 | 0.748 to 0.750 | 0.776±0.064 | 0.774 to 0.777 | 0.457±0.137 | 0.454 to 0.459 | 0.486±0.159 | 0.483 to 0.489 | 0.452±0.112 | 0.450 to 0.454 |
Bernoulli NB | 0.718±0.099 | 0.716 to 0.720 | 0.771±0.056 | 0.770 to 0.772 | 0.444±0.133 | 0.442 to 0.447 | 0.541±0.141 | 0.538 to 0.543 | 0.475±0.109 | 0.474 to 0.477 |
DT | 0.667±0.085 | 0.665 to 0.668 | 0.738±0.067 | 0.737 to 0.739 | 0.388±0.127 | 0.386 to 0.390 | 0.491±0.151 | 0.489 to 0.494 | 0.417±0.105 | 0.416 to 0.419 |
Ensemble Learning | 0.793±0.083 | 0.791 to 0.794 | 0.810±0.058 | 0.809 to 0.811 | 0.545±0.157 | 0.543 to 0.548 | 0.576±0.162 | 0.573 to 0.579 | 0.537±0.108 | 0.535 to 0.539 |
ET | 0.596±0.097 | 0.594 to 0.598 | 0.703±0.081 | 0.701 to 0.704 | 0.308±0.149 | 0.305 to 0.310 | 0.393±0.186 | 0.390 to 0.396 | 0.326±0.139 | 0.324 to 0.329 |
Gaussian NB | 0.667±0.106 | 0.665 to 0.669 | 0.720±0.061 | 0.719 to 0.721 | 0.364±0.106 | 0.362 to 0.366 | 0.543±0.133 | 0.541 to 0.545 | 0.429±0.103 | 0.427 to 0.431 |
Gradient boosting | 0.718±0.100 | 0.716 to 0.720 | 0.783±0.060 | 0.782 to 0.784 | 0.487±0.161 | 0.484 to 0.490 | 0.524±0.144 | 0.521 to 0.526 | 0.481±0.105 | 0.479 to 0.483 |
KNN | 0.655±0.101 | 0.654 to 0.657 | 0.741±0.086 | 0.740 to 0.743 | 0.394±0.262 | 0.389 to 0.399 | 0.355±0.217 | 0.351 to 0.359 | 0.316±0.166 | 0.313 to 0.319 |
LDA | 0.724±0.097 | 0.722 to 0.725 | 0.770±0.065 | 0.769 to 0.772 | 0.457±0.149 | 0.454 to 0.459 | 0.561±0.141 | 0.558 to 0.564 | 0.487±0.110 | 0.485 to 0.489 |
LR | 0.728±0.094 | 0.727 to 0.730 | 0.770±0.070 | 0.769 to 0.771 | 0.465±0.155 | 0.462 to 0.467 | 0.580±0.143 | 0.577 to 0.583 | 0.497±0.110 | 0.495 to 0.499 |
Multinomial NB | 0.727±0.099 | 0.725 to 0.728 | 0.753±0.071 | 0.752 to 0.754 | 0.450±0.170 | 0.447 to 0.453 | 0.570±0.175 | 0.567 to 0.573 | 0.467±0.111 | 0.465 to 0.469 |
Passive aggressive | 0.686±0.094 | 0.684 to 0.688 | 0.701±0.087 | 0.699 to 0.703 | 0.358±0.119 | 0.355 to 0.360 | 0.558±0.156 | 0.555 to 0.560 | 0.421±0.107 | 0.419 to 0.423 |
QDA | 0.660±0.115 | 0.658 to 0.662 | 0.774±0.057 | 0.773 to 0.775 | 0.428±0.178 | 0.425 to 0.431 | 0.436±0.188 | 0.433 to 0.440 | 0.411±0.152 | 0.408 to 0.413 |
RF | 0.742±0.088 | 0.741 to 0.744 | 0.792±0.075 | 0.791 to 0.793 | 0.534±0.194 | 0.531 to 0.538 | 0.430±0.155 | 0.427 to 0.432 | 0.444±0.119 | 0.441 to 0.446 |
SGD | 0.720±0.099 | 0.718 to 0.722 | 0.762±0.064 | 0.761 to 0.764 | 0.452±0.196 | 0.448 to 0.455 | 0.507±0.213 | 0.503 to 0.511 | 0.434±0.141 | 0.432 to 0.437 |
SVM | 0.735±0.090 | 0.734 to 0.737 | 0.792±0.073 | 0.790 to 0.793 | 0.533±0.194 | 0.529 to 0.536 | 0.443±0.165 | 0.440 to 0.446 | 0.449±0.115 | 0.447 to 0.451 |
XGBoost | 0.740±0.095 | 0.738 to 0.741 | 0.790±0.074 | 0.789 to 0.792 | 0.515±0.161 | 0.512 to 0.518 | 0.513±0.165 | 0.510 to 0.516 | 0.486±0.112 | 0.484 to 0.488 |
p value | p<0.0001 | | p<0.0001 | | p<0.0001 | | p<0.0001 | | p<0.0001 | |
AUC, area under the curve; CI, confidence interval; DT, Decision Tree; ET, Extra Tree; KNN, K-Nearest Neighbour; LDA, Linear Discriminant Analysis; LR, Logistic Regression; NB, Naïve Bayes; QDA, Quadratic Discriminant Analysis; RF, Random Forest; SD, standard deviation; SGD, Stochastic Gradient Descent; SVM, Support Vector Machine.
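For each algorithm, the table reports the mean and SD of each metric across bootstrap replicates together with a 95% CI. The source does not state the resampling scheme, the number of replicates, or how the intervals were computed. The sketch below assumes one common scheme: resampling (label, prediction) pairs with replacement and scoring each replicate. The reported intervals are far narrower than the SDs, which is consistent with a normal-approximation CI for the mean across replicates (mean ± 1.96·SD/√B) rather than a percentile interval of the replicate distribution, which would span roughly ±2 SD. That is what `summarize` computes here. All function names, the replicate count `n_boot` and the seed are hypothetical, and AUC is left out because it needs predicted scores rather than hard labels.

```python
import math
import random
import statistics

# Hypothetical helpers: the source does not specify its bootstrap procedure.


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for one set of predictions.

    Precision, recall and F1 are set to 0.0 when their denominator is zero
    (an assumption; libraries differ in how they handle this case).
    """
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def summarize(values, z=1.96):
    """Mean, sample SD and a normal-approximation 95% CI for the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # needs at least two values
    half_width = z * sd / math.sqrt(len(values))
    return mean, sd, (mean - half_width, mean + half_width)


def bootstrap_metrics(y_true, y_pred, n_boot=1000, seed=0):
    """Resample (label, prediction) pairs with replacement and score each replicate."""
    rng = random.Random(seed)
    n = len(y_true)
    per_metric = {"accuracy": [], "precision": [], "recall": [], "f1": []}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores = classification_metrics([y_true[i] for i in idx], [y_pred[i] for i in idx])
        for name, value in scores.items():
            per_metric[name].append(value)
    # One (mean, SD, CI) triple per metric, in the same layout as the table.
    return {name: summarize(vals) for name, vals in per_metric.items()}


if __name__ == "__main__":
    # Toy labels and predictions, for illustration only.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]
    for name, (mean, sd, (lo, hi)) in bootstrap_metrics(y_true, y_pred).items():
        print(f"{name}: {mean:.3f}±{sd:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

Because each replicate's F1 is computed from that replicate's own precision and recall, the mean F1 across replicates need not equal the F1 implied by the mean precision and mean recall reported in the same row.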