Table 2

The effect of different machine learning algorithms on model prediction performance (bootstrapping)

| Machine learning algorithm | AUC (mean±SD) | AUC (95% CI) | Accuracy (mean±SD) | Accuracy (95% CI) | Precision (mean±SD) | Precision (95% CI) | Recall rate (mean±SD) | Recall rate (95% CI) | F1 value (mean±SD) | F1 value (95% CI) |
|---|---|---|---|---|---|---|---|---|---|---|
| AdaBoost | 0.702±0.104 | 0.700 to 0.703 | 0.761±0.061 | 0.760 to 0.762 | 0.434±0.134 | 0.432 to 0.436 | 0.538±0.142 | 0.535 to 0.540 | 0.465±0.105 | 0.463 to 0.467 |
| Bagging | 0.749±0.083 | 0.748 to 0.750 | 0.776±0.064 | 0.774 to 0.777 | 0.457±0.137 | 0.454 to 0.459 | 0.486±0.159 | 0.483 to 0.489 | 0.452±0.112 | 0.450 to 0.454 |
| Bernoulli NB | 0.718±0.099 | 0.716 to 0.720 | 0.771±0.056 | 0.770 to 0.772 | 0.444±0.133 | 0.442 to 0.447 | 0.541±0.141 | 0.538 to 0.543 | 0.475±0.109 | 0.474 to 0.477 |
| DT | 0.667±0.085 | 0.665 to 0.668 | 0.738±0.067 | 0.737 to 0.739 | 0.388±0.127 | 0.386 to 0.390 | 0.491±0.151 | 0.489 to 0.494 | 0.417±0.105 | 0.416 to 0.419 |
| Ensemble Learning | 0.793±0.083 | 0.791 to 0.794 | 0.810±0.058 | 0.809 to 0.811 | 0.545±0.157 | 0.543 to 0.548 | 0.576±0.162 | 0.573 to 0.579 | 0.537±0.108 | 0.535 to 0.539 |
| ET | 0.596±0.097 | 0.594 to 0.598 | 0.703±0.081 | 0.701 to 0.704 | 0.308±0.149 | 0.305 to 0.310 | 0.393±0.186 | 0.390 to 0.396 | 0.326±0.139 | 0.324 to 0.329 |
| Gaussian NB | 0.667±0.106 | 0.665 to 0.669 | 0.720±0.061 | 0.719 to 0.721 | 0.364±0.106 | 0.362 to 0.366 | 0.543±0.133 | 0.541 to 0.545 | 0.429±0.103 | 0.427 to 0.431 |
| Gradient boosting | 0.718±0.100 | 0.716 to 0.720 | 0.783±0.060 | 0.782 to 0.784 | 0.487±0.161 | 0.484 to 0.490 | 0.524±0.144 | 0.521 to 0.526 | 0.481±0.105 | 0.479 to 0.483 |
| KNN | 0.655±0.101 | 0.654 to 0.657 | 0.741±0.086 | 0.740 to 0.743 | 0.394±0.262 | 0.389 to 0.399 | 0.355±0.217 | 0.351 to 0.359 | 0.316±0.166 | 0.313 to 0.319 |
| LDA | 0.724±0.097 | 0.722 to 0.725 | 0.770±0.065 | 0.769 to 0.772 | 0.457±0.149 | 0.454 to 0.459 | 0.561±0.141 | 0.558 to 0.564 | 0.487±0.110 | 0.485 to 0.489 |
| LR | 0.728±0.094 | 0.727 to 0.730 | 0.770±0.070 | 0.769 to 0.771 | 0.465±0.155 | 0.462 to 0.467 | 0.580±0.143 | 0.577 to 0.583 | 0.497±0.110 | 0.495 to 0.499 |
| Multinomial NB | 0.727±0.099 | 0.725 to 0.728 | 0.753±0.071 | 0.752 to 0.754 | 0.450±0.170 | 0.447 to 0.453 | 0.570±0.175 | 0.567 to 0.573 | 0.467±0.111 | 0.465 to 0.469 |
| Passive aggressive | 0.686±0.094 | 0.684 to 0.688 | 0.701±0.087 | 0.699 to 0.703 | 0.358±0.119 | 0.355 to 0.360 | 0.558±0.156 | 0.555 to 0.560 | 0.421±0.107 | 0.419 to 0.423 |
| QDA | 0.660±0.115 | 0.658 to 0.662 | 0.774±0.057 | 0.773 to 0.775 | 0.428±0.178 | 0.425 to 0.431 | 0.436±0.188 | 0.433 to 0.440 | 0.411±0.152 | 0.408 to 0.413 |
| RF | 0.742±0.088 | 0.741 to 0.744 | 0.792±0.075 | 0.791 to 0.793 | 0.534±0.194 | 0.531 to 0.538 | 0.430±0.155 | 0.427 to 0.432 | 0.444±0.119 | 0.441 to 0.446 |
| SGD | 0.720±0.099 | 0.718 to 0.722 | 0.762±0.064 | 0.761 to 0.764 | 0.452±0.196 | 0.448 to 0.455 | 0.507±0.213 | 0.503 to 0.511 | 0.434±0.141 | 0.432 to 0.437 |
| SVM | 0.735±0.090 | 0.734 to 0.737 | 0.792±0.073 | 0.790 to 0.793 | 0.533±0.194 | 0.529 to 0.536 | 0.443±0.165 | 0.440 to 0.446 | 0.449±0.115 | 0.447 to 0.451 |
| XGBoost | 0.740±0.095 | 0.738 to 0.741 | 0.790±0.074 | 0.789 to 0.792 | 0.515±0.161 | 0.512 to 0.518 | 0.513±0.165 | 0.510 to 0.516 | 0.486±0.112 | 0.484 to 0.488 |
| p value | p<0.0001 | | p<0.0001 | | p<0.0001 | | p<0.0001 | | p<0.0001 | |
AUC, area under the curve; DT, Decision Tree; ET, Extra Tree; KNN, K-Nearest Neighbour; LDA, Linear Discriminant Analysis; LR, Logistic Regression; NB, Naïve Bayes; QDA, Quadratic Discriminant Analysis; SGD, Stochastic Gradient Descent; SVM, Support Vector Machine.