Model and metric | Optimal sensitivity-specificity: cross-validation (n=3608) | Optimal sensitivity-specificity: test (n=1277) | Sensitivity retained at over 90%: cross-validation (n=3608) | Sensitivity retained at over 90%: test (n=1277) |
Gradient boosted tree model | ||||
Threshold for positive prediction | 10% | 12.5% | 6% | 6% |
True negative (n (%)) | 2126 (58.9) | 829 (64.9) | 1369 (37.9) | 473 (37.0) |
True positive (n (%)) | 322 (8.9) | 103 (8.1) | 400 (11.1) | 142 (11.1) |
False negative (n (%)) | 121 (3.4) | 52 (4.1) | 43 (1.2) | 13 (1.0) |
False positive (n (%)) | 1039 (28.8) | 293 (22.9) | 1796 (49.8) | 649 (50.8) |
Sensitivity (%) | 72.7 | 66.5 | 90.3 | 91.6 |
Specificity (%) | 67.2 | 73.9 | 43.3 | 42.2 |
Positive predictive value (%) | 23.7 | 26.0 | 18.2 | 18.0 |
Negative predictive value (%) | 94.6 | 94.1 | 97.0 | 97.3 |
Logistic regression model | ||||
Threshold for positive prediction | 12.5% | 10%* | 6% | 6% |
True negative (n (%)) | 2172 (60.2) | 680 (53.2) | 1144 (31.7) | 429 (33.6) |
True positive (n (%)) | 308 (8.5) | 123 (9.6) | 405 (11.2) | 142 (11.1) |
False negative (n (%)) | 135 (3.7) | 32 (2.5) | 38 (1.1) | 13 (1.0) |
False positive (n (%)) | 993 (27.5) | 442 (34.6) | 2021 (56.0) | 693 (54.3) |
Sensitivity (%) | 69.5 | 79.4 | 91.4 | 91.6 |
Specificity (%) | 68.6 | 60.6 | 36.1 | 38.2 |
Positive predictive value (%) | 23.7 | 21.8 | 16.7 | 17.0 |
Negative predictive value (%) | 94.1 | 95.5 | 96.8 | 97.1 |
*This is the only scenario in which the optimal threshold depends on the selection criterion: the maximum sum of sensitivity and specificity and the minimal difference between sensitivity and specificity give different thresholds. Here the threshold was chosen by the maximum sum of sensitivity and specificity.
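The derived rows of the table follow directly from the four confusion-matrix counts, and the footnote contrasts two ways of choosing a threshold. The Python sketch below illustrates both. It is not the authors' code. The names `Metrics`, `metrics_from_counts` and `pick_threshold` are hypothetical, and the candidate operating points passed to `pick_threshold` are invented for illustration rather than taken from the study.

```python
from typing import NamedTuple


class Metrics(NamedTuple):
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def metrics_from_counts(tn: int, tp: int, fn: int, fp: int) -> Metrics:
    """Return the four derived metrics as percentages, rounded to one decimal."""
    return Metrics(
        sensitivity=round(100 * tp / (tp + fn), 1),
        specificity=round(100 * tn / (tn + fp), 1),
        ppv=round(100 * tp / (tp + fp), 1),
        npv=round(100 * tn / (tn + fn), 1),
    )


def pick_threshold(candidates, criterion):
    """Pick a threshold from (threshold, sensitivity, specificity) tuples.

    "max_sum" maximises sensitivity + specificity;
    "min_diff" minimises |sensitivity - specificity|.
    """
    if criterion == "max_sum":
        return max(candidates, key=lambda c: c[1] + c[2])[0]
    if criterion == "min_diff":
        return min(candidates, key=lambda c: abs(c[1] - c[2]))[0]
    raise ValueError(f"unknown criterion: {criterion}")


# Counts from the table: gradient boosted tree, cross-validation,
# optimal sensitivity-specificity column.
gbt_cv = metrics_from_counts(tn=2126, tp=322, fn=121, fp=1039)
print(gbt_cv)

# Hypothetical operating points (NOT study data) where the two criteria disagree.
hypothetical = [(0.10, 80.0, 60.0), (0.125, 70.0, 68.0)]
print(pick_threshold(hypothetical, "max_sum"))   # larger sum: 0.10
print(pick_threshold(hypothetical, "min_diff"))  # smaller gap: 0.125
```

Running the first call on the counts in the table gives the sensitivity, specificity, and predictive values reported in the same column, which is a quick way to check any cell for transcription errors.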