Table 3

Performance metrics for both models at preselected thresholds in the aggregated cross-validation sets and the test set

| Model and metric | Optimal sensitivity-specificity: cross-validation (n=3608) | Optimal sensitivity-specificity: test (n=1277) | Sensitivity retained above 90%: cross-validation (n=3608) | Sensitivity retained above 90%: test (n=1277) |
|---|---|---|---|---|
| Gradient boosted tree model | | | | |
| Threshold for positive prediction | 10% | 12.5% | 6% | 6% |
| True negative, n (%) | 2126 (58.9) | 829 (64.9) | 1369 (37.9) | 473 (37.0) |
| True positive, n (%) | 322 (8.9) | 103 (8.1) | 400 (11.1) | 142 (11.1) |
| False negative, n (%) | 121 (3.4) | 52 (4.1) | 43 (1.2) | 13 (1.0) |
| False positive, n (%) | 1039 (28.8) | 293 (22.9) | 1796 (49.8) | 649 (50.8) |
| Sensitivity (%) | 72.7 | 66.5 | 90.3 | 91.6 |
| Specificity (%) | 67.2 | 73.9 | 43.3 | 42.2 |
| Positive predictive value (%) | 23.7 | 26.0 | 18.2 | 18.0 |
| Negative predictive value (%) | 94.6 | 94.1 | 97.0 | 97.3 |
| Logistic regression model | | | | |
| Threshold for positive prediction | 12.5% | 10%* | 6% | 6% |
| True negative, n (%) | 2172 (60.2) | 680 (53.2) | 1144 (31.7) | 429 (33.6) |
| True positive, n (%) | 308 (8.5) | 123 (9.6) | 405 (11.2) | 142 (11.1) |
| False negative, n (%) | 135 (3.7) | 32 (2.5) | 38 (1.1) | 13 (1.0) |
| False positive, n (%) | 993 (27.5) | 442 (34.6) | 2021 (56.0) | 693 (54.3) |
| Sensitivity (%) | 69.5 | 79.4 | 91.4 | 91.6 |
| Specificity (%) | 68.6 | 60.6 | 36.1 | 38.2 |
| Positive predictive value (%) | 23.7 | 21.8 | 16.7 | 17.0 |
| Negative predictive value (%) | 94.1 | 95.5 | 96.8 | 97.1 |
*This is the only scenario in which the optimal threshold differs depending on whether it is defined by the maximum sum of sensitivity and specificity or by the minimal difference between sensitivity and specificity. Here, the threshold maximizing the sum of sensitivity and specificity was used.
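Every derived metric in Table 3 follows from the four confusion-matrix counts, and the thresholds follow from the selection rules named in the column headers and footnote. The sketch below, written for illustration only, recomputes the percentages from the counts and applies three selection rules to a candidate grid. It assumes that a case is predicted positive when its predicted probability is at or above the threshold; the candidate grid, the toy probabilities and labels, and the names `Confusion`, `table_row`, `confusion_at`, and `select_thresholds` are hypothetical and do not describe the study's actual threshold search.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    """Cell counts of a 2x2 confusion matrix, as listed in Table 3."""
    tn: int
    tp: int
    fn: int
    fp: int

    @property
    def n(self) -> int:
        return self.tn + self.tp + self.fn + self.fp

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def table_row(c: Confusion) -> dict:
    """Derived metrics as percentages rounded to one decimal, as in Table 3."""
    def pct(num: int, den: int) -> float:
        return round(100 * num / den, 1)

    return {
        "tp_share": pct(c.tp, c.n),
        "fp_share": pct(c.fp, c.n),
        "sensitivity": pct(c.tp, c.tp + c.fn),
        "specificity": pct(c.tn, c.tn + c.fp),
        "ppv": pct(c.tp, c.tp + c.fp),
        "npv": pct(c.tn, c.tn + c.fn),
    }


def confusion_at(probs: Sequence[float], labels: Sequence[int], threshold: float) -> Confusion:
    # Assumption: predicted positive when probability >= threshold.
    pairs = list(zip(probs, labels))
    tp = sum(1 for p, y in pairs if p >= threshold and y == 1)
    fp = sum(1 for p, y in pairs if p >= threshold and y == 0)
    fn = sum(1 for p, y in pairs if p < threshold and y == 1)
    tn = sum(1 for p, y in pairs if p < threshold and y == 0)
    return Confusion(tn=tn, tp=tp, fn=fn, fp=fp)


def select_thresholds(probs, labels, candidates) -> dict:
    """Apply three threshold-selection rules to a hypothetical candidate grid."""
    cms = {t: confusion_at(probs, labels, t) for t in candidates}
    max_sum = max(candidates, key=lambda t: cms[t].sensitivity + cms[t].specificity)
    min_diff = min(candidates, key=lambda t: abs(cms[t].sensitivity - cms[t].specificity))
    # Specificity cannot fall as the threshold rises, so the highest threshold
    # that keeps sensitivity above 90% gives the best specificity under that
    # constraint (raises ValueError if no candidate qualifies).
    sens_over_90 = max(t for t in candidates if cms[t].sensitivity > 0.9)
    return {"max_sum": max_sum, "min_diff": min_diff, "sens_over_90": sens_over_90}


# Gradient boosted tree model, test set, 6% threshold (counts from Table 3).
print(table_row(Confusion(tn=473, tp=142, fn=13, fp=649)))
# -> {'tp_share': 11.1, 'fp_share': 50.8, 'sensitivity': 91.6, 'specificity': 42.2, 'ppv': 18.0, 'npv': 97.3}

# Toy, made-up data: the two "optimal" rules disagree here, as in the footnote.
probs = [0.02, 0.05, 0.08, 0.11, 0.14, 0.30, 0.04, 0.20]
labels = [0, 0, 1, 0, 1, 1, 0, 0]
print(select_thresholds(probs, labels, [0.06, 0.10, 0.125, 0.15]))
# -> {'max_sum': 0.06, 'min_diff': 0.1, 'sens_over_90': 0.06}
```

Maximizing the sum of sensitivity and specificity is equivalent to maximizing Youden's index (sensitivity + specificity - 1). Minimizing their difference instead seeks the point where the two are balanced, which is why the two rules can pick different thresholds.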