Table 1

Highest percentiles of estimated risk and predictive performance using the XGBoost and logistic regression classifiers for the 2018 validation dataset (n=393 023)

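| Metric | Top 0.1%ile, XGBoost | Top 0.1%ile, Logistic regression | Top 1%ile, XGBoost | Top 1%ile, Logistic regression | Top 5%ile, XGBoost | Top 5%ile, Logistic regression | Top 10%ile, XGBoost | Top 10%ile, Logistic regression |
|---|---|---|---|---|---|---|---|---|
| No of dispenses | 1977 | 1977 | 19 774 | 19 774 | 98 869 | 98 869 | 197 739 | 197 739 |
| TP captured | 655 | 472 | 4204 | 4100 | 13 224 | 13 293 | 18 404 | 18 409 |
| Per cent of TP | 2.09 | 1.50 | 13.39 | 13.06 | 42.13 | 42.35 | 58.63 | 58.64 |
| FP captured | 1322 | 1505 | 15 570 | 15 674 | 85 645 | 85 576 | 179 335 | 179 330 |
| PPV (%) | 33.13 | 23.87 | 21.26 | 20.73 | 13.38 | 13.45 | 9.31 | 9.31 |
| PLR | 30.71 | 19.44 | 16.74 | 16.22 | 9.57 | 9.63 | 6.36 | 6.36 |
| Post-test probability* (%) | 33.13 | 23.87 | 21.26 | 20.73 | 13.38 | 13.45 | 9.31 | 9.31 |
| NNS | 3.17 | 4.49 | 5.08 | 5.22 | 8.48 | 8.43 | 12.95 | 12.95 |
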
  • Logistic regression used L1 (lasso) parameter regularisation.

  • Total number of dispenses=1 977 389; total number of outcomes=31 392.

  • *Pretest probability estimated at 1.6% using prevalence.

  • FP, false positives; NNS, number needed to screen; PLR, positive likelihood ratio; PPV, positive predictive value; TP, true positives.
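
The metrics in Table 1 can be recomputed from the TP and FP counts together with the totals given in the footnotes. The sketch below is illustrative only: the function name `screening_metrics` and its parameter names are hypothetical, and the formulas are the standard textbook definitions rather than anything stated in the table. The NNS formula in particular is an assumption: the source does not define it, but the reciprocal of PPV minus prevalence agrees with the tabulated values to two decimals.

```python
def screening_metrics(tp, fp, total_dispenses, total_outcomes):
    """Recompute Table 1 style metrics for one risk-percentile cut-off.

    tp, fp          -- true and false positives among the flagged dispenses
    total_dispenses -- all dispenses in the validation data (footnote: 1 977 389)
    total_outcomes  -- all outcomes in the validation data (footnote: 31 392)
    """
    flagged = tp + fp
    prevalence = total_outcomes / total_dispenses        # pretest probability
    sensitivity = tp / total_outcomes                    # share of all TP captured
    false_positive_rate = fp / (total_dispenses - total_outcomes)

    ppv = tp / flagged
    plr = sensitivity / false_positive_rate

    # Post-test probability via odds: pretest odds multiplied by the PLR.
    pretest_odds = prevalence / (1 - prevalence)
    posttest_odds = pretest_odds * plr
    posttest_probability = posttest_odds / (1 + posttest_odds)

    # Assumption: NNS taken as 1 / (PPV - prevalence); not defined in the source.
    nns = 1 / (ppv - prevalence)

    return {
        "dispenses_flagged": flagged,
        "pct_of_tp": 100 * sensitivity,
        "ppv_pct": 100 * ppv,
        "plr": plr,
        "posttest_pct": 100 * posttest_probability,
        "nns": nns,
    }


# XGBoost, top 0.1%ile column of Table 1.
metrics = screening_metrics(
    tp=655, fp=1322, total_dispenses=1_977_389, total_outcomes=31_392
)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because the pretest probability is set to the observed prevalence, the post-test probability is algebraically identical to the PPV, which is why those two rows of the table carry the same values.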