Table 4

Feature contribution for ADHD prediction and the impact of fairness reweighting

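Population cohort: n=56 257; clinical cohort: n=4178.

| Feature | Population LR (unweighted) | Population LR (weighted) | Population RF (unweighted) | Population RF (weighted) | Clinical LR | Clinical RF |
|---|---|---|---|---|---|---|
| KS1 writing score | 0.761 | 0.858 | 0.106 | 0.100 | 0.407 | 0.109 |
| EYFSP personal, social and emotional development | 0.841 | 1.000 | 0.051 | 0.065 | 0.314 | 0.047 |
| KS1 attendance (%) | 0.958 | 1.104 | 0.010 | 0.008 | 0.093 | 0.059 |
| Male gender | 0.608 | 0.626 | 0.029 | 0.061 | 0.389 | 0.142 |
| KS1 no SEN | −0.198 | −0.169 | 0.146 | 0.138 | −0.094 | 0.087 |
| English as first language | 0.696 | 0.183 | 0.049 | 0.006 | 0.113 | 0.032 |
| EYFSP attendance (%) | 0.983 | 1.019 | 0.010 | 0.005 | 0.093 | 0.050 |
| EYFSP problem-solving, reasoning and numeracy | 0.201 | 0.318 | 0.008 | 0.005 | 0.183 | 0.019 |
| White ethnicity | 0.224 | 0.022 | 0.004 | <0.001 | 0.078 | 0.006 |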
  • Feature importance metrics differ between LR and RF and are not directly comparable. LR β coefficients represent the change in the log-odds of ADHD classification associated with a one-unit increase in the feature: negative values correspond to a decrease in the probability of a case being classified as ADHD, and positive values to an increase. RF feature importance is calculated as the decrease in node impurity at a split within a decision tree, weighted by the probability of reaching that node; the value presented is the average feature importance over all trees in the forest. The table displays features that rank in the top 5 for at least one model (marked in bold in the corresponding column) and in the top 20 for all four models. The feature ‘white ethnicity’ was added to compare its importance before and after bias reduction. Features are ranked in decreasing order of significance. For the population cohort, the weighted columns show the coefficients of both models when trained on the dataset reweighted using both English as first language and white ethnicity as protected attributes.

  • ADHD, attention deficit hyperactivity disorder; EYFSP, Early Years Foundation Stage Profile; KS1, Key Stage 1; LR, logistic regression; RF, random forest; SEN, special educational need.
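The notes do not specify the reweighting scheme. One widely used fairness reweighting method is the reweighing approach of Kamiran and Calders, which gives every (protected group, label) combination the weight P(A=a)·P(Y=y)/P(A=a, Y=y), so that group membership and outcome are statistically independent in the weighted training data. The minimal sketch below assumes that method, with the two protected attributes (English as first language, white ethnicity) combined into one joint group key; the function name `reweighing_weights` and the toy data are hypothetical and are not the study's implementation or data.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def reweighing_weights(groups: Sequence[Hashable], labels: Sequence[int]) -> list[float]:
    """Per-sample weights w(a, y) = P(A=a) * P(Y=y) / P(A=a, Y=y).

    Assumes Kamiran and Calders' reweighing. `groups` holds one hashable
    key per sample (e.g. a tuple of several protected attributes) and
    `labels` holds the binary outcome (1 = ADHD, 0 = no ADHD).
    """
    if len(groups) != len(labels):
        raise ValueError("groups and labels must have the same length")
    n = len(labels)
    group_counts = Counter(groups)               # n_a: samples per protected group
    label_counts = Counter(labels)               # n_y: samples per outcome
    joint_counts = Counter(zip(groups, labels))  # n_ay: samples per (group, outcome)
    # With counts, P(a) * P(y) / P(a, y) simplifies to (n_a * n_y) / (n * n_ay).
    return [
        group_counts[a] * label_counts[y] / (n * joint_counts[(a, y)])
        for a, y in zip(groups, labels)
    ]


if __name__ == "__main__":
    # Hypothetical toy data; joint key = (English as first language, white ethnicity).
    english = [1, 1, 1, 1, 0, 0, 0, 0]
    white = [1, 1, 1, 1, 0, 0, 0, 0]
    groups = list(zip(english, white))
    labels = [1, 1, 1, 0, 1, 0, 0, 0]
    weights = reweighing_weights(groups, labels)
    for key in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == key]
        total = sum(weights[i] for i in idx)
        positive = sum(weights[i] for i in idx if labels[i] == 1)
        # Unweighted positive rates are 0.75 and 0.25; weighted, both become 0.5.
        print(key, round(positive / total, 3))
```

If the models were fitted with scikit-learn (an assumption), these weights could be passed as `sample_weight` to `LogisticRegression.fit` and `RandomForestClassifier.fit`; the fitted `coef_` and the impurity-based `feature_importances_` (averaged over all trees) correspond to the kinds of LR and RF values reported in the table.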