Table 3

Pairwise agreement between treatment hierarchies obtained from the different ranking metrics, measured by Spearman ρ, Kendall τ, Yilmaz and Average Overlap

Measure	… versus …	… versus relative treatment effect	… versus relative treatment effect	… versus …
Spearman ρ	0.9 (0.8 to 0.96)	1 (0.99 to 1)	0.9 (0.8 to 0.97)	1 (0.98 to 1)
Kendall τ	0.8 (0.67 to 0.91)	1 (0.95 to 1)	0.8 (0.69 to 0.91)	1 (0.93 to 1)
Yilmaz	0.78 (0.6 to 0.9)	1 (0.93 to 1)	0.79 (0.65 to 0.9)	1 (0.93 to 1)
Average Overlap	0.85 (0.72 to 0.96)	1 (0.91 to 1)	0.88 (0.79 to 1)	1 (0.94 to 1)

Values are medians, with first and third quartiles in parentheses.
Relative treatment effect denotes the effect of each treatment relative to a fictional treatment of average performance.
P_BV, probability of producing the best value; SUCRA_B, surface under the cumulative ranking curve (calculated in Bayesian setting); SUCRA_F, surface under the cumulative ranking curve (calculated in frequentist setting).
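The four agreement measures in the table can be illustrated with a minimal Python sketch. It assumes two tie-free hierarchies of the same treatments, each listed from best to worst; the function names and treatment labels are hypothetical. Kendall τ is computed as τ-a (no ties). Yilmaz's AP correlation is asymmetric, so the sketch treats one hierarchy as the reference. Average Overlap is evaluated over the full list depth by default. The `summarise` helper assumes one agreement value per hierarchy pair and uses Python's inclusive quartile method, which is an assumption. This is not necessarily how the values above were computed.

```python
from itertools import combinations
from statistics import quantiles


def _positions(ranking):
    """Map each treatment to its 0-based position in a best-to-worst list."""
    return {t: i for i, t in enumerate(ranking)}


def spearman_rho(a, b):
    """Spearman rank correlation for two tie-free hierarchies of the same treatments."""
    pa, pb = _positions(a), _positions(b)
    n = len(a)
    d2 = sum((pa[t] - pb[t]) ** 2 for t in a)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def kendall_tau(a, b):
    """Kendall tau-a (no ties): (concordant - discordant) / number of pairs."""
    pa, pb = _positions(a), _positions(b)
    score = 0
    for x, y in combinations(a, 2):
        # +1 if both hierarchies order x and y the same way, -1 otherwise
        score += 1 if (pa[x] - pa[y]) * (pb[x] - pb[y]) > 0 else -1
    n = len(a)
    return score / (n * (n - 1) / 2)


def yilmaz_tau_ap(reference, other):
    """Yilmaz's AP correlation of `other` against `reference` (asymmetric).

    For each rank i >= 2 in `other`, C(i) counts how many treatments placed
    above it in `other` are also above it in `reference`; the result is
    2 / (n - 1) * sum(C(i) / (i - 1)) - 1.
    """
    p_ref = _positions(reference)
    n = len(other)
    total = 0.0
    for i in range(1, n):  # 0-based index i is rank i + 1, with i items above it
        item = other[i]
        correct = sum(1 for above in other[:i] if p_ref[above] < p_ref[item])
        total += correct / i
    return 2 * total / (n - 1) - 1


def average_overlap(a, b, depth=None):
    """Average Overlap: mean over depths d of |top-d(a) & top-d(b)| / d."""
    k = depth or len(a)
    return sum(len(set(a[:d]) & set(b[:d])) / d for d in range(1, k + 1)) / k


def summarise(values):
    """Format values as 'median (Q1 to Q3)', as in the table."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return f"{med:.2f} ({q1:.2f} to {q3:.2f})"
```

For example, with the hypothetical hierarchies `["A", "B", "C", "D"]` and `["A", "C", "B", "D"]`, which differ only by one swapped pair, ρ = 0.8, τ ≈ 0.67, τ_AP ≈ 0.67 and Average Overlap = 0.875. Average Overlap depends only on set overlap at each depth, so unlike the three correlations it ranges from 0 to 1 rather than from -1 to 1.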