[discovery] How to measure disagreement between human judges in Discernatron?