Hi Kaushik,

Per our recent conversation, I think that https://phabricator.wikimedia.org/T223788 is a better task to pick up right now. 

BUT!  I like that you're still curious about this task.  The real problem is that we get different ROC-AUC metrics for the "true" and "false" targets of a binary classifier, which should be impossible.  You can get TPR and FPR from ORES directly.  E.g., https://ores.wikimedia.org/v3/scores/enwiki/?model_info=statistics.thresholds.true&models=damaging will give you the full statistics at each threshold.  You can take the recall (which, incidentally, is just another name for TPR) and the FPR to generate a curve.  If you run the same query for "false" (e.g. https://ores.wikimedia.org/v3/scores/enwiki/?model_info=statistics.thresholds.false&models=damaging), you can compare the ROC curves of both and help us figure out where the disparity comes from.
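To make that concrete, here's a minimal sketch of pulling the (FPR, recall) points out of a threshold-statistics response.  The nested response shape and field names below are assumptions based on the API's general structure -- check the live JSON from the URL above before relying on the exact keys:

```python
# Sketch: extract (FPR, recall) points from an ORES threshold-statistics
# response.  The sample below is a hypothetical, truncated stand-in for
# the real JSON, which has one entry per threshold.
sample_response = {
    "enwiki": {"models": {"damaging": {"statistics": {"thresholds": {"true": [
        {"threshold": 0.9, "recall": 0.25, "fpr": 0.01},
        {"threshold": 0.5, "recall": 0.70, "fpr": 0.10},
        {"threshold": 0.1, "recall": 0.95, "fpr": 0.40},
    ]}}}}}
}

def roc_points(response, wiki="enwiki", model="damaging", target="true"):
    """Return (FPR, recall) pairs sorted by FPR, ready for an AUC routine."""
    thresholds = (response[wiki]["models"][model]
                  ["statistics"]["thresholds"][target])
    points = [(t["fpr"], t["recall"]) for t in thresholds if t is not None]
    return sorted(points)

print(roc_points(sample_response))
# → [(0.01, 0.25), (0.1, 0.7), (0.4, 0.95)]
```

Run the same extraction with target="false" on the other query's response and you have the two curves to compare.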

See https://scikit-learn.org/stable/modules/generated/sklearn.metrics.auc.html for a nice utility for generating area-under-the-curve metrics.  
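That sklearn utility is just trapezoidal integration over the curve points.  Here's a pure-Python equivalent (the curve values are made-up illustration data), plus a quick check of why the two AUCs should match: since the "false" score is 1 - P(true), each "true" ROC point (FPR, recall) maps to the "false" point (1 - recall, 1 - FPR), a reflection that preserves the area under the curve (up to ties and threshold bucketing):

```python
# Minimal trapezoidal AUC -- the same rule sklearn.metrics.auc applies.
def trapezoid_auc(points):
    """points: (x, y) pairs, e.g. (FPR, recall); sorted by x internally."""
    pts = sorted(points)
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

# Hypothetical "true" curve; the "false" curve is its reflection.
true_curve = [(0.0, 0.0), (0.1, 0.7), (0.4, 0.95), (1.0, 1.0)]
false_curve = [(1 - r, 1 - f) for (f, r) in true_curve]

print(trapezoid_auc(true_curve), trapezoid_auc(false_curve))
# → 0.8675 0.8675 -- identical, as expected for a binary model
```

If the two numbers you get from the real ORES statistics differ by more than rounding, something upstream (the threshold statistics themselves, or how they're aggregated) is where the bug lives.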


On Mon, Jul 29, 2019 at 12:47 PM K. Kaushik Reddy <reddykaushik18@gmail.com> wrote:

---------- Forwarded message ---------
From: K. Kaushik Reddy <reddykaushik18@gmail.com>
Date: Mon, Jul 29, 2019, 7:28 PM
Subject: Help needed for the bug patch
To: Application of Artificial Intelligence and other advanced computing strategies to Wikimedia Projects <ai@lists.wikimedia.org>
Cc: Aaron Halfaker <aaron.halfaker@gmail.com>

Hi developers,

This is K. Kaushik Reddy. I have been assigned the ROC AUC measurement issue to patch, and I have been using the ROC wikipage to help me understand the concept. Since I'm at a beginner level, I need some time to understand it. The problem is to find out why the algorithm sometimes shows huge values. I need help understanding where the functions have gone wrong and what could be done.
I hope I made my question clear.
