We talked about this briefly in our meeting today. I got the links from Aaron for the original slides[1] and talk[2] (his part starts about halfway through). I don't think we are using any features with as strong a bias as the logged-in bit has for ORES, but it's still something to think about.
[1] https://www.mediawiki.org/wiki/File:Deploying_and_maintaining_AI_in_a_socio-technical_system_--_Research_Showcase_(August_2016).pdf
[2] https://www.youtube.com/watch?v=rsFmqYxtt9w
Thanks, Erik!
For those who want to jump to *just* the right spot:
- Aaron starts his presentation at 28m38s: https://www.youtube.com/watch?v=rsFmqYxtt9w&t=28m38s
- He talks about bad signals as part 2 of 3 at 42m02s, covering Italian "ha" and anonymous users: https://www.youtube.com/watch?v=rsFmqYxtt9w&t=42m02s
- He talks about anons specifically at 47m15s: https://www.youtube.com/watch?v=rsFmqYxtt9w&t=47m15s. That section ends at about 55m, so it's only 8 minutes of video to watch (~4 minutes at 2x).
Some key takeaways are the differences in robustness between the two types of models, and the test of holding one feature constant to assess its global effect. Neat stuff.
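As a concrete (and entirely synthetic) sketch of that hold-one-feature-constant test, here are a few lines of Python. The feature names and the gradient-boosted classifier are stand-ins, not the real ORES features or models: train on data where the anon bit matters, then score the same edits twice, once as observed and once with the bit clamped to a constant, and compare.

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier

    rng = np.random.default_rng(0)
    n = 5000
    is_anon = rng.integers(0, 2, n)                      # hypothetical "anonymous editor" bit
    badword_ratio = rng.beta(1, 20, n) + 0.05 * is_anon  # hypothetical content signal
    X = np.column_stack([is_anon, badword_ratio])
    y = (rng.random(n) < 0.03 + 0.10 * is_anon + 2.0 * badword_ratio).astype(int)

    model = GradientBoostingClassifier().fit(X, y)

    # Score the same edits twice: once with the real is_anon value, once with the
    # feature clamped to 0 for everyone. The gap in mean predicted probability is a
    # rough measure of how much weight the model puts on that single feature.
    p_real = model.predict_proba(X)[:, 1]
    X_clamped = X.copy()
    X_clamped[:, 0] = 0
    p_clamped = model.predict_proba(X_clamped)[:, 1]
    print("mean P(damaging), is_anon as observed:", round(p_real.mean(), 3))
    print("mean P(damaging), is_anon held at 0:  ", round(p_clamped.mean(), 3))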
—Trey
Trey Jones
Sr. Software Engineer, Search Platform
Wikimedia Foundation
Hi!
I've noticed when reviewing edits (mostly on Wikidata) that ORES alerts a lot on anonymous edits, even benign ones. From one point of view that's correct - most vandalism is done by anonymous accounts - but from the other, it doesn't help when we need to distinguish specifically between good and bad anon edits.
I guess the problem is that there's only one score (AFAIK?), in which anonymity is indeed a strong marker, but we really need to distinguish good edits from bad edits *inside* the set of anonymous edits. An algorithm that lumps them all together, while formally doing pretty well on the *whole* data set, is not very useful for this purpose. I guess we may also be encountering this when optimizing all kinds of corner cases, where our changes may hardly be visible on the whole data set, even if they make the special case better (or worse :).
--
Stas Malyshev
smalyshev@wikimedia.org
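A rough way to see the effect Stas describes is to compute the same metric globally and within the anonymous subset. In this sketch (synthetic data and made-up features, nothing from the real ORES models), the classifier looks respectable on the whole set but barely separates good from bad edits once you condition on anon:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(42)
    n = 20000
    is_anon = rng.integers(0, 2, n)
    content_signal = rng.normal(0, 1, n)  # stand-in for real content-based features

    # Synthetic ground truth: damage depends heavily on anonymity, only weakly on content.
    p_damaging = 1 / (1 + np.exp(-(-3 + 2.5 * is_anon + 0.3 * content_signal)))
    y = (rng.random(n) < p_damaging).astype(int)

    X = np.column_stack([is_anon, content_signal])
    scores = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

    anon = is_anon == 1
    print("AUC on all edits:     ", round(roc_auc_score(y, scores), 3))
    print("AUC within anon edits:", round(roc_auc_score(y[anon], scores[anon]), 3))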
A good machine learning algorithm should try to distinguish categories (good vs bad) within large sub-categories (anon). That's supposed to be one of the advantages over a simple scoring formula—different elements can be weighted differently (even positively vs negatively) depending on the exact combination of features.
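To make that concrete with a toy example (invented feature names and rates, not ORES's), here is a case where the same feature should count as positive evidence for one group and negative evidence for another. A single per-feature weight can't express that, but a shallow tree can:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(7)
    n = 40000
    is_anon = rng.integers(0, 2, n)
    big_removal = rng.integers(0, 2, n)

    # Invented rates: a big removal is suspicious from an anon but is more often
    # routine cleanup from a registered editor, so its weight should flip sign.
    p_table = {(0, 0): 0.15, (0, 1): 0.05, (1, 0): 0.30, (1, 1): 0.80}
    p = np.array([p_table[(a, r)] for a, r in zip(is_anon, big_removal)])
    y = (rng.random(n) < p).astype(int)
    X = np.column_stack([is_anon, big_removal])

    linear = LogisticRegression().fit(X, y)               # one fixed weight per feature
    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)  # one rate per combination

    combos = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # (is_anon, big_removal)
    print("true rates:  ", [p_table[tuple(c)] for c in combos])
    print("tree rates:  ", tree.predict_proba(combos)[:, 1].round(2))
    print("linear rates:", linear.predict_proba(combos)[:, 1].round(2))
    # The tree recovers the sign flip; the linear score is forced to rank registered
    # editors' big removals as *more* suspicious than their other edits.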
It may have been difficult for ORES to make the distinction because the signal within the anonymous sub-group was too noisy. At the end of the relevant section in the video (around 54m30s) Aaron mentions using a grammar to try to parse the edit to take more of the edit's content into account, which is where the real distinction is to be made among anon users.
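Aaron's grammar-based parsing is much more ambitious than anything sketchable here, but even crude content signals point in that direction. Purely as an illustration (the helper functions and word list below are invented, not revscoring's real features), one could diff the two revisions and score only what was inserted:

    import difflib
    import re

    BAD_WORDS = {"stupid", "idiot", "loser"}  # placeholder list, not a real ORES word list

    def inserted_words(old_text, new_text):
        """Words that appear in the new revision but not at the same place in the old one."""
        old_words = re.findall(r"\w+", old_text.lower())
        new_words = re.findall(r"\w+", new_text.lower())
        matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
        added = []
        for op, _i1, _i2, j1, j2 in matcher.get_opcodes():
            if op in ("insert", "replace"):
                added.extend(new_words[j1:j2])
        return added

    def content_features(old_text, new_text):
        added = inserted_words(old_text, new_text)
        bad = sum(1 for w in added if w in BAD_WORDS)
        return {
            "words_added": len(added),
            "bad_words_added": bad,
            "bad_word_ratio": bad / len(added) if added else 0.0,
        }

    print(content_features("The capital of France is Paris.",
                           "The capital of France is Paris, you idiot."))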
(The other interesting takeaway from his presentation is that if you want to really vandalize Wikipedia successfully, make an account and then wait 8 years—then you'll be free to do anything!)
Trey Jones
Sr. Software Engineer, Search Platform
Wikimedia Foundation