--- On Mon, 17/11/08, Platonides Platonides@gmail.com wrote:
From: Platonides Platonides@gmail.com
Subject: Re: [Wiki-research-l] "Regular contributor"
To: wiki-research-l@lists.wikimedia.org
Date: Monday, 17 November, 2008 9:42

Felipe Ortega wrote:
I also have my doubts about the filtering conditions. For instance, in eswiki, 'BOTpolicia' is not registered as a bot, and it's responsible for more than 90,000 edits so far. On the other hand, a famous user in eswiki (retired at the moment, id=13770 to be precise)
He has returned, ~500 edits this week ;)
Wow, this is getting interesting :D
Filtering by number of edits/hour or similar may require a lot of time/resources, especially in larger Wikipedias (sorry, but for my thesis I'm mainly focused on the top-ten Wikipedias :) ).
The problem is that here you need the edits *per user*, not per page. I understand from the WikiXRay page that you're recreating the mediawiki tables.
Yeap, but only as an initial stage. Then I create some new intermediate tables to speed up the data mining.
It'd just be a matter of querying each user's contributions and checking the time difference between their edits. With indexes in place, the running time should be good enough.
Where it may get terribly slow is when applying it to all users, as that would make the algorithm quadratic.
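The per-user check Platonides describes (fetch one user's contributions, then look at the time gaps between them) can be sketched roughly as follows. This is a minimal Python/sqlite3 sketch under stated assumptions: the `revision` table is a simplified, hypothetical stand-in loosely modelled on MediaWiki's (only `rev_user` and `rev_timestamp`), and the index name `idx_user_ts`, the helper names and the 60-edits-per-hour threshold are all illustrative, not taken from WikiXRay or any policy. It computes the largest number of edits a user made inside any sliding one-hour window.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical, simplified stand-in for MediaWiki's revision table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_user INTEGER, rev_timestamp TEXT)"
)
# Composite index so each per-user query is an index range scan, not a full table scan.
conn.execute("CREATE INDEX idx_user_ts ON revision (rev_user, rev_timestamp)")

FMT = "%Y%m%d%H%M%S"  # 14-digit timestamps, which sort chronologically as strings


def edit_times(conn, user_id):
    """Return one user's edit times, oldest first, as datetime objects."""
    rows = conn.execute(
        "SELECT rev_timestamp FROM revision WHERE rev_user = ? ORDER BY rev_timestamp",
        (user_id,),
    )
    return [datetime.strptime(ts, FMT) for (ts,) in rows]


def max_edits_per_window(times, window_seconds=3600):
    """Largest number of edits inside any sliding window of window_seconds (times must be sorted)."""
    best = 0
    start = 0
    for end in range(len(times)):
        # Move the left edge forward until the window spans less than window_seconds.
        while (times[end] - times[start]).total_seconds() >= window_seconds:
            start += 1
        best = max(best, end - start + 1)
    return best


# Toy data: user 1 edits every 30 seconds for an hour; user 2 edits once a day.
base = datetime(2008, 11, 17, 9, 0, 0)
rows = [(1, (base + timedelta(seconds=30 * i)).strftime(FMT)) for i in range(120)]
rows += [(2, (base + timedelta(days=d)).strftime(FMT)) for d in range(5)]
conn.executemany("INSERT INTO revision (rev_user, rev_timestamp) VALUES (?, ?)", rows)

HOURLY_THRESHOLD = 60  # hypothetical cut-off for "suspiciously bot-like"
flagged = [u for u in (1, 2) if max_edits_per_window(edit_times(conn, u)) > HOURLY_THRESHOLD]
print(flagged)  # → [1]
```

Each call costs one index range scan plus a single linear pass over that user's edits, so probing a preselected list of users stays cheap; running it blindly over every account is the expensive case discussed above.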
I agree, but then we would still need some basic criteria to decide which users to probe in order to identify hidden bots. I suppose a good starting point would be looking for "BOT" patterns in the name? Mmmm, or perhaps filtering directly by the number of revisions.
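The two pre-filters Felipe suggests (a "bot" pattern in the user name, or a high revision count) could be combined into a single candidate query. This is again only a hedged Python/sqlite3 sketch: the `revision` columns `rev_user`/`rev_user_text`, the assumption that `rev_user = 0` marks anonymous (IP) edits, the helper name `candidate_users`, the default cut-off of 10,000 revisions and the toy data are all illustrative assumptions.

```python
import re
import sqlite3

# Case-insensitive "bot" anywhere in the user name, e.g. 'BOTpolicia'.
BOT_NAME = re.compile(r"bot", re.IGNORECASE)


def candidate_users(conn, min_revisions=10000):
    """Return sorted (user_id, user_name, revision_count) for registered users worth probing."""
    rows = conn.execute(
        "SELECT rev_user, rev_user_text, COUNT(*) FROM revision "
        "WHERE rev_user <> 0 "  # assumption: 0 marks anonymous (IP) edits
        "GROUP BY rev_user, rev_user_text"
    )
    return sorted(
        (uid, name, n)
        for uid, name, n in rows
        if BOT_NAME.search(name) or n >= min_revisions
    )


# Toy data for illustration only (hypothetical users and counts).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_user INTEGER, rev_user_text TEXT)"
)
toy = (
    [(10, "BOTpolicia")] * 3
    + [(20, "HeavyEditor")] * 6
    + [(30, "CasualEditor")] * 2
    + [(0, "192.0.2.1")] * 7
)
conn.executemany("INSERT INTO revision (rev_user, rev_user_text) VALUES (?, ?)", toy)

print(candidate_users(conn, min_revisions=5))
# → [(10, 'BOTpolicia', 3), (20, 'HeavyEditor', 6)]
```

A name match is only a coarse signal ("Abbott" also contains "bot"), and a high revision count alone does not make an account a bot, so the candidates would still need a per-user edit-rate check like the one Platonides describes.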
I will try to have a closer look at this after the thesis (I need to plan my next "entertainments" :) ).
Cheers,
F.
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l