--- On Mon, 17/11/08, Platonides <Platonides(a)gmail.com> wrote:
> From: Platonides <Platonides(a)gmail.com>
> Subject: Re: [Wiki-research-l] "Regular contributor"
> To: wiki-research-l(a)lists.wikimedia.org
> Date: Monday, 17 November 2008, 9:42
> Felipe Ortega wrote:
> > I also have my doubts about the filtering conditions. For
> > instance, in eswiki, 'BOTpolicia' is not registered as a bot,
> > and it's responsible for more than 90,000 edits so far. On the
> > other hand, a famous user in eswiki (retired at the moment,
> > id=13770 to be precise)
> He has returned, ~500 edits this week ;)
Wow, this is getting interesting :D
> > Filtering by number of edits/hour or similar may require a lot
> > of time/resources, especially in larger Wikipedias (sorry, but
> > for my thesis I'm mainly focused on the top-ten Wikipedias :) ).
> The problem is that here you need the edits *per user*, not per
> page.
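Counting edits per hour per user, as opposed to per page, could be sketched roughly like this. It is a minimal sketch, not WikiXRay's method: the input is assumed to be a plain list of (user_id, unix_timestamp) pairs, and the function name `max_edits_per_hour` is made up for illustration. For each user it sorts the timestamps once and slides a one-hour window over them, so the cost per user is dominated by the sort.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

WINDOW_SECONDS = 3600  # one hour


def max_edits_per_hour(revisions: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Return, per user, the peak number of edits inside any one-hour window.

    `revisions` is a hypothetical iterable of (user_id, unix_timestamp) pairs.
    """
    by_user: Dict[int, List[int]] = defaultdict(list)
    for user_id, ts in revisions:
        by_user[user_id].append(ts)

    peaks: Dict[int, int] = {}
    for user_id, stamps in by_user.items():
        stamps.sort()
        best = 0
        start = 0
        # Two-pointer sliding window: keep only edits less than an hour
        # older than the current one.
        for end, ts in enumerate(stamps):
            while ts - stamps[start] >= WINDOW_SECONDS:
                start += 1
            best = max(best, end - start + 1)
        peaks[user_id] = best
    return peaks


revs = [(1, 0), (1, 10), (1, 20), (1, 5000), (2, 0), (2, 4000)]
print(max_edits_per_hour(revs))  # {1: 3, 2: 1}
```

Users whose peak exceeds some chosen threshold would then be the ones worth a closer look; the threshold itself is left open here.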
> I understand from the WikiXRay page that you're recreating the
> MediaWiki tables.
Yep, but only as an initial stage. Then I create some new
intermediate tables to speed up the data mining.
> It'd just be a matter of querying each user's contributions and
> checking the time difference between edits. With indexes in place,
> the running time should be good enough. Where it may get terribly
> slow is if you apply it to all users, as that would make the
> algorithm quadratic.
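The per-user query described above could look roughly like the sketch below. It uses an in-memory SQLite table that only mimics two columns of MediaWiki's `revision` table (`rev_user`, `rev_timestamp`); the column subset, the integer Unix-time timestamps (MediaWiki itself stores timestamps as strings), the index name and the helper `min_gap_seconds` are all assumptions for illustration. The composite index on (`rev_user`, `rev_timestamp`) lets each lookup read one user's edits already in time order, so the check only compares consecutive edits rather than every pair.

```python
import sqlite3
from typing import Optional


def min_gap_seconds(conn: sqlite3.Connection, user_id: int) -> Optional[int]:
    """Smallest time difference between consecutive edits of one user.

    Returns None when the user has fewer than two edits.
    """
    rows = conn.execute(
        "SELECT rev_timestamp FROM revision WHERE rev_user = ? ORDER BY rev_timestamp",
        (user_id,),
    )
    best: Optional[int] = None
    prev: Optional[int] = None
    for (ts,) in rows:
        if prev is not None:
            gap = ts - prev
            if best is None or gap < best:
                best = gap
        prev = ts
    return best


conn = sqlite3.connect(":memory:")
# Simplified stand-in for the revision table (integer Unix timestamps).
conn.execute(
    "CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_user INTEGER, rev_timestamp INTEGER)"
)
# Composite index (hypothetical name) so each per-user query is a range scan in time order.
conn.execute("CREATE INDEX rev_user_ts ON revision (rev_user, rev_timestamp)")
conn.executemany(
    "INSERT INTO revision (rev_user, rev_timestamp) VALUES (?, ?)",
    [(1, 100), (1, 102), (1, 105), (2, 0), (2, 7200)],
)
print(min_gap_seconds(conn, 1))  # 2
print(min_gap_seconds(conn, 2))  # 7200
print(min_gap_seconds(conn, 3))  # None
```

Running this for every user is exactly the case flagged above as potentially slow, which is why a shortlist of candidate users helps.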
I agree, but then we would still need some basic criteria to decide
which users to probe to identify hidden bots. I suppose a good
starting point would be looking for BOT patterns in the name? Mmmm,
or perhaps using the number of revisions directly.
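A first pass over candidate hidden bots, along the lines suggested here, might combine a name pattern with a revision-count threshold. This is only a sketch: the case-insensitive "bot" substring test and the 10,000-revision cutoff are arbitrary assumptions, the input is a hypothetical mapping of user name to revision count, and all counts below are made up. A name test alone catches 'BOTpolicia' but also flags ordinary names that merely contain "bot" (here "Abbott"), so the shortlist is meant for closer inspection, for example with the per-user timing check Platonides suggests, rather than as a verdict.

```python
import re
from typing import Dict, List

# Hypothetical criteria; both values are illustrative, not established thresholds.
BOT_NAME = re.compile(r"bot", re.IGNORECASE)
MIN_REVISIONS = 10_000


def candidate_bots(rev_counts: Dict[str, int]) -> List[str]:
    """Shortlist users whose name looks bot-like or whose revision count is high.

    `rev_counts` maps user name -> number of revisions (hypothetical input).
    """
    return sorted(
        name
        for name, count in rev_counts.items()
        if BOT_NAME.search(name) or count >= MIN_REVISIONS
    )


counts = {"BOTpolicia": 90_000, "Abbott": 12, "SomeEditor": 25_000, "Casual": 40}
print(candidate_bots(counts))  # ['Abbott', 'BOTpolicia', 'SomeEditor']
```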
I will try to have a closer look at this after the thesis
(I need to plan my next "entertainments" :) ).
Cheers,
F.
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l