Wikistats has several criteria for bot detection:
1) Is the name registered as a bot, in other words is there a bot flag in the
user_groups table?
2) Does it sound like a bot? (Nowadays certain names are only allowed for
bots on many wikis.)
More precisely: does '[Bb]ot' occur at the end of a name or before a
non-alphabetic character?
3) Is it known to be an unregistered bot? (Wikipedia has a list of false
negatives at
http://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits/Unflagged_bots )
I copied that list long ago but do not keep it auto-updated.
4) Is a name flagged as a bot on at least 10 wikis? Then treat it as a bot on
any wiki within the project.
(In the past, when user names could easily collide, this was more relevant.)
The basic rationale is that on smaller wikis bot registrations are often
forgotten.
With SUL it is unlikely that people use the same name as a bot on one wiki and
as a regular user on another wiki.
5) Three names that sound like a bot are hard-coded as exemptions (people who
wrote about it)
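
The five criteria above could be combined roughly as in the following Python sketch. This is not the Wikistats code: the function name, its parameters, and the two name sets are hypothetical, and "non-alphabetic" is assumed to mean anything outside ASCII A-Z/a-z. Exemptions are assumed to override only the name check (criterion 2), since they exist for names that merely sound like a bot.

```python
import re

# Criterion 2: '[Bb]ot' at the end of a name or before a non-alphabetic
# character (assumption: "alphabetic" means ASCII letters only).
BOT_NAME = re.compile(r"[Bb]ot(?:$|[^A-Za-z])")

# Criterion 4: threshold taken from the text above.
MIN_WIKIS_FLAGGED = 10


def is_bot(name, flagged_here, wikis_flagged,
           unflagged_bots=frozenset(), exemptions=frozenset()):
    """Return True if `name` should be treated as a bot (hypothetical helper).

    flagged_here   -- criterion 1: bot flag present on this wiki
    wikis_flagged  -- criterion 4: number of wikis where the name is flagged
    unflagged_bots -- criterion 3: names from a manually copied list
    exemptions     -- criterion 5: names that only sound like a bot
    """
    if flagged_here:
        return True
    if name in unflagged_bots:
        return True
    if wikis_flagged >= MIN_WIKIS_FLAGGED:
        return True
    # The name heuristic is the only check that exemptions switch off.
    return bool(BOT_NAME.search(name)) and name not in exemptions
```

With this pattern a name such as "Bottle" is not matched, because "Bot" is followed by a letter, while any name ending in "bot" or "Bot" is matched unless it is listed as an exemption.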
--
Jonathan:
I definitely think it would be useful to make bot-filtered data available
in wikistats and/or Limn.
And we do, in Wikistats. Did you miss this part of the thread?
http://stats.wikimedia.org/EN/PlotsPngEditHistoryTop.htm
More charts and tables per wiki:
http://stats.wikimedia.org/EN/EditsRevertsEN.htm
See also:
http://infodisiac.com/blog/2013/07/new-edit-and-revert-stats/
But bot-free edits in Limn is a good point; thanks for your +1.
Erik
From: analytics-bounces(a)lists.wikimedia.org
[mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Jonathan Morgan
Sent: Wednesday, July 24, 2013 9:11 PM
To: A mailing list for the Analytics Team at WMF and everybody who has an
interest in Wikipedia and analytics.
Subject: Re: [Analytics] non-bot edits per month
Bots aren't difficult to filter out btw, at least not for sub-samples or on
a one-off basis.
For instance, this ugly, inefficient query gets all non-bot and non-IP edits
to the project pages of WikiProject Medicine from 2007-2012 (19,097 edits).
It runs in about 2 seconds on stat1.
select count(rev_id)
from enwiki.revision as r, enwiki.page as p
where p.page_id = r.rev_page
  and p.page_namespace in (4,5)
  and p.page_title like "WikiProject_Medicine%"
  and r.rev_timestamp between "20070101000000" and "20080101000000"
  and r.rev_user != 0
  and r.rev_user_text not like "%Bot"
  and r.rev_user_text not like "%bot"
  and r.rev_user not in (select ug_user from enwiki.user_groups where ug_group = 'bot');
The user_groups table tracks registered bots, and the string matching
excludes bots* that are being run on the DL (which are more common than you
might expect).
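
For anyone post-processing rows outside the database, the same three filters can be mirrored in a few lines of Python. This is only a sketch: the function name and the bot_user_ids set (standing in for the user_groups subquery) are made up for illustration, and the suffix match is assumed to be case-sensitive, as it would be on a binary rev_user_text column.

```python
def counts_as_human_edit(rev_user, rev_user_text, bot_user_ids):
    """Mirror the query's filters for one revision row (hypothetical helper).

    rev_user      -- user id of the editor; 0 means an IP (anonymous) edit
    rev_user_text -- user name of the editor
    bot_user_ids  -- ids that are in the 'bot' user group
    """
    if rev_user == 0:
        # r.rev_user != 0: drop IP edits
        return False
    if rev_user_text.endswith(("Bot", "bot")):
        # not like "%Bot" / not like "%bot": drop names ending in Bot or bot
        return False
    # not in (select ug_user ... where ug_group = 'bot'): drop registered bots
    return rev_user not in bot_user_ids
```

Note that only a trailing "Bot"/"bot" is matched, so a name that merely starts with "Bot" passes the string filter.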
I always exclude bots** from any analysis I do, since they grossly inflate
activity counts in unpredictable ways.
I definitely think it would be useful to make bot-filtered data available in
wikistats and/or Limn.
- J
*also unfortunately excludes the odd user with 'bot' in their username, like
User:I_Jethrobot :(
**unless, of course, I'm studying bots specifically
On Tue, Jul 23, 2013 at 11:01 AM, Federico Leva (Nemo) <nemowiki(a)gmail.com>
wrote:
Ryan Kaldari, 23/07/2013 07:44:
My question is: Would it be possible to replace or augment these graphs
with graphs that exclude bot edits? I know that bot status is not stored
in the revision table, so this would be quite expensive to tally. Would
it be prohibitively expensive? Sorry if this is a dumb question.
Just don't use that graph to answer that question, because it's not the
appropriate one. Changing the definitions of metrics is however tricky and
best avoided whenever possible.
If you want number of edits specifically, you can instead look at the
recently revived
http://stats.wikimedia.org/EN/PlotsPngEditHistoryAll.htm
(most of them still to be updated), see
http://infodisiac.com/blog/2013/07/new-edit-and-revert-stats/
Nemo
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
--
Jonathan T. Morgan
Research Strategist
Wikimedia Foundation