Bots aren't difficult to filter out btw, *at least* not for sub-samples or on a one-off basis.
For instance, this ugly, inefficient query gets all non-bot and non-IP edits to the project pages of WikiProject Medicine from 2007-2012 (19,097 edits). It runs in about 2 seconds on stat1.
SELECT count(rev_id)
FROM enwiki.revision AS r, enwiki.page AS p
WHERE p.page_id = r.rev_page
  AND p.page_namespace IN (4,5)
  AND p.page_title LIKE "WikiProject_Medicine%"
  AND r.rev_timestamp BETWEEN "20070101000000" AND "20080101000000"
  AND r.rev_user != 0
  AND r.rev_user_text NOT LIKE "%Bot"
  AND r.rev_user_text NOT LIKE "%bot"
  AND r.rev_user NOT IN (SELECT ug_user FROM enwiki.user_groups WHERE ug_group = 'bot');
The user_groups table tracks registered bots, and the string matching excludes bots* that are being run on the DL (which are more common than you might expect).
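
For anyone who would rather apply the same exclusions after pulling rows out of the database, here is a rough Python sketch of the three conditions in the query above. The function names and the flagged_bot_ids set are made up for illustration (the set stands in for the user_groups subquery), and the suffix check is assumed to be case-sensitive, as the paired "%Bot" / "%bot" patterns suggest.

```python
def is_probable_bot(user_id, user_name, flagged_bot_ids):
    """Return True if an editor looks like a bot under the query's rules.

    flagged_bot_ids is a hypothetical set of user IDs standing in for
    SELECT ug_user FROM user_groups WHERE ug_group = 'bot'.
    """
    # Registered bots: the account carries the 'bot' user group.
    if user_id in flagged_bot_ids:
        return True
    # Unregistered bots: same idea as
    # rev_user_text LIKE "%Bot" OR rev_user_text LIKE "%bot"
    return user_name.endswith(("Bot", "bot"))


def keep_edit(user_id, user_name, flagged_bot_ids):
    """Return True for edits by registered, non-bot accounts."""
    # rev_user = 0 marks an IP (anonymous) edit.
    return user_id != 0 and not is_probable_bot(user_id, user_name, flagged_bot_ids)


if __name__ == "__main__":
    flagged = {2}  # made-up ID for an account in the 'bot' group
    samples = [
        (1, "Alice"),        # made-up ordinary user
        (2, "Helper"),       # made-up flagged bot without 'bot' in its name
        (0, "192.0.2.1"),    # IP edit
        (3, "I_Jethrobot"),  # human caught by the suffix match
        (5, "Botanist"),     # 'Bot' as a prefix is not matched
    ]
    for user_id, name in samples:
        print(name, keep_edit(user_id, name, flagged))
```

The I_Jethrobot row shows the false positive from the footnote: the suffix match cannot tell a human with 'bot' at the end of their username from an unflagged bot.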
I always exclude bots** from any analysis I do, since they grossly inflate activity counts in unpredictable ways.
I definitely think it would be useful to make bot-filtered data available in wikistats and/or Limn.
- J

*also unfortunately excludes the odd user with 'bot' in their username, like User:I_Jethrobot :(
**unless, of course, I'm studying bots specifically

On Tue, Jul 23, 2013 at 11:01 AM, Federico Leva (Nemo) <nemowiki@gmail.com> wrote:
Ryan Kaldari, 23/07/2013 07:44:
My question is: Would it be possible to replace or augment these graphs
with graphs that exclude bot edits? I know that bot status is not stored in the revision table, so this would be quite expensive to tally. Would it be prohibitively expensive? Sorry if this is a dumb question.
Just don't use that graph to answer that question, because it's not the appropriate one. Changing the definitions of metrics is however tricky and best avoided whenever possible. If you want number of edits specifically, you can instead look at the recently revived http://stats.wikimedia.org/EN/PlotsPngEditHistoryAll.htm (most of them still to be updated), see http://infodisiac.com/blog/2013/07/new-edit-and-revert-stats/
Nemo
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics