I was poking around on stats.wikimedia.org and reportcard.wmflabs.org to see if I could find out how overall editing levels had changed (if at all) over the past year. Unfortunately, it seems that all of our "edits per month" graphs show all edits, including bot edits. Since bot editing levels often change dramatically from month to month, this noise largely cancels out the usefulness of the graphs. For example, you can see a huge spike in March, when I presume the Wikidata bots were running at full force: http://reportcard.wmflabs.org/#secondary-graphs-tab http://stats.wikimedia.org/EN/ChartsWikipediaEN.htm#3
My question is: Would it be possible to replace or augment these graphs with graphs that exclude bot edits? I know that bot status is not stored in the revision table, so this would be quite expensive to tally. Would it be prohibitively expensive? Sorry if this is a dumb question.
Ryan Kaldari
I think the best way to track this would be to select a random group of 1000 active veteran editors each year (people who have been editing for 2+ years and are thus likely to be around for at least another year) and track their edits over the course of the year. The next year, make a new random selection based on the same criteria, and so on.
I would assume that over time, the yearly activity of such a random group will show some interesting seasonal fluctuations based on the editors' language (and thus the holidays in their country of origin).
Basing your data on actual usernames that have somehow been vetted as active editors, rather than temporary bot- or project-related accounts, would be easier than just eliminating all bots.
I believe a similar study was done for new contributors, but I don't know how the selection was done.
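A minimal sketch of the yearly selection step proposed above, assuming a mapping from username to first-edit date is already available and has already been limited to currently active editors (the activity test itself is not defined in this thread). The function name, parameters, and the 365-day approximation of a year are illustrative assumptions, not part of any existing tool.

```python
import random
from datetime import date, timedelta


def sample_veteran_editors(first_edit, as_of, size=1000, min_years=2, seed=None):
    """Pick a random sample of editors whose first edit is at least
    `min_years` old on the date `as_of`.

    first_edit: dict mapping username -> date of first edit (hypothetical input)
    """
    # Approximate a year as 365 days; good enough for a tenure threshold.
    cutoff = as_of - timedelta(days=365 * min_years)
    # Sort so that a fixed seed gives a reproducible sample.
    veterans = sorted(name for name, first in first_edit.items() if first <= cutoff)
    rng = random.Random(seed)
    # Never ask for more names than there are veterans.
    return rng.sample(veterans, min(size, len(veterans)))
```

Running this once per year with a fresh seed gives a new cohort each time; the edits of each cohort would then be tallied separately.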
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Yes, I would also love to see these stats without bot edits. We currently calculate them ourselves for Wikidata, where we glean much more interesting and informative data than when we include the bot edits. I would love to be able to compare that to the Wikipedias.
I also assume that the March spike was due to Wikidata bots, and that the subsequent drop to a very low level is due to the language link edits no longer being necessary, but I would love to see this confirmed with actual data instead of gut feeling and interpretation.
Cheers, Denny
These charts may be what you're looking for:
http://stats.wikimedia.org/EN/PlotsPngEditHistoryTop.htm
more charts and tables per wiki
http://stats.wikimedia.org/EN/EditsRevertsEN.htm
Wikidata: http://stats.wikimedia.org/wikispecial/EN/EditsRevertsWIKIDATA.htm
See also: http://infodisiac.com/blog/2013/07/new-edit-and-revert-stats/
In a week or so all wikis should have up to date charts.
Erik Zachte
Graphs showing non-bot edits have been available for the largest Wikipedias since earlier this month, see Erik Z.'s announcement at http://infodisiac.com/blog/2013/07/new-edit-and-revert-stats/ .
E.g. English Wikipedia: http://stats.wikimedia.org/EN/EditsRevertsEN.htm
Thanks! Those graphs are really useful. Is there any possibility that they could be added to the main stat pages for the projects, like http://stats.wikimedia.org/EN/SummaryEN.htm, or even the report card pages?
It's unfortunate that such informative graphs are currently hidden away.
Ryan Kaldari
Hi Erik,
Would it be possible to generate the non-bot edits as part of the Monthly Reportcard set of jobs?
best, D
Yep, I'm also in favor of using bot-free edits on most charts and for Monthly Reportcard.
I think in most cases they can replace current stats instead of duplicating charts.
Erik
Navigation on the Wikistats portal is far from ideal.
Right now you can find them as follows:
1) Choose Special.
2) Choose Edits & Reverts.
3) You'll see links to an overview per project, which in turn links to a page per wiki with detailed tables and charts.
Erik Zachte
Great stuff Erik, thanks! It's fascinating to see the differences per project. The 9% for reversions on the English Wikipedia seemed really high to me, but then I saw that about half of those are IPs reverting their own edits.
I love the huge spike in bot edits on the Dutch Wikipedia - is that all Wikidata stuff, or is that the species/genus bot?
Jane
Ryan Kaldari, 23/07/2013 07:44:
My question is: Would it be possible to replace or augment these graphs with graphs that exclude bot edits? I know that bot status is not stored in the revision table, so this would be quite expensive to tally. Would it be prohibitively expensive? Sorry if this is a dumb question.
Just don't use that graph to answer that question, because it's not the appropriate one. Changing the definitions of metrics is however tricky and best avoided whenever possible. If you want number of edits specifically, you can instead look at the recently revived http://stats.wikimedia.org/EN/PlotsPngEditHistoryAll.htm (most of them still to be updated), see http://infodisiac.com/blog/2013/07/new-edit-and-revert-stats/
Nemo
Bots aren't difficult to filter out btw, *at least* not for sub-samples or on a one-off basis.
For instance, this ugly, inefficient query gets all non-bot and non-IP edits to the project pages of WikiProject Medicine from 2007-2012 (19,097 edits). It runs in about 2 seconds on stat1.
select count(rev_id)
from enwiki.revision as r, enwiki.page as p
where p.page_id = r.rev_page
  and p.page_namespace in (4,5)
  and p.page_title like "WikiProject_Medicine%"
  and r.rev_timestamp between "20070101000000" and "20080101000000"
  and r.rev_user != 0
  and r.rev_user_text not like "%Bot"
  and r.rev_user_text not like "%bot"
  and r.rev_user not in (select ug_user from enwiki.user_groups where ug_group = 'bot');
The user_groups table tracks registered bots, and the string matching excludes bots* that are being run on the DL (which are more common than you might expect).
I always exclude bots** from any analysis I do, since they grossly inflate activity counts in unpredictable ways.
I definitely think it would be useful to make bot-filtered data available in wikistats and/or Limn.
- J
*also unfortunately excludes the odd user with 'bot' in their username, like User:I_Jethrobot :(
**unless, of course, I'm studying bots specifically
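To make the three filters in the query above easier to see side by side, here is a rough Python sketch that applies the same logic to an in-memory list of revisions. It is only an illustration: the function names, the `(user_id, username)` input shape, and the sample usernames other than I_Jethrobot are assumptions made for the example.

```python
def looks_like_bot(username):
    """Mirror the two LIKE patterns: names ending in 'Bot' or 'bot'."""
    return username.endswith(("Bot", "bot"))


def count_human_edits(revisions, flagged_bot_ids):
    """Count revisions made by registered users that pass both bot filters.

    revisions: iterable of (user_id, username); user_id 0 means an IP edit
    flagged_bot_ids: set of user ids that carry the 'bot' group
    """
    return sum(
        1
        for user_id, username in revisions
        if user_id != 0                       # drop anonymous (IP) edits
        and user_id not in flagged_bot_ids    # drop registered bots
        and not looks_like_bot(username)      # drop bot-sounding names
    )


# Hypothetical sample data, for illustration only.
sample = [
    (0, "192.0.2.1"),       # IP edit
    (1, "Alice"),           # ordinary editor
    (2, "ExampleBot"),      # unflagged account with a bot-like name
    (3, "I_Jethrobot"),     # human editor caught by the name filter
    (4, "QuietHelper"),     # flagged bot without a bot-like name
]
human_edits = count_human_edits(sample, flagged_bot_ids={4})
```

With this sample only Alice's edit is counted, which shows both why the name filter is useful and how it produces the false positive mentioned in the footnote.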
Wikistats has several criteria for bot detection:
1) Is the name registered as a bot, in other words is there a bot flag in the user groups table?
2) Does it sound like a bot? (Nowadays certain names are only allowed for bots on many wikis.)
More precisely: does '[Bb]ot' occur at the end of the name or before a non-alpha character?
3) Is it known to be an unregistered bot? (Wikipedia has a list of false negatives at http://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits/Unflagged_bots ) I copied that list long ago but do not keep it auto-updated.
4) If a name is flagged as a bot on at least 10 wikis, then treat it as a bot on any wiki within the project.
(In the past, when user names could easily collide, this was more relevant.)
Basic rationale is that on smaller wikis bot registrations are often forgotten.
With SUL it is unlikely that people use the same name as a bot on one wiki and as a regular user on another wiki.
5) Three names that sound like bots are hard-coded as exemptions (people who wrote about it).
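The criteria above can be sketched roughly as follows. This is not the Wikistats code, just an illustration of how the five rules might combine; the function name, the input shapes, the order in which the rules are checked, and the restriction of "alpha" to ASCII letters are all assumptions.

```python
import re

# Rule 2: '[Bb]ot' at the end of the name or before a non-alpha character.
# "Alpha" is taken to mean ASCII letters here, which is an assumption.
BOT_NAME = re.compile(r"[Bb]ot(?![A-Za-z])")

CROSS_WIKI_THRESHOLD = 10  # rule 4: flagged on at least this many wikis


def is_bot(name, wiki, bot_flags, unflagged_bots=frozenset(), exemptions=frozenset()):
    """Decide whether `name` should be treated as a bot on `wiki`.

    bot_flags: dict mapping wiki -> set of names holding the bot flag there,
               assumed to cover the wikis of a single project
    unflagged_bots: names from a manually maintained list (rule 3)
    exemptions: bot-sounding names known to belong to humans (rule 5)
    """
    if name in bot_flags.get(wiki, set()):                     # rule 1
        return True
    if BOT_NAME.search(name) and name not in exemptions:       # rules 2 and 5
        return True
    if name in unflagged_bots:                                 # rule 3
        return True
    flagged_on = sum(1 for names in bot_flags.values() if name in names)
    return flagged_on >= CROSS_WIKI_THRESHOLD                  # rule 4
```

Under this pattern a name like "Botanist" is not matched, because the letter "a" follows "Bot", while a name ending in "bot" is matched unless it is on the exemption list.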
--
Jonathan:
I definitely think it would be useful to make bot-filtered data available
in wikistats and/or Limn.
And we do, in Wikistats, did you miss this part of the thread:
http://stats.wikimedia.org/EN/PlotsPngEditHistoryTop.htm
more charts and tables per wiki
http://stats.wikimedia.org/EN/EditsRevertsEN.htm
See also: http://infodisiac.com/blog/2013/07/new-edit-and-revert-stats/
But bot-free edits in Limn is a good point, thanks for your +1
Erik
Yep. Missed it :) maybe I need a bot to summarize email threads.
tapped on a touchscreen. please excuse my terseness and tyypos.
-- Jonathan T. Morgan Research Strategist Wikimedia Foundation