Summary: we have some new stats regarding gadget usage across WMF sites, but I'd like more analysis of gadget & bot usage.
Oliver Keyes has some code and results up at https://github.com/Ironholds/MetaAnalysis/tree/master/GadgetUsage to analyze "data around gadgets being used on various wikimedia projects":
"GadgetUsage.r is the generation script. It is dependent on (a) access to the analytics slaves and (b) the list of databases
"gadget_data.tsv is the raw data, consisting of an aggregate number of users for each preference on each wiki, with preference, wiki and wiki type (source, wiki, versity, etc) defined.
"gadgets_by_wikis.tsv is a rework of the data to look at what gadgets are used on multiple wikis, and how many wikis that is. It also includes an aggregate of the number of users across those wikis using the gadget.
"wikis_by_gadgets.tsv is a rework that looks at the number of distinct gadgets on each individual wiki. Unsuprisingly there's a power law."
This helps a lot with addressing one of the analytics "dreams" from https://www.mediawiki.org/wiki/Analytics/Dreams - "What proportion of logged-in editors have activated any gadgets at all? What are the most popular gadgets?" However, Oliver's data "is based on preference data - it may or may not include data for those gadgets set as defaults." So if someone could improve this to ensure that we appropriately count gadget usage for gadgets that default to on, that would be very helpful.
My team would also like to know:
- who maintains the most popular gadgets? (so we can invite them to hackathons, help get them training, get those gadgets localised and ported to other wikis, and so on)
- when were the gadgets last updated? (so we can identify stale ones that enthusiastic volunteers could take over maintaining)
- similar stats regarding bot usage -- what bots are making the most edits, or edits that in aggregate change the most bytes? who owns those bots? what wikis are they active on? (so we can help maintainers better, ensure they hear about API breaking changes, etc., and develop a bot inventory/directory to make it easier for other wikis' users to start using useful bots)
If there's anyone interested in taking this on, either inside or outside WMF's Analytics team, that would be great. Otherwise I anticipate that Engineering Community Team will take it on sometime in the October-December 2013 period.
Sumana,
- similar stats regarding bot usage -- what bots are making the most edits, or edits that in aggregate change the most bytes? who owns those bots? what wikis are they active on?
There are some wikistats reports on bots for each project, e.g. Wikipedia: http://stats.wikimedia.org/EN/BotActivityMatrixCreates.htm http://stats.wikimedia.org/EN/BotActivityMatrixEdits.htm
Erik
-----Original Message----- From: analytics-bounces@lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Sumana Harihareswara Sent: Thursday, July 04, 2013 12:23 AM To: A mailinglist for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics. Subject: [Analytics] Statistics on gadget & bot usage on all wikis
-- Sumana Harihareswara Engineering Community Manager Wikimedia Foundation
_______________________________________________ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Hi Sumana,
We've got a bit of an analyst shortage at the moment, but I'll see if we can get this done earlier than October. I'll touch base next week.
-Toby
Hi Sumana,
This is great information on gadget "enablement" but you're right, an enabled preference does not equate to usage per se. There must be some way to get this information out of the logfiles. I'm not familiar enough with how gadget usage can be quantified, but I'm sure that it should be possible.
I've been getting more deeply involved with the analytics team as a volunteer ever since I found out about Kraken at the Amsterdam Hackathon. To this end I've come up with some questions that I'd like to be able to ask of the data, and I've sent them to Diederik and Erik as a basis for providing some insights that I think will be valuable. When I am further along in exploring the data and learning the tools, I'll be happy to help look for gadget and bot "signs of life" in the logfiles.
Cheers,
--Michael
Thanks, Michael! Have you had any progress in this area?
Thanks, Sumana
I'm about to head off on my sabbatical, but I just wanted to mention that I would be grateful for any work people could do on the request I mentioned below. Thanks!
(More info at http://lists.wikimedia.org/pipermail/wikitech-l/2013-August/071542.html about my sabbatical. If you need to talk about Wikimedia technical community stuff before January, please consult Quim Gil, qgil at wikimedia dot org. Looking forward to coming back in January.)
Hi Sumana, (in three months I guess)
On 4-7-2013 0:22, Sumana Harihareswara wrote:
Summary: we have some new stats regarding gadget usage across WMF sites, but I'd like more analysis of gadget & bot usage.
Oliver Keyes has some code and results up at https://github.com/Ironholds/MetaAnalysis/tree/master/GadgetUsage to analyze "data around gadgets being used on various wikimedia projects":
A while ago I did some simple analytics on user language settings. It's at https://commons.wikimedia.org/w/index.php?title=Commons:Template_i18n/Interface_language_statistics&oldid=19334123 and was based on the user_properties table (https://www.mediawiki.org/wiki/Manual:User_properties_table). I'm pretty sure you can grab from that table which gadgets are enabled. It should be straightforward to write a script that goes over all wikis, grabs these settings and produces a report. You need elevated access for this because this is not normally exposed in the data we have at the Toolserver or Toollabs.
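A minimal sketch of the per-wiki query described above. It assumes that gadget preferences are stored in user_properties as rows whose up_property starts with "gadget-" and whose up_value is 1 when the gadget is enabled; that naming is an assumption here, not something confirmed in this thread. The in-memory SQLite table and the sample rows are hypothetical stand-ins for a real wiki database, which would need the elevated access mentioned above.

```python
import sqlite3


def count_gadget_users(conn):
    """Return {gadget_name: number of users with it enabled} for one wiki."""
    rows = conn.execute(
        """
        SELECT up_property, COUNT(DISTINCT up_user)
        FROM user_properties
        WHERE up_property LIKE 'gadget-%'
          AND up_value = '1'
        GROUP BY up_property
        """
    ).fetchall()
    # Strip the assumed 'gadget-' prefix to get the bare gadget name.
    return {prop[len("gadget-"):]: users for prop, users in rows}


def demo_connection():
    """Build a tiny, made-up user_properties table for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_properties "
        "(up_user INTEGER, up_property TEXT, up_value TEXT)"
    )
    conn.executemany(
        "INSERT INTO user_properties VALUES (?, ?, ?)",
        [
            (1, "gadget-HotCat", "1"),
            (2, "gadget-HotCat", "1"),
            (2, "gadget-Popups", "1"),
            (3, "gadget-Popups", "0"),  # explicitly switched off
            (3, "language", "nl"),      # unrelated preference
        ],
    )
    return conn
```

Note that a count like this only sees stored preferences, so a gadget that defaults to on would still be undercounted unless users who never touched the setting are added back in.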
My team would also like to know:
- who maintains the most popular gadgets? (so we can invite them to hackathons, help get them training, get those gadgets localised and ported to other wikis, and so on)
All gadgets have a prefix, so it should be straightforward to write a script to see who makes the most edits on these.
- when were the gadgets last updated? (so we can identify stale ones that enthusiastic volunteers could take over maintaining)
Easy one too.
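Both of these answers could be sketched roughly as follows, assuming the revision history of the wiki's pages has already been fetched into (title, user, timestamp) records, and assuming that the gadget prefix is "MediaWiki:Gadget-". The record layout, the prefix constant and the sample data are hypothetical and only serve the illustration.

```python
from collections import Counter
from datetime import datetime

GADGET_PREFIX = "MediaWiki:Gadget-"  # assumed title prefix for gadget pages


def gadget_edit_stats(revisions):
    """Summarise gadget-page revisions.

    revisions: iterable of (title, user, iso_timestamp) tuples.
    Returns (Counter of edits per user, {title: datetime of latest edit}).
    """
    edits_per_user = Counter()
    last_updated = {}
    for title, user, timestamp in revisions:
        if not title.startswith(GADGET_PREFIX):
            continue  # not a gadget page
        edits_per_user[user] += 1
        when = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
        if title not in last_updated or when > last_updated[title]:
            last_updated[title] = when
    return edits_per_user, last_updated


# Made-up sample revisions, not real data.
SAMPLE = [
    ("MediaWiki:Gadget-HotCat.js", "Alice", "2013-05-01T10:00:00Z"),
    ("MediaWiki:Gadget-HotCat.js", "Bob", "2013-06-15T12:30:00Z"),
    ("MediaWiki:Gadget-Popups.js", "Alice", "2011-02-03T08:00:00Z"),
    ("MediaWiki:Sidebar", "Carol", "2013-07-01T09:00:00Z"),
]
```

Calling `most_common(10)` on the returned counter would list the most active gadget editors, and sorting the second result by value would surface the gadgets that have gone longest without an update.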
- similar stats regarding bot usage -- what bots are making the most edits, or edits that in aggregate change the most bytes? who owns those bots? what wikis are they active on? (so we can help maintainers better, ensure they hear about API breaking changes, etc., and develop a bot inventory/directory to make it easier for other wikis' users to start using useful bots)
Erik already pointed out some useful statistics on that. As you might know I'm one of the maintainers of Pywikipedia. In the framework we send the user-agent in this format: USER_AGENT_FORMAT = '{script}/r{version[rev]} Pywikipediabot/1.0' or '{script}/r{version[rev]} Pywikipediabot/2.0'. We really don't know who is running what. Maybe you can gather some data (maybe with more samples) to answer the following questions:
- Usage per script
- What version are people running? (we switched from svn to git some time ago, but a lot of people still seem to be using svn)
- Total number of distinct users using this (distinct IP addresses)
- The 1.0 / 2.0 ratio
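The four questions above could be answered with something like the sketch below, assuming the request logs have already been reduced to (IP address, user-agent) pairs. That input layout, the function name and the sample requests are hypothetical; only the user-agent format itself comes from the framework.

```python
import re
from collections import Counter

# Mirrors '{script}/r{version[rev]} Pywikipediabot/1.0' (or /2.0).
UA_PATTERN = re.compile(
    r"^(?P<script>\S+)/r(?P<rev>\S+) Pywikipediabot/(?P<framework>[12]\.0)$"
)


def summarise_user_agents(requests):
    """requests: iterable of (ip_address, user_agent) pairs."""
    per_script = Counter()
    per_revision = Counter()
    per_framework = Counter()
    ips = set()
    for ip, user_agent in requests:
        match = UA_PATTERN.match(user_agent)
        if match is None:
            continue  # not a Pywikipediabot request
        per_script[match.group("script")] += 1
        per_revision[match.group("rev")] += 1
        per_framework[match.group("framework")] += 1
        ips.add(ip)
    return {
        "per_script": per_script,
        "per_revision": per_revision,
        "per_framework": per_framework,
        "distinct_ips": len(ips),
    }


# Made-up sample requests; the revision numbers are invented.
SAMPLE_REQUESTS = [
    ("192.0.2.1", "interwiki.py/r11745 Pywikipediabot/1.0"),
    ("192.0.2.1", "replace.py/r11745 Pywikipediabot/1.0"),
    ("192.0.2.2", "interwiki.py/r11500 Pywikipediabot/2.0"),
    ("192.0.2.3", "Mozilla/5.0 (X11; Linux x86_64)"),
]
```

The 1.0 / 2.0 ratio then follows from the two entries in `per_framework`. Distinct IP addresses are only a rough proxy for distinct users, since several people can share one address and one person can use several.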
Maarten
User agents would be great to get a hold of in an easy/aggregated format :). I'm planning on redoing my gadgetry script (which runs over the analytics slaves inside the cluster, rather than the TS), so if people have specific questions - last edits to gadgets, that sort of thing - drop me a note.
On 28 September 2013 04:55, Maarten Dammers maarten@mdammers.nl wrote: