Hi Analytics,
I'm looking for a way to do a cohort analysis for a presentation that I'm drafting.
I want a report that shows:
- A list of languages showing the users who speak each language as identified on their user pages
- A list of projects where users have made at least 5 edits in the past 12 months
- A list of group members that have administrator rights and which wikis are involved
- A list of public mailing lists where members have contributed in the past 12 months
- Number of public emails on those mailing lists in the past 12 months
- Total edits made by the cohort
- Total bytes changed by the cohort
- Total logged-in time for the cohort, if log-in time aggregation is being done
What automated tools could I use to create this report?
Are there any significant editor productivity metrics that are missing from this list?
Thanks,
Pine
Context for this?
The answer, for some of those questions, is like to be "none of them". Those that look problematic:
- We don't track self-reported language skills, because, well, userboxen are incredibly inconsistent between projects, and building up an accurate index would be a monumental effort. Wikidata would reduce the difficulty of this, but not to the point where it's either easy enough that it's no problem for someone to do in their spare time, or important enough that we've dedicated time to it.
- Tracking users between projects is something we're beard-stroking over at the moment because the CentralAuth infrastructure and db structure is the anti-pattern for "designing for researchers".
- We don't track contributions to mailing lists, and it would be very difficult to accurately and automatically line up mailing list contributions with users in a public tool unless we exposed email addresses as registered by MediaWiki - which we don't do, because that's personal information.
- We don't do any work around login time. We don't track it directly, and we can't approximate it indirectly (we can approximate edit session length, but that's a different thing entirely, and something done on an ad-hoc rather than regular basis).
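To illustrate the last point, here is a minimal sketch of how edit session length could be approximated from edit timestamps alone. It is not the method used by the Analytics team: the function name and the one-hour inactivity cutoff are assumptions chosen for illustration. It also shows why this is a different thing from login time, since a session with a single edit has zero length no matter how long the user stayed logged in.

```python
from datetime import datetime, timedelta


def session_lengths(timestamps, cutoff=timedelta(hours=1)):
    """Group edit timestamps into sessions and return each session's duration.

    A new session starts whenever the gap between two consecutive edits
    exceeds `cutoff` (an assumed inactivity threshold, not an official value).
    """
    ordered = sorted(timestamps)
    if not ordered:
        return []
    sessions = []
    start = prev = ordered[0]
    for ts in ordered[1:]:
        if ts - prev > cutoff:
            # Gap too long: close the current session and open a new one.
            sessions.append(prev - start)
            start = ts
        prev = ts
    sessions.append(prev - start)
    return sessions


# Hypothetical edit times for one user on one day.
edits = [
    datetime(2014, 5, 25, 10, 0),
    datetime(2014, 5, 25, 10, 20),
    datetime(2014, 5, 25, 10, 50),
    datetime(2014, 5, 25, 14, 0),
    datetime(2014, 5, 25, 14, 5),
]
lengths = session_lengths(edits)
```

With these sample timestamps the edits fall into two sessions, one spanning the three morning edits and one spanning the two afternoon edits.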
On 25 May 2014 00:16, ENWP Pine deyntestiss@hotmail.com wrote:
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Bah. "likely to be".
On 25 May 2014 03:24, Oliver Keyes okeyes@wikimedia.org wrote:
-- Oliver Keyes Research Analyst Wikimedia Foundation
On May 25, 2014 5:24 PM, "Oliver Keyes" okeyes@wikimedia.org wrote:
Context for this?
The answer, for some of those questions, is like to be "none of them".
Those that look problematic:
*We don't track self-reported language skills, because, well, userboxen
are incredibly inconsistent between projects,
How many wikis have the Babel extension implemented?
-- John
John Mark Vandenberg, 25/05/2014 14:51:
How many wikis have the Babel extension implemented?
All, since 2011. (But usage practices vary.) https://blog.wikimedia.org/2011/09/21/babel-extension-live-on-the-wmf-projec...
Nemo
On Mon, May 26, 2014 at 3:19 AM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
John Mark Vandenberg, 25/05/2014 14:51:
How many wikis have the Babel extension implemented?
All, since 2011. (But usage practices vary.) https://blog.wikimedia.org/2011/09/21/babel-extension-live-on-the-wmf-projec...
It looks like it isn't implemented on English Wikipedia...?
Can we see how many wikis have 'fully' implemented it? i.e. all non-deprecated templates use it, and most user-pages have been migrated if necessary.
There are quite a few bugs in it which might be preventing adoption, such as auto-recreating deleted categories which tends to annoy admins.
On Mon, May 26, 2014 at 1:46 PM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
John Mark Vandenberg, 26/05/2014 02:32:
Can we see how many wikis have 'fully' implemented it? i.e. all non-deprecated templates use it, and most user-pages have been migrated if necessary.
Probably not. But why would someone care about English Wikipedia?
;-)
This type of implementation change (template changes; userboxes) tends, if done on en.wp, to trickle from English Wikipedia to most of the Wikipedias outside the top ~20. Copying templates from en.wp is [[Claytons]] scary transclusion.
-- John Vandenberg
John Mark Vandenberg, 26/05/2014 09:20:
;-)
This type of implementation change (template changes; userboxes) tends, if done on en.wp, to trickle from English Wikipedia to most of the Wikipedias outside the top ~20. Copying templates from en.wp is [[Claytons]] scary transclusion.
Why would someone import {{babel}} when {{#babel}} works out of the box with same syntax? Not that any wikis are being created anyway.
Nemo
Federico Leva (Nemo), 26/05/2014 10:38:
Why would someone import {{babel}} when {{#babel}} works out of the box with same syntax? Not that any wikis are being created anyway.
I did some silly bzgrep -c and the usage of {{#babel}} is significant on most wikis; to have the total you should count how many {{babel}} are just (correct) wrappers of {{#babel}}. Not only on several huge wikis #babel is more used, but even where it's less used it usually grows more than the old template from one dump to the next (even on en.wiki new usages are comparable). http://koti.kapsi.fi/~federico/babel.csv
Nemo
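The bzgrep -c counts mentioned above can be approximated in Python. This is a minimal sketch, not the command Nemo actually ran: the regular expressions and function names are assumptions, and like grep -c it counts matching lines rather than individual occurrences. It only matches the literal names "#babel" and "babel", so localized template names and wrappers around {{#babel}} would need separate handling.

```python
import bz2
import re

# Parser function form, e.g. {{#babel:en|fr-2}}
PARSER_FUNCTION = re.compile(r"\{\{\s*#babel\s*:", re.IGNORECASE)
# Legacy template form, e.g. {{Babel|en|fr-2}}
TEMPLATE = re.compile(r"\{\{\s*babel\s*\|", re.IGNORECASE)


def count_babel_lines(lines):
    """Count lines that use the #babel parser function or the babel template."""
    counts = {"#babel": 0, "babel": 0}
    for line in lines:
        if PARSER_FUNCTION.search(line):
            counts["#babel"] += 1
        if TEMPLATE.search(line):
            counts["babel"] += 1
    return counts


def count_babel_in_dump(path):
    """Stream a bzip2-compressed dump line by line without unpacking it to disk."""
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as dump:
        return count_babel_lines(dump)


sample = [
    "{{#babel:en|fr-2}}",
    "{{Babel|en|de-1}}",
    "a line without any language box",
    "{{ #Babel: it }} and {{babel|it}}",
]
sample_counts = count_babel_lines(sample)
```

Running count_babel_in_dump on two successive dumps of the same wiki and comparing the results would give the kind of dump-to-dump growth comparison described above.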
On Wed, May 28, 2014 at 7:55 PM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Federico Leva (Nemo), 26/05/2014 10:38:
Why would someone import {{babel}} when {{#babel}} works out of the box with same syntax? Not that any wikis are being created anyway.
I did some silly bzgrep -c and the usage of {{#babel}} is significant on most wikis; to have the total you should count how many {{babel}} are just (correct) wrappers of {{#babel}}. Not only on several huge wikis #babel is more used, but even where it's less used it usually grows more than the old template from one dump to the next (even on en.wiki new usages are comparable). http://koti.kapsi.fi/~federico/babel.csv
Thank you!
So, the wikis with the largest non-#babel usage are enwiki, dewiki, plwiki and itwiki. Upgrading the usage on them to #babel should establish a good dataset to work with.
Since you have the dumps there, any chance you could find which templates use #babel on enwiki, dewiki, plwiki and itwiki? The search interface doesn't report them.
https://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=adva...
Here is the status of {{Babel}} on the wikis with the largest usage of {{#babel}}.
https://commons.wikimedia.org/wiki/Template:Babel - deprecated
https://meta.wikimedia.org/wiki/Template:Babel - deprecated
https://incubator.wikimedia.org/wiki/Template:Babel - deprecated
https://www.wikidata.org/wiki/Template:Babel - 'do not use'
https://la.wikipedia.org/wiki/Template:Babel - deleted
https://is.wikipedia.org/wiki/Template:Babel - deleted
https://en.wikisource.org/wiki/Template:Babel - deleted
{{Babel}} is a wrapper for #babel: eswiki and cswiki
I want a report that shows:
- A list of languages showing the users who speak each language as
identified on their user pages
- A list of projects where users have made at least 5 edits in the past 12
months
- A list of group members that have administrator rights and which wikis
are involved
- A list of public mailing lists where members have contributed in the
past 12 months
- Number of public emails on those mailing lists in the past 12 months
See what Oliver said for these questions.
- Total edits made by the cohort
- Total bytes changed by the cohort
For these, you could use Wikimetrics: https://metrics.wmflabs.org/. We have Edits and Bytes Added metrics which you can run on any cohort. Cohorts consist of lists of users in specific projects. So it wouldn't be able to go across projects for you (yet, we're working on that) but you could upload a cohort that has each user you care about in each project you care about. Also, we're adding new metrics soon.
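As a rough sketch of the "each user in each project" workaround, the cross product of users and projects can be written out as a CSV file for upload. The two-column layout (user name, project database name), the function names, and the sample user names are assumptions for illustration; check the Wikimetrics support page for the upload format it actually expects.

```python
import csv
import io
from itertools import product


def cohort_rows(usernames, projects):
    """Pair every user with every project, one (user, project) row each."""
    return [(user, project) for user, project in product(usernames, projects)]


def write_cohort_csv(usernames, projects, out):
    """Write the rows as CSV; the csv module quotes names that contain commas."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(cohort_rows(usernames, projects))


# Hypothetical users and projects.
buffer = io.StringIO()
write_cohort_csv(["ExampleUserA", "ExampleUserB"], ["enwiki", "dewiki"], buffer)
cohort_csv = buffer.getvalue()
```

Users who have never edited a given project would still appear in that project's rows, so any per-project results for them would simply be zero.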
- Total logged-in time for the cohort, if log-in time aggregation is being
done
Same as above; see Oliver's answer.
Are there any significant editor productivity metrics that are missing from
this list?
We're adding a few new metrics to Wikimetrics, but Survival, Pages Created, and Threshold are also available right now. I wouldn't say those are "significant productivity metrics" but they're something to get started with. The new metrics we're implementing aim to be more along the lines of what you're looking for. You can see that work here: https://meta.wikimedia.org/wiki/Research:Metrics_standardization and specifically we're going to be adding these four metrics within the next few sprints:
https://meta.wikimedia.org/wiki/Research:Newly_registered_user
https://meta.wikimedia.org/wiki/Research:New_editor
https://meta.wikimedia.org/wiki/Research:Productive_new_editor
https://meta.wikimedia.org/wiki/Research:Surviving_new_editor
Hope that helps and check out the Wikimetrics support page for more info.
* A list of public mailing lists where members have contributed in the past 12 months
You could start to build from the script used for
http://www.infodisiac.com/Wikipedia/ScanMail/_PowerPosters.html
Cheers,
Erik
From: analytics-bounces@lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of ENWP Pine Sent: Sunday, May 25, 2014 9:16 To: analytics@lists.wikimedia.org Subject: [Analytics] Cohort analysis
Hmm, assuming you can find the connection between mailing list name and editor name.
Both are public but often not the same.
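For anyone starting from scratch rather than from the script mentioned above, counting posts per sender over the past 12 months could look roughly like the sketch below. It is not Erik's script: the function name is an assumption, and it expects standard email messages such as those yielded by Python's mailbox.mbox. It also does nothing to solve the problem just described, because it keys on the sender's display name or address, which often differs from the wiki user name.

```python
from collections import Counter
from datetime import datetime, timezone
from email.utils import parseaddr, parsedate_to_datetime


def posts_per_sender(messages, since):
    """Count messages per sender that were sent on or after `since`.

    `messages` is any iterable of email.message.Message objects;
    `since` must be a timezone-aware datetime.
    """
    counts = Counter()
    for msg in messages:
        try:
            sent = parsedate_to_datetime(msg["Date"])
        except (TypeError, ValueError):
            continue  # Skip messages with a missing or unparseable Date header.
        if sent.tzinfo is None:
            # Treat naive dates as UTC so they can be compared with `since`.
            sent = sent.replace(tzinfo=timezone.utc)
        if sent >= since:
            name, address = parseaddr(msg.get("From", ""))
            counts[name or address] += 1
    return counts
```

To run it against a downloaded archive, pass mailbox.mbox(path) as the messages argument along with a cutoff date 12 months before the report date.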
From: Erik Zachte [mailto:ezachte@wikimedia.org] Sent: Sunday, May 25, 2014 17:01 To: 'A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.' Subject: RE: [Analytics] Cohort analysis