Hi all,
On the Growth team, we (and by we, I mean Aaron Halfaker) have been doing a great deal of work to understand trends in new article creation,[1] particularly from the new user perspective. Along with this and our launch of the new Draft namespace, we've discovered that our current data sources for tracking page creations, moves, and deletions are far too slow and awkward to use on a daily or weekly basis.
To solve this problem and answer ongoing questions about how many page creators there are, how successful they are, and what workflows they use, we've created three new schemas:
- https://meta.wikimedia.org/wiki/Schema:PageCreation
- https://meta.wikimedia.org/wiki/Schema:PageDeletion
- https://meta.wikimedia.org/wiki/Schema:PageMove
We envision using them similarly to how we're using schemas like Schema:ServerSideAccountCreation and Schema:PrefUpdate. We will likely be implementing these in our team's next sprint, starting on February 5th, so if you have feedback, please speak up soon. :)
1. https://meta.wikimedia.org/wiki/Research:Wikipedia_article_creation
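As a rough illustration of the kind of daily question these schemas are meant to answer (how many page creators there are and how often their pages survive), here is a minimal Python sketch. It assumes creation and deletion events have already been pulled into lists of dicts; the field names `user_id`, `page_id`, and `day` are hypothetical placeholders, not the actual fields of Schema:PageCreation or Schema:PageDeletion.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event records. The field names ("user_id", "page_id", "day")
# are illustrative assumptions, not the real fields of Schema:PageCreation
# or Schema:PageDeletion.
creations = [
    {"user_id": 1, "page_id": 10, "day": date(2014, 2, 5)},
    {"user_id": 2, "page_id": 11, "day": date(2014, 2, 5)},
    {"user_id": 1, "page_id": 12, "day": date(2014, 2, 6)},
]
deletions = [{"page_id": 11, "day": date(2014, 2, 6)}]


def daily_creator_stats(creations, deletions):
    """Per creation day: distinct creators and the share of pages not deleted so far."""
    deleted = {d["page_id"] for d in deletions}
    creators = defaultdict(set)  # day -> set of user ids who created pages
    created = defaultdict(int)  # day -> number of pages created
    surviving = defaultdict(int)  # day -> number of those pages not in `deleted`
    for event in creations:
        day = event["day"]
        creators[day].add(event["user_id"])
        created[day] += 1
        if event["page_id"] not in deleted:
            surviving[day] += 1
    return {
        day: {"creators": len(creators[day]), "survival_rate": surviving[day] / created[day]}
        for day in sorted(created)
    }


for day, stats in daily_creator_stats(creations, deletions).items():
    print(day, stats)
```

Grouping by the day the page was created (rather than the day it was deleted) keeps each day's survival rate tied to a fixed cohort of pages.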
Steven, thanks for the heads up. Is the expectation that you want to make these logs publicly available? I guess we could explore this possibility for these 3 schemas (they encode already public data, although in a cleaner format), but not the other two logs that you mention.
Dario
On Jan 24, 2014, at 3:54 PM, Steven Walling swalling@wikimedia.org wrote:
To solve this problem and answer ongoing questions about how many page creators there are, how successful they are, and what workflows they use, we've created three new schemas:
- https://meta.wikimedia.org/wiki/Schema:PageCreation
- https://meta.wikimedia.org/wiki/Schema:PageDeletion
- https://meta.wikimedia.org/wiki/Schema:PageMove
We envision using them similarly to how we're using schemas like Schema:ServerSideAccountCreation and Schema:PrefUpdate. We will likely be implementing these in our team's next sprint, starting on February 5th, so if you have feedback, please speak up soon. :)
-- 
Steven Walling, Product Manager
https://wikimediafoundation.org/
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
On Fri, Jan 24, 2014 at 4:07 PM, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:
Steven, thanks for the heads up. Is the expectation that you want to make these logs publicly available? I guess we could explore this possibility for these 3 schemas (they encode already public data, although in a cleaner format), but not the other two logs that you mention.
No. Not intending to make these public, though I don't think they contain any data that isn't already public.
In fact the idea of publishing a subset or censored view of EventLogging logs is something we’ve been tinkering with for a while (I think Erik M brought it up on this list a while ago).
Replicating a subset of the logs stored on s1.log on labs db sounds like the best way to approach the problem. The added bonus is that by having this (public / censored) data exposed via labs db, wikimetrics would be able to access it out of the box.
Copying Sean to see if this seems even remotely possible.
D
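One way to picture the "public / censored" view described above is a column whitelist applied while copying: only columns judged safe are replicated, and everything else stays behind. The following Python sketch uses SQLite purely as a stand-in for the actual log and labsdb databases; the table name, the column names (including `client_ip`), and the whitelist itself are hypothetical.

```python
import sqlite3

# Hypothetical whitelist of columns considered safe to publish.
# These names are illustrative assumptions, not real EventLogging columns.
PUBLIC_COLUMNS = ["created_at", "page_id", "namespace", "user_id"]


def replicate_sanitized(src, dst, table, columns):
    """Copy only the whitelisted `columns` of `table` from src to dst (sketch).

    Identifiers are interpolated into SQL, which is acceptable here only because
    they come from a trusted, hard-coded whitelist; never build them from user input.
    """
    col_list = ", ".join(columns)
    rows = src.execute(f"SELECT {col_list} FROM {table}").fetchall()
    placeholders = ", ".join("?" for _ in columns)
    dst.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_list})")
    dst.executemany(f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows)
    dst.commit()
    return len(rows)


# Usage: a source table with one extra, non-public column (client_ip).
src = sqlite3.connect(":memory:")
src.execute(
    "CREATE TABLE PageCreation "
    "(created_at TEXT, page_id INTEGER, namespace INTEGER, user_id INTEGER, client_ip TEXT)"
)
src.execute("INSERT INTO PageCreation VALUES ('20140205120000', 10, 0, 1, '192.0.2.1')")
dst = sqlite3.connect(":memory:")
copied = replicate_sanitized(src, dst, "PageCreation", PUBLIC_COLUMNS)
print(copied, [row[1] for row in dst.execute("PRAGMA table_info(PageCreation)")])
```

A whitelist (rather than a blacklist of sensitive columns) fails safe: a column added to a schema later is not published until someone explicitly approves it.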
On Jan 24, 2014, at 4:24 PM, Steven Walling swalling@wikimedia.org wrote:
No. Not intending to make these public, though I don't think they contain any data that isn't already public.
On Fri, Jan 24, 2014 at 4:44 PM, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:
In fact the idea of publishing a subset or censored view of EventLogging logs is something we’ve been tinkering with for a while (I think Erik M brought it up on this list a while ago).
Replicating a subset of the logs stored on s1.log on labs db sounds like the best way to approach the problem. The added bonus is that by having this (public / censored) data exposed via labs db, wikimetrics would be able to access it out of the box.
Cool. I'd be behind doing that. I am also very interested in "exposing" data via automatic creation of cohorts in Wikimetrics in the future. (Will bring this up separately.)
Just wanted to publicly support both of these endeavors.
Sean, any thoughts on sanitized EventLogging tables ending up in a log database in labsdb?
Steven, any more thoughts on wikimetrics cohorts since the last time we spoke?
Also worth noting is that we're considering two bugs that would improve move and deletion logging in core as well.
See https://bugzilla.wikimedia.org/show_bug.cgi?id=26122 and https://bugzilla.wikimedia.org/show_bug.cgi?id=57084
On 01/24/2014 04:14 PM, Aaron Halfaker wrote:
Also worth noting is that we're considering two bugs that would improve move and deletion logging in core as well.
See https://bugzilla.wikimedia.org/show_bug.cgi?id=26122 and https://bugzilla.wikimedia.org/show_bug.cgi?id=57084
Yes, I think this is important for a couple of reasons:
* People will expect to see this information in the core logging dumps (pages-logging.xml.gz) and in Special:Log (some of the changes might only affect one of those), since moving, deleting, etc. are core functionality and there are already logs for these (albeit they need work).
* EventLogging does not have much (if any) uptake from third-party wikis yet.
Matt Flaschen
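For comparison with the EventLogging approach, counting move and deletion entries from the core logging dump (pages-logging.xml.gz) can be sketched in Python as below. This assumes the dump follows the MediaWiki XML export format, in which each log entry is a `<logitem>` element with `<type>` and `<action>` children; the function names and the in-memory sample are illustrative only.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    # Dump elements carry an XML namespace such as
    # "{http://www.mediawiki.org/xml/export-0.8/}logitem"; the version varies,
    # so compare on the local name only.
    return tag.rsplit("}", 1)[-1]


def count_log_actions(source, log_types=("move", "delete")):
    """Count (type, action) pairs of <logitem> entries in a gzipped logging dump."""
    counts = Counter()
    with gzip.open(source, "rb") as fh:
        for _event, elem in ET.iterparse(fh, events=("end",)):
            if local_name(elem.tag) != "logitem":
                continue
            # Only direct children of <logitem>, keyed by local name.
            fields = {local_name(child.tag): (child.text or "") for child in elem}
            if fields.get("type") in log_types:
                counts[(fields["type"], fields.get("action", ""))] += 1
            elem.clear()  # release the entry's children; real dumps are large
    return counts


# Usage with a tiny hypothetical sample instead of a real dump file.
sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <logitem><id>1</id><type>move</type><action>move</action></logitem>
  <logitem><id>2</id><type>delete</type><action>delete</action></logitem>
  <logitem><id>3</id><type>newusers</type><action>create</action></logitem>
</mediawiki>"""
counts = count_log_actions(io.BytesIO(gzip.compress(sample)))
print(counts)
```

Streaming with `iterparse` rather than loading the whole tree keeps memory roughly bounded, which matters for full-size logging dumps.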
Matthew Flaschen, 27/01/2014 18:29:
- EventLogging does not have much (if any) uptake from third-party wikis yet.
Or, more precisely, one wiki (that we know of): https://wikiapiary.com/wiki/SkyshardWiki
Nemo