Aaron, 

>(2) https://bugzilla.wikimedia.org/show_bug.cgi?id=67450
The bug is not about making data public. It concerns how data is inserted into EL from the consumers, so it deals with the 'system', not the 'data'. The raw data as inserted cannot be replicated directly to be made public, so whether inserts are more efficient does not affect the public/private discussion.


>(1) there needs to be a good review process in place to make sure that the data we surface isn't sensitive
There is a good deal of work involved in this item. For example, per our privacy policy some of this data should be discarded after 90 days, and currently it is not. You are also aware of the discussions on sanitization:
https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization

Basically, to make EL data public it needs to be aggregated with a level of anonymization we think is acceptable. There is quite a bit of work in this regard; here are some bugs that were filed a while back:

https://bugzilla.wikimedia.org/show_bug.cgi?id=62978

https://bugzilla.wikimedia.org/show_bug.cgi?id=59832

On Wed, Aug 13, 2014 at 3:39 PM, Aaron Halfaker <ahalfaker@wikimedia.org> wrote:
Hey folks,

We've been discussing ways to make more Wikimedia data public.  One of our sources for data is EventLogging (EL)[1], a system that lets us track events on both the client and server-side.  Recently, YuviPanda and springle have been working with us to figure out what issues need to be resolved in order to begin loading EL events that contain public data[2] into LabsDB for public consumption and for use in WikiMetrics.

It looks like there are three major concerns about directing EL to LabsDB.  (1) there needs to be a good review process in place to make sure that the data we surface isn't sensitive, (2) https://bugzilla.wikimedia.org/show_bug.cgi?id=67450 will need to be addressed to make sure that we don't over-utilize labs infrastructure and (3) we'll need signoff from legal. 

It looks like (2) can be taken care of independently from (1) and (3).  Is this bug already prioritized, and if not, could it be?

1. https://www.mediawiki.org/wiki/Extension:EventLogging
2. Eventually, we'll want a means to sanitize and surface events that contain sensitive information, but I'd argue that is a second step that we should address later since it will likely require more substantial technical work.

-Aaron


_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics