Aaron,
The bug is not about making data public. It is about how data is inserted
into EL from the consumers, so it deals with the 'system', not the 'data'.
The raw data as inserted cannot be replicated directly to be made public, so
whether inserts are more efficient does not affect the public/private
discussion.
(1) there needs to be a good review process in place to make sure that the
data we surface isn't sensitive
There is a fair amount of work involved in this item. For example, per our
privacy policy some of this data should be discarded after 90 days, and
currently it is not. You are also aware of the sanitization discussions:
https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization
Basically, to make EL data public it needs to be aggregated with a level of
anonymization we think is acceptable. There is quite a bit of work in this
regard; here are some bugs that were filed a while back:
https://bugzilla.wikimedia.org/show_bug.cgi?id=62978
https://bugzilla.wikimedia.org/show_bug.cgi?id=59832
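As a rough illustration of what "aggregated with a level of anonymization"
could look like, here is a minimal Python sketch. It counts events per bucket
and suppresses small buckets. The field names, the sample rows and the
threshold of 3 are all made up for the example; they are not EL's real schema
or an agreed policy.

```python
from collections import Counter

# Hypothetical threshold: buckets with fewer events than this are suppressed.
MIN_BUCKET_SIZE = 3


def aggregate_events(events, key_fields, min_count=MIN_BUCKET_SIZE):
    """Count events per combination of key_fields and drop small buckets."""
    counts = Counter(tuple(event[field] for field in key_fields) for event in events)
    return {key: n for key, n in counts.items() if n >= min_count}


# Made-up sample rows, not real EventLogging data.
sample_events = [
    {"schema": "Edit", "wiki": "enwiki", "day": "2014-08-13"},
    {"schema": "Edit", "wiki": "enwiki", "day": "2014-08-13"},
    {"schema": "Edit", "wiki": "enwiki", "day": "2014-08-13"},
    {"schema": "Edit", "wiki": "dewiki", "day": "2014-08-13"},
]

public_counts = aggregate_events(sample_events, ("schema", "wiki", "day"))
```

Only the enwiki bucket survives here, since the single dewiki event falls
below the threshold. What threshold and which fields are acceptable is
exactly the part that still needs review.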
On Wed, Aug 13, 2014 at 3:39 PM, Aaron Halfaker <ahalfaker(a)wikimedia.org>
wrote:
Hey folks,
We've been discussing ways to make more Wikimedia data public. One of our
sources for data is EventLogging (EL)[1], a system that lets us track
events on both the client and server-side. Recently, YuviPanda and
springle have been working with us to figure out what issues need to be
resolved in order to begin loading EL events that contain public data[2]
into LabsDB for public consumption and for use in WikiMetrics.
It looks like there are three major concerns about directing EL to LabsDB.
(1) there needs to be a good review process in place to make sure that the
data we surface isn't sensitive, (2)
https://bugzilla.wikimedia.org/show_bug.cgi?id=67450 will need to be
addressed to make sure that we don't over-utilize labs infrastructure and
(3) we'll need signoff from legal.
It looks like (2) can be taken care of independently from (1) and (3). Is
this bug already prioritized, and if not, could it be?
1. https://www.mediawiki.org/wiki/Extension:EventLogging
2. Eventually, we'll want a means to sanitize and surface events that
contain sensitive information, but I'd argue that is a second step that we
should address later since it will likely require more substantial
technical work.
-Aaron