(expanding on what I think Dan is referring to re: goals), addressing this issue would
allow EEVS to access data needed to generate breakdowns for metrics by method/target site
(mobile, desktop, apps).
On Aug 13, 2014, at 1:40 PM, Dan Andreescu <dandreescu(a)wikimedia.org> wrote:
Kevin, for what it's worth I don't think that
bug that Sean is asking for is that challenging. The relevant part we'd have to
change is really just a few lines [1]. I respect your decision of course, but I just
wanted to point out that this issue does drive towards some of our goals, as we talked a
bit about getting EventLogging data to be usable by Wikimetrics, and this is the first
step.
[1] -
https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FEventLogging/4d917e…
On Wed, Aug 13, 2014 at 4:19 PM, Aaron Halfaker <ahalfaker(a)wikimedia.org> wrote:
OK. Sounds reasonable. Sorry to seem as though I am pushing on you & the devs. In
fact, specifying that you won't have the bandwidth to even consider the bug until next
quarter gives me the power to push on others. >:)
Thanks!
-Aaron
On Wed, Aug 13, 2014 at 8:56 PM, Kevin Leduc <kevin(a)wikimedia.org> wrote:
Hi Aaron,
I was not planning on prioritizing any EventLogging work for the rest of this quarter.
The analytics dev team has a goal to get an EEVS dashboard running, and I want to keep them
focused; otherwise we will not reach this goal.
I'm tempted to ask what springle and YuviPanda can accomplish without the help of the
analytics devs, but even that will imply discussions and distractions from our goals.
In September I am planning on looking at what goals we can set for the next quarter and
look at what we want to accomplish with EventLogging. I was going to prioritize it at
that point.
On Wed, Aug 13, 2014 at 10:28 AM, Aaron Halfaker <ahalfaker(a)wikimedia.org> wrote:
Excellent. Kevin, can you work to get that bug[1] prioritized and let us know? I can
start working with R&D on a proposal to bring to legal.
1. https://bugzilla.wikimedia.org/show_bug.cgi?id=67450
It stands to reason that you would be interested in the capsule too as it holds the
timestamp and wiki project the event applies to, but I imagine we can make fields public
selectively.
Fair enough. I think we can drop that one column from the capsule and be quite happy
with the rest. No need to purge EventLogging.
-Aaron
On Wed, Aug 13, 2014 at 6:08 PM, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
Re. (2), I didn't say anything about that
being related to public/private.
This is a request from springle -- that if we are going to start pushing
Events to LabsDB, he'd like us to do so more efficiently. That bug is about
efficiently batching inserts.
ah, my mistake. Kevin can do prioritization as
needed.
If you are concerned about UserAgents as the
sanitization page you linked to suggests, then we should talk about the EventLogging
capsule, not the event.
If you want to be so precise, sure, that is correct. Note
that currently there is no distinction in storage as to the event and the capsule, they
are stored together in the same record. Capsule data is only identified by a prefix on the
column name. It stands to reason that you would be interested in the capsule too as it
holds the timestamp and wiki project the event applies to, but I imagine we can make
fields public selectively.
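To illustrate what "make fields public selectively" could look like when capsule and event data share one record, here is a minimal sketch. It assumes the capsule columns carry a `capsule_` prefix, which is a hypothetical value (the thread only says the capsule is identified by a prefix on the column name), and the column names and helper functions are made up for the example.

```python
# Hypothetical prefix: the thread does not say what the real prefix is.
CAPSULE_PREFIX = "capsule_"


def split_columns(columns, prefix=CAPSULE_PREFIX):
    """Separate capsule columns from event columns by column-name prefix."""
    capsule = [c for c in columns if c.startswith(prefix)]
    event = [c for c in columns if not c.startswith(prefix)]
    return capsule, event


def publishable_columns(columns, withheld):
    """Keep every column except the ones explicitly withheld, preserving order."""
    return [c for c in columns if c not in withheld]


# Made-up column names for a stored record (capsule and event side by side).
columns = [
    "capsule_timestamp",
    "capsule_wiki",
    "capsule_userAgent",
    "pageId",
    "newTitle",
]

capsule, event = split_columns(columns)
public = publishable_columns(columns, withheld={"capsule_userAgent"})
```

Here `public` keeps the timestamp and wiki columns along with the event fields, and only the one withheld column is dropped.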
On Wed, Aug 13, 2014 at 6:47 PM, Aaron Halfaker <ahalfaker(a)wikimedia.org> wrote:
Re. (2), I didn't say anything about that being related to public/private. This is a
request from springle -- that if we are going to start pushing Events to LabsDB, he'd
like us to do so more efficiently. That bug is about efficiently batching inserts.
I don't know what you are talking about re. 90 day purges. I'm talking about
100% public Event logging events -- E.g.
https://meta.wikimedia.org/wiki/Schema:PageMove
Also, we do *not* need to purge EventLogging event data at 90 days. We need to purge PII
at 90 days. We generally do not store PII in EventLogging events, but when we do, we
organize 90-day purges as we have recently for the anonymous editor experiments. If you
are concerned about UserAgents as the sanitization page you linked to suggests, then we
should talk about the EventLogging capsule, not the event.
Re. (1), we are already performing this review internally in order to determine what does
and does not conform to the Data Retention Guidelines. It seems clear that a robust
process could also identify non-sensitive Schemas that could be published in labs.
-Aaron
On Wed, Aug 13, 2014 at 5:00 PM, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
Aaron,
The bug does not have to do
with making data public. It has to do with how data is inserted into EL from the
consumers, so it deals with the 'system', not the 'data'. The raw data as
inserted cannot be replicated directly to be made public, so whether inserts are more
efficient does not affect the public/private discussion.
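For readers unfamiliar with what batching inserts means on the 'system' side, here is a minimal sketch using Python's standard `sqlite3` module. It is not EventLogging's consumer code; the `events` table, its columns, the batch size, and the `insert_batched` helper are all assumptions made for illustration.

```python
import sqlite3


def insert_batched(conn, rows, batch_size=500):
    """Insert rows in fixed-size batches, one transaction per batch.

    Returns the number of batches written.
    """
    batches = 0
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        # The connection context manager commits on success and
        # rolls back if the batch raises.
        with conn:
            conn.executemany(
                "INSERT INTO events (wiki, page_id) VALUES (?, ?)", chunk
            )
        batches += 1
    return batches


# Hypothetical table standing in for an event store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (wiki TEXT, page_id INTEGER)")

rows = [("enwiki", i) for i in range(1200)]
batch_count = insert_batched(conn, rows)
row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Grouping rows this way replaces one statement and commit per event with one per batch, which is the kind of saving the bug is after.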
(1) there needs to be a good review process in
place to make sure that the data we surface isn't sensitive
There is a bunch of
work involved in this item. For example: per our privacy policy some of this data should
be discarded after 90 days, and currently it is not. Also, you are aware of the discussions
under sanitization:
https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization
Basically, to make EL data public it needs to be aggregated with a level of anonymization
we think is acceptable. There is quite a bit of work in this regard; here are some bugs
that were filed a while back:
https://bugzilla.wikimedia.org/show_bug.cgi?id=62978
https://bugzilla.wikimedia.org/show_bug.cgi?id=59832
On Wed, Aug 13, 2014 at 3:39 PM, Aaron Halfaker <ahalfaker(a)wikimedia.org> wrote:
Hey folks,
We've been discussing ways to make more Wikimedia data public. One of our sources
for data is EventLogging (EL)[1], a system that lets us track events on both the client
and server-side. Recently, YuviPanda and springle have been working with us to figure out
what issues need to be resolved in order to begin loading EL events that contain public
data[2] into LabsDB for public consumption and for use in WikiMetrics.
It looks like there are three major concerns about directing EL to LabsDB: (1) there
needs to be a good review process in place to make sure that the data we surface isn't
sensitive, (2) https://bugzilla.wikimedia.org/show_bug.cgi?id=67450 will need to be
addressed to make sure that we don't over-utilize labs infrastructure, and (3)
we'll need signoff from legal.
It looks like (2) can be taken care of independently from (1) and (3). Is this bug
already prioritized, and if not, could it be?
1. https://www.mediawiki.org/wiki/Extension:EventLogging
2. Eventually, we'll want a means to sanitize and surface events that contain
sensitive information, but I'd argue that is a second step that we should address
later since it will likely require more substantial technical work.
-Aaron
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics