Hi Aaron,
I was not planning on prioritizing any EventLogging work for the rest of
this quarter. The analytics dev team has a goal to get an EEVS dashboard
running, and I want to keep them focused; otherwise we will not reach
this goal.
I'm tempted to ask what springle and YuviPanda can accomplish without
the help of the analytics devs, but even that will imply discussions and
distractions from our goals.
In September I am planning to look at what goals we can set for the
next quarter and at what we want to accomplish with EventLogging. I
was going to prioritize it at that point.
On Wed, Aug 13, 2014 at 10:28 AM, Aaron Halfaker <
ahalfaker(a)wikimedia.org> wrote:
Excellent. Kevin, can you work to get that bug[1] prioritized and let
us know? I can start working with R&D on a proposal to bring to legal.
1. https://bugzilla.wikimedia.org/show_bug.cgi?id=67450
> It stands to reason that you would be interested in the capsule too as
> it holds the timestamp and wiki project the event applies to, but I imagine
> we can make fields public selectively.
Fair enough. I think we can drop that one column from the capsule and
be quite happy with the rest. No need to purge EventLogging.
-Aaron
On Wed, Aug 13, 2014 at 6:08 PM, Nuria Ruiz <nuria(a)wikimedia.org>
wrote:
> > Re. (2), I didn't say anything about that being related to
> > public/private. This is a request from springle -- that if we are
> > going to start pushing Events to LabsDB, he'd like us to do so more
> > efficiently. That bug is about efficiently batching inserts.
> ah, my mistake. Kevin can do prioritization as needed.
>
> > If you are concerned about UserAgents as the sanitization page you
> > linked to suggests, then we should talk about the EventLogging
> > capsule, not the event.
> If you want to be so precise, sure, that is correct. Note that
> currently there is no distinction in storage between the event and the
> capsule; they are stored together in the same record. Capsule data is only
> identified by a prefix on the column name. It stands to reason that you
> would be interested in the capsule too, as it holds the timestamp and wiki
> project the event applies to, but I imagine we can make fields public
> selectively.
>
>
>
>
>
> On Wed, Aug 13, 2014 at 6:47 PM, Aaron Halfaker <
> ahalfaker(a)wikimedia.org> wrote:
>
>> Re. (2), I didn't say anything about that being related to
>> public/private. This is a request from springle -- that if we are going to
>> start pushing Events to LabsDB, he'd like us to do so more efficiently.
>> That bug is about efficiently batching inserts.
>>
>> I don't know what you are talking about re. 90 day purges. I'm
>> talking about 100% public Event logging events -- E.g.
>> https://meta.wikimedia.org/wiki/Schema:PageMove Also, we do *not*
>> need to purge EventLogging event data at 90 days. We need to purge PII at
>> 90 days. We generally do not store PII in EventLogging events, but when we
>> do, we organize 90 days purges as we have recently for the anonymous editor
>> experiments. If you are concerned about UserAgents as the sanitization
>> page you linked to suggests, then we should talk about the EventLogging
>> capsule, not the event.
>>
>> Re. (1), we are already performing this review internally in order to
>> determine what does and does not conform to the Data Retention Guidelines.
>> It seems clear that a robust process could also identify non-sensitive
>> Schemas that could be published in labs.
>>
>> -Aaron
>>
>>
>> On Wed, Aug 13, 2014 at 5:00 PM, Nuria Ruiz <nuria(a)wikimedia.org>
>> wrote:
>>
>>> Aaron,
>>>
>>> >(2) https://bugzilla.wikimedia.org/show_bug.cgi?id=67450
>>> The bug does not have to do with making data public. It has to do
>>> with how data is inserted into EL from the consumers, so it deals
>>> with the 'system', not the 'data'. The raw data as inserted cannot be
>>> replicated directly to be made public, so whether inserts are more
>>> efficient does not affect the public/private discussion.
>>>
>>>
>>> >(1) there needs to be a good review process in place to make sure
>>> that the data we surface isn't sensitive
>>> There is a bunch of work involved in this item. For example: per our
>>> privacy policy some of this data should be discarded after 90 days and
>>> currently it is not. Also, you are aware of the discussions under
>>> sanitization:
>>> https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization
>>>
>>> Basically, to make EL data public it needs to be aggregated with a
>>> level of anonymization we think is acceptable. There is quite a bit of work
>>> in this regard; here are some bugs that were filed a while back:
>>>
>>> https://bugzilla.wikimedia.org/show_bug.cgi?id=62978
>>> https://bugzilla.wikimedia.org/show_bug.cgi?id=59832
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Aug 13, 2014 at 3:39 PM, Aaron Halfaker <
>>> ahalfaker(a)wikimedia.org> wrote:
>>>
>>>> Hey folks,
>>>>
>>>> We've been discussing ways to make more Wikimedia data public. One
>>>> of our sources for data is EventLogging (EL)[1], a system that lets us
>>>> track events on both the client and server-side. Recently, YuviPanda and
>>>> springle have been working with us to figure out what issues need to be
>>>> resolved in order to begin loading EL events that contain public data[2]
>>>> into LabsDB for public consumption and for use in WikiMetrics.
>>>>
>>>> It looks like there are three major concerns about directing EL to
>>>> LabsDB. (1) there needs to be a good review process in place to make sure
>>>> that the data we surface isn't sensitive, (2)
>>>> https://bugzilla.wikimedia.org/show_bug.cgi?id=67450 will need to
>>>> be addressed to make sure that we don't over-utilize labs infrastructure
>>>> and (3) we'll need signoff from legal.
>>>>
>>>> It looks like (2) can be taken care of independently from (1) and
>>>> (3). Is this bug already prioritized, and if not, could it be?
>>>>
>>>> 1. https://www.mediawiki.org/wiki/Extension:EventLogging
>>>> 2. Eventually, we'll want a means to sanitize and surface events
>>>> that contain sensitive information, but I'd argue that is a second step
>>>> that we should address later since it will likely require more substantial
>>>> technical work.
>>>>
>>>> -Aaron
>>>>
>>>>
>>>> _______________________________________________
>>>> Analytics mailing list
>>>> Analytics(a)lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>
>>>>
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics