Kevin, for what it's worth, I don't think the bug Sean is asking about is that
challenging. The part we'd have to change is really just a few lines [1]. I
respect your decision, of course, but I wanted to point out that this issue
does advance some of our goals: we talked a bit about making EventLogging data
usable by Wikimetrics, and this is the first step.
[1] -
On Wed, Aug 13, 2014 at 4:19 PM, Aaron Halfaker <ahalfaker(a)wikimedia.org> wrote:
OK. Sounds reasonable. Sorry to seem as though I am pushing on you & the devs.
In fact, specifying that you won't have the bandwidth to even consider the bug
until next quarter gives me the power to push on others. >:)
Thanks!
-Aaron
On Wed, Aug 13, 2014 at 8:56 PM, Kevin Leduc <kevin(a)wikimedia.org> wrote:
> Hi Aaron,
>
> I was not planning on prioritizing any EventLogging work for the rest of
> this quarter. The analytics dev team has a goal to get an EEVS dashboard
> running and I want to keep them focused; otherwise we will not reach this
> goal.
>
> I'm tempted to ask what springle and YuviPanda can accomplish without the
> help of the analytics devs, but even that will imply discussions and
> distractions from our goals.
>
> In September I am planning to look at what goals we can set for the next
> quarter and at what we want to accomplish with EventLogging. I was going to
> prioritize it at that point.
>
>
>
>
> On Wed, Aug 13, 2014 at 10:28 AM, Aaron Halfaker <ahalfaker(a)wikimedia.org> wrote:
>> Excellent. Kevin, can you work to get that bug[1] prioritized and let us
>> know? I can start working with R&D on a proposal to bring to legal.
>>
>> 1. https://bugzilla.wikimedia.org/show_bug.cgi?id=67450
>>
>>> It stands to reason that you would be interested in the capsule too, as
>>> it holds the timestamp and the wiki project the event applies to, but I
>>> imagine we can make fields public selectively.
>>
>> Fair enough. I think we can drop that one column from the capsule and be
>> quite happy with the rest. No need to purge EventLogging.
>>
>> -Aaron
>>
>>
>> On Wed, Aug 13, 2014 at 6:08 PM, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
>>> > Re. (2), I didn't say anything about that being related to
>>> > public/private. This is a request from springle -- that if we are going
>>> > to start pushing Events to LabsDB, he'd like us to do so more
>>> > efficiently. That bug is about efficiently batching inserts.
>>> ah, my mistake. Kevin can do prioritization as needed.
>>>
>>> > If you are concerned about UserAgents as the sanitization page you
>>> > linked to suggests, then we should talk about the EventLogging capsule,
>>> > not the event.
>>> If you want to be so precise, sure, that is correct. Note that currently
>>> there is no distinction in storage between the event and the capsule: they
>>> are stored together in the same record. Capsule data is identified only by
>>> a prefix on the column name. It stands to reason that you would be
>>> interested in the capsule too, as it holds the timestamp and the wiki
>>> project the event applies to, but I imagine we can make fields public
>>> selectively.
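
To make the "public selectively" idea concrete, here is a minimal sketch of filtering one stored record down to a publishable view. Everything named here is an assumption for illustration only: the `capsule_` prefix, the `event_pageId` column, and the allowlist are hypothetical and are not taken from the actual EventLogging table layout.

```python
# Hypothetical column names; the real EventLogging storage layout may differ.
CAPSULE_PREFIX = "capsule_"  # assumed prefix that marks capsule columns
PUBLIC_CAPSULE_FIELDS = {"capsule_timestamp", "capsule_wiki"}  # assumed allowlist


def public_view(record):
    """Return only the columns of a stored record considered safe to publish.

    Capsule columns are kept only if they are on the allowlist; all other
    (event) columns are passed through unchanged.
    """
    result = {}
    for column, value in record.items():
        if column.startswith(CAPSULE_PREFIX):
            if column in PUBLIC_CAPSULE_FIELDS:
                result[column] = value
        else:
            result[column] = value
    return result


# Example record with made-up values.
record = {
    "capsule_timestamp": "20140813160000",
    "capsule_wiki": "enwiki",
    "capsule_userAgent": "Mozilla/5.0",
    "event_pageId": 42,
}
public = public_view(record)
```

With this allowlist the user agent column is dropped while the timestamp, the wiki, and the event field survive.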
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Aug 13, 2014 at 6:47 PM, Aaron Halfaker <ahalfaker(a)wikimedia.org> wrote:
>>>> Re. (2), I didn't say anything about that being related to
>>>> public/private. This is a request from springle -- that if we are going
>>>> to start pushing Events to LabsDB, he'd like us to do so more
>>>> efficiently. That bug is about efficiently batching inserts.
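
For readers unfamiliar with the term, "batching inserts" generally means sending many rows per statement and per transaction instead of one at a time. The sketch below illustrates that general technique only; it is not the change proposed in the bug. It uses Python's standard `sqlite3` module with an in-memory database as a stand-in for the real database, and the `events` table, its columns, and the `insert_batched` helper are hypothetical.

```python
import sqlite3


def insert_batched(conn, rows, batch_size=500):
    """Insert rows in fixed-size batches, one transaction per batch.

    Returns the number of batches written.
    """
    batches = 0
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        with conn:  # commits the batch on success, rolls back on error
            conn.executemany(
                "INSERT INTO events (wiki, ts) VALUES (?, ?)", chunk
            )
        batches += 1
    return batches


# In-memory database and made-up rows, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (wiki TEXT, ts TEXT)")
rows = [("enwiki", "20140813160000")] * 1200
n_batches = insert_batched(conn, rows, batch_size=500)
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Here 1200 rows are written in three transactions rather than 1200, which is the kind of saving that matters when the target is shared infrastructure.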
>>>>
>>>> I don't know what you are talking about re. 90-day purges. I'm talking
>>>> about 100% public EventLogging events. E.g.
>>>> https://meta.wikimedia.org/wiki/Schema:PageMove
>>>> Also, we do *not* need to purge EventLogging event data at 90 days. We
>>>> need to purge PII at 90 days. We generally do not store PII in
>>>> EventLogging events, but when we do, we organize 90-day purges, as we
>>>> have recently for the anonymous editor experiments. If you are concerned
>>>> about UserAgents, as the sanitization page you linked to suggests, then
>>>> we should talk about the EventLogging capsule, not the event.
>>>>
>>>> Re. (1), we are already performing this review internally in order to
>>>> determine what does and does not conform to the Data Retention
>>>> Guidelines. It seems clear that a robust process could also identify
>>>> non-sensitive Schemas that could be published in labs.
>>>>
>>>> -Aaron
>>>>
>>>>
>>>> On Wed, Aug 13, 2014 at 5:00 PM, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
>>>>> Aaron,
>>>>>
>>>>> > (2) https://bugzilla.wikimedia.org/show_bug.cgi?id=67450
>>>>> The bug does not have to do with making data public. It has to do with
>>>>> how data is inserted into EL from the consumers, so it deals with the
>>>>> 'system', not the 'data'. The raw data as inserted cannot be replicated
>>>>> directly to be made public, so whether inserts are more efficient does
>>>>> not affect the public/private discussion.
>>>>>
>>>>>
>>>>> > (1) there needs to be a good review process in place to make sure
>>>>> > that the data we surface isn't sensitive
>>>>> There is a bunch of work involved in this item. For example: per our
>>>>> privacy policy some of this data should be discarded after 90 days, and
>>>>> currently it is not. Also, you are aware of the discussions under
>>>>> sanitization:
>>>>>
>>>>> https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization
>>>>>
>>>>> Basically, to make EL data public it needs to be aggregated with a
>>>>> level of anonymization we think is acceptable. There is quite a bit of
>>>>> work in this regard; here are some bugs that were filed a while back:
>>>>>
>>>>> https://bugzilla.wikimedia.org/show_bug.cgi?id=62978
>>>>> https://bugzilla.wikimedia.org/show_bug.cgi?id=59832
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Aug 13, 2014 at 3:39 PM, Aaron Halfaker <ahalfaker(a)wikimedia.org> wrote:
>>>>>> Hey folks,
>>>>>>
>>>>>> We've been discussing ways to make more Wikimedia data public. One of
>>>>>> our sources for data is EventLogging (EL)[1], a system that lets us
>>>>>> track events on both the client and server-side. Recently, YuviPanda
>>>>>> and springle have been working with us to figure out what issues need
>>>>>> to be resolved in order to begin loading EL events that contain public
>>>>>> data[2] into LabsDB for public consumption and for use in WikiMetrics.
>>>>>>
>>>>>> It looks like there are three major concerns about directing EL to
>>>>>> LabsDB. (1) there needs to be a good review process in place to make
>>>>>> sure that the data we surface isn't sensitive, (2)
>>>>>> https://bugzilla.wikimedia.org/show_bug.cgi?id=67450 will need to be
>>>>>> addressed to make sure that we don't over-utilize labs infrastructure,
>>>>>> and (3) we'll need signoff from legal.
>>>>>>
>>>>>> It looks like (2) can be taken care of independently from (1) and (3).
>>>>>> Is this bug already prioritized, and if not, could it be?
>>>>>>
>>>>>> 1. https://www.mediawiki.org/wiki/Extension:EventLogging
>>>>>> 2. Eventually, we'll want a means to sanitize and surface events that
>>>>>> contain sensitive information, but I'd argue that is a second step that
>>>>>> we should address later since it will likely require more substantial
>>>>>> technical work.
>>>>>>
>>>>>> -Aaron
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Analytics mailing list
>>>>>> Analytics(a)lists.wikimedia.org
>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics