You can backfill the events according to Ori's new logic. Then your query
is simple going forward.
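A backfill along those lines could replay historical edits through the same rule as the live check described in Ori's message further down the thread. For each user, it finds the edit that brought their count to each threshold and emits an event if that edit falls within 30 days of registration. This is a Python sketch under stated assumptions. `backfill_events`, its input shapes (a registration map plus `(user, timestamp)` edit pairs), and the event tuple layout are all hypothetical and are not taken from the patch. Historical edit data may not match what a live hook would have seen (for example, if some edits are no longer visible), so the result is an approximation.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# Assumption: same 30-day "new user" window as the live check.
NEW_USER_WINDOW = timedelta(days=30)

# Hypothetical event layout: (user, edit count reached, timestamp of that edit)
Event = Tuple[str, int, datetime]


def backfill_events(
    registrations: Dict[str, datetime],
    edits: Iterable[Tuple[str, datetime]],
    thresholds: Tuple[int, ...] = (5,),
) -> List[Event]:
    """Replay historical edits and emit the events the live check would have logged."""
    # Group edit timestamps by user.
    by_user: Dict[str, List[datetime]] = {}
    for user, ts in edits:
        by_user.setdefault(user, []).append(ts)

    events: List[Event] = []
    for user, stamps in by_user.items():
        registered = registrations.get(user)
        if registered is None:
            continue  # no registration date known for this user; skip
        stamps.sort()
        for n in thresholds:
            # The n-th edit is the moment the user's edit count reached n.
            if len(stamps) >= n:
                ts = stamps[n - 1]
                if ts - registered <= NEW_USER_WINDOW:
                    events.append((user, n, ts))
    # Chronological order, like a live event log.
    events.sort(key=lambda e: e[2])
    return events
```

Passing more values in `thresholds` backfills additional bins in the same pass, mirroring the "any number of bins" point in Ori's message.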
On Mon, Dec 2, 2013 at 6:55 PM, Jon Robson <jrobson(a)wikimedia.org> wrote:
If Kenan schedules a task, we can update the schema to record this for
newly created data, and given the issues with this, it seems like a good
idea.
That said, we will have a lot of historical data that will still need to
be joined and saved as a new table... via a UNION, I guess?
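One way to combine the two sources with a UNION is to take backfilled rows from before the date live logging started and logged rows from that date onward, so the two halves do not overlap. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for whatever analytics database is actually involved. The table names `backfilled_events`, `logged_events`, and `milestones`, their columns, and the `cutoff` value are hypothetical. Timestamps are assumed to be ISO-8601 strings, which compare correctly as text.

```python
import sqlite3


def build_milestone_table(conn: sqlite3.Connection, cutoff: str) -> None:
    """Merge backfilled (pre-cutoff) and live-logged (post-cutoff) events into one table."""
    conn.execute("DROP TABLE IF EXISTS milestones")
    conn.execute(
        "CREATE TABLE milestones (user_name TEXT, edit_count INTEGER, ts TEXT)"
    )
    # UNION (unlike UNION ALL) also drops any exact duplicate rows.
    conn.execute(
        """
        INSERT INTO milestones (user_name, edit_count, ts)
        SELECT user_name, edit_count, ts FROM backfilled_events WHERE ts < ?
        UNION
        SELECT user_name, edit_count, ts FROM logged_events WHERE ts >= ?
        """,
        (cutoff, cutoff),
    )
    conn.commit()
```

Splitting on a cutoff rather than relying on UNION's de-duplication matters because a backfilled row and a logged row for the same edit may not carry byte-identical timestamps.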
On Mon, Dec 2, 2013 at 2:52 PM, Dan Andreescu <dandreescu(a)wikimedia.org> wrote:
On Mon, Dec 2, 2013 at 5:45 PM, Kenan Wang <kwang(a)wikimedia.org> wrote:
>
> It sounds good to me. Dario, Dan?
>
> On Mon, Dec 2, 2013 at 1:35 PM, Arthur Richards <arichards(a)wikimedia.org> wrote:
>>
>> On Thu, Nov 28, 2013 at 3:17 AM, Ori Livneh <ori(a)wikimedia.org> wrote:
>>>
>>> It doesn't make sense to do it that way. Instead of inferring that
>>> something must have happened by cross-referencing conditions across
>>> datasets, just do the following: in MediaWiki, every time a user makes an
>>> edit, check their registration date and edit count. If the date is within
>>> the last thirty days and the edit count is 5, log an event. Doing it this
>>> way will easily scale to the entire cluster, not just enwiki, and to any
>>> number of bins, not just 5 edits.
>>>
>>> Patch at <https://gerrit.wikimedia.org/r/#/c/98079/>; you can take it
>>> from there if you like.
>>
>> Thanks Ori, this sounds and looks viable to me, and seems like a better
>> solution. Kenan, Jon, Dario, Dan, et al., can we move forward with this?
I'm OK with this, though I do see it as a temporary measure. What Ori
describes here, "inferring that something must have happened", is sort of
the whole reason SQL exists. In my opinion, the real problem is that these
two data sources can't be joined efficiently for analytics work. But since
that's a harder problem at the moment, I agree with Ori's solution.
Jon/Arthur, who set up your Event Logging solution, and do you need help
reviewing/merging this change? I don't know much about Event Logging, but
I'm happy to learn and help if you need it.
Dan
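The per-edit check Ori proposes (on every saved edit, look at the user's registration date and edit count, and log an event when a recently registered user hits a threshold) could look roughly like this. It is a Python sketch, not the MediaWiki code in the linked Gerrit patch. The `User` record, `on_edit_saved`, the `log_event` callback, and the event field names are hypothetical. The code also assumes that `edit_count` already includes the edit being saved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, Optional, Tuple

# Assumption: "new user" means registered within the last 30 days.
NEW_USER_WINDOW = timedelta(days=30)


@dataclass
class User:
    name: str
    registration: Optional[datetime]  # assumed possibly missing for some old accounts
    edit_count: int  # assumed to already include the edit being saved


def on_edit_saved(
    user: User,
    now: datetime,
    log_event: Callable[[Dict[str, object]], None],
    thresholds: Tuple[int, ...] = (5,),
) -> bool:
    """Run after each saved edit; log an event if a new user just reached a bin."""
    if user.registration is None:
        return False
    # Only users who registered within the window count as "new".
    if now - user.registration > NEW_USER_WINDOW:
        return False
    # Log only at the exact edit that reaches a threshold, so each bin fires once.
    if user.edit_count not in thresholds:
        return False
    log_event({
        "user": user.name,
        "editCount": user.edit_count,
        "timestamp": now.isoformat(),
    })
    return True
```

Because the decision is made locally at edit time, nothing has to be cross-referenced later, and passing something like `thresholds=(1, 5, 10)` covers additional bins without changing the logic. That is the scaling argument in Ori's message.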