On Mon, Dec 2, 2013 at 5:45 PM, Kenan Wang <kwang@wikimedia.org> wrote:
It sounds good to me. Dario, Dan?


On Mon, Dec 2, 2013 at 1:35 PM, Arthur Richards <arichards@wikimedia.org> wrote:

On Thu, Nov 28, 2013 at 3:17 AM, Ori Livneh <ori@wikimedia.org> wrote:
It doesn't make sense to do it that way. Instead of inferring that something must have happened by cross-referencing conditions across datasets, just do the following: in MediaWiki, every time a user makes an edit, check their registration date and edit count. If the date is within the last thirty days and the edit count is 5, log an event. Doing it this way will easily scale to the entire cluster, not just enwiki, and to any number of bins, not just 5 edits.

Patch at <https://gerrit.wikimedia.org/r/#/c/98079/>; you can take it from there if you like.
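For readers following along, the check described above can be sketched roughly as follows. This is only an illustration in Python, not the contents of the Gerrit change linked above (MediaWiki itself is PHP). The names `should_log_milestone`, `on_edit_saved`, `MILESTONES`, the `user` dict layout, and the `log_event` callback are all hypothetical and stand in for whatever the real hook and EventLogging call look like.

```python
from datetime import datetime, timedelta, timezone

# Assumed parameters taken from the description: a 30-day window and a
# single milestone of 5 edits. More bins can be added to MILESTONES.
REGISTRATION_WINDOW = timedelta(days=30)
MILESTONES = frozenset({5})


def should_log_milestone(registered_at, edit_count, now,
                         milestones=MILESTONES, window=REGISTRATION_WINDOW):
    """Return True if the user registered within `window` of `now`
    and their edit count has just reached one of the milestones."""
    age = now - registered_at
    return timedelta(0) <= age <= window and edit_count in milestones


def on_edit_saved(user, now, log_event):
    """Hypothetical hook run after every saved edit.

    `user` is assumed to be a dict with `id`, `registered_at` and
    `edit_count` (the count after the edit); `log_event` is any
    callable that accepts the event payload.
    """
    if should_log_milestone(user["registered_at"], user["edit_count"], now):
        log_event({"user_id": user["id"], "edit_count": user["edit_count"]})
        return True
    return False


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    new_user = {"id": 1, "registered_at": now - timedelta(days=3), "edit_count": 5}
    on_edit_saved(new_user, now, print)
```

Because the check only looks at the editing user's own registration date and edit count at save time, nothing has to be cross-referenced after the fact, and adding more bins is a matter of extending the milestone set.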

Thanks Ori - this sounds and looks viable to me, and seems like a better solution. Kenan, Jon, Dario, Dan, et al. - can we move forward with this?

I'm OK with this. I do see it as a temporary measure though. What Ori says here, "inferring that something must have happened", is sort of the whole reason SQL exists. In my opinion, the problem is that these two data sources can't be joined efficiently to do analytics work on them. But since that's a harder problem at the moment, I agree with Ori's solution.

Jon/Arthur, who set up your Event Logging solution, and do you need help reviewing / merging this Change? I don't know much about Event Logging but I'm happy to learn and help if you need.

Dan