On Wed, Nov 27, 2013 at 12:18 PM, Dan Andreescu <dandreescu(a)wikimedia.org> wrote:
On Wed, Nov 27, 2013 at 2:41 PM, Kenan Wang
<kwang(a)wikimedia.org> wrote:
Dan, here is what I'm looking for:
How many users registered on enwiki in month X and reached 5 edits within
30 days of registering?
I talked with Dario and we're hoping that restricting it to enwiki solves
the cross-db join issue that you were facing.
Thank you. I'll see if I can tune the query to do this efficiently. The
cross-db issue comes from joining the Event Logging table with the
MediaWiki table. If my tuning doesn't yield results, the only viable
solution is to import the event logging stuff into a temp table in
labsdb/enwiki_p. Then they'll be on the same database and the query should
fly. Is that possible with the schema you're capturing for mobile
registrations? In other words, can that data be shared publicly?
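For illustration, once both datasets sit in the same database, the question above reduces to a single join and a grouped count. What follows is a minimal sketch in Python against an in-memory SQLite database, not the actual labsdb query: the `users` and `edits` tables, their columns, and the `make_demo_db` and `count_activated` helpers are all hypothetical stand-ins for the real EventLogging and MediaWiki schemas.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with hypothetical `users` and `edits` tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, registered TEXT);
        CREATE TABLE edits (user_id INTEGER, edit_time TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [
            (1, "2013-10-03 00:00:00"),  # 5 edits within the window
            (2, "2013-10-10 00:00:00"),  # only 4 edits within the window
            (3, "2013-09-15 00:00:00"),  # 5 edits, but registered in September
        ],
    )
    edits = (
        [(1, f"2013-10-{day:02d} 12:00:00") for day in range(4, 9)]
        + [(2, f"2013-10-{day:02d} 12:00:00") for day in range(11, 15)]
        + [(2, "2013-12-01 12:00:00")]  # fifth edit, outside the 30-day window
        + [(3, f"2013-09-{day:02d} 12:00:00") for day in range(16, 21)]
    )
    conn.executemany("INSERT INTO edits VALUES (?, ?)", edits)
    return conn


def count_activated(conn, month, min_edits=5, window_days=30):
    """Count users registered in `month` ('YYYY-MM') who made at least
    `min_edits` edits within `window_days` days of registering."""
    query = """
        SELECT COUNT(*) FROM (
            SELECT u.user_id
            FROM users AS u
            JOIN edits AS e ON e.user_id = u.user_id
            WHERE strftime('%Y-%m', u.registered) = ?
              AND julianday(e.edit_time) - julianday(u.registered)
                  BETWEEN 0 AND ?
            GROUP BY u.user_id
            HAVING COUNT(*) >= ?
        )
    """
    return conn.execute(query, (month, window_days, min_edits)).fetchone()[0]
```

Calling `count_activated(make_demo_db(), "2013-10")` runs the count over the demo rows. The join only works in this form because both tables live in one database, which is the point of the temp-table import described above.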
It doesn't make sense to do it that way. Instead of inferring that
something must have happened by cross-referencing conditions across
datasets, just do the following: in MediaWiki, every time a user makes an
edit, check their registration date and edit count. If the date is within
the last thirty days and the edit count is 5, log an event. Doing it this
way will easily scale to the entire cluster, not just enwiki, and to any
number of bins, not just 5 edits.
Patch at <https://gerrit.wikimedia.org/r/#/c/98079/>; you can take it from
there if you like.
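To make the suggestion concrete, the check-on-every-edit approach could look roughly like the sketch below. It is written in Python for illustration only and does not reproduce the linked patch: `on_edit_saved`, `milestone_reached`, the `user` dictionary layout, and the `log_event` callback are all hypothetical names, and the sketch assumes the edit count passed in already includes the edit that was just saved.

```python
from datetime import datetime, timedelta

# Edit-count bins to report on; more bins can be added without touching the logic.
MILESTONES = (5,)
WINDOW = timedelta(days=30)


def milestone_reached(registered, edit_count, now,
                      milestones=MILESTONES, window=WINDOW):
    """Return the milestone this edit hits, or None.

    A milestone counts only if the user registered within `window` of `now`
    and their edit count is exactly one of `milestones`.
    """
    if now - registered > window:
        return None
    return edit_count if edit_count in milestones else None


def on_edit_saved(user, now, log_event):
    """Hypothetical post-save hook: log one event when a milestone is hit."""
    milestone = milestone_reached(user["registered"], user["edit_count"], now)
    if milestone is not None:
        log_event({
            "user_id": user["id"],
            "milestone": milestone,
            "timestamp": now.isoformat(),
        })


# Usage: collect events in a list instead of a real logging backend.
events = []
now = datetime(2013, 11, 27, 12, 0, 0)
new_user = {"id": 1, "registered": now - timedelta(days=3), "edit_count": 5}
on_edit_saved(new_user, now, events.append)
```

Because the check compares the edit count for equality, each user produces at most one event per bin, and counting activations for a month becomes a simple count over the logged events rather than a cross-database join.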