some thoughts:

• All data in the log DB strictly follows https://meta.wikimedia.org/wiki/Schema:EventCapsule. This includes fields such as seqId and uuid that allow recovery of data from the raw JSON dumps. Should something catastrophic happen, we could restore the entire DB by re-importing raw JSON data, which is guaranteed to match the EventCapsule specs. This wouldn’t apply to any custom table created in the DB with data from a different source.

• For the same reason, should a global change be made to EventCapsule (for example https://bugzilla.wikimedia.org/show_bug.cgi?id=52295), all tables would need to have their schema updated. Hosting custom tables with arbitrary schemas that do not match the EventCapsule specs would make global updates unnecessarily complicated.

• Writing data into the log DB is intentionally restricted to the eventlog user, which was created for the sole purpose of autogenerating tables and writing data into SQL when new schemas are deployed in production. Making an exception to this principle sets a precedent whereby humans and other scripts can arbitrarily manipulate data or create tables in the DB, which is a first step towards turning the log DB into the same zoo that the staging DB is.
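
To make the first point concrete, here is a rough sketch of why uuid makes a restore from raw JSON safe to repeat. It is only an illustration under stated assumptions: it uses an in-memory SQLite database rather than the production DB, the table name Example_1 and the function restore_from_raw_json are hypothetical, and the "event" key is an assumed wrapper for the schema-specific payload. Only uuid and seqId come from the capsule fields mentioned above.

```python
import json
import sqlite3


def restore_from_raw_json(conn, table, raw_lines):
    """Re-import capsule-wrapped events from raw JSON lines.

    uuid is the primary key, so re-running the import over the same
    dump (or over overlapping dumps) does not create duplicate rows.
    The table name is trusted here; this is a sketch, not production code.
    """
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "uuid TEXT PRIMARY KEY, seqId INTEGER, event TEXT)"
    )
    for line in raw_lines:
        capsule = json.loads(line)
        # INSERT OR IGNORE skips rows whose uuid is already present.
        conn.execute(
            f"INSERT OR IGNORE INTO {table} (uuid, seqId, event) "
            "VALUES (?, ?, ?)",
            (capsule["uuid"], capsule["seqId"], json.dumps(capsule["event"])),
        )
    conn.commit()
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Hypothetical raw dump; the third line repeats the first event.
raw_dump = [
    '{"uuid": "a1", "seqId": 1, "event": {"milestone": 1}}',
    '{"uuid": "b2", "seqId": 2, "event": {"milestone": 5}}',
    '{"uuid": "a1", "seqId": 1, "event": {"milestone": 1}}',
]

conn = sqlite3.connect(":memory:")
restored_rows = restore_from_raw_json(conn, "Example_1", raw_dump)
```

A custom table filled from another source has no such raw JSON behind it, so nothing comparable could rebuild it.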

On Dec 6, 2013, at 6:39 AM, Dan Andreescu <dandreescu@wikimedia.org> wrote:

On Thu, Dec 5, 2013 at 4:18 PM, Matthew Flaschen <mflaschen@wikimedia.org> wrote:
On 12/05/2013 09:49 AM, Dan Andreescu wrote:
I think the plan is to change this EL instance to log the event "this
user has now made X edits within their first 30 days", where X is in
{1,5,10,25,50,100}.  That will start happening when this patch:

https://gerrit.wikimedia.org/r/#/c/98079/1/WikimediaEvents.php is
merged.  So my idea is to backfill these milestone events so the new
query can just be:

Yeah, but if it's backfilled, it's no longer an actual EL event.  I agree with Dario it's cleaner to have a separate table (holding only the backfilled data), and do a UNION.

Matt Flaschen
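
For illustration, the separate-table-plus-UNION approach might look roughly like the sketch below. Everything in it is an assumption made for the example: the table names milestone_live and milestone_backfill, the columns user_id and milestone, and the use of in-memory SQLite instead of the real log DB. The extra source column is optional; it just keeps the origin of each row visible after the UNION.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables: one written by EventLogging, one holding only
# backfilled milestone rows. Both share the same column layout.
conn.executescript("""
CREATE TABLE milestone_live (user_id INTEGER, milestone INTEGER);
CREATE TABLE milestone_backfill (user_id INTEGER, milestone INTEGER);
INSERT INTO milestone_live VALUES (3, 1), (3, 5);
INSERT INTO milestone_backfill VALUES (1, 1), (2, 1), (2, 5);
""")

# UNION ALL keeps every row from both tables; the literal 'source'
# column records whether a row was logged live or backfilled.
query = """
SELECT user_id, milestone, 'live' AS source FROM milestone_live
UNION ALL
SELECT user_id, milestone, 'backfill' AS source FROM milestone_backfill
ORDER BY user_id, milestone
"""
rows = conn.execute(query).fetchall()

# Example analysis over the combined rows: users who reached 5 edits.
users_with_five_edits = sum(1 for _, milestone, _ in rows if milestone == 5)
```

With this layout the log DB table stays purely EventLogging-written, and queries that do not care about provenance can simply ignore the source column.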

I guess it's up to the mobile web team, but I disagree on YAGNI grounds.  What's an example of a situation in which you'd care that these milestone EL events are "actual" vs. back-filled?
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics