Hi Aaron,

I like the tool Marcel built in the spring.  It's called reportupdater, and it's been pretty stable and useful, but it's not documented because we haven't publicized it yet.  It lets you configure templates for SQL or shell scripts that take parameters and generate separated-value files as output.  You can specify the time granularity that you want results for, and it will re-run jobs for time periods that are missing from the output (because of failures, etc.).  It also does other useful things, like reporting errors like a champ and ensuring that only one instance is running at any given time.  You can even change your scripts to output new columns or re-arrange the column order, and it will morph the output files to match the new header (you just can't remove columns - because that's crazy!).
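
To make the re-run and header-morphing ideas a bit more concrete, here's a rough Python sketch.  To be clear, this is not reportupdater's actual code: the function names, the daily granularity, and the assumption that the first column of the output holds an ISO date are all made up for illustration.

```python
import csv
import io
from datetime import date, timedelta


def missing_days(tsv_text, start, end):
    """Return the days in [start, end) that have no row in the TSV output.

    Assumption for this sketch: the first column holds an ISO date.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row
    present = {date.fromisoformat(row[0]) for row in reader if row}
    days = (start + timedelta(days=i) for i in range((end - start).days))
    return [d for d in days if d not in present]


def morph_rows(old_header, rows, new_header):
    """Rewrite rows to match new_header; newly added columns are left empty.

    Removing a column is rejected, mirroring the restriction described above.
    """
    removed = set(old_header) - set(new_header)
    if removed:
        raise ValueError(f"cannot remove columns: {sorted(removed)}")
    index = {name: i for i, name in enumerate(old_header)}
    return [
        [row[index[col]] if col in index else "" for col in new_header]
        for row in rows
    ]
```

The days returned by missing_days are the ones a scheduler would need to re-run, and morph_rows shows why added or re-ordered columns are easy to handle while removed ones are not.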

If you want to talk more about it, I'd like to give you the details privately, because I want to start documenting this tool properly as I do.

On Tue, Oct 6, 2015 at 12:16 PM, Aaron Halfaker <ahalfaker@wikimedia.org> wrote:
Hey folks,

I know there was some work in the past on systems to support keeping database reports up to date.  I'm looking into this type of work with Jeph Paul now and I realized I don't have any good pointers to this past work.  Right now, we're looking at running database reports based on cron jobs and checking the recentchanges table to make sure that replication isn't too lagged.  Is there a better way?

FWIW, I expect these queries to run daily and have a runtime of up to an hour.

-Aaron

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics