>(Aaron? still keen?)

Totally. :). I'm down for some query performance review too.  I'm already doing that informally.

> For queries that need indexes on wiki schemas, I think they will always need to be checked by Ops or a MW core dev. 

Sounds reasonable.  Are you imagining a gerrit style code review?

On May 12, 2014 6:32 AM, "Sean Pringle" <springle@wikimedia.org> wrote:
Hi Giles, Aaron

Actually, I've started collecting data so we an analyze the analysts... or at least your SQL ;-)

https://tendril.wikimedia.org/report/slow_queries?host=^db1047&user=&schema=&qmode=eq&query=&hours=12

(only accessible to WMF staff)

Which ones are your tsvs stuff?

As you no doubt know, adding indexes to an RDBMS arbitrarily will eventually cause the system to struggle as overhead for writes increases. Also, while indexing is necessary and definitely acceptable, it shouldn't be the only port of call when performance tuning. So, I'd like to approach this carefully:

1. For the day-to-day queries run by your tools, like most of those queries on the URL above I expect, I'll begin adding some appropriate indexes, and also follow up with you guys if queries can be redesigned to be more efficient.

First example awaiting feedback: https://gerrit.wikimedia.org/r/#/c/131929/

2. For the ad-hoc or massive reporting queries like those Aaron runs from time to time, we give the halfak user DDL permissions for the "log" schema just as Aaron suggested, and let him spearhead the effort because I won't have a hope of predicting what you'll need.

However, regarding #2: We need to tread carefully because DDL for index operations is not necessarily something that can run any old time. Plain old ALTER TABLE will cause replication lag  and table metadata locks, and I suggest whoever handles it should become familiar with percona toolkit. (Aaron? still keen?)

3. For queries that need indexes on wiki schemas, I think they will always need to be checked by Ops or a MW core dev. Obviously you guys could only affect yourselves, but I bet you wouldn't appreciate having db1047 broken by an incompatible upstream change.

What do you think?

BR
Sean

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics