(Aaron? still keen?)
Totally. :). I'm down for some query performance review too. I'm already doing that informally.
For queries that need indexes on wiki schemas, I think they will always
need to be checked by Ops or a MW core dev.
Sounds reasonable. Are you imagining a gerrit style code review? On May 12, 2014 6:32 AM, "Sean Pringle" springle@wikimedia.org wrote:
Hi Giles, Aaron
Actually, I've started collecting data so we an analyze the analysts... or at least your SQL ;-)
https://tendril.wikimedia.org/report/slow_queries?host= ^db1047&user=&schema=&qmode=eq&query=&hours=12
(only accessible to WMF staff)
Which ones are your tsvs stuff?
As you no doubt know, adding indexes to an RDBMS arbitrarily will eventually cause the system to struggle as overhead for writes increases. Also, while indexing is necessary and definitely acceptable, it shouldn't be the only port of call when performance tuning. So, I'd like to approach this carefully:
- For the day-to-day queries run by your tools, like most of those
queries on the URL above I expect, I'll begin adding some appropriate indexes, and also follow up with you guys if queries can be redesigned to be more efficient.
First example awaiting feedback: https://gerrit.wikimedia.org/r/#/c/131929/
- For the ad-hoc or massive reporting queries like those Aaron runs from
time to time, we give the halfak user DDL permissions for the "log" schema just as Aaron suggested, and let him spearhead the effort because I won't have a hope of predicting what you'll need.
However, regarding #2: We need to tread carefully because DDL for index operations is not necessarily something that can run any old time. Plain old ALTER TABLE will cause replication lag and table metadata locks, and I suggest whoever handles it should become familiar with percona toolkit. (Aaron? still keen?)
- For queries that need indexes on wiki schemas, I think they will always
need to be checked by Ops or a MW core dev. Obviously you guys could only affect yourselves, but I bet you wouldn't appreciate having db1047 broken by an incompatible upstream change.
What do you think?
BR Sean
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics