Hi Giles, Aaron
Actually, I've started collecting data so we an analyze the analysts... or
at least your SQL ;-)
https://tendril.wikimedia.org/report/slow_queries?host=
^db1047&user=&schema=&qmode=eq&query=&hours=12
(only accessible to WMF staff)
Which ones are your tsvs stuff?
As you no doubt know, adding indexes to an RDBMS arbitrarily will
eventually cause the system to struggle as overhead for writes increases.
Also, while indexing is necessary and definitely acceptable, it shouldn't
be the only port of call when performance tuning. So, I'd like to approach
this carefully:
1. For the day-to-day queries run by your tools, like most of those queries
on the URL above I expect, I'll begin adding some appropriate indexes, and
also follow up with you guys if queries can be redesigned to be more
efficient.
First example awaiting feedback:
https://gerrit.wikimedia.org/r/#/c/131929/
2. For the ad-hoc or massive reporting queries like those Aaron runs from
time to time, we give the halfak user DDL permissions for the "log" schema
just as Aaron suggested, and let him spearhead the effort because I won't
have a hope of predicting what you'll need.
However, regarding #2: We need to tread carefully because DDL for index
operations is not necessarily something that can run any old time. Plain
old ALTER TABLE will cause replication lag and table metadata locks, and I
suggest whoever handles it should become familiar with percona toolkit.
(Aaron? still keen?)
3. For queries that need indexes on wiki schemas, I think they will always
need to be checked by Ops or a MW core dev. Obviously you guys could only
affect yourselves, but I bet you wouldn't appreciate having db1047 broken
by an incompatible upstream change.
What do you think?
BR
Sean