Which ones are your tsvs stuff?
The ones I'm trying to improve are querying MediaViewer_8245578 (just grep for that on the page you've linked to, there will be some things I ran manually today, but the big queries are the problematic ones). The existing queries are full table scans and seem to take around 15 minutes. We run 27 of these every day one after the other. The index I want to add will actually help me break those queries down into several smaller queries. There's no point trying to break them down right now, because I expect that without indexes the simpler queries will take just as long because they'll be full table scans as well.
On Mon, May 12, 2014 at 12:32 PM, Sean Pringle springle@wikimedia.orgwrote:
Hi Giles, Aaron
Actually, I've started collecting data so we an analyze the analysts... or at least your SQL ;-)
https://tendril.wikimedia.org/report/slow_queries?host= ^db1047&user=&schema=&qmode=eq&query=&hours=12
(only accessible to WMF staff)
Which ones are your tsvs stuff?
As you no doubt know, adding indexes to an RDBMS arbitrarily will eventually cause the system to struggle as overhead for writes increases. Also, while indexing is necessary and definitely acceptable, it shouldn't be the only port of call when performance tuning. So, I'd like to approach this carefully:
- For the day-to-day queries run by your tools, like most of those
queries on the URL above I expect, I'll begin adding some appropriate indexes, and also follow up with you guys if queries can be redesigned to be more efficient.
First example awaiting feedback: https://gerrit.wikimedia.org/r/#/c/131929/
- For the ad-hoc or massive reporting queries like those Aaron runs from
time to time, we give the halfak user DDL permissions for the "log" schema just as Aaron suggested, and let him spearhead the effort because I won't have a hope of predicting what you'll need.
However, regarding #2: We need to tread carefully because DDL for index operations is not necessarily something that can run any old time. Plain old ALTER TABLE will cause replication lag and table metadata locks, and I suggest whoever handles it should become familiar with percona toolkit. (Aaron? still keen?)
- For queries that need indexes on wiki schemas, I think they will always
need to be checked by Ops or a MW core dev. Obviously you guys could only affect yourselves, but I bet you wouldn't appreciate having db1047 broken by an incompatible upstream change.
What do you think?
BR Sean
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics