Hello,
So recently I've been trying to find a method where I could test database queries on production data. I originally thought Tool Labs might help, but I found out due to a bug in MySQL you can't perform EXPLAIN queries in the database replicas. And naturally I don't have access to the actual production cluster because I'm not on ops.
My last method to try is going to be downloading the enwiki database dump and setting up my own replica. However, this involves hard drive space that I don't have.
Basically I want to know if anybody is aware of any other method through which I can test queries? Thanks in advance.
*-- * *Tyler Romeo* Stevens Institute of Technology, Class of 2016 Major in Computer Science
On 09/10/13 13:38, Tyler Romeo wrote:
Hello,
So recently I've been trying to find a method where I could test database queries on production data. I originally thought Tool Labs might help, but I found out due to a bug in MySQL you can't perform EXPLAIN queries in the database replicas. And naturally I don't have access to the actual production cluster because I'm not on ops.
My last method to try is going to be downloading the enwiki database dump and setting up my own replica. However, this involves hard drive space that I don't have.
EXPLAIN on a small wiki will usually give you the same query plan as EXPLAIN on a large wiki, if the indexes are the same. So you can try running your EXPLAIN against your local test wiki. But for the most part, we estimate query performance a priori, based on code review and what we know about MySQL, rather than measuring performance during testing.
Note that EXPLAIN can be misleading even if it is run against a production database. Even actual performance measurement in production can be misleading, if the test is run against a small table which will gain rows over time. And measuring execution time doesn't tell you everything about the performance of the server -- for example, it doesn't show buffer pool usage.
-- Tim Starling
They often give the same results on smallish wikis, but I wouldn't carry that over to test wikis unless lots of content and users, logging, other table data was somehow imported in. For example a tiny user table might make mysql start INNER JOINs with that table in queries where it would never do that in production. In my experience development test wikis are often useless for estimating what query plan will happen in production.
A smallish wiki with 10ks of pages and the full history and the table data (not just revision/page/*links stuff from dumps) would probably be useful. I'm not sure where the threshold roughly starts though.
-- View this message in context: http://wikimedia.7.x6.nabble.com/Method-of-Testing-DB-Queries-tp5014676p5014... Sent from the Wikipedia Developers mailing list archive at Nabble.com.
On Wed, Oct 9, 2013 at 2:09 AM, Aaron Schulz aschulz4587@gmail.com wrote:
They often give the same results on smallish wikis, but I wouldn't carry that over to test wikis unless lots of content and users, logging, other table data was somehow imported in. For example a tiny user table might make mysql start INNER JOINs with that table in queries where it would never do that in production. In my experience development test wikis are often useless for estimating what query plan will happen in production.
This has been my experience as well. Using my test wiki is pretty much useless.
A smallish wiki with 10ks of pages and the full history and the table data
(not just revision/page/*links stuff from dumps) would probably be useful. I'm not sure where the threshold roughly starts though.
OK, maybe I'll try this then. Thanks for the advice.
*-- * *Tyler Romeo* Stevens Institute of Technology, Class of 2016 Major in Computer Science
On Wed, Oct 9, 2013 at 6:09 AM, Aaron Schulz aschulz4587@gmail.com wrote:
A smallish wiki with 10ks of pages and the full history and the table data (not just revision/page/*links stuff from dumps) would probably be useful. I'm not sure where the threshold roughly starts though.
We could get a script included in maintenance or at least vagrant to populate some abusefilters/etc. and then generate some fake accounts/prefs/edits/log entries/etc. maybe with selenium? (and the API. but the data won't be complete without edits made with an HTML interface.)
See https://github.com/fzaninotto/Faker or one of the several variants in other languages.
-Jeremy
wikitech-l@lists.wikimedia.org