Lee Daniel Crocker wrote:
Now that I have the test suite working and installation is quick, I set up the software on a freshly-installed machine on my home network, ran the suite, reinstalled using InnoDB tables instead of MyISAM, ran again, installed MySQL 4.0.12, and ran again.
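(As an aside, reinstalling isn't strictly necessary to try InnoDB; an existing installation could probably be converted in place with something like the following. This is a rough, untested sketch that assumes the main content tables are named cur and old, as in the current schema:)

    -- Rough sketch: convert existing MyISAM tables to InnoDB in place
    -- (MySQL 4.0 syntax; table names assume the current wiki schema).
    ALTER TABLE cur TYPE=InnoDB;
    ALTER TABLE old TYPE=InnoDB;
    -- Whatever table carries the FULLTEXT index has to stay MyISAM,
    -- since InnoDB doesn't support fulltext indexes.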
The semi-bad news: there didn't seem to be any difference in performance with any of these changes. The variance in timing among setups wasn't much more than the variance from one run to the next. The actual numbers are below. Probably the most important numbers are the "sec per fetch" and "sec per search" at the end--those are the timings of regular page fetches and searches done by background threads that run during the conformance tests and best simulate actual use.
The differences between MySQL versions and table types may not be the determining factor in performance here. Test results that show so little variation could indicate a performance bottleneck elsewhere on your test system: disk throughput, available RAM, or some other resource may be limiting all of the test configurations equally.
For example:
- If the maximum disk throughput on your test system is 18 Mbytes/sec, all configurations may produce similar results at that level.
- Increase the disk throughput to 33 Mbytes/sec. At that level, configuration #1 may outperform configuration #2 because it can take advantage of the additional throughput, while configuration #2 may have already reached its maximum performance around 28 Mbytes/sec and show little or no improvement at 33 Mbytes/sec. Configuration #1 might keep improving beyond 33 Mbytes/sec, perhaps topping out around 39 Mbytes/sec.
On the other hand, your message indicates that the default MySQL configurations were used. The default configuration options may not take advantage of the resources available on your test system, so a good next step could be adjusting those settings to make better use of what the machine has.
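For example, a few of the server variables that are worth a look (the names are from the MySQL 4.0 era and the values below are purely illustrative, not recommendations):

    -- Inspect the current settings
    SHOW VARIABLES LIKE 'key_buffer_size';
    SHOW VARIABLES LIKE 'query_cache_size';
    SHOW VARIABLES LIKE 'table_cache';
    -- Illustrative adjustments; sensible values depend on available RAM
    -- (these can also go in my.cnf so they survive a server restart)
    SET GLOBAL key_buffer_size  = 64*1024*1024;
    SET GLOBAL query_cache_size = 16*1024*1024;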
The fact that Wikipedia can be installed on various configurations and produce similar results is good, because it provides a solid baseline for performance measurement.
BTW, this is my first post to the list, and I want to thank all of you for the excellent work this project has produced. We are testing the Wikipedia engine for use as a team knowledge base. I know there are other engines that may be more suitable for this, but it was hard to pass up the combination of features included in Wikipedia.
Thank you.
-- Jason Dreyer
(Dreyer, Jason Jason.Dreyer@deg.state.wi.us):
The differences between MySQL versions and table types may not be the determining factor in performance here.
Absolutely; lies, damned lies, and benchmarks, and all that. Disk I/O may well be a major culprit. Memory/CPU usage probably isn't. I'll run some more tests to check some of those things. I'll also run some tests for things like having the database on a separate machine, even PostgreSQL if I have the time (I'd appreciate it if the fellow who said he had it working would send me a patch).
But I did want to get this initial set of numbers out there for discussion, and even these first limited results do give me some warm fuzzies about MySQL 4.0.12, which was something I wanted to look hard at because I wanted some of its features.
I'd also appreciate suggestions for other benchmarks (specific MySQL settings, for example).
On Thu, 2003-04-17 at 11:42, Lee Daniel Crocker wrote:
...even these first limited results do give me some warm fuzzies about MySQL 4.0.12, which was something I wanted to look hard at because I wanted some of its features.
If you get a chance, you might try tweaking the search to use the new boolean search mode, and see what performance impact that has compared to what we're doing now. (IIRC you have to alter the fulltext index in some way. MySQL docs should describe it.)
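Roughly what I have in mind (untested, and the table/column names below are only placeholders for wherever the FULLTEXT index actually lives):

    -- Natural-language mode, roughly what the search does now
    SELECT si_page FROM searchindex
      WHERE MATCH(si_title, si_text) AGAINST ('database performance');
    -- MySQL 4.x boolean mode: supports +required / -excluded terms,
    -- phrases, wildcards, etc.
    SELECT si_page FROM searchindex
      WHERE MATCH(si_title, si_text)
            AGAINST ('+database -oracle "query cache"' IN BOOLEAN MODE);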
I'm going to be busy and/or out of town for a few days, so I won't have a chance soon... :)
-- brion vibber (brion @ pobox.com)
Lee Daniel Crocker wrote:
Absolutely; lies, damned lies, and benchmarks, and all that. Disk I/O may well be a major culprit. Memory/CPU usage probably isn't. I'll run some more tests to check some of those things. I'll also run some tests for things like having the database on a separate machine, even PostgreSQL if I have the time (I'd appreciate it if the fellow who said he had it working would send me a patch).
If I'm the fellow you mean: I thought I mentioned that I got the data into a PostgreSQL DB, nothing more. Doing this is easy, but it will not result in a database you would run a 'speed' test against. And I don't have the PHP code running; I only use the database to test queries.
To get a pg database that will give useful results, you have to modify the DB schemas in an appropriate way (and later the queries in the PHP pages). Without this, pg will lose in any test (especially speed-measuring ones).
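To give an idea of the kind of change I mean (only a sketch, with simplified, made-up column definitions; the real schema needs more care):

    -- MySQL-style definition (simplified):
    --   cur_id   INT UNSIGNED NOT NULL AUTO_INCREMENT, PRIMARY KEY
    --   cur_text MEDIUMTEXT NOT NULL
    --   ... TYPE=MyISAM
    -- A rough PostgreSQL equivalent:
    CREATE TABLE cur (
      cur_id        SERIAL PRIMARY KEY,      -- replaces AUTO_INCREMENT
      cur_namespace SMALLINT NOT NULL,
      cur_title     VARCHAR(255) NOT NULL,
      cur_text      TEXT NOT NULL,           -- MEDIUMTEXT -> TEXT
      cur_timestamp TIMESTAMP NOT NULL       -- instead of the char(14) format
    );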
But my server (a very fast Pentium 1 at 90 MHz) is pretty happy with the German Wikipedia pages ;)
Nevertheless, I can help with questions concerning a proper implementation, and I will continue to port the schemas. The next thing I will work on is a proper fulltext search, which is handled by a pg add-on. But my server is a bit slow at moving the roughly 27 MB of data (German wiki) ;)
Smurf
(Thomas Corell T.Corell@t-online.de):
I thought I mentioned that I got the data into a PostgreSQL DB, nothing more. Doing this is easy, but it will not result in a database you would run a 'speed' test against. And I don't have the PHP code running; I only use the database to test queries.
Ah, I see. Then testing PG is further away. Once you do get a reasonably efficient schema, fulltext search, etc., let me know and I'll see how easy it is to abstract those functions in the wiki software sufficiently to allow for a database comparison. For now I'll keep testing MySQL tweaks.
Lee Daniel Crocker wrote:
Ah, I see. Then testing PG is further away. Once you do get a reasonably efficient schema, fulltext search, etc., let me know and I'll see how easy it is to abstract those functions in the wiki software sufficiently to allow for a database comparison. For now I'll keep testing MySQL tweaks.
Just a question so I can optimize in the right direction: are the most commonly used queries known? That would help, e.g., in setting up useful views. I'm thinking of all the queries involved in displaying a page, for example. Or is your test suite the proper place to look for such queries?
Smurf
(Thomas Corell T.Corell@t-online.de):
Just a question so I can optimize in the right direction: are the most commonly used queries known? That would help, e.g., in setting up useful views. I'm thinking of all the queries involved in displaying a page, for example. Or is your test suite the proper place to look for such queries?
I'm sorry, I don't understand that question at all. There are no views at all used in the DB. All queries are composed by the software referring directly to the database tables, and are about as optimal as we could make them under the limits of MySQL, but it's quite possible that we've missed a number of optimizations.
The test suite interacts with the wiki over the web, just as a user would, so it has no knowledge of any code internals.
Lee Daniel Crocker wrote:
(Thomas Corell T.Corell@t-online.de):
Just a question so I can optimize in the right direction: are the most commonly used queries known? That would help, e.g., in setting up useful views. I'm thinking of all the queries involved in displaying a page, for example. Or is your test suite the proper place to look for such queries?
I'm sorry, I don't understand that question at all. There are no views at all used in the DB. All queries are composed by the software referring directly to the database tables, and are about as optimal as we could make them under the limits of MySQL, but it's quite possible that we've missed a number of optimizations.
Well, of course there are no views at the moment - MySQL doesn't support them. But PostgreSQL does. And if one frequently used operation (e.g. displaying a wiki page) depends on a select across a particular set of tables and rows, it can improve performance to have a view, with proper indices, optimized exactly for that operation.
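For example (just a sketch with guessed table and column names; the point is the shape of the idea, not the details):

    -- A view bundling what the 'display a page' operation needs, plus an
    -- index on the underlying table matching the view's lookup columns.
    CREATE VIEW page_display AS
      SELECT c.cur_namespace, c.cur_title, c.cur_text,
             c.cur_timestamp, u.user_name AS last_editor
        FROM cur c
        LEFT JOIN "user" u ON u.user_id = c.cur_user;

    CREATE INDEX cur_name_title ON cur (cur_namespace, cur_title);

    -- Displaying a page then becomes a single lookup:
    --   SELECT * FROM page_display
    --     WHERE cur_namespace = 0 AND cur_title = 'Main_Page';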
Knowing these operations and the time each one needs to complete successfully, plus the usage statistics for each operation, tells you which of them needs all the performance it can get. Example:

                        DB cost / operation   operations / hour   total time
    Update of a page    2 sec                 1000                2000 sec
    Display a page      1 sec                 100000              100000 sec
If you can reduce the DB cost per operation by 50% (1 sec and 0.5 sec respectively), you gain 1000 sec per hour for the update but 50000 sec for the display. This shows that improving the performance of the update operation is, by comparison, quite useless.
I hope this explanation is a bit clearer. I will take a look at your test suite and again at the PHP source. If I get a properly running PostgreSQL configuration, I will tell you.
The test suite interacts with the wiki over the web, just as a user would, so it has no knowledge of any code internals.
Smurf