I run a MediaWiki 1.17.0 site for 3000 users (see the architecture below) and would appreciate some tips on improving performance. Specifically, what should we try next, given our current setup? (I have read http://www.mediawiki.org/wiki/Manual:Performance_tuning.)
The platform is a single VMware virtual machine (CentOS Linux 5.6) with two CPUs (2.5 GHz Opterons) and 3 GB RAM. The whole MediaWiki/LAMP stack runs on this VM, including MySQL. This is on a fast intranet, so network speed is not an issue. Other statistics include:
* Page views per day: 22,000 (about 30 hits per minute during peak hours)
* Edits per day: 1,200
* Users: 1,800 registered editors and 1,200 anonymous readers
* Titles: 100,000
* Revisions: 850,000
* Page rendering time (based on the embedded HTML comment at the bottom of each page) is about 0.25 to 0.5 seconds today.
* System load average usually runs between 1.00 and 4.00. A little swapping occurs (around 135 MB swap in use).
* RAM buffers free: around 2.3 GB right now.
* PHP 5.3.3, MySQL 5.0.77, httpd 2.2.3
For caching, we use eAccelerator (a huge improvement) and $wgMainCacheType = CACHE_ACCEL.
Another important detail: Unlike Wikipedia (and most other wikis), approximately 10,000 of our pages make live SQL queries to non-MediaWiki databases, pull in the results, and display them to the user. This is important for our business, and our users are accustomed to seeing up-to-the-second live data. (So we have not investigated Squid, for example, which I think would cache the rendered pages and therefore lose the "up-to-the-second" live data.)
The problems I am seeing are:
* Sometimes an individual "Save Page" operation will sit for 20-30 seconds before completing.
* Occasionally some pages take a long time to render (10-15 seconds) for no discernible reason. (This is not due to the live SQL queries mentioned above.)
I'd like to eliminate these delays and decrease page rendering time to 0.1 second or less.
I have determined that our extensions do not slow the wiki down much. After temporarily removing all of them, the speed stayed about the same.
Given our architecture, what's the best next step we should investigate to improve performance?
* File cache or Squid? (And is there some easy way to tell Squid to exclude our dynamic SQL pages? They all run a particular wiki extension, so if there's something programmatic we can do in the extension, that's great. I am not very familiar with Squid.)
* memcached?
* Increase number of CPUs?
* Multiple front-end servers?
* Change from a VM to a physical machine?
* Move MySQL to a separate server (possibly physical)?
* Something else?
What additional measurements would be most helpful in making this decision?
Thanks for any advice, DanB
On Wed, Oct 26, 2011 at 1:46 PM, Daniel Barrett <danb@vistaprint.com> wrote:
> [snip] The problems I am seeing are:
>
> * Sometimes an individual "Save Page" operation will sit for 20-30 seconds before completing.
> * Occasionally some pages take a long time to render (10-15 seconds) for no discernible reason. (This is not due to the live SQL queries mentioned above.)
>
> I'd like to eliminate these delays and decrease page rendering time to 0.1 second or less.
>
> I have determined that our extensions do not slow the wiki down much. After removing all of them, the speed stays about the same.
>
> Given our architecture, what's the best next step we should investigate to improve performance?
A few quick comments on each of these:
> * File cache or Squid? (And is there some easy way to tell Squid to exclude our dynamic SQL pages? They all run a particular wiki extension, so if there's something programmatic we can do in the extension, that's great. I am not very familiar with Squid.)
The file cache mode probably isn't as well tested as it used to be, while setting up Squid or Varnish can be a bit intimidating.
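The MediaWiki side of it is pretty small, though. A minimal LocalSettings.php sketch -- the proxy address and max-age are just example values:

    # LocalSettings.php -- tell MediaWiki about a reverse proxy in front of it
    $wgUseSquid     = true;                  # emit s-maxage headers, purge proxies on edit
    $wgSquidServers = array( '127.0.0.1' );  # example: proxy addresses to trust and purge
    $wgSquidMaxage  = 18000;                 # example: default cache lifetime in seconds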
As long as your extension disables caching on the pages that use it, I think either should work fine on the other pages and safely let things through.
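Something along these lines in the extension should do it -- the <livedata> tag and function names here are made up for illustration, but Parser::disableCache() is the real call that keeps a page out of the parser cache and marks it uncacheable for proxies:

    <?php
    # Hypothetical sketch of a tag extension that renders live SQL results.
    $wgHooks['ParserFirstCallInit'][] = 'wfLiveDataSetup';

    function wfLiveDataSetup( $parser ) {
        $parser->setHook( 'livedata', 'wfLiveDataRender' );
        return true;
    }

    function wfLiveDataRender( $input, array $args, Parser $parser ) {
        # The important part: never cache pages containing this tag,
        # so every view re-runs the live query.
        $parser->disableCache();

        # ... run the external SQL query and format the results here ...
        return '<pre>' . htmlspecialchars( $input ) . '</pre>';
    }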
Note though that both methods will only cache non-logged-in page views; once someone has a login session going, they won't be served from that cache. If much of your traffic is from logged-in users, it may not help much.
> * memcached?
I would strongly recommend using memcached in place of eAccelerator for data caching for the following reasons:
* it's a prerequisite for adding more web servers (eAccel's cache is not shared between servers)
* it's required to maintain consistency if you also run command-line maintenance scripts (eAccel's cache is not shared between web and CLI)
* you can make the cache size as large as you like, including sharding over multiple machines!
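The switch itself is mostly a LocalSettings.php change; a sketch, assuming a memcached daemon is already running locally on the standard port (the address is an example):

    # LocalSettings.php -- use memcached for the object cache
    $wgMainCacheType       = CACHE_MEMCACHED;
    $wgMemCachedServers    = array( '127.0.0.1:11211' );  # example address
    $wgSessionsInMemcached = true;  # share login sessions too (see below)

Keep eAccelerator (or APC) installed for opcode caching either way; memcached only takes over the data-cache role.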
> * Increase number of CPUs?
More CPUs means the web server can process more web requests in parallel, and can definitely be a win under load. In a VM environment this should be easy to try -- bump up the CPU count, reboot, and see how it changes your performance profile under load.
> * Multiple front-end servers?
In the Apache+PHP execution model, multiple front-end servers and more CPUs on a single front end are very similar: both just add capacity for handling requests in parallel.
If the VMs would all run on the same physical machine, there may not be much benefit over just assigning more CPU and RAM to the first VM. But if you need more VMs so you can spread them over multiple physical hosts, that's the way to go.
See the notes above about switching from eAccelerator to memcached; you'd also need to make sure that PHP's session storage is shared between all front-end servers. (If it's not, you can use the $wgSessionsInMemcached switch to stick them into the shared cache space.)
> * Change from a VM to a physical machine?
This may well help; I'd recommend prepping a test by adding another front end as a VM, then swapping in the same configuration on a real machine to see how the two compare.
> * Move MySQL to a separate server (possibly physical)?
Potentially also a help; MySQL has its own needs for RAM, disk access, etc., which may not fit well with virtualization. This is usually a very easy service to try moving out, since it's reasonably isolated: you only need to move the database and point LocalSettings.php at the new server.
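The LocalSettings.php change is a one-liner; a sketch with a hypothetical hostname:

    # LocalSettings.php -- point the wiki at the relocated database
    $wgDBserver = 'db.example.com';  # hypothetical new MySQL host

Just remember that MySQL grants are per-host, so the wiki's database user will need to be allowed to connect from the web server's address on the new box.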
> * Something else?
A few things off the top of my head:
* definitely keep using eAccelerator, APC, or another PHP opcode cache: this is key to getting decent baseline performance out of a large PHP app
* double-check for slow queries on the database; MySQL has logging options that can help.
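With MySQL 5.0 that means the slow query log in my.cnf; a sketch (the path and threshold are example values):

    # my.cnf, [mysqld] section -- MySQL 5.0-era syntax
    log-slow-queries = /var/log/mysql/slow.log  # example path
    long_query_time  = 2                        # log anything slower than 2 seconds
    log-queries-not-using-indexes               # optional: also flag unindexed queries

Then summarize the log with mysqldumpslow to find the worst offenders.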
* if using PHP's default filesystem storage for sessions, consider also trying the memcached sessions option -- default filesystem sessions are locked per session, which prevents multiple requests from the _same user_ from running at once and could cause slowness in a few scenarios. Then again it might not hit you at all. :)
* on slow page saves etc.: check whether that time is being spent in the save itself or in additional cleanup jobs running inside the same request. You could try running the job queue out of band so cleanup tasks (such as invalidating caches and rebuilding links on pages that use a template you've edited) happen separately; see the sketch after the link below.
http://www.mediawiki.org/wiki/Manual:Job_queue
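The usual recipe: set $wgJobRunRate to 0 so web requests never process jobs, and drain the queue from cron instead (the install path here is just an example):

    # LocalSettings.php -- don't run jobs inside web requests
    $wgJobRunRate = 0;

    # crontab -- process the queue every 5 minutes
    */5 * * * * php /var/www/wiki/maintenance/runJobs.php --maxjobs 1000 > /dev/null 2>&1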
* When making performance measurements, look at outliers -- it's often not the average case, but the *horrible horrible 10000x worse worst case* that kills you. :)
There may also be other things, like localization recaching, that are expensive. A larger memcached space might help with those, not sure.
-- brion