Someone on IRC was asking about performance relative to DokuWiki, a lightweight file-backed PHP-based wiki.
Quick benchmark results and some comments on ways we can improve: http://meta.wikimedia.org/wiki/DokuWiki_vs_MediaWiki_benchmarks
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
> Quick benchmark results and some comments on ways we can improve: http://meta.wikimedia.org/wiki/DokuWiki_vs_MediaWiki_benchmarks
One of the issues seems to be cache hits vs. non-hits, as always. Something I've poked around at a bit, but haven't found a clean way to start hacking on without completely rewriting large sections of the code, is moving from a "render on view but cache" model to a "render on edit, and then just serve static pages on view" model. The "render on edit" step would just generate a static HTML file for non-logged-in users, so a large proportion of page views would never hit PHP at all.
Is something like that at all feasible?
-Mark
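A minimal sketch of the "render on edit" idea described above, in Python rather than MediaWiki's PHP. Everything here is an assumption for illustration: `STATIC_ROOT`, `render`, `static_path` and `save_edit` are hypothetical names, and `render` is only a stand-in for the real parser. The point is just that the HTML is produced once, at save time, and written where a web server could serve it without invoking any script.

```python
import html
import tempfile
from pathlib import Path
from urllib.parse import quote

# Hypothetical static document root; a real setup would point the web server here.
STATIC_ROOT = Path(tempfile.mkdtemp())


def render(wikitext: str) -> str:
    """Stand-in for the real parser: escapes the text and wraps it in a page."""
    return "<html><body><p>" + html.escape(wikitext) + "</p></body></html>"


def static_path(title: str) -> Path:
    # Percent-encode the title so it is safe to use as a file name.
    return STATIC_ROOT / (quote(title, safe="") + ".html")


def save_edit(title: str, wikitext: str) -> Path:
    """Render once at edit time and write the file anonymous viewers would get."""
    path = static_path(title)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(render(wikitext), encoding="utf-8")
    tmp.replace(path)  # swap into place so readers never see a half-written file
    return path
```

This only covers the page being edited. It does nothing about other pages whose output depends on it, which is the hard part raised in the replies.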
Delirium wrote:
> One of the issues seems to be cache hits vs. non-hits, as always. Something I've poked around at a bit, but haven't found a clean way to start hacking on without completely rewriting large sections of the code, is moving from a "render on view but cache" model to a "render on edit, and then just serve static pages on view" model. The "render on edit" step would just generate a static HTML file for non-logged-in users, so a large proportion of page views would never hit PHP at all.
> Is something like that at all feasible?
I think it's feasible in that it's been done for similar sites, but it's not necessarily a good direction to be heading in. Other systems that have taken this approach (e.g. message boards) have sometimes found it to be worse than the cache-on-view approach. You'd certainly have to do some preliminary benchmarking first.
Remember that a change to one page can cause changes to other pages, and those other pages all need to be updated. We don't really seem to have a great system for working out the "dependencies" of a page change at the moment: when you include pages with {{}}, you often have to flush the cache to get them to re-render correctly. If you do decide to rip the caching system apart, I hope you pay attention to this area, since it's quite annoying.
Shane.
Shane King wrote:
> Remember that a change to one page can cause changes to other pages, and those other pages all need to be updated. We don't really seem to have a great system for working out the "dependencies" of a page change at the moment: when you include pages with {{}}, you often have to flush the cache to get them to re-render correctly. If you do decide to rip the caching system apart, I hope you pay attention to this area, since it's quite annoying.
Template cache invalidation is far from ideal. The problem is that template inclusions are stored in the link table, so there's no way to tell the difference between links and inclusions. We use a kind of heuristic -- we assume that all links to a page in the Template namespace are inclusions. When a page in the Template namespace is changed, all pages linking to it are invalidated. Many templates live outside the Template namespace, so changes to them don't invalidate their referring pages properly.
We're getting quite a large collection of different link tables, and Brion and I agree that it would be better if we merged them, with a "link type" field, rather than adding yet another table.
-- Tim Starling
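To make the "link type" idea concrete, here is a small sketch using SQLite from Python. The table name `pagelinks`, its columns, the `LINK`/`INCLUSION` constants and the page titles are all hypothetical and are not MediaWiki's actual schema. It only illustrates how an explicit type field would let invalidation find real inclusions instead of guessing from the namespace.

```python
import sqlite3

LINK, INCLUSION = 0, 1  # hypothetical values for the "link type" field

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pagelinks (from_page TEXT, to_page TEXT, link_type INTEGER)")
db.executemany(
    "INSERT INTO pagelinks VALUES (?, ?, ?)",
    [
        ("Paris", "Template:Infobox", INCLUSION),
        ("Help:Templates", "Template:Infobox", LINK),   # plain link, not an inclusion
        ("London", "User:Example/Box", INCLUSION),      # template outside Template:
    ],
)


def pages_to_invalidate(changed_page):
    """Return pages that include changed_page, whatever namespace it lives in."""
    rows = db.execute(
        "SELECT from_page FROM pagelinks "
        "WHERE to_page = ? AND link_type = ? ORDER BY from_page",
        (changed_page, INCLUSION),
    )
    return [r[0] for r in rows]
```

With the namespace heuristic, "Help:Templates" would be invalidated needlessly and "London" would be missed. With the type field, neither happens.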
On Dec 8, 2004, at 5:15 PM, Shane King wrote:
> Remember that a change to one page can cause changes to other pages, and those other pages all need to be updated.
Bingo -- that's the big ugly. Creating or killing a page changes the look and behavior of links to it, requiring the pages with the links in them to be re-rendered. Changes to templates likewise require invalidating the including pages (something we currently only do one level deep, IIRC, which sometimes causes problems).
As it is, updating the cache invalidation timestamps and purging the squid cache of affected pages can be a significant burden in some cases. Having to also re-render all those tens, or hundreds, or thousands of affected pages immediately would quickly become impractical.
-- brion vibber (brion @ pobox.com)
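The trade-off above can be sketched roughly as follows: walk the inclusion graph to any depth, but only bump an invalidation timestamp on each affected page and leave the re-rendering for the next view. The graph, the `touched` map and the `invalidate` function are hypothetical illustrations, not MediaWiki code.

```python
from collections import deque

# Hypothetical inclusion graph: page -> pages that include it directly.
included_by = {
    "Template:Flag": ["Template:Infobox"],
    "Template:Infobox": ["Paris", "London"],
}
touched = {}  # page -> invalidation timestamp


def invalidate(changed, now):
    """Bump the timestamp of every page that includes `changed`, at any depth."""
    seen = {changed}
    queue = deque([changed])
    while queue:
        page = queue.popleft()
        for parent in included_by.get(page, []):
            if parent not in seen:  # also guards against inclusion cycles
                seen.add(parent)
                touched[parent] = now  # cheap: mark stale, do not re-render
                queue.append(parent)
    return seen - {changed}
```

Marking a page stale costs one small write per page. Re-rendering each of them immediately, as a strict "render on edit" model would require, multiplies that by the full cost of a parse.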
Do you load extensions only when they are used in the text? If not, we could include_once() those modules only when we find the corresponding markup.
How many SQL queries are needed to render an empty page? How many per link?
Aoineko
Wikitech-l mailing list Wikitech-l@wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l
On Dec 8, 2004, at 9:25 PM, Guillaume Blanchard wrote:
> Do you load extensions only when they are used in the text? If not, we could include_once() those modules only when we find the corresponding markup.
Extensions are loaded from LocalSettings.php. You could split an extension into a stub initialiser/hook setup file and a 'guts' file, which might be worth it if it's very long and will be used on a site that can't install an opcode cache.
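A rough illustration of the stub/'guts' split, written in Python rather than PHP. The names `STUBS`, `load_math_guts` and `render_tags` are hypothetical, and the `<math>` tag is just an example. The cheap stub table is registered at startup, and the expensive part is loaded only the first time its markup actually appears in a page.

```python
import re

loads = []  # records which 'guts' loaders have actually run


def load_math_guts():
    """Stand-in for including the large 'guts' file; hypothetical."""
    loads.append("math")
    return lambda body: "<span class='math'>" + body + "</span>"


STUBS = {"math": load_math_guts}  # tag name -> loader, registered cheaply at startup
_loaded = {}


def render_tags(text):
    """Replace <tag>...</tag> markup, loading each extension on first use only."""
    def replace(match):
        tag, body = match.group(1), match.group(2)
        if tag not in STUBS:
            return match.group(0)  # unknown tag: leave untouched
        if tag not in _loaded:
            _loaded[tag] = STUBS[tag]()  # the expensive load happens here, once
        return _loaded[tag](body)

    return re.sub(r"<(\w+)>(.*?)</\1>", replace, text, flags=re.S)
```

A page with no extension markup never triggers the loader, which is the saving that matters when no opcode cache is available.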
> How many SQL queries are needed to render an empty page?
For a file cache hit we're down to this:
  BEGIN
  SELECT N FROM user_newtalk WHERE user_ip='X'
  SELECT cur_id FROM cur WHERE cur_namespace='X' LIMIT N
  SELECT cur_touched,cur_is_redirect FROM cur WHERE cur_id='X' LIMIT N
  UPDATE LOW_PRIORITY site_stats SET ss_total_views=(ss_total_views+N),ss_total_edits=(ss_total_edits),ss_good_articles=(ss_good_articles) WHERE ss_row_id=N
  UPDATE cur SET cur_counter = cur_counter + N WHERE cur_id = N
  COMMIT
These take a couple of milliseconds even in good times, but without a PHP opcode cache (Turck MMCache, etc.) they are dwarfed by PHP parsing by about a factor of 50.
A page render or parser cache hit will take a few more queries to load the message cache, the page data, and the parser cache data (if any) and a few individual link checks. A logged-in user will also load some user data and check watchlist status.
> How many per link?
Links in the body of a wiki page are saved up and checked in large chunked queries (one per namespace). Template inclusions and links in the skin are checked individually as they are found. If you turn on the stub threshold, a query must be run for every link.
-- brion vibber (brion @ pobox.com)
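The batched link check described above could look roughly like this, using SQLite from Python. The `cur_id` and `cur_namespace` columns appear in the queries quoted earlier; the `cur_title` column, the sample rows and the `existing_links` function are assumptions for illustration. Links are grouped by namespace and each group is checked with a single `IN (...)` query instead of one query per link.

```python
import sqlite3
from collections import defaultdict

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE cur (cur_id INTEGER PRIMARY KEY, cur_namespace INTEGER, cur_title TEXT)"
)
db.executemany(
    "INSERT INTO cur (cur_namespace, cur_title) VALUES (?, ?)",
    [(0, "Paris"), (0, "London"), (10, "Infobox")],
)


def existing_links(links):
    """links: iterable of (namespace, title). Returns (existing subset, query count)."""
    by_ns = defaultdict(set)
    for ns, title in links:
        by_ns[ns].add(title)  # save links up instead of checking them one by one
    found, queries = set(), 0
    for ns, titles in by_ns.items():
        titles = sorted(titles)
        marks = ",".join("?" * len(titles))
        rows = db.execute(
            f"SELECT cur_title FROM cur WHERE cur_namespace = ? AND cur_title IN ({marks})",
            [ns, *titles],
        )
        found.update((ns, r[0]) for r in rows)
        queries += 1
    return found, queries
```

For example, `existing_links([(0, "Paris"), (0, "Atlantis"), (0, "London"), (10, "Infobox")])` checks four links with two queries. This sketch does not split very large groups into chunks, which a real implementation would need to do.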