[Resending mail; the first copy I sent apparently did not make it to the list]
Hi,
apologies for the long mail. Here is a summary:
We are experiencing excessively high memory consumption on some of our MediaWiki sites. I tried to trace the problem, but my practical findings leave me somewhat puzzled. In particular, I initially expected Semantic MediaWiki to cause part of the problem, but it now seems that the problem is caused by the fact that SMW can easily generate long outputs, not by the memory SMW needs to create those outputs. I am not getting any further with debugging, and I do not know how to improve SMW/MW to avoid the problem.
== How to do memory profiling? ==
I tried to enable PHP memory profiling in xdebug, but all I got was timing data, so I gave up on this for now. The aggregated output of profileinfo.php was not very useful to me either; in particular, I think it does not take garbage collection into account, i.e. it only shows new memory allocations, not the freeing of old memory. So one piece of code may allocate 20M in total but never need more than 4M at a time, while another allocates the same amount and keeps it due to some memory leak. Either way, the sums and percentages apparently do not show the real impact that a piece of code has on PHP's memory usage.
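To illustrate why such sums can be misleading: in the following toy script roughly 20M is allocated in total, but the peak never goes much beyond 4M, and it is only the peak that decides whether a request survives the memory limit.

  <?php
  // Toy example: ~20M allocated over the whole run, but the peak stays near 4M
  // because each chunk is freed before the next one is created.
  for ( $i = 0; $i < 5; $i++ ) {
      $chunk = str_repeat( 'x', 4 * 1024 * 1024 ); // a ~4M string
      // ... work on $chunk ...
      unset( $chunk ); // refcount drops to zero, memory is released immediately
  }
  echo 'peak: ' . memory_get_peak_usage( true ) . " bytes\n";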
So I based my memory estimates on the minimal PHP memory limit at which creating a page preview does not return a blank page (ini_set('memory_limit',...);). This measure is rather coarse for debugging, but it is probably the one number that matters most to users, and the results were reproducible.
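In practice this amounts to something like the following in LocalSettings.php (only a sketch; the shutdown hook for logging the peak is merely a convenience, not part of the measurement, and its name is made up):

  ini_set( 'memory_limit', '4M' ); // raise/lower until the preview stops returning a blank page

  function wfDebugLogPeakMemory() {
      // memory_get_peak_usage() is available since PHP 5.2; true = "real" allocated memory
      error_log( 'peak memory: ' . memory_get_peak_usage( true ) . ' bytes' );
  }
  register_shutdown_function( 'wfDebugLogPeakMemory' );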
== Sudden memory explosion and other findings ==
The strangest observation I made on my local machine (Kubuntu Linux, no caches whatsoever, MW 1.16alpha r56781, PHP 5.2.6-3ubuntu4.2) was that there tends to be a sharp boundary between "no memory problem" and "massive memory consumption". Even long pages could be generated with as little as 4M of PHP memory. However, if they got just a little too long (one additional row in a table), then a memory limit of 50M or more was needed to get a result. Extensions were disabled for this test. The table I used was about 62K of wikitext with roughly 150 rows, using HTML table tags, CSS classes, and MW links.
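For reference, a table of roughly that shape can be produced with something like this (the class names are just placeholders in the style of SMW's output, not the exact markup I tested):

  <?php
  // Generate a test table: HTML table syntax, CSS classes, and MW links.
  $rows = 150; // vary this to find the boundary between "works with 4M" and "needs 50M"
  $text = "<table class=\"smwtable sortable\">\n";
  for ( $i = 0; $i < $rows; $i++ ) {
      $class = ( $i % 2 ) ? 'row-odd' : 'row-even';
      $text .= "<tr class=\"$class\"><td class=\"smwtype_wpg\">[[Test page $i|Label $i]]</td>" .
               "<td class=\"smwtype_num\">$i</td></tr>\n";
  }
  $text .= "</table>\n";
  file_put_contents( 'testtable.wiki', $text ); // paste into a wiki page or the preview box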
The findings really indicate a kind of explosion, not a gradual increase. Disabling individual extensions increased the maximal table size by a few rows each time, but the explosion still happened in the same way. Of course, it is not clear how meaningful the PHP-memory-limit method of measurement is here.
== What to do about it? ==
How can I fix this? I noticed that it helps to shorten the input (e.g. I can render longer tables if the CSS class names in the tables are shorter!) and also to simplify it (replacing links with plain text). The table I use is based on HTML syntax; I have not yet tried whether MW's pipe syntax performs better. So one option would be to simplify SMW's table markup (possibly making it less readable).
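For anyone who wants to compare the two syntaxes, the generator above could emit pipe syntax instead, roughly like this (untested on my side):

  $text = "{| class=\"smwtable sortable\"\n";
  for ( $i = 0; $i < $rows; $i++ ) {
      $class = ( $i % 2 ) ? 'row-odd' : 'row-even';
      $text .= "|- class=\"$class\"\n";
      $text .= "| class=\"smwtype_wpg\" | [[Test page $i|Label $i]]\n";
      $text .= "| class=\"smwtype_num\" | $i\n";
  }
  $text .= "|}\n";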
But simplifying the markup would only shift the problem towards longer tables. Caching would certainly also reduce memory consumption, but I have observed very high memory needs on sites even with APC as a PHP bytecode cache (as I said: loading less code simply moved the boundary by a few rows). I have not tried object caching (memcached or APC) yet -- is it expected to help here? Squid does not solve the problem either, since the page still needs to be rendered at some point, and the memory limit must be high enough to allow this. Moreover, if there is a cache miss and rebuilding the page takes very long, another cache miss can occur while the first request is still rebuilding the page -- this recently killed one of our servers.
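If I do try object caching, I would expect the setup to look roughly like this in LocalSettings.php (a sketch only; I have not verified that it lowers the peak, since a cache miss still requires a full parse):

  $wgMainCacheType    = CACHE_MEMCACHED;            // or CACHE_ACCEL to use APC's user cache
  $wgMemCachedServers = array( '127.0.0.1:11211' ); // assuming a local memcached instance
  $wgParserCacheType  = CACHE_MEMCACHED;            // reuse rendered HTML where possible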
Besides this, I would still like to know which measures in SMW or MW could help to reduce the problem at its source. It might help to know which aspects of parsing a table have the highest impact on MW's memory usage. Or is this a PHP issue?
Any relevant insights/pointers are welcome,
Markus