[Resending mail; the first copy I sent apparently did not make it to the list]
Hi,
apologies for the long mail. Here is a summary:
We are experiencing excessively high memory consumption on some of our
MediaWiki websites. I tried to trace the problem, but my practical findings
leave me somewhat puzzled. In particular, I initially expected Semantic
MediaWiki to cause part of the problem, but it now seems that the problems
are caused by the fact that SMW can easily generate long outputs, not by
SMW's own memory consumption while creating these outputs. I am not getting
any further with debugging the problem, and I do not know how to improve
SMW/MW to avoid it.
== How to do memory profiling? ==
I tried to enable PHP memory profiling in xdebug, but all I got was timing
data, and I gave up on this for now. The aggregated outputs in
profileinfo.php were not very useful for me either; in particular, I think
that they do not take garbage collection into account, i.e. they only show
new memory allocations, but not the freeing of old memory. So one piece of
code may allocate 20M in total but never need more than 4M at a time, while
another allocates the same amount and keeps it due to a memory leak. In
particular, the sums and percentages apparently do not show the real impact
that a piece of code has on PHP's memory usage.
So I based my memory estimates on the minimal PHP memory limit at which
creating a page preview would not return a blank page (set via
ini_set('memory_limit', ...)). This measure is rather coarse for debugging,
but it may well be the one number that matters most to users. The results
were reproducible.
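Concretely, the measurement amounts to a LocalSettings.php fragment like the
following (just a sketch of my manual procedure; the values are the steps I
tried by hand, there is no automated bisection):

```php
// LocalSettings.php (sketch): cap PHP's memory, then reload the preview.
// Start low and increase until the preview stops returning a blank page;
// the smallest value that still works is the number I report below.
ini_set( 'memory_limit', '4M' );   // next tries: 8M, 16M, 32M, 50M, ...
```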
== Sudden memory explosion and other findings ==
The strangest observation I made on my local machine (Kubuntu Linux, no
caches whatsoever, MW 1.16alpha r56781, PHP 5.2.6-3ubuntu4.2) was that there
tends to be a sharp boundary between "no memory problem" and "massive memory
consumption". Even long pages could be generated with as little as 4M of PHP
memory. However, if they got just a little too long (one additional row in a
table), then a memory limit of 50M or more was needed to get a result. I
disabled extensions for the test. The table I used for testing was about 62K
of wikitext with roughly 150 rows, and it used HTML tags, CSS classes, and
MW links.
The findings really indicate a kind of explosion, not a gradual increase.
Each extension I disabled increased the maximal size of the table by a few
rows, but the explosion still happened in the same way. Of course, it is not
clear how meaningful the PHP-memory-limit method of measurement is here.
== What to do about it? ==
How can I fix this? I noticed that it helps to shorten the input (e.g. I
can render longer tables if the CSS class names in the tables are shorter!),
but also to simplify it (replacing links with plain text). The table I use
is based on HTML syntax; I have not yet tried whether MW's pipe syntax
performs better. So one option would be to simplify the table code that SMW
emits (maybe making it less readable).
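To illustrate the two syntaxes, here is the same row once in HTML-based
markup (roughly what SMW emits; class names here are made up for the
example) and once in MW's pipe syntax:

```
<table class="sortable"><tr>
  <td class="sometype">[[Some page]]</td>
  <td class="someothertype">42</td>
</tr></table>

{| class="sortable"
|-
| class="sometype" | [[Some page]]
| class="someothertype" | 42
|}
```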
But this would only shift the problem towards longer tables. Caching would
certainly also reduce memory consumption, but I have observed very high
memory needs on sites even with APC as a PHP bytecode cache (as I said,
loading less code merely shifted the threshold by a few rows). I have not
yet tried object caching (memcached or APC) -- is it expected to help here?
Squid does not solve the problem either, since the page still needs to be
rendered at some point, and the memory limit must be high enough to allow
this. Moreover, if rebuilding the page after a cache miss takes very long,
another cache miss can occur while the first request is still rebuilding the
page -- this recently killed one of our servers.
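For completeness, the object-caching setup I would try is the standard MW
configuration (LocalSettings.php); whether it actually lowers the parser's
peak memory, rather than just the average load, is exactly my question:

```php
// LocalSettings.php: use memcached as MW's main object cache.
$wgMainCacheType    = CACHE_MEMCACHED;
$wgMemCachedServers = array( '127.0.0.1:11211' );
```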
Besides this, I would still like to know which measures in SMW or MW could
help to reduce the problem at its source. Maybe it would help to know which
aspects of parsing a table have the highest impact on MW's memory usage. Or
is this a PHP issue?
Any relevant insights/pointers are welcome,
Markus
--
Markus Krötzsch <markus(a)semantic-mediawiki.org>
* Personal page: http://korrekt.org
* Semantic MediaWiki: http://semantic-mediawiki.org
* Semantic Web textbook: http://semantic-web-book.org
--