tlg wrote:
Hello, I run a semi-busy wiki, and I have been experiencing
difficulties with its CPU load lately, with load jumping to as high as 140
at noon (not 1.4, not 14, but ~140). Obviously this brought the site to a
crawl. After investigation I have found the cause: multiple diff3
comparisons were being called at the same time.
Explaining the cause requires a little background. The wiki
I run deals with editing large text files. It is common to see
hundreds of KB of pure text on any given wiki page. Normally my servers
are able to handle the edit requests for these pages.
However, it seems that search bots and crawlers (from both search engines and
individual users) have been hitting my wiki pretty hard lately. Each of
these bots tries to copy all the pages, and this includes the revision history
of each of these 100 KB wiki text pages. Since each page could potentially
have hundreds of edits, hundreds of revision history diffs
(from lighttpd/apache -> php5 -> diff3?) are spawned for every single large
text file.
diff3 is invoked in two cases: on page save when there is an edit
conflict, and when someone clicks "undo". Neither is particularly
vital to the operation of the wiki, so the first thing you should do
is turn them both off, using
$wgDiff3 = false;
in LocalSettings.php. Then see if that fixes your load problems. If it
does, then you were right about diff3 being the problem. Next you
should look at your logs to find out where the edits or undo requests
are coming from.
If the problem is undo requests from search engine crawlers, you could
fix the problem by disabling anonymous edits. This will prevent the
bots from accessing the undo link.
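As a rough sketch of what disabling anonymous edits could look like in
LocalSettings.php: this assumes the standard $wgGroupPermissions array is
what controls editing on your install, and that no extension overrides it.
Treat it as a starting point rather than a tested configuration for your site.

```php
# LocalSettings.php
# '*' is the group that every visitor belongs to, logged in or not.
# Removing its 'edit' right means anonymous visitors (including crawlers)
# can no longer edit pages.
$wgGroupPermissions['*']['edit'] = false;

# Logged-in users keep the 'edit' right through the 'user' group.
# This is normally the default; it is repeated here only to make the
# intent explicit.
$wgGroupPermissions['user']['edit'] = true;
```

Registered users can still edit and undo as before, so this only helps if
the expensive requests really are coming from anonymous clients.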
Please tell us what you find, because it's likely that you're not the
only one having this problem.
-- Tim Starling