[Mediawiki-l] possible revision comparison optimization with diff3?

Tim Starling tstarling at wikimedia.org
Sun Mar 1 01:58:52 UTC 2009


tlg wrote:
> Hello, I run a sort of semi busy wiki, and I have been experiencing
> difficulties with its CPU load lately, with load jumping to as high as 140
> at noon (not 1.4, not 14, but ~140). Obviously this brought the site to a
> crawl. After investigation I have found the cause: multiple diff3
> comparisons were being called at the same time.
> 
> Explaining the cause requires a little background. The wiki I run deals
> with the editing of large text files. It is common to see hundreds of kB
> of pure text on any given wiki page. Normally my servers would be able
> to handle the edit requests for these pages.
> 
> However, it seems that search bots/crawlers (from both search engines and
> individual users) have been hitting my wiki pretty hard lately. Each of
> these bots tries to copy all the pages, including the revision history of
> each of these 100 kB wiki text pages. Since each page could potentially
> have hundreds of edits, for every single large text file, hundreds of
> revision history diffs (from lighttpd/apache -> php5 -> diff3? ) are
> spawned.

diff3 is invoked in two cases: on page save when there is an edit
conflict, and when someone clicks "undo". Neither is particularly
vital to the operation of the wiki, so the first thing you should do
is turn them both off, using

$wgDiff3 = false;

in LocalSettings.php. Then see if that fixes your load problems. If it
does, then you were right about diff3 being the problem. Next you
should look at your logs to find out where the edits or undo requests
are coming from.
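
One rough way to do that log check is to tally undo requests per client
address. The sketch below is only an illustration: countUndoRequests is a
hypothetical helper, not part of MediaWiki. It assumes access log lines
begin with the client IP (as in the common/combined log format) and that
undo links carry an "undo=<revision id>" query parameter.

```php
<?php
// Hypothetical helper (not part of MediaWiki): count undo requests per client.
// Assumptions: each log line starts with the client address, and undo URLs
// look like ...&action=edit&undoafter=N&undo=M.
function countUndoRequests(iterable $lines): array
{
    $counts = [];
    foreach ($lines as $line) {
        // Skip anything that is not an undo request. "undoafter=" alone
        // does not match, because the pattern requires "undo=" exactly.
        if (!preg_match('/[?&]undo=\d+/', $line)) {
            continue;
        }
        // The first space-delimited field is taken to be the client address.
        $ip = explode(' ', ltrim($line), 2)[0];
        $counts[$ip] = ($counts[$ip] ?? 0) + 1;
    }
    arsort($counts); // busiest clients first
    return $counts;
}
```

Passing an SplFileObject opened on your access log (wherever your web
server writes it) as $lines should work, since it iterates line by line.
A handful of addresses with very high counts would point to crawlers
following undo links.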

If the problem is undo requests from search engine crawlers, you could
fix the problem by disabling anonymous edits. This will prevent the
bots from accessing the undo link.
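
For reference, anonymous editing is normally switched off through the
group permissions array in LocalSettings.php. A minimal fragment, assuming
the default '*' group (all visitors, logged in or not) has not been
otherwise customised on your wiki:

```php
// LocalSettings.php
// Deny the edit right to the '*' group (everyone, including anonymous visitors).
$wgGroupPermissions['*']['edit'] = false;
```

Logged-in users keep the edit right through the 'user' group, so only
visitors who are not logged in, which includes crawlers, are affected.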

Please tell us what you find, because it's likely that you're not the
only one having this problem.

-- Tim Starling



