Hi,
I have ongoing and heavy problems with my MediaWiki installation; it causes a continuous CPU load of 100% on a dual-core AMD Opteron (2 GB of RAM).
When I monitor what's going on with "mytop", I find queries like these:
-- snip --
/* WhatLinksHerePage::showIndirectLinks 66.249.65.201 */ SELECT /*! STRAIGHT_JOIN */ page_id,page_namespace,page_title,page_is_redirect FROM `pagelinks`,`page` WHERE (page_id=pl_from) AND pl_namespace = '0' AND pl_title = 'Fungistatikum' ORDER BY pl_from LIMIT 51
-- or --
/* CategoryViewer::doCategoryQuery 217.227.171.13 */ SELECT page_title,page_namespace,page_len,page_is_redirect,cl_sortkey FROM `page`,`categorylinks` FORCE INDEX (cl_sortkey) WHERE (1 = 1) AND (cl_from = page_id) AND cl_to = 'Chemie' ORDER BY cl_sortkey LIMIT 201
-- snip --
66.249.65.201 and the other IPs causing this stress belong to a Google subnet, so I don't much like the idea of blocking these hosts with ipchains.
However, these threads/processes run for hours and keep increasing the load, until the server becomes completely inaccessible (ssh no longer responds) or MySQL simply dies.
I've tried so far (a) to mimic the setup of the Wikimedia wikis with a similar robots.txt, and (b) to enable file caching in MediaWiki; this might have helped a bit, but didn't solve the basic problem (I have to stop and restart mysqld every few hours).
Any help on this (or pointers to existing solutions) is greatly appreciated!
Greetings, asb
On Dec 22, 2007 1:30 PM, Agon S. Buchholz asb@kefk.net wrote:
I've tried so far (a) to mimic the setup of the Wikimedia wikis with a similar robots.txt,
Oh? Have you blocked /w/? How about special pages?
and (b) to enable file caching in MediaWiki; this might have helped a bit, but didn't solve the basic problem (I have to stop and restart mysqld every few hours).
I don't see why file caching would really help here. Other forms of caching, however, such as eAccelerator or APC bytecode caching, and/or memcached, could dramatically improve your wiki's ability to handle traffic.
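In LocalSettings.php that would look roughly like this (a sketch, not your exact setup; the memcached address is only an example, so use whichever accelerator you actually have installed):
-- snip --
// Opcode caching (APC or eAccelerator) is installed at the PHP level and
// needs no MediaWiki configuration, but MediaWiki can also use it as an
// object cache:
$wgMainCacheType = CACHE_ACCEL;

// Or, if you run memcached instead (host/port below are only an example):
// $wgMainCacheType    = CACHE_MEMCACHED;
// $wgMemCachedServers = array( '127.0.0.1:11211' );
-- snip --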
Emufarmers Sangly wrote:
I've tried so far (a) to mimic the setup of the Wikimedia wikis with a similar robots.txt,
Oh? Have you blocked /w/? How about special pages?
I hope so: "User-agent: * Disallow: /w/"
The special pages sections are taken directly from the Wikimedia directives.
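In essence it boils down to something like this (a shortened sketch; the special-page lines shown here are just examples, the real file carries over the full Wikimedia list):
-- snip --
User-agent: *
Disallow: /w/
Disallow: /wiki/Special:Search
Disallow: /wiki/Special:Random
-- snip --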
I don't see why file caching would really help here. Other forms of caching, however, such as eAccelerator or APC bytecode caching, and/or memcached, could dramatically improve your wiki's ability to handle traffic.
The general idea was to relieve MediaWiki from generating dynamic pages all over again to free server resources for other tasks. I've never thought I'd need something like APC for my (small) site, but I'll check this out.
Thanks! -asb
On Dec 23, 2007 12:44 AM, Agon S. Buchholz asb@kefk.net wrote:
Emufarmers Sangly wrote:
I don't see why file caching would really help here. Other forms of caching, however, such as eAccelerator or APC bytecode caching, and/or memcached, could dramatically improve your wiki's ability to handle traffic.
The general idea was to relieve MediaWiki from generating dynamic pages all over again to free server resources for other tasks. I've never thought I'd need something like APC for my (small) site, but I'll check this out.
I believe file caching is just for static pages; I don't think it would be of any help here, but PHP caching should improve things.
You might also want to check Apache's logs: Is Googlebot actually hitting your site an inordinate number of times, or is your site just choking on requests to a few particular pages?
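Something quick and dirty like this would tell you (it assumes a combined-format access log; the log path is just a guess, so adjust it to your setup):
-- snip --
# Total requests from Googlebot in the current log:
grep -c Googlebot /var/log/apache2/access.log

# The URLs it requests most often:
awk '/Googlebot/ { print $7 }' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -20
-- snip --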
Agon S. Buchholz wrote:
Emufarmers Sangly wrote:
I've tried so far (a) to mimic the setup of the Wikimedia wikis with a similar robots.txt,
Oh? Have you blocked /w/? How about special pages?
I hope so: "User-agent: * Disallow: /w/"
The special pages sections are taken directly from the Wikimedia directives.
Have you tried "User-agent: * Disallow: /wiki/Special:"? That worked on my wiki when I wanted to stop Google from indexing specific namespaces (although it took about a year before all the disallowed namespaces were cleaned out of the Google index completely).
MinuteElectron.
MinuteElectron wrote:
Have you tried "User-agent: * Disallow: /wiki/Special:"? That worked on my wiki when I wanted to stop Google from indexing specific namespaces [...]
I tried that, and additionally installed and enabled APC caching ($wgMainCacheType = CACHE_ACCEL): absolutely no recognizable effect. After about one hour, the server load reaches 100% on both CPU cores; some time later, the server becomes inaccessible (e.g. via ssh), or mysqld simply dies. Exactly like before.
I have no idea how to block MySQL queries like the following, which run for hours and don't go away until I restart mysqld:
-- snip --
/* 66.249.65.15 */ SELECT 'Wantedpages' AS type, pl_namespace AS namespace, pl_title AS title, COUNT(*) AS value FROM `pagelinks` LEFT JOIN `page` AS pg1 ON pl_namespace = pg1.page_namespace AND pl_title = pg1.page_title LEFT JOIN `page` AS pg2 ON pl_from = pg2.page_id WHERE pg1.page_namespace IS NULL AND pl_namespace NOT IN ( 2, 3 ) AND pg2.page_namespace != 8 GROUP BY 1,2,3 HAVING COUNT(*) > 0 ORDER BY value DESC LIMIT 50
-- snip --
'Wantedpages' smells like Spezial:Gewünschte_Seiten (de) or Special:Wantedpages (en), which should be blocked by "User-agent: * Disallow: /wiki/Special:" and "User-agent: * Disallow: /wiki/Spezial:", right?
Most of the Wikipedia sites say on this Special page "The following data is cached", but I don't find a matching directive (except $wgWantedPagesThreshold).
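The closest thing I can find in the settings looks like the lines below; whether this is really what produces the "cached" note on Wikimedia is only my guess:
-- snip --
// With miser mode on, expensive query pages such as Wantedpages are served
// from the querycache table instead of being recomputed on every request,
// and are refreshed offline via maintenance/updateSpecialPages.php.
$wgMiserMode = true;

// The threshold mentioned above: only pages wanted at least this many times are listed.
$wgWantedPagesThreshold = 1;
-- snip --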
Even trickier: How do I get rid of queries like this
-- snip --
/* WhatLinksHerePage::showIndirectLinks 66.249.65.15 */ SELECT /*! STRAIGHT_JOIN */ page_id,page_namespace,page_title,page_is_redirect FROM `pagelinks`,`page` WHERE (page_id=pl_from) AND pl_namespace = '0' AND pl_title = 'Titus' ORDER BY pl_from LIMIT 51
-- snip --
Those should also be blocked by directives like "User-agent: * Disallow: /wiki/Special:", but obviously they aren't.
Spiders look for robots.txt in the server's root; additionally, I put a copy in /w/. Maybe there are other places I should try?
Thanks -asb
On 23/12/2007, Agon S. Buchholz asb@kefk.net wrote:
The general idea was to relieve MediaWiki from generating dynamic pages all over again to free server resources for other tasks. I've never thought I'd need something like APC for my (small) site, but I'll check this out.
MediaWiki has the power of a tank and the durability of a tank, with the gas mileage of a tank ;-)
PHP is sometimes not such an efficient use of server resources, and caching is a good idea. I even run wpcache on my tiny WordPress blog.
- d.
On Sat, 2007-12-22 at 19:30 +0100, Agon S. Buchholz wrote:
66.249.65.201 and the other IPs causing this stress belong to a Google subnet, so I don't much like the idea of blocking these hosts with ipchains.
Log into Google Webmaster Tools, add your site to their system, and adjust the crawl rate to be slower using the tools provided there.
David A. Desrosiers wrote:
Log into Google Webmaster Tools, add your site to their system, and adjust the crawl rate to be slower using the tools provided there.
I tried this; it has no noticeable effect, either on the actual server load or on the threads/queries I see in mytop.
Thanks anyway! -asb
On Sun, 2007-12-23 at 06:38 +0100, Agon S. Buchholz wrote:
I tried this; it has no noticeable effect, either on the actual server load or on the threads/queries I see in mytop.
How long did you wait? It takes a few days before it takes effect.
In my case, it made a significant and appreciable difference.
With your Google Webmaster Tools account, you can view graphs of the number of pages fetched per day under Tools -> Set crawl rate. The wiki I manage, with about 2,000 pages of content, has been hit an average of 130 pages per day by Google over the past four months; certainly not a significant load on the web server. Beware of alias names that Google may see your site under, such as website.com, www.website.com, etc.
--Hiram
On Dec 22, 2007 12:30 PM, Agon S. Buchholz asb@kefk.net wrote:
Hi,
I have ongoing and heavy problems with my MediaWiki installation; it causes a continuous CPU load of 100% on a dual-core AMD Opteron (2 GB of RAM).
This is quite odd; Googlebot doesn't hit hard enough to cause this sort of chaos on any of the systems I run, which sadly are much less robust than yours. Are you running memcached at all? What OS is this on?
Is there a monster job queue, by chance? http://meta.wikimedia.org/wiki/Help:Job_queue
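A quick way to check (standard table and script names from a default MediaWiki install; add your table prefix if you use one):
-- snip --
-- Number of pending background jobs:
SELECT COUNT(*) AS pending_jobs FROM job;

-- If the number is huge, drain the queue from the shell instead of on page views:
--   php maintenance/runJobs.php
-- snip --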
This is usual with Googlebot: people search via Google and hit your database directly.
What operating system do you have? And which MediaWiki version?
If your operating system and software are up to date, you shouldn't have problems.
How big is your database? If it is bigger than 1 GB, change MyISAM to InnoDB; if you have a lot of users, do that too.
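Roughly like this, for the busiest tables (default table names without a prefix; take a backup first):
-- snip --
ALTER TABLE page          ENGINE=InnoDB;
ALTER TABLE revision      ENGINE=InnoDB;
ALTER TABLE text          ENGINE=InnoDB;
ALTER TABLE pagelinks     ENGINE=InnoDB;
ALTER TABLE categorylinks ENGINE=InnoDB;
-- snip --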
A last option is to install Squid in HTTP accelerator mode between the Internet and your Apache site; even with default parameters you can get a 30-40% performance increase.
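On the MediaWiki side there are settings to make the wiki aware of a cache in front of it, so it can purge pages when they change; roughly (the address is only an example):
-- snip --
$wgUseSquid     = true;
$wgSquidServers = array( '127.0.0.1' );  // where your Squid listens (example)
$wgSquidMaxage  = 18000;                 // seconds a page may stay cached
-- snip --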
For testing, there is MySQL Proxy, a new tool to control MySQL queries.
If you need more help or information, write me again.
Good luck.