MinuteElectron wrote:
Have you tried "User-agent: * Disallow /wiki/Special:" ? When I wanted to stop Google from indexing specific namespaces that worked on my wiki [...]
I tried that, and additionally installed and enabled APC caching ($wgMainCacheType = CACHE_ACCEL), with no noticeable effect: after about an hour, the server load reaches 100% on both CPU cores; some time later, the server becomes inaccessible (e.g. via ssh), or mysqld simply dies. Exactly as before.
I have no idea how to block MySQL queries like the following, which run for hours and don't go away until I restart mysqld:
-- snip --
/* 66.249.65.15 */ SELECT 'Wantedpages' AS type, pl_namespace AS namespace, pl_title AS title, COUNT(*) AS value FROM `pagelinks` LEFT JOIN `page` AS pg1 ON pl_namespace = pg1.page_namespace AND pl_title = pg1.page_title LEFT JOIN `page` AS pg2 ON pl_from = pg2.page_id WHERE pg1.page_namespace IS NULL AND pl_namespace NOT IN ( 2, 3 ) AND pg2.page_namespace != 8 GROUP BY 1,2,3 HAVING COUNT(*) > 0 ORDER BY value DESC LIMIT 50
-- snip --
'Wantedpages' smells like Spezial:Gewünschte_Seiten (de) or Special:Wantedpages (en), which should be blocked by "User-agent: * Disallow /wiki/Special:" and "User-agent: * Disallow /wiki/Spezial:", right?
On most Wikipedia sites this special page says "The following data is cached", but I can't find a matching configuration directive (except $wgWantedPagesThreshold).
Even trickier: how do I get rid of queries like this?
-- snip --
/* WhatLinksHerePage::showIndirectLinks 66.249.65.15 */ SELECT /*! STRAIGHT_JOIN */ page_id,page_namespace,page_title,page_is_redirect FROM `pagelinks`,`page` WHERE (page_id=pl_from) AND pl_namespace = '0' AND pl_title = 'Titus' ORDER BY pl_from LIMIT 51
-- snip --
Those should also be blocked by directives like "User-agent: * Disallow /wiki/Special:", but obviously they aren't.
Spiders look for robots.txt in the server's root; I additionally put a copy in /w/. Are there other places I should try?
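To rule out a plain matching problem, here is a minimal sketch that checks a rule against a request path using Python's standard urllib.robotparser. It rests on several assumptions: the rule text is taken literally as quoted in this thread (no colon after "Disallow"), the /w/index.php request path is only a guess at what the spider actually fetches, the helper name allowed() is made up for this example, and Python's parser is not necessarily how Googlebot reads the file.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url_path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url_path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url_path)


# Rule written literally as quoted in this thread (no colon after "Disallow").
NO_COLON = "User-agent: *\nDisallow /wiki/Special:\n"
# The same rule with the colon that the robots.txt format expects.
WITH_COLON = "User-agent: *\nDisallow: /wiki/Special:\nDisallow: /wiki/Spezial:\n"

# Hypothetical request paths; the /w/index.php form is an assumption.
PRETTY = "/wiki/Special:Wantedpages"
SCRIPT = "/w/index.php?title=Special:Whatlinkshere&target=Titus"

if __name__ == "__main__":
    for name, rules in (("no colon", NO_COLON), ("with colon", WITH_COLON)):
        for path in (PRETTY, SCRIPT):
            verdict = "allowed" if allowed(rules, path) else "blocked"
            print(f"{name:10} {path:55} {verdict}")
```

If this parser is representative, a "Disallow" line without a colon is ignored altogether, and a /wiki/Special: prefix would not cover requests that arrive through /w/index.php?title=Special:..., which might explain why such queries still show up.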
Thanks -asb