Daniel Barrett wrote:
What's the right way to disable "nofollow" in MediaWiki? Grep for "setRobotpolicy( 'noindex,nofollow' )" in all the includes/ files and change the calls one by one?
Problem: Our intranet spider, a Google Mini search appliance, randomly sees "nofollow" directives for pages in MediaWiki 1.9.3 and reports:
Excluded: On page with robots nofollow meta tag.
In repeated spiderings, the same page might get excluded or not, seemingly at random. Out of 10 spiderings of a particular page, say, 6 will fail with this error. Maybe the same link appears both with and without "nofollow" on different wiki pages (say, articles vs. category pages), and the spider obeys whichever one it happens to hit?
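[For the archives: later MediaWiki releases added configuration variables for robot policies, so the grep-and-patch approach is only needed on old versions like the 1.9.3 discussed here. A hedged sketch of what that looks like in LocalSettings.php on a newer release — the namespace and title entries below are illustrative, not from this thread:]

```php
# LocalSettings.php — only valid on MediaWiki releases newer than
# 1.9.3; these variables did not exist yet in 1.9.x, where you
# would still have to edit the setRobotpolicy() calls directly.

# Default robot policy for ordinary article pages:
$wgDefaultRobotPolicy = 'index,follow';

# Per-namespace overrides (illustrative example); special pages
# stay noindexed regardless, which avoids the paging/diff-link
# explosion Brion describes.
$wgNamespaceRobotPolicies = array(
    NS_TALK => 'noindex,follow',
);

# Per-page overrides by title (illustrative example):
$wgArticleRobotPolicies = array(
    'Main Page' => 'index,follow',
);
```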
Many of the special pages carry a 'nofollow' meta robots tag to prevent useless and expensive infinite spidering loops. On a list page with several list-length and several paging links, there's a combinatorial explosion: many thousands of different ways to page through the list, with different starting points and lengths. Follow those links and you also reach the extremely expensive diff views between every individual version of every page, and so on.
From your description, the search appliance is logging each individual link from such a page and saying "I saw this link but I'm not following it." If so, that's a little weird as logging goes ;) but perfectly normal behavior. When spidering from another page that is marked to follow links, it should indeed follow them.
You should probably not try to change it, or you'll just end up with a loaded web server and a lot of extra crap in your search index.
If you're pretty sure it really is misbehaving, I'd recommend checking with Google's support to figure out what it's doing and get it sorted out.
-- brion vibber (brion @ wikimedia.org)