What's the right way to disable "nofollow" in MediaWiki? Grep for "setRobotpolicy( 'noindex,nofollow' )" in all the /include files and change them one by one?
Problem: Our intranet spider, a Google Mini search appliance, randomly sees "nofollow" directives for pages in MediaWiki 1.9.3 and reports:
Excluded: On page with robots nofollow meta tag.
In repeated spiderings, the same page might get excluded or not, seemingly randomly. Out of 10 spiderings of a particular page, say, 6 of them will fail with this error. Maybe the same link appears with and without "nofollow" on different wiki pages (say, articles vs. category pages), and whichever one the spider hits, it obeys?
Thanks, DanB
Daniel Barrett wrote:
What's the right way to disable "nofollow" in MediaWiki? Grep for "setRobotpolicy( 'noindex,nofollow' )" in all the /include files and change them one by one?
In your LocalSettings.php, immediately before the final "?>", put this:
$wgNoFollowLinks = false;
-- Tim Starling
Tim Starling wrote:
In your LocalSettings.php, immediately before the final "?>", put this: $wgNoFollowLinks = false;
Gee, that was easy! Any reason why it has to be the final line of PHP in the file? Or were you just making your instructions simple (in case I'm a PHP novice)?
Thanks again, DanB
Tim Starling wrote:
In your LocalSettings.php, immediately before the final "?>", put this: $wgNoFollowLinks = false;
Hmm, $wgNoFollowLinks affects only external links, according to DefaultSettings.php. I am getting "nofollow" problems when spidering *internal* wiki links. Is there more to the story?
Maybe I should be setting $wgNamespaceRobotPolicies to '' for all namespaces?
DanB
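For reference, a minimal sketch of what a per-namespace setting could look like in LocalSettings.php. It assumes $wgNamespaceRobotPolicies is an array keyed by namespace constants such as NS_MAIN, with a robots policy string as each value; the namespaces and policies below are illustrative only. Nothing in this thread confirms that the setting overrides the policy that special pages set for themselves.

```php
# LocalSettings.php fragment (sketch, untested against 1.9.3).
# Assumption: keys are namespace constants, values are robots policy strings.
# The two namespaces listed here are examples, not a recommendation.
$wgNamespaceRobotPolicies = array(
    NS_MAIN     => 'index,follow',
    NS_CATEGORY => 'index,follow',
);
```

As with any override, this would go after the inclusion of DefaultSettings.php.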
On 23/03/07, Daniel Barrett <danb@vistaprint.com> wrote:
Any reason why it has to be the final line of PHP in the file? Or were you just making your instructions simple (in case I'm a PHP novice)?
Yes. It prevents the line appearing above the inclusion of DefaultSettings.php, which would overwrite the customised setting. This is a common problem we have with users who are new to PHP, let alone MediaWiki, so we're careful to give out the "safest case" advice.
Rob Church
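The ordering issue described above can be sketched as follows. This is an illustrative outline of a LocalSettings.php, not a copy of a real one: the exact require_once path is an assumption and differs between MediaWiki versions and installations.

```php
<?php
# LocalSettings.php (sketch; actual contents vary by installation)

# Written by the installer near the top of the file. This loads every
# default value, including $wgNoFollowLinks = true.
# Assumption: the path shown here may differ in your version.
require_once( "$IP/includes/DefaultSettings.php" );

# ... other site settings ...

# Overrides must come after the require_once above; otherwise the defaults
# are loaded afterwards and replace the customised value. Putting the line
# last in the file guarantees the correct order.
$wgNoFollowLinks = false;
?>
```

If the same assignment appeared above the require_once line, DefaultSettings.php would set the variable back to its default.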
Daniel Barrett wrote:
What's the right way to disable "nofollow" in MediaWiki? Grep for "setRobotpolicy( 'noindex,nofollow' )" in all the /include files and change them one by one?
Problem: Our intranet spider, a Google Mini search appliance, randomly sees "nofollow" directives for pages in MediaWiki 1.9.3 and reports:
Excluded: On page with robots nofollow meta tag.
In repeated spiderings, the same page might get excluded or not, seemingly randomly. Out of 10 spiderings of a particular page, say, 6 of them will fail with this error. Maybe the same link appears with and without "nofollow" on different wiki pages (say, articles vs. category pages), and whichever one the spider hits, it obeys?
Many of the special pages have a 'nofollow' meta robots tag to prevent useless and expensive infinite spidering loops. On a list page with several list-length links and several paging links, there's a combinatorial explosion: many thousands of different variations on how you could page through the list, with different starting points and lengths. Then if you follow the links, you'll get the extremely expensive list of thousands and thousands of diff views between every individual version of every page, etc.
From your description, the search appliance is logging each individual link from such a page and saying "I saw this link but I'm not following it." If so, that's a little weird logging ;) but perfectly normal behavior. When spidering from another page which is marked to follow links, it should indeed follow them.
You should probably not try to change it, or you'll just end up with a loaded web server and a lot of extra crap in your search index.
If you're pretty sure it is behaving weirdly, I'd recommend checking with Google's support to figure out what it's doing and get it sorted out.
-- brion vibber (brion @ wikimedia.org)
Thanks Brion. For the time being I have disabled MediaWiki's nofollow behavior entirely, and told our spider to avoid all Special pages and edit/history/diff/etc. pages. Seems to be working fine.
DanB
mediawiki-l@lists.wikimedia.org