[Mediawiki-l] Google Mini search appliance & MediaWiki
Brion Vibber
brion at wikimedia.org
Fri Mar 23 14:02:47 UTC 2007
Daniel Barrett wrote:
> What's the right way to disable "nofollow" in MediaWiki? Grep for
> "setRobotpolicy( 'noindex,nofollow' )" in all the /include files and
> change them one by one?
>
> Problem: Our intranet spider, a Google Mini search appliance, randomly
> sees "nofollow" directives for pages in MediaWiki 1.9.3 and reports:
>
> Excluded: On page with robots nofollow meta tag.
>
> In repeated spiderings, the same page might get excluded or not,
> seemingly randomly. Out of 10 spiderings of a particular page, say, 6
> of them will fail with this error. Maybe the same link appears with and
> without "nofollow" on different wiki pages (say, articles vs. category
> pages), and whichever one the spider hits, it obeys?
Many of the special pages carry a 'nofollow' meta robots tag to prevent
useless and expensive infinite spidering loops. On a list page with
several list-length links and several paging links, there's a
combinatorial explosion: many thousands of different variations on how
you could page through the list, with different starting points and
lengths. And if you follow the links on history pages, you get the
extremely expensive set of diff views between every individual version
of every page, and so on.
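
To make the paging explosion concrete, here is a rough Python sketch, not MediaWiki code. It assumes a hypothetical list page whose only links are "change list length", "next" and "previous", and the limit values are just illustrative. It counts every distinct (limit, offset) view a spider could reach by following those links:

```python
from collections import deque

def reachable_list_views(total_items, limits=(20, 50, 100, 250, 500)):
    """Return every (limit, offset) view reachable by clicking paging links.

    Hypothetical model: each view links to the same offset at every other
    list length, plus a 'next' and a 'previous' link at the current length.
    """
    start = (limits[0], 0)
    seen = {start}
    queue = deque([start])
    while queue:
        limit, offset = queue.popleft()
        # List-length links keep the offset and change the limit.
        neighbours = [(other, offset) for other in limits]
        # 'next' link, only if there are more items to show.
        if offset + limit < total_items:
            neighbours.append((limit, offset + limit))
        # 'previous' link, clamped at the start of the list.
        if offset > 0:
            neighbours.append((limit, max(0, offset - limit)))
        for view in neighbours:
            if view not in seen:
                seen.add(view)
                queue.append(view)
    return seen

if __name__ == "__main__":
    views = reachable_list_views(1000)
    print(len(views), "distinct URLs for a 1000-item list")
```

Because switching the list length in mid-list shifts where the next page starts, the spider ends up at far more starting points than the handful of pages a human would ever look at, and that is for a single list.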
From your description, the search appliance is logging each individual
link from such a page and saying "I saw this link but I'm not following
it." If so, the logging is a little weird ;) but the behavior is
perfectly normal. When the spider reaches the same link from another
page which is marked to follow links, it should indeed follow it.
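
If you want to confirm which pages actually carry the tag, something like the following Python sketch could help. It only parses HTML you have already fetched and reports the directives in any meta robots tag; the class and function names are made up for this example and have nothing to do with MediaWiki or the Google Mini:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. "noindex,nofollow".
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)

def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex,nofollow" /></head></html>'
    print(sorted(robots_directives(page)))
```

Running this over the HTML of an article page and of a special page that links to it should show whether only the special page is marked 'nofollow', which would match the explanation above.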
You should probably not try to remove the 'nofollow' tags, or you'll
just end up with a loaded web server and a lot of extra crap in your
search index.
If you're pretty sure the appliance is behaving weirdly, I'd recommend
checking with Google's support to figure out what it's doing and get it
sorted out.
-- brion vibber (brion @ wikimedia.org)