-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Rene Bakker wrote:
> I noticed that the Googlebot does not use the exact article path from
> the sitemap files but uses Special:RecentchangesLinked instead to access
> the articles. So instead of /www.mysite.com/wiki/My_Article/, Google
> uses /www.mysite.com/wiki/Special:RecentchangesLinked/My_Article/. Why
> is that?
It'll follow whatever links it sees. Every article page carries a
"Related changes" link to Special:RecentchangesLinked for that page, so
I'm sure it happily follows those links as well as the links to the
article pages themselves.
> I ran into this because I disallowed
> //wiki/Special:RecentchangesLinked/
> within robots.txt. This was to prevent the bots from indexing all
> revisions.
Disallowing Special:RecentchangesLinked won't affect indexing of old
revisions; those are served from different URLs.
Note that Special:Recentchanges and Special:RecentchangesLinked both carry
meta robots tags telling spiders not to index them or follow links from
them. This does not affect whether they get _spidered_ -- unless robots.txt
blocks those URLs, search spiders will reach those pages through regular
links.
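
To make the distinction concrete, here is a minimal Python sketch of how a
spider might read a meta robots tag. The sample HTML and the names
RobotsMetaParser and robots_directives are made up for illustration and are
not MediaWiki's actual output. The point is only that the page has to be
fetched before the tag can be seen.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of meta robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page source, assumed for illustration only.
SAMPLE_PAGE = (
    "<html><head>"
    '<meta name="robots" content="noindex,nofollow" />'
    "</head><body></body></html>"
)

print(sorted(robots_directives(SAMPLE_PAGE)))
```

A spider can only act on these directives after it has downloaded the
page, which is why the tag alone does not keep the URL out of the access
logs.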
Old versions also have meta robots tags telling spiders not to index
them. Whether they get _spidered_ in the first place depends on your
robots.txt configuration, on which other pages on the web link to them,
and on the meta robots settings of those linking pages.
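
As a rough check of what a given robots.txt rule actually covers, Python's
standard urllib.robotparser can evaluate it against sample URLs. The rule,
the host name and the index.php?title=...&oldid=... form of the
old-revision URL below are assumptions for illustration; adjust them to
your own script path.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, modelled on the rule described in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/Special:RecentchangesLinked/
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Assumed URL layout; the old-revision path depends on the wiki's setup.
BASE = "http://www.mysite.com"
article = BASE + "/wiki/My_Article"
linked = BASE + "/wiki/Special:RecentchangesLinked/My_Article"
old_revision = BASE + "/index.php?title=My_Article&oldid=12345"

for url in (article, linked, old_revision):
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(verdict, url)
```

Under these assumptions only the Special:RecentchangesLinked URL is
reported as blocked. The article and the old revision remain fetchable,
so a rule like this one does nothing about old revisions.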
- -- brion
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkklxQQACgkQwRnhpk1wk46jzwCfe+dTmTKLhlHSi7rUDleS4GR5
aIQAn1f0lpRJbZ7z4GA7UewJfepot5xw
=7ENI
-----END PGP SIGNATURE-----