"Brion Vibber" wrote:
I didn't say it was being cached, that its content could be word-searched, or that it had been spidered through to other pages. I said it was *indexed*. Now, maybe Google uses some word other than "indexed" to mean "contained in a database of links which are shown to users when they search for words contained in the link". I'll buy that. Maybe the word they use is "florble". In that case, the page is being florbled despite our best efforts to stop it from being florbled.
Is there any way we can tell google not to florble pages that are explicitly excluded by our robots.txt file so that people will stop complaining to *us* about google's overzealous florbling?
As I understand it:
The problem is that there are two parts of GoogleBot.
The first step collects URLs and adds them to their database without checking them or retrieving the pages. This step uses neither robots.txt nor the meta-noindex of the linked pages. The meta-nofollow of the page containing the links is probably honored.
The second step (which can occur some weeks later) takes URLs from the database and retrieves the pages. If a URL is excluded by the respective robots.txt or by a meta-noindex, it is deleted from the database. (At the same time, step one is performed on the links found on the retrieved page.)
Between those two steps, the URL stays in the database, and whenever the URL itself contains the search words, it is shown as a search result.
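To make the two steps concrete, here is a minimal sketch of the model as I understand it. It is not Google's actual code: the class name TwoStepIndex, the robots.txt rules and the example URLs are all made up for illustration. Only urllib.robotparser from the Python standard library is real.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; the real file's contents are not quoted here.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""


def make_robots(text):
    """Parse robots.txt rules from a string, without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


class TwoStepIndex:
    """Toy model of the assumed two-step behaviour (illustrative only)."""

    def __init__(self, robots):
        self.robots = robots
        self.urls = set()

    def collect(self, links):
        # Step one: store every discovered URL, with no robots.txt check.
        self.urls.update(links)

    def revisit(self):
        # Step two (possibly weeks later): drop URLs that robots.txt excludes.
        self.urls = {
            url for url in self.urls
            if self.robots.can_fetch("Googlebot", url)
        }

    def search(self, word):
        # Between the steps, a URL matches if the word appears in the URL itself.
        return sorted(u for u in self.urls if word.lower() in u.lower())
```

With these assumed rules, an edit URL under /w/ would show up in search() after collect() and disappear only once revisit() has run, which matches the complaints we are getting.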
Hypothetically we could jimmy the page to not produce edit links if the user agent is googlebot, but that would be very annoying for several reasons:
- The google-cached page would be missing those links.
- This would screw with page caching. Google hits a lot of pages, and
we'd have to either not cache any of its hits or be very careful in coding around it.
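A rough sketch of why the caching gets awkward, assuming a simple substring check on the User-Agent header. The function names and the cache-key scheme are hypothetical and are not taken from the wiki software.

```python
def is_googlebot(user_agent):
    # Assumption: a plain substring match is enough to spot the crawler.
    return "googlebot" in user_agent.lower()


def cache_key(title, user_agent):
    # The rendered page now differs per audience, so the cache must keep
    # two variants of every page instead of one.
    variant = "bot" if is_googlebot(user_agent) else "user"
    return f"{title}:{variant}"


def render_page(title, body, user_agent):
    # Crawlers get the body without the edit link; everyone else gets both.
    if is_googlebot(user_agent):
        return body
    edit_url = f"/w/wiki.phtml?title={title}&action=edit"
    return f'{body}\n<a href="{edit_url}">Edit this page</a>'
```

Every cached page would need a second variant keyed this way, or crawler hits would have to bypass the cache entirely.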
What about changing the edit URLs so that they don't contain anything people would search for?
For example
http://pl.wikipedia.org/w/wiki.phtml?title=W.i.b.r.a.t.o.r&action=edit
or
http://pl.wikipedia.org/w/wiki.phtml?articlenum=12345678&action=edit
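Both variants could be generated along these lines. This is only a sketch: the function names are made up, the articlenum parameter is my proposal and not something wiki.phtml is known to accept, and the wiki would also need matching code to decode the dotted title again.

```python
from urllib.parse import urlencode

BASE = "http://pl.wikipedia.org/w/wiki.phtml"


def dotted_edit_url(title):
    # Put a dot between every character so the plain word no longer
    # appears anywhere in the URL.
    query = urlencode({"title": ".".join(title), "action": "edit"})
    return f"{BASE}?{query}"


def numeric_edit_url(article_id):
    # Refer to the article by a number instead of by its title.
    query = urlencode({"articlenum": article_id, "action": "edit"})
    return f"{BASE}?{query}"


def contains_word(url, word):
    # The same kind of match a search on URL text would perform.
    return word.lower() in url.lower()
```

For the title "Wibrator", dotted_edit_url produces the first URL above, and contains_word on that URL with "wibrator" is false. Titles that already contain dots would need an unambiguous encoding, which this sketch does not handle.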
Paul