On Tue, Aug 25, 2009 at 12:30 PM, Dmitriy Sintsov <questpc@rambler.ru> wrote:
For every invocation of the same article with any action that produces HTML output.
That's wrong. The canonical version of a page must be a page with substantially identical content. Edit pages serve totally different HTML; rel=canonical pointing to the article will just be ignored by search engines. See here for a discussion of how rel=canonical works:
http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.ht...
Note, e.g., "We allow slight differences, e.g., in the sort order of a table of products. We also recognize that we may crawl the canonical and the duplicate pages at different points in time, so we may occasionally see different versions of your content." Totally different content, no.
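To make the distinction concrete, here is a minimal sketch of choosing a canonical URL only for URLs that serve substantially the same content as the article. It assumes a MediaWiki-style layout with articles at /wiki/<Title> and the script at /w/index.php. The host example.org, the SAME_CONTENT_ACTIONS set and both function names are hypothetical, for illustration only. An edit URL gets no rel=canonical at all, because its content differs and search engines would ignore the hint.

```python
from urllib.parse import urlsplit, parse_qs, quote

# Hypothetical site layout (assumption): articles at /wiki/<Title>,
# script entry point at /w/index.php.
BASE = "https://example.org"

# Hypothetical: actions assumed to render substantially the same content
# as the plain article view. action=edit is deliberately NOT included.
SAME_CONTENT_ACTIONS = {"view"}


def canonical_url(url):
    """Return the preferred URL for this page, or None if the URL serves
    content that is not substantially identical to the article."""
    parts = urlsplit(url)
    if parts.path.startswith("/wiki/"):
        # Already the preferred form; drop any query string.
        return BASE + parts.path
    if parts.path == "/w/index.php":
        query = parse_qs(parts.query)
        title = query.get("title", [None])[0]
        action = query.get("action", ["view"])[0]  # no action means a plain view
        if title and action in SAME_CONTENT_ACTIONS:
            return BASE + "/wiki/" + quote(title.replace(" ", "_"))
    return None


def canonical_link_tag(url):
    """Emit a rel=canonical element only when the content is a duplicate."""
    target = canonical_url(url)
    if target is None:
        return ""  # different content: a canonical hint would just be ignored
    return '<link rel="canonical" href="%s" />' % target


for u in ("https://example.org/w/index.php?title=Main_Page",
          "https://example.org/w/index.php?title=Main_Page&action=edit"):
    print(repr(canonical_link_tag(u)))
```

The first URL maps to the /wiki/Main_Page link element; the edit URL yields an empty string. Keeping the edit form out of search results is a job for noindex or robots.txt, not rel=canonical.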
Wouldn't action=edit pages be indexed by robots when we have no proper robots.txt? Or will there be a meta noindex, nofollow in the head of such pages?
Yes, we set noindex on edit pages.
Anyway, it seems that the Yandex crawler doesn't like the meta noindex rules in the head of the page, giving an error (warning) message in the stats of their webmaster tools.
What does the warning say? Ideally, of course, you should ban them in robots.txt, so the search engine doesn't have to bother fetching the URL.
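As a rough illustration of banning such URLs in robots.txt, the sketch below checks a robots.txt with Python's standard urllib.robotparser. It assumes a MediaWiki-style layout where edit and other script URLs live under /w/ and articles under /wiki/. The robots.txt contents and the example.org host are hypothetical, and Python's parser is only an approximation of how Yandex or any other crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (assumption): block the script path, which serves
# action=edit and similar URLs, while leaving /wiki/ article URLs crawlable.
# The trailing slash matters: "Disallow: /w" would also match "/wiki/...".
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse lines directly, no network fetch

for url in ("https://example.org/wiki/Main_Page",
            "https://example.org/w/index.php?title=Main_Page&action=edit"):
    # can_fetch(useragent, url) applies the matching User-agent group's rules.
    print(url, parser.can_fetch("Yandex", url))
```

With this file the article URL is fetchable and the edit URL is not, so a compliant crawler never requests the edit form and never has to read its noindex tag.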
I thought that the purpose of the canonical link was to treat the multiple actions of the page as a single page for the web indexer, thus improving the rankings.
The purpose is to tell search engines which URL you'd prefer them to present to users, if the same content is being served under multiple URLs. It is not meant to artificially inflate rankings by counting unindexed pages as contributing to some entirely different page of your choosing, and using it that way won't actually work. Since search engines were already using heuristics to identify duplicate content, and might well continue to use those exact same heuristics to validate rel=canonical, it might not improve rankings at all.