On Tue, Aug 25, 2009 at 12:30 PM, Dmitriy Sintsov <questpc(a)rambler.ru> wrote:
> For every invocation of the same article with any action that produces
> HTML output.
That's wrong. The canonical version of a page must be a page with
substantially identical content. Edit pages serve totally different
HTML; rel=canonical pointing to the article will just be ignored by
search engines. See here for a discussion of how rel=canonical works:
http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.h…
Note, e.g., "We allow slight differences, e.g., in the sort order of a
table of products. We also recognize that we may crawl the canonical
and the duplicate pages at different points in time, so we may
occasionally see different versions of your content." Totally
different content, no.
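To illustrate the intended use (the URL and page here are hypothetical, just for the example): a duplicate URL, such as a printable view, declares the article URL as canonical in its head:

```
<!-- Served at a duplicate URL, e.g. /wiki/Foo?printable=yes -->
<head>
  <link rel="canonical" href="http://example.org/wiki/Foo" />
</head>
```

The key point is that both URLs serve substantially the same content; the link element only tells the engine which URL to prefer.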
> Wouldn't the action=edit be indexed by robots when we have no proper
> robots.txt? Or will there be meta noindex, nofollow in the head of
> such a page?
Yes, we set noindex on edit pages.
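That is, edit pages carry a robots meta tag along these lines (illustrative markup, not the exact output):

```
<meta name="robots" content="noindex,nofollow" />
```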
> Anyway, it seems that the Yandex crawler doesn't like the meta noindex
> rules in the header of the page, giving an error (warning) message in
> the stats of their webmaster tools.
What does the warning say? Ideally, of course, you should ban them in
robots.txt, so the search engine doesn't have to bother fetching the
URL.
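For example, assuming the common MediaWiki layout where articles are served from /wiki/ and index.php lives under /w/ (adjust the paths to your own install), a minimal robots.txt would be:

```
User-agent: *
Disallow: /w/
```

This keeps crawlers away from all action URLs (edit, history, etc.) without them ever fetching the pages, while the article URLs under /wiki/ remain crawlable.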
> I've thought that the purpose of the canonical link is to treat the
> multiple actions of the page as a single page to the web indexer,
> thus improving the ranks.
The purpose is to tell search engines which URL you'd prefer them to
present to users, if the same content is being served under multiple
URLs. It is not meant to artificially inflate rankings by counting
unindexed pages as contributing to some entirely different page of
your choosing, and using it that way won't actually work. Since
search engines were already using heuristics to identify duplicate
content, and might well continue to use those exact same heuristics to
validate rel=canonical, it might not improve rankings at all.