On Fri, Aug 14, 2009 at 5:23 AM, Gregory Maxwell <gmaxwell@gmail.com> wrote:
On Thu, Aug 13, 2009 at 2:56 PM, Cox, Serita <Serita.Cox@bridgespan.org> wrote:
Google's new search engine, Caffeine, is supposedly kicking Wikipedia entries further down the results page. Thoughts? Comments? http://software.silicon.com/applications/0,39024653,39484015,00.htm
[from my comments in #wikimedia-tech the other day] "So— I tried 20 random words, and the WP result was lower in four of them, the same in the rest." "No pattern really... We still have the problem with "article at funny name; redirect from common name; common name search on google gives squat", which I consider to be much more major."
A simple solution to this is using the canonical tags which all major search engines started supporting earlier this year.
http://www.mattcutts.com/blog/canonical-link-tag/
Wikia's GPL code to add this to MediaWiki is available here: https://wikia-code.com/wikia/trunk/extensions/wikia/CanonicalHref/CanonicalH...
More info on it in Nick's blog post at http://www.techyouruniverse.com/wikia/google-canonical-href-with-mediawiki
Angela
On 8/13/09 5:28 PM, Angela wrote:
On Fri, Aug 14, 2009 at 5:23 AM, Gregory Maxwell <gmaxwell@gmail.com> wrote:
"So— I tried 20 random words, and the WP result was lower in four of them, the same in the rest." "No pattern really... We still have the problem with "article at funny name; redirect from common name; common name search on google gives squat", which I consider to be much more major."
A simple solution to this is using the canonical tags which all major search engines started supporting earlier this year.
That's been deployed for a while, eg:
<link rel="canonical" href="/wiki/Foobar" /> at http://en.wikipedia.org/wiki/Foo
-- brion
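To illustrate what a crawler does with the tag quoted above, here is a minimal Python sketch that reads a rel="canonical" link from a page head and resolves it against the URL that was fetched. The class and function names are hypothetical and chosen only for this example; real crawlers apply further checks that are not modelled here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> are routed here as well.
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attr.get("href")


def canonical_url(page_url, html):
    """Return the absolute canonical URL declared in html, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return None
    # The href may be relative (as in the example above), so resolve it.
    return urljoin(page_url, finder.canonical)


head = '<head><link rel="canonical" href="/wiki/Foobar" /></head>'
result = canonical_url("http://en.wikipedia.org/wiki/Foo", head)
print(result)
```

With the redirect example from this message, the page fetched at /wiki/Foo declares /wiki/Foobar as the preferred URL.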
* Brion Vibber brion@wikimedia.org [Thu, 13 Aug 2009 18:13:38 -0700]:
That's been deployed for a while, eg:
<link rel="canonical" href="/wiki/Foobar" /> at http://en.wikipedia.org/wiki/Foo
I haven't found such code in the MediaWiki 54916 snapshot from SVN (which currently seems to be running at WMF). Am I missing the code (I've looked into monobook and grepped for "canonical" through the subtree), or does it use some kind of extension? The most logical place for it is the monobook skin source code.
Dmitriy
2009/8/14 Dmitriy Sintsov questpc@rambler.ru:
- Brion Vibber brion@wikimedia.org [Thu, 13 Aug 2009 18:13:38 -0700]:
That's been deployed for a while, eg:
<link rel="canonical" href="/wiki/Foobar" /> at http://en.wikipedia.org/wiki/Foo
I haven't found such code in MediaWiki 54916 snapshot from SVN
You were not looking closely enough. See http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/Article.php?... (function showRedirectedFromHeader())
The most logical place for it is the monobook skin source code.
Nope, there is no reason to limit the functionality to one skin.
-- [[cs:User:Mormegil | Petr Kadlec]]
* Petr Kadlec petr.kadlec@gmail.com [Fri, 14 Aug 2009 13:28:32 +0200]:
2009/8/14 Dmitriy Sintsov questpc@rambler.ru:
- Brion Vibber brion@wikimedia.org [Thu, 13 Aug 2009 18:13:38 -0700]:
That's been deployed for a while, eg:
<link rel="canonical" href="/wiki/Foobar" /> at http://en.wikipedia.org/wiki/Foo
I haven't found such code in MediaWiki 54916 snapshot from SVN
You were not looking closely enough. See
http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/Article.php?...
(function showRedirectedFromHeader())
Why is this being added only for redirects? In one of my wikis (old v1.11) there's no showRedirectedFromHeader() method in the Article class. I tried adding the canonical link with $wgOut->addLink() in Article::view(), but then the canonical link is not displayed for action=edit, for example. Then I placed it in Article::outputWikiText(), with the same behavior. What's the proper place for this code in MediaWiki 1.11?
Can't update that wiki (patched code and many exotic extensions) just yet. Dmitriy
Dmitriy Sintsov wrote:
Why is this being added only for redirects? In one of my wikis (old v1.11) there's no showRedirectedFromHeader() method in the Article class. I tried adding the canonical link with $wgOut->addLink() in Article::view(), but then the canonical link is not displayed for action=edit, for example. Then I placed it in Article::outputWikiText(), with the same behavior. What's the proper place for this code in MediaWiki 1.11?
Can't update that wiki (patched code and many exotic extensions) just yet. Dmitriy
The proper fix would be to move your patched code, and update your exotic extensions so you can get up to date from now on, instead of patching it still more.
Hoi, I know how it feels not to be able to update your wiki. My recommendations are: make sure that your extensions are in the Wikimedia Foundation's SVN code repository. Spend time on making your exotic extensions conform to development standards, and move as much as possible of your core changes into extensions; think hooks instead.
One thing is clear: you cannot compare your functionality with what exists in release 1.16a and, with more usability initiative changes going in, you will regret even more that it is hard for you to update.
Thanks, GerardM
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Tue, Aug 25, 2009 at 4:24 AM, Dmitriy Sintsov <questpc@rambler.ru> wrote:
Why is this being added only for redirects?
What else should it be added for?
I tried adding the canonical link with $wgOut->addLink() in Article::view(), but then the canonical link is not displayed for action=edit, for example. Then I placed it in Article::outputWikiText(), with the same behavior.
What purpose would a canonical link serve on action=edit?
What's the proper place for this code in MediaWiki 1.11?
It's unlikely anyone else is going to spend time hunting through nearly two-year-old code for you. Two-year-old code which, by the way, very possibly has known, unpatched security vulnerabilities, since it hasn't been supported in a year or so. If you're not willing to upgrade for whatever reason, you'll probably have to figure this kind of thing out yourself.
* Aryeh Gregor Simetrical+wikilist@gmail.com [Tue, 25 Aug 2009 09:35:25 -0400]:
On Tue, Aug 25, 2009 at 4:24 AM, Dmitriy Sintsov <questpc@rambler.ru> wrote:
Why is this being added only for redirects?
What else should it be added for?
For every invocation of the same article with any action that produces HTML output.
I tried adding the canonical link with $wgOut->addLink() in Article::view(), but then the canonical link is not displayed for action=edit, for example. Then I placed it in Article::outputWikiText(), with the same behavior.
What purpose would a canonical link serve on action=edit?
Wouldn't action=edit be indexed by robots when we have no proper robots.txt? Or will there be a meta noindex, nofollow in the head of such a page? Anyway, it seems that the Yandex crawler doesn't like the meta noindex rules in the header of the page, giving an error (warning) message in the stats of their webmaster tools. I thought that the purpose of the canonical link was to treat the multiple actions of the page as a single page for the web indexer, thus improving the ranks.
What's the proper place for this code in MediaWiki 1.11?
It's unlikely anyone else is going to spend time hunting through nearly two-year-old code for you. Two-year-old code which, by the way, very possibly has known, unpatched security vulnerabilities, since it hasn't been supported in a year or so. If you're not willing to upgrade for whatever reason, you'll probably have to figure this kind of thing out yourself.
I am willing to upgrade, just not yet. It's not my fault that the wiki wasn't upgraded for such a long time; I have only recently started working with it. My other wikis run 1.14.1. Yes, it uses some custom-made extensions which are neither in SVN nor on www.mediawiki.org. I'll try to figure it out myself, of course.
Dmitriy
On Tue, Aug 25, 2009 at 12:30 PM, Dmitriy Sintsov <questpc@rambler.ru> wrote:
For every invocation of the same article with any action that produces HTML output.
That's wrong. The canonical version of a page must be a page with substantially identical content. Edit pages serve totally different HTML; rel=canonical pointing to the article will just be ignored by search engines. See here for a discussion of how rel=canonical works:
http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.ht...
Note, e.g., "We allow slight differences, e.g., in the sort order of a table of products. We also recognize that we may crawl the canonical and the duplicate pages at different points in time, so we may occasionally see different versions of your content." Totally different content, no.
Wouldn't action=edit be indexed by robots when we have no proper robots.txt? Or will there be a meta noindex, nofollow in the head of such a page?
Yes, we set noindex on edit pages.
Anyway, it seems that the Yandex crawler doesn't like the meta noindex rules in the header of the page, giving an error (warning) message in the stats of their webmaster tools.
What does the warning say? Ideally, of course, you should ban them in robots.txt, so the search engine doesn't have to bother fetching the URL.
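As a rough illustration of the robots.txt approach, the following Python sketch checks which URLs a well-behaved crawler would skip. It assumes a common but hypothetical layout in which index.php is served under /w/ and articles under /wiki/; the host name, the paths, and the helper name `allowed` are assumptions for this example, not details of any particular wiki.

```python
from urllib.robotparser import RobotFileParser

# Assumed layout: script URLs (edit, history, ...) live under /w/,
# pretty article URLs live under /wiki/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""


def allowed(url, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if robots_txt lets the given agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


edit_ok = allowed("http://mywiki.org/w/index.php?title=Foo&action=edit")
view_ok = allowed("http://mywiki.org/wiki/Foo")
print(edit_ok, view_ok)
```

Under these assumptions the edit URL is blocked before it is ever fetched, while the article URL stays crawlable, so the noindex meta tag only matters for crawlers that ignore or cannot match the robots.txt rule.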
I thought that the purpose of the canonical link was to treat the multiple actions of the page as a single page for the web indexer, thus improving the ranks.
The purpose is to tell search engines which URL you'd prefer them to present to users, if the same content is being served under multiple URLs. It is not meant to artificially inflate rankings by counting unindexed pages as contributing to some entirely different page of your choosing, and using it that way won't actually work. Since search engines were already using heuristics to identify duplicate content, and might well continue to use those exact same heuristics to validate rel=canonical, it might not improve rankings at all.
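The policy described in this message can be sketched in a few lines of Python. This is not MediaWiki's actual code: the function name `head_tags`, its parameters, and the exact "noindex,nofollow" value are assumptions made for illustration. It emits rel=canonical only where the same article content is served under a second URL (a view reached through a redirect) and marks other actions as not to be indexed.

```python
from html import escape


def head_tags(action, article_path, redirected_from=None):
    """Return the indexing-related <head> tags for one page request.

    action          -- e.g. "view" or "edit"
    article_path    -- path of the article's preferred URL
    redirected_from -- title of the redirect that led here, if any
    """
    if action != "view":
        # Edit forms and similar pages serve different HTML, so a
        # canonical link would be ignored; keep them out of the index.
        return ['<meta name="robots" content="noindex,nofollow" />']
    if redirected_from is not None:
        # Same content under a second URL: name the preferred one.
        href = escape(article_path, quote=True)
        return ['<link rel="canonical" href="%s" />' % href]
    # A plain view of the article at its own URL needs neither tag here.
    return []


print(head_tags("view", "/wiki/Foobar", redirected_from="Foo"))
print(head_tags("edit", "/wiki/Foobar"))
```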
* Aryeh Gregor Simetrical+wikilist@gmail.com [Tue, 25 Aug 2009 13:13:56 -0400]:
That's wrong. The canonical version of a page must be a page with substantially identical content. Edit pages serve totally different HTML; rel=canonical pointing to the article will just be ignored by search engines. See here for a discussion of how rel=canonical works:
http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.ht...
Thanks for pointing that out.
Note, e.g., "We allow slight differences, e.g., in the sort order of a table of products. We also recognize that we may crawl the canonical and the duplicate pages at different points in time, so we may occasionally see different versions of your content." Totally different content, no.
Well, semantically an edit page and action=view page are not totally different, for sure. Both of these will contain very similar information. But I cannot go against standards, that's impossible. That's something like law, you don't always like it, but you have to obey it.
Anyway, it seems that Yandex crawler doesn't like the meta noindex rules in the header of the page, giving an error (warning) message in the stats of their webmaster tools.
What does the warning say? Ideally, of course, you should ban them in robots.txt, so the search engine doesn't have to bother fetching the URL.
I've banned them in robots.txt. It produces the warning due to non-existing titles, which also have meta noindex. There are some links from foreign sites to non-existing titles, which I obviously cannot disable, something like "http://mywiki.org/wiki/nonexsitingtitle". Yandex gives the warning "Document contains meta-tag noindex" (approximately translated from Russian). There are lots of such warnings. It's a bit strange that this is a warning at all. Google doesn't give such a warning.
The purpose is to tell search engines which URL you'd prefer them to present to users, if the same content is being served under multiple URLs. It is not meant to artificially inflate rankings by counting unindexed pages as contributing to some entirely different page of your choosing, and using it that way won't actually work. Since search engines were already using heuristics to identify duplicate content, and might well continue to use those exact same heuristics to validate rel=canonical, it might not improve rankings at all.
I am not so sure that such inflation is artificial. An artificial one would be when the article/revision is not the same, or even when mixing MediaWiki-generated HTML and other HTML. But anyway, I cannot change how the search engines will interpret it.
Dmitriy