Re: [Wikitech-l] [Foundation-l] Question to post...

Angela

14 Aug 2009

2:28 a.m.

On Fri, Aug 14, 2009 at 5:23 AM, Gregory Maxwell<gmaxwell(a)gmail.com> wrote:

...

On Thu, Aug 13, 2009 at 2:56 PM, Cox, Serita<Serita.Cox(a)bridgespan.org> wrote:

Google's new search engine, Caffeine, is supposedly kicking Wikipedia entries further down results page. Thoughts? Comments? http://software.silicon.com/applications/0,39024653,39484015,00.htm

[from my comments in #wikimedia-tech the other day] "So— I tried 20 random words, and the WP result was lower in four of them, the same in the rest." "No pattern really... We still have the problem with "article at funny name; redirect from common name; common name search on google gives squat", which I consider to be much more major."

A simple solution to this is using the canonical tags, which all major search engines started supporting earlier this year: <http://www.mattcutts.com/blog/canonical-link-tag/>

Wikia's GPL code to add this to MediaWiki is available here: <https://wikia-code.com/wikia/trunk/extensions/wikia/CanonicalHref/CanonicalHref.php>

More info on it in Nick's blog post at <http://www.techyouruniverse.com/wikia/google-canonical-href-with-mediawiki>

Angela

Brion Vibber

14 Aug

3:13 a.m.

New subject: [Foundation-l] Question to post...

On 8/13/09 5:28 PM, Angela wrote:

...

On Fri, Aug 14, 2009 at 5:23 AM, Gregory Maxwell<gmaxwell(a)gmail.com> wrote:

"So— I tried 20 random words, and the WP result was lower in four of them, the same in the rest." "No pattern really... We still have the problem with "article at funny name; redirect from common name; common name search on google gives squat", which I consider to be much more major."

A simple solution to this is using the canonical tags which all major search engines started supporting earlier this year.

That's been deployed for a while, e.g.:

<link rel="canonical" href="/wiki/Foobar" />

at http://en.wikipedia.org/wiki/Foo

-- brion
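For anyone who wants to check a page for this tag without reading the source by hand, here is a minimal Python sketch using only the standard library. The class name, the helper name, and the sample page are made up for illustration; only the tag itself is taken from the message above.

```python
from html.parser import HTMLParser


class CanonicalLinkFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical_href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical_href is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens, so split before comparing.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical_href = attributes.get("href")


def find_canonical(html_text):
    """Return the canonical href found in html_text, or None if there is none."""
    finder = CanonicalLinkFinder()
    finder.feed(html_text)
    return finder.canonical_href


# Hypothetical page source, modelled on the tag quoted in the message.
sample_page = """<html><head>
<title>Foobar</title>
<link rel="canonical" href="/wiki/Foobar" />
</head><body>(Redirected from Foo)</body></html>"""

print(find_canonical(sample_page))  # /wiki/Foobar
```

Feeding it the HTML fetched from a redirect URL should show whether the tag is present; a page without the tag yields None.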

Dmitriy Sintsov

1:21 p.m.

* Brion Vibber <brion(a)wikimedia.org> [Thu, 13 Aug 2009 18:13:38 -0700]:

...

That's been deployed for a while, eg: <link rel="canonical" href="/wiki/Foobar" /> at http://en.wikipedia.org/wiki/Foo

I haven't found such code in the MediaWiki 54916 snapshot from SVN (which currently seems to be running at WMF). Am I missing the code (I've looked into monobook and grepped for "canonical" through the subtree), or does it use some kind of extension? The most logical place for it is the monobook skin source code.

Dmitriy

Petr Kadlec

1:28 p.m.

2009/8/14 Dmitriy Sintsov <questpc(a)rambler.ru>:

...

* Brion Vibber <brion(a)wikimedia.org> [Thu, 13 Aug 2009 18:13:38 -0700]:

That's been deployed for a while, eg: <link rel="canonical" href="/wiki/Foobar" /> at http://en.wikipedia.org/wiki/Foo

I haven't found such code in MediaWiki 54916 snapshot from SVN

You were not looking closely enough. See http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/Article.php… (function showRedirectedFromHeader())

...

The most logical place for it is the monobook skin source code.

Nope, there is no reason to limit the functionality to one skin. -- [[cs:User:Mormegil | Petr Kadlec]]
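To illustrate the point about skins, here is a small Python sketch of the behaviour described in this thread: the tag is produced by the page-output logic, so whatever skin renders the head receives it. This is not MediaWiki's code. The function name and parameters are hypothetical, and restricting the tag to action "view" is an assumption of the sketch.

```python
from html import escape


def head_links(canonical_path, redirected_from=None, action="view"):
    """Return the <link> tags to place in <head> for one request.

    canonical_path  -- local URL of the article actually shown
    redirected_from -- title of the redirect the reader arrived through, if any
    action          -- request action ("view", "edit", ...)
    """
    links = []
    # Only a plain view reached through a redirect serves the same content
    # under a second URL, so that is the only case that gets the tag here.
    if action == "view" and redirected_from is not None:
        href = escape(canonical_path, quote=True)
        links.append('<link rel="canonical" href="%s" />' % href)
    return links


# Hypothetical request: the reader asked for "Foo", which redirects to "Foobar".
print(head_links("/wiki/Foobar", redirected_from="Foo"))
# A direct view of the article gets no canonical tag in this sketch.
print(head_links("/wiki/Foobar"))
```

Because the decision is made before any skin is involved, no skin-specific template needs to know about redirects.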

Dmitriy Sintsov

25 Aug

10:24 a.m.

* Petr Kadlec <petr.kadlec(a)gmail.com> [Fri, 14 Aug 2009 13:28:32 +0200]:

...

2009/8/14 Dmitriy Sintsov <questpc(a)rambler.ru>:

* Brion Vibber <brion(a)wikimedia.org> [Thu, 13 Aug 2009 18:13:38 -0700]:

That's been deployed for a while, eg: <link rel="canonical" href="/wiki/Foobar" /> at http://en.wikipedia.org/wiki/Foo

I haven't found such code in MediaWiki 54916 snapshot from SVN

You were not looking closely enough. See http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/Article.php… (function showRedirectedFromHeader())

Why is this being added only for redirects? In one of my wikis (an old v1.11) there's no showRedirectedFromHeader() method in the Article class. I tried adding the canonical link with $wgOut->addLink() in Article::view(), but then the canonical link is not displayed for action=edit, for example. Then I placed it in Article::outputWikiText(), with the same behavior. What's the proper place for this code in MediaWiki 1.11? I can't update that wiki (patched code and many exotic extensions) just yet.

Dmitriy

Platonides

12:55 p.m.

Dmitriy Sintsov wrote:

...

The proper fix would be to move your patched code, and update your exotic extensions so you can get up to date from now on, instead of patching it still more.

Gerard Meijssen

1:05 p.m.

Hoi,

I know how it feels not to be able to update your wiki. My recommendations are: make sure that your extensions are in the Wikimedia Foundation's SVN code repository, spend time on making your exotic extensions conform to development standards, and move as much as possible of your core changes into extensions; think hooks instead. One thing is clear: you cannot compare your functionality with what exists in release 1.16a and, with more usability initiative changes going in, you will regret even more that it is hard for you to update.

Thanks,
GerardM

Aryeh Gregor

3:35 p.m.

On Tue, Aug 25, 2009 at 4:24 AM, Dmitriy Sintsov<questpc(a)rambler.ru> wrote:

...

Why this is being added only for redirects?

What else should it be added for?

...

I've tried to add the canonical link with $wgOut->addLink() to Article::view(), then the canonical link is not being displayed for action=edit, for example. Then, placed it to Article::outputWikiText(), the same behavior.

What purpose would a canonical link serve on action=edit?

...

What's the proper place for this code in MediaWiki 1.11?

It's unlikely anyone else is going to spend time hunting through nearly two-year-old code for you. Two-year-old code which, by the way, very possibly has known, unpatched security vulnerabilities, since it hasn't been supported in a year or so. If you're not willing to upgrade for whatever reason, you'll probably have to figure this kind of thing out yourself.

Dmitriy Sintsov

6:30 p.m.

* Aryeh Gregor <Simetrical+wikilist(a)gmail.com> [Tue, 25 Aug 2009 09:35:25 -0400]:

...

On Tue, Aug 25, 2009 at 4:24 AM, Dmitriy Sintsov<questpc(a)rambler.ru> wrote:

Why this is being added only for redirects?

What else should it be added for?

For every invocation of the same article with any action that produces HTML output.

...

I've tried to add the canonical link with $wgOut->addLink() to Article::view(), then the canonical link is not being displayed for action=edit, for example. Then, placed it to Article::outputWikiText(), the same behavior.

What purpose would a canonical link serve on action=edit?

Wouldn't action=edit be indexed by robots when we have no proper robots.txt? Or will there be a meta noindex, nofollow in the head of such a page? Anyway, it seems that the Yandex crawler doesn't like the meta noindex rules in the header of the page, giving an error (warning) message in the stats of their webmaster tools. I thought that the purpose of the canonical link was to treat the multiple actions of a page as a single page for the web indexer, thus improving the ranks.

...

What's the proper place for this code in MediaWiki 1.11?

I am willing to upgrade, just not yet. It's not my fault that the wiki wasn't upgraded for such a long time; I have only recently started working with it. My other wikis run 1.14.1. Yes, it uses some custom-made extensions which are neither in SVN nor on www.mediawiki.org. I'll try to figure it out myself, of course.

Dmitriy

Aryeh Gregor

7:13 p.m.

On Tue, Aug 25, 2009 at 12:30 PM, Dmitriy Sintsov<questpc(a)rambler.ru> wrote:

...

For every invocation of the same article with any action that produces HTML output.

That's wrong. The canonical version of a page must be a page with substantially identical content. Edit pages serve totally different HTML; rel=canonical pointing to the article will just be ignored by search engines. See here for a discussion of how rel=canonical works: http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.h… Note, e.g., "We allow slight differences, e.g., in the sort order of a table of products. We also recognize that we may crawl the canonical and the duplicate pages at different points in time, so we may occasionally see different versions of your content." Totally different content, no.

...

Wouldn't the action=edit be indexed by robots when we have no proper robots.txt? Or, there will be meta noindex, nofollow in the head of such page?

Yes, we set noindex on edit pages.

...

Anyway, it seems that Yandex crawler doesn't like the meta noindex rules in the header of the page, giving an error (warning) message in the stats of their webmaster tools.

What does the warning say? Ideally, of course, you should ban them in robots.txt, so the search engine doesn't have to bother fetching the URL.
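As a rough illustration of the robots.txt approach, the following Python sketch checks which URLs a rule set would keep crawlers away from, using the standard library's urllib.robotparser. The robots.txt content and the /wiki/ versus /w/ URL layout are assumptions made for the example, not a description of any particular wiki's configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a wiki that serves articles under /wiki/
# and script URLs (edit, history, ...) under /w/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url, agent="*"):
    """Return True if the parsed rules allow the given agent to fetch url."""
    return parser.can_fetch(agent, url)


# Article views stay crawlable, edit forms do not.
print(may_crawl("http://mywiki.org/wiki/Foo"))                           # True
print(may_crawl("http://mywiki.org/w/index.php?title=Foo&action=edit"))  # False
```

With a rule like this, a well-behaved crawler never requests the edit URL, so it never sees the noindex meta tag on it in the first place.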

...

I've thought that the purpose of canonical link is to treat the multiple actions of the page as the single page to the web indexer, thus, improving the ranks.

The purpose is to tell search engines which URL you'd prefer them to present to users, if the same content is being served under multiple URLs. It is not meant to artificially inflate rankings by counting unindexed pages as contributing to some entirely different page of your choosing, and using it that way won't actually work. Since search engines were already using heuristics to identify duplicate content, and might well continue to use those exact same heuristics to validate rel=canonical, it might not improve rankings at all.
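A toy Python sketch of the "substantially identical content" idea follows. It is emphatically not how any search engine works: the similarity measure (difflib's SequenceMatcher), the 0.8 threshold, the function name, and the sample texts are all arbitrary choices made up for illustration.

```python
from difflib import SequenceMatcher


def looks_like_duplicate(text_a, text_b, threshold=0.8):
    """Toy check: are two page texts near-identical?

    The threshold is an arbitrary illustration value.
    """
    ratio = SequenceMatcher(None, text_a, text_b).ratio()
    return ratio >= threshold


# Hypothetical page texts.
article = "Foobar is a placeholder name used in computing examples."
via_redirect = article + " (Redirected from Foo)"
edit_form = ("Editing Foobar. Warning: you are not logged in. "
             "Save page, Show preview, Show changes.")

# The redirect view differs only by a short notice, so it passes the check.
print(looks_like_duplicate(article, via_redirect))  # True
# The edit form is a different page altogether, so it does not.
print(looks_like_duplicate(article, edit_form))     # False
```

Under a check of this kind, a canonical hint from the redirect view to the article is plausible, while one from the edit form to the article is not.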

Dmitriy Sintsov

9:02 p.m.

* Aryeh Gregor <Simetrical+wikilist(a)gmail.com> [Tue, 25 Aug 2009 13:13:56 -0400]:

...

http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.h…

...

Thanks for pointing that out.

> Note, e.g., "We allow slight differences, e.g., in the sort order of a
> table of products. We also recognize that we may crawl the canonical
> and the duplicate pages at different points in time, so we may
> occasionally see different versions of your content." Totally
> different content, no.

...

Well, semantically an edit page and action=view page are not totally different, for sure. Both of these will contain very similar information. But I cannot go against standards, that's impossible. That's something like law, you don't always like it, but you have to obey it.

...

> > Anyway, it seems that Yandex crawler doesn't like the meta noindex
> > rules in the header of the page, giving an error (warning) message in
> > the stats of their webmaster tools.
>
> What does the warning say? Ideally, of course, you should ban them in
> robots.txt, so the search engine doesn't have to bother fetching the
> URL.

...

I've banned them in robots.txt. It produces the warning due to non-existing titles, which also have meta noindex. There are some links from foreign sites to non-existing titles, which I obviously cannot disable, something like "http://mywiki.org/wiki/nonexsitingtitle". Yandex gives the warning "Document contains meta-tag noindex" (approximately translated from Russian). There are lots of such warnings. It's a bit strange that this is a warning at all; Google doesn't give such a warning.

> The purpose is to tell search engines which URL you'd prefer them to
> present to users, if the same content is being served under multiple
> URLs. It is not meant to artificially inflate rankings by counting
> unindexed pages as contributing to some entirely different page of
> your choosing, and using it that way won't actually work. Since
> search engines were already using heuristics to identify duplicate
> content, and might well continue to use those exact same heuristics to
> validate rel=canonical, it might not improve rankings at all.

...

I am not so sure that such inflation is artificial. An artificial one would be when the article/revision is not the same, or even when mixing MediaWiki-generated HTML and other HTML. But anyway, I cannot change how the search engines will interpret it.

Dmitriy

wikitech-l@lists.wikimedia.org

participants (7)

Angela
Aryeh Gregor
Brion Vibber
Dmitriy Sintsov
Gerard Meijssen
Petr Kadlec
Platonides