Hi folks,
MediaWiki emits the hreflang attribute on language links, but only as part of the links in the body, and not in the <head> as recommended by Google [1]. The result of this is that Google (and possibly other search engines) doesn't interpret the hreflang attribute for purposes of prioritizing search results in the user's own language.
We asked a contact at Google about this, who replied: "we currently don't use those annotations on the links, we need to see the hreflang link-elements in the head in order to understand that connection. The important parts there are that we need to have them in the "head", we need to have them confirmed from the other versions (so DE needs to refer to EN, and EN to DE -- it can't be one-sided), and it needs to be between the canonical URLs. (...) I imagine if you just added the cross-links as you have them in the sidebar as link-elements to the head then you'd be covered."
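To make the three requirements in that reply concrete (elements in the head, confirmed from both sides, pointing at canonical URLs), here is a minimal Python sketch. It is not MediaWiki code: the `VERSIONS` mapping, the example.org URLs, and the function names are all made up for illustration, and it mirrors the sidebar by leaving out the page's own language, which is an assumption.

```python
from html import escape

# Hypothetical data: language code -> canonical URL of the same article.
VERSIONS = {
    "en": "https://en.example.org/wiki/Edison",
    "de": "https://de.example.org/wiki/Edison",
    "id": "https://id.example.org/wiki/Edison",
}


def linked_langs(versions, current_lang):
    """Languages the page in current_lang points to (every other version)."""
    return {lang for lang in versions if lang != current_lang}


def head_links(versions, current_lang):
    """Build <link rel="alternate" hreflang=...> elements for one page's <head>."""
    return [
        '<link rel="alternate" hreflang="%s" href="%s" />'
        % (escape(lang, quote=True), escape(versions[lang], quote=True))
        for lang in sorted(linked_langs(versions, current_lang))
    ]


def is_reciprocal(targets_by_lang):
    """True if, whenever page A lists language B, page B also lists language A."""
    return all(
        src in targets_by_lang.get(dst, set())
        for src, targets in targets_by_lang.items()
        for dst in targets
    )


if __name__ == "__main__":
    for line in head_links(VERSIONS, "en"):
        print(line)
```

Because every page is generated from the same mapping, the cross-links come out two-sided by construction; `is_reciprocal` is only there to show what "it can't be one-sided" would mean as a check.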
This of course would add some additional payload to pages with lots of language links, but could help avoid results like [2] where the English language version of an article is #1 and the Indonesian one makes no appearance at all. Results vary greatly and it's hard to say how big a problem this is, but even if it boosts discoverability of content in the user's language by only 10% or so, that would still be a pretty big win for local content.
I'm curious if folks see any downside, other than the additional page payload, in adding this information to the page header. Given the time it takes for the index to be updated, we should be careful about any potential negative consequences.
Thanks, Erik
[1] https://support.google.com/webmasters/answer/189077?hl=en
[2] https://www.google.co.id/?gws_rd=ssl#q=edison
On 2015-02-09 2:16 AM, Erik Moeller wrote:
> This of course would add some additional payload to pages with lots of language links, but could help avoid results like [2] where the English language version of an article is #1 and the Indonesian one makes no appearance at all. Results vary greatly and it's hard to say how big a problem this is, but even if it boosts discoverability of content in the user's language by only 10% or so, that would still be a pretty big win for local content.
I think we used to have <link>s in the head (though the exact implementation may have been wrong); if it wasn't a problem then, it wouldn't be one now.
Also, if it really is a problem for Wikipedia, Google supports getting this data from the sitemap.
https://support.google.com/webmasters/answer/2620865?hl=en
So if it's a problem we could upgrade the sitemap generator to include this data (maybe as an extra parameter) and we could have a config setting that disables the <head> output.
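As a rough illustration of what the sitemap route could look like, here is a Python sketch that writes the language alternates as `xhtml:link` children of each `<url>` entry, which is the shape Google's sitemap documentation describes. It is not the MediaWiki sitemap generator: `build_sitemap` and its input format are hypothetical, and listing every version (including the page itself) under each entry is an assumption based on that documentation.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_sitemap(groups):
    """Return sitemap XML for a list of {language code: canonical URL} dicts.

    Each dict describes one article in all its language versions
    (hypothetical input format, chosen for this sketch).
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for versions in groups:
        for _, loc in sorted(versions.items()):
            url = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
            ET.SubElement(url, "{%s}loc" % SITEMAP_NS).text = loc
            # Every entry repeats the full set of alternates, so the
            # references are two-sided by construction.
            for lang, href in sorted(versions.items()):
                ET.SubElement(
                    url,
                    "{%s}link" % XHTML_NS,
                    {"rel": "alternate", "hreflang": lang, "href": href},
                )
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([{
        "en": "https://en.example.org/wiki/Edison",
        "id": "https://id.example.org/wiki/Edison",
    }]))
```

The trade-off is the one described above: the per-page HTML stays small, and the cost moves into a file that only crawlers fetch.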
That should work out pretty well. Small wiki setups don't have many language links so <link>s in the <head> aren't an issue. And if a wiki setup is big enough to have an issue, then it's probably big enough to be using sitemaps.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]
On Feb 9, 2015 7:07 AM, "Daniel Friesen" <daniel@nadir-seen-fire.com> wrote:
> On 2015-02-09 2:16 AM, Erik Moeller wrote:
>> This of course would add some additional payload to pages with lots of language links, but could help avoid results like [2] where the English language version of an article is #1 and the Indonesian one makes no appearance at all. Results vary greatly and it's hard to say how big a problem this is, but even if it boosts discoverability of content in the user's language by only 10% or so, that would still be a pretty big win for local content.
> I think we used to have <link>s in the head (though the exact implementation may have been wrong); if it wasn't a problem then, it wouldn't be one now.
I think we still do for lang converter.
--bawolff
This appears to be tracked at https://phabricator.wikimedia.org/T3433. It was reverted in 2009 for (alleged) l10n issues, which should be solvable. FWIW, we just saved 4 KB per page with https://gerrit.wikimedia.org/r/#/c/177501/
Nemo
wikitech-l@lists.wikimedia.org