On Mon, May 27, 2013 at 9:41 PM, Benjamin Chen bencmqwiki@gmail.com wrote:
Hi,
I noticed that when I search on Google, many Wikipedia results now take the form lang-code.zero.wikipedia.org, perhaps only since a day or two ago.
I'm not sure which pages are indexed this way, but it is a real problem: there is no link on the page that takes you to the standard site (even the notice links to the main page of m.wikipedia.org, not to the corresponding article on m.wikipedia.org).
Regards,
Benjamin Chen / [[User:Bencmq]]
K. Peachey wrote:
Can you please file this in Bugzilla (https://bugzilla.wikimedia.org)?
Thanks.
Wikimedia-l mailing list Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
On Mon, May 27, 2013 at 8:01 AM, MZMcBride z@mzmcbride.com wrote:
> K. Peachey wrote:
> Can you please file this in bugzilla https://bugzilla.wikimedia.org?
https://bugzilla.wikimedia.org/show_bug.cgi?id=48856
MZMcBride
On Tue, May 28, 2013 at 10:22 AM, Tomasz Finc tfinc@wikimedia.org wrote:
Looping in Dan Foy, who's managing the Zero backlog.
On Tue, May 28, 2013 at 1:13 PM, Kul Wadhwa kwadhwa@wikimedia.org wrote:
Adam Baso (copied on this email) is working on it and a fix is ready. He'll do some testing to make sure it's resolved.
-- Kul Wadhwa, Head of Mobile, Wikimedia Foundation
As I mentioned on the bug report [1], I worry more about the fact that the content is not accessible to users on another IP. If a user of Wikipedia Zero shares a link, do we not want that to be accessible to other people? By showing a message as we currently do ("Sorry, zero.wikipedia.org is only supported by select mobile carriers and is not available from your mobile carrier."), are we not denying people knowledge and going against our mission? Should this not redirect, or at least be a message pointing to the actual content?
Also, if we block Google from indexing zero, any links shared on the zero domain will not contribute to the page ranking of that page, which seems a bit dumb...
As I mentioned, the fact that pages are currently in Google results is only a temporary problem due to an upstream bug [2] in the MobileFrontend extension, which should resolve itself as the cache is cleared.
[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=48856#c3
[2] https://bugzilla.wikimedia.org/show_bug.cgi?id=35233
-- Jon Robson http://jonrobson.me.uk @rakugojon
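To illustrate the redirect idea raised in the message above, here is a minimal Python sketch that maps a <lang>.zero.wikipedia.org URL to the corresponding <lang>.wikipedia.org URL. The function name zero_to_standard and the host-rewriting rule are assumptions made for illustration only; nothing in this thread says the Zero or MobileFrontend code works this way.

```python
from urllib.parse import urlsplit, urlunsplit


def zero_to_standard(url: str) -> str:
    """Rewrite <lang>.zero.wikipedia.org/... to <lang>.wikipedia.org/...

    Hypothetical helper: URLs whose host does not match the
    <lang>.zero.wikipedia.org pattern are returned unchanged.
    """
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    # Expect exactly four labels: <lang>, "zero", "wikipedia", "org".
    if len(labels) == 4 and labels[1:] == ["zero", "wikipedia", "org"]:
        host = ".".join([labels[0]] + labels[2:])
        # Path, query string and fragment are carried over untouched.
        return urlunsplit(
            (parts.scheme, host, parts.path, parts.query, parts.fragment)
        )
    return url


if __name__ == "__main__":
    print(zero_to_standard("http://en.zero.wikipedia.org/wiki/Muse_%28band%29"))
```

A server could send such a rewritten URL in a redirect or show it as a link in the "Sorry, ..." notice; which of the two is preferable is exactly the open question in the message above.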
On Tue, May 28, 2013 at 1:49 PM, Adam Baso abaso@wikimedia.org wrote:
Hello All,
We had shelved my patch, patch 64629 (https://gerrit.wikimedia.org/r/64629), in hopes that an earlier patch, patch 61809 (https://gerrit.wikimedia.org/r/61809; bug 35233, https://bugzilla.wikimedia.org/show_bug.cgi?id=35233), would resolve the issue naturally as Google re-indexed. But it appears Google has re-indexed and yet the .zero.wikipedia.org URLs are still present in Google's index, instead of the <language>.wikipedia.org URLs.
I have thus resubmitted patch 64629 (https://gerrit.wikimedia.org/r/64629) for re-review. We will need to further discuss whether it is appropriate to have Google completely remove .zero.wikipedia.org links from their cache, or if perhaps we need to open a support thread with Google about canonical URLs.
On Tue, May 28, 2013 at 1:53 PM, James Alexander jalexander@wikimedia.org wrote:
At some level there seems to have been a change in Google's (or our) settings that is doing this everywhere. I've also been seeing a lot of links indexed and appearing in Google under the primary domain (wikipedia.org/wiki/Boston rather than en.wikipedia.org/wiki/Boston; I've seen it on Wikivoyage as well and I assume on the others). We're probably going to want to figure out what's happening, because at some point 'something' changed; I'm just not sure on which side.
James Alexander Legal and Community Advocacy Wikimedia Foundation (415) 839-6885 x6716 @jamesofur
Mobile-l mailing list Mobile-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l
On Tue, May 28, 2013 at 1:58 PM, James Alexander jalexander@wikimedia.org wrote:
Sorry for the double post. For what it's worth, it does not appear that we have a canonical URL on most pages. Mobile has a canonical pointing to the main site, but neither zero nor the main site has a canonical expressly stated.
James Alexander Legal and Community Advocacy Wikimedia Foundation (415) 839-6885 x6716 @jamesofur
FWIW, adding debug=true on zero domains should show that the canonical URL is present (e.g. http://hak.zero.wikipedia.org/wiki/Th%C3%A8u-Ya%CC%8Dp?debug=true). As stated before, this will fix itself in less than 30 days as the caches update.
As James points out, the main site doesn't use canonical URLs and probably should... but I'd say that's another bug.
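One rough way to check whether a page declares a canonical URL is to look for a <link rel="canonical"> element in its HTML. The sketch below uses only the Python standard library and a hard-coded sample document; the sample HTML and the names CanonicalFinder and find_canonical are hypothetical, and nothing is fetched from the live site.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, so split it.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in html, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Made-up sample markup, not copied from a real page.
SAMPLE = (
    '<html><head><title>Example</title>'
    '<link rel="canonical" href="http://hak.wikipedia.org/wiki/Example">'
    '</head><body></body></html>'
)

if __name__ == "__main__":
    print(find_canonical(SAMPLE))
```

Running the same check against the saved HTML of a zero, mobile and desktop page would show which of the three actually state a canonical URL.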
On Tue, May 28, 2013 at 6:26 PM, Adam Baso abaso@wikimedia.org wrote:
All,
My mistake. The pages in Google's index that I used for sampling - the ones that have "Sorry, ..." in their description in Google search results - are cached pages. I assumed incorrectly that those pages were based on recent indexing (e.g., in the past few days).
I think we can actually stick to the original plan of Google re-indexing and the search results de-emphasizing the <language>.zero.wikipedia.org links within the next 30 days.
I still find it strange that there are <language>.zero.wikipedia.org links that turned up higher in the search engine rankings than their better-established <language>.wikipedia.org counterparts. But I suppose with fewer competing page elements, especially on long-tail articles with fewer or no direct links to the desktop page, this is maybe not totally unexpected.
-Adam
On Tue, Jun 18, 2013 at 3:35 PM, Adam Baso abaso@wikimedia.org wrote:
Update:
We've added an enhancement to Wikipedia Zero so that if a user who isn't on a participating carrier network navigates to a Wikipedia Zero page on <language>.zero.wikipedia.org, such as http://en.zero.wikipedia.org/wiki/Muse_%28band%29, the user will be presented an option to visit the canonical URL of the article. If clicked, the canonical URL should get the user to the mobile or desktop version of the page, based on device type.
We're hoping that by next week the Google index will be refreshed so as to correctly mark the <language>.zero.wikipedia.org pages as duplicate pages in the omitted section. Upon confirmation of as much, the current plan is to introduce https://gerrit.wikimedia.org/r/#/c/69420/ to prevent indexing of <language>.zero.wikipedia.org altogether.
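As a rough illustration of "mobile or desktop version, based on device type", the Python sketch below picks a target host from the User-Agent header. The regular expression, the function name target_host and the use of User-Agent sniffing at all are assumptions made for this example; the thread does not describe how the production setup detects device type.

```python
import re

# Very rough hint list; real device detection is far more involved.
_MOBILE_HINT = re.compile(r"Mobi|Android|iPhone|Opera Mini", re.IGNORECASE)


def target_host(lang: str, user_agent: str) -> str:
    """Return the mobile host for mobile-looking agents, else the desktop host."""
    if _MOBILE_HINT.search(user_agent):
        return f"{lang}.m.wikipedia.org"
    return f"{lang}.wikipedia.org"


if __name__ == "__main__":
    print(target_host("en", "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"))
```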
> On 06/18/2013 06:35 PM, Adam Baso wrote:
> Update:
> We've added an enhancement to Wikipedia Zero so that if a user who isn't on a participating carrier network navigates to a Wikipedia Zero page on <language>.zero.wikipedia.org, such as http://en.zero.wikipedia.org/wiki/Muse_%28band%29, the user will be presented an option to visit the canonical URL of the article. If clicked, the canonical URL should get the user to the mobile or desktop version of the page, based on device type.
That's good to hear. It would be helpful if when visiting on desktop (the original report, https://bugzilla.wikimedia.org/show_bug.cgi?id=48856, is about desktop search), it did not mention "mobile carriers", "data charges", and such. Perhaps it could even redirect silently.
If that's not feasible for now, perhaps the message could be a bit more general so it reads better on desktop.
Matt Flaschen
On Wed, Jun 26, 2013 at 10:59 AM, Adam Baso abaso@wikimedia.org wrote:
(cross-posted on mobile-l)
Okay, looks like the index of zero.wikipedia.org pages in Google has shrunk by some 20 million entries. Nonetheless, a number of really old pages (e.g., going back to 6-May-2013) are still in the Google index with article text. I'll set a reminder to check on the Google index again in 30 days, and hopefully we can finally put the no-index rules in place then.
The good news is that many of the pages are now correctly suppressed in natural search as non-canonical pages. In other words, a user would need to go through omitted results or do a site:<domain> search to see them.
-Adam
(cross-posted on mobile-l)
Update:
I have been checking on the indexed link count over the last couple of months, and it has been roughly constant. Upon another check in the past week, it looked like it was time to go ahead with the robots.txt update.
Just yesterday, the robots.txt entry for <lang>.zero.wikipedia.org was updated to instruct all robots, such as Googlebot, not to index <lang>.zero.wikipedia.org. It looks like even more <lang>.zero.wikipedia.org pages may already be starting to fall out of the index.
Thanks for flagging this! Will keep watching the indexed links count as it dwindles.
Thanks again. -Adam
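For readers who want to see what a blanket robots.txt rule does, here is a small Python sketch using the standard library's urllib.robotparser. The two-line rule set is an assumed example, not the actual file served for <lang>.zero.wikipedia.org, and is_crawl_allowed is a hypothetical helper. Strictly speaking, a Disallow rule asks robots not to crawl; pages dropping out of the index is a downstream effect.

```python
from urllib.robotparser import RobotFileParser

# Assumed example: disallow everything for every robot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawl_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots_txt and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://en.zero.wikipedia.org/wiki/Muse_%28band%29"
    print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", url))
```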