Hi guys,
Today I noticed that a page on zero (http://nl.zero.wikipedia.org/wiki/Chuck_Deely) showed up in Google search results. I took a look at http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139066 and the page does include <link href="http://nl.wikipedia.org/wiki/Chuck_Deely" rel="canonical" />.
Any idea what's going wrong here?
Maarten
CC'ing Dan Foy to investigate. --tomasz
On Mon, May 20, 2013 at 3:35 AM, Maarten Dammers maarten@mdammers.nl wrote:
Hi guys,
Today I noticed that a page on zero (http://nl.zero.wikipedia.org/wiki/Chuck_Deely) showed up in Google search results. I took a look at http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139066 and the page does include <link href="http://nl.wikipedia.org/wiki/Chuck_Deely" rel="canonical" />.
Any idea what's going wrong here?
Maarten
Mobile-l mailing list Mobile-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l
Thanks - we'll look into what's going on.
- Dan
On Mon, May 20, 2013 at 11:02 AM, Tomasz Finc tfinc@wikimedia.org wrote:
CC'ing Dan Foy to investigate. --tomasz
There was a recent bug that might have caused this: https://bugzilla.wikimedia.org/show_bug.cgi?id=35233
The page in cache probably hasn't been updated. If possible, can you look at the cached view and check whether the canonical link tag is present?
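The check asked for above can also be done programmatically on a saved copy of the page. The following is a minimal sketch using only Python's standard library; the class and function names are illustrative and are not part of any Wikimedia tooling, and the sample HTML simply reuses the tag quoted in the original report.

```python
from html.parser import HTMLParser


class CanonicalLinkFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.canonical_urls = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> also reach this handler.
        if tag != "link":
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonical_urls.append(attr_map["href"])


def find_canonical(html_text):
    """Return a list of canonical URLs declared in the given HTML text."""
    finder = CanonicalLinkFinder()
    finder.feed(html_text)
    return finder.canonical_urls


# Sample markup built from the tag quoted in the report.
sample = (
    "<html><head>"
    '<link href="http://nl.wikipedia.org/wiki/Chuck_Deely" rel="canonical" />'
    "</head><body></body></html>"
)
print(find_canonical(sample))
```

An empty list for the cached copy would suggest that the crawler saw the page before the canonical tag was added.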
Kasper, we're looking at a patch for this. Thanks!
On Mon, May 20, 2013 at 1:02 PM, Kasper Souren kasper@guaka.org wrote:
"Disallow: /" in http://*.zero.wikipedia.org/robots.txt would suffice.
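As a rough illustration of the suggestion above, the sketch below feeds a two-line robots.txt to Python's standard `urllib.robotparser` and asks whether a crawler may fetch a zero page. The `User-agent: *` line and the helper name `is_allowed` are assumptions added for the example; the thread only mentions the `Disallow: /` rule. Note that this checks crawl permission only and says nothing about when already-indexed pages drop out of a search index.

```python
from urllib.robotparser import RobotFileParser

# Assumed file contents: the thread only quotes the "Disallow: /" line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "http://nl.zero.wikipedia.org/wiki/Chuck_Deely"
print(is_allowed(ROBOTS_TXT, "Googlebot", url))
```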
Kasper, after further analysis, it appears the patch introduced last week should clear this up within about 35 days. I'll set a reminder to validate as much.
On Mon, May 20, 2013 at 2:29 PM, Adam Baso abaso@wikimedia.org wrote:
Kasper, we're looking at a patch for this. Thanks!
Update:
We've added an enhancement to Wikipedia Zero so that if a user who isn't on a participating carrier network navigates to a Wikipedia Zero page on <language>.zero.wikipedia.org, such as http://en.zero.wikipedia.org/wiki/Muse_%28band%29 , the user will be presented with an option to visit the canonical URL of the article. If clicked, the canonical URL should take the user to the mobile or desktop version of the page, based on device type.
We're hoping that by next week the Google index will be refreshed so as to correctly mark the <language>.zero.wikipedia.org pages as duplicate pages in the omitted section. Upon confirmation of as much, the current plan is to introduce https://gerrit.wikimedia.org/r/#/c/69420/ to prevent indexing of <language>.zero.wikipedia.org altogether.
On Tue, May 21, 2013 at 11:26 AM, Adam Baso abaso@wikimedia.org wrote:
Kasper, after further analysis, it appears the patch introduced last week should clear this up within about 35 days. I'll set a reminder to validate as much.
Okay, it looks like the index of zero.wikipedia.org pages in Google has shrunk by some 20 million entries. Nonetheless, a number of really old pages (e.g., going back to 6-May-2013) are still in the Google index with article text. I'll set a reminder to check on the Google index again in 30 days; hopefully we can finally put the no-index rules in place at that time.
The good news is that many of the pages are now correctly suppressed in natural search as non-canonical pages. In other words, a user would need to go through omitted results or do a site:<domain> search to see them.
-Adam
On Tue, Jun 18, 2013 at 3:33 PM, Adam Baso abaso@wikimedia.org wrote:
Update:
We've added an enhancement to Wikipedia Zero so that if a user who isn't on a participating carrier network navigates to a Wikipedia Zero page on <language>.zero.wikipedia.org, such as http://en.zero.wikipedia.org/wiki/Muse_%28band%29 , the user will be presented with an option to visit the canonical URL of the article. If clicked, the canonical URL should take the user to the mobile or desktop version of the page, based on device type.
We're hoping that by next week the Google index will be refreshed so as to correctly mark the <language>.zero.wikipedia.org pages as duplicate pages in the omitted section. Upon confirmation of as much, the current plan is to introduce https://gerrit.wikimedia.org/r/#/c/69420/ to prevent indexing of <language>.zero.wikipedia.org altogether.
Update:
I have been checking on the indexed link count over the last couple of months, and it has been roughly constant. Upon another check in the past week, it looked like it was time to go ahead with the robots.txt update.
Just yesterday, the start of the robots.txt entry for <lang>.zero.wikipedia.org was also updated to instruct all robots, such as Googlebot, not to index <lang>.zero.wikipedia.org. It looks like even more <lang>.zero.wikipedia.org pages may already be starting to fall out of the index.
Thanks for flagging this! Will keep watching the indexed links count as it dwindles.
Thanks again. -Adam
On Wed, Jun 26, 2013 at 10:57 AM, Adam Baso abaso@wikimedia.org wrote:
Okay, it looks like the index of zero.wikipedia.org pages in Google has shrunk by some 20 million entries. Nonetheless, a number of really old pages (e.g., going back to 6-May-2013) are still in the Google index with article text. I'll set a reminder to check on the Google index again in 30 days; hopefully we can finally put the no-index rules in place at that time.
The good news is that many of the pages are now correctly suppressed in natural search as non-canonical pages. In other words, a user would need to go through omitted results or do a site:<domain> search to see them.
-Adam