[Wikimedia-l] Wikipedia Zero in Google search result

Adam Baso abaso at wikimedia.org
Wed Jun 26 17:59:25 UTC 2013


(cross-posted on mobile-l)

Okay, looks like the index of zero.wikipedia.org pages in Google has shrunk
by some 20 million entries. Nonetheless, a number of really old pages
(e.g., going back to 6-May-2013) are still in the Google index with article
text. I'll set a reminder to check on the Google index again in 30 days,
and hopefully we can finally put the no-index rules in place then.

The good news is that many of the pages are now correctly suppressed in
natural search as non-canonical pages. In other words, a user would need to
go through omitted results or do a site:<domain> search to see them.
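
For readers unfamiliar with the two mechanisms discussed here, the following is a minimal sketch of the difference between a canonical hint and a no-index rule. It is not the code in the Gerrit changes referenced in this thread. The function names canonical_url and head_tags are hypothetical, and the use of a robots meta tag (rather than, say, robots.txt or an HTTP header) is an assumption made only for illustration.

```python
from html import escape
from typing import List
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Map a <language>.zero.wikipedia.org URL to <language>.wikipedia.org.

    Hypothetical helper for illustration. URLs on any other host are
    returned unchanged. Any explicit port in the URL is dropped.
    """
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    # Expect exactly four labels: <language>, "zero", "wikipedia", "org".
    if len(labels) == 4 and labels[1:] == ["zero", "wikipedia", "org"]:
        host = ".".join([labels[0]] + labels[2:])
        return urlunsplit(
            (parts.scheme, host, parts.path, parts.query, parts.fragment)
        )
    return url


def head_tags(url: str, noindex: bool = False) -> List[str]:
    """Build the <head> tags a zero page could emit (assumed approach)."""
    # A canonical link asks crawlers to treat this page as a duplicate
    # of the canonical URL; the page may still appear in the index.
    tags = ['<link rel="canonical" href="%s">' % escape(canonical_url(url))]
    if noindex:
        # A robots meta tag asks crawlers not to index the page at all.
        tags.append('<meta name="robots" content="noindex">')
    return tags
```

Calling head_tags("http://en.zero.wikipedia.org/wiki/Muse_%28band%29") yields only the canonical link, which corresponds to pages being treated as duplicates; passing noindex=True adds the stronger rule that asks for removal from the index.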

-Adam


On Tue, Jun 18, 2013 at 3:35 PM, Adam Baso <abaso at wikimedia.org> wrote:

> Update:
>
> We've added an enhancement to Wikipedia Zero so that if a user who isn't
> on a participating carrier network navigates to a Wikipedia Zero page on
> <language>.zero.wikipedia.org, such as
> http://en.zero.wikipedia.org/wiki/Muse_%28band%29 , the user will be
> presented with an option to visit the canonical URL of the article. If clicked,
> the canonical URL should get the user to the mobile or desktop version of
> the page, based on device type.
>
> We're hoping that by next week the Google index will be refreshed so as to
> correctly mark the <language>.zero.wikipedia.org pages as duplicate pages
> in the omitted section. Upon confirmation of as much, the current plan is
> to introduce https://gerrit.wikimedia.org/r/#/c/69420/ to prevent
> indexing of <language>.zero.wikipedia.org altogether.
>
>
> On Tue, May 28, 2013 at 6:26 PM, Adam Baso <abaso at wikimedia.org> wrote:
>
>> All,
>>
>> My mistake. The pages in Google's index that I used for sampling - the
>> ones that have "Sorry, ..." in their description in Google search results -
>> are cached pages. I assumed incorrectly that those pages were based on
>> recent indexing (e.g., in the past few days).
>>
>> I think we can actually stick to the original plan of Google re-indexing
>> and the search results de-emphasizing the <language>.zero.wikipedia.org
>> links within the next 30 days.
>>
>> I still find it strange that there are <language>.zero.wikipedia.org
>> links that turned up higher in the search engine rankings than their
>> better-established <language>.wikipedia.org counterparts. But I suppose
>> with fewer competing page elements, especially on long-tail articles with
>> fewer or no direct links to the desktop page, this is maybe not totally
>> unexpected.
>>
>> -Adam
>>
>>
>>
>>
>> On Tue, May 28, 2013 at 1:49 PM, Adam Baso <abaso at wikimedia.org> wrote:
>>
>>> Hello All,
>>>
>>> We had shelved my patch, patch 64629 <https://gerrit.wikimedia.org/r/64629>,
>>> in hopes that an earlier patch, patch 61809 <https://gerrit.wikimedia.org/r/61809>
>>> (bug 35233 <https://bugzilla.wikimedia.org/show_bug.cgi?id=35233>), would
>>> resolve the issue naturally as Google re-indexed. But it appears Google has
>>> re-indexed and yet the .zero.wikipedia.org URLs are still present in
>>> Google's index, instead of the <language>.wikipedia.org URLs.
>>>
>>> I have thus resubmitted patch 64629 <https://gerrit.wikimedia.org/r/64629> for
>>> re-review. We will need to further discuss whether it is appropriate to
>>> have Google completely remove .zero.wikipedia.org links from their
>>> cache, or if perhaps we need to open a support thread with Google about
>>> canonical URLs.
>>>
>>>
>>>
>>>
>>> On Tue, May 28, 2013 at 1:13 PM, Kul Wadhwa <kwadhwa at wikimedia.org> wrote:
>>>
>>>> Adam Baso (copied on this email) is working on it and a fix is ready.
>>>> He'll do some testing to make sure it's resolved.
>>>>
>>>> On Tue, May 28, 2013 at 10:22 AM, Tomasz Finc <tfinc at wikimedia.org> wrote:
>>>>
>>>>> Looping Dan Foy in who's managing the Zero backlog.
>>>>>
>>>>> On Mon, May 27, 2013 at 8:01 AM, MZMcBride <z at mzmcbride.com> wrote:
>>>>> > K. Peachey wrote:
>>>>> >>Can you please file this in bugzilla <https://bugzilla.wikimedia.org>?
>>>>> >
>>>>> > https://bugzilla.wikimedia.org/show_bug.cgi?id=48856
>>>>> >
>>>>> >
>>>>> > MZMcBride
>>>>> >
>>>>> >
>>>>> >
>>>>> > _______________________________________________
>>>>> > Wikimedia-l mailing list
>>>>> > Wikimedia-l at lists.wikimedia.org
>>>>> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Kul Wadhwa
>>>> Head of Mobile
>>>> Wikimedia Foundation
>>>>
>>>
>>>
>>
>

