[Wikimedia-l] Wikipedia Zero in Google search result

Adam Baso abaso at wikimedia.org
Wed Aug 28 18:45:38 UTC 2013


(cross-posted on mobile-l)

Update:

I have been checking on the indexed link count over the last couple of
months, and it has been roughly constant. Upon another check in the past
week, it looked like it was time to go ahead with the robots.txt update.

Just yesterday, the start of the robots.txt entry for
<lang>.zero.wikipedia.org was also updated to instruct all robots, such as
Googlebot, not to index <lang>.zero.wikipedia.org. It looks like even more
<lang>.zero.wikipedia.org pages may already be starting to fall out of the
index.
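
For anyone who wants to see how a compliant robot would read such a rule,
here is a minimal sketch using Python's standard urllib.robotparser. The
robots.txt text below is an assumption for illustration only (a blanket
"Disallow: /" for all user agents); the actual entry that was deployed is
not quoted in this thread, and the helper name is_crawl_allowed is
hypothetical.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: illustrative robots.txt content, not the deployed entry.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://en.zero.wikipedia.org/wiki/Muse_%28band%29"
    print(is_crawl_allowed(ASSUMED_ROBOTS_TXT, "Googlebot", url))
```

Note that a Disallow rule tells compliant crawlers not to fetch the pages;
already-indexed URLs tend to drop out over time rather than immediately,
which is why the indexed link count is still worth watching.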

Thanks for flagging this! Will keep watching the indexed links count as it
dwindles.

Thanks again.
-Adam


On Wed, Jun 26, 2013 at 10:59 AM, Adam Baso <abaso at wikimedia.org> wrote:

> (cross-posted on mobile-l)
>
> Okay, looks like the index of zero.wikipedia.org pages in Google has
> shrunk by some 20 million entries. Nonetheless, a number of really old
> pages (e.g., going back to 6-May-2013) are still in the Google index with
> article text. I'll set a reminder to check on the Google index again in 30
> days, and hopefully we can finally put the no-index rules in place at
> that time.
>
> The good news is that many of the pages are now correctly suppressed in
> natural search as non-canonical pages. In other words, a user would need to
> go through omitted results or do a site:<domain> search to see them.
>
> -Adam
>
>
> On Tue, Jun 18, 2013 at 3:35 PM, Adam Baso <abaso at wikimedia.org> wrote:
>
>> Update:
>>
>> We've added an enhancement to Wikipedia Zero so that if a user who isn't
>> on a participating carrier network navigates to a Wikipedia Zero page on
>> <language>.zero.wikipedia.org, such as
>> http://en.zero.wikipedia.org/wiki/Muse_%28band%29 , the user will be
>> presented with an option to visit the canonical URL of the article. If
>> clicked, the canonical URL should get the user to the mobile or desktop
>> version of the page, based on device type.
>>
>> We're hoping that by next week the Google index will be refreshed so as
>> to correctly mark the <language>.zero.wikipedia.org pages as duplicate
>> pages in the omitted section. Upon confirmation of as much, the current
>> plan is to introduce https://gerrit.wikimedia.org/r/#/c/69420/ to
>> prevent indexing of <language>.zero.wikipedia.org altogether.
>>
>>
>> On Tue, May 28, 2013 at 6:26 PM, Adam Baso <abaso at wikimedia.org> wrote:
>>
>>> All,
>>>
>>> My mistake. The pages in Google's index that I used for sampling - the
>>> ones that have "Sorry, ..." in their description in Google search results -
>>> are cached pages. I assumed incorrectly that those pages were based on
>>> recent indexing (e.g., in the past few days).
>>>
>>> I think we can actually stick to the original plan of Google re-indexing
>>> and the search results de-emphasizing the <language>.zero.wikipedia.org
>>> links within the next 30 days.
>>>
>>> I still find it strange that there are <language>.zero.wikipedia.org
>>> links that turned up higher in the search engine rankings than their
>>> better-established <language>.wikipedia.org counterparts. But I suppose
>>> with fewer competing page elements, especially on long-tail articles with
>>> fewer or no direct links to the desktop page, this is maybe not totally
>>> unexpected.
>>>
>>> -Adam
>>>
>>>
>>>
>>>
>>> On Tue, May 28, 2013 at 1:49 PM, Adam Baso <abaso at wikimedia.org> wrote:
>>>
>>>> Hello All,
>>>>
>>>> We had shelved my patch, patch 64629 <https://gerrit.wikimedia.org/r/64629>,
>>>> in hopes that an earlier patch, patch 61809 <https://gerrit.wikimedia.org/r/61809>
>>>> (bug 35233 <https://bugzilla.wikimedia.org/show_bug.cgi?id=35233>), would
>>>> resolve the issue naturally as Google re-indexed. But it appears Google has
>>>> re-indexed and yet the .zero.wikipedia.org URLs are still present in
>>>> Google's index, instead of the <language>.wikipedia.org URLs.
>>>>
>>>> I have thus resubmitted patch 64629 <https://gerrit.wikimedia.org/r/64629> for
>>>> re-review. We will need to further discuss whether it is appropriate to
>>>> have Google completely remove .zero.wikipedia.org links from their
>>>> cache, or if perhaps we need to open a support thread with Google about
>>>> canonical URLs.
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, May 28, 2013 at 1:13 PM, Kul Wadhwa <kwadhwa at wikimedia.org> wrote:
>>>>
>>>>> Adam Baso (copied on this email) is working on it and a fix is ready.
>>>>> He'll do some testing to make sure it's resolved.
>>>>>
>>>>> On Tue, May 28, 2013 at 10:22 AM, Tomasz Finc <tfinc at wikimedia.org> wrote:
>>>>>
>>>>>> Looping in Dan Foy, who's managing the Zero backlog.
>>>>>>
>>>>>> On Mon, May 27, 2013 at 8:01 AM, MZMcBride <z at mzmcbride.com> wrote:
>>>>>> > K. Peachey wrote:
>>>>>> >>Can you please file this in bugzilla <https://bugzilla.wikimedia.org>?
>>>>>> >
>>>>>> > https://bugzilla.wikimedia.org/show_bug.cgi?id=48856
>>>>>> >
>>>>>> >
>>>>>> > MZMcBride
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > _______________________________________________
>>>>>> > Wikimedia-l mailing list
>>>>>> > Wikimedia-l at lists.wikimedia.org
>>>>>> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Kul Wadhwa
>>>>> Head of Mobile
>>>>> Wikimedia Foundation
>>>>>
>>>>
>>>>
>>>
>>
>

