[WikiEN-l] No-indexing of project-space pages

Charlotte Webb charlottethewebb at gmail.com
Wed Jul 23 16:09:12 UTC 2008


On 7/23/08, Joe Szilagyi <szilagyi at gmail.com> wrote:
> What is the benefit to allowing Google to index DRV, talk pages, and
> user/user talk pages? Aside from the Mediawiki native search function not
> being always that great, the only negative to blocking or restricting Search
> Engines to just cover strictly Article space would be a possible loss of
> Google Juice, which should not be a concern.

As far as I'm concerned, Google juice (i.e. PageRank and whatnot) can
go jump in the lake.

1. Build a search engine of Google-esque calibre (boolean +A +B -"C D"
etc. to search any and all WMF projects of the user's choosing),

2. Put it on the toolserver,

3. Configure the toolserver's robots.txt to turn Google away, at least
from indexing anything related to the toolserver search engine,

4. Configure all WMF projects' robots.txt to welcome Google indexing
only of main-space, article, portal, etc. "content" pages.

5. (optional, sounds quite tricky) Split the category namespace.
Figure out some way to train Googlebot to:

index content categories like
*Category:English popes
*Category:Bob Dylan songs
*Category:Pacific Ocean

ignore logistical crap like
*Category:Articles with unsourced statements since December 2006
*Category:Unsuccessful requests for adminship
*Category:Suspected Wikipedia sockpuppets of Janis Doe
*Category:Start-Class biography (sports and games) articles
*Category:Sports templates by country
etc. etc.
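
To make the split concrete, here is a rough sketch in Python of a name-based
heuristic. It is only an illustration: the pattern list is an assumption
drawn from the example categories above, not an existing MediaWiki feature,
and it would misfire on any content category whose name happens to match.

```python
import re

# Hypothetical patterns, guessed from the examples above. A real list
# would need to be maintained by hand and would never be complete.
MAINTENANCE_PATTERNS = [
    r"^Articles with ",            # cleanup-tracking categories
    r"requests for adminship",     # project process pages
    r"Wikipedia sockpuppets",      # user-conduct bookkeeping
    r"-Class .* articles$",        # WikiProject assessment categories
    r"\btemplates\b",              # template bookkeeping
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in MAINTENANCE_PATTERNS]


def is_maintenance_category(title):
    """Return True if the category title looks like logistical housekeeping."""
    name = title.strip()
    if name.startswith("Category:"):
        name = name[len("Category:"):]
    return any(rx.search(name) for rx in _COMPILED)


def split_categories(titles):
    """Partition titles into (content, maintenance) lists, preserving order."""
    content, maintenance = [], []
    for t in titles:
        (maintenance if is_maintenance_category(t) else content).append(t)
    return content, maintenance
```

Pattern matching like this is brittle, which is the reason for the question
that follows.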

Could somebody think of a reliable way to do this, short of creating a
separate name-space?
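
For steps 3 and 4, the robots.txt side can be sketched and checked with
Python's standard urllib.robotparser module. The namespace list, the /wiki/
path prefix, and the example URLs below are assumptions for illustration,
not the actual WMF configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical list of non-content namespaces to hide from Googlebot.
# URL forms use underscores in place of spaces.
BLOCKED_NAMESPACES = [
    "Talk", "User", "User_talk",
    "Wikipedia", "Wikipedia_talk",
    "Template", "Template_talk",
]


def build_robots_txt(namespaces, agent="Googlebot", prefix="/wiki/"):
    """Build a robots.txt that disallows each namespace prefix for one agent."""
    lines = ["User-agent: " + agent]
    lines += ["Disallow: {}{}:".format(prefix, ns) for ns in namespaces]
    return "\n".join(lines) + "\n"


def make_parser(robots_txt):
    """Load robots.txt text into a parser without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


ROBOTS_TXT = build_robots_txt(BLOCKED_NAMESPACES)
PARSER = make_parser(ROBOTS_TXT)


def googlebot_may_index(url):
    """True if the generated rules let Googlebot fetch the given URL."""
    return PARSER.can_fetch("Googlebot", url)
```

Anything not listed (articles, portals, categories) stays crawlable under
these rules. Because robots.txt matches on path prefixes, it cannot tell one
category from another unless their names share a prefix, which is why step 5
is the tricky part.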

—C.W.


