On 7/23/08, Joe Szilagyi szilagyi@gmail.com wrote:
> What is the benefit of allowing Google to index DRV, talk pages, and user/user talk pages? Aside from the MediaWiki native search function not always being that great, the only downside to blocking search engines, or restricting them strictly to article space, would be a possible loss of Google juice, which should not be a concern.
As far as I'm concerned, Google juice (i.e. PageRank and whatnot) can go jump in the lake.
1. Build a search engine of Google-esque calibre (supporting boolean queries such as +A +B -"C D", etc.) that can search any and all WMF projects of the user's choosing,
2. Put it on the toolserver,
3. Configure the toolserver's robots.txt to keep Google out, at least from indexing anything related to the toolserver search engine.
4. Configure all WMF projects' robots.txt to let Google index only main-space (article), portal, and other "content" pages.
5. (optional, sounds quite tricky) Split the category namespace. Figure out some way to train Googlebot to:
   index content categories like
     * Category:English popes
     * Category:Bob Dylan songs
     * Category:Pacific Ocean
   but ignore logistical crap like
     * Category:Articles with unsourced statements since December 2006
     * Category:Unsuccessful requests for adminship
     * Category:Suspected Wikipedia sockpuppets of Janis Doe
     * Category:Start-Class biography (sports and games) articles
     * Category:Sports templates by country
     * etc. etc.
Could somebody think of a reliable way to do this, short of creating a separate namespace?
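One partial approach to steps 3 to 5 can be sketched with Python's standard urllib.robotparser module. It assumes the usual https://en.wikipedia.org/wiki/Title URL layout; both robots.txt files and the toolserver URL below are hypothetical illustrations, not actual WMF configuration. Since article titles carry no namespace prefix, prefix-based robots.txt rules cannot simply "allow only main space"; the sketch instead disallows the non-content namespaces, plus maintenance categories that happen to share a naming prefix.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a WMF wiki (step 4, plus a partial step 5).
# Disallow lines come before the broad Allow: Python's parser applies the
# first matching rule, while Googlebot applies the most specific one, and
# this ordering makes both give the same answer here.
WIKI_ROBOTS = """\
User-agent: Googlebot
Disallow: /wiki/Talk:
Disallow: /wiki/User:
Disallow: /wiki/User_talk:
Disallow: /wiki/Wikipedia:
Disallow: /wiki/Wikipedia_talk:
Disallow: /wiki/Category:Articles_with_
Disallow: /wiki/Category:Unsuccessful_requests_for_adminship
Disallow: /wiki/Category:Suspected_Wikipedia_sockpuppets
Disallow: /wiki/Category:Start-Class_
Allow: /wiki/
"""

# Hypothetical robots.txt for the toolserver search engine (step 3).
TOOLSERVER_ROBOTS = """\
User-agent: Googlebot
Disallow: /
"""


def googlebot_may_fetch(robots_text: str, url: str) -> bool:
    """Return True if the given robots.txt text lets Googlebot fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch("Googlebot", url)


if __name__ == "__main__":
    base = "https://en.wikipedia.org/wiki/"
    for title in [
        "Bob_Dylan",
        "Category:English_popes",
        "Talk:Bob_Dylan",
        "Wikipedia:Deletion_review",
        "Category:Articles_with_unsourced_statements_since_December_2006",
        "Category:Sports_templates_by_country",
    ]:
        print(title, googlebot_may_fetch(WIKI_ROBOTS, base + title))
```

Note that Category:Sports templates by country is still allowed: prefix rules only catch maintenance categories whose names share a recognizable prefix, so this falls short of the reliable split asked about above.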
—C.W.