On 7/23/08, Stephen Bain stephen.bain@gmail.com wrote:
On Wed, Jul 23, 2008 at 10:47 AM, Newyorkbrad (Wikipedia) newyorkbrad@gmail.com wrote:
A couple of months ago, I raised on this list the issue of "no-indexing" Wikipedia pages outside the mainspace, principally including project-space pages such as XfDs, AN/ANI, RfA's, RfAr's, and the like, but possibly including userspace as well. By no-indexing, I refer to coding these pages such that they will not be picked up by Google or other search engines.
Note that much of this is already done; see our robots file:
http://en.wikipedia.org/robots.txt
Currently all AFD, RFA, RFC and RFAR subpages (but not the main AFD page, the main RFA page, etc.) are blocked from indexing. Of your examples, the admin noticeboard and userspace are probably the main ones that are still indexed and that we might not want indexed.
Note that the robots file can easily be updated by a request on bugzilla [1] if there is consensus for it.
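For anyone who wants to check what a given rule would do before filing a request, here is a minimal sketch using Python's standard urllib.robotparser. The two Disallow lines are illustrative assumptions written for this example, not copied from the live robots file, and the function name is_indexable and the crawler name "ExampleBot" are hypothetical. It shows how a rule ending in a slash blocks subpages while leaving the main page crawlable.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only (an assumption, not the live robots.txt contents).
# The trailing slash means only subpages match, not the main page itself.
ILLUSTRATIVE_RULES = """\
User-agent: *
Disallow: /wiki/Wikipedia:Articles_for_deletion/
Disallow: /wiki/Wikipedia:Requests_for_adminship/
"""


def is_indexable(path, rules=ILLUSTRATIVE_RULES, agent="ExampleBot"):
    """Return True if a crawler obeying `rules` may fetch `path`.

    `agent` is a hypothetical crawler name; it falls through to the
    `User-agent: *` entry because no rule names it specifically.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for page in (
        "/wiki/Wikipedia:Articles_for_deletion",          # main page
        "/wiki/Wikipedia:Articles_for_deletion/Example",  # subpage
        "/wiki/User:Example",                             # userspace
    ):
        print(page, "->", "indexable" if is_indexable(page) else "blocked")
```

The standard-library parser does simple prefix matching, so treat this as a rough check; individual search engines may interpret the file somewhat differently.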
- That Wikipedia currently lacks a top-quality internal search capability, and therefore we need to be able to use external search engines such as Google to perform administrator functions and the like. There is some merit
On this point, there's been great improvement in MediaWiki's search capabilities this year with the MWSearch backend coming online.
[1] Like this request, for example: https://bugzilla.wikimedia.org/show_bug.cgi?id=10288
-- Stephen Bain stephen.bain@gmail.com
Thank you for this update. I think there may have been progress that I have missed in the past couple of months. When I posted on this topic a few months ago, either some of these types of pages were not yet no-indexed, or no one mentioned the fact, or if they did I overlooked it.
Other pages that should be excluded from indexing (if they aren't already) include SSP, RfCU, the old PAIN archives, WQA, and I'm sure people can put together a list of a few more.
As for userspace, I think the optimal solution would be to allow the individual user to opt in or out of indexing, if that is doable without too much fuss. (And indefblocked or banned users would automatically be no-indexed, to give those with identifiable usernames one fewer grievance to pursue after they have left us.) Query whether "in" or "out" would be the better default.
Newyorkbrad