On Wed, Jul 23, 2008 at 10:47 AM, Newyorkbrad (Wikipedia)
<newyorkbrad(a)gmail.com> wrote:
A couple of months ago, I raised on this list the issue of "no-indexing"
Wikipedia pages outside the mainspace, principally including project-space
pages such as XfDs, AN/ANI, RfA's, RfAr's, and the like, but possibly
including userspace as well. By no-indexing, I refer to coding these pages
such that they will not be picked up by Google or other search engines.
Note that much of this is already done; see our robots file:
http://en.wikipedia.org/robots.txt
Currently all AFD, RFA, RFC and RFAR subpages (but not the main AFD
page, the main RFA page, etc.) are blocked from indexing. Of your
examples, the admin noticeboard and userspace are probably the main
ones that are still indexed and that we might not want indexed.
Note that the robots file can easily be updated by a request on
bugzilla [1] if there is consensus for it.
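As a rough illustration of why the subpages are blocked while the main
pages stay indexed, here is a minimal Python sketch using the standard
library's robots.txt parser. The two Disallow lines are illustrative
assumptions written for this example, not copied from the live robots
file, and is_indexable is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only (an assumption for this sketch, not the live file).
# The trailing slash means only subpages match the prefix.
ILLUSTRATIVE_RULES = """\
User-agent: *
Disallow: /wiki/Wikipedia:Articles_for_deletion/
Disallow: /wiki/Wikipedia:Requests_for_adminship/
"""


def is_indexable(path, rules=ILLUSTRATIVE_RULES, agent="*"):
    """Return True if a crawler obeying `rules` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in (
        "/wiki/Wikipedia:Articles_for_deletion",          # main page
        "/wiki/Wikipedia:Articles_for_deletion/Example",  # subpage
        "/wiki/User:Example",                             # userspace
    ):
        print(path, is_indexable(path))
```

Because the rules are plain prefix matches ending in a slash, the main
AFD page does not match and remains indexable, and userspace is
untouched unless a rule for it is added.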
- That Wikipedia currently lacks a top-quality internal search capability,
and therefore we need to be able to use external search engines such as
Google to perform administrator functions and the like. There is some merit
On this point, there's been great improvement in MediaWiki's search
capabilities this year with the MWSearch backend coming online.
----
[1] Like this request, for example:
https://bugzilla.wikimedia.org/show_bug.cgi?id=10288
--
Stephen Bain
stephen.bain(a)gmail.com
Thank you for this update. I think there may have been progress that I have
missed in the past couple of months. When I posted on this topic a few
months ago, either some of these types of pages were not yet no-indexed, or
no one mentioned the fact, or if they did I overlooked it.
Other pages that should be excluded from indexing (if they aren't already)
include SSP, RfCU, the old PAIN archives, WQA, and I'm sure people can put
together a list of a few more.
As for userspace, I think the optimal solution would be to allow the
individual user to opt in or out of indexing, if that is doable without too
much fuss. (And indefblocked or banned users would automatically be
no-indexed, to give those with identifiable usernames one fewer grievance to
pursue after they have left us.) Query whether "in" or "out" would be the
better default.
Newyorkbrad