On Saturday 18 May 2002 12:01 pm, you wrote:
Message: 6
Date: Fri, 17 May 2002 15:37:47 -0700
From: lcrocker@nupedia.com
To: wikipedia-l@nupedia.com
Subject: [Wikipedia-l] Robots and special pages
Reply-To: wikipedia-l@nupedia.com
A discussion just came up on the tech list that deserves input from the list at large: how do we want to restrict access (if at all) to robots on wikipedia special pages and edit pages and such?
My two cents (well maybe a bit more),
On talk pages: OPEN to bots
It's A-OK for bots to index talk pages -- these pages often have interesting discussion that should be findable through search engines. Of course, if this becomes a performance issue, then we could prevent bots from indexing them.
On wikipedia pages: OPEN to bots
I STRONGLY feel that wikipedia pages should be open to bots -- remember, we are also trying to expand our community here, and people do search for these things on the net.
On user pages: OPEN to bots
I also don't see anything wrong with letting bots crawl all over user pages -- I occasionally browse the personal home pages of other people who have interests similar to mine. This project isn't just about the articles; it is also about community building.
On log, history, print and special pages: CLOSED to bots (closed at least for indexing -- I'm not sure about allowing the 'follow links' function; a sketch of the two meta-tag options follows below. Would closing this make crawling faster or slower? Is that at all important for us to consider? If a bot can index our site quickly, will it do it more often?)
I think that the wikipedia pages are FAR better at naturally explaining what the project is about than the log, history and special pages are -- those pages are far too technical and change too quickly to be useful for any search performed on a search engine. There is also limited utility in having direct links to the Printable version of articles -- these don't have any active wiki links in them, which obscures the fact that the page is from a wiki.
Having history pages in the search results of external search engines is potentially dangerous, since somebody could easily click into an older version and save it -- thus reverting the article and unwittingly "earning" the label of VANDAL (even if they did make a stab at improving the version they read). Another reason to deny bots access to history is that there is often copyrighted material in the history of pages that has since been removed from the current article version (it would be nice for an admin to be able to delete just an older version of an article, BTW).
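To make the indexing vs. "follow links" distinction concrete, here is a minimal sketch using the standard robots meta tags -- purely illustrative, not something our software emits today. The first keeps a page (say, a history page) out of a search engine's index while still letting the robot follow links out of it; the second keeps the page out of the index and tells the robot not to follow its links either:

    <!-- page is not indexed, but links on it are still followed -->
    <meta name="robots" content="noindex,follow">

    <!-- page is not indexed and links on it are not followed -->
    <meta name="robots" content="noindex,nofollow">

As far as I understand the protocol, 'nofollow' mainly affects whether robots discover other pages through these ones, not how fast they crawl what they do index.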
On Edit links: CLOSED to bots (for indexing and probably for following links)
The edit links REALLY should NOT be allowed to be indexed by any bot: when somebody searches for something on a search engine, gets a link to our site, and clicks on it, do we want them to be greeted with an edit window? They want information -- not an edit window. No wonder we have so many pages whose only content is "Describe the new page here".
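If the edit, history and special screens are reachable under their own URL prefixes, the simplest fix would be a few robots.txt rules. This is only a sketch under that assumption -- the paths below are placeholders, not our actual URLs -- and since standard robots.txt Disallow lines only match URL prefixes, if edit links are just query strings tacked onto normal article URLs we would need the per-page meta tags above instead:

    User-agent: *
    Disallow: /edit
    Disallow: /history
    Disallow: /special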
I've been tracking this for a while, and almost every one of these pages is created by an IP that never returns to edit again. Many (if not most) of these "mysteriously" created pages are probably from someone clicking through from a search engine, becoming puzzled by the edit window, and hitting the save button in frustration. Heck, I think I may have created a few of these in my pre-wiki days.
This has become a bit of a maintenance issue for the admins -- we can't delete these pages fast enough, let alone create stubs for them. If left unchecked, this could reduce the average quality of wikipedia articles and make people doubt whether an "active" wiki link really has an article (or even a stub) behind it.
There could, of course, be a purely technical fix for this: have the software not recognize newly created blank or "Describe the new page here" pages as real pages (a Good Idea, BTW). But then we would still have frustrated people who were looking for actual info and who may avoid clicking through to our site in the future because of a previous "edit window experience".
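For what it's worth, the check I have in mind is tiny. A rough sketch in Python -- purely illustrative; the function name and exact placeholder string are my assumptions, and the real software would apply something like this wherever it decides whether a wiki link is "active":

    DEFAULT_TEXT = "Describe the new page here"

    def is_real_page(text):
        # Treat a page as nonexistent if it is empty, only whitespace,
        # or still nothing but the default placeholder text.
        if text is None:
            return False
        stripped = text.strip()
        return stripped != "" and stripped != DEFAULT_TEXT

A wiki link would then only render as an "active" link when is_real_page() is true for its target; otherwise it would render as a create-this-page link, and the accidental saves would stop showing up as articles.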
Conclusion:
We should try to put our best foot forward when allowing bots to index the site, and only allow indexing of pages whose information is potentially useful to the person searching.
Edit windows and outdated lists are NOT useful to somebody clicking through for the first time (Recent Changes might be the only exception: even though any index of it will be outdated, it is centrally important to the project and fairly self-explanatory). Links to older versions of articles and to history pages also set up would-be contributors to be labeled as "vandals" when they try to edit an older version -- thus turning them away forever.
Let visitors explore a real article first and discover the difference between an edit window and an actual article -- then they can decide about becoming a contributor, a regular visitor, or even a developer for that matter.
maveric149