Before we start converting and upgrading things willy-nilly, I wanted to
check on people's (i.e., both the Bomis masters' and the general
community's) preferences about the use of the Wikipedia name...
In the new software, certain classes of page are set off in special
namespaces to distinguish them from actual encyclopedia articles: talk:
for talk pages, special: for interactive functions, user: for users'
personal pages, log: for automatically generated logs, and wikipedia:
for meta-topics that aren't quite meta enough for the separate meta wiki
-- help pages, site news, bug reporting, certain user-involved
maintenance functions, etc.
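(For instance -- example titles only -- a discussion page would live at
talk:France and a personal page at user:Brion, while the plain title
France stays an actual encyclopedia article.)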
For the versions in other languages, at least some of these namespaces
will be translated (particularly talk: and user:). Not difficult
technically, but my question is: what do we do about "wikipedia:"? Both
as the namespace prefix in page titles, and as the giant "WIKIPEDIA"
that appears at the top of every page with the new "Cologne Blue"
interface.
Specific issues:
* The forked Spanish group is also in the process of adopting the new
software, but they're not using the "Wikipedia" name for their project.
It's easy enough to provide an alternative for this and other cases with
a flick of a variable in the code.
* On the Esperanto wikipedia, we often use a regularly Esperantized form
of the name, changing the foreign "w" to "v" and giving it an "o" noun
ending -> "Vikipedio". This is more comfortable to use -- not just
referring to the EO version, but when referring in that language to
Wikipedia generally -- and the 'pedia has already received web, radio,
and magazine publicity under that name.
* There are fledgling wikipedias with between 1 and 17 pages for
languages that natively use a non-Latin writing system: Japanese,
Russian, Arabic, Hebrew, Chinese. For as long as I've been aware of it,
the Japanese stub page has included a katakana-ized form of the name
(roughly UiKIPIDiA) on the main page, with no appearance of the
English/Roman form outside the URL.
For page headers and titles and whatnot, should we:
A) Always use the original English/Roman form "Wikipedia" even if it's
uncomfortably foreign or difficult to type?
B) Always use the familiar/nativized form if there is one, so that users
of each language can comfortably read and type it?
C) Use "Wikipedia" for the big page header (that's brand recognition),
but familiar/nativized form for the namespace (that's going to be used
in the trenches and needs to be easy for users to type in links)?
D) Other?
Personally, I'm leaning towards C.
-- brion vibber (brion @ pobox.com)
On Saturday 18 May 2002 12:01 pm, you wrote:
> Date: Fri, 17 May 2002 15:37:47 -0700
> From: lcrocker(a)nupedia.com
> To: wikipedia-l(a)nupedia.com
> Subject: [Wikipedia-l] Robots and special pages
> Reply-To: wikipedia-l(a)nupedia.com
>
> A discussion just came up on the tech list that deserves input from
> the list at large: how do we want to restrict access (if at all) to
> robots on wikipedia special pages and edit pages and such?
My two cents (well, maybe a bit more):
On talk pages: OPEN to bots
It's A-OK for bots to index talk pages -- these pages often have interesting
discussion that should be on search engines. Of course, if this becomes a
performance issue then we could prevent bots from indexing these.
On wikipedia pages: OPEN to bots
I STRONGLY feel that wikipedia pages should be open to bots -- remember we
are also trying to expand our community here and people do search for those
things on the net.
On user pages: OPEN to bots
I also don't see anything wrong with letting bots crawl all over user pages
-- I occasionally browse the personal home pages of other people who have
similar interests to mine. This project isn't just about the articles; it is
also about community building.
On log, history, print and special pages: CLOSED to bots (closed at least for
indexing -- not sure about allowing the 'follow links' function. Would
closing this allow bots to do their thing faster or slower? Is this at all
important for us to consider? If a bot can index our site fast, will it do it
more often?)
I think that the wikipedia pages are FAR better at naturally explaining what
the project is about than the log, history and special pages are -- those
pages are far too technical and change too quickly to be useful for any
search performed on a search engine. There is also limited utility in having
direct links to the Printable version of articles -- these don't have any
active wiki links in them, which obscures the fact that the page is from a
wiki.
Having history pages in the search results of external search engines is
potentially dangerous, since somebody could easily click into an older
version and save it -- thus reverting the article and unwittingly "earning"
the label of VANDAL (even if they did make a stab at improving the version
they read). Another reason to disallow bot access to history is that there
often is copyrighted material in the history of pages that has since
been removed from the current article version (it would be nice for an admin
to be able to delete just an older version of an article, BTW).
On Edit links: CLOSED to bots (for index and probably follow links)
The edit links REALLY should NOT be allowed to be indexed by any bot: When
somebody searches for something on a search engine, gets a link to our site,
and clicks on it, do we want them to be greeted with an edit window? They
want information -- not an edit window. No wonder we have so many pages with
"Describe the new page here" as their only content.
I've been tracking this for a while, and almost every one of these pages is
created by an IP that never returns to edit again. Many (if
not most) of these "mysteriously" created pages are probably from someone
clicking from a search engine, becoming puzzled by the edit window, and
hitting the save button in frustration. Heck, I think I may have created a
few of these in my pre-wiki days.
This has become a bit of a maintenance issue for the admins -- we can't
delete these pages fast enough, let alone create stubs for them. If left
unchecked, this could reduce the average quality of wikipedia articles and
make people doubt whether an "active" wiki link really has an article
(or even a stub) behind it.
There could, of course, be a purely technical fix for this by having the
software not recognize newly created blank or "Describe the new page here"
pages as being real pages (a Good Idea, BTW). But then we would still have
frustrated people who were looking for actual info and who may in the future
avoid clicking through to our site because of a previous "edit window
experience".
Conclusion:
We should try to put our best foot forward when allowing bots to index the
site and only allow indexing of pages that have information which is
potentially useful for the person searching.
Edit windows and outdated lists are NOT useful to somebody clicking through
for the first time (Recent Changes might be the only exception: even though
any index of it will be outdated, it is centrally important to the project
and fairly self-explanatory). Links to older versions of articles and to
history pages also set up would-be contributors to be labeled as
"vandals" when they try to edit an older version -- thus turning them away
forever.
Let visitors explore a real article first and discover the difference between
an edit window and an actual article -- then they can decide about becoming a
contributor, a visitor, or even a developer for that matter.
maveric149
> One thing the nonprofit organisation could do is buy domains for the
> international wikipedias, at least the active ones. It would be nicer to
> promote.
Well, they've already bought vikipedio.com for us.
Unfortunately it points to eo.wikipedia.com, but none
of the links on there work. It is partly my fault
though because Jimmy asked me to bug him every day
until it was fixed and it seems I've been slacking on
the job. Hey Jimmy, could you look into this? :)
Chuck
=====
Come to my homepage! Venu al mia hejmpagxo!
http://amuzulo.babil.komputilo.org/
====
Come to the free, libre Esperanto online encyclopedia!
http://eo.wikipedia.com/
hi all.
I stumbled upon a new article mentioning our little project:
http://www.computeruser.com/articles/2105,4,33,1,0501,02.html
Overall, the article is very positive and enthusiastic. Funnily enough,
in the final paragraph we read:
'I also strongly advise not clicking on the "All Pages" link in the right-hand column from
Wikipedia's main page, unless you need to leave the computer for about 10 minutes.'
:-)
regards,
[[user:WojPob]]
<wojtek[at]seti23[dot]org>
what burns twice as bright, burns half
as long, and you have burnt so very,
very brightly roy.
> Every now and then I'll find (at Bomis) some jerk who wrote a
> homemade bot that's hammering us.
It doesn't have to be homemade though. If I didn't have a fast
always-on and flat-rate internet connection, the first thing I would do
upon coming across Wikipedia would be to whip out wget and download
the baby, so that I could browse offline. I can't be the only one.
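A completely stock invocation is enough to do it -- something like this,
with the polite flags that a casual downloader probably skips:

    # Mirror the site for offline browsing, rewriting links to work locally.
    # --wait and --limit-rate are the only things keeping this gentle.
    wget --mirror --convert-links --wait=1 --limit-rate=20k http://www.wikipedia.com/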
Axel
>if a spider goes to Recent Changes and then to "Last 5000 changes"
>(and last 90 days, and last 30 days, and last 2500 changes, and last
>1000 changes, and every such combination) it seems to me the server
>load could get pretty high. Perhaps talk pages should be spidered,
>but not recent changes or the history (diff/changes).
I agree. Every RecentChanges page contains links to 13 other
RecentChanges, and one of them changes its URL each time the page is
loaded. The other special: pages like statistics, all pages, most
wanted etc. seem to be good candidates for robot exclusion as well:
they stress the database but don't provide much useful information for
indices.
Regarding talk:, wikipedia: and user: pages, I don't see any reason not
to have them indexed.
Diff pages seem to be useless to spiders since the same information
is contained in the two article versions.
The remaining question is: what about article histories and old versions
of articles? Do we want Google to have a copy of every version of
every article, or only the current one?
Axel
----- Original Message -----
From: "Axel Boldt" <axel(a)uni-paderborn.de>
To: <wikitech-l(a)nupedia.com>
Sent: Friday, May 17, 2002 1:43 AM
Subject: [Wikitech-l] Cause for slowdowns: spiders?
| Right now, I'm seeing nice and fast responses, except every
| once in a while everything slows to a halt. If that's due to our
| script, then there must be some really bad, really rare special
| function somewhere. I doubt that.
|
| Maybe the slowdowns are due to spiders that hit our site and request
| several pages at once, in parallel, like many of these multithreaded
| programs do. I read somewhere that everything2.com for this very
| reason has disallowed spiders completely and doesn't even allow Google
| to index their site anymore.
Robots can be useful, since Google probably contributes a lot to bringing
new visitors to the site. For a community like ours, that is surely a useful
thing. An issue worth considering is banning robots from special pages,
Recent Changes, Talk, User, etc. with an appropriate META tag. IMHO, robots
and spiders should only be allowed to go over the articles.
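For reference, the standard way to do that is a tag like this in the <head>
of each generated page we don't want indexed:

    <meta name="robots" content="noindex,nofollow">

(Or "noindex,follow" if we still want robots to follow the links on those
pages.)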
<snip>
regards,
[[user:WojPob]]
On Sat, 18 May 2002, Karen AKA Kajikit <kaji(a)labyrinth.net.au> wrote:
> The wikipedia is working much better since the work was done on it. I
> seem to be able to do anything I want in the morning and afternoon
> without any problems. But in the evening it slows to a crawl and access
> becomes extremely sporadic, verging on impossible. Might that be the time
> when Google and other search engines set their automatic bots loose to
> trawl for new articles? Or maybe it's just due to general access
> demands.
>
> I've got a feeling it's related to timezones. I'm in Australia, on the
> opposite side of the world to most Wikipedia contributors...
>
> --
>
> Karen AKA Kajikit
General access demands. Certainly crawlers don't send all their bots at
the same time - bots are designed to slow sites down as little as
possible.
It could be Wikipedia, or your ISP or something in between. When my aunt
lived in Spain she would coordinate her surfing time with low-usage times
in the US.
Ian Monroe
http://ian.webhop.org
There are a whole bunch of pages that were created as protected pages
when they were meant to be redirects. You can fix the redirect at this
point by deleting the surplus text, but that leaves the protected page
sitting there by itself... these pages apparently cannot be linked to
or accessed once the false redirect is gone - I tried adding them to the
'to be deleted' list and the link doesn't transfer. So what's to be done
about them? Should I stop correcting the redirects? Or doesn't the empty
page matter once it's out of the linking system?
--
Karen AKA Kajikit
Come and visit my part of the web:
Kajikit's Corner: http://Kajikit.netfirms.com/
Aussie Support Mailing List: http://groups.yahoo.com/group/AussieSupport
Allergyfree Eating Recipe Swap:
http://groups.yahoo.com/group/Allergyfree_Eating
Love and huggles to all!