We just never gave the programmers a good set of requirements for the
On Mon, 25 Feb 2002, Lars Aronsson wrote:
> Larry Sanger wrote:
> > I agree 100% that this is a problem. Last night I deleted several dozen
> > files that someone had overwritten as being (obviously) inappropriate for
> > Wikipedia articles. I was a little concerned from the beginning that
> > having virtually no restrictions on the upload function would have this
> > effect, so it's not too surprising that this is happening.
> From a general, Wiki-philosophical-social aspect, it is interesting
> that the upload function gets abused, while general Wiki pages do not.
Actually, there's a good reason for it: the images aren't obviously linked
to anything in any article. This is an ABSOLUTELY essential piece of
information to have: what articles *use* the image in question? If no
article uses an image after 24 hours, perhaps we should delete the image
(or put it in a queue to be deleted by a human). So, the point is,
without a context, unless some image is at face value obviously worthless
to any Wikipedia article (e.g., porn advertisements), it's difficult for
us to tell whether an image really is appropriate for the 'pedia. It
would even make it easier for us to determine whether an image is
One way around this would be to attach images to unique articles, so that
the uploading of an image would be logged in a particular article's
history. I don't know if I like this suggestion, though; I'm just
throwing it out there for your consideration.
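
The 24-hour rule above could be sketched roughly as follows. This is only an illustration in Python: the `uploads` and `image_links` structures and the function name are assumptions, not anything in the current script.

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(hours=24)  # the 24-hour window suggested above

def images_to_review(uploads, image_links, now):
    """Return names of images that no article uses after the grace period.

    uploads:     dict of image name -> upload time (hypothetical structure)
    image_links: dict of article title -> set of image names it uses
    """
    used = set()
    for names in image_links.values():
        used.update(names)
    return sorted(
        name
        for name, uploaded_at in uploads.items()
        if name not in used and now - uploaded_at > GRACE_PERIOD
    )

now = datetime(2002, 2, 26, 12, 0)
uploads = {
    "map.png": datetime(2002, 2, 24, 9, 0),   # old, but used by an article
    "ad.jpg": datetime(2002, 2, 24, 9, 0),    # old and unused
    "new.png": datetime(2002, 2, 26, 11, 0),  # unused, still inside the window
}
image_links = {"Poland": {"map.png"}}
review_queue = images_to_review(uploads, image_links, now)
print(review_queue)
```

The result is a queue for a human to look at, not an automatic deletion, which matches the more cautious of the two options mentioned.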
Here's another thing we need in that upload form. We should ask people to
choose: (1) I have created this image and release it under the GNU FDL (or
contribute it to Wikipedia); (2) I personally certify that this image is
public domain (if checked, add a text box requiring that a source be
given--a URL or else a book title, say); (3) other? If none are checked,
then the upload form wouldn't accept the image.
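
A minimal sketch of that form check, in Python for illustration. The option names (`own_work_fdl`, `public_domain`, `other`) and the function are made up here, and what "(3) other" should require is left open, as it is above.

```python
# Hypothetical option names for the three choices listed above.
OPTIONS = ("own_work_fdl", "public_domain", "other")

def validate_upload(choice, source=""):
    """Return a list of problems; an empty list means the upload is accepted."""
    problems = []
    if choice not in OPTIONS:
        # Nothing checked (or an unknown value): refuse the upload.
        problems.append("choose a licensing option")
    elif choice == "public_domain" and not source.strip():
        # Option (2) requires a source: a URL or else a book title.
        problems.append("public domain images need a source")
    return problems

print(validate_upload(None))
print(validate_upload("public_domain"))
print(validate_upload("public_domain", "http://example.org/scan.html"))
```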
Under some schemes we might want (1) to require that the uploader identify
which article the image is going to be used in, and (2) to check that the
image title is linked to from that article. But (1) might be done
automatically, I guess...
Doing these things would remove a fair bit of the abuse. It would
certainly make it a lot easier for the community to act as a check on the
> Perhaps the uploads should be visible in the RecentChanges list?
They already are, sort of--but each one individually should be, which
isn't the case now.
> Perhaps there should be a "view other versions" for each upload?
Maybe--would prevent people from uploading porn in place of legit images,
> Perhaps a Wikipage in the upload: namespace for each uploaded object?
I'm reposting this to wikitech-l so that discussion doesn't get lost.
On mer, 2002-02-20 at 15:20, lcrocker(a)nupedia.com wrote:
> You Wrote:
> > Please take a
> > look at the non-English non-ISO-8859-1 wikipedias sometime.
> > Hundreds of pages, with correct charset headers:
> > ISO-8859-2:
> > http://pl.wikipedia.com/
> > UTF-8 with a custom conversion function for certain character
> > sequences:
> > http://eo.wikipedia.com/
> You're right. Last time I looked at these, the test pages I retrieved
> gave 404s, and the 404 page is still served as ISO-8859-1, but the
> headers of contentful pages are indeed as you say: 8859-2 for "pl"
> and UTF-8 for "eo", etc.
> OK, then, I guess we do have to wade into the morass of national
> character sets.
Unless you want to switch to UTF-8, that is a given.
> I have little or no experience using actual foreign-
> made computers; but I /do/ have extensive knowledge about character
> sets and communication protocols, so I'm just trying to make sure we
> don't make the same mistakes hundreds of others have made in the past
> by not getting this stuff right up front, but just diving headlong
> into coding without stepping back a moment to design something that
> will be usable and maintainable in the future.
> The way it is now, for example, we won't be able to cut-and-paste
> between wikis if, say, I wanted to include a quote from some Polish
> leader or something.
Sad but true.
> Maybe that's a reasonable sacrifice for ease of
> editing on those wikis.
Lee, let me put it this way. Imagine, if you will, that history had gone
somewhat differently. Let's say that the first computers had been
developed in a politically free, economically strong, highly
industrialized Russia and the standard computer character set around the
world had been based on the Cyrillic alphabet.
In our hypothetical world, there's a Russian version of what we would
have called Wikipedia. They set up some subsites in other languages, one
of which is English, which uses the Latin alphabet.
Now, you want to add some articles to the English site, but the site
administrators have declared that only the standard cyrillic character
set is to be used, with special markup to allow other characters through
the use of numerical codes. This means:
* Pages display fine for viewing, but when you edit, you see nothing
but numeric escape codes.
* You can't type *a single letter of English text* without using a
special numeric escape code.
* All page titles have to be transliterated into Cyrillic, because the
escape codes aren't allowed in titles.
Now, can you honestly tell me that you expect the average
English-speaking wiki contributor to edit a page that looks something
collaborative project to produce a complete encyclopedia from scratch.
We started in January 2001 and
I can't imagine that you would expect that to be acceptable to anyone
else! You'll notice that the two non-ISO-8859-1-language 'pedias that
have actual content (Polish and Esperanto) both use the Latin alphabet
with a few diacritics. So theoretically, they would be the *most*
amenable to using HTML entities -- you can almost read text in the edit
box that way -- yet users of both wikipedias took the effort to tweak
the program to make their customary character encodings work so that
they could actually find people who would be willing to edit pages.
HTML entities are great for tossing in an occasional foreign letter or
word, but at the user level they are poor for regularly used diacritics
and utterly useless for text in other alphabets.
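
To make the difference concrete, here is a small Python sketch of what an editor would see in the edit box on an ISO-8859-1 wiki if everything outside that character set were stored as numeric entities. The function name is made up for illustration; the sample words are not taken from any actual page.

```python
def as_latin1_with_entities(text):
    """Keep ISO-8859-1 characters, turn everything else into &#NNN; escapes."""
    return text.encode("latin-1", "xmlcharrefreplace").decode("latin-1")

polish = as_latin1_with_entities("Wałęsa")      # Latin alphabet, two diacritics
russian = as_latin1_with_entities("Википедия")  # another alphabet entirely

print(polish)
print(russian)
```

The Polish name stays roughly legible with two escapes in it, while the Cyrillic word becomes nothing but escape codes, one per letter.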
> We could, alternatively, serve UTF-8 on all
> of them, but that would risk breaking older browsers. There are side
> issues of what is stored in the database, and what is allowable in
> titles/URLs, etc.
Another alternative is to use the entities internally in the database,
but work some mojo to make them appear as normal characters in the edit
box. Which means you get zero advantage over simply using the national
character set -- you still have to send a character set header, you have
to know which Unicode characters can be passed through safely and which
need to be escaped, the search engine still breaks words, you still
can't capitalize non-ISO8859-1 titles, you still can't cut-n-paste, etc
etc etc. All of the pain, none of the gain.
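
A rough Python sketch of that "entities in the database, real characters in the edit box" round trip, with one of the problems listed above. The helper names are hypothetical, and this version escapes everything outside ASCII for simplicity.

```python
import html

def for_storage(text):
    """Store everything outside ASCII as numeric entities."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

def for_edit_box(stored):
    """Turn the stored entities back into real characters for editing."""
    return html.unescape(stored)

def capitalize_first(title):
    """Naive first-letter capitalization, as applied to page titles."""
    return title[:1].upper() + title[1:]

stored_title = for_storage("łódź")
round_trip = for_edit_box(stored_title)

print(stored_title)                    # entity-encoded form kept in the database
print(capitalize_first(stored_title))  # unchanged: the first character is "&"
print(capitalize_first(round_trip))    # works only on the decoded text
```

Note that `html.unescape` also decodes things like `&amp;` that a user may have typed literally, which is exactly the "know which characters can be passed through safely" problem.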
> We really need to sit down and spec this out before we get too far
> down the road. That's one reason why I posted the proposed policy on
> foreign characters for the English Wiki; it is explicitly for the
> English one only, but we need something equivalent for the other
> We had a lot of discussion about these topics in the early months of
> the project: I don't want us to ignore everything we learned back
> then just because the folks working on the code now weren't around
> back then.
Indeed. What were the conclusions of these discussions, and the
reasoning behind them?
-- brion vibber (brion @ pobox.com)
While I have to agree that it would be nicer to have the use of colons in
titles, I don't think it would be better to have "content:" preceding
every Wikipedia title. Colonless page titles could be automatically
converted to content: titles (so that [[foo]] would be saved automatically
as [[content:foo]]), but it would make the system more complicated and
more importantly it would make the titles and the URLs uglier.
This is really more of a technical issue than a policy one. I mean, from
a policy standpoint, it's great to have the use of colons in titles. If
there aren't any technical objections, we should do it. But, of course,
there might be technical objections. (Cc'ing wikitech-l.)
> > > Thus instead of the ":" being a reserved character anywhere in a
> > > title, only "user:", "talk:", "wikipedia:" etc. need to be
> > > reserved. Any other uses of colons should be fine. This will let us
> > > have entries for books with standard formatting of the subtitle (e.g.,
> > > "The Muggles: A Tale of Woe") or other natural uses of the colon.
> > There is a simpler solution. The actual contents of Wikipedia get the
> > namespace "content:". If everything is prefixed with a namespace then the
> > first colon is always the end of the namespace.
> That's perfect.
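
For reference, the first proposal quoted above (reserve only the known prefixes, allow colons everywhere else) could look something like this. It is a Python sketch only: the set of reserved names is limited to the three spelled out in the quote, and `split_title` is a made-up helper.

```python
# Only the prefixes named in the proposal; the "etc." is not enumerated here.
RESERVED_NAMESPACES = {"user", "talk", "wikipedia"}

def split_title(title):
    """Split a title into (namespace, name); namespace is "" for articles."""
    prefix, colon, rest = title.partition(":")
    if colon and prefix.strip().lower() in RESERVED_NAMESPACES:
        return prefix.strip().lower(), rest.strip()
    # Any other colon is just part of the title.
    return "", title

print(split_title("talk:Foo"))
print(split_title("The Muggles: A Tale of Woe"))
```

With the "content:" scheme the lookup table goes away, since the first colon always ends the namespace, at the cost of the longer titles and URLs discussed above.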
Dear fellow programmers,
now that the wikipedia software is running (and quite well), thanks to many
volunteers, and work on that software turns from bug-fixing to fine-tuning,
I would like to open another front ;)
For some time now I've been working on a completely rewritten version of the
Nupedia software (http://www.nupedia.com). For those of you who don't know,
Nupedia was the original, peer-reviewed Bomis encyclopedia project, and
wikipedia was a "spin-off". But, the Nupedia approval process proved to be
too slow and complicated, and the project came to a standstill, with only a
dozen or so articles actually online. Recently, the Nupedia group voted
about a streamlined article review process, and a draft policy is currently
Why does this require new software? Well, it doesn't. But I worked on the
current Nupedia software, and it strikes me as a multi-redundant
HTML-PHP-mix. It is working, and it can be altered, but IMO it will be more
work to change the old software to a revised review process than to write a
new one. Also, some other changes will have to be made, like multi-language
interfaces and databases, which would further complicate a software change.
So, silently, I created a new SourceForge project, using "nunupedia" as a
working title (the encyclopedia title will remain "Nupedia"!). Both Jimbo
and Larry agreed that this can eventually become the official Nupedia
software. You can see a demo at
Only some parts of it currently work. Under "articles in progress", you
can see a demo article with online discussion. You can become a member and
submit articles (but, don't use valuable stuff;)
I would have preferred to complete a basically working version before "going
public", like I did with the Wikipedia PHP script, but currently, I bluntly
don't have the time for that :(
So I ask you to help me with that. Remember, Nupedia will most likely
become a "stable version" of wikipedia ;)
If you are interested in developing this software, mail me or the list.
Don't forget your SourceForge name, so I can give you CVS write access.
Thanks for listening,
As the subject says these columns have now been removed from the database
schema and all the code that used them has been replaced by code that uses
the new (un)linked tables. This completes a major change in the database
scheme, so I will now wait until Jimbo has installed the latest CVS version,
and see what bugs/problems appear.
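
To illustrate why a separate link table helps with pages like what-links-here, orphans and most-wanted, here is a small self-contained sketch using Python's sqlite3 module. The `pages` and `links` tables and their columns are invented for the example and are not the actual schema in CVS.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE pages (title TEXT PRIMARY KEY);
CREATE TABLE links (from_title TEXT, to_title TEXT);
CREATE INDEX links_to ON links (to_title);
INSERT INTO pages VALUES ('Mathematics'), ('Algebra'), ('Lonely page');
INSERT INTO links VALUES
    ('Mathematics', 'Algebra'),
    ('Algebra', 'Mathematics'),
    ('Algebra', 'Group theory');
""")

# What links here: a lookup on the indexed link target.
what_links_here = [row[0] for row in db.execute(
    "SELECT from_title FROM links WHERE to_title = ? ORDER BY from_title",
    ("Algebra",))]

# Orphans: existing pages that no link points to.
orphans = [row[0] for row in db.execute(
    "SELECT p.title FROM pages p LEFT JOIN links l ON l.to_title = p.title "
    "WHERE l.to_title IS NULL ORDER BY p.title")]

# Most wanted: link targets that do not exist yet, most-linked first.
most_wanted = db.execute(
    "SELECT l.to_title, COUNT(*) FROM links l "
    "LEFT JOIN pages p ON p.title = l.to_title "
    "WHERE p.title IS NULL GROUP BY l.to_title "
    "ORDER BY COUNT(*) DESC, l.to_title").fetchall()

print(what_links_here, orphans, most_wanted)
```

None of these queries has to scan article text, which is the point of keeping the links in their own tables.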
There is still a lot I can do in terms of speeding certain pages up, such as
the short-pages page, the long-pages page, the orphans page and the
most-wanted page. But for this I would again have to extend the database
scheme, so I first suggest we freeze the scheme and after the bugs have been
ironed out, I suggest we determine what would be the special page we want to
-- Jan Hidders
The code donated by Dan Keshet has been integrated. You can find it in the
Quick bar under the name "watch page links". Not a very good name, but I
couldn't think of a better one that would fit.
I have also optimized the SQL for the what-links-here page for the new
Finally, I added some code to the watchlist page to deal with pages on the
watch-list that no longer exist. A remaining problem is that users cannot
delete such pages from their watch list.
-- Jan Hidders
I've had this itch for a while, and I've finally scratched it: being
able to display the recent changes of all pages that are linked to
from a particular page. You can see the function I wrote at work
You could use this feature to get an overview of the activity in a
given section (say, "Mathematics") without having to load/unload the
entire section into your watch list. Alternately, you could use this
to build group or public watch lists, the wiki way.
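
In outline, the idea is just a filter on the recent changes list. The following Python sketch shows the logic only; it is not the code referred to above, and the `links` and `recent_changes` structures are assumptions.

```python
def recent_changes_linked(page, links, recent_changes):
    """Return the recent changes to pages that `page` links to.

    links:          dict of page title -> set of titles it links to
    recent_changes: list of (title, timestamp, user) tuples, newest first
    """
    targets = links.get(page, set())
    return [change for change in recent_changes if change[0] in targets]

links = {"Mathematics": {"Algebra", "Geometry"}}
recent_changes = [
    ("Geometry", "2002-02-22 15:40", "UserA"),
    ("Poland", "2002-02-22 15:31", "UserB"),
    ("Algebra", "2002-02-22 14:05", "UserC"),
]
math_changes = recent_changes_linked("Mathematics", links, recent_changes)
print(math_changes)
```

A public watch list is then simply a page whose links are the pages to watch.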
Would you like this feature in the php script? Should I send someone
-- Dan Keshet
PS: (Please to: or cc: me; I'm not on the list.)
PPS: Great job everybody who's worked on the script. Wikipedia's
looking really great. :)
----- Original Message -----
From: "Magnus Manske" <Magnus.Manske(a)epost.de>
To: "Jan Hidders" <hidders(a)uia.ua.ac.be>
Sent: Friday, February 22, 2002 3:48 PM
Subject: RE: [Wikitech-l] recent changes linked
> I checked the online example, and it would be great to have such a thing in
> the sidebar!
> Jan, as you are working on the link table, could you get the code and try to
> integrate it, if you have time?
But of course. Send me the code and I'll adapt it and add it to the sidebar.
-- Jan Hidders