On Tue, May 07, 2002 at 01:47:57AM -0500, Lee Daniel Crocker wrote:
> I'm curious: why is the Spanish text in wikiTextEs.php encoded in
> utf-8? ISO-8859-1 has all the needed characters for Spanish; the
> German wikiTextDe.php uses plain ISO, and the current
> es.wikipedia.com is in ISO.
All wikipedias are switching to utf-8.
Polish (normally latin2) and Esperanto (normally latin3) both decided
to do that.
I'm only wondering why English hasn't done it *yet*
and why the Germans haven't switched.
I hope that both will switch.
Reasons include:
1. We can't use plain latinX on its own. The choice is between utf8
   and latinX + &lot_of_silly_numeric_entities;
2. UTF-8 allows you to insert all the diacritics and other characters
   that turn up from time to time in proper names etc. Encoding them
   as html codes is a huge PITA. There is no software that
   facilitates it.
3. &silly_numeric_entities; are completely unreadable, and the
   wikipedia markup language should be easy to read and write.
4. Interwiki links 100% require utf-8. Making interwiki links using
   %-codes is completely out of the question. You can't even use
   &silly_entities; in such links, as the software doesn't know which
   encoding to use when converting non-ascii characters to %-codes.
5. General interoperability needs that. You wouldn't even be able to
   copy text from one wikipedia to another if they used different
   charsets, as the software won't know whether it should convert
   diacritics to &entities; or not.
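
To make points 3 and 4 concrete, here is a rough Python sketch. It is
only an illustration and has nothing to do with the actual wiki code:
the helper names (percent_codes, numeric_entities) and the sample title
are my own assumptions. It shows that the same title gets different
%-codes depending on the charset, and what that title looks like once it
is written with numeric entities.

```python
from urllib.parse import quote, unquote


def percent_codes(title: str, charset: str) -> str:
    """Return the %-codes for a title under the given charset (hypothetical helper)."""
    return quote(title, encoding=charset)


def numeric_entities(title: str) -> str:
    """Replace every non-ascii character with a numeric entity (hypothetical helper)."""
    return title.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


title = "Łódź"  # sample Polish name, chosen only for illustration

# The same title, two charsets, two different sets of %-codes.
print(percent_codes(title, "utf-8"))       # %C5%81%C3%B3d%C5%BA
print(percent_codes(title, "iso-8859-2"))  # %A3%F3d%BC

# A receiver that guesses the wrong charset decodes a different title.
print(unquote("%A3%F3d%BC", encoding="iso-8859-1"))  # £ód¼

# The entity form is pure ascii, but hardly readable in wiki source.
print(numeric_entities(title))  # &#321;&#243;d&#378;
```

The %-codes alone don't say which charset produced them, so a link
built on a latin2 wiki can't be resolved reliably by a latin1 one. If
every wiki uses utf-8, there is exactly one answer.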