On Tue, May 07, 2002 at 01:47:57AM -0500, Lee Daniel Crocker wrote:
> I'm curious: why is the Spanish text in wikiTextEs.php encoded in utf-8? ISO-8859-1 has all the needed characters for Spanish; the German wikiTextDe.php uses plain ISO, and the current es.wikipedia.com is in ISO.
All wikipedias are switching to utf-8. Polish (normally latin2) and Esperanto (normally latin3) both decided to do that.
I'm only wondering why English hasn't done it *yet* and why the Germans haven't switched.
I hope that both will switch.
Reasons include:
1. We can't use latinX alone; the choice is between utf8 and latinX + &lot_of_silly_numeric_entities;
2. UTF-8 allows you to insert all the diacritics and other characters that turn up from time to time in proper names etc. Encoding them as HTML codes is a huge PITA. There is no software that facilitates it.
3. &silly_numeric_entities; are completely unreadable, and wikipedia markup language should be easy to read and write.
4. Interwiki links 100% require utf-8. Making interwiki links using %-codes is completely out of the question. You can't even use &silly_entities; in such links, as the software doesn't know which encoding to use when converting non-ASCII characters to %-codes.
5. General interoperability needs that. You wouldn't even be able to copy from one wikipedia to another if they used different charsets, as the software wouldn't know whether it should convert diacritics to &entities;
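
To make points 1, 3 and 4 a bit more concrete, here is a rough Python sketch. It is only my illustration, not anything taken from the wiki software: the helper names percent_codes and with_entities are made up for this example, and the sample title is just a Polish city name. It shows that the same title gets different %-codes depending on the charset assumed, and what Polish text looks like when forced into latin1 with numeric entities.

```python
from urllib.parse import quote


def percent_codes(title: str, charset: str) -> str:
    """Percent-encode a page title after encoding it in the given charset.

    Hypothetical helper for illustration only.
    """
    return quote(title.encode(charset))


def with_entities(text: str, charset: str) -> str:
    """Return the text as it would have to be stored in `charset`,
    with every unrepresentable character replaced by a numeric entity.

    Hypothetical helper for illustration only.
    """
    return text.encode(charset, errors="xmlcharrefreplace").decode(charset)


title = "Łódź"

# The same title yields different %-codes under different charsets,
# so a link target is ambiguous unless everyone agrees on one encoding.
print(percent_codes(title, "utf-8"))       # %C5%81%C3%B3d%C5%BA
print(percent_codes(title, "iso-8859-2"))  # %A3%F3d%BC

# In latin1 the non-latin1 letters can only survive as numeric entities.
print(with_entities(title, "iso-8859-1"))  # &#321;ód&#378;
```

If every wikipedia uses utf-8, only the first form exists and a link from one language to another can be resolved without guessing.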