Hi all!
It seems to me that handling of non-ASCII characters in interwiki links (or in URLs in general) is a bit problematic. As an example, take [[en:Václav Havel]]. Since en: does not use UTF-8, the URL is ".../V%E1clav_Havel". If you try to use the interwiki link to cs: (specified in the source as [[cs:Václav Havel]]), it leads to http://cs.wikipedia.org/wiki/V%E1clav_Havel, which is _wrong_, because the cs: Wikipedia uses UTF-8 and the proper link should be ".../V%C3%A1clav_Havel". And, vice versa, the Czech article contains an interwiki link (specified again as [[en:Václav Havel]]) that leads to http://en.wikipedia.org/wiki/V%C3%A1clav_Havel, which is, again, wrong.
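To make the mismatch concrete (a Python 3 sketch, purely for illustration; this is not anything MediaWiki itself runs): the same title percent-encodes to two different URLs depending on the page encoding:

    from urllib.parse import quote

    title = "Václav_Havel"
    print(quote(title, encoding="latin-1"))  # V%E1clav_Havel (what en: produces)
    print(quote(title, encoding="utf-8"))    # V%C3%A1clav_Havel (what cs: expects)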
I believe that a correct solution (apart from the long-term solution of using UTF-8 everywhere) could be:
- Accept UTF-8 in URLs on en: (but how could they be recognized??)
- Interwiki linking should use UTF-8 even on en: (or does another Wikipedia besides en: use Latin-1?)
Best regards, [[cs:User:Mormegil | Petr Kadlec]]
Petr Kadlec wrote:
I believe that a correct solution (apart from the long-term solution of using UTF-8 everywhere) could be:
- Accept UTF-8 in URLs on en: (but how could they be recognized??)
- Interwiki linking should use UTF-8 even on en: (or does another Wikipedia besides en: use Latin-1?)
Last question: yes, there are a few. For those wikis that are now UTF-8 but were ISO-8859-1 before, the 8859-1 %XX codes are also still accepted.
Indeed, it is better to avoid the %XX codes in interwiki links. A reasonably good alternative is to use &name; and &#number; entities, which are independent of the encoding. The pywikipediabot will take this route for all links that cannot be expressed natively, and the interwiki bot will automatically convert all %XX links upon passing (but only if other updates are needed to the page).
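Roughly, the idea is this (a sketch, not the actual pywikipediabot code): replace every non-ASCII character with its numeric character reference, which any wiki interprets the same way regardless of its page encoding:

    def to_entities(title):
        # Replace each non-ASCII character with a &#NNN; numeric
        # character reference; plain ASCII passes through unchanged.
        return "".join(c if ord(c) < 128 else "&#%d;" % ord(c) for c in title)

    print(to_entities("Václav Havel"))  # prints: V&#225;clav Havel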
Regards,
Rob Hooft
On Sun, 28 Nov 2004 19:37:21 +0100, Rob Hooft rob@hooft.net wrote:
Indeed, it is better to avoid the %XX codes in interwiki links. A reasonably good alternative is to use &name; and &#number; entities, which are independent of the encoding. The pywikipediabot will take this route for all links that cannot be expressed natively, and the interwiki bot will automatically convert all %XX links upon passing (but only if other updates are needed to the page).
Yes, this is sensible, but it doesn't avoid the problem described here: the actual *URL* will still include one encoding or the other, however cunningly the wiki-code is constructed. Neither http://cs.wikipedia.org/wiki/V&#225;clav_Havel nor http://en.wikipedia.org/wiki/V&#225;clav_Havel is an existing page, since HTML escaping doesn't belong in a URL. [Amusingly, if you click "article", it takes you to the right page, since the "&" hasn't been further escaped in the HTML.]
OTOH, I can't actually get that particular example to break anyway: I can happily click the interwiki links between cs.wp and en.wp, and the URL gets re-encoded back and forth just fine. I guess someone fixed it already?
On Nov 28, 2004, at 6:32 AM, Petr Kadlec wrote:
It seems to me that handling of non-ASCII characters in interwiki links (or in URLs in general) is a bit problematic. As an example, take [[en:Václav Havel]]. Since en: does not use UTF-8, the URL is ".../V%E1clav_Havel". If you try to use the interwiki link to cs: (specified in the source as [[cs:Václav Havel]]), it leads to http://cs.wikipedia.org/wiki/V%E1clav_Havel, which is _wrong_, because the cs: Wikipedia uses UTF-8 and the proper link should be ".../V%C3%A1clav_Havel".
It detects the encoding on the incoming link and redirects transparently. Where's the problem?
And, vice versa, the Czech article contains an interwiki link (specified again as [[en:Václav Havel]]) that leads to http://en.wikipedia.org/wiki/V%C3%A1clav_Havel, which is, again, wrong.
It detects the encoding on the incoming link and redirects transparently. Where's the problem?
I believe that a correct solution (apart from the long-term solution of using UTF-8 everywhere) could be:
- Accept UTF-8 in URLs on en: (but how could they be recognized??)
We already do, see above.
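Roughly, the behavior seems to be this (a Python sketch, and only a guess at the logic; the real implementation is PHP inside MediaWiki): if the percent-decoded title is not valid UTF-8, treat it as Latin-1 and redirect to the canonical URL:

    def decode_title(raw):
        # raw: the percent-decoded title bytes from the request path.
        try:
            return raw.decode("utf-8")     # valid UTF-8: take it as-is
        except UnicodeDecodeError:
            # Not valid UTF-8: assume legacy ISO-8859-1 and redirect
            # to the canonical UTF-8 URL (redirect not shown here).
            return raw.decode("latin-1")

    print(decode_title(b"V\xe1clav_Havel"))      # Václav_Havel (Latin-1 URL)
    print(decode_title(b"V\xc3\xa1clav_Havel"))  # Václav_Havel (UTF-8 URL)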
-- brion vibber (brion @ pobox.com)
It detects the encoding on the incoming link and redirects transparently. Where's the problem?
Huh? Wow, I see, it works here in Firefox... Well, I tried it in Opera and it _did not_ work. I don't know if the problem was Opera, some proxy in-between, sunspots, PEBKAC, or whatever. I'll try it again tonight.
-- puzzled [[cs:User:Mormegil | Petr Kadlec]]
On Nov 29, 2004, at 12:35 AM, Petr Kadlec wrote:
It detects the encoding on the incoming link and redirects transparently. Where's the problem?
Huh? Wow, I see, it works here in Firefox... Well, I tried it in Opera and it _did not_ work. I don't know if the problem was Opera, some proxy in-between, sunspots, PEBKAC, or whatever. I'll try it again tonight.
Works for me in Opera 7.54 for Mac OS X (build 1840).
-- brion vibber (brion @ pobox.com)
OK, now I've got it. The problem was a misconfigured proxy that inserted a bogus Referer header (which is probably used by MediaWiki to detect the need to redirect). So for a request like this:
    GET /wiki/V%E1clav_Havel HTTP/1.1
    Host: cs.wikipedia.org
    Referer: http://cs.wikipedia.org/wiki/V%E1clav_Havel
an "HTTP/1.0 200 OK" response comes back with the "page does not exist yet" contents.
After the proxy configuration was fixed, everything works fine.
Thanks for help, -- [[cs:User:Mormegil | Petr Kadlec]]