Erik wrote:
Is this true? All I know is that we had a *lot* of problems with broken special chars on the Meta-Wiki during the logo contest. I have no idea which browser broke them, but it seems to be a not totally uncommon one, perhaps in the 5% range. Given that a single edit by such a person will break an entire page, it might not be so wise to switch (but perhaps I'm missing something -- is Meta running UTF-8?).
IIRC meta is. And that fact has created some of the problems you mention. I therefore see no compelling need to convert Latin-1 languages to UTF-8 and in fact think such a switch would be harmful. It is also wrong-headed to state (as Tomasz did) that if people have non-UTF-8-friendly browsers that they should upgrade. That is not the attitude we should have when things work just fine the way they are (at least on the English Wikipedia - others may have more compelling reasons to use UTF-8 that outweigh the negatives).
The only place where UTF-8 would be very useful is with interlanguage links. But that could better be solved by placing all interlanguage links outside of the regular wiki text of pages. That separate edit window could support UTF-8 and be shared by all Wikipedias. This should minimize the damage done by non-UTF-8-compliant browsers and, as an added benefit, could be part of an easier way to add language links to articles (inputting the links once would create language links in every article listed in the common meta space).
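Roughly what I have in mind, as a sketch only (made-up names, nothing that exists in the MediaWiki code today): one shared, UTF-8 store of links per concept, from which each wiki's language links are generated.

  # Hypothetical sketch: interlanguage links kept in one shared, UTF-8
  # store instead of inside each wiki page's Latin-1 text.

  # One entry per concept, editable once in the common meta space.
  interlang_links = {
      "Budapest": {"en": "Budapest", "de": "Budapest", "hu": "Budapest",
                   "ja": "ブダペスト", "ru": "Будапешт"},
  }

  def links_for(concept: str, own_lang: str) -> list[str]:
      """Return the interlanguage links to show on one wiki's article."""
      titles = interlang_links.get(concept, {})
      return [f"[[{lang}:{title}]]"
              for lang, title in sorted(titles.items())
              if lang != own_lang]

  print(links_for("Budapest", "en"))
  # ['[[de:Budapest]]', '[[hu:Budapest]]', '[[ja:ブダペスト]]', '[[ru:Будапешт]]']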
-- Daniel Mayer (aka mav)
On Mon, Nov 17, 2003 at 10:29:25PM -0500, Daniel Mayer wrote:
Erik wrote:
Is this true? All I know is that we had a *lot* of problems with broken special chars on the Meta-Wiki during the logo
[...]
IIRC meta is. And that fact has created some of the problems you mention. [...] things work just fine the way they are (at least on the English Wikipedia - others may have more compelling reasons to use UTF-8 that outweigh the negatives).
The only place where UTF-8 would be very useful is with interlanguage links. [...]
1. There are many reasons other than interwiki. ISO 8859-1 is broken by design - it doesn't even encode all Latin characters (see the quick check below this list), and other characters are also needed for correct Latin-script typography.
2. Things are NOT fine the way they are. At least not for the English Wikipedia.
3. And, as I said, we already break compatibility with very old browsers in many ways. Or do you maybe want to ban all PNGs, OGGs, etc., and implement some converter from CSS to HTML3-compatible markup?
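To make point 1 concrete, here is a quick check using nothing but the stock Python codecs (my own toy example, not anything from MediaWiki):

  # Perfectly ordinary Latin-script text: French, Polish and Romanian
  # letters, plus the typographic quotes and dash mentioned in this thread.
  samples = ["café", "Łódź", "știință", "\u201ccurly quotes\u201d and an \u2014 dash"]

  for text in samples:
      try:
          text.encode("iso-8859-1")
          status = "fits in ISO 8859-1"
      except UnicodeEncodeError as e:
          status = f"NOT representable in ISO 8859-1 ({e.object[e.start]!r})"
      print(f"{text!r}: {status}; UTF-8 = {text.encode('utf-8').hex(' ')}")

Only the first sample survives; everything else is plain Latin-script text that ISO 8859-1 simply cannot hold, while UTF-8 encodes all of it.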
On Mon, Nov 17, 2003 at 10:29:25PM -0500, Daniel Mayer wrote:
Erik wrote:
Is this true? All I know is that we had a *lot* of problems with broken special chars on the Meta-Wiki during the logo
[...]
IIRC meta is. And that fact has created some of the problems you mention. [...]
Could you point us to the page and revision of the problem? I'm curious what kind of problem it might have been, as many of the Wikipedias have been in UTF-8 from the start, and we have had no problems whatsoever.
However we *do* have problems with the English Wikipedia when pages contain unrepresentable literal characters, which makes the page break after editing. See the "Budapest" article on Wikitravel, where every special dash and curly quote mark became a question mark. Truly ugly.
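The mechanism is easy to reproduce outside the wiki; roughly, this is what a Latin-1-only storage/edit round trip ends up doing (a toy Python snippet of mine, not actual MediaWiki code):

  # An article paragraph containing "smart" punctuation
  # (em dash U+2014, curly quotes U+201C/U+201D) that ISO 8859-1 cannot hold.
  paragraph = "Budapest \u2014 the \u201cPearl of the Danube\u201d"

  # What a Latin-1-only storage/edit round trip effectively does to it:
  damaged = paragraph.encode("iso-8859-1", errors="replace").decode("iso-8859-1")

  print(paragraph)   # Budapest — the “Pearl of the Danube”
  print(damaged)     # Budapest ? the ?Pearl of the Danube?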
--~~~~
Peter Gervai grin@tolna.net writes:
However we *do* have problems with the English Wikipedia when pages contain unrepresentable literal characters, which makes the page break after editing.
Yes, the same is true of the German WP. Pages are served as iso-8859-1, but people don't hesitate to add iso-8859-15 (the EUR symbol) or even windows-1252 ("smart quotes"). The next person, using a conforming browser, edits such an article, and the next reader will see broken quotation marks or other artefacts - this happens, though not that often.
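For the curious, the mix-up itself takes only a few lines to demonstrate (a toy example of mine, not anything from the wiki software):

  # Text pasted from Windows: German-style smart quotes and a euro sign.
  pasted = "\u201eZitat\u201c f\u00fcr 5 \u20ac"

  # What actually travels over the wire if the sender's browser uses
  # windows-1252 while the page claims to be iso-8859-1:
  wire_bytes = pasted.encode("windows-1252")

  # How the next reader's conforming browser interprets those bytes:
  as_latin1 = wire_bytes.decode("iso-8859-1")

  print(pasted)                                        # „Zitat“ für 5 €
  print(as_latin1.encode("unicode_escape").decode())   # \x84Zitat\x93 f\xfcr 5 \x80

The quotes and the euro sign come back as C1 control characters instead of the intended punctuation, which is exactly the kind of artefact the next reader ends up seeing.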
See "Budapest" article on wikitravel, where every special dash and curly quote marks became question marks. Truly ugly.
Yes, that's what I meant ;)