I was wondering about the pros and cons of utf-8 for the French
Wikipedia:
Pros:
* improve interwiki links: any link containing characters outside
latin-1 (ISO-8859-1, the current default charset) would work without
any transformation. Many such links are copied and pasted in raw form
by interwiki wanderers and never checked afterwards; they end up
miscoded, and fixing them wastes time;
* improve spelling (and article naming): French uses the famous <oe>
ligature, which latin-1 does not encode (latin-9 does); every editor
must either type the HTML entity &oelig; or give up on encoding it,
resulting in misspelled words (one of our bots, Orthogaffe, when it
was used for spelling correction, had many "oeuvre -> œuvre"
replacements to do);
* end transcoding problems: many editors do not use Windows
and its codepages; others do, but with Win-1252 or Unicode as the
default charset. When text is pasted into a wiki edit box from an
application that does not use strict latin-1 (but Win-1252, MacRoman,
etc.), the wiki software transcodes it badly, so that many typographic
quotation marks and <oe> ligatures are replaced by question marks.
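
To illustrate the last two points, here is a minimal Python sketch. It
assumes the wiki software behaves like a strict latin-1 encoder that
substitutes "?" for anything it cannot represent; I have not checked
how the real software does it, and the sample string is made up.

```python
# Sample text with a typographic apostrophe (U+2019) and the <oe> ligature
# (U+0153). Both exist in Win-1252 and in Unicode, but not in latin-1.
text = "L\u2019\u0153uvre"

# Win-1252 can represent both characters (one byte each).
as_cp1252 = text.encode("cp1252")

# latin-9 (ISO-8859-15) has the ligature; latin-1 does not.
oe_latin9 = "\u0153".encode("iso8859-15")

# Assumption: the wiki replaces unencodable characters with "?".
as_latin1 = text.encode("latin-1", errors="replace")

# UTF-8 round-trips the text without loss.
round_trip = text.encode("utf-8").decode("utf-8")

print(as_latin1)           # b'L??uvre'
print(round_trip == text)  # True
```

With utf-8 as the page charset, the pasted text would survive as is.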
========
Cons:
* any text containing non-ASCII characters would grow in size:
a single <c with cedilla> takes one byte in latin-1 but two in utf-8;
French uses lots of non-ASCII characters, such as
é è ç à ù;
* I do not see other cons.
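
The size increase can be estimated with a short Python sketch. The
helper name encoded_sizes and the sample sentence are only
illustrations, not measurements taken on real articles.

```python
def encoded_sizes(text):
    """Return (latin-1 byte count, utf-8 byte count) for the given text."""
    return len(text.encode("latin-1")), len(text.encode("utf-8"))


# A single c with cedilla: one byte in latin-1, two in utf-8.
cedilla = encoded_sizes("ç")

# Made-up sample sentence: 30 characters, 6 of them non-ASCII.
sample = "Le garçon a été reçu à l'école"
latin1_bytes, utf8_bytes = encoded_sizes(sample)

print(cedilla)                   # (1, 2)
print(latin1_bytes, utf8_bytes)  # 30 36
```

Only the accented letters cost an extra byte each, so the overall growth
depends on how dense they are in the text; for this sample it is 20%.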
Would it be possible, then, to make utf-8 the default charset
for the French Wikipedia?
Vincent Ramos