I was wondering about the pros and cons of UTF-8 for the French Wikipedia:
Pros:
* improve interwiki links: any link containing non-Latin-1 characters (ISO-8859-1 is the current default charset) would work without any transformation. Many such links are copied and pasted raw by interwiki wanderers and not checked afterwards; they always end up miscoded, and fixing them wastes time;
* improve spelling (and article naming): French uses the famous <oe> ligature, which Latin-1 does not encode (Latin-9 does); every editor must either type the HTML entity &oelig; or leave it unencoded, which results in misspelled words (one of our bots, Orthogaffe, when it was used for spell-checking, had many "oeuvre -> œuvre" replacements to do);
* end transcoding problems: many editors do not use Windows and its code pages; others do, but with Win-1252 or Unicode as the default charset. When text is pasted into a wiki edit box from an application that does not use strict Latin-1 (but Win-1252, MacRoman, etc.), the wiki software transcodes it badly, and many raw quotation marks and <oe> ligatures are replaced by question marks.
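To make the last two points concrete, here is a small Python sketch. It is not what the wiki software actually does: the sample string and the helper name `to_charset` are made up for illustration, and replacing unencodable characters with "?" is only an assumption that mimics the symptom described above.

```python
# Hypothetical text pasted from an application that uses Win-1252 or Unicode.
sample = "L’œuvre “complète”"

def to_charset(text: str, charset: str) -> str:
    """Round-trip text through a charset; unencodable characters become '?'."""
    return text.encode(charset, errors="replace").decode(charset)

latin1_result = to_charset(sample, "latin-1")      # curly quotes and <oe> are lost
latin9_result = to_charset(sample, "iso-8859-15")  # <oe> survives, curly quotes do not
cp1252_result = to_charset(sample, "cp1252")       # everything survives
utf8_result = to_charset(sample, "utf-8")          # everything survives

for name, result in [("latin-1", latin1_result), ("latin-9", latin9_result),
                     ("cp1252", cp1252_result), ("utf-8", utf8_result)]:
    print(f"{name:8} {result}")
```

With Latin-1 the apostrophe, the ligature and both quotation marks come back as question marks; with UTF-8 nothing is lost.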
========
Cons:
* any text containing non-ASCII characters would grow in size: a single <c with cedilla> would take two bytes instead of one, and French uses lots of non-ASCII characters, such as é è ç à ù;
* I do not see any other cons.
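The size increase can be estimated with a short Python sketch. The sentence and the function name `utf8_overhead` are only examples chosen for illustration, and the sketch assumes the text is fully representable in Latin-1.

```python
def utf8_overhead(text: str) -> int:
    """Extra bytes needed to store Latin-1 text as UTF-8."""
    return len(text.encode("utf-8")) - len(text.encode("latin-1"))

# A single <c with cedilla>: one byte in Latin-1, two in UTF-8.
cedilla_latin1 = len("ç".encode("latin-1"))
cedilla_utf8 = len("ç".encode("utf-8"))

# Hypothetical French sentence with six accented letters.
sentence = "Le garçon a été reçu à l'école."
extra = utf8_overhead(sentence)

print(cedilla_latin1, cedilla_utf8, extra)
```

Each accented letter in the Latin-1 range costs exactly one extra byte, so the overhead equals the number of such letters, while plain ASCII text keeps the same size.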
Would it therefore be possible to make UTF-8 the default charset for the French Wikipedia?
Vincent Ramos