New subject: UTF-8 conversion (was: Multilingual interface)

11 May 2004


      ...
-----Original Message-----
From: wikitech-l-bounces@Wikipedia.org
[mailto:wikitech-l-bounces@Wikipedia.org] On behalf of Brion Vibber
Sent: Tuesday, 11 May 2004 07:48
To: Wikimedia developers
Subject: Re: [Wikitech-l] Multilingual interface
Nikola Smolenski wrote:
Yes. One can use all languages in same encoding (all Latin-1, all Latin-2, all UTF-8...) but can not mix encodings. It is trivial to convert any language to UTF-8, except for the linktrail which is not used anyway. Wikisource, Wikibooks and Wiktionary are in UTF-8 already, so I don't think it will be a problem for them.
All the remaining latin-1 language files will have to be upgraded to do that, or appropriate run-time upconversion added.
I have an idea. We converted frwiki by converting the whole dump in one pass. It took around 2 hours for 2.5 GB of dump.
We could convert all the tables except old, which is the biggest and takes the longest, and then run a script that converts old later. Old would be in broken ISO-8859 for a few days. I wrote a small PHP script to convert old from ISO-8859-1 to UTF-8, because I broke old on de.wiktionary during the conversion to UTF-8 (I forgot that old is gzipped, and converting binary data is not a very good idea :) )
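A minimal sketch of the point about gzipped rows, written in Python rather than PHP. It is not the script mentioned above. The function name, the is_gzipped flag and the use of the gzip container format are assumptions for illustration; the real compression format and the way a compressed row is marked in the old table may differ. The idea is to decompress first, re-encode the text, then compress again, instead of re-encoding the compressed bytes directly.

```python
import gzip


def convert_old_text(data: bytes, is_gzipped: bool) -> bytes:
    """Re-encode one stored revision from ISO-8859-1 to UTF-8.

    `is_gzipped` is a hypothetical flag saying whether this row is
    stored compressed; how that is actually recorded is an assumption.
    """
    # Decompress first: re-encoding compressed bytes would corrupt them.
    raw = gzip.decompress(data) if is_gzipped else data

    # Every byte value is a valid ISO-8859-1 character, so decoding
    # never fails; the result is then written out as UTF-8.
    converted = raw.decode("iso-8859-1").encode("utf-8")

    # Store the row back in the same form it came in.
    return gzip.compress(converted) if is_gzipped else converted
```

Because ISO-8859-1 decoding accepts any byte sequence, running this on a row that is already UTF-8 would silently double-encode it, so a real script would need to keep track of which rows are done.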
What do you think about it?
Shaihulud
