-----Original Message-----
From: wikitech-l-bounces@Wikipedia.org [mailto:wikitech-l-bounces@Wikipedia.org] On Behalf Of Brion Vibber
Sent: Tuesday, 11 May 2004 07:48
To: Wikimedia developers
Subject: Re: [Wikitech-l] Multilingual interface
Nikola Smolenski wrote:
Yes. One can use all languages in the same encoding (all Latin-1, all Latin-2, all UTF-8...) but cannot mix encodings. It is trivial to convert any language to UTF-8, except for the linktrail, which is not used anyway. Wikisource, Wikibooks and Wiktionary are in UTF-8 already, so I don't think it will be a problem for them.
All the remaining latin-1 language files will have to be upgraded to do that, or appropriate run-time upconversion added.
I have an idea. We converted frwiki by converting the whole dump in one go; it took around 2 hours for 2.5 GB of dump. We could convert all tables except old, which is the biggest and takes the longest, and run a script that converts old later. Old would be in broken ISO-8859 for a few days. I wrote a small PHP script to convert old from ISO-8859-1 to UTF-8, because I broke old on de.wiktionary during the conversion to UTF-8 (I forgot that old is gzipped; converting binary data is not a very good idea :) )
What do you think about it?
Shaihulud
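The gzip pitfall mentioned above can be illustrated with a minimal Python sketch (the script described in the message was PHP and is not shown in this thread). The function name `convert_gzipped_latin1` is hypothetical, and the use of the standard gzip container is an assumption; the actual storage format of the old table may differ. The point is only that the text must be decompressed before re-encoding, then compressed again.

```python
import gzip


def convert_gzipped_latin1(blob: bytes) -> bytes:
    """Re-encode a gzipped ISO-8859-1 text blob as gzipped UTF-8.

    Assumption: `blob` is a complete gzip stream holding ISO-8859-1 text.
    """
    # Decompress first: the charset conversion must see text, not binary.
    text = gzip.decompress(blob).decode("iso-8859-1")
    # Re-encode as UTF-8 and compress again for storage.
    return gzip.compress(text.encode("utf-8"))


def naive_convert(blob: bytes) -> bytes:
    """The mistake: treat the compressed bytes as if they were Latin-1 text."""
    return blob.decode("iso-8859-1").encode("utf-8")


original = gzip.compress("Übersetzung".encode("iso-8859-1"))
fixed = convert_gzipped_latin1(original)
damaged = naive_convert(original)
```

Here `fixed` decompresses to valid UTF-8, while `damaged` is no longer a gzip stream at all, because every byte above 0x7F in the compressed data (including the second gzip magic byte, 0x8B) has been expanded to two bytes.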
Constans, Camille (C.C.) wrote:
I have an idea. We converted frwiki by converting the whole dump in one go; it took around 2 hours for 2.5 GB of dump. We could convert all tables except old, which is the biggest and takes the longest, and run a script that converts old later. Old would be in broken ISO-8859 for a few days. I wrote a small PHP script to convert old from ISO-8859-1 to UTF-8, because I broke old on de.wiktionary during the conversion to UTF-8 (I forgot that old is gzipped; converting binary data is not a very good idea :) )
What do you think about it?
That would be really problematic, it would interfere with user contributions lists and history, diffs, etc, could cause data corruption on reversions (auto and manual), and you'll have to be very careful to avoid double conversions.
I'd recommend strongly against it.
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
Constans, Camille (C.C.) wrote:
We could convert all tables except old, which is the biggest and takes the longest, and run a script that converts old later. Old would be in broken ISO-8859 for a few days. I wrote a small PHP script to convert old from ISO-8859-1 to UTF-8.
That would be really problematic,
I don't think it would.
it would interfere with user contributions lists and history,
Other than having slightly broken edit summaries for a few days, I'm not sure what problems you are referring to?
and you'll have to be very careful to avoid double conversions.
That won't be a problem. It is trivial to check if something is already in UTF-8 or ISO-8859-1. Checking for this has the added advantage of reducing database load by not converting things that are plain ASCII (and thus don't require conversion) anyway.
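One way such a check could look, as a rough Python sketch rather than anything from the actual conversion script: the names `classify` and `to_utf8` are hypothetical. It relies on the fact that non-ASCII ISO-8859-1 text is rarely valid UTF-8, so this is a heuristic; a few Latin-1 byte sequences (for example the two bytes that display as "Ã©") do decode as UTF-8 and would be left untouched.

```python
def classify(raw: bytes) -> str:
    """Return 'ascii', 'utf-8' or 'latin-1' for a stored text blob."""
    if raw.isascii():
        # Pure ASCII is identical in both encodings: nothing to convert.
        return "ascii"
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume it is still ISO-8859-1.
        return "latin-1"


def to_utf8(raw: bytes) -> bytes:
    """Convert only blobs that still look like ISO-8859-1."""
    if classify(raw) == "latin-1":
        return raw.decode("iso-8859-1").encode("utf-8")
    return raw  # ASCII or already UTF-8: skip the write entirely
```

Because `to_utf8` leaves ASCII and UTF-8 input unchanged, running it twice over the same row gives the same result as running it once, which is what guards against double conversion.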
diffs, etc, could cause data corruption on reversions (auto and manual),
I don't see that as a problem either. I picture the script would take one page at a time and convert its entire history. Once it's finished with it, it can also check cur to see if someone just by pure coincidence happened to revert something at that particular moment, which is highly unlikely anyway.
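The page-at-a-time idea could be sketched roughly as follows. This is an in-memory illustration only: the `Page` class and its `cur` and `old` fields are hypothetical stand-ins for the real tables, and no database access or locking is shown.

```python
from dataclasses import dataclass, field
from typing import Dict


def needs_conversion(raw: bytes) -> bool:
    """True if the blob is neither plain ASCII nor valid UTF-8."""
    if raw.isascii():
        return False
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


@dataclass
class Page:
    # Hypothetical model: current text plus old revisions keyed by id.
    cur: bytes
    old: Dict[int, bytes] = field(default_factory=dict)


def convert_page(page: Page) -> int:
    """Convert one page's whole history, then re-check cur. Returns rows changed."""
    changed = 0
    for rev_id, raw in list(page.old.items()):
        if needs_conversion(raw):
            page.old[rev_id] = raw.decode("iso-8859-1").encode("utf-8")
            changed += 1
    # Check cur last, in case a revert copied unconverted text into it
    # while the history was being processed.
    if needs_conversion(page.cur):
        page.cur = page.cur.decode("iso-8859-1").encode("utf-8")
        changed += 1
    return changed
```

Calling `convert_page` a second time on the same page changes nothing, so a page that was interrupted halfway can simply be processed again.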
I'd recommend strongly against it.
What alternative do you recommend? Prolonged downtime?
Timwi
wikitech-l@lists.wikimedia.org