Nikola Smolenski wrote:
Yes. One can use all languages in the same encoding (all Latin-1, all Latin-2,
all UTF-8...) but cannot mix encodings. It is trivial to convert any language
to UTF-8, except for the linktrail which is not used anyway. Wikisource,
Wikibooks and Wiktionary are in UTF-8 already, so I don't think it will be a
problem for them.
All the remaining Latin-1 language files will have to be upgraded to do
that, or appropriate run-time upconversion will have to be added.
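
To make "run-time upconversion" concrete, here is a minimal sketch in Python. The function name `upconvert` and the default source encoding are assumptions for illustration only, not anything in the existing code base. It leaves bytes that already decode as UTF-8 untouched and re-encodes everything else from the legacy encoding.

```python
def upconvert(raw: bytes, source_encoding: str = "latin-1") -> bytes:
    """Return UTF-8 bytes for text that may be stored in a legacy encoding.

    Hypothetical helper: the name and the default encoding are assumptions.
    """
    try:
        # Already valid UTF-8 (this includes plain ASCII): leave it alone.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Otherwise assume the legacy single-byte encoding and re-encode.
        return raw.decode(source_encoding).encode("utf-8")


legacy = "café".encode("latin-1")   # b"caf\xe9", not valid UTF-8
converted = upconvert(legacy)
```

The validity check is only a heuristic: a short Latin-1 string can happen to be valid UTF-8, so a one-time upgrade of the files is the safer of the two options.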
I don't think that language-specific sorting will be a problem when
introduced; a user will simply see text sorted in his language.
The sorting will have to be hard-coded into the index fields in the
tables, so it must be uniform for all saved data. Everything displayed
will use the same uniform sorting method, as it's impractical to add
dozens of sort indexes for every conceivable language to a table with
hundreds of thousands of pages.
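
As a rough illustration of what a single uniform sort key could look like, here is a sketch in Python. The function `uniform_sort_key` and its normalization steps (strip accents, fold case) are assumptions chosen for the example, not the scheme actually used in the tables. The point is that the key is computed once when a row is saved, and every reader then gets the same order.

```python
import unicodedata


def uniform_sort_key(title: str) -> bytes:
    """One language-independent key, computed once when the row is saved.

    Hypothetical scheme: strip accents, fold case, compare as UTF-8 bytes.
    """
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold and store as UTF-8 bytes; an index would order rows by these.
    return stripped.casefold().encode("utf-8")


titles = ["Zebra", "Émile", "apple", "Ärger"]
ordered = sorted(titles, key=uniform_sort_key)
# ordered == ["apple", "Ärger", "Émile", "Zebra"]
```

A Swedish reader would expect "Ärger" after "Zebra", and satisfying that would take a second key column and a second index. That is the cost being weighed above.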
But good luck to one who is going to implement it in UTF-8 for all languages!
As for Chinese and Japanese, were you referring to stripForSearch? I don't
think that it is a problem: Chinese and Japanese users will be able to search
properly; other users will not, but they cannot now anyway.
Yes, that's one of the problem functions. It's about saving the index
data consistently, and reading it back properly. The data must always be
stored and interpreted in a consistent way.
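
A small sketch of the consistency requirement, in Python. The `strip_for_search` below is a hypothetical stand-in, not the real stripForSearch, and the in-memory `index` dictionary is likewise an assumption in place of the actual search table. What matters is that the same normalizer runs both when the index is written and when a query is read.

```python
def strip_for_search(text: str) -> str:
    """Hypothetical normalizer; must be identical at index and query time."""
    out = []
    for ch in text.lower():
        if ch.isascii():
            out.append(ch)
        else:
            # Turn each non-ASCII character into an ASCII token built from
            # the hex of its UTF-8 bytes, so every character is one "word".
            out.append(" u8" + ch.encode("utf-8").hex() + " ")
    # Collapse runs of whitespace into single spaces.
    return " ".join("".join(out).split())


index = {}  # token -> set of page titles (stand-in for the search table)


def index_page(title: str, body: str) -> None:
    """Store every normalized token of the body under the page title."""
    for token in strip_for_search(body).split():
        index.setdefault(token, set()).add(title)


def search(query: str) -> set:
    """Return titles whose indexed text contains every token of the query."""
    tokens = strip_for_search(query).split()
    if not tokens:
        return set()
    return set.intersection(*[index.get(t, set()) for t in tokens])


index_page("Tokyo", "東京 is the capital")
hits = search("東京")
```

If the query side used a different normalizer, or read the stored bytes in a different encoding, the tokens would no longer match and the search would silently return nothing.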
-- brion vibber (brion @ pobox.com)