Pablo Saratxaga wrote:
Sorting order is language specific, and some languages handle some accented letters as different letters of their own, not as simple variations of the base letter.
Sorting is one thing, searching is another. Both are determined by the language's collation order. For example, in Swedish, A and Ä are two different letters, but Á is just an accented version of A.
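To make the Swedish example concrete, here is a rough Python sketch of two collation keys. It is only an illustration under simplifying assumptions: the alphabet table is a reduced Swedish alphabet, the function names are made up for this mail, and real collations (in MySQL or elsewhere) handle many more cases.

```python
import unicodedata

def strip_accents(text):
    """Remove combining marks, so that 'á' and 'ä' both become 'a'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

# Simplified Swedish alphabet (assumption): å, ä, ö are letters of their
# own and come after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

def swedish_key(word):
    """Sort/compare key: å, ä, ö stay distinct, other accents are dropped."""
    key = []
    for ch in unicodedata.normalize("NFC", word).lower():
        if ch not in SWEDISH_RANK:
            ch = strip_accents(ch)  # e.g. 'á' is treated as plain 'a'
        # Anything still unknown sorts after the whole alphabet.
        key.append(SWEDISH_RANK.get(ch, len(SWEDISH_ALPHABET)))
    return key

def accent_insensitive_key(word):
    """Sort/compare key that ignores every accent, including the one on ä."""
    return strip_accents(word.lower())

words = ["Ära", "Zebra", "Ábel", "Apa"]
swedish_order = sorted(words, key=swedish_key)
plain_order = sorted(words, key=accent_insensitive_key)
```

With these keys, the Swedish order puts "Ära" after "Zebra", while the accent-insensitive order puts it between "Apa" and "Zebra". The same keys decide search matches: a search for "Abel" finds "Ábel" under both, but a search for "Ara" finds "Ära" only under the accent-insensitive key.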
I just finished reading chapter 8 of the MySQL manual, covering new features in MySQL version 4.1, http://www.mysql.com/doc/en/Charset.html
The content sounds promising for applications like Wikipedia. As far as I know, Wikipedia still runs MySQL 4.0.12, but once 4.1 becomes more stable it would be possible to use these new features. Among them is the ability to specify the collation order for a fulltext search (select match()) or a sort (order by) operation. Each database, table, or column can also have a default collation. This means the Hungarian Wikipedia could have the Hungarian collation as its default, while a user could set a personal preference to use the English (accent-insensitive) collation for her searches. I'm looking forward to this.
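A minimal sketch of how the default-plus-preference idea might look on the application side, in Python. Everything here is an assumption for illustration: the table `article`, the column `title`, the helper names, and the choice of collation names, which would have to match the column's character set. It only covers the sort (order by) case.

```python
# Collation names the application is willing to place into a query.
# Illustrative only; the right names depend on the column's character set.
ALLOWED_COLLATIONS = {"latin2_hungarian_ci", "latin2_general_ci"}

def effective_collation(wiki_default, user_preference=None):
    """Use the user's preferred collation if set, else the wiki's default."""
    chosen = user_preference or wiki_default
    if chosen not in ALLOWED_COLLATIONS:
        # Never interpolate an unchecked name into SQL.
        raise ValueError("unsupported collation: %r" % (chosen,))
    return chosen

def title_listing_query(wiki_default, user_preference=None):
    """Build a sorted title listing with an explicit COLLATE clause."""
    collation = effective_collation(wiki_default, user_preference)
    # 'article' and 'title' are hypothetical table and column names.
    return "SELECT title FROM article ORDER BY title COLLATE " + collation

default_query = title_listing_query("latin2_hungarian_ci")
override_query = title_listing_query("latin2_hungarian_ci", "latin2_general_ci")
```

The whitelist is there because a collation name cannot be passed as a bound parameter, so it has to be checked before it goes into the query text.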
wikitech-l@lists.wikimedia.org