Hi list!
This message is the result of investigating why the new suite for the Polish Wikipedia gives such unpredictable search results when searching for words with non-latin1 characters. The same applies to Meta, which also uses UTF-8 (right?). The Japanese, Korean, etc. Wikipedias are probably at risk as well, but I haven't tested them.
I focused on three-letter words, which should in theory be legal search terms.
Three-letter words (3LW) with one Polish-language-specific letter (PLSL) can be searched, although many irrelevant matches are returned.
A 3LW with two PLSL fails with the message "Badly formed search query".
A 3LW with three PLSL behaves the same as one with a single PLSL (strange!).
Exactly the same happens when I use Chinese ideograms instead of PLSL.
It seems that word lengths aren't recognized correctly (the "Badly formed..." message is also shown when I enter a 1-letter word!) and/or that words containing PLSL (or ideograms) are split in some strange way. Judging by the search results, my impression is that non-latin1 letters are sometimes treated as separators and sometimes as wildcards.
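To illustrate the kind of mismatch I suspect, here is a minimal Python sketch. It is only an assumption about the cause, not something checked against the actual search code: it compares the length of a word in characters with its length in UTF-8 bytes, and shows what a word looks like if its UTF-8 bytes are misread as latin1 and then split on non-letters. The helper names (char_len, byte_len, latin1_tokens) and the sample words are made up for this example.

```python
def char_len(word: str) -> int:
    """Length as a reader sees it: number of characters."""
    return len(word)


def byte_len(word: str) -> int:
    """Length as byte-oriented code sees it: number of UTF-8 bytes."""
    return len(word.encode("utf-8"))


def latin1_tokens(word: str) -> list[str]:
    """Misread the UTF-8 bytes as latin1, then split on non-letters.

    This mimics a hypothetical latin1-only tokenizer; it is not the
    real search code.
    """
    misread = word.encode("utf-8").decode("latin-1")
    tokens, current = [], ""
    for ch in misread:
        if ch.isalpha():
            current += ch
        else:
            # A byte that is not a latin1 letter acts as a separator.
            if current:
                tokens.append(current)
            current = ""
    if current:
        tokens.append(current)
    return tokens


# Hypothetical samples: zero, one, and three Polish letters, plus one ideogram.
samples = ["kot", "\u0142za", "\u017c\u0105\u0142", "\u4e2d"]
for w in samples:
    print(w, char_len(w), byte_len(w), latin1_tokens(w))
```

In this sketch each Polish letter takes two bytes and the ideogram takes three, so any length check done on bytes disagrees with the length the user typed, and the second byte of a Polish letter can come out as a non-letter that splits the word.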
I guess that other bugs in searching for non-latin1 words can be traced back to this.
It would be nice for Polish users to have this fixed before the upgrade to Phase III. Other non-latin1 users would perhaps be happier too :-)
User:Youandme