On Wed, 2002-05-22 at 14:43, Jan.Hidders wrote:
On Wed, May 22, 2002 at 11:14:06AM -0700, Brion L. VIBBER wrote:
If I actually make the software check return codes and report useful error messages, I see:
You have an error in your SQL syntax near 'MATCH (cur_ind_title) AGAINST ("Fläche") ) AND cur_t' at line 3
Looks like mismatched parentheses. I've replaced (what I think is) the rest of the \w regexps with [\w\x80-\xff], and it looks improved, but as much of the search code is a mystery to me, I can't guarantee anything!
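As a rough illustration of that substitution (in Python rather than the PHP search code, which isn't shown here): with a byte-oriented pattern, \w only covers ASCII, so the two UTF-8 bytes of "ä" split the word, while adding \x80-\xff to the class keeps it in one piece. The helper name `words` and the pattern names are made up for this sketch.

```python
import re

# UTF-8 bytes of the search term from the error message above.
term = "Fläche".encode("utf-8")  # b'Fl\xc3\xa4che'

# For bytes patterns, Python's \w matches only [a-zA-Z0-9_].
ASCII_WORD = re.compile(rb"\w+")
# Same class, widened to accept every byte from 0x80 to 0xff.
BYTE_WORD = re.compile(rb"[\w\x80-\xff]+")

def words(pattern, data):
    """Return all runs of 'word' bytes in data (hypothetical helper)."""
    return pattern.findall(data)

print(words(ASCII_WORD, term))  # the word is split at the non-ASCII bytes
print(words(BYTE_WORD, term))   # the word stays whole
```

Note that the widened class accepts any high byte, including byte sequences that are not valid UTF-8 and non-letter characters; it trades precision for not dropping letters.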
It looks OK to me. Having said that, can I, as one of the contributors of the search code, make a small protest here? [But feel free to ignore me, because I've been away for too long without due notice.] I checked what \w actually matches on my system here, and it matches the following ASCII codes (decimal):
 48 -  57 ( '0' - '9' )
 65 -  90 ( 'A' - 'Z' )
 95       ( '_' )
 97 - 122 ( 'a' - 'z' )
170
181
192 - 214
216 - 246
248 - 255
So that includes all the German characters (or has our encoding scheme changed?)
If you'll recall, we're switching all the wikipedias to UTF-8 (if we don't do it now, we'll just end up doing it in a few years and it'll be more painful). The above covers some, but not all, UTF-8 sequences that encode valid letters.
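To make "some, but not all" concrete, here is a small Python sketch that checks the UTF-8 bytes of a few letters against the byte values listed above. It assumes the lowercase range is 97 - 122; the names `LATIN1_WORD` and `covered` are invented for the example.

```python
# Byte values that \w matched on Jan's system, per the list above
# (assumption: the lowercase range is 97-122).
LATIN1_WORD = (
    set(range(48, 58)) | set(range(65, 91)) | {95} | set(range(97, 123))
    | {170, 181}
    | set(range(192, 215)) | set(range(216, 247)) | set(range(248, 256))
)

def covered(ch):
    """True if every UTF-8 byte of ch is in the listed word-character set."""
    return all(b in LATIN1_WORD for b in ch.encode("utf-8"))

for ch in "aµäö":
    # Show each character, its UTF-8 bytes, and whether all bytes are covered.
    print(ch, [hex(b) for b in ch.encode("utf-8")], covered(ch))
```

"µ" encodes as C2 B5 and happens to be covered, but "ä" (C3 A4) and "ö" (C3 B6) are not, because their second bytes fall in the 128 - 191 range that the list mostly skips.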
In an ideal world, locale settings would apply (and work correctly and consistently!) and \w would match everything necessary, but... (PHP 4.1.0 has some sort of special UTF-8 mode for regexps that might or might not be useful here.)
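For comparison, this is roughly what the "ideal world" behaviour looks like in a regex engine that works on decoded text. The sketch below is Python, where \w is Unicode-aware for str patterns; it says nothing about how PHP's UTF-8 mode behaves, and the sample text is made up.

```python
import re

raw = "Fläche Gödel".encode("utf-8")  # bytes as they might arrive from a form
text = raw.decode("utf-8")            # decode first, then match on characters

# For str patterns, Python's \w matches Unicode letters by default.
found = re.findall(r"\w+", text)
print(found)
```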
and it should have worked with simply \w (and it did; as I said, searching for "Go"del" went fine). This is important because (1) our error reporting should be as tight as possible and not just give an empty search result if the user types a character that isn't indexed, and (2) if something really did go wrong before, then your quick fix is now hiding a bug that may come back to haunt us later.
Oh, I don't doubt that at all. :)
-- brion vibber (brion @ pobox.com)