Hello,
I wrote a new version of Wikipedia suggest (version 0.3), which includes:
- an option to enable the use of MemoryQuery (-m) in the TcpQuery command; you can also specify the number of threads you want;
- a heuristic to choose which redirection to keep (based on its similarity with the query);
- handling of articles with different capitalizations (all capitalization variants are kept);
- the patch from Nick Jenkins.
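The redirection heuristic could look roughly like the following Python sketch. The email does not say which similarity measure is used, so this assumes difflib's SequenceMatcher ratio on lowercased titles; `similarity` and `pick_redirect` are hypothetical names, not functions from the Wikipedia suggest sources.

```python
from difflib import SequenceMatcher
from typing import Iterable


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pick_redirect(query: str, candidates: Iterable[str]) -> str:
    """Return the redirect title that is most similar to the query.

    On ties, max() keeps the first candidate seen.
    Raises ValueError if there are no candidates.
    """
    return max(candidates, key=lambda title: similarity(query, title))
```

For example, `pick_redirect("united sta", ["USA", "United States", "America"])` keeps "United States", because that title shares the longest matching prefix with the query.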
I will regenerate the indexes for English and French on suggest.speedblue.org tomorrow. You can already download the sources at: http://suggest.speedblue.org/tgz/wikipedia-suggest-0.3.tar.gz
I also looked at the MySQL tables (page and pagelinks), but I have two questions:
- Is there a way to get the target of a redirection? There is an is_redirected flag in the page table, but I do not see any information about the article it redirects to.
- Is the URL available in a table?
If these two pieces of information are available in the tables, I will write an SQL version of the analyzer. If they are not, what would be the best way to write an analyzer for Wikipedia? (Work on the pages-articles.xml file, http://download.wikipedia.org/enwiki/20060717/enwiki-20060717-pages-articles.xml.bz2?)
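As an illustration of the pages-articles.xml route, a redirect's target might be read from the page text itself. The Python sketch below assumes that redirect pages begin with wikitext of the form `#REDIRECT [[Target]]`, and it strips XML namespaces so that it does not depend on the export schema version; `iter_redirects` and `local_name` are hypothetical helpers, not part of Wikipedia suggest.

```python
import re
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Optional, Tuple

# Assumption: redirect pages start with wikitext such as "#REDIRECT [[Target]]".
# Matching is case-insensitive; a "#section" or "|label" suffix is dropped.
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*:?\s*\[\[([^\]|#]+)", re.IGNORECASE)


def local_name(tag: str) -> str:
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_redirects(stream: IO[bytes]) -> Iterator[Tuple[str, str]]:
    """Yield (redirect_title, target_title) pairs from a pages-articles dump."""
    title: Optional[str] = None
    text: Optional[str] = None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            match = REDIRECT_RE.match(text or "")
            if title is not None and match:
                yield title, match.group(1).strip()
            # Reset per-page state and free this page's subtree.
            title = text = None
            elem.clear()
```

For the real dump, the compressed file could be streamed without unpacking it first, for instance by opening it with `bz2.open("enwiki-20060717-pages-articles.xml.bz2", "rb")` and passing the resulting file object to `iter_redirects`. Clearing each page after it is processed keeps memory use modest.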
Best regards,
Julien Lemoine