Hi Julien,
== UI stuff ==
>> The only two constructive suggestions for the User-Interface that I would make are:
> These two features are available now. You can use them :)
Excellent! Thank you, they work great.
A few more suggestions / comments / ideas after compiling it:
== Small addition for README file ==
Perhaps add something like this to the end of the README file, about the library requirements (these worked for me):
# If you're on Debian/Ubuntu, you can probably install expat & icu with: apt-get install libicu34 libexpat1 libexpat1-dev libicu34-dev
# Then compile with: ./configure && make
== Pull data from MySQL? ==
Currently the data used by cmd/Analyzer appears to be generated from the XML data file (e.g. the 1.1 gigabyte XmlEnWikipedia.tar.bz2 file).
Would it be possible instead to generate the required data from a MySQL database? That way the whole XML step could be avoided by connecting directly to the database and generating the required data from it.
This way you _may_ also be able to get away with reading far less data, so it could potentially be quicker. It would also be more up-to-date, because there would be no intermediate step adding to the latency.
For example, you could get the number of links to each page with something like: select pl_title, count(*) from pagelinks group by pl_title;
... and you could get the valid page names by doing something like this: select page_title from page where page_namespace = 0;
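If the column names line up, you could probably even combine the two, so that you only count links pointing at real main-namespace articles. I'm assuming the standard MediaWiki schema here (pl_title / pl_namespace on pagelinks, page_title / page_namespace on page), so treat this as a rough, untested sketch rather than something I've run against the enwiki database:

  -- count incoming links, but only for pages that actually exist in namespace 0
  select p.page_title, count(*) as inlinks
  from page p
  join pagelinks pl
    on pl.pl_title = p.page_title        -- link target title
   and pl.pl_namespace = p.page_namespace
  where p.page_namespace = 0             -- main/article namespace only
  group by p.page_title;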
== Maybe include JS / HTML files ==
Maybe bundle the JS / HTML / PHP / Perl files that you're using at http://suggest.speedblue.org/ with the wikipedia-suggest.tar.gz archive? Basically I'd like to try and reproduce what you've got happening on your website so that I can play with it too ;-)
== Documentation for Gumbies ==
Potentially stupid question, but how do I use the resulting executable files in the 'cmd' directory? I couldn't see documentation on this in the wikipedia-suggest.tar.gz archive (but it is version 0.1, so it's not unexpected). Or to put it differently, what series of commands did you use to get your http://suggest.speedblue.org/ site working?
E.g. wget http://www2.speedblue.org/download/XmlEnWikipedia.tar.bz2 # How did you create XmlEnWikipedia.tar.bz2, by the way? Did it come from doing something to http://download.wikimedia.org/enwiki/20060717/enwiki-20060717-pages-articles... ?
cmd/Analyzer XmlEnWikipedia.tar.bz2 fsa.bin page.bin # (i.e. not sure what creates fsa.bin & page.bin, or how to persuade it to do so)
Then presumably either cmd/Query or cmd/TcpQuery is invoked on fsa.bin and page.bin, and then connected to somehow in order to query for results. # What's the difference between these two versions? E.g. is one the DiskQuery implementation, and the other the MemoryQuery implementation? Or is it just how you connect to them (e.g. one via TCP/IP, the other via some other method)?
Sorry to ask so many questions!
All the best, Nick.