Hi Julien,
== UI stuff ==
>> The only two constructive suggestions for the User-Interface that I would make are:
> These two features are available now. You can use them :)
Excellent! Thank you, they work great.
A few more suggestions / comments / ideas after compiling it:
== Small addition for README file ==
Perhaps add something like this to the end of the README file, about the library requirements (these worked for me):
# If you're on Debian/Ubuntu, you can probably install expat & icu with: apt-get install libicu34 libexpat1 libexpat1-dev libicu34-dev
# Then compile with: ./configure && make
== Pull data from MySQL? ==
Currently the data used by cmd/Analyzer appears to be generated from the XML data file (e.g. the 1.1 gigabyte XmlEnWikipedia.tar.bz2 file).
Would it be possible instead to generate the required data from a MySQL database? That way the whole XML step could be avoided by connecting directly to the database and generating the required data from it.
This way you _may_ also be able to get away with reading far less data, so it could potentially be quicker. It would also be more up-to-date, because there would be no intermediate step adding to the latency.
For example, you could get the number of links to each page with something like: select pl_title, count(*) from pagelinks group by pl_title;
... and you could get the valid page names by doing something like this: select page_title from page where page_namespace = 0;
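If the column names line up, you could probably even combine the two, so that you only count links pointing at real main-namespace articles. I'm assuming the standard MediaWiki schema here (pl_title / pl_namespace on pagelinks, page_title / page_namespace on page), so treat this as a rough, untested sketch rather than something I've run against the enwiki database:

  -- count incoming links, but only for pages that actually exist in namespace 0
  select p.page_title, count(*) as inlinks
  from page p
  join pagelinks pl
    on pl.pl_title = p.page_title        -- link target title
   and pl.pl_namespace = p.page_namespace
  where p.page_namespace = 0             -- main/article namespace only
  group by p.page_title;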
== Maybe include JS / HTML files ==
Maybe bundle the JS / HTML / PHP / Perl files that you're using at http://suggest.speedblue.org/ with the wikipedia-suggest.tar.gz archive? Basically I'd like to try and reproduce what you've got happening on your website so that I can play with it too ;-)
== Documentation for Gumbies ==
Potentially stupid question, but how do I use the resulting executable files in the 'cmd' directory? I couldn't see documentation on this in the wikipedia-suggest.tar.gz archive (but it is version 0.1, so it's not unexpected). Or to put it differently, what series of commands did you use to get your http://suggest.speedblue.org/ site working?
E.g. wget http://www2.speedblue.org/download/XmlEnWikipedia.tar.bz2 # How did you create XmlEnWikipedia.tar.bz2, by the way? Did it come from doing something to http://download.wikimedia.org/enwiki/20060717/enwiki-20060717-pages-articles... ?
cmd/Analyzer XmlEnWikipedia.tar.bz2 fsa.bin page.bin # (i.e. not sure what creates fsa.bin & page.bin, or how to persuade it to do so)
Then presumably either cmd/Query or cmd/TcpQuery is invoked on fsa.bin and page.bin, and then connected to somehow in order to query for results. # What's the difference between these two versions? E.g. is one the DiskQuery implementation, and the other the MemoryQuery implementation? Or is it just how you connect to them (e.g. one via TCP/IP, the other via some other method)?
Sorry to ask so many questions!
All the best, Nick.