Re: [Wikitech-l] Wikipedia Suggest

16 Aug 2006

Hi Nick,
Nick Jenkins wrote:
...
Sounds good, and I really like that it covers the top 10 languages.
I think _maybe_, though, it could be useful to have two variations of the Analyzer (just as you have two variations of TcpQuery: one
that uses DiskQuery, and one that uses MemoryQuery). With Analyzer, it could be good to have one that connects to MySQL and
gets the data directly from the database, and one that uses the downloaded XML dumps. This way, people can use whichever one is most
appropriate for them. For example, someone running a big MediaWiki site who wanted to look at the possibility of using
suggestion searching probably wouldn't want to create an XML dump and then run Analyzer on it (this would be slow, involve
too many steps, and take a lot of disk space). Rather, if possible, in that situation it would be nice to create the compiled
files directly from the database.
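To make the suggestion concrete, here is a minimal C++ sketch of how the two variations could share one compile step, in the same spirit as TcpQuery's DiskQuery/MemoryQuery split. All names here (ArticleSource, InMemorySource, analyze) are hypothetical and not taken from the wikipedia-suggest sources; the in-memory class stands in for a real XML-dump reader or MySQL reader, and the "compile" step is reduced to sorting.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface: the code that builds the .bin files would depend
// only on this, so an XML-dump reader and a MySQL reader could be swapped.
class ArticleSource {
public:
    virtual ~ArticleSource() = default;
    // All article titles to be compiled into the suggestion index.
    virtual std::vector<std::string> titles() = 0;
};

// In-memory stand-in. A real dump-based source would parse the XML dump,
// and a real database-backed source would run a SELECT against the wiki.
class InMemorySource : public ArticleSource {
public:
    explicit InMemorySource(std::vector<std::string> list)
        : titles_(std::move(list)) {}
    std::vector<std::string> titles() override { return titles_; }

private:
    std::vector<std::string> titles_;
};

// Source-independent step (here just sorting the titles); returns how many
// titles were read from the source.
std::size_t analyze(ArticleSource& source, std::vector<std::string>& sorted) {
    sorted = source.titles();
    std::sort(sorted.begin(), sorted.end());
    return sorted.size();
}
```

With this split, a MySQL-backed source would only have to implement titles(); the code that writes the compiled files would not need to change.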
To try to help with this, I've modified a copy of Analyzer.cpp to add basic importing from MySQL (article names only, not redirects
or article counts; no downloaded files are needed). The rough file (which still needs work for redirects and
article counts) is here: http://files.nickj.org/MediaWiki/MysqlAnalyzerCmd.cpp
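For anyone curious about the database side, here is a minimal sketch of the title handling such an import might need, assuming the MediaWiki 1.5+ `page` table (columns page_title, page_namespace, page_is_redirect). The query and the helper names (kArticleQuery, displayTitle, displayTitles) are illustrative assumptions, not what MysqlAnalyzerCmd.cpp actually does. The fetching itself (mysql_real_connect, mysql_query, mysql_fetch_row from the MySQL C API) is left out so the sketch compiles without libmysqlclient.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Assumed query: main-namespace articles only, skipping redirects. A wiki
// configured with a table prefix ($wgDBprefix) would need it prepended.
const char* const kArticleQuery =
    "SELECT page_title FROM page "
    "WHERE page_namespace = 0 AND page_is_redirect = 0";

// MediaWiki stores titles with underscores in place of spaces; convert a
// raw page_title value to the form a user would type into the search box.
std::string displayTitle(std::string raw) {
    std::replace(raw.begin(), raw.end(), '_', ' ');
    return raw;
}

// Convert every fetched value (e.g. row[0] of each mysql_fetch_row result).
std::vector<std::string> displayTitles(const std::vector<std::string>& rows) {
    std::vector<std::string> out;
    out.reserve(rows.size());
    for (const std::string& r : rows) {
        out.push_back(displayTitle(r));
    }
    return out;
}
```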
Please note that I have not used C or C++ in a _very_ long time, so if it looks like I have done something silly, then that is almost
certainly correct. :-)
To compile and run this on a Debian/Ubuntu system, I did this:
# Install required MySQL libraries
apt-get install libmysqlclient15-dev
cd cmd
# Compile:
g++ -DHAVE_CONFIG_H -I. -I. -I.. -I../expat/lib -g -O2 -O3 -MT MysqlAnalyzerCmd.o -MD -MP \
    -MF ".deps/MysqlAnalyzerCmd.Tpo" -c -o MysqlAnalyzerCmd.o MysqlAnalyzerCmd.cpp
# Link (note: needs the -lmysqlclient parameter):
g++ -g -O2 -O3 -o MysqlAnalyzer -L../tools -L../serialization -L../analyzer \
    MysqlAnalyzerCmd.o -lanalyzer -lserialization -ltools -lexpat -lglib-2.0 -lmysqlclient
# Run (change the hostname, username, password and database-name parameters as required):
./MysqlAnalyzer localhost wikiuser FakePasswd wikidb
If it is working, it should print out something like this:
Connection success
Found 12345 articles

Then use the generated .bin files with TcpQuery as usual.
Thank you very much for your contribution, Nick. It is indeed better to
have the two versions. I preferred working on XML dumps at the beginning
since it was faster and easier for me to update wikipedia-suggest, and it
needed less time/memory/CPU power (I have the analyzer and the SQL server
on the same computer).
The next step of wikipedia-suggest for me is to write an SQL analyzer
(probably using OTL: otl.sf.net, and unixODBC); unfortunately, I will
not be able to write it before September.
...
Also, there is a small diff for WSuggest.js to fix a small problem in my autocomplete code. For example, suppose the user typed
"Aer", then moved the text cursor back to between the 'A' and the 'e', typed 'm' (to make "Amer"), then typed 'p' (to try to
spell "Amper"). However, between typing the 'm' and the 'p', the cursor jumps to the end of the text box as the script tries to
autocomplete "American", so pressing 'p' gives "Amerp", not "Amper". To prevent this, the script now only tries to
autocomplete if the cursor is at the end of the text field. The diff is here:
http://files.nickj.org/MediaWiki/WSuggest.js-0.4-autocomplete-update.txt
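The rule in that fix can be sketched in a few lines. This is a hypothetical C++ rendering for illustration only (tryAutocomplete is an invented name); WSuggest.js itself is JavaScript and its actual code may differ, for example in case handling or in how it selects the completed part of the text.

```cpp
#include <cstddef>
#include <string>

// Returns true, and writes the completed text to `completed`, only when the
// cursor is at the end of the text and `suggestion` extends what was typed.
bool tryAutocomplete(const std::string& text, std::size_t cursorPos,
                     const std::string& suggestion, std::string& completed) {
    // Cursor not at the end: the user is editing earlier characters, so
    // moving the cursor to the end would misplace the next keystroke.
    if (cursorPos != text.size()) {
        return false;
    }
    // The suggestion must be strictly longer and start with the typed text
    // (case-sensitive here for simplicity).
    if (suggestion.size() <= text.size() ||
        suggestion.compare(0, text.size(), text) != 0) {
        return false;
    }
    completed = suggestion;
    return true;
}
```

In the "Amer" example the cursor sits at position 2 (after "Am"), so the text is left alone and the next 'p' produces "Amper" as intended.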
Thank you. I have released wikipedia-suggest 0.41 with your contribution:
http://suggest.speedblue.org/tgz/wikipedia-suggest-0.41.tar.gz
Best Regards.
Julien Lemoine
