Re: [Wikitech-l] Wikipedia Suggest

16 Aug 2006

Hi Nick,

Nick Jenkins wrote:
...
  Sounds good, and I really like it having the top 10 languages.

 I think _maybe_ though it could be useful to have two variations of the Analyzer
 (like you have two variations of TcpQuery - one that uses DiskQuery, and one that
 uses MemoryQuery). With Analyzer though, it could be good to have one that connects
 to MySQL and gets the data directly from the database, and one that uses the
 downloaded XML dumps. This way, people can use whichever one is most appropriate
 for them. For example, for someone running a big MediaWiki site who wanted to look
 at the possibility of using suggestion searching, they probably wouldn't want to
 create an XML dump, then run Analyzer on the XML dump (this would be too slow, and
 too many steps, and take a lot of disk space). Rather, if possible, in that
 situation it would be nice to create the compiled files directly from the database.

 To try and help with this, I've modified a copy of Analyzer.cpp to add basic
 importing (but just of the article names, not redirects or article counts) from
 MySQL (i.e. it does not use any downloaded files). The rough file (which still
 needs work for redirects + article counts) is here:
 http://files.nickj.org/MediaWiki/MysqlAnalyzerCmd.cpp
 Please note that I have not used C or C++ in a _very_ long time, so if it looks
 like I have done something silly then that is almost certainly correct. :-)

 To compile and run this on a Debian/Ubuntu system, I did this:

 # Install required MySQL libraries
 apt-get install libmysqlclient15-dev
 cd cmd
 # Compile:
 g++ -DHAVE_CONFIG_H -I. -I. -I.. -I../expat/lib -g -O2 -O3 \
     -MT MysqlAnalyzerCmd.o -MD -MP -MF ".deps/MysqlAnalyzerCmd.Tpo" \
     -c -o MysqlAnalyzerCmd.o MysqlAnalyzerCmd.cpp
 # Link: (Note: needs " -lmysqlclient" parameter)
 g++ -g -O2 -O3 -o MysqlAnalyzer -L../tools -L../serialization -L../analyzer \
     MysqlAnalyzerCmd.o -lanalyzer -lserialization -ltools -lexpat -lglib-2.0 -lmysqlclient
 # Run (change hostname / username / password / database-name params as required) :
 ./MysqlAnalyzer localhost wikiuser FakePasswd wikidb

 If it is working, it should print out something like this:
 -----------------------------
 Connection success
 Found 12345 articles
 -----------------------------
 Then use the .bin files as per usual on TcpQuery.

    Thank you very much for your contribution, Nick. It is indeed better to
have both versions. I preferred working on XML dumps at the beginning
since that made it faster and easier for me to update wikipedia-suggest,
and it needed less time, memory, and CPU power (I have the analyzer and
the SQL server on the same computer).
The next step for wikipedia-suggest, for me, is to write an SQL analyzer
(probably using OTL: otl.sf.net, and unixODBC), but unfortunately I will
not be able to write it before September.

...
  Also there is a small diff for WSuggest.js to fix a small problem in my
 autocomplete stuff. For example, suppose the user typed "Aer", then moved the
 text cursor back to be between the 'A' and the 'e', typed 'm' (to make "Amer"),
 then typed 'p' (to try and spell 'Amper'). However, in between typing the 'm'
 and the 'p', the cursor position will jump to the end of the text box to try and
 autocomplete "American", so the result of pressing 'p' will be 'Amerp', not
 'Amper'. To prevent this, it will now only try to autocomplete if the cursor
 position is at the end of the text field. Diff is here:
 http://files.nickj.org/MediaWiki/WSuggest.js-0.4-autocomplete-update.txt
    Thank you, I have released wikipedia-suggest 0.41 with your contribution:
http://suggest.speedblue.org/tgz/wikipedia-suggest-0.41.tar.gz
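
For readers following along, the "only autocomplete when the cursor is at the end" rule can be sketched roughly as below. This is only an illustration, not the actual diff: the function names caretIsAtEnd and tryAutocomplete are hypothetical and do not come from WSuggest.js, and the sketch assumes the text field exposes the standard value, selectionStart, selectionEnd and setSelectionRange members.

```javascript
// Hypothetical helpers for illustration; these names are not from WSuggest.js.

// True only when nothing is selected and the caret sits after the last character.
function caretIsAtEnd(input) {
  return input.selectionStart === input.value.length &&
         input.selectionEnd === input.value.length;
}

// Completes the field with `suggestion` only if the caret is at the end and
// the suggestion extends what the user typed (compared case-insensitively).
// Returns true if the field was changed, false if it was left untouched.
function tryAutocomplete(input, suggestion) {
  if (!caretIsAtEnd(input)) {
    return false; // the user is editing mid-text, so do not move the caret
  }
  const typed = input.value;
  if (typed.length === 0 ||
      suggestion.length <= typed.length ||
      suggestion.slice(0, typed.length).toLowerCase() !== typed.toLowerCase()) {
    return false;
  }
  input.value = typed + suggestion.slice(typed.length);
  // Select the appended part so that the next keystroke replaces it.
  input.setSelectionRange(typed.length, input.value.length);
  return true;
}
```

With "Amer" in the field and the caret after the 'm', tryAutocomplete leaves the field alone, so typing 'p' gives "Amper". With the caret at the end it would fill in "American" and select the added "ican". In a page, something like this would presumably be called from the key handler of the search box.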

Best Regards.
Julien Lemoine
