Hello Nick,
* Nick Jenkins <nickpj(a)gmail.com> [2006-08-04 16:54:20 +1000]:
A few more suggestions / comments / ideas after
compiling it:
== Small addition for README file ==
Perhaps add this to the end of the README file (about library
requirements) - these worked for me:
# If you're on Debian/Ubuntu, you can probably install expat & icu with:
apt-get install libicu34 libexpat1 libexpat1-dev libicu34-dev
# Then compile with:
./configure
make
Thank you for your contribution, I will add it to the README file (you
will also need to install the cppunit-dev package for make check).
== Pull data from MySQL? ==
Currently the data used by cmd/Analyzer appears to be generated from
the XML data file (e.g. the 1.1 gigabyte XmlEnWikipedia.tar.bz2 file).
Could it be possible instead to generate the required data from a
MySQL database? That way the whole XML step could be avoided by just
connecting directly to the database, and generating the required data
directly from the database.
Also, this way you _may_ be able to get away with reading far less
data, so it could potentially be quicker. It would also be more
up-to-date, because there would be no intermediate step adding to the
latency.
For example, this way you could get the number of links to each page
by doing something like this:
select pl_title, count(*) from pagelinks group by pl_title;
... and you could get the valid page names by doing something like this:
select page_title from page where page_namespace = 0;
It is definitely possible, and it would probably be the best way to
integrate it into Wikipedia. I preferred working with XML files to produce
a proof of concept because I do not know the details of the Wikipedia
database architecture (and I have other proofs of concept in mind :))
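For example, assuming a standard MediaWiki schema, the extraction could be
as simple as the following sketch (the database name, user, and output file
are placeholders, not anything from the actual setup):

```shell
# Sketch only - database name, user and output file are placeholders.
cat > extract.sql <<'EOF'
SELECT pl_title, COUNT(*) FROM pagelinks GROUP BY pl_title;
SELECT page_title FROM page WHERE page_namespace = 0;
EOF
# mysql -u wikiuser -p wikidb < extract.sql > rawdata.txt
```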
== Maybe include JS / HTML files ==
Maybe bundle the JS / HTML / PHP / Perl files that you're using at
http://suggest.speedblue.org/ with the wikipedia-suggest.tar.gz
archive? Basically I'd like to try and reproduce what you've got
happening on your website so that I can play with it too ;-)
No problem, I will put them in the tarball.
== Documentation for Gumbies ==
Potentially stupid question, but how do I use the resulting executable
files in the 'cmd' directory? I couldn't see documentation on this in
the wikipedia-suggest.tar.gz archive (but it is version 0.1, so it's
not unexpected). Or to put it differently, what series of commands did
you use to get your
http://suggest.speedblue.org/ site working?
The tarball contains only the analyzer/query binaries. The analyzer needs
to be called with two arguments: the first one is a file containing all
the XML filenames to analyze (one filename per line), and the second is
the path to the redirect.xml file (more details about how to obtain these
XML and redirect.xml files are given below).
The query and tcpquery binaries load the output of the analyzer to perform
queries.
Hmm, I did not know about the availability of these XML archives. In fact,
I wrote scripts to grab the whole content of Wikipedia. You will find
details about the grabbing in the history section (June 2006):
http://suggest.speedblue.org/history.php
cmd/Analyzer XmlEnWikipedia.tar.bz2 fsa.bin page.bin
# (i.e. not sure what creates fsa.bin & page.bin, or how to persuade
# it to do so)
fsa.bin and page.bin are created by analyzer.
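So an invocation would look something like the sketch below (the directory
and file names are only an illustration, not what I actually use):

```shell
# Illustration only - directory and file names are placeholders.
# Build the file list expected as the first argument (one XML file per line):
printf '%s\n' articles/Paris.xml articles/London.xml > filelist.txt
# Then run the analyzer with the file list and the path to redirect.xml;
# it produces fsa.bin and page.bin:
# cmd/Analyzer filelist.txt redirect.xml
```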
Then presumably either cmd/Query or cmd/TcpQuery is invoked on fsa.bin
and page.bin, and somehow connected to in order to query for results.
# What's the difference between these two versions? E.g. is one the
DiskQuery implementation, and the other the MemoryQuery
implementation? Or is it just how you connect to them (e.g. one via
TCP/IP, the other via some other method?)
* cmd/Query is a command-line version using DiskQuery
* cmd/TcpQuery is a TCP/IP version using DiskQuery
* DiskQuery does not load the automaton and articles into memory; it keeps
only a file descriptor and performs seeks (I implemented it because I have
little RAM available on my server).
* MemoryQuery loads the automaton and articles into memory; it is
thread-safe and you can use it to perform a lot of queries per second.
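To give the idea of the difference in miniature (a sketch of the two
strategies, not the project's actual code): DiskQuery seeks into the file
for every lookup, while MemoryQuery reads the whole file once and answers
from RAM afterwards.

```shell
# Miniature sketch of the two lookup strategies (placeholder data).
printf 'ParisLondon' > articles.bin
# DiskQuery style: keep the data on disk and seek per lookup
# (here: skip the first 5 bytes, read the next 6-byte "record").
dd if=articles.bin bs=1 skip=5 count=6 2>/dev/null
# MemoryQuery style: load everything once, then slice in memory.
data=$(cat articles.bin)
echo "${data:0:5}"
```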
Sorry for the lack of documentation.
Best Regards.
Julien Lemoine