Hi Nick,
* Nick Jenkins nickpj@gmail.com [2006-08-03 16:16:23 +1000]:
== UI stuff ==
The only two constructive suggestions for the User-Interface that I would make are:
- If the user presses 'Enter' in the search textbox whilst typing out a query, automatically open (redirect to) the first item in the list. That way I can type out what I want, and press Enter to open the first link once I've typed enough to get it to the top of the list, all without using the mouse.
- Allow the user to press the down/up arrows to select/highlight a specified entry on the list (including but not limited to the first item), and press enter to open it. That way again the user can be lazy and can select a link without using the mouse, and without typing out the full title.
Your ideas are good; I added them to my TODO list :)
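
To make the two keyboard behaviours concrete, here is a minimal sketch of the selection logic only, written in Python rather than in the browser code the page would really use. The class and method names are hypothetical and are not taken from the suggest service.

```python
class SuggestionList:
    """Tracks which suggestion is highlighted (hypothetical helper)."""

    def __init__(self, items):
        self.items = list(items)
        self.selected = 0  # the first item is the default target for Enter

    def down(self):
        # Move the highlight down, stopping at the last item.
        if self.items:
            self.selected = min(self.selected + 1, len(self.items) - 1)

    def up(self):
        # Move the highlight up, stopping at the first item.
        self.selected = max(self.selected - 1, 0)

    def enter(self):
        # Return the title to open, or None when the list is empty.
        return self.items[self.selected] if self.items else None
```

With no arrow key pressed, `enter()` returns the first item, which covers the first suggestion; the arrow handlers cover the second.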
== More Technical stuff ==
- How do you handle pages with the same title, but different capitalization? They're rare, but they do occur. My suspicion from scanning Analyzer.cpp is that you just take the most popular. However, if it's for search, it would be best to include everything (I think).
Yeah, you are right, only the most popular is kept. I wanted a case-insensitive search, and for the moment I have only one article per final node. Including all of them would be better; I also added it to my TODO list.
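
A minimal sketch of what "several articles per final node" could look like, using a Python dictionary as a stand-in for the automaton. The `(title, popularity)` pair layout and the function names are assumptions made for illustration; they are not how Analyzer.cpp stores its data.

```python
from collections import defaultdict

def build_case_insensitive_index(articles):
    """Group titles by their lowercased form, most popular first.

    `articles` is an iterable of (title, popularity) pairs; the pair
    layout and the popularity score are assumptions for this sketch.
    """
    index = defaultdict(list)
    for title, popularity in articles:
        index[title.lower()].append((title, popularity))
    for variants in index.values():
        # Keep every capitalization, ordered by descending popularity.
        variants.sort(key=lambda pair: pair[1], reverse=True)
    return dict(index)

def lookup(index, query):
    """Return all titles matching `query`, ignoring case."""
    return [title for title, _ in index.get(query.lower(), [])]
```

The search stays case insensitive, but the node now returns every variant instead of only the most popular one.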
- Doesn't seem to include redirects. For example, when I search for "Formula weight", it's not listed, but on the EN Wikipedia "Formula weight" is a redirect to "Atomic mass". It would definitely be better to include redirects (in my personal opinion).
I will generate a new index with redirects, and I will give you the size before/after this evening.
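
As a rough illustration of an index that includes redirects and remembers their targets, here is a small Python sketch. The function names and the dictionary layout are hypothetical; only the "Formula weight" to "Atomic mass" example comes from this thread.

```python
def add_redirects(titles, redirects):
    """Return a mapping {searchable title: article to open}.

    `titles` holds real article titles and `redirects` maps a redirect
    title to its target; both inputs are assumptions for this sketch.
    """
    targets = {title: title for title in titles}
    for source, target in redirects.items():
        if target in targets:  # skip redirects pointing at missing articles
            targets.setdefault(source, target)  # real articles win over redirects
    return targets

def describe(targets, title):
    """Format a suggestion, showing the target when it is a redirect."""
    target = targets[title]
    return title if target == title else f"{title} -> {target}"
```

For example, `describe(targets, "Formula weight")` gives `"Formula weight -> Atomic mass"` once that redirect has been added.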
However, the downside of including these two things is that the amount of data that you need to store goes up. I've actually had a go at a very similar problem (storing a memory index of all article names, but meeting the two conditions specified above, plus for redirects I would also store the name of the article it redirected to [something which could potentially also be useful for your suggest service if you wanted to show this information too]). However, this was in PHP, and it was a complete memory hog (think > 1 GB for the memory index). My solution (since it was only for me) was to just "buy more RAM"; however, I like your approach of getting more efficient. By the way, the reason I was doing this was for suggesting links that could be made in wiki text - just so you know I'm not in competition with you, but that the problems we face are similar in some ways, and could maybe benefit from a common solution. How big would a memory index be that had these properties (i.e. including all NS:0 articles/redirects, and maybe including the targets for redirects)?
The current automaton (index) of all Wikipedia articles (without redirects) needs 127 MB; I don't think adding redirects will increase its size much. I will give you the exact size with redirects this evening :) (Paris time).
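
The automaton itself is not described in this message. As a much simpler stand-in, a plain prefix trie in Python shows why shared prefixes keep such an index compact: titles starting with the same letters reuse the same nodes. All names below are hypothetical, and a real automaton would compress far more than this sketch does.

```python
class TrieNode:
    __slots__ = ("children", "is_title")

    def __init__(self):
        self.children = {}     # next character -> child node
        self.is_title = False  # True when a title ends at this node

def insert(root, title):
    """Add one title, reusing nodes for any prefix already stored."""
    node = root
    for ch in title:
        node = node.children.setdefault(ch, TrieNode())
    node.is_title = True

def suggest(root, prefix, limit=10):
    """Return up to `limit` stored titles starting with `prefix`, sorted."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []
    results = []
    stack = [(node, prefix)]
    while stack and len(results) < limit:
        current, text = stack.pop()
        if current.is_title:
            results.append(text)
        # Push in reverse order so the smallest character is visited first.
        for ch in sorted(current.children, reverse=True):
            stack.append((current.children[ch], text + ch))
    return results
```

Redirect titles that share prefixes with existing article titles would add only their differing suffixes as new nodes in a structure like this.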
Best regards,
Julien Lemoine