On 13/04/06, Jakob Voss <jakob.voss@nichtich.de> wrote:
> Search engines don't update their search index live with every new item. The problem with Wikipedia is its size and its rapid rate of change. Normally you would generate a new index every week or every night - and generating a search index for millions of records takes hours! A powerful MediaWiki search engine with a time lag of 1 to 2 days would also be fine for me - you could also think of a smart search engine that works on an old dump in the first pass and checks against the live database in the second.
That would be more than fine. I gather the search db is currently several months out of date? But that wasn't my major complaint.
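The two-pass scheme Jakob describes could be sketched roughly as below. This is a hypothetical illustration, not MediaWiki code: the function names and the in-memory stand-ins for the stale index and the live page table are all invented for the example.

```python
def two_pass_search(query, stale_index, live_pages):
    """Return stale-index hits that still exist in the live data.

    First pass: a cheap substring lookup in an index that may be
    a day or two out of date. Second pass: confirm each candidate
    against the (small, fast) live page table, so deleted or
    renamed pages are filtered out of the results.
    """
    candidates = [title for title in stale_index
                  if query.lower() in title.lower()]
    return [t for t in candidates if t in live_pages]


# Toy data standing in for the old dump's index and the live DB.
stale = ["Main Page", "Search engine", "Deleted article"]
live = {"Main Page", "Search engine"}
print(two_pass_search("search", stale, live))  # → ['Search engine']
```

The point of the split is that the expensive full-text work runs against a snapshot that can be rebuilt on its own schedule, while the per-result liveness check is a cheap primary-key lookup.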
> To get such a powerful search, it's better to build it from scratch as an independent application instead of coding it into MediaWiki (but I'm not a MediaWiki developer, so I may be wrong), so you can optimize for searching only.
Well, it should be as easily accessible as the search box is now.
> SELECT page_id FROM page WHERE page_title RLIKE $regexp AND $conditions LIMIT $limit
That would be nice, but even a simple exact-match mechanism would be a start. You could then add fallbacks, such as all upper case, all lower case, and upper-casing the first letter of each word, if performance is the issue here.
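The fallback idea above amounts to trying a short, ordered list of case variants of the query until one matches an existing title. A minimal sketch (the function name is invented for illustration):

```python
def title_fallbacks(query):
    """Candidate title spellings, tried in order: exact match
    first, then the simple case variants Steve suggests."""
    variants = [
        query,          # exact match
        query.upper(),  # ALL UPPER CASE
        query.lower(),  # all lower case
        query.title(),  # Upper Case First Letter Of Each Word
    ]
    # Preserve order, drop duplicates (e.g. when the query is
    # already all lower case).
    seen = set()
    return [v for v in variants if not (v in seen or seen.add(v))]


print(title_fallbacks("main page"))
# → ['main page', 'MAIN PAGE', 'Main Page']
```

Each variant is an exact-match lookup against the title index, so the whole fallback chain costs at most a handful of indexed queries rather than one expensive pattern scan.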
Steve