Jakob Voss wrote:
> Search engines don't update their search index
> live with every new item.
> The problem with Wikipedia is its size and the quick changes. Normally
> you would generate a new index every week or night - and to generate a
> search index for millions of records takes hours!
I'm not sure whether you're talking about the big web search engines
(Google, Yahoo, MSN) or the search function in MediaWiki here.
There is little excuse for the latter to have any delay. But even
for a big web search engine, it is easy to keep track of how often
each webpage has changed in the past, and to economize on how often
it is revisited. Combined with the high PageRank of en.wp's
RecentChanges (9 of 10), it would be trivial for Googlebot to
revisit this page (or the front page of websites of major
newspapers) every minute or two and make it a high priority to
reindex all pages linked from there. I suppose this is how Google
News works. Why it still takes about a month for Google to update
its index on Wikipedia articles is a mystery to me. Probably it
has to do with a lack of competition. If MSN or Yahoo were
faster, it would force Google to improve.
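
The revisit idea above (track how often a page has changed, and revisit
accordingly) can be sketched roughly as follows. This is only an
illustration under stated assumptions, not a description of how Googlebot
or any real crawler works: the names RevisitScheduler, PageStats,
MIN_INTERVAL and MAX_INTERVAL are hypothetical, the halve/double rule is
one simple choice among many, and PageRank-based prioritization is left
out entirely.

```python
import heapq
from dataclasses import dataclass

MIN_INTERVAL = 60.0          # seconds; assumed floor ("every minute or two")
MAX_INTERVAL = 30 * 86400.0  # seconds; assumed ceiling (about a month)


@dataclass
class PageStats:
    visits: int = 0           # how many times the page was fetched
    changes: int = 0          # how many of those fetches found new content
    interval: float = 3600.0  # current revisit interval; assumed starting value


class RevisitScheduler:
    """Hypothetical scheduler: revisit sooner after a change, later otherwise."""

    def __init__(self):
        self.stats = {}   # url -> PageStats
        self.queue = []   # min-heap of (due_time, url)

    def add(self, url, now=0.0):
        """Register a page and make it due immediately."""
        self.stats[url] = PageStats()
        heapq.heappush(self.queue, (now, url))

    def next_due(self):
        """Pop the (due_time, url) pair that should be fetched next."""
        return heapq.heappop(self.queue)

    def record_visit(self, url, changed, now):
        """Update the change history and schedule the next visit."""
        s = self.stats[url]
        s.visits += 1
        if changed:
            s.changes += 1
            s.interval = max(MIN_INTERVAL, s.interval / 2)  # changed: come back sooner
        else:
            s.interval = min(MAX_INTERVAL, s.interval * 2)  # unchanged: back off
        heapq.heappush(self.queue, (now + s.interval, url))
        return s.interval
```

A crawler loop would call next_due(), fetch the page, compare it with the
stored copy, and then call record_visit() with the result. A page that
changes on every visit converges to the one-minute floor after a handful
of fetches, while a static page drifts toward the monthly ceiling.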
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se