On Dec 8, 2003, at 09:19, Stuardo Rodríguez (StR) wrote:
Hi! I'm new to the list and haven't had time to read all the
mail, but regarding this search stuff... has anyone tried htdig?
ht://Dig uses a web spider to do its indexing, which is less than
ideal. It doesn't understand the structure of the wiki (it would index
"edit this page" links, it can't keep articles and talk pages in
distinct categories, and it doesn't know which pages are redirects),
and it has to spider the site to perform updates. Consider that we've
got over 300,000 pages on the English Wikipedia alone (including talk
pages, user pages, redirects, etc). It could probably be tweaked to
grab updates off of Recentchanges, among other improvements, but I'm
not sure it's the best way to go.
JeLuF has experimented with a search engine based on Lucene
(http://jakarta.apache.org/lucene/), a standalone indexing/search
engine which lets you feed it data and updates however you like. This
could include keeping better track of wiki-specific data and
submitting the text in the form we like, when we want it. Pages could
be indexed immediately on modification, or just the updated pages
could be reindexed periodically.
JeLuF, how did that look? Promising or not?
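To illustrate the pattern being described (this is just a toy sketch, not JeLuF's code and not the Lucene API): the wiki itself pushes page text into the index on save, and only changed pages get reindexed, so there's no need to re-spider the whole site. The `WikiIndex` class and its methods are hypothetical names for this example.

```python
# Toy inverted index demonstrating incremental, push-based updates:
# the wiki feeds the indexer page text directly on each edit.
from collections import defaultdict

class WikiIndex:
    def __init__(self):
        self.postings = defaultdict(set)   # term -> set of page titles
        self.page_terms = {}               # page title -> its terms

    def update_page(self, title, text):
        """Reindex a single page immediately on modification."""
        # Drop the page's old postings first, if it was indexed before.
        for term in self.page_terms.get(title, ()):
            self.postings[term].discard(title)
        terms = set(text.lower().split())
        self.page_terms[title] = terms
        for term in terms:
            self.postings[term].add(title)

    def remove_page(self, title):
        """Remove a deleted (or redirect-converted) page from the index."""
        for term in self.page_terms.pop(title, ()):
            self.postings[term].discard(title)

    def search(self, query):
        """Return titles of pages containing every query term."""
        results = None
        for term in query.lower().split():
            hits = self.postings.get(term, set())
            results = hits if results is None else results & hits
        return results or set()
```

Because updates go page-by-page, wiki-specific knowledge (skip "edit this page" links, keep talk pages separate, drop redirects) lives in the wiki's save hook rather than in a spider's guesswork.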
Also, of course, we can switch the MySQL search back on. The database
server will actually have all the CPU it wants now, so perhaps the
mysterious hanging threads won't plague us anymore. If it sucks again,
we can turn it back off until we figure it out or replace it.
--brion vibber (brion @ pobox.com)