Was this of any use to anyone?
Alex
On 8/8/07, Alex Powell alexp@exscien.com wrote:
Hi,
I've been using the existing MediaWiki search engine and implemented a doc-file search based on the filesearch extension (running the doc through antiword). I realize that Wikipedia now uses Lucene, but I have some suggestions to improve the MySQL search.
First off, I noticed that maintenance rebuildTextIndexes.php has a bug: it doesn't index any namespace other than main. It also needs text on the page, so I made the following hack (line 59):
$u = new SearchUpdate( $s->page_id,
    Title::makeName( $s->page_namespace, $s->page_title ), $revtext );
if ( $u->mNamespace == NS_IMAGE && !$u->mText )
    $u->mText = "File"; // Always have some text for images to force indexing
This allows it to index files with no text, and ensures the namespace is included in the indexed title.
Also, the MySQL ranking is not working at the moment. I changed the query to select a relevance score:
$m2 = str_replace(" IN BOOLEAN MODE", "", $match);
$m2 = str_replace("+", "", $m2);
SELECT page_id, page_namespace, page_title, {$m2} as relevance FROM $page, .$searchindex WHERE page_id=si_masterid AND $match
I've replaced this query with a hacked multiwiki one that shows rank, so I hope that makes sense!
This tip is from the second comment on http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
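For anyone who wants to try the ranking idea outside the wiki, here is a minimal sketch of the string handling described above. The function name buildRankedQuery, the sample table and column names passed to it, and the ORDER BY clause are my own assumptions for illustration; the si_masterid join column is copied from the query above and will differ on a stock install.

```php
<?php
// Hypothetical helper (not part of MediaWiki): turn a boolean-mode MATCH
// clause into a ranked query, following the approach in the email above.
function buildRankedQuery( $page, $searchindex, $match ) {
    // In natural-language mode MATCH ... AGAINST yields a relevance score,
    // so drop the boolean-mode modifier and the "+" operators for the
    // copy that goes into the SELECT list.
    $relevance = str_replace( ' IN BOOLEAN MODE', '', $match );
    $relevance = str_replace( '+', '', $relevance );

    // The original boolean-mode clause still does the filtering.
    // ORDER BY is an assumption; the email only says the query "shows rank".
    return "SELECT page_id, page_namespace, page_title, {$relevance} AS relevance " .
        "FROM {$page}, {$searchindex} " .
        "WHERE page_id = si_masterid AND {$match} " .
        "ORDER BY relevance DESC";
}

// Example input; the column name si_text is illustrative.
$match = "MATCH(si_text) AGAINST('+wiki +search' IN BOOLEAN MODE)";
echo buildRankedQuery( 'page', 'searchindex', $match ), "\n";
```

Note that only "+" is stripped, as in the original hack; other boolean operators such as "-" or "*" would still end up in the relevance expression.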
Both might be a nice addition for the core engine, for people without Lucene...
The final fix was to the filesearch extension: its hook handler should return true, otherwise subsequent indexing extensions break.
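To illustrate why the return value matters, here is a toy sketch of a handler chain that stops at the first handler not returning true. The dispatcher runHandlers and the three handler functions are hypothetical stand-ins written for this example; this is not MediaWiki's actual hook code, only the convention described above.

```php
<?php
// Simplified stand-in for a hook dispatcher: handlers run in order and the
// chain stops at the first one that does not return true.
function runHandlers( array $handlers, &$text ) {
    foreach ( $handlers as $handler ) {
        if ( $handler( $text ) !== true ) {
            return false;
        }
    }
    return true;
}

// Well-behaved handler: adds its text and lets later handlers run.
function fileTextHandler( &$text ) {
    $text .= ' extracted-file-text';
    return true;
}

// Buggy handler: no return statement, so the call evaluates to null.
function forgetfulHandler( &$text ) {
    $text .= ' extracted-file-text';
}

// A second indexing extension that depends on the chain continuing.
function otherIndexer( &$text ) {
    $text .= ' other-extension-text';
    return true;
}

$text = 'page';
runHandlers( array( 'forgetfulHandler', 'otherIndexer' ), $text );
echo $text, "\n"; // prints "page extracted-file-text"

$text = 'page';
runHandlers( array( 'fileTextHandler', 'otherIndexer' ), $text );
echo $text, "\n"; // prints "page extracted-file-text other-extension-text"
```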
One last question: when updating the index, should I hook the ondeletepage to remove an index or should there be another hook somewhere else?
Best regards,
Alex
-- Alex Powell
Exscien Training Ltd Tel: +44 (0) 1865 920024 Direct: +44 (0) 1865 920032 Mob: +44 (0) 7717 765210
skype: alexp700 mailto: alexp@exscien.com http://www.exscien.com
Registered in England and Wales 05927635, Unit 10 Wheatley Business Park, Old London Road, Wheatley, OX33 1XW, England