Neil Harris wrote:
Magnus Manske wrote:
I just found
http://point.davidglasser.net/wp-content/uploads/point.davidglasser.net/2008...
and thought I'd share it with the list...
Magnus
Interesting. What they seem to be proposing is to store the tags for each article in a plain text field, and then use the built-in MySQL full-text search mechanism to index and search that, thus taking advantage of all the development already devoted to speeding up general-purpose full text search.
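For readers following along, here is a rough sketch of what such a query could look like from application code. The table name `page_tags`, the columns `page_id` and `tags`, and the helper `build_tag_query` are all made up for illustration and are not taken from the linked document. It assumes a FULLTEXT index exists on the `tags` column and uses MySQL's boolean-mode syntax, where a leading `+` marks a word as required.

```python
def build_tag_query(tags, table="page_tags", column="tags"):
    """Return (sql, params) for a boolean-mode fulltext search requiring every tag.

    The table and column names are hypothetical and must come from trusted
    code, since they are interpolated directly into the SQL text.
    """
    required = []
    for tag in tags:
        # Keep only letters, digits and underscores so a tag cannot
        # smuggle in boolean-mode operators such as - or *.
        word = "".join(ch for ch in tag if ch.isalnum() or ch == "_")
        if word:
            required.append("+" + word)
    if not required:
        raise ValueError("at least one usable tag is required")
    sql = (
        f"SELECT page_id FROM {table} "
        f"WHERE MATCH({column}) AGAINST (%s IN BOOLEAN MODE)"
    )
    # The search expression is passed as a bound parameter, not inlined.
    return sql, (" ".join(required),)
```

The returned pair can be handed to the `execute()` method of a DB-API cursor from a MySQL driver that uses `%s` placeholders. Note that very short tags may be ignored by the server, depending on its minimum word length setting.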
I wonder how it would scale to Wikipedia's vast datasets?
Argh... I tested exactly that question last year: https://lists.wikimedia.org/mailman/htdig/wikitech-l/2006-December/028081.ht...
and then talked about it again here https://lists.wikimedia.org/mailman/htdig/wikitech-l/2008-February/036570.ht...
I think that the fulltext solution is probably very good for a mid-size application, but it sounds like (from people who know more about MySQL databases than I do) it would not stand up to Wikipedia's traffic. Tim Starling suggested that Lucene is better at intersections than MySQL's fulltext database... I also did some testing with a Lucene index, but I really don't want to set up Java on my server, so I used Zend_Search_Lucene. That gave performance similar to the MySQL fulltext index *BUT* when I queried the same index with Luke (which is Java), the query was *fast*. Sorry, I can't find the mailing list posts about that.
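To make "intersections" concrete: a multi-tag query boils down to finding the pages that appear under every requested tag. The sketch below is a generic, in-memory illustration of that operation using sorted posting lists. It is not how Lucene or MySQL implement it internally, and the function names and sample data are invented for the example.

```python
from collections import defaultdict


def build_index(page_tags):
    """Map each tag to a sorted list of page ids (a posting list).

    page_tags maps a page id to a space-separated string of tags.
    """
    index = defaultdict(set)
    for page_id, tags in page_tags.items():
        for tag in tags.split():
            index[tag].add(page_id)
    return {tag: sorted(ids) for tag, ids in index.items()}


def intersect(a, b):
    """Merge-style intersection of two sorted lists of page ids."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def pages_with_all(index, tags):
    """Return the page ids that carry every one of the given tags."""
    # Start from the shortest posting list to keep intermediate results small.
    lists = sorted((index.get(tag, []) for tag in tags), key=len)
    if not lists:
        return []
    result = lists[0]
    for postings in lists[1:]:
        result = intersect(result, postings)
        if not result:
            break
    return result
```

For example, with `build_index({1: "physics people", 2: "physics", 3: "people physics nobel"})`, asking for `["physics", "people"]` yields pages 1 and 3. How quickly a real engine does this over millions of pages is exactly where the implementations differ.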
So, I think the solution is to either a) add a field to the current search index or b) create a new search index. A fulltext index might make a nice addition to MediaWiki for smaller installations though (and for folks who don't want to run Java).
FYI, I am using a fulltext index for tagging on my social bookmarking application http://tagthis.info (I know it's not a great social bookmarking app, the idea is that it's a hosted service where anyone can add tags to webpages with some JavaScript - I'm beta testing it on my wiki directory) and at that scale the performance is perfectly adequate. I'm looking at CLucene to set up an indexing daemon for my higher-performance searching needs (which might interest folks on this list?)
Best Regards, Aerik