Neil Harris wrote:
Magnus Manske wrote:
I just found
and thought I'd share it with the list...
Interesting. What they seem to be proposing is to store the tags for
each article in a plain text field, and then use the built-in MySQL
full-text search mechanism to index and search that, thus taking
advantage of all the development already devoted to speeding up
general-purpose full text search.
I wonder how it would scale to Wikipedia's vast datasets?
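To make the proposal concrete, here is a minimal sketch of how a tag lookup against a MySQL FULLTEXT-indexed text column might be built. The table name `article_tags`, the columns `article_id` and `tags`, and the helper `build_tag_query` are all hypothetical; only the `MATCH ... AGAINST (... IN BOOLEAN MODE)` syntax and the `+word` "must be present" operator come from MySQL itself. The function only builds the query and its parameter, it does not talk to a database.

```python
import re


def build_tag_query(tags, table="article_tags", column="tags"):
    """Return (sql, params) for a boolean-mode full-text search requiring every tag.

    Table and column names are illustrative assumptions, not a real schema.
    """
    cleaned = []
    for tag in tags:
        # Keep only word characters so a tag cannot inject boolean-mode
        # operators such as + - * " into the search expression.
        word = re.sub(r"\W+", "", tag.lower())
        if word:
            cleaned.append(word)
    if not cleaned:
        raise ValueError("at least one non-empty tag is required")

    # In boolean mode, '+word' means the word must be present in the row.
    expression = " ".join("+" + word for word in cleaned)
    sql = (
        f"SELECT article_id FROM {table} "
        f"WHERE MATCH({column}) AGAINST (%s IN BOOLEAN MODE)"
    )
    return sql, (expression,)


if __name__ == "__main__":
    sql, params = build_tag_query(["Physics", "open-source"])
    print(sql)
    print(params)
```

The `%s` placeholder assumes a driver that uses that parameter style, as the common Python MySQL drivers do. Note that MySQL ignores words shorter than its configured minimum word length, so very short tags may need a different setting or an encoding scheme.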
Argh... I tested exactly that question last year:
and then talked about it again here
I think that the fulltext solution is probably very good for a mid-size
application, but it sounds like (from people who know more about MySQL
databases than I do) it would not stand up to Wikipedia's traffic. Tim
Starling suggested that Lucene is better at intersections than MySQL's
fulltext database... I also did some testing with a Lucene index, but I
really don't want to set up Java on my server, so I used Zend_Search_Lucene.
That gave performance similar to the MySQL fulltext index *BUT* when I
queried the same index with Luke (which is Java), the query was *fast*.
Sorry, I can't find the mailing list posts about that.
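For anyone wondering what "intersections" means here: a multi-tag query boils down to intersecting the sets of articles that carry each tag. The sketch below is a toy in-memory inverted index showing that operation. It is not how Lucene or MySQL implement it internally, and the class `TagIndex` and its methods are made-up names for illustration only.

```python
from collections import defaultdict


class TagIndex:
    """Toy inverted index: maps each tag to the set of document ids carrying it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, tags):
        """Record that doc_id carries each of the given tags (case-insensitive)."""
        for tag in tags:
            self._postings[tag.lower()].add(doc_id)

    def search_all(self, tags):
        """Return the ids of documents that carry every one of the given tags."""
        # .get() avoids creating empty entries for unknown tags.
        sets = [self._postings.get(tag.lower(), set()) for tag in tags]
        if not sets:
            return set()
        # Start from the smallest set so the intermediate result stays small.
        sets.sort(key=len)
        result = set(sets[0])
        for other in sets[1:]:
            result &= other
            if not result:
                break  # nothing can match any more
        return result


if __name__ == "__main__":
    index = TagIndex()
    index.add(1, ["physics", "history"])
    index.add(2, ["physics"])
    index.add(3, ["History", "biography"])
    print(index.search_all(["physics", "history"]))
```

Starting with the rarest tag is the usual trick: the cost of the intersection is then bounded by the smallest posting set rather than the largest.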
So, I think the solution is to either a) add a field to the current search
index or b) create a new search index. A fulltext index might make a nice
addition to MediaWiki for smaller installations though (and for folks who
don't want to run Java).
FYI, I am using a fulltext index for tagging on my social bookmarking
(I know it's not a great social bookmarking
app, the idea is that it's a hosted service where anyone can add tags to
and at that scale the performance is very adequate. I'm looking at clucene
to set up an indexing daemon for my higher performance searching needs
(maybe might interest folks on this list?)
- the Wiki Directory of the Web
- Hosted Tagging for your website!