New subject: Tag intersection database performance (Neil Harris)

28 Feb 2008


      Neil Harris wrote:
...
Magnus Manske wrote:
...
I just found
http://point.davidglasser.net/wp-content/uploads/point.davidglasser.net/2008...
...
and thought I'd share it with the list...
Magnus
Interesting. What they seem to be proposing is to store the tags for
each article in a plain text field, and then use the built-in MySQL
full-text search mechanism to index and search that, thus taking
advantage of all the development already devoted to speeding up
general-purpose full text search.
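
As a rough illustration of that approach, here is a minimal Python sketch that builds a tag-intersection query against a MySQL FULLTEXT index. It is not taken from the linked article: the table name page_tags, its columns, and the helper names are hypothetical, and the %s placeholder assumes a driver that uses that parameter style. The "+" prefix is the boolean-mode operator that requires a word to be present, so prefixing every tag asks for rows containing all of them.

```python
import re

# Hypothetical schema, assumed for illustration only:
#   CREATE TABLE page_tags (
#       page_id INT PRIMARY KEY,
#       tags    TEXT,            -- space-separated tags, e.g. "Physics Living_people"
#       FULLTEXT (tags)
#   ) ENGINE=MyISAM;

# Restrict tags to letters, digits and underscores so that each tag is
# indexed as a single word and cannot inject boolean-mode operators.
_TAG_RE = re.compile(r"[A-Za-z0-9_]+")


def boolean_query(tags):
    """Build a boolean-mode search string that requires every tag."""
    for tag in tags:
        if not _TAG_RE.fullmatch(tag):
            raise ValueError("unsupported tag: %r" % (tag,))
    return " ".join("+" + tag for tag in tags)


def intersection_sql(tags):
    """Return (sql, params) for a parameterized tag-intersection query."""
    sql = ("SELECT page_id FROM page_tags "
           "WHERE MATCH (tags) AGAINST (%s IN BOOLEAN MODE)")
    return sql, (boolean_query(tags),)


sql, params = intersection_sql(["Physics", "Living_people"])
print(sql)
print(params)
```

Note that MySQL's full-text index skips stopwords and words shorter than its configured minimum word length, so short tags would need either a configuration change or some encoding scheme before this could work.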
I wonder how it would scale to Wikipedia's vast datasets?
Argh... I tested exactly that question last year:
https://lists.wikimedia.org/mailman/htdig/wikitech-l/2006-December/028081.ht...
and then talked about it again here
https://lists.wikimedia.org/mailman/htdig/wikitech-l/2008-February/036570.ht...
I think that the fulltext solution is probably very good for a mid-size
application, but it sounds like (from people who know more about MySQL
databases than I do) it would not stand up to Wikipedia's traffic.  Tim
Starling suggested that Lucene is better at intersections than MySQL's
fulltext database... I also did some testing with a Lucene index, but I
really don't want to set up Java on my server, so I used Zend_Search_Lucene.
That gave performance similar to the MySQL fulltext index *BUT* when I
queried the same index with Luke (which is Java), the query was *fast*.
Sorry, I can't find the mailing list posts about that.
So, I think the solution is to either a) add a field to the current search
index or b) create a new search index.  A fulltext index might make a nice
addition to MediaWiki for smaller installations though (and for folks who
don't want to run Java).
FYI, I am using a fulltext index for tagging on my social bookmarking
application http://tagthis.info (I know it's not a great social bookmarking
app; the idea is that it's a hosted service where anyone can add tags to
webpages with some JavaScript. I'm beta testing it on my wiki directory),
and at that scale the performance is quite adequate.  I'm looking at CLucene
to set up an indexing daemon for my higher-performance searching needs
(which might interest folks on this list?)
Best Regards,
Aerik
-- 
http://www.wikidweb.com - the Wiki Directory of the Web
http://tagthis.info - Hosted Tagging for your website!
