Aerik Sylvan wrote:
I would approach this by taking all the article titles, and creating an
index with every significant word in the title. After that, I'm afraid
I'd have to use SQL, and likely fall back onto using "LIKE" somewhere
if I was going to return partial-word matches (an "obvious and naive"
solution).
Creating an index of words is not that naive anymore. :-) Also, if you
use "LIKE 'word%'" instead of "LIKE '%word%'", the index can be used
quite efficiently. However, you still run into the problem: how do you
list the top 10 most linked-to pages given just a single letter? You'd
still have to query all article titles that have a word that begins with
that letter. The trie gives you a much more direct way of looking that up.
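
To make the trie idea concrete, here is a minimal sketch in Python. It is
only an illustration under stated assumptions, not how MediaWiki does or
would implement it: the names TitleTrie, add, lookup and TOP_N are made up
for this example, titles are split on whitespace only, and each node simply
caches the best-linked titles that contain a word starting with that node's
prefix. A lookup then only has to walk down the prefix and read the cached
list, instead of scanning every title that matches.

```python
TOP_N = 10  # assumed cutoff, matching the "top 10" mentioned above


class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.top = []       # (link_count, title) pairs, most linked first


class TitleTrie:
    """Hypothetical prefix index over the words of article titles."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, title, link_count):
        entry = (link_count, title)
        # Index every distinct word of the title, lower-cased.
        for word in set(title.lower().split()):
            node = self.root
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
                # Two words of one title can share a prefix; store it once.
                if entry not in node.top:
                    node.top.append(entry)
                    # Keep the list ordered by link count, then by title.
                    node.top.sort(key=lambda e: (-e[0], e[1]))
                    del node.top[TOP_N:]

    def lookup(self, prefix):
        """Return up to TOP_N titles having a word that starts with prefix."""
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []
        return [title for _, title in node.top]
```

With a few titles added via add(title, link_count), lookup("e") returns the
most linked-to titles containing a word that begins with "e", at a cost that
depends on the length of the prefix rather than on the number of matching
titles. The trade-off in this sketch is extra memory and more work on every
insert, and it does not handle link counts changing after a title was added.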
I have always assumed that MySQL is internally optimised enough that
if one sticks to simple queries and whole word matches, you get
pretty good performance
No matter how well something is optimised, you would still expect some
things to be more efficient than others. By relying on the efficient
things, Wikipedia can run on fewer and/or cheaper servers. In other
words, your queries may be sufficiently efficient that they run within a
fraction of a second on your server which is not used by anyone else,
but if the server is already as busy as Wikipedia, and then you're
adding another few thousand extra queries per second, then you can't
expect "internal optimisation" to do all the magic for you.
This is, of course, the most obvious and most naive solution. [...]
Timwi, from your other posts I can see that you know a lot more about
computer programming than I do, but I would hope to be able to offer some
opinion without getting slapped with "obvious and naive".
Please read http://en.wikipedia.org/wiki/Wikipedia:Assume_good_faith .
In the same way that you hope to be able to offer your opinion, so do I.
The "getting slapped" is in your head, not in my intentions. If my
opinion, given in good faith, is offensive to you, then quite frankly
you need to lighten up. You can't expect it to be censored or suppressed
just so that you feel better.
Timwi