Aerik Sylvan wrote:
I would approach this by taking all the article titles, and creating an index with every significant word in the title. After that, I'm afraid I'd have to use SQL, and likely fall back onto using "LIKE" somewhere if I was going to return partial-word matches (an "obvious and naive" solution).
Creating an index of words is not that naive anymore. :-) Also, if you use "LIKE 'word%'" instead of "LIKE '%word%'", the index can be used quite efficiently. However, you still run into the problem of how to list the top 10 most linked-to pages given just a single letter: you would still have to query every article title containing a word that begins with that letter. The trie gives you a much more direct way of looking that up.
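To illustrate what I mean by "more direct", here is a rough sketch in Python. It is only one way such a trie could be laid out, not a description of any actual MediaWiki code: the names TitleTrie, add and top_titles are made up for this example, and it assumes that the link count of each article is already known when the title is inserted. Every node caches the k best titles that have a word passing through it, so answering a prefix query is just a walk down the trie.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.top = []       # up to k (link_count, title) pairs, best first


class TitleTrie:
    """Hypothetical trie over the words of article titles."""

    def __init__(self, k=10):
        self.k = k
        self.root = TrieNode()

    def add(self, title, link_count):
        entry = (link_count, title)
        # Index every distinct word of the title, lower-cased.
        for word in set(title.lower().split()):
            node = self.root
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
                # Two words of one title may share a prefix; store it once.
                if entry not in node.top:
                    node.top.append(entry)
                    # Most linked-to first; ties broken by title.
                    node.top.sort(key=lambda e: (-e[0], e[1]))
                    del node.top[self.k:]

    def top_titles(self, prefix):
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []
        return [title for _, title in node.top]


trie = TitleTrie(k=2)
trie.add("United States", 900)
trie.add("United Kingdom", 700)
trie.add("Uranium", 100)
trie.add("Soviet Union", 800)
print(trie.top_titles("u"))  # ['United States', 'Soviet Union']
```

The lookup cost depends only on the length of the prefix, not on how many titles contain a word starting with it. The price is paid at insertion time and in memory, and a real implementation would also have to handle link counts changing and titles being deleted, which this sketch ignores.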
I have always assumed that MySQL is internally optimised enough that if one sticks to simple queries and whole word matches, you get pretty good performance
No matter how well something is optimised, you would still expect some things to be more efficient than others. By relying on the efficient things, Wikipedia can run on fewer and/or cheaper servers. In other words, your queries may be efficient enough to run within a fraction of a second on your own server, which is not used by anyone else. But if the server is already as busy as Wikipedia's, and you then add another few thousand queries per second, you can't expect "internal optimisation" to do all the magic for you.
This is, of course, the most obvious and most naive solution. [...]
Timwi, from your other posts I can see that you know a lot more about computer programming than I do, but I would hope to be able to offer some opinion without getting slapped with "obvious and naive".
Please read http://en.wikipedia.org/wiki/Wikipedia:Assume_good_faith .
In the same way that you hope to be able to offer your opinion, so do I. The "getting slapped" is in your head, not in my intentions. If my opinion, given in good faith, is offensive to you, then quite frankly you need to lighten up. You can't expect it to be censored or suppressed just so that you feel better.
Timwi