So, design question: in the links table, why is l_from a string rather than a cur_id? I'm trying to dig it up, but I can't.
Seems kinda losey -- you can't join cur and links on l_from -- say, to find what articles link to an article with a given ID. You can get all the l_from's from links, of course, but then you have to do a query to cur to get any other information.
~ESP
"BV" == Brion Vibber brion@pobox.com writes:
Me> So, design question: in the links table, why is l_from a string Me> rather than a cur_id? I'm trying to dig it up, but I can't.
BV> As far as I know there is no good reason for this.
The best thing about it is that you can find things-that-link-here fast. The bad thing is that you can't find out anything about those things unless you send another query to the database.
For the what links here page, for example, the code makes one query on links, to find all the links to a page, and one query per link (!) to find out if it's a redirect or not. If it were easy to join cur and links, this could be reduced to a single query. (Another possibility, of course, is caching cur_is_redirect in the links table, too. But then if you needed any other info from cur, you'd still have the same problem.)
I actually tried doing something sick like this:
SELECT DISTINCT cur_namespace, cur_title, cur_is_redirect FROM cur, links WHERE l_to = {$id_to_find_links_to} AND (( cur_namespace = 0 and l_from = cur_title ) OR ( cur_namespace = 1 and l_from = CONCAT("Talk:", cur_title)) OR ( cur_namespace = 2 and l_from = CONCAT("Wikipedia:", cur_title)) OR ...)
It actually worked -- I even bothered to get the prefixes from $wgLang! -- but I can't imagine those CONCAT's being anything but a totally ugly superhack. Probably much much better to just go ahead and do the multiple queries.
Anyways, something I was thinking about.
~ESP
wikitech-l@lists.wikimedia.org