On Wed, 2 Mar 2005 20:14:43 -0600, Richard Holton richholton@gmail.com wrote:
Thanks for the clarity, Brion.
Firstly, let me second that thanks. And thanks also for pointing out the relative unhelpfulness of storing the target as one text string. It now occurs to me that if a new namespace (or a new interwiki prefix) is created, you'd just have to search for {ns=0, title=/Foo:.*/} instead of {target=/Foo:.*/}, so it really makes no odds.
If I understand things correctly, we should look at eventually changing 'links' and 'broken links' to use: from id->(to namespace, to title).
Would there be a need for separate tables then?
No, almost certainly not. In fact, it would be very helpful to create a unified table, because then we wouldn't need the proliferation of task-specific tables. See, for instance, http://bugzilla.wikimedia.org/show_bug.cgi?id=1065
Such a configuration would require a lookup on the page table for each link when rendering a page. Although, now that I think about it, that's probably required now anyway.
Yes, in order to determine whether a link is "broken" or not, you have to look for the page in the cur/page table. The only difference is that right now it gets stored in a different table if it's broken, rather than just having a link_is_broken flag which can be checked and toggled when necessary.
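To make the flag idea concrete, here is a minimal sketch using Python's sqlite3 module. The table and column names (links, from_id, to_ns, to_title, link_is_broken) and the helper functions are made up for illustration and are not the actual MediaWiki schema or code. The flag is set when a link is recorded and toggled when the target page appears, with no rows moving between tables.

```python
import sqlite3

# Hypothetical, simplified tables; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (
        page_id        INTEGER PRIMARY KEY,
        page_namespace INTEGER NOT NULL,
        page_title     TEXT    NOT NULL
    );
    CREATE TABLE links (
        from_id        INTEGER NOT NULL,
        to_ns          INTEGER NOT NULL,
        to_title       TEXT    NOT NULL,
        link_is_broken INTEGER NOT NULL
    );
""")


def page_exists(conn, ns, title):
    """Look the target up in the page table (needed with or without a flag)."""
    row = conn.execute(
        "SELECT 1 FROM page WHERE page_namespace = ? AND page_title = ?",
        (ns, title),
    ).fetchone()
    return row is not None


def add_link(conn, from_id, ns, title):
    """Record a link, flagging it as broken if the target does not exist."""
    broken = 0 if page_exists(conn, ns, title) else 1
    conn.execute(
        "INSERT INTO links VALUES (?, ?, ?, ?)", (from_id, ns, title, broken)
    )


def refresh_links_to(conn, ns, title):
    """Toggle the flag on every link to a target after it is created or deleted."""
    broken = 0 if page_exists(conn, ns, title) else 1
    conn.execute(
        "UPDATE links SET link_is_broken = ? WHERE to_ns = ? AND to_title = ?",
        (broken, ns, title),
    )


def broken_flag(conn, from_id, ns, title):
    """Read the stored flag back for one link."""
    return conn.execute(
        "SELECT link_is_broken FROM links "
        "WHERE from_id = ? AND to_ns = ? AND to_title = ?",
        (from_id, ns, title),
    ).fetchone()[0]


conn.execute("INSERT INTO page VALUES (1, 0, 'Foo')")
add_link(conn, 1, 0, "Bar")                 # [[Bar]] does not exist yet
before = broken_flag(conn, 1, 0, "Bar")

conn.execute("INSERT INTO page VALUES (2, 0, 'Bar')")
refresh_links_to(conn, 0, "Bar")            # someone creates [[Bar]]
after = broken_flag(conn, 1, 0, "Bar")
```

The same UPDATE handles deletion of the target, since it recomputes the flag from the page table rather than assuming a direction.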
Categorylinks seems to be a bit of a special case, since it really amounts to a reverse link.
That's one way of thinking about it. The other, which is more in tune with less formalised wikis (e.g. http://c2.com/cgi/wiki?WikiCategories), is that the category page is just a souped-up "What links here" display, and the links on individual pages just internal links displayed out-of-line. On this view, they're just another kind of link - distinguished only to solve the "Use-mention problem" (http://www.usemod.com/cgi-bin/mb.pl?UseMentionProblem)
At the moment, I'm thinking of adding a "fromNamespace" field to categorylinks. This will decouple the display of pages from the display of parent categories, and would facilitate breaking the category display into namespaces.
Hm, that certainly does match your "reverse link" interpretation. Of course, it suffers from a variant of the update problem that the current id->id based links table does: a page can be moved from one namespace to another, taking its category links with it. So, if [[Foo]] contains [[Category:Metasyntactic variables]], and I move it to [[Wikipedia:Foo]], the from_namespace field in the categorylinks table would become wrong, even though the actual page reference (by ID) would remain valid. I don't know whether that update would be more or less sensible/expensive/complex than having to do an extra lookup/join/whatever to get the namespace out of the page table when displaying the category.
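A small sketch of that trade-off, again with sqlite3. The cl_from_namespace column and the move_page helper are hypothetical, and the tables are cut down to the bare minimum; namespace 4 is assumed to be the project ("Wikipedia:") namespace. After a move that forgets to touch categorylinks, the stored namespace is stale while the join against the page table is still right.

```python
import sqlite3

NS_MAIN = 0
NS_PROJECT = 4  # assumed number of the "Wikipedia:" namespace

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (
        page_id        INTEGER PRIMARY KEY,
        page_namespace INTEGER NOT NULL,
        page_title     TEXT    NOT NULL
    );
    -- cl_from_namespace is the proposed (hypothetical) denormalised field
    CREATE TABLE categorylinks (
        cl_from           INTEGER NOT NULL,
        cl_to             TEXT    NOT NULL,
        cl_from_namespace INTEGER NOT NULL
    );
    INSERT INTO page VALUES (1, 0, 'Foo');
    INSERT INTO categorylinks VALUES (1, 'Metasyntactic_variables', 0);
""")


def move_page(conn, page_id, new_ns, new_title, fix_categorylinks):
    """Rename a page; optionally pay for the extra categorylinks update."""
    conn.execute(
        "UPDATE page SET page_namespace = ?, page_title = ? WHERE page_id = ?",
        (new_ns, new_title, page_id),
    )
    if fix_categorylinks:
        conn.execute(
            "UPDATE categorylinks SET cl_from_namespace = ? WHERE cl_from = ?",
            (new_ns, page_id),
        )


def stored_namespaces(conn, category):
    """Namespaces as recorded in categorylinks (no join needed)."""
    rows = conn.execute(
        "SELECT cl_from_namespace FROM categorylinks WHERE cl_to = ?",
        (category,),
    ).fetchall()
    return [r[0] for r in rows]


def joined_namespaces(conn, category):
    """Namespaces looked up in the page table at display time."""
    rows = conn.execute(
        "SELECT page_namespace FROM categorylinks "
        "JOIN page ON page_id = cl_from WHERE cl_to = ?",
        (category,),
    ).fetchall()
    return [r[0] for r in rows]


# Move [[Foo]] to [[Wikipedia:Foo]] without updating categorylinks.
move_page(conn, 1, NS_PROJECT, "Foo", fix_categorylinks=False)
stale = stored_namespaces(conn, "Metasyntactic_variables")
fresh = joined_namespaces(conn, "Metasyntactic_variables")
```

So the choice is between one extra UPDATE per page move and one extra join per category display; which is cheaper depends on how often each happens.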
There's an additional problem with merging the categorylinks table in with the others anyway, in that it has extra fields cl_sortkey and cl_timestamp that the others don't have or need. That's a shame, because it would be neat, if redesigning the tables, not to have to leave that one out.
Some rambling thoughts, though, on a possible links schema:

* l_from (as page_id)
* l_to_ns (as int)
* l_to_title (as string)
* l_is_broken (boolean)
  --- to allow everything that brokenlinks currently does; this should be kept separate from the link type, because if you try to {{include}} a non-existent page, it's both "broken" and a "template" link, and both facts are potentially useful
* l_type (probably an int, like namespace)
  --- values could include:
  --- LT_NORMAL: just an ordinary [[free link]] (or an escaped link to a [[:Image:Page]] or [[:Category:Page]])
  --- LT_IMAGE: it's an image display (l_to_ns would be redundant, but never mind)
  --- LT_TEMPLATE: it's a {{template inclusion}}
  --- LT_INTERWIKI: currently, interwiki links don't get stored anywhere, but it could be useful if they were; for instance, if someone changes the interwiki map, it might be worth looking up what links are affected. Obviously, l_to_ns would be meaningless, and any search for a specific prefix would involve matching the text, but might it be worth considering?
  --- LT_CATEGORY: as I say, this has rather awkward issues - where do the extra fields go? l_to_ns could be reused [abused] as the namespace of the *originating* page if keeping that up-to-date was considered better than looking it up in the page table
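For what it's worth, the fields above could be sketched like this in Python with sqlite3. The numeric values given to the LT_* constants are arbitrary, the column types are a guess, and the sample rows are invented; template namespace 10 is assumed. It is only meant to show why keeping l_is_broken apart from l_type is handy: "broken template inclusions" becomes a plain two-condition query.

```python
import sqlite3
from enum import IntEnum


class LinkType(IntEnum):
    """Hypothetical numbering for the LT_* values; any distinct ints would do."""
    NORMAL = 0
    IMAGE = 1
    TEMPLATE = 2
    INTERWIKI = 3
    CATEGORY = 4


NS_MAIN = 0
NS_TEMPLATE = 10  # assumed number of the Template: namespace

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE links (
        l_from      INTEGER NOT NULL,            -- page_id of the linking page
        l_to_ns     INTEGER NOT NULL,
        l_to_title  TEXT    NOT NULL,
        l_is_broken INTEGER NOT NULL DEFAULT 0,  -- boolean stored as 0/1
        l_type      INTEGER NOT NULL DEFAULT 0
    )
""")

# Invented sample data: page 1 links to [[Foo]] and includes a missing
# template; page 2 includes a template that exists.
rows = [
    (1, NS_MAIN, "Foo", 0, int(LinkType.NORMAL)),
    (1, NS_TEMPLATE, "Missing_box", 1, int(LinkType.TEMPLATE)),
    (2, NS_TEMPLATE, "Infobox", 0, int(LinkType.TEMPLATE)),
]
conn.executemany("INSERT INTO links VALUES (?, ?, ?, ?, ?)", rows)


def broken_links_of_type(conn, link_type):
    """Everything brokenlinks does today, narrowed to one kind of link."""
    return conn.execute(
        "SELECT l_from, l_to_title FROM links "
        "WHERE l_is_broken = 1 AND l_type = ? ORDER BY l_from",
        (int(link_type),),
    ).fetchall()


def what_links_here(conn, ns, title):
    """All pages pointing at (ns, title), whatever the link type."""
    rows = conn.execute(
        "SELECT l_from FROM links WHERE l_to_ns = ? AND l_to_title = ? "
        "ORDER BY l_from",
        (ns, title),
    ).fetchall()
    return [r[0] for r in rows]


broken_templates = broken_links_of_type(conn, LinkType.TEMPLATE)
infobox_users = what_links_here(conn, NS_TEMPLATE, "Infobox")
```

The awkward LT_CATEGORY case is left out of the sample data on purpose, since it is still unclear where cl_sortkey and cl_timestamp would live.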
Using the links table for redirects
--- this is another idea I had which may or may not have merit. At the moment, we can know a page is a redirect from one flag in the page table, but have to retrieve and parse its content to find what it's a redirect *to*. Given that this now requires a look-up in the revision table, it would be just as easy to look up the destination in the link table instead. The only times this wouldn't work would be redirects to Interwiki links (unless we start storing those) and redirects to Special pages (which are already kind of crazy, but certainly don't show up in the links table right now; they'd have to be their own link type if they did, I guess). You could always have a system that *fell back to* getting and parsing the revision, but maybe that would be kind of silly.
--- I was originally going to suggest an 'l_is_redirect' field, but now I think about it, maybe we don't need it as long as a redirect page can only have one valid link on it. Every use I thought of (such as category redirects being treated as 'aliases' and effectively merging the two categories, which would make renaming a category a *lot* easier) has to access the page table anyway, to get the full text of the title from the id.
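A rough sketch of the redirect lookup, under the assumption that a redirect page has exactly one row in the links table. The redirect_target function is hypothetical, and the page_is_redirect flag and l_* columns are simplified stand-ins. When the assumption fails (no stored link, e.g. an interwiki or Special: target, or more than one), it returns None so that a caller could fall back to fetching and parsing the revision text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (
        page_id          INTEGER PRIMARY KEY,
        page_namespace   INTEGER NOT NULL,
        page_title       TEXT    NOT NULL,
        page_is_redirect INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE links (
        l_from     INTEGER NOT NULL,
        l_to_ns    INTEGER NOT NULL,
        l_to_title TEXT    NOT NULL
    );
    -- Invented sample data: page 1 redirects to [[Bar]]; page 2 is an
    -- ordinary page; page 3 is a redirect whose target is not stored.
    INSERT INTO page VALUES (1, 0, 'Foo', 1);
    INSERT INTO page VALUES (2, 0, 'Bar', 0);
    INSERT INTO page VALUES (3, 0, 'Baz', 1);
    INSERT INTO links VALUES (1, 0, 'Bar');
    INSERT INTO links VALUES (2, 0, 'Quux');
""")


def redirect_target(conn, page_id):
    """Return (namespace, title) of a redirect's destination, or None.

    None means either "not a redirect" or "cannot tell from the links
    table"; in the second case the caller would parse the page text.
    """
    row = conn.execute(
        "SELECT page_is_redirect FROM page WHERE page_id = ?", (page_id,)
    ).fetchone()
    if row is None or not row[0]:
        return None
    targets = conn.execute(
        "SELECT l_to_ns, l_to_title FROM links WHERE l_from = ?", (page_id,)
    ).fetchall()
    if len(targets) == 1:
        return targets[0]
    return None


foo_target = redirect_target(conn, 1)
bar_target = redirect_target(conn, 2)
baz_target = redirect_target(conn, 3)
```

Note that no l_is_redirect column is needed here: the page table already says whether the page is a redirect, and the single link row says where it goes.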
Right, sorry for the ramble; if anyone actually read this far, I'd be interested in your thoughts.