On Wed, 2 Mar 2005 20:14:43 -0600, Richard Holton <richholton(a)gmail.com> wrote:
> Thanks for the clarity, Brion.
Firstly, let me second that thanks. And thanks also for pointing out
the relative unhelpfulness of storing the target as one text string.
It now occurs to me that if a new namespace (or a new interwiki
prefix) is created, you'd just have to search for {ns=0,
title=/Foo:.*/} instead of {target=/Foo:.*/}, so it really makes no
odds.
> If I understand things correctly, we should look at eventually
> changing 'links' and 'broken links' to use:
> from id->(to namespace, to title).
> Would there be a need for separate tables then?
No, almost certainly not. In fact, it would be very helpful to create
a unified table, because then we wouldn't need the proliferation of
task-specific tables. See, for instance,
http://bugzilla.wikimedia.org/show_bug.cgi?id=1065
> Such a configuration would require a lookup on the page table for each
> link when rendering a page. Although, now that I think about it,
> that's probably required now anyway.
Yes, in order to determine whether a link is "broken" or not, you have
to look for the page in the cur/page table. The only difference is
that right now it gets stored in a different table if it's broken,
rather than just having a link_is_broken flag which can be checked and
toggled when necessary.
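
As a rough illustration of the flag approach (a sketch in Python with
sqlite3, not MediaWiki code; the column names other than link_is_broken
and all of the helper functions are assumptions made up for this
example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
    page_id        INTEGER PRIMARY KEY,
    page_namespace INTEGER NOT NULL,
    page_title     TEXT NOT NULL,
    UNIQUE (page_namespace, page_title)
);
-- One table for working and broken links alike (hypothetical layout).
CREATE TABLE links (
    link_from      INTEGER NOT NULL,
    link_to_ns     INTEGER NOT NULL,
    link_to_title  TEXT NOT NULL,
    link_is_broken INTEGER NOT NULL
);
""")

def page_exists(ns, title):
    row = conn.execute(
        "SELECT 1 FROM page WHERE page_namespace = ? AND page_title = ?",
        (ns, title)).fetchone()
    return row is not None

def add_link(from_id, ns, title):
    # The flag is set once, when the link is saved.
    broken = int(not page_exists(ns, title))
    conn.execute("INSERT INTO links VALUES (?, ?, ?, ?)",
                 (from_id, ns, title, broken))

def set_broken(ns, title, broken):
    # Toggle the flag on every link pointing at (ns, title).
    conn.execute(
        "UPDATE links SET link_is_broken = ? "
        "WHERE link_to_ns = ? AND link_to_title = ?",
        (int(broken), ns, title))

def create_page(ns, title):
    conn.execute(
        "INSERT INTO page (page_namespace, page_title) VALUES (?, ?)",
        (ns, title))
    set_broken(ns, title, False)

def delete_page(ns, title):
    conn.execute(
        "DELETE FROM page WHERE page_namespace = ? AND page_title = ?",
        (ns, title))
    set_broken(ns, title, True)
```

Creating or deleting a page then becomes a single UPDATE on the flag
rather than moving rows between two tables.
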
> Categorylinks seems to be a bit of a special case, since it really
> amounts to a reverse link.
That's one way of thinking about it. The other, which is more in tune
with less formalised wikis (e.g.
http://c2.com/cgi/wiki?WikiCategories), is that the category page is
just a souped-up "What links here" display, and the links on
individual pages just internal links displayed out-of-line. On this
view, they're just another kind of link - distinguished only to solve
the "Use-mention problem"
(http://www.usemod.com/cgi-bin/mb.pl?UseMentionProblem)
> At the moment, I'm thinking of adding a "fromNamespace" field to
> categorylinks. This will decouple the display of pages from the
> display of parent categories, and would facilitate breaking the
> category display into namespaces.
Hm, that certainly does match your "reverse link" interpretation. Of
course, it suffers from a variant of the update problem that the
current id->id based links table does: a page can be moved from one
namespace to another, taking its category links with it. So, if
[[Foo]] contains [[Category:Metasyntactic variables]], and I move it
to [[Wikipedia:Foo]], the from_namespace field in the categorylinks
table would become wrong, even though the actual page reference (by
ID) would remain valid. I don't know whether that update would be more
or less sensible/expensive/complex than having to do an extra
lookup/join/whatever to get the namespace out of the page table when
displaying the category.
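
The trade-off can be sketched like this (Python with sqlite3, purely
illustrative; cl_from_namespace is a hypothetical name for the proposed
"fromNamespace" field, and move_page and the two query helpers are
invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
    page_id        INTEGER PRIMARY KEY,
    page_namespace INTEGER NOT NULL,
    page_title     TEXT NOT NULL
);
CREATE TABLE categorylinks (
    cl_from           INTEGER NOT NULL,  -- page_id of the member page
    cl_from_namespace INTEGER NOT NULL,  -- hypothetical denormalised copy
    cl_to             TEXT NOT NULL      -- category title
);
INSERT INTO page VALUES (1, 0, 'Foo');
INSERT INTO categorylinks VALUES (1, 0, 'Metasyntactic_variables');
""")

def move_page(page_id, new_ns, new_title, fix_categorylinks):
    conn.execute(
        "UPDATE page SET page_namespace = ?, page_title = ? "
        "WHERE page_id = ?", (new_ns, new_title, page_id))
    if fix_categorylinks:
        # The extra write that the denormalised field demands on a move.
        conn.execute(
            "UPDATE categorylinks SET cl_from_namespace = ? "
            "WHERE cl_from = ?", (new_ns, page_id))

def members_denormalised(category):
    # Reads the namespace straight from categorylinks: no join needed.
    return conn.execute(
        "SELECT cl_from_namespace, cl_from FROM categorylinks "
        "WHERE cl_to = ?", (category,)).fetchall()

def members_joined(category):
    # Looks the namespace up in the page table at display time.
    return conn.execute(
        "SELECT page_namespace, cl_from FROM categorylinks "
        "JOIN page ON page_id = cl_from WHERE cl_to = ?",
        (category,)).fetchall()

# Move [[Foo]] to [[Wikipedia:Foo]] (namespace 4) without the extra write:
move_page(1, 4, "Foo", fix_categorylinks=False)
```

After that move, members_denormalised() still reports namespace 0 while
members_joined() reports 4, so the choice is between one more UPDATE on
every move and one more join on every category display.
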
There's an additional problem with merging the categorylinks table in
with the others anyway, in that it has extra fields cl_sortkey and
cl_timestamp that the others don't have or need. That's a shame,
because it would be neat, if redesigning the tables, not to have to
leave that one out.
Some rambling thoughts, though, on a possible links schema:
* l_from (as page_id)
* l_to_ns (as int)
* l_to_title (as string)
* l_is_broken (boolean)
--- to allow everything that brokenlinks currently does; this should
be kept separate from the link type, because if you try to {{include}}
a non-existent page, it's both "broken" and a "template" link, and
both facts are potentially useful
* l_type (probably an int, like namespace)
--- values could include:
--- LT_NORMAL: just an ordinary [[free link]] (or an escaped link to a
[[:Image:Page]] or [[:Category:Page]])
--- LT_IMAGE: it's an image display (l_to_ns would be redundant, but never mind)
--- LT_TEMPLATE: it's a {{template inclusion}}
--- LT_INTERWIKI: currently, interwiki links don't get stored
anywhere, but it could be useful if they were; for instance, if
someone changes the interwiki map, it might be worth looking up what
links are affected. Obviously, l_to_ns would be meaningless, and any
search for a specific prefix would involve matching the text, but
might it be worth considering?
--- LT_CATEGORY: as I say, this has rather awkward issues - where do
the extra fields go? l_to_ns could be reused [abused] as the namespace
of the *originating* page if keeping that up-to-date was considered
better than looking it up in the page table
* Using the links table for redirects
--- this is another idea I had which may or may not have merit. At the
moment, we can know a page is a redirect from one flag in the page
table, but have to retrieve and parse its content to find what it's a
redirect *to*. Given that this now requires a look-up in the revision
table, it would be just as easy to look up the destination in the link
table instead. The only times this wouldn't work would be redirects to
Interwiki links (unless we start storing those) and redirects to
Special pages (which are already kind of crazy, but certainly don't
show up in the links table right now; they'd have to be their own link
type if they did, I guess). You could always have a system that *fell
back to* getting and parsing the revision, but maybe that would be
kind of silly.
--- I was originally going to suggest an 'l_is_redirect' field, but
now I think about it, maybe we don't need it as long as a redirect
page can only have one valid link on it. Every use I thought of (such
as category redirects being treated as 'aliases' and effectively
merging the two categories, which would make renaming a category a
*lot* easier) has to access the page table anyway, to get the full
text of the title from the id.
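
To make the schema and the redirect idea a bit more concrete, here is a
minimal sketch in Python with sqlite3. It is not MediaWiki code: the
l_* columns and LT_* names come from the list above, but the numeric
values of the LT_* constants, the index, the cut-down page table and
redirect_target() are assumptions made for illustration.

```python
import sqlite3

# Arbitrary numbering; only the names come from the proposal.
LT_NORMAL, LT_IMAGE, LT_TEMPLATE, LT_INTERWIKI, LT_CATEGORY = range(5)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
    page_id          INTEGER PRIMARY KEY,
    page_namespace   INTEGER NOT NULL,
    page_title       TEXT NOT NULL,
    page_is_redirect INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE links (
    l_from      INTEGER NOT NULL,           -- page_id of the linking page
    l_to_ns     INTEGER NOT NULL,
    l_to_title  TEXT NOT NULL,
    l_is_broken INTEGER NOT NULL DEFAULT 0,
    l_type      INTEGER NOT NULL            -- one of the LT_* values
);
-- Supports "what links here" style lookups by target.
CREATE INDEX links_to_idx ON links (l_to_ns, l_to_title);
""")

def redirect_target(page_id):
    """Return (ns, title) that a redirect page points at, else None."""
    row = conn.execute(
        "SELECT page_is_redirect FROM page WHERE page_id = ?",
        (page_id,)).fetchone()
    if row is None or not row[0]:
        return None
    targets = conn.execute(
        "SELECT l_to_ns, l_to_title FROM links "
        "WHERE l_from = ? AND l_type = ?",
        (page_id, LT_NORMAL)).fetchall()
    # Relies on a redirect page having exactly one valid link; anything
    # else would need the fall-back of parsing the revision text.
    return targets[0] if len(targets) == 1 else None
```

A caller that gets None for a page flagged as a redirect would be the
place to fall back to fetching and parsing the revision.
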
Right, sorry for the ramble. If anyone actually read this far, I'd be
interested in your thoughts.
--
Rowan Collins BSc
[IMSoP]