I've gone ahead and made another change to the schema which I'd originally passed over.
The links and brokenlinks tables are now merged to a single pagelinks table, which records the namespace+title key pair of target links rather than the page ID or the prefixed title.
While I've been eyeing this for a while to simplify things, doing it now is mainly a response to the scalability problems of renaming and deleting widely-linked pages. These actions required updating every linking record (hundreds of thousands in extreme cases) to maintain consistency, and they are, for instance, a significant factor in the unpleasantness of dealing with page-move vandalism.
(This issue is similar to but separate from the issue of title updates to all 'old' records for renaming often-edited pages, which was dealt with by the page/revision split.)
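To illustrate why keying links on the target's namespace+title avoids mass updates, here is a minimal sketch using Python's built-in sqlite3 module. The column names (pl_from, pl_namespace, pl_title, page_id and so on) and the sample rows are assumptions made for the example, not a copy of the actual schema or of what update.php produces.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with illustrative page and pagelinks tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE page (
            page_id        INTEGER PRIMARY KEY,
            page_namespace INTEGER NOT NULL,
            page_title     TEXT NOT NULL,
            UNIQUE (page_namespace, page_title)
        );
        -- Assumed layout: the link target is stored as namespace + title,
        -- never as a page ID, so the row does not depend on the target existing.
        CREATE TABLE pagelinks (
            pl_from      INTEGER NOT NULL,
            pl_namespace INTEGER NOT NULL,
            pl_title     TEXT NOT NULL,
            UNIQUE (pl_from, pl_namespace, pl_title)
        );
    """)
    db.executemany(
        "INSERT INTO page VALUES (?, ?, ?)",
        [(1, 0, "Source_A"), (2, 0, "Source_B"), (3, 0, "Target")],
    )
    db.executemany(
        "INSERT INTO pagelinks VALUES (?, ?, ?)",
        [(1, 0, "Target"), (2, 0, "Target"), (1, 0, "Missing")],
    )
    return db


def link_status(db):
    """Return (source page ID, target title, target exists?) for every link."""
    rows = db.execute("""
        SELECT pl_from, pl_title, page_id IS NOT NULL
        FROM pagelinks
        LEFT JOIN page
          ON page_namespace = pl_namespace AND page_title = pl_title
        ORDER BY pl_from, pl_title
    """).fetchall()
    return [(src, title, bool(exists)) for src, title, exists in rows]


if __name__ == "__main__":
    db = build_demo()
    print(link_status(db))
    # Deleting the target touches one row in page and none in pagelinks;
    # the existing links simply start resolving as broken.
    db.execute("DELETE FROM page WHERE page_id = 3")
    print(link_status(db))
```

Whether a link is "broken" is worked out at query time by the join, so creating or deleting the target never requires rewriting the rows that point at it.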
It may be necessary to do some shakedown testing to make sure I haven't introduced fun new bugs, but I figured it was better to do it now than to wait until the next major release. The update.php script should convert the existing tables automatically (it will leave the old tables in place for now...)
At some point we should also introduce the ability to run page_touched and squid purge updates in the background, by handing the target page to a purge daemon. This won't require database changes, though.
-- brion vibber (brion @ pobox.com)
On 26/05/05, Brion Vibber brion@pobox.com wrote:
> I've gone ahead and made another change to the schema which I'd originally passed over.
> The links and brokenlinks tables are now merged to a single pagelinks table, which records the namespace+title key pair of target links rather than the page ID or the prefixed title.
Just a quick suggestion (once again) that the opportunity is taken to merge in the categorylinks, imagelinks, etc tables as well (see comments on bug 1065, various posts in the mail archive, etc) while things are being changed (i.e. before 1.5 goes stable). If nothing else, keeping the schema relatively stable (as opposed to lots of little changes in future releases) ought to minimise the pain to users while big Wikimedia databases are upgraded. I know, "so do it yourself", but I just thought I'd keep it in the general consciousness that this needs doing...
Rowan Collins wrote:
> Just a quick suggestion (once again) that the opportunity is taken to merge in the categorylinks, imagelinks, etc tables as well (see comments on bug 1065, various posts in the mail archive, etc) while things are being changed (i.e. before 1.5 goes stable).
Aside from a sense of aesthetic pleasantness, I'm not sure what the actual benefit of merging these would be. Both imagelinks and categorylinks already have the properties of the pagelinks table: they remain valid and don't need to be updated when the source page is renamed or when the target is created, deleted, or renamed.
From a database constraints / validity point of view, note that image links and category links would never be valid with a target namespace other than their 'native' namespace; not having a namespace target in those tables enforces that all records are always valid in this respect.
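As a rough sketch of the validity point, compare a table with no target-namespace column against one that has it. The column names (il_from, il_to, pl_*) and the NS_IMAGE value of 6 are assumptions for illustration only.

```python
import sqlite3

NS_IMAGE = 6  # assumed numeric ID of the image namespace


def make_tables():
    """Create illustrative imagelinks and pagelinks tables in memory."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        -- No namespace column: the target is always in the image namespace.
        CREATE TABLE imagelinks (
            il_from INTEGER NOT NULL,
            il_to   TEXT NOT NULL
        );
        -- A general link table has to store the target namespace per row.
        CREATE TABLE pagelinks (
            pl_from      INTEGER NOT NULL,
            pl_namespace INTEGER NOT NULL,
            pl_title     TEXT NOT NULL
        );
    """)
    return db


def image_link_targets(db):
    """Resolve image links; the namespace is implied by the table, not stored."""
    rows = db.execute("SELECT il_to FROM imagelinks ORDER BY il_to")
    return [(NS_IMAGE, title) for (title,) in rows]


def misfiled_image_links(db):
    """If image links lived in a merged table, rows like these could exist."""
    rows = db.execute(
        "SELECT pl_from, pl_namespace, pl_title FROM pagelinks "
        "WHERE pl_title LIKE '%.png' AND pl_namespace != ?",
        (NS_IMAGE,),
    )
    return rows.fetchall()
```

With imagelinks there is simply no way to record a wrong target namespace, whereas a merged table would need an extra check (or trust in the code that writes it) to rule such rows out.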
That's not to say it _shouldn't_ be done, but these sorts of big changes do tend to introduce bugs, so it'd be more likely to happen if there is a concrete benefit.
-- brion vibber (brion @ pobox.com)
On 26/05/05, Brion Vibber brion@pobox.com wrote:
> Aside from a sense of aesthetic pleasantness, I'm not sure what the actual benefit of merging these would be. Both imagelinks and categorylinks already have the properties of the pagelinks table: they remain valid and don't need to be updated when the source page is renamed or when the target is created, deleted, or renamed.
Well, I think the first time I heard it mentioned was in relation to the need for a separate "templatelinks" table, distinct from normal links. The argument was that it would make sense to have a flag saying what type of link was being stored, rather than creating more and more essentially identical tables whenever we needed to distinguish something from a "plain" link. However, there *are* subtle differences between the tables, such as the namespace validity you mentioned, and the need for category links to have a sortkey (and possibly an indexable namespace_from, see previous discussions), so maybe this isn't such an obvious step as it seemed at first.
In which case, I guess it's time to add a "templatelinks" table! ;)
wikitech-l@lists.wikimedia.org