On Wed, 2 Mar 2005 20:14:25 +0000, Rowan Collins <rowan.collins@gmail.com> wrote:
> The disadvantage of using {namespace_as_int, title_as_text} for link targets is that this doesn't reflect how they're entered: [[Foo:Bar]] could change in meaning from {0, "Foo:Bar"} to {20, "Bar"} if a custom "Foo" namespace was created; the two forms could not, however, co-exist.
Namespace creation is a rare event which requires administrative work. Filtering by namespace, on the other hand, happens constantly.
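To make the ambiguity concrete, here is a minimal Python sketch of how a target string might be split into {namespace, title}. The split_target helper and the two namespace dictionaries are hypothetical illustrations, not MediaWiki code; the number 20 for "Foo" comes from the example quoted above.

```python
def split_target(text, namespaces):
    """Return (namespace_id, title) for a link target such as 'Foo:Bar'.

    `namespaces` maps a prefix to its number. An unknown prefix stays
    part of the title and the target lands in the main namespace (0).
    """
    prefix, sep, rest = text.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix], rest
    return 0, text


# Hypothetical registries before and after an admin creates "Foo".
before = {"Image": 6, "Category": 14}
after = dict(before, Foo=20)

print(split_target("Foo:Bar", before))  # (0, 'Foo:Bar')
print(split_target("Foo:Bar", after))   # (20, 'Bar')
```

The same text resolves differently only once the registry changes, which is the rare administrative event; every lookup in between sees one stable answer.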
> This suggests to me that it would be better to just make the link_to field wider than page_title (i.e. a width of 255 + a constant MAX_NAMESPACE_LENGTH), and retain the current practice of storing the destination as one string.
This would make it very difficult to use the links table for anything; for instance, Special:Recentchangeslinked would no longer be possible.
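A rough sketch of why the split form matters for this kind of query, using Python's sqlite3 module. The link table, its l_* columns, and the use of page_touched as a stand-in for a recent-changes timestamp are all assumptions made for illustration; this is not the real query behind Special:Recentchangeslinked.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE page (
    page_id        INTEGER PRIMARY KEY,
    page_namespace INTEGER NOT NULL,
    page_title     TEXT NOT NULL,
    page_touched   TEXT NOT NULL,      -- stand-in for a change timestamp
    UNIQUE (page_namespace, page_title)
);
CREATE TABLE link (                    -- hypothetical table and columns
    l_from      INTEGER NOT NULL,      -- page_id of the linking page
    l_namespace INTEGER NOT NULL,
    l_title     TEXT NOT NULL
);
""")
db.executemany("INSERT INTO page VALUES (?, ?, ?, ?)", [
    (1, 0, "Main_Page", "2005-03-01"),
    (2, 0, "Foo", "2005-03-02"),
    (3, 14, "Bar", "2005-03-03"),
])
db.executemany("INSERT INTO link VALUES (?, ?, ?)", [
    (1, 0, "Foo"),
    (1, 14, "Bar"),
    (1, 0, "Missing"),                 # target page does not exist
])


def linked_changes(from_id, namespace=None):
    """Pages linked from `from_id`, newest first, optionally one namespace."""
    sql = """
        SELECT page_namespace, page_title, page_touched
        FROM link
        JOIN page ON page_namespace = l_namespace AND page_title = l_title
        WHERE l_from = ?
    """
    args = [from_id]
    if namespace is not None:
        sql += " AND l_namespace = ?"
        args.append(namespace)
    sql += " ORDER BY page_touched DESC"
    return db.execute(sql, args).fetchall()


all_changes = linked_changes(1)
category_changes = linked_changes(1, namespace=14)
print(all_changes)
print(category_changes)
```

With the target stored as one combined string, the join condition would have to rebuild "Prefix:Title" text for every row, and the namespace filter would turn into string matching.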
Richard Holton wrote:
> I notice that in the new schema, the 'page' table uses the {namespace_as_int, title_as_text} form, and it doesn't save the namespace within the title. (Was that true of the old schema as well?)
Yes. It's been that way for years.
> I don't want to second-guess the new schema. It does seem that the link tables should use the same method of identifying pages as the 'page' table does.
For 'categorylinks', having the namespace in the index would allow fast separation by namespace.
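One possible reading of this, sketched with Python's sqlite3 module: store the namespace of the member page alongside each categorylinks row and put it in the index, so that a category listing can pull out just subcategories or just articles. The cl_from_namespace column and the index name are assumptions for illustration only, not part of any actual schema.

```python
import sqlite3

NS_MAIN, NS_IMAGE, NS_CATEGORY = 0, 6, 14

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE categorylinks (
    cl_from           INTEGER NOT NULL,  -- page_id of the member page
    cl_to             TEXT NOT NULL,     -- category title (namespace 14 implied)
    cl_from_namespace INTEGER NOT NULL   -- hypothetical: namespace of the member
);
-- Category first, then namespace, so one index range holds one group.
CREATE INDEX cl_to_ns ON categorylinks (cl_to, cl_from_namespace);
""")
db.executemany("INSERT INTO categorylinks VALUES (?, ?, ?)", [
    (10, "Physics", NS_MAIN),
    (11, "Physics", NS_MAIN),
    (12, "Physics", NS_CATEGORY),        # a subcategory
    (13, "Physics", NS_IMAGE),
    (14, "Chemistry", NS_MAIN),
])


def members(category, namespace):
    """page_ids of the members of `category` that live in `namespace`."""
    rows = db.execute(
        "SELECT cl_from FROM categorylinks"
        " WHERE cl_to = ? AND cl_from_namespace = ?"
        " ORDER BY cl_from",
        (category, namespace),
    )
    return [row[0] for row in rows]


articles = members("Physics", NS_MAIN)
subcategories = members("Physics", NS_CATEGORY)
print(articles, subcategories)
```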
Lemme summarize the situation:
We have four link tables currently: links, brokenlinks, imagelinks, and categorylinks.
links is from id->target id
brokenlinks is from id->(text target namespace+title)
imagelinks is from id->target title [namespace is 6 by definition]
categorylinks is from id->target title [namespace is 14 by definition]
In all four, the 'from' field is a key on page_id (cur_id in the old schema), which uniquely identifies the page doing the linking. This number persists across page renaming.
In imagelinks and categorylinks, the target title can be used in conjunction with the hardcoded namespace to join to page/cur for the target.
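As a sketch of that join, assuming a cut-down page table and an imagelinks table with il_from and il_to columns (the sample rows are invented), the namespace is supplied as a constant in the join condition:

```python
import sqlite3

NS_IMAGE = 6  # image namespace, fixed by definition

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE page (
    page_id        INTEGER PRIMARY KEY,
    page_namespace INTEGER NOT NULL,
    page_title     TEXT NOT NULL,
    UNIQUE (page_namespace, page_title)
);
CREATE TABLE imagelinks (
    il_from INTEGER NOT NULL,   -- page_id of the page using the image
    il_to   TEXT NOT NULL       -- image title, no namespace stored
);
""")
db.executemany("INSERT INTO page VALUES (?, ?, ?)", [
    (1, 0, "Main_Page"),
    (2, NS_IMAGE, "Logo.png"),
    (3, 0, "Logo.png"),         # same title, different namespace
])
db.executemany("INSERT INTO imagelinks VALUES (?, ?)", [
    (1, "Logo.png"),
    (1, "Gone.png"),            # no description page exists
])


def images_used_by(page_id):
    """(title, target page_id or None) for every image used on a page."""
    return db.execute(
        "SELECT il_to, page_id FROM imagelinks"
        " LEFT JOIN page ON page_namespace = ? AND page_title = il_to"
        " WHERE il_from = ?"
        " ORDER BY il_to",
        (NS_IMAGE, page_id),
    ).fetchall()


used = images_used_by(1)
print(used)
```

The main-namespace page that happens to share the title "Logo.png" is never matched, because the constant pins the join to namespace 6.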
In brokenlinks, the target is ugly ugly text. This can't be used in any joins. It should be changed to (namespace,title) but we are too lazy and this hasn't been done yet. There is the additional problem that the size limit of the field isn't 100% correct so there might be inconsistencies with long titles.
In links, the target is a page_id/cur_id number, and can be used to do joins. BUT, since linking is done by *name*, not by number, a creation/deletion/renaming of the target page will break this entry. Thus we have to clean up links, and shuffle pages around between links and brokenlinks when these things happen.
This kind of updating can be a burden on the database during operations on heavily-linked pages, so it's something we scalability-conscious folk want to eliminate.
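A toy model of that shuffle, using Python's sqlite3 module. Namespaces are left out to keep it short, and create_page and delete_page are hypothetical helpers, not MediaWiki functions. The point is only that the number of rows rewritten grows with the number of incoming links.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT UNIQUE);
CREATE TABLE links (l_from INTEGER, l_to INTEGER);       -- target is a page id
CREATE TABLE brokenlinks (bl_from INTEGER, bl_to TEXT);  -- target is text
""")


def create_page(title):
    """Create a page and move every broken link naming it into links."""
    new_id = db.execute(
        "INSERT INTO page (page_title) VALUES (?)", (title,)
    ).lastrowid
    db.execute(
        "INSERT INTO links SELECT bl_from, ? FROM brokenlinks WHERE bl_to = ?",
        (new_id, title),
    )
    moved = db.execute(
        "DELETE FROM brokenlinks WHERE bl_to = ?", (title,)
    ).rowcount
    return new_id, moved


def delete_page(page_id):
    """Delete a page and move every link pointing at it into brokenlinks."""
    (title,) = db.execute(
        "SELECT page_title FROM page WHERE page_id = ?", (page_id,)
    ).fetchone()
    db.execute(
        "INSERT INTO brokenlinks SELECT l_from, ? FROM links WHERE l_to = ?",
        (title, page_id),
    )
    moved = db.execute("DELETE FROM links WHERE l_to = ?", (page_id,)).rowcount
    db.execute("DELETE FROM page WHERE page_id = ?", (page_id,))
    return moved


def count(table):
    return db.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Three pages that all link to a page named "Target" before it exists.
linkers = [create_page(f"Linker{i}")[0] for i in range(3)]
db.executemany(
    "INSERT INTO brokenlinks VALUES (?, 'Target')", [(i,) for i in linkers]
)

target_id, moved_on_create = create_page("Target")
after_create = (count("links"), count("brokenlinks"))
moved_on_delete = delete_page(target_id)
after_delete = (count("links"), count("brokenlinks"))
print(moved_on_create, after_create, moved_on_delete, after_delete)
```

If the target were stored as (namespace, title) in a single table, creating or deleting the target page would not need to touch any of these rows.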
-- brion vibber (brion @ pobox.com)