On Thu, 3 Mar 2005 13:47:27 +0000, Rowan Collins rowan.collins@gmail.com wrote:
On Wed, 2 Mar 2005 20:14:43 -0600, Richard Holton richholton@gmail.com wrote:
If I understand things correctly, we should eventually look at changing 'links' and 'brokenlinks' to use a from id -> (to namespace, to title) mapping.
Would there be a need for separate tables then?
No, almost certainly not. In fact, it would be very helpful to create a unified table, because then we wouldn't need the proliferation of task-specific tables. See, for instance, http://bugzilla.wikimedia.org/show_bug.cgi?id=1065
Thanks for the link to the bug entry. The discussion there helped me to understand some things a bit better.
Categorylinks seems to be a bit of a special case, since it really amounts to a reverse link.
That's one way of thinking about it. The other, which is more in tune with less formalised wikis (e.g. http://c2.com/cgi/wiki?WikiCategories), is that the category page is just a souped-up "What links here" display, and the links on individual pages are just internal links displayed out-of-line. On this view, they're just another kind of link, distinguished only to solve the "Use-mention problem" (http://www.usemod.com/cgi-bin/mb.pl?UseMentionProblem).
Another way to think about things: a category link is like a two-way link. It links both from the page to the category, and from the category to the page. I definitely see the similarity between "what links here" and categories. In that sense, each page becomes its own category, with any link to the page a 'category entry'. Perhaps we could use this relationship to improve the 'what links here' display.
At the moment, I'm thinking of adding a "fromNamespace" field to categorylinks. This will decouple the display of pages from the display of parent categories, and would facilitate breaking the category display into namespaces.
Hm, that certainly does match your "reverse link" interpretation. Of course, it suffers from a variant of the update problem that the current id->id based links table does: a page can be moved from one namespace to another, taking its category links with it. So, if [[Foo]] contains [[Category:Metasyntactic variables]], and I move it to [[Wikipedia:Foo]], the from_namespace field in the categorylinks table would become wrong, even though the actual page reference (by ID) would remain valid. I don't know whether that update would be more or less sensible/expensive/complex than having to do an extra lookup/join/whatever to get the namespace out of the page table when displaying the category.
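A minimal sketch of the trade-off, assuming a heavily simplified schema in SQLite: `page` keeps only `page_id`, `page_namespace` and `page_title`, and `cl_from_namespace` is a hypothetical name for the proposed denormalized field. Namespace numbers follow MediaWiki's (0 = main, 4 = project). It shows the stored field going stale after a move, while a join against `page` stays correct.

```python
import sqlite3

# Simplified, illustrative schema; cl_from_namespace is the hypothetical proposed field.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,
    page_namespace INTEGER NOT NULL,
    page_title TEXT NOT NULL
);
CREATE TABLE categorylinks (
    cl_from INTEGER NOT NULL,          -- page_id of the categorised page
    cl_to TEXT NOT NULL,               -- category title
    cl_from_namespace INTEGER NOT NULL -- denormalized copy of page.page_namespace
);
""")

NS_MAIN, NS_PROJECT = 0, 4

conn.execute("INSERT INTO page VALUES (1, ?, 'Foo')", (NS_MAIN,))
conn.execute("INSERT INTO categorylinks VALUES (1, 'Metasyntactic_variables', ?)", (NS_MAIN,))

# Move [[Foo]] to [[Wikipedia:Foo]]: the page row changes, its page_id does not.
conn.execute("UPDATE page SET page_namespace = ? WHERE page_id = 1", (NS_PROJECT,))

# Option 1: read the stored field. It is stale unless the move also rewrote it.
stale = conn.execute(
    "SELECT cl_from_namespace FROM categorylinks WHERE cl_from = 1").fetchone()[0]

# Option 2: join against page at display time. Always current, at the cost of a join.
joined = conn.execute("""
    SELECT p.page_namespace FROM categorylinks cl
    JOIN page p ON p.page_id = cl.cl_from
    WHERE cl.cl_to = 'Metasyntactic_variables'
""").fetchone()[0]

print(stale, joined)  # 0 4

# Keeping option 1 correct costs one extra UPDATE per move, limited to the moved page's own rows.
conn.execute("UPDATE categorylinks SET cl_from_namespace = ? WHERE cl_from = 1", (NS_PROJECT,))
```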
Yes, that is the trade-off. However, when a category page moves, it only has to update its own categorylink entries: one for each [[category:xxx]] on that page. Note that currently, the cl_sortkey field is updated on any page move for those links that don't have a specified sort key.
If you add a category link ([[category:foo]]) to a category page "bar", then "bar" becomes a sub-category of Category:Foo. The link from category:bar to category:foo is entered into categorylinks just like any other. Currently, to find subcategories for a category page, the links have to be retrieved and separated by namespace in PHP. This is why the subcategories and other pages on the category page are listed together alphabetically. A chunk of up to 200 links to the category is retrieved, then separated. If you wanted to get the next 200 sub-categories, you would have to keep reading category links until you found 200 subcategories, or until you had read all the categorylinks, whichever comes first. That is why having a namespace field is important.
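To make the paging cost concrete, here is a rough in-memory simulation. It assumes rows are `(sortkey, namespace)` tuples already sorted by sort key, and it approximates a namespace index by grouping keys per namespace; the function names are hypothetical and nothing here is MediaWiki code. With 10,000 articles sorting ahead of 5 subcategories, the filtering approach reads every row to fill one page of subcategories.

```python
from collections import defaultdict

NS_MAIN, NS_CATEGORY = 0, 14
CHUNK = 200  # the "chunk of up to 200 links" described above

def subcats_by_filtering(links, limit=CHUNK):
    """No namespace column: fetch rows chunk by chunk and filter in application code.

    `links` is a list of (sortkey, namespace) tuples sorted by sortkey, standing in
    for one category's rows in categorylinks. Returns (subcategories, rows_read).
    """
    found, rows_read = [], 0
    while len(found) < limit and rows_read < len(links):
        chunk = links[rows_read:rows_read + CHUNK]
        rows_read += len(chunk)
        found.extend(key for key, ns in chunk if ns == NS_CATEGORY)
    return found[:limit], rows_read

def build_ns_index(links):
    """Group sort keys by namespace, roughly what an index on (namespace, sortkey) provides."""
    index = defaultdict(list)
    for key, ns in links:  # input is sorted, so each per-namespace list stays sorted
        index[ns].append(key)
    return index

def subcats_by_index(index, offset=0, limit=CHUNK):
    """With a namespace column: go straight to subcategories; the next page is just offset += limit."""
    page = index.get(NS_CATEGORY, [])[offset:offset + limit]
    return page, len(page)  # only matching rows are read

# A category with 10,000 articles and 5 subcategories that sort after them.
links = sorted([(f"Article {i:05d}", NS_MAIN) for i in range(10000)]
               + [(f"Sub {i}", NS_CATEGORY) for i in range(5)])
subs, read = subcats_by_filtering(links)
print(len(subs), read)  # 5 10005
```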
There are also requests for separating the category displays further into namespaces (e.g. listing templates separately). This would have the same issue as above.
There's an additional problem with merging the categorylinks table in with the others anyway, in that it has extra fields cl_sortkey and cl_timestamp that the others don't have or need. That's a shame, because it would be neat, if redesigning the tables, not to have to leave that one out.
By the way, cl_timestamp is currently unused (I believe). It can be handy for debugging, but otherwise I don't know its intended purpose.
Some rambling thoughts, though, on a possible links schema:
- l_from (as page_id)
- l_to_ns (as int)
- l_to_title (as string)
- l_is_broken (boolean)
--- to allow everything that brokenlinks currently does; this should be kept separate from the link type, because if you try to {{include}} a non-existent page, it's both "broken" and a "template" link, and both facts are potentially useful
I don't fully understand the use of a "broken" field. Does this eliminate the need for verifying page existence during rendering?
Maintaining "broken" requires a potentially large update for each page creation, deletion, or move.
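A small sketch of both sides of this question, assuming a simplified SQLite version of the proposed schema (only the `links` columns listed above plus a minimal `page` table; `create_page` is a hypothetical helper). With a stored flag, rendering can read `l_is_broken` directly, but creating a page rewrites every link row that targets it; without the flag, existence is computed at read time with a LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,
    page_namespace INTEGER NOT NULL,
    page_title TEXT NOT NULL,
    UNIQUE (page_namespace, page_title)
);
CREATE TABLE links (
    l_from INTEGER NOT NULL,
    l_to_ns INTEGER NOT NULL,
    l_to_title TEXT NOT NULL,
    l_is_broken INTEGER NOT NULL  -- 1 if the target page does not exist
);
CREATE INDEX links_to ON links (l_to_ns, l_to_title);
""")

# Three pages (ids 10-12) link to a not-yet-existing [[Bar]] in the main namespace.
conn.executemany("INSERT INTO links VALUES (?, 0, 'Bar', 1)", [(10,), (11,), (12,)])

def create_page(ns, title):
    """Create a page and clear the broken flag on every link that points at it."""
    conn.execute("INSERT INTO page (page_namespace, page_title) VALUES (?, ?)", (ns, title))
    cur = conn.execute(
        "UPDATE links SET l_is_broken = 0 WHERE l_to_ns = ? AND l_to_title = ?", (ns, title))
    return cur.rowcount  # link rows rewritten by this single page creation

touched = create_page(0, "Bar")

# The alternative: no stored flag; compute existence at render time with a LEFT JOIN.
broken_now = conn.execute("""
    SELECT COUNT(*) FROM links l
    LEFT JOIN page p ON p.page_namespace = l.l_to_ns AND p.page_title = l.l_to_title
    WHERE p.page_id IS NULL
""").fetchone()[0]
print(touched, broken_now)  # 3 0
```

In this sketch, deletion would be the mirror-image UPDATE setting the flag back to 1, and a move would be a deletion of the old title plus a creation of the new one, so a heavily linked title makes each of those operations proportionally more expensive.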
- l_type (probably an int, like namespace)
--- values could include:
--- LT_NORMAL: just an ordinary [[free link]] (or an escaped link to a [[:Image:Page]] or [[:Category:Page]])
--- LT_IMAGE: it's an image display (l_to_ns would be redundant, but never mind)
--- LT_TEMPLATE: it's a {{template inclusion}}
--- LT_INTERWIKI: currently, interwiki links don't get stored anywhere, but it could be useful if they were; for instance, if someone changes the interwiki map, it might be worth looking up what links are affected. Obviously, l_to_ns would be meaningless, and any search for a specific prefix would involve matching the text, but might it be worth considering?
--- LT_CATEGORY: as I say, this has rather awkward issues: where do the extra fields go? l_to_ns could be reused [abused] as the namespace of the *originating* page if keeping that up-to-date was considered better than looking it up in the page table
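For illustration only, the unified table sketched above might look like this in SQLite. The integer values for the `LT_*` constants are arbitrary assumptions, `links_here` is a hypothetical helper, and the namespace numbers are MediaWiki's (0 = main, 10 = Template, 14 = Category). It shows one row being both broken and a template link, and a "what links here" query optionally narrowed by link type.

```python
import sqlite3

# Hypothetical link-type constants from the list above; the numeric values are arbitrary.
LT_NORMAL, LT_IMAGE, LT_TEMPLATE, LT_INTERWIKI, LT_CATEGORY = range(5)
NS_MAIN, NS_TEMPLATE, NS_CATEGORY = 0, 10, 14

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE links (
    l_from INTEGER NOT NULL,
    l_to_ns INTEGER NOT NULL,
    l_to_title TEXT NOT NULL,
    l_is_broken INTEGER NOT NULL DEFAULT 0,
    l_type INTEGER NOT NULL
);
CREATE INDEX links_target ON links (l_to_ns, l_to_title, l_type);
""")

# Page 7 links to [[Foo]], includes a missing {{Stub}}, and is in [[Category:Bar]].
conn.executemany(
    "INSERT INTO links (l_from, l_to_ns, l_to_title, l_is_broken, l_type) VALUES (?, ?, ?, ?, ?)",
    [
        (7, NS_MAIN, "Foo", 0, LT_NORMAL),
        (7, NS_TEMPLATE, "Stub", 1, LT_TEMPLATE),  # both "broken" and a "template" link
        (7, NS_CATEGORY, "Bar", 0, LT_CATEGORY),
    ],
)

def links_here(ns, title, link_type=None):
    """'What links here' for one target, optionally restricted to a single link type."""
    sql = "SELECT l_from FROM links WHERE l_to_ns = ? AND l_to_title = ?"
    params = [ns, title]
    if link_type is not None:
        sql += " AND l_type = ?"
        params.append(link_type)
    return [row[0] for row in conn.execute(sql, params)]

print(links_here(NS_TEMPLATE, "Stub", LT_TEMPLATE))  # [7]
broken_templates = conn.execute(
    "SELECT l_from, l_to_title FROM links WHERE l_type = ? AND l_is_broken = 1", (LT_TEMPLATE,)
).fetchall()
print(broken_templates)  # [(7, 'Stub')]
```

The category row still has nowhere to keep cl_sortkey or cl_timestamp; in this sketch they would need either extra nullable columns used only by LT_CATEGORY rows or a separate side table keyed on (l_from, l_to_title).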
I want to think about your ideas on redirects, but I don't have time at the moment. I'll try to get back to you later.
-Rich Holton en.wikipedia:User:Rholton