On Thu, 3 Mar 2005 10:05:15 -0600, Richard Holton richholton@gmail.com wrote:
Another way to think about things: a category link is like a two-way link. It links both from the page to the category, and from the category to the page.
I definitely see the similarity between "what links here" and categories. In that sense, each page becomes its own category, with any link to the page a 'category entry'.
Yes, that's the concept behind "WikiCategories" on simple wikis like UseMod - the backlinks are the member pages. e.g. http://www.usemod.com/cgi-bin/mb.pl?CategoryMeatball, the contents of which just tells you to look at its backlinks.
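To make the "backlinks are the member pages" idea concrete, here is a minimal sketch in Python. The page names other than CategoryMeatball and the `backlinks` helper are made up for illustration; this is not how UseMod itself implements it.

```python
from collections import defaultdict

def backlinks(links):
    """Invert a page -> outgoing-links mapping into target -> linking pages."""
    incoming = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            incoming[target].add(source)
    return incoming

# Hypothetical wiki: two pages "tag" themselves by linking to CategoryMeatball.
links = {
    "WikiPage": {"CategoryMeatball"},
    "SoftSecurity": {"CategoryMeatball", "WikiPage"},
    "CategoryMeatball": set(),
}

# The category's members are just the pages that link to it.
members = sorted(backlinks(links)["CategoryMeatball"])
print(members)
```

The category page needs no content of its own: its "listing" is derived entirely from the reverse direction of ordinary links.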
Perhaps we could use this relationship to improve the 'what links here' display.
That would indeed be incredibly useful - that page is hideously presented, has an arbitrary hard-coded cut-off and is unsorted. If we could use the same display style (or even some shared super-class) as the category listings, that would be a huge improvement.
It almost makes me wonder if all links should store a from_namespace and sortkey, so that we can do the same paging and display separation tricks - but that would of course require massive updates on every page move, so probably isn't going to happen.
... Of course, it suffers from a variant of the update problem that the current id->id based links table does: a page can be moved from one namespace to another, taking its category links with it.
Yes, that is the trade-off. However, when a category page moves, it only has to update its own categorylink entries -- one for each [[category:xxx]] on that page.
It's not moving the *category* that's the main problem, but moving the *content*, since that could cross namespaces. I think you did get my point, it's just this sentence is either a typo or a non sequitur (or both) ;-)
Note that currently, the cl_sortkey field is updated on any page move for those links that don't have a specified sort key.
Ah, I didn't realise that; but I guess it makes sense - especially once you get into the listing needing to be split into pages.
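A rough sketch of what that sort key update on move might look like, using Python's sqlite3 so it can be run standalone. It assumes the default sort key is simply the page title, so "no specified sort key" is detected by the key still matching the old title; that detection rule and the `move_page` helper are my assumptions, not a description of the actual code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_sortkey TEXT)"
)
conn.executemany("INSERT INTO categorylinks VALUES (?, ?, ?)", [
    (1, "Fruit", "Old title"),   # default key: same as the page title
    (1, "Food", "Custom key"),   # key given explicitly in the category link
])

def move_page(conn, page_id, old_title, new_title):
    """Refresh default sort keys after a move; explicit keys are left alone.

    Assumption: a key equal to the old title means no key was specified.
    """
    cur = conn.execute(
        "UPDATE categorylinks SET cl_sortkey = ? "
        "WHERE cl_from = ? AND cl_sortkey = ?",
        (new_title, page_id, old_title),
    )
    return cur.rowcount

changed = move_page(conn, 1, "Old title", "New title")
keys = [row[0] for row in conn.execute(
    "SELECT cl_sortkey FROM categorylinks ORDER BY cl_to")]
print(changed, keys)
```

The update touches only the moved page's own rows, which is why this cost stays proportional to the number of categories on that page.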
Currently, to find subcategories for a category page, the links have to be retrieved and separated by namespace in PHP.
I'm not that hot on SQL (serves me right for not studying straight CS at uni), but isn't it possible to do something like "SELECT <stuff> FROM categorylinks, page WHERE cl_to=<whatever> AND page_namespace=NS_CATEGORY LIMIT 200", thus avoiding the problem you're describing?
I've seen things that look similar to this in the code, but it could be that 1) I have misunderstood, and this particular one's impossible; 2) this would in fact be possible, but prohibitively expensive/inefficient, and therefore it's been discarded as an option; or 3) it would amount to exactly the same activity as doing it as a sequence of queries with PHP in between.
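For what it's worth, here is a runnable sketch of that kind of query, using Python's sqlite3 and a cut-down schema. The sample rows and the `subcategories` helper are invented for illustration, and the value 14 for NS_CATEGORY is assumed. Note that the two tables need an explicit join condition (cl_from = page_id here), otherwise every category link gets paired with every page. This says nothing about whether the query would be cheap enough on the real database.

```python
import sqlite3

NS_MAIN = 0
NS_CATEGORY = 14  # assumed numeric value of the Category namespace

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,
    page_namespace INTEGER,
    page_title TEXT
);
CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_sortkey TEXT);
""")
conn.executemany("INSERT INTO page VALUES (?, ?, ?)", [
    (1, NS_MAIN, "Apple"),
    (2, NS_CATEGORY, "Citrus"),
    (3, NS_MAIN, "Banana"),
])
conn.executemany("INSERT INTO categorylinks VALUES (?, ?, ?)", [
    (1, "Fruit", "Apple"),
    (2, "Fruit", "Citrus"),
    (3, "Fruit", "Banana"),
])

def subcategories(conn, category, limit=200):
    """Return titles of member pages that are themselves categories."""
    rows = conn.execute(
        "SELECT page_title FROM categorylinks "
        "JOIN page ON page_id = cl_from "
        "WHERE cl_to = ? AND page_namespace = ? "
        "ORDER BY cl_sortkey LIMIT ?",
        (category, NS_CATEGORY, limit),
    )
    return [row[0] for row in rows]

subcats = subcategories(conn, "Fruit")
print(subcats)
```

The namespace filter happens in the database, so nothing has to be separated in PHP afterwards; storing a from_namespace on the link itself would remove even the join.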
There's an additional problem with merging the categorylinks table in with the others anyway, in that it has extra fields cl_sortkey and cl_timestamp that the others don't have or need. That's a shame, because it would be neat, if redesigning the tables, not to have to leave that one out.
By the way, cl_timestamp is currently unused (I believe). It can be handy for debugging, but otherwise I don't know its intended purpose.
Hm. I thought maybe it was intended that additions and removals from a category could be presented in a history display of some sort, or on watchlists etc. But a timestamp would only allow listing of additions, so I'm not sure that can be right.
Still, unless there's a sensible way of reusing the sortkey field in other link-types, it's probably not a good idea to merge category links in with everything else. [To store the namespace_from, you could just reuse the otherwise redundant namespace_to field, but nothing springs to mind that could double up with the sortkey in this way]
- l_is_broken (boolean)
--- to allow everything that brokenlinks currently does; this should be kept separate from the link type, because if you try to {{include}} a non-existent page, it's both "broken" and a "template" link, and both facts are potentially useful
I don't fully understand the use of a "broken" field. Does this eliminate the need for verifying page existence during rendering?
Maintaining "broken" requires a potentially large update for each page creation, deletion, or move.
Well, there must be some reason we currently store "brokenlinks" separate from "links", right? I'm not sure the distinction is used when rendering, that seems to be done by looking at the page table, but there's an awful lot of deferral and caching, so maybe it is. [Or, alternatively, maybe it should be! I don't know.] It's used by utilities like Special:Wantedpages, I know that much. It certainly seems to make sense to store the distinction.
As for updates - yes, creation or deletion will have a big impact, but all the linking pages need their cache invalidating anyway, because they'll have the wrong colour links, so it's always going to be a big deal. And note that *moving* a page doesn't break any links - there's still a page at the old location, albeit a brand new redirect. Somebody might delete that redirect later, but that's a separate action.
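To illustrate the bookkeeping I have in mind, here is a small sketch using Python's sqlite3. Only l_is_broken comes from the proposal above; the other column names, the table layout and the helper functions are hypothetical. Creating a page clears the flag on every link pointing at it, and the same set of linking pages is the one whose cache would need invalidating anyway.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_title TEXT PRIMARY KEY);
CREATE TABLE links (l_from TEXT, l_to TEXT, l_is_broken INTEGER);
""")
conn.execute("INSERT INTO page VALUES ('Home')")
conn.executemany("INSERT INTO links VALUES (?, ?, ?)", [
    ("Home", "Wanted", 1),  # target does not exist yet
    ("Home", "Other", 1),
])

def set_broken(conn, title, broken):
    """Flip l_is_broken on all links to `title`; return the linking pages."""
    conn.execute(
        "UPDATE links SET l_is_broken = ? WHERE l_to = ?",
        (int(broken), title),
    )
    rows = conn.execute(
        "SELECT DISTINCT l_from FROM links WHERE l_to = ? ORDER BY l_from",
        (title,),
    )
    return [row[0] for row in rows]

def create_page(conn, title):
    """Create the page and report which pages need their cache invalidated."""
    conn.execute("INSERT INTO page VALUES (?)", (title,))
    return set_broken(conn, title, False)

def wanted_pages(conn):
    """A Wantedpages-style listing read straight from the flag."""
    rows = conn.execute(
        "SELECT DISTINCT l_to FROM links WHERE l_is_broken = 1 ORDER BY l_to"
    )
    return [row[0] for row in rows]

before = wanted_pages(conn)
to_invalidate = create_page(conn, "Wanted")
after = wanted_pages(conn)
print(before, to_invalidate, after)
```

A move would not call set_broken at all in this scheme, since the redirect left behind keeps the old title in existence; only a later deletion of that redirect would.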