On Thu, 3 Mar 2005 17:28:50 +0000, Rowan Collins rowan.collins@gmail.com wrote:
On Thu, 3 Mar 2005 10:05:15 -0600, Richard Holton richholton@gmail.com wrote:
Yes, that is the trade-off. However, when a category page moves, it only has to update its own categorylink entries -- one for each [[category:xxx]] on that page.
It's not moving the *category* that's the main problem, but moving the *content*, since that could cross namespaces. I think you did get my point, it's just this sentence is either a typo or a non sequitur (or both) ;-)
Yes, it was a typo. I meant that when a page moves, it only has to update its own categorylink entries.
Note that currently, the cl_sortkey field is updated on any page move for those links that don't have a specified sort key.
Ah, I didn't realise that; but I guess it makes sense - especially once you get into the listing needing to be split into pages.
Currently, to find subcategories for a category page, the links have to be retrieved and separated by namespace in PHP.
I'm not that hot on SQL (serves me right for not studying straight CS at uni), but isn't it possible to do something like "SELECT <stuff> FROM categorylinks, page WHERE cl_from=page_id AND cl_to=<whatever> AND page_namespace=NS_CATEGORY LIMIT 200", thus avoiding the problem you're describing?
I've seen things that look similar to this in the code, but it could be that
1) I have misunderstood, and this particular one's impossible
2) this would in fact be possible, but prohibitively expensive/inefficient, and therefore it's been discarded as an option
3) it would amount to exactly the same activity as doing it as a sequence of queries with PHP in between
Yes, I realized the same thing about 20 minutes after I posted my message. Funny how you can fail to see something until you actually put things to print. The namespace can be grabbed from the page table via a join. I _don't_ know what sort of inefficiencies are introduced by doing so.
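For what it's worth, here is a minimal sketch of the join idea, using Python's sqlite3 module as a stand-in for the real database. The table layout is cut down to the columns needed, the join condition cl_from = page_id is my assumption about how the two tables relate, and the function names and sample rows are made up for illustration. It says nothing about how expensive the join is on a real wiki.

```python
import sqlite3

NS_CATEGORY = 14  # namespace number MediaWiki uses for Category: pages


def demo_connection():
    """Build a tiny in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER,
            page_title TEXT
        );
        CREATE TABLE categorylinks (
            cl_from INTEGER,   -- assumed to hold the page_id of the linking page
            cl_to TEXT,
            cl_sortkey TEXT
        );
        INSERT INTO page VALUES
            (1, 0, 'Oak'),
            (2, 14, 'Conifers'),
            (3, 14, 'Broadleaf_trees');
        INSERT INTO categorylinks VALUES
            (1, 'Trees', 'Oak'),
            (2, 'Trees', 'Conifers'),
            (3, 'Trees', 'Broadleaf trees');
        """
    )
    return conn


def subcategories(conn, category, limit=200):
    """Return titles of Category-namespace pages that link into `category`."""
    rows = conn.execute(
        """
        SELECT page_title
        FROM categorylinks
        JOIN page ON page_id = cl_from
        WHERE cl_to = ? AND page_namespace = ?
        ORDER BY cl_sortkey
        LIMIT ?
        """,
        (category, NS_CATEGORY, limit),
    ).fetchall()
    return [title for (title,) in rows]
```

Calling subcategories(demo_connection(), "Trees") returns only the two category pages, so the namespace split happens in the query rather than in PHP afterwards.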
I was thinking of having an index on (cl_to, cl_from_namespace, cl_sortkey) to have really speedy category page builds.
If I remember my CS courses correctly (now many years old), proper database design would have us not duplicate the namespace field. However, efficiency does sometimes trump theory.
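To make the trade-off concrete, here is a rough sketch of the denormalized variant, again with sqlite3 standing in for the real database. The cl_from_namespace column and the three-column index come from the proposal above; the index name, function names and sample rows are invented for illustration, and SQLite's planner is not necessarily representative of what MySQL would do.

```python
import sqlite3

NS_CATEGORY = 14  # namespace number MediaWiki uses for Category: pages

SUBCAT_SQL = """
    SELECT cl_sortkey
    FROM categorylinks
    WHERE cl_to = ? AND cl_from_namespace = ?
    ORDER BY cl_sortkey
"""


def build_demo():
    """In-memory categorylinks table carrying a duplicated namespace column."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE categorylinks (
            cl_from INTEGER,
            cl_from_namespace INTEGER,  -- duplicated from the page table
            cl_to TEXT,
            cl_sortkey TEXT
        );
        -- hypothetical index name; the column order is the one proposed
        CREATE INDEX cl_to_ns_sortkey
            ON categorylinks (cl_to, cl_from_namespace, cl_sortkey);
        INSERT INTO categorylinks VALUES
            (1, 0, 'Trees', 'Oak'),
            (2, 14, 'Trees', 'Conifers'),
            (3, 14, 'Trees', 'Broadleaf trees');
        """
    )
    return conn


def subcategory_sortkeys(conn, category):
    """List sort keys of subcategories without touching the page table."""
    rows = conn.execute(SUBCAT_SQL, (category, NS_CATEGORY)).fetchall()
    return [key for (key,) in rows]


def query_plan(conn, category):
    """Return SQLite's plan description for the subcategory query."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN " + SUBCAT_SQL, (category, NS_CATEGORY)
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(str(row[-1]) for row in rows)
```

The point of the column order is that both equality tests and the ORDER BY can be satisfied from the one index, with no join. The cost is that every page move across namespaces has to rewrite cl_from_namespace on that page's category links.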
There's an additional problem with merging the categorylinks table in with the others anyway, in that it has extra fields cl_sortkey and cl_timestamp that the others don't have or need. That's a shame, because it would be neat, if redesigning the tables, not to have to leave that one out.
By the way, cl_timestamp is currently unused (I believe). It can be handy for debugging, but otherwise I don't know its intended purpose.
Hm. I thought maybe it was intended that additions and removals from a category could be presented in a history display of some sort, or on watchlists etc. But a timestamp would only allow listing of additions, so I'm not sure that can be right.
Still, unless there's a sensible way of reusing the sortkey field in other link-types, it's probably not a good idea to merge category links in with everything else. [To store the namespace_from, you could just reuse the otherwise redundant namespace_to field, but nothing springs to mind that could double up with the sortkey in this way]
I'm not thrilled with the idea of fudging the namespace_from data into the namespace_to field, though I certainly could live with it. However, I also cannot figure out any use for the cl_sortkey field for non-category links. We're probably better off not trying to force-fit things.
- l_is_broken (boolean)
--- to allow everything that brokenlinks currently does; this should be kept separate from the link type, because if you try to {{include}} a non-existent page, it's both "broken" and a "template" link, and both facts are potentially useful
I don't fully understand the use of a "broken" field. Does this eliminate the need for verifying page existence during rendering?
Maintaining "broken" requires a potentially large update for each page creation, deletion, or move.
Well, there must be some reason we currently store "brokenlinks" separate from "links", right? I'm not sure the distinction is used when rendering, that seems to be done by looking at the page table, but there's an awful lot of deferral and caching, so maybe it is. [Or, alternatively, maybe it should be! I don't know.] It's used by utilities like Special:Wantedpages, I know that much. It certainly seems to make sense to store the distinction.
As for updates - yes, creation or deletion will have a big impact, but all the linking pages need their cache invalidating anyway, because they'll have the wrong colour links, so it's always going to be a big deal. And note that *moving* a page doesn't break any links - there's still a page at the old location, albeit a brand new redirect. Somebody might delete that redirect later, but that's a separate action.
Yes, but if a page is moved to a previously unoccupied location, then you need to check if some previously broken links are now not broken.
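Here is a small sketch of what I mean, with sqlite3 standing in for the real database. Only l_is_broken comes from the proposal; the links table layout, the other column names, the page_is_redirect flag and the move_page function are all hypothetical, and cache invalidation is left out entirely.

```python
import sqlite3


def build_demo():
    """In-memory tables with made-up rows: one existing page, four links."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (
            page_namespace INTEGER,
            page_title TEXT,
            page_is_redirect INTEGER DEFAULT 0,
            UNIQUE (page_namespace, page_title)
        );
        CREATE TABLE links (
            l_from INTEGER,
            l_namespace_to INTEGER,
            l_title_to TEXT,
            l_is_broken INTEGER
        );
        INSERT INTO page VALUES (0, 'Old_name', 0);
        INSERT INTO links VALUES
            (10, 0, 'Old_name', 0),
            (11, 0, 'New_name', 1),
            (12, 0, 'New_name', 1),
            (13, 0, 'Nowhere', 1);
        """
    )
    return conn


def move_page(conn, namespace, old_title, new_title):
    """Move a page, leave a redirect behind, and return how many
    previously broken links now point at an existing page."""
    with conn:  # one transaction for the whole move
        conn.execute(
            "UPDATE page SET page_title = ? "
            "WHERE page_namespace = ? AND page_title = ?",
            (new_title, namespace, old_title),
        )
        # The old location stays occupied by a new redirect, so links to it
        # keep working and need no update.
        conn.execute(
            "INSERT INTO page VALUES (?, ?, 1)", (namespace, old_title)
        )
        # Links that were waiting for the new title are no longer broken.
        cur = conn.execute(
            "UPDATE links SET l_is_broken = 0 "
            "WHERE l_namespace_to = ? AND l_title_to = ? AND l_is_broken = 1",
            (namespace, new_title),
        )
        return cur.rowcount
```

So a move touches the links pointing at the *new* title rather than the old one, and the size of that update depends on how many pages were already linking to the previously empty location.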
-Rich Holton en.wikipedia:User:Rholton