On Thu, 3 Mar 2005 17:28:50 +0000, Rowan Collins
<rowan.collins(a)gmail.com> wrote:
On Thu, 3 Mar 2005 10:05:15 -0600, Richard Holton
<richholton(a)gmail.com> wrote:
Yes, that is the trade-off. However, when a category page moves, it
only has to update its own categorylink entries -- one for each
[[category:xxx]] on that page.
It's not moving the *category* that's the main problem, but moving the
*content*, since that could cross namespaces. I think you did get my
point, it's just this sentence is either a typo or a non sequitur (or
both) ;-)
Yes, it was a typo. I meant that when a page moves, it only has to
update its own categorylink entries.
Note that currently, the cl_sortkey field is updated on any page move
for those links that don't have a specified sort key.
Ah, I didn't realise that; but I guess it makes sense - especially
once you get into the listing needing to be split into pages.
Currently, to find subcategories for a category page, the links have
to be retrieved and separated by namespace in PHP.
I'm not that hot on SQL (serves me right for not studying straight CS
at uni), but isn't it possible to do something like "SELECT <stuff>
FROM categorylinks, page WHERE cl_to=<whatever> AND
page_namespace=NS_CATEGORY LIMIT 200", thus avoiding the problem
you're describing?
I've seen things that look similar to this in the code, but it could be that
1) I have misunderstood, and this particular one's impossible
2) this would in fact be possible, but prohibitively
expensive/inefficient, and therefore it's been discarded as an option
3) it would amount to exactly the same activity as doing it as a
sequence of queries with PHP in between
Yes, I realized the same thing about 20 minutes after I posted my
message. Funny how you can fail to see something until you actually
put things to print. The namespace can be grabbed from the page table
via a join. I _don't_ know what sort of inefficiencies are introduced
by doing so.
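
For illustration only, the join described above can be sketched against a toy schema. This is a minimal sketch, not MediaWiki's actual code: the tables carry only the columns mentioned in this thread, the sample rows are invented, the subcategories() helper is hypothetical, and SQLite stands in for the real database so the example runs on its own.

```python
import sqlite3

NS_MAIN = 0
NS_CATEGORY = 14  # MediaWiki's category namespace number

def subcategories(conn, category, limit=200):
    """Return titles of category pages that are members of `category`.

    The namespace filter is done in SQL via a join on the page table,
    instead of fetching every link and splitting by namespace in PHP.
    """
    rows = conn.execute(
        """
        SELECT page.page_title
        FROM categorylinks
        JOIN page ON page.page_id = categorylinks.cl_from
        WHERE categorylinks.cl_to = ?
          AND page.page_namespace = ?
        ORDER BY categorylinks.cl_sortkey
        LIMIT ?
        """,
        (category, NS_CATEGORY, limit),
    )
    return [title for (title,) in rows]

# Toy data (invented for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,
    page_namespace INTEGER,
    page_title TEXT
);
CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_sortkey TEXT);
""")
conn.executemany("INSERT INTO page VALUES (?, ?, ?)", [
    (1, NS_MAIN, "Oak"),
    (2, NS_CATEGORY, "Conifers"),
    (3, NS_CATEGORY, "Broadleaf_trees"),
])
conn.executemany("INSERT INTO categorylinks VALUES (?, ?, ?)", [
    (1, "Trees", "Oak"),
    (2, "Trees", "Conifers"),
    (3, "Trees", "Broadleaf trees"),
])

print(subcategories(conn, "Trees"))
```

Whether the join is cheap enough on a large wiki is exactly the open question here; the sketch only shows that the query itself is expressible.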
I was thinking of having an index on (cl_to, cl_from_namespace,
cl_sortkey) to have really speedy category page builds.
If I remember my CS courses correctly (now many years old), proper
database design would have us not duplicate the namespace field.
However, efficiency does sometimes trump theory.
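
As a rough sketch of the denormalised alternative: the cl_from_namespace column and the index are the ones proposed above, while the index name, the helper functions, and the sample rows are made up for illustration. SQLite is again used only so the example is self-contained.

```python
import sqlite3

NS_MAIN, NS_CATEGORY = 0, 14

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categorylinks (
    cl_from INTEGER,
    cl_from_namespace INTEGER,  -- proposed duplicate of page.page_namespace
    cl_to TEXT,
    cl_sortkey TEXT
);
-- Index name is hypothetical; the column order is the one proposed above.
CREATE INDEX idx_cl_to_ns_sortkey
    ON categorylinks (cl_to, cl_from_namespace, cl_sortkey);
""")
conn.executemany("INSERT INTO categorylinks VALUES (?, ?, ?, ?)", [
    (1, NS_MAIN, "Trees", "Oak"),
    (2, NS_CATEGORY, "Trees", "Conifers"),
    (3, NS_CATEGORY, "Trees", "Broadleaf trees"),
])

def members(conn, category, namespace, limit=200):
    """List member page ids in one namespace, in sortkey order, with no join."""
    rows = conn.execute(
        "SELECT cl_from FROM categorylinks "
        "WHERE cl_to = ? AND cl_from_namespace = ? "
        "ORDER BY cl_sortkey LIMIT ?",
        (category, namespace, limit),
    )
    return [row[0] for row in rows]

def move_across_namespace(conn, page_id, new_namespace):
    """The cost of duplication: a cross-namespace move must touch every
    categorylinks row that originates from the moved page."""
    conn.execute(
        "UPDATE categorylinks SET cl_from_namespace = ? WHERE cl_from = ?",
        (new_namespace, page_id),
    )

print(members(conn, "Trees", NS_CATEGORY))
move_across_namespace(conn, 1, NS_CATEGORY)
print(members(conn, "Trees", NS_CATEGORY))
```

The trade-off is visible in the two functions: reads no longer need the page table, but every move that crosses namespaces has to keep the duplicated column in step.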
There's an additional problem with merging the categorylinks table in
with the others anyway, in that it has extra fields cl_sortkey and
cl_timestamp that the others don't have or need. That's a shame,
because it would be neat, if redesigning the tables, not to have to
leave that one out.
By the way, cl_timestamp is currently unused (I believe). It can be
handy for debugging, but otherwise I don't know its intended purpose.
Hm. I thought maybe it was intended that additions and removals from a
category could be presented in a history display of some sort, or on
watchlists etc. But a timestamp would only allow listing of additions,
so I'm not sure that can be right.
Still, unless there's a sensible way of reusing the sortkey field in
other link-types, it's probably not a good idea to merge category
links in with everything else. [To store the namespace_from, you could
just reuse the otherwise redundant namespace_to field, but nothing
springs to mind that could double up with the sortkey in this way]
I'm not thrilled with the idea of fudging the namespace_from data into
the namespace_to field, though I certainly could live with it.
However, I also cannot figure out any use for the cl_sortkey field for
non-category links. We're probably better off not trying to force-fit
things.
* l_is_broken (boolean)
--- to allow everything that brokenlinks currently does; this should
be kept separate from the link type, because if you try to {{include}}
a non-existent page, it's both "broken" and a "template" link, and
both facts are potentially useful
I don't fully understand the use of a "broken" field. Does this
eliminate the need for verifying page existence during rendering?
Maintaining "broken" requires a potentially large update for each page
creation, deletion, or move.
Well, there must be some reason we currently store "brokenlinks"
separate from "links", right? I'm not sure the distinction is used
when rendering, that seems to be done by looking at the page table,
but there's an awful lot of deferral and caching, so maybe it is. [Or,
alternatively, maybe it should be! I don't know.] It's used by
utilities like Special:Wantedpages, I know that much. It certainly
seems to make sense to store the distinction.
As for updates - yes, creation or deletion will have a big impact, but
all the linking pages need their cache invalidating anyway, because
they'll have the wrong colour links, so it's always going to be a big
deal. And note that *moving* a page doesn't break any links - there's
still a page at the old location, albeit a brand new redirect.
Somebody might delete that redirect later, but that's a separate
action.
Yes, but if a page is moved to a previously unoccupied location, then
you need to check if some previously broken links are now not broken.
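
To make that check concrete, here is a minimal sketch assuming a merged links table with the proposed l_is_broken flag. The other column names (l_from, l_to_namespace, l_to_title), the mark_target_exists() helper, and the sample rows are hypothetical, chosen only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE links (
    l_from INTEGER,          -- id of the linking page (hypothetical name)
    l_to_namespace INTEGER,  -- hypothetical name
    l_to_title TEXT,         -- hypothetical name
    l_is_broken INTEGER      -- the proposed boolean, stored as 0/1
);
""")
conn.executemany("INSERT INTO links VALUES (?, ?, ?, ?)", [
    (10, 0, "New_title", 1),  # red link to the so-far unoccupied location
    (11, 0, "New_title", 1),
    (12, 0, "Old_title", 0),  # stays valid: the move leaves a redirect behind
])

def mark_target_exists(conn, namespace, title):
    """Run when a page appears at (namespace, title), by creation or by a
    move to a previously unoccupied location. Clears the broken flag and
    returns the linking page ids, whose caches would need invalidating."""
    target = (namespace, title)
    sources = [row[0] for row in conn.execute(
        "SELECT l_from FROM links "
        "WHERE l_to_namespace = ? AND l_to_title = ? AND l_is_broken = 1 "
        "ORDER BY l_from",
        target,
    )]
    conn.execute(
        "UPDATE links SET l_is_broken = 0 "
        "WHERE l_to_namespace = ? AND l_to_title = ?",
        target,
    )
    return sources

repaired = mark_target_exists(conn, 0, "New_title")
print(repaired)
```

So a move behaves like a creation at the destination: the links pointing at the old title need no update, but any broken links pointing at the new title do.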
-Rich Holton
en.wikipedia:User:Rholton