On Tue, Apr 1, 2008 at 5:39 PM, Tim Johansson <gurktim(a)gmail.com> wrote:
Second, all articles in the relevant category must have their category
links changed. There are several obstacles involved in this task:
1. Finding all alternative ways of categorizing articles. It is simple
to match the simple category links and category lists, but more
difficult to find e.g. categories included from a template. Roan
Kattouw (Catrope) suggested category redirects for this, such that all
articles categorised as [[Category:A]] would also be listed at
[[Category:B]] if the former has been redirected to the latter.
This is the way to do it. When we move an article currently, we don't
try to change the links to that article in all pages' wikitext, do we?
It would be hopeless. We rewrite the target at the point where the
link is followed, not where it's created.
2. Articles might be in the process of being edited as the movement is
done. This, however, can be solved in the same manner as edit
collisions are currently solved.
I don't think it's necessary to worry about this. The wikitext of the
categorized pages should be unaffected, so there's nothing to resolve.
The analogy of ordinary links is helpful here.
3. The algorithm would likely have high complexity and would thus not
scale well with very large categories.
This is likely to constitute a significant and challenging part of the project.
One implementation for this would be:
1) Change cl_to to two columns: cl_to_id and cl_final_id. cl_to_id
would contain the id of the category that it's actually included in,
whereas cl_final_id would be the id of the category it's included in
once all redirects are resolved.
2) When querying what category something is in for the purposes of
category pages, etc., use cl_final_id, not cl_to_id.
3) When moving a category, change nothing in the categorylinks table;
the same cat_id will just refer to a new name. Create a redirect as
usual.
4) When changing an existing redirect (e.g., deleting it), or changing
an existing category into a redirect, just do UPDATE categorylinks SET
cl_final_id=$newdestination WHERE cl_to_id=$changedcat. This part
will be slow for large categories, perhaps unacceptably so for very
large ones. This is comparable to deleting large pages at present and
may need to be treated similarly.
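
For concreteness, here is a rough sketch of steps 1 to 4 in Python against
an in-memory SQLite database. The cat_id, cl_to_id and cl_final_id columns
follow the description above; everything else (the simplified table
layouts, the index names, and the helpers make_db, add_link, members,
move_category and retarget) is made up for illustration and is not
MediaWiki's actual schema or code.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a simplified, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE category (
            cat_id    INTEGER PRIMARY KEY,
            cat_title TEXT UNIQUE
        );
        CREATE TABLE categorylinks (
            cl_from     INTEGER,  -- id of the categorized page
            cl_to_id    INTEGER,  -- category named in the page's wikitext
            cl_final_id INTEGER,  -- category after redirects are resolved
            cl_sortkey  TEXT
        );
        -- Listing a category reads one index range, already in sort order.
        CREATE INDEX cl_final_sortkey ON categorylinks (cl_final_id, cl_sortkey);
        CREATE INDEX cl_to ON categorylinks (cl_to_id);
    """)
    return conn


def add_link(conn, page_id, cat_id, sortkey):
    """Step 1: store both ids. Assumes cat_id is not a redirect right now."""
    conn.execute(
        "INSERT INTO categorylinks VALUES (?, ?, ?, ?)",
        (page_id, cat_id, cat_id, sortkey),
    )


def members(conn, cat_id):
    """Step 2: category pages select on cl_final_id, not cl_to_id."""
    rows = conn.execute(
        "SELECT cl_from FROM categorylinks"
        " WHERE cl_final_id = ? ORDER BY cl_sortkey",
        (cat_id,),
    )
    return [row[0] for row in rows]


def move_category(conn, cat_id, new_title):
    """Step 3: only the name changes; categorylinks is left untouched.
    Creating the redirect page at the old title is omitted here."""
    conn.execute(
        "UPDATE category SET cat_title = ? WHERE cat_id = ?",
        (new_title, cat_id),
    )


def retarget(conn, changed_cat, new_destination):
    """Step 4: rewrite cl_final_id for every page tagged with changed_cat.
    This touches one row per member, which is the slow part."""
    cur = conn.execute(
        "UPDATE categorylinks SET cl_final_id = ? WHERE cl_to_id = ?",
        (new_destination, changed_cat),
    )
    return cur.rowcount
```

With categories 1 and 2, retarget(conn, 1, 2) makes every page tagged with
category 1 show up in members(conn, 2), still ordered by sort key. The
UPDATE only touches rows tagged directly with the changed category, so
chains of redirects would need extra handling.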
This implementation is not normalized, which is why it's slow for
changing redirects. We could just use the same technique we use for
pages: join to the redirect table on every select. The problem is
that this doesn't work so well for necessities like sorting, as far as
I can see. You have to be able to sort efficiently when doing
retrieval for category pages. I'm a little tired right now, but I
can't see offhand how to do this in a way that's efficient for both
updating and selecting. You're right.
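
To make the trade-off concrete, here is a rough sketch of the normalized
variant, again in Python with an in-memory SQLite database. The
category_redirect table, its rd_from_id and rd_to_id columns, and the
helpers make_normalized_db, set_redirect and members_via_join are
assumptions made for illustration, not existing MediaWiki names.

```python
import sqlite3


def make_normalized_db():
    """Hypothetical normalized layout: no cl_final_id column at all."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE categorylinks (
            cl_from    INTEGER,  -- id of the categorized page
            cl_to_id   INTEGER,  -- category named in the page's wikitext
            cl_sortkey TEXT
        );
        CREATE INDEX cl_to_sortkey ON categorylinks (cl_to_id, cl_sortkey);
        CREATE TABLE category_redirect (
            rd_from_id INTEGER PRIMARY KEY,  -- the redirecting category
            rd_to_id   INTEGER               -- its target
        );
    """)
    return conn


def set_redirect(conn, from_cat, to_cat):
    """Changing a redirect is a single-row write, however large the category."""
    conn.execute(
        "INSERT OR REPLACE INTO category_redirect VALUES (?, ?)",
        (from_cat, to_cat),
    )


def members_via_join(conn, cat_id):
    """Resolve redirects on every select by joining to the redirect table.
    The rows come from several cl_to_id values, so the (cl_to_id, cl_sortkey)
    index cannot return them in one sorted pass; the database has to sort."""
    rows = conn.execute(
        "SELECT cl.cl_from FROM categorylinks AS cl"
        " LEFT JOIN category_redirect AS rd ON rd.rd_from_id = cl.cl_to_id"
        " WHERE COALESCE(rd.rd_to_id, cl.cl_to_id) = ?"
        " ORDER BY cl.cl_sortkey",
        (cat_id,),
    )
    return [row[0] for row in rows]
```

Here updates are cheap and selects carry the cost, which is the reverse of
the two-column approach.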
As the last step, the relevant entries in the categorylinks table
would need to be changed. This is accomplished by a simple SQL query.
This could be avoided if bug #13579 [1] ("Category table should use
category ID rather than category name") is fixed, which it could be as
part of this project.
Well, the simple SQL query could turn out to be a problem for very
large categories. I might be wrong; a single update may well run
faster than the insert/delete we have right now for large page
deletions.
The project would preferably be written as a patch to the core.
Catrope suggested setting up a separate SVN branch for the project,
such that everyone can see my progress.
Yes, certainly.
After the community bonding period
:)
On Tue, Apr 1, 2008 at 6:07 PM, David Gerard <dgerard(a)gmail.com> wrote:
Even having category redirects work properly (so that if
[[Category:Foo]] redirects to [[Category:Bar]], putting an article in
[[Category:Foo]] means it shows up in [[Category:Bar]] - much like
redirected templates work) would be most helpful in allowing Commons
to use languages other than English for its category tree - so that
different-language names for the same thing would work the same, e.g.
[[Category:Horse]], [[Category:Cheval]] and [[Category:Hauspferd]]
could all point to [[Category:Equus caballus]] and Just Work.
The redirects seem like the hard part here. Once those are in place,
moving should be pretty easy. It practically just automates what
users could easily do anyway by copying over the page content and
adding a redirect manually.