On Tue, Apr 1, 2008 at 5:39 PM, Tim Johansson <gurktim@gmail.com> wrote:
Second, all articles in the relevant category must have their category links changed. There are several obstacles involved in this task:
- Finding all alternative ways of categorizing articles. It is simple to match plain category links and category lists, but more difficult to find e.g. categories included from a template. Roan Kattouw (Catrope) suggested category redirects for this, such that all articles categorised as [[Category:A]] would also be listed at [[Category:B]] if the former has been redirected to the latter.
This is the way to do it. When we move an article currently, we don't try to change the links to that article in all pages' wikitext, do we? It would be hopeless. We rewrite the target at the point where the link is followed, not where it's created.
- Articles might be in the process of being edited while the move is performed. This, however, can be solved in the same manner as edit collisions are currently solved.
I don't think it's necessary to worry about this. The wikitext of the categorized pages should be unaffected, so there's nothing to resolve. The analogy of ordinary links is helpful here.
- The algorithm would likely have high complexity and would thus not scale well with very large categories. This is likely to constitute a significant and challenging part of the project.
One implementation for this would be
1) Change cl_to to two columns: cl_to_id and cl_final_id. cl_to_id would contain the id of the category that the page is directly placed in, whereas cl_final_id would be the id of the category the page ends up in once all redirects are resolved.
2) When querying what category something is in for the purposes of category pages, etc., use cl_final_id, not cl_to_id.
3) When moving a category, change nothing in the categorylinks table; the same cat_id will just refer to a new name. Create a redirect as usual.
4) When changing an existing redirect (e.g., deleting it), or changing an existing category into a redirect, just do UPDATE categorylinks SET cl_final_id=$newdestination WHERE cl_to_id=$changedcat. This part will be slow for large categories, perhaps unacceptably so for very large ones. This is comparable to deleting large pages at present and may need to be treated similarly.
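The four steps above can be sketched roughly as follows, using an in-memory SQLite database from Python. This is only an illustration under assumptions: the `category` table layout, the `cl_sortkey` column, the index names and the helper function names are all made up for the example and are not the real MediaWiki schema.

```python
import sqlite3

def make_db():
    """In-memory database with the two-column categorylinks layout (step 1)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE category (
            cat_id    INTEGER PRIMARY KEY,
            cat_title TEXT NOT NULL UNIQUE
        );
        CREATE TABLE categorylinks (
            cl_from     INTEGER NOT NULL,  -- id of the categorised page
            cl_to_id    INTEGER NOT NULL,  -- category the page is directly placed in
            cl_final_id INTEGER NOT NULL,  -- category after resolving redirects
            cl_sortkey  TEXT    NOT NULL   -- assumed sort key column
        );
        CREATE INDEX cl_final_sort ON categorylinks (cl_final_id, cl_sortkey);
        CREATE INDEX cl_to ON categorylinks (cl_to_id);
    """)
    return db

def members(db, cat_id):
    # Step 2: listings read cl_final_id only, so sorting needs no join.
    rows = db.execute(
        "SELECT cl_from FROM categorylinks"
        " WHERE cl_final_id = ? ORDER BY cl_sortkey",
        (cat_id,))
    return [r[0] for r in rows]

def move_category(db, cat_id, new_title):
    # Step 3: only the name changes; categorylinks is untouched.
    db.execute("UPDATE category SET cat_title = ? WHERE cat_id = ?",
               (new_title, cat_id))

def set_redirect(db, changed_cat, new_destination):
    # Step 4: one UPDATE that touches every row of the changed category.
    cur = db.execute(
        "UPDATE categorylinks SET cl_final_id = ? WHERE cl_to_id = ?",
        (new_destination, changed_cat))
    return cur.rowcount

db = make_db()
db.executemany("INSERT INTO category VALUES (?, ?)",
               [(1, "Horse"), (2, "Equus caballus")])
db.executemany("INSERT INTO categorylinks VALUES (?, ?, ?, ?)",
               [(10, 1, 1, "Arabian"), (11, 2, 2, "Mustang"), (12, 1, 1, "Shire")])

changed = set_redirect(db, changed_cat=1, new_destination=2)  # Horse -> Equus caballus
listing = members(db, 2)
move_category(db, 2, "Domestic horse")
```

The number of rows rewritten by `set_redirect` grows with the size of the redirected category, which is the slow part mentioned in step 4; `move_category` touches a single row regardless of category size.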
This implementation is not normalized, which is why it's slow for changing redirects. We could just use the same technique we use for pages: join to the redirect table on every select. The problem, as far as I can see, is that this doesn't work so well for necessities like sorting: you have to be able to sort efficiently when doing retrieval for category pages. I'm a little tired right now, but offhand I can't see how to do this in a way that's efficient for both updating and selecting, so you're right.
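For comparison, the normalized variant (joining to a redirect table on every select) might look something like the sketch below. The `category_redirect` table and its `cr_from`/`cr_to` columns are hypothetical names chosen for the example, as are `cl_sortkey` and the index; only single-hop redirects are handled.

```python
import sqlite3

ndb = sqlite3.connect(":memory:")
ndb.executescript("""
    CREATE TABLE categorylinks (
        cl_from    INTEGER NOT NULL,  -- id of the categorised page
        cl_to_id   INTEGER NOT NULL,  -- category the page is directly placed in
        cl_sortkey TEXT    NOT NULL   -- assumed sort key column
    );
    CREATE INDEX cl_to_sort ON categorylinks (cl_to_id, cl_sortkey);
    -- Hypothetical redirect table: category cr_from redirects to cr_to.
    CREATE TABLE category_redirect (
        cr_from INTEGER PRIMARY KEY,
        cr_to   INTEGER NOT NULL
    );
""")

def members_normalized(db, cat_id):
    # Redirects are resolved at read time via the join (single hop only).
    rows = db.execute("""
        SELECT cl_from
        FROM categorylinks
        LEFT JOIN category_redirect ON cr_from = cl_to_id
        WHERE COALESCE(cr_to, cl_to_id) = ?
        ORDER BY cl_sortkey
    """, (cat_id,))
    return [r[0] for r in rows]

ndb.executemany("INSERT INTO categorylinks VALUES (?, ?, ?)",
                [(10, 1, "Arabian"), (11, 2, "Mustang"), (12, 1, "Shire")])
before = members_normalized(ndb, 2)

# Changing a redirect is now a single-row write, however large the category is.
ndb.execute("INSERT INTO category_redirect VALUES (?, ?)", (1, 2))
after = members_normalized(ndb, 2)
```

Here the update is cheap, but the listing has to combine rows stored under several `cl_to_id` values, so an index on `(cl_to_id, cl_sortkey)` alone probably cannot return them already in sort order. That is the trade-off described above.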
As the last step, the relevant entries in the categorylinks table would need to be changed. This is accomplished by a simple SQL query. This could be avoided if bug #13579 [1] ("Category table should use category ID rather than category name") is fixed, which it could be as part of this project.
Well, the simple SQL query could turn out to be a problem for very large categories. I might be wrong; a single update may well run faster than the insert/delete we have right now for large page deletions.
The project would preferably be written as a patch to the core. Catrope suggested setting up a separate SVN branch for the project, such that everyone can see my progress.
Yes, certainly.
After the community bonding period
:)
On Tue, Apr 1, 2008 at 6:07 PM, David Gerard <dgerard@gmail.com> wrote:
Even having category redirects work properly (so that if [[Category:Foo]] redirects to [[Category:Bar]], putting an article in [[Category:Foo]] means it shows up in [[Category:Bar]] - much like redirected templates work) would be most helpful in allowing Commons to use languages other than English for its category tree - so that different-language names for the same thing would work the same, e.g. [[Category:Horse]], [[Category:Cheval]] and [[Category:Hauspferd]] could all point to [[Category:Equus caballus]] and Just Work.
The redirects seem like the hard part here. Once those are in place, moving should be pretty easy. It practically just automates what users could easily do anyway by copying over the page content and adding a redirect manually.
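To make the intended redirect behaviour concrete, here is a small sketch in plain Python of resolving category redirects when the listing is built, leaving the pages' own category names untouched. The file names, the `max_hops` limit and both function names are invented for the illustration; whether chains of redirects should be followed at all is an assumption.

```python
def resolve(title, redirects, max_hops=10):
    """Follow category redirects until a non-redirect title is reached."""
    seen = set()
    while title in redirects:
        if title in seen or len(seen) >= max_hops:
            raise ValueError(f"redirect loop or chain too long at {title!r}")
        seen.add(title)
        title = redirects[title]
    return title

def category_listing(page_categories, redirects):
    """Group pages under the resolved category; page data is not modified."""
    listing = {}
    for page, cats in page_categories.items():
        for cat in cats:
            listing.setdefault(resolve(cat, redirects), []).append(page)
    return {cat: sorted(pages) for cat, pages in listing.items()}

redirects = {
    "Horse": "Equus caballus",
    "Cheval": "Equus caballus",
    "Hauspferd": "Equus caballus",
}
pages = {
    "Shire.jpg": ["Horse"],
    "Percheron.jpg": ["Cheval"],
    "Haflinger.jpg": ["Hauspferd"],
    "Mustang.jpg": ["Equus caballus"],
}
listing = category_listing(pages, redirects)
```

All four files end up listed under "Equus caballus", whichever language name they were categorised with.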