Bryan Tong Minh wrote:
On Wed, Apr 2, 2008 at 2:41 AM, Simetrical Simetrical+wikilist@gmail.com wrote:
- Change cl_to to two columns: cl_to_id and cl_final_id. cl_to_id would contain the id of the category that it's actually included in, whereas cl_final_id would be the id of the category it's included in once all redirects are resolved.
- When querying what category something is in for the purposes of category pages, etc., use cl_final_id, not cl_to_id.
Wouldn't it be easier for upgrading and backwards compatibility to keep the current cl_to field, which would hold the category named in the wikitext, and add a cl_id field, which would hold the real category being pointed to?
That's probably a good idea.
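For illustration, here is a rough sketch of the cl_to plus cl_id layout, using Python's sqlite3 module with an in-memory database. The exact column types, the index name, the sample rows and the helper names build_demo_db and members_of are assumptions made for this example, not the real MediaWiki schema or code.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory categorylinks table with the proposed cl_id column."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE categorylinks (
            cl_from INTEGER NOT NULL,  -- page id of the member page
            cl_to   TEXT    NOT NULL,  -- category title as written in wikitext
            cl_id   INTEGER NOT NULL   -- id of the category once redirects are resolved
        );
        CREATE INDEX cl_id_idx ON categorylinks (cl_id);
    """)
    db.executemany(
        "INSERT INTO categorylinks (cl_from, cl_to, cl_id) VALUES (?, ?, ?)",
        [
            (1, "Cats", 10),
            (2, "Felines", 10),  # assumed: "Felines" redirects to "Cats" (id 10)
            (3, "Dogs", 20),
        ],
    )
    return db

def members_of(db, category_id):
    """List member page ids for a category page, keyed on the resolved id."""
    rows = db.execute(
        "SELECT cl_from FROM categorylinks WHERE cl_id = ? ORDER BY cl_from",
        (category_id,),
    )
    return [row[0] for row in rows]
```

Code that only reads cl_to keeps working unchanged, while category pages would query on cl_id.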
Simetrical wrote:
Well, the simple SQL query could turn out to be a problem for very large categories. I might be wrong; a single update may well run faster than the insert/delete we have right now for large page deletions.
That's why I suggested using the category table rather than changing lots of rows in categorylinks.
- When changing an existing redirect (e.g., deleting it), or changing an existing category into a redirect, just do UPDATE categorylinks SET cl_final_id=$newdestination WHERE cl_to_id=$changedcat. This part will be slow for large categories, perhaps unacceptably so for very large ones. This is comparable to deleting large pages at present and may need to be treated similarly.
Yes, at least something will suck here. I think your suggestion is preferable (making changing popular category redirects suck rather than making moving large categories suck), but maybe we could use the job queue here rather than a huge UPDATE query.
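To make the "job queue rather than one huge UPDATE" idea concrete, here is a minimal sketch that applies the same change in small batches. It uses Python's sqlite3 module purely as a stand-in; the function name retarget_in_batches, the batch size and the rowid subquery are assumptions for this example and say nothing about how the MediaWiki job queue is actually implemented.

```python
import sqlite3

def retarget_in_batches(db, changed_cat, new_destination, batch_size=1000):
    """Point cl_final_id at new_destination for all rows with cl_to_id = changed_cat,
    a limited number of rows at a time. Returns the number of non-empty batches."""
    batches = 0
    while True:
        cur = db.execute(
            """
            UPDATE categorylinks SET cl_final_id = ?
            WHERE rowid IN (
                SELECT rowid FROM categorylinks
                WHERE cl_to_id = ? AND cl_final_id <> ?
                LIMIT ?
            )
            """,
            (new_destination, changed_cat, new_destination, batch_size),
        )
        db.commit()  # each batch is its own short transaction
        if cur.rowcount == 0:
            return batches  # nothing left to update
        batches += 1
```

Each pass only touches rows that still carry the old destination, so an interrupted run can simply be restarted. In a real job queue every batch would presumably be a separate job rather than an iteration of one loop.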
There is one thing nobody mentioned yet: nonexistent categories can have members, so it's possible to move one category on top of another one. For example, let [[Category:A]] be an existent category and [[Category:B]] a nonexistent one that does have members. If [[Category:A]] is then moved to [[Category:B]] (which is allowed, since the target doesn't exist), the categories would have to be merged. The thing is that A and B had different category IDs before the move, but the merged category will only have one ID after the move. This again means updating category IDs in the categorylinks table. We could probably use row count estimates here to decide which ID the unified category gets (A's or B's, depending on which one would result in more rows being changed) and stuff the UPDATEs in the job queue if both estimates are unacceptably large.
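The decision described above could look roughly like the following sketch. The function name plan_merge, the returned dictionary and the max_inline_rows threshold are all made up for this example; the row counts would come from the database's row count estimates.

```python
def plan_merge(id_a, rows_a, id_b, rows_b, max_inline_rows=5000):
    """Decide which category id survives a merge of A and B.

    The id with more categorylinks rows is kept, so only the smaller
    set of rows has to be rewritten. If even that smaller set is above
    max_inline_rows (i.e. both estimates are large), the rewrite is
    deferred to the job queue.
    """
    if rows_a >= rows_b:
        keep_id, rewrite_id, rows_to_change = id_a, id_b, rows_b
    else:
        keep_id, rewrite_id, rows_to_change = id_b, id_a, rows_a
    return {
        "keep_id": keep_id,
        "rewrite_id": rewrite_id,
        "rows_to_change": rows_to_change,
        "use_job_queue": rows_to_change > max_inline_rows,
    }
```

For instance, merging a category with 10 members into one with 50000 would keep the larger category's ID and rewrite only 10 rows directly, while merging two categories with 80000 and 60000 members would push the 60000 row updates to the job queue.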
Roan Kattouw (Catrope)