On Wed, Apr 2, 2008 at 2:03 AM, Samuel Wantman wantman@earthlink.net wrote:
I don't know if this has been discussed, but I'm hoping some serious consideration could be put into creating a category history that can be viewed and used for reverting.
That would be a very good feature, yes. It's also worth considering at some point.
On Wed, Apr 2, 2008 at 4:58 AM, Bryan Tong Minh bryan.tongminh@gmail.com wrote:
Wouldn't it be easier for upgrading and backwards compatibility to keep the current cl_to field, which would indicate the category named in the wikitext, and add a cl_id field, which indicates the real category that is being pointed to?
cl_to is a VARCHAR(255) times 200 million rows. Being able to get rid of it would significantly reduce the size (therefore also, to some extent, improve the speed) of the categorylinks table. Furthermore, having both the name and ID stored will unnecessarily allow inconsistency, i.e., it's gratuitously denormalized.
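To make the normalization argument concrete, here is a minimal sketch of the ID-only layout, using Python's sqlite3 module as a stand-in for the real database. The table layout is simplified and the column types are assumptions; cl_id is the field proposed in this thread, not an existing column, and the helper functions are hypothetical.

```python
import sqlite3

def build_db():
    # Simplified, illustrative schema: the title is stored once, in category.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE category (
            cat_id    INTEGER PRIMARY KEY,
            cat_title TEXT NOT NULL UNIQUE
        );
        CREATE TABLE categorylinks (
            cl_from INTEGER NOT NULL,  -- page ID of the member page
            cl_id   INTEGER NOT NULL REFERENCES category(cat_id),
            PRIMARY KEY (cl_from, cl_id)
        );
    """)
    return db

def rename_category(db, old_title, new_title):
    # Only the single category row changes, however many members there are.
    cur = db.execute(
        "UPDATE category SET cat_title = ? WHERE cat_title = ?",
        (new_title, old_title),
    )
    return cur.rowcount

def members(db, title):
    # The title is resolved through a join instead of a cl_to string match.
    rows = db.execute(
        "SELECT cl_from FROM categorylinks "
        "JOIN category ON cat_id = cl_id "
        "WHERE cat_title = ? ORDER BY cl_from",
        (title,),
    )
    return [r[0] for r in rows]

db = build_db()
db.execute("INSERT INTO category (cat_id, cat_title) VALUES (1, 'Cats')")
db.executemany(
    "INSERT INTO categorylinks (cl_from, cl_id) VALUES (?, 1)",
    [(10,), (11,), (12,)],
)
changed = rename_category(db, "Cats", "Felines")
```

With the name held in one place, there is no second copy in categorylinks that could drift out of sync, which is the inconsistency concern above. The sketch says nothing about sort keys or index layout on the real table.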
There will probably have to be a transitional period where both fields are present, just for the sake of updating. However, I think this is best kept to an intra-version period, so that the schema changes completely from one release to the next. This is a breaking schema change, but we can't *always* avoid those. We don't have major versions that we can pack them all into; instead we sprinkle them in minor versions.
On Wed, Apr 2, 2008 at 8:02 AM, Roan Kattouw roan.kattouw@home.nl wrote:
Simetrical wrote:
Well, the simple SQL query could turn out to be a problem for very large categories. I might be wrong; a single update may well run faster than the insert/delete we have right now for large page deletions.
That's why I suggested using the category table rather than changing lots of rows in categorylinks.
Using the category table how? Just changing the IDs? It doesn't work if you want to then change them back, or alter redirects. You could do a join, but that seems like it would break sorted retrieval.
There is one thing nobody mentioned yet: nonexistent categories can have members, so it's possible to move one category on top of another one. For example, let [[Category:A]] be an existent category and [[Category:B]] a nonexistent one that does have members. If [[Category:A]] is then moved to [[Category:B]] (which is allowed, since the target doesn't exist), the categories would have to be merged. The thing is that A and B had different category IDs before the move, but the merged category will only have one ID after the move. This again means updating category IDs in the categorylinks table. We could probably use row count estimates here to decide which ID the unified category gets (A's or B's, depending on which one would result in more rows being changed) and stuff the UPDATEs in the job queue if both estimates are unacceptably large.
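The row-count heuristic described above could be sketched roughly as follows. This is only an illustration: the function name, the MergePlan type, and the max_inline_rows threshold are hypothetical, and the row counts are assumed to come from whatever estimate the database provides.

```python
from typing import NamedTuple

class MergePlan(NamedTuple):
    surviving: str       # category whose ID the merged category keeps
    rewritten: str       # category whose categorylinks rows get the new ID
    rows_to_update: int  # estimated number of rows that must change
    deferred: bool       # True if the UPDATEs should go to the job queue

def plan_merge(rows_a, rows_b, max_inline_rows):
    """Pick the surviving ID so that the smaller set of rows is rewritten."""
    if rows_a >= rows_b:
        surviving, rewritten, cost = "A", "B", rows_b
    else:
        surviving, rewritten, cost = "B", "A", rows_a
    # The cheaper option exceeds the threshold only when both estimates do,
    # which is the case where the work is pushed to the job queue.
    deferred = cost > max_inline_rows
    return MergePlan(surviving, rewritten, cost, deferred)

small = plan_merge(1000, 5, max_inline_rows=500)
large = plan_merge(100000, 80000, max_inline_rows=500)
```

Here `small` keeps A's ID and rewrites B's 5 rows immediately, while `large` still keeps A's ID but defers the 80000 row updates.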
Why would we want to allow moving one category on top of another? Why not ban it, and allow people to create a redirect if they want to "merge" them?