Nicolas Dumazet schreef:
== Original idea : add a cl_final field ==
<snip>
- Use case #6 :
When adding a category to a page, you have to fetch its corresponding cat_id for cl_to, (fairly easy), but you also have to fetch the right cl_final. You have to know first, if the category is a redirect. And if I'm right, with that schema, the only ways to tell this actually, are to retrieve the corresponding page_is_redirect in the page table, or to check for an entry in the redirect table. I believe that this forces us to compute a page-redirect join on page_id + a redirect-category join on page_title for each category title (see [3] ). If there's no results for that query, (can be caused if {1} the category page does not exist, {2} the category page exist but is not a redirect), the category is not a redirect, and else it returns us the cat_id for cl_final.
You wouldn't need to go through the page table here if you added a cat_page field to the category table. It's not a big deal, though, because joining on page_namespace=14 AND page_title=cat_title is pretty cheap.
- Use case #7 :
Easy. SELECT ... FROM category_links WHERE cl_final = ##
== But what about adding a cat_final field instead ? ==
When A redirects to B, all A' category_links entries will share the same cl_final field. Being quite unexperienced, and very naive, I may miss something important here... but I think that it makes much more sense to add that shared value to the category table, instead of duplicating it times the number of pages included in the category.
I'm saying that instead of adding a cl_final field to the category_links table, we should perhaps add a cat_final field to the category table pointing to the final category it belongs to (see [4] ).
Sounds like a quite sane idea to me.
- Use cases #2 #3 and #4 :
Trivial. Change cat_final in one category table row.
- Use case #5 :
Damn. Tricky. Alter TWO category table rows :p
- Use case #6 :
Easy, fetch the corresponding cat_id to fill cl_to.
- Use case #7 :
The most expensive operation for this proposal. You have to join category_links and category (hopefully on cat_id)
Can be done on cat_title=cl_to too, doesn't really matter. Joining on cat_id is cleaner of course, but I don't think it'll be any faster.
to retrieve what was cl_final in the first proposition, and select ... where cat_final = ###
I don't see how this is a big deal. The queries would be: SELECT cat_id FROM category WHERE cat_title='Foo' LIMIT 1 SELECT cl_from FROM categorylinks JOIN category ON cl_to=cat_title WHERE cat_final=123 (here 123 is the result from the first query) The necessary indices are in place, and assuming that we'll add a KEY cat_final (cat_final) along with the cat_final field (it would be very stupid not to), this should be a rather cheap query.
Evaluating the cost of a query is clearly one of the hardest things to do for that young me, and I fail to estimate the cost of that last query. How expensive is it compared to its use frequency ? Being apparently the most expensive part for this proposed change, the answer of that question will probably state how valid was my second approach ...
In my opinion, the benefit of having a fixed and small (1 or 2) number of UPDATEs rather than a potentially huge one when changing stuff outweighs the cost of having a slightly more complex query which for lookup which, in my opinion, is still quite cheap. There are quite a few joins like those in core already (most against page, though, but that doesn't really matter), so I don't think it'll cause any problems.
I need your help to finalize the schema change needed to implement category redirects. I may also miss a third solution, even better, or have forgiven some important details while considering what has to be done : let me know :)
I'm not satisfied until a performance expert like Simetrical or Domas has cast their verdict, but option #2 looks like the way to go to me.
Roan Kattouw (Catrope)