Re: [Wikitech-l] Schema change : category redirects

1 Jul 2008

      Nicolas Dumazet schreef:
...
== Original idea : add a cl_final field ==
<snip>

Use case #6 :

When adding a category to a page, you have to fetch its corresponding
cat_id for cl_to, (fairly easy), but you also have to fetch the right
cl_final.
You have to know first, if the category is a redirect. And if I'm
right, with that schema, the only ways to tell this actually, are to
retrieve the corresponding page_is_redirect in the page table, or to
check for an entry in the redirect table.
I believe that this forces us to compute a page-redirect join on
page_id + a redirect-category join on page_title for each category
title (see [3] ). If there's no results for that query, (can be caused
if {1} the category page does not exist, {2} the category page exist
but is not a redirect), the category is not a redirect, and else it
returns us the cat_id for cl_final.
You wouldn't need to go through the page table here if you added a 
cat_page field to the category table. It's not a big deal, though, 
because joining on page_namespace=14 AND page_title=cat_title is pretty 
cheap.
...

Use case #7 :

Easy. SELECT ... FROM category_links WHERE cl_final = ##
== But what about adding a cat_final field instead ? ==
When A redirects to B, all A' category_links entries will share the
same cl_final field. Being quite unexperienced, and very naive, I may
miss something important here... but I think that it makes much more
sense to add that shared value to the category table, instead of
duplicating it times the number of pages included in the category.
I'm saying that instead of adding a cl_final field to the
category_links table, we should perhaps add a cat_final field to the
category table pointing to the final category it belongs to (see [4]
).
Sounds like a quite sane idea to me.
...

Use cases #2 #3 and #4 :

Trivial. Change  cat_final in one category table row.

Use case #5 :

Damn. Tricky. Alter TWO category table rows :p

Use case #6 :

Easy, fetch the corresponding cat_id to fill cl_to.

Use case #7 :

The most expensive operation for this proposal. You have to join
category_links and category (hopefully on cat_id)
Can be done on cat_title=cl_to too, doesn't really matter. Joining on 
cat_id is cleaner of course, but I don't think it'll be any faster.
...
to retrieve what was
cl_final in the first proposition, and select ... where cat_final =
###
I don't see how this is a big deal. The queries would be:
SELECT cat_id FROM category WHERE cat_title='Foo' LIMIT 1
SELECT cl_from FROM categorylinks JOIN category ON cl_to=cat_title WHERE 
cat_final=123
(here 123 is the result from the first query)
The necessary indices are in place, and assuming that we'll add a KEY 
cat_final (cat_final) along with the cat_final field (it would be very 
stupid not to), this should be a rather cheap query.
...
Evaluating the cost of a query is clearly one of the hardest things to
do for that young me, and I fail to estimate the cost of that last
query. How expensive is it compared to its use frequency ? Being
apparently the most expensive part for this proposed change, the
answer of that question will probably state how valid was my second
approach ...
In my opinion, the benefit of having a fixed and small (1 or 2) number 
of UPDATEs rather than a potentially huge one when changing stuff 
outweighs the cost of having a slightly more complex query which for 
lookup which, in my opinion, is still quite cheap. There are quite a few 
joins like those in core already (most against page, though, but that 
doesn't really matter), so I don't think it'll cause any problems.
...
I need your help to finalize the schema change needed to implement
category redirects. I may also miss a third solution, even better, or
have forgiven some important details while considering what has to be
done : let me know :)
I'm not satisfied until a performance expert like Simetrical or Domas 
has cast their verdict, but option #2 looks like the way to go to me.
Roan Kattouw (Catrope)

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

Re: [Wikitech-l] Schema change : category redirects