Why would you need 2 tables? Why not have one with a page_id, link_type unique key. That way you could have page_id = 3535, link_type = 'categories' and such in the "nolinks" table.
Nick Jenkins wrote:
So I'd like to suggest that we all take a deep breath before starting a discussion under this thread. I think we should discuss the matter on a detached, technical level, and avoid the use of emotionally-charged words such as "fragile" and "bogus". Does that sound reasonable?
Yes, it does. I'll try my best to do this.
To check I understand the technical situation correctly, please let me summarise the available options for storing this information in the database, and their resulting pros and cons:
- Status Quo: Leave as-is for determining Special:UncategorizedPages and
Special:Deadendpages : Advantages : We know it works. It's implemented / the patch exists. It's conceptually clean (no magic values). No new database tables need to be added. Disadvantage: It's a slow database query, because it uses a LEFT JOIN.
- Introduce special magic values to indicate if a page has no page links,
or categories. (e.g. pagelinks.pl_title = '' and pagelinks.pl_namespace = 0 means no links, or pagelinks.pl_title = '##FROMBLANK' and pagelinks.pl_namespace = -3 means no links, or categorylinks.cl_to = '' means no categories ) : Advantages : It's a quicker database query. The patch exists. No new database tables need to be added. Disadvantage: Compatibility and cryptic-ness. To break that out a little: Concerns that there may be a cleaner solution, and that magic values are cryptic. Concerns that other stuff may break, and that once a magic value is introduced, it has a way of slowly spreading through code, and that every time that value is used, you have to say "-3" is not really "-3", rather it actually means "no links".
- Add a purpose-built table to store that the page has no links, and
another for no categories: Advantages : It's a quicker database query. It's conceptually clean (no magic values). Disadvantages: Adds 2 more tables, and the schema update means it may take time to apply to the cluster, and adding tables is generally painful and thus something people try to avoid doing. The patch does not exist. Is arguably more complicated that the other options - If links or categories are added, then 2 places have to be updated - the new table, and the current table. Is arguably excessive to introduce two new tables to store effectively 2 extra bits of information per page.
- To the page table, add a boolean field for "page_has_category", and
another one for "page_has_links", similar to the "page_is_redirect" field. Advantages : It's a quicker database query. It's conceptually clean (no magic values). No new database tables need to be added. Disadvantages: These fields would presumably have to be indexed to be of use (?), which would make adding rows to this table slower. If links or categories are initially added or all removed, then 2 places have to be updated - this field, plus the current table. The patch does not exist. Requires a schema update to add the fields, which takes time to apply to the cluster, and is something people try to avoid doing. The page table is a extremely important table to get right.
Is the above an accurate and fair summary of the situation? If not, please correct me, so we can at least agree on the facts of the available options, before debating opinions about which option is preferable.
Also, I can see why people have divergent opinions over this - simply put, there is no quick and easy solution which has only advantages, and no disadvantages. Which option you prefer comes down to what weighting you assign to the various advantages and disadvantages, and that weighting is different for different people.
-- All the best, Nick.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l