On Mon, Jul 08, 2002 at 04:01:38PM -0700, lcrocker@nupedia.com wrote:
I could easily enough add a unique key with a combination of the "from" and "to" fields.
Actually, I don't think you can. I tried that at the time and found out that MySQL cannot define unique indexes over such large (in bytes) combinations. So unless you changed the types of the columns, that is not possible.
Is there a good reason to do that? I never access the table by that--I only access it by the individual fields *i.e., there's never any query in which both fields appear in the where clause or an order by or group by).
The main reason would be data integrity and perhaps compactness, not access, and, well, we would be breaking Codd's first commandment: "Thou shalt always have a primary key". :-) But seriously, you could quite easily maintain this constraint by making the variabel for broken links an associative array that associates the links names with "1" or something. Thanks to the excellent object-oriented way in which you organized the code, that would probably be a simple change in the class LinkCache plus some changes when reading/writing it from the database.
Would it be worth it? I think that #pages is a slightly better metric than #links because although it is policy to link every word only once on a page, a newcomer might make a certain titel inadvertantly very wanted by linking it very often on the same page.
MySQL is doing it, but it uses a temp file ("explain" on the query says so, anyway). And the error we were getting before was the size of the tempfile bumping into a disk quota.
Ultimately the best solution would be to have a table wanted(title, #pages) with an index on #pages (and a unique index on title), then MySQL wouldn't need to sort at all. I don't know of the top of my head if there are any other queries that depend on 'brokenlinks' but I don't believe so and if there are not then I would recommend replacing it. That would really speed things up and save resources.
Translating the updates of 'brokenlinks' to updates of 'mostwanted' is not trivial but also not very difficult either. If a page is deleted/replaced you subtract 1 of #pages for every title it had a broken link to. If a page is created/updated you add 1 to #pages for every title it has a broken link to. The only extra work is that you have to check before the increment if the title is already in the table (otherwise you have to create it first) and on the other hand if after the decremetn #pages has become zero (in which case you probaly want to delete it).
-- Jan Hidders