Lars Aronsson wrote:
>>> The "Astronomer" page contains an extraordinary number of links,
>>> mostly to "year" pages. Each link on a page requires a database
>>> lookup (a quick one, but still a lookup). "Current events" had
If this is true, I think the code should be rewritten to use an in-core
hash table that contains (as keys) the names of all existing pages. It
would only be used to determine whether a [[link]] should be underlined
or a ?-link. This in-core data structure would not be shared between
threads of a PHP application (at least not if load balancing over
multiple hosts), but it could be recreated from the database (select
name from cur) every few minutes or so (and whenever a reload or post
is served). Pages that are created between these intervals will falsely
have ?-links for a short while, but that's no big deal.
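
A minimal sketch of that idea in PHP, assuming the old mysql_* API. The
function and variable names here are hypothetical, not actual Wikipedia
code, and note that keeping the array alive across requests would itself
need some shared-memory mechanism:

    $existingPages = array();   # keys are page titles, values are true
    $lastRefresh   = 0;

    # Rebuild the in-core table from the database.
    function refreshPageTable() {
        global $existingPages, $lastRefresh;
        $existingPages = array();
        $res = mysql_query("SELECT cur_title FROM cur");
        while ($row = mysql_fetch_row($res)) {
            $existingPages[$row[0]] = true;
        }
        $lastRefresh = time();
    }

    # Decide underline vs. ?-link without touching the database.
    function pageExists($title) {
        global $existingPages, $lastRefresh;
        if (time() - $lastRefresh > 300) {   # refresh every five minutes
            refreshPageTable();
        }
        return isset($existingPages[$title]);
    }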
Hmm, maybe.
Or hits would be read from the hash, but misses would force a read from
the database. Avoiding a roundtrip down to the database sounds like a
great saving to me.
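
A sketch of that hybrid, reusing the hypothetical table from above:
trust hash hits, but fall through to the database on a miss, so a
freshly created page loses its ?-link immediately:

    function pageExistsHybrid($title) {
        global $existingPages;
        if (isset($existingPages[$title])) {
            return true;   # hash hit: no database roundtrip
        }
        # Hash miss: the page may have been created since the last
        # refresh, so check the database before emitting a ?-link.
        $res = mysql_query("SELECT 1 FROM cur WHERE cur_title='"
                           . mysql_escape_string($title) . "'");
        return mysql_num_rows($res) > 0;
    }

Note that genuinely nonexistent pages still cost a roundtrip each, so a
page full of ?-links gains nothing from the hash.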
As Neil suggested, couldn't these checks be done in just one big query
rather than a lot of separate ones? I.e., something like:

SELECT cur_title
FROM cur
WHERE cur_title IN ('Scientist', 'Research', 'Astronomy', 'Astrophysics', [...])

or would that just get more complicated?
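
For illustration, the rendering code might batch things roughly like
this (hypothetical function names; the caller then checks
isset($found[$title]) per link, so the whole page costs one roundtrip):

    # $linkTitles: all [[link]] targets found while parsing the page.
    function batchPageExists($linkTitles) {
        if (count($linkTitles) == 0) {
            return array();   # avoid an empty, invalid IN ()
        }
        $quoted = array();
        foreach ($linkTitles as $t) {
            $quoted[] = "'" . mysql_escape_string($t) . "'";
        }
        $res = mysql_query("SELECT cur_title FROM cur WHERE cur_title IN ("
                           . implode(',', $quoted) . ")");
        $found = array();
        while ($row = mysql_fetch_row($res)) {
            $found[$row[0]] = true;
        }
        return $found;   # isset($found[$t]) => underline, else ?-link
    }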
I agree. Roll-your-own hash tables are hard to scale, and a bad idea
unless nothing else works. Keeping everything in SQL will make it much
easier to move to a 3-tier Wikipedia (with a separate database
server/server cluster and multiple page servers) at a later date.
I _think_ that the big-query approach should be _lots_ faster: for,
say, 300 links, it will incur only one lot of query-round-trip and
parsing overhead, instead of 300 lots.
Neil
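
A rough way to sanity-check that claim, as a hypothetical
micro-benchmark sketch (the titles are dummies and no timings are
claimed here):

    # Old-style microtime() returns "usec sec"; convert to milliseconds.
    function now_ms() {
        list($usec, $sec) = explode(' ', microtime());
        return ($sec + $usec) * 1000.0;
    }

    $titles = array();
    for ($i = 0; $i < 300; $i++) {
        $titles[] = "Page_$i";   # dummy link targets
    }

    $t0 = now_ms();
    foreach ($titles as $t) {
        mysql_query("SELECT cur_title FROM cur WHERE cur_title='$t'");
    }
    $t1 = now_ms();
    mysql_query("SELECT cur_title FROM cur WHERE cur_title IN ('"
                . implode("','", $titles) . "')");
    $t2 = now_ms();
    printf("300 single queries: %.1f ms; one IN() query: %.1f ms\n",
           $t1 - $t0, $t2 - $t1);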