If checking for all the links is too slow, we should *additionally* use the
Phase II method:
* Every article has an "existing links" field (MEDIUMBLOB or the like)
* It contains the links known to exist, separated by some unique string
("\n")
* It is loaded together with the article text upon display
* The (existing) link cache is filled with these values
* If any of the "broken" links turn out to exist, the field is updated
(or this is done only upon saving, that's up to you)
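Roughly, the scheme above could look like the following. This is only a
sketch in Python, not actual Phase II or Phase III code: the Article class,
render_links() and the article_exists callback are hypothetical names made up
for illustration, and the field is kept as a plain string instead of a real
MEDIUMBLOB column.

```python
SEPARATOR = "\n"  # the "unique string" between link titles


class Article:
    """Hypothetical stand-in for an article row."""

    def __init__(self, title, text, existing_links=""):
        self.title = title
        self.text = text
        # Stands in for the "existing links" field (MEDIUMBLOB or the like).
        self.existing_links = existing_links


def load_link_cache(article):
    """Fill the (existing) link cache from the field loaded with the article."""
    return {t for t in article.existing_links.split(SEPARATOR) if t}


def render_links(article, wanted_links, article_exists):
    """Return {title: exists}. Only titles not in the cache cost a query.

    article_exists is an assumed callback wrapping the expensive
    "does that article exist" lookup.
    """
    cache = load_link_cache(article)
    result = {}
    changed = False
    for title in wanted_links:
        if title in cache:
            result[title] = True          # known to exist, no query needed
        elif article_exists(title):       # a "broken" link that now exists
            cache.add(title)
            result[title] = True
            changed = True
        else:
            result[title] = False
    if changed:
        # Update the field (this could also be deferred until saving).
        article.existing_links = SEPARATOR.join(sorted(cache))
    return result
```

Sorting the titles before joining is just a choice made here to keep the
stored field stable; nothing in the proposal requires it.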
The only time-consuming operation is to be performed upon page deletion,
when all the fields that contain a link to the deleted page have to be
cleared (or rebuilt).
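The deletion step might be sketched like this, again in Python and again
with made-up names: clear_links_to() is hypothetical, and a dict mapping
article title to field contents stands in for the real table. It takes the
"clear" option by dropping only the deleted title from each field.

```python
SEPARATOR = "\n"  # same separator as used when writing the field


def clear_links_to(existing_links_by_article, deleted_title):
    """Remove deleted_title from every "existing links" field that has it.

    existing_links_by_article: dict of article title -> field string
    (an in-memory stand-in for the database, assumed for this sketch).
    Returns the titles of the articles whose field was changed.
    """
    touched = []
    for title, field in existing_links_by_article.items():
        links = field.split(SEPARATOR) if field else []
        kept = [link for link in links if link != deleted_title]
        if len(kept) != len(links):
            # Only values are replaced, so iterating the dict stays safe.
            existing_links_by_article[title] = SEPARATOR.join(kept)
            touched.append(title)
    return touched
```

Note that this walks over every field, which is exactly why deletion is the
expensive case.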
In effect, most of the "does that article exist" queries will become
unnecessary.
Magnus
Brion Vibber wrote:
On Tuesday, Oct 21, 2003, at 10:04 US/Pacific, Poor, Edmund W wrote:
Eureka! I've got it!!
Give each page a "links complete" bit.
The first time the page is loaded, we check all the links, then set the
bit ON.
After that, any page change (creation, deletion, etc.) WHICH AFFECTS
THIS PAGE would then turn the bit OFF.
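For illustration, the "links complete" bit could be sketched as below. This
is Python with hypothetical names (Page, load_page, on_page_change) and an
assumed article_exists callback for the existence query; it is not taken
from the actual code base.

```python
class Page:
    """Hypothetical page record carrying the proposed bit."""

    def __init__(self, title, links):
        self.title = title
        self.links = list(links)      # titles this page links to
        self.links_complete = False   # the "links complete" bit
        self.link_status = {}         # title -> exists, valid while bit is ON


def load_page(page, article_exists):
    """Check all links on first load, then set the bit ON."""
    if not page.links_complete:
        page.link_status = {t: article_exists(t) for t in page.links}
        page.links_complete = True
    return page.link_status


def on_page_change(pages, changed_title):
    """Turn the bit OFF on every page affected by a creation or deletion."""
    for page in pages:
        if changed_title in page.links:
            page.links_complete = False
```

Here "affects this page" is taken to mean "this page links to the changed
title", which is an assumption about what was intended.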
How's this different from the cur_touched cache invalidation timestamp
we've got now?
-- brion vibber (brion @ pobox.com)
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)Wikipedia.org
http://mail.wikipedia.org/mailman/listinfo/wikitech-l