On Fri, 02 Apr 2004 01:55:31 -0800, Brion Vibber wrote:
> Then they'll still have to make 240,000+ HTTP connections to check every individual page for updates, which can take days or weeks depending on the crawl delay.
What about adding a Squid running at Yahoo to our cache purge list? They could crawl that one constantly, and only purged pages would be fetched again (from the wp caches configured as parents of the Yahoo box, so no extra db access).
That's zero extra db usage and no scripting required on our part. The wp HTML should also be pretty easy for them to parse: they only need to cut out the content of <div id='content'>.
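
To give an idea of how little work the parsing side would be, here is a rough sketch in Python using only the standard library. It assumes the page has exactly one <div id='content'> and that nested divs are properly closed. The names ContentDivExtractor and extract_content are made up for this example and are not part of any existing tool.

```python
from html.parser import HTMLParser


class ContentDivExtractor(HTMLParser):
    """Collect the markup found inside the first <div id="content">."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.depth = 0      # div nesting depth inside the target; 0 = outside
        self.done = False   # set once the target div has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.parts.append(self.get_starttag_text())
            if tag == "div":
                self.depth += 1
        elif not self.done and tag == "div" and dict(attrs).get("id") == "content":
            self.depth = 1  # entered the target div; its own tag is not kept

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # closing tag of the target div itself
                return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.parts.append(f"&#{name};")


def extract_content(html):
    """Return the inner markup of <div id="content">, or "" if absent."""
    parser = ContentDivExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    page = (
        "<html><body><div id='top'>nav</div>"
        "<div id='content'><h1>Title</h1><p>Article text</p></div>"
        "<div id='footer'>f</div></body></html>"
    )
    print(extract_content(page))
```

A crawler would feed each page it pulls from the Squid through extract_content and index only the result, which leaves out the navigation and footer markup.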