Neil Harris wrote:
On reflection, inventing new protocols for taking a real-time feed is probably unnecessary for what will be a batch operation.
How about simply making plain-ASCII logfiles available, where each line consists of
<timestamp> <url>\n
which is grown in real-time by appending to it?
People wanting to freshen their caches can then download the file, trim off all entries from before the last time they fetched, run it through uniq to keep only a single copy of each URL, and then run a script during off-peak hours to wget every URL that is already in their cache, thus freshening it.
Very little software, a twenty-line Perl script, no new protocols, and (I hope) it achieves the same effect.
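The filtering step of that pipeline could be sketched in a few lines of Python rather than Perl (the one-URL-per-line log format is as Neil describes; the function name and numeric timestamps are my assumptions):

```python
# Sketch of the batch freshen step: each log line is "<timestamp> <url>".
# Keep only entries newer than our last fetch, deduplicated, oldest-first.
def urls_to_refresh(log_text, last_fetch):
    seen = {}
    for line in log_text.splitlines():
        if not line.strip():
            continue
        ts, url = line.split(None, 1)
        if float(ts) > last_fetch:
            seen[url] = None  # dict preserves insertion order and dedupes
    return list(seen)

if __name__ == "__main__":
    log = ("100 http://en.wikipedia.org/wiki/A\n"
           "200 http://en.wikipedia.org/wiki/B\n"
           "300 http://en.wikipedia.org/wiki/A\n")
    print(urls_to_refresh(log, 150))
    # -> ['http://en.wikipedia.org/wiki/B', 'http://en.wikipedia.org/wiki/A']
```

In practice the output would be piped to wget (or fetched with an HTTP library), restricted to URLs already present in the local cache.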
The pre-fetch component is a batch operation, but the cache clear isn't. When a user edits a page, we expect everyone in the world to be able to retrieve it within a second or two. That's why we already have apaches in two countries pumping out UDP packets which are routed by various means to all our squid servers. When someone makes an edit, the worldwide cache of that page is instantly purged, and the main point of my proposal is an automated method for ISP proxies to be part of that.
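The push-purge mechanism looks roughly like this as a toy sketch (the real stream uses HTCP CLR packets, not this plain-text payload; the function name and "PURGE" prefix are stand-ins to show the shape of the mechanism, not the wire format):

```python
import socket

def send_purge(url, targets, sock=None):
    """On edit, fire a UDP datagram at every listening cache so it can
    drop its copy of the page immediately (toy payload, not real HTCP)."""
    sock = sock or socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = ("PURGE " + url).encode("utf-8")
    for host, port in targets:
        sock.sendto(payload, (host, port))
    return payload
```

Because it is UDP, delivery is fire-and-forget: a lost datagram just means one cache serves a slightly stale copy until its next refresh, which is an acceptable trade for the speed of the purge.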
We could convert the CLR stream to ASCII text, but the software to do that could equally run on every individual squid, since they all have access to that stream. Either way, we need to put a flag in the CLR packets indicating whether the item should be pre-fetched or not.
Polling the recentchanges table to construct these ASCII files isn't really an option, we tried that with the RC->IRC bots and it turned out to be too resource-intensive, which is why they also use a UDP stream these days. We could add a third UDP stream for this purpose, or create a daemon and have TCP notification, but it seems easier to me to just modify one of the existing ones slightly, by adding a flag. The REASON field has a 4-bit width, and only two codes are defined, so there's plenty of room for us to suggest our own codes. Otherwise we can just stake a claim in the RESERVED section.
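Staking out a REASON code might look like the following sketch. In HTCP (RFC 2756) the CLR op-data leads with a 16-bit word of 12 RESERVED bits plus the 4-bit REASON; codes 0 and 1 are defined, so a value like 2 for "purge and pre-fetch" is my hypothetical suggestion, not anything agreed:

```python
REASON_UNSPECIFIED = 0   # defined in RFC 2756: "some other reason"
REASON_ORIGIN_TOLD = 1   # defined in RFC 2756: origin server said so
REASON_PREFETCH    = 2   # hypothetical new code: purge, then pre-fetch

def pack_clr_word(reason, reserved=0):
    """Pack the 16-bit RESERVED/REASON word that leads CLR op-data."""
    return ((reserved & 0xFFF) << 4) | (reason & 0xF)

def unpack_clr_word(word):
    """Return (reserved, reason) from the 16-bit CLR leading word."""
    return (word >> 4) & 0xFFF, word & 0xF
```

ISP-side listeners that don't understand the new code would simply treat it as a plain purge, which degrades gracefully.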
-- Tim Starling