Perhaps one way to address this dilemma is a manual pull system. A page that incorporates WikiData would display a message saying that the data on the page was last refreshed *for this page* at such-and-such a date/time. (This message can be cached with the page, since it says nothing about the freshness of the underlying data.) The page would also let the reader force a refresh if desired. (A DoS attack could be avoided by refusing a refresh until some minimum interval x has passed since the last one.)
It wouldn't be quite as automatic as the pull system described in the original message, but it could avoid the severe performance penalty. Just a thought.
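A minimal sketch of the manual pull idea, assuming the display site keeps one rendered copy per page plus the time its data was last pulled. `ManualPullCache`, `render`, and `min_interval` are hypothetical names for illustration; `render` stands in for "rerun the data-site query and render the page", and `min_interval` plays the role of the "x amount of time" above.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class CachedPage:
    """A rendered page plus the time its WikiData content was last pulled."""
    html: str
    refreshed_at: float


class ManualPullCache:
    """Hypothetical display-site cache: pages are served from cache until a
    reader explicitly asks for a refresh, and refreshes are rate-limited."""

    def __init__(self, render: Callable[[str], str], min_interval: float,
                 clock: Callable[[], float] = time.time):
        self._render = render              # assumed: queries the data site, returns HTML
        self._min_interval = min_interval  # minimum seconds between forced refreshes
        self._clock = clock
        self._pages: Dict[str, CachedPage] = {}

    def view(self, title: str) -> CachedPage:
        """Serve the cached copy; only pull data on the very first view."""
        page = self._pages.get(title)
        if page is None:
            page = self._refresh(title)
        return page

    def force_refresh(self, title: str) -> Tuple[CachedPage, bool]:
        """Re-pull the data unless the last refresh was too recent.
        Returns the page and whether a refresh actually happened."""
        page = self._pages.get(title)
        if page is not None and self._clock() - page.refreshed_at < self._min_interval:
            return page, False  # rate limit: repeated clicks cost nothing
        return self._refresh(title), True

    def _refresh(self, title: str) -> CachedPage:
        page = CachedPage(self._render(title), self._clock())
        self._pages[title] = page
        return page
```

Ordinary views never touch the data site, so the page (including its "last refreshed" timestamp) stays cacheable; only an explicit, rate-limited refresh costs a query.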
Alan
On Thu, 21 Oct 2004 21:07:45 +0200, Magnus Manske <magnus.manske@web.de> wrote: [...]
As good "wiki-fiddlers" (thanks so much, Register!), we would like to see every change in WikiData on the Wikipedia pages real soon. Like, now. So the information that something has changed, and what changed, has to pass from the data site to the display site. There are two ways to do that: push or pull.
PUSH means the data site will notify the display site that something has changed, and the display needs to be updated. For that, the data site has to know which pages of the display site are affected by which change. Then, it has to notify the display site of this. Bad things:
- Needs basically a cache of *all* queries *ever* asked of the data site, as well as their results
- Has to recalculate *all* of these after *every* change to find which queries produce different results
- Won't work if the display site is offline
- Won't work well with non-wikipedias
That can't be it.
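To make the first two objections concrete, here is a toy model of PUSH, assuming the data store is a plain dict and a "query" is any function over it. `PushNotifier`, `ask`, and `change` are hypothetical names; the point is only that the registry grows with every query ever asked, and every edit reruns all of them.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical model: the data site is a dict of item -> value,
# and a query is any function over the whole store.
Store = Dict[str, int]
Query = Callable[[Store], object]


class PushNotifier:
    """Sketch of PUSH: the data site remembers every query any display page
    has ever run, together with the result it last returned."""

    def __init__(self, store: Store):
        self.store = store
        # (page, query name) -> (query, last result); never shrinks
        self.registry: Dict[Tuple[str, str], Tuple[Query, object]] = {}
        self.queries_run = 0

    def ask(self, page: str, name: str, query: Query) -> object:
        """A display page runs a query; the data site must remember it."""
        result = query(self.store)
        self.queries_run += 1
        self.registry[(page, name)] = (query, result)
        return result

    def change(self, item: str, value: int) -> Set[str]:
        """Apply an edit, then rerun *every* registered query to find which
        pages now see a different result. Returns the pages to notify."""
        self.store[item] = value
        stale: Set[str] = set()
        for key, (query, old) in self.registry.items():
            new = query(self.store)
            self.queries_run += 1
            if new != old:
                self.registry[key] = (query, new)  # updating a value is safe mid-iteration
                stale.add(key[0])
        return stale
```

Even an edit that changes nothing visible still costs one rerun per registered query, and the notification step still assumes the display site is reachable.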
PULL means the display site asks the data site whether anything has changed, which basically means rerunning a query. That means doing this for *every* page view, even for anonymous users. And that means all caching variants, including the Squid caches, are going bye-bye. Additionally, for every page view, the display site has to wait for the data site to complete the query. Think Wikipedia is slow today? Think again...
That can't be it, either.
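A rough sketch of the cost of uncached PULL, assuming a fixed latency per data-site query. `PullPerView`, `query_data_site`, and `query_cost_ms` are hypothetical; the model just counts queries and accumulated waiting time rather than actually sleeping.

```python
from typing import Callable


class PullPerView:
    """Sketch of PULL without caching: every page view, including anonymous
    ones, reruns the page's query against the data site and waits for it."""

    def __init__(self, query_data_site: Callable[[str], str], query_cost_ms: float):
        self._query = query_data_site  # hypothetical call to the data site
        self._cost = query_cost_ms     # assumed latency of one data-site query
        self.queries_run = 0
        self.total_wait_ms = 0.0

    def view(self, title: str) -> str:
        # Nothing can come from a page cache (or Squid), since the data
        # might have changed since the previous view.
        data = self._query(title)
        self.queries_run += 1
        self.total_wait_ms += self._cost
        return "<h1>" + title + "</h1><p>" + data + "</p>"
```

With this model, 1000 views of one page mean 1000 data-site queries, and every view pays the full query latency on top of normal rendering.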
Oh, sure, we can cache the queries with results on the display site, or only update the data once a day/week, but then we won't be wiki (=quick) anymore, no? Will this be the price to pay?
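The compromise in that last paragraph amounts to a time-to-live cache on the display site. A minimal sketch, assuming results are keyed by query string; `TTLQueryCache`, `run_query`, and `max_age` are hypothetical names. It shows the price directly: an edit made just after a fetch stays invisible until the entry expires.

```python
import time
from typing import Callable, Dict, Tuple


class TTLQueryCache:
    """Display-site cache of data-site query results, reused until they are
    older than max_age seconds (e.g. 86400 for "once a day")."""

    def __init__(self, run_query: Callable[[str], str], max_age: float,
                 clock: Callable[[], float] = time.time):
        self._run = run_query  # hypothetical call to the data site
        self._max_age = max_age
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # query -> (result, fetched_at)
        self.misses = 0

    def get(self, query: str) -> str:
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and now - entry[1] < self._max_age:
            return entry[0]  # possibly stale, but no data-site round trip
        result = self._run(query)
        self.misses += 1
        self._entries[query] = (result, now)
        return result
```

For example, with `max_age=86400`, a value fetched at t=0 and edited at t=10 is still served unchanged at t=3600 and only picked up at t=86400.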