Perhaps a way to address this dilemma is with a manual pull system. A
page that incorporates WikiData would display a message indicating
that the page uses data that was last refreshed *for this page* at
such-and-such a date/time. (This information can be cached with the
page, since it doesn't make a statement about the freshness of the
underlying data.) The display would also give the user the ability to
force a refresh if desired. (A DoS attack could be avoided by refusing
a refresh until some minimum interval has passed since the last one.)
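
A minimal sketch of the manual pull idea, in Python. The names
(`ManualPullCache`, `run_query`, `min_interval`) and the five-minute
default are assumptions made up for illustration, not anything from
MediaWiki or WikiData; `run_query` stands in for whatever call would
actually fetch the data and render the page.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class CachedPage:
    html: str            # rendered page, including the embedded data
    refreshed_at: float  # when the data was last pulled *for this page*


class ManualPullCache:
    """Serve cached pages; hit the data site only on an explicit refresh."""

    def __init__(self, run_query: Callable[[str], str],
                 min_interval: float = 300.0,
                 clock: Callable[[], float] = time.time) -> None:
        self.run_query = run_query        # hypothetical call to the data site
        self.min_interval = min_interval  # seconds between allowed refreshes
        self.clock = clock
        self.pages: Dict[str, CachedPage] = {}

    def view(self, title: str) -> CachedPage:
        # Normal page view: no round trip unless the page was never built.
        page = self.pages.get(title)
        if page is None:
            page = self._pull(title)
        return page

    def refresh(self, title: str) -> Tuple[CachedPage, bool]:
        # User-forced refresh, refused if the last pull was too recent.
        page = self.pages.get(title)
        if page is not None and self.clock() - page.refreshed_at < self.min_interval:
            return page, False
        return self._pull(title), True

    def _pull(self, title: str) -> CachedPage:
        page = CachedPage(self.run_query(title), self.clock())
        self.pages[title] = page
        return page
```

The `refreshed_at` value is what the page would show as "last refreshed
for this page", and the boolean returned by `refresh` tells the caller
whether the request was honored or throttled.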
It wouldn't be quite as automatic as the pull system described in the
original message, but it could avoid the severe performance penalty.
Just a thought.
Alan
On Thu, 21 Oct 2004 21:07:45 +0200, Magnus Manske <magnus.manske(a)web.de> wrote:
[...]
As good "wiki-fiddlers" (thanks so much, Register!) we would like to see
every change in WikiData on the wikipedia pages real soon. Like, now.
So the information that something changes, and what changed, has to pass
from the data site to the display site. There are two ways to do that:
push or pull.
PUSH means the data site will notify the display site that something has
changed, and the display needs to be updated. For that, the data site
has to know which pages of the display site are affected by which
change. Then, it has to notify the display site of this. Bad things:
* Needs basically a cache of *all* queries *ever* asked of the data
site, as well as their results
* Has to recalculate *all* of these after *every* change to find which
queries produce different results
* Won't work if the display site is offline
* Won't work well with non-wikipedias
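
To make the cost in the first two points concrete, here is a rough
Python sketch of a push-style data site. Everything in it
(`PushDataSite`, `register`, `change`, the `notify` callback) is a
hypothetical illustration, not an existing API. It keeps every
registered query together with its last result and re-runs all of them
after each change.

```python
from typing import Callable, Dict, Tuple

Data = Dict[str, int]
Query = Callable[[Data], object]


class PushDataSite:
    def __init__(self, notify: Callable[[str], None]) -> None:
        self.data: Data = {}
        self.notify = notify  # hypothetical "tell the display site" hook
        # page title -> (query, last known result): the cache of all queries
        self.subscriptions: Dict[str, Tuple[Query, object]] = {}
        self.reruns = 0       # counts query executions caused by changes

    def register(self, page: str, query: Query) -> None:
        self.subscriptions[page] = (query, query(self.data))

    def change(self, key: str, value: int) -> None:
        self.data[key] = value
        # Without knowing which queries depend on `key`, every stored
        # query has to be re-run to see whether its result changed.
        for page, (query, old) in list(self.subscriptions.items()):
            new = query(self.data)
            self.reruns += 1
            if new != old:
                self.subscriptions[page] = (query, new)
                self.notify(page)
```

In this sketch each change costs one re-run per stored query, whether
or not the page is affected, so the work grows with (number of changes)
times (number of queries ever registered).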
That can't be it.
PULL means the display site asks the data site if anything has changed,
which basically means rerunning a query. Which means, doing this for
*every* pageview, even for anons. Which means, all caching variants,
including squids, are going bye-bye. Additionally, for every page view,
the display site has to wait for the data site to complete the query.
Think wikipedia is slow today? Think again...
That can't be it, either.
Oh, sure, we can cache the queries with results on the display site, or
only update the data once a day/week, but then we won't be wiki (=quick)
anymore, no? Will this be the price to pay?
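
The "cache the queries with results on the display site" option could
look roughly like the Python sketch below. `TtlQueryCache`, `run_query`
and `ttl_seconds` are assumed names for illustration only. The point is
the trade-off: a cached result can be up to `ttl_seconds` out of date.

```python
import time
from typing import Callable, Dict, Tuple


class TtlQueryCache:
    """Display-site cache: re-run a query only after its entry expires."""

    def __init__(self, run_query: Callable[[str], object],
                 ttl_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.run_query = run_query  # hypothetical call to the data site
        self.ttl = ttl_seconds      # e.g. 86400 for "once a day"
        self.clock = clock
        self.entries: Dict[str, Tuple[object, float]] = {}

    def get(self, query: str) -> object:
        now = self.clock()
        entry = self.entries.get(query)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]         # possibly stale, but no round trip
        result = self.run_query(query)
        self.entries[query] = (result, now)
        return result
```

A short TTL keeps pages closer to "wiki (=quick)" at the price of more
queries against the data site; a long one does the opposite.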