If you add a proxy cache like Varnish in front of the endpoint, it
will cache based on the Cache-Control: max-age and ETag headers sent
by the endpoint, which I guess can be configured. But you can also
PURGE and BAN specific cache entries from Varnish to force fresh
retrieval.
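As a minimal sketch of that logic in Python (not Varnish's actual implementation: the freshness check mirrors what a cache does with max-age, and the PURGE helper assumes your VCL has a purge rule allowing the client's IP):

```python
import http.client
import re


def is_fresh(cache_control: str, stored_at: float, now: float) -> bool:
    """Check whether a cached response is still fresh per Cache-Control: max-age.
    Responses without a max-age directive are treated as stale (a cache would
    then revalidate against the origin using the ETag)."""
    m = re.search(r"max-age=(\d+)", cache_control)
    if m is None:
        return False
    return (now - stored_at) < int(m.group(1))


def purge(host: str, path: str, port: int = 80) -> int:
    """Force fresh retrieval by sending an HTTP PURGE to Varnish.
    Only works if the VCL config contains a purge rule for the caller."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    conn.request("PURGE", path)
    try:
        return conn.getresponse().status
    finally:
        conn.close()
```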
On Wed, Feb 17, 2016 at 10:52 AM, Markus Krötzsch
<markus(a)semantic-mediawiki.org> wrote:
On 17.02.2016 10:34, Magnus Manske wrote:
On Wed, Feb 17, 2016 at 7:16 AM Stas Malyshev
<smalyshev(a)wikimedia.org> wrote:
Well, again, the problem is that the one use case I think absolutely
needs caching - namely, exporting data to graphs, maps, etc. deployed
on wiki pages - is also the one not implemented yet, because we don't
have a cache (not the only thing, but one of the things we need), so
we've got a chicken-and-egg problem here :) Of course, we can just
choose something now based on an educated guess and change it later if
it works badly. That's probably what we'll do.
Wouldn't those use cases be wrapped in an extension or WMF-controlled
JavaScript? In that case, queries could always indicate that use, and
they could be cached, for hours if need be. There is no reason to put
everything behind a long cache by default just because of those
controllable cases.
Also, what about creating more independent Blazegraph instances? One (or
more) could be for wiki extension queries, with a long cache; others could
be for Labs use (internal network only?), a "general" external-facing
server, etc.
If the problem (and it's not even certain we have one) can be mitigated
or solved by throwing a few more VMs at it, I'm all for it :-)
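The idea of queries indicating their use so they can be cached differently could look something like this sketch (the origin labels and TTL values here are made up for illustration, not an existing WDQS parameter):

```python
# Hypothetical TTL policy keyed on a declared query origin; labels and
# durations are illustrative only.
TTL_BY_ORIGIN = {
    "wiki-extension": 6 * 3600,  # embedded graphs/maps: a long cache is fine
    "labs": 0,                   # internal tools: skip shared caching
    "external": 300,             # general public endpoint: short default
}


def cache_control(origin: str) -> str:
    """Build a Cache-Control header value for a query's declared origin."""
    ttl = TTL_BY_ORIGIN.get(origin, 300)
    return f"max-age={ttl}" if ttl > 0 else "no-cache"
```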
+1 for adding some servers before building complicated caching solutions
I think long caching periods for wiki queries could lead to user frustration,
for the reasons I gave in my other post. But maybe one could simply give the
user a way to say "please recompute this query now" to avoid this.
Another thing one could do for wiki-based (and other) queries is to use
caches as fallbacks in case of timeouts: "we will try our best to give you a
fresh result, but if the current load is too high, we will at least give you
some older result." This makes most sense for wiki-based queries, which
repeat reliably over time (so it makes sense to keep one result, however old
it is, for every query still used on some wiki page). It would be more work
to implement, though, so probably not the first thing to do without any real
wiki usage experience yet.
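The stale-on-timeout fallback described above could be sketched like this (illustrative names only; a real implementation would sit in front of the SPARQL endpoint, not in Python):

```python
from typing import Any, Callable


class FallbackCache:
    """Try to compute a fresh result; if that times out under load, fall
    back to the last stored result, however old. A sketch of the idea in
    this thread, not actual query-service code."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}

    def query(self, key: str, compute: Callable[[], Any]) -> Any:
        try:
            result = compute()  # may raise TimeoutError when load is too high
        except TimeoutError:
            if key in self._store:
                return self._store[key]  # an older result beats no result
            raise  # nothing cached yet: surface the timeout
        self._store[key] = result  # keep for future fallback
        return result
```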
Markus
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata