If you add a proxy cache like Varnish in front of the endpoint, it
will cache based on the Cache-Control: max-age and ETag headers sent
by the endpoint, which I guess can be configured. But you can also
PURGE and BAN specific cache entries from Varnish to force fresh
retrieval.
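As a minimal sketch of that logic in Python (not Varnish's actual implementation: the freshness check mirrors what a cache does with max-age, and the PURGE helper assumes your VCL has a purge rule allowing the client's IP):

```python
import http.client
import re


def is_fresh(cache_control: str, stored_at: float, now: float) -> bool:
    """Check whether a cached response is still fresh per Cache-Control: max-age.
    Responses without a max-age directive are treated as stale (a cache would
    then revalidate against the origin using the ETag)."""
    m = re.search(r"max-age=(\d+)", cache_control)
    if m is None:
        return False
    return (now - stored_at) < int(m.group(1))


def purge(host: str, path: str, port: int = 80) -> int:
    """Force fresh retrieval by sending an HTTP PURGE to Varnish.
    Only works if the VCL config contains a purge rule for the caller."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    conn.request("PURGE", path)
    try:
        return conn.getresponse().status
    finally:
        conn.close()
```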
On Wed, Feb 17, 2016 at 10:52 AM, Markus Krötzsch
<markus(a)semantic-mediawiki.org> wrote:
On 17.02.2016 10:34, Magnus Manske wrote:
On Wed, Feb 17, 2016 at 7:16 AM Stas Malyshev
<smalyshev(a)wikimedia.org> wrote:
Well, again, the problem is that the one use case I think absolutely
needs caching - namely, exporting data to graphs, maps, etc. deployed
on wiki pages - is also the one not implemented yet, because we don't
have a cache (not the only thing, but one of the things we need), so
we've got a chicken-and-egg problem here :) Of course, we can just
choose something now based on an educated guess and change it later if
it works badly. That's probably what we'll do.
Wouldn't those use cases be wrapped in an extension or WMF-controlled
JavaScript? In that case, queries could always indicate that use, and
they could be cached, for hours if need be. There is no reason to put
everything behind a long cache by default just because of those
controllable cases.
Also, what about creating more independent Blazegraph instances? One (or
more) could be for wiki extension queries, with a long cache; others could
be for Labs use (internal network only?), a "general" external-facing
server, etc.
If the problem (and it's not even certain we have one) can be mitigated
or solved by throwing a few more VMs at it, I'm all for it :-)
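The idea of queries indicating their use so they can be cached differently could look something like this sketch (the origin labels and TTL values here are made up for illustration, not an existing WDQS parameter):

```python
# Hypothetical TTL policy keyed on a declared query origin; labels and
# durations are illustrative only.
TTL_BY_ORIGIN = {
    "wiki-extension": 6 * 3600,  # embedded graphs/maps: a long cache is fine
    "labs": 0,                   # internal tools: skip shared caching
    "external": 300,             # general public endpoint: short default
}


def cache_control(origin: str) -> str:
    """Build a Cache-Control header value for a query's declared origin."""
    ttl = TTL_BY_ORIGIN.get(origin, 300)
    return f"max-age={ttl}" if ttl > 0 else "no-cache"
```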
+1 for adding some servers before building complicated caching solutions
I think long caching periods for wiki queries could lead to user frustration,
for the reasons I gave in my other post. But maybe one could simply give the
user a way to say "please recompute this query now" to avoid this.
Another thing one could do for wiki-based (and other) queries is to use
caches as fallbacks in case of timeouts: "we will try our best to give you a
fresh result, but if the current load is too high, we will at least give you
some older result." This makes most sense for wiki-based queries, which
repeat reliably over time (so it makes sense to keep one result, however old
it is, for every query still used on some wiki page). It would be more work
to implement, though, so probably not the first thing to do without any real
wiki usage experience yet.
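The stale-on-timeout fallback described above could be sketched like this (illustrative names only; a real implementation would sit in front of the SPARQL endpoint, not in Python):

```python
from typing import Any, Callable


class FallbackCache:
    """Try to compute a fresh result; if that times out under load, fall
    back to the last stored result, however old. A sketch of the idea in
    this thread, not actual query-service code."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}

    def query(self, key: str, compute: Callable[[], Any]) -> Any:
        try:
            result = compute()  # may raise TimeoutError when load is too high
        except TimeoutError:
            if key in self._store:
                return self._store[key]  # an older result beats no result
            raise  # nothing cached yet: surface the timeout
        self._store[key] = result  # keep for future fallback
        return result
```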
Markus
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata