Re: [Wikidata] SPARQL endpoint caching

17 Feb 2016

Hi!

...
  (2) Shouldn't BlazeGraph do the caching (too)? It knows how much a query
 costs to re-run and it could even know if a query is affected by a data
BlazeGraph does a lot of caching, but it's limited by memory and, AFAIK,
it does not do whole-query caching (like MySQL does, for example). That
means if you run two big queries one after another, the latter could
evict from the cache what the former put there. Its caching, AFAIK, is
at a much lower level. That is helpful too, since different queries
share a lot of underlying data, but it's not exactly our case here.
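
To make the distinction concrete, here is a minimal sketch of what
whole-query caching in front of the endpoint could look like. It is an
illustration only, not Blazegraph or query service code: the class name
QueryResultCache, the run_query callback, and the size and TTL values
are all hypothetical. Results are keyed by the exact query text and
dropped after a fixed time, since nothing here tracks data changes.

```python
import hashlib
import time
from collections import OrderedDict


class QueryResultCache:
    """Hypothetical whole-query cache: query text -> stored result."""

    def __init__(self, max_entries=128, ttl_seconds=300.0, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = OrderedDict()  # key -> (expires_at, result)

    @staticmethod
    def key_for(query):
        # Only leading/trailing whitespace is ignored; anything smarter
        # would need a real SPARQL parser to stay safe.
        return hashlib.sha256(query.strip().encode("utf-8")).hexdigest()

    def get_or_run(self, query, run_query):
        key = self.key_for(query)
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            self._entries.move_to_end(key)  # mark as recently used
            return hit[1]
        # Miss or expired entry: run the query and store the fresh result.
        result = run_query(query)
        self._entries[key] = (now + self.ttl_seconds, result)
        self._entries.move_to_end(key)
        # Evict least recently used entries beyond the size limit.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)
        return result
```

A caller would pass its own function that actually sends the query to
the endpoint as run_query. The time-based expiry is the crude part: a
stale result can be served until the TTL runs out.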

...
  update (a cache might still be the same as a current result even after
 many data changes). Having several caching layers is useful, but the
 more elaborate (query-structure dependent) caching strategies should
 maybe be left to the database. 
I don't think Blazegraph does anything like resolving changes to see if
query results changed; that sounds like a pretty hard thing to do in a
triple store. You can manually store a specific query result, AFAIK, but
that's just a form of writing data, as I understand it, and may not be
very scalable.

...
  The points (3)-(5) are based on guessing. As Magnus said, some analysis
 could help to confirm or refute this. On the other hand, caching should
 not just focus on current usage patterns only, but consider a bit what
 could happen in the future. 
Well, again, the problem is that the one use case that I think absolutely
needs caching (namely, exporting data to graphs, maps, etc. deployed on
wiki pages) is also the one not implemented yet, because we don't have
a cache (it's not the only thing we need, but it's one of them). So we've
got a chicken-and-egg problem here :) Of course, we can just choose
something now based on an educated guess and change it later if it works
badly. That's probably what we'll do.

Thanks,
-- 
Stas Malyshev
smalyshev(a)wikimedia.org
