On 07.02.20 14:32, Guillaume Lederrey wrote:
> Keeping all of Wikidata in a single graph is most probably not going
> to work long term. We have not found examples of public SPARQL
> endpoints with > 10 B triples and there is probably a good reason for
> that. We will probably need to split the graphs at some point. We
> don't know how yet (that's why we loaded the dumps into Hadoop, that
> might give us some more insight). We might expose a subgraph with only
> truthy statements. Or have language specific graphs, with only
> language specific labels. Or something completely different.
I have not looked in detail at query runtimes or at how Blazegraph
indexing works internally, but I have noticed that queries involving
SPARQL property paths (and especially joins of those) often take a long
time to run. At the same time, I recently discovered that if we only
store which entity is connected to which other entity (without storing
the actual statement details, such as the property, qualifiers or
ranks), the whole set takes up only about 2 GB compressed with
Zstandard (I represented each connection as <32-bit int source entity>
<32-bit int destination entity>). Of course that discards a lot of
important information, but it made me wonder whether something could be
done to evaluate queries more efficiently, given the relatively strict
schema that the RDF representation of Wikidata adheres to (since it is
generated from a more structured form, Statements). For example,
Blazegraph doesn't know about the relationship between wdt:Pxxx and
p:Pxxx, or about patterns like p:Pxxx/ps:Pxxx.
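
To make the packed representation concrete, here is a rough sketch
(Python; all names are made up, it assumes an entity-to-integer mapping
already exists, and it uses the "zstandard" package):

    import struct
    import zstandard as zstd  # pip install zstandard

    def pack_edges(edges, entity_to_int):
        """Pack (source, destination) entity pairs as consecutive 32-bit ints."""
        buf = bytearray()
        for src, dst in edges:
            buf += struct.pack("<II", entity_to_int[src], entity_to_int[dst])
        return bytes(buf)

    # Toy mapping; the real one would cover all entity IDs.
    entity_to_int = {"Q42": 0, "Q5": 1, "Q1": 2}
    edges = [("Q42", "Q5"), ("Q5", "Q1")]

    packed = pack_edges(edges, entity_to_int)      # 8 bytes per connection
    compressed = zstd.ZstdCompressor(level=19).compress(packed)

That is 8 bytes per connection before compression; Zstandard then
brings the full dump down to the roughly 2 GB mentioned above.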
Another, somewhat related idea: perhaps it would be possible to keep
the SPARQL interface as the frontend, but use a more efficient, split
representation of the graph in the backend? I'm not sure how different
that would be from the indexing Blazegraph already does, though.
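
Purely as an illustration of what I have in mind (made-up names again;
a real backend would of course be far more involved): with one
adjacency map per property, built from packed pairs like the ones
above, a path such as wdt:P279* becomes a plain graph traversal rather
than a generic triple join:

    from collections import deque

    # adj["P279"] maps an entity's int ID to the int IDs it points to
    # via P279 (subclass of), reconstructed from the packed pairs above.
    def closure(adj_for_prop, start):
        """Entities reachable from `start` via zero or more edges (e.g. wdt:P279*)."""
        seen, queue = {start}, deque([start])
        while queue:
            for nxt in adj_for_prop.get(queue.popleft(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    # e.g. all (transitive) superclasses of Q42:
    # closure(adj["P279"], entity_to_int["Q42"])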
Regards,
Benno
PS: apologies to Guillaume if you receive this mail twice, I clicked
the wrong button when replying.