Hey all,
We recently wrote a paper on query performance for Wikidata, comparing different possible representations of the knowledge base in Postgres (a relational database), Neo4J (a graph database), Virtuoso (a SPARQL database) and BlazeGraph (the SPARQL database currently in use) over a set of equivalent benchmark queries.
The paper has been accepted for presentation at the International Semantic Web Conference (ISWC) 2016. A pre-print is available here:
http://aidanhogan.com/docs/wikidata-sparql-relational-graph.pdf
Of course, these results come with some caveats: the relative performance of the engines might differ on different hardware or for different styles of queries. For this reason, we tried to use queries that are as general as possible and to test different representations in different engines (we did not, however, vary the hardware). In the discussion of results, we also tried to give a more general explanation of the trends, highlighting some strengths and weaknesses of each engine independently of the particular queries/data.
I think it's worth a glance for anyone who is interested in the technology/techniques needed to query Wikidata.
Cheers, Aidan
P.S. The paper above is a follow-up to previous work with Markus Krötzsch that focussed purely on RDF/SPARQL:
http://aidanhogan.com/docs/reification-wikidata-rdf-sparql.pdf
(I'm not sure if it was previously mentioned on the list.)
P.P.S. As someone who's somewhat of an outsider but has been watching from the sidelines for a few years now, I'd like to congratulate the community for making Wikidata what it is today. It's awesome work. Keep going. :)