Re: [Wikidata] Scaling Wikidata Query Service

12 Jun 2019


      On Sun, 9 June 2019 at 23:18, Amirouche Boubekki <
amirouche.boubekki@gmail.com> wrote:
...
I made a proposal for a grant at
https://meta.wikimedia.org/wiki/Grants:Project/WDQS_On_FoundationDB
Mind the fact that this is not about the versioned quadstore. It is about a
simple triplestore; it is mainly missing bindings for FoundationDB and SPARQL
syntax.
Also, I will probably need help to interface with the geo and label services.
Feedback welcome!
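
To give an idea of what "a simple triplestore" on top of an ordered key-value
store can look like, here is a minimal sketch. It is not the code from the
proposal. The OrderedKV class is a hypothetical in-memory stand-in for an
ordered store such as FoundationDB, where range reads over a key prefix would
play the role of scan_prefix. The class names, the three index permutations and
the sample triples are assumptions made for illustration.

```python
import bisect


class OrderedKV:
    """Hypothetical in-memory stand-in for an ordered key-value store."""

    def __init__(self):
        self._keys = []  # kept sorted; keys are tuples of strings

    def add(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def scan_prefix(self, prefix):
        # Jump to the first key >= prefix, then walk while the prefix matches.
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i][:len(prefix)] == prefix:
            yield self._keys[i]
            i += 1


# Each index stores the triple in a different component order, so that any
# combination of bound terms is a key prefix in at least one index.
INDEXES = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}


class TripleStore:
    def __init__(self):
        self.kv = OrderedKV()

    def add(self, s, p, o):
        triple = (s, p, o)
        for name, order in INDEXES.items():
            self.kv.add((name,) + tuple(triple[i] for i in order))

    def match(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None is a wildcard."""
        pattern = (s, p, o)
        bound = sum(term is not None for term in pattern)
        # Pick the index whose leading components cover every bound term.
        for name, order in INDEXES.items():
            prefix = []
            for i in order:
                if pattern[i] is None:
                    break
                prefix.append(pattern[i])
            if len(prefix) == bound:
                break
        for key in self.kv.scan_prefix((name,) + tuple(prefix)):
            triple = [None, None, None]
            for position, i in enumerate(order):
                triple[i] = key[1 + position]
            yield tuple(triple)


store = TripleStore()
store.add("Q42", "P31", "Q5")       # illustrative Wikidata-style identifiers
store.add("Q42", "P106", "Q36180")
store.add("Q80", "P31", "Q5")
humans = sorted(store.match(p="P31", o="Q5"))
```

Every triple pattern with wildcards becomes a single prefix scan on one of the
three indexes, which is the property an ordered key-value store is good at.
SPARQL support would then be a matter of compiling queries down to joins over
such scans.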
I got "feedback" in other threads on the same topic that I will quote
and reply to.
...
So there needs to be some smarter solution, one that we'd be unlikely to
develop in-house
Big cat, small fish. As Wikidata continues to grow, it will have specific
needs.
Needs that are unlikely to be solved by off-the-shelf solutions.
...
but one that has already been verified by industry experience and other
deployments.
FoundationDB and WiredTiger are used, respectively, at Apple (among other
companies)
and in MongoDB (since 3.2), all over the world. WiredTiger is also used at Amazon.
...
We also have a plan on improving the throughput of Blazegraph, which
we're working on now.
What is the Phabricator ticket, please?
...
"Evaluation of Metadata Representations in RDF stores"
I don't understand how this is related to the scaling issues.
...
[About the proprietary version of Virtuoso], I dare say [it must have] enormous
advantage for us to consider running it in production.
That would be vendor lock-in for Wikidata and Wikimedia, along with all the
poor souls who try to interoperate with it.
...
This project seems to be still very young.
First commit
https://github.com/arangodb/arangodb/commit/6577d5417a000c29c9ee7666cbcc3cefae6eee21
is from 2011.
...
ArangoDB seems to be a document database inside.
It has two storage backends: MMFiles (memory-mapped files) and RocksDB.
...
While I would be very interested if somebody took it upon themselves to model
Wikidata
...
in terms of ArangoDB documents,
It looks like a bounty.
ArangoDB is a multi-model database; it supports:
- Document
- Graph
- Key-Value
...
load the whole data and see what the resulting performance would be, I am
not sure
...
it would be wise for us to invest our team's - very limited currently -
resources into that.
I am biased. I would advise against trying ArangoDB. It is another
short-term solution.
...
the concept of having a single data store is probably not realistic, at
least
...
within foreseeable timeframes.
Incorrect. My solution is in the foreseeable future.
...
We use a separate data store for search (ElasticSearch) and probably will
have to have a separate one for queries, whatever the mechanism would be.
It would be interesting to read how many "resources" are poured into keeping
all of those synchronized:
- ElasticSearch
- MySQL
- Blazegraph
Maybe some Redis?
