On Sun, 9 Jun 2019 at 23:18, Amirouche Boubekki <amirouche.boubekki@gmail.com> wrote:
I made a proposal for a grant at https://meta.wikimedia.org/wiki/Grants:Project/WDQS_On_FoundationDB
Mind the fact that this is not about the versioned quadstore. It is about a simple triplestore; it is mainly missing bindings for FoundationDB and SPARQL syntax.
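To make "simple triplestore" on top of an ordered key-value store more concrete, here is a minimal sketch. It is only an illustration under assumptions: `OrderedKV` is a hypothetical in-memory stand-in for an ordered store and is not the FoundationDB API, the class and index names are made up for this example, and the three-index layout (spo, pos, osp) is one common choice rather than necessarily what the proposal would use.

```python
import bisect

# Index name -> order in which (subject, predicate, object) is laid out in the key.
INDEXES = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}


class OrderedKV:
    """Hypothetical in-memory stand-in for an ordered key-value store."""

    def __init__(self):
        self._keys = []  # sorted list of tuple keys

    def add(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def scan_prefix(self, prefix):
        """Yield every key that starts with `prefix`, in sorted order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i][: len(prefix)] == prefix:
            yield self._keys[i]
            i += 1


class TripleStore:
    def __init__(self):
        self.kv = OrderedKV()

    def add(self, s, p, o):
        # Write the triple once per index, permuted into that index's order.
        triple = (s, p, o)
        for name, order in INDEXES.items():
            self.kv.add((name,) + tuple(triple[i] for i in order))

    def match(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None means 'any value'."""
        pattern = (s, p, o)

        def bound_prefix_len(order):
            n = 0
            for i in order:
                if pattern[i] is None:
                    break
                n += 1
            return n

        # Pick the index whose key order starts with the most bound terms.
        name, order = max(INDEXES.items(), key=lambda item: bound_prefix_len(item[1]))
        n = bound_prefix_len(order)
        prefix = (name,) + tuple(pattern[i] for i in order[:n])
        for key in self.kv.scan_prefix(prefix):
            triple = [None, None, None]
            for pos, i in enumerate(order):
                triple[i] = key[1 + pos]
            # Defensive filter; with these three indexes the prefix
            # already covers every bound term.
            if all(want is None or want == got for want, got in zip(pattern, triple)):
                yield tuple(triple)


# Illustrative data only.
store = TripleStore()
store.add("alice", "knows", "bob")
store.add("alice", "knows", "carol")
store.add("bob", "knows", "carol")
print(list(store.match(s="alice", p="knows")))
```

Every triple pattern becomes a single range scan over one index, which is the kind of access an ordered key-value store is good at. SPARQL parsing and the real storage bindings are the parts this sketch leaves out.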
Also, I will probably need help interfacing with the geo and label services.
Feedback welcome!
I got "feedback" in other threads on the same topic, which I will quote and reply to.
So there needs to be some smarter solution, one that we'd unlikely to develop inhouse
Big cat, small fish. As Wikidata continues to grow, it will have specific needs, needs that are unlikely to be met by off-the-shelf solutions.
but one that has already been verified by industry experience and other deployments.
FoundationDB and WiredTiger are used at Apple (among other companies) and MongoDB (since 3.2), respectively, all over the world. WiredTiger is also used at Amazon.
We also have a plan on improving the throughput of Blazegraph, which we're working on now.
What is the Phabricator ticket, please?
"Evaluation of Metadata Representations in RDF stores"
I don't understand how this is related to the scaling issues.
[About proprietary version Virtuoso], I dare say [it must have] enormous advantage for us to consider running it in production.
That would be vendor lock-in for Wikidata and Wikimedia, along with all the poor souls that try to interoperate with it.
This project seems to be still very young.
The first commit, https://github.com/arangodb/arangodb/commit/6577d5417a000c29c9ee7666cbcc3cefae6eee21, is from 2011.
ArangoDB seems to be a document database inside.
It has two backends: MMFiles (memory-mapped files) and RocksDB.
While I would be very interested if somebody took on themselves to model Wikidata in terms of ArangoDB documents,
It looks like a bounty.
ArangoDB is a multi-model database; it supports:
- Document
- Graph
- Key-Value
load the whole data and see what the resulting performance would be, I am not sure it would be wise for us to invest our team's - very limited currently - resources into that.
I am biased. I would advise against trying ArangoDB. This is another short-term solution.
the concept of having single data store is probably not realistic at least within foreseeable timeframes.
Incorrect. My solution is in the foreseeable future.
We use separate data store for search (ElasticSearch) and probably will have to have separate one for queries, whatever would be the mechanism.
It would be interesting to read how much "resource" is poured into keeping all those synchronized:
- Elasticsearch
- MySQL
- Blazegraph
Maybe some Redis?