Note that this is not about the versioned quadstore. It is about a simple triplestore; it is mainly missing bindings for FoundationDB and SPARQL syntax.
Also, I will probably need help interfacing with the geo and label services.
Feedback welcome!
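One common way to build a simple triplestore on an ordered key-value store such as FoundationDB or WiredTiger is to write each triple under several index orderings, so that any triple pattern becomes a prefix range scan. The sketch below is a minimal illustration under stated assumptions, not the actual implementation. An in-memory sorted list stands in for the real store. The three orderings (SPO, POS, OSP) and all names (`OrderedKV`, `TripleStore`, `match`) are hypothetical. The Wikidata-style identifiers are only illustrative.

```python
from bisect import bisect_left
from typing import Iterator, Optional, Tuple

Key = Tuple[str, ...]


class OrderedKV:
    """Hypothetical in-memory stand-in for an ordered key-value store.

    Keys are tuples kept in sorted order, loosely mimicking an ordered store
    with tuple-encoded keys. This is NOT the FoundationDB or WiredTiger API.
    """

    def __init__(self) -> None:
        self._keys: list = []

    def set(self, key: Key) -> None:
        # Insert the key at its sorted position, ignoring duplicates.
        i = bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def scan_prefix(self, prefix: Key) -> Iterator[Key]:
        # Range read: every key that starts with `prefix`, in key order.
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i][: len(prefix)] == prefix:
            yield self._keys[i]
            i += 1


# Hypothetical index layouts: positions of (subject, predicate, object)
# in the order they are stored in the key.
INDEXES = {
    "spo": (0, 1, 2),
    "pos": (1, 2, 0),
    "osp": (2, 0, 1),
}


class TripleStore:
    def __init__(self) -> None:
        self.kv = OrderedKV()

    def add(self, s: str, p: str, o: str) -> None:
        # Write the triple once per index, permuted into that index's order.
        triple = (s, p, o)
        for name, order in INDEXES.items():
            self.kv.set((name,) + tuple(triple[i] for i in order))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Tuple[str, str, str]]:
        # Pick the index whose leading positions are exactly the bound ones,
        # so the pattern turns into a single prefix scan.
        pattern = (s, p, o)
        bound = {i for i, v in enumerate(pattern) if v is not None}
        for name, order in INDEXES.items():
            lead = order[: len(bound)]
            if set(lead) == bound:
                prefix = (name,) + tuple(pattern[i] for i in lead)
                for key in self.kv.scan_prefix(prefix):
                    # Map the index-ordered key back to (s, p, o).
                    triple = [""] * 3
                    for pos, i in enumerate(order):
                        triple[i] = key[1 + pos]
                    yield (triple[0], triple[1], triple[2])
                return


store = TripleStore()
store.add("Q42", "P31", "Q5")
store.add("Q42", "P106", "Q36180")
print(list(store.match(s="Q42")))
# [('Q42', 'P106', 'Q36180'), ('Q42', 'P31', 'Q5')]
print(list(store.match(p="P31", o="Q5")))
# [('Q42', 'P31', 'Q5')]
```

Three orderings suffice in this sketch because every combination of bound positions is a prefix of one of them. A real deployment would also need transactions, value encoding and a SPARQL front end, which this sketch leaves out.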
I got "feedback" in other threads on the same topic, which I will quote and reply to.
> So there needs to be some smarter solution, one that we'd be unlikely to develop in-house
Big cat, small fish. As Wikidata continues to grow, it will have specific needs,
needs that are unlikely to be met by off-the-shelf solutions.
> but one that has already been verified by industry experience and other deployments.
FoundationDB is used at Apple (among other companies), and WiredTiger has been
MongoDB's default storage engine since 3.2, deployed all over the world. WiredTiger is also used at Amazon.
> We also have a plan on improving the throughput of Blazegraph, which we're working on now.
What is the Phabricator ticket, please?
> "Evaluation of Metadata Representations in RDF stores"
I don't understand how this is related to the scaling issues.
> [About proprietary version Virtuoso], I dare say [it must have] enormous advantage for us to consider running it in production.
That would mean vendor lock-in for Wikidata and Wikimedia, along with all the poor souls who try to interoperate with it.
> This project seems to be still very young.
> ArangoDB seems to be a document database inside.
It has two storage backends: MMAP (memory-mapped files) and RocksDB.
> While I would be very interested if somebody took on themselves to model Wikidata
> in terms of ArangoDB documents,
It looks like a bounty.
ArangoDB is a multi-model database; it supports:
- Document
- Graph
- Key-Value
> load the whole data and see what the resulting performance would be, I am not sure
> it would be wise for us to invest our team's - very limited currently - resources into that.
I am biased, but I would advise against trying ArangoDB. It is another short-term solution.
> the concept of having single data store is probably not realistic at least
> within foreseeable timeframes.
Incorrect. My solution is within the foreseeable timeframe.
> We use separate data store for search (ElasticSearch) and probably will
> have to have separate one for queries, whatever would be the mechanism.
It would be interesting to read how many "resources" are poured into keeping
all of these synchronized:
- ElasticSearch
- MySQL
- Blazegraph
Maybe some Redis too?