If HDT remains unfeasible, would it be an idea to place the Blazegraph journal file online? Yes, people would need to use Blazegraph if they want to access the files and query them, but it could be an extra alongside the Turtle dump?
On 27 Oct 2017, at 17:08, Laura Morales lauretas@mail.com wrote:
Hello everyone,
I'd like to ask if Wikidata could please offer an HDT [1] dump along with the already available Turtle dump [2]. HDT is a binary format for storing RDF data, which is pretty useful because it can be queried from the command line, it can be used as a Jena/Fuseki source, and it also uses orders of magnitude less space to store the same data. The problem is that generating an HDT file is very impractical, because the current implementation requires a lot of RAM to convert a file. For Wikidata it would probably require a machine with 100-200 GB of RAM. This is unfeasible for me because I don't have such a machine, but if you guys have one to share, I can help set up the rdf2hdt software required to convert the Wikidata Turtle dump to HDT.
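For anyone curious what the conversion and querying look like, a rough sketch with the rdf2hdt and hdtSearch tools from the hdt-cpp distribution (the file names here are just placeholders, and exact flags may differ between versions):

```shell
# Convert the (decompressed) Wikidata Turtle dump to HDT.
# This is the step that needs the very large amount of RAM.
rdf2hdt -f turtle wikidata-all.ttl wikidata-all.hdt

# Once the .hdt file exists, it can be queried directly from
# the command line with simple triple patterns, no triple
# store required (? acts as a wildcard):
hdtSearch wikidata-all.hdt
```

The point is that the expensive step is a one-time conversion; after that, the single .hdt file is all a consumer needs.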
Thank you.
[1] http://www.rdfhdt.org/
[2] https://dumps.wikimedia.org/wikidatawiki/entities/
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata