You shouldn't have to keep anything in RAM to HDT-ize a dataset: you can build the dictionary with an on-disk (external) sort, and you can do the joins that look every term up against the dictionary by sorting as well.
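To make that concrete, here is a rough Python sketch of the idea. It assumes the input is a tab-separated file of plain terms (no tabs or newlines inside terms), and it only illustrates the external-sort / sort-merge approach; it is not HDT's actual dictionary, which splits terms into shared/subject/predicate/object sections and front-codes the strings.

import heapq
import os
import tempfile

CHUNK = 1_000_000  # terms kept in memory per sorted run (tune to taste)

def _dump_run(buf, tmpdir):
    """Write one sorted run to a temporary file and return its path."""
    buf.sort()
    fd, path = tempfile.mkstemp(dir=tmpdir, text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.writelines(s + "\n" for s in buf)
    return path

def external_sort(items, tmpdir):
    """Sort an iterable of strings with bounded memory:
    sorted runs on disk, then a k-way merge via heapq.merge."""
    runs, buf = [], []
    for item in items:
        buf.append(item)
        if len(buf) >= CHUNK:
            runs.append(_dump_run(buf, tmpdir))
            buf = []
    if buf:
        runs.append(_dump_run(buf, tmpdir))
    files = [open(r, encoding="utf-8") for r in runs]
    try:
        for line in heapq.merge(*files):
            yield line.rstrip("\n")
    finally:
        for f in files:
            f.close()
        for r in runs:
            os.remove(r)

def build_dictionary(triples_tsv, dict_tsv, tmpdir):
    """Pass 1: stream every term through the external sort,
    deduplicate, and assign consecutive IDs -- all on disk."""
    def terms():
        with open(triples_tsv, encoding="utf-8") as f:
            for line in f:
                for term in line.rstrip("\n").split("\t"):
                    yield term
    prev, term_id = None, 0
    with open(dict_tsv, "w", encoding="utf-8") as out:
        for term in external_sort(terms(), tmpdir):
            if term != prev:
                term_id += 1
                out.write(f"{term}\t{term_id}\n")
                prev = term

def encode_column(triples_tsv, dict_tsv, col, out_tsv, tmpdir):
    """Pass 2 (run once per column): sort the triples by column `col`,
    then sort-merge join against the dictionary to swap term -> ID."""
    def keyed():
        with open(triples_tsv, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip("\n").split("\t")
                yield "\t".join([parts[col]] + parts)
    with open(dict_tsv, encoding="utf-8") as dic, \
         open(out_tsv, "w", encoding="utf-8") as out:
        cur_term, cur_id = None, None
        for line in external_sort(keyed(), tmpdir):
            key, *parts = line.split("\t")
            while cur_term is None or cur_term < key:
                cur_term, cur_id = dic.readline().rstrip("\n").split("\t")
            parts[col] = cur_id  # both streams are sorted, so this is a plain merge
            out.write("\t".join(parts) + "\n")

Calling build_dictionary once and then encode_column for col = 0, 1 and 2 leaves you with triples of pure integer IDs, which is the kind of input HDT's triples component is then built from; peak RAM stays bounded by CHUNK rather than by the size of the dataset.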
------ Original Message ------
From: "Ettore RIZZA" ettorerizza@gmail.com
To: "Discussion list for the Wikidata project." wikidata@lists.wikimedia.org
Sent: 10/1/2018 5:03:59 PM
Subject: Re: [Wikidata] Wikidata HDT dump
what computer did you use for this? IIRC it required >512GB of RAM to function.
Hello Laura,
Sorry for my confusing message; I am not a member of the HDT team at all. But according to its creator (https://twitter.com/ciutti/status/1046849607114936320), 100 GB "with an optimized code" could be enough to produce an HDT like this one.
On Mon, 1 Oct 2018 at 18:59, Laura Morales lauretas@mail.com wrote:
a new dump of Wikidata in HDT (with index) is available [http://www.rdfhdt.org/datasets/].
Thank you very much! Keep it up! Out of curiosity, what computer did you use for this? IIRC it required >512GB of RAM to function.
You will see how Wikidata has become huge compared to other datasets. It contains about twice the limit of 4B triples discussed above.
There is a 64-bit version of HDT that doesn't have this limitation of 4B triples.
In this regard, what is the most user-friendly way to use this format in 2018?
Speaking for myself at least, Fuseki with an HDT store. But I know there are also some CLI tools from the HDT folks, which do simple triple-pattern lookups directly on the .hdt file without running a server.
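For example, once the .hdt file is mounted behind Fuseki, anything that speaks the standard SPARQL 1.1 protocol can query it over HTTP. A minimal Python sketch follows; the endpoint URL is an assumption (a local Fuseki instance with the dataset mounted at /wikidata), so adjust it to your setup.

import json
import urllib.parse
import urllib.request

# Hypothetical local Fuseki service backed by the HDT file.
ENDPOINT = "http://localhost:3030/wikidata/sparql"

QUERY = """
SELECT ?p ?o WHERE {
  <http://www.wikidata.org/entity/Q42> ?p ?o .
} LIMIT 10
"""

def run_query(endpoint, query):
    # Standard SPARQL 1.1 protocol: POST the query form-encoded,
    # ask for JSON results.
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    req = urllib.request.Request(
        endpoint,
        data=data,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    results = run_query(ENDPOINT, QUERY)
    for binding in results["results"]["bindings"]:
        print(binding["p"]["value"], binding["o"]["value"])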