Dear all,
>> It also seems that part of your post answers the question from my previous email. That sounds as if it is pretty hard to create HDT exports (not much surprise there). Maybe it would be nice to at least reuse the work: could we re-publish your HDT dumps after you have created them?
> Yes, sure, here they are: http://wikidataldf.com/download/
> I should add: yes, it is pretty hard to create the HDT file, since the process requires an awful lot of RAM, and I don't know whether I will be able to keep producing them in the future.
Maybe some nuance: creating HDT exports is not *that* hard.
First, on a technical level, it's simply: rdf2hdt -f turtle triples.ttl triples.hdt. So that's not really difficult ;-)
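For example, here's a minimal sketch of the whole conversion, assuming the hdt-cpp command-line tools are installed and a Turtle dump is at hand (the file names below are just placeholders):

    # convert the Turtle dump into a single self-contained HDT file
    rdf2hdt -f turtle triples.ttl triples.hdt

    # optionally inspect the result; hdtSearch opens an interactive prompt
    # where you can type triple patterns such as "? ? ?" to browse the data
    hdtSearch triples.hdt

(hdtSearch ships with the same hdt-cpp distribution; if your build doesn't include it, the rdf2hdt step alone is all that matters.)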
Second, concerning machine resources: for datasets with millions of triples, you can easily do it on any machine. It doesn't take that much RAM, and certainly not that much disk space. When you have hundreds of millions of triples, as is the case with Wikidata/DBpedia/…, having a significant amount of RAM does indeed help a lot. The people working on HDT will surely reduce that requirement in the future.
We should really see HDT generation as a one-time server effort that significantly reduces future server effort.
Best,
Ruben
PS If anybody has trouble generating an HDT file, feel free to send me a link to your dump and I'll do it for you.