On 31.10.2014 14:51, Cristian Consonni wrote:
2014-10-30 22:40 GMT+01:00 Cristian Consonni kikkocristian@gmail.com:
Ok, now I have managed to add the Wikidata statements dump too.
And I have added a wikidata.hdt combined dump of all of the above.
Nice. We are running the RDF generation on a shared cloud environment, and I am not sure we can really use a lot of RAM there. Do you have a rough idea of how much RAM you needed to get this done?
2014-10-31 10:25 GMT+01:00 Ruben Verborgh ruben.verborgh@ugent.be:
Maybe some nuance: creating HDT exports is not *that* hard.
First, on a technical level, it's simply:

    rdf2hdt -f turtle triples.ttl triples.hdt

so that's not really difficult ;-)
Yes, I agree. I mean, I am not an expert in the field - this should be clear by now :P - and I was still able to do it. (By "not an expert in the field" I mean that I had never heard of HDT or LDF until six days ago.)
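For reference, the whole round trip looks roughly like the sketch below. File names are placeholders, the tools are the ones shipped with hdt-cpp, and the -i flag (to also build the index used for querying) may differ between versions, so treat this as a sketch rather than exact invocations:

    # convert a Turtle dump to HDT and build the query index alongside it
    rdf2hdt -f turtle -i triples.ttl triples.hdt
    # quick sanity check: hdtSearch opens an interactive prompt where the
    # triple pattern "? ? ?" enumerates all stored triples
    hdtSearch triples.hdt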
It should be noted that in converting the statements and terms dumps I got some "Unicode range" errors, which resulted in some triples being ignored (i.e. not inserted into the HDT files). I cannot tell whether this is a problem with the dumps or with hdt-lib.
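To rule out the input side, a rough first check could be something like the following (this is only a sketch, not what hdt-lib does internally; the assumption is that "Unicode range" means raw control characters or byte sequences that are not valid UTF-8, and triples.ttl again stands in for the actual dump file):

    # look for raw control characters other than tab/LF/CR in the dump
    grep -anP '[\x00-\x08\x0B\x0C\x0E-\x1F]' triples.ttl | head
    # verify that the file as a whole is valid UTF-8
    iconv -f UTF-8 -t UTF-8 triples.ttl > /dev/null && echo "valid UTF-8"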
The OpenRDF library we use for creating the dumps has fairly thorough range checks for every single character it exports (judging from the code I have seen), so my default assumption would be that it does the right thing. However, it is also true that Wikidata contains some very exotic Unicode characters in its data. ;-)
Markus