Hello, 

a new dump of Wikidata in HDT (with index) is available. It shows how huge Wikidata has become compared to other datasets: it contains roughly twice the 4-billion-triple limit discussed earlier in this thread.

In this regard, what is, as of 2018, the most user-friendly way to work with this format?
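
For instance, loading and querying the file with the pyHDT Python bindings ("pip install hdt") might look like the sketch below. This is only an assumption on my side: the file name is illustrative, and I have not checked whether the bindings cope with a file of this size.

# Minimal sketch, assuming the pyHDT bindings and that wikidata.hdt
# plus its side-car index file sit in the working directory
# (file names are illustrative).
from hdt import HDTDocument

doc = HDTDocument("wikidata.hdt")
print("total triples:", doc.total_triples)

# Empty strings act as wildcards; search_triples returns an iterator
# over matching triples plus the estimated cardinality of the pattern.
triples, cardinality = doc.search_triples(
    "http://www.wikidata.org/entity/Q42", "", "")
print("triples with Q42 as subject:", cardinality)
for s, p, o in triples:
    print(s, p, o)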

BR,

Ettore

On Tue, 7 Nov 2017 at 15:33, Ghislain ATEMEZING <ghislain.atemezing@gmail.com> wrote:

Hi Jeremie,

Thanks for this info.

In the meantime, what about making chunks of 3.5 billion triples (or any size below 4 billion) and a script to convert the dataset? Would that be possible?
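
Something along the lines of the rough sketch below, perhaps, assuming an N-Triples dump as input and the rdf2hdt tool from hdt-cpp on the PATH (chunk size and file names are illustrative):

# Rough sketch: split a large N-Triples dump into chunks below the
# 4 billion triple limit, then convert each chunk with rdf2hdt.
import subprocess

CHUNK_SIZE = 3_500_000_000  # triples per chunk, below the 4 billion limit

def split_and_convert(nt_path):
    chunk_id, count = 0, 0
    out = open(f"chunk-{chunk_id}.nt", "w", encoding="utf-8")
    with open(nt_path, encoding="utf-8") as dump:
        for line in dump:  # one triple per line in N-Triples
            out.write(line)
            count += 1
            if count == CHUNK_SIZE:
                out.close()
                subprocess.run(["rdf2hdt", f"chunk-{chunk_id}.nt",
                                f"chunk-{chunk_id}.hdt"], check=True)
                chunk_id, count = chunk_id + 1, 0
                out = open(f"chunk-{chunk_id}.nt", "w", encoding="utf-8")
    out.close()
    if count:  # convert the last, partially filled chunk
        subprocess.run(["rdf2hdt", f"chunk-{chunk_id}.nt",
                        f"chunk-{chunk_id}.hdt"], check=True)

split_and_convert("wikidata.nt")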

 

Best,

Ghislain

 

Sent from Mail for Windows 10

 

From: Jérémie Roquet
Sent: Tuesday, 7 November 2017 15:25
To: Discussion list for the Wikidata project.
Subject: Re: [Wikidata] Wikidata HDT dump

 

Hi everyone,

 

I'm afraid the current implementation of HDT is not ready to handle
more than 4 billion triples, as it is limited to 32-bit indexes. I've
opened an issue upstream: https://github.com/rdfhdt/hdt-cpp/issues/135

Until this is addressed, don't waste your time trying to convert the
entire Wikidata to HDT: it can't work.

 

--

Jérémie

 


_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata