You shouldn't have to keep anything in RAM to HDT-ize a dataset: you can build the dictionary with an on-disk (external) sort, and you can do the joins that look every term up against the dictionary by sorting as well.
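To make that concrete, here is a rough Python sketch of the idea. It assumes the input is a tab-separated file of plain terms (no tabs or newlines inside terms), and it only illustrates the external-sort / sort-merge approach; it is not HDT's actual dictionary, which splits terms into shared/subject/predicate/object sections and front-codes the strings.

import heapq
import os
import tempfile

CHUNK = 1_000_000  # terms kept in memory per sorted run (tune to taste)

def _dump_run(buf, tmpdir):
    """Write one sorted run to a temporary file and return its path."""
    buf.sort()
    fd, path = tempfile.mkstemp(dir=tmpdir, text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.writelines(s + "\n" for s in buf)
    return path

def external_sort(items, tmpdir):
    """Sort an iterable of strings with bounded memory:
    sorted runs on disk, then a k-way merge via heapq.merge."""
    runs, buf = [], []
    for item in items:
        buf.append(item)
        if len(buf) >= CHUNK:
            runs.append(_dump_run(buf, tmpdir))
            buf = []
    if buf:
        runs.append(_dump_run(buf, tmpdir))
    files = [open(r, encoding="utf-8") for r in runs]
    try:
        for line in heapq.merge(*files):
            yield line.rstrip("\n")
    finally:
        for f in files:
            f.close()
        for r in runs:
            os.remove(r)

def build_dictionary(triples_tsv, dict_tsv, tmpdir):
    """Pass 1: stream every term through the external sort,
    deduplicate, and assign consecutive IDs -- all on disk."""
    def terms():
        with open(triples_tsv, encoding="utf-8") as f:
            for line in f:
                for term in line.rstrip("\n").split("\t"):
                    yield term
    prev, term_id = None, 0
    with open(dict_tsv, "w", encoding="utf-8") as out:
        for term in external_sort(terms(), tmpdir):
            if term != prev:
                term_id += 1
                out.write(f"{term}\t{term_id}\n")
                prev = term

def encode_column(triples_tsv, dict_tsv, col, out_tsv, tmpdir):
    """Pass 2 (run once per column): sort the triples by column `col`,
    then sort-merge join against the dictionary to swap term -> ID."""
    def keyed():
        with open(triples_tsv, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip("\n").split("\t")
                yield "\t".join([parts[col]] + parts)
    with open(dict_tsv, encoding="utf-8") as dic, \
         open(out_tsv, "w", encoding="utf-8") as out:
        cur_term, cur_id = None, None
        for line in external_sort(keyed(), tmpdir):
            key, *parts = line.split("\t")
            while cur_term is None or cur_term < key:
                cur_term, cur_id = dic.readline().rstrip("\n").split("\t")
            parts[col] = cur_id  # both streams are sorted, so this is a plain merge
            out.write("\t".join(parts) + "\n")

Calling build_dictionary once and then encode_column for col = 0, 1 and 2 leaves you with triples of pure integer IDs, which is the kind of input HDT's triples component is then built from; peak RAM stays bounded by CHUNK rather than by the size of the dataset.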
------ Original Message ------
From: "Ettore RIZZA" ettorerizza@gmail.com
To: "Discussion list for the Wikidata project." wikidata@lists.wikimedia.org
Sent: 10/1/2018 5:03:59 PM
Subject: Re: [Wikidata] Wikidata HDT dump
what computer did you use for this? IIRC it required >512GB of RAM to function.
Hello Laura,
Sorry for my confusing message; I am not a member of the HDT team at all. But according to its creator (https://twitter.com/ciutti/status/1046849607114936320), 100 GB "with an optimized code" could be enough to produce an HDT like this one.
On Mon, 1 Oct 2018 at 18:59, Laura Morales lauretas@mail.com wrote:
a new dump of Wikidata in HDT (with index) is available [http://www.rdfhdt.org/datasets/].
Thank you very much! Keep it up! Out of curiosity, what computer did you use for this? IIRC it required >512GB of RAM to function.
You will see how Wikidata has become huge compared to other datasets. It contains about twice the limit of 4B triples discussed above.
There is a 64-bit version of HDT that doesn't have this limitation of 4B triples.
In this regard, what is the most user-friendly way to use this format in 2018?
Speaking for myself at least, Fuseki with an HDT store. But I know there are also some CLI tools from the HDT folks, which do simple triple-pattern lookups directly on the .hdt file without running a server.
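For example, once the .hdt file is mounted behind Fuseki, anything that speaks the standard SPARQL 1.1 protocol can query it over HTTP. A minimal Python sketch follows; the endpoint URL is an assumption (a local Fuseki instance with the dataset mounted at /wikidata), so adjust it to your setup.

import json
import urllib.parse
import urllib.request

# Hypothetical local Fuseki service backed by the HDT file.
ENDPOINT = "http://localhost:3030/wikidata/sparql"

QUERY = """
SELECT ?p ?o WHERE {
  <http://www.wikidata.org/entity/Q42> ?p ?o .
} LIMIT 10
"""

def run_query(endpoint, query):
    # Standard SPARQL 1.1 protocol: POST the query form-encoded,
    # ask for JSON results.
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    req = urllib.request.Request(
        endpoint,
        data=data,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    results = run_query(ENDPOINT, QUERY)
    for binding in results["results"]["bindings"]:
        print(binding["p"]["value"], binding["o"]["value"])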