Re: [Wikidata] Wikidata HDT dump

2 Nov 2017


Laura Morales wrote on 2 Nov 2017 at 15:54:
...
...
>> The tool is in the hdt-jena package (not hdt-java-cli where the other
>> command line tools reside), since it uses parts of Jena (e.g. ARQ).
>> ...
>> There is a wrapper script called hdtsparql.sh for executing it with the
>> proper Java environment.
>
> Does this tool work nicely with large HDT files such as Wikidata? Or does
> it need to load the whole graph+index into memory?

I haven't tested it with huge datasets like Wikidata. But for the
moderately sized (40M triples) data that I use it for, it runs pretty
fast and without using lots of memory, so I think it just memory-maps
the HDT file and its index file and reads only what it needs to answer
the query.
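
To illustrate what I mean by memory mapping, here is a minimal sketch using
plain java.nio. It is not the actual hdt-java code, and the class name
MmapSketch and the helpers writeSample and readEntry are made up for this
example. It writes a small binary file, maps it read-only, and fetches a
single entry; with a mapped file the OS only pages in the regions that are
actually touched, which would explain the low memory use.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {
    // Hypothetical helper: write the longs 0..count-1 to a file.
    // This stands in for a large on-disk index; it is not the HDT format.
    static void writeSample(Path file, int count) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(count * Long.BYTES);
        for (long i = 0; i < count; i++) {
            buf.putLong(i);
        }
        buf.flip();
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.WRITE,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        }
    }

    // Hypothetical helper: map the whole file read-only and read one entry.
    // Nothing is copied into the Java heap up front; only the page holding
    // the requested entry needs to be brought in by the OS.
    static long readEntry(Path file, int index) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map =
                    ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return map.getLong(index * Long.BYTES);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-sketch", ".bin");
        try {
            writeSample(file, 1000);
            System.out.println(readEntry(file, 42)); // prints 42
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Whether hdt-jena really works this way for a file the size of Wikidata is
something I have not verified; the sketch only shows the general technique.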
-Osma
-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi
