Hi,
I have downloaded the precompiled Blazegraph from [1]. I have also applied the optimizations described at [2].
For the loading process I'm following the instructions in the "getting-started.md" file that comes in the "docs" folder of the compiled distribution [1]. That is:
1. Munge the data: ./munge.sh -f data/wikidata-20150427-all-BETA.ttl.gz -d data/split -l en -s
2. Start the load: ./loadRestAPI.sh -n wdq -d `pwd`/data/split
The loading process starts at a rate of 84,352, but the rate has progressively dropped to 3,362 after 36 hours of loading.
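For a sense of scale, the slowdown those two figures imply can be worked out directly (the numbers are the ones quoted above):

```shell
# Slowdown factor between the initial and the current load rate,
# using the figures from this post (84352 and 3362).
initial=84352
current=3362
echo "$((initial / current))x slower"   # integer division: 25x
```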
I'm running the process on an HPC node with an SSD, and I've allocated 3 cores and 120 GB of RAM to the loading job. However, average processor usage never goes above 1.6 cores, and maximum RAM usage is 14 GB.
I also saw [3], and I'm running the load natively (without containers). The one difference from [3] is that I've reduced the JVM heap to 4 GB, as [2] suggests.
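As a concrete sketch of that heap change (this assumes your runBlazegraph.sh honours a HEAP_SIZE environment variable, as recent WDQS distributions do; verify against your copy of the script):

```shell
# Hypothetical sketch: set a 4 GB JVM heap for Blazegraph via the
# HEAP_SIZE variable that the WDQS runBlazegraph.sh script reads.
# Check the variable name in your distribution before relying on it.
HEAP_SIZE=4g ./runBlazegraph.sh
```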
What else could I do to improve the loading performance?
Thanks,
Leandro
[1] http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.wikidata.query.rdf%22...
[2] https://github.com/blazegraph/database/wiki/IOOptimization
[3] https://addshore.com/2019/10/your-own-wikidata-query-service-with-no-limits-...
It's generally advised to reply to the replies to your original mailing-list post, rather than creating a very similar post a few days later...
On Thu, 11 Jun 2020 at 13:33, Leandro Tabares Martín < leandro.tabaresmartin@uhasselt.be> wrote:
[quoted original post]
_______________________________________________
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
Hi,
Please find attached a picture of Blazegraph's performance during the load of the Wikidata dataset. This is after increasing the resources assigned to the job to 24 cores and 240 GB of RAM. Do you think this is normal behaviour?
Thanks,
Leandro
On Thu, Jun 11, 2020 at 2:33 PM Leandro Tabares Martín < leandro.tabaresmartin@uhasselt.be> wrote:
[quoted original post]