Hi,
I have downloaded the precompiled Blazegraph from [1]. I have also applied the optimizations described at [2].
For the loading process I'm following the instructions in the "getting-started.md" file that comes in the "docs" folder of the compiled distribution [1]. That is:
1. Munge the data: ./munge.sh -f data/wikidata-20150427-all-BETA.ttl.gz -d data/split -l en -s
2. Start the load: ./loadRestAPI.sh -n wdq -d `pwd`/data/split
The loading process starts at a rate of 84,352, but the rate has progressively dropped to 3,362 after 36 hours of loading.
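For a sense of scale, the slowdown those two figures imply can be worked out directly (the numbers are the ones quoted above):

```shell
# Slowdown factor between the initial and the current load rate,
# using the figures from this post (84352 and 3362).
initial=84352
current=3362
echo "$((initial / current))x slower"   # integer division: 25x
```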
I'm running the process on an HPC node with an SSD, and I've allocated 3 cores and 120 GB of RAM to the loading job. However, average processor usage never goes above 1.6 cores, and maximum RAM usage is 14 GB.
I also saw [3], and I'm running the load natively (without containers). The one difference from [3] is that I've reduced the JVM heap to 4 GB, as [2] suggests.
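As a concrete sketch of that heap change (this assumes your runBlazegraph.sh honours a HEAP_SIZE environment variable, as recent WDQS distributions do; verify against your copy of the script):

```shell
# Hypothetical sketch: set a 4 GB JVM heap for Blazegraph via the
# HEAP_SIZE variable that the WDQS runBlazegraph.sh script reads.
# Check the variable name in your distribution before relying on it.
HEAP_SIZE=4g ./runBlazegraph.sh
```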
What else could I do to improve the loading performance?
Thanks,
Leandro
[1] http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.wikidata.query.rdf%22...
[2] https://github.com/blazegraph/database/wiki/IOOptimization
[3] https://addshore.com/2019/10/your-own-wikidata-query-service-with-no-limits-...
It's generally advised to reply to the replies to your original mailing-list post, rather than creating a very similar post a few days later...
On Thu, 11 Jun 2020 at 13:33, Leandro Tabares Martín < leandro.tabaresmartin@uhasselt.be> wrote:
[quoted original post]
_______________________________________________
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
Hi,
Please find attached a picture of Blazegraph's performance during the load of the Wikidata dataset. This is after increasing the resources assigned to the job to 24 cores and 240 GB of RAM. Do you think this is normal behaviour?
Thanks,
Leandro
On Thu, Jun 11, 2020 at 2:33 PM Leandro Tabares Martín < leandro.tabaresmartin@uhasselt.be> wrote:
[quoted original post]