Dear all,
I'm loading the whole Wikidata dataset into Blazegraph on a high-performance computing cluster. I gave the job 120 GB of RAM and 3 processing cores. After almost 24 hours of loading, the "wikidata.jnl" file is only 28 GB in size. Initially the process was fast, but the loading speed has decreased as the file has grown. I have also noticed that only 14 GB of RAM are being used. I have already implemented the recommendations given in https://github.com/blazegraph/database/wiki/IOOptimization. Do you have any other recommendations to increase the loading speed?
Leandro
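
For anyone hitting the same wall, here is a minimal sketch of the kind of journal-properties and JVM tuning that usually comes up for Blazegraph bulk loads. The property names are Blazegraph's own, but the concrete values, the file path, and how the properties file gets wired into the launch script are assumptions for illustration, not recommendations taken from the IOOptimization page:

# Bulk-load-oriented journal settings (illustrative values only; some of
# these are fixed when the namespace/journal is first created, so set them
# before loading starts).
cat >> RWStore.properties <<'EOF'
com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
com.bigdata.btree.writeRetentionQueue.capacity=4000
com.bigdata.rdf.sail.bufferCapacity=100000
# Skipping the free-text index during bulk load cuts write volume.
com.bigdata.rdf.store.AbstractTripleStore.textIndex=false
EOF

# Blazegraph does most of its I/O through the OS page cache rather than the
# Java heap, so a moderate heap is normal even on a 120 GB machine; that is
# likely why only ~14 GB shows up as "used" while the rest of the RAM still
# helps as file-system cache.
java -server -Xmx16g -XX:+UseG1GC -jar blazegraph.jar
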
Did you see this?
https://addshore.com/2019/10/your-own-wikidata-query-service-with-no-limits-...
Hi,
did you "munge"[0] the dumps prior to loading them? As a comparison, loading the munged dump on a WMF production machine (128G, 32cores, SSD drives) takes around 8days.
0: https://wikitech.wikimedia.org/wiki/Wikidata_query_service#Data_preparation
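
For completeness, this is roughly the flow described on that wikitech page and in the addshore post linked above; the input file name, output directory, and namespace below are assumptions for illustration, and the flags are the ones those write-ups show, so double-check them against the scripts themselves:

# 1. "Munge" the raw dump: rewrite/strip data the query service does not
#    need and split the dump into smaller chunk files.
./munge.sh -f data/latest-all.ttl.gz -d data/split

# 2. Bulk-load the munged chunks into a running Blazegraph instance
#    (the namespace name "wdq" is just an example).
./loadRestAPI.sh -n wdq -d "$(pwd)/data/split"
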
On Thu, Jun 11, 2020 at 12:37 AM Denny Vrandečić vrandecic@gmail.com wrote:
> Did you see this?
> https://addshore.com/2019/10/your-own-wikidata-query-service-with-no-limits-...
On Thu, Jun 11, 2020 at 11:13 AM David Causse dcausse@wikimedia.org wrote:
> Did you "munge"[0] the dumps prior to loading them?
munge.sh can be found at https://github.com/wikimedia/wikidata-query-deploy/blob/master/munge.sh
The source is available at https://github.com/wikimedia/wikidata-query-rdf/blob/master/tools/src/main/j...
On Thu, Jun 11, 2020 at 12:37 AM Denny Vrandečić vrandecic@gmail.com wrote:
> https://addshore.com/2019/10/your-own-wikidata-query-service-with-no-limits-...
I read "Total time: ~5.5 days" there, which is really impressive. My latest attempt at loading latest-lexemes.nt (10 GB uncompressed) into my own triple store took a day, and the result requires 10 GB of disk space. I have made progress, but I am still far from being able to compete with Blazegraph on that front. I have an idea for some optimizations to make, and munge.sh will help.