On Thu, Jun 11, 2020 at 11:13 AM, David Causse <dcausse(a)wikimedia.org> wrote:
Hi,
Did you "munge"[0] the dumps prior to loading them?
As a comparison, loading the munged dump on a WMF production machine (128 GB
RAM, 32 cores, SSD drives) takes around 8 days.
[0]: https://wikitech.wikimedia.org/wiki/Wikidata_query_service#Data_preparation
munge.sh can be found at
https://github.com/wikimedia/wikidata-query-deploy/blob/master/munge.sh
The source is available at
https://github.com/wikimedia/wikidata-query-rdf/blob/master/tools/src/main/…
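If it helps, the usual invocation looks something like this (the output
directory is just an example, and there are optional flags such as -l/-s to
limit label languages and skip sitelinks; the wikitech page above documents
them):

    ./munge.sh -f data/latest-all.ttl.gz -d data/split

That splits the dump into a series of gzipped chunk files that can then be
loaded one by one.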
On Thu, Jun 11, 2020 at 12:37 AM Denny Vrandečić <vrandecic(a)gmail.com> wrote:
>
> Did you see this?
>
>
> https://addshore.com/2019/10/your-own-wikidata-query-service-with-no-limits…
I read "Total time: ~5.5 days"; that is really impressive. My latest
attempt at loading latest-lexemes.nt (10 GB uncompressed) into my
triple store took me 1 day and required 10 GB of disk space. I have made
progress but am still far from being able to compete with Blazegraph on
that front. I have some optimizations in mind; munge.sh
will help.
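For comparison, the Blazegraph workflow from that blog post loads the munged
chunks with loadData.sh, roughly like this (wdq is the namespace used there;
adjust the path to wherever the munged chunks ended up):

    ./loadData.sh -n wdq -d `pwd`/data/split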
>
> On Wed, Jun 10, 2020, 12:51 Leandro Tabares Martín <leandro.tabaresmartin(a)uhasselt.be> wrote:
>>
>> Dear all,
>>
>> I'm loading the whole Wikidata dataset into Blazegraph using a High
>> Performance Computer. I gave 120 GB of RAM and 3 processing cores to the
>> job. After almost 24 hours of loading, the "wikidata.jnl" file is only
>> 28 GB in size. Initially the process was fast, but the loading speed has
>> decreased as the file has grown. I notice that only 14 GB of RAM are
>> being used. I have already implemented the recommendations given in
>> https://github.com/blazegraph/database/wiki/IOOptimization
>> Do you have any other recommendations to increase the loading speed?
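Regarding the mostly-unused RAM: as far as I know the bulk loader does not
benefit much from a huge JVM heap; most of the machine's memory is better
left to the OS page cache for the journal file. The settings discussed on the
IOOptimization page go into RWStore.properties; the property names below are
standard Blazegraph ones, but the values are only illustrative, not tuned
recommendations:

    com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
    com.bigdata.journal.AbstractJournal.writeCacheBufferCount=1000
    com.bigdata.btree.writeRetentionQueue.capacity=4000
    com.bigdata.rdf.sail.bufferCapacity=100000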