On 6/20/19 12:48 PM, hellmann@informatik.uni-leipzig.de wrote:
Hi Adam,
the server specs you posted are not so important. What disks did you use?

They should be SSDs or 15k RPM SAS drives to make loading faster.

Virtuoso can parse in multiple threads if you split the files before loading, but HDD speed is still the bottleneck.
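The splitting step can be sketched roughly as follows. This is a minimal, hypothetical Python sketch (file names and chunk size are illustrative, not from this thread): it assumes the dump keeps its @prefix declarations at the top of the file and that every Turtle statement ends on a line terminated by ".". Each chunk gets a copy of the prefixes so the chunks can be loaded independently.

```python
# Hypothetical sketch: split a large Turtle dump into independently
# loadable chunks. Assumes prefix declarations at the top and that each
# statement ends on a line whose stripped form ends in ".".

def split_turtle(path, out_prefix, statements_per_chunk=1_000_000):
    prefixes = []
    chunk, count, chunk_no = [], 0, 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("@prefix") or line.startswith("PREFIX"):
                prefixes.append(line)
                continue
            chunk.append(line)
            if line.rstrip().endswith("."):
                count += 1
                if count >= statements_per_chunk:
                    _write_chunk(out_prefix, chunk_no, prefixes, chunk)
                    chunk, count = [], 0
                    chunk_no += 1
    if chunk:
        _write_chunk(out_prefix, chunk_no, prefixes, chunk)
        chunk_no += 1
    return chunk_no  # number of chunk files written

def _write_chunk(out_prefix, n, prefixes, lines):
    # Replicate the prefix block so each chunk parses on its own.
    with open(f"{out_prefix}{n:05d}.ttl", "w", encoding="utf-8") as out:
        out.writelines(prefixes)
        out.writelines(lines)
```

The resulting chunk files can then be registered with Virtuoso's bulk loader (ld_dir) and loaded by several concurrent rdf_loader_run() sessions.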

Sebastian


Yep!

And if the shared-nothing cluster edition is in use, you can run the bulk loaders in parallel across the nodes in the cluster, which reduces load time further.

We have cluster configurations where all of DBpedia was loaded in 15 minutes flat, and I don't mean via some massive cluster setup, just what we have behind our LOD Cloud cache instance :)
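Running several loader sessions in parallel can be sketched like this. A minimal Python sketch, assuming a running local Virtuoso instance; the isql port (1111) and dba/dba credentials are Virtuoso defaults, and the worker count is illustrative. Each rdf_loader_run() session claims pending files registered via ld_dir(), so N concurrent sessions load N files at a time.

```python
import subprocess

def run_parallel(command, n_workers):
    """Start n_workers copies of a shell command and wait for all of them.

    Returns the list of exit codes, one per worker."""
    procs = [subprocess.Popen(command, shell=True) for _ in range(n_workers)]
    return [p.wait() for p in procs]

# Intended use (requires a running Virtuoso instance; not executed here):
# run_parallel("isql 1111 dba dba exec='rdf_loader_run();'", 8)
# ...followed by a single: isql 1111 dba dba exec='checkpoint;'
```

A common rule of thumb is one loader session per core (or per few cores), since the bottleneck is usually disk rather than CPU, as Sebastian notes above.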


Kingsley


On June 20, 2019 2:37:16 PM GMT+02:00, Adam Sanchez <a.sanchez75@gmail.com> wrote:
For your information

a) It took 10.2 days to load the Wikidata RDF dump
(wikidata-20190513-all-BETA.ttl, 379G) in Blazegraph 2.1.5.
The bigdata.jnl file turned out to be 1.3T.

Server technical features

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
Stepping:              1
CPU MHz:               1200.476
CPU max MHz:           3000.0000
CPU min MHz:           1200.0000
BogoMIPS:              4197.65
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
RAM: 128G

b) It took 43 hours to load the Wikidata RDF dump
(wikidata-20190610-all-BETA.ttl, 383G) in the dev version of Virtuoso
07.20.3230.
I had to patch Virtuoso because it gave the following error each
time I loaded the RDF data:

09:58:06 PL LOG: File /backup/wikidata-20190610-all-BETA.ttl error
42000 TURTLE RDF loader, line 2984680: RDFGE: RDF box with a geometry
RDF type and a non-geometry content

The virtuoso.db file turned out to be 340G.
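For budgeting disk space, these figures imply very different on-disk expansion factors for the two stores. A quick back-of-the-envelope in Python, using the (approximate) sizes reported above:

```python
# Expansion ratios from the load reports in this thread (sizes rounded,
# and the two runs used dumps from different dates).
dump_blazegraph_g = 379   # wikidata-20190513-all-BETA.ttl
jnl_g = 1300              # bigdata.jnl after loading into Blazegraph
dump_virtuoso_g = 383     # wikidata-20190610-all-BETA.ttl
db_g = 340                # virtuoso.db after loading into Virtuoso

print(f"Blazegraph journal: {jnl_g / dump_blazegraph_g:.2f}x the dump size")
print(f"Virtuoso database:  {db_g / dump_virtuoso_g:.2f}x the dump size")
# Roughly 3.43x for Blazegraph versus 0.89x for Virtuoso.
```

So on these numbers Blazegraph needs well over three times the dump size on disk, while Virtuoso's store ends up slightly smaller than the Turtle input.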

Server technical features

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                12
On-line CPU(s) list:   0-11
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
Stepping:              2
CPU MHz:               1199.920
CPU max MHz:           3800.0000
CPU min MHz:           1200.0000
BogoMIPS:              6984.39
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              15360K
NUMA node0 CPU(s):     0-11
RAM: 128G

Best,


On Tue, Jun 4, 2019 at 4:37 PM, Vi to <vituzzu.wiki@gmail.com> wrote:
V4 has 8 cores instead of 6. But well, it's a server-grade config on purpose! Vito

On Tue, Jun 4, 2019 at 4:32 PM Guillaume Lederrey <glederrey@wikimedia.org> wrote:
On Tue, Jun 4, 2019 at 3:14 PM Vi to <vituzzu.wiki@gmail.com> wrote:
AFAIR it's a dual Xeon E5-2620 v3. With modern CPUs, frequency is not so significant.
Our latest batch of servers are: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz (so v4 instead of v3, but the difference is probably minimal).
Vito

On Tue, Jun 4, 2019 at 1:00 PM Adam Sanchez <a.sanchez75@gmail.com> wrote:
Thanks Guillaume! One more question: what is the CPU frequency (GHz)?

On Tue, Jun 4, 2019 at 12:25 PM, Guillaume Lederrey <glederrey@wikimedia.org> wrote:
On Tue, Jun 4, 2019 at 12:18 PM Adam Sanchez <a.sanchez75@gmail.com> wrote:
Hello, does somebody know the minimal hardware requirements (disk size and RAM) for loading the Wikidata dump into Blazegraph?
The actual hardware requirements will depend on your use case. But for comparison, our production servers are:
* 16 cores (hyperthreaded, 32 threads)
* 128G RAM
* 1.5T of SSD storage
The downloaded dump file wikidata-20190513-all-BETA.ttl is 379G. The bigdata.jnl file, which stores all the triple data in Blazegraph, is 478G and still growing. I had a 1T disk but it is almost full now.
The current size of our jnl file in production is ~670G. Hope that helps! Guillaume
Thanks, Adam
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
-- Guillaume Lederrey Engineering Manager, Search Platform Wikimedia Foundation UTC+2 / CEST

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Regards,

Kingsley Idehen	      
Founder & CEO 
OpenLink Software   
Home Page: http://www.openlinksw.com
Community Support: https://community.openlinksw.com
Weblogs (Blogs):
Company Blog: https://medium.com/openlink-software-blog
Virtuoso Blog: https://medium.com/virtuoso-blog
Data Access Drivers Blog: https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers

Personal Weblogs (Blogs):
Medium Blog: https://medium.com/@kidehen
Legacy Blogs: http://www.openlinksw.com/blog/~kidehen/
              http://kidehen.blogspot.com

Profile Pages:
Pinterest: https://www.pinterest.com/kidehen/
Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
Twitter: https://twitter.com/kidehen
Google+: https://plus.google.com/+KingsleyIdehen/about
LinkedIn: http://www.linkedin.com/in/kidehen

Web Identities (WebID):
Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
        : http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this