On Jun 20, 2019, at 08:37 AM, Adam Sanchez <a.sanchez75(a)gmail.com> wrote:
For your information
...
b) It took 43 hours to load the Wikidata RDF dump
(wikidata-20190610-all-BETA.ttl, 383G) in the dev version of Virtuoso
07.20.3230.
I had to patch Virtuoso because it gave the following error each
time I loaded the RDF data:
09:58:06 PL LOG: File /backup/wikidata-20190610-all-BETA.ttl error
42000 TURTLE RDF loader, line 2984680: RDFGE: RDF box with a geometry
RDF type and a non-geometry content
The resulting virtuoso.db file was 340G.
Server technical specifications:
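For scale, the load figures above work out to an average rate. A minimal Python sketch of the arithmetic, assuming "G" means GiB and that 43 hours is the wall-clock time for the whole load (neither is stated explicitly); the helper name load_stats is hypothetical:

```python
# Rough throughput figures for the Wikidata load described above.
# Assumption: "G" in the message is read as GiB, and 43 h is total wall-clock time.
DUMP_GIB = 383    # wikidata-20190610-all-BETA.ttl
DB_GIB = 340      # resulting virtuoso.db
LOAD_HOURS = 43


def load_stats(dump_gib: float, db_gib: float, hours: float) -> dict:
    """Return average load rate and on-disk size ratio (hypothetical helper)."""
    seconds = hours * 3600
    return {
        "gib_per_hour": dump_gib / hours,
        "mib_per_second": dump_gib * 1024 / seconds,
        "db_to_dump_ratio": db_gib / dump_gib,
    }


if __name__ == "__main__":
    for name, value in load_stats(DUMP_GIB, DB_GIB, LOAD_HOURS).items():
        print(f"{name}: {value:.2f}")
```

With these inputs it prints about 8.91 GiB of Turtle per hour, 2.53 MiB/s, and a database-to-dump size ratio of 0.89. Triples per second would be a more useful comparison, but the message does not give a triple count.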
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
Stepping: 2
CPU MHz: 1199.920
CPU max MHz: 3800.0000
CPU min MHz: 1200.0000
BogoMIPS: 6984.39
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 15360K
NUMA node0 CPU(s): 0-11
RAM: 128G
Best,
Hi, Adam --
We're quite interested in the time your Wikidata load took on
Virtuoso, as it seems rather slow, given our experience with
other large (and much larger!) data sets.
The hardware information you provided focused primarily on the
processors, but RAM and disk details matter much more for data
loads.
Also, several Virtuoso configuration settings in the INI file
have a significant impact on load performance.
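One of the settings with the largest effect on bulk loads is the buffer pool size in the [Parameters] section of virtuoso.ini (NumberOfBuffers and MaxDirtyBuffers). The sketch below applies a rule of thumb along the lines of the comments in the sample virtuoso.ini and Virtuoso's performance-tuning notes: give roughly two-thirds of free RAM to 8 KiB buffers, and set MaxDirtyBuffers to roughly three-quarters of NumberOfBuffers. The fractions and the helper name buffer_settings are assumptions for illustration; check them against the documentation for your Virtuoso version.

```python
# Hypothetical helper: estimate [Parameters] buffer settings for virtuoso.ini.
# Assumed rule of thumb (verify against the Virtuoso docs for your version):
#   NumberOfBuffers ~= 2/3 of free RAM, counted in 8 KiB pages
#   MaxDirtyBuffers ~= 3/4 of NumberOfBuffers
PAGE_BYTES = 8 * 1024


def buffer_settings(free_ram_gib: float,
                    ram_fraction: float = 2 / 3,
                    dirty_fraction: float = 3 / 4) -> dict:
    """Return suggested NumberOfBuffers / MaxDirtyBuffers for the given free RAM."""
    free_bytes = free_ram_gib * 1024 ** 3
    number_of_buffers = int(free_bytes * ram_fraction // PAGE_BYTES)
    max_dirty_buffers = int(number_of_buffers * dirty_fraction)
    return {"NumberOfBuffers": number_of_buffers,
            "MaxDirtyBuffers": max_dirty_buffers}


if __name__ == "__main__":
    # 128 GiB is the total RAM reported above; free RAM will be somewhat lower.
    print("[Parameters]")
    for key, value in buffer_settings(128).items():
        print(f"{key} = {value}")
```

For 128 GiB this suggests NumberOfBuffers = 11184810 and MaxDirtyBuffers = 8388607. In practice the input should be the memory actually available to Virtuoso, not total RAM, since the OS and other processes need their share.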
We'd like to get the info that would let us fill in the blanks
on this spreadsheet (itself a work in progress), so we can do
some analysis, and likely provide some tuning hints that would
bring the Virtuoso Wikidata load time down significantly.
https://docs.google.com/spreadsheets/d/1-stlTC_WJmMU3xA_NxA1tSLHw6_sbpjff-5…
The "Current" tab shows the settings used by some other
deployments, which may themselves suggest places where you could
improve things immediately.
Last, we would appreciate knowing exactly what you patched to
get around the geodata error, as there are a few open issues
along those lines that are also works in progress.
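To help pin down the record behind the geodata error, the position the loader reported (line 2984680) can be inspected without reading the whole 383G dump into memory. A minimal Python sketch; the helper name lines_around is hypothetical, and it assumes the loader counts lines from 1. Because a Turtle statement can span several lines, it prints a few lines of context on each side:

```python
# Hypothetical helper: show the lines around a line number reported by the
# Virtuoso TURTLE loader. The file is streamed, so memory use stays small.
import itertools
import sys


def lines_around(path: str, line_no: int, context: int = 2) -> list[tuple[int, str]]:
    """Return (line_number, text) pairs for line_no +/- context (1-based, inclusive)."""
    start = max(1, line_no - context)
    stop = line_no + context
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        # islice skips the first start-1 lines and stops after line `stop`.
        window = itertools.islice(fh, start - 1, stop)
        return [(start + i, text.rstrip("\n")) for i, text in enumerate(window)]


if __name__ == "__main__":
    # Usage: python show_line.py /backup/wikidata-20190610-all-BETA.ttl 2984680
    path, line_no = sys.argv[1], int(sys.argv[2])
    for number, text in lines_around(path, line_no):
        print(f"{number}: {text}")
```

Skipping nearly three million lines still means reading that part of the file sequentially, but it avoids holding any of it in memory.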
Thanks,
Ted
--
A: Yes.
http://www.idallen.com/topposting.html
| Q: Are you sure?
| | A: Because it reverses the logical flow of conversation.
| | | Q: Why is top posting frowned upon?
Ted Thibodeau, Jr.           //               voice +1-781-273-0900 x32
Senior Support & Evangelism  //        mailto:tthibodeau@openlinksw.com
                             //              http://twitter.com/TallTed
OpenLink Software, Inc.      //              http://www.openlinksw.com/
         20 Burlington Mall Road, Suite 322, Burlington MA 01803
     Weblog    -- http://www.openlinksw.com/blogs/
     Community -- https://community.openlinksw.com/
     LinkedIn  -- http://www.linkedin.com/company/openlink-software/
     Twitter   -- http://twitter.com/OpenLink
     Facebook  -- http://www.facebook.com/OpenLinkSoftware
Universal Data Access, Integration, and Management Technology Providers