Hi,
I was able to reduce the load time to approx. 9.1 hours (32890338 msec)
in Virtuoso 7.
I used 6 SSD disks of 1T each with RAID 0 (mdadm software RAID, I have
not tried with hardware RAID).
The virtuoso.ini for 256G RAM is
https://gist.github.com/asanchez75/58d5aed504051c7fbf9af0921c3c9130
I downloaded the dump from
https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.gz
on August 30th.
The dump is 387G uncompressed, and the resulting virtuoso.db file is
362G. The total number of triples is 9 470 700 617.
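
For anyone comparing runs, here is a minimal Python sketch of the arithmetic behind these figures. It only uses the load time and triple count quoted above; the function names are my own, and the rate is a whole-run average, so it says nothing about how the speed varied during the load.

```python
# Figures quoted in this message.
LOAD_TIME_MSEC = 32_890_338
TRIPLES = 9_470_700_617


def msec_to_hours(msec: int) -> float:
    """Convert a duration in milliseconds to hours."""
    return msec / 1000 / 3600


def triples_per_second(triples: int, msec: int) -> float:
    """Average load rate over the whole run, in triples per second."""
    return triples / (msec / 1000)


if __name__ == "__main__":
    print(f"load time: {msec_to_hours(LOAD_TIME_MSEC):.2f} h")
    print(f"average rate: {triples_per_second(TRIPLES, LOAD_TIME_MSEC):,.0f} triples/s")
```

The average comes out at roughly 288 thousand triples per second.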
Have a look at the simple patch here (it is just a workaround):
https://github.com/asanchez75/virtuoso-opensource/commit/5d7b1b9b29e53cb8a2…
You can create your own docker image with that patch using
https://github.com/asanchez75/docker-virtuoso/tree/brendan
Check the Dockerfile, which retrieves the patch from my fork of the
Virtuoso git repository:
https://github.com/asanchez75/docker-virtuoso/blob/brendan/Dockerfile
Best,
Great job!
I've granted you access via your email address so that you can update
the Google Spreadsheet containing configuration details for sample
Virtuoso instances [1]. You can put your data in the Wikidata worksheet [2].
Links:
[1]
On Sun, Sep 1, 2019 at 13:38, Edgar Meij <edgar.meij(a)gmail.com
<mailto:edgar.meij@gmail.com>> wrote:
Thanks for this, Kingsley.
Based on
https://docs.google.com/spreadsheets/d/1-stlTC_WJmMU3xA_NxA1tSLHw6_sbpjff-5…
(copy-pasted below), it seems that it takes 43 hours to load, is
that correct?
Also, what is the "patch for geometry" mentioned there? I'm
assuming that is the patch meant to address
https://github.com/openlink/virtuoso-opensource/issues/295 and
https://community.openlinksw.com/t/non-terrestrial-geo-literals/359,
correct? Is it simply disabling the data validation code? Can you
share the patch?
Thanks,
Edgar
Other Information

Architecture                      x86_64
CPU op-mode(s)                    32-bit, 64-bit
Byte Order                        Little Endian
CPU(s)                            12
On-line CPU(s) list               0-11
Thread(s) per core                2
Core(s) per socket                6
Socket(s)                         1
NUMA node(s)                      1
Vendor ID                         GenuineIntel
CPU family                        6
Model                             63
Model name                        Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
Stepping                          2
CPU MHz                           1,199.92
CPU max MHz                       3,800.00
CPU min MHz                       1,200.00
BogoMIPS                          6,984.39
Virtualization                    VT-x
L1d cache                         32K
L1i cache                         32K
L2 cache                          256K
L3 cache                          15360K
NUMA node0 CPU(s)                 0-11
RAM                               128G
wikidata-20190610-all-BETA.ttl    383G
Virtuoso version                  07.20.3230 (with patch for geometry)
Time to load                      43 hours
virtuoso.db                       340G
On Wed, Aug 14, 2019 at 12:10 AM Kingsley Idehen
<kidehen(a)openlinksw.com <mailto:kidehen@openlinksw.com>> wrote:
Hi Everyone,
A little FYI.
We have loaded Wikidata into a Virtuoso instance accessible
via SPARQL [1]. One benefit is helping to understand Wikidata
using our Faceted Browsing Interface for Entity Relationship
Types [2][3].
Links:
[1]
http://wikidata.demo.openlinksw.com/sparql -- SPARQL endpoint
[2]
http://wikidata.demo.openlinksw.com/fct -- Faceted
Browsing Interface
[3] About New York
<https://wikidata.demo.openlinksw.com/describe/?url=http%3A%2F%2Fwww.wikidata.org%2Fentity%2FQ60&gp=16&go=&lp=940&invfp=IFP_OFF&sas=SAME_AS_OFF&distinct=1>
Enjoy!
Feedback always welcome too :)
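
To try the endpoint from a script, here is a minimal Python sketch using only the standard library. The endpoint URL is the one in link [1] and Q60 is the entity from link [3]; the query itself and the helper names are illustrative assumptions, and it assumes the endpoint honours the standard SPARQL Protocol `query` parameter and a JSON `Accept` header.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://wikidata.demo.openlinksw.com/sparql"  # link [1] above

# Illustrative query (an assumption): a few statements about Q60.
QUERY = """
SELECT ?p ?o
WHERE { <http://www.wikidata.org/entity/Q60> ?p ?o }
LIMIT 5
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> list:
    """Send the query and return the list of result bindings."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(ENDPOINT, QUERY)` should return a list of bindings, each a dict keyed by `p` and `o`, provided the demo instance is reachable.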
--
Regards,
Kingsley Idehen
Founder & CEO
OpenLink Software
Home Page:
http://www.openlinksw.com
Community Support:
https://community.openlinksw.com
Weblogs (Blogs):
Company Blog:
https://medium.com/openlink-software-blog
Virtuoso Blog:
https://medium.com/virtuoso-blog
Data Access Drivers Blog:
https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers
Personal Weblogs (Blogs):
Medium Blog:
https://medium.com/@kidehen
Legacy Blogs:
http://www.openlinksw.com/blog/~kidehen/
http://kidehen.blogspot.com
Profile Pages:
Pinterest:
https://www.pinterest.com/kidehen/
Quora:
https://www.quora.com/profile/Kingsley-Uyi-Idehen
Twitter:
https://twitter.com/kidehen
Google+:
https://plus.google.com/+KingsleyIdehen/about
LinkedIn:
http://www.linkedin.com/in/kidehen
Web Identities (WebID):
Personal:
http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
:
http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
https://lists.wikimedia.org/mailman/listinfo/wikidata