On 9/2/19 3:51 PM, Adam Sanchez wrote:
Hi
I was able to reduce the load time to approximately 9.1 hours (32890338 msec) in Virtuoso 7. I used six 1 TB SSDs in RAID 0 (mdadm software RAID; I have not tried hardware RAID).

The virtuoso.ini for 256G RAM is at https://gist.github.com/asanchez75/58d5aed504051c7fbf9af0921c3c9130

I downloaded the dump from https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.gz on August 30th. The size is 387G uncompressed, and the resulting virtuoso.db file is 362G. The total number of triples is 9 470 700 617.

Have a look at the simple patch here (it is just a workaround): https://github.com/asanchez75/virtuoso-opensource/commit/5d7b1b9b29e53cb8a25...

You can create your own Docker image with that patch using https://github.com/asanchez75/docker-virtuoso/tree/brendan

Check the Dockerfile, which retrieves the patch from my forked Virtuoso git repository: https://github.com/asanchez75/docker-virtuoso/blob/brendan/Dockerfile
Best,
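
For anyone who wants to compare runs, the figures reported above can be turned into a load duration and an average throughput. The following is only a back-of-the-envelope sketch: the function and constant names are made up for illustration, and the 43-hour baseline is the earlier run quoted further down in this thread, which used different hardware and a different dump, so the ratio is not a like-for-like benchmark.

```python
# Figures as reported in this thread (not re-measured).
LOAD_TIME_MS = 32_890_338      # reported load time, in milliseconds
TRIPLES = 9_470_700_617        # reported total number of triples
EARLIER_LOAD_HOURS = 43        # earlier run quoted later in the thread


def load_stats(load_time_ms: int, triples: int) -> dict:
    """Convert a load time in ms into hours and average triples per second."""
    seconds = load_time_ms / 1000
    return {
        "hours": seconds / 3600,
        "triples_per_second": triples / seconds,
    }


stats = load_stats(LOAD_TIME_MS, TRIPLES)
# Ratio of the two reported load times; hardware and dumps differ,
# so treat this as indicative only.
ratio = EARLIER_LOAD_HOURS / stats["hours"]

print(f"load time:  {stats['hours']:.2f} h")
print(f"throughput: {stats['triples_per_second']:,.0f} triples/s")
print(f"ratio vs. earlier run: {ratio:.1f}x")
```

With the numbers above this comes out at a little over 9.1 hours and roughly 288,000 triples per second on average.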
Great job!
I've granted you access via your email address so that you can update the Google Spreadsheet containing configuration details for sample Virtuoso instances [1]. You can put your data in the Wikidata worksheet [2].
Links:
[1] https://docs.google.com/spreadsheets/d/1-stlTC_WJmMU3xA_NxA1tSLHw6_sbpjff-5O...
[2] https://docs.google.com/spreadsheets/d/1-stlTC_WJmMU3xA_NxA1tSLHw6_sbpjff-5O...
Kingsley
On Sun, Sep 1, 2019 at 13:38, Edgar Meij <edgar.meij@gmail.com> wrote:
Thanks for this, Kingsley.

Based on https://docs.google.com/spreadsheets/d/1-stlTC_WJmMU3xA_NxA1tSLHw6_sbpjff-5OITtrbFw/edit#gid=1799898600 (copy-pasted below), it seems that it takes 43 hours to load, is that correct?

Also, what is the "patch for geometry" mentioned there? I'm assuming that is the patch meant to address https://github.com/openlink/virtuoso-opensource/issues/295 and https://community.openlinksw.com/t/non-terrestrial-geo-literals/359, correct? Is it simply disabling the data validation code? Can you share the patch?

Thanks,
Edgar

Other Information
Architecture                     x86_64
CPU op-mode(s)                   32-bit, 64-bit
Byte Order                       Little Endian
CPU(s)                           12.00
On-line CPU(s) list              0-11
Thread(s) per core               2.00
Core(s) per socket               6.00
Socket(s)                        1.00
NUMA node(s)                     1.00
Vendor ID                        GenuineIntel
CPU family                       6.00
Model                            63.00
Model name                       Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
Stepping                         2.00
CPU MHz                          1,199.92
CPU max MHz                      3,800.00
CPU min MHz                      1,200.00
BogoMIPS                         6,984.39
Virtualization                   VT-x
L1d cache                        32K
L1i cache                        32K
L2 cache                         256K
L3 cache                         15360K
NUMA node0 CPU(s)                0-11
RAM                              128G
wikidata-20190610-all-BETA.ttl   383G
Virtuoso version                 07.20.3230 (with patch for geometry)
Time to load                     43 hours
virtuoso.db                      340G

On Wed, Aug 14, 2019 at 12:10 AM Kingsley Idehen <kidehen@openlinksw.com> wrote:

Hi Everyone,

A little FYI. We have loaded Wikidata into a Virtuoso instance accessible via SPARQL [1]. One benefit is helping to understand Wikidata using our Faceted Browsing Interface for Entity Relationship Types [2][3].

Links:
[1] http://wikidata.demo.openlinksw.com/sparql -- SPARQL endpoint
[2] http://wikidata.demo.openlinksw.com/fct -- Faceted Browsing Interface
[3] About New York <https://wikidata.demo.openlinksw.com/describe/?url=http%3A%2F%2Fwww.wikidata.org%2Fentity%2FQ60&gp=16&go=&lp=940&invfp=IFP_OFF&sas=SAME_AS_OFF&distinct=1>

Enjoy! Feedback always welcome too :)

--
Regards,
Kingsley Idehen
Founder & CEO
OpenLink Software
Home Page: http://www.openlinksw.com
Community Support: https://community.openlinksw.com
Weblogs (Blogs):
Company Blog: https://medium.com/openlink-software-blog
Virtuoso Blog: https://medium.com/virtuoso-blog
Data Access Drivers Blog: https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers
Personal Weblogs (Blogs):
Medium Blog: https://medium.com/@kidehen
Legacy Blogs: http://www.openlinksw.com/blog/~kidehen/
              http://kidehen.blogspot.com
Profile Pages:
Pinterest: https://www.pinterest.com/kidehen/
Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
Twitter: https://twitter.com/kidehen
Google+: https://plus.google.com/+KingsleyIdehen/about
LinkedIn: http://www.linkedin.com/in/kidehen
Web Identities (WebID):
Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
        : http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata