topic: removed semantic-web, guile, and added wikidata-tech.
Let's move the conversation to wikidata-tech@
Please remove wikidata@lists.wikimedia.org next time you reply.
On Sun, Dec 22, 2019 at 23:35, Ted Thibodeau Jr tthibodeau@openlinksw.com wrote:
On Dec 22, 2019, at 03:17 PM, Amirouche Boubekki amirouche.boubekki@gmail.com wrote:
Hello all ;-)
I ported the code to Chez Scheme to do an apples-to-apples comparison between GNU Guile and Chez, and took the time to run a few queries against the Virtuoso available in Ubuntu 18.04 (LTS).
Hi, Amirouche --
Kingsley's points about tuning Virtuoso to use available RAM [1] and other system resources are worth looking into, but a possibly more important first question is --
Exactly what version of Virtuoso are you testing?
If you followed the common script on Ubuntu 18.04, i.e., --
sudo apt update
sudo apt install virtuoso-opensource
-- then you likely have version 6.1.6 of VOS, the Open Source Edition of Virtuoso, which shipped 2012-08-02 [2], and is far behind the latest version of both VOS (v7.2.5+) and Enterprise Edition (v8.3+)!
The easiest way to confirm what you're running is to review the first "paragraph" of output from the command corresponding to the name of your Virtuoso binary --
virtuoso-t -?
$ virtuoso-t -?
Virtuoso Open Source Edition (multi threaded)
Version 6.1.6.3127-pthreads as of Feb 6 2018
virtuoso-iodbc-t -?
I do not have that command. I use isql-vt:
$ isql-vt --help
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
If I'm right, and you're running 6.x, you'll get much better test results just by running a current version of Virtuoso.
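The version check discussed above could be scripted along these lines. This is only a sketch: the helper names `parse_virtuoso_version` and `is_older_than` are hypothetical, the banner text is the one quoted in this thread, and the regular expression assumes the banner always contains the word "Version" followed by a dotted number.

```python
import re
from typing import Tuple


def parse_virtuoso_version(banner: str) -> Tuple[int, ...]:
    """Extract the dotted version number from a Virtuoso banner (assumed format)."""
    match = re.search(r"Version\s+(\d+(?:\.\d+)+)", banner)
    if match is None:
        raise ValueError("no version number found in banner")
    return tuple(int(part) for part in match.group(1).split("."))


def is_older_than(banner: str, minimum: Tuple[int, ...]) -> bool:
    """Compare only as many version components as `minimum` provides."""
    return parse_virtuoso_version(banner)[: len(minimum)] < minimum


# Banner as quoted in this thread.
banner = (
    "Virtuoso Open Source Edition (multi threaded) "
    "Version 6.1.6.3127-pthreads as of Feb 6 2018"
)

print(parse_virtuoso_version(banner))
print(is_older_than(banner, (7, 2, 5)))
```

Comparing tuples of integers rather than strings avoids ordering mistakes such as "10" sorting before "9".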
You can build VOS 7.2.6+ from source [3] (we'd recommend the develop/7 branch [4] for the absolute latest), or download a precompiled binary [5] of VOS 7.2.5.1 or 7.2.6.dev.
You can also try Enterprise Edition at no cost for 30 days [5].
Next round I will try the develop branch.
As I said previously, these benchmarks must be taken with a grain of salt:
First, the Virtuoso timings are reported by Virtuoso itself. Second, on the nomunofu side, I do not convert the internal representation into the external representation. Third, and most important, this is just a glimpse of the full picture.
My mails are mainly trying to spark some interest or discussion with Wikidata and Wikimedia, so that I can work full time on this. I have already described my intent, which is to create a benchmark tool based on Wikidata SPARQL logs [*], then use it to realistically benchmark Virtuoso, the current solution, and a new solution (nomunofu) that I am working on.
[*] https://iccl.inf.tu-dresden.de/web/Wissensbasierte_Systeme/WikidataSPARQL/en
Raw benchmarks would not tell the whole truth, because nomunofu can rely on both WiredTiger and FoundationDB, which, as far as I know, claim stronger guarantees than Virtuoso. The only way to know whether Virtuoso is comparable to FoundationDB or WiredTiger would be for Virtuoso to pass the Jepsen harness tests (https://jepsen.io/).
I have not put all my eggs in one basket; I am considering other options. But I think working for Wikimedia, on contract or in a permanent position, would be best overall.
I will make another WDQS proposal, based on some feedback I was given on IRC, to add more technical details (and improve the roadmap).
[1] http://vos.openlinksw.com/owiki/wiki/VOS/VirtRDFPerformanceTuning
[2] http://vos.openlinksw.com/owiki/wiki/VOS/VOSNews2012#2012-08-02%20--%20Annou....
[3] http://vos.openlinksw.com/owiki/wiki/VOS/VOSBuild
[4] https://github.com/openlink/virtuoso-opensource/tree/develop/7
[5] https://sourceforge.net/projects/virtuoso/files/virtuoso/
Spoiler: the new code is always faster.
The hard disk is SATA, and the CPU is an Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz.
I imported latest-lexeme.nt (6GB) using guile-nomunofu, chez-nomunofu and Virtuoso:
- Chez takes 40 minutes to import 6GB
- Chez is 3 to 5 times faster than Guile
- Chez is 11% faster than Virtuoso
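To make the import figures above easier to compare, here is a rough back-of-the-envelope calculation. It is a sketch under stated assumptions: "6GB" is read as decimal gigabytes, and since "11% faster" is not defined in the thread, both common readings are computed. None of the derived Guile or Virtuoso times were reported directly.

```python
# Reported in the thread: Chez imports the 6 GB file in 40 minutes.
IMPORT_SIZE_MB = 6 * 1000   # assumption: 6 GB taken as 6000 MB
CHEZ_MINUTES = 40.0

# Average import throughput for Chez, in MB per second.
chez_throughput_mb_s = IMPORT_SIZE_MB / (CHEZ_MINUTES * 60)

# "Chez is 3 to 5 times faster than Guile" implies this range for Guile.
guile_minutes_range = (CHEZ_MINUTES * 3, CHEZ_MINUTES * 5)

# Reading A (assumption): Chez throughput is 11% higher than Virtuoso's.
virtuoso_minutes_if_rate = CHEZ_MINUTES * 1.11

# Reading B (assumption): Chez wall-clock time is 11% shorter than Virtuoso's.
virtuoso_minutes_if_time = CHEZ_MINUTES / 0.89

print(f"Chez throughput: {chez_throughput_mb_s:.2f} MB/s")
print(f"Guile import time: {guile_minutes_range[0]:.0f} to {guile_minutes_range[1]:.0f} minutes")
print(f"Virtuoso import time, reading A: {virtuoso_minutes_if_rate:.1f} minutes")
print(f"Virtuoso import time, reading B: {virtuoso_minutes_if_time:.1f} minutes")
```

The two readings differ by about half a minute, so the ambiguity matters little here, but stating which one is meant makes the comparison reproducible.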
How did you load the data? Did you use Virtuoso's bulk-load facilities? This is the recommended method [6].
[6] http://vos.openlinksw.com/owiki/wiki/VOS/VirtBulkRDFLoader
Regarding query time, Chez is still faster than Virtuoso with or without cache. The query I am testing is the following:
SELECT ?s ?p ?o
FROM <http://fu>
WHERE {
  ?s <http://purl.org/dc/terms/language> <http://www.wikidata.org/entity/Q150> .
  ?s <http://wikiba.se/ontology#lexicalCategory> <http://www.wikidata.org/entity/Q1084> .
  ?s <http://www.w3.org/2000/01/rdf-schema#label> ?o
};
Virtuoso's first query takes 1295 msec, the second takes 331 msec, and then it stabilizes around 200 msec.
Chez nomunofu takes around 200 ms without cache.
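The cold-versus-warm distinction in these timings can be captured with a small harness. This is a minimal sketch, not the method used in this thread: `time_runs` and `summarize` are hypothetical helpers, `run_query` stands for any callable that executes the query against the store under test, and the two trailing 200 msec values below are placeholders for the "stabilized" runs.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(run_query: Callable[[], object], runs: int = 5) -> List[float]:
    """Run the query several times and return each wall-clock time in msec."""
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms


def summarize(timings_ms: List[float]) -> Dict[str, float]:
    """Report the first (cold) run separately from the median of later runs."""
    if len(timings_ms) < 2:
        raise ValueError("need at least one cold run and one warm run")
    return {
        "cold_ms": timings_ms[0],
        "warm_median_ms": statistics.median(timings_ms[1:]),
    }


# Virtuoso figures from the thread, padded with two placeholder warm runs.
print(summarize([1295.0, 331.0, 200.0, 200.0]))
```

Measuring on the client side for both systems would also address the caveat that the Virtuoso timings are self-reported.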
There is still an optimization I can do to speed up nomunofu a little.
Happy hacking!
I'll be interested to hear your new results, with a current build, and with proper INI tuning in place.
Which INI options do I need to use? Thanks!
Regards,
Ted
--
A: Yes. http://www.idallen.com/topposting.html
| Q: Are you sure?
| | A: Because it reverses the logical flow of conversation.
| | | Q: Why is top posting frowned upon?
Ted Thibodeau, Jr. // voice +1-781-273-0900 x32
Senior Support & Evangelism // mailto:tthibodeau@openlinksw.com
// http://twitter.com/TallTed
OpenLink Software, Inc. // http://www.openlinksw.com/
20 Burlington Mall Road, Suite 322, Burlington MA 01803
Weblog -- http://www.openlinksw.com/blogs/
Community -- https://community.openlinksw.com/
LinkedIn -- http://www.linkedin.com/company/openlink-software/
Twitter -- http://twitter.com/OpenLink
Facebook -- http://www.facebook.com/OpenLinkSoftware
Universal Data Access, Integration, and Management Technology Providers
Regards,
Amirouche ~ zig ~ https://hyper.dev
Check out my proposal at https://meta.wikimedia.org/wiki/Grants:Project/Future-proof_WDQS
I started working on a paper (more will follow) that will document and support my work, see https://en.wikiversity.org/wiki/WikiJournal_Preprints/Generic_Tuple_Store#Fu...
Happy Holidays ;-)
On 12/23/19 5:39 AM, Amirouche Boubekki wrote:
topic: removed semantic-web, guile, and added wikidata-tech.
Let's move the conversation to wikidata-tech@
Please remove wikidata@lists.wikimedia.org next time you reply.
On Sun, Dec 22, 2019 at 23:35, Ted Thibodeau Jr tthibodeau@openlinksw.com wrote:
I don't get it.
You start off making unqualified claims about Virtuoso here, and you now want to open up a new topic somewhere else?
If you want to make claims about Virtuoso and performance comparisons do so with professional disclosure re:
1. Version
2. Configuration
3. Docker Image -- if using Docker
4. Host Machine -- OS, RAM, and CPU Affinity.
Kingsley
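The disclosure items listed above could be collected alongside each benchmark run. The following is a sketch only: `host_disclosure` is a hypothetical helper, the configuration string is a placeholder to be filled in by whoever runs the test, and the CPU-affinity and RAM lookups rely on calls that are assumed to be available mainly on Linux.

```python
import json
import os
import platform
from typing import Optional


def host_disclosure(version: str, configuration: str,
                    docker_image: Optional[str] = None) -> dict:
    """Gather version, configuration, Docker image, and host details."""
    record = {
        "version": version,
        "configuration": configuration,
        "docker_image": docker_image,  # None when Docker is not used
        "os": platform.platform(),
        "cpu": platform.processor() or platform.machine(),
        "cpu_count": os.cpu_count(),
    }
    # CPU affinity of the current process; not available on every platform.
    if hasattr(os, "sched_getaffinity"):
        record["cpu_affinity"] = sorted(os.sched_getaffinity(0))
    # Physical RAM in bytes; falls back to None where sysconf is unsupported.
    try:
        record["ram_bytes"] = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        record["ram_bytes"] = None
    return record


# Version taken from the banner quoted in this thread; configuration is a placeholder.
report = host_disclosure("6.1.6.3127", "fill in: INI settings used")
print(json.dumps(report, indent=2))
```

Storing this record next to the timing results keeps each published number tied to the setup that produced it.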