On 7/13/20 1:41 PM, Adam Sanchez wrote:
Hi,
I have to launch 2 million queries against a Wikidata instance. I have loaded Wikidata in Virtuoso 7 (512 RAM, 32 cores, SSD disks with RAID 0). The queries are simple, just 2 types.
select ?s ?p ?o { ?s ?p ?o. filter (?s = ?param) }
select ?s ?p ?o { ?s ?p ?o. filter (?o = ?param) }
With a Java ThreadPoolExecutor, the whole run takes 6 hours. How can I speed up query processing even more?
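For reference, the setup described above can be sketched roughly as follows. This is only an illustration, not the actual client code: the class name `QueryBatch`, the helper names, and the `endpoint` function (which stands in for whatever HTTP or JDBC call sends a query string to Virtuoso) are all assumptions, and the pool size is arbitrary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: builds the two query types and runs them on a thread pool.
class QueryBatch {

    // Query type 1: all triples whose subject is the given IRI.
    static String bySubject(String iri) {
        return "select ?s ?p ?o { ?s ?p ?o. filter (?s = <" + iri + ">) }";
    }

    // Query type 2: all triples whose object is the given IRI.
    static String byObject(String iri) {
        return "select ?s ?p ?o { ?s ?p ?o. filter (?o = <" + iri + ">) }";
    }

    // Submits every query to a fixed-size pool and collects results in input order.
    // `endpoint` is an assumed stand-in for the real call to the SPARQL endpoint.
    static List<String> runAll(List<String> queries,
                               Function<String, String> endpoint,
                               int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String q : queries) {
                Callable<String> task = () -> endpoint.apply(q);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // blocks until that query has finished
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> queries = Arrays.asList(
                bySubject("http://www.wikidata.org/entity/Q42"),
                byObject("http://www.wikidata.org/entity/Q42"));
        // Stub endpoint that just echoes the query; replace with a real client call.
        for (String r : runAll(queries, q -> "sent: " + q, 32)) {
            System.out.println(r);
        }
    }
}
```

With this shape, the number of threads is the only tuning knob on the client side; once it matches what the server can process in parallel, further gains have to come from the server.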
I was thinking:
a) to implement a Virtuoso cluster to distribute the queries, or
b) to load Wikidata in a Spark dataframe (since Sansa framework is very slow, I would use my own implementation), or
c) to load Wikidata in a Postgresql table and use Presto to distribute the queries, or
d) to load Wikidata in a PG-Strom table to use GPU parallelism.
What do you think? I am looking for ideas. Any suggestion will be appreciated.
Best,
Hi Adam,
You need to increase the memory available to Virtuoso. If you are already at your limits, that is when the Cluster Edition will come in handy, i.e., it lets you build a large pool of memory from a sharded DB that is horizontally partitioned over a collection of commodity computers.
There is a public Google Spreadsheet covering a variety of public Virtuoso instances that should aid you in this process [1].
Links:
[1] https://docs.google.com/spreadsheets/d/1-stlTC_WJmMU3xA_NxA1tSLHw6_sbpjff-5O...