I have 14 TB of SSD storage (RAID 0).
On Mon, 13 Jul 2020 at 21:19, Amirouche Boubekki amirouche.boubekki@gmail.com wrote:
On Mon, 13 Jul 2020 at 19:42, Adam Sanchez a.sanchez75@gmail.com wrote:
Hi,
I have to launch 2 million queries against a Wikidata instance. I have loaded Wikidata in Virtuoso 7 (512 RAM, 32 cores, SSD disks with RAID 0). The queries are simple, just 2 types.
How much SSD storage do you have, in gigabytes?
select ?s ?p ?o { ?s ?p ?o. filter (?s = ?param) }
Is that the same as:
select ?p ?o { param ?p ?o }
Where param is one of the two million params.
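To make the comparison concrete, here is a minimal Java sketch that builds both query shapes for a single subject. It assumes each param is a subject IRI (the Wikidata entity IRI used in the test is only an example value), and the class and method names (`SubjectQueries`, `filterForm`, `directForm`, `toIriRef`) are hypothetical, not part of any library. Whether the two forms perform the same on Virtuoso is exactly the open question here; the sketch only shows how each string would be produced.

```java
import java.util.Objects;

/** Hypothetical helper: builds both query shapes for one subject IRI. */
final class SubjectQueries {
    private SubjectQueries() {}

    /** FILTER form, as written in the original message. */
    static String filterForm(String subjectIri) {
        return "select ?s ?p ?o { ?s ?p ?o. filter (?s = " + toIriRef(subjectIri) + ") }";
    }

    /** Direct triple-pattern form: the subject is fixed, so ?s is not projected. */
    static String directForm(String subjectIri) {
        return "select ?p ?o { " + toIriRef(subjectIri) + " ?p ?o }";
    }

    /** Wraps an IRI in angle brackets, rejecting characters SPARQL forbids inside <...>. */
    static String toIriRef(String iri) {
        Objects.requireNonNull(iri, "iri");
        for (char c : iri.toCharArray()) {
            // The SPARQL IRIREF rule excludes control characters, space, and <>"{}|^`\
            if (c <= 0x20 || "<>\"{}|^`\\".indexOf(c) >= 0) {
                throw new IllegalArgumentException("Illegal character in IRI: " + iri);
            }
        }
        return "<" + iri + ">";
    }
}
```

Validating the IRI before concatenation keeps a malformed param from changing the shape of the query.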
select ?s ?p ?o { ?s ?p ?o. filter (?o = ?param) }
With a Java ThreadPoolExecutor, the whole run takes 6 hours. How can I speed up query processing further?
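For readers who want a picture of that setup, the following is a rough sketch of fanning the params out over a fixed-size ThreadPoolExecutor. It is not the code used for the 6-hour run. The names `QueryFanOut`, `runAll`, and `runQuery` are hypothetical, and the actual call to the Virtuoso endpoint is left to the `runQuery` callback so that nothing is assumed about the client library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical sketch: run one query per param on a fixed-size thread pool. */
final class QueryFanOut {
    private QueryFanOut() {}

    /**
     * Applies runQuery to every param using the given number of worker threads.
     * Results come back in the same order as params.
     */
    static List<String> runAll(List<String> params, int threads,
                               Function<String, String> runQuery)
            throws InterruptedException, ExecutionException {
        // Fixed-size pool: core size equals max size, backed by an unbounded work queue.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        try {
            List<Future<String>> futures = new ArrayList<>(params.size());
            for (String param : params) {
                Callable<String> task = () -> runQuery.apply(param);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>(params.size());
            for (Future<String> future : futures) {
                results.add(future.get()); // blocks until that task has finished
            }
            return results;
        } finally {
            pool.shutdown(); // stop accepting work; already submitted tasks still run
        }
    }
}
```

This version keeps one Future per param in memory, so for two million params it may be preferable to submit them in chunks.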
I was thinking of:
a) setting up a Virtuoso cluster to distribute the queries, or
b) loading Wikidata into a Spark dataframe (since the Sansa framework is very slow, I would use my own implementation), or
c) loading Wikidata into a PostgreSQL table and using Presto to distribute the queries, or
d) loading Wikidata into a PG-Strom table to use GPU parallelism.
What do you think? I am looking for ideas. Any suggestion will be appreciated.
Best,
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
-- Amirouche ~ https://hyper.dev