> How can I speed up the queries processing even more?

IMHO: drop the unwanted data as early as you can (i.e. aggressive prefiltering: don't even import it).

> Any suggestion will be appreciated.

In your case I would:
- check the RDF dumps: https://www.wikidata.org/wiki/Wikidata:Database_download#RDF_dumps
- write a custom pre-filter for the 2 million parameters (simple text parsing in Go, using multiple cores, or some other fast code)
- and just load the results into PostgreSQL.
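A minimal sketch of such a pre-filter in Go. It assumes the N-Triples flavour of the RDF dump (one triple per line, terms separated by single spaces, each line ending in " .") and a hypothetical params.txt holding the parameters as full terms such as <http://www.wikidata.org/entity/Q42>. It keeps every triple whose subject or object is a parameter, which covers both query types at once, and spreads the matching over all cores. Assumed usage: zcat latest-all.nt.gz | go run prefilter.go params.txt > filtered.nt

```go
// Hypothetical multi-core pre-filter for an N-Triples dump read from stdin.
package main

import (
	"bufio"
	"fmt"
	"os"
	"runtime"
	"strings"
	"sync"
)

// subjectAndObject splits "<s> <p> <o> ." into subject and object. Only the
// object may contain spaces (literals), so it is the rest minus the final " .".
func subjectAndObject(line string) (string, string, bool) {
	parts := strings.SplitN(line, " ", 3)
	if len(parts) < 3 {
		return "", "", false
	}
	return parts[0], strings.TrimSuffix(strings.TrimSpace(parts[2]), " ."), true
}

// keep covers both query types: subject = param or object = param.
func keep(line string, params map[string]struct{}) bool {
	s, o, ok := subjectAndObject(line)
	if !ok {
		return false
	}
	_, sHit := params[s]
	_, oHit := params[o]
	return sHit || oHit
}

// loadParams reads whitespace-separated full terms, e.g. <http://www.wikidata.org/entity/Q42>.
func loadParams(path string) (map[string]struct{}, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	params := make(map[string]struct{})
	for _, t := range strings.Fields(string(data)) {
		params[t] = struct{}{}
	}
	return params, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: prefilter PARAMS_FILE < dump.nt")
		os.Exit(2)
	}
	params, err := loadParams(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	batches, results := make(chan []string, 64), make(chan []string, 64)
	var wg sync.WaitGroup
	for i := 0; i < runtime.NumCPU(); i++ { // one worker per core; the map is only read
		wg.Add(1)
		go func() {
			defer wg.Done()
			for batch := range batches {
				var kept []string
				for _, line := range batch {
					if keep(line, params) {
						kept = append(kept, line)
					}
				}
				results <- kept
			}
		}()
	}
	go func() { wg.Wait(); close(results) }()
	go func() { // reader: cut stdin into batches of 10,000 lines
		defer close(batches)
		sc := bufio.NewScanner(os.Stdin)
		sc.Buffer(make([]byte, 1<<20), 1<<26)
		batch := make([]string, 0, 10000)
		for sc.Scan() {
			if batch = append(batch, sc.Text()); len(batch) == 10000 {
				batches <- batch
				batch = make([]string, 0, 10000)
			}
		}
		batches <- batch // last (possibly empty) batch
		if err := sc.Err(); err != nil {
			fmt.Fprintln(os.Stderr, "read error:", err)
		}
	}()
	out := bufio.NewWriter(os.Stdout) // output order is not preserved
	defer out.Flush()
	for kept := range results {
		for _, line := range kept {
			out.WriteString(line + "\n")
		}
	}
}
```

The filtered file only contains the triples the 2 million queries would have returned, so it is small enough to load into PostgreSQL and query with ordinary indexes.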

I have had good experience parsing and filtering the gzipped Wikidata JSON dump and loading the result into a PostgreSQL database.
I can run the full pipeline on my laptop, and in my case the result is about 12 GB in PostgreSQL.
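For the JSON route, a rough single-core sketch in Go. It assumes the layout of latest-all.json.gz: one JSON array with each entity on its own line followed by a comma. It decodes only the id (labels stay raw until the entity is known to be wanted), keeps entities from a hypothetical in-memory wanted set, and writes tab-separated rows escaped for PostgreSQL's COPY text format. These could then be loaded with psql, e.g. \copy entities(id, label_en) FROM 'entities.tsv', where entities is a hypothetical table.

```go
// Hypothetical filter: gzipped Wikidata JSON dump -> PostgreSQL COPY rows.
package main

import (
	"bufio"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
)

// entity decodes only the id; labels stay raw until the entity is wanted.
type entity struct {
	ID     string          `json:"id"`
	Labels json.RawMessage `json:"labels"`
}

type label struct {
	Value string `json:"value"`
}

// copyEscaper escapes the characters that are special in COPY's text format.
var copyEscaper = strings.NewReplacer(`\`, `\\`, "\t", `\t`, "\n", `\n`, "\r", `\r`)

// convertLine turns one dump line into "id<TAB>English label". It returns
// ok=false for the "[" and "]" lines and for entities not in wanted.
func convertLine(line string, wanted map[string]bool) (string, bool, error) {
	line = strings.TrimSuffix(strings.TrimSpace(line), ",")
	if line == "" || line == "[" || line == "]" {
		return "", false, nil
	}
	var e entity
	if err := json.Unmarshal([]byte(line), &e); err != nil {
		return "", false, err
	}
	if !wanted[e.ID] {
		return "", false, nil
	}
	var labels map[string]label
	// Error ignored on purpose: an empty label set may be serialised as [] (assumption).
	_ = json.Unmarshal(e.Labels, &labels)
	return copyEscaper.Replace(e.ID) + "\t" + copyEscaper.Replace(labels["en"].Value), true, nil
}

func run(path string, wanted map[string]bool, w io.Writer) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()
	gz, err := gzip.NewReader(f)
	if err != nil {
		return err
	}
	defer gz.Close()
	sc := bufio.NewScanner(gz)
	sc.Buffer(make([]byte, 1<<20), 1<<28) // a single entity line can be several MB
	out := bufio.NewWriter(w)
	defer out.Flush()
	for sc.Scan() {
		row, ok, err := convertLine(sc.Text(), wanted)
		if err != nil {
			return err
		}
		if ok {
			fmt.Fprintln(out, row)
		}
	}
	return sc.Err()
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: jsonfilter DUMP.json.gz > entities.tsv")
		os.Exit(2)
	}
	// Hypothetical wanted set; in practice it would be loaded from a file.
	wanted := map[string]bool{"Q42": true}
	if err := run(os.Args[1], wanted, os.Stdout); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Decoding each line is the expensive part, so on a multi-core machine convertLine is the natural piece to run in several goroutines.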

The biggest problem is the memory requirement of the 2 million parameters, but you can choose a fast key-value store such as RocksDB.
There are also other, low-tech parsing solutions.
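One low-tech option, sketched in Go and swapped in here rather than RocksDB: if the parameters are plain item IDs like Q42 (an assumption; properties and literals would need a different key, e.g. a sorted []string), store them as a sorted slice of integers and look them up by binary search. The backing array then needs 2,000,000 × 8 bytes ≈ 16 MB, typically much less than a hash map of IRI strings. A key-value store like RocksDB is the better fit when the set does not fit in memory or must be persisted.

```go
// Hypothetical compact membership set for Wikidata item IDs ("Q42" -> 42).
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// idSet stores item numbers sorted, 8 bytes per ID, for binary search.
type idSet []uint64

// parseQID converts "Q42" to 42 and rejects anything else (P-IDs, literals, ...).
func parseQID(s string) (uint64, bool) {
	if !strings.HasPrefix(s, "Q") {
		return 0, false
	}
	n, err := strconv.ParseUint(s[1:], 10, 64)
	return n, err == nil
}

func newIDSet(ids []string) idSet {
	set := make(idSet, 0, len(ids))
	for _, id := range ids {
		if n, ok := parseQID(id); ok {
			set = append(set, n)
		}
	}
	sort.Slice(set, func(i, j int) bool { return set[i] < set[j] })
	// Drop duplicates in place; the slice is sorted, so equal IDs are adjacent.
	out := set[:0]
	for _, n := range set {
		if len(out) == 0 || n != out[len(out)-1] {
			out = append(out, n)
		}
	}
	return out
}

func (s idSet) contains(id string) bool {
	n, ok := parseQID(id)
	if !ok {
		return false
	}
	i := sort.Search(len(s), func(i int) bool { return s[i] >= n })
	return i < len(s) && s[i] == n
}

func main() {
	set := newIDSet([]string{"Q42", "Q5", "Q42", "P31"})
	fmt.Println(len(set), set.contains("Q42"), set.contains("Q1")) // prints: 2 true false
}
```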

Regards,
 Imre



Adam Sanchez <a.sanchez75@gmail.com> wrote (Mon, 13 Jul 2020, 19:42):
Hi,

I have to launch 2 million queries against a Wikidata instance.
I have loaded Wikidata into Virtuoso 7 (512 GB RAM, 32 cores, SSDs in RAID 0).
The queries are simple; there are just two types:

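SELECT ?s ?p ?o WHERE {
  VALUES ?param { <http://www.wikidata.org/entity/Q42> }  # example value for ?param
  ?s ?p ?o .
  FILTER (?s = ?param)
}

SELECT ?s ?p ?o WHERE {
  VALUES ?param { <http://www.wikidata.org/entity/Q42> }  # example value for ?param
  ?s ?p ?o .
  FILTER (?o = ?param)
}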

Using a Java ThreadPoolExecutor, it takes 6 hours.
How can I speed up the queries processing even more?

I was thinking of:

a) setting up a Virtuoso cluster to distribute the queries, or
b) loading Wikidata into a Spark DataFrame (since the Sansa framework is very slow, I would use my own implementation), or
c) loading Wikidata into a PostgreSQL table and using Presto to distribute the queries, or
d) loading Wikidata into a PG-Strom table to use GPU parallelism.

What do you think? I am looking for ideas.
Any suggestion will be appreciated.

Best,

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata