Hi!
On a side note, the results we presented for BlazeGraph could improve dramatically if one could isolate queries that timed out. Once one query in a sequence timed-out (we used server-side timeouts), we observed that a run of queries would then timeout, possibly a locking problem or resource leak.
Could you please give a bit more detail about this failure scenario? Is it that several queries are run in parallel and one query, while timing out, hurts the performance of the others? Does this persist even after the long query has timed out? Or was it a sequential run in which, after one query timed out, the next query performed worse than the same query does when it is not preceded by the timed-out one, i.e. the timed-out query had a persistent effect beyond its own run?
BTW, what was the timeout setting in your experiments? The article says "timeouts are counted as 60 seconds". Does that mean that Blazegraph's internal timeout was set to 60 seconds, or that the setting was different and the actual run time was replaced by 60 seconds when processing the results?
Also, did you use analytic mode for the queries? https://wiki.blazegraph.com/wiki/index.php/QueryEvaluation#Analytic_Query_Ev... https://wiki.blazegraph.com/wiki/index.php/AnalyticQuery
This is the mode that is turned on automatically for the Wikidata Query Service, and AFAIK it uses different memory management, which may influence how the cases you had problems with are handled.
I would appreciate as much detail as you can give on this, as it may also be useful for current query engine work. Also, if you're interested in the work done on WDQS, our experiences, and the reasons for certain decisions and setups, I'd be glad to answer any questions.
Also Daniel mentioned that from discussion with the devs, they claim that the current implementation works best on SSD hard drives; our experiments were on a standard SATA.
Yes, we run it on SSD. Judging from our tests on test servers, which run on virtualized SATA machines, the difference is indeed dramatic (orders of magnitude or more for some queries). Then again, this is highly unscientific anecdotal evidence: we didn't run anything resembling formal benchmarks, since the test hardware is clearly inferior to the production hardware and is meant to be so. But the point is that SSD is likely a must for Blazegraph to work well on this data set. It might also improve results for other engines, so I'm not sure how it influences the comparison between the engines.
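P.S. To make the timeout questions above concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the function names and the input format (a list of per-query run times in seconds, with None for a query that the server aborted) are hypothetical and are not taken from the article or from Blazegraph. The first function shows the "replaced by 60 seconds when processing the results" reading; the second measures how long each run of consecutive timeouts is in a sequential log, which is the pattern I am asking about.

```python
from typing import List, Optional

# Value quoted from the article; how it was applied is the open question.
TIMEOUT_SECONDS = 60.0


def score_capped(runtimes: List[Optional[float]],
                 timeout: float = TIMEOUT_SECONDS) -> List[float]:
    """Count an aborted query (None) or any run longer than `timeout`
    as exactly `timeout` seconds; leave all other run times unchanged."""
    return [timeout if t is None or t > timeout else t for t in runtimes]


def timeout_run_lengths(runtimes: List[Optional[float]],
                        timeout: float = TIMEOUT_SECONDS) -> List[int]:
    """Return the length of each run of consecutive timeouts
    in a sequential query log (hypothetical log format)."""
    runs: List[int] = []
    current = 0
    for t in runtimes:
        if t is None or t > timeout:
            current += 1          # still inside a run of timeouts
        elif current:
            runs.append(current)  # a successful query ends the run
            current = 0
    if current:
        runs.append(current)      # log ended while still timing out
    return runs
```

If the timed-out queries were independent of each other, I would expect mostly runs of length 1; long runs right after a single expensive query would point towards the persistent-effect scenario.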