Hi!
> A thing I was wondering about while testing LDF is how this type of
> service might behave under load. In the tests I am doing, my single
> browser issues several hundred thousand requests for a single query,
> at an average rate close to 100 requests per second. This is one user.
I was wondering this too. Now, pattern fragment requests are much easier
than SPARQL - as far as I can see, they go directly to the index, no
query parsing, no plan building, no complex calculations, joins, etc. I
haven't seen any noticeable change in the load when the tests were run
yesterday (I've run several, and Markus did too).
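To illustrate why a fragment request is cheap, here's a toy sketch of a pattern lookup as a direct index scan - no parsing, no plan building, no joins. (Purely illustrative; the real backend index is of course not a Python dict, and the data below is made up.)

```python
from collections import defaultdict

triples = [
    ("wd:Q42", "wdt:P31", "wd:Q5"),
    ("wd:Q1", "wdt:P31", "wd:Q1454986"),
    ("wd:Q42", "wdt:P106", "wd:Q36180"),
]

# Index by predicate (a real store keeps several such permutations,
# e.g. SPO/POS/OSP, so any pattern hits some index directly)
by_predicate = defaultdict(list)
for s, p, o in triples:
    by_predicate[p].append((s, p, o))

def fragment(predicate):
    # a single index lookup, then a scan/page over the matches
    return by_predicate.get(predicate, [])

assert len(fragment("wdt:P31")) == 2
```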
That said, the answer to the question of whether the server can handle
the load required for practical LDF usage is a resounding "I don't
know". So far I haven't seen any signs of it being problematic (with
SPARQL it's pretty apparent when certain queries are a problem, and I
haven't seen anything like that here). But we haven't seen any serious
usage yet, and I'm not sure what usage patterns to look for, since it's
entirely client-driven.
We do have caching in front of it, though I am not sure how effective it
would be - after all, we're talking about 1.5 billion triples, each of
which can appear in many patterns, in various combinations and with
various output formats, pagination, etc. I am not entirely sure whether
a naive URL-based cache would do a lot here.
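To make that concrete, here's a rough sketch of why a naive URL-keyed cache may not help much: every distinct (pattern, page, format) combination is a separate cache key, so even requests touching the same triples rarely collide. (The endpoint URL and parameter names below are hypothetical, loosely modeled on TPF-style endpoints.)

```python
from urllib.parse import urlencode

def fragment_url(base, subject=None, predicate=None, obj=None,
                 page=1, fmt="text/turtle"):
    # hypothetical parameter names; sorted for a stable cache key
    params = {"page": page, "format": fmt}
    if subject: params["subject"] = subject
    if predicate: params["predicate"] = predicate
    if obj: params["object"] = obj
    return base + "?" + urlencode(sorted(params.items()))

# Two requests over the very same pattern still produce distinct URLs:
u1 = fragment_url("https://example.org/ldf", predicate="wdt:P31", obj="wd:Q5", page=1)
u2 = fragment_url("https://example.org/ldf", predicate="wdt:P31", obj="wd:Q5", page=2)
assert u1 != u2  # pagination alone splits the cache key space
```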
It is possible to add more horizontal-scale replication - i.e., adding
servers - at the cost of hardware, of course, which inevitably raises
the question of budget.
> It seems one might need a sizeable caching/replication/sharding
> infrastructure to cope with this load as soon as more than a few users
> issue manual queries. The current Wikidata SPARQL service handles
> about 20-30 queries per second on average. If you have this rate, and
> you expect that an LDF query is taking 30sec to answer on average
> (being optimistic here compared to my experience so far), you will
> have about 600-900 active queries at each moment, for a rate of 60,000
> to 90,000 requests per second.
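For reference, the back-of-envelope arithmetic above works out as follows (all inputs are the assumed figures from this thread, not measurements):

```python
qps_low, qps_high = 20, 30   # average SPARQL query arrival rate (queries/s)
query_duration_s = 30        # assumed average LDF query duration
fragment_req_per_s = 100     # fragment requests/s per active query

# Little's law: concurrent queries = arrival rate * duration
active_low = qps_low * query_duration_s     # 600
active_high = qps_high * query_duration_s   # 900

# each active query fans out into fragment requests
req_low = active_low * fragment_req_per_s   # 60000
req_high = active_high * fragment_req_per_s # 90000
print(active_low, active_high, req_low, req_high)
```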
Note again that LDF queries would typically be very short in duration
(since they produce only 100 items per page), and we still do have
parallel connection limits :) But again, I'm not sure how it would
behave under typical load, one of the reasons being that I don't even
know what typical load for such an API is. I guess I'll have to monitor
it carefully, see if there are signs of trouble, and deal with it then.
I plan to do some light load testing just to have at least baseline
measures, but until we know what the real usage looks like, it will all
be guesswork, I think.
> (e.g., what if the database is updated during this time?). I got some
> failed requests during this query, too, but apparently they did not
> affect my result.
Some of the failures may be because of parallel connection limits; I'm
not sure how many parallel requests the JS client produces - it uses web
workers, but I haven't found out how the parallelism is controlled.
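For what it's worth, the usual way a client bounds parallelism is a semaphore around the in-flight requests; here is a minimal sketch of that pattern (in Python for brevity, even though the actual LDF client is JavaScript, and the limit value is just an assumption):

```python
import threading
import queue

MAX_PARALLEL = 4  # assumed client-side limit
sem = threading.Semaphore(MAX_PARALLEL)
results = queue.Queue()

def fetch(url):
    # placeholder for an actual HTTP fragment request
    return f"fragment for {url}"

def worker(url):
    with sem:  # at most MAX_PARALLEL requests in flight at once
        results.put(fetch(url))

threads = [threading.Thread(target=worker, args=(f"page-{i}",))
           for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert results.qsize() == 10  # all 10 requests completed
```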
--
Stas Malyshev
smalyshev(a)wikimedia.org