Hi again,
A thing I was wondering about while testing LDF is how this type of
service might behave under load. In the tests I am doing, my single
browser issues several hundred thousand requests for a single query, at an
average rate close to 100 requests per second. This is one user.
It seems one might need a sizeable caching/replication/sharding
infrastructure to cope with this load as soon as more than a few users
issue manual queries. The current Wikidata SPARQL service handles about
20-30 queries per second on average. If LDF saw the same rate, and an
LDF query took 30 sec to answer on average (being optimistic here
compared to my experience so far), there would be about 600-900 active
queries at any moment. With each of them issuing around 100 requests
per second, as my browser does, that makes 60,000 to 90,000 requests
per second.
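To make the estimate easy to redo with other numbers, here is a minimal
sketch of the arithmetic. It assumes a steady state, so that the number
of active queries is simply arrival rate times average duration
(Little's law), and it assumes every active query keeps issuing requests
at the rate I observed in my browser. The function name and parameter
names are made up for illustration; they are not part of any LDF tool.

```python
def ldf_load(queries_per_sec, avg_query_seconds, requests_per_query_per_sec):
    """Rough steady-state load estimate (illustrative helper, not an LDF API).

    Active queries = arrival rate x average duration (Little's law).
    Server request rate = active queries x per-query request rate.
    """
    active_queries = queries_per_sec * avg_query_seconds
    server_requests_per_sec = active_queries * requests_per_query_per_sec
    return active_queries, server_requests_per_sec


# Figures from this mail: 20-30 queries/s, 30 s per query (assumed average),
# about 100 requests/s issued by one running query.
for rate in (20, 30):
    active, requests = ldf_load(rate, 30, 100)
    print(f"{rate} queries/s -> {active} active queries, {requests} requests/s")
```

Plugging in a longer average query time scales both numbers linearly,
which is why the "harder" queries matter for this estimate.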
This seems to be a lot. It is actually approaching the order of
magnitude we are seeing for Wikipedia (it's hard to compare these
services; Wikipedia has mostly cache-served content too, but the average
result size is larger). Wouldn't this load somehow lead to problems?
By the way, the query I had tried (streets named after women) has now
finished after 1h and 20min (with the correct number of 320 results). If
you have such "harder" [1] queries in the mix, the average time I
estimated above might be too small. Such long runtimes also seem to
increase the likelihood of connection errors and data inconsistencies
(e.g., what if the database is updated during this time?). I got some
failed requests during this query, too, but apparently they did not
affect my result.
Cheers,
Markus
[1] Of course, this "hard" query takes a mere 1.3 sec on the SPARQL
endpoint, so it is still very far from the 30 sec timeout that LDF is
aiming to go beyond.
On 21.12.2016 09:23, Léa Lacroix wrote:
Hello all,
The SPARQL endpoint we are running at
http://query.wikidata.org has
several measures in place in order to ensure it stays up and running and
available for everyone, for example the 30 sec query timeout. This is
necessary but also prevents some useful queries from being run. One way
around this is Linked Data Fragments. It allows some of the query
computation to be done on the client side instead of on our server.
We have set this up now for testing and would appreciate your testing
and feedback. You can find out more about Linked Data Fragments
<http://linkeddatafragments.org/concept/> and documentation for our
installation
<https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual#Linked_Data_Fragments_endpoint>.
Also, you can see a demo of client-side SPARQL evaluation and LDF server
usage here:
http://ldfclient.wmflabs.org/
Please note - it's in no way a production service for anything, just a
proof-of-concept deployment of the LDF client. If you like how it works, you
can get it from the source
<https://github.com/LinkedDataFragments/jQuery-Widget.js> and deploy it
on your own setup.
Feel free to ask Stas (Smalyshev (WMF)) if you have any further questions!
--
Léa Lacroix
Community Communication Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de <http://www.wikimedia.de>
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit
by the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata