Hi again,
One thing I was wondering about while testing LDF is how this type of service might behave under load. In my tests, a single browser issues several hundred thousand requests for a single query, at an average rate close to 100 requests per second. This is one user.
It seems one might need a sizeable caching/replication/sharding infrastructure to cope with this load as soon as more than a few users issue manual queries. The current Wikidata SPARQL service handles about 20-30 queries per second on average. If you have this rate, and you expect an LDF query to take 30 sec to answer on average (being optimistic here compared to my experience so far), you will have about 600-900 active queries at any moment. At roughly 100 requests per second per query, as I observed above, this gives a rate of 60,000 to 90,000 requests per second.
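To make the arithmetic above explicit, here is a minimal Python sketch of the estimate. It only restates the numbers from this mail (20-30 queries per second, 30 sec per query, about 100 requests per second per active query). The function name and parameters are made up for illustration, and the model assumes a steady state in which every query behaves like the average one.

```python
def ldf_request_rate(queries_per_sec, avg_query_secs, requests_per_sec_per_query):
    """Back-of-the-envelope load estimate (steady state, average case).

    Active queries = arrival rate * average duration (Little's law).
    Server request rate = active queries * per-query request rate.
    """
    active_queries = queries_per_sec * avg_query_secs
    total_requests_per_sec = active_queries * requests_per_sec_per_query
    return active_queries, total_requests_per_sec


# Numbers taken from the estimate in this mail; they are rough guesses.
for qps in (20, 30):
    active, rate = ldf_request_rate(qps, avg_query_secs=30,
                                    requests_per_sec_per_query=100)
    print(f"{qps} queries/sec -> {active} active queries, {rate} requests/sec")
```

If the average query time is longer than 30 sec, the estimate grows proportionally.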
This seems to be a lot. It is actually approaching the order of magnitude we are seeing for Wikipedia (it's hard to compare these services; Wikipedia has mostly cache-served content too, but the average result size is larger). Wouldn't this load somehow lead to problems?
By the way, the query I had tried (streets named after women) has now finished after 1 h 20 min (with the correct number of 320 results). If you have such "harder" [1] queries in the mix, the average time I estimated above might be too small. Such long runtimes also seem to increase the likelihood of connection errors and data inconsistencies (e.g., what if the database is updated during this time?). I got some failed requests during this query, too, but apparently they did not affect my result.
Cheers,
Markus
[1] Of course, this "hard" query takes a mere 1.3 sec on the SPARQL endpoint, so it is still very far from the 30 sec timeout that LDF is aiming to go beyond.
On 21.12.2016 09:23, Léa Lacroix wrote:
Hello all,
The SPARQL endpoint we are running at http://query.wikidata.org has several measures in place in order to ensure it stays up and running and available for everyone, for example the 30 sec query timeout. This is necessary but also prevents some useful queries from being run. One way around this is Linked Data Fragments. It allows for some of the query computation to be done on the client-side instead of our server.
We have set this up now for testing and would appreciate your testing and feedback. You can find out more about Linked Data Fragments at http://linkeddatafragments.org/concept/ and read the documentation for our installation at https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual#Linked_Data_Fragments_endpoint. Also, you can see a demo of client-side SPARQL evaluation and LDF server usage here: http://ldfclient.wmflabs.org/
Please note: this is in no way a production service for anything, just a proof-of-concept deployment of the LDF client. If you like how it works, you can get the source from https://github.com/LinkedDataFragments/jQuery-Widget.js and deploy it on your own setup.
Feel free to ask Stas (Smalyshev (WMF)) if you have any further questions!
-- Léa Lacroix Community Communication Manager for Wikidata
Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin http://www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.