[Wikidata] Can LDF scale? (Was: Linked data fragment enabled on the Query Service)

23 Dec 2016


Hi again,
A thing I was wondering about while testing LDF is how this type of 
service might behave under load. In the tests I am doing, my single 
browser issues several hundred thousand requests for a single query, at
an average rate close to 100 requests per second. That is just one user.
It seems one might need a sizeable caching/replication/sharding 
infrastructure to cope with this load as soon as more than a few users 
issue manual queries. The current Wikidata SPARQL service handles about 
20-30 queries per second on average. At this rate, if an LDF query takes
30 sec to answer on average (which is optimistic compared to my
experience so far), you will have about 600-900 active queries at any
moment. At roughly 100 requests per second each, that amounts to 60,000
to 90,000 requests per second.
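
As a rough sketch of the estimate above: the number of concurrently
active queries is the arrival rate times the mean query duration
(Little's law), and the request rate follows if every active query is
assumed to issue requests at the roughly 100 per second observed from a
single browser. The function names are illustrative only, and the inputs
are the figures quoted in this message, not measurements of the LDF
server.

```python
def active_queries(arrival_rate_per_sec, mean_duration_sec):
    """Little's law: average number of queries in flight at any moment."""
    return arrival_rate_per_sec * mean_duration_sec


def request_rate(arrival_rate_per_sec, mean_duration_sec, requests_per_query_per_sec):
    """Total HTTP requests per second hitting the LDF server.

    Assumes every active query issues requests at the same steady rate.
    """
    return active_queries(arrival_rate_per_sec, mean_duration_sec) * requests_per_query_per_sec


# Figures from the message: 20-30 queries/sec, 30 sec per query,
# about 100 requests/sec per active query.
for rate in (20, 30):
    active = active_queries(rate, 30)
    total = request_rate(rate, 30, 100)
    print(f"{rate} queries/sec -> {active} active queries, {total} requests/sec")
# 20 queries/sec -> 600 active queries, 60000 requests/sec
# 30 queries/sec -> 900 active queries, 90000 requests/sec
```

The result scales linearly with the mean query duration, so a longer
average runtime raises the estimate proportionally.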
This seems to be a lot. It is actually approaching the order of 
magnitude we are seeing for Wikipedia (it's hard to compare these 
services; Wikipedia has mostly cache-served content too, but the average 
result size is larger). Wouldn't this load somehow lead to problems?
By the way, the query I had tried (streets named after women) has now 
finished after 1h and 20min (with the correct number of 320 results). If 
you have such "harder" [1] queries in the mix, the average time I 
estimated above might be too low. Such long runtimes also seem to
increase the likelihood of connection errors and data inconsistencies
(e.g., what if the database is updated during this time?). I got some 
failed requests during this query, too, but apparently they did not 
affect my result.
Cheers,
Markus
[1] Of course, this "hard" query takes a mere 1.3 sec on the SPARQL 
endpoint, so it is still very far from the 30 sec timeout that LDF is
aiming to go beyond.
On 21.12.2016 09:23, Léa Lacroix wrote:
...
Hello all,
The SPARQL endpoint we are running at http://query.wikidata.org has
several measures in place in order to ensure it stays up and running and
available for everyone, for example the 30 sec query timeout. This is
necessary but also prevents some useful queries from being run. One way
around this is Linked Data Fragments. It allows for some of the query
computation to be done on the client-side instead of our server.
We have set this up now for testing and would appreciate your testing
and feedback. You can find out more about Linked Data Fragments at
http://linkeddatafragments.org/concept/ and documentation for our
installation at
https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual#Linked_Data_Fragments_endpoint.
Also, you can see a demo of client-side SPARQL evaluation and LDF server
usage here: http://ldfclient.wmflabs.org/
Please note - it's in no way a production service for anything, just a
proof-of-concept deployment of LDF client. If you like how it works, you
can get it from the source
https://github.com/LinkedDataFragments/jQuery-Widget.js and deploy it
on your own setup.
Feel free to ask Stas (Smalyshev (WMF)) if you have any further questions!
--
Léa Lacroix
Community Communication Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
http://www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.

Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
