On 18 April 2016 at 21:51, Markus Kroetzsch <markus.kroetzsch@tu-dresden.de> wrote:

On 18.04.2016 22:21, Markus Kroetzsch wrote:

On 18.04.2016 21:56, Markus Kroetzsch wrote:

Thanks, the dashboard is interesting.

I am trying to run this query:

SELECT ?subC ?supC WHERE { ?subC p:P279/ps:P279 ?supC }

It is supposed to return a large result set. But I am only running it
once per week. It used to work fine, but today I could not get it to
succeed a single time.

Actually, the query seems to work as it should. I am investigating why I
get an error in some cases on my machine.

Ok, I found that this is not so easy to reproduce reliably. The symptom I am seeing is a truncated JSON response, which just stops in the middle of the data (at a random location, but usually early on), and which is *not* followed by any error message. The stream just ends.
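One way to detect this symptom on the client side is to check whether the received text parses as a complete JSON document. The following is only a minimal sketch: the helper name `is_complete_json` and the sample strings are made up for illustration and are not part of any Wikidata tooling.

```python
import json


def is_complete_json(text):
    """Return True if text parses as one complete JSON document."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


# Hypothetical sample data: a tiny result document and a cut-off copy of it.
complete = '{"results": {"bindings": [{"subC": {"value": "Q1"}}]}}'
truncated = complete[:25]  # simulates a stream that just ends mid-document

print(is_complete_json(complete), is_complete_json(truncated))
```

A stream that ends early leaves the outer object unclosed, so the parser raises an error even though the server never sent one.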

So far, I could only get this in Java, not in Python, and it does not always happen. If successful, the result is about 250 MB in size. The following Python script can retrieve it:

import requests

SPARQL_SERVICE_URL = 'https://query.wikidata.org/sparql'
query = """SELECT ?subC ?supC WHERE { ?subC p:P279/ps:P279 ?supC }"""
# Fetch the full result as JSON and print it.
response = requests.get(SPARQL_SERVICE_URL, params={'query': query, 'format': 'json'})
print(response.text)

(output should be redirected to a file)
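For a result of this size, it may be more convenient to stream the response straight into a file instead of holding it in memory and redirecting standard output. The following is a sketch under assumptions: the function names, the chunk size, and the 600-second timeout are illustrative choices, not values from the query service.

```python
SPARQL_SERVICE_URL = 'https://query.wikidata.org/sparql'
QUERY = 'SELECT ?subC ?supC WHERE { ?subC p:P279/ps:P279 ?supC }'


def write_chunks(chunks, path):
    """Write an iterable of byte chunks to path; return total bytes written."""
    total = 0
    with open(path, 'wb') as out:
        for chunk in chunks:
            out.write(chunk)
            total += len(chunk)
    return total


def fetch_to_file(path, url=SPARQL_SERVICE_URL, query=QUERY):
    """Stream the query result to path without loading it all into memory."""
    # Third-party library; imported here so write_chunks works without it.
    import requests
    # The timeout value is an assumption for illustration only.
    with requests.get(url, params={'query': query, 'format': 'json'},
                      stream=True, timeout=600) as response:
        response.raise_for_status()
        return write_chunks(response.iter_content(chunk_size=1 << 16), path)
```

Calling `fetch_to_file('subclasses.json')` (the file name is arbitrary) returns the number of bytes written, so a download that is much smaller than expected is easy to spot.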

I will keep an eye on the issue, but I don't know how to debug this any further now, since it started to work without me changing any code.

I also wonder how to read the dashboard after all. Although I repeated an experiment that creates a 250 MB result file five times in the past few minutes, the "Bytes out" figure stays below a few MB most of the time.

Markus

On 18.04.2016 21:40, Stas Malyshev wrote:

Hi!

I have the impression that some not-so-easy SPARQL queries that used to
run just below the timeout are now timing out regularly. Has there been
a change in the setup that may have caused this, or are we maybe seeing
increased query traffic [1]?

We recently ran on a single server for a couple of days while the
second one was being reloaded, so this may have made it a bit slower. But
that should be gone now, we're back to two. Other than that, not seeing
anything abnormal in
https://grafana.wikimedia.org/dashboard/db/wikidata-query-service

[1] The deadline for the Int. Semantic Web Conf. is coming up, so it
might be that someone is running experiments on the system to get their
paper finished. It has been observed for other endpoints that traffic
increases at such times. This community sometimes is the greatest enemy
of its own technology ... (I recently had to IP-block an RDF crawler
from one of my sites after it had ignored robots.txt completely).

We don't have any blocks or throttling mechanisms right now. But if we see
somebody having a serious negative impact on the service, we may have to
change that.

--
Markus Kroetzsch
Faculty of Computer Science
Technische Universität Dresden
+49 351 463 38486
http://korrekt.org/

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata

Addshore