Also, per
https://phabricator.wikimedia.org/T126730
and
https://gerrit.wikimedia.org/r/#/c/274864/8, requests to the query
service are now cached for 60 seconds.
I expect this includes error results from timeouts, so retrying a
request within 60 seconds of the first one won't even reach the
WDQS servers now.
Maybe this could be the answer. Is it possible that the cache stores the
truncated result but not the Java exception? Then the behaviour could be
a timeout that is just not reported properly. Ideally, partial results
should not be cached, or the "timeout" should be cached, so that a renewed
request (within 60 seconds) returns an immediate timeout rather than a broken
result set.
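In the meantime, a client can approximate that behaviour itself: a body that was cut off mid-stream is not valid JSON, so a failed parse can be treated like a timeout and the retry delayed. A minimal sketch (the function name is illustrative, not part of any discussed patch):

```python
import json

def is_complete_sparql_json(body):
    """Return True if body parses as a full SPARQL JSON result.

    A response truncated mid-stream is not valid JSON, so
    json.loads() raises ValueError and we treat it like a timeout.
    """
    try:
        data = json.loads(body)
    except ValueError:
        return False
    return 'results' in data and 'bindings' in data['results']

# A complete result passes the check; its truncated prefix fails.
complete = '{"head": {"vars": ["subC"]}, "results": {"bindings": []}}'
print(is_complete_sparql_json(complete))        # True
print(is_complete_sparql_json(complete[:30]))   # False
```

A caller that sees False could then wait out the 60-second cache window before retrying, instead of repeatedly fetching the same broken body.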
Cheers,
Markus
On 19 April 2016 at 10:05, Addshore <addshorewiki@gmail.com> wrote:
In the case we are discussing here, the truncated JSON is caused by
Blazegraph deciding it has been sending data for too long and then
stopping (as I understand it).
Thus you will only see a spike on the graph for the amount of data
actually sent from the server, not the size of the result Blazegraph
was trying to send back.
I also ran into this with some simple queries that returned big sets
of data, although in my case I did also see a Java exception
somewhere.
On 18 April 2016 at 21:51, Markus Kroetzsch
<markus.kroetzsch@tu-dresden.de> wrote:
On 18.04.2016 22:21, Markus Kroetzsch wrote:
On 18.04.2016 21:56, Markus Kroetzsch wrote:
Thanks, the dashboard is interesting.
I am trying to run this query:
SELECT ?subC ?supC WHERE { ?subC p:P279/ps:P279 ?supC }
It is supposed to return a large result set, but I am only running it
once per week. It used to work fine, but today I could not get it to
succeed a single time.
Actually, the query seems to work as it should. I am investigating
why I get an error in some cases on my machine.
Ok, I found that this is not so easy to reproduce reliably. The
symptom I am seeing is a truncated JSON response, which just
stops in the middle of the data (at a random location, but
usually early on), and which is *not* followed by any error
message. The stream just ends.
So far, I could only get this in Java, not in Python, and it
does not always happen. If successful, the result is about 250M
in size. The following Python script can retrieve it:
import requests

SPARQL_SERVICE_URL = 'https://query.wikidata.org/sparql'
query = """SELECT ?subC ?supC WHERE { ?subC p:P279/ps:P279 ?supC }"""
print(requests.get(SPARQL_SERVICE_URL,
                   params={'query': query, 'format': 'json'}).text)
(output should be redirected to a file)
I will keep an eye on the issue, but I don't know how to debug
this any further now, since it started to work without me
changing any code.
I also wonder how to read the dashboard after all. Despite my
repeating an experiment that creates a 250M result file five times
in the past few minutes, the "Bytes out" figure remains below a few
MB for most of the time.
Markus
On 18.04.2016 21:40, Stas Malyshev wrote:
Hi!
I have the impression that some not-so-easy SPARQL queries that
used to run just below the timeout are now timing out regularly.
Has there been a change in the setup that may have caused this,
or are we maybe seeing increased query traffic [1]?
We've recently run on a single server for a couple of days due to
reloading of the second one, so this may have made it a bit slower.
But that should be gone now; we're back to two. Other than that, I'm
not seeing anything abnormal in
https://grafana.wikimedia.org/dashboard/db/wikidata-query-service
[1] The deadline for the Int. Semantic Web Conf. is coming up, so it
might be that someone is running experiments on the system to get
their paper finished. It has been observed for other endpoints that
traffic increases at such times. This community sometimes is the
greatest enemy of its own technology ... (I recently had to IP-block
an RDF crawler from one of my sites after it had ignored robots.txt
completely).
We don't have any blocks or throttle mechanisms right now. But if we
see somebody making a serious negative impact on the service, we may
have to change that.
--
Markus Kroetzsch
Faculty of Computer Science
Technische Universität Dresden
+49 351 463 38486
http://korrekt.org/
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
--
Addshore