with the
details.
-----Original Message-----
From: Wikidata [mailto:wikidata-bounces@lists.wikimedia.org] On behalf of Markus Krötzsch
Sent: Tuesday, 16 February 2016 14:57
To: Discussion list for the Wikidata project.
Subject: Re: [Wikidata] SPARQL CONSTRUCT results truncated
Hi Joachim,
I think SERVICE queries should be working, but maybe Stas knows more about this. Even if
they are disabled, this should result in some error message rather than in a
NullPointerException. Looks like a bug.
Markus
On 16.02.2016 13:56, Neubert, Joachim wrote:
Hi Markus,
Great that you checked that out. I can confirm that the simplified query worked for me,
too. It took 15.6s and returned roughly the same number of results (323789).
When I loaded the results into
http://zbw.eu/beta/sparql/econ_pers/query, an endpoint for
"economics-related" persons, it matched with 36050 persons (supposedly the
"most important" 8 percent of our set).
What I would normally do to get the corresponding Wikipedia site URLs is to run a query
against the Wikidata endpoint, which references the relevant Wikidata URIs via a
"service" clause:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX schema: <http://schema.org/>
#
construct {
  ?gnd schema:about ?sitelink .
}
where {
  service <http://zbw.eu/beta/sparql/econ_pers/query> {
    ?gnd skos:prefLabel [] ;
         skos:exactMatch ?wd .
    filter(contains(str(?wd), 'wikidata'))
  }
  ?sitelink schema:about ?wd ;
            schema:inLanguage ?language .
  filter(contains(str(?sitelink), 'wikipedia'))
  filter(lang(?wdLabel) = ?language && ?language in ('en', 'de'))
}
This, however, results in a Java error.
If "service" clauses are supposed to work in the Wikidata endpoint, I'd
happily provide additional details in Phabricator.
For now, I'll get the data via your java example code :)
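(Editorial aside: the failing SERVICE step can also be worked around client-side. A minimal Python sketch, assuming one first downloads the matching Wikidata URIs from the econ_pers endpoint and then queries the Wikidata endpoint in batches with a VALUES clause; the function names, batch size, and language filter are hypothetical choices, not from this thread:)

```python
# Hypothetical sketch of a SERVICE-free workaround: instead of a federated
# query, fetch the wikidata URIs from the econ_pers endpoint separately,
# then ask the Wikidata endpoint for sitelinks in small VALUES batches.

def build_sitelink_query(wd_uris, languages=("en", "de")):
    """Build a Wikidata query returning Wikipedia sitelinks for a batch of
    entity URIs, replacing the failing SERVICE clause with a VALUES clause."""
    values = " ".join(f"<{uri}>" for uri in wd_uris)
    langs = ", ".join(f"'{l}'" for l in languages)
    return f"""
PREFIX schema: <http://schema.org/>
select ?wd ?sitelink
where {{
  VALUES ?wd {{ {values} }}
  ?sitelink schema:about ?wd ;
            schema:inLanguage ?language .
  filter(contains(str(?sitelink), 'wikipedia'))
  filter(?language in ({langs}))
}}
"""

def batches(items, size=200):
    """Yield fixed-size batches so each query stays well under the timeout.
    The batch size of 200 is an arbitrary, untested guess."""
    for i in range(0, len(items), size):
        yield items[i:i + size]
```

Each batch query can then be sent to the endpoint with any HTTP client and the results concatenated.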
Cheers, Joachim
-----Original Message-----
From: Wikidata [mailto:wikidata-bounces@lists.wikimedia.org] On behalf
of Markus Kroetzsch
Sent: Saturday, 13 February 2016 22:56
To: Discussion list for the Wikidata project.
Subject: Re: [Wikidata] SPARQL CONSTRUCT results truncated
And here is another comment on this interesting topic :-)
I just realised how close the service is to answering the query. It turns out that you
can in fact get the whole set (currently >324000 result items) together with their
GND identifiers as a download *within the timeout* (I tried several times without any
errors). This is a 63 MB JSON result file with >640K individual values, and it downloads
in no time on my home network. The query I use is simply this:
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

select ?item ?gndId
where {
  ?item wdt:P227 ?gndId ;  # get GND ID
        wdt:P31 wd:Q5 .    # instance of human
}
ORDER BY ASC(?gndId)
LIMIT 10
(Don't run this in vain: even with the limit, the ORDER clause
requires the service to compute all results every time someone runs
this. Also be careful when removing the limit; your browser may hang
on an HTML page that large. Better to use the SPARQL endpoint directly
to download the complete result file.)
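(Editorial aside: a minimal sketch of what "use the SPARQL endpoint directly" could look like, assuming the standard query.wikidata.org endpoint and its `format` parameter; the endpoint URL and helper function are assumptions, not stated in this thread:)

```python
import urllib.parse

# Assumed public Wikidata Query Service endpoint; not stated in the thread.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# The thread's query without ORDER BY / LIMIT, for a full result download.
QUERY = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
select ?item ?gndId
where {
  ?item wdt:P227 ?gndId ;
        wdt:P31 wd:Q5 .
}
"""

def build_download_url(query: str, fmt: str = "json") -> str:
    """Build a GET URL for fetching the complete result file (e.g. with
    curl or wget), bypassing the browser UI that may hang on large pages."""
    params = urllib.parse.urlencode({"query": query, "format": fmt})
    return WDQS_ENDPOINT + "?" + params
```

The resulting URL can be passed to `curl -o results.json '<url>'` to save the full JSON result set to disk.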
It seems that the timeout is only hit when adding more information (labels and wiki URLs)
to the result.
So it seems that we are not actually very far away from being able to answer the original
query even within the timeout. Certainly not as far away as I first thought. It might not
be necessary at all to switch to a different approach (though it would be interesting to
know how long LDF takes to answer the above -- our current service takes less than
10sec).
Cheers,
Markus
On 13.02.2016 11:40, Peter Haase wrote:
Hi,
you may want to check out the Linked Data Fragment server in Blazegraph:
https://github.com/blazegraph/BlazegraphBasedTPFServer
Cheers,
Peter
On 13.02.2016, at 01:33, Stas Malyshev
<smalyshev(a)wikimedia.org> wrote:
Hi!
The Linked Data Fragments approach Osma mentioned is very
interesting (particularly the bit about setting it up on top of a
regularly updated existing endpoint), and could provide another
alternative, but I have not yet experimented with it.
There is apparently this:
https://github.com/CristianCantoro/wikidataldf
though I'm not sure what its status is - I just found it.
In general, yes, I think checking out LDF may be a good idea. I'll
put it on my todo list.
--
Stas Malyshev
smalyshev(a)wikimedia.org
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
--
Markus Kroetzsch
Faculty of Computer Science
Technische Universität Dresden
+49 351 463 38486
http://korrekt.org/