[Wikidata] Wikidata SPARQL query logs available

7 Aug 2018


Dear all,
I am happy to announce that as part of an ongoing research collaboration 
between TU Dresden researchers and Wikimedia [1], we can now release 
pre-processed logs from the Wikidata SPARQL Query Service [2]. You can 
find details and download links on the following page:
https://iccl.inf.tu-dresden.de/web/Wikidata_SPARQL_Logs/en
The data so far comprises over 200 million queries answered in 
June-August 2017. There is also an accompanying publication that 
describes the workings of and practical experiences with the SPARQL 
query service [3].
The logs have been pre-processed to remove information that could 
potentially be used for identifying individual users (e.g., comments 
were removed, geo-coordinates coarsened, and query strings reformatted 
completely -- see the page above for details). Nevertheless, one can still 
learn many interesting things from the logs, e.g., which properties and 
entities are used in queries, which SPARQL features are most prominent, 
or which languages are requested.
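
As a rough illustration of the first kind of analysis, here is a minimal
Python sketch that counts which properties and entities occur in a set of
queries. It assumes the query strings have already been read from the dump
and decoded into plain SPARQL text. The sample queries and the name
count_ids are made up for illustration, and the pattern only covers the
common prefixes wd:, wdt:, p:, ps: and pq:; full IRIs in angle brackets
would need a second pattern.

```python
import re
from collections import Counter

# Matches Wikidata IDs written with a common prefix, e.g. wdt:P31 or wd:Q5.
# Assumption: queries use prefixed names rather than full IRIs.
ID_PATTERN = re.compile(r"\b(?:wd|wdt|p|ps|pq):([PQ][1-9][0-9]*)\b")


def count_ids(queries):
    """Count how often each property (P...) or entity (Q...) ID occurs."""
    counts = Counter()
    for query in queries:
        counts.update(ID_PATTERN.findall(query))
    return counts


# Hypothetical sample queries, not taken from the logs.
sample_queries = [
    "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 }",
    "SELECT ?item WHERE { ?item wdt:P31 wd:Q146 . ?item wdt:P18 ?img }",
]

counts = count_ids(sample_queries)
for wikidata_id, n in counts.most_common():
    print(wikidata_id, n)
```

A regular expression is only a shortcut here; for anything beyond simple
counts, a proper SPARQL parser would be the safer choice.
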
We have also preserved some user agent information, but 
without overly detailed software versions and only in cases where the 
agents occurred many times across several weeks. This can at least be 
used to recognise the significant number of queries generated, e.g., 
by Magnus' tools, or to do a rough analysis of which software platforms 
are most commonly used to send queries. We used #TOOL comments found in 
queries to refine user agent information in some cases.
We also made an effort to identify those queries that come from browser 
agents *and* also behave as one would expect from a browser (not all 
"browsers" did). We called such queries "organic" and provide this 
classification with the logs (there is also a filtered dump of only 
organic queries, which is much smaller and therefore easier to process, 
and also handy for testing). See the paper for details on our methodology.
Finally, the data contains the time of each request, so one can 
reconstruct query loads over time.
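
For example, the load per hour could be reconstructed roughly as in the
Python sketch below. It assumes the request times have already been
extracted as strings in an ISO-like format such as "2017-06-12 00:05:10";
the exact format in the dump may differ, so please check the page above.
The sample values and the name hourly_load are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def hourly_load(timestamps):
    """Count requests per hour from ISO-like timestamp strings."""
    load = Counter()
    for ts in timestamps:
        t = datetime.fromisoformat(ts)
        # Truncate to the start of the hour to form the bucket key.
        load[t.replace(minute=0, second=0, microsecond=0)] += 1
    return load


# Hypothetical sample timestamps, not taken from the logs.
sample = [
    "2017-06-12 00:05:10",
    "2017-06-12 00:59:59",
    "2017-06-12 01:00:00",
]

load = hourly_load(sample)
for hour, n in sorted(load.items()):
    print(hour.isoformat(), n)
```

The same idea works for coarser or finer buckets by truncating the
timestamp differently.
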
Feedback is very welcome, both in terms of comments on the data (is it 
useful to you? would you like to see more? do you have concerns?) and in 
terms of insights that you can get from it (we did some analyses but one 
can surely do more).
Cheers,
Markus
[1] https://meta.wikimedia.org/wiki/Research:Understanding_Wikidata_Queries
[2] https://query.wikidata.org/ (or rather the web service that powers 
this UI and many other applications).
[3] Stanislav Malyshev, Markus Krötzsch, Larry González, Julius Gonsior, 
Adrian Bielefeldt: Getting the Most out of Wikidata: Semantic Technology 
Usage in Wikipedia’s Knowledge Graph. In Proceedings of the 17th 
International Semantic Web Conference (ISWC-18), Springer 2018. 
https://iccl.inf.tu-dresden.de/web/Wikidata_SPARQL_Logs/en
-- 
Prof. Dr. Markus Kroetzsch
Knowledge-Based Systems Group
Center for Advancing Electronics Dresden (cfaed)
Faculty of Computer Science
TU Dresden
+49 351 463 38486
https://kbs.inf.tu-dresden.de/
