Re: [Wikidata] SPARQL power users and developers

30 Sep 2016


Hi Markus,
I assume I qualify for both (1) and (2). I can add an identifiable comment
with a '#Tool:' prefix to every major SPARQL query issued by our tools.
One bot run usually generates a few very heavy queries and tens of thousands
of smaller ones, depending on the task the bot performs. All of this
serves to keep the data in Wikidata consistent, avoid duplicates, etc., and
in principle acts as a combination of database connector and Wikidata
API wrapper.
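Tagging queries this way could look roughly like the sketch below. It assumes the tag is a single comment line placed in front of the query text. The helper names `tag_query` and `build_request_url`, and the tool name, version and contact URL, are hypothetical placeholders. The endpoint path is the one quoted later in this thread.

```python
# Sketch: prepend a '#Tool:' comment to a SPARQL query so it can be recognised
# in the query logs. Tool name, version and contact URL are hypothetical.
from urllib.parse import urlencode

# Endpoint URL as quoted elsewhere in this thread.
ENDPOINT = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"


def tag_query(query: str, tool: str, version: str, contact: str) -> str:
    """Return the query with one identifying '#Tool:' comment line in front."""
    signature = f"#Tool: {tool}/{version} ({contact})"
    return f"{signature}\n{query}"


def build_request_url(query: str) -> str:
    """Build a GET URL; the comment is URL-encoded along with the query text."""
    return ENDPOINT + "?" + urlencode({"query": query})


if __name__ == "__main__":
    tagged = tag_query(
        "SELECT ?item WHERE { ?item wdt:P351 ?geneId } LIMIT 10",
        tool="ExampleBot",
        version="0.1",
        contact="https://example.org/examplebot",
    )
    print(tagged)
    print(build_request_url(tagged))
```

SPARQL comments are ignored by the query engine, so the tag does not change results. Assuming the logs record the full query string, as this thread presupposes, the tag simply travels with the query into the logs.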
Best,
Sebastian
-- 

Sebastian Burgstaller-Muehlbacher, PhD
Research Associate
Andrew Su Lab
MEM-216, Department of Molecular and Experimental Medicine
The Scripps Research Institute
10550 North Torrey Pines Road
La Jolla, CA 92037


On Fri, Sep 30, 2016 at 11:53 AM, Markus Kroetzsch
<markus.kroetzsch@tu-dresden.de> wrote:
> On 30.09.2016 19:50, Andra Waagmeester wrote:
>>
>> Just curious, while we are on the topic: when you are inspecting the
>> headers to separate "organic" queries from bot queries, would it
>> be possible to count how often a set of properties is used in the
>> different queries? This would be a nice way to demonstrate to the original
>> external resources how "their" data is used and which properties
>> are used together with "their" properties (e.g. P351 for NCBI
>> Gene or P699 for the Disease Ontology). It would be interesting to know,
>> for example, how often those two properties are used in one single query.
>
>
> Yes, we definitely want to do such analyses. The first task is to clean up
> and group/categorize queries so we can get a better understanding (if a
> property is used in 100K queries a day, it would still be nice to know if
> they come from a single script or from many users).
>
> Once we have this, we would like to analyse the content (which properties
> and classes are used, etc.) but also query features (how many OPTIONALs,
> GROUP BYs, etc. are used). Ideas on what to analyse further are welcome. Of
> course, SPARQL can only give a partial idea of "usage", since Wikidata
> content can be used in ways that don't involve SPARQL. Moreover, counting
> raw numbers of queries can also be misleading: we have had cases where a
> single query result was discussed by hundreds of people (e.g. the Panama
> Papers query that made it to Le Monde online), but in the logs it will still
> show up only as a single query among millions.
>
> Best,
>
> Markus
>
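The analyses discussed above could be prototyped roughly as follows. The sketch covers how often P351 and P699 occur in the same query, and how many OPTIONALs or GROUP BYs a query uses. It assumes the logged queries are available as plain strings and uses regular expressions rather than a real SPARQL parser. Identifiers inside string literals would therefore be miscounted, and only whole-line `#` comments are stripped. All function names are hypothetical.

```python
# Sketch: regex-based content and feature counts over logged SPARQL queries.
# This is an approximation, not a SPARQL parser; function names are hypothetical.
import re
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set

PROPERTY_RE = re.compile(r"\bP\d+\b")  # e.g. the P351 in "wdt:P351"
FEATURES = ("OPTIONAL", "GROUP BY", "FILTER", "UNION", "ORDER BY")


def strip_comment_lines(query: str) -> str:
    """Drop lines whose first non-blank character is '#' (e.g. #TOOL: tags)."""
    return "\n".join(
        line for line in query.splitlines() if not line.lstrip().startswith("#")
    )


def properties_used(query: str) -> Set[str]:
    """Return the set of property IDs mentioned in the query body."""
    return set(PROPERTY_RE.findall(strip_comment_lines(query)))


def feature_counts(query: str) -> Dict[str, int]:
    """Count occurrences of selected SPARQL keywords (case-insensitive)."""
    text = strip_comment_lines(query).upper()
    return {
        feature: len(re.findall(r"\b" + feature.replace(" ", r"\s+") + r"\b", text))
        for feature in FEATURES
    }


def property_pair_counts(queries: Iterable[str]) -> Counter:
    """Count, per unordered property pair, how many queries use both."""
    pairs: Counter = Counter()
    for query in queries:
        for pair in combinations(sorted(properties_used(query)), 2):
            pairs[pair] += 1
    return pairs
```

Counting pairs per query, rather than raw mentions, directly answers "how often are these two properties used in one single query". It does not address the caveat above that one query may stand for a result seen by many people.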
>
>> On Fri, Sep 30, 2016 at 4:44 PM, Markus Kroetzsch
>> <markus.kroetzsch@tu-dresden.de> wrote:
>>
>>     On 30.09.2016 16:18, Andra Waagmeester wrote:
>>
>>         Would it help if I add the following header to every large batch
>>         of queries?
>>
>>         #######
>>         # access: (http://query.wikidata.org or
>>         #   https://query.wikidata.org/bigdata/namespace/wdq/sparql?query={SPARQL})
>>         # contact: email, account name, Twitter name, etc.
>>         # bot: True/False
>>         # .........
>>         ######
>>
>>
>>     This is already more detailed than what I had in mind. Having a way
>>     to tell apart bots and tools from "organic" queries would already be
>>     great. We are mainly looking for something that will help us to
>>     understand sudden peaks of activity. For this, it might be enough to
>>     have a short signature (a URL could be given, but a tool name with a
>>     version would also be fine). This is somewhat like the "user agent"
>>     field in HTTP.
>>
>>     But you are right that some formatting convention may help further
>>     here. How about this:
>>
>>     #TOOL:<any user agent information that you like to share>
>>
>>     Then one could look for comments of this form without knowing all
>>     the tools upfront. Of course, this is just a hint in any case, since
>>     one could always use the same comment in any manually written query.
>>
>>     Best regards,
>>
>>     Markus
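Looking for comments of this form without knowing the tools upfront could be sketched as follows. The sketch assumes the logged queries are available as strings. Matching is case-insensitive because this thread uses both '#TOOL:' and '#Tool:'. The function names and the "<untagged>" bucket label are hypothetical.

```python
# Sketch: group logged queries by their '#TOOL:' signature without a list of
# known tools. Function names and the "<untagged>" label are hypothetical.
import re
from collections import Counter
from typing import Iterable, Optional

# A comment line of the form "#TOOL:<signature>"; surrounding blanks ignored.
TOOL_COMMENT_RE = re.compile(
    r"^[ \t]*#[ \t]*TOOL:[ \t]*(.*?)[ \t]*$", re.IGNORECASE | re.MULTILINE
)


def tool_signature(query: str) -> Optional[str]:
    """Return the signature of the first #TOOL: comment, or None if absent/empty."""
    match = TOOL_COMMENT_RE.search(query)
    if match and match.group(1):
        return match.group(1)
    return None


def count_by_tool(queries: Iterable[str]) -> Counter:
    """Count queries per signature; untagged queries share one bucket."""
    return Counter(tool_signature(q) or "<untagged>" for q in queries)
```

As noted above, a signature is only a hint: anyone can paste the same comment into a hand-written query, so the counts describe claimed origins rather than verified ones.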
>>
>>
>>         On Fri, Sep 30, 2016 at 4:00 PM, Markus Kroetzsch
>>         <markus.kroetzsch@tu-dresden.de> wrote:
>>
>>             Dear SPARQL users,
>>
>>             We are starting a research project to investigate the use of the
>>             Wikidata SPARQL Query Service, with the goal of gaining insights
>>             that may help to improve Wikidata and the query service [1].
>>             Currently, we are still waiting for all data to become
>>             available. Meanwhile, we would like to ask for your input.
>>
>>             Preliminary analyses show that the use of the SPARQL query
>>             service varies greatly over time, presumably because power users
>>             and software tools are running large numbers of queries. For a
>>             meaningful analysis, we would like to understand such
>>             high-impact biases in the data. We therefore need your help:
>>
>>             (1) Are you a SPARQL power user who sometimes runs large numbers
>>             of queries (over 10,000)? If so, please let us know what your
>>             queries typically look like so we can identify them in the logs.
>>
>>             (2) Are you the developer of a tool that launches SPARQL queries?
>>             If so, please let us know if there is any way to identify your
>>             queries.
>>
>>             If (1) or (2) applies to you, it would be good if you could
>>             include an identifying comment in your SPARQL queries in the
>>             future, to make them easier to recognise. In return, this would
>>             enable us to provide you with statistics on the usage of your
>>             tool [2].
>>
>>             Further feedback is welcome.
>>
>>             Cheers,
>>
>>             Markus
>>
>>
>>             [1] https://meta.wikimedia.org/wiki/Research:Understanding_Wikidata_Queries
>>
>>             [2] Pending permission by the WMF. Like all Wikimedia usage data,
>>             the query logs are under strict privacy protection, so we will
>>             need to get clearance before sharing any findings with the
>>             public. We hope, however, that there won't be any reservations
>>             against publishing non-identifying information.
>>
>>             --
>>             Prof. Dr. Markus Kroetzsch
>>             Knowledge-Based Systems Group
>>             Faculty of Computer Science
>>             TU Dresden
>>             +49 351 463 38486
>>             https://iccl.inf.tu-dresden.de/web/KBS/en
>>
>>             _______________________________________________
>>             Wikidata mailing list
>>             Wikidata@lists.wikimedia.org
>>             https://lists.wikimedia.org/mailman/listinfo/wikidata
