I assume I qualify for (1) and (2). I can add an identifiable comment
with a '#Tool:' prefix to every major SPARQL query done by our tools.
One bot run usually generates a few very heavy queries and tens of
thousands of smaller ones, depending on the actual task a bot performs.
All of this serves to keep the data in WD consistent, avoid duplicates,
etc., and, in principle, acts as a combination of database connector and
Wikidata API wrapper.
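
A minimal sketch of how such a comment could be prepended to each query before it is sent. It uses the "#TOOL:" spelling proposed further down in this thread; the function name `tag_query` and the tool name "ExampleBot" are placeholders for illustration, not the names our tools actually use.

```python
def tag_query(query: str, tool: str, version: str = "") -> str:
    """Prepend a '#TOOL:' comment line so the query can be spotted in logs."""
    signature = f"{tool}/{version}" if version else tool
    # Collapse whitespace: a newline inside the signature would end the
    # SPARQL comment early and leave stray text in the query body.
    signature = " ".join(signature.split())
    return f"#TOOL:{signature}\n{query.lstrip()}"


# Hypothetical tool name and version, for illustration only.
query = "SELECT ?gene WHERE { ?gene wdt:P351 ?ncbi_id } LIMIT 10"
tagged = tag_query(query, tool="ExampleBot", version="0.1")
print(tagged)
```

Since "#" starts a comment in SPARQL, the endpoint ignores the extra line and the query result is unchanged.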
--
Sebastian Burgstaller-Muehlbacher, PhD
Research Associate
Andrew Su Lab
MEM-216, Department of Molecular and Experimental Medicine
The Scripps Research Institute
10550 North Torrey Pines Road
La Jolla, CA 92037
On Fri, Sep 30, 2016 at 11:53 AM, Markus Kroetzsch
<markus.kroetzsch@tu-dresden.de> wrote:
> On 30.09.2016 19:50, Andra Waagmeester wrote:
>>
>> Just curious while we are on the topic. When you are inspecting the
>> headers to separate between "organic" queries and bot queries, would it
>> be possible to count the times a set of properties is used in the
>> different queries? This would be a nice way to demonstrate to original
>> external resources how "their" data is used and which combinations of
>> properties are used together with "their" properties (e.g. P351 for NCBI
>> gene or P699 for the disease ontology). It would be interesting to know
>> how often, for example, those two properties are used in one single query.
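
One rough way to count such co-occurrences, assuming the logged queries are available as plain strings. This is only a sketch: the regular expression matches anything that looks like a property ID (including IDs inside comments or string literals), and the sample queries and the name `property_pairs` are made up for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Crude pattern for Wikidata property IDs such as P351 or P699.
PROPERTY_RE = re.compile(r"\bP\d+\b")


def property_pairs(queries):
    """Count, per pair of properties, the number of queries using both."""
    pair_counts = Counter()
    for query in queries:
        # De-duplicate within a query and sort numerically so that each
        # pair is always stored in the same order.
        props = sorted(set(PROPERTY_RE.findall(query)), key=lambda p: int(p[1:]))
        pair_counts.update(combinations(props, 2))
    return pair_counts


# Invented sample queries, for illustration only.
sample_queries = [
    "SELECT ?g WHERE { ?g wdt:P31 ?c . ?g wdt:P351 ?id . ?d wdt:P699 ?doid }",
    "SELECT ?g WHERE { ?g wdt:P351 ?id }",
    "SELECT ?d ?g WHERE { ?d wdt:P699 ?doid . ?g wdt:P351 ?id }",
]
counts = property_pairs(sample_queries)
print(counts[("P351", "P699")])
```

A real analysis would more likely parse the queries with a SPARQL parser instead of a regular expression, so that only properties in actual triple patterns are counted.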
>
>
> Yes, we definitely want to do such analyses. The first task is to clean up
> and group/categorize queries so we can get a better understanding (if a
> property is used in 100K queries a day, it would still be nice to know if
> they come from a single script or from many users).
>
> Once we have this, we would like to analyse for content (which properties
> and classes are used, etc.) but also for query features (how many
> OPTIONALs, GROUP BYs, etc. are used). Ideas on what to analyse further are
> welcome. Of course, SPARQL can only give a partial idea of "usage", since
> Wikidata content can be used in ways that don't involve SPARQL. Moreover,
> counting raw numbers of queries can also be misleading: we have had cases
> where a single query result was discussed by hundreds of people (e.g. the
> Panama papers query that made it to Le Monde online), but in the logs it
> will still show up only as a single query among millions.
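
As an illustration of the query-feature counting mentioned above, here is a small sketch that tallies a few SPARQL keywords per query. It is keyword matching only, not a parse of the query, and the feature list, the function names, and the sample query are assumptions chosen for the example.

```python
import re
from collections import Counter

# A few SPARQL keywords to tally; extend as needed.
FEATURE_PATTERNS = {
    "OPTIONAL": re.compile(r"\bOPTIONAL\b", re.IGNORECASE),
    "GROUP BY": re.compile(r"\bGROUP\s+BY\b", re.IGNORECASE),
    "FILTER": re.compile(r"\bFILTER\b", re.IGNORECASE),
    "UNION": re.compile(r"\bUNION\b", re.IGNORECASE),
}


def drop_comment_lines(query: str) -> str:
    """Remove lines that consist only of a comment.

    Trailing comments are left alone, because '#' also occurs inside IRIs.
    """
    kept = [ln for ln in query.splitlines() if not ln.lstrip().startswith("#")]
    return "\n".join(kept)


def count_features(query: str) -> Counter:
    """Return how often each keyword in FEATURE_PATTERNS occurs."""
    body = drop_comment_lines(query)
    return Counter({name: len(p.findall(body)) for name, p in FEATURE_PATTERNS.items()})


# Invented sample query, for illustration only.
sample = """#TOOL:ExampleBot/0.1
SELECT ?item (COUNT(?x) AS ?n) WHERE {
  ?item wdt:P31 ?class .
  OPTIONAL { ?item wdt:P351 ?x }
  OPTIONAL { ?item wdt:P699 ?y }
}
GROUP BY ?item
"""
features = count_features(sample)
print(dict(features))
```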
>
> Best,
>
> Markus
>
>
>> On Fri, Sep 30, 2016 at 4:44 PM, Markus Kroetzsch
>> <markus.kroetzsch@tu-dresden.de> wrote:
>>
>> On 30.09.2016 16:18, Andra Waagmeester wrote:
>>
>> Would it help if I add the following header to every large batch
>> of queries?
>>
>> #######
>> # access: (http://query.wikidata.org
>> # or
>> # https://query.wikidata.org/bigdata/namespace/wdq/sparql?query=%7BSPARQL%7D.)
>> # contact: email, accountname, twittername etc
>> # bot: True/False
>> # .........
>> ######
>>
>>
>> This is already more detailed than what I had in mind. Having a way
>> to tell apart bots and tools from "organic" queries would already be
>> great. We are mainly looking for something that will help us to
>> understand sudden peaks of activity. For this, it might be enough to
>> have a short signature (a URL could be given, but a tool name with a
>> version would also be fine). This is somewhat like the "user agent"
>> field in HTTP.
>>
>> But you are right that some formatting convention may help further
>> here. How about this:
>>
>> #TOOL:<any user agent information that you like to share>
>>
>> Then one could look for comments of this form without knowing all
>> the tools upfront. Of course, this is just a hint in any case, since
>> one could always use the same comment in any manually written query.
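
On the log side, looking for comments of this form could be sketched as follows. The pattern is matched case-insensitively and tolerates spaces around the prefix, which is an assumption about how strictly tools will follow the convention; the function names, the "(untagged)" label, and the sample queries are invented for illustration.

```python
import re
from collections import Counter

# Matches a comment line such as '#TOOL:ExampleBot/0.1'.
TOOL_RE = re.compile(
    r"^[ \t]*#[ \t]*TOOL:[ \t]*(\S.*?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def tool_signature(query: str):
    """Return the text after '#TOOL:' from the first such line, or None."""
    match = TOOL_RE.search(query)
    return match.group(1) if match else None


def group_by_tool(queries) -> Counter:
    """Count queries per signature; untagged queries share one bucket."""
    return Counter(tool_signature(q) or "(untagged)" for q in queries)


# Invented sample log entries, for illustration only.
sample_log = [
    "#TOOL:ExampleBot/0.1\nSELECT * WHERE { ?s ?p ?o } LIMIT 1",
    "#Tool: ExampleBot/0.1\nASK {}",
    "SELECT ?x WHERE { ?x wdt:P31 wd:Q5 }",
]
by_tool = group_by_tool(sample_log)
print(by_tool)
```

As noted above, the result is only a hint, since any manually written query can carry the same comment.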
>>
>> Best regards,
>>
>> Markus
>>
>>
>> On Fri, Sep 30, 2016 at 4:00 PM, Markus Kroetzsch
>> <markus.kroetzsch@tu-dresden.de> wrote:
>>
>> Dear SPARQL users,
>>
>> We are starting a research project to investigate the use of the
>> Wikidata SPARQL Query Service, with the goal to gain insights that
>> may help to improve Wikidata and the query service [1]. Currently,
>> we are still waiting for all data to become available. Meanwhile, we
>> would like to ask for your input.
>>
>> Preliminary analyses show that the use of the SPARQL query service
>> varies greatly over time, presumably because power users and
>> software tools are running large numbers of queries. For a
>> meaningful analysis, we would like to understand such high-impact
>> biases in the data. We therefore need your help:
>>
>> (1) Are you a SPARQL power user who sometimes runs large numbers of
>> queries (over 10,000)? If so, please let us know how your queries
>> might typically look so we can identify them in the logs.
>>
>> (2) Are you the developer of a tool that launches SPARQL queries? If
>> so, then please let us know if there is any way to identify your
>> queries.
>>
>> If (1) or (2) applies to you, then it would be good if you could
>> include an identifying comment into your SPARQL queries in the
>> future, to make it easier to recognise them. In return, this would
>> enable us to provide you with statistics on the usage of your
>> tool [2].
>>
>> Further feedback is welcome.
>>
>> Cheers,
>>
>> Markus
>>
>>
>> [1] https://meta.wikimedia.org/wiki/Research:Understanding_Wikidata_Queries
>>
>> [2] Pending permission by the WMF. Like all Wikimedia usage data,
>> the query logs are under strict privacy protection, so we will need
>> to get clearance before sharing any findings with the public. We
>> hope, however, that there won't be any reservations against
>> publishing non-identifying information.
>>
>> --
>> Prof. Dr. Markus Kroetzsch
>> Knowledge-Based Systems Group
>> Faculty of Computer Science
>> TU Dresden
>> +49 351 463 38486
>> https://iccl.inf.tu-dresden.de/web/KBS/en
>>
>> _______________________________________________
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata