Just curious while we are on the topic: when you are inspecting the
headers to distinguish between "organic" queries and bot queries, would
it be possible to count how often a given set of properties is used
across the different queries? This would be a nice way to demonstrate to
the original external resources how "their" data is used and which
combinations of properties are used together with "their" properties
(e.g. P351 for NCBI Gene or P699 for the Disease Ontology). It would be
interesting to know, for example, how often those two properties are
used together in a single query.
Yes, we definitely want to do such analyses. The first task is to clean
up and group/categorize queries so we can get a better understanding (if
a property is used in 100K queries a day, it would still be nice to know
if they come from a single script or from many users).
Once we have this, we would like to analyse for content (which
properties and classes are used, etc.) but also for query features (how
many OPTIONALs, GROUP BYs, etc. are used). Ideas on what to analyse
further are welcome. Of course, SPARQL can only give a partial idea of
"usage", since Wikidata content can be used in ways that don't involve
SPARQL. Moreover, counting raw numbers of queries can also be
misleading: we have had cases where a single query result was discussed
by hundreds of people (e.g. the Panama papers query that made it to Le
Monde online), but in the logs it will still show up only as a single
query among millions.
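
The content and feature counts described above could be sketched roughly as follows. This is only an illustration, not the project's actual method: it assumes queries are available as plain strings, that properties are written with the usual prefixes (wdt:, p:, ps:, pq:), and that a regular expression is good enough in place of a real SPARQL parser. The function names and sample queries are made up.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: properties appear as prefixed names such as wdt:P351 or p:P699.
PROPERTY_RE = re.compile(r"\b(?:wdt|p|ps|pq):(P\d+)\b")
# Assumption: a keyword match is an acceptable stand-in for parsing the query.
FEATURE_RE = re.compile(r"\b(OPTIONAL|GROUP\s+BY|UNION|FILTER)\b", re.IGNORECASE)


def strip_comment_lines(query):
    """Drop lines that consist only of a '#' comment (trailing comments are kept)."""
    lines = [ln for ln in query.splitlines() if not ln.lstrip().startswith("#")]
    return "\n".join(lines)


def properties_in(query):
    """Return the set of property IDs (e.g. 'P351') mentioned in a query."""
    return set(PROPERTY_RE.findall(strip_comment_lines(query)))


def count_cooccurrence(queries):
    """Count, for each pair of properties, the number of queries using both."""
    pair_counts = Counter()
    for query in queries:
        # Sort numerically so each pair always has the same orientation.
        props = sorted(properties_in(query), key=lambda p: int(p[1:]))
        pair_counts.update(combinations(props, 2))
    return pair_counts


def count_features(queries):
    """Count keyword occurrences such as OPTIONAL or GROUP BY over all queries."""
    counts = Counter()
    for query in queries:
        for keyword in FEATURE_RE.findall(strip_comment_lines(query)):
            counts[" ".join(keyword.upper().split())] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample queries, for illustration only.
    sample = [
        "SELECT ?g WHERE { ?g wdt:P351 ?id . ?g wdt:P2293 ?d . ?d wdt:P699 ?doid }",
        "SELECT ?g WHERE { ?g wdt:P351 ?id . OPTIONAL { ?g wdt:P699 ?x } }",
        "# wdt:P699 only in a comment\nSELECT ?g WHERE { ?g wdt:P351 ?id } GROUP BY ?g",
    ]
    print(count_cooccurrence(sample))
    print(count_features(sample))
```

Counting each pair once per query (rather than once per occurrence) keeps a single long query from dominating the totals; grouping by script or user, as mentioned above, would still be needed before reading much into the numbers.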
Best,
Markus
On Fri, Sep 30, 2016 at 4:44 PM, Markus Kroetzsch
<markus.kroetzsch(a)tu-dresden.de> wrote:
On 30.09.2016 16:18, Andra Waagmeester wrote:
Would it help if I add the following header to every large batch
of queries?
#######
# access: (http://query.wikidata.org or
#   https://query.wikidata.org/bigdata/namespace/wdq/sparql?query={SPARQL})
# contact: email, accountname, twittername etc
# bot: True/False
# .........
######
This is already more detailed than what I had in mind. Having a way
to tell apart bots and tools from "organic" queries would already be
great. We are mainly looking for something that will help us to
understand sudden peaks of activity. For this, it might be enough to
have a short signature (a URL could be given, but a tool name with a
version would also be fine). This is somewhat like the "user agent"
field in HTTP.
But you are right that some formatting convention may help further
here. How about this:
#TOOL:<any user agent information that you like to share>
Then one could look for comments of this form without knowing all
the tools upfront. Of course, this is just a hint in any case, since
one could always use the same comment in any manually written query.
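
As a rough illustration of the #TOOL convention suggested above, a tool could prepend the comment to each query, and a log analysis could look for it without knowing the tools upfront. This is a minimal sketch under assumptions: the helper names are made up, "example-bot/0.1" is a hypothetical signature, and the comment is assumed to sit on a line of its own.

```python
import re
from collections import Counter

# Matches a comment line of the form "#TOOL:<free text>" on its own line.
TOOL_RE = re.compile(r"^[ \t]*#[ \t]*TOOL:[ \t]*(\S.*?)[ \t]*$", re.MULTILINE)


def tag_query(query, tool):
    """Prepend a #TOOL comment line to a SPARQL query string."""
    return f"#TOOL:{tool}\n{query}"


def tool_of(query):
    """Return the tool signature found in a query, or None if there is none."""
    match = TOOL_RE.search(query)
    return match.group(1) if match else None


def count_by_tool(queries):
    """Group queries by tool signature; untagged queries share one bucket."""
    return Counter(tool_of(q) or "(untagged)" for q in queries)


if __name__ == "__main__":
    plain = "SELECT * WHERE { ?s ?p ?o } LIMIT 1"
    tagged = tag_query(plain, "example-bot/0.1")  # hypothetical tool name
    print(tagged)
    print(count_by_tool([plain, tagged, tagged]))
```

Since the comment is only a hint, anything counted this way is a lower bound on tool traffic and may include hand-written queries that reuse the same comment.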
Best regards,
Markus
On Fri, Sep 30, 2016 at 4:00 PM, Markus Kroetzsch
<markus.kroetzsch(a)tu-dresden.de> wrote:
Dear SPARQL users,

We are starting a research project to investigate the use of the
Wikidata SPARQL Query Service, with the goal of gaining insights that
may help to improve Wikidata and the query service [1]. Currently, we
are still waiting for all data to become available. Meanwhile, we would
like to ask for your input.

Preliminary analyses show that the use of the SPARQL query service
varies greatly over time, presumably because power users and software
tools are running large numbers of queries. For a meaningful analysis,
we would like to understand such high-impact biases in the data. We
therefore need your help:

(1) Are you a SPARQL power user who sometimes runs large numbers of
queries (over 10,000)? If so, please let us know what your queries
typically look like so we can identify them in the logs.

(2) Are you the developer of a tool that launches SPARQL queries? If
so, then please let us know if there is any way to identify your
queries.

If (1) or (2) applies to you, then it would be good if you could
include an identifying comment in your SPARQL queries in the future, to
make it easier to recognise them. In return, this would enable us to
provide you with statistics on the usage of your tool [2].

Further feedback is welcome.

Cheers,
Markus

[1] https://meta.wikimedia.org/wiki/Research:Understanding_Wikidata_Queries
[2] Pending permission by the WMF. Like all Wikimedia usage data, the
query logs are under strict privacy protection, so we will need to get
clearance before sharing any findings with the public. We hope, however,
that there won't be any reservations against publishing non-identifying
information.

--
Prof. Dr. Markus Kroetzsch
Knowledge-Based Systems Group
Faculty of Computer Science
TU Dresden
+49 351 463 38486
https://iccl.inf.tu-dresden.de/web/KBS/en
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata