Re: [Wikidata] Wikidata SPARQL query logs available

23 Aug 2018

On 23/08/18 23:10, Stas Malyshev wrote:
...
> Hi!
>
> On 8/23/18 2:07 PM, Daniel Mietchen wrote:
>> On Thu, Aug 23, 2018 at 10:44 PM <fn(a)imm.dtu.dk> wrote:
>>> I was wondering why our research section was number 8. Then I recalled
>>> our dashboard running from
>>> "http://people.compute.dtu.dk/faan/cognitivesystemswikidata1.html". It
>>> updates about every 3 minutes all day long...
>>
>> Such automated queries should not be in the organic query file that I
>> looked at.
>
> If it's a browser page and the underlying code does not set a distinctive
> user agent, I think they will be. It'd be hard to identify such cases
> otherwise (ccing Markus in case he knows more on the topic).
Yes, the "organic" file is a subset of the queries from agents that 
pretended to be a browser. We filtered agents and query patterns that 
were clearly not "human-like" but a tool that asks one query every 3 min 
would not be recognised at this level.

Such a tool would also not strongly affect most statistics, but it can
affect statistics that have an extremely high number of possible values
(e.g., items used in the query). In such cases, normal "organic" traffic
is usually so diverse that no individual value receives much prominence,
so even a rather small number of queries from one source can have an
impact.
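
To make the effect concrete, here is a toy simulation with made-up
numbers. The 10,000 organic queries, the 5,000 distinct items and the
"Q_dashboard" label are all hypothetical, not figures from the published
logs. A source that queries once every 3 minutes produces
24 * 60 / 3 = 480 queries a day. In this toy log that is under 5% of all
queries, yet it ranks first by item frequency, because no organic item
appears in more than 2 queries.

```python
from collections import Counter


def item_counts(num_organic: int, num_distinct: int, interval_min: int) -> Counter:
    """Count how often each item appears in a toy query log.

    Hypothetical model: organic queries are spread evenly over
    `num_distinct` items; one automated source mentions the same item
    once every `interval_min` minutes for 24 hours.
    """
    organic = [f"Q{i % num_distinct}" for i in range(num_organic)]
    automated = ["Q_dashboard"] * (24 * 60 // interval_min)
    return Counter(organic + automated)


counts = item_counts(num_organic=10_000, num_distinct=5_000, interval_min=3)
top_item, top_count = counts.most_common(1)[0]
total = sum(counts.values())
print(top_item, top_count)          # Q_dashboard 480
print(f"{top_count / total:.1%}")   # 4.6%
```

The more evenly organic traffic is spread over possible values, the fewer
queries a single source needs to dominate the ranking.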

In general, popularity measures based on query traffic, even on the 
organic part, must be taken with caution, because of the many effects 
that lead to skewed query volumes from a particular source (without this 
necessarily indicating real "popularity"). It is an open question how 
one should best evaluate the traffic in the presence of these skews. Our 
two-class system of "robotic" [massively skewed] and "organic" [less 
skewed] is only a first step there.

Best,

Markus

...
