On Thu, Jul 9, 2020 at 4:52 PM Egon Willighagen <egon.willighagen(a)gmail.com> wrote:
On Thu, Jul 9, 2020 at 3:23 PM Guillaume Lederrey <glederrey(a)wikimedia.org> wrote:
Some very preliminary analysis indicates that less than 2% of the queries
on WDQS generate more than 90% of the load. This is definitely something we
need to better understand.
Is the data behind that available? I wonder if I recognize any of the top
No, the data isn't publicly available. Queries can (and do) contain private
information, so we don't publish raw queries. We might publish a subset of
those queries at some point, but only after having reviewed them manually
to ensure they are clean.
(I guess the top 2% can be simple queries run very many times, as well as
hard queries rarely run, correct?)
The analysis at this point is just on individual queries, with no
aggregation of similar queries. This means that the 2% are individually
very expensive queries, not cheap queries run many times. We need to refine
that analysis, and aggregating similar queries is one of the things we
should be working on.
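
To illustrate the difference between per-query and aggregated analysis, here is a minimal sketch in Python. It is not the actual WDQS analysis: the log format (pairs of query text and execution time in seconds), the sample data, and the helpers `normalize`, `aggregate` and `share_needed_for_load` are all hypothetical, and the normalization is deliberately crude.

```python
import re
from collections import defaultdict


def normalize(query: str) -> str:
    """Reduce a query to a rough template (hypothetical, crude heuristic)."""
    q = re.sub(r'"[^"]*"', '"?"', query)  # replace string literals
    q = re.sub(r"\bQ\d+\b", "Q?", q)      # replace item IDs such as Q42
    return re.sub(r"\s+", " ", q).strip()  # collapse whitespace


def aggregate(log):
    """Sum execution time per normalized query template."""
    totals = defaultdict(float)
    for query, seconds in log:
        totals[normalize(query)] += seconds
    return dict(totals)


def share_needed_for_load(costs, target=0.9):
    """Fraction of entries, most expensive first, needed to reach
    `target` share of the total cost."""
    ordered = sorted(costs, reverse=True)
    total = sum(ordered)
    running = 0.0
    for i, cost in enumerate(ordered, start=1):
        running += cost
        if running >= target * total:
            return i / len(ordered)
    return 1.0


# Made-up log: one expensive query, plus a cheap query run 1000 times
# with a different item ID each time.
log = [("SELECT ?x WHERE { ?x wdt:P31 wd:Q5 }", 50.0)]
log += [
    ("SELECT ?l WHERE { wd:Q%d rdfs:label ?l }" % i, 0.1)
    for i in range(1, 1001)
]

per_query = [seconds for _, seconds in log]
per_template = aggregate(log)

print("share of individual queries for 90% of load:",
      share_needed_for_load(per_query))
print("most expensive template:",
      max(per_template, key=per_template.get))
```

In this made-up log, the single expensive query tops the per-query ranking, while the cheap label lookup becomes the heaviest template once similar queries are grouped. That is the case a per-query analysis cannot see.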
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
ORCID: 0000-0001-7542-0286 <http://orcid.org/0000-0001-7542-0286>
Wikidata mailing list
Engineering Manager, Search Platform
UTC+1 / CET