On Thu, Jul 9, 2020 at 4:52 PM Egon Willighagen <egon.willighagen(a)gmail.com> wrote:
Dear Guillaume,
On Thu, Jul 9, 2020 at 3:23 PM Guillaume Lederrey <glederrey(a)wikimedia.org> wrote:
Some very preliminary analysis indicates that less than 2% of the queries
on WDQS generate more than 90% of the load. This is definitely something we
need to understand better.
Is the data behind that available? I wonder if I recognize any of the top
25 queries.
No, the data isn't publicly available. Queries can (and do) contain private
information, so we don't publish raw queries. We might publish a subset of
those queries at some point, but only after having reviewed them manually
to ensure they are clean.
(I guess the top 2% can be simple queries run very many times, as well as
hard queries rarely run, correct?)
The analysis at this point is only on individual queries, with no
aggregation of similar queries. This means that each query in this 2% is
very expensive on its own. We need to refine that analysis, and aggregating
similar queries is one of the things we should be working on.
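To make the difference between per-query and aggregated analysis concrete, here is a minimal sketch, assuming a query log available as (query text, cost) pairs. The names `top_share`, `normalize` and `aggregate` are hypothetical, and the normalization rule (masking double-quoted string literals and standalone integers) is an illustrative assumption, not how WDQS analysis is actually done.

```python
import math
import re
from collections import defaultdict


def top_share(costs, fraction=0.02):
    """Share of total cost produced by the most expensive `fraction` of entries."""
    if not costs:
        return 0.0
    ordered = sorted(costs, reverse=True)
    k = max(1, math.ceil(len(ordered) * fraction))  # at least one entry
    total = sum(ordered)
    return sum(ordered[:k]) / total if total else 0.0


def normalize(query):
    """Hypothetical query "shape": mask string literals and standalone integers,
    then collapse whitespace. Digits inside identifiers such as wd:Q5 are kept."""
    q = re.sub(r'"[^"]*"', '"?"', query)
    q = re.sub(r"\b\d+\b", "?", q)
    return " ".join(q.split())


def aggregate(log):
    """Sum cost per normalized query shape; `log` is a list of (query, cost) pairs."""
    totals = defaultdict(float)
    for query, cost in log:
        totals[normalize(query)] += cost
    return dict(totals)
```

Running `top_share` on raw per-query costs matches the current analysis, where only individually expensive queries show up at the top. Running it on the values of `aggregate(log)` instead would also surface cheap queries that are run very many times, the case raised in the question above.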
Egon
--
Hi, do you like citation networks? Already 51% of all citations are
available <https://i4oc.org/> for innovative new uses
<https://twitter.com/hashtag/acs2ioc>. Join me in asking the American
Chemical Society to join the Initiative for Open Citations too
<https://www.change.org/p/asking-the-american-chemical-society-to-join-the-initiative-for-open-citations>.
SpringerNature, the RSC and many others already did <https://i4oc.org/#publishers>.
-----
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: https://www.zotero.org/egonw
ORCID: 0000-0001-7542-0286 <http://orcid.org/0000-0001-7542-0286>
ImpactStory: https://impactstory.org/u/egonwillighagen
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
--
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+1 / CET