On Thu, Jul 9, 2020 at 4:52 PM Egon Willighagen egon.willighagen@gmail.com wrote:
Dear Guillaume,
On Thu, Jul 9, 2020 at 3:23 PM Guillaume Lederrey glederrey@wikimedia.org wrote:
Some very preliminary analysis indicates that less than 2% of the queries on WDQS generate more than 90% of the load. This is definitely something we need to better understand.
Is the data behind that available? I wonder if I recognize any of the top 25 queries.
No, the data isn't publicly available. Queries can (and do) contain private information, so we don't publish raw queries. We might publish a subset of those queries at some point, but only after having reviewed them manually to ensure they are clean.
(I guess the top 2% can be simple queries run very many times, as well as
hard queries rarely run, correct?)
The analysis at this point is just on individual queries, with no aggregation of similar queries. This means that the 2% consists of queries that are individually very expensive, rather than cheap queries run many times. We need to refine that analysis, and aggregating similar queries is one of the things we should be working on.
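To illustrate the difference between a per-query analysis and one that aggregates similar queries, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the log entries and costs are invented, and `load_share_of_top`, `normalize`, and `aggregate_by_shape` are hypothetical helpers, not part of any WDQS tooling. The actual analysis pipeline is not described in this thread.

```python
import re
from collections import defaultdict

# Hypothetical log entries: (query text, cost in milliseconds). Not real WDQS data.
LOG = [
    ("SELECT ?x WHERE { ?x wdt:P31 wd:Q5 } LIMIT 10", 5),
    ("SELECT ?x WHERE { ?x wdt:P31 wd:Q146 } LIMIT 10", 5),
    ("SELECT ?x WHERE { ?x wdt:P31 wd:Q515 } LIMIT 10", 5),
    ("SELECT ?x WHERE { ?x wdt:P279* ?y }", 900),
]


def load_share_of_top(costs, fraction):
    """Share of total cost contributed by the most expensive `fraction` of entries."""
    ranked = sorted(costs, reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    return sum(ranked[:top_n]) / sum(ranked)


def normalize(query):
    """Crude shape key (an assumption): mask entity IDs and standalone numbers."""
    shape = re.sub(r"wd:Q\d+", "wd:Q?", query)
    shape = re.sub(r"\b\d+\b", "N", shape)
    return " ".join(shape.split())


def aggregate_by_shape(log):
    """Sum count and cost over all queries that share the same normalized shape."""
    totals = defaultdict(lambda: {"count": 0, "cost": 0})
    for query, cost in log:
        entry = totals[normalize(query)]
        entry["count"] += 1
        entry["cost"] += cost
    return dict(totals)


if __name__ == "__main__":
    costs = [cost for _, cost in LOG]
    print("share of load from top 25% of individual queries:",
          load_share_of_top(costs, 0.25))
    for shape, stats in aggregate_by_shape(LOG).items():
        print(stats["count"], stats["cost"], shape)
```

Ranking individual entries only surfaces queries that are expensive on their own. Grouping by a normalized shape would additionally surface cheap queries whose combined cost is high because they are run very many times, which is the case raised in the question above.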
Egon
--
Hi, do you like citation networks? Already 51% of all citations are available (https://i4oc.org/) for innovative new uses (https://twitter.com/hashtag/acs2ioc). Join me in asking the American Chemical Society to join the Initiative for Open Citations too (https://www.change.org/p/asking-the-american-chemical-society-to-join-the-initiative-for-open-citations). SpringerNature, the RSC and many others already did (https://i4oc.org/#publishers).
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: https://www.zotero.org/egonw
ORCID: 0000-0001-7542-0286 (http://orcid.org/0000-0001-7542-0286)
ImpactStory: https://impactstory.org/u/egonwillighagen
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata