On Thu, Jul 9, 2020 at 4:52 PM Egon Willighagen <egon.willighagen@gmail.com> wrote:

Dear Guillaume,

On Thu, Jul 9, 2020 at 3:23 PM Guillaume Lederrey <glederrey@wikimedia.org> wrote:
Some very preliminary analysis indicates that less than 2% of the queries on WDQS generate more than 90% of the load. This is definitely something we need to better understand.

Is the data behind that available? I wonder if I recognize any of the top 25 queries.
No, the data isn't publicly available. Queries can (and do) contain private information, so we don't publish raw queries. We might publish a subset of those queries at some point, but only after having reviewed them manually to ensure they are clean.

(I guess the top 2% can be simple queries run very many times, as well as hard queries rarely run, correct?)

The analysis at this point is just on individual queries, with no aggregation of similar queries. This means that the queries in that 2% are individually very expensive, rather than cheap queries run many times. We need to refine that analysis, and aggregating similar queries is one of the things we should be working on.
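
To make the distinction concrete, here is a rough sketch in Python of the two views: the share of load coming from the most expensive individual queries, and the load per group of similar queries. It is not the analysis actually run on WDQS. The log format (query text plus a cost figure), the function names, and the normalization rules in `normalize` are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def top_share(costs, fraction):
    """Share of total cost contributed by the most expensive `fraction` of entries."""
    ranked = sorted(costs, reverse=True)
    total = sum(ranked)
    if not ranked or total == 0:
        return 0.0
    # Always count at least one entry so tiny samples still give a result.
    k = max(1, round(len(ranked) * fraction))
    return sum(ranked[:k]) / total


def normalize(query):
    """Hypothetical normalization: queries that differ only in whitespace,
    Wikidata item ids, or string literals map to the same template."""
    q = re.sub(r"\s+", " ", query.strip())
    q = re.sub(r"\bwd:Q\d+\b", "wd:Q?", q)
    q = re.sub(r'"[^"]*"', '"?"', q)
    return q


def aggregate(log):
    """Sum the cost of all (query, cost) pairs sharing a normalized template."""
    totals = defaultdict(float)
    for query, cost in log:
        totals[normalize(query)] += cost
    return dict(totals)


# Made-up sample data, only to show the shape of the input.
sample_log = [
    ("SELECT ?x WHERE { ?x wdt:P31 wd:Q5 }", 1.0),
    ("SELECT ?x  WHERE { ?x wdt:P31 wd:Q146 }", 2.0),
    ("ASK { wd:Q42 ?p ?o }", 97.0),
]

per_query = top_share([cost for _, cost in sample_log], 0.02)
per_template = top_share(list(aggregate(sample_log).values()), 0.02)
```

Running `top_share` on the raw costs matches the per-query view described above. Running it on the output of `aggregate` would show whether a cheap query repeated very many times also ends up among the main sources of load.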

