The best I can offer you, Felix, is a very small subset of queries that have been manually reviewed for release. These queries are in our result grading platform, Discernatron.  You will need to first log in at and then visit This will output a list of graded query results, from which you can extract the individual queries that were used. You may be able to use these scores for an nDCG calculation. Unfortunately, the list of graded queries is very small: there are 95 unique queries and 4219 scored result pages.
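
In case it helps with the nDCG calculation, here is a minimal Python sketch. It is only an illustration under a few assumptions: the grades are non-negative numbers where higher means more relevant, you pass them in the order your ranker returned the results, and the ideal ordering is computed from the graded results for that query only. The function names `dcg` and `ndcg` are my own, and the exponential gain (2^grade - 1) is just one common formulation, so adapt it to whatever your thesis uses.

```python
import math


def dcg(grades, k=None):
    """Discounted cumulative gain for grades listed in ranked order.

    Uses the exponential-gain variant: (2^grade - 1) / log2(rank + 1),
    with ranks starting at 1.
    """
    top = grades if k is None else grades[:k]
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(top))


def ndcg(grades, k=None):
    """Normalize DCG by the DCG of the best possible ordering.

    Assumption: the ideal ordering is built only from the grades passed in,
    i.e. ungraded results are not considered.
    """
    ideal = dcg(sorted(grades, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant results at all for this query
    return dcg(grades, k) / ideal


if __name__ == "__main__":
    # Hypothetical grades for one query, in the order a ranker returned them.
    print(ndcg([1, 3, 0, 2], k=3))
```

Averaging `ndcg` over all graded queries gives a single score per ranker, which you can then compare between the current search and your own algorithms.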

On Wed, Aug 17, 2016 at 9:51 AM, Pine W <> wrote:

Hi Felix,

There was recently a discussion about releasing raw queries, and WMF decided not to release them for privacy reasons. Personally, I support that decision because the risks seem to far outweigh the benefits. The staff from Discovery may be able to provide you with more detail or alternatives, but I would say that the odds of releasing raw data from are low.

Sometimes WMF allows access to sensitive data if an NDA is signed. In this case, I feel that the risks are too high even for that to be allowed. That's a personal opinion only; the official answer will come from WMF.


On Aug 17, 2016 08:39, "Tilman Bayer" <> wrote:
CCing the WMF Search and Discovery mailing list
( )

On Wed, Aug 17, 2016 at 6:00 AM, Felix Engelmann
<> wrote:
> Hi everybody,
> I’m currently writing my bachelor thesis at University Koblenz, Germany. The goal is to improve Wikipedia search by exploiting the text structure of Wikipedia articles. To conduct unbiased user studies I need real-world queries so I can compare the novel algorithms against the currently used ones. Are there any existing query logs which I can use for this purpose?
> Thanks for your help!
> Felix Engelmann
> _______________________________________________
> Wiki-research-l mailing list

Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB

discovery mailing list
