Re: [discovery] Cirrus request logs are now in hive

2 Dec 2015

OK, I'll try to work on a small doc to describe the features available 
in cirrus from the query parser to fallback methods.
I'll try to map some query classes to each of these features and see 
what needs to be added by cirrus to these logs.

But I think we can already start to classify some queries with the UDF 
to detect special syntax (AND/OR/NOT) and the number of results.
The work on pageviews directly addresses a class of queries that are 
identifiable with the data available in these logs today, e.g.:

- One-word queries with more than X results will be directly affected by 
the addition of pageviews in the ranking.
- Queries of 2 words or more with more than X results will also be 
affected, but another feature related to word proximity can take precedence.

I still don't know what makes sense for X but it's a minimum of 20 (we 
display 20 results by default).
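
To make the classes above concrete, here is a minimal sketch in Python. It is only an illustration, not the UDF itself: the function name, the class labels, and the regex are all hypothetical, and X is set to 20 only because that is the minimum mentioned above.

```python
import re

# Hypothetical threshold: the thread only says X is at least 20
# (the default number of results displayed).
MIN_RESULTS_X = 20

# Assumed check for special syntax: uppercase AND/OR/NOT as standalone
# tokens. The real UDF may detect more than this.
_SPECIAL_SYNTAX = re.compile(r"\b(?:AND|OR|NOT)\b")


def classify_query(query, total_hits, x=MIN_RESULTS_X):
    """Return a coarse, illustrative class label for one logged query."""
    if _SPECIAL_SYNTAX.search(query):
        return "special_syntax"
    words = query.split()
    if not words:
        return "empty"
    if total_hits <= x:
        # Not "more than X results": outside the classes discussed above.
        return "few_results"
    if len(words) == 1:
        # Expected to be directly affected by pageviews in the ranking.
        return "one_word_many_results"
    # Also affected, but word proximity may take precedence.
    return "multi_word_many_results"
```

For example, `classify_query("barack obama", 1500)` falls into the multi-word class, while the same query with 20 hits or fewer is left out of both classes.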

On 02/12/2015 15:46, Oliver Keyes wrote:
...
 Well, the query classification Mikhail was suggesting involved adding
 data to the logs. So in and of itself, this does not help. But it is a
 fantastic achievement, and I am looking forward to switching our data
 collection scripts over to using this.

 On 2 December 2015 at 09:30, David Causse <dcausse(a)wikimedia.org> wrote:
  Hi,

 The work started by Erik a few months ago is finally done. Cirrus requests are
 now available in the hive table wmf_raw.CirrusSearchRequestSet.

 I really hope this will help us to understand the kind of queries we are
 serving and start to work on query classification as Mikhail suggested.

 David.

 _______________________________________________
 discovery mailing list
 discovery(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/discovery 
