Cirrus request logs are now in hive

David Causse

2 Dec 2015

2:30 p.m.

Hi,

The work Erik started a few months ago is finally done. Cirrus requests are now available in the Hive table wmf_raw.CirrusSearchRequestSet.

I really hope this will help us to understand the kind of queries we are serving and start to work on query classification as Mikhail suggested.

David.

Oliver Keyes

2 Dec 2015

2:46 p.m.

Well, the query classification Mikhail was suggesting involved adding data to the logs. So in and of itself, this does not help. But it is a fantastic achievement, and I am looking forward to switching our data collection scripts over to using this.

On 2 December 2015 at 09:30, David Causse dcausse@wikimedia.org wrote:

...

discovery mailing list discovery@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/discovery

-- Oliver Keyes Count Logula Wikimedia Foundation

David Causse

3:20 p.m.

OK, I'll try to write a small doc describing the features available in Cirrus, from the query parser to the fallback methods. I'll try to map some query classes to each of these features and see what Cirrus needs to add to these logs.

But I think we can already start classifying some queries using the UDF that detects special syntax (AND/OR/NOT) and using the number of results. The work on pageviews directly addresses a class of queries that can be identified with the data available in these logs today, e.g.:

- One-word queries with more than X results will be directly affected by the addition of pageviews to the ranking.
- Queries of 2 or more words with more than X results will also be affected, but another feature related to word proximity can take precedence.

I still don't know what makes sense for X but it's a minimum of 20 (we display 20 results by default).
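
To make the two classes above concrete, here is a minimal Python sketch of such a classification. Everything in it is an assumption made for illustration: the function name `classify_query`, the class labels, the regex used to spot AND/OR/NOT (it is not the UDF mentioned above), and the default X = 20, which is only the lower bound given here.

```python
import re

# Hypothetical threshold: the thread only says X is at least 20.
DEFAULT_X = 20

# Assumption: "special syntax" means a standalone uppercase AND/OR/NOT token.
# This is a stand-in for illustration, not the actual Hive UDF.
SPECIAL_SYNTAX = re.compile(r"\b(AND|OR|NOT)\b")


def classify_query(query: str, hits: int, x: int = DEFAULT_X) -> str:
    """Assign a query to one of a few illustrative classes."""
    if SPECIAL_SYNTAX.search(query):
        return "special_syntax"
    words = query.split()
    # Only queries returning more than X results fall in the pageview classes.
    if not words or hits <= x:
        return "other"
    if len(words) == 1:
        return "one_word_many_results"
    return "multi_word_many_results"
```

For example, `classify_query("paris", 150)` falls in the one-word class, while `classify_query("eiffel tower", 150)` falls in the multi-word class, where word proximity may take precedence over pageviews.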

On 02/12/2015 at 15:46, Oliver Keyes wrote:

...

