Re: [discovery] Completion suggester and pageviews

26 Jan 2016

On 26/01/2016 11:20, billinghurst wrote:
...
> I would think that 1/4 of
> searches for "po..." are not for pornhub, though I am not aware that
> such data is available.
Yes, this is the main problem we have today: the score is computed from
document metadata (size, templates, headings, incoming_links... and now
pageviews). Search usage is not part of the score: we suggest pages, not
search queries.
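
To make the point concrete, here is a minimal sketch of a score built only
from document metadata. The weights, the log scaling and the function names
are assumptions made up for illustration; this is not the formula described
on the wikitech page[1]. It only shows that the ranking for a prefix depends
on page properties and never on what people actually type.

```python
import math

# Hypothetical weights, chosen only for illustration (not the real params).
WEIGHTS = {
    "size": 0.1,
    "templates": 0.1,
    "headings": 0.1,
    "incoming_links": 0.3,
    "pageviews": 0.4,
}


def score(doc):
    """Weighted sum of log-scaled metadata; search usage plays no part."""
    return sum(w * math.log1p(doc.get(field, 0)) for field, w in WEIGHTS.items())


def suggest(prefix, docs, limit=3):
    """Return the titles matching the prefix, best-scored first."""
    wanted = prefix.lower()
    matches = [d for d in docs if d["title"].lower().startswith(wanted)]
    matches.sort(key=score, reverse=True)
    return [d["title"] for d in matches[:limit]]


# Made-up documents, not real statistics.
docs = [
    {"title": "Poland", "size": 200000, "templates": 40, "headings": 30,
     "incoming_links": 90000, "pageviews": 500000},
    {"title": "Potato", "size": 90000, "templates": 20, "headings": 25,
     "incoming_links": 6000, "pageviews": 120000},
    {"title": "Paris", "size": 180000, "templates": 35, "headings": 28,
     "incoming_links": 70000, "pageviews": 400000},
]

suggestions = suggest("po", docs)
```

With such a score, a heavily viewed page wins for a short prefix whether or
not users typing that prefix are looking for it.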

Another problem I have today is that I don't have a good method for
evaluating the quality of the formula.
I've added a small page on wikitech that describes the formula[1]. It is
the R script I use to briefly evaluate the score distribution before
testing on en-suggesty. Note that this page is not necessarily updated
with the latest params; gerrit[2] may contain more up-to-date params,
matching what you can see on en-suggesty.
Another data source I failed to use is term statistics from the
prefixsearch index[3]; they help to show the level of ambiguity of a
prefix according to its length.
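
As a rough illustration of what "ambiguity of a prefix according to its
length" can mean, the sketch below counts, for each prefix length, how many
titles share a prefix on average. The function name and the sample titles
are hypothetical, and it works on a plain list of titles rather than on the
term statistics from the Cirrus dump[3].

```python
from collections import Counter


def prefix_ambiguity(titles, max_len=5):
    """Map each prefix length to the mean number of titles per distinct prefix."""
    stats = {}
    for n in range(1, max_len + 1):
        # Only titles with at least n characters contribute a prefix of length n.
        counts = Counter(t[:n].lower() for t in titles if len(t) >= n)
        if counts:
            stats[n] = sum(counts.values()) / len(counts)
    return stats


# Made-up sample titles.
titles = ["Poland", "Portugal", "Pope", "Potato", "Paris", "Python"]
ambiguity = prefix_ambiguity(titles, max_len=3)
```

A high value at a given length means a prefix of that length still matches
many pages, so the score alone has to pick the winners.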

Any suggestions to improve the method and/or the formula are very welcome.

Thanks!

[1] https://wikitech.wikimedia.org/wiki/User:DCausse/Completion_Suggester_And_P…
[2] https://gerrit.wikimedia.org/r/#/c/265771/
[3] https://wikitech.wikimedia.org/wiki/User:DCausse/Term_Stats_With_Cirrus_Dump
