Re: [discovery] Completion suggester and pageviews

26 Jan 2016


On 26/01/2016 11:20, billinghurst wrote:
...
I would think that 1/4 of
searches for "po..." are not for pornhub, though I am not aware that
such data is available.
Yes, that is the main problem we have today: the score is computed from document
metadata (size, templates, headings, incoming_links... and now pageviews).
Search usage is not part of the score: we suggest pages, not search queries.
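To make the idea of a metadata-only score concrete, here is a minimal sketch. It is not the formula from [1]: the weights, the saturation caps, and the names `log_scale` and `page_score` are all hypothetical, chosen only to show how several document signals could be combined without any search-usage input.

```python
import math

# Hypothetical weights and saturation caps, for illustration only.
# The real parameters are the ones described in [1] and [2].
WEIGHTS = {"incoming_links": 0.4, "pageviews": 0.4, "size": 0.1, "headings": 0.1}
CAPS = {"incoming_links": 100_000, "pageviews": 1_000_000, "size": 500_000, "headings": 50}


def log_scale(value, cap):
    """Map a raw count to [0, 1] on a log scale, saturating at cap."""
    value = max(0, min(value, cap))
    return math.log1p(value) / math.log1p(cap)


def page_score(metadata):
    """Weighted sum of log-scaled document signals; missing signals count as 0."""
    return sum(
        weight * log_scale(metadata.get(name, 0), CAPS[name])
        for name, weight in WEIGHTS.items()
    )
```

Because the score depends only on the page, a very popular page wins for every prefix it matches, regardless of what people actually type after that prefix.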
Another problem I have today is that I don't have any good method to 
evaluate the quality of the formula.
I've added a small page on wikitech that describes the formula[1]. It's 
the R script I use to briefly evaluate the score distribution before 
testing on en-suggesty. Note that this page is not necessarily updated 
with the latest params; gerrit[2] may contain more up-to-date params, 
matching what you can see on en-suggesty.
Another piece of data I have not managed to use is term statistics from the 
prefixsearch index[3]; it helps to see the level of ambiguity of a prefix 
according to its length.
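As a rough illustration of what "ambiguity of a prefix according to its length" could mean, the sketch below counts how many titles share each prefix of a given length. It works on a plain in-memory list of titles rather than on the prefixsearch index statistics from [3], and the function name `prefix_ambiguity` is hypothetical.

```python
from collections import Counter


def prefix_ambiguity(titles, max_len=5):
    """For each prefix length n, return the average number of titles per distinct prefix."""
    result = {}
    for n in range(1, max_len + 1):
        # Only titles at least n characters long have a prefix of length n.
        counts = Counter(title[:n].lower() for title in titles if len(title) >= n)
        if counts:
            result[n] = sum(counts.values()) / len(counts)
    return result


titles = ["Poland", "Portugal", "Pope", "Paris"]
ambiguity = prefix_ambiguity(titles, max_len=3)
```

A high value at a given length means a prefix of that length matches many pages on average, so page-level signals alone have less chance of picking the page the user meant.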
Any suggestions to improve the method and/or the formula are very welcome.
Thanks!
[1] https://wikitech.wikimedia.org/wiki/User:DCausse/Completion_Suggester_And_Pa...
[2] https://gerrit.wikimedia.org/r/#/c/265771/
[3] https://wikitech.wikimedia.org/wiki/User:DCausse/Term_Stats_With_Cirrus_Dump
