Thank you for all your suggestions, very inspiring!
[response inline]
On 10/02/2016 21:33, Justin Ormont wrote:
Good hits on page two:
There are a few cases where good results could exist only on page two.
One case is a query that uses the wrong homophone or is otherwise
misspelled. E.g. "their red hot" instead of "they're red hot"
(expected result <https://en.wikipedia.org/wiki/They%27re_Red_Hot> --
wikipedia
<https://en.wikipedia.org/w/index.php?search=their+red+hot&title=Special%3ASearch>
(pos 22), google
<https://www.google.com/search?q=their+red+hot&oq=their+red+hot> (pos 1),
bing <https://www.bing.com/search?q=their+red+hot> (pos 2), ddg
<https://duckduckgo.com/?q=their+red+hot> (pos 2)).
Indeed, we do a pretty bad job on this kind of query, and I still
don't know how to address it correctly. We don't use any synonym
resources yet. This is usually handled by the list of curated
redirects; in this example we're only able to catch "theyre red hot",
and we fail for their/there/....
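
As a rough illustration of what a synonym resource could do for the
their/there case, the sketch below expands a query into its homophone
variants, which could then be checked against the curated redirects.
The HOMOPHONES table and both function names are hypothetical; nothing
like this exists in our current setup.

```python
from itertools import product

# Hypothetical, hand-written homophone groups; a real list would have
# to come from a curated resource.
HOMOPHONES = [
    {"their", "there", "they're"},
    {"your", "you're"},
]

def alternatives(word):
    """Return the word itself first, then its homophones in sorted order."""
    for group in HOMOPHONES:
        if word in group:
            return [word] + sorted(group - {word})
    return [word]

def query_variants(query):
    """Yield every query obtained by swapping words for their homophones."""
    words = query.lower().split()
    for combo in product(*(alternatives(w) for w in words)):
        yield " ".join(combo)

variants = list(query_variants("their red hot"))
```

The number of variants grows multiplicatively with each ambiguous word,
so a real version would need a cap on how many it tries.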
Another case is when you get an exact string match on incorrect pages,
but only a non-exact string match on the correct page. E.g. "Cities in
the San Francisco Bay Area" (expected result
<https://en.wikipedia.org/wiki/List_of_cities_and_towns_in_the_San_Francisco_Bay_Area>
-- wikipedia
<https://en.wikipedia.org/w/index.php?title=Special:Search&search=Cities+in+the+San+Francisco+Bay+Area>
(pos 122), google
<https://www.google.com/search?q=Cities+in+the+San+Francisco+Bay+Area>
(pos 1), bing
<https://www.bing.com/search?q=Cities+in+the+San+Francisco+Bay+Area>
(pos 1), ddg
<https://duckduckgo.com/?q=Cities+in+the+San+Francisco+Bay+Area> (pos 1)).
This pattern occurs mostly for navigational queries (only one correct
result). For exploratory queries, odds are that one of the relevant
results will be on page 1.
There are a couple of less direct cases. For instance, if/once you
integrate a popularity score, freshness score, importance score, page
query score, or personalization (e.g. ranking by physical distance
from the user or by the user's interests), you'll find some examples
where incorrect results are unhelpfully boosted.
You're completely right, and this is exactly the case here. We always
rescore the top 8000 documents (per node) with the number of incoming
links (which is far from ideal). With all the top-N rescoring features
disabled, the expected result is now #2:
https://en.wikipedia.org/w/index.php?search=Cities+in+the+San+Francisco+Bay…
We don't do anything smart here: it's always the same plan, whatever
the query is...
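
To make the effect concrete, here is a minimal sketch of a top-N
rescore by incoming links. The boost formula, the link_weight
parameter, and the sample scores are all assumptions made up for
illustration; this is not our actual rescoring function.

```python
import math

def rescore_top_n(hits, window, link_weight):
    """Rescore the first `window` hits with an incoming-link boost.

    `hits` is a list of (title, text_score, incoming_links) tuples,
    already sorted by text_score descending. The multiplicative log
    boost is an assumed formula, chosen only for illustration.
    """
    head, tail = hits[:window], hits[window:]
    rescored = sorted(
        head,
        key=lambda h: h[1] * (1.0 + link_weight * math.log1p(h[2])),
        reverse=True,
    )
    return rescored + tail

# Made-up scores: the correct page has the best text match but few links.
hits = [
    ("List page (correct result)", 10.0, 50),
    ("Popular page, partial match", 8.0, 20000),
    ("Another popular page", 7.5, 15000),
]
boosted = [h[0] for h in rescore_top_n(hits, window=8000, link_weight=0.5)]
plain = [h[0] for h in rescore_top_n(hits, window=8000, link_weight=0.0)]
```

With the boost on, the two heavily linked pages overtake the page with
the best text match; with link_weight set to 0 the text order is kept.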
Investigating queries that lead to clicks on page two may turn up
interesting things.
--
Knowing the SAT/DSAT click rate by position will tell you whether good
clicks often occur beyond position 10. Then running an experiment with
10 SERP results vs. 20 SERP results may give interesting insights when
watching a session-success-rate metric (and maybe a time-to-success
metric). That is, checking whether a click on position 11+ is ever
useful, or just leads to a requery or abandonment. If you run
result-size experiments, you can normalize for the query latency
effects by generating 20 results and displaying 10.
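
The click-rate-by-position measurement could be sketched as follows.
The (position, satisfied) log format and both function names are
hypothetical, and how a click gets labeled SAT or DSAT (for example
from dwell time) is left out.

```python
from collections import defaultdict

def sat_rate_by_position(clicks):
    """Return {position: (sat_clicks, total_clicks, sat_rate)}.

    `clicks` is a list of (position, satisfied) pairs, where
    `satisfied` is a boolean label computed elsewhere.
    """
    sat = defaultdict(int)
    total = defaultdict(int)
    for position, satisfied in clicks:
        total[position] += 1
        if satisfied:
            sat[position] += 1
    return {p: (sat[p], total[p], sat[p] / total[p]) for p in sorted(total)}

def sat_share_beyond(clicks, cutoff=10):
    """Fraction of all satisfied clicks that land past position `cutoff`."""
    sat_positions = [p for p, ok in clicks if ok]
    if not sat_positions:
        return 0.0
    return sum(1 for p in sat_positions if p > cutoff) / len(sat_positions)

# Made-up sample log, not real data.
sample = [(1, True), (1, False), (2, True), (11, True), (12, False), (22, True)]
rates = sat_rate_by_position(sample)
share = sat_share_beyond(sample)
```

A high share of satisfied clicks past the cutoff would suggest that
good results are regularly stranded on page two.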
The need to scroll can make the listed click rates fall off faster. In
my web browser, as it's currently sized, there are only three results
above the fold (my open advanced facet block takes a lot of space;
scrolling is required for result 4+). Knowing whether, and by how
much, the click rate drops for results below the fold will also help
optimize the number of results to display, snippet length, and UI
design. You could instrument the number of results above the fold.
--
Side note: possible bug, I can't find the page "List of New York
University alumni
<https://en.wikipedia.org/wiki/List_of_New_York_University_alumni>"
when querying "New York University alumni
<https://en.wikipedia.org/w/index.php?search=New+York+University+alumni&title=Special%3ASearch&go=Go>"
(screenshot <https://imgur.com/SymW9tv>).
Yes... I can usually find params that tweak the query and push good
results near the top, but not here...
I'll have to dig into more details to see what's going on; the best I
can do is a rank around pos 120 :(
Thank you very much for your help!
--
David