Hi Everyone,
Mikhail recently did a nice analysis of the relationship between various
features of queries as strings and zero results rates (ZRR),[1] and I took
a quick look at the two most impactful features, quotes and question marks,
and how ZRR would be affected on enwiki if they were stripped out of poorly
performing queries (< 3 results).[2]
Now I've looked at all queries (including zero-results queries, poorly
performing queries, *and* all other queries) for the top 10 Wikipedias by
search traffic[3] as part of a Phab ticket[4], which was prompted by the
unexpected results for queries like *How old is tom cruise?*
Most queries ending in question marks seem to be genuine questions, but they
give unexpected results (or no results) because ? is treated as a wildcard.
Stripping query-final question marks would help, but it might cause problems
for the smaller number of users who intend them as wildcards on these
Wikipedias, and there might be other, as yet unknown, issues on other wiki
projects.
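Here is a rough illustration of the problem and the proposed fix. It is a minimal sketch, not how CirrusSearch actually processes queries. Python's fnmatch stands in for the search engine's wildcard handling, where ? matches exactly one character. Both helper names are hypothetical. The stripping rule (drop one or more trailing ? plus any surrounding whitespace) is also an assumption.

```python
import fnmatch
import re


def wildcard_matches(term: str, token: str) -> bool:
    """Hypothetical stand-in for wildcard matching: '?' matches exactly one character."""
    return fnmatch.fnmatchcase(token, term)


def strip_final_question_marks(query: str) -> str:
    """Hypothetical helper: remove trailing '?' characters (and surrounding
    whitespace) but leave question marks inside the query untouched."""
    return re.sub(r"\s*\?+\s*$", "", query)


# The last term of the question is "cruise?", which is read as a wildcard
# that needs exactly one more character after "cruise".
print(wildcard_matches("cruise?", "cruise"))   # False
print(wildcard_matches("cruise?", "cruises"))  # True

# Stripping the final question mark leaves a plain keyword query.
print(strip_final_question_marks("How old is tom cruise?"))  # How old is tom cruise

# A mid-word wildcard is not query-final, so it is preserved.
print(strip_final_question_marks("te?t"))  # te?t
```

The sketch also shows the trade-off described above. A user who really meant "cruise?" as a wildcard would lose that behavior once the final ? is stripped. A user who put the ? mid-word would not be affected.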
For those who don't like footnotes and want more detail, here's the link to
the new report:
https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Dropping_Final_Quest…
—Trey
[1] https://github.com/wikimedia-research/Discovery-Search-Adhoc-QueryFeatures/…
[2] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Quotes_and_Questions
[3] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Dropping_Final_Quest…
[4] https://phabricator.wikimedia.org/T133711
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation