Hi Everyone,

Mikhail recently did a nice analysis of the relationship between various features of queries as strings and zero results rates (ZRR),[1] and I took a quick look at the two most impactful features, quotes and question marks, and how ZRR would be affected on enwiki if they were stripped out of poorly performing queries (< 3 results).[2]

Now I've looked at all queries (including zero results queries, poorly performing queries, and all other queries) for the top 10 Wikipedias by search traffic[3] as part of a Phab ticket[4], which was prompted by the unexpected results for queries like "How old is tom cruise?"

Most queries ending in question marks seem to be questions, but give unexpected results (or no results) because ? is treated as a wildcard. Stripping query-final question marks would help, but might cause problems for a smaller number of users who intend them as wildcards on these Wikipedias, and there might be other unknown issues on other wiki projects.
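
As a rough illustration of what "stripping query-final question marks" could look like, here is a minimal Python sketch. The function name and the regular expression are hypothetical, made up for this example, and are not the actual search code. It only removes question marks at the very end of the query, so a mid-query wildcard such as wh?t is left untouched.

```python
import re

# Hypothetical pattern (an assumption for illustration): one or more
# question marks at the very end of the query, plus any whitespace
# mixed in with or following them.
_FINAL_QMARKS = re.compile(r"[\s?]*\?\s*$")


def strip_final_question_marks(query: str) -> str:
    """Return the query with query-final question marks removed."""
    stripped = _FINAL_QMARKS.sub("", query)
    # If nothing but question marks was typed, keep the query as-is
    # rather than turning it into an empty search.
    return stripped if stripped.strip() else query


print(strip_final_question_marks("How old is tom cruise?"))
print(strip_final_question_marks("wh?t"))
```

Note that a sketch like this cannot tell a question from an intended final wildcard (for example, a user who really means "any one character" with a trailing ?), which is the trade-off described above.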

For those who don't like footnotes and want more detail, here's the link to the new report:
https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Dropping_Final_Question_Marks_in_the_Top_10_Wikipedias

—Trey

[1] https://github.com/wikimedia-research/Discovery-Search-Adhoc-QueryFeatures/blob/master/report.pdf
[2] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Quotes_and_Questions
[3] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Dropping_Final_Question_Marks_in_the_Top_10_Wikipedias
[4] https://phabricator.wikimedia.org/T133711

Trey Jones
Software Engineer, Discovery
Wikimedia Foundation