Finally, if this is important enough and the task gets prioritized, I'd be willing to dive back in, go through the process once, and pull out the top zero-results queries, this time with basic bot exclusion and IP deduplication, which we didn't do early on because we didn't realize what a mess the data was. We could process a week or a month of data and categorize the top 100 to 500 queries as personal info, junk, porn, and whatever other categories we want or that bubble up from the data. We could then perhaps publish the non-personal-info part of the list as an example, either to persuade ourselves that this is worth pursuing, or as a clearer counter to future calls to do so.
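To make that concrete, here is a rough sketch of what such a pass might look like. It is not the actual pipeline: the record layout (IP, user agent, query, result count), the BOT_MARKERS list, and the function name are all assumptions for illustration, and real bot exclusion would need much more than a user-agent substring check.

```python
from collections import Counter

# Hypothetical user-agent substrings treated as bots; a real list would be longer.
BOT_MARKERS = ("bot", "crawler", "spider")


def top_zero_result_queries(records, top_n=100):
    """Count zero-result queries after basic bot exclusion and IP deduplication.

    records: iterable of (ip, user_agent, query, result_count) tuples
             (an assumed layout, not the real log format).
    Returns a list of (normalized_query, count) pairs, most frequent first.
    """
    seen = set()        # (ip, normalized query) pairs already counted
    counts = Counter()
    for ip, user_agent, query, result_count in records:
        if result_count != 0:
            continue    # only zero-result queries are of interest
        ua = user_agent.lower()
        if any(marker in ua for marker in BOT_MARKERS):
            continue    # crude bot exclusion by user-agent substring
        # Lowercase and collapse whitespace so trivial variants merge.
        normalized = " ".join(query.lower().split())
        key = (ip, normalized)
        if key in seen:
            continue    # count each query at most once per IP
        seen.add(key)
        counts[normalized] += 1
    return counts.most_common(top_n)
```

The resulting list would still need manual categorization (personal info, junk, and so on) before any part of it could be published.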
—Trey
---------- Forwarded message ----------
From: "James Heilman" <jmh649@gmail.com>
Date: Jul 15, 2016 06:33
Subject: [Wikimedia-l] Improving search (sort of)
To: "Wikimedia Mailing List" <wikimedia-l@lists.wikimedia.org>
Cc:
A while ago I requested a list of the "most frequently searched for terms
for which no Wikipedia articles are returned". This would allow the
community to then create redirects or new pages as appropriate and help
address the "zero results rate" of about 30%.
While we are still waiting for this data I have recently come across a list
of the most frequently clicked on redlinks on En WP produced by Andrew West
https://en.wikipedia.org/wiki/User:West.andrew.g/Popular_redlinks Many of
these can be reasonably addressed with a redirect as the issue is often
capitals.
Does anyone know where things are at with respect to producing the list of
most searched-for terms that return nothing?
--
James Heilman
MD, CCFP-EM, Wikipedian