On Thu, May 15, 2014 at 10:03 PM, John Mark Vandenberg jayvdb@gmail.com wrote:
We're getting a long way off the topic of the still frame on MOTD, but I agree, and wish that the WMF would make this a priority for their multimedia and search team. Many improvements have been suggested by the community, and both sides of the fence have even agreed on some of them, such as clustered search results:
https://meta.wikimedia.org/wiki/Controversial_content/Brainstorming#Clusteri...
https://bugzilla.wikimedia.org/show_bug.cgi?id=35701
First, as general background, WMF recently started migrating its search infrastructure over to ElasticSearch. See:
https://www.mediawiki.org/wiki/Search
https://www.mediawiki.org/wiki/Help:CirrusSearch
The new search is available on Commons as a BetaFeature. It's worth re-running searches whose results are viewed as problematic through the new search and comparing the two. For example, the results for "Asian" are markedly different in the new search.
I would caution against a simplistic characterization of technology as a solution for what's inherently a complex socio-technical problem. That was a core issue with the image filter proposal and it's a similar issue here. If people insist on uploading pictures of masturbation with toothbrushes, those pictures will come up in searches. If we insist on not having a distinction between explicit and non-explicit materials in file metadata, search results won't have it either. We can point the finger at technology because that's easy, but it's not magical pixie dust.
To get a feel for ElasticSearch's capabilities, please see the help page above, as well as the tech talk that Nik gave earlier today on the subject: https://www.youtube.com/watch?v=FubXExbAvOA
Capabilities that exist today with the new search include template-based "boosting" of results, a feature that's already enabled on Commons and that boosts quality content in search results: https://commons.wikimedia.org/w/index.php?title=MediaWiki:Cirrussearch-boost...
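To make the boosting idea concrete, here is a minimal sketch of how template-based boosting can be expressed with a generic ElasticSearch `function_score` query in the current query DSL. This is not CirrusSearch's actual implementation: the `template` and `text` field names and the template weights below are hypothetical, and the real Commons configuration lives on the (truncated) MediaWiki page linked above.

```python
import json

# Hypothetical template weights for illustration only; the real Commons
# boost list is maintained on-wiki and is not reproduced here.
TEMPLATE_WEIGHTS = {
    "Template:Featured picture": 2.0,
    "Template:Quality image": 1.5,
}


def boosted_query(text, template_weights=TEMPLATE_WEIGHTS):
    """Build a function_score query that scales the text relevance score
    of documents transcluding one of the listed templates.

    Assumes (hypothetically) that each indexed file has a 'template'
    field listing the templates it uses and a 'text' field to match on.
    """
    functions = [
        {"filter": {"term": {"template": name}}, "weight": weight}
        for name, weight in template_weights.items()
    ]
    return {
        "query": {
            "function_score": {
                "query": {"match": {"text": text}},
                "functions": functions,
                "score_mode": "multiply",  # combine weights of all matching templates
                "boost_mode": "multiply",  # multiply the text score by that combined weight
            }
        }
    }


print(json.dumps(boosted_query("sunset"), indent=2))
```

The boost only reorders results; it doesn't remove anything, which is why it helps surface quality content without addressing the filtering question.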
ElasticSearch has support for faceting (see http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search... ), which might come in handy for creating a breakdown of search results.
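As a rough sketch of what such a breakdown might look like, the request below pairs a normal match query with a `terms` aggregation, the mechanism that replaced ElasticSearch's older facets API. The `category` and `text` field names, the `by_category` aggregation name, and the sample response are all hypothetical; this is not how CirrusSearch indexes Commons.

```python
def faceted_query(text, facet_field="category", size=10):
    """Build a search request returning matching files plus a terms
    aggregation that buckets the hits by the values of facet_field
    (a hypothetical field name)."""
    return {
        "query": {"match": {"text": text}},
        "aggs": {
            "by_category": {
                "terms": {"field": facet_field, "size": size}
            }
        },
    }


def facet_counts(response):
    """Flatten the aggregation section of an ElasticSearch response into
    a {bucket_key: doc_count} dict for display next to the results."""
    buckets = response["aggregations"]["by_category"]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}


# A made-up response fragment in the shape ElasticSearch returns.
sample_response = {
    "aggregations": {
        "by_category": {
            "buckets": [
                {"key": "Sunsets in Portugal", "doc_count": 12},
                {"key": "Sunsets in Brazil", "doc_count": 7},
            ]
        }
    }
}
print(facet_counts(sample_response))
```

Each bucket could then be rendered as a collapsible group of results, which is where the usability trade-off discussed below comes in.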
However, keep in mind that unless you collapse each facet by default, you're still going to show explicit thumbnails, and collapsing results by default could compromise usability to an unacceptable degree for the common use case. The more elaborate suggestions that take the full category tree into account also seem fairly expensive (ElasticSearch has no awareness of the actual category tree, which is a complex structure to traverse), and a faceted search that only operates on the specific categories associated with a given file might not be very useful, given how granular the category structure is.
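The granularity problem and the traversal cost can be illustrated with a small toy example. Since ElasticSearch doesn't know the category graph, one possible approach (an assumption here, not a plan) would be to compute each file's ancestor categories outside the search engine and index them. The sketch below compares facet counts on direct categories with counts rolled up through ancestors; all category names and data are hypothetical, and real Commons categories form a graph rather than a strict tree, so the traversal tracks visited nodes.

```python
from collections import Counter, deque

# Hypothetical toy data: each file's direct categories ...
FILE_CATEGORIES = {
    "A.jpg": ["Sunsets in Lisbon"],
    "B.jpg": ["Sunsets in Porto"],
    "C.jpg": ["Sunsets over the Atlantic Ocean"],
}
# ... and each category's parent categories (a graph, not a tree).
PARENTS = {
    "Sunsets in Lisbon": ["Sunsets in Portugal", "Lisbon"],
    "Sunsets in Porto": ["Sunsets in Portugal", "Porto"],
    "Sunsets over the Atlantic Ocean": ["Sunsets by location", "Atlantic Ocean"],
    "Sunsets in Portugal": ["Sunsets by location", "Portugal"],
    "Lisbon": ["Portugal"],
    "Porto": ["Portugal"],
    "Sunsets by location": ["Sunsets"],
    "Portugal": ["Countries of Europe"],
}


def ancestors(category, parents):
    """Breadth-first walk up the category graph; the 'seen' set keeps
    shared parents (or cycles) from being visited more than once."""
    seen = set()
    queue = deque([category])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def direct_facets(file_categories):
    """Facet counts using only the categories placed directly on files."""
    return Counter(c for cats in file_categories.values() for c in cats)


def rolled_up_facets(file_categories, parents):
    """Facet counts where each file also counts toward every ancestor
    category, counted once per file even if reachable by several paths."""
    counts = Counter()
    for cats in file_categories.values():
        expanded = set(cats)
        for c in cats:
            expanded |= ancestors(c, parents)
        counts.update(expanded)
    return counts


print(direct_facets(FILE_CATEGORIES))
print(rolled_up_facets(FILE_CATEGORIES, PARENTS).most_common(3))
```

With direct categories, every bucket holds a single file, so the breakdown tells the user little. Rolling up produces broader buckets, but it requires walking the graph for every file and re-indexing whenever categories change, which hints at why this kind of approach looks expensive.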
I'd encourage Nik and Chad (search engineers) to weigh in here & on the bug as they see fit, as well as correct me if I'm misrepresenting anything in the above.
Cheers, Erik