I've written up my analysis of the Elasticsearch language detection plugin that Erik recently enabled:
    https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Language_Detection_Evaluation

The short version is that it really likes Romanian (and Italian, and has a bit of a thing for French), i.e., it over-identifies queries as those languages. Precision on English is great, but recall is poor (probably because of all the typos and other crap that go to enwiki and are still technically "English"). Chinese and Arabic are good.
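
For anyone who wants the precision/recall distinction spelled out, here is a minimal Python sketch of how per-language scores can be computed from (actual, detected) pairs. The function name and the sample data are made up for illustration; this is not the code or the data behind the write-up.

```python
from collections import Counter

def per_language_scores(pairs):
    """Return {language: (precision, recall)} from (actual, detected) pairs.

    Hypothetical helper for illustration only. A value is None when it is
    undefined (the language was never detected, or never actually occurred).
    """
    true_pos = Counter()   # detected language matches the actual language
    actual = Counter()     # how often each language really occurs
    detected = Counter()   # how often the detector picks each language
    for gold, guess in pairs:
        actual[gold] += 1
        detected[guess] += 1
        if gold == guess:
            true_pos[gold] += 1

    scores = {}
    for lang in set(actual) | set(detected):
        precision = true_pos[lang] / detected[lang] if detected[lang] else None
        recall = true_pos[lang] / actual[lang] if actual[lang] else None
        scores[lang] = (precision, recall)
    return scores

# Made-up sample: four English queries, three of them mislabeled.
sample = [
    ("en", "en"), ("en", "ro"), ("en", "it"), ("en", "fr"),
    ("zh", "zh"), ("ar", "ar"),
]
scores = per_language_scores(sample)
```

In this toy sample, everything labeled English really is English (high precision), but most of the English queries get labeled as something else (low recall), which is the same shape as the pattern described above.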

I think we could do better, and we should evaluate (a) other language detectors and (b) the effect of a good language detector on the zero-results rate (i.e., simulate sending queries to the right place and see how much of a difference it makes).

Moderately pretty pictures included.

—Trey

Trey Jones
Software Engineer, Discovery
Wikimedia Foundation