Yay! Thank you for this awesome research, Trey. Evaluating language
plugins sounds like it would make a /great/ blog post. What
alternatives are up next?
On 4 September 2015 at 18:45, Trey Jones <tjones@wikimedia.org> wrote:
> I've written up my analysis of the Elasticsearch language detection plugin
> that Erik recently enabled:
>
> https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Language_Detection_Evaluation
>
> The short version is that it really likes Romanian (and Italian, and has a
> bit of a thing for French), and precision on English is great, but recall is
> poor (probably because of all the typos and other crap that go to enwiki
> that is still technically "English"). Chinese and Arabic are good.
>
> I think we could do better, and we should evaluate (a) other language
> detectors and (b) the effect of a good language detector on zero results
> rate (i.e., simulate sending queries to the right place and see how much of
> a difference it makes).
>
> Moderately pretty pictures included.
>
> —Trey
>
> Trey Jones
> Software Engineer, Discovery
> Wikimedia Foundation
>
> Wikimedia-search mailing list
> Wikimedia-search@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikimedia-search
>
--
Oliver Keyes
Count Logula
Wikimedia Foundation