Hey everyone,

Mikhail has written up his report on our recent TextCat A/B tests and should release it soon. The results look good: language identification and cross-wiki searching definitely improve the results (in terms of results shown and results clicked) for otherwise poorly performing queries (those that get fewer than 3 results).

Mikhail's report also suggests looking at some measure of confidence for the language identification, to see whether it has any effect on the quality (in terms of number of results, but more importantly clicks) of the cross-wiki (also "interwiki") results. This sounds like a good idea, but TextCat doesn't make it super easy to do. I have some ideas, though, and I would love suggestions from anyone else.

The details are kind of technical, so if that kind of thing makes your eyes glaze over, you should avert your gaze now.

Otherwise, check out my write-up on TextCat and confidence and share your ideas here or on the talk page.

Thanks!

—Trey

Trey Jones

Software Engineer, Discovery

Wikimedia Foundation