[discovery] TextCat and Confidence

12 Jul 2016


Hey everyone,

Mikhail has written up and should soon release his report on our recent
TextCat A/B tests. The results look good: language identification and
cross-wiki searching definitely improve the results (in terms of results
shown and results clicked) for otherwise poorly performing queries (those
that get fewer than 3 results).

Mikhail's report also suggests looking at some measure of confidence for
the language identification, to see whether it has any effect on the
quality (in terms of number of results, but more importantly clicks) of
the cross-wiki (also "interwiki") results. This sounds like a good idea,
but TextCat doesn't make it super easy to do. I have some ideas, though,
and I would love suggestions from anyone else who has any.

The details are kind of technical, so if that kind of thing makes your eyes
glaze over, you should avert your gaze now.

Otherwise, check out my write-up on TextCat and confidence
https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/TextCat_and_Confidence
and share your ideas here or on the talk page.

Thanks!
—Trey
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
