Hi!
Since we've talked about possibly using TextCat-based algorithms, I've
written an implementation of TextCat as a PHP class/utility, which may
be useful:
https://github.com/smalyshev/textcat
Please feel free to comment. It is based on what I found at
http://odur.let.rug.nl/~vannoord/TextCat/ which is pretty old, so we may
want to patch it up, but I think it works as a starting point (provided
we want to pursue this route).
I'll work on improving the loading latency (converting the LM format to
PHP) and on making it into a real Composer module. Maybe I'll also add
some tests. Improvement suggestions are welcome, of course.
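
For anyone who hasn't looked at how TextCat-style detection works: as far
as I understand it, each language model is a list of the most frequent
character n-grams, ordered by frequency, and a text is assigned to the
language whose list is closest to the text's own list under an
"out-of-place" rank distance. Below is a minimal sketch of that idea in
Python. It is not the code from the repository above; the function names,
the 400 n-gram cutoff, the tokenization, and the toy training sentences
are all assumptions made for illustration.

```python
import re
from collections import Counter


def ngram_profile(text, max_n=5, top_k=400):
    """Return the top_k most frequent 1..max_n-grams, most frequent first."""
    counts = Counter()
    # Split on anything that is not a letter (assumed tokenization).
    for word in re.split(r"[\W\d_]+", text.lower()):
        if not word:
            continue
        padded = "_" + word + "_"  # "_" marks the word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Descending count; ties are broken alphabetically to stay deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from the model get a fixed penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else penalty
        for i, gram in enumerate(doc_profile)
    )


def classify(text, lang_profiles):
    """Return the key of the language profile with the smallest distance."""
    doc = ngram_profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))


# Toy training data (hypothetical; real models are built from much larger corpora).
train = {
    "en": "the quick brown fox jumps over the lazy dog and the cat",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze",
}
profiles = {lang: ngram_profile(sample) for lang, sample in train.items()}
print(classify("the lazy dog jumps", profiles))
```

With models this small the result is not reliable; the point is only to
show the ranking and distance steps.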
--
Stas Malyshev
smalyshev(a)wikimedia.org