Hi!
Since we've talked about maybe using TextCat-based algorithms, I've made an implementation of TextCat as a PHP class/utility, which may be useful:
https://github.com/smalyshev/textcat
Please feel free to comment. It is based on what I found at http://odur.let.rug.nl/~vannoord/TextCat/ which is pretty old, so we may want to patch it up, but I think it works as a starting point (provided we want to pursue this route).
I'll work on improving the loading latency (by converting the LM format to PHP) and making it into a real Composer module. I may also add some tests. Improvement suggestions are welcome, of course.