On Fri, Sep 15, 2017 at 12:30 PM, Pau Giner pginer@wikimedia.org wrote:
With "a candidate "small wiki" that's been wanting to use ContentTranslation", do you mean a wiki with heavy use of Content Translation but lacking Machine Translation support? (I'm asking because Content Translation is available in all wikis, although some lack automatic translation support.) The CX Stats page <https://en.wikipedia.org/wiki/Special:ContentTranslationStats> can give you an idea of how much Content Translation has been used for translation on each wiki, and automatic translation support is listed here: <https://www.mediawiki.org/wiki/Content_translation/Machine_Translation>.
I was thinking of a wiki w/o machine translation support but for which the community had been lobbying for it, with an added bonus if the language in question was related in some way to a larger language family. For example, since Catalan is a Romance language, perhaps a model trained on French and Spanish could serve as pretraining for Catalan. (On the other hand, Latvian is pretty isolated as a language, so it would only be worth cross-training with Lithuanian.)
The data is probably buried in those two pages you cited for me; I've just got to dig for it a bit. One odd thing that jumps out: why do we support en->zh but not zh->en?
 --scott