On Fri, Sep 15, 2017 at 12:30 PM, Pau Giner <pginer(a)wikimedia.org> wrote:
> With "a candidate 'small wiki' that's been wanting to use
> ContentTranslation", do you mean a wiki with heavy use of Content
> Translation but lacking Machine Translation support? (I'm asking because
> Content Translation is available in all wikis, although some lack automatic
> translation support.) The CX Stats page
> <https://en.wikipedia.org/wiki/Special:ContentTranslationStats> can give
> you an idea of how much Content Translation has been used for translation
> on each wiki, and automatic translation support can be found here:
> <https://www.mediawiki.org/wiki/Content_translation/Machine_Translation>.
I was thinking of a wiki w/o machine translation support but whose
community had been lobbying for it, with an added bonus if the language in
question was related in some way to a larger language family. For example,
since Catalan is a Romance language, perhaps a model trained on French and
Spanish could serve as pretraining for Catalan. (On the other hand, Latvian
is pretty isolated as a language, so it would only be worth cross-training
with Lithuanian.)
The data is probably buried in those two pages you cited for me; I've just
got to dig for it a bit. One odd thing that jumps out: why do we support
en->zh but not zh->en?
--scott
--
(http://cscott.net)