Hi C. Scott,

Information about the APIs for getting the list of translations and the parallel corpora (which include examples of human translation, machine translation, and the corrections people made to them) is available at https://www.mediawiki.org/wiki/Content_translation/Published_translations
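As a rough sketch of how that API might be queried: the page above describes a `cxpublishedtranslations` list module; the snippet below just builds the query URL for it. The endpoint, module, and parameter names here are assumptions taken from that docs page, so please verify them there before relying on this.

```python
# Sketch: building a request URL for the cxpublishedtranslations API module.
# Module and parameter names are assumed from the Published_translations docs
# page; verify them there before use.
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"  # any wiki with CX enabled

def published_translations_url(source_lang, target_lang, limit=100, offset=0):
    """Return a query URL for published CX translations between two languages."""
    params = {
        "action": "query",
        "list": "cxpublishedtranslations",
        "from": source_lang,
        "to": target_lang,
        "limit": limit,
        "offset": offset,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)

# Example: the first 100 English-to-Spanish published translations
url = published_translations_url("en", "es")
```

Fetching that URL (e.g. with `urllib.request` or `requests`) should return the published translation pairs as JSON.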
Others on the team who are more familiar with the technical details can elaborate if needed.

By "a candidate "small wiki" that's been wanting to use ContentTranslation", do you mean a wiki with heavy use of Content Translation but lacking machine translation support? (I ask because Content Translation is available on all wikis, although some lack automatic translation support.) The CX Stats page can give you an idea of how much Content Translation has been used for translation on each wiki, and automatic translation support can be found here.

--Pau

On Fri, Sep 15, 2017 at 6:14 PM C. Scott Ananian <cananian@wikimedia.org> wrote:
We're tracking source/destination pairs generated by the ContentTranslation tool, right? Could someone point me to that dataset? (I'm playing around with some machine translation stuff to see if I can prototype a suggester tool that would translate edits on wiki A into corresponding edits on wiki B.)
  --scott

PS. There's some cool work being done on "zero-shot translation", i.e. bootstrapping translation tools for small languages by pre-training them on a related language (or even an unrelated one). Apparently that works! (Cf. https://arxiv.org/pdf/1611.04558.pdf) It can greatly reduce the amount of data required to build a translation model for the small language.

Is there a "small wiki" that's been wanting to use ContentTranslation and would be a good candidate for experimentation?

--
_______________________________________________
Mediawiki-i18n mailing list
Mediawiki-i18n@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-i18n
-- 
Pau Giner
Senior User Experience Designer
Wikimedia Foundation