On 24/04/13 12:35, Denny Vrandečić wrote:
> Current machine translation research aims at using massive machine
> learning supported systems. They usually require big parallel corpora.
> We do not have big parallel corpora (Wikipedia articles are not
> translations of each other, in general), especially not for many
> languages, and there is no [...]
Could you define "big"? If 10% of Wikipedia articles are translations of
each other, that is 2 million translation pairs (out of the roughly 20
million articles across all languages that this figure implies). Assuming
ten sentences per average article, that gives 20 million sentence pairs.
An average Wikipedia with 100,000 articles would have 10,000 translations
and 100,000 sentence pairs; a large Wikipedia with 1,000,000 articles
would have 100,000 translations and 1,000,000 sentence pairs. Is this not
enough to kickstart a massive machine-learning-supported system?
(Consider also that the articles are somewhat similar in structure and
less rich than general text; the future tense, for example, is rarely
used.)
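
To make the assumptions explicit, here is a minimal sketch of the
arithmetic above. The 10% overlap, the ten-sentences-per-article average,
and the 20-million-article total are all assumptions of this estimate,
not measured figures:

    # Back-of-envelope corpus-size estimate. All constants are
    # assumptions from the discussion above, not measured data.
    TRANSLATED_FRACTION = 0.10   # assumed share of articles with a counterpart
    SENTENCES_PER_ARTICLE = 10   # assumed average article length

    def corpus_estimate(articles: int) -> tuple[int, int]:
        """Return (translation pairs, sentence pairs) for a Wikipedia
        of the given size, under the assumptions above."""
        pairs = int(articles * TRANSLATED_FRACTION)
        return pairs, pairs * SENTENCES_PER_ARTICLE

    # All Wikipedias combined, a large Wikipedia, an average Wikipedia.
    for size in (20_000_000, 1_000_000, 100_000):
        pairs, sentences = corpus_estimate(size)
        print(f"{size:>12,} articles -> {pairs:>10,} translation pairs, "
              f"{sentences:>12,} sentence pairs")

Running this reproduces the figures above: 2 million translation pairs
and 20 million sentence pairs for all Wikipedias combined, down to 10,000
translation pairs and 100,000 sentence pairs for a 100,000-article
Wikipedia.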