Re: [Wikimedia-l] The case for supporting open source machine translation

24 Apr 2013

On Wed, Apr 24, 2013 at 2:04 PM, Mathieu Stumpf <
psychoslave(a)culture-libre.org> wrote:

...
  I would like to add that (I'm no specialist in this subject) translating
 natural language probably needs at least a large set of existing
 translations, if only to get rid of "obvious, well-known" idioms, like
 "kitchen sink" being translated as "usine à gaz" when you are speaking of
 software, for example. In this regard, we probably have such a base with
 Wikisource. What do you think?

Personally, I think this is an awesome idea :-)
Wikisource corpora could be a huge asset in developing this.
We already host different public-domain translations, and in the future, we
hope, more and more Wikisources will allow user-generated translations.

At the moment, Wikisource could be an interesting corpus and laboratory for
improving and enhancing OCR,
as the OCR-generated text is always proofread and corrected by humans.
As part of our project (
http://wikisource.org/wiki/Wikisource_vision_development), Micru was
looking for a GSoC candidate to study the reinsertion of proofread text
into DjVu files [1], but so far has not found any interested student. We
have some contacts with people at Google working on Tesseract, and they
were available to mentor.

Aubrey

[1] We thought about this both for OCR enhancement purposes and for updating
files on Commons and the Internet Archive (which is off topic here).
