Re: [Wikimedia-l] The case for supporting open source machine translation

29 Apr 2013

On 26/04/13 19:38, Bjoern Hoehrmann wrote:
...
  * Andrea Zanni wrote:
> At the moment, Wikisource could be an interesting corpus and laboratory for
> improving and enhancing OCR,
> as the OCR-generated text is always proofread and corrected by humans.
Try also Distributed Proofreaders. It is my impression that Wikisource's 
proofreading standards are not always up to par.

...
 As part of our project
 (http://wikisource.org/wiki/Wikisource_vision_development), Micru was
 looking for a GSoC candidate to study the reinsertion of proofread text
 into DjVu files [1], but so far hasn't found any interested student. We
 have some contacts with people at Google working on Tesseract, and they
 were available for mentoring.
 [1] We thought about this both for OCR enhancement purposes and for
 updating files on Commons and Internet Archive (which is off topic here).
 I built various tools that could be fairly easily adapted for this; my
 notes are available at
 <http://www.google.com/search?q=site:lists.w3.org+intitle:hoehrmann+ocr>.
 One of the tools, for instance, is a diff tool; see the image at
 <http://lists.w3.org/Archives/Public/www-archive/2012Apr/0031>.
This is a very interesting approach :)
