Re: [Wikimedia-l] The case for supporting open source machine translation

26 Apr 2013

* Andrea Zanni wrote:
...
 At the moment, Wikisource could be an interesting corpus and laboratory for
improving and enhancing OCR, as the OCR-generated text is always proofread
and corrected by humans.
As part of our project (
http://wikisource.org/wiki/Wikisource_vision_development), Micru was
looking for a GSoC candidate to study the reinsertion of proofread text
into djvus [1], but at the moment didn't find any interested student. We
have some contacts with people at Google working on Tesseract, and they
were available for mentoring. 
...
 [1] We thought about this both for OCR enhancement purposes and for updating
files on Commons and Internet Archive (which is off topic here).

I built various tools that could be fairly easily adapted for this; my notes
are available at
<http://www.google.com/search?q=site:lists.w3.org+intitle:hoehrmann+ocr>.
One of the tools, for instance, is a diff tool; see the image at
<http://lists.w3.org/Archives/Public/www-archive/2012Apr/0031>.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
