Re: [Wikitech-l] Fwd: wikicaptcha on GitHub

15 Jan 2012

On Wednesday 11 January 2012 18:19:14 Cristian Consonni wrote:
...
  I believe the trickiest part is creating a system to put results back
  in Wikisource in a semi-automated way, but having "captcha reviewers"
  may help.

OCR engines generally work by finding lines of text on a page, splitting each 
line into letters, and then recognizing each letter separately. So an OCR 
engine knows, for each letter of the recognized text, its bounding box on the page.
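
To make the per-letter data concrete, here is a minimal Python sketch of what
such a record might look like. The names Glyph and line_text are hypothetical
and the coordinates are made up; this is not the output format of any
particular OCR engine.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Glyph:
    """One recognized letter plus its bounding box in page pixels (hypothetical)."""
    char: str
    left: int
    top: int
    right: int
    bottom: int


def line_text(glyphs: List[Glyph]) -> str:
    """Rebuild a line of text by reading its glyphs from left to right."""
    return "".join(g.char for g in sorted(glyphs, key=lambda g: g.left))


# Made-up coordinates for the word "cat" on a single line, deliberately
# listed out of order to show that position alone recovers the text.
line = [
    Glyph("a", 22, 100, 30, 112),
    Glyph("c", 10, 100, 20, 112),
    Glyph("t", 32, 96, 38, 112),
]
print(line_text(line))
```

Keeping the box next to each character is what would let a later step map a
region of the scanned image back to a position in the text.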

However, to my knowledge there is not a single OCR engine that exports this 
data, nor is there a standard format for it. If an open source OCR engine could 
be modified to do this, then it would be easy to inject data retrieved from 
captchas back into OCR-ed text. And it could be used for so much more :)
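
As a rough sketch of that injection step, assuming the OCR output were
available as words with bounding boxes: the captcha answer is matched to the
word whose box overlaps the captcha image region the most. Word, overlap_area
and apply_correction are hypothetical names for illustration, not part of
wikicaptcha or of any OCR engine.

```python
from typing import List, NamedTuple, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in page pixels


class Word(NamedTuple):
    text: str
    box: Box


def overlap_area(a: Box, b: Box) -> int:
    """Area of the intersection of two boxes, or 0 if they do not intersect."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def apply_correction(words: List[Word], captcha_box: Box, answer: str) -> List[Word]:
    """Return a copy of words where the word best matching captcha_box is replaced."""
    if not words:
        return []
    best = max(range(len(words)), key=lambda i: overlap_area(words[i].box, captcha_box))
    if overlap_area(words[best].box, captcha_box) == 0:
        return list(words)  # the captcha region matches nothing on this page
    corrected = list(words)
    corrected[best] = Word(answer, words[best].box)
    return corrected


# Made-up example: the OCR misread "the" as "tbe".
words = [Word("tbe", (10, 100, 40, 112)), Word("cat", (46, 100, 76, 112))]
fixed = apply_correction(words, (9, 99, 41, 113), "the")
print(" ".join(w.text for w in fixed))
```

Matching by overlap rather than by exact coordinates tolerates a captcha image
that was cropped with a small margin around the word.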
