On 19 January 2012 11:19, Cristian Consonni <kikkocristian(a)gmail.com> wrote:
2012/1/15 Nikola Smolenski <smolensk(a)eunet.rs>:
On Wednesday 11 January 2012 18:19:14 Cristian
However, to my knowledge there is not a single OCR that exports this data, nor
is there a standard format for it. If an open source OCR could be modified to
do this, then it would be easy to inject data retrieved from captchas back
into OCR-ed text. And it could be used for so much more :)
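To make the idea of feeding captcha results back into OCR-ed text concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that the OCR engine exports a confidence score for each word; the `Word` record, its fields, the `merge_captcha_answers` helper and the 0.8 threshold are all hypothetical and do not correspond to the output of any existing engine.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR-ed word; the fields are hypothetical, not a real engine's output."""
    index: int         # position of the word in the page text
    text: str          # what the OCR engine guessed
    confidence: float  # assumed score between 0.0 and 1.0


def merge_captcha_answers(words, answers, threshold=0.8):
    """Return the page text with low-confidence words replaced by human answers.

    `answers` maps a word index to the text people typed in a captcha.
    Words at or above `threshold`, or without an answer, are left untouched.
    """
    merged = []
    for word in words:
        if word.confidence < threshold and word.index in answers:
            merged.append(answers[word.index])
        else:
            merged.append(word.text)
    return " ".join(merged)


page = [
    Word(0, "The", 0.99),
    Word(1, "qnick", 0.41),
    Word(2, "brown", 0.95),
    Word(3, "f0x", 0.37),
]
print(merge_captcha_answers(page, {1: "quick", 3: "fox"}))
```

The point of the sketch is only that the merge step is trivial once each word carries an identifier and a score; the hard part remains getting an engine to export that information.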
I know of (but am not proficient in using) at least two open source OCR engines:
* OCRopus[1a][1b], by the German Research Center for Artificial
Intelligence, sponsored by Google
* Tesseract[2a][2b], started by HP back in 1995, now Google-sponsored
(yeah, this one too!) [note: as far as I know, OCRopus used Tesseract
as its OCR engine]
I think much can be done.
Using Tesseract alone is "too much work". Tesseract wants TIFF files in
a particular format and DPI. Humans want things in an easy-to-use
format: perhaps click on an image and get the text directly under the
mouse pointer, as text that can be copied and pasted.