Re: [Wikisource-l] [Commons-l] [cultural-partners] ABBYY Finereader 11 on Toolserver: do we like it?

30 Nov 2011


      2011/11/29 Lars Aronsson lars@aronsson.se
...
On 11/28/2011 10:23 PM, Alex Brollo wrote:
...
[...] FineReader 11 [...] produces a complete djvu file [...] The text
layer lacks the full range of detail: it's organized into two levels
(page and line), while the OCR engine on IA servers produces a very rich
"tree" (page, column, region, paragraph, line and word).
Has anybody designed a web interface that shows the scanned
image and the zones or regions of the Djvu text layer? It would
look similar to image annotation on Commons,
http://commons.wikimedia.org/wiki/Commons:Image_annotations
For a Djvu file uploaded to Commons, could you automatically
generate image annotations for the various text columns and
illustrations? Does image annotation handle multi-page
document formats such as PDF and Djvu?
Thanks for the interesting questions. I'm exploring, as deeply as I can, the
djvu text layer, metadata, and other information wrapped into a djvu file, and
my feeling is that djvu support is very primitive. The first needed step is
perhaps conversion from "bundled" to "indirect" format: djvu files on the web
are great precisely because single pages can be shared over the web, with their
complete content.
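A minimal sketch of that bundled-to-indirect step, assuming the DjVuLibre
tool djvmcvt is installed and on the PATH. The function names, the default
index name and the file names are only illustrative.

```python
import subprocess
from pathlib import Path


def indirect_command(bundled, out_dir, index_name="index.djvu"):
    """Build the djvmcvt command line for a bundled -> indirect conversion."""
    # djvmcvt -i <bundled.djvu> <output dir> <index file name>
    return ["djvmcvt", "-i", str(bundled), str(out_dir), index_name]


def convert_to_indirect(bundled, out_dir, index_name="index.djvu"):
    """Run the conversion; needs DjVuLibre's djvmcvt to be available."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(indirect_command(bundled, out_dir, index_name), check=True)
    # Assumption: the index file is written inside the output directory.
    return Path(out_dir) / index_name
```

Calling convert_to_indirect("book.djvu", "book_pages") should leave one file
per page in book_pages, plus the index that a viewer opens first.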
I'll take a look at Image annotations; I don't know anything about them,
although I have tested the ImageMap extension as a proofreading tool. Take a
look here:
http://it.wikisource.org/wiki/Pagina:Vettura_a_vapore_del_signor_Dietz.djvu/...
Presently I'm building a Python DjvuDsed "object" containing all the
information about the whole text layer, the annotations and the other
information in a djvu file, and I'm adding methods and attributes to this
formidable object one by one. I'll keep your ideas in mind as I go on.
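As a rough illustration of what such an object could start from (a sketch
only, not the actual DjvuDsed code): the text layer can be dumped as nested
s-expressions with djvused file.djvu -e 'print-txt' and parsed into a tree of
zones. The helper names parse_sexpr, to_zone and zone_types are hypothetical,
the sample string is made up, and octal escapes inside strings are left
undecoded.

```python
import re

# Tokens: "(", ")", a double-quoted string with backslash escapes, or a bare atom.
_TOKEN = re.compile(r'\(|\)|"(?:\\.|[^"\\])*"|[^\s()"]+')


def parse_sexpr(text):
    """Parse one s-expression into nested lists of ints and strings."""
    tokens = _TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing ")"
            return items
        if tok.startswith('"'):
            # Unescape only \" and \\ ; octal escapes such as \303 stay as-is.
            return re.sub(r'\\(["\\])', r"\1", tok[1:-1])
        return int(tok) if tok.lstrip("-").isdigit() else tok

    return read()


def to_zone(node):
    """Turn (type x0 y0 x1 y1 child-or-text ...) into a dictionary."""
    kind, x0, y0, x1, y1, *rest = node
    zone = {"type": kind, "bbox": (x0, y0, x1, y1), "text": None, "children": []}
    for item in rest:
        if isinstance(item, list):
            zone["children"].append(to_zone(item))
        else:
            zone["text"] = item
    return zone


def zone_types(zone):
    """Collect every zone type used in the tree (page, line, word, ...)."""
    found = {zone["type"]}
    for child in zone["children"]:
        found |= zone_types(child)
    return found


# Made-up sample of a two-level (page and line) text layer.
SAMPLE = '''(page 0 0 2550 3300
 (line 300 3000 2200 3060 "A two-level \\"page and line\\" layer"))'''

tree = to_zone(parse_sexpr(SAMPLE))
print(sorted(zone_types(tree)))
```

Running zone_types on a real file is a quick way to see whether a text layer
has only the page and line levels or the richer tree down to words.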
Alex
