Perhaps there's a misunderstanding - I mentioned abbyy.xml, but with no
plan to import it as it is; abbyy.xml is only a surprising data
container from which we can extract anything useful to speed up proofreading
(and formatting) - nothing more than this.
Just an example: the vertical djvu coordinates of lines can be used to
estimate font size; the horizontal coordinates of lines can be used to
recognize centered text; paragraph splitting is valuable; columns can be
recognized, and margins too; with some effort, poems can probably be detected.
Far from simply importing coordinates, it's a matter of using them as best
we can; if we throw away the data, we throw away the information it contains.
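
To make the idea concrete, here is a minimal sketch in Python, not a working import tool. It assumes the line boxes have already been extracted from abbyy.xml or the djvu text layer into simple records. The `Line` class, its field names, and the 5% tolerance are all hypothetical choices made for illustration. Relative line height stands in for font size, and roughly equal left and right gaps stand in for centering.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Line:
    """One OCR text line with its bounding box (hypothetical record layout)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def describe_lines(lines, tolerance=0.05):
    """Return (text, centered, relative_size) for each line of one page.

    centered: the line is indented on both sides by about the same amount.
    relative_size: line height divided by the median line height of the page,
    used here as a rough proxy for font size.
    """
    # Approximate the text block by the extreme edges of all lines.
    page_left = min(line.left for line in lines)
    page_right = max(line.right for line in lines)
    width = page_right - page_left
    body_height = median(line.bottom - line.top for line in lines)

    described = []
    for line in lines:
        left_gap = line.left - page_left
        right_gap = page_right - line.right
        centered = (
            left_gap > tolerance * width
            and abs(left_gap - right_gap) <= tolerance * width
        )
        relative_size = (line.bottom - line.top) / body_height
        described.append((line.text, centered, relative_size))
    return described


if __name__ == "__main__":
    # Made-up coordinates: one tall centered heading and three body lines.
    page = [
        Line("HAMLET", 400, 100, 600, 160),
        Line("To be or not to be, that is the question,", 100, 200, 900, 230),
        Line("Whether 'tis nobler in the mind to suffer", 100, 240, 900, 270),
        Line("The slings and arrows of outrageous fortune,", 100, 280, 900, 310),
    ]
    for text, centered, size in describe_lines(page):
        print(text, centered, size)
```

The result is only a set of hints (a centered line, a larger line) that a script could turn into suggested wiki formatting; the coordinates themselves never reach the wiki text.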
Alex
2013/7/17 Lars Aronsson <lars(a)aronsson.se>
On 07/17/2013 12:57 PM, Alex Brollo wrote:
FineReader OCR stores incredibly detailed information in [...] abbyy.xml
On the other end, Wikisource is a wiki that edits wiki text.
Sure, you could insert the XML there and let users
edit the XML, but that would scare more users away
and allow for more mistakes.
For example, if proofreading Hamlet,
To be or not to bc, that is the question,
anybody can easily spot "bc" and correct that.
In the XML version,
<word x=1 y=1>To</word>
<word x=5 y=1>be</word>
<word x=8 y=1>or</word>
someone might think that "or" should be a little more
to the right, so one user inserts a space between the
tag "<word x=8 y=1>" and "or", while another user
adjusts the tag to "<word x=9 y=1>". All the tags
make it harder to spot the OCR error "bc".
Even if you replace manual XML editing with some
graphic tool, you get the same ambiguity between
adding whitespace and adjusting coordinates.
This is a nightmare that we avoid by throwing away
all the coordinates and just proofreading the plain text.
It is not the perfect system, it's a compromise, in
order to get some useful work done.
--
Lars Aronsson (lars(a)aronsson.se)
Project Runeberg - free Nordic literature -
http://runeberg.org/
_______________________________________________
Wikisource-l mailing list
Wikisource-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l