[Wikisource-l] On linking Wikisource with page images

ThomasV thomasV1 at gmx.de
Mon Jan 21 16:59:54 UTC 2008


Nice idea. Note that we now have an OCR server running Tesseract.
It is linked to Proofreadpage (though it works erratically).

Questions: are the bbox coordinates generated by the OCR engine?
In that case, what happens if the OCR outputs an incorrect number
of lines?

Also, I think you do need a JavaScript hack for the edit box;
what happens if the user creates a new line?

Thomas


Gregory Maxwell wrote:
> What I'd like to do is
> leave invisible markup generated by the ocr software in the page text,
> like this:
>
> <span class='ocr_line' title='bbox 551 4202 2666 4278 1'>The first
> experiments were made on the absorption of carbonic</span> <span
> class='ocr_line' title='bbox 474 4281 2668 4355 1'>acid gas by water:
> and here a singular disagreement was observed</span> <span
> class='ocr_line' title='bbox 471 4360 2668 4433 1'>in the first trials
> made under exactly the same circumstances. It</span>
>
> From this the OCRed text could be corrected, and markup could be
> added, but I could still take the output and apply it back to the
> original document. If people feel this would frustrate editing too
> much we could make some JavaScript hacks to the edit box to reduce the
> span tags to nothing more than an immutable <S marker.
>
> Would this be acceptable?
>
>   
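
For illustration, here is a minimal sketch of how the span markup quoted
above could be read back into (bounding box, text) pairs. It assumes the
markup looks exactly like the sample: single-quoted attributes, class
'ocr_line', and a title starting with 'bbox' followed by four integers.
The names OcrLine, SPAN_RE and parse_ocr_lines are hypothetical and not
part of Proofreadpage or Tesseract. The fifth number in the sample title
is not explained in the message, so the sketch ignores it.

```python
import re
from typing import List, NamedTuple, Tuple


class OcrLine(NamedTuple):
    """One OCR line: its bounding box and its (possibly corrected) text."""
    bbox: Tuple[int, int, int, int]
    text: str


# Matches spans shaped like the sample in the quoted message. Anything
# after the first four integers in the title (the trailing "1" in the
# sample) is skipped, since its meaning is not given in the thread.
SPAN_RE = re.compile(
    r"<span\s+class='ocr_line'\s+"
    r"title='bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)[^']*'>"
    r"(.*?)</span>",
    re.DOTALL,
)


def parse_ocr_lines(markup: str) -> List[OcrLine]:
    """Return every ocr_line span found in markup, in document order."""
    lines = []
    for match in SPAN_RE.finditer(markup):
        bbox = tuple(int(n) for n in match.group(1, 2, 3, 4))
        # Collapse line wraps and repeated spaces inside the span text.
        text = " ".join(match.group(5).split())
        lines.append(OcrLine(bbox, text))
    return lines
```

Comparing the number of parsed spans against the number of lines the OCR
engine reported would be one way to notice when the two have drifted
apart, for example after a user inserts a new line in the edit box.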


