Hi Aubrey,
Thanks for the heads-up. I have CC'ed Sébastien from fr-ws; he worked on the DjVu text extraction/merging and was interested in following up on that. Maybe he has some fresh ideas about it.

Micru

On Tue, Jul 16, 2013 at 10:24 AM, Andrea Zanni <zanni.andrea84@gmail.com> wrote:
Hi David, Aarti, thibaud and Tpt, 
please look at this thread:
http://en.wikisource.org/wiki/Wikisource:Scriptorium#EPUB.2FHTML_to_Wikitext
especially the last message. 

It seems George Orwell III knows his stuff about DjVu and the Proofread extension,
and it's probably worth digging into this DjVu "layer text" thing.

Even if I might dream of an ideal solution (a "layered structure" for Wikisource, in which text can be marked up several times in different layers), that is probably very far away.

But it's still important to pave the way for further improvements, I guess:
losing all the information from a formatted, mapped IA DjVu is not a good thing to do, IMHO.
And the Visual Editor could help us, in the future, to keep some of that information (italics, bold, etc.)

I know Aarti spoke with Alex about abbyy.xml: is it possible to do something with it?

Aubrey  



--
Etiamsi omnes, ego non