I like the djvu file structure and the DjVuLibre routines, and I study them as deeply as I can; for wikisource needs I appreciate pdf files too, but I like them less because of their complexity. The proofread procedure is presently based on djvu or pdf files, but I see that another approach could be used, relying only on simpler routines.

The proofreading procedure needs two inputs:

1. a set of good images of page scans;

2. a good mapped file of text content matched with images.

About "mapped text", there are two alternatives, hOCR and xml; both can be used to extract "unmapped raw text" when needed at server level, and also at local level with jQuery. If the hOCR/xml of the page text could be accessed quickly and simply from nsPage, I see interesting opportunities, for example: generalized highlighting of selected text on the nsPage image, both in view and in edit mode; formatting suggestions from heuristic analysis of word coordinates; reorganization of high-level text structures, such as a wrong column layout.
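
As a rough sketch of the idea, here is how the "mapped text" could be read from hOCR with only simple routines. It assumes the common hOCR layout in which each word is a span of class `ocrx_word` whose `title` attribute carries `bbox x0 y0 x1 y1`; the sample page and the names `parse_hocr`, `raw_text` and `boxes_for_selection` are hypothetical, made up for illustration, and nothing here is the actual code of the proofread extension.

```python
import re
from html.parser import HTMLParser

# Matches the "bbox x0 y0 x1 y1" property inside an hOCR title attribute.
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


class HocrWordParser(HTMLParser):
    """Collects (text, bbox) pairs from spans of class 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []        # list of (text, (x0, y0, x1, y1))
        self._in_word = False
        self._bbox = None
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            self._bbox = tuple(int(g) for g in match.groups()) if match else None
            self._chunks = []
            self._in_word = True

    def handle_data(self, data):
        if self._in_word:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        # Assumption: word spans do not contain nested spans.
        if tag == "span" and self._in_word:
            text = "".join(self._chunks).strip()
            if text and self._bbox:
                self.words.append((text, self._bbox))
            self._in_word = False


def parse_hocr(hocr):
    """Return the list of (word, bbox) pairs found in an hOCR string."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


def raw_text(words):
    """Rebuild the "unmapped raw text" from the mapped words."""
    return " ".join(text for text, _ in words)


def boxes_for_selection(words, selection):
    """Return the bboxes of the first run of words equal to the selection."""
    targets = selection.split()
    texts = [text for text, _ in words]
    if not targets:
        return []
    for i in range(len(texts) - len(targets) + 1):
        if texts[i:i + len(targets)] == targets:
            return [bbox for _, bbox in words[i:i + len(targets)]]
    return []


# Hypothetical sample page, invented for this example.
SAMPLE = """<div class='ocr_page' title='bbox 0 0 800 1200'>
<span class='ocr_line' title='bbox 30 90 400 120'>
<span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Nel</span>
<span class='ocrx_word' title='bbox 104 92 190 116; x_wconf 93'>mezzo</span>
<span class='ocrx_word' title='bbox 198 92 240 116; x_wconf 90'>del</span>
</span></div>"""

if __name__ == "__main__":
    words = parse_hocr(SAMPLE)
    print(raw_text(words))
    print(boxes_for_selection(words, "mezzo del"))
```

The boxes returned for a selection are in image pixel coordinates, so highlighting them on the page image would only need a scale factor between the scan and the displayed image; the same word list could also feed heuristics on coordinates, such as detecting columns.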

Alex brollo (it.wikisource)