Only a small part of the DjVu OCR/text content is currently used; I think we can do more:
1. the XML and dsed (LISP-like) representations each have pros and cons that should be carefully weighed;
2. the DjVu text layer can host an unlimited amount of metadata and free-text content, independent of the mapped OCR;
3. hOCR (produced by Tesseract) can be translated into dsed; a conversion script would be very useful for injecting Tesseract output into the DjVu OCR layer;
4. IA (Internet Archive) shares a terrible gzipped XML file, _abbyy.gz, where every possible detail of the OCR recognition can be found; a tool converting it to dsed (perhaps even recovering too many formatting details!) would be very useful.
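As an illustration of point 3, here is a minimal Python sketch of an hOCR to dsed converter, not a finished tool. It assumes Tesseract-style hOCR classes (`ocr_page`, `ocr_carea`, `ocr_par`, `ocr_line`, `ocrx_word`) with a `bbox x0 y0 x1 y1` entry in each `title` attribute; the mapping to DjVu zone types (`page`, `column`, `para`, `line`, `word`) and all function names (`hocr_to_dsed`, `zone_to_dsed`) are my own hypothetical choices. The essential step is flipping the y axis: hOCR measures from the top-left corner, while DjVu text zones measure from the bottom-left.

```python
"""Minimal sketch: convert Tesseract hOCR into djvused hidden-text (dsed) syntax."""
import re
import sys
from html.parser import HTMLParser

# Assumed mapping from hOCR classes to DjVu zone types; adjust as needed.
ZONE_FOR_CLASS = {
    "ocr_page": "page",
    "ocr_carea": "column",
    "ocr_par": "para",
    "ocr_line": "line",
    "ocrx_word": "word",
}
VOID_TAGS = {"meta", "br", "img", "link", "hr", "input"}  # tags that never get an end tag
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


class Zone:
    def __init__(self, kind, bbox):
        self.kind = kind    # DjVu zone type
        self.bbox = bbox    # (x0, y0, x1, y1), hOCR origin at top-left
        self.children = []
        self.text = ""      # only filled for word zones


class HocrParser(HTMLParser):
    """Builds a tree of Zone objects from the hOCR elements we recognise."""

    def __init__(self):
        super().__init__()
        self.pages = []
        self.stack = []  # one entry per open tag: a Zone, or None for other tags

    def _current_zone(self):
        for zone in reversed(self.stack):
            if zone is not None:
                return zone
        return None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        kind = next((ZONE_FOR_CLASS[c] for c in classes if c in ZONE_FOR_CLASS), None)
        match = BBOX_RE.search(attrs.get("title") or "")
        zone = None
        if kind and match:
            zone = Zone(kind, tuple(int(v) for v in match.groups()))
            parent = self._current_zone()
            (self.pages if parent is None else parent.children).append(zone)
        self.stack.append(zone)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        zone = self._current_zone()
        if zone is not None and zone.kind == "word":
            zone.text += data


def quote(text):
    # Escape backslashes and double quotes for a djvused string literal;
    # non-ASCII text is left as UTF-8 (assumption: your djvused accepts it).
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def zone_to_dsed(zone, page_height, indent=0):
    x0, y0, x1, y1 = zone.bbox
    # Flip y: DjVu puts the origin at the bottom-left corner.
    head = f"({zone.kind} {x0} {page_height - y1} {x1} {page_height - y0}"
    pad = " " * indent
    if zone.kind == "word":
        return f"{pad}{head} {quote(zone.text.strip())})"
    inner = [
        zone_to_dsed(child, page_height, indent + 1)
        for child in zone.children
        if child.kind != "word" or child.text.strip()  # skip empty words
    ]
    return "\n".join([pad + head] + inner) + ")"


def hocr_to_dsed(hocr):
    """Return one dsed expression per ocr_page found in the hOCR string."""
    parser = HocrParser()
    parser.feed(hocr)
    parser.close()
    # Page bbox is "0 0 W H", so bbox[3] is the page height.
    return [zone_to_dsed(page, page.bbox[3]) for page in parser.pages]


if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8") as f:
        for page in hocr_to_dsed(f.read()):
            print(page)
```

Each page string could then be saved to a file and injected with something like `djvused book.djvu -e 'select 1; set-txt page1.dsed' -s` (please check the djvused man page of your DjVuLibre version). I used the standard-library `html.parser` rather than an XML parser because hOCR files are not always strictly well-formed XML.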


I'm experimenting with all of these issues, and I'd like to know whether any other Wikisource contributor is interested.