Uh, that sounds very interesting. Right now, we mainly use the OCR from the DjVu files from Internet Archive (that means ABBYY FineReader, which is very nice).
But ideally we could think of a "customizable" OCR software that gets trained language by language: that would be extremely useful for Wikisources. (I can also imagine dividing, within each language, by century, because languages change over time too ;-)