I completely agree with Lars.
One example that comes to mind is postOCR, an awesome tool by Alex Brollo:
a JS script that automatically corrects the most common OCR errors and converts apostrophes.
The tool is very useful and widely used, and it would benefit a lot from
a per-book list of common OCR errors.
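
To make the idea concrete, here is a minimal Python sketch of how a per-book error list could drive the corrections. This is not postOCR's actual code (which is JavaScript and which I am not reproducing here); the names `BOOK_CORRECTIONS`, `apply_corrections` and `convert_apostrophes`, and the sample entries, are all made up for illustration.

```python
import re

# Hypothetical per-book list of OCR errors: misread form -> intended form.
# In practice this would be generated or curated for each book.
BOOK_CORRECTIONS = {
    "tbe": "the",
    "aud": "and",
    "wbich": "which",
}

def apply_corrections(text, corrections):
    """Replace whole-word OCR errors using a per-book mapping."""
    if not corrections:
        return text
    # Build one alternation of all known errors, matched as whole words only.
    alternatives = "|".join(re.escape(word) for word in corrections)
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(lambda m: corrections[m.group(1)], text)

def convert_apostrophes(text):
    """Turn a straight apostrophe between two letters into a typographic one."""
    return re.sub(r"(?<=\w)'(?=\w)", "\u2019", text)
```

For example, `apply_corrections("tbe cat aud tbe dog", BOOK_CORRECTIONS)` gives back "the cat and the dog", while a word that merely contains "tbe" is left alone because of the word boundaries.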
Moreover, a set of per-book statistics
(the list of words used, how often each occurs, etc.)
could be very interesting for a small but skilled group of readers,
such as digital humanists and philologists.
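
As a rough illustration of what such statistics could look like, here is a small Python sketch that counts the words in a book's text. The function names `word_stats` and `hapaxes` are hypothetical, and the tokenization (lowercased runs of letters) is a simplifying assumption; a real tool would need language-specific rules.

```python
import re
from collections import Counter

def word_stats(text):
    """Count how often each word occurs in the text of a book."""
    # A "word" here is any run of letters (digits and underscores excluded).
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words)

def hapaxes(counts):
    """Return the words that occur exactly once, in alphabetical order."""
    return sorted(word for word, n in counts.items() if n == 1)
```

Words that occur only once are often interesting to a philologist, and they may also be worth a second look during proofreading.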
As an example, we are collaborating right now with a philologist (a digital humanist)
who puts texts on Wikisource, proofreads them with the community,
and then works on them.
Aubrey