Federico Leva (Nemo), 11/10/2013 08:48:
Dispenser kindly made a list of DjVu files on Commons that link to an
Internet Archive (IA) item, with some information such as global usage:
https://toolserver.org/~dispenser/temp/djvu2archive.org.txt (just change
the extension to csv to open it as a spreadsheet, tab-separated).
It's about 5000 books with 6-200 global usages and 5000 outside that
range (which probably means they are either completely unused apart from
some talk pages or whatever, or have most of their text already living
on wiki pages).
If I manage to convince a "slash-admin", I'll get those 5000 re-OCR'd;
otherwise I need to do it manually, so suggestions on priorities are
welcome. :)
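
For anyone who wants to play with priorities, here is a minimal sketch of
how the tab-separated list could be filtered to the 6-200 range and sorted
by usage. I'm assuming a header row and a column called "global_usage";
both are guesses, as are the "file" and "ia_item" columns and the sample
rows, so adjust them to whatever the real file contains.

```python
import csv
import io


def prioritise(tsv_text, usage_column="global_usage", low=6, high=200):
    """Return rows whose usage is within [low, high], most used first.

    The column name is hypothetical: check the header of the real file.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    selected = [row for row in reader if low <= int(row[usage_column]) <= high]
    # Sort by usage, descending, so the most widely used books come first.
    selected.sort(key=lambda row: int(row[usage_column]), reverse=True)
    return selected


# Made-up sample rows, only to show the expected shape of the input.
sample = (
    "file\tia_item\tglobal_usage\n"
    "A.djvu\titem_a\t3\n"
    "B.djvu\titem_b\t42\n"
    "C.djvu\titem_c\t950\n"
    "D.djvu\titem_d\t7\n"
)

priority = prioritise(sample)
print([row["file"] for row in priority])
```

Sorting by usage is only one possible ordering; language or age of the
existing OCR might matter more.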
Jeff at the Internet Archive tells me they haven't tested the new OCR
extensively yet, so they won't re-OCR en masse yet.
I'll select a few test cases, reupload them to different items and see
what difference the new OCR makes: I could use some help comparing the
results for non-Romance languages though... I'll also try some books in
the newly supported languages: Hebrew and Thai (now with dictionary),
Chinese (traditional and simplified) and Japanese.
Nemo