Hi Nemo,

That's great news.

I wonder, though, how worthwhile it would be to redo the OCR on the archive.org DjVu, since the new OCR would be on archive.org but not on Commons...

Are you implying that we would need to re-upload the DjVu to Commons?

BTW,

I think it's past time that Archive.org and Wikimedia started a real partnership/collaboration.

Some months ago, Micru and I tried to draft a possible model:

https://docs.google.com/file/d/0B1PNcNlN2oqvajVfOEFuM29sbzg/edit?usp=sharing

But I think the discussion died (as did many others).

One of the things we could do is a project similar to this:

https://www.mediawiki.org/wiki/Possible_projects#Google_Books_.3E_Internet_Archive_.3E_Commons_upload_cycle

Aubrey

On Tue, Oct 1, 2013 at 6:25 PM, Federico Leva (Nemo) <nemowiki@gmail.com> wrote:

As you know, many of us use archive.org to OCR our books: <https://en.wikisource.org/wiki/Help:DjVu_files#The_Internet_Archive>
For a while, they've been stuck with FineReader 8.0. I've just noticed the last OCR processes use 9.0, which has 5 more languages and 2 more dictionaries:
http://www.abbyy.com/support/finereader_90_ts/RecognitionLanguages/
http://www.abbyy.com/support/finereader_80_ts/RecognitionLanguages/

I think it's worth redoing the OCR on any archive.org DjVu you're using (and you definitely must do so if it's in one of those languages). I'm a (limited) admin there, so feel free to leave lists of items whose OCR should be updated on my talk page: https://wikisource.org/wiki/User_talk:Nemo_bis

Nemo

_______________________________________________
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l