On 11/28/2011 01:03 PM, Andrea Zanni wrote:
> - Technical: afaik, the toolserver runs Solaris, and apparently Finereader
>   is Windows only (I think we can easily solve this if we want, though)
> - Ethical: this is proprietary software, and I don't know if we *want*
>   to use it on Wikimedia projects...

You can buy Finereader Professional for €129, which is
single-user Windows software. Wikimedia Sweden paid for
a copy for one sv.wikisource volunteer, who has used it a lot.
It has a complicated graphical user interface and takes some
time to master, but then it gives really good OCR results.
For a server installation, that version would not be useful.
As you say, it runs only on Windows, and the single-user
license doesn't allow sharing of the software. There is
also a "Corporate Edition" with 3 or more user licenses,
starting at €999, which is not very useful to us.
What WMF could use is called "Finereader Engine", which
is an SDK (software development kit) that runs on a
server. See, for example,
http://www.abbyy.com/ocr_sdk_linux/
I think this is what the Internet Archive uses, as well as
several European libraries. We could look into cooperating
with the Internet Archive or perhaps with Europeana in this
area. Maybe the Internet Archive can open up an API for
OCR-ing a single page at a time?
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se