On 11/28/2011 01:03 PM, Andrea Zanni wrote:
- Technical: afaik, the Toolserver runs Solaris, and apparently Finereader
is Windows-only (I think we can easily solve this if we want, though)
- Ethical: this is proprietary software, and I don't know if we *want*
to use it on Wikimedia projects...
Finereader Professional, a single-user Windows application, costs €129. Wikimedia Sweden paid for a copy for one sv.wikisource volunteer, who has used it a lot. It has a complicated graphical user interface and takes some time to master, but once mastered it gives really good OCR results.
For a server installation, that version would not be useful. As you say, it runs only on Windows, and the single-user license doesn't allow sharing of the software. There is also a "Corporate Edition" with 3 or more user licenses, starting at €999, which is not very useful to us.
What WMF could use is called "Finereader Engine", which is an SDK (software development kit) that runs on a server. See, for example, http://www.abbyy.com/ocr_sdk_linux/
I think this is what the Internet Archive uses, as well as several European libraries. We could look into establishing a cooperation with the Internet Archive or perhaps with Europeana in this area. Maybe the Internet Archive can open up an API for OCR-ing a single page at a time?