On 11/30/2011 09:55 PM, Eugene Zelenko wrote:
ABBYY has its own online OCR service: http://finereader.abbyyonline.com
This is very interesting: OCR as a cloud service. I didn't know they were doing this. They charge EUR 7 per 200 pages, or about US$ 0.05 per page, which I guess can be (almost) reasonable for the Wikimedia Foundation to pay. I sometimes feel bad because I have OCRed so many tens of thousands of pages with a single EUR 129 license of FineReader. Here, EUR 129 would buy us only about 3700 pages.
All languages of Wikisource together are proofreading slightly less than 900 pages/day, for which OCR would cost EUR 32/day or US$ 43/day. With good OCR, proofreading is more fun, and these numbers may increase. But then again, we wouldn't need the service for all pages, as some books already have OCR.
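For anyone who wants to redo this arithmetic with other numbers, here is a minimal Python sketch. The prices and page counts are the ones quoted above; the exchange rate of 1.34 US$ per EUR is my own assumption for late 2011 and is not stated in the message, and the function names are made up for illustration.

```python
# Figures quoted in the message; only the exchange rate is an assumption.
PACKAGE_PRICE_EUR = 7.0    # price of one online package
PACKAGE_PAGES = 200        # pages included in that package
LICENSE_PRICE_EUR = 129.0  # one desktop FineReader license
PAGES_PER_DAY = 900        # pages proofread per day, all Wikisource languages
USD_PER_EUR = 1.34         # ASSUMED late-2011 rate, not from the message


def price_per_page_eur():
    """Online price per page in EUR."""
    return PACKAGE_PRICE_EUR / PACKAGE_PAGES


def pages_for_budget(budget_eur):
    """Whole pages that a given budget buys at the online per-page price."""
    return int(budget_eur * PACKAGE_PAGES / PACKAGE_PRICE_EUR)


def daily_cost_eur(pages_per_day=PAGES_PER_DAY):
    """Cost in EUR of sending every proofread page through the online service."""
    return pages_per_day * PACKAGE_PRICE_EUR / PACKAGE_PAGES


if __name__ == "__main__":
    print("EUR per page:", price_per_page_eur())
    print("Pages for one license price:", pages_for_budget(LICENSE_PRICE_EUR))
    print("EUR per day:", daily_cost_eur())
    print("US$ per day (assumed rate):", round(daily_cost_eur() * USD_PER_EUR, 2))
```

With these inputs the license price buys 3685 pages, and 900 pages/day costs EUR 31.50, which matches the rounded figures above.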
The most interesting feature of a cloud-based OCR service would be the ability to accumulate improvements in font training (?) and dictionaries from a large number of users over time. With Wikisource, they could of course get direct access to the page after proofreading.
So, is the service any good? They even promise to do Fraktur (blackletter). Does it work well?
On 12/01/2011 04:08 AM, Lars Aronsson wrote:
So, is the service any good? They even promise to do Fraktur (blackletter). Does it work well?
After having tried it, I'm less enthusiastic. The web user interface only lets you upload images and download the OCR text. There is no way to adjust segments / zones or to train the OCR. Only 40 languages are supported, and there is no way to specify special dictionaries for old spelling. Blackletter is only supported for German and Latvian. The upload button is based on Flash and didn't quite work in Firefox on Linux, but it worked in Opera.
It worked OK for a modern (not blackletter) Norwegian text from the 1930s. An advantage is that you can start as low as 50 pages for EUR 3.50. Double that price and you get 200 pages. For advanced jobs, I still recommend buying the Professional edition of FineReader, but some users might find the online version useful.
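To make the difference between the two packages explicit, here is a small Python sketch of the per-page price for each. The page counts and prices are the ones mentioned above; the labels "small" and "large" are my own names, not ABBYY's.

```python
# (pages, price in EUR) as quoted above; the package labels are hypothetical.
PACKAGES = {
    "small": (50, 3.50),
    "large": (200, 7.00),
}


def per_page_eur(pages, price_eur):
    """Price per page in EUR for one package."""
    return price_eur / pages


if __name__ == "__main__":
    for name, (pages, price) in PACKAGES.items():
        print(name, pages, "pages:", per_page_eur(pages, price), "EUR/page")
```

The small package comes out at twice the per-page price of the large one, so it mainly makes sense for a quick trial.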
wikisource-l@lists.wikimedia.org