[Wikisource-l] ABBYY Finereader 11 on Toolserver: do we like it?

Andrea Zanni andrea.zanni at wikimedia.it
Mon Nov 28 12:03:58 UTC 2011


Dear all,
for a long time I have been thinking about a project that could help
Wikisource (and some GLAMs too). The idea is simply to install ABBYY
Finereader 11 on the toolserver, as a tool for all Wikisource users.

For those who don't know, ABBYY Finereader is OCR software: it is
proprietary and fairly expensive, but it is accurate and works really
well. Plus, version 11 can save files in DjVu.

Now, in my mind, having such software on the toolserver would let us:
- restore our beloved OCR button, with much more accurate OCR
- use Finereader to convert PDF/TIFF/JPG files from Commons directly into
OCRed DjVus
- do other things I haven't thought of yet

There are many issues too:
- Cost: I don't know how much this could cost. Many WM chapters do give
money to the toolserver, and the status of the thing is a bit fuzzy at the
moment, but, for example, Wikimedia Italy has set aside 5000 euros for the
toolserver, and maybe we can use that money for the license (I'm on the
WMI Board, and I've asked; they say it's OK);
- Technical: afaik, the toolserver runs Solaris, and apparently Finereader
is Windows-only (I think we can easily solve this if we want, though)
- Ethical: this is proprietary software, and I don't know if we *want* to
use it on Wikimedia projects...
- Resources: I think this is probably the main issue: we need skilled
people to set this up technically, and at least one toolserver operator
(Phe, maybe?)

Below is the mail I sent to ABBYY Europe, to see if the thing was feasible.
They simply replied that they want a phone call. Of course, if it turns out
to be too expensive, the project collapses immediately, but I think it's
worth discussing. If nobody wants it, I can drop it right now.

Please forward this mail to anyone who might be interested;
I don't think it's a good idea to scatter the discussion across every ws
Village Pump.

Cheers

Aubrey




*From:* Andrea Zanni <andrea.zanni at wikimedia.it>

*Sent:* Friday, November 25, 2011 10:20 AM

*To:* support_eu at abbyy.com

*Subject:* Questions about server licenses



Dear ABBYY Europe,
my name is Andrea Zanni, and I'm a Board member of Wikimedia Italy,
the Italian chapter of the Wikimedia movement.
We are a non-profit association which promotes and supports Wikimedia
projects, such as the online encyclopedia Wikipedia.

I'm writing to you because I'm interested in knowing
about "server licences" for your new Finereader 11.

As far as I know, your product saves files in DjVu, and this is an
interesting feature that could help some of our projects.
Maybe you know Wikisource, a multilingual digital library in which the
community uploads, transcribes and proofreads books.
This is the English version (http://en.wikisource.org/wiki/Main_Page).
On each page of each book (which are uploaded in DjVu), we have a little
"OCR" button which used to call a Tesseract bot and OCR the page.
Right now, the bot doesn't work for lack of maintenance.

My idea would be to replace Tesseract with Finereader, and also to have
the possibility of using other features, such as taking a PDF/JPEG file
and saving it as an OCRed DjVu, or choosing the OCR language from project
to project.

Now, I do not have a firm estimate of how much this engine would be used (I
understand this is a crucial factor for the price of a server license).
I would guess a few hundred pages OCRed per day (maybe more, if this
thing works), and a few dozen file conversions (any format to DjVu) per day.

So, my questions are:
- do you have a rough idea of how much this license would cost?
- do you know if it is possible to run FR11 on operating systems other than
Windows (we actually run Solaris)?
- do you know if it is possible to access all these features via an API or
something similar?

Thank you for your time,
regards

Andrea Zanni

-- 
Wikimedia Italia Board



Support culture, donate to Wikimedia Italia.
http://sostienilacultura.it