I tried ABBYY before and the quality was low.
I will try Tesseract and see what happens.


On Tue, Jun 24, 2014 at 7:08 PM, Aleksey Chalabyan <xelgen.am@gmail.com> wrote:
ABBYY FineReader has supported Hebrew and Arabic since v. 11. But I'm afraid sharing a script is not enough. For example, FineReader has three versions for Armenian. All three use the same script, with different orthography and slightly different vocabulary, but if you set the wrong language the drop in quality is dramatic. So I'm not sure whether Arabic OCR would work well for text in Farsi (Persian).
FineReader provides a 30-day full trial, and I think it's worth giving it a try.

You may try to approach ABBYY and check whether there are any plans for full support of Persian in the near future.

And training Tesseract seems like a good idea for getting a free/open source OCR for Persian, if you can get enough resources for that. But I can't comment on how well it will work with RTL scripts, especially with Nastaliq/Naskh, where letters and words are not separated from each other.

On Tue, Jun 24, 2014 at 6:13 PM, Federico Leva (Nemo) <nemowiki@gmail.com> wrote:
Amir Ladsgroup, 24/06/2014 15:37:

I have access to huge resources of old books in Persian (some of them
are even typed) and almost all of them can be imported to Wikisource, but
the problem is I don't have (or know of) any OCR for Persian. Do you know
which OCR software supports Persian texts (supporting Arabic is not
enough; I checked several programs)?

The only result for "Persian" and OCR on the ABBYY website is <http://www.abbyy.com/CaseStudies/SISU-Reveals-Its-Multilingual-Content-to-Academic-Community-Thanks-to-ABBYY-Recognition-Server/>, weird! Worth asking them for some details; they might have some additional plugins.

On the FLOSS side, maybe some library in Iran has made some investments in Tesseract? If there's any big digital library of Persian content, you should ask them as well.

Reminder: archive.org is still in need of people willing to compare 8.0 vs. 9.0 OCR results of some books in their language. :)


Wikisource-l mailing list
