Yeah!
I'm really happy that the BUB tool is being revived, and about the new OCR script. Thanks, everyone!

Aubrey

On Tue, Jan 5, 2016 at 9:53 PM, Asaf Bartov <abartov@wikimedia.org> wrote:
On Tue, Jan 5, 2016 at 10:29 AM, Bodhisattwa Mandal <bodhisattwa.rgkmc@gmail.com> wrote:
Hi,

I am happy to inform you that Shrinivasan has created a Python script to automate the process on Linux systems. The script uploads the PDF files to Google Drive, downloads the OCRed text, and splits and merges the text files so that they match the PDF file. We have just tested the script on small files for Kannada and Bengali Wikisource, and it was successful. We are going to test the script with different types and sizes of files, and in other Indic languages, in the next few days.

The script is at https://github.com/tshrinivasan/OCR4wikisource

Fantastic news!

   A.


_______________________________________________
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l