Hi Asaf,

Thank you for helping us.

At present I am working on this myself for my native Bengali Wikisource. There are some programs (1, 2, 3, 4) available on the web for downloading books from the DLI website. They may help you too! Sometimes I face a big problem with these utilities: the service side is unavailable or down for a few moments.

1) http://sanskritdocuments.org/scannedbooks/dlidownloader/
2) http://dlidownloader.wordpress.com/
3) http://code.google.com/p/dli-downloader/
4) http://techstunted.blogspot.in/2013/03/downloading-books-from-digital-library.html

I found another issue at the Internet Archive. I have uploaded some books in PDF format to IA here [1], but no books have been converted to DjVu because, as they say, "A DjVu can only be made if the language of the book is OCRable. At this time we are not able to OCR Bengali." I know PDF is also an accepted format on WS, but I would prefer DjVu.

[1] https://archive.org/search.php?query=uploader%3A%22jayantanth%40gmail.com%22&sort=-publicdate

I shall send you the download list by mail off-list.

Jayanta



On Fri, Dec 6, 2013 at 4:49 AM, Asaf Bartov <abartov@wikimedia.org> wrote:
Jayanta, I'm also happy to help, if bandwidth is a problem.  If you send me a list of URLs of books at the DLI that you'd like me to download and upload to the Internet Archive for you, I'm happy to do it.

   A.


On Wed, Dec 4, 2013 at 2:14 PM, Yann Forget <yannfo@gmail.com> wrote:
2013/12/5 Jayanta Nath <jayantanth@gmail.com>
Hi Yann,

Thank you for sharing this add-on and website. This site may be very useful for sa.wikisource.org.

Yes, I will upload some of these books and tell them.
 
I am working on my native Bengali Wikisource. Can you help us develop OCR for Bengali?

Unfortunately, Bengali OCR may not even exist in commercial software. I know a French company which is making OCR for Indian languages, but it will take some time.
Bengali is not available in ABBYY FineReader 11 Professional Edition, which is the world's leading OCR software and is what Internet Archive uses. Several dozen languages are available: all European languages, Latin, Greek, Russian, Chinese, Japanese, Korean, Arabic, several African languages, etc., but no Indian language is on the list.
Developing OCR is a very long and complex task. And I don't speak Bengali, so I can't help much.

However, I can help create PDF and/or DjVu files and upload them.

Best regards,

Yann

_______________________________________________
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l




--
    Asaf Bartov
    Wikimedia Foundation

Imagine a world in which every single human being can freely share in the sum of all knowledge. Help us make it a reality!
