Hi Asaf,
Thank you for helping us.
Presently I am personally working on this for my native Bengali Wikisource. There are some programs (1, 2, 3, 4) available on the web for downloading books from the DLI website. They may help you too! Sometimes I run into a big problem with these utilities: the service on the server side is unavailable or down for a few moments.
1) http://sanskritdocuments.org/scannedbooks/dlidownloader/
2) http://dlidownloader.wordpress.com/
3) http://code.google.com/p/dli-downloader/
4) http://techstunted.blogspot.in/2013/03/downloading-books-from-digital-librar...
Another issue I found is at the Internet Archive. I have uploaded some books in PDF format to IA here [1], but no books have been converted to DjVu, because they say that a DjVu can only be made if the language of the book is OCRable, and at this time they are not able to OCR Bengali. I know PDF is also an accepted format on WS, but I would prefer DjVu.
1) https://archive.org/search.php?query=uploader%3A%22jayantanth%40gmail.com%22...
I shall send you the download list by mail, off-list.
Jayanta
On Fri, Dec 6, 2013 at 4:49 AM, Asaf Bartov abartov@wikimedia.org wrote:
Jayanta, I'm also happy to help, if bandwidth is a problem. If you send me a list of URLs of books at the DLI that you'd like me to download and upload to the Internet Archive for you, I'm happy to do it.
A.
On Wed, Dec 4, 2013 at 2:14 PM, Yann Forget yannfo@gmail.com wrote:
2013/12/5 Jayanta Nath jayantanth@gmail.com
Hi Yann,
Thank you for sharing this add-on and website. This site may be very useful for sa.wikisource.org.
Yes, I will upload some of these books and tell them.
I am working on my native Bengali Wikisource. Can you help us develop OCR for Bengali?
Unfortunately, Bengali may not even exist in commercial software. I know a French company which is making OCR for Indian languages, but it will take some time. Bengali is not available in ABBYY FineReader 11 Professional Edition, which is the world's leading OCR software and is what the Internet Archive uses. Several dozen languages are available in it (all European languages, Latin, Greek, Russian, Chinese, Japanese, Korean, Arabic, several African languages, etc.), but no Indian language is in the list. Developing OCR is very long and complex work. And I don't speak Bengali, so I can't help much.
However, I can help with creating PDF and/or DjVu files and uploading them.
Best regards,
Yann
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l
-- Asaf Bartov Wikimedia Foundation http://www.wikimediafoundation.org