On Mon, Nov 15, 2010 at 11:33 AM, Murali Kumar <pthooran(a)hotmail.com> wrote:
Dear Wikimedia India,
As you are probably aware, the Govt. of India, immediately post-Independence,
started multiple Indian language encyclopedia projects to stream in Science
and Technology. The Tamil language encyclopedia was completed
[http://en.wikipedia.org/wiki/Tamil_Encyclopedia].
I'm pleased to report Tamil Virtual University has scanned in the Tamil
Kalaikalanjiam / Tamil Encyclopedia [Please see Reference 1 below].
I was able to download the material using the wonderful wget command and the
'convert' tool (from ImageMagick) on GNU/Linux. However, each of the 10
volumes is close to 700 MB without compression.
I would imagine the people behind this mammoth task (pre-internet era)
would have liked it to be merged into a wiki-type format, which would make
it a truly living document in sync with the times.
I do not have any experience with 1) Tamil OCR software and 2) Automated
updates to Wikipedia.
Can anyone take the lead on this project? It will help boost the number of
quality articles in Indian languages. The Children's encyclopedia is being
scanned and has a lot of great visual content.
I have uploaded a sample (10 MB) PDF file at
https://sites.google.com/site/periasamythooran/kalaikalanjiam/kalaikalanjia…
Do we have a license to adopt or use this material for Tamil Wikipedia?
I am working on a similar plan with the Marathi Viswakosh project in Mumbai.
--
GN