From the recent Chennai hackathon
<https://www.mediawiki.org/wiki/Chennai_Hackathon_March_2012>, a project
that might be of interest to researchers:
3. Find list of unique Tamil words in tawiki
By: Shrinivasan T
What it does:
It took the entire Tamil Wikipedia dump and extracted all unique words
from it. About 1.3 million unique Tamil words were extracted. The list has
multiple applications, including a Tamil spell checker.
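
For anyone curious how such an extraction might work, here is a minimal
Python sketch. It is not the project's actual code (see the repository for
that). It assumes the dump has already been reduced to plain text lines and
treats any run of characters from the Tamil Unicode block (U+0B80 to U+0BFF)
as a word. The names TAMIL_WORD and unique_tamil_words are made up for this
illustration.

```python
import re

# Assumption: a "Tamil word" is any maximal run of code points in the
# Tamil Unicode block (U+0B80..U+0BFF). That block also contains Tamil
# digits and symbols, so a real tool may want a stricter filter.
TAMIL_WORD = re.compile(r"[\u0B80-\u0BFF]+")


def unique_tamil_words(lines):
    """Return the set of distinct Tamil-script tokens found in `lines`.

    `lines` is any iterable of strings, for example an open text file.
    """
    words = set()
    for line in lines:
        # findall returns every non-overlapping match on the line
        words.update(TAMIL_WORD.findall(line))
    return words


if __name__ == "__main__":
    # Small in-memory sample; Latin text and ASCII digits are ignored.
    sample = ["தமிழ் wiki தமிழ் மொழி 2012"]
    for word in sorted(unique_tamil_words(sample)):
        print(word)
```

Because the function accepts any iterable of strings, a large text file can
be passed in directly and is read line by line, so only the resulting set
of words has to fit in memory.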
Status:
Code and the dataset live on GitHub:
https://github.com/tshrinivasan/tamil-wikipedia-word-list
--
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation