From a recent hackathon (https://www.mediawiki.org/wiki/Chennai_Hackathon_March_2012), a project that might be of interest to researchers:
- Find list of unique Tamil words in tawiki
By: Shrinivasan T
What it does: It takes the entire Tamil Wikipedia dump and extracts every unique word from it. About 1.3 million unique Tamil words were extracted. The word list has multiple applications, including a Tamil spell checker.
Status: The code and the dataset are available on GitHub: https://github.com/tshrinivasan/tamil-wikipedia-word-list
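
For readers curious how such an extraction might work, here is a minimal sketch in Python. It is not the project's actual code (see the repository for that). It assumes a "word" is any unbroken run of characters from the Unicode Tamil block (U+0B80 to U+0BFF), which also admits Tamil digits and symbols, and the names TAMIL_WORD and unique_tamil_words are made up for this illustration.

```python
import re

# Assumption: a "word" is a maximal run of code points in the Unicode
# Tamil block (U+0B80..U+0BFF). The real project may tokenize differently.
TAMIL_WORD = re.compile(r"[\u0B80-\u0BFF]+")


def unique_tamil_words(lines):
    """Return the set of distinct Tamil-script words found in an iterable of text lines."""
    words = set()
    for line in lines:
        # findall returns every non-overlapping match in the line;
        # adding them to a set discards duplicates.
        words.update(TAMIL_WORD.findall(line))
    return words


if __name__ == "__main__":
    sample = ["தமிழ் wiki தமிழ்", "விக்கிப்பீடியா 2012"]
    for word in sorted(unique_tamil_words(sample)):
        print(word)
```

Because the function accepts any iterable of lines, an open file handle for a decompressed dump can be passed in directly, so the text is streamed line by line and only the set of unique words is held in memory.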