2012/3/29 Sumana Harihareswara <sumanah@wikimedia.org>:
From a recent hackathon (https://www.mediawiki.org/wiki/Chennai_Hackathon_March_2012), a project that might be of interest to researchers:
- Find list of unique Tamil words in tawiki
By: Shrinivasan T
What it does: It took the entire Tamil Wikipedia dump and extracted all unique words from it. About 1.3 million unique Tamil words were extracted. It has multiple applications, including a Tamil spell checker.
Status: Code and the dataset live on github: https://github.com/tshrinivasan/tamil-wikipedia-word-list
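The extraction step described above can be sketched roughly as follows. This is not the code in the linked repository. It is a minimal illustration under two assumptions: the dump has already been converted from wiki markup to plain text, and a "word" is a maximal run of characters from the Unicode Tamil block (U+0B80 to U+0BFF). The function names are hypothetical. A naive spell check then just flags words that are missing from the extracted list.

```python
import re
from collections import Counter

# Assumption: a Tamil "word" is a maximal run of code points from the
# Unicode Tamil block (U+0B80 to U+0BFF). This also matches Tamil digits
# and ignores zero-width joiners, so it is only an approximation.
TAMIL_WORD = re.compile(r"[\u0B80-\u0BFF]+")


def unique_tamil_words(lines):
    """Count each distinct Tamil word across an iterable of plain-text lines.

    Hypothetical helper; assumes wiki markup was already stripped.
    """
    counts = Counter()
    for line in lines:
        counts.update(TAMIL_WORD.findall(line))
    return counts


def unknown_words(text, vocabulary):
    """Naive spell check: return Tamil words in text absent from vocabulary."""
    return [w for w in TAMIL_WORD.findall(text) if w not in vocabulary]


if __name__ == "__main__":
    sample = ["தமிழ் விக்கிப்பீடியா", "தமிழ் மொழி"]
    words = unique_tamil_words(sample)
    print(len(words))  # 3 distinct words
    print(unknown_words("தமிழ் மொழி", set(words)))  # []
```

Keeping counts (rather than a plain set) makes it easy to drop rare forms, which in a wiki dump are often typos rather than real words.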
Spel chekkerz rool.
There are other projects like this. I didn't try to run any of them, but here's one with a very similar description: https://github.com/pune-lug/shabdakosh
It would be nice if they would unite their efforts to make better language proofing tools for all languages.
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore