2012/3/29 Sumana Harihareswara <sumanah(a)wikimedia.org>:
From a recent hackathon
https://www.mediawiki.org/wiki/Chennai_Hackathon_March_2012 , a project
that might be of interest to researchers:
> 3. Find list of unique Tamil words in tawiki
> By: Shrinivasan T
>
> What it does:
> It took the entire Tamil Wikipedia dump and extracted all unique words
> out of it. About 1.3 million unique Tamil words were extracted. It has
> multiple applications, including a Tamil spell checker.
>
> Status:
> Code and the dataset live on GitHub:
>
> https://github.com/tshrinivasan/tamil-wikipedia-word-list
Spel chekkerz rool.
There are other projects like this. I didn't try to run any of them,
but here's one with a very similar description:
https://github.com/pune-lug/shabdakosh
It would be nice if they would unite their efforts to make better
language proofing tools for all languages.
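
For anyone wondering what this kind of extraction might involve, here is
a minimal Python sketch. It is not the code from either repository above.
It assumes the dump has already been converted to plain text, and it treats
any run of characters from the Tamil Unicode block (U+0B80 to U+0BFF) as a
"word". The names TAMIL_WORD and unique_tamil_words are made up for
illustration.

```python
import re

# Assumption: a "word" is any run of characters in the Tamil Unicode block
# (U+0B80..U+0BFF). This block also holds Tamil digits and symbols, so a real
# word list would probably need extra filtering.
TAMIL_WORD = re.compile(r"[\u0B80-\u0BFF]+")


def unique_tamil_words(lines):
    """Return the set of distinct Tamil-script runs found in an iterable of text lines."""
    words = set()
    for line in lines:
        words.update(TAMIL_WORD.findall(line))
    return words


if __name__ == "__main__":
    # Tiny in-memory sample standing in for a plain-text dump.
    sample = ["தமிழ் விக்கிப்பீடியா is a wiki", "தமிழ் மொழி"]
    for word in sorted(unique_tamil_words(sample)):
        print(word)
```

Because the function accepts any iterable of lines, an open file object can
be passed in directly, so a large dump does not have to fit in memory at once.
Only the set of unique words does.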
--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore