2012/3/29 Sumana Harihareswara <sumanah(a)wikimedia.org>:
From a recent
, a project
that might be of interest to researchers:
> 3. Find list of unique Tamil words in tawiki
> By: Shrinivasan T
> What it does:
> It took the entire Tamil Wikipedia dump and extracted all unique words
> out of it. About 1.3 million unique Tamil words were extracted. Has
> multiple applications, including a Tamil spell checker.
> Code and the dataset live on github:
Spel chekkerz rool.
There are other projects like this. I haven't tried running any of them,
but here's one with a very similar description:
It would be nice if they would unite their efforts to make better
language proofing tools for all languages.
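
For anyone curious what the extraction step quoted above involves, here is
a rough sketch. It is not Shrinivasan's code. It assumes the dump has
already been converted to plain text, and it treats a "word" as any
unbroken run of characters from the Tamil Unicode block (U+0B80 to U+0BFF).
The function name unique_tamil_words is made up for this example.

```python
import re

# Assumption: a "word" is a maximal run of code points in the Tamil
# Unicode block (U+0B80..U+0BFF). Real tokenization may need more care.
TAMIL_WORD = re.compile(r"[\u0B80-\u0BFF]+")


def unique_tamil_words(lines):
    """Return the set of distinct Tamil words found in an iterable of strings."""
    words = set()
    for line in lines:
        # findall returns every non-overlapping match in the line;
        # the set discards duplicates across the whole input.
        words.update(TAMIL_WORD.findall(line))
    return words
```

Because it takes any iterable of strings, an open text file can be passed
in directly and is then read line by line, which matters for a dump of
this size. The resulting set could serve as the word list for a simple
spell checker.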
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
“We're living in pieces,
I want to live in peace.” – T. Moore