I'm writing a new Wiktionary parser, and I'm wondering whether anybody
else who has made, is making, or wants to make a Wiktionary parser
would like to share some thoughts.
My main aim is to mine translation data to use with my other project,
Linguaphile, a language translator.
At the moment I'm parsing the XML dump file, but I also want an
interface to fetch wikitext from the live Wiktionary.
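
To make those two access paths concrete, here is a minimal sketch in Python (not the Perl code mentioned in this post, and purely illustrative). It assumes the dump follows the usual MediaWiki export layout of <page> elements containing <title> and <revision>/<text>, and that the live site serves raw wikitext through index.php with action=raw. The names iter_pages, raw_url and SAMPLE are made up for this example, and the namespace URI in the sample is only a placeholder, since the code ignores namespaces.

```python
import io
import urllib.parse
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix that the XML export puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page> in a MediaWiki XML dump.

    Assumes the standard export layout; if a page carries several
    revisions, the text of the last one wins.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # drop the finished page's children to limit memory use


def raw_url(title, host="en.wiktionary.org"):
    """Build the index.php?action=raw URL for a live page's wikitext."""
    query = urllib.parse.urlencode({"title": title, "action": "raw"})
    return f"https://{host}/w/index.php?{query}"


# Tiny hand-written stand-in for a dump file (hypothetical content).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>water</title>
    <revision><text>==English==\n* French: {{t|fr|eau|f}}</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
```

For a real dump, the same generator can be fed an open file object (for example one returned by bz2.open), and the string from raw_url can be passed to any HTTP client.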
I'm focusing on the English Wiktionary first because I know its
format, but I'd also like to target the other bigger Wiktionaries.
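
As a rough illustration of what mining translation data from the English Wiktionary format can look like, the sketch below pulls (language code, word) pairs out of translation templates. It assumes translations are written with the {{t|code|word|...}} template family (t, t+, t-, tt, tt+), which is a convention that has varied over time and differs on other Wiktionaries. T_TEMPLATE, extract_translations and the sample entry are hypothetical names and data for this example only.

```python
import re

# Assumption: translations look like {{t|fr|eau|f}} or {{t+|de|Wasser|n}}.
# Group 1 is the language code, group 2 the translated word; any further
# parameters (gender, transliteration, ...) are skipped.
T_TEMPLATE = re.compile(r"\{\{tt?[+-]?\|([^|{}]+)\|([^|{}]+)[^{}]*\}\}")


def extract_translations(wikitext):
    """Return a list of (language_code, word) pairs found in the wikitext."""
    return [(m.group(1), m.group(2)) for m in T_TEMPLATE.finditer(wikitext)]


# Hand-written sample in the style of an English Wiktionary entry.
SAMPLE_ENTRY = """====Translations====
{{trans-top|clear liquid}}
* French: {{t+|fr|eau|f}}
* German: {{t+|de|Wasser|n}}
* Spanish: {{t|es|agua|f}}
{{trans-bottom}}
"""

translations = extract_translations(SAMPLE_ENTRY)
```

A regular expression like this is only a starting point: it ignores nested templates and plain wikilinks, so a fuller parser would need a proper template tokenizer.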
Another thing I'm thinking about is a central repository for
Wiktionary parser source code. The code I'm making now is in Perl but
I'm sure others have code in Python.
I know several people have parsed the English Wiktionary - has anybody
made parsers for other Wiktionaries yet?
Let's hear what you are working on.
Andrew Dunbar (hippietrail)