Andrew Dunbar wrote:
I'm writing a new Wiktionary parser and I'm wondering if anybody else
who has made, is making, or wants to make a Wiktionary parser would
like to share some thoughts.
My main aim is to mine translation data to use with my other project,
Linguaphile, a language translator.
At the moment I'm parsing the XML dump file but I also want an
interface to fetch wikitext from the live Wiktionary.
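For anyone following along, here is a minimal sketch of the dump-parsing side in Python. It is not the Perl code mentioned in this message. It assumes the usual MediaWiki export layout (a <page> element holding a <title> and a <revision> with a <text> child); the names iter_pages, local_name and SAMPLE are made up for the illustration.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) pairs from a MediaWiki XML export stream.

    Assumes each <page> holds one <title> and at least one <text>;
    if a page has several revisions, the last <text> seen wins.
    """
    title, text = None, None
    # iterparse streams the file, so the whole dump is never loaded at once.
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text
        elif name == "page":
            yield title, text or ""
            title, text = None, None
            elem.clear()  # drop the finished page's children to save memory


# Tiny hand-written stand-in for a real dump file (hypothetical content).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>dog</title>
    <revision><text>==English==</text></revision>
  </page>
  <page>
    <title>chien</title>
    <revision><text>==French==</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for page_title, wikitext in iter_pages(io.BytesIO(SAMPLE)):
        print(page_title, len(wikitext))
```

With a real dump, the same function should work on an open file object in place of the BytesIO wrapper. For the live site, MediaWiki can serve a page's raw wikitext through index.php with action=raw, which could feed the same downstream parsing.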
I'm focusing on the English Wiktionary first because I know its
format, but I'd also like to target the other bigger Wiktionaries.
Another thing I'm thinking about is a central repository for
Wiktionary parser source code. The code I'm making now is in Perl but
I'm sure others have code in Python.
I know several people have parsed the English Wiktionary - has anybody
made parsers for other Wiktionaries yet?
Let's hear what you are working on.
Andrew Dunbar (hippietrail)
The wiktionary.py class of pywikipediabot has "alpha" support for
the English and Dutch Wiktionaries. Since several large Wiktionaries use
the Dutch "templatized" format, it should be simple to extend support to
those wikis as well.
Minh Nguyen <mxn(a)zoomtown.com>
[[en:User:Mxn]] [[vi:User:Mxn]] [[m:User:Mxn]]
AIM: trycom2000; Jabber: mxn(a)myjabber.net; Blog: http://notes.1ec5.org/