2009/10/17 Lars Aronsson <lars@aronsson.se>:
Kelly Jones wrote:
How can I extract just a word list w/ definitions from wiktionary?
A very simple Perl script for extracting information from the Wikimedia XML dumps is found on http://meta.wikimedia.org/wiki/User:LA2/Extraktor
If you know Perl, you can modify this script to filter out the articles and sections you want, and output them separately.
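To illustrate the kind of filtering described here, the following is a minimal sketch in Python (not Perl, and not the Extraktor script itself). It assumes the usual MediaWiki export layout of <page>, <title> and <text> elements, and the English Wiktionary conventions that a language section starts with a level-2 heading such as ==English== and that definition lines start with a single "#". The function names and the sample entry are made up for illustration.

```python
import io
import re
import xml.etree.ElementTree as ET

# Made-up miniature dump, used only to exercise the functions below.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page>
    <title>hand</title>
    <revision><text>==English==
===Noun===
# The part of the arm below the wrist.
#: ''She raised her hand.''

==Swedish==
===Noun===
# [[hand]]
</text></revision>
  </page>
</mediawiki>"""


def local(tag):
    """Drop the XML namespace prefix: '{...}page' becomes 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) per <page>, streaming so large dumps fit in memory."""
    title, text = None, None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text or ""
            title, text = None, None
            elem.clear()  # release the finished page's subtree


def definitions(wikitext, language="English"):
    """Return definition lines ('# ...') found inside the ==language== section."""
    found, inside = [], False
    for line in wikitext.splitlines():
        heading = re.match(r"^==([^=]+)==\s*$", line)  # level-2 headings only
        if heading:
            inside = heading.group(1).strip() == language
            continue
        # Skip '#:' examples, '#*' quotations and '##' sub-senses.
        if inside and re.match(r"^#[^:*#]", line):
            found.append(line[1:].strip())
    return found


for page_title, page_text in iter_pages(io.BytesIO(SAMPLE.encode("utf-8"))):
    for definition in definitions(page_text):
        print(page_title + "\t" + definition)
```

For a real dump, the BytesIO object would be replaced by an open (and decompressed) dump file. The definitions still contain wiki markup such as [[links]] and templates, which this sketch does not strip.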
There are some Perl tools I've made in the Toolserver Subversion repository, but since I'm the only one using them so far they may be tricky for others to use: https://fisheye.toolserver.org/browse/enwikt/wiktdump/
At the most basic level, wiktsplitnames.pl (https://fisheye.toolserver.org/browse/enwikt/wiktdump/wiktsplitnames.pl) will split an English Wiktionary dump file into word lists or mini dump files for each language.
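As a rough idea of what a per-language split involves, here is a small Python sketch. It is not a port of wiktsplitnames.pl, whose actual behaviour is not shown in this thread. It assumes that each language section of an entry begins with a level-2 heading such as ==Swedish==, and it takes (title, wikitext) pairs as input. The function name and the two sample entries are hypothetical.

```python
import re
from collections import defaultdict

# Level-2 headings only: "==English==" matches, "===Noun===" does not.
LANG_HEADING = re.compile(r"^==([^=\n]+)==[ \t]*$", re.MULTILINE)


def split_by_language(pages):
    """Map each language name to the list of page titles that have a section for it."""
    word_lists = defaultdict(list)
    for title, wikitext in pages:
        for language in LANG_HEADING.findall(wikitext):
            word_lists[language.strip()].append(title)
    return word_lists


# Made-up entries standing in for pages read from a dump file.
sample_pages = [
    ("hand", "==English==\n===Noun===\n# ...\n\n==Swedish==\n===Noun===\n# ...\n"),
    ("katt", "==Swedish==\n===Noun===\n# ...\n"),
]

word_lists = split_by_language(sample_pages)
for language in sorted(word_lists):
    print(language + ": " + ", ".join(word_lists[language]))
```

Writing each list to its own file, or keeping the full page text per language to build mini dump files, would follow the same grouping step.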
I've also created a feature request on bugzilla: "Regularly publish updated word lists and definition lists" https://bugzilla.wikimedia.org/show_bug.cgi?id=21164
Andrew Dunbar (hippietrail)
-- Lars Aronsson (lars@aronsson.se) Aronsson Datateknik - http://aronsson.se
Wiktionary-l mailing list Wiktionary-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiktionary-l