Regarding the forms: that is currently not part of the dataset. Unfortunately, it's not very easy to add. I think it would even require some enhancement to the extractor (not just the config). But it's on my todo list... However, such "boxes" of word forms are probably easier to extract with the default DBpedia infobox extractor. Maybe the DBpedia community could help with that. The biggest problem there would be determining the right "context" (i.e. the subject URI)... I crossposted this to DBpedia, so they can reply.
Regards, Jonas
On Friday, 01.06.2012, at 12:08 +0200, Lars Aronsson wrote:
On 2012-05-31 12:42, Gerd Zechmeister wrote:
I'd like to extract German noun forms (Kasus and Numerus) but didn't find this data in the provided dumps.
Example: http://de.wiktionary.org/wiki/Haus
I need the data from the box:
Kasus       Singular    Plural
Nominativ   das Haus    die Häuser
This is provided in the wiki template call
{{Deutsch Substantiv Übersicht |... |Nominativ Singular=das Haus |Nominativ Plural=die Häuser ...
You can find that in this XML dump (only 50 MB compressed): http://dumps.wikimedia.org/dewiktionary/20120526/dewiktionary-20120526-pages...
An old Perl script for parsing the XML dumps is found here, http://meta.wikimedia.org/wiki/User:LA2/Extraktor
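As a rough illustration of the template-based approach described above, here is a minimal Python sketch that pulls the named parameters out of a {{Deutsch Substantiv Übersicht}} call in a page's wikitext. The function name extract_noun_forms is made up for this example, and the parsing is deliberately naive: it assumes the parameter values contain no nested templates or piped links, which real entries may violate. Reading the pages out of the XML dump is not covered here.

```python
# Hypothetical helper, not part of any existing extractor.
TEMPLATE_NAME = "Deutsch Substantiv Übersicht"


def extract_noun_forms(wikitext):
    """Return {parameter: value} for the first noun overview template, or {}."""
    start = wikitext.find("{{" + TEMPLATE_NAME)
    if start == -1:
        return {}
    # Assumption: no nested templates, so the first "}}" closes this call.
    end = wikitext.find("}}", start)
    if end == -1:
        end = len(wikitext)
    body = wikitext[start + 2 + len(TEMPLATE_NAME):end]

    forms = {}
    # Assumption: values contain no "|" (e.g. no piped links).
    for part in body.split("|"):
        if "=" not in part:
            continue  # skip empty or positional parts
        key, value = part.split("=", 1)
        forms[key.strip()] = value.strip()
    return forms


sample = (
    "== Haus ==\n"
    "{{Deutsch Substantiv Übersicht\n"
    "|Nominativ Singular=das Haus\n"
    "|Nominativ Plural=die Häuser\n"
    "}}\n"
)
forms = extract_noun_forms(sample)
print(forms["Nominativ Singular"], "/", forms["Nominativ Plural"])
```

The keys come back exactly as written in the template ("Nominativ Singular", "Nominativ Plural", and so on), so splitting a key on the space gives Kasus and Numerus separately.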