Re: [Wiktionary-l] Extracting German noun forms

1 Jun 2012


Regarding the forms: that is currently not part of the dataset, and
unfortunately it's not very easy to add. I think it would even require
some enhancement to the extractor (not just the config). But it's on my
to-do list...
However, such "boxes" of word forms are probably easier to extract with
the default DBpedia infobox extractor. Maybe the DBpedia community could
help with that. The biggest problem there would be determining the
right "context" (i.e. the subject URI)...
I cross-posted this to the DBpedia list so they can reply.
Regards,
Jonas
On Friday, 01.06.2012, at 12:08 +0200, Lars Aronsson wrote:
...
On 2012-05-31 12:42, Gerd Zechmeister wrote:
...
I'd like to extract German noun forms (Kasus and Numerus) but didn't find this data in the provided dumps.
Example: http://de.wiktionary.org/wiki/Haus
I need the data from the box:
Kasus 	Singular 	Plural
Nominativ 	das Haus 	die Häuser
This is provided in the wiki template call
{{Deutsch Substantiv Übersicht
|...
|Nominativ Singular=das Haus
|Nominativ Plural=die Häuser
...
You can find that in this XML dump (only 50 MB compressed):
http://dumps.wikimedia.org/dewiktionary/20120526/dewiktionary-20120526-pages...
An old Perl script for parsing the XML dumps can be found here:
http://meta.wikimedia.org/wiki/User:LA2/Extraktor
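
As a rough illustration of reading the template call quoted above, here is a minimal Python sketch. It is not the Perl script linked above and not part of any extractor. The function name extract_noun_forms is hypothetical, and the sketch assumes the page wikitext is already available as a string and that parameter values contain no nested templates or piped links.

```python
TEMPLATE_NAME = "Deutsch Substantiv Übersicht"


def extract_noun_forms(wikitext):
    """Return {parameter: value} for the first noun overview template, or {}.

    Assumption: parameter values contain no nested "{{...}}" and no "|",
    so a plain split on "|" is enough for this sketch.
    """
    start = wikitext.find("{{" + TEMPLATE_NAME)
    if start == -1:
        return {}
    end = wikitext.find("}}", start)
    if end == -1:
        return {}
    # Text between the template name and the closing braces.
    body = wikitext[start + 2 + len(TEMPLATE_NAME):end]
    forms = {}
    for part in body.split("|"):
        if "=" in part:
            key, value = part.split("=", 1)
            forms[key.strip()] = value.strip()
    return forms


# Sample built only from the two parameters quoted in this thread.
sample = """{{Deutsch Substantiv Übersicht
|Nominativ Singular=das Haus
|Nominativ Plural=die Häuser
}}"""

forms = extract_noun_forms(sample)
print(forms["Nominativ Singular"], "/", forms["Nominativ Plural"])
```

For real pages a proper wikitext parser would probably be more robust, since values may contain links or nested templates that this split does not handle.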
