Hi,
I'd like to extract German noun forms (Kasus and Numerus) but didn't find this data in the provided dumps.
Example: http://de.wiktionary.org/wiki/Haus
I need the data from the box:
Kasus      Singular             Plural
Nominativ  das Haus             die Häuser
Genitiv    des Hauses           der Häuser
Dativ      dem Haus, dem Hause  den Häusern
Akkusativ  das Haus             die Häuser
Any idea how to get this? Perhaps with a SPARQL query?
regards, Gerd
-- Gerd Zechmeister Research & Development Manager
Semantic Web Company GmbH Mariahilfer Straße 70 / 8 A - 1070 Vienna, Austria Tel +43 1 402 12 35 - 28 Fax +43 1 402 12 35 - 22 Mobile +43 650 3905697
http://www.semantic-web.at http://blog.semantic-web.at http://poolparty.biz
LOD2 - Creating Knowledge out of Interlinked Data - http://lod2.eu/
social: http://at.linkedin.com/pub/gerd-zechmeister/26/504/49a http://www.xing.com/profile/Gerd_Zechmeister?sc_o=mxb_p
Hi Gerd, I suppose you mean the DBpedia dumps of Wiktionary, because the Wiktionary XML dumps do contain the box data. If so, you're right: unfortunately the noun forms are not in there. A SPARQL query won't help you either, since it only returns the same information that is in the dumps. To add this information you would have to write a template for the "Entry Layout" as explained on the DBpedia website, but I'm not an expert on that. Maybe Jonas can tell you more about it, or whether it's even possible. Sorry I can't help you any further :-)
Hi Christoph,
thanks for your reply! In the meantime we'll investigate here at SWC as well and let you know.
btw: Virtuoso returns an error when querying the endpoint (http://wiktionary.dbpedia.org/sparql) with the expression below. Is that an encoding issue?
SELECT * WHERE { ?s ?p ?o FILTER(bif:contains(?o, "häuser")) }
Regards, Gerd
----- Original Message ----- From: "Christoph Lauer" dbpedia@online.ms To: "The Wiktionary (http://www.wiktionary.org) mailing list" wiktionary-l@lists.wikimedia.org Sent: Thursday, 31 May 2012 17:12:16 Subject: Re: [Wiktionary-l] Extracting German noun forms
_______________________________________________ Wiktionary-l mailing list Wiktionary-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiktionary-l
Hi Gerd, Apparently SPARQL has problems with umlauts; you'll have to use the Unicode escape sequence for the letter "ä" instead (\u followed by four hexadecimal digits). Regards, Christoph
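As a rough illustration of the escaping suggested above, here is a minimal Python sketch that rewrites non-ASCII characters in the search term as SPARQL codepoint escapes before building the query. The helper names `escape_non_ascii` and `build_query` are made up for this example, and whether the escaped form actually avoids the Virtuoso error on this endpoint has not been verified here.

```python
def escape_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with a \\uXXXX (or \\UXXXXXXXX) escape."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)                 # plain ASCII stays as it is
        elif code <= 0xFFFF:
            out.append("\\u%04X" % code)   # four hex digits, e.g. ä -> \u00E4
        else:
            out.append("\\U%08X" % code)   # eight hex digits outside the BMP
    return "".join(out)


def build_query(term: str) -> str:
    """Build the query from the thread with the search term escaped.

    Assumption: the endpoint accepts codepoint escapes inside the literal.
    """
    return ('SELECT * WHERE { ?s ?p ?o FILTER(bif:contains(?o, "%s")) }'
            % escape_non_ascii(term))


print(build_query("häuser"))
```

The query text is otherwise unchanged from the one in the thread; only the literal passed to `bif:contains` is escaped.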
On 2012-05-31 12:42, Gerd Zechmeister wrote:
I'd like to extract German noun forms (Kasus and Numerus) but didn't find this data in the provided dumps.
Example: http://de.wiktionary.org/wiki/Haus
I need the data from the box: Kasus Singular Plural Nominativ das Haus die Häuser
This is provided in the wiki template call:
{{Deutsch Substantiv Übersicht
|...
|Nominativ Singular=das Haus
|Nominativ Plural=die Häuser
...
You can find it in this XML dump (only 50 MB compressed): http://dumps.wikimedia.org/dewiktionary/20120526/dewiktionary-20120526-pages...
An old Perl script for parsing the XML dumps can be found here: http://meta.wikimedia.org/wiki/User:LA2/Extraktor
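For readers who prefer Python to the Perl script, the approach described above could be sketched roughly as follows: stream the pages of a MediaWiki XML export and read the named parameters of the "Deutsch Substantiv Übersicht" template call. This is only a sketch under assumptions: the function names `iter_pages` and `parse_noun_forms` are invented for this example, the Genitiv parameter names in the sample are guessed from the Nominativ pattern, and the naive split on "|" does not handle nested templates or piped links inside values. The sample is an in-memory stand-in for a dump file.

```python
import io
import xml.etree.ElementTree as ET

TEMPLATE_NAME = "Deutsch Substantiv Übersicht"


def iter_pages(xml_file):
    """Yield (title, wikitext) for each <page> of a MediaWiki XML export."""
    title, text = None, ""
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]   # drop the XML namespace, if any
        if tag == "title":
            title = elem.text
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            yield title, text
            elem.clear()                    # release memory for finished pages


def parse_noun_forms(wikitext: str) -> dict:
    """Return the named parameters of the first noun overview template call."""
    start = wikitext.find("{{" + TEMPLATE_NAME)
    if start == -1:
        return {}
    end = wikitext.find("}}", start)
    if end == -1:
        end = len(wikitext)
    body = wikitext[start + 2 + len(TEMPLATE_NAME):end]
    forms = {}
    # Naive split: breaks on nested templates or [[a|b]] links inside values.
    for part in body.split("|"):
        if "=" in part:
            key, value = part.split("=", 1)
            forms[key.strip()] = value.strip()
    return forms


# Hypothetical miniature dump; Genitiv parameter names are an assumption.
sample_dump = """<mediawiki><page><title>Haus</title><revision><text>
{{Deutsch Substantiv Übersicht
|Nominativ Singular=das Haus
|Nominativ Plural=die Häuser
|Genitiv Singular=des Hauses
|Genitiv Plural=der Häuser
}}</text></revision></page></mediawiki>""".encode("utf-8")

for title, wikitext in iter_pages(io.BytesIO(sample_dump)):
    forms = parse_noun_forms(wikitext)
    if forms:
        print(title, forms)
```

For a real dump, the `io.BytesIO` object would be replaced by an opened (and decompressed) dump file; a proper wikitext parser would be more robust than the string splitting used here.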
Thanks, Lars! This seems to be the right source ;)
----- Original Message ----- From: "Lars Aronsson" lars@aronsson.se To: "The Wiktionary (http://www.wiktionary.org) mailing list" wiktionary-l@lists.wikimedia.org Sent: Friday, 1 June 2012 12:08:12 Subject: Re: [Wiktionary-l] Extracting German noun forms
Regarding the forms: currently they are not part of the dataset. Unfortunately, it's not very easy to add them; I think it would even require some enhancement to the extractor (not just the config). But it's on my to-do list... However, such "boxes" of word forms are probably easier to extract with the default DBpedia infobox extractor. Maybe the DBpedia community could help with that. The biggest problem there would be determining the right "context" (i.e. the subject URI)... I cross-posted this to DBpedia, so they can reply.
Regards, Jonas