Extracting German noun forms


Gerd Zechmeister

31 May 2012

4:12 p.m.

Hi,

I'd like to extract German noun forms (Kasus and Numerus) but didn't find this data in the provided dumps.

Example: http://de.wiktionary.org/wiki/Haus

I need the data from the box:

Kasus       Singular              Plural
Nominativ   das Haus              die Häuser
Genitiv     des Hauses            der Häuser
Dativ       dem Haus, dem Hause   den Häusern
Akkusativ   das Haus              die Häuser

Any idea how to get this? Perhaps with a SPARQL query?

regards, Gerd

-- Gerd Zechmeister Research & Development Manager

Semantic Web Company GmbH Mariahilfer Straße 70 / 8 A - 1070 Vienna, Austria Tel +43 1 402 12 35 - 28 Fax +43 1 402 12 35 - 22 Mobile +43 650 3905697

http://www.semantic-web.at http://blog.semantic-web.at http://poolparty.biz

LOD2 - Creating Knowledge out of Interlinked Data - http://lod2.eu/

social: http://at.linkedin.com/pub/gerd-zechmeister/26/504/49a http://www.xing.com/profile/Gerd_Zechmeister?sc_o=mxb_p


Christoph Lauer

31 May

8:42 p.m.

Hi Gerd,

I suppose you mean the DBpedia dumps of Wiktionary, because the Wiktionary XML dumps do contain the box data. If so, you're right: unfortunately the forms are not in there. A SPARQL query won't help you either, since it returns only the same information that is in the dumps.

To add this information you would have to write a template for the "Entry Layout" as explained on the DBpedia website, but I'm not an expert on that. Maybe Jonas can tell you more about it, or whether it's even possible. Sorry I can't help you any further :-)

Gerd Zechmeister

9:18 p.m.

Hi Christoph,

Thanks for your reply! In the meantime we'll investigate here at SWC as well and let you know.

btw: Virtuoso returns an error when querying the endpoint (http://wiktionary.dbpedia.org/sparql) with the expression below. Is that an encoding issue?

SELECT * WHERE { ?s ?p ?o FILTER(bif:contains(?o, "häuser")) }

Regards, Gerd

_______________________________________________ Wiktionary-l mailing list Wiktionary-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiktionary-l

Christoph Lauer

1 Jun

12:37 a.m.

Hi Gerd,

Apparently SPARQL has problems with umlauts; you'll have to use the Unicode escape sequence for the letter "ä" instead (\u followed by four hexadecimal digits, i.e. \u00E4).

Regards, Christoph
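To illustrate the escaping suggested here, the following is a minimal Python sketch that rewrites every non-ASCII character of a search term as a SPARQL codepoint escape before the term is placed into the query from earlier in the thread. The function name `sparql_escape_non_ascii` is made up for this example, and whether the escaped query actually avoids the Virtuoso error was not confirmed in the thread.

```python
def sparql_escape_non_ascii(text):
    """Replace non-ASCII characters with SPARQL \\uXXXX / \\UXXXXXXXX escapes."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                 # plain ASCII stays as it is
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")     # four hex digits, e.g. ä -> \u00E4
        else:
            out.append(f"\\U{cp:08X}")     # eight hex digits above the BMP
    return "".join(out)


term = sparql_escape_non_ascii("häuser")
query = 'SELECT * WHERE { ?s ?p ?o FILTER(bif:contains(?o, "' + term + '")) }'
print(query)
# SELECT * WHERE { ?s ?p ?o FILTER(bif:contains(?o, "h\u00E4user")) }
```

The escaped term contains only ASCII, so the query survives any transport step that mishandles the encoding of the request.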

Lars Aronsson

3:38 p.m.

On 2012-05-31 12:42, Gerd Zechmeister wrote:

...

I'd like to extract German noun forms (Kasus and Numerus) but didn't find this data in the provided dumps.

Example: http://de.wiktionary.org/wiki/Haus

I need the data from the box: Kasus Singular Plural Nominativ das Haus die Häuser

This is provided in the wiki template call:

{{Deutsch Substantiv Übersicht
|...
|Nominativ Singular=das Haus
|Nominativ Plural=die Häuser
...

You can find it in this XML dump (only 50 MB compressed): http://dumps.wikimedia.org/dewiktionary/20120526/dewiktionary-20120526-pages...

An old Perl script for parsing the XML dumps can be found here: http://meta.wikimedia.org/wiki/User:LA2/Extraktor
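As a rough illustration of how the parameters of such a template call could be read, here is a minimal Python sketch. The function name `extract_template_params` is hypothetical, the sample text uses only the two parameters quoted above, and the sketch assumes the call contains no nested templates or piped links (both would need a real wikitext parser).

```python
TEMPLATE_NAME = "Deutsch Substantiv Übersicht"


def extract_template_params(wikitext, name=TEMPLATE_NAME):
    """Return the key=value parameters of the first call to the named template."""
    start = wikitext.find("{{" + name)
    if start == -1:
        return {}
    end = wikitext.find("}}", start)
    if end == -1:
        end = len(wikitext)                # tolerate a missing closing "}}"
    body = wikitext[start + 2 + len(name):end]
    params = {}
    for part in body.split("|"):
        key, sep, value = part.partition("=")
        if sep:                            # skip fragments without "="
            params[key.strip()] = value.strip()
    return params


sample = """{{Deutsch Substantiv Übersicht
|Nominativ Singular=das Haus
|Nominativ Plural=die Häuser
}}"""

forms = extract_template_params(sample)
print(forms["Nominativ Singular"])   # das Haus
print(forms["Nominativ Plural"])     # die Häuser
```

Each key combines case and number, so splitting it on the space gives the Kasus and the Numerus separately.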
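For readers who prefer Python to the Perl script, the following is a minimal sketch of streaming page titles and wikitext out of a MediaWiki XML export with the standard library. The function name `iter_pages` and the tiny sample document are invented for illustration. The export namespace version in the sample is an assumption, which is why the code ignores namespaces altogether. A real compressed dump could be opened with `bz2.open(path, "rb")` and passed in the same way.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page> in a MediaWiki XML export."""
    title, text = None, ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()                   # release the page's children


# Hypothetical miniature export; the namespace version is an assumption.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.6/">
  <page>
    <title>Haus</title>
    <revision>
      <text>{{Deutsch Substantiv Übersicht
|Nominativ Singular=das Haus
|Nominativ Plural=die Häuser
}}</text>
    </revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE.encode("utf-8"))))
for page_title, page_text in pages:
    print(page_title, "Deutsch Substantiv Übersicht" in page_text)
```

Streaming with `iterparse` keeps memory use low, which matters once the dump is decompressed.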

-- Lars Aronsson (lars@aronsson.se) Aronsson Datateknik - http://aronsson.se

Gerd Zechmeister

4 p.m.

Thanks, Lars! This seems to be the right source ;)

Jonas Brekle

8:59 p.m.

Regarding the forms: that is currently not part of the dataset. And unfortunately it's not very easy to add. I think it would even require some enhancement to the extractor (not just the config). But it's on my to-do list...

However, such "boxes" of word forms are probably easier to extract with the default DBpedia infobox extractor. Maybe the DBpedia community could help with that. The biggest problem there would be determining the right "context" (i.e. the subject URI)...

I cross-posted this to DBpedia, so they can reply.

Regards, Jonas
