Hello everyone, sorry for my partially bad English.
I would like to use the DBpedia-based Wiktionary framework (through dumps or online access) for a program I'm developing, which tries to extract the article name and property from a natural-language question and then queries DBpedia for the answer. Particularly interesting would be the extraction of a property from a verb: e.g. if I ask when someone was "born", the program has to find a connection to the noun "birth", which can then be processed further.
Wiktionary provides that link by first relating "born" to the base form "to bear", and there, under "Etymology 2 - Verb", giving the transitive meaning "give birth". The connection in the German Wiktionary is a little different: there the link to the base form is under "Grammatische Merkmale" (grammatical properties), and in the base form of the verb the noun "Geburt" (birth) is found under "Abgeleitete Begriffe" (derived terms). I would be very happy if this information could be extracted into the DBpedia Wiktionary, in a unified way for all languages.
Unfortunately I'm not an expert programmer; it would probably take me weeks to find my way through Mercurial, Maven and the entire source code of the framework to do the extraction myself, so I was hoping someone with more experience with the framework could do it (if it doesn't take weeks of work :-) ).
Thanks in advance! With regards, Christoph Lauer
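The two-step connection described above (inflected form -> base form -> derived noun) can be sketched as a pair of lookups. This is only an illustration: the dictionaries and the function name are hypothetical stand-ins for data that would have to come from a Wiktionary extraction, and the German pair "geboren"/"gebären" is added here as an assumed example.

```python
# Hypothetical toy data: a real program would load these relations from
# Wiktionary-derived data instead of hard-coding them.
BASE_FORM = {"born": "bear", "geboren": "gebären"}
DERIVED_NOUN = {"bear": "birth", "gebären": "Geburt"}


def noun_for_verb_form(word):
    """Follow inflected form -> base form -> derived noun.

    Returns None when one of the two links is missing.
    """
    # If the word has no base-form entry, treat it as a base form already.
    lemma = BASE_FORM.get(word, word)
    return DERIVED_NOUN.get(lemma)


if __name__ == "__main__":
    print(noun_for_verb_form("born"))
    print(noun_for_verb_form("geboren"))
```

The function returns None whenever either link is absent, which is the situation the question is about: both relations have to be present in the extracted data before the chain can be followed.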
On 2012-05-14 15:44, Christoph Lauer wrote:
The connection in the German Wiktionary is a little different: there the link to the base form is under "Grammatische Merkmale" (grammatical properties), and in the base form of the verb the noun "Geburt" (birth) is found under "Abgeleitete Begriffe" (derived terms). I would be very happy if this information could be extracted into the DBpedia Wiktionary, in a unified way for all languages.
If you look around the various languages of Wiktionary, you will find that German is the exception. Most languages follow the pattern of the English Wiktionary. If you want things to work the same way for all languages, the German Wiktionary would need to be restructured from scratch. This is not likely to happen.
Still, the entry for bear (English Wiktionary, etymology 2, verb) does list "born" as the participle near the headword. There is also a list of derived terms (bear down, bear up, ...). It just doesn't list "birth" yet, but I think you are free to add it.
On 14.05.2012 16:11, Lars Aronsson wrote:
[...]
Thanks for the information. Too bad the German Wiktionary makes such exceptions; it's the Wiktionary I wanted to use :-( However, my central problem is that none of this information is available in the RDF dumps or through the SPARQL endpoint http://wiktionary.dbpedia.org/sparql: neither born -> bear, nor bear -> birth/give birth. I thought maybe someone knows whether there are plans to import this information. Does the project which creates the dumps have a name, anyway? Like DBpedia, the project creating the dumps from Wikipedia.
wiktionary.dbpedia.org is part of the DBpedia project and is not associated with Wiktionary, Wikimedia, etc. It doesn't have a special name yet.
That the article "born" is not fully parsed is a bug, as far as I can see now. When you look at its HTML representation, http://wiktionary.dbpedia.org/page/born, you can see that only the language section (English) was parsed; there should be a link to the PoS section too... I will have a look into it soon.
But generally, if data is missing in the wiki, change it there (and wait for us to make a new dump), or, if it's not parsed, have a look at the configuration XML file. If it's a general problem with the general entry layout, that's hard to change. But in this case, it's a bug.
Regards, Jonas
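To check what has actually been extracted for an entry such as "born", one could list every triple the endpoint holds for it. The following is a minimal sketch that only builds the request URL and sends nothing. Two things in it are assumptions, not confirmed in this thread: that the resource URI mirrors the /page/ URL with /resource/ in its place (as on DBpedia), and that the endpoint accepts a "format" parameter alongside the standard "query" parameter.

```python
from urllib.parse import quote, urlencode

# SPARQL endpoint named in the thread.
ENDPOINT = "http://wiktionary.dbpedia.org/sparql"


def build_query_url(word, limit=50):
    """Return a GET URL asking the endpoint for all triples about one entry."""
    # Assumption: resource URIs follow the DBpedia /resource/ convention.
    resource = "http://wiktionary.dbpedia.org/resource/" + quote(word)
    query = "SELECT ?p ?o WHERE { <%s> ?p ?o } LIMIT %d" % (resource, limit)
    # Assumption: the endpoint understands a 'format' parameter for JSON results.
    params = {"query": query, "format": "application/sparql-results+json"}
    return ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query_url("born"))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, would show which properties are present for the entry and therefore which sections were parsed.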
On Monday, 14.05.2012, at 16:54 +0200, Christoph Lauer wrote:
[...]
Wiktionary-l mailing list Wiktionary-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiktionary-l
On 14/05/12 07:54 AM, Christoph Lauer wrote:
[...] Does the project which creates the dumps have a name, anyway? Like DBpedia, the project creating the dumps from Wikipedia.
Yes, there is no 'official' project which has developed structured metadata for terms. There are a couple of projects which are working with such information; the most mature of them is proprietary, but JWKTL has rich output.
The Wikipedia project itself has begun to implement semantic data via microformats[1]. Microformats embed some semantic structures directly into articles in a machine-readable manner without affecting the display of the article.
The primary method of applying microformats in MediaWiki software is via the templates which we use. This seems like an obvious and simple way for Wiktionaries, which make extensive use of templates to display already-structured information, to add machine-readable structures to our content. There are currently browser plugins/extensions that let users take advantage of this added layer of data, plus of course websites, webapps and other third-party ventures.
Amgine
On 2012-05-14 16:54, Christoph Lauer wrote:
However, my central problem is that none of this information is available in the RDF dumps or through the SPARQL endpoint http://wiktionary.dbpedia.org/sparql, neither born -> bear, nor bear ->
Wiktionary is highly concentrated: A few people and a few templates generate the vast majority of the content. I think I created half of the Swedish language entries in the English Wiktionary. If the people (who?) who run dbpedia.org can explain their needs, perhaps the templates used in Wiktionary can better support the extraction of structured data. I don't recall getting any feedback from them.
For the purpose of Swedish entries in the English Wiktionary, "född" (born, geboren) is treated as an adjective (since it is inflected as an adjective), with its role as participle of the verb being indicated in the etymology section. The template {{sv-verb-form-pastpart|föda}} expands to the text "past participle of föda" and also adds a category: Swedish past participles, but it doesn't contain any other mark-up that says this is a past participle. I have no idea how this is treated by dbpedia.
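Since the template call itself already names the verb, an extractor could read the relation from the wikitext instead of from the expanded text. The following is a rough sketch only, not how the DBpedia parser works: the regular expression handles just simple, non-nested template calls, and the function names are hypothetical.

```python
import re

# Matches a simple, non-nested template call such as {{name|arg1|arg2}}.
TEMPLATE = re.compile(r"\{\{([^{}|]+)((?:\|[^{}|]*)*)\}\}")


def parse_templates(wikitext):
    """Return (template name, list of arguments) for each simple template call."""
    calls = []
    for match in TEMPLATE.finditer(wikitext):
        name = match.group(1).strip()
        # group(2) looks like "|a|b"; the first split item is always empty.
        args = [arg.strip() for arg in match.group(2).split("|")[1:]]
        calls.append((name, args))
    return calls


def past_participle_lemmas(wikitext, template="sv-verb-form-pastpart"):
    """Return the verbs that the given wikitext is marked as a past participle of."""
    return [args[0] for name, args in parse_templates(wikitext)
            if name == template and args]


if __name__ == "__main__":
    print(past_participle_lemmas("{{sv-verb-form-pastpart|föda}}"))
```

Named arguments (such as key=value pairs) are returned as plain strings here, so a fuller version would need to separate them from positional arguments.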
The DBpedia Wiktionary parser does not have a special use case. It aims for flexibility. The parser can be configured by anyone to fit their use case. It is also not limited to Wiktionary; we intend to parse other wikis such as http://wikihow.com or http://wikitravel.org as well.
DBpedia Wiktionary follows several visions:
1. If it is possible to get the data that you have put into Wiktionary out again, Wiktionary will be strengthened as a central resource.
2. Efforts to extract data from Wiktionary can be focused into one collaborative project. Therefore not everybody has to write his/her own parser.
3. DBpedia Wiktionary has the potential to become a major hub of http://linguistics.okfn.org/resources/llod/, just as DBpedia is the central hub of http://richard.cyganiak.de/2007/10/lod/
It will need some more work to improve the config files step by step for each language, but it is not unrealistic. During the next week, we will add dumps for several more languages. We will migrate the config files somewhere user-friendly, so people who want to get data will have no need to download and install software or to know Mercurial or Scala.
Sebastian
On 05/14/2012 09:19 PM, Lars Aronsson wrote:
[...]
Hi Christoph,
if you're interested in accessing the "Abgeleitete Begriffe" (derived terms) from the German Wiktionary, you could use the JWKTL software [1]. It is a Java library for parsing German and English Wiktionary dump files and accessing much of the information encoded in Wiktionary in a structured way.
Hope it helps!
Best regards, Christian
[1] http://www.ukp.tu-darmstadt.de/software/jwktl/
________________________________________
From: wiktionary-l-bounces@lists.wikimedia.org [wiktionary-l-bounces@lists.wikimedia.org] on behalf of Christoph Lauer [dbpedia@online.ms]
Sent: Monday, 14 May 2012 15:44
To: wiktionary-l@lists.wikimedia.org
Subject: [Wiktionary-l] more grammatical information in the extraction framework
[...]
Hi Christian, thanks, I'll look into it; maybe this is just what I need. Regards, Christoph
On 14.05.2012 17:19, Christian Meyer wrote:
[...]