Hey everyone :)
Wiktionary is our third-largest sister project, both in terms of active editors and readers. It is a unique resource, with the goal of providing a dictionary for every language, in every language. Since the beginning of Wikidata, and increasingly over the past months, I have been getting more and more requests to support Wiktionary and lexicographical data in Wikidata. Having this data available openly and freely licensed would be a major step forward in automated translation, text analysis, text generation and much more. It will enable and ease research. And most importantly it will enable the individual Wiktionary communities to work more closely together and benefit from each other’s work.
With this and the increased demand to support Wikimedia Commons with Wikidata, we have looked at the bigger picture and our options. I am seeing a lot of overlap in the work we need to do to support Wiktionary and Commons. I am also seeing increasing pressure to store lexicographical data in existing items (which would be bad for many reasons).
Because of this we will start implementing support for Wiktionary in parallel to Commons based on our annual plan and quarterly plans. We contacted several of our partners in order to get funding for this additional work. I am happy that Google agreed to provide funding (restricted to work on Wikidata). With this we can reorganize our team: one part of the team will continue building out the core of Wikidata and support for Wikipedia and Commons, and the other part will concentrate on Wiktionary. (Supporting and extending our work around Wikidata with the help of external funding sources was part of our annual plan for 2016: https://meta.wikimedia.org/wiki/Grants:APG/Proposals/2015-2016_round1/Wikimedia_Deutschland_e.V./Proposal_form#Financials:_current_funding_period)
As a next step I’d like us all to have another careful look at the latest proposal at https://www.wikidata.org/wiki/Wikidata:Wiktionary/Development. It has been online for input in its current form for a year and the first version is 3 years old now. So I am confident that the proposal is in good shape to start implementation. However, I’d like to do a last round of feedback with you all to make sure the concept really is sane. To make it easier to understand there is now also a PDF explaining the concept in a slightly different way: https://commons.wikimedia.org/wiki/File:Wikidata_for_Wiktionary_announcement.pdf Please do go ahead and review it. If you have comments or questions please leave them on the talk page of the latest proposal at https://www.wikidata.org/wiki/Wikidata_talk:Wiktionary/Development/Proposals/2015-05. I’d be especially interested in feedback from editors who are familiar with both Wiktionary and Wikidata.
Getting support for Wiktionary done - just like for Commons - will take some time but I am really excited about the opportunities it will open up especially for languages that have so far not gotten much or any technological support.
Cheers Lydia
Hoi, You assume that it is not good to have lexicological information in our existing items. With Wiktionary support you bring such information on board. It would be really awkward if for every concept there had to be an item in two databases.
Why is there this problem with lexicological information and how will the current data be linked to the future "Wiktionary-data" information if there are to be two databases? Thanks, GerardM
PS I cannot find this question or an answer in the PDF.
On 13 September 2016 at 15:17, Lydia Pintscher <lydia.pintscher@wikimedia.de> wrote:
-- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata
Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Hello Gerard,
We won't create a second database, only improve the existing one with new types of entities, especially for Lexemes. They will have their own specific structure, and will be linked to the concepts (items) by their statements.
The statements you can make about a word are very different from the statements you can make about a concept, so we will keep them separated.
Best,
On 13.09.2016 at 15:37, Gerard Meijssen wrote:
Hoi, You assume that it is not good to have lexicological information in our existing items. With Wiktionary support you bring such information on board. It would be really awkward when for every concept there has to be an item in two databases.
It will be two namespaces in the same project.
But we will not duplicate items. The proposed structure is not concept-centered like OmegaWiki is. It will be centered on lexemes, like Wiktionary is, but with a higher level of granularity (a lexeme corresponds to one "morphological" section on a Wiktionary page).
Why is there this problem with lexicological information and how will the current data be linked to the future "Wiktionary-data" information if there are to be two databases?
Because "bumblebee" <instance-of> "noun" conflicts with "bumblebee" <subclass-of> "insect". They can't both be true for the same thing, because nouns are not insects. One is true for the word, the other is true for the concept. So they need to be treated separately.
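To make the conflict above concrete, here is a minimal Python sketch. It is only an illustration: the `Entity` class, the `WORD_CLASSES` set and the `mixes_word_and_concept` check are hypothetical names made up for this example and are not part of Wikidata or Wikibase.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical toy vocabulary for this sketch, not real Wikidata classes.
WORD_CLASSES = {"noun", "verb", "adjective"}


@dataclass
class Entity:
    """A toy stand-in for an entity holding (property, value) statements."""
    label: str
    statements: list[tuple[str, str]] = field(default_factory=list)

    def values(self, prop: str) -> list[str]:
        """Return all values stated for the given property."""
        return [value for p, value in self.statements if p == prop]


def mixes_word_and_concept(entity: Entity) -> bool:
    """True if one entity claims to be a word AND a kind of non-word thing."""
    is_word = any(v in WORD_CLASSES for v in entity.values("instance-of"))
    is_kind_of_thing = any(
        v not in WORD_CLASSES for v in entity.values("subclass-of")
    )
    return is_word and is_kind_of_thing


# One entity carrying both statements: the problematic case.
mixed = Entity("bumblebee", [("instance-of", "noun"), ("subclass-of", "insect")])

# The same statements split over a word entity and a concept entity.
word = Entity("bumblebee (English word)", [("instance-of", "noun")])
concept = Entity("bumblebee (concept)", [("subclass-of", "insect")])

for entity in (mixed, word, concept):
    print(entity.label, "->", mixes_word_and_concept(entity))
```

Only the first entity is flagged. Once the word and the concept are separate entities, each statement is true of the thing it is attached to.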
Hoi, The database design for OmegaWiki had a distinction between the concept and all the derivatives for it. The lexemes were all connected to the concept, independent of their spelling. Obviously this is language dependent.
So bumblebee is more complex than just "instance of" noun. It is an English noun. "Hommel" is connected as a Dutch noun for the same concept and "hommels" is the Dutch plural...
Obviously. Thanks, GerardM
On 13 September 2016 at 16:35, Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
-- Daniel Kinzler Senior Software Developer
Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
On 13.09.2016 at 17:16, Gerard Meijssen wrote:
Hoi, The database design for OmegaWiki had a distinction between the concept and all the derivatives for them.
Wikidata will have Lexemes and their Forms and Senses.
So bumblebee is more complex than just "instance of" noun. It is an English noun. "Hommel" is connected as a Dutch noun for the same concept and "hommels" is the Dutch plural...
Wikidata would have a Lexeme for "bumblebee" (English noun) and one for "Hommel" (Dutch noun). Both would have a sense that would describe them as a flying insect (and perhaps other word senses, such as Q1626135, a crater on the moon). The senses that refer to the flying insect would be considered translations of each other, and both senses would refer to the same concept.
So "bumblebee" (insect) is a translation of "Hommel" (insect), and both refer to the genus Bombus (Q25407). "Hommel" (crater) would share the morphology of "Hommel" (insect), as it has the same forms (I assume), but it won't share the translations.
Having lexeme-specific word-senses avoids the loss of connotation and nuance that you get when you force words of different languages on a shared meaning. The effect of referring to the same concept can still be achieved via the reference to a concept (item).
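As a rough illustration of the structure described above, the following Python sketch models Lexemes with Forms and Senses, where each Sense points at a concept (item). The class names, field names and the helper `senses_sharing_concept` are assumptions made for this example, as is the English plural form; this is not the actual Wikibase data model or API. The item IDs Q25407 and Q1626135 are the ones mentioned in this thread.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Sense:
    gloss: str    # short description of this word sense
    concept: str  # ID of the item (concept) the sense refers to


@dataclass
class Lexeme:
    lemma: str
    language: str  # language code, e.g. "en" or "nl"
    category: str  # lexical category, e.g. "noun"
    forms: list[str] = field(default_factory=list)
    senses: list[Sense] = field(default_factory=list)


def senses_sharing_concept(sense: Sense, lexemes: list[Lexeme]) -> list[tuple[str, str]]:
    """Return (lemma, language) for every other sense that refers to the same item."""
    return [
        (lexeme.lemma, lexeme.language)
        for lexeme in lexemes
        for other in lexeme.senses
        if other is not sense and other.concept == sense.concept
    ]


bumblebee = Lexeme(
    "bumblebee", "en", "noun",
    forms=["bumblebee", "bumblebees"],  # plural form assumed for illustration
    senses=[Sense("flying insect", "Q25407")],
)
hommel = Lexeme(
    "Hommel", "nl", "noun",
    forms=["hommel", "hommels"],
    senses=[
        Sense("flying insect", "Q25407"),
        Sense("crater on the moon", "Q1626135"),
    ],
)

lexemes = [bumblebee, hommel]
print(senses_sharing_concept(bumblebee.senses[0], lexemes))  # insect sense
print(senses_sharing_concept(hommel.senses[1], lexemes))     # crater sense
```

The insect senses find each other through the shared item, while the crater sense of "Hommel" matches nothing even though it shares its forms with the insect sense. Whether two senses count as translations of each other would still be a separate, lexeme-specific statement; the sketch only shows the link via the concept.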
Hoi, The main thing to remember is that all these lexemes are in fact the labels we currently hold. The realisation that this is true is key. Thanks, Gerard
On 13 September 2016 at 18:30, Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
-- Daniel Kinzler Senior Software Developer
Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
\o/
On Tue, 13 Sep 2016 at 06:18 Lydia Pintscher lydia.pintscher@wikimedia.de wrote:
Because of this we will start implementing support for Wiktionary in parallel to Commons based on our annual plan and quarterly plans.
Fantastic news. I'm hugely excited about this.
J.
This is great. :) Crossing fingers!
L.
I'm really glad to see this. I was an avid contributor to Wiktionary for a few years, until about 10 years ago. Then OpenStreetMap caught my attention and Wiktionary became dull, as at some point it was mostly fighting vandalism.
I'm certainly going to follow up on this,
Polyglot
Hey everyone,
Happy to hear that Wiktionary will make it into Wikidata. Looks like a long but promising way to go. I just happened to edit Wikidata items about dictionary types as part of my work on taxonomy extraction (see presentation http://dx.doi.org/10.5281/zenodo.61767, paper http://ceur-ws.org/Vol-1676/paper2.pdf, and tool https://www.npmjs.com/package/wikidata-taxonomy). The current state of dictionary types and instances in Wikidata is given below, created with "wdtaxonomy Q23622" on the command line.
I welcome everybody familiar with and/or interested in lexicography to improve this classification (there is no WikiProject lexicography yet). One problem of Wiktionary is that it tries to subsume all kinds of dictionaries, making it less practical for specific use cases. I bet that Wiktionary data in Wikidata will some day make it possible to extract special kinds of dictionaries by selecting Wikidata properties and running queries. Before then, I hope that we will also have a better understanding of dictionary types.
Cheers Jakob
dictionary (Q23622) •126 ×279 ↑↑
├──lexicon (Q8096) •38 ×4 ↑
├──lexicographic thesaurus (Q179797) •48 ×7
│  └──synonym dictionary (Q2376111) •1
├──orthographic dictionary (Q378914) •7
├──etymological dictionary (Q521983) •13 ×2
├──frequency list (Q697133) •6
├──glossary (Q859161) •35 ×14 ↑
├──visual dictionary (Q861712) •7 ×1
├──explanatory dictionary (Q897755) •6 ×4
│  ├──explanatory combinatorial dictionary (Q4459737) •2
│  └──monolingual learner's dictionary (Q6901667) •2
│     └──Advanced learner's dictionary (Q17011199) •2
├──encyclopedic dictionary (Q975413) •9 ×43 ↑
│  └──biographical encyclopedia (Q1787111) •9 ×69 ↑
│     ├──group biography (Q1499601) •1 ×1
│     ├──national biography (Q21050458) •1 ×1
│     ├──epoch biography (Q21050912) •1
│     └──dictionary of people (Q26721650) •3
├──pronouncing dictionary (Q1048400) •6
├──reverse dictionary (Q1304223) •9
├──machine-readable dictionary (Q1327461) •10
│  └──online dictionary (Q3327521) •6 ×9
│     └──Wiktionary language edition (Q22001389) ×7 ↑↑
├──single-field dictionary (Q1391417) •3 ×3 ↑
│  ├──law dictionary (Q1464287) •5
│  └──medical dictionary (Q6806507) •2 ×1
├──dictionary of foreign words (Q1455182) •1
├──concise dictionary (Q1575315) •1
├──idioticon (Q1656835) •2 ×1
├──??? (Q1722340) •1
├──learner's dictionary (Q1820290) •1
│  ╘══monolingual learner's dictionary (Q6901667) •2 …
├──??? (Q2134855) •1
├──rime dictionary (Q2191807) •7 ×4
├──rhyming dictionary (Q2210568) •11
├──conceptual dictionary (Q2361647) •6
╞══synonym dictionary (Q2376111) •1 …
├──pocket dictionary (Q2394934) •1
├──bilingual dictionary (Q2640207) •13 ×2
├──slang dictionary (Q3808854) •3 ×2
├──idiom dictionary (Q4492301) •5
├──Anagram dictionary (Q4750851) •1
├──author dictionary (Q5805540) •1
├──language for specific purposes dictionary (Q6486734) •1
├──phonetic dictionary (Q7187214) •1
├──picture dictionary (Q7191193) •2
├──specialized dictionary (Q7574915) •2
│  └──sub-field dictionary (Q7630614) •1
├──defining vocabulary (Q15192747) •1 ↑
└──multi-field dictionary (Q17094463) •1
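For readers who want to work with a listing like the one above in their own scripts, here is a small Python sketch that renders a nested subclass hierarchy with the same box-drawing style. It is not how wdtaxonomy is implemented; the `render` function and the hand-written `taxonomy` dictionary (a small excerpt of the listing, without the counts) are assumptions for illustration only.

```python
from __future__ import annotations


def render(tree: dict, prefix: str = "") -> list[str]:
    """Render a nested {label: subtree} mapping as box-drawing tree lines."""
    lines = []
    entries = list(tree.items())
    for index, (label, children) in enumerate(entries):
        is_last = index == len(entries) - 1
        # The last sibling gets a corner, all others get a tee.
        lines.append(prefix + ("└──" if is_last else "├──") + label)
        # Children of a non-last sibling keep a vertical bar in their prefix.
        lines.extend(render(children, prefix + ("   " if is_last else "│  ")))
    return lines


# Hand-copied excerpt of the listing above; site and instance counts omitted.
taxonomy = {
    "lexicographic thesaurus (Q179797)": {
        "synonym dictionary (Q2376111)": {},
    },
    "machine-readable dictionary (Q1327461)": {
        "online dictionary (Q3327521)": {
            "Wiktionary language edition (Q22001389)": {},
        },
    },
}

print("dictionary (Q23622)")
print("\n".join(render(taxonomy)))
```

A plain nested dictionary cannot express classes with more than one parent, which the listing above marks with double lines (╞══, ╘══); handling those would need a graph rather than a tree.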
Héllo,
I am very happy about this news.
I am a wiki newbie interested in using Wikidata for text analysis. I try to follow the discussion here and on the French Wiktionary.
I take this as an opportunity to sum up some concerns that have been raised on the French Wiktionary [0]:
- How will the Wikidata and Wiktionary databases be synchronized?
- Will editing Wiktionary change? The concern is that this will make editing Wiktionary more difficult for people.
- Also, what about bots? Will bots be allowed/able to edit Wiktionary pages once Wikidata supports Wiktionary?
- Another concern is that an edit made in one Wiktionary may have an impact on another Wiktionary. People will have trouble reconciling their opinions if they don't speak the same language. Can an edit in Wiktionary A break Wiktionary B?
I understand that Wikidata requires new code to support the organisation of new relations between the data. I understand that with Wikidata it will be easy to create interwiki links and thesaurus-like pages, but what else can Wikidata provide to Wiktionary?
[0] https://fr.wiktionary.org/wiki/Projet:Coop%C3%A9ration/Wikidata
Thanks,
i⋅am⋅amz3
Great news, and thanks!
How will Wiktionary with Wikidata anticipate "voice" and "voice in translation" (for example in Google Voice or in other parallel projects)? And how will Wiktionary/Wikidata anticipate all 7,097 living languages (Ethnologue) / 7,943 language entries (Glottolog), including with regard to machine translation? (I'll look at the plan more closely with this in mind.)
Thank you, Scott
Hello,
Thanks a lot for your questions and feedback. Here are some answers; I hope they will be useful.
*- How will the Wikidata and Wiktionary databases be synchronized?* New entity types will be created in the Wikidata database, with new IDs (e.g. L for lexemes). Each Wiktionary will be able to include data from Wikidata in its pages (the complete entity or only selected statements, as the community decides).
*- Will editing Wiktionary change?* Yes, changes will happen, but we're working on making editing Wiktionary easier. As soon as we can provide some mockups, we will share them with you to collect feedback.
*- Will bots be allowed/able to edit Wiktionary pages once Wikidata supports Wiktionary?* Yes, of course. We want the data to be machine-readable and editable, and with the new structure, bots will be able to edit data stored in Wikidata and will still be able to edit Wiktionary pages.
*- Can an edit in Wiktionary A break Wiktionary B?* Data about lexemes will be stored on Wikidata, and each Wiktionary will choose whether it wants to use this data, which part of it, and how. Yes, if a piece of information stored in Wikidata and displayed on several Wiktionaries is modified, via Wikidata or a Wiktionary interface, this will affect all the pages where the information is included. Because Wikidata is a multilingual project, we already have to deal with the language issue, and we hope that as the number of editors coming from Wikidata and the Wiktionaries grows, it will become easier to find people with at least one common language to communicate between the different projects.
*- What else can Wikidata provide to Wiktionary?* Machine-readable data will allow users to create new tools, useful for editors, based on the communities' needs. By helping the different communities (Wiktionaries and Wikidata) work together on the same project, we expect growth in the number of people editing the lexicographical data, bringing more review and better data quality. Finally, once centralized and structured, the data will be easily reusable by third parties, other websites or applications, and will give the volunteers' work better visibility.
On 14.09.2016 at 10:51, Léa Lacroix wrote:
/- What else can provide wikidata to wiktionary?/ Machine-readable data will allow users to create new tools, useful for editors, based on the communities' needs. By helping the different communities (Wiktionaries and Wikidata) work together on the same project, we expect growth in the number of people editing the lexicographical data, bringing more review and better data quality. Finally, once centralized and structured, the data will be easily reusable by third parties, other websites or applications, and will give the volunteers' work better visibility.
Here are some examples of things that will become possible with the new structure:
* the fact that the English word "sleeper" may refer to a railway tie, and in which regions this is the case, only has to be entered once, not separately in each Wiktionary.
* the fact that "Stuhl" is the German translation of (a specific sense of) the English word "chair" only has to be entered once, not separately in each Wiktionary.
* by connecting lexeme senses to concepts (items), it will become possible to automatically search for potential synonyms and translations to other languages.
* by providing a statement defining the morphological class of a lexeme, it becomes possible to automatically generate derived forms for display and search.
* different representations (spellings, scripts) of a lexeme can be covered by a single entry; information about word senses does not have to be repeated.
* the search interface will know about languages and word types, so you can search specifically for "French verb dormir" (or perhaps, more technically, "lang:fr a:Q24905 dormir")
* Similarly, you can search for or filter by epoch, region, linguistic convention or methodology, etc.
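To make the point about connecting senses to concepts more concrete, here is a minimal sketch in Python. It is not the planned Wikibase data model: the `Sense` class, the lexeme IDs (`L1` to `L4`) and the concept IDs (`Q_SEAT`, `Q_CHAIRPERSON`) are all made up for illustration. It only shows that once two senses point at the same item, potential synonyms (same language) and translations (other languages) fall out of a simple lookup.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    lexeme_id: str  # hypothetical lexeme ID, e.g. "L1"
    lemma: str
    language: str
    gloss: str
    concept: str    # hypothetical item ID this sense is linked to


# Illustrative data only; the IDs are placeholders, not real Wikidata IDs.
SENSES = [
    Sense("L1", "chair", "en", "piece of furniture for sitting", "Q_SEAT"),
    Sense("L1", "chair", "en", "person presiding over a meeting", "Q_CHAIRPERSON"),
    Sense("L2", "Stuhl", "de", "Sitzmöbel", "Q_SEAT"),
    Sense("L3", "chaise", "fr", "siège avec dossier", "Q_SEAT"),
    Sense("L4", "chairperson", "en", "person presiding over a meeting", "Q_CHAIRPERSON"),
]


def build_index(senses):
    """Group senses by the concept (item) they are linked to."""
    index = defaultdict(list)
    for sense in senses:
        index[sense.concept].append(sense)
    return index


def related(sense, index):
    """Return (synonyms, translations) for one sense.

    Senses of other lexemes that share the concept count as potential
    synonyms if the language matches, and as potential translations otherwise.
    """
    synonyms, translations = [], []
    for other in index[sense.concept]:
        if other.lexeme_id == sense.lexeme_id:
            continue
        if other.language == sense.language:
            synonyms.append(other.lemma)
        else:
            translations.append(other.lemma)
    return sorted(synonyms), sorted(translations)
```

With this data, the furniture sense of "chair" yields "Stuhl" and "chaise" as candidate translations, while the meeting sense yields "chairperson" as a candidate synonym. The results are only candidates: a human still has to confirm that two senses linked to the same item really are interchangeable.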
- Will editing wiktionary change?
Yes, changes will happen, but we're working on making editing Wiktionary easier. As soon as we can provide some mockups, we will share them with you to collect feedback.
The question is whether you consider editing wikitext with complex nested templates "easy" or not. With Wikidata, editing would be form-based, with input fields and suggestions. This makes it a lot easier, especially for new editors. And even for experienced editors, I think it's more convenient for editing individual bits of information.
The form-based approach is less convenient when you want to enter a lot of information at once. The solution is to identify the use cases for this, and provide a specialized interface for each use case. This does not have to depend on Wikibase developers; it can also be done by wiki users using gadgets, Labs-based tools, or even bots.
Because Wikidata is a multilingual project, we already have to deal with the language issue, and we hope that with the increase of the numbers of editors coming from Wikidata and Wiktionaries, it will become easier to find people with at least one common language to communicate between the different projects.
Interestingly, we found that on wikidata there is rarely a conflict about whether a statement about an item should say X or Y, e.g. whether Chelsea Manning's gender should be given as "transgender female" or just "female" or even "male". The conflict does not arise because you can and should simply add all three, and use qualifiers and source references to specify who claimed which of these, and for which period of time.
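As a rough illustration of the "add all of them, with qualifiers and references" approach, here is a small Python sketch. It is a simplification and not the Wikibase statement format: the `Claim` class, the property ID `P_EXAMPLE`, the values "X", "Y", "Z" and the source names are all hypothetical. The idea is that competing values coexist, each tagged with who claimed it and for which period, and a consumer filters them as needed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    prop: str                    # hypothetical property ID
    value: str
    source: str                  # who made the claim (stands in for a reference)
    start: Optional[int] = None  # first year the claim applies; None means open-ended
    end: Optional[int] = None    # last year the claim applies; None means open-ended


def values_at(claims, prop, year):
    """Return every (value, source) pair asserted for `prop` in the given year."""
    return [
        (c.value, c.source)
        for c in claims
        if c.prop == prop
        and (c.start is None or c.start <= year)
        and (c.end is None or year <= c.end)
    ]


# Three competing values for the same property, none of which overwrites another.
CLAIMS = [
    Claim("P_EXAMPLE", "X", source="Source A", end=1999),
    Claim("P_EXAMPLE", "Y", source="Source B", start=2000),
    Claim("P_EXAMPLE", "Z", source="Source C"),
]
```

Asking for the values in 1990 returns X and Z; asking for 2005 returns Y and Z. No editor had to "win" the argument over which value is correct, which is why this kind of conflict rarely arises.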
Long discussions do take place about the overall organization of information on Wikidata, about which properties to have and how to use them, and about whether substances like "ethanol" should be considered subclasses or instances of classes like "alcohol".
I agree however that cross-lingual discussions are indeed an issue, and finding techniques and strategies to help with communication between the speakers of different languages will be a challenge. But isn't the Wiktionary community perfectly equipped for just that challenge? Isn't it just the crowd you would ask if you had to solve a problem like this? I would (along perhaps with the folks from translatewiki.net).
Hi everyone,
FYI, there is an ongoing Wikimedia IEG project (main grantee in CC), which seems to be following a related direction: https://meta.wikimedia.org/wiki/Grants:IEG/A_graphical_and_interactive_etymo...
Its first phase will translate Wiktionary into machine-readable data. I think it is worth considering reusing its outcome if possible, since it may fit into the Wikidata data model: https://meta.wikimedia.org/wiki/Grants_talk:IEG/A_graphical_and_interactive_...
Cheers,
Marco
*- How wikidata and wiktionary databases will be synchronized?* New entity types will be created in Wikidata database, with new ids (ex. L for lexemes). A Wiktionary will have the possibility to include data from Wikidata in their pages (the complete entity or only some chosen statements, as the community decides)
The pdf mentions 4 new entity types: Lexeme, Statement, Form, Embedded (?). Curious, was the existing data model not flexible enough?
Will these new entities be restricted to the usage in a lexicographical context, i.e. Wiktionary? How will they fit into the existing data model, will there be links from existing Wikidata items to the new entities? (i.e. how will Wikidata benefit from the new data?)
*- Will editing wiktionary change?* Yes, changes will happen, but we're working on making editing Wiktionary easier. Soon as we can provide some mockups, we will share them with you for collecting feedbacks.
Making contributing to Wiktionary easier will be a huge help. Right now the learning curve is extremely steep, and it is turning away potential contributors.
One thing to keep in mind is that Wiktionary is more than just the content in the page namespace. A big part of what you see is actually generated dynamically, for example transliteration, pronunciation and grammatical forms (conjugations, plurals etc).
I imagine that in an integrated Wikidata/Wiktionary world, "content" and code will live in various places, and we'll have a range of automated processes to copy things back and forth, and to automatically create new entries derived from existing ones?
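To illustrate what "generated dynamically" could look like when the stored data is just a stem plus a morphological class, here is a toy sketch in Python. It is an assumption-laden simplification: the paradigm names (`en-noun-s`, `en-noun-es`) and the `PARADIGMS` table are invented for this example, and the real Wiktionary templates and modules are far more elaborate.

```python
# Hypothetical paradigm table: each morphological class maps grammatical
# features to a pattern applied to the stem.
PARADIGMS = {
    "en-noun-s": {"singular": "{stem}", "plural": "{stem}s"},
    "en-noun-es": {"singular": "{stem}", "plural": "{stem}es"},
}


def generate_forms(stem, paradigm):
    """Derive all forms of a lexeme from its stem and morphological class."""
    patterns = PARADIGMS[paradigm]
    return {feature: pattern.format(stem=stem) for feature, pattern in patterns.items()}


def build_search_index(lexemes):
    """Map every generated form back to its stem so inflected forms are searchable."""
    index = {}
    for stem, paradigm in lexemes:
        for form in generate_forms(stem, paradigm).values():
            index[form] = stem
    return index
```

For example, `generate_forms("box", "en-noun-es")` gives the singular "box" and the plural "boxes", and an index built from `[("chair", "en-noun-s"), ("box", "en-noun-es")]` lets a search for "boxes" find the entry for "box". Only the stem and the class need to be stored; everything else is derived.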
– Jan
Hoi, please understand that for every label for a current item in Wikidata there should be one lexeme. It would be really helpful if all the new lexemes added were associated with labels. You would then be able to show an item with the conjugation that is preferred for a language. Currently this is not our practice.
When we associate labels with lexemes, we in fact gain missing functionality, such as indicating that a specific lexeme was preferred up to a certain point in time. It allows people to understand where "Batavia" was and why you will not find "Jakarta" in certain papers. Thanks, GerardM
Yes, there should be some connection between items and lexemes, but I am still hazy about the details of what exactly this should look like. If someone could actually make a strawman proposal, that would be great.
I think the connection should live in the statement space, and not be on the level of labels, but that is just a hunch. I'd be happy to see proposals incoming.
On 16.09.2016 at 19:41, Denny Vrandečić wrote:
Yes, there should be some connection between items and lexemes, but I am still hazy about the details of what exactly this should look like. If someone could actually make a strawman proposal, that would be great.
I think the connection should live in the statement space, and not be on the level of labels, but that is just a hunch. I'd be happy to see proposals incoming.
My thinking is this:
On some Sense of a Lexeme, there is a Statement saying that this Sense refers to a given concept (Item). If the property for stating this is well-known, we can track the Sense-to-Item relationship in the database. We can then automatically show the lexeme's lemma as a (pseudo-)alias on the Item, and perhaps also use it (and maybe all forms of the lexeme!) for indexing the item for search. So:
from ( Lexeme - Sense - Statement -> Item ) we can derive ( Item -> Lexeme - Forms )
In the beginning of Wikidata, I was very reluctant about the software knowing about "magic" properties. Now I feel better about this, since wikidata properties are established as a permanent vocabulary that can be used by any software, including our own.
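The derivation above can be sketched in a few lines of Python. This is only an illustration of the idea, not the planned implementation: the property id `P_SENSE_REFERS_TO_ITEM`, the `Lexeme` and `Sense` classes, the function names and all entity ids are made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical id for the "well-known" property linking a Sense to an Item.
SENSE_REFERS_TO_ITEM = "P_SENSE_REFERS_TO_ITEM"


@dataclass
class Sense:
    gloss: str
    # property id -> list of target item ids
    statements: dict = field(default_factory=dict)


@dataclass
class Lexeme:
    lexeme_id: str
    lemma: str
    forms: list = field(default_factory=list)
    senses: list = field(default_factory=list)


def derive_item_index(lexemes):
    """Invert (Lexeme - Sense - Statement -> Item) into (Item -> Lexemes)."""
    index = defaultdict(list)
    for lexeme in lexemes:
        for sense in lexeme.senses:
            for item_id in sense.statements.get(SENSE_REFERS_TO_ITEM, []):
                known_ids = [known.lexeme_id for known in index[item_id]]
                if lexeme.lexeme_id not in known_ids:
                    index[item_id].append(lexeme)
    return dict(index)


def search_terms(index, item_id, include_forms=True):
    """Lemmas (pseudo-aliases) and, optionally, all forms to index for an Item."""
    terms = []
    for lexeme in index.get(item_id, []):
        candidates = [lexeme.lemma] + (lexeme.forms if include_forms else [])
        for term in candidates:
            if term not in terms:
                terms.append(term)
    return terms


# Illustrative data only; the ids are invented.
product = Lexeme(
    "L1",
    "product",
    forms=["product", "products"],
    senses=[
        Sense("a commodity offered for sale",
              {SENSE_REFERS_TO_ITEM: ["Q_COMMODITY"]}),
        Sense("a quantity obtained by multiplication",
              {SENSE_REFERS_TO_ITEM: ["Q_MULTIPLICATION_RESULT"]}),
    ],
)
index = derive_item_index([product])
print(search_terms(index, "Q_COMMODITY"))                        # lemma plus forms
print(search_terms(index, "Q_COMMODITY", include_forms=False))   # lemma only
```

The index is derived rather than stored on the Item, which matches the idea of showing the lemma as a (pseudo-)alias instead of copying it into the Item's labels.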
Denny,
I would suggest using https://en.wiktionary.org/wiki/product as that strawman proposal, because it has 2 levels of senses: sense 3, "Anything that is produced", contains 6 sub-senses.
To apply Daniel's thinking... I wonder...
his Lexeme Sense = Wordnet Sense ? his Item -> Lexeme - Forms = Wordnet Synset ?
Thad +ThadGuidry https://www.google.com/+ThadGuidry
On 16.09.2016 at 20:11, Thad Guidry wrote:
Denny,
I would suggest using https://en.wiktionary.org/wiki/product as that strawman proposal, because it has 2 levels of senses: sense 3, "Anything that is produced", contains 6 sub-senses.
Modelling sub-senses is a completely different can of worms. The proposed model doesn't allow this directly (we try to avoid recursive structures), but it can be done using statements.
Your example doesn't really say anything about how lexemes could be connected to items as labels/aliases, which is, I believe, what Gerard and Denny were discussing.
My usage of "Sense" and "Form" follows https://www.wikidata.org/wiki/Wikidata:Wiktionary/Development/Proposals/2013-08 which in turn follows the LEMON model http://lemon-model.net/.
Synsets are not directly modeled, but it's possible to construct them via statements.
Daniel,
I wasn't trying to help solve the issues - I'll be quiet now :)
I was helping to expose one of your test cases :)
'product' is a lexeme - a headword - a basic unit of meaning that has a 'set of forms' and those have 'a set of definitions'. Here are some of its forms:
1. product
2. products
3. producing
4. production
5. ...
Here are some definitions against PRODUCT:
1. product - a commodity offered for sale.
2. product - a quantity obtained by multiplication of two or more numbers
3. product - anything that is produced.
But a thought just occurred to me...
A. One way to model this would perhaps be to have those headwords stored in Wikidata. Those headwords ideally would not actually be a Q or a P ... but what about instead ... L ? Wrapping the graph structure itself ? Pros / Cons ?
B. or do we go with Daniel's suggestion of linking out to headwords and not actually storing them in Wikidata ? Pros / Cons ?
Thad +ThadGuidry https://www.google.com/+ThadGuidry
On 16.09.2016 at 20:46, Thad Guidry wrote:
Daniel,
I wasn't trying to help solve the issues - I'll be quiet now :)
I was helping to expose one of your test cases :)
Ha, sorry for sounding harsh, and thanks for pointing me to "product"! It's a good test case indeed.
'product' is a lexeme - a headword - a basic unit of meaning that has a 'set of forms' and those have 'a set of definitions'
In the current model, a Lexeme has forms and senses. Forms don't have senses directly, the meanings should apply to all forms. This means lexemes have to be split with higher granularity:
* product (English noun) would be one lexeme, with "products" being the plural form, "product's" the genitive, and "products'" the plural genitive. Senses include the ones you mentioned.
* (to) produce (English verb) would be another lexeme, with forms like "produces", "produced", "producing", etc., and senses meaning "to create", "to show", "to make available", etc.
* production (English noun) would be another lexeme, with other forms and senses.
* produce (English noun) would be another
* producer (English noun) would be another
* produced (English adjective) another
etc...
These lexemes can be linked using some kind of "derived from" statements.
But a thought just occurred to me... A. One way to model this would perhaps be to have those headwords stored in Wikidata. Those headwords ideally would not actually be a Q or a P ... but what about instead ... L ? Wrapping the graph structure itself ? Pros / Cons ?
That's the plan, yes: have lexemes (L...) on Wikidata, which wrap the structure of forms and senses and have statements for the lexeme, as well as for each form and each sense.
We don't currently plan a "super-structure" for wrapping derived/related lexemes (product, produce, production, etc). They would just be inter-linked by statements.
B. or do we go with Daniel's suggestion of linking out to headwords and not actually storing them in Wikidata ? Pros / Cons ?
The link I suggest is between items (Q...) and lexemes (L...), both on Wikidata.
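To make the structure described in this message concrete, here is a minimal Python sketch of a lexeme that wraps its forms and senses, with statements possible at each level and related lexemes linked only through statements. The class layout, the field names, the ids (`L1`, `L2`, `L5`) and the `derived from` property label are assumptions made for illustration; they are not taken from the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Form:
    representation: str
    features: tuple = ()                              # e.g. ("plural",)
    statements: dict = field(default_factory=dict)    # statements about this form


@dataclass
class Sense:
    gloss: str
    statements: dict = field(default_factory=dict)    # statements about this sense


@dataclass
class Lexeme:
    lexeme_id: str
    lemma: str
    language: str
    category: str
    forms: list = field(default_factory=list)         # embedded Forms
    senses: list = field(default_factory=list)        # embedded Senses
    statements: dict = field(default_factory=dict)    # statements about the lexeme


# Hypothetical property label, standing in for some kind of "derived from" statement.
DERIVED_FROM = "derived from"

product = Lexeme(
    "L1", "product", "English", "noun",
    forms=[Form("products", ("plural",)), Form("product's", ("genitive",))],
    senses=[Sense("a commodity offered for sale"),
            Sense("anything that is produced")],
)
produce_verb = Lexeme(
    "L2", "produce", "English", "verb",
    forms=[Form("produces"), Form("produced"), Form("producing")],
    senses=[Sense("to create"), Sense("to show"), Sense("to make available")],
)
producer = Lexeme(
    "L5", "producer", "English", "noun",
    forms=[Form("producers", ("plural",))],
    statements={DERIVED_FROM: ["L2"]},   # the link lives in a statement, not in a wrapper
)

lexemes = {lx.lexeme_id: lx for lx in (product, produce_verb, producer)}


def linked_lexemes(lexemes, lexeme_id, prop=DERIVED_FROM):
    """Follow one statement property from a lexeme to other known lexemes."""
    targets = lexemes[lexeme_id].statements.get(prop, [])
    return [lexemes[target] for target in targets if target in lexemes]


print([lx.lemma for lx in linked_lexemes(lexemes, "L5")])
```

Because there is no super-structure, finding the whole "product / produce / producer" family means following statements, which is what `linked_lexemes` does for a single hop.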
And just to point out - even though there are no plans to accommodate the superstructures in the data model directly, it should be noted that the current data model is already flexible enough to support them, i.e. if the community so wishes, it can create Lexemes which represent the "root" of a word like "produc-" and then explicitly link to these with statements from the Lexemes for "production", "producer", etc. Or not. It could instead try to model it with statements pertaining to the etymology of the words. Or not.
The Wiktionary data model is not supposed to express a specific theory of linguistics, just as the Wikidata data model is not supposed to express a specific theory of ontology. It is supposed to be flexible enough to work with whatever the community decides it wants to express, sometimes even contradictory statements, along with the ability to source them to references.
Denny,
Ah, very cool. So it's currently supported just by the flexible nature of Wikidata's backing triplestore, Blazegraph, and its generic graph structure; I assume that is what you mean.
So statements would perform the linking to Lexemes that are just Q items themselves, but with a special statement that says... "I am not an entity, but instead a Lexeme".
Can you or Daniel start with those few lexemes for 'Product' that Daniel and I mentioned, perhaps in Labs or somewhere, so that all of us can begin to see how this might work using statements?
Thad +ThadGuidry https://www.google.com/+ThadGuidry
No, sorry, that is not what I meant. When I said "current data model" I meant the "currently proposed data model". Sorry for being sloppy.
So it assumes new entity types for Lexemes, which are not just special forms of Items. And it is not reliant on an underlying graph model.
Daniel already sketched out what 'product' may look like. Since the current implementation does not support Lexemes, I cannot just put it on Labs.
But there could be a Lexeme "produc-", and Daniel's Lexemes for "to produce", "production", "producer", etc. could point to it via a statement, say, "root word" or similar. In the end, it really is unclear whether that is correct or not, but it sure is a possibility that can be represented with the currently proposed data model. Which properties exist, how they are linked to each other, etc., is all up to the collaborative decisions which the community has to make.
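As a rough illustration of that possibility, the following Python sketch treats statements as plain (subject, property, target) triples and groups lexemes by the root they point to. The `root word` label comes from the suggestion above; the lexeme identifiers and the function name are invented for the example, and whether a community would model roots this way is left open.

```python
from collections import defaultdict

ROOT_WORD = "root word"   # hypothetical property, "say, 'root word' or similar"

# (subject lexeme, property, target lexeme); the ids are made up for illustration.
statements = [
    ("L:produce", ROOT_WORD, "L:produc-"),
    ("L:production", ROOT_WORD, "L:produc-"),
    ("L:producer", ROOT_WORD, "L:produc-"),
    ("L:producer", "language", "Q:English"),   # unrelated statement, ignored below
]


def lexemes_by_root(statements, prop=ROOT_WORD):
    """Group lexemes by the root lexeme they point to via the given property."""
    families = defaultdict(list)
    for subject, predicate, target in statements:
        if predicate == prop:
            families[target].append(subject)
    return dict(families)


families = lexemes_by_root(statements)
print(families)
```

Nothing in the sketch is specific to roots: swapping `prop` for an etymology property would group the same lexemes differently, which is the flexibility being described.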
Yes, that definitely is one promising approach (and I hope that we would make a rough impact analysis before deciding on it and implementing it, once the structures and data are there).
I wonder if there are other approaches that are somehow more subtle. But I cannot express what I am looking for, and maybe yours is already sufficiently close to optimal.
Hoi, It does make sense to link it to labels, for many reasons:
* Every word can safely exist in a dictionary in Wikidata. Typically names are not included in a dictionary, but they could be.
* Labels are written based on the rules and exceptions of a language. These are different in every language and sometimes even within dialects. The way content is displayed in a dictionary differs from country to country as well.
* Concepts are not very stable, and the writing of a label in a language is not stable either. The one thing that binds them is the label; often a change in a label and a change in the concept go together.
* Synonyms differ in label, not in concept.
I REALLY wonder why you think you can do this with statements. In my opinion you cannot do that without sacrifice. Thanks, GerardM
Quick clarification:
On 15.09.2016 at 17:40, Jan Berkel wrote:
The pdf mentions 4 new entity types: Lexeme, Statement, Form, Embedded (?).
"Embedded" isn't a separate type, it refers to the fact that Senses and Forms are stored on the same page as "their" Lexeme. "Statement" isn't an entity, I assume you meant to write "Sense".
Curious, was the existing data model not flexible enough?
It was not expressive enough, no; it would be possible to use items to model lexemes, but it would be very annoying and complicated. You would need separate items for each form and sense, and need to keep track of them for deletion, undeletion, etc.
Will these new entities be restricted to the usage in a lexicographical context, i.e. Wiktionary?
It will also be accessible from Wikipedia and other wikis.
How will they fit into the existing data model, will there be links from existing Wikidata items to the new entities? (i.e. how will Wikidata benefit from the new data?)
Yes, there will be cross-linking.
I imagine that in an integrated Wikidata/Wiktionary world, "content" and code will live in various places, and we'll have a range of automated processes to copy things back and forth, and to automatically create new entries derived from existing ones?
It would be transcluded and generated, as with templates and Lua, rather than copied by bots.
A substantial amount of work in the LOD community seems to have gone into Ontolex:
https://www.w3.org/community/ontolex/wiki/Final_Model_Specification
Is there any concern with aligning WD's model to this standard?
Cheers,
Eric Scott
On 21.09.2016 at 19:23, Eric Scott wrote:
A substantial amount of work in the LOD community seems to have gone into Ontolex:
https://www.w3.org/community/ontolex/wiki/Final_Model_Specification
Is there any concern with aligning WD's model to this standard?
Thanks for pointing to this!
From a first look, the models seem to roughly align:
* What we call a "Lexeme" corresponds to a "Lexical Entry" in ontolex.
* What we call a "Form" corresponds to a "Form" in ontolex.
* What we call a "Sense" corresponds to a "Lexical Sense & Reference" in ontolex, although in ontolex a reference to a Concept is required, while in our model that reference would be optional but a natural language gloss is required.
So the models seem to match fine on a conceptual level. Perhaps someone with more expertise in RDF modeling can provide a more detailed analysis.
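A small Python sketch of how that conceptual alignment might look when written out as triples. The OntoLex terms used (`LexicalEntry`, `Form`, `LexicalSense`, `lexicalForm`, `writtenRep`, `sense`, `reference`) are from the OntoLex vocabulary; everything else is an assumption for illustration: the `http://example.org/entity/` base IRI, the dict layout of the lexeme, the ids, and the choice of `skos:definition` to carry the gloss. It is not a worked-out RDF mapping.

```python
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS_DEFINITION = "http://www.w3.org/2004/02/skos/core#definition"  # assumed home for the gloss
BASE = "http://example.org/entity/"  # placeholder base IRI, not a real endpoint


def lexeme_to_triples(lexeme):
    """Turn a lexeme dict into (subject, predicate, object) triples."""
    subject = BASE + lexeme["id"]
    triples = [(subject, RDF_TYPE, ONTOLEX + "LexicalEntry")]

    for number, written in enumerate(lexeme["forms"], start=1):
        form_iri = f"{subject}-F{number}"
        triples += [
            (subject, ONTOLEX + "lexicalForm", form_iri),
            (form_iri, RDF_TYPE, ONTOLEX + "Form"),
            (form_iri, ONTOLEX + "writtenRep", written),
        ]

    for number, sense in enumerate(lexeme["senses"], start=1):
        # In the proposed model the gloss is required on every sense ...
        if not sense.get("gloss"):
            raise ValueError("every sense needs a natural language gloss")
        sense_iri = f"{subject}-S{number}"
        triples += [
            (subject, ONTOLEX + "sense", sense_iri),
            (sense_iri, RDF_TYPE, ONTOLEX + "LexicalSense"),
            (sense_iri, SKOS_DEFINITION, sense["gloss"]),
        ]
        # ... while the reference to a concept (Item) is optional.
        if sense.get("item"):
            triples.append((sense_iri, ONTOLEX + "reference", BASE + sense["item"]))

    return triples


# Illustrative lexeme; the ids are invented.
product = {
    "id": "L1",
    "forms": ["product", "products"],
    "senses": [
        {"gloss": "a commodity offered for sale", "item": "Q_COMMODITY"},
        {"gloss": "anything that is produced"},   # no item: allowed in this sketch
    ],
}
triples = lexeme_to_triples(product)
for triple in triples:
    print(triple)
```

The second sense shows the one difference noted above: it has a gloss but no `reference` triple, so an export along these lines would need a decision on how to handle senses that OntoLex expects to point at a concept.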
The DBnary project "Wiktionary as Linguistic Linked Open Data" at:
http://kaiko.getalp.org/about-dbnary/ http://kaiko.getalp.org/sparql
can be used as a reference.
I am basing my IEG project to visualize etymologies from Wiktionary on it:
https://meta.wikimedia.org/wiki/Grants:IEG/A_graphical_and_interactive_etymo...
Cheers,
Ester
It is no coincidence that the Wikidata Wiktionary data model and OntoLex fit well together: the Wikidata model was informed by the Lemon data model and follows it closely, and OntoLex is also rooted in Lemon.
It's good that both built on the same solid results from linguistics ;)