[Wikidata-l] Wikidata for Wiktionary

Denny Vrandečić

6 May 2015

7:54 p.m.

It is rather clear that everyone wants Wikidata to also support Wiktionary, and there have been plenty of proposals in the last few years. I think that the latest proposals are sufficiently similar to go for the next step: a breakdown of the tasks needed to get this done.

Currently, the idea of having Wikidata support Wiktionary is stalled because it is regarded as a large monolithic task, which makes it hard to plan and commit to. I tried to come up with a task breakdown, discussed it with Lydia and Daniel, and now, as mentioned in the last office hour, here it is for discussion and community input.

https://www.wikidata.org/wiki/Wikidata:Wiktionary/Development/Proposals/2015...

I think it would be really awesome if we started moving in this direction. Wiktionary supported by Wikidata could quickly become one of the crucial pieces of infrastructure for the Web as a whole, and in particular for Wikipedia and its future development.

Cheers, Denny

Gerard Meijssen

6 May

9:53 p.m.

Hi, would it not make sense to FIRST finish a few things, like Commons and Query? Thanks, GerardM

On 7 May 2015 at 04:54, Denny Vrandečić vrandecic@gmail.com wrote:

...

Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l

Denny Vrandečić

10:35 p.m.

The work on queries and arbitrary access is well on its way, and the new UI is continually being developed and deployed. I don't think it is too early to think about, and gather consensus on, what the steps for Wiktionary could look like. I am certainly not proposing to stop the current work on queries, merely to create realistic tasks for the Wiktionary phase of Wikidata.

On Wed, May 6, 2015, 21:54 Gerard Meijssen gerard.meijssen@gmail.com wrote:

...

John Mark Vandenberg

11 p.m.

On Thu, May 7, 2015 at 2:53 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:

...

Hoi, Would it not make sense to FIRST finish a few things.. Like Commons and Query ?

One of the primary things Wikidata was supposed to do is manage interlanguage links for Wikimedia projects. That isn't finished until Wiktionary joins the other multilingual project families in Wikidata.

It looks like Task 1 of this Wiktionary-Wikidata plan will achieve that goal, and the migration will be extremely quick. Hooray!

-- John Vandenberg

Gerard Meijssen

7 May

1:10 a.m.

Hi, from an interwiki point of view, the interwiki links to Wiktionary are EXTREMELY easy to do. The problem is that those links cannot be uniquely linked to existing items in Wikidata, which makes it unrealistic to do this in a meaningful way at this time.

Wiktionary has one article for multiple lemmas in multiple languages, and its articles are based on the way a word is written, NOT on being about a subject.

Query is not the only thing that is missing... Commons is more acutely felt to be missing than Wiktionary. PLEASE DO NOT PROCRASTINATE by doing something that is "nice" because someone proposed something similar. First get the job done, and first make Wikidata usable for my sister and my mother in the way that Reasonator is and Wikidata is not. Please consider monitoring the use of Wikidata; that is more relevant than Wiktionary at this time. Thanks, GerardM
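To illustrate Gerard's point, here is a minimal Python sketch. The page content, the `Entry` type and the helper names are hypothetical and heavily simplified; they are not Wiktionary's or Wikibase's actual data structures. It contrasts the two halves of the argument: interlanguage links between Wiktionary editions connect pages with the same title, so generating them is trivial, while a single spelling-keyed page holds several meanings in several languages and therefore has no single item it could be attached to.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One language section of a spelling-keyed page (heavily simplified)."""
    language: str
    part_of_speech: str
    gloss: str


# Hypothetical, simplified content of a page titled "chat" on one Wiktionary edition.
PAGES = {
    "chat": [
        Entry("English", "noun", "informal conversation"),
        Entry("English", "verb", "to talk informally"),
        Entry("French", "noun", "cat"),
    ],
}


def interwiki_link(title: str, edition: str) -> str:
    """Interlanguage links between editions only need the identical page title."""
    return f"{edition}:{title}"


def meanings_on_page(title: str) -> set[tuple[str, str, str]]:
    """Each (language, part of speech, gloss) is a separate meaning sharing one page."""
    return {(e.language, e.part_of_speech, e.gloss) for e in PAGES[title]}


def fits_single_item(title: str) -> bool:
    # A sitelink attaches one page to one item, which only fits a page about one thing.
    return len(meanings_on_page(title)) == 1
```

Even with three illustrative entries, `fits_single_item("chat")` is False, while `interwiki_link("chat", "fr")` is a plain title match.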

On 7 May 2015 at 08:00, John Mark Vandenberg jayvdb@gmail.com wrote:

...

Smolenski Nikola

3:03 a.m.

Quoting Gerard Meijssen gerard.meijssen@gmail.com:

...

The interwiki links to Wiktionary are from an interwiki point of view EXTREMELY easy to do. The problem with those links is that they cannot be uniquely linked to existing items to Wikidata and thereby it becomes unrealistic to do it in a meaningful way at this time.

Wiktionary has one article for multiple lemmas in multiple languages and they are based on the way they are written NOT on being about a subject.

Would it be possible to ask the Wiktionary community to stop this practice? I have never understood why it is done in the first place, have never seen any benefit from it, and do not know who came up with the idea or why.

Gerard Meijssen

3:17 a.m.

Hi, the practice makes sense for Wiktionary. As a matter of fact, I think I added quite a few of them with my bot. My point is not that it does not make sense; my point is that it does NOT easily connect to Wikidata. If a separate Wikibase were used for this... fine. That makes sense. Thanks, GerardM

On 7 May 2015 at 12:03, Smolenski Nikola smolensk@eunet.rs wrote:

...

Jo

3:43 a.m.

What you get on a Wiktionary page is a description of the words, in several languages, that share that particular spelling. Of course, one spelling can already be several words in a single language.

It's at the level of the definition that one can link to the current Wikidata, provided Wikidata wants to have entries for all those definitions. I'm not very active on Wiktionary anymore, but a template pointing to Wikidata might make sense on the Wiktionary page.

Of course, you'd prefer to link in the other direction. I guess a separate Wikibase with links to WD would be better. Can those query languages query across more than one Wikibase?

If they can, it may make sense to put our OpenStreetMap 'metadata' in a dedicated Wikibase too, but that's another discussion.

Polyglot
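Jo's question can be pictured as a join on shared identifiers across two stores. The minimal Python sketch below assumes two hypothetical in-memory stores with placeholder IDs (`item:cat`, `item:conversation`); it does not use any real Wikibase or query-service API. The SPARQL 1.1 Federated Query `SERVICE` keyword is one standardized way to express such a join against remote endpoints; here the join is simply written out in plain Python. Senses in the lexical store link to items in the concept store, which is what linking "at the level of the definition" would amount to.

```python
from __future__ import annotations

# Hypothetical store 1 (Wikidata-like): concept items keyed by placeholder IDs.
ITEMS = {
    "item:cat": {"en": "cat"},
    "item:conversation": {"en": "conversation"},
}

# Hypothetical store 2 (a separate lexical Wikibase): each sense links to an item in store 1.
SENSES = [
    {"lemma": "cat", "language": "en", "gloss": "small domesticated feline", "item": "item:cat"},
    {"lemma": "chat", "language": "fr", "gloss": "small domesticated feline", "item": "item:cat"},
    {"lemma": "chat", "language": "en", "gloss": "informal conversation", "item": "item:conversation"},
    {"lemma": "causerie", "language": "fr", "gloss": "informal conversation", "item": "item:conversation"},
]


def linked_items(lemma: str, language: str) -> set[str]:
    """Items reached from the senses of one lemma, checked against store 1."""
    items = {s["item"] for s in SENSES if s["lemma"] == lemma and s["language"] == language}
    missing = items - set(ITEMS)
    if missing:
        raise KeyError(f"senses point to unknown items: {sorted(missing)}")
    return items


def translations(lemma: str, source: str, target: str) -> list[str]:
    """Cross-store join: source-language senses -> shared items -> target-language lemmas."""
    items = linked_items(lemma, source)
    return sorted({s["lemma"] for s in SENSES if s["language"] == target and s["item"] in items})
```

Because the link sits on the sense rather than on the page, the English "chat" and the French "chat" end up at different items, so `translations("chat", "en", "fr")` yields "causerie" rather than "chat".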

2015-05-07 12:03 GMT+02:00 Smolenski Nikola smolensk@eunet.rs:

...

Smolenski Nikola

10:17 a.m.

Quoting Jo winfixit@gmail.com:

...

What you get on a Wiktionary page is a description of words in several languages with that particular spelling. Of course 1 spelling can also be several words in 1 language already.

And why? Why not have a separate page for every language, with the spelling page being just a disambiguation page? This would be easier for Wiktionary readers and writers, and for linking with Wikidata.
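For illustration only, the split Smolenski suggests could be done mechanically along these lines. This is a minimal Python sketch: the `(language, wikitext)` section format and the `title (Language)` naming scheme are hypothetical, and the sketch makes no claim about how any Wiktionary edition would actually name such pages.

```python
from __future__ import annotations

from collections import defaultdict


def split_per_language(title: str, sections: list[tuple[str, str]]) -> tuple[dict[str, str], list[str]]:
    """Split one spelling-keyed page into per-language pages plus a disambiguation list.

    `sections` holds (language, wikitext) pairs in page order. The
    "title (Language)" naming scheme is a hypothetical convention.
    """
    by_language: dict[str, list[str]] = defaultdict(list)
    for language, text in sections:
        by_language[language].append(text)
    pages = {f"{title} ({language})": "\n".join(texts) for language, texts in by_language.items()}
    # The page at the bare spelling becomes a disambiguation page listing the new titles.
    disambiguation = sorted(pages)
    return pages, disambiguation
```

Each resulting page covers a single language, so each could in principle be linked to Wikidata on its own, while the bare spelling becomes a disambiguation page.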

Yair Rand

10:27 a.m.

The Wiktionary communities tend to strongly disagree that splitting entries per language would be easier for either editors or readers. It has been discussed before numerous times over the years.

On Thu, May 7, 2015 at 1:17 PM, Smolenski Nikola smolensk@eunet.rs wrote:

...

Smolenski Nikola

12:13 p.m.

Quoting Yair Rand yyairrand@gmail.com:

...

The Wiktionary communities tend to strongly disagree that splitting entries per language would be easier for either editors or readers. It has been discussed before numerous times over the years.

I do not see this strong disagreement. The last discussion about it was at http://en.wiktionary.org/wiki/Wiktionary:Grease_pit/2014/February#Embrace_th... and to me it seems that the majority of users support it.

(Other discussions are listed at http://en.wiktionary.org/wiki/Wiktionary:Per-language_pages_proposal#Past_di... )

Andy Mabbett

1:53 p.m.

On 7 May 2015 at 18:27, Yair Rand yyairrand@gmail.com wrote:

...

The Wiktionary communities tend to strongly disagree that splitting entries per language would be easier for either editors or readers.

How many languages are currently used? How will this scale to ~300 languages?

-- Andy Mabbett @pigsonthewing http://pigsonthewing.org.uk

Federico Leva (Nemo)

2:25 p.m.

Andy Mabbett, 07/05/2015 22:53:

...

The Wiktionary communities tend to strongly disagree that splitting entries per language would be easier for either editors or readers.

How many languages are currently used? How will this scale to ~300 languages?

Hm? Last time I counted, the English Wiktionary alone used way more than 300 languages.

Nemo

Denny Vrandečić

9:19 p.m.

I would disagree with requiring the Wiktionary communities to change their ways. Instead we should adapt our plans to fit into the way they are set up.

Even if the English Wiktionary community were to switch to per-language pages instead of the current system, it is rather unlikely that all other language editions of Wiktionary would follow in a timely manner. I would prefer to leave this decision to the autonomy of the projects and adapt to them instead (which is, by the way, what the proposal does).

Yair, as Daniel said, the current Wiktionary pages would not be mapped to Q-Items. Since this was unclear, I tried to update the text to make it clearer. Let me know if it is still confusing.

I do not think a separate Wikibase instance would be needed to provide the data for Wiktionary. I think this can and should be done on Wikidata. But as said by Milos and pointed out by Gerard, lexical knowledge does indeed require a different data schema. This is why the proposal introduces new entity types for lexemes, forms, and senses. The data model is mostly based on lexical ontologies that we surveyed, like LEMON and others.
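A minimal Python sketch of the three new entity types mentioned above (lexemes, forms and senses), each able to carry statements. The field names, the `hypothetical:pronunciation` property and the `item:cat` placeholder are assumptions made for illustration; this is neither the proposal's actual schema nor the lemon vocabulary.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Statement:
    property_id: str  # placeholder, not a real Wikidata property ID
    value: str


@dataclass
class Form:
    representation: str
    grammatical_features: list[str] = field(default_factory=list)
    statements: list[Statement] = field(default_factory=list)


@dataclass
class Sense:
    gloss: str
    item: str | None = None  # optional link to a concept item (placeholder ID)
    statements: list[Statement] = field(default_factory=list)


@dataclass
class Lexeme:
    lemma: str
    language: str
    lexical_category: str
    forms: list[Form] = field(default_factory=list)
    senses: list[Sense] = field(default_factory=list)
    statements: list[Statement] = field(default_factory=list)

    def representations(self) -> list[str]:
        """All written forms of this lexeme, in order."""
        return [form.representation for form in self.forms]


cat = Lexeme(
    lemma="cat",
    language="English",
    lexical_category="noun",
    forms=[
        Form("cat", ["singular"], [Statement("hypothetical:pronunciation", "/kæt/")]),
        Form("cats", ["plural"], [Statement("hypothetical:pronunciation", "/kæts/")]),
    ],
    senses=[Sense("small domesticated feline", item="item:cat")],
)
```

The lexeme is keyed by lemma and language rather than by page title, and only the sense points at a concept item, which is why existing Wiktionary pages need not map to Q-items.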

On Thu, May 7, 2015 at 2:26 PM Federico Leva (Nemo) nemowiki@gmail.com wrote:

...

Gerard Meijssen

9:24 p.m.

Hi, given the opposition to having statements at the level of the label, it does not make sense to have Wiktionary included in Wikidata. Thanks, GerardM

On 8 May 2015 at 06:19, Denny Vrandečić vrandecic@gmail.com wrote:

...

Denny Vrandečić

9:32 p.m.

I am not sure I understand what you are saying. The lexical data in Wikidata does allow for statements on Lexemes and Forms, as the proposal states explicitly.

On Thu, May 7, 2015 at 9:25 PM Gerard Meijssen gerard.meijssen@gmail.com wrote:

...

Denny Vrandečić

10 p.m.

I mean, the lexical data in Wikidata according to the proposal would allow for statements on Lexemes and Forms. I slipped into the future for a moment ;)

On Thu, May 7, 2015 at 9:32 PM Denny Vrandečić vrandecic@gmail.com wrote:

...

Gerard Meijssen

11:10 p.m.

Hi, you do not address how it prevents redundancy. I do not care for lexemes or forms if they do not incorporate labels. That is something you can explain now. Thanks, GerardM
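One way to make the redundancy concern concrete: if lexemes in a language have senses linked to an item, the item's label in that language may simply repeat the lemma. The Python sketch below uses hypothetical data and placeholder IDs and only reports which labels duplicate a linked lemma; it does not suggest how the proposal would resolve this.

```python
# Hypothetical labels stored on a concept item, per language code.
ITEM_LABELS = {"item:cat": {"en": "cat", "fr": "chat", "de": "Katze", "nl": "kat"}}

# Hypothetical lexemes as (lemma, language code, item linked from one of their senses).
LEXEME_LINKS = [
    ("cat", "en", "item:cat"),
    ("chat", "fr", "item:cat"),
    ("Katze", "de", "item:cat"),
]


def redundant_labels(item: str) -> list[str]:
    """Language codes whose label merely repeats a lemma already linked to the item."""
    lemmas = {(language, lemma) for lemma, language, target in LEXEME_LINKS if target == item}
    return sorted(
        language for language, label in ITEM_LABELS[item].items() if (language, label) in lemmas
    )
```

In this toy data the English, French and German labels are stored twice, while the Dutch label has no linked lexeme and is not.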

On 8 May 2015 at 07:00, Denny Vrandečić vrandecic@gmail.com wrote:

...

Gerard Meijssen

10:05 p.m.

Hi, I have repeatedly asked to be allowed to indicate on labels that they were in use up to a given time. The argument that labels are "only" for identification is, IMHO, not valid, because it denies a need that cannot be expressed in another way. Having other constructs that do not address this does not make the issue go away. Thanks, GerardM

PS: Date is just one example; alternate spelling is another, and there are many more.
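What Gerard asks for amounts to labels that carry qualifiers, such as the period in which a name was in use. As a hypothetical sketch (not an existing Wikibase feature), here is a Python structure for year-qualified labels, using the historical names of Saint Petersburg as example data. Year granularity is a simplification: a year in which a rename happened is attributed to the new name only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class QualifiedLabel:
    text: str
    start: int | None = None  # first year in use (inclusive); None means open-ended
    end: int | None = None    # year it went out of use (exclusive); None means still in use


def labels_in_year(labels: list[QualifiedLabel], year: int) -> list[str]:
    """Return every label whose validity interval covers the given year."""
    return [
        label.text
        for label in labels
        if (label.start is None or label.start <= year)
        and (label.end is None or year < label.end)
    ]


# The city's historical names as year-qualified labels (year granularity only).
CITY_LABELS = [
    QualifiedLabel("Saint Petersburg", 1703, 1914),
    QualifiedLabel("Petrograd", 1914, 1924),
    QualifiedLabel("Leningrad", 1924, 1991),
    QualifiedLabel("Saint Petersburg", 1991),
]
```

An alternate spelling could be handled the same way by adding another qualifier field, which is the kind of extension the PS hints at.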

On 8 May 2015 at 06:32, Denny Vrandečić vrandecic@gmail.com wrote:

...

Gerard Meijssen

10:19 p.m.

Hi, again, I do not care for lexemes and forms if they are distinct from labels. I hate redundancy. Thanks, GerardM

On 8 May 2015 at 06:32, Denny Vrandečić vrandecic@gmail.com wrote:

...

I am not sure I understand what you are saying. The lexical data in Wikidata does allow for statements on Lexemes and Forms, as the proposal states explicitly.

On Thu, May 7, 2015 at 9:25 PM Gerard Meijssen gerard.meijssen@gmail.com wrote:

...
Hoi, Given the opposition to having statements on the level of the label, it does not make sense to have Wiktionary included in Wikidata. Thanks, GerardM

On 8 May 2015 at 06:19, Denny Vrandečić vrandecic@gmail.com wrote:

...
I would disagree with requiring the Wiktionary communities to change their ways. Instead we should adapt our plans to fit into the way they are set up.

Even if the English Wiktionary community would change to have per-language pages instead of the current system, it would be rather unlikely that all other language editions of Wiktionary would follow in a timely manner. I would prefer to leave this decision to the autonomy of the projects, and instead adapt to them (which is, by the way, what the proposal does).

Yair, as Daniel said, the current Wiktionary pages would not be mapped to Q-Items. Since this was unclear, I tried to update the text to make it clearer. Let me know if it is still confusing.

I do not think a separate Wikibase instance would be needed to provide the data for Wiktionary. I think this can and should be done on Wikidata. But as said by Milos and pointed out by Gerard, lexical knowledge does indeed require a different data schema. This is why the proposal introduces new entity types for lexemes, forms, and senses. The data model is mostly based on lexical ontologies that we surveyed, like LEMON and others.
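As a rough illustration of the lexeme/form/sense split described above, here is a minimal Python sketch. It is hypothetical throughout: the class names, fields and the "L1", "L1-F1", "L1-S1" style identifiers are assumptions for illustration, not the proposal's actual schema, and language and lexical category are plain strings here for simplicity. It reflects only what the thread states: lexemes carry forms and senses, and statements can sit on both Lexemes and Forms. It also shows grammatical features reusing an existing concept item (Q192613, "present tense") rather than duplicating it.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical sketch of the lexical entity types described in the proposal.
# Class names, fields and ID formats are illustrative assumptions only.

PRESENT_TENSE = "Q192613"  # the Wikidata item for "present tense"
PAST_TENSE = "Q-PAST"      # hypothetical placeholder, not a real item ID


@dataclass
class Statement:
    property_id: str  # hypothetical property ID
    value: str        # simplified: real statement values are typed


@dataclass
class Form:
    form_id: str         # e.g. "L1-F1"
    representation: str  # written form, e.g. "ran"
    grammatical_features: List[str] = field(default_factory=list)  # item IDs
    statements: List[Statement] = field(default_factory=list)


@dataclass
class Sense:
    sense_id: str                  # e.g. "L1-S1"
    gloss: str
    item_id: Optional[str] = None  # optional link to a concept item (assumption)


@dataclass
class Lexeme:
    lexeme_id: str
    lemma: str
    language: str          # simplified to a string
    lexical_category: str  # simplified to a string
    forms: List[Form] = field(default_factory=list)
    senses: List[Sense] = field(default_factory=list)
    statements: List[Statement] = field(default_factory=list)

    def form_with_features(self, *features: str) -> Optional[Form]:
        """Return the first form that carries all of the given grammatical features."""
        wanted = set(features)
        for form in self.forms:
            if wanted <= set(form.grammatical_features):
                return form
        return None


# Usage: an English verb with two forms and one sense.
run = Lexeme(
    lexeme_id="L1",
    lemma="run",
    language="en",
    lexical_category="verb",
    forms=[
        Form("L1-F1", "run", [PRESENT_TENSE]),
        Form("L1-F2", "ran", [PAST_TENSE]),
    ],
    senses=[Sense("L1-S1", "to move swiftly on foot")],
)
print(run.form_with_features(PAST_TENSE).representation)  # ran
```

In this sketch the word-level data lives in its own entity types, while the concepts it refers to (here, the tense) stay ordinary items.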


Stas Malyshev

10:15 p.m.

Hi!

...

I do not think a separate Wikibase instance would be needed to provide the data for Wiktionary. I think this can and should be done on Wikidata. But as said by Milos and pointed out by Gerard, lexical

I am worried that having two different data sets within the same instance would be a problem for tools working with the data, and for humans too. And frankly, I don't see much benefit: virtually all the added value Wikidata has now relies on the assumed semantics of Wikidata values and properties. Everything that pertains to lexemes, forms, etc. will have to be built separately, so why do it within the same site and have all the mechanics act as a split brain? I would think having a parallel instance of Wikibase would serve the same goal much better, while preserving all the benefits of using the Wikibase toolkit and basic data model. Ultimately, it's the same as having separate databases vs. having one huge database (or even one huge table) with columns marking virtual partitions: the former is much easier to handle if the sets are completely disjoint, as we'd have between Wikidata and Wiktionary, as far as I can see. Maybe I am missing some benefit a joint structure would produce?

-- Stas Malyshev smalyshev@wikimedia.org

Lydia Pintscher

11:50 p.m.


The benefits of having it in one instance are huge imho. Our community exists and knows how to handle structured data by now. Processes/documentation/etc. are set up. The world outside is starting to realize that Wikidata is the place to go to for structured data around Wikimedia now. And we probably do want easy connecting between items/properties/lexemes etc. As we're talking about different entity types, the data is easy enough to keep apart for those who want to.

Cheers Lydia

-- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.

Stas Malyshev

8 May

12:33 a.m.

Hi!

...

The benefits of having it in one instance are huge imho. Our community exists and knows how to handle structured data by now. Processes/documentation/etc are set up. The world outside is starting to realize that Wikidata is the place to go to for structured data around Wikimedia now. And we probably do want easy connecting between

All this is true, but I don't see why this implies running only one instance of Wikibase. We could run another instance under the same Wikidata umbrella, connect them (just as we are connecting other wikis with Wikidata), share relevant documentation, etc.; none of that mandates running everything within the same database. I think we have a lot of experience here of running services that are different technically but unified by common goals and common purposes and linking them.

...

items/properties/lexemes etc. As we're talking about different entity types, the data is easy enough to keep apart for those who want to.

I'm not sure how easy that would be. I've seen a lot of code that assumes certain things work with all entities; that code would now need to be reworked to either work with only two types of entities, or support many other types that behave very differently. And it's very easy to miss something and not discover it until we launch and tools start to break because Lexemes get into code that assumes something is either an Item or a Property. And I'm not talking about internal PHP code only: there are a lot of tools out there that neither WMF nor WMDE maintains. It's one thing to make a new service (which, by the way, I think is an awesome idea; I just want to say so to make it clear that I am not criticizing the whole idea, just this aspect of it) and another to add subtle changes to an existing one.

-- Stas Malyshev smalyshev@wikimedia.org
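The failure mode Stas describes can be sketched in a few lines of Python. This is purely hypothetical client code, not taken from Wikibase or any real tool; the ID prefixes ("Q" for items, "P" for properties, "L" for lexemes) are assumptions for illustration. The first function hard-codes the "either Item or Property" assumption; the second dispatches on an explicit table and fails loudly on anything it does not know.

```python
# Hypothetical client code; not taken from Wikibase or from any real tool.

def entity_kind_brittle(entity_id: str) -> str:
    # Assumes exactly two entity types exist: anything that is not a
    # property ("P...") is treated as an item.
    if entity_id.startswith("P"):
        return "property"
    return "item"  # a lexeme ID such as "L1" silently lands here


# Explicit prefix table; "L" for lexemes is an assumption for illustration.
KNOWN_KINDS = {"Q": "item", "P": "property", "L": "lexeme"}


def entity_kind_defensive(entity_id: str) -> str:
    # Look the prefix up explicitly and fail loudly on unknown entity types,
    # so a new type breaks visibly instead of being misclassified.
    kind = KNOWN_KINDS.get(entity_id[:1])
    if kind is None:
        raise ValueError(f"unsupported entity type: {entity_id!r}")
    return kind


print(entity_kind_brittle("L1"))    # item  (wrong, but no error)
print(entity_kind_defensive("L1"))  # lexeme
```

The defensive version does not teach an old tool anything about lexemes; it only turns silent misclassification into a visible error, which is the kind of breakage Stas expects third-party tools to hit.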

Lydia Pintscher

12:45 a.m.

On Fri, May 8, 2015 at 9:33 AM, Stas Malyshev smalyshev@wikimedia.org wrote:

...


All this is true, but I don't see why this implies running only one instance of Wikibase. We could run another instance under the same Wikidata umbrella, connect them (just as we are connecting other wikis with Wikidata), share relevant documentation, etc.; none of that mandates running everything within the same database. I think we have a lot of experience here of running services that are different technically but unified by common goals and common purposes and linking them.

I would argue we are actually really really bad at it ;-)

...


I'm not sure how easy that would be. I've seen a lot of code that assumes certain things work with all entities; that code would now need to be reworked to either work with only two types of entities, or support many other types that behave very differently. And it's very easy to miss something and not discover it until we launch and tools start to break because Lexemes get into code that assumes something is either an Item or a Property. And I'm not talking about internal PHP code only: there are a lot of tools out there that neither WMF nor WMDE maintains. It's one thing to make a new service (which, by the way, I think is an awesome idea; I just want to say so to make it clear that I am not criticizing the whole idea, just this aspect of it) and another to add subtle changes to an existing one.

Yeah there are a lot of those indeed. The assumptions around entities need to go away anyway as we are tackling Commons support. So we'll have to bite that bullet one way or another.

Cheers Lydia

Federico Leva (Nemo)

6:33 a.m.

Lydia Pintscher, 08/05/2015 09:45:

...

I think we have a lot of experience here of running services that are different technically but unified by common goals and common purposes and linking them.

I would argue we are actually really really bad at it ;-)

+1. The Wikimedia community has long been able to think of all the Wikimedia projects as an organic whole. Software, on the other hand, has too often forced unnatural divisions.

Wiktionary, Wikipedia, Commons and Wikiquote (to name the main cases) link to each other all the time in a constructive division of labour. It makes no sense to make connections between them harder.

Nemo

Luca Martinelli

8:35 a.m.

2015-05-08 15:33 GMT+02:00 Federico Leva (Nemo) nemowiki@gmail.com:

...

+1. The Wikimedia community has long been able to think of all the Wikimedia projects as an organic whole. Software, on the other hand, has too often forced unnatural divisions.

Wiktionary, Wikipedia, Commons and Wikiquote (to name the main cases) link to each other all the time in a constructive division of labour. It makes no sense to make connections between them harder.

I start from here, since Nemo got the point IMHO: the fact that every project has its own scope doesn't imply that the whole of the community works on different scopes - we just decided to split up our duties among ourselves. But it's not just that.

TL;DR: Wikidata and Wiktionary deal with the same things (concepts), therefore are best-suited for each other, given some needed adaptations. Structured Data and Structured Wikiquote deal with different things (objects), therefore are not to be considered good examples.

Long version here:

In theory, one might just agree that a separate instance of Wikibase might be the best solution for Wiktionary, but Structured Data and Structured Wikiquote are different from a theoretical "Structured Wiktionary": they deal with images and quotes respectively, while it would deal with words.

Images and quotes are describable *objects*, as the Wiki* articles/pages are, and there are billions and billions of those objects out there. This is the main, if not the only, reason why we *have* to set up a separate instance of Wikibase to deal with them: thinking that Wikidata might deal with such an infinite task is just nuts.

Words, on the other hand, are describable *concepts*, not objects. They can be linked to one another by relations, they have synonyms and opposites, they can be grouped together or separated, etcetera, which is exactly what we're currently doing with Wikidata items.

I know, there are even more words than images and quotes, so it would seem even more nuts to deal with them in Wikidata alone. But Wikidata is *already* structured for dealing with concepts, making it the best choice for integrating data from Wiktionary.

In other words, Wikidata and Wiktionary both work with *concepts*, while all the other projects work with *objects*. From a more practical point of view, why should I have a Wikidata item about, say, present tense[1] *AND* a completely similar item on "Structured Wiktionary"? It's the same concept, so why should I have it in two different-yet-linked databases, belonging to and maintained by the very same community? Why can't we work something out to keep all the information in just one database?

This is why I think that setting up a separate Wikibase for Wiktionary might end up doubling our efforts and splitting our communities, which is exactly the opposite of what we need to do (halving the efforts and doubling the community).[2]

Sorry for the long post. :)

[1] https://www.wikidata.org/wiki/Q192613

[2] Not sure I need to point this out, but please, PLEASE, note that this is just an exaggeration for argument's sake: I of course have no data that would confirm that the WD community will grow by 100%. I just want to make my concept clear (heh).

-- Luca "Sannita" Martinelli http://it.wikipedia.org/wiki/Utente:Sannita

Denny Vrandečić

9:18 a.m.

I very much agree with Lydia and Nemo that there should not be a separate Wikibase instance for Wiktionary data. Having a single community in a single project, and not having to vote for admins here and there, have two different watchlists, repeat documentation, rediscuss policies, etc. sounds like a smart move. Also, the Item data and the lexical data would be much more tightly connected than with any other project, and queries should be able to work seamlessly across them.

The only reason Commons is proposed to have its own instance is that the actual multimedia files are there, and the community caring about those files is there and should work in one place. If there were only a single Wiktionary project, it might also be worth considering having the structured data there; but since there are more than 150 editions of Wiktionary, a centralized place makes more sense. And since we already have Wikidata for that, I don't see the advantage of splitting the potential communities.
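Denny's point that queries should work seamlessly across item data and lexical data can be illustrated with a small sqlite3 sketch. The two-table schema, the column names and the sense IDs are hypothetical simplifications (this is not how Wikibase stores entities); the only real identifier is Q192613, the "present tense" item Luca cites. Because both kinds of data sit in one store, a single join answers a question that spans them.

```python
import sqlite3

# Hypothetical, heavily simplified schema: concept items and lexical senses
# stored side by side, so one query can join across the two entity types.
# This is NOT how Wikibase actually stores entities.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (id TEXT PRIMARY KEY, label TEXT);
    CREATE TABLE senses (
        id TEXT PRIMARY KEY,
        lemma TEXT,
        language TEXT,
        item_id TEXT REFERENCES items(id)  -- sense -> concept link (assumption)
    );
    """
)
conn.execute("INSERT INTO items VALUES ('Q192613', 'present tense')")
conn.executemany(
    "INSERT INTO senses VALUES (?, ?, ?, ?)",
    [
        ("L1-S1", "present tense", "en", "Q192613"),  # hypothetical sense IDs
        ("L2-S1", "presente", "it", "Q192613"),
    ],
)

# Which words, in which languages, express the concept of item Q192613?
rows = conn.execute(
    """
    SELECT s.language, s.lemma, i.label
    FROM senses AS s JOIN items AS i ON s.item_id = i.id
    WHERE i.id = ?
    ORDER BY s.language
    """,
    ("Q192613",),
).fetchall()
print(rows)
# [('en', 'present tense', 'present tense'), ('it', 'presente', 'present tense')]
```

With two separate Wikibase instances, the same question would need a federated query or a client-side join across two services; that extra cost is what is being weighed against the cleaner separation Stas argues for.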


Gerard Meijssen

11:46 a.m.

Hoi, Please do appreciate that OmegaWiki, originally WiktionaryZ, really wants to be considered in all this. It is the granddaddy of Wikidata, and it does combine everything you would want as far as lexical data is concerned. Thanks, GerardM


Denny Vrandečić

12:33 p.m.

I very much appreciate OmegaWiki: it has been a trailblazer for many of the ideas in Wikidata, and, as you say, it is the granddaddy in many ways. OmegaWiki has been looked into extensively, and the results have flowed directly into the current proposal. The write-up of that analysis can be found here:

https://www.wikidata.org/wiki/Wikidata:Comparison_of_Projects_and_Proposals_...


Gerard Meijssen

9 May

12:33 a.m.

Hoi, I have read it, I had read it before, I commented at the time and imho it is flawed.

What I am waiting for is an explanation of why there is this insistence on not having attributes on labels, and why there is a "need" for the constructs you mentioned, which only duplicate what is already there. It is a question I have asked before, and I framed it this way because OmegaWiki proves that this can be done.

Please do not frame it in terms that can be understood. "Because we insist on it" is a fine answer, it is the answer I have had so far. Thanks, GerardM

On 8 May 2015 at 21:33, Denny Vrandečić vrandecic@google.com wrote:

...

I very much appreciate OmegaWiki - it has been a trailblazer for many of the ideas in Wikidata, and as you say, it is the granddaddy in many ways. OmegaWiki has been extensively looked into and the results from that have directly flown into the current proposal. The write up of that analysis can be found here:

https://www.wikidata.org/wiki/Wikidata:Comparison_of_Projects_and_Proposals_...

On Fri, May 8, 2015 at 11:46 AM Gerard Meijssen gerard.meijssen@gmail.com wrote:

...
Hoi, Please do appreciate that OmegaWiki, originally WiktionaryZ, really wants to be considered in all this. It is the grand daddy of Wikidata and it does combine everything you would want as far as lexical data is concerned. Thanks, GerardM

On 8 May 2015 at 18:18, Denny Vrandečić vrandecic@gmail.com wrote:

...
I very much agree with Lydia and Nemo that there should not be a separate Wikibase instance for Wiktionary data. Having a single community in a single project, and not having to vote for admins here and there, have two different watchlists, have documentation be repeated, policies being rediscussed, etc. sounds like a smart move. Also, the Item-data and the Lexical-data would be much tighter connected than with any other project, and queries should be able to seamlessly work between them.

The only reason Commons is proposed to have its own instance is because the actual multimedia files are there, and the community caring about those files is there and should work in one place. If there was only a single Wiktionary project, it might also be worth to consider having the structured data there - but since there are more than 150 editions of Wiktionary, a centralized place makes more sense. And since we already have Wikidata for that, I don't see the advantage of splitting the potential communities.

On Fri, May 8, 2015 at 8:35 AM Luca Martinelli martinelliluca@gmail.com wrote:

...
2015-05-08 15:33 GMT+02:00 Federico Leva (Nemo) nemowiki@gmail.com:

...
+1. The Wikimedia community has been long able to think of all the

Wikimedia

...
projects as an organic whole. Software, on the other hand, too often

forced

...
innatural divisions.

Wiktionary, Wikipedia, Commons and Wikiquote (to name the main cases)

link

...
to each other all the time in a constructive division of labour. It

makes no

...
sense to make connections between them harder.

I start from here, since Nemo got the point IMHO: the fact that every project has its own scope doesn't imply that the whole of the community works on different scopes - we just decided to split up our duties among ourselves. But it's not just that.

TL;DR: Wikidata and Wiktionary deal with the same things (concepts), therefore are best-suited for each other, given some needed adaptations. Structured Data and Structured Wikiquote deal with different things (objects), therefore are not to be considered good examples.

Long version here:

In theory, one might just agree that a separate instance of Wikibase might be the best solution for Wiktionary, but Structured Data and Structured Wikiquote are different from a theoretical "Structured Wiktionary", because they respectively deal with images, quotes and words.

Images and quotes are describable *objects*, as the Wiki* articles/pages are, and there are billions and billions of those objects out there. This is the main, if not just the only, reason why we *have* to put up a separate instance of Wikibase to deal with them: thinking that Wikidata might deal with such an infinite task is just nuts.

Words, on the other hands, are describable *concepts*, not objects. They can be linked one another by relation, they have synonyms and opposites, they can be regrouped or separated, etcetera, which is exactly what we're currently doing with Wikidata items.

I know, there are even more words than images and quotes, so it would be even more nuts to think of handling them with Wikidata alone; but Wikidata is *already* structured for dealing with concepts, which makes it the best choice for integrating data from Wiktionary.

In other words, Wikidata and Wiktionary both work with *concepts*, while all the other projects work with *objects*. From a more practical point of view, why should I have a Wikidata item about, say, present tense[1] *AND* an equivalent item on "Structured Wiktionary"? It's the same concept, so why should I have it in two different-yet-linked databases, owned and maintained by the very same community? Why can't we work something out to keep all the information in just one database?

This is why I think that setting up a separate Wikibase for Wiktionary might end up doubling our efforts and splitting our communities, which is exactly the opposite of what we need to do (halving the efforts and doubling the community).[2]

Sorry for the long post. :)

[1] https://www.wikidata.org/wiki/Q192613
[2] Not sure if I need to point this out, but please, PLEASE, note this is just an exaggeration for argument's sake; of course I have no data confirming that the WD community would grow by 100%. I just want to make my concept clear (heh).

-- Luca "Sannita" Martinelli http://it.wikipedia.org/wiki/Utente:Sannita

Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l

Paul Houle

8 May

9:30 a.m.

Concepts and words are different things, or better yet, words (word senses, ...) are a special kind of concept.

I was looking at what the data model should be for a system that supports logical representation of 100% of the critical knowledge in business and technical documents within narrow domains.

One thing I tried was (more or less) Wikidata+Wordnet, and I found the Wordnet part difficult to apply. Where Wikidata concepts match text chunks it works OK, but dealing with the verbs, prepositions and all that stuff is labor-intensive, hard to do correctly, and doesn't contribute much to machine-readable semantics. It is more useful to model verb functions in terms of discontinuous chunks that form templates; often, the verb and its associated prepositions together make a good unit of modelling.
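To make the point about discontinuous templates a little more concrete, here is a minimal Python sketch in which a verb and its particle form one unit with an open slot in between, as in "turn ... off". The template representation, the `match_template` function and the whitespace tokenization are hypothetical illustrations, not taken from Wordnet, Wikidata or any system mentioned in this thread.

```python
from typing import Optional

# Hypothetical template format: fixed tokens interleaved with open slots (None).
# ("turn", None, "off") stands for the discontinuous unit "turn ... off".
Template = tuple[Optional[str], ...]


def match_template(template: Template, tokens: list[str]) -> Optional[list[list[str]]]:
    """Return the tokens filling each slot if the whole token list matches, else None.

    Fixed tokens must match literally (case-insensitively); each slot absorbs
    one or more tokens. Uses a simple backtracking search.
    """
    def search(ti: int, pos: int, fills: list[list[str]]) -> Optional[list[list[str]]]:
        if ti == len(template):
            # Success only if every token has been consumed.
            return fills if pos == len(tokens) else None
        part = template[ti]
        if part is not None:
            if pos < len(tokens) and tokens[pos].lower() == part:
                return search(ti + 1, pos + 1, fills)
            return None
        # Slot: try every non-empty span starting at pos.
        for end in range(pos + 1, len(tokens) + 1):
            result = search(ti + 1, end, fills + [tokens[pos:end]])
            if result is not None:
                return result
        return None

    return search(0, 0, [])


TURN_OFF: Template = ("turn", None, "off")
print(match_template(TURN_OFF, "turn the lights off".split()))
# [['the', 'lights']]
```

In this sketch a slot must absorb at least one token and the whole token list must match; a practical system would look for such units inside longer sentences and add linguistic constraints.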

Super-Wordnet, however, will still be interesting to humans who might want to pin down exact word senses in a contract.

-- Paul Houle *Applying Schemas for Natural Language Processing, Distributed Systems, Classification and Text Mining and Data Lakes* (607) 539 6254 paul.houle on Skype ontology2@gmail.com https://legalentityidentifier.info/lei/lookup

Federico Leva (Nemo)

9:46 a.m.

Paul Houle, 08/05/2015 18:30:

...

Concepts and words are different things, or better yet, words (word senses, ...) are a special kind of concept.

I think, however, that Sannita's point is important and interesting. It can perhaps be illustrated with a simple point: Wikidata items (like Wikipedia articles) connect well to Commons categories, Wikiquote articles (authors/themes/works) and Wikisource authors; they don't necessarily connect well to the building blocks (pages), like individual files, quotations or chapters. Similarly, Wiktionary largely overlaps and connects with the other projects, as long as you consider a subset of it (say, the nominative forms of nouns). The fact that Wiktionary contains an impressive mass of "other stuff" doesn't make it *so* special as to force a separate install, even though more aggressive or complete implementations of structured data might require one. Wikiquote and Commons likewise benefit from Wikidata today, even though one can imagine broader uses with different technical requirements.

Nemo

Luca Martinelli

9 May

2:52 a.m.

Nemo explained what I meant more effectively than I did. As a partial excuse, I had to rewrite and simplify my message several times, because I was trying to make up my mind while writing. :)

L.

Markus Krötzsch

8 May

12:36 a.m.

On 08.05.2015 08:50, Lydia Pintscher wrote:

...

On Fri, May 8, 2015 at 7:15 AM, Stas Malyshev smalyshev@wikimedia.org wrote:

...
I am worried that having two different data sets within the same instance would be a problem for tools working with the data, and for humans too. And frankly, I don't see much benefit: virtually all the added value Wikidata has now relies on the assumed semantics of Wikidata values and properties. Everything that pertains to lexemes, forms, etc. will have to be built separately, so why do it within the same site and have all the mechanics act as a split brain? I think having a parallel instance of Wikibase would serve the same goal much better, while preserving all the benefits of using the Wikibase toolkit and basic data model. Ultimately, it's the same as having separate databases vs. one huge database (or even one huge table) with columns marking virtual partitions: the former is much easier to handle if the sets are completely disjoint, as they would be between Wikidata and Wiktionary, as far as I can see. Maybe I am missing some benefit a joint structure would produce?

The benefits of having it in one instance are huge, imho. Our community exists and knows how to handle structured data by now. Processes, documentation, etc. are set up. The outside world is starting to realize that Wikidata is the place to go for structured data around Wikimedia. And we probably do want easy connections between items, properties, lexemes, etc. Since we're talking about different entity types, the data is easy enough to keep apart for those who want to.

Other technical solutions can be found for keeping content apart when needed (e.g., separate dumps by entity types).
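As a rough illustration of keeping content apart with separate dumps per entity type, the following Python sketch groups the entities of a line-per-entity JSON dump by their "type" field. It assumes a layout like the current Wikidata JSON dumps (a JSON array with one entity per line, each line followed by a comma except the last); the "lexeme" type, the "L1" identifier and the `split_by_type` helper are hypothetical.

```python
import json
from collections import defaultdict
from typing import Iterable


def split_by_type(dump_lines: Iterable[str]) -> dict[str, list[dict]]:
    """Group entities from a line-per-entity JSON array dump by their "type" field."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for line in dump_lines:
        line = line.strip()
        # Skip the opening "[" and closing "]" lines, plus blank lines.
        if line in ("[", "]", ""):
            continue
        # Each entity line ends with a comma except the last; drop it before parsing.
        entity = json.loads(line.rstrip(","))
        groups[entity["type"]].append(entity)
    return dict(groups)


sample = [
    "[",
    '{"type": "item", "id": "Q192613"},',
    '{"type": "property", "id": "P31"},',
    '{"type": "lexeme", "id": "L1"}',  # hypothetical lexical entity
    "]",
]
for entity_type, entities in split_by_type(sample).items():
    print(entity_type, [e["id"] for e in entities])
# item ['Q192613']
# property ['P31']
# lexeme ['L1']
```

Each group could then be written to its own file, which is one simple way to offer per-type dumps from a single instance.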

Cheers,

Markus

Stas Malyshev

12:40 a.m.

Hi!

...

Other technical solutions can be found for keeping content apart when needed (e.g., separate dumps by entity types).

It's not only dumps; it's also searches, APIs, special pages, etc. Of course, everything can be solved with enough time and coding, but to me it looks like running a DB server with only one database and only one table: why not use the separation that comes for free with another instance? We can still reuse any code we like.

-- Stas Malyshev smalyshev@wikimedia.org

Markus Krötzsch

12:51 a.m.

Hi,

On 08.05.2015 09:40, Stas Malyshev wrote:

...

It's not only dumps; it's also searches, APIs, special pages, etc. Of course, everything can be solved with enough time and coding, but to me it looks like running a DB server with only one database and only one table: why not use the separation that comes for free with another instance? We can still reuse any code we like.

API features must support entity type selection anyway, and I think the same holds for most of the other cases you mention. One would not start a new Wikibase from scratch but build on the existing code, so it would be necessary to extend Wikibase with the new features. This would not be a fork but an extension of the existing system, and it would therefore have to be implemented in a way that works when all the features are used on one site. This implies that all of the problems you mentioned will have to be solved anyway.

Regards,

Markus

Bene*

2:15 a.m.

...

I do not think a separate Wikibase instance would be needed to provide the data for Wiktionary. I think this can and should be done on Wikidata. But as said by Milos and pointed out by Gerard, lexical knowledge does indeed require a different data schema. This is why the proposal introduces new entity types for lexemes, forms, and senses. The data model is mostly based on lexical ontologies that we surveyed, like LEMON and others.

I think a separate Wikibase installation would be much better than adding lexical knowledge to Wikidata. Wikidata is primarily about things, and Wiktionary is about words etc. So having a Wikibase installation only for Wiktionary makes more sense in my opinion, as that is the same plan we currently have for Commons/Wikiquote etc. It would still be connected to Wikidata, e.g. by accepting Wikidata items as values in statements and having access to their data. However, we should also keep lexical knowledge and Wikidata separate at the wiki level.

Best regards, Bene

Magnus Manske

2:19 a.m.

On Fri, May 8, 2015 at 10:16 AM Bene* benestar.wikimedia@gmail.com wrote:

...

Hi

...
I do not think a separate Wikibase instance would be needed to provide the data for Wiktionary. I think this can and should be done on Wikidata. But as said by Milos and pointed out by Gerard, lexical knowledge does indeed require a different data schema. This is why the proposal introduces new entity types for lexemes, forms, and senses. The data model is mostly based on lexical ontologies that we surveyed, like LEMON and others.

I think a separate Wikibase installation would be much better than adding lexical knowledge to Wikidata. Wikidata is primarily about things, and Wiktionary is about words etc. So having a Wikibase installation only for Wiktionary makes more sense in my opinion, as that is the same plan we currently have for Commons/Wikiquote etc. It would still be connected to Wikidata, e.g. by accepting Wikidata items as values in statements and having access to their data. However, we should also keep lexical knowledge and Wikidata separate at the wiki level.

+1

Thomas Douillard

2:30 a.m.

I don't get this: is this really a technical issue, or just an interface one? Just by tweaking the UI, it can be made pretty clear to users that semantic entity pages are very different from lexical entities in the same instance. Conversely, separate instances can be confusing as well if not done well.

Is this a community issue? Different project, different community, different site? I really don't like that, as it tends to create separate groups who may have difficulty talking to each other and visiting the other site. Since the Wikidata community is already established, is trying to grow and advocate for the project, holds a central position in the ecosystem, and is learning how to build social links between projects, I think it would be beneficial, imho, to keep growing and learning here. There are strong connections between words and senses.

In that global scheme, I think one instance versus several is mostly a technical detail that is not really important, and both solutions can accommodate distinct (or not) pages and distinct (or not) communities.

Markus Krötzsch

5:02 a.m.

On 08.05.2015 11:30, Thomas Douillard wrote:

...

I don't get this: is this really a technical issue, or just an interface one? Just by tweaking the UI, it can be made pretty clear to users that semantic entity pages are very different from lexical entities in the same instance. Conversely, separate instances can be confusing as well if not done well.

Is this a community issue? Different project, different community, different site? I really don't like that, as it tends to create separate groups who may have difficulty talking to each other and visiting the other site. Since the Wikidata community is already established, is trying to grow and advocate for the project, holds a central position in the ecosystem, and is learning how to build social links between projects, I think it would be beneficial, imho, to keep growing and learning here. There are strong connections between words and senses.

In that global scheme, I think one instance versus several is mostly a technical detail that is not really important, and both solutions can accommodate distinct (or not) pages and distinct (or not) communities.

That's what I was thinking as well. As far as I can see, whether it's one site or two would not make much difference for users, other than that the domain part of the URL would change and the menu/logo on the left would be different. The accounts would be the same, the individual page contents would look the same, and the cross-links between dictionary content and data content would also be the same. Things would probably work fine either way.

Regards,

Markus

Federico Leva (Nemo)

6:40 a.m.

Bene*, 08/05/2015 11:15:

...

So having a Wikibase installation only for Wiktionary makes more sense in my opinion as that is the same plan we currently have for Commons/Wikiquote etc.

We? Please remember that's only a personal proposal, which no Wikiquote community has ever subscribed to (yet). (Cc Wikiquote-l.)

Nemo

Ricordisamoa

12 May

12:24 p.m.

On 08/05/2015 15:40, Federico Leva (Nemo) wrote:

...

Bene*, 08/05/2015 11:15:

...
So having a Wikibase installation only for Wiktionary makes more sense in my opinion as that is the same plan we currently have for Commons/Wikiquote etc.

We? Please remember that's only a personal proposal, which no Wikiquote community has ever subscribed to (yet). (Cc Wikiquote-l.)

Nemo

It's only a personal proposal https://meta.wikimedia.org/wiki/Structured_Wikiquote, supported by 16 people, and with a demo http://structured.wikiquote.wmflabs.org set up by someone with no affiliation to the original proposer. If you have any feelings against this project, I think you'd better explain them on the talk page https://meta.wikimedia.org/wiki/Talk:Structured_Wikiquote.

Jan Dudík

13 May

1:44 p.m.

The French Wiktionary uses more than 2000 languages.

JAnD

2015-05-07 23:25 GMT+02:00 Federico Leva (Nemo) nemowiki@gmail.com:

...

Andy Mabbett, 07/05/2015 22:53:

...
...
The Wiktionary communities tend to strongly disagree that splitting entries per language would be easier for either editors or readers.

How many languages are currently used? How will this scale to ~300 languages?

Hm? Last time I counted, the English Wiktionary alone used way more than 300 languages.

Nemo

Gerard Meijssen

14 May

4:04 a.m.

Hoi, what is your definition of a language, and if it is not along the lines of ISO 639-3, how are the languages organised?

One of the first things to do is to understand how these languages can be incorporated into Wikidata and prepare for that. Do you have a list of all the languages, hopefully with their codes? Thanks, GerardM

Ricordisamoa

7 May

3:57 a.m.

Hi Denny, I would strongly advise against connecting Wiktionary to Wikidata in the status quo, mainly for the reasons Gerard summarized. While wikt's 'data model' probably makes sense for a spelling-based dictionary, it does not for a concept-based knowledge base like ours. Even turning Wiktionary into an OmegaWiki-like project (https://meta.wikimedia.org/wiki/OmegaWiki) seems unlikely to be feasible without an intermediate step. Let's focus on Commons, OpenStreetMap, queries, arbitrary access, new datatypes?

Magnus Manske

4:57 a.m.

Forgive me, but at the 2014 WikiCon in Cologne, I saw a talk proposing that Wiktionary be converted to a separate Wikibase installation, collapsing all the Wiktionary languages into items. THAT could reasonably be linked to Wikidata, or "just" cross-referenced via properties.

Trying to wedge the current links into Wikidata seems like a failing proposition.

Andy Mabbett

5:08 a.m.

On 7 May 2015 at 11:57, Ricordisamoa ricordisamoa@openmailbox.org wrote:

...

Let's focus on Commons, OpenStreetMap, queries, arbitrary access, new datatypes?

OSM in what context?

Also, we should throw Wikispecies into the mix.

-- Andy Mabbett @pigsonthewing http://pigsonthewing.org.uk

Ricordisamoa

8 May

4:23 a.m.

On 07/05/2015 14:08, Andy Mabbett wrote:

...

On 7 May 2015 at 11:57, Ricordisamoa ricordisamoa@openmailbox.org wrote:

...
Let's focus on Commons, OpenStreetMap, queries, arbitrary access, new datatypes?

OSM in what context?

Adding mutual links, keeping them up to date, building applications that use both databases, etc. https://wiki.openstreetmap.org/wiki/Wikidata

...

Also, we should throw Wikispecies into the mix.

This reminds me of some old discussions...
[1] https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2013/02#Include_Wikispecies_into_Wikidata
[2] https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2013/04#Wikispecies
[3] https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2014/02#Wikispecies
etc.

Romaine Wiki

5:30 a.m.

I personally am waiting for Meta to be added.

Romaine

Lydia Pintscher

7 May

5:28 a.m.

Hey folks :)

You're absolutely right that we need to focus on a few other things first (UI redesign, units, queries, arbitrary access, data quality tools including watchlist improvements). However, we also need to look into the future. Wiktionary support needs a lot of input to make sure we're doing the right thing, and it's good to give that process time. So please do read the latest proposal Denny posted. It even has some mockups to make it easier to understand what it would look like in practice. If we can get rough consensus that this is the way forward, things will fall into place. And we will not abandon the things I mentioned that are more important right now.

Cheers Lydia

Yair Rand

5:56 a.m.

Task 1 as described on the proposal page isn't completely clear on how it would work. Would the generated "items" have Q-ids? Would it be possible to link Wiktionary entries to non-Wiktionary pages in the very rare situations that make sense (articles on particular series of (not-language-associated) symbols/characters)?

Regardless, I think that doing Task 1 is a very worthwhile idea. The rest of the tasks, however, should probably wait until much later.

On Thu, May 7, 2015 at 8:28 AM, Lydia Pintscher < lydia.pintscher@wikimedia.de> wrote:

...

-- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata

Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.

Daniel Kinzler

7:03 a.m.

On 07.05.2015 at 14:56, Yair Rand wrote:

...

Task 1 as described on the proposal page isn't completely clear on how it would work. Would the generated "items" have Q-ids? Would it be possible to link Wiktionary entries to non-Wiktionary pages in the very rare situations that make sense (articles on particular series of (not-language-associated) symbols/characters)?

Task 1 (Interlanguage-Links for Wiktionary) would not involve Wikidata or Wikibase at all. It would be a standalone extension linking pages with identical names between wikis.
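The Task 1 description above boils down to matching identical page titles across Wiktionary editions. A minimal Python sketch of that matching step follows; the `titles_by_wiki` mapping, the `interlanguage_links` function and the sample titles are hypothetical, and an actual MediaWiki extension would also have to deal with things this sketch ignores, such as namespaces and keeping the title lists up to date.

```python
def interlanguage_links(title: str, titles_by_wiki: dict[str, set[str]], current: str) -> list[str]:
    """Return the language codes of other editions that have a page with exactly this title."""
    return sorted(
        lang
        for lang, titles in titles_by_wiki.items()
        if lang != current and title in titles  # exact, case-sensitive match
    )


# Hypothetical snapshot of page titles on three Wiktionary editions.
titles_by_wiki = {
    "en": {"house", "Haus", "maison"},
    "de": {"Haus", "house"},
    "fr": {"maison", "house"},
}
print(interlanguage_links("house", titles_by_wiki, current="en"))
# ['de', 'fr']
print(interlanguage_links("Haus", titles_by_wiki, current="de"))
# ['en']
```

Because the match is purely on the page name, no Q-ids or Wikibase items are involved, which is what makes this step independent of Wikidata.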

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.

Ricordisamoa

10:32 a.m.

On 07/05/2015 16:03, Daniel Kinzler wrote:

...

On 07.05.2015 at 14:56, Yair Rand wrote:

...
Task 1 as described on the proposal page isn't completely clear on how it would work. Would the generated "items" have Q-ids? Would it be possible to link Wiktionary entries to non-Wiktionary pages in the very rare situations that make sense (articles on particular series of (not-language-associated) symbols/characters)?

Task 1 (Interlanguage-Links for Wiktionary) would not involve Wikidata or Wikibase at all. It would be a standalone extension linking pages with identical names between wikis.

It's ok then! I have been thinking about something like that for some time...

Milos Rancic

10:38 a.m.

BTW, Daniel, there are standardized templates for "real" "interwiki" links (links to entries with the same meaning in other languages on the same Wiktionary). It makes sense for Wikidata to create a database for that. It isn't trivial, though, and it presupposes meanings. Still, it seems reasonably possible to me.

Daniel Kinzler

8 May

2:33 a.m.

On 07.05.2015 at 19:38, Milos Rancic wrote:

BTW, Daniel, there are standardized templates for "real" "interwiki" links (links to the entries with the same meaning in other languages on the same Wiktionary). It makes sense that Wikidata creates a db for that. Though, it isn't trivial and assumes meanings. Though, it seems to me reasonably possible.

The idea is to do this by having both lexical entries reference the same Q-item as one of their meanings.
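Daniel's answer implies that meaning-based links need not be stored separately. They can be derived whenever two lexical entries reference the same Q-item. The following is a minimal sketch under that assumption. The lexeme IDs, the Q-IDs `Q1001`/`Q1002`, and the function `same_meaning_links` are placeholders for illustration. They are not real Wikidata identifiers or APIs.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical data: lexeme id -> (language, lemma, Q-items referenced as meanings).
# "Q1001" (a cat concept) and "Q1002" (online chat) are placeholders, not real item IDs.
LEXEMES: Dict[str, Tuple[str, str, Set[str]]] = {
    "L1": ("en", "cat", {"Q1001"}),
    "L2": ("fr", "chat", {"Q1001", "Q1002"}),
    "L3": ("en", "chat", {"Q1002"}),
    "L4": ("de", "Katze", {"Q1001"}),
}


def same_meaning_links(lexemes: Dict[str, Tuple[str, str, Set[str]]],
                       lexeme_id: str) -> List[str]:
    """Lexemes in other languages sharing at least one Q-item meaning with lexeme_id."""
    # Invert the data: Q-item -> lexemes that reference it as a meaning.
    by_item: Dict[str, Set[str]] = defaultdict(set)
    for lid, (_lang, _lemma, items) in lexemes.items():
        for q in items:
            by_item[q].add(lid)

    own_lang, _own_lemma, own_items = lexemes[lexeme_id]
    linked: Set[str] = set()
    for q in own_items:
        for other in by_item[q]:
            if lexemes[other][0] != own_lang:  # keep only cross-language links
                linked.add(other)
    return sorted(linked)


print(same_meaning_links(LEXEMES, "L1"))  # ['L2', 'L4']
print(same_meaning_links(LEXEMES, "L3"))  # ['L2']
```

English "cat" and German "Katze" end up linked even though their spellings differ. Same-title linking cannot provide this.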

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.

Romaine Wiki

5:32 a.m.

Only for some templates, project pages and categories.

The only way it makes sense to link to an article of Wiktionary is when someone wants to look up what a word can mean.

Romaine

2015-05-07 14:56 GMT+02:00 Yair Rand yyairrand@gmail.com:


Task 1 as described on the proposal page isn't completely clear on how it would work. Would the generated "items" have Q-ids? Would it be possible to link Wiktionary entries to non-Wiktionary pages in the very rare situations that make sense (articles on particular series of (not-language-associated) symbols/characters)?

Regardless, I think that doing Task 1 is a very worthwhile idea. The rest of the tasks, however, should probably wait until much later.

On Thu, May 7, 2015 at 8:28 AM, Lydia Pintscher < lydia.pintscher@wikimedia.de> wrote:

Hey folks :)

You're absolutely right that we need to focus on a few other things first (UI redesign, units, queries, arbitrary access, data quality tools including watchlist improvements). However, we also need to look into the future. Wiktionary support needs a lot of input to make sure we're doing the right thing, and it's good to give that time. So please do read the latest proposal Denny posted. It even has some mockups to make it easier to understand what it would look like in practice. If we can get rough consensus that this is the way forward, things will fall into place. And we will not abandon the things I mentioned that are more important right now.

Cheers Lydia

-- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata

Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.


Luca Martinelli

7 May

7:49 a.m.

2015-05-07 14:28 GMT+02:00 Lydia Pintscher lydia.pintscher@wikimedia.de:


However we also need to look into the future. Wiktionary support needs a lot of input to make sure we're doing the right thing. And it's good to give that time.

Totally agree with that. There's plenty of work to do for the team, we all know that, but *one day* we'll have to figure out how to deal with Wiktionary. It's just something that *has* to happen.

This doesn't mean at all that it should become our first or only thought. Everybody knows that there are at least two or three concerns that should have priority at the moment, but not even Denny was suggesting that. He was merely suggesting that we restart thinking about something that, sooner or later, we'll have to deal with, and establish "a break down of the tasks needed to get this done." Sorry for being blunt, but not even the Structured Data project for Commons (which is indeed a top-priority thing at the moment) would have started with this attitude.

-- Luca "Sannita" Martinelli http://it.wikipedia.org/wiki/Utente:Sannita

Milos Rancic

6:31 a.m.

Doing major work on Wiktionary is of limited value (as Gerard explained). Wiktionary articles could be transferred to structured data in a similar way to Wikipedia articles, but only with a lot of trouble, so it is not the optimal solution.

What makes sense is to incorporate OmegaWiki logic into Wikidata and create a formal multilingual dictionary (as opposed to Wiktionary as a philological dictionary).

John Erling Blad

14 May

7:49 a.m.

As I read your proposal, you want to automate IW-linkage of similar lexemes, but how do you want to handle the cases where the lexemes are not similar? Your example "the tea room" vs. "les questions sur les mots" is such a case. Is this handled as a mixed automatic/manual case, with lexemes added automatically and the additional ones added manually?

Can you elaborate on how you want to handle word form vs word sense?

John


Gerard Meijssen

8:59 a.m.

Hoi,

From a Wiktionary point of view they are not the same. Wiktionary links articles that have the same spelling in common. For every meaning in every language, they link to the articles that have a specific spelling, and it is potluck whether that meaning actually exists. Thanks, GerardM


John Erling Blad

2:54 p.m.

Let me rephrase, and the question is for Denny unless someone knows the answer.

Lexemes in different languages share a spelling, and that is why they are linked together. That kind of linkage can be automated. Some other pages on those projects (usually in other namespaces) should be linked too, but can't be handled automatically. Would they be handled as sitelinks in Items?

John


Daniel Kinzler

3:34 p.m.

On 14.05.2015 at 23:54, John Erling Blad wrote:

Let me rephrase, and the question is for Denny unless someone knows the answer.

Lexemes at different languages share a spelling, and that is the reason why they are linked together. That kind of linkage can be automated. Some other pages (usually in other namespaces) at those projects should be linked too, but can't be handled automatically. Would they be handled as sitelinks in Items?

Yes, I'd assume so.
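John and Daniel converge on a mixed approach: entries are linked automatically by spelling, and other pages are linked through Item sitelinks. That approach can be sketched as follows. This is a hypothetical illustration. The item ID `Q2001`, the `link_target` function, and the rule that a colon in a title marks a non-entry namespace are all simplifying assumptions; real code would use namespace IDs. The page titles echo John's tea-room example and are used only for illustration.

```python
from typing import Dict, Optional, Set

# Hypothetical Item sitelinks for a non-entry page. "Q2001" is a placeholder ID.
ITEM_SITELINKS: Dict[str, Dict[str, str]] = {
    "Q2001": {
        "enwiktionary": "Wiktionary:Tea room",
        "frwiktionary": "Wiktionnaire:Questions sur les mots",
    },
}


def link_target(title: str, source_site: str, target_site: str,
                target_pages: Set[str],
                item_sitelinks: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Return the page on target_site that `title` on source_site should link to."""
    # Simplifying assumption: a colon marks a page outside the main (entry) namespace.
    if ":" not in title:
        # Dictionary entries: linked automatically when the identical title exists.
        return title if title in target_pages else None
    # Other pages: linked manually through the sitelinks of a shared Item.
    for sitelinks in item_sitelinks.values():
        if sitelinks.get(source_site) == title:
            return sitelinks.get(target_site)
    return None


fr_pages = {"chat", "thé", "Wiktionnaire:Questions sur les mots"}
print(link_target("chat", "enwiktionary", "frwiktionary", fr_pages, ITEM_SITELINKS))
# chat
print(link_target("Wiktionary:Tea room", "enwiktionary", "frwiktionary",
                  fr_pages, ITEM_SITELINKS))
# Wiktionnaire:Questions sur les mots
```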

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.

John Erling Blad

3:36 p.m.

Yes, I found a sentence about it in task 2. :)


John Erling Blad

4:11 p.m.

Seems like this is doable, and it does describe a solution for how Wiktionary can be linked from Wikidata. However, it is not completely clear to me how some remaining problems can be solved.

How do we go from a spelled form of a lexeme on Wiktionary to an identifier on Wikidata? And how do we go from one Sense to another, synonymous Sense? Do we use statements? But then only the L-identifiers can be used, so we will link them at the Lexeme level.

Wiktionary is organized around homonyms while Wikipedia is organized around synonyms, especially across languages, and I think this difference creates some of the problems.


Gerard Meijssen

9:46 p.m.

Hoi, In other words, this is what my question amounts to: the question that Denny does not answer. Thanks, GerardM


Daniel Kinzler

16 May

3:21 a.m.

On 15.05.2015 at 01:11, John Erling Blad wrote:

How do we go from a spelled form of a lexeme at Wiktionary and to an identifier on Wikidata?

What do you mean by "go to"? And what do you mean by "identifier on Wikidata" - Items, Lexemes, Senses, or Forms?

Generally, Wiktionary currently combines words with the same rendering from different languages on a single page. So a single Wiktionary page would correspond to several Lexeme entries on Wikidata, since Lexemes on Wikidata would be split per language.

I suppose a Lexeme-Entry could be linked back to the corresponding pages on the various Wiktionaries, but I don't really see the value of that, and sitelinks are currently not planned for Lexeme entries. It probably makes more sense for the Wiktionary pages to explicitly reference the Wikidata-Lexeme that corresponds to each language-section on the page.
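Daniel describes a page-to-Lexeme relationship in which one Wiktionary page yields one Lexeme per language section. A rough sketch can illustrate this. It assumes a simplified page layout with one level-2 `==Language==` heading per language, loosely modelled on English Wiktionary. Real pages vary by wiki, and both functions are hypothetical helpers, not part of any existing tool.

```python
import re
from typing import Dict, List, Optional, Tuple

# Simplified page text for the title "chat": one level-2 heading per language.
PAGE_TITLE = "chat"
PAGE_TEXT = """==English==
An informal conversation.

==French==
A cat.
"""

# Matches "==Language==" but not deeper headings such as "===Noun===".
LANGUAGE_HEADING = re.compile(r"==\s*([^=]+?)\s*==")


def split_by_language(text: str) -> Dict[str, str]:
    """Split page text into {language: section body} at level-2 headings."""
    sections: Dict[str, str] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        match = LANGUAGE_HEADING.fullmatch(line)
        if match:
            current = match.group(1)
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def lexeme_keys(title: str, text: str) -> List[Tuple[str, str]]:
    """One (language, lemma) pair per language section, i.e. one Lexeme each."""
    return [(language, title) for language in split_by_language(text)]


print(lexeme_keys(PAGE_TITLE, PAGE_TEXT))
# [('English', 'chat'), ('French', 'chat')]
```

Each resulting pair would correspond to a separate Lexeme. Following Daniel's suggestion, each language section on the Wiktionary page would reference its Lexeme, rather than the Lexeme carrying a sitelink back to the page.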


And how do we go from one Sense to another synonym Sense? Do we use statements? But then only the L-identifiers can be used, so we will link them at the Lexeme level..

Why can only L-Identifiers be used? Senses (and Forms) are entities and have identifiers. They wouldn't have a wiki-page of their own, but that's not a problem. The intention is that it's possible for one Sense to have a statement referring directly to another Sense (of the same or a different Lexeme).


Wiktionary is organized around homonyms while Wikipedia is organized around synonyms, especially across languages, and I think this difference creates some of the problems.

The Lexeme-Part of Wikidata (L-ids) would be separate from the Concept-part of Wikidata (Q-ids). The Lexeme part is organized around homonyms (more precisely, homographs in a single language). Each Lexeme can have several "Senses" modeled as "sub-entities", meaning that each Sense has its own set of Statements. Each Sense can be linked to Senses of other Lexemes (explicit synonyms or translations) and to Q-id concepts (implicit synonyms or translations) using Statements.
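Daniel's description of the Lexeme part of the model can be summarized as a small set of Python dataclasses. This sketches the structure as described in this thread; it is not Wikibase's actual implementation. The property names `synonym` and `item`, the `L<n>-S<n>` format for Sense IDs, and the Q-ID `Q1001` are assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Assumed ID format for Senses (sub-entities of a Lexeme), used only for illustration.
SENSE_ID = re.compile(r"L\d+-S\d+")


@dataclass
class Statement:
    prop: str   # community-chosen property, e.g. "synonym" or "item"
    value: str  # ID of the entity pointed to: a Q-item or another Sense


@dataclass
class Sense:
    id: str
    gloss: str
    statements: List[Statement] = field(default_factory=list)


@dataclass
class Lexeme:
    id: str        # L-id, separate from the Q-id concept space
    language: str
    lemma: str     # one homograph in one language
    senses: List[Sense] = field(default_factory=list)


def explicit_links(sense: Sense, prop: str = "synonym") -> List[str]:
    """Senses this Sense points to directly (explicit synonyms or translations)."""
    return [st.value for st in sense.statements
            if st.prop == prop and SENSE_ID.fullmatch(st.value)]


def implicit_links(lexemes: List[Lexeme], sense: Sense, prop: str = "item") -> List[str]:
    """Other Senses pointing to the same Q-item concept (implicit synonyms/translations)."""
    concepts = {st.value for st in sense.statements
                if st.prop == prop and st.value.startswith("Q")}
    return [other.id
            for lexeme in lexemes
            for other in lexeme.senses
            if other.id != sense.id
            and any(st.prop == prop and st.value in concepts for st in other.statements)]


cat = Lexeme("L1", "en", "cat",
             [Sense("L1-S1", "domestic feline", [Statement("item", "Q1001")])])
katze = Lexeme("L2", "de", "Katze",
               [Sense("L2-S1", "domestic cat", [Statement("item", "Q1001")])])
kitty = Lexeme("L3", "en", "kitty",
               [Sense("L3-S1", "informal word for a cat", [Statement("synonym", "L1-S1")])])

print(explicit_links(kitty.senses[0]))                     # ['L1-S1']
print(implicit_links([cat, katze, kitty], cat.senses[0]))  # ['L2-S1']
```

Each Sense is a sub-entity with its own statements, so a statement can point at one specific meaning of "cat" rather than at the whole Lexeme. This is the point at issue in John's concern about linking only at the Lexeme level.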

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.

John Erling Blad

3:46 p.m.

Your description is pretty far from what's in the proposal right now. The proposal is not clear at all, so I would say: update it and resubmit it for a new discussion.


Denny Vrandečić

17 May

12:20 p.m.

Daniel's answer fits exactly with the proposal (which is unsurprising, because he reviewed and certainly influenced it).

To make it clear again: the proposal on https://www.wikidata.org/wiki/Wikidata:Wiktionary/Development/Proposals/2015... is a proposal for the tasks that need to be performed. Your questions are mostly about the data model, which was discussed earlier in the following proposal: https://www.wikidata.org/wiki/Wikidata:Wiktionary/Development/Proposals/2013...

Since I am not sure which questions remain open, I will try to address them here again, at the risk of repeating what has been said before. Unfortunately, you do not seem to use the terminology as defined in the second proposal linked above, which makes the discussion unnecessarily harder than it could be. If you prefer another terminology, I would be happy if you could link to a one-pager describing it, so that we can communicate effectively.

How do we go from a spelled form of a lexeme at Wiktionary and to an identifier on Wikidata?

If by "spelled form of a lexeme at Wiktionary" you mean a Form as per the proposal, then the answer is: Forms have statements, and statements may point to Items, Forms, Senses, Lexemes, etc. The exact properties to be used in these statements are up to the community.

If by "spelled form of a lexeme at Wiktionary" you mean a Lexeme as per the proposal, then the answer is: Lexemes have statements, and statements may point to Items, Forms, Senses, Lexemes, etc. The exact properties to be used in these statements are up to the community.

This is already stated in the second link above.
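Denny's point is that statements are not restricted to Lexemes. Lexemes, Forms, and Senses can all carry statements, and their values can point to any of these entity types or to Items. The sketch below makes that explicit by classifying entity IDs. The ID shapes (`Q<n>`, `L<n>`, `L<n>-F<n>`, `L<n>-S<n>`) and the property names are assumptions for illustration. The proposal itself leaves the choice of properties to the community.

```python
import re
from typing import List, Tuple

# Assumed ID shapes, for illustration only.
ID_PATTERNS = [
    ("item", re.compile(r"Q\d+")),
    ("lexeme", re.compile(r"L\d+")),
    ("form", re.compile(r"L\d+-F\d+")),
    ("sense", re.compile(r"L\d+-S\d+")),
]


def entity_type(entity_id: str) -> str:
    """Classify an ID as item, lexeme, form, or sense."""
    for name, pattern in ID_PATTERNS:
        if pattern.fullmatch(entity_id):
            return name
    raise ValueError(f"unrecognized entity ID: {entity_id!r}")


def describe(subject: str, prop: str, value: str) -> str:
    """Render a statement as 'subject type --property--> value type'."""
    return f"{entity_type(subject)} --{prop}--> {entity_type(value)}"


# Hypothetical (subject, property, value) statements; every ID is a placeholder.
STATEMENTS: List[Tuple[str, str, str]] = [
    ("L1-F2", "used in region", "Q3001"),  # a Form pointing to an Item
    ("L1-S1", "synonym", "L3-S1"),         # a Sense pointing to a Sense
    ("L1", "derived from", "L9"),          # a Lexeme pointing to a Lexeme
]

for statement in STATEMENTS:
    print(describe(*statement))
# form --used in region--> item
# sense --synonym--> sense
# lexeme --derived from--> lexeme
```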


And how do we go from one Sense to another synonym Sense?

A Sense has a set of statements, and statements may point to other Senses. The exact properties used are up to the community. So a statement with the property 'synonym' stated on a Sense could point to another Sense.


Do we use statements?

Yes.

But then only the L-identifiers can be used, so we will link them at the Lexeme level..

No. As the second link above says, Senses and Forms also have Statements. It is not only Lexemes that have Statements.

Wiktionary is organized around homonyms while Wikipedia is organized around synonyms, especially across languages, and I think this difference creates some of the problems.

Yes, that is why Tasks 1, 2, 9 and 10 in the proposal for the task breakdown, the first link above, deal with exactly this question.

Since Gerard stated that his question was subsumed by the above list, I hope that his question is also answered?

I am afraid that I could not write a new proposal that is significantly clearer than the current one, but I can keep answering questions. But all the questions you have asked seem to be explicitly answered in the two links given above. Since I know you are smart, I am wondering what is not working in the communication right now. Did you miss the first link? Because without that it is indeed hard to fully understand the second link (but the first link is already given in the second link).

So, please, keep asking questions. And everyone else too. I would like to continue improving the proposals based on your questions and suggestions.


Denny Vrandečić

12:35 p.m.

John, sorry, I guess I was too slow: as far as I understand, you have now re-read the 13-08 proposal, which makes my last email redundant.

https://www.wikidata.org/w/index.php?title=Wikidata_talk:Wiktionary/Developm...

I hope that the model is clear now. Thanks for your engagement! Denny

On Sun, May 17, 2015 at 12:20 PM Denny Vrandečić vrandecic@gmail.com wrote:

...

Daniel's answer fits exactly with the proposal (which is unsurprising, because he reviewed and certainly influenced it).

To make it clear again: the proposal on

https://www.wikidata.org/wiki/Wikidata:Wiktionary/Development/Proposals/2015... is a proposal for the tasks that need to be performed. Your questions are mostly about the data model, which was discussed earlier in the following proposal:

https://www.wikidata.org/wiki/Wikidata:Wiktionary/Development/Proposals/2013...

Since I am not sure which questions remain open, I will try to address them here again, on the risk of repeating what has been said before. Unfortunately you seem to not use the terminology as defined in the second proposal linked above, which makes the discussion unnecessarily harder than it could be. If you prefer another terminology, I would be happy if you link to a one pager describing it, so that we can effectively communicate.

...
How do we go from a spelled form of a lexeme at Wiktionary and to an identifier

on Wikidata?

If with "spelled form of a lexeme at Wiktionary" you mean a Form as per the proposal, then the answer is: Forms have statements, and statements may point to Items, Forms, Senses, Lexemes, etc.. The exact properties to be used in these statements are up to the community.

If with "spelled form of a lexeme at Wiktionary" you mean Lexeme as per the proposal, than the answer is: Lexems have statements, and statements may point to Items, Forms, Senses, Lexemes, etc. The exact properties to be used in these statements are up to the community.

This is already stated in the second link above.

...
And how do we go from one Sense to another synonym Sense?

A Sense has a set of statements, and statements may point to other Senses. The exact properties used are up to the community. So a statement with the property 'synonym' stated on a Sense could point to another Sense.

...
Do we use statements?

Yes.

...
But then only the L-identifiers can be used, so we will link them at

the Lexeme level..

No. As the second link above says, Senses and Forms also have Statements. It is not only Lexemes that have Statements.

...
Wiktionary is organized around homonyms while Wikipedia is organized around

synonyms, especially across languages, and I think this difference creates some of the problems.

Yes, that is why Tasks 1, 2, 9 and 10 in the proposal for the task breakdown, the first link above, deal with exactly this question.

Since Gerard stated that his question was subsumed by the above list, I hope that his question is also answered?

I am afraid that I could not write a new proposal which is significantly clearer than the current, but I can keep answering questions. But all the questions you have asked seem to be explicitly answered in the two links given above. Since I know you are smart, I am wondering what is not working in the communication right now. Did you miss the first link? Because without that it is indeed hard to fully understand the second link (but the first link is already given in the second link).

So, please, keep asking questions. And everyone else too. I would like to continue improving the proposals based on your questions and suggestions.

On Sat, May 16, 2015 at 3:46 PM John Erling Blad jeblad@gmail.com wrote:

...
Your description is pretty far from what's in the proposal right now. The proposal is not clear at all, so I would say update it and resubmit it for a new discussion.

On Sat, May 16, 2015 at 12:21 PM, Daniel Kinzler daniel.kinzler@wikimedia.de wrote:

...
On 15.05.2015 at 01:11, John Erling Blad wrote:

...
How do we go from a spelled form of a lexeme at Wiktionary and to an identifier on Wikidata?

What do you mean by "go to"? And what do you mean by "identifier on Wikidata": Items, Lexemes, Senses, or Forms?

Generally, Wiktionary currently combines words with the same rendering from different languages on a single page. So a single Wiktionary page would correspond to several Lexeme entries on Wikidata, since Lexemes on Wikidata would be split per language.

I suppose a Lexeme-Entry could be linked back to the corresponding pages on the various Wiktionaries, but I don't really see the value of that, and sitelinks are currently not planned for Lexeme entries. It probably makes more sense for the Wiktionary pages to explicitly reference the Wikidata-Lexeme that corresponds to each language-section on the page.
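A minimal Python sketch of this page-to-Lexeme mapping, assuming one Lexeme per (spelling, language) pair as described above. The page title, its language sections, the helper `split_into_lexemes`, and the generated Lexeme IDs are all hypothetical and for illustration only.

```python
# Hypothetical Wiktionary page: one spelling, several language sections.
page = {"title": "gift", "sections": ["English", "German", "Swedish"]}

def split_into_lexemes(title: str, languages: list[str], start_id: int = 1) -> dict[str, dict]:
    """Return one Lexeme stub per distinct language section, keyed by language."""
    lexemes = {}
    # dict.fromkeys drops duplicate languages while keeping their order.
    for offset, lang in enumerate(dict.fromkeys(languages)):
        lexemes[lang] = {"id": f"L{start_id + offset}", "lemma": title, "language": lang}
    return lexemes

lexemes = split_into_lexemes(page["title"], page["sections"])

# Instead of sitelinks on the Lexeme, each Wiktionary language section
# references the Lexeme that corresponds to it.
section_refs = {lang: lex["id"] for lang, lex in lexemes.items()}
print(section_refs)  # {'English': 'L1', 'German': 'L2', 'Swedish': 'L3'}
```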

...
And how do we go from one Sense to another synonym Sense? Do we use statements? But then only the L-identifiers can be used, so we will link them at the Lexeme level.

Why can only L-Identifiers be used? Senses (and Forms) are entities and have identifiers. They wouldn't have a wiki-page of their own, but that's not a problem. The intention is that it's possible for one Sense to have a statement referring directly to another Sense (of the same or a different Lexeme).
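A minimal Python sketch of how a Sense can be addressed without a page of its own, assuming a compound ID format such as "L5-S2" (Lexeme ID plus a local Sense ID). This format, the data, and the helper `resolve_sense` are assumptions made for the sketch; the message above only says that Senses have identifiers.

```python
import re

# ASSUMED ID format: "<Lexeme ID>-<local Sense ID>", e.g. "L5-S2".
SENSE_ID = re.compile(r"^(L\d+)-(S\d+)$")

# Hypothetical Lexemes, each holding its Senses as sub-entities.
lexemes = {
    "L5": {"lemma": "bank", "senses": {"S1": "financial institution",
                                       "S2": "edge of a river"}},
    "L6": {"lemma": "shore", "senses": {"S1": "land along the edge of water"}},
}

def resolve_sense(sense_id: str) -> tuple[str, str]:
    """Return (lemma, gloss) for a Sense ID; raise KeyError if it is unknown."""
    m = SENSE_ID.match(sense_id)
    if m is None:
        raise KeyError(f"not a Sense ID: {sense_id}")
    lexeme_id, local_id = m.groups()
    lexeme = lexemes[lexeme_id]
    return lexeme["lemma"], lexeme["senses"][local_id]

# A statement on one Sense pointing directly at a Sense of a different Lexeme
# ("synonym" is a placeholder property name).
statement = ("L5-S2", "synonym", "L6-S1")
print(resolve_sense(statement[2]))  # ('shore', 'land along the edge of water')
```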

...
Wiktionary is organized around homonyms while Wikipedia is organized around synonyms, especially across languages, and I think this difference creates some of the problems.

The Lexeme-Part of Wikidata (L-ids) would be separate from the Concept-part of Wikidata (Q-ids). The Lexeme part is organized around homonyms (more precisely, homographs in a single language). Each Lexeme can have several "Senses" modeled as "sub-entities", meaning that each Sense has its own set of Statements. Each Sense can be linked to Senses of other Lexemes (explicit synonyms or translations) and to Q-id concepts (implicit synonyms or translations) using Statements.
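A minimal Python sketch of the two kinds of links, assuming that an explicit translation is a "translation" statement from one Sense to another, and that an implicit one is a Sense in another language pointing at the same concept through an "item for this sense" statement. Both property names, the IDs, and the concept placeholder "Q-BANK" are hypothetical.

```python
# Hypothetical Senses with (property, target) statements.
senses = {
    "L1-S1": {"lang": "en", "stmts": [("item for this sense", "Q-BANK"),
                                      ("translation", "L2-S1")]},
    "L2-S1": {"lang": "de", "stmts": [("item for this sense", "Q-BANK")]},
    "L3-S1": {"lang": "fr", "stmts": [("item for this sense", "Q-BANK")]},
}

def items_of(sense_id: str) -> set[str]:
    """Q-id concepts a Sense points to."""
    return {v for p, v in senses[sense_id]["stmts"] if p == "item for this sense"}

def translations(sense_id: str) -> dict[str, set[str]]:
    """Split candidate translations into explicit (Sense->Sense) and implicit (shared concept)."""
    explicit = {v for p, v in senses[sense_id]["stmts"] if p == "translation"}
    concepts = items_of(sense_id)
    lang = senses[sense_id]["lang"]
    implicit = {
        other
        for other in senses
        if other != sense_id
        and senses[other]["lang"] != lang
        and items_of(other) & concepts  # shares at least one concept
    } - explicit
    return {"explicit": explicit, "implicit": implicit}

print(translations("L1-S1"))  # {'explicit': {'L2-S1'}, 'implicit': {'L3-S1'}}
```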

-- Daniel Kinzler Senior Software Developer

Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.

Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Daniel Kinzler

1:39 p.m.

On 17.05.2015 at 00:46, John Erling Blad wrote:

...

Your description is pretty far from what's in the proposal right now. The proposal is not clear at all, so I would say update it and resubmit it for a new discussion.

Can you explain where you think my description is inconsistent with the current proposal?

I agree the proposal is a bit terse, and it would be nice if it explained in more detail how common use cases, such as translations and synonyms, would be covered by the proposed model. But it clearly states that Lexemes contain Senses and Forms, that Senses and Forms are entities (and thus have IDs and can be referenced individually), and that they have Statements (which can be used to reference other entities, such as Senses or Items).

My explanation reflects the intent behind the proposed model. If it seems far from the proposal to you, it would be good to know why that is, and how that could be fixed.

-- Daniel Kinzler Senior Software Developer Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.

Scott MacLeod

21 May

3:54 p.m.

Hi Denny, Lydia, Daniel and All,

This is great.

Looking further into the CC Wikidata future, in what ways might CC Wiktionary database developments such as those you're outlining here (https://www.wikidata.org/wiki/Wikidata:Wiktionary/Development/Proposals/2015...) inform, or be extensible into, a CC Universal Translator? For example, one that eventually covers all 7,929 languages in Glottolog (http://glottolog.org/glottolog/language), as well as invented and dead languages and even inter-species communication, eventually in voice and video, and that also serves MIT OCW-centric linguistic research?

Thanks and cheers, Scott CC WUaS Universal Translator: http://worlduniversity.wikia.com/wiki/WUaS_Universal_Translator

On Sun, May 17, 2015 at 1:39 PM, Daniel Kinzler <daniel.kinzler@wikimedia.de> wrote:

...

On 17.05.2015 at 00:46, John Erling Blad wrote:

...
Your description is pretty far from what's in the proposal right now. The proposal is not clear at all, so I would say update it and resubmit it for a new discussion.

Can you explain where you think my description is inconsistent with the current proposal?

I agree the proposal is a bit terse, and it would be nice if it explained in more detail how common use cases, such as translations and synonyms, would be covered by the proposed model. But it clearly states that Lexemes contain Senses and Forms, that Senses and Forms are entities (and thus have IDs and can be referenced individually), and that they have Statements (which can be used to reference other entities, such as Senses or Items).

My explanation reflects the intent behind the proposed model. If it seems far from the proposal to you, it would be good to know why that is, and how that could be fixed.

-- Daniel Kinzler Senior Software Developer

Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.


-- - Scott MacLeod - Founder & President - http://worlduniversityandschool.org - 415 480 4577 - PO Box 442, (86 Ridgecrest Road), Canyon, CA 94516 - World University and School - like Wikipedia with best STEM-centric OpenCourseWare - incorporated as a nonprofit university and school in California, and is a U.S. 501 (c) (3) tax-exempt educational organization, both effective April 2010. World University and School is sending you this because of your interest in free, online, higher education. If you don't want to receive these, please reply with 'unsubscribe' in the body of the email, leaving the subject line intact. Thank you.


wikidata@lists.wikimedia.org

70 comments

24 participants


participants (24)

Andy Mabbett
Bene*
Daniel Kinzler
Denny Vrandečić
Denny Vrandečić
Federico Leva (Nemo)
Gerard Meijssen
Jan Dudík
Jo
John Erling Blad
John Mark Vandenberg
Luca Martinelli
Lydia Pintscher
Magnus Manske
Markus Krötzsch
Milos Rancic
Paul Houle
Ricordisamoa
Romaine Wiki
Scott MacLeod
Smolenski Nikola
Stas Malyshev
Thomas Douillard
Yair Rand