Saluton Nicolas,
On 30/11/2017 at 00:23, Nicolas VIGNERON wrote:
Mathieu,
I know you and like you personally, which is why I can say that this mail is clearly not your best argument.
Despite saying multiple times that this is neither a manifesto nor against Wikidata, your mail seems clearly fuelled by biases and misjudgements (in particular, Wikidata can't be « discontinued quietly », not now that it's so widely used in Wikimedia projects; even the wiktionaries are *already* using Wikidata).
It's perfectly plausible that my view is fuelled by biases and misjudgements, and that's why I'm looking for feedback that might help correct them if needed. I prefer to expose my errors openly and seize opportunities to correct them rather than confine myself to my possibly misguided views.
Of course, the statement about Wikidata being « discontinued quietly » is shocking. Surely I'm being a little provocative here. But one has to put that in perspective: my previous attempts to get feedback on this were far less provocative, or at least aimed at being as unprovocative as I could manage. So I recognize you are right to point this out, all the more as I made my previous, more cordial requests in less visible channels.
Dissecting every single sentence point by point is violent, borderline mean and definitely not constructive; cross-posting this mail in multiple places doesn't help either. This is not the right way to debate peacefully.
First, if people felt personally attacked by my message, I apologize. I wasn't aware that treating a topic extensively, point by point, could be perceived as such violent behaviour. I don't want to harass anyone; I want to get constructive feedback on this topic from as many people in our community as I can. If there are better ways to achieve this through a documented peaceful process, I would welcome references to that documentation. And if we don't have that kind of documentation, I think it would be worthwhile to build it.
For better or worse, Wikidata chose CC0 and it will be quite difficult to change the licence now (the example of the licence change on OpenStreetMap illustrates this quite painfully).
Actually, with CC0, provided that all the data contained in Wikidata really can be published under CC0, we could switch the whole database to whatever license we want. This was even explicitly stated at the start of the project:
So do I understand it correctly that during development and testing, we can can go with CC-0, and later relicense to whatever seems suitable, which is possible with CC-0?, Denny Vrandečić, https://lists.wikimedia.org/pipermail/wikidata//2012-April/000185.html
But as far as I'm concerned, I wouldn't suggest such a unilateral move. For me, just allowing the license to be tracked for each item would be enough.
We have to get the approval of the community; there were multiple lengthy and inconclusive discussions, and it's not something that will be done with a ranting mail.
I'm interested in links to these community discussions and to any clear approval by the community.
For me, the situation is quite simple: Wikidata needs lexicographical data and the Wikimedia projects need Wikidata to have these data.
I agree with that, or at least that it would be very positive for our community to have this kind of tool.
Nobody is suggesting in any way to do license laundering or to violate the Wiktionaries' licence,
It's not a suggestion, it's what Wikidata is already doing with Wikipedia, despite the initial statement of the Wikidata team[1] that it wouldn't do that because it's illegal:
/"Alexrk2, it is true that Wikidata under CC0 would not be allowed to import content from a Share-Alike data source. Wikidata does not plan to extract content out of Wikipedia at all. Wikidata will provide data that can be reused in the Wikipedias./" – Denny Vrandečić https://meta.wikimedia.org/wiki/Talk:Wikidata#Is_CC_the_right_license_for_da...
I think that the extent of massive imports made without respecting the license of the source should be investigated properly by the Wikimedia legal team, or by some qualified consultants.
In the meantime, based on its previous practices, it's clear that promises of the Wikidata team regarding respect of licenses cannot be trusted. So even if they suggested that this kind of massive import won't be done, it wouldn't be enough.
in fact we could simply import Public Domain sources (in the same way the wiktionaries did: in frwikt a big chunk of entries comes from the /Littré/ and the /Dictionnaire de l’Académie française/, and there are enough dictionaries waiting in the Wikisources to keep us busy for years), but it would be a shame for Wikidata not to benefit from the wiktionarists' expertise.
I agree with that. Moreover, all this material we imported helped a lot in populating the project, but it often includes heavy biases, outdated definitions which are not marked as such, and outright sexist and racist definitions that we are improving with the goals and values of our movement in mind. So it's not just the expertise of contributors, but also all the work they have already done, that should be mergeable into the Wikidata solution. Allowing only CC0 will make that impossible.
Let's get over the petty and unsolvable issues and work intelligently and pragmatically to improve Wikidata.
You are entitled to disagree with the path that has been chosen and not take part in it (and from your edit count, I see that you don't), but please don't destroy others' efforts, and try to be more aligned with the wiki spirit.
I'm not trying to destroy the work of any part of our community; on the contrary, I'm aiming to protect its sustainability. If my concerns are mere delusions, that's great. But if they are not, I would feel ashamed in the future that I suspected a possible, avoidable bad scenario and did nothing about it.
Moreover, Wikidata aims at being ubiquitous across all Wikimedia projects, even if some integrations are moderated through community consensus. So there is no way I could avoid it completely while continuing to contribute to Wikimedia projects. Actually, I have recently learned that some data are already inserted automatically into Wikidata when publishing contributions on other MediaWiki projects, but so far I'm not aware of what exactly is covered. Furthermore, I am in fact very much in favour of a more ubiquitous integration of Wikidata in our ecosystem. But not under the current license conditions.
I hope my answer wasn't too point by point, so that it won't fall into the problems you mentioned.
Amike, mathieu
A galon, ~nicolas
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
2017-11-30 11:46 GMT+02:00 mathieu stumpf guntz < psychoslave@culture-libre.org>:
Nobody is suggesting in any way to do license laundering or to violate the Wiktionaries' licence,
It's not a suggestion, it's what Wikidata is already doing with Wikipedia, despite the initial statement of the Wikidata team[1] that it wouldn't do that because it's illegal:
/"Alexrk2, it is true that Wikidata under CC0 would not be allowed to import content from a Share-Alike data source. Wikidata does not plan to extract content out of Wikipedia at all. Wikidata will provide data that can be reused in the Wikipedias./" – Denny Vrandečić
https://meta.wikimedia.org/wiki/Talk:Wikidata#Is_CC_the_right_license_for_data.3F
I think that the extent of massive imports made without respecting the license of the source should be investigated properly by the Wikimedia legal team, or by some qualified consultants.
In the meantime, based on its previous practices, it's clear that promises of the Wikidata team regarding respect of licenses cannot be trusted. So even if they suggested that this kind of massive import won't be done, it wouldn't be enough.
This is another personal attack, and it's unnecessary and incorrect.
The imports from Wikipedia were done by the Wikidata community, not by Wikidata team.
It's too easy to speak in retrospect, but there were these plausible scenarios:
1. Editors who strongly care about reliable sourcing, in the style of English Wikipedia verifiability policies, are strongly opposed to importing data from Wikipedia, because by itself it's a self-reference and not a reliable source. If it would succeed, data would not be imported from Wikipedia, not because of licensing, but because of content quality. I remember attempts to do this, but evidently this is not what happened.
2. Editors who strongly care about the prevention of license whitewashing object to importing data from Wikipedia and prevent it. This also could happen, but it didn't.
3. Editors who are good at writing bots or making a lot of manual edits and love seeing Wikidata getting filled with data, import a lot of data. Like it or not, this happened.
Could anybody know in 2012 what would actually happen? I don't know. If you would have asked me then, I'd possibly guess that scenarios 1 and 2 are likelier, but now we know that that would be very naïve.
Judging by what happened in the past, I suspect that data from Wiktionary will be imported anyway. Public domain or not, the bot people will find a way around licenses. It's a certain eventuality. The bigger questions are under what license it will eventually be stored, under what licenses it will be reused, and whether this will contribute to the growth of Free Knowledge. My intuition tells me that using more CC-BY-SA and less CC-0 will contribute more to Free Knowledge, but what do I know.
Maybe, instead of thinking about CC0 vs CC-BY-SA, we should try to think about the goal: how can we, as a movement, "fight" the exploitation of community-generated content by over-the-top players?
Of course, the license is the primary tool every one of us thinks about. But (and please correct me if I'm wrong) I don't think that things have changed much since the time when Wikidata was not here and Google just scraped/crawled Wikipedia for their own knowledge base. Players like Google have the resources and skill to basically do what they want, and if I recall correctly CC-BY-SA content didn't really stop them. So the license is not an obstacle for them.
As much as I personally don't like this, my question is: is this a real problem? I don't like the idea of Wikimedia communities giving content for free to players so big that they can actually profit hugely from it (huge profits always translate to huge power), but I really don't know what we could do about this.
Aubrey
On 30/11/2017 at 12:14, Andrea Zanni wrote:
Maybe, instead of thinking about CC0 vs CC-BY-SA, we should try to think about the goal: how can we, as a movement, "fight" the exploitation of community-generated content by over-the-top players?
Thank you for highlighting this surely far better way to investigate the topic. Standing back is often very helpful, but also often difficult when you have your nose in the data.
Of course, the license is the primary tool every one of us thinks about. But (and please correct me if I'm wrong) I don't think that things have changed much since the time when Wikidata was not here and Google just scraped/crawled Wikipedia for their own knowledge base. Players like Google have the resources and skill to basically do what they want, and if I recall correctly CC-BY-SA content didn't really stop them. So the license is not an obstacle for them. As much as I personally don't like this, my question is: is this a real problem?
I lack clear data on that, but I came across some documents drawing a parallel between a shrinking Wikipedia audience and the arrival of the Google Knowledge Graph. The basic argument was: less traffic means fewer people know our movement, fewer potential contributors and fewer donors. But I haven't dug into this topic yet. Any reference which confirms or refutes this link, or simply discusses it, is welcome.
I don't like the idea of Wikimedia communities giving content for free to players so big that they can actually profit hugely from it (huge profits always translate to huge power), but I really don't know what we could do about this.
Well, I'm far less concerned about other actors making small, medium or huge profits by using the work of our community. Per se, I don't see it as a threat to our community, and these actors might even give back in some way if they wish. And in fact, some do. Google does provide our community with some useful resources: not only money, but they also organize events like Summer of Code, which benefit our community.
What raises my concern is that these actors can have a negative effect on the liveliness of our community, even if that's not their goal at all and they are fine with the idea of helping us where it doesn't directly conflict with their business model.
I say Google, but any other prominent actor that calls the shots when it comes to web audience could equally be substituted in the previous sentences.
So from my point of view, despite all the controversy it raised, the "knowledge engine" as a general open search engine would be an interesting idea to explore. It could keep us from being left without visibility because the main actors of the field move to a new paradigm in which our community is no longer useful to them, or is even in direct competition with what they are targeting, but under a walled-garden paradigm.
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l New messages to: Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe
On 30/11/2017 at 11:04, Amir E. Aharoni wrote:
2017-11-30 11:46 GMT+02:00 mathieu stumpf guntz <psychoslave@culture-libre.org>:
promises of Wikidata team regarding respect of licenses can not be trusted. So even if they suggested that that kind of massive import won't be done, it wouldn't be enough.
This is another personal attack, and it's unnecessary and incorrect.
Well, I don't know about this one; I'm talking about the "Wikidata team". Maybe the statement can be considered incorrect, but is it a personal attack to mention who said what with a precise source? Or was the way I formulated it needlessly aggressive? What would be a more proper way to state that there was a promise which wasn't kept?
Once again, my goal is not to offend anyone, be it an individual or a group of people. On the other hand, when decisions are taken by an entity which clearly identifies itself as responsible for them, isn't it fair to consider this entity also responsible for the consequences of those decisions? At least to some extent, excluding reasonably unpredictable consequences.
The imports from Wikipedia were done by the Wikidata community, not by Wikidata team.
Sure, and I think that here the responsibility is shared between the Wikidata community and the team which promised it would not happen. Hopefully, the Wikisource community would not allow anyone to publish a work like Harry Potter in its repository. Nor, to take a less legally problematic case, works available under a CC-by-sa-nc license or some equivalent. And were the Wikisource community lenient enough to allow that, I would expect the foundation to remove these works, especially if their authors complained about this license laundering.
Also, I wonder why the Wikipedia community didn't react to this massive extraction, if indeed it didn't; maybe some convincing arguments that I'm not aware of were presented to it. Once again, references are welcome.
So maybe I'll be proven completely wrong here too, by some point I'm not aware of, which would be fine. Otherwise, the Wikidata team did indeed let the community drift into overly lenient behaviour.
By the way, the arguments put forward here will be used in the further development of the research project on this topic. Plus, it's on Wikiversity, so if you speak French, your contributions are welcome.
It's too easy to speak in retrospect, but there were these plausible scenarios:
Well, the easiest way to go is to blindly follow anywhere the majority goes. Anything else is more difficult. Building scenarios is good, and trying to falsify them with available data is even better.
- Editors who strongly care about reliable sourcing, in the style of English Wikipedia verifiability policies, are strongly opposed to importing data from Wikipedia, because by itself it's a self-reference and not a reliable source. If it would succeed, data would not be imported from Wikipedia, not because of licensing, but because of content quality. I remember attempts to do this, but evidently this is not what happened.
Yes, I came across some documents on that matter, which fed my thoughts on traceability. Actually, from the documents I went through, it's probably the most recurring concern that I found expressed by the community. And the most usual answer is (in spirit) that "it will improve in the future, this is a useful transition state, later more external sources will supersede Wikipedia for the same statements". Apart from the question of usefulness from a Wikipedia perspective, these arguments all sound rather consistent to my mind.
I'm not sure of the current state of use of Wikidata within the various Wikipedia projects, or what community discussions occurred in each. References are welcome here too.
- Editors who strongly care about the prevention of license whitewashing object to importing data from Wikipedia and prevent it. This also could happen, but it didn't.
- Editors who are good at writing bots or making a lot of manual edits and love seeing Wikidata getting filled with data, import a lot of data. Like it or not, this happened.
Could anybody know in 2012 what would actually happen? I don't know. If you would have asked me then, I'd possibly guess that scenarios 1 and 2 are likelier, but now we know that that would be very naïve.
The problem is not so much prediction, which is always difficult, especially about the future.
The problem is the willingness of the Wikidata team to intervene when the community crosses a line that they themselves previously identified as legally non-negotiable.
Judging by what happened in the past, I suspect that data from Wiktionary will be imported anyway. Public domain or not, the bot people will find a way around licenses. It's a certain eventuality. The bigger questions are under what license it will eventually be stored, under what licenses it will be reused, and whether this will contribute to the growth of Free Knowledge. My intuition tells me that using more CC-BY-SA and less CC-0 will contribute more to Free Knowledge, but what do I know.
Actually, I dug into all this research because I'm very interested in everything Wikibase could bring to lexicological work in Wikimedia, but I wasn't very enthusiastic about CC0 and I wanted to see whether I could change my mind by studying why it was chosen for Wikidata in the first place.