Hello everyone,
I am forwarding here the message I initially posted on the Meta Tremendous Wiktionary User Group talk page https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#An_answer_to_Lydia_general_thinking_about_Wikidata_and_CC-0, because I would like wider feedback from the community on this point. Whether you think that my view is completely misguided or that I might have a few relevant points, I am extremely interested to know, so please be bold.
Before you dig further into this, keep in mind that I remain convinced that Wikidata is a wonderful project, and I wish it a bright future full of even more amazing things than it has already brought so far. My sole concern is really a license issue.
Below is a copy/paste of the above linked message:
Thank you Lydia Pintscher https://meta.wikimedia.org/wiki/User:Lydia_Pintscher_%28WMDE%29 for taking the time to answer. Unfortunately this answer https://www.wikidata.org/wiki/User:Lydia_Pintscher_%28WMDE%29/CC-0 misses too many important points to resolve all the concerns that have been raised.
Notably, it still gives no hint of where the decision to use CC0 exclusively for Wikidata came from. But as this inquiry on the topic https://en.wikiversity.org/wiki/fr:Recherche:La_licence_CC-0_de_Wikidata,_origine_du_choix,_enjeux,_et_prospections_sur_les_aspects_de_gouvernance_communautaire_et_d%E2%80%99%C3%A9quit%C3%A9_contributive advances, an answer is emerging. It seems that Wikidata's choice of CC0 was heavily influenced by Denny Vrandečić, who, to make it short, is now working in the Google Knowledge Graph team. It is also worth noting that Google funded a quarter of the initial development work. Another quarter came from the Gordon and Betty Moore Foundation, established by Intel's co-founder. And half the money came from Microsoft co-founder Paul Allen's Institute for Artificial Intelligence (AI2)[1] https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-1. To state it briefly in a conspiratorial fashion, Wikidata is the puppet Trojan horse of big tech hegemonic companies into the realm of Wikimedia. For a less dramatic, better argued version, please see the research project (work in progress; only chapter 1 is in good enough shape, and it is only available in French so far). Proof that this claim is completely wrong is welcome, as it would be great if the community was in fact the driving force behind this single-license choice, and if it is the best choice for the community's future, not the future of giant tech companies. Shedding such a happy light on this subject would be a great contribution, so that we can all leave this issue alone and go back to contributing on more interesting topics.
Now let's examine the thoughts proposed by Lydia.
"Wikidata is here to give more people more access to more knowledge."
So far, this matches the Wikimedia movement's stated goal.
"This means we want our data to be used as widely as possible."
Sure, as long as it rhymes with equity, as in /Our strategic direction: Service and *Equity*/ https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction/Endorsement#Our_strategic_direction:_Service_and_Equity. Just like we want freedom for everybody as widely as possible. That is, starting where it confirms each other's freedom. Because below this level, the freedom of one is the murder and slavery of others.
"CC-0 is one step towards that."
That is a thesis; you can offer to defend it, but no one has to agree without some convincing proof.
"Data is different from many other things we produce in Wikimedia in that it is aggregated, combined, mashed-up, filtered, and so on much more extensively."
No, it's not. From a data processing point of view, everything is data. Whether it is stored in wikisyntax, in a relational database or engraved in stone is only a matter of convenience. Whether it is a random stream of bits generated by a dumb chipset or some encoded prose of Shakespeare makes no difference. So from this point of view, no, what Wikidata stores is not different from what is produced anywhere else in Wikimedia projects. Sure, the way it is structured makes many things much easier. But this is not because it is data while elsewhere there would be no data. It is because it forces data to be stored in a way that eases aggregation, combination, mashing-up, filtering and so on.
"Our data lives from being able to write queries over millions of statements, putting it into a mobile app, visualizing parts of it on a map and much more."
Sure. It also lives from being curated by millions[2] https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-2 of volunteer contributors, or it would be just a useless pile of random bytes.
"This means, if we require attribution, in a huge number of cases attribution would need to go back to potentially millions of editors and sources (even if that data is not visible in the end result but only helped to get the result)."
No, it doesn't mean that. First let's recall a few basics, as the whole answer seems to confuse attribution with distribution of contributions under the same license as the original. Attribution is crucial for traceability, and therefore for the reliable and trusted knowledge that we are aiming for within the Wikimedia movement. The "same license" clause is the sole legal guarantee of equity that contributors have. That's it: trusted knowledge and equity are requirements for the Wikimedia movement's goals. That means withdrawing these requirements is withdrawing these goals.
Now, what would be the additional cost of storing sources in Wikidata? Well, zero. Actually, it's already there, as the "reference" attribute is part of the Wikibase item structure. So attribution is not a problem: you don't have to put it in front of your derived work. Just look at a Wikipedia article: until you go to the history, you have zero attribution visible, and it's OK. It also probably has zero or negligible computing cost, as it doesn't have to be included in all computations; it just needs to be retrievable on demand. What would be the additional cost of storing licenses for each item based on its source? Well, adding a license attribute might help, but actually if your reference is a work item, I guess it might come with a "license" statement, so zero additional cost.
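To make the "retrievable on demand" point a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the class names, the fields and the sample data are hypothetical and do not describe how Wikibase actually stores references. The idea is simply that a computation works on values alone, and the sources and licenses behind a result are collected afterwards, only when someone asks for them.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Reference:
    """Hypothetical reference: where a statement comes from, and under which license."""
    source: str
    license: str


@dataclass
class Statement:
    """Hypothetical statement carrying its own references."""
    subject: str
    prop: str
    value: str
    references: List[Reference] = field(default_factory=list)


def attribution_for(statements: Iterable[Statement]) -> List[Tuple[str, str]]:
    """Return the distinct (source, license) pairs behind the given statements."""
    seen = {}  # a dict keeps first-seen order and removes duplicates
    for statement in statements:
        for ref in statement.references:
            seen.setdefault((ref.source, ref.license), None)
    return list(seen)


# Illustrative data only.
statements = [
    Statement("Q1", "population", "1000", [Reference("Census 2011", "CC0")]),
    Statement("Q1", "area", "12", [Reference("Some atlas", "CC BY-SA 4.0")]),
    Statement("Q2", "population", "500", [Reference("Census 2011", "CC0")]),
]

# The computation itself never touches the references.
used = [s for s in statements if s.prop == "population"]
total = sum(int(s.value) for s in used)

# Attribution is gathered afterwards, on demand, for the statements actually used.
credits = attribution_for(used)
```

In this sketch the cost of attribution is paid once per result rather than once per computation step, which is the point argued above.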
Now, letting users specify under which free licenses they publish their work would just require an additional attribute, a ridiculously small weight when balanced against the equity concerns it resolves. Could that prevent some uses by some actors? Yes, that's actually the point: preventing abuse by those who don't want to act equitably. For all other actors, "distribute under the same conditions" is fine.
"This is potentially computationally hard to do and and depending on where the data is used very inconvenient (think of a map with hundreds of data points in a mobile app)."
OpenStreetMap, which uses ODbL, a copyleft license requiring attribution, does exactly that too, doesn't it? By the way, allowing a license per item would make it possible to include OpenStreetMap data in Wikidata, which is currently impossible due to the project's CC0-only policy. Too bad, it could be so useful to have this data accessible to Wikimedia projects, but who cares?
"This is a burden on our re-users that I do not want to impose on them."
Wait, which re-users? Surely one might expect Wikidata to care first about re-users who are in line with the Wikimedia goal, so the needs of the Wikimedia community in particular and of Free/Libre Culture in general should be considered. Would these re-users be penalized by a copyleft license? Surely not, or they wouldn't use such licenses as extensively as they do. So who are these re-users who, it is thought, without consulting the community, should not be bothered with questions of equity and traceability?
"It would make it significantly harder to re-use our data and be in direct conflict with our goal of spreading knowledge."
No, technically it would be just as easy as pressing one button on a computer rather than another. What is in direct conflict with our clearly stated goals emerging from the 2017 community consultation is going against equity and traceability.
You propose to discard both to satisfy exogenous demands which should have next to no weight in decisions that so deeply affect the future of our community.
"Whether data can be protected in this way at all or not depends on the jurisdiction we are talking about. See this Wikilegal on on database rights https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights for more details."
It basically says that protection applies in the United States and Europe, on different legal bases and to different extents. As for the rest of the world, it doesn't say that nothing can apply; it states nothing.
"So even if we would have decided to require attribution it would only be enforceable in some jurisdictions."
What kind of logic is that? It might not be applicable in some countries, so let's give up the few rights we have.
"Ambiguity, when it comes to legal matters, also unfortunately often means that people refrain from what they want to to for fear of legal repercussions. This is directly in conflict with our goal of spreading knowledge."
Economic inequality, social inequity and legal imbalance might also keep people from doing what they want, as they fear practical repercussions. CC0 strengthens these discrimination factors by forcing people to give up the few rights they have to counterbalance the growing asymmetry that social structures are concurrently building. So CC0 as the only license choice is in direct conflict with our goal of *equitably* spreading knowledge. This statement also seems to suggest that releasing our contributions only under CC0 is the sole way to reduce legal doubts. Actually any well-written license would do an equal job on this point, including many copyleft licenses out there. So while associating a clear license with each data item might indeed reduce legal uncertainty, it is not an argument at all for enforcing CC0 as the sole license available to contributors.
Moreover, just putting a license side by side with a work does not ensure that the person who made the association was legally allowed to do so. To have better confidence in the legitimacy of a statement that a work is covered by a certain license, there is once again a traceability requirement. For example, Wikidata currently includes many items which were imported from various Wikipedia versions, and claims that the derived work obtained, a set of items and statements, is under CC0. That is a hugely doubtful statement and it looks alarmingly like license laundering https://en.wikipedia.org/wiki/license_laundering. This is true for Wikipedia, but it is also true for any source on which large-scale extraction and import are performed, whether through bots or crowdsourcing. So the Wikidata project is currently in a very poor position to give lessons on legal ambiguity, as it plays heavily on legal blur and on the hope that its shady practices won't come under too much scrutiny.
"Licenses that require attribution are often used as a way to try to make it harder for big companies to profit from openly available resources."
No, they are not. They are used as /a way to try to make it harder for big companies to profit from openly available resources/ *in inequitable manners*. That's completely different. Copyleft licenses give the same rights to big companies and individuals, in a manner that reduces the socio-economic inequalities which disproportionately advantage the former.
"The thing is there seems to be no indication of this working."
Because they are not trying to enforce what you claim, so of course it's not working for that goal. But for the goal that copyleft licenses do aim at, there is clear evidence that yes, it works.
"Big companies have the legal and engineering resources to handle both the legal minefield and the technical hurdles easily."
There is no pitfall in copyleft licenses. Using a weapons-of-war analogy is disrespectful.
It's true that copyleft licenses might come with some constraints that non-copyleft free licenses don't have, but that is the price of fostering equity. And it's a low price that even individuals can manage: it might require a little extra time on legal considerations, but on the other hand using the free work is an immensely vast gain that is worth it. In Why you shouldn't use the Lesser GPL for your next library https://www.gnu.org/licenses/why-not-lgpl.html it is stated: /proprietary software developers have the advantage of money; free software developers need to make advantages for each other/. This might be generalised as /big companies have the advantage of money; free/libre culture contributors need to make advantages for each other/. So, contrary to what these fallacious claims against copyleft licenses pretend, they are not a "minefield and the technical hurdles" that only big companies can handle. All the more, let's recall who financed the initial development of Wikidata: only actors related to big companies.
"Who it is really hurting is the smaller start-up, institution or hacker who can not deal with it."
If this statement is about copyleft licenses, then it is just plainly false. Smaller actors have more to gain from preserving the mutual benefit of the common ecosystem that a copyleft license fosters.
"With Wikidata we are making structured data about the world available for everyone."
And that's great. But that doesn't require CC0 as sole license to be achieved.
"We are leveling the playing field to give those who currently don’t have access to the knowledge graphs of the big companies a chance to build something amazing."
And that's great. But that doesn't require CC0 as sole license. Actually CC0 makes it a less sustainable project on this point, as it allows unfair actors to take it all, add some interesting added value that our community cannot afford, and reach or reinforce a hegemonic position in the ecosystem with their own closed solution.
And, ta-da, Wikidata can be discontinued quietly, just like Google did with the defunct Freebase, which was CC-BY-SA before they bought the company that was running it, and which they afterwards imported under CC0 into Wikidata as a new attempt to gather a larger community of free curators. And once it has performed license laundering of all the works of Wikimedia projects through shady mass extraction and import, Wikimedia can disappear as well. Of course big companies benefit more from these possibilities than actors with smaller financial support and no hegemonic position.
"Thereby we are helping more people get access to knowledge from more places than just the few big ones."
No, with CC0 you are certainly helping big companies to reinforce their position, in which they can distribute information manipulated as they wish, with no regard for traceability and equity. Allowing contributors to also use copyleft licenses would be far more effective to /collect and use different forms of free, trusted knowledge/ that /focus efforts on the knowledge and communities that have been left out by structures of power and privilege/, as stated in /Our strategic direction: Service and Equity/.
"CC-0 is becoming more and more common."
Just like economic inequality https://en.wikipedia.org/wiki/economic_inequality. But that is not what we are aiming to foster in the Wikimedia movement.
"Many organisations are releasing their data under CC-0 and are happy with the experience. Among them are the European Union, Europeana, the National Library of Sweden and the Metropolitan Museum of Modern Arts."
Good for them. But they are not the Wikimedia community; they have their own goals and their own plans to be sustainable, which our community cannot necessarily follow. Different contexts require different means. States and their institutions can count on tax revenue, and if taxpayers' money ends up in public domain works, that's great and seems fair. States are rarely threatened by companies; they have legal levers to pressure that kind of entity, although conflicts of interest and lobbying can of course temper this statement. Importing that kind of data with proper attribution and license is fine, be it CC0 or any other free license. But that is not an argument in favour of imposing on volunteers a systematic waiver of all their rights as the only option to contribute.
"All this being said we do encourage all re-users of our data to give attribution to Wikidata because we believe it is in the interest of all parties involved."
That's it: zero legal hope of equity.
"And our experience shows that many of our re-users do give credit to Wikidata even if they are not forced to."
Experience also shows that some prominent actors like Google no longer credit the Wikimedia community when directly generating answers based on, inter alia, information coming from Wikidata, which is itself performing license laundering of Wikipedia data.
"Are there no downsides to this? No, of course not. Some people chose not to participate, some data can't be imported and some re-users do not attribute us. But the benefits I have seen over the years for Wikidata and the larger open knowledge ecosystem far outweigh them."
This should at least be backed with some solid statistics showing that it had a positive impact in terms of audience and contribution in Wikimedia projects as a whole. Maybe the introduction of Wikidata did have a positive effect on the total number of contributors, or maybe so far it has had no significant correlated effect, or maybe it correlates with a decrease in the total number of active contributors. Some plots would be interesting here. Mere personal feelings of benefits and hindrances mean nothing here, mine included of course. Plus, there has not been even the beginning of an attempt to A/B test with a second Wikibase instance that allows users to select which licenses their contributions are released under, so there is no possible way to state anything backed by a relevant comparison. The fact that some people are satisfied with the current state of things doesn't mean they would not be even more satisfied with a more equitable solution that allows contributors to choose a set of free licenses for their publications. All the more, this is all about the sustainability and fostering of our community and reaching its goals, not the immediate satisfaction of some people.
[1] Wikipedia Signpost, 2 December 2015, https://en.wikipedia.org/wiki/en:Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed
[2] According to Lydia's next statement.
Once again, let me recall that this is not a manifesto against Wikidata. The motivation behind this message is the hope that one day one might participate in Wikidata with the same respect for equity and traceability that is granted in other Wikimedia projects.
With lots of wiki-love, mathieu
Mathieu,
I know you and like you personally, which is why I can say that this mail is clearly not your best argument.
Despite saying multiple times that this is neither a manifesto nor against Wikidata, your mail seems clearly fuelled by biases and misjudgements (in particular, Wikidata can't be « discontinued quietly », not now that it is so widely used in Wikimedia projects; even the Wiktionaries are *already* using Wikidata). Dissecting every single sentence point by point is violent, borderline mean and definitely not constructive; cross-posting this mail in multiple places doesn't help either. This is not a good way to debate peacefully. Some of your arguments are good but most are quite poor and really miss the big picture.
For better or worse, Wikidata chose CC0 and it will be quite difficult to change the licence now (the example of the licence change on OpenStreetMap illustrates this quite painfully). We have to get the approval of the community; there have been multiple lengthy and inconclusive discussions, and it is not something that will be done with a ranting mail.
For me, the situation is quite simple: Wikidata needs lexicographical data and the Wikimedia projects need Wikidata to have these data. Nobody is suggesting in any way to do license laundering or to violate the Wiktionaries' licence; in fact we could simply import Public Domain sources (in the same way the Wiktionaries did: in frwikt a big chunk of entries come from the *Littré* and the *Dictionnaire de l’Académie française*, and there are enough dictionaries waiting in the Wikisources to keep us busy for years), but it would be a shame for Wikidata not to profit from wiktionarists' expertise. Let's get over the petty and unsolvable issues and work intelligently and pragmatically to improve Wikidata.
You are entitled to disagree with the way that has been chosen and not take part in it (and from your edit count, I see that you don't), but please don't destroy others' efforts, and try to be more aligned with the wiki spirit.
Cordially, ~nicolas
Hello Nicolas,
On 30/11/2017 at 00:23, Nicolas VIGNERON wrote:
Mathieu,
I know you and like you personally, which is why I can say that this mail is clearly not your best argument.
Despite saying multiple times that this is neither a manifesto nor against Wikidata, your mail seems clearly fuelled by biases and misjudgements (in particular, Wikidata can't be « discontinued quietly », not now that it is so widely used in Wikimedia projects; even the Wiktionaries are *already* using Wikidata).
It is perfectly plausible that my view is fuelled by biases and misjudgements, and that is why I am looking for feedback that might help correct them if needed. I prefer to expose my errors openly and seize opportunities to correct them rather than confine myself to my possibly misguided views.
Of course, the statement that Wikidata can be « discontinued quietly » is shocking. Surely I'm being a little provocative here. But one has to put that in perspective with the fact that my previous attempts to get feedback on this were far less provocative, or at least aimed at being as unprovocative as I could manage. So I recognize you are right to point this out, all the more as I made my previous, more cordial requests in less visible channels.
Dissecting every single sentence point by point is violent, borderline mean and definitely not constructive; cross-posting this mail in multiple places doesn't help either. This is not a good way to debate peacefully.
First, if people felt personally attacked by my message, I apologize. I wasn't aware that treating a topic extensively, point by point, could be perceived as such violent behaviour. I don't want to harass anyone; I want to get constructive feedback on this topic from as many people in our community as I can. If there are better ways to achieve this through a documented peaceful process, I would welcome references to that kind of documentation. And if we don't have that kind of documentation, I think it would be interesting to build it.
For better or worse, Wikidata chose CC0 and it will be quite difficult to change the licence now (the example of the licence change on OpenStreetMap illustrates this quite painfully).
Actually, with CC0, if it turned out that all the data contained in Wikidata really can be published under CC0, we could switch the whole database to whatever license we want. This was even explicitly stated at the start of the project:
So do I understand it correctly that during development and testing, we can can go with CC-0, and later relicense to whatever seems suitable, which is possible with CC-0?, Denny Vrandečić, https://lists.wikimedia.org/pipermail/wikidata//2012-April/000185.html
But as far as I'm concerned, I wouldn't suggest such a unilateral move. For me, just allowing the license of each item to be tracked would be enough.
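As a rough idea of what tracking a license per item could look like, here is a small Python sketch. Everything in it is an assumption made for illustration: the `Item` class, its `license` field, the identifiers and the sample licenses are hypothetical, not an existing Wikibase feature, and the code makes no claim about which licenses are legally compatible with each other. It only shows that a re-user who wants CC0-only data could still select it.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Item:
    """Hypothetical item carrying the license it was contributed under."""
    item_id: str
    label: str
    license: str


def by_license(items: Iterable[Item]) -> Dict[str, List[str]]:
    """Group item identifiers by license, keeping the input order."""
    groups = defaultdict(list)
    for item in items:
        groups[item.license].append(item.item_id)
    return dict(groups)


def usable_under(items: Iterable[Item], accepted: Set[str]) -> List[Item]:
    """Keep only the items whose license the re-user has said they accept."""
    return [item for item in items if item.license in accepted]


# Illustrative data only.
items = [
    Item("A1", "entry imported from a public-domain dictionary", "CC0"),
    Item("A2", "entry written by a Wiktionary contributor", "CC BY-SA 3.0"),
    Item("A3", "geodata taken from an ODbL database", "ODbL"),
]

groups = by_license(items)
cc0_only = usable_under(items, {"CC0"})
```

With such a field, the choice of what to reuse moves to the re-user's side as a simple filter, instead of being settled once and for all by a single project-wide license.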
We have to get the approval of the community; there have been multiple lengthy and inconclusive discussions, and it is not something that will be done with a ranting mail.
I'm interested in links to these community discussions and to the community's clear approval.
For me, the situation is quite simple: Wikidata needs lexicographical data and the Wikimedia projects need Wikidata to have these data.
I agree with that, or at least that it would be very positive for our community to have this kind of tool.
Nobody is suggesting in any way to do license laundering or to violate the Wiktionaries' licence,
It's not a suggestion, it's what Wikidata is already doing with Wikipedia, despite the initial statement of the Wikidata team that it wouldn't do that because it's illegal:
/"Alexrk2, it is true that Wikidata under CC0 would not be allowed to import content from a Share-Alike data source. Wikidata does not plan to extract content out of Wikipedia at all. Wikidata will provide data that can be reused in the Wikipedias./" – Denny Vrandečić https://meta.wikimedia.org/wiki/Talk:Wikidata#Is_CC_the_right_license_for_da...
I think that the extent of massive imports made without respecting the license of the source should be investigated properly by the Wikimedia legal team, or by some qualified consultants.
In the meantime, based on its previous practices, it is clear that the Wikidata team's promises regarding respect of licenses cannot be trusted. So even if they suggested that this kind of massive import won't be done, it wouldn't be enough.
in fact we could simply import Public Domain sources (in the same way the Wiktionaries did: in frwikt a big chunk of entries come from the /Littré/ and the /Dictionnaire de l’Académie française/, and there are enough dictionaries waiting in the Wikisources to keep us busy for years), but it would be a shame for Wikidata not to profit from wiktionarists' expertise.
I agree with that. Moreover, all this imported material helped a lot in populating the project, but it often includes heavy biases, outdated definitions which are not marked as such, and completely sexist and racist definitions that we are improving with the goals and values of our movement in mind. So it's not just the expertise of contributors, but also all the work they have already achieved that should be mergeable into the Wikidata solution. Only allowing CC0 will make that impossible.
Let's get over the petty and unsolvable issues and work intelligently and pragmatically to improve Wikidata.
You are entitled to disagree with the way that has been chosen and not take part in it (and from your edit count, I see that you don't), but please don't destroy others' efforts, and try to be more aligned with the wiki spirit.
I'm not trying to destroy the work of any part of our community; on the contrary, I'm aiming to protect its sustainability. If my concerns are mere delusions, that's great. But if they are not, I would feel ashamed in the future that I suspected a possible, avoidable bad scenario and did nothing about it.
Moreover, Wikidata aims at being ubiquitous under all Wikimedia projects, even if some integrations are moderated through community consensus. So there is no way I could avoid it completely while continuing to contribute to Wikimedia projects. Actually, I recently learned that some data are already inserted automatically into Wikidata when publishing contributions on other MediaWiki projects, but so far I don't know exactly what is covered. Also, I am in fact very much in favour of a more ubiquitous integration of Wikidata in our ecosystem. But not with the current license conditions.
I hope my answer wasn't too point-by-point, so that it doesn't fall into the problems you mentioned.
In friendship, mathieu
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Dear Lydia, Mathieu, Nicolas and All,
I'm seeking a clarification here regarding "An answer to Lydia Pintscher regarding its considerations on Wikidata and CC-0", about the implications of CC-0 licensing for Wikidata, say in comparison with CC-4 licensing.
If CC-0 licensing allows for commercial use - "Once the creator or a subsequent owner of a work applies CC0 to a work, the work is no longer his or hers in any meaningful sense under copyright law. Anyone can then use the work in any way and for any purpose, including commercial purposes, subject to other laws and the rights others may have in the work or how the work is used. Think of CC0 as the "no rights reserved" option " (https://wiki.creativecommons.org/wiki/CC0_FAQ ) ...
... and, by contrast, CC-4 licensing does not (as used, for example, by MIT OpenCourseWare in its 7 languages, where it allows for "sharing" and "adapting" but only "non-commercially"), what would CC-0 licensed Wikidata databases allow for commercially? Since Wikidata, or Wikisource or Project Wikicite in particular, for example, are licensed under the CC-0 option, could (CC) bookstores, for example, use this CC-0 licensing, in all 295 of Wikipedia's languages, for the books in their (online) bookstores? (Also, are there any data, or sister projects, affiliated with Wikidata that are not CC-0, re https://www.wikidata.org/wiki/User:Lydia_Pintscher_%28WMDE%29/CC-0 ?)
Thanks, Scott
On Wed, Nov 29, 2017 at 1:45 PM, Mathieu Stumpf Guntz < psychoslave@culture-libre.org> wrote:
Saluton ĉiuj,
I forward here the message I initially posted on the Meta Tremendous Wiktionary User Group talk page https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#An_answer_to_Lydia_general_thinking_about_Wikidata_and_CC-0, because I'm interested to have a wider feedback of the community on this point. Whether you think that my view is completely misguided or that I might have a few relevant points, I'm extremely interested to know it, so please be bold.
Before you consider digging further in this reading, keep in mind that I stay convinced that Wikidata is a wonderful project and I wish it a bright future full of even more amazing things than what it already brung so far. My sole concern is really a license issue.
Bellow is a copy/paste of the above linked message:
Thank you Lydia Pintscher https://meta.wikimedia.org/wiki/User:Lydia_Pintscher_%28WMDE%29 for taking the time to answer. Unfortunately this answer https://www.wikidata.org/wiki/User:Lydia_Pintscher_%28WMDE%29/CC-0 miss too many important points to solve all concerns which have been raised.
Notably, there is still no beginning of hint in it about where the decision of using CC0 exclusively for Wikidata came from. But as this inquiry on the topic https://en.wikiversity.org/wiki/fr:Recherche:La_licence_CC-0_de_Wikidata,_origine_du_choix,_enjeux,_et_prospections_sur_les_aspects_de_gouvernance_communautaire_et_d%E2%80%99%C3%A9quit%C3%A9_contributive advance, an answer is emerging from it. It seems that Wikidata choice toward CC0 was heavily influenced by Denny Vrandečić, who – to make it short – is now working in the Google Knowledge Graph team. Also it worth noting that Google funded a quarter of the initial development work. Another quarter came from the Gordon and Betty Moore Foundation, established by Intel co-founder. And half the money came from Microsoft co-founder Paul Allen's Institute for Artificial Intelligence (AI2)[1] https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-1. To state it shortly in a conspirational fashion, Wikidata is the puppet trojan horse of big tech hegemonic companies into the realm of Wikimedia. For a less tragic, more argumentative version, please see the research project (work in progress, only chapter 1 is in good enough shape, and it's only available in French so far). Some proofs that this claim is completely wrong are welcome, as it would be great that in fact that was the community that was the driving force behind this single license choice and that it is the best choice for its future, not the future of giant tech companies. This would be a great contribution to bring such a happy light on this subject, so we can all let this issue alone and go back contributing in more interesting topics.
Now let's examine the thoughts proposed by Lydia.
"Wikidata is here to give more people more access to more knowledge."
So far, this matches the Wikimedia movement's stated goal.
"This means we want our data to be used as widely as possible."
Sure, as long as it rhymes with equity, as in *Our strategic direction: Service and **Equity*** https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction/Endorsement#Our_strategic_direction:_Service_and_Equity. Just like we want freedom for everybody as widely as possible. That is, starting from the point where it confirms each other's freedom. Below that level, the freedom of one means the murder and slavery of others.
"CC-0 is one step towards that."
That's a thesis; you can propose to defend it, but no one has to agree without some convincing proof.
"Data is different from many other things we produce in Wikimedia in that it is aggregated, combined, mashed-up, filtered, and so on much more extensively."
No, it's not. From a data processing point of view, everything is data. Whether it is stored in wikisyntax, in a relational database or engraved in stone is only a matter of convenience. Whether it is a random stream of bits generated by a dumb chipset or the encoded prose of Shakespeare makes no difference. So from this point of view, no, what Wikidata stores is not different from what is produced anywhere else in Wikimedia projects. Sure, the way it is structured makes many things much easier. But that is not because it is data while elsewhere there would be no data. It is because it enforces storing data in a way that eases aggregation, combination, mashing-up, filtering and so on.
"Our data lives from being able to write queries over millions of statements, putting it into a mobile app, visualizing parts of it on a map and much more."
Sure.
It also lives from being curated by millions[2] https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-2 of volunteer contributors; otherwise it would be just a useless pile of random bytes.
"This means, if we require attribution, in a huge number of cases attribution would need to go back to potentially millions of editors and sources (even if that data is not visible in the end result but only helped to get the result)."
No, it doesn't mean that. First, let's recall a few basics, as the whole answer seems to confuse attribution with distribution of contributions under the same license as the original. Attribution is crucial for traceability, and so for the reliable and trusted knowledge that we are aiming for within the Wikimedia movement. The "same license" clause is the sole legal guarantee of equity that contributors have. In short, trusted knowledge and equity are requirements for the Wikimedia movement's goals, so withdrawing these requirements means withdrawing these goals.
Now, what would be the additional cost of storing sources in Wikidata? Well, zero. Actually, it is already there, as the "reference" attribute is part of the Wikibase item structure. So attribution is not a problem: you don't have to put it at the front of your derived work. Just look at a Wikipedia article: until you go to the history, no attribution is visible, and that's OK. It probably also has zero or negligible computing cost, as it doesn't have to be included in all computations; it just needs to be retrievable on demand. What would be the additional cost of storing a license for each item based on its source? Well, adding a license attribute might help, but if your reference is a work item, I guess it might come with a "license" statement, so zero additional cost.
As for letting users specify under which free licenses they publish their work, that would just require an additional attribute, a negligible weight when balanced against the equity concerns it resolves. Could that prevent some uses by some actors? Yes, that's actually the point: preventing abuse by those who don't want to act equitably. For all other actors, "distribute under the same conditions" is fine.
"This is potentially computationally hard to do and depending on where the data is used very inconvenient (think of a map with hundreds of data points in a mobile app)."
OpenStreetMap, which uses ODbL, a copyleft license requiring attribution, does exactly that too, doesn't it? By the way, allowing a license per item would make it possible to include OpenStreetMap data in Wikidata, which is currently impossible due to the project's CC0 single-license policy. Too bad, it could be so useful to have this data accessible to Wikimedia projects, but who cares?
"This is a burden on our re-users that I do not want to impose on them."
Wait, which re-users? Surely one might expect Wikidata to care first about re-users who are in line with the Wikimedia goal, so the needs of the Wikimedia community in particular and of Free/Libre Culture in general should be considered. Would these re-users be penalized by a copyleft license? Surely not, or they wouldn't use such licenses as extensively as they do. So who are these re-users whom it is thought preferable, without consulting the community, not to bother with questions of equity and traceability?
"It would make it significantly harder to re-use our data and be in direct conflict with our goal of spreading knowledge."
No, technically it would be as easy as pressing one button on a computer rather than another. What is in direct conflict with our clearly stated goals, as they emerged from the 2017 community consultation, is going against equity and traceability.
You propose to discard both to satisfy exogenous demands which should have next to no weight in decisions that so deeply affect the future of our community.
"Whether data can be protected in this way at all or not depends on the jurisdiction we are talking about. See this Wikilegal on database rights https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights for more details."
It basically says that it is applicable in the United States and Europe, on different legal bases and to different extents. As for the rest of the world, it doesn't say that nothing can apply; it states nothing.
"So even if we would have decided to require attribution it would only be enforceable in some jurisdictions."
What kind of logic is that? It might not be applicable in some countries, so let's give up the few rights we have?
"Ambiguity, when it comes to legal matters, also unfortunately often means that people refrain from what they want to do for fear of legal repercussions. This is directly in conflict with our goal of spreading knowledge."
Economic inequality, social inequity and legal imbalance might also keep people from doing what they want, as they fear practical repercussions. CC0 strengthens these discrimination factors by forcing people to give up the few rights they have to weigh against the growing asymmetry that social structures are concomitantly building. So CC0 as the only license choice is in direct conflict with our goal of *equitably* spreading knowledge. Also, this statement seems to suggest that releasing our contributions only under CC0 is the sole way to reduce legal doubt. Actually, any well-written license would do an equally good job on this point, including many copyleft licenses. So while associating a clear license with each data item might indeed reduce legal uncertainty, it is not an argument at all for enforcing CC0 as the sole license available to contributors.
Moreover, just putting a license side by side with a work does not ensure that the person who made the association was legally allowed to do so. To have better confidence in the legitimacy of a statement that a work is covered by a certain license, there is once again a traceability requirement. For example, Wikidata currently includes many items which were imported from various Wikipedia versions, and claims that the derived work obtained, a set of items and statements, is under CC0. That is a hugely doubtful claim, and it looks alarmingly like license laundering https://en.wikipedia.org/wiki/license_laundering. This is true for Wikipedia, but it is also true for any source on which large-scale extraction and import are performed, whether through bots or crowdsourcing. So the Wikidata project is currently in a very poor position to give lessons on legal ambiguity, as it relies heavily on legal blur and the hope that its shady practices won't come under too much scrutiny.
"Licenses that require attribution are often used as a way to try to make it harder for big companies to profit from openly available resources."
No, they are not. They are used as *a way to try to make it harder for big companies to profit from openly available resources* *in inequitable manners*. That's completely different. Copyleft licenses give the same rights to big companies and individuals, in a manner that reduces the socio-economic inequalities which disproportionately advantage the former.
"The thing is there seems to be no indication of this working."
Because they are not trying to enforce what you claim, so of course they are not working towards that goal. But for the goal that copyleft licenses do aim at, there is clear evidence that yes, it works.
"Big companies have the legal and engineering resources to handle both the legal minefield and the technical hurdles easily."
There is no pitfall in copyleft licenses. Using a weapons-of-war analogy is disrespectful.
It's true that copyleft licenses might come with some constraints that non-copyleft free licenses don't have, but that is the price of fostering equity. And it is a low price that even individuals can manage: it might require a little extra time on legal considerations, but on the other hand being able to use the free work is an immense gain that is worth it. In "Why you shouldn't use the Lesser GPL for your next library" https://www.gnu.org/licenses/why-not-lgpl.html it is stated that *proprietary software developers have the advantage of money; free software developers need to make advantages for each other*. This might be generalised as *big companies have the advantage of money; free/libre culture contributors need to make advantages for each other*. So, contrary to what these fallacious claims against copyleft licenses assert, copyleft licenses are not a "legal minefield" with "technical hurdles" that only big companies can handle. Moreover, let's recall who financed the initial development of Wikidata: only actors related to big companies.
"Who it is really hurting is the smaller start-up, institution or hacker who can not deal with it."
If this statement is about copyleft licenses, then it is just plainly false. Smaller actors have more to gain from preserving the mutual benefit of the common ecosystem that a copyleft license fosters.
"With Wikidata we are making structured data about the world available for everyone."
And that's great. But that doesn't require CC0 as the sole license.
"We are leveling the playing field to give those who currently don’t have access to the knowledge graphs of the big companies a chance to build something amazing."
And that's great. But that doesn't require CC0 as the sole license either. Actually, CC0 makes the project less sustainable on this point, as it allows unfair actors to take it all, add some interesting value that our community cannot afford to provide, and reach or reinforce a hegemonic position in the ecosystem with their own closed solution.
And, ta-da, Wikidata can be discontinued quietly, just like Google did with the defunct Freebase, which was CC-BY-SA before they bought the company that was running it, and which they afterwards imported under CC0 into Wikidata as a new attempt to gather a larger community of free curators. And once it has laundered the licenses of all Wikimedia projects' works through shady mass extraction and import, Wikimedia can disappear as well. Of course, big companies benefit more from these possibilities than actors with less financial support and no hegemonic position.
"Thereby we are helping more people get access to knowledge from more places than just the few big ones."
No, with CC0 you are certainly helping big companies reinforce a position from which they can distribute information manipulated as they wish, without regard for traceability and equity. Allowing contributors to also use copyleft licenses would be a far more effective way to *collect and use different forms of free, trusted knowledge* and to *focus efforts on the knowledge and communities that have been left out by structures of power and privilege*, as stated in *Our strategic direction: Service and Equity*.
"CC-0 is becoming more and more common."
Just like economic inequality https://en.wikipedia.org/wiki/economic_inequality. But that is not what we aim to foster in the Wikimedia movement.
"Many organisations are releasing their data under CC-0 and are happy with the experience. Among them are the European Union, Europeana, the National Library of Sweden and the Metropolitan Museum of Modern Arts."
Good for them. But they are not the Wikimedia community; they have their own goals and their own plans for sustainability, which do not necessarily match what our community can follow. Different contexts require different means. States and their institutions can count on tax revenue, and if taxpayers' money ends up in public domain works, that's great and seems fair.
States are rarely threatened by companies; they have legal levers to put pressure on that kind of entity, although conflicts of interest and lobbying can of course temper this statement. Importing that kind of data with proper attribution and license is fine, be it CC0 or any other free license. But that's not an argument in favour of imposing on volunteers a systematic waiver of all their rights as the only way to contribute.
"All this being said we do encourage all re-users of our data to give attribution to Wikidata because we believe it is in the interest of all parties involved."
So that's it: zero legal hope of equity.
"And our experience shows that many of our re-users do give credit to Wikidata even if they are not forced to."
Experience also shows that some prominent actors like Google no longer credit the Wikimedia community when directly generating answers based on, inter alia, information coming from Wikidata, which is itself laundering the license of Wikipedia data.
"Are there no downsides to this? No, of course not. Some people chose not to participate, some data can't be imported and some re-users do not attribute us. But the benefits I have seen over the years for Wikidata and the larger open knowledge ecosystem far outweigh them."
This should at least be backed by solid statistics showing that it had a positive impact in terms of audience and contribution across Wikimedia projects as a whole. Maybe the introduction of Wikidata did have a positive effect on the total number of contributors, or maybe so far it has had no significant correlation with it, or maybe it correlates with a decrease in the total number of active contributors. Some plots would be interesting here. Mere personal feelings about benefits and hindrances mean nothing here, mine included of course.
Plus, there has not been even the beginning of an attempt at an A/B test with a second Wikibase instance that lets users select which licenses their contributions are released under, so there is no way to state anything backed by a relevant comparison. The fact that some people are satisfied with the current state of things doesn't mean they would not be even more satisfied with a more equitable solution that lets contributors choose a set of free licenses for their publications. Above all, this is about the sustainability and fostering of our community and reaching its goals, not about the immediate satisfaction of some people.
[1] Wikipedia Signpost, 2 December 2015 https://en.wikipedia.org/wiki/en:Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed
[2] According to Lydia's next statement
Once again, let me stress that this is not a manifesto against Wikidata. The motivation behind this message is the hope that one day one might participate in Wikidata with the same respect for equity and traceability that is granted in other Wikimedia projects.
With much wiki-love, mathieu
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
On November 29, 2017 at 3:33:47 PM, Scott MacLeod ( worlduniversityandschool@gmail.com) wrote:
Dear Lydia, Mathieu, Nicolas and All,
I'm seeking clarification here on "An answer to Lydia Pintscher regarding its considerations on Wikidata and CC-0", regarding the implications of CC-0 licensing for Wikidata, say in comparison with CC-4 licensing.
If CC-0 licensing allows for commercial use - "Once the creator or a subsequent owner of a work applies CC0 to a work, the work is no longer his or hers in any meaningful sense under copyright law. Anyone can then use the work in any way and for any purpose, including commercial purposes, subject to other laws and the rights others may have in the work or how the work is used. Think of CC0 as the "no rights reserved" option" (https://wiki.creativecommons.org/wiki/CC0_FAQ) ...
... and, by contrast, CC-4 licensing (say by MIT OpenCourseWare in its 7 languages, for example, where its CC-4 licensing allows for "sharing" and "adapting" but only "non-commercially"), what would CC-0-licensed Wikidata databases allow for commercially? Since Wikidata, or Wikisource or Project Wikicite in particular, for example, are licensed under the CC-0 option, could (CC) bookstores, for example, use this CC-0 licensing, in all 295 of Wikipedia's languages, for the books in their (online) bookstores? (Also, are there any data, or sister projects, affiliated with Wikidata that are not CC-0, re https://www.wikidata.org/wiki/User:Lydia_Pintscher_%28WMDE%29/CC-0?)
Thanks, Scott
CC-0 is functionally equivalent to the public domain. Anything released under CC-0 can be used by anyone for any reason with no conditions whatsoever. For more information see <https://creativecommons.org/share-your-work/public-domain/cc0/>. Since Wikidata’s data is released under CC-0, it can be used by anyone for any reason with no conditions.
Cheers, James Hare
Here are some reasons for other resources to switch to CC0: https://www.wikipathways.org/index.php/WikiPathways:CC0_Announcement
On Wed, Nov 29, 2017 at 10:45 PM, Mathieu Stumpf Guntz < psychoslave@culture-libre.org> wrote:
Saluton ĉiuj,
I forward here the message I initially posted on the Meta Tremendous Wiktionary User Group talk page https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#An_answer_to_Lydia_general_thinking_about_Wikidata_and_CC-0, because I'm interested to have a wider feedback of the community on this point. Whether you think that my view is completely misguided or that I might have a few relevant points, I'm extremely interested to know it, so please be bold.
Before you consider digging further in this reading, keep in mind that I stay convinced that Wikidata is a wonderful project and I wish it a bright future full of even more amazing things than what it already brung so far. My sole concern is really a license issue.
Bellow is a copy/paste of the above linked message:
Thank you Lydia Pintscher https://meta.wikimedia.org/wiki/User:Lydia_Pintscher_%28WMDE%29 for taking the time to answer. Unfortunately this answer https://www.wikidata.org/wiki/User:Lydia_Pintscher_%28WMDE%29/CC-0 miss too many important points to solve all concerns which have been raised.
Notably, there is still no beginning of hint in it about where the decision of using CC0 exclusively for Wikidata came from. But as this inquiry on the topic https://en.wikiversity.org/wiki/fr:Recherche:La_licence_CC-0_de_Wikidata,_origine_du_choix,_enjeux,_et_prospections_sur_les_aspects_de_gouvernance_communautaire_et_d%E2%80%99%C3%A9quit%C3%A9_contributive advance, an answer is emerging from it. It seems that Wikidata choice toward CC0 was heavily influenced by Denny Vrandečić, who – to make it short – is now working in the Google Knowledge Graph team. Also it worth noting that Google funded a quarter of the initial development work. Another quarter came from the Gordon and Betty Moore Foundation, established by Intel co-founder. And half the money came from Microsoft co-founder Paul Allen's Institute for Artificial Intelligence (AI2)[1] https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-1. To state it shortly in a conspirational fashion, Wikidata is the puppet trojan horse of big tech hegemonic companies into the realm of Wikimedia. For a less tragic, more argumentative version, please see the research project (work in progress, only chapter 1 is in good enough shape, and it's only available in French so far). Some proofs that this claim is completely wrong are welcome, as it would be great that in fact that was the community that was the driving force behind this single license choice and that it is the best choice for its future, not the future of giant tech companies. This would be a great contribution to bring such a happy light on this subject, so we can all let this issue alone and go back contributing in more interesting topics.
Now let's examine the thoughts proposed by Lydia. Wikidata is here to give more people more access to more knowledge. So far, it makes it matches Wikimedia movement stated goal. This means we want our data to be used as widely as possible. Sure, as long as it rhymes with equity. As in *Our strategic direction: Service and **Equity* https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction/Endorsement#Our_strategic_direction:_Service_and_Equity. Just like we want freedom for everybody as widely as possible. That is, starting where it confirms each others freedom. Because under this level, freedom of one is murder and slavery of others. CC-0 is one step towards that. That's a thesis, you can propose to defend it but no one have to agree without some convincing proof. Data is different from many other things we produce in Wikimedia in that it is aggregated, combined, mashed-up, filtered, and so on much more extensively. No it's not. From a data processing point of view, everything is data. Whether it's stored in a wikisyntax, in a relational database or engraved in stone only have a commodity side effect. Whether it's a random stream of bit generated by a dumb chipset or some encoded prose of Shakespeare make no difference. So from this point of view, no, what Wikidata store is not different from what is produced anywhere else in Wikimedia projects. Sure, the way it's structured does extremely ease many things. But this is not because it's data, when elsewhere there would be no data. It's because it enforce data to be stored in a way that ease aggregation, combination, mashing-up, filtering and so on. Our data lives from being able to write queries over millions of statements, putting it into a mobile app, visualizing parts of it on a map and much more. Sure. 
It also lives from being curated from millions[2] https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-2 of benevolent contributors, or it would be just a useless pile of random bytes. This means, if we require attribution, in a huge number of cases attribution would need to go back to potentially millions of editors and sources (even if that data is not visible in the end result but only helped to get the result). No, it doesn't mean that. First let's recall a few basics as it seems the whole answer makes confusion between attribution and distribution of contributions under the same license as the original. Attribution is crucial for traceability and so for reliable and trusted knowledge that we are targeting within the Wikimedia movement. The "same license" is the sole legal guaranty of equity contributors have. That's it, trusted knowledge and equity are requirements for the Wikimedia movement goals. That means withdrawing this requirements is withdrawing this goals. Now, what would be the additional cost of storing sources in Wikidata? Well, zero cost. Actually, it's already here as the "reference" attribute is part of the Wikibase item structure. So attribution is not a problem, you don't have to put it in front of your derived work, just look at a Wikipedia article: until you go to history, you have zero attribution visible, and it's ok. It's also have probably zero or negligible computing cost, as it doesn't have to be included in all computations, it just need to be retrievable on demand. What would be the additional cost of storing licenses for each item based on its source? Well, adding a license attribute might help, but actually if your reference is a work item, I guess it might comes with a "license" statement, so zero additional cost. 
Now for letting user specify under which free licenses they publish their work, that would just require an additional attribute, a ridiculous weight when balanced with equity concerns it resolves. Could that prevent some uses for some actors? Yes, that's actually the point, preventing abuse of those who doesn't want to act equitably. For all other actors a "distribute under same condition" is fine. This is potentially computationally hard to do and and depending on where the data is used very inconvenient (think of a map with hundreds of data points in a mobile app). OpenStreetMap which use ODbL, a copyleft attributive license, do exactly that too, doesn't it? By the way, allowing a license by item would enable to include OpenStreetMap data in WikiData, which is currently impossible due to the CC0 single license policy of the project. Too bad, it could be so useful to have this data accessible for Wikimedia projects, but who cares? This is a burden on our re-users that I do not want to impose on them. Wait, which re-users? Surely one might expect that Wikidata would care first of re-users which are in the phase with Wikimedia goal, so surely needs of Wikimedia community in particular and Free/Libre Culture in general should be considered. Do this re-users would be penalized by a copyleft license? Surely no, or they wouldn't use it extensively as they do. So who are this re-users for who it's thought preferable, without consulting the community, to not annoy with questions of equity and traceability? It would make it significantly harder to re-use our data and be in direct conflict with our goal of spreading knowledge. No, technically it would be just as easy as punching a button on a computer to do that rather than this. What is in direct conflict with our clearly stated goals emerging from the 2017 community consultation is going against equity and traceability. 
You propose to discard both to satisfy exogenous demands which should have next to no weight in decision impacting so deeply the future of our community. Whether data can be protected in this way at all or not depends on the jurisdiction we are talking about. See this Wikilegal on on database rights https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights for more details. It says basically that it's applicable in United States and Europe on different legal bases and extents. And for the rest of the world, it doesn't say it doesn't say nothing can apply, it states nothing. So even if we would have decided to require attribution it would only be enforceable in some jurisdictions. What kind of logic is that? Maybe it might not be applicable in some country, so let's withdraw the few rights we have. Ambiguity, when it comes to legal matters, also unfortunately often means that people refrain from what they want to to for fear of legal repercussions. This is directly in conflict with our goal of spreading knowledge. Economic inequality, social inequity and legal imbalance might also refrain people from doing what they want, as they fear practical repercussions. CC0 strengthen this discrimination factors by enforcing people to withdraw the few rights they have to weight against the growing asymmetry that social structures are concomitantly building. So CC0 as unique license choice is in direct conflict with our goal of *equitably* spreading knowledge. Also it seems like this statement suggest that releasing our contributions only under CC0 is the sole solution to diminish legal doubts. Actually any well written license would do an equal job regarding this point, including many copyleft licenses out there. So while associate a clear license to each data item might indeed diminish legal uncertainty, it's not an argument at all for enforcing CC0 as sole license available to contributors. 
Moreover, just putting a license side by side with a work does not ensure that the person who made the association was legally allowed to do so. To have a better confidence in the legitimacy of a statement that a work is covered by a certain license, there is once again a traceability requirement. For example, Wikidata currently include many items which were imported from misc. Wikipedia versions, and claim that the derived work obtained – a set of items and statements – is under CC0. That is a hugely doubtful statement and it alarmingly looks like license laundering https://en.wikipedia.org/wiki/license_laundering. This is true for Wikipedia, but it's also true for any source on which a large scale extraction and import are operated, whether through bots or crowd sourcing. So the Wikidata project is currently extremely misplaced to give lessons on legal ambiguity, as it heavily plays with legal blur and the hope that its shady practises won't fall under too much scrutiny. Licenses that require attribution are often used as a way to try to make it harder for big companies to profit from openly available resources. No there are not. They are used as *a way to try to make it harder for big companies to profit from openly available resources* *in inequitable manners*. That's completely different. Copyleft licenses give the same rights to big companies and individuals in a manner that lower socio-economic inequalities which disproportionally advantage the former. The thing is there seems to be no indication of this working. Because it's not trying to enforce what you pretend, so of course it's not working for this goal. But for the goal that copyleft licenses aims at, there are clear evidences that yes it works. Big companies have the legal and engineering resources to handle both the legal minefield and the technical hurdles easily. There is no pitfall in copyleft licenses. Using war material analogy is disrespectful. 
That's true that copyleft licenses might come with some constraints that non-copyleft free licenses don't have, but that the price for fostering equity. And it's a low price, that even individuals can manage, it might require a very little extra time on legal considerations, but on the other hand using the free work is an immensely vast gain that worth it. In Why you shouldn't use the Lesser GPL for your next library https://www.gnu.org/licenses/why-not-lgpl.html is stated *proprietary software developers have the advantage of money; free software developers need to make advantages for each other*. This might be generalised as *big companies have the advantage of money; free/libre culture contributors need to make advantages for each other*. So at odd with what pretend this fallacious claims against copyleft licenses, they are not a "minefield and the technical hurdles" that only big companies can handle. All the more, let's recall who financed the initial development of Wikidata: only actors which are related to big companies. Who it is really hurting is the smaller start-up, institution or hacker who can not deal with it. If this statement is about copyleft licenses, then this is just plainly false. Smaller actors have more to gain in preserving mutual benefit of the common ecosystem that a copyleft license fosters. With Wikidata we are making structured data about the world available for everyone. And that's great. But that doesn't require CC0 as sole license to be achieved. We are leveling the playing field to give those who currently don’t have access to the knowledge graphs of the big companies a chance to build something amazing. And that's great. But that doesn't require CC0 as sole license. Actually CC0 makes it a less sustainable project on this point, as it allows unfair actors to take it all, add some interesting added value that our community can not afford, reach/reinforce an hegemonic position in the ecosystem with their own closed solution. 
And, ta-da, Wikidata can then be discontinued quietly, just as Google did with the now-defunct Freebase, which was CC-BY-SA before they bought the company that was running it, and which they afterwards imported into Wikidata under CC0 in a new attempt to gather a larger community of free curators. And once it has performed license laundering of all the works of the Wikimedia projects through shady mass extraction and import, Wikimedia can disappear as well. Of course big companies benefit more from these possibilities than actors with smaller financial support and no hegemonic position.
> Thereby we are helping more people get access to knowledge from more places than just the few big ones.
No, with CC0 you are certainly helping big companies to reinforce their position, in which they can distribute information manipulated as they wish, without regard for traceability and equity. Allowing contributors to also use copyleft licenses would be far more effective to *collect and use different forms of free, trusted knowledge* and to *focus efforts on the knowledge and communities that have been left out by structures of power and privilege*, as stated in *Our strategic direction: Service and Equity*.
> CC-0 is becoming more and more common.
Just like economic inequality https://en.wikipedia.org/wiki/economic_inequality. But that is not what we are aiming to foster in the Wikimedia movement.
> Many organisations are releasing their data under CC-0 and are happy with the experience. Among them are the European Union, Europeana, the National Library of Sweden and the Metropolitan Museum of Modern Arts.
Good for them. But they are not the Wikimedia community; they have their own goals and their own plans for sustainability, which do not necessarily match what our community can follow. Different contexts require different means. States and their institutions can count on tax revenue, and if taxpayers' money ends up in public domain works, that's great and seems fair.
States are rarely threatened by companies; they have legal levers to pressure that kind of entity, although conflicts of interest and lobbying can of course temper this statement. Importing that kind of data with proper attribution and license is fine, be it CC0 or any other free license. But that's not an argument in favour of imposing on volunteers a systematic waiver of all their rights as the only way to contribute.
> All this being said we do encourage all re-users of our data to give attribution to Wikidata because we believe it is in the interest of all parties involved.
That's it, zero legal hope of equity.
> And our experience shows that many of our re-users do give credit to Wikidata even if they are not forced to.
Experience also shows that some prominent actors like Google no longer credit the Wikimedia community when directly generating answers based on, inter alia, information coming from Wikidata, which is itself performing license laundering of Wikipedia data.
> Are there no downsides to this? No, of course not. Some people chose not to participate, some data can't be imported and some re-users do not attribute us. But the benefits I have seen over the years for Wikidata and the larger open knowledge ecosystem far outweigh them.
This should at least be backed with some solid statistics showing that it had a positive impact in terms of audience and contribution in the Wikimedia projects as a whole. Maybe the introduction of Wikidata did have a positive effect on the evolution of the total number of contributors, or maybe so far it has had no significant correlated effect, or maybe it correlates with a decrease in the total number of active contributors. Some plots would be interesting here. Mere personal feelings about benefits and hindrances mean nothing here, mine included of course.
Plus, there is not even the beginning of an attempt to A/B test with a second Wikibase instance that allows users to select which licenses their contributions are released under, so there is no possible way to state anything backed by a relevant comparison. The fact that there are some people satisfied with the current state of things doesn't mean they would not be even more satisfied with a more equitable solution that allows contributors to choose a set of free licenses for their publications. All the more, this is all about the sustainability and fostering of our community and reaching its goals, not the immediate feeling of satisfaction of some people.
[1] Wikipedia Signpost, 2 December 2015 https://en.wikipedia.org/wiki/en:Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed
[2] according to the next statement of Lydia
Once again, let me stress that this is not a manifesto against Wikidata. The motivation behind this message is the hope that one day one might participate in Wikidata with the same respect for equity and traceability that is granted in other Wikimedia projects.
With much wiki-love, mathieu
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Hoi, With all due respect, the way you treat Denny Vrandečić is a personal attack. You dismiss his work and opinion by stating that he now works for Google, implying that this must be because of his influence on the decision to use CC-0. This happened several years after the decision on CC-0 and, in my opinion, the fact that we were willing to collaborate, on Freebase for instance, is what probably served us and Denny more in this. Also, this decision comes from a longer history; for instance OmegaWiki has both CC-0 and CC-by-sa as licenses because of the lust for endless talk on what license is "best". I do know about a conversation in Rome where this was discussed at length over a pizza.
You ask for a "proof" that shows the use of CC-0 is best. The best proof that you are going to get is the success of Wikidata in reaching out widely and the huge amount of data that comes its way. What more proof do you want?
Your claim of the moral high ground fails to impress given the facts, and there is no proof that I see that substantiates your attack on Denny, the implicit dismissal of his arguments, or any reason why Wikipedifying Wikidata will serve us better.
PS You can make my contributions CC-by-sa without my consent. As it is, anyone can use the data of Wikidata and that is why I contribute. Thanks, GerardM
On 29 November 2017 at 22:45, Mathieu Stumpf Guntz < psychoslave@culture-libre.org> wrote:
Hoi Gerard,
On 30/11/2017 at 08:46, Gerard Meijssen wrote:
> Hoi, With all due respect, the way you treat Denny Vrandečić is a personal attack.
If he feels so, then I apologize to him. I'm not on a quest against anyone, and I don't make the problem a question of persons. Also, I initially put far less emphasis on who was saying what in the research project I pointed to. But following feedback from Nicolas Vigneron, I started to give more emphasis to who was saying what. That might not have been what he was expecting from his suggestion, however, so if there is a problem with the way I presented the topic, it is of course entirely my responsibility; I give this explanation only to be transparent about how I arrived at this, not to try to transfer the responsibility onto anyone else.
So, my point is not about a particular person, but about the role they occupied in the decision to use CC0 exclusively as the Wikidata license. So I'm perfectly OK with replacing "Denny Vrandečić" with "Wikidata leader" or whatever role title we might agree on. All the more, I'm not sure of his current role in the project, and a timeline of the successive roles he occupied regarding OmegaWiki (maybe Semantic Wikipedia too?), Wikidata, and Google would be interesting to better grasp this topic.
Because I feel like there have been rather obvious signs of a possible conflict of interest. Now, these might be false positives, but in that case I think it would be worth completely removing any doubt about it. I must make clear here that, as far as I'm concerned, pointing out a possible conflict of interest is not a personal attack.
Despite my attempts to emphasize that I'm not trying to harass anyone or call for destroying Wikidata, it seems that I have failed to avoid these interpretations. I'm sorry, as these are sincerely not things I'm attempting to do. I am also human, I make errors, I listen to people when they point out behavior of mine that they find problematic, and I do my best to improve on this point.
Maybe to counterbalance my statements regarding Denny Vrandečić: I feel deeply grateful for his work as a whole, which without doubt brought extremely vast and valuable contributions to our community. While documenting myself on the current topic I learned a bit about him, I find his path interesting, and I thought here and there that it would be interesting to have some friendly discussion with him. I also have in mind to do an interview with him that I would hope to publish on Wikinews, as he suggested at some point in his correspondence that he would welcome it. But since I wouldn't like such an interview to focus too much on the current topic of licensing, with which my mind is currently too occupied, I prefer to defer that until later.
> You dismiss his work and opinion by stating that he now works for Google, implying that this must be because of his influence on the decision to use CC-0. This happened several years after the decision on CC-0 and, in my opinion, the fact that we were willing to collaborate, on Freebase for instance, is what probably served us and Denny more in this. Also, this decision comes from a longer history; for instance OmegaWiki has both CC-0 and CC-by-sa as licenses because of the lust for endless talk on what license is "best". I do know about a conversation in Rome where this was discussed at length over a pizza.
Well, I'm all for documenting how this decision happened, so if you have suggestions of documents I should read on the topic, these references are welcome. I'm entirely open to changing my mind based on clear references. Unfortunately, a discussion over a pizza won't be accessible to me until a breakthrough happens in what we can reliably access of reality. So far I have gone through all the threads on wikidata-l since its launch and the IRC logs of 2014/2015 to find information on this topic, plus many other sources documented in the research project.
> You ask for a "proof" that shows the use of CC-0 is best. The best proof that you are going to get is the success of Wikidata in reaching out widely and the huge amount of data that comes its way. What more proof do you want?
I think that any convincing hint able to relegate the concerns I exposed to mere delusions would be fine.
> Your claim of the moral high ground fails to impress given the facts, and there is no proof that I see that substantiates your attack on Denny, the implicit dismissal of his arguments, or any reason why Wikipedifying Wikidata will serve us better.
There is no claim of moral high ground, no attempt to impress anyone, nor any will to personally attack anyone. The very explicit concerns about a possible conflict of interest can and should be proven wrong if they indeed are.
> PS You can make my contributions CC-by-sa without my consent.
No, I can't do that legally. But what you explicitly published under a free license, that I can use, to the extent that you explicitly consented to grant to everybody.
> As it is, anyone can use the data of Wikidata and that is why I contribute. Thanks, GerardM
Cheers
I basically stopped reading this email after the first attack on Denny.
I was there from the beginning, and I do recall the *extensive* discussion about which license to use. CC0 was chosen, among other things, because of the moronic EU rule about database rights, which the CC 3.0 licenses didn't allow us to counter. Please remember that the 4.0 licenses were still under discussion, and we couldn't afford the luxury of waiting for them to come out before publishing Wikidata.
And possibly next time provide a TL;DR version of your email at the top.
Cheers,
L.
Il 29 nov 2017 22:46, "Mathieu Stumpf Guntz" psychoslave@culture-libre.org ha scritto:
Saluton ĉiuj,
I forward here the message I initially posted on the Meta Tremendous Wiktionary User Group talk page https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#An_answer_to_Lydia_general_thinking_about_Wikidata_and_CC-0, because I'm interested to have a wider feedback of the community on this point. Whether you think that my view is completely misguided or that I might have a few relevant points, I'm extremely interested to know it, so please be bold.
Before you consider digging further in this reading, keep in mind that I stay convinced that Wikidata is a wonderful project and I wish it a bright future full of even more amazing things than what it already brung so far. My sole concern is really a license issue.
Bellow is a copy/paste of the above linked message:
Thank you Lydia Pintscher https://meta.wikimedia.org/wiki/User:Lydia_Pintscher_%28WMDE%29 for taking the time to answer. Unfortunately this answer https://www.wikidata.org/wiki/User:Lydia_Pintscher_%28WMDE%29/CC-0 miss too many important points to solve all concerns which have been raised.
Notably, there is still no beginning of hint in it about where the decision of using CC0 exclusively for Wikidata came from. But as this inquiry on the topic https://en.wikiversity.org/wiki/fr:Recherche:La_licence_CC-0_de_Wikidata,_origine_du_choix,_enjeux,_et_prospections_sur_les_aspects_de_gouvernance_communautaire_et_d%E2%80%99%C3%A9quit%C3%A9_contributive advance, an answer is emerging from it. It seems that Wikidata choice toward CC0 was heavily influenced by Denny Vrandečić, who – to make it short – is now working in the Google Knowledge Graph team. Also it worth noting that Google funded a quarter of the initial development work. Another quarter came from the Gordon and Betty Moore Foundation, established by Intel co-founder. And half the money came from Microsoft co-founder Paul Allen's Institute for Artificial Intelligence (AI2)[1] https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-1. To state it shortly in a conspirational fashion, Wikidata is the puppet trojan horse of big tech hegemonic companies into the realm of Wikimedia. For a less tragic, more argumentative version, please see the research project (work in progress, only chapter 1 is in good enough shape, and it's only available in French so far). Some proofs that this claim is completely wrong are welcome, as it would be great that in fact that was the community that was the driving force behind this single license choice and that it is the best choice for its future, not the future of giant tech companies. This would be a great contribution to bring such a happy light on this subject, so we can all let this issue alone and go back contributing in more interesting topics.
Now let's examine the thoughts proposed by Lydia. Wikidata is here to give more people more access to more knowledge. So far, it makes it matches Wikimedia movement stated goal. This means we want our data to be used as widely as possible. Sure, as long as it rhymes with equity. As in *Our strategic direction: Service and **Equity* https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction/Endorsement#Our_strategic_direction:_Service_and_Equity. Just like we want freedom for everybody as widely as possible. That is, starting where it confirms each others freedom. Because under this level, freedom of one is murder and slavery of others. CC-0 is one step towards that. That's a thesis, you can propose to defend it but no one have to agree without some convincing proof. Data is different from many other things we produce in Wikimedia in that it is aggregated, combined, mashed-up, filtered, and so on much more extensively. No it's not. From a data processing point of view, everything is data. Whether it's stored in a wikisyntax, in a relational database or engraved in stone only have a commodity side effect. Whether it's a random stream of bit generated by a dumb chipset or some encoded prose of Shakespeare make no difference. So from this point of view, no, what Wikidata store is not different from what is produced anywhere else in Wikimedia projects. Sure, the way it's structured does extremely ease many things. But this is not because it's data, when elsewhere there would be no data. It's because it enforce data to be stored in a way that ease aggregation, combination, mashing-up, filtering and so on. Our data lives from being able to write queries over millions of statements, putting it into a mobile app, visualizing parts of it on a map and much more. Sure. 
It also lives from being curated from millions[2] https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-2 of benevolent contributors, or it would be just a useless pile of random bytes. This means, if we require attribution, in a huge number of cases attribution would need to go back to potentially millions of editors and sources (even if that data is not visible in the end result but only helped to get the result). No, it doesn't mean that. First let's recall a few basics as it seems the whole answer makes confusion between attribution and distribution of contributions under the same license as the original. Attribution is crucial for traceability and so for reliable and trusted knowledge that we are targeting within the Wikimedia movement. The "same license" is the sole legal guaranty of equity contributors have. That's it, trusted knowledge and equity are requirements for the Wikimedia movement goals. That means withdrawing this requirements is withdrawing this goals. Now, what would be the additional cost of storing sources in Wikidata? Well, zero cost. Actually, it's already here as the "reference" attribute is part of the Wikibase item structure. So attribution is not a problem, you don't have to put it in front of your derived work, just look at a Wikipedia article: until you go to history, you have zero attribution visible, and it's ok. It's also have probably zero or negligible computing cost, as it doesn't have to be included in all computations, it just need to be retrievable on demand. What would be the additional cost of storing licenses for each item based on its source? Well, adding a license attribute might help, but actually if your reference is a work item, I guess it might comes with a "license" statement, so zero additional cost. 
Now for letting user specify under which free licenses they publish their work, that would just require an additional attribute, a ridiculous weight when balanced with equity concerns it resolves. Could that prevent some uses for some actors? Yes, that's actually the point, preventing abuse of those who doesn't want to act equitably. For all other actors a "distribute under same condition" is fine. This is potentially computationally hard to do and and depending on where the data is used very inconvenient (think of a map with hundreds of data points in a mobile app). OpenStreetMap which use ODbL, a copyleft attributive license, do exactly that too, doesn't it? By the way, allowing a license by item would enable to include OpenStreetMap data in WikiData, which is currently impossible due to the CC0 single license policy of the project. Too bad, it could be so useful to have this data accessible for Wikimedia projects, but who cares? This is a burden on our re-users that I do not want to impose on them. Wait, which re-users? Surely one might expect that Wikidata would care first of re-users which are in the phase with Wikimedia goal, so surely needs of Wikimedia community in particular and Free/Libre Culture in general should be considered. Do this re-users would be penalized by a copyleft license? Surely no, or they wouldn't use it extensively as they do. So who are this re-users for who it's thought preferable, without consulting the community, to not annoy with questions of equity and traceability? It would make it significantly harder to re-use our data and be in direct conflict with our goal of spreading knowledge. No, technically it would be just as easy as punching a button on a computer to do that rather than this. What is in direct conflict with our clearly stated goals emerging from the 2017 community consultation is going against equity and traceability. 
You propose to discard both to satisfy exogenous demands which should have next to no weight in decision impacting so deeply the future of our community. Whether data can be protected in this way at all or not depends on the jurisdiction we are talking about. See this Wikilegal on on database rights https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights for more details. It says basically that it's applicable in United States and Europe on different legal bases and extents. And for the rest of the world, it doesn't say it doesn't say nothing can apply, it states nothing. So even if we would have decided to require attribution it would only be enforceable in some jurisdictions. What kind of logic is that? Maybe it might not be applicable in some country, so let's withdraw the few rights we have. Ambiguity, when it comes to legal matters, also unfortunately often means that people refrain from what they want to to for fear of legal repercussions. This is directly in conflict with our goal of spreading knowledge. Economic inequality, social inequity and legal imbalance might also refrain people from doing what they want, as they fear practical repercussions. CC0 strengthen this discrimination factors by enforcing people to withdraw the few rights they have to weight against the growing asymmetry that social structures are concomitantly building. So CC0 as unique license choice is in direct conflict with our goal of *equitably* spreading knowledge. Also it seems like this statement suggest that releasing our contributions only under CC0 is the sole solution to diminish legal doubts. Actually any well written license would do an equal job regarding this point, including many copyleft licenses out there. So while associate a clear license to each data item might indeed diminish legal uncertainty, it's not an argument at all for enforcing CC0 as sole license available to contributors. 
Moreover, just putting a license side by side with a work does not ensure that the person who made the association was legally allowed to do so. To have a better confidence in the legitimacy of a statement that a work is covered by a certain license, there is once again a traceability requirement. For example, Wikidata currently include many items which were imported from misc. Wikipedia versions, and claim that the derived work obtained – a set of items and statements – is under CC0. That is a hugely doubtful statement and it alarmingly looks like license laundering https://en.wikipedia.org/wiki/license_laundering. This is true for Wikipedia, but it's also true for any source on which a large scale extraction and import are operated, whether through bots or crowd sourcing. So the Wikidata project is currently extremely misplaced to give lessons on legal ambiguity, as it heavily plays with legal blur and the hope that its shady practises won't fall under too much scrutiny. Licenses that require attribution are often used as a way to try to make it harder for big companies to profit from openly available resources. No there are not. They are used as *a way to try to make it harder for big companies to profit from openly available resources* *in inequitable manners*. That's completely different. Copyleft licenses give the same rights to big companies and individuals in a manner that lower socio-economic inequalities which disproportionally advantage the former. The thing is there seems to be no indication of this working. Because it's not trying to enforce what you pretend, so of course it's not working for this goal. But for the goal that copyleft licenses aims at, there are clear evidences that yes it works. Big companies have the legal and engineering resources to handle both the legal minefield and the technical hurdles easily. There is no pitfall in copyleft licenses. Using war material analogy is disrespectful. 
That's true that copyleft licenses might come with some constraints that non-copyleft free licenses don't have, but that the price for fostering equity. And it's a low price, that even individuals can manage, it might require a very little extra time on legal considerations, but on the other hand using the free work is an immensely vast gain that worth it. In Why you shouldn't use the Lesser GPL for your next library https://www.gnu.org/licenses/why-not-lgpl.html is stated *proprietary software developers have the advantage of money; free software developers need to make advantages for each other*. This might be generalised as *big companies have the advantage of money; free/libre culture contributors need to make advantages for each other*. So at odd with what pretend this fallacious claims against copyleft licenses, they are not a "minefield and the technical hurdles" that only big companies can handle. All the more, let's recall who financed the initial development of Wikidata: only actors which are related to big companies. Who it is really hurting is the smaller start-up, institution or hacker who can not deal with it. If this statement is about copyleft licenses, then this is just plainly false. Smaller actors have more to gain in preserving mutual benefit of the common ecosystem that a copyleft license fosters. With Wikidata we are making structured data about the world available for everyone. And that's great. But that doesn't require CC0 as sole license to be achieved. We are leveling the playing field to give those who currently don’t have access to the knowledge graphs of the big companies a chance to build something amazing. And that's great. But that doesn't require CC0 as sole license. Actually CC0 makes it a less sustainable project on this point, as it allows unfair actors to take it all, add some interesting added value that our community can not afford, reach/reinforce an hegemonic position in the ecosystem with their own closed solution. 
And, ta-da, Wikidata can be discontinued quietly, just like Google did with the defunct Freebase, which was CC-BY-SA before they bought the company that was running it, and which they then imported under CC0 into Wikidata as a new attempt to gather a larger community of free curators. And once it has performed license laundering of all Wikimedia projects' works through shady mass extraction and import, Wikimedia can disappear as well. Of course big companies benefit more from these possibilities than actors with smaller financial support and no hegemonic position.
"Thereby we are helping more people get access to knowledge from more places than just the few big ones."
No, with CC0 you are certainly helping big companies to reinforce their position, in which they can distribute information manipulated as they wish, without regard for traceability and equity. Allowing contributors to also use copyleft licenses would be far more effective to *collect and use different forms of free, trusted knowledge* that *focus efforts on the knowledge and communities that have been left out by structures of power and privilege*, as stated in *Our strategic direction: Service and Equity*.
"CC-0 is becoming more and more common."
Just like economic inequality https://en.wikipedia.org/wiki/economic_inequality. But that is not what we are aiming to foster in the Wikimedia movement.
"Many organisations are releasing their data under CC-0 and are happy with the experience. Among them are the European Union, Europeana, the National Library of Sweden and the Metropolitan Museum of Modern Arts."
Good for them. But they are not the Wikimedia community; they have their own goals and their own plans for sustainability, which do not necessarily match what our community can follow. Different contexts require different means. States and their institutions can count on tax revenue, and if taxpayers' money ends up funding public domain works, that's great and seems fair.
States are rarely threatened by companies; they have legal levers to pressure that kind of entity, although conflicts of interest and lobbying can of course qualify this statement. Importing that kind of data with proper attribution and license is fine, be it CC0 or any other free license. But that is not an argument in favour of imposing on volunteers a systematic waiver of all their rights as the only way to contribute.
"All this being said we do encourage all re-users of our data to give attribution to Wikidata because we believe it is in the interest of all parties involved."
That's it: zero legal hope of equity.
"And our experience shows that many of our re-users do give credit to Wikidata even if they are not forced to."
Experience also shows that some prominent actors like Google no longer credit the Wikimedia community when directly generating answers based on, inter alia, information coming from Wikidata, which is itself performing license laundering of Wikipedia data.
"Are there no downsides to this? No, of course not. Some people chose not to participate, some data can't be imported and some re-users do not attribute us. But the benefits I have seen over the years for Wikidata and the larger open knowledge ecosystem far outweigh them."
This should at least be backed by some solid statistics showing that it had a positive impact in terms of audience and contribution across Wikimedia projects as a whole. Maybe the introduction of Wikidata did have a positive effect on the evolution of the total number of contributors, or maybe so far it has had no significant correlated effect, or maybe it correlates with a decrease in the total number of active contributors. Some plots would be interesting here. Mere personal feelings of benefits and hindrances mean nothing here, mine included of course.
Plus, there has not even been the beginning of an attempt at an A/B test with a second Wikibase instance that allows users to select which licenses their contributions are released under, so there is no possible way to state anything backed by a relevant comparison. The fact that there are some people satisfied with the current state of things doesn't mean they would not be even more satisfied with a more equitable solution that allows contributors to choose a set of free licenses for their publications. Moreover, this is all about the sustainability and fostering of our community and reaching its goals, not the immediate feeling of satisfaction of some people.
[1] Wikipedia Signpost, 2 December 2015 https://en.wikipedia.org/wiki/en:Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed
[2] according to the next statement of Lydia
Once again, let me recall that this is not a manifesto against Wikidata. The motivation behind this message is the hope that one day one might participate in Wikidata with the same respect for equity and traceability that is granted in other Wikimedia projects.
With much wiki-love, mathieu
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Oh, and by the way, ODbL was considered as a potential license, but I recall that that license could have been incompatible for reuse with CC BY-SA 3.0. It was actually a point of discussion with the Italian OpenStreetMap community back in 2013, when I first presented at the OSM-IT meeting the possibility of a collaboration between WD and OSM.
L.
On 30 Nov 2017 08:57, "Luca Martinelli" martinelliluca@gmail.com wrote:
I basically stopped reading this email after the first attack on Denny.
I was there from the beginning, and I do recall the *extensive* discussion about what license to use. CC0 was chosen, among other things, because of the moronic EU rule about database rights, which the CC 3.0 licenses didn't allow us to counter - please remember that 4.0 was still under discussion, and we couldn't afford the luxury of waiting for 4.0 to come out before publishing Wikidata.
And possibly next time provide a TL;DR version of your email at the top.
Cheers,
L.
On 29 Nov 2017 22:46, "Mathieu Stumpf Guntz" <psychoslave@culture-libre.org> wrote:
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Please keep this civil and on topic!
Licensing was discussed at the start of the project, that is, at the start of developing code for the project, and as I recall the arguments for CC0 were valid and sound. That was long before Denny started working for Google.
As I recall, it was mentioned during the first week of the project (first week of April), and the discussion re-emerged during the first week of development. That must have been week 4 or 5 (first week of May), as the delivery of the laptop was delayed. I was against CC0 as I expected problems with reuse of external data. The arguments for CC0 convinced me.
And yes, Denny argued for CC0, as did Daniel, and I believe Jeroen and Jens did too.
The argument is pretty simple: party A has some data A and claims license A. Party B has some data B and claims license B. Both license A and license B are sticky, thus later data C that uses an aggregation of A and B must satisfy both license A and license B. That is not viable.
Moving forward to a safe, non-sticky license seems to be the only viable solution, and this leads to CC0.
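The argument can be illustrated with a toy sketch in Python. It rests on a simplifying assumption that is not spelled out in the thread: two different sticky licenses can never both be satisfied by a single aggregate. Real licenses may define explicit compatibility rules, so this is only a model of the reasoning, not a legal statement. The names `License`, `sticky` and `aggregate_is_viable` are made up for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    name: str
    sticky: bool  # True if derived works must be released under this same license


def sticky_licenses(sources):
    """Names of all sticky licenses that an aggregate of `sources` would have to satisfy."""
    return {lic.name for lic in sources if lic.sticky}


def aggregate_is_viable(sources):
    """Toy rule (assumption): an aggregate is viable only if at most one
    distinct sticky license applies to it."""
    return len(sticky_licenses(sources)) <= 1


# Hypothetical licenses mirroring the wording of the argument.
LICENSE_A = License("license A", sticky=True)
LICENSE_B = License("license B", sticky=True)
CC0 = License("CC0", sticky=False)

print(aggregate_is_viable([LICENSE_A, LICENSE_B]))  # data C built from A and B
print(aggregate_is_viable([LICENSE_A, CC0]))        # one sticky source plus CC0
print(aggregate_is_viable([CC0, CC0]))              # only non-sticky sources
```

Under this model, only the first combination fails, which is the situation the argument describes for data C.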
Feel free to discuss the merit of our choice, but do not use personal attacks. Thank you.
On Thu, 30 Nov 2017 at 09:11, Luca Martinelli <martinelliluca@gmail.com> wrote:
On 29 Nov 2017 22:46, "Mathieu Stumpf Guntz" <psychoslave@culture-libre.org> wrote:
Saluton ĉiuj,
I forward here the message I initially posted on the Meta Tremendous Wiktionary User Group talk page https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#An_answer_to_Lydia_general_thinking_about_Wikidata_and_CC-0, because I'm interested to have a wider feedback of the community on this point. Whether you think that my view is completely misguided or that I might have a few relevant points, I'm extremely interested to know it, so please be bold.
Before you consider digging further in this reading, keep in mind that I stay convinced that Wikidata is a wonderful project and I wish it a bright future full of even more amazing things than what it already brung so far. My sole concern is really a license issue.
Bellow is a copy/paste of the above linked message:
Thank you Lydia Pintscher https://meta.wikimedia.org/wiki/User:Lydia_Pintscher_%28WMDE%29 for taking the time to answer. Unfortunately this answer https://www.wikidata.org/wiki/User:Lydia_Pintscher_%28WMDE%29/CC-0 miss too many important points to solve all concerns which have been raised.
Notably, there is still no beginning of hint in it about where the decision of using CC0 exclusively for Wikidata came from. But as this inquiry on the topic https://en.wikiversity.org/wiki/fr:Recherche:La_licence_CC-0_de_Wikidata,_origine_du_choix,_enjeux,_et_prospections_sur_les_aspects_de_gouvernance_communautaire_et_d%E2%80%99%C3%A9quit%C3%A9_contributive advance, an answer is emerging from it. It seems that Wikidata choice toward CC0 was heavily influenced by Denny Vrandečić, who – to make it short – is now working in the Google Knowledge Graph team. Also it worth noting that Google funded a quarter of the initial development work. Another quarter came from the Gordon and Betty Moore Foundation, established by Intel co-founder. And half the money came from Microsoft co-founder Paul Allen's Institute for Artificial Intelligence (AI2)[1] https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-1. To state it shortly in a conspirational fashion, Wikidata is the puppet trojan horse of big tech hegemonic companies into the realm of Wikimedia. For a less tragic, more argumentative version, please see the research project (work in progress, only chapter 1 is in good enough shape, and it's only available in French so far). Some proofs that this claim is completely wrong are welcome, as it would be great that in fact that was the community that was the driving force behind this single license choice and that it is the best choice for its future, not the future of giant tech companies. This would be a great contribution to bring such a happy light on this subject, so we can all let this issue alone and go back contributing in more interesting topics.
Now let's examine the thoughts proposed by Lydia. Wikidata is here to give more people more access to more knowledge. So far, it makes it matches Wikimedia movement stated goal. This means we want our data to be used as widely as possible. Sure, as long as it rhymes with equity. As in *Our strategic direction: Service and * *Equity* https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction/Endorsement#Our_strategic_direction:_Service_and_Equity. Just like we want freedom for everybody as widely as possible. That is, starting where it confirms each others freedom. Because under this level, freedom of one is murder and slavery of others. CC-0 is one step towards that. That's a thesis, you can propose to defend it but no one have to agree without some convincing proof. Data is different from many other things we produce in Wikimedia in that it is aggregated, combined, mashed-up, filtered, and so on much more extensively. No it's not. From a data processing point of view, everything is data. Whether it's stored in a wikisyntax, in a relational database or engraved in stone only have a commodity side effect. Whether it's a random stream of bit generated by a dumb chipset or some encoded prose of Shakespeare make no difference. So from this point of view, no, what Wikidata store is not different from what is produced anywhere else in Wikimedia projects. Sure, the way it's structured does extremely ease many things. But this is not because it's data, when elsewhere there would be no data. It's because it enforce data to be stored in a way that ease aggregation, combination, mashing-up, filtering and so on. Our data lives from being able to write queries over millions of statements, putting it into a mobile app, visualizing parts of it on a map and much more. Sure. 
It also lives from being curated by millions[2] https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-2 of volunteer contributors, without whom it would be just a useless pile of random bytes.
"This means, if we require attribution, in a huge number of cases attribution would need to go back to potentially millions of editors and sources (even if that data is not visible in the end result but only helped to get the result)."
No, it doesn't mean that. First let's recall a few basics, as the whole answer seems to confuse attribution with distribution of contributions under the same license as the original. Attribution is crucial for traceability, and so for the reliable and trusted knowledge that we are aiming for within the Wikimedia movement. The "same license" clause is the sole legal guarantee of equity that contributors have. That's it: trusted knowledge and equity are requirements for the Wikimedia movement's goals. Withdrawing these requirements means withdrawing these goals.
Now, what would be the additional cost of storing sources in Wikidata? Well, zero. Actually, it is already there, as the "reference" attribute is part of the Wikibase item structure. So attribution is not a problem: you don't have to put it at the front of your derived work. Just look at a Wikipedia article: until you go to the history, no attribution is visible, and that's OK. It probably also has zero or negligible computing cost, as it doesn't have to be included in every computation; it just needs to be retrievable on demand. What would be the additional cost of storing a license for each item based on its source? Well, adding a license attribute might help, but if your reference is a work item, I guess it might already come with a "license" statement, so zero additional cost.
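The idea that sources and licenses only need to be retrievable on demand can be illustrated with a small sketch. This is a toy model and not the Wikibase data model or API: the `Statement` class, its `license` field, the helper functions and the sample values are all hypothetical, chosen only to show that an aggregate can be computed without touching provenance, which is then collected separately when asked for.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """Hypothetical, simplified statement: a value plus its provenance."""
    item: str
    prop: str
    value: float
    references: list = field(default_factory=list)  # sources backing the value
    license: str = "unspecified"                    # hypothetical per-statement license


def total(statements, prop):
    """Aggregate the values only; provenance is never read here."""
    return sum(s.value for s in statements if s.prop == prop)


def provenance(statements, prop):
    """Collect, on demand, the sources and licenses of the statements used."""
    used = [s for s in statements if s.prop == prop]
    sources = sorted({ref for s in used for ref in s.references})
    licenses = sorted({s.license for s in used})
    return sources, licenses


# Made-up sample data, for illustration only.
statements = [
    Statement("town A", "population", 1200, ["census 2011"], "CC BY-SA"),
    Statement("town B", "population", 800, ["census 2011", "town website"], "CC0"),
    Statement("town A", "area", 12.5, ["survey"], "CC0"),
]

population_total = total(statements, "population")
population_sources, population_licenses = provenance(statements, "population")
```

In this sketch the cost of attribution is paid only when `provenance` is called, not on every computation.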
As for letting users specify under which free licenses they publish their work, that would just require an additional attribute, a trivial cost when weighed against the equity concerns it resolves. Could that prevent some uses by some actors? Yes, and that is actually the point: preventing abuse by those who don't want to act equitably. For all other actors, "distribute under the same conditions" is fine.
"This is potentially computationally hard to do and depending on where the data is used very inconvenient (think of a map with hundreds of data points in a mobile app)."
OpenStreetMap, which uses ODbL, a copyleft license requiring attribution, does exactly that too, doesn't it? By the way, allowing a license per item would make it possible to include OpenStreetMap data in Wikidata, which is currently impossible due to the project's CC0-only policy. Too bad, it could be so useful to have this data accessible to Wikimedia projects, but who cares?
"This is a burden on our re-users that I do not want to impose on them."
Wait, which re-users? Surely one might expect Wikidata to care first about re-users who are in line with the Wikimedia goal, so the needs of the Wikimedia community in particular and of Free/Libre Culture in general should be considered. Would these re-users be penalized by a copyleft license? Surely not, or they wouldn't use such licenses as extensively as they do. So who are these re-users whom it is thought preferable, without consulting the community, not to bother with questions of equity and traceability?
"It would make it significantly harder to re-use our data and be in direct conflict with our goal of spreading knowledge."
No, technically it would be as easy as pressing one button on a computer rather than another. What is in direct conflict with our clearly stated goals, which emerged from the 2017 community consultation, is going against equity and traceability.
You propose to discard both to satisfy exogenous demands which should have next to no weight in decisions that so deeply affect the future of our community.
"Whether data can be protected in this way at all or not depends on the jurisdiction we are talking about. See this Wikilegal on database rights https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights for more details."
It basically says that protection applies in the United States and Europe, on different legal bases and to different extents. As for the rest of the world, it doesn't say that nothing can apply; it states nothing.
"So even if we would have decided to require attribution it would only be enforceable in some jurisdictions."
What kind of logic is that? It might not be applicable in some countries, so let's give up the few rights we have?
"Ambiguity, when it comes to legal matters, also unfortunately often means that people refrain from what they want to do for fear of legal repercussions. This is directly in conflict with our goal of spreading knowledge."
Economic inequality, social inequity and legal imbalance might also keep people from doing what they want, as they fear practical repercussions. CC0 strengthens these discrimination factors by forcing people to give up the few rights they have to counterbalance the growing asymmetry that social structures are concomitantly building. So CC0 as the only license choice is in direct conflict with our goal of *equitably* spreading knowledge. This statement also seems to suggest that releasing our contributions only under CC0 is the sole way to reduce legal doubts. Actually, any well-written license would do an equal job on this point, including many copyleft licenses out there. So while associating a clear license with each data item might indeed reduce legal uncertainty, that is no argument at all for enforcing CC0 as the sole license available to contributors.
Moreover, just putting a license next to a work does not ensure that the person who made the association was legally allowed to do so. To have better confidence in the legitimacy of a statement that a work is covered by a certain license, there is once again a traceability requirement. For example, Wikidata currently includes many items which were imported from various Wikipedia versions, and claims that the resulting derived work, a set of items and statements, is under CC0. That is a hugely doubtful statement, and it looks alarmingly like license laundering https://en.wikipedia.org/wiki/license_laundering. This is true for Wikipedia, but it is also true for any source on which large-scale extraction and import are performed, whether through bots or crowdsourcing. So the Wikidata project is currently in a very poor position to give lessons on legal ambiguity, as it relies heavily on legal blur and on the hope that its shady practices won't fall under too much scrutiny.
"Licenses that require attribution are often used as a way to try to make it harder for big companies to profit from openly available resources."
No, they are not. They are used as *a way to try to make it harder for big companies to profit from openly available resources* *in inequitable manners*. That's completely different. Copyleft licenses give the same rights to big companies and individuals, in a manner that reduces the socio-economic inequalities which disproportionately advantage the former.
"The thing is there seems to be no indication of this working."
Because that is not what they are trying to enforce, despite what you claim, so of course it's not working for this goal. But for the goal that copyleft licenses actually aim at, there is clear evidence that yes, it works.
"Big companies have the legal and engineering resources to handle both the legal minefield and the technical hurdles easily."
There is no pitfall in copyleft licenses. Using a war-materiel analogy is disrespectful.
It's true that copyleft licenses might come with some constraints that non-copyleft free licenses don't have, but that is the price of fostering equity. And it is a low price that even individuals can manage: it might require a little extra time for legal considerations, but on the other hand being able to use the free work is an immense gain that is worth it. In Why you shouldn't use the Lesser GPL for your next library https://www.gnu.org/licenses/why-not-lgpl.html it is stated that *proprietary software developers have the advantage of money; free software developers need to make advantages for each other*. This might be generalised as *big companies have the advantage of money; free/libre culture contributors need to make advantages for each other*. So contrary to what these fallacious claims against copyleft licenses assert, they are not a "legal minefield" with "technical hurdles" that only big companies can handle. All the more so when we recall who financed the initial development of Wikidata: only actors related to big companies.
"Who it is really hurting is the smaller start-up, institution or hacker who can not deal with it."
If this statement is about copyleft licenses, then it is just plainly false. Smaller actors have more to gain from preserving the mutual benefit of the common ecosystem that a copyleft license fosters.
"With Wikidata we are making structured data about the world available for everyone."
And that's great. But achieving that doesn't require CC0 as the sole license.
"We are leveling the playing field to give those who currently don’t have access to the knowledge graphs of the big companies a chance to build something amazing."
And that's great. But that doesn't require CC0 as the sole license either. Actually, CC0 makes the project less sustainable on this point, as it allows unfair actors to take it all, add some added value that our community cannot afford, and reach or reinforce a hegemonic position in the ecosystem with their own closed solution.
And, ta-da, Wikidata can then be quietly discontinued, just like Google did with the now defunct Freebase, which was CC-BY-SA before they bought the company that was running it, and which they afterwards imported into Wikidata under CC0 as a new attempt to gather a larger community of free curators. And once it has laundered the licenses of all the works of Wikimedia projects through shady mass extraction and import, Wikimedia can disappear as well. Of course, big companies benefit more from these possibilities than actors with less financial support and no hegemonic position.
"Thereby we are helping more people get access to knowledge from more places than just the few big ones."
No, with CC0 you are certainly helping big companies reinforce a position in which they can distribute information manipulated as they wish, without regard for traceability and equity. Allowing contributors to also use copyleft licenses would be far more effective to *collect and use different forms of free, trusted knowledge* that *focus efforts on the knowledge and communities that have been left out by structures of power and privilege*, as stated in *Our strategic direction: Service and Equity*.
"CC-0 is becoming more and more common."
Just like economic inequality https://en.wikipedia.org/wiki/economic_inequality. But that is not what we aim to foster in the Wikimedia movement.
"Many organisations are releasing their data under CC-0 and are happy with the experience. Among them are the European Union, Europeana, the National Library of Sweden and the Metropolitan Museum of Art."
Good for them. But they are not the Wikimedia community; they have their own goals and their own plans for sustainability, which our community cannot necessarily follow. Different contexts require different means. States and their institutions can count on tax revenue, and if taxpayers' money ends up funding public domain works, that's great and seems fair.
States are rarely threatened by companies, as they have legal levers to pressure that kind of entity, although conflicts of interest and lobbying can of course temper this statement. Importing that kind of data with proper attribution and license is fine, be it CC0 or any other free license. But that is not an argument for imposing on volunteers a systematic waiver of all their rights as the only way to contribute.
"All this being said we do encourage all re-users of our data to give attribution to Wikidata because we believe it is in the interest of all parties involved."
That's it: zero legal hope of equity.
"And our experience shows that many of our re-users do give credit to Wikidata even if they are not forced to."
Experience also shows that some prominent actors like Google no longer credit the Wikimedia community when directly generating answers based on, inter alia, information coming from Wikidata, which is itself laundering the license of Wikipedia data.
"Are there no downsides to this? No, of course not. Some people chose not to participate, some data can't be imported and some re-users do not attribute us. But the benefits I have seen over the years for Wikidata and the larger open knowledge ecosystem far outweigh them."
This should at least be backed by some solid statistics showing a positive impact in terms of audience and contribution across Wikimedia projects as a whole. Maybe the introduction of Wikidata did have a positive effect on the total number of contributors, or maybe so far it has had no significant correlated effect, or maybe it correlates with a decrease in the total number of active contributors. Some plots would be interesting here. Mere personal feelings about benefits and hindrances mean nothing here, mine included of course.
Plus, there has not been even the beginning of an attempt to A/B test with a second Wikibase instance that allows users to select which licenses their contributions are released under, so there is no way to state anything backed by a relevant comparison. The fact that some people are satisfied with the current state of things doesn't mean they would not be even more satisfied with a more equitable solution that allows contributors to choose a set of free licenses for their publications. All the more so as this is all about the sustainability and fostering of our community and reaching its goals, not about the immediate satisfaction of some people.
[1] Wikipedia Signpost, 2 December 2015 https://en.wikipedia.org/wiki/en:Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed
[2] According to Lydia's next statement.
Once again, let me stress that this is not a manifesto against Wikidata. The motivation behind this message is the hope that one day one might participate in Wikidata with the same respect for equity and traceability that is granted in other Wikimedia projects.
With much wikilove, mathieu
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
A single-property licensing scheme would allow storage of the data, but it might or might not allow reuse of the licensed data together with other data. Remember that every entry on the servers might become part of a mashup with any other entry.
On Thu, Nov 30, 2017 at 9:55 AM, John Erling Blad jeblad@gmail.com wrote:
Please keep this civil and on topic!
Licensing was discussed at the start of the project, that is, at the start of developing code for the project, and as I recall the arguments for CC0 were valid and sound. That was long before Denny started working for Google.
As I recall, it was mentioned during the first week of the project (first week of April), and the discussion re-emerged during the first week of development. That must have been week 4 or 5 (first week of May), as the delivery of the laptops was delayed. I was against CC0, as I expected problems with reuse of external data. The arguments for CC0 convinced me.
And yes, Denny argued for CC0, as did Daniel, and I believe Jeroen and Jens did too.
The argument is pretty simple: Party A has some data A and claims license A. Party B has some data B and claims license B. If both license A and license B are sticky, then any later data C that uses an aggregation of A and B must satisfy both license A and license B. That is not viable.
Moving forward to a safe, non-sticky license seems to be the only viable solution, and this leads to CC0.
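The argument above can be sketched in a few lines, under a deliberately simplified model that is an assumption of this example and not a statement about any real license text: a "sticky" license is taken to demand that any derived data carry that same license, and real-world compatibility clauses are ignored. The `License` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    name: str
    sticky: bool  # assumption: sticky means derived data must carry this same license


def sticky_licenses(sources):
    """Names of the distinct sticky licenses among the aggregated sources."""
    return {lic.name for lic in sources if lic.sticky}


def can_carry_single_license(sources):
    """In this toy model, data C is viable only if at most one sticky license applies."""
    return len(sticky_licenses(sources)) <= 1


LICENSE_A = License("license A", sticky=True)
LICENSE_B = License("license B", sticky=True)
NON_STICKY = License("non-sticky license", sticky=False)

# Data C aggregating A and B: both sticky licenses claim to be the license of C.
mixed_sticky_ok = can_carry_single_license([LICENSE_A, LICENSE_B])
# Aggregating only non-sticky sources raises no such conflict.
non_sticky_ok = can_carry_single_license([NON_STICKY, NON_STICKY])
```

Under these assumptions the conflict disappears as soon as every source is non-sticky, which is the direction the argument takes.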
Feel free to discuss the merit of our choice, but do not use personal attacks. Thank you.
On Thu, 30 Nov 2017 at 09:11, Luca Martinelli <martinelliluca@gmail.com> wrote:
Oh, and by the way, ODbL was considered as a potential license, but I recall that that license could have been incompatible for reuse with CC BY-SA 3.0. It was actually a point of discussion with the Italian OpenStreetMap community back in 2013, when I first presented at the OSM-IT meeting the possibility of a collaboration between WD and OSM.
L.
On 30 Nov 2017 08:57, "Luca Martinelli" martinelliluca@gmail.com wrote:
I basically stopped reading this email after the first attack on Denny.
I have been there since the beginning, and I do recall the *extensive* discussion about what license to use. CC0 was chosen, among other things, because of the moronic EU rule about database rights, which the CC 3.0 licenses didn't allow us to counter - please remember that 4.0 was still under discussion, and we couldn't afford the luxury of waiting for 4.0 to come out before publishing Wikidata.
And possibly next time provide a TL;DR version of your email at the top.
Cheers,
L.
On 29 Nov 2017 22:46, "Mathieu Stumpf Guntz" <psychoslave@culture-libre.org> wrote:
Saluton ĉiuj,
I forward here the message I initially posted on the Meta Tremendous Wiktionary User Group talk page https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#An_answer_to_Lydia_general_thinking_about_Wikidata_and_CC-0, because I'm interested to have a wider feedback of the community on this point. Whether you think that my view is completely misguided or that I might have a few relevant points, I'm extremely interested to know it, so please be bold.
Before you consider digging further in this reading, keep in mind that I stay convinced that Wikidata is a wonderful project and I wish it a bright future full of even more amazing things than what it already brung so far. My sole concern is really a license issue.
Bellow is a copy/paste of the above linked message:
Thank you Lydia Pintscher https://meta.wikimedia.org/wiki/User:Lydia_Pintscher_%28WMDE%29 for taking the time to answer. Unfortunately this answer https://www.wikidata.org/wiki/User:Lydia_Pintscher_%28WMDE%29/CC-0 miss too many important points to solve all concerns which have been raised.
Notably, there is still no beginning of hint in it about where the decision of using CC0 exclusively for Wikidata came from. But as this inquiry on the topic https://en.wikiversity.org/wiki/fr:Recherche:La_licence_CC-0_de_Wikidata,_origine_du_choix,_enjeux,_et_prospections_sur_les_aspects_de_gouvernance_communautaire_et_d%E2%80%99%C3%A9quit%C3%A9_contributive advance, an answer is emerging from it. It seems that Wikidata choice toward CC0 was heavily influenced by Denny Vrandečić, who – to make it short – is now working in the Google Knowledge Graph team. Also it worth noting that Google funded a quarter of the initial development work. Another quarter came from the Gordon and Betty Moore Foundation, established by Intel co-founder. And half the money came from Microsoft co-founder Paul Allen's Institute for Artificial Intelligence (AI2)[1] https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-1. To state it shortly in a conspirational fashion, Wikidata is the puppet trojan horse of big tech hegemonic companies into the realm of Wikimedia. For a less tragic, more argumentative version, please see the research project (work in progress, only chapter 1 is in good enough shape, and it's only available in French so far). Some proofs that this claim is completely wrong are welcome, as it would be great that in fact that was the community that was the driving force behind this single license choice and that it is the best choice for its future, not the future of giant tech companies. This would be a great contribution to bring such a happy light on this subject, so we can all let this issue alone and go back contributing in more interesting topics.
Now let's examine the thoughts proposed by Lydia. Wikidata is here to give more people more access to more knowledge. So far, it makes it matches Wikimedia movement stated goal. This means we want our data to be used as widely as possible. Sure, as long as it rhymes with equity. As in *Our strategic direction: Service and * *Equity* https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction/Endorsement#Our_strategic_direction:_Service_and_Equity. Just like we want freedom for everybody as widely as possible. That is, starting where it confirms each others freedom. Because under this level, freedom of one is murder and slavery of others. CC-0 is one step towards that. That's a thesis, you can propose to defend it but no one have to agree without some convincing proof. Data is different from many other things we produce in Wikimedia in that it is aggregated, combined, mashed-up, filtered, and so on much more extensively. No it's not. From a data processing point of view, everything is data. Whether it's stored in a wikisyntax, in a relational database or engraved in stone only have a commodity side effect. Whether it's a random stream of bit generated by a dumb chipset or some encoded prose of Shakespeare make no difference. So from this point of view, no, what Wikidata store is not different from what is produced anywhere else in Wikimedia projects. Sure, the way it's structured does extremely ease many things. But this is not because it's data, when elsewhere there would be no data. It's because it enforce data to be stored in a way that ease aggregation, combination, mashing-up, filtering and so on. Our data lives from being able to write queries over millions of statements, putting it into a mobile app, visualizing parts of it on a map and much more. Sure. 
It also lives from being curated from millions[2] https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-2 of benevolent contributors, or it would be just a useless pile of random bytes. This means, if we require attribution, in a huge number of cases attribution would need to go back to potentially millions of editors and sources (even if that data is not visible in the end result but only helped to get the result). No, it doesn't mean that. First let's recall a few basics as it seems the whole answer makes confusion between attribution and distribution of contributions under the same license as the original. Attribution is crucial for traceability and so for reliable and trusted knowledge that we are targeting within the Wikimedia movement. The "same license" is the sole legal guaranty of equity contributors have. That's it, trusted knowledge and equity are requirements for the Wikimedia movement goals. That means withdrawing this requirements is withdrawing this goals. Now, what would be the additional cost of storing sources in Wikidata? Well, zero cost. Actually, it's already here as the "reference" attribute is part of the Wikibase item structure. So attribution is not a problem, you don't have to put it in front of your derived work, just look at a Wikipedia article: until you go to history, you have zero attribution visible, and it's ok. It's also have probably zero or negligible computing cost, as it doesn't have to be included in all computations, it just need to be retrievable on demand. What would be the additional cost of storing licenses for each item based on its source? Well, adding a license attribute might help, but actually if your reference is a work item, I guess it might comes with a "license" statement, so zero additional cost. 
Now for letting user specify under which free licenses they publish their work, that would just require an additional attribute, a ridiculous weight when balanced with equity concerns it resolves. Could that prevent some uses for some actors? Yes, that's actually the point, preventing abuse of those who doesn't want to act equitably. For all other actors a "distribute under same condition" is fine. This is potentially computationally hard to do and and depending on where the data is used very inconvenient (think of a map with hundreds of data points in a mobile app). OpenStreetMap which use ODbL, a copyleft attributive license, do exactly that too, doesn't it? By the way, allowing a license by item would enable to include OpenStreetMap data in WikiData, which is currently impossible due to the CC0 single license policy of the project. Too bad, it could be so useful to have this data accessible for Wikimedia projects, but who cares? This is a burden on our re-users that I do not want to impose on them. Wait, which re-users? Surely one might expect that Wikidata would care first of re-users which are in the phase with Wikimedia goal, so surely needs of Wikimedia community in particular and Free/Libre Culture in general should be considered. Do this re-users would be penalized by a copyleft license? Surely no, or they wouldn't use it extensively as they do. So who are this re-users for who it's thought preferable, without consulting the community, to not annoy with questions of equity and traceability? It would make it significantly harder to re-use our data and be in direct conflict with our goal of spreading knowledge. No, technically it would be just as easy as punching a button on a computer to do that rather than this. What is in direct conflict with our clearly stated goals emerging from the 2017 community consultation is going against equity and traceability. 
You propose to discard both to satisfy exogenous demands which should have next to no weight in decision impacting so deeply the future of our community. Whether data can be protected in this way at all or not depends on the jurisdiction we are talking about. See this Wikilegal on on database rights https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights for more details. It says basically that it's applicable in United States and Europe on different legal bases and extents. And for the rest of the world, it doesn't say it doesn't say nothing can apply, it states nothing. So even if we would have decided to require attribution it would only be enforceable in some jurisdictions. What kind of logic is that? Maybe it might not be applicable in some country, so let's withdraw the few rights we have. Ambiguity, when it comes to legal matters, also unfortunately often means that people refrain from what they want to to for fear of legal repercussions. This is directly in conflict with our goal of spreading knowledge. Economic inequality, social inequity and legal imbalance might also refrain people from doing what they want, as they fear practical repercussions. CC0 strengthen this discrimination factors by enforcing people to withdraw the few rights they have to weight against the growing asymmetry that social structures are concomitantly building. So CC0 as unique license choice is in direct conflict with our goal of *equitably* spreading knowledge. Also it seems like this statement suggest that releasing our contributions only under CC0 is the sole solution to diminish legal doubts. Actually any well written license would do an equal job regarding this point, including many copyleft licenses out there. So while associate a clear license to each data item might indeed diminish legal uncertainty, it's not an argument at all for enforcing CC0 as sole license available to contributors. 
Moreover, just putting a license side by side with a work does not ensure that the person who made the association was legally allowed to do so. To have a better confidence in the legitimacy of a statement that a work is covered by a certain license, there is once again a traceability requirement. For example, Wikidata currently include many items which were imported from misc. Wikipedia versions, and claim that the derived work obtained – a set of items and statements – is under CC0. That is a hugely doubtful statement and it alarmingly looks like license laundering https://en.wikipedia.org/wiki/license_laundering. This is true for Wikipedia, but it's also true for any source on which a large scale extraction and import are operated, whether through bots or crowd sourcing. So the Wikidata project is currently extremely misplaced to give lessons on legal ambiguity, as it heavily plays with legal blur and the hope that its shady practises won't fall under too much scrutiny. Licenses that require attribution are often used as a way to try to make it harder for big companies to profit from openly available resources. No there are not. They are used as *a way to try to make it harder for big companies to profit from openly available resources* *in inequitable manners*. That's completely different. Copyleft licenses give the same rights to big companies and individuals in a manner that lower socio-economic inequalities which disproportionally advantage the former. The thing is there seems to be no indication of this working. Because it's not trying to enforce what you pretend, so of course it's not working for this goal. But for the goal that copyleft licenses aims at, there are clear evidences that yes it works. Big companies have the legal and engineering resources to handle both the legal minefield and the technical hurdles easily. There is no pitfall in copyleft licenses. Using war material analogy is disrespectful. 
That's true that copyleft licenses might come with some constraints that non-copyleft free licenses don't have, but that the price for fostering equity. And it's a low price, that even individuals can manage, it might require a very little extra time on legal considerations, but on the other hand using the free work is an immensely vast gain that worth it. In Why you shouldn't use the Lesser GPL for your next library https://www.gnu.org/licenses/why-not-lgpl.html is stated *proprietary software developers have the advantage of money; free software developers need to make advantages for each other*. This might be generalised as *big companies have the advantage of money; free/libre culture contributors need to make advantages for each other*. So at odd with what pretend this fallacious claims against copyleft licenses, they are not a "minefield and the technical hurdles" that only big companies can handle. All the more, let's recall who financed the initial development of Wikidata: only actors which are related to big companies. Who it is really hurting is the smaller start-up, institution or hacker who can not deal with it. If this statement is about copyleft licenses, then this is just plainly false. Smaller actors have more to gain in preserving mutual benefit of the common ecosystem that a copyleft license fosters. With Wikidata we are making structured data about the world available for everyone. And that's great. But that doesn't require CC0 as sole license to be achieved. We are leveling the playing field to give those who currently don’t have access to the knowledge graphs of the big companies a chance to build something amazing. And that's great. But that doesn't require CC0 as sole license. Actually CC0 makes it a less sustainable project on this point, as it allows unfair actors to take it all, add some interesting added value that our community can not afford, reach/reinforce an hegemonic position in the ecosystem with their own closed solution. 
And, ta ta, Wikidata can be discontinued quietly, just like Google did with the defunct Freebase which was CC-BY-SA before they bought the company that was running it, and after they imported it under CC0 in Wikidata as a new attempt to gather a larger community of free curators. And when it will have performed license laundering of all Wikimedia projects works with shady mass extract and import, Wikimedia can disappear as well. Of course big companies benefits more of this possibilities than actors with smaller financial support and no hegemonic position. Thereby we are helping more people get access to knowledge from more places than just the few big ones. No, with CC0 you are certainly helping big companies to reinforce their position in which they can distribute information manipulated as they wish, without consideration for traceability and equity considerations. Allowing contributors to also use copyleft licenses would be far more effective to *collect and use different forms of free, trusted knowledge* that *focus efforts on the knowledge and communities that have been left out by structures of power and privilege*, as stated in *Our strategic direction: Service and Equity*. CC-0 is becoming more and more common. Just like economic inequality https://en.wikipedia.org/wiki/economic_inequality. But that is not what we are aiming to foster in the Wikimedia movement. Many organisations are releasing their data under CC-0 and are happy with the experience. Among them are the European Union, Europeana, the National Library of Sweden and the Metropolitan Museum of Modern Arts. Good for them. But they are not the Wikimedia community, they have their own goals and plan to be sustainable that does not necessarily meet what our community can follow. Different contexts require different means. States and their institutions can count on tax revenue, and if taxpayers ends up in public domain works, that's great and seems fair. 
States are rarely threatened by companies, as they have legal levers to put pressure on that kind of entity, although conflicts of interest and lobbying can of course qualify this statement. Importing that kind of data with proper attribution and license is fine, be it CC0 or any other free license. But that's not an argument in favour of imposing on volunteers a systematic waiver of all their rights as the only way to contribute.
"All this being said we do encourage all re-users of our data to give attribution to Wikidata because we believe it is in the interest of all parties involved."
That's it, zero legal hope of equity.
"And our experience shows that many of our re-users do give credit to Wikidata even if they are not forced to."
Experience also shows that some prominent actors like Google no longer credit the Wikimedia community when directly generating answers based on, inter alia, information coming from Wikidata, which is itself performing license laundering of Wikipedia data.
"Are there no downsides to this? No, of course not. Some people chose not to participate, some data can't be imported and some re-users do not attribute us. But the benefits I have seen over the years for Wikidata and the larger open knowledge ecosystem far outweigh them."
This should at least be backed by some solid statistics showing that it had a positive impact in terms of audience and contribution across the Wikimedia projects as a whole. Maybe the introduction of Wikidata did have a positive effect on the evolution of the total number of contributors, or maybe so far it has had no significant correlated effect, or maybe it correlates with a decrease in the total number of active contributors. Some plots would be interesting here. Mere personal feelings about benefits and hindrances mean nothing here, mine included of course.
Plus, there has not even been the beginning of an attempt to A/B test with a second Wikibase instance that would allow users to select which licenses their contributions are released under, so there is no way to state anything backed by a relevant comparison. The fact that some people are satisfied with the current state of things doesn't mean they would not be even more satisfied with a more equitable solution that allows contributors to choose a set of free licenses for their publications. Moreover, this is all about the sustainability and fostering of our community and reaching its goals, not the immediate feeling of satisfaction of some people.
[1] Wikipedia Signpost, 2 December 2015 https://en.wikipedia.org/wiki/en:Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed
[2] according to the next statement of Lydia
Once again, let me recall that this is not a manifesto against Wikidata. The motivation behind this message is the hope that one day one might participate in Wikidata with the same respect for equity and traceability that is granted in other Wikimedia projects.
Kun multe da vikiamo, mathieu
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
On 30/11/2017 at 10:14, John Erling Blad wrote:
A single property licensing scheme would allow storage of data, but it might or might not allow reuse of the licensed data together with other data. Remember that all entries in the servers might be part of a mashup with all other entries.
That's a very interesting point. Does anyone know of a clear and extensive report on what is legal or not regarding the massive import of data extracted from a source?
Indeed, if there were really no limit on using "factual statement" data, that would be a huge loophole in copyright. For example you might enumerate the position of each occurrence of a word in Harry Potter; those are all pure facts after all. But publishing an extensive set of that kind of factual statements would let anyone rebuild these books.
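To make the word-position argument concrete, here is a minimal sketch in Python. It uses a short made-up sentence rather than any real book, and the function names `word_positions` and `rebuild` are purely illustrative. It only shows the mechanical point: a complete list of "word X occurs at position N" facts is enough to reconstruct the original sequence of words.

```python
from collections import defaultdict


def word_positions(text):
    """Map each word to the list of positions at which it occurs."""
    index = defaultdict(list)
    for position, word in enumerate(text.split()):
        index[word].append(position)
    return dict(index)


def rebuild(index):
    """Rebuild the word sequence from the positional "facts" alone."""
    length = sum(len(positions) for positions in index.values())
    words = [None] * length
    for word, positions in index.items():
        for position in positions:
            words[position] = word
    return " ".join(words)


# Made-up sample text, standing in for a protected work.
sample = "the cat saw the dog and the dog saw the cat"
facts = word_positions(sample)
restored = rebuild(facts)
print(restored == sample)
```

Whether such a set of facts is legally distinct from the work itself is exactly the open question; the sketch only shows that it is technically equivalent.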
The same might happen with an extensive extraction of data initially stored in Wikipedia under CC-by-sa and then imported into Wikidata. There is already the ArticlePlaceholder[1] extension, which is a first step towards generating complete encyclopaedic articles in prose, which should then logically be publishable under CC0. Hence the concerns about license laundering.
Not having a way to track sources and their corresponding licenses doesn't automagically make the licensing issues disappear. An integrated license tracking system should make it possible to detect possible infractions in remixes. Users should be informed whether or not what they are trying to mix is legally authorized by the various ultimate sources from which Wikidata gathered the data. Until some solid legal report points in this direction, it is not accurate to unilaterally claim that they can do whatever they want regardless of the sources from which Wikidata gathered the data in the first place, even when it comes from a massive import of a differently licensed source.
[1] https://www.mediawiki.org/wiki/Extension:ArticlePlaceholder
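As a rough illustration of what such a license tracking check could look like, here is a minimal Python sketch. Everything in it is hypothetical: the `Statement` class, the function names, and above all the `ALLOWED_TARGETS` table, which is a deliberately simplified assumption made for the example and not a legal assessment of how these licenses actually interact.

```python
from dataclasses import dataclass

# ASSUMPTION: a simplified, illustrative table, not legal advice.
# For each source license, the licenses under which a derived dataset
# could be released in this toy model.
ALLOWED_TARGETS = {
    "CC0": {"CC0", "CC-BY-SA", "ODbL"},
    "CC-BY-SA": {"CC-BY-SA"},
    "ODbL": {"ODbL"},
}


@dataclass(frozen=True)
class Statement:
    value: str
    source: str
    license: str  # license of the source the statement was imported from


def conflicts(statements, target_license):
    """Return the statements whose source license does not allow the target."""
    return [s for s in statements
            if target_license not in ALLOWED_TARGETS[s.license]]


def viable_targets(statements):
    """Return the licenses that every statement's source would accept."""
    targets = set(ALLOWED_TARGETS)
    for s in statements:
        targets &= ALLOWED_TARGETS[s.license]
    return targets


# Hypothetical remix mixing three differently licensed sources.
remix = [
    Statement("population figure", "a Wikipedia infobox", "CC-BY-SA"),
    Statement("coordinates", "an OpenStreetMap node", "ODbL"),
    Statement("identifier", "a CC0 catalogue", "CC0"),
]
print([s.source for s in conflicts(remix, "CC0")])
print(viable_targets(remix))
```

Under this toy table the remix has no viable target license at all, which is the aggregation problem raised in this thread; the point of tracking is that the user would at least be told so, and told which sources are responsible.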
"you might enumerate the position of each occurrence of a word in Harry Potter, that's all pure facts after all. But publishing an extensive set of that kind of factual statements would let anyone rebuild this books."
This is just a representation of the artwork, and the artwork is protected as a creative work. So you can’t do that without violating database right (I guess a court won’t buy the argument « but this was not the ebook of Harry Potter, this was the zipfile of an ebook of Harry Potter »). You can’t « hack » the law that way, as it has been robust enough to protect digital and paper versions of books without substantial change; a publisher doesn’t have to protect the little-endian as well as the big-endian version of the file :). What is not protected is the idea: you can write a story about a school for sorcerers.
What counts as a "work of the mind" is defined by law; in France: http://www.bnf.fr/fr/professionnels/principes_droit_auteur.html One criterion relevant to databases is « originality »: another author would not produce the same work. With purely factual data, such as a list of the works ever published by a specific publisher, any author would eventually produce the same list. Only the specific presentation of the data can qualify for « droit d’auteur ».
However, databases are subject to a specific law that aims to protect an organisation that spends a « substantial » amount of resources to build a specific dataset. An example is the French organisation IGN https://www.wikidata.org/wiki/Special:EntityPage/Q1665102, which collected, updated and published detailed geographic maps of France. Such a publisher is allowed to forbid the extraction of a « substantial » amount of data from its dataset … this lasts 15 years from the point the publisher stops updating the data.
On Thu, Nov 30, 2017 at 9:55 AM, John Erling Blad jeblad@gmail.com wrote:
Please keep this civil and on topic!
Licensing was discussed in the start of the project, as in start of developing code for the project, and as I recall it the arguments for CC0 was valid and sound. That was long before Danny started working for Google.
As I recall it was mention during first week of the project (first week of april), and the duscussion reemerged during first week of development. That must have been week 4 or 5 (first week of may), as the delivery of the laptoppen was delayed. I was against CC0 as I expected problems with reuse og external data. The arguments for CC0 convinced me.
And yes, Denny argued for CC0 AS did Daniel and I believe Jeroen and Jens did too.
Argument is pretty simple: Part A has some data A and claim license A. Part B has some data B and claim license B. Both license A and license B are sticky, this later data C that use an aggregation of A and B must satisfy both license A and license B. That is not viable.
Moving forward to a safe, non-sticky license seems to be the only viable solution, and this leads to CC0.
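The aggregation argument can be sketched as a set intersection. This is a toy model only: the license names, the `DERIVATIVE_LICENSES` table and the `licenses_for_aggregate` function are hypothetical, and real license compatibility is more nuanced than a lookup table.

```python
# Hypothetical, simplified model: each license maps to the set of licenses
# under which data derived from it may be released.
DERIVATIVE_LICENSES = {
    "license-A": {"license-A"},                # sticky: derivatives stay under A
    "license-B": {"license-B"},                # sticky: derivatives stay under B
    "CC0": {"CC0", "license-A", "license-B"},  # non-sticky: anything goes
}


def licenses_for_aggregate(*input_licenses):
    """Return the licenses that satisfy every input license at once."""
    allowed = None
    for lic in input_licenses:
        options = DERIVATIVE_LICENSES[lic]
        # The aggregate must satisfy all inputs, so intersect the options.
        allowed = set(options) if allowed is None else allowed & options
    return allowed if allowed is not None else set()


# Two different sticky licenses leave no license for the aggregate C.
print(licenses_for_aggregate("license-A", "license-B"))
# Non-sticky inputs never shrink the choice.
print(licenses_for_aggregate("CC0", "license-A"))
```

In this model the first call yields an empty set, which is the "not viable" case, while mixing CC0 data with sticky data simply inherits the sticky license.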
Feel free to discuss the merit of our choice, but do not use personal attacks. Thank you.
On Thu, 30 Nov 2017 at 09:11, Luca Martinelli <martinelliluca@gmail.com> wrote:
Oh, and by the way, ODbL was considered as a potential license, but I recall that that license could have been incompatible for reuse with CC BY-SA 3.0. It was actually a point of discussion with the Italian OpenStreetMap community back in 2013, when I first presented at the OSM-IT meeting the possibility of a collaboration between WD and OSM.
L.
On 30 Nov 2017 at 08:57, "Luca Martinelli" martinelliluca@gmail.com wrote:
I basically stopped reading this email after the first attack on Denny.
I was there since the beginning, and I do recall the *extensive* discussion about what license to use. CC0 was chosen, among other things, because of the moronic EU rule about database rights, which the CC 3.0 licenses didn't allow us to counter - please remember that 4.0 was still under discussion, and we couldn't afford the luxury of waiting for 4.0 to come out before publishing Wikidata.
And possibly next time provide a TL;DR version of your email at the top.
Cheers,
L.
On 29 Nov 2017 at 22:46, "Mathieu Stumpf Guntz" <psychoslave@culture-libre.org> wrote:
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Sorry for the spelling errors; my post was written on a cellphone set to Norwegian.
On Thu, Nov 30, 2017 at 9:55 AM, John Erling Blad jeblad@gmail.com wrote:
Please keep this civil and on topic!
Licensing was discussed in the start of the project, as in start of developing code for the project, and as I recall it the arguments for CC0 was valid and sound. That was long before Danny started working for Google.
As I recall it was mention during first week of the project (first week of april), and the duscussion reemerged during first week of development. That must have been week 4 or 5 (first week of may), as the delivery of the laptoppen was delayed. I was against CC0 as I expected problems with reuse og external data. The arguments for CC0 convinced me.
And yes, Denny argued for CC0 AS did Daniel and I believe Jeroen and Jens did too.
Argument is pretty simple: Part A has some data A and claim license A. Part B has some data B and claim license B. Both license A and license B are sticky, this later data C that use an aggregation of A and B must satisfy both license A and license B. That is not viable.
Moving forward to a safe, non-sticky license seems to be the only viable solution, and this leads to CC0.
Feel free to discuss the merrit of our choice but do not use personal attacs. Thank you.
Den tor. 30. nov. 2017, 09.11 skrev Luca Martinelli < martinelliluca@gmail.com>:
Oh, and by the way, ODbL was considered as a potential license, but I recall that that license could have been incompatible for reuse with CC BY-SA 3.0. It was actually a point of discussion with the Italian OpenStreetMap community back in 2013, when I first presented at the OSM-IT meeting the possibility of a collaboration between WD and OSM.
L.
Il 30 nov 2017 08:57, "Luca Martinelli" martinelliluca@gmail.com ha scritto:
I basically stopped reading this email after the first attack to Denny.
I was there since the beginning, and I do recall the *extensive* discussion about what license to use. CC0 was chosen, among other things, because of the moronic EU rule about database rights, that CC 3.0 licenses didn't allow us to counter - please remember that 4.0 were still under discussion, and we couldn't afford the luxury of waiting for 4.0 to come out before publishing Wikidata.
And possibly next time provide a TL;DR version of your email at the top.
Cheers,
L.
Il 29 nov 2017 22:46, "Mathieu Stumpf Guntz" < psychoslave@culture-libre.org> ha scritto:
Saluton ĉiuj,
I forward here the message I initially posted on the Meta Tremendous Wiktionary User Group talk page https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#An_answer_to_Lydia_general_thinking_about_Wikidata_and_CC-0, because I'm interested to have a wider feedback of the community on this point. Whether you think that my view is completely misguided or that I might have a few relevant points, I'm extremely interested to know it, so please be bold.
Before you consider digging further in this reading, keep in mind that I stay convinced that Wikidata is a wonderful project and I wish it a bright future full of even more amazing things than what it already brung so far. My sole concern is really a license issue.
Bellow is a copy/paste of the above linked message:
Thank you Lydia Pintscher https://meta.wikimedia.org/wiki/User:Lydia_Pintscher_%28WMDE%29 for taking the time to answer. Unfortunately this answer https://www.wikidata.org/wiki/User:Lydia_Pintscher_%28WMDE%29/CC-0 miss too many important points to solve all concerns which have been raised.
Notably, there is still no beginning of hint in it about where the decision of using CC0 exclusively for Wikidata came from. But as this inquiry on the topic https://en.wikiversity.org/wiki/fr:Recherche:La_licence_CC-0_de_Wikidata,_origine_du_choix,_enjeux,_et_prospections_sur_les_aspects_de_gouvernance_communautaire_et_d%E2%80%99%C3%A9quit%C3%A9_contributive advance, an answer is emerging from it. It seems that Wikidata choice toward CC0 was heavily influenced by Denny Vrandečić, who – to make it short – is now working in the Google Knowledge Graph team. Also it worth noting that Google funded a quarter of the initial development work. Another quarter came from the Gordon and Betty Moore Foundation, established by Intel co-founder. And half the money came from Microsoft co-founder Paul Allen's Institute for Artificial Intelligence (AI2)[1] https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-1. To state it shortly in a conspirational fashion, Wikidata is the puppet trojan horse of big tech hegemonic companies into the realm of Wikimedia. For a less tragic, more argumentative version, please see the research project (work in progress, only chapter 1 is in good enough shape, and it's only available in French so far). Some proofs that this claim is completely wrong are welcome, as it would be great that in fact that was the community that was the driving force behind this single license choice and that it is the best choice for its future, not the future of giant tech companies. This would be a great contribution to bring such a happy light on this subject, so we can all let this issue alone and go back contributing in more interesting topics.
Now let's examine the thoughts proposed by Lydia. Wikidata is here to give more people more access to more knowledge. So far, it makes it matches Wikimedia movement stated goal. This means we want our data to be used as widely as possible. Sure, as long as it rhymes with equity. As in *Our strategic direction: Service and * *Equity* https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction/Endorsement#Our_strategic_direction:_Service_and_Equity. Just like we want freedom for everybody as widely as possible. That is, starting where it confirms each others freedom. Because under this level, freedom of one is murder and slavery of others. CC-0 is one step towards that. That's a thesis, you can propose to defend it but no one have to agree without some convincing proof. Data is different from many other things we produce in Wikimedia in that it is aggregated, combined, mashed-up, filtered, and so on much more extensively. No it's not. From a data processing point of view, everything is data. Whether it's stored in a wikisyntax, in a relational database or engraved in stone only have a commodity side effect. Whether it's a random stream of bit generated by a dumb chipset or some encoded prose of Shakespeare make no difference. So from this point of view, no, what Wikidata store is not different from what is produced anywhere else in Wikimedia projects. Sure, the way it's structured does extremely ease many things. But this is not because it's data, when elsewhere there would be no data. It's because it enforce data to be stored in a way that ease aggregation, combination, mashing-up, filtering and so on. Our data lives from being able to write queries over millions of statements, putting it into a mobile app, visualizing parts of it on a map and much more. Sure. 
It also lives from being curated from millions[2] https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-2 of benevolent contributors, or it would be just a useless pile of random bytes. This means, if we require attribution, in a huge number of cases attribution would need to go back to potentially millions of editors and sources (even if that data is not visible in the end result but only helped to get the result). No, it doesn't mean that. First let's recall a few basics as it seems the whole answer makes confusion between attribution and distribution of contributions under the same license as the original. Attribution is crucial for traceability and so for reliable and trusted knowledge that we are targeting within the Wikimedia movement. The "same license" is the sole legal guaranty of equity contributors have. That's it, trusted knowledge and equity are requirements for the Wikimedia movement goals. That means withdrawing this requirements is withdrawing this goals. Now, what would be the additional cost of storing sources in Wikidata? Well, zero cost. Actually, it's already here as the "reference" attribute is part of the Wikibase item structure. So attribution is not a problem, you don't have to put it in front of your derived work, just look at a Wikipedia article: until you go to history, you have zero attribution visible, and it's ok. It's also have probably zero or negligible computing cost, as it doesn't have to be included in all computations, it just need to be retrievable on demand. What would be the additional cost of storing licenses for each item based on its source? Well, adding a license attribute might help, but actually if your reference is a work item, I guess it might comes with a "license" statement, so zero additional cost. 
Now for letting user specify under which free licenses they publish their work, that would just require an additional attribute, a ridiculous weight when balanced with equity concerns it resolves. Could that prevent some uses for some actors? Yes, that's actually the point, preventing abuse of those who doesn't want to act equitably. For all other actors a "distribute under same condition" is fine. This is potentially computationally hard to do and and depending on where the data is used very inconvenient (think of a map with hundreds of data points in a mobile app). OpenStreetMap which use ODbL, a copyleft attributive license, do exactly that too, doesn't it? By the way, allowing a license by item would enable to include OpenStreetMap data in WikiData, which is currently impossible due to the CC0 single license policy of the project. Too bad, it could be so useful to have this data accessible for Wikimedia projects, but who cares? This is a burden on our re-users that I do not want to impose on them. Wait, which re-users? Surely one might expect that Wikidata would care first of re-users which are in the phase with Wikimedia goal, so surely needs of Wikimedia community in particular and Free/Libre Culture in general should be considered. Do this re-users would be penalized by a copyleft license? Surely no, or they wouldn't use it extensively as they do. So who are this re-users for who it's thought preferable, without consulting the community, to not annoy with questions of equity and traceability? It would make it significantly harder to re-use our data and be in direct conflict with our goal of spreading knowledge. No, technically it would be just as easy as punching a button on a computer to do that rather than this. What is in direct conflict with our clearly stated goals emerging from the 2017 community consultation is going against equity and traceability. 
You propose to discard both to satisfy exogenous demands which should have next to no weight in decision impacting so deeply the future of our community. Whether data can be protected in this way at all or not depends on the jurisdiction we are talking about. See this Wikilegal on on database rights https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights for more details. It says basically that it's applicable in United States and Europe on different legal bases and extents. And for the rest of the world, it doesn't say it doesn't say nothing can apply, it states nothing. So even if we would have decided to require attribution it would only be enforceable in some jurisdictions. What kind of logic is that? Maybe it might not be applicable in some country, so let's withdraw the few rights we have. Ambiguity, when it comes to legal matters, also unfortunately often means that people refrain from what they want to to for fear of legal repercussions. This is directly in conflict with our goal of spreading knowledge. Economic inequality, social inequity and legal imbalance might also refrain people from doing what they want, as they fear practical repercussions. CC0 strengthen this discrimination factors by enforcing people to withdraw the few rights they have to weight against the growing asymmetry that social structures are concomitantly building. So CC0 as unique license choice is in direct conflict with our goal of *equitably* spreading knowledge. Also it seems like this statement suggest that releasing our contributions only under CC0 is the sole solution to diminish legal doubts. Actually any well written license would do an equal job regarding this point, including many copyleft licenses out there. So while associate a clear license to each data item might indeed diminish legal uncertainty, it's not an argument at all for enforcing CC0 as sole license available to contributors. 
Moreover, just putting a license side by side with a work does not ensure that the person who made the association was legally allowed to do so. To have better confidence in the legitimacy of a statement that a work is covered by a certain license, there is once again a traceability requirement. For example, Wikidata currently includes many items which were imported from various Wikipedia versions, and claims that the resulting derived work (a set of items and statements) is under CC0. That is a hugely doubtful claim, and it looks alarmingly like license laundering https://en.wikipedia.org/wiki/license_laundering. This is true for Wikipedia, but it is also true for any source subjected to large-scale extraction and import, whether through bots or crowdsourcing. So the Wikidata project is currently in a very poor position to give lessons on legal ambiguity, as it relies heavily on legal blur and on the hope that its shady practices won't come under too much scrutiny.
Licenses that require attribution are often used as a way to try to make it harder for big companies to profit from openly available resources.
No, they are not. They are used as *a way to try to make it harder for big companies to profit from openly available resources* *in inequitable manners*. That's completely different. Copyleft licenses give the same rights to big companies and to individuals, in a manner that reduces the socio-economic inequalities which disproportionately advantage the former.
The thing is there seems to be no indication of this working.
Because they are not trying to enforce what you claim, so of course it's not working for that goal. But for the goal that copyleft licenses actually aim at, there is clear evidence that yes, it works.
Big companies have the legal and engineering resources to handle both the legal minefield and the technical hurdles easily.
There is no pitfall in copyleft licenses. Using a weapons-of-war analogy is disrespectful.
It's true that copyleft licenses may come with some constraints that non-copyleft free licenses don't have, but that is the price of fostering equity. And it's a low price that even individuals can afford: it might require a little extra time on legal considerations, but on the other hand being able to use the free work is an immense gain that is well worth it. In Why you shouldn't use the Lesser GPL for your next library https://www.gnu.org/licenses/why-not-lgpl.html it is stated that *proprietary software developers have the advantage of money; free software developers need to make advantages for each other*. This might be generalised as *big companies have the advantage of money; free/libre culture contributors need to make advantages for each other*. So, contrary to what these fallacious claims against copyleft licenses assert, they are not a "minefield and the technical hurdles" that only big companies can handle. All the more so when we recall who financed the initial development of Wikidata: only actors related to big companies.
Who it is really hurting is the smaller start-up, institution or hacker who can not deal with it.
If this statement is about copyleft licenses, then it is just plainly false. Smaller actors have more to gain from preserving the mutual benefit of the common ecosystem that a copyleft license fosters.
With Wikidata we are making structured data about the world available for everyone.
And that's great. But achieving that doesn't require CC0 as the sole license.
We are leveling the playing field to give those who currently don’t have access to the knowledge graphs of the big companies a chance to build something amazing.
And that's great. But that doesn't require CC0 as the sole license. Actually, CC0 makes the project less sustainable on this point, as it allows unfair actors to take it all, add some interesting value that our community cannot afford to provide, and reach or reinforce a hegemonic position in the ecosystem with their own closed solution.
And, ta-da, Wikidata can then be quietly discontinued, just as Google did with the now-defunct Freebase, which was under CC-BY-SA before they bought the company that was running it, and which they afterwards imported under CC0 into Wikidata as a new attempt to gather a larger community of free curators. And once it has laundered the licenses of all Wikimedia projects' works through shady mass extraction and import, Wikimedia can disappear as well. Of course, big companies benefit more from these possibilities than actors with less financial support and no hegemonic position.
Thereby we are helping more people get access to knowledge from more places than just the few big ones.
No, with CC0 you are certainly helping big companies to reinforce their position, in which they can distribute information manipulated as they wish, with no regard for traceability or equity. Allowing contributors to also use copyleft licenses would be far more effective to *collect and use different forms of free, trusted knowledge* that *focus efforts on the knowledge and communities that have been left out by structures of power and privilege*, as stated in *Our strategic direction: Service and Equity*.
CC-0 is becoming more and more common.
Just like economic inequality https://en.wikipedia.org/wiki/economic_inequality. But that is not what we are aiming to foster in the Wikimedia movement.
Many organisations are releasing their data under CC-0 and are happy with the experience. Among them are the European Union, Europeana, the National Library of Sweden and the Metropolitan Museum of Art.
Good for them. But they are not the Wikimedia community; they have their own goals and their own plans for sustainability, which our community cannot necessarily follow. Different contexts require different means. States and their institutions can count on tax revenue, and if taxpayers' money ends up funding public domain works, that's great and seems fair.
States are rarely threatened by companies; they have legal levers to put pressure on that kind of entity, although conflicts of interest and lobbying can of course temper this statement. Importing that kind of data with proper attribution and license is fine, be it CC0 or any other free license. But that's not an argument in favour of imposing on volunteers a systematic waiver of all their rights as the only option to contribute.
All this being said we do encourage all re-users of our data to give attribution to Wikidata because we believe it is in the interest of all parties involved.
That's it, zero legal hope of equity.
And our experience shows that many of our re-users do give credit to Wikidata even if they are not forced to.
Experience also shows that some prominent actors like Google no longer credit the Wikimedia community when directly generating answers based on, inter alia, information coming from Wikidata, which is itself laundering the license of Wikipedia data.
Are there no downsides to this? No, of course not. Some people chose not to participate, some data can't be imported and some re-users do not attribute us. But the benefits I have seen over the years for Wikidata and the larger open knowledge ecosystem far outweigh them.
This should at least be backed by some solid statistics showing that it had a positive impact in terms of audience and contribution across the Wikimedia projects as a whole. Maybe the introduction of Wikidata did have a positive effect on the total number of contributors, or maybe so far it has had no significant correlation with it, or maybe it correlates with a decrease in the total number of active contributors. Some plots would be interesting here. Mere personal feelings about benefits and hindrances mean nothing here, mine included of course.
Plus, there has not been even the beginning of an attempt to A/B test with a second Wikibase instance that allows users to select which licenses their contributions are released under, so there is no way to state anything backed by a relevant comparison. The fact that some people are satisfied with the current state of things doesn't mean they would not be even more satisfied with a more equitable solution that allows contributors to choose a set of free licenses for their publications. Moreover, this is all about the sustainability and fostering of our community and about reaching its goals, not about some people's immediate feeling of satisfaction.
[1] Wikipedia Signpost, 2 December 2015 https://en.wikipedia.org/wiki/en:Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed
[2] According to Lydia's next statement
Once again, let me recall that this is not a manifesto against Wikidata. The motivation behind this message is the hope that one day one might participate in Wikidata with the same respect for equity and traceability that is granted in other Wikimedia projects.
With much wikilove, mathieu
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
On 30 November 2017 at 09:16, John Erling Blad jeblad@gmail.com wrote:
Sorry for the sprelling errojs
Post of the year!
However, please will *everyone* trim quoted material from their replies? The OP was extremely long, and I have now received several unnecessary duplicate copies of it.
On 30/11/2017 at 08:57, Luca Martinelli wrote:
I basically stopped reading this email after the first attack on Denny.
That's sad to read, but I guess I must mostly blame my unfortunate wording.
I was there since the beginning, and I do recall the *extensive* discussion about what license to use. CC0 was chosen, among other things, because of the moronic EU rule about database rights, which the CC 3.0 licenses didn't allow us to counter. Please remember that the 4.0 licenses were still under discussion, and we couldn't afford the luxury of waiting for 4.0 to come out before publishing Wikidata.
I welcome any reference to these discussions.
And possibly next time provide a TL;DR version of your email at the top.
Ok, thank you for this suggestion, I'll do that.
Cheers,
L.
Dear Mathieu,
On Wed, Nov 29, 2017 at 10:45 PM, Mathieu Stumpf Guntz <psychoslave@culture-libre.org> wrote:
I forward here the message I initially posted on the Meta Tremendous Wiktionary User Group talk page https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#An_answer_to_Lydia_general_thinking_about_Wikidata_and_CC-0, because I'm interested to have a wider feedback of the community on this point. Whether you think that my view is completely misguided or that I might have a few relevant points, I'm extremely interested to know it, so please be bold.
Having contributed to many open databases, and as a user of many open databases, I have made CCZero my default choice for making data open. Adoption of this license is, IMHO, the prime reason Wikidata is growing so fast and being integrated so fast into many use cases. License incompatibilities have been a major concern in open source development and academic research. Yes, there too, there is a continuous, almost-religious and unresolved discussion about copyleft, but the plain experience there is that the closer to the idea of public domain, the easier it is to use. The advantages of CCZero have been widely discussed in the life sciences, and while it is not everyone's choice, the benefits outweigh the disadvantages for many. I also note that public domain (which CCZero formalizes across jurisdictions) is still the "ideal" license when uploading images to Wikimedia, suggesting more of Wikimedia actually finds the CCZero idea very welcome.
I also stress that in no way do I recognize myself in your comments about Denny and Google. And your comment that "freedom of one is murder and slavery of others" needs some refinement, IMHO; my definition of "freedom" is quite different and I experience your definition as abusive and offensive.
The CCZero license of Wikidata is essential to my contributions and use of Wikimedia products. The chemistry knowledge in Wikidata is 100x more useful (to me) than that in Wikipedia etc. That is in part because of the machine readability, but also, to a large part, because of the choice of CCZero.
I hope this helps,
with kind regards,
Egon
On 30/11/2017 at 10:13, Egon Willighagen wrote:
Having contributed to many open databases, and as a user of many open databases, I have CCZero as my default choice for making data open. Adoption of this license is, IMHO, the prime reason Wikidata is growing so fast and being integrated so fast into many use cases.
Well, that would indeed be a huge point in favor of CC0 then. Unfortunately, I'm not aware of any way to turn that into a measurable analysis, as too many factors might coincide here. However, since you are a contributor to many open databases, maybe you are aware of some studies on the subject which can back up your opinion.
License incompatibilities have been a major concern in open source development and academic research. Yes, there too, there is a continuous, almost-religious and unresolved discussion about copylefting, but the plain experience there is that the closer to the idea of public domain, the easier it is to use. The advantages of CCZero have been widely discussed in the life sciences, and while it is not everyone's choice, the benefits outweigh the disadvantages for many.
Well, surely my message doesn't help to make it obvious, but I'm not radically against CC0, and I don't deny that it has huge advantages for reuse. As an example, I already mentioned CC0/public domain for works published by state institutions. This is something I am completely in favour of and will defend and promote any time I can.
I also note that public domain (which CCZero formalizes across jurisdictions) is still the "ideal" license when uploading images to Wikimedia, suggesting that more of Wikimedia actually finds the CCZero idea very welcome.
I'm not sure what you mean here. If you are talking about things like the pictures that NASA releases, I think that falls under the case described above. If you are speaking of the license most used on Wikimedia by volunteer contributors, I'm not aware of the statistics on this topic, but would be interested to have some.
I also stress that in no way do I recognize myself in your comments about Denny and Google.
I guess that is to your credit.
And your comment that "freedom of one is murder and slavery of others" needs some refinement, IMHO; my definition of "freedom" is quite different and I experience your definition as abusive and offensive.
If you mean "freedom of one begins where it confirms freedom of others", it's not "my" definition, although I cannot give proper credit for it. Maybe Joseph Déjacque was among the first to publish it, with some variation in the exact formulation. But really, this is not "my" definition. Also, it is of course not the ultimate definition of freedom that everybody has to agree with.
If you are talking about the more dramatic example of "freedom abuse" I provided next to this definition, as far as I'm aware it's more or less my own invention, although it was probably somewhat influenced by a comment by Teofilo[1].
Suggestions of less dramatic examples which illustrate the point just as well are welcome.
[1] https://meta.wikimedia.org/wiki/Talk:Wikidata#Teofilo
The CCZero license of Wikidata is essential to my contributions and use of Wikimedia products. The chemistry knowledge in Wikidata is 100x more useful (to me) than that in Wikipedia etc. That is in part because of the machine readability, but also to a large part because of the choice of CCZero.
I hope this helps,
with kind regards,
Egon
-- E.L. Willighagen Department of Bioinformatics - BiGCaT Maastricht University (http://www.bigcat.unimaas.nl/) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers ORCID: 0000-0001-7542-0286 ImpactStory: https://impactstory.org/u/egonwillighagen
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Dear Mathieu,
On Thu, Nov 30, 2017 at 2:28 PM, mathieu stumpf guntz psychoslave@culture-libre.org wrote:
On 30/11/2017 at 10:13, Egon Willighagen wrote:
On Wed, Nov 29, 2017 at 10:45 PM, Mathieu Stumpf Guntz psychoslave@culture-libre.org wrote:
Having contributed to many open databases, and as a user of many open databases, I have CCZero as my default choice for making data open. Adoption of this license is, IMHO, the prime reason Wikidata is growing so fast and being integrated so fast into many use cases.
Well, that would indeed be a huge point in favor of CC0 then. Unfortunately, I'm not aware of any way to turn that into a measurable analysis, as too many factors might coincide here. However, since you are a contributor to many open databases, maybe you are aware of some studies on the subject which can back up your opinion.
Generally for open projects, the impact is hard to measure. It's not as simple as determining the sales.
An overview of reuse and adoption by independent projects is, for me, the most important measure. For Wikipedia, for example, Google shows it prominently in its search results, and students around the world frequently use it as a first source to get an overview of a topic.
For Wikidata this is not as established, but I would look at the collaborations: other databases that have adopted Wikidata Q-numbers as identifiers, for example, like we did in WikiPathways and, less domain-specifically, OpenStreetMap, if I am not mistaken. Those collaborations are a good indication of success: projects invest time in adopting it, and would not do so if they did not expect a "return on investment".
I also note that public domain (which CCZero formalizes across jurisdictions) is still the "ideal" license when uploading images to Wikimedia, suggesting that more of Wikimedia actually finds the CCZero idea very welcome.
I'm not sure what you mean here. If you are talking about things like the pictures that NASA releases, I think that falls under the case described above. If you are speaking of the license most used on Wikimedia by volunteer contributors, I'm not aware of the statistics on this topic, but would be interested to have some.
My point was that the impression I get when uploading media is that the more liberal the license, the happier Wikimedia is about it.
Egon
Hi,
I did not read your whole argument, but as a collection of brute facts, it is hard to see how the content of Wikidata could be anything other than public domain.
As a whole, the database could be covered by a sui generis database right (https://en.wikipedia.org/wiki/Sui_generis_database_right), but individual contributors would not hold rights under this scheme as they do in the Wikipedia case.
Xavier Combelle
On 29/11/2017 at 22:45, Mathieu Stumpf Guntz wrote:
Saluton ĉiuj,
I forward here the message I initially posted on the Meta Tremendous Wiktionary User Group talk page https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#An_answer_to_Lydia_general_thinking_about_Wikidata_and_CC-0, because I'm interested in getting wider feedback from the community on this point. Whether you think that my view is completely misguided or that I might have a few relevant points, I'm extremely interested in knowing it, so please be bold.
Before you consider digging further into this reading, keep in mind that I remain convinced that Wikidata is a wonderful project and I wish it a bright future full of even more amazing things than what it has already brought so far. My sole concern is really a license issue.
Below is a copy/paste of the above linked message:
Thank you Lydia Pintscher https://meta.wikimedia.org/wiki/User:Lydia_Pintscher_%28WMDE%29 for taking the time to answer. Unfortunately this answer https://www.wikidata.org/wiki/User:Lydia_Pintscher_%28WMDE%29/CC-0 misses too many important points to resolve all the concerns which have been raised.
Notably, there is still not the beginning of a hint in it about where the decision to use CC0 exclusively for Wikidata came from. But as this inquiry on the topic https://en.wikiversity.org/wiki/fr:Recherche:La_licence_CC-0_de_Wikidata,_origine_du_choix,_enjeux,_et_prospections_sur_les_aspects_de_gouvernance_communautaire_et_d%E2%80%99%C3%A9quit%C3%A9_contributive advances, an answer is emerging from it. It seems that Wikidata's choice of CC0 was heavily influenced by Denny Vrandečić, who – to make it short – is now working in the Google Knowledge Graph team. It is also worth noting that Google funded a quarter of the initial development work. Another quarter came from the Gordon and Betty Moore Foundation, established by an Intel co-founder. And half the money came from Microsoft co-founder Paul Allen's Institute for Artificial Intelligence (AI2)[1] https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-1. To state it briefly in a conspiratorial fashion, Wikidata is the puppet Trojan horse of big tech hegemonic companies into the realm of Wikimedia. For a less dramatic, more argued version, please see the research project (work in progress, only chapter 1 is in good enough shape, and it's only available in French so far). Any proof that this claim is completely wrong is welcome, as it would be great if in fact the community was the driving force behind this single-license choice and if it is the best choice for its future, not the future of giant tech companies. It would be a great contribution to shed such a happy light on this subject, so we can all leave this issue alone and go back to contributing on more interesting topics.
Now let's examine the thoughts proposed by Lydia.
"Wikidata is here to give more people more access to more knowledge."
So far, this matches the Wikimedia movement's stated goal.
"This means we want our data to be used as widely as possible."
Sure, as long as it rhymes with equity, as in /Our strategic direction: Service and *Equity*/ https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction/Endorsement#Our_strategic_direction:_Service_and_Equity. Just as we want freedom for everybody as widely as possible, that is, starting where it confirms each other's freedom. Because below this level, the freedom of one is the murder and slavery of others.
"CC-0 is one step towards that."
That's a thesis; you can propose to defend it, but no one has to agree without some convincing proof.
"Data is different from many other things we produce in Wikimedia in that it is aggregated, combined, mashed-up, filtered, and so on much more extensively."
No, it's not. From a data processing point of view, everything is data. Whether it's stored in wikisyntax, in a relational database or engraved in stone is only a matter of convenience. Whether it's a random stream of bits generated by a dumb chipset or some encoded prose of Shakespeare makes no difference. So from this point of view, no, what Wikidata stores is not different from what is produced anywhere else in Wikimedia projects. Sure, the way it's structured does greatly ease many things. But this is not because it's data while elsewhere there would be no data. It's because it enforces that data be stored in a way that eases aggregation, combination, mashing-up, filtering and so on.
"Our data lives from being able to write queries over millions of statements, putting it into a mobile app, visualizing parts of it on a map and much more."
Sure. It also lives from being curated by millions[2] https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-2 of volunteer contributors, or it would be just a useless pile of random bytes.
"This means, if we require attribution, in a huge number of cases attribution would need to go back to potentially millions of editors and sources (even if that data is not visible in the end result but only helped to get the result)."
No, it doesn't mean that. First, let's recall a few basics, as the whole answer seems to confuse attribution with distribution of contributions under the same license as the original. Attribution is crucial for traceability, and so for the reliable and trusted knowledge that we are targeting within the Wikimedia movement. The "same license" is the sole legal guarantee of equity that contributors have. That's it: trusted knowledge and equity are requirements for the Wikimedia movement's goals. That means withdrawing these requirements is withdrawing these goals.
Now, what would be the additional cost of storing sources in Wikidata? Well, zero cost. Actually, it's already there, as the "reference" attribute is part of the Wikibase item structure. So attribution is not a problem: you don't have to put it in front of your derived work. Just look at a Wikipedia article: until you go to the history, you have zero attribution visible, and that's OK. It also probably has zero or negligible computing cost, as it doesn't have to be included in all computations; it just needs to be retrievable on demand.
What would be the additional cost of storing licenses for each item based on its source? Well, adding a license attribute might help, but actually if your reference is a work item, I guess it might come with a "license" statement, so zero additional cost. Now, as for letting users specify under which free licenses they publish their work, that would just require an additional attribute, a negligible weight when balanced against the equity concerns it resolves. Could that prevent some uses for some actors? Yes, that's actually the point: preventing abuse by those who don't want to act equitably. For all other actors, "distribute under the same conditions" is fine.
"This is potentially computationally hard to do and, depending on where the data is used, very inconvenient (think of a map with hundreds of data points in a mobile app)."
OpenStreetMap, which uses ODbL, a copyleft license requiring attribution, does exactly that too, doesn't it? By the way, allowing a license per item would make it possible to include OpenStreetMap data in Wikidata, which is currently impossible due to the project's CC0 single-license policy. Too bad, it could be so useful to have this data accessible to Wikimedia projects, but who cares?
"This is a burden on our re-users that I do not want to impose on them."
Wait, which re-users? Surely one might expect that Wikidata would care first about re-users who are in phase with Wikimedia's goals, so surely the needs of the Wikimedia community in particular and Free/Libre Culture in general should be considered. Would these re-users be penalized by a copyleft license? Surely not, or they wouldn't use it as extensively as they do. So who are these re-users whom it is thought preferable, without consulting the community, not to bother with questions of equity and traceability?
"It would make it significantly harder to re-use our data and be in direct conflict with our goal of spreading knowledge."
No, technically it would be just as easy as pressing one button on a computer rather than another. What is in direct conflict with our clearly stated goals emerging from the 2017 community consultation is going against equity and traceability. You propose to discard both to satisfy exogenous demands which should have next to no weight in decisions impacting so deeply the future of our community.
"Whether data can be protected in this way at all or not depends on the jurisdiction we are talking about. See this Wikilegal on database rights https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights for more details."
It says basically that it's applicable in the United States and Europe on different legal bases and to different extents. And for the rest of the world, it doesn't say that nothing can apply; it states nothing.
"So even if we would have decided to require attribution it would only be enforceable in some jurisdictions."
What kind of logic is that? It might not be applicable in some countries, so let's withdraw the few rights we have.
"Ambiguity, when it comes to legal matters, also unfortunately often means that people refrain from what they want to do for fear of legal repercussions. This is directly in conflict with our goal of spreading knowledge."
Economic inequality, social inequity and legal imbalance might also keep people from doing what they want, as they fear practical repercussions. CC0 strengthens these discrimination factors by forcing people to give up the few rights they have to weigh against the growing asymmetry that social structures are concomitantly building. So CC0 as the only license choice is in direct conflict with our goal of *equitably* spreading knowledge.
Also, this statement seems to suggest that releasing our contributions only under CC0 is the sole solution to diminish legal doubts. Actually, any well-written license would do an equally good job on this point, including many copyleft licenses out there. So while associating a clear license with each data item might indeed diminish legal uncertainty, it's not an argument at all for enforcing CC0 as the sole license available to contributors.
Moreover, just putting a license side by side with a work does not ensure that the person who made the association was legally allowed to do so. To have better confidence in the legitimacy of a statement that a work is covered by a certain license, there is once again a traceability requirement. For example, Wikidata currently includes many items which were imported from various Wikipedia versions, and claims that the derived work obtained – a set of items and statements – is under CC0. That is a hugely doubtful statement and it looks alarmingly like license laundering https://en.wikipedia.org/wiki/license_laundering. This is true for Wikipedia, but it's also true for any source on which large-scale extraction and import are operated, whether through bots or crowdsourcing. So the Wikidata project is currently in an extremely poor position to give lessons on legal ambiguity, as it heavily plays with legal blur and the hope that its shady practices won't fall under too much scrutiny.
"Licenses that require attribution are often used as a way to try to make it harder for big companies to profit from openly available resources."
No, they are not. They are used as /a way to try to make it harder for big companies to profit from openly available resources/ *in inequitable manners*. That's completely different. Copyleft licenses give the same rights to big companies and individuals in a manner that lowers the socio-economic inequalities which disproportionately advantage the former.
"The thing is there seems to be no indication of this working."
Because they are not trying to enforce what you claim, so of course they are not working toward that goal. But for the goal that copyleft licenses aim at, there is clear evidence that yes, it works.
"Big companies have the legal and engineering resources to handle both the legal minefield and the technical hurdles easily."
There is no pitfall in copyleft licenses. Using a weapons-of-war analogy is disrespectful. It's true that copyleft licenses might come with some constraints that non-copyleft free licenses don't have, but that's the price of fostering equity. And it's a low price that even individuals can manage: it might require a little extra time on legal considerations, but on the other hand being able to use the free work is an immense gain that is worth it. In Why you shouldn't use the Lesser GPL for your next library https://www.gnu.org/licenses/why-not-lgpl.html it is stated that /proprietary software developers have the advantage of money; free software developers need to make advantages for each other/. This might be generalised as /big companies have the advantage of money; free/libre culture contributors need to make advantages for each other/. So, contrary to what these fallacious claims against copyleft licenses assert, they are not a "minefield and the technical hurdles" that only big companies can handle. All the more so when we recall who financed the initial development of Wikidata: only actors related to big companies.
"Who it is really hurting is the smaller start-up, institution or hacker who can not deal with it."
If this statement is about copyleft licenses, then it is just plainly false. Smaller actors have more to gain from preserving the mutual benefit of the common ecosystem that a copyleft license fosters.
"With Wikidata we are making structured data about the world available for everyone."
And that's great. But that doesn't require CC0 as the sole license to be achieved.
"We are leveling the playing field to give those who currently don’t have access to the knowledge graphs of the big companies a chance to build something amazing."
And that's great. But that doesn't require CC0 as the sole license. Actually CC0 makes it a less sustainable project on this point, as it allows unfair actors to take it all, add some interesting value that our community cannot afford, and reach or reinforce a hegemonic position in the ecosystem with their own closed solution. And, ta-da, Wikidata can be discontinued quietly, just as Google did with the defunct Freebase, which was CC-BY-SA before they bought the company that was running it, and which they afterwards imported under CC0 into Wikidata as a new attempt to gather a larger community of free curators. And when it has performed license laundering of all Wikimedia projects' works through shady mass extraction and import, Wikimedia can disappear as well. Of course big companies benefit more from these possibilities than actors with smaller financial support and no hegemonic position.
"Thereby we are helping more people get access to knowledge from more places than just the few big ones."
No, with CC0 you are certainly helping big companies to reinforce their position, in which they can distribute information manipulated as they wish, without regard for traceability and equity. Allowing contributors to also use copyleft licenses would be far more effective to /collect and use different forms of free, trusted knowledge/ that /focus efforts on the knowledge and communities that have been left out by structures of power and privilege/, as stated in /Our strategic direction: Service and Equity/.
"CC-0 is becoming more and more common."
Just like economic inequality https://en.wikipedia.org/wiki/economic_inequality. But that is not what we are aiming to foster in the Wikimedia movement.
"Many organisations are releasing their data under CC-0 and are happy with the experience. Among them are the European Union, Europeana, the National Library of Sweden and the Metropolitan Museum of Modern Arts."
Good for them. But they are not the Wikimedia community; they have their own goals and plans for sustainability, which do not necessarily match what our community can follow. Different contexts require different means. States and their institutions can count on tax revenue, and if taxpayers end up with public domain works, that's great and seems fair. States are rarely threatened by companies; they have legal levers to pressure that kind of entity, although conflicts of interest and lobbying can of course qualify this statement. Importing that kind of data with proper attribution and license is fine, be it CC0 or any other free license. But that's not an argument in favour of imposing on volunteers a systematic waiver of all their rights as the only way to contribute.
"All this being said we do encourage all re-users of our data to give attribution to Wikidata because we believe it is in the interest of all parties involved."
That's it: zero legal hope of equity.
"And our experience shows that many of our re-users do give credit to Wikidata even if they are not forced to."
Experience also shows that some prominent actors like Google no longer credit the Wikimedia community when directly generating answers based on, inter alia, information coming from Wikidata, which is itself performing license laundering of Wikipedia data.
"Are there no downsides to this? No, of course not. Some people chose not to participate, some data can't be imported and some re-users do not attribute us. But the benefits I have seen over the years for Wikidata and the larger open knowledge ecosystem far outweigh them."
This should at least be backed by some solid statistics showing that it had a positive impact in terms of audience and contribution in Wikimedia projects as a whole. Maybe the introduction of Wikidata did have a positive effect on the total number of contributors, or maybe so far it has no significant correlation with it, or maybe it correlates with a decrease in the total number of active contributors. Some plots would be interesting here. Mere personal feelings of benefits and hindrances mean nothing here, mine included of course. Plus, there has not even been the beginning of an attempt at an A/B test with a second Wikibase instance that allows users to select which licenses their contributions are released under, so there is no possible way to state anything backed by a relevant comparison. The fact that some people are satisfied with the current state of things doesn't mean they would not be even more satisfied with a more equitable solution that allows contributors to choose a set of free licenses for their publications. All the more so as this is all about the sustainability and fostering of our community and reaching its goals, not the immediate feeling of satisfaction of some people.
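To make the "retrievable on demand" argument above concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Statement` class, the `query` and `attribution` helpers, the sample sources and the `license_id` field are hypothetical and are not the Wikibase data model or any Wikidata API. It only shows that a query can ignore provenance entirely, while a separate step lists the distinct sources and licenses behind a result set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One claim about an item, with hypothetical per-statement metadata."""
    item: str        # Q-number-style identifier (made up for this example)
    prop: str        # property label (illustrative, not a real property ID)
    value: str
    source: str      # where the claim comes from (the "reference")
    license_id: str  # license under which the claim was contributed


def query(statements, prop):
    """Return the matching statements; provenance plays no part here."""
    return [s for s in statements if s.prop == prop]


def attribution(results):
    """Collect the distinct (source, license) pairs behind a result set."""
    return sorted({(s.source, s.license_id) for s in results})


# Invented sample data: two sources under two different licenses.
STATEMENTS = [
    Statement("Q1", "located in", "Europe", "Source A", "CC0-1.0"),
    Statement("Q2", "located in", "Asia", "Source B", "ODbL-1.0"),
    Statement("Q3", "located in", "Europe", "Source B", "ODbL-1.0"),
    Statement("Q1", "population", "100", "Source A", "CC0-1.0"),
]

results = query(STATEMENTS, "located in")
credits = attribution(results)
for source, license_id in credits:
    print(f"{source} ({license_id})")
```

In this toy case three statements collapse into two credit lines, because credits are grouped by source rather than by data point. Whether such a grouped notice would satisfy a given license is a legal question the sketch does not address.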
[1] Wikipedia Signpost, 2 December 2015 https://en.wikipedia.org/wiki/en:Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed
[2] According to Lydia's next statement
Once again, let me recall that this is not a manifesto against Wikidata. The motivation behind this message is the hope that one day one might participate in Wikidata with the same respect for equity and traceability that is granted in other Wikimedia projects.
Kun multe da vikiamo, mathieu
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l New messages to: Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe
Hi Mathieu,
I understand you care a lot about this topic and are posting about it in many places but I have a personal rule that a lot of the people in Wikidata know. I am willing to discuss and explain basically anything on a calm and rational basis. (And I did this on-wiki I believe.) The rule is simple: The more loud, aggressive and pushy someone gets about a topic the less likely I am to engage. This rule has a simple reason: I don't want Wikidata to get into a spiral of shouting. If we do this people get into the mode where only if they shout they get heard so they shout all the time. This is toxic for a community. So I fear I can't contribute to this thread beyond this message.
Cheers Lydia
Wikidata is not replacing Wiktionary. Wikidata did not replace Wikipedia, and force all articles to be under CC-0. Structured data for Commons doesn't replace all Commons media with CC-0-licensed content. They didn't even set up parallel projects to hold CC-0 articles or media. There is no reason to believe that structured data for Wiktionary would do any of these things. Wikidata is for holding structured data, and only structured data.
The fact that France is in Europe is not, independently, copyrightable. The fact that File:Vanessa_indica-Silent_Valley-2016-08-14-002.jpg is a picture of a butterfly is not copyrightable. The facts that "balloons" is the plural of "balloon", and that "feliĉiĝi" is an intransitive verb in Esperanto, are not copyrightable. Even if they were copyrightable, copyrighting them independently would harm their potential reuse, as elements of a database, as has been previously explained.
A Wikipedia article is copyrightable. Licensing it under CC-BY-SA does not particularly harm its reuse, and makes it so that reuse can happen with attribution. Wikidata includes links to Wikipedia articles, and while the links are under CC-0, the linked content is under CC-BY-SA. Similarly for Commons content. Wikipedia articles and Commons Media are not structured data, and as such, they do not belong in Wikidata.
Elements of prose in Wiktionary, such as definitions, appendices, extensive usage notes and notes on grammar and whatnot, are copyrightable. Similar to Wikipedia articles, licensing them under CC-BY-SA would not particularly harm their reuse, as attribution is completely feasible. They are also not structured data, and can not be made into structured data. Wikidata will not be laundering this data to CC-0, nor will it be setting up a parallel project to duplicate the efforts under a license which is not appropriate for the type of content.
Attempting to license the database's contents under CC-BY-SA would not ensure attribution, and would harm reuse. I fail to see any potential benefits to using the more restrictive license. Attribution will be required where it is possible (in Wiktionary proper), and content will be as reusable as possible in areas where requiring attribution isn't feasible (in Wikidata). There's no real conflict here.
-- Yair Rand
2017-11-29 16:45 GMT-05:00 Mathieu Stumpf Guntz < psychoslave@culture-libre.org>:
Saluton ĉiuj,
I forward here the message I initially posted on the Meta Tremendous Wiktionary User Group talk page https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#An_answer_to_Lydia_general_thinking_about_Wikidata_and_CC-0, because I'm interested to have a wider feedback of the community on this point. Whether you think that my view is completely misguided or that I might have a few relevant points, I'm extremely interested to know it, so please be bold.
Before you consider digging further in this reading, keep in mind that I stay convinced that Wikidata is a wonderful project and I wish it a bright future full of even more amazing things than what it already brung so far. My sole concern is really a license issue.
Bellow is a copy/paste of the above linked message:
Thank you Lydia Pintscher https://meta.wikimedia.org/wiki/User:Lydia_Pintscher_%28WMDE%29 for taking the time to answer. Unfortunately this answer https://www.wikidata.org/wiki/User:Lydia_Pintscher_%28WMDE%29/CC-0 miss too many important points to solve all concerns which have been raised.
Notably, there is still no beginning of hint in it about where the decision of using CC0 exclusively for Wikidata came from. But as this inquiry on the topic https://en.wikiversity.org/wiki/fr:Recherche:La_licence_CC-0_de_Wikidata,_origine_du_choix,_enjeux,_et_prospections_sur_les_aspects_de_gouvernance_communautaire_et_d%E2%80%99%C3%A9quit%C3%A9_contributive advance, an answer is emerging from it. It seems that Wikidata choice toward CC0 was heavily influenced by Denny Vrandečić, who – to make it short – is now working in the Google Knowledge Graph team. Also it worth noting that Google funded a quarter of the initial development work. Another quarter came from the Gordon and Betty Moore Foundation, established by Intel co-founder. And half the money came from Microsoft co-founder Paul Allen's Institute for Artificial Intelligence (AI2)[1] https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-1. To state it shortly in a conspirational fashion, Wikidata is the puppet trojan horse of big tech hegemonic companies into the realm of Wikimedia. For a less tragic, more argumentative version, please see the research project (work in progress, only chapter 1 is in good enough shape, and it's only available in French so far). Some proofs that this claim is completely wrong are welcome, as it would be great that in fact that was the community that was the driving force behind this single license choice and that it is the best choice for its future, not the future of giant tech companies. This would be a great contribution to bring such a happy light on this subject, so we can all let this issue alone and go back contributing in more interesting topics.
Now let's examine the thoughts proposed by Lydia. Wikidata is here to give more people more access to more knowledge. So far, it makes it matches Wikimedia movement stated goal. This means we want our data to be used as widely as possible. Sure, as long as it rhymes with equity. As in *Our strategic direction: Service and **Equity* https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction/Endorsement#Our_strategic_direction:_Service_and_Equity. Just like we want freedom for everybody as widely as possible. That is, starting where it confirms each others freedom. Because under this level, freedom of one is murder and slavery of others. CC-0 is one step towards that. That's a thesis, you can propose to defend it but no one have to agree without some convincing proof. Data is different from many other things we produce in Wikimedia in that it is aggregated, combined, mashed-up, filtered, and so on much more extensively. No it's not. From a data processing point of view, everything is data. Whether it's stored in a wikisyntax, in a relational database or engraved in stone only have a commodity side effect. Whether it's a random stream of bit generated by a dumb chipset or some encoded prose of Shakespeare make no difference. So from this point of view, no, what Wikidata store is not different from what is produced anywhere else in Wikimedia projects. Sure, the way it's structured does extremely ease many things. But this is not because it's data, when elsewhere there would be no data. It's because it enforce data to be stored in a way that ease aggregation, combination, mashing-up, filtering and so on. Our data lives from being able to write queries over millions of statements, putting it into a mobile app, visualizing parts of it on a map and much more. Sure. 
It also lives from being curated by millions[2] https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-2 of volunteer contributors, or it would be just a useless pile of random bytes.

"This means, if we require attribution, in a huge number of cases attribution would need to go back to potentially millions of editors and sources (even if that data is not visible in the end result but only helped to get the result)." No, it doesn't mean that. First let's recall a few basics, as the whole answer seems to confuse attribution with distributing contributions under the same license as the original. Attribution is crucial for traceability, and so for the reliable and trusted knowledge that we are aiming for within the Wikimedia movement. The "same license" condition is the sole legal guarantee of equity that contributors have. That's it: trusted knowledge and equity are requirements for the Wikimedia movement's goals. Withdrawing these requirements means withdrawing these goals.

Now, what would be the additional cost of storing sources in Wikidata? Zero. Actually, it's already there, as the "reference" attribute is part of the Wikibase item structure. So attribution is not a problem: you don't have to put it at the front of your derived work. Just look at a Wikipedia article: until you go to the history, zero attribution is visible, and that's OK. It probably also has zero or negligible computing cost, as it doesn't have to be included in every computation; it just needs to be retrievable on demand. What would be the additional cost of storing licenses for each item based on its source? Adding a license attribute might help, but if your reference is a work item, I guess it may already come with a "license" statement, so zero additional cost.
As for letting users specify under which free licenses they publish their work, that would just require one additional attribute, a negligible weight when balanced against the equity concerns it resolves. Could that prevent some uses by some actors? Yes, that's actually the point: preventing abuse by those who don't want to act equitably. For all other actors, a "distribute under the same conditions" clause is fine.

"This is potentially computationally hard to do and depending on where the data is used very inconvenient (think of a map with hundreds of data points in a mobile app)." OpenStreetMap, which uses the ODbL, a copyleft license requiring attribution, does exactly that too, doesn't it? By the way, allowing a license per item would make it possible to include OpenStreetMap data in Wikidata, which is currently impossible due to the project's CC0-only policy. Too bad, it could be so useful to have this data accessible for Wikimedia projects, but who cares?

"This is a burden on our re-users that I do not want to impose on them." Wait, which re-users? Surely one would expect Wikidata to care first about re-users who are in step with the Wikimedia goal, so the needs of the Wikimedia community in particular and of Free/Libre Culture in general should be considered. Would these re-users be penalized by a copyleft license? Surely not, or they wouldn't use such licenses as extensively as they do. So who are these re-users whom it is thought preferable, without consulting the community, not to bother with questions of equity and traceability?

"It would make it significantly harder to re-use our data and be in direct conflict with our goal of spreading knowledge." No, technically it would be as easy as pressing one button rather than another. What is in direct conflict with our clearly stated goals emerging from the 2017 community consultation is going against equity and traceability.
You propose to discard both to satisfy exogenous demands which should have next to no weight in decisions that so deeply affect the future of our community.

"Whether data can be protected in this way at all or not depends on the jurisdiction we are talking about. See this Wikilegal on database rights https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights for more details." It basically says that it's applicable in the United States and Europe, on different legal bases and to different extents. And for the rest of the world, it doesn't say that nothing can apply; it states nothing.

"So even if we would have decided to require attribution it would only be enforceable in some jurisdictions." What kind of logic is that? It might not be applicable in some countries, so let's give up the few rights we have.

"Ambiguity, when it comes to legal matters, also unfortunately often means that people refrain from what they want to do for fear of legal repercussions. This is directly in conflict with our goal of spreading knowledge." Economic inequality, social inequity and legal imbalance might also keep people from doing what they want, as they fear practical repercussions. CC0 strengthens these discrimination factors by forcing people to give up the few rights they have to weigh against the growing asymmetry that social structures are concomitantly building. So CC0 as the only license choice is in direct conflict with our goal of *equitably* spreading knowledge. This statement also seems to suggest that releasing our contributions only under CC0 is the sole way to reduce legal doubts. Actually any well-written license would do an equal job on this point, including many copyleft licenses out there. So while associating a clear license with each data item might indeed reduce legal uncertainty, it is no argument at all for enforcing CC0 as the sole license available to contributors.
Moreover, just putting a license next to a work does not ensure that the person who made the association was legally allowed to do so. To have better confidence in the legitimacy of a statement that a work is covered by a certain license, there is once again a traceability requirement. For example, Wikidata currently includes many items which were imported from various Wikipedia versions, and claims that the derived work obtained (a set of items and statements) is under CC0. That is a hugely doubtful statement and it looks alarmingly like license laundering https://en.wikipedia.org/wiki/license_laundering. This is true for Wikipedia, but also for any source subjected to large-scale extraction and import, whether through bots or crowdsourcing. So the Wikidata project is currently extremely badly placed to give lessons on legal ambiguity, as it relies heavily on legal blur and on the hope that its shady practices won't come under too much scrutiny.

"Licenses that require attribution are often used as a way to try to make it harder for big companies to profit from openly available resources." No, they are not. They are used as *a way to try to make it harder for big companies to profit from openly available resources* *in inequitable manners*. That's completely different. Copyleft licenses give the same rights to big companies and individuals, in a manner that reduces the socio-economic inequalities which disproportionately advantage the former.

"The thing is there seems to be no indication of this working." Because they are not trying to enforce what you claim, so of course they don't work for that goal. But for the goal that copyleft licenses do aim at, there is clear evidence that yes, they work.

"Big companies have the legal and engineering resources to handle both the legal minefield and the technical hurdles easily." There is no pitfall in copyleft licenses. Using a weapons-of-war analogy is disrespectful.
It's true that copyleft licenses might come with some constraints that non-copyleft free licenses don't have, but that's the price of fostering equity. And it's a low price that even individuals can manage: it might require a little extra time on legal considerations, but in return being able to use the free work is an immense gain that is well worth it. In Why you shouldn't use the Lesser GPL for your next library https://www.gnu.org/licenses/why-not-lgpl.html it is stated that *proprietary software developers have the advantage of money; free software developers need to make advantages for each other*. This can be generalised as *big companies have the advantage of money; free/libre culture contributors need to make advantages for each other*. So, contrary to what these fallacious claims against copyleft licenses assert, they are not a "legal minefield" with "technical hurdles" that only big companies can handle. Moreover, let's recall who financed the initial development of Wikidata: only actors related to big companies.

"Who it is really hurting is the smaller start-up, institution or hacker who can not deal with it." If this statement is about copyleft licenses, it is plainly false. Smaller actors have more to gain from preserving the mutual benefit of the common ecosystem that a copyleft license fosters.

"With Wikidata we are making structured data about the world available for everyone." And that's great. But that doesn't require CC0 as the sole license.

"We are leveling the playing field to give those who currently don't have access to the knowledge graphs of the big companies a chance to build something amazing." And that's great. But that doesn't require CC0 as the sole license. Actually CC0 makes it a less sustainable project on this point, as it allows unfair actors to take it all, add some valuable extras that our community cannot afford, and reach or reinforce a hegemonic position in the ecosystem with their own closed solution.
And, ta-da, Wikidata can then be discontinued quietly, just as Google did with the defunct Freebase, which was CC-BY-SA before they bought the company that was running it, and which they afterwards imported into Wikidata under CC0 as a new attempt to gather a larger community of free curators. And once it has performed license laundering of all Wikimedia projects' works through shady mass extraction and import, Wikimedia can disappear as well. Of course big companies benefit more from these possibilities than actors with smaller financial support and no hegemonic position.

"Thereby we are helping more people get access to knowledge from more places than just the few big ones." No, with CC0 you are certainly helping big companies to reinforce their position, in which they can distribute information manipulated as they wish, without regard for traceability and equity. Allowing contributors to also use copyleft licenses would be far more effective to *collect and use different forms of free, trusted knowledge* that *focus efforts on the knowledge and communities that have been left out by structures of power and privilege*, as stated in *Our strategic direction: Service and Equity*.

"CC-0 is becoming more and more common." Just like economic inequality https://en.wikipedia.org/wiki/economic_inequality. But that is not what we aim to foster in the Wikimedia movement.

"Many organisations are releasing their data under CC-0 and are happy with the experience. Among them are the European Union, Europeana, the National Library of Sweden and the Metropolitan Museum of Modern Arts." Good for them. But they are not the Wikimedia community; they have their own goals and their own plans for sustainability, which our community cannot necessarily follow. Different contexts require different means. States and their institutions can count on tax revenue, and if taxpayers' money ends up funding public domain works, that's great and seems fair.
States are rarely threatened by companies; they have legal levers to put pressure on that kind of entity, although conflicts of interest and lobbying can of course qualify this statement. Importing that kind of data with proper attribution and license is fine, be it CC0 or any other free license. But that's not an argument in favour of imposing on volunteers a systematic waiver of all their rights as the only way to contribute.

"All this being said we do encourage all re-users of our data to give attribution to Wikidata because we believe it is in the interest of all parties involved." That's it: zero legal hope of equity.

"And our experience shows that many of our re-users do give credit to Wikidata even if they are not forced to." Experience also shows that some prominent actors like Google no longer credit the Wikimedia community when directly generating answers based on, inter alia, information coming from Wikidata, which is itself performing license laundering of Wikipedia data.

"Are there no downsides to this? No, of course not. Some people chose not to participate, some data can't be imported and some re-users do not attribute us. But the benefits I have seen over the years for Wikidata and the larger open knowledge ecosystem far outweigh them." This should at least be backed with some solid statistics showing that it had a positive impact in terms of audience and contribution across Wikimedia projects as a whole. Maybe the introduction of Wikidata did have a positive effect on the total number of contributors, or maybe so far it has had no significant correlated effect, or maybe it correlates with a decrease in the total number of active contributors. Some plots would be interesting here. Mere personal feelings about benefits and hindrances mean nothing here, mine included of course.
Plus, there has not been even the beginning of an attempt to A/B test with a second Wikibase instance that allows users to select which licenses their contributions are released under, so there is no possible way to state anything backed by a relevant comparison. The fact that there are some people satisfied with the current state of things doesn't mean they would not be even more satisfied with a more equitable solution that allows contributors to choose a set of free licenses for their publications. Moreover, this is all about the sustainability and fostering of our community and reaching its goals, not the immediate satisfaction of some people.
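To make the earlier point about per-statement references and licenses more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Statement` and `Reference` classes, the `license` field, the two helper functions and the sample data are hypothetical, and none of it is the actual Wikibase data model or API. It only shows that attribution can be collected on demand rather than carried through every computation, and that a re-user can filter by license.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Reference:
    """Hypothetical source record attached to a statement."""
    source: str
    license: str


@dataclass
class Statement:
    """Hypothetical statement carrying its own references and license."""
    subject: str
    prop: str
    value: str
    references: List[Reference] = field(default_factory=list)
    # Assumed attribute: a per-statement license is not an existing Wikidata feature.
    license: str = "CC0-1.0"


def attribution_for(statements: Iterable[Statement]) -> List[Tuple[str, str]]:
    """Return the distinct (source, license) pairs in first-seen order.

    Meant to be called only when credits are requested, so queries that
    never display attribution do not have to compute it.
    """
    seen: Dict[Tuple[str, str], None] = {}
    for st in statements:
        for ref in st.references:
            seen.setdefault((ref.source, ref.license), None)
    return list(seen)


def reusable_under(statements: Iterable[Statement], accepted: Set[str]) -> List[Statement]:
    """Keep only the statements whose license the re-user accepts."""
    return [st for st in statements if st.license in accepted]


# Illustrative data only; the source names and licenses are made up.
statements = [
    Statement("France", "continent", "Europe",
              [Reference("source-a", "CC-BY-SA-3.0")], "CC-BY-SA-3.0"),
    Statement("balloon", "plural", "balloons",
              [Reference("source-a", "CC-BY-SA-3.0")], "CC-BY-SA-3.0"),
    Statement("Paris", "country", "France",
              [Reference("source-c", "CC0-1.0")]),
]

credits = attribution_for(statements)
cc0_only = reusable_under(statements, {"CC0-1.0"})
```

In this sketch a map showing hundreds of data points would call `attribution_for` once, for instance for a credits screen, and would get one entry per distinct source rather than one per data point.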
[1] Wikipedia Signpost, 2 December 2015 https://en.wikipedia.org/wiki/en:Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed
[2] According to Lydia's next statement
Once again, let me recall that this is not a manifesto against Wikidata. The motivation behind this message is the hope that one day one might participate in Wikidata with the same respect for equity and traceability that is granted in other Wikimedia projects.
Kun multe da vikiamo, mathieu
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
+1
Wikipedia and Wiktionary themselves rely upon taking "facts, not the way they're stated" from sources.
Vito
2017-11-30 18:05 GMT+01:00 Yair Rand yyairrand@gmail.com:
Wikidata is not replacing Wiktionary. Wikidata did not replace Wikipedia, and force all articles to be under CC-0. Structured data for Commons doesn't replace all Commons media with CC-0-licensed content. They didn't even set up parallel projects to hold CC-0 articles or media. There is no reason to believe that structured data for Wiktionary would do any of these things. Wikidata is for holding structured data, and only structured data.
The fact that France is in Europe is not, independently, copyrightable. The fact that File:Vanessa_indica-Silent_Valley-2016-08-14-002.jpg is a picture of a butterfly is not copyrightable. The facts that "balloons" is the plural of "balloon", and that "feliĉiĝi" is an intransitive verb in Esperanto, are not copyrightable. Even if they were copyrightable, copyrighting them independently would harm their potential reuse, as elements of a database, as has been previously explained.
A Wikipedia article is copyrightable. Licensing it under CC-BY-SA does not particularly harm its reuse, and makes it so that reuse can happen with attribution. Wikidata includes links to Wikipedia articles, and while the links are under CC-0, the linked content is under CC-BY-SA. Similarly for Commons content. Wikipedia articles and Commons Media are not structured data, and as such, they do not belong in Wikidata.
Elements of prose in Wiktionary, such as definitions, appendices, extensive usage notes and notes on grammar and whatnot, are copyrightable. Similar to Wikipedia articles, licensing them under CC-BY-SA would not particularly harm their reuse, as attribution is completely feasible. They are also not structured data, and can not be made into structured data. Wikidata will not be laundering this data to CC-0, nor will it be setting up a parallel project to duplicate the efforts under a license which is not appropriate for the type of content.
Attempting to license the database's contents under CC-BY-SA would not ensure attribution, and would harm reuse. I fail to see any potential benefits to using the more restrictive license. Attribution will be required where it is possible (in Wiktionary proper), and content will be as reusable as possible in areas where requiring attribution isn't feasible (in Wikidata). There's no real conflict here.
-- Yair Rand
2017-11-29 16:45 GMT-05:00 Mathieu Stumpf Guntz < psychoslave@culture-libre.org>:
[1] Wikipedia Signpost, 2 December 2015 https://en.wikipedia.org/wiki/en:Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed
[2] According to Lydia's next statement
Once again, I would remind you that this is not a manifesto against Wikidata. The motivation behind this message is the hope that one day one might participate in Wikidata with the same respect for equity and traceability that is granted in other Wikimedia projects.
Kun multe da vikiamo, mathieu
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
On 30/11/2017 at 18:05, Yair Rand wrote:
Wikidata is not replacing Wiktionary.
We will see about that in the future. At least the proposed model allows including most things that you might find in a Wiktionary article, plus it comes with all the benefits of a relational(-like) database.
See https://www.mediawiki.org/wiki/Extension:WikibaseLexeme/Data_Model for more information on what it will allow or not.
Wikidata did not replace Wikipedia, and force all articles to be under CC-0.
Sure. Not yet. But if it continues to improve, along with the tools that generate prose from it, at some point it might do a good job of just that.
Structured data for Commons doesn't replace all Commons media with CC-0-licensed content.
Well, unless one tries to use it in a very different way from what it is aiming at, there is no chance of that, as pictures contain far more information than their metadata. Now, technically one might probably be able to store the whole picture in that kind of structure (provided no size restriction is enforced), but this is not the goal.
This is a very different case from the Wiktionary case. The case of Wikipedia might be closer, but you cannot make a simple one-to-one correspondence between Wikidata elements and Wikipedia prose. Actually, extracting statements usable in Wikidata from Wikipedia is far easier with current natural language processing toolkits. On the other hand, such a bijective correspondence between a Wiktionary article and a set of WikibaseLexeme elements is clearly straightforward. So the domains of knowledge being documented overlap to a very large extent. Plus the Wikibase approach brings many advantages in terms of knowledge factorisation.
To my mind, WikibaseLexeme has good potential to quickly supersede our plethora of sparsely communicating Wiktionaries. At least far sooner than Wikibase will have a chance to approach the same level as Wikipedia articles.
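To make the claimed correspondence a little more concrete, here is a minimal sketch in Python. It assumes a deliberately simplified dictionary entry; the field names `headword`, `part_of_speech`, `inflections` and `definitions` are hypothetical, and the gloss is a placeholder. The `Lexeme`, `Form` and `Sense` classes only loosely mirror the lemma, language, lexical category, forms and senses described in the WikibaseLexeme data model; they are not the extension's actual API, and real Wiktionary articles are free-form wikitext that carries more than this.

```python
from dataclasses import dataclass, field


@dataclass
class Form:
    representation: str              # spelling of the form, e.g. "balloons"
    grammatical_features: list[str]  # e.g. ["plural"]


@dataclass
class Sense:
    gloss: str                       # short definition text


@dataclass
class Lexeme:
    lemma: str
    language: str
    lexical_category: str
    forms: list[Form] = field(default_factory=list)
    senses: list[Sense] = field(default_factory=list)


def entry_to_lexeme(entry: dict) -> Lexeme:
    """Map a simplified (hypothetical) dictionary entry to a lexeme-like record."""
    return Lexeme(
        lemma=entry["headword"],
        language=entry["language"],
        lexical_category=entry["part_of_speech"],
        forms=[Form(spelling, list(features))
               for spelling, features in entry["inflections"]],
        senses=[Sense(text) for text in entry["definitions"]],
    )


def lexeme_to_entry(lexeme: Lexeme) -> dict:
    """Inverse mapping: rebuild the simplified entry from the lexeme-like record."""
    return {
        "headword": lexeme.lemma,
        "language": lexeme.language,
        "part_of_speech": lexeme.lexical_category,
        "inflections": [(f.representation, list(f.grammatical_features))
                        for f in lexeme.forms],
        "definitions": [s.gloss for s in lexeme.senses],
    }


# Illustrative entry only; the gloss is a placeholder, not Wiktionary content.
balloon_entry = {
    "headword": "balloon",
    "language": "English",
    "part_of_speech": "noun",
    "inflections": [("balloon", ["singular"]), ("balloons", ["plural"])],
    "definitions": ["an inflatable bag filled with gas"],
}
balloon_lexeme = entry_to_lexeme(balloon_entry)
```

The round trip only holds for the structured fields modelled here; whether etymologies, usage notes and other prose map as cleanly is exactly what is under discussion in this thread.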
The fact that France is in Europe is not, independently, copyrightable. The fact that File:Vanessa_indica-Silent_Valley-2016-08-14-002.jpg is a picture of a butterfly is not copyrightable. The facts that "balloons" is the plural of "balloon", and that "feliĉiĝi" is an intransitive verb in Esperanto, are not copyrightable.
Surely that is something we all agree on. :)
Even if they were copyrightable, copyrighting them independently would harm their potential reuse, as elements of a database, as has been previously explained.
Any information monopoly is a possible obstacle to reuse. No one will deny that, I guess. But information monopolies, such as copyright, patents and so on, do exist. And so does unequal access to resources useful for human flourishing, including knowledge.
Now, personally I am not satisfied with this situation, nor with the growth of inequalities. Part of my motivation for contributing to Wikimedia projects is that it might help make the situation evolve differently. That might not be among the motivations of every contributor, but I guess I'm not alone in this.
So the question for me is not, "how do we make the current snapshots of our knowledge banks as reusable as possible right now?", but "how do we build a sustainable movement which maintains and updates knowledge banks that are as accessible as possible for every single human out there, with this goal of sustainability in mind?".
Maybe it's not what every single stakeholder of our movement is expecting. But I don't feel that this personal vision is at odds with what is stated in the strategic direction. And I hope I'm not alone in holding this vision.
Wikipedia articles and Commons Media are not structured data, and as such, they do not belong in Wikidata.
I think your statement is wrong here. Wikipedia articles are structured on several analysable levels. For example, from the point of view of a common linguistic theory, they are structured and analysable on the syntactic, semantic and pragmatic levels. And there are many other ways in which you might analyse them, because they are structured data. It is true, though, that they are not structured in a way that eases SQL-like querying.
However, every single sentence contained in Wikipedia articles can be reduced to a set of predicates, that is, to things that can be stored in Wikidata. There is no technical barrier I'm aware of that prevents putting the whole content of all Wikipedias into as many statements as required within Wikidata.
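As an illustration of what reducing a sentence to predicates could look like, here is a small sketch in Python. The reduction is written by hand, not produced by any natural language processing tool, and the `Statement` class and the predicate labels are hypothetical names chosen for readability, not actual Wikidata property identifiers or Wikibase API calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    value: str


# Hand-made reduction of the sentence:
#   "Paris is the capital of France, a country in Europe."
SENTENCE_STATEMENTS = [
    Statement("Paris", "capital of", "France"),
    Statement("France", "instance of", "country"),
    Statement("France", "located in", "Europe"),
]


def statements_about(statements, subject):
    """Return every statement whose subject matches, in original order."""
    return [s for s in statements if s.subject == subject]


def values_for(statements, subject, predicate):
    """Answer a simple structured query: which values does (subject, predicate) have?"""
    return [s.value for s in statements
            if s.subject == subject and s.predicate == predicate]
```

Once prose is in this shape, a question such as "where is France located?" becomes a lookup like `values_for(SENTENCE_STATEMENTS, "France", "located in")`. What the sketch does not capture is the nuance, ordering and rhetoric of the original sentence.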
Elements of prose in Wiktionary, such as definitions, appendices, extensive usage notes and notes on grammar and whatnot, are copyrightable. Similar to Wikipedia articles, licensing them under CC-BY-SA would not particularly harm their reuse, as attribution is completely feasible. They are also not structured data, and can not be made into structured data.
Well, as far as I'm concerned, it would be great news to hear that the Wikidata team will indeed allow contributors to include this CC-BY-SA material in the Wikibase instance, namespace, or whatever place these lexicological items will be stored in, rather than enforcing contribution under CC0 here too. But so far, the statements made by the Wikidata team point in the exact opposite direction, that is, using CC0 for everything.
Wikidata will not be laundering this data to CC-0, nor will it be setting up a parallel project to duplicate the efforts under a license which is not appropriate for the type of content.
I hope the future will prove you right.
Attempting to license the database's contents under CC-BY-SA would not ensure attribution, and would harm reuse. I fail to see any potential benefits to using the more restrictive license. Attribution will be required where it is possible (in Wiktionary proper), and content will be as reusable as possible in areas where requiring attribution isn't feasible (in Wikidata). There's no real conflict here.
I hope my answer made this conflict more obvious, as well as showing how "more reusable right now" might rhyme with "less equity and accessibility of knowledge in the long term".
-- Yair Rand
Mathieu, Lydia and All,
As a further clarification:
I just looked up Wikipedia's license at the bottom of https://www.wikipedia.org/ and it says it's CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0/), which allows for commercial use.
Wikidata.org's is CC-0 (CC0 1.0 Universal), which also allows for commercial use.
Wiktionary doesn't seem to list a license on its front page - https://www.wiktionary.org/ .
(By way of comparison, both MIT OCW and MIT OCW Translated courses, which now seem to number 4, having recently lost Portuguese and Persian, use a CC BY-NC-SA 4.0 International license: https://ocw.mit.edu/ https://ocw.mit.edu/courses/translated-courses/ https://creativecommons.org/licenses/by-nc-sa/4.0/
Noncommercial means: "The NonCommercial (“NC”) element is found in three of the six CC licenses: BY-NC, BY-NC-SA, and BY-NC-ND. In each of these licenses, NonCommercial is expressly defined as follows: “NonCommercial means not primarily intended for or directed towards commercial advantage or monetary compensation.”" (NonCommercial interpretation - Creative Commons, Oct 15, 2017, https://wiki.creativecommons.org/wiki/NonCommercial_interpretation))
(World University and School donated itself to Wikidata in 2015, but since WUaS is CC-4 MIT OpenCourseWare-centric in 5 languages, WUaS obviously doesn't donate CC MIT OCW).
Here's more about CC licenses: https://creativecommons.org/licenses/
Are there ways that Wikidata or the Wikimedia Foundation might further develop the Wikidata CC-0 license in conversation with the Creative Commons organization itself (as an alternative to license laundering or license migration over time)?
What kind of license is Wiktionary, as a Wikipedia/Wikidata sister project, likely to list on its front page in the future, especially given its relevance for a universal translator, and for Wikimedia's Content Translation?
I'm grateful so much thought has gone into these CC licenses - and that there are such a variety of them, some explicitly international.
Cheers, Scott CC-? World University and School https://wiki.worlduniversityandschool.org/wiki/Nation_States
On Thu, Nov 30, 2017 at 1:17 PM, mathieu stumpf guntz < psychoslave@culture-libre.org> wrote:
On 30/11/2017 at 18:05, Yair Rand wrote:
Wikidata is not replacing Wiktionary.
We will see about that in the future. At least the proposed model allows including most things that you might find in a Wiktionary article, plus it comes with all the benefits of a relational(-like) database.
See https://www.mediawiki.org/wiki/Extension:WikibaseLexeme/Data_Model for more information on what it will allow or not.
Wikidata did not replace Wikipedia, and force all articles to be under CC-0.
Sure. Not yet. But if it continues to improve, along with the tools that generate prose from it, at some point it might do a good job of just that.
Structured data for Commons doesn't replace all Commons media with CC-0-licensed content.
Well, unless one tries to use it in a very different way from what it is aiming at, there is no chance of that, as pictures contain far more information than their metadata. Now, technically one might probably be able to store the whole picture in that kind of structure (provided no size restriction is enforced), but this is not the goal.
This is a very different case from the Wiktionary case. The case of Wikipedia might be closer, but you cannot make a simple one-to-one correspondence between Wikidata elements and Wikipedia prose. Actually, extracting statements usable in Wikidata from Wikipedia is far easier with current natural language processing toolkits. On the other hand, such a bijective correspondence between a Wiktionary article and a set of WikibaseLexeme elements is clearly straightforward. So the domains of knowledge being documented overlap to a very large extent. Plus the Wikibase approach brings many advantages in terms of knowledge factorisation.
To my mind, WikibaseLexeme has good potential to quickly supersede our plethora of sparsely communicating Wiktionaries. At least far sooner than Wikibase will have a chance to approach the same level as Wikipedia articles.
The fact that France is in Europe is not, independently, copyrightable. The fact that File:Vanessa_indica-Silent_Valley-2016-08-14-002.jpg is a picture of a butterfly is not copyrightable. The facts that "balloons" is the plural of "balloon", and that "feliĉiĝi" is an intransitive verb in Esperanto, are not copyrightable.
Surely that is something we all agree on. :)
Even if they were copyrightable, copyrighting them independently would harm their potential reuse, as elements of a database, as has been previously explained.
Any information monopoly is a possible obstacle to reuse. No one will deny that, I guess. But information monopolies, such as copyright, patents and so on, do exist. And so does unequal access to resources useful for human flourishing, including knowledge.
Now, personally I am not satisfied with this situation, nor with the growth of inequalities. Part of my motivation for contributing to Wikimedia projects is that it might help the situation evolve otherwise. That might not be among the motivations of every contributor, but I guess I'm not alone in this.
So the question for me is not "how do we make the current snapshots of our knowledge bank as reusable as possible right now?", but "how do we build a sustainable movement which maintains and updates knowledge banks that are as accessible as possible to every single human out there, with this goal of sustainability in mind?".
Maybe it's not what every single stakeholder of our movement is expecting. But I don't feel that this personal vision is at odds with what is stated in the strategic direction. And I hope I'm not alone in holding this vision.
Wikipedia articles and Commons Media are not structured data, and as such, they do not belong in Wikidata.
I think your statement is wrong here. Wikipedia articles are structured on several analysable levels. For example, from the point of view of a common linguistic theory, they are structured and analysable on the syntactic, semantic and pragmatic levels. And there are many other ways in which you might analyse them, because they are structured data. But it is true that they are not structured in a way that eases SQL-like querying.
However, every single sentence contained in Wikipedia articles can be reduced to a set of predicates, that is, to things that can be stored in Wikidata. There is no technical barrier I'm aware of that prevents putting the whole content of all of Wikipedia into as many statements as required within Wikidata.
Elements of prose in Wiktionary, such as definitions, appendices, extensive usage notes and notes on grammar and whatnot, are copyrightable. Similar to Wikipedia articles, licensing them under CC-BY-SA would not particularly harm their reuse, as attribution is completely feasible. They are also not structured data, and can not be made into structured data.
Well, as far as I'm concerned, it would be great news to hear that the Wikidata team will indeed allow contributors to include this CC-BY-SA material in the Wikibase instance/namespace/whatever place where these lexicological items will be stored, rather than enforcing contribution under CC0 here too. But so far, the statements made by the Wikidata team point in the exact opposite direction, that is, using CC0 for everything.
Wikidata will not be laundering this data to CC-0, nor will it be setting up a parallel project to duplicate the efforts under a license which is not appropriate for the type of content.
I hope the future will prove you right.
Attempting to license the database's contents under CC-BY-SA would not ensure attribution, and would harm reuse. I fail to see any potential benefits to using the more restrictive license. Attribution will be required where it is possible (in Wiktionary proper), and content will be as reusable as possible in areas where requiring attribution isn't feasible (in Wikidata). There's no real conflict here.
I hope my answer made this conflict more obvious, and showed how "more reusable right now" might rhyme with "less equity and accessibility of knowledge in the long term".
-- Yair Rand
2017-11-29 16:45 GMT-05:00 Mathieu Stumpf Guntz < psychoslave@culture-libre.org>:
Saluton ĉiuj,
I forward here the message I initially posted on the Meta Tremendous Wiktionary User Group talk page https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#An_answer_to_Lydia_general_thinking_about_Wikidata_and_CC-0, because I would like wider feedback from the community on this point. Whether you think that my view is completely misguided or that I might have a few relevant points, I'm extremely interested to know it, so please be bold.
Before you consider digging further into this reading, keep in mind that I remain convinced that Wikidata is a wonderful project, and I wish it a bright future full of even more amazing things than what it has already brought so far. My sole concern is really a license issue.
Below is a copy/paste of the above linked message:
Thank you Lydia Pintscher https://meta.wikimedia.org/wiki/User:Lydia_Pintscher_%28WMDE%29 for taking the time to answer. Unfortunately this answer https://www.wikidata.org/wiki/User:Lydia_Pintscher_%28WMDE%29/CC-0 misses too many important points to resolve all the concerns which have been raised.
Notably, it still gives no hint about where the decision to use CC0 exclusively for Wikidata came from. But as this inquiry on the topic https://en.wikiversity.org/wiki/fr:Recherche:La_licence_CC-0_de_Wikidata,_origine_du_choix,_enjeux,_et_prospections_sur_les_aspects_de_gouvernance_communautaire_et_d%E2%80%99%C3%A9quit%C3%A9_contributive advances, an answer is emerging from it. It seems that Wikidata's choice of CC0 was heavily influenced by Denny Vrandečić, who, to make it short, is now working in the Google Knowledge Graph team. It is also worth noting that Google funded a quarter of the initial development work. Another quarter came from the Gordon and Betty Moore Foundation, established by an Intel co-founder. And half the money came from Microsoft co-founder Paul Allen's Institute for Artificial Intelligence (AI2)[1] https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-1. To state it briefly in a conspiratorial fashion, Wikidata is the puppet Trojan horse of hegemonic big tech companies in the realm of Wikimedia. For a less tragic, better argued version, please see the research project (work in progress; only chapter 1 is in good enough shape, and it's only available in French so far). Proof that this claim is completely wrong is welcome, as it would be great to learn that the community was in fact the driving force behind this single-license choice, and that it is the best choice for the community's future, not the future of giant tech companies. Shedding such a happy light on this subject would be a great contribution, so we can all leave this issue alone and go back to contributing on more interesting topics.
Now let's examine the thoughts proposed by Lydia.
"Wikidata is here to give more people more access to more knowledge."
So far, this matches the Wikimedia movement's stated goal.
"This means we want our data to be used as widely as possible."
Sure, as long as it rhymes with equity. As in *Our strategic direction: Service and Equity* https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction/Endorsement#Our_strategic_direction:_Service_and_Equity. Just like we want freedom for everybody as widely as possible. That is, starting from the point where it confirms each other's freedom. Because below that level, the freedom of one is the murder and slavery of others.
"CC-0 is one step towards that."
That's a thesis. You can offer to defend it, but no one has to agree without some convincing proof.
"Data is different from many other things we produce in Wikimedia in that it is aggregated, combined, mashed-up, filtered, and so on much more extensively."
No, it's not. From a data processing point of view, everything is data. Whether it's stored in wikisyntax, in a relational database or engraved in stone is only a matter of convenience. Whether it's a random stream of bits generated by a dumb chipset or some encoded prose of Shakespeare makes no difference. So from this point of view, no, what Wikidata stores is not different from what is produced anywhere else in Wikimedia projects. Sure, the way it's structured greatly eases many things. But this is not because it is data while elsewhere there would be no data. It's because it forces data to be stored in a way that eases aggregation, combination, mashing-up, filtering and so on.
"Our data lives from being able to write queries over millions of statements, putting it into a mobile app, visualizing parts of it on a map and much more."
Sure.
It also lives from being curated by millions[2] https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-2 of volunteer contributors, or it would be just a useless pile of random bytes.
"This means, if we require attribution, in a huge number of cases attribution would need to go back to potentially millions of editors and sources (even if that data is not visible in the end result but only helped to get the result)."
No, it doesn't mean that. First let's recall a few basics, as the whole answer seems to confuse attribution with distribution of contributions under the same license as the original. Attribution is crucial for traceability, and so for the reliable and trusted knowledge that we are targeting within the Wikimedia movement. The "same license" requirement is the sole legal guarantee of equity that contributors have. In short, trusted knowledge and equity are requirements for the Wikimedia movement's goals, so withdrawing these requirements means withdrawing these goals.
Now, what would be the additional cost of storing sources in Wikidata? Well, zero. Actually, it's already there, as the "reference" attribute is part of the Wikibase item structure. So attribution is not a problem. You don't have to put it at the front of your derived work. Just look at a Wikipedia article: until you go to the history, you have zero attribution visible, and that's OK. It also probably has zero or negligible computing cost, as it doesn't have to be included in all computations; it just needs to be retrievable on demand.
What would be the additional cost of storing a license for each item based on its source? Well, adding a license attribute might help, but actually, if your reference is a work item, I guess it might come with a "license" statement, so zero additional cost.
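The point about references and a per-source license can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Reference` and `Statement` classes, their field names, and the sample sources are hypothetical, and they are not the actual Wikibase data model or API. The sketch merely shows that attribution and license information can sit next to each statement and be collected on demand, without taking part in ordinary queries.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Reference:
    source: str   # hypothetical: name of the work a statement was taken from
    license: str  # hypothetical: license declared for that work


@dataclass
class Statement:
    subject: str
    predicate: str
    value: str
    references: List[Reference] = field(default_factory=list)


def attribution(statements):
    """Collect the distinct sources to credit, only when someone asks for them."""
    return sorted({ref.source for st in statements for ref in st.references})


def licenses(statements):
    """Collect the distinct licenses attached to the sources of the statements."""
    return sorted({ref.license for st in statements for ref in st.references})


# Made-up sample data, for illustration only.
statements = [
    Statement("France", "part of", "Europe",
              [Reference("Example encyclopedia", "CC BY-SA 3.0")]),
    Statement("balloon", "plural", "balloons",
              [Reference("Example dictionary", "CC BY-SA 3.0")]),
    Statement("Some town", "located in", "Some region",
              [Reference("Example map", "ODbL 1.0")]),
]

print(attribution(statements))
print(licenses(statements))
```

An ordinary query over `subject`, `predicate` and `value` never has to touch `references`; the two helper functions are only called when a re-user wants to know whom to credit and under which terms.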
Now, letting users specify under which free licenses they publish their work would just require one additional attribute, a ridiculously small weight when balanced against the equity concerns it resolves. Could that prevent some uses for some actors? Yes, and that's actually the point: preventing abuse by those who don't want to act equitably. For all other actors, "distribute under the same conditions" is fine.
"This is potentially computationally hard to do and, depending on where the data is used, very inconvenient (think of a map with hundreds of data points in a mobile app)."
OpenStreetMap, which uses ODbL, a copyleft attributive license, does exactly that, doesn't it? By the way, allowing a license per item would make it possible to include OpenStreetMap data in Wikidata, which is currently impossible due to the project's CC0 single-license policy. Too bad, it could be so useful to have this data accessible to Wikimedia projects, but who cares?
"This is a burden on our re-users that I do not want to impose on them."
Wait, which re-users? Surely one might expect Wikidata to care first about re-users who are in line with the Wikimedia goals, so the needs of the Wikimedia community in particular and of Free/Libre Culture in general should surely be considered. Would these re-users be penalized by a copyleft license? Surely not, or they wouldn't use it as extensively as they do. So who are these re-users whom it is thought preferable, without consulting the community, not to bother with questions of equity and traceability?
"It would make it significantly harder to re-use our data and be in direct conflict with our goal of spreading knowledge."
No, technically it would be as easy as pressing one button on a computer rather than another. What is in direct conflict with our clearly stated goals emerging from the 2017 community consultation is going against equity and traceability.
You propose to discard both to satisfy exogenous demands which should have next to no weight in decisions impacting the future of our community so deeply.
"Whether data can be protected in this way at all or not depends on the jurisdiction we are talking about. See this Wikilegal on database rights https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights for more details."
It basically says that protection is applicable in the United States and Europe, on different legal bases and to different extents. And for the rest of the world, it doesn't say that nothing can apply; it states nothing.
"So even if we would have decided to require attribution it would only be enforceable in some jurisdictions."
What kind of logic is that? It might not be applicable in some countries, so let's withdraw the few rights we have.
"Ambiguity, when it comes to legal matters, also unfortunately often means that people refrain from what they want to do for fear of legal repercussions. This is directly in conflict with our goal of spreading knowledge."
Economic inequality, social inequity and legal imbalance might also keep people from doing what they want, as they fear practical repercussions. CC0 strengthens these discrimination factors by forcing people to give up the few rights they have to weigh against the growing asymmetry that social structures are concomitantly building. So CC0 as the only license choice is in direct conflict with our goal of *equitably* spreading knowledge.
This statement also seems to suggest that releasing our contributions only under CC0 is the sole solution to reduce legal doubts. Actually, any well-written license would do an equally good job on this point, including many copyleft licenses out there. So while associating a clear license with each data item might indeed reduce legal uncertainty, it's not an argument at all for enforcing CC0 as the sole license available to contributors.
Moreover, just putting a license side by side with a work does not ensure that the person who made the association was legally allowed to do so. To have better confidence in the legitimacy of a statement that a work is covered by a certain license, there is once again a traceability requirement. For example, Wikidata currently includes many items which were imported from various Wikipedia versions, and claims that the derived work obtained, a set of items and statements, is under CC0. That is a hugely doubtful statement, and it looks alarmingly like license laundering https://en.wikipedia.org/wiki/license_laundering. This is true for Wikipedia, but it's also true for any source on which large-scale extraction and import are performed, whether through bots or crowdsourcing. So the Wikidata project is currently very poorly placed to give lessons on legal ambiguity, as it relies heavily on legal blur and on the hope that its shady practices won't fall under too much scrutiny.
"Licenses that require attribution are often used as a way to try to make it harder for big companies to profit from openly available resources."
No, they are not. They are used as *a way to try to make it harder for big companies to profit from openly available resources in inequitable ways*. That's completely different. Copyleft licenses give the same rights to big companies and individuals, in a manner that lowers the socio-economic inequalities which disproportionately advantage the former.
"The thing is there seems to be no indication of this working."
Because they are not trying to enforce what you claim, so of course they are not working towards that goal. But for the goal that copyleft licenses do aim at, there is clear evidence that yes, they work.
"Big companies have the legal and engineering resources to handle both the legal minefield and the technical hurdles easily."
There is no pitfall in copyleft licenses. Using a weapons-of-war analogy is disrespectful.
It's true that copyleft licenses might come with some constraints that non-copyleft free licenses don't have, but that is the price of fostering equity. And it's a low price, one that even individuals can manage: it might require a little extra time on legal considerations, but on the other hand, being able to use the free work is an immense gain that is well worth it. In Why you shouldn't use the Lesser GPL for your next library https://www.gnu.org/licenses/why-not-lgpl.html it is stated that *proprietary software developers have the advantage of money; free software developers need to make advantages for each other*. This might be generalised as *big companies have the advantage of money; free/libre culture contributors need to make advantages for each other*. So, contrary to what these fallacious claims against copyleft licenses assert, they are not a "legal minefield" and "technical hurdles" that only big companies can handle. All the more so when we recall who financed the initial development of Wikidata: only actors which are related to big companies.
"Who it is really hurting is the smaller start-up, institution or hacker who can not deal with it."
If this statement is about copyleft licenses, then it is just plainly false. Smaller actors have more to gain from preserving the mutual benefit of the common ecosystem that a copyleft license fosters.
"With Wikidata we are making structured data about the world available for everyone."
And that's great. But achieving that doesn't require CC0 as the sole license.
"We are leveling the playing field to give those who currently don’t have access to the knowledge graphs of the big companies a chance to build something amazing."
And that's great. But that doesn't require CC0 as the sole license. Actually, CC0 makes it a less sustainable project on this point, as it allows unfair actors to take it all, add some interesting value that our community cannot afford, and reach or reinforce a hegemonic position in the ecosystem with their own closed solution.
And, ta ta, Wikidata can be discontinued quietly, just like Google did with the defunct Freebase, which was CC-BY-SA before they bought the company that was running it, and which they afterwards imported under CC0 into Wikidata as a new attempt to gather a larger community of free curators. And when it has performed license laundering of all Wikimedia projects' works through shady mass extraction and import, Wikimedia can disappear as well. Of course big companies benefit more from these possibilities than actors with smaller financial support and no hegemonic position.
"Thereby we are helping more people get access to knowledge from more places than just the few big ones."
No, with CC0 you are certainly helping big companies to reinforce their position, in which they can distribute information manipulated as they wish, without regard for traceability and equity. Allowing contributors to also use copyleft licenses would be far more effective to *collect and use different forms of free, trusted knowledge* that *focus efforts on the knowledge and communities that have been left out by structures of power and privilege*, as stated in *Our strategic direction: Service and Equity*.
"CC-0 is becoming more and more common."
Just like economic inequality https://en.wikipedia.org/wiki/economic_inequality. But that is not what we are aiming to foster in the Wikimedia movement.
"Many organisations are releasing their data under CC-0 and are happy with the experience. Among them are the European Union, Europeana, the National Library of Sweden and the Metropolitan Museum of Modern Arts."
Good for them. But they are not the Wikimedia community. They have their own goals and their own plans for sustainability, which our community cannot necessarily follow. Different contexts require different means. States and their institutions can count on tax revenue, and if taxpayers' money ends up as public domain works, that's great and seems fair.
States are rarely threatened by companies; they have legal levers to put pressure on that kind of entity, although conflicts of interest and lobbying can of course qualify this statement. Importing that kind of data with proper attribution and license is fine, be it CC0 or any other free license. But that's not an argument in favour of imposing on volunteers a systematic waiver of all their rights as the only way to contribute.
"All this being said we do encourage all re-users of our data to give attribution to Wikidata because we believe it is in the interest of all parties involved."
That's it: zero legal hope of equity.
"And our experience shows that many of our re-users do give credit to Wikidata even if they are not forced to."
Experience also shows that some prominent actors like Google no longer credit the Wikimedia community when directly generating answers based on, inter alia, information coming from Wikidata, which is itself performing license laundering of Wikipedia data.
"Are there no downsides to this? No, of course not. Some people chose not to participate, some data can't be imported and some re-users do not attribute us. But the benefits I have seen over the years for Wikidata and the larger open knowledge ecosystem far outweigh them."
This should at least be backed by some solid statistics showing that it had a positive impact in terms of audience and contribution across the Wikimedia projects as a whole. Maybe the introduction of Wikidata did have a positive effect on the evolution of the total number of contributors, or maybe so far it has had no significant correlated effect, or maybe it correlates with a decrease in the total number of active contributors. Some plots would be interesting here. Mere personal feelings of benefits and hindrances mean nothing here, mine included of course.
Plus, there has not even been the beginning of an attempt to A/B test with a second Wikibase instance that lets users select which licenses their contributions are released under, so there is no way to state anything backed by a relevant comparison. The fact that some people are satisfied with the current state of things doesn't mean they would not be even more satisfied with a more equitable solution that allows contributors to choose a set of free licenses for their publications. Moreover, this is all about the sustainability and fostering of our community and reaching its goals, not some people's immediate feeling of satisfaction.
[1] Wikipedia Signpost, 2 December 2015 https://en.wikipedia.org/wiki/en:Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed
[2] according to the next statement of Lydia
Once again, let me recall that this is not a manifesto against Wikidata. The motivation behind this message is the hope that one day one might participate in Wikidata with the same respect for equity and traceability that is granted in other Wikimedia projects.
Kun multe da vikiamo, mathieu
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Dear Scott,
Wiktionary is CC BY-SA 3.0 as well, as all Wikimedia projects have been since around 2007.
Wikidata too is under CC BY-SA 3.0 for its non-data part (that is, everything but ns0, which is CC0).
L.
On 30 Nov 2017 at 23:53, "Scott MacLeod" worlduniversityandschool@gmail.com wrote:
Mathieu, Lydia and All,
As a further clarification:
I just looked up Wikipedia's license at the bottom here - https://www.wikipedia.org/ - and it says it's CC-3 (CC BY-SA 3.0) - https://creativecommons.org/licenses/by-sa/3.0/ - which allows for commercial use.
Wikidata.org's is CC-0 ( CC0 1.0 Universal (CC0 1.0) ) which also allows for commercial use.
Wiktionary doesn't seem to list a license on its front page - https://www.wiktionary.org/ .
( By way of comparison, both MIT OCW and MIT OCW Translated courses, which now seem to number 4, having recently lost Portuguese and Persian, use a CC-4 license ( 4.0 International (CC BY-NC-SA 4.0) ) https://ocw.mit.edu/ https://ocw.mit.edu/courses/translated-courses/ https://creativecommons.org/licenses/by-nc-sa/4.0/
Noncommercial means: The NonCommercial (“NC”) element is found in three of the six CC licenses: BY-NC, BY-NC-SA, and BY-NC-ND. In each of these licenses, NonCommercial is expressly defined as follows: “NonCommercial means not primarily intended for or directed towards commercial advantage or monetary compensation.” (NonCommercial interpretation - Creative Commons, Oct 15, 2017 https://wiki.creativecommons.org/wiki/NonCommercial_interpretation) )
(World University and School donated itself to Wikidata in 2015, but since WUaS is CC-4 MIT OpenCourseWare-centric in 5 languages, WUaS obviously doesn't donate CC MIT OCW).
Here's more about CC licenses: https://creativecommons.org/licenses/
Are there ways that Wikidata or the Wikimedia Foundation might further develop the Wikidata CC-0 license in conversation with the Creative Commons organization itself (as an alternative to license laundering or license migration over time)?
What kind of license is Wiktionary, as a Wikipedia/Wikidata sister project, likely to list on its front page in the future, especially given its relevance for a universal translator, and for Wikimedia's Content Translation?
I'm grateful so much thought has gone into these CC licenses - and that there are such a variety of them, some explicitly international.
Cheers, Scott CC-? World University and School https://wiki.worlduniversityandschool.org/wiki/Nation_States
On Thu, Nov 30, 2017 at 1:17 PM, mathieu stumpf guntz < psychoslave@culture-libre.org> wrote:
Le 30/11/2017 à 18:05, Yair Rand a écrit :
Wikidata is not replacing Wiktionary.
We will see that in the future. At least the proposed model allow to include most things that you might find in a Wiktionary article, plus it comes with all the benefit of a relational(-like) database.
See https://www.mediawiki.org/wiki/Extension:WikibaseLexeme/Data_Model for more information on what it will allow or not.
Wikidata did not replace Wikipedia, and force all articles to be under CC-0.
Sure. Not yet. But if it continue to improve, as well as tools to generate prose from it, at some point it might reach a good job at doing just that.
Structured data for Commons doesn't replace all Commons media with CC-0-licensed content.
Well, unlike one try to include use it in a very different way than what it is aiming at, there is no chance as pictures contains far more information than their metadata. Now, technically one might probably be able to store the whole picture in that kind of structure (provided no size restriction is enforced), but this is not the goal.
This is very different case than the Wiktionary case. The case of Wikipedia might be closer, but you can not make a simple one-to-one correspondence between Wikidata elements and Wikipedia prose. Actually Wikipedia extraction in statements usable in Wikidata is far more easier with current natural language processing toolkits. One the other hand such a bijective correspondence between a Wiktionary article and a set of WikibaseLexeme elements is clearly straight forward. So the domain of targeted knowledge documentation is extremely overlapping. Plus the Wikibase approach bring many advantages in term of knowledge factorisation.
To my mind, WikibaseLexeme have a good potential to quickly supersede our plethora of sparsely communicating Wiktionaries. At least far sooner than Wikibase will have a chance to approach the same level as Wikipedia article.
The fact that France is in Europe is not, independently, copyrightable. The fact that File:Vanessa_indica-Silent_Valley-2016-08-14-002.jpg is a picture of a butterfly is not copyrightable. The facts that "balloons" is the plural of "balloon", and that "feliĉiĝi" is an intransitive verb in Esperanto, are not copyrightable.
Surely that is something we all agree. :)
Even if they were copyrightable, copyrighting them independently would harm their potential reuse, as elements of a database, as has been previously explained.
Any information monopoly is a possible obstacle to reuse. No one will deny that, I guess. But information monopolies, such as copyright, patent and so on do exists. And so does unequal access to resources useful for human flourishing, including knowledge.
Now, personally I am not satisfied with this situation, nor with the growth of inequalities. A part of my motivation in contributing in Wikimedia projects is that it might contribute to make situation evolve otherwise. That might not enter in the field of motivations of every contributor, but I guess I'm not alone on this.
So the question for me is not, "how do we make our knowledge bank current snapshots as reusable as possible right now?", but "how do we build a sustainable movement which maintain and update knowledge banks that are as accessible as possible for every single human out there with this goal of sustainability in mind?".
Maybe it's not what every single stakeholder of our movement is expecting. But I don't feel that this personal vision is at odd with what is stated in the strategic direction. And I hope I'm not alone holding this vision.
Wikipedia articles and Commons Media are not structured data, and as such, they do not belong in Wikidata.
I think you statement is wrong here. Wikipedia articles are structured on several analysable levels. For example, from the point of view of a common linguistic theory, they are structured and analysable on syntaxique level, semantic level and pragmatic level. But they are many other way in which you might analyse them because they are structured data. But it is true that there are not structured in a way that ease SQL-like querying.
However, every single sentence contained in Wikipedia articles can be reduce down to a set of predicates, that is they are reducible in things that can be stored in Wikidata. There is no technical barrier I'm aware of that prevent putting the whole content of all Wikipedia in as many as required statements within Wikidata.
Elements of prose in Wiktionary, such as definitions, appendices, extensive usage notes and notes on grammar and whatnot, are copyrightable. Similar to Wikipedia articles, licensing them under CC-BY-SA would not particularly harm their reuse, as attribution is completely feasible. They are also not structured data, and can not be made into structured data.
Well, as far as I'm concerned that would be great news to hear that Wikidata team will allow contributors to indeed include this CC-BY-SA material in the Wikibase instance/namespace/whatever place where this lexicological items will be stored in, rather than enforcing here too contribution under CC0. But so far statement made by the Wikidata team go in the exact opposite hypothesis, that is using CC0 for everything.
Wikidata will not be laundering this data to CC-0, nor will it be setting up a parallel project to duplicate the efforts under a license which is not appropriate for the type of content.
I hope the future will prove you right.
Attempting to license the database's contents under CC-BY-SA would not ensure attribution, and would harm reuse. I fail to see any potential benefits to using the more restrictive license. Attribution will be required where it is possible (in Wiktionary proper), and content will be as reusable as possible in areas where requiring attribution isn't feasible (in Wikidata). There's no real conflict here.
I hope my answer made these conflicts more obvious, and showed how "more reusable right now" might rhyme with "less equity and accessibility of knowledge in the long term".
-- Yair Rand
2017-11-29 16:45 GMT-05:00 Mathieu Stumpf Guntz < psychoslave@culture-libre.org>:
Saluton ĉiuj,
I am forwarding here the message I initially posted on the Meta Tremendous Wiktionary User Group talk page https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#An_answer_to_Lydia_general_thinking_about_Wikidata_and_CC-0, because I'm interested in getting wider feedback from the community on this point. Whether you think that my view is completely misguided or that I might have a few relevant points, I'm extremely interested to know, so please be bold.
Before you consider digging further into this reading, keep in mind that I remain convinced that Wikidata is a wonderful project, and I wish it a bright future full of even more amazing things than what it has already brought so far. My sole concern is really a license issue.
Below is a copy/paste of the above linked message:
Thank you Lydia Pintscher https://meta.wikimedia.org/wiki/User:Lydia_Pintscher_%28WMDE%29 for taking the time to answer. Unfortunately this answer https://www.wikidata.org/wiki/User:Lydia_Pintscher_%28WMDE%29/CC-0 misses too many important points to resolve all the concerns which have been raised.
Notably, there is still not the beginning of a hint in it about where the decision to use CC0 exclusively for Wikidata came from. But as this inquiry on the topic https://en.wikiversity.org/wiki/fr:Recherche:La_licence_CC-0_de_Wikidata,_origine_du_choix,_enjeux,_et_prospections_sur_les_aspects_de_gouvernance_communautaire_et_d%E2%80%99%C3%A9quit%C3%A9_contributive advances, an answer is emerging from it. It seems that Wikidata's choice of CC0 was heavily influenced by Denny Vrandečić, who, to make it short, is now working in the Google Knowledge Graph team. It is also worth noting that Google funded a quarter of the initial development work. Another quarter came from the Gordon and Betty Moore Foundation, established by Intel's co-founder. And half the money came from Microsoft co-founder Paul Allen's Institute for Artificial Intelligence (AI2)[1] https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-1. To state it briefly in a conspiratorial fashion, Wikidata is the puppet Trojan horse of hegemonic big tech companies into the realm of Wikimedia. For a less tragic, more argued version, please see the research project (work in progress; only chapter 1 is in good enough shape, and it's only available in French so far). Proof that this claim is completely wrong is welcome, as it would be great if in fact the community was the driving force behind this single license choice, and if it is the best choice for its own future, not the future of giant tech companies. Shedding such a happy light on this subject would be a great contribution, so that we can all leave this issue alone and go back to contributing on more interesting topics.
Now let's examine the thoughts proposed by Lydia.
Wikidata is here to give more people more access to more knowledge.
So far, that matches the Wikimedia movement's stated goal.
This means we want our data to be used as widely as possible.
Sure, as long as it rhymes with equity. As in *Our strategic direction: Service and Equity* https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction/Endorsement#Our_strategic_direction:_Service_and_Equity. Just like we want freedom for everybody, as widely as possible. That is, starting where it confirms each other's freedom. Because below this level, the freedom of one is the murder and slavery of others.
CC-0 is one step towards that.
That's a thesis; you can propose to defend it, but no one has to agree without some convincing proof.
Data is different from many other things we produce in Wikimedia in that it is aggregated, combined, mashed-up, filtered, and so on much more extensively.
No, it's not. From a data processing point of view, everything is data. Whether it's stored in wikisyntax, in a relational database or engraved in stone only affects convenience. Whether it's a random stream of bits generated by a dumb chipset or some encoded prose of Shakespeare makes no difference. So from this point of view, no, what Wikidata stores is not different from what is produced anywhere else in Wikimedia projects. Sure, the way it's structured greatly eases many things. But this is not because it is data while elsewhere there would be no data. It's because it enforces that data be stored in a way that eases aggregation, combination, mashing-up, filtering and so on.
Our data lives from being able to write queries over millions of statements, putting it into a mobile app, visualizing parts of it on a map and much more.
Sure. It also lives from being curated by millions[2] https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-2 of benevolent contributors, or it would be just a useless pile of random bytes.
This means, if we require attribution, in a huge number of cases attribution would need to go back to potentially millions of editors and sources (even if that data is not visible in the end result but only helped to get the result).
No, it doesn't mean that. First let's recall a few basics, as the whole answer seems to confuse attribution with distribution of contributions under the same license as the original. Attribution is crucial for traceability, and so for the reliable and trusted knowledge that we are targeting within the Wikimedia movement. The "same license" is the sole legal guarantee of equity that contributors have. That's it: trusted knowledge and equity are requirements for the Wikimedia movement's goals. That means withdrawing these requirements is withdrawing these goals. Now, what would be the additional cost of storing sources in Wikidata? Well, zero cost. Actually, it's already there, as the "reference" attribute is part of the Wikibase item structure. So attribution is not a problem; you don't have to put it in front of your derived work. Just look at a Wikipedia article: until you go to the history, you have zero attribution visible, and it's OK. It also probably has zero or negligible computing cost, as it doesn't have to be included in all computations; it just needs to be retrievable on demand. What would be the additional cost of storing licenses for each item based on its source? Well, adding a license attribute might help, but actually if your reference is a work item, I guess it might come with a "license" statement, so zero additional cost. As for letting users specify under which free licenses they publish their work, that would just require an additional attribute, a negligible weight when balanced against the equity concerns it resolves. Could that prevent some uses by some actors? Yes, that's actually the point: preventing abuse by those who don't want to act equitably. For all other actors, a "distribute under the same conditions" clause is fine.
This is potentially computationally hard to do and depending on where the data is used very inconvenient (think of a map with hundreds of data points in a mobile app).
OpenStreetMap, which uses ODbL, a copyleft license requiring attribution, does exactly that too, doesn't it? By the way, allowing a license per item would make it possible to include OpenStreetMap data in Wikidata, which is currently impossible due to the project's CC0 single-license policy. Too bad, it could be so useful to have this data accessible for Wikimedia projects, but who cares?
This is a burden on our re-users that I do not want to impose on them.
Wait, which re-users? Surely one might expect that Wikidata would care first about re-users who are in phase with the Wikimedia goal, so surely the needs of the Wikimedia community in particular and of Free/Libre Culture in general should be considered. Would these re-users be penalized by a copyleft license? Surely not, or they wouldn't use it as extensively as they do. So who are these re-users whom it is thought preferable, without consulting the community, not to bother with questions of equity and traceability?
It would make it significantly harder to re-use our data and be in direct conflict with our goal of spreading knowledge.
No, technically it would be just as easy as pushing one button on a computer rather than another. What is in direct conflict with our clearly stated goals emerging from the 2017 community consultation is going against equity and traceability. You propose to discard both to satisfy exogenous demands which should have next to no weight in decisions that so deeply affect the future of our community.
Whether data can be protected in this way at all or not depends on the jurisdiction we are talking about. See this Wikilegal on database rights https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights for more details.
It says basically that it's applicable in the United States and Europe, on different legal bases and to different extents. As for the rest of the world, it doesn't say that nothing can apply; it states nothing.
So even if we would have decided to require attribution it would only be enforceable in some jurisdictions.
What kind of logic is that? It might not be applicable in some countries, so let's give up the few rights we have.
Ambiguity, when it comes to legal matters, also unfortunately often means that people refrain from what they want to do for fear of legal repercussions. This is directly in conflict with our goal of spreading knowledge.
Economic inequality, social inequity and legal imbalance might also keep people from doing what they want, as they fear practical repercussions. CC0 strengthens these discrimination factors by forcing people to give up the few rights they have to weigh against the growing asymmetry that social structures are concomitantly building. So CC0 as the only license choice is in direct conflict with our goal of *equitably* spreading knowledge. Also, this statement seems to suggest that releasing our contributions only under CC0 is the sole solution to diminish legal doubts. Actually, any well-written license would do an equally good job on this point, including many copyleft licenses out there. So while associating a clear license with each data item might indeed diminish legal uncertainty, it's not an argument at all for enforcing CC0 as the sole license available to contributors. Moreover, just putting a license side by side with a work does not ensure that the person who made the association was legally allowed to do so. To have better confidence in the legitimacy of a statement that a work is covered by a certain license, there is once again a traceability requirement. For example, Wikidata currently includes many items which were imported from various Wikipedia versions, and claims that the derived work obtained, a set of items and statements, is under CC0. That is a hugely doubtful statement, and it looks alarmingly like license laundering https://en.wikipedia.org/wiki/license_laundering. This is true for Wikipedia, but it's also true for any source on which large-scale extraction and import are performed, whether through bots or crowdsourcing. So the Wikidata project is currently extremely ill-placed to give lessons on legal ambiguity, as it heavily plays with legal blur and the hope that its shady practices won't fall under too much scrutiny.
Licenses that require attribution are often used as a way to try to make it harder for big companies to profit from openly available resources.
No, they are not. They are used as *a way to try to make it harder for big companies to profit from openly available resources* *in inequitable manners*. That's completely different. Copyleft licenses give the same rights to big companies and individuals, in a manner that lowers the socio-economic inequalities which disproportionately advantage the former.
The thing is there seems to be no indication of this working.
Because it's not trying to enforce what you claim, so of course it's not working for this goal. But for the goal that copyleft licenses aim at, there is clear evidence that yes, it works.
Big companies have the legal and engineering resources to handle both the legal minefield and the technical hurdles easily.
There is no pitfall in copyleft licenses. Using a war materiel analogy is disrespectful. It's true that copyleft licenses might come with some constraints that non-copyleft free licenses don't have, but that's the price of fostering equity. And it's a low price that even individuals can manage: it might require a little extra time on legal considerations, but on the other hand using the free work is an immensely vast gain that is worth it. In Why you shouldn't use the Lesser GPL for your next library https://www.gnu.org/licenses/why-not-lgpl.html it is stated that *proprietary software developers have the advantage of money; free software developers need to make advantages for each other*. This might be generalised as *big companies have the advantage of money; free/libre culture contributors need to make advantages for each other*. So, contrary to what these fallacious claims against copyleft licenses assert, they are not a "minefield and the technical hurdles" that only big companies can handle. All the more, let's recall who financed the initial development of Wikidata: only actors related to big companies.
Who it is really hurting is the smaller start-up, institution or hacker who can not deal with it.
If this statement is about copyleft licenses, then it is just plainly false. Smaller actors have more to gain from preserving the mutual benefit of the common ecosystem that a copyleft license fosters.
With Wikidata we are making structured data about the world available for everyone.
And that's great. But that doesn't require CC0 as the sole license to be achieved.
We are leveling the playing field to give those who currently don’t have access to the knowledge graphs of the big companies a chance to build something amazing.
And that's great. But that doesn't require CC0 as the sole license. Actually CC0 makes it a less sustainable project on this point, as it allows unfair actors to take it all, add some interesting added value that our community cannot afford, and reach or reinforce a hegemonic position in the ecosystem with their own closed solution. And, ta-da, Wikidata can be discontinued quietly, just like Google did with the defunct Freebase, which was CC-BY-SA before they bought the company that was running it, and which they afterwards imported under CC0 into Wikidata as a new attempt to gather a larger community of free curators. And once it has performed license laundering of all Wikimedia projects' works with shady mass extraction and import, Wikimedia can disappear as well. Of course, big companies benefit more from these possibilities than actors with smaller financial support and no hegemonic position.
Thereby we are helping more people get access to knowledge from more places than just the few big ones.
No, with CC0 you are certainly helping big companies reinforce their position, in which they can distribute information manipulated as they wish, without regard for traceability and equity. Allowing contributors to also use copyleft licenses would be far more effective to *collect and use different forms of free, trusted knowledge* that *focus efforts on the knowledge and communities that have been left out by structures of power and privilege*, as stated in *Our strategic direction: Service and Equity*.
CC-0 is becoming more and more common.
Just like economic inequality https://en.wikipedia.org/wiki/economic_inequality. But that is not what we are aiming to foster in the Wikimedia movement.
Many organisations are releasing their data under CC-0 and are happy with the experience. Among them are the European Union, Europeana, the National Library of Sweden and the Metropolitan Museum of Modern Arts.
Good for them. But they are not the Wikimedia community; they have their own goals and plans for sustainability, which our community cannot necessarily follow. Different contexts require different means. States and their institutions can count on tax revenue, and if taxpayers' money ends up in public domain works, that's great and seems fair. States are rarely threatened by companies; they have legal levers to pressure that kind of entity, although conflicts of interest and lobbying can of course temper this statement. Importing that kind of data with proper attribution and license is fine, be it CC0 or any other free license. But that's not an argument in favour of imposing on benevolent contributors a systematic waiver of all their rights as the only way to contribute.
All this being said we do encourage all re-users of our data to give attribution to Wikidata because we believe it is in the interest of all parties involved.
That's it, zero legal hope of equity.
And our experience shows that many of our re-users do give credit to Wikidata even if they are not forced to.
Experience also shows that some prominent actors like Google no longer credit the Wikimedia community when directly generating answers based on, inter alia, information coming from Wikidata, which is itself performing license laundering of Wikipedia data.
Are there no downsides to this? No, of course not. Some people chose not to participate, some data can't be imported and some re-users do not attribute us. But the benefits I have seen over the years for Wikidata and the larger open knowledge ecosystem far outweigh them.
This should at least be backed by some solid statistics showing that it had a positive impact in terms of audience and contribution in Wikimedia projects as a whole. Maybe the introduction of Wikidata did have a positive effect on the evolution of the total number of contributors, or maybe so far it has had no significant correlated effect, or maybe it correlates with a decrease in the total number of active contributors. Some plots would be interesting here. Mere personal feelings about benefits and hindrances mean nothing here, mine included of course. Plus, there is not even the beginning of an attempt to A/B test with a second Wikibase instance that allows users to select which licenses their contributions are released under, so there is no possible way to state anything backed by a relevant comparison. The fact that there are some people satisfied with the current state of things doesn't mean they would not be even more satisfied with a more equitable solution that allows contributors to choose a set of free licenses for their publications. All the more, this is all about the sustainability and fostering of our community and reaching its goals, not some people's immediate feeling of satisfaction.
[1] Wikipedia Signpost, 2 December 2015 https://en.wikipedia.org/wiki/en:Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed
[2] According to Lydia's next statement
Once again, I remind you that this is not a manifesto against Wikidata. The motivation behind this message is the hope that one day one might participate in Wikidata with the same respect for equity and traceability that is granted in other Wikimedia projects.
Kun multe da vikiamo, mathieu
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Thanks, Luca,
Scott
On Thu, Nov 30, 2017 at 2:59 PM, Luca Martinelli martinelliluca@gmail.com wrote:
Dear Scott,
Wiktionary is CC BY-SA 3.0 as well, as all Wikimedia projects have been since around 2007.
Wikidata too is under CC BY-SA 3.0 for its non-data part (that is, everything but ns0, which is CC0).
L.
On 30 Nov 2017 23:53, "Scott MacLeod" worlduniversityandschool@gmail.com wrote:
Mathieu, Lydia and All,
As a further clarification:
I just looked up Wikipedia's license at bottom here - https://www.wikipedia.org/ - and it says it's CC-3 ((CC BY-SA 3.0)) - https://creativecommons.org/licenses/by-sa/3.0/ - which allows for commercial use.
Wikidata.org's is CC-0 ( CC0 1.0 Universal (CC0 1.0) ) which also allows for commercial use.
Wiktionary doesn't seem to list a license on its front page - https://www.wiktionary.org/ .
( By way of comparison, both MIT OCW and MIT OCW Translated courses, which now seem to number 4, having recently lost Portuguese and Persian, use a CC-4 license ... ( 4.0 International (CC BY-NC-SA 4.0) ) https://ocw.mit.edu/ https://ocw.mit.edu/courses/translated-courses/ https://creativecommons.org/licenses/by-nc-sa/4.0/
Noncommercial means: The NonCommercial (“NC”) element is found in three of the six CC licenses: BY-NC, BY-NC-SA, and BY-NC-ND. In each of these licenses, NonCommercial is expressly defined as follows: “NonCommercial means not primarily intended for or directed towards commercial advantage or monetary compensation.” Oct 15, 2017, NonCommercial interpretation - Creative Commons https://wiki.creativecommons.org/wiki/NonCommercial_interpretation )
(World University and School donated itself to Wikidata in 2015, but since WUaS is CC-4 MIT OpenCourseWare-centric in 5 languages, WUaS obviously doesn't donate CC MIT OCW).
Here's more about CC licenses: https://creativecommons.org/licenses/
Are there ways that Wikidata or the Wikimedia Foundation might further develop the Wikidata CC-0 license in conversation with the Creative Commons organization itself (as an alternative to license laundering or license migration over time)?
What kind of license is Wiktionary, as a Wikipedia/Wikidata sister project, likely to list on its front page in the future, especially given its relevance for a universal translator, and for Wikimedia's Content Translation?
I'm grateful so much thought has gone into these CC licenses - and that there are such a variety of them, some explicitly international.
Cheers, Scott CC-? World University and School https://wiki.worlduniversityandschool.org/wiki/Nation_States
On Thu, Nov 30, 2017 at 1:17 PM, mathieu stumpf guntz < psychoslave@culture-libre.org> wrote:
On 30/11/2017 at 18:05, Yair Rand wrote:
Wikidata is not replacing Wiktionary.
We will see that in the future. At least the proposed model allows including most things that you might find in a Wiktionary article, plus it comes with all the benefits of a relational(-like) database.
See https://www.mediawiki.org/wiki/Extension:WikibaseLexeme/Data_Model for more information on what it will allow or not.
Wikidata did not replace Wikipedia, and force all articles to be under CC-0.
Sure. Not yet. But if it continues to improve, along with tools to generate prose from it, at some point it might do a good job of just that.
Structured data for Commons doesn't replace all Commons media with CC-0-licensed content.
Well, unless one tries to use it in a very different way from what it is aiming at, there is no chance of that, as pictures contain far more information than their metadata. Now, technically one could probably store the whole picture in that kind of structure (provided no size restriction is enforced), but this is not the goal.
This is a very different case from the Wiktionary case. The case of Wikipedia might be closer, but you cannot make a simple one-to-one correspondence between Wikidata elements and Wikipedia prose. Actually, extracting statements usable in Wikidata from Wikipedia is far easier with current natural language processing toolkits. On the other hand, such a bijective correspondence between a Wiktionary article and a set of WikibaseLexeme elements is clearly straightforward. So the targeted domains of knowledge documentation overlap extremely. Plus, the Wikibase approach brings many advantages in terms of knowledge factorisation.
To my mind, WikibaseLexeme has good potential to quickly supersede our plethora of sparsely communicating Wiktionaries. At least far sooner than Wikibase will have a chance to approach the same level as Wikipedia articles.
The fact that France is in Europe is not, independently, copyrightable. The fact that File:Vanessa_indica-Silent_Valley-2016-08-14-002.jpg is a picture of a butterfly is not copyrightable. The facts that "balloons" is the plural of "balloon", and that "feliĉiĝi" is an intransitive verb in Esperanto, are not copyrightable.
Surely that is something we all agree on. :)
Even if they were copyrightable, copyrighting them independently would harm their potential reuse, as elements of a database, as has been previously explained.
Any information monopoly is a possible obstacle to reuse. No one will deny that, I guess. But information monopolies, such as copyright, patents and so on, do exist. And so does unequal access to resources useful for human flourishing, including knowledge.
Now, personally I am not satisfied with this situation, nor with the growth of inequalities. Part of my motivation for contributing to Wikimedia projects is that it might help make the situation evolve otherwise. That might not be among the motivations of every contributor, but I guess I'm not alone in this.
So the question for me is not "how do we make current snapshots of our knowledge banks as reusable as possible right now?", but "how do we build a sustainable movement that maintains and updates knowledge banks that are as accessible as possible to every single human out there, with this goal of sustainability in mind?".
Maybe it's not what every single stakeholder of our movement is expecting. But I don't feel that this personal vision is at odds with what is stated in the strategic direction. And I hope I'm not alone in holding this vision.
Wikipedia articles and Commons Media are not structured data, and as such, they do not belong in Wikidata.
I think your statement is wrong here. Wikipedia articles are structured on several analysable levels. For example, from the point of view of common linguistic theory, they are structured and analysable on the syntactic, semantic and pragmatic levels. There are many other ways in which you might analyse them, because they are structured data. It is true, however, that they are not structured in a way that eases SQL-like querying.
However, every single sentence contained in Wikipedia articles can be reduced to a set of predicates; that is, it is reducible to things that can be stored in Wikidata. There is no technical barrier I'm aware of that prevents putting the whole content of all of Wikipedia into as many statements as required within Wikidata.
Elements of prose in Wiktionary, such as definitions, appendices, extensive usage notes and notes on grammar and whatnot, are copyrightable. Similar to Wikipedia articles, licensing them under CC-BY-SA would not particularly harm their reuse, as attribution is completely feasible. They are also not structured data, and can not be made into structured data.
Well, as far as I'm concerned, it would be great news to hear that the Wikidata team will indeed allow contributors to include this CC-BY-SA material in the Wikibase instance, namespace or whatever place where these lexicological items will be stored, rather than enforcing contribution under CC0 here too. But so far, statements made by the Wikidata team point in exactly the opposite direction, that is, using CC0 for everything.
Wikidata will not be laundering this data to CC-0, nor will it be setting up a parallel project to duplicate the efforts under a license which is not appropriate for the type of content.
I hope the future will prove you right.
Attempting to license the database's contents under CC-BY-SA would not ensure attribution, and would harm reuse. I fail to see any potential benefits to using the more restrictive license. Attribution will be required where it is possible (in Wiktionary proper), and content will be as reusable as possible in areas where requiring attribution isn't feasible (in Wikidata). There's no real conflict here.
I hope my answer made these conflicts more obvious, and showed how "more reusable right now" might rhyme with "less equity and accessibility of knowledge in the long term".
-- Yair Rand
2017-11-29 16:45 GMT-05:00 Mathieu Stumpf Guntz < psychoslave@culture-libre.org>:
Saluton ĉiuj,
I am forwarding here the message I initially posted on the Meta Tremendous Wiktionary User Group talk page https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#An_answer_to_Lydia_general_thinking_about_Wikidata_and_CC-0, because I'm interested in getting wider feedback from the community on this point. Whether you think that my view is completely misguided or that I might have a few relevant points, I'm extremely interested to know, so please be bold.
Before you consider digging further into this reading, keep in mind that I remain convinced that Wikidata is a wonderful project, and I wish it a bright future full of even more amazing things than what it has already brought so far. My sole concern is really a license issue.
Below is a copy/paste of the above linked message:
Thank you Lydia Pintscher https://meta.wikimedia.org/wiki/User:Lydia_Pintscher_%28WMDE%29 for taking the time to answer. Unfortunately this answer https://www.wikidata.org/wiki/User:Lydia_Pintscher_%28WMDE%29/CC-0 misses too many important points to resolve all the concerns which have been raised.
Notably, there is still not the beginning of a hint in it about where the decision to use CC0 exclusively for Wikidata came from. But as this inquiry on the topic https://en.wikiversity.org/wiki/fr:Recherche:La_licence_CC-0_de_Wikidata,_origine_du_choix,_enjeux,_et_prospections_sur_les_aspects_de_gouvernance_communautaire_et_d%E2%80%99%C3%A9quit%C3%A9_contributive advances, an answer is emerging from it. It seems that Wikidata's choice of CC0 was heavily influenced by Denny Vrandečić, who, to make it short, is now working in the Google Knowledge Graph team. It is also worth noting that Google funded a quarter of the initial development work. Another quarter came from the Gordon and Betty Moore Foundation, established by Intel's co-founder. And half the money came from Microsoft co-founder Paul Allen's Institute for Artificial Intelligence (AI2)[1] https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-1. To state it briefly in a conspiratorial fashion, Wikidata is the puppet Trojan horse of hegemonic big tech companies into the realm of Wikimedia. For a less tragic, more argued version, please see the research project (work in progress; only chapter 1 is in good enough shape, and it's only available in French so far). Proof that this claim is completely wrong is welcome, as it would be great if in fact the community was the driving force behind this single license choice, and if it is the best choice for its own future, not the future of giant tech companies. Shedding such a happy light on this subject would be a great contribution, so that we can all leave this issue alone and go back to contributing on more interesting topics.
Now let's examine the thoughts proposed by Lydia. Wikidata is here to give more people more access to more knowledge. So far, it makes it matches Wikimedia movement stated goal. This means we want our data to be used as widely as possible. Sure, as long as it rhymes with equity. As in *Our strategic direction: Service and * *Equity* https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction/Endorsement#Our_strategic_direction:_Service_and_Equity. Just like we want freedom for everybody as widely as possible. That is, starting where it confirms each others freedom. Because under this level, freedom of one is murder and slavery of others. CC-0 is one step towards that. That's a thesis, you can propose to defend it but no one have to agree without some convincing proof. Data is different from many other things we produce in Wikimedia in that it is aggregated, combined, mashed-up, filtered, and so on much more extensively. No it's not. From a data processing point of view, everything is data. Whether it's stored in a wikisyntax, in a relational database or engraved in stone only have a commodity side effect. Whether it's a random stream of bit generated by a dumb chipset or some encoded prose of Shakespeare make no difference. So from this point of view, no, what Wikidata store is not different from what is produced anywhere else in Wikimedia projects. Sure, the way it's structured does extremely ease many things. But this is not because it's data, when elsewhere there would be no data. It's because it enforce data to be stored in a way that ease aggregation, combination, mashing-up, filtering and so on. Our data lives from being able to write queries over millions of statements, putting it into a mobile app, visualizing parts of it on a map and much more. Sure. 
It also lives from being curated by millions[2] https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-2 of benevolent contributors, or it would be just a useless pile of random bytes.
"This means, if we require attribution, in a huge number of cases attribution would need to go back to potentially millions of editors and sources (even if that data is not visible in the end result but only helped to get the result)."
No, it doesn't mean that. First, let's recall a few basics, as the whole answer seems to confuse attribution with distribution of contributions under the same license as the original. Attribution is crucial for traceability, and therefore for the reliable and trusted knowledge that we are aiming at within the Wikimedia movement. The "same license" clause is the sole legal guarantee of equity that contributors have. In short, trusted knowledge and equity are requirements for the goals of the Wikimedia movement, so withdrawing these requirements means withdrawing these goals.
Now, what would be the additional cost of storing sources in Wikidata? Well, zero. Actually, it's already there, as the "reference" attribute is part of the Wikibase item structure. So attribution is not a problem: you don't have to put it at the front of your derived work. Just look at a Wikipedia article: until you go to the history, you see no attribution at all, and that's OK. It probably also has zero or negligible computing cost, as it doesn't have to be included in every computation; it just needs to be retrievable on demand. What would be the additional cost of storing licenses for each item based on its source? Well, adding a license attribute might help, but if your reference is a work item, I guess it might already come with a "license" statement, so zero additional cost.
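To make the "retrievable on demand" point concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: the `Statement` class, its field names, the source names and the sample data are all hypothetical and are not the actual Wikibase data model or API. It shows a query that never touches the reference or license fields, and a separate report that gathers them only when someone asks for them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical statement record; NOT the real Wikibase structure."""
    subject: str
    prop: str
    value: str
    reference: str  # where the claim comes from (assumed field)
    license: str    # license the claim was contributed under (assumed field)


# Made-up sample data, for illustration only.
STATEMENTS = [
    Statement("item-1", "plural", "balloons", "source-A", "CC-BY-SA"),
    Statement("item-2", "continent", "Europe", "source-B", "CC0"),
    Statement("item-3", "plural", "horses", "source-C", "CC-BY-SA"),
]


def query(statements, prop):
    """Ordinary query: reads values only, ignores reference and license."""
    return [s.value for s in statements if s.prop == prop]


def attribution_report(statements):
    """Computed on demand: group the sources of the given statements by license."""
    report = {}
    for s in statements:
        report.setdefault(s.license, set()).add(s.reference)
    return {lic: sorted(refs) for lic, refs in report.items()}
```

For instance, `query(STATEMENTS, "plural")` only reads values, while `attribution_report(STATEMENTS)` would be called once, for example when a re-user builds a credits page. Whether this scales to real Wikidata volumes is exactly the kind of claim that would need measuring; the sketch only shows that the two concerns can be kept separate.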
As for letting users specify under which free licenses they publish their work, that would just require an additional attribute, a negligible weight when balanced against the equity concerns it resolves. Could that prevent some uses by some actors? Yes, and that is actually the point: preventing abuse by those who don't want to act equitably. For all other actors, "distribute under the same conditions" is fine.
"This is potentially computationally hard to do and depending on where the data is used very inconvenient (think of a map with hundreds of data points in a mobile app)."
OpenStreetMap, which uses ODbL, a copyleft license requiring attribution, does exactly that too, doesn't it? By the way, allowing a license per item would make it possible to include OpenStreetMap data in Wikidata, which is currently impossible due to the project's CC0-only policy. Too bad, it could be so useful to have this data accessible to Wikimedia projects, but who cares?
"This is a burden on our re-users that I do not want to impose on them."
Wait, which re-users? Surely one might expect Wikidata to care first about re-users who are in line with the Wikimedia goal, so the needs of the Wikimedia community in particular and of Free/Libre Culture in general should surely be considered. Would these re-users be penalized by a copyleft license? Surely not, or they wouldn't use such licenses as extensively as they do. So who are these re-users whom it is thought preferable, without consulting the community, not to bother with questions of equity and traceability?
"It would make it significantly harder to re-use our data and be in direct conflict with our goal of spreading knowledge."
No, technically it would be just as easy as pressing one button on a computer rather than another. What is in direct conflict with our clearly stated goals emerging from the 2017 community consultation is going against equity and traceability.
You propose to discard both to satisfy exogenous demands which should have next to no weight in decisions impacting so deeply the future of our community.
"Whether data can be protected in this way at all or not depends on the jurisdiction we are talking about. See this Wikilegal on database rights https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights for more details."
It basically says that it is applicable in the United States and Europe, on different legal bases and to different extents. As for the rest of the world, it doesn't say that nothing can apply; it states nothing.
"So even if we would have decided to require attribution it would only be enforceable in some jurisdictions."
What kind of logic is that? It might not be applicable in some countries, so let's give up the few rights we have?
"Ambiguity, when it comes to legal matters, also unfortunately often means that people refrain from what they want to do for fear of legal repercussions. This is directly in conflict with our goal of spreading knowledge."
Economic inequality, social inequity and legal imbalance might also keep people from doing what they want, as they fear practical repercussions. CC0 strengthens these discrimination factors by forcing people to give up the few rights they have to counterbalance the growing asymmetry that social structures are concomitantly building. So CC0 as the only license choice is in direct conflict with our goal of *equitably* spreading knowledge. This statement also seems to suggest that releasing our contributions only under CC0 is the sole way to reduce legal doubts. Actually, any well-written license would do an equal job on this point, including many copyleft licenses out there. So while associating a clear license with each data item might indeed reduce legal uncertainty, it is no argument at all for enforcing CC0 as the sole license available to contributors.
Moreover, just putting a license next to a work does not ensure that the person who made the association was legally allowed to do so. To have better confidence in the legitimacy of a statement that a work is covered by a certain license, there is once again a traceability requirement. For example, Wikidata currently includes many items which were imported from various Wikipedia versions, and claims that the derived work obtained, a set of items and statements, is under CC0. That is a hugely doubtful statement, and it looks alarmingly like license laundering https://en.wikipedia.org/wiki/license_laundering. This is true for Wikipedia, but it is also true for any source on which large-scale extraction and import are performed, whether through bots or crowdsourcing. So the Wikidata project is currently extremely badly placed to give lessons on legal ambiguity, as it relies heavily on legal blur and on the hope that its shady practices won't come under too much scrutiny.
"Licenses that require attribution are often used as a way to try to make it harder for big companies to profit from openly available resources."
No, they are not. They are used as *a way to try to make it harder for big companies to profit from openly available resources* *in inequitable manners*. That is completely different. Copyleft licenses give the same rights to big companies and individuals, in a manner that lowers the socio-economic inequalities which disproportionately advantage the former.
"The thing is there seems to be no indication of this working."
Because it is not trying to enforce what you claim, so of course it is not working towards that goal. But for the goal that copyleft licenses do aim at, there is clear evidence that it works.
"Big companies have the legal and engineering resources to handle both the legal minefield and the technical hurdles easily."
There is no pitfall in copyleft licenses. Using a war-materiel analogy is disrespectful.
It's true that copyleft licenses might come with some constraints that non-copyleft free licenses don't have, but that is the price of fostering equity. And it is a low price that even individuals can manage: it might require a little extra time on legal considerations, but on the other hand being able to use the free work is an immense gain that is worth it. Why you shouldn't use the Lesser GPL for your next library https://www.gnu.org/licenses/why-not-lgpl.html states that *proprietary software developers have the advantage of money; free software developers need to make advantages for each other*. This might be generalised as *big companies have the advantage of money; free/libre culture contributors need to make advantages for each other*. So, contrary to what these fallacious claims against copyleft licenses pretend, they are not a "minefield and the technical hurdles" that only big companies can handle. What is more, let's recall who financed the initial development of Wikidata: only actors related to big companies.
"Who it is really hurting is the smaller start-up, institution or hacker who can not deal with it."
If this statement is about copyleft licenses, then it is just plainly false. Smaller actors have more to gain from preserving the mutual benefit of the common ecosystem that a copyleft license fosters.
"With Wikidata we are making structured data about the world available for everyone."
And that's great. But achieving that doesn't require CC0 as the sole license.
"We are leveling the playing field to give those who currently don’t have access to the knowledge graphs of the big companies a chance to build something amazing."
And that's great. But that doesn't require CC0 as the sole license. Actually, CC0 makes the project less sustainable on this point, as it allows unfair actors to take it all, add some interesting added value that our community cannot afford, and reach or reinforce a hegemonic position in the ecosystem with their own closed solution.
And, ta-da, Wikidata can then be discontinued quietly, just as Google did with the now-defunct Freebase, which was CC-BY-SA before they bought the company that was running it, and which they afterwards imported under CC0 into Wikidata as a new attempt to gather a larger community of free curators. And once it has performed license laundering of all Wikimedia projects' works through shady mass extraction and import, Wikimedia can disappear as well. Of course big companies benefit more from these possibilities than actors with less financial support and no hegemonic position.
"Thereby we are helping more people get access to knowledge from more places than just the few big ones."
No, with CC0 you are certainly helping big companies reinforce a position in which they can distribute information manipulated as they wish, with no regard for traceability and equity. Allowing contributors to also use copyleft licenses would be far more effective to *collect and use different forms of free, trusted knowledge* that *focus efforts on the knowledge and communities that have been left out by structures of power and privilege*, as stated in *Our strategic direction: Service and Equity*.
"CC-0 is becoming more and more common."
Just like economic inequality https://en.wikipedia.org/wiki/economic_inequality. But that is not what we are aiming to foster in the Wikimedia movement.
"Many organisations are releasing their data under CC-0 and are happy with the experience. Among them are the European Union, Europeana, the National Library of Sweden and the Metropolitan Museum of Modern Arts."
Good for them. But they are not the Wikimedia community; they have their own goals and their own plans for sustainability, which our community cannot necessarily follow. Different contexts require different means. States and their institutions can count on tax revenue, and if taxpayers' money ends up in public domain works, that's great and seems fair.
States are rarely threatened by companies; they have legal levers to put pressure on that kind of entity, although conflicts of interest and lobbying can of course qualify this statement. Importing that kind of data with proper attribution and license is fine, be it CC0 or any other free license. But that is not an argument in favour of imposing on volunteers a systematic waiver of all their rights as the only way to contribute.
"All this being said we do encourage all re-users of our data to give attribution to Wikidata because we believe it is in the interest of all parties involved."
That's it: zero legal hope of equity.
"And our experience shows that many of our re-users do give credit to Wikidata even if they are not forced to."
Experience also shows that some prominent actors like Google no longer credit the Wikimedia community when directly generating answers based on, inter alia, information coming from Wikidata, which is itself performing license laundering of Wikipedia data.
"Are there no downsides to this? No, of course not. Some people chose not to participate, some data can't be imported and some re-users do not attribute us. But the benefits I have seen over the years for Wikidata and the larger open knowledge ecosystem far outweigh them."
This should at least be backed by solid statistics showing that it has had a positive impact in terms of audience and contribution across Wikimedia projects as a whole. Maybe the introduction of Wikidata did have a positive effect on the total number of contributors, or maybe so far it shows no significant correlation, or maybe it correlates with a decrease in the total number of active contributors. Some plots would be interesting here. Mere personal feelings about benefits and hindrances mean nothing here, mine included of course.
Plus, there has not been even the beginning of an attempt at A/B testing with a second Wikibase instance that would allow users to select which licenses their contributions are released under, so there is no way to state anything backed by a relevant comparison. The fact that some people are satisfied with the current state of things doesn't mean they would not be even more satisfied with a more equitable solution that allows contributors to choose a set of free licenses for their publications. What is more, this is all about the sustainability and fostering of our community and reaching its goals, not the immediate feeling of satisfaction of some people.
[1] Wikipedia Signpost, 2 December 2015 https://en.wikipedia.org/wiki/en:Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed
[2] According to Lydia's next statement.
Once again, let me recall that this is not a manifesto against Wikidata. The motivation behind this message is the hope that one day one might participate in Wikidata with the same respect for equity and traceability that is granted in other Wikimedia projects.
Kun multe da vikiamo, mathieu
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
--
Scott MacLeod - Founder & President
World University and School
415 480 4577
CC World University and School - like CC Wikipedia with best
STEM-centric CC OpenCourseWare - incorporated as a nonprofit university and school in California, and is a U.S. 501 (c) (3) tax-exempt educational organization.
Mathieu,
You don't seem to grasp the essential legal point, though several people in this thread have already tried to tell you.
Copyright protects expression and creative originality. It does not protect merely a collation of facts.
The CC-SA licence is based on copyright. Anything that is not protected by copyright is not protected by the CC-SA licence.
To the extent that an article can be reduced to a mere collation of facts, it is not protected by copyright. What is protected is any originality or creativity in how those facts are organised and presented -- the expression, the sequence of thought, the selections of words, all the authorial choices in the text.
*That* is the difference between a copyright-protected Wiki article on the one hand, and a Wikidata collection of facts on the other.
In the European Union collections of facts can be protected by database rights.
That is the path OpenStreetMap chose when they designed the ODbL, to prevent their work being eaten up and assimilated by closed commercial rivals.
It is not the choice Wikidata made. And it is not the choice any of the Wiki projects made before Wikidata -- CC-SA disclaims database rights.
The debate between the two views goes back at least as far as GPL vs BSD, and the arguments have been gone over many many times in many many communities over that time.
Yes, CC0 causes us some difficulties.
It means what we can import from OpenStreetMap is very restricted -- mass import falls foul of OSM's database rights; and also coordinates and boundaries are somewhat susceptible to judgment, so there is probably a copyright element too.
It also makes it difficult to import from official sources (e.g. the UK Open Government Licence) that use database rights to require attribution -- that is not an obligation we are prepared to pass on to our re-users, which means we generally have to forgo such sources.
But the counterbalance is that for many people it is the openness and reusability for all purposes of Wikidata that very much encourages them to contribute -- they feel the more reusable and reused their work is, the more it is worth contributing.
The important point though is that this boat has sailed. Wikidata is CC0, and it is not going to change now.
Yes, somebody could fork the data from Wikidata into their own ODbL project if they wanted to. CC0 allows that. (The reverse direction is what is difficult.) You might have preferred ODbL on viral GPL-style community-building (or community-isolating) grounds. But that is not going to happen.
As regards Wiktionary, it means that Wikidata cannot import from Wiktionary anything that represents original expression or original creativity.
But there is no restriction, not from copyright law, nor from the CC-BY-SA licence, to stop Wikidata -- or anyone else -- extracting and systematically storing standard uncontroversial facts, so long as nothing of original expression is taken.
Please confirm that you understand this.
Best regards,
James.
On 30/11/2017 21:17, mathieu stumpf guntz wrote:
Le 30/11/2017 à 18:05, Yair Rand a écrit :
Wikidata is not replacing Wiktionary.
We will see that in the future. At least the proposed model allows including most things that you might find in a Wiktionary article, plus it comes with all the benefits of a relational(-like) database.
See https://www.mediawiki.org/wiki/Extension:WikibaseLexeme/Data_Model for more information on what it will allow or not.
Wikidata did not replace Wikipedia, and force all articles to be under CC-0.
Sure. Not yet. But if it continues to improve, along with tools to generate prose from it, at some point it might do a good job of just that.
Structured data for Commons doesn't replace all Commons media with CC-0-licensed content.
Well, unless one tries to use it in a very different way from what it is aiming at, there is no chance of that, as pictures contain far more information than their metadata. Now, technically one could probably store the whole picture in that kind of structure (provided no size restriction is enforced), but this is not the goal.
This is a very different case from the Wiktionary one. The case of Wikipedia might be closer, but you cannot make a simple one-to-one correspondence between Wikidata elements and Wikipedia prose. Actually, extracting Wikipedia content into statements usable in Wikidata is far easier with current natural language processing toolkits. On the other hand, such a bijective correspondence between a Wiktionary article and a set of WikibaseLexeme elements is clearly straightforward. So the domains of knowledge being documented overlap heavily. Plus, the Wikibase approach brings many advantages in terms of knowledge factorisation.
To my mind, WikibaseLexeme has good potential to quickly supersede our plethora of sparsely communicating Wiktionaries, at least far sooner than Wikibase will have a chance to approach the same level as Wikipedia articles.
The fact that France is in Europe is not, independently, copyrightable. The fact that File:Vanessa_indica-Silent_Valley-2016-08-14-002.jpg is a picture of a butterfly is not copyrightable. The facts that "balloons" is the plural of "balloon", and that "feliĉiĝi" is an intransitive verb in Esperanto, are not copyrightable.
Surely that is something we all agree on. :)
Even if they were copyrightable, copyrighting them independently would harm their potential reuse, as elements of a database, as has been previously explained.
Any information monopoly is a possible obstacle to reuse. No one will deny that, I guess. But information monopolies, such as copyright, patents and so on, do exist. And so does unequal access to resources useful for human flourishing, including knowledge.
Now, personally I am not satisfied with this situation, nor with the growth of inequalities. Part of my motivation for contributing to Wikimedia projects is that it might help make the situation evolve otherwise. That might not be among the motivations of every contributor, but I guess I'm not alone in this.
So the question for me is not "how do we make the current snapshots of our knowledge bank as reusable as possible right now?", but "how do we build a sustainable movement which maintains and updates knowledge banks that are as accessible as possible for every single human out there, with this goal of sustainability in mind?".
Maybe it's not what every single stakeholder of our movement is expecting. But I don't feel that this personal vision is at odds with what is stated in the strategic direction. And I hope I'm not alone in holding this vision.
Wikipedia articles and Commons Media are not structured data, and as such, they do not belong in Wikidata.
I think your statement is wrong here. Wikipedia articles are structured on several analysable levels. For example, from the point of view of a common linguistic theory, they are structured and analysable on the syntactic, semantic and pragmatic levels. But there are many other ways in which you might analyse them, because they are structured data. It is true, however, that they are not structured in a way that eases SQL-like querying.
However, every single sentence contained in Wikipedia articles can be reduced to a set of predicates; that is, they are reducible to things that can be stored in Wikidata. There is no technical barrier I'm aware of that prevents putting the whole content of all Wikipedias into as many statements as required within Wikidata.
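As a toy illustration of what "reducing a sentence to a set of predicates" means, here is a small Python sketch. Everything in it is an assumption made for the example: the `Triple` type, the predicate names and the hand-made decomposition are hypothetical, and no natural language processing is performed. It only shows that once a sentence has been decomposed, the result is plain structured data that can be queried.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str


# Hand-made decomposition (an assumption, not the output of any tool) of:
# "France is a country in Europe; its capital is Paris."
TRIPLES = [
    Triple("France", "instance of", "country"),
    Triple("France", "located in", "Europe"),
    Triple("France", "capital", "Paris"),
]


def objects_of(subject, predicate, triples=TRIPLES):
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]
```

Calling `objects_of("France", "capital")` looks up the one matching statement. The hard part in practice is of course the decomposition itself, which this sketch does by hand.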
Elements of prose in Wiktionary, such as definitions, appendices, extensive usage notes and notes on grammar and whatnot, are copyrightable. Similar to Wikipedia articles, licensing them under CC-BY-SA would not particularly harm their reuse, as attribution is completely feasible. They are also not structured data, and can not be made into structured data.
Well, as far as I'm concerned, it would be great news to hear that the Wikidata team will indeed allow contributors to include this CC-BY-SA material in the Wikibase instance/namespace/whatever place where these lexicological items will be stored, rather than enforcing contribution under CC0 here too. But so far the statements made by the Wikidata team point in the exact opposite direction, that is, using CC0 for everything.
Wikidata will not be laundering this data to CC-0, nor will it be setting up a parallel project to duplicate the efforts under a license which is not appropriate for the type of content.
I hope the future will prove you right.
Attempting to license the database's contents under CC-BY-SA would not ensure attribution, and would harm reuse. I fail to see any potential benefits to using the more restrictive license. Attribution will be required where it is possible (in Wiktionary proper), and content will be as reusable as possible in areas where requiring attribution isn't feasible (in Wikidata). There's no real conflict here.
I hope my answer made these conflicts more obvious, and showed how "more reusable right now" might rhyme with "less equity and accessibility of knowledge in the long term".
-- Yair Rand
2017-11-29 16:45 GMT-05:00 Mathieu Stumpf Guntz <psychoslave@culture-libre.org>:
Saluton ĉiuj,
I forward here the message I initially posted on the Meta Tremendous Wiktionary User Group talk page
because I'm interested to have a wider feedback of the community on this point. Whether you think that my view is completely misguided or that I might have a few relevant points, I'm extremely interested to know it, so please be bold.
Before you consider digging further in this reading, keep in mind that I stay convinced that Wikidata is a wonderful project and I wish it a bright future full of even more amazing things than what it already brung so far. My sole concern is really a license issue.
Bellow is a copy/paste of the above linked message:
Thank you Lydia Pintscher https://meta.wikimedia.org/wiki/User:Lydia_Pintscher_%28WMDE%29 for taking the time to answer. Unfortunately this answer https://www.wikidata.org/wiki/User:Lydia_Pintscher_%28WMDE%29/CC-0 miss too many important points to solve all concerns which have been raised.
Notably, there is still no beginning of hint in it about where the decision of using CC0 exclusively for Wikidata came from. But as this inquiry on the topic
advance, an answer is emerging from it. It seems that Wikidata choice toward CC0 was heavily influenced by Denny Vrandečić, who – to make it short – is now working in the Google Knowledge Graph team. Also it worth noting that Google funded a quarter of the initial development work. Another quarter came from the Gordon and Betty Moore Foundation, established by Intel co-founder. And half the money came from Microsoft co-founder Paul Allen's Institute for Artificial Intelligence (AI2)[1]
https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-1.
To state it shortly in a conspirational fashion, Wikidata is the puppet trojan horse of big tech hegemonic companies into the realm of Wikimedia. For a less tragic, more argumentative version, please see the research project (work in progress, only chapter 1 is in good enough shape, and it's only available in French so far). Some proofs that this claim is completely wrong are welcome, as it would be great that in fact that was the community that was the driving force behind this single license choice and that it is the best choice for its future, not the future of giant tech companies. This would be a great contribution to bring such a happy light on this subject, so we can all let this issue alone and go back contributing in more interesting topics.
Now let's examine the thoughts proposed by Lydia.
Wikidata is here to give more people more access to more knowledge. So far, it makes it matches Wikimedia movement stated goal. This means we want our data to be used as widely as possible. Sure, as long as it rhymes with equity. As in /Our strategic direction: Service and //*Equity*/
Just like we want freedom for everybody as widely as possible. That is, starting where it confirms each others freedom. Because under this level, freedom of one is murder and slavery of others. CC-0 is one step towards that. That's a thesis, you can propose to defend it but no one have to agree without some convincing proof. Data is different from many other things we produce in Wikimedia in that it is aggregated, combined, mashed-up, filtered, and so on much more extensively. No it's not. From a data processing point of view, everything is data. Whether it's stored in a wikisyntax, in a relational database or engraved in stone only have a commodity side effect. Whether it's a random stream of bit generated by a dumb chipset or some encoded prose of Shakespeare make no difference. So from this point of view, no, what Wikidata store is not different from what is produced anywhere else in Wikimedia projects. Sure, the way it's structured does extremely ease many things. But this is not because it's data, when elsewhere there would be no data. It's because it enforce data to be stored in a way that ease aggregation, combination, mashing-up, filtering and so on. Our data lives from being able to write queries over millions of statements, putting it into a mobile app, visualizing parts of it on a map and much more. Sure. It also lives from being curated from millions[2]
https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-2
of benevolent contributors, or it would be just a useless pile of random bytes. This means, if we require attribution, in a huge number of cases attribution would need to go back to potentially millions of editors and sources (even if that data is not visible in the end result but only helped to get the result). No, it doesn't mean that. First let's recall a few basics as it seems the whole answer makes confusion between attribution and distribution of contributions under the same license as the original. Attribution is crucial for traceability and so for reliable and trusted knowledge that we are targeting within the Wikimedia movement. The "same license" is the sole legal guaranty of equity contributors have. That's it, trusted knowledge and equity are requirements for the Wikimedia movement goals. That means withdrawing this requirements is withdrawing this goals. Now, what would be the additional cost of storing sources in Wikidata? Well, zero cost. Actually, it's already here as the "reference" attribute is part of the Wikibase item structure. So attribution is not a problem, you don't have to put it in front of your derived work, just look at a Wikipedia article: until you go to history, you have zero attribution visible, and it's ok. It's also have probably zero or negligible computing cost, as it doesn't have to be included in all computations, it just need to be retrievable on demand. What would be the additional cost of storing licenses for each item based on its source? Well, adding a license attribute might help, but actually if your reference is a work item, I guess it might comes with a "license" statement, so zero additional cost. Now for letting user specify under which free licenses they publish their work, that would just require an additional attribute, a ridiculous weight when balanced with equity concerns it resolves. Could that prevent some uses for some actors? 
Yes, that's actually the point, preventing abuse of those who doesn't want to act equitably. For all other actors a "distribute under same condition" is fine. This is potentially computationally hard to do and and depending on where the data is used very inconvenient (think of a map with hundreds of data points in a mobile app). OpenStreetMap which use ODbL, a copyleft attributive license, do exactly that too, doesn't it? By the way, allowing a license by item would enable to include OpenStreetMap data in WikiData, which is currently impossible due to the CC0 single license policy of the project. Too bad, it could be so useful to have this data accessible for Wikimedia projects, but who cares? This is a burden on our re-users that I do not want to impose on them. Wait, which re-users? Surely one might expect that Wikidata would care first of re-users which are in the phase with Wikimedia goal, so surely needs of Wikimedia community in particular and Free/Libre Culture in general should be considered. Do this re-users would be penalized by a copyleft license? Surely no, or they wouldn't use it extensively as they do. So who are this re-users for who it's thought preferable, without consulting the community, to not annoy with questions of equity and traceability? It would make it significantly harder to re-use our data and be in direct conflict with our goal of spreading knowledge. No, technically it would be just as easy as punching a button on a computer to do that rather than this. What is in direct conflict with our clearly stated goals emerging from the 2017 community consultation is going against equity and traceability. You propose to discard both to satisfy exogenous demands which should have next to no weight in decision impacting so deeply the future of our community. Whether data can be protected in this way at all or not depends on the jurisdiction we are talking about. 
See this Wikilegal on database rights https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights for more details.
It basically says that it's applicable in the United States and Europe on different legal bases and to different extents. And for the rest of the world, it doesn't say that nothing can apply; it states nothing.
So even if we would have decided to require attribution it would only be enforceable in some jurisdictions.
What kind of logic is that? It might not be applicable in some countries, so let's withdraw the few rights we have.
Ambiguity, when it comes to legal matters, also unfortunately often means that people refrain from what they want to do for fear of legal repercussions. This is directly in conflict with our goal of spreading knowledge.
Economic inequality, social inequity and legal imbalance might also keep people from doing what they want, as they fear practical repercussions. CC0 strengthens these discrimination factors by forcing people to give up the few rights they have to weigh against the growing asymmetry that social structures are concomitantly building. So CC0 as the unique license choice is in direct conflict with our goal of *equitably* spreading knowledge.
Also, this statement seems to suggest that releasing our contributions only under CC0 is the sole solution to diminish legal doubts. Actually, any well-written license would do an equal job on this point, including many copyleft licenses out there. So while associating a clear license with each data item might indeed diminish legal uncertainty, it's not an argument at all for enforcing CC0 as the sole license available to contributors.
Moreover, just putting a license side by side with a work does not ensure that the person who made the association was legally allowed to do so. To have better confidence in the legitimacy of a statement that a work is covered by a certain license, there is once again a traceability requirement.
For example, Wikidata currently includes many items which were imported from various Wikipedia versions, and claims that the derived work obtained – a set of items and statements – is under CC0. That is a hugely doubtful statement, and it looks alarmingly like license laundering https://en.wikipedia.org/wiki/license_laundering. This is true for Wikipedia, but it's also true for any source on which large-scale extraction and import are performed, whether through bots or crowdsourcing. So the Wikidata project is currently extremely badly placed to give lessons on legal ambiguity, as it plays heavily on legal blur and the hope that its shady practices won't fall under too much scrutiny.
Licenses that require attribution are often used as a way to try to make it harder for big companies to profit from openly available resources.
No, they are not. They are used as /a way to try to make it harder for big companies to profit from openly available resources/ *in inequitable manners*. That's completely different. Copyleft licenses give the same rights to big companies and individuals, in a manner that lowers the socio-economic inequalities which disproportionately advantage the former.
The thing is there seems to be no indication of this working.
Because it's not trying to enforce what you claim, so of course it's not working for this goal. But for the goal that copyleft licenses aim at, there is clear evidence that yes, it works.
Big companies have the legal and engineering resources to handle both the legal minefield and the technical hurdles easily.
There is no pitfall in copyleft licenses. Using a war-materiel analogy is disrespectful. It's true that copyleft licenses might come with some constraints that non-copyleft free licenses don't have, but that's the price for fostering equity.
And it's a low price that even individuals can manage: it might require a little extra time on legal considerations, but on the other hand using the free work is an immensely vast gain that is worth it. In Why you shouldn't use the Lesser GPL for your next library https://www.gnu.org/licenses/why-not-lgpl.html it is stated that /proprietary software developers have the advantage of money; free software developers need to make advantages for each other/. This might be generalised as /big companies have the advantage of money; free/libre culture contributors need to make advantages for each other/. So, contrary to what these fallacious claims against copyleft licenses pretend, they are not a "minefield and the technical hurdles" that only big companies can handle. All the more, let's recall who financed the initial development of Wikidata: only actors which are related to big companies.
Who it is really hurting is the smaller start-up, institution or hacker who can not deal with it.
If this statement is about copyleft licenses, then it is just plainly false. Smaller actors have more to gain from preserving the mutual benefit of the common ecosystem that a copyleft license fosters.
With Wikidata we are making structured data about the world available for everyone.
And that's great. But that doesn't require CC0 as the sole license to be achieved.
We are leveling the playing field to give those who currently don’t have access to the knowledge graphs of the big companies a chance to build something amazing.
And that's great. But that doesn't require CC0 as the sole license. Actually, CC0 makes it a less sustainable project on this point, as it allows unfair actors to take it all, add some interesting value that our community cannot afford, and reach or reinforce a hegemonic position in the ecosystem with their own closed solution.
And, ta-da, Wikidata can be discontinued quietly, just like Google did with the defunct Freebase, which was CC-BY-SA before they bought the company that was running it, and which they afterwards imported under CC0 into Wikidata as a new attempt to gather a larger community of free curators. And once it has performed license laundering of all Wikimedia projects' works with shady mass extraction and import, Wikimedia can disappear as well. Of course big companies benefit more from these possibilities than actors with smaller financial support and no hegemonic position.
Thereby we are helping more people get access to knowledge from more places than just the few big ones.
No, with CC0 you are certainly helping big companies to reinforce their position, in which they can distribute information manipulated as they wish, without regard for traceability and equity. Allowing contributors to also use copyleft licenses would be far more effective to /collect and use different forms of free, trusted knowledge/ that /focus efforts on the knowledge and communities that have been left out by structures of power and privilege/, as stated in /Our strategic direction: Service and Equity/.
CC-0 is becoming more and more common.
Just like economic inequality https://en.wikipedia.org/wiki/economic_inequality. But that is not what we are aiming to foster in the Wikimedia movement.
Many organisations are releasing their data under CC-0 and are happy with the experience. Among them are the European Union, Europeana, the National Library of Sweden and the Metropolitan Museum of Art.
Good for them. But they are not the Wikimedia community; they have their own goals and plans for sustainability, which do not necessarily match what our community can follow. Different contexts require different means. States and their institutions can count on tax revenue, and if taxpayers' money ends up in public domain works, that's great and seems fair.
States are rarely threatened by companies; they have legal levers to put pressure on that kind of entity, although conflicts of interest and lobbying can of course mitigate this statement. Importing that kind of data with proper attribution and license is fine, be it CC0 or any other free license. But that's not an argument in favour of imposing on volunteers a systematic waiver of all their rights as the single option to contribute.
All this being said we do encourage all re-users of our data to give attribution to Wikidata because we believe it is in the interest of all parties involved.
That's it, zero legal hope of equity.
And our experience shows that many of our re-users do give credit to Wikidata even if they are not forced to.
Experience also shows that some prominent actors like Google no longer credit the Wikimedia community when directly generating answers based on, inter alia, information coming from Wikidata, which is itself performing license laundering of Wikipedia data.
Are there no downsides to this? No, of course not. Some people chose not to participate, some data can't be imported and some re-users do not attribute us. But the benefits I have seen over the years for Wikidata and the larger open knowledge ecosystem far outweigh them.
This should at least be backed by some solid statistics showing that it had a positive impact in terms of audience and contribution in Wikimedia projects as a whole. Maybe the introduction of Wikidata did have a positive effect on the evolution of the total number of contributors, or maybe so far it has had no significant correlated effect, or maybe it correlates with a decrease in the total number of active contributors. Some plots would be interesting here. Mere personal feelings of benefits and hindrances mean nothing here, mine included of course.
Plus, there is not even the beginning of an attempt to A/B test with a second Wikibase instance that allows users to select which licenses their contributions are released under, so there is no possible way to state anything backed by a relevant comparison. The fact that there are some people satisfied with the current state of things doesn't mean they would not be even more satisfied with a more equitable solution that allows contributors to choose a set of free licenses for their publications. All the more, this is all about the sustainability and fostering of our community and reaching its goals, not an immediate feeling of satisfaction for some people.
[1] Wikipedia Signpost, 2 December 2015, https://en.wikipedia.org/wiki/en:Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed
[2] according to the next statement of Lydia
Once again, let me recall that this is not a manifesto against Wikidata. The motivation behind this message is the hope that one day one might participate in Wikidata with the same respect for equity and traceability that is granted in other Wikimedia projects.
With lots of wikilove, mathieu
_______________________________________________ Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Hi James,
On 30/11/2017 at 23:54, James Heald wrote:
Mathieu,
You don't seem to grasp the essential legal point, though several people in this thread have already tried to tell you.
Copyright protects expression and creative originality. It does not protect merely a collation of facts.
Well, let's recall https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights#Copyright_protecti...:
A database is protected by copyright when the selection or arrangement is original and creative.^[2] https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights#cite_note-2 The level of creativity required is low, so it doesn’t have to be very creative — as long as the author had some discretion and made some choices in what to include or how to organize it, the database is likely to be protected.
So, depending on how creative your collation arrangement is, copyright might apply (in the US). In Europe, as you point out below, /sui generis/ rights might be enforceable.
The CC-SA licence is based on copyright. Anything that is not protected by copyright is not protected by the CC-SA licence.
To the extent that an article can be reduced to a mere collation of facts, it is not protected by copyright. What is protected is any originality or creativity in how those facts are organised and presented -- the expression, the sequence of thought, the selections of words, all the authorial choices in the text.
The problem not addressed in this reasoning is that all these "creative choices" can themselves be expressed as factual statements. This could be laid out in an extensive development of several competing theses regarding the problem of knowledge and creativity from both gnoseological and epistemic perspectives. But admittedly, here this would be useless off-topic logorrhoea. So in short, throughout history people have developed, inter alia, theories which state that everything is creative, that nothing is creative, or that only some things are creative.
So the problem here is not that I can't grasp the legal point about the creativity argument, but that I'm not in a position to enforce what is considered creative, nor to predict what some undetermined legal entity might prefer to declare creative or not.
For that, you have to get the answer from some legal entity which, through mystic powers inaccessible to mere mortals like me, will be able to perform the magical performative statement https://en.wikipedia.org/wiki/Performative_utterance that will seal the destiny of a work into the realm of creativity or relegate it to vulgar combinatorial material for the rest of eternity (in the scope of its jurisdiction, until some other legal decision states otherwise).
*That* is the difference between a copyright-protected Wiki article on the one hand, and a Wikidata collection of facts on the other.
In the European Union collections of facts can be protected by database rights.
That is the path OpenStreetMap chose, when they designed the ODbL, to prevent their work being eaten up and assimilated by closed commercial rivals.
It is not the choice Wikidata made. And it is not the choice any of the Wiki projects made before Wikidata -- CC-SA disclaims database rights.
Actually, as far as I know, CC-by-sa-3.0-undeed states nothing about /sui generis/ rights, and so doesn't disclaim them but lets them apply to their full extent.
And that is the license that covers all other Wikimedia wiki projects (with a dual GFDL 1.3), except Commons, where users choose whatever free licenses they want, and Wikidata, which permits exclusively CC0.
And a large part of the inquiry on this topic is to determine who decided to use exclusively CC0, through which process and with which goals/perspectives. Some answers stated "long discussions on the topic", but I wasn't given any link so far with something like a vote on the topic, and until something like that is provided, it can't be checked that the community indeed made this decision. So a statement like "the choice Wikidata made" is problematic, as what "Wikidata" is supposed to denote in this context is anything but trivial.
Yes, CC0 causes us some difficulties.
It means what we can import from OpenStreetMap is very restricted -- mass import falls foul of OSM's database rights; and also coordinates and boundaries are somewhat susceptible to judgment, so there is probably a copyright element too.
It also makes it difficult to import from official sources (eg the UK Open Government Licence) that use database rights to require attribution -- that is not an obligation we are prepared to pass on to our re-users, which means we generally have to forego such sources.
I think that with the previously proposed solution of integrating a license attribute, it would be extremely easy for end users to filter out items and statements that come with licenses they don't want to respect, while still enabling others to benefit from their presence in Wikidata.
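To illustrate the filtering idea above, here is a minimal Python sketch. It rests on a loud assumption: a per-statement license field, which Wikibase statements do not carry today. The `Statement` class, the `license_id` field, the `filter_by_license` helper and the sample values are all hypothetical and made up for illustration; nothing here is an existing Wikidata or Wikibase API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical statement record; `license_id` is an assumed extra field."""
    item: str
    prop: str
    value: str
    license_id: str  # license label such as "CC0-1.0" (illustrative only)


def filter_by_license(statements, accepted):
    """Keep only the statements whose license is in the accepted set."""
    return [s for s in statements if s.license_id in accepted]


# Made-up sample data, not real Wikidata content.
SAMPLE = [
    Statement("Berlin", "population", "example value", "CC0-1.0"),
    Statement("Berlin", "boundary", "example polygon", "ODbL-1.0"),
    Statement("Berlin", "description", "example text", "CC-BY-SA-3.0"),
]

# A re-user who only wants CC0 data drops everything else.
cc0_only = filter_by_license(SAMPLE, {"CC0-1.0"})

# A re-user who is fine with copyleft terms keeps all three statements.
copyleft_ok = filter_by_license(SAMPLE, {"CC0-1.0", "ODbL-1.0", "CC-BY-SA-3.0"})
```

Under this assumed model, the choice of acceptable licenses sits with each re-user at query time, instead of being fixed once for every contributor at import time.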
The important point though is that this boat has sailed. Wikidata is CC0, and it is not going to change now.
Why not? It was envisioned and explicitly stated from the very beginning of Wikidata that it might switch license at some point. And actually, I'm not even supporting such a move, but simply opening Wikidata to sources with other free licenses as well.
Yes, somebody could fork the data from Wikidata into their own ODbL project if they wanted to. CC0 allows that. (The reverse direction is what is difficult). You might have preferred ODBL on viral GPL-style community-building (or community-isolating) grounds. But that is not going to happen.
I would prefer to avoid a fork if possible, as it scatters resources. And personally I'm not in favour of enforcing a single license. ODbL alone would leave intact the legal uncertainty issue of massive imports from incompatible licenses such as Wikipedia's. However, if there were a community-driven decision on this topic going in this direction, I would follow it.
Also, a fork would not be accessible from other Wikimedia projects, which is as far as I'm concerned the main interest of Wikidata (indeed I have no interest in how it might be used by some random organisation where I don't have any possibility to contribute). Now if the foundation were interested in running and integrating such a fork, that might begin to become interesting, but otherwise a fork would be useless for the Wikimedia environment.
But there is no restriction, not from copyright law, nor from the CC-BY-SA licence, to stop Wikidata -- or anyone else -- extracting and systematically storing standard uncontroversial facts, so long as nothing of original expression is taken.
Please confirm that you understand this.
I understand that "uncontroversial facts" and "original expression" are too subject to interpretation to enable any accurate prediction of what will be qualified as covered by them by some undetermined legal entity, all the more as I am not a lawyer. If you have some case-law references that pertain to this topic, surely that would be far more interesting than what I might understand or opine.
I'm afraid this will not be the reply you would have preferred, but I salute your educational effort toward me and deeply thank you for taking the time to write such an extensive and detailed answer.
Cheers, mathieu
Google and Wikidata,
Mathieu as an AI responder is really awesome !
Curious, what language is he programmed in and how long did it take you guys to code him ?
:) -Thad +ThadGuidry https://plus.google.com/+ThadGuidry
mathieu stumpf guntz, 01/12/2017 03:00:
Actually, as far as I know, CC-by-sa-3.0-undeed states nothing about /sui generis/ rights
I don't know what's -undeed, but 3.0-it and 4.0 do, which is for instance why ISTAT data can be imported in Wikidata despite the less than ideal license (CC-BY-3.0-it).
Federico
On 01/12/2017 at 14:06, Federico Leva (Nemo) wrote:
mathieu stumpf guntz, 01/12/2017 03:00:
Actually, as far as I know, CC-by-sa-3.0-undeed states nothing about /sui generis/ rights
I don't know what's -undeed, but 3.0-it and 4.0 do, which is for instance why ISTAT data can be imported in Wikidata despite the less than ideal license (CC-BY-3.0-it).
Federico
Sorry, I meant "unported", that is, with no specific claims about local jurisdiction. So, in a nutshell, ported versions of CC-3.0 for European countries such as Italy or France do include clauses related to /sui generis/ rights, while the unported version does not.
And to be complete "undeed" is the Creative Commons sobriquet for "full legal code", as opposed to the simple "deed" presentation for the layman:
The Commons Deed is a handy reference for licensors and licensees, summarizing and expressing some of the most important terms and conditions. Think of the Commons Deed as a user-friendly interface to the Legal Code beneath, although the Deed itself is not a license, and its contents are not part of the Legal Code itself.
https://creativecommons.org/licenses/
Uncreatively, mathieu
Dear Mathieu,
Your post demands my response since I was there when CC0 was first chosen (i.e., in the April meeting). I won't discuss your other claims here -- the discussions on the Wikidata list are already doing this, and I agree with Lydia that no shouting is necessary here.
Nevertheless, I must at least testify to what John wrote in his earlier message (quote included below this email for reference): it was not Denny's decision to go for CC0, but the outcome of a discussion among several people who had worked with open data for some time before Wikidata was born. I have personally supported this choice and still do. I have never received any money directly or indirectly from Google, though -- full disclosure -- I got several T-shirts for supervising in Summer of Code projects.
At no time did Google or any other company take part in our discussions in the zeroth hour of Wikidata. And why should they? From what I can see on their web page, Google has no problem with all kinds of different license terms in the data they display. Also, I can tell you that we would have reacted in a very allergic way to such attempts, so if any company had approached us, this would quite likely have backfired. But, believe it or not, when we started it was all but clear that this would become a relevant project at all, and no major company even cared to lobby us. It was still mostly a few hackers getting together in varying locations in Berlin. There was a lot of fun, optimism, and excitement in this early phase of Wikidata (well, I guess we are still in this phase).
So please do not start emails with made-up stories around past events that you have not even been close to (calling something "research" is no substitute for methodology and rigour). Putting unsourced personal attacks against community members before all other arguments is a reckless way of maximising effect, and such rhetoric can damage our movement beyond this thread or topic. Our main strength is not our content but our community, and I am glad to see that many have already responded to you in such a measured and polite way.
Peace,
Markus
On 30.11.2017 09:55, John Erling Blad wrote:
Licensing was discussed at the start of the project, as in the start of developing code for the project, and as I recall it the arguments for CC0 were valid and sound. That was long before Denny started working for Google.
As I recall it was mentioned during the first week of the project (first week of April), and the discussion reemerged during the first week of development. That must have been week 4 or 5 (first week of May), as the delivery of the laptop was delayed. I was against CC0 as I expected problems with reuse of external data. The arguments for CC0 convinced me.
And yes, Denny argued for CC0, as did Daniel, and I believe Jeroen and Jens did too.
Hello Markus,
First, rest assured that any feedback provided will be integrated into the research project on the topic with proper references, including this email. It might not come before the beginning of next week, however, as I'm already more than fully booked until then. But once again it's on a wiki, be bold.
On 01/12/2017 at 01:18, Markus Krötzsch wrote:
Dear Mathieu,
Your post demands my response since I was there when CC0 was first chosen (i.e., in the April meeting). I won't discuss your other claims here -- the discussions on the Wikidata list are already doing this, and I agree with Lydia that no shouting is necessary here.
Nevertheless, I must at least testify to what John wrote in his earlier message (quote included below this email for reference): it was not Denny's decision to go for CC0, but the outcome of a discussion among several people who had worked with open data for some time before Wikidata was born. I have personally supported this choice and still do. I have never received any money directly or indirectly from Google, though -- full disclosure -- I got several T-shirts for supervising in Summer of Code projects.
Maybe I wasn't clear enough on that either, but to my mind the problem is not money but governance. Anyone with too much cash can throw it wherever they want, and if some falls into Wikimedia's pocket, that's fine.
But the moment a decision that so deeply impacts Wikimedia governance and its future is made, maximum transparency must be present, communication must be extensive, and taking community feedback into account is extremely preferable. No one is perfect, myself included, so it's all the more important to listen to external feedback. I said earlier that I found the knowledge engine was a good idea, but from what I read it seems that transparency didn't meet the expectations of the community.
So, I was wrong in my inferences around Denny, good news. Of course I would prefer to have other archived sources to confirm that. No mistrust intended, I think most of us are accustomed to putting claims in perspective with sources and thinking critically.
For completeness, was this discussion online or – to bring back the earlier stated testimony – around a pizza? If possible, could you provide a list of the people involved? Did a single person take the final decision, or was it a show of hands, or did some consensus emerge from discussion? Or maybe the community was consulted with a vote, and if so, where can I find the archive?
Also, archives show that lawyers were consulted on the topic; could we have a copy of their report?
At no time did Google or any other company take part in our discussions in the zeroth hour of Wikidata. And why should they? From what I can see on their web page, Google has no problem with all kinds of different license terms in the data they display.
Because they are moving more and more toward a business model of directly providing what people are looking for, to keep users within their sphere of tracking and influence, probably with the sole aim of generating more revenue, I guess.
Also, I can tell you that we would have reacted in a very allergic way to such attempts, so if any company had approached us, this would quite likely have backfired. But, believe it or not, when we started it was all but clear that this would become a relevant project at all, and no major company even cared to lobby us. It was still mostly a few hackers getting together in varying locations in Berlin. There was a lot of fun, optimism, and excitement in this early phase of Wikidata (well, I guess we are still in this phase).
Please situate that in time so we can place it on a timeline. In March 2012 Wikimedia DE announced the initial funding of 1.3 million euros by Google, Paul Allen's Institute for Artificial Intelligence and the Gordon and Betty Moore Foundation.
So please do not start emails with made-up stories around past events that you have not even been close to (calling something "research" is no substitute for methodology and rigour).
But that's the whole problem here: no one should have to carry the pain of trying to reconstruct what happened through such research. The process behind this kind of decision should have been documented and should be easy to find in the archives. If you have suggestions about methods, please provide them. Just denigrating the work doesn't help in any way to improve it. If there are additional sources that I missed, please provide them. If there are methodologies that would help improve the work, references are welcome.
Putting unsourced personal attacks against community members before all other arguments is a reckless way of maximising effect, and such rhetoric can damage our movement beyond this thread or topic.
All this is built on references. If the analysis is wrong, for example because it missed crucial undocumented information, this must be corrected with additional sources. The Wikidata team, as far as I can tell, was perfectly aware of this project for weeks. So if there were sources that the team considered worthy of my attention to complete my thoughts on the topic, there was plenty of time to provide them before I posted this message.
Our main strength is not our content but our community, and I am glad to see that many have already responded to you in such a measured and polite way.
We completely agree on that. This is a wonderful community. And it is concern for the future of this very community that fueled this project.
I can only reiterate my apologies to anyone who might have felt personally attacked. I can go back and reformulate my message.
I hope you will help me to improve the research, or call it what you like, with more relevant feedback and references.
Peace
Peace,
Markus
On 30.11.2017 09:55, John Erling Blad wrote:
Licensing was discussed at the start of the project, as in the start of developing code for the project, and as I recall it the arguments for CC0 were valid and sound. That was long before Denny started working for Google.
As I recall it was mentioned during the first week of the project (first week of April), and the discussion reemerged during the first week of development. That must have been week 4 or 5 (first week of May), as the delivery of the laptop was delayed. I was against CC0 as I expected problems with reuse of external data. The arguments for CC0 convinced me.
And yes, Denny argued for CC0, as did Daniel, and I believe Jeroen and Jens did too.
Dear Mathieu,
You are in an impossible position. Either you want to be an objective researcher who tries to reconstruct past events as they happened, or you are pursuing an agenda to criticise and change some aspects of Wikidata. The way you do it, you are making yourself part of the debate that you claim you want to reconstruct.
From a research perspective, any material you gather in this way comes with a big question mark. You are not doing us much of a favour either, because by forcing us to refute accusations, you are placing our memories of the past events in a doubtful, heavily biased context.
Your overall approach of considering a theory to be true (or at least equally likely to be true) unless you are given "proofs that this claim is completely wrong" is not scientific. This is not how research works. For a start, Occam's Razor should make you disregard overly complex theories for things that have much simpler explanations (in our case: CC0 is a respected license chosen by many other projects for good reasons, so it is entirely plausible that the founders of Wikidata also just picked it for the usual reasons, without any secret conspiracy). And once you have an interesting theory formed, you need to gather evidence for or against it in a way that is not affected by the theory (i.e., in particular, don't start calls for information with an emotional discussion of whether or not you would personally like the theory to turn out true).
What you are doing here is completely unscientific and I hope that your supervisor (?) will also point this out to you at some point. Moreover, I am afraid that you cannot really get back to the position of an objective observer from where you are now. Better leave this research to others who are not in publicly documented disagreement with the main historic witnesses.
So you should understand that I don't feel compelled to give you a detailed account of every Wikidata-related discussion I had as if I were on some trial here. As a "researcher", it is you who has to prove your theories, not the rest of the world who has to disprove them. I already told you that your main guesses as far as they concern things I have witnessed are not true, and that's all from me for now.
Kind regards,
Markus
On 01.12.2017 03:43, mathieu stumpf guntz wrote:
Hello Markus,
First, rest assured that any feedback provided will be integrated into the research project on the topic with proper references, including this email. It might not come before the beginning of next week, however, as I'm already more than fully booked until then. But once again it's on a wiki, be bold.
On 01/12/2017 at 01:18, Markus Krötzsch wrote:
Dear Mathieu,
Your post demands my response since I was there when CC0 was first chosen (i.e., in the April meeting). I won't discuss your other claims here -- the discussions on the Wikidata list are already doing this, and I agree with Lydia that no shouting is necessary here.
Nevertheless, I must at least testify to what John wrote in his earlier message (quote included below this email for reference): it was not Denny's decision to go for CC0, but the outcome of a discussion among several people who had worked with open data for some time before Wikidata was born. I have personally supported this choice and still do. I have never received any money directly or indirectly from Google, though -- full disclosure -- I got several T-shirts for supervising in Summer of Code projects.
Maybe I wasn't clear enough on that either, but to my mind the problem is not money but governance. Anyone with too much cash can throw it wherever they want, and if some falls into Wikimedia's pocket, that's fine.
But the moment a decision is made that so deeply affects Wikimedia's governance and future, maximum transparency must be present, communication must be extensive, and taking community feedback into account is extremely preferable. No one is perfect, myself included, so it's all the more important to listen to external feedback. I said earlier that I found the knowledge engine a good idea, but from what I read it seems that transparency didn't meet the community's expectations.
So I was wrong in my inferences about Denny: good news. Of course I would prefer to have other archived sources to confirm that. No mistrust intended; I think most of us are accustomed to putting claims in perspective with sources and thinking critically.
For completeness, was this discussion online or – to bring back the earlier stated testimony – around a pizza? If possible, could you provide a list of the people involved? Did a single person take the final decision, was it a show of hands, or did some consensus emerge from discussion? Or maybe the community was consulted with a vote, and if so, where can I find the archive?
Also archives show that lawyers were consulted on the topic, could we have a copy of their report?
At no time did Google or any other company take part in our discussions in the zeroth hour of Wikidata. And why should they? From what I can see on their web page, Google has no problem with all kinds of different license terms in the data they display.
Because they are more and more moving to a business model of providing themselves what people are looking for to keep users in their sphere of tracking and influence, probably with the sole idea of generating more revenue I guess.
Also, I can tell you that we would have reacted in a very allergic way to such attempts, so if any company had approached us, this would quite likely have backfired. But, believe it or not, when we started it was all but clear that this would become a relevant project at all, and no major company even cared to lobby us. It was still mostly a few hackers getting together in varying locations in Berlin. There was a lot of fun, optimism, and excitement in this early phase of Wikidata (well, I guess we are still in this phase).
Please situate that in time so we can place that in a timeline. In March 2012 Wikimedia DE announced the initial funding of 1.3 million Euros by Google, Paul Allen's Institute for Artificial Intelligence and Gordon and Betty Moore Foundation.
So please do not start emails with made-up stories around past events that you have not even been close to (calling something "research" is no substitute for methodology and rigour).
But that's the whole problem here: no one should have to go through the pain of trying to reconstruct what happened through such research. The process behind this kind of decision should have been documented and should be easy to find in archives. If you have suggestions on methods, please provide them. Just denigrating the work doesn't help in any way to improve it. If there are additional sources that I missed, please provide them. If there are methodologies that would help improve the work, references are welcome.
Putting unsourced personal attacks against community members before all other arguments is a reckless way of maximising effect, and such rhetoric can damage our movement beyond this thread or topic.
All this is built on references. If the analysis is wrong, for example because it missed crucial undocumented information, this must be corrected with additional sources. The Wikidata team, as far as I can tell, had been perfectly aware of this project for weeks. So if there were sources that the team considered worth my attention to complete my thoughts on the topic, there was plenty of time to provide them before I posted this message.
Our main strength is not our content but our community, and I am glad to see that many have already responded to you in such a measured and polite way.
We completely agree on that. This is a wonderful community. And it is concern for the future of this very community that fueled this project.
I only can reiterate all apologies to anyone that might have felt personally attacked. I can go back to reformulate my message.
I hope you will help me to improve the research, or call it as you like, with more relevant feedback and references.
Peace
Peace,
Markus
On 30.11.2017 09:55, John Erling Blad wrote:
Licensing was discussed at the start of the project, as in the start of developing code for the project, and as I recall it the arguments for CC0 were valid and sound. That was long before Denny started working for Google.
As I recall it was mentioned during the first week of the project (first week of April), and the discussion reemerged during the first week of development. That must have been week 4 or 5 (first week of May), as the delivery of the laptop was delayed. I was against CC0 as I expected problems with reuse of external data. The arguments for CC0 convinced me.
And yes, Denny argued for CC0, as did Daniel, and I believe Jeroen and Jens did too.
On 01/12/2017 at 09:34, Markus Kroetzsch wrote:
Dear Mathieu,
You are in an impossible position. Either you want to be an objective researcher who tries to reconstruct past events as they happened, or you are pursuing an agenda to criticise and change some aspects of Wikidata. The way you do it, you are making yourself part of the debate that you claim you want to reconstruct.
Well, I guess this is a dilemma that many sociologists and anthropologists have to deal with. That's a really hard epistemic problem you are raising here, and I don't think this list is the place to discuss it extensively. So to make it short, I fully agree that your concern is legitimate, but if your implied conclusion is that it would be better to do nothing rather than take up a difficult epistemic position, I don't share this conclusion. Also, to my mind, belief in absolute objectivity is only a delusion. I prefer to state clearly what I can identify as my own starting point of view and let the audience take my biases into account, rather than pretending that I aim to present the ultimate objective truth.
So I recognize I have a strong bias toward copyleft licenses as a general solution. But as I already stated in this thread, I am also for promoting solutions with fewer legal constraints, depending on the context of production and the goals set. And this is nothing new: I could surely provide links or testimony showing that here and there I promote, and myself use, solutions with fewer legal constraints.
For this project, believe it or not, I had no pre-established agenda to criticise and change Wikidata in a predetermined fashion as a point of departure. Of course I had an opinion before starting this project, and yes, CC0 for Wikidata didn't look appealing to me. But a strong motivation behind this project was to give myself a chance to change my mind by taking a broader view of the choice of CC0 as the sole license: its origin, its impact, and the opinion of the Wikimedia community on the topic. And I remain in this open-minded dynamic.
Now, while doing my research with this goal, I found strong hints of a potential conflict of interest, which was absolutely not what I was looking for. Strong hints of a potential conflict of interest are not proof of a conflict of interest. If there was no such thing, then that's great and I'll document it as such.
Finally, note that taking part in the debate right now doesn't change the fact that I took no part at the moment the decision was taken. That is, I don't have the power to change the past, and I am aiming to document past events on the topic using verifiable, available sources. I don't expect anyone to blindly trust me. Don't blindly trust me. Everybody really interested in the subject should check the sources on which the claims are based, possibly draw a different conclusion, and be bold and make the project evolve, or at least provide feedback.
From a research perspective, any material you gather in this way comes with a big question mark. You are not doing us much of a favour either, because by forcing us to refute accusations, you are placing our memories of the past events in a doubtful, heavily biased context.
Well, I'm sorry for that. But it's nothing new that our community is full of freaks obsessed with transparency, "respect the license" and "reference needed", is it? So how was it not foreseen that one day it would be embarrassing to have no documented information about exactly how this license choice was made, and by whom? My guess is that the simple answer is that humans make errors. I make errors. A lot of them. Many replies in this thread can surely attest to that, can't they? But maybe it would be good to recognize that you too can make errors, rather than trying to put all the shame on me for asking for information about such an important topic so long after the decision occurred.
Your overall approach of considering a theory to be true (or at least equally likely to be true) unless you are given "proofs that this claim is completely wrong" is not scientific.
Claiming that some approach is the one I'm following, discrediting this approach, and concluding that anything I say is therefore wrong is not fair either.
Contemporary scientific method mostly agrees that you have to come with a falsifiable theory, as set out by Thomas Kuhn in /The Structure of Scientific Revolutions/. So this is a condition for having any chance of scientific value. But of course this is no guarantee that the theory is true. At best, the theory has not been proven wrong by any evidence.
This is not how research works. For a start, Occam's Razor should make you disregard overly complex theories for things that have much simpler explanations (in our case: CC0 is a respected license chosen by many other projects for good reasons, so it is entirely plausible that the founders of Wikidata also just picked it for the usual reasons, without any secret conspiracy).
Occam's Razor states that you should always prefer the theory which requires the smallest set of entities and rules available to explain a phenomenon in light of the empirical data. That is completely different from opting for the simplest explanation. The possibility of a conflict of interest requires no hidden conspiracy and no additional entity; it simply considers the possible occurrence of a phenomenon which is widely documented in the social sciences.
Maybe at this point it is also worth stating explicitly that knowing there was no conflict of interest in this decision matters for the sake of governance transparency. But this hypothesis has little bearing on the rather independent question of whether using CC0 as the sole license for Wikidata is the best choice for reaching the goals of the Wikimedia movement in a sustainable manner.
And once you have an interesting theory formed, you need to gather evidence for or against it in a way that is not affected by the theory (i.e., in particular, don't start calls for information with an emotional discussion of whether or not you would personally like the theory to turn out true).
I fully recognize that on this point I misbehaved in this post; I should have refrained from adding so much emotional emphasis to my message.
What you are doing here is completely unscientific and I hope that your supervisor (?) will also point this out to you at some point. Moreover, I am afraid that you cannot really get back to the position of an objective observer from where you are now. Better leave this research to others who are not in publicly documented disagreement with the main historic witnesses.
This research doesn't have a supervisor. This is a Wikiversity research project. Anyone can join and improve it.
So you should understand that I don't feel compelled to give you a detailed account of every Wikidata-related discussion I had as if I were on some trial here. As a "researcher", it is you who has to prove your theories, not the rest of the world who has to disprove them. I already told you that your main guesses as far as they concern things I have witnessed are not true, and that's all from me for now.
The question is not whether you want to give me that kind of detail. I, and the feelings I might inspire, don't matter here. The question is whether you are willing to comply with the requirement of transparency that the Wikimedia movement is attached to, on a topic which directly impacts its governance and future on a large scale.
Kind regards, mathieu
[I'm writing in my personal capacity.]
Hi Mathieu,
On Fri, Dec 1, 2017 at 2:45 AM, mathieu stumpf guntz < psychoslave@culture-libre.org> wrote:
On 01/12/2017 at 09:34, Markus Kroetzsch wrote:
Dear Mathieu,
You are in an impossible position. Either you want to be an objective researcher who tries to reconstruct past events as they happened, or you are pursuing an agenda to criticise and change some aspects of Wikidata. The way you do it, you are making yourself part of the debate that you claim you want to reconstruct.
Well, I guess this is a dilemma that many sociologists and anthropologists have to deal with. That's a really hard epistemic problem you are raising here, and I don't think this list is the place to discuss it extensively. So to make it short, I fully agree that your concern is legitimate, but if your implied conclusion is that it would be better to do nothing rather than going into a difficult epistemic position, I don't share this conclusion.
You can do both, but these will be two separate efforts and you need to be clear to your audience which hat you have on when you're writing your messages. At the moment, the messages come across with mixed signals which makes it really hard to understand what is your goal. FYI: Here is what I have heard so far on this thread from you:
(i) I want to do research to understand how the decision about CC0 was made.
(ii) I demand transparency: You need to answer my questions since transparency is important for us and I have the right to ask about any topic and demand more explanation until my satisfaction.
(iii) I am pretty skeptical about the way CC0 was chosen as a license for Wikidata, and I'm going to dig deep (casually, and not methodically/systematically) to figure out what's going on.
If you're doing (i): We count you as a researcher and you are asked to follow research norms. In this case, I recommend that you open a research page on https://meta.wikimedia.org/wiki/Research:Index , clearly state what the problem is, why it's important to solve it, what methods have been used in the past (literature review) and why they are not enough, what is your methodology, how are you planning to do data collection (for example, will there be interviews? if yes, how are you going to handle the data collected?), results (when they become available), discussion (how you do or don't handle bias in data collection, where you think your study can be improved, ...). Once you have that page up, others may join to help you improve your research methodology and analysis before embarking on the actual research.
If you're doing (ii): Be aware: all of us have to make trade-offs between documentation, spending time on building history, and getting the volunteer/staff work ahead of us done. This is especially true for volunteer projects (which is how Wikidata was initiated). Someone spending time on documentation may mean the project not moving forward, literally. On this front: If you demand transparency and you make documentation a requirement for transparency, you will likely have to work hard to bring more volunteer resources to this community to help us document better/more, and also work with us to create ways for doing documentation without disrupting current workflows as much as possible. This is a long-term discussion, it needs months/years of planning and execution to expand a capacity that is heavily under-resourced in our Movement.
If you're doing (iii): I highly recommend that you start small, even more private, in the future. You are exposing quite a few people. You will hurt them less (or not at all) and still will learn over time. Only if you see strong reasons for opening up things at the level of this mailing list, I suggest you embark on journeys like the one you're on now.
I tend to agree with Markus that you are in a very difficult place now: you have communicated mixed signals, some people are hurt, and you need to spend a lot of time and resources on your end and theirs (if they're willing to), to start from scratch. In practice, you may be better off letting this conversation go and allowing others to pick it up and build it on a clearer base.
Best, Leila
Hi Leila,
First, thank you for your clear analysis and suggestions.
I won't respond extensively on list about this thread anymore for now.
So in reply, I will just make a single point clearer, and take the rest into consideration off list.
On 01/12/2017 at 22:49, Leila Zia wrote:
(ii) I demand transparency: You need to answer my questions since transparency is important for us and I have the right to ask about any topic and demand more explanation until my satisfaction.
Once again, this is not about "I, me and my". Transparency is a core value of *our* Wikimedia movement. So the question is not about reaching my satisfaction, but about the level of transparency which is expected in the Wikimedia movement.
As far as I'm aware, this level is nothing like "a right for any individual to demand full transparency on any topic at whichever level they want". This is just a broad, unfair generalization of what I said. I never demanded such an extensive level of transparency, and I would actually stand against such a demand more vigorously than I am arguing here for more transparency on a scoped issue.
My demand is on a scoped topic which, to my mind, is of deep importance for the general governance of the movement and its future as a whole. So if that is asking for too much information, then yes, it can be said that I was wrong about the level of transparency our community expects of its governance. Or maybe it's the importance of the topic and its impact that I'm misjudging.
I recognize I'm far from perfect, I make mistakes, and the form of my message was a terrible one. An exaggeratedly generalized interpretation of a transparency demand is, however, not a proper way to dismiss the underlying issue.
But once again, this is the single point I wanted to make clearer, and the rest of Leila's message seems full of good advice. So while I'm not going to comment at length in praise of the reply, I'm not short of complimentary thoughts about the rest of it.
Kind regards, mathieu
[I apologize for the longish response, and I will do what I can to take the rest of this offlist as needed. I just see a couple of places where I need to add more explanation.]
On Fri, Dec 1, 2017 at 10:31 PM, mathieu stumpf guntz < psychoslave@culture-libre.org> wrote:
Hi Leila,
First, thank you for your clear analysis and suggestions.
I won't respond extensively on list about this thread anymore for now.
So in reply, I will just make a single point clearer, and take the rest into consideration off list.
On 01/12/2017 at 22:49, Leila Zia wrote:
(ii) I demand transparency: You need to answer my questions since transparency is important for us and I have the right to ask about any topic and demand more explanation until my satisfaction.
Once again, this is not about "I, me and my". Transparency is a core value of *our* Wikimedia movement. So the question is not about reaching my satisfaction, but about the level of transparency which is expected in the Wikimedia movement.
(Side-note. We should take this part offline but for the record: I couldn't find a place where transparency was listed as an agreed upon and shared value of our movement as a whole. There are subgroups that consider it a core value or one of the guiding principles, and it's of course built in in many of the things we do in Wikimedia, but I'm hesitant to call it /a core value of our movement/ given that it's not listed somewhere as such. btw, for the record, it's high on my personal and professional list of values.)
While I agree that transparency is a value for many of us, it is not very clear, to at least me, how we as a whole define transparency to the level that can be used in practice. In the absence of a shared practical definition for transparency, each of us (or groups of us) define a process as transparent as a function of how big/impactful the result of a process is at each point in time, our backgrounds/cultures/countries-we're-from, how much personal trust we have in the process or the people involved in the process, etc. If this is correct, this means that in practice we as individuals or groups define what transparency means for us and we will demand specific things based on our own definition. So, while in theory you are requesting/demanding something that is likely a shared value for many of us, in practice, you are entering your own checklist (that may be shared with some other people's view on transparency in a specific case) that once met, you will call the process transparent. That's why I interpreted what I heard from you as "I" demand transparency, versus "we, as a movement" demand transparency in this case.
To give you a more specific example: as an Iranian involved in Wikimedia movement who knows Markus through his contributions to Wikidata and at a professional/work level, I trusted Markus' words when he said that those in early stages of the project didn't think of Wikidata as a project that one day becomes as big as it is today. I believe it that this was a fun project that they wanted to see succeed, but they were not sure at all if it gets somewhere, so the natural thing to do for them was to spend time to see if they can help it take off at all as opposed to spending time on documenting decisions in case it takes off and they need to show to people how they have done things. If trust between Markus and I were broken, however, I would likely not be content with that level of response and I would ask/demand for more explanation. In case (ii), and in the absence of a shared practical definition of transparency, my personal priors and understandings of the case would define when I call the process transparent.
As far as I'm aware, this level is nothing like "a right for any individual to demand full transparency on any topic at whichever level they want". This is just a broad, unfair generalization of what I said. I never demanded such an extensive level of transparency, and I would actually stand against such a demand more vigorously than I am arguing here for more transparency on a scoped issue.
My demand is on a scoped topic which, to my mind, is of deep importance for the general governance of the movement and its future as a whole. So if that is asking for too much information, then yes, it can be said that I was wrong about the level of transparency our community expects of its governance. Or maybe it's the importance of the topic and its impact that I'm misjudging.
I recognize I'm far from perfect, I make mistakes, and the form of my message was a terrible one. An exaggeratedly generalized interpretation of a transparency demand is, however, not a proper way to dismiss the underlying issue.
Point taken. Those 3 categories and descriptions are not very carefully crafted, partly because I wanted to share the general signals that I've received from your messages (which btw, also touches on another topic: you may or may not mean certain things when you say them, but your audience, based on their own priors can understand them differently.). They are supposed to signal to you how in a broad sense what you had written had translated in my mind. I acknowledge that this thread is about one specific topic (not "any topic") and "right" to transparency can be much stronger than what you had in mind. The intention was not to exaggerate what you had said. Thanks for calling it out.
Best, Leila
Kind regards, mathieu
Leila Zia, 02/12/2017 22:48:
(Side-note. We should take this part offline but for the record: I couldn't find a place where transparency was listed as an agreed upon and shared value of our movement as a whole. There are subgroups that consider it a core value or one of the guiding principles, and it's of course built in in many of the things we do in Wikimedia, but I'm hesitant to call it /a core value of our movement/ given that it's not listed somewhere as such. btw, for the record, it's high on my personal and professional list of values.)
Transparency is one of the 6 main Wikimedia values as listed in the "canonical" values document: https://meta.wikimedia.org/w/index.php?title=Values&oldid=15348985
I know that since 2013 things have become increasingly confusing, with other texts and qualifiers popping up, but I consider that to be just background noise.
Federico
Dear Leila
On 02/12/2017 at 21:48, Leila Zia wrote:
[I apologize for the longish response, and I will do what I can to take the rest of this offlist as needed. I just see a couple of places where I need to add more explanation.]
Then I feel somewhat bound to respond too. But to keep it short, I don't think I say anything in this email that wasn't already said before. So anyone already fed up with this thread can skip this message with no fear of missing any revelation. And to be clear, I don't expect any answer to this message on the list, but I will diligently reply in private if you are looking for more information from me.
(Side-note. We should take this part offline but for the record: I couldn't find a place where transparency was listed as an agreed upon and shared value of our movement as a whole. There are subgroups that consider it a core value or one of the guiding principles, and it's of course built in in many of the things we do in Wikimedia, but I'm hesitant to call it /a core value of our movement/ given that it's not listed somewhere as such. btw, for the record, it's high on my personal and professional list of values.)
Here is an official Wikimedia Foundation presentation from 2017 on leadership in which /being transparent/ is explicitly stated in a slide titled "Staying true to our values": https://meta.wikimedia.org/w/index.php?title=File%3AWhat_is_Leadership%3F.pd...
While I agree that transparency is a value for many of us, it is not very clear, to at least me, how we as a whole define transparency to the level that can be used in practice. In the absence of a shared practical definition for transparency, each of us (or groups of us) define a process as transparent as a function of how big/impactful the result of a process is at each point in time, our backgrounds/cultures/countries-we're-from, how much personal trust we have in the process or the people involved in the process, etc. If this is correct, this means that in practice we as individuals or groups define what transparency means for us and we will demand specific things based on our own definition. So, while in theory you are requesting/demanding something that is likely a shared value for many of us, in practice, you are entering your own checklist (that may be shared with some other people's view on transparency in a specific case) that once met, you will call the process transparent. That's why I interpreted what I heard from you as "I" demand transparency, versus "we, as a movement" demand transparency in this case.
I completely agree with you about the lack of a clear definition for some crucial core notions we use all the time. This is also feedback I read in several comments in the 2017 strategy consultation. Staying vague brings both the pros and the cons of flexibility. Another example is "free license", which is used in the foundation bylaws https://wikimediafoundation.org/wiki/Bylaws but not defined in them. One might argue that "free license" has a clear cultural meaning in the free/libre culture movement, with the four famous freedoms inherited from free software. But this is a legal document, and whatever is not explicitly stated is subject to wide variations in interpretation. Still, at least the foundation has "free license" in its bylaws; I know that the equivalent is not even mentioned in the French chapter's corresponding document https://www.wikimedia.fr/documents-officiels/statuts-de-lassociation/.
To give you a more specific example: as an Iranian involved in Wikimedia movement who knows Markus through his contributions to Wikidata and at a professional/work level, I trusted Markus' words when he said that those in early stages of the project didn't think of Wikidata as a project that one day becomes as big as it is today. I believe it that this was a fun project that they wanted to see succeed, but they were not sure at all if it gets somewhere, so the natural thing to do for them was to spend time to see if they can help it take off at all as opposed to spending time on documenting decisions in case it takes off and they need to show to people how they have done things. If trust between Markus and I were broken, however, I would likely not be content with that level of response and I would ask/demand for more explanation. In case (ii), and in the absence of a shared practical definition of transparency, my personal priors and understandings of the case would define when I call the process transparent.
The issue has nothing to do with whether Markus or anyone else is an honest, sympathetic person; simply by "assuming good faith" we can surely grant that, even without any testimony, to every contributor unless clear proof to the contrary makes us think otherwise. Nor is the issue how the Wikidata project began in some confidential way with uncertain results.
One issue raised here is that publicly available data show that Wikidata's official launch, the choice of the CC0 license, and large funding from three actors related to hegemonic corporations are all very close in time. On the other hand, no reference to a community decision regarding this license policy, if one exists, has yet been provided. Hopefully this formulation will be judged factual enough not to be read as a personal attack on anyone, while still making clear how such a coincidence might raise concerns about a potential conflict of interest. But actually, this first issue seems rather negligible.
The main issue is "to which future such a license policy is going to lead our community".
One scenario might be that, thanks to Wikidata's large visibility, every single stakeholder of the knowledge economy is enlightened by the obviously far more interesting situation of having no information monopoly at all, and together they start heavy lobbying that leads to the global abolition of all information monopolies. Also, everybody becomes kind enough to always maintain traceability with references to their sources.
Another scenario is that BraveNewWorld™, which already has a very large user base in the field of digitally answering people's requests by redirecting them to third-party services, imports all Wikidata information along with many other data sources and directly generates enough relevant information that users never need to consult a document outside the control of BraveNewWorld™. BraveNewWorld™ also includes "improved reality" features in the answers it presents. Because, for example, everybody knows that BraveNewWorld™ is your most trusted source of information, and some answers could inaccurately state otherwise. But BraveNewWorld™ has made sure that this kind of outrageous attempt at reputation damage was made illegal and punishable by death. Some legislators were not completely convinced at first but, by total coincidence, most of these objectors lost all credit soon after, as people were shown how evil these elitists were in their private lives. And now everybody on earth lives happily, in great part because BraveNewWorld™ exists. At least if you believe the autogenerated prose of the Bravepedia article. Some old people venture to claim that many of Bravepedia's statements come from a thing called Wikipedia. But searching for "Wikipedia" in BraveNewWorld™ myReality will reassure everybody, as it explains that these are just hoaxes and the common rambling of old people with dementia. In any case, you trust Bravepedia articles, don't you? Never mind answering, your unconscious reactions have already given enough data to the BraveNewWorld™ myHappySensors that you wear. It has already been computed that everything is going to be fine.
Of course many other scenarios, with obviously plenty of room for far less exaggerated ones, can be depicted.
Point taken. Those 3 categories and descriptions are not very carefully crafted, partly because I wanted to share the general signals that I've received from your messages (which btw, also touches on another topic: you may or may not mean certain things when you say them, but your audience, based on their own priors can understand them differently.). They are supposed to signal to you how in a broad sense what you had written had translated in my mind. I acknowledge that this thread is about one specific topic (not "any topic") and "right" to transparency can be much stronger than what you had in mind. The intention was not to exaggerate what you had said. Thanks for calling it out.
Ok, thank you for your feedback.
2017-12-01 9:34 GMT+01:00 Markus Kroetzsch markus.kroetzsch@tu-dresden.de:
Dear Mathieu,
You are in an impossible position. Either you want to be an objective researcher who tries to reconstruct past events as they happened, or you are pursuing an agenda to criticise and change some aspects of Wikidata. The way you do it, you are making yourself part of the debate that you claim you want to reconstruct.
From a research perspective, any material you gather in this way comes with a big question mark. You are not doing us much of a favour either, because by forcing us to refute accusations, you are placing our memories of the past events in a doubtful, heavily biased context.
Your overall approach of considering a theory to be true (or at least equally likely to be true) unless you are given "proofs that this claim is completely wrong" is not scientific. This is not how research works. For a start, Occam's Razor should make you disregard overly complex theories for things that have much simpler explanations (in our case: CC0 is a respected license chosen by many other projects for good reasons, so it is entirely plausible that the founders of Wikidata also just picked it for the usual reasons, without any secret conspiracy). And once you have an interesting theory formed, you need to gather evidence for or against it in a way that is not affected by the theory (i.e., in particular, don't start calls for information with an emotional discussion of whether or not you would personally like the theory to turn out true).
What you are doing here is completely unscientific and I hope that your supervisor (?) will also point this out to you at some point. Moreover, I am afraid that you cannot really get back to the position of an objective observer from where you are now. Better leave this research to others who are not in publicly documented disagreement with the main historic witnesses.
So you should understand that I don't feel compelled to give you a detailed account of every Wikidata-related discussion I had as if I were on some trial here. As a "researcher", it is you who has to prove your theories, not the rest of the world who has to disprove them. I already told you that your main guesses as far as they concern things I have witnessed are not true, and that's all from me for now.
I agree wholeheartedly with Markus.
I'm sorry to be blunt, but it's been almost three days now and 40+ messages, and it seems that all the fundamental reasons for this thread to be open are either too complicated to be implemented (or at least "not worth the while") or inherently biased and/or unfounded.
Therefore, I kindly ask everyone on this list to close this thread, as it seems that nothing good will ever come out of it.
Thank you.
My reference was to in-place discussions at WMDE, not the open meetings with Markus. Each week we had an open demo which Markus usually attended. As I remember the May discussion, it was just a discussion in the office, and there was a reference to an earlier meeting. It is easy, though, to mix up old memories, so what happened first and what happened next should not be taken as fact. If Markus says the same, though, there is a reasonable chance we have got it right.
As to the questions about archives on open discussions with the community. This was in April-May 2012. There was no community, there were only concerned individuals. The community started to emerge in August with the first attempts to go public. On Wikidata_talk:Introduction there are some posts from 15. August 2012,[1] while the first post on the subject page is from 30. October. The stuff from before October comes from a copy-paste from Meta.[3] Note that Denny writes "The data in Wikidata is published under a free license, allowing the reuse of the data in many different scenarios." but Whittylama changes this to "The data in Wikidata is published under [ http://creativecommons.org/publicdomain/zero/1.0/ a free license], allowing the reuse of the data in many different scenarios.",[4] and at that point there was a community on an open site, and had been for a week. When Whittylama made his post it was the 4504th post on the site, so it was hardly the first! The license was initially a CC-SA.[8] I'm not quite sure when it was changed to CC0 in the footer,[9] but it seems to have happened before 31 October 2012, at 19:09. The first post on Q1 is from 29. October 2012;[5] it is one of several items updated that evening.
It is quite enlightening to start at oldid=1 [6] and step forward. You will find that our present incarnation went live on 25. October 2012. So much for the "birthday". Asking for archived community discussions from before 25th October does not make sense: there was no site, and the only people involved were mostly devs posting at Meta. Note for example that the page Wikidata:Introduction is from Meta.[7]
[1] https://www.wikidata.org/wiki/Wikidata_talk:Introduction
[2] https://www.wikidata.org/w/index.php?title=Wikidata:Introduction&oldid=2...
[3] https://www.wikidata.org/w/index.php?title=Wikidata_talk:Introduction&di...
[4] https://www.wikidata.org/w/index.php?title=Wikidata:Introduction&diff=ne...
[5] https://www.wikidata.org/w/index.php?title=Q1&oldid=103
[6] https://www.wikidata.org/w/index.php?oldid=1
[7] https://meta.wikimedia.org/w/index.php?title=Wikidata/Introduction&oldid...
[8] https://web.archive.org/web/20121027015501/http://www.wikidata.org/wiki/Wiki...
[9] https://web.archive.org/web/20121102074347/http://www.wikidata.org/wiki/Wiki...
On Fri, Dec 1, 2017 at 1:18 AM, Markus Krötzsch < markus@semantic-mediawiki.org> wrote:
Dear Mathieu,
Your post demands my response since I was there when CC0 was first chosen (i.e., in the April meeting). I won't discuss your other claims here -- the discussions on the Wikidata list are already doing this, and I agree with Lydia that no shouting is necessary here.
Nevertheless, I must at least testify to what John wrote in his earlier message (quote included below this email for reference): it was not Denny's decision to go for CC0, but the outcome of a discussion among several people who had worked with open data for some time before Wikidata was born. I have personally supported this choice and still do. I have never received any money directly or indirectly from Google, though -- full disclosure -- I got several T-shirts for supervising in Summer of Code projects.
At no time did Google or any other company take part in our discussions in the zeroth hour of Wikidata. And why should they? From what I can see on their web page, Google has no problem with all kinds of different license terms in the data they display. Also, I can tell you that we would have reacted in a very allergic way to such attempts, so if any company had approached us, this would quite likely have backfired. But, believe it or not, when we started it was anything but clear that this would become a relevant project at all, and no major company even cared to lobby us. It was still mostly a few hackers getting together in varying locations in Berlin. There was a lot of fun, optimism, and excitement in this early phase of Wikidata (well, I guess we are still in this phase).
So please do not start emails with made-up stories around past events that you have not even been close to (calling something "research" is no substitute for methodology and rigour). Putting unsourced personal attacks against community members before all other arguments is a reckless way of maximising effect, and such rhetoric can damage our movement beyond this thread or topic. Our main strength is not our content but our community, and I am glad to see that many have already responded to you in such a measured and polite way.
Peace,
Markus
On 30.11.2017 09:55, John Erling Blad wrote:
Licensing was discussed at the start of the project, as in the start of developing code for the project, and as I recall the arguments for CC0 were valid and sound. That was long before Denny started working for Google.
As I recall it was mentioned during the first week of the project (first week of April), and the discussion re-emerged during the first week of development. That must have been week 4 or 5 (first week of May), as the delivery of the laptop was delayed. I was against CC0 as I expected problems with reuse of external data. The arguments for CC0 convinced me.
And yes, Denny argued for CC0, as did Daniel, and I believe Jeroen and Jens did too.
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Scott,
The NC license clause is problematic in a number of jurisdictions. For example, at least in Germany, as I remember from my law classes, it also would definitively include not-for-profits, NGOs, and even, say, bloggers, with or without ads on their sites. One must always be careful in the choice of a license in order to avoid unintended consequences.
Just food for thought,
Denny
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l New messages to: Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe
Thanks, Denny, and All,
Glad Wikidata is CC-0 re Wikipedia's now 299 languages:
"This is the list of the different language editions of *Wikipedia https://en.wikipedia.org/wiki/Wikipedia*; as of November 2017 there are 299 Wikipedias of which 288 are active and 11 are not." https://en.wikipedia.org/wiki/List_of_Wikipedias
... and that MIT OCW is CC-4 (in now 5 languages), where its NC seems to offer a kind of competitive advantage re other kinds of structured data networking.
Curious where Wiktionary will head in the future with its stated licensing on its front page - and even re GNMT.
Best regards, Scott
On Thu, Nov 30, 2017, 20:51 John Erling Blad jeblad@gmail.com wrote:
My reference was to in-place discussions at WMDE, not the open meetings with Markus. Each week we had an open demo which Markus usually attended. As I remember the May discussion, it was just a discussion in the office, and there was a reference to an earlier meeting. It is easy, though, to mix up old memories, so what happened first and what happened next should not be taken as fact. If Markus says the same, though, there is a reasonable chance we have got it right.
As to the questions about archives on open discussions with the community. This was in April-May 2012. There was no community, there were only concerned individuals. The community started to emerge in August with the first attempts to go public. On Wikidata_talk:Introduction there are some posts from 15. August 2012,[1] while the first post on the subject page is from 30. October. The stuff from before October comes from a copy-paste from Meta.[3] Note that Denny writes "The data in Wikidata is published under a free license, allowing the reuse of the data in many different scenarios." but Whittylama changes this to "The data in Wikidata is published under [ http://creativecommons.org/publicdomain/zero/1.0/ a free license], allowing the reuse of the data in many different scenarios.",[4] and at that point there was a community on an open site, and had been for a week. When Whittylama made his post it was the 4504th post on the site, so it was hardly the first! The license was initially a CC-SA.[8] I'm not quite sure when it was changed to CC0 in the footer,[9] but it seems to have happened before 31 October 2012, at 19:09. The first post on Q1 is from 29. October 2012;[5] it is one of several items updated that evening.
It is quite enlightening to start at oldid=1 [6] and step forward. You will find that our present incarnation went live on 25. October 2012. So much for the "birthday". Asking for archived community discussions from before 25th October does not make sense: there was no site, and the only people involved were mostly devs posting at Meta. Note for example that the page Wikidata:Introduction is from Meta.[7]
[1] https://www.wikidata.org/wiki/Wikidata_talk:Introduction
[2] https://www.wikidata.org/w/index.php?title=Wikidata:Introduction&oldid=2677
[3] https://www.wikidata.org/w/index.php?title=Wikidata_talk:Introduction&diff=133569705&oldid=128154617
[4] https://www.wikidata.org/w/index.php?title=Wikidata:Introduction&diff=next&oldid=4504
[5] https://www.wikidata.org/w/index.php?title=Q1&oldid=103
[6] https://www.wikidata.org/w/index.php?oldid=1
[7] https://meta.wikimedia.org/w/index.php?title=Wikidata/Introduction&oldid=4030743
[8] https://web.archive.org/web/20121027015501/http://www.wikidata.org/wiki/Wikidata:Main_Page
[9] https://web.archive.org/web/20121102074347/http://www.wikidata.org/wiki/Wikidata:Main_Page
On 01/12/2017 at 05:51, John Erling Blad wrote:
My reference was to in-place discussions at WMDE, not the open meetings with Markus. Each week we had an open demo which Markus usually attended. As I remember the May discussion, it was just a discussion in the office, and there was a reference to an earlier meeting. It is easy, though, to mix up old memories, so what happened first and what happened next should not be taken as fact. If Markus says the same, though, there is a reasonable chance we have got it right.
It's perfectly understandable that the limits of human memory come into play here; I was expecting such a response. Are there any minutes of these meetings? No blame if that's not the case: from what I found, Wikimedia DE has already released a large set of archives, including the IRC logs of the open meetings organized each week. But if there is no trace of this, it's really unfortunate that the considerations behind such a crucial decision fell into oblivion while so many logs are available for points that are far less important in terms of governance.
As to the questions about archives of open discussions with the community: this was in April-May 2012. There was no community; there were only concerned individuals.
Just as a side note, in case it wasn't clear: by community, I meant the Wikimedia community at large. Unless I specify otherwise, you can assume that is what the word denotes in my sentences.
The community started to emerge in August with the first attempts to go public. On Wikidata_talk:Introduction there are some posts from 15 August 2012,[1] while the first post on the subject page is from 30 October. The material from before October comes from a copy-paste from Meta.[3] Note that Denny writes "The data in Wikidata is published under a free license, allowing the reuse of the data in many different scenarios." but Whittylama changes this to "The data in Wikidata is published under [http://creativecommons.org/publicdomain/zero/1.0/ a free license], allowing the reuse of the data in many different scenarios.",[4] and at that point there was a community on an open site, and there had been for a week. When Whittylama made his post it was the 4504th post on the site, so it was hardly the first! The license was initially a CC-SA.[8] I'm not quite sure when it was changed to CC0 in the footer,[9] but it seems to have happened before 31 October 2012, at 19:09. The first post on Q1 is from 29 October 2012;[5] it is one of several items updated that evening.
It is quite enlightening to start at oldid=1 [6] and step forward. You will find that our present incarnation went live on 25 October 2012. So much for the "birthday". Asking for archived community discussions from before 25 October does not make sense: there was no site, and the only people involved were mostly devs posting on Meta. Note, for example, that the page Wikidata:Introduction comes from Meta.[7]
Thank you for all this sourced information.
[1] https://www.wikidata.org/wiki/Wikidata_talk:Introduction
[2] https://www.wikidata.org/w/index.php?title=Wikidata:Introduction&oldid=2...
[3] https://www.wikidata.org/w/index.php?title=Wikidata_talk:Introduction&di...
[4] https://www.wikidata.org/w/index.php?title=Wikidata:Introduction&diff=ne...
[5] https://www.wikidata.org/w/index.php?title=Q1&oldid=103
[6] https://www.wikidata.org/w/index.php?oldid=1
[7] https://meta.wikimedia.org/w/index.php?title=Wikidata/Introduction&oldid...
[8] https://web.archive.org/web/20121027015501/http://www.wikidata.org/wiki/Wiki...
[9] https://web.archive.org/web/20121102074347/http://www.wikidata.org/wiki/Wiki...