Re: [Wikidata] An answer to Lydia Pintscher regarding its considerations on Wikidata and CC-0

30 Nov 2017

On 30/11/2017 at 10:14, John Erling Blad wrote:
...
 A single property licensing scheme would allow storage of data; it
 might or might not allow reuse of the licensed data together with
 other data. Remember that all entries in the servers might be part of
 a mashup with all other entries.

That's a very interesting point. Does anyone know of a clear, extensive
report on what is legal or not regarding massive imports of data
extracted from some source?

Indeed, if there were really no limit on using "factual statement" data,
that would be a huge loophole in copyright. For example, you might
enumerate the position of each occurrence of each word in Harry Potter;
those are all pure facts after all. But publishing an extensive set of that
kind of factual statement would let anyone rebuild these books.
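
To make the point concrete, here is a minimal Python sketch. The function
names index_positions and rebuild are made up for this illustration, and it
runs on a short sample sentence rather than on any actual book. It only shows
that a table of word positions, each entry being a plain fact, is enough to
reassemble the original sequence of words.

```python
from collections import defaultdict


def index_positions(text):
    """Map each word to the list of positions where it occurs in the text."""
    positions = defaultdict(list)
    for i, word in enumerate(text.split()):
        positions[word].append(i)
    return dict(positions)


def rebuild(positions):
    """Reassemble the word sequence from nothing but the position facts."""
    total = sum(len(places) for places in positions.values())
    words = [None] * total
    for word, places in positions.items():
        for i in places:
            words[i] = word
    return " ".join(words)


# Hypothetical sample text, standing in for a whole book.
sample = "the cat sat on the mat near the door"
facts = index_positions(sample)
restored = rebuild(facts)
```

Here restored is identical to sample (up to whitespace normalisation), even
though facts contains only statements of the form "word W occurs at position N".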

The same might happen with an extensive extraction of data stored
initially in Wikipedia under CC-by-sa, and imported into Wikidata. There
is already the ArticlePlaceholder[1] extension, which is a first step towards
generating complete encyclopaedic articles in prose, which
should then logically be publishable under CC0. Hence the concerns about
license laundering.

Not having a way to track sources and their corresponding licenses
doesn't automagically make the licensing issues disappear in
the first place. An integrated license tracking system should make it possible to
detect possible infractions in remixes. Users should be informed whether
what they are trying to mix is legally authorized by the various
ultimate sources from which Wikidata gathered the data, or not. Until some
solid legal report points in this direction, it's not accurate to claim
unilaterally that they can do whatever they want regardless of the sources
from which Wikidata gathered the data in the first place, even if it's a
massive import from a differently licensed source.
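
As a rough illustration of what such tracking could look like, here is a
small Python sketch. Everything in it is an assumption made for the example:
the Statement record, the ALLOWED_OUTPUT table and the two helper functions
are hypothetical, and the compatibility table is a deliberate simplification,
not a legal analysis. It only shows that once each statement carries its
source and license, checking a remix reduces to a set intersection.

```python
from dataclasses import dataclass

# Hypothetical, simplified table: for each source license, the set of
# licenses under which a derived work may be published. Illustrative only.
ALLOWED_OUTPUT = {
    "CC0": {"CC0", "CC-BY-SA", "ODbL"},
    "CC-BY-SA": {"CC-BY-SA"},
    "ODbL": {"ODbL"},
}


@dataclass(frozen=True)
class Statement:
    value: str    # the stored data
    source: str   # where it was gathered from
    license: str  # license claimed by that source


def permitted_licenses(statements):
    """Licenses under which a remix of all the statements may be published."""
    allowed = None
    for s in statements:
        options = ALLOWED_OUTPUT[s.license]
        allowed = set(options) if allowed is None else allowed & options
    return allowed or set()


def conflicts(statements, target):
    """Statements whose source license does not permit publishing under target."""
    return [s for s in statements if target not in ALLOWED_OUTPUT[s.license]]


# Made-up example statements.
wp = Statement("some fact", "a Wikipedia article", "CC-BY-SA")
osm = Statement("some coordinates", "OpenStreetMap", "ODbL")
pd = Statement("some label", "a public domain dataset", "CC0")
```

Under this toy table, permitted_licenses([wp, pd]) leaves only CC-BY-SA,
permitted_licenses([wp, osm]) is empty, and conflicts([wp, pd], "CC0") points
at the statement that would block a CC0 release.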

[1] https://www.mediawiki.org/wiki/Extension:ArticlePlaceholder

...

 On Thu, Nov 30, 2017 at 9:55 AM, John Erling Blad <jeblad(a)gmail.com>
 wrote:

     Please keep this civil and on topic!

     Licensing was discussed at the start of the project, as in the start
     of developing code for the project, and as I recall the
     arguments for CC0 were valid and sound. That was long before Denny
     started working for Google.

     As I recall it was mentioned during the first week of the project (first
     week of April), and the discussion reemerged during the first week of
     development. That must have been week 4 or 5 (first week of May),
     as the delivery of the laptops was delayed. I was against CC0 as
     I expected problems with reuse of external data. The arguments for
     CC0 convinced me.

     And yes, Denny argued for CC0, as did Daniel, and I believe Jeroen
     and Jens did too.

     The argument is pretty simple: party A has some data A and claims
     license A. Party B has some data B and claims license B. Both
     license A and license B are sticky, thus later data C that uses an
     aggregation of A and B must satisfy both license A and license B.
     That is not viable.

     Moving forward to a safe, non-sticky license seems to be the only
     viable solution, and this leads to CC0.

     Feel free to discuss the merit of our choice but do not use
     personal attacks. Thank you.

     On Thu, 30 Nov 2017 at 09:11, Luca Martinelli
     <martinelliluca(a)gmail.com> wrote:

         Oh, and by the way, ODbL was considered as a potential
         license, but I recall that that license could have been
         incompatible for reuse with CC BY-SA 3.0. It was actually a
         point of discussion with the Italian OpenStreetMap community
         back in 2013, when I first presented at the OSM-IT meeting the
         possibility of a collaboration between WD and OSM.

         L.

         On 30 Nov 2017 at 08:57, "Luca Martinelli"
         <martinelliluca(a)gmail.com> wrote:

             I basically stopped reading this email after the first
             attack on Denny.

             I was there since the beginning, and I do recall the
             *extensive* discussion about what license to use. CC0 was
             chosen, among other things, because of the moronic EU rule
             about database rights, that CC 3.0 licenses didn't allow
             us to counter - please remember that 4.0 were still under
             discussion, and we couldn't afford the luxury of waiting
             for 4.0 to come out before publishing Wikidata.

             And possibly next time provide a TL;DR version of your
             email at the top.

             Cheers,

             L.

             On 29 Nov 2017 at 22:46, "Mathieu Stumpf Guntz"
             <psychoslave(a)culture-libre.org> wrote:

                 Hello everyone,

                 I forward here the message I initially posted on the
                 Meta Tremendous Wiktionary User Group talk page

<https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#An_answer_to_Lydia_general_thinking_about_Wikidata_and_CC-0>,
                 because I'm interested to have a wider feedback of the
                 community on this point. Whether you think that my
                 view is completely misguided or that I might have a
                 few relevant points, I'm extremely interested to know
                 it, so please be bold.

                 Before you consider digging further into this reading,
                 keep in mind that I remain convinced that Wikidata is a
                 wonderful project and I wish it a bright future full
                 of even more amazing things than what it has already brought
                 so far. My sole concern is really a license issue.

                 Below is a copy/paste of the above linked message:

                 Thank you Lydia Pintscher
                 <https://meta.wikimedia.org/wiki/User:Lydia_Pintscher_%28WMDE%29>
                 for taking the time to answer. Unfortunately this
                 answer

<https://www.wikidata.org/wiki/User:Lydia_Pintscher_%28WMDE%29/CC-0>
                 misses too many important points to resolve all the concerns
                 which have been raised.

                 Notably, there is still not the beginning of a hint in it
                 about where the decision to use CC0 exclusively for
                 Wikidata came from. But as this inquiry on the topic

<https://en.wikiversity.org/wiki/fr:Recherche:La_licence_CC-0_de_Wikidata,_origine_du_choix,_enjeux,_et_prospections_sur_les_aspects_de_gouvernance_communautaire_et_d%E2%80%99%C3%A9quit%C3%A9_contributive>
                 advances, an answer is emerging from it. It seems that
                 Wikidata's choice of CC0 was heavily influenced by
                 Denny Vrandečić, who – to make it short – is now
                 working in the Google Knowledge Graph team. It is also
                 worth noting that Google funded a quarter of the
                 initial development work. Another quarter came from
                 the Gordon and Betty Moore Foundation, established by
                 an Intel co-founder. And half the money came from
                 Microsoft co-founder Paul Allen's Institute for
                 Artificial Intelligence (AI2)[1]

<https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-1>.
                 To put it briefly, in a conspiratorial fashion,
                 Wikidata is the puppet trojan horse of big tech
                 hegemonic companies into the realm of Wikimedia. For a
                 less tragic, more argumentative version, please see
                 the research project (work in progress, only chapter 1
                 is in good enough shape, and it's only available in
                 French so far). Proof that this claim is
                 completely wrong would be welcome, as it would be great
                 if in fact the community was the
                 driving force behind this single license choice and
                 if it is the best choice for its future, not the
                 future of giant tech companies. Shedding such a happy
                 light on this subject would be a great contribution,
                 so we can all leave this issue alone and go
                 back to contributing on more interesting topics.

                 Now let's examine the thoughts proposed by Lydia.

                 Wikidata is here to give more people more access to
                 more knowledge.
                     So far, this matches the Wikimedia movement's
                     stated goal.
                 This means we want our data to be used as widely as
                 possible.
                     Sure, as long as it rhymes with equity. As in /Our
                     strategic direction: Service and *Equity*/

<https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction/Endorsement#Our_strategic_direction:_Service_and_Equity>.
                     Just like we want freedom for everybody as widely
                     as possible. That is, starting where it confirms
                     each other's freedom. Because below this level, the
                     freedom of one is the murder and slavery of others.
                 CC-0 is one step towards that.
                     That's a thesis; you can propose to defend it, but
                     no one has to agree without some convincing proof.
                 Data is different from many other things we produce in
                 Wikimedia in that it is aggregated, combined,
                 mashed-up, filtered, and so on much more extensively.
                     No it's not. From a data processing point of view,
                     everything is data. Whether it's stored in
                     wikisyntax, in a relational database or engraved
                     in stone is only a matter of convenience.
                     Whether it's a random stream of bits generated by a
                     dumb chipset or some encoded prose of Shakespeare
                     makes no difference. So from this point of view,
                     no, what Wikidata stores is not different from what
                     is produced anywhere else in Wikimedia projects.
                     Sure, the way it's structured does greatly ease
                     many things. But this is not because it's data,
                     when elsewhere there would be no data. It's
                     because it forces data to be stored in a way that
                     eases aggregation, combination, mashing-up,
                     filtering and so on.

                 Our data lives from being able to write queries over
                 millions of statements, putting it into a mobile app,
                 visualizing parts of it on a map and much more.
                     Sure. It also lives from being curated by
                     millions[2]

<https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-2>
                     of volunteer contributors, or it would be just a
                     useless pile of random bytes.
                 This means, if we require attribution, in a huge
                 number of cases attribution would need to go back to
                 potentially millions of editors and sources (even if
                 that data is not visible in the end result but only
                 helped to get the result).
                     No, it doesn't mean that. 
                     First let's recall a few basics, as it seems the
                     whole answer confuses attribution
                     with distribution of contributions under the same
                     license as the original. Attribution is crucial
                     for traceability and so for the reliable and trusted
                     knowledge that we are targeting within the
                     Wikimedia movement. The "same license" is the sole
                     legal guarantee of equity contributors have. That
                     is, trusted knowledge and equity are requirements
                     for the Wikimedia movement goals. That means
                     withdrawing these requirements is withdrawing these
                     goals.
                     Now, what would be the additional cost of storing
                     sources in Wikidata? Well, zero cost. Actually,
                     it's already here as the "reference" attribute is
                     part of the Wikibase item structure. So
                     attribution is not a problem, you don't have to
                     put it in front of your derived work, just look at
                     a Wikipedia article: until you go to history, you
                     have zero attribution visible, and it's ok. It
                     also probably has zero or negligible computing
                     cost, as it doesn't have to be included in all
                     computations, it just needs to be retrievable on
                     demand.
                     What would be the additional cost of storing
                     licenses for each item based on its source? Well,
                     adding a license attribute might help, but
                     actually if your reference is a work item, I guess
                     it might come with a "license" statement, so zero
                     additional cost. As for letting users specify
                     under which free licenses they publish their work,
                     that would just require an additional attribute, a
                     trivial weight when balanced against the equity
                     concerns it resolves.
                     Could that prevent some uses for some actors? Yes,
                     that's actually the point, preventing abuse by
                     those who don't want to act equitably. For all
                     other actors a "distribute under same condition"
                     is fine.
                 This is potentially computationally hard to do and
                 depending on where the data is used very inconvenient
                 (think of a map with hundreds of data points in a
                 mobile app).
                     OpenStreetMap, which uses ODbL, a copyleft
                     attribution license, does exactly that too, doesn't
                     it? By the way, allowing a license per item would
                     make it possible to include OpenStreetMap data in Wikidata,
                     which is currently impossible due to the CC0
                     single license policy of the project. Too bad, it
                     could be so useful to have this data accessible
                     for Wikimedia projects, but who cares?
                 This is a burden on our re-users that I do not want to
                 impose on them.
                     Wait, which re-users? Surely one might expect that
                     Wikidata would care first about re-users who are in
                     phase with the Wikimedia goals, so surely the needs of
                     the Wikimedia community in particular and Free/Libre
                     Culture in general should be considered. Would these
                     re-users be penalized by a copyleft license?
                     Surely not, or they wouldn't use it as extensively as
                     they do. So who are these re-users whom it is
                     thought preferable, without consulting the
                     community, not to annoy with questions of equity
                     and traceability?
                 It would make it significantly harder to re-use our
                 data and be in direct conflict with our goal of
                 spreading knowledge.
                     No, technically it would be just as easy as
                     punching a button on a computer to do that rather
                     than this. What is in direct conflict with our
                     clearly stated goals emerging from the 2017
                     community consultation is going against equity and
                     traceability. You propose to discard both to
                     satisfy exogenous demands which should have next
                     to no weight in decisions impacting so deeply the
                     future of our community.
                 Whether data can be protected in this way at all or
                 not depends on the jurisdiction we are talking about.
                 See this Wikilegal on database rights
                 <https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights>
                 for more details.
                     It says basically that it's applicable in United
                     States and Europe on different legal bases and
                     extents. And for the rest of the world, it doesn't
                     say that nothing can apply; it states
                     nothing.
                 So even if we would have decided to require
                 attribution it would only be enforceable in some
                 jurisdictions.
                     What kind of logic is that? Maybe it might not be
                     applicable in some countries, so let's withdraw the
                     few rights we have.
                 Ambiguity, when it comes to legal matters, also
                 unfortunately often means that people refrain from
                 what they want to do for fear of legal repercussions.
                 This is directly in conflict with our goal of
                 spreading knowledge.
                     Economic inequality, social inequity and legal
                     imbalance might also restrain people from doing
                     what they want, as they fear practical
                     repercussions. CC0 strengthens these discrimination
                     factors by forcing people to give up the few
                     rights they have to weigh against the growing
                     asymmetry that social structures are concomitantly
                     building. So CC0 as the unique license choice is in
                     direct conflict with our goal of *equitably*
                     spreading knowledge.
                     Also it seems like this statement suggests that
                     releasing our contributions only under CC0 is the
                     sole solution to diminish legal doubts. Actually
                     any well written license would do an equal job
                     regarding this point, including many copyleft
                     licenses out there. So while associating a clear
                     license with each data item might indeed diminish
                     legal uncertainty, it's not an argument at all for
                     enforcing CC0 as the sole license available to
                     contributors.
                     Moreover, just putting a license side by side with
                     a work does not ensure that the person who made
                     the association was legally allowed to do so. To
                     have a better confidence in the legitimacy of a
                     statement that a work is covered by a certain
                     license, there is once again a traceability
                     requirement. For example, Wikidata currently
                     includes many items which were imported from various
                     Wikipedia versions, and claims that the derived
                     work obtained – a set of items and statements – is
                     under CC0. That is a hugely doubtful statement and
                     it alarmingly looks like license laundering
                     <https://en.wikipedia.org/wiki/license_laundering>.
                     This is true for Wikipedia, but it's also true for
                     any source on which a large scale extraction and
                     import are operated, whether through bots or crowd
                     sourcing. 
                     So the Wikidata project is currently in a very
                     poor position to give lessons on legal ambiguity, as
                     it heavily plays with legal blur and the hope that
                     its shady practices won't fall under too much
                     scrutiny.
                 Licenses that require attribution are often used as a
                 way to try to make it harder for big companies to
                 profit from openly available resources.
                     No, they are not. They are used as /a way to try
                     to make it harder for big companies to profit from
                     openly available resources/ *in inequitable
                     manners*. That's completely different. Copyleft
                     licenses give the same rights to big companies and
                     individuals in a manner that lowers socio-economic
                     inequalities which disproportionately advantage the
                     former.
                 The thing is there seems to be no indication of this
                 working.
                     Because it's not trying to enforce what you
                     claim, so of course it's not working for this
                     goal. But for the goal that copyleft licenses aim
                     at, there is clear evidence that yes, it works.
                 Big companies have the legal and engineering resources
                 to handle both the legal minefield and the technical
                 hurdles easily.
                     There is no pitfall in copyleft licenses. Using a
                     war materiel analogy is disrespectful. It's true
                     that copyleft licenses might come with some
                     constraints that non-copyleft free licenses don't
                     have, but that is the price of fostering equity. And
                     it's a low price, which even individuals can
                     manage; it might require a little extra time
                     on legal considerations, but on the other hand
                     using the free work is an immensely vast gain that
                     is worth it. In Why you shouldn't use the Lesser GPL
                     for your next library
                     <https://www.gnu.org/licenses/why-not-lgpl.html>
                     it is stated /proprietary software developers have
                     the advantage of money; free software developers
                     need to make advantages for each other/. This
                     might be generalised as /big companies have the
                     advantage of money; free/libre culture
                     contributors need to make advantages for each
                     other/. So contrary to what these
                     fallacious claims against copyleft licenses assert, they
                     are not a "minefield and the technical hurdles"
                     that only big companies can handle. Moreover,
                     let's recall who financed the initial development
                     of Wikidata: only actors which are related to big
                     companies.
                 Who it is really hurting is the smaller start-up,
                 institution or hacker who can not deal with it.
                     If this statement is about copyleft licenses, then
                     this is just plainly false. Smaller actors have
                     more to gain in preserving mutual benefit of the
                     common ecosystem that a copyleft license fosters. 
                 With Wikidata we are making structured data about the
                 world available for everyone.
                     And that's great. But that doesn't require CC0 as
                     sole license to be achieved. 
                 We are leveling the playing field to give those who
                 currently don’t have access to the knowledge graphs of
                 the big companies a chance to build something amazing.
                     And that's great. But that doesn't require CC0 as
                     sole license. Actually CC0 makes it a less
                     sustainable project on this point, as it allows
                     unfair actors to take it all, add some interesting
                     added value that our community can not afford,
                     reach/reinforce an hegemonic position in the
                     ecosystem with their own closed solution. And, ta
                     ta, Wikidata can be discontinued quietly, just
                     like Google did with the defunct Freebase which
                     was CC-BY-SA before they bought the company that
                     was running it, and after they imported it under
                     CC0 in Wikidata as a new attempt to gather a
                     larger community of free curators. And once it
                     has performed license laundering of all
                     Wikimedia projects' works with shady mass extracts
                     and imports, Wikimedia can disappear as well. Of
                     course big companies benefit more from these
                     possibilities than actors with smaller financial
                     support and no hegemonic position.
                 Thereby we are helping more people get access to
                 knowledge from more places than just the few big ones.
                     No, with CC0 you are certainly helping big
                     companies to reinforce their position in which
                     they can distribute information manipulated as
                     they wish, without regard for traceability
                     and equity. Allowing contributors
                     to also use copyleft licenses would be far more
                     effective to /collect and use different forms of
                     free, trusted knowledge/ that /focus efforts on
                     the knowledge and communities that have been left
                     out by structures of power and privilege/, as
                     stated in /Our strategic direction: Service and
                     Equity/. 

                 CC-0 is becoming more and more common.
                     Just like economic inequality
                     <https://en.wikipedia.org/wiki/economic_inequality>.
                     But that is not what we are aiming to foster in
                     the Wikimedia movement. 
                 Many organisations are releasing their data under CC-0
                 and are happy with the experience. Among them are the
                 European Union, Europeana, the National Library of
                 Sweden and the Metropolitan Museum of Modern Arts.
                     Good for them. But they are not the Wikimedia
                     community; they have their own goals and plans to
                     be sustainable that do not necessarily match what
                     our community can follow. Different contexts
                     require different means. States and their
                     institutions can count on tax revenue, and if
                     taxpayers' money ends up in public domain works, that's
                     great and seems fair. States are rarely threatened
                     by companies; they have legal levers to pressure
                     that kind of entity, although conflicts of interest
                     and lobbying can of course qualify this statement.
                     Importing that kind of data with proper
                     attribution and license is fine, be it CC0 or any
                     other free license. But that's not an argument in
                     favour of imposing on volunteers a systematic
                     waiver of all their rights as the single option to
                     contribute.
                 All this being said we do encourage all re-users of
                 our data to give attribution to Wikidata because we
                 believe it is in the interest of all parties involved.
                     That's it, zero legal hope of equity. 
                 And our experience shows that many of our re-users do
                 give credit to Wikidata even if they are not forced to.
                     Experience also shows that some prominent actors
                     like Google won't credit the Wikimedia community
                     anymore when directly generating answers based on,
                     inter alia, information coming from Wikidata,
                     which is itself performing license laundering of
                     Wikipedia data.
                 Are there no downsides to this? No, of course not.
                 Some people chose not to participate, some data can't
                 be imported and some re-users do not attribute us. But
                 the benefits I have seen over the years for Wikidata
                 and the larger open knowledge ecosystem far outweigh
                 them.
                     This should at least be backed with some solid
                     statistics showing that it had a positive impact in terms
                     of audience and contribution in Wikimedia projects
                     as a whole. Maybe the introduction of Wikidata did
                     have a positive effect on the evolution of the total
                     number of contributors, or maybe so far it has no
                     significant correlated effect, or maybe it is
                     correlated with a decrease of the total number of
                     active contributors. Some plots would be
                     interesting here. Mere personal feelings of
                     benefits and hindrances mean nothing here, mine
                     included of course.
                     Plus, there is not even the beginning of an
                     attempt to A/B test with a second Wikibase instance
                     that allows users to select which licenses their
                     contributions are released under, so there is no
                     possible way to state anything based on a relevant
                     comparison. The fact that there are some people
                     satisfied with the current state of things doesn't
                     mean they would not be even more satisfied with a
                     more equitable solution that allows contributors
                     to choose a free license set for their
                     publications. All the more so as this is all about the
                     sustainability and fostering of our community and
                     reaching its goals, not an immediate feeling of
                     satisfaction for some people.

                  *

                     [1] Wikipedia Signpost 2015, 2nd december

<https://en.wikipedia.org/wiki/en:Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed>

                  *

                     [2] according to the next statement of Lydia

                 Once again, I stress that this is not a manifesto against
                 Wikidata. The motivation behind this message is a hope
                 that one day one might participate in Wikidata with
                 the same respect for equity and traceability that is
                 granted in other Wikimedia projects.

                 With much wiki-love,
                 mathieu

                 _______________________________________________
                 Wikidata mailing list
                 Wikidata(a)lists.wikimedia.org
                 <mailto:Wikidata@lists.wikimedia.org>
                 https://lists.wikimedia.org/mailman/listinfo/wikidata
                 <https://lists.wikimedia.org/mailman/listinfo/wikidata>

