A single-property licensing scheme would allow storage of data; it
might or might not allow reuse of the licensed data together with
other data. Remember that all entries in the servers might be part of
a mashup with all other entries.
That's a very interesting point. Does anyone know of a clear,
extensive report on what is legal or not regarding massive imports of
data extracted from some source?
Indeed, if there were really no limit on using "factual statement" data,
that would be a huge loophole in copyright. For example, you might
enumerate the position of each occurrence of each word in Harry Potter;
those are all pure facts, after all. But publishing an extensive set of
that kind of factual statement would let anyone rebuild the books.
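To make the loophole concrete, here is a minimal sketch in Python. The function names and the sample sentence are made up for illustration. It only shows that a complete list of "word X occurs at position N" facts is enough to rebuild the word sequence they were extracted from.

```python
from collections import defaultdict


def extract_position_facts(text):
    """Return {word: [positions]}; each entry is a 'pure fact' about the text."""
    facts = defaultdict(list)
    for position, word in enumerate(text.split()):
        facts[word].append(position)
    return dict(facts)


def rebuild_text(facts):
    """Rebuild the word sequence using nothing but the positional facts."""
    length = sum(len(positions) for positions in facts.values())
    words = [None] * length
    for word, positions in facts.items():
        for position in positions:
            words[position] = word
    return " ".join(words)


# Hypothetical sample; any text made of space-separated words behaves the same.
sample = "the cat saw the dog and the dog saw the cat"
facts = extract_position_facts(sample)
rebuilt = rebuild_text(facts)
```

Here `rebuilt` is identical to `sample`, although `facts` contains nothing but positions of words.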
The same might happen with an extensive extraction of data stored
initially in Wikipedia under CC-by-sa and imported into Wikidata. There
is already the ArticlePlaceholder[1] extension, which is a first step
towards generating complete encyclopaedic articles in prose, which
should then logically be publishable under CC0. Hence the concerns
about license laundering.
Not having a way to track sources and their corresponding licenses
doesn't automagically make the licensing issues disappear. An
integrated license tracking system should make it possible to detect
possible infringements in remixes. Users should be informed whether or
not what they are trying to mix is legally authorized by the various
ultimate sources from which Wikidata gathered it. Until some solid
legal report points in that direction, it's not accurate to claim
unilaterally that users can do whatever they want regardless of the
sources from which Wikidata gathered the data in the first place, even
when it's a massive import from a differently licensed source.
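As a rough illustration of what such tracking could look like, here is a minimal sketch in Python. It is not the Wikibase data model: the `TrackedStatement` fields, the license identifiers and the sample data are all hypothetical. It only shows that, once each statement carries its source and license, telling a user which licenses are involved in a remix is a simple grouping.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackedStatement:
    # All field names are hypothetical; this is not the Wikibase data model.
    item: str
    value: str
    source: str
    license_id: str


def licenses_in_remix(statements):
    """Group the sources behind a remix by the license they were published under."""
    by_license = defaultdict(set)
    for statement in statements:
        by_license[statement.license_id].add(statement.source)
    return {lic: sorted(sources) for lic, sources in by_license.items()}


# Made-up sample data, for illustration only.
remix = [
    TrackedStatement("Q1", "population: 1000", "Wikipedia", "CC-BY-SA-3.0"),
    TrackedStatement("Q1", "coordinates: 0,0", "OpenStreetMap", "ODbL-1.0"),
    TrackedStatement("Q2", "inception: 1900", "direct contribution", "CC0-1.0"),
]
summary = licenses_in_remix(remix)
for lic, sources in sorted(summary.items()):
    print(f"{lic}: {', '.join(sources)}")
```

Whether a given combination is actually permitted remains a legal question; the sketch only makes the combination visible to the person doing the remix.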
[1]
On Thu, Nov 30, 2017 at 9:55 AM, John Erling Blad <jeblad(a)gmail.com> wrote:
Please keep this civil and on topic!
Licensing was discussed at the start of the project, as in the start
of developing code for the project, and as I recall it the
arguments for CC0 were valid and sound. That was long before Denny
started working for Google.
As I recall, it was mentioned during the first week of the project
(first week of April), and the discussion reemerged during the first
week of development. That must have been week 4 or 5 (first week of
May), as the delivery of the laptop was delayed. I was against CC0 as
I expected problems with reuse of external data. The arguments for
CC0 convinced me.
And yes, Denny argued for CC0, as did Daniel, and I believe Jeroen
and Jens did too.
The argument is pretty simple: Party A has some data A and claims
license A. Party B has some data B and claims license B. Both
license A and license B are sticky, thus later data C that uses an
aggregation of A and B must satisfy both license A and license B.
That is not viable.
Moving forward to a safe, non-sticky license seems to be the only
viable solution, and this leads to CC0.
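The argument can be sketched in a few lines of Python. This is a deliberately simplified model, not legal advice: the license identifiers, the `STICKY` table and the rule that sticky data may only be redistributed under the very same license are all assumptions made for illustration.

```python
# Hypothetical, heavily simplified license table: True means "sticky".
STICKY = {"CC-BY-SA-3.0": True, "ODbL-1.0": True, "CC0-1.0": False}


def viable_target_licenses(input_licenses, candidates=STICKY):
    """Return the candidate licenses under which an aggregate could be released.

    Simplifying assumption: data under a sticky license may only be
    redistributed under that very same license.
    """
    sticky_inputs = {lic for lic in input_licenses if STICKY[lic]}
    viable = []
    for target in candidates:
        if not sticky_inputs:
            viable.append(target)  # nothing constrains the aggregate
        elif sticky_inputs == {target}:
            viable.append(target)  # a single sticky license propagates
    return viable


# Aggregating data under two different sticky licenses leaves no option.
mixed = viable_target_licenses(["CC-BY-SA-3.0", "ODbL-1.0"])
# Aggregating only non-sticky data leaves every option open.
free = viable_target_licenses(["CC0-1.0", "CC0-1.0"])
```

Under these assumptions `mixed` is empty, while `free` contains every candidate license.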
Feel free to discuss the merit of our choice, but do not use
personal attacks. Thank you.
On Thu, Nov 30, 2017 at 09:11, Luca Martinelli
<martinelliluca(a)gmail.com> wrote:
Oh, and by the way, ODbL was considered as a potential
license, but I recall that that license could have been
incompatible for reuse with CC BY-SA 3.0. It was actually a
point of discussion with the Italian OpenStreetMap community
back in 2013, when I first presented at the OSM-IT meeting the
possibility of a collaboration between WD and OSM.
L.
On Nov 30, 2017 at 08:57, "Luca Martinelli"
<martinelliluca(a)gmail.com> wrote:
I basically stopped reading this email after the first
attack on Denny.
I have been there since the beginning, and I do recall the
*extensive* discussion about what license to use. CC0 was
chosen, among other things, because of the moronic EU rule
about database rights, which the CC 3.0 licenses didn't allow
us to counter - please remember that the 4.0 licenses were still
under discussion, and we couldn't afford the luxury of waiting
for 4.0 to come out before publishing Wikidata.
And possibly next time provide a TL;DR version of your
email at the top.
Cheers,
L.
On Nov 29, 2017 at 22:46, "Mathieu Stumpf Guntz"
<psychoslave(a)culture-libre.org> wrote:
Hello everyone,
I am forwarding here the message I initially posted on the
Meta Tremendous Wiktionary User Group talk page
<https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#An_answer_to_Lydia_general_thinking_about_Wikidata_and_CC-0>,
because I'm interested in getting wider feedback from the
community on this point. Whether you think that my
view is completely misguided or that I might have a
few relevant points, I'm extremely interested to know,
so please be bold.
Before you consider digging further into this message,
keep in mind that I remain convinced that Wikidata is a
wonderful project, and I wish it a bright future full
of even more amazing things than what it has already brought
so far. My sole concern is really a license issue.
Below is a copy/paste of the above linked message:
Thank you Lydia Pintscher
<https://meta.wikimedia.org/wiki/User:Lydia_Pintscher_%28WMDE%29>
for taking the time to answer. Unfortunately this
answer
<https://www.wikidata.org/wiki/User:Lydia_Pintscher_%28WMDE%29/CC-0>
misses too many important points to resolve all the concerns
which have been raised.
Notably, there is still not even the beginning of a hint in it
about where the decision to use CC0 exclusively for
Wikidata came from. But as this inquiry on the topic
<https://en.wikiversity.org/wiki/fr:Recherche:La_licence_CC-0_de_Wikidata,_origine_du_choix,_enjeux,_et_prospections_sur_les_aspects_de_gouvernance_communautaire_et_d%E2%80%99%C3%A9quit%C3%A9_contributive>
advances, an answer is emerging from it. It seems that
Wikidata's choice of CC0 was heavily influenced by
Denny Vrandečić, who – to make it short – is now
working in the Google Knowledge Graph team. It is also
worth noting that Google funded a quarter of the
initial development work. Another quarter came from
the Gordon and Betty Moore Foundation, established by
an Intel co-founder. And half the money came from
Microsoft co-founder Paul Allen's Institute for
Artificial Intelligence (AI2)[1]
<https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-1>.
To state it briefly in a conspiratorial fashion,
Wikidata is the puppet Trojan horse of big tech
hegemonic companies into the realm of Wikimedia. For a
less tragic, more argued version, please see
the research project (work in progress, only chapter 1
is in good enough shape, and it's only available in
French so far). Proof that this claim is
completely wrong is welcome, as it would be great
if in fact it was the community that was the
driving force behind this single license choice and
if it is the best choice for its future, not the
future of giant tech companies. It would be a great
contribution to shed such a happy light on this
subject, so we can all leave this issue alone and go
back to contributing on more interesting topics.
Now let's examine the thoughts proposed by Lydia.
Wikidata is here to give more people more access to
more knowledge.
So far, this matches the Wikimedia movement's
stated goal.
This means we want our data to be used as widely as
possible.
Sure, as long as it rhymes with equity. As in /Our
strategic direction: Service and *Equity*/
<https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction/Endorsement#Our_strategic_direction:_Service_and_Equity>.
Just like we want freedom for everybody as widely
as possible. That is, starting where it confirms
each other's freedom. Because below this level,
the freedom of one is the murder and slavery of others.
CC-0 is one step towards that.
That's a thesis; you can propose to defend it, but
no one has to agree without some convincing proof.
Data is different from many other things we produce in
Wikimedia in that it is aggregated, combined,
mashed-up, filtered, and so on much more extensively.
No it's not. From a data processing point of view,
everything is data. Whether it's stored in
wikisyntax, in a relational database or engraved
in stone only matters as a convenience.
Whether it's a random stream of bits generated by a
dumb chipset or some encoded prose of Shakespeare
makes no difference. So from this point of view,
no, what Wikidata stores is not different from what
is produced anywhere else in Wikimedia projects.
Sure, the way it's structured does make many things
much easier. But this is not because it's data
while elsewhere there would be no data. It's
because it forces data to be stored in a way that
eases aggregation, combination, mashing-up,
filtering and so on.
Our data lives from being able to write queries over
millions of statements, putting it into a mobile app,
visualizing parts of it on a map and much more.
Sure. It also lives from being curated by
millions[2]
<https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-2>
of volunteer contributors, or it would be just a
useless pile of random bytes.
This means, if we require attribution, in a huge
number of cases attribution would need to go back to
potentially millions of editors and sources (even if
that data is not visible in the end result but only
helped to get the result).
No, it doesn't mean that.
First let's recall a few basics, as it seems the
whole answer confuses attribution with
distribution of contributions under the same
license as the original. Attribution is crucial
for traceability and so for the reliable and trusted
knowledge that we are targeting within the
Wikimedia movement. The "same license" is the sole
legal guarantee of equity contributors have. That's
it: trusted knowledge and equity are requirements
for the Wikimedia movement's goals. That means
withdrawing these requirements is withdrawing these
goals.
Now, what would be the additional cost of storing
sources in Wikidata? Well, zero cost. Actually,
it's already here, as the "reference" attribute is
part of the Wikibase item structure. So
attribution is not a problem: you don't have to
put it in front of your derived work. Just look at
a Wikipedia article: until you go to the history, you
have zero attribution visible, and it's OK. It
also probably has zero or negligible computing
cost, as it doesn't have to be included in all
computations; it just needs to be retrievable on
demand.
What would be the additional cost of storing
licenses for each item based on its source? Well,
adding a license attribute might help, but
actually if your reference is a work item, I guess
it might come with a "license" statement, so zero
additional cost. Now, letting users specify
under which free licenses they publish their work
would just require an additional attribute, a
ridiculously small weight when balanced against the
equity concerns it resolves.
Could that prevent some uses for some actors? Yes,
that's actually the point: preventing abuse by
those who don't want to act equitably. For all
other actors a "distribute under same condition"
is fine.
This is potentially computationally hard to do and,
depending on where the data is used, very inconvenient
(think of a map with hundreds of data points in a
mobile app).
OpenStreetMap, which uses ODbL, a copyleft
license requiring attribution, does exactly that too,
doesn't it? By the way, allowing a license per item would
make it possible to include OpenStreetMap data in Wikidata,
which is currently impossible due to the CC0
single license policy of the project. Too bad, it
could be so useful to have this data accessible
for Wikimedia projects, but who cares?
This is a burden on our re-users that I do not want to
impose on them.
Wait, which re-users? Surely one might expect that
Wikidata would care first about re-users which are
in phase with the Wikimedia goal, so surely the needs of
the Wikimedia community in particular and Free/Libre
Culture in general should be considered. Would these
re-users be penalized by a copyleft license?
Surely not, or they wouldn't use it as extensively as
they do. So who are these re-users whom it's
thought preferable, without consulting the
community, not to annoy with questions of equity
and traceability?
It would make it significantly harder to re-use our
data and be in direct conflict with our goal of
spreading knowledge.
No, technically it would be just as easy as
pressing one button on a computer rather
than another. What is in direct conflict with our
clearly stated goals emerging from the 2017
community consultation is going against equity and
traceability. You propose to discard both to
satisfy exogenous demands which should have next
to no weight in decisions impacting so deeply the
future of our community.
Whether data can be protected in this way at all or
not depends on the jurisdiction we are talking about.
See this Wikilegal on database rights
<https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights>
for more details.
It says basically that it's applicable in the United
States and Europe on different legal bases and
to different extents. As for the rest of the world, it
doesn't say that nothing can apply; it states
nothing.
So even if we would have decided to require
attribution it would only be enforceable in some
jurisdictions.
What kind of logic is that? It might not be
applicable in some countries, so let's withdraw the
few rights we have.
Ambiguity, when it comes to legal matters, also
unfortunately often means that people refrain from
what they want to do for fear of legal repercussions.
This is directly in conflict with our goal of
spreading knowledge.
Economic inequality, social inequity and legal
imbalance might also deter people from doing
what they want, as they fear practical
repercussions. CC0 strengthens these discrimination
factors by forcing people to give up the few
rights they have to weigh against the growing
asymmetry that social structures are concomitantly
building. So CC0 as the only license choice is in
direct conflict with our goal of *equitably*
spreading knowledge.
Also it seems like this statement suggests that
releasing our contributions only under CC0 is the
sole solution to diminish legal doubts. Actually
any well written license would do an equal job
regarding this point, including many copyleft
licenses out there. So while associating a clear
license with each data item might indeed diminish
legal uncertainty, it's not an argument at all for
enforcing CC0 as the sole license available to
contributors.
Moreover, just putting a license side by side with
a work does not ensure that the person who made
the association was legally allowed to do so. To
have better confidence in the legitimacy of a
statement that a work is covered by a certain
license, there is once again a traceability
requirement. For example, Wikidata currently
includes many items which were imported from various
Wikipedia versions, and claims that the derived
work obtained – a set of items and statements – is
under CC0. That is a hugely doubtful statement and
it looks alarmingly like license laundering
<https://en.wikipedia.org/wiki/license_laundering>.
This is true for Wikipedia, but it's also true for
any source on which a large scale extraction and
import are performed, whether through bots or crowd
sourcing.
So the Wikidata project is currently extremely
badly placed to give lessons on legal ambiguity, as
it relies heavily on legal blur and the hope that
its shady practices won't fall under too much
scrutiny.
Licenses that require attribution are often used as a
way to try to make it harder for big companies to
profit from openly available resources.
No, they are not. They are used as /a way to try
to make it harder for big companies to profit from
openly available resources/ *in inequitable
manners*. That's completely different. Copyleft
licenses give the same rights to big companies and
individuals in a manner that lowers the socio-economic
inequalities which disproportionately advantage the
former.
The thing is there seems to be no indication of this
working.
Because it's not trying to enforce what you
claim, so of course it's not working for this
goal. But for the goal that copyleft licenses aim
at, there is clear evidence that yes, it works.
Big companies have the legal and engineering resources
to handle both the legal minefield and the technical
hurdles easily.
There is no pitfall in copyleft licenses. Using
a war materiel analogy is disrespectful. It's true
that copyleft licenses might come with some
constraints that non-copyleft free licenses don't
have, but that's the price for fostering equity. And
it's a low price that even individuals can
manage: it might require a little extra time
on legal considerations, but on the other hand
using the free work is an immensely vast gain that
is worth it. In Why you shouldn't use the Lesser GPL
for your next library
<https://www.gnu.org/licenses/why-not-lgpl.html>
it is stated that /proprietary software developers have
the advantage of money; free software developers
need to make advantages for each other/. This
might be generalised as /big companies have the
advantage of money; free/libre culture
contributors need to make advantages for each
other/. So, contrary to what these
fallacious claims against copyleft licenses assert, they
are not a "minefield and the technical hurdles"
that only big companies can handle. What is more,
let's recall who financed the initial development
of Wikidata: only actors which are related to big
companies.
Who it is really hurting is the smaller start-up,
institution or hacker who can not deal with it.
If this statement is about copyleft licenses, then
this is just plainly false. Smaller actors have
more to gain in preserving mutual benefit of the
common ecosystem that a copyleft license fosters.
With Wikidata we are making structured data about the
world available for everyone.
And that's great. But that doesn't require CC0 as
sole license to be achieved.
We are leveling the playing field to give those who
currently don’t have access to the knowledge graphs of
the big companies a chance to build something amazing.
And that's great. But that doesn't require CC0 as
sole license. Actually CC0 makes it a less
sustainable project on this point, as it allows
unfair actors to take it all, add some interesting
added value that our community can not afford,
reach/reinforce an hegemonic position in the
ecosystem with their own closed solution. And, ta
ta, Wikidata can be discontinued quietly, just
like Google did with the defunct Freebase which
was CC-BY-SA before they bought the company that
was running it, and after they imported it under
CC0 in Wikidata as a new attempt to gather a
larger community of free curators. And when it
will have performed license laundering of all
Wikimedia projects works with shady mass extract
and import, Wikimedia can disappear as well. Of
course big companies benefits more of this
possibilities than actors with smaller financial
support and no hegemonic position.
Thereby we are helping more people get access to
knowledge from more places than just the few big ones.
No, with CC0 you are certainly helping big
companies to reinforce their position in which
they can distribute information manipulated as
they wish, without consideration for traceability
and equity. Allowing contributors
to also use copyleft licenses would be far more
effective to /collect and use different forms of
free, trusted knowledge/ that /focus efforts on
the knowledge and communities that have been left
out by structures of power and privilege/, as
stated in /Our strategic direction: Service and
Equity/.
CC-0 is becoming more and more common.
Just like economic inequality
<https://en.wikipedia.org/wiki/economic_inequality>.
But that is not what we are aiming to foster in
the Wikimedia movement.
Many organisations are releasing their data under CC-0
and are happy with the experience. Among them are the
European Union, Europeana, the National Library of
Sweden and the Metropolitan Museum of Art.
Good for them. But they are not the Wikimedia
community; they have their own goals and plans to
be sustainable that do not necessarily match what
our community can follow. Different contexts
require different means. States and their
institutions can count on tax revenue, and if
taxpayers' money ends up in public domain works, that's
great and seems fair. States are rarely threatened
by companies; they have legal levers to pressure
that kind of entity, although conflicts of interest
and lobbying can of course mitigate this statement.
Importing that kind of data with proper
attribution and license is fine, be it CC0 or any
other free license. But that's not an argument in
favour of imposing on volunteers a systematic
waiver of all their rights as the only option to
contribute.
All this being said we do encourage all re-users of
our data to give attribution to Wikidata because we
believe it is in the interest of all parties involved.
That's it, zero legal hope of equity.
And our experience shows that many of our re-users do
give credit to Wikidata even if they are not forced to.
Experience also shows that some prominent actors
like Google won't credit the Wikimedia community
anymore when directly generating answers based on,
inter alia, information coming from Wikidata,
which is itself performing license laundering of
Wikipedia data.
Are there no downsides to this? No, of course not.
Some people chose not to participate, some data can't
be imported and some re-users do not attribute us. But
the benefits I have seen over the years for Wikidata
and the larger open knowledge ecosystem far outweigh
them.
This should at least be backed with some solid
statistics showing that it had a positive impact in terms
of audience and contribution in Wikimedia projects
as a whole. Maybe the introduction of Wikidata did
have a positive effect on the evolution of the total
number of contributors, or maybe so far it has had no
significant correlated effect, or maybe it is
correlated with a decrease in the total number of
active contributors. Some plots would be
interesting here. Mere personal feelings about
benefits and hindrances mean nothing here, mine
included of course.
Plus, there is not even the beginning of an
attempt to A/B test with a second Wikibase instance
that allows users to select which licenses their
contributions are released under, so there is no
possible way to state anything backed by a relevant
comparison. The fact that there are some people
satisfied with the current state of things doesn't
mean they would not be even more satisfied with a
more equitable solution that allows contributors
to choose a set of free licenses for their
publications. What is more, this is all about the
sustainability and fostering of our community and
reaching its goals, not an immediate feeling of
satisfaction for some people.
[1] Wikipedia Signpost, 2 December 2015
<https://en.wikipedia.org/wiki/en:Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed>
[2] according to the next statement of Lydia
Once again, let me stress that this is not a manifesto against
Wikidata. The motivation behind this message is a hope
that one day one might participate in Wikidata with
the same respect for equity and traceability that is
granted in other Wikimedia projects.
With much wiki-love,
mathieu
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
<mailto:Wikidata@lists.wikimedia.org>
https://lists.wikimedia.org/mailman/listinfo/wikidata
<https://lists.wikimedia.org/mailman/listinfo/wikidata>