You don't seem to grasp the essential legal point, though several people
in this thread have already tried to tell you.
Copyright protects expression and creative originality. It does not
protect merely a collation of facts.
The CC-SA licence is based on copyright. Anything that is not protected
by copyright is not protected by the CC-SA licence.
To the extent that an article can be reduced to a mere collation of
facts, it is not protected by copyright. What is protected is any
originality or creativity in how those facts are organised and presented
-- the expression, the sequence of thought, the selections of words, all
the authorial choices in the text.
*That* is the difference between a copyright-protected Wiki article on
the one hand, and a Wikidata collection of facts on the other.
In the European Union collections of facts can be protected by database rights.
That is the path OpenStreetMap chose, when they designed the ODbL, to
prevent their work being eaten up and assimilated by closed commercial
It is not the choice Wikidata made. And it is not the choice any of the
Wiki projects made before Wikidata -- CC-SA disclaims database rights.
The debate between the two views goes back at least as far as GPL vs
BSD, and the arguments have been gone over many many times in many many
communities over that time.
Yes, CC0 causes us some difficulties.
It means what we can import from OpenStreetMap is very restricted --
mass import falls foul of OSM's database rights; and also coordinates
and boundaries are somewhat susceptible to judgment, so there is
probably a copyright element too.
It also makes it difficult to import from official sources (eg the UK
Open Government Licence) that use database rights to require attribution
-- that is not an obligation we are prepared to pass on to our re-users,
which means we generally have to forego such sources.
But the counterbalance is that for many people it is the openness and
reusability for all purposes of Wikidata that very much encourages them
to contribute -- they feel the more reusable and reused their work is,
the more it is worth contributing.
The important point though is that this boat has sailed. Wikidata is
CC0, and it is not going to change now.
Yes, somebody could fork the data from Wikidata into their own ODbL
project if they wanted to. CC0 allows that. (The reverse direction is
what is difficult). You might have preferred ODbL on viral GPL-style
community-building (or community-isolating) grounds. But that is not
going to happen.
As regards Wiktionary, it means that Wikidata cannot import from
Wiktionary anything that represents original expression or original
But there is no restriction, not from copyright law, nor from the
CC-BY-SA licence, to stop Wikidata -- or anyone else -- extracting and
systematically storing standard uncontroversial facts, so long as
nothing of original expression is taken.
Please confirm that you understand this.
On 30/11/2017 21:17, mathieu stumpf guntz wrote:
On 30/11/2017 at 18:05, Yair Rand wrote:
Wikidata is not replacing Wiktionary.
will see that in the future. At least the proposed model allows
including most things that you might find in a Wiktionary article, plus it
comes with all the benefits of a relational(-like) database.
for more information on what it will allow or not.
Wikidata did not replace Wikipedia, and force all articles to be under
Sure. Not yet. But if it continues to improve, along with the tools that
generate prose from it, at some point it might do a good job at doing
Structured data for Commons doesn't replace all Commons media with
Well, unless one tries to use it in a very different way from what
it is aiming at, there is no chance, as pictures contain far more
information than their metadata. Now, technically one might probably be
able to store the whole picture in that kind of structure (provided no
size restriction is enforced), but this is not the goal.
This is a very different case from the Wiktionary case. The case of
Wikipedia might be closer, but you cannot make a simple one-to-one
correspondence between Wikidata elements and Wikipedia prose. Actually
extracting statements usable in Wikidata from Wikipedia is far easier
with current natural language processing toolkits. On the other hand,
such a bijective correspondence between a Wiktionary article and a set
of WikibaseLexeme elements is clearly straightforward. So the domains of
targeted knowledge documentation overlap heavily. Plus the
Wikibase approach brings many advantages in terms of knowledge factorisation.
To my mind, WikibaseLexeme has good potential to quickly supersede
our plethora of sparsely communicating Wiktionaries. At least far sooner
than Wikibase will have a chance to approach the same level as Wikipedia
The fact that France is in Europe is not
copyrightable. The fact that
File:Vanessa_indica-Silent_Valley-2016-08-14-002.jpg is a picture of a
butterfly is not copyrightable. The facts that "balloons" is the
plural of "balloon", and that "feliĉiĝi" is an intransitive verb in
Esperanto, are not copyrightable.
Surely that is something we all agree on. :)
Even if they were copyrightable, copyrighting them independently would
harm their potential reuse, as elements of a database, as has been
Any information monopoly is a possible obstacle to reuse. No one would
deny that, I guess. But information monopolies, such as copyright,
patents and so on, do exist. And so does unequal access to resources
useful for human flourishing, including knowledge.
Now, personally I am not satisfied with this situation, nor with the
growth of inequalities. Part of my motivation for contributing to
Wikimedia projects is that it might help the situation evolve
differently. That might not be among the motivations of every
contributor, but I guess I'm not alone in this.
So the question for me is not, "how do we make the current snapshots of
our knowledge bank as reusable as possible right now?", but "how do we
build a sustainable movement which maintains and updates knowledge banks
that are as accessible as possible for every single human out there, with
this goal of sustainability in mind?".
Maybe it's not what every single stakeholder of our movement is
expecting. But I don't feel that this personal vision is at odds with
what is stated in the strategic direction. And I hope I'm not alone in
holding this vision.
Wikipedia articles and Commons Media are not structured data, and as
such, they do not belong in Wikidata.
I think your statement is wrong here.
Wikipedia articles are structured
on several analysable levels. For example, from the point of view of a
common linguistic theory, they are structured and analysable on the
syntactic level, semantic level and pragmatic level. But there are many
other ways in which you might analyse them, because they are structured
data. But it is true that they are not structured in a way that eases
However, every single sentence contained in Wikipedia articles can be
reduced to a set of predicates, that is, to things
that can be stored in Wikidata. There is no technical barrier I'm aware
of that prevents putting the whole content of all Wikipedias into as many
statements as required within Wikidata.
Elements of prose in Wiktionary, such as
extensive usage notes and notes on grammar and whatnot, are
copyrightable. Similar to Wikipedia articles, licensing them under
CC-BY-SA would not particularly harm their reuse, as attribution is
completely feasible. They are also not structured data, and can not be
made into structured data.
Well, as far as I'm concerned it would be great news to hear that the
Wikidata team will indeed allow contributors to include this CC-BY-SA
material in the Wikibase instance/namespace/whatever place where these
lexicological items will be stored, rather than enforcing contribution
under CC0 here too. But so far the statements made by the Wikidata team
point in the exact opposite direction, that is, using CC0 for everything.
Wikidata will not be laundering this data to CC-0, nor will it be
setting up a parallel project to duplicate the efforts under a license
which is not appropriate for the type of content.
I hope the future will prove you right.
Attempting to license the database's contents under CC-BY-SA would not
ensure attribution, and would harm reuse. I fail to see any potential
benefits to using the more restrictive license. Attribution will be
required where it is possible (in Wiktionary proper), and content will
be as reusable as possible in areas where requiring attribution isn't
feasible (in Wikidata). There's no real conflict here.
I hope my answer made this conflict more obvious, as well as showing
how "more reusable right now" might rhyme with "less equity and
accessibility of knowledge in the long term".
-- Yair Rand
2017-11-29 16:45 GMT-05:00 Mathieu Stumpf Guntz
I forward here the message I initially posted on the Meta
Tremendous Wiktionary User Group talk page
because I'm interested in getting wider feedback from the community
on this point. Whether you think that my view is completely
misguided or that I might have a few relevant points, I'm
extremely interested to know it, so please be bold.
Before you consider digging further into this reading, keep in mind
that I remain convinced that Wikidata is a wonderful project and I
wish it a bright future full of even more amazing things than what
it has already brought so far. My sole concern is really a license issue.
Below is a copy/paste of the above linked message:
Thank you Lydia Pintscher
for taking the time to answer. Unfortunately this answer
misses too many important points to solve all concerns which have
Notably, there is still not even the beginning of a hint in it about where
the decision of using CC0 exclusively for Wikidata came from. But as
this inquiry on the topic
advances, an answer is emerging from it. It seems that Wikidata's
choice of CC0 was heavily influenced by Denny Vrandečić, who –
to make it short – is now working in the Google Knowledge Graph
team. It is also worth noting that Google funded a quarter of the
initial development work. Another quarter came from the Gordon and
Betty Moore Foundation, established by Intel's co-founder. And half
the money came from Microsoft co-founder Paul Allen's Institute
for Artificial Intelligence (AI2).
To state it shortly in a conspiratorial fashion, Wikidata is the
puppet Trojan horse of big tech hegemonic companies into the realm
of Wikimedia. For a less tragic, more argumentative version,
please see the research project (work in progress, only chapter 1
is in good enough shape, and it's only available in French so
far). Proof that this claim is completely wrong is welcome,
as it would be great if in fact it was the community that was
the driving force behind this single license choice and if it is
the best choice for its future, not the future of giant tech
companies. It would be a great contribution to shed such a
happy light on this subject, so we can all leave this issue alone
and go back to contributing on more interesting topics.
Now let's examine the thoughts proposed by Lydia.
Wikidata is here to give more people more access to more knowledge.
So far, that matches the Wikimedia movement's stated goal.
This means we want our data to be used as widely as possible.
Sure, as long as it rhymes with equity. As in /Our strategic
direction: Service and *Equity*/.
Just like we want freedom for everybody as widely as possible.
That is, starting where it confirms each other's freedom.
Because below this level, the freedom of one is the murder and slavery
of others.
CC-0 is one step towards that.
That's a thesis; you can propose to defend it, but no one has
to agree without some convincing proof.
Data is different
from many other things we produce in Wikimedia
in that it is aggregated, combined, mashed-up, filtered, and so on
much more extensively.
No it's not. From a data processing point of view, everything
is data. Whether it's stored in wikisyntax, in a relational
database or engraved in stone only affects convenience.
Whether it's a random stream of bits generated by a
dumb chipset or some encoded prose of Shakespeare makes no
difference. So from this point of view, no, what Wikidata
stores is not different from what is produced anywhere else in
Wikimedia projects. Sure, the way it's structured does
greatly ease many things.
But this is not because it's data, when elsewhere there would
be no data. It's because it forces data to be stored in a way
that eases aggregation, combination, mashing-up, filtering and
Our data lives from being able to write queries over millions of
statements, putting it into a mobile app, visualizing parts of it
on a map and much more.
Sure. It also lives from being curated by millions
of volunteer contributors, or it would be just a useless pile
of random bytes.
This means, if we require attribution, in
a huge number of cases
attribution would need to go back to potentially millions of
editors and sources (even if that data is not visible in the end
result but only helped to get the result).
No, it doesn't mean that. First let's recall a few
basics, as it seems the whole answer
confuses attribution with distribution of
contributions under the same license as the original.
Attribution is crucial for traceability and so for the reliable
and trusted knowledge that we are targeting within the
Wikimedia movement. The "same license" is the sole legal
guarantee of equity contributors have. That is, trusted
knowledge and equity are requirements for the Wikimedia
movement's goals. That means withdrawing these requirements is
withdrawing these goals.
Now, what would be the
additional cost of storing sources in
Wikidata? Well, zero cost. Actually, it's already here, as the
"reference" attribute is part of the Wikibase item structure.
So attribution is not a problem; you don't have to put it in
front of your derived work. Just look at a Wikipedia article:
until you go to the history, you have zero attribution visible,
and it's OK. It also probably has zero or negligible
computing cost, as it doesn't have to be included in all
computations; it just needs to be retrievable on demand.
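To make the "retrievable on demand" point concrete, here is a minimal
sketch in Python. It is not the Wikibase data model or API: the
Statement and Reference classes, their field names (including the
per-reference "license" field) and the sample values are all
hypothetical, chosen only to show that an aggregate can be computed
without reading the references, while the attribution list is built
only when someone asks for it.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Reference:
    source: str   # hypothetical source label, not a real Wikibase field
    license: str  # hypothetical per-source license tag (an assumption)


@dataclass
class Statement:
    subject: str
    prop: str
    value: int
    references: List[Reference] = field(default_factory=list)


def total(statements: List[Statement]) -> int:
    """Aggregate the values; the references are never read here."""
    return sum(s.value for s in statements)


def attribution(statements: List[Statement]) -> List[Tuple[str, str]]:
    """Collect distinct (source, license) pairs, in first-seen order."""
    seen = {}
    for s in statements:
        for r in s.references:
            seen.setdefault((r.source, r.license), None)
    return list(seen)


# Made-up sample data, for illustration only.
statements = [
    Statement("city A", "population", 1000, [Reference("census X", "CC0")]),
    Statement("city B", "population", 2500, [Reference("survey Y", "ODbL")]),
    Statement("city C", "population", 500, [Reference("census X", "CC0")]),
]

print(total(statements))        # the computation itself ignores references
print(attribution(statements))  # attribution is assembled only on request
```

In this sketch the cost of attribution is paid only by the caller who
asks for it, and duplicate sources collapse into a single entry.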
What would be the additional cost of storing licenses for each
item based on its source? Well, adding a license attribute
might help, but actually if your reference is a work item, I
guess it might come with a "license" statement, so zero
additional cost. Now for letting users specify under which free
licenses they publish their work, that would just require an
additional attribute, a negligible weight when balanced against the
equity concerns it resolves. Could that prevent some
uses for some actors? Yes, that's
actually the point: preventing abuse by those who don't want
to act equitably. For all other actors a "distribute under
same condition" is fine.
This is potentially
computationally hard to do and, depending
on where the data is used, very inconvenient (think of a map with
hundreds of data points in a mobile app).
OpenStreetMap, which uses ODbL, a copyleft attributive license,
does exactly that too, doesn't it? By the way, allowing a
license per item would make it possible to include OpenStreetMap data in
Wikidata, which is currently impossible due to the CC0 single
license policy of the project. Too bad, it could be so useful
to have this data accessible for Wikimedia projects, but who
cares?
This is a burden on our re-users that I do not want
to impose on
Wait, which re-users? Surely one might expect that Wikidata
would care first about re-users who are in step with
Wikimedia's goals, so surely the needs of the Wikimedia community in
particular and Free/Libre Culture in general should be
considered. Would these re-users be penalized by a copyleft
license? Surely not, or they wouldn't use it as extensively as
they do. So who are these re-users whom it's thought
preferable, without consulting the community, not to annoy
with questions of equity and traceability?
It would make
it significantly harder to re-use our data and be in
direct conflict with our goal of spreading knowledge.
No, technically it would be just as easy as punching a button
on a computer to do that rather than this. What is in direct
conflict with our clearly stated goals emerging from the 2017
community consultation is going against equity and
traceability. You propose to discard both to satisfy exogenous
demands which should have next to no weight in decisions
impacting so deeply the future of our community.
Whether
data can be protected in this way at all or not depends on
the jurisdiction we are talking about. See this Wikilegal on
It says basically that it's applicable in the United States and
Europe on different legal bases and to different extents. And for the
rest of the world, it doesn't say that nothing can apply;
it states nothing.
So even if we would have decided to
require attribution it would
only be enforceable in some jurisdictions.
What kind of logic is that? Maybe it might not be applicable
in some countries, so let's withdraw the few rights we have.
Ambiguity, when it comes to legal matters, also unfortunately
often means that people refrain from what they want to do for fear
of legal repercussions. This is directly in conflict with our goal
of spreading knowledge.
Economic inequality, social inequity and legal imbalance might
also keep people from doing what they want, as they fear
practical repercussions. CC0 strengthens these discrimination
factors by forcing people to give up the few rights they
have to weigh against the growing asymmetry that social
structures are concomitantly building. So CC0 as the unique
license choice is in direct conflict with our goal of
*equitably* spreading knowledge. Also, it seems like
this statement suggests that releasing our
contributions only under CC0 is the sole solution to diminish
legal doubts. Actually any well written license would do an
equal job regarding this point, including many copyleft
licenses out there. So while associating a clear license with each
data item might indeed diminish legal uncertainty, it's not an
argument at all for enforcing CC0 as the sole license available to
contributors. Moreover, just putting a license side by
side with a work does
not ensure that the person who made the association was
legally allowed to do so. To have better confidence in the
legitimacy of a statement that a work is covered by a certain
license, there is once again a traceability requirement. For
example, Wikidata currently includes many items which were
imported from various Wikipedia versions, and claims that the
derived work obtained – a set of items and statements – is
under CC0. That is a hugely doubtful statement and it
alarmingly looks like license laundering
<https://en.wikipedia.org/wiki/license_laundering>. This is
true for Wikipedia, but it's also true for any source on which
a large scale extraction and import are operated, whether
through bots or crowd sourcing. So the Wikidata
project is currently very poorly placed to
give lessons on legal ambiguity, as it heavily plays with
legal blur and the hope that its shady practises won't fall
under too much scrutiny.
Licenses that require attribution
are often used as a way to try
to make it harder for big companies to profit from openly
No, they are not. They are used as /a way to try to make it
harder for big companies to profit from openly available
resources/ *in inequitable manners*. That's completely
different. Copyleft licenses give the same rights to big
companies and individuals in a manner that lowers the
socio-economic inequalities which disproportionately advantage
the former.
The thing is there seems to be no indication
of this working.
Because it's not trying to enforce what you claim, so of
course it's not working for this goal. But for the goal that
copyleft licenses aim at, there is clear evidence that yes,
it works.
Big companies have the legal and engineering
resources to handle
both the legal minefield and the technical hurdles easily.
There is no pitfall in copyleft licenses. Using a war-material
analogy is disrespectful. It's true that copyleft licenses
might come with some constraints that non-copyleft free
licenses don't have, but that is the price for fostering equity.
And it's a low price that even individuals can manage: it
might require a little extra time on legal
considerations, but on the other hand using the free work is
an immense gain that is worth it. In Why you shouldn't use
the Lesser GPL for your next library
<https://www.gnu.org/licenses/why-not-lgpl.html> it is stated that
/proprietary software developers have the advantage of money;
free software developers need to make advantages for each
other/. This might be generalised as /big companies have the
advantage of money; free/libre culture contributors need to
make advantages for each other/. So contrary to what
these fallacious claims against copyleft licenses assert, they are not
a "minefield and the technical hurdles" that only big
companies can handle. All the more, let's recall who financed
the initial development of Wikidata: only actors which are
related to big companies.
Who it is really hurting is the
smaller start-up, institution or
hacker who can not deal with it.
If this statement is about copyleft licenses, then this is
just plainly false. Smaller actors have more to gain in
preserving the mutual benefit of the common ecosystem that a
copyleft license fosters.
With Wikidata we are making
structured data about the world
available for everyone.
And that's great. But that doesn't require CC0 as sole license
to be achieved.
We are leveling the playing field to give
those who currently
don’t have access to the knowledge graphs of the big companies a
chance to build something amazing.
And that's great. But that doesn't require CC0 as sole
license. Actually CC0 makes it a less sustainable project on
this point, as it allows unfair actors to take it all, add
some interesting added value that our community cannot
afford, and reach or reinforce a hegemonic position in the ecosystem
with their own closed solution. And, ta-da, Wikidata can be
discontinued quietly, just like Google did with the defunct
Freebase, which was CC-BY-SA before they bought the company
that was running it, and which they afterwards imported under CC0 into
Wikidata as a new attempt to gather a larger community of free
curators. And when it has performed license laundering
of all Wikimedia projects' works with shady mass extraction and
import, Wikimedia can disappear as well. Of course big
companies benefit more from these possibilities than actors with
smaller financial support and no hegemonic position.
Thereby we are helping more people get access to knowledge from
more places than just the few big ones.
No, with CC0 you are certainly helping big companies to
reinforce their position in which they can distribute
information manipulated as they wish, without regard
for traceability and equity. Allowing
contributors to also use copyleft licenses would be far more
effective to /collect and use different forms of free, trusted
knowledge/ that /focus efforts on the knowledge and
communities that have been left out by structures of power and
privilege/, as stated in /Our strategic direction: Service and
Equity/.
CC-0 is becoming more and more common.
Just like economic inequality
<https://en.wikipedia.org/wiki/economic_inequality>. But that
is not what we are aiming to foster in the Wikimedia movement.
Many organisations are releasing their data under CC-0 and are
happy with the experience. Among them are the European Union,
Europeana, the National Library of Sweden and the Metropolitan
Museum of Art.
Good for them. But they are not the Wikimedia community; they
have their own goals and plans to be sustainable that do not
necessarily match what our community can follow. Different
contexts require different means. States and their
institutions can count on tax revenue, and if taxpayers' money ends
up in public domain works, that's great and seems fair. States
are rarely threatened by companies; they have legal levers to
pressure that kind of entity, although conflicts of interest
and lobbying can of course mitigate this statement.
Importing that kind of data with proper attribution and
license is fine, be it CC0 or any other free license. But
that's not an argument in favour of imposing on volunteers a
systematic waiver of all their rights as the single option to
contribute.
All this being said we do encourage all
re-users of our data to
give attribution to Wikidata because we believe it is in the
interest of all parties involved.
That's it, zero legal hope of equity.
And our experience
shows that many of our re-users do give credit
to Wikidata even if they are not forced to.
Experience also shows that some prominent actors like Google
won't credit the Wikimedia community anymore when generating
direct answers based on, inter alia, information coming from
Wikidata, which is itself performing license laundering of
Wikipedia data.
Are there no downsides to this? No, of
course not. Some people
chose not to participate, some data can't be imported and some
re-users do not attribute us. But the benefits I have seen over
the years for Wikidata and the larger open knowledge ecosystem far
This should at least be backed with some solid statistics showing that it
had a positive impact in terms of audience and contribution in
Wikimedia projects as a whole. Maybe the introduction of
Wikidata did have a positive effect on the evolution of the total
number of contributors, or maybe so far it has no significant
correlative effect, or maybe it correlates with a decrease
in the total number of active contributors. Some plots would
be interesting here. Mere personal feelings of benefits and
hindrances mean nothing here, mine included of course.
Plus, there is not even the beginning of an attempt to A/B
test with a second Wikibase instance that allows users to select
which licenses their contributions are released under, so there
is no possible way to state anything backed by a relevant
comparison. The fact that there are some people satisfied with
the current state of things doesn't mean they would not be
even more satisfied with a more equitable solution that allows
contributors to choose a free license set for their
publications. All the more, this is all about the
sustainability and fostering of our community and reaching its
goals, not the immediate feeling of satisfaction for some people.
 Wikipedia Signpost 2015, 2nd December
 according to the next statement of Lydia
Once again, I recall this is not a manifesto against Wikidata. The
motivation behind this message is a hope that one day one might
participate in Wikidata with the same respect for equity and
traceability that is granted in other Wikimedia projects.
With much wikilove,
Wikidata mailing list