Hello everyone,
I am forwarding here the message I initially posted on the Meta
talk page of the Tremendous Wiktionary User Group, because I would
like wider feedback from the community on this point. Whether you
think my view is completely misguided or that I might have a few
relevant points, I'm extremely interested to hear it, so please be
bold.
Before you read further, keep in mind that I remain convinced that
Wikidata is a wonderful project, and I wish it a bright future full
of even more amazing things than it has already brought so far. My
sole concern is a licensing issue.
Below is a copy of the message linked above:
Thank you, Lydia Pintscher, for taking the time to answer.
Unfortunately, this answer misses too many important points to
resolve all the concerns that have been raised.
Notably, it still does not contain even the beginning of a hint
about where the decision to use CC0 exclusively for Wikidata came
from. But as this inquiry on the topic advances, an answer is
emerging.
It seems that Wikidata's choice of CC0 was heavily influenced by
Denny Vrandečić, who, to make it short, now works in the Google
Knowledge Graph team. It is also worth noting that Google funded a
quarter of the initial development work. Another quarter came from
the Gordon and Betty Moore Foundation, established by an Intel
co-founder. And half the money came from Microsoft co-founder Paul
Allen's Institute for Artificial Intelligence (AI2)[1].
To put it shortly and in a conspiratorial fashion, Wikidata is the
Trojan horse of hegemonic big tech companies in the realm of
Wikimedia. For a less tragic, better argued version, please see the
research project (work in progress: only chapter 1 is in good
enough shape, and it's only available in French so far). Proof that
this claim is completely wrong is welcome, as it would be great if
it were in fact the community that was the driving force behind
this single-license choice, and if that choice were the best one
for its future rather than for the future of giant tech companies.
Shedding such a happy light on this subject would be a great
contribution, so we could all let this issue rest and go back to
contributing on more interesting topics.
Now let's examine the thoughts proposed by Lydia.
- Wikidata is here to give more people more access to more
knowledge.
- So far, this matches the
Wikimedia movement's stated goal.
- This means we want our data to be used as widely as possible.
- Sure, as long as it rhymes with
equity, as in Our
strategic direction: Service and Equity.
Just as we want freedom for everybody as widely as possible,
that is, freedom that begins where it upholds everyone else's
freedom. Because below that threshold, the freedom of one is the
murder and enslavement of others.
- CC-0 is one step towards that.
- That's a thesis; you can set out
to defend it, but no one has to agree without some convincing
proof.
- Data is different from many other things we produce in
Wikimedia in that it is aggregated, combined, mashed-up,
filtered, and so on much more extensively.
- No, it's not. From a data-processing point of view, everything
is data. Whether it's stored in wikisyntax, in a relational
database or engraved in stone only matters for convenience.
Whether it's a random stream of bits generated by a dumb chipset
or some encoded prose of Shakespeare makes no difference. So from
this point of view, no, what Wikidata stores is not different
from what is produced anywhere else in Wikimedia projects.
- Sure, the way it's structured
makes many things much easier. But this is not because it is
data where elsewhere there would be no data. It's because it
forces data to be stored in a way that eases aggregation,
combination, mashing-up, filtering and so on.
- Our data lives from being able to write queries over millions
of statements, putting it into a mobile app, visualizing parts
of it on a map and much more.
- Sure.
It also lives from being curated by millions[2]
of volunteer contributors; otherwise it would be just a useless
pile of random bytes.
- This means, if we require attribution, in a huge number of
cases attribution would need to go back to potentially millions
of editors and sources (even if that data is not visible in the
end result but only helped to get the result).
- No, it doesn't mean that.
- First, let's recall a few basics, as the whole answer seems to
confuse attribution with distribution of contributions under the
same license as the original.
Attribution is crucial for traceability, and thus for the
reliable and trusted knowledge that we are targeting within the
Wikimedia movement. The "same license" requirement is the sole
legal guarantee of equity that contributors have. In short,
trusted knowledge and equity are requirements for the Wikimedia
movement's goals. That means withdrawing these requirements is
withdrawing these goals.
- Now, what would be the additional cost of storing sources in
Wikidata? Well, zero. Actually, it's already there, as the
"reference" attribute is part of the Wikibase item structure. So
attribution is not a problem: you don't have to put it in front
of your derived work. Just look at a Wikipedia article: until
you go to the history, you have zero visible attribution, and
that's OK. It also probably has zero or negligible computing
cost, as it doesn't have to be included in all computations; it
just needs to be retrievable on demand.
- What would be the additional cost of storing licenses for
each item based on its source? Well, adding a license attribute
might help, but actually if your reference is a work item, I
guess it might come with a "license" statement, so zero
additional cost. As for letting users specify under which free
licenses they publish their work, that would just require an
additional attribute, a negligible weight when balanced against
the equity concerns it resolves.
- Could that prevent some uses by
some actors? Yes, that's actually the point: preventing abuse by
those who don't want to act equitably. For all other actors,
"distribute under the same conditions" is fine.
- This is potentially computationally hard to do and
depending on where the data is used very inconvenient (think of
a map with hundreds of data points in a mobile app).
- OpenStreetMap, which uses ODbL, a
copyleft license requiring attribution, does exactly that too,
doesn't it? By the way, allowing a license per item would make it
possible to include OpenStreetMap data in Wikidata, which is
currently impossible due to the project's CC0-only license
policy. Too bad: it could be so useful to have this data
accessible to Wikimedia projects, but who cares?
- This is a burden on our re-users that I do not want to impose
on them.
- Wait, which re-users? Surely one
might expect Wikidata to care first about re-users who are in
line with Wikimedia's goals, so surely the needs of the
Wikimedia community in particular and of Free/Libre Culture in
general should be considered. Would these re-users be penalized
by a copyleft license? Surely not, or they wouldn't use such
licenses as extensively as they do. So who are these re-users
whom it was deemed preferable, without consulting the community,
not to bother with questions of equity and traceability?
- It would make it significantly harder to re-use our data and
be in direct conflict with our goal of spreading knowledge.
- No, technically it would be just
as easy as pressing one button on a computer rather than
another. What is in direct conflict with our clearly stated
goals, which emerged from the 2017 community consultation, is
going against equity and traceability. You propose to discard
both to satisfy external demands that should carry next to no
weight in a decision impacting the future of our community so
deeply.
- Whether data can be protected in this way at all or not
depends on the jurisdiction we are talking about. See this
Wikilegal on database rights for more details.
- It basically says that such rights
apply in the United States and Europe, on different legal bases
and to different extents. As for the rest of the world, it
doesn't say that nothing can apply; it says nothing at all.
- So even if we would have decided to require attribution it
would only be enforceable in some jurisdictions.
- What kind of logic is that?
It might not be applicable in some countries, so let's give up
the few rights we have.
- Ambiguity, when it comes to legal matters, also unfortunately
often means that people refrain from what they want to do for
fear of legal repercussions. This is directly in conflict with
our goal of spreading knowledge.
- Economic inequality, social inequity and legal imbalance
might also keep people from doing what they want, as they fear
practical repercussions. CC0 reinforces these discriminatory
factors by forcing people to give up the few rights they have to
weigh against the growing asymmetry that social structures are
concurrently building. So CC0 as the only license choice is in
direct conflict with our goal of equitably spreading knowledge.
- Also, this statement seems to suggest that releasing our
contributions only under CC0 is the sole way to reduce legal
doubts. Actually, any well-written license would do an equally
good job on this point, including many copyleft licenses out
there. So while associating a clear license with each data item
might indeed reduce legal uncertainty, it's not an argument at
all for imposing CC0 as the sole license available to
contributors.
- Moreover, just putting a license side by side with a work
does not ensure that the person who made the association was
legally allowed to do so. To have more confidence in the
legitimacy of a statement that a work is covered by a certain
license, there is once again a traceability requirement. For
example, Wikidata currently includes many items that were
imported from various Wikipedia language editions, and claims
that the derived work obtained, a set of items and statements,
is under CC0. That is a hugely doubtful claim, and it looks
alarmingly like license laundering. This is true for Wikipedia,
but it's also true for any source on which large-scale
extraction and import are performed, whether through bots or
crowdsourcing.
- So the Wikidata project is
currently very poorly placed to give lessons on legal
ambiguity, as it plays heavily on legal blur and on the hope
that its shady practices won't come under too much scrutiny.
- Licenses that require attribution are often used as a way to
try to make it harder for big companies to profit from openly
available resources.
- No, they are not. They are used
as a way to try to make it harder for big companies to
profit from openly available resources in inequitable
ways. That's completely different. Copyleft licenses
give the same rights to big companies and individuals, in a
manner that reduces the socio-economic inequalities that
disproportionately advantage the former.
- The thing is there seems to be no indication of this working.
- Because it's not trying to
achieve what you claim, so of course it's not working toward
that goal. But for the goal that copyleft licenses aim at, there
is clear evidence that yes, it works.
- Big companies have the legal and engineering resources to
handle both the legal minefield and the technical hurdles
easily.
- There is no minefield in copyleft
licenses. Using a war-material analogy is disrespectful. It's
true that copyleft licenses might come with some constraints
that non-copyleft free licenses don't have, but that is the price
of fostering equity. And it's a low price, one that even
individuals can manage: it might require a little extra time on
legal considerations, but on the other hand, being able to use
the free work is an immense gain that is worth it. In Why you
shouldn't use the Lesser GPL for your next library, it is
stated that proprietary software developers have the advantage of
money; free software developers need to make advantages for
each other. This might be generalised as: big companies
have the advantage of money; free/libre culture contributors
need to make advantages for each other. So, contrary to what
these fallacious claims against copyleft licenses pretend, they
are not a "legal minefield" full of "technical hurdles" that only
big companies can handle. All the more so, let's recall who
financed the initial development of Wikidata: only actors
related to big companies.
- Who it is really hurting is the smaller start-up, institution
or hacker who can not deal with it.
- If this statement is about
copyleft licenses, then it is just plainly false. Smaller
actors have more to gain from preserving the mutual benefit of
the common ecosystem that a copyleft license fosters.
- With Wikidata we are making structured data about the world
available for everyone.
- And that's great. But achieving
it doesn't require CC0 as the sole license.
- We are leveling the playing field to give those who currently
don’t have access to the knowledge graphs of the big companies a
chance to build something amazing.
- And that's great. But that
doesn't require CC0 as the sole license. Actually, CC0 makes the
project less sustainable on this point, as it allows unfair
actors to take it all, add some interesting value that our
community cannot afford, and reach or reinforce a hegemonic
position in the ecosystem with their own closed solution. And,
ta-da, Wikidata can be discontinued quietly, just as Google did
with the defunct Freebase, which was CC-BY-SA before Google
bought the company that ran it, and which Google afterwards
imported under CC0 into Wikidata as a new attempt to gather a
larger community of volunteer curators. And once it has
performed license laundering of all Wikimedia projects' works
through shady mass extraction and import, Wikimedia can
disappear as well. Of course, big companies benefit more from
these possibilities than actors with smaller financial support
and no hegemonic position.
- Thereby we are helping more people get access to knowledge
from more places than just the few big ones.
- No, with CC0 you are certainly
helping big companies reinforce their position, in which they
can distribute information manipulated as they wish, without
regard for traceability and equity.
Allowing contributors to also use copyleft licenses would be far
more effective to collect and use different forms of free,
trusted knowledge that focus efforts on the knowledge
and communities that have been left out by structures of power
and privilege, as stated in Our strategic direction:
Service and Equity.
- CC-0 is becoming more and more common.
- Just like economic
inequality. But that is not what we are aiming to foster
in the Wikimedia movement.
- Many organisations are releasing their data under CC-0 and
are happy with the experience. Among them are the European
Union, Europeana, the National Library of Sweden and the
Metropolitan Museum of Art.
- Good for them. But they are not the Wikimedia community; they
have their own goals and sustainability plans, which do not
necessarily match what our community can follow. Different
contexts require different means. States and their institutions
can count on tax revenue, and if taxpayers' money ends up in
public domain works, that's great and seems fair. States are
rarely threatened by companies; they have legal levers to
pressure that kind of entity, although conflicts of interest and
lobbying can of course temper this statement.
- Importing that kind of data with
proper attribution and license is fine, be it CC0 or any other
free license. But that's not an argument in favour of imposing
on volunteers a systematic waiver of all their rights as the
only option to contribute.
- All this being said we do encourage all re-users of our data
to give attribution to Wikidata because we believe it is in the
interest of all parties involved.
- That's it, zero legal hope of
equity.
- And our experience shows that many of our re-users do give
credit to Wikidata even if they are not forced to.
- Experience also shows that some
prominent actors like Google no longer credit the Wikimedia
community when directly generating answers based, inter alia,
on information coming from Wikidata, which is itself
performing license laundering of Wikipedia data.
- Are there no downsides to this? No, of course not. Some
people chose not to participate, some data can't be imported and
some re-users do not attribute us. But the benefits I have seen
over the years for Wikidata and the larger open knowledge
ecosystem far outweigh them.
- This should at least be backed by some solid statistics
showing that it had a positive impact in terms of audience and
contributions across Wikimedia projects as a whole. Maybe the
introduction of Wikidata did have a positive effect on the
evolution of the total number of contributors, or maybe so far
it has had no significant correlated effect, or maybe it is
correlated with a decrease in the total number of active
contributors. Some plots would be interesting here. Mere
personal feelings of benefits and drawbacks mean nothing here,
mine included of course.
- Plus, there is not even the
beginning of an attempt to A/B test with a second Wikibase
instance that allows users to select the licenses under which
their contributions are released, so there is no way to state
anything backed by a relevant comparison. The fact that some
people are satisfied with the current state of things doesn't
mean they would not be even more satisfied with a more
equitable solution that allows contributors to choose a set of
free licenses for their publications. All the more so as this
is all about the sustainability and fostering of our community
and reaching its goals, not the immediate satisfaction of some
people.
Once again, this is not a manifesto against Wikidata.
The motivation behind this message is the hope that one day one
might participate in Wikidata with the same respect for equity and
traceability that is granted in other Wikimedia projects.
With much wikilove,
mathieu