On 30/11/2017 at 18:05, Yair Rand wrote:
Wikidata is not replacing Wiktionary.
We will see that in the future. At least the proposed model allows
including most things that you might find in a Wiktionary article, plus it
comes with all the benefits of a relational(-like) database.
See
for more information on what it will allow or not.
Wikidata did not replace Wikipedia, and force all articles to be under
CC-0.
Sure. Not yet. But if it continues to improve, along with the tools to
generate prose from it, at some point it might do a good job of just
that.
Structured data for Commons doesn't replace all Commons media with
CC-0-licensed content.
Well, unless one tries to use it in a very different way from what it is
aiming at, there is no chance of that, as pictures contain far more
information than their metadata. Now, technically one could probably
store the whole picture in that kind of structure (provided no size
restriction is enforced), but this is not the goal.
This is a very different case from the Wiktionary one. The case of
Wikipedia might be closer, but you cannot make a simple one-to-one
correspondence between Wikidata elements and Wikipedia prose. Actually
extracting Wikipedia into statements usable in Wikidata is far easier
with current natural language processing toolkits. On the other hand, such
a bijective correspondence between a Wiktionary article and a set of
WikibaseLexeme elements is clearly straightforward. So the domains of
knowledge being documented overlap extensively. Plus the
Wikibase approach brings many advantages in terms of knowledge factorisation.
To my mind, WikibaseLexeme has good potential to quickly supersede our
plethora of sparsely communicating Wiktionaries. At least far sooner than
Wikibase will have a chance to approach the same level as Wikipedia articles.
The fact that France is in Europe is not, independently, copyrightable.
The fact that File:Vanessa_indica-Silent_Valley-2016-08-14-002.jpg is a
picture of a butterfly is not copyrightable. The facts that "balloons" is
the plural of "balloon", and that "feliĉiĝi" is an intransitive verb in
Esperanto, are not copyrightable.
Surely that is something we all agree on. :)
Even if they were copyrightable, copyrighting them independently would
harm their potential reuse, as elements of a database, as has been
previously explained.
Any information monopoly is a possible obstacle to reuse. No one will
deny that, I guess. But information monopolies, such as copyright, patents
and so on, do exist. And so does unequal access to resources useful for
human flourishing, including knowledge.
Now, personally I am not satisfied with this situation, nor with the
growth of inequalities. Part of my motivation for contributing to
Wikimedia projects is that it might help make the situation evolve
otherwise. That might not be among the motivations of every
contributor, but I guess I'm not alone in this.
So the question for me is not "how do we make the current snapshots of our
knowledge banks as reusable as possible right now?", but "how do we build a
sustainable movement which maintains and updates knowledge banks that are as
accessible as possible to every single human out there, with this goal of
sustainability in mind?".
Maybe it's not what every single stakeholder of our movement is
expecting. But I don't feel that this personal vision is at odds with what
is stated in the strategic direction. And I hope I'm not alone in holding
this vision.
Wikipedia articles and Commons Media are not structured data, and as
such, they do not belong in Wikidata.
I think your statement is wrong here. Wikipedia articles are structured on
several analysable levels. For example, from the point of view of a common
linguistic theory, they are structured and analysable on the syntactic,
semantic and pragmatic levels. But there are many other ways in which
you might analyse them, because they are structured data. It is true, though,
that they are not structured in a way that eases SQL-like querying.
However, every single sentence contained in Wikipedia articles can be
reduced to a set of predicates, that is, to things
that can be stored in Wikidata. There is no technical barrier I'm aware of
that prevents putting the whole content of all Wikipedias into as many
statements as required within Wikidata.
Elements of prose in Wiktionary, such as definitions, appendices,
extensive usage notes and notes on grammar and whatnot, are copyrightable.
Similar to Wikipedia articles, licensing them under CC-BY-SA would not
particularly harm their reuse, as attribution is completely feasible. They
are also not structured data, and can not be made into structured data.
Well, as far as I'm concerned it would be great news to hear that the
Wikidata team will indeed allow contributors to include this CC-BY-SA
material in the Wikibase instance/namespace/whatever place where these
lexicological items will be stored, rather than enforcing contribution
under CC0 here too. But so far the statements made by the Wikidata team point
in the exact opposite direction, that is, using CC0 for everything.
Wikidata will not be laundering this data to CC-0, nor will it be setting
up a parallel project to duplicate the efforts under a license which is not
appropriate for the type of content.
I hope the future will prove you right.
Attempting to license the database's contents under CC-BY-SA would not
ensure attribution, and would harm reuse. I fail to see any potential
benefits to using the more restrictive license. Attribution will be
required where it is possible (in Wiktionary proper), and content will be
as reusable as possible in areas where requiring attribution isn't feasible
(in Wikidata). There's no real conflict here.
I hope my answer made these conflicts more obvious, and showed how
"more reusable right now" might rhyme with "less equity and accessibility
of knowledge in the long term".
-- Yair Rand
2017-11-29 16:45 GMT-05:00 Mathieu Stumpf Guntz <
psychoslave(a)culture-libre.org>:
Hello everyone,
I am forwarding here the message I initially posted on the Meta Tremendous
Wiktionary User Group talk page
<https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#An_answer_to_Lydia_general_thinking_about_Wikidata_and_CC-0>,
because I'm interested in wider feedback from the community on this
point. Whether you think that my view is completely misguided or that I
might have a few relevant points, I'm extremely interested to know, so
please be bold.
Before you consider reading any further, keep in mind that I
remain convinced that Wikidata is a wonderful project, and I wish it a bright
future full of even more amazing things than what it has already brought so far.
My sole concern is really a license issue.
Below is a copy/paste of the message linked above:
Thank you Lydia Pintscher
<https://meta.wikimedia.org/wiki/User:Lydia_Pintscher_%28WMDE%29> for
taking the time to answer. Unfortunately this answer
<https://www.wikidata.org/wiki/User:Lydia_Pintscher_%28WMDE%29/CC-0>
misses too many important points to resolve all the concerns which have
been raised.
Notably, there is still not even the beginning of a hint in it about where
the decision to use CC0 exclusively for Wikidata came from. But as this
inquiry on the topic
<https://en.wikiversity.org/wiki/fr:Recherche:La_licence_CC-0_de_Wikidata,_origine_du_choix,_enjeux,_et_prospections_sur_les_aspects_de_gouvernance_communautaire_et_d%E2%80%99%C3%A9quit%C3%A9_contributive>
advances, an answer is emerging from it. It seems that Wikidata's choice
of CC0 was heavily influenced by Denny Vrandečić, who – to make it
short – is now working in the Google Knowledge Graph team. It is also worth
noting that Google funded a quarter of the initial development work.
Another quarter came from the Gordon and Betty Moore Foundation,
established by an Intel co-founder. And half the money came from Microsoft
co-founder Paul Allen's Institute for Artificial Intelligence (AI2)[1]
<https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-1>.
To state it briefly in a conspiratorial fashion, Wikidata is the puppet
Trojan horse of big tech hegemonic companies in the realm of Wikimedia.
For a less tragic, more argued version, please see the research
project (work in progress, only chapter 1 is in good enough shape, and it's
only available in French so far). Proof that this claim is completely
wrong is welcome, as it would be great if in fact it was the community
that was the driving force behind this single-license choice, and if it were
the best choice for its future, not for the future of giant tech companies.
It would be a great contribution to shed such a happy light on this
subject, so we can all leave this issue alone and go back to contributing to
more interesting topics.
Now let's examine the thoughts proposed by Lydia.
"Wikidata is here to give more people more access to more knowledge."

So far, this matches the stated goal of the Wikimedia movement.

"This means we want our data to be used as widely as possible."

Sure, as long as it rhymes with equity. As in *Our strategic direction:
Service and Equity*
<https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction/Endorsement#Our_strategic_direction:_Service_and_Equity>.
Just like we want freedom for everybody as widely as possible. That is,
starting where it confirms each other's freedom. Because below this level,
the freedom of one is the murder and slavery of others.

"CC-0 is one step towards that."

That's a thesis. You can propose to defend it, but no one has to agree
without some convincing proof.

"Data is different from many other things we produce in Wikimedia in that
it is aggregated, combined, mashed-up, filtered, and so on much more
extensively."

No, it's not. From a data processing point of view, everything is data.
Whether it's stored in wikisyntax, in a relational database or engraved in
stone only matters as a question of convenience. Whether it's a random
stream of bits generated by a dumb chipset or some encoded prose of
Shakespeare makes no difference.
So from this point of view, no, what Wikidata stores is not different from
what is produced anywhere else in Wikimedia projects. Sure, the way
it's structured does make many things far easier. But this is not because
it's data, whereas elsewhere there would be no data. It's because it forces
data to be stored in a way that eases aggregation, combination, mashing-up,
filtering and so on.

"Our data lives from being able to write queries over millions of
statements, putting it into a mobile app, visualizing parts of it on a map
and much more."

Sure. It also lives from being curated by millions[2]
<https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#cite_note-2>
of volunteer contributors, or it would be just a useless pile of random
bytes.

"This means, if we require attribution, in a huge number of cases
attribution would need to go back to potentially millions of editors and
sources (even if that data is not visible in the end result but only helped
to get the result)."

No, it doesn't mean that. First let's recall a few basics, as it seems the
whole answer confuses attribution with distribution of contributions under
the same license as the original.
Attribution is crucial for traceability, and so for the reliable and trusted
knowledge that we are aiming for within the Wikimedia movement. The "same
license" is the sole legal guarantee of equity that contributors have. That's
it: trusted knowledge and equity are requirements for the Wikimedia
movement's goals. That means withdrawing these requirements is withdrawing
these goals.

Now, what would be the additional cost of storing sources in Wikidata? Well,
zero cost. Actually, it's already there, as the "reference" attribute is part
of the Wikibase item structure. So attribution is not a problem: you don't
have to put it in front of your derived work. Just look at a Wikipedia
article: until you go to the history, you have zero attribution visible, and
it's OK. It also probably has zero or negligible computing cost, as it
doesn't have to be included in all computations; it just needs to be
retrievable on demand.

What would be the additional cost of storing licenses for each item based
on its source? Well, adding a license attribute might help, but actually if
your reference is a work item, I guess it might come with a "license"
statement, so zero additional cost.
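
To make the point about on-demand attribution concrete, here is a minimal
sketch in Python. It is only an illustration under stated assumptions: the
`Reference` and `Statement` classes, the `attribution_for` helper and the
sample sources are hypothetical names made up for this example, not the
actual Wikibase data model or API. The idea is simply that queries can work
on the statements alone, and the (source, license) pairs are only collected
when someone asks for them.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Reference:
    """Hypothetical reference record: where a statement comes from."""
    source: str   # name of the work the statement was taken from
    license: str  # license under which that work is published


@dataclass
class Statement:
    """Hypothetical statement: a simple triple plus its references."""
    subject: str
    prop: str
    value: str
    references: List[Reference] = field(default_factory=list)


def attribution_for(statements: Iterable[Statement]) -> List[Tuple[str, str]]:
    """Collect the distinct (source, license) pairs behind some statements.

    This is only run on demand; ordinary queries never need to call it.
    """
    seen = {}  # a dict keeps first-seen order and removes duplicates
    for statement in statements:
        for ref in statement.references:
            seen.setdefault((ref.source, ref.license), None)
    return list(seen)


# Made-up sample data, for illustration only.
statements = [
    Statement("France", "continent", "Europe",
              [Reference("Example Atlas", "CC-BY-SA-4.0")]),
    Statement("balloon", "plural", "balloons",
              [Reference("Example Dictionary", "CC0-1.0")]),
    Statement("Paris", "country", "France",
              [Reference("Example Atlas", "CC-BY-SA-4.0")]),
]

credits = attribution_for(statements)
for source, lic in credits:
    print(f"{source} ({lic})")
```

In this toy case three statements boil down to two credit lines, which is
the kind of aggregation a re-user could show on a single "sources" page
rather than next to every data point.
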
Now for letting users specify under which free licenses they publish their
work, that would just require an additional attribute, a ridiculously small
weight when balanced against the equity concerns it resolves. Could that
prevent some uses by some actors? Yes, that's actually the point: preventing
abuse by those who don't want to act equitably. For all other actors a
"distribute under the same conditions" clause is fine.

"This is potentially computationally hard to do and depending on where the
data is used very inconvenient (think of a map with hundreds of data points
in a mobile app)."

OpenStreetMap, which uses ODbL, a copyleft attributive license, does exactly
that too, doesn't it? By the way, allowing a license per item would make it
possible to include OpenStreetMap data in Wikidata, which is currently
impossible due to the CC0 single-license policy of the project. Too bad, it
could be so useful to have this data accessible for Wikimedia projects, but
who cares?

"This is a burden on our re-users that I do not want to impose on them."

Wait, which re-users? Surely one might expect that Wikidata would care first
about re-users who are in phase with the Wikimedia goals, so surely the needs
of the Wikimedia community in particular and Free/Libre Culture in general
should be considered. Would these re-users be penalized by a copyleft license?
Surely not, or they wouldn't use it as extensively as they do. So who are
these re-users whom it is thought preferable, without consulting the
community, not to annoy with questions of equity and traceability?

"It would make it significantly harder to re-use our data and be in direct
conflict with our goal of spreading knowledge."

No, technically it would be just as easy as pushing one button on a computer
rather than another. What is in direct conflict with our clearly stated goals
emerging from the 2017 community consultation is going against equity and
traceability. You propose to discard both to satisfy exogenous demands which
should have next to no weight in decisions impacting so deeply the future of
our community.

"Whether data can be protected in this way at all or not depends on the
jurisdiction we are talking about. See this Wikilegal on database rights
<https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights> for more
details."

It says basically that it's applicable in the United States and Europe on
different legal bases and to different extents. And for the rest of the
world, it doesn't say that nothing can apply; it states nothing.

"So even if we would have decided to require attribution it would only be
enforceable in some jurisdictions."

What kind of logic is that? Maybe it might not be applicable in some
countries, so let's give up the few rights we have.

"Ambiguity, when it comes to legal matters, also unfortunately often means
that people refrain from what they want to do for fear of legal
repercussions. This is directly in conflict with our goal of spreading
knowledge."

Economic inequality, social inequity and legal imbalance might also keep
people from doing what they want, as they fear practical repercussions. CC0
strengthens these discrimination factors by forcing people to give up the few
rights they have to counterbalance the growing asymmetry that social
structures are concomitantly building. So CC0 as the only license choice is
in direct conflict with our goal of *equitably* spreading knowledge.

Also, this statement seems to suggest that releasing our contributions only
under CC0 is the sole solution to diminish legal doubts. Actually any
well-written license would do an equal job regarding this point, including
many copyleft licenses out there. So while associating a clear license with
each data item might indeed diminish legal uncertainty, it's not an argument
at all for enforcing CC0 as the sole license available to contributors.

Moreover, just putting a license side by side with a work does not ensure
that the person who made the association was legally allowed to do so. To
have better confidence in the legitimacy of a statement that a work is
covered by a certain license, there is once again a traceability
requirement. For example, Wikidata currently includes many items which were
imported from various Wikipedia versions, and claims that the derived work
obtained – a set of items and statements – is under CC0. That is a hugely
doubtful statement and it looks alarmingly like license laundering
<https://en.wikipedia.org/wiki/license_laundering>. This is true for
Wikipedia, but it's also true for any source on which large-scale extraction
and import are operated, whether through bots or crowdsourcing.

So the Wikidata project is currently extremely badly placed to give lessons
on legal ambiguity, as it heavily plays with legal blur and the hope that its
shady practices won't fall under too much scrutiny.

"Licenses that require attribution are often used as a way to try to make
it harder for big companies to profit from openly available resources."

No, they are not. They are used as *a way to try to make it harder for big
companies to profit from openly available resources* *in inequitable
manners*. That's completely different. Copyleft licenses give the same
rights to big companies and individuals, in a manner that lowers the
socio-economic inequalities which disproportionately advantage the former.

"The thing is there seems to be no indication of this working."

Because it's not trying to enforce what you claim, so of course it's not
working for this goal. But for the goal that copyleft licenses aim at, there
is clear evidence that yes, it works.

"Big companies have the legal and engineering resources to handle both the
legal minefield and the technical hurdles easily."

There is no pitfall in copyleft licenses. Using a war-material analogy is
disrespectful. It's true that copyleft licenses might come with some
constraints that non-copyleft free licenses don't have, but that is the
price of fostering equity. And it's a low price, one that even individuals
can manage: it might require a little extra time on legal considerations,
but on the other hand using the free work is an immensely vast gain that is
worth it. In Why you shouldn't use the Lesser GPL for your next library
<https://www.gnu.org/licenses/why-not-lgpl.html> it is stated that
*proprietary software developers have the advantage of money; free software
developers need to make advantages for each other*. This might be
generalised as *big companies have the advantage of money; free/libre
culture contributors need to make advantages for each other*.
So, contrary to what these fallacious claims against copyleft licenses
pretend, they are not a "minefield and the technical hurdles" that only big
companies can handle. All the more, let's recall who financed the initial
development of Wikidata: only actors which are related to big companies.

"Who it is really hurting is the smaller start-up, institution or hacker
who can not deal with it."

If this statement is about copyleft licenses, then this is just plainly
false. Smaller actors have more to gain in preserving the mutual benefit of
the common ecosystem that a copyleft license fosters.

"With Wikidata we are making structured data about the world available for
everyone."

And that's great. But that doesn't require CC0 as sole license to be
achieved.

"We are leveling the playing field to give those who currently don’t have
access to the knowledge graphs of the big companies a chance to build
something amazing."

And that's great. But that doesn't require CC0 as sole license. Actually CC0
makes it a less sustainable project on this point, as it allows unfair
actors to take it all, add some interesting added value that our community
cannot afford, and reach or reinforce a hegemonic position in the ecosystem
with their own closed solution. And, ta-da, Wikidata can be discontinued
quietly, just like Google did with the defunct Freebase, which was CC-BY-SA
before they bought the company that was running it, and which they
afterwards imported under CC0 into Wikidata as a new attempt to gather a
larger community of free curators. And when it has performed license
laundering of all Wikimedia projects' works with shady mass extraction and
import, Wikimedia can disappear as well. Of course big companies benefit
more from these possibilities than actors with smaller financial support and
no hegemonic position.

"Thereby we are helping more people get access to knowledge from more
places than just the few big ones."

No, with CC0 you are certainly helping big companies to reinforce their
position, in which they can distribute information manipulated as they wish,
without regard for traceability and equity. Allowing contributors to also
use copyleft licenses would be far more effective to *collect and use
different forms of free, trusted knowledge* that *focus efforts on the
knowledge and communities that have been left out by structures of power
and privilege*, as stated in *Our strategic direction: Service and
Equity*.

"CC-0 is becoming more and more common."

Just like economic inequality
<https://en.wikipedia.org/wiki/economic_inequality>. But that is not what we
are aiming to foster in the Wikimedia movement.

"Many organisations are releasing their data under CC-0 and are happy with
the experience. Among them are the European Union, Europeana, the National
Library of Sweden and the Metropolitan Museum of Modern Arts."

Good for them. But they are not the Wikimedia community; they have their own
goals and plans to be sustainable, which do not necessarily match what our
community can follow. Different contexts require different means. States
and their institutions can count on tax revenue, and if taxpayers' money
ends up in public domain works, that's great and seems fair. States are
rarely threatened by companies; they have legal levers to pressure that kind
of entity, although conflicts of interest and lobbying can of course temper
this statement. Importing that kind of data with proper attribution and
license is fine, be it CC0 or any other free license. But that's not an
argument in favour of forcing on volunteers a systematic withdrawal of all
their rights as the single option to contribute.

"All this being said we do encourage all re-users of our data to give
attribution to Wikidata because we believe it is in the interest of all
parties involved."

That's it: zero legal hope of equity.

"And our experience shows that many of our re-users do give credit to
Wikidata even if they are not forced to."

Experience also shows that some prominent actors like Google won't credit
the Wikimedia community anymore when directly generating answers based on,
inter alia, information coming from Wikidata, which is itself performing
license laundering of Wikipedia data.

"Are there no downsides to this? No, of course not. Some people chose not
to participate, some data can't be imported and some re-users do not
attribute us. But the benefits I have seen over the years for Wikidata and
the larger open knowledge ecosystem far outweigh them."

This should at least be backed with some solid statistics showing that it
had a positive impact in terms of audience and contribution in Wikimedia
projects as a whole. Maybe the introduction of Wikidata did have a positive
effect on the evolution of the total number of contributors, or maybe so far
it has had no significant correlative effect, or maybe it is correlated with
a decrease in the total number of active contributors. Some plots would be
interesting here. Mere personal feelings of benefits and hindrances mean
nothing here, mine included of course. Plus, there is not even the beginning
of an attempt at an A/B test with a second Wikibase instance that allows
users to select which licenses their contributions are released under, so
there is no possible way to state anything backed by a relevant comparison.
The fact that there are some people satisfied with the current state of
things doesn't mean they would not be even more satisfied with a more
equitable solution that allows contributors to choose a free license set for
their publications. All the more, this is all about the sustainability and
fostering of our community and reaching its goals, not the immediate feeling
of satisfaction for some people.
- [1] Wikipedia Signpost, 2 December 2015
<https://en.wikipedia.org/wiki/en:Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed>
- [2] According to the next statement by Lydia
Once again, let me recall that this is not a manifesto against Wikidata. The
motivation behind this message is the hope that one day one might participate
in Wikidata with the same respect for equity and traceability that is
granted in other Wikimedia projects.
With much wiki-love,
mathieu
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata