Rob Speer wrote:
The result of this, by the way, is that commercial entities sell modified versions of Wikidata with impunity. It undermines the terms of other resources such as DBPedia, which also contains facts extracted from Wikipedia and respects its Share-Alike terms. Why would anyone use DBPedia and have to agree to share alike, when they can get similar data from Wikidata which promises them it's CC-0?
The comparison to DBpedia is interesting: the terms for DBpedia state "Attribution in this case means keep DBpedia URIs visible and active through at least one (preferably all) of @href, <link />, or "Link:". If live links are impossible (e.g., when printed on paper), a textual blurb-based attribution is acceptable." http://wiki.dbpedia.org/terms-imprint
So according to these terms, when someone displays data from DBpedia, it is entirely sufficient to attribute DBpedia.
What that means is that DBpedia follows exactly the same theory as Wikidata: it is OK to extract data from Wikipedia and republish it as your own dataset under your own copyright without requiring attribution to the original source of the extraction.
(A bit more problematic might be the fact that DBpedia also republishes whole paragraphs of Text under these terms, but that's another story)
My understanding is that all that Wikidata has extracted from Wikipedia is non-copyrightable in the first place and thus republishing it under a different license (or, as in the case of DBpedia for simple triples, with a different attribution) is legally sound.
If there is disagreement with that, I would be interested to know exactly which content is considered to be under copyright, and where the license has not been followed on Wikidata.
For completeness: the discussion is going on in parallel on the Wikidata project chat and in Phabricator:
https://phabricator.wikimedia.org/T193728#4212728
https://www.wikidata.org/wiki/Wikidata:Project_chat#Wikipedia_and_other_Wiki...
I would appreciate it if we could keep the discussion in a single place.
Gnom1 on Phabricator has offered to actually answer legal questions, but we need to come up with the questions that we want to ask. If the question is, for example, as Rob Speer states on the bug, "has the copyright of interwiki links been breached by having them be moved to Wikidata?", I'd be quite happy with that question. If that's the disagreement, let us ask Legal for help and see whether my understanding or yours is correct.
Does this sound like a reasonable question? Or which other question would you like to ask instead?
On Thu, May 17, 2018 at 4:15 PM Rob Speer rob@luminoso.com wrote:
As always, copyright is predatory. As we can prove that copyright is the enemy of science and knowledge
Well, this kind of gets to the heart of the issue, doesn't it.
I support the Creative Commons license, including the share-alike term, which requires copyright in order to work, and I've contributed to multiple Wikimedia projects with the understanding that my work would be protected by CC-By-SA.
Wikidata is engaged in a project-wide act of disobedience against CC-By-SA. I would say that GerardM has provided an excellent summary of the attitude toward Creative Commons that I've encountered on Wikidata: "it's holding us back", "it's the enemy", "you can't copyright knowledge", "you can't make us follow it", etc.
The result of this, by the way, is that commercial entities sell modified versions of Wikidata with impunity. It undermines the terms of other resources such as DBPedia, which also contains facts extracted from Wikipedia and respects its Share-Alike terms. Why would anyone use DBPedia and have to agree to share alike, when they can get similar data from Wikidata which promises them it's CC-0?
On Wed, 16 May 2018 at 21:43 Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi,
Thank you for the overly broad misrepresentation. As always, copyright is predatory. As we can prove that copyright is the enemy of science and knowledge, we should not be upset that *copyright* is abused; we should welcome it, as it proves the point. Also, when we use texts from everywhere and rephrase them in Wikipedia articles, "we" are not lily-white either.
In "them old days" we generally felt that when people used Wikipedia, it would only serve our purpose: to share the sum of all knowledge. I still feel really good about that. And it has been shown that what we do (maintain, curate, and update that data) is not easy for others to do as well as "we" do it.
If we are to be more precise about copyright, there are a few things we could do to make it more transparent. When data is to be uploaded (to Commons, Wikipedia, or Wikidata), we should use a user account that is OWNED and operated by the copyright holder. The operation may be by proxy; as a consequence there is no longer a question about copyright, as the copyright holder can do as they want. This makes any future noises just that: annoying.
As to copyright on Wikidata, when you consider copyright on data taken from Wikipedia, the question is: "Which Wikipedia?" I have copied a lot of data from several Wikipedias and, believe me, from a quality point of view there is much to be gained by using Wikidata as an instrument for good, because it is really strong in identifying friends and false friends. It is superior as a tool for disambiguation.
About the copyright on data, the overriding question is: do you copy data wholesale into Wikidata? That is what a database copyright is about. As I wrote on my blog [1], the best data to include is data that is corroborated by being present in multiple sources. This negates the notion of a single source; it also underscores that much of the data everywhere is replicated a lot. It also underscores, again, the notion that data present in only a single source is what needs attention. It needs tender loving care; it needs other sources to establish its credentials. That is in its own right what makes any claim of copyright moot. It is in this process that it becomes a "creative" process, negating the copyright held on databases.
I welcome the attention that is given to copyright in Wikidata. However, our attention to copyright is predatory in two ways: how can we get around existing copyright, and how can we protect our own. As argued, Wikidata shines when it is used for what it is intended to be: the place that brings data together, from Wikipedias first and elsewhere second, to be used as a repository of quality, open, and linked data.
Thanks,
GerardM
[1] https://ultimategerardm.blogspot.nl/2018/05/wikidata-copyright-and-linked-da...
On 11 May 2018 at 23:10, Rob Speer rob@luminoso.com wrote:
Wow, thanks for the heads up. When I was getting upset about projects that change the license on Wikimedia content and commercialize it, I had no idea that Wikidata was providing them the cover to do so. The Creative Commons violation is coming from inside the house!
On Tue, 8 May 2018 at 03:48 mathieu stumpf guntz < psychoslave@culture-libre.org> wrote:
Hello everybody,
There is a phabricator ticket on Solve legal uncertainty of Wikidata https://phabricator.wikimedia.org/T193728 that you might be interested to look at and participate in.
As Denny suggested in the ticket to give it more visibility through the discussion on the Wikidata chat <https://www.wikidata.org/wiki/Wikidata:Project_chat#Importing_datasets_under_incompatible_licenses>, I thought it was interesting to highlight it a bit more.
Cheers
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l New messages to: Wikimedia-l@lists.wikimedia.org Unsubscribe:
https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe
Hi Denny,
On 18.05.2018 02:54, Denny Vrandečić wrote:
Rob Speer wrote:
The result of this, by the way, is that commercial entities sell modified versions of Wikidata with impunity. It undermines the terms of other resources such as DBPedia, which also contains facts extracted from Wikipedia and respects its Share-Alike terms. Why would anyone use DBPedia and have to agree to share alike, when they can get similar data from Wikidata which promises them it's CC-0?
The comparison to DBpedia is interesting: the terms for DBpedia state "Attribution in this case means keep DBpedia URIs visible and active through at least one (preferably all) of @href, <link />, or "Link:". If live links are impossible (e.g., when printed on paper), a textual blurb-based attribution is acceptable." http://wiki.dbpedia.org/terms-imprint
So according to these terms, when someone displays data from DBpedia, it is entirely sufficient to attribute DBpedia.
What that means is that DBpedia follows exactly the same theory as Wikidata: it is OK to extract data from Wikipedia and republish it as your own dataset under your own copyright without requiring attribution to the original source of the extraction.
(A bit more problematic might be the fact that DBpedia also republishes whole paragraphs of Text under these terms, but that's another story)
My understanding is that all that Wikidata has extracted from Wikipedia is non-copyrightable in the first place and thus republishing it under a different license (or, as in the case of DBpedia for simple triples, with a different attribution) is legally sound.
In the SmartDataWeb project https://www.smartdataweb.de/ we hired lawyers to write a legal review of the extraction situation. Facts can be extracted and republished under CC-0 without a problem, as is the case for infoboxes. Copying a whole database is different, because database rights apply. If you extract only about two sentences, it falls under citation, which is also easy. If it is more than two sentences, then copyright applies.
I can check whether it is ready and shareable. The legal review (Gutachten) is quite a big thing, as it has some legal relevance and can be cited in court.
Hence we can switch to ODC-BY, with facts as CC-0 and the text as share-alike. However, the attribution mentioned in the imprint is still fine, since it applies to the database and not to the content/facts. I am still uncertain about the attribution. If you remix and publish, you need to cite the direct sources. But if somebody takes from you, do they attribute only to you, or to everybody you used, transitively?
Anyhow, we are sharpening the whole model towards technology, not data/content. So the databus will be a transparent layer, and it will be much easier to find sources such as Wikipedia and Wikidata and contribute there, which is actually one of the intentions of share-alike (getting work pushed back upstream).
All the best, Sebastian
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Thank you for your answer, Sebastian.
Publishing the Gutachten would be fantastic! That would be very helpful and deeply appreciated.
Regarding the relicensing, I agree with you. You can just go and do that, and given that you ask for attribution to DBpedia, and not to Wikipedia, I would claim that's what you're doing. And I think that's fine.
Regarding attribution, commonly it is assumed that you have to respect it transitively. That is one of the reasons a license that requires BY sucks so hard for data: unlike with text, the attribution requirements grow very quickly. It is the same as with modified images and collages: it is not sufficient to attribute the last author, but all contributors have to be attributed.
This is why I think that whoever wants to be part of a large federation of data on the web should publish under CC0.
That is very different from licensing texts or images. But for data anything else is just weird and will bite us in the long run more than we might ever benefit.
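To make the point about transitive attribution concrete, here is a minimal sketch of how the set of parties to credit grows once datasets are remixed. The dataset names and the derivation graph are purely hypothetical, and the sketch only models the common assumption described above (that BY has to be respected transitively); it is not a statement of what any particular license legally requires.

```python
from typing import Dict, List, Set

# Hypothetical derivation graph: each dataset maps to the sources it was built from.
DERIVED_FROM: Dict[str, List[str]] = {
    "my-mashup": ["dataset-a", "dataset-b"],
    "dataset-a": ["wikipedia-en", "wikipedia-de"],
    "dataset-b": ["dataset-a", "some-gazetteer"],
    "wikipedia-en": [],
    "wikipedia-de": [],
    "some-gazetteer": [],
}


def attribution_set(dataset: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Collect every upstream source reachable from `dataset`.

    Under the transitive reading of BY, all of these would need to be credited.
    """
    seen: Set[str] = set()
    stack = list(graph.get(dataset, []))
    while stack:
        source = stack.pop()
        if source in seen:
            continue  # already credited; also guards against cycles
        seen.add(source)
        stack.extend(graph.get(source, []))
    return seen


direct = set(DERIVED_FROM["my-mashup"])
transitive = attribution_set("my-mashup", DERIVED_FROM)
print("direct sources:", sorted(direct))
print("transitive sources:", sorted(transitive))
```

Even in this tiny graph the mashup has two direct sources but five sources to credit once the chain is followed; with CC0 inputs the required set is empty.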
So, just to say it again: if the Gutachten you mentioned could be made available, that would be very very awesome!
Thank you, Denny
On Thu, May 17, 2018, 23:06 Sebastian Hellmann < hellmann@informatik.uni-leipzig.de> wrote:
--
All the best,
Sebastian Hellmann
Director of Knowledge Integration and Linked Data Technologies (KILT) Competence Center at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
Hi,
On 19/05/2018 at 03:35, Denny Vrandečić wrote:
Regarding attribution, commonly it is assumed that you have to respect it transitively. That is one of the reasons a license that requires BY sucks so hard for data: unlike with text, the attribution requirements grow very quickly. It is the same as with modified images and collages: it is not sufficient to attribute the last author, but all contributors have to be attributed.
If we want our data to be trustworthy, then we need traceability, that is, reporting this chain of sources as extensively as possible, whether or not the license requires it as attribution. CC-0 allows this traceability to be broken, which makes it an awful license for anyone concerned with obtaining reliable data.
This is why I think that whoever wants to be part of a large federation of data on the web, should publish under CC0.
As long as one aims at building a federation of untrustworthy data banks, that's perfect. ;)
Hi Mathieu,
On 04-07-18 11:07, mathieu stumpf guntz wrote:
Hi,
On 19/05/2018 at 03:35, Denny Vrandečić wrote:
Regarding attribution, commonly it is assumed that you have to respect it transitively. That is one of the reasons a license that requires BY sucks so hard for data: unlike with text, the attribution requirements grow very quickly. It is the same as with modified images and collages: it is not sufficient to attribute the last author, but all contributors have to be attributed.
If we want our data to be trustworthy, then we need traceability, that is, reporting this chain of sources as extensively as possible, whether or not the license requires it as attribution. CC-0 allows this traceability to be broken, which makes it an awful license for anyone concerned with obtaining reliable data.
A license is not the way to achieve this. We have references for that.
This is why I think that whoever wants to be part of a large federation of data on the web, should publish under CC0.
As long as one aims at building a federation of untrustworthy data banks, that's perfect. ;)
So I see you started forum shopping (trying to get the Wikimedia-l people in) and making contentious, trying-to-be-funny remarks. That's usually a good indication a thread is going nowhere.
No, Wikidata is not going to change the CC0. You seem to be the only person wanting that, and trying to discredit Wikidata will not help you in your crusade. I suggest that people who are still interested in this go to https://phabricator.wikimedia.org/T193728 and make useful comments over there.
Maarten
I agree with Maarten, and to add to that: it is a huge misconception that CC0 makes data unreliable. It is only a legal statement about copyright, nothing more, nothing less. Statements without proper references and qualifiers make data unreliable, but Wikidata has a decent mechanism to capture that needed provenance.
Hi Andra,
I agree that it is a misconception that a copyright license directly changes data reliability. But an attribution requirement does have an indirect impact on it, as it legally enforces traceability. That is why I strongly disagree with the following assertion: "a license that requires BY sucks so hard for data [because] attribution requirements grow very quickly". To my mind, this is equivalent to saying that we will throw away traceability because it is subjectively judged too large a burden, without providing any evidence that it indeed can't be managed, at least with Wikimedia's current resources.
Now, I'm not saying traceability is the sole factor one should take into account for data reliability, but it is certainly one of them. Maybe we should first come up with clear criteria to put into an equation that would let us calculate the reliability of information. Since it's among the core goals of the Wikimedia strategy, it would certainly be worth the effort to establish clear metrics for the reliability of the information the movement is spreading.
Cheers
Hi!
I agree that it is a misconception that a copyright license directly changes data reliability. But an attribution requirement does have an indirect impact on it, as it legally enforces traceability.
While true, I don't think it's of much practical use if traceability is what you are seriously interested in. Imagine Wikidata were CC-BY, so each piece of data you use from Wikidata now has to be marked as "coming from Wikidata.Org". What have you gained? Wikidata is huge, and this mark doesn't even tell you which item the data is from, while being completely satisfactory legally. It is even more useless for actually ensuring the data is correct or tracing its provenance to primary sources: you'd still have to find the item and check the references manually (or automatically, maybe), just as you can under CC0. A CC-BY license would not have added very much on the Wikidata side.
Meanwhile, even with CC0, nothing prevents you from importing Wikidata data in such a way that each piece of data still carries the mark "coming from Wikidata". It is not a legal requirement under CC0, but nothing in CC0 prevents it. If that matches your provenance needs, you are free to do it, and the legal requirements of CC-BY would not improve it for you in any way; they would just force people who *do not* need it to do it anyway.
That is why I strongly disagree with the following assertion: "a license that requires BY sucks so hard for data [because] attribution requirements grow very quickly". To my mind, this is equivalent to saying that
I think this assertion (that attribution requirements grow) is factually true. Each piece of data from a CC-BY data set needs to carry attribution. If your needs require combining several data sets, each of them needs to carry attribution. This attribution has to be carried through all data processing pipelines. You may be OK with this growth, but as I just explained above, these requirements, while onerous for people who don't need to trace each piece of data, are still unsatisfactory in many cases for those who do. So having CC-BY would be both onerous and useless.
we will throw away traceability because it is subjectively judged too large a burden, without providing any evidence that it indeed can't be managed, at least with Wikimedia's current resources.
It's not Wikimedia that will be shouldering the burden, it's every user of Wikimedia data sets.
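The growth described above can be illustrated with a minimal Python sketch. It is not taken from any message in this thread: the `Fact` record, the `combine` helper, the dataset names and the values are all hypothetical, and the sketch assumes each derived fact must credit every source of its inputs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fact:
    """Hypothetical record: one statement plus the sources it must credit."""
    subject: str
    prop: str
    value: str
    credits: frozenset = field(default_factory=frozenset)


def combine(subject, prop, value, *inputs):
    """Hypothetical helper: a derived fact inherits the credits of all inputs."""
    merged = frozenset().union(*(f.credits for f in inputs))
    return Fact(subject, prop, value, merged)


# Made-up facts from three made-up attribution-requiring datasets.
population = Fact("item-1", "population", "1000", frozenset({"Dataset A"}))
area = Fact("item-1", "area_km2", "50", frozenset({"Dataset B"}))
region = Fact("item-1", "region", "north", frozenset({"Dataset C"}))

# Each pipeline step widens the set of credits that must travel with the result.
density = combine("item-1", "density", "20", population, area)
summary = combine("item-1", "summary", "north, 20/km2", density, region)

print(sorted(density.credits))  # ['Dataset A', 'Dataset B']
print(sorted(summary.credits))  # ['Dataset A', 'Dataset B', 'Dataset C']
```

In this sketch the credit set never shrinks: every published result built on `summary` would have to carry all three names, whether or not the downstream user needs that provenance.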
On 07/07/2018 at 19:55, Stas Malyshev wrote:
I think this assertion (that attribution requirements grow) is factually true. Each piece of data from a CC-BY data set needs to carry attribution. If your needs require combining several data sets, each of them needs to carry attribution. This attribution has to be carried through all data processing pipelines. You may be OK with this growth, but as I just explained above, these requirements, while onerous for people who don't need to trace each piece of data, are still unsatisfactory in many cases for those who do. So having CC-BY would be both onerous and useless.
Hi Stas,
The attribution needs to be carried only through processing pipelines whose results are published.
Can we talk about real, concrete examples where attribution would seriously prevent an actual use case? If all this stands on solid facts, surely it shouldn't be too hard to come up with at least one example. Otherwise, it is certainly useless to continue this discussion.
Cheers
Hoi, Balderdash and Wikipedia think. When you think Wikipedia has quality, and it has, it does not have absolute quality. I have added a lot of information from Wikipedia to Wikidata, and there is a lot that is plain wrong from a data perspective: there are errors, and there is a lot that is just missing. This is particularly true when the subject is not really what people are interested in. Things like the Polk award, the subdistricts of Botswana; the list is long. I add much of the information by hand and fill in missing parts; the main use of the missing data is in the relations.
As I have said so often, quality of data lies in having the same data in multiple sources. It follows that the data that can safely be added to Wikidata is the data where multiple sources agree on the represented facts. This is most easily done by bots, and indeed the algorithms are defined in their code. When new data is included based on a multitude of sources, what is the source? Particularly when data is inconsistent because multiple sources cannot agree on specific data, sources become relevant, but this is also where you get into real research.
Arguably, when data sources differ, you easily get into disputed facts and fake facts. This is where sourcing the facts becomes relevant. It is also where you get into real research and where, as a consequence, the license of the information becomes irrelevant.
In my opinion, we have grown up thinking in terms of serial sourcing, and particularly when you apply this approach to data stores like Wikidata, your algorithms and thinking fail reality.
Thanks, GerardM
On Sat, Jul 7, 2018 at 5:59 PM mathieu lovato stumpf guntz <psychoslave@culture-libre.org> wrote:
I agree that it is a misconception that a copyright license directly changes data reliability. But an attribution requirement does have an indirect impact on it, as it legally enforces traceability.
I know that "law" has a special corner, but therefore not always the best... law, in the end, is just a social construct, just like anything we agree on. First, we all agree (it seems to me) that provenance is valuable.
However, having something in law (or contract) effectively criminalizes failing to add the provenance. Is that what you really wish? Do you want to be able to legally punish people if they fail to give provenance? Honestly, that sounds a bit harsh to me... and, as a personal opinion and not an argument, I think Wikidata is more open and more inclusive than that: Wikidata offers carrots, not sticks.
Egon
Hi Andra,
On 04/07/2018 at 13:00, Andra Waagmeester wrote:
No, Wikidata is not going to change the CC0. You seem to be the only person wanting that, and trying to discredit Wikidata will not help you in your crusade. I suggest that people who are still interested in this go to https://phabricator.wikimedia.org/T193728 and make useful comments over there.
It seems all these assertions follow from some erroneous assumptions. This ticket is not about changing the Wikidata license. It aims at making sure what can and cannot be legally imported into a database using CC0, and in which jurisdictions it can be safely used in downstream projects.
It would certainly be interesting if Wikimedia infrastructure allowed hosting projects that use Wikibase with other topic/license scopes and that are queryable from within other Wikimedia projects. Surely it would be a good match for the "become the essential infrastructure of the ecosystem of free knowledge" goal. But that's another story, and I haven't found time to work on that topic so far.
It would also be great if we could avoid imputing the title of "crusader dedicated to discrediting Wikidata" to someone who, no later than this afternoon, helped a new contributor make their first edit on this project.
Cheers.
Hi,
2018-07-04 12:50 GMT+02:00 Maarten Dammers maarten@mdammers.nl:
Hi Mathieu,
So I see you started forum shopping (trying to get the Wikimedia-l people in) and making contentious, trying-to-be-funny remarks. That's usually a good indication a thread is going nowhere.
No, Wikidata is not going to change the CC0. You seem to be the only person wanting that, and trying to discredit Wikidata will not help you in your crusade. I suggest that people who are still interested in this go to https://phabricator.wikimedia.org/T193728 and make useful comments over there.
I concur totally with this analysis.
Regards,
Yann Forget