Hi!
Sorry, I'm quite sure I'm re-opening an issue already discussed but I can't find where; if so, please share the link.
I'm working with cultural item data on Wikidata and I'm wondering what I'm allowed to do when, for instance:
- I want to improve the item Q618719 https://www.wikidata.org/wiki/Q618719
- I find information in the Google Books API https://www.googleapis.com/books/v1/volumes/?q=Asterix%20le%20gaulois about this item that could help me fill in the ISBN properties in Wikidata (P212 and P957)
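For illustration only, here is a minimal sketch of what pulling ISBN candidates out of a Books API response could look like. It assumes the response keeps the `items[].volumeInfo.industryIdentifiers` layout with `ISBN_10` / `ISBN_13` types; the sample payload, the title, and the helper name `isbn_candidates` are made up for the example, and nothing is fetched or written anywhere. Whether running something like this at scale is permitted is exactly the open question of this thread.

```python
import json

# Hypothetical, trimmed-down payload in the assumed shape of a Books API response.
SAMPLE_RESPONSE = json.loads("""
{
  "items": [
    {
      "volumeInfo": {
        "title": "Example title",
        "industryIdentifiers": [
          {"type": "ISBN_13", "identifier": "9780306406157"},
          {"type": "ISBN_10", "identifier": "0306406152"}
        ]
      }
    }
  ]
}
""")

# Wikidata properties named in the message: P212 (ISBN-13) and P957 (ISBN-10).
PROPERTY_FOR_TYPE = {"ISBN_13": "P212", "ISBN_10": "P957"}


def isbn_candidates(response):
    """Return (property, isbn, title) tuples for a human to review."""
    candidates = []
    for item in response.get("items", []):
        info = item.get("volumeInfo", {})
        for ident in info.get("industryIdentifiers", []):
            prop = PROPERTY_FOR_TYPE.get(ident.get("type"))
            if prop is not None:
                candidates.append((prop, ident["identifier"], info.get("title")))
    return candidates


for prop, isbn, title in isbn_candidates(SAMPLE_RESPONSE):
    print(prop, isbn, title)
```

The output is deliberately a list of candidates rather than edits, so that a person (or a game-style interface) still confirms each value before anything reaches Wikidata.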
Google Books API's Terms of Service https://developers.google.com/books/terms are elusive regarding the licence, but it certainly isn't CC0. Meanwhile, I guess I'm allowed to copy the information by hand, right? It's just facts about the item; I'm using this API just like I would have used a newspaper or anything else as a reference, right? But could I automate or semi-automate (à la Wikidata Game) the import process without infringing either Google's or Wikidata's policies? I would just be doing the exact same thing - taking facts from somewhere in the world and adding them to Wikidata - but more efficiently, no?
My vision is quite blurred on this, thanks in advance for clarification!
Best,
Max
Maxime Lathuilière, 08/09/2014 16:41:
Google Books API's Terms of Service https://developers.google.com/books/terms are elusive regarding the licence, but it certainly isn't CC0. Meanwhile, I guess I'm allowed to copy the information by hand, right? It's just facts about the item; I'm using this API just like I would have used a newspaper or anything else as a reference, right? But could I automate or semi-automate (à la Wikidata Game) the import process without infringing either Google's or Wikidata's policies? I would just be doing the exact same thing - taking facts from somewhere in the world and adding them to Wikidata - but more efficiently, no?
Most of that is probably non-copyrightable, and Wikidata is hosted in the USA, which has no database rights, so it can be legally ok, at least in some fashion. It's however hard to draw a line, so if there are alternatives it's better to use other data sources. See also the rather inconclusive https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights
Moreover, the Google ToS are very clear in forbidding any activity which would result in you having a copy of their database/of the data the API provides access to, IIRC even in the form of a cache. However, you only want ISBNs? *If* you care about respecting your contract with Google, it may be wise to directly ask them if they're ok with it.
Nemo
I am not a lawyer, but if I remember correctly, copyright covers expression, not content. Since the Wikidata data model and its representation in JSON is rather unique, an ISBN number in a Wikidata statement seems to be novel to Wikidata.
Would rewriting a sentence from a book and then entering that sentence in Wikipedia violate copyright?
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
On Sat, Sep 13, 2014 at 7:23 PM, Denny Vrandečić vrandecic@gmail.com wrote:
I am not a lawyer, but if I remember correctly, copyright covers expression, not content. Since the Wikidata data model and its representation in JSON is rather unique, an ISBN number in a Wikidata statement seems to be novel to Wikidata.
Would rewriting a sentence from a book and then entering that sentence in Wikipedia violate copyright?
We have some documentation on that. :)
https://en.wikipedia.org/wiki/Wikipedia:Close_paraphrasing
-Jeremy
Thanks! That's a great page, and I think it is also helpful with the original question.
Jeremy Baron, 13/09/2014 21:25:
Would rewriting a sentence from a book and then entering that sentence in Wikipedia violate copyright?
We have some documentation on that. :)
I fail to see how this is relevant. We're talking of database rights here, not copyright; and a private contract, Google's TOS, which is clearly designed to give Google broader protection than the law already affords (otherwise they wouldn't even bother writing a contract, would they?).
Nemo
On Sat, Sep 13, 2014 at 8:00 PM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Jeremy Baron, 13/09/2014 21:25:
Would rewriting a sentence from a book and then entering that sentence in Wikipedia violate copyright?
We have some documentation on that. :)
I fail to see how this is relevant. We're talking of database rights here, not copyright; and a private contract, Google's TOS, which is clearly designed to give Google broader protection than the law already affords (otherwise they wouldn't even bother writing a contract, would they?).
I think it was relevant to Denny's question. But maybe you're questioning relevance to the original question from top of thread.
-Jeremy
On 13.09.2014 21:25, Jeremy Baron wrote:
On Sat, Sep 13, 2014 at 7:23 PM, Denny Vrandečić vrandecic@gmail.com wrote:
I am not a lawyer, but if I remember correctly, copyright covers expression, not content. Since the Wikidata data model and its representation in JSON is rather unique, an ISBN number in a Wikidata statement seems to be novel to Wikidata.
Would rewriting a sentence from a book and then entering that sentence in Wikipedia violate copyright?
We have some documentation on that. :)
Regarding data import, a very useful resource is:
https://wiki.creativecommons.org/Data#Frequently_asked_questions_about_data....
starting with the first question (Which components of databases are protected by copyright?), where it says:
""" The data or other contents contained in the database are subject to copyright if they are sufficiently creative. Original poems contained in a database would be protected by copyright, but purely factual data (such as gene names or city populations) would not. Facts are not subject to copyright, nor are the ideas underlying copyrighted content. """
ISBNs are purely factual, non-creative data. Clearly, we do not copy the schema of the other database when importing such a fact -- we have our own schema. However:
""" In contrast to copyright, sui generis database rights are designed to protect a maker's substantial investment in a database. In particular, the right prevents the unauthorized extraction and reuse of a substantial portion of the contents. """
So you may not be allowed to copy "substantial portions" of a database, even if purely factual (there is another question in the FAQ on what this might mean). Note that only some jurisdictions have this concept in the first place.
Besides these general legal rights, of course many sites also have explicit terms of service that you must agree to explicitly before you can get access to their services. In general, this is like a contract between you and the service provider, and the default assumption would be that you are bound by whatever it says. So I suppose that such terms of service could add restrictions on top of applicable copyright (etc.) laws. Of course, it might be the case that some of these restrictions are void under applicable law, but that needs to be looked at on a case-by-case basis.
Markus
Regarding purely factual data comprising a less than significant portion of a database - which is certainly true for all ISBNs in Google's database, where records are considerably larger than a single ISBN - there should be no legal problem copying them under German law, regardless of Google's contract, imho.
But even if there were one, it should be very easy to make software look up each ISBN elsewhere, too, and reconfirm that it is both formally correct and matches the book in question. There are plenty of places that offer these kinds of lookups, such as booksellers, library catalogs, the ISBN database, and so on.
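The "formally correct" part can be checked without consulting any database at all, since both ISBN formats carry a check digit. A minimal sketch, with function names chosen only for this example:

```python
DIGITS = "0123456789"


def clean(isbn):
    """Drop hyphens and spaces, and normalise a trailing 'x' to 'X'."""
    return isbn.replace("-", "").replace(" ", "").upper()


def is_valid_isbn10(isbn):
    """ISBN-10: weights 10..1, sum must be divisible by 11 ('X' counts as 10)."""
    s = clean(isbn)
    if len(s) != 10:
        return False
    if any(c not in DIGITS for c in s[:9]) or s[9] not in DIGITS + "X":
        return False
    values = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0


def is_valid_isbn13(isbn):
    """ISBN-13: alternating weights 1 and 3, sum must be divisible by 10."""
    s = clean(isbn)
    if len(s) != 13 or any(c not in DIGITS for c in s):
        return False
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(s))
    return total % 10 == 0


print(is_valid_isbn13("978-0-306-40615-7"))
print(is_valid_isbn10("0-306-40615-2"))
```

This only catches typing and transcription errors; it says nothing about whether the number belongs to the book in question, which still needs a lookup somewhere.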
Once you have several sources for an ISBN, no one can tell any more which of them it was copied from. So why rely on Google alone?
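As a rough sketch of that idea, one could keep only the ISBNs that at least two sources report for the same book. The source names and numbers below are placeholders, not real lookups, and agreement between sources does not by itself show that they are independent of each other.

```python
def confirmed_isbns(isbns_by_source, min_sources=2):
    """Map each ISBN seen in at least `min_sources` sources to those sources."""
    seen_in = {}
    for source, isbns in isbns_by_source.items():
        for isbn in set(isbns):
            seen_in.setdefault(isbn, set()).add(source)
    return {
        isbn: sorted(sources)
        for isbn, sources in seen_in.items()
        if len(sources) >= min_sources
    }


# Hypothetical lookup results for one book, keyed by made-up source names.
SOURCES = {
    "library_catalogue": {"9780306406157"},
    "bookseller": {"9780306406157", "9780000000002"},
    "publisher_site": {"9780000000019"},
}

print(confirmed_isbns(SOURCES))
```

The returned source lists could double as the references attached to the statement.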
Btw, if a statement about an ISBN is sourced, among others, with "Source: Google", that does not imply having it from Google. It only states the fact: "Google has it, too."
Purodha
Maxime and Wikidatans,
Just as Wikipedia / MediaWiki / Wikidata continue to generate new, open, Creative Commons-licensed resources legally and across languages, I wonder whether searching Creative Commons first - http://search.creativecommons.org/ - for databases external to MediaWiki / Wikidata would be a sensible first step in gradually defining a "Wikidata-l policy toward using non-CC0 licensed external databases as reference"?
Scott
On Sat, Sep 13, 2014 at 11:59 AM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
See also the rather inconclusive https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights
Talk page for that is a good place to ask questions on this topic if you want LCA opinions (I have an intern working on one question as a result of it right now). And folks should definitely read it to get a sense of the relevant legal background.
As a more general note/context, everyone working in this area should be aware of a few key points:
- your intuitions about copyright may/may not apply, so be careful :/
- copyright law has had 100 years, multiple international treaties, and thousands of cases to be refined; database law has 15 years, no international treaties, and a few dozen cases. So clear yes/no answers are going to be hard to come by :/
FYI- Luis
On Sat, Sep 13, 2014 at 2:59 PM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
https://developers.google.com/books/terms Moreover, the Google ToS are very clear in forbidding any activity which would result in you having a copy of their database/of the data the API provides access to, IIRC even in form of a cache. However, you only want ISBN? *If* you care about respecting your contract with Google, it may be wise to directly ask them if they're ok with it.
Might be wise in any case: https://en.wikipedia.org/wiki/United_States_v._Aaron_Swartz
Running an automated process to access someone's computer in a way that they clearly don't allow is not a good idea.
Running an automated process to access someone's computer in a way that they clearly don't allow is not a good idea.
European database laws are also quite clear on the fact that a copy of a protected database is a copy, whatever the copying process is. It does not matter whether an automated process took place or a crowd took facts one by one: once a significant portion of the data has been copied (whatever that means), there is a legal risk.
On Mon, Sep 22, 2014 at 5:52 AM, Thomas Douillard < thomas.douillard@gmail.com> wrote:
Running an automated process to access someone's computer in a way that they clearly don't allow is not a good idea.
European database laws are also quite clear on the fact that a copy of a protected database is a copy, whatever the copying process is. It does not matter whether an automated process took place or a crowd took facts one by one: once a significant portion of the data has been copied (whatever that means), there is a legal risk.
Yeah, it's not clear to me how you could ever produce an ISBN database with any reasonable level of assurance that it's legit under those rules. With copyright you find multiple sources and put things in your own words. You cite your sources, and anyone can check that your sources provide the *information* which you have restated in your own words. While it's always possible that you plagiarized one source while citing another, these things can be discovered, and when two sources use the exact same words, copying is evident. With sui generis database laws, there's no way to really know if your multiple sources all are based on the same common source, and there's no way to "put things in your own words" to ensure that it doesn't matter. Moreover, the fact that one source has the exact same true factual information as another doesn't prove that one copied from another.
Aren't any of these databases copied, at least indirectly, from the database created by the International ISBN Agency?
In any case, my point was that blatant violation of a TOS is more than just violation of "a private contract". It's at least potentially something much more serious. And on that, it matters quite a bit whether it's one person running an automated process or a group of people accessing the information bit by bit. The former got a brilliant young Wikimedian prosecuted. The latter hasn't.
--
By the way, I just read https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights and it's not clear to me exactly what the Wikimedia policy is with regard to sui generis database rights. The closest I see is "In the absence of a license, copying all or a substantial part of a protected database should be avoided." But what happens when thousands of people, working independently even, each copy an insubstantial portion of a protected database, and it adds up? I guess from a practical standpoint it'll be impossible to prove that this is what happened (unless there's a protected database which *created* the information in the first place, a la the one created by the International ISBN Agency). But then what we're really saying is not that you shouldn't violate European database law, but that you should do it in a way such that it can't be proven.
As I said above, I don't know how these laws can possibly be adhered to with any reasonable level of assurance. Maybe you or someone else on here has a suggestion?
Maybe you or someone else on here has a suggestion?
Sorry, although I've been aware of these laws for a long time, I never understood how this could work. OpenStreetMap is a precedent though, and I'm not aware of any serious problem they have had related to copyright. But they seem to be very conservative and cautious: see for example this topic on their question site https://help.openstreetmap.org/questions/9171/sourcing-street-and-road-names...
"Even if it was legal to copy names etc from in copyright published maps (which may depend on the local legislation), we only want to use sources which we have either been explicitly allowed to use in OSM (and have a compatible licence) or have whatever the local equivalent of public domain status is."
One important thing to remember is that there's no "master database" of ISBNs. The central agency assigns very large blocks (eg 978-0-...), then national agencies assign smaller blocks to publishers (978-0-12-...), who then choose to assign them to books as and when they see fit.
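To make that block structure concrete, here is a small sketch that splits an already hyphenated ISBN-13 into its elements. The type and function names are invented for the example. It relies on the hyphens being present: locating the boundaries in an unhyphenated number requires the range tables published by the International ISBN Agency, which are not reproduced here.

```python
from typing import NamedTuple


class IsbnParts(NamedTuple):
    prefix: str       # e.g. 978
    group: str        # block assigned by the central agency
    registrant: str   # block assigned to a publisher by a national agency
    publication: str  # number the publisher gives to one edition
    check_digit: str


def split_hyphenated_isbn13(isbn):
    """Split a hyphenated ISBN-13 such as '978-0-306-40615-7' into its elements."""
    parts = isbn.strip().split("-")
    if len(parts) != 5:
        raise ValueError("expected five hyphen-separated elements: %r" % isbn)
    return IsbnParts(*parts)


parts = split_hyphenated_isbn13("978-0-306-40615-7")
print(parts.group, parts.registrant, parts.publication)
```

Everything up to and including the registrant element identifies the publisher's block; only the publication element is chosen by the publisher.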
The final assigners (publishers) keep records, which they usually make available to the book trade - this is the material that winds up in bookseller databases, and which you can see through Amazon. This may or may not have database-right issues, and comes originally from hundreds or thousands of individual publishers. Someone probably sells it in some way but I'm not particularly clear on this side of things.
Independently, after publication, libraries acquire copies of the books and create their own records (sometimes they copy each other's records, but usually from another library not from the book trade). These library databases have their own database-right issues - some make them available without restrictions, some claim copyright, some hand them over to third-party services who may themselves claim copyright, etc etc.
Again, there are a lot of library databases - there are attempts to produce a single aggregated database like WorldCat, but once you do that you definitely start getting into access rights issues. The only systematic attempt to make an unencumbered database is Open Library - but that is no doubt incomplete. https://openlibrary.org/help/faq/using#ownership
Andrew.
Anthony, 22/09/2014 16:52:
In any case, my point was that blatant violation of a TOS is more than just violation of "a private contract". It's at least potentially something much more serious. And on that, it matters quite a bit whether it's one person running an automated process or a group of people accessing the information bit by bit. The former got a brilliant young Wikimedian prosecuted. The latter hasn't.
CFAA is not law everywhere. In many countries, a private contract is a private contract.
Nemo