BabelNet (http://babelnet.org) is a multilingual knowledge resource that defines words and phrases in many languages. I've noticed that it copies large amounts of content from Wikimedia projects, including Wikipedia, Wiktionary, and Wikiquote, while violating Wikimedia's CC-By-SA license by placing the content under an incompatible CC-By-NC-SA license.
As one example, I can search BabelNet for "Timsort", a Wikipedia article whose first sentence is one I wrote: http://live.babelnet.org/synset?word=Timsort&lang=EN&details=1&o...
The sentence I wrote appears at the top of the page (with credit to Wikipedia). The rest of the page is also content remixed from Wikipedia, including a gallery of images that are presented without credit. A scrolling box in the footer of the page says the content is under the CC-By-NC-SA 3.0 license. Other pages, such as http://babelnet.org/synset?word=bn:00852566n, combine data from multiple resources.
The BabelNet creators are aware of the CC-By-SA licenses of the resources they use (see http://babelnet.org/licenses/). In addition to the non-commercial license they offer, their company, Babelscape (http://babelscape.com/), sells commercial licenses to BabelNet.
I reached out to Roberto Navigli, who runs BabelNet and Babelscape, over e-mail on March 23. I asked if the non-commercial license clause was simply a mistake. In his reply, Navigli stated that BabelNet is not a derived work, but is a CC-By-NC-SA-licensed collection made of several different works. I responded that BabelNet doesn't meet the Creative Commons definition of a "Collective Work", which would be necessary for it to not be a derived work. Navigli responded:
"actually it is a collection of derivative work of several resources with heretogeneous licenses, each of which clearly separated with separate licenses and bundles. By transitivity derivative work is work with a certain license, so it is work. Therefore, it is a collection of works with different licenses and it can keep a separate license."
I believe this is nonsense on multiple levels. BabelNet is a derived work, and if someone could disregard their obligation to share-alike their derived work simply because they derived it from multiple resources, there would be no point to putting ShareAlike clauses on data resources at all.
As a Wikipedia contributor (and a lapsed admin), I am sad to see BabelNet appropriating the hard work of Wikimedians and others, placing a more restrictive license on it, and selling it. This is also relevant for me because I run ConceptNet (http://www.conceptnet.io/), a similar knowledge resource, and I have made sure to follow Creative Commons license requirements and to release all its data as CC-By-SA.
In a way I see BabelNet as a competitor, but ConceptNet is an open data project and this space shouldn't have "competitors". If the Creative Commons license were being used appropriately, then all of us working with this kind of data would be collaborators in the world of Linked Open Data. My preferred outcome would be for BabelNet to change the copyright notices and Creative Commons links on their site to remove the "non-commercial" requirement, so that anyone could download and use their data under the CC-By-SA license that it should be under.
I'm sure Wikimedia has dealt with situations similar to this one. What would be the most effective next step to ensure that BabelNet follows the CC-By-SA license?
-- Rob Speer