I was pinged on a Unicode repository (https://github.com/unicode-org/unilex/issues/10#issuecomment-1872496490) and asked for a WMF perspective on license compatibility. I gave my personal answer, but I'm notifying the list in case someone wants to answer on behalf of WMF as a Unicode member/user. (Also cc'ing Hugo, Stephen and Richard, as I mentioned them.)
As my answer turned out to be rather long I'll copy it here for the archives' benefit.
----
@srl295 Thanks for the ping. I wasn't aware of this issue but I'll give a quick reply. I've only read the discussion above and the README. I can't speak for WMF, let alone Unicode (I don't remember whether WMF is even a member now), but I can describe how Unicode components are used in MediaWiki software and Wikimedia wikis.
The issue description highlights some confusion about the licensing of this project. Meanwhile the LICENSE has been updated to the Unicode license v3, which was approved by OSI on 2023-11-17: https://opensource.org/license/unicode-license-v3/ . So there's no doubt this repository is open source. Maybe this can be explicitly mentioned in the README, as not everyone will recognize the license text as the OSI-approved Unicode v3 license.
MediaWiki can and does use software under the Unicode license all the time, for example in the [CLDR extension](https://www.mediawiki.org/wiki/Extension:CLDR), which is primarily GPLv2, on the understanding that the CLDR data inside is under a BSD-like license. (Apertium linguistic data is also [usually](https://wiki.apertium.org/wiki/Contributing_to_an_existing_pair#Consider_con...) under GPL.) As long as Unilex can be used in GPL software, there are probably ways it can benefit all Wikimedia wikis through MediaWiki.
However @hugolpz seems most concerned about usage in the _content_ of Wikidata and other Wikimedia wikis. From the README it sounds like this repository mostly wants to collect uncopyrightable factual information. In the EU, there might still be problems with database rights. A general opinion from the WMF on how to handle these is at https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights . In short, it's complicated, and it's easier to incorporate a dataset into Wikidata when it's already under CC-0. If there's some doubt on whether the data/ directory here as a whole is a dataset
If you want to cooperate with Wikidata lexemes in the future, it's worth considering how to make that easier. As for LinguaLibre, as far as I understand, it helps produce recordings which might be considered copyrightable, and it wants its outputs to be available under CC BY-SA, so it benefits from its sources being as permissive as possible.
Finally, I see that [many files](https://github.com/search?q=repo%3Aunicode-org%2Funilex+SPDX-License-Identif...) carry a `SPDX-License-Identifier: Unicode-DFS-2016` header, which makes it easier to follow the [Reuse](https://reuse.software/) guidelines. Note Richard Fontana's suggestion for trivial files at https://github.com/fsfe/reuse-docs/issues/62#issuecomment-1200305896 (and my personal opinion below it).
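To illustrate what such a header gives a reuser, here is a minimal Python sketch of how a script could read the identifier out of a file's text. It is only an illustration under my own assumptions, not part of the Reuse tooling: the function name `find_spdx_identifier` and the sample text are hypothetical, and the check is a plain regular expression rather than a full SPDX expression parser.

```python
import re

# Matches a header line such as "# SPDX-License-Identifier: Unicode-DFS-2016".
# Assumption: the identifier runs to the end of the line; trailing comment
# closers such as "*/" are not stripped by this simple pattern.
SPDX_RE = re.compile(r"SPDX-License-Identifier:[ \t]*(\S.*?)[ \t]*$", re.MULTILINE)


def find_spdx_identifier(text):
    """Return the first SPDX license expression found in text, or None."""
    match = SPDX_RE.search(text)
    if match is None:
        return None
    return match.group(1).strip()


# Hypothetical file contents, for illustration only.
sample = "# SPDX-License-Identifier: Unicode-DFS-2016\nword\tcount\n"
print(find_spdx_identifier(sample))
```

A file with no such line yields `None`, which is the case where a reuser has to fall back on the repository-level LICENSE.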
So in conclusion my personal suggestions are:
- mark the repository even more clearly as being under the OSI-approved Unicode v3 license;
- keep marking the individual files' copyright status, and consider even more permissive licenses like MIT-0 (or 0BSD or CC-0) when adding uncopyrightable files;
- keep in mind possible copyright needs for Wikidata and Wikimedia Commons in the future, and ask for help from WMF legal (legal@wikimedia.org) on any possible/needed clarifications for CC-0 and CC BY-SA compatibility (fyi @slaporte).
----
Cheers, Federico
Federico, Hugo,
Good to hear from you. I’m cc’ing Anne in here from the Unicode counsel side.
I can reply on a couple of technical points.
- As far as I know, WMF remains a member.
- There is a new SPDX identifier in progress for the v3 license, so that will be rolled out when available.
- Please take a look at the wording at the bottom of the README.md on https://github.com/unicode-org/icu4x which was written to address some of the concerns about the openness of the license. See if it is helpful; perhaps (to Anne) that is a good reason to roll that wording out to all repositories.
Regards, and happy 2024, Steven
(In reply to the message from Federico Leva (Nemo) <nemowiki@gmail.com> of Sat, Dec 30, 2023, 4:14 a.m.)
Hello,
I cannot tell if there is a question here for me that I need to answer. If there is, can someone tease it out and clarify it for me?
Also, I don't see WMF on our list of members https://home.unicode.org/membership/members/.
Anne
(In reply to the message from Steven R. Loomis <srl295@gmail.com> of Sat, Dec 30, 2023, 8:00 AM)
mediawiki-i18n@lists.wikimedia.org