I was pinged on a Unicode repository
asking for a WMF perspective on license compatibility. I gave my
personal answer but I'm notifying the list in case someone wants to
answer on behalf of WMF as a Unicode member/user. (Also cc'ing Hugo,
Stephen and Richard as I mentioned them.)
As my answer turned out to be rather long I'll copy it here for the
@srl295 Thanks for the ping. I wasn't aware of this issue but I'll give
a quick reply. I've only read the discussion above and the README. I
can't speak for WMF, let alone Unicode (I don't remember whether WMF is
even a member now), but I can describe the usage of Unicode components
in MediaWiki software and Wikimedia wikis.
The issue description highlights some confusion about the licensing of
this project. Meanwhile the LICENSE has been updated to the Unicode
license v3, which was approved by OSI on 2023-11-17:
https://opensource.org/license/unicode-license-v3/ . So there's no doubt
this repository is open source. Maybe this can be explicitly mentioned in
the README, as not everyone is able to recognize the license text as
the OSI-approved Unicode v3 license.
MediaWiki can and does use software under the Unicode license all the time,
for example in the [CLDR
extension](https://www.mediawiki.org/wiki/Extension:CLDR), which is
primarily GPLv2, under the understanding that the CLDR data inside was
under a BSD-like license. (Apertium linguistic data is also
under GPL.) As long as Unilex can be used in GPL software, there are
probably ways it can benefit all Wikimedia wikis through MediaWiki.
However @hugolpz seems most concerned about usage in the _content_ of
Wikidata and other Wikimedia wikis. From the README it sounds like this
repository mostly wants to collect uncopyrightable factual information.
In the EU, there might still be problems with database rights. A general
opinion from the WMF on how to handle these is at
https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights . In short,
it's complicated, and it's easier to incorporate a dataset into Wikidata
when it's already under CC-0. If there's some doubt on whether the data/
directory here as a whole is a dataset
If you want to cooperate with Wikidata lexemes in the future, it's worth
considering how to make it easier. As for LinguaLibre, as far as I
understand, it helps produce recordings which might be considered
copyrightable, and it wants its outputs to be available under CC BY-SA,
so it benefits from its sources being as permissive as possible.
Finally, I see that [many
carry a `SPDX-License-Identifier: Unicode-DFS-2016` header, which makes
it easier to follow the [Reuse](https://reuse.software/) guidelines.
Note Richard Fontana's suggestion for trivial files at
(and my personal opinion below it).
So in conclusion my personal suggestions are:
* mark the repository even more clearly as being under the OSI-approved
Unicode v3 license;
* keep marking the copyright status of individual files, and consider even
more permissive licenses like MIT-0 (or 0BSD or CC-0) when adding
* keep in mind possible copyright needs for Wikidata and Wikimedia
Commons in the future, and ask WMF legal (legal(a)wikimedia.org) for
help with any clarifications needed on CC-0 and CC BY-SA
compatibility (fyi @slaporte).