Hi,

>> My understanding is that it is not proprietary, and the only reason it doesn't qualify for Open Source Initiative approval is because of these use restrictions:

> To generate or disseminate information or content, in any context (e.g. posts, articles, tweets, chatbots or other kinds of automated bots) without expressly and intelligibly disclaiming that the text is machine generated

This restriction makes BLOOM useless in most content-related use cases, because using the results requires adding too much extra disclaimer text.

As for FOSS-compatible LLMs: EleutherAI's GPT-J, GPT-NeoX, and Pythia, as well as Cerebras-GPT, are under Apache 2.0. The question is whether these models are good enough to be useful. However, the same question applies to BLOOM too.

Br,

-- Kimmo Virtanen, Zache

On Thu, Mar 30, 2023 at 3:34 AM Lauren Worden <laurenworden89@gmail.com> wrote:

On Wed, Mar 29, 2023 at 1:50 PM Jan Ainali <jan@aina.li> wrote:
>
> I think it is important to, as early as possible, deter all these attempts to weaken the concept of "open" and that we as a movement need to take a hard stance against them.
> These proprietary licenses do not fit the spirit of sharing all knowledge and letting anyone do whatever they want with it.

Is the BLOOM RAIL license [
https://huggingface.co/spaces/bigscience/license ] proprietary? My
understanding is that it is not proprietary, and the only reason it
doesn't qualify for Open Source Initiative approval is because of
these use restrictions:

"You agree not to use the Model or Derivatives of the Model:
(a) In any way that violates any applicable national, federal, state,
local or international law or regulation;
(b) For the purpose of exploiting, harming or attempting to exploit or
harm minors in any way;
(c) To generate or disseminate verifiably false information with the
purpose of harming others;
(d) To generate or disseminate personal identifiable information that
can be used to harm an individual;
(e) To generate or disseminate information or content, in any context
(e.g. posts, articles, tweets, chatbots or other kinds of automated
bots) without expressly and intelligibly disclaiming that the text is
machine generated;
(f) To defame, disparage or otherwise harass others;
(g) To impersonate or attempt to impersonate others;
(h) For fully automated decision making that adversely impacts an
individual’s legal rights or otherwise creates or modifies a binding,
enforceable obligation;
(i) For any use intended to or which has the effect of discriminating
against or harming individuals or groups based on online or offline
social behavior or known or predicted personal or personality
characteristics;
(j) To exploit any of the vulnerabilities of a specific group of
persons based on their age, social, physical or mental
characteristics, in order to materially distort the behavior of a
person pertaining to that group in a manner that causes or is likely
to cause that person or another person physical or psychological harm;
(k) For any use intended to or which has the effect of discriminating
against individuals or groups based on legally protected
characteristics or categories;
(l) To provide medical advice and medical results interpretation;
(m) To generate or disseminate information for the purpose to be used
for administration of justice, law enforcement, immigration or asylum
processes, such as predicting an individual will commit fraud/crime
commitment (e.g. by text profiling, drawing causal relationships
between assertions made in documents, indiscriminate and
arbitrarily-targeted use)."

Those restrictions seem very reasonable to me, and I would consider
them an advantage given the problems the field is experiencing,
including the threats to project content integrity. I don't see any
drawbacks, and I see several advantages to encouraging such
restrictions.

So I expect the BLOOM license would therefore qualify for an exception
as described in
https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use

There is further discussion of these issues at
https://arxiv.org/pdf/2011.03116.pdf

-LW
Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/L6DTD5QQWJPZVXDMT4L5NVFWCZKPLXJD/