On Thu, Mar 30, 2023 at 4:28 AM Erik Moeller <eloquence@gmail.com> wrote:
> If you want to impose _additional restrictions_ on a person for stuff they download from you, that actually requires proactive agreement from the user to those restrictions at the time they download the thing.
> If you don't obtain this agreement, you cannot meaningfully enforce the "license" because the downloader never agreed to it in the first place. Moreover, you'll have to make sure that _everyone else making copies of the file_ also obtains agreement from people getting those copies, or your whole house of cards falls down.
Isn't that exactly how we impose the attribution and share-alike requirements on CC-BY-SA content?
On Thu, Mar 30, 2023 at 4:25 AM Kimmo Virtanen <kimmo.virtanen@wikimedia.fi> wrote:
> "To generate or disseminate information or content, in any context (e.g. posts, articles, tweets, chatbots or other kinds of automated bots) without expressly and intelligibly disclaiming that the text is machine generated"
> This makes it useless in most content-related use cases, as it requires too much extra text to use the results.
I guess that the General Disclaimer could serve to fulfill that requirement.
> About FOSS-compatible LLMs: EleutherAI's GPT-J, NeoX, and Pythia, as well as Cerebras-GPT, are under Apache 2.0. The question is whether these models are good enough to be useful, but the same question applies to BLOOM too.
I have no particular affinity for BLOOM, but I have been able to personally verify that it can handle at least a dozen of the use cases that people have shown GPT-3 and ChatGPT serving on enwiki. I promote it for the strictly utilitarian purpose of providing infrastructure to work on the problems that pose the greatest risk to project content if left unaddressed.
I would prefer a more widely multilingual model trained on all of the Foundation content suitable for that purpose, but training such a model is a much more expensive proposition than merely using one.
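
For anyone who wants to judge the capability question for themselves, here is a minimal smoke-test sketch using the Hugging Face transformers library (my assumption; the model IDs are the publicly published Pythia and BLOOM checkpoints, and the prompt is just a placeholder):

    # Quick capability check: run the same prompt through an
    # Apache-2.0 model (Pythia) and a small BLOOM checkpoint.
    # Assumes: pip install transformers torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    PROMPT = "Summarize in one sentence: The quick brown fox jumps over the lazy dog."

    for model_id in ("EleutherAI/pythia-1.4b", "bigscience/bloom-560m"):
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(model_id)
        inputs = tokenizer(PROMPT, return_tensors="pt")
        output = model.generate(**inputs, max_new_tokens=60, do_sample=False)
        print(model_id, "->", tokenizer.decode(output[0], skip_special_tokens=True))

Swapping in the larger checkpoints is just a matter of changing the model IDs, memory permitting.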
-LW