On Sat, Mar 18, 2023 at 3:49 PM Erik Moeller eloquence@gmail.com wrote:
On Fri, Mar 17, 2023 at 7:05 PM Steven Walling steven.walling@gmail.com wrote:
IANAL of course, but to me this implies that responsibility for the *egregious* lack of attribution in models that rely substantially on Wikipedia is violating the Attribution requirements of CC licenses.
Morally, I agree that companies like OpenAI would do well to recognize and nurture the sources they rely upon in training their models. Especially as the web becomes polluted with low quality AI-generated content, it would seem in everybody's best interest to sustain the communities and services that make and keep high quality information available. Not just Wikimedia, but also the Internet Archive, open access journals and preprint servers, etc.
Legally, it seems a lot murkier. OpenAI in particular does not distribute any of its GPT models. You can feed them prompts by various means, and get responses back. Do those responses plagiarize Wikipedia?
With image-generating models like Stable Diffusion, it's been found that the models sometimes generate output nearly indistinguishable from source material [1]. I don't know if similar studies have been undertaken for text-generating models yet. You can certainly ask GPT-4 to generate something that looks like a Wikipedia article -- here are example results for generating a random Wikipedia article:
Article: https://en.wikipedia.org/wiki/The_Talented_Mr._Ripley_(film)
GPT-4 run 1: https://en.wikipedia.org/wiki/User:Eloquence/GPT4_Example/1 (cut off at the ChatGPT generation limit)
GPT-4 run 2: https://en.wikipedia.org/wiki/User:Eloquence/GPT4_Example/2
GPT-4 run 3: https://en.wikipedia.org/wiki/User:Eloquence/GPT4_Example/3
It imitates the form of a Wikipedia article & mixes up / makes up assertions, but I don't know that any of its generations would meet the standard of infringing on the Wikipedia article's copyright. IANAL either, and as you say, the legal landscape is evolving rapidly.
Warmly, Erik
The whole thing is definitely a hot mess. If the remixing/transformation by the model is a derivative work, it means OpenAI is potentially violating the ShareAlike requirement by not distributing the text output under a CC license. But on the other hand, the nature of the model means they’re combining CC and non-free works freely / at random, unless a court would interpret whatever % of training data comes from us as the direct degree to which the model output is derived from Wikipedia. Either way, it’s going to be up to some legal representation of copyright holders to test the boundaries here.
[1] https://arstechnica.com/information-technology/2023/02/researchers-extract-t...