On Sun, Mar 19, 2023 at 02:48:12AM -0700, Lauren Worden wrote:
> They have, and LLMs absolutely do encode a verbatim copy of their
> training data, which can be produced intact with little effort.
> https://arxiv.org/pdf/2205.10770.pdf
> https://bair.berkeley.edu/blog/2020/12/20/lmmem/
My understanding so far is that encoding a verbatim copy is typically a symptom of 'overfitting': the model has effectively memorized specific training passages instead of generalizing from them.
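For what it's worth, this is easy to probe directly: prompt the model with the prefix of a passage you suspect is in the training set, and check whether it completes it verbatim. A minimal sketch in Python, assuming GPT-2 via Hugging Face transformers (the passage and expected completion below are purely illustrative, not a known memorized example):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # A candidate passage suspected to appear in the training corpus.
    prefix = "We the People of the United States, in Order to form"
    expected = " a more perfect Union"

    inputs = tokenizer(prefix, return_tensors="pt")
    # Greedy decoding: memorized text tends to surface once sampling
    # randomness is turned off.
    output_ids = model.generate(**inputs, max_new_tokens=10, do_sample=False)
    continuation = tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )
    print("model continuation:", repr(continuation))
    print("verbatim match?   ", continuation.startswith(expected))

If the model reliably emits the exact continuation for long, low-frequency passages, that's memorization rather than generalization.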
Verbatim memorization is considered a type of bug; it is undesirable for many reasons (technical, ethical, legal). Models are (supposed to be) trained in ways that prevent it as much as possible.
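One mitigation that has been studied (it's the subject of the arXiv paper linked above, if I recall correctly) is deduplicating the training corpus, since passages that repeat many times are far more likely to be memorized verbatim. A toy sketch of exact-match deduplication; real pipelines use near-duplicate detection over substrings, not whole documents:

    import hashlib

    def exact_dedupe(docs):
        """Drop exact duplicate documents, keeping first occurrences."""
        seen = set()
        unique = []
        for doc in docs:
            digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
            if digest not in seen:
                seen.add(digest)
                unique.append(doc)
        return unique

    corpus = ["some web page", "another page", "some web page"]
    print(exact_dedupe(corpus))  # -> ['some web page', 'another page']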
Clearly there was still work to be done as of December 2020, at the least.
sincerely,
    Kim Bruning