Hi Tilman, I appreciate your detailed and thoughtful response.
Are you suggesting that we may never be able to attribute information
in LLM output any better than in the Anthropic paper? What is your
opinion of alternative approaches such as
https://arxiv.org/abs/2303.14186 ?
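(For readers following along: the rough intuition behind gradient-based
training-data attribution methods like that one can be shown with a toy
model. This is entirely my own sketch, not code from the paper - real
methods such as TRAK or influence functions add random projections and
Hessian/curvature corrections. The core idea is just to score each
training example by how well its model-output gradient aligns with the
test example's gradient; for a linear model that gradient is simply the
input itself.)

```python
import numpy as np

rng = np.random.default_rng(0)

# 20 training inputs for a toy linear model f(x) = w @ x.
X = rng.normal(size=(20, 3))
w = np.array([1.0, -2.0, 0.5])

def grad_of_output(x):
    # For a linear model, d(w @ x)/dw = x: the gradient w.r.t. the
    # weights is the input itself.
    return x

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A test input that is a near-duplicate of training example 7.
x_test = X[7] + 0.001 * rng.normal(size=3)
g_test = grad_of_output(x_test)

# Score every training example by gradient alignment with the test point;
# the near-duplicate should rank as "most influential".
scores = [cosine(grad_of_output(xi), g_test) for xi in X]
most_influential = int(np.argmax(scores))
print(most_influential)
```

This only illustrates the gradient-similarity kernel; whether such scores
scale up to meaningful attributions for billion-parameter LLMs is exactly
the open question under discussion.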
Relatedly, I wonder whether you agree that the first paragraph of the
Background and Related Work section on page 2 of
https://arxiv.org/pdf/2205.10770.pdf and
https://bair.berkeley.edu/blog/2020/12/20/lmmem/ suggest that LLMs
rote-memorize portions of their training data. If so, do you believe
that has any implications for the feasibility of attribution? And,
perhaps more relevant to your reason for reviving this thread, what
does it imply to you about copyright law?
> [...] we can't enforce citing sources - [[WP:BURDEN]] - as a legal
> requirement like we do for [[WP:COPYPASTE]]. This should be kept in mind
> by folks who advocate for a moral or even legal obligation for LLMs to
> "cite their sources" for their output (like earlier in this thread:
> "just require an open disclosure of where the bot pulled from and how
> much").
Do you believe that, just because we can't force people to do what we
want, we shouldn't ask them to, or bargain with the options available
to us if they decline?
As for the Foundation's ChatGPT plugin, I'm afraid I find it mostly
unusable because it ignores everything after the first dozen
paragraphs of all articles. That was listed as needing 3-4 days to fix
on
https://phabricator.wikimedia.org/T343932 a month ago. Do you know
whether there are any plans to go ahead with that fix?
- LW
On Thu, Sep 7, 2023 at 2:09 AM Tilman Bayer <haebwiki(a)gmail.com> wrote:
>
> TL;DR: It was previously claimed on this list that it's generally
> technically possible to attribute information in the output of an
> LLM-based chatbot (such as ChatGPT) to specific parts of the LLM's
> training data (such as a Wikipedia article). These claims are dubious and
> we shouldn't rely on them as we continue to navigate the relations
> between Wikimedia projects and LLMs.
>
> On Sun, Mar 19, 2023 at 12:12 PM Lauren Worden <laurenworden89(a)gmail.com> wrote:
> [...]
>>
>>
>> On Sun, Mar 19, 2023 at 1:20 AM Kimmo Virtanen
>> <kimmo.virtanen(a)wikimedia.fi> wrote:
>> >
>> >> Or, maybe just require an open disclosure of where the bot pulled
>> >> from and how much, instead of having it be a black box? "Text in this
>> >> response derived from: 17% Wikipedia article 'Example', 12% Wikipedia
>> >> article 'SomeOtherThing', 10%...".
>> >
>> > Current (ie. ChatGPT) systems doesn't work that way, as the source of
>> > information is lost in the process when the information is encoded
>> > into the model....
>>
>> In fact, they do work that way, but it takes some effort to elucidate
>> the source of any given output. Anyone discussing these issues needs
>> to become familiar with ROME:
>> https://twitter.com/mengk20/status/1588581237345595394 Please see also
>> https://www.youtube.com/watch?v=_NMQyOu2HTo
>>
> I sense some confusion here. That paper (ROME, http://rome.baulab.info/ )
> is about attributing a model's factual claims to specific parts (weights,
> neurons) of its neural network (and then changing them). It is *not*
> about attribution to specific parts of its training data (such as
> Wikipedia articles or other web pages), which is what Wikimedians have
> been expressing concerns about.
> In other words, it's entirely unclear why this should contradict what
> Kimmo had said (and, separately in this thread, Galder).
>
> (Trying to understand LLMs with analogies can be treacherous. But for
> people who automatically assume that neural networks "do work that way" -
> i.e. preserve this kind of provenance information - and that chatbots can
> be required to disclose "where [they] pulled from and how much" for a
> particular answer: Imagine someone accosting you in the street and asking
> you where you had originally learned that Paris is the capital of France,
> say. How many of us would be able to come up with a truthful answer like
> "our geography teacher told us in third grade" or "I read this in
> Encyclopaedia Britannica when I was 10 years old"?)
>
>> With luck we will all have the chance to discuss these issues in
>> detail on the March 23 Zoom discussion of large language models for
>> Wikimedia projects:
>>
>> https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2023-2024/…
>>
> The notes from that meeting (now at
> https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2023-2024/…
> ) contain the following statements:
>
>
> "In an ideal world, the Foundation would start internal projects to
> replicate ROME and RARR."
> "The Foundation should make a public statement in support of increasing
> the accuracy of attribution and verification systems such as RARR
> [ https://arxiv.org/abs/2210.08726 ]"
>
>
> These proposals do not seem to have made it into the WMF's actual annual
> plan in the end. And I realize that this thread is already a couple of
> months old. However, it still seems worth resolving misconceptions in
> this regard, e.g. because there have been references to such claims more
> recently in other community discussion spaces.
>
> Regarding RARR (the second project proposed in that meeting and here on
> this list, as something that WMF should replicate or embrace):
> RARR is indeed designed to find *a* text document supporting a given
> statement produced by an LLM. But importantly, it makes no claims that
> the source it finds was "the" original source used by the LLM. The first
> "R" in "RARR" stands for "Retrofit[ting]" attribution - not for
> "restoring", "retrieving" or such. (In fact, RARR doesn't even try to
> find a source in the model's training corpus. It simply does a Google
> search of the entire internet, see section 3.1 in the paper.) In other
> words, it too won't "elucidate *the* source of any given output" as
> claimed above (my bolding).
> This is particularly relevant in light of the fact that e.g. on English
> Wikipedia we generally require information to be attributable to
> reliable external sources. So even if a chatbot's statement was indeed
> based on a Wikipedia article that itself cited the New York Times for
> this information, RARR might very well pick the NYT article as the
> source instead of Wikipedia.
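(To make concrete what "retrofitting" attribution means here, a toy
sketch - entirely my own illustration, not RARR's actual pipeline, which
uses an LLM and Google Search per section 3.1: given a generated claim,
search some corpus for *a* document that supports it. The tiny in-memory
corpus and overlap scoring below are hypothetical stand-ins.)

```python
# Toy sketch of "retrofitted" attribution in the spirit of RARR: find *a*
# supporting document for a claim. Nothing ties the result to whatever the
# model actually learned from - which is exactly the point above.

CORPUS = {
    "nyt-2023-inflation": "Inflation is often measured using the Consumer "
                          "Price Index, economists at the agency said.",
    "wiki-Inflation": "Inflation is commonly measured by indices such as "
                      "the Consumer Price Index.",
    "wiki-Paris": "Paris is the capital and most populous city of France.",
}

def tokenize(text):
    return set(text.lower().replace(",", "").replace(".", "").split())

def retrofit_attribution(claim, corpus):
    """Return the doc id with the largest token overlap with the claim."""
    claim_tokens = tokenize(claim)
    return max(corpus, key=lambda doc_id: len(claim_tokens & tokenize(corpus[doc_id])))

claim = "Inflation is often measured using the Consumer Price Index."
print(retrofit_attribution(claim, CORPUS))  # -> nyt-2023-inflation
```

Note that the NYT-style document wins here even though a Wikipedia
article says essentially the same thing - mirroring the scenario in the
paragraph above.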
>
> A larger issue here is that individual facts are not owned by any
> company or community. Specifically, they are not copyrightable (as
> Wikipedians are well aware from their daily practice: we can't enforce
> citing sources - [[WP:BURDEN]] - as a legal requirement like we do for
> [[WP:COPYPASTE]]). This should be kept in mind by folks who advocate for
> a moral or even legal obligation for LLMs to "cite their sources" for
> their output (like earlier in this thread: "just require an open
> disclosure of where the bot pulled from and how much").
>
> Back to the technical difficulties and claims that machine learning
> models "do work that way":
> Folks may also be interested in a general overview paper titled
> "Training Data Influence Analysis and Estimation: A Survey"
> ( https://arxiv.org/abs/2212.04612 ). It says e.g. that
>
> "it can be very difficult to answer even basic questions about the
> relationship between training data and model predictions; for example:
> [...] Which instances in the training set caused the model to make a
> specific prediction?"
>
>
> Now all that said, some weeks ago, Anthropic (a startup focused on
> responsible use of AI, which is researching interpretability of LLMs)
> released a new research paper that actually tries to tackle this very
> difficult question in the case of LLMs, and do something like the kind
> of attribution we are concerned with here:
>
> "Large language models have demonstrated a surprising range of skills
> and behaviors. How can we trace their source? In our new paper, we use
> influence functions to find training examples that contribute to a given
> model output. [...]"
> ( https://threadreaderapp.com/thread/1688946685937090560.html )
>
> It looks like really interesting cutting-edge research. (They used some
> advanced approximation techniques to make the required calculations
> feasible for some LLMs that are however still much smaller than e.g.
> GPT-3.5, or whatever version of ChatGPT you use is based on.) If someone
> with access to the required huge compute resources and technical skills
> were to apply the methods described in the paper
> ( https://arxiv.org/abs/2308.03296 ) to specifically investigate the
> case of Wikipedia, that could be fascinating. (There's also an upcoming
> conference soliciting such research: https://attrib-workshop.cc/ .)
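(For the mathematically inclined, the classic influence-function estimate
that the Anthropic paper approximates at scale - with EK-FAC and other
machinery - scores a training example z by -grad L(z_test)^T H^-1
grad L(z), where H is the Hessian of the training loss. My own toy sketch
below computes the exact formula on a trivially small ridge regression,
which is only feasible because the model has four parameters instead of
billions.)

```python
import numpy as np

rng = np.random.default_rng(1)

# Ridge-regularized linear regression: small enough that the exact
# influence-function formula is computable in closed form.
X = rng.normal(size=(30, 4))
y = X @ np.array([2.0, 0.0, -1.0, 0.5]) + 0.1 * rng.normal(size=30)
lam = 0.1

# Hessian of the total loss L(w) = 0.5*sum((X@w - y)^2) + 0.5*lam*||w||^2,
# and the closed-form minimizer.
H = X.T @ X + lam * np.eye(4)
w = np.linalg.solve(H, X.T @ y)

def grad(x, target):
    # Per-example squared-error gradient (regularizer term omitted).
    return (w @ x - target) * x

# Influence of training example i on the loss at a test point:
# I(i) = -grad(test)^T H^-1 grad(i).
x_test, y_test = X[3], y[3]
H_inv_g_test = np.linalg.solve(H, grad(x_test, y_test))
influences = np.array([-H_inv_g_test @ grad(X[i], y[i]) for i in range(30)])
print(int(np.argmax(np.abs(influences))))
```

The self-influence of the test point's own training copy comes out
negative (removing it would raise the test loss), which is the sanity
check one would expect from the formula.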
>
> But before anyone gets too excited: This is a statistical approach
> focused on generating estimates of influence ratios only. And from the
> concrete examples Anthropic shares, it seems that the relation between
> source and output is typically much more diffuse and tenuous than
> simplistic "AI steals from Wikipedia!!1!" type arguments would have you
> believe. (That's even true for their "simple factual queries" category -
> see figure 42 in the paper, for example: "Prompt: Inflation is often
> measured using / Completion: the Consumer Price Index." Table 9 in the
> appendix describes the sequences from the training data that were found
> to be most influential for the answer in one of the examined LLMs. It
> observes that most of these source texts don't actually contain the term
> "consumer price index", contrary to what one might expect.)
> The Anthropic authors also state generally that:
>
> "Model outputs do not seem to result from pure memorization [...] the
> influence of any particular training sequence is much smaller than the
> information content of a typical sentence, so the model does not appear
> to be reciting individual training examples at the token level."
> ( https://www.anthropic.com/index/influence-functions )
>
>
> Regards, Tilman
>
> PS 1: Of course there are still ways to make an LLM-based chatbot
> actually cite sources, if one is prepared to restrict the kind of
> answers it can give. Bing Chat actually does this by default (or at
> least tries to), as opposed to ChatGPT, retrieving live sources at
> question time. Specifically regarding Wikipedia, one can prompt ChatGPT
> or other LLMs to only answer based on Wikipedia content, and hope that
> it complies without hallucinating. (I summarized three such approaches,
> including the Wikimedia Foundation's ChatGPT plugin, here:
> https://meta.wikimedia.org/wiki/Research:Newsletter/2023/July .)
> "Retrieval-augmented generation" (RAG) is a good search term for
> learning more about similar approaches.
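(For anyone unfamiliar with the term, the RAG pattern can be sketched in
a few lines. This is my own illustration with deliberately trivial
stand-ins - real systems use embedding-based vector retrieval and an
actual LLM: retrieve relevant passages at question time, then instruct
the model to answer only from them, citing the source.)

```python
# Minimal sketch of retrieval-augmented generation (RAG): the citation is
# meaningful because the model is told to answer only from the retrieved
# passage, rather than being asked to recall its training data.

PASSAGES = {
    "Paris": "Paris is the capital and most populous city of France.",
    "Inflation": "Inflation is often measured using the Consumer Price Index.",
}

def retrieve(question, passages):
    """Pick the passage title with the most word overlap with the question."""
    q = set(question.lower().split())
    return max(passages, key=lambda t: len(q & set(passages[t].lower().split())))

def build_prompt(question, title, passage):
    return (f"Answer only from the source below, and cite it.\n"
            f"Source [{title}]: {passage}\n"
            f"Question: {question}\n")

question = "What is the capital of France?"
title = retrieve(question, PASSAGES)
prompt = build_prompt(question, title, PASSAGES[title])
print(title)  # -> Paris
```

The prompt would then be sent to whatever chat model one uses; the
restriction to the retrieved source is what the plugin-style approaches
summarized above have in common.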
>
> PS 2: All this is separate from questions about the overall influence
> of Wikipedia (or other parts of an LLM's training data) on the general
> performance of e.g. ChatGPT, with regard to its average factual
> accuracy, biases etc. The answers there are also much less clear than
> some appear to assume, but that's a topic for another post.
>