[Wikimedia-l] Re: Chat GPT

31 Dec 2022

Of relevance to this conversation:

https://www.wired.com/story/large-language-models-artificial-intelligence/

On Fri, Dec 30, 2022 at 9:32 AM Neurodivergent Netizen <
idoh.idreamofhorses(a)gmail.com> wrote:

...
  One concern I have is that all “oldbies” like myself have all seen bots
 basically decay after whoever is maintaining them goes inactive. Of course,
 this could be mostly rectified by having the AI be open source. This leaves
 the “people” aspect; that is, not only does the AI need to be maintained,
 but interest needs to be maintained as well.

 From,
 I dream of horses
 She/her

 On Dec 30, 2022, at 8:53 AM, Victoria Coleman <vstavridoucoleman(a)gmail.com>
 wrote:

 Anne,

 Interestingly enough what these large companies have to spend a ton of
 money on is creating and moderating content. In other words people.
 Passionate volunteers in large numbers is what the movement has in
 abundance. Imagine the power of combining the talents and passion of our
 community members with the advances offered by AI today. I was struck
 recently during a visit to NVIDIA how language models have changed. Back in
 my day, we would have to build one language model per domain and then load
 it into the device, a computer or a phone, to use. Now they have one
 massive combined language model in a data center full of their GPUs which
 is there so long as you are connected. My sense is that within the guard
 rails offered by our volunteer community, we could use AI to force multiply
 their efforts and make knowledge even more accessible than it is today.
 Both for those who create and record knowledge as well as those who consume
 it. In the case of Chat GPT, our volunteers could use supervised learning
 for example to narrow down the mistakes the bot makes - which should be
 many fewer than the Open AI version since the Wikipedia version would be
 trained on good, clean Wikipedia content which is constantly reviewed by
 the community.

 Best regards,

 Victoria Coleman

 On Dec 30, 2022, at 12:21 AM, Risker <risker.wp(a)gmail.com> wrote:

 Given what we already know about AI-like projects (think Siri, Alexa,
 etc), they're the result of work done by organizations utilizing resources
 hundreds of times greater than the resources within the entire Wikimedia
 movement, and they're not all that good if we're being honest.  They're
 entirely dependent on existing resources.  We have seen time and again how
 easily they can be led astray; ChatGPT is just the most recent example.  It
 is full of misinformation.  Other efforts have resulted in the AI becoming
 radicalized.  Again, it's all about what sources the AI project uses in
 developing its responses, and those underlying sources are generally
 completely unknown to the person asking for the information.

 Ironically, our volunteers have created software that learns pretty
 effectively (ORES, several anti-vandalism "bots").  The tough part is
 ensuring that there is continued, long-term support for these volunteer-led
 efforts, and the ability to make them effective on projects using other
 languages. We've had bots making translations of formulaic articles from
 one language to another for years; again, they depend on volunteers who can
 maintain and support those bots, and ensure continued quality of
 translation.

 AI development is tough. It is monumentally expensive. Big players have
 invested billions USD trying to develop working AI, with some of the most
 talented programmers and developers in the world, and they're barely
 scratching the surface.  I don't see this as a priority for the Wikimedia
 movement, which achieves considerably higher quality with volunteers
 following a fairly simple rule set that the volunteers themselves develop
 based on tried and tested knowledge.  Let's let those with lots of money
 keep working to develop something that is useful, and then we can start
 seeing if it can become feasible for our use.

  I envision the AI industry being similar to the computer hardware
 industry. My first computer cost about the same (in 2022 dollars) as the
 four computers and all their peripherals that I have within my reach as I
 write this, and had less than 1% of the computing power of each of
 them.[1]  The cost will go down once the technology gets better and more
 stable.

 Risker/Anne

 [1] Comparison of 1990 to 2022 dollars.

 On Fri, 30 Dec 2022 at 01:40, Yaroslav Blanter <ymbalt(a)gmail.com> wrote:

  Hi,

 just to remark that it superficially looks like a great tool for small
 language Wikipedias (for which the translation tool is typically not
 available). One can train the tool in some less common language using the
 dictionary and some texts, and then let it fill the project with
 thousands of articles. (As an aside, in fact, one probably can train it on
 the soon-to-be-extinct languages and save them until the moment there is any
 interest for revival, but nobody seems to be interested). However, there is
 a high potential for abuse, as I can imagine people not speaking the
 language running the tool and creating thousands of substandard articles -
 we have seen this done manually, and I would be very cautious allowing this.

 Best
 Yaroslav

 On Fri, Dec 30, 2022 at 4:57 AM Raymond Leonard <
 raymond.f.leonard.jr(a)gmail.com> wrote:

  As a friend wrote on a Slack thread about the topic, "ChatGPT can
 produce results that appear stunningly intelligent, and there are things
 that I’ve seen that really leave me scratching my head- “how on Earth
 did it DO that?!?”  But it’s important to remember that it isn’t actually
 intelligent.  It’s not “thinking.”  It’s more of a glorified version of
 autosuggest.  When it apologizes, it’s not really apologizing, it’s just
 finding text that fits the self description it was fed and that looks
 related to what you fed it."

 The person initiating the thread had asked ChatGPT "What are the 5
 biggest intentional communities on each continent?" (As an aside, this
 was as challenging as the question that led to Wikidata, "What are the ten
 largest cities in the world that have women mayors?") One of the answers
 ChatGPT gave for Europe was "Ikaria (Greece)". As near as I can determine,
 there is no intentional community of any size in Ikaria. However, the
 Icarians <https://en.wikipedia.org/wiki/Icarians> were a 19th-century
 intentional community in the US founded by French expatriates. It was named
 after a utopian novel, *Voyage en Icarie*, that was written by Étienne
 Cabet. He chose the Greek island of Icaria as the setting of his utopian
 vision. Interesting that ChatGPT may have conflated these.

 It seems that given a prompt, ChatGPT shuffles & regurgitates facts.
 Just as a card dealer deals a good hand, sometimes ChatGPT seems to make
 sense, but I think at present it really is "a glorified version of
 autosuggest."

 Yours
 Peaceray

 On Thu, Dec 29, 2022 at 6:39 PM Gnangarra <gnangarra(a)gmail.com> wrote:

  I think the simplest answer is yes, it's an artificial writer, but it's not
 intelligence as the name implies but rather just a piece of software that
 gives answers according to the methodology of that software. The garbage in
 garbage out format, it can never be better than the programmers behind the
 machine

 On Fri, 30 Dec 2022 at 09:56, Victoria Coleman <
 vstavridoucoleman(a)gmail.com> wrote:

> Thank you Ziko and Steven for the thoughtful responses.
>
> My sense is that for a class of readers having a generative UI that
> returns an answer VS an article would be useful. It would probably put
> Quora out of business. :-)
>
> If the models are not open source, this indeed would require
> developing our own models. For that kind of investment, we would probably
> want to have more application areas. Translation being one that Ziko
> already pointed out but also summarization. These kinds of Information
> retrieval queries would effectively index into specific parts of an article
> vs returning the whole thing.
>
> Wikipedia as we all know is not perfect but it’s about the best you
> can get with the thousands of editors and reviewers doing quality control.
> If a bot was exclusively trained on Wikipedia, my guess is that the
> falsehood generation would be as minimal as it can get. Garbage in garbage
> out in all these models. Good stuff in good stuff out. I guess the
> falsehoods can also come when no material exists in the model. So instead
> of making stuff up, they could default to “I don’t know the answer to
> that”. Or in our case, we could add the topic to the list of article
> suggestions to editors…
>
> I know I am almost day dreaming here but I can’t help but think that
> all the recent advances in AI could create significantly broader free
> knowledge pathways for every human being. And I don’t see us getting after
> them aggressively enough…
>
> Best regards,
>
> Victoria Coleman
>
> On Dec 29, 2022, at 5:17 PM, Steven Walling <steven.walling(a)gmail.com>
> wrote:
>
> 
>
>
> On Thu, Dec 29, 2022 at 4:09 PM Victoria Coleman <
> vstavridoucoleman(a)gmail.com> wrote:
>
>> Hi everyone. I have seen some of the reactions to the narratives
>> generated by Chat GPT. There is an obvious question (to me at least) as to
>> whether a Wikipedia chat bot would be a legitimate UI for some users. To
>> that end, I would have hoped that it would have been developed by the WMF
>> but the Foundation has historically massively underinvested in AI. That
>> said, and assuming that GPT Open source licensing is compatible with the
>> movement norms, should the WMF include that UI in the product?
>
>
> This is a cool idea but what would the goals of developing a
> Wikipedia-specific generative AI be? IMO it would be nice to have a natural
> language search right in Wikipedia that could return factual answers not
> just links to our (often too long) articles.
>
> OpenAI models aren’t open source btw. Some of the products are free to
> use right now, but their business model is to charge for API use etc. so
> including it directly in Wikipedia is pretty much a non-starter.
>
>> My other question is around the corpus that Open AI is using to train
>> the bot. It is creating very fluid narratives that are massively false in
>> many cases. Are they training on Wikipedia? Something else?
>
>
> They’re almost certainly using Wikipedia. The answer from ChatGPT is:
>
> “ChatGPT is a chatbot model developed by OpenAI. It was trained on a
> dataset of human-generated text, including data from a variety of sources
> such as books, articles, and websites. It is possible that some of the data
> used to train ChatGPT may have come from Wikipedia, as Wikipedia is a
> widely-used source of information and is likely to be included in many
> datasets of human-generated text.”
>
>> And to my earlier question, if GPT were to be trained on Wikipedia
>> exclusively would that help abate the false narratives
>
>
> Who knows but we would have to develop our own models to test this
> idea.
>
>>
>> This is a significant matter for the community and seeing us step to
>> it would be very encouraging.
>>
>> Best regards,
>>
>> Victoria Coleman
>> _______________________________________________
>> Wikimedia-l mailing list -- wikimedia-l(a)lists.wikimedia.org,
>> guidelines at:
>> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
>> https://meta.wikimedia.org/wiki/Wikimedia-l
>> Public archives at
>>
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org…
>> To unsubscribe send an email to wikimedia-l-leave(a)lists.wikimedia.org
>>

 --
 Boodarwun
 Gnangarra
 'ngany dabakarn koorliny arn boodjera dardoon ngalang Nyungar
 koortaboodjar'

