ChatGPT and Wikipedia

List overview All Threads
Download

newer

older

Wikimedia Taiwan’s statement on...

Mozilla's social media pledge

Anders Wennersten

10 Dec 2022 10 Dec '22

3:42 p.m.

Is this Ai software using info from Wikipedia directly or indirecly, and if not is it an alternative way of storing knowledge to the wikiway?

Anders

Show replies by date

The Cunctator

10 Dec 10 Dec

4:23 p.m.

It's trained on Wikipedia. Here's a 2020 paper from the authors. I would argue it's violating the copyright but I'm aware the foundation isn't very interested in defending it.

https://arxiv.org/abs/2005.14165

On Sat, Dec 10, 2022, 10:42 AM Anders Wennersten mail@anderswennersten.se wrote:

...

Is this Ai software using info from Wikipedia directly or indirecly, and if not is it an alternative way of storing knowledge to the wikiway?

Anders

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Kim Bruning

11 Dec 11 Dec

12:18 p.m.

Hello Cuncutator,

I think there's several lines of thought on this.

On what theory would you argue that ChatGPT is violating Wiki[p|m]edia copyright?

(If you've already posted reasoning elsewhere, or if someone else has posted an opinion you happen to agree with; I'd be happy to read a link as well)

sincerely, Kim

On Sat, Dec 10, 2022 at 11:23:48AM -0500, The Cunctator wrote:

...

It's trained on Wikipedia. Here's a 2020 paper from the authors. I would argue it's violating the copyright but I'm aware the foundation isn't very interested in defending it.

https://arxiv.org/abs/2005.14165

On Sat, Dec 10, 2022, 10:42 AM Anders Wennersten mail@anderswennersten.se wrote:

...
Is this Ai software using info from Wikipedia directly or indirecly, and if not is it an alternative way of storing knowledge to the wikiway?

Anders

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

...

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Anders Wennersten

1:55 p.m.

I perceive we have a good dialogue with Google, and where Google show respect and also acknowledge Wikipedia role in their fact box etc

ChatGPT is now making headlines more or less every day and I perceive them to try to position themself av the "next" google. But what about our dialogue with them, and respect and acknowledgement, if they are in their back end making use of the content of Wikipedia?

And if they are only having a very loose use of Wikipediacontent and are building content on their own, have we made an investigation in how it will effect us in the short and long term?

Anders

https://en.wikipedia.org/wiki/ChatGPT

Den 2022-12-11 kl. 13:18, skrev Kim Bruning via Wikimedia-l:

...

Hello Cuncutator,

I think there's several lines of thought on this.

On what theory would you argue that ChatGPT is violating Wiki[p|m]edia copyright?

(If you've already posted reasoning elsewhere, or if someone else has posted an opinion you happen to agree with; I'd be happy to read a link as well)

sincerely, Kim

On Sat, Dec 10, 2022 at 11:23:48AM -0500, The Cunctator wrote:

...
It's trained on Wikipedia. Here's a 2020 paper from the authors. I would argue it's violating the copyright but I'm aware the foundation isn't very interested in defending it.

https://arxiv.org/abs/2005.14165

On Sat, Dec 10, 2022, 10:42 AM Anders Wennersten mail@anderswennersten.se wrote:

...
Is this Ai software using info from Wikipedia directly or indirecly, and if not is it an alternative way of storing knowledge to the wikiway?

Anders

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Erik Moeller

18 Dec 18 Dec

10:01 p.m.

On Sun, Dec 11, 2022 at 5:55 AM Anders Wennersten mail@anderswennersten.se wrote:

...

ChatGPT is now making headlines more or less every day and I perceive them to try to position themself av the "next" google.

I suspect OpenAI will continue to focus on generative applications (images, code, text for purposes such as copywriting, eventually music/video) and won't attempt to compete with Google directly, but we'll see. Currently GPT-3.5 (which ChatGPT is based on) is very prone to generating nonsensical answers, citations to works that don't exist, etc. But it is pretty cool if you keep its limitations in mind--for example, it's quite good at bootstrapping small scripts in various programming languages (with mistakes and idiosyncrasies).

Google has one of the largest AI research programs on the planet, they just are extremely conservative about letting anyone try their models (due to reputational concerns, e.g., that generative AI will spit out racist output within about 30 seconds of people poking its guardrails). This blog post from September is instructive about the direction they're taking with what's called retrieval-augmented generation; see the paper linked from the post for details:

https://www.deepmind.com/blog/building-safer-dialogue-agents (DeepMind is part of Google)

That is likely to yield significantly more accurate answers than what ChatGPT is doing, and is difficult to replicate for folks like OpenAI without being dependent on the search APIs of big search companies. It's worth noting that Google has also started to incorporate language model tooling into how it's presenting search results (e.g., summarizing or highlighting different parts of a website to make the result snippet more useful).

A retrieval-augmented approach that leverages Wikidata could IMO be quite powerful and could be a useful research program for Wikimedia to pursue, be it independently or in partnership with others. The resulting technology should of course be fully open source.

Querying Wikidata via SPARQL is currently still a bit of wizardry (and the query builder is extremely limited). To pick a completely random example not at all inspired by current events, if I wanted to see a list of journalists with Mastodon accounts & a picture, I currently have to do this:

SELECT DISTINCT ?personLabel ?mastodonName ?pic WHERE { ?person wdt:P4033 ?mastodonName ; wdt:P106 ?occupation . OPTIONAL { ?person wdt:P18 ?pic . } ?occupation wdt:P279* wd:Q1930187 . SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } }

Make a small mistake (a curly brace missing) and you'll get a red error message. Forgot the * after wdt:P279? A different response set in ways that are difficult to spot or reason about.

Why can't I type "list of journalists with their picture and Mastodon account" as a natural language query? (You can try it in ChatGPT and it'll get you started, but it'll generate nonsense P/Q numbers.) If such queries could be produced reliably, it could be a very useful tool for readers as well.

Warmly, Erik

Gnangarra

19 Dec 19 Dec

1:26 p.m.

AI simply cant descriminate between good research and faked research, for any outcome it must provide all of its sources whether they are from Wikipedia, Wikidata, WikiCommons, WikiSource or some other place. Otherwise it will answer yes to some asking if the world is flat because it'll seek out that answer and find all the nonsense that has been produced.

On Mon, 19 Dec 2022 at 06:02, Erik Moeller eloquence@gmail.com wrote:

...

On Sun, Dec 11, 2022 at 5:55 AM Anders Wennersten mail@anderswennersten.se wrote:

...
ChatGPT is now making headlines more or less every day and I perceive them to try to position themself av the "next" google.

I suspect OpenAI will continue to focus on generative applications (images, code, text for purposes such as copywriting, eventually music/video) and won't attempt to compete with Google directly, but we'll see. Currently GPT-3.5 (which ChatGPT is based on) is very prone to generating nonsensical answers, citations to works that don't exist, etc. But it is pretty cool if you keep its limitations in mind--for example, it's quite good at bootstrapping small scripts in various programming languages (with mistakes and idiosyncrasies).

Google has one of the largest AI research programs on the planet, they just are extremely conservative about letting anyone try their models (due to reputational concerns, e.g., that generative AI will spit out racist output within about 30 seconds of people poking its guardrails). This blog post from September is instructive about the direction they're taking with what's called retrieval-augmented generation; see the paper linked from the post for details:

https://www.deepmind.com/blog/building-safer-dialogue-agents (DeepMind is part of Google)

That is likely to yield significantly more accurate answers than what ChatGPT is doing, and is difficult to replicate for folks like OpenAI without being dependent on the search APIs of big search companies. It's worth noting that Google has also started to incorporate language model tooling into how it's presenting search results (e.g., summarizing or highlighting different parts of a website to make the result snippet more useful).

A retrieval-augmented approach that leverages Wikidata could IMO be quite powerful and could be a useful research program for Wikimedia to pursue, be it independently or in partnership with others. The resulting technology should of course be fully open source.

Querying Wikidata via SPARQL is currently still a bit of wizardry (and the query builder is extremely limited). To pick a completely random example not at all inspired by current events, if I wanted to see a list of journalists with Mastodon accounts & a picture, I currently have to do this:

SELECT DISTINCT ?personLabel ?mastodonName ?pic WHERE { ?person wdt:P4033 ?mastodonName ; wdt:P106 ?occupation . OPTIONAL { ?person wdt:P18 ?pic . } ?occupation wdt:P279* wd:Q1930187 . SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } }

Make a small mistake (a curly brace missing) and you'll get a red error message. Forgot the * after wdt:P279? A different response set in ways that are difficult to spot or reason about.

Why can't I type "list of journalists with their picture and Mastodon account" as a natural language query? (You can try it in ChatGPT and it'll get you started, but it'll generate nonsense P/Q numbers.) If such queries could be produced reliably, it could be a very useful tool for readers as well.

Warmly, Erik _______________________________________________ Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

-- Boodarwun Gnangarra 'ngany dabakarn koorliny arn boodjera dardoon ngalang Nyungar koortaboodjar'

Ad Huikeshoven

1:45 p.m.

Here is what ChatGPT has to offer on the subject of Earth's flatness:

Please answer in English the question "Elaborate on the question wether the earth is flat, and give some sources."

There is no scientific evidence that the earth is flat. On the contrary, there is a wealth of evidence that demonstrates that the earth is round. This evidence has been obtained through a variety of scientific methods, including observation of the sky, the use of geodetic systems, and the findings of astronauts who have traveled to space.

Here are a few sources that you can consult if you want to learn more about the fact that the earth is round:

NASA: "Is the Earth Round?" ( https://www.nasa.gov/audience/forstudents/k-4/stories/nasa-knows/what-is-ear... ) -

Live Science: "Why is the Earth Round?" ( https://www.livescience.com/32489-why-is-the-earth-round-.html) -

Encyclopedia Britannica: "Earth" (https://www.britannica.com/topic/Earth)

It is important to remember that the earth is round and that this has been proven through scientific research. The idea that the earth is flat is a myth that has long been debunked.

On Mon, Dec 19, 2022 at 2:27 PM Gnangarra gnangarra@gmail.com wrote:

...

AI simply cant descriminate between good research and faked research, for any outcome it must provide all of its sources whether they are from Wikipedia, Wikidata, WikiCommons, WikiSource or some other place. Otherwise it will answer yes to some asking if the world is flat because it'll seek out that answer and find all the nonsense that has been produced.

On Mon, 19 Dec 2022 at 06:02, Erik Moeller eloquence@gmail.com wrote:

...
On Sun, Dec 11, 2022 at 5:55 AM Anders Wennersten mail@anderswennersten.se wrote:

...
ChatGPT is now making headlines more or less every day and I perceive them to try to position themself av the "next" google.

I suspect OpenAI will continue to focus on generative applications (images, code, text for purposes such as copywriting, eventually music/video) and won't attempt to compete with Google directly, but we'll see. Currently GPT-3.5 (which ChatGPT is based on) is very prone to generating nonsensical answers, citations to works that don't exist, etc. But it is pretty cool if you keep its limitations in mind--for example, it's quite good at bootstrapping small scripts in various programming languages (with mistakes and idiosyncrasies).

Google has one of the largest AI research programs on the planet, they just are extremely conservative about letting anyone try their models (due to reputational concerns, e.g., that generative AI will spit out racist output within about 30 seconds of people poking its guardrails). This blog post from September is instructive about the direction they're taking with what's called retrieval-augmented generation; see the paper linked from the post for details:

https://www.deepmind.com/blog/building-safer-dialogue-agents (DeepMind is part of Google)

That is likely to yield significantly more accurate answers than what ChatGPT is doing, and is difficult to replicate for folks like OpenAI without being dependent on the search APIs of big search companies. It's worth noting that Google has also started to incorporate language model tooling into how it's presenting search results (e.g., summarizing or highlighting different parts of a website to make the result snippet more useful).

A retrieval-augmented approach that leverages Wikidata could IMO be quite powerful and could be a useful research program for Wikimedia to pursue, be it independently or in partnership with others. The resulting technology should of course be fully open source.

Querying Wikidata via SPARQL is currently still a bit of wizardry (and the query builder is extremely limited). To pick a completely random example not at all inspired by current events, if I wanted to see a list of journalists with Mastodon accounts & a picture, I currently have to do this:

SELECT DISTINCT ?personLabel ?mastodonName ?pic WHERE { ?person wdt:P4033 ?mastodonName ; wdt:P106 ?occupation . OPTIONAL { ?person wdt:P18 ?pic . } ?occupation wdt:P279* wd:Q1930187 . SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } }

Make a small mistake (a curly brace missing) and you'll get a red error message. Forgot the * after wdt:P279? A different response set in ways that are difficult to spot or reason about.

Why can't I type "list of journalists with their picture and Mastodon account" as a natural language query? (You can try it in ChatGPT and it'll get you started, but it'll generate nonsense P/Q numbers.) If such queries could be produced reliably, it could be a very useful tool for readers as well.

Warmly, Erik _______________________________________________ Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

-- Boodarwun Gnangarra 'ngany dabakarn koorliny arn boodjera dardoon ngalang Nyungar koortaboodjar'

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Anders Wennersten

1:51 p.m.

I think we should not underestimate what this could evolve into. We thrived because our readers find us "good enough" when it comes to finding facts, not the ultimate source.

And the software learns by reading, and can (and have done so) Wikipedia, Wikidata etc and represent our data in its own syntax och present it in a way so it is not a direct copy. Perhaps data will be a bit delayed to the actual content in Wikipedia, but so what - good enough?

Anders

Den 2022-12-19 kl. 14:26, skrev Gnangarra:

...

On Mon, 19 Dec 2022 at 06:02, Erik Moeller eloquence@gmail.com wrote:

On Sun, Dec 11, 2022 at 5:55 AM Anders Wennersten
<mail@anderswennersten.se> wrote:
> ChatGPT is now making headlines more or less every day and I
perceive
> them to try to position themself  av the "next" google.

I suspect OpenAI will continue to focus on generative applications
(images, code, text for purposes such as copywriting, eventually
music/video) and won't attempt to compete with Google directly, but
we'll see. Currently GPT-3.5 (which ChatGPT is based on) is very prone
to generating nonsensical answers, citations to works that don't
exist, etc. But it is pretty cool if you keep its limitations in
mind--for example, it's quite good at bootstrapping small scripts in
various programming languages (with mistakes and idiosyncrasies).

Google has one of the largest AI research programs on the planet, they
just are extremely conservative about letting anyone try their models
(due to reputational concerns, e.g., that generative AI will spit out
racist output within about 30 seconds of people poking its
guardrails). This blog post from September is instructive about the
direction they're taking with what's called retrieval-augmented
generation; see the paper linked from the post for details:

https://www.deepmind.com/blog/building-safer-dialogue-agents (DeepMind
is part of Google)

That is likely to yield significantly more accurate answers than what
ChatGPT is doing, and is difficult to replicate for folks like OpenAI
without being dependent on the search APIs of big search companies.
It's worth noting that Google has also started to incorporate language
model tooling into how it's presenting search results (e.g.,
summarizing or highlighting different parts of a website to make the
result snippet more useful).

A retrieval-augmented approach that leverages Wikidata could IMO be
quite powerful and could be a useful research program for Wikimedia to
pursue, be it independently or in partnership with others. The
resulting technology should of course be fully open source.

Querying Wikidata via SPARQL is currently still a bit of wizardry (and
the query builder is extremely limited). To pick a completely random
example not at all inspired by current events, if I wanted to see a
list of journalists with Mastodon accounts & a picture, I currently
have to do this:

SELECT DISTINCT ?personLabel ?mastodonName ?pic
WHERE {
  ?person wdt:P4033 ?mastodonName ;
    wdt:P106 ?occupation .
  OPTIONAL { ?person wdt:P18 ?pic . }
  ?occupation wdt:P279* wd:Q1930187 .
   SERVICE wikibase:label {
     bd:serviceParam wikibase:language "en"
   }
}

Make a small mistake (a curly brace missing) and you'll get a red
error message. Forgot the * after wdt:P279? A different response set
in ways that are difficult to spot or reason about.

Why can't I type "list of journalists with their picture and Mastodon
account" as a natural language query? (You can try it in ChatGPT and
it'll get you started, but it'll generate nonsense P/Q numbers.) If
such queries could be produced reliably, it could be a very useful
tool for readers as well.

Warmly,
Erik
_______________________________________________
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org,
guidelines at:
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/YYTLJVCDSYITUKNA2DJSK5SSR3AZ3B5F/
To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

-- Boodarwun Gnangarra 'ngany dabakarn koorliny arn boodjera dardoon ngalang Nyungar koortaboodjar' //

Wikimedia-l mailing list --wikimedia-l@lists.wikimedia.org, guidelines at:https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines andhttps://meta.wikimedia.org/wiki/Wikimedia-l Public archives athttps://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email towikimedia-l-leave@lists.wikimedia.org

ZhaoFJx

10:03 p.m.

I'd be curious about copyright issues though, as it's licensed on Github https://github.com/acheong08/ChatGPT/blob/main/LICENSE. It may indeed be infringing copyright, since he may require attribution. However, I agree with The Cunctator that the community won't do anything about it.

Incidentally, I'm a little worried that he might grab broken content that wasn't rolled back. After all - the current artificial intelligence can not reach the level of human thinking

Sincerely, ZhaoFJx _______________________________________________ Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l

Anders Wennersten mail@anderswennersten.se 于2022年12月19日周一 08:52写道：

...

I think we should not underestimate what this could evolve into. We thrived because our readers find us "good enough" when it comes to finding facts, not the ultimate source.

And the software learns by reading, and can (and have done so) Wikipedia, Wikidata etc and represent our data in its own syntax och present it in a way so it is not a direct copy. Perhaps data will be a bit delayed to the actual content in Wikipedia, but so what - good enough?

Anders

Den 2022-12-19 kl. 14:26, skrev Gnangarra:

AI simply cant descriminate between good research and faked research, for any outcome it must provide all of its sources whether they are from Wikipedia, Wikidata, WikiCommons, WikiSource or some other place. Otherwise it will answer yes to some asking if the world is flat because it'll seek out that answer and find all the nonsense that has been produced.

On Mon, 19 Dec 2022 at 06:02, Erik Moeller eloquence@gmail.com wrote:

...
On Sun, Dec 11, 2022 at 5:55 AM Anders Wennersten mail@anderswennersten.se wrote:

...
ChatGPT is now making headlines more or less every day and I perceive them to try to position themself av the "next" google.

I suspect OpenAI will continue to focus on generative applications (images, code, text for purposes such as copywriting, eventually music/video) and won't attempt to compete with Google directly, but we'll see. Currently GPT-3.5 (which ChatGPT is based on) is very prone to generating nonsensical answers, citations to works that don't exist, etc. But it is pretty cool if you keep its limitations in mind--for example, it's quite good at bootstrapping small scripts in various programming languages (with mistakes and idiosyncrasies).

Google has one of the largest AI research programs on the planet, they just are extremely conservative about letting anyone try their models (due to reputational concerns, e.g., that generative AI will spit out racist output within about 30 seconds of people poking its guardrails). This blog post from September is instructive about the direction they're taking with what's called retrieval-augmented generation; see the paper linked from the post for details:

https://www.deepmind.com/blog/building-safer-dialogue-agents (DeepMind is part of Google)

That is likely to yield significantly more accurate answers than what ChatGPT is doing, and is difficult to replicate for folks like OpenAI without being dependent on the search APIs of big search companies. It's worth noting that Google has also started to incorporate language model tooling into how it's presenting search results (e.g., summarizing or highlighting different parts of a website to make the result snippet more useful).

A retrieval-augmented approach that leverages Wikidata could IMO be quite powerful and could be a useful research program for Wikimedia to pursue, be it independently or in partnership with others. The resulting technology should of course be fully open source.

Querying Wikidata via SPARQL is currently still a bit of wizardry (and the query builder is extremely limited). To pick a completely random example not at all inspired by current events, if I wanted to see a list of journalists with Mastodon accounts & a picture, I currently have to do this:

SELECT DISTINCT ?personLabel ?mastodonName ?pic WHERE { ?person wdt:P4033 ?mastodonName ; wdt:P106 ?occupation . OPTIONAL { ?person wdt:P18 ?pic . } ?occupation wdt:P279* wd:Q1930187 . SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } }

Make a small mistake (a curly brace missing) and you'll get a red error message. Forgot the * after wdt:P279? A different response set in ways that are difficult to spot or reason about.

Why can't I type "list of journalists with their picture and Mastodon account" as a natural language query? (You can try it in ChatGPT and it'll get you started, but it'll generate nonsense P/Q numbers.) If such queries could be produced reliably, it could be a very useful tool for readers as well.

Warmly, Erik _______________________________________________ Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

-- Boodarwun Gnangarra 'ngany dabakarn koorliny arn boodjera dardoon ngalang Nyungar koortaboodjar'

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

-- Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Pharos

20 Dec 20 Dec

7:24 p.m.

I'd like to share my experiment with using ChapGPT to help write an article on English Wikipedia:

https://en.wikipedia.org/wiki/Artwork_title

You can see an explanation of the process here, your comments are welcome:

https://en.wikipedia.org/wiki/Talk:Artwork_title#Use_of_ChatGPT

Thanks, Richard (User:Pharos)

On Tue, Dec 20, 2022 at 1:45 PM ZhaoFJx zhaofjx@gmail.com wrote:

...

I'd be curious about copyright issues though, as it's licensed on Github https://github.com/acheong08/ChatGPT/blob/main/LICENSE. It may indeed be infringing copyright, since he may require attribution. However, I agree with The Cunctator that the community won't do anything about it.

Incidentally, I'm a little worried that he might grab broken content that wasn't rolled back. After all - the current artificial intelligence can not reach the level of human thinking

Sincerely, ZhaoFJx _______________________________________________ Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l

Anders Wennersten mail@anderswennersten.se 于2022年12月19日周一 08:52写道：

...
I think we should not underestimate what this could evolve into. We thrived because our readers find us "good enough" when it comes to finding facts, not the ultimate source.

And the software learns by reading, and can (and have done so) Wikipedia, Wikidata etc and represent our data in its own syntax och present it in a way so it is not a direct copy. Perhaps data will be a bit delayed to the actual content in Wikipedia, but so what - good enough?

Anders

Den 2022-12-19 kl. 14:26, skrev Gnangarra:

AI simply cant descriminate between good research and faked research, for any outcome it must provide all of its sources whether they are from Wikipedia, Wikidata, WikiCommons, WikiSource or some other place. Otherwise it will answer yes to some asking if the world is flat because it'll seek out that answer and find all the nonsense that has been produced.

On Mon, 19 Dec 2022 at 06:02, Erik Moeller eloquence@gmail.com wrote:

...
On Sun, Dec 11, 2022 at 5:55 AM Anders Wennersten mail@anderswennersten.se wrote:

...
ChatGPT is now making headlines more or less every day and I perceive them to try to position themself av the "next" google.

I suspect OpenAI will continue to focus on generative applications (images, code, text for purposes such as copywriting, eventually music/video) and won't attempt to compete with Google directly, but we'll see. Currently GPT-3.5 (which ChatGPT is based on) is very prone to generating nonsensical answers, citations to works that don't exist, etc. But it is pretty cool if you keep its limitations in mind--for example, it's quite good at bootstrapping small scripts in various programming languages (with mistakes and idiosyncrasies).

Google has one of the largest AI research programs on the planet, they just are extremely conservative about letting anyone try their models (due to reputational concerns, e.g., that generative AI will spit out racist output within about 30 seconds of people poking its guardrails). This blog post from September is instructive about the direction they're taking with what's called retrieval-augmented generation; see the paper linked from the post for details:

https://www.deepmind.com/blog/building-safer-dialogue-agents (DeepMind is part of Google)

That is likely to yield significantly more accurate answers than what ChatGPT is doing, and is difficult to replicate for folks like OpenAI without being dependent on the search APIs of big search companies. It's worth noting that Google has also started to incorporate language model tooling into how it's presenting search results (e.g., summarizing or highlighting different parts of a website to make the result snippet more useful).

A retrieval-augmented approach that leverages Wikidata could IMO be quite powerful and could be a useful research program for Wikimedia to pursue, be it independently or in partnership with others. The resulting technology should of course be fully open source.

Querying Wikidata via SPARQL is currently still a bit of wizardry (and the query builder is extremely limited). To pick a completely random example not at all inspired by current events, if I wanted to see a list of journalists with Mastodon accounts & a picture, I currently have to do this:

SELECT DISTINCT ?personLabel ?mastodonName ?pic WHERE { ?person wdt:P4033 ?mastodonName ; wdt:P106 ?occupation . OPTIONAL { ?person wdt:P18 ?pic . } ?occupation wdt:P279* wd:Q1930187 . SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } }

Make a small mistake (a curly brace missing) and you'll get a red error message. Forgot the * after wdt:P279? A different response set in ways that are difficult to spot or reason about.

Why can't I type "list of journalists with their picture and Mastodon account" as a natural language query? (You can try it in ChatGPT and it'll get you started, but it'll generate nonsense P/Q numbers.) If such queries could be produced reliably, it could be a very useful tool for readers as well.

Warmly, Erik _______________________________________________ Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

-- Boodarwun Gnangarra 'ngany dabakarn koorliny arn boodjera dardoon ngalang Nyungar koortaboodjar'

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

-- Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Ziko van Dijk

22 Dec 22 Dec

6 p.m.

Hello, About my experiences with Chat CPT - is it good enough to write our articles? - I made this video: https://youtu.be/zKPEyxYt5kg Kind regards Ziko

Am Di., 20. Dez. 2022 um 20:25 Uhr schrieb Pharos pharosofalexandria@gmail.com:

...

I'd like to share my experiment with using ChapGPT to help write an article on English Wikipedia:

https://en.wikipedia.org/wiki/Artwork_title

You can see an explanation of the process here, your comments are welcome:

https://en.wikipedia.org/wiki/Talk:Artwork_title#Use_of_ChatGPT

Thanks, Richard (User:Pharos)

On Tue, Dec 20, 2022 at 1:45 PM ZhaoFJx zhaofjx@gmail.com wrote:

...
I'd be curious about copyright issues though, as it's licensed on Github. It may indeed be infringing copyright, since he may require attribution. However, I agree with The Cunctator that the community won't do anything about it.

Incidentally, I'm a little worried that he might grab broken content that wasn't rolled back. After all - the current artificial intelligence can not reach the level of human thinking

Sincerely, ZhaoFJx _______________________________________________ Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l

Anders Wennersten mail@anderswennersten.se 于2022年12月19日周一 08:52写道：

...
I think we should not underestimate what this could evolve into. We thrived because our readers find us "good enough" when it comes to finding facts, not the ultimate source.

And the software learns by reading, and can (and have done so) Wikipedia, Wikidata etc and represent our data in its own syntax och present it in a way so it is not a direct copy. Perhaps data will be a bit delayed to the actual content in Wikipedia, but so what - good enough?

Anders

Den 2022-12-19 kl. 14:26, skrev Gnangarra:

AI simply cant descriminate between good research and faked research, for any outcome it must provide all of its sources whether they are from Wikipedia, Wikidata, WikiCommons, WikiSource or some other place. Otherwise it will answer yes to some asking if the world is flat because it'll seek out that answer and find all the nonsense that has been produced.

On Mon, 19 Dec 2022 at 06:02, Erik Moeller eloquence@gmail.com wrote:

...
On Sun, Dec 11, 2022 at 5:55 AM Anders Wennersten mail@anderswennersten.se wrote:

...
ChatGPT is now making headlines more or less every day and I perceive them to try to position themself av the "next" google.

I suspect OpenAI will continue to focus on generative applications (images, code, text for purposes such as copywriting, eventually music/video) and won't attempt to compete with Google directly, but we'll see. Currently GPT-3.5 (which ChatGPT is based on) is very prone to generating nonsensical answers, citations to works that don't exist, etc. But it is pretty cool if you keep its limitations in mind--for example, it's quite good at bootstrapping small scripts in various programming languages (with mistakes and idiosyncrasies).

Google has one of the largest AI research programs on the planet, they just are extremely conservative about letting anyone try their models (due to reputational concerns, e.g., that generative AI will spit out racist output within about 30 seconds of people poking its guardrails). This blog post from September is instructive about the direction they're taking with what's called retrieval-augmented generation; see the paper linked from the post for details:

https://www.deepmind.com/blog/building-safer-dialogue-agents (DeepMind is part of Google)

That is likely to yield significantly more accurate answers than what ChatGPT is doing, and is difficult to replicate for folks like OpenAI without being dependent on the search APIs of big search companies. It's worth noting that Google has also started to incorporate language model tooling into how it's presenting search results (e.g., summarizing or highlighting different parts of a website to make the result snippet more useful).

A retrieval-augmented approach that leverages Wikidata could IMO be quite powerful and could be a useful research program for Wikimedia to pursue, be it independently or in partnership with others. The resulting technology should of course be fully open source.

Querying Wikidata via SPARQL is currently still a bit of wizardry (and the query builder is extremely limited). To pick a completely random example not at all inspired by current events, if I wanted to see a list of journalists with Mastodon accounts & a picture, I currently have to do this:

SELECT DISTINCT ?personLabel ?mastodonName ?pic WHERE { ?person wdt:P4033 ?mastodonName ; wdt:P106 ?occupation . OPTIONAL { ?person wdt:P18 ?pic . } ?occupation wdt:P279* wd:Q1930187 . SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } }

Make a small mistake (a curly brace missing) and you'll get a red error message. Forgot the * after wdt:P279? A different response set in ways that are difficult to spot or reason about.

Why can't I type "list of journalists with their picture and Mastodon account" as a natural language query? (You can try it in ChatGPT and it'll get you started, but it'll generate nonsense P/Q numbers.) If such queries could be produced reliably, it could be a very useful tool for readers as well.

Warmly, Erik _______________________________________________ Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

-- Boodarwun Gnangarra 'ngany dabakarn koorliny arn boodjera dardoon ngalang Nyungar koortaboodjar'

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

-- Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Kimmo Virtanen

11:40 p.m.

Hi,

I am not sure if the generated article text is how it will be used as long it will invent stuff from thin air. However, it is rather good for summarizing text and converting source text to a structured format. For example, converting Wikimedia Commons categories to wikidata properties.

Source file for the categories * https://commons.wikimedia.org/wiki/File:1150D_-_01,_Acervo_do_Museu_Paulista...

*Kimmo*

...

Describe "19th-century portrait photographs at bust length", "19th-century portrait photographs of men", "Albumen prints", "Photographia Americana", "19th-century photographs of Brazil", "Works by Militão A. de Azevedo at Museu Paulista". Use only a list of Wikidata properties and identifiers. Context is photography.

*ChatGPT*

...

Here is a list of Wikidata properties and identifiers that could be used to describe the following topics: "19th-century portrait photographs at bust length":

"point in time" (P585) - 19th century

"genre" (P136) - portrait photograph

"depicts" (P180) - bust

"medium" (P186) - photograph

"19th-century portrait photographs of men":

"point in time" (P585) - 19th century

"genre" (P136) - portrait photograph

"depicts" (P180) - men

"medium" (P186) - photograph

"Albumen prints":

"medium" (P186) - albumen print

"Photographia Americana":

"genre" (P136) - photography

"country" (P17) - United States of America

"19th-century photographs of Brazil":

"point in time" (P585) - 19th century

"genre" (P136) - photograph

"country" (P17) - Brazil

"Works by Militão A. de Azevedo at Museu Paulista":

"artist" (P170) - Militão A. de Azevedo

"location" (P276) - Museu Paulista

It fails with Photographia Americana which is photographic studio in São Paulo.

However in other ways the result was quite a good proxy for what the photo contains. Information could be used as filling the wikidata items of the commons categories, suggested values for the Structured data on commons values, multilingual search etc.

Br, -- Kimmo Virtanen, Zache

On Thu, Dec 22, 2022 at 8:06 PM Ziko van Dijk zvandijk@gmail.com wrote:

...

Hello, About my experiences with Chat CPT - is it good enough to write our articles? - I made this video: https://youtu.be/zKPEyxYt5kg Kind regards Ziko

Am Di., 20. Dez. 2022 um 20:25 Uhr schrieb Pharos pharosofalexandria@gmail.com:

...
I'd like to share my experiment with using ChapGPT to help write an

article on English Wikipedia:

...
https://en.wikipedia.org/wiki/Artwork_title

You can see an explanation of the process here, your comments are

welcome:

...
https://en.wikipedia.org/wiki/Talk:Artwork_title#Use_of_ChatGPT

Thanks, Richard (User:Pharos)

On Tue, Dec 20, 2022 at 1:45 PM ZhaoFJx zhaofjx@gmail.com wrote:

...
I'd be curious about copyright issues though, as it's licensed on

Github. It may indeed be infringing copyright, since he may require attribution. However, I agree with The Cunctator that the community won't do anything about it.

...
...
Incidentally, I'm a little worried that he might grab broken content

that wasn't rolled back. After all - the current artificial intelligence can not reach the level of human thinking

...
...
Sincerely, ZhaoFJx _______________________________________________ Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org,

guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l

...
...
Anders Wennersten mail@anderswennersten.se 于2022年12月19日周一 08:52写道：

...
I think we should not underestimate what this could evolve into. We

thrived because our readers find us "good enough" when it comes to finding facts, not the ultimate source.

...
...
...
And the software learns by reading, and can (and have done so)

Wikipedia, Wikidata etc and represent our data in its own syntax och present it in a way so it is not a direct copy. Perhaps data will be a bit delayed to the actual content in Wikipedia, but so what - good enough?

...
...
...
Anders

Den 2022-12-19 kl. 14:26, skrev Gnangarra:

AI simply cant descriminate between good research and faked research,

for any outcome it must provide all of its sources whether they are from Wikipedia, Wikidata, WikiCommons, WikiSource or some other place. Otherwise it will answer yes to some asking if the world is flat because it'll seek out that answer and find all the nonsense that has been produced.

...
...
...
On Mon, 19 Dec 2022 at 06:02, Erik Moeller eloquence@gmail.com

wrote:

...
...
...
...
On Sun, Dec 11, 2022 at 5:55 AM Anders Wennersten mail@anderswennersten.se wrote:

...
ChatGPT is now making headlines more or less every day and I

perceive

...
...
...
...
...
them to try to position themself av the "next" google.

I suspect OpenAI will continue to focus on generative applications (images, code, text for purposes such as copywriting, eventually music/video) and won't attempt to compete with Google directly, but we'll see. Currently GPT-3.5 (which ChatGPT is based on) is very prone to generating nonsensical answers, citations to works that don't exist, etc. But it is pretty cool if you keep its limitations in mind--for example, it's quite good at bootstrapping small scripts in various programming languages (with mistakes and idiosyncrasies).

Google has one of the largest AI research programs on the planet, they just are extremely conservative about letting anyone try their models (due to reputational concerns, e.g., that generative AI will spit out racist output within about 30 seconds of people poking its guardrails). This blog post from September is instructive about the direction they're taking with what's called retrieval-augmented generation; see the paper linked from the post for details:

https://www.deepmind.com/blog/building-safer-dialogue-agents

(DeepMind

...
...
...
...
is part of Google)

That is likely to yield significantly more accurate answers than what ChatGPT is doing, and is difficult to replicate for folks like OpenAI without being dependent on the search APIs of big search companies. It's worth noting that Google has also started to incorporate language model tooling into how it's presenting search results (e.g., summarizing or highlighting different parts of a website to make the result snippet more useful).

A retrieval-augmented approach that leverages Wikidata could IMO be quite powerful and could be a useful research program for Wikimedia to pursue, be it independently or in partnership with others. The resulting technology should of course be fully open source.

Querying Wikidata via SPARQL is currently still a bit of wizardry (and the query builder is extremely limited). To pick a completely random example not at all inspired by current events, if I wanted to see a list of journalists with Mastodon accounts & a picture, I currently have to do this:

SELECT DISTINCT ?personLabel ?mastodonName ?pic WHERE { ?person wdt:P4033 ?mastodonName ; wdt:P106 ?occupation . OPTIONAL { ?person wdt:P18 ?pic . } ?occupation wdt:P279* wd:Q1930187 . SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } }

Make a small mistake (a curly brace missing) and you'll get a red error message. Forgot the * after wdt:P279? A different response set in ways that are difficult to spot or reason about.

Why can't I type "list of journalists with their picture and Mastodon account" as a natural language query? (You can try it in ChatGPT and it'll get you started, but it'll generate nonsense P/Q numbers.) If such queries could be produced reliably, it could be a very useful tool for readers as well.

Warmly, Erik _______________________________________________ Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org,

guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l

...
...
...
...
Public archives at

https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/...

...
...
...
...
To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

-- Boodarwun Gnangarra 'ngany dabakarn koorliny arn boodjera dardoon ngalang Nyungar

koortaboodjar'

...
...
...

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org,

guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l

...
...
...
Public archives at

https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/...

...
...
...
To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org,

guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l

...
...
...
Public archives at

https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/...

...
...
...
To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

-- Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org,

guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l

...
...
Public archives at

https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/...

...
...
To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org,

guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l

...
...
Public archives at

https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/...

...
...
To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines

at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l

...
Public archives at

https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/...

...
To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Paulo Santos Perneta

23 Dec 23 Dec

1:03 p.m.

I've been playing with it on over the last week, and my experience is that that AI is incredibly prone to invent informations that look credible, while being entirely made up by it. It would generate inexistent persons, dates, events and places, and build an entire story out of them that, nevertheless, looks credible. I would advise anyone against using it for anything more than helping with paraphrasing already existing texts, when using it.

Just now. when asked about bishop Pedro Sardinha (first bishop of Brazil) he told me: It's possible that you are referring to Pedro da Silva Sardinha, also known as Father Francisco, who was a Portuguese Jesuit missionary who served as the bishop of Timbuktu in the early 18th century. Father Francisco was born in Lisbon, Portugal, and entered the Jesuit order as a young man. He was sent to West Africa as a missionary and served as the bishop of Timbuktu from 1705 to 1706. Father Francisco is known for his efforts to convert the local population to Christianity and for his work to improve the education and literacy of the people in Timbuktu. However, he died in 1706, apparently while being held captive by local authorities.

This person never existed, nor those events, it was entirely made up by the AI, but looks credible. Often it says the information is based on its research on reliable sources, even presenting some of them (but also fake).

Yesterday, when asked about the Quintinense Scheme, a well known sockpuppet affair of the Portuguese Wikipedia, it said it was an arrangement by King Manuel I of Portugal in the early 16th century, concerning the way the lands should be cultivated in the colonies, named after Quinta do Lorde, which it said was the first settlement in Madeira (it's actually a tourist resort built in the late 1990s). I'm constantly surprised by the AI capacity of inventing such elaborate fakes.

In the case of Pharos experiment, this AI generated information, which still persists (unsourced) in the article, is probably fake as well: The practice of titling artworks dates back to ancient civilizations, where works of art were often inscribed with the name of the artist and the subject of the piece.

Best,

Paulo

Ziko van Dijk zvandijk@gmail.com escreveu no dia quinta, 22/12/2022 à(s) 18:01:

...

Hello, About my experiences with Chat CPT - is it good enough to write our articles? - I made this video: https://youtu.be/zKPEyxYt5kg Kind regards Ziko

Am Di., 20. Dez. 2022 um 20:25 Uhr schrieb Pharos pharosofalexandria@gmail.com:

...
I'd like to share my experiment with using ChapGPT to help write an

article on English Wikipedia:

...
https://en.wikipedia.org/wiki/Artwork_title

You can see an explanation of the process here, your comments are

welcome:

...
https://en.wikipedia.org/wiki/Talk:Artwork_title#Use_of_ChatGPT

Thanks, Richard (User:Pharos)

On Tue, Dec 20, 2022 at 1:45 PM ZhaoFJx zhaofjx@gmail.com wrote:

...
I'd be curious about copyright issues though, as it's licensed on

Github. It may indeed be infringing copyright, since he may require attribution. However, I agree with The Cunctator that the community won't do anything about it.

...
...
Incidentally, I'm a little worried that he might grab broken content

that wasn't rolled back. After all - the current artificial intelligence can not reach the level of human thinking

...
...
Sincerely, ZhaoFJx _______________________________________________ Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org,

guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l

...
...
Anders Wennersten mail@anderswennersten.se 于2022年12月19日周一 08:52写道：

...
I think we should not underestimate what this could evolve into. We

thrived because our readers find us "good enough" when it comes to finding facts, not the ultimate source.

...
...
...
And the software learns by reading, and can (and have done so)

Wikipedia, Wikidata etc and represent our data in its own syntax och present it in a way so it is not a direct copy. Perhaps data will be a bit delayed to the actual content in Wikipedia, but so what - good enough?

...
...
...
Anders

Den 2022-12-19 kl. 14:26, skrev Gnangarra:

AI simply cant descriminate between good research and faked research,

for any outcome it must provide all of its sources whether they are from Wikipedia, Wikidata, WikiCommons, WikiSource or some other place. Otherwise it will answer yes to some asking if the world is flat because it'll seek out that answer and find all the nonsense that has been produced.

...
...
...
On Mon, 19 Dec 2022 at 06:02, Erik Moeller eloquence@gmail.com

wrote:

...
...
...
...
On Sun, Dec 11, 2022 at 5:55 AM Anders Wennersten mail@anderswennersten.se wrote:

...
ChatGPT is now making headlines more or less every day and I

perceive

...
...
...
...
...
them to try to position themself av the "next" google.

I suspect OpenAI will continue to focus on generative applications (images, code, text for purposes such as copywriting, eventually music/video) and won't attempt to compete with Google directly, but we'll see. Currently GPT-3.5 (which ChatGPT is based on) is very prone to generating nonsensical answers, citations to works that don't exist, etc. But it is pretty cool if you keep its limitations in mind--for example, it's quite good at bootstrapping small scripts in various programming languages (with mistakes and idiosyncrasies).

Google has one of the largest AI research programs on the planet, they just are extremely conservative about letting anyone try their models (due to reputational concerns, e.g., that generative AI will spit out racist output within about 30 seconds of people poking its guardrails). This blog post from September is instructive about the direction they're taking with what's called retrieval-augmented generation; see the paper linked from the post for details:

https://www.deepmind.com/blog/building-safer-dialogue-agents

(DeepMind

...
...
...
...
is part of Google)

That is likely to yield significantly more accurate answers than what ChatGPT is doing, and is difficult to replicate for folks like OpenAI without being dependent on the search APIs of big search companies. It's worth noting that Google has also started to incorporate language model tooling into how it's presenting search results (e.g., summarizing or highlighting different parts of a website to make the result snippet more useful).

A retrieval-augmented approach that leverages Wikidata could IMO be quite powerful and could be a useful research program for Wikimedia to pursue, be it independently or in partnership with others. The resulting technology should of course be fully open source.

Querying Wikidata via SPARQL is currently still a bit of wizardry (and the query builder is extremely limited). To pick a completely random example not at all inspired by current events, if I wanted to see a list of journalists with Mastodon accounts & a picture, I currently have to do this:

SELECT DISTINCT ?personLabel ?mastodonName ?pic WHERE { ?person wdt:P4033 ?mastodonName ; wdt:P106 ?occupation . OPTIONAL { ?person wdt:P18 ?pic . } ?occupation wdt:P279* wd:Q1930187 . SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } }

Make a small mistake (a curly brace missing) and you'll get a red error message. Forgot the * after wdt:P279? A different response set in ways that are difficult to spot or reason about.

Why can't I type "list of journalists with their picture and Mastodon account" as a natural language query? (You can try it in ChatGPT and it'll get you started, but it'll generate nonsense P/Q numbers.) If such queries could be produced reliably, it could be a very useful tool for readers as well.

Warmly, Erik _______________________________________________ Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org,

guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l

...
...
...
...
Public archives at

https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/...

...
...
...
...
To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

-- Boodarwun Gnangarra 'ngany dabakarn koorliny arn boodjera dardoon ngalang Nyungar

koortaboodjar'

...
...
...

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org,

guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l

...
...
...
Public archives at

https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/...

...
...
...
To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org,

guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l

...
...
...
Public archives at

https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/...

...
...
...
To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

-- Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org,

guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l

...
...
Public archives at

https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/...

...
...
To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org,

guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l

...
...
Public archives at

https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/...

...
...
To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines

at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l

...
Public archives at

https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/...

...
To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Chris Keating

3:29 p.m.

On Fri, Dec 23, 2022 at 1:04 PM Paulo Santos Perneta < paulosperneta@gmail.com> wrote:

...

I've been playing with it on over the last week, and my experience is that that AI is incredibly prone to invent informations that look credible, while being entirely made up by it. It would generate inexistent persons, dates, events and places, and build an entire story out of them that, nevertheless, looks credible.

Indeed; here is another example, after prompting to write an article about the song "Tomorrow Belongs to Me", which was written in 1966 for the Broadway musical "Cabaret". ChatGPT attributes it, entirely fictionally, to a real 1920s German lyricist.

"Tomorrow Belongs to Me" is a song that has been used in a variety of media, including film, television, and theater. The song is known for its nationalist and anti-Semitic lyrics, which have made it controversial and a subject of debate.

The song was originally written by the German composer and lyricist Hans Leip in the 1920s. It was later adopted by the National Socialist German Workers' Party (Nazi Party) as a propaganda song, and was performed at party rallies and events.

In the United States, "Tomorrow Belongs to Me" was popularized by its use in the film "Cabaret," which was released in 1972. In the film, the song is sung by a young man at a Nazi rally, and is later performed by the character of Sally Bowles, played by Liza Minnelli. The song has since been used in a number of other films, television shows, and stage productions, often as a means of referencing the Nazi party or the Holocaust. Despite its widespread use, "Tomorrow Belongs to Me" has been criticized for its nationalist and anti-Semitic themes, and some have called for it to be banned or censored. The song has also been the subject of legal disputes, with some arguing that its use in certain contexts may infringe on copyright laws or be considered hate speech.

Anders Wennersten

25 Dec 25 Dec

9 a.m.

I have also played around with simpler queries (see below) and find the same: excellent, easy to read answers, with minor mistakes and sometimes false, invented info.

But I still see this as the future. If/when this is implemented with speech recognition (already exists) and good synthetic speech, you have a new interface for users that will take on the world. For me the only question is of Google come first (who has better knowledge how to interface backend knowledge repositories the Wikipedia will become) or if chatGPT will learn this, that seems for me to be a minor thing for them compared with getting the answering logic in place that they have mastered.

And I still believe is is vital for us to understand how this future will affect or systems as well as how we will need to work in order to still provide knowledge, even if it is to a repository and not the world directly.

My test "Where lies Farsta?"

"Farsta is a suburb located in the southeastern part of Stockholm, the capital city of Sweden. It is situated in the Söderort borough of Stockholm and is known for its shopping center and recreational areas. Farsta is easily accessible by public transport, with several bus and metro lines passing through the area. The suburb is also home to Farsta Strand, a beach and recreational area that attracts visitors during the summer months."

*that söderort actually is not a borough is not of vital importance to the reader.

*the last sentence and to call it recreational areas is just wrong. Farsta strand is called a satellite town i enwp and is primary a built up area located close by a smaller lake. There exist a small beach "farsta strandbad" for local residents, but there are 5-6 better beaches at cleaner lakes within a 5 km radius. And not a recreation area and certainly not attracting visitors during the summer months from outside this place (there are many excellent recreational areas very close by)

Anders

Den 2022-12-23 kl. 16:29, skrev Chris Keating:

...

On Fri, Dec 23, 2022 at 1:04 PM Paulo Santos Perneta paulosperneta@gmail.com wrote:
I've been playing with it on over the last week, and my experience
is that that AI is incredibly prone to invent informations that
look credible, while being entirely made up by it. It would
generate inexistent persons, dates, events and places, and build
an entire story out of them that, nevertheless, looks credible.
Indeed; here is another example, after prompting to write an article about the song "Tomorrow Belongs to Me", which was written in 1966 for the Broadway musical "Cabaret". ChatGPT attributes it, entirely fictionally, to a real 1920s German lyricist.

"Tomorrow Belongs to Me" is a song that has been used in a variety of media, including film, television, and theater. The song is known for its nationalist and anti-Semitic lyrics, which have made it controversial and a subject of debate.

The song was originally written by the German composer and lyricist Hans Leip in the 1920s. It was later adopted by the National Socialist German Workers' Party (Nazi Party) as a propaganda song, and was performed at party rallies and events.

In the United States, "Tomorrow Belongs to Me" was popularized by its use in the film "Cabaret," which was released in 1972. In the film, the song is sung by a young man at a Nazi rally, and is later performed by the character of Sally Bowles, played by Liza Minnelli. The song has since been used in a number of other films, television shows, and stage productions, often as a means of referencing the Nazi party or the Holocaust.

Despite its widespread use, "Tomorrow Belongs to Me" has been criticized for its nationalist and anti-Semitic themes, and some have called for it to be banned or censored. The song has also been the subject of legal disputes, with some arguing that its use in certain contexts may infringe on copyright laws or be considered hate speech.

Wikimedia-l mailing list --wikimedia-l@lists.wikimedia.org, guidelines at:https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines andhttps://meta.wikimedia.org/wiki/Wikimedia-l Public archives athttps://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email towikimedia-l-leave@lists.wikimedia.org

Erik Moeller

9:20 p.m.

On Sun, Dec 25, 2022 at 1:00 AM Anders Wennersten mail@anderswennersten.se wrote:

...

For me the only question is of Google come first (who has better knowledge how to interface backend knowledge repositories the Wikipedia will become) or if chatGPT will learn this

No speech interface as far as I can tell, but FYI, there now is at least one search engine that already integrates a language model based chatbot into search: https://you.com/, which has the backing from Salesforce founder & billionaire Marc Benioff (a bit more: https://www.protocol.com/you-dot-com-benioff). Unlike ChatGPT, it tries to directly cite web sources. When that source is Wikipedia, you'll note it's basically rewriting/summarizing the Wikipedia article. I don't know if it uses GPT underneath or its own language model; Salesforce has certainly funded the creation of models of its own.

When I asked You.com if it uses GPT-3, it said yes. When I asked it to provide a source, it generated a URL that does not exist.

I also observed other failure modes, such as combining multiple persons with the same name into one, or giving directly contradictory answers to the same question being asked repeatedly. All of these failure modes are characteristic of language models, which are a bit like pinball machines in that they will generate results nondeterministically from the training data.

Of course, this is the technology as it exists today, and even with those limitations in mind it can prove useful (though it seems irresponsible to market it as part of a search engine in its current form).

Warmly, Erik

Kimmo Virtanen

10:01 p.m.

...

No speech interface as far as I can tell, but FYI, there now is at....

Emerson by Quickchat has a speech interface integration to GPT models. - https://www.quickchat.ai/emerson https://www.youtube.com/redirect?event=video_description&redir_token=QUFFLUhqbm5BaEVPdXdIVGFveGNTb3MxTG9vcnlOelFKZ3xBQ3Jtc0ttRWJnZGt4dWJqWHpKLUtRM1REZmtvd2FwWWk3YmtteDJrRHNMZHVJdjdRYmtwdmg2UmdHTGd3OWZOYmRnWUJEY09LSk1YSms0Q3NfOW11aWVlbmpBY2JiTUxjcldSOGhmUmZRamprVXFISWR5Yy1yNA&q=https%3A%2F%2Fwww.quickchat.ai%2Femerson&v=5DBXZRZEBGM

Some sample videos by Dr Alan D. Thompson.

Leta, GPT-3 AI - Episode 1 (Five things, Art, Seeing, Round) - Conversations and talking with GPT3 - https://www.youtube.com/watch?v=5DBXZRZEBGM

Leta, GPT-3 AI - Episode 10 (GPT-J, GPT-3, GPT-2 questions, facts, general knowledge) - https://www.youtube.com/watch?v=V0pceNYgELE

Br, -- Kimmo Virtanen, Zache

On Sun, Dec 25, 2022 at 11:22 PM Erik Moeller eloquence@gmail.com wrote:

...

On Sun, Dec 25, 2022 at 1:00 AM Anders Wennersten mail@anderswennersten.se wrote:

...
For me the only question is of Google come first (who has better

knowledge how to interface

...
backend knowledge repositories the Wikipedia will become) or if chatGPT

will learn this

No speech interface as far as I can tell, but FYI, there now is at least one search engine that already integrates a language model based chatbot into search: https://you.com/, which has the backing from Salesforce founder & billionaire Marc Benioff (a bit more: https://www.protocol.com/you-dot-com-benioff). Unlike ChatGPT, it tries to directly cite web sources. When that source is Wikipedia, you'll note it's basically rewriting/summarizing the Wikipedia article. I don't know if it uses GPT underneath or its own language model; Salesforce has certainly funded the creation of models of its own.

When I asked You.com if it uses GPT-3, it said yes. When I asked it to provide a source, it generated a URL that does not exist.

I also observed other failure modes, such as combining multiple persons with the same name into one, or giving directly contradictory answers to the same question being asked repeatedly. All of these failure modes are characteristic of language models, which are a bit like pinball machines in that they will generate results nondeterministically from the training data.

Of course, this is the technology as it exists today, and even with those limitations in mind it can prove useful (though it seems irresponsible to market it as part of a search engine in its current form).

Warmly, Erik _______________________________________________ Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/... To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org

Erik Moeller

24 Dec 24 Dec

2:16 a.m.

On Sun, Dec 18, 2022 at 2:01 PM Erik Moeller eloquence@gmail.com wrote:

...

Querying Wikidata via SPARQL is currently still a bit of wizardry (and the query builder is extremely limited). To pick a completely random example not at all inspired by current events, if I wanted to see a list of journalists with Mastodon accounts & a picture, I currently have to do this:

In case folks find it useful, that same query is now available (along with other profession-based Wikidata searches) here: https://eloquence.github.io/fediscope/

To bring it back on-topic, much of the coding and data processing was aided by ChatGPT ;)

Warmly, Erik

744

Age (days ago)

759

Last active (days ago)

wikimedia-l@lists.wikimedia.org

17 comments

12 participants

tags (0)

participants (12)

Ad Huikeshoven
Anders Wennersten
Chris Keating
Erik Moeller
Gnangarra
Kim Bruning
Kimmo Virtanen
Paulo Santos Perneta
Pharos
The Cunctator
ZhaoFJx
Ziko van Dijk