From what I have seen, the AIs are not great at citing sources. If they start citing
reliable sources, their contributions can be verified, or not. If they produce verifiable,
adequately sourced, well-written information, are they a problem or a solution?
Cheers,
Peter
From: Gnangarra [mailto:gnangarra@gmail.com]
Sent: 04 February 2023 17:04
To: Wikimedia Mailing List
Subject: [Wikimedia-l] Re: Chat GPT
I see our biggest challenge is going to be detecting when these AI tools are adding
content, whether media or articles, along with identifying when they are in use by
sources. The failing of all new AI is not in its ability but in the lack of transparency:
readers cannot identify when it has been used. We have seen people impersonating musicians
and writing songs in their style. We have also seen pictures created by copying someone
else's work without acknowledging that they are derivative in any way. Our big problems
will be in ensuring that copyright is respected legally, and in not hosting anything that
is even remotely dubious.
On Sat, 4 Feb 2023 at 22:24, Adam Sobieski <adamsobieski(a)hotmail.com> wrote:
Brainstorming on how to drive traffic to Wikimedia content from conversational media,
UI/UX designers could provide menu items or buttons on chatbots' applications or
webpage components (e.g., to read more about the content, to navigate to cited resources,
to edit the content, to discuss the content, to upvote/downvote the content, to share the
content or the recent dialogue history on social media, to request
review/moderation/curation for the content, etc.). Many of these envisioned menu items or
buttons would operate contextually during dialogues, upon the most recent (or otherwise
selected) responses provided by the chatbot or upon the recent transcripts. Some of these
features could also be made available to end-users via spoken-language commands.
At any point during hypertext-based dialogues, end-users would be able to navigate to
Wikimedia content. These navigations could utilize either URL query-string arguments or
HTTP POST. In either case, bulk usage data, e.g., the dialogue contexts navigated from,
could be useful.
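As a minimal sketch of the idea above: a chatbot could deep-link into Wikipedia while carrying usage context in the query string. The parameter names here ("utm_source", "dialogue_ctx") are hypothetical illustrations, not an existing Wikimedia convention:

```python
from urllib.parse import urlencode

def wikipedia_deep_link(title: str, context: dict) -> str:
    """Build a link from a chatbot response to the relevant article.

    The extra query-string parameters carry information about the
    originating dialogue, so bulk usage data (which dialogue contexts
    readers navigate from) could be aggregated on the receiving end.
    """
    base = "https://en.wikipedia.org/wiki/" + title.replace(" ", "_")
    return f"{base}?{urlencode(context)}"

# A hypothetical click-through from a chatbot turn about Hot Chip:
link = wikipedia_deep_link("Hot Chip", {"utm_source": "chatbot", "dialogue_ctx": "abc123"})
```

An HTTP POST variant would keep the context out of the visible URL, at the cost of links that cannot be bookmarked or shared.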
The capability to perform A/B testing across chatbots’ dialogues, over large populations
of end-users, could also be useful. In this way, Wikimedia would be better able to: (1)
measure end-user engagement and satisfaction, (2) measure the quality of provided content,
(3) perform personalization, and (4) retain readers and editors. A/B testing could be
performed by providing end-users with various feedback buttons (as described above). A/B
testing data could also be obtained through data mining, analyzing end-users’ behaviors,
response times, responses, and dialogue moves. These data could be provided for the
community at special pages and could be made available per article, possibly by enhancing
the “Page information” system. One can also envision these kinds of analytics data
existing at the granularity of portions of, or selections of, articles.
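The A/B testing described above needs stable assignment of end-users to variants so that engagement metrics compare consistent populations. A common way to do this is deterministic hashing; this is a generic sketch, not an existing Wikimedia mechanism:

```python
import hashlib

def ab_bucket(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically assign a user to an experiment variant.

    Hashing (experiment, user) means the same user always sees the
    same dialogue variant, without storing any per-user state, and
    different experiments bucket the same user independently.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]
```

Feedback-button clicks and behavioral signals could then be aggregated per bucket to compare variants.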
Best regards,
Adam
_____
From: Victoria Coleman <vstavridoucoleman(a)gmail.com>
Sent: Saturday, February 4, 2023 8:10 AM
To: Wikimedia Mailing List <wikimedia-l(a)lists.wikimedia.org>
Subject: [Wikimedia-l] Re: Chat GPT
Hi Christophe,
I had not thought about the threat to Wikipedia traffic from Chat GPT but you have a good
point. The success of the projects is always one step away from the next big disruption.
So the WMF, as the tech provider for the mission (because first and foremost in my view
that's what the WMF is, as well as the financial engine of the movement of course), needs
to pay attention and experiment to maintain the long-term viability of the mission. In
fact I think the cluster of our projects offers compelling options. For example, to your
point below on data sets, we have the amazing Wikidata as well as the excellent work on
Abstract Wikipedia. We have Wikimedia Enterprise, which has built some avenues of
collaboration with big tech. A bold vision is needed to bring all of it together and build
an MVP for the community to experiment with.
Best regards,
Victoria Coleman
On Feb 4, 2023, at 4:14 AM, Christophe Henner <christophe.henner(a)gmail.com> wrote:
Hi,
On the product side, my biggest concern with NLP-based AI is that it would drastically
decrease traffic to our websites and apps, which means fewer new editors and fewer donations.
So first from a strictly positioning perspective, we have here a major change that needs
to be managed.
And to be honest, it will come faster than we think. We are perfectionists; I can assure
you, most companies would be happy to launch a search product with an 80% confidence in
answer quality.
From a financial perspective, large industrial investments like this are usually a pool of
money you can draw from over x years. You can expect they have not drawn all of it yet.
Second, GPT 3 and ChatGPT are far from being the most expensive products they have. On top
of people you need:
* datasets
* people to tag the dataset
* people to correct the algo
* computing power
I am simplifying here, but we already have the capacity to muster some of that, which
drastically lowers our costs :)
I would not dismiss so easily the option of the movement doing it. That being said, it
would mean a new project requiring substantial resources.
Sent from my iPhone
On Feb 4, 2023, at 9:30 AM, Adam Sobieski <adamsobieski(a)hotmail.com> wrote:
With respect to cloud computing costs, these being a significant component of the costs to
train and operate modern AI systems, as a non-profit organization, the Wikimedia
Foundation might be interested in the National Research Cloud (NRC) policy proposal:
https://hai.stanford.edu/policy/national-research-cloud .
"Artificial intelligence requires vast amounts of computing power, data, and
expertise to train and deploy the massive machine learning models behind the most advanced
research. But access is increasingly out of reach for most colleges and universities. A
National Research Cloud (NRC) would provide academic and non-profit researchers with the
compute power and government datasets needed for education and research. By democratizing
access and equity for all colleges and universities, an NRC has the potential not only to
unleash a string of advancements in AI, but to help ensure the U.S. maintains its
leadership and competitiveness on the global stage.
"Throughout 2020, Stanford HAI led efforts with 22 top computer science universities
along with a bipartisan, bicameral group of lawmakers proposing legislation to bring the
NRC to fruition. On January 1, 2021, the U.S. Congress authorized the National AI Research
Resource Task Force Act as part of the National Defense Authorization Act for Fiscal Year
2021. This law requires that a federal task force be established to study and provide an
implementation pathway to create world-class computational resources and robust government
datasets for researchers across the country in the form of a National Research Cloud. The
task force will issue a final report to the President and Congress next year.
"The promise of an NRC is to democratize AI research, education, and innovation,
making it accessible to all colleges and universities across the country. Without a
National Research Cloud, all but the most elite universities risk losing the ability to
conduct meaningful AI research and to adequately educate the next generation of AI
researchers."
See also: [1][2]
[1]
https://www.whitehouse.gov/ostp/news-updates/2023/01/24/national-artificial…
[2]
https://www.ai.gov/wp-content/uploads/2023/01/NAIRR-TF-Final-Report-2023.pdf
_____
From: Steven Walling <steven.walling(a)gmail.com>
Sent: Saturday, February 4, 2023 1:59 AM
To: Wikimedia Mailing List <wikimedia-l(a)lists.wikimedia.org>
Subject: [Wikimedia-l] Re: Chat GPT
On Fri, Feb 3, 2023 at 9:47 PM Gergő Tisza <gtisza(a)gmail.com> wrote:
Just to give a sense of scale: OpenAI started with a $1 billion donation, got another $1B
as investment, and is now getting a larger investment from Microsoft (undisclosed but
rumored to be $10B). Assuming they spent most of their previous funding, which seems
likely, their operational costs are in the ballpark of $300 million per year. The idea
that the WMF could just choose to create conversational software of a similar quality if
it wanted seems detached from reality to me.
Without spending billions on LLM development to aim for a conversational chatbot trying to
pass a Turing test, we could definitely try to catch up to the state of the art in search
results. Our search currently does a pretty bad job (in terms of recall especially).
Today's featured article in English is the Hot Chip album "Made in the
Dark", and if I enter anything but the exact article title the typeahead results are
woefully incomplete or wrong. If I ask an actual question, good luck.
Google is feeling vulnerable to OpenAI here in part because everyone can see that their
results are often full of low quality junk created for SEO, while ChatGPT just gives a
concise answer right there.
https://en.wikipedia.org/wiki/The_Menu_(2022_film) is one of the top viewed English
articles. If I search "The Menu reviews" the Google results are noisy and not so
great. ChatGPT actually gives you nothing relevant because it doesn't know anything
from 2022. If we could just manage to display a three-sentence snippet from the critical
response section of our article, it would be awesome. It's too bad that the whole
"knowledge engine" debacle poisoned the well when it comes to a Wikipedia search engine,
because we could definitely learn a lot from what people like about ChatGPT and apply it
to Wikipedia search.
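For what it's worth, a short plain-text extract of an article's lead can already be requested through the MediaWiki Action API's TextExtracts extension. A minimal sketch that just builds the request URL (the actual fetch and error handling are omitted):

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def snippet_request_url(title: str, sentences: int = 3) -> str:
    """URL for a TextExtracts query: a plain-text extract of an
    article, limited to the first few sentences of the lead."""
    params = {
        "action": "query",
        "prop": "extracts",
        "titles": title,
        "exsentences": sentences,  # 1-10 sentences
        "explaintext": 1,          # strip HTML from the extract
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"

url = snippet_request_url("The Menu (2022 film)")
```

Surfacing something like this directly in search results would be a much smaller lift than an LLM.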
_______________________________________________
Wikimedia-l mailing list -- wikimedia-l(a)lists.wikimedia.org, guidelines at:
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org…
To unsubscribe send an email to wikimedia-l-leave(a)lists.wikimedia.org
--
Boodarwun
Gnangarra
'ngany dabakarn koorliny arn boodjera dardoon ngalang Nyungar koortaboodjar'