Cloud computing is a significant component of the cost of training and operating modern AI systems. As a non-profit organization, the Wikimedia Foundation might therefore be interested in the National Research Cloud (NRC) policy proposal: https://hai.stanford.edu/policy/national-research-cloud .

"Artificial intelligence requires vast amounts of computing power, data, and expertise to train and deploy the massive machine learning models behind the most advanced research. But access is increasingly out of reach for most colleges and universities. A National Research Cloud (NRC) would provide academic and non-profit researchers with the compute power and government datasets needed for education and research. By democratizing access and equity for all colleges and universities, an NRC has the potential not only to unleash a string of advancements in AI, but to help ensure the U.S. maintains its leadership and competitiveness on the global stage.

"Throughout 2020, Stanford HAI led efforts with 22 top computer science universities along with a bipartisan, bicameral group of lawmakers proposing legislation to bring the NRC to fruition. On January 1, 2021, the U.S. Congress authorized the National AI Research Resource Task Force Act as part of the National Defense Authorization Act for Fiscal Year 2021. This law requires that a federal task force be established to study and provide an implementation pathway to create world-class computational resources and robust government datasets for researchers across the country in the form of a National Research Cloud. The task force will issue a final report to the President and Congress next year.

"The promise of an NRC is to democratize AI research, education, and innovation, making it accessible to all colleges and universities across the country. Without a National Research Cloud, all but the most elite universities risk losing the ability to conduct meaningful AI research and to adequately educate the next generation of AI researchers."

See also: [1][2]

[1] https://www.whitehouse.gov/ostp/news-updates/2023/01/24/national-artificial-intelligence-research-resource-task-force-releases-final-report/
[2] https://www.ai.gov/wp-content/uploads/2023/01/NAIRR-TF-Final-Report-2023.pdf

From: Steven Walling <steven.walling@gmail.com>
Sent: Saturday, February 4, 2023 1:59 AM
To: Wikimedia Mailing List <wikimedia-l@lists.wikimedia.org>
Subject: [Wikimedia-l] Re: Chat GPT

On Fri, Feb 3, 2023 at 9:47 PM Gergő Tisza <gtisza@gmail.com> wrote:
Just to give a sense of scale: OpenAI started with a $1 billion donation, got another $1B as investment, and is now getting a larger investment from Microsoft (undisclosed but rumored to be $10B). Assuming they spent most of their previous funding, which seems likely, their operational costs are in the ballpark of $300 million per year. The idea that the WMF could just choose to create conversational software of a similar quality if it wanted seems detached from reality to me.

Without spending billions on LLM development to aim for a conversational chatbot trying to pass a Turing test, we could definitely try to catch up to the state of the art in search results. Our search currently does a pretty bad job (in terms of recall especially). Today's featured article in English is the Hot Chip album "Made in the Dark", and if I enter anything but the exact article title the typeahead results are woefully incomplete or wrong. If I ask an actual question, good luck. 

Google is feeling vulnerable to OpenAI here in part because everyone can see that their results are often full of low quality junk created for SEO, while ChatGPT just gives a concise answer right there. 

https://en.wikipedia.org/wiki/The_Menu_(2022_film) is one of the top viewed English articles. If I search "The Menu reviews" the Google results are noisy and not so great. ChatGPT actually gives you nothing relevant because it doesn't know anything from 2022. If we could just manage to display a three-sentence snippet from the critical response section of our article, it would be awesome. It's too bad that the whole "knowledge engine" debacle poisoned the well when it comes to a Wikipedia search engine, because we could definitely do a lot to learn from what people like about ChatGPT and apply it to Wikipedia search.
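
To make the snippet idea concrete, here is a rough sketch in Python of pulling the first three sentences out of a named section. It is only an illustration under assumptions: the function name section_snippet is made up, the input is assumed to be plain article text with wikitext-style "== Heading ==" lines, the sentence splitting is deliberately naive, and the sample text is invented placeholder content rather than anything from the real article. It is not how Wikipedia search works today.

```python
import re


def section_snippet(text, heading, max_sentences=3):
    """Return the first few sentences of the section titled `heading`, or None."""
    # Split on wikitext-style heading lines such as "== Critical response ==".
    # Because the pattern has one capture group, re.split returns
    # [lead, heading1, body1, heading2, body2, ...].
    parts = re.split(r"^==+[ \t]*(.+?)[ \t]*==+[ \t]*$", text, flags=re.MULTILINE)
    for title, body in zip(parts[1::2], parts[2::2]):
        if title.lower() == heading.lower():
            # Naive sentence split: break after ., ! or ? followed by whitespace.
            # Abbreviations like "Dr. Smith" would be split incorrectly.
            sentences = re.split(r"(?<=[.!?])\s+", body.strip())
            return " ".join(sentences[:max_sentences])
    return None


# Invented placeholder text, not real article content.
sample = (
    "Lead paragraph of the article.\n"
    "== Plot ==\n"
    "Something happens. Then something else happens.\n"
    "== Critical response ==\n"
    "First sentence. Second sentence. Third sentence. Fourth sentence.\n"
)

print(section_snippet(sample, "Critical response"))
```

A real version would also have to work out which section a query like "The Menu reviews" is asking about, which is the harder part of the problem.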

Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/6OBPB7WNHKJQXXIBCK73SDXLE3DMGNMY/