[Wikitech-l] Re: ORES To Lift Wing Migration

8 Aug 2023

Hi Chris,

On Mon, Aug 7, 2023 at 11:51 AM Chris Albon <calbon(a)wikimedia.org> wrote:

...
  Hi Tilman,

 Most of the work is still very experimental. We have hosted a few LLMs on
 Lift Wing already (StarCoder for example) but they were just running on
 CPU, far too slow for real use cases. But it proves that we can easily host
 LLMs on Lift Wing. We have been pretty quiet about it while we focus on the
 ORES migration, but it is our next big project. More soon hopefully!

Understood. Looking forward to learning more later!

...
 Where we are now is that we have budget for a big GPU purchase (~10-20
 GPUs depending on cost). The question we will try to answer after the ORES
 migration is complete is: what GPUs should we purchase? We are trying to
 balance our strong preference to stay open source (i.e. AMD ROCm) in a
 world dominated by a single closed source vendor (i.e. Nvidia). In
 addition, do we go for a few expensive GPUs better suited to LLMs (A100,
 H100, etc) or a mix of big and small? We will need to figure out all this.

I see. On that matter, what do you folks make of the recent announcements
of AMD's partnerships with Hugging Face and PyTorch[5]? (which, I
understand, came after the ML team had already launched the aforementioned
new AMD explorations)

"Open-source AI: AMD looks to Hugging Face and Meta spinoff PyTorch to take
on Nvidia [...]
Both partnerships involve AMD’s ROCm AI software stack, the company’s
answer to Nvidia’s proprietary CUDA platform and application-programming
interface. AMD called ROCm an open and portable AI system with
out-of-the-box support that can port to existing AI models. [...B]oth AMD
and Hugging Face are dedicating engineering resources to each other and
sharing data to ensure that the constantly updated AI models from Hugging
Face, which might not otherwise run well on AMD hardware, would be
“guaranteed” to work on hardware like the MI300X. [...] AMD said PyTorch
will fully upstream the ROCm software stack and “provide immediate ‘day
zero’ support for PyTorch 2.0 with ROCm release 5.4.2 on all AMD Instinct
accelerators,” which is meant to appeal to those customers looking to
switch from Nvidia’s software ecosystem."

In their own announcement, Hugging Face offered further details, including
a pretty impressive list of models to be supported:[6]

"We intend to support state-of-the-art transformer architectures for
natural language processing, computer vision, and speech, such as BERT,
DistilBERT, ROBERTA, Vision Transformer, CLIP, and Wav2Vec2. Of course,
generative AI models will be available too (e.g., GPT2, GPT-NeoX, T5, OPT,
LLaMA), including our own BLOOM and StarCoder models. Lastly, we will also
support more traditional computer vision models, like ResNet and ResNext,
and deep learning recommendation models, a first for us. [..] We'll do our
best to test and validate these models for PyTorch, TensorFlow, and ONNX
Runtime for the above platforms. [...] We will integrate the AMD ROCm SDK
seamlessly in our open-source libraries, starting with the transformers
library."

Do you think this may promise too much, or could it point to a possible
solution to the Foundation's conundrum?
In any case, this seems to be an interesting moment where many in AI are
trying to move away from Nvidia's proprietary CUDA platform. Most of them
probably more for financial and availability reasons though, given the
current GPU shortages[7] (which the ML team is undoubtedly aware of
already; mentioning this as context for others on this list. See also
Marketwatch's remarks about current margins[5]).

Regards, Tilman

[5]
https://archive.ph/2023.06.15-173527/https://www.marketwatch.com/amp/story/…
[6] https://huggingface.co/blog/huggingface-and-amd
[7] See e.g. https://gpus.llm-utils.org/nvidia-h100-gpus-supply-and-demand/
(avoid playing the song though. Don't say I didn't warn you)

...
 I wouldn't characterize WMF's Language Team using CPU as being because of AMD;
 rather, at the time we didn't have the budget for GPUs, so Lift Wing didn't
 have any. Since then we have moved two GPUs onto Lift Wing for testing but
 they are pretty old (2017ish). Once we make the big GPU purchase Lift Wing
 will gain a lot of functionality for LLM and similar models.

 Chris

 On Sun, Aug 6, 2023 at 9:57 PM Tilman Bayer <haebwiki(a)gmail.com> wrote:

  On Thu, Aug 3, 2023 at 7:16 AM Chris Albon <calbon(a)wikimedia.org> wrote:

  Hi everybody,

 TL;DR We would like users of ORES models to migrate to our new open
 source ML infrastructure, Lift Wing, within the next five months. We are
 available to help you do that, from advice to making code commits. It is
 important to note: All ML models currently accessible on ORES are also
 currently accessible on Lift Wing.

 As part of the Machine Learning Modernization Project (
 https://www.mediawiki.org/wiki/Machine_Learning/Modernization), the
 Machine Learning team has deployed Wikimedia’s new machine learning
 inference infrastructure, called Lift Wing (
 https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing). Lift
 Wing brings a lot of new features such as support for GPU-based models,
 open source LLM hosting, auto-scaling, stability, and ability to host a
 larger number of models.

 This sounds quite exciting! What's the best place to read up on that
 planned support for GPU-based models and open source LLMs? (I also saw in
 the recent NYT article[1] that the team is "in the process of adapting A.I.
 models that are 'off the shelf' — essentially models that have been made
 available by researchers for anyone to freely customize — so that
 Wikipedia’s editors can use them for their work.")

 I'm aware of the history[2] of not being able to use NVIDIA GPUs due to
 their CUDA drivers being proprietary. It was mentioned recently in the
 Wikimedia AI Telegram group that this is still a serious limitation,
 despite some new explorations with AMD GPUs[3] - to the point that e.g. the
 WMF's Language team has resorted to using models without GPU support (CPU
 only).[4]
 It sounds like there is reasonable hope that this situation could change
 fairly soon? Would it also mean both at the same time, i.e. open source
 LLMs running with GPU support (considering that at least some
 well-known ones appear to require torch.cuda.is_available() == True for
 that)?
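
For readers wondering about the torch.cuda.is_available() check mentioned above: ROCm builds of PyTorch expose AMD GPUs through the same torch.cuda interface, so that check can return True on supported AMD hardware. Below is a minimal sketch for telling the backends apart. It assumes PyTorch may or may not be installed; the function name describe_gpu_backend is hypothetical, and the result says nothing about whether a particular model actually runs well on ROCm.

```python
def describe_gpu_backend():
    """Report which GPU backend, if any, the local PyTorch build can use."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return "pytorch-not-installed"

    # On both CUDA and ROCm builds, GPUs are reported through torch.cuda.
    if not torch.cuda.is_available():
        return "cpu-only"

    # ROCm builds set torch.version.hip to a version string;
    # CUDA builds leave it as None.
    if getattr(torch.version, "hip", None):
        return "rocm"
    return "cuda"


if __name__ == "__main__":
    print(describe_gpu_backend())
```

Code that only checks torch.cuda.is_available() does not by itself rule out AMD hardware; whether a given model's kernels and dependencies work on ROCm still has to be tested case by case.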

 Regards, Tilman

 [1] https://www.nytimes.com/2023/07/18/magazine/wikipedia-ai-chatgpt.html
 [2]

https://techblog.wikimedia.org/2020/04/06/saying-no-to-proprietary-code-in-…
 [3] https://phabricator.wikimedia.org/T334583 etc.
 [4]

https://diff.wikimedia.org/2023/06/13/mint-supporting-underserved-languages…
 or https://thottingal.in/blog/2023/07/21/wikiqa/ (experimental but, I
 understand, written to be deployable on WMF infrastructure)

 With the creation of Lift Wing, the team is turning its attention to
 deprecating the current machine learning infrastructure, ORES. ORES served
 us really well over the years. It was a successful project, but it came
 before radical changes in technology like Docker, Kubernetes and more
 recently MLOps. The servers that run ORES are at the end of their planned
 lifespan and so to save cost we are going to shut them down in early 2024.

 We have outlined a deprecation path on Wikitech (
 https://wikitech.wikimedia.org/wiki/ORES). Please read the page if you
 are a maintainer of a tool or code that uses the ORES endpoint (
 https://ores.wikimedia.org/). If you have any doubt or if you need
 assistance in migrating to Lift Wing, feel free to contact the ML team via:

 - Email: ml(a)wikimedia.org
 - Phabricator: #Machine-Learning-Team tag
 - IRC (Libera): #wikimedia-ml

 The Machine Learning team is available to help projects migrate, from
 offering advice to making code commits. We want to make this as easy as
 possible for folks.

 High Level timeline:

 By September 30th 2023: Infrastructure powering the ORES API
 endpoint will be migrated from ORES to Lift Wing. For users, the API
 endpoint will remain the same, and most users won’t notice any change.
 Only the backend services powering the endpoint will change.

 Details: We'd like to add a DNS CNAME that points ores.wikimedia.org to
 ores-legacy.wikimedia.org, a new endpoint that offers an almost complete
 replacement of the ORES API, calling Lift Wing behind the scenes. In an
 ideal world we'd migrate all tools to Lift Wing before decommissioning the
 infrastructure behind ores.wikimedia.org, but that turned out to be
 really challenging, so to avoid disrupting users we chose to implement a
 transition layer/API.

 To summarize, if you don't have time to migrate before September to Lift
 Wing, your code/tool should work just fine on ores-legacy.wikimedia.org
 and you'll not have to change a line in your code thanks to the DNS CNAME.
 The ores-legacy endpoint is not a 100% replacement for ORES: we removed
 some very old and unused features, so we highly recommend at least testing
 the new endpoint for your use case to avoid surprises when we make the
 switch. In case you find anything weird, please report it to us using the
 aforementioned channels.
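
One way to test a tool against the new endpoint ahead of the switch is to send the same request to both hosts and compare the responses. The sketch below only builds the URLs and diffs two already-decoded JSON responses. The /v3/scores/{wiki}/{revision}/{model} path layout, the example wiki and model names, and all function names are assumptions for illustration, so substitute whatever requests your tool actually sends.

```python
from urllib.parse import quote, urlsplit, urlunsplit

ORES_HOST = "ores.wikimedia.org"
ORES_LEGACY_HOST = "ores-legacy.wikimedia.org"


def score_url(host, wiki, rev_id, model):
    """Build a v3-style score URL (the path layout is an assumption)."""
    path = f"/v3/scores/{quote(wiki)}/{int(rev_id)}/{quote(model)}"
    return urlunsplit(("https", host, path, "", ""))


def legacy_equivalent(url):
    """Rewrite an ores.wikimedia.org URL to point at ores-legacy instead."""
    parts = urlsplit(url)
    if parts.netloc != ORES_HOST:
        raise ValueError(f"not an ORES URL: {url}")
    return urlunsplit(parts._replace(netloc=ORES_LEGACY_HOST))


def differing_keys(old, new):
    """Return the sorted top-level keys whose values differ between two dicts."""
    return sorted(k for k in set(old) | set(new) if old.get(k) != new.get(k))


if __name__ == "__main__":
    current = score_url(ORES_HOST, "enwiki", 12345, "damaging")
    print(current)
    print(legacy_equivalent(current))
```

Fetching both URLs with any HTTP client and passing the decoded JSON bodies to differing_keys gives a quick list of fields to look at before reporting a discrepancy to the ML team.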

 September to January: We will be reaching out to every user of ORES
 we can identify and working with them to make the migration process as easy
 as possible.

 By January 2024: If all goes well, we would like zero traffic on the
 ORES API endpoint so we can turn off the ores-legacy API.

 If you want more information about Lift Wing, please check
 https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing

 Thanks in advance for the patience and the help!

 Regards,

 The Machine Learning Team
 _______________________________________________
 Wikitech-l mailing list -- wikitech-l(a)lists.wikimedia.org
 To unsubscribe send an email to wikitech-l-leave(a)lists.wikimedia.org

 https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/ 