Hi Tilman, 

Most of the work is still very experimental. We have hosted a few LLMs on Lift Wing already (StarCoder, for example), but they were running on CPU only, which is far too slow for real use cases. Still, it proves that we can easily host LLMs on Lift Wing. We have been pretty quiet about it while we focus on the ORES migration, but it is our next big project. More soon, hopefully!

Where we are now: we have budget for a big GPU purchase (~10-20 GPUs depending on cost), and the question we will try to answer after the ORES migration is complete is: which GPUs should we purchase? We are trying to balance our strong preference for staying open source (i.e. AMD ROCm) against a market dominated by a single closed-source vendor (i.e. Nvidia). In addition, do we go for a few expensive GPUs better suited to LLMs (A100, H100, etc.) or a mix of big and small? We still need to figure all of this out.

I wouldn't characterize the WMF Language Team's use of CPU as being because of AMD; rather, at the time we didn't have the budget for GPUs, so Lift Wing didn't have any. Since then we have moved two GPUs onto Lift Wing for testing, but they are pretty old (2017-ish). Once we make the big GPU purchase, Lift Wing will gain a lot of functionality for LLMs and similar models.

Chris

On Sun, Aug 6, 2023 at 9:57 PM Tilman Bayer <haebwiki@gmail.com> wrote:
On Thu, Aug 3, 2023 at 7:16 AM Chris Albon <calbon@wikimedia.org> wrote:
Hi everybody,

TL;DR We would like users of ORES models to migrate to our new open source ML infrastructure, Lift Wing, within the next five months. We are available to help you do that, from advice to making code commits. It is important to note: All ML models currently accessible on ORES are also currently accessible on Lift Wing.

As part of the Machine Learning Modernization Project (https://www.mediawiki.org/wiki/Machine_Learning/Modernization), the Machine Learning team has deployed Wikimedia's new machine learning inference infrastructure, called Lift Wing (https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing). Lift Wing brings a lot of new features, such as support for GPU-based models, open source LLM hosting, auto-scaling, improved stability, and the ability to host a larger number of models.
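To give a flavor of what calling Lift Wing looks like, here is a minimal sketch in Python that scores a single revision with one of the migrated models (the enwiki-goodfaith model via the public API Gateway; please treat the exact host, path, and model name as illustrative and check the LiftWing page linked above for the authoritative details):

    import requests

    # Illustrative example: score one revision with the enwiki-goodfaith model
    # hosted on Lift Wing, via the public API Gateway.
    url = "https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-goodfaith:predict"
    headers = {
        # Placeholder contact info; please identify your tool in the User-Agent.
        "User-Agent": "liftwing-example (your-email@example.org)",
        "Content-Type": "application/json",
    }
    payload = {"rev_id": 12345}  # placeholder revision ID

    resp = requests.post(url, headers=headers, json=payload, timeout=30)
    resp.raise_for_status()
    print(resp.json())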
 
This sounds quite exciting! What's the best place to read up on that planned support for GPU-based models and open source LLMs? (I also saw in the recent NYT article[1] that the team is "in the process of adapting A.I. models that are 'off the shelf' — essentially models that have been made available by researchers for anyone to freely customize — so that Wikipedia's editors can use them for their work.")

I'm aware of the history[2] of not being able to use NVIDIA GPUs due to their CUDA drivers being proprietary. It was mentioned recently in the Wikimedia AI Telegram group that this is still a serious limitation, despite some new explorations with AMD GPUs[3] - to the point that e.g. the WMF's Language team has resorted to using models without GPU support (CPU only).[4] 
It sounds like there is reasonable hope that this situation could change fairly soon? Would it also mean both at the same time, i.e. open source LLMs running with GPU support (considering that at least some well-known ones appear to require torch.cuda.is_available() == True for that)? 
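(For context on that check, my understanding is that the ROCm builds of PyTorch expose the HIP backend through the torch.cuda namespace, so a sanity check along these lines, assuming a ROCm build of PyTorch is installed, should still report an available device on AMD hardware:)

    import torch

    # On ROCm builds of PyTorch, the HIP backend is surfaced through the
    # torch.cuda namespace, so this can return True on supported AMD GPUs too.
    print("GPU available:", torch.cuda.is_available())

    # torch.version.hip is a version string on ROCm builds and None on CUDA builds.
    print("ROCm/HIP version:", torch.version.hip)

    if torch.cuda.is_available():
        print("Device:", torch.cuda.get_device_name(0))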

Regards, Tilman

[1] https://www.nytimes.com/2023/07/18/magazine/wikipedia-ai-chatgpt.html
[2] https://techblog.wikimedia.org/2020/04/06/saying-no-to-proprietary-code-in-production-is-hard-work-the-gpu-chapter/
[3] https://phabricator.wikimedia.org/T334583 etc.
[4] https://diff.wikimedia.org/2023/06/13/mint-supporting-underserved-languages-with-open-machine-translation/ or https://thottingal.in/blog/2023/07/21/wikiqa/ (experimental but, I understand, written to be deployable on WMF infrastructure)
 

With the creation of Lift Wing, the team is turning its attention to deprecating the current machine learning infrastructure, ORES. ORES has served us really well over the years; it was a successful project, but it came before radical changes in technology like Docker, Kubernetes, and more recently MLOps. The servers that run ORES are at the end of their planned lifespan, so to save costs we are going to shut them down in early 2024.

We have outlined a deprecation path on Wikitech (https://wikitech.wikimedia.org/wiki/ORES); please read the page if you are a maintainer of a tool or code that uses the ORES endpoint (https://ores.wikimedia.org/). If you have any doubts or need assistance migrating to Lift Wing, feel free to contact the ML team via:

- Email: ml@wikimedia.org
- Phabricator: #Machine-Learning-Team tag
- IRC (Libera): #wikimedia-ml

The Machine Learning team is available to help projects migrate, from offering advice to making code commits. We want to make this as easy as possible for folks.

High-level timeline:

*By September 30th, 2023: The infrastructure powering the ORES API endpoint will be migrated from ORES to Lift Wing. For users, the API endpoint will remain the same, and most won't notice any change; only the backend services powering the endpoint will change.

Details: We'd like to add a DNS CNAME that points ores.wikimedia.org to ores-legacy.wikimedia.org, a new endpoint that offers an almost complete replacement of the ORES API, calling Lift Wing behind the scenes. In an ideal world we'd migrate all tools to Lift Wing before decommissioning the infrastructure behind ores.wikimedia.org, but that turned out to be really challenging, so to avoid disrupting users we chose to implement a transition layer/API.

To summarize, if you don't have time to migrate to Lift Wing before September, your code/tool should keep working just fine on ores-legacy.wikimedia.org, and you won't have to change a line of code thanks to the DNS CNAME. However, the ores-legacy endpoint is not a 100% replacement for ORES (we removed some very old and unused features), so we highly recommend at least testing the new endpoint for your use case to avoid surprises when we make the switch (see the comparison sketch after this timeline). In case you find anything weird, please report it to us using the aforementioned channels.

*September to January: We will be reaching out to every user of ORES we can identify and working with them to make the migration process as easy as possible.

*By January 2024: If all goes well, we would like zero traffic on the ORES API endpoint so we can turn off the ores-legacy API.
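To make the "test the new endpoint" suggestion above concrete, a rough comparison between the current and the transition endpoints could look like the following sketch (Python; the revision ID and the goodfaith model are placeholders, and the paths assume the existing ORES v3 API):

    import requests

    REV_ID = 12345  # placeholder revision ID; use one relevant to your tool
    PATH = f"/v3/scores/enwiki/{REV_ID}/goodfaith"
    HEADERS = {
        # Placeholder contact info; please identify your tool in the User-Agent.
        "User-Agent": "ores-migration-test (your-email@example.org)",
    }

    old = requests.get("https://ores.wikimedia.org" + PATH, headers=HEADERS, timeout=30)
    new = requests.get("https://ores-legacy.wikimedia.org" + PATH, headers=HEADERS, timeout=30)

    # Compare status codes and payloads to spot any differences before the switch.
    print(old.status_code, new.status_code)
    print("Responses match:", old.json() == new.json())

If the two responses differ for your use case, that is exactly the kind of thing we would like reported through the channels listed above.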

If you want more information about Lift Wing, please check https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing

Thanks in advance for your patience and help!

Regards,

The Machine Learning Team
_______________________________________________
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/