Hi Tilman!

On Tue, Aug 8, 2023 at 5:45 AM Tilman Bayer <haebwiki@gmail.com> wrote:

Hi Chris,

On Mon, Aug 7, 2023 at 11:51 AM Chris Albon <calbon@wikimedia.org> wrote:
Hi Tilman, 

Most of the work is still very experimental. We have hosted a few LLMs on Lift Wing already (StarCoder for example) but they were just running on CPU, far too slow for real use cases. But it proves that we can easily host LLMs on Lift Wing. We have been pretty quiet about it while we focus on the ORES migration, but it is our next big project. More soon hopefully!
Understood. Looking forward to learning more later!


Where we are now is that we have budget for a big GPU purchase (~10-20 GPUs depending on cost). The question we will try to answer after the ORES migration is complete is: which GPUs should we purchase? We are trying to balance our strong preference to stay open source (i.e. AMD ROCm) in a world dominated by a single closed source vendor (i.e. Nvidia). In addition, do we go for a few expensive GPUs better suited to LLMs (A100, H100, etc.) or a mix of big and small? We will need to figure all this out.
I see. On that matter, what do you folks make of the recent announcements of AMD's partnerships with Hugging Face and PyTorch[5]? (which, I understand, came after the ML team had already launched the aforementioned new AMD explorations)

"Open-source AI: AMD looks to Hugging Face and Meta spinoff PyTorch to take on Nvidia [...]
Both partnerships involve AMD’s ROCm AI software stack, the company’s answer to Nvidia’s proprietary CUDA platform and application-programming interface. AMD called ROCm an open and portable AI system with out-of-the-box support that can port to existing AI models. [...B]oth AMD and Hugging Face are dedicating engineering resources to each other and sharing data to ensure that the constantly updated AI models from Hugging Face, which might not otherwise run well on AMD hardware, would be “guaranteed” to work on hardware like the MI300X. [...] AMD said PyTorch will fully upstream the ROCm software stack and “provide immediate ‘day zero’ support for PyTorch 2.0 with ROCm release 5.4.2 on all AMD Instinct accelerators,” which is meant to appeal to those customers looking to switch from Nvidia’s software ecosystem."
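
For concreteness, that "day zero" support works because PyTorch's ROCm builds reuse the familiar torch.cuda namespace, so most existing code runs unchanged. Here is a minimal sketch for checking which backend a given PyTorch build was compiled against (torch.version.hip is only set on ROCm builds):

    import torch

    # On ROCm builds of PyTorch, torch.version.hip is a version string
    # (e.g. "5.4.2"); on CUDA builds it is None.
    if torch.version.hip is not None:
        print("ROCm/HIP build:", torch.version.hip)
    else:
        print("CUDA build:", torch.version.cuda)

    # ROCm GPUs are exposed through the same torch.cuda API, so
    # device discovery and placement code works unchanged.
    print("GPU available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("Device:", torch.cuda.get_device_name(0))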

In their own announcement, Hugging Face offered further details, including a pretty impressive list of models to be supported:[6]
 
"We intend to support state-of-the-art transformer architectures for natural language processing, computer vision, and speech, such as BERT, DistilBERT, ROBERTA, Vision Transformer, CLIP, and Wav2Vec2. Of course, generative AI models will be available too (e.g., GPT2, GPT-NeoX, T5, OPT, LLaMA), including our own BLOOM and StarCoder models. Lastly, we will also support more traditional computer vision models, like ResNet and ResNext, and deep learning recommendation models, a first for us. [..] We'll do our best to test and validate these models for PyTorch, TensorFlow, and ONNX Runtime for the above platforms. [...] We will integrate the AMD ROCm SDK seamlessly in our open-source libraries, starting with the transformers library."

Do you think this may promise too much, or could it point to a possible solution to the Foundation's conundrum?

In https://phabricator.wikimedia.org/T334583 we experimented with LLMs and AMD GPUs on Lift Wing, and we confirmed the good results that PyTorch announced. We were able to run bloom-3b, bloom-560m, nllb-200, and falcon-7b on Lift Wing, hitting issues only with the last one, since the GPU's VRAM was not enough (16GB is low for Falcon-7b). So we can confirm that AMD ROCm works really well with PyTorch :)
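
For a flavor of what these tests looked like, here is a minimal sketch (not the exact harness from the Phabricator task) that loads bloom-560m with Hugging Face transformers. It is the same code you would write for Nvidia, since ROCm GPUs show up under the usual "cuda" device name in PyTorch:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # ROCm GPUs are addressed as "cuda" in PyTorch, so this snippet
    # runs unchanged on AMD and Nvidia hardware.
    device = "cuda"

    tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
    model = AutoModelForCausalLM.from_pretrained(
        "bigscience/bloom-560m",
        torch_dtype=torch.float16,  # half precision halves VRAM usage
    ).to(device)

    inputs = tokenizer("Wikipedia is", return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=30)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The VRAM limit we hit with Falcon-7b follows from simple arithmetic: at float16, 7 billion parameters already take ~14GB for the weights alone, leaving essentially no headroom on a 16GB card for activations and the KV cache.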
 
In any case, this seems to be an interesting moment where many in AI are trying to move away from Nvidia's proprietary CUDA platform. 

This is my own view, not my team's, so I can't speak for what the WMF will decide, but I think we should keep going with AMD and avoid Nvidia as much as possible. Our strong stance against proprietary software should hold, even if it means more effort and work to advance in the ML field. I completely understand the frustration when common libraries and tools are harder to run on AMD than on Nvidia, but in my opinion our communities should align with the most open source solution and contribute (where possible) so that more and more people adopt it.
Adding proprietary software to the WMF infrastructure and practices is also technically difficult for various reasons (from Linux kernel maintenance to Debian package uploads), whereas we already have everything set up and working for AMD, which plays nicely with our infrastructure. Moreover, Debian upstream has recently created a team to maintain AMD ROCm packages (https://lists.debian.org/debian-ai/), so it will be interesting to see what direction they take (so far it seems aligned with ours).

Thanks!

Luca