Hi Tilman!
On Tue, Aug 8, 2023 at 5:45 AM Tilman Bayer <haebwiki(a)gmail.com> wrote:
Hi Chris,
On Mon, Aug 7, 2023 at 11:51 AM Chris Albon <calbon(a)wikimedia.org> wrote:
Hi Tilman,
Most of the work is still very experimental. We have hosted a few LLMs on
Lift Wing already (StarCoder for example) but they were just running on
CPU, far too slow for real use cases. But it proves that we can easily host
LLMs on Lift Wing. We have been pretty quiet about it while we focus on the
ORES migration, but it is our next big project. More soon hopefully!
Understood. Looking forward to learning more later!
Where we are now is that we have budget for a big GPU purchase (~10-20
GPUs depending on cost), and the question we will try to answer after the
ORES migration is complete is: what GPUs should we purchase? We are trying
to balance our strong preference to stay open source (i.e. AMD ROCm)
against a world dominated by a single closed-source vendor (i.e. Nvidia).
In addition, do we go for a few expensive GPUs better suited to LLMs (A100,
H100, etc.) or a mix of big and small? We will need to figure all this out.
I see. On that matter, what do you folks make of the recent announcements
of AMD's partnerships with Hugging Face and PyTorch[5]? (Which, I
understand, came after the ML team had already launched the aforementioned
new AMD explorations.)
"Open-source AI: AMD looks to Hugging Face and Meta spinoff PyTorch to
take on Nvidia [...]
Both partnerships involve AMD’s ROCm AI software stack, the company’s
answer to Nvidia’s proprietary CUDA platform and application-programming
interface. AMD called ROCm an open and portable AI system with
out-of-the-box support that can port to existing AI models. [...B]oth AMD
and Hugging Face are dedicating engineering resources to each other and
sharing data to ensure that the constantly updated AI models from Hugging
Face, which might not otherwise run well on AMD hardware, would be
“guaranteed” to work on hardware like the MI300X. [...] AMD said PyTorch
will fully upstream the ROCm software stack and “provide immediate ‘day
zero’ support for PyTorch 2.0 with ROCm release 5.4.2 on all AMD Instinct
accelerators,” which is meant to appeal to those customers looking to
switch from Nvidia’s software ecosystem."
In their own announcement, Hugging Face offered further details, including
a pretty impressive list of models to be supported:[6]
"We intend to support state-of-the-art transformer architectures for
natural language processing, computer vision, and speech, such as BERT,
DistilBERT, ROBERTA, Vision Transformer, CLIP, and Wav2Vec2. Of course,
generative AI models will be available too (e.g., GPT2, GPT-NeoX, T5, OPT,
LLaMA), including our own BLOOM and StarCoder models. Lastly, we will also
support more traditional computer vision models, like ResNet and ResNext,
and deep learning recommendation models, a first for us. [..] We'll do our
best to test and validate these models for PyTorch, TensorFlow, and ONNX
Runtime for the above platforms. [...] We will integrate the AMD ROCm SDK
seamlessly in our open-source libraries, starting with the transformers
library."
Do you think this may promise too much, or could it point to a possible
solution of the Foundation's conundrum?
In
https://phabricator.wikimedia.org/T334583 we experimented with LLMs and
AMD GPUs on Lift Wing, and we confirmed the good results that PyTorch
announced. We were able to run bloom-3b, bloom-560m, nllb-200 and falcon-7b
on Lift Wing, running into issues only with the last one, since the GPU's
VRAM was not enough (16GB is low for Falcon-7b). So we can confirm that
AMD ROCm works really well with PyTorch :)
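A rough back-of-the-envelope sketch of why 16GB is tight for Falcon-7b (assuming fp16 weights, i.e. 2 bytes per parameter; the task above doesn't state the precision used): the weights alone of a 7B-parameter model come to about 13 GiB, leaving very little of a 16 GiB card for activations and the KV cache.

```python
def weights_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Rough size of a model's weights in GiB (fp16 -> 2 bytes/param)."""
    return n_params * bytes_per_param / 2**30

# falcon-7b: ~7e9 parameters at fp16 -> ~13 GiB of weights alone.
print(f"falcon-7b:  {weights_gib(7e9):.1f} GiB")
# The smaller models from the experiment fit comfortably by the same estimate.
print(f"bloom-3b:   {weights_gib(3e9):.1f} GiB")
print(f"bloom-560m: {weights_gib(560e6):.1f} GiB")
```

By this estimate only falcon-7b gets close to the 16 GiB ceiling, which matches the results above; quantizing to 8-bit or 4-bit would roughly halve or quarter the figure.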
In any case, this seems to be an interesting moment
where many in AI are
trying to move away from Nvidia's proprietary CUDA platform.
This is my own view, not my team's, so I can't speak for what the WMF
will decide, but I think we should keep going with AMD and avoid Nvidia as
much as possible. Our strong stance against proprietary software should
hold, even if it means more effort and work to advance in the ML field. I
completely get the frustration when common libraries and tools are harder
to run on AMD than on Nvidia, but our communities should (in my opinion)
align on the most open-source solution and contribute (where possible) so
that more and more people adopt it.
Adding proprietary software to the WMF infrastructure and practices is also
technically difficult for various reasons (from Linux kernel maintenance to
Debian package uploads), whereas we already have everything set up and
working for AMD (which integrates nicely with our infrastructure).
Moreover, Debian has recently created a team to maintain AMD ROCm packages
(https://lists.debian.org/debian-ai/), so it will be interesting to see
what direction they take (so far it seems aligned with ours).
Thanks!
Luca