How to run Large AI Models from Hugging Face on Single GPU without OOM

This demo shows how to run large AI models from #huggingface on a Single GPU without Out of Memory error. Take a OPT-175B or BLOOM-176B parameter model .These are large language models and often require very high processing machine or multi-GPU, but thanks to bitsandbytes, in just a few tweaks to your code, you can run these large models on single node.

In this tutorial, we’ll see 3 Billion parameter BLOOM AI model (loaded from Hugging Face) and #LLM inference on Google Colab (Tesla T4) without OOM.

This is brilliant! Kudos to the team.

bitsandbytes –
Google Colab Notebook –