Deploying Hugging Face models with Amazon SageMaker and AWS Inferentia2
- Published Oct 3, 2024
- In this video, I walk you through the simple process of deploying a Hugging Face large language model on AWS, with Amazon SageMaker and the AWS Inferentia2 accelerator.
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos. Follow me on Medium at / julsimon or Substack at julsimon.subst.... ⭐️⭐️⭐️
Notebook:
gitlab.com/jul...
Deep Dive: Hugging Face models on AWS AI Accelerators
• Deep Dive: Hugging Fac...
Blog posts:
huggingface.co...
aws.amazon.com...
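As a rough sketch of the deployment flow shown in the video: with the SageMaker Python SDK, you select the Hugging Face Neuron (TGI) container, pass the model and Neuron settings as environment variables, and deploy to an `ml.inf2.*` instance. The model ID, core count, and sequence length below are illustrative assumptions, not values from the video; check the current Hugging Face / AWS docs for your model.

```python
# Sketch (hedged): deploy a Hugging Face LLM on a SageMaker Inferentia2 endpoint.
# Assumes the sagemaker>=2.x SDK, an AWS execution role, and the
# "huggingface-neuronx" container; all values below are example assumptions.
import os

# Environment for the Neuron-backed container. Batch size, sequence length,
# and core count must match how the model was compiled for Neuron.
hub_config = {
    "HF_MODEL_ID": "meta-llama/Llama-2-7b-chat-hf",  # example model, not from the video
    "HF_NUM_CORES": "2",               # inf2.xlarge exposes 2 Neuron cores
    "HF_BATCH_SIZE": "1",
    "HF_SEQUENCE_LENGTH": "4096",
    "HF_AUTO_CAST_TYPE": "fp16",
}

# Guarded so the script only touches AWS when explicitly asked to.
if __name__ == "__main__" and os.environ.get("RUN_DEPLOY"):
    import sagemaker
    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

    role = sagemaker.get_execution_role()
    image_uri = get_huggingface_llm_image_uri("huggingface-neuronx")

    model = HuggingFaceModel(env=hub_config, role=role, image_uri=image_uri)
    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.inf2.xlarge",
        volume_size=64,
    )
    print(predictor.predict({"inputs": "Hello"}))
```

The deploy call blocks until the endpoint is in service; remember to delete the endpoint afterwards to stop billing.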
Great video!
Glad you enjoyed it
Great
Thank you
Great video Julien, thank you! Does the model have to be pre-compiled to run on AWS (EC2 or SageMaker)?
Thank you. If you're going to deploy on SageMaker, yes. At the moment, our container won't compile the model on the fly. On EC2, the model will be compiled on the fly if needed.
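For the pre-compilation step mentioned above, one common route is the `optimum-neuron` export CLI, which compiles a Hugging Face checkpoint for Neuron ahead of time. This is a hedged sketch: the model ID and flag values are illustrative assumptions, and exact flags can vary between `optimum-neuron` versions.

```shell
# Sketch (hedged): pre-compile a model for Inferentia2 with optimum-neuron,
# then upload the output directory for SageMaker deployment.
# Model ID and settings are example assumptions; verify against current docs.
optimum-cli export neuron \
  --model meta-llama/Llama-2-7b-chat-hf \
  --batch_size 1 \
  --sequence_length 4096 \
  --num_cores 2 \
  --auto_cast_type fp16 \
  llama2-7b-neuron/
```

Compilation settings (batch size, sequence length, cores) are baked into the artifact, so they must match the values you later configure on the endpoint.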
Hi Julien! Do you have any tips on how I can convert a ComfyUI SD1.5-based workflow model to 🤗, or run it directly on Inf2?
I am going to use inf2 to run a fine-tuned Llama 3 70B, which should be great. I am curious about token generation speed on the different inf2 sizes; if you can, mention that as a side note in your next video, e.g. "this generated at x tokens/s".
You'll find benchmarks in the Neuron SDK documentation: awsdocs-neuron.readthedocs-hosted.com/en/latest/general/benchmarks/index.html
Great video! However, when I try to deploy Llama 2 7B on an inf2.xlarge instance, I get an out-of-memory error. I have seen posts about people deploying Llama 2 7B on an inf2.xlarge instance. How can this be?
Please post details and logs at discuss.huggingface.co/c/aws-inferentia-trainium/66