Deploying Hugging Face models with Amazon SageMaker and AWS Inferentia2

  • Published Oct 3, 2024
  • In this video, I walk you through the simple process of deploying a Hugging Face large language model on AWS, with Amazon SageMaker and the AWS Inferentia2 accelerator.
    ⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos. Follow me on Medium at / julsimon or Substack at julsimon.subst.... ⭐️⭐️⭐️
    Notebook:
    gitlab.com/jul...
    Deep Dive: Hugging Face models on AWS AI Accelerators
    • Deep Dive: Hugging Fac...
    Blog posts:
    huggingface.co...
    aws.amazon.com...
  • Science & Technology
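The deployment flow the video describes can be sketched in a few lines. This is a hypothetical outline, not the notebook's actual code: the model ID, instance type, and the container environment variable names (`HF_MODEL_ID`, `HF_NUM_CORES`, `MAX_INPUT_LENGTH`, `MAX_TOTAL_TOKENS`) are assumptions based on Hugging Face TGI container conventions.

```python
# Hypothetical sketch: configuring a Hugging Face LLM deployment on an
# Inferentia2-backed SageMaker endpoint. All names and values below are
# illustrative assumptions, not the exact settings from the video.

def build_neuron_env(model_id: str, num_cores: int = 2,
                     sequence_length: int = 2048) -> dict:
    """Build the container environment for a TGI-style Neuron deployment."""
    return {
        "HF_MODEL_ID": model_id,          # model to pull from the Hub
        "HF_NUM_CORES": str(num_cores),   # NeuronCores to shard across
        "MAX_INPUT_LENGTH": str(sequence_length // 2),
        "MAX_TOTAL_TOKENS": str(sequence_length),
    }

env = build_neuron_env("meta-llama/Llama-2-7b-chat-hf")

# With the `sagemaker` SDK installed and an execution role, deployment
# would then look roughly like (not executed here):
#   from sagemaker.huggingface import HuggingFaceModel
#   model = HuggingFaceModel(role=role, image_uri=tgi_neuron_image, env=env)
#   predictor = model.deploy(initial_instance_count=1,
#                            instance_type="ml.inf2.xlarge")
```

The key design point is that the container reads its serving configuration from environment variables, so swapping models or sequence lengths does not require rebuilding the image.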

COMMENTS • 11

  • @briangman3 • 4 months ago • +2

    Great video!

  • @caiyu538 • 6 months ago • +4

    Great

  • @rileyheiman1161 • 5 months ago • +1

    Great video Julien, thank you! Does the model have to be pre-compiled to run on AWS (EC2 or SageMaker)?

    • @juliensimonfr • 4 months ago

      Thank you. If you're going to deploy on SageMaker, yes: at the moment, our container won't compile the model. On EC2, the model will be compiled on the fly if needed.
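For readers wondering what pre-compilation looks like in practice: a hedged sketch using the `optimum-cli export neuron` command from the optimum-neuron package, which compiles a Hub model for Inferentia2 ahead of deployment. The model ID and compilation parameters here are illustrative assumptions, not settings confirmed by the video.

```shell
# Hypothetical sketch: pre-compile a model for Inferentia2 with
# optimum-neuron's exporter. Values are illustrative assumptions;
# batch size, sequence length, and core count are fixed at compile time.
optimum-cli export neuron \
  --model meta-llama/Llama-2-7b-chat-hf \
  --batch_size 1 \
  --sequence_length 2048 \
  --num_cores 2 \
  --auto_cast_type fp16 \
  llama2_7b_neuron/
```

The resulting directory can then be uploaded to S3 and deployed on SageMaker without any on-the-fly compilation step.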

  • @leonardoschenkel9168 • 3 months ago

    Hi Julien! Do you have any tips on how I can convert a ComfyUI SD1.5-based workflow model to 🤗 format, or run it directly on INF2?

  • @briangman3 • 4 months ago

    I am going to use Inf2 to run a fine-tuned Llama 3 70B, which should be great. I am curious about token generation speed on the different Inf2 sizes; as a side note in your next video, could you mention something like "this generated at x tokens/s"?

    • @juliensimonfr • 4 months ago

      You'll find benchmarks in the Neuron SDK documentation: awsdocs-neuron.readthedocs-hosted.com/en/latest/general/benchmarks/index.html

  • @larsjacobs253 • 5 months ago • +2

    Great video! However, when I try to deploy Llama 2 7B on an inf2.xlarge instance, I get an out-of-memory error, even though I have seen posts about people deploying Llama 2 7B on that same instance type. How can this be?

    • @juliensimonfr • 5 months ago • +1

      Please post details and logs at discuss.huggingface.co/c/aws-inferentia-trainium/66