Deploy LLMs (Large Language Models) on AWS SageMaker using DLC
- Published 30 Jun 2023
- In this comprehensive video tutorial, I will show you how to effortlessly deploy large language models (LLMs) on AWS SageMaker using Deep Learning Containers (DLCs). With the ability to deploy models like Falcon 7B and MPT-7B, you can quickly configure endpoints and create the necessary infrastructure for seamless deployment.
I will guide you through the entire process, starting from the initial setup of SageMaker to the configuration of LLM model endpoints. You'll learn how to leverage the power of AWS Lambda to trigger events and generate responses using a function URL. This integration enables you to seamlessly incorporate LLM models into your applications and services.
By following this step-by-step guide, you'll gain the confidence and knowledge to deploy LLMs on AWS SageMaker with ease. Subscribe now to embark on your journey towards harnessing the full potential of large language models for your projects. Join me in this video as we explore the fascinating world of LLM deployment on AWS SageMaker using DLC.
AWS Sagemaker: aws.amazon.com/sagemaker/
Falcon 40B Huggingface: huggingface.co/tiiuae/falcon-40b
AI Anytime's Github: github.com/AIAnytime
#sagemaker #aws #ai - Science & Technology
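The deployment flow described above can be sketched with the SageMaker Python SDK. This is a minimal sketch assuming the Hugging Face TGI DLC; the model ID, instance type, and environment values are illustrative choices, not taken from the video:

```python
# Minimal sketch: deploy an LLM on SageMaker via a Hugging Face Deep Learning
# Container (DLC). Model ID, instance type, and token limits are assumptions.

def build_tgi_env(model_id, num_gpus=1, max_input_length=1024, max_total_tokens=2048):
    """Environment variables consumed by the TGI serving container (all strings)."""
    return {
        "HF_MODEL_ID": model_id,
        "SM_NUM_GPUS": str(num_gpus),
        "MAX_INPUT_LENGTH": str(max_input_length),
        "MAX_TOTAL_TOKENS": str(max_total_tokens),
    }

def deploy(model_id="tiiuae/falcon-7b", instance_type="ml.g5.2xlarge"):
    # Imported lazily so the sketch can be read without the SageMaker SDK installed.
    import sagemaker
    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

    role = sagemaker.get_execution_role()  # works inside a SageMaker notebook
    model = HuggingFaceModel(
        role=role,
        image_uri=get_huggingface_llm_image_uri("huggingface"),  # the TGI DLC image
        env=build_tgi_env(model_id),
    )
    # Returns a Predictor bound to the new real-time endpoint.
    return model.deploy(initial_instance_count=1, instance_type=instance_type)
```

`deploy()` returns a predictor; `predictor.predict({"inputs": "..."})` sends a prompt to the endpoint. Remember to delete the endpoint afterwards to stop billing.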
Great video, brother. I was looking exactly for this stuff and luckily landed on your channel. Keep up the good work!
Really love to see this. I will definitely follow this video and get this done today!
Thanks Vivek for your kind words.....
Great video, sir. I was looking for something exactly like this for my client POC and fortunately landed on your videos. Thanks a ton for your efforts.
Glad to hear that... keep learning and growing.
Can anyone please say how much money it will cost me to do all this, or is it free?
Great video Sonu. Thanks for sharing 🙏
My pleasure 😊
I learned a lot here, thanks a lot! 🙌
Glad it was helpful!
Great content and excellent tutorial! thank you
Glad it was helpful!
Excellent tutorial. Could you please give a tutorial how to feed a document and extract the answers from it and then deployment it. Thank you in advance❤
Excellent tutorial
Thank you! Cheers!
Thanks a lot brother! Means a lot
No problem
Great video ❤
Glad you liked it!!
Thanks for sharing, Sonu!
My pleasure!
Thanks, got to know how to increase the output length using hyperparameters.
Glad it helped
Great tutorial, thanks. The only bit I got lost with was creating the policy to let Lambda call the SageMaker endpoint. GPT-4 helped :)
Glad to hear that ..... Thanks!
@@AIAnytime It would have been better if you had shown how to create that policy.
Thank you!
Thank you so much for the support.
just great .....
Thank you!
A very nice video.
A small suggestion - Maybe after the video, you should also show how to terminate all the 3 things - notebook, model and endpoint, so that people don't incur a lot of cost.
Keep up the good work!
Noted
If I do all of the things shown in the video, will it cost me for all 24 hours? If yes, how can I save cost by only triggering it when sending a request and then terminating it?
@@sravantipris3544 If you use the initial free credits, it will not cost you any money. However, make sure to disable all the services immediately, else it could go to USD 600+.
That's a great video, thanks!
You're welcome!
@@AIAnytime Can you deploy a Hugging Face model on Azure, please? 🙏
Very soon. I am working on it.
Was the API Gateway used for anything? Thank you for the video again! Very useful!
Great video. I followed along until 48:46. Please go into depth on the policies error and how you fixed it. I have no experience with AWS and got the same error, but you skipped over why the error occurred and detailed instructions on how to solve it.
Very informative video... I have a query: there will be some provision to disable the notebook and endpoint when not in use, right?
Yes, correct, Ashwani. You can control it completely: stop the endpoint, delete the endpoint, etc. You can also set limits on budget.
Very instructive video. I would like to know if it is possible to upload the model directly to AWS without going through HF. Thank you in advance.
Absolutely, you can do that, but it will be a more manual deployment: you push the model weights to S3 and deploy via SageMaker using a script. The easier way is to deploy through the DLC images.
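That manual route can be sketched roughly as follows; the bucket name, archive path, and framework versions are assumptions for illustration only:

```python
# Sketch of the manual route: push packaged model weights to S3, then point a
# HuggingFaceModel at the archive instead of pulling from the Hub.
# Bucket name, key, and framework versions below are illustrative assumptions.

def s3_model_uri(bucket, key="models/falcon-7b/model.tar.gz"):
    """Build the S3 URI SageMaker expects for model_data."""
    return f"s3://{bucket}/{key}"

def deploy_from_s3(bucket, role, instance_type="ml.g5.2xlarge"):
    # Imported lazily so the sketch can be read without the AWS SDKs installed.
    import boto3
    from sagemaker.huggingface import HuggingFaceModel

    # Upload the locally packaged weights (a model.tar.gz built from the model dir).
    boto3.client("s3").upload_file(
        "model.tar.gz", bucket, "models/falcon-7b/model.tar.gz"
    )

    model = HuggingFaceModel(
        model_data=s3_model_uri(bucket),
        role=role,
        transformers_version="4.28",  # assumption: versions matching an existing DLC
        pytorch_version="2.0",
        py_version="py310",
    )
    return model.deploy(initial_instance_count=1, instance_type=instance_type)
```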
I think I almost missed where you referred to "AWS SageMaker DLCs"; maybe emphasize the DLCs more.
How are you querying the model in Jupyter Lab before you ever deploy the model? I am confused by that. Amazing video, just want some clarification if possible. In addition, why is the instance type configured in the deployment code different from the T5.2xlarge you configured in SageMaker?
He first configured the notebook instance that was used to run the Jupyter notebook code; later in the video he configures the predictor (i.e., the inference endpoint) that will host the model and can be called from AWS Lambda.
I fine-tuned TinyLlama on my own dataset. Can I deploy my fine-tuned model with the steps you mentioned in this video?
Absolutely....
How does it compare with hosting on a cheaper cloud provider or GPU such as Lambda Labs?
Depends. AWS is the primary cloud provider: if you work in IT you will probably work with AWS, Azure, or GCP. AWS provides different ways of deploying these models, like one-click deployment using DLCs, and the hourly pay-as-you-go rate is quite affordable. But yes, you have to select your options based on many things: data protection, privacy, governance, scaling, etc.
how much is the monthly cost of keeping the service up?
It could go up to $500 or more per month. You need to terminate the endpoint if you don't want to incur this cost.
An error shows when choosing the 70B model (jumpstart-dft-meta-textgenerationneuron-llama-2-70b-f). How do I fix it? The error says:
Something went wrong
We encountered an error while preparing to deploy your endpoint. You can get more details below.
operation deployAsync failed: handler error
In the video I have 2 doubts:
1) At 48:48 you created some IAM policies like AWSLambdaBasicExecutionRole-30e....... and AmazonSageMaker-ExecutionPolicy... How did you do that?
2) At 46:40, what is that "path" : "\example"? Can you please explain?
1. You can create policies in IAM. Search for IAM in the search box, open IAM, look for Policies on the left-hand side, go inside, and add policies.
2. The path is typically related to the URL path of the incoming HTTP request, specifically when working with API Gateway. Mainly, you would configure API Gateway after the Lambda function; that's why it is there, but you can ignore it. You can just define queryStringParameters, e.g. param1=query or something, depending on how you write your Lambda code.
@@AIAnytime thankyou!!
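A minimal sketch of such a Lambda handler, reading `query` from `queryStringParameters` and forwarding it to the endpoint; the endpoint name here is a placeholder assumption, not the one from the video:

```python
# Sketch of a Lambda handler that reads `query` from queryStringParameters and
# forwards it to a SageMaker endpoint. The endpoint name is a placeholder.
import json

ENDPOINT_NAME = "huggingface-pytorch-tgi-inference-example"  # hypothetical

def extract_query(event):
    # Function-URL and console test events may omit queryStringParameters
    # entirely, so use .get() instead of indexing to avoid a KeyError.
    params = event.get("queryStringParameters") or {}
    return params.get("query", "")

def lambda_handler(event, context):
    import boto3  # available by default in the Lambda Python runtime
    prompt = extract_query(event)
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt}),
    )
    result = json.loads(response["Body"].read())
    return {"statusCode": 200, "body": json.dumps(result)}
```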
I am stuck on policy creation. Can anybody help or share a guide on how to create that policy?
Hi! I am trying to deploy Llama 2 in SageMaker. Not sure how to use the HF tokens. The endpoint is failing, saying that the repo is gated.
Maybe you have to log in to the Hugging Face Hub using your access token. Just do a login from a notebook cell in SageMaker; then you can deploy. FYI, the DLC for official Llama 2 is still not available for deployment. You can deploy manually or from JumpStart.
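For gated repos, one common alternative to an interactive login is passing the access token to the serving container through its environment. A sketch, where the model ID and the token-sourcing logic are assumptions:

```python
# Sketch: for a gated repo such as the official Llama 2 weights, the TGI
# container needs a Hugging Face access token passed as an environment variable.
import os

def gated_model_env(model_id, token=None):
    """Build the env dict for a HuggingFaceModel serving a gated repo."""
    token = token or os.environ.get("HF_TOKEN", "")  # assumed token source
    return {
        "HF_MODEL_ID": model_id,
        "HUGGING_FACE_HUB_TOKEN": token,  # lets the container pull gated weights
        "SM_NUM_GPUS": "1",
    }
```

This dict would be passed as `env=` when constructing the `HuggingFaceModel`, the same way as any other TGI configuration.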
How do I fix the permissions error problem?
Go to IAM > Access management > Policies, create a new policy granting access for S3, Lambda, and SageMaker, and save. Then link the policy to your SageMaker project: under Access management > Policies, select the policy you created, go to the entities-attached tab, attach it to your SageMaker role, and save. Problem fixed.
Great. Thanks for the detailed steps.
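As a hedged illustration of the kind of statement such a policy needs, a minimal IAM policy document that only allows invoking SageMaker endpoints might look like this (scope the Resource to your own endpoint ARN, and add S3/Lambda statements as the steps above describe):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["sagemaker:InvokeEndpoint"],
      "Resource": "arn:aws:sagemaker:*:*:endpoint/*"
    }
  ]
}
```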
Hi, getting the below error when creating an endpoint. Can anyone help, please? Error message: "UnexpectedStatusException: Error hosting endpoint huggingface-pytorch-tgi-inference-2023-09-04-16-49-09-918: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint."
Can you check if you are using the right model? Do you need to authenticate with the Hugging Face model repo? Please look at the logs in CloudWatch in the AWS console.
checkpoint = "MBZUAI/LaMini-T5-738M"
How can you fine-tune this model with your own data?
1. Prepare the data in Alpaca format. 2. Spin up a machine like g5.2xlarge or above. 3. Fine-tune using PEFT and QLoRA.
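Steps 1 and 3 above can be sketched as follows; the prompt template is the standard Alpaca layout, and the LoRA hyperparameters are typical values, not ones confirmed in the video:

```python
# Sketch of the Alpaca data prep (step 1) and a typical QLoRA adapter config
# (step 3). Field names and hyperparameters are illustrative assumptions.

_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}"
    "\n\n### Response:\n{output}"
)
_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n### Instruction:\n{instruction}"
    "\n\n### Response:\n{output}"
)

def format_alpaca_record(instruction, output, input_text=""):
    """Render one training record in the Alpaca prompt format."""
    if input_text:
        return _WITH_INPUT.format(
            instruction=instruction, input=input_text, output=output
        )
    return _NO_INPUT.format(instruction=instruction, output=output)

def build_lora_config():
    """A typical QLoRA adapter config (requires the peft library)."""
    from peft import LoraConfig  # lazy import; not needed to read the sketch
    return LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
```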
Hi, thanks for the valuable video. My doubts are:
1. How did you handle the error on Lambda related to the IAM policy? Is it specifically for accessing SageMaker endpoints here?
2. For getting the API response, do we not need any Flask or FastAPI implementation?
Can you guide me on this? Waiting for your responses and videos.
Hi Venkatesan, you have to attach the policies in IAM for Lambda, S3, SageMaker, etc. For getting an API response, you can deploy a microservice as well. I created a function URL from Lambda that I can use in any of my apps through a backend like FastAPI, Flask, Streamlit, etc.
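Calling such a function URL from a Python backend can be sketched with the standard library alone; the URL below is a placeholder, and the `query` parameter name assumes the Lambda reads `queryStringParameters['query']`:

```python
# Sketch of calling the Lambda function URL from any Python backend
# (FastAPI, Flask, Streamlit, ...). The URL is a placeholder assumption.
import json
import urllib.parse
import urllib.request

FUNCTION_URL = "https://example-id.lambda-url.us-east-1.on.aws/"  # placeholder

def build_request_url(base_url, query):
    """Append the prompt as a URL-encoded `query` string parameter."""
    return base_url + "?" + urllib.parse.urlencode({"query": query})

def ask_llm(query):
    """Send the prompt to the function URL and decode the JSON response."""
    with urllib.request.urlopen(build_request_url(FUNCTION_URL, query)) as resp:
        return json.loads(resp.read())
```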
@@AIAnytime Thanks for the kind response. Is Lambda mandatory here, or can I use the inference endpoint from AWS SageMaker directly in FastAPI?
I am getting error here-
model = llm_pipeline()
generated_text = model(input_prompt)
print(generated_text)
ValueError: The following `model_kwargs` are not used by the model: ['return_full_text'] (note: typos in the generate arguments will also show up in this list)
I am also getting an error here.
where is the code?
Bro I am not able to add the policy, can you help?
Why are you not able to attach policies? Can you open an issue on GitHub repo of this video and put some screenshots so I can help you debug?
How to fix "Internal Server Error" ?
Can you paste the complete error trace?
How about on Google Cloud
Very soon.....
Where is the code repo on your GitHub?
This should be
Is this serverless?
Yes it is.....
Unable to understand clearly because of the video quality; please provide a higher-quality video.
Sure... Thanks for the feedback
When will you begin to look like your Avatar photo? 😝
Haha... I usually keep like that. Let's c 🔜
Hi, thanks for the video, it teaches a lot. I just want to know, what is the ideal notebook instance to load and deploy the StarCoder 15B model? At first, I tried with an ml.g4dn.xlarge instance but got an "out of memory" error.
I've got this error. How do I solve it?
Test Event Name
generateTestResponse
Response
{
    "errorMessage": "'queryStringParameters'",
    "errorType": "KeyError",
    "requestId": "4196dde2-b2e7-4863-afa7-f2a67129021b",
    "stackTrace": [
        "  File \"/var/task/lambda_function.py\", line 10, in lambda_handler\n    query_params = event['queryStringParameters']\n"
    ]
}
I'm getting the same error, did you find anything? @AIAnytime can you please check?
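A likely cause, offered as a guess: the default Lambda console test event has no `queryStringParameters` key, so indexing the event dict raises a KeyError. A defensive lookup avoids it:

```python
# Defensive fix for the KeyError above: console test events (and some
# invocations) carry no queryStringParameters key, so don't index directly.
def get_query_params(event):
    """Return the query-string parameters, or an empty dict if absent/None."""
    return event.get("queryStringParameters") or {}
```

Using `get_query_params(event)` in place of `event['queryStringParameters']` lets the handler run with any test event, and missing parameters can then be handled explicitly.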