Amazing content!! You deserve more subscribers. This is exactly the kind of tutorial I've been looking for, but NO ONE EXPLAINS THIS. There are hundreds of videos on deploying open-source LLMs locally, but almost no in-depth, high-quality info on deploying to remote servers on AWS, especially for the masses. You earned a new subscriber with this one, keep making great tutorials!
Thanks for sharing ❤️👍
Underrated channel, this is fire content!
So glad you liked this video!! Thanks so much for the kind words and support! 🤩🔥💯🙏
Let's goo. Keep pushing content. Great stuff brother.
Thanks so much, bro!! 🤩🔥 This was a really fun video to put together and I learned a ton in the process! 💡
That's brilliant... I was looking for this!
Outstanding.
Thanks, very interesting bro ❤
Thanks so much for checking out this build! Really glad you enjoyed the AI/ML content! 🙏🔥
Let’s gooo
Wooo! 🔥
I'm trying to create 100,000 reliable tutorials for hundreds of complex software packages like Photoshop, Blender, DaVinci Resolve, etc. Llama and GPT don't give reliable answers, unfortunately. Do you think fine-tuning Llama 7B would be enough (compared to 70B)? Do you know how much time/data that would take?
I also heard about embeddings but couldn't get them to work on a large dataset. Would that be a better option? We have at least 40,000 pages of documentation and I don't know which approach is better.
Really interesting use case (lots to share on this below... 👀)! Llama 13B (which I used in my tutorials) is pretty solid. Jumping to 70B might be overkill in terms of time and resources, especially if you're initially testing feasibility. I'd say test the waters with something smaller like 7B, or 13B like I used, and then decide. There's an inherent trade-off between model size and quality: Llama 70B will generally outperform Llama 7B due to its larger parameter count, but the improvements may be marginal beyond a certain point, and the cost in computation and time can be disproportionately higher for the 70B model. That's where 13B could be the happy medium for testing. Once you've settled on what you want to test for, maybe quickly run a 70B build for a bit and see if the performance is any different. Just keep costs in mind, of course!
Related to embeddings - I've seen the debates too! RAG is awesome, but there are some quirks, especially when handling broad queries. Augmenting LLMs with RAG is particularly effective for specific tasks, but it comes with inherent challenges like the ones you shared. It's all about how you chunk and index your data: make sure your tutorial chunks are bite-sized to get the most out of RAG. RAG handles localized info retrieval well, but struggles with broader queries that require scanning the entire dataset, especially at the scale you're describing (in the 100,000s).
Overall, I'd say start small, test it out, then scale. And with your 40k pages of docs, you've got a goldmine to work with! 💎 Please let me know how you get along with this! Curious to hear how it goes and what you build! 🛠
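To make the chunking point above concrete, here's a minimal sketch (pure standard library, just for illustration) of splitting docs into overlapping, bite-sized chunks - the step that tends to matter most for RAG retrieval quality. In a real pipeline you'd feed these chunks to an embedding model and a vector store; `chunk_size` and `overlap` here are arbitrary example values you'd tune for your docs.

```python
# Word-based chunking with overlap: neighboring chunks share `overlap`
# words so that retrieval doesn't miss answers that straddle a boundary.

def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into overlapping word-based chunks."""
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

doc = ("word " * 500).strip()  # stand-in for one page of documentation
chunks = chunk_text(doc, chunk_size=200, overlap=40)
print(len(chunks), "chunks; each shares 40 words with its neighbor")
```

The overlap costs some extra storage and embedding compute, but it's usually worth it for dense docs like software manuals.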
Can you explain how many concurrent requests a g5.12xlarge instance can handle when using the Llama 2 7B or 13B model? What would be a solution for such scenarios?
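Not a definitive answer, but a rough back-of-envelope sketch: a g5.12xlarge has 4x NVIDIA A10G GPUs (24 GB each, 96 GB total). With fp16 weights (~2 bytes per parameter), the memory left after loading the model goes to per-request KV cache, which is what bounds concurrency. This ignores framework overhead, activation memory, and sharding inefficiency, so treat the numbers as loose upper bounds:

```python
# Hedged capacity estimate: concurrent 2048-token sequences that fit
# in GPU memory on a g5.12xlarge, assuming fp16 weights and KV cache.

GPU_MEM_GB = 4 * 24  # g5.12xlarge: 4x NVIDIA A10G, 24 GB each

MODELS = {
    # name: (params in billions, transformer layers, hidden dim)
    "llama2-7b": (7, 32, 4096),
    "llama2-13b": (13, 40, 5120),
}

def max_concurrent(name, context_len=2048, bytes_per_param=2):
    params_b, layers, hidden = MODELS[name]
    weights_gb = params_b * bytes_per_param  # fp16 weights, in GB
    # KV cache per token: 2 (K and V) * layers * hidden * 2 bytes (fp16)
    kv_per_seq_gb = 2 * layers * hidden * 2 * context_len / 1e9
    free_gb = GPU_MEM_GB - weights_gb
    return int(free_gb // kv_per_seq_gb)

for name in MODELS:
    print(name, "~", max_concurrent(name), "concurrent 2048-token sequences")
```

Real throughput will be well below these ceilings, and a serving stack with continuous batching (e.g. vLLM or TGI) makes a bigger practical difference than the raw memory math.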
I have used a smaller instance and I'm running into issues with multiple requests, as the instance memory is insufficient to handle them.
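One common mitigation when memory can't hold many simultaneous requests: cap in-flight inference with a semaphore and let excess requests queue, instead of every request hitting the model at once. A minimal standard-library sketch - `MAX_CONCURRENT` is a made-up value you'd tune to what the instance's memory actually supports:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 2                 # made-up cap; tune to instance memory
gate = threading.Semaphore(MAX_CONCURRENT)
lock = threading.Lock()
in_flight = 0
peak = 0

def handle_request(prompt):
    """Run 'inference' only when a slot is free; excess requests wait."""
    global in_flight, peak
    with gate:                     # blocks while MAX_CONCURRENT are running
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(0.01)           # stand-in for model inference latency
        result = f"reply to {prompt}"
        with lock:
            in_flight -= 1
    return result

# 8 worker threads fire 10 requests, but at most 2 run inference at once.
with ThreadPoolExecutor(max_workers=8) as pool:
    replies = list(pool.map(handle_request, [f"q{i}" for i in range(10)]))

print("peak concurrency:", peak)
```

This trades latency for stability: requests beyond the cap wait in line rather than crashing the instance with an out-of-memory error.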
Amazon SageMaker is good for a few users, but when the number of users reaches 10,000 or 100,000 it is no longer useful.