again, awesome video, thank you very much! :) you make everything very clear
I appreciate it a lot!! Clear and easy to follow is my primary goal so thank you for calling that out.
Great summary. You could also add one more key point: focus on data. You need all your foundational data in one place so you can run agents over everything at the same time, efficiently. This is key - in my opinion, SaaS will die for exactly this reason: you can never maximize yield with data scattered across dozens of APIs. With a focus on data, it's no longer local vs. cloud - it's owned vs. not. For example, you can run your entire open-source stack with your own data in a cloud Kubernetes cluster (or even a local Kubernetes cluster at home), use OpenAI's Strawberry for the hard stuff, Anthropic for coding, and your two RTX 4090s for a bunch of task-specific 24/7 agents that stay busy with simple tasks. With a stack and APIs like that, you could use voice-controlled Aider to code, test, and ship new versions of your software today - while driving home 😅
Thank you! And I love the picture you're painting here haha, that would be the DREAM...
You hit the nail on the head with SaaS - that's actually been the toughest part for me when I try to justify saving time by adopting another product/API/whatever into my suite of tools. Most of the time, it's best to keep the number of services to a minimum and host as much as you can!
I am actually working on a video showcasing using Docker compose to run a bunch of services locally for LLMs/RAG - that'll be a good start towards something like what you are describing!
@@ColeMedin Then let's make this dream happen! I have sent you an email - let's brainstorm what can be done about it!
Is there a best cloud RAG you'd recommend? If Chatbase is considered a RAG, the things I didn't like about it are:
- 11 million character limit (would be great to have something that just synced with your Google Drive)
- It would go outside your documents to answer questions (even when it was told not to)
The convenience was great though.
Awesome video, thanks for breaking this down!
For a cloud RAG solution I would highly recommend either Supabase (with pgvector) or Pinecone.
I have not actually heard of Chatbase before, but that certainly doesn't sound ideal that there is a character limit and that it answers from outside your documents even when told not to...
I actually have another video on my channel where I build a RAG AI Agent that syncs with a folder in Google Drive! So if you're looking for that and are down to implement something yourself, feel free to check that out:
ua-cam.com/video/PEI_ePNNfJQ/v-deo.html
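Under the hood, vector stores like Supabase's pgvector (which exposes this through SQL distance operators) and Pinecone are doing a nearest-neighbor search over embeddings. Here is a minimal pure-Python sketch of that idea with made-up 3-dimensional "embeddings" - this is an illustration of the concept, not actual Supabase or Pinecone API code:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_embedding, documents, k=2):
    """Return the k document texts whose embeddings best match the query."""
    scored = [(cosine_similarity(query_embedding, emb), text)
              for text, emb in documents]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]

# Toy 3-dimensional "embeddings" - a real system would use an embedding
# model producing hundreds or thousands of dimensions.
docs = [
    ("Invoice policy: net 30 days", [0.9, 0.1, 0.0]),
    ("Office dog policy",           [0.0, 0.2, 0.9]),
    ("Late payment penalties",      [0.8, 0.3, 0.1]),
]
print(top_k([1.0, 0.2, 0.0], docs))  # the two billing-related chunks rank first
```

A managed vector store does exactly this ranking, just at scale and with approximate indexes so it stays fast over millions of chunks.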
I think you are very right on this. Congratulations for your videos
Thank you very much!!
Can you please make a video on the best ways to monetize AI-based skills? This is still a very early phase of adoption for businesses, and most aren't even aware of how to integrate this technology into their operations.
We'd love to know how you're doing it! 🙂
I appreciate the suggestion! I am certainly open to making a video like this in the near future - just working on building up an audience first so people will care to hear about my strategies!
I have a MacBook Air M2, and most locally hosted LLMs run very slowly on it. You really need a good machine, so it's not an option for me right now! 😅
Yes that is true! Generally you'll need a GPU with at least 8 GB of VRAM to run a quantized 8-billion-parameter model like Llama 3.1 8B, and roughly 40+ GB of VRAM (e.g. two 24 GB GPUs) for a 70-billion-parameter model like Llama 3.1 70B, even quantized. So not cheap!
One fantastic option if you want to use open source models but not run them yourself is to go with Groq!
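As a rough rule of thumb (my own back-of-the-envelope numbers, not anything official): inference memory is roughly the parameter count times bytes per weight, plus overhead for the KV cache and runtime buffers. A quick sketch:

```python
def vram_estimate_gb(params_billion, bits_per_weight=16, overhead=1.2):
    """Very rough VRAM estimate for LLM inference.

    bits_per_weight: 16 for fp16, 8 or 4 for quantized models.
    overhead: fudge factor for KV cache, activations, and runtime buffers.
    """
    # 1B params at 1 byte each is ~1 GB, so scale by bits/8.
    weight_gb = params_billion * bits_per_weight / 8
    return weight_gb * overhead

# Llama 3.1 8B at 4-bit fits comfortably on an 8 GB card;
# 70B needs ~40+ GB even at 4-bit, hence multiple GPUs or unified memory.
print(round(vram_estimate_gb(8, bits_per_weight=4), 1))   # ~4.8 GB
print(round(vram_estimate_gb(70, bits_per_weight=4), 1))  # ~42.0 GB
```

This is why quantization matters so much for local hosting: halving the bits per weight roughly halves the VRAM you need.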
What would a hybrid model look like? For example, the LLM in the cloud and RAG locally. Does that make any sense? Would that keep sensitive data on my local machine, or would it still be exposed? It could reduce costs, though, since I wouldn't need to buy any RAG services.
Great question! Unfortunately, having the RAG process local is not enough to keep your data private, because whatever is retrieved from the knowledge base is still sent to the cloud LLM as part of the prompt. RAG is what fetches the relevant document chunks from your database, but those chunks are then included in the prompt to give the model extra context.
That is true though that it could save you costs since you wouldn't have to pay for RAG services!
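To make that concrete, here is a toy sketch (my illustration, not anyone's real pipeline) of a hybrid setup: retrieval happens entirely locally, but the retrieved chunk text ends up inside the prompt string, and that prompt is exactly what would be sent to the cloud provider:

```python
def retrieve_locally(question, knowledge_base):
    """Naive local 'retrieval': return chunks sharing a word with the question."""
    words = set(question.lower().split())
    return [chunk for chunk in knowledge_base
            if words & set(chunk.lower().split())]

def build_prompt(question, chunks):
    """Assemble the prompt that would be POSTed to a cloud LLM API."""
    context = "\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"

kb = ["Acme Corp revenue was $2M in Q3", "The office closes at 6pm"]
question = "What was revenue in Q3?"
prompt = build_prompt(question, retrieve_locally(question, kb))

# The sensitive chunk is now inside the outbound prompt:
print("revenue was $2M" in prompt)  # True
```

So local RAG saves you the cost of a hosted vector database, but the privacy boundary is set by wherever the LLM runs.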
I am failing to understand the benefits and differences of self hosting vs managed hosting... can someone please explain to me or refer me to a guide? much appreciated
Great question!
Pros of self hosting:
- Highest level of privacy
- No recurring costs for the infrastructure (besides electricity)
- You can use the machine for other things if you want (like I use my PC to run LLMs and edit my videos)
- Most control over the exact hardware
Pros of managed hosting:
- Lower up front cost
- Easier to scale by moving to a more powerful machine or spinning up more instances
- You don't have to maintain any physical hardware
I think the hard part is when a client asks: how much will the inbound and outbound traffic cost monthly? Of course, it depends. I'm working on a table of users + amount of content, but it would be awesome to have a video about it: models, quantities, prices...
Yeah great point! So you're looking for a video specifically on how to track token usage/cost when using different models for various use cases?
@@ColeMedin No! I already do that with LangSmith. I'm just trying to answer, with some kind of measure, how much a client is going to spend. How would you reply to that question? With a table? Which variables should that table have? Users? Length of the questions? Models used? I know it's not easy.
@@tecnopadre Oh okay, sweet! Now I see what you mean.
I would have the client estimate how many requests they think the system will receive (put that on them), then estimate the tokens per request (based on the system message, the number of documents retrieved for RAG times the chunk size, what you expect the output to look like, etc.), and then list out in a table the token-based cost for various models. So keep it simple in terms of the number of variables!
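That estimate can be sketched as a small calculation. The request volume, token counts, and per-1M-token prices below are hypothetical placeholders - plug in the client's real numbers and each provider's current pricing:

```python
def monthly_cost_usd(requests_per_month, input_tokens_per_request,
                     output_tokens_per_request, price_in_per_1m, price_out_per_1m):
    """Estimated monthly spend for one model, given per-1M-token prices."""
    input_cost = requests_per_month * input_tokens_per_request * price_in_per_1m / 1_000_000
    output_cost = requests_per_month * output_tokens_per_request * price_out_per_1m / 1_000_000
    return input_cost + output_cost

# Input tokens per request = system message + retrieved RAG chunks + user question.
input_tokens = 500 + 4 * 1000 + 100   # e.g. 4 chunks of ~1000 tokens each
output_tokens = 400

# Hypothetical prices (USD per 1M input/output tokens) - check real pricing pages.
models = {"big-model": (5.00, 15.00), "small-model": (0.15, 0.60)}

for name, (p_in, p_out) in models.items():
    cost = monthly_cost_usd(10_000, input_tokens, output_tokens, p_in, p_out)
    print(f"{name}: ${cost:,.2f}/month")
```

Running this for a range of request volumes gives you exactly the kind of table a client can react to: one row per model, one column per usage tier.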
Why not FAISS?
Dude, fantastic question! FAISS is awesome.
The two biggest reasons are that FAISS isn't something you can plug into n8n super easily and I wanted to just stick to everything that is packaged in this local AI starter kit.
But I will 100% be making content around FAISS in the near future!