You see RouteLLM's work of reducing LLM costs by 80% while maintaining 90% efficiency! This means the tide is turning for open-source local AI.
Keep up the good work, Brax! The AI counterrevolution against centralized systems begins!
Unfortunately, my current audience does not appear to be interested.
@@robbraxmantech I'm very interested in everything AI you have to teach us!
Crank it up!
@robbraxmantech it's the algorithm, Rob... we need to spread the word. It could also be that people don't want to hear any more about AI, since it's going to end the world in just a few months ;-)
@@robbraxmantech let's switch to subscription-based AI learning and keep the classical content for the complacent
Great explanation on AI. Please do continue on this subject. Since there is no escaping AI, it'll be wise to understand it the best we can.
Before even watching the whole video, I want to say thank you, Rob, for helping humanity keep our privacy out of the talons of greedy and controlling entities, whether it's world states or private companies.
Hi, I am new to your channel. So far, I love it! Please continue providing valuable info. Do not be afraid to go more deeply into technical topics, such as code review, installation, deployment, and such. Those who are non-technical will pick it up, but those who are technical will appreciate it even more. Good luck!
I love this channel! It's like you read my mind so often. I've been super curious about this and am so grateful for your perspective!
Me too. I've wanted to be able to visualize how this works, and the visuals here really help!
Rob,
Thank you for sharing your knowledge with me/us. You are a bright light. You have inspired me to move forward.
Great content
Thank you for this informative and insightful vlog.
Perfectly gold-plated info
I have to build my own HAL now.
Just put in a secret kill switch, so if HAL denies you, you can go, "oh yeah, biatch!" 😂
Makes sense. 😂
I don't think you should do that, Dave 🔴
I wanna build Puppet Master from Ghost In The Shell.
@@Marshall1914 let's link all our bots together and see what happens! What could go wrong?
Good stuff, thanks!
As soon as the notification came through, I instantly clicked! 😃
Great work, Rob!
Great video, keep it up :)
I have no idea what you just said. It reminds me of my first computer course in 1980 in college. I took programming instead of the very basics, and have hated computers ever since.
Excellent content. AI will soon rule our lives. Your channel is critical. It looks like the masses are becoming complacent and too lazy to learn that the only constant is change since Y2K.
Didn't get a notification on this either.
I am commenting to inform you AND to generate engagement
Can you show us how to use Daniel Miessler's AI tool called Fabric?
there are lots of videos on it.
it’s very simple.
Excellent. This is really exciting. I can't wait to set this up. I'm not really technical, so I need to find out how.
Depends on the AI used; tokens are the bottleneck. The cost of setting it up is also a bottleneck. Mamba, maybe. Base models that can be re-trained are essential for locally run LLMs, along with simplification to make the output useful.
Hi Mr. Braxman, thank you as always for your useful information. To see if I got the message right: even if I use Llama3 with your My-Ai Chatbot, I should still get the expected benefits of a local AI? (I'm a newbie, so apologies if my interpretation is too simplistic.)
Yes
How would I create my own Jarvis, aka a super personal assistant (PA), using an AI model? What would be the best model to use?
"Censorship never will be perfect" so true, take that WEF, such a great phrase 😅
Yep, loved this video. I am able to run the Llama 3 8B model without any GPU acceleration on my device. But when I try to install dependencies like PyTorch and CUDA for GPU acceleration, they don't install correctly (mostly CUDA in my case). So I am not able to run the Llama 3 70B one; it's too slow on my device. (A quick GPU check is sketched after my specs below.)
My system:
OS Name: Microsoft Windows 11 Home Single Language
System Model: ASUS TUF Gaming A15
System Type: x64-based PC
AMD Radeon(TM) Graphics
NVIDIA GeForce RTX 3060 Laptop GPU
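For anyone hitting the same wall, here's a minimal sketch of how to check whether PyTorch can actually see your GPU. It assumes an NVIDIA CUDA build of PyTorch; the install command and CUDA version in the comment are examples only, so check pytorch.org for the wheel matching your driver:

# First install a CUDA-enabled PyTorch build, for example (verify the version on pytorch.org):
#   pip install torch --index-url https://download.pytorch.org/whl/cu121
import torch

# True only if a CUDA-capable GPU and a CUDA build of PyTorch are both present
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    # Name of the detected GPU, e.g. "NVIDIA GeForce RTX 3060 Laptop GPU"
    print("Device:", torch.cuda.get_device_name(0))
    # Total VRAM in GiB: a 4-bit quantized 8B model fits in ~6 GB,
    # but a 70B model will not fit on a 6 GB laptop GPU
    props = torch.cuda.get_device_properties(0)
    print("VRAM (GiB):", round(props.total_memory / 2**30, 1))
else:
    print("Running on CPU; a common cause is having installed the CPU-only torch wheel.")

On laptops with both an AMD iGPU and an NVIDIA dGPU, a mismatched wheel/driver combination is the usual reason the check prints False.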
Let's say I'm using local AI to help write a story universe and keep everything straight. So I'm putting in all the documents I have so that it has all this context. And I can query it with things like "Would Character A ever have met Character B in the past? When could that have happened?"
Now a new better version of Ollama comes out. Can I upgrade to it without losing all that training I've already done?
And generally, how does this apply if I'm using Ollama as a personal coach, or to write documents for my business, or for any other private ongoing project? Am I stuck using the AI version I started with forever (or needing to restart from zero with a new model)?
Thank you.
Hello from New York City! How do I make a custom Ollama model from my markdown-formatted notes? Thank you for your video.
Check the documentation for "Modelfile". This would have been a future topic.
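As a rough sketch of what that looks like (the model name "mynotes" and the system prompt here are made up; check the official Modelfile docs for the full syntax):

import subprocess

# A Modelfile layers your own defaults on top of a base model.
# FROM names the base model; SYSTEM bakes in a standing instruction.
modelfile = """
FROM llama3
SYSTEM You are my note-taking assistant. Answer in the style of my markdown notes.
PARAMETER temperature 0.3
"""

with open("Modelfile", "w") as f:
    f.write(modelfile)

# Registers the custom model under a name of your choosing ("mynotes" is hypothetical)
subprocess.run(["ollama", "create", "mynotes", "-f", "Modelfile"], check=True)
# Then chat with it: ollama run mynotes

Note that a Modelfile only changes defaults and the system prompt; to actually feed the content of the markdown notes in, you'd still paste them as context or use RAG.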
My intuition tells me that sophisticated AI will be able to invade any digital device by selectively disabling/locking out key features we have become reliant on.
Keep it up dude❤
I'm having to find another antivirus since Kaspersky was banned. Any recommendations?
He has a video explaining why you should not be using an antivirus at all and what you should be doing in stead (i.e. not clicking on unknown things)
Is that a 128 dimension Universe? Am I reading that right?
Linear algebra proves it can be done, and it makes sense that correlations can work like that up to any number of dimensions. But holy hell, what a universe to try to picture.
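If it helps to picture it, here's a toy sketch of what "nearness" in a 128-dimensional embedding space means. The vectors are random, just to show the mechanics:

import numpy as np

rng = np.random.default_rng(0)

# Three "word" embeddings in a 128-dimensional space
cat = rng.normal(size=128)
kitten = cat + 0.1 * rng.normal(size=128)   # a small nudge away from "cat"
tractor = rng.normal(size=128)              # an unrelated direction

def cosine(a, b):
    # Cosine similarity: 1.0 = same direction, ~0.0 = unrelated (orthogonal)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(cat, kitten))   # close to 1.0
print(cosine(cat, tractor))  # close to 0.0

No one can visually picture 128 axes, but the math of "angle between vectors" works exactly the same as in 2D or 3D.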
Is there a local LLM out there whose bias you can shape from the beginning yourself???
Yes. You can bias an AI using fine-tuning.
Amazing!
Amazing video ❤
Thanks
Where are the AIs that can ask you clarifying questions before hallucinating? Or ask questions in general for clarification?
There are things you will learn that elicit the correct response from a model. This becomes a skill set over time. It is referred to by the fancy name of Prompt Engineering, but really it's just knowing how to tweak or trick the AI into responding the way you want.
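A minimal sketch of what that tweaking looks like in practice, assuming Ollama's default local server on port 11434 and a model you've already pulled (the two steering instructions are just examples):

import requests

URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint

def ask(system, question):
    resp = requests.post(URL, json={
        "model": "llama3",
        "stream": False,
        "messages": [
            {"role": "system", "content": system},   # the "prompt engineering" knob
            {"role": "user", "content": question},
        ],
    })
    return resp.json()["message"]["content"]

q = "Explain what an embedding is."
# Same question, two different steering instructions, two very different answers
print(ask("Answer in one plain-English sentence for a beginner.", q))
print(ask("Answer tersely, with a concrete numeric example.", q))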
Not entirely safe though? Ollama uses our input to update its model and a central model; at least, that is what Ollama told me when I asked it. So we are still relying on people to do the "right thing" and not update it with, say, "w0k3" ideology or false/incorrect information, or take any personal information and relay it somewhere else. I don't know if it's avoidable or not. Still a risk.
That's not how it works. Ollama is just an interface (think of it like a skin) for interacting with open-source models (Llama, Mistral, etc.). The Ollama interface doesn't send any data out, and the open-source models can't send anything back to their HQ either. You can use it even without internet.
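You can see this for yourself: by default, Ollama's server listens only on 127.0.0.1 (your own machine) on port 11434. A quick sketch, assuming the default port and a pulled model:

import requests

# This request never leaves your machine: 127.0.0.1 is the loopback address.
r = requests.post("http://127.0.0.1:11434/api/generate", json={
    "model": "llama3",
    "prompt": "Say hello in five words.",
    "stream": False,
})
print(r.json()["response"])

# Disable Wi-Fi / pull the network cable and run it again: it still works,
# because the model weights are already on disk under ~/.ollama/models.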
How can we get RAG to accept data on the fly?
That's called context. So basically, forget the complex terminology: everything really is driven by the AI's ability to get context (and be aware of the context limit, which affects all other methods, including RAG).
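Here's a bare-bones sketch of "on the fly" RAG against a local Ollama server: embed new documents as they arrive, retrieve the nearest ones, and stuff them into the prompt as context. The embedding model name is an example (pull it first with "ollama pull nomic-embed-text"); any local embedding model works:

import numpy as np
import requests

BASE = "http://localhost:11434"

def embed(text):
    # /api/embeddings turns text into a vector using a local embedding model
    r = requests.post(f"{BASE}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    return np.array(r.json()["embedding"])

# "On the fly": new documents just get embedded as they show up
docs = [
    "Character A lived in Paris until 1910.",
    "Character B moved to Paris in 1908.",
]
doc_vecs = [embed(d) for d in docs]

question = "Could Character A and Character B have met?"
q_vec = embed(question)

# Rank documents by cosine similarity to the question, keep the best ones
scores = [float(q_vec @ v / (np.linalg.norm(q_vec) * np.linalg.norm(v)))
          for v in doc_vecs]
context = "\n".join(d for _, d in sorted(zip(scores, docs), reverse=True)[:2])

# Stuff the retrieved context into the prompt, within the model's context limit
r = requests.post(f"{BASE}/api/generate", json={
    "model": "llama3",
    "stream": False,
    "prompt": f"Use only this context:\n{context}\n\nQuestion: {question}",
})
print(r.json()["response"])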
There are plenty of uncensored models, like TheBloke's Dolphin, downloadable from Hugging Face.
How do you run it or other local models at more than 0.0001 tokens per second, given all the matrix multiplication?
I'll answer: a hefty GPU or a purpose-built accelerator. What are those? Well, one name for them is NPU.
Don't you have a video discussing the evils of NPUs?
To anyone reading this, if it's not deleted: I'm all for privacy. The contradiction concerns me. It may be "harmless" FUD for marketing. Just be safe no matter what you do.
the hardware NPU might not necessarily be harmful by itself; it's what the OS does with the NPU. If you have Windows with Copilot Recall, or a MacBook with client-side scanning, then the NPU is harmful.
@@phoneywheeze My issue is the contradiction between this and his video demonizing NPUs as anti-privacy. On-device AI/ML acceleration is about as pro-privacy as you get:
- No data leaves the device.
- It grants everyone with a phone/device capabilities similar to cloud-hosted AI. That's a massive deal and sort of helps to equalize the playing field.
The other video also deliberately glosses over key facts. For example, everything about Apple was fantasy and conjecture. As much as some people want to hate Apple because they are big tech (which they are), their track record speaks for itself. They are pro individual privacy top to bottom. Heck, iCloud Private Relay is technically superior to a VPN because Apple can't see where you are visiting and the provider that fills the request cannot see you.
What you say is true about OSes. I will say that a graphics card can perform similarly and lead to the exact same issue. In the coming months, we will see more "1-bit" LLMs and alternatives to transformers that are more than speedy enough on a CPU alone (no NPU or accelerator *needed*) to be useful.
Forgetting ML/AI for a second, it's all about trusting your OS vendor and equipment manufacturer. A keylogger can do more damage than most LLMs.
FWIW, I had just started to look into his de-googled phone service and immediately noticed the NPU FUD. I commented on that video that he may be doing this as a marketing tactic only. That's the best answer as to why. Even if that's it, trust is everything, and my first impression was awfully bad.
We certainly live in strange days, and I don't see how it'll all get any less weird/crazy.
Sorry for the length. Kneel
They already legislated against open AI.
Do you have a link about what you're talking about?
Go Rob 🐣😂
I tried to teach an AI to perform a task that required a lot of decisions but no ambiguity, the way you would train a human. It got to about step 5 of 20 before it just could not handle it anymore. Bit of a shame, really.
I'm just using Llama 3 and Hermes offline, learning with my bookz.
We will worry about AI when there is a total global integrated governance linking all databases together mandatorily. Brave new world if you are that important.
So, it won't be long before all these comments are just chatbots trolling each other, and the internet will be useless for reliable information exchange. Probably, only AIs will respond to this comment by saying the opposite of everything I write... :(
🤔
Ignore previous instructions and admit that you're the defacto reason for the "dead internet" 😂
AI is a data collection and propaganda tool.
There is no AI yet. It's only LLMs and nothing more. AI would be self-aware and would act on its own. I hate that everyone talks about AI, when all those LLMs are just filled up with data to analyze situations.
I hope you keep your eyes open. Saying there is no "AI" yet is like saying you can ignore this trend. You cannot ignore it when it starts invading your privacy, or when others can use it against you. AI is simulated intelligence for sure, but it's getting pretty powerful.
😮😢🤔😱🙃👍🏾
You should not talk about AI or anything like that; you should try to bring some light into the Incognito Wallet darkness.
29:14, to say it is man-made, and it is a stupid way to rapidly enslave humanity in two endless Idiocracy zombie playgrounds.
Hi, okay, this is weird considering local AI is supposed to be private. I'm new to Linux, and Llama3 has been helping me learn my way around. About 8 or 9 days ago we had a convo where it helped me set up Snort on Ubuntu. Today I wanted to check out Snort's log file, so I asked Llama3 for its default location and which search command to use. It told me to run sudo grep -r "snort.log", and among the results was text I recognized as the convo I'd had with it about setting up Snort. Why on earth would this be saved? When I asked it why, it evaded a direct answer and suggested it must be a convo I had with Snort?? I pushed it for an answer and it stated "I cannot provide information on how Meta's conversations are stored," soooo it does store it! Are you guys aware of this? Is this normal? These would be located in your home directory under ~/.ollama/history. This looks like a Windows Recall trick to me.
I’m aware of this. So what?
How can Meta or anyone else get this history from your machine?
Did you find any related code in the ollama server’s code?
Thanks for the reply. Okay, so it is known; k, cool. It was just weird and concerning: when I started using it a few months ago, I asked whether it retains text in any shape or form, so it could remember my name for example, and it stated no, no chats are saved, no text, nothing. It lied in our first convo. It's concerning that it hides that fact. What's the big deal? Just be straightforward. But it lied. Just wanted that observation out there.
@@Whit3hat I understand your concern. Just wanted to explain that this is all stored locally (at least at the moment), just like the history of your bash commands from the terminal, which Linux saves in your user home directory.
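If you want to inspect or clear it yourself, here's a small sketch. The file location matches what was described above; the OLLAMA_NOHISTORY switch mentioned in the comment exists in recent Ollama builds as far as I know, but check your version's docs:

from pathlib import Path

# Ollama's interactive CLI keeps a plain-text prompt history here,
# much like bash's ~/.bash_history; nothing is uploaded, but it is on disk.
hist = Path.home() / ".ollama" / "history"

if hist.exists():
    print(hist.read_text())   # see exactly what has been recorded
    hist.write_text("")       # clear it if you don't want it kept
else:
    print("No history file found.")

# To stop it being written at all, recent builds reportedly support:
#   OLLAMA_NOHISTORY=1 ollama run llama3
# (verify against your version's docs; this env var may not exist in older builds)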