No hype marketing. Just the CEO giving us an honest tech explanation. Framework is so cool.
Nirav Patel made a netbook-sized laptop!!!!
Nirav Patel is always such an inspiration. A geek who made his own company and is still that same geek who interacts with the community and improves the product every single year. I know Framework will never try to sell "AI laptops" to just be on the money train. And this video was super informative and down-to-earth!
more geeks should become ceos
Everyone: more AI now
Nirav: do you even know what AI is?
"Everyone" looks around for everyone.
@@keyboard_g vocal minority more like
"Everyone" as in greedy tech investors who care too much about short term profit to think about whether the way this 'AI' was built is even legal at all, or if maybe it is in fact the largest instance of mass copyright infringement in the history of copyright and AI companies owe everyone whose work (not to mention private information) they stole royalties and damages.
@@ninjadodovideos I'm not averse to saying that access not JUST to such a vast amount of aggregated knowledge, but to a utility that can process and integrate it for anyone in seconds, is (and will be) more beneficial and valuable to humankind in total than any patent or copyright infringement along the way. I go by the adage " 'Strong' is what we make each other. ". 99% of capitalism's philosophy is predicated on this all being a zero-sum game, which has never been true. Wealth is made by doing more with less. This technology is our path to doing almost everything with almost nothing. The birth of the age of abundance is upon us.
@@ninjadodovideos Tantacrul just released a massive video about facebook's shady practices and he gets into AI in the final chapter of that video... truly disgusting stuff...
Big tech has been chasing profits without paying for the consequences of their actions for too long, and hopefully AI will be the last straw in that fight for people's rights and privacy but I'm pessimistic in our current state...
@ExtantFrodo2 This reads like AI-generated slop, which proves our point if anything...
I installed GPT4ALL and Ollama on my Framework 13. So my Framework 13 is now an AI PC.
More importantly, I just upgraded from a 4-core CPU to 12 cores, so it is arguably the best AI PC, because I can upgrade it!
This is how they win. The toasters gain the ability to repair themselves ;)
This is awesome. You can even upgrade the cpu. Really out of this world
I really wonder what a Framework Qualcomm PC would look like..
@@iroar5982 so it's basically swapping out the whole motherboard, not just a chip like on desktop PCs!
This format works so much better than the awkward interview format.
i love this.
framework coming and being like, oh, you want AI? why are you asking, you can already have AI. and then, proceeding to explain how to run ollama.
as a long time ollama user, this makes me really happy.
It's an excuse to jack up the prices of hardware for NPUs that have zero use case for 99% of consumers
I find it difficult to believe that CPU and SoC OEMs would sacrifice performance for that.
if they had an OCuLink add-on I'd buy one straight away
I still don't believe raytracing has value for gamers. Just give us better framerates and higher resolutions, make 4k affordable for everyone. No need for car reflections.
RTX only exists because the same tech can be sold as AI chips, so it's paying for its development
And make it way easier to tag data and spy on people
Exactly. AI models already run on anything with a reasonably okay GPU. We don't need to pay hundreds of dollars more for increasingly complex, harder to repair hardware, that can't be used for anything outside of those AI models.
this is probably the best ad for Framework's morals possible
Did Nirav, the effin CEO of Framework, just make a video to talk to us about something... he just wanted to talk about? How cool is that?!
I like the awkward interviews too!
Please dont turn into a cultist/tech ceo worshipper
I'm tired of iFruit fanboys and Musky-Tesla slaves
@@The-Cat it's impossible to make a cult around him as long as he stays this way. People just appreciate him because there aren't many like him around
@@The-Catjust because people like him more than most other CEOs doesn't mean that they are in a cult.
Though I do see your point and hope it doesn't go that way either.
My respect for Framework and Nirav Patel has just been cemented. What a great video on the topic and such a gift to the community! SW engineer for 30+ years here, and this was such a soothing relief compared to the marketing hype and meaningless drivel I've seen from so many other companies.
13:06 Hahaha, the dig against Acer and Toshiba! 😂 "Wow, this is even worse!"
RIP Toshiba
I like this company a lot. They're not shoveling hype and have made a dent. I'll be interested to see how their pricing model adjusts once they have enough critical mass. For me right now their pricing is a barrier to entry.
Repair friendly hardware will always carry a premium. Manufacturing wise, soldering things to the PCB is cheaper. I don't think many people appreciate that. For the consumer though, the lifetime cost of ownership is a lot lower (also cheaper in terms of non-monetary things like aggravation and time cost when things do break). Unfortunately this is not reflected in sticker price. It's difficult to convey this at the time of purchase. For the consumer who cares, maybe one way to get a sense of this is to look up repairability scores from sources like iFixit.
You're probably already aware of this if you follow their activities, but in case you aren't: Framework sometimes has old stock at steeply discounted prices, like 11th gen laptops going for like $600 or 500 or so when they were being phased out iirc, which is a super competitive price on a workhorse laptop for the modest user :)
Higher processing power is more expensive for sure, but eGPU's can be a decent solution if some portability can be sacrificed while still being far more portable than a desktop :)
Hope things can work out for you soon!
CEO who can type 🤙🏿🔥
What a blast. This is anything but ordinary.
Thank you Nirav for this introduction.
I just went by your steps, installed ollama and asked it - `what is framework laptop`
And it looks like my llama is from a parallel universe))
This is the response (shortened):
``` Framework is a brand of modular laptops that were designed to be highly customizable and upgradable. The idea behind the Framework is to create a platform where users can easily upgrade individual components of their laptop, rather than having to replace the entire device.
However, in March 2022, it was announced that the Framework would no longer manufacture new laptops under this model, citing constraints and a shift towards more traditional computer manufacturing.
Although the original Framework laptop is no longer available for purchase, some third-party companies have started to develop their own modular laptop designs inspired by Framework's concept. These newer products may offer similar upgradeability features, but with different approaches and designs.
It's worth noting that the Framework has shifted its focus towards developing custom hardware solutions for businesses and organizations, such as custom-built laptops and other devices, rather than consumer-facing products.
```
I would like our universes not to intersect in the future.
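For anyone else following along, the setup really was just this (a sketch assuming Linux; the install script comes from ollama.com and may change over time):
```
# Install Ollama, pull and run the model, then type the question at the prompt:
curl -fsSL https://ollama.com/install.sh | sh
ollama run llama3.1
# at the >>> prompt, ask e.g. "what is framework laptop"
```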
The most personable CEO in consumer hardware (tech?)
You've gained my respect Mr Framework CEO
More Nirav rants! More R2R! Happy framework owner here!
As someone who strives to build autonomes robot, AI does play a crucial part in my academic research. But in most cases I'd use the GPU server from our institute to train the reinforcement learning policy. Better linux support somehow becomes the most important factor when I consider a new laptop, and I'm glad I chose framework 13.
Nirav seems like a really great CEO. Gives me a lot of confidence in his product and vision.
But Nirav, it can't automatically take screenshots and sell my information (for someone else) if it isn't running in the background all the time!
LOL XD.
Ask it to write a script to sell your data! Problem solved.
I love the realness of this demo because the model doesn't know about Framework. Good demo.
I love that Framework is using their channel to actually make these types of videos just exploring a topic that applies to their products
Great intro to the topic! I would just add that LLMs don't have "knowledge" or reasoning per se, but rather try to generate the statistically best response to your inputs. I've heard it described as "autocorrect on steroids", which pretty much sums it up. That's why they sometimes give nonsensical or, even more dangerously, reasonable-sounding but confidently wrong answers. It's an important distinction that highlights why no one should rely on LLMs for anything mission-critical.
LLMs are also quite good at transforming information, that's where RAG (retrieval-augmented generation) and tools come into play. Instead of asking the model a question and relying on its own "knowledge", you can tell it to fetch an article from Wikipedia, pick the information you're looking for, and summarize that information for you. That usually works pretty well. You can also give LLMs access to tools such as calculators to enhance their capabilities.
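A crude sketch of that retrieval trick against a local Ollama (assumes curl and jq are installed; the Wikipedia endpoint and page title are just examples):
```
# Fetch a Wikipedia extract, then hand it to the model as grounding context.
CONTEXT=$(curl -s "https://en.wikipedia.org/api/rest_v1/page/summary/Laptop" | jq -r .extract)
ollama run llama3.1 "Using only the following context, explain what a laptop is. Context: $CONTEXT"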
Also, one interesting reason LLMs tend to hallucinate is that they were trained to consider "I don't know" a bad answer, so they'll just make stuff up. Researchers from BuildIO recently found that simply telling the LLM in the system prompt not to hallucinate, and to just say so when it doesn't know or can't do something, actually works and significantly reduces those "confidently wrong" answers.
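You can bake that kind of instruction into a local model variant with an Ollama Modelfile; a hypothetical sketch (the wording and the model name are just examples):
```
# Create a variant of llama3.1 with an anti-hallucination system prompt:
cat > Modelfile <<'EOF'
FROM llama3.1
SYSTEM "If you are not sure of an answer, or cannot do something, say so plainly instead of guessing."
EOF
ollama create llama3.1-honest -f Modelfile
ollama run llama3.1-honest
```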
The delay before the second response is more likely caused by the model being unloaded from VRAM because it wasn't called for some time. Ollama lets you tweak that, and you can have the model stay in VRAM indefinitely.
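A sketch of the knobs involved (durations like "10m" also work; -1 means never unload):
```
# Keep every model resident for the whole server lifetime:
OLLAMA_KEEP_ALIVE=-1 ollama serve
# Or per request, via the local API:
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.1", "prompt": "hello", "keep_alive": -1}'
```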
Framework's direct and open communication is so refreshing! I love your mission and your responsiveness to the community. I don't have a framework laptop (yet) but I really enjoy the content. Keep it up!
That's actually a great summary of what a local LLM is, how to run them, and what the state of support is on consumer HW (like VRAM size and Nvidia vs AMD). It would have been useful back when I started to experiment with this stuff myself. :)
I didn't even know that I can do stuff like this. Thank you very much!!!
We need more companies like Framework laptop and more CEOs like Nirav. Great video, keep up the good work
Thanks for the demo Nirav! Robert Martin is a great pioneer when it comes to architectural modelling.
Thanks for the video, and glad to see such a level-headed take on AI, instead of just diving into the marketing hype like so many other companies! Also, from my personal experience, the reason why it's taking so long to respond at 7:33 might be related to Ollama aggressively trying to free up GPU resources and off-loading the model from the GPU when it doesn't get used for a few minutes. Any new prompt after a few minutes needs the model to be loaded back onto the GPU, which takes some time. I'm sure the screen capture is contributing to the delay as well, but I think it's primarily the model loading.
i just got llama running on my system after seeing this video. super interesting stuff. thanks for sharing.
Honestly this was 100% the best way to address this issue, showing the things that you could do with AI but not overhyping it. Showing the actual use cases (Running Local LLMS) is awesome and showing you how to do it is even better. I'm definitely excited to try out my own local LLMs too now
It is a joy to listen to you! A smart and realistic viewpoint. Subscribed immediately. Thank you :-)
Bahaha the ending was so funny, especially seeing Nirav smile more and more as he was going along with the upbeat script LOL
WOW!!! Great Video!!! I (quickly) read all of the comments and I want to repeat many/all of the positive ones!!!!
So many of the positive comments said exactly what I was thinking!!!! (and the few neutral/negative comments are just a bit of noise to me...)
Great timing too! I already own and love my Framework 13... and was considering getting a Framework 16 for running local models... so it was nice to see the actual performance!!!!
Well done all around!!!! (here is where I want to repeat all the glowing praise for the CEO, the company, the community, etc... but I will stop... and let the other comments speak for me)
Hi Framework! What I would really love is a better trackpad. Apple's Force Touch trackpad is unrivaled to me, and I wish you had a competitor, or at the very least allowed clicking anywhere on the pad.
I just got my AMD FW 13 a few weeks ago. Running AIs locally is a bit tricky, but way better than on my XPS 13.
Got to meet the framework team at LTX last year, cool team!!
This was very interesting - thanks !! And if you don't have a Framework laptop - order one ASAP - best company and computers ever !!
A bit off-topic, but I am so pumped for your RISC-V experiment. If I can afford it (price is still not announced?), I am getting one as soon as you start selling it.
I'm sick of AI at this point. I want to see a gaming chassis from you guys, and more GPU options for the Framework.
gaming laptops are just as much of a meme as ai is, lets be real
@@aeghohloechu5022 AI is 100% more of a meme at this point than a gaming laptop. Framework would have already had my money if they had offered more GPU options.
I mean more options in general are always better, but what would a gaming chassis have that the FW16 doesn't already have?
@@T0DD Racing stripes
The framework 16 seems to be a good option, with its support for dGPUs that don't have terrible cooling. What more do you need?
patel, my man. love it. make more of these
I really like ollama. I've been using it for a year to test all sorts of models, though usually I end up using Claude or OpenAI. Hopefully Cerebras will let us test Llama 405b soon, though a couple other slower providers have it already. (The newest o1 is pretty smart but extremely verbose.)
This is the way. We should all teach people how to use software instead of hyping up stuff without context. Good stuff Framework people, good stuff!
You know what's good content for this channel? Interviewing engineers and asking them what they love doing at Framework.
You can run a model on RAM and CPU. I believe it's achievable with the current architecture of the Framework 13 (it depends on the CPU's cache size for inference and the RAM size for storing the model), but for anything more technical than chatbot applications, using anything other than a dedicated GPU will need technical knowledge of programming/DevOps. The hype is just there to keep Nvidia's stock price afloat after the crypto bubble burst.
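If you want to try CPU-only inference explicitly, a hedged sketch via Ollama's local API (num_gpu sets how many layers get offloaded to the GPU, so 0 keeps everything on the CPU and system RAM):
```
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.1", "prompt": "hello", "options": {"num_gpu": 0}}'
```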
The battery saving on Lunar Lake looks promising. Putting aside the AI stuff, any plan to create a Framework laptop with Lunar Lake and Linux?
They’re only just releasing the previous gen, so expect it to take a while if they do.
The downside of Lunar Lake is the RAM is soldered on the CPU package meaning it's completely non-upgradeable. I'd expect Lunar Lake to be skipped and Arrow Lake to be the next Intel generation on Framework mainboards.
@@hyperspeed1313I’m not sure it’s as simple as “skipping Lunar Lake”. Lunar Lake, AMD Strix Point, Snapdragon X Elite, most ARM SoCs, most RISC V SoCs that you would use in this case all use soldered RAM. The same class of SoC that went into Lunar Lake and Strix Point (Intel and AMD’s mobile SoCs respectively) tend to have more carefully binned CPUs, integrated GPUs of decent performance, etc… The future successors to those chips will probably have soldered RAM as well.
Arrow Lake isn't really suited to a mobile format, and even if it's okay for workstation laptops / desktop replacements, it'll be dependent on having a dedicated GPU, meaning your Framework will be essentially sacrificing battery life (by having a dGPU) for upgradable RAM. Some consumers might make that tradeoff (and if they do, it'll work out really well!) but a lot of consumers who have laptops want them to be portable and have great battery life, which is what AMD, Intel, and now Qualcomm are keying their mobile SoC planning around.
So, is it really a great strategy to not give Framework customers access to best in class battery life?
Now, I don’t want that outcome personally. I love upgradeable RAM. But Framework is beholden to the SoCs that are available off the shelf in the wider market, and I’m not sure that failing to use what their competitors are using is really the best approach.
I think what’s more important than upgradable RAM is being able to upgrade the SoC, because a lot of consumers (most, in fact), even when buying a highly upgradable desktop will just buy RAM with their CPU, and never upgrade it. I think as long as Framework still lets you swap out the mainboard non-upgradable RAM isn’t necessarily the worst outcome possible.
@@hyperspeed1313 Even with the RAM not being upgradable, it would make sense to buy a Framework with it. For a laptop, I would take that tradeoff for the better battery life.
@@EugeneBuvard tf? That makes no sense at all, a Framework laptop is supposed to be flexible on upgradability, not the other way around.
A bit disappointed not to see image generation here as well.
Projects like Fooocus have made it very easy to set up and use, even on Windows, and even with AMD cards.
You’re fighting the good fight 🤲
Wish this video went more into the controversial aspects of LLMs regarding training sets, but otherwise I think this was a very level-headed video, especially coming from the CEO.
This changed my perspective on these "ai computers" a lot. It seems like it could be huge for any smaller companies looking to make their own AI platforms for people to use locally with their computers' files too.
I'd be interested in seeing the updated AMD CPU / GPU. AI or not.
Nirav sounds like a fantastic person!
WE LOVE YOU NIRAV
If you are running Ollama on Linux, make sure to get the "rocm" version of it so it utilizes the GPU. By default it runs in CPU mode, which is quite a bit slower than running on the GPU, though that can be useful if you have a really large memory pool for your CPU.
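For example (a sketch; package names vary by distro, this is Arch's split package):
```
# Arch ships the ROCm build as its own package:
sudo pacman -S ollama-rocm
# After a model is loaded, check whether it actually landed on the GPU:
ollama ps
```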
The main reason the second llama 3.1 demo took a while to get going is that Ollama is designed to run as a system service, so that it's always available to other software like Open WebUI or Continue. To make sure it doesn't constantly eat all your memory just waiting for a client to connect, it unloads the model after a short period of inactivity, and then loads it again once a request comes in. Loading models takes a while.
Although it is not the most lucrative market and the target audience will be very small in numbers, I really wish Framework would expand to India.
15:18 when chatGPT first came out, the slow "typing speed" output was a gimmick to make the output seem more effortful and thus something to take more seriously. (The delay up to the first output was actual processing.) Pretty sure it's *still* a gimmick - otherwise most laptops would be painfully slow in their output - so why do you (or anyone) put up with it, let alone praise how "fast" it is?
I thought they made the free version slow so they can upsell you a faster one for the low low price of $x per month
Wouldn't putting the FW16 on its side like in the background degrade the CPU cooling performance, since CPU cooling only uses side exhaust? I also wonder what position to use when I don't need to look at the screen but want the laptop to have better cooling than just sitting on a desk or couch.
the end is epic tho.😂
This was a great video, cool info, thanks.
I bet this CEO knows much more about AI stuff than some other random tech company CEO. He talks like a real practical dude, which makes him sound really trustworthy. I will consider a Framework laptop next time.
I like running whisper for speech to text on my machines. Runs well even on my phone, runs even better on my workstation. If the accelerators can be hooked up for that use case under linux, I'll be happy to see it.
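In case anyone wants to try it, a minimal local speech-to-text run with the openai-whisper CLI (whisper.cpp is a lighter option; the model size and file name are just examples):
```
# Assumes Python and ffmpeg are installed:
pip install -U openai-whisper
whisper recording.wav --model small --language en
```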
When are you going to enable the NPU on the FW16 in Windows?
Right to repair in the context of AI means AI that the end user can modify over time to ensure longevity of functionality, vs cloud hardware that can change on you whenever.
Hi, thank you for such an easy-to-understand description.
Cool, will you build PCs with the AMD HX 370 / 365 or Intel Lunar Lake?
I don't even care about AI, I just want a Ryzen HX 370 Framework laptop, because it's the first x86 processor ever made that absolutely destroys Apple Silicon.
Wow, I'm honestly impressed.
Really cool seeing the CEO like this. Looks like a great company.
I don't know much about ollama, but it doesn't upload any user data at all?
I love that you don't just jump on the AI hype train to get another selling point but that you can acknowledge the noise around it and focus on what's real 👏
Also "i use ollama from a group called ollama and they make ollama" was pretty funny to me😂
what are those vertical laptop stands in the background?
We designed them! www.printables.com/model/517170-parametric-vertical-laptop-stand
Hello guys at Framework, is an NVIDIA GPU not possible? My app (not games) behaves funky when using an AMD eGPU (not their iGPU).
Just asking; if it's not possible I'll just buy a dock to use alongside your 16", and if it is possible I'll skip the dock and wait.
It is technically possible. However, NVIDIA is strongly opposed to any new form factors, and they do not allow anyone to use their GPUs without permission.
@@linuxsbc That's a shame
@@linuxsbcIsn't that because Nvidia is a garbage company?
Nvidia more than likely will never allow it. However, I know that in the professional space AMD is working to combine their gaming tech and their server tech. So hopefully in the future your program (which sounds like it works better with CUDA) will work better.
Nvidia themselves don't want to support this because they wouldn't earn as much $$$ as they are right now.
Been loving my Framework 16 with the Bluefin community build. Using Ollama-Web has been helping with programming and much more. It's been fun to play around with.
Can you tell the model that the laptop logo is Framework and teach the model about the company?
If you replace the right ctrl key like how Microsoft made it into a copilot key, how would you rename and repurpose that key?
They won't have to on all models; they have a Linux keyboard that likely won't do it. And I don't think Microsoft requires all OEMs to replace that key with Copilot (the Minisforum V3 and Lenovo's ThinkPads add it between the Alt and Ctrl keys). I hate it when that Ctrl key is gone; I use it all the time.
GPU is cool and all, but what about the NPU in Ryzen? A GPU is limited to its VRAM, but the NPU can potentially use your whole RAM, 96GB or so. It would be awesome to be able to run Llama 3.1 70B on a Ryzen.
Anyone here used an LLM locally on a FW13 AMD 7640U? Curious what the performance on that is like (ideally with 24GB or more memory installed)
Secondly, Nirav is such a cool guy! I also would love more videos by him on any topic! We're really lucky he and his colleagues decided to make a company like this when they probably could've done something much less open and more lucrative.
You should be able to run the models he showed in the video above reading speed on pretty much any hardware.
If you don't fancy using the terminal, you can also host your own frontend such as Open WebUI or LibreChat.
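For example, Open WebUI on top of a local Ollama (a sketch based on their README; check the current docs, as flags may change):
```
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:main
# then browse to http://localhost:3000
```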
I just ran llama3.1 through Ollama like Nirav did, on my FW13 with a 7640U, 24GB of DDR5-5600 memory, and Windows 11. It seems to generate faster than I can read; obviously not as fast as the 7700S in the video, but it's quite usable. Keep in mind I'm too lazy to reach for my charger, so I've been running this on battery (~30-40%, too). Loading the 3.1 model seems to use about 6GB of RAM. I had a bunch of other things running, so my RAM usage went up to 91%; I would suggest 32GB or even way more if you're going to run other programs alongside the LLM. I also have the VRAM of the iGPU set manually to 4GB in the BIOS, so my total system memory is limited to 20GB. I could do more detailed hardware testing, but I used Task Manager since it was easier. From Task Manager I see it's using around 66% of the CPU and peaking at around 88%. Not sure if Task Manager is misreporting the iGPU, but it showed only about 8% usage. Overall it's not bad, but RAM might be a concern if you are running other programs.
Side note: My battery just dropped below 30% and it seems to be generating a tiny bit slower, but still a bit faster than I can read. CPU usage dropped to 40% average.
Can't speak to the 7640 in particular, but I'm using Llamafile on a 7840 with 32GB and it generally works well with anything 13B and below. That's around where you start getting perceptible latency for general use, though models around 7B will also see this with larger prompts like character cards.
Would also echo the bits about RAM usage when running other stuff. This works fine on its own but not so much with my tabbed browsing problem.
@@tivrusky4 you should give llama.cpp or one of its wrappers like Ollama a shot; quantized models lose very little performance at ~6 or 8 bpw and are noticeably lighter on RAM.
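e.g. pulling a more heavily quantized variant (the tag is illustrative; check the Ollama library for what's actually published):
```
ollama pull llama3.1:8b-instruct-q4_K_M
ollama run llama3.1:8b-instruct-q4_K_M
```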
@@lastrae8129 Llamafile is built around Llama.cpp, and I already am using quantized weights
Closing shot with a friendly smile. 15:43 😂
You can set HSA_OVERRIDE_GFX_VERSION=11.0.0 on the latest ROCm to use the 7700S on the Framework 16.
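Concretely, something like this before launching Ollama (a sketch; the 7700S is a gfx1102 part, and the override maps it onto the supported gfx1100 kernels):
```
export HSA_OVERRIDE_GFX_VERSION=11.0.0
ollama serve
```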
12:47 let’s pray for the day when you can insert a MacBook picture and have it identified as framework ⚙️
Loved the video. Easy to follow
This talk about AI leads me to a question: As both x86 Copilot+ platforms, that being Lunar Lake and Ryzen AI, do not feature upgradable RAM, will there still be motherboards for these platforms? I'd also be interested in knowing whether or not there are plans for a dedicated convertible (maybe 14-15in?) or a convertible mod?
As someone who will eventually need to upgrade my aging and falling apart Surface book 2 (what a beast lasting this long), framework is where I am going.
11:05 I'm surprised it didn't just make up all of those results like it tends to do for me
So in a way you are saying Apple Unified memory machines M1-M3 are better suited for AI since you can have nearly 128 GB of memory available to the GPU and run really large models on them?
I have no intention of buying this product but the company gained ++respect with this video. Nice representative.
I think it would have been great to hear your perspective on how important (or unimportant) it could be to have NPUs integrated into the CPU. This was such a great demonstration of where we are currently at with LLMs and consumer-grade hardware, and I would like to understand how new chips like AMD's AI 300 series "moves the needle" (forgive the corpo speak 🤦♀️).
Will there be a Ryzen AI 300 Framework laptop?
The deciding factor for my next laptop is being able to run Ollama / LM Studio: a decent / strong NPU and >32 GB of RAM. Not going for one with a dGPU.
I don't know if you guys have Codium or Co-pilot but it already feels like I have an AI PC.
They are similar but Codium can automatically access your whole codebase without any extra setup. Not sure if co-pilot can do that at all
Hey, laptops can usually last 10+ years already just for work, but can you make a Framework 16 designed for gaming? Upgrades are more necessary for playing newer games. Make it like an old Alienware, so it's sort of bulky; that would leave more room for upgrades and cooling and would be great overall. Even 20 years ago people could live with old ThinkPads when they weren't really thin and lightweight.
Will you guys support the Snapdragon X series in the near future?
I have Ollama running on my home server with a 2600X and an RX 6600. Mistral NeMo runs great, and some Stable Diffusion branches also work, so I'm happy. It's integrated into Nextcloud Assist, so why would I need anything on my FW 13?
Also, my experience with Mistral is definitely better than with Llama 3.1.
I was also surprised how easily ROCm finally works, and Fedora Server is pretty awesome.
Thanks so much for not wasting your efforts on AI which has not proved itself
We need that Framework gaming handheld! Ultra modularity! Running SteamOS. Sounds beautiful. The idea of being able to upgrade the board on my gaming handheld would be awesome!!!
This video solidified my decision that my next laptop will be a Framework laptop.
Do you have a guide on how to set this up?
Nice presentation, but honestly I still don't know why I would use any of the things covered...
Maybe the only legit use cases I can think of are feeding it some documents so it parses them locally and helps me summarise them or combine information from them. Or maybe giving it a folder of images and asking it to classify things or pick out images of a cat, etc. But I think for consumers there needs to be a proper UI and such.