GPU vs CPU: Running Small Language Models with Ollama & C#
- Published Feb 5, 2025
- In this video, we'll explore the performance differences when running small language models (SLMs) in Ollama on both the CPU and the GPU. Watch as I demonstrate a live C# sample that uses Microsoft.Extensions.AI to talk to Ollama running inside a Docker container. Curious to see how these models perform locally? Let's dive in and compare the results!
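For reference, the core of the demo looks roughly like this. This is a minimal sketch, not the exact code from the video: it assumes Ollama is already running in Docker on its default port, that the Microsoft.Extensions.AI.Ollama preview package is installed (API names may differ in later releases), and the model name "phi3" is an assumed choice.

```csharp
// Assumed setup (not from the video):
//   CPU only: docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
//   with GPU: docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
//   pull a model: docker exec ollama ollama pull phi3
using System.Diagnostics;
using Microsoft.Extensions.AI;

// Point the chat client at the local Ollama endpoint and the pulled model.
IChatClient client = new OllamaChatClient(new Uri("http://localhost:11434"), "phi3");

// Time a single completion so CPU and GPU runs can be compared.
var sw = Stopwatch.StartNew();
var completion = await client.CompleteAsync("Explain in one sentence why the sky is blue.");
sw.Stop();

Console.WriteLine(completion.Message.Text);
Console.WriteLine($"Elapsed: {sw.Elapsed.TotalSeconds:F1}s");
```

Running the same prompt once against the CPU-only container and once against the `--gpus=all` container is what produces the comparison shown in the video.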
Useful links:
.NET Video Analyzer repository: aka.ms/netaivi...
Ollama in Docker: ollama.com/blo...
.NET & AI Show: • .NET AI Community Stan...
Another great video from ElBruno :)
Thank you Bruno.
Glad you liked it!
Thank you Bruno, interesting as usual!!!
@eugene5096 Thanks! The CPU vs GPU comparison is a wow one 😁
👍
❤❤❤
Bilal here 😊. I think you should create an extension; that would make this easier to access.
There is one in the Aspire Community Toolkit: github.com/CommunityToolkit/Aspire/tree/main
I may record a video about that one!
Best
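For context, that toolkit's Ollama integration is wired up in the Aspire AppHost roughly like this. This is a sketch assuming the CommunityToolkit.Aspire.Hosting.Ollama package; the model name "phi3" and the Projects.MyApp reference are placeholders, and the GPU flag is optional.

```csharp
// AppHost Program.cs (Aspire templates provide the implicit usings).
var builder = DistributedApplication.CreateBuilder(args);

// Hosts Ollama as a container resource managed by Aspire.
var ollama = builder.AddOllama("ollama")
    .WithDataVolume()                          // persist downloaded models across restarts
    .WithContainerRuntimeArgs("--gpus=all");   // optional: expose the GPU to the container

// Pulls the model on startup if it isn't already present. "phi3" is a placeholder.
var phi = ollama.AddModel("phi3");

// "Projects.MyApp" is a placeholder for your own project reference.
builder.AddProject<Projects.MyApp>("myapp")
       .WithReference(phi);

builder.Build().Run();
```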
Hi! This is very interesting, but I wonder how it would perform on an NPU? Is it possible to make it run on an NPU?
An NPU is a GPU with all the graphics bits removed.
@cuachristine Yes, I know, thank you, but that was not my question.
Ohh, that's a great question! I still don't have access to an NPU machine; however, if Docker Desktop allows access to the NPU, it should work. Let me ask Justin (a fellow CA.NET), who rocks the Docker world, to see what he can share about this.
@elbruno Thanks man, I would really appreciate that 🙂
Does this support multiple GPUs?
I'm not sure; I'd say no. We may need to check with the Ollama team.