Thank you so much for your video!
Really informative.
Didn't expect it to work that well.
What's the memory usage while using the 32B model?
Great video. I loved the way you covered all these technically challenging areas for me so quickly and so comprehensively! Best wishes!
thank you
Most important info missing! What is the memory usage when 32B Qwen 2.5 is running? Please provide the info.
Thx for the conclusion at the end
I use the 14b model and Continue on my 4090, and it is fast and works great!
@@renerens what quantization are you using? And context length?
Which model/llm are you using?
@@andrepaes3908 I used the default model not a quantized one, 32768 context length.
@@asifudayan qwen2.5-coder:14b
Hi, is your computer a Mac? If yes, which one?
Mac M3 Max with 128GB of unified memory
Hey, Chris! That was a great video. So easy to understand, and I set up everything and followed along. You are very easy to understand and do a great job of explaining the concepts you are discussing. I just subscribed and I hope you continue with this. I agree, this is all about what's coming and not necessarily "is this the end all, be all LLM for coding." If you take the time and follow along with the evolving AI, it will be much easier to adapt to the next thing coming. Thanks for keeping me informed.
Glad it’s useful, honestly I don’t think there is a more exciting time to be a developer
For speed, Cerebras AI is nuts: over 2000 tokens per second using Meta Llama 7B and over 1800 using 70B.
Great video. With Open WebUI you need to raise num_ctx, as it defaults to 2048; perhaps 32768 might help get the full response.
@@TheCopernicus1 how do you increase a model's context length within Open WebUI?
@@andrepaes3908 in the top right-hand corner, click the controls icon (it looks like sliders), then go down to "Context Length" and for starters try 16384 if your device supports it, as 32768 might be really slow. Good luck!
@@andrepaes3908 in Open WebUI, click the controls icon in the top right-hand corner, then set the context length to 16384 to start with and go up depending on your system resources! Good luck
@@andrepaes3908 you can change it easily in the Open WebUI configuration, or just create a new Modelfile and use Ollama to create a new version of the model with a longer context.
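If you'd rather script it than click through the UI, here's a rough sketch of the same idea straight against Ollama's REST API; the model tag and the 16384 figure are just example values, so adjust them to whatever your hardware can handle.

# Rough sketch: requesting a larger context window per call instead of
# changing it in the Open WebUI controls panel. Assumes Ollama is running
# locally on its default port and the model has already been pulled.
import requests

payload = {
    "model": "qwen2.5-coder:32b",  # example tag, use whatever you pulled
    "prompt": "Write a Python function that reverses a linked list.",
    "stream": False,
    # num_ctx overrides the default 2048-token context for this request only.
    # Start at 16384 and raise it gradually if your RAM/VRAM allows.
    "options": {"num_ctx": 16384},
}

response = requests.post("http://localhost:11434/api/generate", json=payload, timeout=600)
response.raise_for_status()
print(response.json()["response"])

The Modelfile route mentioned above also works: put FROM qwen2.5-coder:32b and PARAMETER num_ctx 32768 into a Modelfile, then run ollama create qwen2.5-coder-32k -f Modelfile to bake a longer-context variant you can select like any other model.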
What a great video, thanks man
My pleasure!
Great video, amazing content! I see you used the 4-bit quantized version to run all tests. Since you've got 128GB RAM, could you run the same tests for the 32b model with 8-bit and FP16 quants to check if it improves responses? If so, please make another video to share the great news!
that’s a really good shout, I’ll do that
Yeah, I never use 4-bit quantization anymore because it often gives very poor output. Q8 is okay and almost indistinguishable from FP16. Also, Q5_K_M should be the minimum, since it still gives very good results; in fact, I don't notice any quality loss with Q5_K_M models. I've tested it on the Gemma 2 27B model and the Llama 3.1 8B and 70B models. However, if you have extra RAM, I highly recommend always using Q8 for the best results.
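For anyone who wants to check this on their own machine, here's a rough sketch that sends the same prompt to two quantization levels through the Ollama API so you can eyeball the difference; the tag names follow the usual Ollama library convention but are assumptions, so confirm them with ollama list or the model page first.

# Sketch: compare output quality across quantization levels for the same prompt.
import requests

TAGS = [
    "qwen2.5-coder:32b-instruct-q4_K_M",  # assumed tag names, check the library page
    "qwen2.5-coder:32b-instruct-q8_0",
]
PROMPT = "Implement a thread-safe LRU cache in Python."

for tag in TAGS:
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": tag, "prompt": PROMPT, "stream": False},
        timeout=1200,
    )
    r.raise_for_status()
    print(f"===== {tag} =====")
    print(r.json()["response"])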
why use that if you could use ottodev?
What is the minimum spec required to run this?
All PCs will be able to run it. The question is, how big is your RAM? The bigger your RAM, the larger the model you can run. Usually, an 8B model requires 8GB of RAM, a 27B model requires 32GB of RAM, and so on (this is if you are only using the Q4 quantization). Your CPU speed doesn't determine whether you can run it, only how fast it generates; with a slow CPU it will still run, it just takes longer.
32b-instruct-q4_K_M - 23GB VRAM
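For anyone wondering where figures like "8B needs about 8GB" or the 23GB VRAM above come from, here's a back-of-the-envelope sketch; the bits-per-weight values are rough assumptions, and real usage adds KV-cache and runtime overhead on top of the weights.

# Rough estimate: weight memory in GB ≈ parameters (billions) × bits-per-weight / 8.
# Treat the results as lower bounds, not guarantees.

def approx_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB."""
    return params_billions * bits_per_weight / 8

for params, label in [(7, "7B"), (14, "14B"), (32, "32B")]:
    for bits, quant in [(4.5, "Q4_K_M (approx.)"), (8.5, "Q8_0 (approx.)"), (16.0, "FP16")]:
        print(f"{label} @ {quant}: ~{approx_weight_gb(params, bits):.1f} GB of weights")

The 32B weights at roughly 4.5 bits work out to about 18GB, which lines up with the ~23GB VRAM reported above once the KV cache and runtime are included.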
ChatGPT's donkey is to a donkey what the spherical cow in the physics meme is to a cow. :)
hahaha nice
Good demo. You just went way, way too fast past the install/setup/config. I'm still trying to work my way past all the errors and figure out why I'm not getting any models showing up in my WebUI.
Apologies, I did that because I have a video where I walk through Ollama (much slower and in detail); this is the link:
ua-cam.com/video/uAxvr-DrILY/v-deo.html hope it helps
@@chrishayuk Okay thanks. I'll try that.
Claude sucks big time with Vue, Vuetify, Bootstrap, bootstrap-vue, Laravel, etc. Qwen is absolutely amazing! It makes such good Vue components, it knows Vite, it knows Laravel, it doesn't confuse Vue 2 with Vue 3, and it differentiates versions. GPT-4-turbo doesn't understand different versions and just produces garbage.
I love you my guy
No models. Why?
This should help: Getting Started with OLLAMA - the docker of ai!!!
ua-cam.com/video/uAxvr-DrILY/v-deo.html
You should be connected to Wi-Fi for the models to appear. It’s strange since it doesn’t use the internet but requires Wi-Fi. The truth is, you can access Open WebUI on any device as long as it is connected to the same network as the server.
@@LoveMapV4 Well I figured it out. You just gotta take your time getting the configuration right. I installed ollama as a local service. But I had to install open-webui using Docker because the Python PIP install didn't work. PIP didn't work because you need exactly version 3.11. It has to be exactly that version. 3.10 won't work. 3.12 won't work. It must be 3.11. Well, I had 3.10. I didn't spend enough time figuring out how to get exactly 3.11 installed. If I just blindly upgraded, that gave me 3.12, not 3.11. Arghhh!!! So I gave up on that path because I was impatient and just used Docker. But then I had to do something to configure open-webui in Docker to talk to ollama running locally, i.e., not in Docker. I followed the instructions on the web site and just took my time and finally got it to work. The installation and the documentation could both be better, but what the heck, that's what we get paid the big bucks for, right?
I watched a few more of Chris's videos and they are really good. It's a good resource for doing this kind of work. Thank you.
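For anyone still hitting the "no models showing up" problem above, a quick sanity check that takes Open WebUI out of the equation is to ask Ollama directly for its model list. This is only a sketch; if Open WebUI runs in Docker while Ollama runs on the host, Open WebUI usually needs its OLLAMA_BASE_URL pointed at http://host.docker.internal:11434 rather than localhost.

# Sanity check: can we reach Ollama, and does it actually have models pulled?
import requests

OLLAMA_BASE = "http://localhost:11434"  # adjust if Ollama lives somewhere else

try:
    r = requests.get(f"{OLLAMA_BASE}/api/tags", timeout=5)
    r.raise_for_status()
except requests.RequestException as exc:
    raise SystemExit(f"Could not reach Ollama at {OLLAMA_BASE}: {exc}")

models = r.json().get("models", [])
if not models:
    print("Ollama is up but has no models. Try: ollama pull qwen2.5-coder:14b")
else:
    print("Models Ollama can serve:")
    for m in models:
        print(" -", m["name"])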