I'm not sure investing $6k in a machine for local models is worth it. DeepSeek V3 is great, but it will take at least 3-4 months for the next SLM to be genuinely useful. I'm pretty doubtful, but maybe by the end of 2025 we'll finally see some good local models. If you just use their API it's so cheap: even running DeepSeek V3 24/7 for a whole year comes to around $3,600, about half the price... Feels like a maxed-out M4 is for YouTubers who want to test things, not realistic for devs, but I could be wrong so feel free to comment please~
Man, it depends on your project. I have a maxed-out M3 Max and work on a large Rust monorepo built with Bazel. I would not trade in the Max for anything because incremental build times just fly. A full rebuild on my previous computer took about an hour to complete; the M3 Max does the same job in about ten minutes… if your time is valuable, the Max gets the job done.
unfortunately, you could be right...
Nah man, if you go the API route you spend $3,600 on nothing. Yes, nothing. If you sell your M4 Max next year it's still worth around $5k, plus $0 in API fees. If you run the numbers correctly you'll see it; there's a reason Elon Musk spent $100 million on GPUs, bruh.
I use the M4 Max 128GB for React Native and for LLM app development. It's very, very fast and very portable. Rapid development is a lot easier when everything flies, and that includes custom models for custom purposes.
@ Yeah, and some people still think it's better to pay for API services 😂😂😂 They forget the other benefits, plus the resale value of the laptop, etc. It's not comparable!
Been waiting for someone to post a video like this since M4 MAX launched. Thanks man!
Are you a psychic? Yesterday I was searching for a comprehensive M4 benchmark video and all I got was slop with slop thumbnails and stupid faces,
and now you upload this. Great work man, thank you so much
Very impressive piece of work Dan! Keep it up
Besides data privacy, the other key purpose of running a local model is not speed (even 10 TPS is faster than I am) or cost (it's already getting cheaper and cheaper), but upskilling it via RAG, fine-tuning, etc. I'd be interested in those metrics once these local models are trained on my own data/knowledge base.
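On the RAG half of that, here's a minimal sketch of the idea against a local Ollama server; the model names and the note contents are just placeholder assumptions, not anything from the video:

# Hypothetical local-RAG sketch: embed a few private notes with a local
# embedding model, retrieve the closest one, and stuff it into the prompt
# of a local chat model. Model names below are assumptions.
import requests
import numpy as np

OLLAMA = "http://localhost:11434"

def embed(text):
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    return np.array(r.json()["embedding"])

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

notes = ["We deploy on Fridays only after the smoke tests pass.",
         "The billing service owns the invoices table."]
note_vecs = [embed(n) for n in notes]

question = "Which service owns the invoices table?"
q = embed(question)
best = max(range(len(notes)), key=lambda i: cosine(q, note_vecs[i]))

prompt = f"Context: {notes[best]}\n\nAnswer using only the context: {question}"
answer = requests.post(f"{OLLAMA}/api/generate",
                       json={"model": "qwen2.5:32b", "prompt": prompt,
                             "stream": False}).json()["response"]
print(answer)

Fine-tuning is a different, heavier lift; this only covers the retrieval side.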
Your next video should be something related to running local LLM models through Ollama vs MLX on Apple machines. You're going to love how much faster the MLX framework is. I have an M4 Pro Mac Mini with 64GB of unified RAM and I only run models through the MLX framework; it's night and day vs Ollama.
Second this... Would love to see some comparative benchmarks of MLX vs Ollama. Let's see how good the Apple GPU optimisation is.
I'd also be interested in some side-by-side comparisons of different quants. How much does the accuracy drop, and how much do the tokens per second go up, when going to say 8-bit, 5-bit, 4-bit, etc.?
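For the tokens-per-second half, you can already measure that yourself against Ollama: /api/generate returns eval_count and eval_duration (in nanoseconds) when streaming is off. A rough sketch, with the quant tags below being assumptions for whatever you have pulled locally:

# Hypothetical sketch: run one prompt against several quant tags of the same
# model and compute generation tokens/sec from Ollama's response fields.
import requests

TAGS = ["qwen2.5:32b-instruct-q4_K_M",
        "qwen2.5:32b-instruct-q5_K_M",
        "qwen2.5:32b-instruct-q8_0"]  # assumed tags, swap in your own
PROMPT = "Summarize the tradeoffs of 4-bit vs 8-bit quantization in 3 bullets."

for tag in TAGS:
    r = requests.post("http://localhost:11434/api/generate",
                      json={"model": tag, "prompt": PROMPT,
                            "stream": False}).json()
    tps = r["eval_count"] / (r["eval_duration"] / 1e9)  # duration is in ns
    print(f"{tag}: {tps:.1f} tok/s over {r['eval_count']} generated tokens")

The accuracy half still needs an eval set on top of this, which is the harder part.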
Exactly the same spec as mine! I absolutely love the machine. It’s not only a workhorse, it’s a piece of art. Just perfect build quality. Makes my razer blade 18 (one of the best build quality windoze machines) feel cheap. Don’t get me wrong, love my razer, but apple is next level.
Project DIGITS will become my personal AI model machine to run local models. I may even get 2 to connect them together.
IndyDevDan out here making AI ASMR unboxing videos in 2025 like he's living in the future.
What do you make of the NVIDIA DIGITS units coming out?
Looks incredible, could be the move for Q3. Right now the big question I'm looking to answer is: how can I get DeepSeek V3 hosted? Will 3 DIGITS do the job? We'll see. Will update on the channel ofc.
@@indydevdan 3 units will work with llama.cpp for sure imo (though inference speed may be pretty slow compared to the API). The real value of DIGITS over the M4 Ultra seems to be (at least for image and video gen) that at fp16 it's twice as fast as the M4 Ultra. If the memory bandwidth is the same as the M4 Ultra, at 1092 GB/s, then we'll really be cooking!
Thank you, I find your work very valuable!
If I may: a video I would personally be looking out for is a comparison of different quantizations of the same model. I understand and respect that you can't do individual requests, but maybe it's something you or others see value in as well.
This is a great idea - stay tuned.
Very nice perf comparison of both the hardware and the models! Cool benchmarking tool, and your rigorous process is excellent. One-shot performance is definitely useful, but it would also be nice to see how increased context length impacts performance, e.g. what is the max usable token count for each model?
All I have to say is WOW! Fantastic video, and great job with "benchy"! Would love to see how much speed would increase with MLX. I didn't know about the other one you mentioned, llamafile or whatever it was. Thanks!
Great timing. I'm going through your course and have also been contemplating what spec M4 to get, and whether local model usage is a use case worth buying for (vs getting a lower-spec MacBook Pro and putting the money toward a GPU rig instead).
Appreciate this man, this is huge for prospective buyers!!
One interesting benchmark would be to let the model correct its answer by allowing it to review its output 1-2 times before passing the output to the evaluator function. This should increase the success rate at a cost in execution time. Here local models have an edge over per-token API costs.
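A minimal sketch of that self-review loop, assuming an Ollama server and treating the evaluator and the model tag as placeholders for whatever the benchmark already uses:

# Hypothetical self-correction loop: answer once, then let the model review
# its own output up to MAX_REVIEWS times before the evaluator sees it.
import requests

OLLAMA = "http://localhost:11434/api/generate"
MODEL = "qwen2.5:32b"  # placeholder model tag
MAX_REVIEWS = 2

def ask(prompt):
    r = requests.post(OLLAMA, json={"model": MODEL, "prompt": prompt,
                                    "stream": False})
    return r.json()["response"]

def solve_with_review(task):
    answer = ask(task)
    for _ in range(MAX_REVIEWS):
        answer = ask(f"Task: {task}\n\nYour previous answer:\n{answer}\n\n"
                     "Check it carefully. If it is correct, repeat it verbatim. "
                     "If not, output only the corrected answer.")
    return answer

# score = evaluate(solve_with_review(task))  # evaluate() assumed to exist

Each review pass roughly multiplies the token count, which is exactly where local inference being effectively free per token starts to matter.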
Nice, great video and interesting approach!
love the cross-test of prompts
Many thanks for this - was looking for this exactly!
For agentic and batch tasks, I am fine with sub 10 TPS
I respect your patience
Always useful stuff, thanks man!😊
Fantastic video. I hope to run Llama 3.3 70B at 10 tokens/s soon, but on Linux-compatible hardware with an NPU.
Your videos started out good and keep improving, keep it going.
Nice! My two cents: you could try specifying PARAMETER seed in a custom Modelfile to generate predictable results and avoid different outputs from the model when benchmarking.
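For reference, a sketch of both routes to that (the model tag and values are just examples): a custom Modelfile with PARAMETER seed built via ollama create, or the same settings passed per request through the options field:

# Hypothetical sketch: pin seed and temperature so benchmark runs repeat.
import pathlib
import subprocess
import requests

# Route 1: custom Modelfile with PARAMETER seed, built with `ollama create`.
pathlib.Path("Modelfile").write_text(
    "FROM qwen2.5:32b\n"        # base model is an assumption
    "PARAMETER seed 42\n"
    "PARAMETER temperature 0\n"
)
subprocess.run(["ollama", "create", "qwen2.5-bench", "-f", "Modelfile"],
               check=True)

# Route 2: the same settings per request via the options field.
r = requests.post("http://localhost:11434/api/generate",
                  json={"model": "qwen2.5:32b",
                        "prompt": "List the first five prime numbers.",
                        "options": {"seed": 42, "temperature": 0},
                        "stream": False})
print(r.json()["response"])

In principle that makes repeat runs of the same prompt reproducible for benchmarking.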
That's so beautiful. As an AI nerd, it's like watching a beautiful art masterpiece.
Falcon 3, kaaawww, I'm rolling. Also, I enjoyed seeing just a static shot of the box and hearing the ambient noise while you opened it. Audio was clear. I understand this is an AI channel, but I thought it was produced well.
That caught me really off guard.
Really nice video as usual. With 126 GB of unified memory, you could have tested Llama 3.3 70B and Qwen2.5 72B on the last problem
True, I took them out because it takes ages to run 30 tests. They do run quite well though. I'm keeping a sharp eye out for performant 30-70B models; it seems like the sweet spot for the M4 Max.
I got the M4 Pro for Christmas and I am super interested to test these models locally. One of my favorite tech products ever as a developer coming from a windows laptop.
Big upgrade imo. Enjoy the experience. The M4 has been a dream to work with so far.
@@indydevdan Been working on a web app the last few days and it has been a treat. Did not realize how badly I needed a Unix-based OS for coding too.
Glad to see my $199 went to a good cause. Enjoy it! :) One of my clients bought me the top spec mac mini with 64gb for xmas :)
Your videos are some of the best I’ve seen, and I always look forward to your guidance. I was wondering if you’re still using Marimo notebooks? I’ve tried a few times but can’t seem to make it click. Are you mainly using them as a prompt library?
How much worse is the regular M4 compared to the M4 Max? I'm thinking of buying an Air or Pro model with the regular M4 and want to know if there's a big difference in AI performance there.
I couldn't easily find the memory configuration of your MacBook Pros, which I think is very important to know.
Given your logic here, a Project Digits build might be your next move in Q3-4. Two of them come in at a similar price, and Nvidia says you can run 405B models on that pair.
I've got a maxed-out M3 and was a little envious of the M4 here, but I'm 100% getting at least one DIGITS when they come out! Just saying, it's not just HIS logic!
100% - big announcement for local LLMs. That prediction is coming true only 7 days into 2025 🎯.
Is a Mac the best device for local AI, or should I buy an Nvidia device instead? Would love to see a video about that. PS: the index file in your project is blank.
It would also be great if they switched to, or at least offered the option of, a MacBook in magnesium alloy, so it's not too heavy!!
Nice video!! It would be super interesting to get some info on boosting performance by converting to MLX, or any other tips.
33:02 > I went through a loop of trial and error and ended up with a very reliable tool-use prompt that was very consistent across all my local Qwen2.5 Coder models, including 1.5B, 7B, and 14B. In the end I decided to only use the 1.5B because it nailed it perfectly every single time. That was surprising.
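None of this is the commenter's actual prompt, but for anyone curious, a made-up example of the strict-format style of tool-use prompt plus validation that tends to keep small coder models consistent:

# Hypothetical tool-use prompt and validation; the tool schema is invented
# purely for illustration, and the model tag is an assumption.
import json
import requests

TOOL_PROMPT = """You have one tool:
  read_file(path: string) -> returns the file contents
Respond with ONLY a JSON object, no prose, exactly in this shape:
  {"tool": "read_file", "args": {"path": "<path>"}}
Task: open the project README."""

r = requests.post("http://localhost:11434/api/generate",
                  json={"model": "qwen2.5-coder:1.5b",
                        "prompt": TOOL_PROMPT,
                        "options": {"temperature": 0},
                        "stream": False})
raw = r.json()["response"].strip()

try:
    call = json.loads(raw)
    assert call["tool"] == "read_file" and "path" in call["args"]
    print("valid tool call:", call)
except (json.JSONDecodeError, AssertionError, KeyError, TypeError):
    print("model broke the format, retry or fall back:", raw)

The validation step does a lot of the work: a format the model can't drift from plus a cheap retry path.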
All thanks to the people who paid for the course XD. Jokes aside - always supportive and thankful for you!
I'm actually thinking about getting an M2 Ultra. I wonder how it stacks up to the M4 Max.
How about benchmarking vision models, like Llama 3.2 Vision?
As far as minimum tokens per second goes, it depends on the type of task. For one where I'm waiting on the results, I'd agree 10 tokens per second is about my minimum. But for an automated task where I don't care how long it takes, there is no minimum.
It's a bit weird to hear you say that Qwen 32B is wrong because it just gave the result without the underlying Python code, when the prompt given was "Use Python to compute... print the numerical result only." I think the model performed exactly as expected given your prompt.
Love the Wes Anderson style intro.
At one time I wanted to buy a MacBook for local LLMs. But then I read on Reddit that MacBooks get very, very slow once the prompt approaches 30k tokens. It ends up feeling like some kind of marketing bullshit, like "AMD Ryzen AI Max+ 395, 2.2x faster than a 4090" - when Llama 3.1 70B Q4 won't even fit into the 4090's VRAM: 43GB model size vs 24GB of VRAM.
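For what it's worth, the VRAM part of that is just arithmetic; a rough back-of-envelope (numbers approximate):

# Why a 70B Q4 model overflows a 24 GB GPU; ~4.7 bits/weight is a rough
# figure for Q4_K_M, and the KV-cache estimate is coarse.
params = 70e9
bits_per_weight = 4.7
weights_gb = params * bits_per_weight / 8 / 1e9   # ≈ 41 GB of weights
kv_cache_gb = 4                                    # grows with context length
print(f"~{weights_gb + kv_cache_gb:.0f} GB needed vs 24 GB of VRAM")

That size class is the whole appeal of unified memory, prompt-processing speed complaints aside.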
Open models often turn out to be poor instruction followers and prone to verbosity - I got the same impression after testing plenty of them with a simple "move a piece on the chess board" prompt while simulating chess games (LLM vs a random player). DeepSeek V3 and Phi-4 (as well as Qwen 2.5 and Llama 3.1) are at the bottom of my "LLM Chess Leaderboard" when sorted by the number of mistakes the LLMs made by failing to comply with the prompt instructions.
Wow that's surprising. Instruction following should be the first benchmark for model builders to prioritize imo. Obviously it's more complicated than turning the 'instruction following' dial but a model that doesn't follow instructions well is pretty useless to us.
That's at 2k context length, which is completely useless in a lot of use cases.
This is great. In fairness, those "tough" questions are really tough, even for me.
Fantastic video, and great job with the benchmarking tool; I've been waiting for something like this for a while. Are you by any chance interested in testing the Qwen QwQ-32B model? It's a really good model when it comes to reasoning, mathematics, and problem solving.
Would also really like to see the benchmarks for that model. Although reasoning might be a bit more difficult to evaluate.
Pricing is outrageous if you're buying this in Europe. It should cost less given that the USD-to-EUR conversion favors the euro; instead it ends up costing way more. I just spent €5k on a MacBook Pro M4 Max with 64GB of RAM, a 1TB SSD, and the nano-texture display. It's getting to the point where it would ALMOST be more feasible to fly to the US and buy it there, and at least that way I'd also get a one-day vacation in the US. I am honestly very outraged by this.
The only reason I'm willing to pay these prices is that I really like having things running locally and wouldn't want to rely on APIs; I also like the Apple ecosystem.
I got my 14-inch, 128GB RAM, 2TB, space black M4 Max MacBook on Jan 4th, 2025. This is a magical device..... I wanted to review it but I don't have a YouTube channel...
Why aren't you using the highest-quality version of the model you can, or at least the q8 version? There is a big difference in quality between llama3.2's 3b-instruct-q4_K_M (which you are using) and the q6, for example. The 3b-instruct-fp16 full version may be overkill, but it's only 6.4GB. It still sucks, but it's better than the q4 version.
True but why not just bump to 10b, 14b, or 32b models?
Right on time - I have an M4 machine on the way.
this buddy knows what others want 😄
Cool but can it run cyberpunk?
Has anyone installed DeepSeek-V3 locally?
Someday I will do the same pack opening and I will be happy :)
For some tasks, accuracy and intelligence trump all. Even if it took a week to complete a prompt, as long as the output has the accuracy you need, it doesn't matter how fast a less capable model is: if it can't give you the depth and accuracy you need, then it's better to use the slow model.
What is more useful? The ability to ask a 300 IQ individual one question per day or the ability to ask 1000 80 IQ individuals 1000 questions per second?
The low intelligence route cannot help you build a rocket ship, the high intelligence route can.
How did you download the deepseekcoder v3 in Ollama?
Agreed.
I have a topped out M2 so I’m looking at this to learn.
3:40 - he just stated that DeepSeek was cloud-based.
@____2080_____ thanks
Yes!! I was like, dang, do I need an M4? But I have an M3 - think it's worth it then?
If you have a specced out M3 I would wait for m5 or mNext
@@indydevdan I have notifs on for your channel. Love the content. Agree, thanks!!
Sweet if you have cash to burn. Love the performance but that price is ouch ouch ouch.
Aesthetic AF :) Thank you for the video and the results.
ASMR vibes
He can't stand speeds slow enough to read. If you're generating code, you'll have to read what's being written eventually, so you may as well do so during inference. I don't mind speeds as slow as natural human handwriting.
You spelled Benchee wrong. ;)
oh great, another mac
Interesting that Falcon fails on the first problem.
Maybe a test issue, since startup time is slow.
Super
I got an M3 Pro last year lol, and there's already an M4. My wallet would be cooked, I'm not getting this.
Didn't like the silent intro.
It would be better to put the $6k into Bitcoin two years prior, if that's still an option. Time can move backwards if you're a photon 🤷‍♂️
I would never buy Apple, too authoritarian for me... so it would be good if you could point out the differences vs Windows.
M3 Max, llama3.2:1b: total duration: 2.413439167s
load duration: 22.982167ms
prompt eval count: 44 token(s)
prompt eval duration: 67ms
prompt eval rate: 656.72 tokens/s
eval count: 390 token(s)
eval duration: 2.321s
eval rate: 168.03 tokens/s
Nice! Try a larger, more useful model though like Qwen2.5:32b.
@@indydevdan qwen2.5:32b: total duration: 5.795758917s
load duration: 20.935334ms
prompt eval count: 48 token(s)
prompt eval duration: 577ms
prompt eval rate: 83.19 tokens/s
eval count: 87 token(s)
eval duration: 5.196s
eval rate: 16.74 tokens/s
Now that's a solid model to work with. Phi-4 just (officially) dropped too. That will be a little faster but will drop your intelligence a bit coming down from qwen2.5:32b.
@@indydevdan I uploaded a video testing llama3.3:70b-instruct-q8_0 earlier this morning. Yeah, I tested phi4:14b-fp16 with some of my apps and was not really impressed with its intelligence.