Amazing how much the little things matter. The fact that you tracked that percentage graphic with your finger movement boosted the entertainment value. I even rewound twice.
Not sure if the comments already mention it:
Without taking into account how Docker is set up, this benchmark doesn't say that much, starting with the question of whether it is WSL-integrated or based on Docker Desktop's Linux VM.
Then, we can clearly see the warning that the container hasn't been started with the appropriate environment for the best TensorFlow support. So, what should I take away from the benchmark? As it is, the result doesn't mean much.
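For reference, NVIDIA's NGC TensorFlow containers are typically launched with GPU access and a few recommended flags, roughly like this (the image tag here is only an example, not necessarily what the video used):

```bash
# Requires the NVIDIA Container Toolkit on the host
docker run --gpus all -it --rm \
  --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
  nvcr.io/nvidia/tensorflow:22.01-tf2-py3
```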
I don’t suppose you’ve done this test with the M1 Ultra yet, have you? If not, that’s something I’d like to put a vote in to see! Thanks for everything, Alex!
Your ANN training test is wrong.
Firstly, you selected a benchmark which is far from a real workload.
Secondly, in the code you have:
# Set the random seeds
os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
This setting degrades performance on CUDA hardware. Are you sure that setting has the same effect on the M1 Max? :-)
Thirdly, there are several warnings inside the Docker container which you simply ignored.
Finally, we haven't seen the training progress on the M1 Max at all (XX ms/step? samples per epoch?), just to check that everything was fine and we have a true comparison.
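For what it's worth, a tweak like this (a sketch, not the benchmark's actual code, and REPRODUCIBLE_RUN is a made-up switch) would keep the deterministic-cuDNN setting opt-in instead of always on:

```python
import os

# Hypothetical switch: only force deterministic cuDNN kernels when
# reproducibility is explicitly requested, since the setting can slow
# down CUDA hardware; it must be set before TensorFlow initializes.
if os.environ.get("REPRODUCIBLE_RUN") == "1":
    os.environ["TF_CUDNN_DETERMINISTIC"] = "1"

import tensorflow as tf  # imported only after the env var is decided

# With verbose=1, model.fit() prints the "XX ms/step" figure the comment
# above is asking to see for the M1 Max run.
```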
If you want to run your model using the tensor cores in RTX GPUs, you should use TensorRT. Also, the model should meet a few requirements. Furthermore, training a model in an fp16 configuration doesn't actually affect training considerably, since quantization affects the forward pass (and inference time). However, most of the computation is in backpropagation, which is done in full precision.
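For context, the usual Keras route to fp16 training is mixed precision, roughly like this (a sketch assuming TF 2.4+, not the exact script from the video); fp32 master weights are kept while the matmuls run in fp16:

```python
import tensorflow as tf
from tensorflow.keras import layers, mixed_precision

# Run compute in fp16 while keeping fp32 variables; Keras adds loss scaling
# automatically when the model is compiled under this policy.
mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    layers.Dense(512, activation="relu", input_shape=(784,)),
    # Keep the output layer in float32 for numerical stability.
    layers.Dense(10, dtype="float32"),
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```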
Isn’t there some performance degradation on the Ubuntu machine due to running the test inside Docker?
No, it's actually faster. Docker with NGC is sometimes twice as fast as installing everything by hand.
Docker is a container, not a virtual machine, so there's no degradation at all
Absolutely right. Unless you take care of k8s (the whole stack, I/O especially), native is always going to be faster. It's a no-brainer, and I'm quite disappointed to find only a single comment mentioning this
There is a big performance degradation if you run TensorFlow inside Docker on an Apple Silicon machine; it's very slow. That's mostly because Docker is optimized for Linux machines
@@capetorch That is complete BS, that Docker will run programs and code faster than running them natively. It's well known that it's the exact opposite: Docker is known for running programs and code slower than running them natively.
It would be interesting to see the same test with both laptops on battery! Let's see how fast!
Very interesting, as I was facing this exact decision when looking at the X17 3070 or the M1 Max 16. I went for the X17 due to needing x86 Microsoft applications like Power BI and SQL Developer edition to learn for my new job. It would be good to see the results unplugged on both machines too!
That's the cool thing. These new M1 Macs run the same on battery.
Bro, as data science bootcamp students we face the same things, so do you think the X17 is worth buying?
@@ayassn4890 I think the 16GB model with the 4K screen and 12th-gen chip is, yep. I struggled to use the 1080p model for work, but the actual keyboard is very nice and I was mainly using it linked to a decent monitor. Mine has the 8GB 3070 and 1080p (but I bought it before the world went mad, so it was 1K less than the R2). It's a nice laptop in terms of thinness for the power and doesn't throttle much. In all honesty it all comes down to cash. If I had unlimited cash I would get the RB 16 or 18 now.
That's good to hear that the 16" M1 Max 32-core GPU isn't far behind. I would love it if someone with a 16" M1 Max 24-core GPU could share the results of this exact same test. I am about to buy a 16" M1 Max but want to go with the 24-core GPU model, and therefore want to know how far behind I would be if I don't go with the 32-core GPU option. Thank you for your time! :)
I bought the 24. I had the 32 and it was eating my battery, because even when you don't use those extra cores they still consume power. I don't do crazy work, so I figured 24 would be enough.
@@lukesky1998 This is the main reason I want to go with the 24-core option. I need the dual encoders for video editing, so I can't go with the M1 Pro. I just wanted to know how much of a hit the 24-core option would take in ML training performance compared to the 32-core option.
This guy's benchmark test is flawed, especially since the Nvidia laptop ran TensorFlow in a Docker container, which is well known to run programs and code slower than running them natively. Clearly TensorFlow on the M1 Mac was run natively, so why didn't he run this TensorFlow test on the Linux laptop natively?
@Sean Godsell that is a great question. I was wondering the same. 🤔
@@sgodsellify yeah he seems like a Mac fanboy who doesn’t really know how to code 😂
An amazing result for M1 Max, the efficiency advantage is huge 🤩
This benchmark test is flawed, because on the Nvidia laptop he used TensorFlow in Docker, which does slow down code and programs compared to running them natively. Clearly he ran TensorFlow natively on the M1 Mac, but decided to run TensorFlow on the Linux laptop in a Docker container. Why?
@@sgodsellify Is it slower for sure? Docker kinda is native (it's cgroup containers running on the host kernel, not a virtual machine), so as long as it has sufficient resource (CPU/memory) allocation it should be fine.
At least that’s what I would expect - is there a quirk I’m not aware of? 🙏
@@Aaronage1 on a number of platforms, including Linux, a Docker container defaults to using two CPUs. So if your PC or laptop has 4 CPU cores, then guess what, that Docker container is not running with all of the CPU resources. Then there is the issue of memory: how much of this system's memory did he allow this container to use? Look, at the end of the day, why use any container at all if it's already running Linux?
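One way to settle that doubt is to pin the limits explicitly and check what the container actually sees (the image name and numbers below are placeholders):

```bash
# See how many CPUs the container reports
docker run --rm some-tf-image nproc

# Pin CPU and memory explicitly, plus GPU access
docker run --gpus all --cpus="8" --memory="24g" -it --rm some-tf-image
```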
@@sgodsellify As you said, this benchmark is nonsense. He couldn't manage to install CUDA natively, for sure; I see a lot of developers use Docker to avoid installing CUDA dependencies. There is no other explanation for that...
I guess it comes down to memory size: the M1 has very fast shared memory (64GB) and the 3070 has 8GB, hence the M1 can use a bigger batch size.
Interesting to see, but from what I can see of the stats you show for each result, the speed evaluation isn't actually straightforward, since you seem to be using larger batch sizes on the M1 than on the 3070 system. Larger batch sizes would allow the machine to finish faster even if the hardware was technically slower. I know you could argue that this is a fair advantage the M1 has over the 3070 in a realistic way, because the RAM available to the M1 is greater than the VRAM available to the 3070.
From what I can see of the tests, the ASUS is the clear winner. In the first unoptimised test on the ROG it was 587 vs 832 samples per second. Adding the --fp16 flag to optimise the run increased the throughput to 1075 samples per second. Once you factor in that the ROG laptop costs roughly half as much as the Mac while performing 20% faster, you can save a lot of money for a decent performance boost, or buy two ASUS laptops for a massive boost in workload. Running TensorFlow natively would get you even better results. Yes, the battery life on the Mac is better, but knowing I can't upgrade storage down the road without having to buy a whole new laptop makes the Mac a complete non-starter for serious work.
that benchmark uses CIFAR-10, which has 60K images. It's a good starting point for small tests and experiments. I compared my RTX 2060 to an M1 Max MBP with the 24-core GPU and 32GB of memory. Once your dataset grows, video memory is probably your biggest bottleneck.
In that case, the M1 Max should absolutely obliterate any other notebook GPU on the market, as a 64GB RAM M1 Max MBP essentially also has 64GB of VRAM.
@@utubekullanicisi I tested my workstation with an RTX 2060 6GB against my M1 Max MBP with the 24-core GPU and 32GB. I can confirm that with larger batch sizes and Pascal VOC, Apple's Create ML is faster than the RTX 2060 + TensorFlow GPU. The usual trick is to use mini-batches or small batches so you don't get "OutOfMemory" errors in TF. In some cases TF will throw the error but still run; you take a performance hit and it runs much slower. It can be as bad as 10-20x slower once you run out of memory.
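A rough sketch of that workflow on the TensorFlow side (assuming a small CIFAR-10-style model rather than the exact benchmark code): enable memory growth and back off the batch size on OOM instead of letting the run limp along.

```python
import tensorflow as tf

# Allocate GPU memory on demand instead of grabbing it all up front.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)

(x, y), _ = tf.keras.datasets.cifar10.load_data()
x = x.astype("float32") / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# Halve the batch size until it fits in VRAM, rather than training 10-20x
# slower after an out-of-memory condition.
batch_size = 1024
while batch_size >= 32:
    try:
        model.fit(x, y, batch_size=batch_size, epochs=1)
        break
    except tf.errors.ResourceExhaustedError:
        batch_size //= 2
```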
RTX 3070: 9m40s (note the parameter you set: --fp16 for RTX cards)
M1 Max: 11m03s
RTX3070 won. Like :)
This parameter enables a low-precision 16-bit floating point mode, while the M1 is using standard fp32...
Thank god. I have been browsing for this kind of content for ages. You could also try making videos comparing different Nvidia laptop GPUs. I was looking on YouTube but could not find one.
Hello Alex, I have an M1 Max 64GB as well as an Aorus 15P with an RTX 3070 and 64GB. I am wondering why a sample run with ComfyUI + SD + FLUX.1 Dev to generate a picture shows such a big difference in execution time (Mac: 366 secs vs PC: 80 secs). Both are running the PyTorch nightly build installed with this command: "conda install pytorch torchvision torchaudio -c pytorch-nightly". The sample pic and embedded ComfyUI script can be found on the ComfyUI_examples page (Flux Dev, a cartoon girl with big ears and a tail, under the "Simple to use FP8 Checkpoint version" section). Sorry, YT doesn't allow me to put the URL here for easy reference. What do you think about my case?
Pretty sure --fp16 is referring to 16-bit floating point (half precision, as it's sometimes called). So these tests are almost certainly not apples to apples.
This. "Optimization flag" lol
it's Apple to ASUS, not apples to apples
I don't understand. I expected RTX3070 to be an order of magnitude faster, as CUDA should eat Metal on Apple GPUs alive.
Same🥲
Same
Apple must have optimized their drivers pretty well, also don't forget Apple has unified memory... so keeping those GPUs fed with data is much faster than AMD/Intel.
That just shows how uninformed people are about Apple's GPU architecture. Apple's GPUs are really impressive. They currently have over 2x the perf/watt over Nvidia. Like, actually more impressive than their CPU design, which people have complimented left and right for how wide it is.
@@utubekullanicisi We call it... Apple Derangement Syndrome. Apple haters are looking for ways to dunk on Apple.
Confirmation bias is real. Once they find "a weakness" in the Apple SoC, they laser-focus on it and won't listen to any rational thought that differs from their worldview.
Fortunately, most people don't treat their computers as their religion. People for the most part just buy what works.
Only enthusiasts are sticks in the mud.
Just wondering, is it possible to run TensorFlow 1.x on an M1 Mac? Even on CPU?
When using the CUDA cores on the RTX 3070, wouldn't it be fair to use the NPU on the M1 for comparison?
Metal automatically uses the neural engine
The neural unit is not used for deep learning training, only inference. The GPU is way faster than the NPU.
Ok, thanks a lot for this clarification!
Running in Docker on the 3070 doesn't completely ruin performance? I'm not too familiar, but when I run something in Docker it's like 3x-5x slower.
Could it be slower in Docker than it would be natively?
Nope
I'm an ML engineer and I got an M1 for work, and I have so many compatibility issues that I'm changing back.
@@xink64 how has it been so far?
Yes but does your benchmark take advantage of the TPUs on the M1? 🤔
I was wondering just this. Very cool. Thanks!
This video got an instalike from me as soon as I saw the fingers added to the wood device.
Actually I wanted to buy it for programming, but I really like gaming, so I think I will get a Razer laptop.
@Alex Ziskind what about the power consumption for that time? The energy efficiency of a laptop running a TensorFlow benchmark would show how big of a win it is!
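A rough way to capture that alongside the run (standard sampling tools on each platform; exact flags can vary by version):

```bash
# macOS (Apple Silicon): sample CPU/GPU package power
sudo powermetrics --samplers cpu_power,gpu_power -i 1000

# Linux + NVIDIA: log GPU board power once per second
nvidia-smi --query-gpu=power.draw,utilization.gpu --format=csv -l 1
```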
Damn, that was a loud test. After migrating to the M1 Pro and ditching my streaming PC, the loudest component in my setup is the SL60W light :)
It gets loud in my office :)
How much cheaper is an off-the-shelf 3070 Laptop 125W (assuming you benchmarked the top config) vs. the M1 Max for a 13% performance increase? I can get a 3070 125W for half the price of the M1 Max and it's only "not that much faster"?
Right now you could get that machine used for probably a lot less than when I bought it new for $2k
@@AZisk Well, the "used" discount applies across the board. My point is that Macs deliver premium design, but at a steep premium...
Please add to that test the energy consumption and the battery level after the test. Was the RTX 3070 on AC power or was it running on battery?
But that Ryzen laptop is like half the price of the M1 Max though...
Each has its pros/cons or trade-offs. While the Ryzen is cheaper, the Mac has the ability to run this test on battery and maintain battery life.
But the MacBook Pro is a whole package, with a mini-LED display, a 7.4GB/s SSD, pro speakers, etc. for content creation. If anybody wants just the M1 cores, wait for the new Mac mini, but the MacBook Pro is worth absolutely every penny.
@@Cat-kp7rl No doubt about it. MBP is easily a better laptop.
Sure, but the MBP is a way better quality laptop that uses way less power and makes less fan noise. So it all depends on how you use it and what you value.
@@arnavodkaa yeah, the all-day battery has hugely increased my productivity since I'm running around town all day. I use a Parallels VM running eCAD and simulating EM fields in PCBs, and the battery holds its own surprisingly well. When I first got the M1 Max MBP I thought the battery icon in the UI was bugged.
Of course, though, there have been trade-offs. So I don't mean to throw shade at the many other capable laptops on the market. Whatever works best for the user.
Are you sure you were using Tensor Cores? You'd need to verify that, or you're not really using the RTX to its full advantage...
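A quick sanity check on the Ubuntu box could look like this (just a sketch; a profiler such as Nsight would give the definitive answer): confirm the GPU's compute capability and that a mixed-precision policy is actually active, since plain fp32 matmuls won't hit the tensor cores.

```python
import tensorflow as tf
from tensorflow.keras import mixed_precision

for gpu in tf.config.list_physical_devices("GPU"):
    details = tf.config.experimental.get_device_details(gpu)
    # Tensor cores need compute capability 7.0+ (a 3070 reports (8, 6)).
    print(gpu.name, details.get("device_name"), details.get("compute_capability"))

# If this prints "float32", the training math is not using fp16 tensor cores.
print("dtype policy:", mixed_precision.global_policy().name)
```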
Why did you use Docker on Ubuntu? How strongly can it affect performance?
According to the benchmark author, running inside Docker is faster (although that's not intuitive to me)
But, why were you using Docker for the RTX?
Hello Alex, I hope you are well.
I need your help. I've always had Windows, and recently I started working on mobile applications made with the Ionic framework and the MEAN stack; in addition, I use Java and SQL Developer a lot for other projects. My question is whether a MacBook Pro M1 is good for this type of work. Will I encounter compatibility issues? And which version of the MacBook do you recommend? Thank you very much in advance, I'm a huge fan.
we're so close to the m2 releases, so maybe wait a few weeks until there are some real comparisons being made. however, based on what you said, any recent mac will do the trick for you. you can probably pick up a decent intel 2019 macbook for a decent price too
@@AZisk
That's well thought out, I'll wait for the launch of the new MacBook Pro and soon I'll see which one I'll get. Thank you for the quick answers Alex, I'm a huge fan, thank you very much once again
Does this new MacBook come with an extra RAM slot? It comes with 16GB of RAM to start, so could we add more RAM to it in the future?
I don't understand this test; you shouldn't have used Docker on Linux. WHY?
Yeah The Schwarzenegger in action
Opting for fp16 does reduce your numerical accuracy enough that if you have a deep network with many layers you might find yourself suffering from rounding artifacts. It's a shame that using the same flag on the M1 Max does not use the NPU (it could, as the NPU has the same fp16 units).
If I'm not mistaken, Nvidia has blogged about this and about using mixed precision to boost training speed. In practice, it shouldn't affect the final training accuracy. In many cases, using bfloat or lower precision can drastically speed up training time. I haven't kept up with the latest drivers, but it's supposed to be easier in the latest release, since many developers complained the learning curve was pretty steep.
I'm also hoping Apple updates Metal so training can use the NPU, but they haven't made any official announcements.
Currently the NPU from Apple is closed source and not open to third parties. The only way to use the NPU for training on an M1 Mac is to convert your model to Apple's model format and train it again using Apple's machine learning framework; then the NPU will work. The tensorflow-metal plugin is also closed source, provided by Apple.
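For completeness, that conversion route looks roughly like this (a sketch assuming a recent coremltools; Core ML schedules across CPU/GPU/ANE mainly for inference rather than TensorFlow-style training):

```python
import coremltools as ct
import tensorflow as tf

# Any trained Keras model would do; MobileNetV2 is just a stand-in here.
keras_model = tf.keras.applications.MobileNetV2(weights=None)

# Convert to an ML Program and let Core ML pick CPU, GPU or Neural Engine.
mlmodel = ct.convert(keras_model,
                     convert_to="mlprogram",
                     compute_units=ct.ComputeUnit.ALL)
mlmodel.save("MobileNetV2.mlpackage")
```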
@@woolfel I have a regression model used to approximate a PDE solver which uses stacked convolutional layers, and changing to fp16 has a substantial negative impact on the model. Nvidia and AMD even make special machine learning cards for scientific work that are designed to run everything in double precision, because it matters so much in regression models. It really is a setting that can heavily degrade a model if you don't check carefully.
@@dreamcode4204 This is the same for the Nvidia pathways to the tensor cores. The Nvidia driver is closed source and the CUDA compiler is closed source. Yes, you need to convert your model to Core ML to make use of the NPU, but that is just the same as converting your model to use CUDA; you can't make use of the Nvidia tensor cores without using closed-source pathways.
@@woolfel So it depends a LOT on your use case whether the move from fp32 to fp16 has an impact on the output. In particular, if you're working with recurrent networks the errors can build up, but it can happen even with very deep, complex regular networks. Let's just say the Nvidia blog has an agenda, so it tends to miss some use cases.
What difference does it make running it natively in Windows vs containerised in Docker?
I expected a much bigger difference in favor of the RTX 3070, especially with tensor cores turned on.
Docker + Linux; not much more needs to be added
@@danishblunt9698 Oh hey, wasn't really expecting to come across our basement dweller Apple hater here :)
Also there's another reply under this video that says "Tensorflow sucks in windows.... Nvidia NGC containers is as fast as it gets" to a comment that said "You should try the 3070 in Windows because the Nvidia Linux driver is not as good as in Windows".
@@utubekullanicisi since I know you're an incompetent Apple fanboy, I'll enlighten you. TensorFlow surprisingly needs quite some configuration to run properly on both Windows and Linux. Typically people just install and go on Windows without trying to configure much, while Linux users have to configure it; there's no easy install on Linux.
That's why incompetent users such as yourself think it's OS-related.
Enjoy being clueless.
@Muhammad Ehtasam rtx 3070 was not plugged in though
@@erichwiehahn3402 Where did you get that the RTX 3070 was not charged? The Ubuntu desktop battery icon shows the laptop is charged.
Alex.. this is the video that the entire internet is missing. When weighing up an XPS versus an MBP, no one has stats for ML, not even now, in 2024.
Like a billion reviews will tell you what you already know - they're nice laptops, with decent chips..
thanks for your video. one little suggestion: show a table with the results of both :-)
nice video as always
Thanks 🙏
He's wearing a Really cool white and gold shirt there
Just to clarify, was your test using the M1 Max GPU, or was it using the M1 Max Neural Engine?
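One way to see what TensorFlow is actually given on the Mac (a small sketch): with the tensorflow-metal plugin the GPU shows up as a normal device, while the Neural Engine is not exposed to TensorFlow at all, so training like this runs on the GPU cores.

```python
import tensorflow as tf

# On an M1 Max with tensorflow-metal this lists a CPU device and a GPU device;
# there is no Neural Engine device, so training runs on the GPU cores.
print(tf.config.list_physical_devices())
```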
Now run the test on battery
Lol🤣 the asus machine will finish in 40 minutes
You can't be serious.
Have you not used non-Apple devices before? They don't run well unplugged and will throttle and run at lower clocks.
@@rns10 No device other than Apple has battery life on the same level as its performance, and that's a fact. And look, I'm a Dell fanboy
@@gabrielpeixoto5029 I said the same. Non-Apple devices don't have that.
I understand you all. Just keep in mind these are laptops, not desktops, and he is comparing to the CPU. I think it is fair.
I think this was done while they were unplugged; otherwise the performance difference would be huge
This test was run plugged in (see Ubuntu's battery indicator at 00:36~00:37; it's charging)
I'll be waiting for the pytorch tests
I have a MacBook Air with the M1 and I work as a senior PHP developer; I was never so happy until now. The performance is just amazing and there's no heat at all. For the price, the MacBook Air is the best option for web developers.
Hi Alex, I hate dumbly trying these scripts. Where can I learn about the yml files?
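For anyone else wondering, a minimal environment.yml for Apple Silicon TensorFlow of that era looked roughly like this (a sketch, not necessarily the exact file from the video); it's created with conda env create -f environment.yml:

```yaml
name: tf-metal
channels:
  - apple
  - conda-forge
dependencies:
  - python=3.9
  - tensorflow-deps      # Apple's channel pins compatible base packages
  - pip
  - pip:
      - tensorflow-macos
      - tensorflow-metal
```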
Can you please tell me which CUDA version and cuDNN version you're using for the 3070? 🙏 I'm having an issue 😔
it’s v11 i believe
@@AZisk thanks a lot. I had installed 10.1 and cuDNN 7.6, but I upgraded to 11 and 8.0 and it's working fine. Can you please make a video on these versions? It's really going to help a lot of people with RTX 30xx cards.
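A handy check before juggling versions (just a sketch): ask TensorFlow which CUDA and cuDNN it was built against, so the system install can match.

```python
import tensorflow as tf

info = tf.sysconfig.get_build_info()
# GPU builds of that era typically report CUDA 11.x and cuDNN 8.x here.
print("CUDA:", info.get("cuda_version"), "cuDNN:", info.get("cudnn_version"))
```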
Great job!
Can you do a similar video with unity?
You never informed us what the TGP is on this specific ROG Strix model, as that affects the performance.
Alex, thanks for doing cool videos!
m1 air vs gtx 1650 ti, which will be better for pytorch?
Shahed is my username
Great video! I knew the m1 max would shine here :)
Does anyone know the name of the app that measures the temperature and the fans of the M1 Mac?
it’s called TG Pro: a.paddle.com/v2/click/114/137247?link=48
What happens when you run the test in docker on the m1? I guess 11 minutes becomes 11h xD
Native install is impressive though.
How much RAM is in the ASUS? Does my GTX 1650 come close to yours??? (I've got 2x32GB DDR4 RAM.) I feel like you have a bottleneck because of low RAM on the ASUS.
Curious about the 24-core 14" Pro/Max, and whether I can use it for development as well as graphics and video editing, and whether there is acceleration for common tasks, ProRes files or certain codecs. I noticed they didn't put HDMI 2.1 in there, which would've been great for powering an OLED 120Hz television to watch edited video back on. Maybe one can use DisplayPort to get 120Hz? What do you think: in terms of everything that's available in the PC world and everything that's available in the Apple world, what would you choose if you had a budget of $3000-3500?
You can do up to 1440p 240Hz or 6K 60Hz via the TB4 ports. They literally couldn't add HDMI 2.1 or SDExpress because there wasn't enough available bandwidth as the new MBPs now have 3 TB4 full 40Gbps ports each with their own TB hub. And with M1 Max you can connect 3 6K 60Hz 10-bit displays (and a 4K 60Hz display) while with the M1 Pro you can connect 2 6K 60Hz displays.
You had to use the tensor cores in order to beat the MacBook, but the M1 Max also has a dedicated 16-core TPU
You should also try the RTX 3070 on Linux running without Docker, and on Windows, because the Nvidia Linux driver is not as good as the Windows one.
TensorFlow sucks on Windows.... Nvidia NGC containers are as fast as it gets
Docker is just a container, not a virtual machine, so the test is still valid.
But what can you play on the M1 Max, Alex? :)
Videos and music
It all comes down to price. Nvidia just has better optimisation and integration for TensorFlow, OpenGL, Maya, Blender, etc. An entry-level gaming PC with a 3060 to 3080 will give much better value for money, man.
I’d like to see a bigger case running for a longer time. 😬
Alex, I don't understand one thing: you are running Linux on the PC... While I applaud you for using Ubuntu, where's Windows 10 or 11? I would like to see the test run on a PC running Windows, since it's the main competitor. Also, how about running it on AC power and on battery power? Lastly, what are the specs on the PC? You mentioned it has an RTX 3070... which version, how much video memory, which processor, how much system memory, what screen resolution, and what street price? This feels like apples to oranges; I feel like we slid a lime or a grapefruit in there somewhere.
You don't understand what you're even saying… With a PC you can choose to use Linux instead of Windows, so it totally is a fair comparison. Stop being such a fanboy.
@@teemuvesala9575 you're an idiot. The Nvidia Linux drivers are ass, and he ran it in Docker.
Please stop
Also, he has a good point. The TDP situation on notebooks is ridiculous.
Please open folders with 1000 or 6000 etc. files and you'll see the performance, heavy duty!
Next test: mine crypto on the M1 and Nvidia. I can feel the miners scouring every Mac in stock
Nvidia still holds its ground, beating even a machine that is twice the price...
❤ You, you make the exact content that I want to watch... I know I'm such a 🤓. Thank you!
RTX Power baby!!!
Does this work with an RTX 3060?
As soon as you said "Docker container", I considered your results void. Unless you're running bare metal vs bare metal or Docker vs Docker, I wouldn't trust this comparison. I get that some may argue that "you're splitting hairs at that point", but no, you're really not. An unfair comparison, regardless of how unfair, is still… unfair.
that's why they are comparisons. A fair comparison would be comparing A to A, and that would not be interesting at all. All interesting comparisons are unfair by their nature. As long as you have the facts, you know what is being compared and you get informed.
Aha, he's back with the most-requested video that was on the minds of many of us. I doubt Alex is working with some AI tech to find out what's on viewers' minds... 😅
Tf
apple compare in there keynote , M1 Max with rtx 3080 we should Compare with that one
compare*
@@utubekullanicisi 😂 just In hurry . Thanks for correction
But being practical, with the price of the M1 MacBook you can buy at least two 3070s and the required hardware and use them with SLI. It would be cool to see a test of "price per training". As always, iToddlers BTFO
I thought M1 Max has 16-core neural engine 🤔
it would have been useful if you could install tensorflow on m1 T_T
I am not sure about this and I may sound stupid, but I think Apple is heavily using the OpenCL/SYCL frameworks to efficiently implement parallel algorithms on the Apple SoC. These vendor-neutral frameworks have the ability to run multiple accelerators simultaneously. So my guess is that when you ran the AI test on the M1, it used all the resources (CPUs + GPUs) on the M1. However, on the ASUS machine it was only using the RTX. I may be wrong, but that's what I think.
Sure it's faster, but let me spend over $3k on a machine that does the same thing as one at half the price.
1min 20sec faster at nearly half the price
The interesting part here is that Apple is using Metal for its GPU instructions and Nvidia is using CUDA and cuDNN 😅
I really LOVE the Apple M1, because Nvidia's support for CUDA and cuDNN is really hell on earth. This problem with installing Nvidia CUDA, cuDNN and TensorFlow has been around for many years, and now it's here again: I have Ubuntu 22, I install CUDA 12, but Nvidia hasn't released a compatible cuDNN version! OMG!
Installation on the Apple M1? 10 minutes and tensorflow-gpu is working :) Never again, Nvidia, what a waste of time
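For reference, the Apple Silicon setup of that era was roughly this (package names are Apple's; exact Python version requirements may differ):

```bash
python3 -m venv ~/tf-metal && source ~/tf-metal/bin/activate
pip install tensorflow-macos tensorflow-metal
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```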
Why did you measure the --fp16 flag only on the Nvidia GPU? It is not a special flag for Nvidia. This flag sets the calculation precision, and less precision leads to faster calculation. The M1 would be faster with this flag as well.
I was following instructions in the repo. Good to know though - will try it again
THE SCHWARZENEGGER is a cool tool
Ahaha Schwarzy's thumbs are so funny 🤣
Soo... the RTX is faster and cheaper? Then what's the point?
If the results are so close that you need "the Schwarzenegger", then you're really testing how quietly and efficiently the Mac does what the Linux machine does.
That's great news
Human Touch... funny folk you are...
Can you try Android Studio?
Test it in Houdini sims please: water, smoke
I know people can use Xcode and Swift for making a native Mac app. If somebody wants to program in Python (I just signed up for a class), what sort of environment would I use: Anaconda or Miniconda? What if I wanted to compile, and is it possible to create stuff for multiple platforms? I also heard Python is very useful for programming APIs. It seems like these new machines have a small thermal envelope but are very powerful; there's been a lot of innovation and modularity on the PC side of things, but I haven't used Windows since I was a kid. What do you think is the best machine? You can choose anything: what would you get in the PC world and what would you get in the Apple world?
How does the 3070 compare to the 3080 and the 3090, and what do you make of the new Nvidia Ampere GPUs? They have a lot of power for machine learning and ray tracing. Are the CUDA cores much better for accelerating something like Adobe Premiere, and what about how Apple accelerates Final Cut, or how does Resolve fit into the equation? What are the biggest downsides you have found so far with the new Mac? I'm curious about what the new powerful desktop version will be, the iMac or the Mac mini.
3070m
I think it'd be better to test on AMD GPU since they have better drivers
1st
You were ready :)
The AMD computer you're using is trash. Horrible test: too much missing info and too many missing steps. Not to mention you handicapped the PC.