MI210s vs A100 -- Is ROCm Finally Viable in 2023? Tested on the Supermicro AS-2114GT-DNR

  • Published Jul 16, 2023
  • Wendell discusses the race in machine learning, going over Google's, Nvidia's, and AMD's tech to see who's got what in 2023.
    *********************************
    Check us out online at the following places!
    bio.link/level1techs
    IMPORTANT: Any email lacking “level1techs.com” should be ignored and immediately reported to Queries@level1techs.com.
    -------------------------------------------------------------------------------------------------------------
    Intro and Outro Music: "Earth Bound" by Slynk
    Edited by Autumn
  • Science & Technology

COMMENTS • 241

  • @kazriko
    @kazriko 11 months ago +297

    AMD needs two modes, "Accurate" and "Green Team Inaccuracy"

    • @TheHighborn
      @TheHighborn 11 months ago +18

      More like, red dot accurate, noisy green data

    • @sinom
      @sinom 11 months ago +35

      Calling it "green mode" and justifying it as "oh it uses less power because it is less accurate" might actually be something they could do

    • @ac3d657
      @ac3d657 6 months ago

      You need two modes... wrapped, and dropped

  • @TAP7a
    @TAP7a 11 months ago +121

    ROCm seems to have planted itself in the scientific HPC world, let’s hope it can grow from there

    • @tappy8741
      @tappy8741 11 months ago +13

      With CDNA, yes. With RDNA1/2/3 they've severely dropped the ball, and didn't adequately make it clear that that was the plan all along. On the consumer side, which is where hobbyist compute lives, the 6950 XT was the first card to approach the Radeon VII for a traditional (non-AI/ML) scientific workload. The 7000 series is actually worse: they cut FP64 performance, and the memory model, with Infinity Cache split 5/6 ways (and/or something else), seems to have hurt this specific workload (OpenCL, which is why it can be tested).
      George Hotz to the rescue would be awesome.

  • @datapro007
    @datapro007 11 months ago +47

    I hope to heck it is. It's been Nvidia or nothing until now. Terrific video, Wendell. I like that you have content for the working folks.

  • @dirg3music
    @dirg3music 11 months ago +41

    I completely agree. If history has shown us anything, it's that when Lisa Su goes all in on something, that something tends to work and work well. I'm just excited to see the market get more diverse as opposed to "CUDA or gtfo"; closed ecosystems like that are bad for everyone.

    • @psionx1
      @psionx1 11 months ago +3

      Except it was AMD's own fault that CUDA became the standard for GPU compute work, and they still have not learned: adding features to hardware and slapping them on the box is not enough to win. They actually have to provide support and funding to develop 3rd-party software that uses the features of the hardware.

    • @makisekurisu4674
      @makisekurisu4674 11 months ago +2

      @@psionx1 Give them a break, they are running on less than half the budget, so of course they'd have to pick and choose their fights

  • @jannegrey593
    @jannegrey593 11 months ago +47

    Well - I hope for some competition. Standards are fine, but one company owning them is very monopolistic. And AMD's disadvantage seemed to be lack of software rather than hardware.

  • @P0WERCosmic
    @P0WERCosmic 6 months ago +2

    ROCm 6.0 just dropped today! I'd love for you, Wendell, to do an update on this video to show off all the advancements in 6.0 and whether there are any noticeable performance bumps 🙏

  • @flamingscar5263
    @flamingscar5263 11 months ago +61

    Honestly, I'm hopeful for ROCm on consumer hardware soon, and on Windows. If you're someone that uses any form of creative app like Blender or the Adobe suite, then you know how valuable CUDA is. This really could be the boost AMD needs. I've been trying my best to recommend AMD, but it's surprising how many people go Nvidia because of how much better Nvidia is in creative apps, even if they don't use them; it's always "well, I might want to use them in the future, so I'll just go Nvidia".
    Soon there will be little excuse not to go AMD, and I'm all for it; competition is good. Not that I'm in any way an AMD fanboy: I know for a fact that if AMD somehow dethroned Nvidia as the market leader, they would pull the same shit Nvidia does, but competition is what is meant to stop that.

    • @GlacikingTheIceColdKing
      @GlacikingTheIceColdKing 11 months ago +6

      Funnily enough, they most likely won't use them in the future. I've seen a lot of people use the same argument to go Nvidia, but they don't even install any creative applications after buying their GPUs.
      Also, I've been using AMD for about 7 months, and it isn't horrible for people who just want to do video editing with Premiere Pro and Illustrator, or photoshopping. I use those programs regularly and I face no problems with them.

    • @reekinronald6776
      @reekinronald6776 11 months ago +3

      Yup. For about a decade I was scratching my head over why AMD had such a lousy software strategy. It had great hardware, but the drivers and the lack of tools or APIs for programmers just seemed like a huge business mistake. A perfect example was the time and resources spent on AMD's ProRender. Considering the multitude of professional and high-quality open source renderers, ProRender was a pointless exercise; better to spend the manpower and money on driver development, or even on OpenCL when it was viable.
      At least with ROCm they now seem to understand that everything needed to support the hardware is as important as the hardware itself.

  • @s1ugh34d
    @s1ugh34d 11 months ago +29

    We need more high end AI comparisons like this. Hope you get more gear to test!

  • @ProjectPhysX
    @ProjectPhysX 11 months ago +22

    We have both an MI210 64GB and an A100 40GB for my FluidX3D OpenCL software. Both cards are fine and the software runs flawlessly, but they are super expensive. Value in terms of VRAM capacity is better for the MI210, yet performance (actual VRAM bandwidth) is better on the A100. Somehow the memory controllers on AMD cards are not up to the task: 1638 GB/s promised, 950-1300 GB/s delivered. The A100 actually does 1500 GB/s. Compute performance is irrelevant for such HPC workloads; only VRAM capacity and bandwidth count.
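
    A quick way to see the promised-vs-delivered gap is a device-to-device copy microbenchmark: a copy reads and writes every element once, so each pass moves 2x the buffer size. A minimal PyTorch sketch of the idea (an illustration only, not FluidX3D's actual OpenCL code; on ROCm builds the "cuda" device maps to HIP):

        import time
        import torch

        n = 1 << 28  # 2^28 floats = 1 GiB per buffer
        x = torch.empty(n, dtype=torch.float32, device="cuda")
        y = torch.empty_like(x)

        torch.cuda.synchronize()
        t0 = time.time()
        iters = 100
        for _ in range(iters):
            y.copy_(x)  # device-to-device copy: reads x, writes y
        torch.cuda.synchronize()

        bytes_moved = iters * 2 * n * 4  # read + write, 4 bytes per float
        print(f"effective bandwidth: {bytes_moved / (time.time() - t0) / 1e9:.0f} GB/s")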

    • @mdzaid5925
      @mdzaid5925 10 months ago +3

      What a time we are living in... ~1000 GB/s is not enough 😅

    • @ProjectPhysX
      @ProjectPhysX 10 months ago +3

      @@mdzaid5925 Crazy, right? Transistor density, and with it compute power (FLOP/s), has grown so fast in the last decade that memory bandwidth cannot keep up. Today almost all compute applications are bandwidth-bound, meaning the CPU/GPU is idle most of the time, waiting for data. Even at 2 TB/s.

    • @mdzaid5925
      @mdzaid5925 10 months ago

      @@ProjectPhysX True... not sure about the performance implications, but computing has evolved very, very rapidly. When I think about how small each transistor is, and how many and how closely they are packed, it feels impossible. Personally, I feel that eventually analog neural networks will take over and GPU dependency should be reduced to only training / assisting the analog chipsets. Also, I don't have too much faith in the current generation of "AI" 😅.

    • @Teluric2
      @Teluric2 3 months ago +1

      What kind of setup do you use with this software? Windows? Red Hat?

    • @ProjectPhysX
      @ProjectPhysX 3 months ago

      @@Teluric2 for these servers openSUSE Leap, for others mostly Ubuntu Server minimal installation.

  • @nickelsey
    @nickelsey 11 months ago +75

    Tensorflow never directly competed with CUDA; it sits on top of CUDA. Tensorflow's primary competitor was (and still is) Pytorch. Both Tensorflow and Pytorch can run on TPUs, though of course Tensorflow has 1st-class support. Both Tensorflow and Pytorch have 1st-class support for CUDA. I suspect the real reason Tensorflow hasn't been as popular lately is two-fold. First, a lot of internal Google development resources have moved on to developing JAX instead of TF, and secondly (and more importantly), Pytorch is simply better than Tensorflow: it's significantly more enjoyable and easier to use. And the reason CUDA has beaten out TPUs is also simple: you can only get TPUs on Google Cloud, whereas every cloud, every enterprise datacenter, and every school had direct access to CUDA-capable devices. Everyone uses and develops for them, whereas TPUs and the XLA compiler are basically only developed by Google.
    Also, in deep learning we actually don't mind the reduced accuracy for many problems. In fact, a mix of 32-bit and 16-bit is the *default* data format for deep learning now. Reduced-precision deep learning is extremely important for large-scale neural network development, for three reasons. First, obviously, if you use fewer bits for your model, you can fit a larger model in a single GPU's memory, which makes development easier. Second, the Tensor Cores basically double their FLOPs every time you halve the precision of your data: if you have 256 TOPs using 32-bit floating point data, then you have 512 using FP16 data, and 1024 TOPs using FP8 data. Even further compression work is being done for INT8 and even INT4. Finally, one of the most important and oft-overlooked issues is that many neural net architectures require very high GPU memory bandwidth; that's why datacenter GPUs use HBM. When you reduce your data from 32-bit to 16-bit floats, you cut the memory bandwidth pressure in half.
    We won't consider AMD cards until they're competitive with CUDA at FP16 performance, and even then, AMD would REALLY need to convince us that their software stack works as seamlessly as CUDA does: you have to add wasted developer and data scientist time to the total cost of the device to get a proper apples-to-apples comparison. We just started getting our H100 deliveries in, and they are truly beasts. I'm hoping we can get some AMD hardware in for benchmarking at some point soon.
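
    For concreteness, mixed 32/16-bit training in PyTorch is typically just an autocast region plus a gradient scaler; a generic sketch (not the commenter's code), where the scaler exists because small FP16 gradients would otherwise underflow to zero:

        import torch

        model = torch.nn.Linear(1024, 1024).cuda()
        opt = torch.optim.SGD(model.parameters(), lr=1e-3)
        scaler = torch.cuda.amp.GradScaler()  # rescales the loss so tiny FP16 gradients survive

        x = torch.randn(64, 1024, device="cuda")
        with torch.cuda.amp.autocast(dtype=torch.float16):  # matmuls dispatch to FP16 tensor cores
            loss = model(x).square().mean()
        scaler.scale(loss).backward()
        scaler.step(opt)   # unscales gradients; skips the step if any overflowed
        scaler.update()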

    • @nexusyang4832
      @nexusyang4832 11 months ago +8

      Pin this comment above.

    • @seeibe
      @seeibe 11 months ago +8

      It all sounds viable from the hobbyist / small company standpoint. But come on, if you can afford H100s, you're big and successful enough that you can just invest in AMD as a backup plan. This would basically be the equivalent of Valve saying "All PC gamers are on Windows, so we won't invest in Linux". At a certain point, you're the one who has to make it happen.

    • @GeekProdigyGuy
      @GeekProdigyGuy 2 months ago

      Reduced precision is NOT the same as violating the FP standards. Going from FP32 to FP16 is a reduction in precision, but if the hardware implements the standards correctly, an FP16 calculation should have the exact same result no matter what card you run it on. Fudging the calculations probably doesn't make a huge difference for most ML applications, but for companies that need auditability (e.g. finance), or even big tech companies that want to debug an issue affecting a million users out of their billion users... standards compliance is important, and Nvidia needs to fix their shit.

  • @leucome
    @leucome 11 months ago +35

    I got a 7900 XT when ROCm 5.5 came out, specifically to use with A1111. It works pretty well. To give an idea: 32 images of Danny DeVito at 768px, 20 samples, took 2:30 min (about 4.7 s per image) as an 8x4 batch; 16x2 takes 2:40, and 32x1 takes 3 min. So yeah, the performance is there. I can just imagine how fast the MI300 will be.

    • @sebastianguerraty6413
      @sebastianguerraty6413 11 months ago +1

      I thought ROCm was only supported on a very few 6xxx GPUs from AMD and their server-class GPUs

    • @chrysalis699
      @chrysalis699 11 months ago +6

      @@sebastianguerraty6413 ROCm 5.5 fixed that: it added gfx1100 and thus 7xxx support. I've been custom-compiling PyTorch with every new release of ROCm. Can't wait for them to start leveraging the AI accelerator cores in the 7xxx series. Whether that will be CUDA-compatible and exposed via HIP remains to be seen.

    • @sailorbob74133
      @sailorbob74133 11 months ago

      @@chrysalis699 When you compile PyTorch for gfx1100, how much of an uplift do you get over stock PyTorch? What benefits do you see from the custom compile in general?

    • @chrysalis699
      @chrysalis699 11 months ago +3

      @@sailorbob74133 The stock PyTorch compiled against ROCm 5.4.2 doesn't detect my card at all, so the uplift is infinity 🤣. I doubt there is much difference for RX 6xxx cards, and there is still quite a bit of unlocked potential in the RX 7xxx cards, as I haven't seen any HIP APIs for the AI accelerators. There is actually barely any mention of them on AMD's site, just an obscure reference on the RX 7600. We'll probably have to wait for CDNA 3 for those APIs to be released.

    • @chrysalis699
      @chrysalis699 11 months ago +1

      I just noticed that PyTorch nightly is now compiled against ROCm 5.6, so I'll probably just switch to those. 🤞 the next release will be built against 5.6
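
      For anyone checking which build they're on: a ROCm wheel identifies itself at runtime. A small sketch (assumes a ROCm build of PyTorch is installed):

          import torch

          print(torch.__version__)          # ROCm wheels carry a +rocmX.Y suffix
          print(torch.version.hip)          # HIP version the wheel was built against; None on CUDA builds
          print(torch.cuda.is_available())  # ROCm GPUs are exposed through the torch.cuda API
          if torch.cuda.is_available():
              print(torch.cuda.get_device_name(0))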

  • @Bill_the_Red_Lichtie
    @Bill_the_Red_Lichtie 11 months ago +19

    I am such a geek, "Can't believe it's not CUDA" made me actually laugh out loud.

  • @astarothgr
    @astarothgr 11 months ago +12

    The worst thing about ROCm is the hit-and-miss support for commodity GPUs. Back in the 3.x / 4.x days of ROCm, commodity GPUs were half-heartedly supported, with bugs, and support was sometimes retroactively withdrawn. These days at least they tell you that if you buy anything other than the W-series of GPUs (i.e. W6800) they don't promise anything.
    This, however, will not increase the mindshare; all students and budget-strapped researchers just buy off-the-shelf Nvidia GPUs and go to work. If you've picked a commodity GPU and are trying to get ROCm to work, be ready for tons of frustration; really, this use case is unsupported.
    Source: my own experience with ROCm 4.x, using the RX 480/580, Vega 56/64 and Radeon VII (the only one that worked reasonably well).

    • @mytech6779
      @mytech6779 11 months ago +4

      I would add to the student/budget-research thing: they may not be looking for high performance, but they do need the full feature set to do the primary development work; then, once things are working and somewhat debugged, they will upgrade to get performance.
      Even for big-budget ops it makes no sense to have top-end hardware sitting there depreciating for a year or four while the dev team runs experimental test builds. By the time it comes to a real production run, another purchase will be needed anyway.
      That core-functionality problem has always been AMD's GPU problem: promises that seem good on paper but ultimately don't deliver. "Oh yeah, now that we have your money, it turns out you need this specific version of PCIe with that CPU subfamily, on these motherboards with this narrow list of our cards (and we have terrible product-line numbering, so many in the same apparent series don't work) made in these years, with that specific release of this OS..."
      Years ago I bought a W7000 (well over $1000, 12 years ago) specifically because I wanted to play with the compute side, and there were claims that it had compatible drivers and such (I use Linux; Nvidia had terrible support). Nah, oops: something in the GCN 1.x arch was screwed up and compute was never usable, even after several major driver changes and supposed open sourcing. It worked OK for graphics, but my graphics needs are minimal.
      Later I switched to a much newer and cheaper equivalent-performance consumer AMD card that claimed OpenCL support; nah, again it doesn't really.
      Gave me a rather bad taste for AMD. I'm hoping Intel can push some viable non-proprietary alternative to CUDA; I'm due for a new system in the next couple of years.

  • @solidreactor
    @solidreactor 11 months ago +24

    Rumor says that ROCm might work for RDNA3 on Windows this fall (repo & comments). However, something similar was said earlier for 5.6, and that might not be true anymore?
    I really hope the consumer RDNA cards could run ROCm on Windows and act both as an evaluation platform for CDNA and as an entry point for AI compute, to democratize AI access.
    Having ROCm support on consumer cards on Windows might also build traction with other companies (like Tiny Corp) to embrace the more open solution. Who knows, maybe that will tip the scale in AMD's favor?

    • @flamingscar5263
      @flamingscar5263 11 months ago +2

      Everything points towards that being the case. AMD hasn't said anything officially, but in-development documents leaked suggesting a fall timeframe.
      It will happen eventually, even if not in the fall. AMD knows how far behind they are on the consumer side for creative work; they need this.

    • @stevenwest1494
      @stevenwest1494 11 months ago +5

      I'm hanging my GPU choice on this date, because honestly I don't want an RTX 3060 12GB and NGreedier's horrible GeForce Experience 🤮 but I want to get into Stable Diffusion. A 3080 12GB is just waaaay too much still! What I really want is an RX 6800, with ROCm for Windows!

    • @eaman11
      @eaman11 11 months ago +2

      Intel says the same thing: their stack will work on Windows too.

    • @mytech6779
      @mytech6779 11 months ago +2

      AMD will see a squirrel by then and abandon yet another project with half-implemented "support". Why they would even mess with Windows support at this point is dumbfounding; most systems in this realm run Linux unless they are forced to Windows by some 3rd-party need for proprietary crap. Windows may still be king of Ma and Pa Kettle's desktop, but that isn't this target market segment.

    • @reekinronald6776
      @reekinronald6776 11 months ago +1

      @@mytech6779 I would like to see a segment breakdown between corporate GPU computing and consumer. I would still think the number of Windows users running Blender, Adobe, or some other graphics program that uses GPU rendering is quite large.

  • @tad2021
    @tad2021 11 months ago +8

    If you didn't know: in A1111, change the RNG source from GPU to CPU and the optimizer to sdp-no-mem. That should make the differences between runs on different GPUs as small as possible.
    Using xformers on CUDA can be faster (sdp on PyTorch 2 has mostly caught up), but the output isn't deterministic.
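
    The reason the CPU RNG setting helps: if the initial latent noise is drawn on the CPU from a fixed seed, it is bit-identical no matter which GPU later does the denoising. A hedged PyTorch sketch of the idea (not A1111's actual code; the tensor shape is just an example):

        import torch

        g = torch.Generator(device="cpu").manual_seed(1234)
        noise = torch.randn((1, 4, 96, 96), generator=g)  # drawn on CPU: same bits on every machine
        noise = noise.to("cuda")  # only then moved to whichever GPU backend is active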

  • @joshxwho
    @joshxwho 11 months ago +2

    Thank you for producing this content. As always, incredibly interesting.

  • @steve55619
    @steve55619 11 months ago

    Thanks for this video. This field is moving so quickly that it's really hard to keep up to date on the latest advancements, let alone the current status quo.

  • @MaxHaydenChiz
    @MaxHaydenChiz 11 months ago +40

    It'd be easier to get students experienced with AMD hardware, and to get open source support for it, if RDNA had more compatibility with CDNA / better performance parity against Nvidia hardware.
    Students and hobbyists aren't spending $10k+ on this kind of stuff.

    • @nexusyang4832
      @nexusyang4832 11 months ago +16

      Yeah, the fact that someone can walk into Best Buy, get a prebuilt, and download the CUDA SDK and learn says a lot about how easily and affordably someone can get into AI/ML. If AMD could do the same for their consumer/gaming hardware, that would be a big game changer.

    • @levygaming3133
      @levygaming3133 11 months ago +15

      @@nexusyang4832 Exactly. There's a lot of hand-wringing about all the various things Nvidia does to needlessly segment their lineup, and that's all well and good, but that's not at all what CUDA is.
      CUDA's advantage is that it's the same CUDA whether you have an MX iGPU replacement, the same CUDA that's in the old Nvidia GPU you're replacing (assuming you have an Nvidia GPU, obviously), and the very same CUDA that's in last year's laptops, this year's laptops, and is certainly going to be in next year's laptops.
      It's not like AMD makes CDNA laptops, and that's kinda the point.

    • @nexusyang4832
      @nexusyang4832 11 months ago +1

      @@levygaming3133 You're spitting facts. 👍👍👍👍

    • @steve55619
      @steve55619 11 months ago

      Excuse me??? Lol

    • @mytech6779
      @mytech6779 11 months ago +2

      Hobbyist/student stuff doesn't need performance parity with CDNA.
      What it needs is ease of access (available as a standard feature on commonly available consumer-priced cards, without hobbling); similarity of interface across products, for the user and for software portability between consumer stuff and CDNA; and performance that is good enough not to be frustrating.
      Reasonable Linux support is also needed. Linux may only make up 2% of total desktops, but Ditzy Sue and Joe Sixpack aren't GPU-compute hobbyists, so total desktops is the wrong stat; in reality Linux is closer to 50% or more of the relevant market segments.

  • @AndreiNeacsu
    @AndreiNeacsu 11 months ago +14

    I am really happy that Ryzen paid off. In 2017 I was one of the earliest adopters who pre-ordered two Ryzen 1700 (non-X) systems with X370 boards; and I never pre-order stuff, did not before and have not since. Now AMD is a proper force for innovation and competition in both the CPU and GPU spaces, for consumers and datacenters. Also, Intel Arc seems to get more interesting by the day. I got an Acer A770 16GB as a curiosity at the start of this year and I still haven't reached a final conclusion about it; it seems like every second driver update makes things better.

    • @flamingscar5263
      @flamingscar5263 11 months ago +8

      Yeah, it's honestly good Ryzen happened, because there were reports they were on the road to bankruptcy.
      All of this is thanks to Lisa Su; she really saved AMD.

    • @peterconnell2496
      @peterconnell2496 11 months ago +2

      Well done. Therein lies a tale many of us would like to hear. The buying decision in the market of the day? The cost of an 8-core Intel vs AMD then, for example? Let's not forget what a classic the 1600 proved to be.

    • @MatthewSwabey
      @MatthewSwabey 11 months ago +5

      According to two senior AMD tech folks, Zen was designed because they had to: Bulldozer etc. was a failure. Originally they aimed for 70% of Intel performance at 50% of the price, but then TSMC's silicon just kept getting better and Intel stopped innovating. [I had the chance to talk to some senior AMD tech folks when they were recruiting on campus, and they were surprised how great Zen turned out too!]

  • @mrfilipelaureanoaguiar
    @mrfilipelaureanoaguiar 11 months ago +4

    That M.2 scanning multiple 4K videos to check for a choice of shape... really nice what it can process and check at that size without cooling on it. As long as it's detected...

  • @Owenzzz777
    @Owenzzz777 11 months ago +12

    You forgot to mention that George Hotz's discussion started with his frustration with AMD GPUs. The so-called "open source" software isn't so open. Look at the "open" FSR 2 repo: no one is reviewing public pull requests; it's used more as a marketing tool than to support the OSS community.

    • @tstager1978
      @tstager1978 11 months ago +4

      They never said that FSR 2 would be an open source project. They said it would be open source, meaning free access to the source code and the ability to modify it for your own needs. They never said they would accept pull requests from the public.

  • @spuchoa
    @spuchoa 11 months ago

    Great video, Wendell! This is good for the market; let's hope the prices adjust in the next 12 months.

  • @bennett5436
    @bennett5436 11 months ago +7

    please do 'tech tubers by Balenciaga' next

  • @marktackman2886
    @marktackman2886 11 months ago +2

    These videos empower my team to express ideas to upper management.

  • @reto
    @reto 11 months ago +6

    Got SD A1111 to work on an RX 6500 XT and an Arc A770. But I wasn't able to run it on Vega iGPUs. The A770 16GB crushed the 3060 12GB I usually use.

    • @littlelostchild6767
      @littlelostchild6767 11 months ago

      Hey, if you don't mind, could you please make a short test video on the A770? I'm thinking of getting one.

  • @SomeGuyInSandy
    @SomeGuyInSandy 11 months ago +5

    Seeing those giant GPU modules gave me Pentium II flashbacks, lol!

  • @methlonstorm2027
    @methlonstorm2027 11 months ago

    I enjoyed this, thank you.

  • @Alice_Fumo
    @Alice_Fumo 11 months ago +1

    This is such a curious way to create spot the difference images.

  • @usamaizm
    @usamaizm 11 months ago +3

    I think the subtleties shouldn’t be an issue.

  • @Stealthmachines
    @Stealthmachines 10 months ago

    You're simply the best!

  • @post-leftluddite
    @post-leftluddite 11 months ago +10

    Wendell... this is seriously important work. Making the alternative to what many see as the default choice observably feasible is crucial to easing the hesitancy many people have, and just like in anything else [under the clutches of capitalism], a de facto monopoly can only harm consumers/users.

  • @shauna996
    @shauna996 11 months ago

    Thanks!

  • @mdzaid5925
    @mdzaid5925 10 months ago +2

    ROCm support is definitely needed on consumer-grade hardware.
    - This will give AI students some experience in the AMD ecosystem.
    - Also, not all AI models run in the cloud. For local use, companies have to consider the available options, and currently that's only Nvidia.

  • @cedrust4111
    @cedrust4111 11 months ago +3

    Is ROCm supported on RDNA3 iGPUs?
    By that I mean: if one has a Minisforum UM790 Pro (with a Ryzen 9 7940HS), can that work?
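
    Not officially (the ROCm support list at the time covered no iGPUs), but a workaround often reported for unsupported RDNA3 parts is spoofing a supported gfx target before the HIP runtime initializes. Entirely unofficial and untested here, and the exact override value for the 7940HS iGPU is an assumption:

        import os
        os.environ["HSA_OVERRIDE_GFX_VERSION"] = "11.0.0"  # pretend to be a supported RDNA3 target

        import torch  # import only after setting the variable, so HIP picks it up
        print(torch.cuda.is_available())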

  • @KeithTingle
    @KeithTingle 11 months ago +1

    love these talks

  • @wsippel
    @wsippel 11 months ago +12

    I run AI workloads on a 7900XTX. It's a bit of a headache sometimes, but it works. But there's so much performance left on the table. I recently played around with AMD's AITemplate fork, and it's really fast on RDNA. But it's also incomplete and unstable. Triton recently got lots of MFMA optimizations, no WMMA though. They're largely the same thing as far as I understand, except MFMA is Instinct, WMMA is Radeon. I think even most AMD engineers don't realize Radeon has 'Tensor Cores' now.

    • @whoruslupercal1891
      @whoruslupercal1891 11 months ago +2

      >They're largely the same thing as far as I understand
      Absolutely not. MFMA is a 1-clock MMA at whatever matrix size; WMMA is just running wave64 over however many clocks on double the SIMD width.

    • @wsippel
      @wsippel 11 months ago +1

      @@whoruslupercal1891 Maybe, but the instructions are mostly the same, no? And WMMA on RDNA3 is actually accelerated (CDNA2, CDNA3 and RDNA3 are the only three architectures supported by rocWMMA, so I assume previous RDNA chips simply didn't have an equivalent), so AMD should probably use those instructions wherever possible.

    • @whoruslupercal1891
      @whoruslupercal1891 11 months ago

      @@wsippel >but the instructions are mostly the same, no
      No.
      >CDNA2, CDNA3 and RDNA3 are the only three architectures supported by rocWMMA
      Yeah, but MFMA is different.

  • @apefu
    @apefu 11 months ago

    This some guuud video!

  • @jp-ny2pd
    @jp-ny2pd 11 months ago

    I always spun that technical difference as "one is a more mature, but less complete offering". So then it became a question of what is good enough for their needs.

  • @jordanmccallum1234
    @jordanmccallum1234 11 months ago +3

    The promise of ROCm is huge, but better hardware support and better communication about what is, and what is intended to be, supported is needed. I had to buy a GPU a few years back and really wanted an AMD GPU for the Linux drivers, but I needed Tensorflow capability for university. ROCm existed, but there was barely any documentation about what was supported, nothing on what they intended to support, and no timeline for software development, so I got a 2080.
    I remember that at roughly the same time, AMD were touting that "you don't need to buy an Instinct to do datacenter compute", but how is "datacenter compute is locked to Tesla" any different from "there is no software support for Radeon" when you want to get real work done *now*?

    • @leucome
      @leucome 11 months ago +2

      Better communication, for sure. One of the main issues is that the list they provide is not about which GPUs work with ROCm but about which GPUs AMD officially supports. It is totally useless for people who want to know whether a GPU will actually run or not. As far as I know, just about all AMD GPUs since Vega already work, even if AMD doesn't offer official "support".

  • @Icureditwithmybrain
    @Icureditwithmybrain 11 months ago +2

    Will ROCm let me use my AMD 7900 XTX to accelerate the personal AI LLM running locally on my PC? At present it runs on my CPU, which makes the LLM's responses sluggish.
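
    It should: with a ROCm build of PyTorch (5.5+ for the 7900 XTX's gfx1100), the card shows up as a torch.cuda device, so a Hugging Face model moves over in the usual way. A hedged sketch only; the model name is a placeholder:

        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        name = "some-local-7b-model"  # placeholder for whatever LLM you run locally
        tok = AutoTokenizer.from_pretrained(name)
        model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16).to("cuda")

        inputs = tok("Hello!", return_tensors="pt").to("cuda")
        out = model.generate(**inputs, max_new_tokens=32)
        print(tok.decode(out[0], skip_special_tokens=True))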

  • @zachnilsson4682
    @zachnilsson4682 11 months ago +2

    I'm going to Argonne National Lab later this week. Let me know if you want to sneak into the new supercomputer there ;)

  • @dmoneyballa
    @dmoneyballa 11 months ago +2

    Where do you find the model used? I can't find it on Hugging Face. The icantbeliveitsnotphotography safetensors, that is.

    • @wargamingrefugee9065
      @wargamingrefugee9065 11 months ago +2

      Maybe this, Google: civitai ICBINP - "I Can't Believe It's Not Photography". I'm downloading it now. Best of luck.

  • @SamGib
    @SamGib 11 months ago +7

    If AMD wants to get popular, they need to support their consumer-grade GPUs in ROCm. And also the used market.

  • @sinom
    @sinom 11 months ago +1

    I was waiting for this video since the teardown came out

  • @tad2021
    @tad2021 11 months ago +2

    We've been using a lot of TPUs the past few months. It's such a weird platform with interesting self-imposed bottlenecks, and it doesn't help that Google will suddenly reboot or take down our nodes for maintenance at least once every few days without any warning.

  • @outcastp23
    @outcastp23 11 months ago +1

    Thanks for the stock tip Wendell! I'm selling all my TSLA and buying up AMD stock.

  • @ddnguyen278
    @ddnguyen278 11 months ago +9

    Kinda hard to build for determinism when your hardware does lossy stochastic compression on compute. Even multiple runs of the same data set wouldn't produce the same output on Nvidia. I suspect that if they didn't do that, they would be significantly slower.

  • @jpsolares
    @jpsolares 11 months ago +1

    Is there a tutorial for AMD Instinct and Stable Diffusion? Thanks in advance.

  • @dholzric1
    @dholzric1 11 months ago +1

    Is there any way to get the new version of ROCm to work with the MI25?

  • @jadesprite
    @jadesprite 11 months ago +3

    But what I really want to know is: can I use it to TRAIN models too? Especially on voices and faces. I don't want to upload my family's private data to a cloud service and potentially have them save it forever; I would only trust that locally.

  • @PramitBiswas
    @PramitBiswas 11 months ago +1

    Open standards for ML (read: TF) kernel APIs would help massively in achieving cross-hardware support.

  • @Timer5Tim
    @Timer5Tim 11 months ago +12

    As nice as it is and as cool as it is, I expect ROCm for Windows and Half-Life 3 to come out on the same day...

  • @AI-xi4jk
    @AI-xi4jk 11 months ago +8

    Appreciate the work you've put into this, Wendell. I think AMD needs to support not only frameworks like TF and Torch but also model conversion from one framework/HW to another. Basically, primitives mapping between systems.

  • @b.ambrozio
    @b.ambrozio 5 months ago

    Well, why don't we have it on AWS or GCP? I'm really looking forward to seeing it.

  • @cromefire_
    @cromefire_ 11 months ago +1

    One big problem for Google was that you only get full TPUs in Google Cloud, otherwise it'd be pretty different.

  • @CattoRayTube
    @CattoRayTube 11 months ago

    Big fan of Evelon Techs

  • @callowaysutton
    @callowaysutton 11 months ago +1

    Did you get to test out running LLMs on these GPUs? I'd be curious how many tokens per second these bad boys can push out, especially since it seems like LLMs are going to be a main point of interest for AI companies for at least the next 1-3 years.

  • @stuartlunsford7556
    @stuartlunsford7556 11 months ago +6

    AMD's FP64 cores are great, but they still need more dedicated AI silicon, preferably integrated on the same package.

  • @doppelkloppe
    @doppelkloppe 11 months ago +1

    Are the differences in the images really due to different precision levels in the hardware, or is it (also partly) due to limited determinism and reproducibility? After all, you're not guaranteed to get the same image twice, even when using the same seed and hardware.

  • @floodo1
    @floodo1 11 months ago

    fascinating

  • @SlinkyBass0815
    @SlinkyBass0815 4 months ago

    Hi,
    I would like to get started with ML and currently have two offers for a graphics card:
    an RX 6800 16 GB and an RTX 4060 8 GB.
    Do you know if the 6800 would be suitable for getting started, or is it better to go with the 4060?
    Thank you in advance!

  • @MrMaximiliansa
    @MrMaximiliansa 11 months ago +6

    Very interesting!
    Do you know why Stable Diffusion seems to use so much more VRAM on the MI210 than on the A100?

    • @Level1Techs
      @Level1Techs  11 months ago +5

      Maybe related to the accuracy stuff? I'm not sure tbh

  • @VFPn96kQT
    @VFPn96kQT 11 months ago +5

    Hopefully SYCL will abstract platform-specific APIs like ROCm/CUDA etc.

    • @mytech6779
      @mytech6779 11 months ago +2

      I used to think that, but realized I'll grow grey waiting on a decent implementation. SYCL seems to be stuck in some quasi-proprietary limbo with a company that won't or can't make it widely available.

    • @VFPn96kQT
      @VFPn96kQT 11 months ago +1

      @@mytech6779 The most popular SYCL implementations are #OpenSYCL and #DPC++. Both are open source and work on many different architectures. What do you mean, "stuck in quasi-proprietary limbo with a company"?

  • @anarekist
    @anarekist 11 months ago +1

    Aw, I was hoping to use ROCm with my 6800 XT

    • @leucome
      @leucome 11 months ago

      Try it... I bet it will work. My 6700 XT and 7900 XT work fine with ROCm, so I'd guess the 6800 XT will work too.

  • @DOGMA1138
    @DOGMA1138 11 months ago +1

    I'm pretty sure you're running torch with cu117 or older; the numbers are about 70% lower than what an A100 puts out with these settings on cu118. If you just did pip install from the default repo, it's cu117.
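
    It's easy to verify which CUDA build a torch install carries (a generic check, not specific to this benchmark):

        import torch
        print(torch.__version__)   # e.g. 2.0.1+cu117 - the suffix is the toolkit it was built against
        print(torch.version.cuda)  # e.g. 11.7

    A cu118 wheel had to be requested explicitly at the time, e.g. pip install torch --index-url https://download.pytorch.org/whl/cu118, since the default index wheel was cu117, as the comment says.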

  • @Fractal_32
    @Fractal_32 11 months ago +1

    I’m glad to be an AMD shareholder, although I guess I might grab a few more shares just in case. (My AMD shares have made a killing so far especially off this AI hype bubble.)

  • @Dallen9
    @Dallen9 11 months ago +2

    Pausing the video at 11:37: if AMD is on the left and Nvidia is on the right, AMD has the better algorithm running. The smartphone in DeVito's hand isn't merging with the spoon, and he has one button on his collar instead of two. It might have taken longer, but the image looks more natural, which is kind of nuts.

  • @vsz-z2428
    @vsz-z2428 11 months ago +1

    Thoughts on OpenSYCL?

  • @wecharg
    @wecharg 10 months ago

    Best in the world at what he does ^

  • @samghost13
    @samghost13 11 months ago +1

    Could you use those AI parts on Ryzen? I think it is a notebook CPU

  • @aacasd
    @aacasd 11 months ago +1

    Any benchmarks with AMD Ryzen AI?

  • @shieldtablet942
    @shieldtablet942 11 months ago +2

    AMD keeps dropping old GPUs from ROCm. RDNA has been ignored forever; not even OpenCL worked at launch with the regular drivers. So there will be little uptake while Nvidia still has something that performs OK at the lower end.
    Gaudi 2 is also looking OK, and Intel seems committed to having their software run on potatoes.

  • @VegetableJuiceFTW
    @VegetableJuiceFTW 11 months ago +1

    LLMs, please next!

  • @skilletpan5674
    @skilletpan5674 11 months ago +1

    There is a fork of Automatic1111 that supports AMD. It's in the main project readme, or a Google search away. It seems they randomly decided to drop ROCm support for some older cards a few months ago: RX 5xx isn't supported, and I think Vega was also dropped.

  • @Cadambank
    @Cadambank 3 months ago

    With the new release of ROCm 6.0, can we revisit this topic?

  • @EvanBurnetteMusic
    @EvanBurnetteMusic 11 months ago +2

    Would love a better explanation of why the math is different. Could be that floating point math is not associative: (A + B) + C does not always equal A + (B + C). Optimizing compilers sometimes change the order of operations in the name of speed.
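
    That non-associativity is easy to demonstrate with plain FP64 Python floats:

        a, b, c = 1e16, -1e16, 1.0
        print((a + b) + c)  # 1.0
        print(a + (b + c))  # 0.0 - the 1.0 vanishes when rounded into -1e16

    So any compiler or kernel that reorders a long summation can legitimately produce a slightly different result.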

    • @Level1Techs
      @Level1Techs  11 months ago +2

      developer.nvidia.com/blog/tensor-cores-mixed-precision-scientific-computing/ covers it: mixed precision instead of full-fat FP64. Usually the mantissa doesn't get as many bits. That's why FP64 runs at a different compute rate than "FP64" for AI.

    • @EvanBurnetteMusic
      @EvanBurnetteMusic 11 months ago

      @@Level1Techs My first thought was that the AMD card was using FP32 instead of bfloat16, but I googled and it looks like bfloat16 has been supported since the MI100. Perhaps the port isn't using bfloat16 yet?

  • @WhhhhhhjuuuuuH
    @WhhhhhhjuuuuuH 11 months ago +3

    This is really interesting. I want to know how a 4090 vs a 7900 XTX compares for these workflows. I know both are consumer products, but I feel like at the top end the line is blurred.

  • @dearheart2
    @dearheart2 11 months ago

    I am in AL and never have access to the newest HW. Damn ...

  • @WiihawkPL
    @WiihawkPL 10 months ago

    Working in OpenGL for a long time, I've come to sum it up as Nvidia playing it fast and loose and AMD being more accurate. And then there's Mesa, which is as close to a reference implementation as you'll get.

  • @stevenwest1494
    @stevenwest1494 11 months ago +1

    Please, please, Wendell, use your mighty powers and shake down the answers from above about when ROCm Windows support is coming. I mean, it'll actually bring more value to AMD's so-far-lackluster RDNA 3!

  • @SamGib
    @SamGib 11 months ago +1

    Unless Google sells TPUs for enterprises to host themselves, I don't think there will be any large-scale adoption for use in consumer products. See, OpenAI trained their model on GPUs; it's best to assume that's Nvidia hardware.

  • @paulwais9219
    @paulwais9219 11 months ago +1

    The demo is for inference, but training is the key advantage for Nvidia. They need to get compute cards out at gamer-card scale in order for that software support to level out; that's why Ponte Vecchio and TPUs are DOA as consumer products.
    But let's suppose AMD does catch up on the desktop. For mobile, Apple, Google, and Samsung own their own stacks; for robotics, Nvidia already has Jetson. The market beyond the desktop would need to be big for AMD to really be able to invest and nail AI.

  • @SirMo
    @SirMo 11 months ago +1

    Open Source > Proprietary Vendor Lock-ins

  • @DSDSDS1235
    @DSDSDS1235 11 months ago +1

    To be honest, you suggested that ROCm went from "can't train shit" to "can't train shit", and training is what Nvidia specializes in. There are more inference startups dying each day than MI200s and MI300s combined shipped that day, and every vendor is coming up with their own inference chip. Why would AWS offer the MI200 or MI300 when they can offer their own Inf1 and abstract any software difference under the ML frameworks? And if they do, why would anyone use that instead of Inf1, or better yet, build their own?

  • @WMRamadan
    @WMRamadan 11 months ago +1

    Does ROCm work with Radeon 7900 series cards now?

    • @leucome
      @leucome 11 months ago

      Yes... I use a 7900 XT + ROCm for generating images with A1111.

  • @mr.selfimprovement3241
    @mr.selfimprovement3241 11 months ago +1

    ...I will never look at Danny DeVito the same again. 😱😳😂

  • @sailorbob74133
    @sailorbob74133 11 months ago

    What'll be interesting will be the MI300C, which will be all CPU chiplets, plus Xilinx AI chiplets in Turin-AI... MLID has a video about it. A dual-socket version could have more TOPs than an H100.

    • @samlebon9884
      @samlebon9884 11 months ago

      I imagined AMD would develop that kind of chip. I even named it MI300AI.
      Could you provide a link to the MI300C?

    • @sailorbob74133
      @sailorbob74133 11 months ago

      @@samlebon9884 There's a very reliable rumor channel I've tracked for a few years called Moore's Law is Dead, which spoke about the MI300C chip (all CPU chiplets with HBM3) and a separate AMD project called Turin-AI, a mix of Zen 5 chiplets and Xilinx AI chiplets on a single package, which in a 2P config would be about as powerful as an H100.

  • @eddietoro2682
    @eddietoro2682 11 months ago +1

    Came here for the tech, stayed for the Danny DeVito AI memes

  • @sinom
    @sinom 11 months ago +2

    Nvidia going the inaccurate-but-faster route has always been a problem for AMD. Because Nvidia is the market leader, most software actually expects the inaccuracies of Nvidia's implementations of standards, which leads to the software not working on the technically more accurate implementations from AMD (or even Intel). So to the consumer, that means software doesn't work properly on AMD and at the same time runs slower.
    Before the recent rewrite, this was also a problem for OpenGL, some versions of DX, etc.

  • @shrek22
    @shrek22 9 months ago

    Will the W7900 with 48GB compare to the MI210?

  • @davtech
    @davtech 6 months ago

    We need an update for ROCm 6.0 and RDNA3

  • @Zoragna
    @Zoragna 11 months ago +1

    Non-"PhD students at Oak Ridge" I love that

  • @NoidoDev
    @NoidoDev 11 months ago +1

    Let's hope this works for the consumer GPUs as well. The MI210 is 20k on eBay, the A100 6k. Hmm. AMD should make a rather slow GPU with a lot of VRAM for inference, below 1k. Allegedly VRAM prices are currently very low.

    • @gatocochino5594
      @gatocochino5594 11 months ago +3

      The price difference is because the MI210 is relatively new, released early last year. The A100 is 3 years old now, and it has also been deployed a lot more; every cloud provider offers some A100 server instance. Meanwhile, the only AMD GPUs I can find are Radeon gaming GPUs. The V620 is just a Navi 21 chip with 32GB of VRAM.

    • @NoidoDev
      @NoidoDev 11 months ago +1

      @@gatocochino5594 Thanks, but I wouldn't pay more just because it's rarer, when it doesn't even seem to use less energy.

  • @Artificial-Insanity
    @Artificial-Insanity 10 months ago +1

    The differences in the images stem from you using an ancestral sampler, not from the GPU you're using.

  • @MrBillythefisherman
    @MrBillythefisherman 11 months ago +1

    Where is the Microsoft DirectX-style layer that sits on top of the GPUs and makes ML vendor-agnostic (even if it makes it OS-dependent)? If you don't like the OS-specific DirectX API, swap in the Vulkan API. I've heard of DirectCompute and OpenCL, but they don't seem to have gained traction; why? Also, why is ROCm needed when you have those APIs? What is it that makes CUDA win against all of the above?

  • @zen.mn.
    @zen.mn. 11 months ago +1

    "I can't believe it's not CUDA" dead

  • @zherkohler4188
    @zherkohler4188 11 months ago +1

    Are you sure that the visual differences are because of the different hardware? Is xformers disabled? It should be disabled for a test like this; I think it would explain the visual differences.

  • @sayemprodhanananta144
    @sayemprodhanananta144 11 months ago

    Training performance...?

  • @dewijones92
    @dewijones92 11 months ago

    Love it.
    More AI videos please :)

  • @grimtagnbag
    @grimtagnbag 11 months ago

    Run a self-hosted GPT instance