CUDA Crash Course (v2): Unified Memory

  • Published Oct 5, 2024

COMMENTS • 20

  • @rbaleksandar 1 year ago +1

    Lovely tutorial. Far superior to sifting through all the spaghetti that is the CUDA documentation.

  • @mario7501 4 years ago +5

    These are some high-quality tutorials! I really hope you'll get more traffic to your channel soon!

  • @stevenh7729 2 years ago

    This series of tutorials is very helpful for me, a novice who is just learning CUDA! Keep it up!

  • @lucasdiazmiguez8680 4 years ago +4

    Hi, I'm from Argentina. I would love to see the next episode of this series!
    I'm new to the CUDA architecture, so I'm getting started with these videos and the book CUDA by Example.
    Thanks a lot for these videos!
    Any book you would recommend for me?
    Regards!

  • @TheKenigham 2 years ago +1

    Thanks for the CUDA videos! They are very helpful!

  • @Dr.tech- 3 years ago +1

    Thank you. This is a high-quality tutorial. I have checked out your channel and it is great. Keep it up!

  • @closerlookcrime 1 year ago +1

    Great video. I learned a lot. Thank you, sir.

  • @quantphobia2944 1 year ago

    Hi, Nick, amazing tutorial! I was looking for something like this that gets down to coding right away.
    Note that nvprof is no longer supported on devices with compute capability 8.0 or higher. To profile on those, one can use, for example: "nsys profile --stats=true -t cuda ./vector_add_unified_memory.out".
    Also, will there be a version 2 of your previous sum reduction videos? Thanks a ton for sharing!
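
For anyone who wants a binary to feed that nsys command, here is a minimal sketch of a unified-memory vector add along the lines of the video's example (illustrative only; the names and sizes are not the video's exact code):

    // vector_add_unified_memory.cu - minimal unified-memory vector add sketch
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void vectorAdd(const int *a, const int *b, int *c, int n) {
      int tid = blockIdx.x * blockDim.x + threadIdx.x;
      if (tid < n) c[tid] = a[tid] + b[tid];
    }

    int main() {
      const int N = 1 << 16;
      const size_t bytes = N * sizeof(int);

      // A single allocation visible to both the host and the device
      int *a, *b, *c;
      cudaMallocManaged(&a, bytes);
      cudaMallocManaged(&b, bytes);
      cudaMallocManaged(&c, bytes);

      for (int i = 0; i < N; i++) { a[i] = 1; b[i] = 2; }

      int threads = 256;
      int blocks = (N + threads - 1) / threads;
      vectorAdd<<<blocks, threads>>>(a, b, c, N);

      // Wait for the kernel to finish before reading results on the host
      cudaDeviceSynchronize();
      printf("c[0] = %d\n", c[0]);

      cudaFree(a); cudaFree(b); cudaFree(c);
      return 0;
    }

Compiled with "nvcc vector_add_unified_memory.cu -o vector_add_unified_memory.out", the nsys invocation above should then report the CUDA API calls, kernel times, and unified-memory transfers.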

  • @penguimTwo 4 years ago +1

    Thank you!

  • @anonymoussloth6687 10 months ago

    Please make more CUDA videos!

  • @lolololo359 4 years ago +1

    Hi, thanks so much for all the videos; they have been very helpful. If you don't mind, I have a question: for some reason my device shows significantly more transfer counts than yours, even when using the prefetch code. Do you know what might be the issue?
    ==25060== Unified Memory profiling result:
    Device "GeForce RTX 2080 Ti (0)"
    Count  Avg Size  Min Size  Max Size  Total Size  Total Time  Name
       16  32.000KB  32.000KB  32.000KB  512.0000KB  292.5000us  Host To Device
       36  35.555KB  32.000KB  128.00KB  1.250000MB  6.072400ms  Device To Host

    • @CoffeeBeforeArch 4 years ago +1

      Different GPUs and driver versions will likely behave differently. I had to play around with my hints to get the results in the video, so experimenting with the hints on your system is probably your best bet (see the sketch after this thread). Hope this helps!

    • @lolololo359 4 years ago +2

      @@CoffeeBeforeArch Hi, thanks for the reply. I believe it's because I'm on Windows, and the prefetch commands don't work there, hahaha

    • @CoffeeBeforeArch 4 years ago +1

      Ah, makes sense! Haha
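
The "hints" discussed in this thread are presumably the cudaMemAdvise/cudaMemPrefetchAsync calls. A sketch, reusing the pointer names from the vector-add sketch above (which hints actually help varies by GPU and driver, and prefetching is not supported on Windows):

    int device;
    cudaGetDevice(&device);

    // Hint that the inputs should preferentially live on the GPU...
    cudaMemAdvise(a, bytes, cudaMemAdviseSetPreferredLocation, device);
    cudaMemAdvise(b, bytes, cudaMemAdviseSetPreferredLocation, device);

    // ...and migrate them up front instead of page-faulting in the kernel
    cudaMemPrefetchAsync(a, bytes, device);
    cudaMemPrefetchAsync(b, bytes, device);

    vectorAdd<<<blocks, threads>>>(a, b, c, N);
    cudaDeviceSynchronize();

    // Bring the result back to the host in bulk rather than page by page
    cudaMemPrefetchAsync(c, bytes, cudaCpuDeviceId);
    cudaDeviceSynchronize();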

  • @abdelhaksaouli8802 3 years ago

    Quick question, sir: if the size of the data is bigger than the GPU's maximum memory, what will happen? For instance, if you have 3 GB of data allocated on the host, how does unified memory deal with it? Will it send it batch by batch, or what?

    • @CoffeeBeforeArch 3 years ago +1

      Good question - it somewhat depends on the GPU architecture you're working with. On some, you'll be limited by the max capacity of the GPU (i.e., you can't reserve a unified memory region larger than what is available on the GPU). Newer GPUs (Pascal and later, and only on Linux) support memory oversubscription, where you can reserve more than what is available on the GPU, and data will be paged back and forth as needed (see the sketch after this thread).

    • @abdelhaksaouli8802 3 years ago

      @@CoffeeBeforeArch Thank you for your quick reply, sir. BTW, good videos - hope to see more!
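
A minimal sketch of the oversubscription case described above, assuming a Pascal-or-later GPU on Linux with enough system RAM to back the allocation (illustrative only):

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
      size_t free_bytes, total_bytes;
      cudaMemGetInfo(&free_bytes, &total_bytes);

      // Ask for roughly twice what the GPU can physically hold at once
      size_t n = (total_bytes / sizeof(float)) * 2;
      float *data;
      cudaError_t err = cudaMallocManaged(&data, n * sizeof(float));
      if (err != cudaSuccess) {
        // Older architectures (or Windows) may simply refuse the allocation
        printf("allocation failed: %s\n", cudaGetErrorString(err));
        return 1;
      }

      // Touch one element per 4 KB page; the driver pages the range
      // through system memory on demand
      for (size_t i = 0; i < n; i += 4096 / sizeof(float)) data[i] = 1.0f;

      cudaFree(data);
      return 0;
    }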