Tim Besard - GPU Programming in Julia: What, Why and How?

  • Published 16 Jun 2024
  • This talk will introduce the audience to GPU programming in Julia. It will explain why GPUs can be useful for scientific computing and how Julia makes it really easy to use them. The talk is aimed at people who are familiar with Julia and want to learn how to use GPUs in their Julia code.
    Resources
    AMDGPU.jl package repository: github.com/JuliaGPU/AMDGPU.jl
    CUDA.jl package repository: github.com/JuliaGPU/CUDA.jl
    Metal.jl package repository: github.com/JuliaGPU/Metal.jl
    oneAPI.jl package repository: github.com/JuliaGPU/oneAPI.jl
    Contents
    00:00 Introduction
    00:31 Back to the basics: What are GPUs?
    01:26 Why should you use GPUs?
    02:01 All toolkits provided by vendors use low-level languages, so it's time to switch to Julia
    02:20 We now have Julia packages for writing GPU code for all major vendors
    02:48 Founding principles of the JuliaGPU ecosystem
    03:23 Principle 1: User-friendliness
    04:54 Principle 2: Multiple programming interfaces
    05:24 The main interface for programming the GPU: GPU arrays
    06:43 The main power of Julia comes from higher-order abstractions, and this is also true on GPUs
    07:47 Array programming is powerful (see the sketch below)
    08:23 Kernel programming gives us performance and flexibility
    09:30 We don't want to put too many abstractions into kernel code; here is why
    10:04 We want to keep consistency across the Julia GPU ecosystem
    10:47 Kernel programming features that we support
    11:24 Support for more advanced features
    11:37 What is the JIT doing behind the scenes?
    12:37 Benchmarking and profiling
    12:51 How to benchmark your GPU code correctly
    13:46 You can't profile your GPU code using standard methods; you must use vendor-specific tools
    14:24 How do we ACTUALLY use all this?
    15:15 We don't need to use `CUDA.@sync`; here is why
    15:32 We disable scalar iteration
    16:09 Optimizing array operations for the GPU
    17:13 Pro tip: Write generic array code!
    18:21 Contrived example of using generic code
    19:05 Let's write a kernel
    19:36 Writing fast GPU code isn't trivial
    21:02 Let's write a PORTABLE kernel
    21:36 Pros and cons of kernel abstractions
    22:07 Kernel abstractions and high-performance code
    22:35 Conclusion
    24:07 Q&A: Did you implement a dummy GPU type that actually runs on the GPU?
    25:51 Q&A: What about support for vendor-agnostic backends like Vulkan?
    27:12 Q&A: What is the status of projects like OpenCL?
    28:45 Q&A: How easy is it to use multiple GPUs at once?
    29:45 Closing applause
    Want to help add timestamps to our UA-cam videos to help with discoverability? Find out more here: github.com/JuliaCommunity/You...
    Interested in improving the auto generated captions? Get involved here: github.com/JuliaCommunity/You...
  • Science & Technology
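
    The contents above revolve around the two interfaces the talk covers: array programming and kernel programming. A minimal sketch of both, assuming CUDA.jl (AMDGPU.jl, Metal.jl and oneAPI.jl expose analogous array types and kernel-launch macros):

        using CUDA

        # Array programming: broadcasting and higher-order functions such as
        # mapreduce are compiled into GPU kernels automatically.
        a = CUDA.rand(Float32, 1024)
        b = CUDA.rand(Float32, 1024)
        c = a .+ 2f0 .* b

        # Generic array code: the same function works on CPU Arrays and CuArrays.
        rmse(x, y) = sqrt(sum(abs2, x .- y) / length(x))
        rmse(a, b)

        # Kernel programming: a hand-written kernel, launched with @cuda.
        function vadd!(c, a, b)
            i = threadIdx().x + (blockIdx().x - 1) * blockDim().x
            if i <= length(c)
                @inbounds c[i] = a[i] + b[i]
            end
            return
        end
        @cuda threads=256 blocks=cld(length(c), 256) vadd!(c, a, b)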

COMMENTS • 11

  • @kamilziemian995
    @kamilziemian995 4 months ago +4

    Tim Besard is a magician who just happens to work on GPUs.

  • @kamilziemian995
    @kamilziemian995 5 months ago +1

    What a fantastic presentation. 😲

  • @conradwiebe7919
    @conradwiebe7919 5 months ago +1

    The slide at 14:30 goes against what was said on a previous slide (12:50): it uses `@btime` alone to time the GPU operation, even though we were told the GPU runs asynchronously. Should this be `@btime CUDA.@sync` (see the sketch after this thread)? I don't do GPU programming, so I have no familiarity here.

    • @conradwiebe7919
      @conradwiebe7919 5 months ago +2

      ope, he answered the question a little later on
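
    For reference, the pattern the commenter suggests, as a rough sketch assuming CUDA.jl and BenchmarkTools.jl:

        using CUDA, BenchmarkTools

        x = CUDA.rand(Float32, 10^7)

        # GPU operations are asynchronous: without synchronization, @btime
        # mostly measures the time to launch the kernel, not to finish it.
        @btime $x .* 2f0

        # CUDA.@sync waits for the GPU, so the whole computation is timed.
        @btime CUDA.@sync $x .* 2f0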

  • @mattettus1934
    @mattettus1934 5 months ago +1

    I tried the RMSE example on a oneAPI GPU (Iris Xe) and it is 2x SLOWER than the CPU. Am I doing something wrong or is the Xe really that bad?

  • @Ptr-NG
    @Ptr-NG 5 months ago

    I'd like to learn Julia for finite element method analysis... which package(s) should I focus on? Thank you

    • @chrisrackauckasofficial
      @chrisrackauckasofficial 5 months ago +6

      I'd recommend Ferrite.jl or Gridap.jl

    • @Ptr-NG
      @Ptr-NG 5 months ago

      I found Gridap.jl... Thank you, @chrisrackauckasofficial

  • @conradwiebe7919
    @conradwiebe7919 5 months ago +6

    I feel like CUDA.allowscalar should be false by default

    • @conradwiebe7919
      @conradwiebe7919 5 months ago +1

      I don't know how common it is to have a workflow that requires falling back to a loop on the GPU, but it seems like that should be a hard error by default, while allowing a user to opt in to inefficient usage of the GPU if wanted. I think most people would want the feedback that something is sub-optimal. Maybe instead of error/no error, it could be error vs. printing a warning message?

    • @alexgamingtv7118
      @alexgamingtv7118 5 months ago

      The important section from the docs says: "Scalar indexing is only allowed in an interactive session, e.g. the REPL, because it is convenient when porting CPU code to the GPU. If you want to disallow scalar indexing, e.g. to verify that your application executes correctly on the GPU, call the allowscalar function…"
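
    A rough sketch of the behaviour discussed in this thread, assuming CUDA.jl:

        using CUDA

        x = CUDA.rand(Float32, 1024)

        # Disallow scalar indexing: element-wise access from the CPU now
        # throws an error instead of silently doing one transfer per element.
        CUDA.allowscalar(false)

        # x[1]        # would now throw: scalar indexing is disallowed
        sum(x)        # fine: runs as a GPU reduction

        # Opt back in for a limited expression, e.g. while porting CPU code.
        CUDA.@allowscalar x[1]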