Extrae.jl: Advance profiling of Julia code in HPC clusters | Giordano, Sánchez Ramírez, Garcia Saez

  • Published 18 Sep 2024
  • Extrae.jl: Advance profiling of Julia code in HPC clusters by Mosè Giordano, Sergio Sánchez Ramírez, Artur Garcia Saez
    PreTalx: pretalx.com/ju...
    GitHub: github.com/bsc...
    Julia has revolutionized the development of high-performance applications. Yet its native profiling capabilities are limited to call-stack sampling (i.e. periodic statistical sampling of the process call stack). Although useful for identifying bottlenecks, call-stack sampling does not explain the source of performance degradation. Without querying hardware counters, it is impossible to know whether vector units are fully used or whether the memory access pattern is causing many cache misses.
    In this work, we present Julia bindings to `Extrae`, the tracing and sampling profiler developed at BSC.
    It lets you:
    - Annotate user regions
    - Sample hardware counters
    - Inspect the callstack inside C libraries
    - Mark inter-node, inter-process, inter-thread communication
    - Intercept MPI, CUDA and OpenMP calls
    - Emit custom user events
    We will showcase the performance evaluation of some scientific applications written in Julia on x86_64 and AArch64 architectures. Some of these architectures feature interesting capabilities such as a scalable vector ISA (i.e. SVE) or unified memory between CPU and GPU (e.g. NVIDIA Grace Hopper). `Extrae` proves vital for understanding the applications' performance behaviour and for later optimizing it.
