Fastware - perf - How to analyse the performance of my program!

  • Published 4 Dec 2024

COMMENTS • 22

  • @ericli4711 1 year ago +5

    At 10:00, I think it's 97.62% for the return because you were sampling "stalled-cycles-frontend", meaning this instruction was stalled the vast majority of the time, waiting for the previous instruction to finish (pipeline can't start executing earlier due to data dependency).

    • @Fastware 1 year ago +2

      You might be right. We are probably stalling here in the instruction decode/dispatch stage.

  • @graham12345dd 1 year ago +1

    Just EXCELLENT! Thank you

  • @danielfoehr9204 10 months ago

    Thank you, just what I needed

    • @Fastware 10 months ago

      Glad that you found it useful.

  • @Youda00008 2 years ago +2

    I love that poke at OOP in the end :D

  • @sijiewu 1 year ago

    Concise and succinct video!

  • @michaeldunlavey6015 2 years ago

    I've also done a great deal of performance tuning. The method I rely on requires only a debugger that can be manually interrupted and the call stack displayed. Basically, real software (not just academic toy programs) often has several performance wasters, each taking a percentage of time, like 12.5%, 25%, and 50%. The chance that an interrupt happens while a waste is happening is proportional to its size, and it can readily be seen on the stack. Almost always it is a function call that doesn't really need to be done, and half a dozen halts will spot it. If you find the big one and fix it, speed is doubled, and the remaining time-wasters are twice as big, so they are easier to find the next time you do it. Find all three, and you're eight times faster! Once you do all that, then you can get down to worrying about cache-misses and pipeline stalls.

    • @Fastware 2 years ago +2

      Absolutely: first solve the efficiency aspect of the program. Do not call what does not need to be called, and do not read data that you already have. The performance aspect is about doing the same amount of work in less time. At that point, cache misses, data alignment, and false sharing become the dominating factors.
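The arithmetic in this exchange can be checked with a short sketch. It assumes the three wasters from the comment (12.5%, 25%, and 50% of the original run time) are independent and can be removed completely; the labels and function names are invented for illustration. It also uses the standard binomial result for how likely a waster is to appear on the stack during a handful of random pauses.

```python
from fractions import Fraction
from math import comb

# Fractions of the ORIGINAL run time taken by each waster (figures from the comment).
# The labels "small", "medium", "big" are hypothetical names for this sketch.
WASTERS = {
    "small": Fraction(1, 8),   # 12.5%
    "medium": Fraction(1, 4),  # 25%
    "big": Fraction(1, 2),     # 50%
}


def speedup(removed):
    """Speedup over the original program once the named wasters are removed."""
    remaining = 1 - sum(WASTERS[name] for name in removed)
    return 1 / remaining


def share_after(name, removed):
    """Fraction of the shortened run time that a surviving waster now takes."""
    remaining = 1 - sum(WASTERS[r] for r in removed)
    return WASTERS[name] / remaining


def chance_seen(p, pauses, at_least=1):
    """Probability that a waster active a fraction p of the time is on the
    stack in at least `at_least` of `pauses` independent random pauses."""
    fewer = sum(
        comb(pauses, k) * p**k * (1 - p) ** (pauses - k)
        for k in range(at_least)
    )
    return 1 - fewer


if __name__ == "__main__":
    print(speedup(["big"]))                        # 2: fixing the 50% waster doubles speed
    print(share_after("medium", ["big"]))          # 1/2: the 25% waster is now twice as big
    print(speedup(["small", "medium", "big"]))     # 8: all three fixed
    print(float(chance_seen(Fraction(1, 2), 6)))   # 0.984375
```

With six pauses, a 50% waster is seen at least once about 98% of the time, which is consistent with the claim that half a dozen halts are usually enough to spot the biggest problem.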

  • @amiralimov3074 3 months ago

    GOATED

  • @coldblade666 3 years ago +4

    Don't know how you don't have many views or any comments, but this was extremely useful! I finally stumbled across perf after looking for better tools than Valgrind to use to profile performance of processes.
    On my system, I use VMs, so I had to make sure the Virtual Performance Counters were enabled for the VM in ESXi to even allow me to use perf.
    I really enjoyed the explanation you gave for some of the statistics that are output as well. Do you know of any good resources for understanding more about branches and other stats, or determining which stats to look at over others, or indicators to look for?

    • @Fastware 3 years ago +3

      Hey, thank you for your kind words. Please share it if you find it useful; it helps a lot.
      An excellent place to start is perf.wiki.kernel.org/index.php/Tutorial. To be perfectly honest with you, I have learned perf by working with it. The perf stat and perf record/report are the two most valuable tools I have used, and I highly recommend them.
      However, if you have a specific problem, please let me know, and I will try to help solve it.
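For readers who want to try the two tools named in this reply, the sketch below builds the usual command lines. It assumes `perf` is installed and on `PATH`; `./test` is only a placeholder binary, and the helper names are invented here. `perf stat -e`, `perf record -g -o`, and `perf report --stdio -i` are standard perf options, but which events are available depends on the CPU and kernel.

```python
import shutil
import subprocess


def perf_stat_cmd(program, events=("cycles", "instructions", "branches", "branch-misses")):
    """Command line that counts the given events while `program` runs."""
    return ["perf", "stat", "-e", ",".join(events), "--", *program]


def perf_record_cmd(program, output="perf.data"):
    """Command line that samples `program` with call graphs into `output`."""
    return ["perf", "record", "-g", "-o", output, "--", *program]


def perf_report_cmd(data="perf.data"):
    """Command line that prints a text report from a recorded profile."""
    return ["perf", "report", "--stdio", "-i", data]


def run_if_available(cmd):
    """Run `cmd` only when its executable is on PATH; return the exit code or None."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    target = ["./test"]  # placeholder program, replace with your own binary
    for cmd in (perf_stat_cmd(target), perf_record_cmd(target), perf_report_cmd()):
        # Only print the commands here; call run_if_available(cmd) to execute them.
        print(" ".join(cmd))
```

Printing the commands first makes it easy to paste them into a shell, which is how the tools are normally used.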

    • @coldblade666 3 years ago

      @Fastware Thanks! I've been reading through that page trying to soak up what I can! I will probably share your video with other engineers I work with. The software quality assurance department at my company is still finding its feet, and I'm trying to pull together tools that supplement the testing results we provide to our development team. Looking forward to seeing more of your videos. Earned a subscriber from me!

    • @ahmadalastal5303 4 months ago

      I was using WSL2 on Windows 11. perf caused me a lot of problems there, so I decided to switch completely to Linux, and here I am.

  • @SuperWhatusername 2 years ago

    Thank you

  • @SumriseHD 7 months ago

    You have very beautiful output. Why is mine so ugly?
    Some things are "not counted", all of the counters have weird names with a "u" at the end, and the time is not displayed as nicely.
    Performance counter stats for './test':
    17.46 msec  task-clock:u              # 0.957 CPUs utilized
             0  context-switches:u        # 0.000 /sec
             0  cpu-migrations:u          # 0.000 /sec
         2,013  page-faults:u             # 115.290 K/sec
    12,202,743  cpu_atom/cycles/u         # 0.699 GHz (42.78%)
                cpu_core/cycles/u         (0.00%)
    26,648,232  cpu_atom/instructions/u   # 2.18 insn per cycle (54.24%)
                cpu_core/instructions/u   (0.00%)
     1,985,735  cpu_atom/branches/u       # 113.729 M/sec (54.35%)
                cpu_core/branches/u       (0.00%)
         4,006  cpu_atom/branch-misses/u  # 0.20% of all branches (59.36%)
                cpu_core/branch-misses/u  (0.00%)
                TopdownL1 (cpu_atom)      # 15.5 % tma_bad_speculation
                                          # 40.9 % tma_retiring (65.80%)
                                          # 42.1 % tma_backend_bound
                                          # 42.1 % tma_backend_bound_aux
                                          # 1.5 % tma_frontend_bound (71.50%)
    18,888,656  L1-dcache-loads:u         # 1.082 G/sec (57.09%)
                L1-dcache-loads:u         (0.00%)
                L1-dcache-load-misses:u
                L1-dcache-load-misses:u   (0.00%)
           609  LLC-loads:u               # 34.879 K/sec (51.39%)
                LLC-loads:u               (0.00%)
             0  LLC-load-misses:u         (45.66%)
                LLC-load-misses:u         (0.00%)
    0.018244928 seconds time elapsed
    0.012015000 seconds user
    0.005997000 seconds sys

    • @Fastware 7 months ago +1

      Hi,
      Thanks for your comment. Counters show up as "not counted" or "not supported" either because you are running in a virtual machine or because the Linux kernel does not support the CPU you are running on. The kernel configuration might also be the problem. Try relaxing sysctl-explorer.net/kernel/perf_event_paranoid/ and sysctl-explorer.net/kernel/kptr_restrict/, or run as root. The '/u' or ':u' after a counter name indicates that it counts user-space events only.
      Let me know if you manage to get it working, or email me (the address is in the video description) and we can try to get it working together.
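The two settings mentioned in this reply can be inspected without root. The sketch below assumes a Linux system that exposes them under `/proc/sys/kernel/` and returns `None` anywhere else; the helper names are made up for this example. It also splits a counter name such as `task-clock:u` or `cpu_atom/cycles/u` into event and modifier, mirroring the explanation that a trailing `u` restricts counting to user space.

```python
from pathlib import Path


def read_sysctl(name, root="/proc/sys/kernel"):
    """Return the integer value of a kernel sysctl, or None if it cannot be read."""
    try:
        return int(Path(root, name).read_text().strip())
    except (OSError, ValueError):
        return None


def split_modifier(counter):
    """Split a perf counter name into (event, modifiers).

    Handles both spellings seen in the output above:
    'event:mods' and the PMU form 'pmu/event/mods'.
    """
    if "/" in counter:
        event, _, mods = counter.rpartition("/")
        return event + "/", mods
    event, _, mods = counter.partition(":")
    return event, mods


if __name__ == "__main__":
    # Higher perf_event_paranoid values mean tighter restrictions for
    # unprivileged users; kptr_restrict controls exposure of kernel addresses.
    for knob in ("perf_event_paranoid", "kptr_restrict"):
        print(knob, "=", read_sysctl(knob))

    for name in ("task-clock:u", "cpu_atom/cycles/u", "cycles"):
        event, mods = split_modifier(name)
        scope = "user space only" if mods == "u" else "no 'u'-only restriction"
        print(f"{name}: event={event} modifiers={mods!r} ({scope})")
```

Changing either setting requires root, for example through `sysctl -w`, so the sketch only reads the current values.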

  • @technicaluserco 1 year ago

    Mates, I found this jewel of a video,
    but I could not get past the very first step.
    When running perf, I got:
    /sbin/perf: line 6: /usr/libexec/perf.4.18.0-425.19.2.el8_7.x86_64: No such file or directory
    Uninstalling, installing, reinstalling: nothing helps :(

    • @Fastware 1 year ago

      Hey, which OS are you running? When you run 'which perf', what is the output?
      On some distros, perf needs a few extra packages to work correctly.
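The error message suggests that `/sbin/perf` is a small wrapper script that looks for a binary named after the running kernel release under `/usr/libexec`. That reading is an inference from the message itself and is not confirmed anywhere in the thread. Under that assumption, a sketch like the following gathers the facts asked for in the reply: where `perf` resolves on `PATH`, the running kernel release, and whether the per-kernel binary exists. The function name and the `libexec_dir` parameter are hypothetical.

```python
import os
import platform
import shutil


def perf_diagnostics(libexec_dir="/usr/libexec"):
    """Collect basic facts about the perf installation on this machine."""
    release = platform.release()  # same string that `uname -r` prints
    # Assumed naming scheme, taken from the path in the error message.
    expected = os.path.join(libexec_dir, f"perf.{release}")
    return {
        "which_perf": shutil.which("perf"),  # None when perf is not on PATH
        "kernel_release": release,
        "expected_binary": expected,
        "expected_binary_exists": os.path.exists(expected),
    }


if __name__ == "__main__":
    for key, value in perf_diagnostics().items():
        print(f"{key}: {value}")
```

If the expected binary is missing while `which_perf` points at a wrapper, one possible cause is that the installed perf files belong to a different kernel version than the one currently running. That remains a guess until the wrapper script itself is inspected.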