OpenMP 6.0 Part 1: New Host-side Features and Enhancements

This POP webinar opens with a discussion of OpenMP's development process, highlighting the strategic roadmap and the continuum of control that guide the evolution of the API.
Attendees then learn about the improved base language support for C23, C++23, and Fortran. Next come the key updates in tasking, including free-agent threads, the taskgraph construct, and the concept of iteration within tasking contexts. The webinar also covers the loop transformation capabilities, with a general overview and insights into user-defined inductions. Finally, it briefly touches on other noteworthy features, such as threadset and transparent tasks.
The second part of the webinar covers the new OpenMP 6.0 device offloading features.
About the Presenters
Dr. Christian Terboven leads the HPC group at RWTH Aachen University as a senior scientist. His research interests center around Parallel Programming and related Software Engineering aspects. He leads several research projects in the area of programming models and the improvement of productivity and efficiency of modern HPC systems. In the context of OpenMP, Christian is the co-lead of the Affinity Subcommittee within the OpenMP Language Committee, and he is co-author of the book "Using OpenMP - The Next Step", published by MIT Press. Christian has been involved in POP since the beginning and currently leads the work package Co-design.
Dr. Michael Klemm is a Principal Member of Technical Staff in the Compilers, Languages, Runtimes & Tools team of the Machine Learning & Software Engineering group at AMD. He is part of the OpenMP compiler team, focusing on application and kernel performance for AMD Instinct accelerators for High Performance and Throughput Computing. He holds an M.Sc. in Computer Science and a Doctor of Engineering degree (Dr.-Ing.) in Computer Science from the Friedrich-Alexander-University Erlangen-Nuremberg, Germany. Michael's research focus is on compilers and runtime optimizations for distributed systems. His areas of interest include compiler construction, design of programming languages, parallel programming, and performance analysis and tuning. Michael is the Chief Executive Officer of the OpenMP Architecture Review Board.
Slides: pop-coe.eu/sites/default/files/pop_files/pop-webinar-openmp6-part1.pdf
POP CoE: pop-coe.eu

Videos

54:53

ChEESE and POP: a Story of Success and Fruitful Interaction

56 views, 2 months ago

Among the main objectives of the ChEESE-2P project is improving the performance of the CoE's 11 flagship codes towards Exascale. These codes are integrated into a Scientific and Computational Grand Challenge of significant impact in the Solid Earth field. Work package 2 (WP2) of the ChEESE CoE is the part of the project where this work on the flagship codes is designed, executed, ...

46:26

Assessing CPU Code Quality

85 views, 2 months ago

Code quality is essential for high performance: for various reasons (including poor performance models, lack of adequate transformations, and limited analysis capabilities), compilers often produce suboptimal code, which can significantly hurt performance. MAQAO is a performance analysis framework offering features designed for assessing CPU (x86 and ARM) code quality, detecting p...

1:30

How to Use POP Services

14 views, 2 months ago

Simple cartoon explaining how to use POP services (pop-coe.eu/services)

49:44

The CARM Tool: Cache-aware Roofline Model for HPC

155 views, 4 months ago

In recent years, HPC systems have become increasingly complex and heterogeneous, making application development and optimisation challenging. In this respect, intuitive performance models like the Cache-aware Roofline Model (CARM) offer effective guidance by providing insights into the bottlenecks that limit an application’s ability to reach the system’s maximum performance. The current landscape ...

44:13

Performance Analysis of OpenMP Target Offloading in Score-P

167 views, 7 months ago

With the increasing demand for compute performance in HPC systems, accelerators are becoming the main focus of application development. Many of the Top500 HPC systems now include accelerators, with the top 3 systems alone having accelerators from three different vendors. This diversity requires application developers to choose portable frameworks to support all at the same time, as developing applicat...

1:09:58

Asynchronous GPU Programming in OpenMP

557 views, 8 months ago

The OpenMP 4.0 standard introduced support for accelerator and GPU programming and there are many introductory tutorials available. In this webinar, we will present OpenMP's support for asynchronous kernel offloading and explain how to use it. In addition, we will show how OpenMP supports the combination with GPU-native programming models. About the Presenters Dr. Christian Terboven leads the H...

47:11

Six and a half years of POP CoE: What Remains?

82 views, 2 years ago

The EU Performance Optimisation and Productivity Centre of Excellence in HPC (POP CoE) operated from October 2015 to May 2022. In its lifetime, it provided over 400 Performance Assessment or Proof-of-Concept services free of charge to many academic and research organisations, SMEs, ISVs, or companies in Europe. The services were based on the successful POP Performance Metrics and Methodology de...

48:01

Resources for Co Design

103 views, 3 years ago

Resources for co-design (co-design.pop-coe.eu) is a section within the POP website which gathers together a set of typical behavioural patterns seen in HPC codes, potentially resulting in some kind of performance degradation, that POP has identified in our analyses of user applications. For each of these patterns, the site links to the corresponding best-practice(s) that address their performan...

0:33

The POP Superheroes

456 views, 3 years ago

Learn about the POP service and see if you can spot the five famous European landmarks that our POP superheroes fly over (clue: there’s one from each country where one or more POP partners are based). Find out more at the POP website: pop-coe.eu/

53:37

POP: The SME Perspective

187 views, 3 years ago

POP (Performance Optimisation and Productivity) is an EU Centre of Excellence focussed on improving the performance of parallel codes. Our analysts profile the performance of such codes and identify ways in which they can be improved. In many cases, we write codes to demonstrate those improvements in performance. These services are free throughout the EU and UK. While we welcome many customers ...

48:14

Introduction to Paraver

884 views, 3 years ago

Paraver is a browser to process and visualize traces capturing the behaviour of parallel programs. Paraver is at the core of the BSC tools framework, and allows very detailed qualitative and quantitative analysis of traces. It has a flexible programmable interface that lets the analyst tailor it and squeeze the information within the data. This 30-minute webinar describes the fundamentals behin...

21:57

Module 4.2a: Introduction to Paraver - Timelines

242 views, 3 years ago

What you will learn: How to use the timelines of the trace visualizer Paraver
Prerequisites: Paraver trace visualizer installed
Speaker: Jesus Labarta (BSC)
Module web page: pop-coe.eu/further-information/online-training/using-pop-tools-paraver

27:59

Module 4.2b: Introduction to Paraver - Tables

146 views, 3 years ago

What you will learn: How to use the tables of the trace visualizer Paraver
Prerequisites: Paraver trace visualizer installed
Speaker: Jesus Labarta (BSC)
Module web page: pop-coe.eu/further-information/online-training/using-pop-tools-paraver

23:38

Module 4.2c: Introduction to Paraver - Semantic Functions

150 views, 3 years ago

What you will learn: How to use the semantic functions of the trace visualizer Paraver
Prerequisites: Paraver trace visualizer installed
Speaker: Jesus Labarta (BSC)
Module web page: pop-coe.eu/further-information/online-training/using-pop-tools-paraver

15:43

Module 4.2: Introduction to Paraver

292 views, 3 years ago

11:57

Module 8: Computing the POP Metrics with Score-P, Scalasca, Cube

303 views, 3 years ago

48:56

The Scalasca Scalable Parallel Performance Analysis Toolset - for POP assessments and beyond

558 views, 3 years ago

59:40

Debugging Tools for Correctness Analysis of MPI and OpenMP Applications

300 views, 3 years ago

19:04

Using Paraver: On Sampling In Traces

183 views, 3 years ago

20:21

Using Paraver: Life without Noise

121 views, 3 years ago

25:32

Using Paraver: See the Noise

126 views, 3 years ago

19:10

Using Paraver: Identifying Structure

369 views, 3 years ago

12:26

POPCast #4: Why Does Code Matter?

111 views, 3 years ago

51:21

Identifying performance bottlenecks in hybrid MPI + OpenMP software

561 views, 3 years ago

42:09

PyPOP An interactive tool for performance assessment

2.4K views, 4 years ago

54:53

Profiling GPU Applications with Nsight Systems

31K views, 4 years ago

56:44

Inclusive Leadership and Inspiring Action and Innovation

86 views, 4 years ago

20:36

Module 7: Using POP Tools: Cube

385 views, 4 years ago

16:02

Module 6: Using POP Tools: Score-P and Scalasca

1.3K views, 4 years ago

Comments

@OpenMPARB 2 months ago
Thanks for the webinar. For your information, the "illustration of tasking" section of the video starting at 5:40 inspired us to make a video clip on tasking.
@NinjaAdorable 1 year ago
How do you run this on WSL?
@tha_ba2s 1 year ago
audio quality is bad. cant see what is going on; too zoomed out.
@hestonc4553 1 year ago
Hello, thank you for the helpful video. Is it possible to get the improved code for Slide 24 that has flattened loops?
@krinodagamer6313 1 year ago
I have an 4090 I'm tryna learn Nsight very useful tool
@kentgauen 3 years ago
This tutorial is very useful for my research project. Thank you Dr. Kabiri Chimeh!
@VladimirStegailov 4 years ago
The initial execution time was 3.52 sec, and after instrumentation you got 3.24 sec. So there is no overhead but some acceleration of the execution! What could be the reason for such a behaviour?
@ИльяТимохин-н7е 3 years ago
I think that was just one of possible values (cause we didn't see more values from MPI execution, so may be the "mean value" for such task is 3.24)
@zyten 2 years ago
There's a simple explanation for it. When looking at the source code, the examples use the wall clock time to determine how long the execution took. This time can be influenced by the operating system or other running programs. Therefore, you normally take multiple measurements and use those to determine your runtime of the application instead of just a single measurement.
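The advice in the reply above can be sketched in a few lines. This is a minimal illustration in Python, not the code shown in the video: `measure_runs`, `summarize`, and `example_workload` are hypothetical names, and the workload is only a stand-in for the real application run. It repeats the measurement and reports the minimum, median, and spread instead of trusting a single wall-clock reading.

```python
import statistics
import time


def measure_runs(workload, repeats=5):
    """Run `workload` several times; return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce repeated measurements to min, median, and sample standard deviation."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        # stdev needs at least two samples; report 0.0 for a single run.
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
    }


def example_workload():
    # Hypothetical stand-in for the application being timed.
    sum(i * i for i in range(200_000))


if __name__ == "__main__":
    print(summarize(measure_runs(example_workload, repeats=5)))
```

With several samples per configuration, a difference such as 3.52 s versus 3.24 s can be compared against the run-to-run spread before it is read as overhead or speedup.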
@kesav1985 4 years ago
Hi, very nice introduction to MAQAO. I am interested in using this tool in my research. I download the Linux binaries. However, I have not been able to figure out how to run the simulation for collecting the profiling the info and visualise it. A video tutorial demonstrating the usage with one example would be greatly appreciated. Thanks.
@riper1303 6 years ago
Great video thanks :)
@giorgosstamatakis7144 6 years ago
Nice video, are the slides available?
@POPHPC 6 years ago
The slides link has been added to the video description. They are available here: pop-coe.eu/sites/default/files/pop_files/pop-webinar-openmptasking.pdf

POP HPC
