POP HPC
Germany
Joined 25 Jan 2018
Performance Optimization and Productivity (POP) is a Centre of Excellence funded by the EuroHPC JU to boost performance and productivity in HPC applications.
OpenMP 6.0 Part 1: New Host-side Features and Enhancements
This POP webinar starts with a discussion on OpenMP's development process, highlighting the strategic roadmap and the continuum of control that guides the evolution of this powerful API.
Attendees then learn about the improvements in base language support for C23, C++23, and Fortran. Key updates in tasking follow, including the introduction of free-agent threads, the taskgraph construct, and the concept of iteration within tasking contexts. Participants also learn about the loop transformation capabilities, with a general overview and insights into user-defined inductions. Finally, the webinar briefly touches on other noteworthy features, such as threadset and transparent tasks.
The second part of the webinar covers the new OpenMP 6.0 device offloading features.
About the Presenters
Dr. Christian Terboven leads the HPC group at RWTH Aachen University as a senior scientist. His research interests center around Parallel Programming and related Software Engineering aspects. He leads several research projects in the area of programming models and the improvement of productivity and efficiency of modern HPC systems. In the context of OpenMP, Christian is the co-lead of the Affinity Subcommittee within the OpenMP Language Committee, and he is co-author of the book "Using OpenMP - The Next Step", published by MIT Press. Christian has been involved in POP since the beginning and currently leads the work package Co-design.
Dr. Michael Klemm is a Principal Member of Technical Staff in the Compilers, Languages, Runtimes & Tools team of the Machine Learning & Software Engineering group at AMD. He is part of the OpenMP compiler team, focusing on application and kernel performance for AMD Instinct accelerators for High Performance and Throughput Computing. He holds an M.Sc. in Computer Science and a Doctor of Engineering degree (Dr.-Ing.) in Computer Science from the Friedrich-Alexander-University Erlangen-Nuremberg, Germany. Michael's research focus is on compilers and runtime optimizations for distributed systems. His areas of interest include compiler construction, design of programming languages, parallel programming, and performance analysis and tuning. Michael is the Chief Executive Officer of the OpenMP Architecture Review Board.
Slides: pop-coe.eu/sites/default/files/pop_files/pop-webinar-openmp6-part1.pdf
POP CoE: pop-coe.eu
Views: 133
Videos
ChEESE and POP: a Story of Success and Fruitful Interaction
56 views, 2 months ago
One of the main objectives of the ChEESE-2P project is to improve the performance of the 11 flagship codes of the CoE towards Exascale. These codes are integrated into a Scientific and Computational Grand Challenge of significant impact in the Solid Earth field. Work package 2 (WP2) of the ChEESE CoE is the part of the project where this work on the flagship codes is designed, executed, ...
Assessing CPU Code Quality
85 views, 2 months ago
Code quality is essential for getting high performance: for various reasons (including poor performance models, lack of adequate transformations, and limited analysis capabilities) compilers often produce suboptimal code, which can significantly hurt performance. MAQAO is a performance analysis framework offering features designed for assessing CPU (x86 and ARM) code quality, detecting p...
How to Use POP Services
14 views, 2 months ago
Simple cartoon explaining how to use POP services (pop-coe.eu/services)
The CARM Tool: Cache-aware Roofline Model for HPC
155 views, 4 months ago
In recent years, HPC systems have become increasingly complex and heterogeneous, making application development and optimisation challenging. In this respect, intuitive performance models like the Cache-aware Roofline Model (CARM) offer effective guidance by providing insights into bottlenecks that limit the application’s ability to reach the system’s maximum performance. The current landscape ...
Performance Analysis of OpenMP Target Offloading in Score-P
167 views, 7 months ago
With increasing demand for compute performance in HPC systems, accelerators are becoming the main focus of application development. Many of the Top500 HPC systems now include accelerators, with the top 3 systems alone having accelerators from three different vendors. This diversity requires application developers to choose portable frameworks to support all at the same time, as developing applicat...
Asynchronous GPU Programming in OpenMP
557 views, 8 months ago
The OpenMP 4.0 standard introduced support for accelerator and GPU programming and there are many introductory tutorials available. In this webinar, we will present OpenMP's support for asynchronous kernel offloading and explain how to use it. In addition, we will show how OpenMP supports the combination with GPU-native programming models. About the Presenters Dr. Christian Terboven leads the H...
Six and a half years of POP CoE: What Remains?
82 views, 2 years ago
The EU Performance Optimisation and Productivity Centre of Excellence in HPC (POP CoE) operated from October 2015 to May 2022. In its lifetime, it provided over 400 Performance Assessment or Proof-of-Concept services free of charge to many academic and research organisations, SMEs, ISVs, or companies in Europe. The services were based on the successful POP Performance Metrics and Methodology de...
Resources for Co Design
103 views, 3 years ago
Resources for co-design (co-design.pop-coe.eu) is a section within the POP website which gathers together a set of typical behavioural patterns seen in HPC codes, potentially resulting in some kind of performance degradation, that POP has identified in our analyses of user applications. For each of these patterns, the site links to the corresponding best-practice(s) that address their performan...
The POP Superheroes
456 views, 3 years ago
Learn about the POP service and see if you can spot the five famous European landmarks that our POP superheroes fly over (clue: there’s one from each country where one or more POP partners are based). Find out more at the POP website: pop-coe.eu/
POP: The SME Perspective
187 views, 3 years ago
POP (Performance Optimisation and Productivity) is an EU Centre of Excellence focussed on improving the performance of parallel codes. Our analysts profile the performance of such codes and identify ways in which they can be improved. In many cases, we write codes to demonstrate those improvements in performance. These services are free throughout the EU and UK. While we welcome many customers ...
Introduction to Paraver
884 views, 3 years ago
Paraver is a browser to process and visualize traces capturing the behaviour of parallel programs. Paraver is at the core of the BSC tools framework, and allows very detailed qualitative and quantitative analysis of traces. It has a flexible programmable interface that lets the analyst tailor it and squeeze the information within the data. This 30-minute webinar describes the fundamentals behin...
Module 4.2a: Introduction to Paraver - Timelines
242 views, 3 years ago
What you will learn: How to use the timelines of the trace visualizer Paraver
Prerequisites: Paraver trace visualizer installed
Speaker: Jesus Labarta (BSC)
Module web page: pop-coe.eu/further-information/online-training/using-pop-tools-paraver
Module 4.2b: Introduction to Paraver - Tables
146 views, 3 years ago
What you will learn: How to use the tables of the trace visualizer Paraver
Prerequisites: Paraver trace visualizer installed
Speaker: Jesus Labarta (BSC)
Module web page: pop-coe.eu/further-information/online-training/using-pop-tools-paraver
Module 4.2c: Introduction to Paraver - Semantic Functions
150 views, 3 years ago
What you will learn: How to use the semantic functions of the trace visualizer Paraver
Prerequisites: Paraver trace visualizer installed
Speaker: Jesus Labarta (BSC)
Module web page: pop-coe.eu/further-information/online-training/using-pop-tools-paraver
Module 8: Computing the POP Metrics with Score-P, Scalasca, Cube
303 views, 3 years ago
The Scalasca Scalable Parallel Performance Analysis Toolset - for POP assessments and beyond
558 views, 3 years ago
Debugging Tools for Correctness Analysis of MPI and OpenMP Applications
300 views, 3 years ago
Identifying performance bottlenecks in hybrid MPI + OpenMP software
561 views, 3 years ago
PyPOP An interactive tool for performance assessment
2.4K views, 4 years ago
Profiling GPU Applications with Nsight Systems
31K views, 4 years ago
Inclusive Leadership and Inspiring Action and Innovation
86 views, 4 years ago
Module 6: Using POP Tools: Score-P and Scalasca
1.3K views, 4 years ago
Thanks for the webinar. For your information, the "illustration of tasking" section of the video starting at 5:40 inspired us to make a video clip on tasking.
How do you run this on WSL?
audio quality is bad. cant see what is going on; too zoomed out.
Hello, thank you for the helpful video. Is it possible to get the improved code for Slide 24 that has flattened loops?
I have an 4090 I'm tryna learn Nsight very useful tool
This tutorial is very useful for my research project. Thank you Dr. Kabiri Chimeh!
The initial execution time was 3.52 sec, and after instrumentation you got 3.24 sec. So there is no overhead but some acceleration of the execution! What could be the reason for such a behaviour?
I think that was just one of possible values (cause we didn't see more values from MPI execution, so may be the "mean value" for such task is 3.24)
There's a simple explanation for it. When looking at the source code, the examples use the wall clock time to determine how long the execution took. This time can be influenced by the operating system or other running programs. Therefore, you normally take multiple measurements and use those to determine your runtime of the application instead of just a single measurement.
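To illustrate the repeated-measurement advice in the reply above, here is a minimal Python sketch. It is not taken from the webinar's example code: `measure`, `summarize`, and `workload` are hypothetical names, and `workload` is only a stand-in for the application being timed.

```python
import statistics
import time


def measure(fn, repeats=5):
    """Run fn() `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to minimum, median, and maximum."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def workload():
    # Hypothetical stand-in for the real application run.
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    stats = summarize(measure(workload, repeats=7))
    print(f"min={stats['min']:.4f}s  median={stats['median']:.4f}s  max={stats['max']:.4f}s")
```

Comparing the spread between the minimum and maximum with the difference between two configurations (for example, with and without instrumentation) gives a rough idea of whether that difference is larger than ordinary run-to-run noise.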
Hi, very nice introduction to MAQAO. I am interested in using this tool in my research. I downloaded the Linux binaries. However, I have not been able to figure out how to run the simulation to collect the profiling info and visualise it. A video tutorial demonstrating the usage with one example would be greatly appreciated. Thanks.
Great video thanks :)
Nice video, are the slides available?
The slides link has been added to the video description. They are available here: pop-coe.eu/sites/default/files/pop_files/pop-webinar-openmptasking.pdf