A good description of the difference between SIMT and SIMD at 19:12.
So, does anybody know an answer to question b? The first instruction is always executed by all 64 lanes, and the other 3 are always executed by the same number of threads out of 64. Getting 67/256 utilization would mean that 4 out of 64 threads execute all three instructions while 60 out of 64 threads have bubbles. So, for array A[i], 4 elements out of every 64 are positive numbers. Also, is it possible that they are the same 4 lanes, since there is no regrouping of threads into warps? What about B and C? Am I missing the point?
This is my take (correct me if I'm wrong):
#Warps | #Threads for full utilization per warp | #Threads actually utilized per warp | Instruction
106012 | 64                                     | 64                                  | >
106012 | 64                                     | X                                   | Add
106012 | 64                                     | X                                   | Add
106012 | 64                                     | X                                   | Add

(64 + 3X) / (64 * 4) = 67/256
X = 1
So only 1 thread per warp is utilized after the branch.
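For anyone counting along, here is a minimal C sketch of the shape of loop the question seems to be about (the actual homework code isn't quoted in this thread, so the body statements are assumptions; only the structure of one branch plus three guarded instructions matters):

/* Assumed shape of the loop, not the actual homework code. On the GPU each
 * iteration i maps to one SIMT lane and 64 consecutive iterations form a warp. */
void kernel_shape(int n, int *A, const int *B, const int *C) {
    for (int i = 0; i < n; i++) {
        if (A[i] > 0) {            /* branch: executed by all 64 lanes of the warp */
            A[i] = A[i] + B[i];    /* add: executed only by lanes with A[i] > 0    */
            A[i] = A[i] + C[i];    /* add: same active mask                        */
            A[i] = A[i] + 1;       /* add/mov: same active mask                    */
        }
    }
}
/* Per warp: the branch fills all 64 lanes, each of the 3 body instructions
 * fills X lanes, so utilization = (64 + 3*X) / (4 * 64) = 67/256, giving X = 1. */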
@@vinnym2923 Yes, for that part of the question it seems you are right; I don't remember what kind of drunk math I was doing. It was two months ago and I can't remember my thought process. So it's 1/64 utilized for those three add instructions. Still, the 'main' question I had is how you can determine anything about those three arrays, other than that array A has 1 positive number out of every 64 *because in a warp one thread will execute fully, thus the condition (A[i] > 0) is met*. What about B and C?
@@vinnym2923 Update: just googled and found the solution. For A it is what you calculated (1 out of 64 is positive). For arrays B and C the answer is: "Nothing", since the branch condition only tests A[i], so the values of B and C have no effect on which lanes are active. -_-
@@vinnym2923 Here's a link if somebody wants to see other solutions: www.coursehero.com/file/9331629/Homework-4-Solutions/
FOUSTE95 In that case my calculation is wrong. I was assuming an even distribution of unutilized threads across warps. I assumed this because my understanding was that whenever the condition is satisfied, the i-th thread won't take part in the calculation. But based on the solution you mentioned, it looks like the threads are utilized only when we add with A and mov data into A.
How is DAE different from a superscalar like Haswell with separate memory ports (Access) and ALU ports (Execute)?
x86 uses a single instruction stream; DAE has two explicitly separate instruction streams that are independent of each other.
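As a rough illustration of that separation (a hedged C sketch with made-up names and an explicit software FIFO, not Haswell's or any real DAE machine's implementation), the two streams behave like two independent loops that only meet at a queue:

/* Hedged sketch of the DAE idea: two separate "instruction streams"
 * that communicate only through an architectural FIFO. */
#include <stdio.h>

#define N 8
static double fifo[N];              /* access -> execute data queue */
static int head = 0, tail = 0;

/* Access stream: only address generation and loads; it can run ahead. */
static void access_stream(const double *a) {
    for (int i = 0; i < N; i++)
        fifo[tail++] = a[i];        /* each "load" pushes its result */
}

/* Execute stream: only arithmetic; it never computes an address itself. */
static void execute_stream(double *sum) {
    for (int i = 0; i < N; i++)
        *sum += fifo[head++];       /* pop stalls only if the queue is empty */
}

int main(void) {
    double a[N] = {1, 2, 3, 4, 5, 6, 7, 8}, sum = 0.0;
    access_stream(a);
    execute_stream(&sum);
    printf("sum = %f\n", sum);      /* 36.0 */
    return 0;
}

In a superscalar like Haswell, the split across memory and ALU ports happens dynamically inside one out-of-order window over a single stream; in DAE the split is exposed to software, so the access stream can slip far ahead of the execute stream as long as the queue has space.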
On page 36, you talked about losing efficiency because, on a condition, a warp gets split up into two warps. What if all the threads follow the same path, so none take (for example) branch D? Is step D completely skipped, so you're not losing an extra cycle? Or does it do a NOP at D? At some point the PC should skip it.
This could mean that, when building a "huge shader to rule them all", you won't get a penalty as long as all threads are following the same conditions?
The next PC is known only after the branch condition is executed. I think the mask is generated after this. So if only one path, C, is taken, then the active mask should be 1111 (no need for a fork) and D should be skipped entirely.
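A simplified model of that mask logic (an assumption for illustration, not any vendor's real hardware; the 4-lane mask matches the 1111 above):

/* Simplified SIMT branch handling for a 4-lane warp. taken[i] is the
 * branch outcome of lane i for the "if" on slide 36. */
unsigned build_active_mask(const int taken[4]) {
    unsigned mask = 0;
    for (int lane = 0; lane < 4; lane++)
        if (taken[lane])
            mask |= 1u << lane;     /* lane took the branch -> bit set */
    return mask;
}
/* After the branch resolves:
 *   mask == 0xF -> all lanes go to C: run C once, never fetch D (no NOPs).
 *   mask == 0x0 -> all lanes go to D: run D once, never fetch C.
 *   otherwise   -> divergence: run C under mask, then D under ~mask,
 *                  which is where the lost cycles come from. */

So the "huge shader to rule them all" only pays the divergence penalty on branches where lanes within the same warp actually disagree.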
I suppose if your Warp hits 10 it will be computing at all points in space and time?
;)
Relocating a thread (10) into another warp in order to form a denser warp may also require reconfiguring the data access mapping in memory.
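A small sketch of why, assuming the usual one-element-per-lane mapping (the index math and the remap table here are illustrative, not from the lecture):

/* Default mapping: lane L of warp W reads element W*64 + L, so the 64
 * addresses of a warp are contiguous and easy to coalesce. */
int index_static(int warp, int lane) {
    return warp * 64 + lane;
}

/* After relocating threads into denser warps, a warp's lanes hold arbitrary
 * original indices, so an extra remap table (or a data shuffle in memory)
 * is needed and the warp's 64 addresses may no longer be contiguous. */
int index_dynamic(const int *remap, int warp, int lane) {
    return remap[warp * 64 + lane];
}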