Rajeev Balasubramonian
MICRO 2019 Lightning Talk: WAX: Wire-Aware Accelerator for Deep Neural Networks
Lightning talk for the MICRO 2019 paper: "Wire-Aware Architecture and Dataflow for CNN Accelerators", Sumanth Gudaparthi, Surya Narayanan, Rajeev Balasubramonian, Edouard Giacomin, Hari Kambalasubramanyam, Pierre-Emmanuel Gaillardon, 52nd International Symposium on Microarchitecture (MICRO-52), Columbus, OH, October 2019.
Paper pdf: www.cs.utah.edu/~rajeev/pubs/micro19b.pdf
Views: 1,227

Videos

MICRO 2019 Lightning Talk: GenCache: Accelerator for Sequence Alignment
524 views, 4 years ago
Lightning talk for the MICRO 2019 paper: "GenCache: Leveraging In-Cache Operators for Efficient Sequence Alignment", Anirban Nag, C.N. Ramachandra, Rajeev Balasubramonian, Ryan Stutsman, Edouard Giacomin, Hari Kambalasubramanyam, Pierre-Emmanuel Gaillardon, 52nd International Symposium on Microarchitecture (MICRO-52), Columbus, OH, October 2019. Paper pdf: www.cs.utah.edu/~rajeev/pubs/micro19a.pdf
ASPLOS 2019 Lightning Talk, Relaxed Hierarchical ORAM
955 views, 5 years ago
This is the lightning talk for the ASPLOS 2019 paper: "Relaxed Hierarchical ORAM", C. Nagarajan, A. Shafiee, R. Balasubramonian, M. Tiwari. The full pdf: www.cs.utah.edu/~rajeev/pubs/asplos19.pdf
Spectre Explained (CS/ECE 3810 Computer Organization)
5K views, 6 years ago
This is the University of Utah's undergraduate course on Computer Organization. Instructor: Rajeev Balasubramonian. This video explains the Spectre attack that exploits side channels and speculative execution to allow an attacker to examine the memory contents of a co-scheduled victim program.
Meltdown Explained (CS/ECE 3810 Computer Organization)
5K views, 6 years ago
This is the University of Utah's undergraduate course on Computer Organization. Instructor: Rajeev Balasubramonian. This video explains the Meltdown attack that exploits side channels and speculative execution to expose kernel secrets to an attacker.
Secure DIMM Lightning talk for HPCA 2018
724 views, 6 years ago
Lightning talk for HPCA 2018. Full paper: www.cs.utah.edu/~rajeev/pubs/hpca18.pdf "Secure DIMM: Moving ORAM Primitives Closer to Memory", Ali Shafiee, Rajeev Balasubramonian, Mohit Tiwari, Feifei Li, 24th International Symposium on High-Performance Computer Architecture (HPCA-24), Vienna, Austria, February 2018. Paper abstract: As more critical applications move to the cloud, there is a pressi...
ISAAC: An Analog Convolutional Neural Network Accelerator (Part II)
6K views, 7 years ago
This is the second of two videos on the ISAAC analog accelerator for deep neural networks. The video is based on the ISCA 2016 paper "ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars" by A. Shafiee et al. Part II introduces the different components in an ISAAC chip and tile, the ISAAC balanced pipeline, and a results summary.
ISAAC: An Analog Convolutional Neural Network Accelerator (Part I)
17K views, 7 years ago
This is the first of two videos on the ISAAC analog accelerator for deep neural networks. The video is based on the ISCA 2016 paper "ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars" by A. Shafiee et al. Part I introduces the memristor crossbar unit, its inherent challenges, and how ADC overheads and precision can be managed.
Video 81: GPU Hardware Introduction, CS/ECE 3810 Computer Organization
6K views, 8 years ago
This is the University of Utah's undergraduate course on Computer Organization. Instructor: Rajeev Balasubramonian. This video describes the basic architecture of a GPU.
Video 80: Simultaneous Multi-Threading (SMT), CS/ECE 3810 Computer Organization
4.7K views, 8 years ago
This is the University of Utah's undergraduate course on Computer Organization. Instructor: Rajeev Balasubramonian. This video discusses simultaneous multi-threading.
Video 79: Parallel Version of the Ocean Kernel, CS/ECE 3810 Computer Organization
2.7K views, 8 years ago
This is the University of Utah's undergraduate course on Computer Organization. Instructor: Rajeev Balasubramonian. This video describes the shared-memory and message-passing parallel versions of the Ocean kernel.
Video 78: Parallel Programming Models, CS/ECE 3810 Computer Organization
3.6K views, 8 years ago
This is the University of Utah's undergraduate course on Computer Organization. Instructor: Rajeev Balasubramonian. This video discusses the programming process for multi-processors, contrasting shared-memory and message-passing models. It introduces the example of the Ocean kernel.
Video 77: Consistency Models, CS/ECE 3810 Computer Organization
4.3K views, 8 years ago
This is the University of Utah's undergraduate course on Computer Organization. Instructor: Rajeev Balasubramonian. This video describes consistency models, sequential consistency, relaxed consistency, and fences.
Video 76: Synchronization Primitives, CS/ECE 3810 Computer Organization
6K views, 8 years ago
This is the University of Utah's undergraduate course on Computer Organization. Instructor: Rajeev Balasubramonian. This video describes basic synchronization primitives (lock/unlock) and how they are implemented in hardware with an atomic exchange instruction (a test-and-set).
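As a rough illustration of the lock/unlock idea in the description above, the sketch below builds a spin lock from an atomic exchange. Python exposes no user-level atomic exchange instruction, so `AtomicCell` is a hypothetical stand-in that uses `threading.Lock` to make the read-and-write step indivisible. `AtomicCell`, `SpinLock`, and `run_demo` are illustrative names and are not taken from the video.

```python
import threading
import time


class AtomicCell:
    """Hypothetical stand-in for a memory word with a hardware atomic exchange."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()  # models the atomicity hardware would provide

    def exchange(self, new_value):
        # Atomically store new_value and return the value that was there before.
        with self._guard:
            old_value = self._value
            self._value = new_value
            return old_value


class SpinLock:
    """Test-and-set lock: 0 means free, 1 means held."""

    def __init__(self):
        self._cell = AtomicCell(0)

    def lock(self):
        # Keep swapping in 1; reading back 0 means this thread took the lock.
        while self._cell.exchange(1) == 1:
            time.sleep(0)  # let other threads run while spinning

    def unlock(self):
        self._cell.exchange(0)


def run_demo(num_threads=4, increments=200):
    """Increment a shared counter from several threads under the spin lock."""
    lock = SpinLock()
    counter = [0]

    def worker():
        for _ in range(increments):
            lock.lock()
            counter[0] += 1  # critical section
            lock.unlock()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

Calling `run_demo(4, 200)` should return 800 if the lock provides mutual exclusion. On real hardware the exchange would be a single instruction rather than a guarded Python method.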
Video 75: Directory Based Cache Coherence, CS/ECE 3810 Computer Organization
17K views, 8 years ago
This is the University of Utah's undergraduate course on Computer Organization. Instructor: Rajeev Balasubramonian. This video describes a directory-based cache coherence protocol to support a multiprocessor system with a distributed memory organization.
Video 74: Cache Coherence Example (cont.), CS/ECE 3810 Computer Organization
13K views, 8 years ago
Video 73: Snooping Based Cache Coherence, CS/ECE 3810 Computer Organization
29K views, 8 years ago
Video 72: Virtual Memory Basics, CS/ECE 3810 Computer Organization
6K views, 8 years ago
Video 71: Virtual and Physical Memory Management, CS/ECE 3810 Computer Organization
7K views, 8 years ago
Video 70: Main Memory System Basics, CS/ECE 3810 Computer Organization
6K views, 8 years ago
Video 69: Capacity, Conflict, Compulsory Misses, CS/ECE 3810 Computer Organization
9K views, 8 years ago
Video 68: Cache Policies, CS/ECE 3810 Computer Organization
7K views, 8 years ago
Video 67: Set-Associative Caches, CS/ECE 3810 Computer Organization
8K views, 8 years ago
Video 66: Cache Access Example, CS/ECE 3810 Computer Organization
8K views, 8 years ago
Video 65: Cache Access Terminology/Concepts, CS/ECE 3810 Computer Organization
8K views, 8 years ago
Video 64: Access and Placement in a Cache, CS/ECE 3810 Computer Organization
7K views, 8 years ago
Video 63: Locality Benefits, CS/ECE 3810 Computer Organization
6K views, 8 years ago
Video 62: Cache Hierarchy Intro, CS/ECE 3810 Computer Organization
8K views, 8 years ago
Video 61: Out-of-Order Implementation Details, CS/ECE 3810 Computer Organization
6K views, 8 years ago
Video 60: Out-of-Order Example, CS/ECE 3810 Computer Organization
12K views, 8 years ago

COMMENTS

  • @user-bm4ig4fw1t
    @user-bm4ig4fw1t 2 months ago

    Very nice. Do you have videos on MESI and MOESI as well?

  • @quercus_opuntia
    @quercus_opuntia 4 months ago

    this video saved my life and marriage

  • @quercus_opuntia
    @quercus_opuntia 5 months ago

    Thank u Rajeev!!!

  • @hussainfathy1065
    @hussainfathy1065 8 months ago

    thank you <3

  • @VikramReddyAnapana
    @VikramReddyAnapana 9 months ago

    Excellent teaching, and so well explained. I absorbed. Thank you so much

  • @VikramReddyAnapana
    @VikramReddyAnapana 9 months ago

    Thank you so much.

  • @enzoding7558
    @enzoding7558 9 months ago

    it should be -128~127 instead of -127~128

  • @lulu-xp7mf
    @lulu-xp7mf 10 months ago

    thank you

  • @EriknocTDW
    @EriknocTDW 1 year ago

    The 1st LW instruction has 8(R4) and the 2nd has 16(R4). What are the numbers 8 and 16 for?

    • @enzoding7558
      @enzoding7558 9 months ago

      4 bytes, 8 bytes, 16 bytes, etc. That's the offset from the stack pointer or from the address currently held in a register (e.g. R4, R3, R8, etc.)

  • @shridharbendi9087
    @shridharbendi9087 1 year ago

    Excellent presentation 👏👏

  • @rashedh2009
    @rashedh2009 1 year ago

    Annoying voice

  • @achieverakash5192
    @achieverakash5192 1 year ago

    you are super sir tomorrow is my exam you are my saviour

  • @chinmayrath8494
    @chinmayrath8494 1 year ago

    Awesome explanation. Thank you !!!!

  • @raghul1208
    @raghul1208 1 year ago

    best explanation

  • @MaxAbramson3
    @MaxAbramson3 1 year ago

    Nowadays, compile time branch prediction (with profiling) is usually better than 95%, and run time branch prediction (one cycle beforehand) is about 97%, according to some manufacturers. So branching code actually suffers from this technique as it makes the code larger, resulting in both a higher ICache miss rate and pressure on the instruction bandwidth. It's not ironic that the only surviving major architectures are those without a delayed branch.

    • @MaxAbramson3
      @MaxAbramson3 1 year ago

      Perfect branch prediction recognizes that only about 2-3% of branches are actually misbehaving, though we're only able to see about 90% of the time what direction a branch will go well ahead of time.

  • @ggxue
    @ggxue 1 year ago

    nice thank you

  • @archgirl2665
    @archgirl2665 1 year ago

    Great video thankyou sir!

  • @chelseaethiohighlights1971

    Thank you! Excuse me, would you provide the slides for us?

  • @gabrieldias6430
    @gabrieldias6430 1 year ago

    you are a life saver, man

  • @ok-jg9jb
    @ok-jg9jb 2 years ago

    Tq sir

  • @Kepp-w6r
    @Kepp-w6r 2 years ago

    concise explanation..

  • @rajeshhariharan7575
    @rajeshhariharan7575 2 years ago

    Prof. Rajeev, Thanks a lot. very nicely explained.

  • @TheAyanMan
    @TheAyanMan 2 years ago

    Great vid ☺️

  • @TheAyanMan
    @TheAyanMan 2 years ago

    First comment 😃

  • @abhishekghosh1998
    @abhishekghosh1998 2 years ago

    Also the fast algorithm picture, (as given in Hennessy Patterson Textbook) seems a bit erroneous to me. When we are adding Multiplicand.Multiplier[0] + Multiplicand.Multiplier[1], then wouldn't we need to shift the Multiplicand.Multiplier[1] by 1 unit to the left for addition alignment? The fast multiplier implementation given in the Carl Hamacher textbook, explains it properly (where the entire hardware is given in details). Actually, Multiplicand.Multiplier[0], this gives a 32 bit number, the LSB of which forms product[0].... And the rest of the bits of Multiplicand.Multiplier[0], i.e. from 1 to 31st are given to the first level adder... such that Multiplicand.Multiplier[0][1] is aligned with Multiplicand.Multiplier[1][0]. Please correct me if I am wrong.

  • @abhishekghosh1998
    @abhishekghosh1998 2 years ago

    I do not think that the previous algorithm also works for signed numbers (I mean negative numbers) in all cases. The previous algorithm works in case of signed numbers, I guess, only when the multiplier is positive. The multiplicand can be positive or negative. But while right shifting the partial product formed at each step, we need to do an arithmetic right shift (instead of a simple logical right shift). Please correct me if I am wrong.

  • @souravgupta8182
    @souravgupta8182 2 years ago

    Awesome Video!!

  • @jaysiddhapura
    @jaysiddhapura 2 years ago

    Is DA and AD conversion needed between each layer?!

  • @weirdsciencetv4999
    @weirdsciencetv4999 2 years ago

    Why would we need to keep the resolution as opposed to just making a network which is tolerant of error accumulation?

  • @Max-ge7sv
    @Max-ge7sv 2 years ago

    If you have a network like this, the currents will not simply be added. In fact you have a complex current divider with multiple voltage sources. The output current is the result of the superposition of all current dividers, which depend on the resistor values and it will get more and more complex by increasing the input vector. Furthermore, the memristors are changing their values by applying a voltage. How is it possible to get a consistent result?

    • @nabhay583
      @nabhay583 1 year ago

      What if we ground the lines? Won't we then easily be able to add the currents due to superposition?

  • @omersakkar5670
    @omersakkar5670 2 years ago

    Thank you

  • @arnabsaha2021
    @arnabsaha2021 2 years ago

    Very well explained

  • @hungke6211
    @hungke6211 2 years ago

    how did you do it can you share with me , thank you

  • @warcroft23
    @warcroft23 2 years ago

    I came across this channel in 2015; I was pursuing my master's. I was impressed by your way of teaching and articulate explanations. Since then, I have been revisiting this channel whenever I need to recall Computer Organization concepts.

  • @sugee98
    @sugee98 2 years ago

    What happens if you have a heterogeneous multicore system with one processing element (say a DSP, for example) that has caches but doesn't participate in the MSI/MESI/MOESI protocol? I presume a read request will cause it to get the correct data from either main memory or another cache. But what if it wishes to modify that location and has, say, a write-back cache? How do the other processing elements know it's been modified? Must the DSP do something in software to alert the other PEs? Excellent tutorial - the best I've seen in fact!

  • @manarzh7460
    @manarzh7460 2 years ago

    Thank you Rajeev

  • @eurotrash4970
    @eurotrash4970 3 years ago

    I have to say I love your british accent with a hint of indian as well. A lot less thick than most Indian computer science teachers you can find, and lot easier to understand

  • @pulkitjain25
    @pulkitjain25 3 years ago

    Also, what is the size of the directory typically?

  • @pulkitjain25
    @pulkitjain25 3 years ago

    Is the directory on each node, and is it kept coherent across all nodes? Or is it in a global shared memory?

    • @living_curious
      @living_curious 2 years ago

      How is the directory updated on all nodes?

  • @rajatbhattacharjya1443
    @rajatbhattacharjya1443 3 years ago

    This playlist needs to go viral

  • @vamosabv
    @vamosabv 3 years ago

    Thanks Rajeev for sharing :)

  • @akhilhooda740
    @akhilhooda740 3 years ago

    thanks sir, you made it easy

  • @saurabhdp
    @saurabhdp 3 years ago

    Prof. Rajeev, thanks for the video and your explanation. I have 1 question. If a situation arises where, say for example, both Processor P1 and P2 want to write to their copies of x exactly at the same time and issue an upgrade request simultaneously. Then how is this issue resolved. Thanks.

  • @bhavanibedre
    @bhavanibedre 3 years ago

    Your content is amazing. Do you have a sharable reference to your slides? It will be helpful to revise.

  • @ulasakyildiz
    @ulasakyildiz 3 years ago

    ADAM OĞLU ADAM Thank you brother

  • @MelvinSimon777
    @MelvinSimon777 3 years ago

    Amazingly explained Sir!

  • @vipulsharma3846
    @vipulsharma3846 3 years ago

    Nice explanation of clock speed

  • @naziyaakhtha3424
    @naziyaakhtha3424 3 years ago

    Very helpful

  • @ssvemuri
    @ssvemuri 3 years ago

    Nice to see you on youtube Rajeev. Hope you are doing well!

  • @padalatejasaikumarreddy5321
    @padalatejasaikumarreddy5321 3 years ago

    wowwwwwwwwwwwwww
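
The correction raised in the comments by @enzoding7558 (the range should be -128 to 127, not -127 to 128) can be checked with a short sketch. It assumes the video in question is discussing 8-bit two's complement numbers, which the comment does not state explicitly. `twos_complement_range` and `decode_twos_complement` are illustrative names.

```python
def twos_complement_range(bits):
    """Return (min, max) representable in two's complement with the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def decode_twos_complement(pattern, bits):
    """Interpret an unsigned bit pattern as a two's complement signed value."""
    if pattern & (1 << (bits - 1)):  # sign bit set: value is negative
        return pattern - (1 << bits)
    return pattern
```

Under that assumption, the pattern `0b10000000` decodes to the most negative value and `0b01111111` to the most positive, so there is one more negative value than positive.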